LINEAR-ALGEBRAIC GRAPH MINING
1 UCSF QB3 Seminar 5/28/2015 Linear Algebraic Graph Mining, 1/22 LLNL-PRES
New Applications of Computer Analysis to Biomedical Data Sets
QB3 Seminar, UCSF Medical School, May 28th, 2015
LINEAR-ALGEBRAIC GRAPH MINING
Geoffrey Sanders, CASC/LLNL
Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344
2 LLNL and LDRD
- LLNL is a DOE FFRDC
- Center for Applied Scientific Computing (CASC)
- Several of us work on Laboratory Directed Research and Development (LDRD) projects in HPC and Data Analysis: Graph Analytics, Machine Learning, Network Analysis
- We are ALWAYS looking for domain scientist collaborators with interesting datasets or new data mining tasks
- DOE national labs have a history of building open HPC software for PDE-related applications (Physics Simulation): PETSc [H2], Trilinos [H4], Hypre [H3], SAMRAI [H1], etc.
- Desire to do so for graph mining applications
3 Outline
1. Introduction
2. Analytics that Rank
3. Analytics that Cluster
4. Analytics that Approximate Expensive Calculations
5. Current Research Directions
4 Graph Model Definitions
- Graph G(V,E)
- Vertices i, j in V
- Edges (i,j) in E
- Edge weights on (i,j)
- Undirected vs Directed? (i,j) and (j,i)
- Hypergraphs: (i,j,k,l) and (p,q,r) in E
- Attributes?
- Vertex Labels: Height, Gender, Profession
- Edge Labels: Timestamp, Volume
5 Difficult Topologies
- Scale-Free
- Small-World
- Community Structure: Hierarchical; Overlapping; Heterogeneous in size, density, type, etc.
- Other Structure: Tree-Like Periphery
6 Web Data Commons Hyperlink Graph
- Crawled in 2014, directed [D1]
- 1.7B webpages, 64B hyperlinks
7 Spy Plot
[Figure: spy plot of the Autonomous System Graph [D2]; both axes are labeled "vertices"]
- Graphs have natural sparse matrix representations
- Linear algebra applies
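To make the sparse matrix representation concrete, here is a minimal sketch that stores an undirected graph's adjacency matrix in compressed sparse row (CSR) form and multiplies it by a vector. The function names `edges_to_csr` and `spmv` and the toy edge list are assumptions for illustration, not anything from the talk; a production code would normally use a sparse linear algebra library instead of plain lists.

```python
def edges_to_csr(n, edges):
    """Build CSR arrays (indptr, indices) for an undirected graph on n vertices."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)  # undirected: store both (i,j) and (j,i)
    indptr, indices = [0], []
    for row in neighbors:
        indices.extend(sorted(row))   # column indices of the nonzeros in this row
        indptr.append(len(indices))   # where the next row starts
    return indptr, indices


def spmv(indptr, indices, x):
    """Multiply the 0/1 adjacency matrix (given in CSR form) by the vector x."""
    return [sum(x[j] for j in indices[indptr[i]:indptr[i + 1]])
            for i in range(len(indptr) - 1)]


# Hypothetical toy graph: the path 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
indptr, indices = edges_to_csr(4, edges)
degrees = spmv(indptr, indices, [1, 1, 1, 1])  # A times the ones vector gives vertex degrees
```

Only the nonzero entries are stored, so memory and the cost of a matrix-vector product scale with the number of edges rather than with the square of the number of vertices.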
8 Linear-Algebraic Kernels
- Linear Solve: $L x = b$
- Eigensolve: $L x = \lambda x$
- Matrix Factorization: $L \approx F G^t$
- Tensor Factorization [T1]: $\mathcal{A} \approx \sum_{k=1}^{r} u_k \circ v_k \circ w_k$
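As an illustration of the linear-solve kernel, the following is a minimal sketch of the conjugate gradient method in plain Python. The choice of conjugate gradients, the dense list-of-lists storage, and the shifted Laplacian `L_shifted` (the Laplacian of a 3-vertex path plus the identity, so that the system is positive definite) are all assumptions made for the example; the slide itself does not name a solver.

```python
def matvec(M, x):
    """Dense matrix-vector product M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(M, b, tol=1e-12, max_iter=100):
    """Solve M x = b for a symmetric positive definite matrix M."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - M x for the initial guess x = 0
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs < tol:       # squared residual norm is small enough
            break
        Mp = matvec(M, p)
        alpha = rs / dot(p, Mp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mi for ri, mi in zip(r, Mp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Hypothetical system: Laplacian of the path 0-1-2, shifted by the identity.
L_shifted = [[2.0, -1.0, 0.0],
             [-1.0, 3.0, -1.0],
             [0.0, -1.0, 2.0]]
b = [1.0, 0.0, -1.0]
x = conjugate_gradient(L_shifted, b)
```

The method only touches the matrix through matrix-vector products, which is why it pairs naturally with sparse graph matrices.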
10 Ranking Calculations
- Global Ranking? Ordered list
- Often only care about top few of the list
- PageRank [R1]: Centrality measure; Random walk
- Personalized PR [R2]
- Supervised
- Connection Subgraph [R3]
- Solve for direction
- Rank vertices
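The random-walk view of PageRank can be sketched with a short power iteration. This is a minimal illustration under stated assumptions: the damping factor 0.85, the uniform handling of vertices without out-links, the function name `pagerank`, and the 4-vertex toy graph are all choices made here, not details from the talk.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank; out_links[i] lists the vertices i points to."""
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n          # teleportation term
        for i, targets in enumerate(out_links):
            if targets:
                share = damping * rank[i] / len(targets)
                for j in targets:
                    new[j] += share              # follow an out-link uniformly at random
            else:
                for j in range(n):               # dangling vertex: jump anywhere
                    new[j] += damping * rank[i] / n
        change = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical directed graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
out_links = [[1, 2], [2], [0], [2]]
scores = pagerank(out_links)
ranking = sorted(range(len(scores)), key=lambda v: -scores[v])  # best-ranked vertex first
```

Since often only the top few entries matter, the sorted `ranking` list would typically be truncated after the leading vertices.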
11 Exotic Ranking Calculations
- WalkScore [R4]: Meet a blend of complex constraints?
[Figure annotations: "My brother gets", "My score", "Worse than a buoy"; the score values are not recoverable]
13 Clustering
- Unsupervised?
- Spectral Clustering [O1, C4]
- Hard or Soft?
- $\operatorname{sign}(v) \mathbin{.*} \left(\log_{10}(|v| + \varepsilon) + \min\{\log_{10}(|v| + \varepsilon)\}\right)$
- Agglomerative [C1]
- Start with n groups
- Make local grouping decisions to maximize Modularity: $\sum_{\text{comms}} \left[ (\text{Internal edges}) - (\text{Expected internal edges}) \right]$
[Figure: matrix ordered randomly and ordered by Fiedler vector]
[Figure: Recursive Bipartite SC tree rooted at the ORIGINAL VERTEX SET. Reason split: vector splitting; connected components. Reason stopped: minimum cluster size; max clusters]
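One bipartition step of spectral clustering can be sketched as follows: approximate the Fiedler vector (the eigenvector of the graph Laplacian for its second-smallest eigenvalue) and split the vertices by sign. The power iteration on a shifted Laplacian used here is only a simple stand-in chosen to keep the example dependency-free; the function name `fiedler_vector`, the iteration count, and the two-triangle toy graph are assumptions, and the talk does not specify an eigensolver.

```python
def fiedler_vector(adj, iters=500):
    """Approximate the Fiedler vector of a connected undirected graph.

    adj[i] lists the neighbors of vertex i. Runs power iteration on
    (shift*I - L) while projecting out the constant vector.
    """
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    shift = 2.0 * max(deg)              # upper bound on the largest Laplacian eigenvalue
    x = [float(i) for i in range(n)]    # deterministic starting vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]     # remove the component along the ones vector
        # y = (shift*I - L) x, where (L x)_i = deg_i * x_i - sum of x_j over neighbors j
        y = [shift * x[i] - (deg[i] * x[i] - sum(x[j] for j in adj[i]))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
v = fiedler_vector(adj)
part_a = [i for i in range(len(adj)) if v[i] < 0]    # hard split by sign
part_b = [i for i in range(len(adj)) if v[i] >= 0]
```

A recursive scheme would apply the same split again to each part until a stopping rule such as a minimum cluster size is met. A soft assignment could use the magnitudes of the entries instead of only their signs.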
14 Overlapping (Co-)Clustering
- Non-Negative MF [C2]
- Factors positive-valued
- $L \approx F G^t$
- Latent Dirichlet Allocation [C3]
- Interpreted as probabilities
- Model a document as a weighted group of topics
- Each topic has individual vocabulary
- Document is a random combination of terms from its topics
- More general than term-document data
- Coarsening [C5]
- Multilevel
- LinkedIn data
[Figure 1 (from Xu, De Sterck and Sanders [C5]): Feature matrix F and induced weighted bipartite graph. Red dots correspond to row variables of F, and blue dots to column variables. The rows and columns may, e.g., represent LinkedIn users and skills, with the weights indicating how often a user's skill was endorsed by the user's connections.]
[Figure 2 (from [C5]): Bipartite graph hierarchy obtained by the FMCC algorithms. The input feature matrix is located at the bottom of the diagram.]
Excerpt from [C5] shown on the slide: "... research groups, etc. These hierarchical structures are overlapping; for example, some professors may be active in multiple departments or faculties, and many skills (often even the more specialized ones) are taught in multiple degree programs. In a similar way, it is to be expected that many of the currently emerging online social networks also contain inherent overlapping hierarchical organization, in particular when they focus on a specific dimension of the human condition, like, e.g., the professional dimension. Consider for example the LinkedIn social network, where users connect to their business relations and acquaintances, and list user-defined skills and expertise on their user profiles that can be endorsed by their connections. Similar to the case of universities, it is clear that in a social network like LinkedIn there must be hierarchical overlapping groups of users with similar skills and professions, and hierarchical overlapping groups of skill keywords that characterize professional groups. However, in contrast to the example of universities, in emerging social networks this hierarchy is not hard-coded into the structure of the network; if it were, it would seriously impede the growth and dynamical evolution of these networks. Since the hierarchy is not explicitly hard-coded into the structure of the network but is nevertheless present, it is at once a very interesting and a challenging problem to try to automatically generate a representation of this hierarchy from the ..."
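A non-negative factorization of a feature matrix into two positive-valued factors can be sketched with multiplicative updates. Note the substitution: the slide cites Paatero's positive matrix factorization [C2], while the sketch below uses the widely known Lee-Seung multiplicative update rules for the Frobenius-norm objective, chosen here only because they fit in a few lines. The names `nmf` and `frob_error`, the deterministic initialization, and the toy matrix `L` are assumptions.

```python
def matmul(A, B):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(L, F, G):
    """Frobenius norm of L - F G^t."""
    approx = matmul(F, transpose(G))
    return sum((l - a) ** 2
               for lrow, arow in zip(L, approx)
               for l, a in zip(lrow, arow)) ** 0.5


def nmf(L, k, iters=200, eps=1e-12):
    """Approximate L (m x n, non-negative) by F G^t with F (m x k), G (n x k) >= 0."""
    m, n = len(L), len(L[0])
    # Deterministic positive starting factors (an arbitrary choice for this sketch).
    F = [[1.0 + 0.1 * ((i + j) % 3) for j in range(k)] for i in range(m)]
    G = [[1.0 + 0.1 * ((2 * i + j) % 3) for j in range(k)] for i in range(n)]
    errors = [frob_error(L, F, G)]
    for _ in range(iters):
        num = matmul(L, G)                              # L G, m x k
        den = matmul(F, matmul(transpose(G), G))        # F (G^t G), m x k
        F = [[F[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
        num = matmul(transpose(L), F)                   # L^t F, n x k
        den = matmul(G, matmul(transpose(F), F))        # G (F^t F), n x k
        G = [[G[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
        errors.append(frob_error(L, F, G))
    return F, G, errors


# Hypothetical feature matrix with two overlapping-free blocks of rows and columns.
L = [[5.0, 4.0, 0.0, 0.0],
     [4.0, 5.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 4.0],
     [0.0, 0.0, 4.0, 3.0]]
F, G, errors = nmf(L, k=2)
```

Because every update multiplies by a non-negative ratio, the factors stay non-negative throughout, which is what allows rows of F and G to be read as soft cluster memberships.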
16 Approximations to Expensive Calcs
- Triangle Counting [O3]
- Sum of the diagonal of $A^3$ is 6 x (# triangles)
- Estimate diagonal entries of $A^3$
- $\operatorname{Trace}(A^3) = \sum_i \lambda_i(A)^3$, where $\lambda_i(A)$ are the eigenvalues of $A$
- Mincut [O1]
- Nearly-planar coloring [O2]
- Maxcut [O4]
- Near Bi-Partite Structure [O5]
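The triangle-counting identity can be checked on a small graph. This minimal sketch forms $A^3$ exactly with dense integer arithmetic, which is only feasible for tiny graphs; the spectral approach in [O3] instead approximates the trace from a few eigenvalues. The helper names and the "bowtie" toy graph are assumptions for illustration.

```python
def adjacency_matrix(n, edges):
    """Dense symmetric 0/1 adjacency matrix of an undirected simple graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1
    return A


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical "bowtie" graph: triangles {0,1,2} and {2,3,4} sharing vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
A = adjacency_matrix(5, edges)
A3 = matmul(matmul(A, A), A)

diagonal = [A3[i][i] for i in range(5)]
# Entry (i,i) of A^3 counts closed walks of length 3 from i: two per triangle at i.
triangles_per_vertex = [d // 2 for d in diagonal]
# Each triangle is counted at 3 vertices, in 2 directions, hence the factor 6.
total_triangles = sum(diagonal) // 6
```

The per-vertex counts show why estimating individual diagonal entries of $A^3$ is useful on its own: it gives local triangle participation, not just the global total.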
18 Important Extensions
- Bipartite Graphs: Co-cluster!
- Tensors: Time is a tensor dim.; Causality?
- Dynamic Graphs [T2]: Streaming; Gather stats
- Directed Graphs: Highly-cyclic structure; General c-cyclic structure; Spectrum of $D^{-1}A$; Spectral Coordinates
- Labeled Graphs: Factor in labels; Label anomalies?
[Figure 12: Matrix sparsity structure for example 3 under the reordering given by the Fiedler vector (left) and associated ordering of row and column variables (right). The purple line in the figure on the right shows the one-dimensional search space for row and column splittings considered by the out-of-box undirected method.]
[Figure 13: Original graph from example 4 (left), randomly reordered (middle), and bipartized (right).]
[Figures: spectrum of $D^{-1}A$ in the complex plane (axes Re, Im) and spectral coordinates (axes Re (vi + wi), Im (vi wi)); a draft manuscript excerpt describing how eigenpairs are used to define a mapping of the nodes into R^2; tensor mode labels Author, Conference]
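Why the spectrum of $D^{-1}A$ reveals cyclic structure in a directed graph can be seen on the simplest case, a single directed cycle of length c. There every out-degree is 1, so $D^{-1}A$ is a cyclic shift whose eigenvalues are the c-th roots of unity, with eigenvector entries that are powers of the eigenvalue. The sketch below verifies this numerically; the function names and the choice c = 3 are assumptions, and the example does not attempt to reproduce the specific matrices or angles in the slide's figures.

```python
import cmath


def cycle_walk_matrix(c):
    """Random-walk matrix D^{-1} A of the directed cycle 0 -> 1 -> ... -> c-1 -> 0."""
    return [[1.0 if j == (i + 1) % c else 0.0 for j in range(c)] for i in range(c)]


def eigen_residual(P, lam, v):
    """Largest entry of |P v - lam v|; zero means (lam, v) is an eigenpair."""
    n = len(v)
    Pv = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return max(abs(Pv[i] - lam * v[i]) for i in range(n))


c = 3
P = cycle_walk_matrix(c)
eigenpairs = []
for p in range(c):
    lam = cmath.exp(2j * cmath.pi * p / c)   # p-th of the c-th roots of unity
    v = [lam ** i for i in range(c)]         # eigenvector entries v_i = lam^i
    eigenpairs.append((lam, v, eigen_residual(P, lam, v)))

angles = [cmath.phase(lam) for lam, _, _ in eigenpairs]  # 0, 2*pi/3, -2*pi/3
```

In a graph that is only approximately c-cyclic, one would expect eigenvalues near these unit-circle locations, and the complex eigenvector entries then act as coordinates that separate the c groups by angle.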
19 Summary
- Linear Algebra addresses a diverse set of graph analytics
- Linear Algebra kernels are somewhat scalable, implemented in many computing environments
- Often requires close interaction with a mathematician or computer scientist to tune to a new type of data/analytic
20 References I of III
HPC
[H1] Wissink et al. Large Scale Structured AMR Calculations Using the SAMRAI Framework, SC01 Proceedings, 2001. LLNL tech report UCRL-JC
[H2] Balay et al. Efficient Management of Parallelism in Object Oriented Numerical Software Libraries, Modern Software Tools in Scientific Computing, 1997
[H3] Falgout et al. Design of the hypre Preconditioner Library, Proc. of the SIAM Workshop on Object Oriented Methods for Inter-operable Scientific and Engineering Computing, 1998
[H4] Heroux et al. An Overview of Trilinos, SNL Tech Report SAND, 2003
Data
[D1] Meusel et al. Web Data Commons 2014 Hyperlink Graph
[D2] Leskovec et al. Stanford Large Network Dataset Collection
21 References II of III
Ranking
[R1] Page. PageRank: Bringing order to the web. Stanford Digital Libraries Working Paper, 1997
[R2] Haveliwala. Topic-sensitive PageRank, in WWW, 2002
[R3] Faloutsos et al. Fast Discovery of Connection Subgraphs, KDD 2004
[R4] Walkscore
Clustering
[C1] Blondel et al. Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment (10), P10008, 2008
[C2] Paatero et al. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, 1994
[C3] Blei et al. Latent Dirichlet Allocation, Journal of Machine Learning Research, 2003
[C4] von Luxburg. A tutorial on spectral clustering, Statistics and Computing, 2007
[C5] Xu et al. Fast Multilevel Co-Clustering: Unraveling the Multilevel Overlapping Cluster Structure of Social Network Data, submitted to Numerical Linear Algebra with Applications, May 2015
22 References III of III
Tensors
[T1] Kolda et al. Tensor Decompositions and Applications, SIAM Review, 2008
[T2] Dunlavy et al. Clustering network data using graphs, hypergraphs, and tensors, lecture given at University of Montreal, May 2015
Discrete Optimization
[O1] Fiedler. Algebraic connectivity of Graphs, Czechoslovak Mathematical Journal: 23 (98)
[O2] Hu et al. On Maximum Differential Graph Coloring, Lecture Notes in Computer Science, 2011
[O3] Tsourakakis et al. Spectral Counting of Triangles in Power-Law Networks via Element-Wise Sparsification, Social Network Analysis and Mining, 2009
[O4] Trevisan. Max Cut and the Smallest Eigenvalue, SIAM J. Comput., 2012. Earlier version in Proc. of 41st ACM STOC, 2009
[O5] Kirkland et al. Bipartite subgraphs and the signless Laplacian matrix, Applicable Analysis and Discrete Mathematics, 2011
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com Outline Predictive modeling methodology k-nearest Neighbor
More informationCS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
More informationSketch As a Tool for Numerical Linear Algebra
Sketching as a Tool for Numerical Linear Algebra (Graph Sparsification) David P. Woodruff presented by Sepehr Assadi o(n) Big Data Reading Group University of Pennsylvania April, 2015 Sepehr Assadi (Penn)
More informationDistributed R for Big Data
Distributed R for Big Data Indrajit Roy, HP Labs November 2013 Team: Shivara m Erik Kyungyon g Alvin Rob Vanish A Big Data story Once upon a time, a customer in distress had. 2+ billion rows of financial
More informationEigencuts on Hadoop: Spectral Clustering for Image Segmentation at Scale
Eigencuts on Hadoop: Spectral Clustering for Image Segmentation at Scale Shannon Quinn squinn@cmu.edu, spq1@pitt.edu Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology
More informationMATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued).
MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors Jordan canonical form (continued) Jordan canonical form A Jordan block is a square matrix of the form λ 1 0 0 0 0 λ 1 0 0 0 0 λ 0 0 J = 0
More informationFrancesco Sorrentino Department of Mechanical Engineering
Master stability function approaches to analyze stability of the synchronous evolution for hypernetworks and of synchronized clusters for networks with symmetries Francesco Sorrentino Department of Mechanical
More informationSmall Maximal Independent Sets and Faster Exact Graph Coloring
Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected
More informationAn Effective Way to Ensemble the Clusters
An Effective Way to Ensemble the Clusters R.Saranya 1, Vincila.A 2, Anila Glory.H 3 P.G Student, Department of Computer Science Engineering, Parisutham Institute of Technology and Science, Thanjavur, Tamilnadu,
More informationResearch Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging
Mathematical Problems in Engineering, Article ID 578713, 6 pages http://dx.doi.org/10.1155/2014/578713 Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of
More informationData Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction
Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration
More informationProximal mapping via network optimization
L. Vandenberghe EE236C (Spring 23-4) Proximal mapping via network optimization minimum cut and maximum flow problems parametric minimum cut problem application to proximal mapping Introduction this lecture:
More informationM E M O R A N D U M. Faculty Senate Approved April 2, 2015
M E M O R A N D U M Faculty Senate Approved April 2, 2015 TO: FROM: Deans and Chairs Becky Bitter, Sr. Assistant Registrar DATE: March 26, 2015 SUBJECT: Minor Change Bulletin No. 11 The courses listed
More informationMethodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph
More informationSanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a
More informationJure Leskovec (@jure) Stanford University
Jure Leskovec (@jure) Stanford University KDD Summer School, Beijing, August 2012 8/10/2012 Jure Leskovec (@jure), KDD Summer School 2012 2 Graph: Kronecker graphs Graph Node attributes: MAG model Graph
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More informationSubgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationPSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationRank one SVD: un algorithm pour la visualisation d une matrice non négative
Rank one SVD: un algorithm pour la visualisation d une matrice non négative L. Labiod and M. Nadif LIPADE - Universite ParisDescartes, France ECAIS 2013 November 7, 2013 Outline Outline 1 Data visualization
More information