LINEAR-ALGEBRAIC GRAPH MINING

1 UCSF QB3 Seminar 5/28/2015 Linear Algebraic Graph Mining, 1/22 LLNL-PRES
New Applications of Computer Analysis to Biomedical Data Sets
QB3 Seminar, UCSF Medical School, May 28th, 2015
LINEAR-ALGEBRAIC GRAPH MINING
Geoffrey Sanders, CASC/LLNL
Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344

2/22 LLNL and LDRD
LLNL is a DOE FFRDC
Center for Applied Scientific Computing (CASC)
Several of us work on Laboratory Directed Research and Development (LDRD) projects in HPC and Data Analysis: Graph Analytics, Machine Learning, Network Analysis
We are ALWAYS looking for domain scientist collaborators with interesting datasets or new data mining tasks
DOE national labs have a history of building open HPC software for PDE-related applications (Physics Simulation): PETSc [H2], Trilinos [H4], Hypre [H3], SAMRAI [H1], etc.
Desire to do so for graph mining applications

3/22 Outline
1. Introduction
2. Analytics that Rank
3. Analytics that Cluster
4. Analytics that Approximate Expensive Calculations
5. Current Research Directions

4/22 Graph Model Definitions
Graph G(V,E)
Vertices: i, j in V
Edges: (i,j) in E
Edge weights [figure: an edge joining vertices i and j]
Undirected vs Directed? (i,j) and (j,i)
Hypergraphs: (i,j,k,l) and (p,q,r) in E
Attributes?
Vertex Labels: Height, Gender, Profession
Edge Labels: Timestamp, volume

5/22 Difficult Topologies
Scale-Free
Small-World
Community Structure: Hierarchical; Overlapping; Heterogeneous in size, density, type, etc.
Other Structure: Tree-Like Periphery

6/22 Web Data Commons Hyperlink Graph
Crawled in 2014, directed [D1]
1.7B webpages, 64B hyperlinks

7/22 Spy Plot
[Figure: spy plot of the Autonomous System Graph [D2]; both axes index vertices]
Graphs have natural sparse matrix representations
Linear algebra applies
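
The sparse-matrix view of a graph can be sketched in a few lines. The example below is only an illustration, not code from the talk: it stores a small undirected graph in compressed sparse row (CSR) form and applies the adjacency matrix to a vector. The function names `edges_to_csr` and `csr_matvec` and the 4-vertex edge list are hypothetical.

```python
import numpy as np

def edges_to_csr(n, edges):
    """Build CSR arrays (indptr, indices) for an undirected, unweighted graph."""
    # Store each undirected edge in both directions so the matrix is symmetric.
    rows = np.array([i for i, j in edges] + [j for i, j in edges])
    cols = np.array([j for i, j in edges] + [i for i, j in edges])
    order = np.lexsort((cols, rows))      # sort by row, then by column
    rows, cols = rows[order], cols[order]
    indptr = np.zeros(n + 1, dtype=int)
    np.add.at(indptr, rows + 1, 1)        # count the nonzeros in each row
    indptr = np.cumsum(indptr)            # row i occupies indices[indptr[i]:indptr[i+1]]
    return indptr, cols

def csr_matvec(indptr, indices, x):
    """Compute y = A x while touching only the stored nonzeros."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        y[i] = x[indices[indptr[i]:indptr[i + 1]]].sum()
    return y

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # hypothetical 4-vertex example
indptr, indices = edges_to_csr(4, edges)
degrees = csr_matvec(indptr, indices, np.ones(4))  # A times the ones vector gives vertex degrees
```

Storage and the cost of one matrix-vector product both scale with the number of edges rather than with the square of the number of vertices, which is what makes this representation practical for large graphs.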

8/22 Linear-Algebraic Kernels
Linear Solve: $L x = b$
Eigensolve: $L x = \lambda x$
Matrix Factorization: $L \approx F G^t$
Tensor Factorization [T1]: $A \approx \sum_{k=1}^{r} u_k \circ v_k \circ w_k$
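
As a rough illustration of the first two kernels, the sketch below forms a graph Laplacian for a small graph and runs a linear solve and an eigensolve with dense NumPy routines. It assumes L denotes the combinatorial Laplacian D - A (the slide does not define L), and the 4-vertex graph is hypothetical. At scale one would use sparse iterative solvers instead.

```python
import numpy as np

# Hypothetical 4-vertex undirected graph: triangle 0-1-2 plus a pendant vertex 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A   # combinatorial Laplacian L = D - A (an assumption)

# Linear solve L x = b. L is singular (constant vectors are in its null space),
# so b must sum to zero; lstsq then returns the minimum-norm solution.
b = np.array([1.0, 0.0, 0.0, -1.0])
x, *_ = np.linalg.lstsq(L, b, rcond=None)

# Eigensolve L x = lambda x. eigh exploits symmetry and returns ascending eigenvalues.
eigenvalues, eigenvectors = np.linalg.eigh(L)
```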

9/22 Outline
1. Introduction
2. Analytics that Rank
3. Analytics that Cluster
4. Analytics that Approximate Expensive Calculations
5. Current Research Directions

10/22 Ranking Calculations
Global Ranking?
Ordered list
Often only care about top few of the list
PageRank [R1]
Centrality measure
Random walk
Personalized PR [R2]
Supervised
Connection Subgraph [R3]
Solve for direction
Rank vertices
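
PageRank and its personalized variant can both be written as one power iteration that differs only in the teleport vector. The sketch below is a minimal dense version under stated assumptions: the damping factor 0.85 is a common default rather than a value from the talk, dangling vertices send their rank to the teleport vector, and the function name `pagerank` and the example graph are hypothetical.

```python
import numpy as np

def pagerank(A, alpha=0.85, personalization=None, tol=1e-10, max_iter=1000):
    """Power iteration on a dense adjacency matrix with A[i, j] = 1 for edge i -> j."""
    n = A.shape[0]
    out_degree = A.sum(axis=1)
    # Teleport distribution: uniform for global PageRank, seed-weighted for personalized.
    if personalization is None:
        s = np.full(n, 1.0 / n)
    else:
        s = personalization / personalization.sum()
    x = s.copy()
    for _ in range(max_iter):
        # Each vertex splits its rank evenly over its out-edges.
        share = np.divide(x, out_degree, out=np.zeros(n), where=out_degree > 0)
        # Vertices without out-edges hand their rank to the teleport vector.
        dangling = x[out_degree == 0].sum()
        x_new = alpha * (A.T @ share + dangling * s) + (1 - alpha) * s
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Hypothetical directed graph: vertices 1, 2, 3 link to 0, and 0 links to 1.
A = np.zeros((4, 4))
for i, j in [(1, 0), (2, 0), (3, 0), (0, 1)]:
    A[i, j] = 1.0

global_rank = pagerank(A)
seeded_rank = pagerank(A, personalization=np.array([0.0, 0.0, 1.0, 0.0]))
top_vertex = int(np.argmax(global_rank))
```

With the seed placed on vertex 2, vertex 3 receives no rank at all because nothing links to it and the walk never teleports there. This is the sense in which the personalized ranking is relative to the chosen seeds.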

11/22 Exotic Ranking Calculations
WalkScore [R4]: Meet a blend of complex constraints?
[Figure captions: "My brother gets", "My score", "Worse than a buoy"]

12/22 Outline
1. Introduction
2. Analytics that Rank
3. Analytics that Cluster
4. Analytics that Approximate Expensive Calculations
5. Current Research Directions

13/22 Clustering
Unsupervised?
Spectral Clustering [O1,C4]
Hard or Soft?
sign(v).*(log10(|v|+ε)+min{log10(|v|+ε)})
Agglomerative [C1]
Start with n groups
Make local grouping decisions to maximize Modularity: $\sum_{\text{comms}} \left[ (\text{Internal edges}) - (\text{Expected internal edges}) \right]$
[Figure: Recursive Bipartite SC of the original vertex set, shown ordered randomly and ordered by Fiedler vector. Legend: reason split (vector splitting, connected components); reason stopped (minimum cluster size, max clusters).]
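
One step of bipartite spectral clustering, plus the modularity score that agglomerative methods try to maximize, can be sketched as follows. This is an illustration under assumptions: it uses the combinatorial Laplacian D - A, splits on the sign of the Fiedler vector (a hard split), and scores the result with the standard normalized modularity, (1/2m) times the sum over same-community pairs of A_ij - d_i d_j / 2m. The function names and the two-triangle graph are hypothetical.

```python
import numpy as np

def fiedler_bipartition(A):
    """Split vertices by the sign of the Fiedler vector of L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)          # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]                 # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int), fiedler

def modularity(A, labels):
    """Internal edge weight minus its expectation under a degree-preserving null model."""
    d = A.sum(axis=1)
    two_m = d.sum()                      # twice the number of edges
    same = labels[:, None] == labels[None, :]
    return ((A - np.outer(d, d) / two_m) * same).sum() / two_m

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels, fiedler = fiedler_bipartition(A)
Q = modularity(A, labels)
```

Sorting vertices by their Fiedler-vector entries instead of thresholding at zero gives the one-dimensional ordering used for reordered spy plots, and applying the split recursively to each side gives a recursive bipartite scheme.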

14/22 Overlapping (Co-)Clustering
Non-Negative MF [C2]
Factors positive-valued: $L \approx F G^t$
Interpreted as probabilities
Latent Dirichlet Allocation [C3]
Model a document as a weighted group of topics
Each topic has individual vocabulary
Document is a random combination of terms from its topics
More general than term-document data
Multilevel Coarsening [C5]
LinkedIn data
[Figure, from Xu, De Sterck and Sanders [C5]: Figure 1. Feature matrix F and induced weighted bipartite graph. Red dots correspond to row variables of F, and blue dots to column variables. The rows and columns may, e.g., represent LinkedIn users and skills, with the weights indicating how often a user's skill was endorsed by the user's connections.]
[Figure, from [C5]: Figure 2. Bipartite graph hierarchy obtained by the FMCC algorithms. The input feature matrix is located at the bottom of the diagram.]
[Excerpt shown on slide, from [C5]: "... research groups, etc. These hierarchical structures are overlapping; for example, some professors may be active in multiple departments or faculties, and many skills (often even the more specialized ones) are taught in multiple degree programs. In a similar way, it is to be expected that many of the currently emerging online social networks also contain inherent overlapping hierarchical organization, in particular when they focus on a specific dimension of the human condition, like, e.g., the professional dimension. Consider for example the LinkedIn social network, where users connect to their business relations and acquaintances, and list user-defined skills and expertise on their user profiles that can be endorsed by their connections. Similar to the case of universities, it is clear that in a social network like LinkedIn there must be hierarchical overlapping groups of users with similar skills and professions, and hierarchical overlapping groups of skill keywords that characterize professional groups. However, in contrast to the example of universities, in emerging social networks this hierarchy is not hard-coded into the structure of the network; if it were, it would seriously impede the growth and dynamical evolution of these networks. Since the hierarchy is not explicitly hard-coded into the structure of the network but is nevertheless present, it is at once a very interesting and a challenging problem to try to automatically generate a representation of this hierarchy from the ..."]
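
A minimal sketch of non-negative matrix factorization on a feature matrix follows. It uses Lee and Seung style multiplicative updates for the Frobenius-norm objective, which is a swapped-in standard technique and not necessarily the algorithm of [C2] or [C5]. The function name `nmf`, the factor names W and H, and the users-by-skills matrix are all hypothetical.

```python
import numpy as np

def nmf(F, rank, n_iter=500, seed=0, eps=1e-12):
    """Approximate F by W @ H with W, H >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = F.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio, so signs never flip.
        H *= (W.T @ F) / (W.T @ W @ H + eps)
        W *= (F @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical users-by-skills matrix with two co-clusters.
F = np.array([[5.0, 5.0, 0.0, 0.0],
              [5.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 3.0],
              [0.0, 0.0, 3.0, 3.0]])

W0, H0 = nmf(F, rank=2, n_iter=0)        # the random starting factors
W, H = nmf(F, rank=2)                    # same start, after 500 updates
error_start = np.linalg.norm(F - W0 @ H0)
error_end = np.linalg.norm(F - W @ H)
```

Because the factors stay non-negative, each row of W can be read as a soft, possibly overlapping membership of a row variable in the co-clusters, and each column of H plays the same role for a column variable.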

15/22 Outline
1. Introduction
2. Analytics that Rank
3. Analytics that Cluster
4. Analytics that Approximate Expensive Calculations
5. Current Research Directions

16/22 Approximations to Expensive Calcs
Triangle Counting [O3]
Sum of the diagonal of $A^3$ is 6 x (# triangles); diagonal entry i is 2 x (# triangles through vertex i)
Estimate diagonal entries of $A^3$
$\mathrm{Trace}(A^3) = \sum_i \lambda_i(A)^3$
Mincut [O1]
Nearly-planar coloring [O2]
Maxcut [O4]
Near Bi-Partite Structure [O5]
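
The trace identity for triangle counting is easy to check numerically. The sketch below counts triangles three ways on a small hypothetical "bowtie" graph: from the trace of $A^3$, from the cubed eigenvalues, and per vertex from the diagonal. The final estimate, which keeps only the k largest-magnitude eigenvalues, illustrates the general idea of spectral approximation and is not claimed to reproduce the exact method of [O3].

```python
import numpy as np

# Hypothetical bowtie graph: triangles {0,1,2} and {2,3,4} sharing vertex 2.
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

A3 = A @ A @ A
# Each triangle contributes 6 closed walks of length 3 (3 starting vertices x 2 directions).
triangles_exact = np.trace(A3) / 6
# A closed 3-walk from vertex i traverses a triangle through i in one of 2 directions.
triangles_per_vertex = np.diag(A3) / 2

# The same count from the spectrum: trace(A^3) equals the sum of cubed eigenvalues.
eigs = np.linalg.eigvalsh(A)
triangles_spectral = (eigs ** 3).sum() / 6

# Rough estimate from the k largest-magnitude eigenvalues only (k is an arbitrary choice).
k = 2
top = eigs[np.argsort(-np.abs(eigs))[:k]]
triangles_estimate = (top ** 3).sum() / 6
```

On large graphs the point of the spectral form is that a few extreme eigenvalues can be computed with sparse matrix-vector products, without ever forming $A^3$.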

17/22 Outline
1. Introduction
2. Analytics that Rank
3. Analytics that Cluster
4. Analytics that Approximate Expensive Calculations
5. Current Research Directions

18/22 Important Extensions
Bipartite Graphs: Co-cluster!
Tensors: Time is a tensor dim. Causality?
Dynamic Graphs [T2]: Streaming; gather stats
Directed Graphs: Spectrum of $D^{-1} A$; highly-cyclic structure; general c-cyclic structure
Labeled Graphs: Factor in labels; label anomalies?
[Figure 12: Matrix sparsity structure for example 3 under the reordering given by the Fiedler vector (left) and associated ordering of row and column variables (right). The purple line in the figure on the right shows the one-dimensional search space for row and column splittings considered by the out-of-box undirected method.]
[Figure 13: Original graph from example 4 (left), randomly reordered (middle), and bipartized (right).]
[Figures: spectrum of $D^{-1} A$ in the complex plane (axes Re, Im) and spectral coordinates (axes Re, Im) for directed graphs with highly-cyclic structure. Other labels in the figures: Author, Conference.]
[Draft text shown on slide: "where M [ADD THE NECESSARY CONSTRAINTS TO M]. Then, the eigenpairs of B ( o, x) and ( o, x) can be used to define a mapping into R^2 such that the nodes are mapped to [DETERMINE PARAMETERS OF REGION] around vectors of length 1 at angles of 3, or 5 3."]
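
Why complex spectral coordinates reveal cyclic structure in a directed graph can be illustrated on a toy case. The sketch below is an assumption-laden example, not the construction from the slide: it builds a hypothetical graph with c = 3 groups where every edge runs from one group to the next, forms the random-walk matrix $D^{-1} A$, and reads group membership off the phase of the eigenvector whose eigenvalue is the primitive c-th root of unity.

```python
import numpy as np

c, size = 3, 2                                   # 3 groups of 2 vertices (hypothetical)
group = np.repeat(np.arange(c), size)            # group id of each vertex: [0 0 1 1 2 2]
# Edge i -> j exists exactly when j lies in the group that follows i's group.
A = (group[None, :] == (group[:, None] + 1) % c).astype(float)
P = A / A.sum(axis=1, keepdims=True)             # random-walk matrix D^{-1} A

vals, vecs = np.linalg.eig(P)                    # non-symmetric, so eigenvalues are complex
target = np.exp(2j * np.pi / c)                  # primitive c-th root of unity
k = int(np.argmin(np.abs(vals - target)))
v = vecs[:, k] / vecs[0, k]                      # rotate so that vertex 0 sits at angle 0

# Real and imaginary parts give 2-D spectral coordinates on the unit circle.
coords = np.column_stack([v.real, v.imag])
```

Each vertex lands on the unit circle at an angle determined by its group, so clustering the points by angle recovers the cyclic ordering of the groups. In noisy graphs the points would be expected to scatter around those directions rather than sit on them exactly.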

19/22 Summary
Linear algebra addresses a diverse set of graph analytics
Linear algebra kernels are somewhat scalable and are implemented in many computing environments
Tuning to a new type of data or analytic often requires close interaction with a mathematician or computer scientist

20/22 References I of III
HPC
[H1] Wissink et al. Large Scale Structured AMR Calculations Using the SAMRAI Framework, SC01 Proceedings, 2001. LLNL tech report UCRL-JC
[H2] Balay et al. Efficient Management of Parallelism in Object Oriented Numerical Software Libraries, Modern Software Tools in Scientific Computing, 1997
[H3] Falgout et al. Design of the hypre Preconditioner Library, Proc. of the SIAM Workshop on Object Oriented Methods for Inter-operable Scientific and Engineering Computing, 1998
[H4] Heroux et al. An Overview of Trilinos, SNL Tech Report SAND, 2003
Data
[D1] Meusel et al. Web Data Commons 2014 Hyperlink Graph
[D2] Leskovec et al. Stanford Large Network Dataset Collection

21/22 References II of III
Ranking
[R1] Page. PageRank: Bringing order to the web. Stanford Digital Libraries Working Paper, 1997
[R2] Haveliwala. Topic-sensitive PageRank, in WWW, 2002
[R3] Faloutsos et al. Fast Discovery of Connection Subgraphs, KDD 2004
[R4] Walkscore
Clustering
[C1] Blondel et al. Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment (10), P10008, 2008
[C2] Paatero et al. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, 1994
[C3] Blei et al. Latent Dirichlet Allocation, Journal of Machine Learning Research, 2003
[C4] von Luxburg. A tutorial on spectral clustering, Statistics and Computing, 2007
[C5] Xu et al. Fast Multilevel Co-Clustering: Unraveling the Multilevel Overlapping Cluster Structure of Social Network Data, submitted to Numerical Linear Algebra with Applications, May 2015

22/22 References III of III
Tensors
[T1] Kolda et al. Tensor Decompositions and Applications, SIAM Review, 2008
[T2] Dunlavy et al. Clustering network data using graphs, hypergraphs, and tensors, lecture given at University of Montreal, May 2015
Discrete Optimization
[O1] Fiedler. Algebraic connectivity of Graphs, Czechoslovak Mathematical Journal 23 (98)
[O2] Hu et al. On Maximum Differential Graph Coloring, Lecture Notes in Computer Science, 2011
[O3] Tsourakakis et al. Spectral Counting of Triangles in Power-Law Networks via Element-Wise Sparsification, Social Network Analysis and Mining, 2009
[O4] Trevisan. Max Cut and the Smallest Eigenvalue, SIAM J. Comput., 2012. Earlier version in Proc. of 41st ACM STOC, 2009
[O5] Kirkland et al. Bipartite subgraphs and the signless Laplacian matrix, Applicable Analysis and Discrete Mathematics, 2011


More information

Gephi Tutorial Quick Start

Gephi Tutorial Quick Start Gephi Tutorial Welcome to this introduction tutorial. It will guide you to the basic steps of network visualization and manipulation in Gephi. Gephi version 0.7alpha2 was used to do this tutorial. Get

More information

Application of Graph Theory to

Application of Graph Theory to Application of Graph Theory to Requirements Traceability A methodology for visualization of large requirements sets Sam Brown L-3 Communications This presentation consists of L-3 STRATIS general capabilities

More information

HIGH PERFORMANCE BIG DATA ANALYTICS

HIGH PERFORMANCE BIG DATA ANALYTICS HIGH PERFORMANCE BIG DATA ANALYTICS Kunle Olukotun Electrical Engineering and Computer Science Stanford University June 2, 2014 Explosion of Data Sources Sensors DoD is swimming in sensors and drowning

More information

Chapter ML:XI (continued)

Chapter ML:XI (continued) Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

More information

Enhancing the Ranking of a Web Page in the Ocean of Data

Enhancing the Ranking of a Web Page in the Ocean of Data Database Systems Journal vol. IV, no. 3/2013 3 Enhancing the Ranking of a Web Page in the Ocean of Data Hitesh KUMAR SHARMA University of Petroleum and Energy Studies, India hkshitesh@gmail.com In today

More information

Which universities lead and lag? Toward university rankings based on scholarly output

Which universities lead and lag? Toward university rankings based on scholarly output Which universities lead and lag? Toward university rankings based on scholarly output Daniel Ramage and Christopher D. Manning Computer Science Department Stanford University Stanford, California 94305

More information

Large-Scale Spectral Clustering on Graphs

Large-Scale Spectral Clustering on Graphs Large-Scale Spectral Clustering on Graphs Jialu Liu Chi Wang Marina Danilevsky Jiawei Han University of Illinois at Urbana-Champaign, Urbana, IL {jliu64, chiwang1, danilev1, hanj}@illinois.edu Abstract

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Graph Mining Techniques for Social Media Analysis

Graph Mining Techniques for Social Media Analysis Graph Mining Techniques for Social Media Analysis Mary McGlohon Christos Faloutsos 1 1-1 What is graph mining? Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented

More information

Introduction to Graph Mining

Introduction to Graph Mining Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain

More information

Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer

Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer 1 Content What is Community Detection? Motivation Defining a community Methods to find communities Overlapping communities

More information

NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES

NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES Silvija Vlah Kristina Soric Visnja Vojvodic Rosenzweig Department of Mathematics

More information

Scaling Up HBase, Hive, Pegasus

Scaling Up HBase, Hive, Pegasus CSE 6242 A / CS 4803 DVA Mar 7, 2013 Scaling Up HBase, Hive, Pegasus Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2016 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Proposal for Undergraduate Certificate in Large Data Analysis

Proposal for Undergraduate Certificate in Large Data Analysis Proposal for Undergraduate Certificate in Large Data Analysis To: Helena Dettmer, Associate Dean for Undergraduate Programs and Curriculum From: Suely Oliveira (Computer Science), Kate Cowles (Statistics),

More information

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com Outline Predictive modeling methodology k-nearest Neighbor

More information

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate

More information

Sketch As a Tool for Numerical Linear Algebra

Sketch As a Tool for Numerical Linear Algebra Sketching as a Tool for Numerical Linear Algebra (Graph Sparsification) David P. Woodruff presented by Sepehr Assadi o(n) Big Data Reading Group University of Pennsylvania April, 2015 Sepehr Assadi (Penn)

More information

Distributed R for Big Data

Distributed R for Big Data Distributed R for Big Data Indrajit Roy, HP Labs November 2013 Team: Shivara m Erik Kyungyon g Alvin Rob Vanish A Big Data story Once upon a time, a customer in distress had. 2+ billion rows of financial

More information

Eigencuts on Hadoop: Spectral Clustering for Image Segmentation at Scale

Eigencuts on Hadoop: Spectral Clustering for Image Segmentation at Scale Eigencuts on Hadoop: Spectral Clustering for Image Segmentation at Scale Shannon Quinn squinn@cmu.edu, spq1@pitt.edu Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology

More information

MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued).

MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued). MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors Jordan canonical form (continued) Jordan canonical form A Jordan block is a square matrix of the form λ 1 0 0 0 0 λ 1 0 0 0 0 λ 0 0 J = 0

More information

Francesco Sorrentino Department of Mechanical Engineering

Francesco Sorrentino Department of Mechanical Engineering Master stability function approaches to analyze stability of the synchronous evolution for hypernetworks and of synchronized clusters for networks with symmetries Francesco Sorrentino Department of Mechanical

More information

Small Maximal Independent Sets and Faster Exact Graph Coloring

Small Maximal Independent Sets and Faster Exact Graph Coloring Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected

More information

An Effective Way to Ensemble the Clusters

An Effective Way to Ensemble the Clusters An Effective Way to Ensemble the Clusters R.Saranya 1, Vincila.A 2, Anila Glory.H 3 P.G Student, Department of Computer Science Engineering, Parisutham Institute of Technology and Science, Thanjavur, Tamilnadu,

More information

Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging

Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging Mathematical Problems in Engineering, Article ID 578713, 6 pages http://dx.doi.org/10.1155/2014/578713 Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of

More information

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration

More information

Proximal mapping via network optimization

Proximal mapping via network optimization L. Vandenberghe EE236C (Spring 23-4) Proximal mapping via network optimization minimum cut and maximum flow problems parametric minimum cut problem application to proximal mapping Introduction this lecture:

More information

M E M O R A N D U M. Faculty Senate Approved April 2, 2015

M E M O R A N D U M. Faculty Senate Approved April 2, 2015 M E M O R A N D U M Faculty Senate Approved April 2, 2015 TO: FROM: Deans and Chairs Becky Bitter, Sr. Assistant Registrar DATE: March 26, 2015 SUBJECT: Minor Change Bulletin No. 11 The courses listed

More information

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

Jure Leskovec (@jure) Stanford University

Jure Leskovec (@jure) Stanford University Jure Leskovec (@jure) Stanford University KDD Summer School, Beijing, August 2012 8/10/2012 Jure Leskovec (@jure), KDD Summer School 2012 2 Graph: Kronecker graphs Graph Node attributes: MAG model Graph

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

More information

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS. PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Rank one SVD: un algorithm pour la visualisation d une matrice non négative

Rank one SVD: un algorithm pour la visualisation d une matrice non négative Rank one SVD: un algorithm pour la visualisation d une matrice non négative L. Labiod and M. Nadif LIPADE - Universite ParisDescartes, France ECAIS 2013 November 7, 2013 Outline Outline 1 Data visualization

More information