An Extensible Graph Toolkit for Mathematica


Eytan Bakshy, SI 708

Introduction & Motivation

The Extensible Graph Toolkit for Mathematica, or EGraph, is a framework for studying networks. The toolkit is a lightweight, extensible object framework. It offers significant advantages over the two standard ways of representing graphs in Mathematica: rule lists and Combinatorica graphs. This tutorial demonstrates how to use the kit to analyze and visualize static and evolving networks.

Graph Programming with Rule Lists

Rule lists can be used with some of Mathematica's more modern graph utilities. These include adaptive algorithms for computing all-pairs shortest paths, pseudodiameters, and PageRank, and for finding strongly and weakly connected components. Graphs stored as rule lists can easily be plotted with GraphPlot[]. The rules can contain arbitrary objects, so text and graphics objects are automatically rendered inline as nodes.

The following examples construct a graph as a rule list, find its weakly connected components, and plot the data with GraphPlot. Here, rulegraph is a short rule list that mixes three kinds of vertex: integers (as in 3 -> 4 and 3 -> 6), the string "a" (as in 5 -> "a"), and inline graphics objects.

    Needs["GraphUtilities`"]
    WeakComponents[rulegraph]
    GraphPlot[rulegraph]
    GraphPlot[rulegraph, VertexLabeling -> True]
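Mathematica 8 and later also ship a built-in Graph object that accepts the same rule-list input. The sketch below is for comparison only. It uses a newer Mathematica than the GraphUtilities functions in this tutorial, and it is not part of EGraph. It computes the weakly connected components of a small, made-up rule list.

```mathematica
(* Hypothetical rule list mixing integer and string vertices *)
rules = {3 -> 4, 5 -> "a", 3 -> 6, 7 -> 8};

(* Graph[] (Mathematica 8 and later) reads each rule as a directed edge *)
g = Graph[rules];

(* Weak components ignore edge direction; sorting only gives a stable display *)
components = Sort[Sort /@ WeaklyConnectedComponents[g]]
(* {{5, "a"}, {7, 8}, {3, 4, 6}} *)
```

Vertices of different types (integers, strings) can share a component, just as in the rule-list examples above.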

Pitfalls with Combinatorica Graph Objects and the EGraph Structure

Combinatorica is a package that covers many areas of discrete mathematics, including graph theory. It has a generic Graph object that lets users add labels and edge weights and construct graphs incrementally. It also includes algorithms for computing basic graph measures. Unfortunately, the structure is not well optimized for large graphs. In this section, we compare the performance of the Combinatorica Graph structure with that of the EGraph structure.

    Needs["Combinatorica`"]

The following experiments use Combinatorica's graph structure on a random graph. They show that the time to look up a vertex's nearest neighbors grows linearly with the number of nodes. The EGraph structure, on the other hand, uses a double-hash structure, so the same lookup takes roughly constant time as the graph grows.

    GraphTiming = Table[{n, First[Timing[Neighborhood[#, 1, 1]]] &[RandomGraph[n, 2/n]]}, {n, 100, 1000, 100}];
    EGraphTiming = Table[{n, First[Timing[AllNeighbors[#, 1]]] &[RandomEGraph[n, 2/n]]}, {n, 100, 1000, 100}];
    ListPlot[{GraphTiming, EGraphTiming}, AxesOrigin -> {0, 0},
      PlotLabel -> "Neighborhood access time vs number of nodes",
      AxesLabel -> {"num nodes", "access time"}]

[Plot: Neighborhood access time vs number of nodes; x axis: num nodes, y axis: access time]

The difference in performance is even more pronounced when one compares Combinatorica's shortest-path algorithm to the one used by EGraph.
    (* gives the time to find the shortest path length between random nodes, averaged over 50 trials *)
    AveragePathLengthTiming[graph_, numnodes_, pathfunction_] :=
      Mean[Table[
        First[Timing[pathfunction[graph, RandomInteger[{1, numnodes}], RandomInteger[{1, numnodes}]]]],
        {50}]]
    GraphTiming = Table[{n, AveragePathLengthTiming[RandomGraph[n, 2/n], n, ShortestPath]}, {n, 100, 1000}];
    EGraphTiming = Table[{n, AveragePathLengthTiming[RandomEGraph[n, 2/n], n, PathLength]}, {n, 100, 1000}];
    ListPlot[{GraphTiming, EGraphTiming}, AxesOrigin -> {0, 0},
      PlotLabel -> "Shortest path calculation time vs number of nodes",
      AxesLabel -> {"num nodes", "time"}]

[Plot: Shortest path calculation time vs number of nodes; x axis: num nodes, y axis: time]
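In the supporting code, PathLength delegates to GraphUtilities' GraphDistance. On an unweighted graph, a shortest-path length can be computed with breadth-first search. The search expands one layer of vertices at a time and stops as soon as the target's layer is reached. The sketch below applies that technique to a SparseArray adjacency matrix of the kind EGraph stores. bfsPathLength is a hypothetical name, and this is not a claim about how GraphDistance is implemented.

```mathematica
(* bfsPathLength: hypothetical helper, not part of EGraph.
   m is an n x n adjacency matrix (SparseArray); m[[u, v]] != 0 means an edge u -> v.
   Returns the number of edges on a shortest path from s to t, or Infinity. *)
bfsPathLength[m_SparseArray, s_Integer, t_Integer] :=
  Module[{dist, frontier, next, d = 0},
    dist = ConstantArray[Infinity, Length[m]];
    dist[[s]] = 0;
    frontier = {s};
    (* expand one BFS layer at a time until t is reached or nothing is left *)
    While[frontier =!= {} && dist[[t]] === Infinity,
      d++;
      (* out-neighbors of the frontier: column indices of stored entries in those rows *)
      next = Union[Flatten[m[[#]]["NonzeroPositions"] & /@ frontier]];
      (* keep only vertices not seen before *)
      next = Select[next, dist[[#]] === Infinity &];
      Scan[(dist[[#]] = d) &, next];
      frontier = next];
    dist[[t]]]

(* Usage: the undirected path 1 - 2 - 3 - 4, plus an isolated vertex 5 *)
path = SparseArray[{{1, 2} -> 1, {2, 1} -> 1, {2, 3} -> 1, {3, 2} -> 1,
    {3, 4} -> 1, {4, 3} -> 1}, {5, 5}];
bfsPathLength[path, 1, 4]  (* 3 *)
bfsPathLength[path, 1, 5]  (* Infinity *)
```

Each row lookup only touches that row's stored entries, which is why a sparse representation suits this kind of traversal.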

Getting Started with the EGraph Object

The EGraph structure consists of a double-hash sparse-matrix representation of the graph, together with forward and backward indexes of vertex labels. These labels may be of any data type. In most circumstances they will be integers or strings, but the toolkit supports labels of any kind. Finally, the EGraph object contains an attribute store, which can hold arbitrary data about the graph, including individual node properties.

An EGraph object can be instantiated from three kinds of input: a list of vertex-label pairs, a list of rules, or a sparse adjacency matrix.

    ToEGraph[{{0, 1}, {1, 2}, {2, 3}, {0, 2}}]
    ToEGraph[{0 -> 1, 1 -> 2, 2 -> 3, 0 -> 2}]
    ToEGraph[SparseArray[{i_, j_} :> If[RandomReal[1.0] < 0.1, 1, 0], {4, 4}]]

    EGraph of 4 nodes
    EGraph of 4 nodes
    EGraph of 4 nodes

EGraph structures can handle non-numerical data as well. Consider the network of trade partners among UN member countries:

    (* drop countries with no partner data *)
    importlinks = Select[Flatten[Thread[Rule[CountryData[#, "ImportPartners"], #]] & /@ CountryData["UN"]], FreeQ[#, _Missing] &];
    exportlinks = Select[Flatten[Thread[Rule[#, CountryData[#, "ExportPartners"]]] & /@ CountryData["UN"]], FreeQ[#, _Missing] &];
    tradedata = Join[importlinks, exportlinks];
    Short[tradedata]
    tradegraph = ToEGraph[tradedata]

    {Pakistan -> Afghanistan, UnitedStates -> Afghanistan, <<2048>>, Zimbabwe -> Italy, Zimbabwe -> Germany}
    EGraph of 98 nodes

The graph structure itself is smaller than the base data, because each vertex label is stored only once, in the forward and backward label indexes. For larger networks, such as the Yahoo! QA network, this difference can be quite significant.

    ByteCount[tradedata]
    ByteCount[tradegraph]

Labels can easily be extracted from the graph and used to label the output of network metrics:

    Short[VertexLabels[tradegraph]]
    Short[PageRanks[tradegraph]]

    {Afghanistan, Albania, Algeria, AmericanSamoa, Andorra, <<88>>, Venezuela, Vietnam, Yemen, Zambia, Zimbabwe}

The EGraph structure can easily be converted into other commonly used representations, such as matrices and ordered pairs:

    MatrixPlot[GraphMatrix[tradegraph]]
    Short[EGraphToPairs[tradegraph]]

    {{Afghanistan, UnitedStates}, {Afghanistan, Pakistan}, <<908>>, {Zimbabwe, Italy}, {Zimbabwe, Germany}}

The PageRank output can also be sorted to visualize the ranking of countries by prestige (mouse over a point to see the country):

    ListPlot[Sort[PageRanks[tradegraph] /. Rule[country_, pr_] :> Tooltip[pr, country]],
      AxesLabel -> {"Ordinal Rank", "PageRank"}]

[Plot: PageRank vs Ordinal Rank]

Analysis of Dynamic Network Data

We now demonstrate several ways to analyze the dynamics of network models.

The Diameter of an Evolving Graph

The first example shows how one can run experiments with NetLogo and the EGraph structure to study properties of evolving networks. In this example, we use the modified preferential attachment model from class.

    Needs["NetLogo`"]
    NLStart["/Applications/NetLogo 4.0"]
    NLLoadModel["/Users/ebakshy/netlearn/RandAndPrefNetEB.nlogo"]

To perform the experiment, we grow the graph to 5000 nodes and sample its structure every 100 model ticks. Because the behavior of the model may differ from run to run, we perform 20 realizations of the model. The resulting network is stored in BAG[step, realization] (warning: the simulation takes a considerable amount of time to run).

    Do[
      NLCommand["set m 2", "setup"];
      Do[
        NLCommand["repeat 100 [ go ]"];
        BAG[i, j] = ToEGraph[NLGetGraph[], Directed -> False],
        {i, 50}],
      {j, 20}]

Here is how one would plot a simple statistic, such as the degree distribution:

    ListLogLogPlot[Cumdist[OutDegrees[BAG[50, 1]]],
      AxesLabel -> {"Degree", "Count"}, PlotLabel -> "Degree Distribution"]

[Plot: Degree Distribution; x axis: Degree, y axis: Count]

To track the diameter of the graph over time, we average the pseudodiameter over the 20 realizations at each time step. We then plot that average against the number of nodes:

    AvgDiameter[i_] := Mean[Table[First[First[PseudoDiameter[BAG[i, j]]]], {j, 20}]]
    ListLinePlot[Table[{NumNodes[BAG[i, 1]], AvgDiameter[i]}, {i, 50}],
      AxesLabel -> {"Num Nodes", "Pseudo Diameter"}]

[Plot: Pseudo Diameter vs Num Nodes]

The Spread of Opinions in a Network

In this example, we generate a network and observe the diffusion of opinions on it. The structure of the network is then used to generate a two-dimensional representation of agent influence over time.

    NLLoadModel["/Users/ebakshy/netlearn/PrefDiffusionEBOpinion.nlogo"]
    NLCommand["set num_nodes 300", "set gamma 0.3", "set p 0.4", "set sorting? true", "generate_topology"]
    NLCommand["setup"]
    opgraph = ToEGraph[NLGetGraph[], Directed -> False]
    (* this part is ugly, but will be more straightforward in future versions *)
    opseries = Apply[Rule, NLDoReport["repeat 2 [ spread ]", "[list who pro?] of turtles", 200], {2}];
    agentstates = Map[N[VertexLabels[opgraph]] /. # &, opseries];
    (* store one row per agent, one column per time step *)
    SetAttribute[opgraph, "opinions", Transpose[agentstates]];
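NLDoReport returns one list of agent -> opinion rules per time step. An ArrayPlot with one row per agent instead needs an agents-by-time matrix. The sketch below shows that reshaping and assumes every agent appears in every step. opinionMatrix is a hypothetical helper, and the toy series is made up for illustration.

```mathematica
(* opinionMatrix: hypothetical helper. series has one element per time step,
   each a list of rules agent -> opinion; labels fixes the row order.
   Result: one row per agent, one column per time step. *)
opinionMatrix[series_List, labels_List] :=
  Transpose[Map[labels /. Dispatch[#] &, series]]

(* Usage on a made-up series of three agents over two steps *)
series = {{0 -> True, 1 -> False, 2 -> False},
          {0 -> True, 1 -> True, 2 -> False}};
m = opinionMatrix[series, {0, 1, 2}]
(* {{True, True}, {False, True}, {False, False}} *)

(* same colour scheme as OpinionPlot *)
ArrayPlot[m /. {True -> Red, False -> Lighter[Pink, 0.5]}]
```

Passing a different labels list reorders the rows. The same idea is used later with a clustering-based ordering.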

One way of tracking people's opinions over time is to represent each agent as a row in an ArrayPlot, with each column representing the agent's opinion at time t.

    OpinionPlot[opinions_] := ArrayPlot[opinions /. {True -> Red, False -> Lighter[Pink, 0.5]}, AspectRatio -> 1]

Here are the opinions of all agents over time, with the agents listed in arbitrary order. In this case, the vertical ordering of agents in no way reflects the structure of interaction among them.

    OpinionPlot[GetAttribute[opgraph, "opinions"]]

Here is what the network looks like, as a ball-and-stick plot and as a matrix plot:

    GraphPlot[GraphMatrix[opgraph]]
    MatrixPlot[GraphMatrix[opgraph]]

We can compute the shortest-path distances between all pairs of nodes and use hierarchical clustering to reveal community structure:

    Needs["HierarchicalClustering`"]

    gdm = GraphDistanceMatrix[GraphMatrix[opgraph]];
    gdordering = ClusterFlatten[DirectAgglomerate[gdm]];
    GraphicsRow[{
      MatrixPlot[gdm, PlotLabel -> "Vertex Distance Matrix (unordered)"],
      MatrixPlot[gdm[[gdordering, gdordering]], PlotLabel -> "Vertex Distance Matrix (clustered)"]}]

[Figure: two matrix plots, "Vertex Distance Matrix (unordered)" and "Vertex Distance Matrix (clustered)"]

This new ordering can be used to visualize agent states over time so that agents that are close together in the plot are also close together in the network:

    OpinionPlot[GetAttribute[opgraph, "opinions"][[gdordering]]]

Network Visualization

Fisher-Smith Egonet Plots

The kit has built-in support for "Fisher" plots, which we have used in our work on Yahoo! Answers. Here is an example using the trade network from earlier:

    FisherSample[tradegraph, 20]

Using Attribute Stores for Enhancing Network Visualizations

This part is somewhat experimental, but it is a good demonstration of how the attribute object can be used to enhance network visualizations.

    importlinks = Select[Flatten[Thread[Rule[CountryData[#, "ImportPartners"], #]] & /@ CountryData["MiddleEast"]], FreeQ[#, _Missing] &];
    exportlinks = Select[Flatten[Thread[Rule[#, CountryData[#, "ExportPartners"]]] & /@ CountryData["MiddleEast"]], FreeQ[#, _Missing] &];
    mideastgraph = ToEGraph[Join[importlinks, exportlinks]]

We assign each node a shape attribute:

    SetAttribute[mideastgraph, "shape", Tooltip[CountryData[#, "Shape"], #] & /@ VertexLabels[mideastgraph]];

    GraphPlot[GraphMatrix[mideastgraph],
      VertexRenderingFunction -> (Inset[GetAttribute[mideastgraph, "shape"][[#2]], #1, Automatic, {0.5, 0.5}] &)]
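Each EGraph carries its own attribute symbol, created with Unique[attributeobject]. SetAttribute and GetAttribute store and read data as values of that symbol, keyed by the attribute name, so one graph's attributes cannot collide with another's. The sketch below shows the same idea in isolation. newStore, setAttr, and getAttr are hypothetical names, and a Module local stands in for Unique.

```mathematica
(* newStore: hypothetical constructor. Module returns a fresh local symbol on
   every call, in the spirit of Unique[attributeobject] inside each EGraph. *)
newStore[] := Module[{store}, store]

(* setAttr / getAttr: hypothetical accessors that store data as DownValues
   of the graph's own symbol, keyed by attribute name *)
setAttr[store_Symbol, attr_String, data_] := (store[attr] = data)
getAttr[store_Symbol, attr_String] := store[attr]

(* Usage: two graphs, two independent stores *)
s1 = newStore[];
s2 = newStore[];
setAttr[s1, "shape", {"circle", "square"}];
setAttr[s2, "shape", {"triangle"}];
getAttr[s1, "shape"]  (* {"circle", "square"} *)
getAttr[s2, "shape"]  (* {"triangle"} *)
```

Because attribute data lives outside the sparse matrix, per-node extras such as shapes do not affect the matrix that the graph algorithms use.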

Supporting Code

    Needs["GraphUtilities`"];

    Format[EGraph[sa_SparseArray, labels_, labelrules_, attributes_]] :=
      "EGraph of " <> ToString[Length[labels]] <> " nodes";

    GraphMatrix[g_EGraph] := g[[1]]
    VertexLabels[g_EGraph] := g[[2]]
    VertexLabelToIndex[g_EGraph] := g[[3]]
    AttributeObject[g_EGraph] := g[[4]]

    (* PairIntersect[pairs, baselist] is a fast algorithm for reducing a list of ordered pairs to a base set:
       it returns the pairs {a, b} in pairs with both a and b in baselist *)
    PairIntersect[pairs_, baselist_] := Module[{membershiprules},
      membershiprules = Dispatch[Thread[Rule[baselist, True]]];
      DeleteCases[Pick[pairs, pairs /. membershiprules, {True, True}], {}]]

    Clear[ToEGraph]
    Options[ToEGraph] = {VertexLabeling -> True, Directed -> True};

    (* from a list of rules *)
    ToEGraph[g : {Rule[_, _] ..}, opts : OptionsPattern[]] := ToEGraph[List @@@ g, opts]

    (* from a list of ordered pairs of vertex labels *)
    ToEGraph[g : {{_, _} ..}, opts : OptionsPattern[]] :=
      Module[{allvertexlabels, labeltoindex, gtally, storedvertexlabels, storedlabeltoindex, arrayrules, graphmatrix},
        allvertexlabels = Union[Flatten[g]];
        labeltoindex = Dispatch[MapIndexed[Rule[#1, First[#2]] &, allvertexlabels]];
        (* count repeated edges *)
        gtally = Tally[g /. labeltoindex];
        storedvertexlabels = If[OptionValue[VertexLabeling] === True, allvertexlabels, Range[Length[allvertexlabels]]];
        storedlabeltoindex = If[OptionValue[VertexLabeling] === True, labeltoindex, {i_ :> i}];
        arrayrules = Rule @@@ gtally;
        If[OptionValue[Directed] === False,
          arrayrules = Join[arrayrules, arrayrules /. Rule[{i_, j_}, c_] :> Rule[{j, i}, c]]];
        graphmatrix = SparseArray[arrayrules, {Length[allvertexlabels], Length[allvertexlabels]}, 0];
        EGraph[graphmatrix, storedvertexlabels, storedlabeltoindex, Unique[attributeobject]]]

    (* from a sparse adjacency matrix; vertices are labeled 1..n *)
    ToEGraph[m_SparseArray, opts : OptionsPattern[]] :=
      Module[{allvertexlabels, labeltoindex, graphmatrix},
        allvertexlabels = Range[Length[m]];
        labeltoindex = {i_ :> i};
        graphmatrix = If[OptionValue[Directed] === False,
          SparseArray[Unitize[m + Transpose[m]]],  (* make symmetric *)
          m];
        EGraph[graphmatrix, allvertexlabels, labeltoindex, Unique[attributeobject]]]

    EGraphToPairs[G_EGraph] := Map[VertexLabels[G][[#]] &, First /@ Most[ArrayRules[GraphMatrix[G]]]]

    Clear[RandomEGraph]
    RandomEGraph[n_, p_] := ToEGraph[SparseArray[{i_, j_} /; i != j :> If[RandomReal[1.0] < p, 1, 0], {n, n}]]

    (* sets an arbitrary attribute *)
    SetAttribute[G_EGraph, attr_String, data_] := AttributeObject[G][attr] = data;

    (* sets node attributes from a list of {vertex label, data} pairs *)
    SetAttribute[G_EGraph, attr_String, data : {{_, _} ..}, opts : OptionsPattern[]] :=
      Module[{datadispatch},
        datadispatch = Dispatch[Rule @@@ data];
        AttributeObject[G][attr] = Table[vertex /. datadispatch, {vertex, VertexLabels[G]}]]

    GetAttribute[G_EGraph, attr_String] := AttributeObject[G][attr]

    NumNodes[G_EGraph] := Length[GraphMatrix[G]]
    InDegrees[G_EGraph] := Normal[Total[GraphMatrix[G]]]
    OutDegrees[G_EGraph] := Normal[Total[GraphMatrix[G], {2}]]

    (* neighbors by vertex index *)
    OutNeighbors[G_EGraph, i_Integer] :=
      Cases[ArrayRules[GraphMatrix[G][[i]]], Rule[{node_Integer}, val_Integer] /; val > 0 :> node]
    InNeighbors[G_EGraph, i_Integer] :=
      Cases[ArrayRules[GraphMatrix[G][[All, i]]], Rule[{node_Integer}, val_Integer] /; val > 0 :> node]

    (* neighbors by vertex label *)
    OutNeighbors[G_EGraph, i_] := VertexLabels[G][[OutNeighbors[G, i /. VertexLabelToIndex[G]]]]
    InNeighbors[G_EGraph, i_] := VertexLabels[G][[InNeighbors[G, i /. VertexLabelToIndex[G]]]]
    AllNeighbors[G_EGraph, i_] := Join[InNeighbors[G, i], OutNeighbors[G, i]]
    AllNeighbors2[G_EGraph, i_] := Union[Flatten[AllNeighbors[G, #] & /@ AllNeighbors[G, i]]]

    (* ordered pairs of node i and its in- or out-neighbors *)
    InNeighborPairs[G_EGraph, i_] := Thread[{InNeighbors[G, i], i}]
    OutNeighborPairs[G_EGraph, i_] := Thread[{i, OutNeighbors[G, i]}]
    AllNeighborPairs[G_EGraph, i_] := Join[OutNeighborPairs[G, i], InNeighborPairs[G, i]]

    (* all links to first and second nearest neighbors *)
    AllNeighbors2Pairs[G_EGraph, i_] := Module[{myneighbors},
      myneighbors = AllNeighbors[G, i];
      Union[Join[AllNeighborPairs[G, i], Flatten[Map[AllNeighborPairs[G, #] &, myneighbors], 1]]]]

    (* the egonet is all nearest neighbors of i and the links among them *)
    EgoNetPairs[G_EGraph, i_] :=
      Join[AllNeighborPairs[G, i], PairIntersect[AllNeighbors2Pairs[G, i], AllNeighbors[G, i]]]

    AllNeighbors2Graph[G_EGraph, i_] :=
      ToEGraph[AllNeighbors2Pairs[G, i] /. {a_Integer, b_Integer} :> VertexLabels[G][[{a, b}]]]
    EgoNetGraph[G_EGraph, i_] :=
      ToEGraph[EgoNetPairs[G, i] /. {a_Integer, b_Integer} :> VertexLabels[G][[{a, b}]]]

    Unprotect[StrongComponents, WeakComponents, PseudoDiameter, PageRankVector,
      CommunityStructureAssignment, CommunityStructurePartition, CommunityModularity];

    StrongComponents[G_EGraph] := StrongComponents[GraphMatrix[G]]
    WeakComponents[G_EGraph] := WeakComponents[GraphMatrix[G]]
    PageRankVector[G_EGraph, opts___] := PageRankVector[GraphMatrix[G], opts]
    PageRanks[G_EGraph, opts___] := Thread[VertexLabels[G] -> PageRankVector[G, opts]]
    PseudoDiameter[G_EGraph, opts___] := PseudoDiameter[GraphMatrix[G], opts]
    (* the community functions take a rule list so that results use vertex labels *)
    CommunityStructureAssignment[G_EGraph, opts___] := CommunityStructureAssignment[Rule @@@ EGraphToPairs[G], opts]
    CommunityStructurePartition[G_EGraph, opts___] := CommunityStructurePartition[Rule @@@ EGraphToPairs[G], opts]
    CommunityModularity[G_EGraph, part_List, opts___] := CommunityModularity[Rule @@@ EGraphToPairs[G], part, opts]
    PathLengthMatrix[G_EGraph] := GraphDistanceMatrix[GraphMatrix[G]]
    PathLength[G_EGraph, i_Integer, j_Integer] := GraphDistance[GraphMatrix[G], i, j]

    PathLength[G_EGraph, i_, j_] :=
      GraphDistance[GraphMatrix[G], i /. VertexLabelToIndex[G], j /. VertexLabelToIndex[G]]
    PathLength[G_EGraph, {i_, j_}] := PathLength[G, i, j]

    Protect[StrongComponents, WeakComponents, PseudoDiameter, PageRankVector,
      CommunityStructureAssignment, CommunityStructurePartition, CommunityModularity];

    (* cumulative binning for log plots *)
    Cumdist[l_] := Module[{valuetally},
      (* sorted in descending order by value *)
      valuetally = Reverse[SortBy[Tally[l], First]];
      (* pair each value with the number of values greater than or equal to it *)
      Transpose[{First /@ valuetally, Accumulate[Last /@ valuetally]}]]

    Clear[FisherSample]
    (* relabel every vertex with a random number, so that separate egonets do not share vertices *)
    RandomizePairs[pairs_] := Module[{randlist},
      randlist = Map[Rule[#, RandomReal[]] &, Union[Flatten[pairs]]];
      pairs /. randlist]

    FisherSample[G_EGraph, n_, opts___] := Module[{thegraph},
      thegraph = Flatten[Table[
        Rule @@@ RandomizePairs[EgoNetPairs[G, RandomInteger[{1, Length[VertexLabels[G]]}]]],
        {n}]];
      (* user options come first so they take precedence over the defaults *)
      GraphPlot[thegraph, opts, DirectedEdges -> True,
        EdgeRenderingFunction -> ({Black, Arrowheads[0.01], Arrow[#1, 0.01]} &),
        VertexLabeling -> False]]
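Cumdist turns a list of values, such as vertex degrees, into pairs {v, number of entries >= v}. The degree-distribution plot shows these pairs on log-log axes. The worked example below uses a made-up degree list and ccdf, a hypothetical standalone copy of the same steps.

```mathematica
(* ccdf: hypothetical standalone copy of the steps in Cumdist *)
ccdf[l_List] := Module[{valuetally},
  (* distinct values with their counts, largest value first *)
  valuetally = Reverse[SortBy[Tally[l], First]];
  (* running total of counts = number of entries >= each value *)
  Transpose[{valuetally[[All, 1]], Accumulate[valuetally[[All, 2]]]}]]

degrees = {1, 1, 2, 3, 3, 3};  (* made-up degree list *)
ccdf[degrees]
(* {{3, 3}, {2, 4}, {1, 6}} *)
```

The count in the last pair always equals the length of the input, since every entry is at least the smallest value.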


High-dimensional labeled data analysis with Gabriel graphs High-dimensional labeled data analysis with Gabriel graphs Michaël Aupetit CEA - DAM Département Analyse Surveillance Environnement BP 12-91680 - Bruyères-Le-Châtel, France Abstract. We propose the use

More information

Analysis of System Performance IN2072 Chapter M Matlab Tutorial

Analysis of System Performance IN2072 Chapter M Matlab Tutorial Chair for Network Architectures and Services Prof. Carle Department of Computer Science TU München Analysis of System Performance IN2072 Chapter M Matlab Tutorial Dr. Alexander Klein Prof. Dr.-Ing. Georg

More information

UCINET Visualization and Quantitative Analysis Tutorial

UCINET Visualization and Quantitative Analysis Tutorial UCINET Visualization and Quantitative Analysis Tutorial Session 1 Network Visualization Session 2 Quantitative Techniques Page 2 An Overview of UCINET (6.437) Page 3 Transferring Data from Excel (From

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015 W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction

More information

Tools and Techniques for Social Network Analysis

Tools and Techniques for Social Network Analysis Tools and Techniques for Social Network Analysis Pajek Program for Analysis and Visualization of Large Networks Pajek: What is it Pajek is a program, for Windows and Linux (via Wine) Developers: Vladimir

More information

Introduction to Networks and Business Intelligence

Introduction to Networks and Business Intelligence Introduction to Networks and Business Intelligence Prof. Dr. Daning Hu Department of Informatics University of Zurich Sep 17th, 2015 Outline Network Science A Random History Network Analysis Network Topological

More information

Oracle Database 10g: Building GIS Applications Using the Oracle Spatial Network Data Model. An Oracle Technical White Paper May 2005

Oracle Database 10g: Building GIS Applications Using the Oracle Spatial Network Data Model. An Oracle Technical White Paper May 2005 Oracle Database 10g: Building GIS Applications Using the Oracle Spatial Network Data Model An Oracle Technical White Paper May 2005 Building GIS Applications Using the Oracle Spatial Network Data Model

More information

Gephi Tutorial Visualization

Gephi Tutorial Visualization Gephi Tutorial Welcome to this Gephi tutorial. It will guide you to the basic and advanced visualization settings in Gephi. The selection and interaction with tools will also be introduced. Follow the

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P)

Distributed Computing over Communication Networks: Topology. (with an excursion to P2P) Distributed Computing over Communication Networks: Topology (with an excursion to P2P) Some administrative comments... There will be a Skript for this part of the lecture. (Same as slides, except for today...

More information

Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team

Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team Software tools for Complex Networks Analysis Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team MOTIVATION Why do we need tools? Source : nature.com Visualization Properties extraction

More information

RANGER S.A.S 3D (Survey Analysis Software)

RANGER S.A.S 3D (Survey Analysis Software) RANGER S.A.S 3D (Survey Analysis Software) QUICK START USER MANUAL INTRODUCTION This document is designed to provide a step by step guide showing how easy it is to import and manipulate raw survey data

More information

InfiniteInsight 6.5 sp4

InfiniteInsight 6.5 sp4 End User Documentation Document Version: 1.0 2013-11-19 CUSTOMER InfiniteInsight 6.5 sp4 Toolkit User Guide Table of Contents Table of Contents About this Document 3 Common Steps 4 Selecting a Data Set...

More information

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION N PROBLEM DEFINITION Opportunity New Booking - Time of Arrival Shortest Route (Distance/Time) Taxi-Passenger Demand Distribution Value Accurate

More information

Algorithmic Techniques for Big Data Analysis. Barna Saha AT&T Lab-Research

Algorithmic Techniques for Big Data Analysis. Barna Saha AT&T Lab-Research Algorithmic Techniques for Big Data Analysis Barna Saha AT&T Lab-Research Challenges of Big Data VOLUME Large amount of data VELOCITY Needs to be analyzed quickly VARIETY Different types of structured

More information

Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like?

Network/Graph Theory. What is a Network? What is network theory? Graph-based representations. Friendship Network. What makes a problem graph-like? What is a Network? Network/Graph Theory Network = graph Informally a graph is a set of nodes joined by a set of lines or arrows. 1 1 2 3 2 3 4 5 6 4 5 6 Graph-based representations Representing a problem

More information

Load balancing Static Load Balancing

Load balancing Static Load Balancing Chapter 7 Load Balancing and Termination Detection Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection

More information

Traffic Prediction in Wireless Mesh Networks Using Process Mining Algorithms

Traffic Prediction in Wireless Mesh Networks Using Process Mining Algorithms Traffic Prediction in Wireless Mesh Networks Using Process Mining Algorithms Kirill Krinkin Open Source and Linux lab Saint Petersburg, Russia [email protected] Eugene Kalishenko Saint Petersburg

More information

UCINET Quick Start Guide

UCINET Quick Start Guide UCINET Quick Start Guide This guide provides a quick introduction to UCINET. It assumes that the software has been installed with the data in the folder C:\Program Files\Analytic Technologies\Ucinet 6\DataFiles

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Load Balancing and Termination Detection

Load Balancing and Termination Detection Chapter 7 Load Balancing and Termination Detection 1 Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection

More information

Social Network Mining

Social Network Mining Social Network Mining Data Mining November 11, 2013 Frank Takes ([email protected]) LIACS, Universiteit Leiden Overview Social Network Analysis Graph Mining Online Social Networks Friendship Graph Semantics

More information

The PageRank Citation Ranking: Bring Order to the Web

The PageRank Citation Ranking: Bring Order to the Web The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized

More information

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Kousha Etessami U. of Edinburgh, UK Kousha Etessami (U. of Edinburgh, UK) Discrete Mathematics (Chapter 6) 1 / 13 Overview Graphs and Graph

More information

Graph Mining Techniques for Social Media Analysis

Graph Mining Techniques for Social Media Analysis Graph Mining Techniques for Social Media Analysis Mary McGlohon Christos Faloutsos 1 1-1 What is graph mining? Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented

More information

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003 Graph models for the Web and the Internet Elias Koutsoupias University of Athens and UCLA Crete, July 2003 Outline of the lecture Small world phenomenon The shape of the Web graph Searching and navigation

More information

How To Understand The Network Of A Network

How To Understand The Network Of A Network Roles in Networks Roles in Networks Motivation for work: Let topology define network roles. Work by Kleinberg on directed graphs, used topology to define two types of roles: authorities and hubs. (Each

More information

Network-Based Tools for the Visualization and Analysis of Domain Models

Network-Based Tools for the Visualization and Analysis of Domain Models Network-Based Tools for the Visualization and Analysis of Domain Models Paper presented as the annual meeting of the American Educational Research Association, Philadelphia, PA Hua Wei April 2014 Visualizing

More information

Assignment 2: More MapReduce with Hadoop

Assignment 2: More MapReduce with Hadoop Assignment 2: More MapReduce with Hadoop Jean-Pierre Lozi February 5, 2015 Provided files following URL: An archive that contains all files you will need for this assignment can be found at the http://sfu.ca/~jlozi/cmpt732/assignment2.tar.gz

More information

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C Tutorial#1 Q 1:- Explain the terms data, elementary item, entity, primary key, domain, attribute and information? Also give examples in support of your answer? Q 2:- What is a Data Type? Differentiate

More information

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Report: Declarative Machine Learning on MapReduce (SystemML)

Report: Declarative Machine Learning on MapReduce (SystemML) Report: Declarative Machine Learning on MapReduce (SystemML) Jessica Falk ETH-ID 11-947-512 May 28, 2014 1 Introduction SystemML is a system used to execute machine learning (ML) algorithms in HaDoop,

More information

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit Data Structures Page 1 of 24 A.1. Arrays (Vectors) n-element vector start address + ielementsize 0 +1 +2 +3 +4... +n-1 start address continuous memory block static, if size is known at compile time dynamic,

More information

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th Standard 3: Data Analysis, Statistics, and Probability 6 th Prepared Graduates: 1. Solve problems and make decisions that depend on un

More information

Bernice E. Rogowitz and Holly E. Rushmeier IBM TJ Watson Research Center, P.O. Box 704, Yorktown Heights, NY USA

Bernice E. Rogowitz and Holly E. Rushmeier IBM TJ Watson Research Center, P.O. Box 704, Yorktown Heights, NY USA Are Image Quality Metrics Adequate to Evaluate the Quality of Geometric Objects? Bernice E. Rogowitz and Holly E. Rushmeier IBM TJ Watson Research Center, P.O. Box 704, Yorktown Heights, NY USA ABSTRACT

More information

Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009

Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009 Exponential Random Graph Models for Social Network Analysis Danny Wyatt 590AI March 6, 2009 Traditional Social Network Analysis Covered by Eytan Traditional SNA uses descriptive statistics Path lengths

More information

HIGH PERFORMANCE BIG DATA ANALYTICS

HIGH PERFORMANCE BIG DATA ANALYTICS HIGH PERFORMANCE BIG DATA ANALYTICS Kunle Olukotun Electrical Engineering and Computer Science Stanford University June 2, 2014 Explosion of Data Sources Sensors DoD is swimming in sensors and drowning

More information

Data Visualization. Scientific Principles, Design Choices and Implementation in LabKey. Cory Nathe Software Engineer, LabKey cnathe@labkey.

Data Visualization. Scientific Principles, Design Choices and Implementation in LabKey. Cory Nathe Software Engineer, LabKey cnathe@labkey. Data Visualization Scientific Principles, Design Choices and Implementation in LabKey Catherine Richards, PhD, MPH Staff Scientist, HICOR [email protected] Cory Nathe Software Engineer, LabKey [email protected]

More information

Generating Labels from Clicks

Generating Labels from Clicks Generating Labels from Clicks R. Agrawal A. Halverson K. Kenthapadi N. Mishra P. Tsaparas Search Labs, Microsoft Research {rakesha,alanhal,krisken,ninam,panats}@microsoft.com ABSTRACT The ranking function

More information

1 o Semestre 2007/2008

1 o Semestre 2007/2008 Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction

More information

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information