# Entropy based Graph Clustering: Application to Biological and Social Networks

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University

2 Complex Systems Definition Dynamically evolving networks that have a large number of objects with complex connectivity Examples Social networks Biological i l networks Telecommunication networks WWW

3 Clustering of Complex Systems Purpose Issues Predicting groups of objects that closely communicate with each other Groups having the same subjects (called communities) Groups having the same functions (called modules) Large scale Scalability / Efficiency Complex connectivity Robustness / Accuracy Approaches Density based methods Searching densely connected sub graphs Partition based methods Detecting the best partition Hierarchical methods Merging sub graphs iteratively, or dividing sub graphs recursively

4 New Definition: Graph Entropy Assumptions Entropy in a graph represents structural stability Isolated complete sub graphs as clusters have the most stable status (i.e., lowest entropy) Random reconnections decrease stability (i.e., increase entropy) General Notations Inner links of v given G (V,E ): edges from v to the vertices in V p i (v): probability of v having inner links Outer links of v given G (V,E ): edges from v to the vertices not in V p o (v): probability of v having outer links Definitions Vertex Entropy: Graph Entropy: e(v) = p i (v) log 2 p i (v) p o (v) log 2 p o (v) e(g(v,e)) = v V e(v)

5 Examples of Graph Entropy Examples e(a) = e(b) = e(c) = 0 e(d) = 0.81 e(f) = 1.00 e(g) = e(h) = e(k) = 0 e(g) = 1.81 e(a) = e(b) = e(c) = e(d) = 0 e(f) = 1.00 e(g) = 0.92 e(h) = e(k) = 0 e(g) = 1.92 {a,b,c,d} giveshigher structural stability to G than {a,b,c,d,f}

6 Entropy Based Graph Clustering Algorithm (1) Select a random seed vertex, and form an initial cluster with the seed and its neighbors (2) Remove a vertex on the inner boundary of the cluster if it decreases graph entropy (3) Repeat step (2) until graph entropy is minimal (4) Add a vertex on the outer boundary of the cluster if it decreases graph entropy (5) Repeat step (4) until graph entropy is minimal (6) Output the cluster (7) Repeat steps (1) to (6) until there is no more vertex left Features Local optimization No parameters Randomness Low density clusters Overlapping clusters

7 Application to Biological Networks Dataset Yeast genome wide protein protein interaction data from DIP 4,928 proteins (vertices) and 17,186 interactions (edges) Clustering Accuracy Evaluation Method Comparison with known protein complex data f measure score: mean of recall and precision, Results

8 Revisions of Entropy Based Clustering Priority on Seed Selection in Step (1) Selecting a seed vertex in the descending order of degree Selecting a seed vertex in the descending order of clustering coefficient Priority on Seed Growth in Step (4) Adding a vertex on the outer boundary in the ascending order of vertex entropy Increase of Cluster Overlapping Rates Allowing random selection of a seed vertex from previous clusters Post Process Filtering out the clusters with graph entropy higher than a threshold

9 Accuracy Improvement Results before post processprocess after post processprocess # clusters avg. size avg. f score # clusters avg. size avg. f score original algorithm degree based seed selection coef. based seed selection degree based seed selection AND entropy based seed growth coef. based seed selection AND entropy based seed growth Comparison with other competing methods method # clusters avg. size avg. f score Entropy Based Clustering MCL CNM

10 Application to Social / Telecommunication Networks Dataset AS (Autonomous System) link network YouTube video link network MySpace social network dataset # vertices # edges density (%) AS links 45, , YouTube 321, , MySpace 100,000 6,854, Clustering Accuracy Evaluation Method p value of cumulative hypergeometric distribution for each vertex in a cluster Averaging p values of all vertices in a cluster Results method AS links YouTube MySpace # clusters avg. size avg. log p # clusters avg. size avg. log p # clusters avg. size avg. log p Entropy Based , , MCL ,

11 Efficiency Improvement Parallelization Multiple seed growth concurrently producing different clusters Issues Avoiding duplicate cluster generation Sharing resource of candidate seeds Results Close to linear increase of speed up as the increment of the number of threads Runtime Comparison with other competing methods method AS links YouTube MySpace Entropy Based (1 thread) ,028 Entropy Based (4 threads) ,519 MCL 6, ,764 CNM ,334

12 Conclusion Significance Systematic approach to analyze complex systems Information theoretic model to formulate modularity Local optimization algorithm for graph clustering Applications to diverse areas: Computational systems biology Social network analysis National security Internet flow analysis Future Works Further revisions of entropy based graph clustering algorithm Revising the seed selection step, e.g., starting from cliques as initial seed clusters Revising the seed growth step, e.g., alternating vertex addition and removal

### Protein Protein Interaction Networks

Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

### SCAN: A Structural Clustering Algorithm for Networks

SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected

### 6. If there is no improvement of the categories after several steps, then choose new seeds using another criterion (e.g. the objects near the edge of

Clustering Clustering is an unsupervised learning method: there is no target value (class label) to be predicted, the goal is finding common patterns or grouping similar examples. Differences between models/algorithms

### 10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

### MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

### Clipping polygons the Sutherland-Hodgman algorithm

Clipping polygons would seem to be quite complex. Clipping polygons would seem to be quite complex. single Clipping polygons would seem to be quite complex. single Clipping polygons would seem to be quite

### Crawling and Detecting Community Structure in Online Social Networks using Local Information

Crawling and Detecting Community Structure in Online Social Networks using Local Information TU Delft - Network Architectures and Services (NAS) 1/12 Outline In order to find communities in a graph one

### A scalable multilevel algorithm for graph clustering and community structure detection

A scalable multilevel algorithm for graph clustering and community structure detection Hristo N. Djidjev 1 Los Alamos National Laboratory, Los Alamos, NM 87545 Abstract. One of the most useful measures

### Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)

### Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

### Part 2: Community Detection

Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

### Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

### Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

### Link Prediction in Social Networks

CS378 Data Mining Final Project Report Dustin Ho : dsh544 Eric Shrewsberry : eas2389 Link Prediction in Social Networks 1. Introduction Social networks are becoming increasingly more prevalent in the daily

### GEOENGINE MSc in Geomatics Engineering (Master Thesis) Anamelechi, Falasy Ebere

Master s Thesis: ANAMELECHI, FALASY EBERE Analysis of a Raster DEM Creation for a Farm Management Information System based on GNSS and Total Station Coordinates Duration of the Thesis: 6 Months Completion

### Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri

Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph

### Why? A central concept in Computer Science. Algorithms are ubiquitous.

Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

### Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

### Graph Algorithms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Graph Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 3. Topic Overview Definitions and Representation Minimum

### Appendix A: Sampling Methods

Appendix A: Sampling Methods What is Sampling? Sampling is used in an @RISK simulation to generate possible values from probability distribution functions. These sets of possible values are then used to

### KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. ref. Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms ref. Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar 1 Outline Prototype-based Fuzzy c-means Mixture Model Clustering Density-based

### Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Geometrical Segmentation of Point Cloud Data using Spectral Clustering

Geometrical Segmentation of Point Cloud Data using Spectral Clustering Sergey Alexandrov and Rainer Herpers University of Applied Sciences Bonn-Rhein-Sieg 15th July 2014 1/29 Addressed Problem Given: a

### The tipping point is often referred to in popular literature. First introduced in Schelling 78 and Granovetter 78

Paulo Shakarian and Damon Paulo Dept. Electrical Engineering and Computer Science and Network Science Center U.S. Military Academy West Point, NY ASONAM 2012 The tipping point is often referred to in popular

### MODEL SELECTION FOR SOCIAL NETWORKS USING GRAPHLETS

MODEL SELECTION FOR SOCIAL NETWORKS USING GRAPHLETS JEANNETTE JANSSEN, MATT HURSHMAN, AND NAUZER KALYANIWALLA Abstract. Several network models have been proposed to explain the link structure observed

### Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes

Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications

### Distributed Computing over Communication Networks: Maximal Independent Set

Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.

### USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

### Graph Database Proof of Concept Report

Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

### Big Data Graph Algorithms

Christian Schulz CompSE seminar, RWTH Aachen, Karlsruhe 1 Christian Schulz: Institute for Theoretical www.kit.edu Informatics Algorithm Engineering design analyze Algorithms implement experiment 1 Christian

### Minimum Spanning Trees

Minimum Spanning Trees Algorithms and 18.304 Presentation Outline 1 Graph Terminology Minimum Spanning Trees 2 3 Outline Graph Terminology Minimum Spanning Trees 1 Graph Terminology Minimum Spanning Trees

### Network Algorithms for Homeland Security

Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially

### Hadoop SNS. renren.com. Saturday, December 3, 11

Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December

### An Intelligent Matching System for the Products of Small Business/Manufactures with the Celebrities

An Intelligent Matching System for the Products of Small Business/Manufactures with the Celebrities Junho Jeong 1, Yunsik Son 2, Seokhoon Ko 1 and Seman Oh 1 1 Dept. of Computer Engineering, Dongguk University,

### InfiniteGraph: The Distributed Graph Database

A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

### Link-based Analysis on Large Graphs. Presented by Weiren Yu Mar 01, 2011

Link-based Analysis on Large Graphs Presented by Weiren Yu Mar 01, 2011 Overview 1 Introduction 2 Problem Definition 3 Optimization Techniques 4 Experimental Results 2 1. Introduction Many applications

### BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com Outline Predictive modeling methodology k-nearest Neighbor

### DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

### Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns

### Clustering Social Networks

Clustering Social Networks Nina Mishra 1,4, Robert Schreiber, Isabelle Stanton 1, and Robert E. Tarjan,3 {nmishra,istanton}@cs.virginia.edu, {rob.schreiber,robert.tarjan}@hp.com 1 Department of Computer

### Forschungskolleg Data Analytics Methods and Techniques

Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.-Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!

### Mining Large Datasets: Case of Mining Graph Data in the Cloud

Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large

### Graph Processing and Social Networks

Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph

### Validation and automatic repair of two- and three-dimensional GIS datasets

Validation and automatic repair of two- and three-dimensional GIS datasets M. Meijers H. Ledoux K. Arroyo Ohori J. Zhao OSGeo.nl dag 2013, Delft 2013 11 13 1 / 26 Typical error: polygon is self-intersecting

### ! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions Basic Steps in Query

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set

http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer

### Overview. Background. Data Mining Analytics for Business Intelligence and Decision Support

Mining Analytics for Business Intelligence and Decision Support Chid Apte, PhD Manager, Abstraction Research Group IBM TJ Watson Research Center apte@us.ibm.com http://www.research.ibm.com/dar Overview

### Distance Degree Sequences for Network Analysis

Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation

### Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

### Mathematics for Algorithm and System Analysis

Mathematics for Algorithm and System Analysis for students of computer and computational science Edward A. Bender S. Gill Williamson c Edward A. Bender & S. Gill Williamson 2005. All rights reserved. Preface

### Conclusions and Future Directions

Chapter 9 This chapter summarizes the thesis with discussion of (a) the findings and the contributions to the state-of-the-art in the disciplines covered by this work, and (b) future work, those directions

### Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social

### Complex Networks Analysis: Clustering Methods

Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich nefedov@isi.ee.ethz.ch 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications

### Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce

Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, and Bhavani Thuraisingham University of Texas at Dallas, Dallas TX 75080, USA Abstract.

### Evaluating partitioning of big graphs

Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed

### Strong and Weak Ties

Strong and Weak Ties Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz April 11, 2016 Elisabeth Lex (KTI, TU Graz) Networks April 11, 2016 1 / 66 Outline 1 Repetition 2 Strong and Weak Ties 3 General

### CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/

CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu liwz@sdsc.edu 1. Introduction

### Fall 2015 Midterm 1 24/09/15 Time Limit: 80 Minutes

Math 340 Fall 2015 Midterm 1 24/09/15 Time Limit: 80 Minutes Name (Print): This exam contains 6 pages (including this cover page) and 5 problems. Enter all requested information on the top of this page,

### Lecture Notes on Spanning Trees

Lecture Notes on Spanning Trees 15-122: Principles of Imperative Computation Frank Pfenning Lecture 26 April 26, 2011 1 Introduction In this lecture we introduce graphs. Graphs provide a uniform model

### Software Testing Strategies and Techniques

Software Testing Strategies and Techniques Sheetal Thakare 1, Savita Chavan 2, Prof. P. M. Chawan 3 1,2 MTech, Computer Engineering VJTI, Mumbai 3 Associate Professor, Computer Technology Department, VJTI,

### Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

### BOĞAZİÇİ UNIVERSITY COMPUTER ENGINEERING

Parallel l Tetrahedral Mesh Refinement Mehmet Balman Computer Engineering, Boğaziçi University Adaptive Mesh Refinement (AMR) A computation ti technique used to improve the efficiency i of numerical systems

### Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

### Presto/Blockus: Towards Scalable R Data Analysis

/Blockus: Towards Scalable R Data Analysis Andrew A. Chien University of Chicago and Argonne ational Laboratory IRIA-UIUC-AL Joint Institute Potential Collaboration ovember 19, 2012 ovember 19, 2012 Andrew

### Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

### BIOINFORMATICS ORIGINAL PAPER

BIOINFORMATICS ORIGINAL PAPER Vol. 26 no. 8 2010, pages 1105 1111 doi:10.1093/bioinformatics/btq078 Data and text mining Advance Access publication February 24, 2010 SPICi: a fast clustering algorithm

### Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach

Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden smakadir@csc.kth.se,

### Network Analysis. BCH 5101: Analysis of -Omics Data 1/34

Network Analysis BCH 5101: Analysis of -Omics Data 1/34 Network Analysis Graphs as a representation of networks Examples of genome-scale graphs Statistical properties of genome-scale graphs The search

### Interpolating and Approximating Implicit Surfaces from Polygon Soup

Interpolating and Approximating Implicit Surfaces from Polygon Soup 1. Briefly summarize the paper s contributions. Does it address a new problem? Does it present a new approach? Does it show new types

### APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large

### Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

### Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

### The ParTriCluster Algorithm for Gene Expression Analysis

The ParTriCluster Algorithm for Gene Expression Analysis Renata Araujo Guilherme Ferreira Gustavo Orair Wagner Meira Jr. Renato Ferreira Dorgival Guedes Department of Computer Science Universidade Federal

### A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington

### Logistic Regression for Spam Filtering

Logistic Regression for Spam Filtering Nikhila Arkalgud February 14, 28 Abstract The goal of the spam filtering problem is to identify an email as a spam or not spam. One of the classic techniques used

### Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

### MODULE 15 Clustering Large Datasets LESSON 34

MODULE 15 Clustering Large Datasets LESSON 34 Incremental Clustering Keywords: Single Database Scan, Leader, BIRCH, Tree 1 Clustering Large Datasets Pattern matrix It is convenient to view the input data

### Influence Discovery in Semantic Networks: An Initial Approach

2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing

### Automatic parameter regulation for a tracking system with an auto-critical function

Automatic parameter regulation for a tracking system with an auto-critical function Daniela Hall INRIA Rhône-Alpes, St. Ismier, France Email: Daniela.Hall@inrialpes.fr Abstract In this article we propose

### Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Graph-theoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,

### Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Sorting revisited How did we use a binary search tree to sort an array of elements? Tree Sort Algorithm Given: An array of elements to sort 1. Build a binary search tree out of the elements 2. Traverse

### 5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

1. The advantage of.. is that they solve the problem if sequential storage representation. But disadvantage in that is they are sequential lists. [A] Lists [B] Linked Lists [A] Trees [A] Queues 2. The

### MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12

MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:

### A Process for ATLAS Software Development

Atlas Software Quality Control Group A Process for ATLAS Software Development Authors : Atlas Quality Control Group M. Asai, D. Barberis (chairman), M. Bosman, R. Jones, J.-F. Laporte, M. Stavrianakou

### Affinity Prediction in Online Social Networks

Affinity Prediction in Online Social Networks Matias Estrada and Marcelo Mendoza Skout Inc., Chile Universidad Técnica Federico Santa María, Chile Abstract Link prediction is the problem of inferring whether

### Automatic Liver Segmentation using the Random Walker Algorithm

Automatic Liver Segmentation using the Random Walker Algorithm F. Maier 1,2, A. Wimmer 2,3, G. Soza 2, J. N. Kaftan 2,4, D. Fritz 1,2, R. Dillmann 1 1 Universität Karlsruhe (TH), 2 Siemens Medical Solutions,

### Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstraction

Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011 2 NA = 6.022 1023 mol 1 Paul Burkhardt, Chris Waring An NSA Big Graph experiment Fast Iterative Graph Computation with Resource

### Two General Methods to Reduce Delay and Change of Enumeration Algorithms

ISSN 1346-5597 NII Technical Report Two General Methods to Reduce Delay and Change of Enumeration Algorithms Takeaki Uno NII-2003-004E Apr.2003 Two General Methods to Reduce Delay and Change of Enumeration

### B490 Mining the Big Data. 2 Clustering

B490 Mining the Big Data 2 Clustering Qin Zhang 1-1 Motivations Group together similar documents/webpages/images/people/proteins/products One of the most important problems in machine learning, pattern

### Network Analysis and Visualization of Staphylococcus aureus. by Russ Gibson

Network Analysis and Visualization of Staphylococcus aureus by Russ Gibson Network analysis Based on graph theory Probabilistic models (random graphs) developed by Erdős and Rényi in 1959 Theory and tools

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### Gold (Genetic Optimization for Ligand Docking) G. Jones et al. 1996

Gold (Genetic Optimization for Ligand Docking) G. Jones et al. 1996 LMU Institut für Informatik, LFE Bioinformatik, Cheminformatics, Structure based methods J. Apostolakis 1 Genetic algorithms Inspired

### Lecture 7: Approximation via Randomized Rounding

Lecture 7: Approximation via Randomized Rounding Often LPs return a fractional solution where the solution x, which is supposed to be in {0, } n, is in [0, ] n instead. There is a generic way of obtaining

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan