How To Cluster Of Complex Systems
|
|
|
- Jocelyn Parrish
- 5 years ago
- Views:
Transcription
1 Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University
2 Complex Systems Definition Dynamically evolving networks that have a large number of objects with complex connectivity Examples Social networks Biological i l networks Telecommunication networks WWW
3 Clustering of Complex Systems Purpose Issues Predicting groups of objects that closely communicate with each other Groups having the same subjects (called communities) Groups having the same functions (called modules) Large scale Scalability / Efficiency Complex connectivity Robustness / Accuracy Approaches Density based methods Searching densely connected sub graphs Partition based methods Detecting the best partition Hierarchical methods Merging sub graphs iteratively, or dividing sub graphs recursively
4 New Definition: Graph Entropy Assumptions Entropy in a graph represents structural stability Isolated complete sub graphs as clusters have the most stable status (i.e., lowest entropy) Random reconnections decrease stability (i.e., increase entropy) General Notations Inner links of v given G (V,E ): edges from v to the vertices in V p i (v): probability of v having inner links Outer links of v given G (V,E ): edges from v to the vertices not in V p o (v): probability of v having outer links Definitions Vertex Entropy: Graph Entropy: e(v) = p i (v) log 2 p i (v) p o (v) log 2 p o (v) e(g(v,e)) = v V e(v)
5 Examples of Graph Entropy Examples e(a) = e(b) = e(c) = 0 e(d) = 0.81 e(f) = 1.00 e(g) = e(h) = e(k) = 0 e(g) = 1.81 e(a) = e(b) = e(c) = e(d) = 0 e(f) = 1.00 e(g) = 0.92 e(h) = e(k) = 0 e(g) = 1.92 {a,b,c,d} giveshigher structural stability to G than {a,b,c,d,f}
6 Entropy Based Graph Clustering Algorithm (1) Select a random seed vertex, and form an initial cluster with the seed and its neighbors (2) Remove a vertex on the inner boundary of the cluster if it decreases graph entropy (3) Repeat step (2) until graph entropy is minimal (4) Add a vertex on the outer boundary of the cluster if it decreases graph entropy (5) Repeat step (4) until graph entropy is minimal (6) Output the cluster (7) Repeat steps (1) to (6) until there is no more vertex left Features Local optimization No parameters Randomness Low density clusters Overlapping clusters
7 Application to Biological Networks Dataset Yeast genome wide protein protein interaction data from DIP 4,928 proteins (vertices) and 17,186 interactions (edges) Clustering Accuracy Evaluation Method Comparison with known protein complex data f measure score: mean of recall and precision, Results
8 Revisions of Entropy Based Clustering Priority on Seed Selection in Step (1) Selecting a seed vertex in the descending order of degree Selecting a seed vertex in the descending order of clustering coefficient Priority on Seed Growth in Step (4) Adding a vertex on the outer boundary in the ascending order of vertex entropy Increase of Cluster Overlapping Rates Allowing random selection of a seed vertex from previous clusters Post Process Filtering out the clusters with graph entropy higher than a threshold
9 Accuracy Improvement Results before post processprocess after post processprocess # clusters avg. size avg. f score # clusters avg. size avg. f score original algorithm degree based seed selection coef. based seed selection degree based seed selection AND entropy based seed growth coef. based seed selection AND entropy based seed growth Comparison with other competing methods method # clusters avg. size avg. f score Entropy Based Clustering MCL CNM
10 Application to Social / Telecommunication Networks Dataset AS (Autonomous System) link network YouTube video link network MySpace social network dataset # vertices # edges density (%) AS links 45, , YouTube 321, , MySpace 100,000 6,854, Clustering Accuracy Evaluation Method p value of cumulative hypergeometric distribution for each vertex in a cluster Averaging p values of all vertices in a cluster Results method AS links YouTube MySpace # clusters avg. size avg. log p # clusters avg. size avg. log p # clusters avg. size avg. log p Entropy Based , , MCL ,
11 Efficiency Improvement Parallelization Multiple seed growth concurrently producing different clusters Issues Avoiding duplicate cluster generation Sharing resource of candidate seeds Results Close to linear increase of speed up as the increment of the number of threads Runtime Comparison with other competing methods method AS links YouTube MySpace Entropy Based (1 thread) ,028 Entropy Based (4 threads) ,519 MCL 6, ,764 CNM ,334
12 Conclusion Significance Systematic approach to analyze complex systems Information theoretic model to formulate modularity Local optimization algorithm for graph clustering Applications to diverse areas: Computational systems biology Social network analysis National security Internet flow analysis Future Works Further revisions of entropy based graph clustering algorithm Revising the seed selection step, e.g., starting from cliques as initial seed clusters Revising the seed growth step, e.g., alternating vertex addition and removal
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
SCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
Crawling and Detecting Community Structure in Online Social Networks using Local Information
Crawling and Detecting Community Structure in Online Social Networks using Local Information TU Delft - Network Architectures and Services (NAS) 1/12 Outline In order to find communities in a graph one
Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
A scalable multilevel algorithm for graph clustering and community structure detection
A scalable multilevel algorithm for graph clustering and community structure detection Hristo N. Djidjev 1 Los Alamos National Laboratory, Los Alamos, NM 87545 Abstract. One of the most useful measures
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
Part 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
GEOENGINE MSc in Geomatics Engineering (Master Thesis) Anamelechi, Falasy Ebere
Master s Thesis: ANAMELECHI, FALASY EBERE Analysis of a Raster DEM Creation for a Farm Management Information System based on GNSS and Total Station Coordinates Duration of the Thesis: 6 Months Completion
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
Why? A central concept in Computer Science. Algorithms are ubiquitous.
Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online
Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri
Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph
Link Prediction in Social Networks
CS378 Data Mining Final Project Report Dustin Ho : dsh544 Eric Shrewsberry : eas2389 Link Prediction in Social Networks 1. Introduction Social networks are becoming increasingly more prevalent in the daily
KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
Unsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
MODEL SELECTION FOR SOCIAL NETWORKS USING GRAPHLETS
MODEL SELECTION FOR SOCIAL NETWORKS USING GRAPHLETS JEANNETTE JANSSEN, MATT HURSHMAN, AND NAUZER KALYANIWALLA Abstract. Several network models have been proposed to explain the link structure observed
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected]
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes
Parallel Programming at the Exascale Era: A Case Study on Parallelizing Matrix Assembly For Unstructured Meshes Eric Petit, Loïc Thebault, Quang V. Dinh May 2014 EXA2CT Consortium 2 WPs Organization Proto-Applications
Distributed Computing over Communication Networks: Maximal Independent Set
Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.
Graph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
Big Data Graph Algorithms
Christian Schulz CompSE seminar, RWTH Aachen, Karlsruhe 1 Christian Schulz: Institute for Theoretical www.kit.edu Informatics Algorithm Engineering design analyze Algorithms implement experiment 1 Christian
Network Algorithms for Homeland Security
Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/
CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu [email protected] 1. Introduction
InfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining [email protected] Outline Predictive modeling methodology k-nearest Neighbor
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
Hadoop SNS. renren.com. Saturday, December 3, 11
Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
Mining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
Forschungskolleg Data Analytics Methods and Techniques
Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.-Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!
Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations
Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns
Validation and automatic repair of two- and three-dimensional GIS datasets
Validation and automatic repair of two- and three-dimensional GIS datasets M. Meijers H. Ledoux K. Arroyo Ohori J. Zhao OSGeo.nl dag 2013, Delft 2013 11 13 1 / 26 Typical error: polygon is self-intersecting
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set
http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
Software Testing Strategies and Techniques
Software Testing Strategies and Techniques Sheetal Thakare 1, Savita Chavan 2, Prof. P. M. Chawan 3 1,2 MTech, Computer Engineering VJTI, Mumbai 3 Associate Professor, Computer Technology Department, VJTI,
Conclusions and Future Directions
Chapter 9 This chapter summarizes the thesis with discussion of (a) the findings and the contributions to the state-of-the-art in the disciplines covered by this work, and (b) future work, those directions
Complex Networks Analysis: Clustering Methods
Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich [email protected] 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications
Evaluating partitioning of big graphs
Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist [email protected], [email protected], [email protected] Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, and Bhavani Thuraisingham University of Texas at Dallas, Dallas TX 75080, USA Abstract.
Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
Chapter 13: Query Processing. Basic Steps in Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
Presto/Blockus: Towards Scalable R Data Analysis
/Blockus: Towards Scalable R Data Analysis Andrew A. Chien University of Chicago and Argonne ational Laboratory IRIA-UIUC-AL Joint Institute Potential Collaboration ovember 19, 2012 ovember 19, 2012 Andrew
Performance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach
Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden [email protected],
Statistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
Network Analysis. BCH 5101: Analysis of -Omics Data 1/34
Network Analysis BCH 5101: Analysis of -Omics Data 1/34 Network Analysis Graphs as a representation of networks Examples of genome-scale graphs Statistical properties of genome-scale graphs The search
APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
Java Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,
How To Find Influence Between Two Concepts In A Network
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing
Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004
A Process for ATLAS Software Development
Atlas Software Quality Control Group A Process for ATLAS Software Development Authors : Atlas Quality Control Group M. Asai, D. Barberis (chairman), M. Bosman, R. Jones, J.-F. Laporte, M. Stavrianakou
Logistic Regression for Spam Filtering
Logistic Regression for Spam Filtering Nikhila Arkalgud February 14, 28 Abstract The goal of the spam filtering problem is to identify an email as a spam or not spam. One of the classic techniques used
Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstraction
Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011 2 NA = 6.022 1023 mol 1 Paul Burkhardt, Chris Waring An NSA Big Graph experiment Fast Iterative Graph Computation with Resource
! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)
E6893 Big Data Analytics Lecture 10: Linked Big Data Graph Computing (II) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)
Sorting revisited How did we use a binary search tree to sort an array of elements? Tree Sort Algorithm Given: An array of elements to sort 1. Build a binary search tree out of the elements 2. Traverse
Affinity Prediction in Online Social Networks
Affinity Prediction in Online Social Networks Matias Estrada and Marcelo Mendoza Skout Inc., Chile Universidad Técnica Federico Santa María, Chile Abstract Link prediction is the problem of inferring whether
Automatic parameter regulation for a tracking system with an auto-critical function
Automatic parameter regulation for a tracking system with an auto-critical function Daniela Hall INRIA Rhône-Alpes, St. Ismier, France Email: [email protected] Abstract In this article we propose
A Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II
! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan
Graph Theory and Networks in Biology
Graph Theory and Networks in Biology Oliver Mason and Mark Verwoerd March 14, 2006 Abstract In this paper, we present a survey of the use of graph theoretical techniques in Biology. In particular, we discuss
CLUSTERING FOR FORENSIC ANALYSIS
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 129-136 Impact Journals CLUSTERING FOR FORENSIC ANALYSIS
Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining
Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,
arxiv:cs.dm/0204001 v1 30 Mar 2002
A Steady State Model for Graph Power Laws David Eppstein Joseph Wang arxiv:cs.dm/0000 v 0 Mar 00 Abstract Power law distribution seems to be an important characteristic of web graphs. Several existing
Mathematics for Algorithm and System Analysis
Mathematics for Algorithm and System Analysis for students of computer and computational science Edward A. Bender S. Gill Williamson c Edward A. Bender & S. Gill Williamson 2005. All rights reserved. Preface
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
Component Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
Bioinformatics: Network Analysis
Bioinformatics: Network Analysis Graph-theoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,
5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.
1. The advantage of.. is that they solve the problem if sequential storage representation. But disadvantage in that is they are sequential lists. [A] Lists [B] Linked Lists [A] Trees [A] Queues 2. The
A Business Process Driven Approach for Generating Software Modules
A Business Process Driven Approach for Generating Software Modules Xulin Zhao, Ying Zou Dept. of Electrical and Computer Engineering, Queen s University, Kingston, ON, Canada SUMMARY Business processes
How To Cluster On A Search Engine
Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING
How to Design the Shortest Path Between Two Keyword Algorithms
Multi-Approximate-Keyword Routing in GIS Data Bin Yao Mingwang Tang Feifei Li [email protected], {tang,lifeifei}@cs.utah.edu Department of Computer Science and Engineering, Shanghai JiaoTong University
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
Tools and Techniques for Social Network Analysis
Tools and Techniques for Social Network Analysis Pajek Program for Analysis and Visualization of Large Networks Pajek: What is it Pajek is a program, for Windows and Linux (via Wine) Developers: Vladimir
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo.
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. VLDB 2009 CS 422 Decision Trees: Main Components Find Best Split Choose split
How To Develop Software
Software Engineering Prof. N.L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture-4 Overview of Phases (Part - II) We studied the problem definition phase, with which
On Efficiently Capturing Scien3fic Proper3es in Distributed Big Data without Moving the Data:
On Efficiently Capturing Scien3fic Proper3es in Distributed Big Data without Moving the Data: Case Study in Distributed Structural Biology using MapReduce Boyu Zhang, Trilce Estrada 2, Pietro Cico@ 3,
Automatic Labeling of Lane Markings for Autonomous Vehicles
Automatic Labeling of Lane Markings for Autonomous Vehicles Jeffrey Kiske Stanford University 450 Serra Mall, Stanford, CA 94305 [email protected] 1. Introduction As autonomous vehicles become more popular,
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions
Big Data Optimization: Randomized lock-free methods for minimizing partially separable convex functions Peter Richtárik School of Mathematics The University of Edinburgh Joint work with Martin Takáč (Edinburgh)
Map/Reduce Affinity Propagation Clustering Algorithm
Map/Reduce Affinity Propagation Clustering Algorithm Wei-Chih Hung, Chun-Yen Chu, and Yi-Leh Wu Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology,
An Empirical Study of Two MIS Algorithms
An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. [email protected],
Machine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
Introduction to Logistic Regression
OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction
IBM SPSS Modeler Social Network Analysis 15 User Guide
IBM SPSS Modeler Social Network Analysis 15 User Guide Note: Before using this information and the product it supports, read the general information under Notices on p. 25. This edition applies to IBM
