Clustering and mapper
|
|
|
- Tracy Neal
- 10 years ago
- Views:
Transcription
1 June 17th, 2014
2 Overview Goal of talk Explain Mapper, which is the most widely used and most successful TDA technique. (At core of Ayasdi, TDA company founded by Gunnar Carlsson.) Basic idea: perform clustering at different scales, track how clusters change as scale varies. Motivation: 1 Coarser than manifold learning, but still works in nonlinear situations. 2 Still retains meaningful geometric information about data set. 3 Efficiently computable (and so can apply to very large data sets).
3 Overview Goal of talk Explain Mapper, which is the most widely used and most successful TDA technique. (At core of Ayasdi, TDA company founded by Gunnar Carlsson.) Basic idea: perform clustering at different scales, track how clusters change as scale varies. Motivation: 1 Coarser than manifold learning, but still works in nonlinear situations. 2 Still retains meaningful geometric information about data set. 3 Efficiently computable (and so can apply to very large data sets).
4 Overview Goal of talk Explain Mapper, which is the most widely used and most successful TDA technique. (At core of Ayasdi, TDA company founded by Gunnar Carlsson.) Basic idea: perform clustering at different scales, track how clusters change as scale varies. Motivation: 1 Coarser than manifold learning, but still works in nonlinear situations. 2 Still retains meaningful geometric information about data set. 3 Efficiently computable (and so can apply to very large data sets).
5 Overview Goal of talk Explain Mapper, which is the most widely used and most successful TDA technique. (At core of Ayasdi, TDA company founded by Gunnar Carlsson.) Basic idea: perform clustering at different scales, track how clusters change as scale varies. Motivation: 1 Coarser than manifold learning, but still works in nonlinear situations. 2 Still retains meaningful geometric information about data set. 3 Efficiently computable (and so can apply to very large data sets).
6 Overview Goal of talk Explain Mapper, which is the most widely used and most successful TDA technique. (At core of Ayasdi, TDA company founded by Gunnar Carlsson.) Basic idea: perform clustering at different scales, track how clusters change as scale varies. Motivation: 1 Coarser than manifold learning, but still works in nonlinear situations. 2 Still retains meaningful geometric information about data set. 3 Efficiently computable (and so can apply to very large data sets).
7 Overview Goal of talk Explain Mapper, which is the most widely used and most successful TDA technique. (At core of Ayasdi, TDA company founded by Gunnar Carlsson.) Basic idea: perform clustering at different scales, track how clusters change as scale varies. Motivation: 1 Coarser than manifold learning, but still works in nonlinear situations. 2 Still retains meaningful geometric information about data set. 3 Efficiently computable (and so can apply to very large data sets).
8 Morse theory Basic idea Describe topology of a smooth manifold M using levelsets of a suitable function h : M R. We recover M by looking at h 1 ((, t]), as t scans over the range of h. Topology of M changes at critical points of h.
9 Morse theory Basic idea Describe topology of a smooth manifold M using levelsets of a suitable function h : M R. We recover M by looking at h 1 ((, t]), as t scans over the range of h. Topology of M changes at critical points of h.
10 Morse theory Basic idea Describe topology of a smooth manifold M using levelsets of a suitable function h : M R. We recover M by looking at h 1 ((, t]), as t scans over the range of h. Topology of M changes at critical points of h.
11
12
13 Reeb graphs Convenient simplification: 1 For each t R, contract each component of f 1 (t) to a point. 2 Resulting structure is a graph.
14 Reeb graphs Convenient simplification: 1 For each t R, contract each component of f 1 (t) to a point. 2 Resulting structure is a graph.
15 Reeb graphs Convenient simplification: 1 For each t R, contract each component of f 1 (t) to a point. 2 Resulting structure is a graph.
16
17 Mapper The mapper algorithm is a generalization of this procedure. [Singh-Memoli-Carlsson] Assume given a data set X. 1 Choose a filter function f : X R. 2 Choose a cover U α of X. 3 Cluster each inverse image f 1 (U α ). 4 Form a graph where: 1 Clusters are vertices. 2 An edge connects two clusters C and C if both U α U α and C C. 5 Color vertices according to average value of f in the cluster.
18 Mapper The mapper algorithm is a generalization of this procedure. [Singh-Memoli-Carlsson] Assume given a data set X. 1 Choose a filter function f : X R. 2 Choose a cover U α of X. 3 Cluster each inverse image f 1 (U α ). 4 Form a graph where: 1 Clusters are vertices. 2 An edge connects two clusters C and C if both U α U α and C C. 5 Color vertices according to average value of f in the cluster.
19 Mapper The mapper algorithm is a generalization of this procedure. [Singh-Memoli-Carlsson] Assume given a data set X. 1 Choose a filter function f : X R. 2 Choose a cover U α of X. 3 Cluster each inverse image f 1 (U α ). 4 Form a graph where: 1 Clusters are vertices. 2 An edge connects two clusters C and C if both U α U α and C C. 5 Color vertices according to average value of f in the cluster.
20 Mapper The mapper algorithm is a generalization of this procedure. [Singh-Memoli-Carlsson] Assume given a data set X. 1 Choose a filter function f : X R. 2 Choose a cover U α of X. 3 Cluster each inverse image f 1 (U α ). 4 Form a graph where: 1 Clusters are vertices. 2 An edge connects two clusters C and C if both U α U α and C C. 5 Color vertices according to average value of f in the cluster.
21 Mapper The mapper algorithm is a generalization of this procedure. [Singh-Memoli-Carlsson] Assume given a data set X. 1 Choose a filter function f : X R. 2 Choose a cover U α of X. 3 Cluster each inverse image f 1 (U α ). 4 Form a graph where: 1 Clusters are vertices. 2 An edge connects two clusters C and C if both U α U α and C C. 5 Color vertices according to average value of f in the cluster.
22 Mapper The mapper algorithm is a generalization of this procedure. [Singh-Memoli-Carlsson] Assume given a data set X. 1 Choose a filter function f : X R. 2 Choose a cover U α of X. 3 Cluster each inverse image f 1 (U α ). 4 Form a graph where: 1 Clusters are vertices. 2 An edge connects two clusters C and C if both U α U α and C C. 5 Color vertices according to average value of f in the cluster.
23 Mapper The mapper algorithm is a generalization of this procedure. [Singh-Memoli-Carlsson] Assume given a data set X. 1 Choose a filter function f : X R. 2 Choose a cover U α of X. 3 Cluster each inverse image f 1 (U α ). 4 Form a graph where: 1 Clusters are vertices. 2 An edge connects two clusters C and C if both U α U α and C C. 5 Color vertices according to average value of f in the cluster.
24 Mapper The mapper algorithm is a generalization of this procedure. [Singh-Memoli-Carlsson] Assume given a data set X. 1 Choose a filter function f : X R. 2 Choose a cover U α of X. 3 Cluster each inverse image f 1 (U α ). 4 Form a graph where: 1 Clusters are vertices. 2 An edge connects two clusters C and C if both U α U α and C C. 5 Color vertices according to average value of f in the cluster.
25 Mapper The mapper algorithm is a generalization of this procedure. [Singh-Memoli-Carlsson] Assume given a data set X. 1 Choose a filter function f : X R. 2 Choose a cover U α of X. 3 Cluster each inverse image f 1 (U α ). 4 Form a graph where: 1 Clusters are vertices. 2 An edge connects two clusters C and C if both U α U α and C C. 5 Color vertices according to average value of f in the cluster.
26 Filter functions Clearly, choice of filter function is essential. Some kind of density measure. A score measure difference (distance) from some baseline. An eccentricity measure.
27 Filter functions Clearly, choice of filter function is essential. Some kind of density measure. A score measure difference (distance) from some baseline. An eccentricity measure.
28 Filter functions Clearly, choice of filter function is essential. Some kind of density measure. A score measure difference (distance) from some baseline. An eccentricity measure.
29 Filter functions Clearly, choice of filter function is essential. Some kind of density measure. A score measure difference (distance) from some baseline. An eccentricity measure.
30
31 Breast cancer example Highly successful example of real data analysis. [Nicolau, Carlsson, Levine] Working with vectors of gene expression data. Distance metric is correlation. Filter is a measure of (unsigned) deviation of expression from normal tissue. Results identify previously unknown c-myb+ region, which are very different from normal tissue but have very high survival rates.
32 Breast cancer example Highly successful example of real data analysis. [Nicolau, Carlsson, Levine] Working with vectors of gene expression data. Distance metric is correlation. Filter is a measure of (unsigned) deviation of expression from normal tissue. Results identify previously unknown c-myb+ region, which are very different from normal tissue but have very high survival rates.
33 Breast cancer example Highly successful example of real data analysis. [Nicolau, Carlsson, Levine] Working with vectors of gene expression data. Distance metric is correlation. Filter is a measure of (unsigned) deviation of expression from normal tissue. Results identify previously unknown c-myb+ region, which are very different from normal tissue but have very high survival rates.
34 Breast cancer example Highly successful example of real data analysis. [Nicolau, Carlsson, Levine] Working with vectors of gene expression data. Distance metric is correlation. Filter is a measure of (unsigned) deviation of expression from normal tissue. Results identify previously unknown c-myb+ region, which are very different from normal tissue but have very high survival rates.
35 Breast cancer example Highly successful example of real data analysis. [Nicolau, Carlsson, Levine] Working with vectors of gene expression data. Distance metric is correlation. Filter is a measure of (unsigned) deviation of expression from normal tissue. Results identify previously unknown c-myb+ region, which are very different from normal tissue but have very high survival rates.
36
37 NBA example Clever example of application to sports analytics. [Alagappan] Data set consists of vectors of statistics (points scored, rebounds, etc.). Distance metric is Euclidean. Filter is points per minute. Results identify many new positions.
38 NBA example Clever example of application to sports analytics. [Alagappan] Data set consists of vectors of statistics (points scored, rebounds, etc.). Distance metric is Euclidean. Filter is points per minute. Results identify many new positions.
39 NBA example Clever example of application to sports analytics. [Alagappan] Data set consists of vectors of statistics (points scored, rebounds, etc.). Distance metric is Euclidean. Filter is points per minute. Results identify many new positions.
40 NBA example Clever example of application to sports analytics. [Alagappan] Data set consists of vectors of statistics (points scored, rebounds, etc.). Distance metric is Euclidean. Filter is points per minute. Results identify many new positions.
41 NBA example Clever example of application to sports analytics. [Alagappan] Data set consists of vectors of statistics (points scored, rebounds, etc.). Distance metric is Euclidean. Filter is points per minute. Results identify many new positions.
42 Role Player One-of-a-kind Scoring Rebounder Role-Playing Ball-Handler 3PT Rebounder Shooting Ball- Handler All-NBA 1st Team All-NBA 2nd Team Offensive Ball- Handler Combo Ball-Handler Defensive Ball- Handler Scoring Paint Protector Paint Protector
43 Summary Claim Mapper can be successfully applied to analysis of geometric structures in large data sets from a wide variety of domains. Key idea: clustering across scales, represent relationships between clusters as scale varies Choice of filter function(s) is critical to successful aplication.
44 Summary Claim Mapper can be successfully applied to analysis of geometric structures in large data sets from a wide variety of domains. Key idea: clustering across scales, represent relationships between clusters as scale varies Choice of filter function(s) is critical to successful aplication.
45 Summary Claim Mapper can be successfully applied to analysis of geometric structures in large data sets from a wide variety of domains. Key idea: clustering across scales, represent relationships between clusters as scale varies Choice of filter function(s) is critical to successful aplication.
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
WORKSHOP ON TOPOLOGY AND ABSTRACT ALGEBRA FOR BIOMEDICINE
WORKSHOP ON TOPOLOGY AND ABSTRACT ALGEBRA FOR BIOMEDICINE ERIC K. NEUMANN Foundation Medicine, Cambridge, MA 02139, USA Email: [email protected] SVETLANA LOCKWOOD School of Electrical Engineering
Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
How To Understand And Understand The Theory Of Computational Finance
This course consists of three separate modules. Coordinator: Omiros Papaspiliopoulos Module I: Machine Learning in Finance Lecturer: Argimiro Arratia, Universitat Politecnica de Catalunya and BGSE Overview
They can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat
HQJHQH70 *XLGHG7RXU This document contains a Guided Tour through the HQJHQH platform and it was created for training purposes with respect to the system options and analysis possibilities. It is not intended
Metrics on SO(3) and Inverse Kinematics
Mathematical Foundations of Computer Graphics and Vision Metrics on SO(3) and Inverse Kinematics Luca Ballan Institute of Visual Computing Optimization on Manifolds Descent approach d is a ascent direction
Topological Data Analysis Applications to Computer Vision
Topological Data Analysis Applications to Computer Vision Vitaliy Kurlin, http://kurlin.org Microsoft Research Cambridge and Durham University, UK Topological Data Analysis quantifies topological structures
JustClust User Manual
JustClust User Manual Contents 1. Installing JustClust 2. Running JustClust 3. Basic Usage of JustClust 3.1. Creating a Network 3.2. Clustering a Network 3.3. Applying a Layout 3.4. Saving and Loading
Topological Data Analysis
INF563 Topological Data Analysis Steve Oudot, Mathieu Carrière {firstname.lastname}@inria.fr Context: The data deluge - Les donnees de ce type apparaissent dans des contextes scientifiques et industrie
TOPOLOGICAL ANALYSIS OF MOBILIZE BOSTON DATA
TOPOLOGICAL ANALYSIS OF MOBILIZE BOSTON DATA A Thesis Presented to the Faculty of California State Polytechnic University, Pomona In Partial Fulfillment Of the Requirements for the Degree Master of Science
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Introduction to Topology and its Applications to Complex Data
Task: In each of these two beautiful parks, try to find a path (to walk) that uses each bridge once and exactly once. Start with the park on the right first. (The blue denotes a river that contains alligators,
Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering
Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar
Geometry and Topology from Point Cloud Data
Geometry and Topology from Point Cloud Data Tamal K. Dey Department of Computer Science and Engineering The Ohio State University Dey (2011) Geometry and Topology from Point Cloud Data WALCOM 11 1 / 51
The Value of Visualization 2
The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate
Applying Data Analysis to Big Data Benchmarks. Jazmine Olinger
Applying Data Analysis to Big Data Benchmarks Jazmine Olinger Abstract This paper describes finding accurate and fast ways to simulate Big Data benchmarks. Specifically, using the currently existing simulation
Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall
Automatic Photo Quality Assessment Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall Estimating i the photorealism of images: Distinguishing i i paintings from photographs h Florin
Visualization by Linear Projections as Information Retrieval
Visualization by Linear Projections as Information Retrieval Jaakko Peltonen Helsinki University of Technology, Department of Information and Computer Science, P. O. Box 5400, FI-0015 TKK, Finland [email protected]
TDA and Machine Learning: Better Together
TDA and Machine Learning: Better Together TDA AND MACHINE LEARNING: BETTER TOGETHER 2 TABLE OF CONTENTS The New Data Analytics Dilemma... 3 Introducing Topology and Topological Data Analysis... 3 The Promise
Big Data Analytics for Healthcare
Big Data Analytics for Healthcare Jimeng Sun Chandan K. Reddy Healthcare Analytics Department IBM TJ Watson Research Center Department of Computer Science Wayne State University 1 Healthcare Analytics
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Psychological Questionnaires and Kernel Extension
Psychological Questionnaires and Kernel Extension Liberty.E 1 Almagor.M 2 James.B 3 Keller.Y 4 Coifman.R.R. 5 Zucker.S.W. 6 1 Department of Computer Science, Yale University 2 Department of Psychology,
Big Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs [email protected] Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment
Topology. Shapefile versus Coverage Views
Topology Defined as the the science and mathematics of relationships used to validate the geometry of vector entities, and for operations such as network tracing and tests of polygon adjacency Longley
Efficient Storage, Compression and Transmission
Efficient Storage, Compression and Transmission of Complex 3D Models context & problem definition general framework & classification our new algorithm applications for digital documents Mesh Decimation
Shake N Bake Basketball Services High School Level
Shake N Bake Basketball Services High School Level Author: Ganon Baker 1-866-626-0412 www.shakenbakebasketball.com Methods of Teaching 1. Be a mirror: correct wrong, praise right, repetition. 2. Teach
MeshLab and Arc3D: Photo-Reconstruction and Processing of 3D meshes
MeshLab and Arc3D: Photo-Reconstruction and Processing of 3D meshes P. Cignoni, M Corsini, M. Dellepiane, G. Ranzuglia, (Visual Computing Lab, ISTI - CNR, Italy) M. Vergauven, L. Van Gool (K.U.Leuven ESAT-PSI
Wii Remote Calibration Using the Sensor Bar
Wii Remote Calibration Using the Sensor Bar Alparslan Yildiz Abdullah Akay Yusuf Sinan Akgul GIT Vision Lab - http://vision.gyte.edu.tr Gebze Institute of Technology Kocaeli, Turkey {yildiz, akay, akgul}@bilmuh.gyte.edu.tr
3D Scanner using Line Laser. 1. Introduction. 2. Theory
. Introduction 3D Scanner using Line Laser Di Lu Electrical, Computer, and Systems Engineering Rensselaer Polytechnic Institute The goal of 3D reconstruction is to recover the 3D properties of a geometric
Algebra II EOC Practice Test
Algebra II EOC Practice Test Name Date 1. Suppose point A is on the unit circle shown above. What is the value of sin? (A) 0.736 (B) 0.677 (C) (D) (E) none of these 2. Convert to radians. (A) (B) (C) (D)
Volume visualization I Elvins
Volume visualization I Elvins 1 surface fitting algorithms marching cubes dividing cubes direct volume rendering algorithms ray casting, integration methods voxel projection, projected tetrahedra, splatting
! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II
! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science [email protected]
Machine Learning CS 6830 Razvan C. Bunescu School of Electrical Engineering and Computer Science [email protected] What is Learning? Merriam-Webster: learn = to acquire knowledge, understanding, or skill
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
Monitoring of Internet traffic and applications
Monitoring of Internet traffic and applications Chadi BARAKAT INRIA Sophia Antipolis, France Planète research group ETH Zurich October 2009 Email: [email protected] WEB: http://www.inria.fr/planete/chadi
Digital Imaging and Communications in Medicine (DICOM) Supplement 132: Surface Segmentation Storage SOP Class
5 10 Digital Imaging and Communications in Medicine (DICOM) Supplement 132: Surface Segmentation Storage SOP Class 15 20 Prepared by: DICOM Standards Committee, Working Group 17 (3D) and 24 (Surgery) 25
How To Understand The Network Of A Network
Roles in Networks Roles in Networks Motivation for work: Let topology define network roles. Work by Kleinberg on directed graphs, used topology to define two types of roles: authorities and hubs. (Each
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
Vector storage and access; algorithms in GIS. This is lecture 6
Vector storage and access; algorithms in GIS This is lecture 6 Vector data storage and access Vectors are built from points, line and areas. (x,y) Surface: (x,y,z) Vector data access Access to vector
Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard
Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express
SIGNATURE VERIFICATION
SIGNATURE VERIFICATION Dr. H.B.Kekre, Dr. Dhirendra Mishra, Ms. Shilpa Buddhadev, Ms. Bhagyashree Mall, Mr. Gaurav Jangid, Ms. Nikita Lakhotia Computer engineering Department, MPSTME, NMIMS University
A Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
Socci Sport Alternative Games
- 1 - Socci Sport Alternative Games Table of Contents 1 Roller Socci 2 2 Pass and Shoot Socci 2 3 Punt & Catch Socci 2 4 Long Pass Socci 3 5 Pass, Dribble, and Shoot Socci 3 6 Scooter Socci Basketball
Visualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
CSCI-599 DATA MINING AND STATISTICAL INFERENCE
CSCI-599 DATA MINING AND STATISTICAL INFERENCE Course Information Course ID and title: CSCI-599 Data Mining and Statistical Inference Semester and day/time/location: Spring 2013/ Mon/Wed 3:30-4:50pm Instructor:
South East of Process Main Building / 1F. North East of Process Main Building / 1F. At 14:05 April 16, 2011. Sample not collected
At 14:05 April 16, 2011 At 13:55 April 16, 2011 At 14:20 April 16, 2011 ND ND 3.6E-01 ND ND 3.6E-01 1.3E-01 9.1E-02 5.0E-01 ND 3.7E-02 4.5E-01 ND ND 2.2E-02 ND 3.3E-02 4.5E-01 At 11:37 April 17, 2011 At
Using multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
Bioinformatics: Network Analysis
Bioinformatics: Network Analysis Graph-theoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,
Packet TDEV, MTIE, and MATIE - for Estimating the Frequency and Phase Stability of a Packet Slave Clock. Antti Pietiläinen
Packet TDEV, MTIE, and MATIE - for Estimating the Frequency and Phase Stability of a Packet Slave Clock Antti Pietiläinen Soc Classification level 1 Nokia Siemens Networks Expressing performance of clocks
Social Media Mining. Network Measures
Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users
Healthcare data analytics. Da-Wei Wang Institute of Information Science [email protected]
Healthcare data analytics Da-Wei Wang Institute of Information Science [email protected] Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics
EXPONENTIAL FUNCTIONS 8.1.1 8.1.6
EXPONENTIAL FUNCTIONS 8.1.1 8.1.6 In these sections, students generalize what they have learned about geometric sequences to investigate exponential functions. Students study exponential functions of the
203.4770: Introduction to Machine Learning Dr. Rita Osadchy
203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:
Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, [email protected] Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter
A Non-Linear Schema Theorem for Genetic Algorithms
A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
An Open Framework for Reverse Engineering Graph Data Visualization. Alexandru C. Telea Eindhoven University of Technology The Netherlands.
An Open Framework for Reverse Engineering Graph Data Visualization Alexandru C. Telea Eindhoven University of Technology The Netherlands Overview Reverse engineering (RE) overview Limitations of current
For additional information, see the Math Notes boxes in Lesson B.1.3 and B.2.3.
EXPONENTIAL FUNCTIONS B.1.1 B.1.6 In these sections, students generalize what they have learned about geometric sequences to investigate exponential functions. Students study exponential functions of the
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired
Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath)
Secure Because Math: Understanding ML- based Security Products (#SecureBecauseMath) Alex Pinto Chief Data Scientist Niddel / MLSec Project @alexcpsec @MLSecProject @NiddelCorp MLSec Project / Niddel MLSec
Advanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
Buckets: Visualizing NBA Shot Data CPSC 547 Project Proposal
Buckets: Visualizing NBA Shot Data CPSC 547 Project Proposal [email protected] This game has always been, and will always be, about buckets. Bill Russell, 11-time NBA champion Domain, Task, Dataset Sports
Imputing Values to Missing Data
Imputing Values to Missing Data In federated data, between 30%-70% of the data points will have at least one missing attribute - data wastage if we ignore all records with a missing value Remaining data
Functional Data Analysis of MALDI TOF Protein Spectra
Functional Data Analysis of MALDI TOF Protein Spectra Dean Billheimer [email protected]. Department of Biostatistics Vanderbilt University Vanderbilt Ingram Cancer Center FDA for MALDI TOF
Multiphysics Software Applications in Reverse Engineering
Multiphysics Software Applications in Reverse Engineering *W. Wang 1, K. Genc 2 1 University of Massachusetts Lowell, Lowell, MA, USA 2 Simpleware, Exeter, United Kingdom *Corresponding author: University
Supervised and unsupervised learning - 1
Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
Models of Cortical Maps II
CN510: Principles and Methods of Cognitive and Neural Modeling Models of Cortical Maps II Lecture 19 Instructor: Anatoli Gorchetchnikov dy dt The Network of Grossberg (1976) Ay B y f (
The Tangent Bundle. Jimmie Lawson Department of Mathematics Louisiana State University. Spring, 2006
The Tangent Bundle Jimmie Lawson Department of Mathematics Louisiana State University Spring, 2006 1 The Tangent Bundle on R n The tangent bundle gives a manifold structure to the set of tangent vectors
Vector Spaces; the Space R n
Vector Spaces; the Space R n Vector Spaces A vector space (over the real numbers) is a set V of mathematical entities, called vectors, U, V, W, etc, in which an addition operation + is defined and in which
Dmitri Krioukov CAIDA/UCSD
Hyperbolic geometry of complex networks Dmitri Krioukov CAIDA/UCSD [email protected] F. Papadopoulos, M. Boguñá, A. Vahdat, and kc claffy Complex networks Technological Internet Transportation Power grid
Editing Common Polygon Boundary in ArcGIS Desktop 9.x
Editing Common Polygon Boundary in ArcGIS Desktop 9.x Article ID : 100018 Software : ArcGIS ArcView 9.3, ArcGIS ArcEditor 9.3, ArcGIS ArcInfo 9.3 (or higher versions) Platform : Windows XP, Windows Vista
TEXTURE AND BUMP MAPPING
Department of Applied Mathematics and Computational Sciences University of Cantabria UC-CAGD Group COMPUTER-AIDED GEOMETRIC DESIGN AND COMPUTER GRAPHICS: TEXTURE AND BUMP MAPPING Andrés Iglesias e-mail:
BB2798 How Playtech uses predictive analytics to prevent business outages
BB2798 How Playtech uses predictive analytics to prevent business outages Eli Eyal, Playtech Udi Shagal, HP June 13, 2013 Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained
Classification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
w ki w kj k=1 w2 ki k=1 w2 kj F. Aiolli - Sistemi Informativi 2007/2008
RSV of the Vector Space Model The matching function RSV is the cosine of the angle between the two vectors RSV(d i,q j )=cos(α)= n k=1 w kiw kj d i 2 q j 2 = n k=1 n w ki w kj n k=1 w2 ki k=1 w2 kj 8 Note
Auditing EMR System Usage. You Chen Jan, 17, 2013 [email protected]
Auditing EMR System Usage You Chen Jan, 17, 2013 [email protected] Health data being accessed by hackers, lost with laptop computers, or simply read by curious employees Anomalous Usage You Chen,
A Secure Online Reputation Defense System from Unfair Ratings using Anomaly Detections
A Secure Online Reputation Defense System from Unfair Ratings using Anomaly Detections Asha baby PG Scholar,Department of CSE A. Kumaresan Professor, Department of CSE K. Vijayakumar Professor, Department
Elementary Statistics
Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,
Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA -
Course on Functional Analysis ::: Madrid, June 31st, 2007. Gonzalo Gómez, PhD. [email protected] Bioinformatics Unit CNIO ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA
Similarity Search in a Very Large Scale Using Hadoop and HBase
Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France
Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics
Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)
High-dimensional labeled data analysis with Gabriel graphs
High-dimensional labeled data analysis with Gabriel graphs Michaël Aupetit CEA - DAM Département Analyse Surveillance Environnement BP 12-91680 - Bruyères-Le-Châtel, France Abstract. We propose the use
Basic Understandings
Activity: TEKS: Exploring Transformations Basic understandings. (5) Tools for geometric thinking. Techniques for working with spatial figures and their properties are essential to understanding underlying
Big Data for Satellite Business Intelligence
Big Data for Satellite Business Intelligence GSAW 2015 Loic COULET, Kratos ISE 2015 by Kratos ISE. Published by The Aerospace Corporation with permission. Who s talking? Computer Science Passionate Kratos
Christian Bettstetter. Mobility Modeling, Connectivity, and Adaptive Clustering in Ad Hoc Networks
Christian Bettstetter Mobility Modeling, Connectivity, and Adaptive Clustering in Ad Hoc Networks Contents 1 Introduction 1 2 Ad Hoc Networking: Principles, Applications, and Research Issues 5 2.1 Fundamental
Teaching Mathematics and Statistics Using Tennis
Teaching Mathematics and Statistics Using Tennis Reza Noubary ABSTRACT: The widespread interest in sports in our culture provides a great opportunity to catch students attention in mathematics and statistics
Constrained Tetrahedral Mesh Generation of Human Organs on Segmented Volume *
Constrained Tetrahedral Mesh Generation of Human Organs on Segmented Volume * Xiaosong Yang 1, Pheng Ann Heng 2, Zesheng Tang 3 1 Department of Computer Science and Technology, Tsinghua University, Beijing
A Learning Based Method for Super-Resolution of Low Resolution Images
A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 [email protected] Abstract The main objective of this project is the study of a learning based method
Exploratory data analysis approaches unsupervised approaches. Steven Kiddle With thanks to Richard Dobson and Emanuele de Rinaldis
Exploratory data analysis approaches unsupervised approaches Steven Kiddle With thanks to Richard Dobson and Emanuele de Rinaldis Lecture overview Page 1 Ø Background Ø Revision Ø Other clustering methods
YMCA Basketball Games and Skill Drills for 3 5 Year Olds
YMCA Basketball Games and s for 3 5 Year Olds Tips ( s) Variations Page 2 Dribbling Game 10 Players will learn that they must be able to dribble to attack the basket (target) to score in basketball. The
Random Map Generator v1.0 User s Guide
Random Map Generator v1.0 User s Guide Jonathan Teutenberg 2003 1 Map Generation Overview...4 1.1 Command Line...4 1.2 Operation Flow...4 2 Map Initialisation...5 2.1 Initialisation Parameters...5 -w xxxxxxx...5
