Topological Data Analysis

INF563 Topological Data Analysis Steve Oudot, Mathieu Carrière {firstname.lastname}@inria.fr

Context: the data deluge. Data are generated at an unprecedented rate by academia, industry, and the general public. (Speaker note: data of this kind arise in both scientific and industrial contexts.) Hence the need for new, scalable methods to analyze and classify these data automatically.

Exploratory analysis of geometric data. (Speaker note, truncated in the source: my research falls within the context of exploratory data analysis, whose …) Input: a set of data points equipped with a metric or a (dis)similarity measure. A data point can be a 3D point, an image patch, an image or 3D shape in a collection, a Facebook user, etc. Goal: describe the underlying structure of the data, for interpretation or summary.
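To make the input concrete: a minimal sketch, assuming the data points are vectors in R^d and the metric is Euclidean (any other (dis)similarity measure would replace the distance computation). The helper name `pairwise_distances` and the toy square are illustrative, not taken from the course.

```python
import numpy as np

def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Return the n x n Euclidean distance matrix of an (n, d) point cloud."""
    # Broadcast (n, 1, d) - (1, n, d) -> (n, n, d), then take norms along d.
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy data: the four corners of a unit square in R^2.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = pairwise_distances(square)
print(np.round(D, 3))
```

Several constructions that only use distances, such as Rips complexes and single-linkage clustering, can be driven from such a matrix alone.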

Challenges: noise; scale; dimensionality (R^d vs. R^k).

Challenges: topology. Example: 4 million data points in R^9 (source: [Lee, Pedersen, Mumford 2003]). Motivation: study the cognitive representation of the space of images. Underlying structure: a Klein bottle (source: [Carlsson, Ishkhanov, de Silva, Zomorodian 2008]). [Figure: the data projected by PCA, kernel PCA (k-PCA), and Isomap.]
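For reference, the linear baseline in the figure, PCA, can be sketched via NumPy's SVD. This is a minimal sketch, assuming Euclidean data; the function name `pca` and the toy circle are illustrative. A circle lying in a plane of R^3 is recovered by the top two components. By contrast, the Klein bottle cannot be embedded in R^3 at all, so no linear projection to 3 or fewer dimensions is injective on it, which motivates the non-linear methods (k-PCA, Isomap).

```python
import numpy as np

def pca(X: np.ndarray, k: int) -> np.ndarray:
    """Project an (n, d) data matrix onto its top-k principal components."""
    Xc = X - X.mean(axis=0)  # center each coordinate
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T     # (n, k) coordinates in R^k

# Toy data: a unit circle in the plane spanned by orthonormal u, v in R^3.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
u = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
v = np.array([0.0, 0.0, 1.0])
X = np.outer(np.cos(t), u) + np.outer(np.sin(t), v)
Y = pca(X, 2)
print(Y.shape)  # (200, 2)
```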

Challenges: topology. Example: NBA players (speaker note: each node represents an NBA player, and links represent proximity relations in a 7-dimensional space). Source: http://www.sloansportsconference.com/wp-content/uploads/2012/03/alagappan-muthu-eosmarch2012ppt.pdf
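Networks of this kind are commonly built with the Mapper construction (Session 9 of the outline). A minimal sketch under simplifying assumptions: a 1-D filter function, a cover of its range by overlapping intervals, and single-linkage clustering with a fixed threshold inside each preimage. In this sketch nodes are clusters of points, and edges join clusters that share points. The names `mapper_graph`, `n_intervals`, `overlap` and `eps` are illustrative.

```python
import numpy as np
from itertools import combinations

def single_linkage_clusters(X, idx, eps):
    """Split the points X[idx] into groups linked by chains of steps < eps."""
    parent = {i: i for i in idx}
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for a, b in combinations(idx, 2):
        if np.linalg.norm(X[a] - X[b]) < eps:
            parent[find(a)] = find(b)
    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

def mapper_graph(X, filt, n_intervals=5, overlap=0.3, eps=0.5):
    """Return (nodes, edges): nodes are sets of point indices."""
    lo, hi = filt.min(), filt.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for j in range(n_intervals):
        # Interval j, enlarged on both sides by overlap * length.
        a = lo + j * length - overlap * length
        b = lo + (j + 1) * length + overlap * length
        idx = [i for i in range(len(X)) if a <= filt[i] <= b]
        if idx:
            nodes.extend(single_linkage_clusters(X, idx, eps))
    edges = [(p, q) for p, q in combinations(range(len(nodes)), 2)
             if nodes[p] & nodes[q]]
    return nodes, edges

# Toy data: 40 points on the unit circle, filtered by height (y coordinate).
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])
nodes, edges = mapper_graph(X, X[:, 1])
print(len(nodes), len(edges))  # 8 8: the Mapper graph is a cycle
```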

The topology of data (TDA). (Speaker note: this is our goal at large. To achieve it, we use concepts and tools from algebraic topology.) Algebraic topology (A.T.) in the 20th century: from a triangulation, compute topological invariants for classification, such as homology groups or the dimension of their free part (the Betti numbers), e.g. β0 = β2 = 1, β1 = 2. A.T. in the 21st century: from a point cloud (sampled from a compact set), compute topological descriptors (β0, β1, β2) for inference and comparison.
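Betti numbers of a finite simplicial complex can be computed from ranks of boundary matrices: β_k = n_k − rank ∂_k − rank ∂_{k+1}, where n_k is the number of k-simplices. A minimal sketch, assuming coefficients in Z/2 (a common choice in TDA software). This agrees with the rank of the free part whenever the integer homology has no torsion, which holds for the examples below but not, for instance, for the Klein bottle. The values β0 = β2 = 1, β1 = 2 are those of the torus, recovered here from a 3×3 grid triangulation. All function names are illustrative.

```python
from itertools import combinations

def gf2_rank(vectors):
    """Rank over Z/2 of a list of vectors encoded as integer bitmasks."""
    basis = {}  # leading bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)

def betti_numbers(top_simplices):
    """Z/2 Betti numbers of the complex generated by the given simplices."""
    faces = {}  # dimension -> set of sorted vertex tuples (closed under faces)
    for s in top_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            for f in combinations(s, k):
                faces.setdefault(k - 1, set()).add(f)
    simplices = {d: sorted(fs) for d, fs in faces.items()}
    top = max(simplices)
    index = {d: {s: i for i, s in enumerate(simplices[d])} for d in simplices}
    rank = {0: 0, top + 1: 0}  # rank[d] = rank of the boundary map on d-chains
    for d in range(1, top + 1):
        cols = []
        for s in simplices[d]:
            mask = 0
            for f in combinations(s, d):  # the d + 1 facets of a d-simplex
                mask |= 1 << index[d - 1][f]
            cols.append(mask)
        rank[d] = gf2_rank(cols)
    return [len(simplices[d]) - rank[d] - rank[d + 1] for d in range(top + 1)]

def torus(n=3):
    """n x n grid triangulation of the torus (n >= 3 gives a simplicial complex)."""
    def v(i, j):
        return (i % n) * n + (j % n)
    tris = []
    for i in range(n):
        for j in range(n):
            tris.append((v(i, j), v(i + 1, j), v(i + 1, j + 1)))
            tris.append((v(i, j), v(i, j + 1), v(i + 1, j + 1)))
    return tris

print(betti_numbers(torus()))  # [1, 2, 1]: beta_0 = beta_2 = 1, beta_1 = 2
```

The same function gives [1, 1] for a hollow triangle (a circle) and [1, 0, 1] for the boundary of a tetrahedron (a sphere).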

The TDA community (as of 2003): 2 research groups (5-10 researchers): Stanford (G. Carlsson) and Duke (H. Edelsbrunner).

The TDA community (as of 2007): 8-10 research groups (about 40-50 researchers): Stanford (G. Carlsson, L. Guibas); Pomona (V. de Silva); Rutgers (K. Mischaikow); UPenn (Rob Ghrist); Duke (H. Edelsbrunner, J. Harer); Jagiellonian (M. Mrozek); IST Austria (H. Edelsbrunner); Technion (R. Adler); (F. Chazal, S. Oudot); Geometrica (J.-D. Boissonnat, D. Cohen-Steiner).

The TDA community (as of 2014): Stanford (G. Carlsson, L. Guibas); UPenn (Rob Ghrist); Pomona (V. de Silva); Rutgers (K. Mischaikow); Duke (H. Edelsbrunner, J. Harer); IST Austria (H. Edelsbrunner); Jagiellonian (M. Mrozek); Technion (R. Adler); Geometrica (J.-D. Boissonnat, D. Cohen-Steiner); (F. Chazal, S. Oudot); Edinburgh, MPI, Münster; IMA, TTI, OSU, UConn; ETH, Bologna; ENS Paris, U. Paris-Est; Gipsa-lab, LJK. About 100-150 researchers at the theory level and 200-300 researchers at the applications level. Research themes: applied topology, algorithmics, data science. (Speaker note: one of the distinctive features of the Geometrica team is that it looks at all 3 aspects.) Success stories: natural images, dynamical systems, NBA, breast cancer, …

A few applications. (Speaker note, truncated in the source: building on our stability result and our new theoretical framework for the analysis …)
Scalar field analysis over sensor networks [Gao, Guibas, O., Wang 2010] [Chazal, Guibas, O., Skraba 2011].
Stable signatures for shape comparison [Chazal, Cohen-Steiner, Guibas, Mémoli, O. 2009] [Chazal, de Silva, O. 2013] [Chazal, Glisse, Labruère, Michel 2014]; figure examples: camel, cat, elephant, face, head, horse.
Unsupervised learning with guarantees on the number of clusters [Chazal, Guibas, O., Skraba 2013].
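The cited clustering method is not reproduced here. A simpler illustration of how persistence can indicate a number of clusters is 0-dimensional persistence of the Rips filtration, which coincides with single-linkage clustering. Every point is born at scale 0, and when two components merge at scale r, one class dies at r. This is a minimal sketch, assuming the convention that edge (i, j) enters at the distance |p_i − p_j| and using a largest-gap heuristic that carries no guarantee by itself. The function name and the toy Gaussian blobs are illustrative.

```python
import numpy as np
from itertools import combinations

def h0_persistence(points):
    """Death times of 0-dim classes in the Rips filtration, in increasing order.

    All classes are born at 0; the one class that never dies is omitted.
    """
    n = len(points)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    edges = sorted((float(np.linalg.norm(points[i] - points[j])), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for r, i, j in edges:  # Kruskal: process edges by increasing length
        ri, rj = find(i), find(j)
        if ri != rj:       # two components merge: one class dies at r
            parent[ri] = rj
            deaths.append(r)
    return deaths          # n - 1 values

rng = np.random.default_rng(0)
blobs = [rng.normal(c, 0.1, size=(20, 2)) for c in [(0, 0), (3, 0), (0, 3)]]
deaths = h0_persistence(np.vstack(blobs))
# The largest gap between consecutive death times separates short-lived
# (noise) classes from long-lived ones; add 1 for the class that never dies.
n_clusters = len(deaths) - int(np.argmax(np.diff(deaths)))
print(n_clusters)  # 3
```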

Course outline:
Session 1: dimensionality reduction (linear vs. non-linear) + lab
Session 2: clustering (hierarchical, mode-seeking) + lab
Session 3: homology theory + exercises
Session 4: size theory, persistence + exercises
Session 5: topological inference I + exercises/lab
Session 6: topological inference II + exercises/lab
Session 7: topological signatures I + lab
Session 8: topological signatures II + lab
Session 9: Mapper + lab
Evaluation: written exam