INF563 Topological Data Analysis Steve Oudot, Mathieu Carrière {firstname.lastname}@inria.fr
Context: The data deluge
  Note: Data of this type arise in scientific and industrial contexts [...]
  Data are generated at an unprecedented rate by: academia, industry, the general public
  Need for new scalable methods to analyze and classify these data automatically
Exploratory analysis of geometric data
  Note: My research falls within the context of exploratory data analysis, whose obj[ective] [...]
  Input: a set of data points with a metric or (dis-)similarity measure
  A data point can be a 3D point, an image patch, an image or 3D shape in a collection, a Facebook user, etc.
  Goal: describe the underlying structure of the data, for interpretation or summary
Challenges
  Noise
  Scale
  Dimensionality (figure labels: R^d, R^k)
Challenges
  Topology
  4 million data points in R^9 (source: [Lee, Pedersen, Mumford 2003])
  Motivation: study the cognitive representation of the space of images
  Underlying structure: Klein bottle (source: [Carlsson, Ishkhanov, de Silva, Zomorodian 2008])
  Figure panels: PCA, k-PCA, Isomap
Challenges
  Topology
  Note: each node represents an NBA player; links represent proximity relations in a 7-dimensional space.
  (source: http://www.sloansportsconference.com/wp-content/uploads/2012/03/alagappan-muthu-eosmarch2012ppt.pdf)
The topology of data (TDA)
  Note: This is our goal at large. To achieve it, we use concepts and tools from algebraic t[opology] [...]
  A.T. in the 20th century: triangulation -> topological invariants for classification, like homology groups, or the dimension of their free part (called Betti numbers). Figure example: β0 = β2 = 1, β1 = 2
  A.T. in the 21st century: point cloud / compact set -> topological descriptors for inference and comparison (β0, β1, β2)
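To make the notion of Betti numbers concrete, here is a minimal sketch that computes them from a triangulation using ranks of boundary matrices, via β_k = dim C_k - rank ∂_k - rank ∂_{k+1} with rational coefficients. The helper names (`rank`, `boundary_matrix`, `betti_numbers`) and the choice of the 7-vertex triangulation of the torus as input are assumptions made for illustration; they are not taken from the slides, and this is not an efficient implementation.

```python
from fractions import Fraction
from itertools import combinations


def rank(matrix):
    """Rank of a matrix over the rationals, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def boundary_matrix(faces, simplices):
    """Boundary map from k-simplices (columns) to (k-1)-simplices (rows)."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for drop in range(len(s)):
            face = s[:drop] + s[drop + 1:]  # remove one vertex
            mat[index[face]][j] = (-1) ** drop  # alternating sign
    return mat


def betti_numbers(top_simplices):
    """Betti numbers (over Q) of the complex generated by the given simplices."""
    dim = max(len(s) for s in top_simplices) - 1
    # Collect all faces of every dimension as sorted vertex tuples.
    by_dim = [set() for _ in range(dim + 1)]
    for s in top_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            by_dim[k - 1].update(combinations(s, k))
    by_dim = [sorted(level) for level in by_dim]
    # ranks[k] = rank of the boundary map C_k -> C_{k-1}; zero at both ends.
    ranks = [0]
    for k in range(1, dim + 1):
        ranks.append(rank(boundary_matrix(by_dim[k - 1], by_dim[k])))
    ranks.append(0)
    return [len(by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(dim + 1)]


# 7-vertex triangulation of the torus (illustrative input):
# triangles {i, i+1, i+3} and {i, i+2, i+3}, indices taken mod 7.
torus = [(i, (i + 1) % 7, (i + 3) % 7) for i in range(7)] + \
        [(i, (i + 2) % 7, (i + 3) % 7) for i in range(7)]

print(betti_numbers(torus))  # [1, 2, 1]
```

The output matches the values β0 = β2 = 1, β1 = 2 quoted on the slide, which are the Betti numbers of a torus. Exact rational arithmetic is used here so that the ranks are not affected by floating-point tolerance; practical TDA software relies on much faster reduction algorithms.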
The TDA community (as of 2003)
  Stanford (G. Carlsson)
  Duke (H. Edelsbrunner)
  2 research groups (5-10 researchers)
The TDA community (as of 2007)
  Stanford (G. Carlsson, L. Guibas)
  Pomona (V. de Silva)
  Rutgers (K. Mischaikow)
  UPenn (Rob Ghrist)
  Duke (H. Edelsbrunner, J. Harer)
  Jagiellonian (M. Mrozek)
  IST Austria (H. Edelsbrunner)
  Technion (R. Adler)
  Topological Data Analysis (F. Chazal, S. Oudot)
  Geometrica (J.-D. Boissonnat, D. Cohen-Steiner)
  8-10 research groups (40-50 researchers)
The TDA community (as of 2014)
  Stanford (G. Carlsson, L. Guibas)
  Edinburgh, MPI, Münster
  IMA, TTI, OSU, UConn
  Jagiellonian (M. Mrozek)
  Rutgers (K. Mischaikow)
  IST Austria (H. Edelsbrunner)
  UPenn (Rob Ghrist)
  ETH, Bologna
  Pomona (V. de Silva)
  Duke (H. Edelsbrunner, J. Harer)
  Technion (R. Adler)
  (F. Chazal, S. Oudot)
  ENS Paris, U. Paris-Est
  Geometrica (J.-D. Boissonnat, D. Cohen-Steiner)
  Gipsa-lab, LJK
  100-150 researchers at the theory level
  200-300 researchers at the applications level
  Research themes: applied topology, algorithmics, data science
  Note: One of the distinctive features of the Geometrica team is that it looks at all 3 aspects.
  Success stories: natural images, dynamical systems, NBA, breast cancer
A few applications
  Note: Building on our stability result and our new theoretical framework for the analy[sis] [...]
  Scalar field analysis over sensor networks [Gao, Guibas, O., Wang 2010] [Chazal, Guibas, O., Skraba 2011] (figure labels: R, sensors)
  Stable signatures for shape comparison [Chazal, Cohen-Steiner, Guibas, Mémoli, O. 2009] [Chazal, de Silva, O. 2013] [Chazal, Glisse, Labruère, Michel 2014] (figure classes: camel, cat, elephant, face, head, horse)
  Unsupervised learning with guarantees on the number of clusters [Chazal, Guibas, O., Skraba 2013]
Course outline
  Session 1: dimensionality reduction (linear vs. non-linear) + lab
  Session 2: clustering (hierarchical, mode-seeking) + lab
  Session 3: homology theory + exercises
  Session 4: size theory, persistence + exercises
  Session 5: topological inference I + exercises/lab
  Session 6: topological inference II + exercises/lab
  Session 7: topological signatures I + lab
  Session 8: topological signatures II + lab
  Session 9: Mapper + lab
  Evaluation: written exam