Introduction to Topology and its Applications to Complex Data



Similar documents
Euler Paths and Euler Circuits

Clustering and mapper

WORKSHOP ON TOPOLOGY AND ABSTRACT ALGEBRA FOR BIOMEDICINE

Graph Theory Origin and Seven Bridges of Königsberg -Rhishikesh

The Basics of Graphical Models

12 Surface Area and Volume

Million Dollar Mathematics!

Graph Theory Lecture 3: Sum of Degrees Formulas, Planar Graphs, and Euler s Theorem Spring 2014 Morgan Schreffler Office: POT 902

Geometry and Topology from Point Cloud Data

Topological Data Analysis Applications to Computer Vision

Actually Doing It! 6. Prove that the regular unit cube (say 1cm=unit) of sufficiently high dimension can fit inside it the whole city of New York.

For example, estimate the population of the United States as 3 times 10⁸ and the

Graph theory and network analysis. Devika Subramanian Comp 140 Fall 2008

Master of Arts in Mathematics

Introduction to Quadratic Functions

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Topological Treatment of Platonic, Archimedean, and Related Polyhedra

Shape Dictionary YR to Y6

Support Materials for Core Content for Assessment. Mathematics

STRAND: Number and Operations Algebra Geometry Measurement Data Analysis and Probability STANDARD:

3D shapes. Level A. 1. Which of the following is a 3-D shape? A) Cylinder B) Octagon C) Kite. 2. What is another name for 3-D shapes?

Surface bundles over S 1, the Thurston norm, and the Whitehead link

TDA and Machine Learning: Better Together

Arrangements And Duality

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

Introduction to Graph Theory

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005

A Working Knowledge of Computational Complexity for an Optimizer

Indicator 2: Use a variety of algebraic concepts and methods to solve equations and inequalities.

Prentice Hall Algebra Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009

TOPOLOGY OF SINGULAR FIBERS OF GENERIC MAPS

Solutions to old Exam 1 problems

Lecture 7: NP-Complete Problems

Lecture 4 DISCRETE SUBGROUPS OF THE ISOMETRY GROUP OF THE PLANE AND TILINGS

How to write a technique essay? A student version. Presenter: Wei-Lun Chao Date: May 17, 2012

A Non-Linear Schema Theorem for Genetic Algorithms

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

ON TORI TRIANGULATIONS ASSOCIATED WITH TWO-DIMENSIONAL CONTINUED FRACTIONS OF CUBIC IRRATIONALITIES.

Zachary Monaco Georgia College Olympic Coloring: Go For The Gold

1 Symmetries of regular polyhedra

The Clar Structure of Fullerenes

What are the place values to the left of the decimal point and their associated powers of ten?

Cycles in a Graph Whose Lengths Differ by One or Two

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Prentice Hall: Middle School Math, Course Correlated to: New York Mathematics Learning Standards (Intermediate)

The Open University s repository of research publications and other research outputs

Revised Version of Chapter 23. We learned long ago how to solve linear congruences. ax c (mod m)

8.1 Examples, definitions, and basic properties

Geometry and Measurement

The degree of a polynomial function is equal to the highest exponent found on the independent variables.

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

Midterm Practice Problems

SNP Essentials The same SNP story

SPERNER S LEMMA AND BROUWER S FIXED POINT THEOREM

136 CHAPTER 4. INDUCTION, GRAPHS AND TREES

Problem of the Month: Cutting a Cube

with functions, expressions and equations which follow in units 3 and 4.

Solving Simultaneous Equations and Matrices

Protein Protein Interaction Networks

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Data Mining: Algorithms and Applications Matrix Math Review

Social Media Mining. Data Mining Essentials

Sum of Degrees of Vertices Theorem

2.2. Instantaneous Velocity

WHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT?

NP-Completeness I. Lecture Overview Introduction: Reduction and Expressiveness

Common Core Unit Summary Grades 6 to 8

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Teaching and Learning 3-D Geometry

Graph Theory Problems and Solutions

A REMARK ON ALMOST MOORE DIGRAPHS OF DEGREE THREE. 1. Introduction and Preliminaries

Class One: Degree Sequences

How To Understand And Understand The Theory Of Computational Finance

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Models of Cortical Maps II

Bicycle Math. presented to the Olivetti Club. Timothy E. Goldberg. March 30, Cornell University Ithaca, New York

Medial Axis Construction and Applications in 3D Wireless Sensor Networks

MATHS LEVEL DESCRIPTORS

Lecture 17 : Equivalence and Order Relations DRAFT

Data Science Center Eindhoven. Big Data: Challenges and Opportunities for Mathematicians. Alessandro Di Bucchianico

CS311H. Prof: Peter Stone. Department of Computer Science The University of Texas at Austin

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Boulder Dash is NP hard

The Data Mining Process

Just the Factors, Ma am

: Introduction to Machine Learning Dr. Rita Osadchy

Illinois State Standards Alignments Grades Three through Eleven

This puzzle is based on the following anecdote concerning a Hungarian sociologist and his observations of circles of friends among children.

Level 1 - Maths Targets TARGETS. With support, I can show my work using objects or pictures 12. I can order numbers to 10 3

Integer Operations. Overview. Grade 7 Mathematics, Quarter 1, Unit 1.1. Number of Instructional Days: 15 (1 day = 45 minutes) Essential Questions

Elements of Art Name Design Project!

Mathematics. Mathematical Practices

Transcription:

Task: In each of these two beautiful parks, try to find a path (to walk) that uses each bridge once and exactly once. Start with the park on the right first. (The blue denotes a river that contains alligators, so no swimming allowed!) Can you find a path that starts on one of the two islands? Yogesh More 1 / 63

Introduction to Topology and its Applications to Complex Data Yogesh More Yogesh More 2 / 63

I will introduce the branch of mathematics called Topology through some of the puzzles that first gave rise to the subject 100 to 200 years ago. These puzzles illustrate some ideas topologists started exploring around 1900. These classical ideas have recently found applications in the analysis of complex data. Yogesh More 3 / 63

Key aspects of Topology Three key aspects of Topology are: 1 deformation invariance 2 compressed representation (e.g. graphs or networks) 3 discrete invariants (e.g, numbers, groups, rings,...) Yogesh More 4 / 63

What is topology? Topology is the mathematical study of the properties that are preserved through deformations, twistings, and stretchings of objects. Tearing, however, is not allowed. So topology focuses on the qualitative aspects of objects or shapes. (http://mathworld.wolfram.com/topology.html) Old joke: A topologist is someone who cannot distinguish between a doughnut and a coffee cup Topology is not topography - the field of geoscience and planetary science comprising the study of surface shape and features of the Earth and other observable astronomical objects Yogesh More 5 / 63

History of topology and roadmap for this talk 1736: Leonhard Euler s paper on the Seven Bridges of Konigsberg 1863: Mobius gave a classification of two-sided surfaces: every such surface can be deformed to one of the following g holed surfaces: 1871: To any shape X, Betti associated Betti numbers b 0 (X ), b 1 (X ), b 2 (X ), b 3 (X ),.... These numbers measure the connectedness, number of loops, holes, and higher dimensional analogs of holes in the shape X. Betti numbers of a (solid) 3 dimensional ball D 3 (e.g dodgeball, but not a soccer ball) are b 0 = 1, b 1 = 0, b 2 = 0. Betti numbers of the 2-sphere S 2 (e.g soccer ball or football, but not a dodgeball) are b 0 = 1, b 1 = 0, b 2 = 1. Yogesh More 6 / 63

History of topology and roadmap for this talk 1895: Henri Poincare: provided new foundation to Betti numbers by introducing homology theory asked whether Betti numbers are enough to specify the shape (up to homeomorphism) Found the answer was no; revised question to whether contractibility of loops is enough to distinguish the 3-sphere S 3 (Poincare conjecture, took 100 years to prove the answer is yes ) 1925: Emmy Noether shifted the focus from Betti numbers to homology groups 2000-present: Gunnar Carlsson and others are using homology theory to analyse complex data arising in all sort of situations such as cancer research, drug discovery, financial-market research, voting patterns,... Yogesh More 7 / 63

Bridges of Konigsberg (now called Kaliningrad, in Russia) Photo credit: Vladimir Sedach, Wikipedia Yogesh More 8 / 63

Bridges of Konigsberg Image credit: Bogdan Giuc, Wikipedia Konigsberg Bridge problem: Find a path that crosses each bridge exactly once. It doesn t have the start and end at the same place. Such a path is now called an Euler path. Yogesh More 9 / 63

Task: Try to find an Euler path in each of these two beautiful parks. Start with the park on the right first. Yogesh More 10 / 63

Here is one solution to the map that was on the right: The map that was on the left has no solution. But why? Yogesh More 11 / 63

Euler s solution Form a graph: Yogesh More 12 / 63

Euler s solution Form a graph, label each vertex by its degree: Yogesh More 13 / 63

Euler s solution Each time we enter and exit a vertex, we use two of the edges (basically).so for any vertex that is not the starting and ending point of an Euler path, we end up using an even number of edges. So since we want to use all edges, all vertices, except possibly the starting and ending points, must have even degree. Theorem (Euler, 1736) A graph has an Euler path if and only if at most two vertices have odd degree. A Bridge problem has an Euler path if and only if at most two of the land masses have an odd number of bridges. So for Konigsberg in particular, there is no such path. Yogesh More 14 / 63

Key aspects of topology Our three key aspects of topology, applied to Konigberg bridge problem: 1 deformation invariance: size and placement of land masses and bridges didn t matter 2 compressed representation: graphs or networks 3 discrete invariant (e.g number): degree of each vertex (number of bridges on each land mass) Yogesh More 15 / 63

Magic Trick Draw any connected graph, tell me the number of vertices V and edges E, and I will tell you the number H of holes (or bounded regions) it has. Yogesh More 16 / 63

The magic formula H = E V + 1 This formula comes from a bigger story. Yogesh More 17 / 63

Platonic solids Table credit: Wikipedia Yogesh More 18 / 63

Platonic solids: Euler characteristic Table credit: Wikipedia Yogesh More 19 / 63

Soccer ball, or truncated icosahedron Photo credit: Aaron Rotenberg, Wikipedia 12 pentagonal faces, 20 hexagonal faces V = 60, E = 90, F = 12 + 20 = 32 V E + F = 2 Yogesh More 20 / 63

Euler s Polyhedron Formula Theorem (Euler s Polyhedron Formula) For any decomposition of a spherical object into V vertices, E edges, and F faces, we have V E + F = 2 Yogesh More 21 / 63

Back to magic trick How to apply Euler s polyhedron formula to our graph? We need a sphere... Trick: bend the paper into a sphere! Then the unbounded region of the plane becomes a (big) face. Hence the total number of faces is one more than the number of holes in our graph: F = H + 1 Plugging this into Euler s formula V E + F = 2 we get Solving for H gives V E + (H + 1) = 2 H = E V + 1 Yogesh More 22 / 63

Euler s formula for torus instead of a sphere What if we replace the sphere with a torus? Yogesh More 23 / 63

Euler s formula for surface with 2 holes What if we replace the sphere with a surface wth 2-holes? Yogesh More 24 / 63

Euler s formula for a surface with g holes Theorem For a surface with g-holes, V E + F = 2 2g The quantity V E + F is called the Euler characteristic. More generally, the Euler characteristic χ(x ) of any shape X is defined to be the alternating sum of the Betti numbers χ(x ) = b 0 b 1 + b 2 b 3 + Yogesh More 25 / 63

Recall our three key aspects of topology: 1 deformation invariance: the value of V E + F is the same for tetrahedron, soccer ball, or any triangulation of a spherical surface 2 compressed representation: we can represent a sphere by (the exterior of) a tetrahedron 3 discrete invariants: V E + F, g Yogesh More 26 / 63

Simplices: A topologist s building blocks Yogesh More 27 / 63

Betti numbers Any shape has Betti numbers, b 0, b 1, b 2,...,. The number b k is roughly a measure of the number of k + 1-dimensional holes. The first few Betti numbers have the following definitions for 0-dimensional, 1-dimensional, and 2-dimensional simplicial complexes: b 0 is the number of connected components b 1 is the number of circular holes b 2 is the number of two-dimensional voids or cavities In higher dimension we lose the ability to visualize geometry, but b 3, b 4, can be defined algebraically. Yogesh More 28 / 63

Example of Betti numbers Betti numbers of a (solid) 3 dimensional ball D 3 (e.g dodgeball, but not a soccer ball) are b 0 = 1, b 1 = 0, b 2 = 0. Betti numbers of the 2-sphere S 2 (e.g soccer ball or football, but not a dodgeball) are b 0 = 1, b 1 = 0, b 2 = 1. Image credit: Salix alba, Wikipedia Yogesh More 29 / 63

Example of Betti numbers Image credit: Krishnavedala, Wikipedia Betti numbers of a (hollow) torus are b 0 = 1, b 1 = 2, b 2 = 1. Betti numbers of a solid torus (donut!) are b 0 = 1, b 1 = 1, b 2 = 0. Betti numbers of the surface X g with g-holes are b 0 (X g ) = 1, b 1 (X g ) = 2g, b 2 (X g ) = 1. Yogesh More 30 / 63

Emmy Noether (1882-1935) Photo credit: Bryn Mawr College Archives, Wikipedia Emmy Noether was one of the leading mathematicians of her time. She developed the theories of rings, fields, and algebras (MA 5120 Abstract Algebra). One of her insights was to shift everyone s focus from Betti numbers b k (X ) of a shape X to more complicated algebraic objects called the homology groups H k (X ) Betti numbers can be recovered as the size the homology groups: b k (X ) = rank H k (X ) Recall we said discrete invariants can be numbers, groups, rings,... Yogesh More 31 / 63

Emmy Noether (1882-1935) In the spring of 1915, Noether was invited to the University of Gottingen by David Hilbert and Felix Klein. Their effort to recruit her, however, was blocked by some of the faculty. One faculty member protested: What will our soldiers think when they return [from WW I] to the university and find that they are required to learn at the feet of a woman? Hilbert responded with indignation, stating, I do not see that the sex of the candidate is an argument against her admission as privatdozent [subordinate teaching duties]. After all, we are a university, not a bath house. During her first years teaching at Gottingen she did not have an official position and was not paid; her family paid for her room and board and supported her academic work. Her lectures often were advertised under Hilbert s name. Yogesh More 32 / 63

Emmy Noether (1882-1935) 1933 Hitler, Nazi party, came to power. Noether, who was Jewish, was fired from her position. 1933 Noether moved to Bryn Mawr College (a women s liberal arts college near Philadelphia) Noether died in 1935 of cancer Noether provided invaluable methods of abstract conceptualization. Van der Waerden said that Noether s originality was absolute beyond comparison. Shapes Homology Betti Groups rank Numbers Yogesh More 33 / 63

Simplicial Homology (1900s) Let X be our shape, e.g a sphere. Divide it up into simplices in any way you like (deformation invariance, compressed representation). Let C k (X ) denote the (free abelian group generated by) k-simplices. Define the boundary map (or function) k : C k (X ) C k 1 (X ) to be the map taking a k-simplex σ as input and giving the alternating sum of its boundary as output Yogesh More 34 / 63

Simplicial homology Putting these groups and maps together we get a chain complex C 3 (X ) 3 C 2 (X ) 2 C 1 (X ) 1 C 0 (X ) 0 The simplicial homology group H k (X ) are the cycles modulo boundaries : cycles are elements of C k (X ) that k maps to 0 boundaries are elements of C k (X ) that equal k+1 σ for some σ C k+1 Yogesh More 35 / 63

Simplicial homology Computing the simplicial homology groups H k (X ) by hand can get complicated quickly, so mathematicians have found other equivalent definitions (cellular homology, singular homology). Mathematicians developed these other equivalent theories from 1900-1930 But computing simplicial homology is relatively easy for a computer (foreshadowing). Yogesh More 36 / 63

Little blurb about my research interests Simplicial/singular/cellular homology has been extensively studied. But it turns out to be just one way of going from Shapes Groups 1950s, 1960s, 1970s mathematicians started to find other sorts of (co)-homology theories: K-theory Cobordism theories Morava K-theories K n,p, one for every nonnegative integer n, and and prime number p Yogesh More 37 / 63

And now for something completely different This talk is getting too technical, so let s change directions... from the Math circa 1900 to Art circa 1900! Yogesh More 38 / 63

Sunday Afternoon on the Island of La Grande Jatte Image credit: Georges Seurat - Art Institute of Chicago, Wikipedia Some say they see poetry in my painting, I see only science. -Georges Seurat Yogesh More 39 / 63

Sunday Afternoon on the Island of La Grande Jatte Flickr Photo credit: Jennifer Tharp, Yogesh More 40 / 63

Sunday Afternoon on the Island of La Grande Jatte Photo credit: Jennifer Tharp, Flickr Yogesh More 41 / 63

Sunday Afternoon on the Island of La Grande Jatte Photo credit: Tom S., Wikipedia Georges Seurat s most famous work, and is an example of pointillism. 7 10 feet in size, now exhibited in the Art Institute of Chicago Took two years to paint (1884-1886) Estimated to consist of approximately 3,456,000 dots (or so says a page on internet) Yogesh More 42 / 63

Your brain is a topological data analysis machine Image credit: Georges Seurat - Art Institute of Chicago, Wikipedia Your brain can: extract shape from millions of dots (or much fewer than a million dots) extract meaning from the shape What s the difference between the two? Very big difference. The same as the difference between taking a reservation and holding a reservation. Seinfeld Clip Yogesh More 43 / 63

Data Studying data has become an extremely hot topic within the past decade: Data science, machine learning, data analytics, etc. Data can often be represented as points (in the xy-plane, or in R n for some n) There are many tools to study data: Average, Linear regression, Principal Component Analysis (PCA), Clustering... Yogesh More 44 / 63

Data is messy Data doesn t always form a straight line. It can take various shapes: One could try to develop or find a regression model for each type of shape. But that s not practical since there are an immense variety of possible shapes Instead, let s find a flexible way of dealing with all shapes. That s where topology comes in. Yogesh More 45 / 63

Topology is back New idea: use topology to construct a set of tools to find shape in data. Topological Data Analysis, used in the work Gunnar Carlsson (Stanford) and others 2000-present Persistent homology, popularized by Robert Ghrist (U Penn) 2000-present Carlsson co-founded a company, Ayasdi that applies these techniques to various business sectors: drug discovery, oil and gas exploration, and financial-market research Caution: Find meaning (if any) in the shape is a separate task, requiring domain expertise. Yogesh More 46 / 63

Data and Topology Why study data using topology? Aspects of Data: subject to noise (e.g experimental error) can be large (e.g millions of data points) we want to extract some information from the data Aspects of Topology deformation invariance: not sensitive to minor variations compressed representation discrete invariants: topological statistics Yogesh More 47 / 63

Topological Data Analysis applied to breast cancer research In 2010, Gunnar Carlsson, et. al. [Lum] applied a Topological Data Analysis to two old breast cancer databases and within minutes discovered something new: We identified a previously unknown subgroup of oncology survivors who exhibited genetic indicators of poor survivors [low ESR1 levels]. This will allow us to better understand this group and potentially help improve survival rates for this disease, which might potentially help us find a cure. -Devi Ramanan, Ayasdi head of collaboration [Ram] These insights had eluded more traditional study for more than a decade. Using Topological Data Analysis, Ayasdi was to discover new insights within minutes. [Sym] Yogesh More 48 / 63

Breast cancer research results [Lum] Image credit: [Lum] Yogesh More 49 / 63

Raw data was gene expression levels of 1500 genes, in 272 tumors. So 272 points in R 1500. Define a notion of distance between two points in R 1500 Use a filter function f to put tumors into overlapping bins/boxes/groups f 1 (U i ) Each node represents a cluster of tumors in a bin Nodes are connected if and only if they have at least one tumor in common Coordinates Yogesh More or placement of any individual node has no meaning 50 / 63 Breast cancer research results [Lum] Image credit: [Lum]

Topological Data Analysis Recipe, by toy example Toy example presented in [Aya]. Step 0: Raw Data Goal: Get a compressed representation (i.e. graph or network) that captures the shape Yogesh More 51 / 63

Topological Data Analysis Recipe, by toy example Goal: Get a compressed representation (i.e. graph or network) that captures the shape Yogesh More 52 / 63

Topological Data Analysis Recipe Step 1: Apply filter function f In our toy example, f takes each data point and returns its y-coordinate. (In the breast cancer study, the filter functor took each data point and returned the distance to the furthest data point. This is called the L -centrality function. The choice of filter function affects the output significantly.) Yogesh More 53 / 63

Topological Data Analysis Recipe Step 2: Cover the target of the filter function by intervals. The size and number of the intervals can be varied. Yogesh More 54 / 63

Topological Data Analysis Recipe Step 3: Take the inverse image under f of each interval U i, to create bins of data points. There are four bins of data, only two are shown on this slide. Yogesh More 55 / 63

Topological Data Analysis Recipe Step 3: Take the inverse image under f of each interval U i, to create bins of data points. Here are all four bins. Yogesh More 56 / 63

Topological Data Analysis Recipe Step 4: Apply a clustering algorithm to each bin Yogesh More 57 / 63

Topological Data Analysis Recipe Step 5: Connect two nodes if they both represent a common data point Yogesh More 58 / 63

Topological Data Analysis Recipe Step 5: Connect two nodes if they both represent a common data point Yogesh More 59 / 63

Topological Data Analysis Recipe Step 6: Color the nodes via functions of interest Yogesh More 60 / 63

Topological Data Analysis Recipe Step 7 (The hardest and most important): Find meaning! Check if your findings are consistent with the experience of domain experts Be willing to consider the possibility that TDA might not be give anything useful, in which case use other tools (persistent homology, other tools from statistics) Yogesh More 61 / 63

References Aya Ayasdi. TDA and Machine Learning: Better Together. GC1 Gunnar Carlsson. Why topological data analysis works http://www.ayasdi.com/blog/bigdata/why-topological-data-analysis-works/ Lum P.Y. Lum et al. Extracting insights from the shape of complex data using topology, Sci Rep 3:1236, (2013) NYT New York Times, Jauary 16, 2013. Ram http://www.ourdigitalmags.com/publication/?i=252002p=49 Sym Jonathan Symonds. http://www.ayasdi.com/blog/bigdata/big-data-brings-bigbenefits-to-drug-discovery/ Wiki Wikipedia All photos Yogesh More 62 / 63

The End Don t forget to hold the reservation! Yogesh More 63 / 63