Topological Data Analysis Applications to Computer Vision



Similar documents
A fast and robust algorithm to count topologically persistent holes in noisy clouds

Geometry and Topology from Point Cloud Data

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

Part-Based Recognition

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Data Fusion Enhancing NetFlow Graph Analytics

Visualization of General Defined Space Data

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

ORIENTATIONS. Contents

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

A HYBRID APPROACH FOR AUTOMATED AREA AGGREGATION

Environmental Remote Sensing GEOG 2021

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on

Shortcut sets for plane Euclidean networks (Extended abstract) 1

Solving Geometric Problems with the Rotating Calipers *

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

Chapter Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

Similarity and Diagonalization. Similar Matrices

Analysis of Algorithms, I

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

High-dimensional labeled data analysis with Gabriel graphs

1 Local Brouwer degree

Computational Geometry. Lecture 1: Introduction and Convex Hulls

Random graphs with a given degree sequence

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

6.852: Distributed Algorithms Fall, Class 2

B490 Mining the Big Data. 2 Clustering

Visualization of 2D Domains

Off-line Model Simplification for Interactive Rigid Body Dynamics Simulations Satyandra K. Gupta University of Maryland, College Park

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Arrangements And Duality

Intersection of a Line and a Convex. Hull of Points Cloud

SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

Topological Data Analysis

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

3. Reaction Diffusion Equations Consider the following ODE model for population growth

Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications

Modélisation et résolutions numérique et symbolique

How To Create A Surface From Points On A Computer With A Marching Cube

Topological Data Analysis

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

A quick trip through geometrical shape comparison

The Value of Visualization 2

Fast approximation of the maximum area convex. subset for star-shaped polygons

Compact Representations and Approximations for Compuation in Games

UNIFORMLY DISCONTINUOUS GROUPS OF ISOMETRIES OF THE PLANE

Server Load Prediction

Robust Blind Watermarking Mechanism For Point Sampled Geometry

IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL. 1. Introduction

Constrained Tetrahedral Mesh Generation of Human Organs on Segmented Volume *

How To Understand And Understand The Theory Of Computational Finance

Taking Inverse Graphics Seriously

An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis]

Performance Metrics for Graph Mining Tasks

Efficient Storage, Compression and Transmission

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

Cyber Graphics. Abstract. 1. What is cyber graphics? 2. An incrementally modular abstraction hierarchy of shape invariants

Introduction to Topology

Data Mining: Algorithms and Applications Matrix Math Review

PSEUDOARCS, PSEUDOCIRCLES, LAKES OF WADA AND GENERIC MAPS ON S 2

8.1 Min Degree Spanning Tree

Clustering and mapper

Solving Simultaneous Equations and Matrices

Chapter 4: Artificial Neural Networks

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

Neural Gas for Surface Reconstruction

Part II Redundant Dictionaries and Pursuit Algorithms

1 Norms and Vector Spaces

Metric Spaces. Chapter 1

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

Data Exploration Data Visualization

ALGEBRA. sequence, term, nth term, consecutive, rule, relationship, generate, predict, continue increase, decrease finite, infinite

Java Modules for Time Series Analysis

TOPOLOGY AND DATA GUNNAR CARLSSON

2.3 Convex Constrained Optimization Problems

Wii Remote Calibration Using the Sensor Bar

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

CS3220 Lecture Notes: QR factorization and orthogonal transformations

Linear Codes. Chapter Basics

Support Vector Machine. Tutorial. (and Statistical Learning Theory)

Design of LDPC codes

We shall turn our attention to solving linear systems of equations. Ax = b

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Subspace Analysis and Optimization for AAM Based Face Alignment

Analecta Vol. 8, No. 2 ISSN

Circle Object Recognition Based on Monocular Vision for Home Security Robot

NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing

Nonlinear Iterative Partial Least Squares Method

Statistical machine learning, high dimension and big data

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina

Protein Protein Interaction Networks

Transcription:

Topological Data Analysis Applications to Computer Vision Vitaliy Kurlin, http://kurlin.org Microsoft Research Cambridge and Durham University, UK

Topological Data Analysis quantifies topological structures in raw data. Topological persistence and simplification by Edelsbrunner, Letscher, Zomorodian, FOCS 00. Goal: also use machine learning and statistics.

Learning topology of data Stages of Topological Data Analysis (TDA): We need fast and stable under noise invariants.

What are data in theory? Data are finite metric spaces: noisy clouds of unorganized points with pairwise distances. A shape can be represented by a noisy sample, but we need a small structured description. Computer Vision: image in! description out.

Computer Vision problems Input: a matrix of grayscale (or RGB) values. http://kos.informatik.uni-osnabrueck.de/3dscans Questions: how many objects, what are their boundaries, how do they relate to each other?

Image recognition: error 6.6% ImageNet database: http://image-net.org. > 14M images, > 21K classes. Best algorithm: deep neural network with 60M parameters. Animals: 2.8M in 3822 classes (723 per class). If we perturb 2% of pixels: a dog! an ostrich.

Life story of a cloud: scale = 0

Life story of a cloud: scale 1.1

Life story of a cloud: scale = 1.5

Life story of a cloud: scale = 2

Life story of a cloud: scale 2.6

From a cloud to some shape Def : the -offset of a cloud C R 2 is the union of closed balls C = [ p2c B(p; ) of a radius. Filtration C = C 0 C C +1 = R 2.

Complexes on data points Instead of balls, build complexes on points: add edges, triangles, tetrahedra, high-dim simplices. Def: the Čech complex Čh( ) has vertices at points of C and simplices spanned by vertices v 1,...,v k if \ k i=1 B(v i; ) 6= ;. The Vietoris-Rips complex VR( ) has simplices spanned by v 1,...,v k if distances d(v i, v j ) apple.

Preserving a homotopy type Nerve lemma: any union of balls [ n i=1 B(v i; ) continuously deforms to the complex Čh( ).

Single edge clustering of C A manual choice of the scale is needed: all points with d(p, q) apple 2 are in one cluster. Persistent components of VR 1 ( ) living over a long interval of are more natural clusters of C.

Past drawbacks of topology The scale for a cloud C R 2 can be unclear. Usual homology isn t stable under noise, which changes the number dim H 0 of components. But homology respects continuous f : X! Y, which always induce linear f k : H k (X)! H k (Y ).

Persistent homology Any filtration S( 1 ) S( 2 ) S( m ) of complexes induces linear maps in homology: H k (S( 1 ))! H k (S( 2 ))!!H k (S( m )). For i apple j, the persistent homology group is H i,j k = Im {H k(s( i ))! H k (S( j ))}, i,j k = rk Hi,j k µ i,j i,j 1 i,j k =( k k ) ( i 1,j 1 i 1,j k k ) lin. ind. classes were born in S( i ), died entering S( j ).

Barcodes are life spans H k (S( 1 ))! H k (S( 2 ))!!H k (S( m )) decomposes over Z 2 as a sum of easy modules 0! Z 2 (birth i ) id! id! Z 2 (death j )! 0 represented by a bar [ i, j ) in the scale -axis. µ i,j k is the number of bars with b = i, d = j. Classification of finitely generated modules ) all persistent homology is described the bar code, the bars [ i, j ) with multiplicities µ i,j k.

Bar code of a 1-variable function For f : M! R, we take S( ) =f 1 ( 1, ]. Components of sublevel sets have life intervals [b, d) =[birth,death) that form the bar code of f.

0D persistence diagram PD PD{S( )} {0 apple x apple y} R 2 is formed by {x = y} and (birth, death) with multiplicities.

1D persistence of -offsets C Pairs with high persistence are true contours. Pairs near the diagonal reflect noisy contours.

Bottleneck distance on diagrams Let P be {(x, x) 2 R 2 }[{a multi-set of points}. Def : d B (P, Q) =inf sup a2p a (a) over all 1-1 maps : P! Q is the bottleneck distance.

Stability of persistence Th (C-S, E, H 07): sublevel sets of f, g : R d! R have diagrams with d B (PD f, PD g ) apple f g 1. "-perturbation of f deforms PD f by at most ".

Computing persistence Standard algorithm: if a filtration {S( )} has u updates (simplices), find a normal form of the boundary matrix of a complex in time O(u 3 ). Th (M, M, S 11): the persistence diagram PD{S( )} of a filtration can be found in time O(M(u)+u 2 log 2 u), where M(u) =O(u 2.376 ) is the time for multiplication of two u u matrices. Dimension 0: components of graphs are found via the union-find structure in time O(m (m)), where the largest graph G has m edges.

Delaunay triangulation DT(C) Def: for C = {p 1,...,p n } R 2, the Voronoi cell is V (p i )={q 2 R 2 : d(q, p i ) apple d(q, p j ), j 6= i}. Def: a Delaunay triangulation DT(C) is dual to the Voronoi cells and is found in time O(n log n).

-complexes C( ) on a cloud C Def: the -complex C( ) DT(C) has edges of length apple 2 and triangles of circumradius apple. C(0) C( ) C(+1) =DT(C).

From -offsets to -complexes Th (Edelsbrunner 95): any C deforms to C( ).

Graphs C ( ) dual to -complexes Cycles of C( ) R 2 correspond to connected components of the graph C ( ) dual to C( ). If is decreasing, then C( ) is shrinking from DT(C), the dual graph C ( ) is growing.

A union-find structure on C ( ) A cycle of C( ) is broken at an edge e $ 2 components of C ( ) merge at the edge e. O(n (n)) time for a union-find structure, where (n) is the slow inverse Ackermann function.

Running time for counting holes Corollary: for any n points C R 2, the 1-dim persistent homology (# independent cycles) of offsets C can be computed in time O(n log n).

Barcode and staircase: real cloud Right: rank H 1 (C ) for all. P(3 holes) 24%, P(2 holes) 13%, P(8 holes) 11%,...

Noisy samples of real shapes Def : C is any "-sample of X if C X ", X C ", so " is an upper bound of some arbitrary noise. Def :ashape is a compact subset X R 2. A hole is a bounded component of R 2 X. min (X) : when #holes of X starts changing. max (X) : when #holes of X stops changing.

Guarantees for counting holes Th (VK, CVPR 14): if C is an "-sample of an unknown shape X, min (X) > 1 2 max(x)+4", no new holes appear in X when is increasing, then our algorithm correctly finds # holes of X. Idea : diagram of X has a widest diagonal gap {y x < min }, which may become only a little bit thinner due to stability of persistence. We know how many holes a cloud C probably has. Question: where are these holes?

Maps and hand-drawn sketches Problem: complete all closed contours or paint all regions that they enclose (a segmentation). Motivations (and pictures) are from E. Saund. Finding perceptually closed paths. T-PAMI 03.

Computer Graphics application There can be no single ideal solution, but any user quickly drawing a sketch on a tablet might be happy with our fast automatic best guess : make contours close so that I can paint regions (a scale is easy to find, but we can t ask for it).

Input of auto-completion Input: a dotted 2D image of n sparse points without any user-defined parameters, no scale. Often a sequence of points is given, we solve the problem for any unstructured cloud C R 2.

Output of auto-completion Required output: most persistent contours.

Counting holes in C may be easy The graph G has H 1 of rank 36, hence any "-sample C of G will probably have 36 holes. How can we see that there are 36 holes in C?

Using stability of persistence We can find the widest diagonal gap separating 36 points from the rest of persistence diagram.

An initial segmentation of C Acute Delaunay triangle is a center of gravity. We attach all adjacent non-acute triangles to get an initial segmentation on the right hand side.

Harder than counting cycles Initial regions $ red dots in PD (too many). We separate 36 dots above the widest diagonal gap. We merge remaining dots with low persistence.

Merging initial regions When building PD{C }, we keep the adjacency relation for each older region (homology class that survives) and younger region that dies.

Simple vs non-simple graphs Def: G R 2 is simple if the boundary L of any bounded region in R 2 G has a radius (L) such that L is circular for < (L) and L for (L), so the hole in L dies at = (L).

Diagrams of simple graphs The diagram PD{G } of any simple graph has only (birth, death) =(0, ) in the vertical axis. For any "-sample C of G, a diagonal gap in PD{C } can become thinner by at most 4".

Noisy input! correct output Th (VK 14): let G R 2 be a simple graph with 0 < 1 apple apple m and 1 > 7" + max{ i+1 i }. For any "-sample C of G, the algorithm finds m expected contours, they are in the 2"-offset G 2".

Idea: the widest gap survives 1 > 7" + max{ i+1 i } says that the diagonal gap {0 < y x < 1 } is widest under noise. Also " max{birth} above the widest gap in PD{C }, hence all edges in a persistent contour L G have half-lengths apple ", so L C " G 2".

Fun: character recognition Handwritten characters can be topologically classified by comparing persistent homology arising from directional sweeps: we may count components in horizontal or vertical filtrations. Their 0-dim persistence diagrams are different.

Summary: TDA & applications unstructured data are a finite metric space on which we build simplicial complexes {filtration}!persistence diagram, which is stable under noise in the data or filtration counting holes in noisy 2D clouds in time O(n log n) with theoretical guarantees auto-completion of closed contours in 2D with a close geometric approximation