Pattern Recognition: Introduction
Instructor: Hernán Darío Benítez Restrepo
benitez@ieee.org
Office 2.50, Faculty of Engineering
http://escher.puj.edu.co/~hbenitez
Pattern recognition: Introduction
1. Basic concepts: pattern, class, features, datasets
2. Pattern recognition cycle
3. Applications
4. Supervised vs unsupervised classification
5. Different approaches in pattern recognition and open issues
WHERE'S WALDO?
BASIC CONCEPTS
BASIC CONCEPTS
1. Pattern: Any abstract, symbolic, or physical manifestation of information. Examples include audio, music, speech, text, image, graphics, video, multimedia, sensor, communication, geophysical, sonar, radar, biological, chemical, molecular, genomic, and medical data, as well as sequences of symbols, attributes, or numerical quantities.
Examples: handwritten characters, faces, seismic signals, X-ray images.
Source: Li Deng, New Focus, New Challenge, Signal Processing Magazine, March 2010
BASIC CONCEPTS
2. Class: Category to which a pattern is assigned.
Examples:
  Face -> man / woman
  Written characters -> number / letter
  Chest X-ray -> low / middle / high risk
  Seismic signal -> ice slide / rock breaking
3. Features: Characteristics of the patterns or objects being described.
  Quantitative (numerical):
    Continuous (height, pressure, temperature)
    Discrete (number of inhabitants in a city, number of students in a classroom)
  Qualitative (categorical):
    Ordinal (education level)
    Nominal (gender: woman or man)
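BASIC CONCEPTS
Mathematical definitions
Pattern: $\mathbf{x} = [x_1, \ldots, x_n]^T \in \mathbb{R}^n$, where $\mathbb{R}^n$ is the feature space
Feature: $x_j$
Class: $\Omega = \{\omega_1, \ldots, \omega_c\}$
Dataset: $Z = \{\mathbf{z}_1, \ldots, \mathbf{z}_N\}$, $\mathbf{z}_j \in \mathbb{R}^n$
The class label of $\mathbf{z}_j$ is denoted $l(\mathbf{z}_j) \in \Omega$, $j = 1, \ldots, N$
Example: Handwritten characters. How many classes? How many and which features?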
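To make the notation concrete, the following is a minimal sketch of how a labeled dataset could be stored in code. The class names, feature values, and the helper `describe` are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
# Hypothetical toy data: N = 4 patterns, n = 2 features, c = 2 classes.
OMEGA = ("digit", "letter")  # class set Omega

Z = [  # dataset Z: each z_j is a vector with n features
    (0.9, 0.1),
    (0.8, 0.3),
    (0.2, 0.7),
    (0.1, 0.9),
]
labels = ["digit", "digit", "letter", "letter"]  # l(z_j) for j = 1..N


def describe(Z, labels, omega):
    """Return (N, n, c) after checking that the dataset is well formed."""
    N = len(Z)
    n = len(Z[0])
    assert all(len(z) == n for z in Z), "every pattern needs n features"
    assert len(labels) == N, "one label per pattern"
    assert all(l in omega for l in labels), "labels must come from Omega"
    return N, n, len(omega)


print(describe(Z, labels, OMEGA))
```

For the handwritten-characters question, the same structure applies: the number of classes is the size of the class set, and each feature adds one dimension to every vector in the dataset.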
PATTERN RECOGNITION CYCLE
PATTERN RECOGNITION CYCLE
Important steps:
1. Data
2. Assumptions
3. Representation
4. Classification
5. Evaluation
6. Model selection
PATTERN RECOGNITION CYCLE
Problem 1: Which students will pass the data mining course?
1. Data: Which historical records could we use?
2. Assumptions: What can we assume about the students in the course?
3. Representation: How could we represent a student?
4. Classification: Which classification algorithm should we select and train?
5. Evaluation: How well do we classify?
6. Model selection: Could we improve the classification performance?
PATTERN RECOGNITION CYCLE
Data (we assume they are available)
Names and grades of students in past data mining (DM) courses
Grades of the students in the current data mining course

Training data
Student    DM     Course 1   Course 2
Liliana    4.0    4.3        3.6
David      3.5    4.4        2.7

Test data
Student    DM     Course 1   Course 2
Carlos     ?      3.4        3.2
Milena     ?      4.4        4.7
PATTERN RECOGNITION CYCLE
Assumptions
We can make the following assumptions:
The course has been the same over the years.
Every student works independently.
PATTERN RECOGNITION CYCLE
Representation
The i-th student (e.g., Milena) can be represented as a vector: x_i = [4.0 4.5 4.6]

Training data
Student    DM grade
X1         4.0
X2         3.5
...        ...

Validation data
Student    DM grade
X1         ?
X2         ?
...        ...
PATTERN RECOGNITION CYCLE
Classification
Given the training set:

Training data
Student    DM grade
X1         4.0
X2         3.5
...        ...

We need to find a mapping between the input vectors x (students) and the labels y (DM grades).
Possible solution: the k-nearest neighbors (k-NN) classifier. For each student x, find the nearest student x_i in the training dataset and assign that student's label to x.
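The nearest-neighbor idea can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: it uses the two training students and two test students listed on the data slide, takes the Course 1 and Course 2 grades as the features, and assumes Euclidean distance, which the slides do not specify.

```python
import math

# Training data from the data slide: (Course 1, Course 2) -> DM grade.
train = {
    "Liliana": ((4.3, 3.6), 4.0),
    "David": ((4.4, 2.7), 3.5),
}
# Test students whose DM grade is unknown.
test = {"Carlos": (3.4, 3.2), "Milena": (4.4, 4.7)}


def nearest_neighbor(x, train):
    """Return (name, DM grade) of the training student closest to x."""
    # Euclidean distance is an assumption; other proximity measures are possible.
    name = min(train, key=lambda s: math.dist(x, train[s][0]))
    return name, train[name][1]


predictions = {student: nearest_neighbor(x, train) for student, x in test.items()}
for student, (neighbor, grade) in predictions.items():
    print(f"{student}: nearest is {neighbor}, predicted DM grade {grade}")
```

With only two training students the prediction is very coarse; the sketch is meant to show the mechanics of the mapping from x to y, not a usable model.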
PATTERN RECOGNITION CYCLE
Evaluation
How do we estimate how good our classification is?
We can wait until the end of the semester...
We can estimate the accuracy using the training set.
Possible solution: divide the training set into a training set and a validation set.
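The split into training and validation sets can be sketched as follows. All records below are hypothetical (invented pass/fail outcomes for illustration), and the helper names `hold_out`, `predict_1nn`, and `accuracy` are not from the slides. The classifier is a 1-nearest-neighbor rule with Euclidean distance, assumed here only to have something to evaluate.

```python
import math
import random

# Hypothetical historical records: (Course 1, Course 2) -> outcome in the DM course.
records = [
    ((4.3, 3.6), "pass"),
    ((4.4, 2.7), "pass"),
    ((2.1, 2.5), "fail"),
    ((2.8, 2.0), "fail"),
    ((4.0, 4.5), "pass"),
    ((1.9, 3.0), "fail"),
    ((3.9, 3.8), "pass"),
    ((2.5, 2.9), "fail"),
]


def hold_out(records, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (training, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def predict_1nn(x, training):
    """Label of the training record nearest to x (Euclidean distance)."""
    return min(training, key=lambda rec: math.dist(x, rec[0]))[1]


def accuracy(training, validation):
    """Fraction of validation records whose label is predicted correctly."""
    hits = sum(predict_1nn(x, training) == y for x, y in validation)
    return hits / len(validation)


training, validation = hold_out(records)
print(len(training), len(validation), accuracy(training, validation))
```

The validation records play the role of the students whose grades are not yet known: they are withheld from training, so the measured accuracy is an estimate of performance on new students.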
PATTERN RECOGNITION CYCLE
Model selection
We can adjust:
The estimation algorithm (e.g., use a different classifier)
The representation (e.g., build a classifier based on a different set of courses)
The assumptions (e.g., students may work together)
PATTERN RECOGNITION CYCLE
Problem 2: Which soccer teams will pass the quarterfinals in the first semester of Colombia's soccer tournament?
Data (available in principle)
Number of matches won, lost, or tied by each team that passed the quarterfinals in the last four tournaments.

Training dataset:
Team       Matches won   Matches lost   Matches tied
Team 1     9             6              3
...        ...           ...            ...
Team 32    8             4              7
PATTERN RECOGNITION CYCLE
Test dataset (matches during the semester):
Team       Matches won   Matches lost   Matches tied
Team 1     5             8              1
...        ...           ...            ...
Team 18    3             4              5
PATTERN RECOGNITION CYCLE
Assumptions
We can make the following assumptions:
The players are the same throughout the periods analyzed.
Every team plays independently.
PATTERN RECOGNITION CYCLE
Representation
The i-th team (e.g., Deportivo Cali) can be represented by a vector [MW, ML, MT] of matches won, lost, and tied.

Training data
Team    [MW, ML, MT]
X1      [5, 6, 7]
X2      [3, 8, 7]
...     ...

Validation data
Team    Pass / No pass
X1      ?
X2      ?
...     ...
PATTERN RECOGNITION CYCLE
Classification
We need to find a mapping between the input vectors x and the labels y, which are pass / no pass.
Possible solution: a k-nearest neighbors classifier that, for every team x, finds the nearest teams x_i in the training dataset.

Team    [MW, ML, MT]
X1      [5, 6, 7]
X2      [3, 8, 7]
...     ...
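For k greater than 1, the k nearest teams vote on the label. The sketch below assumes Euclidean distance and k = 3. The first two training vectors and the two query vectors appear in the slides; every pass / no pass label and the remaining training vectors are hypothetical, added only so that the example runs.

```python
import math
from collections import Counter

# [MW, ML, MT] -> label. All labels are hypothetical; only the first two
# vectors come from the training table in the slides.
train = [
    ((9, 6, 3), "pass"),
    ((8, 4, 7), "pass"),
    ((10, 5, 3), "pass"),
    ((5, 8, 5), "no pass"),
    ((3, 8, 7), "no pass"),
    ((4, 10, 4), "no pass"),
]


def knn_predict(x, train, k=3):
    """Majority label among the k training teams closest to x."""
    nearest = sorted(train, key=lambda rec: math.dist(x, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Query vectors taken from the test table in the slides.
test = {"Team 1": (5, 8, 1), "Team 18": (3, 4, 5)}
predictions = {team: knn_predict(x, train) for team, x in test.items()}
print(predictions)
```

An odd k avoids tied votes when there are only two classes.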
PATTERN RECOGNITION CYCLE
Evaluation
How can we estimate how good our classification is?
We can wait until the end of the tournament...
We can estimate the accuracy using the training dataset.
Possible solution: divide the training dataset into a training dataset and a validation dataset.
PATTERN RECOGNITION CYCLE
Model selection
We can adjust:
The estimation algorithm (e.g., use a different classifier)
The representation (e.g., add the matches won at home and away to the representation)
The assumptions (e.g., the players and coaches change constantly)
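One simple form of model selection is to compare several candidate settings on a validation set and keep the best one. The sketch below varies only the number of neighbors k of a k-NN classifier; the data, the candidate values, and the helper names are all hypothetical, and Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_predict(x, train, k):
    """Majority label among the k training vectors closest to x."""
    nearest = sorted(train, key=lambda rec: math.dist(x, rec[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation vectors classified correctly with k neighbors."""
    hits = sum(knn_predict(x, train, k) == y for x, y in validation)
    return hits / len(validation)


# Hypothetical [MW, ML, MT] vectors with invented labels.
train = [
    ((9, 6, 3), "pass"),
    ((8, 4, 7), "pass"),
    ((10, 5, 3), "pass"),
    ((5, 8, 5), "no pass"),
    ((3, 8, 7), "no pass"),
    ((4, 10, 4), "no pass"),
]
validation = [
    ((9, 5, 4), "pass"),
    ((4, 9, 5), "no pass"),
    ((7, 6, 5), "pass"),
    ((5, 9, 4), "no pass"),
]

candidates = (1, 3, 5)  # candidate models: different values of k
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])
print(scores, "selected k =", best_k)
```

The same loop could compare different representations or different classifiers instead of different values of k; only the set of candidates changes.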
APPLICATIONS
APPLICATIONS Biometrics From: Blood and Money, E. Strickland, IEEE Spectrum, pp. 33-37, June 2012
APPLICATIONS Optical character recognition (OCR) Kurzweil Reading Edge Handwritten character recognition and automatic text reading
APPLICATIONS Content Based Image Retrieval (CBIR)
APPLICATIONS Face recognition
APPLICATIONS Barcodes Source: http://en.wikipedia.org/wiki/File:UPC_A.svg
APPLICATIONS Midomi Source: http://www.midomi.com/
SOME APPLICATIONS IN COLOMBIA Clasificación por color del fruto chontaduro bactris gassipaes mediante visión artificial [Color classification of the chontaduro fruit (Bactris gasipaes) using computer vision], A. Ruiz-Hoyos, D. Montilla-Perafán, L. Pencue-Fierro, and J. León-Téllez, XI Simposio de Tratamiento de Señales, Imágenes y Visión Artificial, STSIVA 2006.
SOME APPLICATIONS IN COLOMBIA Clasificación del canto de cinco especies de aves de la región andina Colombiana usando tres topologías de Redes Neuronales Artificiales [Classification of the songs of five bird species from the Colombian Andean region using three artificial neural network topologies], J. López and J.E. Cardona, XI Simposio de Tratamiento de Señales, Imágenes y Visión Artificial, STSIVA 2006.
SOME APPLICATIONS IN COLOMBIA Classification of volcano events observed by multiple seismic stations, R. Duin, J.M. Londoño, and M. Orozco, 2010 International Conference on Pattern Recognition. Events recorded at the Alfombrales (ALF), BIS, and Tolda Fría (TOL) seismological stations on the Nevado del Ruiz volcano.
SUPERVISED VS UNSUPERVISED CLASSIFICATION
[Figure: flowchart from real-world problems back to the real world. Box labels: Problem; Data collection, feature naming; Unsupervised branch: Selection of a clustering method, Data clustering, Result OK?; Supervised branch: Feature selection and extraction, Selection of a classifier model, Training, Testing, Result OK?]
Source: Ludmila Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, 2004
SUPERVISED VS UNSUPERVISED CLASSIFICATION
Supervised classification: The data or observations are labeled, which means that we know in advance the class to which each observation belongs.
Unsupervised classification: The data are not labeled, and the goal is to find groups in the data. It is also known as data clustering.
SUPERVISED VS UNSUPERVISED CLASSIFICATION
Examples:
Supervised: Classifying images of products on a production line. Why perform automatic classification?
Unsupervised: Classifying land use in a remote sensing application. What would the procedure be if we used supervised classification instead?
SUPERVISED VS UNSUPERVISED CLASSIFICATION
Supervised
  Advantage: It aids process automation, since it reduces human intervention.
  Disadvantage: An expert is required to label the observations.
Unsupervised
  Advantages: An expert is not required to label the observations. In applications such as image segmentation, it offers high processing speed, reliability, and consistency.
  Disadvantages: An appropriate proximity measure must be defined to determine the clusters. Every clustering algorithm finds clusters, even if none are present in the data.
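The last disadvantage can be illustrated with a small experiment. The sketch below runs a plain k-means procedure (chosen here as one example of a clustering algorithm; the slides do not name one) on an evenly spaced 10 x 10 grid, which has no group structure at all. The grid, the initial centroids, and k = 3 are assumptions made for the illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations: assign each point to its nearest centroid, then update the means."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Evenly spaced grid: by construction there are no natural groups in these data.
points = [(float(i), float(j)) for i in range(10) for j in range(10)]
initial = [points[0], points[45], points[99]]  # (0,0), (4,5), (9,9)

labels, centroids = kmeans(points, initial)
sizes = [labels.count(c) for c in range(len(centroids))]
print("cluster sizes:", sizes)
```

The algorithm still partitions the grid into three groups, which is why the result of a clustering method should be checked against domain knowledge before the clusters are interpreted as real structure.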
DIFFERENT APPROACHES IN PATTERN RECOGNITION AND OPEN ISSUES Statistical pattern recognition
DIFFERENT APPROACHES IN PATTERN RECOGNITION AND OPEN ISSUES Structural pattern recognition
DIFFERENT APPROACHES IN PATTERN RECOGNITION AND OPEN ISSUES
Properties of statistical and structural pattern recognition
Foundations
  Statistical: Well-developed mathematical theory of vector spaces.
  Structural: Intuitively appealing: human cognition or perception.
Descriptors
  Statistical: Numerical features.
  Structural: Morphological primitives of variable sizes.
Noise
  Statistical: Easily encoded.
  Structural: Needs regular structures.
Discrimination
  Statistical: Relies on distances or inner products in a vector space.
  Structural: Grammars recognize valid objects.
Source: The Dissimilarity Representation for Pattern Recognition: Foundations and Applications, E. Pekalska, R. Duin, World Scientific, 2005
DIFFERENT APPROACHES IN PATTERN RECOGNITION AND OPEN ISSUES
Example: Heart disease diagnosis from ECG signals
Statistical approach: A spectral analysis with the Fourier transform or the wavelet transform is performed to obtain features. The features are then reduced by feature extraction or feature selection, and a classifier is trained and evaluated to distinguish healthy from sick patients.
Structural approach: Signals are described by vertical and diagonal lines. ECG signals that represent healthy or unhealthy patients are described by line segments, and formal grammars are then used for classification. What about the noise?
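The first step of the statistical approach, obtaining spectral features, can be sketched with a naive discrete Fourier transform. The signal below is a synthetic sum of two sinusoids, not a real ECG recording, and the sampling rate, the two features, and the choice of the first eight bins as the low band are assumptions made only for illustration.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform for bins 0..N/2 (naive O(N^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


FS = 64  # assumed sampling rate in Hz; one second of samples
# Synthetic stand-in for a measured signal: 5 Hz component plus a weaker 12 Hz component.
signal = [
    math.sin(2 * math.pi * 5 * t / FS) + 0.3 * math.sin(2 * math.pi * 12 * t / FS)
    for t in range(FS)
]

mags = dft_magnitudes(signal)
dominant_bin = max(range(len(mags)), key=mags.__getitem__)
dominant_hz = dominant_bin * FS / len(signal)  # bin spacing is FS / N
low_band_energy = sum(m * m for m in mags[:8]) / sum(m * m for m in mags)

# Feature vector x for this signal: two numerical features.
features = (dominant_hz, low_band_energy)
print(features)
```

Each signal would be mapped to such a feature vector, after which feature selection and classifier training proceed as for any other numerical dataset.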
DIFFERENT APPROACHES IN PATTERN RECOGNITION AND OPEN ISSUES
Open issues
Data representation and a priori knowledge: In a specific problem, what is the best representation for a dataset? How can a priori knowledge be incorporated?
Training and validation dataset design: How do we choose a representative dataset?
Generalization: Should we combine or select classifiers?
Evaluation: Which classifiers are good?
Source: R.P.W. Duin and E. Pekalska. Open issues in pattern recognition. In M. Kurzynski, E. Puchala, M. Wozniak, and A. Zolnierek, editors, Computer Recognition Systems, Advances in Soft Computing. Springer Verlag, 2005.
DIFFERENT APPROACHES IN PATTERN RECOGNITION AND OPEN ISSUES
How the brain recognizes objects
MIT, January 2010
A new computational model sheds light on the workings of the human visual system and could help advance artificial-intelligence research, too.
Source: http://web.mit.edu/newsoffice/2010/peopleimages-0607.html
[Figure caption] A new computational model of how the primate brain recognizes objects creates a map of interesting features (right) for a given image. The model's predictions of which parts of the image will attract a viewer's attention (green clouds, left) accord well with experimental data (yellow and red dots). Images courtesy of Sharat Chikkerur
FURTHER READING
A.K. Jain, R.P.W. Duin, and J. Mao, Statistical Pattern Recognition: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, 2000, pp. 4-37. Read Sections 1 and 2.
R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, Second Edition, John Wiley and Sons, New York, 2001. Read: Introduction, pages 1-20.