Face Identification by Human and by Computer: Two Sides of the Same Coin, or Not?
Tsuhan Chen
tsuhan@cmu.edu
Carnegie Mellon University
Pittsburgh, USA

What do you see?
What do you see?

What do you see?
[http://www.palmyra.demon.co.uk]
[Tony Karp, Illusion of Beauty]
[Adam Finkelstein, Mona]

What do you see?
[http://www.palmyra.demon.co.uk]
From Human to Computer

Face Identification: A Generalization Problem
- Single gallery image
- Pattern recognition
  - Need to generalize to all variations, without observing those variations
- Single probe image
Computer vs. Human
- Feng-Shui (風水) as an example
  - Ancient Chinese room arrangement technique
  - Way 1: Write down all the rules
    - Too many, and they do not generalize
  - Way 2: Imagine how a dragon would move through the room, and arrange it in a livable manner
    - Intuitive and creative
    - Done by Feng-Shui masters

Biomimetics?
- Neural networks
  - Among the first biologically motivated pattern recognition (PR) methods; some success in face detection and recognition
  - Limited by training data, hence poor generalization
Lesson from Deep Blue
- May 11, 1997: Deep Blue beat Kasparov (2 wins, 1 loss, 3 draws), the first time in history
- Deep Blue was not designed to mimic humans. Instead, it was designed to take advantage of a computer's strengths, i.e., speed and memory
- Kasparov said it best: quantity is sometimes quality
- Deep Blue beat Kasparov by memorizing a large amount of information and using table lookup

Quantity is not always quality
- Unfortunately, most object recognition work is under the Deep Blue paradigm
  - Deeper search, more data, faster computers, etc.
- Object recognition requires some attributes in computers that are more human-like
  - Generalization (intuition)
  - Adapting from the past
  - Bounds on knowledge
- Face perception as an example
  - Initial overall examination of external features, followed by a sequential analysis of internal features [Matthews, 1978] [Fraser and Parker, 1986]
Generalization

BANCA Database (ICPR2004)

Controlled
- Studio lighting
- High quality camera
- Minimal pose variation

Degraded
- Varying lighting
- Low quality web camera
- Some pose variation

Adverse
- Varying lighting
- High quality camera
- Noticeable pose and other variations
Parts Representation of the Face
[Figure: block diagram with the stages DCT/Gabor Transform, Estimating Parametric Model (EM Learner), GMM, DCT/Gabor Transform]
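The feature-extraction stage of the parts representation can be illustrated with a minimal sketch. It assumes non-overlapping 8 x 8 blocks and keeps only a few low-frequency 2D DCT coefficients per block; the block size, the number of coefficients kept, and the function names `dct_matrix` and `block_dct_features` are illustrative assumptions, not details from the slides. In a parts-based system, the resulting per-block vectors would then be modeled by a GMM fitted with EM, which is not shown here.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    basis[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return basis


def block_dct_features(image: np.ndarray, block: int = 8, keep: int = 3) -> np.ndarray:
    """Return one feature vector per non-overlapping block.

    Each vector holds the top-left keep x keep (low-frequency) 2D DCT
    coefficients of that block. Block size and `keep` are assumed values.
    """
    c = dct_matrix(block)
    rows = image.shape[0] // block
    cols = image.shape[1] // block
    feats = []
    for r in range(rows):
        for q in range(cols):
            patch = image[r * block:(r + 1) * block, q * block:(q + 1) * block]
            coeffs = c @ patch @ c.T  # separable 2D DCT of the block
            feats.append(coeffs[:keep, :keep].ravel())
    return np.array(feats)


# Toy input: a constant 16 x 16 "image" gives four blocks with only a DC term.
features = block_dct_features(np.ones((16, 16)))
```

Each row of `features` describes one face part; treating the rows as independent observations is what lets a GMM capture local appearance without a fixed global layout.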
Combining Representations
- Hypothesis
  - Monolithic representation: low-frequency information
  - Parts representation: both low- and high-frequency information
  - Weighting can be view-dependent
- Combine the scores with the sum rule
[Figure: input face, LDA-COS and FSC-GMM, +, COMB]

Comparative Results

Algorithm/Protocol   Mc      Ud      Ua      P
LDA-NC [1]           4.93    15.99   20.24   14.79
ORG-SVM [1]          5.43    25.43   30.11   20.33
PCA-MAH              10.2    17.84   26.63   21.57
LDA-COS              6.46    10.99   20.39   14.96
FSC-GMM              2.14    24.78   17.06   21.97
COMB                 1.42    9.65    16.51   12.52

[1] M. Sadeghi, J. Kittler, A. Kostin, and K. Messer, "A comparative study of automatic face verification algorithms on the BANCA database," in AVBPA, pp. 35-43, 2003.
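The sum-rule combination named on the Combining Representations slide can be sketched as follows. This is a minimal illustration under stated assumptions: the slides do not say how the two score types are brought to a common scale, so min-max normalization is assumed here, and the function names, the weight `w_holistic`, and the toy scores are all hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Map scores to [0, 1]. Assumption: a higher score means a better match."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def sum_rule(holistic: Sequence[float], parts: Sequence[float],
             w_holistic: float = 0.5) -> List[float]:
    """Weighted sum of normalized scores, one fused score per gallery identity."""
    if len(holistic) != len(parts):
        raise ValueError("score lists must have the same length")
    h = min_max_normalize(holistic)
    p = min_max_normalize(parts)
    return [w_holistic * a + (1.0 - w_holistic) * b for a, b in zip(h, p)]


# Made-up scores for three gallery identities.
holistic_scores = [0.2, 0.9, 0.4]        # e.g. cosine similarities
parts_scores = [-120.0, -80.0, -100.0]   # e.g. GMM log-likelihoods
fused = sum_rule(holistic_scores, parts_scores)
best = max(range(len(fused)), key=fused.__getitem__)
```

Making `w_holistic` depend on the estimated view would be one way to realize the view-dependent weighting mentioned in the hypothesis.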
Some Motivations

Holistic vs. Parts
[Young et al., 1987; Valentine, 1995]
Thatcher Illusion [Thompson, 1980]
Holistic vs. Parts
- Parts of faces are [Tanaka and Farah, 1993]
  - easily recognized in the typical whole-face configuration
  - less easily recognized in a new configuration
  - most poorly recognized in isolation
- Chin differences are detected first [Sergent, 1984]
  - Not as obvious when faces are inverted
- These suggest
  - Face perception is both holistic and by parts
  - Orientation is important

Bounds on Knowledge and Adapt from Past
Bounds on Knowledge
- Socrates (470-399 B.C.): "The only true wisdom is in knowing you know nothing."
- The computer is nowhere near this yet. It thinks it knows every conceivable variation (but in fact knows only what has been programmed into it)

Adapt from Past
- Humans are good at adapting using past experience. Can computers do the same?
- Yes, it is called relevance adaptation (RA)
  - Previously used in speech recognition
  - Obtains a subject-dependent model from a subject-independent average distribution (the past), using a small amount of adaptation data
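A minimal sketch of how relevance adaptation can work is given here, assuming the form commonly used with GMMs in speech: mean-only MAP adaptation with a relevance factor. Whether the talk uses exactly this variant is an assumption, as are the function name `relevance_adapt_means`, the default `relevance=16.0`, and the toy data. Each component mean moves from the subject-independent value toward the subject's data in proportion to how much adaptation data that component explains.

```python
import numpy as np


def relevance_adapt_means(weights, means, variances, data, relevance=16.0):
    """Mean-only relevance (MAP) adaptation of a diagonal-covariance GMM.

    weights (K,), means (K, D), variances (K, D) describe the
    subject-independent model; data (N, D) is the adaptation data.
    Returns the adapted (K, D) means. `relevance` is an assumed setting.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    data = np.asarray(data, dtype=float)

    # Log-density of every sample under every component: shape (N, K).
    diff = data[:, None, :] - means[None, :, :]
    log_prob = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2)
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)          # responsibilities

    n_k = post.sum(axis=0)                           # soft counts per component
    data_mean = post.T @ data / np.maximum(n_k, 1e-12)[:, None]

    # Components that saw more data trust the data more.
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * data_mean + (1.0 - alpha) * means


# Toy subject-independent model with two 1-D components, and 16 samples at 2.0.
adapted = relevance_adapt_means(
    weights=[0.5, 0.5],
    means=[[0.0], [10.0]],
    variances=[[1.0], [1.0]],
    data=np.full((16, 1), 2.0),
)
```

In this toy case the first component explains essentially all 16 samples, so with a relevance factor of 16 its mean moves halfway toward the data, while the second component stays at its prior value.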
[Figure: No Relevance Adaptation vs. Relevance Adaptation]
Another Aspect: 3D/Video

3/4-View
- Frontal and profile views result in poorer recognition by humans than the 3/4 view for unfamiliar faces [Baddeley & Woodhead, 1981; Bruce, 1982]
[Figure: profile view, 3/4 view, frontal view, 3/4 view, profile view]
- 3/4-view looks good too!
Face Mosaic
[Figure; labels: m, v1, v2, w]
Face in Video
- Moving faces are recognized significantly better by humans than still images
  - Movement provides the 3D structure of the face and allows recognition of facial gestures [Knight and Johnston, 1997]
  - Under pixelation or blurring, moving images of faces are recognized better than still images [Lander et al., 1999] (perhaps masking or super-resolution)

Face Recognition from Video
- Computers can use video too
  - More than simple majority voting or frame selection
  - Integration of temporal/motion/geometry information
  - Updating over time
- Most variations are continuous (at 30 Hz): pose, illumination, expression, registration, etc.
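To make the contrast with simple majority voting concrete, the sketch below compares per-frame voting with a recursive accumulation of per-frame log-scores. It is only one hypothetical way to "update over time": the forgetting factor, the function names, and the toy scores are assumptions, and the motion and geometry cues mentioned on the slide are not modeled.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(frame_scores: List[Dict[str, float]]) -> str:
    """Baseline: each frame votes for its top-scoring identity."""
    votes = Counter(max(s, key=s.get) for s in frame_scores)
    return votes.most_common(1)[0][0]


def accumulate_evidence(frame_scores: List[Dict[str, float]],
                        forget: float = 0.9) -> str:
    """Recursive update: running = forget * previous + current log-score.

    Unlike voting, this keeps the score margins, so one strongly
    contradicting frame can outweigh several near-ties.
    """
    running: Dict[str, float] = {}
    for scores in frame_scores:
        for identity, log_score in scores.items():
            running[identity] = forget * running.get(identity, 0.0) + log_score
    return max(running, key=running.get)


# Made-up per-frame log-likelihoods for two identities.
frames = [
    {"A": -1.0, "B": -1.1},   # A barely ahead
    {"A": -1.0, "B": -1.1},   # A barely ahead
    {"A": -9.0, "B": -1.0},   # A strongly contradicted
]
voted = majority_vote(frames)
accumulated = accumulate_evidence(frames)
```

With these toy scores, voting picks "A" (two narrow wins out of three frames) while accumulation picks "B", because the third frame is strong evidence against "A".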
Face-in-Action (FiA) Database

Other aspects to be explored...
Stages of Face Identification
- Face -> Identity -> Name [Young et al., 1982]
- Common situations:
  - Case 1: Cannot recognize the face
  - Case 2: The face looks familiar, but its identity is not known
  - Case 3: Can identify the face (e.g., occupation) but can't recall the name

Own-Race Bias
- We are better at identifying faces belonging to races with which we are familiar [Shapiro and Penrod, 1986]
Independent Modules
- Facial expression is identified independently of face identity [Bruce, 1986; Young et al., 1986]
  - Prosopagnosia patients can still identify facial emotion
  - Some patients cannot identify facial emotion but can identify famous faces
Independent Modules
- McGurk effect [McGurk and MacDonald, 1976]

  Audio   Visual   Perceived
  ba      ga       da
  pa      ga       ta
  ma      ga       na

  Internet Psychology Lab: http://kahuna.psych.uiuc.edu//ipl
- A prosopagnosia patient can still experience the McGurk effect [Campbell et al., 1986], suggesting that holistic face recognition is affected, but by-part recognition is not

A few words on sampling
How many samples for a face?

Reconstruct One Single Image
- Number of all possible 16 x 12 images at 8 bits per pixel:
  $2^{16 \times 12 \times 8} \gg$ number of all possible face images [Baker and Kanade, Hallucinating Faces] $\gg 30 \times 60 \times 60 \times 24 \times 365 \times$ human history $\times$ world population
- Power of prior; adapt from past

Some Art Work
- 12 x 16 LEDs, 8-bit grayscale [Jim Campbell, Portrait of a Portrait of Harry Nyquist]
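The size of the gap in the sampling argument can be checked with a few lines of integer arithmetic. The slides give no figures for the length of human history or the world population, so the two constants below are placeholder assumptions; the conclusion does not depend on their exact values.

```python
# Assumed values, not from the slides.
HUMAN_HISTORY_YEARS = 10_000
WORLD_POPULATION = 10_000_000_000

# Every distinct 16 x 12 image with 8 bits per pixel.
possible_images = 2 ** (16 * 12 * 8)

# 30 frames per second, for every second of one year.
frames_per_year = 30 * 60 * 60 * 24 * 365

# Frames seen by everyone over all of (assumed) human history.
observable_frames = frames_per_year * HUMAN_HISTORY_YEARS * WORLD_POPULATION

# Compare orders of magnitude via the number of decimal digits.
digits_images = len(str(possible_images))
digits_frames = len(str(observable_frames))
```

Under these assumptions the number of possible images has 463 decimal digits, while the number of frames anyone could ever have observed has 23, which is why a prior over faces matters far more than exhaustive sampling.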
More
- 12 x 16 LEDs, 8-bit grayscale [Jim Campbell, Portrait of a Portrait of Claude Shannon]

Finally
"The most compelling shapes are those near to our hearts: people's faces, a gracefully moving body, a natural scene with rustling leaves and flowing water. Evolution has tuned us to these sights. By combining vision and graphics, capturing and creating images of these scenes may soon be within reach." [Lengyel, 1998]
Try this
[http://www.palmyra.demon.co.uk]

Advanced Multimedia Processing Lab
Please visit us at: http://amp.ece.cmu.edu