Automatic Facial Occlusion Detection and Removal


Naeem Ashfaq Chaudhry
October 18, 2012

Master's Thesis in Computing Science, 30 credits
Supervisor at CS-UmU: Niclas Börlin
Examiner: Frank Drewes

Umeå University, Department of Computing Science, SE-901 87 Umeå, Sweden

Abstract

Occluded faces are common in daily life. The occlusion may be caused by objects such as sunglasses, mufflers, masks, or scarves, and is sometimes used deliberately by criminals to hide their identity. This thesis presents a technique to detect facial occlusion automatically. After the occluded areas have been detected, an image reconstruction method called apca (asymmetrical Principal Component Analysis) is used to reconstruct the faces: the entire face is estimated from the non-occluded area alone. A database of facial images of different persons is organized and used in the reconstruction of the occluded images. Experiments were performed to examine the effect of the granularity of the occlusion detection on the apca reconstruction. The input mask image is divided into parts, the occlusion of each part is marked, and apca is applied to reconstruct the face. Since this reconstruction process is computationally expensive, pre-defined eigenspaces are introduced that reduce the processing time substantially with only a small loss in the quality of the reconstructed faces.


Contents

1 Introduction
  1.1 Background
  1.2 Goals of the thesis
  1.3 Related work
    1.3.1 Occluded face reconstruction
    1.3.2 Facial occlusion detection
2 Theory
  2.1 Principal Component Analysis (PCA)
    2.1.1 PCA method/model
    2.1.2 PCA for images
    2.1.3 Eigenfaces
  2.2 Asymmetrical PCA (apca)
    2.2.1 Description of apca
    2.2.2 apca calculation
    2.2.3 apca for reconstruction of occluded facial region
  2.3 Skin color detection
  2.4 Image registration
    2.4.1 Translation
    2.4.2 Rotation
    2.4.3 Scaling
    2.4.4 Affine transformation
  2.5 Peak signal-to-noise ratio (PSNR)
3 Method
  3.1 The AR face database
  3.2 Automatic occlusion detection
    3.2.1 Replace white color with black color
    3.2.2 Image cropping
    3.2.3 Image division
    3.2.4 Occlusion detection for each block
  3.3 Occluded face reconstruction
    3.3.1 PSNR calculation
4 Experiment
  4.1 Granularity effect
    4.1.1 Metric
    4.1.2 Sunglasses scenario
    4.1.3 Scarf scenario
    4.1.4 Cap and sunglasses occlusion
  4.2 Pre-defined eigenspaces
    4.2.1 Metric
    4.2.2 Experiment description
5 Results
  5.1 Occlusion detection results
  5.2 Reconstruction quality results
  5.3 Reconstruction results using pre-defined eigenspaces
6 Conclusions
  6.1 Discussion about granularity effect and reconstruction quality
  6.2 Discussion about pre-defined eigenspaces
  6.3 Limitations
  6.4 Future work
7 Acknowledgements
References


Chapter 1

Introduction

1.1 Background

Face recognition has been one of the most challenging and active research topics in computer vision for the last several years (Zhao et al., 2003). The goal of face recognition is to recognize a person even if the face is occluded by some object. A face recognition system should recognize a face as independently and robustly as possible with respect to image variations such as illumination, pose, occlusion, and expression (Kim et al., 2007). A face is occluded if some area of the face is hidden behind an object such as sunglasses, a hand, or a mask, as seen in Figure 1.1. Facial occlusions degrade the performance of face recognition systems, including that of humans.

Recent research projects, e.g. (M. Al-Naser and Söderström, 2011), have used pre-determined occluded areas in standardized positions. After occlusion detection, apca (asymmetrical Principal Component Analysis) (Söderström and Li, 2011) was used to reconstruct the entire face. apca estimates an entire image based on a subset of the image, e.g. it reconstructs a partially occluded facial image from the non-occluded facial parts. The experiments used a small database (n = 116) of facial images with no classification (Martinez and Benavente, 1998). A property of the reconstructed images in (M. Al-Naser and Söderström, 2011) is that they have sharp edges between the original and reconstructed regions. This application can be used by law enforcement agencies, in access control systems, and for surveillance at public places such as ATMs and airports.

1.2 Goals of the thesis

The overall goal of this thesis is to improve the performance of apca for reconstruction of occluded regions of facial images. The primary goal is to develop an algorithm for automatic detection and reconstruction of facial occlusions. The algorithm should be automatic and detect smaller occlusions than previous work. Furthermore, arbitrary occlusions should be handled, i.e. occlusions of any part of the face. A secondary goal is to develop an algorithm for smoothing the reconstructed images to reduce the edges between the original and reconstructed regions.

Figure 1.1: Different types of occlusion. (a) Sunglasses occlusion. (b) Mask occlusion.

A tertiary goal is to extend the AR database with more images and to classify the images individually according to gender, ethnicity, etc.

1.3 Related work

1.3.1 Occluded face reconstruction

M. Al-Naser and Söderström (2011) reconstructed occluded regions using asymmetrical principal component analysis (apca). The occluded facial regions were estimated from the non-occluded facial regions. They did not detect the occlusion automatically; instead, the occlusion on the facial images was marked manually. Jabbar and Hadi (2010) detected the face area using a combination of skin color segmentation and eye template matching. They used the fuzzy c-means clustering algorithm to detect occluded facial regions. When the occluded region was one of the symmetric facial features, such as an eye, that feature was used to recover the occluded area. When the occluded area was not one of the symmetric facial features, they used the most similar mean face from the database.

1.3.2 Facial occlusion detection

Min et al. (2011) detected facial occlusions caused by sunglasses and scarves using the Gabor wavelet. The face image was divided into an upper and a lower half; the upper half was used to detect sunglasses occlusions while the lower half was used for scarf occlusion detection. Kim et al. (2010) proposed a method to determine if a face is occluded by measuring the skin color area ratio (SCAR). Oh et al. (2006) found the occlusion by first dividing the facial image into a finite number of local disjoint patches and then examining each patch separately.

Chapter 2

Theory

2.1 Principal Component Analysis (PCA)

PCA (Jolliffe, 2002) is a mathematical procedure that transforms potentially correlated variables into uncorrelated variables. Given a data matrix of observations of N correlated variables X_1, X_2, ..., X_N, PCA transforms the variables X_i into N new, uncorrelated variables Y_i. The variables Y_i are called principal components. The first principal component points in the direction of the largest variance of the data; each subsequent principal component is orthogonal to the previous ones and points in the direction of the largest residual variance, see Figure 2.1. PCA can be used as a dimension reduction method to represent multidimensional, highly correlated data with fewer variables. PCA is used for, e.g., information extraction, image compression, image reconstruction, and image recognition.

2.1.1 PCA method/model

Image-to-vector conversion

A 2-dimensional image is transformed into a 1-dimensional vector by placing its rows side by side, i.e.

    x = [p_1, p_2, \ldots, p_r]^T,    (2.1)

where p_i is the i-th row of the image p and r is the total number of rows.

Subtract the mean

The mean is subtracted from each vector to produce a vector with zero mean. Let I_0 denote the mean image; it is calculated as

    I_0 = \frac{1}{N} \sum_{j=1}^{N} I_j,    (2.2)

where N is the number of images I_j.

Figure 2.1: The first vector Z_1 is in the direction of maximum variance and the second vector Z_2 is in the direction of residual maximum variance.

Calculate the covariance matrix

The covariance of the mean-centred data is calculated as

    \mathrm{Cov} = W^T W,    (2.3)

where W is an r-by-c matrix whose columns are the mean-centred vectors (I_i - I_0). Cov is a square matrix of size c-by-c.

Calculate the eigenvectors and eigenvalues of the covariance matrix

The Singular Value Decomposition (SVD) (Strang, 2003) of an r-by-c matrix A is

    A_{r \times c} = U_{r \times r} \Sigma_{r \times c} V_{c \times c}^T = [u_1, u_2, \ldots, u_r] \,\operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)\, [v_1, v_2, \ldots, v_c]^T,    (2.4)

where U is an r-by-r unitary matrix, Σ is an r-by-c rectangular diagonal matrix, and V is a c-by-c unitary matrix. The columns of U and V are the left and right singular vectors, respectively, and the singular values σ_i ≥ 0 are sorted in descending order. If A is symmetric positive definite, then U = V, their columns contain the eigenvectors, and the σ_i are the eigenvalues.

Choosing components and forming a feature vector

The eigenvector associated with the highest eigenvalue represents the direction of greatest variance in the data, whereas the eigenvector associated with the lowest eigenvalue represents the least variance. The eigenvalues decrease in an exponential pattern (Kim, 1996); it is estimated that 90% of the total variance is contained in the first 5% to 10% of the dimensions. The eigenvectors associated with low eigenvalues are therefore less significant and can be ignored.

A feature vector b is constructed by selecting the M eigenvectors associated with the highest eigenvalues, from a total of N eigenvectors, i.e.

    b = (e_1, e_2, \ldots, e_M).    (2.5)

Deriving the new dataset

Take the transpose of the feature vector b and multiply it with W to obtain the final dataset Φ:

    \Phi = b^T W.    (2.6)

2.1.2 PCA for images

The PCA is computed as the SVD of the covariance matrix Cov of the facial images. An eigenspace φ is created using the equation

    \phi_j = \sum_i b_{ij} (I_i - I_0),    (2.7)

where b_{ij} are the eigenvector values of the covariance matrix \{(I_i - I_0)^T (I_j - I_0)\}. Eqs. 2.6 and 2.7 are the same. The projection coefficients \{\alpha_j\} = \{\alpha_1, \alpha_2, \ldots, \alpha_N\} of each facial image are calculated as

    \alpha_j = \phi_j (I - I_0)^T.    (2.8)

Each facial image is represented by the sum of the mean image and the weighted principal components. The representation is error free if all N principal components are used:

    I = I_0 + \sum_{j=1}^{N} \alpha_j \phi_j.    (2.9)

The final facial image is constructed by

    I = I_0 + \sum_{j=1}^{M} \alpha_j \phi_j,    (2.10)

where M is the number of selected principal components used for reconstruction of the facial image. An image with negligible quality loss can be represented by a few principal components because the first 5-10% of the eigenvectors can represent more than 90% of the variance in the data (Kim, 1996). PCA achieves compression since fewer dimensions (M) than the original (N) are used to represent the images. A PCA model also allows an image to be represented by only a few values (the α's); this is how PCA works for image representation.

2.1.3 Eigenfaces

The eigenvectors, or principal components, of the distribution of faces are called eigenfaces. Eigenfaces look like ghostly faces. The first 3 eigenfaces obtained from the AR database described in Section 3.1 can be seen in Figure 2.2. Each individual face can be represented by a linear combination of eigenfaces, and each face is approximated using the best eigenfaces, i.e. those that capture the most variance within the set of face images. The best M eigenfaces span an M-dimensional subspace, the "face space", of all possible images (Turk and Pentland, 1991).
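To make the model above concrete, the following is a minimal sketch in Python/NumPy (the thesis does not specify an implementation; the matrix layout, function names, and stand-in data are illustrative assumptions). It builds an eigenspace from vectorized faces via the SVD, projects a face onto the M leading eigenfaces, and reconstructs it, following Eqs. 2.1-2.10.

```python
import numpy as np

def build_eigenspace(images, M=50):
    """images: (P, N) matrix with one vectorized face per column (Eq. 2.1)."""
    I0 = images.mean(axis=1, keepdims=True)   # mean face I_0 (Eq. 2.2)
    W = images - I0                           # mean-centred data
    # The SVD of W yields the eigenvectors of Cov = W^T W directly (Eq. 2.4)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return I0, U[:, :M]                       # mean and M leading eigenfaces (Eqs. 2.5, 2.7)

def project_and_reconstruct(face, I0, phi):
    """Represent a face by M coefficients and rebuild it (Eqs. 2.8 and 2.10)."""
    alpha = phi.T @ (face - I0)               # projection coefficients alpha_j
    return I0 + phi @ alpha                   # I = I_0 + sum_j alpha_j phi_j

# Usage with stand-in data: 100 'faces' of 171*144 pixels each.
rng = np.random.default_rng(0)
faces = rng.random((171 * 144, 100))
I0, phi = build_eigenspace(faces, M=50)
approx = project_and_reconstruct(faces[:, [0]], I0, phi)
```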

Figure 2.2: Eigenfaces. (a) First eigenface. (b) Second eigenface. (c) Third eigenface.

Figure 2.3: The blue part represents the eigenspace of non-occluded regions whereas the green part represents the pseudo eigenspace of the complete image.

2.2 Asymmetrical PCA (apca)

apca is a method for estimating an entire space based on a subspace of that space. The method captures the correspondence between pixels in non-occluded regions and pixels behind occluded regions.

2.2.1 Description of apca

apca is an extension of PCA. Using apca, entire faces are reconstructed by estimating the occluded regions from the non-occluded regions of the images: the intensity (appearance) of the non-occluded pixels is used to estimate the intensity of the occluded pixels. In apca, two spaces are constructed: an eigenspace built from the non-occluded areas of the images, in which the eigenvectors are orthogonal to each other, and a pseudo eigenspace covering the complete image, built from the eigenvectors of the non-occluded image regions. In the pseudo eigenspace, the eigenvectors are not orthogonal, as seen in Figure 2.3.

2.2.2 apca calculation

In apca, a pseudo eigenspace is created. It models the correspondence between the pixels in the images, but only the non-occluded parts are orthogonal. Let I^{no} denote the non-occluded parts of an image I. I^{no} is modelled in an eigenspace \Phi^{no} = \{\phi^{no}_1, \phi^{no}_2, \ldots, \phi^{no}_N\} using the formula

    \phi^{no}_j = \sum_i b^{no}_{ij} (I^{no}_i - I^{no}_0),    (2.11)

where b^{no}_{ij} are the eigenvector values of the covariance matrix \{(I^{no}_i - I^{no}_0)^T (I^{no}_j - I^{no}_0)\} and I^{no}_0 is the mean of the non-occluded regions,

    I^{no}_0 = \frac{1}{N} \sum_{j=1}^{N} I^{no}_j.    (2.12)

Eigenvectors of the non-occluded parts are used to make them orthogonal, while the occluded parts are modelled according to their correspondence with the non-occluded parts. The pseudo eigenspace \Phi^{p} is calculated as

    \Phi^{p}_j = \sum_i b^{no}_{ij} (I_i - I_0),    (2.13)

where I_i is the original image and I_0 is the mean of the original images. Projection is used to extract the coefficients \{\alpha^{no}_j\} from the eigenspace \Phi^{no}:

    \alpha^{no}_j = \Phi^{no}_j (I^{no} - I^{no}_0)^T.    (2.14)

The complete facial image \hat{I} is reconstructed as

    \hat{I} = I_0 + \sum_{j=1}^{M} \alpha^{no}_j \Phi^{p}_j,    (2.15)

where M is the selected number of pseudo components used for the reconstruction. With these projection coefficients, a complete image can be reconstructed from only the non-occluded parts of the image.

2.2.3 apca for reconstruction of occluded facial region

With the eigenspace modelling the non-occluded facial regions and the pseudo eigenspace modelling the entire face, apca can be used to estimate what a face looks like behind the occlusions. When the spaces are created, the entire face needs to be visible so that the correspondence between the spaces can be modelled with apca. The eigenspace is created according to Eq. 2.11 and the pseudo eigenspace is constructed according to Eq. 2.13; the correspondence between the facial regions is captured in these two spaces. The non-occluded regions can then be used to extract the projection coefficients α (Eq. 2.14), meaning that only non-occluded pixels affect the representation. When the pseudo eigenspace is used with these coefficients to recreate an image of the entire face (Eq. 2.15), the content of the previously occluded pixels is calculated based on their relationship with the non-occluded pixels.
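The asymmetry in the calculation above, coefficients extracted from the non-occluded pixels only (Eq. 2.14) but reconstruction over the full image (Eq. 2.15), can be sketched as follows (Python/NumPy; a simplified illustration under the assumption that the occluded pixel positions are known, with hypothetical names, not the implementation used in the thesis).

```python
import numpy as np

def apca_reconstruct(train, face, occluded, M=50):
    """train: (P, N) fully visible training faces, one per column;
    face: (P,) occluded probe image; occluded: (P,) boolean mask (True = hidden)."""
    visible = ~occluded
    I0 = train.mean(axis=1)                      # mean of the full images
    W = train - I0[:, None]                      # centred full data
    W_no = W[visible, :]                         # centred non-occluded parts
    # Eigenvectors b^no of the non-occluded covariance via the Gram matrix
    evals, B = np.linalg.eigh(W_no.T @ W_no)     # ascending eigenvalues
    B = B[:, ::-1][:, :M]                        # keep the M leading eigenvectors
    phi_no = W_no @ B                            # eigenspace of visible parts (Eq. 2.11)
    scale = np.linalg.norm(phi_no, axis=0)
    phi_no = phi_no / scale                      # orthonormal over visible pixels
    phi_p = (W @ B) / scale                      # pseudo eigenspace, full image (Eq. 2.13)
    alpha = phi_no.T @ (face[visible] - I0[visible])  # coefficients (Eq. 2.14)
    return I0 + phi_p @ alpha                    # estimate of the entire face (Eq. 2.15)
```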

2.3 Skin color detection

This section follows Cheddad et al. (2009), who use two approximations, l and \hat{l}, for skin color detection. l is calculated as

    l(x) = (R(x), G(x), B(x)) \cdot \alpha,    (2.16)

where the transformation vector is

    \alpha = [0.298, 0.587, 0.140]^T.    (2.17)

The approximation \hat{l} is calculated as

    \hat{l}(x) = \max(G(x), R(x)).    (2.18)

An error signal is calculated for each pixel as

    e(x) = l(x) - \hat{l}(x),    (2.19)

and the pixel is classified as skin or not skin by

    f_{\mathrm{skin}}(x) = \begin{cases} 1, & \text{if } 0.02511 \le e(x) \le 0.1177 \\ 0, & \text{otherwise.} \end{cases}    (2.20)
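A direct transcription of this classifier might look as follows (a minimal Python/NumPy sketch; the scaling of the channels to [0, 1] and the function name are assumptions, and the rule is applied per pixel exactly as in Eqs. 2.16-2.20).

```python
import numpy as np

def skin_mask(rgb):
    """rgb: (H, W, 3) float image with channels in [0, 1].
    Returns a boolean mask, True where a pixel is classified as skin."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    l = 0.298 * R + 0.587 * G + 0.140 * B     # Eqs. 2.16-2.17
    l_hat = np.maximum(G, R)                  # Eq. 2.18 (as stated in the text)
    e = l - l_hat                             # Eq. 2.19
    return (e >= 0.02511) & (e <= 0.1177)     # Eq. 2.20
```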

2.4 Image registration

Image registration is the process of transforming a set of images into one coordinate system without changing the shape of the images. One image is selected as the base image, and spatial transformations are applied to the other images so that they align with the base image. Image registration is performed as a preliminary step so that subsequent image processing operations can work on a dataset with a common coordinate system. When facial images are aligned, all images have their facial features, such as mouth, eyes, and nose, in the same positions.

Figure 2.4: (a) and (b) represent the original images while (c) and (d) represent the registered images.

2.4.1 Translation

Translation is a geometric transformation in which an image element located at position (x_1, y_1) is shifted to a new position (x_2, y_2) in the transformed image. The translation operation is defined as

    \begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix},    (2.21)

where t_x and t_y are the horizontal and vertical pixel displacements, respectively.

2.4.2 Rotation

Rotation is a geometric transformation in which the image elements are rotated by a specified rotation angle θ. The rotation operation is defined as

    \begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}.    (2.22)

2.4.3 Scaling

Scaling is a geometric transformation that reduces or increases the size of the image coordinates. The scaling operation is defined as

    \begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} c_x & 0 \\ 0 & c_y \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}.    (2.23)

2.4.4 Affine transformation

An affine transformation is a linear 2-D geometric transformation that combines rotation, scaling, and translation. It maps a point at position (x_1, y_1) in an input image to a point at (x_2, y_2) in an output image by applying a linear combination of translation, rotation, scaling, and/or shearing (non-uniform scaling in some direction) operations. The affine transformation takes the form

    \begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}.    (2.24)

The facial images used in this thesis are aligned using affine transformations.
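As an illustration of Eq. 2.24, the sketch below (Python/NumPy; a hypothetical helper, not the alignment code used in the thesis) applies an affine transformation combining rotation, scaling, and translation to a set of landmark coordinates.

```python
import numpy as np

def affine_transform(points, theta=0.0, scale=(1.0, 1.0), t=(0.0, 0.0)):
    """points: (N, 2) array of (x, y) coordinates.
    Applies rotation by theta, axis scaling, then translation (Eqs. 2.21-2.24)."""
    c, s = np.cos(theta), np.sin(theta)
    A = np.array([[c, -s], [s, c]]) @ np.diag(scale)  # rotation followed by scaling
    return points @ A.T + np.asarray(t)               # x2 = A x1 + t

# Example: rotate two eye landmarks by 5 degrees, then shift 3 px right, 1 px down.
eyes = np.array([[52.0, 60.0], [92.0, 60.0]])
aligned = affine_transform(eyes, theta=np.deg2rad(5.0), t=(3.0, 1.0))
```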

2.5 Peak signal-to-noise ratio (PSNR)

PSNR is the ratio between the maximum possible value of a signal and the power of the distorting noise that affects the quality of its representation. It is often used as a benchmark level of similarity between a reconstructed image and the original image (Santoso et al., 2011). PSNR compares the original image with the coded/decoded image to quantify the quality of the decompressed output; a higher PSNR value means that the reconstructed data is of better quality. The mathematical representation of PSNR is

    \mathrm{PSNR} = 10 \log_{10} \left( \frac{\max^2}{\mathrm{MSE}} \right),    (2.25)

where max is the maximum possible pixel value and MSE is the mean squared difference between the reconstructed and the original data,

    \mathrm{MSE} = \frac{1}{XY} \sum_{m=1}^{X} \sum_{n=1}^{Y} \left[ I_1(m, n) - I_2(m, n) \right]^2,    (2.26)

where I_1 is the original image, I_2 is the reconstructed image, and X and Y are the numbers of rows and columns, respectively.
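Eqs. 2.25-2.26 translate directly into code (a minimal Python/NumPy sketch; the 8-bit maximum of 255 is an assumption):

```python
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """PSNR in dB between two images of equal shape (Eqs. 2.25-2.26)."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    if mse == 0:
        return np.inf                      # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

As noted in Section 3.3.1, values above roughly 30 dB are normally taken to indicate a good reconstruction.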

Chapter 3

Method

3.1 The AR face database

The experiments use the AR face database (Martinez and Benavente, 1998). The database contains more than 4000 facial images of 126 persons (70 men and 56 women). It includes images with scarf and sunglasses occlusions as well as non-occluded images with different facial expressions. The original size of the images is 768x576 pixels. The images were taken under controlled conditions, with no restrictions on what the participants wore or on their style.

3.2 Automatic occlusion detection

3.2.1 Replace white color with black color

The skin color detection method of Section 2.3 classifies white pixels as skin pixels. However, white is not a skin color but an occlusion. Therefore, white pixels are always replaced by black pixels before skin color detection. A pixel is classified as white if its R, G, and B values are all greater than 190, where 255 is the maximum value.

3.2.2 Image cropping

The original size of the images is 768x576 pixels. These images contain a lot of background area that affects the quality of the reconstructed images. The images are therefore cropped to a size of 171x144 pixels.

3.2.3 Image division

The 171x144 image is divided into 6 parts: 2 head parts, 2 eye parts, and 2 mouth parts, see Figure 3.1 (b). The size of each head part is 45x72 pixels, the size of each eye part is 54x72 pixels, and the size of each mouth part is 72x72 pixels. In the second step, each part is further divided into 9 sub-parts, see Figure 3.1 (c), so that smaller facial occlusions can also be detected. In the third step, each part of the second step is again divided into 9 sub-parts, see Figure 3.1 (d). A sketch of this division scheme is given below.
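The following Python sketch (illustrative; the row and column splits match the part sizes given above) computes the pixel boundaries of the 6 level-1 parts and subdivides a block into a 3x3 grid, which yields the 54- and 486-block levels when applied repeatedly.

```python
def level1_blocks():
    """The 6 level-1 parts of a 171x144 image as (top, left, height, width).
    Rows: head (45), eyes (54), mouth (72); columns: left/right half (72 each)."""
    blocks = []
    top = 0
    for h in (45, 54, 72):               # head, eyes, mouth strips
        for left in (0, 72):             # left and right halves
            blocks.append((top, left, h, 72))
        top += h
    return blocks

def subdivide(block):
    """Split one block into a 3x3 grid of 9 sub-blocks."""
    top, left, h, w = block
    return [(top + i * h // 3, left + j * w // 3, h // 3, w // 3)
            for i in range(3) for j in range(3)]

# 6 -> 54 -> 486 blocks:
l1 = level1_blocks()
l2 = [b for blk in l1 for b in subdivide(blk)]
l3 = [b for blk in l2 for b in subdivide(blk)]
assert (len(l1), len(l2), len(l3)) == (6, 54, 486)
```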

Figure 3.1: (a) An occluded facial image. (b) Image division into 6 parts. (c) Image division into 54 smaller parts. (d) Image division into 486 parts.

Figure 3.2: (a) An occluded facial image. (b) Image division into blocks. (c) Each black block represents an occluded block.

3.2.4 Occlusion detection for each block

The occlusion in each block is detected using skin color information. If a pixel is not a skin pixel, it is marked as an occluded pixel. If at least 25% of the pixels in a block are non-skin pixels, the block is marked as an occluded block. A sketch combining these steps is given below.
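Combining the white-pixel replacement of Section 3.2.1, the skin classifier of Section 2.3, and the 25% rule above gives a per-block detector along these lines (a Python/NumPy sketch with assumed names; skin_mask, level1_blocks, and subdivide refer to the earlier hypothetical sketches).

```python
import numpy as np

def occluded_blocks(rgb, blocks, threshold=0.25):
    """Return the subset of blocks whose non-skin pixel ratio >= threshold.
    rgb: (H, W, 3) float image with channels in [0, 1]."""
    img = rgb.copy()
    white = np.all(img > 190 / 255.0, axis=-1)   # Section 3.2.1: white counts as occlusion
    img[white] = 0.0                             # replace white pixels with black
    skin = skin_mask(img)                        # Section 2.3 classifier
    flagged = []
    for (top, left, h, w) in blocks:
        patch = skin[top:top + h, left:left + w]
        if np.mean(~patch) >= threshold:         # >= 25% non-skin pixels
            flagged.append((top, left, h, w))
    return flagged
```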

3.3 Occluded face reconstruction

After facial occlusion detection, a column vector is created that contains only the non-occluded parts of each image. These column vectors are stored in a matrix that contains the corresponding non-occluded parts of all facial images in the database. Each image of the database is also converted into a vector and stored in a matrix; if there are 100 images in the database, this matrix contains 100 vectors. The mean of the non-occluded matrix is calculated and subtracted from each of its vectors, and likewise the mean of the original facial matrix is calculated and subtracted from each of its vectors. This produces datasets whose means are zero. The covariance of the non-occluded facial matrix is calculated as described in Section 2.1.1, and the eigenvectors and eigenvalues of the covariance matrix are calculated using the SVD. An eigenspace is constructed from the non-occluded parts of the images, and a pseudo eigenspace is constructed from all parts of the images in the database. Projection is used to extract the coefficients from the eigenspace; these coefficients are then used for reconstruction of the facial images. A fixed number M = 50 of eigenvectors is used for the reconstruction; the choice M = 50 was found by initial experiments. The final facial image data is constructed using Eq. 2.15. In the last step, each vector of the matrix is reshaped to recover the R, G, and B values of each image and thereby reconstruct the facial images.

3.3.1 PSNR calculation

The PSNR between the input image and the reconstructed image is calculated to check the quality of the reconstructed image. If the PSNR is above 30, the reconstructed image is normally considered to be of good quality (Wikipedia, 2012).


Chapter 4

Experiment

4.1 Granularity effect

This experiment examines the effect of the granularity of the occlusion detection on the apca reconstruction process. In the first step, the image is divided into 6 parts and the occlusion of each facial part is determined. The non-occluded parts of the image are used to construct the eigenspace, whereas the entire image is used to construct the pseudo eigenspace. In the second step, the image is first divided into 6 parts and the occlusion is determined for each block; if a part is occluded, it is further divided into 9 sub-parts and the occlusion detection is repeated. In the third step, the image is first divided into 6 parts and then each part into 9 sub-parts based on the occlusion detection; the occlusion of each of these sub-parts is determined, and any occluded block is further divided into 9 sub-parts for which the occlusion is determined again. These small parts are used to construct the eigenspace, and the entire image is used to construct the pseudo eigenspace.

4.1.1 Metric

PSNR is used as the metric for the granularity experiment. PSNR is calculated both for the entire image and for only the reconstructed part of the image. The number of non-occluded pixels used for encoding in each experiment is also recorded.

4.1.2 Sunglasses scenario

In this scenario, the mask input image is occluded by sunglasses. The image is divided into sub-parts, the occlusion is detected for each of these parts individually, and the full faces are reconstructed using the apca image reconstruction method. The average PSNR of all reconstructed faces is calculated to determine the quality of the reconstructed facial images, and the average PSNR of all reconstructed occluded parts is calculated as well. Furthermore, the number of pixels used in the reconstruction process and the time taken by each division method are recorded.

In Figure 4.1, image (a) is the original image, (b) is the input mask image occluded with sunglasses, and (c) represents the two eigenspaces. The occluded input mask image (b) is used in the three test cases below.

Figure 4.1: (a) Non-occluded facial image. (b) An occluded image. (c) Eigenspaces.

The green ellipse in Figure 4.1 (c) represents the pseudo eigenspace, which is constructed from the non-occluded images, such as image (a), and the non-occluded parts of the occluded images. The blue ellipse represents the eigenspace constructed from the non-occluded parts of the occluded images.

Level 1 image division

In the level 1 image division method, the mask input image is divided into 2 head parts, 2 eye parts, and 2 mouth parts, see Figure 4.2 (b). The occlusion of each part is detected separately, and the full faces are reconstructed as described in Section 3.3. In Figure 4.2, image (a) shows the mask input image occluded with sunglasses, image (b) shows the division into 6 parts, and in image (c) the areas marked in black represent the detected occlusion in the eye parts. Note that dividing the image into only 6 parts does not detect all of the occlusion, and some non-occluded regions are also considered occluded. The background regions in the 2 mouth parts are not detected by the level 1 image division method.

The reconstruction results of level 1 image division can be seen in Figure 4.3. The reconstructed image has some circles around the eyes. This is caused by images with eyeglasses in the database; the corresponding eigenvectors leave imprints on the reconstructed images. After reconstruction, the average PSNR is calculated both for the complete reconstructed faces and for the occluded reconstructed regions only. Furthermore, the number of pixels used in the reconstruction process is recorded. If more pixels are used in the reconstruction, the reconstructed images should be better, with a higher average PSNR value.

Level 2 image division

In the level 2 image division method, the 6 parts of level 1 are further divided into 9 sub-parts each, see Figure 4.4 (b). Each of these parts undergoes the occlusion detection process, and apca is applied to reconstruct the facial images. In Figure 4.4, image (a) shows the mask input image occluded with sunglasses, image (b) shows the division into 54 sub-parts, and in image (c) the black blocks represent the detected occlusions. The white background area that is not part of the mouth is considered an occlusion; this background occlusion is detected when the image is divided into smaller parts. The level 2 image division method also marks some occluded area as non-occluded, see Figure 4.4 (c), where some parts of the sunglasses are marked as non-occluded. Figure 4.5 shows an example of image reconstruction using level 2 image division.

Note that in Figure 4.5 there are prominent circles around the eyes, and the black background areas near the cheeks are not reconstructed well.

Level 3a image division

In the level 3a image division method, the 54 parts of level 2 are further divided into 9 sub-parts each, see Figure 4.6 (b). The complete image is thus divided into 486 very small parts, and the occlusion is detected for each part separately. After occlusion detection, apca is applied to reconstruct the faces. Due to the very small size of each part, very small occlusions can also be detected. In Figure 4.6, image (a) shows the mask input image occluded with sunglasses, image (b) shows the division into 486 sub-parts, and in image (c) the black blocks represent the detected occlusions. Figure 4.6 (c) shows that almost all of the facial occlusion is detected, but some non-occluded areas are also marked as occluded: hair and eyebrows are marked as occluded. Figure 4.7 shows the face reconstructed by level 3a image division. The quality of the reconstructed image is better than for levels 1 and 2, with fewer imprints of eyeglasses around the eyes.

Level 3b image division

In the level 3b image division method, the 6 parts of level 1 are further divided into 9 sub-parts each, and the occlusion is detected for each of these parts separately. If a part is occluded, it is further divided into 9 sub-parts, see Figure 4.8 (c). The occlusion is detected for these very small parts and apca is applied to reconstruct the faces (a code sketch of this recursive refinement follows at the end of Section 4.1.3). In Figure 4.8, image (a) shows the mask input image occluded with sunglasses, image (b) shows the occlusions detected by level 2 image division, marked in black, image (c) shows how the occluded area detected by level 2 division is further divided into sub-parts, and image (d) shows the occlusion detection by the level 3b image division method. Note that the background and sunglasses occlusions are detected, while considerably less non-occluded area is wrongly marked as occluded. From Figure 4.8 (d), we can see that the nose and cheek areas near the sunglasses, which were marked as occluded in Figure 4.8 (b), are now marked as non-occluded. Figure 4.9 shows an example of image reconstruction using this method.

4.1.3 Scarf scenario

In this scenario, the input image is occluded by a scarf so that the entire mouth area is occluded. The image is divided into sub-parts, the occlusion is detected for each of these parts individually, and the full faces are reconstructed using the apca method. The average PSNR of all reconstructed faces is calculated to determine the quality of the reconstructed facial images, and the average PSNR of all reconstructed occluded parts is calculated as well. Furthermore, the number of pixels used in the reconstruction process and the time taken by each division method are recorded. Figures 4.10 to 4.17 show the 4 image division methods applied to the mask input image occluded with a scarf, the occlusion detected by each of these methods, and the faces reconstructed using the 4 image division methods with apca.
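The two-stage level 3b detection described above can be written as a short recursion (a Python sketch under the same assumptions as the earlier snippets; occluded_blocks and subdivide are the hypothetical helpers from Chapter 3, and the depth parameter is an interpretation of the refinement steps).

```python
def detect_level3b(rgb, blocks, depth=1):
    """Recursively refine occlusion detection: blocks flagged as occluded are
    split into 9 sub-blocks and re-tested, up to `depth` refinement levels."""
    flagged = occluded_blocks(rgb, blocks)
    if depth == 0:
        return flagged
    refined = []
    for block in flagged:
        refined.extend(detect_level3b(rgb, subdivide(block), depth - 1))
    return refined

# Level 3b: test the 54 level-2 blocks, then re-test occluded ones at 1/3 size.
# mask_blocks = detect_level3b(image, l2, depth=1)
```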

Figure 4.2: (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.

Figure 4.3: An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.2 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.4: (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.

Figure 4.5: An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.4 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.6: (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.

Figure 4.7: An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.6 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.8: (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.

Figure 4.9: An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.8 (d). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.10: (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.

Figure 4.11: An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.10 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.12: (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.

Figure 4.13: An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.12 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.14: (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.

Figure 4.15: An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.14 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.16: (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.

Figure 4.17: An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.16 (d). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.18: (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.

Figure 4.19: An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.18 (c). (c) Reconstructed image. (d) Non-occluded image.

4.1.4 Cap and sunglasses occlusion

In this scenario, the head is covered by a cap and the eyes are covered by sunglasses. The mouth parts contain some background occlusion, so some or all areas of all 6 parts of the mask input image are occluded. The input image is divided into parts, the occlusion is detected for each part, and apca is applied to reconstruct the faces. The average PSNR of the complete reconstructed images, and of only the occluded reconstructed parts, is calculated to determine the quality of the reconstructed images. The number of pixels used in the reconstruction process is recorded to determine the effect of the non-occluded pixels on the quality of the reconstructed faces, and the processing time of the apca process is also recorded. Figures 4.18 to 4.25 show the 4 image division methods applied to the mask input image occluded with cap and sunglasses, the occlusion detected by each of these methods, and the faces reconstructed using these image division methods with apca.

Figure 4.20: (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.

Figure 4.21: An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.20 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.22: (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.

Figure 4.23: An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.22 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.24: (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.

Figure 4.25: An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.24 (d). (c) Reconstructed image. (d) Non-occluded image.

4.2 Pre-defined eigenspaces

In this experiment, 6 different pre-defined eigenspaces are created, and the pseudo eigenspace is constructed for each of them on all 116 images. The pre-defined eigenspaces correspond to different kinds of sunglasses occlusions. When an occlusion is detected, the pre-defined eigenspace with the least difference between the detected occlusion and the pre-defined occlusion is selected, and this eigenspace is used to reconstruct the image with apca. The closest eigenspace is selected based on the positions of the occlusion in the eigenspace and in the detected occlusion: if the occlusion of a pixel is the same in both versions, the score is 0, and if they differ, the score is 1. The eigenspace with the lowest total score is selected.

4.2.1 Metric

PSNR is used as a metric in two ways: it is calculated for the entire image and for only the reconstructed parts. This only needs to be done for the 6 different pre-defined occlusions. The number of non-occluded pixels used for encoding in each experiment is also recorded.

4.2.2 Experiment description

The pre-defined eigenspaces are constructed and saved to storage. These eigenspaces are created by dividing the occluded images of Figure 4.26 as described in Section 4.1.2. A pseudo eigenspace and 6 eigenspaces, one for each of the images in Figure 4.26, are constructed and saved. A vector containing the occlusion information of each part is also created and saved for later use: if a part is occluded, 1 is stored in the corresponding vector element, otherwise 0. The occlusion of the mask input image is detected following Section 4.1.2, and a vector containing the occlusion information of each part is created. This vector is compared to each vector of the pre-defined eigenspaces to count the number of occluded parts that have the same position in both the input mask image and the image used in the construction of the pre-defined eigenspace. The eigenspace with the maximum number of matching occlusion positions is selected for the reconstruction of the facial images (see the sketch below). The average PSNR of the complete reconstructed facial images, and of the occluded reconstructed areas, is calculated to determine the quality of the reconstructed facial images. The time taken to perform the apca operation is recorded to determine the efficiency of the pre-defined eigenspaces. The 6 faces with sunglasses occlusion that are used in the construction of the 6 pre-defined eigenspaces can be seen in Figure 4.26. In Figure 4.27, image (a) is the mask input image, image (b) shows the occlusion detection by level 3b image division, image (c) shows the pre-defined eigenspace selected based on the detected occlusion in (b), and image (d) shows the reconstructed image using that pre-defined eigenspace.
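The selection rule described above amounts to comparing binary occlusion vectors position by position. A minimal Python/NumPy sketch (names and data layout are assumptions):

```python
import numpy as np

def select_eigenspace(detected, predefined):
    """detected: (K,) binary occlusion vector of the input mask image;
    predefined: (6, K) binary occlusion vectors, one per pre-defined eigenspace.
    Returns the index of the eigenspace whose occlusion pattern matches the
    detected occlusion in the most part positions (lowest mismatch score)."""
    matches = (predefined == detected).sum(axis=1)   # matching positions per space
    return int(np.argmax(matches))

# Example: idx = select_eigenspace(detected_vector, stored_vectors)
```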

Figure 4.26: Occluded facial images used for construction of the 6 eigenspaces.

Figure 4.27: (a) An occluded image. (b) Detected occlusion by level 3b image division. (c) Pre-defined eigenspace most similar to the detected occlusion in (b). (d) Reconstructed image using the eigenspace in (c).

Chapter 5

Results

This chapter describes the results of the experiments. It is divided into three parts. The first part discusses the results of the 4 image division methods for automatic occlusion detection, with images showing the output of each method. The second part discusses the reconstruction results based on the 4 occlusion detection methods, with tables containing the average PSNR values for the reconstructed faces and for the reconstructed areas only, as well as the processing time of each image division method. The third part discusses the pre-defined eigenspaces, to determine their efficiency and reconstruction quality; tables with the processing times for reconstruction with and without pre-defined eigenspaces and the average PSNR values of the reconstructed faces are presented and discussed.

5.1 Occlusion detection results

Figure 5.1 shows the occlusion detection by the different image division methods: image (a) is the mask input image occluded with sunglasses, image (b) shows the occlusion detection by level 1 image division, (c) by level 2, (d) by level 3a, and (e) by level 3b. The grey blocks represent the marked occluded areas.

In the level 1 image division method, the complete image is divided into 6 large parts. Since a part is only marked as occluded if at least 25% of its area is occluded, and each part is large, little occlusion is detected. Image (b) shows that the occlusion in both eye parts is detected. The white background in the mouth parts is also an occlusion, but it is not detected because it covers less than 25% of the corresponding parts. Image (b) also shows that some non-occluded area in both eye parts is marked as occluded.

In the level 2 image division method, the size of each part is smaller, so small occlusions can be detected. Image (c) shows that the eye occlusions and the background occlusions in the mouth parts are detected, while less occluded area is marked as non-occluded. But the parts are still fairly large: some non-occluded area is marked as occluded, and fewer pixels are available for the reconstruction process. Of all the methods compared in the many experiments performed, the level 3a image division showed the best occlusion detection results.

Figure 5.1: Occlusion detection by different image division methods. (a) Occluded image. (b) Occlusion detection by level 1 image division. (c) Occlusion detection by level 2 image division. (d) Occlusion detection by level 3a image division. (e) Occlusion detection by level 3b image division.

In the level 3a image division method, the size of each part is very small, so very small occlusions can be detected. Image (d) shows that almost all of the occlusion is detected, while some non-occluded area is marked as occluded: the eye and background occlusions are marked correctly, but eyebrows and hair are also marked as occlusions.

Occlusion detection by level 3b image division proceeds in two steps. In the first step, the image is divided as described in Section 4.1.2 and the occlusion is detected for each part. This detects the small occlusions, but some non-occluded area is marked as occluded. In the second step, the occluded area marked in the first step is further divided into sub-parts and the occlusion is detected for each sub-part. In this way, the non-occluded areas that were marked as occluded in the first step are now marked as non-occluded, and more pixels become available for the reconstruction of the faces, see Figure 5.1 (e). Level 3b is thus also a good occlusion detection method.

5.2 Reconstruction quality results

The quality of the reconstructed faces is measured by PSNR. The average PSNR is calculated both for the complete reconstructed faces and for the reconstructed occluded parts only. Table 5.1 shows the average PSNR of the complete reconstructed faces and Table 5.2 shows the PSNR for the reconstructed occluded parts. In Tables 5.1, 5.2, 5.3, and 5.4, Level 1 denotes reconstruction of faces by level 1 image division, Level 2 by level 2 image division, Level 3a by level 3a image division, and Level 3b by level 3b image division. The number of pixels used in the reconstruction of the faces is recorded to determine the impact of the number of non-occluded pixels on the quality of the reconstructed faces, and the processing time of each image division method is also recorded.

Table 5.1 contains the average PSNR values of all 116 reconstructed faces for the 3 types of occlusion. The level 1 image division has the highest average PSNR for the sunglasses occlusion, whereas the level 3a image division has the highest average PSNR for the scarf and the cap & sunglasses occlusions. Table 5.2 contains the average PSNR values of the reconstructed occluded parts only. The level 1 image division has the highest average PSNR for the sunglasses and cap & sunglasses occlusions, while level 3a has the highest average PSNR for the scarf occlusion. Table 5.3 contains the number of non-occluded pixels used in the reconstruction of the facial images; the quality of the reconstructed faces generally increases with the number of non-occluded pixels.

Table 5.1: Reconstruction quality of the complete image (PSNR) [dB] for the granularity effect

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses              23.46     23.22      23.19      23.33
Scarf                   19.85     19.85      20.01      19.87
Cap and sunglasses      19.95     20.32      20.38      20.34

Table 5.2: Reconstruction quality of the occluded reconstructed parts (PSNR) [dB] for the granularity effect

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses              23.30     20.99      20.89      20.78
Scarf                   18.46     18.55      18.77      18.54
Cap and sunglasses      21.66     19.09      18.88      18.99

Table 5.3: Number of pixels used in reconstruction

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses              50544     49464      47640      53496
Scarf                   42768     42768      43512      45264
Cap and sunglasses      31104     42768      44736      46656

Table 5.4: Processing time (sec) for the granularity effect

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses              24.04     37.77      40.33      43.60
Scarf                   22.53     36.81      38.80      41.48
Cap and sunglasses      25.93     38.71      40.58      41.06

Figure 5.2: Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.

Figure 5.3: Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.

Table 5.4 contains the processing times of the 4 image division methods applied to the 3 types of occlusion. The results show that the level 1 image division takes the least processing time, whereas the level 3b method takes the most. The processing time thus depends on the image division: when the image parts are large, reconstruction takes less time, and when the parts are small, it takes more time.

Figure 5.2 shows a single image reconstructed using the different image division methods. Image (a) is an occluded image. Image (b) shows a reconstruction of good quality, except for some faint circles around the eyes. Image (c) shows a reconstruction of poorer quality, with prominent circles around the eyes; the white background area is also not reconstructed well. Images (d) and (e) show reconstructions of good quality, with only faint circles around the eyes, and image (f) is the non-occluded image. The visual evaluation and the average PSNR values of the reconstructed images show that the level 3a image division generates the images with the highest quality of all the image division methods.

5.3 Reconstruction results using pre-defined eigenspaces

Six pre-defined eigenspaces were constructed using six sunglasses occlusion masks, where the occlusion vector was created by level 3a image division. The occlusion of the mask input image is detected and, based on the detected occlusion, the closest eigenspace is selected for the reconstruction process. The average PSNR of the reconstructed faces is calculated to determine their quality, and the processing time is recorded to determine the efficiency of the pre-defined eigenspaces. Many experiments were performed, and the results showed a remarkable decrease in processing time with negligible quality loss of the reconstructed faces.