3D Face Recognition
How to make a fast and reliable database and compare the database with 2D+3D facial input?

M.A. Huijbregts & B. Stobbe

Thesis for the degree of Bachelor of Science in Electrical Engineering
Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS)
Delft University of Technology
Circuits and Systems
July 3, 2013

Copyright © Circuits and Systems (CAS). All rights reserved.

Abstract

A 3D face recognition algorithm has been developed for the Microsoft Kinect during the final bachelor project at Delft University of Technology in 2013. The aim of the project was to develop a prototype face recognition system that outperforms an existing 2D face recognition system. The main goal, developing a 3D face recognition system, was divided into three parts: data acquisition, data processing and data comparison, each developed by a group of two students. Data comparison is the main topic of this thesis.

Nowadays, security is an increasingly important topic in society, and 3D face recognition can contribute to making the world a safer place. This thesis explores the new 3D techniques in chapter 4, but first gets familiar with the 2D world in chapter 3. Our results showed that a 2D system is not accurate enough to use as a face recognition system: it is too sensitive to differences in lighting, poses and facial expressions. The results achieved with our proposed 3D system, by contrast, were fast and accurate: the system had five correct matches out of the possible six, so only one person was not recognized, with a process time of 1.19 seconds. These results could not be tested extensively within the period in which this thesis was created. It is concluded that 3D face recognition works accurately with geometry and normal map input combined with the Haar-Walsh transform and the angle-based distance, even though the amount of data is not yet sufficient. It even works with only the geometry image as input, both the Haar-Walsh and the Haar transform, and the angle-based distance; these two options gave the same result. A future task is therefore to increase the amount of data to confirm our algorithm.

Preface

The bachelor Electrical Engineering at the Delft University of Technology ends with the "Bachelor Afstudeer Project" (BAP); the last quarter of the final year was fully spent on this project. Our project group consisted of six Electrical Engineering students, all from an older curriculum. The main goal was to develop a 3D face recognition system in nine weeks. We were split up into three groups of two students: a data acquisition, a data processing and a data comparison group. This thesis covers the data comparison part.

Acknowledgements

We would like to thank our supervisors S. Mandai MSc and Prof. Dr. E. Charbon for their support during the project. Special credits also go to Harald Homulle BSc, Arie Stobbe MSc, dr. Angela van Heerwaarden and Jan Stobbe BBA for their feedback on our thesis. Further, the Circuits and Systems group deserves special thanks for the generous financial support to purchase a Microsoft Kinect for Windows.

Delft University of Technology, July 3, 2013

Table of Contents

Abstract
Preface
Acknowledgements
Glossary
   List of Acronyms
   List of Symbols
1 Introduction
2 Program of Requirements
   2.1 Requirements with regard to the intended use of the product
   2.2 Requirements with regard to the design of the system
   2.3 Requirements with regard to the (ecological) situation of the system in its surroundings
   2.4 Requirements with regard to the production process
3 2D facial recognition
   3.1 Principal Component Analysis
   3.2 Matching
   3.3 Results
      3.3.1 Our own database
      3.3.2 BioID database
      3.3.3 FERET database
   3.4 False Acceptance Rate & False Recognition Rate
4 3D facial recognition
   4.1 Transform
      4.1.1 Haar Wavelets
      4.1.2 Combining Walsh transform with Haar transform
   4.2 Different distance measurements for 3D matching
   4.3 Results
      4.3.1 The different techniques
      4.3.2 The process time
      4.3.3 Summary and overview
   4.4 The choice
   4.5 Future
5 Face recognition system development
   5.1 Database
      5.1.1 Prototype
      5.1.2 Final System
   5.2 Graphical User Interface
      5.2.1 Prototype
      5.2.2 Final System
6 Conclusion
   6.1 Conclusion
      6.1.1 2D algorithm
      6.1.2 3D algorithm
   6.2 Future work
A Our 3D algorithm using Geometry image, Haar-Walsh and Haar Transform and angle-based distance
B Our 3D algorithm using Geometry and Normal map image, Haar-Walsh Transform and angle-based distance
Bibliography

List of Figures

3.1 Some results from the test on our own database
3.2 The spreading of the distances; wrong and good ones
4.1 A lay-out of the matching part of the system
4.2 The input for the matching algorithm
4.3 Three-scale standard wavelet decomposition of an image: (a) image Lena.bmp; (b) result of decomposing; (c) structure of decomposing
4.4 Tree diagram of the different techniques in the test
5.1 Database diagram
5.2 Various stages of the GUI


List of Tables

3.1 Percent of images correctly recognized for the FERET Dup. I and fafb test [1]
3.2 Results of the BioID database tests
3.3 Results of the FERET database tests
4.1 Comparison of different compression techniques
4.2 First step of the Haar wavelet
4.3 Full decomposition of the image
4.4 Results of the first 3D test. For each different technique the correct matches
4.5 The process time of the different blocks in the algorithm, including a description of each block
4.6 Results of the first 3D test. For each test subject the corresponding matches with the database
4.7 The improvement in speed of the new algorithms
6.1 The different system requirements that are accomplished in the present prototype


Glossary

List of Acronyms

DBMS    Database Management System
DCT     Discrete Cosine Transform
DFT     Discrete Fourier Transform
FAR     False Acceptance Rate
FERET   Face Recognition Technology
FRR     False Recognition Rate
GUI     Graphical User Interface
PCA     Principal Component Analysis
RDBMS   Relational Database Management System
RGB     Red, Green and Blue


"Don't worry when you are not recognized, but strive to be worthy of recognition."
Abraham Lincoln

Chapter 1: Introduction

Security has been an increasingly important issue for several years. Security affects everyone, and everyone wants to reduce the risk of getting hurt. A good example is the trouble caused by hooligans during soccer matches: in tackling these hooligans, innocent football supporters are affected too. With cameras an attempt is made to minimize this problem. This is done with 2D cameras, which have the great disadvantage that they are sensitive to varying poses, expressions and lighting. As a result, the hooligans may not be identified or, even worse, innocent people may be held responsible.

In our Bachelor Afstudeer Project (BAP) we recognize faces with the help of depth images. This gives the pixels of a 2D image, next to an x and y coordinate, also a depth coordinate, so that a 3D image is created. Newer techniques work with the "Time of Flight" principle, which is more accurate, but our pictures were made with a Microsoft Kinect camera [2, 3], which can make depth images by triangulation [2] only; so triangulation has been used.

To work with 3D images, we first had to get familiar with 2D images and various algorithms [4-15]. This was the task of the first weeks of the project. A. Almuhamadi, master student Media and Knowledge Engineering, did research on 2D recognition last year [16]. However, 2D recognition is very sensitive to varying lighting, poses and expressions [17] and is not accurate enough for recognizing people. 3D face recognition is an alternative to 2D face recognition that is capable of recognizing people. The state of the art of 3D face recognition can be found in the literature [1, 9, 13, 18-20] and on the market [21]. The major player on the market is Artec ID from Russia [22]; they make complete systems for face recognition and present their product at conferences. 3D face recognition is valuable for the security market, human computer authorization and computer security.

The thesis is structured as follows. In chapter 2, the program of requirements for the data comparison part of the project is listed. 2D facial recognition and the algorithms used for it are discussed in chapter 3. The subject of chapter 4 is the decomposition of data with wavelet functions and the results of the 3D matching algorithm. The face recognition system development is discussed in chapter 5. Suggestions for future work and conclusions are provided in chapter 6.

Chapter 2: Program of Requirements

Before designing a product, in this case the 3D facial recognition system, it is recommended to first define its specifications and requirements. Besides creating a good basis for the design, this also informs those not directly involved in the design process about the features of the final goal. The project is split into three parts: data acquisition, data processing and data comparison; the last part also includes the implementation of a database and of a graphical user interface. The program of requirements discussed here concerns this third part.

2.1 Requirements with regard to the intended use of the product

(2.1.1) The algorithm must compare incoming facial data with facial data from a database.
(2.1.2) The algorithm must compute a Matching Rate, a False Acceptance Rate (FAR) and a False Recognition Rate (FRR).
(2.1.3) The database must be able to store data from 6 persons.
(2.1.4) The database must be able to store a set of coefficients from a 3D model of every single person.
(2.1.5) The Graphical User Interface (GUI) must be user-friendly and visualize the matched name of the person.

2.2 Requirements with regard to the design of the system

(2.2.1) The database must be able to store at least 5 megabytes of data.
(2.2.2) The execution time of the matching algorithm may not exceed 2 seconds.

2.3 Requirements with regard to the (ecological) situation of the system in its surroundings

This algorithm exists only in a digital environment and has no direct impact on nature. The only impact arises if the algorithm is used on big databases, with supercomputers delivering the power for the calculations; supercomputers consume a lot of energy, and this energy is probably not clean. Privacy is an issue in the matching part, because it connects facial data with a name. The database must therefore be offline and secured.

2.4 Requirements with regard to the production process

(2.4.1) The algorithm will be tested and run in MATLAB.
(2.4.2) The database will be connected with MATLAB.
(2.4.3) The GUI will be implemented and run in MATLAB.

Chapter 3: 2D facial recognition

Video security systems are a well-known concept in daily life; the systems in use today are 2D systems. This chapter introduces a 2D face recognition system based on Principal Component Analysis (PCA), focusing on the matching and distance measurements, the last step of the total algorithm. Two techniques were used for matching a test picture against the database: weighted angle-based distance and Euclidean distance.

3.1 Principal Component Analysis

The choice for PCA was made by the data processing group; the main reasons were that it is easy to implement and that its distance measurements can also be used in the final 3D algorithm. It is known that this technique does not provide the best results, but the main goal here is to practice and get used to the basics. There are better and faster algorithms, but they take more time to understand and implement, and 2D face recognition is not our final goal.

According to [23], PCA is a useful statistical tool for identifying patterns in data. For high-dimensional data, where a graphical representation is not easy to obtain, PCA is a powerful tool. Another advantage of PCA is data compression: by finding patterns it reduces the number of dimensions in the data, with only a small loss of information. Both properties are useful for face recognition.

The technique PCA uses is eigenfaces: eigenfaces are generated from a database of images of persons, and those eigenfaces are saved. Eigenfaces can be seen as a set of standardized face ingredients; each face in the database can be reconstructed as a linear combination of eigenfaces, described by a feature vector. These feature vectors are used for the comparison and the recognition, and they are the input of the matching algorithm. The requirement on these vectors is that they must be real at all times, and the eigenvalues that are used must be real and positive too.
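To make this step concrete, below is a minimal MATLAB sketch of eigenface computation via PCA. It uses the common "snapshot" trick of diagonalizing the small Gram matrix A'A instead of the full covariance matrix; all variable names are illustrative assumptions, not the project code.

```matlab
% Minimal eigenface sketch (illustrative names, not the project code).
% imgs: (h*w) x m matrix with one vectorized gallery image per column.
meanFace = mean(imgs, 2);
A = imgs - repmat(meanFace, 1, size(imgs, 2));   % subtract the mean face
[V, D] = eig(A' * A);                            % small m x m eigenproblem
[lambda, idx] = sort(diag(D), 'descend');        % real, positive eigenvalues;
                                                 % later used as weights 1/lambda
eigenfaces = A * V(:, idx);                      % map back to image space
for j = 1:size(eigenfaces, 2)                    % normalize each eigenface
    eigenfaces(:, j) = eigenfaces(:, j) / norm(eigenfaces(:, j));
end
features = eigenfaces' * A;                      % one feature vector per image
```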

3.2 Matching

The final step of a facial recognition system is the matching: the comparison of a picture with the whole database. This is done by measuring the distance between two eigenfeature vectors, one of the test picture and one for each picture in the database. Each test picture is thus compared with each person in the database, and the one with the smallest distance is the match. The first database was a database with pictures of all six persons in the project group.

The choice of distance measurement was based on [1, 24]. In these surveys, fourteen different distance measurements are compared with each other, and it is concluded that four methods perform best: weighted angle-based, L1 (Manhattan distance), L2 (Euclidean distance) and simplified Mahalanobis. All four measurements were tested with a part of the Face Recognition Technology (FERET) database [25]. FERET is a large database containing 1196 persons; each image contains a single face. Of each test person there are a number of pictures, taken with different lighting, angles and expressions, some on a different day. The database contains four probe sets:

- The fafb probe set contains 1195 images taken at the same time as the gallery picture; the only difference is the facial expression.
- The fafc probe set contains images taken under significantly different lighting; it is the hardest probe set and contains only 194 probe images.
- The duplicate I probe set covers 722 images of the subjects, taken between one minute and 1031 days after the gallery picture.
- The duplicate II probe set is a subset of the duplicate I probe set in which the probe image is taken at least 18 months after the gallery image.

In the survey [1], the four distance measurements were tested with a gallery of five hundred randomly chosen persons from the FERET database on the duplicate I and fafb probes, as shown in Table 3.1. From the results it can be concluded that the Mahalanobis method is the best one overall, but that L1 (Manhattan distance) is better on the fafb probe.

Table 3.1: Percent of images correctly recognized for the FERET Dup. I and fafb test [1].

Classifier     Dup. I   fafb
L1             35       77
L2             33       72
Angle          34       70
Mahalanobis    42       74

After testing on our own database, the weighted angle-based distance measurement gave the best results. This contrasts with the FERET database test above, but agrees with [24], which states that for smaller databases the weighted angle-based distance is indeed better. Since this 2D system is a preliminary stage on the way to the 3D system, there was not much time to investigate further. In the final 2D system, two distance measurements were used for our tests: the weighted angle-based distance and the Euclidean distance (a MATLAB sketch of both is given below). The distances are calculated with the following two formulas [24], where X and Y represent eigenfeature vectors of length n and the distances are computed in real space (R).

(1) Euclidean distance (L2 metric):

\[ d(X,Y) = L_2(X,Y) = \lVert X - Y \rVert = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{3.1} \]

According to [26], the Euclidean distance is the theorem of Pythagoras applied to a distance in two dimensions. Each point can be represented by a vector, and the Euclidean distance is the length of the difference of the two vectors.

(2) Weighted angle-based distance:

\[ d(X,Y) = -\cos(X,Y) = -\frac{\sum_{i=1}^{n} z_i\, x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}, \qquad z_i = 1/\lambda_i \tag{3.2} \]

Here the λ_i are the corresponding eigenvalues of the covariance matrix, computed in the PCA algorithm. The formula is derived from the Euclidean dot product and has a range between -1 (perfect match) and 1 (no match). The minus sign is used so that the best match is the minimum distance; otherwise the maximum distance would be the best one, which does not fit our definition of the best match.

\[ a \cdot b = \lVert a \rVert\, \lVert b \rVert \cos\theta \quad\Rightarrow\quad \cos\theta = \frac{a \cdot b}{\lVert a \rVert\, \lVert b \rVert} \tag{3.3} \]

3.3 Results

Once the algorithm was implemented, the next step was testing it. Every method has its disadvantages, and so does PCA: many different pictures of one person are needed to ensure recognition, which results in more calculations and a higher process time. 2D face recognition itself also has some known problems: illumination of the pictures, different angles and facial expressions. With these disadvantages in mind, the following tests were chosen:

- Test our own database of six persons with different pictures in the gallery.
- Test the BioID database with a large gallery of 1521 pictures of 23 persons.
- Test the FERET database with the different probe sets.
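The following MATLAB sketch implements the matching of section 3.2 with Eqs. (3.1) and (3.2). It is a minimal illustration under the assumption that feature vectors are stored column-wise; the names are illustrative, not taken from the project code.

```matlab
% Minimal matching sketch for Eqs. (3.1) and (3.2); illustrative names.
% gallery: n x m matrix, one eigenfeature vector per column;
% test: n x 1 eigenfeature vector; lambda: n x 1 PCA eigenvalues.
m = size(gallery, 2);
dist = zeros(1, m);
for j = 1:m
    Y = gallery(:, j);
    % Euclidean distance, Eq. (3.1), would be: sqrt(sum((test - Y).^2))
    % Weighted angle-based distance, Eq. (3.2):
    dist(j) = -sum((1 ./ lambda) .* test .* Y) / (norm(test) * norm(Y));
end
[~, best] = min(dist);    % the smallest distance gives the matched person
```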

3.3.1 Our own database

Our own database contains six persons; there is a gallery folder and a test folder. The test folder contains one test picture per person, and the gallery contains two or three pictures per person. The pictures were taken in the same room, under almost the same conditions. Four of the six test pictures got the right match. This result was predicted, because the pictures in the gallery were not ideal and the gallery was not that large. The reason that there were still four matches is that the samples in the gallery were taken in the same room with the same lighting. An ideal picture is taken with a neutral facial expression, perfect lighting, no angle and no background. Another conclusion was that, because the gallery was small (around eighteen pictures), it is a fast algorithm: it takes less than a second to make a match. Some results of one of the tests can be found in Figure 3.1.

Our own database was also tested with real-time live capture. The results were worse: almost nobody got a right match. So the test was adjusted: the program took ten pictures of the test person, compared all of them with the gallery, summed up the matches, and the person with the most matches was the matched person. This test had good results in the same room, but when the real-time test was done in another room, only one out of six matched with the right person. Due to the shooting of ten pictures it takes between three and ten seconds to get a match; this depends on the detection of two eyes and a nose in the live capture, a task of the data acquisition group.

A clear conclusion can be drawn from the overview of the results: this 2D algorithm based on PCA performs with poor matching rates under different lighting and poses. When the pictures in the gallery are not ideal, with some small angle or expression, the results were also poor.

Figure 3.1: Some results from the test on our own database: (a) a mismatch; (b) a match.

3.3.2 BioID database

The next database is the BioID database [27], a large database containing 1521 gray-level images with a resolution of 384x286 pixels. Each image shows the frontal view of the face of one out of 23 different test persons. When creating the BioID database, special emphasis was laid on real-world conditions; the test set therefore features a large variety of illumination, background and face size.

For testing, a single picture of every test person was taken out of the database; from this set the eigenfaces were derived. The other pictures formed the gallery to compare with, so in this test a big gallery of 1498 pictures of 23 persons was used. Our hypothesis was that the results would be better, because PCA works better with a larger database. There were two test probes: one with 23 pictures and one with 30 pictures. The first probe contains every person once; the second contains some persons twice.

The hypothesis was confirmed: the results in Table 3.2 show high matching rates. If the speed is compared with that of the first, small database, an increase of the process time is observable. While the small database takes less than a second for making the eigenfaces and one match, the BioID database takes more than two minutes (22 + 106 = 128 seconds). Two minutes is too long for real-time recognition; when databases become larger, a smarter way of finding the match is needed.

Table 3.2: Results of the BioID database tests

           Size   Process time (sec)         Matching rate (%)
                  Eigenfaces   One match
Probe 1    23     22           111            91.3
Probe 2    30     22           106            90

3.3.3 FERET database

The last test was on the FERET database. All four probe sets were tested against the whole FERET gallery. Besides the fact that the FERET database is very good and large, using it makes the algorithm easy to compare with algorithms already available. The main difference from the last two tests is that the gallery, the set of pictures from which the eigenfaces are computed, consists of one picture per person, taken under ideal circumstances. The hypothesis for the FERET test is that the PCA algorithm will have low matching rates, since eigenfaces get better with more data, i.e. more than one picture per subject.

Table 3.3: Results of the FERET database tests

               Size    Process time (sec)         Matching rate (%)
                       Eigenfaces   One match
Duplicate I    722     10.5         65             8.59
Duplicate II   234     10.2         65             0.43
fafb probe     1195    10.5         66             44.77
fafc probe     194     10.1         67             4.64

As shown in Table 3.3, the test did not give good matching rates. Only the third probe, the fafb probe set containing 1195 images with a different facial expression, had results that were not bad. The reason the third probe performed better than the others is that the images in the fafb probe are the closest to the gallery pictures. The duplicate II probe has a really low matching rate, from which it can be concluded that our algorithm is not suited for aging: the database must stay up to date. The process time is around 75 seconds, still too slow for live capture, but already an improvement compared with the BioID database. For larger databases a faster and smarter search algorithm is needed, but our own database is small.

If the rates in Table 3.3 are compared with those in the surveys named before, it can be concluded that this algorithm is not performing well. However, the 2D algorithm is just an exercise towards the final 3D facial recognition algorithm, so this is not that important. Besides, a meta-analysis [28] of face recognition algorithms concluded that it is difficult to compare different algorithms and results with each other, because they did not use the same subset or set of subjects.

3.4 False Acceptance Rate & False Recognition Rate

In the last three tests, only a matching rate was used as performance measure. In biometric recognition systems it is standard to also use the False Acceptance Rate (FAR) and the False Recognition Rate (FRR). According to [29], the FAR is the likelihood that the biometric security system will incorrectly accept an access attempt by an unauthorized user; the FRR is the likelihood that the system will incorrectly reject an access attempt by an authorized user. It is not ideal to focus on only one of these terms, because a very low FAR in combination with a high FRR still indicates a poor matching rate. (The standard definitions are recalled below.)

After evaluating 150 distances, 75 good ones (squares) and 75 wrong ones (triangles), the conclusion was that it is very difficult to introduce a threshold value that clearly separates a good result from a bad one. The results of this evaluation can be found in Figure 3.2. The distances are too close to each other, so setting a good threshold for what is right and what is wrong is not easy to determine. Within the given timeframe for investigating 2D algorithms, determining this threshold was too difficult. In the 3D algorithm the FAR and FRR will be measured.

Figure 3.2: The spreading of the distances; wrong and good ones
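For reference, the standard way these two rates are computed for a given decision threshold is sketched below; this formulation is a textbook one and is not spelled out in the thesis itself.

\[
\mathrm{FAR} = \frac{\#\{\text{impostor attempts accepted}\}}{\#\{\text{impostor attempts}\}},
\qquad
\mathrm{FRR} = \frac{\#\{\text{genuine attempts rejected}\}}{\#\{\text{genuine attempts}\}}
\]

Both rates depend on the chosen distance threshold: lowering the threshold decreases the FAR but increases the FRR, and vice versa, which is exactly the trade-off visible in Figure 3.2.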


Chapter 4: 3D facial recognition

There is a growing demand for better facial recognition systems: systems that have fewer or no problems with lighting, different angles and expressions. 3D facial recognition is an upcoming market; the techniques are getting better, the research more complete and the hardware less expensive. This chapter introduces a 3D recognition system, and especially its matching part. The chapter begins with an overview of the whole algorithm. Sections 4.1 and 4.2 go deeper into the parts of the algorithm; section 4.3 contains the first results and a discussion of them. Finally, section 4.5 gives a sketch of the future work.

The term distance measurement will be used often; it means the difference between the pixel values of the test subject and the pixel values of a person in the database. The minimum of a distance measurement is thus the best match.

Figure 4.1: A lay-out of the matching part of the system (the geometry image and normal map image pass through a transform/compression step and a distance computation against the database coefficients; the smallest distance is matched with a name).

The algorithm's inputs are a geometry image and a normal map image; an example is shown in Figure 4.2. The first step is to apply a transform to both inputs; the results of this transform are coefficients. The coefficients are the input of the next block, the matching. The algorithm computes three different distances between the input data and all subjects in the database: angle-based distance, Euclidean distance and Manhattan distance. The last step is the computation of the minimum (the match) and linking a name from the database to that minimum. A lay-out of these steps can be found in Figure 4.1 (a code sketch of this matching block follows below). In this lay-out, the database is already filled; for every entry, the database contains:

- coefficients of a normal map image
- coefficients of a geometry image

The coefficients in the database are also the result of a transform.

Figure 4.2: The input for the matching algorithm: (a) normal map image (the rose color on the left of the image indicates the normal direction opposite to that of the green on the right); (b) geometry image (dark red is the closest point, dark blue the farthest point).
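To make the lay-out of Figure 4.1 concrete, here is a minimal MATLAB sketch of the matching block. The function names, the struct-array database and the way the two per-image distances are combined (a simple sum) are my illustrative assumptions, not the project code.

```matlab
% Sketch of the matching lay-out of Figure 4.1 (all names illustrative).
% db: struct array with fields name, geoCoeffs and norCoeffs, filled in
% advance with transform coefficients for every person.
geoC = faceTransform(geometryImage);   % e.g. Haar or Haar-Walsh coefficients
norC = faceTransform(normalMapImage);
best = Inf; matchName = '';
for k = 1:numel(db)
    d = angleBased(geoC, db(k).geoCoeffs) ...    % distance per input image,
      + angleBased(norC, db(k).norCoeffs);       % combined by summing (assumption)
    if d < best
        best = d;                                % keep the smallest distance
        matchName = db(k).name;                  % and link it to a name
    end
end
```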

4.1 Transform

Before the facial data are stored in the database, they need to be compressed; this saves space and decreases the process time. The chosen transform converts the images into a vector with fewer coefficients: it costs less process time to compare coefficients that represent the image than to compare the image itself. Many different compression techniques are known; Table 4.1 compares a few of them [30-38]: Principal Component Analysis (PCA), Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), the Walsh transform, Haar wavelets and a combination of the last two.

Table 4.1: Comparison of different compression techniques

Technique              Advantages                                  Disadvantages
PCA                    Widely used; little memory                  Not capable of extracting local
                                                                   features; hard to make reliable;
                                                                   lots of pictures needed to make
                                                                   it accurate
DCT                    Widely used; good results; coefficients     Quality loss; uses sinusoidal
                       of small magnitude; lots of information     basis functions; works with
                       in the coefficients; performs well on       separate blocks in the image
                       face recognition
DFT                    Computational efficiency already much       Complex-valued; works with
                       improved; commonly used                     separate blocks in the image
Walsh transform        Various distribution; only additions        Performs badly at low compression
                       and subtractions; fewer computations        rates; blurring near the edges
Haar wavelets          Simplicity; high compression rates;         A relatively new technology
                       well documented
Haar-Walsh transform   Simplicity; fast; good recognition          Lower compression rate than Haar;
                       rates; various distribution                 a relatively new technology

The choice was to do some testing with Haar wavelets, because it is a well-documented, simple and fast algorithm with high compression, as shown in Table 4.1. Next to Haar wavelets, the Haar-Walsh transform was tested, because it takes the advantages of both techniques; its only downside is that it takes more space in the database. Both techniques are commonly used in image compression and facial recognition. The reason for our choice is the focus on speed, simplicity and good compression, with a very small loss of data.

4.1.1 Haar Wavelets

The Haar basis is the simplest wavelet basis [39]. Haar bases are used in image compression, image editing and image querying; for this project, only image compression is important. To get a feeling for what the algorithm does, here is an example. Suppose an image with a resolution of four pixels, having the following values:

[15 11 6 8]

This image can be represented in the Haar basis by computing a wavelet transform. The first step is to take the average of the pixels pairwise:

[13 7]

Now there are two pixels; the image has a lower resolution. Clearly, some information is lost in this down-sampling process. For that reason, detail coefficients are stored: the difference between the average and the original pairwise pixels. In this example the first detail coefficient is 2, because the computed average 13 is 2 less than 15 and 2 more than 11. The second detail coefficient is -1, since 7 + (-1) = 6 and 7 - (-1) = 8. Summarizing, the resolution is divided by two, and there are two average pixels and two detail coefficients:

Table 4.2: First step of the Haar wavelet

Resolution   Averages       Detail coefficients
4            [15 11 6 8]
2            [13 7]         [2 -1]

This step can be repeated recursively on the averages. When the resolution becomes one, the full decomposition is found, see Table 4.3.

Table 4.3: Full decomposition of the image

Resolution   Averages       Detail coefficients
4            [15 11 6 8]
2            [13 7]         [2 -1]
1            [10]           [3]

Finally, the result of the decomposition is represented by the total average of the original four-pixel image, followed by the detail coefficients in order of increasing resolution:

[10 3 2 -1]

Note that no information is lost: the original image can be retrieved from these coefficients. Storing the wavelet transform instead of the image itself has a number of advantages. In general, for images with a higher resolution the values of neighboring pixels are similar, so the detail coefficients turn out to be very small; removing these small coefficients from the wavelet decomposition therefore introduces only a small error.
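A minimal MATLAB sketch of this one-dimensional decomposition, reproducing the worked example above (illustrative, not the project code):

```matlab
% One full 1D Haar decomposition (unnormalized averages/differences).
% Reproduces the example: haar1d([15 11 6 8]) returns [10 3 2 -1].
function out = haar1d(x)
    n = numel(x);
    out = x;
    while n > 1
        avg = (out(1:2:n) + out(2:2:n)) / 2;   % pairwise averages
        det = (out(1:2:n) - out(2:2:n)) / 2;   % detail coefficients
        out(1:n) = [avg det];                  % averages first, then details
        n = n / 2;                             % recurse on the averages
    end
end
```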

Two-dimensional Haar wavelet transforms

The facial input, a normal map image and a geometry image, is not represented by a single vector but in two dimensions, so the one-dimensional wavelet transform alone is not enough. There are two ways to do a two-dimensional Haar wavelet transform, both generalizations of the one-dimensional transform described above.

The standard decomposition [40] of an image is obtained by first applying the one-dimensional wavelet transform to each row, and then applying it to each column of the transformed rows. The result consists entirely of detail coefficients, except for the single overall average coefficient.

The second type is called the nonstandard decomposition [40] and alternates between operations on rows and columns: first one step of the Haar wavelet is performed on each row of the image, then one step on each column. To complete the transform, this process is repeated only on the quadrant containing the averages in both directions. An example of this process is shown in Figure 4.3 [41].

Figure 4.3: Three-scale standard wavelet decomposition of an image: (a) image Lena.bmp; (b) result of decomposing; (c) structure of decomposing.

The choice between the two types of two-dimensional Haar wavelet was made based on their advantages. An advantage of the first type is that it can be accomplished by simply performing the one-dimensional transform on all the rows and then on all the columns. An advantage of the second type is that it computes the coefficients of an image more efficiently: for an m x m image, the standard decomposition requires 4(m^2 - m) assignment operations, while the nonstandard decomposition requires only (8/3)(m^2 - 1) assignment operations [39]. One of the requirements of the matching algorithm is that it must be fast, which is accomplished by choosing the nonstandard decomposition. Another advantage is that there is a MATLAB function for this second type which can be used.

Our algorithm computes the whole Haar wavelet tree, where the level (the number of steps) can be chosen. The reason for not using the full decomposition is that too much data would be lost. The picture in the left corner with the average values is called the approximation picture, made from the average coefficients; the dark pictures with only the contours are made with the help of the detail coefficients. The algorithm deletes all the detail coefficients and stores only the average coefficients for comparison.
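A sketch of this step in MATLAB is shown below. The thesis only says that a built-in function for the second type exists; using dwt2 from the Wavelet Toolbox is my assumption.

```matlab
% Sketch: L-level 2D Haar decomposition that keeps only the approximation
% (average) coefficients; assumes the Wavelet Toolbox. Illustrative only.
A = double(img);                         % geometry or normal map image
L = 2;                                   % chosen decomposition level
for level = 1:L
    [A, cH, cV, cD] = dwt2(A, 'haar');   % one nonstandard decomposition step
end                                      % cH, cV, cD (details) are discarded
coeffs = A(:);                           % approximation as coefficient vector
```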

4.1.2 Combining the Walsh transform with the Haar transform

Next to the Haar wavelet transform, some tests (see section 4.3) were done with a combination of the Haar wavelet and the Walsh transform. The reasons behind this choice are that the Walsh transform is a commonly used technique in image processing [32, 37, 42] and in facial recognition [30, 31, 43]; some other advantages can be found in Table 4.1.

The combination is made by first performing the first two levels of the Haar wavelet transform and keeping only the average values, i.e. the approximation picture. An example of a third-level Haar wavelet is shown in Figure 4.3; the dark pictures are filled with low values and represent the detail coefficients. The product of this decomposition is the picture L2. The next step is a Walsh transform, made by multiplying the Walsh transform matrix with the picture L2, the result of the 2D Haar wavelet decomposition. According to [31]: "The Walsh transform matrix is defined as a set of N rows, denoted W_j, for j = 0, 1, ..., N-1, which have the following properties:

- W_j takes on the values +1 and -1.
- W_j[0] = 1 for all j.
- W_j W_k^T = 0 for j ≠ k, and W_j W_k^T = N for j = k.
- W_j has exactly j zero crossings, for j = 0, 1, ..., N-1.
- Each row W_j is either even or odd with respect to its midpoint."

The result is saved as the Walsh coefficients, the input for the matching block. (A sketch of how such a Walsh matrix can be constructed is given below.)
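The following MATLAB sketch constructs a Walsh matrix satisfying the quoted properties by sequency-ordering a Hadamard matrix and applies it to the approximation picture L2. The construction route and the names are my assumptions, not the thesis code.

```matlab
% Sketch: Walsh matrix as a sequency-ordered Hadamard matrix; illustrative.
N = size(L2, 1);                 % L2: approximation picture, N a power of 2
H = hadamard(N);                 % +1/-1 Hadamard matrix, natural order
zc = sum(diff(H, 1, 2) ~= 0, 2); % zero crossings per row
[~, order] = sort(zc);           % row W_j gets exactly j zero crossings
W = H(order, :);                 % Walsh transform matrix
walshCoeffs = W * L2;            % "multiplying the Walsh matrix with L2"
```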

4.2 Different distance measurements for 3D matching

In the first stage, the matching part was divided into three distance measurements: the ones used in the 2D algorithm described in chapter 3, plus the simplest distance measurement, the Manhattan distance. The choice to include the Manhattan distance was based on [43], a paper that uses the same techniques as our data processing group. The argument is that it is a simple distance measurement whose implementation takes little process time, which makes the total algorithm faster. The Manhattan distance (also called the L1 metric) is very straightforward: the sum of all absolute differences between the elements of the two vectors or matrices. With n the size of the coefficient vectors and X and Y the two compared coefficient vectors, the formula is:

\[ d(X,Y) = L_1(X,Y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert \tag{4.1} \]

In chapter 3, the angle-based distance and the Euclidean distance were used; there is a small difference with the ones used there. The angle-based distance in the 2D algorithm was a weighted angle-based distance; in the 3D algorithm the standard angle-based distance is used, because the techniques used in the data processing do not produce eigenvalues. The formula for the Euclidean distance can be found in (3.1). The formula for the standard angle-based distance, with n the size of the coefficient vectors X and Y, is:

\[ d(X,Y) = -\cos(X,Y) = -\frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \tag{4.2} \]

The derivation of the angle-based distance formula can be found in (3.3). The final algorithm will use the three distance measurements combined, two of them combined, or just one (a compact sketch of all three is given below). This is a trade-off between accuracy and speed, and depends on the results of the first tests, which are discussed in section 4.3.

4.3 Results

In the first stage of testing, the algorithm was large: 12 distances were computed, six on normal map images and six on geometry images. On each input picture both the Haar-Walsh and the Haar transform were executed, and in the last stage the three distance measurements were computed on the coefficients of both transforms. This was done to compare which combination of transform and distance measurement gives the most accurate result. An overview of all the possibilities is given in the tree diagram of Figure 4.4.

Figure 4.4: Tree diagram of the different techniques in the test (geometry image and normal map image, each with the Haar-Walsh and Haar transforms, each followed by the Manhattan (L1), Euclidean (L2) and angle-based distances).
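A compact MATLAB sketch of the three measurements of Eqs. (4.1), (3.1) and (4.2), as they would act on the coefficient vectors; the names are illustrative:

```matlab
% The three distance measurements on coefficient vectors X and Y.
manhattan  = @(X, Y) sum(abs(X - Y));                 % Eq. (4.1), L1 metric
euclidean  = @(X, Y) sqrt(sum((X - Y).^2));           % Eq. (3.1), L2 metric
angleBased = @(X, Y) -sum(X .* Y) ...                 % Eq. (4.2); the minimum
             / (sqrt(sum(X.^2)) * sqrt(sum(Y.^2)));   % (-1) is a perfect match
```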

4.3.1 The different techniques

The results for each technique are given in Table 4.4: for each technique, the number of correct matches is counted in the last column. For the Haar-Walsh transform there are two results, because the computation of the Haar-Walsh coefficients includes a quantization; tests were carried out both with and without applying this quantization. A lower process time with the same matching rates was expected, but the results in Table 4.4 show some degradation without quantization. Comparing the process times, there is a slight difference, but for one picture the profit is only 0.04 seconds, so it is negligible. Based on the better matching grade, it was decided to apply the quantization.

Table 4.4: Results of the first 3D test. For each different technique the correct matches out of 6 (for the Haar-Walsh transform the two values are with quantization / without quantization).

Input              Transform    Distance       Correct matches out of 6
Geometry image     Haar-Walsh   Angle-based    5/5
                                L1             4/3
                                L2             5/4
                   Haar         Angle-based    5
                                L1             5
                                L2             4
Normal map image   Haar-Walsh   Angle-based    5/5
                                L1             4/5
                                L2             5/5
                   Haar         Angle-based    4
                                L1             4
                                L2             4

Another conclusion that can be drawn from Table 4.4 is that the angle-based distance measurement has the best results in every case, except for the normal map image with the Haar transform; the L1 metric, the Manhattan distance, is the worst distance measurement. Based on this conclusion, the angle-based distance measurement was chosen. Comparing the two input images, the geometry image got 28 correct matches out of 36 and the normal map image 26 out of 36; in percent, the matching grades are 77.8% for the geometry image and 72.2% for the normal map image, a difference of about 5 percentage points. Another comparison derived from Table 4.4 is between the transforms: the Haar-Walsh combination got 28 matches out of 36 (77.8%), while the Haar transform got 26 out of 36 (72.2%). The Haar wavelet transform on the normal map image gives the lowest match rates and will not be chosen.

A short summary of the conclusions:

- Applying quantization gives a better result, and only a negligible delay of 0.04 seconds is introduced.
- The angle-based distance measurement gives the best results.
- Geometry images give a higher matching rate than normal map images.

4.3 Results 21 The Haar-Walsh transform gives a higher matching rate than only Haar transform The Haar Wavelet transform on the Normal map image is the least favorable 4.3.2 The process time Table 4.5: the process time of the different blocks in the algorithm, including description of the block Different blocks Description process time (ms) Building database Including 6 persons, 2 pictures each 1.61 10 4 Transform Calculations Haar coefficients 1.46 10 3 Calculation Haar-Walsh coefficients 1.12 10 3 Compute distances Manhattan distance(l1 metrics) 5.11 10 1 Euclidean distance(l2 metrics) 4.70 10 1 Angle-based distance 1.32 Matching Compute minimum of 3 techniques and link to a name 3.54 10 1 Another important requirement is the speed of the algorithm. An overview can be found of the process time of the separate parts of the algorithm in Table 4.5. The highest process time is in the building of the database, 16.1 seconds. This is not a problem, because the database will only be created once and when there are new entries. The next clear conclusion is there is a difference of 0.3 seconds between the calculation of the Haar coefficients and the combination coefficient. The calculations of the Haar-Walsh coefficients takes less time. On first side this looks small, but in comparison with the other blocks it is relative large. If the level of decomposition is decreased, the process time for Haar wavelet coefficient calculations will be lower. If the three different distance computations get compared; the Manhattan and Euclidean distances get computed three times faster than the angle based distance. 0.51 and 0.47 milliseconds for the Manhattan and Euclidean distance against 1.3 milliseconds for the angle based distance. Again a short summary of the conclusions: Building the database takes the most of the processing time, but this is no problem, it will only be created once The calculations of the coefficients from the Haar-Walsh Transform is faster than the calculations of the Haar Transform coefficients. The Manhattan and Euclidean distances get computed three times faster than the angle based distance.

4.3.3 Summary and overview

Table 4.6: Results of the first 3D test. For each test subject the corresponding matches with the database.

Database \ Test subject   BenjaminG  BenjaminS  Felix  Mart  Michiel  Thije  Doff
BenjaminG                 12         5          0      0     0        0      0
BenjaminS                 0          7          0      7     0        0      2
Felix                     0          0          12     0     0        3      0
Mart                      0          0          0      5     0        0      4
Michiel                   0          0          0      0     12       2      0
Thije                     0          0          0      0     0        7      6

In Table 4.6 an overview is given of the matches of each test subject against the database. In this test there is always a match, because the match is defined as the minimum of the distances. A quick look shows that BenjaminG, Felix and Michiel had a perfect score: all twelve possible matches were right. BenjaminS and Thije were still matched correctly, but not perfectly, with seven of the possible matches being right. The only person not recognized by the total system is Mart, with only five right matches out of twelve. Finally, a test was done with someone who is not in the database (Doff); he had six matches on Thije, so whether Doff gets falsely recognized depends on the techniques that are used.

4.4 The choice

The algorithm will not use all the distance measurements and transforms, because this does not increase the accuracy of the whole algorithm. Moreover, using all of them would make the algorithm slower than choosing the best combination, and speed is also an important requirement for the algorithm. Before the choice is made, a summary of the results of section 4.3:

- Applying quantization gives a better result, and only a negligible delay of 0.04 seconds is introduced.
- The angle-based distance measurement gives the best results.
- Geometry images give a higher matching rate than normal map images.
- The Haar-Walsh transform gives a higher matching rate than the Haar transform alone.
- Building the database takes most of the processing time, but this is no problem: it is only done once.
- The calculation of the Haar-Walsh transform coefficients is faster than that of the Haar transform coefficients.

- The Manhattan and Euclidean distances are computed about three times faster than the angle-based distance.
- The Haar wavelet transform on the normal map image is the least favorable.

Based on these results there are two ideal paths in the tree diagram of Figure 4.4:

- The first possibility is to take only the geometry image, perform both the Haar-Walsh and the Haar transform, and compute the match via the angle-based distances.
- The second possibility is to take both input pictures, perform only the Haar-Walsh transform, and compute the angle-based distances.

The only distance measurement chosen is the angle-based distance: it had the highest matching rate, and the quantization has no influence on it. Its only downside is that it is three times slower than the other techniques; here, accuracy is chosen over speed. After testing the new algorithms, an improvement in speed was already found, as shown in Table 4.7; the recognition rates are unchanged and can be found in Table 4.4. In the table, "Old" refers to the old algorithm, which uses the two input pictures, the two decomposition techniques and the three distance measurements; the "Geo" algorithm is the first possibility named above and the "Geo-Nor" algorithm the second. The process times in Table 4.7 exclude the building of the database.

Table 4.7: The improvement in speed of the new algorithms

Algorithm   Process time (seconds)
Old         2.68
Geo         1.29
Geo-Nor     1.19

4.5 Future

There is still work to be done. In the first place, more data is needed for testing; this will either justify the choice that was made or counter it, and if it is countered, a new choice of techniques must be made. Another problem is that there is no threshold yet, which implies that every person gets a match with one or more persons in the database. The tests need a False Acceptance Rate (FAR) and a False Recognition Rate (FRR) so that unknown people can be rejected. Furthermore, the pictures made so far have a neutral expression and good illumination; there has been no test of what different lighting and expressions do to the matching rates. This still needs to be done before anything can be said about the performance of the total system. Finally, the algorithm is implemented in MATLAB and uses MATLAB functions; in the future it needs to be ported to C++ to gain speed, and a C++ implementation also runs without problems on embedded systems.

Chapter 5: Face recognition system development

5.1 Database

A database is the solution for storing large amounts of data decently, varying from a few to tens of thousands of records. 3D face recognition requires a database with information on specific points of the face; this information must be compared with incoming data.

5.1.1 Prototype

To get a working prototype at the end of the Bachelor Project, the database must be fairly easy to implement. This does not mean that the database is less important, just less interesting here. Because the amount of time for the whole system is limited, MySQL was chosen.

MySQL

MySQL is a very reliable, fast and open-source Database Management System (DBMS). However, MySQL is mainly suitable for small datasets; since the database holds data of six people (the group members), MySQL works perfectly.

Relational Database Management System

The database system will be a Relational Database Management System (RDBMS). The person's name, face data and other relevant information will be stored: one table for the personal information, another for the grayscale data of the face, and a third for the Red, Green and Blue (RGB) data of the face. To get the required information, queries will be executed.
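Requirement (2.4.2) says the database will be connected with MATLAB. A hedged sketch of such a connection with the 2013-era Database Toolbox is given below; the data source name, credentials and query follow the table names of Figure 5.1 but are otherwise illustrative assumptions, not the project's actual configuration.

```matlab
% Sketch: fetching stored coefficients from MySQL via the Database Toolbox
% (exec/fetch style of that era). DSN and credentials are illustrative.
conn = database('facedb', 'user', 'password');   % ODBC/JDBC data source
curs = exec(conn, ...
    'SELECT UniqueID, Vector_coefficients FROM Grayscale_data');
curs = fetch(curs);
galleryRows = curs.Data;                         % one row per person
close(curs);
close(conn);
```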

Figure 5.1: Database diagram (tables Personal_data with UniqueID, LastName and FirstName; Grayscale_data and RGB_data, each with UniqueID and Vector_coefficients).

5.1.2 Final System

The prototype is the beginning of the final system. A database with information on six people would not be enough to put this product on the market, so for larger companies another DBMS must be used. This would be a licensed DBMS, as these are mostly capable of operating on larger datasets. According to [44], Oracle is one of the best RDBMSs; it also has a security certification, in contrast to MySQL. The database must be secure, because private data is stored in it. The system must be able to "learn" data, so a database is necessary.

5.2 Graphical User Interface

5.2.1 Prototype

The Graphical User Interface (GUI) is a very important part of the system: it is the only connection with the client/customer. The GUI for the prototype was built in MATLAB, because the other two groups work in MATLAB. Figure 5.2(a) shows the start screen of the GUI. The algorithms behind the GUI do not run in real time, so the start button must be pushed manually. While the system is determining the person, a percentage shows the progress of the recognition, as can be seen in Figure 5.2(b). When the system finishes recognizing, the matched name is shown, Figure 5.2(c); otherwise "The system does not recognize you." is displayed. A new recognition can be started by pushing the start button.

5.2.2 Final System

MATLAB is the programming language for the prototype phase. However, MATLAB is not fast enough to be used in the final system. The final system will be programmed in C++ with the Qt toolkit maintained by Nokia; C++ itself has no GUI builder, and Qt is an extension for building GUIs in C++. C++ is a very fast and powerful programming language and the most suitable for the job. Other candidates were Java, Python and Perl, but C++ is a faster and more common programming language.

Figure 5.2: Various stages of the GUI: (a) welcome screen; (b) the system trying to determine the person; (c) a good match.