Extracting a Good Quality Frontal Face Images from Low Resolution Video Sequences Pritam P. Patil 1, Prof. M.V. Phatak 2 1 ME.Comp, 2 Asst.Professor, MIT, Pune Abstract The face is one of the important remote biometrics and it is used in facial analysis systems, like human-computer interaction, face detection and recognition and so on. The face detection technique is first step for any face recognition system, surveillance system. The real-world challenges of these systems is that they have problem working with low-resolution (LR) images. This is the reason that surveillance applications in public places need human operators to identify suspected people. Therefore, it is required an automated system which is working with LR and low-quality faces. Super-resolution (SR) is one of such technique for obtaining high resolution (HR) image from one or more LR input images. SR algorithms are broadly classified into two classes: reconstruction based SR (RBSR) and learning-based SR (LBSR).In this paper, we relate to extracting low quality faces from the video and convert it into high quality. Keywords Surveillance video, Face Detection, Face recognition, Super resolution, RBSR, LBSR. I. INTRODUCTION As computer becomes faster it is possible to deal with the applications dealing with human faces. Examples of these applications are face detection and recognition applied to surveillance system, gesture analysis applied to user friendly interfaces, or gender recognition. From the inexpensive surveillance cameras, inputed low quality and low resolution images to the system like face detection, face recognition, produces erroneous and unstable results. Therefore, there is a need for a mechanism to bridge the gap between low-resolution and low-quality images and on the other hand facial analysis systems. The proposed system in this paper deals with exactly this problem. Our approach is to detect the human faces from the video and apply a reconstruction-based super-resolution algorithm for obtaining high quality faces. Such an algorithm, however, has two main problems: first, it requires relatively similar images with not too much noise and second is that its improvement factor is limited. To deal with the first problem we introduce a three-step approach, which produces a face-log containing images of similar frontal faces of the highest possible quality. To deal with the second problem, limited improvement factor, we use a learning based super-resolution algorithm applied to the result of the reconstruction-based part to improve the quality by another factor. This results in an improvement for the entire system. The rest of the paper is organized as follows. Section 2 gives related work. We introduced approaches for converting low quality face into high quality face are gives in section 3.Section 4 gives super resolution techniques. Some experimental results are presented in section 5. Finally we conclude in section 6. II. RELATED WORK Face detection [1] is the first stage of a face recognition system. A lot of research has been done in this area, most of that is efficient and effective for still images only. So could not be applied to video sequences directly. In the video scenes, human faces can have unlimited orientations and positions, so its detection is of a variety of challenges to researchers. Generally there are three types process for face detection based on video. At first, it begins with frame based detection. During this process, lots of traditional methods for still images can be introduced such as statistical modelling method, neural network-based method, SVM-based method, HMM method, BOOST method and color-based face detection, etc. However, ignoring the temporal information provided by the video sequence is the main drawback of this approach. Secondly, integrating detection and tracking, this says that detecting face in the first frame and then tracking it through the whole sequence. Super resolution[2] task is cast as the inverse problem of recovering the original highresolution image by fusing the low resolution images, based on reasonable assumptions or prior knowledge about the observation model that converts the high resolution image to low resolution ones. Super resolution algorithm is classified into two classes Reconstruction based super resolution (RBSR) and Learning based super resolution (LBSR). Reconstruction based super resolution algorithm [3] usually work with more than one LR input image. These LR inputs must have intra-image sub-pixel misalignments. 237
The algorithm uses these misalignments to reconstruct the missing HR details of the inputs. These misalignments are considered in a registration step before starting the reconstruction process. The first RBSR system was a frequency domain algorithm proposed by Tsai and Huang[4]. They used the shifting property of Fourier transform. Other RBSR methods are non-uniform interpolation (NUI)[5], projection onto a convex set (POCS) methods[6], maximum likelihood (ML) methods[7], and maximum a posteriori (MAP) methods[3]. Computational cost of NUI methods is low but they cannot use the degradation models if noise and blur are different in the LR images. POCS methods are easy to use but they have high computational costs, slow convergence and problems in producing a unique response. ML methods simply look for an HR image that maximizes the probability of having the LR inputs. ML methods are prone to be extremely ill-conditioned. MAP methods explicitly use the a priori information in the form of a prior probability on the HR image. These methods are also illposed and need some regularization term for stabilizing their final response. Learning Based SR algorithm learns the relationship between the high resolutions (HR) images and their corresponding low resolution (LR) versions. They use the knowledge to predict the missing HR details of LR inputs. The relationship between HR and LR images can be learned in different ways: resolution pyramid, neighborhood embedding, manifolds, compressed sensing, sparse representation, neural networks. III. AN APPROACHES FOR QUALITY IMPROVEMENT OF LOW RESOLUTION FACES The following figure 1 shows architecture of proposed system. In the first block of the system, face-log generation, the input video sequence is summarized into frames. The images inside the face-logs are very similar to each other and are of better quality compared to the other images of the sequence. It means they are good inputs to the next block of the system, which is a cascaded SR. In this block, an RBSR is applied to the generated face-log(s) and produces an HR image. The quality of this HR image is improved compared to the LR images in the video sequence. This HR image is then fed to an LBSR to improve it even more. INPUT VIDEO A. Face Detection FACES FROM VIDEO SEQUENCE INITIAL FACE LOG INTERMEDIATE FACE LOG REFINED FACE LOG Figure 1. System Architecture RECONSTRUCTION BASED SR LEARNING BASED SR SUPER RESOLUTION SR OUTPUT FACE Usually face detection [1] and localization is the first step in many biometric applications. Face detection is defined as to isolate human faces from their background and exactly locate their position in an image. Automating the process to computer requires the use of various image processing techniques. Due to faces in different scales, orientations, and with different head poses, face detection is challenging task. In our system multiple frames are generated from the given video. From that video we detect the face or multiple faces. For this purpose we use template matching and skin color information as a method of detecting faces. Template Matching: Template matching is the technique in image processing for finding small parts of image which match a template image. Template matching can be subdivided into two approaches feature based and template based matching. In the feature based approach the template image has strong features. While in template based approach the templates are without strong features. Skin color information: Skin detection plays an important role in image processing applications. Recently skin detection methodologies based on skin color information as a cue has gain much attention as skin color provides computationally effective information about rotation, scaling and partial occlusion. Skin color can also be used as complementary information to the features. Skin color detection in visible spectrum can be a very challenging task as the skin color in an image is sensitive to various factors. 238
Illumination: a change in the light source distribution and in the illumination level produces a change in color of the skin in the image. Camera characteristics: Even under same illumination, the skin color distribution for the same person differs from one camera to another, depending on the camera sensor characteristics. Ethinicity: Skin color also varies from person to person belonging to different regions or different ethnic groups. Individual characteristics: Such as age, sex and body parts also affect. B. Converting low quality face into high quality face The proposed system deals with problem of super resolution system working with faces coming from surveillance video. Such system has several problems like ill conditioned nature of the HR response, slow convergence of the system, and small magnification factors, the slight-motions restriction of the objects which is harder to control for longer video sequences. There are many faces in videos that are not useful because of problems like not facing to the camera, blurriness due to the movement of the object, uneven illumination and so on so that feeding all frames of a surveillance video sequence to any SR system is impossible. Including all such images in the computations reduces the speed of the system and gives final result erroneous and unstable. Therefore it is a need of assess the quality of the detected faces and discard the useless one. IV. SUPER-RESOLUTION (SR) Super resolution [2] is the technique for converting low resolution images into high resolution faces. The SR task is cast as the inverse problem of recovering the original highresolution image by fusing the low-resolution images, based on reasonable assumptions or prior knowledge about the observation model that converts the high resolution image to the low-resolution ones. The fundamental reconstruction constraints for SR is that recovered image, after applying the same generation model should reproduce the observed low resolution image. SR algorithms can be categorized into four classes. Interpolation-based algorithms register low resolution images (LRIs) with the high resolution image (HRI), and then apply non-uniform interpolation to produce an improved resolution image which is further deblurred. Frequency based algorithms try to dealias the LRIs by utilizing the phase difference among the LRIs. A. Reconstruction Based Super Resolution Reconstruction based super resolution algorithm works with more than one low resolution images. These LR inputs must have intra-image sub-pixel misalignments. The algorithm uses these misalignments to reconstruct the missing HR details of the inputs. These misalignments are considered in a registration step before starting the reconstruction process. The employed SR algorithm in this paper is cascading both types of reconstruction based and learning based SR algorithm. In order to reconstruct the HR image from the LR images of the refined face-log we assume that these images have been produced from the HR image by following an imaging model [10]. Based on the imaging model each LR image has been created by warping, blurring and downsampling the HR image. It means that each X i, i = 1, 2,..., m3 LR images in the refined face-log have been obtained by Xi DBiWiH + ni... (1) Where D, B i, and W i are the down-sampling, blurring, and warping matrix, respectively, H is the HR image and n i is the introduced noise to the imaging process for producing the i th LR image from the HR image H. Obtaining the HR image H from (1) i.e. from imaging process is an inverse problem that is extremely ill-posed and ill-conditioned. A MAP method has been employed to obtain the HR response image and a Markov regularization term to convert the problem to a well-posed one. Staking all the m 3 LR image of the refined face-log into a vector X=[X1, X2,..,Xm 3 ], the HR result can be estimated by H= arg max p(h X). A. Learning Based Super Resolution The employed LBSR algorithm in this paper is a MLP. The benefit of using neural networks is their auto-ability in learning the complicated space of the human faces. For training this MLP we have used around 100 frontal and semi-frontal face images. To prepare the training data for the network, first, we have extracted the face areas of the HR images and converted them to grayscale. Then, an LR image is created for each of these HR images by down-sampling the images by a factor of two. Then, all of these LR/HR pairs are fed to the network as the training samples and the network learns the relationship between them. The original HR face images in this database are all resized to 184 224 and their corresponding LR images are of 92 112 pixels. 239
This three-layered MLP has 92 112 neurons in its input layer, each corresponding to one of the pixels of the input LR image, ten neurons in the hidden layer, and 184 224 neurons in the output layer, each corresponding to one of the pixels of the output HR image. V. EXPERIMENTAL RESULTS In this paper as our framework measure the quality of the video on the basis of resolution of the camera and it calculates the frame rate. If quality of the video is high then video will be discarding and low quality video will take for further proceedings as we worked on the low quality and low resolution videos. The imaging parameters are calculated as follows: If video quality is in between 1 to 3 it will be low resolution and low quality. This framework calculates the total number of frames. After this, entire video is converted into the frames and faces will be detected from the frames. Detected faces will be cropped and saved into databases. 240
VI. CONCLUSION Facial analysis applications require high quality frontal face images, but typical surveillance videos are of poor quality. Therefore, there is a need for a mechanism to bridge between low quality, low resolution face images from video sequences and their applications in facial analysis system. Super-resolution is one of these mechanisms. The techniques specified in this paper deals with the real world problems of super-resolution systems working with surveillance video sequences. Before applying super resolution technique we have seen different face detection techniques, their merits and future enhancement. By applying RBSR we get high quality face from low resolution video sequences. By applying LBSR technique we get more rich quality face. In this way we can get more clear face from low resolution. REFERENCES [1] Adesola Olaoluwa Anidu,Fasina Ebunoluwa Philip,C IMPLEMENTATION OF A FACE DETECTION SYSTEM USING TEMPLATE MATCHING AND SKIN COLOR INFORMATION,Annals.Computer Series.2012. [2] Y. Huang and Y. Long,Super resolution using neural networks based on the optimal recovery t theory,j.comput. Electron., vol. 5, no. 4, pp. 275-281, 2007. [3] L. Zhang, H. Zhang, H. Shen, and P. Li,A superresolution reconstruction algorithm for surveillance images, Signal Process., vol. 90, no. 3,pp. 848-859, 2010. [4] T. S. Huang and R. Tsai, Multi-frame image restoration and registration,adv. Comput. Vis. Image Process., vol. 1, no. 2, pp. 317-339,1984. [5] S. Lertrattanapanich and N. K. Bose, High resolution image formation from low resolution frames using Delaunay triangulation, IEEE Trans.Image Process., vol. 11, no. 12, pp. 1427-1441, Dec. 2002. [6] A. J. Patti and Y. Altunbasak, Artifact reduction for set theoretic super resolution image reconstruction with edge adaptive constraints and higher-order interpolants, IEEE Trans. Image Process., vol. 10, no. 1,pp. 179-186, Jan.2001. [7] M. E. Tipping and C. M. Bishop, Bayesian image superresolution,adv. Neural Inform. Process. Syst., vol. 15, pp. 1303-1310, 2002. [8] Shin-Cheol Jeong and Byung Cheol Song,Fast Super-Resolution Algorithm Based on Dictionary Size Reduction Using k-means Clustering,ETRI Journal, Volume 32, Number 4, August 2010 [9] M. Irani and S. Peleg,Improving resolution by image registration,graph. Models Image Process., vol. 53, no. 3, pp. 231-239, 1991. [10] Kamal Nasrollahi, Student Member, IEEE, and Thomas B. Moeslund,Extracting a Good Quality Frontal Face Image from a Low-Resolution Video Sequence,IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 10, OCTOBER 2011. [11] V. H. Patil, D. S. Bormane, and V. S. Pawar, Super resolution using neural network, in Proc. IEEE 2nd Asia Int. Conf. Modeling Simul., May 2008. [12] D. Glasner, S. Bagon, and M. Irani,Super-resolution from a single image, in Proc. ICCV, 2009. 241