The Influence of Video Sampling Rate on Lipreading Performance
Alin G. ChiŃu and Leon J.M. Rothkrantz
Man-Machine Interaction Group, Delft University of Technology
Mekelweg 4, 2628CD, Delft, The Netherlands
{A.G.Chitu,

Abstract

That adding visual information improves the recognition of speech is an established fact. However, since video is sampled at a much lower rate than speech, and since the increase in performance, while promising, is still not at the expected level, the question arises whether it is useful to use high speed recordings. In this paper we propose an analysis of the gain in information from the point of view of lipreading accuracy, based on the Root Mean Square Deviation (RMSD) measure. We analyze both real data, recorded using high speed technology, and synthetic data obtained using the well known software tools CUAnimate and CSLU Toolkit. The analysis is performed on two different and widely used types of features: the mouth width and height, and optical flow. Taking the rate of speech into account, our preliminary findings show two different situations. In the case of low speech rate, typical for spelling, for uttering connected digits (e.g. telephone numbers, account numbers) or for uttering separate words, the information gained by using high speed recordings is not significant enough to justify the tremendous amount of resources needed when working with high speed recordings. However, in the case of continuous speech, when the speech rate tends to increase, a recording rate in the range of 24 to 30 frames per second is definitely insufficient. Interpolation is then needed, which decreases the recognition performance.

1. Introduction

Work in the domain of speech recognition is gaining more and more importance in our times.
Modules offering accurate and reliable speech recognition are required for almost any deployed system: from smart control rooms, public kiosks and mobile devices to smart electric devices in our houses. In general, we notice a novel and sustained trend towards more natural ways of communicating with our smart devices. With the introduction of Hidden Markov Models (HMM), the performance of speech recognition systems deployed in controlled environments (e.g. offices) has reached very high levels. However, as soon as the environment where the system is deployed differs from the one present when the training database was built, the recognition performance degrades rapidly due to acoustic noise. To overcome these problems, we need to add complementary information coming from other channels that are not affected by acoustic noise. Visual information seems to be the most natural source of additional information, since human beings use it in their daily lives [1]. The scientific literature reports promising results (e.g. an increase in performance of 10 to 20%) in the case of a high noise-to-signal ratio. However, independently of the method used for extracting visual information, since the video signal is sampled at a frame rate of 25-30fps while the audio data is usually sampled at 100fps, there is a big gap between the two. The obvious solution to this problem is to interpolate the more slowly sampled signal to match the faster one. The question that arises then is: how much of the available information do we lose by performing interpolation? And what if we sample the video stream at a much higher rate? The advent of digital technology makes it possible to record video data at very high frame rates. However, the costs of the equipment and the resources necessary for high speed recordings are still extremely high.
Therefore, before embarking on such an enormous effort, we want to know what the benefits would be with respect to speech recognition performance.
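As a simple illustration of the alignment step discussed above, a visual feature sampled at 25fps can be linearly interpolated to the 100fps audio feature rate. The helper below is only a sketch; the function name and the toy mouth-height signal are our own, not part of any existing system:

```python
import numpy as np

def upsample_features(video_feats, video_fps=25.0, audio_fps=100.0):
    """Linearly interpolate a per-frame visual feature signal so that its
    sampling rate matches the (faster) audio feature rate."""
    video_feats = np.asarray(video_feats, dtype=float)
    duration = (len(video_feats) - 1) / video_fps      # seconds spanned by the clip
    t_video = np.arange(len(video_feats)) / video_fps  # time stamp of each video frame
    t_audio = np.linspace(0.0, duration, int(round(duration * audio_fps)) + 1)
    return np.interp(t_audio, t_video, video_feats)

# one second of a 25fps mouth-height trajectory becomes 97 samples at 100fps
heights = np.sin(np.linspace(0.0, np.pi, 25))
aligned = upsample_features(heights)
```

Linear interpolation can only reconstruct what the 25fps samples captured; any lip movement between frames is lost, which is exactly the loss this paper tries to quantify.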
This paper analyzes the loss of information at the signal level caused by using a low sampling rate. The signals are composed of visual features that have proved very successful in the literature. We present the results of the analysis in two situations: recordings using a high speed camera, and simulated speech using an artificial talking head. We also consider two separate cases: low speech rate and high speech rate. The next section introduces the settings used for data acquisition, both for high speed recording and high speed simulation, and gives a succinct description of the visual features analyzed. Section 3 presents the analyses together with the preliminary results, while stressing the issues in each case. Our conclusions are summarized in the last section of the paper.

2. Data acquisition

We used three data sources for our analysis: our old dataset DUTAVSC [12], built using a regular camcorder at 25fps; new recordings using a high speed camera; and synthetic data obtained with the CUAnimate [11] software.

2.1 Real Data: High Speed Camera

We used a FASTCAM-APX RS 250K device for the recordings. This camera is capable of capturing grayscale images at very high frame rates. To achieve good image quality we limited our recordings to 250fps and 1MP resolution. Also, to cope with the large storage needs of the recordings, we decided to perform the analysis on very short utterances, namely the digits from 0 to 9. The utterances were produced in Dutch by a native male speaker.

2.2 Synthetic data: CUAnimate and CSLU Toolkit

To ease the acquisition of high speed data we used the CUAnimate software in combination with the CSLU Toolkit from the Center for Spoken Language Understanding. CUAnimate is a set of software tools for researching full-bodied three-dimensional animated characters, and for controlling and rendering them in real time.
It was built at the Center for Spoken Language Research, University of Colorado, with the declared goal of enabling animated computer characters to engage in natural face-to-face conversational interaction with users. The software gives the possibility to save the animation at different frame rates, namely 15, 24, 30, 60 and 999fps. For the present research we therefore recorded the VidTIMIT utterance "She had your dark suit in greasy wash water all year." at all five possible target frame rates. During animation we inhibited all other animation, such as random head movement, blinking, emotions, etc. The facial expression was set to neutral.

2.3 Visual features

We want to investigate the influence of the frame rate at which a video clip was recorded on the amount of information lost. Hence, we will look at some of the visual features that have proved to be most important for lipreading. The techniques developed for extracting visual information are varied: from appearance based methods [2, 3], to geometry based methods [4, 5], and vector flow based methods [6, 7, 8, 9, 10]. The mouth height and width and the mouth openness are some of the most used static visual features in the literature [13, 14]. The exact definitions of these features are shown in Figure 1a. We also analyzed the effect of the frame rate on a different type of features, introduced in paper [10]. These features are computed using optical flow analysis and were reported to provide very good results. Figure 1b shows an instance of these features; a more thorough description can be found in that paper.
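The three static geometric features just mentioned reduce to distances between lip landmarks. As a sketch (the six-point landmark layout below is an assumption for illustration; Figure 1a gives the exact definitions used in the paper):

```python
import numpy as np

def mouth_geometry(left, right, top, bottom, inner_top, inner_bottom):
    """Compute the three static geometric features: mouth width (corner to
    corner), mouth height (outer lips) and mouth openness (inner-lip gap).
    Each argument is an assumed (x, y) landmark position in pixels."""
    width = np.linalg.norm(np.asarray(right, float) - np.asarray(left, float))
    height = np.linalg.norm(np.asarray(bottom, float) - np.asarray(top, float))
    openness = np.linalg.norm(np.asarray(inner_bottom, float) - np.asarray(inner_top, float))
    return width, height, openness

# toy landmarks: corners at (10,50) and (70,50), lips at x = 40
w, h, o = mouth_geometry((10, 50), (70, 50), (40, 35), (40, 65), (40, 45), (40, 55))
# w == 60.0, h == 30.0, o == 10.0 (pixels)
```

Because these features are plain Euclidean distances, their values are in image pixels, which is also the unit used for the RMSD numbers reported later.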
Figure 1. a) Mouth width, height and openness; b) optical flow based features.

3. Analysis

The measure used for the analysis is the Root Mean Square Deviation, defined as:

RMSD = sqrt( (1/M) * Σ_{i=1}^{M} (x_i - x̂_i)^2 )

where x_i is the reference signal and x̂_i its reconstruction. In each case we compared the feature signal obtained at the maximum available frame rate with the signals obtained by down-sampling the original signal and using linear interpolation to recover it. In the case of the artificial data we also compared the down-sampled clips with the corresponding recorded clips. For the optical flow based features, where the actual features used are the vertical and horizontal increments, interpolation is not permitted, since it would yield erroneous results. In this case the analysis is performed on the cumulative sums of each feature vector.

3.1 Speech rate

As a first step we considered how much data is recorded for each unit of visual speech, namely each viseme, when using a regular frame rate. We analyzed the DUTAVSC dataset to approximate the mean number of frames per viseme. The overall result was that each viseme was covered by approximately 6 frames. However, when plotting the result per utterance we discovered that two situations emerge, as seen in Figure 2. Some utterances are crowded in the range of 2 to 5 frames per viseme, while a second set is more or less uniformly spread from 5 to 20. Figures 2b and 2c show the two sets in detail. We found that the low speech rate set contains only spellings and phone and account number utterances, while the high speech rate set contains continuous speech utterances. Considered separately, the mean number of frames per viseme was 3 for the high speech rate set and 11 for the low speech rate set. Hence, there is a significant difference between the two cases, and the analysis has to take this aspect into account.

Figure 2. The histogram of the frames per viseme values in the DUTAVSC dataset.
a) entire dataset; b) low speech rate utterances; c) high speech rate utterances.

3.2 Results based on the real recordings

We show the results obtained for the utterances of the digits 0 and 1, recorded using the high speed camera. The speech rate was in this case around 10 frames per viseme. Similar results were obtained for the other digits. Figure 3 shows the plots of the mouth width, mouth height and mouth openness in time, respectively, for the digit 0.
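The down-sample/interpolate/compare procedure used throughout this section can be sketched as follows; the smooth sine-shaped mouth-width trajectory is only a stand-in for real tracker output:

```python
import numpy as np

def rmsd(x, x_hat):
    """Root Mean Square Deviation between a reference signal and its
    reconstruction, as defined in Section 3."""
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return np.sqrt(np.mean((x - x_hat) ** 2))

def downsample_then_interpolate(signal, factor):
    """Keep every `factor`-th sample, then recover the original length with
    linear interpolation (np.interp clamps queries past the last kept sample)."""
    signal = np.asarray(signal, float)
    kept_idx = np.arange(0, len(signal), factor)
    return np.interp(np.arange(len(signal)), kept_idx, signal[kept_idx])

t = np.linspace(0.0, 1.0, 250)                  # one second at 250fps
width = 30 + 5 * np.sin(2 * np.pi * 3 * t)      # a smooth mouth-width trajectory
err = rmsd(width, downsample_then_interpolate(width, 10))   # ~25fps equivalent
```

For a smooth trajectory like this one the RMSD stays small even after decimating by 10, while coarser decimation (e.g. a factor of 25, roughly 10fps) yields a visibly larger error, mirroring the trend reported in Table 1.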
Figure 3. Digit "0": the a) width, b) height, and c) openness in time.

We can notice in these images that the interpolated signals follow the finest signal, namely the 250fps signal, very closely. However, the coarsest one, the 15fps signal, is, as expected, sometimes somewhat farther off. The differences between the true signal and the ones obtained by interpolation are more visible in the case of the width, but even there the spikes are most probably errors introduced by the detection algorithms used. Table 1 lists the RMSD values obtained in each situation; the units used are image pixels. We can also observe that the RMSD decreases as the frame rate increases, which is to be expected; however, the very small numbers again suggest very little loss.

Table 1. RMSD values for the case of real data recordings.
            15fps  24fps  30fps  60fps
Digit 0
  Width
  Height
  Openness
Digit 1
  Width
  Height
  Openness

The same analysis was performed for the cumulative sums of the optical flow based visual features. Figure 4 shows the plots of the different signals obtained for three of the features in the case of digit 1. Figure 4a shows the feature with the biggest variance, while the other features were chosen randomly.

Figure 4. Optical flow based features: cumulative sums in time. a) The most informative feature; b) and c) some randomly chosen features.

The same observations can be made in this case. However, we can notice in image b) that the difference is somewhat bigger. We can also see that the plot of the 250fps signal is very irregular, which suggests that the optical flow analyzer has some accuracy problems when dealing with such a fine transformation. Table 2 gives the RMSD values for this situation.

Table 2. RMSD values for optical flow based features (the features shown in Figure 4).
            15fps  24fps  30fps  60fps
Digit 1
  Feature a)
  Feature b)
  Feature c)
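Because the optical flow features are per-frame increments, the comparison above uses cumulative sums rather than interpolated increments. One way to emulate a lower frame rate for such features, a sketch based on our reading rather than the paper's exact procedure, is to sum the increments within each block of frames, which preserves the accumulated displacement:

```python
import numpy as np

def decimate_increments(increments, factor):
    """Emulate a lower frame rate for an increment feature by summing the
    increments inside each block of `factor` frames. Unlike interpolating
    raw increments, this preserves the total accumulated displacement."""
    increments = np.asarray(increments, dtype=float)
    n = (len(increments) // factor) * factor    # drop a ragged tail, if any
    return increments[:n].reshape(-1, factor).sum(axis=1)

def cumulative_trajectory(increments):
    """The signal actually compared in the analysis: the cumulative sum."""
    return np.cumsum(np.asarray(increments, dtype=float))

rng = np.random.default_rng(0)
inc = rng.normal(size=250)                  # toy 250fps increment feature
coarse = decimate_increments(inc, 10)       # 25fps equivalent: 25 block sums
```

The block sums keep the endpoints of the cumulative trajectory identical at every frame rate, so any RMSD between the trajectories comes purely from the lost fine-grained motion, not from a drifting baseline.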
Figures 5 and 6 show the plots obtained for two recordings from the DUTAVSC dataset. Figure 5 shows a slow speech rate utterance (12 frames per viseme), while Figure 6 shows a high speech rate utterance (2 frames per viseme).

Figure 5. Slow speech rate utterance from DUTAVSC: a) mouth openness; b) mouth width.

Figure 6. High speech rate utterance from DUTAVSC: a) mouth openness; b) mouth width.

The plots in Figures 5 and 6 show for comparison the signals obtained by down-sampling the original signal by a factor of 2. The most important observation is that there are many more spikes per time unit in the case of high speech rate than in the case of low speech rate. This implies that the sampling rate necessary to preserve the signal must be much higher in the former case. We can see that decimating the signal by two already introduces more errors for high rate speech than for low rate speech.

3.3 Results based on the artificial recordings

We performed the same analysis on the recordings made using the CUAnimate talking face. The results are given in Table 3. We also present the RMSD computed using clips with lower frame rates obtained directly from the CUAnimate software.

Table 3. Comparison based on CUAnimate data.
            15fps  24fps  30fps  60fps
Comparison between the 999fps clip and the decimated clips
  Width
  Height
  Openness
Comparison between the decimated clips and the clips obtained directly from CUAnimate
  Width
  Height
  Openness
Comparison between the 999fps clip and the interpolated clips obtained directly from CUAnimate
  Width
  Height
  Openness

We can again see a very small difference when the frame rate is lowered. Looking at the results in the central section of Table 3, we can also note that the morphing algorithm used by CUAnimate differs slightly from linear interpolation.
4. Conclusions

We introduced in this paper a method to analyze the loss of information due to the low sampling rate of the video information. The analysis was performed on both real and synthetic data. We observed the behavior of some of the visual features most used for lipreading, because we are interested in the loss of information from the point of view of lipreading accuracy. Our preliminary results show that in the case of low speech rate the loss of information is not sufficiently large to justify the effort of working with high speed recordings. However, in the case of high speech rate the danger of losing the real signal is much higher, so higher recording rates are required. This result may be explained by the fact that when speaking slowly our mouth makes more ample and complete movements than when the speech rate is high. It is also our belief that during high rate speech the visemes shown are poorer and contain less information; hence, by using a higher recording rate we could capture that information more efficiently. However, we may need to develop better error measures, and to look at a larger set of features and more advanced features, before taking a definitive decision. At this moment we can definitely argue that 15fps is too low a target, that 25fps is still too low for high speech rate, and that more than 250fps would be superfluous. The visual features used for this research were carefully computed so as to include very little noise. However, in a real system the accuracy of the segmentation is far from perfect; the noise introduced in this way will generate additional concerns with respect to the recording rate.

References

[1] H. McGurk, J. MacDonald, Hearing lips and seeing voices, Nature, v. 264.
[2] N. Li, S. Dettmer, M. Shah, Lipreading using eigen sequences, Proc. Int. Conf.
on Automatic Face- and Gesture-Recognition, Zurich, Switzerland.
[3] , Visually recognizing speech using eigen sequences, Motion-based Recognition.
[4] L.J.M. Rothkrantz, J.C. Wojdeł, P. Wiggers, Comparison between different feature extraction techniques in lipreading applications, Specom 2006, SpIIRAS, St. Petersburg.
[5] I.A. Essa, A. Pentland, A Vision System for Observing and Extracting Facial Action Parameters, Proc. of IEEE Conf. on CVPR, IEEE.
[6] K. Mase, A. Pentland, Automatic Lipreading by Optical-Flow Analysis, Systems and Computers in Japan, vol. 22.
[7] K. Iwano, S. Tamura, S. Furui, Bimodal Speech Recognition Using Lip Movement Measured by Optical-Flow Analysis, HSC2001.
[8] D.J. Fleet, M.J. Black, Y. Yacoob, A.D. Jepson, Design and Use of Linear Models for Image Motion Analysis, Int. Journal of Computer Vision, vol. 36, no. 3.
[9] A. Martin, Lipreading by Optical Flow Correlation, tech. rep., Computer Science Department, University of Central Florida.
[10] A.G. ChiŃu, L.J.M. Rothkrantz, J.C. Wojdeł, P. Wiggers, Comparison Between Different Feature Extraction Techniques for Audio-Visual Speech Recognition, JMUI, pp. 7-20, Springer.
[11] J. Ma, J. Yan, R. Cole, CUAnimate: Tools for Enabling Conversations with Animated Characters.
[12] J.C. Wojdeł, P. Wiggers, L.J.M. Rothkrantz, An audio-visual corpus for multimodal speech recognition in Dutch language, ICSLP2002, Denver, USA, September.
[13] M. Chan, HMM-based audio-visual speech recognition integrating geometric and appearance-based visual features, IEEE 4th Workshop on Multimedia Signal Processing, France, pp. 9-14.
[14] X. Zhang, R.M. Mersereau, M.A. Clements, Audio-visual speech recognition by speechreading, 14th Int. Conf. on Digital Signal Processing, Aegean Island of Thera, Greece, vol. 2, July 2002.
More informationExtracting a Good Quality Frontal Face Images from Low Resolution Video Sequences
Extracting a Good Quality Frontal Face Images from Low Resolution Video Sequences Pritam P. Patil 1, Prof. M.V. Phatak 2 1 ME.Comp, 2 Asst.Professor, MIT, Pune Abstract The face is one of the important
More informationContext-aware Library Management System using Augmented Reality
International Journal of Electronic and Electrical Engineering. ISSN 0974-2174 Volume 7, Number 9 (2014), pp. 923-929 International Research Publication House http://www.irphouse.com Context-aware Library
More informationSampling Theorem Notes. Recall: That a time sampled signal is like taking a snap shot or picture of signal periodically.
Sampling Theorem We will show that a band limited signal can be reconstructed exactly from its discrete time samples. Recall: That a time sampled signal is like taking a snap shot or picture of signal
More informationObject Recognition and Template Matching
Object Recognition and Template Matching Template Matching A template is a small image (sub-image) The goal is to find occurrences of this template in a larger image That is, you want to find matches of
More informationBlind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections
Blind Deconvolution of Barcodes via Dictionary Analysis and Wiener Filter of Barcode Subsections Maximilian Hung, Bohyun B. Kim, Xiling Zhang August 17, 2013 Abstract While current systems already provide
More informationTraffic Monitoring Systems. Technology and sensors
Traffic Monitoring Systems Technology and sensors Technology Inductive loops Cameras Lidar/Ladar and laser Radar GPS etc Inductive loops Inductive loops signals Inductive loop sensor The inductance signal
More informationInteractive person re-identification in TV series
Interactive person re-identification in TV series Mika Fischer Hazım Kemal Ekenel Rainer Stiefelhagen CV:HCI lab, Karlsruhe Institute of Technology Adenauerring 2, 76131 Karlsruhe, Germany E-mail: {mika.fischer,ekenel,rainer.stiefelhagen}@kit.edu
More information3D Face Modeling. Vuong Le. IFP group, Beckman Institute University of Illinois ECE417 Spring 2013
3D Face Modeling Vuong Le IFP group, Beckman Institute University of Illinois ECE417 Spring 2013 Contents Motivation 3D facial geometry modeling 3D facial geometry acquisition 3D facial deformation modeling
More informationVECTORAL IMAGING THE NEW DIRECTION IN AUTOMATED OPTICAL INSPECTION
VECTORAL IMAGING THE NEW DIRECTION IN AUTOMATED OPTICAL INSPECTION Mark J. Norris Vision Inspection Technology, LLC Haverhill, MA mnorris@vitechnology.com ABSTRACT Traditional methods of identifying and
More informationJPEG compression of monochrome 2D-barcode images using DCT coefficient distributions
Edith Cowan University Research Online ECU Publications Pre. JPEG compression of monochrome D-barcode images using DCT coefficient distributions Keng Teong Tan Hong Kong Baptist University Douglas Chai
More informationVEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS
VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS Aswin C Sankaranayanan, Qinfen Zheng, Rama Chellappa University of Maryland College Park, MD - 277 {aswch, qinfen, rama}@cfar.umd.edu Volkan Cevher, James
More informationTowards License Plate Recognition: Comparying Moving Objects Segmentation Approaches
1 Towards License Plate Recognition: Comparying Moving Objects Segmentation Approaches V. J. Oliveira-Neto, G. Cámara-Chávez, D. Menotti UFOP - Federal University of Ouro Preto Computing Department Ouro
More informationThe Effect of Long-Term Use of Drugs on Speaker s Fundamental Frequency
The Effect of Long-Term Use of Drugs on Speaker s Fundamental Frequency Andrey Raev 1, Yuri Matveev 1, Tatiana Goloshchapova 2 1 Speech Technology Center, St. Petersburg, RUSSIA {raev, matveev}@speechpro.com
More informationShogo Kumagai, Keisuke Doman, Tomokazu Takahashi, Daisuke Deguchi, Ichiro Ide, and Hiroshi Murase
Detection of Inconsistency between Subject and Speaker based on the Co-occurrence of Lip Motion and Voice Towards Speech Scene Extraction from News Videos Shogo Kumagai, Keisuke Doman, Tomokazu Takahashi,
More informationStatic Environment Recognition Using Omni-camera from a Moving Vehicle
Static Environment Recognition Using Omni-camera from a Moving Vehicle Teruko Yata, Chuck Thorpe Frank Dellaert The Robotics Institute Carnegie Mellon University Pittsburgh, PA 15213 USA College of Computing
More informationManaging Healthcare Records via Mobile Applications
Managing Healthcare Records via Mobile Applications Eileen Y.P. Li, C.T. Lau and S. Chan Abstract In this paper, a mobile application that facilitates users in managing healthcare records is proposed.
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationImplementation of OCR Based on Template Matching and Integrating it in Android Application
International Journal of Computer Sciences and EngineeringOpen Access Technical Paper Volume-04, Issue-02 E-ISSN: 2347-2693 Implementation of OCR Based on Template Matching and Integrating it in Android
More informationAnalecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
More informationQuick start guide! Terri Meyer Boake
Film Editing Tutorial Quick start guide! Terri Meyer Boake 1. Preparing yourself and your files: This information is valid for all film editing software: FCP, Premiere (the version of FC being used is
More informationKeywords image processing, signature verification, false acceptance rate, false rejection rate, forgeries, feature vectors, support vector machines.
International Journal of Computer Application and Engineering Technology Volume 3-Issue2, Apr 2014.Pp. 188-192 www.ijcaet.net OFFLINE SIGNATURE VERIFICATION SYSTEM -A REVIEW Pooja Department of Computer
More informationJournal of Industrial Engineering Research. Adaptive sequence of Key Pose Detection for Human Action Recognition
IWNEST PUBLISHER Journal of Industrial Engineering Research (ISSN: 2077-4559) Journal home page: http://www.iwnest.com/aace/ Adaptive sequence of Key Pose Detection for Human Action Recognition 1 T. Sindhu
More informationMaking Machines Understand Facial Motion & Expressions Like Humans Do
Making Machines Understand Facial Motion & Expressions Like Humans Do Ana C. Andrés del Valle & Jean-Luc Dugelay Multimedia Communications Dpt. Institut Eurécom 2229 route des Crêtes. BP 193. Sophia Antipolis.
More informationSIMPLIFIED PERFORMANCE MODEL FOR HYBRID WIND DIESEL SYSTEMS. J. F. MANWELL, J. G. McGOWAN and U. ABDULWAHID
SIMPLIFIED PERFORMANCE MODEL FOR HYBRID WIND DIESEL SYSTEMS J. F. MANWELL, J. G. McGOWAN and U. ABDULWAHID Renewable Energy Laboratory Department of Mechanical and Industrial Engineering University of
More informationHuman Behavior Analysis in Intelligent Retail Environments
Human Behavior Analysis in Intelligent Retail Environments Andrea Ascani, Emanuele Frontoni, Adriano Mancini, Primo Zingaretti 1 D.I.I.G.A., Università Politecnica delle Marche, Ancona - Italy, {ascani,
More informationColor Segmentation Based Depth Image Filtering
Color Segmentation Based Depth Image Filtering Michael Schmeing and Xiaoyi Jiang Department of Computer Science, University of Münster Einsteinstraße 62, 48149 Münster, Germany, {m.schmeing xjiang}@uni-muenster.de
More informationBeyond Built-in: Why a Better Webcam Matters
Whitepaper: Beyond Built-in: Why a Better Webcam Matters How to Uplevel Your Ability to Connect, Communicate and Collaborate Using Your Laptop or PC Introduction The ability to virtually communicate and
More informationTracking Moving Objects In Video Sequences Yiwei Wang, Robert E. Van Dyck, and John F. Doherty Department of Electrical Engineering The Pennsylvania State University University Park, PA16802 Abstract{Object
More informationFace Recognition For Remote Database Backup System
Face Recognition For Remote Database Backup System Aniza Mohamed Din, Faudziah Ahmad, Mohamad Farhan Mohamad Mohsin, Ku Ruhana Ku-Mahamud, Mustafa Mufawak Theab 2 Graduate Department of Computer Science,UUM
More informationHuman behavior analysis from videos using optical flow
L a b o r a t o i r e I n f o r m a t i q u e F o n d a m e n t a l e d e L i l l e Human behavior analysis from videos using optical flow Yassine Benabbas Directeur de thèse : Chabane Djeraba Multitel
More informationLIST OF CONTENTS CHAPTER CONTENT PAGE DECLARATION DEDICATION ACKNOWLEDGEMENTS ABSTRACT ABSTRAK
vii LIST OF CONTENTS CHAPTER CONTENT PAGE DECLARATION DEDICATION ACKNOWLEDGEMENTS ABSTRACT ABSTRAK LIST OF CONTENTS LIST OF TABLES LIST OF FIGURES LIST OF NOTATIONS LIST OF ABBREVIATIONS LIST OF APPENDICES
More informationUsing Linear Fractal Interpolation Functions to Compress Video. The paper in this appendix was presented at the Fractals in Engineering '94
Appendix F Using Linear Fractal Interpolation Functions to Compress Video Images The paper in this appendix was presented at the Fractals in Engineering '94 Conference which was held in the École Polytechnic,
More informationUNIVERSITY OF CENTRAL FLORIDA AT TRECVID 2003. Yun Zhai, Zeeshan Rasheed, Mubarak Shah
UNIVERSITY OF CENTRAL FLORIDA AT TRECVID 2003 Yun Zhai, Zeeshan Rasheed, Mubarak Shah Computer Vision Laboratory School of Computer Science University of Central Florida, Orlando, Florida ABSTRACT In this
More informationMean-Shift Tracking with Random Sampling
1 Mean-Shift Tracking with Random Sampling Alex Po Leung, Shaogang Gong Department of Computer Science Queen Mary, University of London, London, E1 4NS Abstract In this work, boosting the efficiency of
More informationA Method of Caption Detection in News Video
3rd International Conference on Multimedia Technology(ICMT 3) A Method of Caption Detection in News Video He HUANG, Ping SHI Abstract. News video is one of the most important media for people to get information.
More informationNAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju
NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised
More informationIntroduction to Computer Graphics
Introduction to Computer Graphics Torsten Möller TASC 8021 778-782-2215 torsten@sfu.ca www.cs.sfu.ca/~torsten Today What is computer graphics? Contents of this course Syllabus Overview of course topics
More informationFrequently Asked Questions About VisionGauge OnLine
Frequently Asked Questions About VisionGauge OnLine The following frequently asked questions address the most common issues and inquiries about VisionGauge OnLine: 1. What is VisionGauge OnLine? VisionGauge
More informationAssessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall
Automatic Photo Quality Assessment Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall Estimating i the photorealism of images: Distinguishing i i paintings from photographs h Florin
More informationA Simple Feature Extraction Technique of a Pattern By Hopfield Network
A Simple Feature Extraction Technique of a Pattern By Hopfield Network A.Nag!, S. Biswas *, D. Sarkar *, P.P. Sarkar *, B. Gupta **! Academy of Technology, Hoogly - 722 *USIC, University of Kalyani, Kalyani
More informationWhite paper. HDTV (High Definition Television) and video surveillance
White paper HDTV (High Definition Television) and video surveillance Table of contents Introduction 3 1. HDTV impact on video surveillance market 3 2. Development of HDTV 3 3. How HDTV works 4 4. HDTV
More informationSpeed Performance Improvement of Vehicle Blob Tracking System
Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA sungchun@usc.edu, nevatia@usc.edu Abstract. A speed
More information