
engin erzin
Associate Professor, Department of Computer Engineering
Ph.D. Bilkent University
http://home.ku.edu.tr/~eerzin
eerzin@ku.edu.tr

Engin Erzin's research interests include speech processing, multimodal signal processing, pattern recognition and human-computer interfaces. Prof. Erzin is a member of the Multimedia, Vision and Graphics Laboratory (MVGL), where he takes an active part in many national and international research projects.

E. Erzin. Improving Throat Microphone Speech Recognition by Joint Analysis of Throat and Acoustic Microphone Recordings. IEEE Transactions on Audio, Speech and Language Processing, 2009.

The speech processing research area, which covers the analysis, synthesis and recognition of speech signals, plays a key role in state-of-the-art digital speech communication and multimedia services. While Internet and wireless telephony are expected to remain among the most important applications for years to come, the use of speech processing applications, such as automatic speech recognition (ASR), text-to-speech synthesis (TTS), speaker identification/verification, and emotion and mood analysis from speech, is expected to increase in multimedia-rich scenarios.

M. E. Sargın, Y. Yemez, E. Erzin, and A. M. Tekalp. Analysis of Head Gesture and Prosody Patterns for Prosody-Driven Head-Gesture Animation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

Multimodal signal processing refers to the combined processing of signals from multiple modalities such as speech, still images, video, and other sources. It plays a key role in the design of future human-computer interfaces and intelligent systems, such as intelligent vehicles.
The ultimate goal of human-computer interface research is to develop a machine that can identify humans, analyze and understand them from biometric input signals, and synthesize a human-like output in response, much as in human-to-human communication. The study of relations and correlations between signals of different modalities plays an important role in the effective use of multimodal information. Prof. Erzin's active research in multimodal signal processing includes speech/speaker recognition, body motion analysis, speech-driven face gesture analysis and synthesis, speaker animation, audio-driven body animation and driver behavior modeling. More details on Prof. Erzin's research activities and current research projects are available at http://mvgl.ku.edu.tr.

U. Bağcı and E. Erzin. Automatic classification of musical genres using inter-genre similarity. IEEE Signal Processing Letters, 2007.

H. E. Çetingül, E. Erzin, Y. Yemez, and A. M. Tekalp. Multimodal speaker/speech recognition using lip motion, lip texture and audio. Signal Processing, 2006.

E. Erzin, Y. Yemez, and A. M. Tekalp. Multimodal speaker identification using an adaptive classifier cascade based on modality reliability. IEEE Transactions on Multimedia, 2005.

Multimodal recognition system
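The reliability-driven fusion idea in the last citation can be illustrated with a small sketch. This is not the published cascade; it is a minimal late-fusion example, assuming each modality outputs per-class scores, and using the margin between the best and second-best scores as a hypothetical reliability measure (all function names are illustrative):

```python
def modality_reliability(scores):
    """Reliability proxy: margin between the best and second-best class
    scores; a confident modality separates the classes well."""
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]

def fuse(score_sets):
    """Reliability-weighted late fusion of per-modality class scores.
    score_sets maps modality name -> {class: score}."""
    weights = {m: modality_reliability(s) for m, s in score_sets.items()}
    total = sum(weights.values()) or 1.0
    classes = next(iter(score_sets.values())).keys()
    fused = {c: sum(weights[m] / total * score_sets[m][c]
                    for m in score_sets)
             for c in classes}
    return max(fused, key=fused.get)
```

Here a confident audio classifier outvotes a hesitant lip classifier; in the actual system the reliability measure and the cascade structure are learned, not hand-set.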

Graduate Students

can yagli
M.S. Koç University, 2010

Can Yagli. Artificial bandwidth extension of speech using temporal clustering. Master's thesis, Koç University, 2010.

In this thesis, we investigate the artificial bandwidth extension problem, which aims to reconstruct the missing frequency content of wideband speech from narrowband speech. To solve the problem, we utilize the well-known source-filter representation of the human voice production system.

C. Yagli and E. Erzin. Artificial bandwidth extension using linear prediction within temporal clusters. Submitted to ICASSP'11, 2011.

ferda ofli
Ph.D. Koç University, 2010
Advisors: Murat Tekalp, Yücel Yemez, Engin Erzin

Ferda Ofli. Learning Statistical Music-to-Dance Mappings for Choreography Synthesis. PhD thesis, Koç University, 2010.

We propose many-to-many statistical mappings from music measures (music segments) to dance figures (dance segments) towards generating plausible music-driven dance choreographies. We assume that dance figure boundaries (dance segment boundaries) coincide with music measure boundaries (music segment boundaries).

F. Ofli, E. Erzin, Y. Yemez, and A. M. Tekalp. Multi-modal analysis of dance performances for music-driven choreography synthesis. In ICASSP'10, Dallas, USA, 2010.

F. Ofli, E. Erzin, Y. Yemez, A. M. Tekalp, A. T. Erdem, C. Erdem, T. Abaci, and M. Ozkan. Unsupervised dance figure analysis from video for dancing avatar animation. In ICIP'08, San Diego, USA, 2008.

F. Ofli, C. Canton-Ferrer, J. Tilmanne, Y. Demir, E. Bozkurt, Y. Yemez, E. Erzin, and A. M. Tekalp. Audio-driven human body motion analysis and synthesis. In ICASSP'08, Las Vegas, USA, 2008.

F. Ofli, Y. Demir, E. Erzin, Y. Yemez, and A. M. Tekalp. Multicamera audio-visual analysis of dance figures. In IEEE Int. Conf. on Multimedia & Expo (ICME 2007), 2007.

F. Ofli, Y. Demir, C. Canton-Ferrer, J. Tilmanne, K. Balcı, E. Bozkurt, I. Kızıloğlu, Y. Yemez, E. Erzin, A. M. Tekalp, L. Akarun, and A. T. Erdem. Çok bakışlı işitsel-görsel dans verilerinin analizi ve sentezi (analysis and synthesis of multiview audio-visual dance figures). In SIU'08, Didim, Turkey, 2008.
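The source-filter analysis underlying the bandwidth-extension work above rests on linear prediction. Below is a minimal sketch of LPC via the Levinson-Durbin recursion in plain Python; it is illustrative only, since a real ABE system operates on windowed frames and maps narrowband envelopes to wideband ones through a trained (e.g., temporally clustered) codebook:

```python
def autocorr(x, lag):
    # Biased autocorrelation estimate of the frame x at the given lag.
    return sum(x[i] * x[i - lag] for i in range(lag, len(x)))

def lpc(x, order):
    """Levinson-Durbin recursion: returns the all-pole ('filter')
    coefficients [1, a1, ..., ap] and the residual ('source') energy
    for one frame of speech samples x."""
    r = [autocorr(x, k) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        a = [a[j] + k * a[i - j] for j in range(i + 1)] + [0.0] * (order - i)
        err *= (1.0 - k * k)
    return a, err
```

For a decaying exponential x[n] = 0.9^n (the impulse response of a one-pole filter), a first-order analysis recovers a coefficient close to -0.9, i.e., the model 1/(1 - 0.9 z^-1).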

elif bozkurt
M.S. Koç University, 2010

Elif Bozkurt. Emotion recognition from speech. Master's thesis, Koç University, 2010.

We present formant position based weighted Mel frequency cepstral coefficient (WMFCC) features for the emotion recognition problem and compare performance results with commonly used feature sets. Since the line spectral frequency (LSF) features are positioned close to each other around formant frequencies, we propose a normalized inverse harmonic mean function to weight critical band energies in the extraction of MFCC features.

E. Bozkurt, C. Eroglu Erdem, T. Erdem, and E. Erzin. Formant position based weighted spectral features for emotion recognition. Submitted to Speech Communication, 2010.

E. Bozkurt, E. Erzin, C. Eroglu Erdem, and T. Erdem. Improving automatic emotion recognition from speech signals. In INTERSPEECH'09, UK, 2009.

F. Ofli, Y. Demir, C. Canton-Ferrer, J. Tilmanne, K. Balcı, E. Bozkurt, I. Kızıloğlu, Y. Yemez, E. Erzin, A. M. Tekalp, L. Akarun, and A. T. Erdem. Çok bakışlı işitsel-görsel dans verilerinin analizi ve sentezi (analysis and synthesis of multiview audio-visual dance figures). In SIU'08, Didim, Turkey, 2008.

emre öztürk
M.S. Koç University, 2010

Emre Öztürk. Driver status identification from driving behavior signals. Master's thesis, Koç University, 2010.

Driving behavior signals differ in how and under which conditions the driver uses vehicle control units, such as the pedals, steering wheel, etc. In this study we investigate how driving behavior signals differ among drivers and among different driving tasks.

E. Ozturk and E. Erzin. Driving status identification under different distraction conditions from driving behaviour signals. In 4th Biennial Workshop on DSP for In-Vehicle Systems and Safety, UTD, TX, USA, 2009.
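The role of the LSF-based weighting in Bozkurt's WMFCC features can be sketched as follows. The published weighting function is not reproduced here; this is a hypothetical normalized inverse-harmonic-mean weighting that only illustrates the underlying idea: critical bands whose centers lie near clustered LSFs (i.e., near formants) should receive larger weights.

```python
def lsf_band_weights(lsfs, band_centers, eps=1e-3):
    """Illustrative weighting: for each critical band center (Hz), take
    the harmonic mean of its distances to the LSF frequencies, invert
    it, and normalize. Bands close to clustered LSFs have a small
    harmonic-mean distance and thus a large weight."""
    weights = []
    for f in band_centers:
        inv_dist_sum = sum(1.0 / (abs(f - l) + eps) for l in lsfs)
        harmonic_mean = len(lsfs) / inv_dist_sum
        weights.append(1.0 / harmonic_mean)
    total = sum(weights)
    return [w / total for w in weights]
```

With LSFs clustered at 500 and 700 Hz, a band centered at 600 Hz gets far more weight than one at 3000 Hz, which is the formant-emphasis effect the thesis exploits.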

yasemin demir
Ph.D. student at University of California, Berkeley
M.S. Koç University, 2008

Yasemin Demir. Music-driven dance synthesis by multimodal dance performance analysis. Master's thesis, Koç University, 2008.

We present a framework for evaluating the correlation between audio features and dance figures for audio-visual analysis and synthesis of dance figures. Dance figures are performed synchronously with the musical rhythm.

Y. Demir, E. Erzin, Y. Yemez, and A. M. Tekalp. Evaluation of audio features for audio-visual analysis of dance figures. In EUSIPCO'08, Lausanne, Switzerland, 2008.

F. Ofli, C. Canton-Ferrer, J. Tilmanne, Y. Demir, E. Bozkurt, Y. Yemez, E. Erzin, and A. M. Tekalp. Audio-driven human body motion analysis and synthesis. In ICASSP'08, Las Vegas, USA, 2008.

F. Ofli, Y. Demir, E. Erzin, Y. Yemez, and A. M. Tekalp. Multicamera audio-visual analysis of dance figures. In IEEE Int. Conf. on Multimedia & Expo (ICME 2007), 2007.

F. Ofli, Y. Demir, C. Canton-Ferrer, J. Tilmanne, K. Balcı, E. Bozkurt, I. Kızıloğlu, Y. Yemez, E. Erzin, A. M. Tekalp, L. Akarun, and A. T. Erdem. Çok bakışlı işitsel-görsel dans verilerinin analizi ve sentezi (analysis and synthesis of multiview audio-visual dance figures). In SIU'08, Didim, Turkey, 2008.

emre sargın
MTS at Google
Ph.D. student at University of California, Santa Barbara
M.S. Koç University, 2006
Advisors: Murat Tekalp, Yücel Yemez, Engin Erzin

Emre Sargın. Audio-visual correlation modeling for speaker identification and synthesis. Master's thesis, Koç University, 2006.

This thesis addresses two major problems of multimodal signal processing using audiovisual correlation modeling: speaker recognition and speaker synthesis. We address the first problem, i.e., audiovisual speaker recognition, within an open-set identification framework, where the audio (speech) and lip texture (intensity) modalities are fused employing a combination of early and late integration techniques.

M. E. Sargın, Y. Yemez, E. Erzin, and A. M. Tekalp. Analysis of head gesture and prosody patterns for prosody-driven head-gesture animation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

M. E. Sargın, Y. Yemez, and A. M. Tekalp. Audio-visual synchronization and fusion using canonical correlation analysis. IEEE Transactions on Multimedia, 9(7):1396-1403, November 2007.

M. E. Sargın, Y. Yemez, E. Erzin, and A. M. Tekalp. Prosody-driven head-gesture animation. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP'07), 2007.
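The audio-visual synchronization paper cited above is based on canonical correlation analysis. For scalar per-frame features, canonical correlation reduces to ordinary Pearson correlation, so the synchronization idea can be sketched as a lag search between, say, an audio-energy track and a lip-motion track (a crude stand-in for CCA, with all names hypothetical):

```python
import math

def corr(x, y):
    # Pearson correlation of two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def shifted_corr(audio, visual, d):
    # Correlate audio[t] with visual[t + d].
    if d >= 0:
        return corr(audio[:len(audio) - d], visual[d:])
    return corr(audio[-d:], visual[:d])

def best_lag(audio, visual, max_lag=5):
    """Return the lag (in frames) maximizing audio-visual correlation,
    i.e., the estimated synchronization offset."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda d: shifted_corr(audio, visual, d))
```

If the visual track is the audio track delayed by two frames, the search recovers a lag of 2; the CCA formulation generalizes this to multi-dimensional feature vectors by first projecting each modality onto maximally correlated directions.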

ulaş bağcı
Ph.D. student at University of Nottingham, UK
M.S. Koç University, 2005

Ulaş Bağcı. Boosting classifiers for automatic music genre classification. Master's thesis, Koç University, 2005.

Music genre classification is an important tool for music information retrieval systems and has been finding important applications in various media platforms. Two important problems of automatic music genre classification are feature extraction and classifier design.

U. Bağcı and E. Erzin. Automatic classification of musical genres using inter-genre similarity. IEEE Signal Processing Letters, 14(8):521-524, August 2007.

U. Bağcı and E. Erzin. Boosting classifiers for music genre classification. In 20th International Symposium on Computer and Information Sciences (ISCIS 2005), Berlin, 2005.

U. Bağcı and E. Erzin. Müzik türlerinin sınıflanmasında benzer kesişim bilgileri uygulamaları (applications of inter-genre similarity in music genre classification). In SIU 2006, Antalya, 2006.

ertan çetingül
Ph.D. student at Johns Hopkins University, Baltimore
M.S. Koç University, 2005
Advisors: Murat Tekalp, Engin Erzin, Yücel Yemez

Ertan Çetingül. Discrimination analysis of lip motion features for multimodal speaker identification and speech-reading. Master's thesis, Koç University, 2005.

In this thesis a new multimodal speaker/speech recognition system that integrates audio, lip texture, lip geometry, and lip motion modalities is presented. There have been several studies that jointly use audio, lip intensity and/or lip geometry information for speaker identification and speech recognition applications.

H. E. Çetingül, E. Erzin, Y. Yemez, and A. M. Tekalp. Multimodal speaker/speech recognition using lip motion, lip texture and audio. Signal Processing, Special Section: Multimodal Human-Computer Interfaces, 86:3549-3558, December 2006.

H. E. Çetingül, E. Erzin, Y. Yemez, and A. M. Tekalp. Discriminative analysis of lip motion features for speaker identification and speech-reading. IEEE Transactions on Image Processing, 15:2879-2891, October 2006.

H. E. Çetingül, E. Erzin, Y. Yemez, and A. M. Tekalp. Robust lip-motion features for speaker identification. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Philadelphia, March 2005.
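The inter-genre similarity idea in the Bağcı-Erzin papers above can be caricatured as a two-stage decision: a first classifier picks a genre, and samples falling near the boundary between two easily confused genres are deferred to a dedicated second-stage classifier trained on that confusable region. The sketch below uses a hypothetical nearest-centroid first stage to show only the control flow, not the published boosting method:

```python
import math

def dist(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_genre(feature, centroids, margin=0.1, refine=None):
    """First-stage nearest-centroid decision; if the two closest genre
    centroids are nearly equidistant (an 'inter-genre similarity'
    region), defer to the second-stage classifier `refine`."""
    ranked = sorted((dist(feature, c), g) for g, c in centroids.items())
    (d1, g1), (d2, g2) = ranked[0], ranked[1]
    if refine is not None and (d2 - d1) < margin * d1:
        return refine(feature, g1, g2)  # resolve the confusable pair
    return g1
```

Confident samples are decided immediately; ambiguous ones trigger the refinement step, which in the published work is a classifier trained specifically to separate the confusable genre pair.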

alper kanak
TUBITAK-UEKAE
M.S. Koç University, 2004
Advisors: Murat Tekalp, Engin Erzin, Yücel Yemez

Alper Kanak. Multimodal speaker identification with audio-video processing. Master's thesis, Koç University, 2004.

In this thesis we present a multimodal text-dependent speaker identification system. The objective is to improve the recognition performance over conventional unimodal or bimodal schemes.

A. Kanak, E. Erzin, Y. Yemez, and A. M. Tekalp. Speaker identification using multimodal audio-video processing. IEEE Int. Conf. on Image Processing, 2003.

A. Kanak, E. Erzin, Y. Yemez, and A. M. Tekalp. Joint audio-video processing for biometric speaker identification. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 2003.