Incorporating Lexical and Prosodic Information at Different Levels for Meeting Summarization
1 Incorporating Lexical and Prosodic Information at Different Levels for Meeting Summarization Catherine Lai and Steve Renals Centre for Speech Technology Research University of Edinburgh 17 September 2014
2 Automatic Extractive Summarization The Extractive Summarization task is to select Dialogue Acts (DAs) to form a summary. Potentially useful for browsing/analysing dialogues Possible using only prosodic features [Maskey and Hirschberg, 2005; Murray, 2008; Xie et al., 2009; Jauhar et al., 2013]. Whether prosodic features perform better than lexical features depends on the types of features used and the evaluation metric [Murray, 2008; Xie et al., 2009].
3 Automatic Extractive Summarization Questions: Where should we incorporate prosodic information in extractive summarization? How do different aspects of prosody relate to what goes into meeting summaries? Current Work: Can prosodic features be used to augment lexical features for meeting summarization? How do these summaries differ from those based on utterance level prosody?
4 Prosodic Features in Extractive Summarization Direct modelling over dialogue acts: Aggregate stats for F0 and energy, with varying normalization. Treated as independent of lexical content. May compensate for ASR errors. Emphatic speech is important: e.g. higher mean and maximum F0 and energy [Murray, 2008]. Duration based features can make a big difference depending on the evaluation [Murray, 2008; Penn and Zhu, 2008; Xie et al., 2009; Riedhammer et al., 2010].
5 Prosodic lexical features? Prosody also marks specific words as important in information structure terms [Silipo and Crestani, 2000; Calhoun, 2012]. Combining word prosody and tf.idf scores has been shown to help word level tasks, e.g.: keyword extraction from voicemail [Koumpis and Renals, 2005], punctuation annotation [Christensen et al., 2001], topic tracking in broadcast news [Guinaudeau and Hirschberg, 2011]. Hypothesis: integrating prosodic information at the word level will improve extractive summarization performance over plain lexical features like tf.idf and DA level prosody.
6 Plan: Augmented Lexical Features Combine term frequency and prosodic features using an MLP to predict whether a word belongs to an Extracted Dialogue Act (EDA). Feed word level probabilities into the higher level DA extraction task as augmented lexical features. Look at results using precision/recall based measures: AUROC [Murray and Renals, 2007], ROUGE [Lin, 2004]. Look at variation in distribution of extracted DAs in the meeting timeline and summary redundancy.
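The two-level plan above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the feature values, network size, and the word-to-DA mapping are all invented placeholders. A word level MLP produces P(word in EDA), and these probabilities are summed per dialogue act to give an augmented lexical feature:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic word-level features standing in for [tf.idf, word F0 stats, word energy stats].
X_words = rng.normal(size=(200, 3))
y_words = (X_words[:, 0] + 0.5 * X_words[:, 1] > 0).astype(int)  # toy EDA labels

# The MLP learns how to combine term-frequency and prosodic evidence per word.
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
mlp.fit(X_words, y_words)
p_word = mlp.predict_proba(X_words)[:, 1]  # P(word belongs to an EDA)

# Aggregate word probabilities into a DA-level augmented lexical feature,
# mirroring the sum-over-words construction of the plain DA lexical features.
da_ids = rng.integers(0, 20, size=200)  # hypothetical word -> DA assignment
da_feature = np.bincount(da_ids, weights=p_word, minlength=20)
```

The DA-level classifier then consumes `da_feature` alongside (or instead of) the plain summed tf.idf scores.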
7 Data 140 AMI scenario meetings [Carletta, 2007] 4 speakers, 4 remote control design stages per group. Standard development and test sets, 20 meetings/5 groups each. 75 ICSI research meetings [Janin et al., 2003] 3-9 speakers, 8 topics, e.g. robustness test set as in Murray [2008], dev=6 randomly selected meetings.
9 Gold Standard Extractive Summaries Extracted Dialogue Acts (EDAs), drawn from the manual transcripts and DA annotation. Aimed at someone who is concerned about the state of the project, like a department head. No absolute limit on dialogue act selection (10% guideline). Use only DAs linked to human abstractive summary content.
10 Prosodic Features F0 and intensity extraction: via Praat at 10ms intervals; parameter settings automatically determined per spurt [Evanini and Lai, 2010]. Missing values filled via linear interpolation after octave jump removal. Speaker normalization over conversations: F0 in semitones relative to speaker mean F0 (Hz); intensity: subtract speaker mean. Downdrift correction: for words, subtract values predicted by linear regression over spurts before calculating statistics. Statistics: mean, sd, max, min over words and DAs; include slope for DAs.
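The speaker normalization and downdrift correction steps can be sketched as follows; this is a minimal illustration, and the Praat-based extraction and spurt segmentation themselves are not reproduced here:

```python
import numpy as np

def f0_to_semitones(f0_hz, speaker_mean_hz):
    """Express F0 in semitones relative to the speaker's mean F0 in Hz."""
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / speaker_mean_hz)

def remove_downdrift(t, f0_st):
    """Subtract a linear trend fit over a spurt, so word statistics are
    computed on the residual rather than on the declining baseline."""
    slope, intercept = np.polyfit(t, f0_st, 1)
    return np.asarray(f0_st) - (slope * np.asarray(t) + intercept)

# One octave above the speaker mean is +12 semitones, one below is -12.
st = f0_to_semitones([100.0, 200.0, 400.0], speaker_mean_hz=200.0)

# A perfectly linear downdrift leaves a (near-)zero residual.
resid = remove_downdrift([0.0, 1.0, 2.0, 3.0], [4.0, 3.0, 2.0, 1.0])
```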
11 Term-Frequency Based Lexical Features After applying the Porter Stemmer, calculate: tf.idf and su.idf [Murray and Renals, 2007] Inverse document frequency was calculated over combined AMI, ICSI, TDT-2 corpora. Pointwise Mutual Information (PMI) of words with EDA/non-EDA status on training set [Galley, 2006] DA features: sum individual features for words in the DA [Murray and Renals, 2007; Xie, 2010]
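A toy version of the tf.idf computation and the sum-over-words DA feature follows; the background document counts here are invented placeholders, not the actual AMI+ICSI+TDT-2 statistics:

```python
import math
from collections import Counter

def tfidf(term_counts, doc_freq, n_docs):
    """Per-word tf.idf for one meeting; idf comes from a background collection."""
    return {w: c * math.log(n_docs / doc_freq.get(w, 1)) for w, c in term_counts.items()}

# Invented background document frequencies (stand-ins for the combined corpora).
doc_freq = {"remote": 3, "battery": 5, "the": 100}
scores = tfidf(Counter("remote battery the the".split()), doc_freq, n_docs=100)

# DA-level lexical feature: sum the word scores over the dialogue act.
da_score = sum(scores[w] for w in ["remote", "battery"])
```

A word occurring in every background document ("the") gets zero weight, while rarer content words ("remote") dominate the DA score.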
12 Word Level Prediction Classify whether a word is in an EDA or not, using an MLP to learn feature combination weights. Table: Development set AUROC for word level EDA detection (ICSI and AMI). Rows: tf.idf, su.idf, pmi (logistic regression); tf.idf.pros, su.idf.pros, pmi.pros, pros, tsp, tsp.pros (MLP). [AUROC values lost in transcription.]
15 EDA Detection and Evaluation Classification of dialogue acts as EDAs. Multilevel logistic regression: annotators, meeting types, and corpora are group level effects [Gelman and Hill, 2007], using lme4 in R. AUROC over gold standard annotations. ROUGE-1 F-scores [Riedhammer et al., 2010] with DUC standard parameters [Xie and Liu, 2010], 15% word compression rate. Use additional annotations: 3-5 for ICSI, 2-3 for AMI. Focus on tf.idf.
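AUROC over the gold annotations can be computed directly from DA-level scores as the fraction of (EDA, non-EDA) pairs that the model ranks correctly. A minimal stdlib sketch with invented labels and scores:

```python
def auroc(y_true, y_score):
    """Pairwise AUROC: fraction of positive/negative pairs ranked correctly,
    counting ties as half-correct."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical gold EDA labels and model scores for eight dialogue acts.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.5, 0.6, 0.1, 0.8, 0.7, 0.3]
score = auroc(y_true, y_score)  # 13 of the 15 positive/negative pairs rank correctly
```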
16 DA Level Prediction: AUROC Adding DA prosody improves on bare lexical features. Augmented lexical features outperform DA level combinations.
17 Lexical Features vs Length Features: AUROC Features compared (AMI and ICSI): word-tf.idf.pros, word-tsp, word-tsp.pros, word-tsp+da-pros, DA-PMI, DA-dur, DA-nwords, Murray [2008] full. [AUROC values lost in transcription.] Augmented lexical features do (a bit) better than length features and the full feature set in Murray [2008]. DA-PMI is not so predictive.
18 Lexical Features as Weights Summing lexical features over a DA treats them as weights on how noteworthy each word is for summarization. Augmented lexical features do a lot better than tf.idf and somewhat better than uniform weighting (DA-nwords). DA length is usually reported as most predictive for AUROC [Penn and Zhu, 2008; Murray, 2008] but not for n-gram based ROUGE [Xie et al., 2009; Riedhammer et al., 2010]. ICSI ROUGE-1 results show the same pattern as AUROC, but...
19 ROUGE-1: AMI Best performance from DA-tsp+DA-pros (0.595), but differences are within bootstrap confidence intervals. DA-nwords=0.582
20 Summary Differences In what other ways do the resulting summaries differ? Redundancy, as in unsupervised approaches? [Zechner, 2002; Riedhammer et al., 2010]. Hold out each DA, measure its cosine distance to the rest of the summary, and sum the distances [Zechner, 2002]. Location of noteworthy parts of a dialogue? Look at the proportion of summed EDA time in the summaries relative to that of the gold standard, calculated over meeting quarters ('hot spots').
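The hold-one-out redundancy measure can be sketched as follows. The bag-of-words vectorization of each DA is an illustrative assumption here; Zechner [2002] describes the original formulation:

```python
import numpy as np

def holdout_distance_sum(da_vectors):
    """Hold out each DA vector, take its cosine similarity to the sum of the
    remaining summary DAs, and accumulate the cosine distances (1 - cos).
    A lower total indicates a more redundant summary."""
    V = np.asarray(da_vectors, dtype=float)
    total = 0.0
    for i in range(len(V)):
        rest = np.delete(V, i, axis=0).sum(axis=0)
        cos = V[i] @ rest / (np.linalg.norm(V[i]) * np.linalg.norm(rest))
        total += 1.0 - cos
    return total

# Three identical DAs are maximally redundant: every held-out distance is zero.
redundant = holdout_distance_sum([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])

# More diverse DAs produce a larger summed distance.
diverse = holdout_distance_sum([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```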
21 Redundancy Figure: Summed redundancy: w-lex.pros summaries were significantly less redundant than those based on bare lexical features ± DA level prosody (Wilcoxon p < 0.01, Holm corrected).
22 Distribution of EDAs Figure: Ratio of EDA time to gold standard EDA time in meeting quarters: The highest average proportion arises from DA level prosody models, but differences are not significant.
23 Conclusions Incorporating multiple sources of prosodic and term-frequency information at the word level provides better performance than using DA level features in AUROC terms. Summaries derived from prosodically augmented lexical features exhibited less redundancy. While DA prosody generally performs worse, it could provide information for temporally locating larger regions of interest. Understanding how to weight DA level prosody features requires extrinsic, user based testing of how summaries are used in different tasks, e.g. browsing vs. audits.
24 Thanks! Questions?
25 Thanks! This work was supported by the European Union under the FP7 project inEvent (grant agreement ).
26 References
Calhoun, S. (2012). The theme/rheme distinction: Accent type or relative prominence? Journal of Phonetics, 40(2).
Carletta, J. (2007). Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus. Language Resources and Evaluation, 41(2).
Christensen, H., Gotoh, Y., and Renals, S. (2001). Punctuation annotation using statistical prosody models. In Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding.
Evanini, K. and Lai, C. (2010). The importance of optimal parameter setting for pitch extraction. Journal of the Acoustical Society of America, 128(4):2291.
Galley, M. (2006). A skip-chain conditional random field for ranking meeting utterances by importance. In Proceedings of EMNLP 06.
Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Guinaudeau, C. and Hirschberg, J. (2011). Accounting for prosodic information to improve ASR-based topic tracking for TV Broadcast News. In Interspeech 2011.
Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A., et al. (2003). The ICSI meeting corpus. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 03), volume 1.
Jauhar, S., Chen, Y., and Metze, F. (2013). Prosody-based unsupervised speech summarization with two-layer mutually reinforced random walk. In IJCNLP.
Koumpis, K. and Renals, S. (2005). Automatic summarization of voicemail messages using lexical and prosodic features. ACM Transactions on Speech and Language Processing, 2(1):1-24.
Lin, C. (2004). ROUGE: A package for automatic evaluation of summaries. In Proceedings of Text Summarization Branches Out.
Liu, F. and Liu, Y. (2013). Towards abstractive speech summarization: Exploring unsupervised and supervised approaches for spoken utterance compression. IEEE Transactions on Audio, Speech, and Language Processing, 21(7).
Maskey, S. and Hirschberg, J. (2005). Comparing lexical, acoustic/prosodic, structural and discourse features for speech summarization. In Interspeech 2005.
Murray, G. (2008). Using Speech-Specific Characteristics for Automatic Speech Summarization. PhD thesis, University of Edinburgh.
Murray, G. and Renals, S. (2007). Term-weighting for summarization of multi-party spoken dialogues. In Machine Learning for Multimodal Interaction IV, volume 4892.
27 Penn, G. and Zhu, X. (2008). A critical reassessment of evaluation baselines for speech summarization. In ACL 2008.
Riedhammer, K., Favre, B., and Hakkani-Tür, D. (2010). Long story short: Global unsupervised models for keyphrase based meeting summarization. Speech Communication, 52(10).
Silipo, R. and Crestani, F. (2000). Prosodic stress and topic detection in spoken sentences. In Proceedings of the Seventh International Symposium on String Processing and Information Retrieval (SPIRE 2000). IEEE Computer Society.
Xie, S. (2010). Automatic Extractive Summarization on Meeting Corpus. PhD thesis, University of Texas at Dallas.
Xie, S., Hakkani-Tür, D., Favre, B., and Liu, Y. (2009). Integrating prosodic features in extractive meeting summarization. In 2009 IEEE Workshop on Automatic Speech Recognition & Understanding. IEEE.
Xie, S. and Liu, Y. (2010). Improving supervised learning for meeting summarization using sampling and regression. Computer Speech & Language, 24(3).
Zechner, K. (2002). Automatic summarization of open-domain multiparty dialogues in diverse genres. Computational Linguistics, 28(4).
28 DA Level Prediction: AUROC Figure: AUROC for the ICSI test sets Same pattern as for AMI, but DA prosody improves on the augmented lexical features here.
29 ROUGE-1: ICSI ROUGE-1 scores mirror AUROC results. DA-nwords=0.663
30 Word and DA Prosody Features compared (AMI and ICSI): word-pros, DA-pros, DA-pros vs. Murray [2008] prosody. [Values lost in transcription.] DA prosody improves a lot with better feature extraction (cf. Murray [2008]). EDAs actually have significantly lower mean pitch than non-EDAs on average, but an expanded pitch range. Including prosodic delta features over ±4 DAs did not produce much of a change in performance.
33 Future work How do summarizer DA rankings affect user efficiency and satisfaction in browsing tasks? How do differences in ICSI and AMI meeting structure affect intrinsic measures? How do compression techniques [Liu and Liu, 2013] change ROUGE scores? Integrate prosodic features in unsupervised summarization methods that more closely fit ROUGE's objectives [Riedhammer et al., 2010]. Look at keyword identification as an objective for the generation of augmented lexical features (cf. Koumpis and Renals [2005]).
More informationMUSICAL INSTRUMENT FAMILY CLASSIFICATION
MUSICAL INSTRUMENT FAMILY CLASSIFICATION Ricardo A. Garcia Media Lab, Massachusetts Institute of Technology 0 Ames Street Room E5-40, Cambridge, MA 039 USA PH: 67-53-0 FAX: 67-58-664 e-mail: rago @ media.
More informationLINGUISTIC DISSECTION OF SWITCHBOARD-CORPUS AUTOMATIC SPEECH RECOGNITION SYSTEMS
LINGUISTIC DISSECTION OF SWITCHBOARD-CORPUS AUTOMATIC SPEECH RECOGNITION SYSTEMS Steven Greenberg and Shuangyu Chang International Computer Science Institute 1947 Center Street, Berkeley, CA 94704, USA
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationSummarizing Online Forum Discussions Can Dialog Acts of Individual Messages Help?
Summarizing Online Forum Discussions Can Dialog Acts of Individual Messages Help? Sumit Bhatia 1, Prakhar Biyani 2 and Prasenjit Mitra 2 1 IBM Almaden Research Centre, 650 Harry Road, San Jose, CA 95123,
More informationData Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority
More informationDomain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
More informationPronunciation in English
The Electronic Journal for English as a Second Language Pronunciation in English March 2013 Volume 16, Number 4 Title Level Publisher Type of product Minimum Hardware Requirements Software Requirements
More informationUnlocking Value from. Patanjali V, Lead Data Scientist, Tiger Analytics Anand B, Director Analytics Consulting,Tiger Analytics
Unlocking Value from Patanjali V, Lead Data Scientist, Anand B, Director Analytics Consulting, EXECUTIVE SUMMARY Today a lot of unstructured data is being generated in the form of text, images, videos
More informationDublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
More informationStatistics Review PSY379
Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses
More informationHow to Improve the Sound Quality of Your Microphone
An Extension to the Sammon Mapping for the Robust Visualization of Speaker Dependencies Andreas Maier, Julian Exner, Stefan Steidl, Anton Batliner, Tino Haderlein, and Elmar Nöth Universität Erlangen-Nürnberg,
More informationSeparation and Classification of Harmonic Sounds for Singing Voice Detection
Separation and Classification of Harmonic Sounds for Singing Voice Detection Martín Rocamora and Alvaro Pardo Institute of Electrical Engineering - School of Engineering Universidad de la República, Uruguay
More informationThe effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications
Forensic Science International 146S (2004) S95 S99 www.elsevier.com/locate/forsciint The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications A.
More informationThings to remember when transcribing speech
Notes and discussion Things to remember when transcribing speech David Crystal University of Reading Until the day comes when this journal is available in an audio or video format, we shall have to rely
More informationPREDICTING MARKET VOLATILITY FEDERAL RESERVE BOARD MEETING MINUTES FROM
PREDICTING MARKET VOLATILITY FROM FEDERAL RESERVE BOARD MEETING MINUTES Reza Bosagh Zadeh and Andreas Zollmann Lab Advisers: Noah Smith and Bryan Routledge GOALS Make Money! Not really. Find interesting
More informationEfficient diphone database creation for MBROLA, a multilingual speech synthesiser
Efficient diphone database creation for, a multilingual speech synthesiser Institute of Linguistics Adam Mickiewicz University Poznań OWD 2010 Wisła-Kopydło, Poland Why? useful for testing speech models
More informationOPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane
OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane Carnegie Mellon University Language Technology Institute {ankurgan,fmetze,ahw,lane}@cs.cmu.edu
More informationTrigonometric functions and sound
Trigonometric functions and sound The sounds we hear are caused by vibrations that send pressure waves through the air. Our ears respond to these pressure waves and signal the brain about their amplitude
More informationSpeech Transcription
TC-STAR Final Review Meeting Luxembourg, 29 May 2007 Speech Transcription Jean-Luc Gauvain LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 1 What Is Speech Recognition? Def: Automatic conversion
More informationInteraction Mining: the new frontier of Call Center Analytics
Interaction Mining: the new frontier of Call Center Analytics Vincenzo Pallotta 1, Rodolfo Delmonte 1,2, Lammert Vrieling 1, David Walker 1 Interanalytics Rue des Savoises, 19 1205 Geneva, Switzerland
More informationMicro blogs Oriented Word Segmentation System
Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,
More informationIFS-8000 V2.0 INFORMATION FUSION SYSTEM
IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 Overview IFS-8000 v2.0 is a flexible, scalable and modular IT system to support the processes of aggregation of information from intercepts to intelligence
More informationPerceptual experiments sir-skur-spur-stir
Perceptual experiments sir-skur-spur-stir Amy Beeston & Guy Brown 19 May 21 1 Introduction 2 Experiment 1: cutoff Set up Results 3 Experiment 2: reverse Set up Results 4 Discussion Introduction introduction
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
More informationCorpus Design for a Unit Selection Database
Corpus Design for a Unit Selection Database Norbert Braunschweiler Institute for Natural Language Processing (IMS) Stuttgart 8 th 9 th October 2002 BITS Workshop, München Norbert Braunschweiler Corpus
More informationBLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION
BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be
More information1. Introduction to Spoken Dialogue Systems
SoSe 2006 Projekt Sprachdialogsysteme 1. Introduction to Spoken Dialogue Systems Walther v. Hahn, Cristina Vertan {vhahn,vertan}@informatik.uni-hamburg.de Content What are Spoken dialogue systems? Types
More informationLIUM s Statistical Machine Translation System for IWSLT 2010
LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,
More informationE-discovery Taking Predictive Coding Out of the Black Box
E-discovery Taking Predictive Coding Out of the Black Box Joseph H. Looby Senior Managing Director FTI TECHNOLOGY IN CASES OF COMMERCIAL LITIGATION, the process of discovery can place a huge burden on
More informationIntelligent Agents Serving Based On The Society Information
Intelligent Agents Serving Based On The Society Information Sanem SARIEL Istanbul Technical University, Computer Engineering Department, Istanbul, TURKEY sariel@cs.itu.edu.tr B. Tevfik AKGUN Yildiz Technical
More information