Automatic Formatted Transcripts for Videos
Aasish Pappu, Amanda Stent
Yahoo! Labs

Abstract

Multimedia content may be supplemented with time-aligned closed captions for accessibility. Often these captions are created manually by professional editors, an expensive and time-consuming process. In this paper, we present a novel approach to automatic creation of a well-formatted, readable transcript for a video from closed captions or ASR output. Our approach uses acoustic and lexical features extracted from the video and the raw transcription/caption files. We compare our approach with two standard baselines: a) silence-segmented transcripts and b) text-only segmented transcripts. We show that our approach outperforms both these baselines on subjective and objective metrics.

Index Terms: Spoken Language Processing, Closed-Captions, Spoken Text Normalization

1. Introduction

Multimedia content such as video and audio files may be supplemented with closed captions for accessibility. Captioning multimedia content is typically a two-step process: 1) transcribe the content to obtain text and non-speech events (e.g., applause), and 2) temporally align the transcription with the content to produce closed captions. Closed captions and transcripts not only make multimedia content accessible, but can also improve the searchability of the content [1], assist in video classification [2, 3] and video segmentation [4, 5], and help to highlight salient objects in a video frame [6]. Although closed captions can be very useful, the traditional approach to closed captioning is an expensive and time-consuming process, requiring multiple rounds of manual transcription and alignment. Even after this, while the captions may be accurate, manual time alignments are typically perceptibly off.
In addition, most search engine operators do not index closed caption files, but only index text made visible in a web page, so in order for multimedia content to be treated as first-class web content, it must be accompanied by visible transcripts. Most content publishers are unwilling to accompany their videos with poor-quality transcripts such as might be produced by simply printing out raw transcriptions or closed captions; that is, transcripts must be readable and must not look spammy (e.g. big blocks of text, long sentences). However, manual construction of a readable and well-formatted transcript requires additional time-consuming and expensive rounds of editing following closed captioning.

In this paper, we present a system that takes a raw transcription of a video (e.g. crowd-sourced or ASR output) as input and generates: (i) accurately time-aligned closed captions; and (ii) a readable, well-formatted transcript with punctuation, capitalization, and paragraph segmentation. Because the manual effort is reduced to straight transcription, considerable time and money can be saved and more multimedia can be made accessible and searchable.

Our system has four components: alignment of transcript and video, punctuation insertion, capitalization, and paragraph segmentation. In this paper we focus in particular on the task of punctuation insertion. We demonstrate that a combination of textual and acoustic features leads to higher accuracy on this task than either feature type alone, and that this leads to better decisions in later stages of the transcript formatting pipeline (capitalization, paragraph break insertion).

This paper is organized as follows. First, we present a brief overview of previous work on formatting raw speech transcripts. Then, in Section 3, we describe our system. In Section 4, we present evaluations of our system, focusing on punctuation insertion and its downstream impacts.
Finally, we make concluding remarks and briefly describe future directions for this work.

2. Related Work on Punctuation Insertion

There has been a lot of previous work specifically on punctuation insertion in speech transcripts. Here we mention only highlights. Previous work on formatting speech recognition output has shown that both textual and frame-level features can be useful for punctuation prediction. Huang and Zweig used lexical and pause features in a maximum entropy tagger trained and tested on Switchboard conversational speech [7]. Kim and Woodland used lexical, pause, F0 and RMS features in a decision tree framework [8]. Liu et al. were the first to apply conditional random fields to this task [9]. In all cases, they looked only at insertion of commas, question marks and periods. Other researchers have obtained good results for both punctuation and capitalization using only lexical information, given large quantities of training data [10, 11, 12]. More recent work has shown that accurate punctuation prediction with cross-lingual features can improve machine translation output [13, 14, 15]. In this work we show that functional features of low-level acoustic descriptors can complement textual features in punctuating raw transcripts, and that improvements here can lead to outsize impacts on downstream processing (e.g. capitalization) and improved overall readability of the final formatted transcripts.

3. System

Our system takes as input a video and a raw transcription of the speech in the video. This transcription may be obtained through automatic speech recognition or (with considerably higher accuracy) through crowdsourcing or professional transcription [16]. The input is processed in four stages.

3.1. Preprocessing and Alignment

We extract the audio from the input video. We obtain a phoneme-level transcription from the input word-level transcription using the CMU pronunciation dictionary, and SEQUITUR [17] for out-of-dictionary words. Then, we run the P2FA forced aligner [18] with the HUB4 acoustic model from the Sphinx open-source speech recognizer [19] on the audio and phoneme-level transcription. The process is outlined in Figure 1. The result is timestamps for every word in the input transcription, as well as for silences (regions of no speech). Figure 2 shows an example time-aligned speech segment followed by silence. We use this time-aligned data in our speech-based punctuation insertion model. As a side effect, we can also construct closed captions from the time-aligned transcription.

[Figure 1: Workflow to obtain time-aligned transcriptions. Blocks: video and raw text; extract audio; generate pronunciation (grapheme-to-phoneme model); construct finite state graph; sequence alignment (acoustic model).]

3.2. Punctuation Insertion

We use the frontend from the Flite text-to-speech synthesizer [20] to normalize and homogenize the raw transcription. We then run the normalized transcription through a punctuation insertion system trained on a corpus of well-formatted video transcriptions. We implemented systems for performing punctuation insertion using textual and acoustic information alone, as well as an ensemble approach (see Section 4.2.4).

3.3. Capitalization

After inserting punctuation into the transcription, we extract sentences. Within each sentence, we capitalize tokens wherever applicable, i.e., named entities, tokens at the beginning of a sentence, and other special cases such as "I'm". We use the recaser tool in the MOSES machine translation toolkit [21], trained on one year of news articles, for capitalization.

3.4. Paragraph Boundary Insertion

There has been relatively little work on paragraph break insertion, and most existing methods require considerable processing, e.g. parsing [22, 23]. In a large-scale commercial video transcription system there is very little time for preprocessing.
To group sentences into paragraphs, we use the TextTiling algorithm [24]. This algorithm allows us to detect topic shifts across sentences and insert paragraph breaks where topic shifts occur. A topic shift is detected by computing lexical similarity between adjacent groups of sentences. When new words are introduced in a group of sentences, the algorithm attempts to insert a paragraph boundary preceding this group. The number of paragraph boundaries is determined by the distribution of the topic shift scores across the whole text.

[Figure 2: Spectrogram and pitch contour for an example utterance with a sharp rise and fall in intonation. Utterance text: "wished for a sandwich i did i totally did <sil>"]

4. Experiments

4.1. Data

We trained and evaluated our system using a set of videos that we obtained from Yahoo Screen. The videos cover several genres: finance, sports, odd news, SNL, comedy, and trending now. The video lengths range from 20 seconds to 12 minutes. The primary language used in all videos is English. For each video, we have high-quality, professionally created closed captions containing punctuation and capitalization (but no paragraph boundaries). We split the data into training data (243 videos), development data (121 videos) and test data (35 videos), randomly but balanced across genres. We trained models for both text-based and speech-based punctuation insertion on the training data, and tuned the parameters for the ensemble model using the development data.

4.2. Punctuation Insertion

First, we compare models based on textual and speech features against standard baselines. Then, we compare these models with an ensemble method.

4.2.1. Baselines

We compare our models against the following baselines:

Baseline1: Uses a trigram language model to insert punctuation. We trained the language model on a corpus of news articles. This baseline can insert all types of punctuation, and is similar to the phrase-break insertion approach used in speech synthesis [25].
Baseline2: Uses the silence durations between phrases to insert punctuation according to the rule (min_silence < comma < max_silence < period): a silence longer than min_silence yields a comma, and a silence longer than max_silence yields a period. We used min_silence = 0.22 seconds and max_silence = 0.4 seconds, which we computed based on 10-fold cross-validation over the training data. This baseline can only insert {comma, period, none}.

4.2.2. Text-Based Model

We use the CRF++ toolkit (crfpp.sourceforge.net) to train a sequence tagger to insert all types of punctuation (including none) between each pair of words in the input transcription. As features, we use part-of-speech (POS) tags and tokens from the transcription. We use the CLEARNLP toolkit [26] with its off-the-shelf model to predict POS tags.
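As a sketch of what the sequence tagger consumes, the code below builds a feature dictionary for the gap between two adjacent words from token and POS windows. The window size, feature names, and padding convention are illustrative assumptions for this sketch; the paper's actual setup uses CRF++ feature templates over CLEARNLP POS tags and tokens.

```python
def gap_features(tokens, pos_tags, i, window=2):
    """Features for the gap after tokens[i]: token and POS values in a
    small window around the gap, plus bigrams spanning the boundary.
    Window size and feature names are illustrative, not the paper's
    exact CRF++ templates."""
    feats = {}
    for off in range(-window + 1, window + 1):
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"tok[{off}]"] = tokens[j].lower()
            feats[f"pos[{off}]"] = pos_tags[j]
        else:  # pad positions that fall outside the utterance
            feats[f"tok[{off}]"] = "<pad>"
            feats[f"pos[{off}]"] = "<pad>"
    # bigrams spanning the gap let the tagger see the boundary context
    feats["tok[0]|tok[1]"] = feats["tok[0]"] + "|" + feats["tok[1]"]
    feats["pos[0]|pos[1]"] = feats["pos[0]"] + "|" + feats["pos[1]"]
    return feats

# Example: the gap between "did" and "i" in the Figure 2 utterance.
tokens = ["i", "did", "i", "totally", "did"]
pos = ["PRP", "VBD", "PRP", "RB", "VBD"]
f = gap_features(tokens, pos, 1)
```

Each gap in the transcription yields one such feature dictionary, and the tagger assigns it a label from the punctuation set (including none).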
Extremes       max, min, range
Means          arithmetic, geometric
Peaks          num. peaks, distance between peaks
Segments       num. segments
Onset          num. onsets, offsets
Moments        st. deviation, variance
Crossings      zero-crossing rate, mean crossing rate
Percentiles    percentile values, inter-percentile ranges
Regression     linear and quadratic regression coefficients
Samples        sampled values at equidistant frames
Times          rise and fall of the curve, duration
DCT            DCT coefficients

Table 1: Functional features used in speech-based punctuation insertion

4.2.3. Speech-Based Model

Before we describe the speech-based punctuation model, we would like to provide the intuition behind using acoustic features to insert punctuation. Punctuation in spoken language not only marks phrase breaks but also conveys the sentiment/emotion of the speaker. For example, in Figure 2, one may predict from the transcription that the text should end with a PERIOD, but based on the pitch contour, which shows a sharp rise and fall in intonation, one may predict an EXCLAMATION. Therefore, we treat this problem as a hybrid of phrase-break prediction and emotion detection. Emotion detection in speech is a well-studied problem (e.g. [27, 28]). It is known that functional features are critical in these tasks [29]. A functional feature takes a sequence of low-level feature descriptors as input and produces a fixed-size vector as output. Since functional features are computed over low-level descriptor contours, they capture trends over the entire speech segment. Since the length of the output vector is independent of the length of the input sequence, we can easily perform feature selection across the vector's dimensions. Our speech-based punctuation insertion system operates only on silences from the input time-aligned transcription. We used openSMILE [30] to extract 12 functional features (listed in Table 1) over the speech segment preceding each silence.
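To make the fixed-size-vector idea concrete, here is a minimal sketch of a few of the functional families from Table 1 applied to a variable-length descriptor contour. This covers only a small subset of the families, and the exact formulas (e.g. the nearest-rank percentile and the sign-based crossing count) are illustrative choices for this sketch, not openSMILE's implementations.

```python
import statistics

def zero_crossing_rate(xs):
    """Fraction of adjacent sample pairs whose signs differ
    (meaningful for signed descriptors such as delta features)."""
    pairs = list(zip(xs, xs[1:]))
    crossings = sum(1 for a, b in pairs if (a < 0) != (b < 0))
    return crossings / len(pairs)

def percentile(xs, p):
    """Nearest-rank percentile; one simple convention of several."""
    ys = sorted(xs)
    k = max(0, min(len(ys) - 1, round(p / 100 * (len(ys) - 1))))
    return ys[k]

def functionals(contour):
    """Map a variable-length low-level descriptor contour (e.g. the
    pitch track over the segment before a silence) to a fixed-size
    vector, so segments of any duration become comparable."""
    return [
        max(contour),                      # extremes
        min(contour),
        max(contour) - min(contour),       # range
        statistics.mean(contour),          # means
        statistics.pstdev(contour),        # moments
        zero_crossing_rate(contour),       # crossings
        percentile(contour, 25),           # percentiles
        percentile(contour, 75),
        percentile(contour, 75) - percentile(contour, 25),  # inter-percentile range
    ]

# Any input length yields the same 9-dimensional output.
v = functionals([1.0, -1.0, 2.0, -2.0])
```

Stacking such vectors over several low-level descriptors is what produces the high-dimensional representation classified per silence.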
We compute these functional features over four low-level feature descriptors: Energy, Voicing probability, Pitch Onsets, and Duration. The feature extraction process results in a 2268-dimensional vector, which we use to classify each silence as one of {exclamation, questionmark, period, comma, hyphen, none}. We use the Weka [31] implementation of Random Forests with 30 trees and maximum depth computed based on cross-validation on the training data.

4.2.4. Ensemble Method

We hypothesize that textual or speech features alone may provide incomplete information; for example, silences may not always translate into punctuation in the text and vice versa. Table 2 shows silences in the training data that map to punctuation and silences that do not. We notice that only 1/3 of silences correspond to punctuation. To incorporate both textual and speech information, we trained an ensemble method: a logistic regression model trained over the development data using the predicted labels and confidence scores from the text-based and speech-based models.

             Punctuation    No punctuation
Silence
No silence                  n/a

Table 2: Silences vs. punctuations

4.2.5. Results

We evaluate these methods against the professionally created closed captions for each video, which include punctuation. We use F1 score to assess label accuracy and word error rate (WER) to assess label positioning. Table 3 shows results for our two baseline methods, our text-based and speech-based models, and the ensemble method. We observe that both individual models outperform the baselines. In addition, the ensemble model outperforms all the other methods on both metrics. We conclude that textual and speech features complement each other for this task.

4.3. Capitalization

Sentence-final punctuation is crucial for capitalization. The closed captions in our data include capitalization, allowing us to measure capitalization accuracy for different punctuation methods. As Table 4 shows, the ensemble method outperforms the other methods by 7%.
This outsize performance difference is due to better punctuation with the ensemble model, particularly with respect to end-of-sentence punctuation (period, questionmark and exclamation).

Method        WER    F1
Baseline1
Baseline2
Text only
Speech only
Ensemble

Table 3: Results: punctuation insertion

Punctuation model    Accuracy
Text only            65.2
Speech only          64.0
Ensemble             71.1

Table 4: Results: capitalization

4.4. Subjective Evaluation

Sentence-final punctuation is also crucial for paragraph breaking; however, closed captions do not contain paragraph breaks. We conducted a crowdsourced evaluation using Amazon Mechanical Turk to assess the overall readability of the transcripts our system can produce, including punctuation, capitalization and paragraph breaks. Turkers were presented with a video (including closed captions automatically aligned by our system) and three transcript variants produced using: (a) text-based punctuation insertion followed by capitalization and paragraph break insertion; (b) speech-based punctuation insertion followed by capitalization and paragraph break insertion; and (c) the ensemble method for punctuation insertion followed by capitalization and paragraph break insertion. In all cases, the transcripts were automatically preprocessed to remove white space between a punctuation mark and the previous token, remove white space within contractions like "we'll", and normalize numbers. Turkers were told that the transcript variants were produced by computer programs. They were asked to rank the transcript variants from 1 (best / most readable) to 3 (worst / least readable). They were allowed to assign more than one transcript to a ranking, but were asked to use the whole range of rankings from 1 to 3 if possible. They were also asked to explain what made the best transcript(s) better, and what made the worst transcript(s) worse. We produced one HIT for each of the 35 videos in our test data. Each HIT was assigned to seven turkers. Turkers were required to be US-based (thus, presumably fluent English speakers) and to have completed at least 100 HITs with an approval rating of 90% or higher. We paid $0.15 per assignment.

Method        Paragraphs    Punctuations    Capitalized words    Macroaveraged rank    Winners    Losers
text only
speech only
ensemble

Table 5: Statistics on the test data and subjective evaluation results

Table 5 shows some statistics about the punctuation, capitalization and paragraph breaks in the evaluated transcripts, as well as the macroaveraged results of the subjective evaluation and the number of test documents for which each method was a winner (four or more judgments of rank 1) or a loser (four or more judgments of rank 3). The ensemble method has the highest overall readability. In their comments, turkers praised highly-ranked transcript variants for the flow of the paragraph and sentence breaks, and condemned low-ranked transcript variants for being "chopped up" (too many paragraph/sentence breaks), "one big block of text" (too few paragraph/sentence breaks), missing speaker changes, or missing capitalization.

4.5. Analysis

We observe that on certain videos even the ensemble system cannot achieve good performance. This is typically due to non-speech events in the videos.
In our data, videos contain the following non-speech events: laughter, music, applause and generic background noise. To fix this problem, we could use chroma features to detect music events and use acoustic event datasets (see c4dm.eecs.qmul.ac.uk/sceneseventschallenge/description.html) to filter laughter, applause and other noise events.

Since we use acoustic features for punctuation insertion, accurate alignment of text and speech is critical. We compared our automatically-obtained word-level time alignment with the manually-created time alignments available from the professionally created closed captions. Since word-level manual alignments are not available with the closed captions, we only analyzed speech segments and non-speech events. From Table 6, we see that automatically aligned non-speech events are misaligned by 20 seconds, whereas manually aligned speech segments are misaligned by 6 seconds. Automatic alignment does well for speech; however, non-speech segments introduce errors during extraction of speech features due to misalignment.

Segment type    Instances    Begin (sec)    End (sec)
Speech
Non-speech

Table 6: Difference between automatic and manual alignments. A positive difference means the automatic segment is misaligned; a negative difference means the manual segment is misaligned.

5. Conclusions and Future Work

We describe a system for generating well-formatted transcripts for videos from input raw transcriptions/closed captions. The system includes components for alignment of transcription to video, punctuation insertion, capitalization, and paragraph break insertion. We focus on the task of punctuation insertion, as it is critical to later stages. In both quantitative and qualitative evaluations, we show that an ensemble method combining acoustic and textual features outperforms speech-based and text-based methods, and leads to outsize improvements in later stages of processing. Although we achieve high accuracy for punctuation insertion, we still miss 7% of punctuation marks.
We see that acoustic features are inaccurate when alignment fails due to non-speech segments. In addition, our system currently does not detect speaker change events, which should generally correspond to punctuation insertions. In future work, we could add speaker change detection, acoustic event detection and music identification to the preprocessing and alignment stage of our system.

6. References

[1] J.-C. Shim, C. Dorai, and R. Bolle, "Automatic text extraction from video for content-based annotation and retrieval," in Proceedings of the International Conference on Pattern Recognition.
[2] D. Brezeale and D. Cook, "Using closed captions and visual features to classify movies by genre," in Proceedings of the ACM KDD Workshop on Multimedia Data Mining.
[3] K. Filippova and K. Hall, "Improved video categorization from text metadata and user comments," in Proceedings of SIGIR.
[4] A. Hauptmann and M. Witbrock, "Story segmentation and detection of commercials in broadcast news video," in Proceedings of the IEEE International Forum on Research and Technology Advances in Digital Libraries.
[5] T. Cour et al., "Movie/script: Alignment and parsing of video and text transcription," in Proceedings of ECCV.
[6] M. Bertini, C. Colombo, and A. Del Bimbo, "Automatic caption localization in videos using salient points," in Proceedings of ICME.
[7] J. Huang and G. Zweig, "Maximum entropy model for punctuation annotation from speech," in Proceedings of INTERSPEECH.
[8] J. Kim and P. Woodland, "The use of prosody in a combined system for punctuation generation and speech recognition," in Proceedings of INTERSPEECH, 2001.
[9] Y. Liu, A. Stolcke, E. Shriberg, and M. Harper, "Using conditional random fields for sentence boundary detection in speech," in Proceedings of the ACL.
[10] A. Gravano, M. Jansche, and M. Bacchiani, "Restoring punctuation and capitalization in transcribed speech," in Proceedings of ICASSP.
[11] W. Lu and H. Ng, "Better punctuation prediction with dynamic conditional random fields," in Proceedings of EMNLP.
[12] M. Shugrina, "Formatting time-aligned ASR transcripts for readability," in Proceedings of HLT/NAACL.
[13] S. Peitz, M. Freitag, A. Mauser, and H. Ney, "Modeling punctuation prediction as machine translation," in Proceedings of the International Workshop on Spoken Language Translation.
[14] F. Batista, H. Moniz, I. Trancoso, and N. Mamede, "Bilingual experiments on automatic recovery of capitalization and punctuation of automatic speech transcripts," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20.
[15] J. Miranda, J. Neto, and A. Black, "Improved punctuation recovery through combination of multiple speech streams," in Proceedings of ASRU.
[16] R. Kushalnagar, W. Lasecki, and J. Bigham, "A readability evaluation of real-time crowd captions in the classroom," in Proceedings of ASSETS.
[17] M. Bisani and H. Ney, "Joint-sequence models for grapheme-to-phoneme conversion," Speech Communication, vol. 50, no. 5.
[18] J. Yuan and M. Liberman, "Speaker identification on the SCOTUS corpus," in Proceedings of Acoustics '08.
[19] P. Placeway et al., "The 1996 Hub-4 Sphinx-3 system," in Proceedings of the DARPA Speech Recognition Workshop.
[20] A. Black and K. Lenzo, "Flite: a small fast run-time synthesis engine," in Proceedings of the ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis.
[21] P. Koehn et al., "Moses: Open source toolkit for statistical machine translation," in Proceedings of the ACL.
[22] K. Filippova and M. Strube, "Using linguistically motivated features for paragraph boundary identification," in Proceedings of EMNLP.
[23] C. Sporleder and M. Lapata, "Broad coverage paragraph segmentation across languages and domains," ACM Transactions on Speech and Language Processing (TSLP), vol. 3, no. 2.
[24] M. Hearst, "TextTiling: Segmenting text into multi-paragraph subtopic passages," Computational Linguistics, vol. 23, no. 1.
[25] P. Taylor and A. Black, "Assigning phrase breaks from part-of-speech sequences," Computer Speech and Language, vol. 12.
[26] J. Choi, "Optimization of natural language processing components for robustness and scalability," Ph.D. dissertation.
[27] B. Schuller, S. Steidl, and A. Batliner, "The INTERSPEECH 2009 emotion challenge," in Proceedings of INTERSPEECH.
[28] M. Valstar et al., "AVEC 2013: the continuous audio/visual emotion and depression recognition challenge," in Proceedings of the International Audio/Visual Emotion Challenge and Workshop.
[29] B. Schuller et al., "The INTERSPEECH 2010 paralinguistic challenge," in Proceedings of INTERSPEECH.
[30] F. Eyben et al., "Recent developments in openSMILE, the Munich open-source multimedia feature extractor," in Proceedings of ACM Multimedia.
[31] M. Hall et al., "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, 2009.
Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,
More informationMaking Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research
More informationGenerating Training Data for Medical Dictations
Generating Training Data for Medical Dictations Sergey Pakhomov University of Minnesota, MN pakhomov.sergey@mayo.edu Michael Schonwetter Linguistech Consortium, NJ MSchonwetter@qwest.net Joan Bachenko
More information2009 Springer. This document is published in: Adaptive Multimedia Retrieval. LNCS 6535 (2009) pp. 12 23 DOI: 10.1007/978-3-642-18449-9_2
Institutional Repository This document is published in: Adaptive Multimedia Retrieval. LNCS 6535 (2009) pp. 12 23 DOI: 10.1007/978-3-642-18449-9_2 2009 Springer Some Experiments in Evaluating ASR Systems
More informationOpen-Source, Cross-Platform Java Tools Working Together on a Dialogue System
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationEmotion Detection from Speech
Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction
More informationAUTOMATIC VIDEO STRUCTURING BASED ON HMMS AND AUDIO VISUAL INTEGRATION
AUTOMATIC VIDEO STRUCTURING BASED ON HMMS AND AUDIO VISUAL INTEGRATION P. Gros (1), E. Kijak (2) and G. Gravier (1) (1) IRISA CNRS (2) IRISA Université de Rennes 1 Campus Universitaire de Beaulieu 35042
More informationBuilding A Vocabulary Self-Learning Speech Recognition System
INTERSPEECH 2014 Building A Vocabulary Self-Learning Speech Recognition System Long Qin 1, Alexander Rudnicky 2 1 M*Modal, 1710 Murray Ave, Pittsburgh, PA, USA 2 Carnegie Mellon University, 5000 Forbes
More informationCombining Content Analysis of Television Programs with Audience Measurement
Combining Content Analysis of Television Programs with Audience Measurement David Gibbon, Zhu Liu, Eric Zavesky, DeDe Paul, Deborah F. Swayne, Rittwik Jana, Behzad Shahraray AT&T Labs - Research {dcg,zliu,ezavesky,dedepaul,dfs,rjana,behzad}@research.att.com
More informationA GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing
A GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing LAW VI JEJU 2012 Bayu Distiawan Trisedya & Ruli Manurung Faculty of Computer Science Universitas
More informationGerman Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings
German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings Haojin Yang, Christoph Oehlke, Christoph Meinel Hasso Plattner Institut (HPI), University of Potsdam P.O. Box
More informationSentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015
Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015
More informationCENG 734 Advanced Topics in Bioinformatics
CENG 734 Advanced Topics in Bioinformatics Week 9 Text Mining for Bioinformatics: BioCreative II.5 Fall 2010-2011 Quiz #7 1. Draw the decompressed graph for the following graph summary 2. Describe the
More informationSpeech Recognition in the Informedia TM Digital Video Library: Uses and Limitations
Speech Recognition in the Informedia TM Digital Video Library: Uses and Limitations Alexander G. Hauptmann School of Computer Science, Carnegie Mellon University, Pittsburgh PA, USA Abstract In principle,
More informationCustomizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
More informationSPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS
SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS Mbarek Charhad, Daniel Moraru, Stéphane Ayache and Georges Quénot CLIPS-IMAG BP 53, 38041 Grenoble cedex 9, France Georges.Quenot@imag.fr ABSTRACT The
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationComparative Study on Subject Classification of Academic Videos using Noisy Transcripts
Comparative Study on Subject Classification of Academic Videos using Noisy Transcripts Hau-Wen Chang Hung-sik Kim Shuyang Li + Jeongkyu Lee + Dongwon Lee The Pennsylvania State University, USA {hauwen,hungsik,dongwon}@psu.edu,
More informationPOSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
More informationTED-LIUM: an Automatic Speech Recognition dedicated corpus
TED-LIUM: an Automatic Speech Recognition dedicated corpus Anthony Rousseau, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans, France firstname.lastname@lium.univ-lemans.fr
More informationWeb Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
More informationEffect of Captioning Lecture Videos For Learning in Foreign Language 外 国 語 ( 英 語 ) 講 義 映 像 に 対 する 字 幕 提 示 の 理 解 度 効 果
Effect of Captioning Lecture Videos For Learning in Foreign Language VERI FERDIANSYAH 1 SEIICHI NAKAGAWA 1 Toyohashi University of Technology, Tenpaku-cho, Toyohashi 441-858 Japan E-mail: {veri, nakagawa}@slp.cs.tut.ac.jp
More informationRecognition of Emotions in Interactive Voice Response Systems
Recognition of Emotions in Interactive Voice Response Systems Sherif Yacoub, Steve Simske, Xiaofan Lin, John Burns HP Laboratories Palo Alto HPL-2003-136 July 2 nd, 2003* E-mail: {sherif.yacoub, steven.simske,
More informationClassification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
More informationAutomated Content Analysis of Discussion Transcripts
Automated Content Analysis of Discussion Transcripts Vitomir Kovanović v.kovanovic@ed.ac.uk Dragan Gašević dgasevic@acm.org School of Informatics, University of Edinburgh Edinburgh, United Kingdom v.kovanovic@ed.ac.uk
More informationFinancial Trading System using Combination of Textual and Numerical Data
Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,
More informationSupervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization
Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization Luís Marujo 1,2, Anatole Gershman 1, Jaime Carbonell 1, Robert Frederking 1,
More informationInvestigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition
, Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology
More informationIntroduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
More informationSVM Based Learning System For Information Extraction
SVM Based Learning System For Information Extraction Yaoyong Li, Kalina Bontcheva, and Hamish Cunningham Department of Computer Science, The University of Sheffield, Sheffield, S1 4DP, UK {yaoyong,kalina,hamish}@dcs.shef.ac.uk
More informationCrowdfunding Support Tools: Predicting Success & Failure
Crowdfunding Support Tools: Predicting Success & Failure Michael D. Greenberg Bryan Pardo mdgreenb@u.northwestern.edu pardo@northwestern.edu Karthic Hariharan karthichariharan2012@u.northwes tern.edu Elizabeth
More informationAn Automated Analysis and Indexing Framework for Lecture Video Portal
An Automated Analysis and Indexing Framework for Lecture Video Portal Haojin Yang, Christoph Oehlke, and Christoph Meinel Hasso Plattner Institute (HPI), University of Potsdam, Germany {Haojin.Yang,Meinel}@hpi.uni-potsdam.de,
More informationThe LENA TM Language Environment Analysis System:
FOUNDATION The LENA TM Language Environment Analysis System: The Interpreted Time Segments (ITS) File Dongxin Xu, Umit Yapanel, Sharmi Gray, & Charles T. Baer LENA Foundation, Boulder, CO LTR-04-2 September
More informationCarla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software
Carla Simões, t-carlas@microsoft.com Speech Analysis and Transcription Software 1 Overview Methods for Speech Acoustic Analysis Why Speech Acoustic Analysis? Annotation Segmentation Alignment Speech Analysis
More informationCS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis
CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction
More information31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
More informationIndustry Guidelines on Captioning Television Programs 1 Introduction
Industry Guidelines on Captioning Television Programs 1 Introduction These guidelines address the quality of closed captions on television programs by setting a benchmark for best practice. The guideline
More informationMLLP Transcription and Translation Platform
MLLP Transcription and Translation Platform Alejandro Pérez González de Martos, Joan Albert Silvestre-Cerdà, Juan Daniel Valor Miró, Jorge Civera, and Alfons Juan Universitat Politècnica de València, Camino
More informationAn Arabic Text-To-Speech System Based on Artificial Neural Networks
Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department
More informationOPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane
OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane Carnegie Mellon University Language Technology Institute {ankurgan,fmetze,ahw,lane}@cs.cmu.edu
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationOpen Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationProsodic Phrasing: Machine and Human Evaluation
Prosodic Phrasing: Machine and Human Evaluation M. Céu Viana*, Luís C. Oliveira**, Ana I. Mata***, *CLUL, **INESC-ID/IST, ***FLUL/CLUL Rua Alves Redol 9, 1000 Lisboa, Portugal mcv@clul.ul.pt, lco@inesc-id.pt,
More informationMovie Classification Using k-means and Hierarchical Clustering
Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani
More informationFlorida International University - University of Miami TRECVID 2014
Florida International University - University of Miami TRECVID 2014 Miguel Gavidia 3, Tarek Sayed 1, Yilin Yan 1, Quisha Zhu 1, Mei-Ling Shyu 1, Shu-Ching Chen 2, Hsin-Yu Ha 2, Ming Ma 1, Winnie Chen 4,
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationTowards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,
More informationEmail Spam Detection A Machine Learning Approach
Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn
More informationVCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
More informationRobust Sentiment Detection on Twitter from Biased and Noisy Data
Robust Sentiment Detection on Twitter from Biased and Noisy Data Luciano Barbosa AT&T Labs - Research lbarbosa@research.att.com Junlan Feng AT&T Labs - Research junlan@research.att.com Abstract In this
More informationCombining Slot-based Vector Space Model for Voice Book Search
Combining Slot-based Vector Space Model for Voice Book Search Cheongjae Lee and Tatsuya Kawahara and Alexander Rudnicky Abstract We describe a hybrid approach to vector space model that improves accuracy
More informationTHUTR: A Translation Retrieval System
THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
More informationEnglish Grammar Checker
International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,
More informationNamed Entity Recognition in Broadcast News Using Similar Written Texts
Named Entity Recognition in Broadcast News Using Similar Written Texts Niraj Shrestha Ivan Vulić KU Leuven, Belgium KU Leuven, Belgium niraj.shrestha@cs.kuleuven.be ivan.vulic@@cs.kuleuven.be Abstract
More informationEfficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
More informationBeating the MLB Moneyline
Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series
More informationPredicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
More informationM3039 MPEG 97/ January 1998
INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION ISO/IEC JTC1/SC29/WG11 M3039
More informationLeveraging Video Interaction Data and Content Analysis to Improve Video Learning
Leveraging Video Interaction Data and Content Analysis to Improve Video Learning Juho Kim juhokim@mit.edu Shang-Wen (Daniel) Li swli@mit.edu Carrie J. Cai cjcai@mit.edu Krzysztof Z. Gajos Harvard SEAS
More informationScaling Shrinkage-Based Language Models
Scaling Shrinkage-Based Language Models Stanley F. Chen, Lidia Mangu, Bhuvana Ramabhadran, Ruhi Sarikaya, Abhinav Sethy IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights, NY 10598 USA {stanchen,mangu,bhuvana,sarikaya,asethy}@us.ibm.com
More informationSpoken Document Retrieval from Call-Center Conversations
Spoken Document Retrieval from Call-Center Conversations Jonathan Mamou, David Carmel, Ron Hoory IBM Haifa Research Labs Haifa 31905, Israel {mamou,carmel,hoory}@il.ibm.com ABSTRACT We are interested in
More informationMicro blogs Oriented Word Segmentation System
Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,
More informationWord Completion and Prediction in Hebrew
Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology
More information