Automatic Formatted Transcripts for Videos


Aasish Pappu, Amanda Stent
Yahoo! Labs

Abstract

Multimedia content may be supplemented with time-aligned closed captions for accessibility. Often these captions are created manually by professional editors, an expensive and time-consuming process. In this paper, we present a novel approach to the automatic creation of a well-formatted, readable transcript for a video from closed captions or ASR output. Our approach uses acoustic and lexical features extracted from the video and the raw transcription/caption files. We compare our approach with two standard baselines: a) silence-segmented transcripts and b) text-only segmented transcripts. We show that our approach outperforms both baselines on subjective and objective metrics.

Index Terms: Spoken Language Processing, Closed Captions, Spoken Text Normalization

1. Introduction

Multimedia content such as video and audio files may be supplemented with closed captions for accessibility. Captioning multimedia content is typically a two-step process: 1) transcribe the content to obtain text and non-speech events (e.g., applause), and 2) temporally align the transcription with the content to produce closed captions. Closed captions and transcripts not only make multimedia content accessible, but can also improve the searchability of the content [1], assist in video classification [2, 3] and video segmentation [4, 5], and help to highlight salient objects in a video frame [6]. Although closed captions can be very useful, the traditional approach to closed captioning is an expensive and time-consuming process, requiring multiple rounds of manual transcription and alignment. Even then, while the captions may be accurate, manual time alignments are typically perceptibly off.
In addition, most search engine operators do not index closed caption files, only text made visible in a web page; so for multimedia content to be treated as first-class web content, it must be accompanied by visible transcripts. Most content publishers are unwilling to accompany their videos with poor-quality transcripts such as might be produced by simply printing out raw transcriptions or closed captions; that is, transcripts must be readable and must not look spammy (e.g., big blocks of text, long sentences). However, manual construction of a readable and well-formatted transcript requires additional time-consuming and expensive rounds of editing following closed captioning.

In this paper, we present a system that takes a raw transcription of a video (e.g., crowd-sourced or ASR output) as input and generates: (i) accurately time-aligned closed captions; and (ii) a readable, well-formatted transcript with punctuation, capitalization, and paragraph segmentation. Because the manual effort is reduced to straight transcription, considerable time and money can be saved, and more multimedia content can be made accessible and searchable.

Our system has four components: alignment of transcript and video, punctuation insertion, capitalization, and paragraph segmentation. In this paper we focus in particular on the task of punctuation insertion. We demonstrate that a combination of textual and acoustic features leads to higher accuracy on this task than either feature type alone, and that this in turn leads to better decisions in later stages of the transcript formatting pipeline (capitalization, paragraph break insertion).

This paper is organized as follows. First, we present a brief overview of previous work on formatting raw speech transcripts. Then, in Section 3, we describe our system. In Section 4, we present evaluations of our system, focusing on punctuation insertion and its downstream impacts.
Finally, we make concluding remarks and briefly describe future directions for this work.

2. Related Work on Punctuation Insertion

There has been a great deal of previous work specifically on punctuation insertion in speech transcripts; here we mention only highlights. Previous work on formatting speech recognition output has shown that both textual and frame-level features can be useful for punctuation prediction. Huang and Zweig used lexical and pause features in a maximum entropy tagger trained and tested on Switchboard conversational speech [7]. Kim and Woodland used lexical, pause, F0, and RMS features in a decision tree framework [8]. Liu et al. were the first to apply conditional random fields to this task [9]. In all cases they considered only the insertion of commas, question marks, and periods. Other researchers have obtained good results for both punctuation and capitalization using only lexical information, given large quantities of training data [10, 11, 12]. More recent work has shown that accurate punctuation prediction can improve machine translation output with cross-lingual features [13, 14, 15].

In this work we show that functionals of low-level acoustic descriptors can complement textual features in punctuating raw transcripts, and that improvements here can have outsize impacts on downstream processing (e.g., capitalization) and on the overall readability of the final formatted transcripts.

3. System

Our system takes as input a video and a raw transcription of the speech in the video. This transcription may be obtained through automatic speech recognition or (with considerably higher accuracy) through crowdsourcing or professional transcription [16]. The input is processed in four stages.

3.1. Preprocessing and Alignment

[Figure 1: Workflow to obtain time-aligned transcriptions: extract audio from the video, generate a pronunciation for the raw text (grapheme-to-phoneme model), construct a finite state graph, and perform sequence alignment against the audio (acoustic model).]

We extract the audio from the input video. We obtain a phoneme-level transcription from the input word-level transcription using the CMU pronunciation dictionary, and SEQUITUR [17] for out-of-dictionary words. Then, we run the P2FA forced aligner [18] with the HUB4 acoustic model from the Sphinx open-source speech recognizer [19] on the audio and phoneme-level transcription. The process is outlined in Figure 1. The result is timestamps for every word in the input transcription, as well as for silences (regions of no speech). Figure 2 shows an example time-aligned speech segment followed by silence. We use this time-aligned data in our speech-based punctuation insertion model. As a side effect, we can also construct closed captions from the time-aligned transcription.

3.2. Punctuation Insertion

We use the frontend from the Flite text-to-speech synthesizer [20] to normalize and homogenize the raw transcription. We run the normalized transcription through a punctuation insertion system trained on a corpus of well-formatted video transcriptions. We implemented systems for performing punctuation insertion using textual and acoustic information alone, as well as an ensemble approach (see Section 4.2.4).

3.3. Capitalization

After inserting punctuation into the transcription, we extract sentences. Within each sentence, we capitalize tokens wherever applicable, i.e., named entities, tokens at the beginning of a sentence, and other special cases such as "I'm". We use the recaser tool in the MOSES machine translation toolkit [21], trained on one year of news articles, for capitalization.

3.4. Paragraph Boundary Insertion

There has been relatively little work on paragraph break insertion, and most existing methods require considerable processing, e.g., parsing [22, 23]. In a large-scale commercial video transcription system there is very little time for preprocessing.
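A lightweight, lexical-cohesion segmenter in the spirit of TextTiling can be sketched as follows. This is a simplified illustration, not the implementation used in the system; the window size, threshold, and function names are all illustrative choices.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def paragraph_breaks(sentences, window=2, depth=0.15):
    """Compare the vocabulary of adjacent windows of sentences; a gap
    whose similarity dips well below the mean becomes a paragraph break.
    (Illustrative parameters, not the paper's settings.)"""
    bows = [Counter(s.lower().split()) for s in sentences]
    sims = []
    for i in range(1, len(sentences)):
        left = sum(bows[max(0, i - window):i], Counter())
        right = sum(bows[i:i + window], Counter())
        sims.append(cosine(left, right))
    mean = sum(sims) / len(sims)
    # Break before sentence i+1 where cohesion is low relative to the mean.
    return [i + 1 for i, s in enumerate(sims) if s < mean - depth]

breaks = paragraph_breaks([
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stocks fell sharply today",
    "markets closed lower today",
])  # -> [2], i.e. a break before the topic shift to finance
```

The actual TextTiling algorithm additionally smooths the similarity curve and derives the number of boundaries from the score distribution, as described next.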
To group sentences into paragraphs, we use the TextTiling algorithm [24]. This algorithm allows us to detect topic shifts across sentences and to insert paragraph breaks where topic shifts occur. A topic shift is detected by computing lexical similarity between adjacent groups of sentences. When new words are introduced in a group of sentences, the algorithm attempts to insert a paragraph boundary preceding this group. The number of paragraph boundaries is determined by the distribution of the topic shift scores across the whole text.

[Figure 2: Spectrogram and pitch contour for an example utterance with a sharp rise and fall in intonation. Utterance text: "wished for a sandwich i did i totally did <sil>"]

4. Experiments

4.1. Data

We trained and evaluated our system using a set of videos that we obtained from Yahoo Screen. The videos cover several genres: finance, sports, odd news, SNL, comedy, and trending now. The video lengths range from 20 seconds to 12 minutes. The primary language used in all videos is English. For each video, we have high-quality, professionally created closed captions containing punctuation and capitalization (but no paragraph boundaries). We split the data into training data (243 videos), development data (121 videos), and test data (35 videos), randomly but balanced across genres. We trained models for both text-based and speech-based punctuation insertion on the training data, and tuned the parameters of the ensemble model on the development data.

4.2. Punctuation Insertion

First, we compare models based on textual and speech features against standard baselines. Then, we compare these models with an ensemble method.

4.2.1. Baselines

We compare our models against the following baselines:

Baseline1: Uses a trigram language model to insert punctuation. We trained the language model on a corpus of news articles. This baseline can insert all types of punctuation, and is similar to the phrase-break insertion approach used in speech synthesis [25].
Baseline2: Uses the silence durations between phrases to insert punctuation: silences shorter than min_silence receive no punctuation, silences between min_silence and max_silence receive a comma, and silences longer than max_silence receive a period. We used min_silence = 0.22 and max_silence = 0.4, computed using 10-fold cross-validation over the training data. This baseline can only insert {comma, period, none}.

4.2.2. Text-Based Model

We use the CRF++ toolkit (crfpp.sourceforge.net) to train a sequence tagger that inserts all types of punctuation (including none) between each pair of words in the input transcription. As features, we use part-of-speech (POS) tags and tokens from the transcription. We use the CLEARNLP toolkit [26] with its off-the-shelf model to predict POS tags.
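The duration rule used by Baseline2 above can be sketched as follows; the threshold values come from the text, while the function and label names are illustrative.

```python
def baseline2_label(silence_dur, min_silence=0.22, max_silence=0.4):
    """Silence-duration baseline (Baseline2): map an inter-word silence
    to a punctuation label using two duration thresholds (values from
    the paper, tuned by 10-fold cross-validation)."""
    if silence_dur < min_silence:
        return "none"    # too short to mark a phrase break
    elif silence_dur < max_silence:
        return "comma"   # medium pause -> comma
    else:
        return "period"  # long pause -> period

# Label the silences following each word of an utterance:
silences = [0.05, 0.30, 0.12, 0.55]
labels = [baseline2_label(s) for s in silences]
# -> ['none', 'comma', 'none', 'period']
```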

Extremes       max, min, range
Means          arithmetic, geometric
Peaks          num. peaks, distance between peaks
Segments       num. segments
Onset          num. onsets, offsets
Moments        st. deviation, variance
Crossings      zero-crossing rate, mean-crossing rate
Percentiles    percentile values, inter-percentile ranges
Regression     linear and quadratic regression coefficients
Samples        sampled values at equidistant frames
Times          rise and fall of the curve, duration
DCT            DCT coefficients

Table 1: Functional features used in speech-based punctuation insertion

4.2.3. Speech-Based Model

Before describing the speech-based punctuation model, we provide the intuition behind using acoustic features to insert punctuation. Punctuation marks in spoken language not only serve as phrase breaks but also convey the sentiment/emotion of the speaker. For example, in Figure 2, one might predict from the transcription that the text should end with a PERIOD, but based on the pitch contour, which shows a sharp rise and fall in intonation, one might instead predict an EXCLAMATION. Therefore, we treat this problem as a hybrid of phrase-break prediction and emotion detection. Emotion detection in speech is a well-studied problem (e.g., [27, 28]), and it is known that functional features are critical in these tasks [29]. A functional feature takes a sequence of low-level feature descriptors as input and produces a fixed-size vector as output. Since functional features are computed over low-level descriptor contours, they capture trends over the entire speech segment. Since the length of the output vector is independent of the length of the input sequence, we can easily perform feature selection across the vector's dimensions. Our speech-based punctuation insertion system operates only on silences from the input time-aligned transcription. We used openSMILE [30] to extract 12 functional features (listed in Table 1) over the speech segment preceding each silence.
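A few of the functionals in Table 1 can be computed directly from a low-level descriptor contour. The sketch below is a simplified NumPy illustration (the system itself uses openSMILE; the function and feature names here are assumptions for illustration). Note that the output has a fixed size regardless of the contour's length, which is the key property exploited by the model.

```python
import numpy as np

def functionals(contour):
    """Compute a handful of Table 1's functional features over a
    low-level descriptor contour (e.g. an F0 track)."""
    x = np.asarray(contour, dtype=float)
    t = np.arange(len(x))
    # Extremes and moments: level and spread of the contour.
    feats = {"max": x.max(), "min": x.min(), "range": x.max() - x.min(),
             "mean": x.mean(), "std": x.std()}
    # Crossings: how often the contour crosses its own mean.
    above = x > x.mean()
    feats["mean_crossing_rate"] = np.count_nonzero(above[1:] != above[:-1]) / len(x)
    # Percentiles and an inter-percentile range.
    feats["p25"], feats["p75"] = np.percentile(x, [25, 75])
    feats["iqr"] = feats["p75"] - feats["p25"]
    # Linear regression: overall rising or falling trend.
    slope, _intercept = np.polyfit(t, x, 1)
    feats["lin_slope"] = slope
    return feats

# A rise-and-fall pitch contour (as in Figure 2) yields a large range
# but a near-zero overall slope:
f = functionals([100, 140, 180, 220, 180, 140, 100])
```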
We compute these functional features over four low-level feature descriptors: energy, voicing probability, pitch onsets, and duration. The feature extraction process results in a 2268-dimensional vector, which we use to classify each silence as one of {exclamation, questionmark, period, comma, hyphen, none}. We use the Weka [31] implementation of Random Forests with 30 trees, with maximum depth set by cross-validation on the training data.

4.2.4. Ensemble Method

We hypothesize that textual or speech features alone may provide incomplete information; for example, silences do not always translate into punctuation in the text, and vice versa. Table 2 cross-tabulates silences and punctuation marks in the training data; we observe that only about one third of silences correspond to punctuation. To incorporate both textual and speech information, we trained an ensemble method: a logistic regression model trained over the development data using the predicted labels and confidence scores from the text-based and speech-based models.

             Punctuation   No punctuation
Silence
No silence                 n/a

Table 2: Silences vs. punctuation

4.2.5. Results

We evaluate these methods against the professionally created closed captions for each video, which include punctuation. We use F1 score to assess label accuracy and word error rate (WER) to assess label positioning. Table 3 shows results for our two baseline methods, our text-based and speech-based models, and the ensemble method. We observe that both individual models outperform the baselines. In addition, the ensemble model outperforms all the other methods on both metrics. We conclude that textual and speech features complement each other for this task.

4.3. Capitalization

Sentence-final punctuation is crucial for capitalization. The closed captions in our data include capitalization, allowing us to measure capitalization accuracy for the different punctuation methods. As Table 4 shows, the ensemble method outperforms the other methods by 7%.
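The input representation for the ensemble described in Section 4.2.4 can be sketched as follows. This is a hypothetical encoding consistent with the description (each base model contributes its predicted label, one-hot encoded, plus its confidence score), not the exact feature layout used in the system; a multinomial logistic regression would then be fit on these vectors over the development data.

```python
import numpy as np

# Label set from the speech-based model's output space.
LABELS = ["none", "comma", "period", "questionmark", "exclamation", "hyphen"]

def ensemble_features(text_label, text_conf, speech_label, speech_conf):
    """Encode the two base models' predictions for one silence as a
    fixed-size vector: per model, a one-hot label plus a confidence."""
    per_model = len(LABELS) + 1
    v = np.zeros(2 * per_model)
    v[LABELS.index(text_label)] = 1.0          # text model label (one-hot)
    v[len(LABELS)] = text_conf                 # text model confidence
    v[per_model + LABELS.index(speech_label)] = 1.0  # speech model label
    v[-1] = speech_conf                        # speech model confidence
    return v

# Text model says PERIOD (0.8); speech model says EXCLAMATION (0.6):
x = ensemble_features("period", 0.8, "exclamation", 0.6)
# x is a 14-dimensional vector (7 features per base model).
```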
This outsize performance difference is due to better punctuation with the ensemble model, particularly with respect to sentence-final punctuation (period, questionmark, and exclamation).

Method        WER   F1
Baseline1
Baseline2
Text only
Speech only
Ensemble

Table 3: Results: punctuation insertion

Punctuation model   Accuracy
Text only           65.2
Speech only         64.0
Ensemble            71.1

Table 4: Results: capitalization

4.4. Subjective Evaluation

Sentence-final punctuation is also crucial for paragraph breaking; however, closed captions do not contain paragraph breaks. We conducted a crowdsourced evaluation using Amazon Mechanical Turk to assess the overall readability of the transcripts our system can produce, including punctuation, capitalization, and paragraph breaks. Turkers were presented with a video (including closed captions automatically aligned by our system) and three transcript variants produced using: (a) text-based punctuation insertion followed by capitalization and paragraph break insertion; (b) speech-based punctuation insertion followed by capitalization and paragraph break insertion; and (c) the ensemble method for punctuation insertion followed by capitalization and paragraph break insertion. In all cases, the transcripts were automatically preprocessed to remove white space between a punctuation mark and the previous token, remove white space within contractions like "we'll", and normalize numbers.

Turkers were told that the transcript variants were produced by computer programs. They were asked to rank the transcript variants from 1 (best / most readable) to 3 (worst / least readable). They were allowed to assign more than one transcript to a rank, but were asked to use the whole range of rankings from 1 to 3 if possible. They were also asked to explain what made the best transcript(s) better, and what made the worst transcript(s) worse. We produced one HIT for each of the 35 videos in our test data. Each HIT was assigned to seven Turkers. Turkers were required to be US-based (thus, presumably fluent English speakers) and to have completed at least 100 HITs with an approval rating of 90% or higher. We paid $0.15 per assignment.

Method        Paragraphs   Punctuations   Capitalized words   Macroaveraged rank   Winners   Losers
Text only
Speech only
Ensemble

Table 5: Statistics on the test data and subjective evaluation results

Table 5 shows some statistics about the punctuation, capitalization, and paragraph breaks in the evaluated transcripts, as well as the macroaveraged results of the subjective evaluation and the number of test documents for which each method was a winner (four or more judgments of rank 1) or a loser (four or more judgments of rank 3). The ensemble method has the highest overall readability. In their comments, Turkers praised highly-ranked transcript variants for the flow of the paragraph and sentence breaks, and condemned low-ranked transcript variants for being chopped up (too many paragraph/sentence breaks), one big block of text (too few paragraph/sentence breaks), missing speaker changes, or missing capitalization.

4.5. Analysis

We observe that on certain videos even the ensemble system cannot achieve good performance. This is typically due to non-speech events in the videos.
In our data, videos contain the following non-speech events: laughter, music, applause, and generic background noise. To address this problem, we could use chroma features to detect music events and use acoustic event datasets (c4dm.eecs.qmul.ac.uk/sceneseventschallenge/description.html) to filter laughter, applause, and other noise events.

Since we use acoustic features for punctuation insertion, accurate alignment of text and speech is critical. We compared our automatically obtained word-level time alignment with the manually created time alignments available from the professionally created closed captions. Since word-level manual alignments are not available with the closed captions, we only analyzed speech segments and non-speech events. From Table 6, we see that automatically aligned non-speech events are misaligned by 20 seconds, whereas manually aligned speech segments are misaligned by 6 seconds. Automatic alignment does well for speech; however, non-speech segments introduce errors during extraction of speech features due to misalignment.

Segment type   Instances   Begin (sec)   End (sec)
Speech
Non-speech

Table 6: Difference between automatic and manual alignments; the sign of the difference indicates whether the automatic or the manual segments are misaligned.

5. Conclusions and Future Work

We describe a system for generating well-formatted transcripts for videos from raw input transcriptions/closed captions. The system includes components for alignment of transcription to video, punctuation insertion, capitalization, and paragraph break insertion. We focus on the task of punctuation insertion, as it is critical to later stages. In both quantitative and qualitative evaluations, we show that an ensemble method combining acoustic and textual features outperforms speech-based and text-based methods, and leads to outsize improvements in later stages of processing. Although we achieve high accuracy for punctuation insertion, we still miss 7% of punctuation marks.
We see that acoustic features are inaccurate when alignment fails due to non-speech segments. In addition, our system currently does not detect speaker change events, which should generally correspond to punctuation insertions. In future work, we could add speaker change detection, acoustic event detection, and music identification to the preprocessing and alignment stage of our system.

6. References

[1] J.-C. Shim, C. Dorai, and R. Bolle, "Automatic text extraction from video for content-based annotation and retrieval," in Proceedings of the International Conference on Pattern Recognition.
[2] D. Brezeale and D. Cook, "Using closed captions and visual features to classify movies by genre," in Proceedings of the ACM KDD Workshop on Multimedia Data Mining.
[3] K. Filippova and K. Hall, "Improved video categorization from text metadata and user comments," in Proceedings of SIGIR.
[4] A. Hauptmann and M. Witbrock, "Story segmentation and detection of commercials in broadcast news video," in Proceedings of the IEEE International Forum on Research and Technology Advances in Digital Libraries.
[5] T. Cour et al., "Movie/script: Alignment and parsing of video and text transcription," in Proceedings of ECCV.
[6] M. Bertini, C. Colombo, and A. Del Bimbo, "Automatic caption localization in videos using salient points," in Proceedings of ICME.
[7] J. Huang and G. Zweig, "Maximum entropy model for punctuation annotation from speech," in Proceedings of INTERSPEECH.
[8] J. Kim and P. Woodland, "The use of prosody in a combined system for punctuation generation and speech recognition," in Proceedings of INTERSPEECH, 2001.

[9] Y. Liu, A. Stolcke, E. Shriberg, and M. Harper, "Using conditional random fields for sentence boundary detection in speech," in Proceedings of the ACL.
[10] A. Gravano, M. Jansche, and M. Bacchiani, "Restoring punctuation and capitalization in transcribed speech," in Proceedings of ICASSP.
[11] W. Lu and H. Ng, "Better punctuation prediction with dynamic conditional random fields," in Proceedings of EMNLP.
[12] M. Shugrina, "Formatting time-aligned ASR transcripts for readability," in Proceedings of HLT/NAACL.
[13] S. Peitz, M. Freitag, A. Mauser, and H. Ney, "Modeling punctuation prediction as machine translation," in Proceedings of the International Workshop on Spoken Language Translation.
[14] F. Batista, H. Moniz, I. Trancoso, and N. Mamede, "Bilingual experiments on automatic recovery of capitalization and punctuation of automatic speech transcripts," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20.
[15] J. Miranda, J. Neto, and A. Black, "Improved punctuation recovery through combination of multiple speech streams," in Proceedings of ASRU.
[16] R. Kushalnagar, W. Lasecki, and J. Bigham, "A readability evaluation of real-time crowd captions in the classroom," in Proceedings of ASSETS.
[17] M. Bisani and H. Ney, "Joint-sequence models for grapheme-to-phoneme conversion," Speech Communication, vol. 50, no. 5.
[18] J. Yuan and M. Liberman, "Speaker identification on the SCOTUS corpus," in Proceedings of Acoustics '08.
[19] P. Placeway et al., "The 1996 Hub-4 Sphinx-3 system," in Proceedings of the DARPA Speech Recognition Workshop.
[20] A. Black and K. Lenzo, "Flite: a small fast run-time synthesis engine," in Proceedings of the ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis.
[21] P. Koehn et al., "Moses: Open source toolkit for statistical machine translation," in Proceedings of the ACL.
[22] K. Filippova and M. Strube, "Using linguistically motivated features for paragraph boundary identification," in Proceedings of EMNLP.
[23] C. Sporleder and M. Lapata, "Broad coverage paragraph segmentation across languages and domains," ACM Transactions on Speech and Language Processing (TSLP), vol. 3, no. 2.
[24] M. Hearst, "TextTiling: Segmenting text into multi-paragraph subtopic passages," Computational Linguistics, vol. 23, no. 1.
[25] P. Taylor and A. Black, "Assigning phrase breaks from part-of-speech sequences," Computer Speech and Language, vol. 12.
[26] J. Choi, "Optimization of natural language processing components for robustness and scalability," Ph.D. dissertation.
[27] B. Schuller, S. Steidl, and A. Batliner, "The INTERSPEECH 2009 emotion challenge," in Proceedings of INTERSPEECH.
[28] M. Valstar et al., "AVEC 2013: the continuous audio/visual emotion and depression recognition challenge," in Proceedings of the International Audio/Visual Emotion Challenge and Workshop.
[29] B. Schuller et al., "The INTERSPEECH 2010 paralinguistic challenge," in Proceedings of INTERSPEECH.
[30] F. Eyben et al., "Recent developments in openSMILE, the Munich open-source multimedia feature extractor," in Proceedings of ACM Multimedia.
[31] M. Hall et al., "The WEKA data mining software: an update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, 2009.


Automatic structural metadata identification based on multilayer prosodic information Proceedings of Disfluency in Spontaneous Speech, DiSS 2013 Automatic structural metadata identification based on multilayer prosodic information Helena Moniz 1,2, Fernando Batista 1,3, Isabel Trancoso

More information

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents

Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Using Words and Phonetic Strings for Efficient Information Retrieval from Imperfectly Transcribed Spoken Documents Michael J. Witbrock and Alexander G. Hauptmann Carnegie Mellon University ABSTRACT Library

More information

Establishing the Uniqueness of the Human Voice for Security Applications

Establishing the Uniqueness of the Human Voice for Security Applications Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004 Establishing the Uniqueness of the Human Voice for Security Applications Naresh P. Trilok, Sung-Hyuk Cha, and Charles C.

More information

THE ICSI/SRI/UW RT04 STRUCTURAL METADATA EXTRACTION SYSTEM. Yang Elizabeth Shriberg1;2 Andreas Stolcke1;2 Barbara Peskin1 Mary Harper3

THE ICSI/SRI/UW RT04 STRUCTURAL METADATA EXTRACTION SYSTEM. Yang Elizabeth Shriberg1;2 Andreas Stolcke1;2 Barbara Peskin1 Mary Harper3 Liu1;3 THE ICSI/SRI/UW RT04 STRUCTURAL METADATA EXTRACTION SYSTEM Yang Elizabeth Shriberg1;2 Andreas Stolcke1;2 Barbara Peskin1 Mary Harper3 1International Computer Science Institute, USA2SRI International,

More information

Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training

Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training Matthias Paulik and Panchi Panchapagesan Cisco Speech and Language Technology (C-SALT), Cisco Systems, Inc.

More information

SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen

SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS. Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen SENTIMENT EXTRACTION FROM NATURAL AUDIO STREAMS Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen Center for Robust Speech Systems (CRSS), Eric Jonsson School of Engineering, The University of Texas

More information

Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System

Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System Eunah Cho, Jan Niehues and Alex Waibel International Center for Advanced Communication Technologies

More information

Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

More information

Generating Training Data for Medical Dictations

Generating Training Data for Medical Dictations Generating Training Data for Medical Dictations Sergey Pakhomov University of Minnesota, MN pakhomov.sergey@mayo.edu Michael Schonwetter Linguistech Consortium, NJ MSchonwetter@qwest.net Joan Bachenko

More information

2009 Springer. This document is published in: Adaptive Multimedia Retrieval. LNCS 6535 (2009) pp. 12 23 DOI: 10.1007/978-3-642-18449-9_2

2009 Springer. This document is published in: Adaptive Multimedia Retrieval. LNCS 6535 (2009) pp. 12 23 DOI: 10.1007/978-3-642-18449-9_2 Institutional Repository This document is published in: Adaptive Multimedia Retrieval. LNCS 6535 (2009) pp. 12 23 DOI: 10.1007/978-3-642-18449-9_2 2009 Springer Some Experiments in Evaluating ASR Systems

More information

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Emotion Detection from Speech

Emotion Detection from Speech Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction

More information

AUTOMATIC VIDEO STRUCTURING BASED ON HMMS AND AUDIO VISUAL INTEGRATION

AUTOMATIC VIDEO STRUCTURING BASED ON HMMS AND AUDIO VISUAL INTEGRATION AUTOMATIC VIDEO STRUCTURING BASED ON HMMS AND AUDIO VISUAL INTEGRATION P. Gros (1), E. Kijak (2) and G. Gravier (1) (1) IRISA CNRS (2) IRISA Université de Rennes 1 Campus Universitaire de Beaulieu 35042

More information

Building A Vocabulary Self-Learning Speech Recognition System

Building A Vocabulary Self-Learning Speech Recognition System INTERSPEECH 2014 Building A Vocabulary Self-Learning Speech Recognition System Long Qin 1, Alexander Rudnicky 2 1 M*Modal, 1710 Murray Ave, Pittsburgh, PA, USA 2 Carnegie Mellon University, 5000 Forbes

More information

Combining Content Analysis of Television Programs with Audience Measurement

Combining Content Analysis of Television Programs with Audience Measurement Combining Content Analysis of Television Programs with Audience Measurement David Gibbon, Zhu Liu, Eric Zavesky, DeDe Paul, Deborah F. Swayne, Rittwik Jana, Behzad Shahraray AT&T Labs - Research {dcg,zliu,ezavesky,dedepaul,dfs,rjana,behzad}@research.att.com

More information

A GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing

A GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing A GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing LAW VI JEJU 2012 Bayu Distiawan Trisedya & Ruli Manurung Faculty of Computer Science Universitas

More information

German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings

German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings German Speech Recognition: A Solution for the Analysis and Processing of Lecture Recordings Haojin Yang, Christoph Oehlke, Christoph Meinel Hasso Plattner Institut (HPI), University of Potsdam P.O. Box

More information

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015 Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015

More information

CENG 734 Advanced Topics in Bioinformatics

CENG 734 Advanced Topics in Bioinformatics CENG 734 Advanced Topics in Bioinformatics Week 9 Text Mining for Bioinformatics: BioCreative II.5 Fall 2010-2011 Quiz #7 1. Draw the decompressed graph for the following graph summary 2. Describe the

More information

Speech Recognition in the Informedia TM Digital Video Library: Uses and Limitations

Speech Recognition in the Informedia TM Digital Video Library: Uses and Limitations Speech Recognition in the Informedia TM Digital Video Library: Uses and Limitations Alexander G. Hauptmann School of Computer Science, Carnegie Mellon University, Pittsburgh PA, USA Abstract In principle,

More information

Customizing an English-Korean Machine Translation System for Patent Translation *

Customizing an English-Korean Machine Translation System for Patent Translation * Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,

More information

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS Mbarek Charhad, Daniel Moraru, Stéphane Ayache and Georges Quénot CLIPS-IMAG BP 53, 38041 Grenoble cedex 9, France Georges.Quenot@imag.fr ABSTRACT The

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

Comparative Study on Subject Classification of Academic Videos using Noisy Transcripts

Comparative Study on Subject Classification of Academic Videos using Noisy Transcripts Comparative Study on Subject Classification of Academic Videos using Noisy Transcripts Hau-Wen Chang Hung-sik Kim Shuyang Li + Jeongkyu Lee + Dongwon Lee The Pennsylvania State University, USA {hauwen,hungsik,dongwon}@psu.edu,

More information

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics

More information

TED-LIUM: an Automatic Speech Recognition dedicated corpus

TED-LIUM: an Automatic Speech Recognition dedicated corpus TED-LIUM: an Automatic Speech Recognition dedicated corpus Anthony Rousseau, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans, France firstname.lastname@lium.univ-lemans.fr

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

Effect of Captioning Lecture Videos For Learning in Foreign Language 外 国 語 ( 英 語 ) 講 義 映 像 に 対 する 字 幕 提 示 の 理 解 度 効 果

Effect of Captioning Lecture Videos For Learning in Foreign Language 外 国 語 ( 英 語 ) 講 義 映 像 に 対 する 字 幕 提 示 の 理 解 度 効 果 Effect of Captioning Lecture Videos For Learning in Foreign Language VERI FERDIANSYAH 1 SEIICHI NAKAGAWA 1 Toyohashi University of Technology, Tenpaku-cho, Toyohashi 441-858 Japan E-mail: {veri, nakagawa}@slp.cs.tut.ac.jp

More information

Recognition of Emotions in Interactive Voice Response Systems

Recognition of Emotions in Interactive Voice Response Systems Recognition of Emotions in Interactive Voice Response Systems Sherif Yacoub, Steve Simske, Xiaofan Lin, John Burns HP Laboratories Palo Alto HPL-2003-136 July 2 nd, 2003* E-mail: {sherif.yacoub, steven.simske,

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

Automated Content Analysis of Discussion Transcripts

Automated Content Analysis of Discussion Transcripts Automated Content Analysis of Discussion Transcripts Vitomir Kovanović v.kovanovic@ed.ac.uk Dragan Gašević dgasevic@acm.org School of Informatics, University of Edinburgh Edinburgh, United Kingdom v.kovanovic@ed.ac.uk

More information

Financial Trading System using Combination of Textual and Numerical Data

Financial Trading System using Combination of Textual and Numerical Data Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,

More information

Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization

Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization Luís Marujo 1,2, Anatole Gershman 1, Jaime Carbonell 1, Robert Frederking 1,

More information

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition , Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology

More information

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

More information

SVM Based Learning System For Information Extraction

SVM Based Learning System For Information Extraction SVM Based Learning System For Information Extraction Yaoyong Li, Kalina Bontcheva, and Hamish Cunningham Department of Computer Science, The University of Sheffield, Sheffield, S1 4DP, UK {yaoyong,kalina,hamish}@dcs.shef.ac.uk

More information

Crowdfunding Support Tools: Predicting Success & Failure

Crowdfunding Support Tools: Predicting Success & Failure Crowdfunding Support Tools: Predicting Success & Failure Michael D. Greenberg Bryan Pardo mdgreenb@u.northwestern.edu pardo@northwestern.edu Karthic Hariharan karthichariharan2012@u.northwes tern.edu Elizabeth

More information

An Automated Analysis and Indexing Framework for Lecture Video Portal

An Automated Analysis and Indexing Framework for Lecture Video Portal An Automated Analysis and Indexing Framework for Lecture Video Portal Haojin Yang, Christoph Oehlke, and Christoph Meinel Hasso Plattner Institute (HPI), University of Potsdam, Germany {Haojin.Yang,Meinel}@hpi.uni-potsdam.de,

More information

The LENA TM Language Environment Analysis System:

The LENA TM Language Environment Analysis System: FOUNDATION The LENA TM Language Environment Analysis System: The Interpreted Time Segments (ITS) File Dongxin Xu, Umit Yapanel, Sharmi Gray, & Charles T. Baer LENA Foundation, Boulder, CO LTR-04-2 September

More information

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software Carla Simões, t-carlas@microsoft.com Speech Analysis and Transcription Software 1 Overview Methods for Speech Acoustic Analysis Why Speech Acoustic Analysis? Annotation Segmentation Alignment Speech Analysis

More information

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis Team members: Daniel Debbini, Philippe Estin, Maxime Goutagny Supervisor: Mihai Surdeanu (with John Bauer) 1 Introduction

More information

31 Case Studies: Java Natural Language Tools Available on the Web

31 Case Studies: Java Natural Language Tools Available on the Web 31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software

More information

Industry Guidelines on Captioning Television Programs 1 Introduction

Industry Guidelines on Captioning Television Programs 1 Introduction Industry Guidelines on Captioning Television Programs 1 Introduction These guidelines address the quality of closed captions on television programs by setting a benchmark for best practice. The guideline

More information

MLLP Transcription and Translation Platform

MLLP Transcription and Translation Platform MLLP Transcription and Translation Platform Alejandro Pérez González de Martos, Joan Albert Silvestre-Cerdà, Juan Daniel Valor Miró, Jorge Civera, and Alfons Juan Universitat Politècnica de València, Camino

More information

An Arabic Text-To-Speech System Based on Artificial Neural Networks

An Arabic Text-To-Speech System Based on Artificial Neural Networks Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department

More information

OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane

OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH. Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane OPTIMIZATION OF NEURAL NETWORK LANGUAGE MODELS FOR KEYWORD SEARCH Ankur Gandhe, Florian Metze, Alex Waibel, Ian Lane Carnegie Mellon University Language Technology Institute {ankurgan,fmetze,ahw,lane}@cs.cmu.edu

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Prosodic Phrasing: Machine and Human Evaluation

Prosodic Phrasing: Machine and Human Evaluation Prosodic Phrasing: Machine and Human Evaluation M. Céu Viana*, Luís C. Oliveira**, Ana I. Mata***, *CLUL, **INESC-ID/IST, ***FLUL/CLUL Rua Alves Redol 9, 1000 Lisboa, Portugal mcv@clul.ul.pt, lco@inesc-id.pt,

More information

Movie Classification Using k-means and Hierarchical Clustering

Movie Classification Using k-means and Hierarchical Clustering Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani

More information

Florida International University - University of Miami TRECVID 2014

Florida International University - University of Miami TRECVID 2014 Florida International University - University of Miami TRECVID 2014 Miguel Gavidia 3, Tarek Sayed 1, Yilin Yan 1, Quisha Zhu 1, Mei-Ling Shyu 1, Shu-Ching Chen 2, Hsin-Yu Ha 2, Ming Ma 1, Winnie Chen 4,

More information

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

More information

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,

More information

Email Spam Detection A Machine Learning Approach

Email Spam Detection A Machine Learning Approach Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Robust Sentiment Detection on Twitter from Biased and Noisy Data

Robust Sentiment Detection on Twitter from Biased and Noisy Data Robust Sentiment Detection on Twitter from Biased and Noisy Data Luciano Barbosa AT&T Labs - Research lbarbosa@research.att.com Junlan Feng AT&T Labs - Research junlan@research.att.com Abstract In this

More information

Combining Slot-based Vector Space Model for Voice Book Search

Combining Slot-based Vector Space Model for Voice Book Search Combining Slot-based Vector Space Model for Voice Book Search Cheongjae Lee and Tatsuya Kawahara and Alexander Rudnicky Abstract We describe a hybrid approach to vector space model that improves accuracy

More information

THUTR: A Translation Retrieval System

THUTR: A Translation Retrieval System THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

More information

English Grammar Checker

English Grammar Checker International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,

More information

Named Entity Recognition in Broadcast News Using Similar Written Texts

Named Entity Recognition in Broadcast News Using Similar Written Texts Named Entity Recognition in Broadcast News Using Similar Written Texts Niraj Shrestha Ivan Vulić KU Leuven, Belgium KU Leuven, Belgium niraj.shrestha@cs.kuleuven.be ivan.vulic@@cs.kuleuven.be Abstract

More information

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words , pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan

More information

Beating the MLB Moneyline

Beating the MLB Moneyline Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series

More information

Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

More information

M3039 MPEG 97/ January 1998

M3039 MPEG 97/ January 1998 INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION ISO/IEC JTC1/SC29/WG11 M3039

More information

Leveraging Video Interaction Data and Content Analysis to Improve Video Learning

Leveraging Video Interaction Data and Content Analysis to Improve Video Learning Leveraging Video Interaction Data and Content Analysis to Improve Video Learning Juho Kim juhokim@mit.edu Shang-Wen (Daniel) Li swli@mit.edu Carrie J. Cai cjcai@mit.edu Krzysztof Z. Gajos Harvard SEAS

More information

Scaling Shrinkage-Based Language Models

Scaling Shrinkage-Based Language Models Scaling Shrinkage-Based Language Models Stanley F. Chen, Lidia Mangu, Bhuvana Ramabhadran, Ruhi Sarikaya, Abhinav Sethy IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights, NY 10598 USA {stanchen,mangu,bhuvana,sarikaya,asethy}@us.ibm.com

More information

Spoken Document Retrieval from Call-Center Conversations

Spoken Document Retrieval from Call-Center Conversations Spoken Document Retrieval from Call-Center Conversations Jonathan Mamou, David Carmel, Ron Hoory IBM Haifa Research Labs Haifa 31905, Israel {mamou,carmel,hoory}@il.ibm.com ABSTRACT We are interested in

More information

Micro blogs Oriented Word Segmentation System

Micro blogs Oriented Word Segmentation System Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information