Open-Domain Name Error Detection using a Multi-Task RNN
Hao Cheng    Hao Fang    Mari Ostendorf
Department of Electrical Engineering, University of Washington
{chenghao,hfang,ostendor}@uw.edu

Proceedings of the Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, September 2015. © Association for Computational Linguistics.

Abstract

Out-of-vocabulary name errors in speech recognition create significant problems for downstream language processing, but the fact that they are rare poses challenges for automatic detection, particularly in an open-domain scenario. To address this problem, a multi-task recurrent neural network language model for sentence-level name detection is proposed for use in combination with out-of-vocabulary word detection. The sentence-level model is also effective for leveraging external text data. Experiments show a 26% improvement in name-error detection F-score over a system using n-gram lexical features.

1 Introduction

Most spoken language processing or dialogue systems are based on a finite vocabulary, so occasionally a word used will be out of the vocabulary (OOV), in which case the automatic speech recognition (ASR) system chooses the best matching in-vocabulary sequence of words to cover that region (where acoustic match dominates the decision). The most difficult OOV words to cover are names, since they are less likely to be covered by morpheme-like subword fragments and they often result in anomalous recognition output, e.g.

  REF: what can we get at Litanfeeth
  HYP: what can we get it leaks on feet

While these errors are rare, they create major problems for language processing, since names tend to be important for many applications. Thus, it is of interest to automatically detect such error regions for additional analysis or human correction. Named entity recognition (NER) systems have been applied to speech output (Palmer and Ostendorf, 2005; Sudoh et al., 2006), taking advantage of local contextual cues to names (e.g. titles for person names), but as illustrated above, neighboring words are often affected, which obscures lexical cues to name regions. Parada et al. (2011) reduce this problem somewhat by applying an NER tagger to a word confusion network (WCN) based on a hybrid word/fragment ASR system.

In addition to the problem of noisy context, automatic name error detection is challenging because name errors are rare for a good recognizer. To learn the cues to name errors, it is necessary to train on the output of the target recognizer, so machine learning is faced with infrequent positive examples for which training data is very sparse. In addition, in an open-domain system, automatically learned lexical context features from one domain may be useless in another.

In this paper, we address these general problems of detecting rare events in an open-domain task, specifically for name error detection. Prior work addressed the problem of skewed priors by artificially increasing the error rate by holding names out of the vocabulary (Chen et al., 2013), or by factoring the problem into sentence-level name detection and OOV word detection (He et al., 2014), since OOV errors in general are more frequent than name errors. Sentence-level features have also been shown to be more robust than local context in direct name error prediction (Marin, 2015). While these techniques provide some benefit, the use of discrete lexical context cues is sensitive to the limited amount of training data available. Our work leverages the factored approach, but improves performance by using a continuous-space sentence representation for predicting the presence of a name.
Specifically, we modify a recurrent neural network (RNN) language model (LM) to predict both the word sequence and a sentence-level name indicator. Combining the LM objective with name prediction provides a regularization effect in training that leads to improved sentence-level name prediction.
The continuous-space model is also effective for leveraging external text resources to improve generalization in the open-domain scenario.

The overall framework for speech recognition and the baseline name error detection system is outlined in Section 2, and the multi-task (MT) RNN approach for sentence-level name prediction is introduced in Section 3. Experimental results for both sentence-level name detection and name error detection are presented in Section 4, demonstrating the effectiveness of the approach on test data that varies in its match to the training data. As discussed in Section 5, the sentence-level model is motivated by similar models for other applications. The paper summarizes key findings and discusses potential areas for further improvement in Section 6.

2 System Overview and Tasks

The name error detection task explored in this work is a component in a bidirectional speech-to-speech translation system for English to/from Iraqi Arabic with human-computer dialogue interaction for error resolution (Ayan et al., 2013), developed during the DARPA BOLT project. The training data consists of a range of topics associated with activities of military personnel, including traffic control, military training, civil affairs, medical checkups, and so on. However, the system is expected to handle open-domain tasks, and thus the evaluation data covers a broader range of topics, including humanitarian aid and disaster relief, as well as more general topics such as sports, family and weather. The dialogues often contain mentions of names and places, many of which are OOV words to the ASR system. As illustrated in the previous section, ASR hypotheses necessarily have errors in OOV regions, but because specific names are infrequent, even in-vocabulary names can exhibit these error patterns. Therefore, developing a robust name error detector for ASR hypotheses is an important component of the system for resolving errors and ambiguity.

Detecting OOV errors requires combining evidence of recognizer uncertainty and anomalous word sequences in a local region. For name errors, lexical cues to names are also useful, e.g. a person's title, location prepositions, or keywords such as "name". The baseline system for this work uses structural features extracted from a confusion network of ASR hypotheses plus ASR word confidence to represent recognizer uncertainty, together with word n-gram context to the left and right of the target confusion network slot. These features are combined in a maximum entropy (ME) classifier trained to predict name errors directly. This is the same as the baseline used in (He et al., 2014; Marin, 2015; Marin et al., 2015), but with a different ASR system.
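As a rough illustration of how such a baseline might be assembled (not the authors' exact feature set), the sketch below builds a feature vector for one confusion-network slot from a word confidence plus left/right n-gram context, and feeds it to a logistic regression (ME) classifier. The slot representation, feature names, toy data, and use of scikit-learn are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' exact features): ME classifier over
# confusion-network slot confidence and n-gram context for direct name-error prediction.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression


def slot_features(words, confidences, i, max_order=3):
    """Features for slot i: slot confidence plus left/right context n-grams."""
    feats = {"confidence": confidences[i]}
    for n in range(1, max_order + 1):
        left = words[max(0, i - n):i]
        right = words[i + 1:i + 1 + n]
        feats["L{}=".format(n) + "_".join(left)] = 1.0
        feats["R{}=".format(n) + "_".join(right)] = 1.0
    return feats


# Toy training data: one hypothesis with a labeled name-error region.
hyp = "what can we get it leaks on feet".split()
conf = [0.9, 0.9, 0.9, 0.8, 0.4, 0.3, 0.3, 0.3]
X_dicts = [slot_features(hyp, conf, i) for i in range(len(hyp))]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = part of a name-error region

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_dicts), y)
print(clf.predict_proba(vec.transform([slot_features(hyp, conf, 5)]))[:, 1])
```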
Training a classifier to predict whether a sentence contains a name is easier than direct name error prediction, because the positive class is less rare, and it does not require recognizer output, so more data can be used (e.g. speech transcripts without recognizer output, or written text). In addition, since the words abutting the name are less reliable in a recognition hypothesis, the information lost by working at the sentence level is minimal. The idea of using sentence-level name prediction was proposed in (He et al., 2014), but in that work the sentence name posterior is a feature in the ME model (optionally with word cues learned by the sentence-level predictor).

In our work, the problem is factored to use the acoustic confusability and local word class features for OOV error prediction, which is combined with the sentence-level name posterior for name error prediction. In other words, only two features (posteriors) are used in training with the sparse name error prediction data. An ME classifier is then used to combine the two features to predict the word-level name error. The word-level OOV detector is another ME binary classifier; the full set of features used for the word-level OOV detector can be found in (Marin, 2015). The main innovation in this work is that we propose a multi-task RNN model for sentence-level name prediction, where the training objective takes into account both the language modeling task and the sentence-level name prediction task, as described in the next section.
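The combination step can be pictured as a tiny ME model over just two inputs. The following sketch is an assumption for illustration (using scikit-learn and toy values; the actual regularization weight and decision threshold are tuned as described in Section 4): it combines a word-level OOV posterior with a sentence-level name posterior to predict word-level name errors.

```python
# Sketch of the factored combination: a word-level ME (logistic regression) classifier
# whose only inputs are the OOV-error posterior and the sentence-level name posterior.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy feature matrix: one row per hypothesized word,
# columns = [P(OOV error | word), P(name in sentence)]; the second feature is shared
# by all words in the same sentence.
X = np.array([[0.05, 0.10],
              [0.80, 0.90],
              [0.70, 0.90],
              [0.10, 0.20],
              [0.60, 0.15]])
y = np.array([0, 1, 1, 0, 0])  # 1 = word-level name error

combiner = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)
name_error_posterior = combiner.predict_proba(X)[:, 1]
print(np.round(name_error_posterior, 2))
```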
3 Multi-task Recurrent Neural Network

The RNN is a powerful sequential model and has proven to be useful in many natural language processing tasks, including language modeling (Mikolov et al., 2010) and word similarity (Mikolov et al., 2013c). It also achieves good results for a variety of text classification problems when combined with a convolutional neural network (CNN) (Lai et al., 2015). In this paper, we propose an MT RNN for the sentence-level name prediction task, which augments the word prediction in the RNN language model with an additional output layer for sentence-level name prediction.

Figure 1: The structure of the proposed MT RNN model, which predicts both the next word o(t) and whether the sentence contains a name y(t) at each time step.

Formally, the MT RNN is defined over t = 1, ..., n for a sentence of length n as:

    x(t) = M z(t),
    h(t) = f(W x(t) + R h(t-1)),
    o(t) = s(U h(t) + b_1),
    y(t) = s(V h(t) + b_2),

where z(t) ∈ ℝ^V is the 1-of-V encoding of the t-th word in the sentence; x(t) ∈ ℝ^d is the d-dimensional continuous word embedding corresponding to the t-th word; h(t) ∈ ℝ^k is a k-dimensional embedding that summarizes the word sequence through time t; and o(t) and y(t) are respectively the output layers for the language modeling task and the sentence-level prediction task. The parameters of the model (learned in multi-task training) include the word embedding matrix M ∈ ℝ^{d×V}, the input and recurrent weights W ∈ ℝ^{k×d} and R ∈ ℝ^{k×k}, the projection matrices U ∈ ℝ^{V×k} and V ∈ ℝ^{2×k}, and the bias terms b_1 ∈ ℝ^V and b_2 ∈ ℝ^2. f and s are respectively the sigmoid and softmax functions. Note that the word sequence associated with a sentence includes start and end symbols. The structure of the proposed MT RNN is shown in Fig. 1.

At each time step, the hidden vector is used to predict both the next word and a sentence-level indicator of the presence of a name, providing a probability distribution for both variables. Thus, the hidden vector h(t) provides a continuous representation of the word history that emphasizes words that are important for predicting the presence of a name. The vector at time n can be thought of as a sentence embedding. The sentence-level output y(t) differs from the word-dependent label predictor typically used in named entity detection (or part-of-speech tagging) in that it provides a sequentially updated prediction of a sentence-level variable, rather than a word-level indicator that specifies the location of a named entity in the sentence. The final prediction y(n) is used as a feature in the name error detection system. The sentence-level output y(t) does not always converge to y(n) gradually, nor is it always monotonic, since the prediction can change abruptly (either positively or negatively) as new words are processed. The sentence-level variable provides a mechanism for capturing long-distance context, which is particularly useful for speech applications, where both the name of interest and the words in its immediate context may be in error.

The training objective is the combination of the log-likelihood of the word sequence and that of the sentence-level name prediction:

    \sum_{t=1}^{n} [ (1 - \lambda) \log P(w(t) \mid h(t)) + \lambda \log P(y(t) \mid h(t)) ],    (1)

where h(t) = [w(1), ..., w(t-1)] and λ is the weight on the log-likelihood of the sentence-level name labels. Another way to train the model is to predict the sentence-level name label only at the end of the sentence, rather than at every time step. Preliminary experimental results show that this model has inferior performance. We argue that training with only the sentence-final name label output can result in unbalanced updates, i.e., information from the language modeling task is used more often than that from the sentence-level name prediction task, implying that balancing the use of information sources is an important design choice for multi-task models. The training objective is optimized using stochastic gradient descent (SGD).
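To make the model concrete, here is a minimal numpy sketch of the forward pass and the interpolated objective in Eq. (1). Dimensions, initialization range, and the sigmoid/softmax choices follow the definitions above; everything else (toy vocabulary size, toy word ids, the absence of NCE and of any gradient code) is a simplification assumed for illustration.

```python
# Minimal numpy sketch of the MT RNN forward pass and the multi-task objective (Eq. 1).
import numpy as np

rng = np.random.default_rng(0)
V, d, k, lam = 50, 16, 32, 0.4            # vocab size, embedding dim, hidden dim, task weight

# Parameters (randomly initialized; training would update these with SGD/AdaGrad).
M = rng.uniform(-0.1, 0.1, (d, V))        # word embedding matrix
W = rng.uniform(-0.1, 0.1, (k, d))        # input-to-hidden weights
R = rng.uniform(-0.1, 0.1, (k, k))        # recurrent weights
U, b1 = rng.uniform(-0.1, 0.1, (V, k)), np.zeros(V)   # next-word output layer
Vo, b2 = rng.uniform(-0.1, 0.1, (2, k)), np.zeros(2)  # sentence-level name output layer

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
softmax = lambda a: np.exp(a - a.max()) / np.exp(a - a.max()).sum()

def forward(word_ids, has_name):
    """Return the multi-task log-likelihood of Eq. (1) and the final hidden state."""
    h = np.zeros(k)
    loglik = 0.0
    for t in range(1, len(word_ids)):      # predict word t from history w(1..t-1)
        x = M[:, word_ids[t - 1]]          # embedding of previous word
        h = sigmoid(W @ x + R @ h)         # hidden state summarizing the history
        o = softmax(U @ h + b1)            # next-word distribution
        y = softmax(Vo @ h + b2)           # sentence-level name distribution
        loglik += (1 - lam) * np.log(o[word_ids[t]]) + lam * np.log(y[has_name])
    return loglik, h                        # h is the sentence embedding at time n

# Toy example: word ids for a sentence labeled as containing a name.
loglik, sent_emb = forward([0, 5, 9, 3, 7, 11, 1], has_name=1)
print(round(loglik, 3), sent_emb.shape)
```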
We also experiment with AdaGrad (Duchi et al., 2011), which has been shown to be more stable and to converge faster for non-convex SGD optimization. Since language model training requires a normalization operation over the whole vocabulary (~60K words) at each step, which is computationally intensive, we further speed up training by using noise contrastive estimation (NCE) (Mnih and Teh, 2012).
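NCE replaces the full-vocabulary softmax normalization with a binary discrimination task between the observed word and a handful of noise samples. The sketch below shows a per-word NCE loss in the standard formulation; the uniform noise distribution, the fixed normalization constant, and the toy scores are assumptions for illustration, not details taken from the paper.

```python
# Sketch of a per-word NCE objective used to avoid the full softmax over ~60K words.
import numpy as np

def nce_loss(scores, target, noise_ids, noise_probs, k):
    """scores: unnormalized output-layer scores (one per vocabulary word);
    target: index of the observed word; noise_ids: k sampled noise word indices;
    noise_probs: noise distribution P_n(w) over the vocabulary."""
    def log_sigmoid(x):
        return -np.logaddexp(0.0, -x)
    # Treat exp(score) as an unnormalized model probability (normalizer fixed to 1).
    loss = -log_sigmoid(scores[target] - np.log(k * noise_probs[target]))
    for w in noise_ids:
        loss -= log_sigmoid(-(scores[w] - np.log(k * noise_probs[w])))
    return loss

rng = np.random.default_rng(0)
V = 50
scores = rng.normal(size=V)                  # stand-in for U h(t) + b_1
noise_probs = np.full(V, 1.0 / V)            # e.g. a unigram (here uniform) noise distribution
noise_ids = rng.choice(V, size=10, p=noise_probs)
print(round(nce_loss(scores, target=7, noise_ids=noise_ids, noise_probs=noise_probs, k=10), 3))
```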
For all models using NCE, the number of negative samples is fixed. All weights are randomly initialized in the range [-0.1, 0.1]. The hidden layer size k is selected based on development set performance, and the task mixing weight λ is selected from {0.2, 0.4, 0.6, 0.8}. We set the initial learning rate to 1 and 0.1 for SGD with and without AdaGrad, respectively. In our experiments, we observe that models trained with AdaGrad achieve better performance, so we only report the models with AdaGrad in this paper. At each training epoch, we validate the objective on the development set. The learning rate is reduced after the first time the development set log-likelihood decreases, and the whole training procedure terminates when the development set log-likelihood decreases for the second time.
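The validation-driven schedule in the preceding paragraph amounts to a simple two-strike rule. The sketch below expresses it as a training loop skeleton; `train_one_epoch`, `dev_log_likelihood`, and the decay factor are placeholders assumed for illustration, standing in for the actual SGD/AdaGrad updates and development-set evaluation.

```python
# Sketch of the schedule: reduce the learning rate after the first drop in dev
# log-likelihood, stop after the second drop.
def fit(model, train_data, dev_data, train_one_epoch, dev_log_likelihood,
        lr=0.1, max_epochs=50):
    best_ll = float("-inf")
    drops = 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data, lr)
        ll = dev_log_likelihood(model, dev_data)
        if ll < best_ll:
            drops += 1
            if drops == 1:
                lr *= 0.5          # reduce the learning rate after the first decrease
            else:
                break              # terminate after the second decrease
        else:
            best_ll = ll
    return model
```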
4 Experiments

4.1 Data

Two types of datasets are used in this paper: the BOLT dataset and a collection of Reddit discussions. The first dataset was collected during the DARPA TRANSTAC and BOLT projects. The ASR hypotheses of 7,088 spoken sentences make up the training set (BOLT-Train) for both sentence-level name prediction and word-level name error detection. There are two development sets: Dev1 and Dev2. Dev1 is used for parameter tuning for all models, and Dev2 is used for training the ME-based word-level name error detector that combines the word-level OOV posterior and the sentence-level name posterior. For RNN models, we tune the hidden layer size and the multi-task weight λ; for the ME-based word-level name error detector, we tune the regularization parameter. Two test sets are used to evaluate sentence-level name prediction (Test1) and word-level name error detection (Test2), based on the BOLT phase 2 and 3 evaluations, respectively.

As shown in Table 1, the BOLT topics are categorized into three domains: TRANSTAC, HADR, and General. The BOLT-Train, Dev1, and Dev2 sets contain only speech from the TRANSTAC domain, whereas the Test1 and Test2 sets contain all three domains. Detailed data split statistics and domain information are summarized in Table 2.

Table 1: Topics in different domains.
  TRANSTAC: Traffic Control, Facilities Inspection, Civil Affairs, Medical, Combined Training, Combined Operations
  HADR: Humanitarian Aid, Disaster Relief
  General: Family, Gardening, Sports, Pets, Books, Weather, Language, Phone

Table 2: Data splits and statistics for BOLT-Train, Dev1, Dev2, Test1, and Test2, including the percentage of sentences containing names (name sentences), the percentage of hypothesized words that are name errors (name errors), and the percentage of words that are OOVs (OOVs).

Note that there are very few positive samples of name errors (roughly 1%), whereas for the sentence-level name prediction task and the word-level OOV prediction task the data skewness is somewhat less severe (roughly 8% of sentences contain names in most of our data sets).

In order to address the issue of domain mismatch between the training data and the broad-domain subsets of the test data, we collect text from Reddit.com, a very active discussion forum where users can discuss all kinds of topics. Reddit has thousands of user-created and user-moderated subreddits, each emphasizing a different topic. For example, there is a general ASKREDDIT subreddit for people to ask any questions, as well as subreddits targeted at specific interests, like ASKMEN, ASKWOMEN, ASKSCIENCE, etc. Although the Reddit discussions are different from the BOLT data, they have a conversational nature, and names are often observed within the discussions. Therefore, we hypothesize that they can help improve sentence-level name prediction in general. We collect data from 14 subreddits that cover different kinds of topics (such as politics, news, and book suggestions) and vary in community size. The Stanford Named Entity Recognizer (Finkel et al., 2005) is used to detect names in each sentence, and a sentence-level name label is assigned if any names are present. [1] The data are tokenized using Stanford CoreNLP tools (Manning et al., 2014) and are lowercased after running the Stanford Named Entity Recognizer. In total, we obtain 13K sentences containing at least one name and 360K sentences without names.

[1] The Stanford Named Entity Recognizer achieves 82.3% F-score on the references of Dev1. Thus, it is expected to give useful labels for the Reddit data.
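As a sketch of this labeling step: given any NER tagger that returns per-token entity tags, a sentence gets a positive label if at least one token is tagged as an entity type of interest. The `ner_tag` callable, the toy tagger, and the tag set below are placeholders assumed for illustration; the paper uses the Stanford Named Entity Recognizer for this step.

```python
# Sketch: derive binary sentence-level name labels from per-token NER tags.
NAME_TAGS = {"PERSON", "LOCATION", "ORGANIZATION"}  # assumed tag set of interest

def sentence_name_label(tokens, ner_tag):
    """ner_tag(tokens) -> list of (token, tag); label is 1 if any tag is a name type."""
    return int(any(tag in NAME_TAGS for _, tag in ner_tag(tokens)))

# Toy tagger standing in for the Stanford NER; a real system would call the tagger here.
def toy_tagger(tokens):
    gazetteer = {"omaha": "LOCATION", "rodriguez": "PERSON"}
    return [(t, gazetteer.get(t.lower(), "O")) for t in tokens]

print(sentence_name_label("i live in a city called omaha".split(), toy_tagger))   # 1
print(sentence_name_label("what can we get at the market".split(), toy_tagger))   # 0
```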
4.2 Sentence-level Name Prediction

To evaluate the effectiveness of the proposed MT RNN model, we first apply it to the sentence-level name prediction task on ASR hypotheses. For this task, each sample corresponds to a hypothesized sentence generated by the ASR system and a ground-truth label indicating whether there are names in that sentence. We compare the MT RNN with four contrasting models for predicting whether a sentence includes a name (a sketch contrasting the first two representations is shown below).

BOW + ME. An ME classifier using a bag-of-words (BOW) sentence representation.

SG + ME. An ME classifier using a sentence embedding as features, where word-level embeddings are obtained with the skip-gram (SG) model (Mikolov et al., 2013a) and sentence embeddings are composed by averaging the embeddings of all words in the sentence.

RNN + ME. A simple RNN LM is trained (i.e., λ in (1) is set to 0), and the hidden layer for the last word in a sentence provides a sentence-level embedding that is used in an ME classifier trained to predict the sentence-level name label.

ST RNN. A single-task (ST) RNN model is trained to directly predict the sentence-level name label for each word (i.e., λ in (1) is set to 1).

All models are trained on either BOLT-Train or BOLT-Train + Reddit, and tuned on Dev1, including the dimension of embeddings, the ℓ2 regularization parameters for the ME classifiers, and so on.
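The following sketch illustrates the two simplest contrasting representations, a bag-of-words count vector and an average of word embeddings, each feeding a logistic regression (ME) classifier. The toy sentences, the random stand-in word vectors, and the use of scikit-learn are assumptions for illustration only.

```python
# Sketch of the BOW + ME and averaged-embedding + ME sentence representations.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

sentences = ["my name is captain rodriguez",
             "i live in a city called omaha",
             "what can we get at the market",
             "the weather is nice today"]
labels = np.array([1, 1, 0, 0])           # 1 = sentence contains a name

# BOW + ME: sparse word-count features.
bow = CountVectorizer()
bow_clf = LogisticRegression(max_iter=1000).fit(bow.fit_transform(sentences), labels)

# Averaged-embedding + ME: mean of (here random, stand-in) word vectors per sentence.
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=25) for s in sentences for w in s.split()}
def sent_embed(s):
    return np.mean([vocab[w] for w in s.split()], axis=0)
emb_clf = LogisticRegression(max_iter=1000).fit(
    np.stack([sent_embed(s) for s in sentences]), labels)

print(bow_clf.predict(bow.transform(["my name is sergeant smith"])))
print(emb_clf.predict(sent_embed("the weather is nice").reshape(1, -1)))
```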
The domain-specific F-scores on Test1 are summarized in Table 3 for training only with the BOLT data.

Table 3: F-scores on Test1 for sentence-level name prediction for models trained on BOLT data (rows: BOW + ME, RNN + ME, SG + ME, ST RNN, MT RNN; columns: TRANSTAC, HADR, General, All).

The proposed MT RNN achieves the best results on the TRANSTAC and HADR domains, and it has a significant overall performance improvement over all baseline models. Not surprisingly, the two approaches that use unsupervised learning to obtain a sentence-level embedding (SG + ME, RNN + ME) have the worst performance on the TRANSTAC and HADR domains, with both having very low precision (6-18%) and their best precision on the General domain. The BOW + ME and the ST RNN have similar performance in terms of overall F-score, but the BOW + ME model is much better on the TRANSTAC domain, whereas the ST RNN achieves the best results on the General domain. The main failing of the BOW + ME model is in recall on the General domain, though it also has relatively low recall on the HADR domain. Note that the General domain accounts for only a small fraction of the Test1 set, so the ST RNN gets a lower F-score overall compared with the MT RNN, which performs best on the other two domains (90% of the Test1 set). On all domains, the MT RNN greatly improves precision compared to the ST RNN (at the expense of recall), and it improves recall compared to the BOW + ME approach (at the expense of precision).

In order to study the effectiveness of using external data, we also train all models on an enlarged training set including extra name-tagged sentences from Reddit. The improvement for each domain due to also using the Reddit data is shown in Fig. 2 for the three best configurations. (There is no benefit in the unsupervised learning cases.)

Figure 2: F-scores on Test1 for sentence-level name prediction for models trained with and without Reddit data. The number in parentheses indicates the portion of the domain in the data.
Compared with training only on the BOLT data, all three of these models get a substantial overall performance improvement by utilizing the external training data. Since the external training data mostly covers topics in the General domain, the performance gain in that domain is most significant. For the MT RNN and BOW + ME classifiers, the additional training data primarily benefits recall, particularly for the General domain. In contrast, the ST RNN sees an improvement in precision for all domains. One reason that the added training text also benefits the models on the TRANSTAC domain is that over 90% of the words in the speech recognizer vocabulary are not seen in the small set of name-labeled speech training data, which means that the embeddings for these words are random in the RNNs trained only on this data, and the BOW + ME classifier will never use these words.

4.3 Word-level Name Error Detection

We next assess the usefulness of the resulting MT RNN model for word-level name error detection on ASR hypotheses. Here, several word-level name error detection approaches are compared, including direct name error prediction and factored name/OOV prediction approaches. (A sketch of the threshold tuning used by the first approach appears after this list.)

OOV thresholding. This system simply uses the OOV prediction, but with the posterior threshold tuned for word-level name errors based on the Dev1 set.

Word Context. This is the baseline ME system described in Section 2 and also used as a baseline in (He et al., 2014; Marin et al., 2015), which directly predicts word-level name errors using WCN structural features, the current word's Brown class, and up to trigram left and right word context information.

Word Class. This system, from (Marin et al., 2015), also directly predicts word-level name errors, but replaces the word n-gram features with a smaller number of word class features to address the sparse training problem. The word classes are based on seed words learned from sentence-level name prediction, which are expanded to classes using a nearest-neighbor distance with RNN embeddings trained on the BOLT data.

Word Context + OOV. This system uses an ℓ2-regularized ME classifier to predict word-level name errors using two posteriors as features: a word-dependent posterior from the Word Context system and one from the OOV detection system.

MT RNN + OOV. This system also uses an ℓ2-regularized ME classifier to predict word-level name errors using two posteriors as features: the same word-dependent OOV posterior as above, and the posterior from the sentence-level name prediction using the MT RNN model described in Section 4.2, which is constant for all positions in the sentence.

All of the above models are trained on BOLT-Train and tuned on Dev1. The ME classifiers used in the Word Context + OOV and MT RNN + OOV systems are trained on Dev2, with regularization weights and decision boundaries tuned on Dev1.
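As a rough illustration of how such a decision threshold might be picked, the sketch below sweeps candidate thresholds over development-set posteriors and keeps the one maximizing F-score. The data are toy values and the sweep strategy is an assumption; the paper tunes on Dev1.

```python
# Sketch: tune a posterior threshold on a development set by maximizing F-score.
import numpy as np

def f_score(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)

def tune_threshold(dev_posteriors, dev_labels):
    candidates = np.unique(dev_posteriors)
    scores = [f_score(dev_labels, (dev_posteriors >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(scores))]

# Toy dev-set OOV posteriors and word-level name-error labels.
post = np.array([0.05, 0.92, 0.40, 0.75, 0.10, 0.66])
gold = np.array([0,    1,    0,    1,    0,    0])
print(tune_threshold(post, gold))
```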
Name error detection results (F-scores) are summarized in Table 4.

Table 4: F-scores on Test2 for word-level name error prediction (rows: OOV thresholding, Word Context, Word Class, Word Context + OOV, and two MT RNN + OOV systems, one with the MT RNN trained on BOLT-Train and one trained on BOLT-Train + Reddit; columns: TRANSTAC, HADR, General, All).

The three systems that use discrete, categorical lexical context features (word or word class context) in direct name error prediction have worse results than the OOV thresholding approach overall, as well as on the TRANSTAC subset. Presumably this is due to over-fitting associated with the lexical context features. The factored MT RNN + OOV system, which uses a continuous-space representation of lexical context, achieves a gain in overall performance compared to the other systems and a substantial gain in performance on the TRANSTAC domain on which it is trained. Using the Reddit data further improves the overall F-score, with a slight loss on the TRANSTAC subset but substantial gains in the other domains. Although the performance improvement in the General domain is relatively small compared with sentence-level name prediction, utilizing external data makes the resulting representation more transferable across different domains and tasks. The best MT RNN + OOV system obtains a 26% relative improvement over the baseline Word Context system (or a 17% improvement over the simple OOV thresholding approach).

Looking at performance trade-offs in precision and recall, we find that the use of the RNN systems mainly improves precision, though there is also a small gain in recall overall. The added training data benefits recall for all domains, with a small loss in precision for the TRANSTAC and HADR sets. The use of the OOV posterior improves precision but limits recall, particularly for the General domain, where recall of the OOV posterior alone is only 18% vs. 2% for the word context model with the OOV posterior information.
To better understand some of the challenges of this task, consider the following examples (R = reference, H = hypothesis):

  R1: i'm doing good my name is captain rodriguez
  H1: i'm doing good my name is captain road radios

  R2: well it's got flying lizards knights and zombies and shit
  H2: well it's gotta flying lives there it's nights and some bees and shia

  R3: i live in a city called omaha
  H3: i live in a city called omar

ASR tokens associated with name errors are underlined and italicized; tokens associated with non-name OOV errors are simply underlined. Name errors have a similar character to OOV errors in that they often have anomalous word sequences in the region of the OOV word (examples 1 and 2), which is why the OOV posterior is so useful. However, too much reliance on the OOV posterior leads to wrongly detecting general OOV errors as name errors ("lizards" and "zombies" in example 2) and missed detections of name errors where the confusion network cues indicate a plausible hypothesis ("omaha" in example 3). Examples 1 and 3 illustrate the importance of lexical cues to names ("name ... captain", "city called"), but word-based cues are unreliable for systems trained only on the small amount of domain-specific data. Leveraging the Reddit data allowed the MT RNN system to detect the error in example 1 (HADR domain) that was missed by the word context system. Example 3 was only detected by the word context system when no OOV posterior was used. Though this example was from the General domain, city names represent an important error class in the TRANSTAC data, so the term "city" is learned as a useful cue.

4.4 Sentence Embedding

As discussed in Section 3, we postulate that by modeling the words in a sentence in sequential order and simultaneously predicting sentence-level categorical information, the resulting hidden vector of the last word should be a good representation of the whole sentence, i.e., a sentence embedding. To provide support for this hypothesis and show the impact of external data, we present the sentence embeddings learned by the different RNN variants on Test2 using the t-distributed Stochastic Neighbor Embedding (t-SNE) visualization (van der Maaten and Hinton, 2008) in Fig. 3. Four models are compared, including three RNNs trained on the BOLT data, and the MT RNN model trained on BOLT + Reddit data. Note that the visualization method t-SNE does not take the label information into account during learning.
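A minimal sketch of this kind of visualization, assuming the final hidden vectors h(n) have already been collected for each test sentence (replaced here by random stand-in vectors), using scikit-learn's t-SNE and matplotlib:

```python
# Sketch: project sentence embeddings (final RNN hidden states) to 2-D with t-SNE.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 32))          # stand-in for h(n) of 200 test sentences
has_name = rng.integers(0, 2, size=200)   # stand-in sentence-level name labels

xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(emb)

for label, marker in [(0, "o"), (1, "x")]:
    pts = xy[has_name == label]
    plt.scatter(pts[:, 0], pts[:, 1], marker=marker,
                label="positive" if label else "negative")
plt.legend()
plt.savefig("sentence_embeddings_tsne.png")
```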
As we can see in Fig. 3a, the positive and negative sentence embeddings learned by the RNN LM are randomly scattered in the space, indicating that embeddings learned via unsupervised training (e.g., solely on word context) may fail to capture the sentence-level indicators associated with a particular task, in this case the presence of a name. When the results are plotted in terms of domains, the embeddings are similarly broadly scattered; there is no obvious topic representation in the embeddings. Comparing Fig. 3b, which is associated with the RNN using a sentence-final name indicator, to Fig. 3a, we can see that many positive vectors have moved to the bottom-left of the space, though there is still a relatively large overlap between positive and negative embeddings. In Fig. 3c, corresponding to the word-level MT RNN, a separable subgroup of positive embeddings forms, and the overlap is also reduced in contrast to Figs. 3a and 3b. Finally, most of the positive sentence embeddings produced by the MT RNN model trained with external Reddit data gather at the bottom-left of Fig. 3d. In general, the overlap between positive and negative sentences decreases from the single-task models to the multi-task model. The external data help the proposed model produce better-separated groupings of sentence embeddings.
Figure 3: t-SNE visualization of sentence embeddings (positive vs. negative sentences) learned from different RNN models: (a) RNN LM, (b) ST RNN, (c) MT RNN, (d) MT RNN using extra Reddit data.

5 Related Work

Recently, due to the success of continuous representation methods, much work has been devoted to studying methods for learning word embeddings that capture semantic and syntactic meaning. Although these word embeddings have been shown to be successful in some word-level tasks, such as word analogy (Mikolov et al., 2013b; Pennington et al., 2014) and semantic role labeling (Collobert and Weston, 2008), it is still an open question how best to compose word embeddings effectively and make use of them for text understanding.

Recent work on learning continuous sentence representations usually composes word embeddings using either a convolutional neural network (CNN), a tree-structured recursive NN, or a variant of an RNN. A typical CNN-based structure composes word embeddings in a hierarchical fashion (alternating between convolutional layers and pooling layers) to form the continuous sentence representation for sentence-level classification tasks (Kim, 2014; Kalchbrenner et al., 2014; Lai et al., 2015). These models usually build up the sentence representation directly from the lexical surface representation and rely on the pooling layer to capture dependencies between words. Another popular method for continuous sentence representation is based on the recursive neural network (Socher et al., 2012; Socher et al., 2013; Tai et al., 2015). These models use a tree structure to compose a continuous sentence representation and have the advantage of capturing more fine-grained sentential structure due to the use of parse trees. Note that the RNN-based sequential modeling used in this paper can be viewed as a linearized tree-structure model.

In this paper, we train the neural network model with a multi-task objective reflecting both the probability of the sequence and the probability that the sequence contains names. The general idea of multi-task learning dates back to (Caruana, 1997), and it has recently been shown to be effective for neural network models in different natural language processing tasks.
Collobert and Weston (2008) propose a unified deep convolutional neural network for different tasks by using a set of task-independent word embeddings together with a set of task-specific word embeddings. For each task, it uses a unique neural network with its own layers and connections. Liu et al. (2015) propose a different neural network structure for search query classification and document retrieval, where lower-level layers and connections are all shared but the high-level layers are task-specific. For the tasks considered in (Collobert and Weston, 2008) and (Liu et al., 2015), training samples are task-dependent. Thus, both models are trained in an SGD manner by alternating tasks across training samples with task-dependent training objectives. In this paper, we combine the language modeling task with the sentence-level name prediction task, and each training sample has labels for both tasks. Therefore, SGD training can be done with the weighted sum of the task-specific objectives for each training sample, and the language model objective can be thought of as a regularization term. Similar settings of multi-task learning for neural network models are employed in phoneme recognition for speech (Seltzer and Droppo, 2013) and speech synthesis (Wu et al., 2015) as well, but both of them use equal weights for all tasks.

6 Conclusion

In this paper, we address an open-domain rare event detection problem, specifically, name error detection on ASR hypotheses. To alleviate the data skewness and domain mismatch problems, we adopt a factored approach (sentence-level name prediction and OOV error prediction) and propose an MT RNN for sentence-level name prediction. The factored model is shown to be more robust to the sparse training problem. For the problem of sentence-level name prediction, the proposed method of combining the language modeling and sentence-level name prediction objectives in an MT RNN achieves the best results among the studied models, both for the domain represented by the training data and in the open-domain scenario. Visualization of sentence-level embeddings shows how both the multi-task objective and the word-level name label update are important for achieving good results. The use of unrelated external training text (which can only be used in sentence-level name prediction) improves all models, particularly for the highly mismatched General domain data. The improvement in performance associated with using the external text is much smaller on the word-level name error detection task than on the sentence-level name prediction task. This seems to be due to the high weight learned for the word-level posterior.

For future work, it is worthwhile to look into whether continuous word and sentence representations can be combined in the name error detector to achieve further improvement. In this work, we proposed a model for learning sentence representations that might be useful for other sentence classification tasks, such as review and opinion polarity detection, question type classification, and so on. As discussed in Section 5, there are other models that have been found useful for obtaining continuous sentence embeddings. It would be of interest to investigate whether other structures are more or less sensitive to data skew and/or useful for incorporating multi-domain training data.

Acknowledgments

The authors would like to thank Alex Marin, Ji He, Wen Wang, and all reviewers for useful discussion and comments. This material is based on work supported by DARPA under Contract No. HR C-0016 (subcontract ). Any opinions, findings and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of DARPA.
References

N. F. Ayan, A. Mandal, M. Frandsen, Jing Zheng, P. Blasco, A. Kathol, F. Bechet, B. Favre, A. Marin, T. Kwiatkowski, M. Ostendorf, L. Zettlemoyer, P. Salletmayr, J. Hirschberg, and S. Stoyanchev. 2013. Can you give me another word for hyperbaric? Improving speech translation using targeted clarification questions. In Proc. ICASSP.

Rich Caruana. 1997. Multitask learning. Machine Learning, 28.

Wei Chen, Sankaranarayanan Ananthakrishnan, Rohit Prasad, and Prem Natarajan. 2013. Variable-span out-of-vocabulary named entity detection. In Proc. Interspeech.

Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proc. ICML.

John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12.

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proc. ACL.

Ji He, Alex Marin, and Mari Ostendorf. 2014. Effective data-driven feature learning for detecting name errors in automatic speech recognition. In Proc. SLT.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proc. EMNLP.

Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences. In Proc. ACL.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proc. EMNLP.

Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent convolutional neural networks for text classification. In Proc. AAAI.

Xiaodong Liu, Jianfeng Gao, Xiaodong He, Li Deng, Kevin Duh, and Ye-Yi Wang. 2015. Representation learning using multi-task deep neural networks for semantic classification and information retrieval. In Proc. NAACL.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proc. ACL.

Alex Marin, Mari Ostendorf, and Ji He. 2015. Learning phrase patterns for ASR error detection using semantic similarity. In Proc. Interspeech.

Alex Marin. 2015. Effective Use of Cross-Domain Parsing in Automatic Speech Recognition and Error Detection. Ph.D. thesis, University of Washington.

Andriy Mnih and Yee Whye Teh. 2012. A fast and simple algorithm for training neural probabilistic language models. In Proc. ICML.

David Palmer and Mari Ostendorf. 2005. Improving out-of-vocabulary name resolution. Computer Speech and Language, 19(1).

Carolina Parada, Mark Dredze, and Frederick Jelinek. 2011. OOV sensitive named-entity recognition in speech. In Proc. Interspeech.

Michael L. Seltzer and Jasha Droppo. 2013. Multi-task learning in deep neural networks for improved phoneme recognition. In Proc. ICASSP.

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In Proc. EMNLP-CoNLL.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. EMNLP.

Katsuhito Sudoh, Hajime Tsukada, and Hideki Isozaki. 2006. Incorporating speech recognition confidence into discriminative named entity recognition of speech. In Proc. ACL.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proc. ACL.

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9.

Zhizheng Wu, Cassia Valentini-Botinhao, Oliver Watts, and Simon King. 2015. Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis. In Proc. ICASSP.

Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proc. Interspeech.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013a. Distributed representations of words and phrases and their compositionality. In Proc. NIPS.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Proc. NIPS.

Tomáš Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013c. Linguistic regularities in continuous space word representations. In Proc. NAACL-HLT.
Distributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
THE THIRD DIALOG STATE TRACKING CHALLENGE
THE THIRD DIALOG STATE TRACKING CHALLENGE Matthew Henderson 1, Blaise Thomson 2 and Jason D. Williams 3 1 Department of Engineering, University of Cambridge, UK 2 VocalIQ Ltd., Cambridge, UK 3 Microsoft
Manifold Learning with Variational Auto-encoder for Medical Image Analysis
Manifold Learning with Variational Auto-encoder for Medical Image Analysis Eunbyung Park Department of Computer Science University of North Carolina at Chapel Hill [email protected] Abstract Manifold
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Transition-Based Dependency Parsing with Long Distance Collocations
Transition-Based Dependency Parsing with Long Distance Collocations Chenxi Zhu, Xipeng Qiu (B), and Xuanjing Huang Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science,
Which universities lead and lag? Toward university rankings based on scholarly output
Which universities lead and lag? Toward university rankings based on scholarly output Daniel Ramage and Christopher D. Manning Computer Science Department Stanford University Stanford, California 94305
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016
Module 5 Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016 Previously, end-to-end.. Dog Slide credit: Jose M 2 Previously, end-to-end.. Dog Learned Representation Slide credit: Jose
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University
Lecture 6: Classification & Localization. boris. [email protected]
Lecture 6: Classification & Localization boris. [email protected] 1 Agenda ILSVRC 2014 Overfeat: integrated classification, localization, and detection Classification with Localization Detection. 2 ILSVRC-2014
KEITH LEHNERT AND ERIC FRIEDRICH
MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang
Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform
Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features
Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering
The Delicate Art of Flower Classification
The Delicate Art of Flower Classification Paul Vicol Simon Fraser University University Burnaby, BC [email protected] Note: The following is my contribution to a group project for a graduate machine learning
Towards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial
