A Deep Graphical Model for Spelling Correction
Stephan Raaijmakers
TEXAR Data Science, Vijzelgracht 53C, 1017 HP Amsterdam

Abstract

We propose a deep graphical model for the correction of isolated spelling errors in text. The model is a deep autoencoder consisting of a stack of Restricted Boltzmann Machines, and it learns to associate error-free strings, represented as bags of character n-grams, with bit strings. These bit strings can be used to find nearest-neighbor matches of spelling errors with correct words. This is a novel application of a deep learning semantic hashing technique originally proposed for document retrieval. We demonstrate the effectiveness of our approach on two corpora of spelling errors, and propose a scalable correction procedure based on small sublexicons.

1 Introduction

In this paper, we investigate the use of deep graphical models for context-free spelling correction in text: spelling correction in isolated words. Spelling correction is a well-studied subject that has been on the agenda of the NLP community for decades ([3]). Nowadays, new applications arise in the contexts of big data, search engines and user profiling. For instance, in product data cleansing ([4, 19]), where large quantities of manually entered product data need to be standardized, accurate detection and handling of misspellings is important. Search engines like Google proactively perform spelling correction on user queries ([5, 15, 17, 25]). Stylometric user profiling in forensic settings may benefit from accurate estimates of the amount and type of spelling errors a person makes, e.g. for authorship verification (e.g. [13, 23]).

The approach we propose is an application of a recent neural network technique introduced by Salakhutdinov and Hinton ([20]) for document hashing and retrieval. This technique, semantic hashing, compresses documents into compact bit strings, after which document retrieval becomes an efficient and accurate operation on binary hash codes. In Section 2, we outline this approach and describe how it can be used for spelling correction, which we view as a form of document retrieval at the word level. In Section 3, we describe a number of experiments on two corpora of spelling errors. Results are discussed in Section 4.

2 Deep graphical models

Deep graphical models have emerged in the machine learning community as hierarchically structured, powerful learning methods that perform deep learning: hidden-layer information of subordinate layers is exploited as feature input by higher layers, the idea being that every layer computes features of its input (hence, higher levels compute meta-features: features of features). This higher-order type of information is generally held to capture deep or semantic aspects of input data, and seems to correlate with the human brain architecture for cognitive processing ([1]). Deep learning has been applied to challenging problems in vision, speech and language processing (see e.g. [1, 11, 21]). Frequently, deep learning architectures are composed of stacks of Restricted Boltzmann Machines (RBMs): generative stochastic networks that learn probability distributions over their inputs. Boltzmann Machines are stochastic neural networks with a layer of visible neurons (input units) and a layer of hidden neurons.
Restricted Boltzmann Machines have no connections from visible units to visible units, or from hidden units to hidden units, making them bipartite graphs (see Figure 1).

Figure 1: A Restricted Boltzmann Machine, with hidden units h_0, h_1, h_2 and visible units v_0, ..., v_3.

Often, the neurons are binary stochastic neurons that can be in two states: on or off. Whether a neuron is on depends (via a sigmoid function) on node-specific properties (like a bias), on the weighted connections the node has with the other neurons in the network, and on the states of those other neurons. The energy of an RBM at any time is given by

E(v, h) = -h^T W v - b^T v - c^T h,

with W the weight matrix, and b and c the bias vectors of the visible (v) and hidden (h) neurons. Training a Boltzmann Machine means feeding input to the network through the visible neurons, and updating weights and biases with the purpose of minimizing the energy of the network. Efficient training methods for RBMs have been proposed by [8, 9], based on alternating Gibbs sampling of the hidden units given the visible units, and vice versa.
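As an illustration of this alternating sampling scheme, the following is a minimal sketch of a single contrastive divergence (CD-1) update for a binary RBM in NumPy. It is a hedged illustration of the training idea of [8, 9], not the implementation used in this paper; all names and dimensions are our own.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0, W, b, c, lr=0.1):
        """One contrastive divergence (CD-1) step for a binary RBM.

        v0: batch of visible vectors (batch_size x num_visible);
        W: weight matrix (num_hidden x num_visible); b, c: visible/hidden biases.
        """
        # Positive phase: sample hidden states from the data.
        h0_prob = sigmoid(v0 @ W.T + c)
        h0 = (np.random.rand(*h0_prob.shape) < h0_prob).astype(float)

        # Negative phase: one step of alternating Gibbs sampling.
        v1_prob = sigmoid(h0 @ W + b)
        h1_prob = sigmoid(v1_prob @ W.T + c)

        # Update towards lower energy: data correlations minus model correlations.
        batch = v0.shape[0]
        W += lr * (h0_prob.T @ v0 - h1_prob.T @ v1_prob) / batch
        b += lr * (v0 - v1_prob).mean(axis=0)
        c += lr * (h0_prob - h1_prob).mean(axis=0)
        return W, b, c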
In [20], an application of a stack of RBMs to document hashing is presented. The stack is trained to associate training documents with bit strings (binary codewords), which can be interpreted as memory addresses and allow for finding similar test documents with fast and simple bit shift and Hamming distance operations. First, training proceeds in an unsupervised, greedy manner, mapping vector representations of documents to lower-dimensional real-valued vectors. After this stage, these output vectors are converted into binary vectors, which are used to reconstruct the original training documents. This is done by pairing the feedforward phase (generating bit strings) with a feedbackward phase that predicts the original training input from the binary vectors. Reconstruction errors are minimized with gradient-descent backpropagation in a supervised fine-tuning stage. Finally, the stack of fine-tuned RBMs can be used to predict bit strings for arbitrary documents, based on the training on the unlabeled, original training documents. Systems of this type are called autoencoders ([10]): they learn how to compress data into low-dimensional codewords. In the next subsection, we describe the details of a similar architecture that we use in our experiments to handle the problem of spelling correction in isolated words.

2.1 Model architecture for spelling correction

Our proposal is based on the idea of treating words as documents, and spelling correction as a form of document retrieval: retrieving the best matching correct spelling for a given input. Our hypothesis is that semantic hashing-based document matching can be applied here too, even if the documents are tiny compared to normal documents. In our approach, bit strings are learned from training words (correct spellings), and a corresponding dictionary of bit strings paired with correct spellings is derived. For queries, the model makes a bit string prediction, which is then matched against the bit string dictionary derived from the training data. The bit string with the closest Hamming distance to the predicted bit string produces the proposed spelling for the query word. This approach is illustrated in Figure 2. We expand words to quasi-documents using a bag of character n-grams representation. An example (bigrams, with word boundaries rendered as underscores) is

spelling: _s, sp, pe, el, ll, li, in, ng, g_

Using character n-grams allows for approximate matching of strings, since partial matches are possible. Character n-grams have been found to be strong features for many applications in text analytics, such as polarity classification of sentiment in documents (e.g. [18]). The bag representation discards order, and usually contains frequencies of elements (like words, or, in our case, character n-grams).^1

^1 The unordered character n-gram representation may lead to discriminative difficulties in the rare case of two words that share the same n-grams, but differ from each other in the position of some n-grams. Adding position indexes to n-grams may prove beneficial in such cases, with the position indices being string parts of the n-grams, e.g. ab_1, bc_2.
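As a concrete illustration, the expansion of a word into a frequency bag of character n-grams might be sketched as follows; the function name, boundary marker and n-gram orders are our own choices, not specified by the paper beyond the bigram example above.

    from collections import Counter

    def char_ngram_bag(word, ns=(2, 3), boundary="_"):
        """Expand a word into a frequency bag of character n-grams.

        Boundary markers let n-grams capture word-initial and word-final
        characters, as in the bigram example in the text.
        """
        s = boundary + word + boundary
        bag = Counter()
        for n in ns:
            for i in range(len(s) - n + 1):
                bag[s[i:i + n]] += 1
        return bag

    print(char_ngram_bag("spelling", ns=(2,)))
    # Counter({'_s': 1, 'sp': 1, 'pe': 1, 'el': 1, 'll': 1, 'li': 1, 'in': 1, 'ng': 1, 'g_': 1})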
Figure 2: System setup. A model for the prediction of bit strings is learned from correctly spelled words (e.g. spelling, terrific, universal). The closest match of learned bit strings with the bit strings predicted for queries (e.g. speling, terific, univrsal) yields the correction of the input.

Like [20], we used a uniform model architecture in our experiments, with the following characteristics. We stack three primary RBMs with binary hidden units, which have a (N (input) x ... (hidden)) (... (input) x ... (hidden)) (... (input) x 128 (output)) structure. Here, N is the dimension of the input layer (the number of features), which we set to min(N_f, 1000), with N_f the number of different features in the training data. We generate bit strings of length 128, a value that was determined on the basis of some experiments on a fraction of the training data. These RBMs are connected to their inverse counterparts, in order to reconstruct the original training data from the bit string predictions. The structure of the total, unrolled model is depicted in Figure 3.

Figure 3: Model architecture. Three RBMs are pre-trained, and then reversed ('unrolled').

The feature representation of the input layer consists of a vector of N values log(1 + c_f), with c_f the raw frequency of feature f in the bag of character n-grams derived for an input word. In our setup, the three primary Boltzmann Machines are pre-trained for 500 epochs. Subsequently, the chain is unrolled (i.e. put in reverse mode), and the training data is reconstructed from the predicted bit strings. This is followed by a gradient-descent backpropagation phase (100 epochs) that feeds the measured reconstruction error (the error between the original training data and the reconstructed, predicted training data) back into the chain, and optimizes the various weights. Finally, after backpropagation, the three trained and optimized RBMs are stored as the final model. Values predicted by the model are discretized to binary values by simple thresholding (values greater than or equal to 0.1 become 1), following [20], who mention that RBMs have asymmetric energy functions that tend to produce values closer to 0 than to 1. During training, the learning rate was set to 0.1, and no momentum or weight decay was used. A momentum of 0.9 was used for backpropagation. No systematic grid search was attempted, due to the rather unfavorable time complexity of the training procedure.^2

Our approach differs in some respects from the approach of Salakhutdinov and Hinton [20]. Unlike their setup, which uses RBMs based on Constrained Poisson Models, we used standard Restricted Boltzmann Machines. In addition, we did not apply any optimization techniques to enhance the binarization of the output predictions. In [20], deterministic Gaussian noise was added to the inputs received by every node in the chain, in order to produce better binary codes. We merely relied on backpropagation for bit string optimization after the initial bit string predictions were made.

^2 According to [20], deep graphical models are somewhat insensitive to perturbations of hyperparameters.
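The input featurization and output binarization just described can be sketched as follows. This is a minimal illustration under our own naming, assuming an n-gram frequency bag as in the earlier sketch and a fixed mapping from features to vector positions.

    import numpy as np

    def featurize(bag, feature_index):
        """Map an n-gram frequency bag to the N-dimensional vector of
        log(1 + c_f) values, N being the size of the feature lexicon."""
        x = np.zeros(len(feature_index))
        for ngram, c in bag.items():
            if ngram in feature_index:
                x[feature_index[ngram]] = np.log(1.0 + c)
        return x

    def binarize(codes, threshold=0.1):
        """Discretize real-valued code predictions to bits, thresholding
        at 0.1 as described in the text."""
        return (np.asarray(codes) >= threshold).astype(np.uint8)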
3 Experiments

We tested the deep graphical model on two publicly available corpora of spelling errors: a corpus of spelling errors in Tweets^3, and a Wikipedia corpus of edit errors^4. The Tweet corpus consists of 39,172 spelling errors, their corrections, and the edit steps to be performed to transform the error into the correct word form (replacement, insertion, deletion). The corpus has 2,466 different correct word types, which implies on average 15.9 errors per word. The Wikipedia corpus consists of 2,455 errors coupled to correct word forms (a total of 1,922 different correct word types), which means on average 1.3 errors per word.

We performed two types of experiments. The first experiment is a small lexicon experiment, in which we built, for separate portions of errors, a model of the corresponding ground truth corrections. For both corpora, we took 10 equal-sized test data splits consisting of spelling errors and the corresponding ground truth, built models for the ground truth words, and tested the models on the 10 test splits. We indexed every word for the presence of character bigrams and trigrams, on the basis of a feature lexicon of character n-grams derived from the training data. This lexicon consists of the 1,000 top character n-grams ranked by inverse document frequency, treating the bag of character n-grams of a word as a document.

Next, in order to assess the impact of lexicon size on correction accuracy, we built a model of a larger amount of ground truth data, and tested the model on the same test data splits we used in the first experiment. We used the first 5,000 words of the Twitter corpus and the first 500 words of the Wikipedia corpus for training, after which we tested the model on the first 5 test splits of the corresponding test data. For both corpora, this means a considerable increase in lexicon size. To speed up training of the model, we limited the feature lexicon to the 500 top-ranking character n-grams. Details of the corpus sizes and the training and test splits are listed in Table 1.

Corpus       | Size   | Size of training splits | Size of test splits | Splits used       | Lexicon size
Experiment 1 |        |                         |                     |                   |
Twitter      | 39,172 | 1,000                   | 1,000               | 10 (the first 10) | 1,000
Wikipedia    | 2,455  | ...                     | ...                 | 10 (the first 10) | 1,000
Experiment 2 |        |                         |                     |                   |
Twitter      | 39,172 | 5,000                   | 1,000               | 5 (the first 5)   | 500
Wikipedia    | 2,455  | 500                     | ...                 | 5 (the first 5)   | 500

Table 1: Characteristics of the data used in the experiments.

We report correction accuracy for different minimum word lengths, varying from 4, 6, 8 to 10 characters. Since we use character n-grams as features, we hypothesize that our approach works better for relatively longer words, which produce larger bags of character n-grams. The deep graphical model was implemented in Python, with the Oger toolbox. As a baseline, we used the well-known Unix aspell spelling correction program. While this comparison is not entirely sound (for one thing, we did not feed aspell the same lexicon data as the RBM), it shows the baseline performance on our data of a standard, widely used, off-the-shelf large-vocabulary tool.

^3 Published by Eiji Aramaki.
^4 ROGER/corpora.html
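The feature lexicon construction used in both experiments can be sketched as follows: character n-grams are ranked by inverse document frequency, with each word's bag of n-grams treated as a document. Reading "top" as highest-IDF is our assumption, as are all names.

    import math
    from collections import Counter

    def build_feature_lexicon(word_bags, top_k=1000):
        """Select the top_k character n-grams by inverse document frequency.

        word_bags: iterable of n-gram frequency bags, one per training word,
        each treated as a document for the IDF computation.
        """
        bags = list(word_bags)
        df = Counter()
        for bag in bags:
            df.update(set(bag))  # document frequency per n-gram
        idf = {g: math.log(len(bags) / d) for g, d in df.items()}
        ranked = sorted(idf, key=idf.get, reverse=True)[:top_k]
        return {g: i for i, g in enumerate(ranked)}  # n-gram -> vector position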
4 Results and discussion

Results are presented in Figure 4 and summarized in Table 2. The results demonstrate effective error correction for words longer than 6 characters, and show an overall performance well above the baseline, with the exception of the extended lexicon for Wikipedia. Yet, it is clear that, in general, correction accuracy decreases linearly with the size of the lexicon. The effect is stronger for the Wikipedia corpus than for the Twitter corpus; in fact, for the latter, the use of the large lexicon becomes an advantage for long words of 9 characters and more. Especially for shorter words in both corpora, the increase in lexicon size seems to lead to less accurate models.

Figure 4: Accuracy results (accuracy against minimum word length in characters) for the Twitter and Wikipedia corpora, for the small and extended lexicons. The aspell program was used as a baseline on the test data for both experiments; it uses its own dictionary.

Table 2: Accuracy results for the Twitter and Wikipedia corpora, under both experimental conditions, and for different minimum word lengths (4-10).

While we cannot rule out effects of non-optimal hyperparameters, we propose to remedy this problem by using an array of models derived from relatively small sublexicons. Input strings can efficiently be matched against these sublexicons, under the hypothesis that out-of-vocabulary words receive bit strings with a higher average Hamming distance to the bit strings of the training words in the lexicon than in-vocabulary words. This is an expression of the reconstruction error of the model for foreign data: the less optimal the fit of the model to the test data, the higher the Hamming distance between the predicted bit strings and the bit strings learned for the training data becomes. The proposed procedure basically performs a nearest neighbor search over several lexicons in parallel, and the dictionary with the lowest average Hamming distance between training bit strings and predicted bit strings produces the correction.^6 Formally, we compute C(x), the correction of an input word x, as follows:

C(x) = \arg\min_{L} \left( \arg\min_{y \in L} H(\hat{p}_L(x), \hat{p}_L(y)) \right)    (1)

Here, H(\hat{p}_L(x), \hat{p}_L(y)) is the Hamming distance between the bit strings predicted by the model for lexicon L for the words x and y (where y is a correct word in the dictionary L).

^6 The size of the separate sublexicons is an extra parameter that can be further tuned during training.
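A minimal sketch of Equation (1): for every sublexicon model, find the dictionary word whose learned bit string lies closest to the code predicted for the query, and return the overall winner. The interface (a prediction function paired with a dictionary of learned codes per sublexicon) is hypothetical.

    import numpy as np

    def hamming(a, b):
        """Hamming distance between two equal-length binary code arrays."""
        return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

    def correct(x, models):
        """Equation (1). models: list of (predict_bits, lexicon) pairs, one
        per sublexicon L; predict_bits maps a word to its predicted code,
        and lexicon maps each correct word y to its learned code."""
        best_dist, best_word = None, None
        for predict_bits, lexicon in models:
            code_x = predict_bits(x)
            for y, code_y in lexicon.items():
                d = hamming(code_x, code_y)
                if best_dist is None or d < best_dist:
                    best_dist, best_word = d, y
        return best_word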
We tested the viability of this procedure on the Wikipedia corpus for the 10 splits of Experiment 1, and we were able to infer for all test splits the correct lexicon for lookup. For every test split, we produced predictions with the 10 training data models, only one of which contained the correct ground truth (training partition i for test partition i). For every word in the test data, we let these models produce a bit string. We then found the nearest neighbor bit string among the corresponding training bit strings, and measured the Hamming distance between the predicted bit string and this nearest neighbor. For every test split, we averaged the distances. The resulting distance matrix is presented in Figure 5, which indeed shows (on the diagonal) minimal distance between the test data and the correct models. This means that the model with the minimal average Hamming distance between predictions and learned bit strings can be identified as the preferred model for correction.

Figure 5: Average Hamming distance between the bit strings produced for the 10 test data splits (starting at 0) and the bit strings learned for the training data.

From a practical point of view, these relatively small models can be jointly loaded in memory, and spelling error correction can be done efficiently in parallel.
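This lexicon-selection procedure might be rendered as follows, again under hypothetical interfaces: average, per model, the nearest-neighbor Hamming distance of the predicted codes over a test split, and select the model with the smallest average.

    import numpy as np

    def avg_nn_distance(test_words, predict_bits, train_codes):
        """Average Hamming distance from each test word's predicted code to
        its nearest neighbor among the codes learned for one sublexicon."""
        dists = [min(int(np.count_nonzero(predict_bits(w) != c)) for c in train_codes)
                 for w in test_words]
        return sum(dists) / len(dists)

    def select_model(test_words, models):
        """Pick the sublexicon model minimizing the average nearest-neighbor
        distance; its dictionary is then used for correction."""
        return min(models,
                   key=lambda m: avg_nn_distance(test_words, m[0], list(m[1].values())))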
5 Related work

For a traditional topic like spelling correction, many solutions have been proposed, ranging from knowledge-intensive (rule-based, dictionary-based) to statistical, data-driven and machine learning based methods, and mixed rule-based/data-driven solutions ([16]). In [12], an overview is provided of noisy channel methods that can be used for spelling correction based on character-based language models. Models based on transformations attempt to apply inverse string edit operations in order to map noisy strings back to clean versions.^7 A modern substring matching algorithm that uses a similar approach is [22]. The well-known GNU aspell program is based on these transformations as well, in combination with a procedure mapping words to sound-alike strings ('metaphones') that abstract away to a certain extent from orthography. A context-sensitive approach to spelling correction was presented in [6], using the Winnow machine learning algorithm. In [2, 14], early neural network approaches to spelling error correction were presented. In [24], a hashing approach to the spelling correction of person names is presented. The authors formulate the problem of finding optimal bit strings as an Eigenvalue problem, to be solved analytically, and, using character bigrams, are able to improve significantly on a number of baselines. Their approach differs from ours in that we explicitly use machine learning techniques to learn optimal bit strings with self-optimization, and in that we exploit an inherent characteristic of the learning paradigm (the reconstruction error of the trained autoencoder) to optimize lexicon selection for error correction. Character-level spelling correction based on error correction codes in the context of a brain-computer interface is described in [7].

^7 See ... for a good and practical explanation.

6 Conclusions

We have demonstrated the effectiveness of deep autoencoders for spelling correction on two publicly available corpora of spelling errors in isolated words. To the best of our knowledge, this is a novel application of the semantic hashing technique proposed by [20]. Overall, character n-gram representations of strings produced high accuracy results, especially for longer words. This representation generates quasi-documents from words, and makes semantic hashing techniques originally meant for documents applicable at the level of words. Increased lexicon size was found to have a negative influence on correction accuracy. We proposed the use of small sublexicons as a remedy, relying on the reconstruction error of the autoencoder to identify the suitable lexicons for the correction of a given input. In a separate experiment, we were able to identify, for the 10 Wikipedia test data splits of our small lexicon experiment, the sublexicon that minimized the reconstruction error of the test data. Our approach is knowledge-free, and only demands a list of correctly spelled words. It can be extended to contextual spelling correction by using windowed character n-gram representations of text, which we plan for future work. While we are not aware of comparable results on the two corpora we used, we hope our research contributes to further benchmarking of spelling correction methods applied to these datasets. The data splits we used are easily reconstructed from our specifications.

Future work will also address scalable hyperparameter optimization. Due to the prohibitive time complexity of training the models, efficient methods are needed here. While it is in theory possible that we hit a coincidental local maximum in our choice of parameter settings, we observed, in line with [20], some indifference of the models with respect to perturbations of certain parameters. A systematic hyperparameter search will deepen our understanding of this phenomenon, and may lead to further improvement of our results.

References

[1] Yoshua Bengio. Learning deep architectures for AI. Found. Trends Mach. Learn., 2(1):1-127, January 2009.
[2] V. Cherkassky and N. Vassilas. Backpropagation networks for spelling correction. Neural Networks, 1.
[3] Fred J. Damerau. A technique for computer detection and correction of spelling errors. Commun. ACM, 7(3):171-176, March 1964.
[4] T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, Inc., 2003.
[5] Huizhong Duan and Bo-June (Paul) Hsu. Online spelling correction for query completion. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, New York, NY, USA, 2011. ACM.
[6] Andrew R. Golding and Dan Roth. A Winnow-based approach to context-sensitive spelling correction. Mach. Learn., 34(1-3):107-130, February 1999.
[7] N. Jeremy Hill, Jason Farquhar, Suzanna Martens, Felix Bießmann, and Bernhard Schölkopf. Effects of stimulus type and of error-correcting code design on BCI speller performance. In NIPS, 2008.
[8] G. Hinton. A practical guide to training restricted Boltzmann machines. Momentum, 9:1, 2010.
[9] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, 2006.
[10] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[11] Geoffrey E. Hinton. To recognize shapes, first learn to generate images. In Computational Neuroscience: Theoretical Insights into Brain Function. Elsevier, 2007.
[12] Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2000.
[13] M. Koppel and J. Schler. Exploiting stylistic idiosyncrasies for authorship attribution. In Proceedings of IJCAI, volume 3, pages 69-72, 2003.
[14] Mark Lewellen. Neural network recognition of spelling errors. In COLING-ACL, 1998.
[15] Yanen Li, Huizhong Duan, and ChengXiang Zhai. A generalized hidden Markov model with discriminative training for query spelling correction. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '12, New York, NY, USA, 2012. ACM.
[16] Lidia Mangu and Eric Brill. Automatic rule acquisition for spelling correction. In Proceedings of the 14th International Conference on Machine Learning, 1997. Morgan Kaufmann.
[17] Bruno Martins and Mário J. Silva. Spelling correction for search engine queries. In Proceedings of EsTAL-04, España for Natural Language Processing, 2004.
[18] Stephan Raaijmakers and Wessel Kraaij. A shallow approach to subjectivity classification. In Eytan Adar, Matthew Hurst, Tim Finin, Natalie S. Glance, Nicolas Nicolov, and Belle L. Tseng, editors, ICWSM. The AAAI Press, 2008.
[19] Erhard Rahm and Hong Hai Do. Data cleaning: Problems and current approaches. IEEE Data Engineering Bulletin, 23, 2000.
[20] Ruslan Salakhutdinov and Geoffrey Hinton. Semantic hashing. Int. J. Approx. Reasoning, 50(7):969-978, July 2009.
[21] Sabato Marco Siniscalchi, Dong Yu, Li Deng, and Chin-Hui Lee. Exploiting deep neural networks for detection-based speech recognition. Neurocomputing, 106, April 2013.
[22] Jason Soo. A non-learning approach to spelling correction in web queries. In Proceedings of the 22nd International Conference on World Wide Web Companion, WWW '13 Companion, Republic and Canton of Geneva, Switzerland, 2013. International World Wide Web Conferences Steering Committee.
[23] Efstathios Stamatatos. A survey of modern authorship attribution methods. J. Am. Soc. Inf. Sci. Technol., 60(3):538-556, March 2009.
[24] Raghavendra Udupa and Shaishav Kumar. Hashing-based approaches to spelling correction of personal names. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP '10, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
[25] W. John Wilbur, Won Kim, and Natalie Xie. Spelling correction in the PubMed search engine. Inf. Retr., 9(5), 2006.