Boosting Unsupervised Grammar Induction by Splitting Complex Sentences on Function Words

Jonathan Berant 1, Yaron Gross 1, Matan Mussel 1, Ben Sandbank 1, Eytan Ruppin 1 and Shimon Edelman 2
1 Tel Aviv University and 2 Cornell University

1 Introduction

The statistical-structural algorithm for unsupervised language acquisition, ADIOS (for Automatic DIstillation Of Structure), developed by Solan et al. (2005), has been shown to be capable of learning precise and productive grammars from realistic, raw and unannotated corpus data, including transcribed child-directed speech from the CHILDES corpora, in languages as diverse as English and Mandarin. This algorithm, however, does not deal well with grammatically complex texts: the patterns it detects in complex sentences often combine parts of different clauses, which hurts performance. We address this problem with a two-stage learning technique. First, complex sentences are split into simple ones around function words, and the resulting corpus is used to train an ADIOS learner. Second, the original complex corpus is used to complete the training. We also show how the function words themselves can be learned from the corpus using unsupervised distributional clustering.

2 The ADIOS algorithm

ADIOS (Solan et al., 2005) is a statistical-structural algorithm for unsupervised language acquisition, which takes a corpus (a set of sentences) from some language as input, and outputs an estimate of the grammar of that language.

2.1 The data structure

The first step in the ADIOS algorithm is to load the corpus into its data structure, a directed pseudograph. Every unique word in the corpus is represented in the graph by a node. Every sentence in the corpus is represented in the graph by a path, defined as an ordered set of graph edges.
A path starts at a special node called the begin node, follows the graph edges to traverse the nodes corresponding to the words in the order of their appearance in the sentence, and finally reaches a second special node called the end node (see Figure 1). The graph is non-simple and may contain both loops and multiple edges between any pair of nodes.
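As a minimal sketch of this loading step (the names BEGIN, END and load_corpus are illustrative, not taken from the original implementation), the data structure can be built as follows:

```python
# Minimal sketch of the ADIOS data-structure loading step.
BEGIN, END = "<begin>", "<end>"

def load_corpus(sentences):
    """One node per unique word plus the two special nodes;
    one path per sentence, running from BEGIN to END."""
    nodes = {BEGIN, END}
    paths = []
    for sentence in sentences:
        words = sentence.split()
        nodes.update(words)
        paths.append([BEGIN] + words + [END])
    return nodes, paths

corpus = ["is that a cat ?", "is that a dog ?",
          "where is the dog ?", "and is that a horse ?"]
nodes, paths = load_corpus(corpus)
```

Because shared words (that, a, dog) are represented by a single node each, overlapping sentences are aligned automatically, which is what makes the later pattern search cheap.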
Figure 1: The ADIOS grammar learning algorithm (Solan et al., 2005) represents the training corpus as a pseudograph. The simple example depicted here represents the following four-sentence corpus: (1) is that a cat? (2) is that a dog? (3) where is the dog? (4) and is that a horse?

The loading procedure ensures that the number of paths in the graph equals the number of sentences in the corpus. We define the language that the algorithm has learned at any given moment as the set of sentences that can be produced by traversing any path and generating a string from the nodes it passes through. At initialization, every path generates exactly one sentence, because no generalization has taken place yet. Unsupervised grammar induction algorithms seek to infer hierarchical tree representations for the corpus sentences, making explicit the constituents that are interchangeable in various contexts. The ADIOS algorithm constructs these hierarchical representations in a bottom-up fashion, carrying out sentence segmentation and constituent generalization simultaneously.

2.2 Segmentation criterion

Segmentation of the input sentences is performed using a statistical segmentation criterion, MEX (Motif EXtraction), which scans the graph for patterns. Intuitively, a sequence of nodes is a pattern if a statistically significant bundle of paths traverses it (for details see Solan et al., 2005). The advantage of the pseudograph data structure is that sentences are automatically aligned along the nodes at which they overlap, so patterns are easily identified (see Figure 2). The ADIOS algorithm iterates the MEX procedure along the paths of the graph, searching for patterns. For every path, the best pattern (if any) is chosen and rewired onto the graph: a new node P is created for the pattern, and every instance of the pattern in the graph is replaced by the new node P. This reduces the size of the graph.
By performing this process on all of the paths iteratively, hierarchical structures are rapidly created.
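The rewiring step can be sketched as follows; this is a toy stand-in in which the pattern is given rather than detected by the MEX statistical criterion, and the names are illustrative:

```python
def rewire(paths, pattern, name):
    """Replace every occurrence of `pattern` (a tuple of node labels)
    in every path with the single new pattern node `name`."""
    k = len(pattern)
    rewired = []
    for path in paths:
        out, i = [], 0
        while i < len(path):
            if tuple(path[i:i + k]) == pattern:
                out.append(name)  # the whole sequence collapses to one node
                i += k
            else:
                out.append(path[i])
                i += 1
        rewired.append(out)
    return rewired

paths = [["<begin>", "is", "that", "a", "cat", "?", "<end>"],
         ["<begin>", "is", "that", "a", "dog", "?", "<end>"]]
paths = rewire(paths, ("is", "that", "a"), "P1")
# paths[0] -> ["<begin>", "P1", "cat", "?", "<end>"]
```

After rewiring, both paths pass through the single pattern node P1, which is the size reduction described above.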
Figure 2: Pattern detection in the pseudograph: (A) The data structure allows for the easy identification of aligned sentences. (B) A set of nodes is identified as a pattern when a bundle of paths traverses it. (C) The pattern is rewired onto all of the paths that traverse its nodes. The algorithm stops when no new patterns are found in the graph.

2.3 Generalization

As mentioned, the segmentation procedure alone is capable of parsing the input sentences into a hierarchical structure. However, the size of the language remains unchanged by this procedure. To achieve generalization, ADIOS augments the MEX procedure with a search for elements that are in complementary distribution in the context of a given pattern. The search for complementary-distribution elements is performed by sliding a context window of size L along a candidate path, seeking other paths that coincide with it within this window in all places but one. Suppose the paths diverge at the i-th place; the element in slot i of the candidate path is then temporarily replaced by an Equivalence Class (EC) that comprises all the nodes found in this slot in the partially aligned paths (see Figure 3), forming the generalized path. Thereafter, the usual MEX procedure is performed on the generalized path, which may lead to the creation of a pattern that contains an equivalence class. This is done for every possible position of the window on the path, and for every possible slot inside the window. The best pattern found is rewired into the graph. If a generalized pattern is thus rewired, the grammar of the learner becomes capable of generalization.
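The equivalence-class construction can be sketched as follows (the function name and interface are assumptions; in ADIOS proper this runs inside the MEX search):

```python
def equivalence_class(candidate, others, start, L, i):
    """Collect the nodes found at slot i of the length-L context window
    of `candidate` beginning at `start`, over all paths in `others` that
    coincide with the candidate everywhere in the window except slot i."""
    window = candidate[start:start + L]
    ec = {window[i]}
    for path in others:
        for j in range(len(path) - L + 1):
            segment = path[j:j + L]
            if all(segment[k] == window[k] for k in range(L) if k != i):
                ec.add(segment[i])
    return ec

cand  = ["<begin>", "is", "that", "a", "cat", "?", "<end>"]
other = ["<begin>", "is", "that", "a", "dog", "?", "<end>"]
ec = equivalence_class(cand, [other], start=1, L=5, i=3)
# ec -> {"cat", "dog"}
```

Here the two paths agree in the window "is that a _ ?" except at slot 3, so cat and dog end up in one temporary EC, exactly the situation depicted in Figure 3.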
Figure 3: Generalization: looking at the paths that coincide with the search path along the context window, except for slot i+2, leads to the creation of a temporary equivalence class (EC) node at that slot. If a pattern that contains the generalized node is rewired onto the graph, the EC node becomes permanent.

ATIS version   Language model                            Perplexity
2              ADIOS                                     …
2              Trigram, Kneser-Ney back-off smoothing    14
2              PFA Inference (ALERGIA) + trigram         20
3              ADIOS                                     …
3              SLM-wsj + trigram                         …
3              NLPwin + trigram                          …
3              SLM-atis + trigram                        15.9

Table 1: The lower perplexity achieved by ADIOS-derived language models compared to standard benchmarks on the ATIS-2 and ATIS-3 corpora.

Applying this process to all of the paths iteratively leads to a rapid growth in the size of the language that has been learned. A sentence is in the language if there exists a path that can generate it (a pattern node generates the concatenation of the strings generated by its children; an EC node generates a string generated by one of its children). The ADIOS algorithm has been subjected to extensive testing on a variety of corpora (Solan et al., 2005). Its performance generally exceeded that of other unsupervised grammar induction algorithms capable of dealing with raw corpus data on various measures, such as precision and recall. In particular, on the language modeling task the perplexity levels achieved by ADIOS-derived models on the ATIS-2 and ATIS-3 corpora were lower than the standard benchmarks (Table 1).
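The generation rule above (a pattern node generates the concatenation of its children's strings; an EC node generates the string of one of its children) can be sketched as follows; the table format is an assumption made for illustration:

```python
import random

def generate(node, table):
    """Expand a node: a pattern concatenates the expansions of all its
    children; an EC expands exactly one of its children; anything not
    in the table is a terminal word."""
    kind, children = table.get(node, ("leaf", None))
    if kind == "leaf":
        return [node]
    if kind == "pattern":
        return [w for child in children for w in generate(child, table)]
    return generate(random.choice(children), table)  # EC node

table = {"E1": ("ec", ["cat", "dog", "horse"]),
         "P1": ("pattern", ["is", "that", "a", "E1", "?"])}
sentence = generate("P1", table)
```

Every choice of EC child yields a sentence in the learned language, which is how a single rewired generalized pattern enlarges the language beyond the training corpus.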
Corpus                  Word types   #sentences   Average sentence length
CHILDES                 14,…         …,000        6 words
ATIS-N                  1,153        12,…         … words
Children's literature   52,180       41,…         … words

Table 2: The average length of a sentence in simple corpora such as CHILDES and ATIS-N is much lower than in complex corpora such as children's literature.

3 The problem and motivation for solution

Although ADIOS attains good performance when applied to corpora such as ATIS, CHILDES and artificial context-free grammars, it encounters difficulties when faced with more complex corpora such as texts from the Wall Street Journal or from children's literature. Specifically, executing the ADIOS algorithm on complex corpora results in a grammar with few patterns and poor generalization. The complexity of the WSJ and the literature corpora is manifested in two ways. First, the average length of a sentence in ATIS and CHILDES is much smaller than in more complex corpora (see Table 2). Second, ATIS and CHILDES are characterized by a relatively restricted number of grammatical structures: most of the sentences are simple questions or imperatives. In comparison, complex corpora contain multi-clause sentences that have a much more diverse and sophisticated grammatical structure. Because complex sentences across languages are usually formed by joining together a few simple sentences according to some grammatical construction, a possible way of dealing with grammatical complexity is to decompose the complex sentences into simple ones, and to let ADIOS learn on a simpler corpus, where one may expect better performance. This idea is related to what Elman (1990) called the importance of starting small (in that study, training a neural network progressively on a sequence of corpora of increasing complexity led to better learning).
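The decomposition idea can be sketched minimally as follows (the function name is illustrative; the actual procedure, including which function words are used as split points, is described in the next section):

```python
def split_on_conjunctions(sentence, conjunctions):
    """Break a tokenized sentence into simple clauses at each
    clause-marking function word, omitting those words themselves."""
    clauses, current = [], []
    for word in sentence.split():
        if word in conjunctions:
            if current:
                clauses.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if current:
        clauses.append(" ".join(current))
    return clauses

split_on_conjunctions("he believes that john runs", {"that", "because"})
# -> ["he believes", "john runs"]
```

Each clause is then a short, simple sentence of the kind on which ADIOS already performs well.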
Our method for dealing with complex sentences by decomposing them into simpler ones, which will be described in the next section, highlights the importance of the distinction between closed- and open-class words. Across languages, lexical items can be divided into two categories: content, or open-class words, and function, or closed-class words. Words in these two categories have distinct distributions in natural languages. Function words, which form a small, closed class (you cannot invent a new preposition, say, and expect to be understood) are very frequent in natural discourse. In comparison, content words form a much larger class, each of whose members, however, occurs much more rarely. A serious limitation of the ADIOS algorithm is its blindness to the distinction between closed- and open-class words. This results in a tendency for function words to appear at the edges of patterns early in the learning process (see Figure 4). This happens because many paths pass through a node of a frequent function word and
then branch out to different content words.

4 The Algorithm

Our solution uses a subset of function words, which tend to mark clause boundaries, as cues for decomposing complex sentences into simpler ones. Lexical items that mark the edge of a clause will be referred to as conjunctions. The algorithm consists of four steps (see Figure 5):

A Every sentence in the corpus is decomposed into a set of simple sentences by splitting on the conjunctions. The conjunctions themselves are omitted altogether from the sentences.

B The simpler corpus is loaded onto the graph and the ADIOS algorithm is executed.

C When learning concludes, the simple sentences are recomposed in their new generalized form.

D The ADIOS algorithm is applied again to the recomposed corpus.

Figure 4: In the ADIOS procedure, function words are often a dispersion point for many paths, due to their peculiar statistical distribution. Because of this property, bad patterns that cross clause and phrase boundaries are created.

Figure 5: The algorithm for learning complex syntax. (A) One path in the graph is shown; conjunctions are highlighted. (B) The path is split on its conjunction nodes into several paths. (C) Learning is carried out on the simple paths, using the ADIOS algorithm. (D) Conjunctions are reinstated and training is repeated.

The algorithm draws on the principles described in the previous section. Training is done in two phases. During the first phase, the learning procedure is restricted to deriving patterns from simple sentences only. When this stage concludes, the original complex sentences are made visible to the algorithm and learning continues. Using conjunctions to split complex sentences into simpler ones prevents the algorithm from creating patterns that cross clause boundaries early in the learning procedure.

5 Experiments and results

To test our algorithm, we constructed an artificial context-free grammar (jym1) that generates sentences of three types of complexity (Diessel, 2004): (a) Complementation: he believes that John runs. (b) Relativization: I like the books that you buy. (c) Coordination: After I run, I drink. The grammar contains 155 lexical items and 240 derivation rules. The grammar divides the verbs in the lexicon into six classes according to their theta-grid and
subcategorization frame. It distinguishes between human and non-human arguments, between external and internal arguments, and between noun phrases and sentences as internal arguments. Figure 6 presents a small fragment of the grammar.

Figure 6: A fragment of the jym1 context-free grammar, used to generate the corpus that tests the algorithm. The fragment demonstrates some of the complexities of the grammar, such as the distinctions made between verbs that take different types of arguments.

A corpus was generated from the jym1 grammar and divided into training and testing parts. The training corpus consists of 8,500 sentences and the testing corpus consists of 923 sentences. Our algorithm was trained and tested on these corpora. To measure the performance of ADIOS we used recall and precision. Recall was defined as the proportion of sentences from the test set that were accepted by the learner at the end of training; it measures the coverage of the target language that the learner achieved. Precision was defined as the proportion of sentences generated by the learner that were accepted by the jym1 context-free grammar; it measures the accuracy of the learner. Ten ADIOS learners were presented with the jym1 corpus (the learners vary with respect to the order of sentences in the corpus). The mean recall and precision are shown in Figure 7. The splitting procedure resulted in significantly improved recall, from 0.81 to 0.9 (p < 0.01); the precision did not change significantly (p = 0.14). Although precision decreased from 0.2 to 0.14 (n.s.) when we split on conjunctions, a manual comparison between the corpora generated in the two configurations gave the impression that the sentences generated when splitting on conjunctions were as good as in the normal setting. To test this, we decided to measure our accuracy using edit distance.
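The measure is defined formally below; computationally its core is the standard Levenshtein distance over lexical items, which can be sketched as:

```python
def edit_distance(generated, target):
    """Minimal number of insertions, deletions and substitutions of
    lexical items turning one sentence into the other (Levenshtein)."""
    s, t = generated.split(), target.split()
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute a -> b
        prev = cur
    return prev[-1]

edit_distance("i like the books", "i like the dogs")  # -> 1
```

The distance of a generated sentence to the grammar is then the minimum of this quantity over grammatical target sentences.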
The edit distance between a sentence generated by a learner and the grammar is defined as the minimal number of elementary edit operations performed on the sentence that would turn it grammatical (Lyon, 1974). An edit operation is either the insertion of a lexical item into the sentence, the deletion of a lexical item from the sentence, or the substitution of a lexical item in the sentence for another. The edit distance of a learner is defined as the average edit distance to a correct sentence over the entire corpus it generates. Edit distance is a much more relaxed measure than precision, because it gives some credit to sentences that are only partially correct. Results on the ten learners show that the edit distance when splitting on conjunctions (2.15) is almost identical to the edit distance achieved in the normal setting (2.09), supporting our claim that the accuracy of the algorithm was not harmed by splitting (p = 0.74).

Next, we tried to split the sentences of the corpus not only on their conjunctions but also on their complementizers (e.g., I like the books that you buy would be split into I like the books and you buy). In this experiment, recall improved from 0.81 to 0.89 (p < 0.01), but precision also decreased significantly, from 0.2 to 0.1 (p < 0.05). Thus, splitting only on conjunctions yielded better results. To confirm that the improvement stemmed from the linguistically relevant segmentation and not merely from the shorter sentences handed to the algorithm, we split the sentences into arbitrary shorter phrases with the same mean length as in the first experiment. Again, ten learners were trained, but now recall decreased significantly, from 0.81 to 0.53, while precision remained unchanged.

Even though splitting on conjunctions improved recall significantly, recall was relatively high even in the baseline condition, in which the regular version of ADIOS was used. As we mentioned before, when regular ADIOS is applied to complex corpora, the usual outcome is low recall. Thus, we wanted to see the effect of splitting the sentences when the initial recall is low. To that end, we constructed a second context-free grammar (jym3) that differed from jym1 in two respects. First, the size of the lexicon was increased to 365 lexical items (compared to 155 lexical items in jym1).
Given that the size of the corpus is fixed, this means that every lexical item was now much rarer, making generalization more difficult. Second, ambiguities were removed from the lexicon. In the jym1 corpus, ten lexical items belonged to two different categories each. Lexical items that belong to multiple categories may cause overgeneralization, and so all of the ambiguous lexical items were replaced by non-ambiguous ones. From the jym3 context-free grammar, a training set of 9,000 sentences and a test set of 1,000 sentences were generated. Recall improved dramatically, from 0.17 in the normal execution (regular ADIOS) to 0.59 when splitting the sentences. Precision, on the other hand, decreased significantly from 0.55. Given that we weigh recall and precision equally, the overall performance after the splitting was better: the F-score (defined as the geometric mean of precision and recall) improved significantly from 0.24. In general, splitting alone managed to boost ADIOS to a higher level of performance on a complex corpus.

6 Unsupervised identification of conjunctions

A lacuna in the approach described above is the need to provide the algorithm ahead of time with a list of conjunctions on which to split the complex sentences.
Figure 7: Results for the jym1 and jym3 corpora, showing the precision and recall measures for the normal (baseline) version of ADIOS, compared to the version that splits complex sentences on conjunctions in the first phase of training.

Of course, no such list is available to a child acquiring a language. Children, however, are very good at recognizing function words very early on (Shi et al., 2006), using cues such as prosody. One may therefore conjecture that children have access to various sources of information that help mark clause boundaries, and that are unavailable to an algorithm that relies on text alone. Thus, it would be useful to show that clause boundaries and conjunctions are identifiable using only the statistical properties of their distribution in language. 1 If that can be demonstrated, the independently learned conjunctions can be passed on to ADIOS for the purpose of splitting, keeping the entire learning process unsupervised.

The key to identifying conjunctions is to realize that the input is not just a finite sequence of words, but rather a set of sentences (much as natural language divides speech into utterances using prosody). Hence, it is possible to characterize which words appear at the beginnings of sentences and which words appear at the endings of sentences. Intuitively, conjunctions are lexical items that coordinate two constituents which could be separate sentences in their own right. Thus, it is possible to characterize conjunctions as lexical items that are followed by words that usually appear at the beginning of a sentence, and are preceded by words that usually appear at the ending of a sentence. A natural way of formalizing this intuition is to use straightforward distributional clustering: Create a vector V_beginning that counts, for every possible bigram b in the language, the frequency with which b appears at the beginning of a sentence. Create a similar vector V_end for the endings of the sentences.
For every word w in the language, create a pair of vectors W_beginning and W_end: W_beginning counts, for every possible bigram b in the language, the frequency with which b follows w; W_end counts the frequency with which b precedes w.

1 Although in this work we only show a method for identifying conjunctions, it seems that it can easily be extended to identifying clause boundaries.
For every pair of vectors W_beginning and W_end, calculate the angle α1 between W_beginning and V_beginning and the angle α2 between W_end and V_end. Let α be the mean of α1 and α2. Sort all the words in the lexicon by α in increasing order and return the sorted list.

In the jym1 corpus, eight words are defined as conjunctions. Applying the algorithm to the jym1 corpus results in the eight conjunctions being ranked at the top of the output list. To test the conjunction detection method on natural data, we applied it to a fragment of the children's literature corpus (55,000 sentences and 6,300 different word tokens 2). The top 25 lexical items found by the algorithm were, with one exception, all conjunctions, complementizers and prepositions (see Table 3). To conclude, it seems that the method we developed succeeds in identifying words that are probable markers of clause boundaries.

Rank   Conj. in jym1   Conj. in children's literature
1      …               and
2      and             but
3      because         because
4      although        that
5      before          though
6      when            till
7      where           where
8      after           until
9                      whether
10                     which
11                     when
12                     for
13                     thought
14                     to

Table 3: The algorithm for unsupervised identification of conjunctions succeeds in ranking all of the conjunctions in the jym1 corpus at the top of the output list. When applied to the children's literature corpus, the top 25 lexical items found by the algorithm (with one exception) are all conjunctions, complementizers and prepositions.

2 The entire corpus can be downloaded from Project Gutenberg (http://www.gutenberg.org) under the subject children's literature. We took a random sample from the English corpus and divided it into sentences using simple preprocessing techniques.
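The identification procedure described above can be sketched end to end; the vector names follow the text, while the toy corpus and the cosine-based angle computation are illustrative assumptions:

```python
import math
from collections import Counter

def angle(u, v):
    """Angle between two sparse bigram-count vectors (Counters)."""
    dot = sum(c * v[k] for k, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    if nu == 0 or nv == 0:
        return math.pi / 2  # no evidence: treat as orthogonal
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def rank_conjunctions(sentences):
    v_begin, v_end = Counter(), Counter()  # sentence-initial / -final bigrams
    w_begin, w_end = {}, {}                # bigrams following / preceding each word
    vocab = set()
    for s in sentences:
        words = s.split()
        vocab.update(words)
        if len(words) >= 2:
            v_begin[tuple(words[:2])] += 1
            v_end[tuple(words[-2:])] += 1
        for i, w in enumerate(words):
            if i + 2 < len(words):
                w_begin.setdefault(w, Counter())[tuple(words[i+1:i+3])] += 1
            if i >= 2:
                w_end.setdefault(w, Counter())[tuple(words[i-2:i])] += 1
    def alpha(w):
        a1 = angle(w_begin.get(w, Counter()), v_begin)
        a2 = angle(w_end.get(w, Counter()), v_end)
        return (a1 + a2) / 2
    return sorted(vocab, key=alpha)

toy = ["i run", "you sleep", "i run because you sleep",
       "you sleep because i run"]
ranking = rank_conjunctions(toy)
# "because" is followed by sentence-initial bigrams and preceded by
# sentence-final ones, so it gets the smallest mean angle and ranks first.
```

Words followed by sentence-like beginnings and preceded by sentence-like endings get small angles and float to the top of the list, as in Table 3.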
7 Conclusion

The previously described statistical-structural algorithm for unsupervised language acquisition, ADIOS, detects in complex sentences patterns that combine parts of different clauses, which hurts performance. We addressed this problem by splitting complex sentences on function words, thus simplifying the complex sentences and preventing across-clause patterns from arising. To test this approach, we constructed an artificial CFG that generates sentences of three types of complexity: complementation, relativization and coordination. Splitting the complex sentences on function words indeed had a positive effect on learning performance. Furthermore, we developed a method for the unsupervised identification of conjunctions, based solely on the statistical properties of these particles and using distributional clustering. In conclusion, we presented a method for extending the capabilities of the ADIOS algorithm without detracting from its unsupervised nature.

Acknowledgments. Thanks to the US-Israel Binational Science Foundation and the Horowitz Center for Complexity Science for support.

References

Diessel, H. (2004). The Acquisition of Complex Sentences, volume 105 of Cambridge Studies in Linguistics. Cambridge University Press, Cambridge.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14:179-211.

Lyon, G. (1974). Syntax-directed least-errors analysis for context-free languages: a practical approach. Communications of the ACM, 17:3-14.

Shi, R., Werker, J. F., and Cutler, A. (2006). Recognition and representation of function words in English-learning infants. Infancy, 10.

Solan, Z., Horn, D., Ruppin, E., and Edelman, S. (2005). Unsupervised learning of natural languages. Proceedings of the National Academy of Sciences, 102:11629-11634.
Prof. Sergio A. Alvarez http://www.cs.bc.edu/ alvarez/ Fulton Hall 410 B firstname.lastname@example.org Computer Science Department voice: (617) 552-4333 Boston College fax: (617) 552-6790 Chestnut Hill, MA 02467 USA
Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich email@example.com Abstract. We discuss paraphrasing controlled English texts, by defining
Learning and Inference for Clause Identification Xavier Carreras Lluís Màrquez Technical University of Catalonia (UPC) Vasin Punyakanok Dan Roth University of Illinois at Urbana-Champaign (UIUC) ECML 2002
Clustering Connectionist and Statistical Language Processing Frank Keller firstname.lastname@example.org Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
SPEAKER VERIFICATION BASED ON SPEAKER P OSITION IN A MULTIDIMENSIONAL SPEAKER IDENTIFICATION SPACE CLAUDE A. NORTON, III AND STEPHEN A. ZAHORIAN Department of Electrical and Computer Engineering Old Dominion
Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference Part of Speech Induction from Distributional Features: Balancing Vocabulary and Vivek V. Datla
A High Scale Deep Learning Pipeline for Identifying Similarities in Online Communities Robert Elwell, Tristan Chong, John Kuner, Wikia, Inc. Abstract: We outline a multi step text processing pipeline employed
Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary
Notes on English Intonation, p. 1 Notes on English Intonation. Ken de Jong, Indiana University 2/23/06 Introduction. Following are some very general notes to introduce you to an analysis of the intonation
A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA email@example.com, firstname.lastname@example.org
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
Introducing Voice Analysis Software into the Classroom: how Praat Can Help French Students Improve their Acquisition of English Prosody. Laurence Delrue Université de Lille 3 (France) email@example.com
Syntactic Theory Background and Transformational Grammar Dr. Dan Flickinger & PD Dr. Valia Kordoni Department of Computational Linguistics Saarland University October 28, 2011 Early work on grammar There
ESOL 94S- Ford Section 3.1 & 3.2: Input Analysis 3.1: Variation in Language The English language, a phrase heard very frequently, gives the impression that English is one uniform system of communication
A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland
Learning Learning is essential for unknown environments, i.e., when designer lacks omniscience Artificial Intelligence Learning Chapter 8 Learning is useful as a system construction method, i.e., expose
Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture
Hello everyone! Thank you for coming to my tutorial today. My name is Erin Catto and I m a programmer at Blizzard Entertainment. Today I m going to talk about computing distance. 1 Suppose we have a pair
Operator precedence parsing Bottom-up parsing methods that follow the idea of shift-reduce parsers Several flavors: operator, simple, and weak precedence. In this course, only weak precedence Main di erences
Homework Exam 1, Geometric Algorithms, 2016 1. (3 points) Let P be a convex polyhedron in 3-dimensional space. The boundary of P is represented as a DCEL, storing the incidence relationships between the
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
Introduction to Graph Theory Introduction These notes are primarily a digression to provide general background remarks. The subject is an efficient procedure for the determination of voltages and currents
raph Theory Problems and Solutions Tom Davis firstname.lastname@example.org http://www.geometer.org/mathcircles November, 005 Problems. Prove that the sum of the degrees of the vertices of any finite graph is
Framework for Joint Recognition of Pronounced and Spelled Proper Names by Atiwong Suchato B.S. Electrical Engineering, (1998) Chulalongkorn University Submitted to the Department of Electrical Engineering
2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed
Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi Lecture - 15 Limited Resource Allocation Today we are going to be talking about
COREsim is a discrete event simulator included with the CORE family of systems engineering tools (but licensed separately). A discrete event simulator traverses a behavior model and schedules and executes