Training PCFGs. BM1 Advanced Natural Language Processing. Alexander Koller. 2 December 2014
|
|
- Doris Pope
- 8 years ago
- Views:
Transcription
1 Trag CFGs BM1 Advanced atural Language rocessg Alexander Koller 2 December 2014
2 robabilistic CFGs [1.0] V [0.5] [0.8] [0.5] i [0.2] V shot [1.0] [0.4] [1.0] elephant [0.3] [1.0] [0.3] an [0.5] [0.5] (let s pretend for simplicity that = R$)
3 arse trees p = p = V shot 0.5 an elephant V shot an elephant correct = more probable parse tree
4 Today arameters of CFG = rule probabilities. How do we learn parameters from corpora? maximum likelihood estimation hard EM usg Viterbi soft EM usg the side-outside algorithm
5 ML Estimation Assume we have a treebank. that is, every sentence annotated by hand with its correct parse tree Then we can use MLE to obta rule probabilities: (A w) = C(A w) C(A ) = C(A w) w 0 C(A w 0 ) tandard way of parameter estimation practice. Works well, not much smoothg necessary on TB.
6 Example TV V shot an elephant slept
7 Example TV V shot an elephant slept
8 Example TV V shot an elephant slept
9 Example TV V shot an elephant slept [0] TV [1/4] elephant [1/3] V [1/4] [2/3] [1/2]
10 Unsupervised estimation MLE works well for English. German: Tiger treebank exists, but is hard for CFGs, e.g. because of free word order. most other languages: phrase structure annotations unavailable, expensive to create -> unsupervised methods? Unsupervised methods: provide CFG, learn parameters from unannotated corpus show first hard EM, then soft EM ideas structive and generalize to related problems
11 Hard aka Viterbi EM n the absence of syntactic annotations, learner must vent its own parse trees. Viterbi EM: start with some parameter estimate produce syntactic annotations by computg best tree for each sentence usg Viterbi apply MLE to re-estimate parameters repeat as long as needed This is not real EM!
12 Example 1 [0.6] TV [1/3] elephant [0.2] V [1/3] [0.2] [1/3] p = (vs ) p = TV V slept shot an elephant
13 Example 1 [0.6] TV [1/3] elephant [0.2] V [1/3] [0.2] [1/3] p = (vs ) p = TV V slept shot an elephant
14 Example 1 [0.6] TV [1/3] elephant [0.2] V [1/3] [0.2] [1/3] p = (vs ) p = TV V slept shot an elephant
15 MLE on Viterbi parses 2 [1/4] TV [1/3] elephant [1/4] V [1/3] [1/2] [1/3] p = (vs ) p = TV V shot an elephant slept
16 ome thgs to note n this example, the likelihood creased. this need not always be the case for Viterbi EM Viterbi EM commits to a sgle parse tree per sentence. This has advantages and disadvantages: parse tree easy to compute, and can simply apply MLE ignores all uncertaty we had about correct parse (wng parse tree takes all)
17 Towards real (aka soft ) EM idea: weighted countg of rules all parse trees Viterbi-EM t 1 t 2 t 3 t 4 1 C t1 (r) + 0 C t2 (r) + 0 C t3 (r) + 0 C t4 (r) EM t 1 t 2 t 3 t 4 (t 1 w) C t1 (r) + (t 2 w) C t2 (r) + (t 3 w) C t3 (r) + (t 4 w) C t4 (r)
18 Expected counts Defe expected count of rule A B C, based on previous parameter estimate. E(A! BC)= t2t (t w) C t (A! BC) f we have them, can re-estimate parameters: (A! BC)= E(A! BC) E(A! r) r Challenge: How to compute E(A B C) efficiently? we assume grammars CF here
19 Fundamental idea E(A! BC)= t2t (t w) C t (A! BC) = 1 (w) = 1 (w) = 1 (w) = 1 (w) (t) C t (A! BC) t2t (t) rule for i, j, k t is A! BC t2t i,j,k! (t) rule for i, j, k t is A! BC i,j,k t2t i,j,k t of this form! (t) w 1 B A w n C (note that (t, w) = (t)) w i w j-1 w j w k-1
20 Fundamental idea E(A! BC)= t2t (t w) C t (A! BC) = 1 (w) = 1 (w) = 1 (w) = 1 (w) (t) C t (A! BC) t2t (t) rule for i, j, k t is A! BC t2t i,j,k! (t) rule for i, j, k t is A! BC i,j,k t2t i,j,k t of this form! (t) call this term μ(a B C, i, j, k) w 1 B A w n C (note that (t, w) = (t)) w i w j-1 w j w k-1
21 µ(a! BC,i,j,k)= Computg μ (t) = (A, i, k) (A! BC) (B,i,j) (C, j, k) t of this form w 1 B A w n C w i w j-1 w j w k-1
22 µ(a! BC,i,j,k)= Computg μ (t) = (A, i, k) (A! BC) (B,i,j) (C, j, k) t of this form w 1 B A w n C w i w j-1 w j w k-1 side probability (B,i,j) = t for B ) w i...w j 1 (t)
23 µ(a! BC,i,j,k)= Computg μ (t) = (A, i, k) (A! BC) (B,i,j) (C, j, k) t of this form w 1 B A w n C w i w j-1 w j w k-1 side probability (B,i,j) = t for B ) w i...w j 1 (t) outside probability (A, i, k) = t for ) w 1...w i 1 Aw k...w n (t)
24 nside probabilities (A, i, k) = t for B ) w i...w k 1 (t) A B C w i w j-1 w j w k-1 (A, i, i + 1) = (A! w i ) (A, i, k) = (A! BC) (B,i,j) (C, j, k) A!B C i<j<k
25 nside probabilities (A, i, k) = t for B ) w i...w k 1 (t) B A C special case: (w) = α(,1,n+1) w i w j-1 w j w k-1 (A, i, i + 1) = (A! w i ) (A, i, k) = (A! BC) (B,i,j) (C, j, k) A!B C i<j<k
26 Outside probabilities (A, i, k) = t for ) w 1...w i 1 Aw k...w n (t) = (B! AC) (C, k, j) (B,i,j) + (B! CA) (C, j, i) (B,j,k) B!A C k<japplen B!C A 1applej<i B B A C C A w... 1 w... i w... k w j w n w... 1 w... j w... i w k w n
27 Outside probabilities (A, i, k) = t for ) w 1...w i 1 Aw k...w n (t) = (B! AC) (C, k, j) (B,i,j) + (B! CA) (C, j, i) (B,j,k) B!A C k<japplen B!C A 1applej<i B base case: β(a,1,n+1) = 1 iff A = B A C C A w... 1 w... i w... k w j w n w... 1 w... j w... i w k w n
28 The nside-outside Algorithm tart with some itial estimate of parameters. For each sentence w, compute α, β, and μ. Compute expected counts E(A B C). sum expected counts over all sentences remember that (w) = α(, 1, n+1) Re-estimate (A B C) from expected counts. terate until convergence.
29 ome remarks nside-outside creases likelihood each step. Charniak ereira But huge problems with local maxima. Carroll & Charniak 92 fd 300 different local maxima for 300 different itial parameter estimates. mprove by partially bracketg strgs (ereira & chabes 92). Also, quite slow. Every EM step takes time O(n 3 ). Lari & Young 90 fd that learng a grammar from a corpus generated from CFG with nontermals requires 3 nontermals.
30 ummary Learng parameters of CFGs: maximum likelihood estimation from raw text hard EM : iterate MLE on Viterbi parses EM: use side-outside algorithm with expected rule counts CFG parsg with MLE parse gets f-score 70 s. Will improve on this next time. Have assumed that CFG is given and only parameters are to be learned. Will fix this January.
Online EFFECTIVE AS OF JANUARY 2013
2013 A and C Session Start Dates (A-B Quarter Sequence*) 2013 B and D Session Start Dates (B-A Quarter Sequence*) Quarter 5 2012 1205A&C Begins November 5, 2012 1205A Ends December 9, 2012 Session Break
More informationPOS Tagging 1. POS Tagging. Rule-based taggers Statistical taggers Hybrid approaches
POS Tagging 1 POS Tagging Rule-based taggers Statistical taggers Hybrid approaches POS Tagging 1 POS Tagging 2 Words taken isolatedly are ambiguous regarding its POS Yo bajo con el hombre bajo a PP AQ
More informationClustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
More informationApplying Co-Training Methods to Statistical Parsing. Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu
Applying Co-Training Methods to Statistical Parsing Anoop Sarkar http://www.cis.upenn.edu/ anoop/ anoop@linc.cis.upenn.edu 1 Statistical Parsing: the company s clinical trials of both its animal and human-based
More informationLexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training Stefan Riezler IMS, Universit t Stuttgart riezler@ims.uni-stuttgart.de Jonas Kuhn IMS, Universit t
More informationPyrex Journals 2035-4876
Pyrex Journals 2035-4876 Pyrex Journal of Educational Research and Review Vol 1 (6) pp. 057-061 July, 2015 http:///pjerr Copyright 2015 Pyrex Journals Review Paper Comparative Analysis of Students Academic
More informationOutline of today s lecture
Outline of today s lecture Generative grammar Simple context free grammars Probabilistic CFGs Formalism power requirements Parsing Modelling syntactic structure of phrases and sentences. Why is it useful?
More informationOpen Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
More informationGamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
More informationα α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =
More informationAccelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems
Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural
More informationBrill s rule-based PoS tagger
Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill s rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based
More informationPhase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
More informationThe Australian Journal of Mathematical Analysis and Applications
The Australian Journal of Mathematical Analysis and Applications Volume 7, Issue, Article 11, pp. 1-14, 011 SOME HOMOGENEOUS CYCLIC INEQUALITIES OF THREE VARIABLES OF DEGREE THREE AND FOUR TETSUYA ANDO
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More informationSymbiosis of Evolutionary Techniques and Statistical Natural Language Processing
1 Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing Lourdes Araujo Dpto. Sistemas Informáticos y Programación, Univ. Complutense, Madrid 28040, SPAIN (email: lurdes@sip.ucm.es)
More informationASCII CODES WITH GREEK CHARACTERS
ASCII CODES WITH GREEK CHARACTERS Dec Hex Char Description 0 0 NUL (Null) 1 1 SOH (Start of Header) 2 2 STX (Start of Text) 3 3 ETX (End of Text) 4 4 EOT (End of Transmission) 5 5 ENQ (Enquiry) 6 6 ACK
More informationMachine Learning for natural language processing
Machine Learning for natural language processing Introduction Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationVisa Smart Debit/Credit Certificate Authority Public Keys
CHIP AND NEW TECHNOLOGIES Visa Smart Debit/Credit Certificate Authority Public Keys Overview The EMV standard calls for the use of Public Key technology for offline authentication, for aspects of online
More information8. EFFECTIVE COLLABORATIVE STRATEGIES
8. EFFECTIVE COLLABORATIVE STRATEGIES WITH ADULT LITERACY PROVIDERS 8a. This District is the provider of Adult. Its service coverage duplicates the district boundaries and its admistrative and structional
More informationA Mixed Trigrams Approach for Context Sensitive Spell Checking
A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu
More informationThe PALAVRAS parser and its Linguateca applications - a mutually productive relationship
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship Eckhard Bick University of Southern Denmark eckhard.bick@mail.dk Outline Flow chart Linguateca Palavras History
More informationPortuguese-English Statistical Machine Translation using Tree Transducers
Portuguese-English tatistical Machine Translation using Tree Transducers Daniel Beck 1, Helena Caseli 1 1 Computer cience Department Federal University of ão Carlos (UFCar) {daniel beck,helenacaseli}@dc.ufscar.br
More informationSelf-Training for Parsing Learner Text
elf-training for Parsing Learner Text Aoife Cahill, Binod Gyawali and James V. Bruno Educational Testing ervice, 660 Rosedale Road, Princeton, NJ 0854, UA {acahill, bgyawali, jbruno}@ets.org Abstract We
More informationOff-line (and On-line) Text Analysis for Computational Lexicography
Offline (and Online) Text Analysis for Computational Lexicography Von der PhilosophischHistorischen Fakultät der Universität Stuttgart zur Erlangung der Würde eines Doktors der Philosophie (Dr. phil.)
More informationMotivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1
Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically
More informationCINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
More informationHMM : Viterbi algorithm - a toy example
MM : Viterbi algorithm - a toy example.5.3.4.2 et's consider the following simple MM. This model is composed of 2 states, (high C content) and (low C content). We can for example consider that state characterizes
More informationTreebank Search with Tree Automata MonaSearch Querying Linguistic Treebanks with Monadic Second Order Logic
Treebank Search with Tree Automata MonaSearch Querying Linguistic Treebanks with Monadic Second Order Logic Authors: H. Maryns, S. Kepser Speaker: Stephanie Ehrbächer July, 31th Treebank Search with Tree
More informationLearning and Inference over Constrained Output
IJCAI 05 Learning and Inference over Constrained Output Vasin Punyakanok Dan Roth Wen-tau Yih Dav Zimak Department of Computer Science University of Illinois at Urbana-Champaign {punyakan, danr, yih, davzimak}@uiuc.edu
More informationa 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.
Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given
More informationTekniker för storskalig parsning
Tekniker för storskalig parsning Diskriminativa modeller Joakim Nivre Uppsala Universitet Institutionen för lingvistik och filologi joakim.nivre@lingfil.uu.se Tekniker för storskalig parsning 1(19) Generative
More informationEffective Self-Training for Parsing
Effective Self-Training for Parsing David McClosky dmcc@cs.brown.edu Brown Laboratory for Linguistic Information Processing (BLLIP) Joint work with Eugene Charniak and Mark Johnson David McClosky - dmcc@cs.brown.edu
More informationComma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect
More informationBinarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy
Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy Wei Wang and Kevin Knight and Daniel Marcu Language Weaver, Inc. 4640 Admiralty Way, Suite 1210 Marina del Rey, CA, 90292 {wwang,kknight,dmarcu}@languageweaver.com
More informationIntroduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014
Introduction! BM1 Advanced Natural Language Processing Alexander Koller! 17 October 2014 Outline What is computational linguistics? Topics of this course Organizational issues Siri Text prediction Facebook
More informationMissing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
More information2 PROGRAM DOCUMENTATION LAN- GUAGE
SYNTACTIC TABULAR FORM PROCESSING BY PRECEDENCE ATTRIBUTE GRAPH GRAMMARS TOMOKAZU ARITA KIMIO SUGITA KENSEI TSUCHIDA TAKEO YAKU Dept. App. Math. Nihon University 3 5 4, Sakurajosui, Setagaya, Tokyo, 156-855,
More informationLecture 2 Introduction to Data Flow Analysis
Lecture 2 Introduction to Data Flow Analysis I. Introduction II. Example: Reaching definition analysis III. Example: Liveness analysis IV. A General Framework (Theory in next lecture) Reading: Chapter
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationLGPLLR : an open source license for NLP (Natural Language Processing) Sébastien Paumier. Université Paris-Est Marne-la-Vallée
LGPLLR : an open source license for NLP (Natural Language Processing) Sébastien Paumier Université Paris-Est Marne-la-Vallée paumier@univ-mlv.fr Penguin from http://tux.crystalxp.net/ 1 Linguistic data
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationHidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006
Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm
More informationIntroduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages
Introduction Compiler esign CSE 504 1 Overview 2 3 Phases of Translation ast modifled: Mon Jan 28 2013 at 17:19:57 EST Version: 1.5 23:45:54 2013/01/28 Compiled at 11:48 on 2015/01/28 Compiler esign Introduction
More informationStatistical Machine Translation Lecture 4. Beyond IBM Model 1 to Phrase-Based Models
p. Statistical Machine Translation Lecture 4 Beyond IBM Model 1 to Phrase-Based Models Stephen Clark based on slides by Philipp Koehn p. Model 2 p Introduces more realistic assumption for the alignment
More informationNatural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression
Natural Language Processing Lecture 13 10/6/2015 Jim Martin Today Multinomial Logistic Regression Aka log-linear models or maximum entropy (maxent) Components of the model Learning the parameters 10/1/15
More informationFactoring a Difference of Two Squares. Factoring a Difference of Two Squares
284 (6 8) Chapter 6 Factoring 87. Tomato soup. The amount of metal S (in square inches) that it takes to make a can for tomato soup is a function of the radius r and height h: S 2 r 2 2 rh a) Rewrite this
More informationInteractive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
More informationGenerating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
More informationMath 319 Problem Set #3 Solution 21 February 2002
Math 319 Problem Set #3 Solution 21 February 2002 1. ( 2.1, problem 15) Find integers a 1, a 2, a 3, a 4, a 5 such that every integer x satisfies at least one of the congruences x a 1 (mod 2), x a 2 (mod
More informationLearning Translations of Named-Entity Phrases from Parallel Corpora
Learning Translations of Named-Entity Phrases from Parallel Corpora Robert C. Moore Microsoft Research Redmond, WA 98052, USA bobmoore@microsoft.com Abstract We develop a new approach to learning phrase
More informationSF2940: Probability theory Lecture 8: Multivariate Normal Distribution
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,
More informationStatistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
More information5HFDOO &RPSLOHU 6WUXFWXUH
6FDQQLQJ 2XWOLQH 2. Scanning The basics Ad-hoc scanning FSM based techniques A Lexical Analysis tool - Lex (a scanner generator) 5HFDOO &RPSLOHU 6WUXFWXUH 6RXUFH &RGH /H[LFDO $QDO\VLV6FDQQLQJ 6\QWD[ $QDO\VLV3DUVLQJ
More informationLearning To Deal With Little Or No Annotated Data
Learning To Deal With Little Or No Annotated Data Information Sciences Institute and Department of Computer Science University of Southern California 4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292
More informationRepresenting Uncertainty by Probability and Possibility What s the Difference?
Representing Uncertainty by Probability and Possibility What s the Difference? Presentation at Amsterdam, March 29 30, 2011 Hans Schjær Jacobsen Professor, Director RD&I Ballerup, Denmark +45 4480 5030
More informationNoam Chomsky: Aspects of the Theory of Syntax notes
Noam Chomsky: Aspects of the Theory of Syntax notes Julia Krysztofiak May 16, 2006 1 Methodological preliminaries 1.1 Generative grammars as theories of linguistic competence The study is concerned with
More informationCorpus Design for a Unit Selection Database
Corpus Design for a Unit Selection Database Norbert Braunschweiler Institute for Natural Language Processing (IMS) Stuttgart 8 th 9 th October 2002 BITS Workshop, München Norbert Braunschweiler Corpus
More informationComputer Programming Lecturer: Dr. Laith Abdullah Mohammed
Algorithm: A step-by-step procedure for solving a problem in a finite amount of time. Algorithms can be represented using Flow Charts. CHARACTERISTICS OF AN ALGORITHM: Computer Programming Lecturer: Dr.
More information1-bit Full Adder. Adders. The expressions for the sum and carry lead to the following unified implementation:
1-bit Full Adder 1 The expressions for the sum and carry lead to the followg unified implementation: Sum A Carry C B This implementation requires only two levels of logic (ignorg the verters as customary).
More informationOnline Large-Margin Training of Dependency Parsers
Online Large-Margin Training of Dependency Parsers Ryan McDonald Koby Crammer Fernando Pereira Department of Computer and Information Science University of Pennsylvania Philadelphia, PA {ryantm,crammer,pereira}@cis.upenn.edu
More informationMark Scheme (Results) January 2012. GCE Decision D1 (6689) Paper 1
Mark Scheme (Results) January 2012 GCE Decision D1 (6689) Paper 1 Edexcel is one of the leading examining and awarding bodies in the UK and throughout the world. We provide a wide range of qualifications
More informationText Analytics Illustrated with a Simple Data Set
CSC 594 Text Mining More on SAS Enterprise Miner Text Analytics Illustrated with a Simple Data Set This demonstration illustrates some text analytic results using a simple data set that is designed to
More informationLanguage and Computation
Language and Computation week 13, Thursday, April 24 Tamás Biró Yale University tamas.biro@yale.edu http://www.birot.hu/courses/2014-lc/ Tamás Biró, Yale U., Language and Computation p. 1 Practical matters
More informationLearning Translation Rules from Bilingual English Filipino Corpus
Proceedings of PACLIC 19, the 19 th Asia-Pacific Conference on Language, Information and Computation. Learning Translation s from Bilingual English Filipino Corpus Michelle Wendy Tan, Raymond Joseph Ang,
More informationBig Data: Big N. V.C. 14.387 Note. December 2, 2014
Big Data: Big N V.C. 14.387 Note December 2, 2014 Examples of Very Big Data Congressional record text, in 100 GBs Nielsen s scanner data, 5TBs Medicare claims data are in 100 TBs Facebook 200,000 TBs See
More information15 Prime and Composite Numbers
15 Prime and Composite Numbers Divides, Divisors, Factors, Multiples In section 13, we considered the division algorithm: If a and b are whole numbers with b 0 then there exist unique numbers q and r such
More informationPushdown automata. Informatics 2A: Lecture 9. Alex Simpson. 3 October, 2014. School of Informatics University of Edinburgh als@inf.ed.ac.
Pushdown automata Informatics 2A: Lecture 9 Alex Simpson School of Informatics University of Edinburgh als@inf.ed.ac.uk 3 October, 2014 1 / 17 Recap of lecture 8 Context-free languages are defined by context-free
More informationFuture Trends in Airline Pricing, Yield. March 13, 2013
Future Trends in Airline Pricing, Yield Management, &AncillaryFees March 13, 2013 THE OPPORTUNITY IS NOW FOR CORPORATE TRAVEL MANAGEMENT BUT FIRST: YOU HAVE TO KNOCK DOWN BARRIERS! but it won t hurt much!
More information31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
More informationTranslating the Penn Treebank with an Interactive-Predictive MT System
IJCLA VOL. 2, NO. 1 2, JAN-DEC 2011, PP. 225 237 RECEIVED 31/10/10 ACCEPTED 26/11/10 FINAL 11/02/11 Translating the Penn Treebank with an Interactive-Predictive MT System MARTHA ALICIA ROCHA 1 AND JOAN
More informationSERVER CERTIFICATES OF THE VETUMA SERVICE
Page 1 Version: 3.4, 19.12.2014 SERVER CERTIFICATES OF THE VETUMA SERVICE 1 (18) Page 2 Version: 3.4, 19.12.2014 Table of Contents 1. Introduction... 3 2. Test Environment... 3 2.1 Vetuma test environment...
More informationAnalytics on Big Data
Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis
More informationSQL Injection Attack. David Jong hoon An
SQL Injection Attack David Jong hoon An SQL Injection Attack Exploits a security vulnerability occurring in the database layer of an application. The vulnerability is present when user input is either
More informationLecture 23: Common Emitter Amplifier Frequency Response. Miller s Theorem.
Whites, EE 320 ecture 23 Page 1 of 17 ecture 23: Common Emitter mplifier Frequency Response. Miller s Theorem. We ll use the high frequency model for the BJT we developed the previous lecture and compute
More informationDEPENDENCY PARSING JOAKIM NIVRE
DEPENDENCY PARSING JOAKIM NIVRE Contents 1. Dependency Trees 1 2. Arc-Factored Models 3 3. Online Learning 3 4. Eisner s Algorithm 4 5. Spanning Tree Parsing 6 References 7 A dependency parser analyzes
More informationLinguistic Research with CLARIN. Jan Odijk MA Rotation Utrecht, 2015-11-10
Linguistic Research with CLARIN Jan Odijk MA Rotation Utrecht, 2015-11-10 1 Overview Introduction Search in Corpora and Lexicons Search in PoS-tagged Corpus Search for grammatical relations Search for
More informationLikelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University
Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x θ) Likelihood: eskmate
More informationChapter 1. Dr. Chris Irwin Davis Email: cid021000@utdallas.edu Phone: (972) 883-3574 Office: ECSS 4.705. CS-4337 Organization of Programming Languages
Chapter 1 CS-4337 Organization of Programming Languages Dr. Chris Irwin Davis Email: cid021000@utdallas.edu Phone: (972) 883-3574 Office: ECSS 4.705 Chapter 1 Topics Reasons for Studying Concepts of Programming
More informationConvergence of Translation Memory and Statistical Machine Translation
Convergence of Translation Memory and Statistical Machine Translation Philipp Koehn and Jean Senellart 4 November 2010 Progress in Translation Automation 1 Translation Memory (TM) translators store past
More informationMarkus Dickinson. Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013
Markus Dickinson Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013 1 / 34 Basic text analysis Before any sophisticated analysis, we want ways to get a sense of text data
More informationPRIMARY CONTENT MODULE Algebra I -Linear Equations & Inequalities T-71. Applications. F = mc + b.
PRIMARY CONTENT MODULE Algebra I -Linear Equations & Inequalities T-71 Applications The formula y = mx + b sometimes appears with different symbols. For example, instead of x, we could use the letter C.
More informationWes, Delaram, and Emily MA751. Exercise 4.5. 1 p(x; β) = [1 p(xi ; β)] = 1 p(x. y i [βx i ] log [1 + exp {βx i }].
Wes, Delaram, and Emily MA75 Exercise 4.5 Consider a two-class logistic regression problem with x R. Characterize the maximum-likelihood estimates of the slope and intercept parameter if the sample for
More informationHardware Implementation of Probabilistic State Machine for Word Recognition
IJECT Vo l. 4, Is s u e Sp l - 5, Ju l y - Se p t 2013 ISSN : 2230-7109 (Online) ISSN : 2230-9543 (Print) Hardware Implementation of Probabilistic State Machine for Word Recognition 1 Soorya Asokan, 2
More informationLanguage Model of Parsing and Decoding
Syntax-based Language Models for Statistical Machine Translation Eugene Charniak ½, Kevin Knight ¾ and Kenji Yamada ¾ ½ Department of Computer Science, Brown University ¾ Information Sciences Institute,
More informationApplications of Machine Learning in Software Testing. Lionel C. Briand Simula Research Laboratory and University of Oslo
Applications of Machine Learning in Software Testing Lionel C. Briand Simula Research Laboratory and University of Oslo Acknowledgments Yvan labiche Xutao Liu Zaheer Bawar Kambiz Frounchi March 2008 2
More informationNEDERBOOMS Treebank Mining for Data- based Linguistics. Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde
NEDERBOOMS Treebank Mining for Data- based Linguistics Liesbeth Augustinus Vincent Vandeghinste Ineke Schuurman Frank Van Eynde LOT Summer School - June, 2014 NEDERBOOMS Exploita)on of Dutch treebanks
More informationAn Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation Robert C. Moore Chris Quirk Microsoft Research Redmond, WA 98052, USA {bobmoore,chrisq}@microsoft.com
More informationCIS570 Lecture 4 Introduction to Data-flow Analysis 3
Introdution to Data-flow Analysis Last Time Control flow analysis BT disussion Today Introdue iterative data-flow analysis Liveness analysis Introdue other useful onepts CIS570 Leture 4 Introdution to
More informationTurkish Radiology Dictation System
Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr
More informationM LTO Multilingual On-Line Translation
O non multa, sed multum M LTO Multilingual On-Line Translation MOLTO Consortium FP7-247914 Project summary MOLTO s goal is to develop a set of tools for translating texts between multiple languages in
More informationCodewebs: Scalable Homework Search for Massive Open Online Programming Courses
Codewebs: Scalable Homework Search for Massive Open Online Programming Courses Leonidas Guibas Andy Nguyen Chris Piech Jonathan Huang Educause, 11/6/2013 Massive Scale Education 4.5 million users 750,000
More informationStatic Analysis for Fast and Accurate Design Space Exploration of Caches
Static Analysis for Fast and Accurate Design Space Exploration of Caches Yun iang, Tulika Mitra Department of Computer Science National University of Sgapore {liangyun,tulika}@comp.nus.edu.sg ABSTRACT
More informationHow to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer
How to make the computer understand? Fall 2005 Lecture 15: Putting it all together From parsing to code generation Write a program using a programming language Microprocessors talk in assembly language
More informationTechWatch. Technology and Market Observation powered by SMILA
TechWatch Technology and Market Observation powered by SMILA PD Dr. Günter Neumann DFKI, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Juni 2011 Goal - Observation of Innovations and Trends»
More informationManagement Decision Making. Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011
Management Decision Making Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011 Management decision making Decision making Spreadsheet exercise Data visualization,
More informationTrends in corpus specialisation
ANA DÍAZ-NEGRILLO / FRANCISCO JAVIER DÍAZ-PÉREZ Trends in corpus specialisation 1. Introduction Computerised corpus linguistics set off around the 1960s with the compilation and exploitation of the first
More information