INF4820: Algorithms for Artificial Intelligence and Natural Language Processing. Context-Free Grammars & Parsing


University of Oslo: Department of Informatics

INF4820: Algorithms for Artificial Intelligence and Natural Language Processing
Context-Free Grammars & Parsing

Stephan Oepen & Murhaf Fares, Language Technology Group (LTG)
November 3, 2016

Overview

Last Time
- Sequence labeling
- Dynamic programming
- Viterbi algorithm
- Forward algorithm

Today
- Grammatical structure
- Context-free grammar
- Treebanks
- Probabilistic CFGs

Recall: Ice Cream and Global Warming

[State diagram of the two-state HMM: start state S, hidden states H(ot) and C(old), end state /S]

Transitions: P(H|S) = 0.8, P(C|S) = 0.2, P(H|H) = 0.6, P(C|H) = 0.2, P(/S|H) = 0.2, P(H|C) = 0.3, P(C|C) = 0.5, P(/S|C) = 0.2
Emissions: P(1|H) = 0.2, P(2|H) = 0.4, P(3|H) = 0.4, P(1|C) = 0.5, P(2|C) = 0.4, P(3|C) = 0.1

Recall: Viterbi Algorithm

To find the best state sequence, maximize

P(s_1 ... s_n, o_1 ... o_n) = P(s_1|s_0) P(o_1|s_1) P(s_2|s_1) P(o_2|s_2) ...

The value we cache at each step:

v_i(s) = max_{k=1..L} [ v_{i-1}(k) · P(s|k) · P(o_i|s) ]

The variable v_i(s) represents the maximum probability that the i-th state is s, given that we have seen the observations o_1 ... o_i. At each step, we record backpointers showing which previous state led to the maximum probability.

Recall: Dynamic Programming

Dynamic programming algorithms solve large problems by
- compounding answers from smaller sub-problems
- recording sub-problem solutions for repeated use

They are used for complex problems that
- can be described recursively
- require the same calculations over and over again

Examples: Dijkstra's shortest path, minimum edit distance, longest common subsequence, Viterbi decoding.

Recall: An Example of the Viterbi Algorithm

Decoding the observation sequence o_1 o_2 o_3 = 3 1 3 over the trellis S → {H, C} → {H, C} → {H, C} → /S:

v_1(H) = P(H|S) P(3|H) = 0.32
v_1(C) = P(C|S) P(3|C) = 0.02

v_2(H) = max(.32 · .12, .02 · .06) = .0384   with .12 = P(H|H)P(1|H) and .06 = P(H|C)P(1|H)
v_2(C) = max(.32 · .1, .02 · .25) = .032     with .1 = P(C|H)P(1|C) and .25 = P(C|C)P(1|C)

v_3(H) = max(.0384 · .24, .032 · .12) = .009216   with .24 = P(H|H)P(3|H) and .12 = P(H|C)P(3|H)
v_3(C) = max(.0384 · .02, .032 · .05) = .0016     with .02 = P(C|H)P(3|C) and .05 = P(C|C)P(3|C)

v_f(/S) = max(.009216 · 0.2, .0016 · 0.2) = .0018432   with P(/S|H) = P(/S|C) = 0.2

Following the backpointers from /S back through the trellis yields the best state sequence H H H.

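The same computation, as a minimal Common Lisp sketch (Lisp being the implementation language of this course). The association lists are our own encoding of the model parameters above, not part of any library; viterbi returns the best state sequence and its probability.

  (defparameter *transitions*
    '(((<s> . h) . 0.8) ((<s> . c) . 0.2)
      ((h . h) . 0.6) ((h . c) . 0.2) ((h . </s>) . 0.2)
      ((c . h) . 0.3) ((c . c) . 0.5) ((c . </s>) . 0.2)))

  (defparameter *emissions*
    '(((h . 1) . 0.2) ((h . 2) . 0.4) ((h . 3) . 0.4)
      ((c . 1) . 0.5) ((c . 2) . 0.4) ((c . 3) . 0.1)))

  (defun p-trans (from to)
    (or (cdr (assoc (cons from to) *transitions* :test #'equal)) 0.0))

  (defun p-emit (state observation)
    (or (cdr (assoc (cons state observation) *emissions* :test #'equal)) 0.0))

  (defun viterbi (observations &optional (states '(h c)))
    ;; each cell is a triple (state probability reversed-list-of-predecessors)
    (let ((cells (loop for s in states collect
                       (list s (* (p-trans '<s> s) (p-emit s (first observations))) nil))))
      (dolist (o (rest observations))
        (setf cells
              (loop for s in states collect
                    ;; maximize over all predecessor cells, recording a backpointer
                    (let ((best nil))
                      (loop for (prev p path) in cells
                            for score = (* p (p-trans prev s) (p-emit s o))
                            when (or (null best) (> score (second best)))
                              do (setf best (list s score (cons prev path))))
                      best))))
      ;; final transition into </s>; follow the backpointers
      (let ((best nil))
        (loop for (s p path) in cells
              for score = (* p (p-trans s '</s>))
              when (or (null best) (> score (first best)))
                do (setf best (list score (reverse (cons s path)))))
        (values (second best) (first best)))))

  ;; (viterbi '(3 1 3)) => (H H H) and probability ≈ .0018432, as in the trellis above
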
Recall: Using HMMs

The HMM models the process of generating the labelled sequence. We can use this model for a number of tasks:
- P(S, O), given S and O
- P(O), given O
- the S that maximizes P(S|O), given O
- P(s_x|O), given O

We learn model parameters from a set of observations.

Moving Onwards

Determining
- which string is most likely: How to recognize speech vs. How to wreck a nice beach
- which tag sequence is most likely for flies like flowers: NNS VB NNS vs. VBZ P NNS
- which syntactic structure is most likely for I ate sushi with tuna:

(S (NP I) (VP (VBD ate) (NP (N sushi) (PP with tuna))))     [PP attached inside the object NP]
(S (NP I) (VP (VP (VBD ate) (NP (N sushi))) (PP with tuna)))  [PP attached to the VP]

From Linear Order to Hierarchical Structure

The models we have looked at so far:
- n-gram models (Markov chains): purely linear (sequential) and surface-oriented.
- sequence labeling with HMMs: adds one layer of abstraction, PoS as hidden variables, but still only sequential in nature.

Formal grammar adds hierarchical structure. Since NLP is a sub-discipline of AI, we want our programs to understand natural language (on some level), and finding the grammatical structure of sentences is an important step towards understanding. We therefore shift focus from sequences to grammatical structures.

Why We Need Structure (1/3)

Constituency: words tend to lump together into groups that behave like single units; we call them constituents. Constituency tests give evidence for constituent structure:
- interchangeable in similar syntactic environments
- can be co-ordinated
- can be moved within a sentence as a unit

(4) Kim read [a very interesting book about grammar]_NP. Kim read [it]_NP.
(5) Kim [read a book]_VP, [gave it to Sandy]_VP, and [left]_VP.
(6) [Read the book]_VP I really meant to this week.

Examples from Linguistic Fundamentals for NLP: 100 Essentials from Morphology and Syntax, Bender (2013).

Why We Need Structure (2/3)

Constituents are theory-dependent, and are not absolute or language-independent.

A constituent usually has a head element, and is often named according to the type of its head:
A noun phrase (NP) has a nominal (noun-type) head: (9) [ a very interesting book about grammar ]_NP
A verb phrase (VP) has a verbal head: (10) [ gives books to students ]_VP

Why We Need Structure (3/3)

Grammatical functions: terms such as subject and object describe the grammatical function of a constituent in a sentence.

Agreement establishes a symmetric relationship between grammatical features:
The decision of the Nobel committee members surprises most of us.

Why would a purely linear model have problems predicting this phenomenon? The verb agrees with the head of the subject (decision, singular), not with the linearly closest noun (members, plural): verb agreement reflects the grammatical structure of the sentence, not just the sequential order of words.

Grammars: A Tool to Aid Understanding

Formal grammars describe a language, giving us a way to:
- judge or predict well-formedness: Kim was happy because passed the exam. / Kim was happy because final grade was an A.
- make explicit structural ambiguities: Have her report on my desk by Friday! / I like to eat sushi with { chopsticks | tuna }.
- derive abstract representations of meaning: Kim gave Sandy a book. / Kim gave a book to Sandy. / Sandy was given a book by Kim.

A Grossly Simplified Example: The Grammar of Spanish

S → NP VP
VP → V NP
VP → VP PP
PP → P NP
NP → nieve
NP → Juan
NP → Oslo
V → amó
P → en

The derivation of Juan amó nieve en Oslo:

(S (NP Juan)
   (VP (VP (V amó) (NP nieve))
       (PP (P en) (NP Oslo))))

A Grossly Simplified Example: The Grammar of Spanish

The same grammar, with a semantics attached to each rule:

S → NP VP { VP(NP) }
VP → V NP { V(NP) }
VP → VP PP { PP(VP) }
PP → P NP { P(NP) }
NP → nieve { snow }
NP → Juan { John }
NP → Oslo { Oslo }
V → amó { λb λa adore(a, b) }
P → en { λd λc in(c, d) }

Meaning Composition (Still Very Simplified)

(S: {in(adore(John, snow), Oslo)}
   (NP: {John} Juan)
   (VP: {λa in(adore(a, snow), Oslo)}
       (VP: {λa adore(a, snow)}
           (V: {λb λa adore(a, b)} amó)
           (NP: {snow} nieve))
       (PP: {λc in(c, Oslo)}
           (P: {λd λc in(c, d)} en)
           (NP: {Oslo} Oslo))))

VP → V NP { V(NP) }

Another Interpretation

(S: {adore(John, in(snow, Oslo))}
   (NP: {John} Juan)
   (VP: {λa adore(a, in(snow, Oslo))}
       (V: {λb λa adore(a, b)} amó)
       (NP: {in(snow, Oslo)}
           (NP: {snow} nieve)
           (PP: {λc in(c, Oslo)}
               (P: {λd λc in(c, d)} en)
               (NP: {Oslo} Oslo)))))

NP → NP PP { PP(NP) }

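The two compositions can be replayed with the rule semantics encoded as Common Lisp closures; a small sketch, where the variables *amo* and *en* and the list-based adore/in terms are our own encoding of the λ-expressions above:

  (defparameter *amo* (lambda (b) (lambda (a) (list 'adore a b))))  ; λb λa adore(a, b)
  (defparameter *en* (lambda (d) (lambda (c) (list 'in c d))))      ; λd λc in(c, d)

  ;; VP attachment: compose the PP meaning with the VP meaning,
  ;; then apply the result to the subject.
  (let* ((vp (funcall *amo* 'snow))                         ; λa adore(a, snow)
         (pp (funcall *en* 'oslo))                          ; λc in(c, Oslo)
         (vp-pp (lambda (a) (funcall pp (funcall vp a)))))  ; { PP(VP) }
    (funcall vp-pp 'john))
  ;; => (IN (ADORE JOHN SNOW) OSLO)

  ;; NP attachment: the PP first combines with the object NP.
  (let* ((np (funcall (funcall *en* 'oslo) 'snow))  ; in(snow, Oslo), { PP(NP) }
         (vp (funcall *amo* np)))                   ; λa adore(a, in(snow, Oslo))
    (funcall vp 'john))
  ;; => (ADORE JOHN (IN SNOW OSLO))
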
Context-Free Grammars (CFGs)

- Formal system for modeling constituent structure, defined in terms of a lexicon and a set of rules.
- Formal models of language in a broad sense: natural languages, programming languages, communication protocols, ...
- Can be expressed in the meta-syntax of the Backus-Naur Form (BNF) formalism. When looking up concepts and syntax in the Common Lisp HyperSpec, you have been reading (extended) BNF.
- Powerful enough to express sophisticated relations among words, yet in a computationally tractable way.

CFGs (Formally, this Time)

Formally, a CFG is a quadruple G = ⟨C, Σ, P, S⟩:
- C is the set of categories (aka non-terminals), e.g. {S, NP, VP, V}
- Σ is the vocabulary (aka terminals), e.g. {Kim, snow, adores, in}
- P is a set of category rewrite rules (aka productions), e.g.
  S → NP VP
  VP → V NP
  NP → Kim
  NP → snow
  V → adores
- S ∈ C is the start symbol, a filter on complete results
- for each rule α → β_1 β_2 ... β_n ∈ P: α ∈ C and β_i ∈ C ∪ Σ

Generative Grammar

Top-down view of generative grammars:
- For a grammar G, the language L(G) is defined as the set of strings that can be derived from S.
- To derive w_1 ... w_n from S, we use the rules in P to recursively rewrite S into the sequence w_1 ... w_n, where each w_i ∈ Σ.
- The grammar is seen as generating strings; grammatical strings are defined as strings that can be generated by the grammar.

The context-freeness of CFGs refers to the fact that we rewrite non-terminals without regard to the overall context in which they occur.

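A minimal Common Lisp sketch of this generative view: the toy grammar from the CFG definition above (in our own encoding, with strings as terminals) and a function that derives a random string from any symbol.

  (defparameter *rules*
    '((s (np vp))
      (vp (v np))
      (np ("Kim") ("snow"))
      (v ("adores"))))

  (defun generate (symbol)
    ;; terminals (strings) rewrite to themselves; for a category, pick
    ;; one of its right-hand sides at random and expand it recursively
    (if (stringp symbol)
        (list symbol)
        (let* ((options (cdr (assoc symbol *rules*)))
               (rhs (nth (random (length options)) options)))
          (mapcan #'generate rhs))))

  ;; (generate 's) => ("Kim" "adores" "snow"), ("snow" "adores" "Kim"), ...
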
Treebanks Generally

A treebank is a corpus paired with gold-standard (syntactico-semantic) analyses. It can be created by manual annotation or by selection among outputs from automated processing (plus correction).

Penn Treebank (Marcus et al., 1993)
- About one million tokens of Wall Street Journal text
- Hand-corrected PoS annotation using 45 word classes
- Manual annotation with (somewhat) coarse constituent structure

One Example from the Penn Treebank [WSJ 2350]

Still, Time's move is being received well.

(S (ADVP (RB Still))
   (, ,)
   (NP-SBJ-1 (NP (NNP Time) (POS 's))
             (NN move))
   (VP (VBZ is)
       (VP (VBG being)
           (VP (VBN received)
               (NP (-NONE- *-1))
               (ADVP-MNR (RB well)))))
   (. .))

Elimination of Traces and Functions [WSJ 2350]

Still, Time's move is being received well.

(S (ADVP (RB Still))
   (, ,)
   (NP (NP (NNP Time) (POS 's))
       (NN move))
   (VP (VBZ is)
       (VP (VBG being)
           (VP (VBN received)
               (ADVP (RB well)))))
   (. .))

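This normalization is mechanical. A sketch in Common Lisp, for trees represented as s-expressions with strings at the leaves (function names our own): strip-tag removes function tags and indices such as -SBJ-1 or -MNR, and eliminate-traces drops subtrees headed by -NONE-.

  (defun strip-tag (category)
    ;; NP-SBJ-1 -> NP, ADVP-MNR -> ADVP; leave plain categories alone
    (let* ((name (symbol-name category))
           (dash (position #\- name)))
      (if (and dash (> dash 0))
          (intern (subseq name 0 dash))
          category)))

  (defun eliminate-traces (tree)
    ;; returns the normalized tree, or NIL if nothing but traces remains
    (if (stringp (second tree))
        (unless (eq (first tree) '-none-) tree)   ; drop trace preterminals
        (let ((children (remove nil (mapcar #'eliminate-traces (rest tree)))))
          (when children
            (cons (strip-tag (first tree)) children)))))
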
Probabilistic Context-Free Grammars

We are interested not just in which trees apply to a sentence, but also in which tree is most likely.

Probabilistic context-free grammars (PCFGs) augment CFGs by adding probabilities to each production, e.g.
S → NP VP 0.6
S → NP VP PP 0.4

These are conditional probabilities: the probability of the right-hand side (RHS) given the left-hand side (LHS),
P(S → NP VP) = P(NP VP | S)

We can learn these probabilities from a treebank, again using Maximum Likelihood Estimation.

Estimating PCFGs (1/3)

The tree from the previous slide, with traces and functions eliminated [WSJ 2350], is the input to rule extraction:

(S (ADVP (RB Still)) (, ,) (NP (NP (NNP Time) (POS 's)) (NN move)) (VP (VBZ is) (VP (VBG being) (VP (VBN received) (ADVP (RB well))))) (. .))

Still, Time's move is being received well.

Estimating PCFGs (2/3)

(S (ADVP (RB "Still"))
   (, ",")
   (NP (NP (NNP "Time") (POS "'s"))
       (NN "move"))
   (VP (VBZ "is")
       (VP (VBG "being")
           (VP (VBN "received")
               (ADVP (RB "well")))))
   (\. "."))

Reading off one rule per node, with counts:

RB → Still 1
ADVP → RB 2
, → , 1
NNP → Time 1
POS → 's 1
NP → NNP POS 1
NN → move 1
NP → NP NN 1
VBZ → is 1
VP → VBZ VP 1
VBG → being 1
VP → VBG VP 1
VBN → received 1
RB → well 1
VP → VBN ADVP 1
\. → . 1
S → ADVP , NP VP \. 1
START → S 1

Estimating PCFGs (3/3)

Once we have counts of all the rules, we turn them into probabilities:

S → ADVP , NP VP \. 50
S → NP VP \. 400
S → NP VP PP \. 350
S → VP ! 100
S → NP VP S \. 200
S → NP VP 50

P(S → ADVP , NP VP \.) = C(S → ADVP , NP VP \.) / C(S) = 50 / 1150 ≈ 0.043

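A sketch of the whole estimation step in Common Lisp, for trees in the s-expression format above (the rule representation, a list headed by the LHS category, is our own):

  (defun tree-rules (tree)
    ;; one rule per node: lexical for preterminals, categorial otherwise
    (if (stringp (second tree))
        (list tree)                ; e.g. (RB "Still")
        (cons (cons (first tree) (mapcar #'first (rest tree)))
              (mapcan #'tree-rules (rest tree)))))

  (defun estimate-pcfg (trees)
    ;; maximum likelihood: count each rule, divide by the count of its LHS
    (let ((rules (make-hash-table :test #'equal))
          (categories (make-hash-table)))
      (dolist (tree trees)
        (dolist (rule (tree-rules tree))
          (incf (gethash rule rules 0))
          (incf (gethash (first rule) categories 0))))
      (loop for rule being the hash-keys of rules using (hash-value n)
            collect (list rule n (/ n (gethash (first rule) categories))))))

For the one-tree treebank above, the rule (S ADVP |,| NP VP |\.|) gets count 1 and probability 1, since it is the only S rule observed; with the six S rules of the previous slide, the S → ADVP , NP VP \. entry would come out as 50/1150.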

More information

Phrase Structure Rules, Tree Rewriting, and other sources of Recursion Structure within the NP

Phrase Structure Rules, Tree Rewriting, and other sources of Recursion Structure within the NP Introduction to Transformational Grammar, LINGUIST 601 September 14, 2006 Phrase Structure Rules, Tree Rewriting, and other sources of Recursion Structure within the 1 Trees (1) a tree for the brown fox

More information

Visualizing Data Structures in Parsing-based Machine Translation. Jonathan Weese, Chris Callison-Burch

Visualizing Data Structures in Parsing-based Machine Translation. Jonathan Weese, Chris Callison-Burch The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 127 136 Visualizing Data Structures in Parsing-based Machine Translation Jonathan Weese, Chris Callison-Burch Abstract As machine

More information

Introduction to formal semantics -

Introduction to formal semantics - Introduction to formal semantics - Introduction to formal semantics 1 / 25 structure Motivation - Philosophy paradox antinomy division in object und Meta language Semiotics syntax semantics Pragmatics

More information

Teaching Formal Methods for Computational Linguistics at Uppsala University

Teaching Formal Methods for Computational Linguistics at Uppsala University Teaching Formal Methods for Computational Linguistics at Uppsala University Roussanka Loukanova Computational Linguistics Dept. of Linguistics and Philology, Uppsala University P.O. Box 635, 751 26 Uppsala,

More information

Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology

Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures 123 11 Opinion Mining In Chap. 9, we studied structured data extraction from Web pages. Such data are usually records

More information

Paraphrasing controlled English texts

Paraphrasing controlled English texts Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich kaljurand@gmail.com Abstract. We discuss paraphrasing controlled English texts, by defining

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Pharmacovigilance, also referred to as drug safety surveillance, has been

Pharmacovigilance, also referred to as drug safety surveillance, has been SOCIAL MEDIA Identifying Adverse Drug Events from Patient Social Media A Case Study for Diabetes Xiao Liu, University of Arizona Hsinchun Chen, University of Arizona and Tsinghua University Social media

More information

Overview of the TACITUS Project

Overview of the TACITUS Project Overview of the TACITUS Project Jerry R. Hobbs Artificial Intelligence Center SRI International 1 Aims of the Project The specific aim of the TACITUS project is to develop interpretation processes for

More information

Constraints in Phrase Structure Grammar

Constraints in Phrase Structure Grammar Constraints in Phrase Structure Grammar Phrase Structure Grammar no movement, no transformations, context-free rules X/Y = X is a category which dominates a missing category Y Let G be the set of basic

More information

Language Modeling. Chapter 1. 1.1 Introduction

Language Modeling. Chapter 1. 1.1 Introduction Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

More information