Modeling Graph Languages with Grammars Extracted via Tree Decompositions
Bevan Keeley Jones, Sharon Goldwater
School of Informatics, University of Edinburgh, Edinburgh, UK

Mark Johnson
Department of Computing, Macquarie University, Sydney, Australia

Abstract

While work on probabilistic models of natural language tends to focus on strings and trees, there is increasing interest in more general graph-shaped structures, since they seem better suited for representing natural language semantics, ontologies, and other varieties of knowledge structures. However, while there are relatively simple approaches to defining generative models over strings and trees, it has proven more challenging for more general graphs. This paper describes a natural generalization of the n-gram to graphs, making use of Hyperedge Replacement Grammars to define generative models of graph languages.

1 Introduction

While most work in natural language processing (NLP), and especially within statistical NLP, has historically focused on strings and trees, there is increasing interest in deeper graph-based analyses that could facilitate natural language understanding and generation applications. Graphs have a long tradition within knowledge representation (Sowa, 1976), natural language semantics (Titov et al., 2009; Martin and White, 2011; Le and Zuidema, 2012), and models of deep syntax (Oepen et al., 2004; de Marneffe and Manning, 2008). Graphs seem particularly appropriate for representing semantic structures, since a single concept can play multiple roles within a sentence. For instance, in the semantic representation at the bottom right of Figure 1, one vertex is an argument of both rich and own in the sentence "The ... is said to be rich in ... is privately owned." However, work on graphs has been hampered due, in part, to the absence of a generally agreed-upon formalism for processing and modeling such data structures. Where string and tree modeling benefits from the wildly popular Probabilistic Context Free Grammar (PCFG) and related formalisms such as Tree Substitution Grammar, Regular Tree Grammar, Hidden Markov Models, and n-grams, there is nothing of similar popularity for graphs. Graphs call for a slightly different formalism, and Hyperedge Replacement Grammar (HRG) (Drewes et al., 1997), a variety of context-free grammar for graphs, suggests itself as a reasonable choice given its close analogy with CFG. Of course, in order to make use of the formalism we need actual grammars, and this paper fills that gap by introducing a procedure for automatically extracting grammars from a corpus of graphs.

Grammars are appealing for the intuitive and systematic way they capture the compositionality of language. For instance, just as a PCFG can analyze a phrase as a syntactic subject, so can a graph grammar represent the corresponding subgraph as a constituent in a parse of the semantic graph. In fact, picking a formalism so similar to the PCFG makes it easy to adapt proven, familiar techniques for training and inference such as the inside-outside algorithm, and because HRG is context-free, parses can be represented by trees, facilitating the use of many more tools from tree automata (Knight and Graehl, 2005). Furthermore, the operational parallelism with PCFG makes it easy to integrate graph-based systems with syntactic models in synchronous grammars (Jones et al., 2012). Probabilistic versions of deep syntactic models such as Lexical Functional Grammar and HPSG (Johnson et al., 1999; Riezler et al., 2000) are one grammar-based approach to
modeling graphs represented in the form of feature structures. However, these models are tied to a particular linguistic paradigm, and they are complex, requiring a great deal of effort to engineer and annotate the necessary grammars and corpora. It is also not obvious how to define generative probabilistic models with such grammars, limiting their utility in certain applications. In contrast, this paper describes a method of automatically extracting graph grammars from a corpus of graphs, allowing us to easily estimate rule probabilities and define generative models. The class of grammars we extract generalizes the types of regular string and tree grammars one might use to define a bigram or similar Markov model for trees. In fact, the procedure produces regular string and tree grammars as special cases when the input graphs themselves are strings or trees.

There is always overhead in learning a new formalism, so we endeavor to provide the necessary background as simply as possible, according to the following structure. Section 2 introduces Hyperedge Replacement Grammars, which generate graphs, and their probabilistic extension, weighted HRGs. Section 3 explains how each HRG derivation of a graph induces a tree decomposition of that graph. Given a tree decomposition of a graph, we use that mapping in reverse to induce an HRG that generates that graph (Section 4). Section 4 also introduces four different strategies for finding tree decompositions of (and hence inducing HRGs from) a set of graphs. Section 5 applies these strategies to the LOGON corpus (Oepen et al., 2004) and evaluates the induced weighted HRGs in terms of held-out perplexity. Section 6 concludes the paper and discusses possible applications and extensions.

2 Graphs and Hyperedge Replacement Grammars

Hyperedge Replacement Grammar (HRG) is a generalization of CFG to graph languages (see Drewes et al. (1997) for an overview). Where a CFG builds up strings by replacing symbols with new substrings, an HRG builds graphs by replacing edges with subgraphs. As a context-free formalism, HRG derivations can be described by trees, similar to CFG parses. Thus, in the case of probabilistic HRG, it is possible to assign rule weights to define easily factorizable probability distributions over graphs, just as PCFGs do for strings.

We start by defining a hypergraph, a generalization of a graph where edges may link any finite number of vertices. Formally, a hypergraph is a tuple (V, E, α, l, x). V and E are finite sets of vertices and hyperedges, respectively. The attachment function α : E → V* maps each hyperedge e ∈ E to a sequence of pairwise distinct vertices from V, where we call the length of α(e) the arity of e. The labeling function l : E → Σ maps each hyperedge to a symbol in some ranked alphabet Σ, where the rank of l(e) is e's arity. Vertices are unlabeled, but vertex labels can be simulated by treating unary hyperedges (i.e., hyperedges attached to a single vertex) as vertex labels. Finally, each graph has a set of zero or more external vertices, arranged in a sequence x ∈ V* (pairwise distinct), which plays an important role in the rewriting mechanism of HRG. Just as hyperedges have an arity, so too do hypergraphs, defined as the length of x. We are primarily interested in languages of simple directed graphs: hypergraphs where each edge is either binary or, for vertex labels, unary. In this case, we can indicate the ordering on a binary edge with vertex sequence v1 v2 visually by an arrow pointing from vertex v1 to vertex v2.
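To make the definitions concrete, here is a minimal sketch of this hypergraph representation in Python. It is our own illustration rather than code from the paper; the class, field, and edge-label names are invented for exposition.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class Hyperedge:
    label: str                  # symbol from the ranked alphabet Sigma
    vertices: Tuple[int, ...]   # attachment alpha(e): pairwise distinct vertices

    @property
    def arity(self) -> int:     # the rank of l(e) equals this arity
        return len(self.vertices)

@dataclass
class Hypergraph:
    vertices: Set[int] = field(default_factory=set)
    edges: List[Hyperedge] = field(default_factory=list)
    external: Tuple[int, ...] = ()  # the sequence x of external vertices

    @property
    def arity(self) -> int:     # arity of the hypergraph = length of x
        return len(self.external)

# A simple directed graph uses unary edges to simulate vertex labels and
# binary edges for relations; the edge (1, 2) is drawn as an arrow 1 -> 2.
g = Hypergraph(vertices={1, 2})
g.edges.append(Hyperedge("own", (1,)))     # unary edge: predicate label on vertex 1
g.edges.append(Hyperedge("ARG1", (1, 2)))  # binary edge from vertex 1 to vertex 2
```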
We may also make use of hyperedges of arbitrary arity for intermediate rewriting steps during derivations. The semantic dependency graph at the bottom right of Figure 1, taken from the Redwoods corpus (Oepen et al., 2004), is an example of a simple graph. It has both unary edges for expressing predicates like private and own and binary edges for specifying their relations. In principle, any vertex can have more than one unary edge, a fact we make use of in HRG rule definitions, such as in one of the rules in Figure 1 whose right-hand side contains a vertex with two unary edges.

A weighted HRG is an edge rewriting system for generating hypergraphs, defined as a tuple (Σ, N, S, R). Σ is a ranked alphabet of edge labels, N ⊆ Σ is a set of nonterminal symbols, S ∈ N is a special start symbol, and R is a finite set of weighted rules. Each rule in R is of the form A → h with weight w, where h is a hypergraph with edge labels from Σ, A ∈ N has rank equal to the arity of h, and the weight w is a real number. As with PCFGs, a weighted HRG is probabilistic if the weights of all rules with the same ranked symbol A on the left-hand side sum to one. In the case of probabilistic HRG, the probability of a derivation is the product of the weights of the rules in the derivation, just as for PCFG.
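Continuing the sketch above (again our own illustration, not the authors' code), a weighted rule and the two probabilistic conditions just described can be written as:

```python
from collections import defaultdict
from dataclasses import dataclass
from math import prod

@dataclass
class HRGRule:
    lhs: str           # nonterminal A in N; its rank equals rhs.arity
    rhs: Hypergraph    # right-hand side hypergraph h
    weight: float      # w

def is_probabilistic(rules) -> bool:
    # Weights of all rules sharing the same ranked left-hand side sum to one.
    totals = defaultdict(float)
    for r in rules:
        totals[(r.lhs, r.rhs.arity)] += r.weight
    return all(abs(t - 1.0) < 1e-9 for t in totals.values())

def derivation_probability(rules_used) -> float:
    # As for a PCFG: the product of the weights of the rules in the derivation.
    return prod(r.weight for r in rules_used)
```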
Figure 1 shows an example of an HRG and a sample derivation. The external vertices of the right-hand side graphs are shaded, and their sequence should be read top to bottom. Vertices are identified by numbers; these identifiers are included only to make it easier to refer to them in our discussion. Strictly speaking, vertices are unlabeled, and the numbers are irrelevant to the operation of the grammar. Nonterminal edges are dashed to make them easier to identify.

Hyperedge replacement, the basic rewriting mechanism of HRG, is an operation where a hypergraph is substituted for an edge. If g is a hypergraph containing edge e, and h is another hypergraph with the same arity as e, edge e can be replaced with h by first removing e from g and then fusing h and g together at the external vertices of h and the vertices of α(e). So, if α(e) = v1 v2 ... vk and h has external vertices u1 u2 ... uk, we fuse each ui to the corresponding vi. Much as each step of a CFG derivation replaces a symbol by a substring, each step of an HRG derivation replaces an edge labeled with a certain nonterminal symbol by the right-hand side graph of some rule with the same symbol on its left-hand side. For instance, in the fourth step of the derivation in Figure 1, an edge labeled N is replaced by the right-hand side of a rule with N on its left-hand side: the N edge is removed and the new subgraph attached, with the rule's external vertices fused to the vertices the nonterminal edge was attached to. The edge to be replaced in each step is highlighted in red to ease reading.

3 Tree Decompositions

We now introduce one additional piece of theoretical machinery, the tree decomposition (Bodlaender, 1993). Tree decompositions play an important role in graph theory, feature prominently in the junction tree algorithm from machine learning (Pearl, 1988), and have proven valuable for efficient parsing (Gildea, 2011; Chiang et al., 2013). Importantly, Lautemann (1988) proved that every HRG parse identifies a particular tree decomposition, and by restricting ourselves to a certain type of tree we will draw an even tighter relationship, allowing us to identify parses given tree decompositions.

A tree decomposition of a graph g is a tree whose nodes identify subsets of the vertices of g satisfying the following three properties:

Vertex Cover: Every vertex of g is contained in at least one tree node.

Edge Cover: For every edge e of the graph, there is a tree node η such that every vertex of α(e) is in η.

Running Intersection: Given any two tree nodes η1 and η2, both containing vertex v, all tree nodes on the unique path from η1 to η2 also contain v.

Figure 2 presents four different tree decompositions of the graph shown at the bottom right of Figure 1. Consider (d). Vertex cover is satisfied by the fact that every vertex of the graph appears in at least one tree node; some vertices are covered by more than one node. Similarly, every edge is covered by at least one of the nodes, each of which contains all of that edge's vertices.
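These three properties can be checked mechanically. The sketch below is our illustration, representing a decomposition as a dict from tree node IDs to vertex sets plus a list of tree edges; for running intersection it uses the equivalent condition that the nodes containing any given vertex form a connected subtree.

```python
from collections import defaultdict, deque

def is_tree_decomposition(graph_vertices, edge_attachments, bags, tree_edges):
    """bags: {tree node id: set of graph vertices}; tree_edges: [(node, node)];
    edge_attachments: the tuples alpha(e) for every edge of the graph."""
    # Vertex cover: every graph vertex appears in some tree node.
    if any(all(v not in bag for bag in bags.values()) for v in graph_vertices):
        return False
    # Edge cover: some tree node contains every vertex of each edge.
    if any(all(not set(e) <= bag for bag in bags.values())
           for e in edge_attachments):
        return False
    # Running intersection: for each vertex v, the tree nodes containing v
    # must form a connected subtree (equivalently, v appears in every node
    # on the path between any two nodes that contain it).
    adjacent = defaultdict(list)
    for a, b in tree_edges:
        adjacent[a].append(b)
        adjacent[b].append(a)
    for v in graph_vertices:
        holders = {n for n, bag in bags.items() if v in bag}
        start = next(iter(holders))
        reached, frontier = {start}, deque([start])
        while frontier:  # BFS restricted to nodes whose bags contain v
            n = frontier.popleft()
            for m in adjacent[n]:
                if m in holders and m not in reached:
                    reached.add(m)
                    frontier.append(m)
        if reached != holders:
            return False
    return True
```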
We focus on a particular class called edge-mapped tree decompositions, defined by pairs (t, µ) where t is a tree decomposition of some graph g and µ is a bijection from the nodes of t to the edges of g such that each node also covers the edge it maps to. Every graph has at least one edge-mapped tree decomposition; Figure 2(a)-(c) illustrates three such edge-mapped decompositions for a particular graph, where the mapping is shown by the extra labels next to the tree nodes. The edge mapping simplifies the rule extraction procedure described in Section 4, since traversing the tree and following this mapping µ guarantees that every graph edge is visited exactly once. (To avoid confusion, we adopt a terminology where "node" is always used in respect to tree decompositions, and "vertex" and "edge" in respect to graphs.)
Figure 1: An HRG and a derivation of the semantic dependency graph for "The ... is said to be rich in ... is privately owned." External vertices are shaded and ordered top to bottom, nonterminal edges are dashed, and the edge being replaced at each step is highlighted in red.
Figure 2: Edge-mapped (a-c) and non-edge-mapped (d) tree decompositions for the graph at the bottom right of Figure 1.

Running intersection will also prove important for rule extraction, since it tracks the tree violations of the graph by passing down the endpoints of edges that link edges in different branches of the decomposition. This same information must be passed down the respective paths of the HRG derivation tree via the external vertices of rule right-hand sides. Figure 2 uses bold face and vertex order to highlight the vertices that must be added to each node beyond those needed to cover its corresponding edge. In the decomposition shown in (b), for example, a vertex must be passed from one node down to a node in its subtree because the edges the two nodes map to share that vertex. Any HRG derivation will need to pass down vertices in a similar manner to specify which edges get attached to which vertices.

As suggested by the four trees of Figure 2, there are always many possible decompositions for any given graph. In the next section we describe three methods of producing tree decompositions, each leading to distinct grammars with different language modeling properties.

Algorithm 1: Extract HRG rule A → h from tree decomposition node η.

function Extract(η)
    A ← label(parent(η), |parent(η) ∩ η|)
    h.x ← order(η, parent(η) ∩ η)
    add terminal edge µ(η) to h
    for all ηi ∈ children(η) do
        add nonterminal edge ui to h
        α(ui) ← order(ηi, η ∩ ηi)
        l(ui) ← label(η, η ∩ ηi)
    return [A → h]

4 HRG Extraction

Rule extraction proceeds by first selecting a particular tree decomposition for a graph and then walking this tree to extract grammar rules, in much the same way as one extracts n-grams or Regular Tree Grammars (RTGs) from a corpus of strings or trees. The procedure (Algorithm 1) extracts a single rule for each node of the decomposition to generate the associated terminal edge plus a set of nonterminals which can be subsequently expanded to
generate the subgraphs corresponding to each subtree of the decomposition node. In particular, given the tree decomposition in Figure 2(c), the procedure produces the grammar in Figure 1. Rule extraction works for any connected simple graph and can be easily adapted for arbitrary hypergraphs.

Start by assigning the left-hand side nonterminal symbol according to label(parent(η), r), which returns a symbol determined by η's parent with rank r, the number of vertices in common between η and its parent. The external vertices of h are assigned by sorting the vertices that η shares with its parent. Any ordering policy will work so long as it produces the same ordering with respect to a given decomposition node. (We experimented with various orderings, from preorder traversals of the tree decomposition to simply sorting by vertex identity, all with similar results.) What is important is that the order of the external vertices of a rule match that of the vertices of the nonterminal edge it expands. The algorithm then constructs the rest of h by including terminal edge µ(η) and adding a nonterminal edge for each child ηi of η, with vertices assigned according to an ordering of the vertices that η shares with ηi, again labeled according to label.

The function label just returns a nonterminal symbol of a given rank, chosen to match the number of external vertices of the right-hand side. There are many possible choices of label; it can even be a function that always returns the same symbol for a given rank. For purposes of language modeling, however, it is useful to condition rule probabilities on the label of the edge associated with the parent node in the decomposition (analogous to conditioning on the preceding word in a bigram setting). It is also useful to distinguish the direction of that preceding edge. For instance, we would expect a different probability depending on whether an edge is being generated as the argument of a predicate vs. as a descendant of its own argument. Thus, each nonterminal encodes (1) the label of the preceding edge and (2) its direction with respect to the current edge, as defined according to the head-to-tail relation, where edge ej is head-to-tail with preceding edge ei iff the last vertex of α(ei) is the first vertex of α(ej).

The grammar in Figure 1 is extracted according to the tree decomposition in Figure 2(c). Consider how a rule is constructed while visiting a node η that maps to a unary edge. The left-hand side symbol comes from the label of the edge associated with η's parent node and has rank equal to the number of vertices η shares with that parent. The rule right-hand side is constructed so that it contains the unary edge plus one nonterminal edge per child, each built by ordering the vertices in the intersection of η with that child. Finally, the external vertex sequence comes from ordering the vertices η shares with its parent.

The particular edge-mapped tree decomposition plays a key role in the form of the extracted rules. In particular, each branching of the tree specifies the number of nonterminals in the corresponding rule. For example, decompositions such as Figure 2(a) result in linear grammars, where every rule right-hand side contains at most one nonterminal.
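A direct transliteration of Algorithm 1 into Python follows, under representation assumptions of our own: each decomposition node carries its vertex set (bag), a parent pointer, a list of children, and its mapped terminal edge µ(η); the order and label policy functions described above are passed in as parameters (here label receives the shared vertex set rather than just its size, and must yield S when the parent is None). This is an illustrative sketch, not the authors' code.

```python
def extract(node, order, label):
    """Extract one HRG rule A -> h from edge-mapped decomposition node `node`.
    node.bag: vertex set; node.parent / node.children: tree structure;
    node.edge: the terminal edge mu(node) that this node generates."""
    shared = node.bag & node.parent.bag if node.parent else set()
    lhs = label(node.parent, shared)      # A; at the root this yields S
    h = Hypergraph(vertices=set(node.bag))
    h.external = order(node, shared)      # external vertex sequence h.x
    h.edges.append(node.edge)             # the single terminal edge
    for child in node.children:           # one nonterminal per subtree
        overlap = node.bag & child.bag
        h.edges.append(Hyperedge(label(node, overlap), order(child, overlap)))
    return HRGRule(lhs, h, weight=1.0)    # weight is estimated later

def extract_grammar(root, order, label):
    # Walk the whole decomposition, producing one rule per node.
    rules, stack = [], [root]
    while stack:
        node = stack.pop()
        rules.append(extract(node, order, label))
        stack.extend(node.children)
    return rules
```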
We experiment with three different strategies for producing edge-mapped tree decompositions. In each case, we start by building a node-to-edge map by introducing a new tree node to cover each edge of the graph, simultaneously ensuring the vertex and edge cover properties. The strategies differ in how the nodes are arranged into a tree. One simple approach (linear) is to construct a linearized sequence of edges by performing a depth-first search of the graph and adding edges as we visit their incident vertices. This produces nonbranching trees such as Figure 2(a). Alternatively, we can arrange the decomposition according to the actual depth-first search tree (dfs), producing decompositions like (b). Finally, we construct what we call a topological sort tree (top), where we add children to each node so as to maximize the number of head-to-tail transitions, producing trees such as (c). The first two strategies are sketched below.
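The following sketch (ours, with invented helper names) shows how a single depth-first search can yield both the linear and the dfs arrangements of the edge-mapped nodes; completing each node's vertex set to satisfy running intersection is a separate pass, and top requires the spanning-tree construction described next. The dfs parenting rule here, attaching each edge's node to the node of the edge by which its vertex was first reached, is our reading of the strategy, not a specification from the paper.

```python
from collections import defaultdict

def linear_and_dfs_trees(edge_attachments, start_vertex):
    """edge_attachments: list of alpha(e) tuples for a connected simple graph.
    Returns two parent maps over edge indices: `linear` chains each node to
    the previously added one (Figure 2(a)); `dfs` follows the actual
    depth-first search tree (Figure 2(b)). None marks the root node."""
    incident = defaultdict(list)
    for i, e in enumerate(edge_attachments):
        for v in e:
            incident[v].append(i)

    linear, dfs = {}, {}
    seen_vertices, seen_edges = set(), set()
    previous = None  # last node added, for the nonbranching linear tree

    def visit(v, arrived_by):
        nonlocal previous
        seen_vertices.add(v)
        for i in incident[v]:
            if i in seen_edges:
                continue
            seen_edges.add(i)
            linear[i] = previous   # parent in the linearized sequence
            dfs[i] = arrived_by    # parent in the depth-first search tree
            previous = i
            for u in edge_attachments[i]:
                if u not in seen_vertices:
                    visit(u, i)

    visit(start_vertex, None)
    return linear, dfs
```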
For rooted DAGs, constructing the topological sort tree is easy: just build a directed breadth-first search tree of the graph starting from the root vertex. It is more involved for other graphs but still straightforward, accomplished by finding a minimum spanning tree of a newly constructed weighted directed graph that represents head-to-tail transitions as low-cost arcs and all other contiguous transitions as higher-cost arcs. Once the edge-mapped nodes are arranged in a tree, all that is left is to add vertices to each node to satisfy running intersection.

One attractive feature of top is that, for certain types of input graphs, it produces grammars of well-known classes. In particular, if the graph is a string (a directed path), the grammar will be a right-linear CFG, i.e., a regular string grammar (a bigram grammar, in fact), and if it is a rooted tree, the unique topological sort tree leads to a grammar that closely resembles an RTG (where trees are edge-labeled and siblings are unordered). The other decomposition strategies do not constrain the tree as much, and their grammars are not necessarily regular. Another nice feature of top is that subtrees of a parse tend to correspond to intuitive modules of the graph. For instance, the grammar first generates a predicate and then proceeds to generate the subgraphs corresponding to its arguments, much as one would expect a syntactic dependency grammar to generate a head followed by its dependents. The linear grammar derived from Figure 2(a), on the other hand, can generate an edge as a remote descendant of its immediate neighbor.

We also explore an augmentation of top called the rooted topological sort tree (r-top). Any graph can be converted to a rooted graph by simply adding an extra vertex and making it the parent of every vertex of in-degree zero (or, if there are none, picking a member of each connected component at random). We exploit this fact to produce a version of top that generates all graphs as though they were rooted, by starting off each derivation with a rule that generates every vertex with in-degree zero. We expect rooted graphs to produce simpler grammars in general because they reduce the number of edges that must be generated in non-topological order, requiring fewer rules that differ primarily in whether they generate an edge in head-to-tail order or not. In particular, if a graph is acyclic, all edges will be generated in head-to-tail relation and the corresponding grammar will contain fewer nonterminals.

5 Evaluation

We experiment using elementary semantic dependency graphs taken from the LOGON portion of the Redwoods corpus (Oepen et al., 2004). From Table 1, we can see that, while there are a few tree-shaped graphs, the majority are more general DAGs. Nevertheless, edge density is low. We set aside a regular sample of the graphs as the test set and estimate the models from the remainder, replacing rare terminals in the training set with the special symbol UNK. Model parameters are calculated from the frequency of extracted rules using a mean-field variational Bayesian approximation of a symmetric Dirichlet prior with parameter β (Bishop, 2006). This amounts to counting the number of times each rule r with left-hand side symbol A is extracted and then computing its weight θ_r according to

θ_r = exp( Ψ(n_r + β) − Ψ( Σ_{r′ : r′ = A→h′} (n_{r′} + β) ) ),

where n_r is the frequency of r and Ψ is the standard digamma function.
This approximation of a Dirichlet prior offers a simple yet principled way of simultaneously smoothing rule weights and incorporating a soft assumption of sparsity (i.e., only a few rules should receive very high probability). Specifically, we somewhat arbitrarily selected a small value for β, which should result in a moderately sparse distribution.

We evaluate each model by computing perplexity, exp(−(1/N) Σ_{i=1}^{N} ln p(g_i)), where N is the number of graphs in the test set, g_i is the i-th graph, and p(g_i) is its probability according to the model, computed as the product of the weights of the rules in the extracted derivation. Better models should assign higher probability to g_i, thereby achieving lower perplexity.
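Both formulas are straightforward to implement. In this sketch (our illustration), rules are keyed by (left-hand side, right-hand side) pairs, the β default is an arbitrary small value of our choosing, and Ψ is scipy's digamma.

```python
from collections import defaultdict
from math import exp
from scipy.special import digamma

def estimate_weights(rule_counts, beta=0.2):
    """Mean-field VB under a symmetric Dirichlet(beta) prior:
    theta_r = exp(digamma(n_r + beta) - digamma(sum over rules r' with the
    same LHS of (n_r' + beta))).  rule_counts: {(lhs, rhs): n_r}."""
    denominator = defaultdict(float)
    for (lhs, _), n in rule_counts.items():
        denominator[lhs] += n + beta
    return {rule: exp(digamma(n + beta) - digamma(denominator[rule[0]]))
            for rule, n in rule_counts.items()}

def perplexity(test_graphs, log_prob):
    """exp(-(1/N) * sum of ln p(g_i)), where log_prob(g) returns the sum of
    the log weights of the rules in g's extracted derivation."""
    return exp(-sum(log_prob(g) for g in test_graphs) / len(test_graphs))
```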
Table 1: LOGON corpus. (a) Counts of graph types: strings, rooted trees, rooted DAGs, and general DAGs (r stands for rooted). (b) Unary and binary edge types and tokens.

Table 2: Model perplexity and grammar size for the four tree decomposition strategies: linear, dfs, top, and r-top.

Table 2 lists the perplexities of the language models defined according to our four different tree decomposition strategies. Linear is relatively poor, since it makes little distinction between local and more distant relations between edges. For instance, the tree in Figure 2(a) results in a grammar where an edge may be generated as the child of a distantly related edge rather than of its immediate neighbor. Dfs is better, but suffers from similar problems. Both top and r-top perform markedly better, r-top less so because the initial rule required for generating all vertices of in-degree zero is often very improbable. Many distinct such rules are required for describing the training data, and many of them appear only once. We believe there are ways of factorizing these rules to mitigate this sparsity effect, but this is left to future work.

Grammar sizes are also somewhat telling. The linear grammar is quite large, due to the extra rules required for handling the long-distance relations. The other grammars are of a similar, much smaller size; dfs is smallest since it tends to produce trees of much smaller branching factor, allowing for greater rule reuse. As predicted, the r-top grammar is somewhat smaller than the vanilla top grammar, but, as previously noted, the potential reduction in sparsity is counteracted by the introduction of the extra initial rules.

6 Conclusion & Discussion

Graph grammars are an appealing formalism for modeling the kinds of structures required for representing natural language semantics, but there has been little work in actually defining grammars for doing so. We have introduced a simple framework for automatically extracting HRGs, based upon first defining a tree decomposition and then walking this tree to extract rules in a manner very similar to how one extracts RTG rules from a corpus of trees. By varying the kind of tree decomposition used, the procedure produces different types of grammars. While restricting consideration to a broad class of tree decompositions where visiting tree nodes corresponds to visiting edges of the graph, we explored four special cases, demonstrating that one case, where parent-to-child node relations in the tree maximize head-to-tail transitions between graph edges, performs best in terms of perplexity on a corpus of semantic graphs. This topological ordering heuristic seems reasonable for the corpus we experimented on, since such parent-child transitions are equivalent to predicate-argument transitions in the semantic representations.

Interesting questions remain as to which particular combinations of graph and decomposition types lead to useful classes of graph grammars. In our case we found that our topological sort tree decomposition leads to regular grammars when the graphs describe strings or particular kinds of trees, making them useful for defining simple Markov models and also making it possible to perform other operations like language intersection (Gecseg and Steinby, 1984). We have presented only an initial study, and there are potentially many more interesting combinations.

Acknowledgments

We thank Mark-Jan Nederhof for his comments on an early draft and the anonymous reviewers for their helpful feedback.
This research was supported in part by a prize studentship from the Scottish Informatics and Computer Science Alliance, the Australian Research Council's Discovery Projects funding scheme (project numbers DP and DP9), and the US Defense Advanced Research Projects Agency under contract FA8---7.
References

Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer.

Hans L. Bodlaender. 1993. A tourist guide through treewidth. Acta Cybernetica, 11(1-2).

David Chiang, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, Bevan Jones, and Kevin Knight. 2013. Parsing graphs with hyperedge replacement grammars. In Proceedings of the 51st Meeting of the ACL.

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In Proceedings of the COLING Workshop on Cross-framework and Cross-domain Parser Evaluation.

Frank Drewes, Annegret Habel, and Hans-Jörg Kreowski. 1997. Hyperedge replacement graph grammars. In Handbook of Graph Grammars and Computing by Graph Transformation.

Ferenc Gecseg and Magnus Steinby. 1984. Tree Automata. Akademiai Kiado, Budapest.

Daniel Gildea. 2011. Grammar factorization by tree decomposition. Computational Linguistics, 37(1).

Mark Johnson, Stuart Geman, Stephen Canon, Zhiyi Chi, and Stefan Riezler. 1999. Estimators for stochastic unification-based grammars. In Proceedings of the 37th Meeting of the ACL.

Bevan Jones, Jacob Andreas, Daniel Bauer, Karl-Moritz Hermann, and Kevin Knight. 2012. Semantics-based machine translation with hyperedge replacement grammars. In Proceedings of COLING.

Kevin Knight and Jonathan Graehl. 2005. An overview of probabilistic tree transducers for natural language processing. In Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics (CICLing).

Clemens Lautemann. 1988. Decomposition trees: Structured graph representation and efficient algorithms. In M. Dauchet and M. Nivat, editors, CAAP '88, volume 299 of Lecture Notes in Computer Science. Springer, Berlin Heidelberg.

Phong Le and Willem Zuidema. 2012. Learning compositional semantics for open domain semantic parsing. In Proceedings of COLING.

Scott Martin and Michael White. 2011. Creating disjunctive logical forms from aligned sentences for grammar-based paraphrase generation. In Proceedings of the Workshop on Monolingual Text-To-Text Generation (MTTG).

Stephan Oepen, Dan Flickinger, Kristina Toutanova, and Christopher D. Manning. 2004. LinGO Redwoods. Research on Language and Computation, 2(4).

Judea Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA.

Stefan Riezler, Detlef Prescher, Jonas Kuhn, and Mark Johnson. 2000. Lexicalized stochastic modeling of constraint-based grammars using log-linear measures and EM. In Proceedings of the 38th Meeting of the ACL.

John F. Sowa. 1976. Conceptual graphs for a data base interface. IBM Journal of Research and Development, 20(4).

Ivan Titov, James Henderson, Paola Merlo, and Gabriele Musillo. 2009. Online graph planarisation for synchronous parsing of semantic and syntactic dependencies. In Proceedings of IJCAI.