Dynamic Finite-State Transducer Composition with Look-Ahead for Very-Large-Scale Speech Recognition


Cyril Allauzen, Ciprian Chelba, Boulos Harb, Michael Riley, Johan Schalkwyk
Aug 19, 2010

Slide 2: Weighted Finite-State Transducers in Speech Recognition - I

WFSTs are a general and efficient representation for many speech and NLP problems; see Mohri, et al., "Speech recognition with weighted finite-state transducers", in Handbook of Speech Processing, Springer.

In ASR, they have been used to:

Represent models:
- G: n-gram language model (automaton over words)
- L: pronunciation lexicon (transducer from context-independent (CI) phones to words)
- C: context dependency (transducer from context-dependent (CD) phones to CI phones)

Combine and optimize models:
- Composition: computes the relational composition of two transducers.
- Epsilon removal: finds an equivalent WFST with no ε-transitions.
- Determinization: finds an equivalent WFST that has no identically labeled transitions leaving a state.
- Minimization: finds an equivalent deterministic WFST with the fewest states and arcs.

Slide 3: Weighted Finite-State Transducers in Speech Recognition - II

Advantages:
- Uniform data representation
- General, efficient, mathematically well-defined and reusable combination and optimization operations
- Variant systems realized in data, not code

OpenFst, an open-source finite-state transducer library, was used for this work. It is released under the Apache license and is used in many speech and NLP applications.

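Slide 4: Weighted Acceptors

Finite automata with labels and weights.

Example: word pronunciation acceptor (arcs written as source --label/weight--> destination):
- 0 --d/1--> 1
- 1 --ey/0.5--> 2
- 1 --ae/0.5--> 2
- 2 --t/0.3--> 3
- 2 --dx/0.7--> 3
- 3 --ax/1--> 4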

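Slide 5: Weighted Transducers

Finite automata with input labels, output labels, and weights.

Example: word pronunciation transducer (arcs written as source --input:output/weight--> destination; states 4 and 6 are final with weight 0):
- 0 --d:data/1--> 1
- 1 --ey:ε/0.5--> 2
- 1 --ae:ε/0.5--> 2
- 2 --t:ε/0.3--> 3
- 2 --dx:ε/0.7--> 3
- 3 --ax:ε/1--> 4
- 0 --d:dew/1--> 5
- 5 --uw:ε/1--> 6

L: closed union of $|V|$ word pronunciation transducers.
G: an n-gram model is a WFSA with (at most) $|V|^{n-1}$ states.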

Slide 6: Context-Dependent Triphone Transducer C

[Figure: context-dependent triphone transducer C over the two-phone alphabet {x, y}. States are labeled with phone pairs (ε,*; x,x; x,y; y,x; y,y; x,ε; y,ε) and arcs carry labels of the form p:p/l_r, for example x:x/ε_y and y:y/x_x.]

Slide 7: Recognition Transducer Construction

The models C, L and G can be combined and optimized with weighted finite-state composition and determinization as:

$$C \circ \det(L \circ G) \qquad (1)$$

An alternative construction, producing an equivalent transducer, is:

$$(C \circ \det(L)) \circ G \qquad (2)$$

If G is deterministic, Eq. 2 could be as efficient as Eq. 1 while avoiding the determinization of $L \circ G$, which greatly saves time and memory and allows fast dynamic combination (useful in applications). However, standard composition presents three problems with Eq. 2:

1. Determinization of L moves back word labels, which delays matching and creates (possibly very many) useless composition paths.
2. The delayed word labels in L produce a much larger composed machine when G is an n-gram LM.
3. The delayed word labels push back the grammar weights along paths in the composed machine, to the detriment of ASR pruning.

Slide 8: Composition Example

[Figure: a lexicon L for the words red, read, reed, road and rode, with word labels on the initial r arcs; its determinization det(L), where a single r:ε arc is shared and the word labels are delayed to the final d arcs; a grammar G with arcs red/0.6 and read/0.4; and the two compositions L ∘ G and det(L) ∘ G.]

Slide 9: Definitions and Notation - Paths

Path $\pi$:
- Origin or previous state: $p[\pi]$.
- Destination or next state: $n[\pi]$.
- Input label: $i[\pi]$.
- Output label: $o[\pi]$.
- Pictured as: $p[\pi] \xrightarrow{\,i[\pi]:o[\pi]\,} n[\pi]$

Sets of paths:
- $P(R_1, R_2)$: set of all paths from $R_1 \subseteq Q$ to $R_2 \subseteq Q$.
- $P(R_1, x, R_2)$: paths in $P(R_1, R_2)$ with input label $x$.
- $P(R_1, x, y, R_2)$: paths in $P(R_1, x, R_2)$ with output label $y$.

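Slide 10: Definitions and Notation - Transducers

- Alphabets: input $A$, output $B$.
- States: $Q$, initial states $I$, final states $F$.
- Transitions: $E \subseteq Q \times (A \cup \{\epsilon\}) \times (B \cup \{\epsilon\}) \times \mathbb{K} \times Q$.
- Weight functions: initial weight function $\lambda : I \to \mathbb{K}$, final weight function $\rho : F \to \mathbb{K}$.

Transducer $T = (A, B, Q, I, F, E, \lambda, \rho)$ with, for all $x \in A^*$, $y \in B^*$:

$$[\![T]\!](x, y) = \bigoplus_{\pi \in P(I, x, y, F)} \lambda(p[\pi]) \otimes w[\pi] \otimes \rho(n[\pi])$$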

Slide 11: Semirings

A semiring $(\mathbb{K}, \oplus, \otimes, \bar{0}, \bar{1})$ is a ring that may lack negation.
- Sum ($\oplus$): computes the weight of a sequence (sum of the weights of the paths labeled with that sequence).
- Product ($\otimes$): computes the weight of a path (product of the weights of its constituent transitions).

Semiring      Set               ⊕        ⊗    0     1
Boolean       {0, 1}            ∨        ∧    0     1
Probability   R+                +        ×    0     1
Log           R ∪ {−∞, +∞}      ⊕_log    +    +∞    0
Tropical      R ∪ {−∞, +∞}      min      +    +∞    0
String        B* ∪ {∞}          lcp      ·    ∞     ε

with $\oplus_{\log}$ defined by: $x \oplus_{\log} y = -\log(e^{-x} + e^{-y})$.
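To make the sum and product roles concrete, here is a minimal sketch of the tropical and log semirings over negative-log weights. The names `TropicalSemiring`, `LogSemiring` and `SequenceWeight` are invented for this illustration and are not OpenFst classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Tropical semiring: Plus is min, Times is +, Zero is +infinity, One is 0.
struct TropicalSemiring {
  static double Zero() { return std::numeric_limits<double>::infinity(); }
  static double One() { return 0.0; }
  static double Plus(double x, double y) { return std::min(x, y); }
  static double Times(double x, double y) { return x + y; }
};

// Log semiring: Plus is -log(e^-x + e^-y), Times is +.
struct LogSemiring {
  static double Zero() { return std::numeric_limits<double>::infinity(); }
  static double One() { return 0.0; }
  static double Plus(double x, double y) {
    assert(!std::isnan(x) && !std::isnan(y));
    if (x == Zero()) return y;  // Zero is the identity of Plus
    if (y == Zero()) return x;
    const double lo = std::min(x, y);
    const double hi = std::max(x, y);
    // Same value as -log(e^-x + e^-y), written to avoid overflow.
    return lo - std::log1p(std::exp(lo - hi));
  }
  static double Times(double x, double y) { return x + y; }
};

// Weight of a label sequence: the Plus over all paths carrying that sequence
// of the Times of the arc weights along each path.
template <class S>
double SequenceWeight(const std::vector<std::vector<double> >& paths) {
  double total = S::Zero();
  for (const std::vector<double>& path : paths) {
    double w = S::One();
    for (double arc_weight : path) w = S::Times(w, arc_weight);
    total = S::Plus(total, w);
  }
  return total;
}
```

With two paths of arc weights {1.0, 2.0} and {0.5, 3.0}, the tropical semiring keeps only the best path, while the log semiring combines both, so its result is strictly smaller.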

Slide 12: (ε-free) Composition Algorithm

- States: $(q_1, q_2)$ with $q_1$ in $T_1$ and $q_2$ in $T_2$.
- Transitions: for $e_1$ a transition leaving $q_1$ and $e_2$ a transition leaving $q_2$ such that $o[e_1] = i[e_2]$:

$$((q_1, q_2),\ i[e_1],\ o[e_2],\ w[e_1] \otimes w[e_2],\ (n[e_1], n[e_2]))$$
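The state-pair construction can be sketched in a few lines. The following is an illustrative, std-only implementation in the tropical semiring (so the product of weights is addition); `ToyArc`, `ToyFst` and `ComposeEpsFree` are hypothetical names, final weights are omitted, and state 0 is assumed to be the single initial state of each machine.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// One transition: input label, output label, weight, destination state.
struct ToyArc {
  std::string ilabel;
  std::string olabel;
  double weight;
  int next;
};

// arcs[q] lists the transitions leaving state q; state 0 is initial.
struct ToyFst {
  std::vector<std::vector<ToyArc> > arcs;
};

// Epsilon-free composition; only pairs reachable from (0, 0) are built.
ToyFst ComposeEpsFree(const ToyFst& t1, const ToyFst& t2) {
  assert(!t1.arcs.empty() && !t2.arcs.empty());
  ToyFst out;
  std::map<std::pair<int, int>, int> id;  // (q1, q2) -> composed state
  std::queue<std::pair<int, int> > todo;
  id[std::make_pair(0, 0)] = 0;
  out.arcs.push_back(std::vector<ToyArc>());
  todo.push(std::make_pair(0, 0));
  while (!todo.empty()) {
    const std::pair<int, int> q = todo.front();
    todo.pop();
    const int src = id[q];
    for (const ToyArc& e1 : t1.arcs[q.first]) {
      for (const ToyArc& e2 : t2.arcs[q.second]) {
        if (e1.olabel != e2.ilabel) continue;  // need o[e1] = i[e2]
        const std::pair<int, int> dst(e1.next, e2.next);
        if (id.find(dst) == id.end()) {
          id[dst] = static_cast<int>(out.arcs.size());
          out.arcs.push_back(std::vector<ToyArc>());
          todo.push(dst);
        }
        // Tropical Times: the composed weight is the sum of both weights.
        const ToyArc composed = {e1.ilabel, e2.olabel,
                                 e1.weight + e2.weight, id[dst]};
        out.arcs[src].push_back(composed);
      }
    }
  }
  return out;
}
```

The nested loop over arc pairs is the simplest possible matcher; a sorted-label lookup would replace it in a real implementation.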

Slide 13: Generalized Composition Algorithm

Composition filter: $\Phi = (T_1, T_2, Q_3, i_3, \bot, \varphi)$
- $Q_3$: set of filter states, with $i_3$ initial and $\bot$ final (blocking).
- $\varphi : (e_1, e_2, q_3) \mapsto (e'_1, e'_2, q'_3)$: transition filter.

Algorithm:
- States: $(q_1, q_2, q_3)$ with $q_1$ in $T_1$, $q_2$ in $T_2$ and $q_3$ a filter state.
- Transitions: for $e_1$ a transition leaving $q_1$ and $e_2$ a transition leaving $q_2$ such that $\varphi(e_1, e_2, q_3) = (e'_1, e'_2, q'_3)$ with $q'_3 \neq \bot$:

$$((q_1, q_2, q_3),\ i[e'_1],\ o[e'_2],\ w[e'_1] \otimes w[e'_2],\ (n[e'_1], n[e'_2], q'_3))$$

Trivial filter $\Phi_{\text{trivial}}$: allows all matching paths. $Q_3 = \{0, \bot\}$, $i_3 = 0$ and

$$\varphi(e_1, e_2, 0) = \begin{cases} (e_1, e_2, 0) & \text{if } o[e_1] = i[e_2] \\ (e_1, e_2, \bot) & \text{otherwise} \end{cases}$$

This yields the basic ε-free composition algorithm.

Slide 14: Pseudo-code

    Weighted-Composition(T1, T2)
      Q ← I ← S ← I1 × I2 × {i3}
      while S ≠ ∅ do
          (q1, q2, q3) ← Head(S)
          Dequeue(S)
          if (q1, q2, q3) ∈ F1 × F2 × Q3 then
              F ← F ∪ {(q1, q2, q3)}
              ρ(q1, q2, q3) ← ρ1(q1) ⊗ ρ2(q2) ⊗ ρ3(q3)
          M ← {(e1, e2) ∈ E^L[q1] × E^L[q2] such that
               ϕ(e1, e2, q3) = (e1', e2', q3') with q3' ≠ ⊥}
          for each (e1, e2) ∈ M do
              (e1', e2', q3') ← ϕ(e1, e2, q3)
              if (n[e1'], n[e2'], q3') ∉ Q then
                  Q ← Q ∪ {(n[e1'], n[e2'], q3')}
                  Enqueue(S, (n[e1'], n[e2'], q3'))
              E ← E ∪ {((q1, q2, q3), i[e1'], o[e2'], w[e1'] ⊗ w[e2'], (n[e1'], n[e2'], q3'))}
      return T

Slide 15: Epsilon-Matching Filter

- An ε-transition in $T_1$ (resp. $T_2$) can be matched in $T_2$ (resp. $T_1$) by an ε-transition or by staying at the same state (as if there were ε self-loops at each state in $T_1$ and $T_2$).
- Allowing all possible ε-matches produces redundant ε-paths in $T_1 \circ T_2$, and therefore a wrong result when the semiring is non-idempotent.
- Filter $\Phi_{\epsilon\text{-match}}$: disallows redundant ε-paths, favoring the matching of actual ε-transitions. $Q_3 = \{0, 1, 2, \bot\}$, $i_3 = 0$ and $\varphi(e_1, e_2, q_3) = (e_1, e_2, q'_3)$ where:

$$q'_3 = \begin{cases} 0 & \text{if } (o[e_1], i[e_2]) = (x, x) \text{ with } x \in B, \\ 0 & \text{if } (o[e_1], i[e_2]) = (\epsilon, \epsilon) \text{ and } q_3 = 0, \\ 1 & \text{if } (o[e_1], i[e_2]) = (\epsilon_L, \epsilon) \text{ and } q_3 \neq 2, \\ 2 & \text{if } (o[e_1], i[e_2]) = (\epsilon, \epsilon_L) \text{ and } q_3 \neq 1, \\ \bot & \text{otherwise.} \end{cases}$$

- $\epsilon_L$: label of the added self-loops.
- This yields the composition algorithm of [Mohri, Pereira and Riley, 96].
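The case analysis of the epsilon-matching filter translates directly into a small transition function. This is a sketch only: the integer label encoding (0 for ε, -1 for the self-loop label, positive integers for ordinary labels) and the names `kEps`, `kEpsLoop`, `kBlocked` and `EpsMatchNext` are assumptions made for the example.

```cpp
#include <cassert>

const int kEps = 0;       // epsilon
const int kEpsLoop = -1;  // label of the added self-loops
const int kBlocked = -2;  // the blocking filter state

// Next filter state given o[e1], i[e2] and the current filter state q3,
// which must be one of the non-blocking states 0, 1 or 2.
int EpsMatchNext(int o1, int i2, int q3) {
  assert(q3 == 0 || q3 == 1 || q3 == 2);
  if (o1 > 0 && o1 == i2) return 0;                       // (x, x)
  if (o1 == kEps && i2 == kEps && q3 == 0) return 0;      // (eps, eps)
  if (o1 == kEpsLoop && i2 == kEps && q3 != 2) return 1;  // (eps_L, eps)
  if (o1 == kEps && i2 == kEpsLoop && q3 != 1) return 2;  // (eps, eps_L)
  return kBlocked;
}
```

States 1 and 2 remember which side last moved alone on an ε, which is what rules out the redundant interleavings of the same ε-path.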

Slide 16: Label-Reachability Filter

- Disallows following an ε-path in $q_1$ that will fail to reach a non-ε label that matches some transition in $q_2$.
- Label reachability $r : Q_1 \times B \to \{0, 1\}$:

$$r(q, x) = \begin{cases} 1 & \text{if there exists a path from } q \text{ to some } q' \text{ with output label } x \\ 0 & \text{otherwise} \end{cases}$$

- Filter $\Phi_{\text{reach}}$: same as $\Phi_{\text{trivial}}$ except when $o[e_1] = \epsilon$ and $i[e_2] = \epsilon_L$; then $\varphi(e_1, e_2, 0) = (e_1, e_2, q_3)$ with

$$q_3 = \begin{cases} 0 & \text{if there exists } e'_2 \text{ in } q_2 \text{ such that } r(n[e_1], i[e'_2]) = 1 \\ \bot & \text{otherwise} \end{cases}$$

[Figure: det(L), G (arcs red/0.6 and read/0.4) and the result of det(L) ∘ G under the label-reachability filter, for the running red/read example.]
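One way to compute the label-reachability relation is a depth-first search that follows only ε-output arcs and collects the first non-ε output labels it meets. The sketch below assumes that reading of "a path with output label x", encodes ε as 0, and uses invented names (`OutArc`, `ReachableLabels`, `Reach`).

```cpp
#include <cassert>
#include <set>
#include <vector>

// An arc reduced to what reachability needs; olabel 0 means epsilon.
struct OutArc {
  int olabel;
  int next;
};

// Labels x with r(q, x) = 1: outputs of arcs that can be reached from q by
// following epsilon-output arcs only.
std::set<int> ReachableLabels(const std::vector<std::vector<OutArc> >& arcs,
                              int q) {
  assert(q >= 0 && q < static_cast<int>(arcs.size()));
  std::set<int> labels;
  std::vector<bool> seen(arcs.size(), false);
  std::vector<int> stack(1, q);
  seen[q] = true;
  while (!stack.empty()) {
    const int s = stack.back();
    stack.pop_back();
    for (const OutArc& e : arcs[s]) {
      if (e.olabel != 0) {
        labels.insert(e.olabel);  // first non-epsilon output: record it
      } else if (!seen[e.next]) {
        seen[e.next] = true;      // epsilon output: keep exploring
        stack.push_back(e.next);
      }
    }
  }
  return labels;
}

// r(q, x) under the reading stated above.
bool Reach(const std::vector<std::vector<OutArc> >& arcs, int q, int x) {
  return ReachableLabels(arcs, q).count(x) > 0;
}
```

For a determinized lexicon shaped like the red/read example, the state reached after r:ε can reach every word, while the state after eh:ε can reach only red and read.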

Slide 17: Label-Reachability Filter with Label Pushing

- When matching an ε-transition $e_1$ with an $\epsilon_L$-loop $e_2$: if there exists a unique $e'_2$ in $q_2$ such that $r(n[e_1], i[e'_2]) = 1$, then allow matching $e_1$ with $e'_2$ instead, which gives early output of $o[e'_2]$.
- Filter $\Phi_{\text{push-label}}$: $Q_3 = B \cup \{\epsilon, \bot\}$ and $i_3 = \epsilon$; the filter state encodes the label that has been consumed early.

[Figure: det(L), G and det(L) ∘ G with label pushing for the running example. Composed states carry the filter state as a third component, e.g. (3,1,read), and the word read is output early on the iy arc.]

Slide 18: Label-Reachability Filter with Weight Pushing

- When matching an ε-transition $e_1$ with an $\epsilon_L$-loop $e_2$: outputs early the $\oplus$-sum of the weights of the prospective matches.
- Reachable weight:

$$w_r : (q_1, q_2) \mapsto \bigoplus_{e \in E[q_2],\ r(q_1, i[e]) = 1} w[e]$$

- Filter $\Phi_{\text{push-weight}}$: $Q_3 = \mathbb{K}$, $i_3 = \bar{1}$ and $\bot = \bar{0}$; the filter state encodes the weight that has been output early. If $o[e_1] = \epsilon$ and $i[e_2] = \epsilon_L$, then $q'_3 = w_r(n[e_1], q_2)$ and $w[e'_2] = q_3^{-1} \otimes q'_3$.

[Figure: det(L), G and det(L) ∘ G with weight pushing for the running example. The weight 0.4 of read is output early on the iy arc, leading to composed state (3,0,0.4).]

Slide 19: Representation of r - Implementation

- Point representation: $R_q = \{x \in B : r(q, x) = 1\}$. Inefficient in time and space.
- Interval representation: $I_q = \{[x, y) : x, y \in \mathbb{N},\ [x, y) \subseteq R_q,\ x - 1 \notin R_q,\ y \notin R_q\}$
  - Efficiency depends on the number of intervals for each $R_q$.
  - One interval per state is trivial for a tree (found by DFS).
  - One interval per state is possible if the consecutive ones property (C1P) holds. This is true if L has unique pronunciations, and it is preserved by determinization, minimization, closure and composition with C.
  - A multiple-pronunciation L typically fails C1P. However, a modification of Hsu's (2002) C1P test gives a greedy algorithm for minimizing the number of intervals per state.
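As an illustration of the interval representation, the sketch below turns a set of integer labels into its maximal half-open intervals and answers membership queries by binary search. `Interval`, `ToIntervals` and `Contains` are names chosen for this example; how labels get numbered so that few intervals are needed is outside its scope.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <set>
#include <utility>
#include <vector>

typedef std::pair<int, int> Interval;  // half-open range [first, second)

// Maximal intervals [x, y) covering exactly the given label set.
std::vector<Interval> ToIntervals(const std::set<int>& labels) {
  std::vector<Interval> out;
  for (std::set<int>::const_iterator it = labels.begin();
       it != labels.end(); ++it) {
    if (!out.empty() && out.back().second == *it) {
      ++out.back().second;  // label extends the current run
    } else {
      out.push_back(Interval(*it, *it + 1));  // label starts a new run
    }
  }
  return out;
}

// Membership test in O(log k) for k sorted, disjoint intervals.
bool Contains(const std::vector<Interval>& intervals, int x) {
  assert(std::is_sorted(intervals.begin(), intervals.end()));
  // First interval that begins strictly after x.
  std::vector<Interval>::const_iterator it = std::upper_bound(
      intervals.begin(), intervals.end(),
      Interval(x, std::numeric_limits<int>::max()));
  if (it == intervals.begin()) return false;
  --it;  // last interval that begins at or before x
  return it->first <= x && x < it->second;
}
```

A label set such as {1, 2, 3, 7, 9, 10} needs three intervals; when the set is one contiguous run, a single interval suffices and the lookup is two comparisons.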

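Slide 20: Efficient Computation of $w_r$ - Implementation

Requires fast computation of

$$s_q(x, y) = \bigoplus_{e \in E[q],\ i[e] \in [x, y)} w[e]$$

for $q$ in $T_2$, and $x$ and $y$ in $B = \mathbb{N}$.

Achieved by precomputing

$$c_q(x) = \bigoplus_{e \in E[q],\ i[e] < x} w[e]$$

so that $s_q(x, y) = c_q(y) \ominus c_q(x)$, where $\ominus$ undoes $\oplus$.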
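The precomputation amounts to a prefix-sum table per state. The sketch below assumes the probability semiring, where the sum is ordinary addition and can be undone by subtraction; the class name `PrefixSums` and its methods `C` and `S` are hypothetical, mirroring $c_q$ and $s_q$.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Prefix sums over the arcs leaving one state q, given as
// (input label, weight) pairs sorted by label.
class PrefixSums {
 public:
  explicit PrefixSums(const std::vector<std::pair<int, double> >& arcs)
      : cum_(1, 0.0) {
    for (std::size_t k = 0; k < arcs.size(); ++k) {
      assert(k == 0 || arcs[k - 1].first <= arcs[k].first);  // sorted input
      labels_.push_back(arcs[k].first);
      cum_.push_back(cum_.back() + arcs[k].second);
    }
  }

  // c_q(x): total weight of the arcs whose label is smaller than x.
  double C(int x) const {
    const std::size_t k = static_cast<std::size_t>(
        std::lower_bound(labels_.begin(), labels_.end(), x) -
        labels_.begin());
    return cum_[k];
  }

  // s_q(x, y): total weight of the arcs whose label lies in [x, y).
  double S(int x, int y) const {
    assert(x <= y);
    return C(y) - C(x);
  }

 private:
  std::vector<int> labels_;   // sorted arc labels
  std::vector<double> cum_;   // cum_[k] = sum of the first k weights
};
```

Each interval query costs two binary searches regardless of how many arcs fall inside the interval, which is what makes summing over a state's reachable intervals cheap.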

Slide 21: Composition Design - Options

Composition options:

    typedef SortedMatcher<StdFst> SM;
    typedef SequenceComposeFilter<Arc> CF;

    ComposeFstOptions<StdArc, SM, CF> opts;
    // Matcher for each operand of the composition.
    opts.matcher1 = new SM(fst1, MATCH_NONE, kNoLabel);
    opts.matcher2 = new SM(fst2, MATCH_INPUT, kNoLabel);
    // Composition filter deciding which arc pairs may be combined.
    opts.filter = new CF(fst1, fst2);

    StdComposeFst cfst(fst1, fst2, opts);

Slide 22: Composition Filters

Predefined filters:

Name                          Description
SequenceComposeFilter         Requires FST1 epsilons to be read before FST2 epsilons
AltSequenceComposeFilter      Requires FST2 epsilons to be read before FST1 epsilons
MatchComposeFilter            Requires FST1 epsilons be matched with FST2 epsilons
LookAheadComposeFilter<F>     Supports lookahead in composition
PushWeightsComposeFilter<F>   Supports pushing weights in composition
PushLabelsComposeFilter<F>    Supports pushing labels in composition

Three lookahead composition filters, each templated on an underlying filter F, are added. All three can be used by cascading them.

Slide 23: Composition - Matcher Design

Matchers can find and iterate through requested labels at FST states.

Matcher form:

    template <class F>
    class Matcher {
      typedef typename F::Arc Arc;
     public:
      void SetState(StateId s);     // Specifies current state
      bool Find(Label label);       // Checks state for match to label
      bool Done() const;            // No more matches
      const Arc& Value() const;     // Current arc
      void Next();                  // Advance to next arc
      bool LookAhead(const Fst<Arc> &fst, StateId s,
                     Weight &weight);  // (Optional) lookahead
    };

A LookAhead() method, given the language (FST + initial state) to expect, is added.

Slide 24: Matchers

Predefined matchers:

Name                       Description
SortedMatcher              Binary search on sorted input
RhoMatcher<M>              ρ symbol handling
SigmaMatcher<M>            σ symbol handling
PhiMatcher<M>              ϕ symbol handling
LabelLookAheadMatcher<M>   Lookahead along epsilon paths
ArcLookAheadMatcher<M>     Lookahead one transition

Two lookahead matchers, each templated on an underlying matcher M, are added.

Special symbol matchers:

                Consumes no symbol   Consumes symbol
Matches all     ε                    σ
Matches rest    ϕ                    ρ

Slide 25: Recognition Experiments

Broadcast News
- Acoustic model:
  - Trained on 96 and 97 DARPA Hub4 AM training sets
  - PLP cepstra, LDA analysis, STC
  - Triphonic, 8k tied states, 16 components per state
  - Speaker adapted (both VTLN + CMLLR)
- Language model:
  - 1996 Hub4 CSR LM training sets
  - 4-gram language model pruned to 8M n-grams

Spoken Query Task
- Acoustic model:
  - Trained on > 1000 hrs of voice search queries
  - PLP cepstra, LDA analysis, STC
  - Triphonic, 4k tied states, [...] components per state
  - Speaker independent
- Language model:
  - Trained on > 1B words of google.com and voice search queries
  - 1 million word vocabulary
  - Katz back-off model, pruned to various sizes

Slide 26: Recognition Experiments

Precomputation before recognition:

                                            Broadcast News            Spoken Query Task
Construction method                         Time      RAM    Result   Time       RAM     Result
Static (1) with standard composition        7 min     5.3G   0.5G     10.5 min   11.2G   1.4G
Static (2) with generalized composition     2.5 min   2.9G   0.5G     4 min      5.3G    1.4G
Dynamic (2) with generalized composition    none      none   0.2G     none       none    0.5G

[Figure panels: Broadcast News; Spoken Query Task.]

Slide 27: Recognition Experiments

A small part of the recognition transducer is visited during recognition (Spoken Query Task):
- Static: number of states in recognition transducer: 25.4M
- Dynamic: number of states visited per second: 8K

Very large language models can be used in the first pass:

[Figure: Spoken Query Task, word error rate as a function of LM size (with Ciprian Chelba and Boulos Harb). x-axis: # of n-grams, from 1e+06 to 1e+09; y-axis: word error rate.]

Slide 28: Prior Work

- Caseiro and Trancoso (IEEE Trans. on ASLP 2006): they developed a specialized composition for a pronunciation lexicon L. If pronunciations are stored in a trie, then the words readable from a node form a lexicographic interval, which can be used to disallow non-coaccessible epsilon paths.
- Cheng, et al. (ICASSP 2007); Oonishi, et al. (Interspeech 2008): they use methods apparently similar to ours, but many details are left unspecified, such as the representation of the reachable label sets. There are no published complexities, but the published results show a very significant overhead for dynamic composition compared to a static recognition transducer.
- Our method:
  - uses a very efficient representation of the label sets
  - uses a very efficient computation of the weight pushing
  - has a small overhead between static and dynamic composition

Slide 29: Conclusions

This work:
- Introduces a generalized composition filter for weighted finite-state composition.
- Presents composition filters that:
  - remove useless epsilon paths
  - push forward labels
  - push forward weights
- The combination of these filters permits the composition of large speech-recognition context-dependent lexicons and language models much more efficiently in time and space than before.
- Experiments on Broadcast News and a spoken query task show a 5% to 10% overhead for dynamic, runtime composition compared to static, offline composition. To our knowledge, this is the first such system with so little overhead.


Influences in low-degree polynomials Influences in low-degree polynomials Artūrs Bačkurs December 12, 2012 1 Introduction In 3] it is conjectured that every bounded real polynomial has a highly influential variable The conjecture is known

More information

Introduction to Finite Automata

Introduction to Finite Automata Introduction to Finite Automata Our First Machine Model Captain Pedro Ortiz Department of Computer Science United States Naval Academy SI-340 Theory of Computing Fall 2012 Captain Pedro Ortiz (US Naval

More information

(IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems.

(IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems. 3130CIT: Theory of Computation Turing machines and undecidability (IALC, Chapters 8 and 9) Introduction to Turing s life, Turing machines, universal machines, unsolvable problems. An undecidable problem

More information

Introduction to Automata Theory. Reading: Chapter 1

Introduction to Automata Theory. Reading: Chapter 1 Introduction to Automata Theory Reading: Chapter 1 1 What is Automata Theory? Study of abstract computing devices, or machines Automaton = an abstract computing device Note: A device need not even be a

More information

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program. Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to

More information

Compression techniques

Compression techniques Compression techniques David Bařina February 22, 2013 David Bařina Compression techniques February 22, 2013 1 / 37 Contents 1 Terminology 2 Simple techniques 3 Entropy coding 4 Dictionary methods 5 Conclusion

More information

CAs and Turing Machines. The Basis for Universal Computation

CAs and Turing Machines. The Basis for Universal Computation CAs and Turing Machines The Basis for Universal Computation What We Mean By Universal When we claim universal computation we mean that the CA is capable of calculating anything that could possibly be calculated*.

More information

3515ICT Theory of Computation Turing Machines

3515ICT Theory of Computation Turing Machines Griffith University 3515ICT Theory of Computation Turing Machines (Based loosely on slides by Harald Søndergaard of The University of Melbourne) 9-0 Overview Turing machines: a general model of computation

More information

Business Intelligence and Process Modelling

Business Intelligence and Process Modelling Business Intelligence and Process Modelling F.W. Takes Universiteit Leiden Lecture 7: Network Analytics & Process Modelling Introduction BIPM Lecture 7: Network Analytics & Process Modelling Introduction

More information

Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

More information

14.1 Rent-or-buy problem

14.1 Rent-or-buy problem CS787: Advanced Algorithms Lecture 14: Online algorithms We now shift focus to a different kind of algorithmic problem where we need to perform some optimization without knowing the input in advance. Algorithms

More information

Automata and Formal Languages

Automata and Formal Languages Automata and Formal Languages Winter 2009-2010 Yacov Hel-Or 1 What this course is all about This course is about mathematical models of computation We ll study different machine models (finite automata,

More information

Computability Theory

Computability Theory CSC 438F/2404F Notes (S. Cook and T. Pitassi) Fall, 2014 Computability Theory This section is partly inspired by the material in A Course in Mathematical Logic by Bell and Machover, Chap 6, sections 1-10.

More information

P NP for the Reals with various Analytic Functions

P NP for the Reals with various Analytic Functions P NP for the Reals with various Analytic Functions Mihai Prunescu Abstract We show that non-deterministic machines in the sense of [BSS] defined over wide classes of real analytic structures are more powerful

More information

Automata and Computability. Solutions to Exercises

Automata and Computability. Solutions to Exercises Automata and Computability Solutions to Exercises Fall 25 Alexis Maciel Department of Computer Science Clarkson University Copyright c 25 Alexis Maciel ii Contents Preface vii Introduction 2 Finite Automata

More information

C H A P T E R Regular Expressions regular expression

C H A P T E R Regular Expressions regular expression 7 CHAPTER Regular Expressions Most programmers and other power-users of computer systems have used tools that match text patterns. You may have used a Web search engine with a pattern like travel cancun

More information

Introduction to LabVIEW Design Patterns

Introduction to LabVIEW Design Patterns Introduction to LabVIEW Design Patterns What is a Design Pattern? Definition: A well-established solution to a common problem. Why Should I Use One? Save time and improve the longevity and readability

More information

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania [email protected]

More information

How To Trace

How To Trace CS510 Software Engineering Dynamic Program Analysis Asst. Prof. Mathias Payer Department of Computer Science Purdue University TA: Scott A. Carr Slides inspired by Xiangyu Zhang http://nebelwelt.net/teaching/15-cs510-se

More information

IVR Studio 3.0 Guide. May-2013. Knowlarity Product Team

IVR Studio 3.0 Guide. May-2013. Knowlarity Product Team IVR Studio 3.0 Guide May-2013 Knowlarity Product Team Contents IVR Studio... 4 Workstation... 4 Name & field of IVR... 4 Set CDR maintainence property... 4 Set IVR view... 4 Object properties view... 4

More information

Markov random fields and Gibbs measures

Markov random fields and Gibbs measures Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).

More information

Software Verification: Infinite-State Model Checking and Static Program

Software Verification: Infinite-State Model Checking and Static Program Software Verification: Infinite-State Model Checking and Static Program Analysis Dagstuhl Seminar 06081 February 19 24, 2006 Parosh Abdulla 1, Ahmed Bouajjani 2, and Markus Müller-Olm 3 1 Uppsala Universitet,

More information

Notes on Complexity Theory Last updated: August, 2011. Lecture 1

Notes on Complexity Theory Last updated: August, 2011. Lecture 1 Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look

More information

Honors Class (Foundations of) Informatics. Tom Verhoeff. Department of Mathematics & Computer Science Software Engineering & Technology

Honors Class (Foundations of) Informatics. Tom Verhoeff. Department of Mathematics & Computer Science Software Engineering & Technology Honors Class (Foundations of) Informatics Tom Verhoeff Department of Mathematics & Computer Science Software Engineering & Technology www.win.tue.nl/~wstomv/edu/hci c 2011, T. Verhoeff @ TUE.NL 1/20 Information

More information

TED-LIUM: an Automatic Speech Recognition dedicated corpus

TED-LIUM: an Automatic Speech Recognition dedicated corpus TED-LIUM: an Automatic Speech Recognition dedicated corpus Anthony Rousseau, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans, France [email protected]

More information

Deterministic Finite Automata

Deterministic Finite Automata 1 Deterministic Finite Automata Definition: A deterministic finite automaton (DFA) consists of 1. a finite set of states (often denoted Q) 2. a finite set Σ of symbols (alphabet) 3. a transition function

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

Monitoring Metric First-order Temporal Properties

Monitoring Metric First-order Temporal Properties Monitoring Metric First-order Temporal Properties DAVID BASIN, FELIX KLAEDTKE, SAMUEL MÜLLER, and EUGEN ZĂLINESCU, ETH Zurich Runtime monitoring is a general approach to verifying system properties at

More information

ANIMATION a system for animation scene and contents creation, retrieval and display

ANIMATION a system for animation scene and contents creation, retrieval and display ANIMATION a system for animation scene and contents creation, retrieval and display Peter L. Stanchev Kettering University ABSTRACT There is an increasing interest in the computer animation. The most of

More information

The P versus NP Solution

The P versus NP Solution The P versus NP Solution Frank Vega To cite this version: Frank Vega. The P versus NP Solution. 2015. HAL Id: hal-01143424 https://hal.archives-ouvertes.fr/hal-01143424 Submitted on 17 Apr

More information

Finding Liveness Errors with ACO

Finding Liveness Errors with ACO Hong Kong, June 1-6, 2008 1 / 24 Finding Liveness Errors with ACO Francisco Chicano and Enrique Alba Motivation Motivation Nowadays software is very complex An error in a software system can imply the

More information

3. The Junction Tree Algorithms

3. The Junction Tree Algorithms A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin [email protected] 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )

More information

VoiceXML-Based Dialogue Systems

VoiceXML-Based Dialogue Systems VoiceXML-Based Dialogue Systems Pavel Cenek Laboratory of Speech and Dialogue Faculty of Informatics Masaryk University Brno Agenda Dialogue system (DS) VoiceXML Frame-based DS in general 2 Computer based

More information

CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions

CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions Theory of Formal Languages In the English language, we distinguish between three different identities: letter, word, sentence.

More information

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: [email protected] October 17, 2015 Outline

More information

Lecture 2: Universality

Lecture 2: Universality CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal

More information