A COMPARATIVE STUDY BETWEEN POLYCLASS AND MULTICLASS LANGUAGE MODELS

I. Zitouni, K. Smaïli, S. Deligne, F. Bimbot

To cite this version: I. Zitouni, K. Smaïli, S. Deligne, F. Bimbot. A COMPARATIVE STUDY BETWEEN POLYCLASS AND MULTICLASS LANGUAGE MODELS. Proceedings of the Fifth International Conference on Spoken Language Processing, 1998, Sydney, Australia. 1998. <hal-0292>

HAL Id: hal-0292
https://hal.archives-ouvertes.fr/hal-0292
Submitted on 3 Feb 2015

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

I. Zitouni*, K. Smaïli*, J.P. Haton*, S. Deligne +, F. Bimbot + ³

*LORIA, BP 239, 54506 Nancy, France
³ IRISA-CNRS/INRIA, Campus universitaire de Beaulieu, 35042 Rennes cedex, France
+ENST-Dept Signal, CNRS-URA 820, 75634 Paris cedex 13, France
ATR Interpreting Telecommunication Labs, Kyoto, Japan
E-mail: {zitouni, smaili, jph}@loria.fr, sdeligne@itl.atr.co.jp, bimbot@irisa.fr

ABSTRACT

In this work, we introduce the concept of Multiclass for language modeling and we compare it to the Polyclass model. The originality of the Multiclass is its capability to parse a string of classes/tags into variable-length independent sequences. A few experimental tests were carried out on a class corpus extracted from the French «Le Monde» word corpus, labeled automatically. This corpus contains 43 million words. In our experiments, Multiclass models outperform first-order Polyclass models but are slightly outperformed by second-order Polyclass models.

1. INTRODUCTION

Language can be viewed as a stream of words emitted by a source. Since this language source is subject to syntactic and semantic constraints, words are not independent, and the dependencies are of variable length. One can therefore expect to retrieve, in a corpus, typical variable-length sequences of words. The Multiclass model presented in this paper is an application of the Multigram model [1] to sequences of classes, for modeling these variable-length dependencies. To deal with the syntactic constraints of a language, we label the stream of words with 233 classes (a word can belong to several classes) extracted from the eight elementary grammatical classes of the French language. This paper presents a comparison of the Multiclass language model with the n-class model, which is a generalization of the language model most used in the speech recognition community (the n-gram). The n-class model used in this paper is an interpolated n-class named Polyclass. This language model is based on the same principles as the n-gram language model, with the difference that classes are used instead of words.
In the following, we first discuss the necessity and the manner of tagging a corpus of text (Section 2). Second, we introduce the concept of the Polyclass language model used for the comparison (Section 3). Third, we give the theoretical background of the Multiclass language model (Section 4). Then, we report an evaluation of the Multiclass model and a comparison with the Polyclass model (Section 5). Finally, we give a conclusion and some perspectives.

2. THE NECESSITY OF TAGGING

The concept of class is very important in the two methods presented below. We explain in this section how we proceeded to tag our corpus using a set of syntactic tags. The problem at hand is the following: given a sentence W (w_1 w_2 ... w_n), how to label the words of W with the syntactic categories C (c_1 c_2 ... c_n) in a way which maximizes:

\[ P(c_1 \ldots c_n / w_1 \ldots w_n) = \frac{P(w_1 \ldots w_n / c_1 \ldots c_n)\, P(c_1 \ldots c_n)}{P(w_1 \ldots w_n)} \qquad (1) \]

As we are interested in finding c_1 c_2 ... c_n, the denominator will not affect the computation. By making some independence assumptions, formula (1) can be expressed as:

\[ P(c_1 c_2 \ldots c_n / w_1 w_2 \ldots w_n) = \prod_{i=1}^{n} P(c_i / c_{i-2} c_{i-1})\, P(w_i / c_i) \qquad (2) \]

In order to estimate the probability P(c_i / c_{i-2} c_{i-1}), we need to tag each word of the training corpus. Consequently, the dictionary of the application needs a syntactic field for each entry. This implies that some words have to be duplicated if they appear in more than one class. From the eight elementary grammatical classes of French, we built up 233 classes including punctuation. Labeling the words of the vocabulary with these 233 classes resulted in a dictionary of 230 000 entries, each of which consists of a word and its syntactic class. The probability P(c_i / c_{i-2} c_{i-1}) can be expressed as a relative frequency:

\[ P(c_i / c_{i-2} c_{i-1}) = \frac{n(c_{i-2} c_{i-1} c_i)}{n(c_{i-2} c_{i-1})} \qquad (3) \]

where n(x) counts the number of co-occurrences of the syntactic tags specified by x in a training text. The first step consists in collecting the counts of 3-class (a sequence of 3 classes) and 2-class (a sequence of 2 classes). For that purpose, we labeled a small text by hand, and with the statistics collected we automatically tagged a text of 0.5 million words extracted from the French newspaper L'Est Républicain.
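The relative-frequency estimate of the 3-class probability described above can be illustrated with a minimal Python sketch. The function name `class_trigram_estimates`, the tag names and the toy corpus are hypothetical and do not come from the paper; the history count is taken at 3-class positions only, which is an assumption made here so that the estimates for a given history sum to 1.

```python
from collections import Counter

def class_trigram_estimates(tag_sequences):
    """Relative-frequency estimates of P(c_i / c_{i-2} c_{i-1}) from tagged sentences."""
    tri = Counter()    # n(c_{i-2} c_{i-1} c_i): 3-class counts
    hist = Counter()   # n(c_{i-2} c_{i-1}): history counts, taken at 3-class positions (assumption)
    for tags in tag_sequences:
        for a, b, c in zip(tags, tags[1:], tags[2:]):
            tri[(a, b, c)] += 1
            hist[(a, b)] += 1
    # Divide each 3-class count by the count of its two-class history.
    return {(a, b, c): n / hist[(a, b)] for (a, b, c), n in tri.items()}

# Toy class corpus (hypothetical tags, two sentences).
corpus = [["DET", "NOUN", "VERB", "DET", "NOUN"],
          ["DET", "NOUN", "VERB", "ADV"]]
probs = class_trigram_estimates(corpus)
print(probs[("NOUN", "VERB", "DET")])
```

In this toy corpus the history (NOUN, VERB) is followed once by DET and once by ADV, so each continuation receives half of the probability mass.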
The errors resulting from this automatic tagging were hand-corrected, and the updated label statistics were used to automatically tag another, larger set of 43 million words, consisting of 2 years (1987-1988) of the Le Monde (LeM) newspaper. Tagging a corpus means finding the most likely sequence of classes for a sequence of words. In our approach we used a modified Viterbi algorithm [2].

3. POLYCLASS MODEL

As for the Multiclass, we use a corpus of tags/classes obtained by labeling a text corpus. Each word of this corpus is a syntactic class. A Polyclass model is a language model which takes into account only classes. The formalism of the Polyclass language model can be expressed as:

\[ P(C_1 C_2 \ldots C_n) = \prod_{i=1}^{n} p(C_i / h_i) \qquad (4) \]

where h_i, the history of C_i, is a fixed-length sequence of classes. We call the length of the history the order of the Polyclass model. In the following, we use only second- and first-order Polyclass models. Even though the set of distinct syntactic tags is much smaller than the size of the vocabulary (233 tags versus thousands of words), most combinations of class labels occur only a few times. In our corpus LeM, 34% of the observed 3-class and more than 5% of the observed 2-class occur only once, and 34% of the observed 2-class and 62% of the observed 3-class occur 5 times or less. The errors resulting from the automatic tagging tend to increase the inherent sparseness of the data. In order to get reliable estimates, the probabilities P(C_i / h_i) thus have to be smoothed [3]. For this purpose, we used an interpolation scheme, where the relative counts of the 3-class (h_i consists of the 2 class labels preceding C_i) are linearly interpolated with the relative counts of the 2-, 1- and 0-class:

\[ \alpha\, p(c_i / c_{i-2} c_{i-1}) + \beta\, p(c_i / c_{i-1}) + \gamma\, p(c_i) + \delta, \quad \text{where } \alpha + \beta + \gamma + \delta = 1 \qquad (5) \]

The interpolation weights α, β, γ and δ were estimated by maximizing the likelihood of a development corpus. For this purpose, we used the algorithm proposed by Jelinek et al., who showed [3] that the ML estimation of the interpolation weights can be treated as the ML estimation of the transition probabilities of an HMM, which makes it possible to use the forward-backward algorithm classically used in the HMM framework.
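As an illustration of the interpolation scheme of equation (5), the following Python sketch combines 3-class, 2-class and 1-class relative counts with fixed weights. The function name `polyclass_prob`, the toy probability tables and the weight values are hypothetical. The 0-class term is taken here to be a uniform distribution over the class set (δ divided by the number of classes), which is an assumption: the paper only shows the weight δ.

```python
def polyclass_prob(c, history, p3, p2, p1, weights, num_classes):
    """Interpolated probability of class c given the two preceding classes."""
    h2, h1 = history                      # h2 = c_{i-2}, h1 = c_{i-1}
    alpha, beta, gamma, delta = weights
    if abs(alpha + beta + gamma + delta - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return (alpha * p3.get((h2, h1, c), 0.0)   # 3-class relative count
            + beta * p2.get((h1, c), 0.0)      # 2-class relative count
            + gamma * p1.get(c, 0.0)           # 1-class relative count
            + delta / num_classes)             # 0-class term, assumed uniform

# Toy relative counts (hypothetical values).
p3 = {("DET", "NOUN", "VERB"): 0.6}
p2 = {("NOUN", "VERB"): 0.4}
p1 = {"VERB": 0.2}
weights = (0.5, 0.3, 0.1, 0.1)

seen = polyclass_prob("VERB", ("DET", "NOUN"), p3, p2, p1, weights, num_classes=4)
unseen = polyclass_prob("ADV", ("DET", "NOUN"), p3, p2, p1, weights, num_classes=4)
print(seen, unseen)
```

With this formulation, setting α to 0 yields a first-order Polyclass, and an event never seen in training still receives a non-zero probability as long as δ is positive.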
4. MULTICLASS MODEL

In the Multiclass approach, derived from the Multigram framework, strings of classes are assumed to result from the concatenation of variable-length sequences of classes, of maximum length n class labels. The likelihood of a string of classes is computed by summing the likelihood values of all possible segmentations of the string into sequences of classes. Denoting by L a segmentation of a string C of classes:

\[ P(C) = \sum_{L \in \{L\}} P(C, L) \qquad (6) \]

The decision-oriented version of the model parses C according to the most likely segmentation, thus yielding the approximation:

\[ P^{*}(C) = \max_{L \in \{L\}} P(C, L) \qquad (7) \]

The likelihood computation for any particular segmentation into sequences depends on the model assumed to describe the dependencies between the sequences. Assuming that the sequences of classes are independent, we obtain:

\[ P(C, L) = \prod_{t=1}^{q} p(s(t)) \qquad (8) \]

where s(t) denotes the t-th sequence of classes in the segmentation L of C, and q is the number of sequences in L. The model is thus fully specified by the set of probabilities, {p(s_i)}, of all the sequences s_i which can be formed by combining 1, 2, ... or n class labels. Maximum likelihood estimates of these probabilities can be computed by formulating the estimation problem as an ML estimation from incomplete data [5], where the observed data is the string of symbols C, and the unknown data is the underlying segmentation L. Denoting by b(s_i, L) the number of occurrences of the sequence s_i in a segmentation L of the corpus, at iteration k+1 the probability of the sequence s_i is obtained as [4]:

\[ p^{(k+1)}(s_i) = \frac{\sum_{L \in \{L\}} b(s_i, L)\, P^{(k)}(C, L)}{\sum_{L \in \{L\}} b(L)\, P^{(k)}(C, L)} \qquad (9) \]

where \( b(L) = \sum_{i=1}^{m} b(s_i, L) \) is the total number of sequences in L. Equation (9) shows that the estimate for p(s_i) is merely a weighted average of the number of occurrences of sequence s_i within each segmentation. Since each iteration improves the model in the sense of increasing the likelihood P^(k)(C), it eventually converges to a critical point (possibly a local maximum). The reestimation (9) can be implemented by means of a forward-backward algorithm [4]. The set of initial probabilities can be initialized with the relative frequencies of all co-occurrences of symbols up to length n in the training corpus. Then the probabilities are iteratively reestimated until the training set likelihood does not increase significantly, or for a fixed number of iterations. In practice, some pruning technique may be advantageously applied to the dictionary of sequences, in order to avoid over-learning. A straightforward way to proceed consists in simply discarding, at each iteration, the most unlikely sequences, i.e. those with a probability value falling under a prespecified threshold.
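The likelihoods of equations (6) and (7) do not require enumerating every segmentation explicitly. A minimal Python sketch, assuming independent sequences as in equation (8), accumulates both quantities with a left-to-right dynamic program. The function name `multiclass_likelihoods`, the class labels and the toy sequence probabilities are hypothetical, and this is only one possible way to organize the computation, not necessarily the authors' implementation.

```python
def multiclass_likelihoods(classes, seq_prob, n_max):
    """Return (P(C), P*(C), best segmentation) for a string of classes.

    seq_prob maps a tuple of class labels to its probability p(s);
    n_max is the maximum length of a sequence.
    """
    T = len(classes)
    total = [0.0] * (T + 1)  # total[t]: sum over all segmentations of classes[:t]
    best = [0.0] * (T + 1)   # best[t]: likelihood of the best segmentation of classes[:t]
    back = [0] * (T + 1)     # back[t]: start index of the last sequence in that segmentation
    total[0] = best[0] = 1.0
    for t in range(1, T + 1):
        for k in range(1, min(n_max, t) + 1):
            p = seq_prob.get(tuple(classes[t - k:t]), 0.0)
            total[t] += total[t - k] * p          # sum over segmentations, equation (6)
            if best[t - k] * p > best[t]:         # most likely segmentation, equation (7)
                best[t] = best[t - k] * p
                back[t] = t - k
    # Trace back the most likely segmentation, if the string can be parsed at all.
    segments, t = [], T
    while t > 0 and best[T] > 0.0:
        segments.append(tuple(classes[back[t]:t]))
        t = back[t]
    return total[T], best[T], segments[::-1]

# Toy inventory of sequences (hypothetical probabilities).
seq_prob = {("A",): 0.5, ("B",): 0.3, ("A", "B"): 0.2}
p_sum, p_max, segmentation = multiclass_likelihoods(["A", "B"], seq_prob, n_max=2)
print(p_sum, p_max, segmentation)
```

For the string A B, two segmentations exist: [A][B] with likelihood 0.5 x 0.3 and [A B] with likelihood 0.2. The first returned value sums them, while the second keeps only the larger one.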

5. EVALUATION

In this section, we present a comparative evaluation of the Polyclass and Multiclass models, based on experiments on the LeM corpus. For each experiment, we used a vocabulary of 233 classes including punctuation, extracted from the eight elementary grammatical classes of the French language [6]. These classes are divided into two groups: the open and the closed classes. A closed class is made up of a finite number of words (such as articles, prepositions, ...). An open class is made up of words which can be formed from a word's root (such as verbs, nouns, ...). Each punctuation symbol is a single class. The performance of the Multiclass and the Polyclass models is evaluated in terms of class perplexity [7]:

\[ PP = 2^{-\frac{1}{T} \log_2 P(C)} \]

where T is the number of syntactic tags in a test set C. In the Multiclass case, P(C) is computed from equation (6).

The first experiment concerns the Polyclass language model. In this experiment, the Polyclass relative counts are computed on a training set of 40 million classes, and the interpolation weights (α, β, γ, δ) on an additional development set of 1,8 million classes. Test perplexity values are computed on a distinct test set of about 1,6 million classes. The development and test corpora do not appear in the training corpus. Table 1 shows the results obtained for a first- and second-order Polyclass model and gives the values of the interpolation weights.

Order   α       β            γ            δ   Nb        PP
1       0       9,99x10^-1   6,57x10^-5   0   7 500     3,59
2       0,997   2,04x10^-3   6,57x10^-5   0   265 000   ,03

Tab 1: This table shows, for each Polyclass model of order 1 and 2, the values of the interpolation weights, the number of parameters in the model (Nb), and the Polyclass perplexity (PP) on a test corpus of 1,6 million classes.

In a second series of experiments, we compare the Polyclass and the Multiclass models on only one month (Jan87) of the LeM corpus, which we split into a training corpus, a development corpus and a test corpus. We use a training corpus of 55000 class sentences (more than 1,7 million classes), a test corpus of 5000 class sentences (more than 0,5 million classes) and a development corpus of 3000 class sentences (more than 0,1 million classes).
In the Polyclass model, the development corpus is used to estimate the parameters α, β, γ, δ; in the Multiclass model, we use this corpus to optimize the maximum number (n) of classes in a Multiclass sequence and the number of occurrences (C_0) above which a sequence of words is included in the initial inventory of sequences. The development and test corpora do not appear in the training corpus. For the Multiclass language model, all co-occurrences of symbols are used to get initial estimates of the sequence probabilities. However, to avoid overlearning, we found it efficient to discard infrequent co-occurrences, i.e. those appearing strictly less than a given number of times C_0. Then, 10 training iterations are performed in this experiment with different values of n and C_0. Sequence probabilities falling under a threshold P_0 are set to 0, except those of length 1, which are assigned a minimum probability P_0. We set the fixed probability P_0 to 5 x 10^-6, which is half the probability of a class occurring only once in the training corpus. After the initialization and for each iteration, probabilities are renormalized so that they add up to 1 [4].

n   Quantity         C_0=0   C_0=1   C_0=2   C_0=5   C_0=10
3   PP training      4,08    4,9     4,25    4,35    4,47
3   PP development   4,98    4,85    4,83    4,85    4,9
3   PP test          4,79    4,63    4,6     4,64    4,70
3   Nb               25034   20663   794     3708    0490
5   PP training      9,65    0,87    ,27     ,77     2,5
5   PP development   8,5     3,20    2,86    2,77    2,89
5   PP test          8,3     2,95    2,58    2,48    2,6
5   Nb               25876   9820    78223   50568   33696
8   PP training      5,02    8,96    0,03    ,04     ,63
8   PP development   2,59    3,34    2,52    2,32    2,50
8   PP test          2,57    3,25    2,35    2,03    2,9
8   Nb               88994   56376   7525    67943   4492

Tab 2: This table shows the number of learning parameters (Nb), the perplexity on the training corpus, the perplexity on the development corpus and the perplexity on the test corpus for different values of n and C_0. n is the maximum number of words in a Multiclass sequence and C_0 is the number of occurrences above which a sequence of words is included in the initial inventory of sequences.

The experiments of Table 2 show that the minimum perplexity is obtained for n = 8 and C_0 = 5. Other experiments with n in {7, 8, 9, 10} and C_0 in {4, 5, 6, 7} are reported in Table 3. These experiments (Table 3) show that the minimum perplexity (2,00) on the test corpus is obtained with n = 10 and C_0 = 4.
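The initialization and pruning procedure described above for the Multiclass inventory (count cut-off C_0, probability floor P_0, renormalization) can be sketched in Python as follows. The function names `initial_inventory` and `prune_and_renormalize` and the toy corpus are hypothetical. Two points are assumptions made for this sketch rather than statements from the paper: sequences of length 1 are kept whatever their count, so that any string remains parsable, and the initial relative frequencies are normalized over all retained sequences of all lengths.

```python
from collections import Counter

def prune_and_renormalize(probs, p0):
    """Drop sequences with probability under p0, floor length-1 sequences at p0, renormalize."""
    pruned = {}
    for seq, p in probs.items():
        if p >= p0:
            pruned[seq] = p
        elif len(seq) == 1:
            pruned[seq] = p0          # single classes keep a minimum probability
    z = sum(pruned.values())
    return {seq: p / z for seq, p in pruned.items()}

def initial_inventory(tag_sequences, n_max, c0, p0=5e-6):
    """Initial sequence probabilities from co-occurrence counts up to length n_max."""
    counts = Counter()
    for tags in tag_sequences:
        for k in range(1, n_max + 1):
            for i in range(len(tags) - k + 1):
                counts[tuple(tags[i:i + k])] += 1
    # Discard sequences seen strictly fewer than c0 times
    # (length-1 sequences are always kept: assumption of this sketch).
    kept = {s: c for s, c in counts.items() if len(s) == 1 or c >= c0}
    total = sum(kept.values())
    probs = {s: c / total for s, c in kept.items()}
    return prune_and_renormalize(probs, p0)

# Toy class corpus (hypothetical labels).
inventory = initial_inventory([["A", "B", "A", "B"], ["A", "C"]], n_max=2, c0=2)
print(sorted(inventory.items()))
```

In this toy run, the pairs (B, A) and (A, C) occur only once and are removed by the cut-off C_0 = 2, while the single class C is retained although it also occurs only once. The same `prune_and_renormalize` step could be applied after each reestimation iteration.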
The comparison of the perplexities of the Multiclass and Polyclass models indicates that, from n = 5 and C_0 = 1, the Multiclass is better than the first-order Polyclass (3,46) but gives worse results than the second-order Polyclass (,43). It is important to note that the number of units is of the same order of magnitude for the optimal Multiclass and the second-order Polyclass (about 70000 for Multiclass vs 80000 for second-order Polyclass).
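The class perplexity used throughout this evaluation follows directly from the log-likelihood of the test set. A minimal Python sketch is given below; the function name `class_perplexity` is hypothetical, and the uniform model in the example is only a sanity check, not one of the models evaluated in the paper.

```python
import math

def class_perplexity(log2_prob, num_tags):
    """Class perplexity PP = 2 ** (-(1/T) * log2 P(C)) for a test set of T tags."""
    return 2.0 ** (-log2_prob / num_tags)

# Sanity check: a uniform model over 233 classes assigns 1/233 to every tag,
# so its perplexity should be close to the number of classes.
T = 1000
log2_p = T * math.log2(1.0 / 233)
print(class_perplexity(log2_p, T))
```

Working with log2 P(C) rather than P(C) itself avoids numerical underflow on test sets of realistic size. For a Polyclass model the log-likelihood is a sum of per-tag log probabilities, while for a Multiclass model it is the logarithm of the sum over segmentations of equation (6).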

n    Quantity         C_0=4   C_0=5   C_0=6   C_0=7
7    PP development   2,35    2,37    2,39    2,44
7    PP test          2,09    2,07    2,08    2,
7    Nb               77359   6699    59378   53207
8    PP development   2,28    2,32    2,36    2,4
8    PP test          2,04    2,03    2,04    2,07
8    Nb               78792   67943   60055   5380
9    PP development   2,25    2,29    2,33    2,38
9    PP test          2,02    2,00    2,02    2,06
9    Nb               7866    67820   59957   53600
10   PP development   2,24    2,28    2,32    2,37
10   PP test          2,00    2,00    2,0     2,05
10   Nb               7830    67355   59552   53239

Tab 3: This table shows the number of learning parameters (Nb), the perplexity on the development corpus and the perplexity on the test corpus for n in {7, 8, 9, 10} and C_0 in {4, 5, 6, 7}. n is the maximum number of words in a Multiclass sequence and C_0 is the number of occurrences above which a sequence of words is included in the initial inventory of sequences.

Order   α      β            γ            δ   Nb       PP
1       0      9,98x10^-1   1,29x10^-3   0   9 00     3,46
2       0,98   1,73x10^-2   1,29x10^-3   0   80 000   ,43

Tab 4: This table shows, for each Polyclass model of order 1 and 2, the values of the interpolation weights, the number of learning parameters (Nb) and the class perplexity (PP) on a corpus of 55000 classes.

Table 4 shows the results obtained for a Polyclass model using a history of length 1 (order 1) and 2 (order 2), respectively.

6. CONCLUSION AND PERSPECTIVES

The experiments reported in this paper show that the Multiclass approach is a competitive alternative to the Polyclass (n-class) language model. On our task, the Multiclass language model outperforms, in terms of perplexity, the first-order Polyclass model (2-class interpolated with the 1-class and 0-class), but gives slightly worse results than the second-order Polyclass. In order to improve the Multiclass model, we will study methods for interpolating the sequence probabilities. Another direction consists in assuming dependencies between the sequences of classes, as proposed in [8][9][10]. It also seems interesting to investigate the application of the Multiclass approach to other issues. Indeed, this approach might be advantageously used to filter the lattice or the N-best list of sequences output by a speech recognizer, for instance by supplying information on semantic equivalence between sequences of words.
More generally, it may find applications in the area of language understanding, such as concept tagging based on the labels of phrase classes.

7. REFERENCES

1. F. Bimbot, R. Pieraccini, E. Levin and B. Atal. «Variable-Length Sequence Modeling: Multigrams». IEEE Signal Processing Letters, N 6, Vol. 2, June 1995.
2. K. Smaïli, I. Zitouni, F. Charpillet and J.-P. Haton. «An Hybrid Language Model for a Continuous Dictation Prototype». 5th European Conference on Speech Communication and Technology, pp. 2723-2726, Vol. 5, Rhodes, 1997.
3. F. Jelinek, R. Mercer and S. Roukos. «Principles of Lexical Language Modeling for Speech Recognition». Advances in Signal Processing, S. Furui editor, New York, Marcel Dekker, pp. 65-699, 1992.
4. S. Deligne and F. Bimbot. «Language Modeling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams». ICASSP95, pp. 69-72, 1995.
5. A.P. Dempster, N.M. Laird and D.B. Rubin. «Maximum-likelihood from incomplete data via the EM algorithm». Journal of the Royal Statistical Society, 39(1):1-38, 1977.
6. K. Smaïli, F. Charpillet and J.P. Haton. «A new Algorithm for Word Classification based on an Improved Simulated Annealing Technique». 5th International Conference on the Cognitive Science of Natural Language Processing, Dublin, 1996.
7. F. Jelinek. «Self-organized language modeling for speech recognition». In Readings in Speech Recognition, pp. 450-506, A. Waibel and K.F. Lee editors, Morgan Kaufmann Publishers Inc., San Mateo, California, 1990.
8. I. Zitouni, K. Smaïli and J.P. Haton. «Variable-Length Class Sequences Based on a Hierarchical Approach: MC v». To appear in International Workshop SPEECH and COMPUTER, St Petersburg, 1998.
9. S. Deligne and Y. Sagisaka. «Learning a Syntagmatic and Paradigmatic Structure from Language Data with a bi-multigram Model». Proceedings of COLING-ACL 98, August 1998.
10. S. Deligne, F. Yvon and F. Bimbot. «Introducing statistical dependencies and structural constraints in variable-length sequence models». Lecture Notes in Artificial Intelligence 47, pp. 56-67, Springer, 1996.