Hybrid Selection of Language Model Training Data Using Linguistic Information and Perplexity


Antonio Toral
School of Computing
Dublin City University
Dublin, Ireland
atoral@computing.dcu.ie

Abstract

We explore the selection of training data for language models using perplexity. We introduce three novel models that make use of linguistic information and evaluate them on three different corpora and two languages. In four out of the six scenarios a linguistically motivated method outperforms the purely statistical state-of-the-art approach. Finally, a method which combines surface forms and the linguistically motivated methods outperforms the baseline in all the scenarios, selecting data whose perplexity is between 3.49% and 8.17% (depending on the corpus and language) lower than that of the baseline.

1 Introduction

Language models (LMs) are a fundamental piece in statistical applications that produce natural language text, such as machine translation and speech recognition. In order to perform optimally, an LM should be trained on data from the same domain as the data that it will be applied to. This poses a problem, because in the majority of applications the amount of domain-specific data is limited.

A popular strand of research in recent years to tackle this problem is that of training data selection. Given a limited domain-specific corpus and a larger non-domain-specific corpus, the task consists in finding suitable data for the specific domain in the non-domain-specific corpus. The underlying assumption is that a non-domain-specific corpus, if broad enough, contains sentences similar to those of a domain-specific corpus, which would therefore be useful for training models for that domain.

This paper focuses on the approach that uses perplexity for the selection of training data. The first works in this regard (Gao et al., 2002; Lin et al., 1997) use the perplexity according to a domain-specific LM to rank the text segments (e.g. sentences) of non-domain-specific corpora. The text segments with perplexity less than a given threshold are selected.
A more recent method, which can be considered the state of the art, is Moore-Lewis (Moore and Lewis, 2010). It considers not only the cross-entropy according to the domain-specific LM but also the cross-entropy according to an LM built on a random subset (equal in size to the domain-specific corpus) of the non-domain-specific corpus. (Using cross-entropy is equivalent to using perplexity, since the two are monotonically related.) The additional use of an LM from the non-domain-specific corpus allows the selection of a subset of the non-domain-specific corpus which is better (a test set of the specific domain has lower perplexity on an LM trained on this subset) and smaller compared to the previous approaches. The experiment was carried out for English, using Europarl (Koehn, 2005) as the domain-specific corpus and LDC Gigaword (catalogentry.jsp?catalogid=ldc2007t07) as the non-domain-specific one.

In this paper we study whether the use of two types of linguistic knowledge (lemmas and named entities) can contribute to obtaining better results within the perplexity-based approach.

Proceedings of the Second Workshop on Hybrid Approaches to Translation, pages 8-12, Sofia, Bulgaria, August 8. © 2013 Association for Computational Linguistics

2 Methodology

We explore the use of linguistic information for the selection of data to train domain-specific LMs from non-domain-specific corpora. Our hypothesis is that ranking by perplexity on n-grams that represent linguistic patterns (rather than n-grams that represent surface forms) captures additional information, and thus may select valuable data that is not selected according solely to surface forms.

We use two types of linguistic information at word level: lemmas and named entity categories. We experiment with the following models:

Forms (hereafter f): uses surface forms. This model replicates the Moore-Lewis approach and is to be considered the baseline.
Forms and named entities (hereafter fn): uses surface forms, with the exception of any word detected as a named entity, which is substituted by its type (e.g. person, organisation).
Lemmas (hereafter l): uses lemmas.
Lemmas and named entities (hereafter ln): uses lemmas, with the exception of any word detected as a named entity, which is substituted by its type.

A sample sentence, according to each of these models, follows:

f: I declare resumed the session of the European Parliament
fn: I declare resumed the session of the NP00O00
l: i declare resume the session of the european_parliament
ln: i declare resume the session of the NP00O00

Table 1 shows the number of n-grams in LMs built on the English side of News Commentary v8 (hereafter NC) for each of the models. Regarding 1-grams, compared to f, the substitution of named entities by their categories (fn) results in a smaller vocabulary size (-24.79%). Similarly, the vocabulary is reduced for the models l (-8.39%) and ln ( %). Although not a result in itself, this might be an indication that using linguistically motivated models could be useful to deal with data sparsity.

Table 1: Number of n-grams in LMs built using the different models

Our procedure follows that of the Moore-Lewis method. We build LMs for the domain-specific corpus and for a random subset of the non-domain-specific corpus of the same size (number of sentences) as the domain-specific corpus. Each sentence $s$ in the non-domain-specific corpus is then scored according to equation 1, where $PP_I(s)$ is the perplexity of $s$ according to the domain-specific LM and $PP_O(s)$ is the perplexity of $s$ according to the non-domain-specific LM.

\[ \mathrm{score}(s) = PP_I(s) - PP_O(s) \qquad (1) \]

We build LMs for the domain-specific and non-domain-specific corpora using the four models previously introduced.
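The scoring step of equation 1 can be illustrated with a minimal sketch. It assumes that the score is the difference between the two perplexities and that lower scores mark more domain-like sentences, which are therefore ranked first. The `UnigramLM` class is a toy add-one-smoothed stand-in for the 5-gram LMs used in the experiments; it and the names `score` and `select` are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


class UnigramLM:
    """Toy add-one-smoothed unigram LM (assumption: stand-in for a real n-gram LM)."""

    def __init__(self, sentences):
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        # One extra vocabulary slot reserves probability mass for unseen tokens.
        self.vocab = len(self.counts) + 1

    def perplexity(self, sentence):
        toks = sentence.split()
        log_prob = sum(
            math.log2((self.counts[t] + 1) / (self.total + self.vocab))
            for t in toks
        )
        return 2 ** (-log_prob / max(len(toks), 1))


def score(sentence, lm_in, lm_out):
    """Equation 1 (assumed form): PP_I(s) - PP_O(s); lower means more in-domain."""
    return lm_in.perplexity(sentence) - lm_out.perplexity(sentence)


def select(pool, lm_in, lm_out, fraction):
    """Rank the pool by ascending score and keep the given fraction of it."""
    ranked = sorted(pool, key=lambda s: score(s, lm_in, lm_out))
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


if __name__ == "__main__":
    lm_in = UnigramLM(["the parliament resumed the session",
                       "the session of the parliament"])
    lm_out = UnigramLM(["buy cheap shoes online", "the weather is nice"])
    pool = ["the parliament session", "cheap shoes online"]
    print(select(pool, lm_in, lm_out, fraction=0.5))
```

The same functions apply unchanged to any of the four representations (f, fn, l, ln), since each one only changes the token sequence that the LMs see.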
Then we rank the sentences of the non-domain-specific corpus for each of these models and keep the highest ranked sentences according to a threshold. Finally, we build an LM on the set of sentences selected and compute the perplexity of the test set on this LM. (For the linguistic methods we replace the sentences selected, which contain lemmas and/or named entities, with the corresponding sentences in the original corpus, which contain only word forms.)

We also investigate the combination of the four models. The procedure is fairly straightforward: given the sentences selected by all the models for a given threshold, we iterate through these sentences following the ranking order and keep all the distinct sentences selected until we obtain a set of sentences whose size is the one indicated by the threshold. That is, we add to our set of distinct sentences first the top ranked sentence of each of the methods, then the sentence ranked second by each method, and so on.

3 Experiments

3.1 Setting

We use corpora from the translation task at WMT13 (translation-task.html). Our domain-specific corpus is NC, and we carry out experiments with three non-domain-specific corpora: a subset of Common Crawl (hereafter CC), Europarl version 7 (hereafter EU), and United Nations (Eisele and Chen, 2010) (hereafter UN). We use the test data from WMT12 (newstest2012) as our test set. We carry out experiments on two languages for which these corpora are available: English (referred to as en in tables) and Spanish (es in tables).

We test the methods on three very different non-domain-specific corpora, both in terms of the topics that they cover (text crawled from the web in CC, parliamentary speeches in EU and official documents from the United Nations in UN) and their size (around 2 million sentences both for CC and EU, and around 11 million for UN). This can be considered a contribution of this paper, since previous works such as Moore and Lewis (2010) and, more recently, Axelrod et al. (2011) test the Moore-Lewis method on only one non-domain-specific corpus: LDC Gigaword and an unpublished general-domain corpus, respectively.

All the LMs are built with IRSTLM (Federico et al., 2008), use up to 5-grams and are smoothed using a simplified version of the improved Kneser-Ney method (Chen and Goodman, 1996). For lemmatisation and named entity recognition we use FreeLing 3.0 (Padró and Stanilovsky, 2012). The corpora are tokenised and truecased using scripts from the Moses toolkit (Koehn et al., 2007).

3.2 Experiments with Different Models

Figures 1, 2 and 3 show the perplexities obtained by each method on different subsets selected from the English corpora CC, EU and UN, respectively. We obtain these subsets according to different thresholds, i.e. percentages of sentences selected from the non-domain-specific corpus. These are the first 1/64 ranked sentences, 1/32, 1/16, 1/8, 1/4, 1/2 and 1. (An additional threshold, 1/128, is used for the United Nations corpus.) Corresponding figures for Spanish are omitted due to the limited space available and also because the trends in those figures are very similar.

Figure 1: Results of the different methods on CC
Figure 2: Results of the different methods on EU
Figure 3: Results of the different methods on UN

In all the figures, the results are very similar regardless of the use of lemmas. The use of named entities, however, produces substantially different results. The models that do not use named entity categories obtain the best results for lower thresholds (up to 1/32 for CC, and up to 1/16 both for EU and UN). If the best perplexity is obtained with a lower threshold than this (the case of EU, 1/32, and UN, 1/64), then methods that do not use named entities obtain the best result.
However, if the optimal perplexity is obtained with a higher threshold (the case of CC, 1/2), then using named entities yields the best result.

Table 2 presents the results for each model. For each scenario (corpus and language combination), we show the threshold for which the best result is obtained (column best). The perplexity obtained on data selected by each model is shown in the subsequent columns. For the linguistic methods, we also show the comparison of their performance to the baseline (as percentages, columns diff). The perplexity when using the full corpus is shown (column full) together with the comparison of this result to the best method (last column diff).

Table 2: Results for the different models (columns: corpus, best, f, fn, diff, l, diff, ln, diff, full, diff; rows: cc en, eu en, un en, cc es, eu es, un es)

The results, as previously seen in Figures 1, 2 and 3, differ with respect to the corpus but follow similar trends across languages. For CC we obtain the best results using named entities. The model ln obtains the best result for English (5.54% lower perplexity than the baseline), while the model fn obtains the best result for Spanish (3.82%), although in both cases the difference between these two models is rather small.

For the other corpora, the best results are obtained without named entities. In the case of EU, the baseline obtains the best result, although the model l is not far behind (1.18% higher perplexity for English and 1.63% for Spanish). This trend is reversed for UN, where the model l obtains the best scores but remains close to the baseline (-0.51%, -0.35%).

3.3 Experiments with the Combination of Models

Table 3 shows the perplexities obtained by the method that combines the four models (column comb) for the threshold that yielded the best result in each scenario (see Table 2), compares these results (column diff) to those obtained by the baseline (column f) and shows the percentage of sentences that this method inspected from the sentences selected by the individual methods (column perc).

Table 3: Results of the combination method (columns: corpus, f, comb, diff, perc; rows: cc en, eu en, un en, cc es, eu es, un es)

The combination method outperforms the baseline and any of the individual linguistic models in all the scenarios. The perplexity obtained by combining the models is substantially lower than that obtained by the baseline (ranging from 3.49% to 8.17%). In all the scenarios, the combination method takes its sentences from roughly the top 70% of the sentences ranked by the individual methods.

4 Conclusions and Future Work

This paper has explored the use of linguistic information (lemmas and named entities) for the task of training data selection for LMs. We have introduced three linguistically motivated models, and compared them to the state-of-the-art method for perplexity-based data selection across three different corpora and two languages. In four out of these six scenarios a linguistically motivated method outperforms the state-of-the-art approach.
We have also presented a method which combines surface forms and the three linguistically motivated methods. This combination outperforms the baseline in all the scenarios, selecting data whose perplexity is between 3.49% and 8.17% (depending on the corpus and language) lower than that of the baseline.

Regarding future work, we have several plans. One interesting experiment would be to apply these models to a morphologically rich language, to check if, as hypothesised, these models deal better with sparse data.

Another strand regards the application of these models to filter parallel corpora, e.g. following the extension of the Moore-Lewis method (Axelrod et al., 2011) or in combination with other methods which are deemed to be more suitable for parallel data, e.g. (Mansour et al., 2011).

We have used one type of linguistic information in each LM, but another possibility is to combine different pieces of linguistic information in a single LM, e.g. following a hybrid LM that uses words and tags, depending on the frequency of each type (Ruiz et al., 2012).

Given the fact that the best result is obtained with different models depending on the corpus, it would be worth investigating whether, given a new corpus, one could predict the best method to apply and the threshold at which one could expect to obtain the minimum perplexity.
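The combination procedure described in Section 2 can be sketched as a round-robin merge over the per-model rankings. This is only an illustration under stated assumptions: the function name `combine` is hypothetical, each ranking is assumed to hold original word-form sentences ordered from best to worst, and the order in which models are visited at a given rank is an arbitrary choice that the text does not specify.

```python
def combine(rankings, size):
    """Round-robin merge of ranked sentence lists, keeping distinct sentences.

    rankings: list of lists, one per model, best-ranked sentence first.
    size: number of sentences indicated by the threshold.
    """
    if size <= 0:
        return []
    selected, seen = [], set()
    depth = max((len(r) for r in rankings), default=0)
    for rank in range(depth):
        # Visit the sentence at this rank for every model before moving on.
        for ranking in rankings:
            if rank >= len(ranking):
                continue
            sentence = ranking[rank]
            if sentence in seen:
                continue
            seen.add(sentence)
            selected.append(sentence)
            if len(selected) == size:
                return selected
    return selected


if __name__ == "__main__":
    per_model = [["a", "b", "c"], ["b", "a", "d"], ["a", "c", "e"]]
    print(combine(per_model, size=3))
```

Because duplicates are skipped, the merge usually stops before reaching the bottom of the individual rankings, which is consistent with the observation that the combination draws on only part of the sentences selected by each method.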

Acknowledgments

We would like to thank Raphaël Rubino for insightful conversations. The research leading to these results has received funding from the European Union Seventh Framework Programme FP7/ under grant agreements PIAP-GA and FP7-ICT.

References

Amittai Axelrod, Xiaodong He, and Jianfeng Gao. 2011. Domain adaptation via pseudo in-domain data selection. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, Stroudsburg, PA, USA. Association for Computational Linguistics.

Stanley F. Chen and Joshua Goodman. 1996. An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th annual meeting on Association for Computational Linguistics, ACL '96, Stroudsburg, PA, USA. Association for Computational Linguistics.

Andreas Eisele and Yu Chen. 2010. MultiUN: A multilingual corpus from United Nation documents. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, LREC. European Language Resources Association.

Marcello Federico, Nicola Bertoldi, and Mauro Cettolo. 2008. IRSTLM: an open source toolkit for handling large scale language models. In INTERSPEECH. ISCA.

Jianfeng Gao, Joshua Goodman, Mingjing Li, and Kai-Fu Lee. 2002. Toward a unified approach to statistical language modeling for Chinese. 1(1):3-33, March.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, Stroudsburg, PA, USA. Association for Computational Linguistics.

Philipp Koehn. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Conference Proceedings: the tenth Machine Translation Summit, pages 79-86, Phuket, Thailand. AAMT.

Sung-Chien Lin, Chi-Lung Tsai, Lee-Feng Chien, Keh-Jiann Chen, and Lin-Shan Lee. 1997. Chinese language model adaptation based on document classification and multiple domain-specific language models. In George Kokkinakis, Nikos Fakotakis, and Evangelos Dermatas, editors, EUROSPEECH. ISCA.

Saab Mansour, Joern Wuebker, and Hermann Ney. 2011. Combining translation and language model scoring for domain-specific data filtering. In International Workshop on Spoken Language Translation, San Francisco, California, USA, December.

Robert C. Moore and William Lewis. 2010. Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers, ACLShort '10, Stroudsburg, PA, USA. Association for Computational Linguistics.

Lluís Padró and Evgeny Stanilovsky. 2012. FreeLing 3.0: Towards wider multilinguality. In Proceedings of the Language Resources and Evaluation Conference (LREC 2012), Istanbul, Turkey, May. ELRA.

Nick Ruiz, Arianna Bisazza, Roldano Cattoni, and Marcello Federico. 2012. FBK's Machine Translation Systems for IWSLT 2012's TED Lectures. In Proceedings of the 9th International Workshop on Spoken Language Translation (IWSLT).


More information

Avaya Remote Feature Activation (RFA) User Guide

Avaya Remote Feature Activation (RFA) User Guide Avaya Remote Feature Activation (RFA) User Guide 03-300149 Issue 5.0 September 2007 2007 Avaya Inc. A Rights Reserved. Notice Whie reasonabe efforts were made to ensure that the information in this document

More information

An Online Service for SUbtitling by MAchine Translation

An Online Service for SUbtitling by MAchine Translation SUMAT CIP-ICT-PSP-270919 An Online Service for SUbtitling by MAchine Translation Annual Public Report 2012 Editor(s): Contributor(s): Reviewer(s): Status-Version: Arantza del Pozo Mirjam Sepesy Maucec,

More information

Load Balancing in Distributed Web Server Systems with Partial Document Replication *

Load Balancing in Distributed Web Server Systems with Partial Document Replication * Load Baancing in Distributed Web Server Systems with Partia Document Repication * Ling Zhuo Cho-Li Wang Francis C. M. Lau Department of Computer Science and Information Systems The University of Hong Kong

More information

GWPD 4 Measuring water levels by use of an electric tape

GWPD 4 Measuring water levels by use of an electric tape GWPD 4 Measuring water eves by use of an eectric tape VERSION: 2010.1 PURPOSE: To measure the depth to the water surface beow and-surface datum using the eectric tape method. Materias and Instruments 1.

More information

CUSTOM. Putting Your Benefits to Work. COMMUNICATIONS. Employee Communications Benefits Administration Benefits Outsourcing

CUSTOM. Putting Your Benefits to Work. COMMUNICATIONS. Employee Communications Benefits Administration Benefits Outsourcing CUSTOM COMMUNICATIONS Putting Your Benefits to Work. Empoyee Communications Benefits Administration Benefits Outsourcing Recruiting and retaining top taent is a major chaenge facing HR departments today.

More information

Secure Network Coding with a Cost Criterion

Secure Network Coding with a Cost Criterion Secure Network Coding with a Cost Criterion Jianong Tan, Murie Médard Laboratory for Information and Decision Systems Massachusetts Institute of Technoogy Cambridge, MA 0239, USA E-mai: {jianong, medard}@mit.edu

More information

Business schools are the academic setting where. The current crisis has highlighted the need to redefine the role of senior managers in organizations.

Business schools are the academic setting where. The current crisis has highlighted the need to redefine the role of senior managers in organizations. c r o s os r oi a d s REDISCOVERING THE ROLE OF BUSINESS SCHOOLS The current crisis has highighted the need to redefine the roe of senior managers in organizations. JORDI CANALS Professor and Dean, IESE

More information


PENALTY TAXES ON CORPORATE ACCUMULATIONS H Chapter Six H PENALTY TAXES ON CORPORATE ACCUMULATIONS INTRODUCTION AND STUDY OBJECTIVES The accumuated earnings tax and the persona hoding company tax are penaty taxes designed to prevent taxpayers

More information

Automatic slide assignation for language model adaptation

Automatic slide assignation for language model adaptation Automatic slide assignation for language model adaptation Applications of Computational Linguistics Adrià Agustí Martínez Villaronga May 23, 2013 1 Introduction Online multimedia repositories are rapidly

More information

Cache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation

Cache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation Cache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation Nicola Bertoldi Mauro Cettolo Marcello Federico FBK - Fondazione Bruno Kessler via Sommarive 18 38123 Povo,

More information

Pricing and hedging of variable annuities

Pricing and hedging of variable annuities Cutting Edge Pricing and hedging of variabe annuities Variabe annuity products are unit-inked investments with some form of guarantee, traditionay sod by insurers or banks into the retirement and investment

More information

Oligopoly in Insurance Markets

Oligopoly in Insurance Markets Oigopoy in Insurance Markets June 3, 2008 Abstract We consider an oigopoistic insurance market with individuas who differ in their degrees of accident probabiities. Insurers compete in coverage and premium.

More information

This paper considers an inventory system with an assembly structure. In addition to uncertain customer

This paper considers an inventory system with an assembly structure. In addition to uncertain customer MANAGEMENT SCIENCE Vo. 51, No. 8, August 2005, pp. 1250 1265 issn 0025-1909 eissn 1526-5501 05 5108 1250 informs doi 10.1287/mnsc.1050.0394 2005 INFORMS Inventory Management for an Assemby System wh Product

More information

Certificate in Contemporary Music 2016 For International Applicants

Certificate in Contemporary Music 2016 For International Applicants Certificate in Contemporary Music 2016 For Internationa Appicants Quaification Certificate in Contemporary Music Performance Programme eve: Leve 4 Length: Start dates: Study options: One year 15 February

More information

Technical Support Guide for online instrumental lessons

Technical Support Guide for online instrumental lessons Technica Support Guide for onine instrumenta essons This is a technica guide for Music Education Hubs, Schoos and other organisations participating in onine music essons. The guidance is based on the technica

More information

How to Cut Health Care Costs

How to Cut Health Care Costs How to Cut Heath Care Costs INSIDE: TEN TIPS FOR MEDICARE BENEFICIARIES What is one of the biggest financia surprises in retirement? Heath care costs. It s a growing concern among many Medicare beneficiaries,

More information



More information

Views of black trainee accountants in South Africa on matters related to a career as a chartered accountant

Views of black trainee accountants in South Africa on matters related to a career as a chartered accountant Views of back trainee accountants in South Africa on matters reated to a career as a chartered accountant ESader Department of Appied Accountancy University of South Africa BJErasmus Department of Business

More information

Teamwork. Abstract. 2.1 Overview

Teamwork. Abstract. 2.1 Overview 2 Teamwork Abstract This chapter presents one of the basic eements of software projects teamwork. It addresses how to buid teams in a way that promotes team members accountabiity and responsibiity, and

More information

Protection Against Income Loss During the First 4 Months of Illness or Injury *

Protection Against Income Loss During the First 4 Months of Illness or Injury * Protection Against Income Loss During the First 4 Months of Iness or Injury * This note examines and describes the kinds of income protection that are avaiabe to workers during the first 6 months of iness

More information

ONE of the most challenging problems addressed by the

ONE of the most challenging problems addressed by the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 44, NO. 9, SEPTEMBER 2006 2587 A Mutieve Context-Based System for Cassification of Very High Spatia Resoution Images Lorenzo Bruzzone, Senior Member,

More information

Hedge Fund Capital Accounts and Revaluations: Are They Section 704(b) Compliant?

Hedge Fund Capital Accounts and Revaluations: Are They Section 704(b) Compliant? o EDITED BY ROGER F. PILLOW, LL.M. PARTNERSHIPS, S CORPORATIONS & LLCs Hedge Fund Capita Accounts and Revauations: Are They Section 704(b) Compiant? THOMAS GRAY Hedge funds treated as partnerships for

More information

LIUM s Statistical Machine Translation System for IWSLT 2010

LIUM s Statistical Machine Translation System for IWSLT 2010 LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,

More information


FIRST BANK OF MANHATTAN MORTGAGE LOAN ORIGINATORS NMLS ID #405508 ITEMS TO BE SUBMITTED WITH HOME EQUITY LOAN APPLICATION Bring In: Pay stubs from the ast 30 days W-2 s and Tax Returns from the ast 2 years Bank Statements from ast 2 months (A Pages) Copy of Homeowner

More information

With the arrival of Java 2 Micro Edition (J2ME) and its industry

With the arrival of Java 2 Micro Edition (J2ME) and its industry Knowedge-based Autonomous Agents for Pervasive Computing Using AgentLight Fernando L. Koch and John-Jues C. Meyer Utrecht University Project AgentLight is a mutiagent system-buiding framework targeting

More information

Vacancy Rebate Supporting Documentation Checklist

Vacancy Rebate Supporting Documentation Checklist Vacancy Rebate Supporting Documentation Checkist The foowing documents are required and must accompany the vacancy rebate appication at the time of submission. If the vacancy is a continuation from the

More information

effect on major accidents

effect on major accidents An Investigation into a weekend (or bank hoiday) effect on major accidents Nicoa C. Heaey 1 and Andrew G. Rushton 2 1 Heath and Safety Laboratory, Harpur Hi, Buxton, Derbyshire, SK17 9JN 2 Hazardous Instaations

More information

Hybrid Machine Translation Guided by a Rule Based System

Hybrid Machine Translation Guided by a Rule Based System Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the

More information

Human Capital & Human Resources Certificate Programs

Human Capital & Human Resources Certificate Programs MANAGEMENT CONCEPTS Human Capita & Human Resources Certificate Programs Programs to deveop functiona and strategic skis in: Human Capita // Human Resources ENROLL TODAY! Contract Hoder Contract GS-02F-0010J

More information

your statement of insurance

your statement of insurance your statement of insurance Schoo - Winter Trave Insurance poicyhoder: STG issued on: 1st February 2013 poicy number: NS9 0001313 reason for issue: new business This Statement of Insurance forms part of

More information

WHITE PAPER BEsT PRAcTIcEs: PusHIng ExcEl BEyond ITs limits WITH InfoRmATIon optimization

WHITE PAPER BEsT PRAcTIcEs: PusHIng ExcEl BEyond ITs limits WITH InfoRmATIon optimization Best Practices: Pushing Exce Beyond Its Limits with Information Optimization WHITE Best Practices: Pushing Exce Beyond Its Limits with Information Optimization Executive Overview Microsoft Exce is the

More information

Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation

Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation Joern Wuebker M at thias Huck Stephan Peitz M al te Nuhn M arkus F reitag Jan-Thorsten Peter Saab M ansour Hermann N e

More information

The Web Insider... The Best Tool for Building a Web Site *

The Web Insider... The Best Tool for Building a Web Site * The Web Insider... The Best Too for Buiding a Web Site * Anna Bee Leiserson ** Ms. Leiserson describes the types of Web-authoring systems that are avaiabe for buiding a site and then discusses the various

More information


READING A CREDIT REPORT Name Date CHAPTER 6 STUDENT ACTIVITY SHEET READING A CREDIT REPORT Review the sampe credit report. Then search for a sampe credit report onine, print it off, and answer the questions beow. This activity

More information

Chapter 3: JavaScript in Action Page 1 of 10. How to practice reading and writing JavaScript on a Web page

Chapter 3: JavaScript in Action Page 1 of 10. How to practice reading and writing JavaScript on a Web page Chapter 3: JavaScript in Action Page 1 of 10 Chapter 3: JavaScript in Action In this chapter, you get your first opportunity to write JavaScript! This chapter introduces you to JavaScript propery. In addition,

More information

Design of Follow-Up Experiments for Improving Model Discrimination and Parameter Estimation

Design of Follow-Up Experiments for Improving Model Discrimination and Parameter Estimation Design of Foow-Up Experiments for Improving Mode Discrimination and Parameter Estimation Szu Hui Ng 1 Stephen E. Chick 2 Nationa University of Singapore, 10 Kent Ridge Crescent, Singapore 119260. Technoogy

More information

Books on Reference and the Problem of Library Science

Books on Reference and the Problem of Library Science Practicing Reference... Learning from Library Science * Mary Whisner ** Ms. Whisner describes the method and some of the resuts reported in a recenty pubished book about the reference interview written

More information

Migrating and Managing Dynamic, Non-Textua Content

Migrating and Managing Dynamic, Non-Textua Content Considering Dynamic, Non-Textua Content when Migrating Digita Asset Management Systems Aya Stein; University of Iinois at Urbana-Champaign; Urbana, Iinois USA Santi Thompson; University of Houston; Houston,

More information

The KIT Translation system for IWSLT 2010

The KIT Translation system for IWSLT 2010 The KIT Translation system for IWSLT 2010 Jan Niehues 1, Mohammed Mediani 1, Teresa Herrmann 1, Michael Heck 2, Christian Herff 2, Alex Waibel 1 Institute of Anthropomatics KIT - Karlsruhe Institute of

More information

Who Benefits From Social Health Insurance in Developing Countries?

Who Benefits From Social Health Insurance in Developing Countries? Who Benefits From Socia Heath Insurance in Deveoping Countries? Pau Gerter University of Caifornia at Bereey and NBER Orvie Soon University of the Phiippines, Schoo of Economics March, 2000 Abstract A

More information

Breakeven analysis and short-term decision making

Breakeven analysis and short-term decision making Chapter 20 Breakeven anaysis and short-term decision making REAL WORLD CASE This case study shows a typica situation in which management accounting can be hepfu. Read the case study now but ony attempt

More information

SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统

SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统 SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande

More information

Annual Notice of Changes for 2016

Annual Notice of Changes for 2016 Easy Choice Best Pan (HMO) offered by Easy Choice Heath Pan, Inc. Annua Notice of Changes for 2016 You are currenty enroed as a member of Easy Choice Best Pan (HMO). Next year, there wi be some changes

More information

A Branch-and-Price Algorithm for Parallel Machine Scheduling with Time Windows and Job Priorities

A Branch-and-Price Algorithm for Parallel Machine Scheduling with Time Windows and Job Priorities A Branch-and-Price Agorithm for Parae Machine Scheduing with Time Windows and Job Priorities Jonathan F. Bard, 1 Siwate Rojanasoonthon 2 1 Graduate Program in Operations Research and Industria Engineering,

More information

The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58. Ncode: an Open Source Bilingual N-gram SMT Toolkit

The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58. Ncode: an Open Source Bilingual N-gram SMT Toolkit The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58 Ncode: an Open Source Bilingual N-gram SMT Toolkit Josep M. Crego a, François Yvon ab, José B. Mariño c c a LIMSI-CNRS, BP 133,

More information

The definition of insanity is doing the same thing over and over again and expecting different results

The definition of insanity is doing the same thing over and over again and expecting different results insurance services Sma Business Insurance a market opportunity being missed Einstein may not have known much about insurance, but if you appy his definition to the way existing brands are deveoping their

More information

Vendor Performance Measurement Using Fuzzy Logic Controller

Vendor Performance Measurement Using Fuzzy Logic Controller The Journa of Mathematics and Computer Science Avaiabe onine at http://www.tjmcs.com The Journa of Mathematics and Computer Science Vo.2 No.2 (2011) 311-318 Performance Measurement Using Fuzzy Logic Controer

More information

LT Codes-based Secure and Reliable Cloud Storage Service

LT Codes-based Secure and Reliable Cloud Storage Service 2012 Proceedings IEEE INFOCOM LT Codes-based Secure and Reiabe Coud Storage Service Ning Cao Shucheng Yu Zhenyu Yang Wenjing Lou Y. Thomas Hou Worcester Poytechnic Institute, Worcester, MA, USA University

More information