Hybrid Selection of Language Model Training Data Using Linguistic Information and Perplexity
Antonio Toral
School of Computing, Dublin City University, Dublin, Ireland
atoral@computing.dcu.ie

Abstract

We explore the selection of training data for language models using perplexity. We introduce three novel models that make use of linguistic information and evaluate them on three different corpora and two languages. In four out of the six scenarios a linguistically motivated method outperforms the purely statistical state-of-the-art approach. Finally, a method which combines surface forms and the linguistically motivated methods outperforms the baseline in all the scenarios, selecting data whose perplexity is between 3.49% and 8.17% (depending on the corpus and language) lower than that of the baseline.

1 Introduction

Language models (LMs) are a fundamental piece in statistical applications that produce natural language text, such as machine translation and speech recognition. In order to perform optimally, a LM should be trained on data from the same domain as the data that it will be applied to. This poses a problem, because in the majority of applications, the amount of domain-specific data is limited.

A popular strand of research in recent years to tackle this problem is that of training data selection. Given a limited domain-specific corpus and a larger non-domain-specific corpus, the task consists of finding suitable data for the specific domain in the non-domain-specific corpus. The underlying assumption is that a non-domain-specific corpus, if broad enough, contains sentences similar to a domain-specific corpus, which would therefore be useful for training models for that domain.

This paper focuses on the approach that uses perplexity for the selection of training data. The first works in this regard (Gao et al., 2002; Lin et al., 1997) use the perplexity according to a domain-specific LM to rank the text segments (e.g. sentences) of non-domain-specific corpora. The text segments with perplexity less than a given threshold are selected.
A more recent method, which can be considered the state of the art, is Moore-Lewis (Moore and Lewis, 2010). It considers not only the cross-entropy [1] according to the domain-specific LM but also the cross-entropy according to a LM built on a random subset (equal in size to the domain-specific corpus) of the non-domain-specific corpus. The additional use of a LM from the non-domain-specific corpus allows the selection of a subset of the non-domain-specific corpus which is better (a test set of the specific domain has lower perplexity on a LM trained on this subset) and smaller compared to the previous approaches. The experiment was carried out for English, using Europarl (Koehn, 2005) as the domain-specific corpus and LDC Gigaword [2] as the non-domain-specific one.

In this paper we study whether the use of two types of linguistic knowledge (lemmas and named entities) can contribute to obtaining better results within the perplexity-based approach.

[1] Note that using cross-entropy is equivalent to using perplexity since they are monotonically related.
[2] catalogentry.jsp?catalogid=ldc2007t07

Proceedings of the Second Workshop on Hybrid Approaches to Translation, pages 8-12, Sofia, Bulgaria, August 8, 2013. © 2013 Association for Computational Linguistics

2 Methodology

We explore the use of linguistic information for the selection of data to train domain-specific LMs from non-domain-specific corpora. Our hypothesis is that ranking by perplexity on n-grams that represent linguistic patterns (rather than n-grams that represent surface forms) captures additional information, and thus may select valuable data that is not selected according solely to surface forms.

We use two types of linguistic information at word level: lemmas and named entity categories. We experiment with the following models:

- Forms (hereafter f): uses surface forms. This model replicates the Moore-Lewis approach and is to be considered the baseline.
- Forms and named entities (hereafter fn): uses surface forms, with the exception of any word detected as a named entity, which is substituted by its type (e.g. person, organisation).
- Lemmas (hereafter l): uses lemmas.
- Lemmas and named entities (hereafter ln): uses lemmas, with the exception of any word detected as a named entity, which is substituted by its type.

A sample sentence, according to each of these models, follows:

f: I declare resumed the session of the European Parliament
fn: I declare resumed the session of the NP00O00
l: i declare resume the session of the european_parliament
ln: i declare resume the session of the NP00O00

Table 1 shows the number of n-grams on LMs built on the English side of News Commentary v8 (hereafter NC) for each of the models. Regarding 1-grams, compared to f, the substitution of named entities by their categories (fn) results in a smaller vocabulary size (-24.79%). Similarly, the vocabulary is reduced for the models l (-8.39%) and ln ( %). Although not a result in itself, this might be an indication that using linguistically motivated models could be useful to deal with data sparsity.

Table 1: Number of n-grams in LMs built using the different models

Our procedure follows that of the Moore-Lewis method. We build LMs for the domain-specific corpus and for a random subset of the non-domain-specific corpus of the same size (number of sentences) as the domain-specific corpus. Each sentence $s$ in the non-domain-specific corpus is then scored according to equation 1, where $PP_I(s)$ is the perplexity of $s$ according to the domain-specific LM and $PP_O(s)$ is the perplexity of $s$ according to the non-domain-specific LM.

$$\mathrm{score}(s) = PP_I(s) - PP_O(s) \tag{1}$$

We build LMs for the domain-specific and non-domain-specific corpora using the four models previously introduced.
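As an illustration of the scoring step, the following minimal Python sketch ranks candidate sentences by the score of equation 1. It is not the setup used in the experiments: a toy add-one-smoothed unigram model stands in for the 5-gram LMs, the sentences are invented, and the function names (train_unigram, perplexity, rank_by_score, select_top) are hypothetical. It assumes that equation 1 is a difference (in-domain perplexity minus out-of-domain perplexity), as in Moore-Lewis, so that the lowest scores are ranked first.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Toy unigram LM: token counts plus the total token count."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return counts, sum(counts.values())


def perplexity(sentence, model, vocab_size):
    """Per-token perplexity of a non-empty sentence under add-one smoothing."""
    counts, total = model
    tokens = sentence.split()
    log_prob = sum(
        math.log((counts[tok] + 1) / (total + vocab_size)) for tok in tokens
    )
    return math.exp(-log_prob / len(tokens))


def rank_by_score(in_domain, out_sample, pool):
    """Sort pool sentences by PP_I(s) - PP_O(s), lowest (most in-domain) first."""
    vocab = {tok for sent in in_domain + out_sample + pool for tok in sent.split()}
    lm_in = train_unigram(in_domain)    # domain-specific LM
    lm_out = train_unigram(out_sample)  # LM on a sample of the other corpus
    scored = [
        (perplexity(s, lm_in, len(vocab)) - perplexity(s, lm_out, len(vocab)), s)
        for s in pool
    ]
    return sorted(scored)


def select_top(ranked, fraction):
    """Keep the given fraction of the ranked sentences (at least one)."""
    keep = max(1, round(fraction * len(ranked)))
    return [sentence for _, sentence in ranked[:keep]]


# Invented toy data, for illustration only.
in_domain = ["the parliament resumed the session", "the session of the parliament"]
out_sample = ["buy cheap shoes online", "cheap shoes and cheap bags"]
pool = [
    "the parliament session resumed",
    "cheap bags online",
    "the shoes of the parliament",
]

ranked = rank_by_score(in_domain, out_sample, pool)
print(select_top(ranked, 1 / 3))
```

The same functions apply unchanged to the linguistic models: only the input changes, with each sentence first rewritten as lemmas and/or named entity categories.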
Then we rank the sentences of the non-domain-specific corpus for each of these models and keep the highest ranked sentences according to a threshold. Finally, we build a LM on the set of sentences selected [3] and compute the perplexity of the test set on this LM.

We also investigate the combination of the four models. The procedure is fairly straightforward: given the sentences selected by all the models for a given threshold, we iterate through these sentences following the ranking order and keeping all the distinct sentences selected until we obtain a set of sentences whose size is the one indicated by the threshold. I.e. we add to our distinct set of sentences first the top ranked sentence by each of the methods, then the sentence ranked second by each method, and so on.

[3] For the linguistic methods we replace the sentences selected (which contain lemmas and/or named entities) with the corresponding sentences in the original corpus (containing only word forms).

3 Experiments

3.1 Setting

We use corpora from the translation task at WMT13. [4] Our domain-specific corpus is NC, and we carry out experiments with three non-domain-specific corpora: a subset of Common Crawl (hereafter CC), Europarl version 7 (hereafter EU), and United Nations (Eisele and Chen, 2010) (hereafter UN). We use the test data from WMT12 (newstest2012) as our test set. We carry out experiments on two languages for which these corpora are available: English (referred to as en in tables) and Spanish (es in tables).

We test the methods on three very different non-domain-specific corpora, both in terms of the topics that they cover (text crawled from the web in CC, parliamentary speeches in EU and official documents from the United Nations in UN) and their size (around 2 million sentences both for CC and EU, and around 11 million for UN). This can be considered as a contribution of this paper since previous works such as Moore and Lewis (2010) and, more recently, Axelrod et al. (2011) test the Moore-Lewis method on only one non-domain-specific corpus: LDC Gigaword and an unpublished general-domain corpus, respectively.

All the LMs are built with IRSTLM (Federico et al., 2008), use up to 5-grams and are smoothed using a simplified version of the improved Kneser-Ney method (Chen and Goodman, 1996). For lemmatisation and named entity recognition we use Freeling 3.0 (Padró and Stanilovsky, 2012). The corpora are tokenised and truecased using scripts from the Moses toolkit (Koehn et al., 2007).

[4] translation-task.html

3.2 Experiments with Different Models

Figures 1, 2 and 3 show the perplexities obtained by each method on different subsets selected from the English corpora CC, EU and UN, respectively. We obtain these subsets according to different thresholds, i.e. percentages of sentences selected from the non-domain-specific corpus. These are the first 1/64 ranked sentences, 1/32, 1/16, 1/8, 1/4, 1/2 and 1. [6] Corresponding figures for Spanish are omitted due to the limited space available and also because the trends in those figures are very similar.

[6] An additional threshold, 1/128, is used for the United Nations corpus.

Figure 1: Results of the different methods on CC
Figure 2: Results of the different methods on EU
Figure 3: Results of the different methods on UN

In all the figures, the results are very similar regardless of the use of lemmas. The use of named entities, however, produces substantially different results. The models that do not use named entity categories obtain the best results for lower thresholds (up to 1/32 for CC, and up to 1/16 both for EU and UN). If the best perplexity is obtained with a lower threshold than this (the case of EU, 1/32, and UN, 1/64), then methods that do not use named entities obtain the best result.
However, if the optimal perplexity is obtained with a higher threshold (the case of CC, 1/2), then using named entities yields the best result.

Table 2 presents the results for each model. For each scenario (corpus and language combination), we show the threshold for which the best result is obtained (column best). The perplexity obtained on data selected by each model is shown in the subsequent columns. For the linguistic methods, we also show the comparison of their performance to the baseline (as percentages, columns diff). The perplexity when using the full corpus is shown (column full) together with the comparison of this result to the best method (last column diff).

Table 2: Results for the different models (columns: corpus, best, f, fn, diff, l, diff, ln, diff, full, diff; rows: cc en, eu en, un en, cc es, eu es, un es)

The results, as previously seen in Figures 1, 2 and 3, differ with respect to the corpus but follow similar trends across languages. For CC we obtain the best results using named entities. The model ln obtains the best result for English (5.54% lower perplexity than the baseline), while the model fn obtains the best result for Spanish (3.82%), although in both cases the difference between these two models is rather small. For the other corpora, the best results are obtained without named entities. In the case of EU, the baseline obtains the best result, although the model l is not very far (1.18% higher perplexity for English and 1.63% for Spanish). This trend is reversed for UN, the model l obtaining the best scores but close to the baseline (-0.51%, -0.35%).

3.3 Experiments with the Combination of Models

Table 3 shows the perplexities obtained by the method that combines the four models (column comb) for the threshold that yielded the best result in each scenario (see Table 2), compares these results (column diff) to those obtained by the baseline (column f) and shows the percentage of sentences that this method inspected from the sentences selected by the individual methods (column perc).

Table 3: Results of the combination method (columns: corpus, f, comb, diff, perc; rows: cc en, eu en, un en, cc es, eu es, un es)

The combination method outperforms the baseline and any of the individual linguistic models in all the scenarios. The perplexity obtained by combining the models is substantially lower than that obtained by the baseline (ranging from 3.49% to 8.17%). In all the scenarios, the combination method takes its sentences from roughly the top 70% sentences ranked by the individual methods.

4 Conclusions and Future Work

This paper has explored the use of linguistic information (lemmas and named entities) for the task of training data selection for LMs. We have introduced three linguistically motivated models, and compared them to the state-of-the-art method for perplexity-based data selection across three different corpora and two languages. In four out of these six scenarios a linguistically motivated method outperforms the state-of-the-art approach.
We have also presented a method which combines surface forms and the three linguistically motivated methods. This combination outperforms the baseline in all the scenarios, selecting data whose perplexity is between 3.49% and 8.17% (depending on the corpus and language) lower than that of the baseline.

Regarding future work, we have several plans. One interesting experiment would be to apply these models to a morphologically-rich language, to check if, as hypothesised, these models deal better with sparse data. Another strand regards the application of these models to filter parallel corpora, e.g. following the extension of the Moore-Lewis method (Axelrod et al., 2011) or in combination with other methods which are deemed to be more suitable for parallel data, e.g. (Mansour et al., 2011).

We have used one type of linguistic information in each LM, but another possibility is to combine different pieces of linguistic information in a single LM, e.g. following a hybrid LM that uses words and tags, depending on the frequency of each type (Ruiz et al., 2012).

Given the fact that the best result is obtained with different models depending on the corpus, it would be worth investigating whether, given a new corpus, one could predict the best method to be applied and the threshold for which one could expect to obtain the minimum perplexity.
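The combination procedure described in Section 2 (take the top ranked sentence of each model, then the sentence ranked second by each model, and so on, keeping distinct sentences until the threshold size is reached) can be sketched as a round-robin merge. This is a minimal sketch and not the author's implementation: the function name combine_rankings, the use of sentence identifiers, and the order in which the models are visited within a rank are all assumptions.

```python
def combine_rankings(rankings, size):
    """Round-robin merge of per-model rankings into `size` distinct sentences.

    rankings: one list per model, each holding sentence ids, best ranked first.
    size: number of sentences to select (the one indicated by the threshold).
    """
    selected = []
    seen = set()
    if size <= 0:
        return selected
    depth = max((len(r) for r in rankings), default=0)
    for rank in range(depth):          # rank 0 = top sentence of each model
        for ranking in rankings:       # visit the models in a fixed order
            if rank >= len(ranking):
                continue
            sentence_id = ranking[rank]
            if sentence_id in seen:    # already contributed by another model
                continue
            seen.add(sentence_id)
            selected.append(sentence_id)
            if len(selected) == size:
                return selected
    return selected


# Hypothetical rankings from three models over sentence ids.
example = [
    ["a", "b", "c", "d"],
    ["b", "a", "e", "c"],
    ["a", "c", "b", "f"],
]
print(combine_rankings(example, 4))
```

Because the models agree on many sentences, the merge usually has to look deeper than the first size/4 positions of each ranking, which is what the perc column of Table 3 measures.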
Acknowledgments

We would like to thank Raphaël Rubino for insightful conversations. The research leading to these results has received funding from the European Union Seventh Framework Programme FP7/ under grant agreements PIAP-GA and FP7-ICT.

References

Amittai Axelrod, Xiaodong He, and Jianfeng Gao. 2011. Domain adaptation via pseudo in-domain data selection. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, Stroudsburg, PA, USA. Association for Computational Linguistics.

Stanley F. Chen and Joshua Goodman. 1996. An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th annual meeting on Association for Computational Linguistics, ACL '96, Stroudsburg, PA, USA. Association for Computational Linguistics.

Andreas Eisele and Yu Chen. 2010. MultiUN: A multilingual corpus from United Nation documents. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, LREC. European Language Resources Association.

Marcello Federico, Nicola Bertoldi, and Mauro Cettolo. 2008. IRSTLM: an open source toolkit for handling large scale language models. In INTERSPEECH. ISCA.

Jianfeng Gao, Joshua Goodman, Mingjing Li, and Kai-Fu Lee. 2002. Toward a unified approach to statistical language modeling for Chinese. 1(1):3-33, March.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, Stroudsburg, PA, USA. Association for Computational Linguistics.

Philipp Koehn. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Conference Proceedings: the tenth Machine Translation Summit, pages 79-86, Phuket, Thailand. AAMT.

Sung-Chien Lin, Chi-Lung Tsai, Lee-Feng Chien, Keh-Jiann Chen, and Lin-Shan Lee. 1997. Chinese language model adaptation based on document classification and multiple domain-specific language models. In George Kokkinakis, Nikos Fakotakis, and Evangelos Dermatas, editors, EUROSPEECH. ISCA.

Saab Mansour, Joern Wuebker, and Hermann Ney. 2011. Combining translation and language model scoring for domain-specific data filtering. In International Workshop on Spoken Language Translation, San Francisco, California, USA, December.

Robert C. Moore and William Lewis. 2010. Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers, ACLShort '10, Stroudsburg, PA, USA. Association for Computational Linguistics.

Lluís Padró and Evgeny Stanilovsky. 2012. Freeling 3.0: Towards wider multilinguality. In Proceedings of the Language Resources and Evaluation Conference (LREC 2012), Istanbul, Turkey, May. ELRA.

Nick Ruiz, Arianna Bisazza, Roldano Cattoni, and Marcello Federico. 2012. FBK's Machine Translation Systems for IWSLT 2012's TED Lectures. In Proceedings of the 9th International Workshop on Spoken Language Translation (IWSLT).
More informationPricing and hedging of variable annuities
Cutting Edge Pricing and hedging of variabe annuities Variabe annuity products are unit-inked investments with some form of guarantee, traditionay sod by insurers or banks into the retirement and investment
More informationOligopoly in Insurance Markets
Oigopoy in Insurance Markets June 3, 2008 Abstract We consider an oigopoistic insurance market with individuas who differ in their degrees of accident probabiities. Insurers compete in coverage and premium.
More informationThis paper considers an inventory system with an assembly structure. In addition to uncertain customer
MANAGEMENT SCIENCE Vo. 51, No. 8, August 2005, pp. 1250 1265 issn 0025-1909 eissn 1526-5501 05 5108 1250 informs doi 10.1287/mnsc.1050.0394 2005 INFORMS Inventory Management for an Assemby System wh Product
More informationCertificate in Contemporary Music 2016 For International Applicants
Certificate in Contemporary Music 2016 For Internationa Appicants Quaification Certificate in Contemporary Music Performance Programme eve: Leve 4 Length: Start dates: Study options: One year 15 February
More informationTechnical Support Guide for online instrumental lessons
Technica Support Guide for onine instrumenta essons This is a technica guide for Music Education Hubs, Schoos and other organisations participating in onine music essons. The guidance is based on the technica
More informationHow to Cut Health Care Costs
How to Cut Heath Care Costs INSIDE: TEN TIPS FOR MEDICARE BENEFICIARIES What is one of the biggest financia surprises in retirement? Heath care costs. It s a growing concern among many Medicare beneficiaries,
More informationAN APPROACH TO THE STANDARDISATION OF ACCIDENT AND INJURY REGISTRATION SYSTEMS (STAIRS) IN EUROPE
AN APPROACH TO THE STANDARDSATON OF ACCDENT AND NJURY REGSTRATON SYSTEMS (STARS) N EUROPE R. Ross P. Thomas Vehice Safety Research Centre Loughborough University B. Sexton Transport Research Laboratory
More informationViews of black trainee accountants in South Africa on matters related to a career as a chartered accountant
Views of back trainee accountants in South Africa on matters reated to a career as a chartered accountant ESader Department of Appied Accountancy University of South Africa BJErasmus Department of Business
More informationTeamwork. Abstract. 2.1 Overview
2 Teamwork Abstract This chapter presents one of the basic eements of software projects teamwork. It addresses how to buid teams in a way that promotes team members accountabiity and responsibiity, and
More informationProtection Against Income Loss During the First 4 Months of Illness or Injury *
Protection Against Income Loss During the First 4 Months of Iness or Injury * This note examines and describes the kinds of income protection that are avaiabe to workers during the first 6 months of iness
More informationONE of the most challenging problems addressed by the
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 44, NO. 9, SEPTEMBER 2006 2587 A Mutieve Context-Based System for Cassification of Very High Spatia Resoution Images Lorenzo Bruzzone, Senior Member,
More informationHedge Fund Capital Accounts and Revaluations: Are They Section 704(b) Compliant?
o EDITED BY ROGER F. PILLOW, LL.M. PARTNERSHIPS, S CORPORATIONS & LLCs Hedge Fund Capita Accounts and Revauations: Are They Section 704(b) Compiant? THOMAS GRAY Hedge funds treated as partnerships for
More informationLIUM s Statistical Machine Translation System for IWSLT 2010
LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,
More informationFIRST BANK OF MANHATTAN MORTGAGE LOAN ORIGINATORS NMLS ID #405508
ITEMS TO BE SUBMITTED WITH HOME EQUITY LOAN APPLICATION Bring In: Pay stubs from the ast 30 days W-2 s and Tax Returns from the ast 2 years Bank Statements from ast 2 months (A Pages) Copy of Homeowner
More informationWith the arrival of Java 2 Micro Edition (J2ME) and its industry
Knowedge-based Autonomous Agents for Pervasive Computing Using AgentLight Fernando L. Koch and John-Jues C. Meyer Utrecht University Project AgentLight is a mutiagent system-buiding framework targeting
More informationVacancy Rebate Supporting Documentation Checklist
Vacancy Rebate Supporting Documentation Checkist The foowing documents are required and must accompany the vacancy rebate appication at the time of submission. If the vacancy is a continuation from the
More informationeffect on major accidents
An Investigation into a weekend (or bank hoiday) effect on major accidents Nicoa C. Heaey 1 and Andrew G. Rushton 2 1 Heath and Safety Laboratory, Harpur Hi, Buxton, Derbyshire, SK17 9JN 2 Hazardous Instaations
More informationHybrid Machine Translation Guided by a Rule Based System
Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the
More informationHuman Capital & Human Resources Certificate Programs
MANAGEMENT CONCEPTS Human Capita & Human Resources Certificate Programs Programs to deveop functiona and strategic skis in: Human Capita // Human Resources ENROLL TODAY! Contract Hoder Contract GS-02F-0010J
More informationyour statement of insurance
your statement of insurance Schoo - Winter Trave Insurance poicyhoder: STG issued on: 1st February 2013 poicy number: NS9 0001313 reason for issue: new business This Statement of Insurance forms part of
More informationWHITE PAPER BEsT PRAcTIcEs: PusHIng ExcEl BEyond ITs limits WITH InfoRmATIon optimization
Best Practices: Pushing Exce Beyond Its Limits with Information Optimization WHITE Best Practices: Pushing Exce Beyond Its Limits with Information Optimization Executive Overview Microsoft Exce is the
More informationJane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation Joern Wuebker M at thias Huck Stephan Peitz M al te Nuhn M arkus F reitag Jan-Thorsten Peter Saab M ansour Hermann N e
More informationThe Web Insider... The Best Tool for Building a Web Site *
The Web Insider... The Best Too for Buiding a Web Site * Anna Bee Leiserson ** Ms. Leiserson describes the types of Web-authoring systems that are avaiabe for buiding a site and then discusses the various
More informationREADING A CREDIT REPORT
Name Date CHAPTER 6 STUDENT ACTIVITY SHEET READING A CREDIT REPORT Review the sampe credit report. Then search for a sampe credit report onine, print it off, and answer the questions beow. This activity
More informationChapter 3: JavaScript in Action Page 1 of 10. How to practice reading and writing JavaScript on a Web page
Chapter 3: JavaScript in Action Page 1 of 10 Chapter 3: JavaScript in Action In this chapter, you get your first opportunity to write JavaScript! This chapter introduces you to JavaScript propery. In addition,
More informationDesign of Follow-Up Experiments for Improving Model Discrimination and Parameter Estimation
Design of Foow-Up Experiments for Improving Mode Discrimination and Parameter Estimation Szu Hui Ng 1 Stephen E. Chick 2 Nationa University of Singapore, 10 Kent Ridge Crescent, Singapore 119260. Technoogy
More informationBooks on Reference and the Problem of Library Science
Practicing Reference... Learning from Library Science * Mary Whisner ** Ms. Whisner describes the method and some of the resuts reported in a recenty pubished book about the reference interview written
More informationMigrating and Managing Dynamic, Non-Textua Content
Considering Dynamic, Non-Textua Content when Migrating Digita Asset Management Systems Aya Stein; University of Iinois at Urbana-Champaign; Urbana, Iinois USA Santi Thompson; University of Houston; Houston,
More informationThe KIT Translation system for IWSLT 2010
The KIT Translation system for IWSLT 2010 Jan Niehues 1, Mohammed Mediani 1, Teresa Herrmann 1, Michael Heck 2, Christian Herff 2, Alex Waibel 1 Institute of Anthropomatics KIT - Karlsruhe Institute of
More informationWho Benefits From Social Health Insurance in Developing Countries?
Who Benefits From Socia Heath Insurance in Deveoping Countries? Pau Gerter University of Caifornia at Bereey and NBER Orvie Soon University of the Phiippines, Schoo of Economics March, 2000 Abstract A
More informationBreakeven analysis and short-term decision making
Chapter 20 Breakeven anaysis and short-term decision making REAL WORLD CASE This case study shows a typica situation in which management accounting can be hepfu. Read the case study now but ony attempt
More informationSYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
More informationAnnual Notice of Changes for 2016
Easy Choice Best Pan (HMO) offered by Easy Choice Heath Pan, Inc. Annua Notice of Changes for 2016 You are currenty enroed as a member of Easy Choice Best Pan (HMO). Next year, there wi be some changes
More informationA Branch-and-Price Algorithm for Parallel Machine Scheduling with Time Windows and Job Priorities
A Branch-and-Price Agorithm for Parae Machine Scheduing with Time Windows and Job Priorities Jonathan F. Bard, 1 Siwate Rojanasoonthon 2 1 Graduate Program in Operations Research and Industria Engineering,
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58. Ncode: an Open Source Bilingual N-gram SMT Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58 Ncode: an Open Source Bilingual N-gram SMT Toolkit Josep M. Crego a, François Yvon ab, José B. Mariño c c a LIMSI-CNRS, BP 133,
More informationThe definition of insanity is doing the same thing over and over again and expecting different results
insurance services Sma Business Insurance a market opportunity being missed Einstein may not have known much about insurance, but if you appy his definition to the way existing brands are deveoping their
More informationVendor Performance Measurement Using Fuzzy Logic Controller
The Journa of Mathematics and Computer Science Avaiabe onine at http://www.tjmcs.com The Journa of Mathematics and Computer Science Vo.2 No.2 (2011) 311-318 Performance Measurement Using Fuzzy Logic Controer
More informationLT Codes-based Secure and Reliable Cloud Storage Service
2012 Proceedings IEEE INFOCOM LT Codes-based Secure and Reiabe Coud Storage Service Ning Cao Shucheng Yu Zhenyu Yang Wenjing Lou Y. Thomas Hou Worcester Poytechnic Institute, Worcester, MA, USA University
More information