Efficient Context-sensitive Word Completion for Mobile Devices

Antal van den Bosch
Tilburg Centre for Creative Computing, Tilburg University
P.O. Box 90153, NL-5000 LE Tilburg, The Netherlands
Antal.vdnBosch@uvt.nl

Toine Bogers
Tilburg Centre for Creative Computing, Tilburg University
P.O. Box 90153, NL-5000 LE Tilburg, The Netherlands
A.M.Bogers@uvt.nl

Copyright is held by the author/owner(s). MobileHCI 2008, September 2-5, 2008, Amsterdam, the Netherlands. ACM 978-1-59593-952-4/08/09.

ABSTRACT

Word completion is a basic technology for reducing the effort involved in text entry on mobile devices and augmentative communication devices, where efficiency and ease of use are needed, but where a low memory footprint is also required. Standard solutions compress a lexicon into a suffix trie with a small memory footprint and high retrieval speed. Keystroke savings, a measurable correlate of text entry effort gain, typically improve when the algorithm also takes into account the previous word; however, this comes at the cost of a large footprint. We develop two word completion algorithms that encode the previous word in the input. The first algorithm utilizes a character buffer that includes a fixed number of recent keystrokes, including those belonging to previous words. The second algorithm includes the complete previous word as an extra input feature. In simulation studies, the first algorithm yields marked improvements in keystroke savings, but has a large memory footprint. The second algorithm can be tuned by frequency thresholding to have a small footprint, and be less than one order of magnitude slower than the baseline system, while its keystroke savings improve over the baseline.

Categories and Subject Descriptors
D.2 [Software]: Software Engineering

General Terms
Algorithms, Human Factors, Performance

Keywords
Word completion, predictive text processing, mobile devices, context sensitivity, scaling, ergonomics

1. INTRODUCTION

Word completion is a basic technology for text entry on mobile devices, as well as an important component in augmentative language technology tools [3]. Its main aim is to reduce the number of keystrokes necessary during text entry by correctly suggesting the completion of the word currently being entered, as soon as possible, so that the word does not have to be completely keyed.
One straightforward strategy is to generate a suggestion at the word's unicity point, i.e. the point at which the word is the only word available in the algorithm's internal word model (e.g. a list of words) that fits the string of characters keyed so far. The word completion algorithm may also venture to suggest a certain word even before any of the possible words' unicity points are reached. From the moment that the model suggests a correct word completion (visually displayed in a designated part of the device's screen), the user is typically able to click an Accept button to accord and enact the suggestion. Actual savings are made when the suggestion is accorded before the word's before-last (penultimate) character is keyed.

Although word completion algorithms are used in many devices with some success, current word completion systems remain imperfect, and are vulnerable to at least the following two factors. The first factor that hampers essentially all word completion systems is that when a word currently being entered does not occur in the system's internal word or language model, the system will not suggest it, and the user will need to key in the full word. This problem can only be overcome by including more words in the construction of the algorithm, in the hope of having a better coverage of new, unseen text.

A second issue that applies to the simpler kind of word completion systems that are only based on a word list, is that after every space or punctuation mark, these systems start to guess words anew without taking into account the previous sequence of characters or words. Yet, a classic and fundamental insight, quantified in information theory [11], is that characters or words preceding the current word may contribute information that would enable suggesting the current word earlier than its unicity point, in some cases even immediately. Using information from the previously entered text may also help in picking one particular suggested word over alternative words with the same initial characters, but which are less likely given the previous word.

In this paper, we explore these two problems, provide solutions, and analyze their relative utility. It is structured as follows. After reviewing related work in the area of word completion (or predictive text processing) algorithms in Section 2, we introduce a context-insensitive baseline word completion algorithm and two context-sensitive variants in Section 3. After introducing our training and testing data and our simulation study in Section 4, we show the relation between the success of our three algorithms (measurable in the percentage of key presses saved) and the size and coverage of their word models, and we compare the three models, in Section 5.
We conclude in Section 6 that by including context, keystroke savings can indeed be improved at the cost of memory needed; and that a context-sensitive word completion algorithm that takes into account the full previous word (if frequency-thresholded) as a feature, yields substantial improvements in keystroke savings, against only a modest increase in memory usage. As a final note, we discuss valorization aspects of word prediction systems in general, and our study in particular, in Section 7.

2. RELATED WORK

Word completion systems aim to reduce the effort spent on entering text in a digital device. Generally speaking, there are two different approaches to reducing the effort of text entry [12]. The first approach is to find a way of reducing the physical restraints of entering text by either using alternate modalities or by changing or augmenting the keyboard layout, such as reordering the key mappings. Examples of such reorderings have been proposed by [6] and [13], among others.

The other approach, and the one we focus on in this paper, aims at reducing the amount of typing necessary to enter a text by using predictive typing aids such as word completion systems. Such aids try to predict the completion of the current word as it is being keyed, and are available for input modalities ranging from full keyboards to the more restricted keypads of mobile phones. For instance, the popular T9 algorithm was designed specifically to offer predictive text entry on the standard 12-key keypad layout of the current generation of mobile phones [5]. However, a growing number of mobile phones, PDAs, and BlackBerrys sport full QWERTY keyboards. We therefore focus on predictive text entry using the latter type of keyboards in this paper.

Predictive text entry algorithms use context information to anticipate what block of characters (letters, n-grams, syllables, words, or entire phrases) a person is going to write next [3]. The block size the typing aid tries to predict influences the potential savings in terms of time and keystrokes. Prediction of n-gram sized blocks has been applied by, for instance, Goodman et al. (2002), who used n-gram models to correct and predict user input using soft keyboards, i.e. on-screen keyboards operated using a touch screen or a stylus [4]. Using language models of at most the 6 previous characters, they were able to successfully correct user input when it was not entirely clear what key the user meant to press. MacKenzie et al. (2001) also use letter sequence probabilities to disambiguate between ambiguous user input [9]. They emphasize the necessity of keeping the memory footprint small while at the same time maximizing predictive performance. However, predicting n-grams or syllables can result in increased cognitive effort for the user who has to check and approve the predictions.
Because words (whitespace-delimited sequences of letters) are more easily recognizable, thereby reducing the cognitive effort required, words are most widely used as predicted blocks [3]. One of the earliest applications of predictive text entry was the Reactive Keyboard by Darragh et al. (1990). They adopted a dynamic, implicit, and adaptive modeling strategy by using a tree-based memory structure to store recurring n-grams and quickly match substrings associated with predictions. Suggestions were sorted by length and frequency, with the longest, most frequent substrings being preferred [2]. How et al. (2005) took the previous word into account in the word completion task by using a bigram word model to predict the most likely word in the sequence based on matching character prefixes of the current word being typed. They report time savings of 14.1% on average [6].

Tanaka-Ishii (2007) compared four language models for predictive text entry [13]: two simple models based on frequency and recency counts, a model based on co-occurrence between words, and an adaptive n-gram model that also takes into account the probability distribution of words previously used by the user. The adaptive n-gram model performed best. Stocky et al. (2004) took a semantic approach to word prediction instead of a purely statistical one, by looking up common sense context from the Open Mind Common Sense (OMCS) project for each completed word [12]. All words from the retrieved contexts have their frequency scores updated in the text prediction dictionary; based on the first typed characters of the next word, words are then suggested by frequency. As their training material, they used a corpus of 5,500 e-mail messages sent over the course of one year by one user, consisting of 1.1M words. In addition, they used three smaller Web page corpora containing between 10,500 and 16,500 words.

Some general conclusions can be drawn from the body of related work. One is that experts tend to benefit more from predictive text entry for reduced-size keyboards than novices do, but for standard-sized keyboards this situation is reversed [2]. This is because predictive typing aids are more likely to be useful in situations with non-standard keyboards, where people cannot enter characters as fast as normal. Predictive text entry also tends to yield less benefit when predicting SMS text than normal text, because the vocabulary is more personalized and because average word length tends to be smaller, hence the potential savings are also reduced.

Finally, a general problem is that a lack of standardized test corpora makes comparing the different approaches difficult. The majority of predictive text approaches tend to be evaluated on different corpora, usually composed of newspaper text.
An additional problem is that every research effort seems to be evaluated using different metrics as well: where one approach is evaluated using hit ratio (i.e. the level of accuracy in predicting the correct word), others are evaluated in terms of keystroke savings or time savings.

3. THREE WORD COMPLETION ALGORITHMS

The two disjoint goals of a fast word completion algorithm are (i) that it needs to hold a wide-coverage word or language model, and (ii) that it should be able to say, at the earliest possible point of a character sequence being entered and with a success rate that is as high as possible, to which word the sequence can be completed. This task can be phrased as a classification problem in which an input sequence of entered keys is mapped to the word that is actually being intended by the person keying in the text.

Suppose that a person is keying in a short example sentence on a normal QWERTY keyboard. First, the word prediction algorithm needs to suggest the first word during its first two keystrokes. Then, after being reset by the space bar, it needs to suggest the second word throughout the sequence of that word being keyed, et cetera. Its classification task can thus be made explicit as the thirteen classification instances depicted at the left-hand side of Figure 1.

Figure 1: Example classification instances derived from the example sentence for the context-insensitive algorithm (left), the character context algorithm (middle), and the previous-word context algorithm (right). Columns: character buffer and prediction (left, middle); character buffer, previous word, and prediction (right).

To train a classifier, either a lexicon or a text could be encoded with the scheme as illustrated in Figure 1, resulting in as many labeled instances as characters in the lexicon or text. The classifier, then, has to fulfil the need of preserving all the words in the lexicon or training text, and being able to generate, given new input of key sequences, predictions of the correct words as soon as possible as the word is being keyed. To this purpose, the general structure of tries is well suited [8]. In our study, we employ the IGTree algorithm [1], which implements trie compression and classification.

In the first compression phase, a list of words (either from a lexicon or a text) is compressed by IGTree into a trie structure. To do this, first a one-time ordering of features is computed, where the features are the positions in the character buffer. The ordering of the features is determined by their information gain ratio with respect to predicting the word [10]. Given any reasonable amount of examples, the rightmost feature, i.e. the most recently pressed key, has the overall highest predictive power, hence the largest gain ratio. The overall ordering is from right to left.

Subsequently, a root node is created, which represents the most likely word when no key is pressed yet. This root node fans out to a first layer of nodes through arcs, where the arcs represent all possible keystrokes. Each first-layer node represents the most likely word given a single keystroke, and branches out to second-layer nodes, connected by arcs that denote all possible keystrokes given the first keystroke. A node becomes an end node if the arc leading to that node uniquely identifies a single word, even if the arc is not the last character of the word. This algorithm is recursively applied until a compressed trie is produced, ready to process new instances.

The trie classifies new incoming instances (where the keystrokes are known, but the word needs to be predicted) by traversing the trie with the current character buffer (cf. Figure 1) as input features. Starting with the root node, it deterministically takes the arc representing the rightmost character, and continues following matching arcs in the ordering of the features until either (i) it encounters an end node, at which point the predicted word is emitted as output, or (ii) it encounters a non-ending node from which it finds no more matching arcs connecting to nodes further down the trie, at which point it emits the most likely word so far.

In the larger context of the text application, this trie is embedded in a real-time wrapper that reads each incoming keystroke, updates the character buffer (shifting it leftward with each keystroke, erasing it after each space), sends the buffer to the trie, catches the prediction emitted by the trie, and presents this to the user, who can then press a special Accept key to accept the trie's suggestion.

To include context of previously entered text, two basic options are available and tested in this paper.
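The lookup behaviour of the context-insensitive baseline and its wrapper can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IGTree implementation: it uses a plain left-to-right prefix trie rather than IGTree's gain-ratio feature ordering, it does not prune unique suffixes into end nodes, and the names `Node`, `build_trie`, `predict`, `suggestions`, and the toy frequencies are all hypothetical.

```python
class Node:
    """Trie node that remembers the most frequent word passing through it."""

    def __init__(self):
        self.children = {}      # keystroke -> child Node
        self.best_word = None   # most frequent word sharing this prefix
        self.best_count = 0

    def update(self, word, count):
        if count > self.best_count:
            self.best_word, self.best_count = word, count


def build_trie(word_counts):
    """Build a prefix trie from a {word: frequency} mapping."""
    root = Node()
    for word, count in word_counts.items():
        node = root
        node.update(word, count)          # root holds the overall best word
        for ch in word:
            node = node.children.setdefault(ch, Node())
            node.update(word, count)
    return root


def predict(root, buffer):
    """Follow matching arcs; when none match, return the best guess so far."""
    node = root
    for ch in buffer:
        child = node.children.get(ch)
        if child is None:
            break
        node = child
    return node.best_word


def suggestions(root, keystrokes):
    """Wrapper: update the buffer per keystroke and collect the suggestions."""
    buffer, out = "", []
    for key in keystrokes:
        buffer = "" if key == " " else buffer + key  # erase buffer after a space
        out.append(predict(root, buffer))
    return out


# Hypothetical toy frequencies, for illustration only.
TOY_COUNTS = {"the": 50, "is": 40, "that": 30, "this": 20}
toy_trie = build_trie(TOY_COUNTS)
print(suggestions(toy_trie, "th i"))
```

With an empty buffer the sketch falls back to the root's most frequent word, mirroring the root node described above; an unseen character simply stops the descent and returns the best word at the deepest matching node.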
The first, illustrated in the middle of Figure 1, is to not reset the character buffer after the space bar or a punctuation key is hit, but rather to keep the character buffer filled with a fixed number of recently pressed keys, thus including the previous word, or at least a part of it, or sometimes even a part of the before-last word.

The second variant is to include the full previous word (or words) keyed in before the last space. The right-hand part of Figure 1 displays the setup explored in this paper, where a simple character buffer that is reset after each space is accompanied by a single feature carrying the word previous to the word currently being predicted. In the trie, this previous word feature is mixed with the other letter features, making the trie somewhat more complicated in shape. An unrestricted word feature can have hundreds of thousands of unique values (i.e. all unique words occurring in the million-word corpora used for training), hence a trie that would split such a node would be shattered over a Zipfian distribution of nodes, including a long tail of hapax values (words occurring once, typically about half of all the unique words in the training corpus) offering no generalization power, and likely producing incorrect next-word suggestions [14]. As suggested by Van den Bosch (2005), a word-valued feature can be made more efficient when integrated in a trie structure when only the most frequent values of the feature, e.g. the top n most frequent words, are kept as discrete values, and all other values are lumped together under a dummy value. In empirical tests we set n = 200, producing the best performance on a held-out test set. Our word-valued feature thus causes a 201-fold branching in the trie at the level it is invoked.

4. EXPERIMENTAL SETUP

We run simulation studies on a large amount of controlled data, split into training and testing material. These simulation studies emulate human typing behavior in a deterministic way, i.e., they emulate a fixed strategy in typing in which correct word completion suggestions are accepted at the earliest possible point. Running such an artificial simulation study abstracts away from human error and deviating strategies (such as ignoring correct suggestions occasionally), allowing experiments with very large amounts of data, but disallowing the measurement of real processing times, measurements of energy spent, or brain activity measurements. Essentially, we assume that keystroke savings are a sensible correlate of human effort savings.

Our experiments are based on Dutch data, with the purpose of posing a slightly larger challenge than English. Like English and German, Dutch is a Germanic language. Like German, Dutch has a productive compounding morphology, allowing for long admissible words.
This phenomenon puts pressure on the unknown-words problem, as long compounds tend to be rare and, compared to English, take away frequency counts from the words they are composed of.

As a text corpus for training and testing, we used the first four months of a Dutch local newspaper's full article archive, split into a training corpus of up to ten million words, and a disjoint test set of 100,000 words. The corpus thus consists of journalistic text, covering local, national, and international topics of all sorts. In addition, we included one alternative test set representing more colloquial and social language, viz. a collection of transcribed face-to-face dialogues from the Spoken Dutch Corpus (footnote 1). This subcorpus is categorized as the most spontaneous register in the corpus, and contains a total of 2,444,755 words after basic cleanup (we removed transcribed disfluencies, and converted filled pauses to commas, to better resemble orthographic text).

Note that with both sets we are using running text as training data, and not a lexicon. Obviously we need the text to be able to generate the left-hand contextual features in the two context-sensitive word completion system variants. At the same time, this approach does mean for the context-insensitive variant that it is trained on all unique words occurring in the corpus, and nothing else, e.g. no extra balanced or encyclopedic list of Dutch words, which could be a welcome boost in principle. In order to allow for a proper comparison between the three systems, we train and test all three on the exact same material.

4.1 Experimental design and evaluation

Keeping the test sets constant, we vary the amount of training material in a pseudo-exponential series, starting at 10,000 words, ending at 10,000,000. At each step in the curve, we perform a full IGTree experiment in which we generate a trie, and process the test sets with the trie. At each simulation experiment, we measure the following for evaluation purposes:

1. The number of nodes in the trie;
2. The number of characters processed per second by the IGTree classification algorithm;
3. The percentage of keystrokes saved.

The latter percentage of keystrokes saved is computed by setting the total number of keystrokes needed to generate the entire test text, following all correct suggestions of the system as soon as possible, against the total number of keystrokes needed when the full test text is typed character by character. In principle, keystrokes can be saved if the word is correctly suggested before the before-last character is pressed. The amount of keystrokes saved will therefore be 0.0% or higher; at 0.0%, the word completion system is completely ineffective.

5. RESULTS

Figure 2 displays the learning curve results of our three variants, measured on the 100,000-word test set drawn from the same source as the training material. The x-axis, representing the number of words in the training corpus, follows a logarithmic scale, while the y-axis represents the percentage of keystrokes saved against the default situation of typing all words character by character. First, we observe that all three lines exhibit an upward trend; trained on more data, they lead to better keystroke savings.
A second observation is that the context-sensitive variants, both the one based on a fixed-length buffer with previous character context (the dashed line) and the one based on the previous word context (the dotted line), appear to be exhibiting a log-linear growth; with every doubling of the training data, they yield a constant improvement in keystroke savings. In contrast, the curve representing the context-insensitive baseline system appears to taper off, falling increasingly behind the context-sensitive variants.

Figure 2: Learning curves of the three word completion variants, in terms of the percentage of characters saved, measured on processing the 100,000-word test text. (x-axis: training words, logarithmic, 10,000 to 10,000,000; y-axis: % keypresses saved, 0 to 25; series: No context, Character context, Top 200 word context.)

Footnote 1: http://lands.let.kun.nl/cgn/ehome.htm

At the maximal training set size of 10 million words, the baseline system saves 14.5% keystrokes. In contrast, the system using previous character context saves 22.4% (a relative increase of 54%), and the system using the previous word as context saves 19.6% (a relative increase of 35%).

However, context sensitivity comes at the cost of a larger memory footprint. Using the same logarithmic x-axis as Figure 2, Figure 3 displays the number of trie nodes needed at the various training set sizes by the three systems. The y-axis is also logarithmic. In this log-log space, the numbers of trie nodes needed by the two character-buffer-based systems, the context-insensitive baseline and the system using the previous character context, have a linear relation with the number of training instances. At the largest training set size the context-insensitive baseline system uses about 15 times less memory than the system using the character context: 1.7 million nodes vs. 26.2 million nodes. Slightly in contrast, the amount of nodes needed by the system using the previous word as context tapers off with more training data, ending up at about double the amount of nodes, 3.4 million, compared to the baseline system at the same 10 million training words. The memory footprint curve of this system even displays a mild decrease, caused by the fact that the word-valued feature in the trie ends up in a lower place in the gain ratio feature ranking when more training data is available, i.e. with more training data the feature is tested deeper in the trie, causing relatively less fragmentation higher up in the trie.

Besides memory footprint, we would like to ensure that our variants are not exceptionally slower than the baseline, as the algorithm has to be fast enough to follow real-time keystrokes, the average speed of which has been estimated at 0.12 seconds per keystroke for a good typist, and 0.28 seconds for a bad typist [7].
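The relative figures quoted above follow directly from the reported savings and node counts. The short check below recomputes them; the function and dictionary names are illustrative only, and the numbers are simply those stated in the text for 10 million training words.

```python
def relative_increase(baseline, variant):
    """Relative increase of variant over baseline, in percent."""
    return 100.0 * (variant - baseline) / baseline


# Figures as reported in the text for 10 million training words.
SAVINGS = {"baseline": 14.5, "character context": 22.4, "word context": 19.6}
NODES_MILLIONS = {"baseline": 1.7, "character context": 26.2, "word context": 3.4}

for name in ("character context", "word context"):
    gain = relative_increase(SAVINGS["baseline"], SAVINGS[name])
    ratio = NODES_MILLIONS[name] / NODES_MILLIONS["baseline"]
    print(f"{name}: +{gain:.0f}% savings, {ratio:.1f}x nodes")
```

The character-context variant comes out at roughly 54% more savings for roughly 15 times the nodes, while the word-context variant gains roughly 35% for about twice the nodes, which is the trade-off the section describes.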
Figure 4 displays the speed of the three systems using the same logarithmic learning curve axis, measured over the full 100,000-word test set in uninterrupted classification mode, on a state-of-the-art multi-core computing server. The figure shows that the speed of the three systems, despite their widely differing memory footprints, is quite similar, and sufficiently high. The speed of all three systems declines from about 65 thousand characters (keystrokes) per second to around 40 thousand characters per second for the two context-sensitive systems, and around 50 thousand characters per second for the context-insensitive baseline system at 10 million words of training data. Even if the processing unit of the mobile device is two orders of magnitude slower than the AMD Opteron chipset of our computing server, it would be fast enough to generate new suggestions immediately after each new keystroke.

Figure 3: Memory usage of the three word completion variants, in terms of the number of trie nodes. (x-axis: training words, logarithmic, 10,000 to 10,000,000; y-axis: trie nodes, logarithmic, 10,000 to 10,000,000; series: No context, Character context, Top 200 word context.)

Figure 4: Speed of the three word completion variants, in terms of the number of characters processed per second. (x-axis: training words, logarithmic, 10,000 to 10,000,000; y-axis: characters per second, 0 to 100,000; series: No context, Character context, Top 200 word context.)

The results displayed in Figures 2, 3, and 4 are obtained with the 100,000-word test set disjoint from, but from the same origin and register as, the training set of newspaper texts. As argued earlier, an extra test set of a different register would pose a potentially difficult challenge to the word completion algorithm, just as genre, domain and register differences do in language modeling [14]. Table 1 lists the keystroke savings attained by the three systems on the extra test set of transcribed spontaneous Dutch face-to-face dialogues, in contrast with the savings on the in-register test set. It is quite apparent from the two columns in Table 1 that the keystroke savings on the test set from the different register are dramatically lower, under 10%. At the same time, the relative gains by the context-sensitive methods follow the same pattern as observed before, and are even slightly better in terms of relative improvement.

Table 1: Difference in keystroke savings on the in-domain 100,000-word test set, vs. the out-of-domain spontaneous dialogue test set at the maximal training set size by the three systems.

                        % Keystrokes saved
System                  In-register test    Out-of-register test
Baseline                14.5                5.3
Character context       22.4                9.8
Word context            19.5                8.4

6. DISCUSSION

In this paper we explored two variants of a baseline word completion system that make use of previous-word context in two different ways. We ran simulation experiments, measuring keystroke savings, under the assumption that these savings are a correlate of human effort saved when the word completion algorithm would be incorporated in text entry software. One variant includes the previous character context up to a fixed number of characters, while the other variant includes a frequency-thresholded representation of the previous word that was keyed before the current one.
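The simulated keystroke-savings measurement summarized above can be sketched as follows. This is a hedged illustration rather than the experimental code: it assumes that accepting a suggestion costs exactly one key press, that each space costs one key press in both conditions, and it substitutes a brute-force "most frequent word with this prefix" predictor for the trie. The function names and the toy word counts are hypothetical.

```python
def make_predictor(word_counts):
    """Return a stand-in predictor: most frequent known word with the prefix."""
    def predict(prefix):
        matches = [w for w in word_counts if w.startswith(prefix)]
        return max(matches, key=word_counts.get) if matches else None
    return predict


def keystrokes_for_word(word, predict):
    """Key presses needed when a correct suggestion is accepted at once."""
    # The suggestion is checked before each keystroke; only acceptances that
    # cost fewer presses than typing the whole word are considered.
    for typed in range(len(word) - 1):
        if predict(word[:typed]) == word:
            return typed + 1            # typed characters plus one Accept press
    return len(word)                    # no useful suggestion: type it in full


def keystroke_savings(words, predict):
    """Percentage of key presses saved over typing every character."""
    spaces = max(len(words) - 1, 0)
    full = sum(len(w) for w in words) + spaces
    assisted = sum(keystrokes_for_word(w, predict) for w in words) + spaces
    return 100.0 * (full - assisted) / full


# Hypothetical toy model and test text, for illustration only.
toy_predict = make_predictor({"the": 50, "is": 40, "completion": 10, "computer": 5})
print(keystroke_savings(["the", "completion", "is"], toy_predict))
```

Because a suggestion is only accepted when it is cheaper than typing the word out, the result can never drop below 0%, in line with the definition in Section 4.1; unknown words simply cost their full length.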
Arguably, the variant encoding the full previous word captures the linguistic intuition that the identity of the previous word holds the relevant information, in contrast to the previous characters. On the other hand, the previous characters do encode the identity of the previous word or words implicitly. Also, they hold partial letter-by-letter information on the end of the previous word, capturing morphological information on suffixes and inflections, which may carry predictive information as well.

With more data, all systems perform better, but while the baseline system's performance gain tapers off, the gains of the context-sensitive systems improve over the baseline, and appear to increase at a log-linear pace. At 10 million words of training data, the best keystroke savings percentage of 22.4% is attained by the system using the previous character context, a relative improvement of 54% over the baseline savings percentage of 14.5%. Unfortunately, this optimal savings percentage comes with the largest memory footprint. Given that a straightforward (uneconomical) implementation of the trie costs 20 bytes per node, the best-performing system would require 524 MB of working memory (26.2 million nodes at 20 bytes each), which is an infeasible amount of memory on most current mobile devices.

The variant that uses the frequency-thresholded previous word as contextual information uses substantially less memory, and only about double the amount of memory that the baseline system uses. The frequency thresholding, tuned by setting an optimal top n words on a held-out test set, resulted in n being set to 200, which is similar to the typical number of function words in Germanic languages (the top 200 most frequent Dutch words include most of the closed-class function word categories such as determiners, conjunctives, pronouns, and to a lesser degree prepositions, and contain relatively few open-class words).

As an additional analysis, we compared the efficiency of our three systems on a test set that contains a widely different type of text than the training set. While the training set contains newspaper text, the extra out-of-register test set contained a large set of transcribed spontaneous face-to-face conversations about mundane topics and containing a lot of social discourse and chitchat. Importantly, we reported a dramatic drop in the keystroke savings on this type of text, to under 10%. At the same time, we noted considerably better savings by the context-sensitive methods as compared to the baseline system.

In future work, one obvious extension to the current matrix of experiments is to mix the roles of training and test sets systematically (so that the transcribed dialogue corpus is also the training set in one set of experiments), and also to generate a combined training set of the two registers. Other, more extreme test sets may be considered, in which abbreviations are used that are typical of mobile phone and chat text entry, such as w8 for wait, or where the use of language is sloppy and full of spelling and grammar errors. Both short words and errors pose a challenge to word completion algorithms, as few savings can be made with short words, and errors constitute unknown tokens.

Other avenues of future research include robustness to multilingual data, which can be seen as an extreme form of different registers; in reality, it is quite likely that many users will actually generate different texts in different registers and languages on the same device. Also, personalization and incremental training and adaptation to individual users should be integrated in the current approach, as well as adaptability to keyboards with fewer keys than the language has characters (such as mobile phone keypads, or QWERTY keyboards for certain Asian languages).

7. VALORIZATION

Word completion subsystems are frequently included in communication aids in order to increase the user's rate of communication. Aids for predictive text entry have several important applications, with improved access for people with disabilities being among the most important ones. People with severe motor and oral disabilities such as cerebral palsy or hemiplegia can greatly benefit from tools that enable them to increase their communication speed [3].

For many years, predictive text entry techniques have also found application in languages with large-volume character sets such as Chinese or Japanese, where the standard keyboard does not have enough keys to accommodate all the characters present in the language. Here, typing aids can enable the projection of a wide range of characters with a limited number of keys; in fact, this is currently the major method of solving this problem [13].

Other popular applications of predictive text entry are input situations with a reduced number of keys, such as mobile phones, and situations without direct tactile feedback, such as stylus-operated touch screens.
Most current mobile phones are equipped with some version of a word completion algorithm; the T9 algorithm [5], for instance, is claimed to be shipped with 2.5 billion devices (footnote 2), on the basis of which the owners could claim to have the most widely distributed piece of language technology in the world.

Our particular contribution to the current state of the art in word completion technology is that we formulate recommendations on how the efficiency of this technology can be improved, at fairly limited costs (a mildly higher memory footprint). Our learning curve experiments offer a tool to chart the trade-off between having more training material and more context sensitivity (causing higher text entry efficiency) and needing more memory. As memory limitations of mobile devices are gradually and steadily being reduced, our recommendations could be used by mobile device developers to support the decision to move from a relatively limited word completion system to a less constrained context-sensitive system, offering customers a noticeably better service.

Footnote 2: http://www.nuance.com/news/pressreleases/2007/20070824_tegic.asp

Acknowledgments

The research reported in this paper has been funded by the Dutch Ministry of Economic Affairs, under its Innovation Programme on Man-Machine Interaction, IOP-MMI. The authors wish to thank Menno van Zaanen, Herman Stehouwer, Peter Berck, other members of the ILK (Tilburg) and ILPS (Amsterdam) research groups, and Frank Hofstede for comments and suggestions.

8. REFERENCES

[1] W. Daelemans, A. Van den Bosch, and A. Weijters. IGTree: Using Trees for Compression and Classification in Lazy Learning Algorithms. Artificial Intelligence Review, 11:407-423, 1997.
[2] J. J. Darragh, I. H. Witten, and M. L. James. The Reactive Keyboard: A Predictive Typing Aid. Computer, 23(11):41-49, 1990.
[3] N. Garay-Vitoria and J. Abascal. Text Prediction Systems: A Survey. Universal Access in the Information Society, 4(3):188-203, 2006.
[4] J. Goodman, G. Venolia, K. Steury, and C. Parker. Language Modeling for Soft Keyboards. In Proceedings of IUI '02, pages 194-195, New York, NY, USA, 2002. ACM.
[5] D. L. Grover, M. T. King, and C. A. Kushler. Reduced Keyboard Disambiguating Computer. Patent No. US5818437, Tegic Communications, Inc., Seattle, WA, October 1998.
[6] Y. How and M.-Y. Kan. Optimizing Predictive Text Entry for Short Message Service on Mobile Phones. In M. J. Smith and G. Salvendy, editors, Proceedings of HCII '05, Las Vegas, NV, July 2005. Lawrence Erlbaum Associates.
[7] D. Kieras and B. John. The GOMS Family of Analysis Techniques: Tools for Design and Evaluation. Technical Report CMU-HCII-94-106, Carnegie Mellon University, 1994.
[8] D. E. Knuth. The Art of Computer Programming, volume 3: Sorting and Searching. Addison-Wesley, Reading, MA, 1973.
[9] I. S. MacKenzie, H. Kober, D. Smith, T. Jones, and E. Skepner. LetterWise: Prefix-Based Disambiguation For Mobile Text Input. In Proceedings of UIST '01, pages 111-120, New York, NY, USA, 2001. ACM.
[10] J. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
[11] C. Shannon. A Mathematical Theory of Communication. Bell Systems Technical Journal, 27:379-423 and 623-656, 1948.
[12] T. Stocky, A. Faaborg, and H. Lieberman. A Commonsense Approach to Predictive Text Entry. In CHI '04: CHI '04 Extended Abstracts on Human Factors in Computing Systems, pages 1163-1166, New York, NY, USA, 2004. ACM.
[13] K. Tanaka-Ishii. Word-based Predictive Text Entry using Adaptive Language Models. Natural Language Engineering, 13(1):51-74, 2007.
[14] A. Van den Bosch. Scalable Classification-based Word Prediction and Confusible Correction. Traitement Automatique des Langues, 46(2):39-63, 2006.