Off-line Cursive Handwriting Recognition using Recurrent Neural Networks

Andrew William Senior

Trinity Hall, Cambridge, England.

This thesis is submitted for consideration for the degree of Doctor of Philosophy at the University of Cambridge.

September 1994

Summary

Computer handwriting recognition offers a new way of improving the human-computer interface and of enabling computers to read and process the many handwritten documents that must currently be processed manually. This thesis describes the design of a system that can transcribe handwritten documents.

First, a review of the aims and applications of computer handwriting recognition is presented, followed by a description of relevant psychological research. Previous researchers' approaches to the problems of off-line handwriting recognition are then described. A complete system for automatic, off-line recognition of handwriting is then detailed, which takes word images scanned from a handwritten page and produces word-level output. Methods for the normalization and representation of handwritten words are described, including a novel technique for detecting stroke-like features. Three probability estimation techniques are described, and their application to handwriting recognition investigated. The method of combining the probability estimates to choose the most likely word is described, and performance improvements are made by modelling the lengths of letters and the frequency of words in the corpus. The system is tested on a database of transcripts from a corpus of modern English and recognition results are shown. Recognition is described both with the search constrained to a fixed vocabulary and with an unlimited vocabulary. The final chapter summarizes the system and highlights the advances made before assessing where future work is most likely to bring about improvements.

Keywords

Off-line cursive script, handwriting recognition, OCR, recurrent neural networks, forward-backward algorithm, hidden Markov models, duration modelling.

Declaration

This thesis describes research carried out at Cambridge University Engineering Department between October 1991 and September 1994. It is the result of my own work and contains no work done in collaboration. The length of this thesis, including references and figure captions, is thirty-seven thousand words.

Acknowledgements

First of all, I would like to express my gratitude to Dr Tony Robinson, who has supervised me admirably for the latter half of this thesis with enthusiastic guidance and support, particularly in the last few weeks. I am also indebted to the late Professor Frank Fallside, for supervising me during the first half of this thesis and for providing the original inspiration for this work.

I would like to thank everyone else in the Speech, Vision and Robotics group which Frank Fallside created. The group has been an ideal environment, both socially and technically, in which to conduct research. Those in the group who have helped in the creation of this thesis are too numerous to mention individually.

Special thanks must go to Chen Tham and Andy Piper for friendship; to Fuzzy who proof-read at such short notice, and particularly to Tim Jervis whose friendship has been invaluable in the last three years.

I would also like to thank everyone I met at Lexicus, IBM Hawthorne and AT&T Holmdel for recent fruitful discussions that have helped shape the writing of this thesis.

The former Science and Engineering Research Council is to be thanked for providing the financial support necessary for me to carry out this work.

Finally, I would like to dedicate this thesis to my mother and to the memory of my father. My parents have always supported me and to them I owe everything.

Contents

1 Introduction
  1.1 This thesis
  1.2 Original contribution
  1.3 Notation

2 Handwriting recognition
  2.1 A taxonomy of handwriting recognition problems
    2.1.1 On-line versus off-line
    2.1.2 Author identification versus content determination
    2.1.3 Writer independence
    2.1.4 Vocabulary size
    2.1.5 Isolated characters
    2.1.6 Optical character recognition
  2.2 Applications
    2.2.1 Cheques
    2.2.2 From postcodes to addresses
    2.2.3 Form processing
    2.2.4 Other applications
  2.3 Existing off-line handwriting recognition systems
    2.3.1 Isolated characters or digits
    2.3.2 Off-line cursive script

3 Psychology of reading
  3.1 Reading by features
  3.2 Reading by letters and reading by words
  3.3 Lexicon and context
  3.4 Summary

4 Overview of the system
  4.1 Summary of parts
  4.2 Image acquisition and corpus choice
  4.3 A note on results
  4.4 The remaining chapters

5 Normalization and representation
  5.1 Normalization
    5.1.1 Baseline estimation and slope correction
    5.1.2 Slant correction
    5.1.3 Smoothing and thinning
  5.2 Parametrization
    5.2.1 Skeleton coding
    5.2.2 Non-uniform quantization
    5.2.3 An alternative approach
  5.3 Finding handwriting features
  5.4 Summary

6 Finding large-scale features with snakes
  6.1 Finding strokes
  6.2 Snakes
  6.3 Point distribution models and constraints
  6.4 Training feature models
  6.5 Finding feature matches
  6.6 Discussion

7 Recognition methods
  7.1 Recurrent networks
    7.1.1 Training
    7.1.2 Network targets
    7.1.3 Generalization
    7.1.4 Understanding the network
  7.2 Time-delay neural networks
  7.3 Discrete probability estimation
    7.3.1 A simple system
    7.3.2 Vector quantization
    7.3.3 Training
    7.3.4 Discussion
  7.4 Summary

8 Hidden Markov modelling
  8.1 A basic hidden Markov model
    8.1.1 Labelling
    8.1.2 Decoding
  8.2 Duration modelling
    8.2.1 Enforcing a minimum duration
    8.2.2 Parametric distributions
    8.2.3 Results
  8.3 Target re-estimation
    8.3.1 Forward-backward retraining
  8.4 Language modelling
    8.4.1 Vocabulary choice
    8.4.2 Grammars
    8.4.3 Experimental conditions
    8.4.4 Coverage
    8.4.5 Search issues
  8.5 Rejection
  8.6 Out-of-vocabulary word recognition
  8.7 Summary

9 Conclusions
  9.1 Further work

Bibliography

Chapter 1
Introduction

    By this art you may contemplate the variation of the 23 letters.
    Robert Burton. The Anatomy of Melancholy.

The world is filling with computers. Whether we like it or not, they are becoming ubiquitous. As ever more people are forced into contact with computers and our dependence upon them continues to increase, it is essential that they become easier to use. As more of the world's information processing is done electronically, it becomes more important to make the transfer of information between people and machines simple and reliable.

One of the aspirations of the field of artificial intelligence, if one ignores for the time being the longer-term goals of analysing and emulating human intelligence, is simply to enable computers to accomplish tasks which are natural to people. Thus computers should be better able to interact with people and to act in human society in a less constrained manner than has previously been possible. These aims are reflected in the more modest attempt by the computer industry to make computers increasingly 'user friendly'. In this vein, computers have come out of laboratories and into homes and offices; we communicate with them using mice and keyboards rather than punched cards and toggle switches. Handwriting is a natural means of communication which nearly everyone learns at an early age.[1] Thus it would provide an easy way of interacting with a computer, requiring no special training to use effectively. A computer able to read handwriting would be able to process a host of data which at the moment is not accessible to computer manipulation.

After this argument, it seems surprising how little research there has been into the computer recognition of handwriting. One reason advanced is that the optimism about the capabilities of imminent speech recognition machines made people feel that other approaches were unnecessary. While some of the promises of speech recognition by machine have already been fulfilled, and researchers are still optimistic, some of the benefits have been slow to materialize and people have thought again about what is required of human-computer interfaces. Though speech is a very convenient form of communication, it is not always the most practical. In noisy environments, those where silence is important, or where a large number of people must work with computers, it is clear that voice input is not the best solution. Though computer professionals and secretaries would be loth to give up the convenience and speed of a keyboard, for those not familiar with keyboards, and for portable or occasional use, handwriting entry is clearly of practical value. This has lead to the growth in the last year or two of 'pen computing': the use of computers which allow input from an electronic stylus (Geake 1992).

In addition to a potential mode of direct communication with computers, handwriting recognition is essential to automate the processing of a myriad of handwritten documents already in circulation. From cheques and letters to tax returns and market research surveys, handwriting recognition has a huge potential to improve efficiency and to obviate tedious transcription. As the Economist recently suggested, "today's biggest prize in computer vision, however is text and handwriting..." (Browning 1992).

[1] Downing and Leong (1982: p. 299) quote an estimated world literacy rate of 71%. In those people coming into contact with computers, the figure must be higher.

1.1 This thesis

This thesis investigates the use of handwriting recognition as a medium of communication between people and computers. After presenting a general overview of handwriting recognition, it focuses on the problem of reading handwritten documents. Later chapters present research carried out to develop a computer system which tackles this problem. The system has been described in earlier papers (Senior and Fallside 1993a; Senior 1993).

The thesis is divided into 9 chapters. This chapter describes the aim and contents of the thesis. The next chapter summarizes the aims and achievements of other work in the field of handwriting recognition and establishes a taxonomy of the field into which the original work of this thesis can be fitted. Applications for handwriting recognition are also examined. Chapter 3 studies work in the psychology of reading, to discover knowledge which can be put to use in the design of a machine handwriting recognition system.

Chapter 4 presents an overview of the handwriting recognition system that has been designed, and the following chapters describe the workings of individual parts of that system, including normalization and representation (Senior 1994); feature-finding (Senior and Fallside 1993b); probability estimation and language modelling. Each of these chapters includes details of experiments carried out to assess the performance of the techniques presented and a discussion of their validity.

The final chapter draws together the conclusions of the chapters about the handwriting system and summarizes what has been achieved in this programme of research. Further work which could be carried on from this thesis is also suggested.

1.2 Original contribution

This thesis describes a new, complete off-line handwriting recognition system. The major original contributions described in this thesis are as follows:

- The system applies a novel approach, using recurrent neural networks for probability estimation. While the recurrent neural network has previously been used for speech recognition, it has not before been applied to the recognition of handwriting.
- The psychology of reading literature is reviewed, showing how the study of human reading and writing gives an indication of the characteristics which might prove useful in a reading machine.
- The training of a recurrent neural network with the forward-backward algorithm is described here for the first time.
- The methods used here to normalize handwritten words are an original synthesis of new and established techniques. Previously published methods are compared and improved upon.
- Words are encoded in an original manner which is shown to be better than the common bit-map representation, and a novel method of feature detection, based upon the use of snakes, is described.
- Chapter 8 investigates the use of duration modelling for off-line handwriting recognition and investigates the problems of out-of-vocabulary words with lexica of limited size.

1.3 Notation

Throughout this thesis, the distinction is made between a handwritten word and the idea of that word. To make this distinction, the following typographical convention is employed. To represent a handwritten word or letter, one font is used: 'abcdefghijklmnopqrstuvwxyz'; and to denote the letters or words as concepts (McGraw et al. 1994), a second font is used: 'abcdefghijklmnopqrstuvwxyz'. The purpose of the system described here is to transcribe the handwritten 'word' into the concept 'word'. When the internal representation of the system is referred to (section 5.2), a single frame of data is shown thus: x_t; and the data representing a whole word are shown as x_0. The set of letters as concepts is denoted Λ and an arbitrary individual letter is shown λ_i.

The discrete probabilities used throughout are denoted P. These include the probability of one or several frames of data given that frame t is part of letter λ_i, P(x_t | λ_i) or P(x_0^t | λ_i) respectively; the probability that frame t represents letter λ_i given the data of the frame, P(λ_i | x_t); and the probability of the j-th element of a frame x_t, given that that frame represents letter λ_i, P((x_t)_j | λ_i).
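For ease of reference, the probabilities just defined can be collected in one place. The block below simply restates the definitions above in display form; no new quantities are introduced, and λ_i is used, as above, for an arbitrary letter concept.

```latex
% Restatement of the notation of section 1.3 (no new definitions).
\begin{align*}
  P(x_t \mid \lambda_i)   &: \text{probability of frame } x_t \text{ given that frame } t \text{ is part of letter } \lambda_i,\\
  P(x_0^t \mid \lambda_i) &: \text{probability of several frames of data given letter } \lambda_i,\\
  P(\lambda_i \mid x_t)   &: \text{probability that frame } t \text{ represents letter } \lambda_i \text{, given the data of the frame},\\
  P\big((x_t)_j \mid \lambda_i\big) &: \text{probability of the } j\text{-th element of frame } x_t \text{, given that the frame represents } \lambda_i.
\end{align*}
```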

Chapter 2
Handwriting recognition

    ...a vast population able to read but unable to distinguish what is worth reading.
    G. M. Trevelyan. English Social History.

As computer power has increased over the years, and their range of applicability has similarly increased, one of the major goals of research into computers has been to make computers easier to communicate with and thus to make their benefits available to a much greater number of people. One of the major obstacles to the integration of computers as universal information processing systems is the fact that most useful business data is still stored on paper. Particularly when dealing with the general public, a huge amount of office paperwork is handwritten. Letters and faxes, as well as forms or annotations to printed documents, may be handwritten; in many situations it would be highly desirable to process the contents of these documents by machine, for which handwriting recognition is essential.

Similarly, computer user interfaces need to be improved to enable communication between computers and a wider class of users in a greater variety of circumstances. While ideas such as the mouse and touch-sensitive screens have been developed, and much work has been carried out into computer speech recognition, there is still much scope for making the interface more natural for users who are not familiar with computers. Handwriting ranks very highly as a way of communicating linguistic information in a way which is natural to very many people. Though speech recognition has been claimed as the panacea for user-interface problems, it has been slow to achieve its promise, particularly in noisy environments, and the limitations of speech recognition have become clearer as research has advanced.

In the last few years the field of handwriting recognition has become much more popular. Not only are more researchers trying to tackle the problems that it presents, but solutions to these problems are slowly becoming available and are actually being sold as useful products. Of late, pen computers have become available with handwriting recognition software for isolated characters and more recently for cursive script. Handwriting recognition systems have already started to be used for reading zip codes on envelopes and amounts on cheques.

Before describing a new handwriting recognition system in later chapters, it is worth presenting here the field of automatic handwriting recognition in its entirety. After describing a taxonomy of the field, applications envisaged for handwriting recognition systems are discussed and work by other authors is presented to demonstrate the approaches taken.

2.1 A taxonomy of handwriting recognition problems

Having established the need for automatic handwriting recognition in general, it is useful to examine the field more closely and to identify several areas with different applications and requiring different approaches. Though many techniques can be shared, the literature tends to divide into groups of researchers, each concentrating on a special area of handwriting recognition.

2.1.1 On-line versus off-line

The major division is between on-line and off-line systems. While other methods could be distinguished, handwriting recognition systems are generally polarized between those receiving their data directly from some sort of pen device attached to the computer, and those which recognize handwriting already present on a piece of paper, a handwriting equivalent of Optical Character Recognition (OCR), which is already widely used for reading printed matter. In the literature dynamic is sometimes used to mean on-line and static off-line. So far, the majority of systems have tackled the easier, on-line, problem where the time ordering of strokes is available as well as pen up/down information; overlapping strokes can easily be distinguished and stroke positions are accurately known. On the other hand, off-line systems have to cope with the vagaries of different pen types, wide strokes which overlap and a lack of ordering information. The growth of pen computing has seen much investment in on-line systems, and the difficulty of off-line recognition has deterred research until recently.

Since the on-line data from an electronic stylus are a one-dimensional stream of information, techniques from speech recognition have been successfully applied to this problem, including hidden Markov models (Bellegarda et al. 1994) and time-delay neural networks (Schenkel et al. 1994). The data from the tablet are usually (x, y) coordinates sampled at a constant frequency in time, though they are often re-parametrized to be equally-spaced, and represented in terms of arc-length, curvature and angle, with information about whether the pen is touching the tablet. A particular problem of on-line recognition is how to handle delayed strokes, strokes which are written after the rest of the word, as in dotting 'i's and crossing 't's. Some authors choose to manage without this extra data; Schenkel et al. record its existence as a 'hat' feature associated with the strokes over which the delayed strokes occur, and Bengio et al. (1994a) represent the surrounding visual context of all strokes so that the dot is seen above the cusp of the 'i'.
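As an illustration of the re-parametrization just described (and only as an illustration: the system in this thesis is off-line and never sees tablet data), the sketch below resamples a constant-frequency (x, y) pen trajectory so that points are equally spaced in arc-length, after which tangent angle and curvature can be taken. All names are illustrative.

```python
import numpy as np

def resample_equal_arclength(x, y, step):
    """Resample constant-frequency (x, y) tablet samples to points equally
    spaced in arc-length along the pen trajectory."""
    pts = np.stack([x, y], axis=1)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # length of each segment
    s = np.concatenate(([0.0], np.cumsum(seg)))          # cumulative arc-length
    s_new = np.arange(0.0, s[-1], step)                  # equally spaced positions
    return np.interp(s_new, s, x), np.interp(s_new, s, y)

def tangent_angle(x, y):
    """Tangent angle at each resampled point; curvature is its derivative along the path."""
    return np.arctan2(np.gradient(y), np.gradient(x))
```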

[Figure 2.1: Subdivisions of machine handwriting recognition (after Plamondon and Lorette (1989)). Handwriting analysis divides into text recognition, either on-line or off-line, and signature identification or verification.]

Although applications and techniques vary considerably, the general taxonomy of both off- and on-line handwriting analysis is similar, as is shown in figure 2.1 and described in the following section. While this thesis is concerned with off-line handwriting recognition, parallel work from on-line research is brought in throughout when there is a community of interests, such as in the modelling of handwriting production or in the application of probabilistic recognizers and grammatical constraints.

2.1.2 Author identification versus content determination

A second dichotomy in the field, orthogonal to the on-line/off-line division, is according to the information to be extracted from the handwriting. From both on-line and off-line data, it may be necessary to determine the authorship of the writing, the content of what has been written, or both. In both cases, the effects of some variations should be ignored. To determine the authorship, differences in personal style should be highlighted, to capture what is characteristic about one person's writing (their idioscript). Conversely, to determine the content of the writing, the variations due to idioscript should be eliminated and ignored. These two requirements result in very different approaches. Techniques also differ depending on whether the author is to be recognized from a signature or from a piece of text.

If the author of a piece of text or signature must be determined, the distinction is made between verifying that the author is the claimed author (for instance in security or banking applications) or merely deciding between a pool of known authors, for instance in a writer-adaptive handwriting recognition system which uses different parameters for word recognition according to the author. The former is the more useful, but of course the harder, problem. Plamondon and Lorette (1989) give an overview of handwriting systems, and a thorough review of signature verification systems.

2.1.3 Writer independence

The whole field of handwriting recognition is similar to the already well-developed subject of automatic speech recognition, which is often classified along the lines of speaker dependence, vocabulary size and isolated word vs. continuous speech. Analogues to each of these exist in handwriting recognition, and are discussed in this and the following sections.

Handwriting styles are extremely diverse, depending both on the pattern used to teach handwriting to an individual and on the individual's idioscript (corresponding to spoken accents and idiolects). Because of this, it is more difficult to devise a system to recognize many peoples' handwriting than one which need only recognize the writing of a single author. Instead of creating a system which can recognize anybody's handwriting, the problem of multiple writers could be tackled by a system which is able to adapt to the current writer. Adaptation to the writer's style could be used when recognizing a lot of material by the same author, but would be of no use when identifying the city names on envelopes. Alternatively, many similar subsystems could be created, each recognizing one style of handwriting (or one individual's handwriting). Then a global system would select the subsystem which corresponded to a particular handwriting sample.

2.1.4 Vocabulary size

The task of recognizing words from a small lexicon is much easier than from a large lexicon (where words are more likely to be similar to each other). Thus, an important criterion in assessing system performance is the size of the lexicon used. The lexicon will depend on the application of the recognition system. For a general text transcription system, a lexicon of 60,000 words (the number of references in a medium-sized dictionary) would cover about 98% of occurrences, and for specific domains, such as reading cheque values in words, or postal towns from envelopes, the vocabulary can be much smaller. Alternatively, it may be necessary for the system to recognize non-words if the user is likely to write words not in the lexicon, such as abbreviations, foreign words or names. This issue is discussed again in section 8.4.
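The notion of lexicon coverage used above is easy to make concrete: it is the fraction of running words in a corpus that fall inside the chosen lexicon. A minimal sketch, with illustrative names only (the 98% figure quoted above is from the text, not from running this code):

```python
import re
from collections import Counter

def lexicon_coverage(corpus_text, lexicon_size):
    """Fraction of running words covered by the lexicon_size most frequent words."""
    words = re.findall(r"[a-z']+", corpus_text.lower())
    counts = Counter(words)
    covered = sum(count for _, count in counts.most_common(lexicon_size))
    return covered / len(words)

# Example (hypothetical file name):
# print(lexicon_coverage(open("corpus.txt").read(), 60000))
```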

2.1.5 Isolated characters

Segmentation of continuous speech into its component words has been found to be very difficult since in natural speech words run together with no silence between. For simpler tasks the recognition is made easier by forcing the speaker to pause between words. Similarly, in cursive script it is hard to distinguish the boundaries between letters; the difference between 'ui' and 'iu', or between other such letter pairs, is very slight. The task can be simplified by forcing the writer to separate letters (discrete handwriting), to write in capitals or, for the greatest clarity, to write clearly separated capitals in pre-printed boxes. When high reliability is required, the latter constraints may be unavoidable since they are already necessary to enable human readers to decipher responses on forms. A number of authors have investigated the problem of recognizing isolated characters (section 2.3.1), particularly for the problem of reading postal codes. Other authors have researched the recognition of discrete handwriting ('handprint', where lower-case letters are written but must be separate) or pure cursive script.

Similar constraints can be placed on cursive script, forcing the author to write each word in a separate box, or on a guideline. These constraints are mainly to encourage clarity since the word segmentation problem proves less difficult than segmentation into characters, and less strict constraints could still ensure high accuracy segmentation of a page into its component words. Other authors have described methods of segmenting pages into words and distinguishing between gaps in words and gaps between words (Srihari et al. 1993).

2.1.6 Optical character recognition

Off-line handwriting recognition has much in common with optical character recognition (OCR), the reading of print by computer. This application received much attention during the 1980s and successful solutions have been found, with commercial packages available for microcomputers which can read type in a variety of fonts and in a certain amount of noise. The history and current status of OCR are reviewed by Mori et al. (1992) and Pavlidis (1993). In more difficult situations, these commercial packages are still not satisfactory. Authors describe problems working with unusual character sets and fonts, poor quality documents or documents in special formats (Bos and van der Moer 1993; McVeigh 1993). Indeed, it is not clear that OCR is economically viable in a great many cases when high accuracy is essential (Olsen).

The reason why the success of OCR has not carried over into handwriting recognition is the great variability in handwriting. For type in a fixed font, all letters 'a' are produced from a single archetype, and thus are very similar on the page, only being corrupted by a relatively small amount of noise in forms such as blurring, merging and slight positional variations. The process of handwriting is much more variable in all of these processes and suffers from variations due to other effects such as co-articulation, the influence of one letter on another. Also, with type, the symbols are usually distinct (except certain ligatures, such as 'fi', which can be learnt as a separate symbol) so the problem of segmentation is not present.

As a consequence of this the relatively simple techniques used in OCR, such as template matching, are inadequate when presented with the greater variability in handwriting, so relatively little research in the OCR literature carries over to handwriting recognition.

2.2 Applications

This section reviews some of the more important applications that may be envisaged for off-line handwriting recognition. On-line recognition tends to be for data-entry to obviate a keyboard as in pen computers, but can also be used for special purposes such as using dynamic signatures to verify identity.

One potential application in the long term is in using off-line techniques for on-line handwriting recognition. Currently, off-line performance lags behind that of on-line recognition systems, but over the next few years, as the technology improves it is likely that methods for both types of handwriting recognition will converge, leading to more general systems and reduced development costs. This convergence can be seen in the model-based approaches now being used (Pettier and Camillerapp 1993; Doermann 1993), which interpret off-line handwriting as a path of ink laid down over time, rather than as an image to be analysed independently of its method of production. The data that can be derived by such algorithms is very similar to the data available to an on-line recognizer.

In the longer term though, it would seem that the convergence is likely to treat both off-line and on-line words as a two-dimensional image, and not as a one-dimensional stream of trajectory data. The reason for this can be seen by looking at the psychology of reading (chapter 3), the way people read. It is by looking at an image, not by analysing the pen path used to produce the writing. Since this involves ignoring the time information, at first this seems to be a poor method of analysing on-line data. However, the information in handwriting is not transmitted in the timing of the pen trajectory. It does not matter whether the strokes of a word are written quickly or slowly, with changing speed, or even in random order, since it is the appearance of the finished word that matters. Thus, by discarding the time sequence, a source of mis-information is actually avoided. For instance, in current on-line systems, an 'o' written clockwise must be recognized differently from an 'o' written anticlockwise, for in the time sequence information, they appear different. Someone who writes an 'a' may subsequently return to extend the final stroke to make the word read 'd', but this change would be lost on a machine relying on the time-ordering of strokes. An off-line approach ignores these factors and simply looks at the final position of the strokes, just as a human reader would. This approach also gives a satisfactory solution to the problem of delayed strokes (section 2.1.1). After these arguments, it may be seen that, while on-line recognition is better than off-line now, because the timing information generally is consistent, a good off-line approach might ultimately cope with a wider variety of variation. Conversely, the timing information is very useful when creating an author verification system: on-line signatures are much harder to forge than off-line signatures, since the dynamics of strokes (with pen both up and down) are harder to forge than the finished appearance.

2.2.1 Cheques

One important commercial application for off-line cursive script is in the machine reading of bank cheques. While the amount in figures is easier to read, it should be checked that the amount in words is the same, and this can be used for confirmation where the numerical amount is unclear. Such a system would only need to have a small vocabulary (about thirty-five words). Given a system that achieved high accuracy without a lexicon, one could check that the payee corresponded to the account to be credited. Such a system might also include signature verification, bringing about an increase in security with the reduction in drudgery and time. Given the number of cheques passing through the banking system each day, a cheque reading system, even if only able to confidently verify half of the cheques, would save much labour on a tedious and unpleasant job. Cheques which could not be confidently verified by machine would still be processed manually, so accuracy would be maintained. The project supported by the French post office has the goal of achieving a 1 in 100,000 error rate from the combined recognition of literal and numerical amounts, but permitting 50% of cheques to be rejected for manual sorting (Leroux et al. 1991).

2.2.2 From postcodes to addresses

Off-line systems capable of recognizing isolated handwritten digits have already been created and installed in many post offices around the world, as part of automatic mail-sorting machines. Given a system to locate the postcode on an envelope (Wang and Srihari 1988; Martins and Allinson 1991; Palumbo et al. 1992) this can be read and used to direct mail automatically. Clearly certain countries such as the USA are at an advantage in having digit-only zip-codes and many researchers have already tackled this problem with reasonable success (section 2.3.1).

To process more mail automatically, systems must begin to use the information contained in the rest of the address. This allows the uncertainty in the postal code classification to be removed by comparing candidate zip codes with candidate addresses in a database of all address/zip code combinations, giving more high confidence classifications. Furthermore, for countries with limited resolution in the postcode, the address can be used to increase the resolution of sorting. U.S. Postal Service projects aim to use the address to determine an 11 digit delivery point code which specifies a single house even when only the five digit zip code was provided.

Mail sorting can be seen as an ideal application for writer-independent handwriting recognition, since it has a wide variety of levels of difficulty, from isolated digits written at predetermined locations on an envelope, up to complete determination of an address without a postcode. Address recognition also admits of a certain amount of error while allowing a large rejection rate. Since there will always be some addresses that are illegible or incomprehensible to a machine, a 'don't know' answer can be given and the item sent to a bin for human sorting. Further, some mail is already misrouted, so the postal service is considered fallible and the consequent delays are already tolerated.

2.2.3 Form processing

Another major application which is now receiving attention is the automatic processing of forms. Forms are widely used to collect data from the general public. For anything more than the most simple information, for which check boxes can be used, replies are handwritten in spaces provided. Much of this information must be stored in databases and can be processed automatically once entered into the computer. Data entry is currently the bottle-neck in the process. Several authors have written systems to segment the handwritten data from the pre-printed form and then to transcribe the handwritten data. In some applications, this may be isolated capital letters written in boxes, but work is now moving on to handprint (Breuel 1994; Garris et al. 1994). Although forms must usually be handprinted to keep the writing as legible as possible, for human as well as machine processing, cursive recognition would still be useful for processing those forms that have mistakenly been filled out in cursive script.

2.2.4 Other applications

A variety of other office document processing systems using off-line handwriting recognition can easily be envisaged. Already many companies use electronic document processing systems which manipulate the scanned images of documents rather than the documents themselves. This is clearly a very data-intensive task, but one way of reducing the data storage is to extract the information and store text in ASCII (or perhaps in a richer format recording the style of writing). Documents would then be easily searchable and index construction would be made possible. Further possibilities exist in reading handwritten documents for the blind or in automatic reading of faxes. Faxed orders could be processed and dispatched automatically and standard enquiries replied to without human intervention. Other faxes could be fed directly into an electronic mail system, providing at the very least automatic notification of fax arrival by reading the cover sheet, if not the full text of the document.

Of course, the advantages of handwriting recognition are not restricted to English or to the Roman alphabet, though these have probably attracted most research. In the literature there is a wide range of papers describing handwriting recognition in a multitude of languages. The basic problems of handwriting recognition are common to all languages, but the diversity of scripts means that very different approaches may be used. For example, Japanese Kanji (Mori and Yokosawa 1988) and Chinese (Lu et al. 1991) characters are strongly stroke-based, and characters are easy to segment from one another, but characters are very complex and there are many classes to distinguish. Arabic and Roman alphabets can be cursive, and Arabic and some Hebrew require accurate recognition of diacritic marks. Govindan and Shivaprasad (1990) cite many more languages.

2.3 Existing off-line handwriting recognition systems

This section reviews some of the off-line handwriting systems which have been detailed in print. To do this it is convenient to classify them, as described above, into isolated character and cursive script systems. Here only a brief overview of these systems is given. Specific details are provided in later chapters when particular issues are discussed.

2.3.1 Isolated characters or digits

Suen et al. (1980) provide a good review of handwriting recognition up to 1980, concentrating on isolated character recognition, which had been the focus of research until then. They describe a variety of feature-based approaches and divide these into global features (templates or transformations such as Fourier, Walsh or Hadamard); point distributions (zoning, moments, n-tuples, characteristic loci and crossings and distances) and geometrical or topological features. The latter were, and have remained, the most popular techniques, and involve separate detectors for each of several types of features such as loops, curves, straight sections, endpoints, angles and intersections. For instance, Impedovo et al. (1990) use cross-points, end-points and bend-points as their features, coding these as to their location in three horizontal and three vertical zones within each character. The encoded characters are then identified using a decision tree classifier. Elliman and Banks (1991) also use features (end-point, junction, curve and loop) each of which is associated with a numerical quantity, such as curvature or length, before being decoded in a neural network (a feed-forward neural network or an adaptive feedback classifier).

Nellis and Stonham (1991) and Hepp (1991) both use sets of global morphological features created by separately examining the left, right, top and bottom edges of each character. The profile of the character from each edge is coded as a separate feature for classification by a neural network.

LeCun et al. (1989) and Fukushima (1980) take the approach of feeding a normalized bitmap image of the character to be recognized into their networks (multi-layered perceptron and neocognitron respectively). Both these networks are constructed from layers of identical feature detectors, which become more specialized and less location specific deeper in the network, until the outputs of the final layer correspond to characters, independent of location in the image.

A host of other authors have tackled the problem of recognizing isolated digits or characters in the last few years (Hepp 1991; Idan and Chevalier 1991; Impedovo et al. 1990; Lanitis et al. 1993), particularly since the increasing availability of data has made this a standard test problem for testing pattern recognition methods (Simard et al. 1993; Hinton et al. 1992; Boser 1994). Isolated digit classifiers have now become so good that research is concentrating on reading whole zip codes where the digits are often touching (Fontaine and Shastri 1992; Kimura and Shridhar 1991; Matan et al. 1992), and finding optimal combinations of multiple classifiers now seems a more promising way of reducing error rates than finding better classifiers. Huang and Suen (1993) cite several papers taking this approach. Performance is now being limited by the number of digits which are entirely ambiguous and could not be confidently classified by human readers.

2.3.2 Off-line cursive script

The problem of off-line cursive script recognition has received little attention until recently, partly because of the difficulty of the problem, but also because of the lack of data. Simon (1992) and Suen et al. (1993) give brief reviews of script recognizers, but the best review is probably by Lecolinet and Baret (1994). Simon makes the distinction between the segmentation approach and the global approach, according to whether words are identified by recognizing individual letters or by recognizing words as a whole. In fact, very few authors take the latter strategy. Plessis et al. (1993) use a holistic match, but only to reduce the size of their lexicon before using a more detailed recognition method. Lecolinet and Crettez (1991) use the terms explicit segmentation and implicit segmentation according to whether an attempt is made to divide the word into separate characters and recognize these individually, or if the segmentation is a by-product of a recognition process working on a different unit of writing. Both approaches use strong evidence from well-written parts of words, together with a restricted lexicon, to recognize words which are partially badly written.

All the authors described below incorporate some form of preprocessing to normalize and clean the data. Some preprocessing methods are described in chapter 5. In each case, a recognition strategy then hypothesizes character or word identities, and because exact recognition is very difficult, all the approaches use a lexicon to constrain the responses to a known vocabulary.

Perhaps the most successful off-line handwriting recognition system is that of Kimura et al. (1993b, 1993a) who have created a system for reading city or state names in addresses. These authors take a dual approach, with a first, quick classification to reduce the lexicon size, followed by a more accurate second classification using different techniques. The first stage finds a rough explicit segmentation and each segment is classified as a letter. The second stage finds a different explicit segmentation by splitting the word into disjoint boxes and joining the boxes together using dynamic programming to form complete characters. These are then passed to a character classifier. These authors report results of 91.5% recognition with a lexicon of 1000 words on the CEDAR database of words segmented from addresses in the U.S. mail (Hull 1993).

Cheriet and Suen's (1993) approach is also letter-based. However, their approach is to extract a number of key letters from each cursive word, particularly the initial letter and those clearly identifiable by ascenders, descenders or loops. For a small vocabulary task (reading cheques) as described in their paper, identifying these key letters might be sufficient to identify most words, but the authors propose their techniques as a way of filtering, to reduce the number of words in the lexicon of possible matches.

Papers by Srihari and Bozinovic (1987; Bozinovic and Srihari 1989) take an explicit segmentation approach, but here each segment need not correspond to a character. They find presegmentation points which include all the boundaries between characters, but also split some characters into two or more pieces. They then find features (16 in all, including dots, curves, strokes, loops and cusps) within the segments by a series of event detectors and use the features to construct letter hypotheses according to statistics of feature occurrences gathered during training. Words are hypothesized via a stack method, where the most likely prefixes are stored and expanded until the word end is reached. After the first iteration of this procedure, the stack contains all the hypotheses for the first letter in order of likelihood. The top (most likely) hypothesis is then expanded by looking at what letters could follow. The resultant two-letter sequences are put onto the stack, to be expanded when they are the most likely sequences. At the end of the word, the lexically correct word that is highest on the stack is chosen as the best match.

Srihari and Bozinovic conducted a number of experiments, using different writers and different lexica (780 and 7800 words). Testing on a single-author database of horizontal, non-slanting writing, a 77% recognition rate was obtained on the small lexicon, 48% on the large. A second single-author database yielded a 71% recognition rate on the smaller lexicon.
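The stack method described above amounts to a best-first search over letter prefixes. The sketch below gives the general shape of such a search, assuming some per-letter scoring function; it is a simplified reconstruction (no length normalization and no explicit end-of-word model), not Srihari and Bozinovic's implementation.

```python
import heapq

def stack_search(score_letter, lexicon, max_length,
                 alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Best-first expansion of letter prefixes kept on a priority queue.

    score_letter(prefix, letter) returns a log-probability for appending
    'letter' to 'prefix'.  The most likely prefix is expanded first, and the
    first complete hypothesis found in the lexicon is returned.
    """
    heap = [(0.0, "")]                       # (negated log-probability, prefix)
    while heap:
        neg_logp, prefix = heapq.heappop(heap)
        if prefix and prefix in lexicon:
            return prefix, -neg_logp         # best lexically correct word so far
        if len(prefix) < max_length:
            for letter in alphabet:
                cost = neg_logp - score_letter(prefix, letter)
                heapq.heappush(heap, (cost, prefix + letter))
    return None, float("-inf")
```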

Yanikoglu and Sandon (1993) take a similar approach. They find possible character segmentation points and attempt to classify segments or groups of up to three segments with a neural network classifier trained on isolated letters. Incorrect segmentations tend to get lower classification scores than when a letter is correctly segmented, and when the scores are combined in a hidden Markov model, the best hypothesis for the groupings of segments and their identities is found. Results of 70% for single-author cursive word recognition are quoted for a lexicon of 30,000 words.

Edelman et al. (1990) have developed a handwriting reader which relies on the alignment of letter prototypes. Here, anchor points (e.g. endpoints; turning points at the top, bottom, left or right of a character) are found in the test word and these points are used to match the word against a set of prototype curves, coded as splines, which can be composed into lower-case characters. The system is hand-designed and is not trained automatically. Using a 30,000 word lexicon, these authors obtained an 81% recognition rate on the training set and around 50% on test sets by three authors. The stress of this system is on recognition without a lexicon, however, and recognition rates of 8-22% are given for three authors including the author whose writing was used to develop the system.

The problem of reading the amount on cheques (section 2.2.1) has been tackled by a number of authors in the problem posed by the French post office. The task here is to recognize amounts written (in words) on postal cheques and to use these to verify the amounts written in figures. Moreau et al. (1991) identify a few characteristics of the cursive words and match these to a set of reference words with dynamic programming. The identified words are used together with a grammar to verify the amount in figures. With a 60% rejection rate, the error rate achieved is 0.2%. Paquet and Lecourtier (1991) reduce each word to a series of curves which they match to examples in a lexicon. They achieve 60% correct on the 50% of words which are well-segmented and later (Paquet and Lecourtier 1993) achieve an error rate of 59% when rejecting 9.5% of words. Leroux et al. (1991) take two parallel approaches: one is to recognize the word as a whole, by finding a few features and comparing with reference words. The second is a letter-by-letter approach where the desire is to recognize only some of the letters, and to use this information to restrict the lexicon. Their system correctly identifies 62% of words. The system described by Simon (1992) achieves a 0.15% error rate with a reject rate of 24% using a 25 word vocabulary.

Chapter 3
Psychology of reading

    There is an art of reading as well as an art of thinking and an art of writing.
    D'Israeli.

Before attempting the machine recognition of handwriting, it is worthwhile considering the way that people read and write. Considering human reading may lead to an increased understanding of the transfer of information through the medium of handwriting, so that it can be seen which processes play a useful role, and which are merely epiphenomena. If it can be understood what information people use to recognize handwritten words, then a clue is found as to what features might be useful for a machine recognition system. Other features are likely to be poorly preserved since they play no useful role. Understanding handwriting production may similarly give insights as to which features of handwriting are representations of the information and which mere artefacts of the generation process.

A large body of psychological data has been gathered on the processes involved in reading type, some of which is applicable to cursive script. Taylor and Taylor (1983), Downing and Leong (1982) and Rayner and Pollatsek (1989) give thorough reviews of the psychology of reading. Most research so far has concentrated on reading individual letters or words out of context. It could be argued that this gives little indication of the processes occurring in normal reading where many words are visible and it is the text as a whole, not individual words, that is important. However results are hard to prove in such a natural environment with many variables, and it is only under restricted experimental conditions that hypotheses can be rigorously tested.

Research into reading, as in much of psychology, relies heavily on observing what errors are made under difficult conditions. One technique is the use of tachistoscopes to flash a word in front of a subject for a very short time followed by a patterned mask to inhibit iconic memory, which otherwise allows the subject to preserve an image of the word mentally for an uncontrolled period of time.

3.1 Reading by features

As will be seen later, many approaches to handwriting recognition rely on detecting features in the writing, such as the strokes which go to make up individual letters. Hubel and Wiesel (1962) describe the processes early in the visual cortex. The complex cells that they discovered code the presence of bars and edges and provide a compact representation of lines which is particularly appropriate to the representation of writing and print. A number of authors have sought to determine what higher-level representation might be used specifically for letters.

Bouma (1971) investigated the features which people use to recognize isolated characters by examining the confusions between letters presented either at a distance or for a short time, eccentrically in the subject's field of view. Bouma uses the errors made by subjects to identify groups of confusable, or 'psychologically close', letters. Bouma's classification is shown in table 3.1.

Table 3.1: Bouma shapes.

  Outer contour   Bouma shape                             Code   Letters
  Short           inner parts and rectangular envelope    1      a s z x
                  round envelope                          2      e o c
                  oblique outer parts                     3      r v w
                  vertical outer parts                    4      n m u
  Tall            ascending extensions                    5      d h k b
                  slenderness                             6      t i l f
  Projecting      descender                               7      g j p q y

Using these classes, words can be encoded according to their shape, so 'dog' would become 527, but so also would 'boy' which is seen to be similar in shape. Taylor and Taylor used these Bouma shapes for a study on the text of their own book. Table 3.2 shows a similar experiment on the text of this thesis. The words are classified according to each of four shape description techniques, and the number of words of each shape is counted. The outer contour is a coarser coding than the Bouma shape, simply classifying letters as short, tall or projecting. The outer contour is enough to specify 1389 of the 3444 words uniquely, but there are 36 shapes shared by ten or more words each. If the first letter is known, the ambiguity is further reduced. The Bouma shape, having more classes than outer contour, gives more unique shapes: 3201 words are uniquely labelled.

Table 3.2: Word discrimination using word shape measures on the text of this thesis.

                             Number of words sharing the same shape
                              1     2     3    4    5    6    7    8    9   10+
  Outer contour            1389   250   102   45   23   20   16    9    7    36
  Bouma shape              3201    83    20    3    1
  Outer contour + initial  2301   313    77   34   14    7    2    3
  Bouma shape + initial    3340    49     2
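The counting experiment behind table 3.2 can be reproduced with a few lines of code: each word is mapped to its string of Bouma codes (so that, as above, 'dog' and 'boy' both become 527) and the number of distinct words mapping to each shape is tallied. This is a sketch of the experiment as described, not the script used for the thesis.

```python
from collections import Counter

# Bouma classes of table 3.1, coded 1 to 7.
BOUMA_CODE = {}
for code, letters in enumerate(["aszx", "eoc", "rvw", "nmu", "dhkb", "tilf", "gjpqy"],
                               start=1):
    for letter in letters:
        BOUMA_CODE[letter] = str(code)

def bouma_shape(word):
    """Encode a lower-case word as its Bouma code string, e.g. 'dog' -> '527'."""
    return "".join(BOUMA_CODE[c] for c in word if c in BOUMA_CODE)

def shape_sharing_counts(vocabulary):
    """For k = 1, 2, ... count how many shapes are shared by exactly k words."""
    words_per_shape = Counter(bouma_shape(w) for w in set(vocabulary))
    return Counter(words_per_shape.values())
```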

This study shows that, in conjunction with a lexicon of permitted words, a few simple features can identify most words, without the need to recognize the individual letters. Haber and Haber (1981) have carried out similar work into the effectiveness of letter shape for reading, and also give a decision tree which might be used to distinguish the letters of the Helvetica font by observing only a limited set of features. Eldridge et al. (1984) investigate the variability of some handwriting features comparing variation in an individual's handwriting with that between individuals.

McGraw et al. (1994) further investigate the features that might be used in representing characters. Although their experiments are conducted with machine-generated letters made up of straight line segments, they investigate the recognition of letters at the limits of class-boundaries, so their work is of relevance to handwriting recognition. They suggest that letter recognition is carried out by finding word features that fill roles in internal models of letters. Thus a letter 'b' could be described as a loop with a short stroke above and to the left, or as a tall stroke with a curved section joined at the lower right. These authors do not consider the possibility of overlapping features which might characterize the letter as well if not better. For instance, a 'b' could also be described as a tall stroke overlapping a loop to the right. They make the important point that the higher-level features used for reading are not likely to simply arise bottom-up from the visual processing system, as Hubel and Wiesel cells do, but to be defined top-down depending on the classes to be distinguished. This depends in turn on the writing system to be read, just as when learning a new language the boundaries between phonemes have to be re-learnt according to the different distinctions and groupings made in that language.

Many studies have also been made into the processes involved in writing. If an accurate model of these processes can be found, then it could be used for representation of handwriting in a compact form, and for recognition. Alimi and Plamondon (1993) discuss a variety of models for handwriting generation, and Abbink et al. (1993) and Singer and Tishby (1993) have used the Hollerbach (1981) model for modelling handwriting for recognition. Singer and Tishby derive a very compact code which represents the handwriting but also allows the easy removal of slant, slope and other variation, making the writing more legible. Teulings (1994) discusses feature extraction from on-line cursive script. As yet these approaches have usually been applied to on-line script where the pen trajectory is accurately known. The static nature of off-line writing does not lend itself to these approaches, though Doermann (1993) shows that off-line script can be considered in this way. However it seems that, while compact representations can be found using the model-based approach, reading is a visual process and dynamic approaches will always fail to represent data such as the dots on a letter 'i' appropriately, for here it is important where the dot occurs, not when or how.

3.2 Reading by letters and reading by words

One of the fundamental findings of reading research is the importance of recognition of words as single entities and not as the conjunction of their component characters. Taylor and Taylor cite work by Kolers & Magee, whose experiments involved training subjects on inverted text (where the letters are all upside-down). They trained two groups, one to read words and one to name letters, then each group was switched to the other task. No evidence was found that learning one task improved performance in the other, thus one may conclude that "relatively fluent reading requires familiarity with the shapes of words, but not with the letters in those words." (p. 195)

Further evidence for reading by words rather than individual letters is given by the word superiority effect. This is the term used for the phenomenon that a letter is better recognized (more frequently recognized correctly when presentation time is short enough to induce errors) when presented as part of a word than when presented either on its own or surrounded by arbitrary characters in a non-word (for instance in Reicher's experiments described by Rayner and Pollatsek p. 77).

It is interesting to note the work by Yamadori (1975) and Sasanuma (1984) which shows that damage to certain areas of the brains of Japanese readers can severely impair reading of kana (syllabic) script whereas kanji (morphemic) script is much less affected. This shows that different brain pathways must be used for the two script types and indicates that the mechanism of reading is more complex than it might at first appear. Downing and Leong discuss the possibility of phonological, visual or both pathways for indexing an internal lexicon, and the evidence seems to suggest that people use both a coding of the sounds of words and a coding of the visual image when recognizing words while reading.

Taylor and Taylor propose a reading mechanism with three paths:

Whole-word process: This is a rapid process taking perhaps 50-100 ms which is based only on the pattern of the word as a whole, or the first half-dozen letters of longer words.

Letter-based process: From 50 ms after a word is presented, the individual letter identities are becoming available. (This could be understood as a progressive increase in the frequency of the filter used as suggested in work by Marr (1982)). Outer letters are identified first, and may be used to adjust the first hypothesis of the whole-word process, or to generate a new one. These authors also suggest that word units (prefixes and suffixes) may be recognized as single items.

Scan-parse process: This process is the slowest and uses the letter identities to produce a phonetic version of the written word, which can be used as additional evidence for the word identity.

3.3 Lexicon and context

Reading relies on the use of a lexicon of words. Words that are written unclearly can often only be identified because it is known that they must represent a real word, rather than one of the other letter strings that might be 'read into' the cursive word. The word of figure 3.1a could be interpreted in many ways, but a reader would generally opt for 'minimum' because that is a word. Psychological studies have verified the existence of some form of internal lexicon, though the form that this takes is unclear. Nevertheless the lexical decision task is an important tool in experiments. For this the experimenter measures the time taken to determine whether a string of letters is a word or not.

[Figure 3.1: Word ambiguity. (a) is identified by recognizing the two 'i's and knowing that the word must be in the lexicon. (b) is still ambiguous unless context is supplied.]

Context is also significant. The correct interpretation of the word 'minimum' is made even more likely in a passage about optimization. (But another might be understood in the context of a discussion of non-words in the psychology of reading.) Context is also important in choosing between valid word hypotheses. The word in figure 3.1b could equally well be identified as 'clump' or 'dump' or even 'jump', and it is only from the meaning of the surrounding words that the two can be distinguished. Grammar can be sufficient to distinguish ambiguous words, by determining from the surrounding context whether a word is a verb or a noun, or whether a verb is transitive or not. To implement this discrimination in an automatic system, some language model must be introduced to determine legitimate word sequences. Language models are discussed in section 8.4.
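A bigram model is perhaps the simplest example of the kind of language model meant here: a candidate word sequence is scored by the product of word-to-word transition probabilities estimated from text, so that readings which form unlikely sequences are penalized. The sketch below is generic (add-alpha smoothing, illustrative names) and is not the model developed in section 8.4.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Count unigrams and bigrams from tokenized sentences (lists of words)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sequence_log_prob(words, unigrams, bigrams, vocab_size, alpha=1.0):
    """Smoothed bigram log-probability of a candidate word sequence."""
    logp, previous = 0.0, "<s>"
    for word in words:
        numerator = bigrams[(previous, word)] + alpha
        denominator = unigrams[previous] + alpha * vocab_size
        logp += math.log(numerator / denominator)
        previous = word
    return logp
```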

Context is important for the skilled reading of passages of text, but is not considered by Rayner and Pollatsek (p. 62) to be an important influence on the reading of the words within that text. However, the results quoted by Edelman et al. (1990) show how difficult it can be to identify handwritten non-words, thus highlighting how important a restricted lexicon and context are. "In comparison, people recognize correctly 96.8% of handprinted characters [Neisser and Weene 1960], 95.6% of discretized handwriting [Suen 1983] and about 72% of cursive strings (see [Edelman 1988] appendix 1)." Edelman's (1988) experiment consisted of presenting non-word cursive strings to four subjects. The subjects had to type their reading of the cursive string, with no time limit to the responses, and allowing multiple guesses. Edelman found the error rate consistent with the error rate for individual letters.

The problem of handwriting recognition is complicated by the fact that much handwriting is intended for use only by the author. When people speak, it is invariably with the purpose of being understood by someone else, and that person is there to query any ambiguities immediately, or to indicate if the speech is difficult to understand for whatever reason. There is feedback of any errors that are made, so behaviour can be corrected, with the aim of transferring information most effectively. On the other hand, writing is usually read much later than it is created, and this feedback loop does not exist. Writing not legible to others is easily accepted by an author who already knows what is written. Particularly if a writer is used to word-processing documents for consumption by others, notes written for personal use may be written in a way that other readers cannot understand. Words may simply become illegible mnemonics comprehensible only to the author who knows the context in which they were written. However, it is just such notes to one's self that pen computers are designed to store and, it is claimed, recognize: an exacting if not impossible task.

3.4 Summary

From the work that has been reviewed in this chapter, it is possible to extract a number of important principles which can be used for guidance in the design of a machine to read cursive script. While following psychological studies might not yield the easiest nor the best method of tackling this problem, being aware of how people read gives an indication of the operations of the best reading machine known. Those factors which are seen to be important are summarized below, and taken into consideration in the design of the handwriting recognizer in the subsequent chapters.

First, in the recognition of written forms, it seems that beyond the simple representational level of the Hubel and Wiesel cells, people recognize letters by observing higher-level features.

Though the exact features are unknown, it seems that they correspond to such elements as loops, curved strokes and straight line segments. If these features are how information is conveyed between people in handwriting, then they would be a good choice of feature for a machine handwriting recognizer, as they are likely to be invariant between writers and under different conditions. Further, while people learn to read by recognizing individual letters, and this might be necessary for new or long words, skilled readers take in whole words at a time. It can also be seen that reading is made possible only by knowing that most words will fall into a prior vocabulary, and by using the context surrounding words to overcome ambiguity.

Chapter 4

Overview of the system

    Polonius: What do you read my lord?
    Hamlet: Words, words, words.
        Shakespeare, Hamlet.

Having reviewed the literature, it is apparent that until recent years there has been a dearth of research and publications on the problems of off-line recognition, but that there is great potential for applying successful systems, particularly in the banking and postal fields. Recently the situation has changed, but there still remains a significant gap between the performance of research systems and the accuracy required for practical implementations.

To attempt to fill part of this gap, the system described in this thesis has been developed to carry out all the operations of off-line handwriting recognition, from scanning to producing a machine-readable document of recognized words. This chapter briefly describes the whole system and then details a number of issues relating to the complete design, including a description of the databases used for experiments. Subsequent chapters present the other aspects of the system in more detail.

4.1 Summary of parts

Figure 4.1: A schematic of the recognition system, showing the main processes which must be carried out to identify the words in a handwritten document. (The stages shown run from the page of handwriting, through scanning and segmentation to a single word image, then normalization and parametrization to an encoded word, recognition by recurrent network or discrete HMM to give likelihoods, and finally HMM decoding with duration and language modelling to give the word.)

The system described in this thesis can be conveniently divided into the same broad sections as are found in most other handwriting recognition systems, such as those described in chapter 2.

The system begins with data acquisition and proceeds in a bottom-up manner, processing smaller amounts of data at successively higher levels of representation, to arrive at a word identity which can be output in ASCII code.

To capture data from a handwritten document, in general some sort of scanner is used rather than a camera, to ensure controlled conditions, especially of lighting. A variety of scanners is available, from hand-held units for reading a small amount of material, through flat-bed scanners and machines with sheet feed or page-turning, up to postal machines with a very fast throughput.

The scanned image must be segmented into separate words (section 4.2) and then a series of image processing operations is carried out to normalize the image, as described in the first half of chapter 5. The latter half of that chapter discusses the best way of representing the useful information contained in the image. That chapter and the next also discuss the derivation of handwritten features from the image, as a succinct way of describing the shape of the handwriting.

Chapter 7 then discusses how data probabilities can be estimated from the encoded feature information. Three different pattern recognition techniques are described together with the training method for each. From each of these, the probabilities are combined in a hidden Markov model system (chapter 8) which finds the best choice of word for the observed data. This system allows the natural incorporation of prior information about the lengths of letters and about a restricted list of permitted words, about the grammar of a language and potentially even the semantic context of the writing.

4.2 Image acquisition and corpus choice

    The success of any decipherment depends upon the existence and availability of adequate material. How much is needed depends upon the nature of the problem to be solved, the character of the material, and so forth.
        John Chadwick, The Decipherment of Linear B.

The system is designed to process data captured from a scanner, but for research purposes it is convenient to work on a fixed database stored on disk for repeatability and speed. Ideally work would have been conducted using a standard database to produce results which would be easily comparable with the results quoted for other systems. In the speech recognition community the production of standard databases has made available large corpora of speech which individual institutions could not collect themselves. This has enabled reliable comparison between different recognition systems and encouraged competition, albeit tending to narrow the goals of research towards performing well on the standard tasks. However, at the start of this research there was no off-line cursive database available, so

the only solution was to collect a new database. Subsequently the CEDAR database (Hull 1993) has been released, but it is designed specifically for the task of isolated word recognition from address blocks, and introduces a number of special problems which did not fall into the already wide scope of this research. These problems include having to deal with overlapping words and having to remove guidelines, envelope patterns and other clutter, though work has been done to remove much of this noise (Doermann 1993; Kimura et al. 1993b).

In the database collected for this research, words were written by a single author on a plain, white A4 sheet. The writer used a black fibre-tip pen which gave clear strokes with sharp edges, but the strokes are wide and overlap. The sheets, each containing 150-200 words, were then scanned on a flat-bed scanner at 300 dots per inch resolution, in 8 bits (256 levels of grey), to produce one file per page. Each page takes about 8 Mbytes of storage in TIFF (Aldus and Microsoft 1988) format, when not compressed.

The next task is to segment each page into its component words. Under real conditions, this problem can be difficult. However, there exist published techniques for performing this operation (Garris et al. 1994; Yanikoglu and Sandon 1993) and it has not been studied in detail in this work. For this database each word was written within a wide border of white space to facilitate segmentation. The algorithm for segmentation is thus very simple, merely looking for blank horizontal lines to partition between lines of text. Within lines of text, the algorithm looks for long horizontal gaps between words. If the algorithm fails, the automatically determined bounding boxes around words can be manually adjusted using a graphical tool developed for the purpose. Figure 4.2 shows a section of a page of data with the automatic segmentation displayed. Words are automatically labelled by alignment with the machine readable file which was used to prompt the writer.

Figure 4.2: A section of a page of the database, showing the bounding boxes detected automatically.
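The segmentation heuristic just described is simple enough to sketch directly. The following is an illustrative reconstruction rather than the code used for this work; the gap thresholds, the function names and the binary-image convention (ink = 1) are assumptions.

    import numpy as np

    def segment_page(binary, line_gap=5, word_gap=20):
        """Crude word segmentation of a binarized page (1 = ink, 0 = background).

        Lines of text are separated at runs of blank image rows; words within a
        line are separated at long runs of blank columns.  Returns a list of
        (top, bottom, left, right) bounding boxes.  Thresholds are illustrative."""
        boxes = []
        row_has_ink = binary.any(axis=1)
        for top, bottom in runs(row_has_ink, line_gap):
            line = binary[top:bottom]
            col_has_ink = line.any(axis=0)
            for left, right in runs(col_has_ink, word_gap):
                boxes.append((top, bottom, left, right))
        return boxes

    def runs(mask, min_gap):
        """Yield (start, end) of regions of True values separated by at least
        min_gap consecutive False values."""
        start, end, gap = None, None, 0
        for i, v in enumerate(mask):
            if v:
                if start is None:
                    start = i
                end, gap = i + 1, 0
            elif start is not None:
                gap += 1
                if gap >= min_gap:
                    yield start, end
                    start = None
        if start is not None:
            yield start, end

The same runs() helper serves both levels of the search, with a small gap closing lines of text and a larger one closing words within a line.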

Initial tests were carried out on a database of the numbers written out as words ('one' to 'nineteen', tens from 'twenty' to 'hundred', plus 'thousand', 'million' and 'zero'). These words were chosen because they form a corpus useful for an application such as cheque verification, but the small vocabulary enabled a reasonable study to be made in a short time and facilitated data collection. Ten exemplars of each of these words were taken: three to serve as a training set and four as test data (a test set of 124 images), plus a further three to be used as a validation set (see section 7.1.3).

Subsequently, a larger data set was created by the collection of transcripts of the Lancaster-Oslo/Bergen (LOB) corpus (Johansson et al. 1986). This is an extensive corpus of modern English collected from a wide variety of sources such as newspapers, novels and non-fiction books. The corpus as a whole contains a million words with a vocabulary of around 40,000 words. Writing out sentences from this corpus gives larger data sets, permitting better training of the recognition system and laying the foundations for future work on language modelling to improve the results, based on work already conducted, for instance by Kuhn and De Mori (1990). The LOB handwritten database contains 2360 training images, 675 validation images and 1016 test images from words written by a single author. Initial transcriptions consisted entirely of lower case words, but subsequent additions to the database have included punctuation and capital letters. The vocabulary of the transcribed corpus is 1334 words, and results quoted use this lexicon size except where stated otherwise. The size of this database is sufficient for training for single-author recognition, but more data would be necessary to tackle the writer-independent task.

It is hoped that more standard databases of off-line data will become available as more research is conducted in the field. To encourage this, and to encourage cross-testing on multiple data sets, the database described above has been made publicly available.[1]

4.3 A note on results

To provide a measure of the worth of each of the techniques presented, experiments are described throughout the thesis and the corresponding results are presented. Since there is usually no direct, objective measure of the effectiveness of one technique compared with another, two techniques are often compared by training a complete system for each of the possible conditions and testing on an unseen test set. The final results obtained are percentage error rates showing the proportion of words in the test set incorrectly classified by the whole system. These error rates are used to compare two techniques or determine an optimum parameter value by holding all other variables constant. The standard experimental conditions for each part of the system are made clear in the following chapters as those parts are described (and are summarized in section 8.4.3), but many results are presented before the whole system has been explained in detail. For comparison, since the standard test vocabulary is 1334 words, random guessing would give a 99.9% error rate, and guessing the most likely word ('the') all the time would give a 93.2% error rate.

[1] A sample is available by anonymous ftp: ftp://svr-ftp.eng.cam.ac.uk/pub/data/handwritingpageimage.tar.gz

Because the training of recurrent networks is found to be dependent on initial conditions, results are subject to a certain amount of variation. Where possible, several networks have been trained under conditions identical except for the initial values of the weights. From these runs, an estimate of the mean percentage error rate can be obtained, together with the standard error of the mean. However, the training of recurrent networks is very computationally intensive, so it has not been possible to train multiple networks for every experiment. In experiments where only one run has been carried out, standard errors estimated from multiple runs under similar conditions are quoted. Where two techniques are to be compared, statistical tests are carried out. The one-tailed Student's t-test is used for paired data, for instance when several networks are tested under two different conditions, to determine if the difference in the mean error rate is significant. The statistic of the test is denoted t(degrees of freedom) and the relevant tabulated value is shown as t_significance(degrees of freedom).

Training and testing times are quoted in the following chapters. For comparison purposes, all times are given as the equivalent for a Silicon Graphics R4400 Indigo with 150 MHz clock. All times are approximate, and test times are given as the average time per test word over the whole test set.

4.4 The remaining chapters

The next chapter describes the techniques used to normalize the word image, and the coding schemes used to represent the data for recognition. Finally, it describes the simple features which can be extracted from the skeleton of a handwritten word. Chapter 6 describes a more complex technique which can be used to extract larger scale features. The recognition systems which operate on the encoded data to derive character probability estimates are described in chapter 7, and chapter 8 explains the system used to make the choice of the best word, given these estimates. Finally, chapter 9 draws together the results of the previous chapters, makes an assessment of the whole system and points to the possibilities for further work building on that described in this thesis.

Chapter 5

Normalization and representation

    L'ecriture est la peinture de la voix. (Writing is the painting of the voice.)
        Voltaire.

The system described in this work is designed to identify a handwritten word when presented with a scanned image. A system could be envisaged which identified the word directly from the image presented, but the task of the recognition system is greatly simplified by preprocessing the image, organizing the information and representing it in a more accessible manner. The processing to be carried out before recognition consists of two major parts: normalization and representation. The first of these attempts to remove variations in the images which do not affect the identity of the word, and the second then expresses the salient information contained in the image in a concise way, suitable for processing by a pattern recognition system. This chapter describes the normalization operations performed on each image by this system.

5.1 Normalization

Cursive script varies in many different ways. In addition to the peculiarities of an author's idioscript, which mean that one writer can be identified among thousands, there are the peculiarities of writing in different situations, with different media and for different purposes. In the recognition task to be solved here, all this variation is irrelevant and serves only to obscure the identities of the words, although in other applications, such as author verification, this 'noise' may be of most interest. One way of reducing the variation is to identify certain parameters of the handwriting that may vary to give a different appearance to a word. Then, a procedure must be determined to estimate each of these parameter values from the sample word (or several) and finally another procedure must be found to remove the effects of the parameter from the word. The most obvious parameters include the following:

Height The height of letters will vary between authors for the same task, and for a given author for different tasks (for instance dependent on the size of guidelines given, or the amount of text to be fitted into a space);

Slope This is the angle of the baseline of a word if it is not written horizontally. Even when given a horizontal guideline, authors will write all or some words with non-horizontal bases. Often this can be assumed to be straight, but in extreme cases curved, 'hill-and-dale' baselines may be observed (Srihari and Bozinovic 1987: p. 229);

Slant The slant is the deviation of strokes from the vertical. This tends to be a writer-dependent parameter, but varies between words too;

Stroke width This depends on such factors as the writing instrument used, the pressure applied and the angle of the writing instrument, as well as the paper type;

Rotation If the page is skew in the scanner, then all the words will be rotated, by a process independent of slant and slope, which are shear processes in the production of the handwriting. In this system though, rotation is assumed to be small and is removed by a combination of slant- and slope-correction transforms.

Figure 5.1: A schematic of the preprocessing operations needed to normalize the image before it is encoded. (The stages shown run from the scanned word image through slope estimation and baseline correction, slant correction, smoothing and thresholding, and thinning, to the skeleton, distance transform, snake fitting and parametrization.)

The system described here incorporates normalization for each of these factors, reducing each image to one consisting of vertical letters of uniform

height on a horizontal baseline and made of one-pixel-wide strokes. Figure 5.1 shows a schematic of these normalization operations, which are explained in this chapter. The normalization process described in the following sections is illustrated for a sample word in figure 5.3.

5.1.1 Baseline estimation and slope correction

The character height is determined by finding the intuitively important lines which are shown running along the top and bottom of lower case letters in figure 5.2, the upper and lower baselines respectively (using the terminology of Srihari and Bozinovic), with a centre line between the two. With these lines, the ascenders and descenders which are used by human readers in determining word shape (section 3.1) can also be identified.

Figure 5.2: Histograms, centre line and baselines.

The heuristic used for baseline estimation consists of the following steps:

1. Calculate the vertical density histogram by counting the number of black pixels in each horizontal line in the image. Vertical and horizontal density histograms are shown on the right and bottom edges of figure 5.2.

2. Reject the part of the image likely to be a hooked descender (as in the letters 'g', 'q', 'y'). Such a descender is indicated by a peak in the vertical density histogram. The minimum in the histogram above this point is found and the image is cleared from that point downwards.

3. Find the lowest remaining pixel in each vertical scan line.

4. Retain only the points around the minimum of each chain of pixels.

5. Find the line of best fit through these points (figure 5.3b).

6. Reject the outlying points and calculate the new line of best fit. This is now considered to be the baseline of the character.

Given the estimate of the lower baseline, the writing can be straightened to make the baseline horizontal. This straightening is carried out by application of a shear transform parallel to the y axis (figure 5.3c).
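A rough sketch of steps 1, 5 and 6 of this heuristic, and of the shear used to straighten the baseline, is given below. It assumes a binary word image with ink coded as 1 and rows numbered downwards; the hooked-descender rejection of step 2 and the chain handling of steps 3 and 4 are simplified away, and the outlier rule is an arbitrary choice.

    import numpy as np

    def estimate_baseline(binary):
        """Fit a line y = a*x + b through the lowest ink pixel of each column,
        then refit after discarding outliers (a simplified form of steps 3-6)."""
        cols = np.where(binary.any(axis=0))[0]
        lowest = np.array([np.max(np.nonzero(binary[:, c])[0]) for c in cols])
        a, b = np.polyfit(cols, lowest, 1)
        resid = lowest - (a * cols + b)
        keep = np.abs(resid) < 2 * resid.std() + 1e-9   # reject outlying points
        a, b = np.polyfit(cols[keep], lowest[keep], 1)
        return a, b                                      # slope and intercept

    def correct_slope(binary, slope):
        """Shear parallel to the y axis so that the fitted baseline becomes
        horizontal: each column c is shifted vertically by -slope*c."""
        h, w = binary.shape
        out = np.zeros_like(binary)
        for c in range(w):
            shift = int(round(-slope * c))
            src = np.nonzero(binary[:, c])[0]
            out[np.clip(src + shift, 0, h - 1), c] = 1
        return out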

Figure 5.3: Successive stages in the normalization. (a) initial image; (b) slope estimate; (c) slope corrected; (d) Canny edges and slant estimate, with baseline estimates; (e) slant corrected; (f) skeleton.

Slope correction can be carried out on whole lines to remove rotation in the scanned image or skewed writing, and then carried out on individual words to remove local transformations. Next, the height of the lower baseline can be re-estimated, under the assumption that it is now horizontal. The upper line may be re-estimated using a similar procedure, though this is found to be less robust, because of the presence of 't' strokes, which are harder to separate from the body of text than are descenders, as Bozinovic and Srihari (1989) observe.

5.1.2 Slant correction

Bozinovic and Srihari (1989) detail a complex method for letter slant correction. This involves isolating areas of the text which are near-vertical strokes and estimating the slant of each of these. This procedure was found to be very sensitive to the thickness of the writing and is unreliable when the writing is thinner than expected. However, by making an estimate of the writing thickness from the distance transform (see section 6.1) and using an iterative technique, a more stable version of this algorithm has been developed.

Bozinovic and Srihari's algorithm commences by eliminating all horizontal rows in a word which contain horizontal strokes. These are identified as any rows which contain long runs of black pixels. The maximum number of consecutive black pixels which can be permitted before a line is eliminated is a parameter which must be specified. After each such row is eliminated, the remaining image is in horizontal strips, some of which are too narrow to use. (A second, less critical parameter is the smallest height of horizontal strip which can be used to estimate the slope.) The remaining strips are divided into boxes containing separate, near-vertical strokes, in each of which the centroids of the upper and lower halves are determined, and the slant of the line between the two is calculated. Averaging the slants across all such strokes gives an estimate of the average overall slant of the word. The slant is corrected with a shear parallel to the x-axis. Figure 5.3e shows a slant-corrected word.

The modification which has been found to stabilize this algorithm is to split the word into strokes for a range of values of the run-length parameter and to use the value which gives the greatest number of boxes. It is under these conditions that the best slant estimates are obtained. A further refinement is to discard boxes in which the stroke fragments in the top and bottom sections are not connected and cannot be sensibly used to estimate the stroke slant.

In practice, despite the modifications, the algorithm was still sometimes found to give poor slant estimates, and an alternative technique was tried and found to be more reliable. This involves finding the edges of strokes, either by finding the contour of the thresholded image (Caesar et al. 1993a; Kimura et al. 1993a) or by using an edge detection filter. Both of these techniques give a chain of connected pixels representing the edges of strokes. The orientations of edges which are close to the vertical are averaged to give an overall slant estimate.
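Whichever estimator is used, the correction itself is a shear parallel to the x-axis. A minimal sketch, assuming the slant angle has already been estimated (positive meaning strokes leaning to the right) and that the shear is measured from the baseline row:

    import numpy as np

    def correct_slant(binary, slant_radians, baseline_row):
        """Shear parallel to the x axis: each row r is shifted horizontally so
        that strokes leaning by slant_radians from the vertical become upright.
        Rows above the baseline move one way, rows below it the other."""
        h, w = binary.shape
        out = np.zeros_like(binary)
        shear = np.tan(slant_radians)
        for r in range(h):
            shift = int(round(-shear * (baseline_row - r)))
            src = np.nonzero(binary[r])[0]
            out[r, np.clip(src + shift, 0, w - 1)] = 1
        return out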

An estimate based on the Canny (1986) edge detector has been used in this system. It is found to tend to underestimate the slant, as in figure 5.3d. Yanikoglu and Sandon (1993) find a similar estimate, using the mode slant found by edge operators within 30 degrees of the vertical.

5.1.3 Smoothing and thinning

To remove noise from the image, either from the original document, from scanning defects, or from applying shear transforms to discrete images, it is useful to smooth the image. This is carried out by convolution with a 2-dimensional Gaussian filter. It has been found that there is little noise on a scanned image when using a black fibre-tip pen on plain white paper, but degradation from this ideal situation is possible from a large number of sources such as paper quality, age and condition; pen or pencil type; poor illumination when using a camera rather than a flat-bed scanner; and show-through from writing on the other side of a page.

Having normalized and smoothed the image, it is thresholded to leave every pixel black or white. Next, an iterative, erosive thinning algorithm is applied to reduce the strokes in the writing to a width of one pixel so they can be followed later. This is the skeleton of the word shown in figure 5.3f. The algorithm used was that due to Davies (1990: p. 153).

Skeletonization is a notoriously difficult problem to solve well, and many algorithms have been written, with a variety of properties. Lam et al. (1992) present a comprehensive review with 138 references. Despite this difficulty, because the skeleton is to be coarsely parametrized later, a simple algorithm was found to work well, and other algorithms that were tried (Zhang and Suen 1984; Arcelli and Sanniti di Baja 1985) did no better. There is scope for more work on identifying a suitable thinning algorithm for handwriting, but it would seem that a model-based method such as those of Pettier and Camillerapp (1993) and Doermann (1993), which use the knowledge that the image is made from a series of strokes, is the most promising approach. Ultimately what is required is a skeleton which represents the strokes perceived when a human reader observes a word. Such a skeleton is probably best found as that which approximates the path of the pen most closely (corresponding to the data received in an on-line system), and not an algorithm that best matches a human approximation to a pixel-based skeletonization algorithm, as has been suggested (Plamondon et al. 1993). Experiments were carried out, matching skeletons of off-line images with the on-line data for the same writing, but it was found that from conventional tablets there is a large error (of the order of the stroke width) in the reported pen position when the pen angle varies. Without better hardware, this investigation could not be pursued.

It is worth noting that in the database collected here, the strokes tended to be wide, making skeletonization difficult. In many papers (e.g. Caesar et al. 1993a) the stroke width is small, so skeletonization works well and both the skeleton and contour will give good approximations to the true pen path.
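These three stages map onto standard library routines; the sketch below substitutes a library skeletonization for the Davies (1990) thinning algorithm purely for illustration, and the grey-level convention (ink dark, values in [0, 1]) is an assumption.

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from skimage.morphology import skeletonize

    def smooth_threshold_thin(grey, sigma=1.0, threshold=0.5):
        """Gaussian smoothing, global thresholding to a binary image, then
        iterative thinning to a one-pixel-wide skeleton."""
        smoothed = gaussian_filter(grey.astype(float), sigma=sigma)
        binary = smoothed < threshold          # ink = True after thresholding
        skeleton = skeletonize(binary)         # stand-in for Davies' algorithm
        return binary, skeleton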

5.2 Parametrization

Now that the image has been reduced to a standard form, which highlights invariants of the words and suppresses spurious variations, the normalized image needs to be parametrized in an appropriate manner for input to the network which is to carry out the recognition process. From the original scanned image, which can take 8 Mbytes of storage space, all that is ultimately desired is the identity of the words on the page, an information content of the order of a few hundred bytes. One way of looking at recognition is as a process of information sifting, with the ultimate aim of deriving the word identities. In order to process the data effectively with a recognition technique such as a connectionist network, they must be reduced in number and transformed into a form more appropriate than a grey scale image. Data representation is of prime importance in pattern recognition problems and can easily mean the difference between a particular method solving or failing to solve a problem. The problem of representation is discussed more generally by Marr (1982) and Winston (1984: ch. 8). Speech is coded using techniques such as filters, cepstra, mel scale binning and vector quantization before attempting recognition. These representations express the relevant information in a much more useful form than the original time-varying voltage measured by an analogue to digital converter attached to a microphone. Similarly, in script recognition, the useful, invariant information must be extracted from the written words while discarding the vast majority of redundant variation. The remainder of this chapter describes the processes used to reduce the amount of data used to describe a word, and deals with the problem of how the word should best be represented.

5.2.1 Skeleton coding

The main method of parametrization used is to code the skeleton of the word so that information about the lines in the skeleton is passed on to the recognition system. An alternative method, based on the grey-level image, is described in section 5.2.3.

In the skeleton coding scheme, the area covered by the word is first divided into a grid of rectangles (figure 5.4a). The vertical strips (frames) are of a fixed width for the whole word, a length determined by the height estimate of the character. Typically there are 6 frames in the horizontal space occupied by one character height. This assumes that the character height is proportional to the character width, which is a valid assumption for normal handwriting by a single author, but will not be as accurate for multiple writers.

Figure 5.4: Successive stages in the parametrization. (a) skeleton with grid; (b) parametrized line segment data; (c) features superimposed on line segment data.

The vertical resolution of the grid is chosen so that the word is divided into seven regions, each of which can be identified as playing a definite, but distinct, role in the representation of handwriting. The regions close to the upper and lower baselines identified in section 5.1.1 both contain most of the horizontal movements in a word, representing the turning points at the top and base of most small letters, and the ligatures between letters. These two regions also contain the endpoints of short strokes. The middle region between these two lines captures important information about the short strokes which make up the majority of handwriting, as well as containing the internal detail of letters such as 'e' and 's'. The ascenders and descenders so important in the Bouma shape of a letter (section 3.1) are found in the regions above the half-line and below the base-line, and two more regions can be identified containing the endpoints or loops of ascenders and descenders.

A higher vertical resolution (16 regions) has been tried, but performance was slightly lower because generalization was impaired; the storage requirement of the training data also increased. There is a variable number of vertical frames in a word, with long words having more frames than short words, but a given character will always occupy approximately the same number.

For each of these rectangles in the grid, four bins are allocated to represent different line angles (vertical, horizontal, and the lines 45 degrees from these). Within this framework, the lines of the skeleton image are 'coarse coded' as follows.

The one-pixel-wide lines of the skeleton are followed, and wherever the skeleton enters a new box in the grid, the section in the previous box is coded according to its angle. The box associated with this segment's (x, y, angle) values is now 'filled' (set to one). Segments which are not perfectly aligned with the angles of the bins contribute to the bins representing the two closest orientations. This representation can be seen to resemble the Hubel and Wiesel cells which code information early in the visual cortex. These are tuned to a particular spatial location and angle, but also respond to edges or bars with similar parameters. Caesar et al. (1993b) and Bengio et al. (1994a) use similar methods of representing off-line and on-line cursive script respectively. This provides the latter with a method for coding the spatial relationships of nearby strokes, and overcoming the problems of delayed strokes.

Figure 5.4b shows the input pattern schematically. Each line represents a full bin, and its position and orientation correspond roughly to the position and orientation of the section of skeleton which gave rise to it. Because of the coarse coding, some line segments contribute to two bins; this is seen on one stroke which lies between the vertical and 45 degrees, so both these lines are shown in the corresponding boxes in figure 5.4b.

Hereafter, the first frame of data in the representation of a word will be referred to as x_0 and the final frame x_T. The frames (x_s, ..., x_t) will be denoted x_s^t.
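The coarse coding might be sketched roughly as follows. This is a simplified stand-in for the segment-following coder described above: each skeleton pixel, rather than each box-to-box segment, votes into a single angle bin of its frame and vertical region, and the frame and region boundaries are supplied by the caller.

    import numpy as np

    def coarse_code(skeleton, frame_edges, region_edges, n_angles=4):
        """Build a (frames x regions x angle-bins) array of 0/1 codes from a
        one-pixel-wide skeleton.  The local direction at each skeleton pixel is
        estimated from a neighbouring skeleton pixel and quantized into
        n_angles orientation bins (vertical, horizontal and the two diagonals
        when n_angles == 4)."""
        code = np.zeros((len(frame_edges) - 1, len(region_edges) - 1, n_angles))
        rows, cols = np.nonzero(skeleton)
        for r, c in zip(rows, cols):
            nbrs = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr or dc) and in_bounds(skeleton, r + dr, c + dc)
                    and skeleton[r + dr, c + dc]]
            if not nbrs:
                continue
            dr, dc = nbrs[0]
            angle = np.arctan2(dr, dc) % np.pi           # undirected orientation
            a = int(round(angle / (np.pi / n_angles))) % n_angles
            f = np.searchsorted(frame_edges, c, side='right') - 1
            g = np.searchsorted(region_edges, r, side='right') - 1
            if 0 <= f < code.shape[0] and 0 <= g < code.shape[1]:
                code[f, g, a] = 1
        return code

    def in_bounds(img, r, c):
        return 0 <= r < img.shape[0] and 0 <= c < img.shape[1]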

5.2.2 Non-uniform quantization

The above description coded all the frames to be of equal width, and the frames were chosen by blindly drawing a grid on the word image. The width of the frames was chosen in proportion to the character height. In practice though, character height and width vary independently from author to author, so it would be better if these scale factors could be estimated independently. Also, rather than blindly placing the frames, it would be better if they could be aligned more with the data. A single frame could then contain all of a vertical stroke, rather than strokes slightly off the vertical ending up in two adjacent frames.

To correct these two problems, a simple system has been devised, which is similar to the system used by Yanikoglu and Sandon (1993) for finding potential letter segmentation points. After the word has been normalized, but before thinning, the horizontal density histogram is calculated and smoothed. The maxima and minima of the smoothed density histogram are found, and frame boundaries are defined to be the midpoints between adjacent maximum/minimum pairs. Further frames are added where the maxima and minima are far apart, to ensure that the frames do not exceed a certain width (chosen according to the character height). Figure 5.5 shows the centres of segments found under this scheme. This quantization scheme is not completely robust, as small changes in the image can lead to different numbers of maxima and minima, despite the smoothing. A better scheme could perhaps be designed, but this one has improved results over the uniform quantization, as is shown in table 5.1.

Figure 5.5: The non-uniform horizontal quantization scheme superimposed on the histogram of the original word and its skeleton.
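A sketch of the non-uniform boundary placement, assuming the column density histogram of the normalized word is available; the smoothing width, the maximum frame width and the handling of the word ends are illustrative choices rather than the values used in this work.

    import numpy as np
    from scipy.ndimage import gaussian_filter1d
    from scipy.signal import argrelextrema

    def frame_boundaries(column_density, sigma=2.0, max_width=12):
        """Place frame boundaries at midpoints between adjacent maxima and
        minima of the smoothed horizontal density histogram, splitting any
        frame that would exceed max_width pixels."""
        smooth = gaussian_filter1d(column_density.astype(float), sigma)
        maxima = argrelextrema(smooth, np.greater)[0]
        minima = argrelextrema(smooth, np.less)[0]
        extrema = np.sort(np.concatenate([maxima, minima]))
        bounds = [0]
        for a, b in zip(extrema[:-1], extrema[1:]):
            mid = (a + b) // 2
            while mid - bounds[-1] > max_width:      # extrema far apart:
                bounds.append(bounds[-1] + max_width)  # add extra boundaries
            bounds.append(mid)
        bounds.append(len(column_density))
        return np.array(bounds)

The resulting boundary array could be passed directly as the frame_edges argument of the coarse coder sketched in the previous section.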

Table 5.1: Error rates for networks trained on data sampled by different quantization schemes. Results are shown for networks with different numbers of feedback units (section 7.1.3).

    Quantization method   Size of network   Error rate (%)   Std. error
    Uniform                    160               11.5            1.60
    Non-uniform                160                9.6            1.60
    Uniform                     80               15.6            0.72
    Non-uniform                 80               13.3            1.60

5.2.3 An alternative approach

Instead of coding the image in this complicated fashion, it may be asked whether it would not be much easier simply to present the recognition system with the image directly. This would reduce the amount of processing required, and skeletonization artefacts would not distort the data. The same normalization procedures must be carried out to give scale, slant and slope independence, and the image must be sub-sampled to obtain a manageable amount of data. Here a vertical resolution of 32 pixels is used for coding letters with their descenders and ascenders. This makes each pixel approximately square when using the same horizontal quantization, and gives a similar number of bins to the skeleton coding. Figure 5.6 shows such an undersampled grey-level image. Each pixel is stored in 8 bits or 256 levels of grey.

Figure 5.6: The word 'pound' undersampled.

The results obtained for this preprocessing technique are compared with those of the skeleton coding method in table 5.2. The skeleton coding gives a much lower error rate.

Table 5.2: Error rates using line segment and undersampling preprocessing methods.

    Representation   Error rate (%)   Std. error
    Line segments         20.4            1.60
    Undersample           31.7            0.84

5.3 Finding handwriting features

The previous sections have described how the original word image can be normalized and encoded in a canonical form so that different images of the same word are encoded similarly. However, the coding only represented low-level information about the word, and coded it fairly coarsely to reduce the information burden. The performance of the recognizer can be improved by passing it more information about salient features in the word. Chapter 6 describes a method of finding large-scale features, but a number of useful features can be easily discerned from the processing that has already been performed on the writing.

Dots Dots above the letters 'i' and 'j' can be identified with a simple set of rules. Short, isolated strokes occurring on or above the half line are marked as 'i' dots.

Junctions Junctions are easily found in the skeleton of the word, as points with more than two neighbours. Junctions indicate points where two strokes meet or cross.

Endpoints Endpoints are points in the skeleton with only one neighbour and mark the ends of strokes, though they can be produced as artefacts of the skeletonization algorithm.

Turning points Points where the direction of a skeleton segment changes from upward to downward are recorded as top turning points. Similarly left, right and bottom turning points can be found.

Loops Loops can be found from the skeleton or by performing a connected-component analysis on the original image, to find areas of background colour not connected to the region surrounding the word. A loop is coded by a number representing its area.

A number of authors, including Srihari and Bozinovic (1987), use the topology of a word as a feature. However this is not always a good choice of invariant, since extra loops can easily be formed, or loops that could be expected might not be fully closed. Ascenders can become loops, 't' strokes can join up with other letters to create a loop, and normally closed letters like 'a' and 'o' can be left open or filled in in normal handwriting.

Each of these features can be encoded in a single bin but, while it is only useful to know whether a loop or dot is present in a particular frame, the positions of the endpoints, turning points and junctions are useful, and they are recorded along with the angle bins for each horizontal strip. Thus, instead of four angle bins at each vertical position, ten features are encoded, and an extra two features are associated with the whole frame. With seven horizontal bands, this increases the size of a frame from 28 bytes (7 x 4) to 72 (7 x (4+4+2) + 2), but the additional information improves the network's performance. Some of these features are shown in figure 5.4c, superimposed on the line segment features, with endpoints, turning points and junctions each indicated by a distinct marker. Table 5.3 shows the performance improvement obtained by adding these features to the representation.

Table 5.3: Error rates using the line segment coding method, with and without the skeleton features.

    Representation                 Error rate (%)   Std. error
    Line segments                       20.4            1.60
    Line segments with features         18.2            1.60

5.4 Summary

This chapter has described a variety of normalization methods for handwritten words and then described a coding scheme for those words. It has been shown that a coding based on extracting information from the skeleton is more effective than one based on the grey level of the image. Features have been extracted from the skeleton and are found to improve recognition further.

Chapter 6

Finding large-scale features with snakes

    Let there be snakes! And snakes there were, are, will be...
        Sylvia Plath, Snakecharmer.

The previous chapter described a coding for handwritten words which records the location and orientation of the line segments in the skeleton. This coding was then extended to incorporate low-level features which could be easily identified. All of these features were simple and local, depending only on information from a small area of the image. However, in section 3.1 it was seen that the features generally held to be of most significance in reading were larger-scale, stroke-like features. It would be highly desirable if information about the presence of such features could be determined and concisely encoded for use in recognition.

A number of off-line handwriting recognition systems have used large-scale features for recognition; indeed some are based entirely on the use of such features. This chapter describes a new method of automatically finding a large class of stroke-like features in cursive words written with broad strokes. Before describing the method used in this system, it is worth looking at the methods that have been used by other authors.

Srihari and Bozinovic (1987) define their features with rules based on the contours of the word images. The features that are defined are short and long strokes, curve sections, loops and dots. These authors constructed their off-line data from on-line tracing information, which seems to have given smooth curves and narrow strokes. However, defining rules that will reliably pick out features when there is noise is extremely difficult, and relying on the contour means that features that run across intersections cannot be detected.

Edelman et al. (1990) use a method similar to that described in this chapter to represent stroke-like features in on-line handwriting. They fit a number of prototype stroke features to the on-line handwritten string, and use the identities of the strokes that matched to find letter hypotheses and eventually word matches. The method is described as optical matching, but the data is again collected from a graphics tablet, so the strokes are narrow and allow curve-fitting to the contour alone. Because stroke contours are smooth,

enough strokes can be matched reliably along the length of the stroke sequence, and letter hypotheses can be proposed solely on the basis of these features.

The problem with both of the above methods is that they require clean data and narrow strokes. They operate on the contour of the image, but the features that should be detected are the strokes, which are better characterized by the path of the pen centre than by either the left or right edge. Thus, this chapter describes a method of finding the centres of strokes regardless of the thickness of the stroke, the irregularities in the stroke edges, or the presence of overlapping strokes or edges.

6.1 Finding strokes

The centres of strokes are those parts which are furthest from the edges, so a natural choice of representation to consider is the distance transform. This assigns a value, D(x, y), to each pixel (x, y) in the thresholded image, which is the distance of that pixel from the nearest background pixel, zero if the pixel is itself part of the background. Thus circles in the image become cones in the distance transform, the transform increasing the further a point is from the edge, and strokes become ridges. Now detecting stroke centres becomes a problem of finding ridges in the distance transform. The method chosen to find these ridges is snakes.
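The city-block distance transform used here is available directly in standard libraries; a brief sketch, under the convention that ink pixels are non-zero and the background is zero:

    import numpy as np
    from scipy.ndimage import distance_transform_cdt

    def city_block_distance_transform(binary):
        """Distance of every ink pixel from the nearest background pixel under
        the city-block (taxicab) metric; background pixels get zero.  Ridges of
        this transform run along the centres of strokes."""
        return distance_transform_cdt(binary.astype(bool), metric='taxicab')

    # example: a 3-pixel-wide vertical bar gives values 1, 2, 1 across its width
    bar = np.zeros((5, 7), dtype=int)
    bar[:, 2:5] = 1
    print(city_block_distance_transform(bar))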

6.2 Snakes

Snakes are deformable splines (smooth curve segments) placed in a potential field, which translate and deform to reduce their potential energy. Traditionally they have been used to find edges in grey level images, by according low potentials to areas of high contrast so that the snake seeks to match its contours to high contrast edges. Such a use is seen in the original paper of Kass et al. (1987). Further uses have included tracking curve sections in video sequences (Cipolla and Blake 1990), and extraction of features from faces (Yuille et al. 1992). In the latter case, a parametric model was built for each of the features to be extracted (e.g. eyes, mouth) and these were fitted to real images. Leymarie (1990) uses snakes to find skeletons in much the same way as they are used here, attempting to find maxima of the distance transform. The remainder of this section describes in more detail the mechanism underlying the snakes' operation.

The shapes of snakes are governed by cubic B-splines (Pavlidis 1992). A series of N control points {p_i : i = 0, ..., N-1} is defined in a two-dimensional plane and the actual spline path generated is an interpolation of these points (figure 6.1), each point x(s), s in [0, N-1], on the path being a weighted sum of the nearest control points' positions. B(s) is a polynomial function which determines how much weight is given to each control point, according to the parameter s, which increases from one end of the curve to the other. The B-spline is forced to terminate at the end control points by generating 'phantom' control points p_{-1} = 2p_0 - p_1 and p_N = 2p_{N-1} - p_{N-2}.

    x(s) = \sum_{i=-1}^{N} B(s + 2 - i) \, p_i                          (6.1)

    B(s) = \begin{cases}
        \frac{1}{6} s^3                             & 0 \le s \le 1 \\
        \frac{2}{3} - \frac{1}{2}(s-2)^3 - (s-2)^2  & 1 < s \le 2 \\
        \frac{2}{3} + \frac{1}{2}(s-2)^3 - (s-2)^2  & 2 < s \le 3 \\
        \frac{1}{6} (4-s)^3                         & 3 < s \le 4 \\
        0                                           & \text{elsewhere}
    \end{cases}                                                         (6.2)

The spline shown in figure 6.1 has the minimum four control points. For more complex shapes, more control points can be added, but each point on the curve is only determined by the four nearest control points. Other (non-cubic) splines can be defined, interpolating more or fewer control points. The weighting polynomials ensure continuity and smoothness (C^2).

Figure 6.1: A snake with four control points and the distance transform along a normal.

Given the positions of the control points, the snake can now be located on an image. How it moves, according to the features in the image, must now be defined. A potential function -f(x, y) is defined on the pixels {(x, y)}, where the snake is to be attracted to curves of high values in f. f might be intensity I, contrast |\nabla I|^2 or, as in this case, the distance transform D(x, y). Here the city-block metric, D = |x| + |y|, has been used for simplicity of computation.

The spline curves are sampled so that M samples are generated per unit in s. At each sample point s_k, the normal to the curve is searched for the minimum of the potential function -f within a certain distance on either side. The displacement of the minimum is recorded for each sampling point, and these displacements are then added to the control points to move the snake towards the local maxima. Since each sample point is a weighted sum of the nearest four control points,

    x(s_k) = B(s_k+2-i) p_i + B(s_k+1-i) p_{i+1} + B(s_k-i) p_{i+2} + B(s_k-1-i) p_{i+3},    (6.3)

the displacement d(s_k) is distributed among these control points:

    p_i(t+1) = p_i(t) + \frac{1}{M} \sum_k B(s_k + 2 - i) \, d(s_k).    (6.4)

The new control points define a spline which lies closer to the lines of local maxima, and after two or three iterations a good match will be found if one is present in the search area around the snake's initial position.

As defined above, these snakes do not serve the purpose of feature recognition. They are very flexible, so any snake can adapt to fit a wide range of feature shapes, even collapsing to a point in some potential wells. To compensate for this, Kass et al. define an internal energy based on the integral of first and second derivatives along the snake's length, to penalize high curvature. This general 'straightness' constraint suits the purposes of tracking edges in images, but to find features, the constraints need to be chosen so that the snake can only match features of a particular shape.
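A sketch of the basis function of equation 6.2 and of one unconstrained update step (equations 6.3 and 6.4) is given below. The normal search is reduced to sampling the potential at integer offsets, the tangent is taken by finite differences, and the names and parameter values are assumptions for illustration, not those of the implementation described here.

    import numpy as np

    def B(s):
        """Uniform cubic B-spline basis of equation 6.2, evaluated elementwise."""
        s = np.asarray(s, dtype=float)
        return np.where((s >= 0) & (s <= 1), s**3 / 6,
               np.where((s > 1) & (s <= 2), 2/3 - 0.5*(s - 2)**3 - (s - 2)**2,
               np.where((s > 2) & (s <= 3), 2/3 + 0.5*(s - 2)**3 - (s - 2)**2,
               np.where((s > 3) & (s <= 4), (4 - s)**3 / 6, 0.0))))

    def snake_step(control, potential, samples_per_unit=4, search=5):
        """One unconstrained update (equations 6.3 and 6.4): sample the spline,
        take the highest-potential point along the local normal at each sample,
        and spread the displacements back onto the control points.  `control`
        is an (n, 2) array of (x, y) points, y increasing downwards, and
        `potential` is the distance-transform image."""
        n = len(control)
        pts = np.vstack([2*control[0] - control[1], control,
                         2*control[-1] - control[-2]])      # phantom end points
        idx = np.arange(-1, n + 1)
        svals = np.linspace(0, n - 1, samples_per_unit * (n - 1) + 1)
        new = control.astype(float).copy()
        for s in svals:
            w = B(s + 2 - idx)
            x = w @ pts                                      # point on the spline
            tangent = (B(s + 0.01 + 2 - idx) - B(s - 0.01 + 2 - idx)) @ pts
            normal = np.array([-tangent[1], tangent[0]])
            normal /= np.linalg.norm(normal) + 1e-9
            offsets = np.arange(-search, search + 1)
            scores = [sample(potential, x + o * normal) for o in offsets]
            d = offsets[int(np.argmax(scores))] * normal     # ideal displacement
            for i in range(n):                               # equation 6.4
                new[i] += B(s + 2 - i) * d / samples_per_unit
        return new

    def sample(potential, xy):
        """Potential at an (x, y) position, or -inf outside the image."""
        r, c = int(round(xy[1])), int(round(xy[0]))
        if 0 <= r < potential.shape[0] and 0 <= c < potential.shape[1]:
            return float(potential[r, c])
        return float('-inf')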

6.3 Point distribution models and constraints

A number of models must be generated, each matching a particular feature, but able to match instances of that feature whose shapes vary somewhat. Cootes and Taylor (1992) describe 'point distribution models' (PDMs) which they use as shape descriptors for various objects such as hearts in magnetic resonance images and resistors on images of circuit boards. The essence of the PDM is performing principal component analysis on the covariance matrix of the coordinates of the control points of a snake, and restricting the snake's shape to match shapes that have been seen in a training set.

If a snake with n control points is placed on K examples of a particular feature, for instance the short vertical stroke of an 'i', the positions of the control points can be recorded and statistics gathered. If the kth example feature has positions s_k = (p_{k,0}, ..., p_{k,n-1})^T, the centroid of that example can be found:

    \bar{p}_k = \frac{1}{n} \sum_i p_{k,i}.                             (6.5)

The mean displacement of each point from the centroid can be calculated by subtracting the centroids and averaging:

    s_k = (p_{k,0} - \bar{p}_k, \ldots, p_{k,n-1} - \bar{p}_k)^T        (6.6)

    \bar{s} = \frac{1}{K} \sum_k s_k.                                   (6.7)

\bar{s} is the mean shape of the feature and represents a typical example. If the deviation of a particular example from the mean shape of a feature is found,

    \delta s_k = s_k - \bar{s},                                         (6.8)

it can be considered as a vector of 2n coordinates, and the 2n x 2n covariance matrix of the shapes can be found:

    \Sigma = \frac{1}{K} \sum_k \delta s_k \, \delta s_k^T.             (6.9)

Principal Component Analysis can be carried out to determine the modes of variation in the system. This is done by diagonalization of the covariance matrix. Each eigenvector shows a correlation in the variation of the point coordinates: a 'mode' of variation in which the points concerned have linearly related displacements. The eigenvalues give the extent of variation in the direction of the corresponding eigenvector, so the largest eigenvalue's eigenvector captures most of the variation in the model shape. These modes are strikingly demonstrated in Cootes et al.'s (1992) resistor model, where the first few modes correspond to natural physical parameters such as the position of the resistor on its wire, the bend of the wire, and the shape of the resistor body. Figure 6.2 shows the major modes of variation of two feature models.

Figure 6.2: Snake models for 'n' and 'o' features showing the major mode of variation within 1.5 standard deviations of the mean.

Having determined these modes of variation, they can be used to constrain the variation of a snake. Having worked out the new position of a snake with no constraints, from one iteration of the techniques of section 6.2, the centroid of the snake is calculated and subtracted from the new control point coordinates, and the difference of this shape from the mean shape is taken. Transforming this difference into the coordinate frame of the principal components gives the deviation from the mean in each direction. Variation in the minor modes is suppressed, since this represents deviation from the space of typical stroke shapes. The Mahalanobis distance d^2(\delta s) = \delta s^T \Sigma^{-1} \delta s shows how much the snake deviates from the model.
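The statistics of equations 6.5 to 6.9 and the principal component analysis reduce to a few lines; a sketch assuming the K example shapes are supplied as a (K, n, 2) array of control point coordinates:

    import numpy as np

    def train_pdm(examples):
        """Build a point distribution model from K example snakes.

        examples: array of shape (K, n, 2) holding control point coordinates.
        Returns the mean shape (a 2n-vector), the eigenvectors (modes of
        variation, one per column) and the eigenvalues (variance per mode)."""
        K, n, _ = examples.shape
        centred = examples - examples.mean(axis=1, keepdims=True)  # eq 6.5-6.6
        shapes = centred.reshape(K, 2 * n)                          # 2n-vectors
        mean_shape = shapes.mean(axis=0)                            # eq 6.7
        dev = shapes - mean_shape                                   # eq 6.8
        cov = dev.T @ dev / K                                       # eq 6.9
        eigvals, eigvecs = np.linalg.eigh(cov)                      # PCA
        order = np.argsort(eigvals)[::-1]                           # largest first
        return mean_shape, eigvecs[:, order], eigvals[order]

    def mahalanobis_sq(shape, mean_shape, eigvecs, eigvals):
        """Squared Mahalanobis distance of a centred shape from the model."""
        b = eigvecs.T @ (shape - mean_shape)    # deviation in mode coordinates
        return float(np.sum(b ** 2 / np.maximum(eigvals, 1e-12)))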

This distance scales down variation along the principal axes, giving a measure of how many standard deviations the snake lies from the mean, assuming that deviations of snakes from the mean are distributed as a Gaussian ellipsoid. If the distance is too great, it can be reduced by scaling down all components of the deviation. The constrained deviation is then transformed back to the original coordinates, and added to the centroid to generate a new snake which will have a shape similar to those observed in the training set.

Because the displacement to find the distance transform maxima and the application of the constraints are two separate processes, and because the image space is quantized, it is possible that the snake enters a cycle of displacing onto the maximum and being constrained back to its original position. The snake thus never reaches a stable position. To avoid this case, the fitting process is stopped after a maximum of 10 iterations, though a match is usually found after just 2 or 3 iterations.

Lanitis (1992) and Lanitis et al. (1993) have investigated the use of these models for isolated character recognition for postcode reading. Here a model is produced for each of 36 alphanumeric characters and these models are matched to pre-segmented images of handwritten characters from a postcode database. Each model is compared with each image, and the best match is chosen. These authors do not use the distance transform for the match, but instead rely on the skeleton, which can often be distorted away from the actual strokes at intersections.

6.4 Training feature models

In this work the ideas of splines and principal component analysis, in the form of point distribution models, have been linked together to form constrained B-spline models of features of handwritten letters.

One model is constructed for each feature to be recognized. In initial studies these features have been: 'n' hump; 'u' trough, which also models ligatures; 'i' stroke (found in many letters including 'u' and 'n'); 't' cross-stroke; ascender; descender and 'o' shape. Each of these features can be modelled by a single spline, though compound features may be constructed by joining more than one. Each model contains the mean displacements of each of the spline control points; the permitted relative variations in these point positions, given by the covariance matrix; and the mean and variance of the observed y co-ordinate of the centroids \bar{p}_k, to record how high in a word the feature occurs. The preprocessor determines character size, so the coordinates are normalized to be independent of the writing size.

Initially a seed model is generated by hand to describe the general characteristics of the feature:

- The number of points needed to model the feature. For a small, straight feature, only four points may be necessary. For a longer line or a curve, six are found to be adequate, but for an 'o' or 's' feature, eight points are required to represent the shape.

- The feature topology (loop or line) and the interconnection of the splines (whether they form a compound feature, or whether a loop has a tail or not).

- The position of the feature in a character: whether the feature is in an ascender, a descender or in the middle section of lower case letters.

- The initial shape of the feature.

The seed models are now matched to instances of the features in images of handwritten words. Initially this can be done by pointing out feature instances manually, and allowing the seed model to deform without constraint from the mean to match the stroke. When the potential minimum has been found, the snake's shape is added into the statistics of observed shapes. When a good model has been found, this procedure can be automated so that the features in a word are found automatically. The automatic feature spotting is used both to train the models and subsequently to spot the features used in the recognizer.

6.5 Finding feature matches

Having created a model for each of the features to be found, the next step is to find all occurrences of each feature in the word. The methods described above will find a feature match if one lies close to the starting position of the snake, so snakes must be placed at regular intervals along the word to detect all the features present. A snake, whose shape is initially the mean shape for the model, is placed at the left edge of the word, and permitted to deform to match the distance transform potential, but with the deformation being constrained to lie within a fixed number of standard deviations of the mean shape, so the shape will always be similar to shapes already taken by that feature before. (A limit of one standard deviation has been used here.) A best match given the constraints is found by iterating for a limited number of times or until the snake ceases to move. Should the snake move above or below the band where it is normally found, for instance a stroke feature matching the top of an 'r', then it is rejected. Otherwise, the degree of match between the snake and the image is determined, representing the degree of support that the data provides for the model and the amount of deformation of the model required to fit the data. The support is the sum of two components: the sum of the distance transform along the length of the snake, plus an extra weight, w, for all points that are not background points. The deformation is measured with the Mahalanobis distance d(\delta s) of the match shape from the mean shape of the feature.

The degree of match, M, is defined as the difference of these two components,

    M = \sum_k \left[ f(x(s_k)) + w_k \right] - d(\delta s)             (6.10)

where

    w_k = \begin{cases} w & \text{if } f(x(s_k)) \ne 0 \\ 0 & \text{otherwise.} \end{cases}    (6.11)

Snakes with scores greater than a threshold are accepted as feature matches, and the remainder are rejected. The extra weight acts as a penalty for the model crossing areas that are not strokes. Its value is determined empirically (typically 7) and the value of the threshold is adjusted in accordance with this value and the mean value of the distance transform. This makes the matching process independent of the width of the strokes, since thick strokes give ridges with higher distance transform values than thin strokes. The mean value of the distance transform is also used to indicate the stroke width in the modified slant detection algorithm, and to give the spatial frequency parameter for the Canny edge detector (section 5.1.2).

This is in contrast to the measure of fit used by Lanitis, who adds two components: the amount of data modelled by the snake and a penalty for the amount of data which the snake fails to model. This is to prevent, for example, an 'l' model being matched to a 'b'. If the unmodelled data were not taken into account, the 'l' model might appear to match the 'b' along its whole length. Since only a small part of each image is to be matched at a time, such a measure would be inappropriate here.

After each match, the shape and height of the snake is re-initialized to the mean and is displaced to the right by half its width, where the procedure is repeated until the whole word has been searched for that feature. In this way, each feature is matched across the whole of each word in the training set. It is possible that two successive placements of a snake will converge to the same feature, but multiple matches of this sort can be rejected on the basis of the x co-ordinates of the centroids being very close. Figure 6.3 shows all the matches for the features used in a variety of words.

For this application, the feature matches must be coded in the same preprocessing format described in the previous chapter. In this case, one more byte per snake model is allocated in each frame, and whenever a feature match is found this is recorded in the appropriate place in the frame which corresponds to the centroid of the matching model. In fact one model might span several frames, but the match is only recorded in the central frame.

Table 6.1: Error rates with and without including snake information.

    Method                    Error rate (%)   Std. error
    With snake information         15.6            0.72
    Without                        18.2            1.60

Table 6.1 shows the improvement gained using snake features in addition to the basic skeleton coding of chapter 5. Adding the features into the representation reduces the system error rate.
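Equations 6.10 and 6.11 translate directly into a small scoring routine. The sketch below assumes the snake has already been fitted and sampled, and takes the Mahalanobis deformation term as an input; the weight value follows the text, but everything else is an illustrative assumption.

    import numpy as np

    def match_score(sample_points, distance_transform, mahalanobis_distance, w=7.0):
        """Degree of match M of equations 6.10 and 6.11: the distance transform
        summed along the fitted snake, plus an extra weight w at every sample
        that lies on a stroke (f != 0), minus the deformation term."""
        support = 0.0
        for x, y in sample_points:                    # (x, y) image coordinates
            r, c = int(round(y)), int(round(x))
            f = 0.0
            if (0 <= r < distance_transform.shape[0]
                    and 0 <= c < distance_transform.shape[1]):
                f = float(distance_transform[r, c])
            support += f + (w if f != 0 else 0.0)     # w_k of equation 6.11
        return support - mahalanobis_distance

A snake would then be accepted as a feature match when this score exceeds the threshold, scaled as described above by the mean distance-transform value.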

Figure 6.3: Different features found automatically in several words: (a) 'u' feature; (b) 'n' feature; (c) 'i' feature; (d) ascender; (e) descender; (f) 'o' feature.

6.6 Discussion

The speed of the preprocessing algorithms has not been discussed so far, since the system described here has been designed for flexibility in comparing alternative algorithms rather than for maximum speed. In particular, a large number of intensive raster operations are carried out, which could be combined for greater speed. The speed of preprocessing in the current system is approximately one second per word. This could easily be considerably reduced by optimizing the program, and many of the operations should be easily parallelizable for an application requiring high speed.

It is difficult to have many features since, with wide strokes, features tend to overlap in their roles and match the same parts of words. For example, if one were to train a further stroke shape, it would be likely to match 'i' strokes too. For the same reason, it is found that maintaining a useful degree of flexibility in the constraints on an 'o' feature, to make it fit a wide variety of 'o's, means that it is also flexible enough to collapse and match 'i' strokes. Further individual constraints could be imposed, in the manner of Yuille et al. (1992), but would mean losing the simplicity of this system.

If the matching and constraints could be made more reliable, it would be desirable to make a more complete set of snake features that would provide a complete cover of the word image, accounting for all the ink. Such a coding could be used as a complete representation of the word, much more compactly than the skeleton representation. Then, as with Edelman et al.'s system, recognition could be based on this representation alone.

Alternatively, character models could be developed from multiple snakes, giving matches for whole characters within a cursive string as those of Lanitis do for isolated capital letters. Hinton et al. (1992) also use spline models for entire characters. They model the ink of digit images as being generated by Gaussian sources distributed along a spline whose shape matches that of the character. They use probabilistic methods to define an energy measure which is minimized to adapt their models to the data. While the method is attractive, the authors admit that it is slow, and it has not proven to match other approaches. Such whole character models could also be adapted to multiple positions in a cursive word to find reliable character matches, either for preliminary lexicon reduction, as done by Cheriet and Suen (1993), or as an additional source of knowledge for any recognition system.

Chapter 7

Recognition methods

    ...in learning to read we were satisfied when we knew the letters of the alphabet, which are very few, in all their recurring sizes and combinations; not slighting them as unimportant whether they occupy a space large or small, but everywhere eager to make them out; and not thinking ourselves perfect in the art of reading until we recognize them wherever they are found.
        Plato, The Republic.

The next stage in the process of deducing word identities from handwriting is to recognize what is represented by the frames of data created in the previous chapter. A variety of pattern recognition methods is available, and many have been used for handwriting recognition by other authors. Here three techniques are presented which calculate an estimate of the probability of any given frame being part of the representation of a given letter. How these probabilities are combined together to find the most likely word is explained in the next chapter; this chapter simply describes how these probability estimates can be derived.

There are several established methods of estimating a sequence of probabilities from a sequence of data. The speech recognition community has been finding solutions to this problem for some time, and their solutions are applicable to the problem of handwriting recognition. From the literature, three main methods emerge. Hidden Markov models have become the most widely used approach to modelling speech (e.g. Woodland et al. 1994). Feed-forward neural networks have been used by several authors, including Bourlard and Morgan (1993), and recurrent neural networks have also been successful in this field (Robinson 1994).

Other authors have used these approaches for on-line recognition, estimating probabilities for short sections of the input data. Among these are the hidden Markov models of Bellegarda et al. (1994), Nag et al. (1986) and Starner et al. (1994). The latter have obtained good results simply using a speech recognition system with handwritten data. Time-delay neural networks (TDNNs), a form of feed-forward network, are used by Schenkel et al. (1994) and Manke and Bodenhausen (1994).

These methods are also applicable to off-line handwriting, though there is no longer a readily apparent time-ordering of information. Instead the x-axis is divided up to give successive frames, processed left-to-right in the same way as the scanning processes of reading. Caesar et al. (1993b) and Gilloux et al. (1993) use hidden Markov models for off-line recognition, though the latter use a sparse x-ordered series of large-scale features, unlike the representation with many parallel features per frame that is used here. Breuel (1994) uses a feed-forward network for classifying off-line handprinted strings.

In this work, all three of these methods have been investigated as methods of estimating the data likelihoods P(x_0^T | i) which are used to find word likelihoods in the next chapter. The remainder of this chapter describes each model, though intensive study was not made of TDNNs because they did not perform as well as the recurrent networks in early trials.

7.1 Recurrent networks

This section describes the recurrent error propagation network which has been used as one of the probability distribution estimators for the handwriting recognition system. Recurrent networks have been successfully applied to speech recognition (Robinson 1994), but have not previously been used for handwriting recognition, on-line or off-line. Here the time axis is replaced by the horizontal displacement through the word, frames representing not a speech signal over time, but successive vertical strips from a word, working left to right. A recurrent network is well suited to the recognition of patterns occurring in a time-series because the same processing is performed on each section of the input stream. Thus a letter 'a' can be recognized by the same process, wherever it occurs in a word. In addition, internal 'state' units are available to encode multi-frame context information, so letters spread over several frames can be recognized.

Recurrent networks are a type of connectionist (often termed 'neural') network; that is to say, they are composed of a large number of simple processing units with many interconnecting links. Each unit merely outputs a function of the weighted sum of its inputs, but the usefulness of such networks resides in the existence of training algorithms which can, by repeated presentation of training examples, adjust the weights to converge towards a desired function approximation. In this case the network is taught to recognize letters and the functions to be approximated are letter probability distributions P(i | x_0^t).

The recurrent network architecture used here is a single layer of standard perceptrons with nonlinear activation functions, as described by Rumelhart et al. (1986). The output o_i of a unit is a function of the inputs a_j and the network parameters, which are the weights of the links w_{ij} with a bias b_i:

    o_i = f_i(\{z_j\}),                                                 (7.1)

    z_i = b_i + \sum_j a_j w_{ij}.                                      (7.2)

Figure 7.1: A schematic of the recurrent error propagation network (input frames, input/output units, feedback units with a unit time delay, and character probability outputs). For clarity only a few of the units and links are shown.

The network is fully connected, that is, each input is connected to every output. However, some of the input units receive no external input and are connected one-to-one to corresponding output units through a unit time-delay (figure 7.1). The remaining input units accept a single frame of parametrized input and the remaining 26 output units estimate letter probabilities for the 26 character classes. The feedback units have a standard sigmoid activation function f(x_i) = (1 + e^{-x_i})^{-1}, but the character outputs have a 'softmax' activation function f_i({x_j}) = e^{x_i} / Σ_j e^{x_j} (section 7.1.1).

During recognition ('forward propagation'), the first frame is presented at the input and the feedback units are initialized to activations of 0.5. The outputs are calculated from equations 7.1 and 7.2 and the output letter probabilities are read off from the outputs. In the next iteration, the outputs of the feedback units are copied to the feedback inputs, and the next frame presented to the inputs. Outputs are again calculated, and the cycle is repeated for each frame of input, with a probability distribution being generated for each frame.

It can be shown (Bourlard and Morgan 1993: p.118) that when the global minimum of the network is reached, assuming that the network has enough parameters and the training scheme can find the global minimum, the network outputs will approximate the posterior probabilities P(i | x_0^t). It will be seen later (chapter 8) how these probabilities can be combined to obtain word likelihood estimates in a Markov model framework.
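The forward pass just described can be summarized in a few lines. The following is a minimal sketch, not the thesis implementation: the weight matrix W, the bias vector b and the frame vectors are placeholders, and the split of the single layer into 26 softmax character outputs followed by sigmoid feedback units is the only structure assumed.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def forward(frames, W, b, n_fb=64, n_out=26):
        """One forward pass of a recurrent error-propagation network.

        frames : sequence of input vectors, one per vertical strip of the word
        W, b   : weights and biases of the single layer; the first n_out rows
                 produce the character outputs, the remainder the feedback units.
        Returns one 26-way probability distribution per frame.
        """
        feedback = np.full(n_fb, 0.5)          # feedback units start at 0.5
        distributions = []
        for x in frames:
            a = np.concatenate([x, feedback])  # external inputs plus fed-back state
            z = b + W @ a                      # equation 7.1 for every unit
            outputs = softmax(z[:n_out])       # softmax character outputs
            feedback = sigmoid(z[n_out:])      # sigmoid feedback units, copied back
            distributions.append(outputs)      # estimates of P(i | x_0^t)
        return distributions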

This framework makes use of the data likelihoods P(x_t | i), which can be approximated by assuming that the current character class is conditionally independent of the previous frames, given the current frame (i.e. that P(i | x_t) ≈ P(i | x_0^t), which is a standard assumption made by researchers using hidden Markov models to model handwriting). Then the following equation can be used (Bourlard and Morgan 1993):

    P(x_t | i) ∝ P(i | x_t) / P(i).    (7.3)

The assumptions used in making this approximation are explained further in the next chapter.

To allow the network to assimilate context information, several frames of data are passed through the network before the probabilities for the first frame are read off, previous output probabilities being discarded. This input/output latency is maintained throughout the input sequence, with extra, empty frames of inputs being presented at the end to give probability distributions for the last frames of true inputs. A latency of two frames has been found to be most satisfactory in experiments to date. A longer latency, to incorporate whole letters in the context, would be ideal, but learning long-term dependencies in recurrent networks is not easy (Bengio et al. 1994b) because of the number of layers through which errors must be propagated, and a compromise is used.

7.1.1 Training

Figure 7.2: A network 'unfolded' for training after forward propagation on four frames of data. An input/output latency (section 7.1) of one frame is shown, so the first outputs are discarded and the last frame input is all zeros. The feedback units are initialized to 0.5 as described in section 7.1.4.

Training the network requires 'unfolding' it in time. During training on a word, the frames of data are input and propagated forward, as for recognition, but the inputs, outputs and feedback activations for each frame are stored.

At the end of a word, errors in the network's output are propagated back using the generalized delta rule (Rumelhart et al. 1986), and changes to the network weights are calculated. The network at successive time steps is treated as adjacent layers of a multi-layer network (figure 7.2). This process is generally known as 'back-propagation through time'. After processing (τ+1) frames of data with an input/output latency, the network is equivalent to a (τ+1+latency) layer network. Readers are referred to Rumelhart et al. (1986) and Robinson (1994) for a detailed description of the basic training procedure.

It is widely recognized that this back-propagation algorithm can be improved in a variety of ways, to speed convergence and to make convergence to a good local minimum more likely. In addition to the incorporation of a momentum term in the weight update formulae, two such improvements have been used in this work, namely Jacobs' delta-bar-delta update rule (Jacobs 1988) and Bridle's (1990) softmax. The former provides for individual learning rates for each weight which adapt according to the signs of successive weight changes. The latter provides a different transfer function on the output units of the network, ensuring that the outputs are between 0 and 1 and sum to 1 (as is desirable since they are treated as probabilities). This also trains the network according to a relative entropy (between the output and target probability distributions) error criterion instead of the least-squares error measure more commonly used in back-propagation networks.

Because of difficulties in training stability, modifications to the delta-bar-delta rule suggested by Robinson and Fallside (1991) were incorporated and gave much improved convergence. These changes use multiplicative learning rate changes and prevent the learning rates from deviating too far from the mean. For this work an additional measure was taken, of zeroing momentum terms when the mean output/target relative entropy over the training set increased.

Training times for neural networks can be very long. In this instance training takes several days on a fast computer (more than 3 days of CPU time for an 80-unit network). In addition to the methods described above, a number of other ways to improve training speed have been explored. The most significant is to choose an efficient training schedule. This specifies how many patterns should be presented to the network before each weight update. Initially the weight updates from different patterns will tend to be in roughly the same direction, as the network moves to an appropriate region in weight space. Later the updates from different patterns will be in different directions, and the updates need to be smoothed to find the best displacement for the whole training set. Thus, at the start of training, weights can be updated on a per-pattern basis ('on-line' or 'stochastic' training), but for fine-tuning near the end of training, weight updates should be averaged over a larger set of data.

In this application, a number of simple schedules have been tested, with the best being to start by updating on a small number of words, typically a batch of four words or about 80 frames. Then, whenever the mean relative entropy increases, the batch size is doubled, with a corresponding cut in the step size parameter. This continues up to a limit of 1024 words per batch (roughly a third of the training set).
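A sketch of this batch-growing schedule is given below. It is illustrative only: train_batch and mean_relative_entropy stand in for the actual weight-update and evaluation routines, and the particular starting values are assumptions rather than the settings used in the experiments.

    def train_with_growing_batches(words, train_batch, mean_relative_entropy,
                                   epochs=100, batch_words=4, step_size=0.1,
                                   max_batch=1024):
        """Double the batch size (and cut the step size) whenever the mean
        output/target relative entropy over the training set rises."""
        previous_entropy = float('inf')
        for epoch in range(epochs):
            for start in range(0, len(words), batch_words):
                train_batch(words[start:start + batch_words], step_size)
            entropy = mean_relative_entropy(words)
            if entropy > previous_entropy and batch_words < max_batch:
                batch_words *= 2      # smooth updates over more patterns
                step_size *= 0.5      # corresponding cut in the step size
            previous_entropy = entropy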

The momentum factor also controls this smoothing, but no schedule based on changing this parameter was found to be as good. This is, however, the method preferred by Robinson (1994), who increases the momentum parameter (the degree of smoothing) over time. Bourlard and Morgan (1993) also prefer on-line training. The choice is perhaps largely to do with the size of the training set. Although the handwriting database was large (56,000 frames), it was feasible to calculate a weight update based on a third of the training set, which is impossible for the much larger speech databases. The presentation of all the training examples to the network is called an epoch. The number of weight updates per epoch decreases to three during training.

The Quickprop weight update scheme (Fahlman 1988) was also tried. This approximates the error surface as a quadratic, with diagonal covariance, and uses quadratic interpolation to predict the minimum in each dimension. This is effective for small data-set problems, where weight updates are always based on the whole data set, so a good estimate of the true error surface can be obtained. The method did not perform well with the on-line training used here, as the shape of the error surface is different for each batch of data.

7.1.2 Network targets

For training, a target value must be given, against which the network output can be compared in order to compute the error in the outputs and the weight updates. The target value is given in the form of a label for each frame of the training data, indicating the correct class, the class for which the network output should be one, all others being zero. With the data collected here it is a relatively simple matter to associate the word label with each word image (section 4.2). However, the labelling of individual frames with the corresponding class is not as easy, and some thought must be given to this problem. Unlike the segmentation problem of most handwriting systems (section 2.3.2), this is not the problem of determining where the test word image must be split to separate its component letters, but that of assigning a letter label to each of the frames of a training word. This is only for training purposes, and need not be carried out on test words. In new data, this frame/letter correspondence is not trivially determined; it can only be truly carried out by accurate recognition, a catch-22 situation. For some problems, such as speech recognition, people have resorted to hand-labelling data to give an initial training set. This has been avoided here by using a 'bootstrap' scheme which derives an approximate segmentation from a very naive technique. This segmentation is good enough to train the network to a point where its own segmentations are more accurate. Hand segmentation would be more accurate still, so might give improved results, but would require a large amount of tedious work, for little or no gain.

The scheme used initially is an 'equal length' scheme, where each letter in any word is assumed (though this is clearly inaccurate) to occupy the same number of frames of input. Thus, in an n-letter word which takes τ+1 frames, the first (τ+1)/n frames are labelled with the first letter of the word. In 'noun', for example, one quarter of the frames are assumed to belong to each letter. This can be made slightly more accurate by recognizing that wide letters such as 'w' and 'm' are longer than other letters and narrow letters such as 'i' and 'l' are shorter. Letters in these classes are given relative lengths of 3 and 1 respectively, compared to 2 for other letters. The frames are then labelled in proportion to the relative lengths of the letters in the word. Thus, in a three-letter word consisting of 'w', 'i' and one ordinary letter, the first half of the frames would be considered to represent the 'w', the next sixth the 'i' and the remaining third the final letter. It is this segmentation that gives the targets which the recurrent network is trained to reproduce. The targets are set to one for the correct class and zero for all other classes.
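The proportional labelling rule can be written down directly. The sketch below assumes the wide/narrow letter sets given above (which are themselves partly illustrative) and simply distributes the frames of a word in proportion to the relative widths.

    def bootstrap_labels(word, n_frames):
        """Label frames in proportion to assumed relative letter widths
        (3 for wide letters, 1 for narrow letters, 2 otherwise)."""
        widths = [3 if c in 'wm' else 1 if c in 'il' else 2 for c in word]
        total = sum(widths)
        labels, boundary = [], 0.0
        for letter, width in zip(word, widths):
            boundary += width * n_frames / total
            while len(labels) < round(boundary):
                labels.append(letter)
        return labels                       # one letter label per frame

    print(bootstrap_labels('noun', 20))     # five frames per letter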

These targets are only used for preliminary training. Re-estimated targets are used to achieve greater performance. The re-estimation process will be described in chapter 8.

7.1.3 Generalization

A problem with network training is to obtain the optimum solution to the trade-off between training and generalization. This well-known problem can perhaps best be seen by considering the problem of curve-fitting to n data points. An (n-1)th order polynomial can be found to perfectly interpolate any such set, but if there is any noise in the data, the values on the curve between will correspond badly to the values of any subsequently observed data-points. The curve is over-fitted, and generalization is poor. Similarly, in training a recurrent network, given enough time and computing power it should be possible to train a large enough network to match the desired targets arbitrarily closely. However, such a network will give poor generalization and make poor predictions for inputs other than those included in the training set.

One way of maintaining good generalization is to make sure that the network size is right for the size of the problem. In this case the number of parameters is kept down and the order of the model is chosen to be appropriate to the task to be solved (e.g. fitting a straight line to the n data points when a linear effect is being modelled). For complex problems the size of the network for optimum generalization is difficult to determine, though individual authors have found rules-of-thumb relating the number of training examples to the number of free parameters to be trained (Bourlard and Morgan 1993: p.234). In practice, for a specific problem, trial-and-error is often used. Methods whereby the network is grown or pruned to the right size have also been developed.

An alternative is to use a network known to be at least large enough for the problem, but to prevent over-training within that network. Possible techniques include weight decay and adding noise to weights, but the method used here is early-stopping, which can be implemented without changing the training procedure and has the advantage of limiting training according to the same performance criterion (word error rate) as will ultimately be used for testing the network. If a network is trained on a data set, it is found that, during training, the error rate when tested on an independent validation set will fall as a solution is learnt, and then begin to rise as generalization is impaired by over-training. If training is stopped at the minimum of the validation error, optimum recognition on an independent test set will be obtained. This method has been widely used in the neural-network community, and is particularly appropriate for large data-set tasks. Bourlard and Morgan (1993) have used a similar method for large-vocabulary speech recognition.

To determine the best time to stop training, the training set is partitioned into separate training and validation sets. After training the network for a short time, the network's performance is tested on the validation set. This train-and-validate cycle is repeated every epoch until the error rate on the validation set starts to increase, indicating that the network is starting to become over-trained. The stopping criterion is a heuristic based on the observation of validation word error rate over time. The criterion used here is to stop when the validation error rate has been above the minimum observed during training for more than twelve epochs, or the same without a decrease in the mean relative entropy. After finishing training, the network with the lowest error rate is reloaded, and tested on the test set, which consists of data not previously presented to the network.
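A minimal sketch of this early-stopping loop is given below. The four callbacks (train_one_epoch, validation_error, save, reload_best) are placeholders for the actual routines; only the patience-based criterion described above is illustrated, and the additional relative-entropy condition is omitted for brevity.

    def train_with_early_stopping(train_one_epoch, validation_error,
                                  save, reload_best, patience=12):
        """Stop when the validation error has not improved for `patience`
        epochs, then reload the best network seen so far."""
        best_error, epochs_since_best, epoch = float('inf'), 0, 0
        while epochs_since_best <= patience:
            train_one_epoch()
            error = validation_error()
            if error < best_error:
                best_error, epochs_since_best = error, 0
                save(epoch)               # remember the best network so far
            else:
                epochs_since_best += 1
            epoch += 1
        reload_best()                     # network with the lowest validation error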

Table 7.1 and figure 7.3 show the error rates for networks with different numbers of feedback units. Results are quoted before and after re-training with re-estimated targets, a process explained in section 8.3.

Table 7.1: Error rates for networks with different numbers of feedback units.

    Number of   Error rate (%)                 Epochs   Time per
    units       Fixed targets   Retraining              epoch (s)
    0           49.0            40.9           171      1230
    2           41.1            34.0           75       160
    4           29.3            26.2           141      1250
    10          23.1            22.3           133      1280
    20          21.6            19.1           181      1270
    40          21.4            16.3           132      1450
    80          16.9            15.6           115      2100
    160         14.8            12.2           -        4900
    320         13.5            9.6            116      14000

Performance can be seen to improve steadily as the number of units increases.

Figure 7.3: Test error rates against number of feedback units, showing error bars (one standard deviation). The lower curve shows the error after retraining with the Baum-Welch re-alignment.

Figure 7.4: Approximate average training time per epoch against number of network weights (log-log scale).

It can be seen that early stopping ensures that generalization does not suffer when the network size is increased. In fact the increased capacity of more feedback units allows the network to perform better. Because of the increased training time associated with larger networks, no network above 320 feedback units has been trained, though it is likely that the recognition rate would be still higher. The time estimates are seen to come from a constant term (because of overheads and of cross-validation testing) plus a term proportional to the number of weights (proportional to the square of the number of feedback units), which becomes significant only with 40 or more feedback units (figure 7.4).

It will be seen from the high values for the standard errors of the mean error rates quoted that the final solutions obtained are dependent on the initial conditions (the random weights given to the network prior to training). It can easily be seen that there are many global minima (any permutation of the feedback units gives an identical solution) and it is not surprising that a different solution is found each time, the local minima found in weight space corresponding to networks giving different performances. This is a problem that might be solved with more data or by better training, for instance by finding a better training schedule.

In summary, while a satisfactory method of training has been found, which reaches good solutions, there is scope for speed improvement. This scope exists both in finding better training schedules within the space of solutions tried already, and in trying more complex update techniques. The ensemble of training methods currently used resembles those arrived at by Bourlard and Morgan (1993) and Robinson (1994), but differs in a number of details.

Figure 7.5: Validation error rate against number of training epochs for five networks under the same conditions, but different initial weights.

Figure 7.6: Percentage recognition error rate versus number of training epochs for networks with different numbers of feedback units.

Figure 7.7: Average relative entropy of the training set outputs and targets against number of training epochs (80, 160 and 320 unit networks).

7.1.4 Understanding the network

One of the great problems with neural networks in general, and recurrent networks in particular, has been the lack of understanding of how the networks operate. It is not always well understood to which problems they are best suited, or how best to use them on problems to which they are appropriate. Neural networks have been studied in greater depth in recent years, though the high dimensionality of interesting problems makes analysis difficult. While 'gradient descent on the error surface' is often talked about, it is only for a trivial neural network with two weights that this surface can be plotted, and for higher dimensions it becomes difficult to calculate, let alone visualize. Recurrent networks are harder still to understand, since the dimensionality is much higher: outputs are dependent on the inputs, not only of the current frame (and for the handwriting recognition networks discussed here, there are about 80 inputs), but also of all the preceding frames. Robinson (1989) and Pearlmutter (1990) have previously studied the operation of recurrent networks under certain conditions. In order to discover how the recurrent network is operating in this task, a graphical interface to the network has been constructed, enabling inputs, activations and weights to be examined. The remainder of this section discusses some of the understanding that has been reached as to the internal representation of data in the network.

A first experiment to demonstrate the network's operation is to pass a single word through the net and to observe the outputs. Figure 7.8 shows an example of the word 'fortunate' being presented to the network. The horizontal traces show the activations of the output units against time. Since the outputs of the network are constrained by the softmax function to sum to one, most of the outputs are seen to be always close to zero, with only one or two rising to a significant value at any time. The activities during the first two frames (before the first vertical line) are always ignored in the training and testing of the network because of the input/output latency. Subsequent frames see the probabilities for 'f', 'o', 'r' and so on increasing, with a small amount of activity in other letters. Note that the valley between the 'u' and 'n' is confused with a 'v', and that a later letter is partially confused with an 'l', but these confusions are eliminated by the duration modelling (discussed in section 8.2) and the requirement that the word should be in the lexicon. The vertical lines represent the letter boundaries of the forced alignment (section 8.3) from the Viterbi decoder.

Consider now the weights within the network. Initially networks were randomly initialized with weights of zero mean and small variance. However, after training, all the weights from any feedback unit to the same unit for the next time-step were found to be positive, with strong connections (for a typical network they have mean 2.6 and standard deviation 0.6).

Figure 7.8: The system recognizing the word 'fortunate'. The activations of the output units are plotted against the number of frames processed. Class boundaries found by Viterbi forced alignment are shown with the associated class labels (section 8.3).

Connections to other feedback units vary greatly, with a slightly negative mean (e.g. mean -0.4, standard deviation 1.2). This indicates that the network is learning the intuitive mechanism of having the feedback units preserve their state, except when influenced by inputs and other feedback units. Since the network solutions seem to favour this state-preservation, better solutions might be found more quickly by choosing an initial weight distribution which preserves state. This can be calculated as follows.

If the feedback units are assumed to have a mean activation a_j = 0.5 (corresponding to a weighted sum of inputs x_i = 0, since the sigmoid activation function, for which f(0) = 0.5, is used for the feedback units), then

    x_i = b_i + Σ_j a_j w_ij ≈ b_i + 0.5 w_ii

if the other weights have a zero mean. For steady-state conditions, x_i = 0, so b_i = -0.5 w_ii. Now, for an activation a_i = 0.5 + δa_i,

    x_i = b_i + a_i w_ii = δa_i w_ii.

Since a_i = f(x_i), for small δa_i:

    δa_i = f(δa_i w_ii) - 0.5 ≈ δa_i w_ii f'(0).

For the sigmoid, f'(0) = 0.25, so the state is stable when w_ii = 4, b_i = -2.

Priming the network connections to these values gives faster training and a greater recognition accuracy after training.
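The stability condition can be checked numerically. The short sketch below perturbs a single feedback unit around 0.5 with the primed values w_ii = 4 and b_i = -2 and all other inputs zero, and shows that small perturbations are approximately preserved from one step to the next.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    w_ii, b_i = 4.0, -2.0
    for delta in (0.05, 0.10, 0.20):
        a = 0.5 + delta                    # perturbed feedback activation
        a_next = sigmoid(b_i + a * w_ii)   # next-step activation, other inputs zero
        print(delta, round(a_next - 0.5, 4))
    # prints perturbations close to the originals for small delta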

The final values of these links are much higher (mean 4.6, standard deviation 0.5), revealing that priming the network weights puts the network into useful areas of weight space that were not explored while training un-primed networks. It also confirms the usefulness of feedback connections which preserve the feedback units' state.

Examining other connections within the network, it is seen that very few weights from input to output units are positive. This is to be expected, since a single frame of input is itself ambiguous and does not give a strong indication as to the character of the frame two time-steps previously (which direct links would indicate, since outputs refer to the frames input two time-steps previously). One notable exception to this is the letter 'q', which has strong links from the units representing lines in the lowest part of a word. This is because 'q' is written with a descender to the right of (delayed with respect to) the body of the letter. Figure 7.9a shows the links from one input unit in the lowest part of the word. All the letters with descenders to the right are activated by this input unit, while other outputs are inhibited. Because some information is transferred by the direct input-to-output connections, it has been found that a network with these connections performs better than one which does not have them.

In a recurrent network, the most important aspect to understand is the role of the feedback units. In this handwriting problem, they need to represent the features presented at the input during the last few time-steps so that a classification of the current frame can be made according to the context, since an individual frame is ambiguous. However, the way this information is encoded is not readily apparent. As was noted earlier, each unit has a strong feedback connection to itself to maintain the state over time. Otherwise, few links from the feedback units are found to be strongly positive.

If a network with very few units is examined, it is easier to understand the role of the feedback units. Figure 7.9b,c shows the connections from the only two feedback units in a small network to the outputs. It is noticeable that the connections reflect the frequencies of the letters in the training set. Very rare letters such as 'q' and 'z' have very strong negative connections. Because of their rarity, these letters generate very little error signal, so it is inappropriate for the scarce resources to be used modelling these letters. On the other hand, the letters 'e d l r s t' have positive connections from the feedback units since these are common. The two most common letters ('e' and 't') are modelled by both feedback units. Figure 7.10 shows the output probabilities for another word, which shows the effect of this. The letters 's' and 'e' are well-defined, though not as clearly as with the 80 unit network (figure 7.8). There are noticeable peaks in the output traces of these two letters, but the other letters show no marked deviation from zero. It can also be seen that, through the direct input-output connections, the descender is identified as belonging to one of two descender letters, though the network does not have the modelling capacity to distinguish the two. Despite the uncertainty of the network during most of the frames, the correct word is still chosen from the lexicon.

Figure 7.9: Connection strengths to the outputs (one value per letter, a to z) in a recurrent network. Circles are white for positive weights, black for negative. Larger magnitudes are represented by larger radii. (a) shows the connections from a descender input unit in a 60-unit network. (b) and (c) are the connections from the only two feedback units in a small network.

Figure 7.10: The two-unit network recognizing the word '…akin…'. No class boundaries are shown because the 2-unit network re-estimates are inaccurate.

The role of the feedback units can also be verified by examining their activations when presented with word data. The units are generally seen to have high activations when the relevant letters are present at the input, and low otherwise, though the correlation is far from perfect. In figure 7.10 the activation of feedback unit zero is high during the 's' and one other letter, though unit 1 does not go high during that letter as might be expected. The biases to the output units are found to reflect the variation in class frequencies, but this correlation is not as strong as suggested by the experience of Bourlard and Morgan (1993: p.127). Examining a network with four units, one of the feedback units is found to have negative connections to all the outputs except 'i', and to receive strong positive input from the input unit representing the dot feature. This representation allows the network to remember the presence of an i-dot during the latency period.

Another way of investigating the network's behaviour under controlled conditions is to feed a null input into the network. A data file where all frames are entirely zero is constructed, and presented to a trained network. The unforced output for a sample network with 60 feedback units is shown in figure 7.11. It can be seen that the output and feedback units go through several cycles before reaching a steady state with all the units in saturation.

Examining a network with but one hidden unit shows that the network dynamics are, understandably, simpler. The outputs are all monotonic, and reach a steady state after a few frames.

Figure 7.11: The network outputs for unforced inputs.

As networks with more and more feedback units are tested, the behaviour becomes more complex, until with a 160-unit network, no steady state is achieved after 130 frames. The network appears to be entering limit cycles, exhibiting dynamic behaviour with no active inputs.

7.2 Time-delay neural networks

Figure 7.12: A schematic of the time-delay neural network, showing a single hidden layer (input units, hidden units, letter outputs and the neural network links between them).

Time-delay neural networks (TDNNs) are a method of applying a simple forward-propagation neural network to a sequence of frames of data to arrive at a sequence of probability estimates. A TDNN is represented in figure 7.12. A layer of perceptrons, as used in the recurrent network, takes a small group of input frames (three in the diagram) and calculates the activations of a corresponding hidden frame with equations 7.1 and 7.2. The receptive field of the perceptrons is then shifted to the right, and another hidden frame calculated. This process can be repeated for all the frames. At the same time, a second layer of perceptron units takes a group of hidden frames and for each of these calculates an output probability distribution with softmax units, just as for the recurrent network. Thus, for each input frame a corresponding output distribution is calculated. Since the same perceptrons operate on each section of the input, the TDNN is good at position-invariant pattern recognition. It has a fixed window of context, which is the number of input frames on which each output depends. The length of this window (five frames in the diagram) is determined by the receptive fields of the perceptrons. This makes the TDNN good for recognition of patterns with limited context, when the extent of this context is known, but longer-term dependencies cannot be learnt.
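A minimal sketch of such a two-layer TDNN forward pass is given below. It is not the architecture evaluated in the thesis: the weight matrices, receptive field sizes and layer count are placeholders, chosen so that the receptive fields of three input frames and three hidden frames give the five-frame context window of the diagram.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def tdnn_forward(frames, W1, b1, W2, b2, field1=3, field2=3):
        """Slide the same two layers of perceptrons along the frame sequence:
        the first layer sees `field1` input frames, the second `field2` hidden
        frames, giving a fixed context window of field1 + field2 - 1 frames."""
        hidden = [sigmoid(b1 + W1 @ np.concatenate(frames[t:t + field1]))
                  for t in range(len(frames) - field1 + 1)]
        outputs = [softmax(b2 + W2 @ np.concatenate(hidden[t:t + field2]))
                   for t in range(len(hidden) - field2 + 1)]
        return outputs        # one 26-way distribution per valid position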

Because of the rigid hierarchy of the input and hidden units, dependencies of variable length are hard to learn. Each perceptron can only associate features which are a fixed distance apart. The recurrent network, on the other hand, stores all context in the hidden units, which are available at every time step. If the context is of variable length, the feedback units will vary slowly and the correlation between two features can be detected at an arbitrary delay.

It is believed to be for this reason that TDNNs did not perform well on this handwriting recognition task. They were also found to be unwieldy since the architecture of a TDNN is specified by a large number of parameters. The number of hidden layers must be specified, as well as the number of units in each and the size of each receptive field. A further parameter that can be controlled is the number of frames shifted between successive operations of each of the sets of perceptrons. Finding a good set of values for all these parameters requires a long search, whereas the recurrent network has a single such parameter, the number of feedback units (section 7.1.3). Because of this poor initial performance, TDNNs were not investigated further, and no results are presented for them here.

7.3 Discrete probability estimation

This section describes the third technique investigated for probability estimation. This involves computing a number of integer-valued indices from each frame and using these to look up probability values in pre-computed tables. When combined with the hidden Markov models (HMMs) described in the next chapter, the system is a conventional discrete HMM, since this is the usual method of calculating probabilities for a discrete HMM. By contrast, the recurrent network and HMM together would be termed a hybrid system.

The probabilities that must be estimated are the likelihoods P(x_t | i): the probability of a frame of data being generated, given the identity of the letter. Since the data are represented as about 80 features, each coded as a byte (256 possible values), to store the probability of each possible co-occurrence would require 256^80 x 26 probabilities to be stored and estimated. This is clearly computationally impractical and would require infeasible quantities of data to give estimates of the probabilities. Parametric distributions could be used, which calculate these probabilities as functions of a smaller number of parameters, but the numbers are still impractical, and the re-estimation more difficult. Two methods are used to simplify the estimation.

7.3.1 A simple system

First, since the units mostly record simply the presence or absence of a feature, even for the skeleton, where the coarse coding does give values between 0 and 1, the most important information is whether a line segment is present or not. The inputs are thus re-quantized to be binary-valued (or some other number of values much less than 256). Secondly, the features are assumed to be independent. Thus the probability of the co-occurrence of all the features in a frame is simply the product of the occurrence of the individual features:

    P(x_t | i) ≈ Π_j P((x_t)_j | i)    (7.4)

Now only 80 x 2 x 26 probabilities need to be stored or, since the pairs must sum to one, only 80 x 26.

The assumption of independence in the occurrence of features in the input is clearly inaccurate since, for example, the occurrence of a vertical stroke in one box is highly correlated with the occurrence of a vertical stroke in the neighbouring box. In practice, the assumption is far too strong, and the performance of the HMM system is much worse than that of the recurrent network (an error rate greater than 50%). The following section describes a system which obviates the independence assumption, and gives better recognition results.

7.3.2 Vector quantization

Vector quantization (VQ) is a method of characterizing each frame by a single number, or code c(x_t). The quantization process is designed so that similar frames are all coded as the same number. Then, instead of estimating the probability of all the features in a frame given the character class, it is only the probability of the code given the character class that must be estimated: P(x_t | i) ≈ P(c(x_t) | i). In the subsequent training, it is these codes that are the features, and it is the probability of a code being part of a given letter that must be estimated.
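The coding step itself is a simple nearest-neighbour search. The sketch below assumes a codebook already exists (its construction is the subject of the next paragraphs) and codes each frame by the index of the closest code vector under the squared Euclidean distance.

    import numpy as np

    def quantize(frames, codebook):
        """Code each frame as the index of the nearest code vector,
        c(x_t) = argmin_i ||c_i - x_t||^2."""
        codes = []
        for x in frames:
            distances = ((codebook - x) ** 2).sum(axis=1)
            codes.append(int(np.argmin(distances)))
        return codes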

In vector quantization, each frame is considered as a vector in a metric space with as many dimensions as there are elements in the frame. Quantization determines a codebook of code vectors c_i in this space. Each frame x_t is then coded according to the nearest code vector: c(x_t) = argmin_i ||c_i - x_t||^2.

Before being able to estimate the probabilities, the code vectors must be determined. To be representative, they must be well distributed in the space of vectors actually produced by the preprocessing system, and each should represent a typical group of vectors which can be considered to be similar. The groups of equivalent vectors are assumed to be those close to one another in the metric space, and the code vectors are determined by a clustering algorithm which finds these clusters in the training vectors. Each code vector is then the centroid of a cluster of training vectors. A number of algorithms exist for carrying out this clustering, and a number are reviewed by Gray (1984). The method used here is by Linde et al. (1980). It produces a set of coding vectors given a training set of vectors output by the preprocessor, the same training set which, when coded by the quantizer, is used to estimate the code probabilities which are stored in the tables. In brief, the algorithm works in the following manner:

1. Seed the quantizer with one classification vector: the centroid of the training set.
2. Split each classification vector to give two, perturbing each slightly. This has the effect of dividing the original cluster with a hyperplane perpendicularly bisecting the line joining the two new centres. If the perturbation is sufficiently small, the other class allocations will be unaffected. Perturbation along the line joining the centroid to the origin was found to work just as quickly as perturbation along the axis with the greatest in-cluster variance.
3. Classify each of the training vectors by assigning it to the nearest classification vector.
4. Move each classification vector to the centroid of the training vectors which were nearest to it.
5. Go to step 3 unless the classifications are the same as in the last iteration.
6. Go to step 2 until the desired number of classification vectors is obtained.

For step 3, a distance metric must be specified. As a first approximation the Euclidean distance was used. This is reasonable since all the inputs are constrained to fall in the same [0, 1] interval. This distance will reduce to the Hamming distance when all the vectors are binary valued. An alternative which has also been tested is the Mahalanobis distance (already seen in chapter 6), where the distance between two points x and y is given by:

    ||x - y||^2 = (x - y)^T Σ^{-1} (x - y),    (7.5)

where Σ is the covariance matrix of the training vectors.

The Mahalanobis distance is derived from the assumption that the distribution of vectors is elliptically Gaussian, which is clearly not true here. Nevertheless, it allows correlations between vector elements to be taken into account when finding the distance between two vectors. A better metric, based on knowledge of the origin of the data and the fact that the data are largely binary-valued, could probably be found. This would model the correlations between features better and result in more representative clusters. Better results might be obtained from quantizing with such clusters. However, hand-crafting a metric would be a complex procedure, and the Mahalanobis metric is the most complex metric investigated here.
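The following is a compact sketch of the splitting and re-clustering procedure listed above, using the Euclidean distance. It is illustrative rather than the implementation used here: the perturbation scheme and convergence test are simplified, and empty clusters are simply discarded as described in the text.

    import numpy as np

    def lbg_codebook(vectors, n_codes, epsilon=1e-3):
        """Linde-Buzo-Gray style codebook: repeatedly split every centroid
        and re-cluster until at least n_codes code vectors exist."""
        vectors = np.asarray(vectors, dtype=float)
        codebook = vectors.mean(axis=0, keepdims=True)   # step 1: global centroid
        while len(codebook) < n_codes:
            # step 2: split, perturbing along the line joining centroid and origin
            codebook = np.vstack([codebook * (1 + epsilon),
                                  codebook * (1 - epsilon)])
            while True:
                # step 3: assign each training vector to its nearest centroid
                d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
                nearest = d.argmin(axis=1)
                # step 4: move centroids to cluster means, dropping empty clusters
                new = np.array([vectors[nearest == k].mean(axis=0)
                                for k in range(len(codebook))
                                if (nearest == k).any()])
                # step 5: stop once the centroids (and hence assignments) are stable
                if new.shape == codebook.shape and np.allclose(new, codebook):
                    break
                codebook = new
        return codebook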

A further issue in designing a VQ-HMM system is the optimum number of clusters to choose. This involves striking a balance between an over-trained system which does not generalize well and one which has a low discriminative power. Results are given for a variety of numbers of clusters and the optimum value chosen.

7.3.3 Training

Discrete probability estimation requires the tables of probabilities to be filled with the estimate of P(c_i | j) for each of the codes c_i and letters j. After vector quantizing the corpus and labelling each frame with the automatic segmentation procedure, the number of times code c_i is part of letter j is counted over the whole training corpus. Dividing by the number of frames representing j gives an estimate of the emission probabilities P(c_i | j). By re-aligning with the Baum-Welch procedure of chapter 8, the probabilities can be re-estimated and the recognition rate improved slightly. For this HMM framework, the Baum-Welch procedure is very fast, since the maximization step of the Expectation-Maximization algorithm, of which this is an example, consists only of taking the frequency counts rather than doing gradient descent as with the recurrent networks, a notoriously time-consuming problem.
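A sketch of the counting estimator just described is given below. It assumes one code and one letter label per training frame, and performs no smoothing of zero counts (which a practical system would probably need).

    from collections import Counter

    def estimate_emissions(codes, labels, n_codes,
                           letters='abcdefghijklmnopqrstuvwxyz'):
        """Estimate P(code | letter) by counting co-occurrences over the
        labelled training frames."""
        counts = {letter: Counter() for letter in letters}
        for code, letter in zip(codes, labels):
            counts[letter][code] += 1
        emissions = {}
        for letter in letters:
            total = sum(counts[letter].values())
            emissions[letter] = [counts[letter][c] / total if total
                                 else 1.0 / n_codes for c in range(n_codes)]
        return emissions     # emissions[letter][code] estimates P(code | letter)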

Recognition rates for the HMM system with Euclidean and Mahalanobis distances are shown in tables 7.2, 7.3 and 7.4. The numbers of clusters are powers of two in the first table, since at each iteration of the splitting algorithm the number of clusters is doubled. In the other tables, the number of clusters is lower because during the splitting some clusters have been found to be empty and the corresponding centroids discarded.

Table 7.2: Error rates for the hidden Markov model system with Euclidean distance vector quantization.

    Clusters   Error rate (%)
    256        24.1
    512        20.6
    1024       22.9
    2048       28.1
    4096       93.5

Table 7.3: Error rates for the hidden Markov model system using diagonal-covariance Mahalanobis distance vector quantization.

    Clusters   Error rate (%)
    256        25.9
    509        21.0
    1006       22.0
    1979       23.9
    3796       40.6

Table 7.4: Error rates for the hidden Markov model system using Mahalanobis distance vector quantization.

    Clusters   Error rate (%)
    254        26.6
    505        24.5
    1001       22.4

From tables 7.2 and 7.3 it can be seen that increasing the number of clusters up to 512 increases the discriminative performance of the system, so the error rate falls. Beyond this, the generalization fails and performance falls off rapidly. By 4000 clusters the system fails completely. The diagonal Mahalanobis distance method gives slightly, but not significantly, worse results, and the full-covariance Mahalanobis distance gives worse results again. The full-covariance matrix codebook is prohibitively expensive, computationally, to work out for larger numbers of centroids. The lack of improvement is due to the unusual distribution of the inputs, which are nearly always zero, and often one. The Mahalanobis distance is intended for modelling distributions which are Gaussian distributed, an assumption not true here.

7.3.4 Discussion

The best of these discrete probability estimators has 512 x 26 parameters, the same as a recurrent network with 64 feedback units. A network with 60 feedback units achieves a 14.5% error rate. It can thus be seen that the pure HMM system does not perform as well as the hybrid recurrent network/HMM system. While this shows that the recurrent network is a more practical solution to the problem of modelling the graphic data, it does not argue absolutely against the use of hidden Markov models. While much of the work of this thesis is equally applicable to both systems, more time has been spent perfecting the recurrent network system than investigating improvements in the pure HMM approach. It is undoubtedly true that with further investigation the HMM system could be improved. There is a set of standard techniques that could be taken from speech HMMs and applied to this system, which could reasonably be expected to give better performance. These include giving different states within a letter separate probability distributions, and producing context-dependent models which would be able to model the coarticulation between adjacent letters, most particularly the ligatures, which vary with different contexts. However, similar methods might also be applied to the hybrid system.

7.4 Summary

This chapter has presented three methods of probability estimation which can be used for the problem of off-line handwriting recognition, and has discussed some of the issues involved in using them. The training of the models has also been discussed and recognition results presented. The recurrent networks were found to perform better than both the discrete hidden Markov model and the time-delay neural network. Training the recurrent networks is very time-consuming, but a number of methods have been used which reduce the training time, including weight initialization, Jacobs' weight-update scheme, and a training schedule which changes the size of weight-update batches during training.

The next chapter completes the description of the system by explaining how the probability estimates are used for word recognition.

Chapter 8

Hidden Markov modelling

    The reading is right which requires so many words to prove it wrong.
        Samuel Johnson.

The previous chapter described methods of modelling the graphical data of a handwritten word. Each method gave an estimate of the likelihood P(x_t | i) for each frame of input x_t and for each character class i (of 26). This chapter deals with the process of deriving the best word choice from a sequence of these frame probability distributions by the use of hidden Markov models. The methods described here apply equally to the pure discrete HMM and to the recurrent network hybrid system, but tests are described for the hybrid system since it was found to be more effective. For the time being, the system is assumed to have a known vocabulary and it is assumed that any word presented to it will be in that vocabulary.

8.1 A basic hidden Markov model

Because the data are noisy or ambiguous, the output of the whole system should be a probability distribution across the words in the lexicon, being the probability for any word that it was the one originally written. Normally the probability should be close to one for one word, and close to zero for the others, but where there is ambiguity, error or poor data, the distribution might be more uniform. For instance, for the ambiguous word of figure 3.1b, high, roughly equal probabilities would be expected for the three words 'clump', 'jump' and 'dump', with a lower probability for 'lump' and small probabilities for other words. The probability distribution to be determined is P(W | x_0^τ) across all words W in the lexicon L, given the input data x_0^τ.

The individual frame probabilities are combined to produce word probabilities using a hidden Markov model (HMM). A good tutorial article on HMMs is that of Rabiner and Juang (1986). A separate HMM is created for each word in the known lexicon, with one state representing each letter. Figure 8.1 shows a model for the word 'one'. If there are N states, the set of states is Q = {q_r : r = 0, ..., N-1}, corresponding to the letters of the word.

Figure 8.1: A simple Markov model for the word 'one' with one state per letter.

Each state q_r is labelled with the letter L(q_r) that it represents, and the Markov model represents a process by which the writing could have been generated. Each circle in the diagram represents a state of the model. At time t = 0 the model is in state q_0, corresponding to the beginning of the word. At each time step t = 0, ..., τ, a state transition is made, following one of the arrows in the diagram. This means that either the next state is entered, or a self-transition is made and the state at the subsequent time step is the same as the current state. The state at time t is written S_t. In general a hidden Markov model can allow transitions between any pair of states, but in handwriting, the order of the letters is known and no letters can be missed out, so the model is made more restrictive. To use the model, transition probabilities are assigned to each of the permitted transitions and are assumed to be independent of the time: a_{p,r} = P(S_{t+1} = q_r | S_t = q_p); a_{p,r} = 0 except when r = p or r = p+1. For the model to be a true Markov model, all the transition probabilities are dependent solely on the current state. By this process, a state sequence S = (S_0, ..., S_τ) is arrived at, which records the state at each time step. A typical state sequence might be S = (q_0, q_0, q_0, q_0, q_1, q_1, q_1, q_2, q_2, q_2, q_2, q_2), corresponding to the letter sequence L(S) = (o, o, o, o, n, n, n, e, e, e, e, e). The model is a hidden Markov model because S is not directly observable, only inferred. It is only the frames of data that are observed.

In the generation process which is to be modelled, the system produces a frame of graphic data x_t at each time step. The data are part of the graphic representation of the letter signified by the current state. The data are assumed to occur according to a probability distribution P(x_t | L(S_t)), which is estimated by the recognition system of chapter 7. With this information an expression can be derived for the probability of a word, given a particular observation sequence x_0^τ.

The posterior probability of a word W can be rewritten using Bayes' rule:

    P(W | x_0^τ) = P(x_0^τ | W) P(W) / P(x_0^τ),    (8.1)

where P(W) is the prior probability of the word occurring, which is discussed in section 8.4.2. The probability P(x_0^τ) of the data occurring is unknown, but assuming that the word is in the lexicon L, the probabilities must sum to one,

    Σ_{W∈L} P(W | x_0^τ) = 1,    (8.2)

and can be normalized:

    P(W | x_0^τ) = P(x_0^τ | W) P(W) / Σ_{W'∈L} P(x_0^τ | W') P(W').    (8.3)

There are many state sequences representing any given word. Writing

    S(W) = {S, such that S represents W},    (8.4)

then

    P(x_0^τ | W) = Σ_{S∈S(W)} P(x_0^τ | S) P(S),    (8.5)

where the state sequence probability P(S) is the product of the initial distribution, π_r = P(S_0 = q_r), and the subsequent transition probabilities:

    P(S) = π_{S_0} Π_{t=0}^{τ-1} a_{S_t, S_{t+1}}.    (8.6)

Here π_r = 0 for all states except the first (π_0 = 1), so the model is constrained to start with the first letter. Now, by Bayes' rule,

    P(x_0^τ | S) = P(x_0 | S) P(x_1 | S, x_0) ...    (8.7)
                 = Π_{t=0}^{τ} P(x_t | S, x_0^{t-1}).    (8.8)

If it is assumed that the emission probability is dependent solely on the class that the current state represents, this reduces to:

    P(x_0^τ | S) = Π_{t=0}^{τ} P(x_t | L(S_t)),    (8.9)

which involves the terms P(x_t | L(S_t)) stored in the tables of chapter 7. A weaker assumption is that the emission probability is conditionally independent of preceding or following states, given the current state:

    P(x_0^τ | S) = Π_{t=0}^{τ} P(x_t | S_t, x_0^{t-1})    (8.10)
                 = Π_{t=0}^{τ} P(x_t | L(S_t), x_0^{t-1}),    (8.11)

where, by further applications of Bayes' rule, it can be seen that:

    P(x_t | L(S_t), x_0^{t-1}) = P(L(S_t) | x_0^t) P(x_0^t) / P(L(S_t), x_0^{t-1}).    (8.12)

Now P(L(S_t) | x_0^t) is exactly the posterior probability estimated by the recurrent network. P(x_0^t) is the probability of the first few frames of data, which is the same for all words. P(L(S_t), x_0^{t-1}) is assumed to be proportional to P(L(S_t)), the prior probability of a frame belonging to the class L(S_t).

This assumption is clearly incorrect, but is found to work in practice. This probability can be estimated by counting the number of frames in each class according to the labels of the training set.

Thus there are two expressions for the likelihood L(W | x_0^τ) of a word, which can be normalized to give word probabilities:

    P(W | x_0^τ) = L(W | x_0^τ) / Σ_{W'} L(W' | x_0^τ)    (8.13)

    L(W | x_0^τ) ≈ P(W) Σ_{S∈S(W)} ( Π_{t=0}^{τ} P(x_t | L(S_t)) ) ( π_{S_0} Π_{t=0}^{τ-1} a_{S_t,S_{t+1}} )    (8.14)

    L(W | x_0^τ) ≈ P(W) Σ_{S∈S(W)} ( Π_{t=0}^{τ} P(L(S_t) | x_0^t) / P(L(S_t)) ) ( π_{S_0} Π_{t=0}^{τ-1} a_{S_t,S_{t+1}} ).    (8.15)

Equation 8.14 is used for the table look-up system and equation 8.15 is used for the recurrent network. For simplicity, the likelihoods P(x_t | L(S_t)) are used henceforth, but the scaled likelihoods P(L(S_t) | x_0^t) / P(L(S_t)) are to be understood when the equations are applied to the recurrent network system.

These expressions can be calculated efficiently using the principle of Dynamic Programming, in an array structure representing the states of the Markov model. In this model, each state is accorded a probability α_r(t), which is the probability of being in state r after t frames have been observed. Thus α_r(0) = π_r, the initial distribution. As successive frames of data are fed into the recognizer, and character probabilities are generated, the Markov model forward probabilities are calculated recursively by the formula:

    α_r(t+1) = Σ_p α_p(t) P(x_t | L(q_p)) a_{p,r}    (8.16)

until all have been processed. At this point the final state (dashed in figure 8.1) contains α_N(τ+1) = P(x_0^τ | W), the likelihood that the data x_0^τ represented the word of this model. By choosing the maximum of the likelihoods, argmax_W L(W | x_0^τ), if the models are good, a good estimate of the identity of the original word is obtained.

All of these probabilities are stored and multiplied in the log domain for speed and numerical accuracy. Multiplications become additions in the log domain. Probability additions can be calculated by using the identity

    log(a + b) = log a + log(1 + exp(log b - log a)),    (8.17)

and deriving the second term from a look-up table, as described by Brown (1987).
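The forward recursion and the log-domain addition of equation 8.17 fit in a few lines. The sketch below is a simplified illustration rather than the decoder used here: it assumes a single left-to-right word model with one state per letter, frame log-likelihoods and log transition probabilities supplied by the caller, and computes log addition directly instead of via a look-up table.

    import math

    def log_add(log_a, log_b):
        """log(a + b) from log a and log b (equation 8.17)."""
        if log_b == -math.inf:
            return log_a
        if log_a < log_b:
            log_a, log_b = log_b, log_a
        return log_a + math.log1p(math.exp(log_b - log_a))

    def word_log_likelihood(frame_log_likelihoods, log_transitions):
        """Forward recursion of equation 8.16 in the log domain.

        frame_log_likelihoods[t][r] : log P(x_t | letter of state r)
        log_transitions[r]          : (log a_{r,r}, log a_{r,r+1})
        """
        n_states = len(log_transitions)
        alpha = [0.0] + [-math.inf] * n_states       # start in the first state
        for frame in frame_log_likelihoods:
            new = [-math.inf] * (n_states + 1)       # extra final (exit) state
            for r in range(n_states):
                log_dwell, log_exit = log_transitions[r]
                term = alpha[r] + frame[r]
                new[r] = log_add(new[r], term + log_dwell)
                new[r + 1] = log_add(new[r + 1], term + log_exit)
            alpha = new
        return alpha[n_states]                       # log P(x_0^tau | W)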

8.1.1 Labelling

It will be recalled from chapter 4 that the database consists of both upper and lower case letters as well as punctuation. In fact the punctuation is excluded in the segmentation process, so only word images are passed to the preprocessing system, and no recognition of punctuation is carried out. If this were desired, a separate system for recognizing punctuation marks would be necessary. As punctuation marks appear in isolation and are largely defined by location, the recurrent network apparatus would be inappropriate. A much simpler system could be used, perhaps based on rules for the location and contour shape of punctuation marks.

The system described here gives a distribution across the 26 letter categories, and makes no distinction between upper and lower case letters. An 'a' and an 'A' are both labelled the same, and the network is trained to give the same output for either. There are not enough examples of capital letters in the database to train a network with separate output classes for both upper and lower case letters, since capitals only occur at the beginning of a few words and in a few acronyms. Indeed, the current system recognizes capital letters poorly, but since they are generally only initial letters, recognition is still possible based on the remaining letters and the constraints of the limited vocabulary. Testing a 160-unit network with a grammar gave an 8.8% error rate, but among words with capitals the error rate was 35%. The average rank in the lexicon of incorrectly recognized words with capitals was 96, compared to 15 for incorrect words without capitals. More data with capital letters would improve the recognition rate on capital letters, bringing down the overall error rate.

If more data were available, and distinction between upper and lower case were required, the network could be given 52 outputs to represent the upper and lower case letters. However, it might be better (because the network size would be kept down) to keep just 26 output categories, and have a separate unit indicating the case of the letter. Such a unit would give an independent probability, with a sigmoid output (equivalent to the two-class softmax). When using such a system, the hidden Markov models would need to be adapted to account for the separate classes and, according to the task, models with initial capitals, full capitals or even mixed case words could be permitted.

Some systems (Schenkel et al. 1994) have a 'noise' output class to allow the network to indicate that the inputs do not correspond to any of the letter classes. Such a class could be used in this system to represent poor writing or the ligatures between letters, but the implementation would be difficult since there is no noise or ligature class in the labelling of the training data. Since the system accepts cursive and discrete writing, the data would need to be hand-labelled to indicate the presence of ligatures. If such hand-labelling were done, then an optional ligature model could be inserted between the letter models of each word. A noise model could be placed in parallel with

the letter models to allow letters to be skipped when there was something illegible in the input. Since few frames contain only ligatures, and the data used here were clean, these ideas were not implemented.

8.1.2 Decoding

In practice, most of the state sequences S are highly improbable, and sequences such as L(S) = (o, o, o, o, o, o, o, o, o, o, n, e) are going to contribute little to the probability of the word. In fact, in most cases, it can be said that there will be a small number of similar state sequences which are much more likely than all the others. Also, the single most likely sequence, S*, will be similar to all of these, and can be considered to be representative. Thus, a good approximation to equation 8.5 is:

    P(x_0^τ | W) ∝ P(x_0^τ | S*) P(S*).    (8.18)

Carrying out decoding on only the most likely state sequence is called Viterbi decoding. In this case, the decoding is simpler. A different set of likelihoods, α', is stored:

    α'_r(t+1) = max_p α'_p(t) P(x_t | L(q_p)) a_{p,r}.    (8.19)

These likelihoods can be computed more quickly than the full probabilities, and are found to give better results for this handwriting recognition system (T(2) = 9.72; t_.99(2) = 6.96). Comparative results are given in table 8.1.

Table 8.1: A comparison of error rates and decoding times for five 80-unit networks trained on Viterbi segmentations, and tested with Viterbi or full decoding.

    Decoding   Error rate (%)          Decoding time
    method     mean      s.d.          per word (s)
    Viterbi    17.0      0.68          1.32
    Full       20.4      0.82          1.65
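The Viterbi recursion differs from the forward recursion only in replacing the log-domain sum by a maximum. The sketch below uses the same conventions (and the same assumptions) as the forward-recursion sketch above; keeping back-pointers at each step would additionally recover the most likely state sequence, which is the forced alignment used for target re-estimation in section 8.3.

    import math

    def viterbi_log_likelihood(frame_log_likelihoods, log_transitions):
        """Equation 8.19: score of the single most likely state sequence
        through a left-to-right word model."""
        n_states = len(log_transitions)
        score = [0.0] + [-math.inf] * n_states
        for frame in frame_log_likelihoods:
            new = [-math.inf] * (n_states + 1)
            for r in range(n_states):
                log_dwell, log_exit = log_transitions[r]
                term = score[r] + frame[r]
                new[r] = max(new[r], term + log_dwell)
                new[r + 1] = max(new[r + 1], term + log_exit)
            score = new
        return score[n_states]      # log of P(x_0^tau | S*) P(S*)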

8.2 Duration modelling

This section investigates how the transition probabilities a_{p,r} in equation 8.6 can be chosen so that words are modelled as well as possible, and to give optimum recognition performance. As a first approximation, it could be said that all state sequences are equally likely, and so all the transition probabilities could be made identical (a_{p,p} = a_{p,p+1} = 1/2 for all p). Since a fixed number of frames is being decoded, any state sequence would have probability P(S) = (1/2)^{τ+1}. In this case the state sequence probability has no effect on the recognition, and the word probabilities depend entirely on the observed data, taking no account of whether the state sequence is reasonable for the word.

Practically, though, a number of improvements can be made to the transition probabilities to make the Markov models model the true durations of letters much better. Hochberg (1992) has used similar techniques for the modelling of HMM state durations in speech recognition. In the simple, one-state-per-letter model of figure 8.1, the transition probabilities for dwelling in a given state or exiting to the next state in a word (p and q = 1 - p respectively) can be adjusted. The obvious choice is to arrange for the expected duration of the model to be equal to the mean observed duration of a letter: q = 1/d_av. In fact, in such a simple model, this will merely tend to favour long or short words depending on whether p > q or not, because for a word of n letters, P(S) = p^{τ+1-n} q^n. Adjusting the mean length of each model individually gives improved modelling, but to start to obtain accurate models of the lengths of letters, the duration distribution needs to be examined.

The duration distribution specifies the probabilities P(n), for all n > 0, of remaining in the state for n frames. The duration distribution of the simple model of figure 8.1 is geometric, as in the solid line of figure 8.2:

    P(n) = p^{n-1} q.    (8.20)

This does not match the duration distributions found in practice (shown in the dotted line of the figure). Better performance (ultimately in terms of reduced error rates) is to be expected if P(S) can be modelled more accurately.

8.2.1 Enforcing a minimum duration

It is found that poor modelling often results from passing through a model in a single time step, when the data match the current model very badly, though a single letter is very rarely contained in a single frame of data. Although the probability of such a short duration will be very low, this can be outweighed by the increase in the data probability. To avoid this problem, a minimum duration d_min ≥ 1 is enforced. This forces P(n) = 0 for n < d_min. Several methods have been used to choose the minimum duration of a letter model. The first is to choose d_min to be the smallest duration observed in the training set, but this is subject to noise, particularly since the durations are determined automatically. A better method seems to be to choose d_min = d_av/2, though other similar methods work just as well.

The simplest method of implementing a minimum duration is to repeat each of the states in a given model, as shown in figure 8.3. The graphic data probabilities are the same for all the states in each class (i.e. the emission probabilities are tied). The operations for calculating the likelihoods are exactly the same, but there are twice as many. When Viterbi decoding, this results in a minimum duration d_min, longer durations having probabilities given by the geometric distribution.

Probability CHAPTER8.HIDDENMARKOVMODELLING 1state 2states,Viterbi 2states,full Observed models,comparedwithobserved`'durations. Figure8.2:ProbabilitydistributionsforthesimpleMarkov Duration(frames) 0.3 givenbythegeometricdistribution.theprobabilityofremaininginsucha Figure8.3:AMarkovmodelfortheword`one'withtwostates perletter. oonneeone wheredavistheaveragedurationofaletterdeterminedfromthetraining modelfornframesisgivenby: set.infactthesearelikelihoods,andthenormalizedprobabilitiesare P(n)=(pn dminqdminndmin q= dav dmin+1 01 otherwise (8.22) (8.21) arepermitted,thedistributiongivenbythismodelisnolongergeometric, formtoenforceminimumphonedurationsinspeechrecognition. Robinson(1994),forexample,usesgeometricdistributionmodelsofthis Whendoingfull(asopposedtoViterbi)decoding,wheremultiplepaths P(n)=(pn dminqndmin 0 otherwise: (8.23) O-linehandwritingrecognition P(n)=(Cn 1 q=dmin dav: 0dmin 1pn dminqdminndmin otherwise (8.24) (8.25) 86 0.25 0.2 0.15 0.1 0.05 0 0 2 4 6 8 10 12

This distribution is closer to the observed distribution (figure 8.2), but by better modelling of the whole of the probability distribution, the performance can be increased still further.

8.2.2 Parametric distributions

Figure 8.4: A complex duration model with m states for one letter. The transitions out of the first m-1 states carry the duration probabilities P(1), ..., P(m); the final state has a dwell loop with probability P_dwell.

More detailed modelling of the duration probability distribution can be accomplished with a more complex model, shown in figure 8.4. Here, each letter is represented by m states. The first m-1 states correspond to letter durations of from 1 to m-1 frames. From each of these states, the only permitted transitions are on to the next state of the same letter or on to the first state of the next letter. The transitions to the next letter are thus labelled with the duration probabilities P(n). The final state has a dwell loop which gives the distribution a geometric tail. The probability P_dwell is adjusted to make the exit probabilities sum to one:

    Σ_{n=1}^{m-1} P(n) + Σ_{n=m}^{∞} P(m) P_dwell^{n-m} = 1.    (8.26)

The remaining transitions are given probability one. While this makes the sum of the probabilities at any node not equal to one, the sum of the probabilities of transition out of the model is one, so the duration of the model is described by a probability distribution. In fact, by normalizing appropriately, the same model duration distribution can be maintained while making the sum of probabilities at each state equal to one, but the form described here is clearer.

The more states in the model, the more accurately a given probability distribution can be modelled. With m states the model is perfect up to n = m, and follows a geometric distribution thereafter. However, the decoding time is proportional to the number of states, so the length of the model must be chosen from a trade-off between accuracy and speed. Table 8.2 shows the effect that model length has on recognition accuracy.
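Solving equation 8.26 for P_dwell is a one-line calculation, since the geometric tail sums to P(m)/(1 - P_dwell). The sketch below assumes the duration probabilities P(1), ..., P(m) are supplied as a list; the example values are purely illustrative.

    def dwell_probability(duration_probs):
        """Choose P_dwell so the exit probabilities of the figure 8.4 model
        sum to one (equation 8.26):
        sum_{n<m} P(n) + P(m) / (1 - P_dwell) = 1."""
        head, tail = sum(duration_probs[:-1]), duration_probs[-1]
        return 1.0 - tail / (1.0 - head)

    # e.g. duration probabilities for a letter modelled with m = 4 states
    print(dwell_probability([0.1, 0.3, 0.3, 0.2]))   # prints 0.333...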

Figure 8.5: Probability distributions for three duration models (geometric, Poisson and gamma), compared with the histogram of observed letter durations.

Figure 8.6: The same with a forced minimum duration of 3 frames.

The duration distribution could be made to follow exactly the observed duration histogram from the training data. Without large quantities of data, however, these distributions are noisy, so a parametric probability distribution is used which fits the observed histogram well. In this work, three duration models have been investigated, based on the geometric, Poisson and gamma distributions. In each of these cases, the parametric distribution is used to calculate the probability of being in a letter model for a given number of frames. Each of these distributions can be shifted to impose a minimum duration d_min ≥ 1.

The Poisson distribution

Schenkel et al. (1994) have recently used the Poisson distribution for duration modelling in on-line handwriting.

    P(n) = e^{-λ} λ^{n-d_min} / (n-d_min)!,  n ≥ d_min;  0 otherwise,    (8.27)

    λ = d_av - d_min.    (8.28)

Even for the case d_min = 1, the Poisson distribution is shifted, since for the true Poisson distribution, P(0) ≠ 0.
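The shifted Poisson duration probabilities of equations 8.27 and 8.28 can be tabulated directly, as in the sketch below; the shifted gamma case that follows is handled analogously from its density. The example values of d_av and d_min are assumptions for illustration only.

    import math

    def shifted_poisson(d_av, d_min, n_max=20):
        """Duration probabilities P(n) of equations 8.27-8.28: a Poisson
        distribution shifted so that P(n) = 0 for n < d_min and the mean
        duration is d_av."""
        lam = d_av - d_min
        probs = {}
        for n in range(1, n_max + 1):
            if n < d_min:
                probs[n] = 0.0
            else:
                k = n - d_min
                probs[n] = math.exp(-lam) * lam ** k / math.factorial(k)
        return probs

    p = shifted_poisson(d_av=5.0, d_min=3)
    print(sum(n * q for n, q in p.items()))   # close to the mean duration of 5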

The gamma distribution

This distribution is parametrized by two parameters α and β which determine the mean and variance. The values of α and β are set according to the method of moments:

    α = (d_av - d_min + 1)^2 / σ^2,    (8.29)
    β = (d_av - d_min + 1) / σ^2,    (8.30)

where σ^2 is the variance of the observed letter durations, and

    P(n) = β^α (n + 1 - d_min)^{α-1} e^{-β(n+1-d_min)} / Γ(α),  n ≥ d_min;  0 otherwise.    (8.31)

8.2.3 Results

Sample error rates and recognition times are shown in table 8.2. It can be seen that enforcing a minimum duration of 2 in the geometric model reduces the error rate, but further increases impair the performance. Both of the complex duration models perform better than the geometric distribution models, and the gamma distribution performs better than the Poisson (T(34) = 4.49; t_.999(34) < 3.14). Modelling longer durations more accurately by adding states improves the performance, but the returns diminish and the computation time increases. Comparing the 2 and 8 state gamma distributions shows a significant reduction in error rate (T(4) = 3.28; t_.975(4) = 2.78), but comparing 8 and 12 state gamma distributions does not (T(4) = 0.16). The 8 state gamma distribution is used in other experiments throughout this thesis.

One specific way in which the better modelling is manifested is in distinguishing between single and double letters. In the geometric model, for a given set of data, there is no difference between the probabilities for the models 'reed' and 'red', for example, if the duration of the 'e' is longer than the minimum duration of the two 'e' models. However, with the more complex duration models, those with double letters will have different probabilities to those with single letters. In the 'reed'/'red' example, 'red' will have a higher probability than 'reed' if the number of frames with high 'e' probabilities is around the mean duration of an 'e', lower if there are more than double the mean.

8.3 Target re-estimation

Having trained the network for some time, it has a good estimate of the probability of each frame belonging to any letter. Given the correct word, the best state sequence S* for this word represents a segmentation giving a new label for each frame. For a network which models the probability distributions well, this segmentation will be better than the automatic segmentation of section 7.1.2 since it takes the data into account. Finding the most probable state sequence S* is termed a forced alignment. Since only the correct word model need be considered, such an alignment is faster than the search through the whole lexicon required for recognition.

the search through the whole lexicon required for recognition. Training on this automatic segmentation gives a better recognition rate, but still avoids the necessity of manually segmenting any of the database.

Table 8.2: Sample performance figures for the different duration models (error rate (%) and recognition time per word (s) for the geometric, Poisson and gamma models with different numbers of states).

Figure 8.7 shows three different segmentations of the word 'b r'. First, (a) shows the segmentation arrived at by taking the most likely state sequence when using an 8 state gamma distribution Markov model, but with an untrained network, so the graphic data has no effect on the segmentation. This is similar to the 'equal length' segmentation used to bootstrap the system. (b) shows the effect of removing the duration model. There is now nothing to distinguish between the state sequences, except slight differences in the network's probability estimates due to initial asymmetry, so a poor segmentation results. After training the network (c), the durations deviate from the prior assumed durations to match the observed data. This re-estimated segmentation represents the data more accurately, so gives better targets towards which to train.

Having trained one network, the segmentations can be stored with the data files and used to train new networks, avoiding the less-accurate, equal-length segmentations and speeding up training. However, after completing training on these fixed targets, a further small improvement in recognition accuracy can be obtained by using the targets determined by the new network's own re-estimation of the segmentation.
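The forced alignment itself is a Viterbi pass restricted to the states of the correct word model. The following is a minimal sketch of that idea under simplifying assumptions (one state per letter, log-domain arithmetic, a dense array of per-frame state log likelihoods); the names are illustrative and the sketch omits the duration models used in this work.

    import numpy as np

    def forced_alignment(log_probs):
        """Viterbi forced alignment of T frames to S word states.

        log_probs[t, s] is assumed to be log P(x_t | S_t = s) for the states of
        the correct word only.  States are visited left to right, each for at
        least one frame, so the only transitions are s -> s and s -> s+1.
        Returns one state label per frame.
        """
        T, S = log_probs.shape
        score = np.full((T, S), -np.inf)
        back = np.zeros((T, S), dtype=int)
        score[0, 0] = log_probs[0, 0]          # must start in the first state
        for t in range(1, T):
            for s in range(S):
                stay = score[t - 1, s]
                move = score[t - 1, s - 1] if s > 0 else -np.inf
                back[t, s] = s if stay >= move else s - 1
                score[t, s] = max(stay, move) + log_probs[t, s]
        # must end in the last state; trace the best path backwards
        states = [S - 1]
        for t in range(T - 1, 0, -1):
            states.append(back[t, states[-1]])
        return states[::-1]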

Figure 8.7: Viterbi segmentations of the word 'b r'. Each line represents one letter i and is high for the frames t when S_t = i. (b) is a segmentation with an untrained network and no duration model. (a) shows the effect of adding an eight state gamma distribution duration model, and is similar to the 'bootstrap' segmentation. (c) is the segmentation re-estimated with a fully trained network and a duration model. For clarity, the segments are not labelled in (b).

The effects of this can be seen in the graph of relative entropy against number of epochs (figure 7.7). After a plateau indicating convergence, training on the fixed targets is stopped according to the stopping criterion. Training on the network's segmentation re-estimation is then begun and a steeper drop in relative entropy is seen. The relative entropy falls significantly because the new segmentation is that which is closest (within the constraints of the duration modelling, and the correct word model) to that indicated by the network's output probabilities. Thus the relative entropy of the output and target distributions will immediately be lower when the new segmentation is adopted. Thereafter, a new segmentation is calculated at every epoch and the network adapts its parameters in accordance with this segmentation. The relative entropy continues to fall. Similar effects can be seen in the graph of error rate against number of epochs (figure 7.6), but the effect is largely masked by noise.

Table 8.3 shows word recognition error rates for three 80-unit networks trained towards fixed targets estimated by another network, and then retrained, re-estimating the targets at each iteration. The retraining improves the recognition performance (T(2)=3.91; t_{.95}(2)=2.92).

Table 8.3: Error rates for 3 networks with 80 units trained with fixed alignments, then retrained using individually re-estimated alignments.

    Training method    Error (%)
    Fixed targets      21.2 ± 1.73
    Retraining         17.0 ± 0.68

8.3.1 Forward-backward retraining

The system described above performs well, but examining the speech recognition literature, a potential method of improvement can be seen. Viterbi frame alignments have so far been used to determine targets for training. These assign one class to each frame, based on the most likely state sequence, but a better approach might be to allow a distribution across all the classes, indicating which are likely and which are not, avoiding a 'hard' classification at points where a frame may indeed represent more than one class, or none (as in a ligature). A 'soft' classification would give a more accurate portrayal of the frame identities.

Such a distribution can be calculated with the forward-backward algorithm (Rabiner and Juang 1986). To obtain the distribution \gamma_p(t) = P(S_t = q_p | x_0^\tau, W), the forward probabilities \alpha_p(t) must be combined with the backward probabilities \beta_p(t), which represent the probability of observing frames x_{t+1}^\tau when starting in state p at time t. The backward probabilities are calculated similarly to the forward probabilities of equation 8.16:

    \beta_p(t-1) = \sum_r \beta_r(t) P(x_t | S_t = q_r) a_{p,r}.    (8.32)

A suitable final distribution \beta_r(\tau) = \beta_r is chosen, e.g. \beta = 1 for the last character only. The likelihood of observing the data x_0^\tau and being in state q_p at time t is then given by:

    \hat{\gamma}_p(t) = \alpha_p(t) P(x_t | S_t = q_p) \sum_r a_{p,r} \beta_r(t+1).

Then the probabilities \gamma_p(t) of being in state q_p at time t are obtained by normalization:

    \gamma_p(t) = \hat{\gamma}_p(t) / \sum_r \hat{\gamma}_r(t).    (8.33)

These probabilities are used as targets for the recurrent network outputs.
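A minimal sketch of this forward-backward computation follows (cf. equations 8.32 and 8.33). It uses a generic textbook formulation with dense arrays and direct probabilities; in practice per-frame scaling or log arithmetic would be needed to avoid underflow, and the indexing convention is an assumption rather than the exact one used in this work.

    import numpy as np

    def soft_targets(obs_probs, trans, init, final):
        """Forward-backward state occupancy probabilities.

        obs_probs[t, p]  : P(x_t | S_t = q_p) for T frames and N states
        trans[p, r]      : transition probability a_{p,r}
        init[p], final[p]: initial and final state distributions
        Returns gamma[t, p], the per-frame state occupancy probabilities used
        as soft targets for the network outputs.
        """
        T, N = obs_probs.shape
        alpha = np.zeros((T, N))
        beta = np.zeros((T, N))
        alpha[0] = init * obs_probs[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ trans) * obs_probs[t]
        beta[T - 1] = final
        for t in range(T - 2, -1, -1):
            beta[t] = trans @ (obs_probs[t + 1] * beta[t + 1])
        gamma = alpha * beta
        return gamma / gamma.sum(axis=1, keepdims=True)   # normalize per frame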

Figure 8.8a shows the initial estimate of the class probabilities for a sample of the word 'b r'. The probabilities shown are those estimated by the forward-backward algorithm when using an untrained network, for which the P(x_t | S_t = q_p) will be independent of class. Despite the lack of information, the probability distributions can be seen to take reasonable shapes. The first frame must belong to the first letter, and the last frame must belong to the last letter, of course, but it can also be seen that halfway through the word, the most likely letters are those in the middle of the word. Several class probabilities are non-zero at a time, reflecting the uncertainty caused since the network is untrained. Nevertheless, this limited information is enough to train a recurrent network, because as the network begins to approximate these probabilities, the segmentations become more definite. In contrast, using Viterbi segmentations with no duration model for an untrained network, the most likely alignment can be very different from the true alignment (figure 8.7b). The segmentation is very definite though, and the network is trained towards the incorrect targets, reinforcing its error.

The process of training a network can be speeded up by enforcing a strong duration model, as shown in figure 8.8b, which gives more pronounced peaks in the probabilities for individual letters, because the duration model reduces the uncertainty in their length and location. Figure 8.8c,d shows the effect that dividing by the class prior probability has on the segmentation. With no duration model, the segmentation is distorted, but when the duration model is imposed, the segmentation is better (stronger peaks, which overlap less) than before dividing by the class prior.

Figure 8.8: Baum-Welch segmentations of the word 'b r' with an untrained network. (a) is the segmentation using no duration model, and a uniform class prior. (b) shows the effect of adding an eight state gamma distribution duration model. (c) shows the effect of dividing by the prior class probability (equation 8.15). (d) shows the same with a duration model.

Finally, a trained network (especially when a strong durational model is used) gives a much more rigid segmentation (figure 8.9a,b), with most of the probabilities being zero or one, but with a boundary of uncertainty at

the transitions between letters. This uncertainty, where a frame might truly represent parts of two letters, or a ligature between two, allows the network trained with the forward-backward algorithm and tested using full forward probabilities to give improved recognition results over a network using Viterbi alignments and testing. The improvement is shown in table 8.5. The final probabilistic segmentation can be stored with the frames of data in the same way as the Viterbi segmentation was, and used when subsequent networks are trained on the same data. Training is then significantly quicker than when training towards the approximate bootstrap segmentations.

Figure 8.9: Baum-Welch segmentations of the word 'b r' using trained networks. (a) has the geometric duration model and (b) has an eight-state gamma distribution duration model.

Table 8.4 shows word recognition error rates for 80-unit networks trained towards fixed Baum-Welch targets estimated by another network, and then retrained, re-estimating the targets at each iteration. As with the corresponding Viterbi alignments (table 8.3), the retraining improves the recognition performance (T(4)=3.11; t_{.975}(4)=2.78).

Table 8.4: Error rates for 5 networks with 80 units trained with Baum-Welch alignments, then retrained using re-estimated alignments.

    Training method    Error (%)
    Fixed targets      16.9 ± 0.75
    Retraining         15.6 ± 0.72

Table 8.5 shows a comparison between the use of Viterbi and full probabilities when training and decoding. It can be seen that the error rates for the networks trained with Baum-Welch targets are lower than those trained on Viterbi targets (T(2)=5.24; t_{.975}(2)=4.30). As seen in table 8.1, the error is lowest if the system is tested with Viterbi rather than full decoding. For Baum-Welch targets, the difference is smaller but still significant (T(4)=4.94; t_{.995}(4)=4.60).

Baum-Welch retraining is the standard method of retraining the discrete Markov model, and the tables in section 7.3.3 refer to models retrained with

Baum-Welch. The network estimations used to prime the training are generally better than those of the discrete HMM, so only a small improvement is seen by retraining.

Table 8.5: Error rates for networks with 80 units trained with Viterbi (3 networks) or Baum-Welch (5 networks) alignments, then tested using Viterbi or full probability decoding.

    Training method    Viterbi decode (error %)    Full decode (error %)
    Viterbi            17.0 ± 0.68                 20.4 ± 0.82
    Baum-Welch         15.4 ± 0.74                 15.6 ± 0.72

8.4 Language modelling

    I am not yet so lost in lexicography...
        Samuel Johnson.

One area where great gains in recognition accuracy can be made is by language modelling, as can be seen from the wealth of literature on this area from the field of speech recognition (Waibel and Lee 1990: ch. 8). The system as described so far has a language model built in, in the form of a fixed lexicon which limits the search to a set L of permitted words.

8.4.1 Vocabulary choice

The lexicon used so far was chosen to be the union vocabulary of the training, test and validation sets, so that any word in the corpus would be in the lexicon. In practice, the lexicon size would be dictated by the task to be dealt with. In an application such as reading cheques, the vocabulary size would be around 35 words, comprising numbers, currency units, 'and' and so forth. On the other hand, for transcribing longhand documents, the vocabulary would need to be tens of thousands of words, to cover nearly all the words likely to occur. The size of the vocabulary affects the performance of any recognition system because when it is large, words similar to the correct word are more likely to be permitted. For instance, in a cheque application the word 'hundred' is unlike all the other words, but 'hounded' might be necessary in a large vocabulary system, increasing the likelihood of confusion.

In postal applications, the potential vocabulary is large, containing all street, city, county and country names, but a system might be required to identify only the city or only the state name, these having been segmented from the address block. The vocabulary is now much smaller, making the task easier. In fact, the main reason for using cursive script in address reading is to disambiguate confusions in reading the zip code. If the zip code

is reliably read, the city will be known, but if one, two or three digits are uncertain, the vocabulary will reflect this uncertainty and rise to ten, a hundred or a thousand potential city names. (If the correspondence between zip codes and cities is not one-to-one, the vocabulary size will vary, but this is a rough guide.) Thus these are reasonable vocabulary sizes for testing a postal system, with the vocabulary being dynamically chosen from a longer list according to the cities matching the known digits of the zip code.

Table 8.6: Error rates from testing five 80-unit networks on lexica of different sizes.

    Lexicon size    Error rate (%)    Time per word (s)
    501             13.3 ± 0.72       0.22
    1048            16.1 ± 0.73       0.33
    2155            18.3 ± 0.73       0.61
    4554            20.8 ± 0.72       1.27
    9733            23.7 ± 0.72       2.83

To test the effect on error rate that the lexicon size has, experiments have been conducted with lexica containing different numbers of words. Table 8.6 and figure 8.12 show the results of these experiments. The lexica are created by taking the vocabulary of the test-set (447) and adding to that the most frequent words from the LOB corpus that were not already included. This ensures that the correct word is always in the lexicon, but allows lexica from 447 to 10,000 words to be tested. In practice, the lexica were made from approximately 500, 1000, 2000, 4000 and 8000 words, but including all words sharing the lowest frequency needed to make up the total meant that these figures were exceeded in each case. This experiment corresponds to one done by Schenkel et al. (1994), who similarly construct lexica including all the test-set words. They make up the total with words chosen randomly from a large dictionary, which will tend to be longer, and thus less confusable, than the most frequent words. The 501 word error rate is lower than those quoted before, because of the smaller lexicon size, but later lexica give more errors because of the increase in similarity between the permitted words. Because the most common words were added, and since these are the shorter words which the system tends to confuse, the results are worse with this 1048 word lexicon than with the usual 1334 word lexicon (15.6%).

8.4.2 Grammars

After considering the vocabulary of the system, the next level of complexity in language modelling is to impose a grammar on the words, to limit which words are permissible in a given context or to account for the frequencies of different words. The simplest form is termed a 'unigram' grammar, and simply involves determining the probability of a word occurring, and using

that as the P(W) term of equation 8.1. The probabilities are determined by frequency counts in a corpus of data, for instance in the whole LOB corpus (less the training set) or just on the training set. One problem with defining stochastic grammars is that words in the grammar may not occur in the database available for training the grammar. Complex smoothing techniques exist, but here the simple expedient of assigning a frequency count of one to unobserved words is adopted.

Figure 8.10: A mesh plot showing the effect on error rate of weighting the language and duration model probabilities.
Figure 8.11: The corresponding contour plot, showing the minimum at (3,2).

In practice, it has been found in speech recognition that weighting the language model and the duration model with respect to the acoustic model gives better recognition. This is equivalent to rewriting equation 8.14 as

    L(W | x_0^\tau) = P(W)^{\psi} \sum_{S \in S(W)} P(x_0^\tau | S) P(S)^{\phi}.    (8.34)

Varying the weights \psi and \phi affects the recognition rate, and is a method of indicating the relative degree of confidence in the accuracy of the three probability estimates. Figures 8.10 and 8.11 show the variation in error rate when testing a single network as the weights are altered, keeping the weighting of the graphic data probability equal to one. The optimum values found are 3 for the language model weight \psi, and 2 for the duration model weight \phi.
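In log domain, equation 8.34 for a single state sequence amounts to a weighted sum of the three log probabilities. The fragment below is a minimal illustration of that scoring; the variable names and the single-path simplification are assumptions for illustration, not code from this work.

    def weighted_log_score(log_p_word, log_p_graphic, log_p_duration,
                           psi=3.0, phi=2.0):
        """Combine the three model scores with the weights of equation 8.34.

        log_p_word      log P(W) from the unigram grammar
        log_p_graphic   log P(x_0..x_tau | S) from the graphic data model
        log_p_duration  log P(S) from the duration model
        psi, phi        language and duration model weights (optima found: 3 and 2);
                        the graphic data probability keeps a weight of one
        """
        return psi * log_p_word + log_p_graphic + phi * log_p_duration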

Much research has been done into using more complex language models, such as word pair grammars, which simply limit the vocabulary according to the previous word, or bigram grammars, which assign a probability to a word, conditioned on the previous word. By determining statistics on a large corpus of text, the frequency of occurrence of pairs of words can be determined, giving the bigram grammar P(W_t | W_{t-1}). For pairs of words not observed in the corpus, the unigram grammar must be used instead. More context can be used, as in the general n-gram grammar P(W_t | W_{t-1}, ..., W_{t-n}), and parsing sentences during recognition can give information about what parts of speech are possible or likely in the next word. Kuhn and de Mori (1990) describe a method of caching recently used words, as these are more likely in the following text, and Jelinek (1991) discusses other methods of language modelling. The present system considers each word in isolation, so none of these more complex schemes has been implemented, though they would be appropriate for a system transcribing sentences. Cheque amounts and postal addresses have a simple structure for which a restrictive grammar can be written to significantly reduce the number of words that need to be considered at the next stage.

Table 8.7: Entropy and perplexity of grammars for the LOB corpus.

    Grammar                                  Entropy    Perplexity
    No grammar                               10.38      1334
    Grammar based on training set only       8.96       500
    Grammar based on whole of LOB corpus     9.72       845

All grammars are used to limit the choice of words, and so improve the recognition rate. A crude method of quantifying how effective a grammar G is, is to measure its perplexity Q(G) (Lee 1989: p. 145). This is the average over all words of the number of permitted successor words. For a unigram grammar, this is simply two to the power of the entropy H(G) of the unigram probability distribution, measured in bits:

    H(G) = -\sum_{W \in L} P(W) \log_2 P(W)    (8.35)

    Q(G) = 2^{H(G)}.    (8.36)

Lee notes that this "does not reflect the uncertainty encountered when decoding." If the grammar does not reflect the actual frequencies of the words in the test set, then the perplexity is a poor guide to the grammar's utility. A better measure is the test set perplexity Q_test(G), calculated from the cross entropy of the test set, given the grammar (Charniak 1993: p. 34):

    H_test(G) = -\sum_{W \in L} P_test(W) \log_2 P(W)    (8.37)

    Q_test(G) = 2^{H_test(G)},    (8.38)

where P_test(W) is the proportion of the test set that word W represents, not the unigram probability P(W). (Where test set words are not in the lexicon, as in section 8.4.4, P_test(W) is calculated as a proportion of the in-vocabulary words.)
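The test-set perplexity of equations 8.37 and 8.38 can be computed directly from word counts. Below is a minimal sketch assuming simple Python dictionaries of unigram probabilities and test-set counts; the names and the toy example are illustrative, not taken from this work.

    import math

    def test_set_perplexity(unigram, test_counts):
        """Test-set perplexity Q_test(G) of a unigram grammar (equations 8.37-8.38).

        unigram[w]      P(w) under the grammar (assumed to sum to one over the lexicon)
        test_counts[w]  occurrences of w in the test set (in-lexicon words only)
        """
        total = sum(test_counts.values())
        cross_entropy = 0.0
        for w, count in test_counts.items():
            p_test = count / total                  # proportion of the test set
            cross_entropy -= p_test * math.log2(unigram[w])
        return 2.0 ** cross_entropy

    # Toy example with a three-word lexicon.
    print(test_set_perplexity({'the': 0.5, 'of': 0.3, 'red': 0.2},
                              {'the': 6, 'of': 3, 'red': 1}))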

This perplexity measure indicates how useful the grammar is at limiting the choice of words to those in the test set, which is the function that the grammar should perform.

Sample test-set perplexities are seen in tables 8.7 and 8.8. The unigrams for the lexica with lengths other than 1334 are estimated on the LOB corpus excluding the test set. Because of the mismatch between the test set distribution and the unigram probabilities, the perplexity for the 501 word vocabulary is higher than the lexicon size, and the 1334 word grammar estimated on the LOB corpus has a higher perplexity than that estimated on the training set. The effect of using these grammars for recognition is shown in table 8.8. It can be seen that using a grammar decreases the error rate in all cases except with the 501 word lexicon, when the perplexity of the grammar is higher than the lexicon size (figures 8.12 and 8.13). The test set perplexity can be seen to indicate the effectiveness of the grammar reasonably well. This is highlighted in figure 8.14 where the recognition rate is seen to be proportional to the log perplexity for each of the types of lexicon and grammar used, though the slopes differ between the grammar types.

Table 8.8: Error rates from testing five 80-unit networks on lexica of different sizes. The 1334 word lexicon is tested with the training set grammar and the LOB corpus grammar.

    Lexicon size    Perplexity    Error rate (%), no grammar    Error rate (%), unigram
    501             742           13.3                          14.5
    1048            921           16.1                          15.6
    1334            500           15.6                          13.8
    1334            845           15.6                          15.5
    2155            1029          18.3                          16.8
    4554            1119          20.8                          17.7
    9733            1188          23.7                          18.7

8.4.3 Experimental conditions

At this point, the whole of the standard test system has been described, and it is now possible to summarize the conditions used for earlier experiments. These conditions are used everywhere except as noted in individual experiments. The typical conditions are as follows:

Representation: Uniform horizontal quantization; 7 band vertical quantization; skeleton coding at four angles; turn, endpoint, junction and dot features; eleven snake features.

Normalization: Slope correction; Srihari and Bozinovic's slant estimate; Zhang and Suen's thinning algorithm.

Recognition: Recurrent network; 80 feedback units; 26 softmax output units.

Training: Back-propagation through time with the modified delta bar-delta scheme; training towards fixed Baum-Welch targets until the stopping criterion; retraining towards re-estimated targets.

Testing: 1334 word vocabulary; no unigram grammar; 8 state gamma distribution duration model; duration model weighting of 2; full forward probability calculation. Tests with a unigram grammar use the grammar based on the training and validation sets, and a grammar weighting factor of 3.

Figure 8.12: A graph of error rates averaged over five 80-unit networks. Error rates are shown when testing with and without a unigram grammar.
Figure 8.13: The test-set perplexities for the unigram grammars plotted against lexicon size. The lexicon size (the perplexity with no grammar) is also plotted for comparison.
Figure 8.14: A graph of error rate against perplexity for lexica of different lengths with and without use of the unigram grammar. Figures for two different grammars are shown for the 1334 word lexicon (based on the training set or LOB corpus).

It will be noted that these conditions are not the optimal conditions found so far. Improvements can be made by: using the Canny slant estimate; increasing the number of feedback units; using the unigram grammar; and using non-uniform quantization. Subsequent experiments described in this chapter use all of these enhancements, but the network size is limited to 160 feedback units, the largest network trained on the non-uniformly quantized data set. Error rates for this network are shown in table 8.9.

Table 8.9: Error rates when testing a 160-unit network on the 1334 word vocabulary.

    Conditions                                                  Error rate (%)
    Before retraining, no grammar, full decoding                11.6
    After retraining, no grammar, full decoding                 9.6
    After retraining, perplexity 500 grammar, full decoding     9.2
    After retraining, perplexity 500 grammar, Viterbi           8.8

8.4.4 Coverage

In most applications, there is a chance that the recognizer will be asked to identify a word that is not in the lexicon. A cheque amount could be filled in incorrectly, or a large vocabulary system might be presented with a proper name or neologism which would not be in the lexicon. Thus a system must be able either to recognize words not in the vocabulary (the next section describes one method of doing this), be condemned to incorrectly classify these non-words, or flag that there was an out-of-vocabulary word for human proof-reading.

In the case where out-of-vocabulary words are not errors, and the system should be able to classify them, the vocabulary is termed 'open', in contrast to the 'closed' vocabulary task assumed above. For an open vocabulary task, the issue of coverage must be addressed: the proportion of words in a text which are in a recognizer's lexicon. If there is no method of recognizing out-of-vocabulary words, then this figure is an upper bound on the proportion of words that the recognizer can classify correctly. Some sample coverages for the LOB corpus with lexica of different sizes are shown in table 8.10. In each case, the lexicon is made of the n most frequent words from the LOB corpus. It should be noted that the coverage figures for the larger lexica are artificially high because the lexica are derived from the corpus on which coverage is assessed. On any other corpus, coverage would flatten off more for larger lexica. The coverage proportions are compared with the performance of the 160-unit network of section 8.4.3.

These results are shown graphically in figure 8.15. It can be seen that, as the lexicon size increases, the recognition rate increases, though it does not rise as fast as the test set coverage rate, which is the optimal performance. As a measure of how well the system is performing compared to this upper

bound, the in-lexicon error rate is also plotted. This is the proportion of in-vocabulary words (which the system could have correctly identified with that lexicon) which are misclassified. This rises from 0% with two words (all words 'the' and 'of' are correctly classified) to 12% with a 30,000 word vocabulary.

Table 8.10: Coverage rates for lexica composed of the n most frequent words from the LOB corpus, on the LOB corpus as a whole, or on the LOB test set. The latter figure is the upper bound on the number of words correct. Error rates are shown as a percentage of words incorrect in the test set and as a percentage of the maximum potential words correct. Test-set perplexities and recognition times per test word are also shown.

8.4.5 Search issues

In the system described here, which has not been optimized for speed, with a large lexicon the majority of the recognition time is spent calculating the probabilities in the hidden Markov model rather than estimating the posteriors in the recurrent network. Since there is one model per word, the search time increases linearly with the length of the lexicon (as can be seen in table 8.10, where the recurrent network takes approximately 0.76 s per word, plus 10^-3 s per lexicon item). For a development system with a 1000 word vocabulary this is tolerable, but for larger vocabularies there are a number of strategies which must be implemented to increase the speed. None of these has yet been implemented in the system, but all could be added simply. Patience was the only strategy adopted for the few large-vocabulary tests described here.

Figure 8.15: A graph of test-set coverage rate for lexica of different sizes. Recognition rates for a 160-unit network are shown, and the failure rate is also plotted. Failure is the proportion of in-vocabulary words that are wrongly classified.
Figure 8.16: A graph of the test set perplexity of the unigram grammar for in-vocabulary words.

The first saving, which does not affect the performance of the recognizer, would be to organize the lexicon according to a tree structure. Since the words 'proud' and 'proof' share the first three letters, the calculations for these letters are being repeated. By storing the lexicon in a tree, this labour can be saved, at the cost of a small organizational overhead (figure 8.17).

Further time savings can be introduced by pruning the search path. If a state is found to be much less likely than the other states, the search along paths leading from that state is terminated. Similarly, only the n-best paths at each time step need be retained, reducing the number of operations required. Renals and Hochberg (1994) have shown that the search can be very effectively pruned by examining the posterior probabilities estimated by the network.

Speed might also be improved by restricting the vocabulary using a simple, crude recognition method. The method does not need to be very accurate if it can reject a reasonable proportion of the vocabulary but rarely rejects the correct word. Potential methods might include running a cut-down recognizer with one-state-per-letter models, or a technique as simple as considering the height:width ratio of a word, or recognizing just the first letter with an isolated character recognizer. After reducing the vocabulary with this simple method, the full recognizer can be run with the smaller vocabulary. Systems for on-line recognition already use this fast match approach (Schenkel et al. 1994).
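As a sketch of the tree-structured lexicon (figure 8.17), the fragment below builds a simple prefix tree so that shared prefixes such as 'pro' in 'proud' and 'proof' are stored, and would be scored, only once. The class and method names are illustrative assumptions, not the implementation used in this work.

    class TrieNode:
        """One node of a prefix tree over letters."""
        def __init__(self):
            self.children = {}     # letter -> TrieNode
            self.word = None       # complete word ending here, if any

    def build_lexicon_tree(words):
        root = TrieNode()
        for word in words:
            node = root
            for letter in word:
                node = node.children.setdefault(letter, TrieNode())
            node.word = word
        return root

    # 'proud', 'proof' and 'post' share the prefix 'p' ('pro' for the first two),
    # so a decoder walking this tree evaluates each shared prefix only once.
    lexicon = build_lexicon_tree(['proud', 'proof', 'post'])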

Figure 8.17: Three words from a lexicon stored as a tree to reduce the calculation time in decoding.

8.5 Rejection

    And none can read the text, not even I.
        Merlin in Tennyson's Idylls of the King.

The results quoted so far have all been error rates, where each word is classified by the recognizer and, according to its label, determined to be correct or incorrect. This is the performance measure which must be used for any non-interactive text transcription system, for it is the number of errors that is significant. For an application that allows some human intervention, however, a mechanism for rejection can be used. If a measure of confidence for the system's classifications can be formulated, then those words which are classified with low confidence can be rejected. With a good measure of confidence, many more incorrect words than correct words would be rejected, so the proportion of accepted words which are correct would be higher than the raw recognition rate. For a text transcription system, rejected words can be highlighted in the transcription and the user prompted for correct classification, reducing the effort needed to proof-read and correct the transcribed text. Similarly, in a post-office sorting situation, if those envelopes whose addresses are classified with low confidence are rejected and manually sorted, the number of machine-sorted mail pieces incorrectly routed will be reduced.

Projects designed to tackle commercial problems have specified accuracy and rejection goals that the classifiers must meet. Because recognition must be good to make automation cost-effective, the accuracy figure is usually very high, but, acknowledging the difficulty of handwriting recognition, the permitted rejection rates are high (section 2.2.1).

Three rejection measures have been evaluated for this system, based on the word likelihoods and posterior word probabilities for the most likely

word, W_best, and the second most likely word, W_second. In the decoder, the likelihoods L(W_best | x_0^\tau), L(W_second | x_0^\tau) and probabilities P(W_best | x_0^\tau), P(W_second | x_0^\tau) are already calculated, and it can be seen that if the graphic data matches a word model very well, then P(W_best | x_0^\tau) will be close to one and L(W_best | x_0^\tau) will be high.

L(W_best | x_0^\tau) is the product of a variable number of probabilities (depending on the number of frames (\tau+1) in the word). To obtain a threshold applicable to words of any length, the log likelihood is scaled to be independent of these factors, and the variable thresholded is the normalized likelihood \hat{L}(W_best | x_0^\tau):

    \log \hat{L}(W_{\mathrm{best}} | x_0^\tau) = \frac{\log L(W_{\mathrm{best}} | x_0^\tau)}{\tau + 1}.    (8.39)

Alternative scaling factors have been tested, incorporating the weights \psi and \phi, but this simple normalization was found to be most effective.

Figure 8.18: Error against rejection proportion for thresholding on normalized maximum log likelihood, difference in normalized log likelihood and a combined scheme.

By varying a threshold on any of these dimensions, and rejecting words which fall beyond the threshold, the error rate (incorrect words accepted as a proportion of total words accepted) and the rejection rate can be found. The error rate can be plotted against the rejection rate for a variety of threshold values, to show the trade-off between rejection and accuracy. Figure 8.18 shows these curves when the threshold is on the normalized log of the maximum likelihood, and on the difference in normalized log of the best two words' likelihoods. The likelihood difference method works better than the likelihood method, since the error rate is lower for a given rejection rate. This graph also shows the performance of a combined method which thresholds on a linear combination of the two measures. Methods based on the posterior probability gave similar results (understandably, since the posteriors are closely related to the likelihoods, by equation 8.15). Using such a rejection criterion increases the accuracy of the system, e.g. giving error rates as low as 1.2% when rejecting 20.8% of the words, or 4.9% when rejecting 8.0%.
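A minimal sketch of the rejection criterion follows: it computes the length-normalized log likelihood of equation 8.39 for the two best words and rejects when either the best score or the margin between the two falls below a threshold. The function and threshold names (and values) are illustrative assumptions, not from this work; in practice the thresholds would be swept to trade rejection rate against accuracy.

    def should_reject(log_l_best, log_l_second, num_frames,
                      min_score=-5.0, min_margin=0.5):
        """Reject a classification with low confidence (cf. equation 8.39).

        log_l_best, log_l_second : word log likelihoods of the two best words
        num_frames               : tau + 1, the number of frames in the word
        """
        norm_best = log_l_best / num_frames        # normalized log likelihood
        norm_second = log_l_second / num_frames
        margin = norm_best - norm_second           # likelihood difference measure
        return norm_best < min_score or margin < min_margin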

8.6 Out-of-vocabulary word recognition

    Words and wordlessness. Between the two...
        Tony Harrison. Wordlists.

If the vocabulary is not inherently limited by the task (in which case an out-of-vocabulary word is an error), the system should be able to detect that the word is poorly recognized and, if possible, should then use an alternative strategy to recognize the word.

One such strategy is to create a non-word Markov model, as shown in figure 8.19. Each circle represents a letter model, with one or more states. The initial distribution is uniform across the first states of each letter model. The probabilities are combined to find the forward probabilities as before, but after each letter is complete, a transition to any of the letters is permitted. As the data are accumulated, a path is traced between successive letters.

Figure 8.19: A non-word Markov model showing some of the 26 letter models.

When the final frame is processed, the most likely path is found and the letters corresponding to its state sequence can be printed out. Viterbi decoding is used, since finding the best sequence of letters when calculating full probabilities is much more difficult than in the fixed-vocabulary task. Just as with a word bigram, a letter bigram can be created detailing the probability of making a transition from one letter to another, and these probabilities can be multiplied into the state sequence probability. Table 8.11 shows the recognition rates for the non-word model when it is used instead of a lexicon. These results compare favourably with the single-author non-word error rates of 78-92% of Edelman et al. (1990).
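The decoding over the non-word model is an unconstrained Viterbi search over a loop of letter models, with an optional letter bigram applied at each letter-to-letter transition. The sketch below is a simplified illustration (one state per letter, log domain); the array names are assumptions and it omits the multi-state letter models and duration modelling used in this work.

    import numpy as np

    def nonword_decode(log_obs, log_bigram):
        """Best letter sequence through an unconstrained letter loop.

        log_obs[t, c]    log P(x_t | letter c) for T frames and 26 letters
        log_bigram[a, b] log probability of letter b following letter a
        A letter may dwell over consecutive frames; a transition to any letter
        is permitted once a letter is complete (here, after any frame).
        """
        T, C = log_obs.shape
        score = log_obs[0].copy()        # uniform initial distribution (constant dropped)
        back = np.zeros((T, C), dtype=int)
        for t in range(1, T):
            enter = score[:, None] + log_bigram      # score of entering letter b from a
            best_prev = enter.argmax(axis=0)
            stay = score                             # remain in the same letter
            move = enter.max(axis=0)
            back[t] = np.where(stay >= move, np.arange(C), best_prev)
            score = np.maximum(stay, move) + log_obs[t]
        # trace back, then collapse repeated frames of the same letter
        path = [int(score.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        path.reverse()
        letters = [path[0]] + [c for i, c in enumerate(path[1:], 1) if c != path[i - 1]]
        return letters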

Table 8.11: Error rates for the non-word model with different bigram weightings (with the duration model weight of 3), using Viterbi decoding.

    Bigram weight     0      1      2      3
    Error rate (%)    60.3   53.9   54.2   55.1

Decoding with the non-word model is faster than when using a lexicon (0.76 s compared to 2.21 s when using a 1334 word lexicon, both with the 160-unit network). The non-word model could be used as a fast alternative to the lexicon-based decoder. It is possible to find the most likely letter sequence by this method, and then, if a lexicon is available, the best in-lexicon match is determined by finding the word with the minimum edit distance from this sequence.¹ Several of the closest words could be identified and used as the vocabulary for a slower, more accurate recognition.

¹ The edit distance is calculated by comparing the letter string with each lexicon word, and penalties are accumulated for deletion, insertion and substitution of letters. This comparison is faster than calculation of the probabilities for each word.
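The edit distance of the footnote can be computed with the standard dynamic-programming recurrence. A minimal sketch with unit penalties follows; the penalty values and function names are assumptions for illustration, since they are not specified here.

    def edit_distance(a, b):
        """Minimum number of insertions, deletions and substitutions turning a into b."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                    # deletion
                               cur[j - 1] + 1,                 # insertion
                               prev[j - 1] + (ca != cb)))      # substitution
            prev = cur
        return prev[-1]

    def best_lexicon_match(letter_sequence, lexicon):
        """Closest in-lexicon word to the decoded letter string."""
        return min(lexicon, key=lambda w: edit_distance(letter_sequence, w))

    print(best_lexicon_match('prouf', ['proud', 'proof', 'post']))
    # 'proud' (both 'proud' and 'proof' are at distance 1)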

Figure 8.20: Words plotted with non-word normalized likelihood against lexicon normalized likelihood.
Figure 8.21: A graph of recognition rate against lexicon size, with and without modelling out-of-vocabulary words. The coverage of the lexica is also shown.

A system has been created which uses both the lexicon and the non-word model, finding the most likely word in the lexicon and the most likely letter string respectively. The problem then is to decide which of these hypotheses to choose. It has already been seen that the normalized likelihood is a good confidence measure for the classification of the lexicon-based system. A similar measure can be defined for the non-word model, based on the likelihood of the most likely state sequence, L(nonword | x_0^\tau):

    \log \hat{L}(\mathrm{nonword} | x_0^\tau) = \frac{\log\left[ P(W_{\mathrm{best}}) L(\mathrm{nonword} | x_0^\tau) \right]}{\tau + 1}.    (8.40)

Note that, to correct for the effect of the unigram grammar on \hat{L}(W_best | x_0^\tau), the same prior, P(W_best), must be included in the non-word normalization to make the figures comparable. Now, plotting \log \hat{L}(nonword | x_0^\tau) against \log \hat{L}(W_best | x_0^\tau) for each word (figure 8.20) shows that there is a clear boundary separating the out-of-vocabulary words which the non-word model correctly identifies from the in-vocabulary words which the lexical approach gets right but the non-word model gets wrong. These are the two sets of words for which the decision between methods is critical. Words for which both methods are right or both are wrong can be ignored here, as the choice between strategies does not affect the accuracy of these classifications.

Since the two groups of words hardly overlap, a threshold P_nw can be chosen to give a decision boundary on the line:

    \log \hat{L}(\mathrm{nonword} | x_0^\tau) = \log P_{\mathrm{nw}} + \log \hat{L}(W_{\mathrm{best}} | x_0^\tau).    (8.41)

Figure 8.20 shows one such boundary, at log_b P_nw = -2600. This threshold can be interpreted as the log of the probability of transition into the non-word model within a global model which encompasses the non-word model and all the lexicon word models. In fact P_nw = 0.33. The base b of the logarithm was chosen to permit numerically accurate calculations with the probabilities stored as integers, if desired, so in fact b is little more than one.

Figure 8.21 shows the error rates when using this decision boundary. The error rates are compared to the coverage and the error rate using only the lexicon, as in figure 8.15. This time the recognition rate is higher than the coverage for small lexica, showing the power of the non-word model for recognizing out-of-vocabulary words. With larger lexica, the recognition rate falls below the coverage, but remains above the lexicon-only recognition rate. Thus a non-word model always improves the recognition rate, though the effect is small when the lexicon is large.

8.7 Summary

This chapter has described the final stage in the process of recognizing handwritten words: deriving word probabilities from the frame likelihoods of the previous chapter. From the simple models with one state per letter, a number of enhancements have been described. By modelling the duration distributions of letters, the system accuracy has been improved. The problem of vocabulary size has been addressed and its effect on the error rate shown, for both closed and open vocabulary tasks. A simple unigram grammar has been implemented, and it has been shown how this reduces the error rate. A scheme for rejecting poorly recognized words has been described and a system for recognizing words not in the lexicon implemented. Combining these has given increased recognition on the open vocabulary task when many test words are not in the lexicon.

The most significant results from this chapter are the final error rates of 8.8% with a lexicon and grammar, 53.9% using no lexicon and 12% on the open vocabulary task. Lower error rates can be achieved by applying a rejection criterion.

Chapter 9

Conclusions

    I saw infinite processes that formed one single felicity and, understanding all, I was able to understand the script of the tiger.
        Jorge Luis Borges. The God's Script.

This thesis has described a complete handwriting recognition system which has been implemented and tested on a database of cursive script. The results show that the method of recurrent error propagation networks can be applied successfully to the task of off-line cursive script recognition and perform better than a comparison hidden Markov model system. An 88% recognition rate has been achieved on an open-vocabulary task. Comparison of results with other researchers is difficult because of differences in experimental details, the actual handwriting used and the method of data collection. The results which have been published for similar problems are noted in section 2.3.2. The single author recognition rates for other systems are (for various lexicon sizes): 48% by Bozinovic and Srihari (1989), 50% by Edelman et al. (1990) and 70% by Yanikoglu and Sandon (1993).

The recognition performance of the system has been improved in a number of ways. The successive improvements are summarized in table 9.1. This shows the relative reduction in error rate that each of the techniques has brought about.

Enhancements in normalization and in the detection and representation of features have led to reduced error rates. The hybrid system, which was found to perform better than the discrete probability HMM system, was improved by retraining with re-estimated frame labels. Baum-Welch retraining of the recurrent network has been described here and has also brought about an improvement in recognition rates compared to using Viterbi targets. Better performance still can be hoped for from training larger networks, but the training time is problematic for such large networks without specialist hardware.

Language modelling has been found to improve the recognition performance, both by incorporating a model of the duration of each letter, and by adding a unigram word grammar. It has been shown that the system can recognize 46% of words without restriction to a lexicon, and that a model

for words not in the system's vocabulary can increase the recognition rate beyond that otherwise obtained.

Table 9.1: The proportional reduction in error rate achieved by the incorporation of the techniques described in previous chapters (skeleton vs. undersampling, features, non-uniform quantization, snakes, hybrid vs. discrete HMM, Baum-Welch vs. Viterbi targets, retraining, duration model and unigram grammar). The discrete HMM is compared to a hybrid with the same number of parameters.

The training time of the recurrent network has been investigated and has been reduced by choosing an effective weight update scheme, by using softmax outputs, by specifying the training schedule and by initialization of the weight matrix. Preliminary work to investigate the operation of the network has been carried out, giving a greater understanding of the weights and feedback units. Much more could be done in this area with the hope of greater understanding and improved performance.

9.1 Further work

In writing a complete handwriting recognition system, one must face the problem of where effort can be most effectively applied to increase the performance. It is felt that in this system, the effort has been evenly distributed, but with a slight emphasis on the work described in chapter 8. In distributing the effort, potential improvements in every aspect of the system have necessarily been left without being investigated. As a result, further work could be carried out, with reasonable hope of return, on any of the techniques that have been described.

The preprocessing used could be improved upon, for example by extracting a better skeleton from the raw image. Doermann (1993) tackles this particular problem with a model-based approach, and derives a representation of the off-line strokes with inferred temporal information. His technique has not yet been applied to a recognition task. Normalization of a skeleton in the form derived by Doermann could be carried out using the procedures of Singer and Tishby (1994), which use a model of handwriting production to

guide normalization. The non-uniform quantization scheme could also be made more stable, and the snake feature models could be extended as described at the end of chapter 6.

This system has been tested on the problem of single-writer handwriting recognition, though the design has been made open to accepting any style of handwriting, with normalization against scale, slope, slant and stroke width. It is hoped that future work will include the incorporation of algorithms which will allow the system to be tested on the CEDAR database.

The pure HMM system could be improved by experimenting with other quantization schemes, using alternative metrics or dividing the input space into several spaces to be individually quantized. The HMM could also be given a probability distribution for each state, instead of tying the distributions across all states representing the same letter. This would be simple for the pure HMM, but might be computationally intensive for the hybrid system. Context-dependent models might also be used.

Better recognition rates for the hybrid system could be expected from the technique of connectionist model merging (Robinson et al. 1994). The imposition of a more complex, task-dependent grammar which further restricts the choice of words can also be expected to yield higher accuracy.

Bibliography

Abbink, G. H., Teulings, H. L. and Schomaker, L. R. B. (1993) Description of on-line script using Hollerbach's generation model. In (IWFHR 1993), pp. 217-224.

Aldus and Microsoft (1988) TIFF Standard Definition, 5.0 edition.

Alimi, A. and Plamondon, R. (1993) Performance analysis of handwritten strokes generation models. In (IWFHR 1993), pp. 252-261.

Arcelli, C. and Sanniti di Baja, G. (1985) A width independent fast thinning algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 7(4): 463-474.

Bellegarda, J. B., Nahamoo, D., Nathan, K. S. and Bellegarda, E. J. (1994) Supervized hidden Markov modeling for on-line handwriting recognition. In International Conference on Acoustics, Speech and Signal Processing, volume 5, pp. 149-152.

Bengio, Y., LeCun, Y. and Henderson, D. (1994a) Globally trained handwritten word recognizer using spatial representation, convolutional neural networks and hidden Markov models. In (Cowan et al. 1994), pp. 937-944.

Bengio, Y., Simard, P. and Frasconi, P. (1994b) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5(2): 157-166.

Bos, B. and van der Moer, A. (1993) The Bakunin project and optical character recognition. In (OCRHD 1993), pp. 11-15.

Boser, B. E. (1994) Pattern recognition with optimal margin classifiers. In (Impedovo 1994), pp. 147-171.

Bouma, H. (1971) Visual recognition of isolated lowercase letters. Vision Research 11: 459-474.

Bourlard, H. and Morgan, N. (1993) Connectionist Speech Recognition: A Hybrid Approach. Kluwer.

Bozinovic, R. M. and Srihari, S. N. (1989) Off-line cursive word recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(1): 68-83.

Breuel, T. M. (1994) A system for the off-line recognition of handwritten text. Technical Report 94-02, IDIAP, CP 609, 1920 Martigny, Switzerland.

Bridle, J. S. (1990) Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. Neurocomputing F68: 227-236.

Brown, P. F. (1987) The acoustic-modelling problem in automatic speech recognition. Technical report, Carnegie Mellon Computer Science Department. CMU-CS-87-125.

Browning, J. (1992) Artificial intelligence survey. The Economist 322(7750): 21.

Caesar, T., Gloger, J. M. and Mandler, E. (1993a) Preprocessing and feature extraction for a handwriting recognition system. In (ICDAR 1993), pp. 408-411.

Caesar, T., Joachim, G., Kaltenmeier, A. and Mandler, E. (1993b) Recognition of handwritten word images by statistical methods. In (IWFHR 1993), pp. 409-416.

Canny, J. F. (1986) A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8: 679-698.

Charniak, E. (1993) Statistical Language Learning. MIT Press.

Cheriet, M. and Suen, C. Y. (1993) Extraction of key letters for cursive script recognition. Pattern Recognition Letters 14: 1009-1017.

Cipolla, R. and Blake, A. (1990) The dynamic analysis of apparent contours. In Third Int. Conf. Computer Vision, pp. 616-623.

Cootes, T. F. and Taylor, C. J. (1992) Active shape models: 'smart snakes'. In Proceedings of the British Machine Vision Conference, ed. by D. Hogg and R. Boyle. Springer Verlag.

Cootes, T. F., Taylor, C. J., Cooper, D. H. and Graham, J. (1992) Training models of shape from sets of examples. In Proceedings of the British Machine Vision Conference, ed. by D. Hogg and R. Boyle. Springer Verlag.

Cowan, J. D., Tessauro, G. and Alspector, J. eds. (1994) Advances in Neural Information Processing Systems, number 6. Morgan Kaufmann.

Davies, E. R. (1990) Machine Vision: Theory, Algorithms, Practicalities. Microelectronics and Signal Processing Number 9. London Academic.

Doermann, D. S. (1993) Document Image Understanding: Integrating Recovery and Interpretation. University of Maryland Ph.D. thesis.

Downing, J. and Leong, C. K. (1982) Psychology of Reading. Macmillan.

Edelman, S. (1988) Reading and Writing Cursive Script: A Computational Study. Dept. of Applied Math, Weizman Institute of Science Ph.D. thesis.

Edelman, S., Ullman, S. and Flash, T. (1990) Reading cursive script by alignment of letter prototypes. International Journal of Computer Vision 5(3): 303-331.

Eldridge, M. A., Nimmo-Smith, I., Wing, A. M. and Totty, R. N. (1984) The variability of selected features in cursive handwriting: Categorical measures. Journal of the Forensic Science Society 24(3): 179-219.

Elliman, D. G. and Banks, R. N. (1991) A comparison of two neural networks for hand-printed character recognition. In IEE 2nd Neural Networks, number 349 in IEE, pp. 224-228.

Fahlman, S. E. (1988) An empirical study of learning speed in back-propagation neural networks. Technical Report CMU-CS-88-162, CMU.

Fontaine, T. and Shastri, L. (1992) Character recognition using a modular spatiotemporal connectionist model. Neuroprose, University of Pennsylvania, Philadelphia PA 19104-6389.

Fukushima, K. (1980) Neocognitron: A self-organising neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36: 193-202.

Garris, M. D., Blue, J. L., Candela, G. T., Dimmick, D. L., Geist, J., Grother, P. J., Janet, S. A. and Wilson, C. L. (1994) NIST form-based handprint recognition system. Document understanding mailing list.

Geake, E. (1992) Letters to a computer. New Scientist pp. 30-33.

Giles, C. L., Hanson, S. J. and Cowan, J. D. eds. (1993) Advances in Neural Information Processing Systems, number 5. Morgan Kaufmann.

Gilloux, M., Leroux, M. and Bertille, J.-M. (1993) Strategies for handwritten words recognition using hidden Markov models. In (ICDAR 1993), pp. 299-304.

Govindan, V. K. and Shivaprasad, A. P. (1990) Character recognition: a review. Pattern Recognition 23(7): 671-683.

Gray, R. M. (1984) Vector quantization. In (Waibel and Lee 1990), chapter 3.3, pp. 75-100.

Haber, R. N. and Haber, L. R. (1981) Visual components of the reading process. Visible Language XV(2): 147-182.

Hepp, D. J. (1991) An application of backpropagation to the recognition of handwritten digits using morphologically derived features. Proceedings of the SPIE 1451: 228-233.

Hinton, G. E., Williams, C. K. I. and Revow, M. D. (1992) Adaptive elastic models for hand-printed character recognition. In (Moody et al. 1992), pp. 512-522.

Hochberg, M. M. (1992) A Comparison of State-Duration Modelling Techniques for Connected Speech Recognition. Division of Engineering, Brown University Ph.D. thesis.

Hollerbach, J. M. (1981) An oscillation theory of handwriting. Biological Cybernetics 39: 139-156.

Huang, Y. S. and Suen, C. Y. (1993) Combination of multiple classifiers with measurement values. In (ICDAR 1993), pp. 598-601.

Hubel, D. H. and Wiesel, T. N. (1962) Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology 160: 106-154.

Hull, J. J. (1993) A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence.

ICDAR. (1991) First International Conference on Document Analysis and Recognition, St. Malo, France.

ICDAR. (1993) Second International Conference on Document Analysis and Recognition, Tsukuba, Japan. IEEE Computer Society Press.

Idan, Y. and Chevalier, R. C. (1991) Handwritten digits recognition by a supervised Kohonen-like learning algorithm. IJCNN 91 3: 2576-2581.

Impedovo, S. ed. (1994) Fundamentals in Handwriting Recognition, volume 124 of NATO ASI Series F: Computer and Systems Sciences. Springer Verlag.

Impedovo, S., Dimauro, G. and Pirlo, G. (1990) A new decision tree algorithm for handwritten numerals recognition using topological features. Proc. SPIE 1384: 280-284.

IWFHR. (1993) Third International Workshop on Frontiers in Handwriting Recognition. CEDAR, SUNY Buffalo.

Jacobs, R. A. (1988) Increased rates of convergence through learning rate adaptation. Neural Networks 1: 295-307.

Jelinek, F. (1991) Up from trigrams! The struggle for improved language models. In Proceedings European Conference on Speech Communication and Technology, pp. 1037-1039.

Johansson, S., Atwell, E., Garside, R. and Leech, G. (1986) The tagged LOB corpus. Technical report, Norwegian Computing Centre for the Humanities, Bergen.

Kass, M., Witkin, A. and Terzopoulos, D. (1987) Snakes: Active contour models. In Proc. 1st Inter. Conf. Computer Vision, pp. 259-268. IEEE.

Kimura, F. and Shridhar, M. (1991) Handwritten numerical recognition based on multiple algorithms. Pattern Recognition 24(10): 969-983.

Kimura, F., Shridhar, M. and Chen, Z. (1993a) Improvements of a lexicon directed algorithm for recognition of unconstrained handwritten words. In (ICDAR 1993), pp. 18-22.

Kimura, F., Shridhar, M. and Narasimharmurthi, N. (1993b) Lexicon directed segmentation-recognition procedure for unconstrained handwritten words. In (IWFHR 1993), pp. 122-131.

Kuhn, R. and de Mori, R. (1990) A cache-based natural language model for speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(6): 570-583. Corrected: PAMI, 14(6): 691, 1992.

Lam, L., Lee, S. and Suen, C. Y. (1992) Thinning methodologies: A comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(9): 869-885.

Lanitis, A. (1992) Applications of point distribution models in handwritten optical character recognition and face recognition. Transfer report, Dept. of Medical Biophysics, University of Manchester.

Lanitis, A., Taylor, C. J. and Cootes, T. F. (1993) A generic system for classifying variable objects using flexible template matching. In British Machine Vision Conference, ed. by J. Illingworth, volume 1, pp. 329-338. BMVA Press.

Lecolinet, E. and Baret, O. (1994) Cursive word recognition: Methods and strategies. In (Impedovo 1994), pp. 235-263.

Lecolinet, E. and Crettez, J.-P. (1991) A grapheme-based segmentation technique. In (ICDAR 1991), pp. 740-748.

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. and Jackel, L. D. (1989) Backpropagation applied to handwritten zip code recognition. Journal of Neural Computation 1: 541-551.

Lee, K.-F. (1989) Automatic Speech Recognition: The Development of the SPHINX System. Kluwer.

Leroux, M., Salome, J. C. and Badard, J. (1991) Recognition of cursive script words in a small lexicon. In (ICDAR 1991), pp. 774-782.

Leymarie, F. (1990) Tracking and describing deformable objects using active contour models. Technical Report TR-CIM-90-9, McGill University.

Linde, Y., Buzo, A. and Gray, R. M. (1980) An algorithm for vector quantizer design. IEEE Transactions on Communications COM-28(1): 84-95.

Lu, S. W., Ren, Y. and Suen, C. Y. (1991) Hierarchical attributed graph representation and recognition of handwritten Chinese characters. Pattern Recognition 24(7): 617-632.

Manke, S. and Bodenhausen, U. (1994) A connectionist recognizer for on-line cursive handwriting recognition. In International Conference on Acoustics, Speech and Signal Processing, volume 2, pp. 633-6.

Marr, D. (1982) Vision. Freeman.

Martins, W. and Allinson, N. M. (1991) Visual search of postal codes by neural networks using human examples. In IEE 2nd Conference on Neural Networks.

Matan, O., Burges, C. J. C., LeCun, Y. and Denker, J. S. (1992) Multi-digit recognition using a space displacement network. In (Moody et al. 1992), pp. 488-495.

McGraw, G., Rehling, J. and Goldstone, R. (1994) Roles in letter perception: Human data and computer models. Technical Report CRCC-TR90, Center for Research on Concepts and Cognition, Indiana University.

McVeigh, A. (1993) The Irish database project: a case for OCR. In (OCRHD 1993), pp. 29-37.

Moody, J. E., Hanson, S. J. and Lippmann, R. P. eds. (1992) Advances in Neural Information Processing Systems, number 4. Morgan Kaufmann.

Moreau, J.-V., Plessis, B., Bougeois, O. and Plagnaud, J.-L. (1991) A postal cheque reading system. In (ICDAR 1991), pp. 758-766.

Mori, S., Suen, C. Y. and Yamamoto, K. (1992) Historical review of OCR research and development. Proceedings of the IEEE 80(7): 1029-1058.

Mori, Y. and Yokosawa, K. (1988) Neural networks that learn to discriminate similar kanji characters. Neural Information Processing Systems.

Nag, R., Wong, K. H. and Fallside, F. (1986) Handwritten script recognition using hidden Markov models. In ICASSP '86. IEEE.

Nellis, J. and Stonham, T. J. (1991) A fully integrated hand-printed character recognition system using artificial neural networks. In IEE 2nd Conference on Neural Networks, number 349 in IEE, pp. 219-223.

OCRHD. (1993) Optical Character Recognition in the Historical Discipline, number A18 in VIII International Conference of the Association for History and Computing. Netherlands Historic Data Archive.

Olsen, M. (1993) Scanning, keyboarding, and data verification: Factors in selecting data collection technologies. In (OCRHD 1993), pp. 93-112.

Palumbo, P. W., Srihari, S. N., Soh, J., Sridhar, R. and Demajenko, V. (1992) Postal address block location in real time. IEEE Computer 25(7): 34-42.

Paquet, T. and Lecourtier, Y. (1991) Handwriting recognition: Application on bank cheques. In (ICDAR 1991), pp. 749-757.

Paquet, T. and Lecourtier, Y. (1993) Recognition of handwritten sentences using a restricted lexicon. Pattern Recognition 26(3): 391-407.

Pavlidis, T. (1992) Application of splines to shape description. In Visual Form, ed. by Arcelli, Cordella and S. di Baja, pp. 431-441. Plenum.

Pavlidis, T. (1993) Recognition of printed text under realistic conditions. Pattern Recognition Letters 14: 317-326.

Pearlmutter, B. A. (1990) Dynamic recurrent neural networks. Technical Report CMU-CS-88-191, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA 15213.

Pettier, J. C. and Camillerapp, J. (1993) Script representation by a generalised skeleton. In (ICDAR 1993), pp. 850-853.

Plamondon, R., Bordeau, M., Chouinard, C. and Suen, C. Y. (1993) Validation of preprocessing algorithms: A methodology and its applications to the design of a thinning algorithm for handwritten characters. In (ICDAR 1993), pp. 262-269.

Plamondon, R. and Lorette, G. (1989) Automatic signature verification and writer identification: The state of the art. Pattern Recognition 22(2): 107-129.

Plessis, B., Sicsu, A., Heutte, L., Menu, E., Lecolinet, E., Debon, O. and Moreau, J.-V. (1993) A multi-classifier combination strategy for the recognition of handwritten cursive words. In (ICDAR 1993), pp. 642-645.

Rabiner, L. R. and Juang, B. H. (1986) An introduction to hidden Markov models. IEEE ASSP Magazine 3(1): 4-16.

Rayner, K. and Pollatsek, A. (1989) The Psychology of Reading. Prentice-Hall.

Renals, S. and Hochberg, M. (1994) Decoder technology for connectionist large vocabulary speech recognition. Technical Report CUED/F-INFENG/TR.186, Cambridge University Engineering Department, UK.

Robinson, A. and Fallside, F. (1991) A recurrent error propagation network speech recognition system. Computer Speech and Language 5: 259-274.

Robinson, A. (1994) The application of recurrent nets to phone probability estimation. IEEE Transactions on Neural Networks.

Robinson, A., Hochberg, M. and Renals, S. (1994) IPA: Improved phone modelling with recurrent neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pp. 37-40.

Robinson, A. J. (1989) Dynamic Error Propagation Networks. Cambridge University Engineering Department Ph.D. thesis.

Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986) Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, ed. by D. E. Rumelhart and J. L. McClelland, volume 1, chapter 8, pp. 318-362. Bradford Books.

Sasanuma, S. (1984) Can surface dyslexia occur in Japanese. In Orthographies and Reading. Lawrence Erlbaum Associates.

Schenkel, M., Guyon, I. and Henderson, D. (1994) On-line cursive script recognition using time delay neural networks and hidden Markov models. In International Conference on Acoustics, Speech and Signal Processing, volume 2, pp. 637-640.

Senior, A. W. (1993) A recurrent network approach to the automatic reading of handwriting. In (OCRHD 1993), pp. 59-65.

Senior, A. W. (1994) Normalisation and preprocessing for a recurrent network off-line handwriting recognition system. In (Impedovo 1994), pp. 360-365.

Senior, A. W. and Fallside, F. (1993a) Off-line handwriting recognition by recurrent error propagation networks. In (IWFHR 1993), pp. 132-141.

Senior, A. W. and Fallside, F. (1993b) Using constrained snakes for feature spotting in off-line cursive script. In (ICDAR 1993), pp. 305-310.

Simard, P., LeCun, Y. and Denker, J. (1993) Efficient pattern recognition using a new transformation distance. In (Giles et al. 1993), pp. 50-58.

Simon, J.-C. (1992) Off-line cursive word recognition. Proceedings of the IEEE 80(7): 1150-1161.

Singer, Y. and Tishby, N. (1993) Decoding of cursive scripts. In (Giles et al. 1993).

Singer, Y. and Tishby, N. (1994) Dynamical encoding of cursive handwriting. To appear in Biological Cybernetics.

Srihari, S. N. and Bozinovic, R. M. (1987) A multi-level perception approach to reading cursive script. Artificial Intelligence 33: 217-255.

Srihari, S. N., Govindaraju, V. and Shekhawat, A. (1993) Interpretation of handwritten addresses in US mail stream. In (ICDAR 1993), pp. 291-294.

Starner, T., Makhoul, J., Schwartz, R. and Chou, G. (1994) On-line cursive handwriting recognition using speech recognition techniques. In International Conference on Acoustics, Speech and Signal Processing, volume 5, pp. 125-128.

Suen, C. Y., Berthod, M. and Mori, S. (1980) Automatic recognition of handprinted characters: the state of the art. Proc. IEEE 68(4): 469-487. 244 references.

Suen, C. Y., Legault, R., Nadal, C., Cheriet, M. and Lam, L. (1993) Building a new generation of handwriting recognition systems. Pattern Recognition Letters 14: 303-315.

Taylor, I. and Taylor, M. M. (1983) The Psychology of Reading. New York Academic.

Teulings, H.-L. (1994) Invariant handwriting features useful in cursive-script recognition. In (Impedovo 1994), pp. 179-198.

Waibel, A. and Lee, K.-F. eds. (1990) Readings in Speech Recognition. Morgan Kaufmann.

Wang, C.-H. and Srihari, S. N. (1988) A framework for object recognition in a visually complex environment and its application to locating address blocks on mail pieces. International Journal of Computer Vision 2: 125-151.

Winston, P. H. (1984) Artificial Intelligence. Addison Wesley, second edition.

Woodland, P. C., Odell, J. J., Valtchev, V. V. and Young, S. J. (1994) Large vocabulary continuous speech recognition using HTK. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pp. 125-128.

Yamadori, A. (1975) Ideogram reading in alexia. Brain 98: 231-238.

Yanikoglu, B. A. and Sandon, P. A. (1993) Off-line cursive handwriting recognition using style parameters. Technical Report PCS-TR93-192, Dartmouth College, NH.

Yuille, A. L., Hallinan, P. W. and Cohen, D. S. (1992) Feature extraction from faces using deformable templates. International Journal of Computer Vision 8(2): 99-111.

Zhang, T. Y. and Suen, C. Y. (1984) A fast parallel algorithm for thinning digital pictures. Communications of the ACM 27(3): 236-239.