JOURNAL OF COMPUTERS, VOL. 5, NO. 5, MAY 2010

Handwritten Nushu Character Recognition Based on Hidden Markov Model

Jiangqing Wang, Rongbo Zhu
College of Computer Science, South-Central University for Nationalities, Wuhan 430074, China
Email: wjng000@yahoo.co.cn

Abstract: This paper proposes a statistical-structural character learning algorithm based on the hidden Markov model for handwritten Nushu character recognition. The stroke relationships of a Nushu character reflect its structure, which can be statistically represented by the hidden Markov model. Based on prior knowledge of character structures, we design an adaptive statistical-structural character learning algorithm that accounts for the most important stroke relationships and aims to improve the recognition rate by adaptively selecting the correct character according to the current handwritten character condition. We penalize the structurally mismatched stroke relationships using the prior clique potentials and derive the likelihood clique potentials from Gaussian mixture models. Theoretical analysis proves the convergence of the proposed algorithm. The experimental results show that the proposed method successfully detected and reflected the stroke relationships that seemed intuitively important. The overall recognition rate is 93.7 percent, which confirms the effectiveness of the proposed method.

Index Terms: character recognition, statistical-structural learning algorithm, Nushu character, hidden Markov models

I. INTRODUCTION

Nushu (female script) was derived from square Chinese characters, and its characters are variations of the latter [1]. Nushu was popular in the valley of the Xiaoshui River in Jiangyong County of Hunan Province and is still used by some elderly women today. Research shows that Nushu had more than 000 characters, among which 80% were created based on Chinese characters and only 20% were coinages of unknown origin. Its characters take the shape of a rhombus and are higher on the right and lower on the left. They are slender and beautiful, look like Jiaguwen (script on tortoise shells and animal bones) at first glance, and retain many familiar traces of Chinese characters [2].
Nushu is composed of very strange characters, which feature strange shapes, a strange way of marking, and strange social functions and history. Different from Chinese ideographic characters, Nushu characters each have a single syllable and indicate their sound.

Manuscript received January 2009; revised February 2009; accepted February 2009. Corresponding author: J. Wang, E-mail: wjng000@yahoo.co.cn

Nushu was the tool of cultural communication for local countryside women, especially middle-aged and elderly women. It played a unique social function and was basically used to create women's works and record women's songs. Nushu works were normally written on delicately made manuscripts, fans, handkerchiefs and pieces of paper. Nushu has academic value from the perspectives of philology, linguistics, sociology, ethnography and history. Therefore, scholars at home and abroad regard it as a wonderful discovery and a wonder in the history of Chinese characters.

A. Related Work

More than 80% of Nushu characters were created based on Chinese characters, and the original Nushu material was handwritten. Therefore, research on handwritten Nushu character recognition can adopt the scheme of Chinese character recognition, which is the most common approach. The Chinese character structure is hierarchical: many straight-line strokes constitute independent radicals, which in turn constitute characters [3, 4]. Generally, a recognizer represents and analyzes the character image by one of two kinds of method: the statistical method and the structural method [5, 6].

The statistical recognizer extracts the information of the character image into a feature vector by a feature extraction process. The feature vector does not represent the pen trajectories directly. However, it represents properties of the character image that reflect the character structure indirectly. With such a representation, the statistical recognizer models and analyzes the character using various kinds of statistical methodologies. Wu et al. [7] projected a two-dimensional (2-D) character image along the x and y directions.
The key features for coarse classification are the Fourier coefficients of the projected profiles. Tseng et al. [8] selected contour direction and crossing count as the features of their coarse classification method. Chang and Wang [9], inspired by Dr. W. Yun-Wu, used the peripheral shape coding technique to preclassify handwritten Chinese characters. They used ten categories of stroke patterns to code the four corners of a character. However, such systems focused only on the relationship between near or connected stroke pairs. As a result, they had difficulty representing the relationship between strokes far from each other. Moreover, they were not effective at representing the relationship among more than two strokes, because the interstroke feature is difficult to define for more than two strokes.

© 2010 ACADEMY PUBLISHER doi:10.4304/jcp.5.5.663-670

These systems also had a problem in combining the stroke matching scores with the matching scores of the stroke relationships. They computed the overall matching score by simply accumulating or multiplying all stroke matching scores and matching scores of the stroke relationships. As a result, the information about the stroke relationships was duplicated, because the matching scores of individual strokes also reflect some information about the stroke relationships.

The structural approach has high tolerance to non-structural distortions such as noise and writing style variations. Since Chinese characters are composed of strokes formed by line segments, most approaches use the geometrical and topological features of strokes as the recognition basis. In order to represent finer information, Liu et al. and Zhang and Xia categorized the strokes into several types, such as horizontal, vertical, slash, back-slash, dot, tick, and hook [10], [11]. The character model was composed of a set of model strokes, each of which belongs to one of the predefined types. By assigning different attributes to each of the stroke types, they represented the properties of the stroke more accurately. However, the performance of such systems depends heavily on the developer's knowledge, because the character models were not systematically trained but manually designed.

Kim and Kim represented the stroke by the distributions of its position, slope, and length [12]. Such statistical modeling is more systematic than the previous methods. Although the character structure was manually specified, as in the previous methods, the position and the shape of each stroke were statistically modeled, and their distributions were estimated from training samples. Statistical stroke modeling tolerates writing variation and is more robust than the heuristic-based method. Character recognition proceeds by finding the best structural match between the input strokes and the stroke models.
Compared with the statistical method, the structural method extracts feature points and line segments from character images and represents their spatial relationships by a relational graph, in which a node denotes a feature point or line segment and an edge between two nodes denotes their relationship (for example, the constraint graph model [10], the attributed relational graph [11], and the hierarchical random graph [12]). Despite the excellent descriptive ability for fine details of character structures, two major problems are yet to be solved. The first is the stroke extraction problem: because the strokes are often ambiguous and degraded, it is unclear how to extract the stable ones for modeling their spatial relationships. This problem becomes much more difficult if the thinning preprocessing techniques cause junction distortions in character skeletons [13]. The second problem is that the structural method usually depends heavily on the developer's heuristic knowledge [14, 15], which yields neither a rigorous matching algorithm nor an automatic learning scheme from training samples.

B. Motivation

Chinese character recognition, as well as Nushu character recognition, is acknowledged to be a very difficult problem in character recognition due to (1) a very large character set, (2) the high complexity of the characters, and (3) many similar character patterns. Both the statistical scheme and the structural scheme have their merits and demerits. Therefore, a hybrid statistical-structural method is necessary for modeling character structures and recognizing characters. Our approach can be considered a convergence of these two threads of research, and it improves on both in terms of overall recognition rate.

In this paper, we concentrate on the handwritten Nushu character recognition problem, on which little research has been done. A statistical-structural character learning algorithm based on the hidden Markov model is proposed to recognize handwritten Nushu characters. The stroke relationships of a Nushu character reflect its structure, which can be statistically represented by the hidden Markov model.
Based on prior knowledge of character structures, we design an adaptive statistical-structural character learning algorithm that accounts for the most important stroke relationships and aims to improve the recognition rate by adaptively selecting the correct character according to the current handwritten Nushu character condition. We penalize the structurally mismatched stroke relationships using the prior clique potentials and derive the likelihood clique potentials from Gaussian mixture models.

The rest of the paper is organized as follows. In Section II, the proposed Nushu character recognition scheme is presented in detail. Detailed experimental results are shown in Section III. Finally, the conclusion and future work are given in Section IV.

II. PROPOSED ALGORITHM

A. Statistical-Structural Character Modeling

In the proposed modeling method, a character is represented by a set of model strokes. The structure of the model strokes is manually designed. As shown in Fig. 1, a model stroke is composed of a poly-line connecting K feature points. In the handwriting process, a model stroke is instantiated into various shapes of input strokes, and therefore the feature points are instantiated into various pixels of the input strokes. In order to model such variation, each feature point is represented by a distribution of pixels. Because each pixel is identified by its position and direction, the feature point is represented by the distribution of these two quantities. Modeling the direction at the feature point in addition to the position captures more information.

Figure 1. Statistical stroke model of Nushu character.
Denote the set of links of a Nushu character by $Q$, and let $|Q|$ denote the cardinality of the set $Q$. We denote the time of a stroke crosspoint event by $T_{event}$. We refer to $n$ consecutive crosspoints as an $n$-graph. The special case of a single crosspoint is referred to as a unigraph. Two consecutive crosspoints are referred to as a digraph in the literature, a trigraph means three consecutive crosspoints, and so on. Given a sequence of consecutive crosspoints $S = \{s_1, s_2, \dots, s_N\}$, where $N$ is the number of crosspoints in the sequence, we have $N-n+1$ $n$-graphs. We define the durations of the $n$-graphs, $GD = \{d_1, d_2, \dots, d_{N-n+1}\}$, as follows:

$$d_i = T_{event}(s_{i+n-1}) - T_{event}(s_i). \tag{1}$$

The durations of the $n$-graphs are used as sequence features for further analysis in our proposed model. We make the natural assumption that the duration $y$ of an $n$-graph follows a Gaussian distribution such that:

$$P(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}} \tag{2}$$

where $\mu$ is the mean value of the duration $y$ for the $n$-graph and $\sigma$ is the standard deviation.

Since the behavioral characteristics of individuals can be influenced by many factors, the statistical analysis method used by previous work can be viewed as giving the same probability to all valid attempts whose digraph latencies and durations lie within the standard deviations of the mean durations. By using Gaussian modeling, we can give a higher probability to $n$-graph durations of test samples that are closer to the $n$-graph mean durations of the reference samples, and a lower probability to $n$-graph durations that are far from the mean. The reason is that an individual could temporarily be out of regular typing behavior, and we can accommodate the irregular behavior without discarding the possibility that the set of $n$-graph durations was provided by the corresponding individual.

We are unable to collect all the typing crosspoints of an individual and calculate the exact means and variances for each distinct combination of $n$-graph durations. We therefore have to deduce $\{(\mu, \sigma)\}$ of the $n$-graph durations, given a crosspoint sequence $S$, by maximum likelihood estimation of the parameters.
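As a minimal sketch of the duration features described above, the following Python fragment computes the duration of every run of n consecutive crosspoints and evaluates the Gaussian density of a duration. The function names and the time stamps are hypothetical and chosen only for illustration; the paper does not specify an implementation.

```python
import math


def ngraph_durations(event_times, n):
    """Return d_i = T(s_{i+n-1}) - T(s_i) for each run of n consecutive crosspoints."""
    if n < 1 or n > len(event_times):
        raise ValueError("n must satisfy 1 <= n <= len(event_times)")
    return [event_times[i + n - 1] - event_times[i]
            for i in range(len(event_times) - n + 1)]


def gaussian_pdf(y, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma at y."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical crosspoint time stamps (seconds) for one character.
times = [0.0, 0.12, 0.30, 0.41, 0.65]
durations = ngraph_durations(times, n=2)  # N - n + 1 = 4 digraph durations

# A duration near the reference mean scores higher than one far from it.
near = gaussian_pdf(0.15, mu=0.15, sigma=0.05)
far = gaussian_pdf(0.30, mu=0.15, sigma=0.05)
```

The comparison at the end illustrates the stated motivation: a duration close to the reference mean receives a higher density than a distant one, yet the distant one is not rejected outright.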
Fortunately, the maximum likelihood estimates of the parameters of a Gaussian distribution are the sample mean and the sample variance:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} d_i \tag{3}$$

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (d_i - \mu)^2 \tag{4}$$

where $m$ is the number of times the $n$-graph appears in $S$.

In order to mine the Nushu structural characters, the proposed HMM models sequential data, such as the sequence of crosspoints of Nushu characters and the handwritten character information that we take into consideration. We use the HMM to model the structural character information of the crosspoints and the structural character sequence. HMMs are a modeling technique derived from Markov models, which are stochastic processes whose output is a sequence of states corresponding to some physical event. In an HMM the observation is a probabilistic function of the state; i.e., the resulting model is a doubly embedded stochastic process with an underlying stochastic process that is not observable (it is hidden) but can only be observed through another set of stochastic processes that produce the sequence of observations.

Consider the HMM as a statistical graphical model in which each circle is a random variable. Unshaded circles $q_t$ represent the unknown (hidden) state variables we wish to infer, and shaded circles $y_t$ are the observed state variables, where $t$ is a specific point in time. $A$ is a state transition matrix holding the probabilities of transitioning from $q_t = i$ to $q_{t+1} = j$, where $i$ means the $i$-th state. So we have:

$$P(q_{t+1} = j \mid q_t = i) = A_{ij}. \tag{5}$$

$\eta$ is a state emission matrix holding the output probability $P(y_t \mid q_t = i)$ of the $i$-th state. $\pi$ is the initial state probability of the $i$-th state. The compact notation $\lambda = (A, \eta, \pi)$ is used to indicate the complete parameter set of the model. In our setting, we are given a crosspoint sequence $S$, its $n$-graphs $G$, and its $[n+1]$-graphs $G'$ such that:

$$S = \{s_1, s_2, \dots, s_N\} \tag{6}$$

$$G = \{g_1, g_2, \dots, g_{N-n+1}\} \tag{7}$$

$$G' = \{g'_1, g'_2, \dots, g'_{N-n}\} \tag{8}$$

The state transition matrix $A$ is given by the frequency with which each $[n+1]$-graph appears in $S$, as follows:
$$A_{g_t g_{t+1}} = \frac{|g'_t|}{N-n} \tag{9}$$

where $|g'_t|$ counts the occurrences in $S$ of the $[n+1]$-graph $g'_t$ formed by $g_t$ followed by $g_{t+1}$.

The state emission matrix $\eta$ is defined here as the Gaussian probability of the $n$-graphs $G = \{g_1, g_2, \dots, g_{N-n+1}\}$ with durations $GD = \{d(g_1), d(g_2), \dots, d(g_{N-n+1})\}$ as follows:

$$\eta_g(d(g')) = P(d(g') \mid g) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma_g}\, e^{-\frac{(d(g')-\mu_g)^2}{2\sigma_g^2}}, & g = g' \\ 0, & \text{otherwise} \end{cases} \tag{10}$$

There are three basic problems to solve with the HMM $\lambda = (A, \eta, \pi)$:

1) Given the model parameters $\lambda = (A, \eta, \pi)$ and an observation output sequence $O = O_1 O_2 \cdots O_t$, compute the probability $P(O \mid \lambda)$ of the observation output sequence.
2) Given the model parameters $\lambda = (A, \eta, \pi)$ and an observation output sequence $O = O_1 O_2 \cdots O_t$, find the most probable state sequence $Q = Q_1 Q_2 \cdots Q_t$ that could have generated the observation output sequence.
3) Given an observation output sequence $O = O_1 O_2 \cdots O_t$, generate an HMM $\lambda = (A, \eta, \pi)$ that maximizes $P(O \mid \lambda)$.

We make the assumption that each individual has his or her own HMM $\lambda = (A, \eta, \pi)$ for character crosspoints and character structural characteristics. The problem to solve is the following: given a crosspoint sequence and its character structural information, we have to choose, from a number of HMMs, the one with the highest probability of generating the crosspoint sequence $S$. Consequently, we first have to calculate the probability of the crosspoint sequence $S$ for each HMM. This is similar to the first basic problem described above, and we show how to solve it with the Forward algorithm.

The state probabilities $\alpha$ of each state can be computed by first calculating $\alpha$ for all states at $t = 1$:

$$\alpha_1(g_1) = \pi(g_1)\, \eta_{g_1}(d_1). \tag{11}$$

Then, for each subsequent time step $t$, the state probability $\alpha$ is calculated recursively for each state:

$$\alpha_{t+1}(g_{t+1}) = \alpha_t(g_t)\, A_{g_t g_{t+1}}\, \eta_{g_{t+1}}(d_{t+1}). \tag{12}$$

Finally, with $T = N-n+1$ denoting the number of $n$-graphs, the probability of the crosspoint sequence $S$ given an HMM $\lambda = (A, \eta, \pi)$ is:

$$P(S, G, GD \mid \lambda) = \alpha_T(g_T) = \alpha_{T-1}(g_{T-1})\, A_{g_{T-1} g_T}\, \eta_{g_T}(d_T). \tag{13}$$

The emission probabilities take little computation to obtain, since we use the Gaussian distribution to model the observed states.
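The recursion above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' Matlab implementation: n-graphs are represented by hypothetical string labels, the computation is carried out in the log domain to avoid underflow, a small variance floor is added for numerical safety, and any n-graph or transition not seen in the reference data is given probability zero.

```python
import math
from collections import Counter, defaultdict


def fit_gaussians(graphs, durations):
    """Sample mean and ML variance of the duration of each distinct n-graph."""
    grouped = defaultdict(list)
    for g, d in zip(graphs, durations):
        grouped[g].append(d)
    params = {}
    for g, ds in grouped.items():
        mu = sum(ds) / len(ds)
        var = sum((d - mu) ** 2 for d in ds) / len(ds)
        params[g] = (mu, max(var, 1e-6))  # variance floor is an assumption
    return params


def fit_transitions(graphs):
    """Relative frequency of each consecutive pair (g_t, g_{t+1}) in the sequence."""
    pairs = Counter(zip(graphs, graphs[1:]))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}


def log_gaussian(d, mu, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (d - mu) ** 2 / (2.0 * var)


def log_forward(graphs, durations, pi, trans, gauss):
    """log P(S, G, GD | lambda); one known state per step, so no sum over states."""
    log_alpha = 0.0
    prev = None
    for g, d in zip(graphs, durations):
        prior = pi.get(g, 0.0) if prev is None else trans.get((prev, g), 0.0)
        if prior <= 0.0 or g not in gauss:
            return float("-inf")
        mu, var = gauss[g]
        log_alpha += math.log(prior) + log_gaussian(d, mu, var)
        prev = g
    return log_alpha


# Hypothetical reference data: n-graph labels and their durations.
ref_graphs = ["ab", "bc", "ab", "bc"]
ref_durations = [0.10, 0.20, 0.12, 0.18]
gauss = fit_gaussians(ref_graphs, ref_durations)
trans = fit_transitions(ref_graphs)
pi = {"ab": 1.0}

near = log_forward(["ab", "bc"], [0.11, 0.19], pi, trans, gauss)
far = log_forward(["ab", "bc"], [0.30, 0.40], pi, trans, gauss)
```

Because each observation is tied to a single known n-graph, the loop performs one multiplication per time step instead of summing over all states, which is the simplification the text attributes to the modified Forward algorithm.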
Additionally, the observed states are connected only to the corresponding unknown states, because we know the exact combination of $n$-graphs the individual typed. So the summation of all partial probabilities of the state at time $t$ is ignored and only one probability is calculated. In the original version of the Forward algorithm, the computation of $\alpha_t(j)$, $t \in [1..T]$, $j \in [1..N]$, where $T$ is the number of observations in the sequence and $N$ is the number of states in the model, requires $O(N^2 T)$ calculations. Our modified version of the Forward algorithm requires only $O(NT)$ calculations.

B. Structural Learning

In the character building and extraction module, we first have to build the reference character for each Nushu character. This requires the user to provide reference samples or a handwritten profile. The more reference samples or histories are provided, the more exactly the parameters can be extracted. After collecting a sufficient number of reference samples, we use maximum likelihood estimation for Gaussian modeling to calculate the parameters of each $n$-graph duration. We also have to compute the transition probability matrix and the initial probability vector of the HMM. The parameters calculated for the HMM are then treated as the base element of the reference profile for each user.

The feature building and extraction module extracts two observation sequences based on a sliding-window approach. One observation sequence is extracted in the horizontal direction, representing column observations, and the other is extracted in the vertical direction, representing row observations. Each discrete observation represents a multidimensional feature vector, which is mapped by means of vector quantization (VQ). The multidimensional feature vector combines both foreground and background information. The foreground features represent local information about the writing observed from background-foreground transitions. The other two features represent a global point of view about the writing in the frame from which they are extracted.
The background features are based on a configuration chain code representing concavity information.

The learning algorithm includes three steps: setting up Nushu prototypes, initializing Nushu parameters, and estimating the HMM parameters. First, we set up Nushu prototypes for each category of characters using the observation from a well-segmented standard character, where the number of sites I of the standard characters equals the number of labels J of the Nushu for each category.

The proposed adaptive learning algorithm maintains a control probability vector to select an accurate character
among a set of characters at time $n$. A good policy for updating the probability vector is a pursuit algorithm that always rewards the action with the current minimum penalty estimate; such stochastic learning control performs well in speed of convergence. In this system the probability vector is the selection probability vector $p(n) = [p_1, \dots, p_K]$, where $n$ is the index of the sequence of structural characters. The error of the $n$th handwritten character is expressed by $\gamma$. The set of characters available is $\{R_i : i = 1, \dots, K\}$. At the beginning, the entries of $p(n)$ are assigned equal values:

$$p(0) = [1/K, \dots, 1/K]. \tag{14}$$

In order to maximize the likelihood, the recognition algorithm is required to find the index of the best character. Such an approach requires knowledge of the handwritten state during each recognition. The stochastic learning algorithm presented in this paper randomly selects a character. The character selection probability vector is altered by an iterative updating process that maximizes the probability of maximizing the character recognition rate. The character recognition proceeds with a fixed $p(n)$ until every character has been selected at least $M$ times, after which $p(n)$ is updated at each $n$. Following each recognition period, $S_i(n)$ and $p(n)$ are updated considering the last $M$ recognition signals of each character. Then we get:

$$S_i(n) = \frac{R_i}{M} \sum_{j=L_i(n)-M+1}^{L_i(n)} I_i(j) \tag{15}$$

where $I_i(j)$ is an indicator function:

$$I_i(j) = \begin{cases} 1, & \text{if the recognition is correct} \\ 0, & \text{otherwise} \end{cases} \tag{16}$$

and $L_i(n)$ is the number of recognition periods in which the character structure $R_i$ has been selected up to the $n$th recognition period.

The structural learning algorithm can be summarized as follows:

Step 1. If it is the first recognition period, initialize the probability vector as in equation (14). Otherwise, select a character structure $R_i$ ($i \in [1..K]$) according to the probability distribution $p(n)$.
Step 2. Update $I_i(j)$ and $L_i(n)$. Then update $S_i(n)$ according to (15).
Step 3. If $L_i(n) \ge M$ for all $i$, go to the next step; otherwise go to Step 1.
Step 4. Detect the index $m$ of the estimated best character structure and update $p(n)$ according to the following equations:

$$p_j(n+1) = \max\{p_j(n) - \Delta p,\ 0\}, \quad \forall j \ne m; \qquad p_m(n+1) = 1 - \sum_{j=1,\, j \ne m}^{K} p_j(n+1) \tag{17}$$

where $\Delta p$ is a tunable penalty probability parameter.

The structural learning algorithm finds the index $m$ of the estimated best character $R_m$, maximizing the likelihood $S$ at time $n$:

$$m = \arg\max_i \Big\{ R_i \sum_{k=L_i-M+1}^{L_i} I_i(k) \Big\}. \tag{18}$$

To obtain the probability of making the right decision, let the best recognition rate at time $n$, $R_{m(n)}$, be unique. Let $\varphi(n)$ be the probability that the estimated best recognition rate is the actual best rate. Then we get:

$$\varphi(n) = \Pr\Big\{ R_i \sum_{k=L_i(n)-M+1}^{L_i(n)} I_i(k) < R_{m(n)} \sum_{k=L_{m}(n)-M+1}^{L_{m}(n)} I_{m(n)}(k),\ \forall i \ne m(n) \Big\}. \tag{19}$$

The above probability is readily obtained by using the binomial probability distribution. Since $0 \le \sum_{k=L_i-M+1}^{L_i} I_i(k) \le M$ for all $i \in [1..K]$, when $R_m > R_i$ the quantity $\frac{R_m}{R_i} \sum_{k=L_m-M+1}^{L_m} I_m(k)$ may exceed $M$. Let us take into account the fact that $\sum_{k=L_i-M+1}^{L_i} I_i(k) < \frac{R_m}{R_i} \sum_{k=L_m-M+1}^{L_m} I_m(k)$ in such cases.

Let $\varepsilon_i$ be the largest nonnegative integer less than $\alpha (R_m / R_i)$, where $\alpha$ is a nonnegative integer. Define the indicator function $O(\cdot)$, whose value is 1 when the condition within the parentheses is satisfied and 0 otherwise. Define the parameter $\theta_i$ for $i \in [1..K]$:

$$\theta_i = \varepsilon_i\, O(\varepsilon_i \le M) + M\, O(\varepsilon_i > M). \tag{20}$$

Then we have:

$$\varphi = \sum_{\alpha=1}^{M} \Big[ \Pr\Big\{ \sum_{k=L_m(n)-M+1}^{L_m(n)} I_m(k) = \alpha \Big\} \prod_{i=1,\, i \ne m}^{K} \sum_{\beta=0}^{\theta_i} \Pr\Big\{ \sum_{k=L_i(n)-M+1}^{L_i(n)} I_i(k) = \beta \Big\} \Big]. \tag{21}$$

Considering that

$$\theta_i = M, \quad \forall\, \varepsilon_i \ge M, \tag{22}$$

we can get:

$$\varphi = \sum_{\alpha=1}^{M} \binom{M}{\alpha} Q_m^{\alpha} (1-Q_m)^{M-\alpha} \prod_{i=1,\, i \ne m}^{K} \sum_{\beta=0}^{\theta_i} \binom{M}{\beta} Q_i^{\beta} (1-Q_i)^{M-\beta}. \tag{23}$$
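The selection-probability update of Steps 1 to 4 can be sketched in Python as a discretized pursuit scheme. The sketch makes several assumptions that the text does not state: each $R_i$ is treated as a numeric weight, recognition outcomes are simulated as Bernoulli trials with hypothetical success probabilities, probabilities are clipped at zero when the penalty is applied, and the names `pursuit_learning`, `weights`, and `success_prob` are illustrative only.

```python
import random


def pursuit_learning(weights, success_prob, M=50, delta_p=0.05, periods=2000, seed=0):
    """Discretized pursuit update of the selection probability vector p(n)."""
    rng = random.Random(seed)
    K = len(weights)
    p = [1.0 / K] * K                      # equal initial probabilities
    history = [[] for _ in range(K)]       # recognition indicators I_i(j)
    for _ in range(periods):
        # Step 1: select a character structure according to p(n).
        i = rng.choices(range(K), weights=p)[0]
        # Step 2: record the (simulated) recognition outcome for structure i.
        history[i].append(1 if rng.random() < success_prob[i] else 0)
        # Step 3: keep p fixed until every structure has M outcomes.
        if min(len(h) for h in history) < M:
            continue
        # Estimate S_i(n) from the last M outcomes of each structure.
        S = [weights[k] * sum(history[k][-M:]) / M for k in range(K)]
        # Step 4: reward the estimated best structure, penalize the others.
        m = max(range(K), key=S.__getitem__)
        for j in range(K):
            if j != m:
                p[j] = max(p[j] - delta_p, 0.0)
        p[m] = 1.0 - sum(p[j] for j in range(K) if j != m)
    return p


# Hypothetical setting: three candidate structures, the second is clearly best.
p_final = pursuit_learning([1.0, 1.0, 1.0], [0.1, 0.95, 0.2])
```

In this toy setting the probability mass moves to the structure with the highest estimated success rate once every structure has accumulated $M$ outcomes; a larger $M$ makes a wrong early lock-in less likely at the cost of a longer exploration phase.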
In (23), $Q_i$ is the probability of successful recognition of a Nushu character using the character structure $R_i$.

Since the main objective of this work was to speed up the learning process, the proposed structural learning algorithm works by dividing an off-line training database into smaller blocks. Each iteration of the algorithm processes a different block of data. Thus, given an initial HMM and a block of data drawn from the training set, the algorithm proceeds as described above.

HMMs are able to perform recognition tasks in pattern recognition systems. The most popular approach for such tasks consists of creating a set of HMMs so that each class is represented by an independent HMM. The classification of an unknown observation sequence $S = \{s_1, s_2, \dots, s_N\}$ into a class can be carried out by computing which HMM outputs the highest likelihood for the sequence. In detail, consider a classification problem in which each class is represented by a single HMM. The likelihood can be easily computed by the forward-backward procedure.

C. Adjusting and Recognition

In the adjusting module, given a crosspoint sequence $S$ with claimed identity $U$, we wish to examine the possibility that $S$ was generated by $U$. First, we transform the crosspoint sequence $S$ into $n$-graph combinations $G$ and calculate the character structural information of the $n$-graph durations as usual. At this moment we have $S = \{s_1, s_2, \dots, s_N\}$, $G = \{g_1, g_2, \dots, g_{N-n+1}\}$ and $GD = \{d_1, d_2, \dots, d_{N-n+1}\}$. Now we produce a vector $V$ such that:

$$V = \{\mu_{g_1} + \varepsilon\sigma_{g_1},\ \mu_{g_2} + \varepsilon\sigma_{g_2},\ \dots,\ \mu_{g_{N-n+1}} + \varepsilon\sigma_{g_{N-n+1}}\} \tag{24}$$

where $\varepsilon$ is the weighting factor, $\mu_{g_i}$ is the duration mean of $n$-graph $g_i$, and $\sigma_{g_i}$ is the duration standard deviation of $n$-graph $g_i$. $V$ is the $n$-graph duration vector used to evaluate the threshold value of the probability produced by the proposed modified forward algorithm.

Figure 2. Nushu character adjusting and recognition.

With the inputs $GD$, $V$ and $\lambda_U$, we can apply the proposed forward algorithm mentioned above to obtain two probability values, $P(S, G, GD \mid \lambda_U)$ and $P(S, G, V \mid \lambda_U)$. $P(S, G, V \mid \lambda_U)$ can be viewed as the probability obtained if all the $n$-graph durations in $G$ deviate from the mean duration $\mu$ by $\varepsilon$ times the standard deviation $\sigma$. It is the threshold value of probability: the crosspoint sequence $S$ is accepted as confirmed if the following expression is true:

$$P(S, G, GD \mid \lambda_U) \ge P(S, G, V \mid \lambda_U). \tag{25}$$

The weighting factor $\varepsilon$ can be specified according to the required level of security strength.

In the identification procedure, we are given a crosspoint sequence $S = \{s_1, s_2, \dots, s_N\}$ from an individual and a set of HMMs $\{\lambda_1, \lambda_2, \dots, \lambda_l\}$, where $l$ is the number of HMMs. The problem is to choose the HMM that most probably generated $S$, or to determine that no such HMM exists. In the beginning, the crosspoint sequence is transformed into $n$-graph combinations $G = \{g_1, g_2, \dots, g_{N-n+1}\}$, and the timing information of the $n$-graph durations $GD = \{d_1, d_2, \dots, d_{N-n+1}\}$ is calculated. $P(S, G, GD \mid \lambda_j)$ is produced for each HMM $\lambda_j$ by the proposed forward algorithm. We select the user $U$ with the maximum probability over the others:

$$P(S, G, GD \mid \lambda_U) = \max_j P(S, G, GD \mid \lambda_j), \quad j \in [1..l]. \tag{26}$$

After that, we produce a vector $V$ for user $U$ such that

$$V = \{\mu_{g_1} + \varepsilon\sigma_{g_1},\ \mu_{g_2} + \varepsilon\sigma_{g_2},\ \dots,\ \mu_{g_{N-n+1}} + \varepsilon\sigma_{g_{N-n+1}}\} \tag{27}$$

where $\varepsilon$ is the weighting factor, $\mu_{g_i}$ is the duration mean of $n$-graph $g_i$, and $\sigma_{g_i}$ is the duration standard deviation of $n$-graph $g_i$. Again we apply the proposed forward algorithm to obtain the two probability values $P(S, G, GD \mid \lambda_U)$ and $P(S, G, V \mid \lambda_U)$. If $P(S, G, GD \mid \lambda_U) \ge P(S, G, V \mid \lambda_U)$, the crosspoint sequence is confirmed as generated by user $U$. Otherwise, we consider that the crosspoint sequence was not generated by any user in the user profile database.

III. EXPERIMENTAL RESULTS

We evaluated the proposed algorithm on the Nushu database, which has 783 classes with 00 samples for each class. Fig. 3 shows some typical samples in the Nushu database.

Figure 3. Samples in Nushu database.
We used a Matlab implementation on a PC with a .4 GHz CPU and GB of memory. The average time for preprocessing is 0.00 seconds, and the stroke extraction (0.03 seconds) and the learning algorithm (0.0 seconds) consume a total of 0.04 seconds in the connected neighborhood system per character image. Although the structural match with one character model is efficient, requiring less than a second in our implementation, in practice we have to repeat the structural match with all categories of character models, such as the 783 categories in the Nushu database, to recognize one input character image.

When the number of categories increases, the total time cost to recognize one character image increases. Currently there are two commonly adopted strategies to expedite the recognition process. The first simultaneously uses several computers to perform the structural match with all the character models in parallel. The second is the hierarchical classification system, which uses a fast algorithm to select a few candidate character models and then performs the structural match between the input strokes and these models to determine the best one.

Figure 4. The stroke extraction and structural matching result of the proposed algorithm.

Fig. 4 shows the proposed stroke extraction and the structural matching results. The first column shows the input character. The second column shows the slant and moment normalization of the character skeleton. The third column shows the proposed character models, where the labels are numbered and the adjusting and recognition functions are performed. The structural learning algorithm assigns the best labels to the extracted candidate strokes.

We compared our method with the SCSM [3] and the attributed relational graph MBSEM [10]. The recognition rates of the different schemes are shown in Fig. 5.

Figure 5. Recognition rate (%) of different schemes (SCSM, MBSEM, proposed scheme) versus the number of training samples.
The SCSM used the first 000 odd-numbered samples of each category for training and the first 000 even-numbered samples for testing on the Nushu database. By handling degraded regions, the baseline recognition rate was 90.45 percent. For MBSEM, the recognition rate varies as the number of training samples increases. The reason is that MBSEM cannot recognize the handwritten Nushu characters even when the number of training samples increases. For the proposed scheme, the recognition rate increases with the number of training samples, which indicates that the proposed character structural learning algorithm can increase the recognition rate. The recognition rate of the proposed scheme is 3.7% and 4.9% higher than those of SCSM and MBSEM, respectively. The reason is that the proposed scheme takes not only the character structures but also the statistical-structural character into consideration, and the adaptive character structure learning algorithm sustains the recognition rate. The HMM-based statistical-structural character modeling also faithfully depicts the Nushu character structure.

IV. CONCLUSION AND FUTURE WORK

A statistical-structural character learning algorithm based on the hidden Markov model is proposed to recognize handwritten Nushu characters. The approach is a convergence of the statistical and structural threads of research, and it greatly improves on both in terms of overall recognition rate. The stroke relationships of a Nushu character reflect its structure, which can be statistically represented by the hidden Markov model. Based on prior knowledge of character structures, an adaptive statistical-structural character learning algorithm accounts for the most important stroke relationships and aims to improve the recognition rate by adaptively selecting the correct character according to the current handwritten Nushu character condition. The experimental results and the comparisons with other methods show that the proposed method successfully detected and reflected the stroke relationships that seemed intuitively important.
The overall recognition rate is 93.7 percent, which is higher than those of the other schemes. As a future research challenge, we will investigate how to decrease the use of external knowledge in the proposed algorithm. For example, k-fold cross-validation would be useful to determine the number of iterations needed to train each block of data. Furthermore, topology learning could be employed to determine the best HMM topology.
ACKNOWLEDGMENT

This work is supported by the Natural Science Foundation of China (No. 6084004) and the Natural Science Foundation of State Ethnic Affairs Commission (No. 08ZN0). The authors are grateful to the anonymous reviewers who made constructive comments.

REFERENCES

[1] Z.-B. Gong, "New Findings about Female Scripts," Journal of South-Central University for Nationalities, Vol. 3, No. 4, pp. 93-97, 2003.
[2] Z.-B. Gong, "Nushu in Jiangyong Is Absolutely Not Ancient Characters during pre-Qin Day," Journal of South-Central University for Nationalities, Vol. , No. 6, pp. 30-33, 00.
[3] I.-J. Kim, J.-H. Kim, "Statistical character structure modeling and its application to handwritten Chinese character recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5, pp. 4-436, 2003.
[4] R. M. Suresh, S. Arumugam, "Fuzzy technique based recognition of handwritten characters," Image and Vision Computing, v 5, pp. 30-39, 2007.
[5] R. Zhang, X. Ding, H. L. Liu, "Discriminative training based quadratic classifier for handwritten character recognition," International Journal of Pattern Recognition and Artificial Intelligence, n 6, pp. 035-046, 2007.
[6] M. F. Zafar, O. Dzulkifli, "Writer independent online handwritten character recognition using a simple approach," Information Technology Journal, v 5, n 3, pp. 476-484, 2006.
[7] W. W. Lin, "Recognition of handwritten Chinese characters by feature matching," in Proc. 99 Int. Conf. Computer Processing of Chinese and Oriental Languages, 99, pp. 54-57.
[8] Y. L. Wu, T. M. Wu, and B. S. Jeng, "Optical Chinese character recognition using a projection profile and the Fourier transformation," J. Telecommun. Lab. Technique, vol. 0, pp. 37-45, 1990.
[9] H. D. Chang and J. F. Wang, "Preclassification for handwritten Chinese character recognition by a peripheral shape coding method," Pattern Recognition, vol. 6, pp. 7-79, 1993.
[10] C.L. Liu, I.J. Kim, and J.H. Kim, "Model-Based Stroke Extraction and Matching by Heuristic Search for Handwritten Chinese Character Recognition," Proc. Sixth Int'l Workshop Frontiers in Handwritten Recognition, pp. 547-556, 1998.
[11] X. Zhang and Y. Xia, "The Automatic Recognition of Handprinted Chinese Characters: A Method of Extracting an Order Sequence of Strokes," Pattern Recognition Letters, no. 4, pp. 59-65, 1983.
[12] H.Y. Kim and J.H. Kim, "Hierarchical Random Graph Representation of Handwritten Characters and Its Application to Hangul Recognition," Pattern Recognition, vol. 34, pp. 87-0, 00.
[13] I.-J. Kim and J.-H. Kim, "Statistical Character Structure Modeling and Its Application to Handwritten Chinese Character Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 5, pp. 4-436, Nov. 2003.
[14] M. Aksela, J. Laaksonen, "Adaptive combination of adaptive classifiers for handwritten character recognition," Pattern Recognition Letters, v 8, pp. 36-43, 2007.
[15] P. M. Patil, T. R. Sontakke, "Rotation, scale and translation invariant handwritten Devanagari numeral character recognition using general fuzzy neural network," Pattern Recognition, v 40, n 7, pp. 0-7, 2007.

Jiangqing Wang received the B.S. and M.S. degrees in Artificial Intelligence from Wuhan University, China, in 1986 and 1986, respectively, and the Ph.D. degree in intelligent computation from Wuhan University, China, in 2007. She was a visiting professor at the University of Wisconsin-La Crosse and Chonbuk National University. She is currently a Professor in the College of Computer Science of South-Central University for Nationalities. She has published over 40 papers in international journals and conferences in the areas of artificial intelligence and intelligent computation. Her current research interests are in the areas of character recognition, intelligent computation and optimization. Her research activities have been supported by the Natural Science Foundation of China, the Natural Science Foundation of State Ethnic Affairs Commission and the Natural Science Foundation of Hubei Province. Dr. Wang has been actively involved in around 0 international conferences, serving as Session Chair and as a reviewer for numerous refereed journals and many international conferences.

Rongbo Zhu received the B.S. and M.S. degrees in Electronic and Information Engineering from Wuhan University of Technology, China, in 2000 and 2003, respectively, and the Ph.D. degree in communication and information systems from Shanghai Jiao Tong University, China, in 2006. He is currently an Associate Professor in the College of Computer Science of South-Central University for Nationalities. He has published over 40 papers in international journals and conferences in the areas of wireless communications, covering 3G mobile systems and beyond, MAC and routing protocols, and wireless ad hoc, sensor, and mesh networks. He received the Outstanding B.S. Thesis and M.S. Thesis awards from Wuhan University of Technology in 2000 and 2003, respectively. His current research interests are in the areas of wireless communications, protocol design and optimization. His research activities have been supported by the Natural Science Foundation of Hubei Province and the Natural Science Foundation of South-Central University for Nationalities. Dr. Zhu has been actively involved in around 0 international conferences, serving as Session Chair of the Intelligent Networks Track at LSMS'07 and as a reviewer for numerous refereed journals, such as IEEE Communication Letters and Wiley Wireless Communications and Mobile Computing, and for many international conferences, such as IEEE Globecom'08, IEEE ICC'07, IET CCWMSN'07, and ISICA'07.