Investigation of Normalization Techniques and Their Impact on a Recognition Rate in Handwritten Numeral Recognition

SCHEDAE INFORMATICAE, VOLUME 19, 2010

WIESŁAW CHMIELNICKI (1), KATARZYNA STĄPOR (2)

(1) Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Reymonta 4, 30-059 Kraków, e-mail: wieslaw.chmielnicki@uj.edu.pl
(2) Institute of Computer Science, Silesian Technical University, Akademicka 16, 44-100 Gliwice

Abstract. This paper presents several normalization techniques used in handwritten numeral recognition and their impact on recognition rates. Experiments with five different feature vectors based on geometric invariants, Zernike moments and gradient features are conducted. The recognition rates obtained using a combination of these methods with gradient features and the SVM-rbf classifier are comparable to the best state-of-the-art techniques.

Keywords: handwritten numeral recognition, normalization techniques, SVM classifier, feature vectors, OCR, geometric invariants, Zernike moments, gradient features.

1. Introduction

The recognition of handwritten numerals by computer has been a subject of intensive research for about the last fifty years. The problem, which is very simple for almost every human, is extremely complicated for a machine. Hundreds of scientists have developed many sophisticated systems, but computers are still unable to compete with human capabilities. The main reason why the recognition of characters and numerals is such a complex problem is the variety of writing styles. Each character or numeral can look different depending on the writer, light or heavy strokes, varying levels of care, etc.

A typical OCR system usually consists of three main processing stages: preprocessing, feature extraction/selection and recognition using a classifier.

Preprocessing is a very important factor for the next two stages: feature extraction and classification. It consists of a sequence of operations applied to the images to prepare them for feature extraction. Usually we start with noise removal and/or smoothing [40], document skew correction [41], normalization [2] and slant correction [42]. Depending on the feature extraction method, additional preprocessing steps might be required, such as thinning [43] or contour analysis.

Feature extraction covers a large variety of techniques which allow us to represent an image as a vector of values (features). These features can be based, for example, on geometric moment invariants [1], Zernike moments [7] or gradients [9, 10]. At this stage a feature selection algorithm can be applied to reduce the size of the input feature vector and so avoid the so-called curse of dimensionality.

There is also a large number of classifiers, ranging from parametric statistical classifiers through neural networks and SVMs (Support Vector Machines) to hybrid classifiers. At each stage we can choose parameters which could affect the final classification performance.

This paper focuses on the recognition of unconstrained handwritten numerals, and especially on the normalization techniques. It shows the impact of image normalization on the classification performance.
There are several normalization methods (described in [2]) which have been implemented for use in this research. The experiments with these methods show how different normalization algorithms influence the final classification performance. Five different feature vectors of three types (geometric invariants, Zernike moments and gradient features) are used in the experiments. The influence of normalization on each of these vectors is shown. In this research the SVM classifier based on the statistical learning theory of Vapnik [25] was used.

The MNIST (modified NIST) digit database was used in the experiments. The NIST database was collected from specially designed forms filled in by US Census Bureau employees and high school students. The numbers of training samples and test samples are 60 000 and 10 000, respectively. This database is widely used in character recognition research.

In recent years many feature extraction methods for handwritten numeral recognition have been proposed. A survey of these methods can be found in [50]. Additionally, some new features such as stroke features [20], curvature features [10], local structure features [21] or structural and concavity features [44] are used to enhance the recognition rate.

There are also many approaches to the classification of handwritten characters, such as statistical techniques [22], neural networks and Support Vector Machines (SVMs) [23]. The statistical classifiers in use, parametric or non-parametric, include the linear discriminant function (LDF), the quadratic discriminant function (QDF), the nearest-neighbour (1-NN) and k-NN classifiers, the Parzen window classifier, etc. They can be modified: for example, a modified quadratic discriminant function was proposed by Kimura et al. [24, 26]. Neural networks include the multilayer perceptron (MLP), the radial basis function (RBF) network and the polynomial classifier. Several of these classifiers have been evaluated in [27].

There are three main databases used in handwritten character recognition: CENPARMI [16], CEDAR [28] and MNIST [29]. They have been widely used for validating the recognition performance. The CENPARMI digit database was released by Concordia University. It contains 6000 digits divided into a training set (4000 images, 400 samples per class) and a test set (2000 images, 200 samples per class). The CEDAR digit database was released by CEDAR, SUNY Buffalo. Its training data set contains 18 468 digit images and its test data set contains 2711 digit images. The last database, MNIST (modified NIST), was extracted from the NIST special databases SD3 and SD7. In this database the sizes of the training and test sets are 60 000 and 10 000 images, respectively.

Below, the results achieved on these databases are presented. The following tables show the best recognition rates obtained on the corresponding databases. The recognition rate is defined as the number of correctly recognized images divided by the total number of images in the test dataset.

Tab. 1. Results on CENPARMI database

Method              Recognition rate
Franke [33]         97.60%
Suen et al. [30]    98.85%
Liu et al. [31]     98.45%
Gader et al. [34]   98.30%
Liu et al. [2]      99.15%

The results achieved using the CENPARMI database are presented in Tab. 1. The recognition rates vary from 97.6% to 99.15%. Some of these results have been obtained using multiple classifiers, for example by Suen et al. [30], but an even better result of 99.15% can be achieved using a single classifier, as in [2].

Tab. 2. Results on CEDAR database

Method              Recognition rate
Suen et al. [30]    99.77%
Liu et al. [8]      98.87%
Cai et al. [35]     98.4%
Oh et al. [36]      98.73%

The results obtained using the CEDAR database are presented in Tab. 2. The recognition rates are in the range from 98.4% to 99.77%. The best result was achieved by combining multiple classifiers in [30]; the best result using a single classifier was 98.87% [8].

The third database, MNIST, is the most widely used for the evaluation of recognition algorithms. Some results on this database are presented in Tab. 3. Here the recognition rates vary from 98.3% to 99.41%. The highest accuracy was reported by Teow and Loe using SVC on direction and stroke-end features [8]. The same result of 99.41% was achieved by Liu et al. [2] on the NIST database (MNIST is a modified NIST).

Tab. 3. Results on MNIST database

Method                 Recognition rate
Teow et al. [8]        99.41%
Mayraz et al. [37]     98.3%
Dong et al. [38]       99.01%
Belongie et al. [39]   99.37%
Liu et al. [2]*        99.41%

* Result obtained on the NIST database.

The rest of this paper is organized as follows. Section 2 introduces the normalization strategies and Section 3 describes the feature extraction methods used in this paper. Section 4 briefly describes the foundations of the SVM classifier. Section 5 presents the experimental results, while Section 6 gives conclusions and future work.

2. Normalization techniques

Normalization is a process that changes different image parameters to obtain more convenient values. Normalization techniques are generally used to reduce the within-class variation. For example, intensity normalization tries to equalize the intensity of the character images, the perspective transformation may correct the imbalance of the character width [32], moment normalization tries to correct the rotation or slant [64], and ratio normalization changes the character aspect ratio. Normalization is considered to be the most important preprocessing factor for character recognition [55].

In most experiments concerning feature extraction and classification of characters, a square standard plane with fixed dimensions N × N is used. All original character images, usually of different sizes, are mapped onto this plane. From this standard plane a feature vector is extracted. The algorithms described below are used to perform these mappings.

Let W1 and H1 denote the width and the height of the original character image, respectively. Then the aspect ratio of the original image is defined as:

$$R_1 = \frac{\min(W_1, H_1)}{\max(W_1, H_1)}. \tag{1}$$

Similarly, let W2 and H2 denote the width and the height of the normalized image, respectively. Then the aspect ratio of the normalized image is defined as:

$$R_2 = \frac{\min(W_2, H_2)}{\max(W_2, H_2)}. \tag{2}$$

Both R1 and R2 are in the range (0, 1], as follows from Eqs (1) and (2).

In the described experiments an aspect ratio adaptive normalization (ARAN) strategy [2] is used. In this strategy the normalized aspect ratio R2 is calculated from the original aspect ratio R1 using different mapping functions. Using ARAN normalization, a character image is fitted into a new normalized plane of size H2 × W2, and then this plane is shifted to overlap the standard plane. The dimensions of the normalized plane are calculated as follows. It is assumed that one dimension of the normalized plane fills one dimension of the standard plane: N = max(H2, W2). The other dimension is calculated using the aspect ratio R2 and then centred on the standard plane (see Fig. 1).
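The computation of the normalized plane dimensions described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the rounding of the shorter side to whole pixels, the default N = 32 and the square-root mapping in the usage lines are assumptions made here, not details taken from the paper's implementation.

```python
import math

def aspect_ratio(width, height):
    """Aspect ratio as in Eqs (1) and (2): shorter side over longer side."""
    return min(width, height) / max(width, height)

def normalized_plane_size(w1, h1, ratio_map, n=32):
    """Return (W2, H2) for an original image of size W1 x H1.

    ratio_map maps R1 to R2. The longer side fills the standard plane,
    so N = max(H2, W2); the shorter side is N * R2, rounded to whole
    pixels (the rounding rule is an assumption of this sketch).
    """
    r2 = ratio_map(aspect_ratio(w1, h1))
    short = max(1, round(n * r2))
    return (n, short) if w1 >= h1 else (short, n)

def centring_offset(w2, h2, n=32):
    """Top-left corner of the normalized plane when centred on the N x N plane."""
    return (n - w2) // 2, (n - h2) // 2

# Usage: a tall 10 x 20 digit with a square-root ratio mapping (illustrative).
w2, h2 = normalized_plane_size(10, 20, math.sqrt)
print(w2, h2, centring_offset(w2, h2))
```

Passing the mapping function as an argument keeps the size computation independent of the particular relation between R1 and R2.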
The transformation to the standard plane can be described using a coordinate mapping. Let us denote the original image as f(x, y) and the normalized image as g(x', y'). Then the normalized image can be generated by the coordinate mapping g(x', y') = f(x, y). We can use the forward mapping or the backward mapping. These mappings are given by:

$$x' = x'(x, y), \quad y' = y'(x, y)$$

and

$$x = x(x', y'), \quad y = y(x', y'),$$

respectively, where x', y' are the coordinates of the normalized image and x, y are the coordinates of the original image. The mappings used in the experiments are presented in Tab. 4.

Fig. 1. (a) An original image, (b) a normalized image on the standard plane

When the forward mapping is used, x and y are discrete, but x' and y' can be real values. Similarly, when the backward mapping is used, x' and y' are discrete, but x and y can be real values. Moreover, in the forward mapping, the mapped coordinates x', y' usually do not fill all pixels in the normalized plane. So coordinate discretization or pixel interpolation is necessary.

Fig. 2. An example of the backward mapping

The discretization algorithm is quite simple. The mapped coordinates (x', y') or (x, y) are approximated by the closest integer numbers ([x'], [y']) or ([x], [y]). In the case of the forward mapping, the discrete coordinates (x, y) scan the pixels of the original image and the pixel value f(x, y) is assigned to all pixels ranging from ([x'(x, y)], [y'(x, y)]) to ([x'(x+1, y+1)], [y'(x+1, y+1)]). In the case of the backward mapping, discretization is trivial.

In the described experiments grey-scale images are used, so an interpolation algorithm must be used. In the case of the backward mapping, the mapped pixel (x, y) is surrounded by four discrete pixels. The grey level g(x', y') is a weighted combination of the four pixel values. This is presented graphically in Fig. 2. In the forward mapping, every pixel in the original image and in the normalized image is treated as a square of unit area. The unit square of the original image is mapped to a rectangle in the normalized plane, as illustrated in Fig. 3. Next, each unit square overlapping the rectangle is given a grey level value proportional to the overlapping area.

Fig. 3. An example of the forward mapping

The two mappings used in the experiments are presented in Tab. 4. When the moment-based normalization method is used, one dimension may go beyond the standard plane. In this case the part of the image outside the standard plane is cut off.

Tab. 4. Normalization methods used in experiments

Method            Forward mapping               Backward mapping
Linear mapping    x' = αx                       x = x'/α
                  y' = βy                       y = y'/β
Moment mapping    x' = α(x - x_c) + x'_c        x = (x' - x'_c)/α + x_c
                  y' = β(y - y_c) + y'_c        y = (y' - y'_c)/β + y_c

where:

$$\alpha = W_2 / W_1, \quad \beta = H_2 / H_1,$$

the centre of gravity of the image is given by

$$x_c = m_{10} / m_{00}, \quad y_c = m_{01} / m_{00},$$

m_pq denotes the geometric moments:

$$m_{pq} = \sum_x \sum_y x^p y^q f(x, y),$$

and x'_c, y'_c denote the geometric centre of the normalized plane, given by:

$$x'_c = W_2 / 2, \quad y'_c = H_2 / 2.$$

In the described experiments, all normalization functions are implemented by the backward mapping. Fig. 4 shows samples of the normalized images corresponding to all normalization functions. The size of the standard plane is 32 × 32.

Several normalization methods are implemented as described above. The normalization functions used are listed in Tab. 5.

Tab. 5. List of normalization functions used in the experiments

Symbol  Description                                                      Aspect ratio
N0      Linear normalization with fixed aspect ratio                     R2 = 1
N1      Linear normalization with preserved aspect ratio                 R2 = R1
N2      Linear normalization with square root ratio                      R2 = sqrt(R1)
N3      Linear normalization with cube root ratio                        R2 = cbrt(R1)
N4      Linear normalization with fixed aspect ratio*                    R2 = 0.9
N5      Linear normalization with square root of sine of aspect ratio    R2 = sqrt(sin(π/2 · R1))
N6      Moment normalization with preserved aspect ratio                 R2 = R1
N7      Moment normalization with square root ratio                      R2 = sqrt(R1)
N8      Moment normalization with cube root ratio                        R2 = cbrt(R1)
N9      Moment normalization with fixed aspect ratio*                    R2 = 0.9
N10     Moment normalization with square root of sine of aspect ratio    R2 = sqrt(sin(π/2 · R1))

* The aspect ratio was obtained using a test procedure over the range [0.4, 1).
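The backward mappings of Tab. 4, combined with the four-pixel interpolation described above, can be sketched as follows. This is a minimal pure-Python illustration and not the authors' implementation: the image is assumed to be a list of rows of grey levels, pixels outside the original image are assumed to be background (0), and all function names are hypothetical.

```python
import math

def pixel(img, x, y):
    """Grey level at integer (x, y); background (0) outside the image (assumption)."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return 0.0

def bilinear(img, x, y):
    """Weighted combination of the four discrete pixels surrounding real (x, y)."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * pixel(img, x0, y0) + fx * pixel(img, x0 + 1, y0)
    bottom = (1 - fx) * pixel(img, x0, y0 + 1) + fx * pixel(img, x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom

def centroid(img):
    """Centre of gravity (x_c, y_c) = (m10 / m00, m01 / m00)."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            m00 += value
            m10 += x * value
            m01 += y * value
    return m10 / m00, m01 / m00

def normalize(img, w2, h2, moment=False):
    """Backward mapping onto a W2 x H2 plane: linear or moment variant of Tab. 4."""
    h1, w1 = len(img), len(img[0])
    alpha, beta = w2 / w1, h2 / h1
    if moment:
        xc, yc = centroid(img)
        xc2, yc2 = w2 / 2, h2 / 2          # geometric centre of the normalized plane
    out = []
    for yp in range(h2):                    # (x', y') are discrete
        row = []
        for xp in range(w2):
            if moment:
                x = (xp - xc2) / alpha + xc
                y = (yp - yc2) / beta + yc
            else:
                x, y = xp / alpha, yp / beta
            row.append(bilinear(img, x, y))  # (x, y) may be real-valued
        out.append(row)
    return out
```

Because every discrete target pixel is computed from the source image, no pixel of the normalized plane is left unfilled, which is the practical advantage of the backward mapping over the forward one.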

Fig. 4. Normalized images

3. Feature extraction

In the described experiments three types of features are used: geometric moment invariants, Zernike moments and gradient features. The moment invariants are known to be invariant under rotation, translation, scaling and reflection. The Zernike moments are noise resilient. The gradient features are easy to extract and offer high performance and discriminative power as well.

3.1. Geometric moment invariants

The first type of features used are geometric moment invariants. These features extract global properties of the image such as the shape area, the centre of mass, the moment of inertia, and so on. In these experiments a feature vector similar to that presented in [46] is used, but modified and extended to a 98D vector.

Given a grey-scale image of size M × N, the regular moments of order (p + q) are defined as:

$$m_{pq} = \sum_{i=1}^{N} \sum_{j=1}^{M} x_i^p \, y_j^q \, f(x_i, y_j).$$

From the above, translation-invariant central moments can be obtained by placing the origin in the centre of gravity:

$$\mu_{pq} = \sum_{i=1}^{N} \sum_{j=1}^{M} (x_i - \bar{x})^p (y_j - \bar{y})^q f(x_i, y_j),$$

where

$$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}.$$

Hu showed that

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p + q}{2} + 1$$

are scale-invariant. Finally, rotation-invariant features can be constructed. In this paper the following seven invariants were used:

$$\phi_1 = \eta_{20} + \eta_{02},$$
$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2,$$
$$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2,$$
$$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2,$$
$$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right],$$
$$\phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}),$$
$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right].$$

As these features are rotation-invariant, there will be a problem with recognizing some numerals, e.g. 6 and 9 or 2 and 5. As a matter of fact, the recognition rate using these seven features was only about 45%. To avoid this problem the image was divided into 4 and into 9 square regions, and all seven geometric invariants were extracted from every region. This gives a (4 + 9) · 7 + 7 = 98D feature vector. This feature vector is denoted as GMI.

3.2. Zernike moments

The concept of Zernike moments was first introduced by Teague in 1980 [49]. Compared to the geometric moment invariants, Zernike moments are computationally expensive, but have several advantages: they are orthogonal, rotation invariant and noise resilient. Additionally, they have one interesting property: the original image can be reconstructed from these moments. They have mostly been applied to binary pictures because they are not invariant to contrast. This drawback can be easily avoided using grey-scale normalization.

Complex Zernike moments are constructed using a set of complex polynomials which form a complete orthogonal basis set defined on the unit disc. These polynomials are defined as:

$$V_{nm}(x, y) = R_{nm}(x, y) \, e^{jm \tan^{-1}(y / x)},$$

where $j = \sqrt{-1}$, $n \ge 0$, $|m| \le n$, $n - |m|$ is even and

$$R_{nm}(x, y) = \sum_{s=0}^{(n - |m|)/2} \frac{(-1)^s \, (x^2 + y^2)^{n/2 - s} \, (n - s)!}{s! \left(\frac{n + |m|}{2} - s\right)! \left(\frac{n - |m|}{2} - s\right)!}.$$

Then the Zernike moment of order n and repetition m is defined as:

$$A_{nm} = \frac{n + 1}{\pi} \sum_x \sum_y f(x, y) \left(V_{nm}(x, y)\right)^*, \quad x^2 + y^2 \le 1, \tag{3}$$

where * denotes the complex conjugate operator, n - |m| is even and |m| ≤ n. It is interesting that the original image can be reconstructed using the formula:

$$f(x, y) = \lim_{N \to \infty} \sum_{n=0}^{N} \sum_{m} A_{nm} V_{nm}(x, y), \tag{4}$$

where the inner sum is taken over all |m| ≤ n such that n - |m| is even.

The amplitudes of Zernike moments |A_nm| are rotation invariant. Invariance to scale and translation can be obtained by shifting and scaling the image before the computation of Zernike moments. The normalization algorithms used in these experiments guarantee that all images are shifted and scaled.

Two feature vectors based on Zernike moments are used. The first vector, ZM1, consists of the first 47 amplitudes of Zernike moments from Z0,0 to Z12,12, and the second vector, ZM2, consists of 24 amplitudes chosen as described in [7], i.e.: Z0,0, Z2,0, Z3,1, Z3,3, Z4,0, Z4,2, Z5,1, Z5,3, Z5,5, Z6,0, Z7,1, Z7,3, Z7,5, Z8,4, Z8,6, Z9,5, Z9,7, Z10,2, Z10,4, Z11,1, Z11,5, Z11,7, Z12,0.

3.2.1. Fast algorithm to compute Zernike moments

An algorithm computing Zernike moments directly from Eq. (3) is very inefficient. As a matter of fact, it is useless given the size of the database, so a far more efficient method must be found. Let us notice that in polar coordinates the above formula can be expressed as:

$$A_{nm} = \frac{n + 1}{\pi} \sum_x \sum_y R_{nm}(r) \, e^{-jm\theta} f(r, \theta), \tag{5}$$

where $r = \sqrt{x^2 + y^2}$, $\theta = \tan^{-1}(y / x)$ and

$$R_{nm}(r) = \sum_{s=0}^{(n - |m|)/2} \frac{(-1)^s \, (n - s)! \; r^{\,n - 2s}}{s! \left(\frac{n + |m|}{2} - s\right)! \left(\frac{n - |m|}{2} - s\right)!}. \tag{6}$$

Then the real and imaginary components of the Zernike moment can be calculated (up to a factor of two) as:

$$C_{nm} = \frac{2n + 2}{\pi} \sum_x \sum_y R_{nm}(r) \cos(m\theta) f(r, \theta) \tag{7}$$

and

$$S_{nm} = -\frac{2n + 2}{\pi} \sum_x \sum_y R_{nm}(r) \sin(m\theta) f(r, \theta). \tag{8}$$

The character images are discrete and usually relatively small (32 × 32 in these experiments), so the number of different values of r is relatively small. Let us denote the pixel in the centre of the image as level 0, the next 8 neighbouring pixels as level 1, and so on. On each level there will be level + 1 different values of r. For an n × n image the number of levels will be n/2 + 1.

Fig. 5. All possible values of r for a 7 × 7 image
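The radial polynomial of Eq. (6) and the level-by-level enumeration of pixel offsets can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the names `radial_poly`, `level_offsets` and `radial_table` are hypothetical, and the scaling of r (image corners mapped to r = 1) is an assumption chosen so that every offset lies inside the unit disc.

```python
import math
from math import factorial

def radial_poly(n, m, r):
    """Zernike radial polynomial R_nm(r), Eq. (6); needs |m| <= n and n - |m| even."""
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    total = 0.0
    for s in range((n - m) // 2 + 1):
        # The factorial quotient is a multinomial coefficient, so // is exact.
        coeff = factorial(n - s) // (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        total += (-1) ** s * coeff * r ** (n - 2 * s)
    return total

def level_offsets(size):
    """(level, k) offsets in one octant: levels 0..size/2, level + 1 offsets per level."""
    return [(level, k) for level in range(size // 2 + 1) for k in range(level + 1)]

def radial_table(size, orders):
    """Precompute R_nm(r) once for every offset and every (n, m) pair in `orders`."""
    r_max = math.hypot(size / 2, size / 2)   # assumed scaling: corners map to r = 1
    return {(level, k, n, m): radial_poly(n, m, math.hypot(level, k) / r_max)
            for level, k in level_offsets(size)
            for n, m in orders}

# Usage: a 32 x 32 image yields 153 offsets, so three orders give 459 table entries.
table = radial_table(32, [(0, 0), (1, 1), (2, 0)])
print(len(level_offsets(32)), len(table))
```

Once such a table is built it can be shared by all images of the same size, so the factorials are evaluated only once rather than once per image.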

The total number of different values of r will be (n² + 6n + 8)/8. Usually only a few dozen of the first Zernike moments are used, so all values of R_nm(r) can be computed using Eq. (6) for all possible values of r, m and n. For example, in the described experiments the images are 32 × 32 and the first 47 Zernike moments are used. This gives 153 · 47 = 7191 different values for all images. Precomputing all these values has a great impact on the efficiency of the algorithm, because all values of R_nm are calculated only once for all images (the MNIST database used in the experiments has 60 000 + 10 000 = 70 000 images). The values of sin(mθ) and cos(mθ), or even R_nm(r)sin(mθ) and R_nm(r)cos(mθ), can be precomputed as well.

3.3. Gradient features

The gradient features can be easily applied to grey-scale images and are robust against image noise and edge direction fluctuations. Additionally, the gradient can be computed using the Sobel operator, which has two masks for the gradient components in the horizontal and vertical directions, so it can be extracted from the image efficiently. The gradient gives us the magnitude and the direction of the greatest change in intensity in the neighbourhood of a pixel. The Roberts [10] and Kirsch [65] operators have also been used in the literature.

Fig. 6. Sobel masks used to compute gradients

The Sobel operator is used to compute the gradient components as follows:

$$g_x(x, y) = f(x+1, y-1) + 2f(x+1, y) + f(x+1, y+1) - f(x-1, y-1) - 2f(x-1, y) - f(x-1, y+1), \tag{9}$$

$$g_y(x, y) = f(x-1, y+1) + 2f(x, y+1) + f(x+1, y+1) - f(x-1, y-1) - 2f(x, y-1) - f(x+1, y-1). \tag{10}$$

Then the gradient magnitude is calculated as:

$$A(x, y) = \sqrt{g_x(x, y)^2 + g_y(x, y)^2} \tag{11}$$

and the gradient direction as:

$$\theta(x, y) = \tan^{-1} \frac{g_y(x, y)}{g_x(x, y)}.$$

The complete gradient map may contain some noise, especially when grey-scale images are used. To avoid these spurious gradients a simple filtering algorithm is proposed: adaptive gradient thresholding. In the first step the average gradient magnitude is computed over the whole image, and then this value is used to filter the gradient map. Formally:

$$A_{avr} = \frac{\sum_x \sum_y A(x, y)}{M \cdot N},$$

where M, N are the dimensions of the image. Then

$$A(x, y) = \begin{cases} A(x, y), & A(x, y) \ge A_{avr} \\ -1, & A(x, y) < A_{avr} \end{cases}$$

and

$$\theta(x, y) = \begin{cases} \theta(x, y), & A(x, y) \ge A_{avr} \\ -1, & A(x, y) < A_{avr}. \end{cases}$$

The gradient directions are real values from the range [0, 360). To extract a feature vector they are quantized into a small number of integer values. Twelve integer values are used, representing the gradient ranges [0, 30), [30, 60), [60, 90) and so on. Next, the gradient map is divided into 4 × 4 parts. Then the percentage of pixels with the gradient direction quantized to the value K = 1, 2, ..., 12 is computed in each part. Hence the total number of features is 4 · 4 · 12 = 192. This 192D feature vector is denoted as GF.

The second feature vector, denoted as GFC, is also based on gradient features. Ten crossing line features are added to the previous vector GF. The crossing line features are extracted in the following steps. First, the centre of gravity of the image is found; then a horizontal and a vertical line are drawn through this point; finally, two extra lines are added on each side of the horizontal and of the vertical line, with equal margins. The crossing line feature is the number of intersection points with the image. For example, for the numeral shown in Fig. 7 there are two vectors: (1, 1, 1, 1, 1) and (1, 2, 2, 2, 1).

Fig. 7. Crossing line features: (a) horizontal, (b) vertical

4. The SVM classifier

The Support Vector Machine (SVM) was proposed by Vapnik in [25]. The SVM technique has been used in different application domains and has outperformed the traditional techniques in terms of generalization capability. Contrary to the traditional techniques, which try to minimise the empirical risk (the classification error on the training data), the SVM minimises the structural risk (the classification error on data never seen before).

The classification task is to predict whether a test sample belongs to one of two classes. In a feature space this corresponds to finding a hyperplane which separates these two classes. There is an infinite number of such hyperplanes, so among the possible choices the SVM classifier selects the one for which the distance of the hyperplane from the closest feature vectors (the margin) is as large as possible. This hyperplane is called an optimal separating hyperplane.

Let us consider a classifier whose decision function is given by:

$$f(x) = \mathrm{sgn}(x^T w + b),$$

where x denotes a feature vector and w is a weight vector. The problem is separable when there exist w and a threshold b such that:

$$y_i (x_i^T w + b) \ge 1, \quad i = 1, 2, \ldots, N.$$

To maximize the margin we must minimize $\frac{1}{2}\|w\|^2$. This problem leads to a so-called dual optimization problem, which is to maximize

$$L_D = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j \, (x_i^T x_j),$$

subject to

$$\alpha_i \ge 0, \quad i = 1, 2, \ldots, N \quad \text{and} \quad \sum_{i=1}^{N} \alpha_i y_i = 0.$$

This leads to a hyperplane decision function:

$$f(x) = \mathrm{sgn}\Big(\sum_{\text{support vectors}} y_i \alpha_i \, (x_i^T x) + b\Big), \tag{12}$$

where x_i are the support vectors, i.e. the vectors with non-zero Lagrange multipliers α_i. The support vectors are the feature vectors which lie on the margins. This is an advantage of this approach, because only a small number of vectors is used to compute the resulting classifier.

In a real-life problem it is unlikely that a hyperplane will exactly separate the data. To deal with this problem soft margin hyperplanes are used. A set of variables ξ_i representing errors (i.e. the vectors which lie inside the margin) and a parameter C which determines the trade-off between margin maximization and error minimization are introduced.

In Eq. (12) a dot product of the input vectors is used. So we can apply the so-called kernel trick and calculate the dot product of the vectors in the feature space using a kernel function. It allows us to create a decision function that is nonlinear in the input space, but linear in the feature space, i.e.:

$$f(x) = \mathrm{sgn}\Big(\sum_{\text{support vectors}} y_i \alpha_i \, K(x_i, x) + b\Big), \tag{13}$$

where K(x_i, x) is a kernel function. Typical kernel functions are:

1. Linear kernel: $K(x_i, x) = x_i^T x$,
2. Polynomial kernel: $K(x_i, x) = (x_i^T x + c)^d$,
3. RBF (radial basis function) kernel: $K(x_i, x) = \exp(-\gamma \|x_i - x\|^2)$, $\gamma > 0$,
4. Gaussian RBF kernel: $K(x_i, x) = \exp(-\|x_i - x\|^2 / 2\sigma^2)$,
5. Sigmoid kernel: $K(x_i, x) = \tanh(x_i^T x + c)$.

The last problem which must be solved is that handwritten numeral recognition is a multi-class problem, while the SVM is a binary classifier. There are two commonly used solutions. The first is the WTA (winner takes all) strategy. In this approach we build N classifiers for N classification problems: one class versus all other classes. Another approach is to build N(N - 1)/2 classifiers, one for each pair of classes, and then use the MVS (majority voting scheme) strategy.
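The kernel decision function of Eq. (13) with an RBF kernel, and the majority voting over pairwise classifiers, can be sketched as follows. This is a hedged illustration only: the support vectors, multipliers and the `pairwise` dictionary of binary deciders are hypothetical inputs assumed to come from an already trained model, and nothing here reproduces the LIBSVM interface used in the paper.

```python
import math
from collections import Counter
from itertools import combinations

def rbf_kernel(a, b, gamma):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2), with gamma > 0."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def decide(x, support_vectors, labels, alphas, b, gamma):
    """Eq. (13): sign of sum_i y_i * alpha_i * K(x_i, x) + b over the support vectors."""
    score = sum(y * a * rbf_kernel(sv, x, gamma)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1

def majority_vote(x, classes, pairwise):
    """One-versus-one voting; pairwise[(i, j)](x) returns the winning class, i or j."""
    votes = Counter(pairwise[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Usage: ten digit classes require 10 * 9 / 2 = 45 pairwise classifiers.
print(len(list(combinations(range(10), 2))))
```

The voting variant trains more classifiers than the one-versus-all scheme (45 instead of 10 for digits), but each of them sees only the samples of two classes.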

5. Experimental results

In this section the recognition accuracies obtained in the experiments are presented. The results are presented in Tab. 6. The rows correspond to the five feature vectors (described in Sect. 3) and the columns N0-N10 to the normalization methods used (see Tab. 5). The first column, WN, gives the accuracy obtained without any normalization.

The handwritten digit database MNIST described in Section 1 is used in these experiments. Some images from the training dataset are shown in Fig. 8. This database is divided into two datasets: the training dataset, containing 60 000 samples, and the test dataset, containing 10 000 samples. As an SVM-rbf classifier is used in the experiments, an extra dataset is necessary for validation purposes (to find the C and γ parameters). So the training dataset was divided into two sets: 50 000 samples for the training set and 10 000 samples for the validation set.

For these three datasets, feature vectors are generated using the normalization methods listed in Tab. 5. For the normalization methods N4 and N9 a special procedure was used to find the best ratio R2: 12 sets of feature vectors are generated using aspect ratios from 0.4 to 0.95 with a step of 0.05. The best result (for the aspect ratio 0.9) is presented in Tab. 6. An extra set of feature vectors is generated for the original images.

Fig. 8. Examples from MNIST database

Before using the classifier, all feature vectors are linearly scaled into the [-1, 1] range. The main advantage of this scaling is that it prevents attributes in greater numeric ranges from dominating those in smaller numeric ranges. Another advantage is that it avoids numerical difficulties during the calculation. Because kernel values usually depend on the inner products of feature vectors, e.g. in the linear kernel and the polynomial kernel, large attribute values might cause numerical problems.

In Tab. 6 the results of the experiments are shown. There are feature vectors in rows and normalization methods in columns.

Tab. 6. Recognition rates (%) obtained on different feature vectors

       WN     N0     N1     N2     N3     N4     N5     N6     N7     N8     N9     N10
GMI    79.31  82.58  83.49  85.42  85.18  85.52  84.72  84.93  85.44  85.47  85.03  84.99
ZM1    84.10  91.42  92.84  91.12  93.54  93.94  91.80  92.90  94.33  94.22  94.19  94.05
ZM2    86.24  92.20  93.91  94.02  94.21  93.29  92.11  93.45  94.79  94.62  94.77  94.31
GF     94.52  96.78  96.83  96.98  98.01  97.61  97.83  97.70  98.76  98.61  98.77  98.72
GFC    95.24  97.86  97.65  98.21  98.48  98.32  98.47  98.12  99.16  98.82  98.98  99.06

Geometric moment invariants form the weakest feature vector. But, as can be seen in [46] and in our experiments, the results are close to the best feature vectors based on gradient and directional features. Possibly, if we add extra features to these vectors, e.g. concave features or crossing line features, the result would be even better.

Zernike moments are better than geometric moments, but not as good as gradient features. In this paper we mainly focus on normalization methods and, as a matter of fact, we do not investigate the feature vectors. Perhaps the methods presented in [7] can be extended to yield better results.

The best results are achieved using gradient features. The extended GFC vector leads to even better results. The geometric moment invariants and Zernike moments perform worse, but the results are promising. Perhaps combining these feature vectors with other features will lead to better recognition rates. The GF and GFC vectors are the most promising.
Considering that they are easy to extract and easy to understand, they seem to be a good choice for future work.

To examine how normalization methods influence recognition rates, all feature vectors are also extracted from the dataset without normalization (the column WN in Tab. 6 shows the recognition rate for these vectors). Tab. 7 presents relative recognition rates defined as follows:

$$RRR_{ij} = \left(1 - \frac{100 - R_{ij}}{100 - R_i}\right) \cdot 100,$$

where R_ij is the recognition rate obtained on the i-th feature vector using the j-th normalization and R_i is the recognition rate obtained on the i-th feature vector without any normalization. This measure shows how much a normalization method contributes to achieving the optimal recognition rate. For example, RRR = 100 means that the normalization method leads to the maximal recognition rate of 100%, RRR = 0 means that the corresponding normalization method brings no advantage, and values less than zero mean that the normalization deteriorates the recognition rate. For instance, for the GFC vector and N7, RRR = (1 - 0.84/4.76) · 100 ≈ 82.4.

Tab. 7. Relative recognition rates

       N0    N1    N2    N3    N4    N5    N6    N7    N8    N9    N10
GMI    15.8  20.2  29.5  28.4  30.0  26.1  27.2  29.6  29.8  27.6  27.5
ZM1    46.0  55.0  44.2  59.4  61.9  48.4  55.3  64.3  63.6  63.5  62.6
ZM2    43.3  55.7  56.5  57.9  51.2  42.7  52.4  62.1  60.9  62.0  58.7
GF     41.2  42.2  44.9  63.7  56.4  60.4  58.0  77.4  74.6  77.6  76.6
GFC    55.0  50.6  62.4  68.1  64.7  67.9  60.5  82.4  75.2  78.6  80.3

6. Conclusions and future work

In this paper several normalization methods and five feature vectors are compared on the MNIST database. The recognition results show that the moment normalization function N7 yields the highest recognition rates. The results obtained using the moment normalization functions N10 and N9 are also very good. Generally, normalization strongly influences the recognition performance for both dimension-based and moment-based normalization. It is interesting that preserving the aspect ratio or forcing the aspect ratio to one leads to substantially worse results.

Five sets of feature vectors are tested. The best results are achieved using gradient features, but it can be seen that the other feature vectors are not useless. The results are promising. Perhaps some extension of these feature vectors could lead to results comparable with gradient features.

The reported results provide useful insights for selecting a suitable normalization algorithm when developing recognition systems. There are also interesting results in the experiments using different aspect ratios. Testing one class versus all others shows that there is no universal aspect ratio optimal for all classes.
Different aspect ratios are optimal for different numerals. For example, for numeral 1 the best aspect ratio is 0.4 and for numeral 0 the best aspect ratio is 0.95. This observation cannot be exploited in this experiment, because there must be one classifier for all numerals, but in future work it can be used for building a multiple classifier solution.

The experiments described in this paper are focused on normalization. The results show that this preprocessing technique has a great impact on the final recognition rate regardless of the feature vector used. The next step is to find even better feature vectors which, in conjunction with these normalization techniques, will lead to even better recognition rates.

7. References

[1] Xu D., Li H.; Geometric moment invariants, Pattern Recognition 41, 2008, pp. 240-249.
[2] Liu Ch.L., Nakashima K., Sako H., Fujisawa H.; Handwritten digit recognition: investigation of normalization and feature extraction techniques, Pattern Recognition 37, 2004, pp. 265-279.
[3] Chang C.-C., Lin C.-J.; LIBSVM: a library for support vector machines, software available at: http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
[4] Lauer F., Suen Ch.Y., Bloch G.; A trainable feature extractor for handwritten digit recognition, Pattern Recognition 40, 2007, pp. 1816-1824.
[5] Zhang W., Tang Y.Y., Xue Y.; Handwritten Character Recognition Using Combined Gradient and Wavelet Feature, International Conference on Computational Intelligence and Security, Vol. 1, 2006, pp. 662-667.
[6] Stąpor K.; Automatic object classification, Publishing House EXIT, 2005.
[7] Tong X.J., Zeng S., Zhou K., Jiang Q.; Hand-written numeral recognition based on Zernike moment, Proceedings of the 2008 ICWAPR, pp. 368-372.
[8] Teow L.-N., Loe K.-F.; Robust vision-based features and classification schemes for off-line handwritten digit recognition, Pattern Recognition 35(11), 2002, pp. 2355-2364.
[9] Liu H., Ding X.; Handwritten Character Recognition Using Gradient Feature and Quadratic Classifier with Multiple Discrimination Schemes, Proceedings of the Eighth ICDAR, 2005, pp. 19-25.

[10] Shi M., Fujisawa Y., Wakabayashi T., Kimura F.; Handwritten Numeral Recognition using gradient and curvature of gray scale image, Pattern Recognition 35(10), 2002, pp. 2051-2059.
[11] Cristianini N., Schölkopf B.; Support vector machines and Kernel methods: the new generation of learning machines, AI Magazine 23(3), 2002, pp. 31-41.
[12] Liu Ch.L., Nakashima K., Sako H., Fujisawa H.; Handwritten digit recognition: benchmarking of state-of-the-art techniques, Pattern Recognition 36, 2003, pp. 2271-2285.
[13] Shi M., Fujisawa Y., Wakabayashi T., Kimura F.; Handwritten numeral recognition using gradient and curvature of gray scale image, Pattern Recognition 35, 2002, pp. 2051-2059.
[14] Schölkopf B., Smola A.J.; Learning with Kernels. Support Vector Machines, Regularization, Optimization, and Beyond, The MIT Press, 2001.
[15] Cheriet M., Kharma N., Liu Ch.-L., Suen Ch.-Y.; Character Recognition Systems: A guide for students and practitioners, Wiley-Interscience, 2007.
[16] Suen Ch.Y., Nadal Ch., Legault R., Mai T.A., Lam L.; Computer Recognition of unconstrained handwritten numerals, Proceedings of the IEEE, 80, 1992, pp. 1162-1180.
[17] Srikantan G., Lam S.W., Srihari S.N.; Gradient-based contour encoding for character recognition, Pattern Recognition 29, 1996, pp. 1147-1160.
[18] Arica N., Yarman-Vural F.T.; Optical Character Recognition for Cursive Handwriting, IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 2002, pp. 801-813.
[19] Liu C.-L., Nakashima K., Sako H., Fujisawa H.; Aspect Ratio adaptive normalization for handwritten character recognition, in: Advances in Multimodal Interfaces, ICMI 2000, T. Tan, Y. Shi, W. Gao (eds.), Lecture Notes in Computer Science 1948, 2000, pp. 418-425.
[20] Kimura et al.; Evaluation and synthesis of feature vectors for handwritten numeral recognition, IEICE Trans. Inform. Systems E79-D(5), 1996, pp. 436-442.
[21] Heutte L., Paquet T., Moreau J.V., Lecourtier Y., Olivier C.; A structural/statistical feature based vector for handwritten character recognition, Pattern Recognition Letters 19(7), 1998, pp. 629-641.
[22] Jain A.K., Duin R.P.W., Mao J.; Statistical Pattern Recognition: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1), 2000, pp. 4-37.

[23] Burges C.J.C.; A tutorial on support vector machines for pattern recognition, Knowledge Discovery Data Mining 2(2), 1998, pp. 1–43.

[24] Kimura F., Takashina K., Tsuruoka S., Miyake Y.; Modified quadratic discriminant functions and the application to Chinese character recognition, IEEE Trans. Pattern Anal. Mach. Intell. 9(1), 1987, pp. 149–153.

[25] Vapnik V.; The Nature of Statistical Learning Theory, Springer, New York 1995.

[26] Kimura F., Shridhar M.; Handwritten numeral recognition based on multiple algorithms, Pattern Recognition 24(10), 1991, pp. 969–981.

[27] Liu C.-L., Sako H., Fujisawa H.; Performance evaluation of pattern classifiers for handwritten character recognition, International Journal on Document Analysis Recognition 4(3), 2002, pp. 191–204.

[28] Lee D.-S., Srihari S.-N.; Handprinted digit recognition: a comparison of algorithms, Proceedings of the Third International Workshop on Frontiers of Handwriting Recognition, Buffalo, New York, 1993, pp. 153–164.

[29] LeCun Y. et al.; Comparison of learning algorithms for handwritten digit recognition, in: Proceedings of the International Conference on Artificial Neural Networks, F. Fogelman-Soulie, P. Gallinari (eds.), Nanterre, France 1995, pp. 53–60.

[30] Suen C.-Y., Liu K., Strathy N.W.; Sorting and recognizing cheques and financial documents, in: Document Analysis Systems: Theory and Practice, S.-W. Lee, Y. Nakano (eds.), Springer, Berlin 1999, pp. 173–187.

[31] Liu C.-L., Nakagawa M.; Handwritten numeral recognition using neural networks: improving the accuracy by discriminative training, Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1999, pp. 257–260.

[32] Nagy G., Tuong N.; Normalization techniques for handprinted numerals, Communications of the ACM 13(8), 1970, pp. 475–481.

[33] Franke J.; Isolated handprinted digit recognition, in: Handbook of Character Recognition and Document Image Analysis, H. Bunke, P.S.P. Wang (eds.), World Scientific, Singapore 1997, pp. 103–121.

[34] Gader P.D., Khabou M.A.; Automatic feature generation for handwritten digit recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 18(12), 1996, pp. 1256–1261.

[35] Cai J.-H., Liu Z.-Q.; Integration of structural and statistical information for unconstrained handwritten numeral recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 21(3), 1999, pp. 263–270.

[36] Oh I.-S., Lee J.-S., Suen C.Y.; Analysis of class separation and combination of class-dependent features for handwriting recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 21(10), 1999, pp. 1089–1094.

[37] Mayraz G., Hinton G.E.; Recognizing handwritten digits using hierarchical products of experts, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(2), 2002, pp. 189–197.

[38] Dong J.X., Krzyzak A., Suen C.Y.; A multi-net learning framework for pattern recognition, Proceedings of the Sixth International Conference on Document Analysis and Recognition, Seattle 2001, pp. 328–332.

[39] Belongie S., Malik J., Puzicha J.; Shape matching and object recognition using shape contexts, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(4), 2002, pp. 509–522.

[40] Gonzalez R.C., Woods R.E.; Digital Image Processing, 2nd edition, Addison Wesley, 2001.

[41] Hull J.J.; Document image skew detection: Survey and annotated bibliography, in: Document Analysis Systems II, J.J. Hull and S.L. Taylor (eds.), World Scientific, Singapore, 1998, pp. 40–64.

[42] Yamaguchi T., Nakano Y., Maruyama M., Miyao H., Hananoi T.; Digit classification on signboards for telephone number recognition, Proceedings of the 7th International Conference for Document Analysis and Recognition, Edinburgh, Scotland 2003, pp. 359–363.

[43] Zhang T.Y., Suen C.Y.; A fast parallel algorithm for thinning digital patterns, Communications of the ACM 27(3), 1984, pp. 236–239.

[44] Favata J.T., Srikantan G., Srihari S.N.; Handprinted character/digit recognition using a multiple feature/resolution philosophy, Proceedings of the Fourth International Workshop on Frontiers of Handwriting Recognition, Taipei 1994, pp. 57–66.

[45] de Oliveira Jr. J.J., Veloso L.R., de Carvalho J.M.; Interpolation/decimation scheme applied to size normalization of characters images, Proceedings of the 15th International Conference Pattern Recognition, Vol. 2, Barcelona, Spain 2000, pp. 577–580.

[46] Ramteke R.J., Mehrotra S.C.; Feature Extraction Based on Moment Invariants for Handwriting Recognition, IEEE Conference on Cybernetics and Intelligent Systems, Issue 7–9, 2006, pp. 1–6.

[47] Cheng D., Yan H.; Recognition of handwritten digits based on contour information, Pattern Recognition 31(3), 1998, pp. 235–255.

[48] Tong X.J., Zeng S., Zhou K., Zhao K., Jiang Q.; Hand-written numeral recognition based on Zernike moment, Proceedings of the 2008 International Conference on Wavelet Analysis and Pattern Recognition, Vol. 1, 2008, pp. 368–372.

[49] Teague M.R.; Image analysis via the general theory of moments, Journal of the Optical Society of America 70(8), 1980, pp. 920–930.

[50] Trier O.D., Jain A.K., Taxt T.; Feature extraction methods for character recognition – a survey, Pattern Recognition 29, 1996, pp. 641–662.

[51] Kawamura A. et al.; On-line recognition of freely handwritten Japanese characters using directional feature densities, Proceedings of the 11th International Conference on Pattern Recognition, Vol. 2, The Hague 1992, pp. 183–186.

[52] Mori S., Suen C.Y., Yamamoto K.; Historical review of OCR research and development, Proceedings of IEEE 80(7), 1992, pp. 1029–1053.

[53] Khotanzad A., Hong Y.H.; Invariant image recognition by Zernike moments, IEEE Transactions on Pattern Analysis and Machine Intelligence 12(5), 1990, pp. 489–490.

[54] Burges C.J.C.; A tutorial on support vector machines for pattern recognition, Knowledge Discovery Data Mining 2(2), 1998, pp. 1–43.

[55] Gudessen A.; Quantitative Analysis of preprocessing techniques for the recognition of handprinted characters, Pattern Recognition 8, 1976, pp. 219–227.

[56] Cristianini N., Shawe-Taylor J.; An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.

[57] Labusch K., Barth E., Martinetz T.; Simple Method for High-Performance Digit Recognition Based on Sparse Coding, IEEE Transactions on Neural Networks 19(11), 2008, pp. 1985–1989.

[58] Fan R.E., Chen P.H., Lin C.J.; Working Set Selection Using Second Order Information for Training Support Vector Machines, Journal of Machine Learning Research 6, 2005, pp. 1889–1918.

[59] Keerthi S.S., Lin C.J.; Asymptotic behaviors of support vector machines with Gaussian kernel, Neural Computation 15(7), 2003, pp. 1667–1689.

[60] Kernel machines web site, http://www.kernel-machines.org/.

[61] The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/.

[62] Hsu C.W., Chang C.C., Lin C.J.; A practical guide to support vector classification, http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.

[63] Mukundan R., RamaKrishnan K.R.; Fast Computation of Legendre and Zernike Moments, Pattern Recognition 28(9), 1995, pp. 1433–1442.

[64] Casey R.G.; Moment normalization of handprinted character, IBM Journal of Research and Development 14, 1970, pp. 548–557.

[65] Lee S.-W.; Multilayer cluster neural network for totally unconstrained handwritten numeral recognition, Neural Networks 8(5), 1995, pp. 783–792.

[66] Abuhaiba I.S.I., Holt M.J.J., Datta S.; Recognition of off-line handwriting, Computer Vision and Image Understanding 71, 1998, pp. 19–38.

Received May 16, 2010