Gabor Features for Offline Arabic Handwriting Recognition Jin Chen, Huaigu Cao, Rohit Prasad, Anurag Bhardwaj, Prem Natarajan 9 June 2010 International Workshop on Document Analysis Systems (DAS) 2010
Outline Introduction Handwriting recognition overview Proposed Gabor features for handwriting recognition Experimental results
Introduction Goal: Improve the accuracy of offline Arabic handwriting recognition Challenges: Unconstrained offline handwriting exhibits significantly different writing styles Shapes of the same character glyph vary across writers, and even for the same writer Need: Develop features and classifiers that are effective in discriminating handwritten glyphs
Script-Independent Glyph Modeling Using HMMs Hidden Markov Models (HMMs) model a feature vector as a function of one independent variable The independent variable is time in speech; for text images, it is the position within the text line The modeling framework implicitly accommodates cursive scripts Does not require line images to be pre-segmented into words/sub-words/characters Training data simply consists of text line images with corresponding transcripts Manual segmentation into words or characters is NOT required Note: A glyph is the stroke segment that corresponds to a writing unit such as a character, a sub-word, or a word
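As a hedged illustration of treating horizontal position as the HMM's "time" axis, the sketch below slices a text-line image into narrow overlapping vertical frames and flattens each frame into one feature vector. The frame width, overlap, and raw-pixel features here are illustrative assumptions, not the actual front end of the system described above.

```python
def line_to_frames(line_image, frame_width=3, overlap=1):
    """Slice a text-line image (list of pixel rows) into overlapping
    vertical frames, producing the feature-vector sequence an HMM
    consumes; horizontal position plays the role of time."""
    rows, cols = len(line_image), len(line_image[0])
    step = frame_width - overlap
    frames = []
    for x0 in range(0, cols - frame_width + 1, step):
        # Flatten the frame's pixels into a single feature vector.
        frame = [line_image[r][x0 + dx]
                 for r in range(rows) for dx in range(frame_width)]
        frames.append(frame)
    return frames

# Toy example: a 4 x 11 line image with width-3 frames and step 2
# yields 5 frames of 12 pixel values each.
img = [[0] * 11 for _ in range(4)]
frames = line_to_frames(img)
```

Because frames are emitted left to right regardless of where character boundaries fall, no manual word or character segmentation is needed, matching the training setup described above.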
Novel Extension - Stochastic Segment Modeling (SSM) Integrates evidence from different features and recognition approaches HMM generates fuzzy stochastic segments (2-D character images) Apply 2-D classification to the fuzzy segments We use Support Vector Machines (SVM) for 2-D character classification
Common Features for Character Recognition Gradient-Structural-Concavity (GSC) [Favata 94] Concatenate gradient, structural, and concavity features 95% classification accuracy on the NIST handwritten character database Contour Code [Verma 04] Use the rate of slope changes along the contour profile, along with the numbers of ascenders/descenders, start/end points, etc. 85% recognition rate on the BAC handwritten character database Character-SIFT [Zhang 09] Compute dynamic gradient histograms in the elastic-meshing and concatenate them into features 94% recognition rate on the HCL2000 Chinese database
Gabor Filtering based Feature Extraction Limitations of existing features GSC features do not capture width of the stroke Contour features are sensitive to artifacts such as broken strokes and pepper noise Extract features using the output of Gabor filtering Gabor filters are frequency-domain band-pass filters that select the signal at a specific orientation and frequency Captures stroke width and orientations Filtering output is robust to noise artifacts
Overview of Gabor Filtering A 2-D Gabor filter is a complex sinusoidal plane wave modulated by a Gaussian in the spatial domain: h(x, y) = exp(-(R1²/(2σx²) + R2²/(2σy²))) · exp(i 2π R1/λ), where R1 and R2 are the rotated coordinates: R1 = x cos θ + y sin θ, R2 = -x sin θ + y cos θ λ denotes the wavelength of a Gabor filter θ denotes the orientation of the filter
Overview of Gabor Filtering (2) In the frequency domain, a Gabor filter is a Gaussian band-pass filter centered at the carrier frequency (u0, v0): H(u, v) = K exp(-2π²(σx²F1² + σy²F2²)), where K is a constant and F1, F2 are the rotated frequency coordinates: F1 = (u - u0) cos θ + (v - v0) sin θ, F2 = -(u - u0) sin θ + (v - v0) cos θ [Figure: the sinusoidal carrier, the Gaussian envelope, and the resulting Gabor filter] http://www.cs.umd.edu/class/spring2005/cmsc838s/assignment-projects/gabor-filter-visualization/report.pdf
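The spatial-domain definition can be sampled directly. The sketch below evaluates the real and imaginary parts of a Gabor kernel on a small grid, using a single isotropic σ for the Gaussian envelope (an assumed simplification of the σx/σy form); the kernel size and parameter values are illustrative.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Sample a 2-D Gabor filter on a size x size grid.
    R1/R2 are the rotated coordinates; `wavelength` is the
    carrier wavelength (lambda), `theta` the orientation."""
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        rrow, irow = [], []
        for x in range(-half, half + 1):
            # Rotated coordinates R1 (along the carrier) and R2.
            r1 = x * math.cos(theta) + y * math.sin(theta)
            r2 = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope times the complex sinusoid.
            envelope = math.exp(-(r1 ** 2 + r2 ** 2) / (2 * sigma ** 2))
            phase = 2 * math.pi * r1 / wavelength
            rrow.append(envelope * math.cos(phase))
            irow.append(envelope * math.sin(phase))
        real.append(rrow)
        imag.append(irow)
    return real, imag

real, imag = gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0)
# At the kernel center the envelope is 1 and the phase is 0, so the
# real part is exactly 1.0 and the imaginary part 0.0.
```

Convolving a text image with a bank of such kernels at several wavelengths and orientations yields the filter responses used in the following slides.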
Related Work in Gabor Filter Based Features [Wang 05] Set λ according to the stroke width Extract features using only the real part of the filtering response; positive and negative responses are treated separately 98.9% accuracy on a Chinese handwritten character database 99.1% recognition accuracy on the MNIST digit database [Ge 02] Set λ according to the stroke width Extract features using the magnitude of the filtering response 2M-sample database with a vocabulary of 4616 Chinese handwritten characters 97.5% recognition accuracy
Proposed Gabor Features Features are computed from the magnitude response of the real and imaginary parts Step 1: Apply Gabor filters at 2 different frequencies and 4 different orientations Step 2: Partition each filter response into an 8 x 8 grid Step 3: Count the number of strong responses in each grid cell and concatenate the counts into a 512-dimensional vector: 8 x 8 (grid) x 2 (frequencies) x 4 (orientations)
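Steps 2 and 3 can be sketched as follows, assuming the eight filter-response magnitude maps (2 frequencies x 4 orientations) are already computed and normalised; the "strong response" threshold value is an illustrative assumption.

```python
def gabor_grid_features(magnitude_maps, grid=8, threshold=0.5):
    """Partition each response-magnitude map into grid x grid cells
    and count strong responses (magnitude above `threshold`, an
    assumed normalised cutoff) in each cell.  With 8 maps and an
    8 x 8 grid this yields a 512-dimensional feature vector."""
    features = []
    for resp in magnitude_maps:      # one map per (frequency, orientation)
        h, w = len(resp), len(resp[0])
        for gy in range(grid):
            for gx in range(grid):
                # Integer cell boundaries covering the whole map.
                y0, y1 = gy * h // grid, (gy + 1) * h // grid
                x0, x1 = gx * w // grid, (gx + 1) * w // grid
                count = sum(1 for y in range(y0, y1)
                            for x in range(x0, x1)
                            if resp[y][x] > threshold)
                features.append(count)
    return features

# Toy example: 8 all-zero response maps of size 16 x 16 give
# 8 * 8 * 8 = 512 features, all zero.
maps = [[[0.0] * 16 for _ in range(16)] for _ in range(8)]
feats = gabor_grid_features(maps)
```

Counting threshold crossings rather than summing raw magnitudes makes the feature less sensitive to local contrast, in the spirit of the robustness claims on the previous slides.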
Experiments with Gabor Features Performed Part of Arabic Word (PAW) classification experiments to assess the efficacy of Gabor features Used Support Vector Machines (SVMs) for classification and compared performance with GSC and graph-based features Dataset: Applied Media Analysis (AMA) Arabic database [AMA 07] Selected the 34 most frequent PAW classes and applied noise removal: Median filtering Slant correction Rule-line removal Training set: 6498 PAWs Testing set: 848 PAWs [Figure: sample images from the AMA database]
Features for Comparison GSC (512-dim) Gradient: compute the gradient value and orientation for each bin, then count pixels that share the same gradient Structure: real-valued features estimated from pixel neighborhoods using a codebook of predefined shapes Concavity: coarse pixel density, large strokes, and concavities of different orientations Graph features (208-dim) Binarize and then apply stroke thinning to obtain a single-pixel-wide skeleton of the image Traverse the skeleton to count the occurrences of patterns, including 5 node types, 3 edge types, and 5 segment types
Experimental Results: Comparison with GSC and Graph Features

Feature Set                                         % Classification Accuracy
GSC                                                 81.6
Graph                                               68.2
Proposed Gabor                                      82.7
Gabor I [Wang 05] (positive and negative real part) 76.2
Gabor II (positive real part only)                  79.8
Experimental Results: Combination of Features

Feature Set             % Classification Accuracy
GSC                     81.6
Proposed Gabor + GSC    84.3
Gabor + Graph           82.8
Graph + GSC             79.7
Gabor I + GSC           82.7
Gabor II + GSC          82.7
Conclusions Experimental results demonstrate that Gabor features are useful for offline Arabic PAW classification Ongoing work: integrating Gabor features into the HMM and SSM framework Training set: 658K lines, 3.7M words Development set: 14K lines, 89K words Testing set: 14K lines, 89K words

Recognition System     % Word Error Rate
HMM                    26.5
SSM with Gabor         26.0
SSM with GSC           25.7
SSM with Gabor + GSC   25.7
Thank You!
Statistical Significance Test GSC+Gabor is statistically significantly better than using GSC alone: GSC+Gabor (A): 715/848 correct GSC (B): 692/848 correct Null hypothesis (H0): Ra = Rb Alternative hypothesis (H1): Ra > Rb n01 = # of samples misclassified by A but not by B n10 = # of samples misclassified by B but not by A [Dietterich 98] Test statistic Z² ~ χ²(1): with n01 = 25, n10 = 48, Z = (|n01 - n10| - 1)/√(n01 + n10) = 2.57 > 1.96, so the difference is significant at the 95% confidence level.
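The continuity-corrected McNemar statistic from [Dietterich 98] reproduces the reported value from the counts above:

```python
import math

def mcnemar_z(n01, n10):
    """Continuity-corrected McNemar statistic; Z**2 ~ chi-square(1).
    n01 = misclassified by A but not B, n10 = the reverse."""
    return (abs(n01 - n10) - 1) / math.sqrt(n01 + n10)

# Counts reported on this slide for A = GSC+Gabor vs B = GSC.
z = mcnemar_z(25, 48)
# z ≈ 2.57 > 1.96, significant at the 95% confidence level.
```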
References
1. [Favata 94] Handprinted character/digit recognition using a multiple feature/resolution philosophy. International Workshop on Frontiers in Handwriting Recognition, 1994, pp. 57-66.
2. [Verma 04] A novel approach for structural feature extraction: contour vs. direction. Pattern Recognition Letters, 25(9): 975-988, 2004.
3. [Zhang 09] Character-SIFT: a novel feature for offline handwritten Chinese character recognition. Proc. of ICDAR, 2009.
4. [Wang 05] Gabor filter-based feature extraction for character recognition. Pattern Recognition, 38: 369-379, 2005.
5. [Ge 02] Offline recognition of Chinese handwritten characters using Gabor features, CDHMM modeling and MCE training. Proc. of ICASSP, 2002.
6. [AMA 07] Applied Media Analysis, Arabic-Handwritten-1.0. http://appliedmediaanalysis.com/datasets.htm, 2007.
7. [Natarajan 09] Stochastic Segment Modeling for Offline Handwriting Recognition. Proc. of ICDAR, 2009.
8. [Dietterich 98] Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10: 1895-1923, 1998.