Using Lexical Similarity in Handwritten Word Recognition
Jaehwa Park and Venu Govindaraju
Center of Excellence for Document Analysis and Recognition (CEDAR)
Department of Computer Science and Engineering
State University of New York at Buffalo, Amherst, NY

Abstract

Recognition using only visual evidence cannot always be successful due to limitations of the information and resources available during training. Considering the relations among lexicon entries is sometimes useful for decision making. In this paper, we present a method to capture the lexical similarity of a lexicon and the reliability of a character recognizer, which together capture the dynamism of the decision environment. A parameter, lexical similarity, is defined by measuring these two factors as the edit distance between lexicon entries and the separability of each character's recognition results. Our experiments show that a utility function that considers lexical similarity in the decision stage can enhance the performance of a conventional word recognizer.

1 Introduction

Shapes of characters in handwriting vary widely. Some of the variations in the shape of handwriting have been categorized, and methods to normalize them have been developed in previous literature [1]. However, the variation of character shapes remains one of the major obstacles preventing handwriting recognition from reaching the high recognition performance seen in the recognition of machine-printed documents.

Usually the goal of handwritten word recognition is to choose one word from the dictionary as the final answer. If a lexicon is given, the recognition engine finds the lexicon entry with the highest matching score against the array of image segments of the input image, or against a feature array extracted from the image. Scoring is performed between shape-based feature vectors extracted from segmented character images and reference prototypes of codewords.

Empirically speaking, recognition using only visual evidence, i.e. the distance between a shape-based feature vector extracted from an input image and reference prototypes, cannot always be successful due to the limitations of the information and resources available during training. If a word recognition engine operates on shape-based features only, the true answer cannot always be found with high confidence, because some of the characters in a word may have poor shapes, as seen in the images of the last line of Figure 1. However, "Amherst" and "Buffalo" can still be recognized if the characters whose images have sharp shape features are unique in the lexicon (for example, based on character recognition of "A", "t", "B" and "f" in those images). To reach higher performance, an additional lexical similarity measurement that describes the uniqueness of characters in a lexicon must be considered when deciding whether to accept a lexicon entry as a valid answer. More generally, capturing the decision environment of the recognizer is required to overcome the incompleteness of shape-based recognition. In this paper, we present a method to capture the dynamism of the decision environment, represented by the class (character) similarity of a lexicon and the reliability of the character recognizer.

Figure 1. Handwritten words in different writing styles.

2 Methodology

A lexicon driven word recognizer finds the best matching word in a given lexicon.
In analytical approaches, word recognition engines find the best matches between lexicon entries and an array of pre-segmented strokes of word images (Figure 2). Comparisons are made between shape-oriented feature vectors extracted from possible combinations of segments and reference prototypes of codewords to generate matching scores. A global optimum path is obtained for each lexicon entry, and the corresponding confidence value of the path is provided as the recognition result [2].

Figure 2. Example of a stroke segmentation array: (a) contour representation of the input image and (b) segmented strokes and combinations of subsets of strokes.

If the shape features of an input image are very sharp, the image is easily recognized as the true answer present in a given lexicon (images in the first line of Figure 1). However, if the shape features are not so close to the prototypes, as in the images shown in the last line of Figure 1, more attention is required to accept the true words. For instance, the image of "Amherst" in the last line can be recognized when there is no ambiguity at the position of "r" in the given lexicon, since the visual evidence of the isolated character image at that position does not match the general shape of "r" well. If the lexicon entries are given as the two choices "Amherst" and "Buffalo", we can easily choose "Amherst" as the answer based on the strong shape-oriented recognition evidence of "A", "h" and "t". If the lexicon is given as "Amherst" and "Amheist", we cannot choose "Amherst" as the final answer.

2.1 Similarity Measure

Lexical similarity (in distance) between two finite character arrays can be measured by the edit distance [3]. Given two strings, the edit distance is defined as the minimum cost of transforming one string into the other through a sequence of weighted edit operations. The edit operations are usually defined in terms of insertion, deletion, and substitution of symbols. However, for lexical similarity measurement, especially in segmentation driven word recognition, a simple cost model between characters, such as the confusion table of a character recognizer, is not accurate enough to express the actual cost of character transformation, because each character recognition is performed on a set of grouped segmented strokes called a segment frame. The similarity measurement of a lexicon entry depends not only on the lexicon but also on the separation characteristics of the character recognizer. If the recognizer being employed has high inter-class confusion for particular character pairs, the similarity (from the point of view of recognition separation) of entries that contain those confusion pairs in their character arrays is very high.

We extend the edit distance by adding three edit operations, splitting, merging and converting, in order to let it work in a segment frame. The edit distance is weighted by the inter-class confusion of the character recognizer, since inter-class confusion reflects the imperfections of the character recognizer.
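For reference, the plain string edit distance of [3], on which Section 2.1 builds, can be computed with the standard dynamic program. The sketch below uses uniform unit costs of our own choosing; the recognizer-weighted, segment-frame operations introduced later in Section 3.2 are not modeled here.

```python
def edit_distance(a, b, ins=1.0, dele=1.0, sub=1.0):
    """Classical weighted edit distance (Wagner-Fischer dynamic program)."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + dele
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # deletion
                          d[i][j - 1] + ins,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # splitting/merging/converting of Section 2.1 are not modeled here
    return d[m][n]

print(edit_distance("Amherst", "Amheist"))  # -> 1.0
```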
3 Framework

3.1 Recognition Confidence

Consider a lexicon $H$ with $N_H$ entries, and a lexicon entry $h^i$ of length $N_{h^i}$ over a finite alphabet, given as $H = \{h^0, h^1, \ldots, h^{N_H-1}\}$ and $h^i = [h^i_0, h^i_1, \ldots, h^i_{N_{h^i}-1}]$, where the class set $C$ is a possible character set including the null, $C = \{c_1, c_2, \ldots, c_{N_c}\}$, and $h^i_j \subseteq C$. Simply put, $h^i$ is a word, and $h^i_j$ is the $j$-th character of the word $h^i$. If upper and lower case letters or similar-shaped characters, such as (0, O), (1, I, l), (2, Z), etc., are allowed to be folded as possible combinations of a lexicon word, $h^i_j$ can be a subset that includes all possible characters at that position.

Let us assume that a stroke segmentation algorithm is applied to a given image and a stroke segment array $s = [s_0, s_1, \ldots, s_{N_s-1}]$ of length $N_s$ is generated, where $s_i$ is a stroke segment. An example is shown in Figure 2. A dynamic programming based word recognizer similar to [2] finds the best matching path of a lexicon entry within the segment array, conforming to rules such as: (i) each character matches at least one stroke segment, (ii) a character can only match within a window, and (iii) lexicon entries whose length is incompatible with $N_s$ and the maximum allowable number of characters for the stroke array are discarded.

The confidence of any $h$ associated with the sub-array between stroke segments $s_a$ and $s_b$ of $s$ is given by

$$A_{a,b}(h) = \min_{\forall c \in h} \psi_h(c) \qquad (1)$$

where $\psi_h(c)$ is the confidence measure function of class $c$ against the pre-built prototypes of a character recognition engine (in our implementation it is a distance metric).
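For example, if folding allows both "0" and "O" at a position, Eq. (1) simply keeps the smallest recognizer distance over the allowed classes. A minimal sketch with made-up distance values, where a plain dictionary stands in for the recognizer output $\psi_h$:

```python
def subset_confidence(char_subset, psi):
    """Eq. (1): distance-based confidence of a (possibly folded) character
    subset on one span of stroke segments -- the best (smallest) distance
    among the classes allowed at that position."""
    return min(psi[c] for c in char_subset)

# Made-up distances of a few classes on one combination of strokes:
psi = {'0': 1.8, 'O': 1.2, 'B': 4.5}
print(subset_confidence({'0', 'O'}, psi))   # -> 1.2
```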
The recognition confidence of a lexicon entry $h$ is given by the minimum distance matching array of the individual class subsets,

$$A(h) = [A_{a_0,b_0}(h_0),\; A_{a_1,b_1}(h_1),\; \ldots,\; A_{a_{N_h-1},\,b_{N_h-1}}(h_{N_h-1})] \qquad (2)$$

where $a_0 = 0$, $b_{N_h-1} = N_s - 1$, and $a_{i+1} - b_i = 1$. The conventional recognition confidence of a lexicon entry is given by the average of the character recognition confidences in the lexicon entry,

$$|A(h)| = \frac{1}{N_h} \sum_{i=0}^{N_h-1} A_{a_i,b_i}(h_i) \qquad (3)$$

3.2 Edit Cost in Segment Frame

We can define the edit operations between the best matching paths of two different lexicon entries, $h^i$ and $h^j$, without altering the segmented stroke frame of the matching, as follows:

nullifying: $r(h^i_m \to \emptyset \mid s_{ab})$
substitution: $r(h^i_m \to h^j_n \mid s_{ab})$
splitting: $r(h^i_m \to [h^j_{n_1} \cdots h^j_{n_2}] \mid s_{ab})$
merging: $r([h^i_{m_1} \cdots h^i_{m_2}] \to h^j_n \mid s_{ab})$
converting: $r([h^i_{m_1} \cdots h^i_{m_2}] \to [h^j_{n_1} \cdots h^j_{n_2}] \mid s_{ab})$

where $r(h^i_m \to h^j_n \mid s_{ab})$ is a transform function from a symbol subset $h^i_m$ to $h^j_n$ within the stroke segment array $s_a$ through $s_b$. Here $m$ and $n$ denote indices of characters within a lexicon entry, i.e. $h^i_m$ denotes the $m$-th character of lexicon entry $i$, where $h^i_m \subseteq C$; $a$ and $b$ are indices of the stroke segment array, and $s_{ab}$ denotes any character combination of stroke segments $s_a$ through $s_b$, where $0 \le a \le b \le N_s - 1$.

Nullifying transforms a character matched to a certain range of stroke segments into the null. Substitution replaces a character match by another within exactly the same matching bound. Splitting splits a character match into several character matches within the segment frame, and merging is the inverse operation of splitting. Converting transforms a string of multiple character matches into another string of matches when the matching bounds of characters in the two strings cross each other within the segment frame.

Let $\varepsilon(c_m \to c_n \mid s_{ab})$ be an elementary transform cost function from $c_m$ to $c_n$, and let $\psi_h(c \mid s_{ab})$ be the character recognition confidence of $c$ within stroke segments $s_a$ through $s_b$. The transform cost functions are defined as

$$\varepsilon(h^i_m \to \emptyset \mid s_{ab}) = \psi_h(h^i_m \mid s_{ab}) \qquad (4)$$

$$\varepsilon(h^i_m \to h^j_n \mid s_{ab}) = \psi_h(h^i_m \mid s_{ab}) - \psi_h(h^j_n \mid s_{ab}) \qquad (5)$$

$$\varepsilon(h^i_m \to [h^j_{n_1} \cdots h^j_{n_2}] \mid s_{ab}) = \psi_h(h^i_m \mid s_{ab}) - \min\big( \psi_h(h^j_{n_1} \mid s_{a a_1}) + \psi_h(h^j_{n_1+1} \mid s_{(a_1+1) a_2}) + \cdots + \psi_h(h^j_{n_2} \mid s_{(a_{n_2-n_1-1}+1)\, b}) \big) \qquad (6)$$

$$\varepsilon([h^i_{m_1} \cdots h^i_{m_2}] \to h^j_n \mid s_{ab}) = \min\big( \psi_h(h^i_{m_1} \mid s_{a a_1}) + \psi_h(h^i_{m_1+1} \mid s_{(a_1+1) a_2}) + \cdots + \psi_h(h^i_{m_2} \mid s_{(a_{m_2-m_1-1}+1)\, b}) \big) - \psi_h(h^j_n \mid s_{ab}) \qquad (7)$$

$$\varepsilon([h^i_{m_1} \cdots h^i_{m_2}] \to [h^j_{n_1} \cdots h^j_{n_2}] \mid s_{ab}) = \min\big( \psi_h(h^i_{m_1} \mid s_{a a_1}) + \cdots + \psi_h(h^i_{m_2} \mid s_{(a_{m_2-m_1-1}+1)\, b}) \big) - \min\big( \psi_h(h^j_{n_1} \mid s_{a a_1}) + \cdots + \psi_h(h^j_{n_2} \mid s_{(a_{n_2-n_1-1}+1)\, b}) \big) \qquad (8)$$

where $a_1, a_2, \ldots, a_{n_2-n_1-1}$ are indices of stroke segments within $s_a \ldots s_b$ satisfying $a \le a_1 \le \cdots \le a_{n_2-n_1-1} \le b$.
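To make the elementary costs concrete, Eq. (5) says that substituting one character match for another over the same stroke span costs the difference of the two distance-based confidences: a well-separated pair yields a strongly negative cost for the better-matching class, while an easily confused pair yields a cost near zero. A minimal sketch with invented distance values; the function names and the lambda standing in for $\psi_h$ are ours, not from the paper.

```python
def substitution_cost(psi, c_from, c_to, a, b):
    """Eq. (5): cost of replacing a character match by another over the same
    stroke span s_a..s_b; psi(c, a, b) is assumed to return the recognizer's
    distance-based confidence of class c on that span (small = good match)."""
    return psi(c_from, a, b) - psi(c_to, a, b)

# Invented distances over one span: 'A' vs 'B' are well separated,
# while 'r' vs 'i' are easily confused by the recognizer.
psi = lambda c, a, b: {'A': 0.6, 'B': 3.1, 'r': 2.4, 'i': 2.1}[c]
print(substitution_cost(psi, 'A', 'B', 0, 2))   # -> -2.5 (strong separation)
print(substitution_cost(psi, 'r', 'i', 3, 5))   # -> ~0.3 (weak separation)
```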
3.3 Matching Distance

A matching transform path from $h^i$ to $h^j$ is defined directly in terms of the transform operations along the path (or trace), $r^{i,j} = [r^{i,j}_0, r^{i,j}_1, \ldots, r^{i,j}_{N_{h^i}-1}]$, where $r^{i,j}$ is a path of transform operations of symbols from $h^i$ to $h^j$ within a segment frame $s$. The associated transform cost of a path is

$$\Gamma(r^{i,j}) = \sum_{k=0}^{N_{h^i}-1} \varepsilon(r^{i,j}_k) \qquad (9)$$

where $k$ indexes the matching bound array of the path, $k \in \{(s_{a_0 b_0}), \ldots, (s_{a_{N_{h^i}-1} b_{N_{h^i}-1}})\}$. Let $r^{*i,j}$ be the transform path of minimum cost from $h^i$ to $h^j$. Then the matching distance of $h^i$ to $h^j$ is defined as

$$\Gamma(r^{*i,j}) = \min_{r^{i,j}} \Gamma(r^{i,j}) \qquad (10)$$

The overall transform cost of a character of a lexicon entry is defined as the average of the minimum transform costs of that character to all corresponding characters of the other lexicon entries in the lexicon.
The overall matching distance of a true character matched to a portion of a segment array can become negative if the corresponding characters of the other lexicon entries matched to the same segment portion are all different and the character recognition engine generates maximum separation between them. Otherwise, if most of the corresponding characters of the other lexicon entries in the same portion are the same true class, or if the classifier does not generate good separation between them, the overall matching distance increases, and for a non-true class it becomes a positive value. Thus, the overall matching distance reflects the importance of a character in a lexicon as well as the degree of separation provided by the recognition engine.

The lexical similarity of a lexicon entry is defined by the overall transform distance array of the character string of the lexicon entry to all the others in the lexicon. The liability of a lexicon entry, $L(h^i \mid H)$, can be evaluated from the lexical similarity as

$$L(h^i \mid H) = -\frac{1}{N_H-1} \sum_{j=0,\, j \ne i}^{N_H-1} \Gamma(r^{*i,j}) \qquad (11)$$

The absolute recognition distance of a lexicon entry in Equation (3) is the minimum distance between the feature vectors and the templates along the best path within the stroke segment array. The lexicon complexity of a lexicon entry, by contrast, reflects the lexical (or character) entropy of the entry projected into the lexicon in terms of matching distance. The overall lexicon complexity is

$$|L(h^i \mid H)| = \frac{1}{N_H-1} \sum_{j=0,\, j \ne i}^{N_H-1} \Gamma(r^{*i,j}) \qquad (12)$$

3.4 Recognition Utility

The recognition utility function $U(h \mid H)$ of each lexicon entry $h$ in a lexicon $H$ is defined by a convex combination of the recognition confidence $A$ and the lexical similarity $L$:

$$U(h \mid H) = A(h) - \sigma L(h \mid H) \qquad (13)$$

where $U(h \mid H) = [U(h^0 \mid H), \ldots, U(h^{N_H-1} \mid H)]$. The parameter $\sigma$ is referred to as the index of rejectivity. Large values of $\sigma$ imply an increased willingness to reject hypotheses; $\sigma = 0$ implies minimal concern for liability, and $\sigma = 1$ implies equal concern for recognition confidence and lexical similarity [4]. If we let $|U|$ be a function which generates an overall recognition utility value for a lexicon entry $h$ as the average of the individual recognition utilities, then $|U|$ becomes

$$|U(h \mid H)| = \frac{1}{N_h} \sum_{i=0}^{N_h-1} U(h_i \mid H) \qquad (14)$$

If we set $\sigma$ to zero, $|U|$ reduces to the conventional method of finding the average distance between a lexicon entry and a given image using a character recognizer that provides distances in feature space.
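Reading Eqs. (11), (13) and (14) together, the decision stage can be sketched as follows. The function names, the symbol Gamma for the minimum-cost transform distance, and the toy numbers are ours; gamma(i, j) is assumed to be available from the segment-frame matching of Section 3.3.

```python
def liability(i, gamma, n_entries):
    """Eq. (11): liability of lexicon entry i, the negated average of the
    minimum-cost transform distances Gamma(r*_{i,j}) to all other entries."""
    return -sum(gamma(i, j) for j in range(n_entries) if j != i) / (n_entries - 1)

def recognition_utility(avg_confidence, liab, sigma):
    """Eqs. (13)/(14): convex combination of the average recognition
    confidence (Eq. 3) and the liability; sigma is the index of rejectivity
    (sigma = 0 reduces to the conventional average-distance decision)."""
    return avg_confidence - sigma * liab

# Toy two-entry lexicon with invented pairwise transform distances and
# invented average character-recognition distances (Eq. 3):
pairwise = {(0, 1): -1.2, (1, 0): 0.9}
gamma = lambda i, j: pairwise[(i, j)]
avg_conf = [1.4, 1.6]
for i in range(2):
    print(i, recognition_utility(avg_conf[i], liability(i, gamma, 2), sigma=0.7))
```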
4 Implementation

A segmentation based word recognizer is chosen as the base implementation model of our approach. After preprocessing, a character segmentation algorithm is applied to a given word image and a character recognizer is applied to each segment, assuming that perfect character separation is achieved. However, since perfect character segmentation is not always guaranteed, an over-segmentation strategy (called stroke segmentation) is usually adopted to avoid character segmentation errors that would otherwise feed forward into the character recognizer. After the stroke segmentation procedure, possible combinations of segmented strokes are sent to a character recognizer to find matching scores of characters, and these scores are used as the metric in the subsequent word matching. Various word matching schemes have been implemented according to the metric used in character classification and the method of handling a lexicon. In our implementation, a matching scheme using dynamic programming and a distance metric is used to find the best match between a symbol string and the segmented strokes. Our implementation follows the flow of segmentation and recognition after preprocessing.

4.1 Stroke Segmentation

In the segmentation stage, word components are split into more detailed symbolic primitives using the method described in [5]. An example of segmentation results is shown in Figure 2. Then possible combinations of stroke primitives are sent to a character recognizer.

Figure 3. Stroke segmentation: (a) segmented primitive array and (b) prime stroke period estimation.
A prime stroke analysis, introduced in [5], is used as a basic parameter provider. A dominant stroke interval period, the prime spatial period, is estimated in a given word image and used as a reference for choosing the character matching window size. For computational efficiency, a permissible window size for character matching is generated as a function of the number of characters in a lexicon entry, the number of prime segments in the given image, and the prime stroke period extracted from the image.

4.2 Features

Features are generated directly from a pixel based contour representation of a character [6]. A given image is divided into $N_f$ by $N_f$ grids of equal area, where $N_f$ is the number of divisions. An $N_g$-slot histogram of the gradient of contour components is obtained at each grid, and the array of histogram measurements over all grids is used as a feature vector. Two global features, the aspect ratio of the bounding box and the vertical/horizontal component ratio over the overall contours, are included in the feature vector in order to prevent mis-classification of non-character components. A $(N_f \times N_f \times N_g + 2)$-dimensional feature vector is extracted for each image.

In order to generate prototype templates, feature vectors are extracted from a training character set using the same feature extraction method, and the extracted feature vectors belonging to each class are clustered independently using a K-means clustering algorithm. The clustering procedure is complete when the mean square error of the clusters to their centers is less than a predetermined threshold or the number of centers reaches the maximum allowable number of clusters.
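The prototype generation step of Section 4.2 can be sketched as below: per class, K-means is run with a growing number of centers until the mean squared error to the centers drops below the threshold or the maximum number of clusters is reached. This is a minimal interpretation under stated assumptions (plain Lloyd iterations, random initialization); names and parameters are ours.

```python
import numpy as np

def class_prototypes(vectors, max_clusters=30, mse_threshold=0.1, iters=20, seed=0):
    """Prototype templates for one character class, in the spirit of Section 4.2:
    increase the number of K-means centers until the mean squared error to the
    centers falls below mse_threshold or max_clusters is reached."""
    rng = np.random.default_rng(seed)
    x = np.asarray(vectors, dtype=float)
    for k in range(1, max_clusters + 1):
        centers = x[rng.choice(len(x), size=k, replace=False)]
        for _ in range(iters):                      # plain Lloyd iterations
            d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = x[labels == j].mean(axis=0)
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        mse = float(np.mean(d.min(axis=1) ** 2))
        if mse < mse_threshold:
            break
    return centers

# Usage on made-up 4-dimensional feature vectors of one class:
feats = np.random.default_rng(1).normal(size=(200, 4))
print(class_prototypes(feats).shape)
```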
4.3 Matching

The major goal of character matching following stroke segmentation is to provide evidence, in a given metric, for finding the best matches of the given lexicon entries in the subsequent word matching stage. All possible combinations of strokes within a permissible window are sent to a character recognizer, and the character recognizer provides the corresponding confidence projected onto its knowledge base. In our implementation, the distance between the feature vector extracted from a given combination of strokes and the built-in prototypes of the associated character subset is used as the metric.

Following the stroke spacing model described in [5], it is assumed that a character has at least one prime stroke. The number of characters in a lexicon entry and the number of prime primitives in a given image are used to decide the basic size of the matching window for each character. For more meaningful alignment of a sub-array of prime primitives to a character, filtering using a weighted matching window, which is a function of the prime stroke period, is applied to the combination of segmented strokes for a character. A basic permissible window for the $i$-th character $h_i$ of a lexicon entry $h$ on a stroke primitive array $s$ is determined as

$$s_{\min} = \min(i+1,\; N_s - \delta_s (N_h - i)), \qquad s_{\max} = \max(\delta_s (N_h - i),\; N_s - (N_h - i))$$

where $s_{\min}$ and $s_{\max}$ are the minimum and maximum of the window, $N_h$ is the length of the lexicon entry $h$, and $\delta_s$ is the maximum allowable number of strokes for a character match.

The actual distance measurement of a symbol subset $h$ of a lexicon entry, over the combination of all strokes between $s_a$ and $s_b$, is given by the product of the window function and the distance function of the character recognizer,

$$A_{a,b}(h) = \min_{\forall c \in h} \big( \omega \cdot \psi_h(c \mid s_{ab}) \big) \qquad (15)$$

In our experiment, $\psi_h(c \mid s_{ab})$ is the Euclidean distance between the features extracted from the combined image of segments $s_a \ldots s_b$ and the prototypes. A dynamic programming technique is used to find the best fit between the possible symbol strings of a lexicon entry and the segmented stroke frame of a given image. The recognition distance found by the character recognizer is used as the metric for the word recognition distance, and the matching path that generates the minimum distance over a possible symbol string of a lexicon entry is chosen as the best match of the lexicon entry.
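The best-fit search described above can be sketched as a simple dynamic program over segment boundaries. This is a simplified reconstruction: it enforces only the contiguous, at-least-one-segment-per-character constraint and a fixed per-character stroke limit, not the prime-stroke window filtering or weighting of Section 4.3, and char_dist stands in for the (folded-class) character recognizer of Eq. (15).

```python
from functools import lru_cache

def word_match_distance(entry, n_s, char_dist, delta_s=4):
    """Assign each character of `entry` a contiguous run of stroke segments
    (at least 1, at most delta_s), covering all n_s segments, minimizing the
    summed character distances; char_dist(c, a, b) is assumed to return the
    recognizer distance of class c on segments s_a..s_b."""
    n_h = len(entry)

    @lru_cache(maxsize=None)
    def best(k, a):
        # cost of matching characters k.. of the entry to segments a.. of the image
        if k == n_h:
            return 0.0 if a == n_s else float("inf")
        cost = float("inf")
        for b in range(a, min(a + delta_s, n_s)):          # last segment of this character
            cost = min(cost, char_dist(entry[k], a, b) + best(k + 1, b + 1))
        return cost

    return best(0, 0) / n_h                                 # average, as in Eq. (3)

# Usage with a dummy recognizer that ignores the image content:
print(word_match_distance("Amherst", 11, lambda c, a, b: 1.0))
```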
5 Experiment

The proposed recognition utility is applied in a lexicon driven word recognizer for the experiments. For the character recognition engine, with the setting $N_f = 2$ and $N_g = 8$, a total of 36 feature measurements are extracted for a feature vector. Four additional sub-images are generated by quad division using the centroid of the contours of a given character image, and finer-level feature vectors are obtained from these sub-images using the same feature extraction method. Thus a 180-dimensional feature vector in total is used for measuring the feature distance between given images and prototypes. Using isolated character images from word images that are exclusive of the testing word images, a maximum of 30 templates per class were generated for each subnode by a K-means clustering algorithm with a mean square error threshold of 0.1. The training character images were generated from pre-isolated character images, manually segmented and collected from word images taken from address blocks of mail pieces in the USPS mail stream. Word images including city names, state names, and street names, also collected from address blocks of mail pieces in the USPS mail stream and not used for training, were used as testing images. The lexicons were generated as subsets of randomly selected lexicon entries of fixed size (10 and 100) from the pool of all possible city, state, and street names. The true word was always present in the lexicon. The word recognizer achieved a 96.3% top choice correct rate with size 10 lexicons.

Figure 4. Recognition utility: average utility versus rejected lexicon entries for lexicon1 (top choice), lexicon2, and lexicon3.

Figure 4 shows the dynamic characteristics of the distance adjusted by liability exposure in terms of lexicon complexity. The index of rejectivity $\sigma$ is set to 0.7, and the lexicon size is 100. As more lexicon entries are rejected, the distance gap between the first and second choices becomes larger.

Figure 5. Recognition performance: error rate (%) versus reject rate (%), with and without similarity.

Figure 5 shows the overall effect of using lexicon complexity throughout the testing set. Lexicons of size 10 were used, and the index of rejectivity was set to 0.7 when liability (lexicon complexity) was applied. The recognizer using both the character recognition distance and the lexicon complexity (liability) has better performance than the recognizer using only the distance.

6 Conclusion

We have presented a method to capture lexical entropy in order to enhance the recognition performance of a lexicon driven word recognizer which has a segmentation process prior to character recognition. Lexical entropy is evaluated as a parameter, lexicon complexity, for each lexicon entry and is reflected in the decision making as a convex combination with the visual feature distance metric. Experimental results show a marginal improvement of recognition performance. Finding a generalized method for measuring lexical entropy remains a topic for further research.

References

[1] A. W. Senior and A. J. Robinson. An Off-Line Cursive Handwriting Recognition System. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3).

[2] G. Kim and V. Govindaraju. A Lexicon Driven Approach to Handwritten Word Recognition for Real-Time Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4).

[3] R. A. Wagner and M. J. Fischer. The String-to-String Correction Problem. J. Assoc. Comput. Mach., 21(1).

[4] M. A. Goodrich, W. C. Stirling and R. L. Frost. A Theory of Satisficing Decisions and Control. IEEE Transactions on Systems, Man and Cybernetics - Part A, 28(6).

[5] J. Park, V. Govindaraju and S. N. Srihari. Efficient Word Segmentation Driven By Unconstrained Handwritten Phrase Recognition.
In International Conference on Document Analysis and Recognition, Bangalore, India.

[6] H. Freeman. Computer Processing of Line-Drawing Images. Computing Surveys, 6(1):57-97.