An Overview of the Tesseract OCR Engine
|
|
|
- Leslie Oliver
- 9 years ago
- Views:
Transcription
1 An Overview of the Tesseract OCR Engine Ray Smith Google Inc. Abstract The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy[1], is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, including in particular the line finding, features/classification methods, and the adaptive classifier. 1. Introduction Motivation and History Tesseract is an open-source OCR engine that was developed at HP between 1984 and Like a supernova, it appeared from nowhere for the 1995 UNLV Annual Test of OCR Accuracy [1], shone brightly with its results, and then vanished back under the same cloak of secrecy under which it had been developed. Now for the first time, details of the architecture and algorithms can be revealed. Tesseract began as a PhD research project [2] in HP Labs, Bristol, and gained momentum as a possible software and/or hardware add-on for HP s line of flatbed scanners. Motivation was provided by the fact that the commercial OCR engines of the day were in their infancy, and failed miserably on anything but the best quality print. After a joint project between HP Labs Bristol, and HP s scanner division in Colorado, Tesseract had a significant lead in accuracy over the commercial engines, but did not become a product. The next stage of its development was back in HP Labs Bristol as an investigation of OCR for compression. Work concentrated more on improving rejection efficiency than on base-level accuracy. At the end of this project, at the end of 1994, development ceased entirely. The engine was sent to UNLV for the 1995 Annual Test of OCR Accuracy[1], where it proved its worth against the commercial engines of the time. In late 2005, HP released Tesseract for open source. It is now available at 2. Architecture Since HP had independently-developed page layout analysis technology that was used in products, (and therefore not released for open-source) Tesseract never needed its own page layout analysis. Tesseract therefore assumes that its input is a binary image with optional polygonal text regions defined. Processing follows a traditional step-by-step pipeline, but some of the stages were unusual in their day, and possibly remain so even now. The first step is a connected component analysis in which outlines of the components are stored. This was a computationally expensive design decision at the time, but had a significant advantage: by inspection of the nesting of outlines, and the number of child and grandchild outlines, it is simple to detect inverse text and recognize it as easily as black-on-white text. Tesseract was probably the first OCR engine able to handle white-on-black text so trivially. At this stage, outlines are gathered together, purely by nesting, into Blobs. Blobs are organized into text lines, and the lines and regions are analyzed for fixed pitch or proportional text. Text lines are broken into words differently according to the kind of character spacing. Fixed pitch text is chopped immediately by character cells. Proportional text is broken into words using definite spaces and fuzzy spaces. Recognition then proceeds as a two-pass process. In the first pass, an attempt is made to recognize each word in turn. Each word that is satisfactory is passed to an adaptive classifier as training data. The adaptive classifier then gets a chance to more accurately recognize text lower down the page. Since the adaptive classifier may have learned something useful too late to make a contribution near the top of the page, a second pass is run over the page, in which words that were not recognized well enough are recognized again. A final phase resolves fuzzy spaces, and checks alternative hypotheses for the x-height to locate smallcap text.
2 3. Line and Word Finding 3.1. Line Finding The line finding algorithm is one of the few parts of Tesseract that has previously been published [3]. The line finding algorithm is designed so that a skewed page can be recognized without having to de-skew, thus saving loss of image quality. The key parts of the process are blob filtering and line construction. Assuming that page layout analysis has already provided text regions of a roughly uniform text size, a simple percentile height filter removes drop-caps and vertically touching characters. The median height approximates the text size in the region, so it is safe to filter out blobs that are smaller than some fraction of the median height, being most likely punctuation, diacritical marks and noise. The filtered blobs are more likely to fit a model of non-overlapping, parallel, but sloping lines. Sorting and processing the blobs by x-coordinate makes it possible to assign blobs to a unique text line, while tracking the slope across the page, with greatly reduced danger of assigning to an incorrect text line in the presence of skew. Once the filtered blobs have been assigned to lines, a least median of squares fit [4] is used to estimate the baselines, and the filtered-out blobs are fitted back into the appropriate lines. The final step of the line creation process merges blobs that overlap by at least half horizontally, putting diacritical marks together with the correct base and correctly associating parts of some broken characters Baseline Fitting Once the text lines have been found, the baselines are fitted more precisely using a quadratic spline. This was another first for an OCR system, and enabled Tesseract to handle pages with curved baselines [5], which are a common artifact in scanning, and not just at book bindings. The baselines are fitted by partitioning the blobs into groups with a reasonably continuous displacement for the original straight baseline. A quadratic spline is fitted to the most populous partition, (assumed to be the baseline) by a least squares fit. The quadratic spline has the advantage that this calculation is reasonably stable, but the disadvantage that discontinuities can arise when multiple spline segments are required. A more traditional cubic spline [6] might work better. Fig. 1. An example of a curved fitted baseline. Fig.1 shows an example of a line of text with a fitted baseline, descender line, meanline and ascender line. All these lines are parallel (the y separation is a constant over the entire length) and slightly curved. The ascender line is cyan (prints as light gray) and the black line above it is actually straight. Close inspection shows that the cyan/gray line is curved relative to the straight black line above it Fixed Pitch Detection and Chopping Tesseract tests the text lines to determine whether they are fixed pitch. Where it finds fixed pitch text, Tesseract chops the words into characters using the pitch, and disables the chopper and associator on these words for the word recognition step. Fig. 2 shows a typical example of a fixed-pitch word. Fig. 2. A fixed-pitch chopped word Proportional Word Finding Non-fixed-pitch or proportional text spacing is a highly non-trivial task. Fig. 3 illustrates some typical problems. The gap between the tens and units of 11.9% is a similar size to the general space, and is certainly larger than the kerned space between erated and junk. There is no horizontal gap at all between the bounding boxes of of and financial. Tesseract solves most of these problems by measuring gaps in a limited vertical range between the baseline and mean line. Spaces that are close to the threshold at this stage are made fuzzy, so that a final decision can be made after word recognition. Fig. 3. Some difficult word spacing. 4. Word Recognition Part of the recognition process for any character recognition engine is to identify how a word should be segmented into characters. The initial segmentation output from line finding is classified first. The rest of the word recognition step applies only to non-fixedpitch text.
3 4.1 Chopping Joined Characters While the result from a word (see section 6) is unsatisfactory, Tesseract attempts to improve the result by chopping the blob with worst confidence from the character classifier. Candidate chop points are found from concave vertices of a polygonal approximation [2] of the outline, and may have either another concave vertex opposite, or a line segment. It may take up to 3 pairs of chop points to successfully separate joined characters from the ASCII set. Fig. 4. Candidate chop points and chop. Fig. 4 shows a set of candidate chop points with arrows and the selected chop as a line across the outline where the r touches the m. Chops are executed in priority order. Any chop that fails to improve the confidence of the result is undone, but not completely discarded so that the chop can be re-used later by the associator if needed. When the A* segmentation search was first implemented in about 1989, Tesseract s accuracy on broken characters was well ahead of the commercial engines of the day. Fig. 5 is a typical example. An essential part of that success was the character classifier that could easily recognize broken characters. 5. Static Character Classifier 5.1. Features An early version of Tesseract used topological features developed from the work of Shillman et. al. [7-8] Though nicely independent of font and size, these features are not robust to the problems found in reallife images, as Bokser [9] describes. An intermediate idea involved the use of segments of the polygonal approximation as features, but this approach is also not robust to damaged characters. For example, in Fig. 6(a), the right side of the shaft is in two main pieces, but in Fig. 6(b) there is just a single piece Associating Broken Characters When the potential chops have been exhausted, if the word is still not good enough, it is given to the associator. The associator makes an A* (best first) search of the segmentation graph of possible combinations of the maximally chopped blobs into candidate characters. It does this without actually building the segmentation graph, but instead maintains a hash table of visited states. The A* search proceeds by pulling candidate new states from a priority queue and evaluating them by classifying unclassified combinations of fragments. It may be argued that this fully-chop-then-associate approach is at best inefficient, at worst liable to miss important chops, and that may well be the case. The advantage is that the chop-then-associate scheme simplifies the data structures that would be required to maintain the full segmentation graph. Fig. 5. An easily recognized word. Fig. 6. (a) Pristine h, (b) broken h, (c) features matched to prototypes. The breakthrough solution is the idea that the features in the unknown need not be the same as the features in the training data. During training, the segments of a polygonal approximation [2] are used for features, but in recognition, features of a small, fixed length (in normalized units) are extracted from the outline and matched many-to-one against the clustered prototype features of the training data. In Fig. 6(c), the short, thick lines are the features extracted from the unknown, and the thin, longer lines are the clustered segments of the polygonal approximation that are used as prototypes. One prototype bridging the two pieces is completely unmatched. Three features on one side and two on the other are unmatched, but, apart from those, every prototype and every feature is well matched. This example shows that this process of small features matching large prototypes is easily able to cope with recognition of damaged images. Its main problem is that the computational cost of computing the distance between an unknown and a prototype is very high.
4 The features extracted from the unknown are thus 3- dimensional, (x, y position, angle), with typically features in a character, and the prototype features are 4-dimensional (x, y, position, angle, length), with typically features in a prototype configuration Classification Classification proceeds as a two-step process. In the first step, a class pruner creates a shortlist of character classes that the unknown might match. Each feature fetches, from a coarsely quantized 3-dimensional lookup table, a bit-vector of classes that it might match, and the bit-vectors are summed over all the features. The classes with the highest counts (after correcting for expected number of features) become the short-list for the next step. Each feature of the unknown looks up a bit vector of prototypes of the given class that it might match, and then the actual similarity between them is computed. Each prototype character class is represented by a logical sum-of-product expression with each term called a configuration, so the distance calculation process keeps a record of the total similarity evidence of each feature in each configuration, as well as of each prototype. The best combined distance, which is calculated from the summed feature and prototype evidences, is the best over all the stored configurations of the class Training Data Since the classifier is able to recognize damaged characters easily, the classifier was not trained on damaged characters. In fact, the classifier was trained on a mere 20 samples of 94 characters from 8 fonts in a single size, but with 4 attributes (normal, bold, italic, bold italic), making a total of training samples. This is a significant contrast to other published classifiers, such as the Calera classifier with more than a million samples [9], and Baird s 100-font classifier [10] with training samples. 6. Linguistic Analysis Tesseract contains relatively little linguistic analysis. Whenever the word recognition module is considering a new segmentation, the linguistic module (mis-named the permuter) chooses the best available word string in each of the following categories: Top frequent word, Top dictionary word, Top numeric word, Top UPPER case word, Top lower case word (with optional initial upper), Top classifier choice word. The final decision for a given segmentation is simply the word with the lowest total distance rating, where each of the above categories is multiplied by a different constant. Words from different segmentations may have different numbers of characters in them. It is hard to compare these words directly, even where a classifier claims to be producing probabilities, which Tesseract does not. This problem is solved in Tesseract by generating two numbers for each character classification. The first, called the confidence, is minus the normalized distance from the prototype. This enables it to be a confidence in the sense that greater numbers are better, but still a distance, as, the farther from zero, the greater the distance. The second output, called the rating, multiplies the normalized distance from the prototype by the total outline length in the unknown character. Ratings for characters within a word can be summed meaningfully, since the total outline length for all characters within a word is always the same. 7. Adaptive Classifier It has been suggested [11] and demonstrated [12] that OCR engines can benefit from the use of an adaptive classifier. Since the static classifier has to be good at generalizing to any kind of font, its ability to discriminate between different characters or between characters and non-characters is weakened. A more font-sensitive adaptive classifier that is trained by the output of the static classifier is therefore commonly [13] used to obtain greater discrimination within each document, where the number of fonts is limited. Tesseract does not employ a template classifier, but uses the same features and classifier as the static classifier. The only significant difference between the static classifier and the adaptive classifier, apart from the training data, is that the adaptive classifier uses isotropic baseline/x-height normalization, whereas the static classifier normalizes characters by the centroid (first moments) for position and second moments for anisotropic size normalization. The baseline/x-height normalization makes it easier to distinguish upper and lower case characters as well as improving immunity to noise specks. The main benefit of character moment normalization is removal of font aspect ratio and some degree of font stroke width. It also makes recognition of sub and superscripts simpler, but requires an additional classifier feature to distinguish some upper and lower case characters. Fig. 7 shows an example of 3 letters in baseline/x-height normalized form and moment normalized form.
5 and data, as well as Luc Vincent, Igor Krivokon, Dar- Shyang Lee, and Thomas Kielbus for their comments on the content of this paper. Fig. 7. Baseline and moment normalized letters. 8. Results Tesseract was included in the 4 th UNLV annual test [1] of OCR accuracy, as HP Labs OCR, but the code has changed a lot since then, including conversion to Unicode and retraining. Table 1 compares results from a recent version of Tesseract (shown as 2.0) with the original 1995 results (shown as HP). All four 300 DPI binary test sets that were used in the 1995 test are shown, along with the number of errors (Errs), the percent error rate (%Err) and the percent change relative to the 1995 results (%Chg) for both character errors and non-stopword errors. [1] More up-to-date results are at Table 1. Results of Current and old Tesseract. Character Word Ver Set Errs %Err %Chg Errs %Err %Chg HP bus bus HP doe doe HP mag mag HP news news total Conclusion and Further Work After lying dormant for more than 10 years, Tesseract is now behind the leading commercial engines in terms of its accuracy. Its key strength is probably its unusual choice of features. Its key weakness is probably its use of a polygonal approximation as input to the classifier instead of the raw outlines. With internationalization done, accuracy could probably be improved significantly with the judicious addition of a Hidden-Markov-Model-based character n-gram model, and possibly an improved chopper. 10. Acknowledgements The author would like to thank John Burns and Tom Nartker for their efforts in making Tesseract open source, the ISRI group at UNLV for sharing their tools 11. References [1] S.V. Rice, F.R. Jenkins, T.A. Nartker, The Fourth Annual Test of OCR Accuracy, Technical Report 95-03, Information Science Research Institute, University of Nevada, Las Vegas, July [2] R.W. Smith, The Extraction and Recognition of Text from Multimedia Document Images, PhD Thesis, University of Bristol, November [3] R. Smith, A Simple and Efficient Skew Detection Algorithm via Text Row Accumulation, Proc. of the 3 rd Int. Conf. on Document Analysis and Recognition (Vol. 2), IEEE 1995, pp [4] P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, Wiley-IEEE, [5] S.V. Rice, G. Nagy, T.A. Nartker, Optical Character Recognition: An Illustrated Guide to the Frontier, Kluwer Academic Publishers, USA 1999, pp [6] P.J. Schneider, An Algorithm for Automatically Fitting Digitized Curves, in A.S. Glassner, Graphics Gems I, Morgan Kaufmann, 1990, pp [7] R.J. Shillman, Character Recognition Based on Phenomenological Attributes: Theory and Methods, PhD. Thesis, Massachusetts Institute of Technology [8] B.A. Blesser, T.T. Kuklinski, R.J. Shillman, Empirical Tests for Feature Selection Based on a Pscychological Theory of Character Recognition, Pattern Recognition 8(2), Elsevier, New York, [9] M. Bokser, Omnidocument Technologies, Proc. IEEE 80(7), IEEE, USA, Jul 1992, pp [10] H.S. Baird, R. Fossey, A 100-Font Classifier, Proc. of the 1 st Int. Conf. on Document Analysis and Recognition, IEEE, 1991, pp [11] G. Nagy, At the frontiers of OCR, Proc. IEEE 80(7), IEEE, USA, Jul 1992, pp [12] G. Nagy, Y. Xu, Automatic Prototype Extraction for Adaptive OCR, Proc. of the 4 th Int. Conf. on Document Analysis and Recognition, IEEE, Aug 1997, pp [13] I. Marosi, Industrial OCR approaches: architecture, algorithms and adaptation techniques, Document Recognition and Retrieval XIV, SPIE Jan 2007,
Tesseract OCR Engine. What it is, where it came from, where it is going. Ray Smith, Google Inc OSCON 2007
Tesseract OCR Engine What it is, where it came from, where it is going. Ray Smith, Google Inc OSCON 2007 Contents Introduction & history of OCR Tesseract architecture & methods Announcing Tesseract 2.00
Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment
2009 10th International Conference on Document Analysis and Recognition Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment Ahmad Abdulkader Matthew R. Casey Google Inc. [email protected]
Automatic License Plate Recognition using Python and OpenCV
Automatic License Plate Recognition using Python and OpenCV K.M. Sajjad Department of Computer Science and Engineering M.E.S. College of Engineering, Kuttippuram, Kerala [email protected] Abstract Automatic
Adjunct Assistant Professor Gayle Rembold Furbert. Typography II. County College of Morris Graphic Design Degree Program
Adjunct Assistant Professor Gayle Rembold Furbert Typography II County College of Morris Graphic Design Degree Program Voice of Type As you look at typefaces, analyze their forms, learn their history and
Determining the Resolution of Scanned Document Images
Presented at IS&T/SPIE EI 99, Conference 3651, Document Recognition and Retrieval VI Jan 26-28, 1999, San Jose, CA. Determining the Resolution of Scanned Document Images Dan S. Bloomberg Xerox Palo Alto
DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
ECE 533 Project Report Ashish Dhawan Aditi R. Ganesan
Handwritten Signature Verification ECE 533 Project Report by Ashish Dhawan Aditi R. Ganesan Contents 1. Abstract 3. 2. Introduction 4. 3. Approach 6. 4. Pre-processing 8. 5. Feature Extraction 9. 6. Verification
ALGEBRA. sequence, term, nth term, consecutive, rule, relationship, generate, predict, continue increase, decrease finite, infinite
ALGEBRA Pupils should be taught to: Generate and describe sequences As outcomes, Year 7 pupils should, for example: Use, read and write, spelling correctly: sequence, term, nth term, consecutive, rule,
Classification of Fingerprints. Sarat C. Dass Department of Statistics & Probability
Classification of Fingerprints Sarat C. Dass Department of Statistics & Probability Fingerprint Classification Fingerprint classification is a coarse level partitioning of a fingerprint database into smaller
Visual Structure Analysis of Flow Charts in Patent Images
Visual Structure Analysis of Flow Charts in Patent Images Roland Mörzinger, René Schuster, András Horti, and Georg Thallinger JOANNEUM RESEARCH Forschungsgesellschaft mbh DIGITAL - Institute for Information
Face detection is a process of localizing and extracting the face region from the
Chapter 4 FACE NORMALIZATION 4.1 INTRODUCTION Face detection is a process of localizing and extracting the face region from the background. The detected face varies in rotation, brightness, size, etc.
How To Filter Spam Image From A Picture By Color Or Color
Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among
SOURCE SCANNER IDENTIFICATION FOR SCANNED DOCUMENTS. Nitin Khanna and Edward J. Delp
SOURCE SCANNER IDENTIFICATION FOR SCANNED DOCUMENTS Nitin Khanna and Edward J. Delp Video and Image Processing Laboratory School of Electrical and Computer Engineering Purdue University West Lafayette,
Word processing software
Unit 244 Word processing software UAN: Level: 2 Credit value: 4 GLH: 30 Assessment type: Relationship to NOS: Assessment requirements specified by a sector or regulatory body: Aim: R/502/4628 Portfolio
Script and Language Identification for Handwritten Document Images. Judith Hochberg Kevin Bowers * Michael Cannon Patrick Kelly
Script and Language Identification for Handwritten Document Images Judith Hochberg Kevin Bowers * Michael Cannon Patrick Kelly Computer Research and Applications Group (CIC-3) Mail Stop B265 Los Alamos
Distributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
New Hash Function Construction for Textual and Geometric Data Retrieval
Latest Trends on Computers, Vol., pp.483-489, ISBN 978-96-474-3-4, ISSN 79-45, CSCC conference, Corfu, Greece, New Hash Function Construction for Textual and Geometric Data Retrieval Václav Skala, Jan
PageX: An Integrated Document Processing and Management Software for Digital Libraries
PageX: An Integrated Document Processing and Management Software for Digital Libraries Hanchuan Peng, Zheru Chi, Wanchi Siu, and David Dagan Feng Department of Electronic & Information Engineering The
Copyright 2011 Casa Software Ltd. www.casaxps.com. Centre of Mass
Centre of Mass A central theme in mathematical modelling is that of reducing complex problems to simpler, and hopefully, equivalent problems for which mathematical analysis is possible. The concept of
Scanning for OCR Text Conversion
Scanning for OCR Text Conversion Background Optical Character Recognition (OCR) is the recognition and interpretation of text in a digital image. Its main application is the translation of printed type
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia
Numeracy and mathematics Experiences and outcomes
Numeracy and mathematics Experiences and outcomes My learning in mathematics enables me to: develop a secure understanding of the concepts, principles and processes of mathematics and apply these in different
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,
Automatic Extraction of Signatures from Bank Cheques and other Documents
Automatic Extraction of Signatures from Bank Cheques and other Documents Vamsi Krishna Madasu *, Mohd. Hafizuddin Mohd. Yusof, M. Hanmandlu ß, Kurt Kubik * *Intelligent Real-Time Imaging and Sensing group,
More Local Structure Information for Make-Model Recognition
More Local Structure Information for Make-Model Recognition David Anthony Torres Dept. of Computer Science The University of California at San Diego La Jolla, CA 9093 Abstract An object classification
Cursive Handwriting Recognition for Document Archiving
International Digital Archives Project Cursive Handwriting Recognition for Document Archiving Trish Keaton Rod Goodman California Institute of Technology Motivation Numerous documents have been conserved
MiSeq: Imaging and Base Calling
MiSeq: Imaging and Page Welcome Navigation Presenter Introduction MiSeq Sequencing Workflow Narration Welcome to MiSeq: Imaging and. This course takes 35 minutes to complete. Click Next to continue. Please
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
Data Storage 3.1. Foundations of Computer Science Cengage Learning
3 Data Storage 3.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List five different data types used in a computer. Describe how
Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.
Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.
DEVELOPMENT OF HYBRID VECTORIZING SOFTWARE FOR DIGITIZATION OF CADASTRAL MAPS
DEVELOPMENT OF HYBRID VECTORIZING SOFTWARE FOR DIGITIZATION OF CADASTRAL MAPS Byoungjun SEO, Jaejoon JEONG, Jaebin LEE and Prof. Yongil KIM, Korea ABSTRACT The Cadastral map is a basic data that prescribes
Reflection and Refraction
Equipment Reflection and Refraction Acrylic block set, plane-concave-convex universal mirror, cork board, cork board stand, pins, flashlight, protractor, ruler, mirror worksheet, rectangular block worksheet,
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
SIGNATURE VERIFICATION
SIGNATURE VERIFICATION Dr. H.B.Kekre, Dr. Dhirendra Mishra, Ms. Shilpa Buddhadev, Ms. Bhagyashree Mall, Mr. Gaurav Jangid, Ms. Nikita Lakhotia Computer engineering Department, MPSTME, NMIMS University
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
Data Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to:
Chapter 3 Data Storage Objectives After studying this chapter, students should be able to: List five different data types used in a computer. Describe how integers are stored in a computer. Describe how
Solving Geometric Problems with the Rotating Calipers *
Solving Geometric Problems with the Rotating Calipers * Godfried Toussaint School of Computer Science McGill University Montreal, Quebec, Canada ABSTRACT Shamos [1] recently showed that the diameter of
Multimedia Document Authentication using On-line Signatures as Watermarks
Multimedia Document Authentication using On-line Signatures as Watermarks Anoop M Namboodiri and Anil K Jain Department of Computer Science and Engineering Michigan State University East Lansing, MI 48824
Linear Programming. Solving LP Models Using MS Excel, 18
SUPPLEMENT TO CHAPTER SIX Linear Programming SUPPLEMENT OUTLINE Introduction, 2 Linear Programming Models, 2 Model Formulation, 4 Graphical Linear Programming, 5 Outline of Graphical Procedure, 5 Plotting
How To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
Chapter 3. Application Software. Chapter 3 Objectives. Application Software
Chapter 3 Objectives Chapter 3 Application Software Identify the categories of application software Explain ways software is distributed Explain how to work with application software Identify the key features
Each function call carries out a single task associated with drawing the graph.
Chapter 3 Graphics with R 3.1 Low-Level Graphics R has extensive facilities for producing graphs. There are both low- and high-level graphics facilities. The low-level graphics facilities provide basic
Fast Sequential Summation Algorithms Using Augmented Data Structures
Fast Sequential Summation Algorithms Using Augmented Data Structures Vadim Stadnik [email protected] Abstract This paper provides an introduction to the design of augmented data structures that offer
Form Design Guidelines Part of the Best-Practices Handbook. Form Design Guidelines
Part of the Best-Practices Handbook Form Design Guidelines I T C O N S U L T I N G Managing projects, supporting implementations and roll-outs onsite combined with expert knowledge. C U S T O M D E V E
Common Tools for Displaying and Communicating Data for Process Improvement
Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot
Index Terms Domain name, Firewall, Packet, Phishing, URL.
BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet
Efficient on-line Signature Verification System
International Journal of Engineering & Technology IJET-IJENS Vol:10 No:04 42 Efficient on-line Signature Verification System Dr. S.A Daramola 1 and Prof. T.S Ibiyemi 2 1 Department of Electrical and Information
Automatic Detection of PCB Defects
IJIRST International Journal for Innovative Research in Science & Technology Volume 1 Issue 6 November 2014 ISSN (online): 2349-6010 Automatic Detection of PCB Defects Ashish Singh PG Student Vimal H.
Adaptive Automated GUI Testing Producing Test Frameworks to Withstand Change
Adaptive Automated GUI Testing Producing Test Frameworks to Withstand Change Abstract Although QA and Development managers may see many challenges in creating an automated GUI testing framework, it is
Thoughts on RFM Scoring
John R. Miglautsch, PDM Thoughts on RFM Scoring John Miglautsch received his Master of Arts from the University of Wisconsin-Madison in 1985, and his Bachelor of Arts with Distinction from UW-Madison in
Digitization of Old Maps Using Deskan Express 5.0
Dražen Tutić *, Miljenko Lapaine ** Digitization of Old Maps Using Deskan Express 5.0 Keywords: digitization; scanner; scanning; old maps; Deskan Express 5.0. Summary The Faculty of Geodesy, University
Keywords image processing, signature verification, false acceptance rate, false rejection rate, forgeries, feature vectors, support vector machines.
International Journal of Computer Application and Engineering Technology Volume 3-Issue2, Apr 2014.Pp. 188-192 www.ijcaet.net OFFLINE SIGNATURE VERIFICATION SYSTEM -A REVIEW Pooja Department of Computer
4. How many integers between 2004 and 4002 are perfect squares?
5 is 0% of what number? What is the value of + 3 4 + 99 00? (alternating signs) 3 A frog is at the bottom of a well 0 feet deep It climbs up 3 feet every day, but slides back feet each night If it started
Low-resolution Character Recognition by Video-based Super-resolution
2009 10th International Conference on Document Analysis and Recognition Low-resolution Character Recognition by Video-based Super-resolution Ataru Ohkura 1, Daisuke Deguchi 1, Tomokazu Takahashi 2, Ichiro
Map/Reduce Affinity Propagation Clustering Algorithm
Map/Reduce Affinity Propagation Clustering Algorithm Wei-Chih Hung, Chun-Yen Chu, and Yi-Leh Wu Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology,
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH Vikas J Dongre 1 Vijay H Mankar 2 Department of Electronics & Telecommunication, Government Polytechnic, Nagpur, India 1 [email protected]; 2
Such As Statements, Kindergarten Grade 8
Such As Statements, Kindergarten Grade 8 This document contains the such as statements that were included in the review committees final recommendations for revisions to the mathematics Texas Essential
A Study of Automatic License Plate Recognition Algorithms and Techniques
A Study of Automatic License Plate Recognition Algorithms and Techniques Nima Asadi Intelligent Embedded Systems Mälardalen University Västerås, Sweden [email protected] ABSTRACT One of the most
Ohio University Computer Services Center August, 2002 Crystal Reports Introduction Quick Reference Guide
Open Crystal Reports From the Windows Start menu choose Programs and then Crystal Reports. Creating a Blank Report Ohio University Computer Services Center August, 2002 Crystal Reports Introduction Quick
Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
Colour Image Segmentation Technique for Screen Printing
60 R.U. Hewage and D.U.J. Sonnadara Department of Physics, University of Colombo, Sri Lanka ABSTRACT Screen-printing is an industry with a large number of applications ranging from printing mobile phone
Fingerprint s Core Point Detection using Gradient Field Mask
Fingerprint s Core Point Detection using Gradient Field Mask Ashish Mishra Assistant Professor Dept. of Computer Science, GGCT, Jabalpur, [M.P.], Dr.Madhu Shandilya Associate Professor Dept. of Electronics.MANIT,Bhopal[M.P.]
Locating and Decoding EAN-13 Barcodes from Images Captured by Digital Cameras
Locating and Decoding EAN-13 Barcodes from Images Captured by Digital Cameras W3A.5 Douglas Chai and Florian Hock Visual Information Processing Research Group School of Engineering and Mathematics Edith
A Dynamic Approach to Extract Texts and Captions from Videos
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
CALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
Oracle8i Spatial: Experiences with Extensible Databases
Oracle8i Spatial: Experiences with Extensible Databases Siva Ravada and Jayant Sharma Spatial Products Division Oracle Corporation One Oracle Drive Nashua NH-03062 {sravada,jsharma}@us.oracle.com 1 Introduction
Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds
Isosceles Triangle Congruent Leg Side Expression Equation Polynomial Monomial Radical Square Root Check Times Itself Function Relation One Domain Range Area Volume Surface Space Length Width Quantitative
Offline Word Spotting in Handwritten Documents
Offline Word Spotting in Handwritten Documents Nicholas True Department of Computer Science University of California, San Diego San Diego, CA 9500 [email protected] Abstract The digitization of written
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India
Chapter 3. Application Software. Chapter 3 Objectives. Application Software. Application Software. Application Software. What is application software?
Chapter 3 Objectives Chapter 3 Application Software Identify the the categories of of application software Explain ways software is is distributed Explain how to to work with application software Identify
In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.
MATHEMATICS: THE LEVEL DESCRIPTIONS In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. Attainment target
MATH 095, College Prep Mathematics: Unit Coverage Pre-algebra topics (arithmetic skills) offered through BSE (Basic Skills Education)
MATH 095, College Prep Mathematics: Unit Coverage Pre-algebra topics (arithmetic skills) offered through BSE (Basic Skills Education) Accurately add, subtract, multiply, and divide whole numbers, integers,
A Stock Pattern Recognition Algorithm Based on Neural Networks
A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo [email protected] Xun Liang [email protected] Xiang Li [email protected] Abstract pattern respectively. Recent
When older typesetting methods gave
Typographic Terms When older typesetting methods gave way to electronic publishing, certain traditional terms got carried along. Today we use a mix of old and new terminology to describe typography. Alignment
CHI-SQUARE: TESTING FOR GOODNESS OF FIT
CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity
System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1
System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect
Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary
Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:
Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
Classifying Manipulation Primitives from Visual Data
Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major
Technical Trading Rules as a Prior Knowledge to a Neural Networks Prediction System for the S&P 500 Index
Technical Trading Rules as a Prior Knowledge to a Neural Networks Prediction System for the S&P 5 ndex Tim Cheno~ethl-~t~ Zoran ObradoviC Steven Lee4 School of Electrical Engineering and Computer Science
Back to Elements - Tetrahedra vs. Hexahedra
Back to Elements - Tetrahedra vs. Hexahedra Erke Wang, Thomas Nelson, Rainer Rauch CAD-FEM GmbH, Munich, Germany Abstract This paper presents some analytical results and some test results for different
Galaxy Morphological Classification
Galaxy Morphological Classification Jordan Duprey and James Kolano Abstract To solve the issue of galaxy morphological classification according to a classification scheme modelled off of the Hubble Sequence,
A Procedure for Classifying New Respondents into Existing Segments Using Maximum Difference Scaling
A Procedure for Classifying New Respondents into Existing Segments Using Maximum Difference Scaling Background Bryan Orme and Rich Johnson, Sawtooth Software March, 2009 Market segmentation is pervasive
FPGA Implementation of Human Behavior Analysis Using Facial Image
RESEARCH ARTICLE OPEN ACCESS FPGA Implementation of Human Behavior Analysis Using Facial Image A.J Ezhil, K. Adalarasu Department of Electronics & Communication Engineering PSNA College of Engineering
Product Overview & New Features
Kofax Transformation Modules 6.0 Product Overview & New Features Product Overview Kofax Transformation Modules streamline the transformation of business documents into structured electronic information
Document Image Retrieval using Signatures as Queries
Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and
Online Farsi Handwritten Character Recognition Using Hidden Markov Model
Online Farsi Handwritten Character Recognition Using Hidden Markov Model Vahid Ghods*, Mohammad Karim Sohrabi Department of Electrical and Computer Engineering, Semnan Branch, Islamic Azad University,
Measurement with Ratios
Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve real-world and mathematical
Analecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
Concepts of digital forensics
Chapter 3 Concepts of digital forensics Digital forensics is a branch of forensic science concerned with the use of digital information (produced, stored and transmitted by computers) as source of evidence
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm
IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349-784X Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode
Year 9 set 1 Mathematics notes, to accompany the 9H book.
Part 1: Year 9 set 1 Mathematics notes, to accompany the 9H book. equations 1. (p.1), 1.6 (p. 44), 4.6 (p.196) sequences 3. (p.115) Pupils use the Elmwood Press Essential Maths book by David Raymer (9H
