OmniPage Capture SDK's enhanced barcode recognition capabilities
Judit Lánczky, Principal Software Engineer
Dr. István Marosi, Senior Project Lead
Nuance Document Imaging Developers Conference 2013
2002-2013 Nuance Communications, Inc. All rights reserved. Page 1
20 new symbologies in OmniPage SDK 19
Support by recognition module: BAR & BAR_AMP / BAR only / BAR_AMP only
- EAN 8/13
- UPC A, UPC E, Code 39, Code 39 EXT, Code 39 NSS
- Bookland
- Databar Limited, Databar Expanded
- Code 32
- Code 128, UCC 128
- EAN 14, SSCC 18
- ITF, Airline 2of5, Standard 2of5, Matrix 2of5, ITF 14
- Italian Postal 2of5
- Codabar
- Code 11
- Code 93
- MSI
- PDF 417, Datamatrix, QR code
- Postnet, Planet, USPS OneCode, Australia Post
- Aztec
- UK Royal Post, Royal Dutch Post, Singapore Post, Denmark Post
- Patch
Barcode types: 1-dimensional barcodes
- Only the widths of the bars and spaces carry information
- Examples: Code 39, GS1 DataBar Limited
Barcode types: 2-dimensional barcodes
- PDF417: information is stacked in multiple rows
- QR code and DataMatrix: information is arranged in a matrix
Barcode types: less-used barcodes
- 2-state barcodes, e.g. Planet
- 4-state barcodes, e.g. Australia Post
- Patch code
Comparison of densities
Test string: XY1234CD5678EF901234
Symbol area in square modules, where:
- the narrow bar width is 1 module
- the wide-to-narrow ratio is 2.5
- for 1D codes, the code height is 15 percent of the symbol length
- for PDF417, the row (cell) height is 3 modules
- for QR code, the cell is 1 x 1 module
Code 39: 16335
Code 128: 6427
PDF 417: 3213
QR code: 441 (a 21 x 21 module symbol)
1D barcode features: start-stop pattern
- Uniquely identifies the symbology and the orientation
- Exception: Code 39 NSS cannot determine the orientation automatically, because reading it left-to-right and right-to-left yields different, but equally valid, strings
1D barcode features: example of a missing start-stop pattern
- Code 39 NSS, upright
- The same barcode upside down is also a valid Code 39 NSS symbol
1D barcode features: checksum
- Symbologies without checksum: Codabar, UPC E, Patch
- Symbologies with optional checksum: Code 39, Code 39 EXT, Code 39 NSS, ITF, Standard 2 of 5, Matrix 2 of 5, MSI
  - No check is done by default; the check digit (if any) is returned as data
  - Use the "Kernel.Ocr.BAR.bar1D.<name>.CDX" settings to enable checking; the check digit is then not returned in the result
  - Warning! The barcode is not recognized if checking is forced on a code without a check digit.
  - Warning! The MSI check digit algorithm is not standardized; we use the Luhn algorithm.
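The two check rules mentioned on this slide can be sketched in C. This is not the SDK's code, and the function names are hypothetical: `luhn_check_digit` computes the Luhn (mod 10) digit that the slide says is used for MSI, and `code39_verify_and_strip` illustrates what enabling a CDX check means for Code 39, assuming the standard modulo 43 check character: the check character is verified and then removed from the returned data.

```c
#include <string.h>

/* Luhn (mod 10) check digit for a string of decimal data digits.
   Starting from the rightmost data digit, every second digit is doubled
   (9 is subtracted when the doubled value exceeds 9), then all are summed.
   Returns -1 if the input contains a non-digit. */
int luhn_check_digit(const char *digits)
{
    int sum = 0;
    int double_it = 1;               /* the rightmost data digit is doubled */
    for (size_t i = strlen(digits); i-- > 0; ) {
        int d = digits[i] - '0';
        if (d < 0 || d > 9)
            return -1;
        if (double_it) {
            d *= 2;
            if (d > 9)
                d -= 9;
        }
        sum += d;
        double_it = !double_it;
    }
    return (10 - sum % 10) % 10;
}

/* Code 39 character values (index = value) used by the mod 43 check. */
static const char CODE39_SET[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%";

static int code39_value(char c)
{
    const char *p = (c != '\0') ? strchr(CODE39_SET, c) : NULL;
    return p ? (int)(p - CODE39_SET) : -1;
}

/* Verifies the trailing mod 43 check character of decoded Code 39 data.
   On success the check character is removed in place and 1 is returned;
   otherwise the data is left unchanged and 0 is returned. */
int code39_verify_and_strip(char *data)
{
    size_t n = strlen(data);
    int sum = 0;
    if (n < 2)
        return 0;
    for (size_t i = 0; i + 1 < n; i++) {
        int v = code39_value(data[i]);
        if (v < 0)
            return 0;
        sum += v;
    }
    if (code39_value(data[n - 1]) != sum % 43)
        return 0;
    data[n - 1] = '\0';              /* check character is not returned */
    return 1;
}
```

For example, `luhn_check_digit("1234")` returns 4, and `code39_verify_and_strip` turns "CODE39W" into "CODE39" (C=12, O=24, D=13, E=14, 3, 9 sum to 75; 75 mod 43 = 32, which is 'W').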
1D barcode features: checksum (continued)
- Symbologies with mandatory checksum: all the rest, e.g. EAN 8/13, UPC A, Code 128, UCC 128, Code 93, Airline 2 of 5, Code 11
  - The checksum is always checked; the check digit is usually not returned
  - For EAN 8/13 and UPC A, the check digit is returned as part of the result
  - For Code 128, the check digit can be returned as part of the result: use the "Kernel.Ocr.BAR.bar1D.C128.CDT" setting
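Two of the mandatory checksums can be sketched as follows (function names hypothetical, not SDK calls): the GS1 mod 10 check digit shared by EAN 8/13 and UPC A, and the Code 128 mod 103 check symbol. The Code 128 function is simplified by assuming the data is encoded entirely in code set B; real encoders switch code sets (e.g. to code set C for digit pairs), which changes the symbol values and therefore the checksum.

```c
#include <string.h>

/* GS1 mod 10 check digit for EAN-8, EAN-13 and UPC-A data digits
   (check digit excluded). Weights alternate 3, 1, 3, ... starting from
   the rightmost data digit. Returns -1 on a non-digit. */
int gs1_check_digit(const char *digits)
{
    int sum = 0;
    int weight = 3;
    for (size_t i = strlen(digits); i-- > 0; ) {
        int d = digits[i] - '0';
        if (d < 0 || d > 9)
            return -1;
        sum += d * weight;
        weight = (weight == 3) ? 1 : 3;
    }
    return (10 - sum % 10) % 10;
}

/* Code 128 check symbol value for data encoded entirely in code set B:
   (Start B value + sum of position * symbol value) mod 103, where
   Start B is 104 and a code set B character's value is ASCII - 32.
   Returns -1 for characters outside code set B (ASCII 32..127). */
int code128b_checksum(const char *text)
{
    long sum = 104;                         /* Start B */
    for (size_t i = 0; text[i] != '\0'; i++) {
        int c = (unsigned char)text[i];
        if (c < 32 || c > 127)
            return -1;
        sum += (long)(i + 1) * (c - 32);    /* positions start at 1 */
    }
    return (int)(sum % 103);
}
```

For example, `gs1_check_digit("400638133393")` returns 1 (EAN-13 4006381333931) and `gs1_check_digit("03600029145")` returns 2 (UPC A 036000291452).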
1D barcode features: character set
- Numeric only, e.g. EAN 8/13, UPC A, UPC E, ITF, Standard 2 of 5, Matrix 2 of 5, Airline 2 of 5
- Numeric with special symbols, e.g. Codabar: 0-9, -, $, :, /, ., + (start-stop characters: A, B, C, D)
- Alphanumeric, e.g. Code 39: 0-9, A-Z, -, ., +, $, /, %, space
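The three character-set classes translate directly into validation rules. A minimal sketch with hypothetical function names; it assumes the Codabar string includes its start and stop characters.

```c
#include <stdbool.h>
#include <string.h>

/* Numeric-only symbologies (EAN, UPC, ITF, 2 of 5 family): digits only. */
bool is_numeric_data(const char *s)
{
    if (*s == '\0')
        return false;
    for (; *s; s++)
        if (*s < '0' || *s > '9')
            return false;
    return true;
}

/* Codabar: start and stop characters from A-D around a body of
   0-9 and the special symbols - $ : / . + */
bool is_codabar_symbol(const char *s)
{
    size_t n = strlen(s);
    if (n < 2)
        return false;
    if (!strchr("ABCD", s[0]) || !strchr("ABCD", s[n - 1]))
        return false;
    for (size_t i = 1; i + 1 < n; i++)
        if (!strchr("0123456789-$:/.+", s[i]))
            return false;
    return true;
}

/* Code 39: 0-9, A-Z, - . + $ / % and space. */
bool is_code39_data(const char *s)
{
    if (*s == '\0')
        return false;
    for (; *s; s++)
        if (!strchr("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-.+$/% ", *s))
            return false;
    return true;
}
```

Note that lowercase letters fail `is_code39_data`; they can only be carried by Code 39 EXT, Code 128 or Code 93.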
1D barcode features: character set (continued)
- Full 128-character ASCII: Code 39 EXT, Code 128, Code 93
- Text with multiple lines: how to work with multiple barcodes in a zone? (flags in LETTER::makeup)
  - R_ENDOFLINE marks the last character of a line
  - R_ENDOFPARA marks the end of a barcode
  - R_ENDOFZONE marks the last barcode in the zone
- Binary mode (no code conversion is done):
  - "Kernel.OcrMgr.BarBinary" setting
  - DTXT_BINARY output format
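The makeup flags let a client split one zone's result into individual barcodes and lines. The sketch below mocks the SDK types: `MockLetter` is a hypothetical stand-in for `LETTER`, and the `MOCK_ENDOF*` values are placeholders, not the real `R_ENDOFLINE`, `R_ENDOFPARA` and `R_ENDOFZONE` values, which come from the SDK headers.

```c
#include <stddef.h>

/* Placeholder flag values, NOT the SDK's R_ENDOF* values. */
#define MOCK_ENDOFLINE 0x1u   /* last character of a line */
#define MOCK_ENDOFPARA 0x2u   /* end of a barcode */
#define MOCK_ENDOFZONE 0x4u   /* last barcode in the zone */

/* Hypothetical stand-in for LETTER: only the fields this sketch needs. */
typedef struct {
    char code;          /* recognized character */
    unsigned makeup;    /* combination of MOCK_ENDOF* flags */
} MockLetter;

#define MAX_BARCODES 8
#define MAX_TEXT 64

/* Splits a zone's letters into barcodes; lines within one barcode are
   joined with '\n'. Returns the number of barcodes written to out
   (overlong barcodes are truncated to fit MAX_TEXT). */
int split_barcodes(const MockLetter *letters, size_t count,
                   char out[][MAX_TEXT], int max_out)
{
    int nbar = 0;
    size_t len = 0;
    for (size_t i = 0; i < count && nbar < max_out; i++) {
        unsigned m = letters[i].makeup;
        if (len + 2 < MAX_TEXT)
            out[nbar][len++] = letters[i].code;
        if (m & (MOCK_ENDOFPARA | MOCK_ENDOFZONE)) {
            out[nbar++][len] = '\0';   /* barcode complete */
            len = 0;
            if (m & MOCK_ENDOFZONE)
                break;                 /* no more barcodes in this zone */
        } else if ((m & MOCK_ENDOFLINE) && len + 1 < MAX_TEXT) {
            out[nbar][len++] = '\n';   /* line break inside one barcode */
        }
    }
    if (len > 0 && nbar < max_out)     /* close an unterminated barcode */
        out[nbar++][len] = '\0';
    return nbar;
}
```

With letters "AB" (end of line and barcode), "CD" (end of line), "EF" (end of line, barcode and zone), the function returns 2 barcodes: "AB" and "CD\nEF".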
1D barcode features: self-checking capability
- Self-checking: the Hamming distance between character patterns is greater than 1
- Not recommended symbologies:
  - Standard 2 of 5, Matrix 2 of 5, Code 11, MSI: the Hamming distance is 1
  - Code 39 NSS: the Hamming distance is 0 (for upside-down codes)
    - Orientation detection is impossible
    - The barcode type cannot be recognized without start-stop bars
2D barcode features: start-stop pattern
- PDF417, QR code, Data Matrix
2D barcode features: error correction
- Reed-Solomon error correction
- PDF417: 9 levels of error correction; 2 to 512 error detection and correction codewords can be added
- Datamatrix ECC200: tolerates up to 30% damage
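The "2 to 512" range follows from the PDF417 rule that error correction level s (0 to 8) adds 2^(s+1) codewords. A minimal sketch with a hypothetical function name:

```c
/* Number of error correction codewords PDF417 adds at level s:
   2^(s+1), i.e. 2 at level 0 up to 512 at level 8.
   Returns -1 for a level outside 0..8. */
int pdf417_ec_codewords(int level)
{
    if (level < 0 || level > 8)
        return -1;
    return 1 << (level + 1);
}
```

The doubling per level is why the highest levels quickly eat into the symbol's data capacity.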
2D barcode features: error correction (continued)
- Reed-Solomon error correction
- QR code: 4 levels of error correction; a higher level leaves less storage capacity
  - L (Low): 7% of the codewords can be restored
  - M (Medium): 15%
  - Q (Quartile): 25%
  - H (High): 30%
[Figure: QR code at low and at high error correction level]
2D barcode features: character set and maximum capacity
- PDF417: ASCII; 2710 numeric, 1850 alphanumeric or 1108 binary characters
- Datamatrix: ASCII; 3116 numeric, 2335 alphanumeric or 1555 binary characters
- QR code: ISO-8859-1, UTF-8, Shift-JIS, ECI mode supported; 7089 numeric, 4296 alphanumeric or 2953 binary characters
- Binary mode with the "Kernel.OcrMgr.BarBinary" setting
Barcode recognition in OmniPage
- krecinsertzone
  - Filling method: FM_BARCODE or FM_BARCODE2D
  - Recognition module: RM_BAR or RM_BAR_AMP
  - Bounding box: can be the full page!
- krecsetbartypes
  - Array of enabled barcode symbologies
  - Default: the 5 most common 1D barcodes (EAN 8/13, ITF, Code 39, Code 128, Codabar)
- krecrecognize
Incompatible symbologies
Three types of incompatibility:
- Uncombinable: the type must be enabled alone (the image features are different)
  - 2D symbologies, postal family (2/4-state), Patch
- Incompatible: mutually exclusive types (different physical encoding)
  - E.g. ITF and Airline 2of5: same start-stop pattern and same encoding table, but the information is carried in both bars and spaces vs. in bars only
- Inconsistent: the result is ambiguous (different logical encoding)
  - E.g. Code 39 and Code 39 Ext, because of the escape characters in Ext: the same symbol reads "ARM/CHAIR" as Code 39 and "ARM#HAIR" as Code 39 Ext
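The Code 39 / Code 39 Ext ambiguity comes from the Full ASCII escape pairs (`$`, `%`, `/` or `+` followed by a letter). A sketch of the decoding step, based on the standard Code 39 Full ASCII table rather than on the SDK's implementation; the function name is hypothetical.

```c
#include <string.h>

/* Decodes Code 39 Full ASCII ("Code 39 Ext") escape pairs.
   out must hold at least strlen(in) + 1 bytes. Returns the decoded
   length, or -1 for an invalid or incomplete pair. */
int code39ext_decode(const char *in, char *out)
{
    size_t o = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        char c = in[i];
        if (c != '$' && c != '%' && c != '/' && c != '+') {
            out[o++] = c;                          /* plain character */
            continue;
        }
        char x = in[++i];                          /* second character */
        if (x < 'A' || x > 'Z')
            return -1;
        switch (c) {
        case '$':                                  /* $A..$Z -> 0x01..0x1A */
            out[o++] = (char)(x - 'A' + 0x01);
            break;
        case '+':                                  /* +A..+Z -> a..z */
            out[o++] = (char)(x - 'A' + 'a');
            break;
        case '/':                                  /* /A../O -> ! .. / */
            if (x <= 'O')      out[o++] = (char)(x - 'A' + '!');
            else if (x == 'Z') out[o++] = ':';
            else               return -1;
            break;
        default:                                   /* '%' */
            if (x <= 'E')      out[o++] = (char)(x - 'A' + 0x1B); /* ESC..US */
            else if (x <= 'J') out[o++] = (char)(x - 'F' + ';');  /* ; < = > ? */
            else if (x <= 'O') out[o++] = (char)(x - 'K' + '[');  /* [ \ ] ^ _ */
            else if (x <= 'T') out[o++] = (char)(x - 'P' + '{');  /* { | } ~ DEL */
            else if (x == 'U') out[o++] = '\0';                   /* NUL */
            else if (x == 'V') out[o++] = '@';
            else if (x == 'W') out[o++] = '`';
            else               out[o++] = 0x7F;                   /* %X..%Z: DEL */
        }
    }
    out[o] = '\0';
    return (int)o;
}
```

Decoding "ARM/CHAIR" yields "ARM#HAIR", because "/C" is the escape for '#'. Since "%U" decodes to NUL, callers should rely on the returned length rather than on `strlen`.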
New features in OmniPage SDK 19: new functions
- kreccheckbartypes(): an expert system for detecting incompatibilities
  - Input: array of enabled barcode types
  - Use case #1: check whether the barcode types are supported by a given engine
  - Use case #2: check barcode compatibility
  - Use case #3: correct the array of enabled barcodes
  - Modes: static, and dynamic (designed for interactive symbology selection in the UI)
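To illustrate the kind of rules kreccheckbartypes() applies, here is a toy checker built only from the three incompatibility types listed under "Incompatible symbologies". Everything in it is hypothetical: the enum values are not the SDK's barcode type constants, only a few symbologies are modeled, and the real function's signature and behavior (engine support checks, array correction, dynamic mode) are not reproduced.

```c
#include <stddef.h>

/* Hypothetical identifiers for this sketch only, NOT SDK constants. */
typedef enum {
    SYM_EAN, SYM_CODE39, SYM_CODE39_EXT, SYM_CODE128, SYM_ITF,
    SYM_AIRLINE_2OF5, SYM_PDF417, SYM_QR, SYM_POSTNET, SYM_PATCH
} Symbology;

typedef enum {
    COMPAT_OK,
    COMPAT_UNCOMBINABLE,   /* a type that must be enabled alone is mixed */
    COMPAT_INCOMPATIBLE,   /* different physical encoding */
    COMPAT_INCONSISTENT    /* ambiguous logical encoding */
} CompatResult;

/* 2D, postal and Patch types must be enabled alone. */
static int must_be_alone(Symbology s)
{
    return s == SYM_PDF417 || s == SYM_QR || s == SYM_POSTNET || s == SYM_PATCH;
}

static int is_pair(Symbology a, Symbology b, Symbology x, Symbology y)
{
    return (a == x && b == y) || (a == y && b == x);
}

/* Returns the first rule violation found in the enabled-type array. */
CompatResult check_bar_types(const Symbology *types, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (n > 1 && must_be_alone(types[i]))
            return COMPAT_UNCOMBINABLE;
        for (size_t j = i + 1; j < n; j++) {
            if (is_pair(types[i], types[j], SYM_ITF, SYM_AIRLINE_2OF5))
                return COMPAT_INCOMPATIBLE;
            if (is_pair(types[i], types[j], SYM_CODE39, SYM_CODE39_EXT))
                return COMPAT_INCONSISTENT;
        }
    }
    return COMPAT_OK;
}
```

Reporting only the first violation keeps the sketch short; an interactive UI would more likely collect every conflicting pair so it can grey out the affected choices.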
New features in OmniPage SDK 19: new functions (continued)
- krecsetzonebartypes(): sets the barcode types zone by zone; designed for the new form recognition
- krecgetocrzonetext(): returns the OCR result as a string
  - Works for any zone (not only barcode zones)
  - In a binary barcode zone, codes below 0x20 and in the range 0x80..0x9F are escaped as '\xnn'
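The escaping rule for binary barcode zones can be sketched as follows, e.g. for a GS1 group separator (0x1D) in the data. The function name is hypothetical, and two details are assumptions not stated on the slide: the hex digits are written in lowercase, and a literal backslash byte is left unescaped.

```c
#include <stdio.h>

/* Escapes raw barcode bytes: bytes below 0x20 and in 0x80..0x9F become
   "\xnn" (lowercase hex and an unescaped backslash are assumptions).
   out must hold at least 4 * n + 1 bytes. Returns the output length. */
size_t escape_barcode_bytes(const unsigned char *in, size_t n, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char b = in[i];
        if (b < 0x20 || (b >= 0x80 && b <= 0x9F))
            o += (size_t)sprintf(out + o, "\\x%02x", b);  /* 4 characters */
        else
            out[o++] = (char)b;                           /* copied as-is */
    }
    out[o] = '\0';
    return o;
}
```

With this rule, the bytes 'A', 0x1D, 'B', 0x85 become the 10-character string `A\x1dB\x85`.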
New features in OmniPage SDK 19: fast mode
- Faster barcode detection for finding batch separator pages
- Use the new "Kernel.OcrMgr.BarFastMode" setting
- Average time: 6-13 ms (on full pages of text and graphics without barcodes)
- Stricter dimension and quality requirements: does not find small or low-quality barcodes
- Challenges:
New features in OmniPage SDK 19: further new settings
- "Kernel.Ocr.BAR.bar1D.MinLength": minimum length of a valid recognition result (default: 3), to prevent misrecognition
- "Kernel.Ocr.BAR.bar1D.4STATE.AUSPOST.CustomerEncoding":
  - AUSPOST_ENC_CHARACTER (default)
  - AUSPOST_ENC_NUMERIC
  - AUSPOST_ENC_RAW
Some suggestions
- Don't use weak symbologies, if possible (see "Not recommended symbologies" above)
- Use tight zones around barcodes, if possible: recognition is faster and more accurate
- Workflow for recognizing mixed pages (barcode + text):
  - Insert a full-page zone and recognize the barcodes
  - Then put ignore zones over the barcode areas found
  - Then run full-page OCR
- Use 1-dimensional barcodes on separator pages
Thank you