EMC CAPTIVA 7 MODULES OVERVIEW

ABSTRACT
This document provides an overview of the EMC Captiva Capture and Advanced Recognition modules, including the image capture, document classification, and data recognition modules, exporters, utilities, administration tools, software development kits, and Professional Services modules. It also lists all modules in version 7 that are ScaleServer aware or that can run as a native Windows service.

EMC PRODUCT DESCRIPTION GUIDE

TABLE OF CONTENTS

EMC CAPTIVA INPUT AND PROCESSING MODULES
  SCANPLUS
  RESCANPLUS
  MULTI-DIRECTORY WATCH (MDW)
  EMAIL IMPORT
  IMAGE PROCESSOR
  IMAGE CONVERTER
  IMAGE DIVIDER
EMC CAPTIVA RECOGNITION MODULES
  CLASSIFICATION
  EXTRACTION
  NUANCE OCR
  EAST EURO / APAC OCR
  PAGE REGISTRATION
  COLLECTOR
EMC CAPTIVA OPERATOR MODULES
  DESKTOP
  CLASSIFICATION EDIT
EMC CAPTIVA EXPORTS
  STANDARD EXPORT
  ODBC EXPORT
EMC CAPTIVA ENTERPRISE EXPORTS
  EMC DOCUMENTUM ADVANCED EXPORT
  EMC DOCUMENTUM APPLICATIONXTENDER EXPORT
  IBM CONTENT MANAGER EXPORT
  IBM-FILENET PANAGON IMAGE SERVICES / CONTENT SERVICES EXPORT
  IBM-FILENET CONTENT MANAGER EXPORT
  OPENTEXT LIVELINK EXPORT
  MICROSOFT SHAREPOINT EXPORT
  SAP ARCHIVE EXPORTER AND AP CONNECT
EMC CAPTIVA DESIGN AND ADMINISTRATION
  DESIGNER
  RECOGNITION DESIGNER
  CAPTUREFLOW SCRIPT EDITOR
  ADMINISTRATOR
  WEB SERVICES
  SOFTWARE DEVELOPMENT KIT (SDK)
EMC CAPTIVA UTILITY MODULES
  COPY
  MULTI
  TIMER
CAPTIVA HIGH AVAILABILITY AND CLUSTERING
  CAPTIVA SCALESERVER
  CAPTIVA CAPTURE SERVER AND MICROSOFT CLUSTERING
EMC PARTNER ADD-ON MODULES AND PRODUCTS
  PRIMEOCR MODULE FOR CAPTIVA CAPTURE
  PDFOPTIMIZER FOR CAPTIVA CAPTURE
  REVEILLE MANAGEMENT CONSOLE FOR CAPTIVA CAPTURE
PROFESSIONAL SERVICES MODULES
  EXTRACTOR MODULE
  BATCH CREATOR
  REPLICATOR
  SORT
  OPEX SCANNER MULTI-DIRECTORY WATCH
  IBM ONDEMAND EXPORT
  IBML IMPORT
  DIGITAL SIGNATURES
SCALESERVER-AWARE MODULES
MODULES RUNNING AS NATIVE WINDOWS SERVICES

EMC CAPTIVA INPUT AND PROCESSING MODULES

SCANPLUS
ScanPlus is a smart client module that enables users to scan hardcopy documents and import image files into Captiva. It uses Pixel Translations' ISIS (Image and Scanner Interface Specification) driver set, the industry-standard interface for high-performance scanners, and supports more than 450 makes and models of scanners. An intuitive, customizable workspace layout lets users resize, move, add, and delete task panels, reorder images in a batch, and modify scanner settings.

RESCANPLUS
RescanPlus is a smart client module that enables users to rescan hardcopy documents, typically to replace poor-quality images. It provides the same basic functionality as the ScanPlus module, including ISIS support.

MULTI-DIRECTORY WATCH (MDW)
The Multi-Directory Watch module monitors multiple directories for new files. When new files are detected in a specified directory, the module creates a new Captiva batch based on the Captiva process defined for that directory. Multi-Directory Watch:
- Runs at intervals as needed. Each time it runs, the module imports the files found in a watched directory into one or more batches until all files are imported.
- Locates images in subdirectories within a watched directory.
- Determines the level at which an image is inserted with each processed file.
- Deletes a file after it is successfully imported into the Captiva system, unless an alternate success path directory is specified.
- Moves files with import errors to a selected error path directory, and logs the errors to an error log file in the same directory.
- Displays errors that occur while importing files in the application window.
- Reads a delimited text file and parses the text into separate Captiva attributes.
- Parses the text of a specified XML file into separate Captiva attributes.
- Runs in unattended mode as a service.
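
The file handling described above for Multi-Directory Watch can be pictured with a short sketch. This is not Captiva code and does not use any Captiva API: the function name poll_watched_directory, the import_file callback (a stand-in for batch creation), and the log file name import_errors.log are all hypothetical, and the sketch assumes the error and success directories sit outside the watched directory.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def poll_watched_directory(
    watch_dir: Path,
    error_dir: Path,
    import_file: Callable[[Path], None],
    success_dir: Optional[Path] = None,
) -> dict:
    """Run one polling pass over watch_dir, including its subdirectories.

    import_file is a hypothetical stand-in for importing a file into a batch;
    it should raise an exception when the import fails.
    Assumes error_dir and success_dir are not inside watch_dir.
    """
    counts = {"imported": 0, "failed": 0}
    error_dir.mkdir(parents=True, exist_ok=True)
    if success_dir is not None:
        success_dir.mkdir(parents=True, exist_ok=True)

    # Snapshot the file list first so moves and deletes do not disturb the walk.
    files = sorted(p for p in watch_dir.rglob("*") if p.is_file())
    for path in files:
        try:
            import_file(path)
        except Exception as exc:
            # Failed import: move the file aside and log the error next to it.
            shutil.move(str(path), str(error_dir / path.name))
            log_path = error_dir / "import_errors.log"  # hypothetical log name
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(f"{path.name}: {exc}\n")
            counts["failed"] += 1
        else:
            # Successful import: delete, unless a success directory was given.
            if success_dir is None:
                path.unlink()
            else:
                shutil.move(str(path), str(success_dir / path.name))
            counts["imported"] += 1
    return counts
```

A scheduler or service wrapper would call this function once per interval for each watched directory. Files with the same name in different subdirectories would collide in the error or success directory, which a production implementation would need to handle.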

EMAIL IMPORT
The Email Import module receives documents in the form of e-mail messages and attachments from a mail server. The module parses each incoming e-mail so that its parts (message body and attachments) can be imported into Captiva as separate items. Email Import is another way to get images and data into your Captiva system. The module can:
- Import any attachment, including non-image files.
- Search multiple e-mail inboxes.
- Import from various e-mail systems.
- Capture information such as recipient, subject, and body as Captiva values to be saved with the batch.

IMAGE PROCESSOR
Image Processor, together with Image Processing profiles, replaces the Auto Annotate and Image Enhancement modules. The Image Processor module lets you apply image filters to detect content, remove distractions such as holes or lines, adjust colors, improve line quality, and correct page properties using Image Processing profiles. You can also integrate third-party image filters, use scripting to apply filters dynamically based on set conditions, and use profile scripting to extend the functionality of existing filters. Image processing filters include the following:
- Detection: barcodes, blank pages, color marks, colorfulness, patch codes
- Removal: background, black bars, holes, lines
- Color Adjustment: adjust overall color, convert specific color, convert to black white, convert to black white advanced, invert black white
- Image Quality: adjust lighting, adjust thickness, remove specks, smooth edges
- Page Correction: crop, deskew, rotate, scale

IMAGE CONVERTER
The Image Converter module converts files from one format or type to another. It implements the following features:
- Performs file conversion. Depending on the image conversion profile defined for a step, conversion can include:
  - Changing image properties, including file format, color format, and compression.
  - Converting non-image files to images and PDF files, and images to PDF files.
  - Generating output files of specific types such as PDF, TIFF, and BMP, with additional options available for each type.
  - Merging single-page files into multi-page documents and splitting multi-page documents into single pages.
  - Merging annotations added to TIFF images by other modules, such as Image Processor or Captiva Desktop, into the output image.
- Supports processing of image and non-image files: converts a wide range of image formats, Microsoft Office documents, PDF, and HTML files.
- Creates thumbnails for all pages with the single-page output type: generates thumbnails of the pages processed.

IMAGE DIVIDER
Image Divider acquires, identifies, and processes multi-page image files. Once Image Divider identifies an incoming file as a multi-page image file, it splits the file into single-page files while preserving the attributes of the original image file.

EMC CAPTIVA RECOGNITION MODULES

CLASSIFICATION
The Classification module is included with Captiva Advanced Recognition and is used to classify documents automatically by assigning each document to a template defined in a project. Documents that are not classified automatically during the classification phase must be classified manually using the Classification Edit module. The classification engine used by the Classification module applies several classification algorithms to the images, including:
- Full Page Image Analysis: Evaluates and compares an entire image to the models stored in each template.
- Handwritten Detection Analysis: Evaluates images to determine the percentage of handwriting they contain. If the percentage is higher than a predefined threshold, the image is classified as handwritten.
- Full Text Analysis: Performs OCR and evaluates the resulting text for keywords, pattern matches, or regular expressions that were defined in a template.
- High Precision Anchors: Selects a feature of an image based on a similar feature that was demarcated on a model image stored in a template.

EXTRACTION
The Extraction module extracts data from each page of a document. It then combines the page-level outputs into a single document. Two technologies are used for data extraction:
- Zonal recognition: extracts data from predefined areas of a page
- Free form recognition: extracts data from the entire page
Several recognition engines are available, supporting almost all languages and types of recognition, including machine print, hand print, checkboxes, 1D barcodes, 2D barcodes, signatures (present or not), and French and US checks (machine print and hand print, MICR/CMC7, and CAR/LAR).

NUANCE OCR
The Nuance OCR module is the standard OCR module included with Captiva Capture. It performs optical character recognition (OCR) on images using up to three engines, and supports 114 languages and a wide variety of output formats, including PDF. Optional licensing is available for handprint and optical mark recognition (OMR).

EAST EURO / APAC OCR
The East Euro / APAC OCR module performs optical character recognition on scanned or imported images and exports the image and index data to more than 25 different word processing and text formats. The module works at any trigger level (all nodes below the trigger level task are combined into a single output file).

PAGE REGISTRATION
The Page Registration module registers images to conform to a template image. You can then perform zonal OCR on static form fields with confidence that the OCR zones will be associated with the correct areas of the scanned images.

COLLECTOR
Collector is part of Production Auto-Learning (PAL). PAL automatically creates new graphic templates using the images collected by Collector. The newly created templates are then used to identify documents automatically.

EMC CAPTIVA OPERATOR MODULES

DESKTOP
Captiva Desktop is a new desktop client in Captiva 7.0 for capture operators. It can be easily customized for all types of capture tasks: image quality review, document assembly, high-speed data entry and indexing, high-speed data correction, and data validation.

CLASSIFICATION EDIT
Classification Edit is included with Captiva Advanced Recognition and enables operators to review and manually classify all documents that were not classified automatically by the Classification module. Operators classify documents by assigning each document to a template that has been defined in a project. Classification Edit is an attended module: an operator interacts with it to move documents on to the next step. Batches selected for processing during production open automatically in the Classification Edit production window, where the operator can complete and correct the automatic classification that was performed during the Classification step.

EMC CAPTIVA EXPORTS

STANDARD EXPORT
The Standard Export module exports data from a batch to a specified destination. The data and images can be exported in any of the supported file formats. Export profiles define the batch processing scenario and the export commands that are applied to the batch. They define filters that select batch nodes for data export based on a filtering condition. The selected nodes are handled with the set of export commands defined for that filter.

ODBC EXPORT
The ODBC Export module saves Captiva values and image data to ODBC-enabled applications, and performs SQL queries and database updates, by using Open Database Connectivity (ODBC) drivers.

EMC CAPTIVA ENTERPRISE EXPORTS

EMC DOCUMENTUM ADVANCED EXPORT
The EMC Documentum Advanced Export module enables users to specify an unlimited number of objects for export, define properties for each one, and export documents to new or existing objects in the Documentum system. The Documentum Advanced Export functionality:
- Processes tasks unattended in production mode.
- Specifies objects and corresponding object properties.
- Designates the owners of newly created documents and of each object, and specifies who can access each document.
- Creates a flexible export list containing objects and renditions. All definitions can be exported to one or more folders and cabinets.
- Implements an object search to find and export documents.
- Supports major, minor, and branch versioning. Exporting a branch version is useful when the current version of the document is locked.
- Specifies the state within a document lifecycle to apply to a document, and specifies an alias set to determine the document permissions.
- Links one or more available documents to several folders during the export.
- Exports multiple renditions of a document.

EMC DOCUMENTUM APPLICATIONXTENDER EXPORT
The EMC Documentum ApplicationXtender Export module sends image data and index data into the ApplicationXtender content management repository. The ApplicationXtender Export module:
- Initiates ApplicationXtender workflows.
- Provides Full Text Indexing module support. The module can route a document to the ApplicationXtender Export Full Text Indexing module.
- Maps Captiva values to the index fields (including multi-valued fields) of a selected document class. Users define this mapping when setting up the module.
- Uses Captiva values to specify output locations. In setup mode, users can specify the desired export location using a combination of hard-coded characters and Captiva values. A Captiva value is a variable used within Captiva to store setup and processing information, including file and image names, module configuration settings, processing statistics, and data captured during processing. Users specify the application name and index fields separately for maximum flexibility.
- Creates any part of the required path that does not already exist.

IBM CONTENT MANAGER EXPORT
Intended specifically for older i-series systems, the IBM Content Manager Export modules export image data and, optionally, index data into an IBM ImagePlus VisualInfo repository. The Captiva Capture tree structure is preserved as folders and documents, while index data is formatted and stored as VisualInfo attributes. Indexing can also be performed before exporting by using the Captiva Capture/IBM CMIP-390 Index module. IBM Content Manager Exporter for Captiva sends data directly to an IBM Content Manager back end.

IBM-FILENET PANAGON IMAGE SERVICES / CONTENT SERVICES EXPORT
The IBM-FileNet Panagon Image Services / Content Services Export module enables you to connect a Captiva server to an IBM-FileNet server and populate your IBM-FileNet system with batch information, including tree structures, documents, and Captiva values.
During module setup, users can select which items to export and assign Captiva values to index or property fields to facilitate later retrieval from the FileNet system.

IBM-FILENET CONTENT MANAGER EXPORT
The IBM-FileNet Content Manager Export module exports Captiva images and data to an IBM-FileNet Content Manager system. You can export and store any type of document and map Captiva Capture values to index fields within document classes.

OPENTEXT LIVELINK EXPORT
OpenText Livelink Export exports image and index data directly to the Livelink Content Management system. The export enables you to specify categories and assign attribute values for each item, and to export captured data to an enterprise workspace, personal workspace, workflow, Object ID, or Volume ID.

MICROSOFT SHAREPOINT EXPORT
The Microsoft SharePoint Export module sends image and index data directly to Microsoft SharePoint 2003, 2007, and 2010.

SAP ARCHIVE EXPORTER AND AP CONNECT
The SAP Archive Export and AP Connect modules export content from Captiva Capture to a content server via the SAP HTTP Content Server Interface, and export administrative data (metadata) from Captiva Capture values to an SAP R/3 system via the SAP ArchiveLink interface. If Archive Export is configured for the Late Archiving mode, only a small amount of data is sent to the SAP R/3 system, linking the existing invoice data to the image on the Content Server. If Archive Export is configured for the Early Archiving mode, all of the name-value pairs defined in the index fields are transferred to the SAP R/3 system.

EMC CAPTIVA DESIGN AND ADMINISTRATION

DESIGNER
Captiva Designer 7.0 is a new design and development tool for creating, configuring, deploying, and testing the capture system end-to-end. It provides a unified development environment that lets you define one or more capture systems. Each capture system is composed of a set of CaptureFlows, profiles, document types, and other service components that make up an end-to-end working system. These reusable service components are configurations that are not specific to any capture process.

RECOGNITION DESIGNER
Captiva Recognition Designer allows project administrators to create, edit, and test recognition projects before uploading them to the production environment. Recognition projects contain instructions for passing documents and their information through the classification, recognition, and validation steps of a process. For additional customization, project administrators can create and edit project scripts and import templates from other projects. Recognition Designer allows users to work in either basic mode using Captiva Capture, or advanced mode using Captiva Advanced Recognition.

CAPTUREFLOW SCRIPT EDITOR
CaptureFlow Script Editor provides the ability to write custom code as part of the CaptureFlow development process. A process model can be associated with a C# or VB.NET project that stores the code-behind for the process steps. A project is added for a CaptureFlow automatically the first time you add scripting for any of its steps. Each step in a process model can be given a script file in the project.

ADMINISTRATOR
Administrator is a browser-based module that is accessed through a Microsoft Internet Explorer browser.
It interacts with the Captiva system on several levels, enabling it to perform necessary administrative tasks. An administrator is able to:
- Configure a ScaleServer group.
- Perform all Captiva server administrative tasks.
- Control all process and batch administration, including monitoring batch traffic and finding batches.
- Perform administrative tasks related to Captiva modules.
- Install and maintain Captiva licenses.
- Configure the logging subsystem to capture informative, real-time data.
- Configure and generate informative, customizable reports.

WEB SERVICES
Web Services provides an XML-based web services framework to support service-oriented architectures (SOA). This architecture enables the Captiva system to be either a consumer or a provider of web services. Captiva web services enable external systems to interact with Captiva processes, and enable Captiva to interact with the workflows of external systems. External systems can also use specific capabilities of individual Captiva modules without using the entire Captiva system. The following modules support web service functionality:
- Web Services Input module: Captiva module that serves as a web services provider, processing SOA requests from external web services consumers.
- Web Services Output module: Captiva module that serves as a web services consumer, using Internet protocols to access the functionality of external SOA participants (web services providers).

SOFTWARE DEVELOPMENT KIT (SDK)
The Captiva Software Developer's Kit (SDK) enables an experienced C or Visual Basic programmer to develop Captiva-compatible modules. The toolkit gives developers access to the files and documentation required to build, in a Microsoft Visual Basic development environment, Captiva-compatible modules that:
- Communicate with one or more Captiva Capture servers.
- Manage files on the Captiva Capture servers.
- Receive tasks from one or more Captiva Capture servers for processing.
- Manage Captiva Capture values.
- Structure and manage batch information in a Captiva Capture tree.
The SDK also includes source code for sample applications, all necessary header files, and a manual in PDF.

EMC CAPTIVA UTILITY MODULES
The following utilities are included with the Captiva Server.

COPY
The Copy utility enables batches to be copied to another Captiva Capture system, a local or network directory, or an FTP site.

MULTI
Multi is a multiple-purpose module that can manipulate nodes within the batch tree, change the effective trigger level in a process, and sound a beep to notify an operator when a module finishes processing all tasks for any number of batches. When launched, Multi runs independently of an operator and receives tasks from the server as they become available.

TIMER
The Timer module changes values of a batch, or a group of batches, at a user-specified time. During setup, rules are created to specify the conditions under which Timer changes values, and the operations Timer performs during production.

CAPTIVA HIGH AVAILABILITY AND CLUSTERING
Captiva uses several technologies to ensure high availability and failover protection.

CAPTIVA SCALESERVER
Captiva ScaleServer technology provides Captiva Capture systems with many benefits, including increased availability, higher productivity, improved workload balancing, and centralized control. In a ScaleServer group, multiple Captiva servers work together as a single capture system, distributing the processing workload. Each server in the ScaleServer group manages its own work, and each client workstation requests work from all available servers. When a module finishes processing, it sends the batch back to the Captiva server where it originated. The multiple servers appear as a single server to the module. Captiva servers share connection information, so a module consumes just one connection license. If a Captiva server becomes unavailable due to a planned or unplanned interruption, the other Captiva servers in the same ScaleServer group automatically continue sending tasks to and accepting tasks from client modules.

CAPTIVA CAPTURE SERVER AND MICROSOFT CLUSTERING
The Captiva server supports both Active/Passive and Active/Active clustering for failover protection. Individual Captiva servers or entire ScaleServer groups can be clustered to provide both high availability and failover.

EMC PARTNER ADD-ON MODULES AND PRODUCTS

PRIMEOCR MODULE FOR CAPTIVA CAPTURE
Prime Recognition, an EMC partner, provides an add-on OCR module for Captiva Capture that is highly accurate and reliable, and that supports high-volume Captiva Capture document capture environments.

PDFOPTIMIZER FOR CAPTIVA CAPTURE
Pdf Optimizer for Captiva Capture is available through EMC Select and is provided by CVISION Technologies.

REVEILLE MANAGEMENT CONSOLE FOR CAPTIVA CAPTURE
Reveille Management Console for Captiva Capture employs preconfigured, application-aware tests for Captiva Capture, Captiva Advanced Recognition, and Captiva Invoice Capture. These tests help organizations create process monitors that test the Captiva server capture process for batches, tasks, and workflow exceptions.

PROFESSIONAL SERVICES MODULES
The following Captiva modules have been developed by the EMC Professional Services Group.

EXTRACTOR MODULE
The Extractor module is primarily used to improve reporting capabilities. Similar to ODBC Export, it can extract any data from a Captiva batch and write it to a database table or flat file. The central benefit is that it runs when you set it to run, instead of having to be triggered by the IPP.
Some additional features include:
- No IPP changes needed to implement this into your existing system
- Ability to run for all batches for a selected Captiva process
- Ability to link to multiple servers and databases
- Built-in scheduler to run at set dates/times or periodically

BATCH CREATOR
The Batch Creator module can be used to move documents from one batch to a new one that is based on a different Captiva process. For example, an index operator who recognizes that documents were scanned into the wrong process can move them to the correct one, or a document can be moved to an error queue in its own new batch so that the rest of the original batch continues downstream. One batch can be split up at any level into any number of new batches using different Captiva processes.

REPLICATOR
The Replicator module is used to duplicate any image or other batch level and move it to a particular part of the existing batch. For example, it can replicate a batch cover sheet so that a copy is added to the beginning of each document before export. If a document needs to be indexed several times with different information, the module can replicate that document the required number of times before it reaches Index.

SORT
The Sort module automatically re-sorts a batch that has been scanned or imported out of order. It relies on an input value for each page and reorganizes the images in numerical or alphabetical order based on that value. This module eliminates the need for manual intervention to re-sort a batch that has come into Captiva in the wrong order.

OPEX SCANNER MULTI-DIRECTORY WATCH
Opex Scanner Multi-Directory Watch (MDW) provides all of the features found in standard MDW, but is specifically built to also import the output file (ODI format) that the Opex scanner generates.

IBM ONDEMAND EXPORT
OnDemand Export sends image and index information into an IBM OnDemand environment.

IBML IMPORT
The IBML Import module provides the connection between Captiva and the IBML Image Track scanner.

DIGITAL SIGNATURES
The Digital Signatures module captures document signatures so that images can be 100 percent verified as the original paper document. Many legal regulations, pursuant to signature law, require the use of qualified electronic signatures, using personal signature cards from accredited trust centers, to confirm the authenticity of digital copies of documents. Digital Signatures support in Captiva consists of two separate modules:
- DSSign: Captiva module that works in conjunction with the scanning process to perform a visual verification on a random sampling basis, by individual signature, or by batch signature. Data objects to be signed are delivered and signature data objects are returned. The signatures are authenticated automatically.
- DSVerify: Captiva module that verifies digital signature authenticity, essentially determining whether a signed document has been altered.

SCALESERVER-AWARE MODULES
The following Captiva 7.0 modules are ScaleServer aware:
- ScanPlus
- ReScanPlus
- Extraction
- Desktop
- NuanceOCR
- PrimeOCR
- Image Divider
- Web Services (In/Out)
- EMC Documentum Export
- IBM-FileNet Content Manager P8 Export
- IBM CommonStore SAP Export
- IBM Content Manager ImagePlus for OS/390 Export
- IBM Content Manager Multi-platform Export
- IBM FAF Index
- Archive Export for SAP ArchiveLink Content Server
- Microsoft SharePoint Export
- Standard Export
- Multi Utility
- Timer Utility
- Image Utility
- Administrator
- ODBC Export
- Page Registration
- ApplicationXtender Export
- OpenText LiveLink Export
- Multi-Directory Watch
- Classification
- Classification Edit

MODULES RUNNING AS NATIVE WINDOWS SERVICES
The following Captiva modules are capable of running as native Windows services in version 7.0:
- Captiva Server
- Image Processor
- Documentum Advanced Export
- IBM Content Manager Export
- ODBC Export
- XML Export
- PrimeOCR
- NuanceOCR
- Web Services (Output/Input)
- Image Divider
- Email Import
- Timer Utility
- Multi Utility
- Classification
- Multi-Directory Watch
- Image Utility
- Extraction

CONTACT US
To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local representative or authorized reseller, or visit us at www.emc.com.

EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. VMware is a registered trademark or trademark of VMware, Inc., in the United States and other jurisdictions. All other trademarks used herein are the property of their respective owners. Copyright 2015 EMC Corporation. All rights reserved. Published in the USA. 02/15 EMC Product Description Guide H6118.3

EMC believes the information in this document is accurate as of its publication date. The information is subject to change without notice.