White Paper. 3-Heights Document Converter Basics and Applications




Contents

Introduction
  What does a central conversion service do?
  How is the service used?
  What are the benefits of a central service?
  What additional functions does the service offer?
Architecture and performance features
  Quality: reproduction fidelity and ISO conformance
  Robust, unattended operation
  Distributed architecture and scalability
  Performance
  Application integration interfaces
  Extensibility: Document formats and supplemental functions
Product Editions
About PDF Tools AG

Introduction

What does a central conversion service do?

The 3-Heights Document Converter is a central service that converts corporate documents to a uniform, standardized file format: PDF, PDF/A, or TIFF.

Figure 1: A central service for converting documents into a standardized format

Documents for conversion may exist in a wide variety of file formats. Most common are files produced by applications such as Microsoft Word, Excel, PowerPoint, and Visio, as well as emails, text, and scanned image files such as JPEG, TIFF, PNG, GIF, and BMP. Long-term retention and interchange with business partners are the main reasons for converting such documents.

2014 PDF Tools AG, Premium PDF Technology. White Paper 3-Heights Document Converter, page 3/16

How is the service used?

Figure 2: The 3-Heights Document Converter creates PDF/A files from many different file formats and applications for long-term storage

The 3-Heights Document Converter can be used to:

- Make PDF/A copies of all incoming and outgoing documents for the corporate archive
- Archive documents which are produced to support business processes
- Archive all email traffic between the organization and its business partners, including email attachments
- Migrate archives containing digital documents in an obsolescent or proprietary format to a new archive that uses the ISO standard PDF/A format
- Generate PDF documents from business applications centrally, via a web service or a programming interface (API)

What are the benefits of a central service?

The 3-Heights Document Converter pays off even with a small number of workstations, both economically and in output quality. The following table summarizes the benefits of a central service compared with local software:

Scalability
  Client based: Workstation computing power is the limiting factor in conversion performance.
  Server based: Load balancing enables performance to be scaled as desired.

Installation and maintenance
  Client based: Conversion software needs to be deployed to every client and configured by the users.
  Server based: One centrally configured and maintained software installation.

User support
  Client based: Hotline and remote assistance to fix problems on the workstation.
  Server based: Problems can be reproduced in the test environment, then a lasting fix can be implemented in production.

Unattended operation
  Client based: Workstation-based conversion often involves manual user actions (responding to dialogs, acknowledging messages, etc.).
  Server based: The conversion process runs with no manual intervention; the service itself controls native applications as required.

Variety of input formats supported
  Client based: Native applications must be installed on the workstation to handle each proprietary input format, limiting the number of supported formats.
  Server based: Standardized formats are converted directly; proprietary formats need just one centralized installation for all users.

Application versions and configuration differences
  Client based: Workstations may have different application versions installed, with configuration variances.
  Server based: Centrally installed and configured software ensures output documents with uniform and consistent quality.

Robustness
  Client based: Users must deal with possible disruptions caused by installation and configuration problems, or conflicting applications.
  Server based: The service runs each application in its own protected environment, monitors the applications, and automatically restarts them when problems arise.

What additional functions does the service offer?

The primary purpose of the 3-Heights Document Converter is converting digital documents to a uniform, standardized file format such as PDF/A.

Figure 3: Main and ancillary functions of the 3-Heights Document Converter

The service also provides the following supplemental functions:

Digital signature
Applying a digital signature assures the document's integrity and authenticity. A signature may be created according to a specific signature law (e.g. Qualified Electronic Signature, QES), to meet the needs of long-term archiving, or simply to meet the needs of a straightforward document exchange. A time stamp can be applied with or without a digital signature. The system generates digital signatures via a cryptographic infrastructure (USB token, HSM) and using a standard interface (PKCS#11).

Text and barcode recognition
Scanned image files need to be made searchable. The service can use the 3-Heights OCR Enterprise Add-On text recognition service to identify text in an image file and embed it in the converted version, thus making it searchable.

Embedding of metadata
The PDF/A ISO standard requires metadata to be embedded in the document in the form of XMP packets. The service offers this feature.

Embedding of the original documents
The original file can be embedded in the converted file, e.g. an Excel file in the PDF/A file. This is required if the original file contains information that would otherwise be lost in the conversion process, such as Excel formulas. Another example is the embedding of mandatory XML invoice data (e.g. ZUGFeRD). The service implements this feature of the PDF/A-3 standard.

PDF/A validation
For quality assurance purposes, special software is available to check the conformance of PDF/A files with the ISO standard.

Document merging
Documents belonging to the same business case can be merged into a single file or a collection of files, for example merging correspondence into a single dossier.

Stamping
Output documents may sometimes require a stamp or watermark. The service receives the stamp data from an XML file and applies the required stamps to the document.

Container extraction
Files may be packaged in TAR, ZIP, RAR and other containers, especially if they are email attachments. Such containers are often nested, i.e. the files inside are themselves containers. The service is capable of extracting content from nested containers to any depth, and sends the unpacked files for conversion.

Further useful supplemental functions are documented in the User Manual.
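The recursive unpacking described under container extraction can be illustrated with a short sketch. This is a minimal illustration in Python, not the product's implementation: the function name extract_nested and the suffix lists are assumptions made for the example, containers are recognized by file extension only, and RAR is left out because the Python standard library has no RAR reader.

```python
import io
import tarfile
import zipfile

# Hypothetical suffix lists; a real service would support more container types.
ZIP_SUFFIXES = (".zip",)
TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz")


def extract_nested(name, data):
    """Yield (path, content) for every non-container file, to any nesting depth."""
    lower = name.lower()
    if lower.endswith(ZIP_SUFFIXES):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                # Recurse: the member may itself be a container.
                yield from extract_nested(f"{name}/{info.filename}", archive.read(info))
    elif lower.endswith(TAR_SUFFIXES):
        with tarfile.open(fileobj=io.BytesIO(data)) as archive:
            for member in archive.getmembers():
                if member.isfile():
                    content = archive.extractfile(member).read()
                    yield from extract_nested(f"{name}/{member.name}", content)
    else:
        # Not a container: this file would be handed on for conversion.
        yield name, data
```

Checking the extension rather than the file signature is a deliberate choice in this sketch: DOCX and XLSX files are ZIP packages internally, and unpacking them would destroy the document that is meant to be converted.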

Architecture and performance features

The features of the 3-Heights Document Converter are a result of the system architecture chosen by the development team. From the outset, they based their work on the following requirements:

- High-quality document conversion with emphasis on conformity with ISO standards and reproduction fidelity
- Robust, hands-off operation
- High throughput capability
- Performance scalability
- Interfaces for application integration
- Extensible for further file formats and additional functionality

Quality: reproduction fidelity and ISO conformance

The 3-Heights Document Converter comprises the following:

- Native applications for performing conversions, in particular Microsoft Office
- Post-processing software for PDF files directly after being produced by those applications
- Virtual printer driver specifically designed for this purpose: 3-Heights PDF Producer
- Built-in conversion programs for standard formats
- Verification software: 3-Heights PDF Validator

Figure 4: Ensuring high visual fidelity is a key feature of document conversion.

For high reproduction fidelity, the service uses the native application designed for a given file format, Microsoft Office in particular. Alternatives such as Open Office frequently produce rendering discrepancies that fall short of what users expect from converted documents.

Although target formats can often be generated directly from the native applications ("Save As PDF/A"), results tend to lack fidelity of reproduction and standards conformity. When there is a benefit to using an application's built-in functionality, the service does so and then post-processes the application output file for quality assurance purposes. Typically, it will use "Save As PDF" and then convert the resulting PDF to PDF/A format.

With many applications, the print function is the only way of producing a PDF or TIFF file. For optimal document conversion via this route, the service employs a suite of virtual printer drivers specifically developed for this purpose. These are capable of converting directly to the target TIFF or PDF format, rather than via a PostScript driver with its inherent graphical limitations.

The service uses built-in programs such as the 3-Heights Image to PDF Converter, 3-Heights PDF to PDF/A Converter, etc., for the conversion of standard formats such as the raster image formats TIFF, JPEG, PNG, GIF, and BMP as well as EMF and other vector graphic formats, and for converting PDF to PDF/A. Optimal and consistent quality conversion to those formats is thus assured.

Extra functionality for PDF/A validation can be used in situations where verifying conformity with the ISO standard is necessary. Conformity checking is often a prerequisite for linking the service with an archive system that does not have its own validation facility.

Robust, unattended operation

The 3-Heights Document Converter runs native applications like Microsoft Office in the isolated, controlled environment of a Windows Terminal Server session.

Figure 5: The original applications are opened in a Windows Terminal Server session to guarantee robust, stable operation

Although this may seem an exaggerated measure, it is necessary for robust operation. The main reasons for this approach are:

- Preventing interference with interactive users
- Running multiple instances of an interactive application
- Automatic responses to interactive messages from applications
- Monitoring, starting and stopping interactive applications and sessions

Native applications such as Microsoft Office are generally designed for interactive operation. Controlling these via a programming interface comes with the risk that an interactive user may step in during a command sequence. This can seriously interfere with conversion and ultimately cause a failure. This kind of disruption cannot occur within a protected Terminal Server session.

Interactive native applications are usually able to convert only one document at a time. The service accommodates this behavior by serializing the conversion jobs. While this reduces throughput, the system can compensate by starting multiple application instances in separate Terminal Server sessions.

Native applications designed for interactive use may display dialog boxes in the course of processing a document (e.g. when opening a file or printing). These dialogs must be dismissed before processing will continue, so the service watches the application and steps in automatically to close open windows. An interactive session, ideally hosted in a Terminal Server environment, is the only way to implement this functionality.

A corrupt document may cause an interactive native application to freeze or crash. Therefore, the service keeps watching for this kind of event and automatically closes and restarts the application should a problem arise. The service also manages the Terminal Server sessions used for hosting instances of the interactive native applications.
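The monitor-and-restart behavior described above can be pictured as a watchdog around each conversion. The following Python sketch is only an illustration of that idea under simplifying assumptions: the conversion is modeled as an external command, and the function name run_with_watchdog, the timeout, and the retry count are hypothetical, not taken from the product.

```python
import subprocess
import sys


def run_with_watchdog(command, timeout_s, max_attempts=2):
    """Run a conversion command; kill and retry it if it hangs or fails.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            completed = subprocess.run(command, timeout=timeout_s, capture_output=True)
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the hung process at this point.
            continue
        if completed.returncode == 0:
            return attempt
        # A non-zero exit code models a crash; fall through and start again.
    return None


if __name__ == "__main__":
    # A stand-in "conversion" that succeeds immediately.
    print(run_with_watchdog([sys.executable, "-c", "pass"], timeout_s=10))
```

A job that still fails after the last attempt is reported with None, so the caller can set the document aside instead of blocking the queue.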

Distributed architecture and scalability

With its distributed architecture specifically oriented to document processing, the 3-Heights Document Converter is broadly scalable by distributing service tasks among several subsystems.

Figure 6: Distributed architecture of the 3-Heights Document Converter

The key subsystems are:

Dispatcher
The dispatcher is a multi-threaded Windows system service that runs as a single instance on each installation. Its tasks are:

- Accepting conversion jobs and splitting them up for distribution among Worker Sessions that do the conversion work
- Starting and managing Worker Sessions
- Starting, monitoring and controlling interactive applications
- Performing conversions that do not require a Worker Session
- Performing supplemental functions such as text recognition, digital signatures, etc.

Worker Session
The Worker Session is a Windows Terminal Server session; one installation may run several sessions concurrently. A session:

- provides a runtime environment for interactive applications
- isolates multiple instances of an interactive application from one another
- generates output files using the virtual printer driver

Watched Folder Service
This service monitors file system directories, sends conversion jobs to the dispatcher, and saves converted files to an output directory.

Mail Folder Service
This service monitors mail server mailboxes, sends conversion jobs to the dispatcher, and returns converted files as email or stores them in the file system.

Webservice
This service is an Internet Information Server (IIS) extension that accepts conversion jobs from the network and returns converted files to the sender.

OCR Service
A service used by the dispatcher for text recognition. It accepts image files and returns the recognized text.
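The relationship between the dispatcher and its Worker Sessions can be pictured as a shared job queue served by several workers, each of which processes one job at a time. The Python sketch below models this with threads; it is an analogy under stated assumptions, not the product's design. The names dispatch and worker_session and the convert callback are hypothetical.

```python
import queue
import threading


def dispatch(jobs, convert, worker_sessions=4):
    """Distribute jobs among a fixed number of workers and collect the results."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = {}
    lock = threading.Lock()

    def worker_session():
        # Each worker handles its jobs strictly one after another.
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return
            output = convert(job)
            with lock:
                results[job] = output

    threads = [threading.Thread(target=worker_session) for _ in range(worker_sessions)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    names = ["a.docx", "b.xlsx", "c.pptx"]
    print(dispatch(names, lambda n: n.rsplit(".", 1)[0] + ".pdf", worker_sessions=2))
```

Adding workers raises throughput only while there are enough queued jobs to keep them busy, which is why several concurrent clients benefit most from additional sessions.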

The distribution of tasks among subsystems makes the 3-Heights Document Converter broadly scalable. Various configurations are feasible, depending on the required throughput. In a basic configuration, all of the subsystems operate within a single session on one computer. In a complex set-up, subsystems may be hosted on separate hardware. Practical production environments often use Windows Server with Terminal Server sessions, all running on high-performance hardware with multi-core processors.

Performance

The benchmark configuration for measuring the performance of the 3-Heights Document Converter was as follows:

- Hardware: HP Server DL 380p Gen 8
- Virtualization: ESXi/VMware
- Operating system: Windows Server 2012
- Microsoft Office: Office 2013
- Converter Service: 4 Worker Sessions
- Client: Windows 7

Here are the resulting performance figures:

Figure 7: Performance figures. All numbers are seconds per document; each cell reads PDF / Print.

Input    2 pages        10 pages       50 pages       250 pages
DOC      0.42 / 0.55    0.77 / 0.78    2.02 / 1.72    8.5 / 6.4
DOCX     0.4 / 0.53     0.73 / 0.87    2.2 / 1.93
XLS      0.5 / 0.6      1.3 / 0.9      6 / 2.1        8.7
XLSX     0.45 / 0.6     1.2 / 0.85     5.9 / 2.1      8.6
PPT      1 / 1.2        1.3 / 1.6      2.3 / 2.7      6.8 / 7.5
PPTX     0.9 / 1.1      1.2 / 1.5      2 / 2.5        5.7 / 6.5

Further rows, with figures in the order given:

- PDF: 0.11, 0.17, 0.65, 2.2
- PDF tagged: 0.2, 0.8, 5.3
- TIFF bilevel: 0.2, 0.4, 4.1, 6.9*
- TIFF OCR
- TIFF color: 1.4, 1.9

* From Doc
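One way to read the DOC figures above is to split each time into a fixed per-document overhead and a per-page cost. The Python sketch below fits a straight line through the first figure for each page count in the DOC row (0.42, 0.77, 2.02, and 8.5 seconds). The linear model is an assumption made for illustration and is not part of the benchmark; the names fit_line and DOC_TIMES are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


# (pages, seconds per document), first figure per page count in the DOC row
DOC_TIMES = [(2, 0.42), (10, 0.77), (50, 2.02), (250, 8.5)]

overhead_s, per_page_s = fit_line(DOC_TIMES)
print(f"overhead per document: {overhead_s:.2f} s, per page: {per_page_s:.3f} s")
```

Under this model the fixed overhead is about 0.4 seconds and each page adds about 0.03 seconds, so a 2-page document is dominated by overhead while a 250-page document is dominated by page count.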

Several findings emerge from the measured processing times:

1. A significant portion of processing time is consumed by:
   - Starting Office applications
   - Opening documents in Office applications
   - Sending documents over the network
2. Twice as many pages do not necessarily mean twice the processing time; each document has a certain overhead, and content is also a factor. Complex, cross-referenced documents have a somewhat longer overall processing time.
3. Scaling with multiple Worker Sessions pays off when multiple clients use the service concurrently.
4. Faster hardware is beneficial for long documents with numerous pages.

Application integration interfaces

There is a series of interfaces for integrating workstations and servers that host applications used by the 3-Heights Document Converter. The major ones are:

Webservice
The web service presents a SOAP/XML interface. It integrates easily with applications using a WSDL file.

Programming Interface (API)
These are components for integrating the service with applications at the programming level. Java, C, COM and .NET interfaces are all provided. The same components are also available for other platforms such as Linux, SunOS, AIX, HP-UX, Mac OS X, etc.

Command Line Tool
The tool is a standalone program that can be run directly from the command line. Shell commands can then be used to automate processes without requiring a development environment. The command line program is also available for other platforms such as Linux, SunOS, AIX, HP-UX, Mac OS X, etc.

File Explorer add-on
This component is a Windows file explorer extension that lets users convert single files.

Extensibility: Document formats and supplemental functions

Depending on its line of business, an organization may use many document formats. Moreover, a modern archiving system may be seriously challenged by files from older digital archives that were created by specialist applications using proprietary formats (e.g. project planning, design), or obsolescent formats (text processing). Every document format has its own peculiarities, so converting everything into one standard format is not always easy.

Figure 8: Plug-ins can be used to expand the number of formats and functions supported by the 3-Heights Document Converter

Sending documents like these to a central service is made possible by plug-in extensions, which are a feature of the 3-Heights Document Converter architecture. A plug-in is a program that performs a certain conversion process. There is an open interface dedicated to communication between the service and plug-ins. Two examples (COM and .NET) are provided to help developers get up to speed with plug-in programming. Plug-ins are suited to integrating a specific conversion function with the service, extending native system functionality.
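The plug-in idea, one program per conversion process selected by document format, can be shown with a small registry. The Python sketch below illustrates the general pattern only; it does not reproduce the product's plug-in interface, which is provided with COM and .NET examples. The class PluginRegistry, its methods, and the sample .mpp converter are hypothetical.

```python
from typing import Callable, Dict

# A converter takes the input file content and returns the converted content.
Converter = Callable[[bytes], bytes]


class PluginRegistry:
    """Maps file suffixes to the converter plug-in responsible for them."""

    def __init__(self) -> None:
        self._by_suffix: Dict[str, Converter] = {}

    def register(self, suffix: str, converter: Converter) -> None:
        self._by_suffix[suffix.lower()] = converter

    def convert(self, filename: str, data: bytes) -> bytes:
        suffix = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        try:
            converter = self._by_suffix[suffix]
        except KeyError:
            raise ValueError(f"no plug-in registered for {suffix!r}") from None
        return converter(data)


if __name__ == "__main__":
    registry = PluginRegistry()
    # Stand-in plug-in for a proprietary project planning format.
    registry.register(".mpp", lambda data: b"%PDF-" + data)
    print(registry.convert("plan.MPP", b"demo"))
```

Keeping the lookup separate from the converters means a new format can be supported by registering one more plug-in, without touching the code that accepts jobs.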

Product Editions

The 3-Heights Document Converter comes in two different editions, Enterprise and Small-Medium Enterprise. Each edition is intended for a certain purpose. The editions are compared across the following features:

Interfaces
- Watched file folders
- Watched email folders (via IMAP)
- Command line (batch processing)
- Programming interface (API)
- Web service
- Shell extension for Explorer (right-click to convert)

Functions
- OCR (optional in both editions)
- Merge files
- Compliance validation
- Digital signature
- Encryption
- Linearization
- File compression
- Support for metadata
- Custom (e.g. CAD)
- Load balancing
- Script Plugin

Input Formats
- MS Word, Excel, PowerPoint, Visio
- MS Outlook (MSG)
- Simple text
- Word Perfect
- Open Office
- Image formats (BMP, GIF, JPEG, PNG, TIFF etc.)
- Nested containers (ZIP, TAR)
- Websites (URL)
- HTML
- Email, email with attachment
- Links

Output Formats
- TIFF
- PDF/A-1a, PDF/A-1b
- PDF/A-2a, PDF/A-2b, PDF/A-2u
- Zipped (TIFF or PDF)

excl. URLs

About PDF Tools AG

PDF Tools AG counts more than 4,000 companies and organizations in 60 countries among its customers, making it one of the world's leading producers of software solutions and programming components for PDF and PDF/A products.

Dr. Hans Bärfuss, founder and CEO of PDF Tools AG, began using PDF technology in customer projects more than 15 years ago. Since then, the PDF and PDF/A formats have evolved into a powerful, widely used format and ISO standard that can be used for almost any application. During this time, PDF Tools AG has developed into one of the most important companies on the market for PDF technology, and has played a significant part in developing the PDF/A ISO standard for electronic long-term archiving.

The company is the Swiss representative on the ISO committee for PDF/A and PDF, so its knowledge flows directly into product development. The result is high-quality, efficient products based on the 3-Heights philosophy of the development team, which consists of experienced engineers.

The portfolio of PDF Tools AG ranges from components to services through to solutions. The products support the entire document flow, from raw materials to scanning processes through to signing and storage in a legally compliant long-term archive. An advantage of the components and solutions is the broad range of interfaces, which ensure smooth and easy integration into existing environments.

Due to the growing demands of the market, the products are enhanced and refined continuously. Support is provided by the developers themselves, allowing them to identify trends and customer requirements quickly and use this knowledge when planning enhancements and components.

All development activities are performed in-house at PDF Tools AG in Switzerland. The company does not outsource any programming, so that the entire development process can take place centrally in a single location. This helps to ensure the high standards expected by the company, particularly with regard to the 3-Heights technology. The effectiveness of this approach is confirmed by the success of the products on the market. Our customers include well-known global companies from every industry. That is the greatest compliment of all and the perfect motivation to continue shaping the world of PDF and PDF/A.

PDF Tools AG
Kasernenstrasse 1
8184 Bachenbülach
Switzerland
Tel.: +41 43 411 44 51
Fax: +41 43 411 44 55
pdfsales@pdf-tools.com
www.pdf-tools.com

Copyright 2014 PDF Tools AG. All rights reserved. Names and trademarks of third parties are legally protected property. Rights may be asserted at any time. The representation of third-party products and services is exclusively for information purposes. PDF Tools AG is not responsible for the performance and support of third-party products and assumes no responsibility for the quality, reliability, functionality or compatibility of these products and devices.

Whitepaper-DocumentConverter-EN-20141027