Catalyst CR Document Indexing Policy




While Catalyst CR can accept a wide variety of files for viewing, many formats are not appropriate for full-text indexing. This document sets forth our policy and procedures for indexing files in Catalyst CR.

We index a wide variety of document formats, including most Microsoft Office formats, email formats, WordPerfect and Lotus formats, and general text files. We do not index non-document formats such as image files (which can be loaded and displayed), container files (zip, tar, gzip, etc.), database files, mail archives or system files. These formats are either inappropriate for indexing or should be processed first to extract their contents for indexing.

Section 1 provides a list of standard document formats that generally can be indexed on the site. Section 2 provides examples of file types that are not appropriate for indexing. Section 3 provides special rules for very large files, which require special arrangements for loading and indexing.

If you have questions about this policy, please contact your Catalyst Client Services representative (www.catalystsecure.com, 877.557.4273, info@catalystsecure.com).

Caveat

Just because a file type is appropriate for indexing does not mean that it will be properly indexed. Native files can be corrupt, contain illegal programming characters, or have other issues we cannot describe in advance that may cause our search indexer to fail or to only partially index a file.

You should know that we do not inspect individual files to determine whether they are indexed properly or whether all of the text in the file was indexed. It would be all but impossible to do so.

We do provide a utility that will allow you to view files that had indexing issues. This utility is designed to report when a file cannot be indexed (e.g. an image file, no document found, file can't be indexed). However, the utility does not report instances where some text was indexed but other text could not be. We are not aware of any method to provide this information other than comparing indexed and actual text by hand.

In loading files into Catalyst CR, we do not (and cannot) guarantee that all of the text in any particular document is properly indexed. We use the indexing software provided through our FAST search engine license and index on a "best efforts by the software" basis.

File Formats Accepted for Indexing

Our FAST engine uses the industry-accepted Stellent filters to extract text from documents for indexing and search. While formats are commonly identified by their 3-letter file extension, our system examines each file to determine its actual character regardless of extension.

Catalyst CR will accept most versions of the file formats listed in this section for indexing. If you have a question about a specific format, please ask and we will determine whether that file type is indexable. If not, we can assist you in converting it to a format that is indexable, e.g. PDF, text or HTML.
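To illustrate what it means to examine a file's actual character rather than its extension, here is a minimal Python sketch that checks a file's leading bytes against a few widely documented format signatures. It is an illustration under stated assumptions only: the function name `sniff_format` and the labels are hypothetical, and this is not a description of how the Stellent filters or Catalyst CR identify files.

```python
# Hypothetical illustration only; NOT the Stellent filter logic.
# Each entry pairs a well-known leading byte signature with a label.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"{\\rtf", "rtf"),
    (b"PK\x03\x04", "zip-based container"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2 compound file"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def sniff_format(data: bytes) -> str:
    """Return a format label based on leading bytes, ignoring any file name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    # A PDF renamed to "report.txt" would still be recognized by its content.
    print(sniff_format(b"%PDF-1.4 example content"))
```

A real identification layer would cover far more formats and look deeper than the first few bytes; the point here is only that the decision is driven by content, not by the file name.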

Word Processing Formats:
  Lotus WordPro and related versions
  MacWrite II
  Microsoft Rich Text Format
  Microsoft Word for DOS
  Microsoft Word for Macintosh
  Microsoft Word for Windows
  Microsoft WordPad
  Microsoft Works for DOS
  Microsoft Works for Macintosh
  Microsoft Works for Windows
  Microsoft Write
  Novell/Corel Perfect Works
  Novell/Corel WordPerfect for DOS
  Novell/Corel WordPerfect for Mac
  Novell/Corel WordPerfect for Windows
  Open Office Writer (Text Only)

Spreadsheet Formats:
  Lotus 1-2-3 (DOS & Windows)
  Lotus Symphony
  Microsoft Excel for Macintosh
  Microsoft Excel for Windows
  Microsoft Windows Works
  Microsoft Works (DOS)
  Microsoft Works (Macintosh)
  Open Office Calc (Text Only)
  QuattroPro for DOS & Windows
  StarOffice Calc (Text Only)

Presentation Formats:
  Microsoft PowerPoint for Windows
  Microsoft PowerPoint for Macintosh
  Novell/Corel Presentations
  Open Office Impress (Text Only)
  Star Office Impress (Text Only)

Email Formats:
  MIME (text mail)
  MSG Outlook Mail Message (Windows text only)
  EML (Standards based email formats)

Other Formats:
  PDF (Adobe Acrobat)
  HTML
  Text files (subject to size limitations; see below)
  Microsoft Project (text only)
  vCard Electronic Business Card

As of this writing, Catalyst cannot index Office 2007 formats or Microsoft OneNote files. If you need to index these files, they have to be processed separately and converted to an indexable format.

Formats Not Accepted for Indexing

Certain file formats are not appropriate for indexing in Catalyst CR without additional processing or other special attention. Here are representative examples of formats that will be excluded from indexing. This list is meant for illustrative purposes and is not meant to be comprehensive.

Container files: ZIP, GZIP, LZA, Microsoft Binder, UNIX TAR

Compressed container formats are usually decompressed during processing. We will index the contents of container files but will not index the container files themselves except to present a list of their contents.
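As a rough illustration of presenting a container as a list of its contents, the following Python sketch uses the standard-library `zipfile` module to list the member files of a ZIP archive. The helper names `list_container_contents` and `build_demo_zip` are hypothetical, and the sketch makes no claim about how Catalyst CR processes containers.

```python
import io
import zipfile


def list_container_contents(zip_bytes: bytes) -> list[str]:
    """Return the member file names of a ZIP archive, skipping directories."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


def build_demo_zip() -> bytes:
    """Build a small in-memory ZIP so the example needs no files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("memo.txt", "quarterly results")
        archive.writestr("notes/readme.txt", "see memo")
    return buffer.getvalue()


if __name__ == "__main__":
    # Each listed member would be extracted and indexed as its own document;
    # the archive itself contributes only this listing.
    for name in list_container_contents(build_demo_zip()):
        print(name)
```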

Email Container Formats: PST, OST, NSF and all other mail archives

These formats contain MSG and EML files that should be processed and extracted before they are uploaded into Catalyst CR. We will index the extracted contents of these files but not the containers themselves.

Database Formats: Access, dBase, FoxBase, SQL, Paradox, etc.

Database formats are not appropriate for full-text indexing in Catalyst CR. If you need to review database information, we can assist by creating reports and records that may be appropriate for indexing in CR. Please consult with a Client Services representative for special treatment of these files.

Graphics Formats: BMP, CGM, GIF, JPEG, PCX, PSP, PNG, TIFF, WMF, WPG, SWF

While these formats can be displayed on the site, they do not typically contain text that can be indexed by the system. In some cases, they will be accompanied by text files that can be indexed. TIFF is an example of a file format that is regularly accompanied by a matching text file.

Executables: EXE, DLL and all system or program files

These formats are often excluded during the processing phase and are rarely needed or desired in a document review system.

Unknown files without identifiable extensions or content.

Files that are not indexed in Catalyst CR can still be uploaded to the site if desired. The system will index any related metadata for the file contained in the record associated with it, e.g. Date, Author, Description, Control Numbers.

Let us know if you have special file formats you need indexed for a particular matter. In most cases we can accommodate the request, so long as the format has text to index and can be accessed using our filters.

Special Indexing Rules for Very Large Files

Large text or document files often contain program code, SQL dumps or other content that is not suitable for full-text searching and which can damage our document indexes. To protect against this, and because such files are seldom needed for search, we apply the following rules for very large files:

  Any file that contains over 8 million unique words (any combination of letters or numbers separated by spaces or punctuation) will be truncated at 8 million words.

  Any file that is over 85 megabytes in size will not be indexed. Instead, the file will be flagged as oversized, allowing for special review and treatment.

  Any file where the extracted text is greater than 15 megabytes in size will not be indexed. Instead, the file will be flagged as oversized, allowing for special review and treatment.

To put these rules in perspective, a five hundred page novel typically contains about a megabyte of text. Files containing 15+ megabytes of text are usually system log or core dump files containing substantial amounts of numbers and text that are not suitable for indexing or search.
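The three large-file rules can be sketched as a single decision function. This is a minimal illustration, not Catalyst's implementation, and it rests on several assumptions that the policy does not specify: a megabyte is taken as 1024 × 1024 bytes, text size is measured as UTF-8 bytes, the size checks run before the word check, and every word occurrence is counted (the policy's "unique words" wording is ambiguous). The name `apply_size_rules` and the status labels are hypothetical.

```python
import re

# Thresholds taken from the policy; byte interpretation is an assumption.
MAX_FILE_BYTES = 85 * 1024 * 1024
MAX_TEXT_BYTES = 15 * 1024 * 1024
MAX_WORDS = 8_000_000

# "Any combination of letters or numbers separated by spaces or punctuation"
# (ASCII-only here, which is a simplifying assumption).
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def apply_size_rules(file_size_bytes: int, extracted_text: str,
                     max_words: int = MAX_WORDS) -> tuple[str, str]:
    """Return (status, text_to_index) for one file under the size rules."""
    if file_size_bytes > MAX_FILE_BYTES:
        return ("oversized", "")          # flagged for special review
    if len(extracted_text.encode("utf-8")) > MAX_TEXT_BYTES:
        return ("oversized", "")          # flagged for special review
    count = 0
    for match in WORD_RE.finditer(extracted_text):
        count += 1
        if count > max_words:
            # Keep only the text before the first word past the limit.
            return ("truncated", extracted_text[:match.start()].rstrip())
    return ("indexed", extracted_text)


if __name__ == "__main__":
    # A small limit makes the truncation rule easy to see.
    print(apply_size_rules(100, "alpha beta gamma delta", max_words=2))
```

The `max_words` parameter exists only so the truncation path can be exercised with short inputs; with the default limits, the function applies the thresholds stated in the rules above.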