Redefining High Speed ediscovery Processing & Production
Conversion of the EDRM Enron Dataset from Natives to TIFF images in 5.3 hours (23 Million pages/day rate) using the Lexbe ediscovery Processing System
August 6, 2014
Karsten Weber, Principal, Lexbe LC
ediscovery Webinar Series
Information
Takes place monthly
Covers a variety of relevant ediscovery topics
Presentations available for download by registrants
ediscovery Webinar Series
About Lexbe
Lexbe is an Austin, TX based ediscovery software and services provider.
Lexbe ediscovery Platform: A hosted ediscovery processing and review tool. Users can load a variety of file types, process them for review, OCR them for search, conduct document reviews and productions, prepare for depositions, analyze transcripts, conduct case analytics, prepare for dispositive motions, and provide litigation support during trial.
Lexbe ediscovery Services: Lexbe performs large-volume document culling, processing from native to PDF or TIFF, load file creation, high-volume OCR of image files, Rule 26 and project management consulting, and related ediscovery services.
Lexbe Sales: sales@lexbe.com, (800) 401-7809 x22
ediscovery Webinar Series Questions & Technical Issues If you have any questions or technical issues, please e-mail them to: webinars@lexbe.com Questions will be forwarded to Gene and answered during the webinar or via e-mail if we run out of time.
ediscovery Webinar Series
Karsten Weber bio
Current
- Principal of Lexbe LC
- Principal Architect of Lexbe ediscovery Platform and Lexbe ediscovery Services
Prior Experience
- Consulting Expert, Lumin Expert Group
- Director of Software, nline Corporation
- Software Engineering Manager, KLA-Tencor
Education
- MBA, University of Texas
- M.S. Engineering, Danish Technical University
Contact
Karsten Weber, 512-686-3469, karsten@lexbe.com
High-Speed ediscovery Processing
Executive Summary
Background of ediscovery Processing & Production
ediscovery Review Tools in Use Today
TIFF Popularity and Processing Throughput Challenge
The Lexbe ediscovery Processing System
Test Methodology & the EDRM Enron Data Set
Performance Results
Comparison with a Large Provider Using Traditional Processing Methods
Conclusion
Growth of Data Worldwide
Data Types and Volume Keep Expanding
[Chart: Digital Information Created, Captured, Replicated Worldwide, in zettabytes (axis 1 to 4), 2005 to 2015. 1 zettabyte = 1 trillion gigabytes. Source: IDC Digital Universe Study (2012)]
Data sources shown: VoIP, Email, iPhones, Peer-to-Peer, Online Storage, Digital Cameras, Facebook, LinkedIn, DropBox, Backup Devices, Elastic Storage, SaaS, Google Streets, Personal Blogs, Skype, World Satellite Images, Personal Scanners, Customer Service Recordings, Public Webcams, Google Goggles, Netbooks, Cloud Instance Servers, PaaS
Growth of ediscovery Processing
Data Volume is Rising
[Chart: GBs of ESI in a Typical Commercial Case, low to high, 1995 to 2015]
Enron Criminal Trial (2005): Source ESI was 100M pages (~4 TBs); brought to trial were 1M pages (~40 GBs). Extraordinary at the time, not now.
Microsoft (2011): Collects 45 custodians per matter on average, almost 1 TB per matter on average.
Growth of ediscovery Processing
Processing Costs Are Falling - But Still High
[Chart: Cost per GB to Process ESI in Volume, $0 to $2,000, 2005 to 2015]
$1,800/GB (2006). Source: Forrester Research
$500/GB (2011). Source: Forrester Research
ESI processing costs have fallen 90% in the last 10 years
Growth of ediscovery Processing
ediscovery Market is Big & Growing
ediscovery Software & Services: $5.5 Billion today, growing 15.5% annually, projected $9.8 Billion (2017)
Split: Services (72%), Software (28%)
Source: Complex Discovery (ComplexDiscovery.com), based on a combination of public market sizing estimates.
ediscovery Processing Background
Processing Activities & Functions
Workflow: Setup & Planning, Collection, Culling & Analysis, Processing, Review & Production, Depos & Motions
Collection: Identify and execute retrieval of discoverable documents and electronic evidence.
Culling: Reduces collections using keyword or date range parameters.
Native Processing: Convert native documents (Outlook, Microsoft Office, etc.) into reviewable formats (TIFF, PDF, Near Native). Can include application of OCR to make documents searchable.
Review: Load/ingest ESI into a litigation database to prepare for trial.
Production: Create a production in a specified format and apply Bates numbers. Apply privilege QC procedures to avoid inadvertently producing confidential case documents.
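The culling step above can be illustrated with a minimal sketch. The document structure (`date` and `text` fields), the `cull` function, and the choice to require both a date match and a keyword match are assumptions made for illustration; the slide does not describe how any particular tool combines these filters.

```python
from datetime import date

def cull(docs, keywords, start, end):
    """Keep documents dated within [start, end] that mention any keyword.

    Hypothetical helper: combining the two filters with AND is an assumption.
    """
    lowered = [k.lower() for k in keywords]
    kept = []
    for doc in docs:
        in_range = start <= doc["date"] <= end
        has_keyword = any(k in doc["text"].lower() for k in lowered)
        if in_range and has_keyword:
            kept.append(doc)
    return kept

# Made-up sample collection, not drawn from any real data set.
docs = [
    {"id": 1, "date": date(2001, 5, 1), "text": "Quarterly energy trading report"},
    {"id": 2, "date": date(1998, 1, 1), "text": "Energy trading memo"},
    {"id": 3, "date": date(2001, 6, 1), "text": "Lunch plans"},
]
kept = cull(docs, ["trading"], date(2000, 1, 1), date(2002, 12, 31))
kept_ids = [doc["id"] for doc in kept]
```

Document 2 falls outside the date range and document 3 contains no keyword, so only document 1 survives the cull.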
ediscovery Processing Background
Review Environments and TIFF
TIFF (examples: Concordance, Summation, CaseLogistix, RingTail, iconnect): Currently the most commonly used format/review environment. Must process ESI to single-page TIFFs with text and load files before review.
PDF (examples: WorldDox, Adobe): Requires documents to be converted to PDF for review.
Processed Natives (examples: Relativity, Allegro): Must process ESI into a native load file. Generate near-native HTML for review.
Raw Natives (examples: Lexbe, Digital Warroom, NextPoint): Load raw natives that will be automatically processed within the review software.
ediscovery Processing Background
TIFF Background
TIFF Benefits:
Standardized review format
Page-level Bates stamping can be applied
Addresses concern of opposition altering native files
Easy to redact
TIFF viewer is the only requirement
Often can be hosted & supported internally
A 2013 ILTA (International Legal Technology Association) survey found that the vast majority (91%) of firms still use TIFF-based software.
ediscovery Processing Background
The TIFFing Challenge
Traditional TIFFing methods have been time-consuming and expensive because the process requires considerable computing power.
As data volumes continue to increase, the time and expense associated with TIFFing become more severe.
High-Speed ediscovery Processing
Meeting the Challenge - Study Goals
Evaluate the capabilities of the Lexbe ediscovery Processing System (LEPS) under testable and repeatable conditions.
Use an industry-standard dataset to ensure transparent results. The study was run on the 53 GB EDRM Enron Data Set.
What is the TIFF throughput rate of LEPS?
How automated is LEPS?
What quality control procedures are in place?
How does LEPS compare to current industry leaders?
High-Speed Processing Demonstration
Lexbe Architecture
Scalable: Systems architecture allows LEPS to increase server instances to apply more resources to your processing task.
Automated: LEPS minimizes the need for babysitting.
Fault Tolerant: Processing tasks are not batch-centric, and check-out/check-in procedures ensure individual processing steps operate independently.
Secure Processing Environment: LEPS is powered by Amazon S3 servers to facilitate redundancy and high security standards. All data is strongly encrypted (256-bit) in transit and in place. Our data centers provide SOC 1 and SOC 2 reports published under SSAE 16 and ISAE 3402 professional standards and are ISO 27001 certified.
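The check-out/check-in idea can be sketched as a small in-memory task pool. This is a hypothetical illustration only: the deck does not describe LEPS internals, and the `TaskPool` class, its method names, and the worker names are all assumptions. The point it shows is that when work is tracked per document rather than per batch, a failed worker only returns its own checked-out items to the queue.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class TaskPool:
    """Hypothetical per-document work tracker (not the LEPS implementation)."""
    pending: deque = field(default_factory=deque)
    checked_out: dict = field(default_factory=dict)  # task -> worker
    done: dict = field(default_factory=dict)         # task -> result

    def check_out(self, worker):
        """Hand the next pending task to a worker, or None if nothing is left."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.checked_out[task] = worker
        return task

    def check_in(self, task, result):
        """Record a finished task and clear its check-out."""
        del self.checked_out[task]
        self.done[task] = result

    def release(self, worker):
        """Return every task held by a failed worker to the pending queue."""
        held = [t for t, w in self.checked_out.items() if w == worker]
        for task in held:
            del self.checked_out[task]
            self.pending.append(task)

pool = TaskPool(deque(["a.doc", "b.msg", "c.xls"]))
t1 = pool.check_out("worker-1")   # worker-1 takes a.doc
t2 = pool.check_out("worker-2")   # worker-2 takes b.msg
pool.check_in(t1, "tiff-ok")      # a.doc finishes normally
pool.release("worker-2")          # worker-2 fails; b.msg is re-queued
```

After this sequence only `b.msg` is re-queued; the completed `a.doc` and the untouched `c.xls` are unaffected by the failure.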
High-Speed Processing Demonstration
Lexbe Process
Archive/container decompression
Full-text indexing
File repair
Bates stamping
Metadata extraction & fielding
PDF & TIFF creation
MD5 hash code generation
Placeholder creation
System file identification & DeNIST
Email attachment extraction & parent email association
Native text extraction
OCR of image files
Native extracted, PDF and TIFF loadfile generation in multiple formats: XLSX (Lexbe), DAT/OPT (Case Logistix, Concordance, Ipro Allegro, Ringtail, kCura Relativity) and DII (Summation), and quality control reports
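As an illustration of the MD5 hash code generation step, the sketch below uses Python's standard `hashlib` module to fingerprint file contents and group exact duplicates. The file names and contents are made up, and the `md5_hex` and `group_by_hash` helpers are hypothetical; the deck does not say how LEPS computes or uses its hashes.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()

def group_by_hash(files):
    """Group file names by the MD5 of their contents (hypothetical helper)."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(md5_hex(data), []).append(name)
    return groups

# Made-up sample contents for illustration.
files = {
    "a.txt": b"hello",
    "copy_of_a.txt": b"hello",
    "b.txt": b"world",
}
groups = group_by_hash(files)
duplicates = [names for names in groups.values() if len(names) > 1]
```

Files with identical bytes share a hash, so `a.txt` and `copy_of_a.txt` land in the same group while `b.txt` stands alone.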
High-Speed Processing Demonstration Results
High-Speed Processing Demonstration Sample Output High quality output is critical, especially when making a claim of increased efficiency.
High-Speed Processing Demonstration
Lexbe Quality Control Tools and Features
Programmatic batching of processing to individual servers (reduces human error)
Custom QC flag creation and filtering
Integration with Excel for reporting and analysis
Pivot table analysis and charting
Ability to view all documents including parent containers (email and attachments) together
Ability to verify image quality
Filtering and reporting by any captured or calculated fields including failed to convert, words in document, placeholders, etc.
Native files are extracted and provided for linked load and review
Statistical sampling and reporting
ediscovery Processing Background
Providers of TIFF Processing
Service Providers (examples: Xerox, Lexbe, etc.): Business service bureaus that deliver a wide range of processing services.
Professionals (example: local server setup and capacity): Internal litigation support departments inside law firms, responsible for conducting litigation support processing functions. Often work with service and software providers to meet internal demands.
Software Providers (examples: Ipro, Law): Develop processing software that is licensed for resale by service providers or for use in internal litigation support departments.
Compare Lexbe to Industry Leaders
Lexbe v. Xerox
Xerox is known for its high-volume litigation processing and production capacity.
Xerox states in its service literature that its production capacity is 5 million pages a day.
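The headline figures can be put side by side with some simple arithmetic. The sketch below uses only numbers stated in this deck (the 5.3-hour run, the 23 million pages/day rate, and Xerox's stated 5 million pages/day). The total page count is not stated in this text, so the value computed here is an implied figure derived from the rate and the run time, not a reported result. The function names are hypothetical.

```python
HOURS_PER_DAY = 24

def pages_per_day(pages, hours):
    """Convert a page count processed in `hours` to a pages/day rate."""
    return pages / hours * HOURS_PER_DAY

def implied_pages(rate_per_day, hours):
    """Pages implied by running at `rate_per_day` for `hours`."""
    return rate_per_day * hours / HOURS_PER_DAY

lexbe_rate = 23_000_000   # pages/day, from the title slide
run_hours = 5.3           # hours, from the title slide
xerox_rate = 5_000_000    # pages/day, Xerox's stated capacity

pages = implied_pages(lexbe_rate, run_hours)            # roughly 5.08 million pages
rate_ratio = lexbe_rate / xerox_rate                    # 4.6
hours_at_xerox_rate = pages / xerox_rate * HOURS_PER_DAY  # roughly 24.4 hours
```

Under these assumptions, the same implied volume at 5 million pages a day would take about a full day rather than 5.3 hours.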
High-Speed ediscovery Processing
Summary
TIFF is important and turnaround time is critical.
Traditional approaches: Fixed capacity leading to variable turnaround time.
Lexbe approach: Scalable capacity leading to fixed turnaround time.
The Lexbe study demonstrates what we believe is the world's fastest TIFF processing, allowing you to meet even the toughest discovery deadlines.
High-Speed ediscovery Processing
Related Lexbe Services
ESI Culling+: Reduce ESI stores to manageable sizes with DeNIST, deduplication, date culling and keyword culling. Metadata extraction and PST reconstitution are available as well.
ESI Email Collection+: Flatten and extract native file attachments and metadata to create loadfiles in preparation for native or near-native review.
Native Processing+: Convert native documents, including Outlook email and Microsoft Office files, into TIFF or PDF format for searchability, Bates stamping, and preparation for online review.
ediscovery OCR+: Apply optical character recognition to increase searchability of PDFs, TIFFs, or document-formatted JPGs or PNGs.
NearDup Groupings+: Identify key documents, group similar documents, ensure consistency in privilege coding, and enable email threading.
Thank You
Contact Info
Karsten Weber, Principal: karsten@lexbe.com, (800) 401-7809
Stu Van Dusen, Marketing Manager: svandusen@lexbe.com, (512) 669-9485
Webinar Questions: webinars@lexbe.com