Taming Big Data ediscovery
Ten Tips to Avoid Being Byten by Big Data in Your Case

Presented at the University of Texas 27th Annual Technology Law Conference
May 22, 2014
Austin, Texas

Gene Albert
Principal, Lexbe LC

Gene Albert bio
Principal of Lexbe LC, a provider of cloud-based litigation review and document management software & ediscovery services. Prior business experience in software, medical services and internet-based businesses. Prior legal experience as in-house counsel and in private practice. Frequent speaker and author on ediscovery and legal technology issues.

Education
- MBA, University of Texas (2005)
- JD, Southern Methodist University (1983)
- BA, University of Texas (1979)

Contact
Gene Albert
512-686-3460
gene@lexbe.com
Taming Big Data ediscovery

Agenda
- ediscovery: the Original Big Data Problem
- How Should We Look at ediscovery Costs?
- How Much is ediscovery Data Increasing?
- Why Are ediscovery Costs Continuing to Rise?
- 10 Ways to Tame Big Data ediscovery

ediscovery: the Original Big Data
- What is Big Data? Flavor of the Month/News Cycle?
- "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." (Gartner)
- Litigation document management is an important and early Big Data application.
- ediscovery is characterized by sometimes extreme time pressures often not present in Big Data applications in other industries.

How to Control ediscovery Costs
- Easy answer: drop your case. No ediscovery costs then...
- Nonsensical answer, because ediscovery is part of the strategic pursuit of case goals, and should be approached, evaluated and managed in that context.
What is ediscovery Like?
Are ediscovery Costs Like Expert Fees?

Yes
- Large, expected part of commercial litigation
- Quality matters; not a commodity
- You need to understand the process undertaken for both

No
- Involuntary (you don't have to hire an expert, but you do need to respond to discovery)
- Experts are always important to litigation if retained; ediscovery usually matters only if mistakes occur
- ediscovery expenses increase with the quantity of data, but experts bill hourly, in relation to case importance

Why Do ediscovery Costs Rise?
ediscovery Market is Big & Growing

ediscovery Software & Services
- $5.5 Billion today
- Growing 15.5% annually
- Projected $9.8 Billion (2017)
- Services (72%), Software (28%)
Source: Complex Discovery (ComplexDiscovery.com), based on a combination of public market sizing estimates.
Is ediscovery Data Increasing?
Data Types and Volume Keep Growing

[Figure: Digital Information Created, Captured, Replicated Worldwide. Vertical axis: zettabytes (1 to 4); horizontal axis: 2005 to 2015. 1 zettabyte = 1 trillion gigabytes.]
- 2.8 zettabytes of information were created and replicated during 2012, a 56% increase from 2011 (IDC)
- Data types shown: VoIP, email, iPhones, peer-to-peer, online storage, digital cameras, Facebook, LinkedIn, Dropbox, backup devices, elastic storage, SaaS, Google Streets, personal blogs, Skype, world satellite images, personal scanners, customer service recordings, public webcams, Google Goggles, netbooks, cloud instance servers, PaaS
Source: IDC Digital Universe Study (2012)

Is ediscovery Data Increasing?
But It's Not Being Retained Today

[Figure: Types of Information that Organizations Retain and Do Not Retain]
- These data types will be used in future litigation because they contain relevant evidence
Source: Osterman Research 2014
Why Are ediscovery Costs Rising?
It's Not Costs - They're Falling

Cost per GB to Process ESI in Volume
  Year    Cost per GB
  2006    $1,800
  2011    $500
  2014    $150
- ESI costs have fallen 90% in the last 10 years
Source: Forrester Research

Why Are ediscovery Costs Rising?
The Volume of ESI Is Rising Faster

[Figure: GBs of ESI in a Typical Commercial Case, 1995 to 2015, on a scale from low to high]
- Enron Criminal Trial (2005)
  - Source ESI: 100M pages (~4 TBs)
  - Brought to trial: 1M pages (~40 GBs)
  - Extraordinary at the time. Not now.
- Microsoft (2011)
  - Microsoft collects 45 custodians per matter, on average
  - Almost 1 TB per matter, on average
Why Are ediscovery Costs Rising?
Increase in Size per Custodian

[Figure: Microsoft Custodian Size Increases. Vertical axis: GBs per custodian; horizontal axis: 2005 to 2015.]
- 2008: 7 GBs per custodian (0.5 million pages)
- 2011: 17.5 GBs per custodian (0.9 million pages)
Source: Legal Technology Leadership Summit (2011)

Drivers of Costs
- More ESI data sources
- More ESI stored
- Increases in custodian ESI size outpace drops in per-GB costs

Why Do ediscovery Costs Rise?
Review Costs Dominate Total Costs

  CASE STAGE      Share      SOURCE             Share
  Collection        8%       Internal             4%
  Processing       19%       edisc Providers     26%
  Review           73%       Outside Counsel     70%
  Total           100%       Total              100%

"Best opportunities for further cost savings will be technologies and process improvements that increase attorney review efficiencies."
N. Pace and L. Zakaras, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (RAND Institute for Civil Justice 2012)
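The claim that custodian ESI growth outpaces falling per-GB costs can be checked with simple arithmetic. The sketch below only multiplies figures quoted on these slides. Pairing the Forrester per-GB rates with the Microsoft custodian sizes is an assumption made for illustration (the slides cite them from separate sources and over different years), and the helper name `annual_rate` is hypothetical.

```python
def annual_rate(start: float, end: float, years: int) -> float:
    """Compound annual rate of change between two values."""
    return (end / start) ** (1 / years) - 1


# Figures quoted on the slides.
cost_per_gb = {2006: 1800.0, 2011: 500.0, 2014: 150.0}  # Forrester Research
gb_per_custodian = {2008: 7.0, 2011: 17.5}              # Microsoft
custodians_per_matter = 45                              # Microsoft, 2011 average

# Overall fall in unit cost, 2006 to 2014.
total_drop = 1 - cost_per_gb[2014] / cost_per_gb[2006]

# Volume and processing cost of an average 2011 matter (assumed pairing).
gb_per_matter = custodians_per_matter * gb_per_custodian[2011]
processing_cost_2011 = gb_per_matter * cost_per_gb[2011]

# Annualized change in unit cost versus volume per custodian.
unit_cost_rate = annual_rate(cost_per_gb[2006], cost_per_gb[2011], 5)
volume_rate = annual_rate(gb_per_custodian[2008], gb_per_custodian[2011], 3)
net_rate = (1 + unit_cost_rate) * (1 + volume_rate) - 1

print(f"Unit cost drop 2006-2014: {total_drop:.1%}")
print(f"GB per matter (2011): {gb_per_matter:.1f}")
print(f"Processing cost per matter (2011): ${processing_cost_2011:,.0f}")
print(f"Unit cost per year: {unit_cost_rate:+.1%}")
print(f"Volume per custodian per year: {volume_rate:+.1%}")
print(f"Net cost per custodian per year: {net_rate:+.1%}")
```

Under these assumptions the per-custodian processing cost still rises by about 5% a year even though the per-GB price falls steeply, and 45 custodians at 17.5 GBs each gives 787.5 GBs, consistent with the "almost 1 TB per matter" figure.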
1. Understand the ediscovery Process

Is ediscovery a Black Box?
- Seemingly, in go files, data, money and time
- And out come documents and data
- What happens inside? ... who knows

How Do You Manage a Black Box?
- Yell, plead, whine, beg, pray

ediscovery is not a Black Box

1. Understand the ediscovery Processes
- There is no avoiding getting one's hands dirty: to manage effectively, you must understand what is going on inside the ediscovery Black Box.
- At each stage ask:
  - What is being done?
  - What are the drivers of quality and cost?
  - Would process and technology improvement yield a better result?
- ediscovery processes should be matched to case needs, overall case strategy and risk management.
2. Use a Systems Approach
- Start with the end in mind: where do you want to end up, and how will ediscovery choices help or hinder?
- Determine the internal and external expertise available to help. Do internal IT personnel support the effort and have the expertise?
- Consider using a project manager and project management methodology to coordinate larger, time-sensitive or complicated matters.
- Use the Rule 26 conference to negotiate and document the discovery plan.
- Consider negotiating a formal ESI Protocol to address issues of collection, production, data availability, etc.

3. Manage the ediscovery Funnel
[Figure: the ediscovery funnel over time, narrowing from high-volume, low-relevance ESI to low-volume, high-relevance documents. Stages legible in the source: ESI ID; EDA & Culling/Filtering; Use (Depos, Trial).]
- Early efforts at the top of the funnel result in improved quality and reduced costs at the bottom.
4. Know Your Custodians and ESI Data
- Mapping/tracking custodians & ESI sources
- Methodology based on litigation goals & requirements
- Proportionality considerations
- Rule 26 (meet & confer); cooperation
- Issues with asynchronous discovery

4. Know Your Custodians and ESI Data
- IT network maps alone are not sufficient to identify and manage ESI
4. Know Your Custodians and ESI Data
- Content mapping is better for quantitative & qualitative data analysis
- Fields tracked: Custodians; Amount of ESI; Accessible; Backup Policy

5. Avoid Under- & Over-Collection
Dangers of Under-Collection
- Missed, lost or destroyed documents and data
- Sanctions, adverse inferences
- Greater costs to collect later or to reconstruct
Address by
- Early ID of custodians; mapping of ESI data sources
- Early case analysis to find data holes
- Document the process; use Rule 26 conferences
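A content map of the kind described above can start as a simple table keyed by custodian. The following is a minimal sketch, not a prescribed format: the `EsiSource` fields mirror the columns named on the slide (custodian, amount of ESI, accessible, backup policy), while the class name, function names and sample entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EsiSource:
    custodian: str
    location: str        # where the data lives (illustrative values below)
    size_gb: float       # amount of ESI
    accessible: bool     # reasonably accessible or not
    backup_policy: str


def gb_by_custodian(sources):
    """Total mapped ESI per custodian, in GB."""
    totals = {}
    for s in sources:
        totals[s.custodian] = totals.get(s.custodian, 0.0) + s.size_gb
    return totals


def unmapped_custodians(custodians, sources):
    """Custodians with no ESI source on the map: candidate data holes."""
    mapped = {s.custodian for s in sources}
    return sorted(c for c in custodians if c not in mapped)


def hard_to_reach(sources):
    """Sources flagged as not accessible, for proportionality discussion."""
    return [s for s in sources if not s.accessible]


# Hypothetical sample data.
custodians = ["Avery", "Blake", "Casey"]
sources = [
    EsiSource("Avery", "Exchange mailbox", 6.5, True, "90-day tape rotation"),
    EsiSource("Avery", "Laptop", 11.0, True, "none"),
    EsiSource("Blake", "Legacy file share", 4.0, False, "annual archive"),
]

print(gb_by_custodian(sources))
print(unmapped_custodians(custodians, sources))
print([s.location for s in hard_to_reach(sources)])
```

Listing custodians separately from sources is what makes the data holes visible: a custodian with no mapped source is a prompt for follow-up before collection, not proof that no data exists.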
5. Avoid Under- & Over-Collection
Dangers of Over-Collection
- Unnecessary expense in collection, review and production
- Slowing of case progression; misallocation of time and other resources
Address by
- Analyze and test ESI data sources for likely responsive content
- Prioritize data by importance and proportionality
- Separate out the hold, collection, and processing steps and move through them progressively, data source by data source

5. Avoid Over- & Under-Collection
Who is Collecting?
- Custodian Self-Collect - Increasingly difficult to justify. There are issues with custodian collection competency, even if you assume good custodian intentions.
- Internal IT - Make sure staff have training in ediscovery and sufficient time to meet deadlines.
- Outside Vendors - Usually the most expertise and the most expense. A rent (external vendor) vs. buy (internal staff) issue.
5. Avoid Over- & Under-Collection
Plan & Document How Collection Is to Be Done
- Files vs. disc images; active vs. forensic collection
- Local vs. remote vs. network collections
- Search and index quality if data has not been processed
- Preserving metadata
- Devices (laptops, phones, flash drives, etc.); cloud accounts; social media (e.g., Google, Dropbox, Facebook, LinkedIn)

6. Reduce Reviewable Data with Culling
Purpose
- Defensibly remove files from the process that are unlikely to lead to responsive documents
Culling Processes
- Expansion, repair, DeNIST, OCR
- Filter by file type & date
- Deduplication (within or between custodians)
- Indexing and keyword filtering
- Linear vs. dynamic culling
Issues
- Keyword selection & testing, concept searching, process documentation, repeatability, culled file retention
Reduction
- ESI may be reduced by 95% from raw data size at this stage
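The culling steps listed above can be pictured as a chain of filters that records how many files survive each stage, which supports the documentation and repeatability concerns on the slide. This is a minimal sketch under stated assumptions, not any vendor's pipeline: the `Doc` class, the `cull` function and the sample files are hypothetical, DeNISTing is approximated by a caller-supplied set of known system-file hashes, and keyword filtering is a plain case-insensitive substring match rather than an indexed search.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from pathlib import PurePath


@dataclass(frozen=True)
class Doc:
    name: str
    custodian: str
    modified: date
    content: bytes


def digest(doc: Doc) -> str:
    """Content hash used for both DeNISTing and exact deduplication."""
    return hashlib.sha256(doc.content).hexdigest()


def cull(docs, known_system_hashes, allowed_exts, start, end, keywords):
    """Apply the culling stages in order; return survivors and a stage log."""
    log = [("input", len(docs))]

    # DeNIST: drop files whose hash matches a known system-file list.
    kept = [d for d in docs if digest(d) not in known_system_hashes]
    log.append(("denist", len(kept)))

    # Filter by file type and date range.
    kept = [d for d in kept
            if PurePath(d.name).suffix.lower() in allowed_exts
            and start <= d.modified <= end]
    log.append(("type_and_date", len(kept)))

    # Deduplicate across custodians, keeping the first copy seen.
    seen, unique = set(), []
    for d in kept:
        h = digest(d)
        if h not in seen:
            seen.add(h)
            unique.append(d)
    log.append(("dedup", len(unique)))

    # Keyword filter (simplified: substring match on decoded text).
    terms = [k.lower() for k in keywords]
    hits = [d for d in unique
            if any(t in d.content.decode("utf-8", "ignore").lower() for t in terms)]
    log.append(("keyword", len(hits)))
    return hits, log


# Hypothetical sample data.
docs = [
    Doc("kernel32.dll", "alice", date(2012, 1, 5), b"system binary"),
    Doc("memo.txt", "alice", date(2012, 3, 1), b"Pricing agreement draft"),
    Doc("memo_copy.txt", "bob", date(2012, 3, 2), b"Pricing agreement draft"),
    Doc("old.txt", "bob", date(2005, 1, 1), b"pricing history"),
    Doc("lunch.txt", "bob", date(2012, 4, 1), b"Lunch on Friday?"),
    Doc("photo.jpg", "alice", date(2012, 3, 3), b"\xff\xd8 pricing"),
]
system_hashes = {hashlib.sha256(b"system binary").hexdigest()}

hits, log = cull(docs, system_hashes, {".txt", ".docx", ".msg"},
                 date(2011, 1, 1), date(2013, 12, 31), ["pricing"])
print([d.name for d in hits])
for stage, count in log:
    print(f"{stage:>14}: {count}")
```

The stage log is the part worth keeping in a real workflow: it shows where volume dropped and lets the same parameters be rerun later. A production process would also retain the culled files rather than discard them.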
7. Prepare ESI for Review Platform
- ESI and metadata must be processed for review & production in all but the smallest cases.
- Review in native, near-native, HTML, PDF or TIFF; the choice is driven by review platform capacities; metadata goes in load files.
- How are exceptions handled (e.g., corrupt files, unusual file types, password-protected files)? Use of placeholder files.
- Use high-volume vendors when time is tight. Processing for TIFF review tools requires the most throughput capacity.

8. Right-Size Your Review Methodology
- Match Methodology and Case: Case size, type and budget will drive which review method is preferable in any given case.
- Linear Review: Read, review, and code all documents, one at a time. Comforting, but not cost-effective or even possible in larger cases.
- Keyword Search: Use search keywords to identify responsive and privileged documents. Accurate and cost-effective if done correctly.
- TAR: Technology-assisted review/predictive coding. Manually review a seed set of documents to train a computer algorithm that will automatically code the remaining documents.
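The TAR workflow described above (attorneys code a seed set, an algorithm codes the rest) can be illustrated with a toy classifier. The sketch below uses a multinomial Naive Bayes model with add-one smoothing purely as a stand-in; the slides do not say which algorithm any given TAR product uses, and the function names and sample documents are hypothetical. It assumes the seed set contains at least one responsive and one non-responsive document.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(seed):
    """seed: list of (text, responsive) pairs coded by human reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, responsive in seed:
        doc_counts[responsive] += 1
        word_counts[responsive].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def score(model, text):
    """Log-odds that a document is responsive; positive means responsive."""
    word_counts, doc_counts, vocab = model
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    log_odds = math.log(doc_counts[True] / doc_counts[False])
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed set carry no signal
        p_resp = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_non = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_resp / p_non)
    return log_odds


# Hypothetical seed set coded by reviewers.
seed = [
    ("pricing agreement with distributor", True),
    ("revised pricing terms for contract", True),
    ("lunch friday team outing", False),
    ("parking garage closed friday", False),
]
model = train(seed)

# Uncoded documents, ranked from most to least likely responsive.
unreviewed = ["draft contract pricing schedule", "team lunch moved to monday"]
ranked = sorted(unreviewed, key=lambda t: score(model, t), reverse=True)
for text in ranked:
    label = "responsive" if score(model, text) > 0 else "not responsive"
    print(f"{label:>15}: {text}")
```

Ranking by score rather than trusting the label outright reflects how such tools are typically used in practice: reviewers sample the ranked output to validate the model before relying on its coding.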
8. Right-Size Your Review Methodology
Watch out for Inadvertent Privilege Release
- Larger cases have put a strain on accurate privilege review.
- Finding 24 versions of a privileged document doesn't help if you release version 25.
- Nothing is more costly than compromising or losing a case because of privilege disclosure.
- Claw-back agreements are a good idea, but no panacea. You can't unring a bell.

8. Right-Size Your Review Methodology
Minimizing Risk of Privilege Release
- Understand the privilege review process undertaken in detail.
- Build a dictionary of privileged sources and issues early in doc review.
- Check for: untrained or sloppy review; unsearchable documents; incomplete search indices; poor redaction procedures; search not done in both metadata and full text; privileged text retained in natives, text files, load files, text-based PDFs.
- Use specialized computerized privilege checks for container (email family) consistency, exact-dup and near-dup identification.
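The computerized privilege checks mentioned above come down to looking for inconsistent coding among related documents. The sketch below is one way to picture it, under stated assumptions: the `Doc` fields, function names, sample documents and the 0.7 similarity threshold are all hypothetical, and near-duplicates are approximated with Jaccard similarity over three-word shingles rather than any particular product's method.

```python
import hashlib
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Doc:
    doc_id: str
    family_id: str   # email plus its attachments share a family
    text: str
    privileged: bool


def shingles(text, k=3):
    """Set of k-word sequences used to compare documents."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mixed_families(docs):
    """Families where some members are coded privileged and others are not."""
    flags = {}
    for d in docs:
        flags.setdefault(d.family_id, set()).add(d.privileged)
    return sorted(f for f, seen in flags.items() if len(seen) > 1)


def inconsistent_pairs(docs, threshold=0.7):
    """Exact or near duplicates that carry different privilege coding."""
    flagged = []
    for a, b in combinations(docs, 2):
        if a.privileged == b.privileged:
            continue
        same_hash = (hashlib.sha256(a.text.encode()).digest()
                     == hashlib.sha256(b.text.encode()).digest())
        if same_hash:
            flagged.append((a.doc_id, b.doc_id, "exact-dup"))
        elif jaccard(shingles(a.text), shingles(b.text)) >= threshold:
            flagged.append((a.doc_id, b.doc_id, "near-dup"))
    return flagged


# Hypothetical sample data.
advice = "Counsel advises we should not sign the indemnity clause as drafted today"
docs = [
    Doc("D1", "F1", advice, True),
    Doc("D2", "F1", "Draft indemnity clause version three", False),
    Doc("D3", "F2", advice, False),
    Doc("D4", "F3", advice.replace("today", "yesterday"), False),
]

print(mixed_families(docs))
print(inconsistent_pairs(docs))
```

A flag here is a prompt for a second look, not a finding: an attachment can legitimately be non-privileged while its parent email is privileged. The comparison of every pair is quadratic and only suits small sets; larger collections would need hashing or indexing to find candidates first.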
9. Understand Your Case Facts & Issues
- Know how to identify key documents for depos, dispositive motions, case evaluations and settlement discussions.
- This is particularly important if review has been mechanical or algorithmic (keyword dependent, groupings, predictive coding).
- Depo prep may be the first time attorneys are really looking at the evidence.
- Increasing need for "early case analysis", timelining, and ID of key docs.
10. Evaluate and Adopt New Technologies
Needed
- Technology created Big Data and will be needed for evolving solutions.
- Lawyers are generally not the fastest at analyzing and adopting new technologies and modifying workflows.
Approach
- Strategically adopt new ediscovery technologies.
- Plan time to evaluate and test new software and approaches in a non-emergency environment.
- Maintain personnel with expertise, inside or outside the organization, for recommendations and assistance.
Taming Big Data ediscovery
UT 27th Annual Tech Law Conference
May 22, 2014
Austin, TX

Thank you. Questions & comments are welcome.

Gene Albert
Lexbe
512-686-3460
gene@lexbe.com