Predictive Coding, TAR, CAR: NOT Just for Litigation. February 26, 2015. Olivia Gerroll, VP Professional Services, D4
Agenda: Drivers; The Evolution of Discovery Technology; Definitions & Benefits; How Predictive Coding Works; The Ripple Effect; Data and Information Governance Implications; Use Cases & Considerations; Selection & Technologies; How Do Lawyers and Courts View PC?; Resources; Q&A
Drivers: Big Data - Growing FAST. [Figure: How much data is a petabyte?]
Drivers - The Technology is Not Unique: As you select, the application zeroes in on what you like. [Figure: listener selections across Song 1, Song 2, Song 3]
Evolution of Discovery Technology (timeline, 1990 to 2015): Stand Alone Review Apps; Document Imaging; OCR; Client/Server Review Tools; Computer Forensics; Automated Litigation Support; Tape Restoration; Web Based Review; ASP or SaaS; Auto Coding; Email Thread & Near Dupe Detection; Conceptual Search; Visualization Systems; Clustering & Categorization; Legal Hold Management; All in One Litigation Support Platforms; Social Media and Cloud Collection; Managed Services; Predictive Coding & Assisted Review; Natural Language Applications; Artificial Intelligence?
TAR and CAR: Technology-Assisted Review (TAR), or Computer-Assisted Review (CAR), is the use of advanced information retrieval technology to make the identification and review process more efficient. TAR/CAR uses components of existing technologies to organize and sort documents by priority or relevance. What differentiates TAR/CAR from other technologies is its use of concept-based search engines and quantitative analysis.
Predictive Coding Defined: Predictive coding is one type of TAR/CAR. It combines the efficiencies of concept search and statistics with the knowledge of human beings. It uses an active machine learning approach, or sometimes a support vector machine, to distinguish relevant from non-relevant documents, based on decisions made by a subject matter expert. It uses established statistical principles to measure status and accuracy. The technology can be used for applying Information Governance (IG) within a firm to both structured and unstructured data. One key component that sets predictive coding apart from search analytics is the methodology for training the technology, which is used to automatically classify records and to improve the accuracy and self-learning of the tool.
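The slide does not say which statistical measures are meant. As a minimal sketch, assuming precision and recall (two measures commonly reported for TAR) and a hypothetical validation sample of ten documents coded by both the software and the expert, accuracy could be checked like this:

```python
def review_metrics(predicted, actual):
    """Return (precision, recall) for two parallel lists of booleans.

    predicted[i] is the software's call on document i;
    actual[i] is the subject matter expert's coding of the same document.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical sample: True means "relevant".
predicted = [True, True, True, True, False, False, False, False, False, True]
actual = [True, True, True, False, True, False, False, False, False, False]

precision, recall = review_metrics(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.60 recall=0.75
```

Precision answers "how much of what the software flagged was really relevant"; recall answers "how much of the relevant material did it find".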
Benefits: In the normal course of business, documents are not organized by relevance. With a predictive coding approach, a subject matter expert trains the software by coding individual documents as responsive or not responsive as the system samples the population. The software then calculates a relevance score for each document based on the expert's coding decisions.
How It Works: A subject matter expert is assigned to train the engine. The software initially selects a random sample of documents. The expert identifies relevant documents in the sample. The software analyzes the expert's input and creates a profile for relevant and irrelevant documents. The software generates new samples, each time learning more from the expert's input. The process repeats until the software determines it has sufficient information to score all of the documents. The scores are then used to make informed decisions about data management.
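The loop described on this slide can be sketched in a few lines. This is an illustration only, not any vendor's algorithm: the word-count "profile", the choice of later samples by uncertainty (scores nearest 0.5), the fixed number of rounds as the stopping rule, and the names `build_profile`, `score`, `train`, `documents` and `expert` are all assumptions made for the example.

```python
import random
from collections import Counter


def build_profile(labeled):
    """Count how many relevant / non-relevant documents contain each word."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in labeled:
        (relevant if is_relevant else irrelevant).update(set(text.lower().split()))
    return relevant, irrelevant


def score(text, profile):
    """Relevance score in [0, 1]: mean smoothed share of 'relevant' votes per word."""
    relevant, irrelevant = profile
    words = set(text.lower().split())
    if not words:
        return 0.5
    weights = [(relevant[w] + 1) / (relevant[w] + irrelevant[w] + 2) for w in words]
    return sum(weights) / len(weights)


def train(documents, expert, rounds=3, batch=2, seed=0):
    """Sample, ask the expert, update the profile, repeat; then score everything."""
    rng = random.Random(seed)
    unlabeled = list(documents)
    rng.shuffle(unlabeled)
    labeled = []
    profile = (Counter(), Counter())
    for _ in range(rounds):
        # With an empty profile every score is 0.5, so the stable sort keeps the
        # shuffled order and the first sample is random. Later samples are the
        # documents the current profile is least sure about.
        unlabeled.sort(key=lambda doc: abs(score(doc, profile) - 0.5))
        sample, unlabeled = unlabeled[:batch], unlabeled[batch:]
        labeled += [(doc, expert(doc)) for doc in sample]
        profile = build_profile(labeled)
    return {doc: score(doc, profile) for doc in documents}


# Hypothetical collection; the "expert" is simulated by a keyword rule.
documents = [
    "merger agreement draft", "merger pricing terms",
    "merger board approval", "merger due diligence checklist",
    "lunch menu friday", "holiday party invite",
    "parking garage notice", "printer toner order",
    "gym membership renewal", "office plant watering",
    "team offsite agenda", "coffee machine repair",
]
expert = lambda doc: "merger" in doc

scores = train(documents, expert)
for doc, value in sorted(scores.items(), key=lambda item: -item[1])[:5]:
    print(f"{value:.2f}  {doc}")
```

With only six of twelve documents coded, unseen documents stay near 0.5; coding more rounds pushes the scores apart, which is the "learning more from the expert's input" step on the slide.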
Predictive Coding Workflow - Discovery
Data Environment
The Ripple Effect: Early use of predictive coding can confidently impact settlement before heads-down legal review (Kroll, www.ediscovery.com). Predictive coding is a natural way to assess and detect risk patterns, and to stop them from developing further. Predictive coding can be used to create and enforce record retention policies.
Data and Information Governance: Key problems for organizations. Find the information they need, when needed, in a cheap and efficient manner: they have to have the information and must keep it until needed. Find valuable information. Destroy worthless or unessential information. What is valuable? What is worthless?
Implications? Chucking Daisies: Ten Rules for Taking Control of Your Organization's Digital Debris, Kahn & Datskovsky, ARMA International (2013). Ch 1: Stop Keeping Everything Forever. Ch 2: Clean Up the Past to Gain Business Efficiency. But how? Since people are storing ever more, predictive coding can help separate the debris from what is required to be kept. Backup tape reduction. Early Case Assessment. Big Data mining. Compliance investigations.
Considerations: All document-related information governance and RIM initiatives rest on and depend upon consistent, comprehensive document classification. Without consistent, comprehensive classification, an organization can't determine what to keep, how long to keep it, who should have access to it, and where to store it. Replace manual classification decision-making processes with technology. Use predictive technology to create classification schemas for identifying and categorizing data currently in unstructured systems. Predictive technology can identify areas of conflict in existing classifications and ensure consistency and uniformity going forward.
Considerations: Use your skilled experts to create the appropriate data sets (or seed sets). The data sets should represent content from all information repositories. The product must be able to meet your end-user, IT and legal compliance requirements. Establish oversight and a comprehensive remediation plan, agreed upon by all stakeholders. Deployment should include a process to audit the application's decisions. Ideally, leverage internal ediscovery resources to help guide the deployment. Litigation technology experts have been working with this technology for years and can provide valuable insight into its usability and functionality. The hybrid approach: you do not have to choose between upstream and downstream data movement. Predictive coding is not a panacea, so any project needs to start with the establishment of an IG framework. See Slide 18, Item 3 for the resource location.
Information Management. IG (Data Governance): Email; Shared drives; Local hard drives; SharePoint; DM Systems; Extranets, intranets. RIM (Data Control & Management): Email official record; Retention and disposition; Onboarding client file intake; Off-boarding client file transfer; Identify vital and/or historical records; Legal Hold/preservation; Security, conflict and risk remediation.
How can Predictive Coding be Applied? Seed Set, Context, Human Interaction, Validate and Automate. Identify Existing Information: data that has been classified in accordance with the organization's RIM policies. Leverage Context: use existing resources such as the DM or financial system to provide context to the process. Manual Verification: records staff interact with the technology to validate findings and ensure the validity of predictive coding assessments. Validate & Fully Automate: after the manual verification has been validated and/or corrections made, the system can be let loose.
Application - Information Governance and Records Retention: How to Start. Three Key Steps: (1) Executive sponsorship that supports Information Governance. (2) Form a steering committee of key stakeholders across multiple departments: IT, Legal, Records Management, Compliance, Security & Privacy, etc. (3) Define global policies: the committee must focus on the business processes, laws & regulations, and departmental requirements needed to define the global policies that govern information within the organization.
Selection Factors: Ensure that your environment is ready to implement the technology. Factor in the learning curve necessary to fully understand and effectively use the technology. Skilled resources: the tools are best used by people skilled in big data analysis who understand the patterns and how to interpret the results. Ensure that the technology and environment are correctly secured, especially when dealing with the cloud and internet access. Understand the technology: dig under the hood. How good are the algorithms inside the software at finding the information we tell it to find?
Some Technologies. Information Management: Equivio, Recommind, Autonomy, Symantec, IBM, EMC, CommVault. Discovery: Nuix, Relativity, IPRO, Autonomy, Recommind, FTI, Catalyst.
How Does the Judiciary View PC? Da Silva Moore v. Publicis Groupe: Court okayed the parties' agreement to use PC (3.3M emails). Kleen Products v. Packaging Corp. of America: Plaintiffs abandoned arguments in favor of PC and went Boolean. Global Aerospace Inc. v. Landow Aviation, L.P.: Court approved defendant's use of PC over objections (2M emails). Actos (Pioglitazone) Products Liability Litigation: Court affirmatively approved using PC for review and production. EORHB, Inc., et al v. HOA Holdings, LLC: Court ordered the parties to use PC and share an ediscovery vendor.
Defensible Predictive Coding: Using Da Silva as a Map. Senior attorneys must be involved. Cooperate in devising the approach. Have a written protocol. Share the seed set (maybe!). Refine repeatedly for accuracy. Be transparent. Bottom line for defensibility: sampling, transparency, documentation.
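To make the "sampling" point concrete, here is a minimal sketch of one common validation step: draw a random sample from the documents the software scored as non-relevant, have a reviewer check them, and report the share that turned out to be relevant with a confidence interval. The slide does not prescribe this method; the normal-approximation formulas, the 95% confidence level (z = 1.96), the 5% margin of error and the sample counts below are assumptions chosen for illustration.

```python
import math


def sample_size(margin=0.05, z=1.96, p=0.5):
    """Sample size for a target margin of error, assuming a large population.

    p=0.5 is the most conservative guess for the unknown proportion.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def proportion_interval(hits, n, z=1.96):
    """Normal-approximation confidence interval for a sampled proportion."""
    p = hits / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


n = sample_size()
print(n)  # 385

# Hypothetical result: reviewers find 12 relevant documents among the
# 385 sampled from the pile the software marked non-relevant.
low, high = proportion_interval(12, n)
print(f"estimated miss rate: {12 / n:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Recording the sample size, the draw, and the resulting interval in the written protocol covers the documentation and transparency points as well.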
Resources: 1. D4 Knowledge Center. 2. The Grossman-Cormack Glossary of Technology-Assisted Review, http://www.fclr.org/fclr/articles/html/2010/grossman.pdf 3. Chucking Daisies: Ten Rules for Taking Control of Your Organization's Digital Debris, ARMA Publication. 4. Predictive Coding for Information Governance, http://www.ironmountain.com/knowledge-center/reference-library 5. The Electronic Discovery Reference Model, www.edrm.net 6. The Sedona Conference, thesedonaconference.com
Questions? On behalf of everyone at D4, thank you, ARMA Iowa, for this opportunity to present. Olivia Gerroll, VP, Professional Services Group. OGerroll@d4discovery o 402.682.3771 m 402.547.0742