Recent Developments in the Law & Technology Relating to Predictive Coding

Recent Developments in the Law & Technology Relating to Predictive Coding. Presented by Paul Neale, CEO; Gene Klimov, VP & Managing Director; and Gerard Britton, Managing Director. 2012 DOAR Litigation Consulting, www.doar.com

Discovery Obligations
Proper timing & implementation of the litigation hold: identification of sources of ESI; identification of custodians; implementation of preservation protocols; designation of a point person; signed acknowledgements by custodians.
Rule 26 conference: scheduling and planning; initial disclosures; meet-and-confer; privilege and work-product protection; form of production.

Meet and Confer
Disclosure of sources of ESI, indicating "not reasonably accessible" sources: disaster recovery systems (backup tapes/archives), legacy systems, transient or ephemeral data, proprietary systems and/or atypical ESI.
Culling methodologies: keyword search terms, advanced analytics, and/or predictive coding.
Protective order, clawback agreement & Rule 502 protection.
Form of production.

What is Predictive Coding? Predictive Coding (a.k.a. TAR, CAR, or CBAA) is the use of technology to assist in the categorization of documents, reducing the time and cost associated with document review and production.
TAR = Technology Assisted Review
CAR = Computer Assisted Review
CBAA = Content Based Advanced Analytics

Predictive Coding Process: an iterative process designed to provide sample sets of documents to the human reviewer(s) most knowledgeable about the subject matter, so that the predictive coding software can learn from the reviewer and replicate the reviewer's judgment across the entire document population. Sample → Review → Train → Threshold met? If no, repeat; if yes, Validate.
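The sample/review/train loop described on this slide can be sketched in code. This is a hypothetical illustration only, not any vendor's actual implementation: the function names are invented, and `train` is a toy word-overlap stand-in for a real classifier.

```python
import random

def train(labeled):
    """Toy stand-in for a real classifier: predict responsive when a document
    shares a word with any document the reviewer coded responsive."""
    responsive_words = {w for doc, code in labeled.items() if code
                        for w in doc.split()}
    return lambda doc: any(w in responsive_words for w in doc.split())

def predictive_coding_loop(documents, reviewer, threshold=0.95, batch_size=50):
    """Sample -> human review -> train -> repeat until the model agrees with
    the reviewer at least `threshold` of the time, then return the model."""
    labeled = {}
    rng = random.Random(0)                      # deterministic for illustration
    while True:
        pool = [d for d in documents if d not in labeled]
        batch = rng.sample(pool, min(batch_size, len(pool)))
        for doc in batch:
            labeled[doc] = reviewer(doc)        # senior attorney codes the sample
        model = train(labeled)                  # system learns from all coding so far
        agreement = sum(model(d) == code for d, code in labeled.items()) / len(labeled)
        if agreement >= threshold or not (set(documents) - set(labeled)):
            return model                        # equilibrium (or corpus exhausted)
```

In practice the reviewer is a senior attorney and the classifier is the vendor's engine; the structure of the loop, not the toy model, is the point.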

How does this actually work?

Traditional Culling (diagram): Total Data → Date Filter → Keywords → Responsive set, with both false hits and missed hits.

Traditional Review (diagram): multiple contract attorneys reviewing documents in parallel.

Training the System: a senior attorney codes sample documents as Responsive, Irrelevant, or Privileged, and the system trains on those decisions. The sample-and-train cycle repeats over multiple rounds.

Training the System, nth Iteration: Accuracy Threshold. Equilibrium is reached when the system predicts the same coding for documents as the human reviewer X% of the time, where X is a preset threshold, across successive iterations (1st, 2nd, 3rd, 4th, 5th, ... nth).

Industry Terminology
Yield Calculation: an initial random sample that is reviewed to determine a baseline for responsiveness, used for comparison against overall system performance at the end of the process.
Seed Sets: exemplar documents known to be responsive that are used to train the system.

Industry Terminology, Iterative Training
Random: a completely random selection of documents from the entire universe (note: data normalization is important).
Suggested: system-derived document groups based on concepts and on similarity to the seed sets and previous iterations.

Training the System: Sample Calculation (1)
Sample Size = Z² × p × (1 − p) / c²
Z = Z value (e.g., 1.96 for a 95% confidence level)
p = percentage picking a choice, expressed as a decimal (0.5 used for the sample size needed)
c = confidence interval (margin of error), expressed as a decimal (e.g., 0.04 = ±4%)
(1) http://www.surveysystem.com/sscalc.htm
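The formula can be evaluated directly. A minimal sketch (the function name is ours):

```python
import math

def sample_size(z=1.96, p=0.5, c=0.04):
    """Sample Size = Z^2 * p * (1 - p) / c^2, rounded up to whole documents.
    z=1.96 -> 95% confidence level; p=0.5 is the conservative worst case;
    c is the margin of error as a decimal (0.04 = +/-4%)."""
    return math.ceil(z ** 2 * p * (1 - p) / c ** 2)
```

For example, 95% confidence at ±4% requires 601 documents, and at ±2% requires 2,401 documents, regardless of how large the review population is (absent a finite-population correction).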

Training the System: Sample Size v. Population (chart). Required sample size rises with population and then plateaus: at 99% accuracy ±1% it approaches roughly 16,600 documents, while 99% ±2% and 95% ±2% require far smaller samples, across populations from 10,000 to 10,000,000 documents.
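The plateau in these curves comes from the finite-population correction from standard survey statistics, which shrinks the required sample when the population is small. A sketch (function name and defaults are ours; defaults model the 99% ±1% curve):

```python
import math

def adjusted_sample_size(population, z=2.576, p=0.5, c=0.01):
    """Finite-population correction applied to the raw sample size:
    ss_adj = ss / (1 + (ss - 1) / N)."""
    ss = z ** 2 * p * (1 - p) / c ** 2           # uncorrected sample size
    return math.ceil(ss / (1 + (ss - 1) / population))
```

At 99% ±1%, a 10,000-document population needs a sample of roughly 6,200, while 10,000,000 documents need only about 16,600, which is why the curves flatten.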

Measuring Accuracy: the elements of a method's accuracy in the Information Retrieval context are Recall and Precision.
Recall = (number of actually responsive documents retrieved) / (number of responsive documents overall)
Out of the total number of responsive documents that exist in the population, how many did the process properly retrieve?

Measuring Accuracy (cont.)
Precision = (number of responsive documents retrieved) / (number of documents retrieved)
How accurate was the process in classifying responsiveness within what was retrieved?

Measuring Accuracy: F-Measure (or F1 Score). The weighted harmonic mean of Precision and Recall, the traditional F-measure or balanced F-score.
F1 = (2 × Precision × Recall) / (Precision + Recall)
A type of average calculated to determine how well a specific content retrieval method performs.

Measuring Accuracy, Example: 1,000 documents (100 relevant, 900 irrelevant). The system retrieved 100 relevant and 0 irrelevant documents. Recall = 100/100 = 1.0; Precision = 100/100 = 1.0; F1 = (2 × 1.0 × 1.0) / (1.0 + 1.0) = 1.0.

Measuring Accuracy, Example: 1,000 documents (100 relevant, 900 irrelevant). The system retrieved 100 relevant and 50 irrelevant documents. Recall = 100/100 = 1.0; Precision = 100/150 = 0.667; F1 = (2 × 0.667 × 1.0) / (0.667 + 1.0) = 0.8.

Measuring Accuracy, Example: 1,000 documents (100 relevant, 900 irrelevant). The system retrieved 40 relevant and 10 irrelevant documents. Recall = 40/100 = 0.4; Precision = 40/50 = 0.8; F1 = (2 × 0.8 × 0.4) / (0.8 + 0.4) = 0.533.
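The scenarios above can be recomputed directly from the definitions. A small helper (the function name is ours):

```python
def precision_recall_f1(retrieved_relevant, retrieved_irrelevant, total_relevant):
    """Recall, Precision, and F1 from retrieval counts, per the formulas above."""
    recall = retrieved_relevant / total_relevant
    precision = retrieved_relevant / (retrieved_relevant + retrieved_irrelevant)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```

Note how F1 penalizes an imbalance: retrieving 50 irrelevant false hits drops F1 to 0.8 even with perfect recall, and missing 60 of 100 relevant documents drops it to about 0.53 even with 0.8 precision.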

Measuring Accuracy (diagram): within the total corpus, the predicted-responsive set overlaps the truly responsive set. A larger gap between them (relevant documents incorrectly identified as irrelevant) indicates lower Recall; a smaller spill of irrelevant documents incorrectly identified as relevant indicates higher Precision. The remainder consists of relevant documents correctly identified as relevant and irrelevant documents correctly identified as irrelevant.

Validation: a senior attorney reviews a random sample of documents coded Responsive, Irrelevant, and Privileged.
Recall & Precision: calculation of both Precision and Recall from a statistically significant random sample of all documents. The goal is high Precision and reasonable Recall.
Elusion: a zero-acceptance test would require that NO responsive documents be found in the sample drawn from the documents predicted irrelevant.

System Validation: the final analysis of the result sets predicted to be responsive and non-responsive, through measurement of a sampled set, to ensure that the process meets the criteria for accuracy.
Recall & Precision: the calculated average of Recall and Precision indicating that the vast majority of responsive documents were located accurately.
Elusion: the percentage of the rejected documents that are actually responsive; this allows both parties to agree on an acceptable level.
Z-test: a comparison of the rate of responsive documents in a random sample taken at the start of the process against another random sample taken at the end.

The Current State of Predictive Coding
A rapidly evolving segment of the electronic discovery market.
Gaining momentum in law firms and corporations.
A fragmented market of providers.
On the edge of judicial approval.

Seminal Cases: Da Silva Moore, et al. v. Publicis Groupe PLC, et al., S.D.N.Y. Case No. 11-cv-1279.
Joint ESI protocol filed by the parties (with objections).
First opinion approving a defendant's use of predictive coding, issued by Magistrate Judge Peck.
Objections filed by plaintiffs regarding the lack of measurability and the inability to validate the process.

Seminal Cases: Kleen Products LLC, et al. v. Packaging Corporation of America, et al., N.D. Ill. Case No. 1:10-cv-5711.
Plaintiffs attempted to force defendants to use CBAA.
Multiple defendants with multiple ESI plans.
Hearing before Magistrate Judge Nolan held on February 21, 2012, focusing on measurability and validation.
August 21, 2012: plaintiffs withdrew their demand.

Seminal Cases: Global Aerospace Inc., et al. v. Landow Aviation, L.P., et al. (Nos. CL 61040, CL 61991, CL 64475, CL 63795, CL 63190, CL 63575, CL 61909, CL 61712, 71633).
Plaintiffs submitted a motion for a protective order to disallow the defendants' use of predictive coding.
The defendants' protocol provided transparency and referenced precision, recall, and steps for validating and measuring quality.
Judge James H. Chamblin issued an order approving the defendants' use of predictive coding.
The case settled shortly thereafter.

Seminal Cases: In re Actos (Pioglitazone) Products Liability Litigation, United States District Court for the Western District of Louisiana, MDL No. 2299.
Order issued embodying an agreed-upon protocol for a "Search Methodology Proof of Concept" process to test predictive coding output: random-sample seed set; required meet-and-confer conferences to resolve issues; iterative process; elusion ("Test the Rest") validation.
Case Management Order issued, including the predictive coding protocol.

Seminal Cases: Federal Housing Finance Agency v. SG Americas, et al. (S.D.N.Y.).
JPMC and plaintiffs were unable to agree on the use of predictive coding.
Defendants approached Judge Cote regarding the use of predictive coding (7/20/12).
Draft protocol submitted by JPMC proposing use of predictive coding after search terms.
Plaintiffs demanded review of ALL documents predicted to be non-responsive.
The court ordered all parties to meet and confer to attempt agreement.
The parties have agreed to a seed set and continue to negotiate the predictive coding process.

Seminal Cases: TSI, Incorporated v. Azbil BioVigilant, Inc. (D. Ariz.).
Judge David Campbell ordered the parties to consider predictive coding.
Parties filed a joint submission (8/24/12) relying heavily on Da Silva Moore.
The scope of ESI was too small, due to court-imposed discovery limits, to justify the cost and time of predictive coding; the use of search terms was deemed more effective.

Key Considerations
Lack of a definable standard.
Inconsistency of methodologies & technologies.
Use requires new levels of transparency from the producing party.
Validation of results can occur only once costs are sunk.
A Darwinian effect is inevitable.
Focus on validation of precision & recall.

www.doar.com (800) 875-8705