The Truth About Predictive Coding: Getting Beyond The Hype





David R. Cohen, Reed Smith LLP, Records & E-Discovery Practice Group Leader. David leads a group of more than 100 lawyers in his role as Practice Group Leader of Reed Smith's Records & E-Discovery group. He serves as e-discovery counsel for multiple companies and also counsels clients on records management and litigation readiness issues. David has been named a Pennsylvania Super Lawyer in litigation and is Chambers-ranked nationally and internationally in the area of e-discovery. He is a frequent author and trains judges, mediators and lawyers on e-discovery issues. He has also been a court-appointed E-Discovery Special Master in multiple cases.

Bryon Z. Bratcher, Reed Smith LLP, Director of Litigation Technology Services. Bryon directs Reed Smith's global team of 25 Litigation Technology Analysts, drawing on more than a dozen years of experience in technology services for Am Law 100 firms. He assisted with the selection and implementation of, and manages, the firm's technology-assisted review tools, and in 2014 was named a winner of The Recorder's Law Firm Innovator award for co-developing Reed Smith's Periscope e-discovery metrics tool.

Mark E. Harrington, Guidance Software, Senior Vice President, General Counsel & Corporate Secretary.

2014 kcura. All rights reserved.

Agenda: What is Predictive Coding? Why Predictive Coding? How Accurate is Human vs. Predictive Coding? Barriers to Use of Predictive Coding. Case Studies. Current Hot Issues in Predictive Coding. Takeaways.

What is Predictive Coding? Also known as TAR (technology-assisted review), CAR (computer-assisted review), or RAR. Machine learning algorithms and statistical probability tools are used to duplicate human decision making: the software determines relevance after training by a human reviewer, the computer identifies document properties to predict future coding, and the process continues until accuracy levels reach stability.
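The training step described above can be sketched, very roughly, as a nearest-centroid text classifier. This is a hypothetical, minimal illustration of the general idea only; real TAR tools use far more sophisticated feature engineering and learning algorithms, and the sample documents and function names below are invented for illustration.

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term frequencies for one document."""
    return Counter(text.lower().split())

def centroid(docs):
    """Average term-frequency vector over a reviewer-coded training set."""
    total = Counter()
    for d in docs:
        total.update(vectorize(d))
    return {t: c / len(docs) for t, c in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(doc, responsive_centroid, nonresponsive_centroid):
    """Code a new document by which training centroid it sits closer to."""
    v = vectorize(doc)
    return cosine(v, responsive_centroid) >= cosine(v, nonresponsive_centroid)

# Hypothetical reviewer-coded seed set:
responsive = ["merger pricing discussion", "pricing agreement with competitor"]
nonresponsive = ["lunch order for the team", "office party planning"]
r_cent, n_cent = centroid(responsive), centroid(nonresponsive)

print(predict("draft pricing agreement", r_cent, n_cent))  # True
```

In a real workflow, documents the model is least certain about would be routed back to the human reviewer for coding, and the centroids retrained, mirroring the iterative training rounds described on this slide.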

Technology-Assisted Review Reference Model, courtesy of EDRM.net.

Workflow Overview (flowchart): Starting universe of 2,000,000 documents; a 2,000-document seed set is reviewed by humans as the training round. Categorization results: 596,400 responsive, 1,391,600 non-responsive, 10,000 uncategorized. QC of the 1st round is performed on a statistical sample of 3,068 documents, followed by a 2nd round of categorization (635,178 responsive, 1,349,754 non-responsive) and QC of the 2nd round on another 3,068-document statistical sample. If the validation criteria are not met, categorization and QC repeat; once the validation criteria are met, a final QC round generates the overturn report.
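The loop in that flowchart (categorize, QC a statistical sample, repeat until validation criteria are met) can be sketched as follows. Everything here is hypothetical: the 5% overturn threshold, the function names, and the stub QC results are invented for illustration, not taken from any particular tool.

```python
def run_training_rounds(categorize, qc_sample, max_rounds=10,
                        overturn_threshold=0.05):
    """Repeat categorization + QC until the overturn rate observed in a
    statistical sample drops below the (hypothetical) validation threshold."""
    for round_no in range(1, max_rounds + 1):
        categorize(round_no)                          # software re-categorizes corpus
        overturns, sample_size = qc_sample(round_no)  # human QC of a random sample
        rate = overturns / sample_size
        if rate <= overturn_threshold:
            return round_no, rate                     # validation criteria met
    raise RuntimeError("validation criteria not met within max_rounds")

# Stub QC results: overturns shrink each round as the model stabilizes.
qc_results = iter([(368, 3068), (214, 3068), (92, 3068)])
rounds, rate = run_training_rounds(lambda r: None, lambda r: next(qc_results))
print(rounds)  # 3
```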

The Numbers Behind the Statistics: chart plotting required sample size (y-axis, 0 to 3,000 documents) against total document count at a 95% confidence level, for margins of error of +/- 2.0, +/- 2.5, and +/- 5.0 percentage points (with a trend line labeled "Log. (+/- 2.0)").
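The sample sizes in that chart follow the standard formula for estimating a proportion at 95% confidence, n = z^2 * p(1-p) / E^2, using the worst-case proportion p = 0.5. This is a sketch of the underlying arithmetic; real sampling tools may also apply a finite-population correction, which only shrinks these numbers slightly for large document counts.

```python
import math

def sample_size(margin_of_error, z=1.96, p=0.5):
    """Documents to sample for a given margin of error at 95% confidence
    (z = 1.96), assuming the worst-case proportion p = 0.5."""
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)

print(sample_size(0.02))   # 2401  (+/- 2.0 points)
print(sample_size(0.025))  # 1537  (+/- 2.5 points)
print(sample_size(0.05))   # 385   (+/- 5.0 points)
```

Note that the required sample size depends on the margin of error, not (much) on the size of the document population, which is why the curves in the chart flatten out.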

Why Predictive Coding? Cost savings; time savings; reduced risk of errors (?); greater objectivity in classifications. Sometimes the volume of documents and/or the value of the case makes human review impractical.

Technology Assisted Review (Venn diagram, built up across four slides): within the universe of available documents, the set of relevant documents and the set of documents selected by the review overlap only partially. Relevant documents mistakenly missed represent poor recall; irrelevant documents mistakenly selected represent poor precision.
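Recall and precision, as used in that diagram, have simple definitions: recall is the share of all relevant documents that the review actually found, and precision is the share of selected documents that are actually relevant. A minimal sketch with invented counts:

```python
def recall(relevant_selected, relevant_missed):
    """Share of all relevant documents that the review found."""
    return relevant_selected / (relevant_selected + relevant_missed)

def precision(relevant_selected, irrelevant_selected):
    """Share of the selected documents that are actually relevant."""
    return relevant_selected / (relevant_selected + irrelevant_selected)

# Hypothetical review: 8,000 relevant found, 2,000 relevant missed,
# 2,000 irrelevant documents swept in with the selection.
print(recall(8000, 2000))     # 0.8
print(precision(8000, 2000))  # 0.8
```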

Myth #1: Computer Review Will Never Be As Accurate as Human Review

Da Silva Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y. 2012). Magistrate Judge Andrew J. Peck: "while some lawyers still consider manual review to be the gold standard, that is a myth, as statistics clearly show that computerized searches are at least as accurate, if not more so, than manual review."

Da Silva Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y. 2012). Predictive coding was appropriate because: the parties agreed; over 3 million documents were at issue; cost-effectiveness and proportionality favored it; and a transparent process was proposed. The decision spawned a huge battle over the protocol and, ultimately, a motion to recuse.

Da Silva Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y. 2012). The District Judge approved Judge Peck's proposal: "The ESI protocol contains standards for measuring the reliability of the process and the protocol builds in levels of participation by Plaintiffs. It provides that the search methods will be carefully crafted and tested for quality assurance, with Plaintiffs participating in their implementation."

Magistrate Judge Andrew Peck: "While this Court recognizes that computer-assisted review is not perfect, the Federal Rules of Civil Procedure do not require perfection."

How Accurate is Human Coding? Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII Richmond Journal of Law and Technology 11 (2011). Computer: 77%; humans: 60%. "The myth that exhaustive manual review is the most effective approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort." The technology-assisted reviews required human review of only 1.9% of the documents, a fifty-fold savings over exhaustive manual review.

How Accurate is Human Coding? Herbert L. Roitblat et al., Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, 61 Journal of the American Society for Information Science and Technology 70 (2010). The performance of two computer systems was at least as accurate (measured against the original review) as that of human re-review, while the level of agreement among the human reviewers was only 70-75%.

How Accurate is Human Coding? Thomas I. Barnett and Svetlana Godjevac, Faster, Better, Cheaper Legal Document Review: Pipe Dream or Reality?, Autonomy, Inc. (2011). 28,209 documents were reviewed by 7 different reviewer groups (5 document review vendors and 2 law firms). Responsiveness rates across the review groups ranged from 23% to 54%; the groups were unanimous less than half of the time, with inconsistency in 57% of results.

Look, the computer did as well as the humans!

"Using search terms is so last decade." - Judge Shira Scheindlin. BUT: is predictive coding always a viable option?

Myth #2

Barriers to Use of Technology Assisted Review: not viable for cases with fewer than 10,000-20,000 documents requiring review; limited potential cost savings (e.g., not reliable for privilege); risk of not getting opposing counsel's agreement; the time and expertise required to train the computer; the multiple-case problem; unsympathetic judges/discovery masters; and the danger of losing keyword filtering.

Kleen Products LLC v. Packaging Corp. of Am., 2012 WL 4498465 (N.D. Ill. Sept. 28, 2012). Plaintiffs requested court approval of predictive coding; defendants opposed. After massive briefing and several days of hearings, Plaintiffs ultimately withdrew the request as to the current production requests, and the parties agreed to meet and confer regarding the search methodology for future production requests.

Kleen Products LLC v. Packaging Corp. of Am., 2012 WL 4498465 (N.D. Ill. Sept. 28, 2012). STIPULATION & ORDER RELATING TO ESI SEARCH: "As to any ESI beyond the First Request, plaintiffs will not argue that defendants should be required to use predictive coding methodology... With respect to any requests for production beyond the First Request Corpus, the parties will meet and confer regarding the appropriate search methodology to be used for such newly collected documents. If the parties fail to agree on a search methodology, either party may file a motion with the Court seeking resolution."

Myth #3

Rio Tinto PLC v. Vale S.A., No. 14 Civ. 3042 (RMB) (AJP) (S.D.N.Y. March 2, 2015). Magistrate Judge Andrew Peck, revisiting his landmark decision in Da Silva Moore three years later: "the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it."

Rio Tinto PLC v. Vale S.A., No. 14 Civ. 3042 (RMB) (AJP) (S.D.N.Y. March 2, 2015). Judge Peck observes that one TAR issue that remains open is "how transparent and cooperative the parties need to be with respect to the seed or training set(s)." In the absence of transparency, statistical estimation of recall and general quality-control sampling can still be used to verify appropriate training of the software and secure satisfactory review outcomes.

Black Letter Law? A case law search for "predictive w/2 coding" returns 35 cases: 12 positive references, in commentary or tone; 18 neutral references, often judicial approval of proposed ESI protocols; and 4 that utilized the term in a non-ESI context. Predictive coding is still gaining acceptance and momentum.

Global Aerospace Inc. v. Landow Aviation, L.P., 2012 WL 1431215 (Va. Cir. Ct. April 23, 2012). Defendants requested permission to use predictive coding; Plaintiffs opposed the request. An order issued approving the use of predictive coding, and the work is now concluded.

Global Aerospace Inc. v. Landow Aviation, L.P., 2012 WL 1431215 (Va. Cir. Ct. April 23, 2012). 1.3 million documents remained after deduplication; 5,000 were used as the seed set. Predictive coding identified 173,000 relevant documents, and a 400-document sample showed 80% precision. A sample of the 1.1 million documents coded irrelevant showed 2.9% relevant, i.e., roughly 31,000 missed relevant documents (over 80% recall). Time: 7 months; cost: $200,000.
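Those Global Aerospace figures hang together arithmetically. Applying the 2.9% relevance rate from the sample of the discard pile to the 1.1 million discarded documents implies roughly 31,900 missed relevant documents (in the same ballpark as the slide's 31,000), which in turn yields recall of about 84%, consistent with the slide's "over 80% recall." A back-of-the-envelope sketch:

```python
identified = 173_000      # relevant docs found by predictive coding
discard_pile = 1_100_000  # docs coded irrelevant
elusion_rate = 0.029      # relevant rate observed in sample of discards

missed = elusion_rate * discard_pile      # estimated missed relevant docs
recall = identified / (identified + missed)
print(round(missed))      # 31900
print(round(recall, 3))   # 0.844
```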

In re: Biomet M2a Magnum Hip Implant Products Liability Litigation, Cause No. 3:12-MD-2391 (N.D. Ind., South Bend Div., April 18, 2013). Defendant Biomet used a combination of electronic search functions to identify relevant documents: the beginning universe was 19.5 million documents; keyword culling and deduplication reduced it to 2.5 million; and predictive coding was then employed on those 2.5 million.

In re: Biomet M2a Magnum Hip Implant Products Liability Litigation, Cause No. 3:12-MD-2391 (N.D. Ind., South Bend Div., April 18, 2013). Plaintiffs objected to this procedure and requested that Biomet start over: they wanted Defendants to use predictive coding on all 19.5 million documents, with Plaintiffs and Defendants jointly training the software.

Biomet Resolution: The court held that Biomet's methodology satisfied its obligations under F.R.C.P. 26(b)(2)(C); the likely benefits of going back to the 19.5 million-document set would not outweigh the burden and expense. The court assumed Biomet would remain open to additional reasonably targeted search terms, and held that if Plaintiffs wish to restart the predictive coding process, Plaintiffs must bear the expense.

Progressive Casualty Insurance Co. v. Delaney, 2014 WL 2112927 (D. Nev. May 20, 2014). The court approved a Joint ESI Protocol under which the parties mutually agreed to search terms for the universe of collected documents, and Progressive had the option to produce all non-privileged documents either (a) captured by the agreed search terms, or (b) captured by the agreed search terms and responsive to the Defendants' document requests, subject to proper objections.

Progressive Casualty Insurance Co. v. Delaney, 2014 WL 2112927 (D. Nev. May 20, 2014). Progressive advised it would produce all documents in Sept.-Oct. 2013, yet produced nothing in six months. It collected 1.8 million ESI documents, culled them to 556,000 using search terms, and began to review manually. After the review began, it determined that manual review was too time-intensive and expensive and, without informing Defendants or the Court, used predictive coding to review only the 556,000.

Progressive Casualty Insurance Co. v. Delaney, 2014 WL 2112927 (D. Nev. May 20, 2014). "Many have argued persuasively that the traditional ways lawyers have culled the documents for production (manual human review, or keyword searches) are ineffective tools to cull responsive ESI in discovery. Predictive coding has emerged as a far more accurate means of producing responsive ESI in discovery. Studies show it is far more accurate than human review or keyword searches which have their own limitations."

Progressive Casualty Insurance Co. v. Delaney, 2014 WL 2112927 (D. Nev. May 20, 2014). "Progressive is unwilling to engage in the type of cooperation and transparency that is needed for a predictive coding protocol to be accepted by the court or opposing counsel as a reasonable method to search for and produce responsive ESI. Progressive is also unwilling to apply the predictive coding method it selected to the universe of ESI collected. The method described does not comply with all of Equivio's recommended best practices."

Progressive Casualty Insurance Co. v. Delaney, 2014 WL 2112927 (D. Nev. May 20, 2014). "Had the parties agreed at the onset of this case to a predictive coding based ESI protocol, the court would not hesitate to approve a transparent mutually agreed upon ESI protocol." The court ordered Progressive to produce the 565,000 hit documents culled from the use of the search terms, subject to privilege filters, the clawback provisions of FRCP 26(b)(5)(B) and FRE 502(d), and the existing ESI protocol.

Case Study #1: Product Liability Case. 3.5 million documents in Relativity; approximately 2 million had been reviewed, with approximately equal numbers of responsive and non-responsive documents; approximately 40 reviewers were on the case.

Barriers to Use of Predictive Coding in this case: limited potential cost savings; difficult plaintiffs' counsel; an MDL plus numerous state cases; unsympathetic judges/discovery masters; and the danger of losing keyword filtering.

How Could Predictive Coding Be Used? To accelerate the human review and improve our QC: we could use predictive coding to speed the review along and to check the human review. It was impractical to use predictive coding as a substitute for human review in this case.

Case Study #1: Cost Analysis

               Docs/Hour   Cost/Hour   Total Records   Total Cost
  Current          50       $39.50       2,000,000     $1,580,000
  Tier 1           44       $39.50         500,000       $448,863
  Tier 2           57       $39.50       1,200,000       $831,578
  Tier 3           80       $39.50         300,000       $148,125
  Tiered TOTAL                                          $1,428,566

  Review Savings: $151,434; Analytics Cost: $60,000; Total Savings: $91,434
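The table's totals are straightforward rate arithmetic: hours = documents / (docs per hour), and cost = hours x $39.50. A quick check (small differences from the slide's figures come from per-tier rounding):

```python
RATE = 39.50  # cost per reviewer-hour, from the table

def review_cost(docs, docs_per_hour, rate=RATE):
    """Linear review cost: (docs / throughput) hours at the hourly rate."""
    return docs / docs_per_hour * rate

current = review_cost(2_000_000, 50)    # single-tier review: $1,580,000
tiered = (review_cost(500_000, 44)      # Tier 1
          + review_cost(1_200_000, 57)  # Tier 2
          + review_cost(300_000, 80))   # Tier 3
savings = current - tiered              # ~$151,432 before analytics
net_savings = savings - 60_000          # less the $60,000 analytics cost
print(round(current))      # 1580000
print(round(net_savings))  # 91432
```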

Case Study #2: The client is spinning off a division to become a separate company and wants former employees to retain access to their old e-mail, but wishes to remove privileged documents from the set to avoid waiver. Perfection is not required (this is not an adversarial situation), but a defensible process is needed.

Case Study #2: Total volume was approximately 200,000 documents. Document-by-document review and privilege determinations could cost up to $2 per document, for a total cost of up to $400,000.

Case Study #2: Our Recommendations. We recommended search-term filtering followed by sampling and predictive coding to identify and remove privileged documents, with a budget set at $30,000.

Case Study #2: Our Process. Following initial filtering, two experienced reviewers sampled hits and misses and adjusted the filter terms to fine-tune the filtering. The reviewers then trained the software on selected samples of the remaining hits; the analytics accurately identified the remaining documents most likely to be privileged, and those results were used for two additional iterations of filter fine-tuning.

Case Study #2: Results. We were left with a document population containing negligible privileged documents to make available to the ex-employees. The filtering was not perfect, but even human filtering is never perfect. The client saved over 90% of the review costs, amounting to several hundred thousand dollars.

Current Hot Issues in Predictive Coding: Do parties have to give advance notice to, and/or obtain consent from, adversaries or the court? Should courts allow predictive coding where opposing parties don't consent? Is it okay to run keywords before starting the predictive coding? Should parties share their seed sets with opposing counsel, including irrelevant documents? What workflows are allowable or best? Must predictive coding meet Daubert standards?

Takeaways: Predictive coding is gaining acceptance by courts and will be used increasingly, with or without opposing-party notice and/or consent. Practical considerations continue to rule out primary reliance on predictive coding for many reviews. Even when not replacing human review, predictive coding can still be useful for many purposes: non-adversarial review situations, accelerating human review, improving quality control, and finding key documents sooner.

Questions? David R. Cohen, drcohen@reedsmith.com, 412-288-1098; Bryon Z. Bratcher, bbratcher@reedsmith.com, 415-659-5948; Mark E. Harrington, mark.harrington@guidancesoftware.com, 626-229-9191 x4660. Thank you!

David R. Cohen, Practice Group Leader, Records & E-Discovery; Bryon Z. Bratcher, Director, Litigation Technology Services; Mark E. Harrington, Senior Vice President, General Counsel & Corp. Secretary.