NOTES. Cohasset Associates, Inc. 2015 Managing Electronic Records Conference 8.1




Expanding Your Skill Set: How to Apply the Right Search Methods to Your Big Data Problems
Julia L. Brickell, H5 General Counsel
MER Conference, May 18, 2015
H5 POWERING YOUR DISCOVERY GLOBALLY WWW.H5.COM INFO@H5.COM TEL: 1.866.999.4215

Corporate Data Locations
[Diagram: internal enterprise data sources; external managed sources; external cloud sources; employee sources, including external Gmail and Google Docs]

Identify The End Game
Goals differ:
o Find particular documents
o Find particular document types
o Segregate what is needed from what is not
o Illuminate dark data
o Defensibly dispose of unneeded information

Prepare to Use Effective Methods
Methods align regardless of purpose; tools may not.
o Know what you need
o Employ the right expertise to find it
  o The right tools
  o The right methods
o Fine tune for diverse sources
o Securely dispose, if disposing

Search Superior to Manual Review
Richmond Journal of Law and Technology (2011)
Overall, the myth that exhaustive manual review is the most effective and therefore, the most defensible approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort.
Of course, not all technology-assisted reviews are created equal. The particular processes found to be superior in this study are both interactive, employing a combination of computer and human input.
TECHNOLOGY-ASSISTED REVIEW IN E-DISCOVERY CAN BE MORE EFFECTIVE AND MORE EFFICIENT THAN EXHAUSTIVE MANUAL REVIEW, Maura R. Grossman & Gordon V. Cormack, XVII RICH. J.L. & TECH. 11 (2011), http://jolt.richmond.edu/v17i3/article11.pdf, p. 48

Search Results Vary Widely
[Figure: TREC Interactive Task Results, 2008-2010. Precision plotted against Recall, both on a 0.0 to 1.0 scale, with High Recall and High Precision regions marked; series for 2008, 2009 and 2010, with reference points for Keyword Search (Blair & Maron, 1985) and Manual Review (Grossman & Cormack, 2011).]
TREC: National Institute of Standards and Technology Text Retrieval Conference, Legal Track

Are you meeting your goals?
The metrics: Recall and Precision
Recall: A measure of how complete the results of a retrieval effort are. Recall answers the question: Out of all the documents that a retrieval was charged with finding, what proportion did the retrieval succeed in actually finding?
Precision: A measure of how on target the results of a retrieval effort are. Precision answers the question: Out of all the documents a retrieval identified as responsive, what proportion was actually responsive?

Validating a Retrieval Effort: Scientific Perspective
o There are standard information retrieval metrics that answer the key questions about the quality of a document retrieval effort: recall and precision.
o There are accepted sampling methodologies for obtaining estimates of recall and precision in a relatively cost-effective way.
o Sampling approaches to validation offer considerable flexibility and control over key input parameters, so that, in any given circumstance, if you think through your real information need, you can find a test that strikes the optimal balance between the information gained and the resources required.
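The recall and precision definitions above can be sketched in a few lines of Python. This is a minimal illustration only: the document IDs and the toy counts are assumptions made up for the example, not figures from the presentation.

```python
def recall_and_precision(retrieved, responsive):
    """Return (recall, precision) for a retrieval effort.

    retrieved  -- set of document IDs the retrieval identified as responsive
    responsive -- set of document IDs that are actually responsive
    """
    true_hits = retrieved & responsive
    # Recall: share of the responsive documents that the retrieval found.
    recall = len(true_hits) / len(responsive) if responsive else 0.0
    # Precision: share of the retrieved documents that are responsive.
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Toy data (assumed): 10 responsive documents; 8 retrieved, 6 of them correct.
responsive = set(range(1, 11))
retrieved = {1, 2, 3, 4, 5, 6, 101, 102}

recall, precision = recall_and_precision(retrieved, responsive)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In practice the full set of responsive documents is not known, which is why both figures are usually estimated from samples rather than computed exactly.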

Technology for Retrieval Increasingly Accepted
In the three years since Da Silva Moore, the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.
Rio Tinto PLC v. Vale S.A., Case 1:14-cv-03042-RMB-AJP (March 2, 2015)

Methodology Matters
o Know the goal
o Learn the data population
  o Design appropriate sampling process
  o Vet the sample documents
  o Use the knowledge to improve the sample
  o Use the goal and knowledge to select a tool
o Choose appropriate tools
o Use appropriate, iterative methodology
o Validate results
  o Design appropriate sampling process
  o Vet the sample documents
  o Estimate recall and precision; iterate process as needed

Statistics Supports Knowledge and Choice
Yield Estimate: estimate of target documents in data set
o Data set: 100,000 documents
o Sample: 1,000 documents
o Target docs in sample: 150, so 150/1,000 = 15%
o Estimated yield: 15,000/100,000 target docs in data set
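The yield estimate above can be reproduced as follows. The point estimate uses only the slide's figures; the 95% margin is an added assumption (a simple normal approximation for a random sample) and is not something the presentation states.

```python
import math


def estimate_yield(collection_size, sample_size, targets_in_sample, z=1.96):
    """Estimate how many target documents a collection holds from a random sample.

    Returns (point_estimate, low, high). The low/high bounds use a normal
    approximation with z=1.96 (about 95%), which is an assumption of this
    sketch rather than a method named in the source.
    """
    rate = targets_in_sample / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    point = rate * collection_size
    low = max(0.0, rate - margin) * collection_size
    high = min(1.0, rate + margin) * collection_size
    return point, low, high


# Figures from the slide: 100,000 documents, 1,000 sampled, 150 targets found.
point, low, high = estimate_yield(100_000, 1_000, 150)
print(f"estimated yield: {point:,.0f} (roughly {low:,.0f} to {high:,.0f})")
```

A larger sample narrows the range around the estimate; the 15% rate itself depends on the sample being drawn at random from the whole data set.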

Sampling: Quality of Sample Affects Quality of Search Results
[Figure: document sources and a skewed sample, drawn as scattered letters]
Sample parameters to draw a sample vary by situation:
o different departments
o different dates
o rolling collection
o multiple issues

Search is run on an index
Token         Locations
action        3:1; 24:10; 45:112
all           3:5; 4; 23
accountants   2:2; 41::33
business      2:3; 4::56
conferences   3:12; 7:1; 88:5; 95:1
date          1:1; 4:1; 5:3; 8:13
dec           1:3; 155:9
Not all words are indexed.
Same search queries provide different results depending on the tool: Google, exact search, algorithmic search.

Boolean Search: Keywords or Search Strings
o known
o adjustable
o over-inclusive: anchor
o under-inclusive: add terms
o targets specific language
[Figure: search terms set against Relevant Documents: common cold. Terms shown: smoking, cough!, malaise, sore throat, traffic, congest!, sneez!, runny nose, allergies, flu, fever, loss w/3 appetite, virus, computers.]
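The index table and the "loss w/3 appetite" term above can be illustrated with a small positional index. This is a sketch under stated assumptions: the two sample documents, the stop-word list and the `within` helper are hypothetical, and real search tools differ in how they tokenize text and count proximity.

```python
import re
from collections import defaultdict


def build_index(documents, stop_words=frozenset()):
    """Map each token to a list of (doc_id, position) pairs.

    Stop words are skipped, which is one reason "not all words are indexed".
    Positions still count the skipped words.
    """
    index = defaultdict(list)
    for doc_id, text in documents.items():
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for position, token in enumerate(tokens, start=1):
            if token not in stop_words:
                index[token].append((doc_id, position))
    return index


def within(index, first, second, distance):
    """Return doc IDs where both tokens occur within `distance` positions."""
    hits = set()
    for doc_a, pos_a in index.get(first, []):
        for doc_b, pos_b in index.get(second, []):
            if doc_a == doc_b and abs(pos_a - pos_b) <= distance:
                hits.add(doc_a)
    return hits


# Hypothetical documents: one on target, one that shares the same words.
documents = {
    1: "Fever and loss of appetite are common with the flu",
    2: "The loss on the contract reduced investor appetite for further business",
}
index = build_index(documents, stop_words={"the", "of", "and"})

print(sorted(index["appetite"]))
print(within(index, "loss", "appetite", 3))
```

Both documents contain "loss" and "appetite", so a plain AND query would return both; the proximity condition keeps only the first.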

Search Strings: words you can read
TreC09_204_ST_Retention_Deletion BM
enron #w5 [data, documents, e{ }mail{s}, record{s}, evidence{s}, info{rmation}, cop[y, ies], file{s}] #w10 [shred{s, ded, ding}, destroy{s, ed, ing}]

Mathematical Search
Algorithms count and weight words and other tokens: clustering, concept, predictive
o unknown
o imbedded
o hard to adjust
o over-inclusive
o under-inclusive
o groups or ranks based on prevalent language
[Figure: Document 1 total α; Document 2 total β]

Validation of Recall: looking in the discard pile
A look that falls short of validating Recall
Look in the discard pile, but do not tie results back to full-collection prevalence (i.e., do not tie to Recall).
Example: Look in the discard pile, find that 1 out of every 100 documents is actually responsive. That's good, isn't it? It depends:
o Good, if full collection prevalence was 10%.
o Bad, if full collection prevalence was 1%.
A look that truly validates Recall
Look in the discard pile, and do tie the results back to full-collection prevalence (i.e., tie to Recall).
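The "it depends" example above becomes concrete once the discard-pile rate is tied back to prevalence. The sketch below assumes a 100,000-document collection with 90,000 documents in the discard pile; those two sizes are illustrative assumptions, while the 1-in-100 rate and the 10% and 1% prevalence figures come from the slide.

```python
def recall_from_discard_sample(collection_size, discard_size, discard_rate, prevalence):
    """Tie a discard-pile sample back to full-collection prevalence.

    discard_rate -- share of discard-pile documents found to be responsive
    prevalence   -- share of the full collection that is responsive
    """
    total_responsive = collection_size * prevalence
    missed = discard_size * discard_rate
    return 1 - missed / total_responsive


# Assumed sizes: 100,000 documents in total, 90,000 of them in the discard pile.
for prevalence in (0.10, 0.01):
    recall = recall_from_discard_sample(100_000, 90_000, 0.01, prevalence)
    print(f"prevalence {prevalence:.0%}: estimated recall {recall:.0%}")
```

Under these assumptions the same 1-in-100 finding corresponds to high recall when 10% of the collection is responsive and to very low recall when only 1% is.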

Statistics Supports Knowledge and Choice
Sample of results
o Tagged data: 10,000 documents; 1,000-doc sample, 700 correct; 10,000 x 70% correct = 7,000 target docs tagged
o Not tagged data: 90,000 documents; 1,000-doc sample, 90 missed; 90,000 x 9% missed = 8,100 target docs missed
o 46% recall: 7,000/15,100
More target documents missed than tagged.

The Biomet Example
In re: Biomet M2a Magnum Hip Implant Products Liability Litigation (MDL 2391)
Background
Defendant followed a two-stage review process:
o Keyword culling (+ deduplication): 19.5m -> 3.9m -> 2.5m
o Predictive coding (+ human review).
Plaintiffs contend that the keyword culling step left a lot of responsive documents behind and that predictive coding should be applied to the full 19.5m collection.
Defendant objects, saying that, while they are willing to entertain additional keywords from plaintiffs and to produce additional non-privileged docs from the 2.5m culled-in subset, they are not willing to apply predictive coding to the full 19.5m collection.

The Biomet Example: The Judge's Ruling
The Judge ruled in favor of Defendant, based in part on:
o Rules and standards governing discovery process
o Proportionality considerations
o Numbers derived from Defendant's sampling of the collection.
With regard to the numbers, the Judge observed:
o Sampling of the full collection found that between 1.37% and 2.47% of the full collection was responsive;
o Sampling of the discard pile left behind by the keyword cull found that between 0.55% and 1.33% of the discard pile was responsive.
Therefore, were Defendant to apply predictive coding to the full collection, a comparatively modest number of [additional responsive] documents would be found (p. 5).
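The 46% recall figure from the tagged and not-tagged samples earlier in this section can be reproduced as follows. The function name and its parameters are hypothetical; the numbers are the slide's.

```python
def recall_from_samples(tagged_size, tagged_correct_rate, untagged_size, untagged_miss_rate):
    """Estimate recall from one sample of tagged and one of untagged documents.

    Returns (found, missed, recall), where found and missed are estimated
    counts of target documents in the tagged and untagged sets.
    """
    found = tagged_size * tagged_correct_rate
    missed = untagged_size * untagged_miss_rate
    return found, missed, found / (found + missed)


# Slide figures: 700 of 1,000 sampled tagged docs were correct;
# 90 of 1,000 sampled untagged docs were missed targets.
found, missed, recall = recall_from_samples(10_000, 700 / 1_000, 90_000, 90 / 1_000)
print(f"found={found:,.0f} missed={missed:,.0f} recall={recall:.0%}")
```

A 9% miss rate looks small next to a 70% hit rate, but it applies to a set nine times larger, which is why more target documents end up missed than tagged.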

The Biomet Example: What the numbers say
o Between 267,150 and 481,650 responsive documents reside in the full collection.
o Between 85,800 and 207,480 responsive documents reside in the set left behind by the keyword cull.
o Taking the midpoints of the two ranges, Recall of the keyword cull is 60.8%;
  o i.e., nearly 40% of the responsive documents are left behind by the keyword cull.
o Recall of the entire two-stage process only falls further when the predictive coding stage is taken into account.
  o Assuming predictive coding achieves 70% recall (a generous assumption), overall recall of the two-stage process is 42.6%; over half of the responsive documents are left behind.

Implications?
Are there business, legal or ethical implications arising from the quality of the retrieval? Likely.

Business, Legal or Ethical Implications?
o Search designers rely on keywords created in a conference room based on assumptions about how the business might discuss the targeted content. As a result, a large amount of responsive data is missed. Sampling would have demonstrated the gap.
o Search design comprises over-inclusive keywords or technology, resulting in retrieval of a vast, largely off-target data set. Sampling would have demonstrated the overage.
o Algorithmic search tool used in an investigation differentiates based on most prevalent language in documents and ranks very low documents with nuanced language indicating fraud.
o Clustering algorithm groups standard form contracts together but misses informal agreements with business partners.
o Records management exercise plans disposal of paper copies believed to overlap with scanned electronic copies. Records are required by regulators. Sampling exercise to compare sets is improperly designed.
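The Biomet figures above can be checked with a short calculation. One assumption is labeled in the code: the discard pile is taken to be the 19.5m collection minus the 3.9m keyword hits (15.6m documents), which is the size that reproduces the slide's counts. The constant and function names are hypothetical.

```python
FULL_COLLECTION = 19_500_000
KEYWORD_HITS = 3_900_000
# Assumption: the discard pile is everything the keyword cull left behind.
DISCARD_PILE = FULL_COLLECTION - KEYWORD_HITS


def count_range(size, low_rate, high_rate):
    """Turn a sampled responsiveness range into a range of document counts."""
    return round(size * low_rate), round(size * high_rate)


def midpoint(bounds):
    """Midpoint of a (low, high) pair."""
    return sum(bounds) / 2


full_range = count_range(FULL_COLLECTION, 0.0137, 0.0247)
discard_range = count_range(DISCARD_PILE, 0.0055, 0.0133)

total_responsive = midpoint(full_range)
left_behind = midpoint(discard_range)

# Recall of the keyword cull, then of the two-stage process assuming the
# predictive coding stage itself achieves 70% recall (the slide's assumption).
keyword_recall = (total_responsive - left_behind) / total_responsive
overall_recall = keyword_recall * 0.70

print(full_range, discard_range)
print(f"keyword recall {keyword_recall:.1%}, overall recall {overall_recall:.1%}")
```

Using midpoints hides the width of the sampled ranges; at the unfavorable ends of the two ranges the keyword cull's recall would be considerably lower than the midpoint figure.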

You need to draw on the right kinds of expertise if you are going to get a sound answer.

References
TREC 2008: Overview of the Legal Track, http://trec.nist.gov/pubs/trec17/papers/legal.overview08.pdf
TREC 2009: Overview of the Legal Track, http://trec.nist.gov/pubs/trec18/papers/legal09.overview.pdf
TREC 2010: Overview of the Legal Track, http://trec.nist.gov/pubs/trec19/papers/legal10.overview.pdf
Blair and Maron 1985: Blair, David C., and M. E. Maron. 1985. An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System. Communications of the ACM 28 (3): 289-299.
Grossman and Cormack 2011: Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII RICH. J.L. & TECH. 11 (2011), http://jolt.richmond.edu/v17i3/article11.pdf
In re: Biomet: In re: Biomet M2a Magnum Hip Implant Prods. Liab. Litig., No. 3:12-MD-2391, Order Regarding Discovery of ESI (N.D. Ind. Apr. 18, 2013)