1 Through A Lawyer's Lens: Measuring Performance in Conducting Large Scale Searches Against Heterogeneous Data Sets in Satisfaction of Litigation Requirements II Workshop University of Pennsylvania Philadelphia, PA Jason R. Baron Director of Litigation Office of General Counsel National Archives and Records Administration October 26, 2006
2 Searching Through E-Haystacks: A Big Challenge
3 Overview: ESI Overload; ESI & The New Federal Rules; Issues in Search and Info. Retrieval: A Case Example; Challenges; Additional Sources
4 The problem: heterogeneous electronically stored information (ESI): Word processing, spreadsheets, all applications; Integrated with voice mail & VOIP; Instant messaging; Web pages including intraweb sites; Blogs, wikis, and RSS feeds; Backup tapes, hard drives; Removable media, thumb drives & new storage devices; Remote PDAs, BlackBerrys; Metadata of all types
5 ESI and the New Federal Rules Electronically stored information as a new category subject to Rule 34 requests for documents Rule 34(b): request may specify form in which ESI is to be produced, subject to objection
6 ESI and the New Federal Rules Rule 26(f): At the parties' planning meeting, issues expected to be discussed include: Any issues relating to disclosure or discovery of electronically stored information, including the form or forms in which it should be produced; Any issues relating to preserving discoverable information
7 The Sedona Conference Working Group on Best Practices for Electronic Document Retention and Production
8 Sedona Principle 11 A responding party may satisfy its good faith obligation to preserve and produce potentially responsive electronic data and documents by using electronic tools and processes, such as data sampling, searching, or the use of selection criteria, to identify data most likely to contain responsive information.
9 Case Precedent on Automated Searches -- Current cases emphasize keyword searches under the rubric of search protocols (Treppel v. Biovail) -- Case law on employing concept searches and other alternative search methodologies is expected to emerge over time
10 Real life example: U.S. v. Philip Morris et al. Civil lawsuit brought by the Clinton Administration against tobacco companies in 1999; RICO case: racketeering allegation that companies have conspired since 1953 to defraud the American public as to the true health effects of smoking; Judge Kessler's August 17, 2006 decision
11 U.S. v. Philip Morris E-discovery: 1,726 Requests to Produce propounded by tobacco companies on U.S. (30 federal agencies, including NARA) for tobacco related records; Along with paper records, records were made subject to discovery; 32 million Clinton era records; government had burden of searching
12 NARA keyword searches Original Search Terms Used By NARA on database: Tobacco; Cigarette; Smoking; <Tar>; Nicotine; smokeless; Synar Amendment; Philip Morris; R.J. Reynolds; Brown and Williamson; BAT Industries; Liggett group
13 Problem with false positives: Marlboro BUT NOT Upper Marlboro (Maryland); PMI (for Philip Morris Institute) BUT NOT presidential management intern; TI (for Tobacco Institute) BUT NOT do re me...
14 Example Search String (((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)
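The exclusion clauses in the search string above (for example, marlboro AND NOT upper marlboro) can be illustrated with a minimal Python sketch. This is not the search engine NARA used; the helper names `contains`, `hit_unless`, and `matches`, the choice of four clauses, and the sample memos are all hypothetical, and matching is assumed to be case-insensitive on whole words or phrases at the document level.

```python
import re


def contains(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` as a whole word or phrase, ignoring case."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def hit_unless(text: str, term: str, exclusions: list[str]) -> bool:
    """Evaluate `term AND NOT (exclusion_1 OR exclusion_2 ...)` for one document."""
    return contains(text, term) and not any(contains(text, e) for e in exclusions)


# A few (term, exclusions) clauses copied from the slide's search string.
RULES = [
    ("marlboro", ["upper marlboro"]),
    ("pmi", ["presidential management intern"]),
    ("ets", ["educational testing service"]),
    ("lorillard", []),
]


def matches(text: str) -> bool:
    """A document is a hit if any clause fires (the clauses are OR-ed together)."""
    return any(hit_unless(text, term, excl) for term, excl in RULES)


# Hypothetical documents, invented for illustration only.
docs = {
    "memo-1": "Marlboro advertising budget for 1997",
    "memo-2": "Directions to the courthouse in Upper Marlboro, Maryland",
    "memo-3": "Applications for the presidential management intern (PMI) program",
    "memo-4": "Lorillard response to the draft rule",
}
hits = [doc_id for doc_id, text in docs.items() if matches(text)]
print(hits)
```

Because the exclusion is applied to the whole document, a memo that discusses Marlboro cigarettes and also mentions Upper Marlboro would be dropped. That is one way a precision fix can cost recall.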
15 Litigation Targets + Defining relevance + Maximizing Recall + Maximizing Precision + Dealing with fuzzy language concerns + Recognizing inherent ambiguity & variation in all texts
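Recall and precision, as named above, have standard set-based definitions: recall is the share of responsive documents that the search retrieved, and precision is the share of retrieved documents that are responsive. A minimal sketch follows; the function name `precision_recall` and the document IDs are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search against a known set of responsive documents."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the search returns four documents, two of them responsive,
# and misses one responsive document (d5).
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this example the false positives (d3, d4) lower precision, while the missed document (d5) lowers recall. The two targets usually pull against each other.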
16 Beyond Boolean: Alternative Search Methods: Probabilistic models (Bayesian); Fuzzy Search Models; Statistical methods (clustering); Machine learning approaches to semantic representation; Categorization tools: taxonomies and ontologies; Visualization techniques & social network analysis
17 Information Integration Challenges Convincing lawyers and judges that automated searches are not just desirable but necessary in response to large e-discovery demands
18 Challenges (con't) Having all parties and adjudicators understand that the use of automated methods does not guarantee all responsive documents will be identified in a large data collection (i.e., a recall of 1.000)
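One common way to show that recall falls short of 1.000 is to review a random sample of the documents the search did not retrieve and extrapolate how many responsive documents were missed. The sketch below assumes that approach; it is not a method described in these slides, and the function name `estimate_recall` and all figures are hypothetical.

```python
def estimate_recall(found_responsive: int, discard_size: int,
                    sample_size: int, sample_responsive: int) -> float:
    """Estimate recall from a random sample of the unretrieved ("discard") pile.

    found_responsive:  responsive documents among those the search retrieved
    discard_size:      number of documents the search did not retrieve
    sample_size:       number of discarded documents reviewed by hand
    sample_responsive: responsive documents found in that sample
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= sample_responsive <= sample_size:
        raise ValueError("sample_responsive must lie between 0 and sample_size")
    missed_rate = sample_responsive / sample_size   # share of the sample that was responsive
    estimated_missed = missed_rate * discard_size   # extrapolated to the whole discard pile
    total_responsive = found_responsive + estimated_missed
    # Recall is undefined when nothing is responsive; reported as 0.0 here.
    return found_responsive / total_responsive if total_responsive else 0.0


# Hypothetical figures: 900 responsive documents found, 100,000 documents left
# unretrieved, and 2 responsive documents in a hand-reviewed sample of 1,000.
recall_estimate = estimate_recall(900, 100_000, 1_000, 2)
print(f"estimated recall: {recall_estimate:.2f}")
```

With these figures, roughly 200 responsive documents are estimated to remain in the discard pile, so estimated recall is about 0.82. This is a point estimate only; a real protocol would also report the sampling error.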
19 Challenges (con't) Parties making a good faith attempt to collaborate on the use of particular search methods
20 Challenges (con't) Being open to using new and evolving search and information retrieval methods and tools
21 Challenges (con't) Designing an overall review process which maximizes the potential to find responsive documents in a large data collection (no matter which search tool is used)
22 TREC Legal Track National Institute of Standards and Technology (NIST) Text Retrieval Conference + New legal track project for 2006: evaluating efficacy of search methods in a legal discovery context + Results to be reported in Nov. 2006
23 Case law re: keywords & search protocols Recent case law where judges have intervened to adjudicate or order search protocols: Balboa Threadworks v. Stucky, 2006 WL (D. Kan. Mar. 24, 2006); Medtronic Sofamor Danek, Inc. v. Michelson, 229 F.R.D. 550 (W.D. Tenn. 2003); Treppel v. Biovail Corp., 233 F.R.D. 363 (S.D.N.Y. 2006). See also Zubulake v. UBS Warburg LLC, 229 F.R.D. 422 (S.D.N.Y. 2004) (discussing keyword searches generally)
24 References Baron, Toward A Federal Benchmarking Standard for Evaluating Information Retrieval Products Used in E-Discovery, 6 Sedona Conference Journal 237 (2005) (available on Westlaw and LEXIS) K. Withers, Electronically Stored Information: The December 2006 Amendments to the Federal Rules of Civil Procedure, 4 Nw. J. of Tech. & Intell. Prop. 171 (2006), available at /3
25 Additional References + Collaborative Expedition Workshop #45, Advancing Information Sharing, Access, Discovery and Assimilation of Diverse Digital Collections Governed by Heterogeneous Sensitivities, held Nov. 8, 2005, see ollections_heterogeneoussensitivities_11_08_05 + NIST/TREC Legal Track 2006 home page, see + Sedona Conference, The Sedona Principles: Best Practices Recommendations & Principles for Addressing Electronic Document Production (2005 version), Principle 11, see blications_html. + Sedona Conference, Commentary on Search & Retrieval Issues (forthcoming 2007)
26 Jason R. Baron Director of Litigation Office of General Counsel National Archives and Records Administration 8601 Adelphi Road Suite 3110 College Park, MD (301)