eDiscovery Document Review: Understanding the Key Differences Between Conceptual Searching and Near Duplicate Grouping

White Paper by Kimberlee L. Gunning, Esq.

INTRODUCTION

As anyone who has spent time in the document review trenches will attest, duplicate and near-duplicate documents make up a significant percentage of the electronic information requiring review. Several dozen copies of the same agreement, in different fonts because someone thought Times New Roman was classier than Arial. Seemingly endless email chains, which started with an innocent query from a regional manager to district managers until those district managers forwarded it to their assistant managers, who copied their assistants, who hit "reply all" more often than not. We've all been there, and it's not pretty.

De-duplication technology, which culls exact duplicates from the electronic information to be reviewed, is well known. Near-duplicate grouping should be. Typically, a substantial share of the electronic files in a case are near-duplicates. As the term suggests, near-duplicates are not exact copies of an electronic file but files with small differences. Examples include electronic documents with a few different words (different versions of an agreement or letter), electronic files with the same content but different formatting, and files with the same content but a different file type (a document in both Word and PDF format).

Many litigation teams start out with traditional keyword searching and then advance to conceptual searching. A step beyond traditional keyword searching, conceptual searching retrieves all documents that relate to a given subject. No one will dispute that conceptual searching helps provide perspective and context to a document review by permitting the litigation team to review related documents together. For example, if you are interested in documents regarding Defendant Corporation's Project XYZ, which seems very similar to your client's Project ABC, and which forms the basis for your client's trade secrets claim, it is helpful to review all documents regarding Project XYZ in context with one another.
Reviewing notes from members of the Project XYZ development team and drafts of Project XYZ marketing materials can help bolster your client's claim that Defendant Corporation has misappropriated your client's Project ABC technology, changed the name, and marketed the project as its own.

Although it goes a step beyond traditional keyword searching, conceptual searching has its own limitations. First and foremost, it does not solve the problem of near-duplicate documents. This white paper discusses the key differences between conceptual searching and near-duplicate grouping, and explains how using near-duplicate grouping, either alone or in conjunction with conceptual searches, can help you cope with a mountain of electronic files in an efficient and systematic manner.
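
For readers who want to see the distinction between exact duplicates and near-duplicates in concrete terms, the following is a minimal sketch in Python. It is an illustration only and does not describe how any particular product works. The function names, the three-word "shingle" size, and the sample letters are all assumptions made for this example. An exact-duplicate check compares file fingerprints, which differ as soon as a single character changes, while a near-duplicate check measures how much text two documents share.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash of the exact text; two hashes match only for exact duplicates."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, size: int = 3) -> set:
    """Set of overlapping word sequences, ignoring case and extra spacing."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample documents: two form letters and an unrelated memo.
letter_v1 = ("Dear Ms. Smith, thank you for your interest in our widgets. "
             "We would be glad to arrange a demonstration.")
letter_v2 = ("Dear Mr. Jones, thank you for your interest in our widgets. "
             "We would be glad to arrange a demonstration.")
memo = "The holiday party will be held on Friday in the main conference room."

exact_duplicate = fingerprint(letter_v1) == fingerprint(letter_v2)
near_score = similarity(letter_v1, letter_v2)
unrelated_score = similarity(letter_v1, memo)

print(exact_duplicate, near_score, unrelated_score)
```

In this sketch the two letters are not exact duplicates, so de-duplication alone would leave both in the review set, yet their similarity score is high while the memo shares no three-word sequence with either letter. Where to draw the line between "near-duplicate" and "merely similar" is a tuning decision that this example does not attempt to settle.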

NEAR-DUPLICATE GROUPING RESULTS IN MORE UNIFORM RESULTS THAN CONCEPTUAL SEARCHING

Starting with a uniform group of near-duplicate documents, you can review just one representative document from each set of near-duplicates, without having to analyze each individual document retrieved by a conceptual search. For example, if you are interested in the type of form letters Defendant Corporation is sending to its sales prospects, you can start with a set of near-duplicate sales pitch letters that may differ only by date and by the name and address of the recipient, and quickly determine the relevance of their content for your case.

Used in conjunction with a conceptual search tool, near-duplicate grouping provides additional structure to your search results. A reviewer can either skip the near-duplicates, as indicated above, or, depending on the nature of the case, analyze the differences between the near-duplicate documents by using a document comparison utility.

NEAR-DUPLICATE GROUPINGS ARE MUTUALLY EXCLUSIVE, RESULTING IN GREATER EFFICIENCY

A single document can belong to only one near-duplicate group. This mutually exclusive grouping provides for more consistency and efficiency when reviewing documents. Reviewing similar documents at the same time, and reviewing them only once, means your litigation team is more likely to categorize similar documents in the same way. With near-duplicate grouping, a single reviewer is more likely to analyze related drafts of a document and code these documents in the same way. Document review proceeds more quickly, and is less prone to error, when a document turns up only once and is not subject to being coded differently by a different reviewer, or by the same reviewer when their energy level begins to wane in the late afternoon.
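
The idea that each document lands in exactly one group, with one representative per group, can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of any commercial tool: the function name `group_near_duplicates`, the 0.8 threshold, the sample documents, and the use of Python's standard `difflib` similarity ratio are all choices made for this example.

```python
from difflib import SequenceMatcher


def group_near_duplicates(docs: dict, threshold: float = 0.8) -> dict:
    """Assign every document ID to exactly one group, keyed by its pivot's ID."""
    groups = {}  # pivot ID -> list of member IDs (pivot included)
    for doc_id, text in docs.items():
        for pivot_id in groups:
            ratio = SequenceMatcher(None, docs[pivot_id], text).ratio()
            if ratio >= threshold:
                groups[pivot_id].append(doc_id)
                break  # stop at the first match so no document is grouped twice
        else:
            groups[doc_id] = [doc_id]  # no match: this document starts a new group
    return groups


# Hypothetical sample set: three form letters and one unrelated announcement.
docs = {
    "letter-1": "Dear Ms. Smith, our widgets can cut your costs. Call us today.",
    "letter-2": "Dear Mr. Jones, our widgets can cut your costs. Call us today.",
    "party": "The holiday party starts at noon on Friday.",
    "letter-3": "Dear Dr. Brown, our widgets can cut your costs. Call us today.",
}

groups = group_near_duplicates(docs)
for pivot_id, members in groups.items():
    print(pivot_id, members)
```

Here the three letters fall into one group with "letter-1" as the representative, and the party announcement sits alone in its own group. Because the loop stops at the first matching group, no document can appear in two groups, which is the property that lets a reviewer code a set of near-duplicates once.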

The mutually exclusive nature of near-duplicate grouping enables you to quickly weed out documents with little or no relevance to the case, such as announcements of a company holiday party or multiple employees discussing last weekend's baseball game. Such documents are the eDiscovery equivalent of dandelions: ubiquitous and hard to kill. Using near-duplicate grouping ensures you only have to pull these weeds once, thus decreasing costs.

Finally, near-duplicate grouping helps you avoid the unfortunate scenario in which your team characterizes similar documents differently and thus withholds some near-duplicates from production while producing others. Characterizing similar documents differently for purposes of document production may cause the opposing party to challenge the adequacy of your efforts to review and produce all relevant non-privileged
documents, which could lead to a do-over of your document production, adding to the overall cost.

NEAR-DUPLICATE GROUPINGS ARE PRE-PROCESSED, NOT LIMITED BY SUBJECTIVE REVIEWER DIFFERENCES

Conceptual searches, which retrieve all documents related to a given subject, are improvised and ad hoc. As such, the results and usefulness of a conceptual search, like all term-based searching, are subject to varying levels of knowledge, judgment, and imagination among the different reviewers, not to mention plain old human error.

By contrast, near-duplicate groupings are pre-processed, persistent categories. Moreover, they are objective groupings built by software, independent of subjective judgments. The ability to identify the textual differences between near-duplicates gives reviewers a high level of confidence that the near-duplicate groupings make sense and can be trusted. For this reason, the near-duplicate groupings are a useful vehicle for organizing the review flow. The objectivity of the near-duplicate groupings ensures that the use of near-duplicates as an organizing parameter for the review is intuitive and acceptable to users.

NEAR-DUPLICATE GROUPING SLAYS THE EMAIL CHAIN DEMON

Email is by far the most common type of electronic file in most cases. Near-duplicate grouping has unique advantages when applied to the review of email. Anyone who has reviewed seemingly endless email chains or form mail will appreciate the ability to identify files that are near-duplicates. Typically, it is useful to review the last email in a chain, as it contains the previous emails and responses. On emails copied to several individuals, many of whom may not hit "reply all," near-duplicate grouping, combined with thread technology, helps locate all emails in the chain regardless of whether recipients of the initial email were copied on later replies and forwards. In short, near-duplicate grouping makes it easier to focus on documents that contain the complete chain and not just the initial email or reply.
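
The point about focusing on the messages that contain the complete chain can be illustrated with a short sketch. It rests on a strong simplifying assumption that real mail rarely satisfies: each reply or forward is assumed to quote the earlier text verbatim, with no quote markers, headers, or signatures added. The function names and the sample chain are hypothetical, and the example does not represent how any vendor's thread technology works.

```python
def normalize(body: str) -> str:
    """Collapse whitespace and case so quoted text compares cleanly."""
    return " ".join(body.lower().split())


def inclusive_messages(messages: dict) -> list:
    """Return IDs of messages whose text is not contained in any other message."""
    cleaned = {msg_id: normalize(body) for msg_id, body in messages.items()}
    keep = []
    for msg_id, body in cleaned.items():
        # A message is covered when a different, longer message quotes it in full.
        # Messages with identical text are left for exact de-duplication.
        covered = any(
            other_id != msg_id and body != other and body in other
            for other_id, other in cleaned.items()
        )
        if not covered:
            keep.append(msg_id)
    return keep


# Hypothetical chain: an original message, a reply, a forward, and a side reply.
chain = {
    "original": "Please send me the Q3 numbers.",
    "reply": "Attached. Please send me the Q3 numbers.",
    "forward": "FYI, see below. Attached. Please send me the Q3 numbers.",
    "side-reply": "Which region? Please send me the Q3 numbers.",
}

to_review = inclusive_messages(chain)
print(to_review)
```

In this example only the forward and the side reply remain for review. The forward already contains the original and the reply, while the side reply branches off and carries text that appears nowhere else, so it cannot be skipped.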

Moreover, when reviewing near-duplicate groupings of email, you're not limited by custodian (sender or recipient). Instead, you can review the email exchange as it took place and analyze to whom certain messages and attachments were forwarded and when.

CONCLUSION

Whether proving your claims and defenses requires close comparison of nearly identical documents or a quick and efficient method of isolating a few key documents, or both,
you should consider adding near-duplicate grouping to your electronic discovery toolbox. Electronic discovery usually begins with an unstructured mass of documents. Near-duplicate grouping provides a way out of this morass.

ABOUT KIMBERLEE L. GUNNING

Kimberlee L. Gunning is an attorney in Seattle, Washington (www.gunninglegal.com). Her practice focuses on employment advice and litigation, consumer class actions, and civil and administrative appeals. She often acts as co-counsel or contract attorney in complex litigation matters and has a special interest in the emerging law of eDiscovery.

ABOUT EQUIVIO

Equivio develops text analysis software for e-discovery. Users include the DoJ, the FTC, KPMG, Deloitte, plus hundreds of law firms and corporations. Equivio offers Zoom, an integrated web platform for analytics and predictive coding. Zoom organizes collections of documents in meaningful ways, so you can zoom right in and find out what's interesting, notable and unique. Zoom in. Find out.

Equivio, Equivio Zoom, Equivio>NearDuplicates, Equivio>Threads, Equivio>Compare, Equivio>Relevance are trademarks of Equivio. Other product names mentioned in this document may be trademarks or registered trademarks of their respective owners. All specifications in this document are subject to change without prior notice. Copyright 2012 Equivio