Concordance Tip Sheet August 2013

What Am I Looking At?
Andy Kass

Discovery is the process of requesting, producing and gleaning documents to substantiate assertions of fact in a case. Review is a deep, directed examination of discovery. The question is: what changes when we move from the world of file cabinets to file servers?

The answer is both very little and quite a bit. The "very little" part goes to the heart of the original purpose of discovery: to find and analyze documents relevant to the litigation. The "quite a bit" part goes to the scale and nature of electronic media: how much of it there is, where and how it is stored, how it is recovered, and how it is related.

Let's take a closer look, as mapped against the Electronic Discovery Reference Model (EDRM). This is a quick primer, not a replacement for reading and understanding the relevant rules (see the reference notes at the end).

Identification, Preservation and Collection

When a complaint is filed or a case is otherwise impending, counsel for each side will want to figure out where relevant information might be found. This Identification phase may involve questionnaires, mutual disclosures, Rule 26(f) meet-and-confer activities, and so on. The purpose is to determine who was involved (either as an individual or as part of a company or department), the time frame in question, and the location of data for those custodians for that period. (Is everything on a SAN, or would some information only be available on tape? Are there personal systems or email accounts involved?)

That's a start. Next, whoever has been identified as having potentially relevant information in their care, custody or control must be officially notified to Preserve it.
This preservation letter from counsel starts a legal hold meant to freeze all modification or deletion of potentially relevant documents (and in some cases, just about everything) until the next phase, collection, can take place. Counsel has the responsibility not only to notify, but also (a) to read and understand all custodians' document retention policies (to the extent that there are any) to determine whether any modification is required for the duration of the hold, and (b) to follow up and make sure that no one is going rogue.

Preservation can, let's say, constipate data systems, so it is in everyone's interest to collect identified custodial resources properly and in a timely, efficient and well-documented manner. The type of Collection very much depends on the nature and value of the case, and on how much the parties were able to accomplish at the Rule 26(f) conference. Sometimes voluntary self-production may suffice: an exported PST file, for example, may be all that is required in a case involving a relatively small claim. Nevertheless, always record every step taken, every filter applied, every application (and version) used, who used it and when, and through whose hands the data has passed.

The other two main types of collection are Image, which copies every sector of a drive at the bit level, and Live, which collects only the actual files currently recognized by the operating system (plus Recycle Bin objects). The first is the most thorough: you have the complete state of a system (where this type of collection is possible) as of a date and time certain. The second is leaner, may be done via remote collection, and requires less processing. There is often a temptation to get everything, but there are ramifications to that. There is a cost to extract a forensic image, which is done on a copy of the preserved forensic media.
There is much more volume involved in culling an image, which contains operating system, application, temporary and deleted files and slack space in addition to user data. And if you have essentially everything from a drive, there may be a presumption that everything can and should be produced (including deleted files), which can dramatically drive up the costs of production and review.

Culling and Processing

Culling, which may embrace the concept of Early Case Assessment, is a filtering process that takes the mass of electronic data at the top of the funnel through deduplication, date range, file type and sometimes keyword parameters to a smaller
amount at the bottom of the funnel. This process costs less than fully processing everything for review. Early Case Assessment goes a little deeper, providing more detailed filters (including the ability to find patterns of credit card numbers, Social Security numbers and other private information), email thread grouping, deeper reporting, and even direct output to review tools.

Processing is the last step before the reviewer sees documents in Concordance or some other review software. Care needs to be taken to extract and otherwise supply:

- the needed metadata fields and descriptors (such as Custodian, Source, etc.);
- extracted text (and a decision on whether OCR is to be tried where there is no text);
- images, if explicitly requested (this costs much more than native processing);
- a document numbering scheme with enough padding to cover any expected number of documents (or pages, if imaging);
- link fields for native and text files, as required; and
- all load files necessary for the requested database format (e.g., Concordance-delimited DAT files for metadata and OPT/LOG files for images).

If all of the above goes swimmingly, you wind up with documents to review and evaluate, sometimes a great many of them.

Review

We have often enough discussed Concordance review in this space: the use of tags and direct coding to designate relevance, privilege, issues and the like. Since we are taking a 30,000-foot functional view here, let's just illustrate a couple of points about how what you are reviewing affects how you review it, and how a problem in any preceding step can make review more difficult than it needs to be.

1. Page Counts. If this is a native production, we don't really know. We can get page counts from PDFs, and we can open Word files or PowerPoint presentations, true; but how do you figure the number of pages in an Excel spreadsheet?
A true native production will have DOCIDs (a control number for each entire document) and will only have Bates numbers if and when the documents are TIFFed (as for production).

2. Spreadsheets. If you stipulate that searchable PDF with original document metadata is a permissible form of production, be sure to make an exception for Microsoft Excel (and similar) spreadsheets. Unless you are only getting flat tables or address lists, spreadsheet files can have multiple worksheets with multiple dimensions, quite apart from the question of whether you print your PDF worksheets vertically (as in
Columns A-F top-to-bottom, then G-L top-to-bottom, etc.) or horizontally (e.g., Rows 1-35 left-to-right, then Rows 36-70, etc.). It is also important to see formulas, and whether any columns, rows or text have been hidden.

A further point: the fact that Concordance will import e-documents directly does not make it a good idea. While Concordance does leverage Microsoft Office on the administrator's workstation to ingest recognized file types, (a) it grabs only a small subset of metadata, (b) it treats MSG files only as ordinary Windows files, ignoring embedded email metadata and attachments, and (c) in the case of Excel files, it extracts each worksheet in a spreadsheet file as a separate document.

3. Artifacts. In native file extraction, you will see files that you do not ordinarily run into. (.XML files are a good example: they may well have content of some sort, but not the kind you usually see, nor presented in a manner in which you would normally care to read it.) There is generally a reason for this: most of these unusual files are system or coding artifacts of some sort.

Here's an example. Say you have two copies of an email that really ought to be identical. The sender was using Outlook with fonts, a logo and full HTML formatting; the recipient was using a plain-text Web client. The date, subject, body, sender and recipient are the same, yet these are not duplicates. The reason is an attachment on the text version called ATT0000123.htm (the exact number changes). This file is an artifact of the Multipurpose Internet Mail Extensions (MIME) specifications, which, among many other things, allow formatting, graphics and binary files to be included as part of an email. Since the recipient was opening the message as text, the other baggage was included in what is essentially a little extra component box.
When the recipient replies, the original sender might see an attachment called ATT0000532.txt, which is the plain-text alternative of the message. This behavior prevents deduplication from doing its job of identifying perfect copies of an item.

4. Volume. Even after culling and deduplication, there may be a great deal more material to review than there are eyes and hours to review it. Technology-assisted review (TAR) tools may seem out of reach, but add-ons to Concordance such as Polaris near-duplicate assessment can help gather documents with similar sets of linguistic markers, at scale, without breaking the budget or requiring deep training.

This is an overview. A subject of this importance wants deeper study of the original sources. This ought to keep you out of trouble until the Fall session!
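The deduplication problem described under item 3 (Artifacts) can be illustrated with a short Python sketch. It assumes, for illustration only, that a processing tool deduplicates email by hashing the sender, recipient, subject, date, body and every attachment together; real tools each use their own field sets and hash recipes, so `KEY_FIELDS`, `dedup_hash`, `deduplicate` and the sample message are all hypothetical. MD5 is used simply because it is a common choice for e-discovery hashing.

```python
import hashlib

# Hypothetical field set; real processing tools each define their own
# (some also include CC/BCC, message IDs, or attachment names).
KEY_FIELDS = ("from", "to", "subject", "date", "body")


def dedup_hash(email: dict) -> str:
    """Return an MD5 hex digest over the key fields plus every attachment."""
    h = hashlib.md5()
    for field in KEY_FIELDS:
        h.update(email[field].encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent fields cannot run together
    # Attachments are part of the key, so a MIME artifact changes the hash.
    for name, data in sorted(email["attachments"]):
        h.update(name.encode("utf-8"))
        h.update(b"\x00")
        h.update(data)
        h.update(b"\x00")
    return h.hexdigest()


def deduplicate(emails: list) -> tuple:
    """Keep the first item seen for each hash; return (kept, duplicates_removed)."""
    seen = set()
    kept = []
    for email in emails:
        key = dedup_hash(email)
        if key not in seen:
            seen.add(key)
            kept.append(email)
    return kept, len(emails) - len(kept)


# Same message as seen by the HTML sender and by the plain-text recipient.
base = {
    "from": "sender@example.com",
    "to": "recipient@example.com",
    "subject": "Q3 numbers",
    "date": "2013-08-01 10:15",
    "body": "Please see the figures below.",
    "attachments": [],
}
sender_copy = dict(base)
exact_copy = dict(base)  # a true duplicate, e.g. collected from a second source
recipient_copy = dict(base, attachments=[("ATT0000123.htm", b"<html>...</html>")])

kept, removed = deduplicate([sender_copy, recipient_copy, exact_copy])
print(len(kept), removed)  # 2 1
```

Only the exact copy is removed; the recipient's copy survives because the MIME artifact is part of its hash key. Dropping ATT*.htm files from the key would merge the two, but that is itself a filter, and like every other filter it should be recorded.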
For further reading:

- www.edrm.net: The Electronic Discovery Reference Model website
- https://thesedonaconference.org: The Sedona Conference, e-discovery research and educational resource website
- www.aceds.org: Association of Certified E-Discovery Specialists website
- http://www.law.cornell.edu/rules/frcp: Cornell Legal Information Institute, Federal Rules of Civil Procedure

The views expressed in this Concordance Tip Sheet are solely the views of the author and do not necessarily represent the opinion of U.S. Legal Support, Inc.

See the Concordance Tip Archive (all the way back to October 2005!) on our Web site at http://www.uslegalsupport.com/concordance-archive. Feel free to leave me a note, a comment, a suggestion or a Tip request, and of course, check into all the other great things U.S. Legal Support can do for you at www.uslegalsupport.com.

-- Andy Kass
akass@uslegalsupport.com
917-512-7503

Copyright 2013 U.S. Legal Support, Inc., 425 Park Avenue, New York NY 10022, (800) 824-9055. All rights reserved.