E-Discovery Tip Sheet
March 2014

Random Sampling

In days past, one could look at a body of discovery and pretty well calculate how many pairs of eyeballs would be required to examine and code every document within a given period of time. When dealing with a few thousand or, heaven forfend, a few tens of thousands of documents, one could do a sort of spot check on coding, and double-check productions, to be reasonably sure that all was well.

If that seems quaint, you need to fortify yourself with some fresh strategies to better cope with the volume of modern electronic discovery. Now document assignment is the least concern, because most document review software has a tool for allocating the coding work at each review level. But how can an administrator fairly check the quality of the coding?

Let's even go back a step: seed sets for technology-assisted review (TAR, also known as predictive coding) are under increasing scrutiny, since they provide the calculated assumptions for an entire review set in large-scale discovery. Seed sets are the 1,500 or so documents that a so-called subject matter expert (SME) reviewer, an attorney with deep knowledge of the case at hand, judges Relevant or Not Relevant in order to train some TAR software's categorizing algorithm. Since this forms the basis for what a review team even gets to see (only a selected skim of documents rated most highly Relevant in light of the initial SME review), there is a growing line of case law reviewing the discoverability of seed sets (see, e.g., Da Silva Moore v. Publicis Groupe et al., No. 11 Civ. 1279 (ALC)(AJP), 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012)).

Leaving aside the question of whether an SME's relevance judgments constitute privileged work product (which is beyond my purview), choosing the set of documents that the SME is to examine involves more than dropping slips of paper into a bowl. But then again, the result should be as random, if not more so, and its defensibility beyond reproach.

Pinning Down Randomness

So how can you select data in a totally, demonstrably random way? Believe it or not, Microsoft Excel can help you do this. I should note here that for sampling larger data sets, you will definitely want Excel 2007 or later; earlier versions are limited to 65,536 rows per worksheet, which simply doesn't afford you enough rows to get this done. (Later versions top out at 1,048,576 rows.)

I have a preliminary sample file, consisting of comma-separated values for just under 390,000 DOCIDs, FILENAMEs, and EXTENSIONs; all you really need are the DOCIDs, or any field containing a unique identifier for each document in the set to be sampled. And now, follow these simple steps:

(1) Insert two new columns before DOCID: click on the heading of Column A to select the whole thing, right-click, and select Insert, then do the same again. Label the first column RANDOM (actually, the label is just for your reference); you may optionally label the second column RANDOMSORT.
(2) Under the RANDOM column, in the first row having a DOCID number, enter the function =RAND(). This enters an evenly distributed random value between 0 and 1, which changes upon recalculation or any change in the spreadsheet.
(3) Now copy the first RANDOM function entry (cell A2): right-click it and select Copy.
(4) Select the remainder of the RANDOM column for which there are DOCIDs: click on the first cell below the copied cell and press SHIFT-CTRL-END, then keep the SHIFT key down and press the LEFT ARROW key three times (or as many times as it takes) so that only the first column is selected.
(5) Right-click in the selected area and choose Paste from the menu. Different values will appear in each cell in the column.
(6) Select the entire RANDOM column, right-click, and Copy.
(7) Right-click at the top of the column to the right and choose Paste Special from the menu.
(8) In the Paste Special window that pops up, click the Values radio button, then click OK.
(9) The first set of =RAND() numbers will recalculate when you paste, but that doesn't matter: Column B now holds the RANDOM values as they stood when you copied them, and these are hard-coded, so they will not change again.
(10) Nearly done: select Column B, right-click, and select Format Cells to make the contents of Column B a Number type with at least 10 decimal places (just to be sure!). This will even out the inconsistent random number lengths.
(11) The next item is to sort on Column B (your RANDOMSORT field) using Excel's DATA / Sort tool, making sure to expand the selection so that all columns are selected. In the Sort window that pops open, make sure that you check "My data has headers" and choose RANDOMSORT from the list.
The rows should now appear in random order.
(12) The last bit is to select as many rows as are required for your sample and copy them out, or at least the DOCID column. That is your random sample set.

Random Meets The Road

Different systems have different methods of pulling in a list of documents by identifier. Many read in a text list. Even Concordance can read in a list, as long as it is properly massaged into a .qry saved search format: in UltraEdit, use Column Edit to insert SEARCH: DOCID = at the beginning of each line, and finish by connecting all the search lines together using OR.
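For readers who would rather script the selection than click through it, here is a minimal Python sketch of the same idea as steps (1) through (12): give every DOCID a random value, sort on that value, and keep the top rows. It is not part of the Excel procedure. The function names, the optional seed parameter, and the made-up DOCIDs in the demonstration are all assumptions for illustration; only the DOCID, FILENAME, and EXTENSION column names come from the sample file described above.

```python
import csv
import io
import random


def random_sample_docids(csv_file, sample_size, id_field="DOCID", seed=None):
    """Return sample_size identifiers chosen by sorting on a random key."""
    # The seed is an assumption added here; the Excel steps have no equivalent.
    rng = random.Random(seed)
    reader = csv.DictReader(csv_file)
    # Like the RANDOM/RANDOMSORT columns: one fixed random value in [0, 1) per row.
    keyed = [(rng.random(), row[id_field]) for row in reader]
    if sample_size > len(keyed):
        raise ValueError("sample_size exceeds the number of documents")
    # Like step (11): sorting on the random value puts the rows in random order.
    keyed.sort(key=lambda pair: pair[0])
    # Like step (12): the first sample_size rows are the sample.
    return [docid for _, docid in keyed[:sample_size]]


def write_id_list(docids, out_file):
    """Write one identifier per line, the kind of text list many systems read."""
    for docid in docids:
        out_file.write(f"{docid}\n")


if __name__ == "__main__":
    # Hypothetical in-memory export standing in for the real CSV file.
    export = io.StringIO(
        "DOCID,FILENAME,EXTENSION\n"
        + "".join(f"DOC{n:06d},file{n},msg\n" for n in range(1, 1001))
    )
    sample = random_sample_docids(export, sample_size=25, seed=2014)
    out = io.StringIO()
    write_id_list(sample, out)
    print(out.getvalue())
```

Passing a seed and noting it down means the identical sample can be regenerated later, which may help if the selection is ever questioned; leave it as None for a fresh draw each run. Python's built-in random.sample would draw the same kind of sample in a single call; the explicit sort is kept here only to parallel the spreadsheet steps.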
You can use your randomly-selected documents list to do a number of useful things:

(a) Create a seed set for TAR expert review, as mentioned above.
(b) Check a selection of documents initially identified as Not Responsive to verify that these judgments, whether by algorithm or reviewer, were accurate.
(c) Quality check a random selection of coded documents of any type.

When the numbers are large and fairness is paramount (and when is it not?), random selection can save the day. And when the tool to assist in this exercise is already present on your computer, there is little to stop you from taking one more step into electronic discovery.

-- Andy Kass
akass@uslegalsupport.com
917-512-7503

The views expressed in this E-Discovery Tip Sheet are solely the views of the author, and do not necessarily represent the opinion of U.S. Legal Support, Inc.

Copyright 2014 U.S. Legal Support, Inc., 425 Park Avenue, New York NY 10022 (800) 824-9055. All rights reserved.