E-Discovery Tip Sheet

LegalTech 2015: Some Panels and Briefings

Last month I took you on a select tour of the vendor exhibits and products from LegalTech 2015. This month I want to provide a short brief that might offer a little more incentive to brave the cold and crowds next time around. Below I have digested one plenary session and two vendor briefings from kCura on developments in their industry-leading review platform, Relativity 9.

A. LegalTech Panel Session: Taking TAR to the Next Level: Recent Research and the Promise of Continuous Active Learning

This panel comprised Professor Gordon Cormack of the University of Waterloo and Maura R. Grossman of Wachtell Lipton, co-authors of a cornerstone study of technology assisted review; Magistrate Judge Andrew Peck, a leading voice from the Federal bench on ediscovery issues; Susan Nielsen Hammond, General Counsel of Regions Financial Corporation; and moderator John Tredennick of big-data review vendor Catalyst Systems. To boil down a deep and interesting discussion: the evolution and efficacy of several classes of computer-assisted review were compared to the "false gold standard" (per Judge Peck) of linear manual review, and to each other. Ms. Grossman, followed by Professor Cormack, used slides to illustrate the differences in process and efficacy of three different types of computer learning:

> Simple Passive Learning (SPL):

1. Critical initial factors are (a) seed set selection, random vs. judgmental; and (b) the number of documents in the seed set.
2. Review and code the seed set (by an "Expert," i.e., a senior attorney on the case).
3. Feed the expertly-coded seed set to the algorithm; evaluate machine "vote" effectiveness and the training result.
4. Repeat as required to stabilize results ("till the popcorn stops popping," or until the stability does not materially change).
5. When done, run results against the entire document set.
6. Review documents auto-coded Responsive or above the confidence-ranking percentile cut-off.
7. The team chooses the next set to review.
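
To make the SPL loop concrete, here is a minimal sketch using TF-IDF features and logistic regression. This is my own illustration, not the panelists' or any vendor's implementation; the documents, labels, and the 0.5 cut-off are hypothetical stand-ins.

```python
# A minimal SPL sketch: train once on the expert-coded seed set, then
# score the entire corpus. Documents, labels, and cut-off are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Step 2: the Expert codes a seed set (1 = Responsive, 0 = Not Responsive).
seed_docs = ["draft merger agreement attached", "lunch on friday?",
             "board deck on the proposed deal", "fantasy football picks"]
seed_labels = [1, 0, 1, 0]

# Step 3: feed the coded seed set to the learning algorithm.
vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(seed_docs), seed_labels)

# Steps 5-6: run the model against the whole set and queue for review
# everything scored at or above the chosen confidence cut-off.
corpus = ["revised merger timeline", "happy hour on thursday?", "deal structure memo"]
scores = model.predict_proba(vectorizer.transform(corpus))[:, 1]
CUTOFF = 0.5  # stand-in for the percentile cut-off in step 6
print([doc for doc, s in zip(corpus, scores) if s >= CUTOFF])
```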

> Simple Active Learning (SAL):

1. Create a Control Set; think of it as a "Responsive" answer key for benchmarking.
2. Critical factors in seed set selection are random vs. judgmental, and the number of documents in the set, as above.
3. Review and code the seed set (by the "Expert").
4. Use the machine learning algorithm to select the documents from which it will learn the most (ambiguous content).
5. Still an iterative process until stable (all the popcorn is popped).
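
The step that distinguishes SAL is item 4: the machine, not the team, picks the next training documents, typically those it is least certain about. A hedged sketch of that uncertainty-sampling step, reusing the same hypothetical TF-IDF/logistic-regression setup as above:

```python
import numpy as np

# SAL's step 4 (uncertainty sampling): rank unreviewed documents by how
# close their responsiveness score is to 0.5 and surface the most
# ambiguous ones for the Expert to code next. Batch size is hypothetical.
def select_most_ambiguous(model, vectorizer, unreviewed_docs, batch_size=10):
    probs = model.predict_proba(vectorizer.transform(unreviewed_docs))[:, 1]
    order = np.argsort(np.abs(probs - 0.5))  # nearest to 0.5 first
    return [unreviewed_docs[i] for i in order[:batch_size]]
```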

> Continuous Active Learning (CAL):

1. Seed set (initial training set) selection is judgmental, and also depends on the number of documents in the set. Inferentially, some initial document counts have been calculated that seem to create a stable set under multiple circumstances (between about 5,000 and 14,000). One example given was to put one or more parties' Requests for Production into the set.
2. The machine learning algorithm trains continuously based upon the review itself: (a) review and code newly suggested documents and add them to the training set; (b) repeat until substantially all documents have been reviewed.
3. Iterative, constant review and feedback.
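
CAL folds steps 2(a) and 2(b) into one continuous loop: every coded batch goes straight back into training, and the model re-ranks what remains. A minimal sketch, again with hypothetical names; review_fn stands in for the human reviewers, and the seed set is assumed to contain both Responsive and Not Responsive examples:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def cal_review(all_docs, seed_docs, seed_labels, review_fn, batch_size=100):
    """Continuous Active Learning sketch: retrain after every coded batch
    and always push the highest-scoring unreviewed documents to reviewers.
    Assumes seed_labels contains both Responsive (1) and Not Responsive (0)."""
    vectorizer = TfidfVectorizer().fit(all_docs)
    coded_docs, coded_labels = list(seed_docs), list(seed_labels)
    remaining = [d for d in all_docs if d not in coded_docs]
    while remaining:  # step 2(b): until substantially all docs are reviewed
        model = LogisticRegression().fit(
            vectorizer.transform(coded_docs), coded_labels)
        scores = model.predict_proba(vectorizer.transform(remaining))[:, 1]
        batch = [remaining[i] for i in np.argsort(-scores)[:batch_size]]
        coded_docs += batch                            # step 2(a): code the newly
        coded_labels += [review_fn(d) for d in batch]  # suggested documents and
        remaining = [d for d in remaining if d not in batch]  # grow the training set
    return coded_docs, coded_labels
```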

Professor Cormack noted, in reviewing the higher recall of CAL, that search-term-based seed sets contain a built-in stop, limited by the keyword hits, even within TAR. Analyses of recall versus effort in a first-level document review were offered: for example, 56,000 documents were required in SPL to reach the same level of recall as 5,000 documents in SAL.

Ms. Hammond added a practical perspective on the theoretical and judicial discussions: in regulatory practice, precision is vital. Having used most of the types of tools under discussion, she noted that testing is needed to determine good seed sets, and continues to be required as new terms arise during review. She recommended a blended approach, continuing to engage human intelligence.

A toolkit and resources for the Cormack & Grossman SIGIR '14 report, including four Text Retrieval Conference (TREC) '09 Enron databases which were among those used for the cited controlled comparison of SPL, SAL, and CAL, are available for free under the GPL at trec.nist.gov, among other sources.
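
Comparisons like "56,000 SPL documents versus 5,000 SAL documents for the same recall" presuppose that recall can be measured at all, which is what a coded control set provides. The arithmetic is simple; the counts below are invented for illustration:

```python
# Estimating recall against a coded control set: of the documents the
# Expert marked Responsive in the control sample, what fraction did the
# model also flag? The counts below are invented for illustration.
control_responsive = 200   # Responsive documents in the control set
found_by_model = 150       # of those, how many scored above the cut-off
recall = found_by_model / control_responsive
print(f"recall = {recall:.0%}")  # prints: recall = 75%
```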

B. kCura Relativity Briefings

1. The Mobile Attorney: Working with Key Documents Using Relativity Binders

Relativity v8 and later can export and synchronize Binder data with an iPad in this mobile and web application that helps consolidate critical case documents. Binders are locked behind the Apple encryption keychain for security. Relativity field settings control metadata, docket, or coding output, with contents based upon a Saved Search. Binder users must already be licensed Relativity users. Among the limited palette of features available to mobile Binder users are:

- Annotations (highlight, note, draw, control colors and thickness; users see only their own);
- Organization (create Sections, drag and drop);
- Search of metadata or text (builds an index on the iPad, with highlights on hits; Boolean AND, OR, NOT must be in UPPERCASE);
- Offline Access (sync with Relativity as backup, visible only to the individual Binders user, via HTTPS/SSL);
- AirPlay (iPad Binder info can be wirelessly projected to Apple TV); and
- Binders on Web (Binder viewer, track changes, sync across multiple devices).

One can do incremental Binder builds, with updates and additions; a build won't remove anything, though. Apple iOS will warn on space, and auto-expire can be set to clear. With Relativity 9, users will be able to publish to Binders, even push a single document to a pre-made Binder. There will also be mobile device management and security configuration, as well as added Notifications, Favorites, and Preview before download (but no filesize parameter); the beta is due in March/April 2015. A Native Imaging Server (the processing add-on module, which requires additional servers) is required to use Binders. This is NOT a collaborative tool at this point.

2. Relativity Analytics Overview

The presenter discussed analytics in case workflow as ideal where there is a short timeline, such as in a Federal second request on a prospective merger, and a lot of data to get through. She cited that the average case here was about 1M documents, and the top 1,000 cases were about 3.8M documents. Relativity Analytics is thus intended to (a) investigate an unknown data set for document types and languages, and find related documents; (b) evaluate large sets of data and prioritize; or (c) structure documents by batching out clusters. The presenter broke it out as follows:

> Email Threading (based on Content Analyst): identify a group within a conversation; display groupings; show the master inclusive email (indicated by a solid dot).

> Near Duplication: organization of highly similar text into relational groups with a percentage of similarity; used for review batching or conflict checks, or to find subtle differences in language between documents (see the similarity sketch after this list).

> Language Identification: determine the primary and up to two secondary languages per document; report the percentage of text in each language found; handles 172 languages and dialects. Used to assign documents to language review teams, create grand-total charts and reports, and for further classification. One cannot exclude text, at least in Relativity 8.

The above fall into the category of Document Organization and Structure.
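
Near-duplicate grouping of the kind described, relational groups keyed to a percentage of textual similarity, can be approximated with pairwise cosine similarity over TF-IDF vectors. This generic sketch is not Relativity's actual engine, and the 80% threshold is purely illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Group documents whose pairwise textual similarity clears a threshold.
docs = ["The contract is attached for your review.",
        "The contract is attached for your review and signature.",
        "Lunch at noon on Friday?"]
sim = cosine_similarity(TfidfVectorizer().fit_transform(docs))

THRESHOLD = 0.80  # illustrative "percentage of similarity"
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sim[i, j] >= THRESHOLD:
            print(f"documents {i} and {j} are near-duplicates ({sim[i, j]:.0%})")
```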

Next are the Conceptual Analytics:

> Latent Semantic Analysis: a mathematical assessment of language learned from the documents in the current case, based upon concepts, not words; "aboutness" (about a plan, an RFP on a subject, a precis of blog post content), versus the more common "is-ness" (metadata, keyword, proximity, document type, author).

> Search: use an example sentence, paragraph, or entire document to return documents related in concept, based on ideas and thus conceptual relevancy, to get around false keyword hits, misspellings, and code words (see the sketch following this list).

> Keyword Expansion: submit a term to list conceptually-related items, in order to:
- develop a search term list (synonyms);
- learn the language of a case (jargon, new terms, idiom); and
- reveal code words and variations.
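
Concept search and keyword expansion both rest on the same mechanism: project documents and terms into a low-dimensional "concept" space and measure nearness there rather than by shared keywords. A generic latent-semantic-analysis sketch using scikit-learn's TruncatedSVD, not Content Analyst's actual engine; the corpus, query, and component count are hypothetical:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the merger agreement needs board approval",
        "board approved the acquisition deal",
        "softball league standings attached"]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# Learn a small latent "concept" space from the case's own language.
svd = TruncatedSVD(n_components=2, random_state=0)
doc_concepts = svd.fit_transform(tfidf)

# Concept search: rank documents by conceptual nearness to an example
# passage rather than by shared keywords.
query = svd.transform(vectorizer.transform(["acquisition of the company"]))
print(cosine_similarity(query, doc_concepts))

# Keyword expansion: terms whose concept-space vectors sit near a
# submitted term are candidate synonyms, jargon, or code words.
terms = vectorizer.get_feature_names_out()
term_concepts = svd.components_.T                 # one row per vocabulary term
target = term_concepts[list(terms).index("merger")]
ranked = np.argsort(-cosine_similarity([target], term_concepts)[0])
print([terms[i] for i in ranked[:5]])
```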

Last are the Review and QC analytics:

> Clustering: group documents by concept and visual hierarchy (a title of four words found together is provided for each cluster; see the sketch after this list). One can then batch out by cluster (number of documents, score, e.g., 0.65). The process runs an index of all documents in the workspace, by custodian, or by the set submitted for Analytics clustering. This facilitates mass actions, e.g., Mass Tag a certain cluster Not Relevant. One can batch out either using or overriding the Family Field Group identifier.

> Categorization: based upon expert user-defined examples or categories, using example documents from Relativity Assisted Review. Use for prioritization, sorting large volumes quickly, or creating a pivot table to visualize clusters against categories. Under the Indexing & Analytics tab, one can set the example source (e.g., Tag), maximum categories per document, minimum coherence score (default = 70%), and issue designation.
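
Conceptual clustering with four-word cluster titles can be approximated by k-means over document vectors, labeling each cluster with the terms nearest its centroid. A hedged sketch with an invented corpus and cluster count:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["merger agreement draft", "board approval of the merger terms",
        "office holiday party invite", "party planning committee notes"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Cluster the workspace (or one custodian's slice) into concept groups.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Title each cluster with the 4 terms nearest its centroid, echoing the
# four-word cluster titles described above.
terms = vectorizer.get_feature_names_out()
for c, centroid in enumerate(km.cluster_centers_):
    top_terms = centroid.argsort()[::-1][:4]
    print(f"cluster {c}:", " ".join(terms[i] for i in top_terms))
```

Batching a cluster out for a mass action then reduces to selecting the documents whose cluster label matches.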

The above notes represent a tiny fraction of what was on offer at LegalTech. The show truly is one place and time where legal technology people, knowledge, and commerce converge. Hope to see you there next year!

-- Andy Kass
akass@uslegalsupport.com
917-512-7503

The views expressed in this E-Discovery Tip Sheet are solely the views of the author, and do not necessarily represent the opinion of U.S. Legal Support, Inc.

U.S. LEGAL SUPPORT, INC. - ESI & Litigation Services
PROVIDING EXPERT SOLUTIONS FROM DISCOVERY TO VERDICT
e-discovery: Document Collection & Review, Litigation Management, Litigation Software Training, Meet & Confer Advice
Court Reporting Services
At Trial: Electronic Evidence Presentation, Trial Consulting, Demonstrative Graphics, Courtroom & War Room Equipment
Deposition & Case Management Services, Record Retrieval
www.uslegalsupport.com

Copyright 2015 U.S. Legal Support, Inc., 425 Park Avenue, New York NY 10022. (800) 824-9055. All rights reserved.