Predictive Coding in UK Civil Litigation
White Paper by Chris Dale of the e-disclosure Information Project
THE PURPOSE OF THIS PAPER

This paper considers the use of technology known as Predictive Coding to reduce the volumes of documents reviewed by lawyers for electronic disclosure in litigation in England & Wales. The paper looks at the formal obligations in the Civil Procedure Rules, and at judicial thinking, as well as at the technology itself. The aim is to encourage lawyers to appreciate that their duty of candour may be discharged by intelligent, cooperative and transparent use of such technology.

The paper is written by former commercial litigation solicitor Chris Dale of the UK-based e-disclosure Information Project [1]. The Project brings objective and informed comment to lawyers, judges, suppliers and clients, aimed at encouraging the better use of technology in e-disclosure. The paper is written in conjunction with Equivio [2], a provider of analytical software for e-discovery. Equivio is a sponsor of the e-disclosure Information Project.

[1] http://www.edisclosureinformation.co.uk/edisclosureproject.htm
[2] www.equivio.com

EXECUTIVE SUMMARY

This paper takes Master Whitaker's decision in Goodale v The Ministry of Justice as a starting point for an overview of the obligations of disclosure in England & Wales, the discretion given to judges, and the scope for parties to agree limits on disclosure and on the technology to be used to cull the volumes of documents before disclosure is given. It refers to the position in the US in considering whether lawyers can rely on technology of this kind in making their selections and still comply with their obligations. It ends with a look at how the user works with the technology and, in particular, how a lawyer can be satisfied with the output.

This paper does not purport to explain how the technology works and, indeed, deliberately avoids the terms of art (such as precision and recall) which apply to the subject. The context is not statistical accuracy, important though that is, but the extent to which use of such tools allows a lawyer to satisfy himself, his opponent and the court that the strict rules of disclosure are complied with proportionately. The first step is to establish what the rules really involve.

GOODALE V THE MINISTRY OF JUSTICE

In Goodale & Ors v The Ministry of Justice & Ors [3], Senior Master Whitaker said this:

"this [case] is a prime candidate for the application of software that providers now have, which can de-duplicate that material and render it down to a more sensible size and search it by computer to produce a manageable corpus for human review - which is of course the most expensive part of the exercise. Indeed, when it comes to review, I am aware of software that will effectively score each document as to its likely relevance and which will enable a prioritisation of categories within the entire document set."

[3] Goodale & Ors v The Ministry of Justice & Ors [2009] EWHC B41 (QB) (05 November 2009) http://www.bailii.org/ew/cases/ewhc/qb/2009/b41.html

This apparently simple paragraph connects a number of separate but related strands which are critical to the management of large volumes of documents in UK civil litigation. These include:

1 The maximum scope of disclosure, that is, the widest range of document sources which include potential candidates for e-disclosure
2 The principles of proportionality, which require that the cost of litigation should be proportionate to its value by reference to certain specific criteria
3 Balancing the obligations in 1 with the principles in 2 by reliance on:
   a The duties of cooperation and transparency required of solicitors both by the overriding objective and by the specific provisions in Practice Direction 31B CPR
   b The duty of active case management placed on judges, and their wide discretion
   c The use of technology, and specifically technology of the type referred to in the judgment

The last of these is what is known generically as predictive or prioritisation software. In broad terms it records the judgment of a senior lawyer or subject-matter expert over a selection of documents and applies the results to the rest. Equivio's application is one of the best known. Like other Equivio products (EmailThreads [4] and NearDuplicates [5]), its name reflects its function and it is called simply Relevance.

[4] http://www.equivio.com/product.asp?id=4
[5] http://www.equivio.com/product.asp?id=5

The perceived problem lies in reconciling the lawyer's duty to disclose everything which should be disclosed with the expense of complying with the traditional view of that obligation. Duty to the court and compliance with professional obligations is a serious matter; recent cases involving electronic disclosure have resulted in adverse costs orders and serious professional embarrassment, generating a climate which encourages lawyers towards over-disclosure.
At the same time, the UK perception of the US experience is of severe punishment in sanctions, including both financial penalties and adverse inferences which may cause a case to be lost. A quick reading of some UK cases may suggest that the UK is going the same way. A summary of the new Practice Direction 31B [6] appears to add duties and concomitant risks related to electronic disclosure. The technology seems daunting, and many lawyers are uneasy about what they see as delegation of their duties to a machine.

All these propositions - that the cases point to excessive risk, that the practice direction increases burdens, that the technology is difficult and that its use compromises one's duty - are overstated or just wrong. The UK rules and cases and the US experience, properly understood, are consistent with the use of technology of the kind referred to in Goodale and, indeed, positively encourage it.

The principal US concern is whether discovery is "defensible", that is, whether the methods, tools and processes used are proof against attack from opponents and the court. The fear is articulated slightly differently in the UK and expressed more as "Am I doing my duty?". The use of technology such as Master Whitaker describes is not merely common in the US "defensible" context but positively encouraged by the rules and cases (see the next section below). The adverse outcomes of cases in both jurisdictions generally derive not from the shortcomings of technology but from wider failures of duty.

[6] http://www.justice.gov.uk/civil/procrules_fin/contents/practice_directions/pd_part31b.htm

COMPARISON WITH THE US EXPERIENCE

This paper is not intended as an exercise in comparison between the US and UK systems, but the US focus on defensibility and the risk of sanctions for discovery failings makes it worth mentioning the attitude of courts and lawyers there to technology use in making discovery decisions. This is well covered in a paper by Conor Crowley called "Defending the Use of Analytical Software in Civil Discovery" [7], which refers to cases and authoritative papers showing that the US courts are ready to accept the use of applications like this.

The UK system imposes no less strict a duty, but disclosure disputes tend to be about higher levels of inclusion or exclusion - about the choice of custodians, date ranges or sources - rather than about the precise methods used to make detailed selections. There are many things about US e-discovery which the UK system will want to avoid - deliberately overbroad selection for defensive or tactical reasons, and satellite litigation, for example - but it is reasonable to suggest that a technology which is acceptable in the US is likely to be acceptable in the UK.

[7] http://www.equivio.com/files/bna%20conor%20crowley%20article.pdf
THE UK DISCLOSURE DUTIES

In some ways the burden on UK lawyers giving disclosure is greater than that on US lawyers. UK disclosure is not initiated by a request and subsequent battle over the scope of disclosure. Instead, the party giving disclosure decides for itself what is disclosable according to rules which define both what is expected and what limitations there are on the duty of search. The second major difference is that the test is not the broad one of "relevance"; instead a party must disclose documents which are supportive of or adverse to its own case or the case of any other party. The selection may be challenged subsequently by an application for specific disclosure, and the giving party must be able to justify its selection. It follows that a lawyer who relies on technology to help decide what is disclosed and what is omitted must be satisfied that the technology will give reliable results.

A new Practice Direction 31B CPR of October 2010 is designed to ensure that there is collaborative discussion in advance of disclosure. It recites as a general principle that technology should be used in order to ensure that document management activities are undertaken efficiently and effectively (the first time such an express requirement has appeared in the rules). It requires parties to discuss, amongst other things, the categories of electronic documents and their sources, the scope of a reasonable search, and the "tools and techniques (if any) which should be considered to reduce the burden and cost of disclosure of electronic documents". Factors to be discussed include date ranges, custodians, keywords, the use of agreed software tools, and adopting a staged approach to disclosure.

The high-level intent is that parties neither omit things which they ought to disclose nor (equally importantly) give disclosure of categories of documents which, whilst potentially disclosable, will not in fact add anything of value to consideration of the issues.
It is implicit in the reference to "staged disclosure" that some categories of documents are more important than others, and the references to keyword searches and "agreed software tools" make it clear that technology may be used to achieve this. The Practice Direction provides a form of questionnaire which, in appropriate cases, is to be completed by the parties in order to inform the discussions.

GOODALE AS A MODEL OF E-DISCLOSURE MANAGEMENT

Before looking more closely at what predictive coding involves, it is worth saying a little more about the Goodale judgment. Most recent UK e-disclosure judgments have involved conduct, failures of principle or breaches of the rules, disagreements over keywords or classes of documents, failures to hold the required discussions, or inadequate identification of sources. They fuel the perception that electronic disclosure is a trap for the unwary, but it is to be noted that none of them has thrown doubt on the efficacy of technology generally or of any particular technology.

The overall impression that this is a dangerous area contributes to the reluctance of many lawyers to give up the alleged "gold standard" of manual review. Many lawyers fail to distinguish the culling stages, which may remove whole classes of documents, from the review stage, where it may still be necessary to read documents before giving disclosure. The aim, as the quoted paragraph from Goodale says, is "to produce a manageable corpus for human review".

Unlike other cases, Goodale was dealing with prospective disclosure, not with what had already happened. The claimants wanted to get disclosure on the widest possible basis without, apparently, having applied their minds to what was likely to be necessary. The defendants simply asserted that it was disproportionate to look at their electronic documents at all. Master Whitaker took the opportunity to focus on the custodians who seemed to matter. Thus prompted, the claimants identified the four key custodians whom they saw as central, and Master Whitaker ordered that they be the starting-point for disclosure.

This amounts to a two-tier approach to disclosure: first identify top-level criteria (such as key custodians, date ranges etc) to narrow the scope, and then apply appropriate technology to refine the set further and to suggest what is most likely to be important.

CAN THE LAWYERS RELY ON PREDICTIVE CODING IN MAKING THEIR SELECTION?

As noted above, the Practice Direction expressly requires consideration of the use of technology and discussion about "tools and techniques (if any) which should be considered to reduce the burden and cost of disclosure of electronic documents". It also refers to agreement on keywords and the use of sampling techniques. It deliberately does not seek to define any particular technology.
Master Whitaker referred to the particular class of technology which is the subject of this paper because he saw it as the most appropriate way of dealing with the problem in hand. It has to be said that few UK judges are well enough informed to make such suggestions, but that emphasises the duty of lawyers to be equipped to explain how they arrived at their selection. A "black box" approach, with its implication that the user has no control over the selection nor means of showing how it was reached, will fail: even if the user can satisfy himself about the output, there would be no way of demonstrating it to opponents and the court. Predictive coding of the kind offered by Equivio>Relevance is both transparent and susceptible to objective checking.

Lawyers are familiar with the idea of using keywords to find documents: it is what they do with Google, it has been available for some time, and keywords are specifically referred to in the Practice Direction. It is not the purpose of this paper to set out the well-publicised defects in keyword search - Conor Crowley's article referred to above covers this ground well - nor to prove that any other technology is more accurate. All that matters, for these purposes, is to demonstrate that the best predictive coding tools provide methods of quality assurance which allow the lawyers to satisfy themselves (and hence their opponents and if necessary the court) that the results stand scrutiny.

The references in the Practice Direction to "agreed software tools" and to the "tools and techniques (if any) which should be considered" impose no restrictions on what should be used if they "reduce the burden and cost of disclosure of electronic documents". The next section looks at how Equivio>Relevance achieves this.

HOW DOES PREDICTIVE CODING WORK?

Each provider of this type of functionality has a slightly different approach, but the essence of the best-known ones is that a senior lawyer or subject-matter expert makes relevance decisions about a selection of documents. These expert decisions are used by the software to build up a picture of the elements, such as words, frequency of words, and distance between words, which distinguish relevant from irrelevant documents.

Equivio>Relevance works by putting forward a batch of, say, 40 documents which the lawyer marks as relevant or not using the simple control on the training screen. With each round of training, Equivio>Relevance checks the accuracy of its relevance assessments. This monitored, iterative training process, maximising the lawyer input, is used to manage and optimise training of the software, to ensure appropriate sampling, and to refine the software's ability to distinguish relevant and non-relevant documents.
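The batch-by-batch training described above can be illustrated with a toy sketch. It is not Equivio's algorithm, which this paper does not describe: the `ToyScorer` class, the word-count scoring, the 90% agreement target and the "three stable rounds" stopping rule are all assumptions made purely for illustration. The `reviewer` argument stands in for the lawyer's relevant/not-relevant decision on each document.

```python
import math
from collections import Counter


def tokens(text):
    """Split a document into lower-case words."""
    return text.lower().split()


class ToyScorer:
    """Hypothetical word-count scorer, for illustration only."""

    def __init__(self):
        self.relevant_words = Counter()
        self.irrelevant_words = Counter()

    def train(self, text, is_relevant):
        """Record the words of one document the reviewer has marked."""
        target = self.relevant_words if is_relevant else self.irrelevant_words
        target.update(tokens(text))

    def score(self, text):
        """Return a relevance score between 0 and 100 (50 means 'no evidence')."""
        total = sum(
            math.log((self.relevant_words[w] + 1) / (self.irrelevant_words[w] + 1))
            for w in tokens(text)
        )
        total = max(-50.0, min(50.0, total))  # keep exp() within safe bounds
        return 100 / (1 + math.exp(-total))


def train_in_batches(documents, reviewer, batch_size=40, target=0.9, stable_rounds=3):
    """Show the reviewer successive batches; stop once the scorer has agreed
    with the reviewer on at least `target` of a batch for `stable_rounds`
    consecutive rounds (an assumed stopping rule)."""
    scorer = ToyScorer()
    streak = 0
    rounds = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        decisions = [reviewer(doc) for doc in batch]
        # Measure agreement BEFORE learning from this batch.
        agreed = sum((scorer.score(d) >= 50) == r for d, r in zip(batch, decisions))
        for doc, decision in zip(batch, decisions):
            scorer.train(doc, decision)
        rounds += 1
        streak = streak + 1 if agreed / len(batch) >= target else 0
        if streak >= stable_rounds:
            break
    return scorer, rounds
```

Calling `train_in_batches(documents, reviewer)` returns the trained scorer and the number of batches the reviewer had to mark; the scorer can then be applied to every remaining document in the population.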
After a number of iterations (depending on the nature of the case and the document population it may take between 25 and 50 batches for this to happen) the system decides that it has enough information about document relevance to apply the lawyer input across the whole body of documents. How long this takes depends on the size of the document population and the available processing power - a short period, anyway, relative to the time it would take humans to perform the same exercise.

SUBJECTIVITY, OBJECTIVITY AND PROPORTIONALITY

A purist might suggest that relevance (meaning, in this context, being disclosable under the rules) is largely an objective matter, albeit recognising that different lawyers might properly reach different conclusions over marginal documents. A predictive coding tool like Equivio>Relevance should be more consistently objective than a lawyer (and even more so than a team of lawyers) in that it applies the same rules to every document; the subjective element derives from the range of decisions which the lawyer makes at the training stage.

Equivio>Relevance does more, however, than make simple Yes/No decisions. It also ranks documents by their degree of relevance. Once one introduces the idea that some documents are more relevant than others (and provided one can satisfy oneself about this - see below) one can supplement the simple objective/subjective view with the question "is this relevant document worth including?" That is a proportionality question and, as we have seen, proportionality is as important a component in UK litigation as is relevance.

Equivio>Relevance allows you to set a relevance threshold, that is, to indicate that you are interested only in documents which are assessed with, say, scores above 70 (on a scale from 0 through 100). Three things follow from this: firstly, you can look at the documents either side of this relevance threshold, and take a view as to whether the threshold has been set correctly; secondly, you can see the implications in terms of the percentage of documents to be reviewed; and thirdly, and perhaps most importantly, you are presented with an estimate of the percentage of relevant documents which will be retrieved by using 70 as the threshold score.

You may see, for example, that changing the relevance threshold to 60 will add 10% to the number of documents to be reviewed, but will add 20% to the number of relevant documents retrieved. You can look at samples of those which will be brought in as a result to judge their importance, and you can calculate the additional cost - indeed, Equivio>Relevance shows you what the additional cost will be, based on review cost parameters entered by the user.

The product also makes it easy to verify the results achieved by the software. Lawyer input is as critical to this process of verifying the results as it is to the training process which produced them.
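The threshold trade-off described above is simple arithmetic, and a short sketch may make it concrete. The function name `threshold_summary`, its parameters and the sample figures below are hypothetical and are not taken from Equivio>Relevance; the sketch also assumes that an estimate of which documents are relevant is available (for example from a reviewed sample) and treats a score at or above the threshold as included.

```python
def threshold_summary(scored, threshold, cost_per_document=1.0):
    """Summarise what a given relevance threshold implies.

    `scored` is a list of (score, estimated_relevant) pairs, with scores on
    a 0-100 scale and estimated_relevant a True/False judgment.
    """
    included = [(s, r) for s, r in scored if s >= threshold]
    total_relevant = sum(r for _, r in scored)
    relevant_included = sum(r for _, r in included)
    return {
        # How many documents the lawyers would have to review.
        "documents_to_review": len(included),
        # That number as a share of the whole population.
        "share_of_population": len(included) / len(scored),
        # Share of all (estimated) relevant documents that would be retrieved.
        "share_of_relevant_retrieved": (
            relevant_included / total_relevant if total_relevant else 0.0
        ),
        # Review cost, using a cost-per-document figure supplied by the user.
        "review_cost": len(included) * cost_per_document,
    }


# Illustrative data only: ten scored documents, four of them relevant.
sample = [
    (95, True), (80, True), (72, False), (65, True), (61, False),
    (40, False), (30, True), (10, False), (5, False), (2, False),
]
at_70 = threshold_summary(sample, 70, cost_per_document=1.5)
at_60 = threshold_summary(sample, 60, cost_per_document=1.5)
```

Comparing `at_70` with `at_60` shows, for this made-up sample, how many more documents would need review at the lower threshold, how much more of the relevant material would be caught, and what the extra review would cost. Those are the proportionality inputs the text describes.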
Samples can be taken, both of documents omitted and of those included. Equivio>Relevance analyses the discrepancies between the conclusions reached by the software and those reached by manual review of sample data. This phase enables quality assurance; adverse results can be used to identify possible errors or problems in the training process, such as inconsistencies in the relevance criteria being used by the lawyer training the system. Furthermore, the verification process can also be used to compare the results achieved by Equivio>Relevance against an alternative culling method, such as keyword searching, to confirm that the results achieved are optimal.

Following the verification phase, the lawyer has in his hands all the components needed to assess proportionality - what is gained or lost by a particular decision and what the cost implications are. Furthermore, he is in a position to demonstrate to opponents and the court why he has arrived at his decision. Challenges can be resisted on an informed basis. If forced or persuaded to accept a different view (e.g. as to the appropriate relevance threshold) the lawyer can make the necessary adjustments without delay.

SUMMARY

Despite the obvious differences between their respective systems, the concerns of a US lawyer and those of a UK one are much the same. One may fear sanctions which the other does not, but the primary question "Am I disclosing all that I should disclose?" raises the same issues in both jurisdictions, particularly when it comes, as it should, with provisos that over-disclosure is as unhelpful as under-disclosure, that costs are to be kept to a minimum, that the work done must be proportionate, and that informed cooperation is an obligation.

Conor Crowley's article makes it clear that the use of appropriate technology in the US is not merely sanctioned by the FRCP but is positively mandatory in appropriate cases. The UK rules and cases express this with a clarity which some US thinkers are beginning to envy: the Practice Direction says that technology should be used and that parties must discuss the "tools and techniques (if any) which should be considered to reduce the burden and cost of disclosure of electronic documents" and the use of "agreed software tools". If the UK rules place the burden of selection more squarely on the giving party (because there is no initiating request), they also require discussions in advance which are more codified than the US "meet and confer" requirements and more subject to judicial scrutiny.

The Goodale judgment shows the value, as well as the propriety, of working outwards from the most obvious sources rather than inwards from the widest possible starting-point; it also makes it clear that the use of technology which scores documents for relevance is appropriate.

Applications like Equivio>Relevance meet all these requirements. At a simple level, they do the job faster by applying the lawyer's input on a sample across the whole selection, and they put the most important documents in front of the senior lawyers first.
That on its own does not meet the concerns of a lawyer (whether the giver of the documents or his opponent) or of the court concerned with whether the disclosure duty has been discharged. The key factors are the ability to look either side of a chosen relevance threshold to check what lies at the margins, and the ease with which an anomaly, once spotted, can be corrected so that revised conclusions can be instantly reapplied to the whole body of documents.
ABOUT CHRIS DALE

Chris Dale
chrisdaleoxford@gmail.com
www.edisclosureinformation.co.uk
http://chrisdale.wordpress.com

ABOUT EQUIVIO

Equivio develops text analysis software for e-discovery. Users include the DoJ, the FTC, KPMG, Deloitte, plus hundreds of law firms and corporations. Equivio offers Zoom, an integrated web platform for analytics and predictive coding. Zoom organizes collections of documents in meaningful ways, so you can zoom right in and find out what's interesting, notable and unique. Request a demo at info@equivio.com or visit us at www.equivio.com. Zoom in. Find out.

Equivio, Equivio Zoom, Equivio>NearDuplicates, Equivio>EmailThreads, Equivio>Compare, Equivio>Relevance are trademarks of Equivio. Other product names mentioned in this document may be trademarks or registered trademarks of their respective owners. All specifications in this document are subject to change without prior notice. Copyright 2012 Equivio