Software-assisted document review: An ROI your GC can appreciate
kpmg.com
Contents

Introduction
Approach
Metrics to compare quality and effectiveness
Results
  Matter 1
    Approach 1
    Approach 2
    Impact of cutoff score
  Matter 2
  Matter 3
Summary
Impact on cost
Results and conclusion
Appendix 1: Software-assisted review workflow
Introduction

Document review has always been, and continues to be, one of the most costly and time-consuming aspects of the discovery phase in most matters. Document review can account for a large percentage of the overall expense, and managing review time, production deadlines, and cost is very difficult. The key challenge is being able to identify the most relevant documents in a defensible, accurate, timely, and cost-effective manner.

Electronic discovery document reviews tend to have multiple layers of review: they typically involve a first level of review by contract attorneys followed by quality checks and subsequent second-level reviews by subject matter experts. Because initial rounds of review involve many attorneys who are not subject matter experts, there may be inconsistencies in the coding that cannot be addressed until subsequent rounds of review. The multiple rounds of review needed to improve quality tend to drive up both time and cost.

One obvious approach to help manage review cost is to reduce the total number of documents that need to be reviewed. ediscovery and processing technologies have evolved considerably in the last few years and can help filter out nonresponsive documents as early in the overall process as the initial evidence collection stage. Many of these advances, such as deduplication, near deduplication, threading, concept-based review, keyword-based culling, and other enhancements, have become standard practice in the industry. Nevertheless, the volume of data to be reviewed continues to increase. The increase can be attributed to changes in storage costs and capacity, but also to additional types of electronically stored information (ESI) included in the review population at the outset (examples include voice mails, videos, instant messages, and SharePoint data). Based on the current state of the industry and the continued growth of ESI, the volume of data to be reviewed likely will continue to increase.

The quality of a review is as important as managing costs. Review quality can be defined as how definitively and consistently all relevant documents are identified and marked as responsive. This simple concept can be quite difficult to achieve or measure in large document reviews. Reviewing documents involves human judgment and interpretation of complex issues. Reviewers might not consistently identify documents as relevant or not relevant, especially when the issues and documents are more complex. Without a baseline to compare against, it is difficult to assess the accuracy of document reviews. The need for quality often results in document review workflows relying on multiple levels of
review based on a desire for multiple levels of quality control checks. But recent studies have shown that the level of agreement between two different review teams reviewing the same set of documents is often only in the 70 percent range.1 Simply incorporating multiple levels of review might not be sufficient to guarantee review quality.

The good news, however, is that as technology continues to evolve, there are other approaches available to help further reduce the costs involved in discovery while helping to improve quality. In this paper, we discuss a new approach that may help increase efficiency and reduce costs in first-level review. Using leading categorization and search technology to perform an automated first-level review, and following that automated first-level review with rigorous Quality Control (QC) and review by attorneys, can not only help reduce the cost of review significantly, but can also help improve the overall quality of a first-level review. For this paper, the software-assisted first-level review approach was applied to three different matters and was found to have significantly better recall with similar precision when compared to a traditional human-based review.2 The total time taken to review all the documents was also significantly lowered by this approach. Our approach does not replace human review with software review, but helps make human review more efficient and cost effective by supplementing it with software-assisted review.

1 Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Herbert L. Roitblat, Anne Kershaw, Patrick Oot, Journal of the American Society for Information Science and Technology, 2010.
2 Recall and precision are commonly used metrics to measure search efficiency and are discussed in more detail in subsequent sections.
Approach

The underlying premise of the approach and workflows described in this paper is that first-level review can be made more effective if documents that are most likely to be relevant are identified using technology before a full human review is undertaken. Search and retrieval technologies have evolved considerably in the last few years. With these advances, patterns can be teased out from terabytes of data, and noise can be significantly reduced, if not eliminated altogether, in an increasingly precise manner. The idea of machine learning is based on the underlying principle that sophisticated search and retrieval algorithms can also learn and adapt based on review decisions made by human reviewers. If documents are grouped together in order of relevance and content similarity, the subsequent human review can be more effective and efficient. In such situations, attorney review can be much more focused, and the time and cost associated with the review can be dramatically reduced.

In a software-assisted review workflow, documents would be ranked initially by the software system after the system has been trained by a human reviewer with deep knowledge and understanding of the important issues in the underlying legal matter. The documents categorized and ranked by the software system as potentially responsive would then be reviewed in a traditional human review workflow, starting with those ranked as most likely relevant. The software-assisted review workflow is described in more detail in Appendix 1.

In preparation for this paper, we considered three different matters of varying size and complexity to evaluate workflows that can utilize software-assisted review, and we compared such workflows with a traditional human-based first-level review workflow. The three matters discussed in this paper had already gone through a traditional human-based review process where (a) attorneys had conducted a first-level review, (b) a Quality Assurance/Quality Control (QA/QC) step had been performed after the first-level review, and (c) the process had been supplemented with a second-level human-based review. While each matter had been reviewed by a different team of attorneys, the same team had worked on the entire review for each individual matter. The coding from the traditional human review process was used to initially train the software as well as to compare against the software-assisted review workflow.

For the software-assisted review, we used Equivio>Relevance,3 an expert-guided software system that ranks documents by assigning a score between 0 and 100 to each document for any aspect that might be of interest in a review.4 In order to assign the scores, the system must be trained on a sample/training set of documents where knowledgeable human reviewers provide Yes, No, or Do Not Know answers about the documents in the training set for every aspect to be considered by the system. After the software has been trained, it uses statistical and self-learning algorithmic techniques to calculate the scores for all documents. The training was done in batches of 40 documents. In general, the number of training documents will depend on the total number of documents to be reviewed and how the documents chosen in the training sets have been coded.
For instance, if there are not enough documents that have a No answer or a Yes answer, or if there are too many Do Not Know answers, the system might require additional sets of training documents in order to differentiate which documents are important and which are not. Once the scores were available for all documents, and after reviewing the distribution of scores across the entire population, the scores were used in the test case to code documents. After all documents had been coded by the software system, the software-assisted coding was compared to the human review by a subject matter expert. A random sample of documents that was not part of the training set was selected, and the subject matter expert compared the coding for these documents. Statistical tests were used to estimate whether software-assisted coding performed significantly better or worse than the human coding. Metrics of precision and recall were also calculated and used to measure the quality of the coding and the effectiveness of both approaches.

3 Equivio>Relevance is integrated into KPMG's Discovery Radar review platform.
4 The training set is a batch of documents randomly chosen from the entire document set. A score of 100 for a document in a particular aspect would mean that the document is highly important for that aspect, while a score of 0 would mean that the document is not at all important for that aspect. Often, documents are ranked on aspects such as whether a document is relevant or privileged. The ranking can also be on some aspect that is specific to a matter, for example, whether the document is related to financial forecasts or whether any engineering opinions are discussed in the document.
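The batch-based training cycle described above can be summarized in a short sketch. The Python fragment below is illustrative only: the callables expert_code, retrain_and_score, and rankings_stable are hypothetical placeholders standing in for the reviewer's Yes/No/Do Not Know coding, the system's rescoring of the population, and its stopping criterion; they are not the Equivio>Relevance interface.

```python
import random

# Minimal sketch of the batch-based, expert-guided training loop described above.
# The callables (expert_code, retrain_and_score, rankings_stable) are hypothetical
# placeholders for the reviewer and ranking-engine steps, not a product API.

BATCH_SIZE = 40  # the matters in this paper were trained in batches of 40 documents

def train_until_stable(documents, expert_code, retrain_and_score, rankings_stable):
    """Feed random training batches to the system until its rankings stabilize."""
    pool = list(documents)
    random.shuffle(pool)                      # training batches are randomly chosen
    coded = []                                # (document, "Yes"/"No"/"Do Not Know") pairs
    previous = None

    while pool:
        batch, pool = pool[:BATCH_SIZE], pool[BATCH_SIZE:]
        coded += [(doc, expert_code(doc)) for doc in batch]
        scores = retrain_and_score(coded)     # 0-100 relevance score for every document
        if previous is not None and rankings_stable(previous, scores):
            return scores                     # trained: scores cover the full population
        previous = scores

    return previous
```

As the text notes, the number of batches the loop consumes depends less on population size than on how much variability the expert's Yes/No/Do Not Know answers give the system to learn from.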
Metrics to compare quality and effectiveness

Precision and recall are two widely used metrics for evaluating the effectiveness of searches. Precision is a measure of exactness or correctness, while recall is a measure of completeness. Precision is defined as the fraction of retrieved documents that are relevant to the search. Precision takes all retrieved documents into account, but it can also be evaluated at a given cutoff rank, considering only the topmost results returned by the system; this measure is called precision at n, or p@n. In the testing used for this paper, precision was measured at a cutoff rank. Recall is defined as the fraction of relevant documents that are retrieved by a search. We would have a recall of 100 percent if we brought back the whole population of documents as responsive to a query. However, our objective is to reduce the number of documents to be reviewed to a more meaningful subset of relevant documents and to eliminate as many false positives as possible. Therefore, recall alone is not sufficient as a metric, and one also needs to measure the number of false positives (i.e., the number of nonrelevant documents in the retrieved population) by computing the precision. Recall and precision are sometimes used together in the F1 score (or F-measure)5 to provide a single measurement of the effectiveness of a search.

5 The F-measure is the harmonic mean of precision and recall. van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.)
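For reference, the standard formulas behind these definitions, written in terms of true positives (TP), false positives (FP), and false negatives (FN), are:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```

Because F1 is the harmonic mean, it penalizes a search that is strong on one measure but weak on the other, which is why it is a convenient single summary of both.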
Results

As previously stated, for this white paper the software-assisted review approach was tested on data from three actual matters that had already been through a traditional human review workflow. In order to understand how the software-assisted review workflows might be affected by the number of documents, the matters chosen had different numbers of documents, ranging from as many as 86,000 documents to as few as 4,000 documents.

Matter 1: The first matter was the largest of the three and had approximately 86,000 documents. The documents in this matter had been coded by a team of attorneys for relevance. We tested how the software-assisted review workflow compared to the human review in determining the relevance of a document. We also tried to understand whether training the software system with one expert or a team of experts had any impact on precision and recall. When the training documents are coded by a team of reviewers, we expected to see more variation within the training documents (as all reviewers might not code each document in exactly the same manner). We wanted to understand how these variations might impact the number of training documents needed and, ultimately, the recall and precision. We trained the system with documents coded by the entire team of attorneys (Approach 1) and, separately, with a training set that was coded by just one attorney (Approach 2), and we used the software-assisted review workflow to code the remaining documents in both approaches.

Approach 1: In Approach 1, where all of the documents were coded by a team of attorneys, the system required 101 sets of 40 documents each (i.e., 4,040 documents) to be fully trained. Thereafter, the software was able to assign scores for the remaining documents. Documents with scores of 9 or greater were coded as Relevant (i.e., to be produced), and documents with scores below 9 were coded as Not Relevant. The implication of choosing different cutoff scores is discussed in greater detail later in the paper. In Approach 1, there were 56,018 documents with scores of 9 or greater (65.4 percent of the population), meaning that about 65 percent of the documents would then go through the subsequent human review step.

A discrepancy analysis was performed on the Equivio results and the results of the human review (Table 1). The cells where the two reviews agreed on the coding (relevant/relevant and nonrelevant/nonrelevant) were set aside. The discrepancy analysis focused on the cells where the two reviews differed and aimed to understand whether the human review or the software-assisted review was more accurate when the two disagreed. The discrepancy analysis was performed by having the subject matter expert review randomly selected subsets of 132 and 91 documents from the disagreement cells.6 The results of the subject matter expert's review of the subsets of documents with discrepancies from Approach 1 are shown in Table 2; Table 3 compares the precision and recall between the human review and the software-assisted review. The software-assisted review has greater recall, while its precision is not as high. However, the human review involved two rounds of review with a QA/QC step in between, and one would expect higher precision in the multistep human review as compared to the single-step software-assisted review. The precision in the software-assisted review is very good considering that only 4,040 of the 85,608 documents were reviewed by humans.
Figure 1: Responsiveness from software-assisted coding in Approach 1

6 The size of the subsets was established so that, if the entire set of documents were to be reviewed, we could be 95 percent confident that it would give the same results as the subset. The size was determined using a sample-size formula with the following parameters: Zα = Z value associated with the α confidence level (α = 0.05); Zβ = Z value associated with the β confidence level (β = 0.01); P0 = population proportion error rate (assumed to be 0.50); P = sample proportion error rate (based on preliminary review).
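Footnote 6 names the parameters of the sample-size calculation without reproducing the full expression. As an illustration only, the sketch below assumes the standard sample-size formula for a one-sided test on a proportion with the stated α and β; it may differ from the exact formula used in the paper, and the preliminary-review rate in the example is made up.

```python
from math import ceil, sqrt
from scipy.stats import norm

# Illustrative only: this assumes the standard sample-size expression for a
# one-sided test on a proportion, using the parameters named in footnote 6.
# It is not necessarily the exact formula applied in the paper.

def discrepancy_sample_size(p0=0.50, p=0.30, alpha=0.05, beta=0.01):
    """Sample size for testing H0: error rate = p0 against an observed rate p."""
    z_alpha = norm.ppf(1 - alpha)   # Z value for the alpha confidence level
    z_beta = norm.ppf(1 - beta)     # Z value for the beta confidence level
    n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p * (1 - p)))
         / (p - p0)) ** 2
    return ceil(n)

# Example call (p = 0.30 is a hypothetical preliminary-review rate, not a figure
# from the paper): discrepancy_sample_size(p0=0.50, p=0.30)
```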
Table 1: Discrepancies between software-assisted review and human review in Approach 1

                              Software-assisted review
Human Review                  Relevant (X)    Nonrelevant (Y)    Total
Relevant (A)                  22,768          2,494              25,262
Nonrelevant (B)               33,250          27,096             60,346
Total                         56,018          29,590             85,608

Table 2: Subject matter expert's discrepancy analysis for Approach 1 (sample size and documents found actually relevant, for the HR Relevant/E>R Nonrelevant (AY) and HR Nonrelevant/E>R Relevant (BX) cells)

Table 3: Comparison of human and software-assisted review for Approach 1

Approach 1        Precision    Recall     F-measure
E>R               42.41%       95.00%     58.64%
Human Review      88.32%       89.23%     88.78%
Approach 2: In Approach 2, the software-assisted review was compared against a human review in which all of the documents had been reviewed by just one reviewer. We took the data that was used in Approach 1 and reduced it to the subset of documents that was reviewed by one reviewer with subject matter expertise. The total population was reduced to 9,373 documents in this case (from 85,608 documents). The system required fewer training sets (29 sets of 40 documents each, compared to the 101 sets of 40 documents needed in Approach 1) to be fully trained before it was able to assign scores to all 9,373 documents. As in Approach 1, a score of 9 was used as the cutoff for relevant or not relevant (Figure 2). Tables 4, 5, and 6 show the comparisons between human review and software-assisted review after a discrepancy analysis and review by the subject matter expert. Compared to Approach 1, the precision of the software-assisted review improved when one reviewer reviewed all the documents.

Table 4: Results of human review vs. software-assisted review for Approach 2. Note: one document was dropped from the discrepancy analysis (data anomaly).

                              Software-assisted review
Human Review                  Relevant (X)    Nonrelevant (Y)    Total
Relevant (A)                  4,239           44                 4,283
Nonrelevant (B)               3,152           1,938              5,090
Total                         7,391           1,982              9,373

Table 5: Subject matter expert's discrepancy analysis for Approach 2 (sample size and documents found actually relevant, for the HR Relevant/E>R Nonrelevant (AY) and HR Nonrelevant/E>R Relevant (BX) cells)

Table 6: Comparison of human and software-assisted review for Approach 2

Approach 2        Precision    Recall     F-measure
E>R               58.62%       99.71%     73.83%
Human Review      98.09%       96.69%     97.38%

Figure 2: Responsiveness from software-assisted coding in Approach 2

Impact of cutoff score: Review teams should choose cutoff scores appropriate for their particular cases. The basic premise is to pick a cutoff score that will help minimize the number of documents to be reviewed in the next human review step while keeping recall as high as possible (i.e., to help ensure that as many of the relevant documents as possible are retrieved by the software-assisted review). The cutoff score of 9, used in the two approaches above in Matter 1, was recommended by the Equivio software based on the distribution of the scores for all the documents, as shown in the histogram in Figure 1. Equivio is designed to recommend a cutoff score that will help maximize the F-measure. A low cutoff score means that more documents will be retrieved, resulting in greater recall, while a higher cutoff score yields more precision. In Approach 1, applying a higher cutoff score of 18 (instead of 9) would have increased the precision from 42 percent to 47 percent, yet it would have decreased the recall from 95 percent to 82 percent. Likewise, the number of documents indicated as relevant by the software would go down from 89 percent to 52 percent of the total population. The higher cutoff score means that reviewing just the 52 percent of the documents marked as relevant would retrieve nearly 82 percent of the relevant documents; this is a fairly significant reduction in the number of documents that one might have to review in a traditional workflow to reach similar recall rates.
In Approach 2, with a cutoff of 25 (instead of 9), the number of documents indicated as relevant would be reduced to 59 percent (from 79 percent), while the precision increases to 70 percent (from 58 percent) and the recall decreases to 90 percent (from 99 percent). Again, this would mean that reviewing just about 59 percent of the documents would retrieve 90 percent of the relevant documents, which is an excellent result.
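The trade-off described above can be made concrete with a small calculation: given relevance scores and known coding for a validated sample, one can sweep candidate cutoffs and watch how precision, recall, and the share of documents retrieved move. The sketch below is an illustrative calculation, not the method Equivio uses internally to recommend a cutoff.

```python
# Illustrative cutoff sweep over a scored, validated sample of documents.
# scores: {doc_id: relevance score 0-100}; labels: {doc_id: True if relevant}.
# This mimics the precision/recall/volume trade-off discussed above; it is not
# Equivio's internal cutoff-recommendation algorithm.

def sweep_cutoffs(scores, labels, cutoffs=range(0, 101)):
    relevant = {d for d, is_rel in labels.items() if is_rel}
    results = []
    for cutoff in cutoffs:
        retrieved = {d for d, s in scores.items() if s >= cutoff}
        tp = len(retrieved & relevant)
        precision = tp / len(retrieved) if retrieved else 0.0
        recall = tp / len(relevant) if relevant else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        volume = len(retrieved) / len(scores)   # share of the population to review
        results.append((cutoff, precision, recall, f1, volume))
    return results

# Raising the cutoff shrinks the review volume and raises precision at the
# expense of recall; the F-measure-maximizing cutoff balances the two.
```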
Matter 2: Matter 2 had approximately 10,000 documents but had to be coded for multiple aspects, including (1) relevance and (2) whether documents contained discussions about certain types of people, transactions, or organizations. The relatively small population of documents did not have enough variability across documents in certain aspects for the software system to differentiate and rank those aspects. Other aspects that had sufficient variability in the documents were successfully ranked by the software system. This sort of result is more likely when the total number of documents is small. Figure 3 and Tables 7 through 9 summarize the results from Matter 2 for one of the aspects. These results also compare very well with human review: a review of just 38 percent of the documents would be sufficient to retrieve 93 percent of the relevant documents.

When compared to Matter 1, Matter 2 had better recall with a smaller subset of documents. Matter 1 was a more complex matter and had many related submatters. Whether a document in Matter 1 was relevant or not was not as straightforward as in Matter 2 (or Matter 3, discussed hereafter). Consequently, precision in Matter 2 was higher (compared to Matter 1) with a smaller set of documents to be reviewed.

Figure 3: Responsiveness from software-assisted coding in Matter 2

Table 7: Results of human review vs. software-assisted review for Matter 2

                              Software-assisted review
Human Review                  Relevant (X)    Nonrelevant (Y)    Total
Relevant (A)                  2,758           404                3,162
Nonrelevant (B)               2,116           7,626              9,742
Total                         4,874           8,030              12,904

Table 8: Subject matter expert's discrepancy analysis for Matter 2 (sample size and documents found actually relevant, for the HR Relevant/E>R Nonrelevant (AY) and HR Nonrelevant/E>R Relevant (BX) cells)

Table 9: Comparison of human and software-assisted review for Matter 2

Matter 2          Precision    Recall     F-measure
E>R               62.71%       93.35%     75.02%
Human Review      91.77%       88.63%     90.17%
Matter 3: In the third matter, there were approximately 4,000 documents that had all been coded by a single attorney for relevance. In addition to the small number of documents, one salient feature of this matter was that the documents had been collected in a targeted manner. The system required close to 2,000 training documents (almost 50 percent of the population), mainly because of the similarity among the documents (due to the targeted collection). In general, the software-based review approach scales better when larger numbers of documents are present.

Figure 4: Responsiveness from software-assisted coding in Matter 3

Table 10: Results of human review vs. software-assisted review for Matter 3. Note: 207 documents were dropped from the discrepancy analysis (data anomaly).

                              Software-assisted review
Human Review                  Relevant (X)    Nonrelevant (Y)    Total
Relevant (A)                  1,157           72                 1,229
Nonrelevant (B)               601             2,022              2,623
Total                         1,758           2,094              3,852

Table 11: Subject matter expert's discrepancy analysis for Matter 3 (sample size and documents found actually relevant, for the HR Relevant/E>R Nonrelevant (AY) and HR Nonrelevant/E>R Relevant (BX) cells)

Table 12: Comparison of human and software-assisted review for Matter 3

Matter 3          Precision    Recall     F-measure
E>R               72.71%       94.38%     82.14%
Human Review      99.69%       90.46%     94.85%
Summary

We were able to quantify the precision and recall of the software-assisted review on matters where the documents had already been reviewed in a traditional human review. The software system required between 1,000 and 5,000 training documents before it was able to review the entire population. In the software-assisted workflow, the training documents would have to be reviewed initially by attorneys and the results provided to the software system. We observed that the system was able to train more quickly when one reviewer with subject matter expertise trained the system rather than a group of attorneys. Generally, one reviewer who has subject matter expertise introduces less noise and bias into the system than a team of general reviewers. The additional variability that is present when a team of reviewers trains the system also tends to affect the application's learning process.

In all three matters, the software-assisted review had better recall than the human review; i.e., it was able to identify documents that were actually relevant but had been missed in the traditional human review workflow. In addition, the software-assisted workflow identified the subset of the total population that was likely to contain relevant documents; this subset would have to be reviewed as the next step of the workflow. In the matters we studied, the software system estimated that between 93 and 99 percent of the relevant documents were in subsets that were only 38 to 68 percent of the total population. A discrepancy analysis of the documents where the software-assisted coding did not match the human review provided verification.

The software-assisted review had lower precision (between 42 and 73 percent) when compared to the traditional human review (88 to 99 percent). This indicates false positives (nonrelevant documents being identified as relevant) in the software-assisted review. However, in these three matters, the precision in the traditional human review was quite high because the review had followed a three-step process (first-level review, QC step, and second-level review) with the same team of attorneys involved in all three steps. The precision from the software-assisted review almost doubled in the second and third matters compared to the first matter. The first matter was a very complex matter with many related submatters. Deciding whether a document was responsive was even more subjective than usual in that matter; there were multiple layers of complex issues that had to be considered before deciding on the responsiveness of any single document. As Roitblat et al.7 observed, the amount of subjectivity and bias that goes into making a Yes/No decision tends to increase with complex documents and increases the variability in the review. Additionally, because the software system was trained on a set of documents with considerable variability, it marked more documents as relevant (any document that was somewhat relevant) and consequently had much lower precision. The software-assisted coding would be the first step in our proposed software-assisted review workflow, and any documents flagged as relevant would then be subjected to a human review to verify relevance.

That almost all the relevant documents can be identified by reviewing only between 38 and 68 percent of the documents, after an initial review of 1,000 to 5,000 documents, is very significant.
A considerable portion of the review effort can often be accomplished in just the limited first step of the software-assisted workflow, even before any significant human review effort. And the software-assisted review generally yielded better results than the traditional review, both in terms of the percentage of documents that had to be reviewed (i.e., cost) and recall (the percentage of relevant documents identified). We also noticed that the number of documents required for training was not related to the size of the population. For example, it took 2,000 documents to train the software for Matter 3, which was 50 percent of the overall population (Matter 3 had 4,000 documents), whereas it took only 1,160 documents (out of 9,373) to train the software in Matter 1, Approach 2. So the training effort does not necessarily increase with the volume of data; rather, the requisite number of training documents is affected by other factors, such as the quality and consistency of the training as well as the variability in the data.

7 Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, Herbert L. Roitblat, Anne Kershaw, Patrick Oot, Journal of the American Society for Information Science and Technology, 2010.
Impact on cost

In all three matters, there was potential for cost savings if the human review was performed only on the documents ranked as responsive (above the cutoff score) by the software system. Table 13 shows the documents that were identified as not responsive in the three matters and the savings that could be realized in review costs if those documents were not reviewed by human reviewers. The hypothetical cost savings analysis assumes a total cost of $3 for the review of a document (the cost of all levels of review, production costs, etc., are lumped into this figure). The analysis also assumes a cost of $0.10 per document for the software-assisted coding alone. The cost savings in a matter would depend on the document collection to be reviewed and the cutoff score that was chosen. And, while all the documents identified as relevant would be included in a human review, just a sample of the documents identified as not relevant could be reviewed to statistically confirm the hypothesis and the accuracy of the not-relevant determination assigned by the software.

Table 13: Illustration of cost benefit (assumes a hypothetical total cost of human review of $3 per document)

Hypothetical Cost Impact    Total       Documents Identified   Cost of Traditional   Cost of Software-   Potential        Potential Cost
                            Documents   as Not Relevant        Human Review          Assisted Review     Savings Amount   Savings Percentage
Matter 1: Approach 1        85,608      29,590                 $256,824              $176,615            $80,209          31.2%
Matter 1: Approach 2        9,373       1,982                  $28,119               $23,110             $5,009           17.8%
Matter 2                    12,904      8,030                  $38,712               $15,912             $22,800          58.9%
Matter 3                    3,852       2,094                  $11,556               $5,659              $5,897           51.0%
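The figures in Table 13 follow directly from the stated per-document assumptions. The sketch below reproduces that arithmetic using the $3 human-review rate and $0.10 software-coding rate from the text; it is an illustration of the hypothetical cost model, not a pricing tool.

```python
# Reproduces the arithmetic behind Table 13 using the paper's stated
# assumptions: $3 per document for human review and $0.10 per document
# for software-assisted coding. All figures are hypothetical illustrations.

HUMAN_COST_PER_DOC = 3.00
SOFTWARE_COST_PER_DOC = 0.10

def cost_impact(total_docs, not_relevant_docs):
    traditional = total_docs * HUMAN_COST_PER_DOC
    # Software-assisted: code everything with software, then human-review
    # only the documents the software scored as relevant.
    software_assisted = (total_docs * SOFTWARE_COST_PER_DOC
                         + (total_docs - not_relevant_docs) * HUMAN_COST_PER_DOC)
    savings = traditional - software_assisted
    return traditional, software_assisted, savings, savings / traditional

# Example: Matter 2 (12,904 documents, 8,030 coded not relevant)
# cost_impact(12904, 8030) -> (38712.0, 15912.4, 22799.6, ~0.59)
```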
Results and conclusion

Based on the results found in this paper, it is good practice to use a knowledgeable subject matter expert to apply software-assisted categorization and search technologies at the outset of a matter. While these technologies are not intended to replace human review, we have demonstrated that they may help improve efficiencies and reduce costs when used in combination with the human review of documents. Reviewers with subject matter expertise need to spend only a limited amount of review time up front to train the software system before the system is able to separate the relevant documents from the nonrelevant documents with a degree of confidence similar to that obtained in a traditional first-level human review. The results can be used to prioritize review so that the attorneys can focus on the most relevant documents first and then later review, or assign to contract reviewers, the documents that are less likely to be relevant. And, at some point in the review of the lower-ranked documents, review teams may feel confident enough to limit their review to a statistically valid sample of the remaining documents to realize additional cost and time savings.

In addition to the obvious potential for faster review and lower costs, a workflow of this sort provides other potential benefits. An analysis like the one shown in this paper can serve to check and control the quality of review. The results from the software-assisted coding can also be used to see if there are any gaps in keywords. And, when applied to data and documents received from opposing counsel, the software-assisted workflow can be used to identify the critical documents more quickly.

Software-assisted review might not be ideal for all situations, as was revealed in some of the matters considered in this paper. If the number of documents is not very large, one may still get good results for recall but might not get significant cost savings. And, if the data being reviewed does not have much variability, the software will need more training documents before it is able to rank all the documents, and even then it might not be able to accurately code for criteria (decision points) where the training documents did not have enough variability.

The use of software-assisted review is still relatively new to the ediscovery community, and the inner workings of the technology are not fully evident to the end user. However, even routine steps, such as indexing documents for keyword searching, involve subtleties that are not fully understood by an end user. As with most rapid technological advances in ediscovery, we believe the technologies behind software-assisted review will eventually become standard practice throughout the legal and ediscovery communities alike.

Traditional document review processes are time consuming, costly, and subject to human error. Software-assisted review is another means by which the document review workflow can be further streamlined to help improve both the speed and accuracy of review. While software-assisted review cannot completely replace the insight that a human reviewer brings to the review process, the available advancements in ediscovery software should be leveraged to identify and rank documents to help drive a more efficient and accurate document review.
Appendix 1: Software-assisted review workflow

1. Data is collected from various custodians and from shared drives on various servers (including the mail server) at the company. The collection is targeted unless the case requires imaging entire hard drives.
2. The data is pared down to exclude system files and any other file types that are considered irrelevant to the case, and it is filtered based on date range.
3. The data is deduplicated using MD5 hash values.
4. The data is filtered using keywords that were agreed upon with opposing counsel during the meet-and-confer process.
5. The main attorney on the case reviews the data collected from the top five custodians to review keyword hits. The attorney also trains the software to identify the most relevant documents from the entire population.
6. The filtered data is batch ranked with KPMG's Discovery Radar and Equivio>Relevance software, from the most relevant to the least relevant documents.
7. The documents are also searched for the presence of privilege terms, and any data responsive to the privilege search is separated for a detailed privilege review.
8. The remainder of the documents are batched out in order of calculated relevance for human review. Attorneys performing the human review can focus on the most important documents first. They can also use near-deduplication technology to identify and group tag any junk documents as irrelevant.
9. A sample of these documents (a higher percentage if necessary) is subject to the quality assurance review process. If there is a need for rework, the batch of documents is sent back to the reviewers. The document batches that are QC'd are adjusted up or down based on whether they are relevant to the case.

This approach should help speed up document review and save on first-level review costs. The attorneys can spend their review time on the most relevant document sets and can quickly sample or skim through the irrelevant document sets.
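The ranking, privilege-separation, and batching steps in the middle of this workflow can be pictured in a short sketch. The helper names and the batch size below are hypothetical stand-ins for the Discovery Radar / Equivio>Relevance ranking and the privilege-term search described above, not product APIs.

```python
# Illustrative sketch of the rank-and-batch steps in the appendix workflow.
# The relevance scores, the privilege test, and the batch size are assumed
# inputs; none of the names below are product APIs.

BATCH_SIZE = 500  # hypothetical review-batch size; the paper does not specify one

def batch_for_review(documents, scores, contains_privilege_terms):
    """Split scored documents into a privilege queue and relevance-ordered review batches."""
    privilege_queue = [d for d in documents if contains_privilege_terms(d)]
    privileged = set(privilege_queue)          # set aside for detailed privilege review

    # Remaining documents are ordered from most to least likely relevant,
    # so first-level reviewers see the highest-ranked material first.
    remainder = sorted((d for d in documents if d not in privileged),
                       key=lambda d: scores[d], reverse=True)
    review_batches = [remainder[i:i + BATCH_SIZE]
                      for i in range(0, len(remainder), BATCH_SIZE)]
    return privilege_queue, review_batches
```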
Acknowledgements

We would like to acknowledge the contributions of the following KPMG LLP employees: Jacqui DiPerna, Denny Thong, Jim Monty, and Michael Carter. We would also like to thank the product team at Equivio.
Contact us

For information about this paper, please contact one of the authors:

Kelli Brooks, Principal, KPMG Forensic Technology Services
Priya Keshav, Director, KPMG Forensic Technology Services
Meagan Thwaites, Counsel, CRM Legal Department, Boston Scientific Corporation

kpmg.com

The information contained herein is of a general nature and is not intended to address the circumstances of any particular individual or entity. Although we endeavor to provide accurate and timely information, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future. No one should act upon such information without appropriate professional advice after a thorough examination of the particular situation.

Discovery Radar is a trademark of KPMG International. Member firms affiliated with KPMG International Cooperative ("KPMG International"), a Swiss entity. All rights reserved. Printed in the U.S.A. The KPMG name, logo and "cutting through complexity" are registered trademarks or trademarks of KPMG International.
