Equivio FAQs

Here is some basic information about Equivio for use by Catalyst and our Partners. These answers will help in responding to queries from clients and prospective clients.

www.catalystsecure.com 877.557.4273 info@catalystsecure.com

1. What is Equivio?
Equivio is software that helps reduce review costs by grouping near-duplicate documents and reconstructing email conversations. These features allow the reviewer to pinpoint each document's unique content when reviewing near-duplicate documents and to focus exclusively on unique content in email threads.

2. How much money can Equivio save my client?
When near duplicates and email threads are grouped and reviewed together, the time and cost of document review can be reduced by 30-50%. Based upon a volume of 300GB and review costs of $250/hour, your clients could save more than $3 million by employing near-duping technology prior to commencing document review. Even at a review cost of $50/hour, the client would save more than half a million dollars.

3. Does Equivio cull out documents?
No. Equivio groups documents by similarity or by email thread. The documents are grouped together for bulk coding but are not culled out.

4. What are Near Dupes?
Near dupes (near duplicates) are documents that are similar in content, e.g., contract drafts. Equivio groups them so the reviewer can focus on key differences in the group rather than reviewing each document in its entirety, and selects a Pivot document for the reviewer to begin with.
Here are some examples of near dupes:
- Documents with somewhat different content (different text).
- Documents with the same content and file type, but different formatting.
- Documents with the same content but different file type (DOC and PDF).
In many cases, near dupes can make up as much as 30% of the document population. Reviewing them as a unit cuts review time dramatically.
Equivio allows us to select the Pivot document based on:
- Longest document (maximum number of words)
- Median-sized document (median number of words)
- Shortest document (minimum number of words)
- Document with the numerically greatest document ID
- Document with the numerically lowest document ID

5. What are Email Threads?
Email threads are emails that are part of a conversation. Equivio groups them so they can be reviewed together efficiently. Specifically:
- Equivio organizes emails into hierarchical groups.
- Equivio identifies the Inclusive email in each thread.
- Equivio verifies that each Inclusive contains the entire history of the conversation (taking into account both email content and attachments).
This allows the reviewer to see the entire conversation and, after viewing the Inclusive, skip the earlier emails it already contains.
In many cases, email conversations can make up as much as 50% of the email population. Reviewing them as a unit cuts review time dramatically.
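The pivot options listed under question 4 can be made concrete with a short Python sketch. It is an illustration only, not Equivio's implementation: the `pick_pivot` function, the rule names, the dictionary layout, and the sample documents are all hypothetical, and word counts are approximated by splitting on whitespace.

```python
from statistics import median_low


def word_count(doc):
    """Approximate a document's length as its number of whitespace-separated words."""
    return len(doc["text"].split())


def pick_pivot(docs, rule="longest"):
    """Pick the pivot from a near-duplicate group using one of five rules.

    The rule names are hypothetical labels for the five options in the FAQ.
    """
    if rule == "longest":
        return max(docs, key=word_count)
    if rule == "shortest":
        return min(docs, key=word_count)
    if rule == "median":
        # median_low always returns a length that occurs in the group.
        target = median_low([word_count(d) for d in docs])
        return next(d for d in docs if word_count(d) == target)
    if rule == "highest_id":
        return max(docs, key=lambda d: d["id"])
    if rule == "lowest_id":
        return min(docs, key=lambda d: d["id"])
    raise ValueError(f"unknown pivot rule: {rule}")


# Hypothetical near-duplicate group: three drafts of the same contract.
docs = [
    {"id": 101, "text": "Draft one of the contract"},
    {"id": 102, "text": "Draft two of the contract with an added clause"},
    {"id": 103, "text": "Draft three of the contract, revised"},
]

for rule in ("longest", "median", "shortest", "highest_id", "lowest_id"):
    print(rule, pick_pivot(docs, rule)["id"])
```

In this sample group the longest document is 102, the shortest is 101, and the median-sized document is 103, so the choice of rule changes which draft the reviewer reads first.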
6. How does Equivio work?
Equivio analyzes the documents to determine whether they are loose files or emails. For loose files, Equivio computes a hash value for each file to detect exact duplicates. It then compares the body text of the files to determine whether each one is a Near Dupe of another document. For emails, Equivio separates each message into email body and attachments. This allows email threads to be detected and allows attachments to be identified as Near Dupes of other attachments or loose files.

7. Can Equivio handle Chinese, Japanese or Russian characters?
Yes. Equivio is fully Unicode compliant, so as long as the text is searchable, Equivio can compare documents in any language.

8. Can Equivio find similar documents across languages?
No. Equivio bases its results exclusively on the text it is comparing, so a document and its translation into another language will not be matched.

9. Do my documents have to be searchable to be run in Equivio?
Yes. Each document must either be searchable or be accompanied by a corresponding text file to be run in Equivio.

10. What file formats are compatible with Equivio?
Among many other file types, the following are examples of Equivio-supported extensions:
PST, MSG, EML, DOC, PPT, PDF, HTML, TXT, XLS

11. My data collection consists of image-only PDFs and TIFFs. What can we do?
Catalyst can perform an additional OCR process on your image-only collection to read the text and create text files to be used for Equivio.

12. How is the resemblance between documents calculated?
Equivio analyzes the text of each document and measures how similar the documents are. The technique is called shingling and is well established in the academic community. In essence, Equivio creates hash values from the words in the document, using each successive three-word grouping as the input to a hash. It then compares documents based on the number of matching hashes it finds.
Consider a document with one sentence in it:
We need to settle this case before the judge rules on their motion.
Equivio would create a hash value for each three-word phrase in succession: "We need to," "need to settle," "to settle this," "settle this case," "this case before," and so on. It would do the same thing for another document and then count the identical hashes the two documents share.
Thus a near-duplicate document might have this sentence:
We need to settle this case before the judge decides their motion.
The sentence is not identical, but it is sufficiently similar to be considered a near duplicate.
No approach is perfect, but researchers have found that word shingling does a good job of identifying documents with similar content. In turn, reviewing similar documents in a group saves review time.

13. Does Equivio allow the client to set the similarity threshold?
Yes. The similarity threshold is referred to as an EquiLevel: the minimal level of resemblance for two documents to be considered near-duplicates. At the beginning of each case, the site administrator can determine the EquiLevel based on case-specific criteria. For example, if the document collection consists mostly of OCR documents, the administrator can set the similarity as low as 40% to compensate for errors common to the OCR process, or as high as 100% if the documents are expected to contain very similar data. This setting can only be adjusted at the beginning of the case. Catalyst's default EquiLevel is 75%.

14. What are the main advantages of using Equivio?
A. Faster review: Document grouping allows the reviewer to focus on key differences within a group.
B. Lower cost: Faster review translates to lower review team expenses.
C. Minimized risk: Focusing on differences reduces oversights and errors. Document grouping allows for consistent preservation, coding and annotation of similar documents.

15. How does using Equivio cut the cost of review?
Equivio organizes documents for a more efficient review. The software groups documents into EquiSets. An EquiSet is a group of near-duplicate and duplicate documents. A document can belong to no more than one EquiSet.
As an example, Equivio identifies a group of 10 near-duplicates, all versions of a 50-page contract. This EquiSet can be assigned to one reviewer. He or she can use the Compare tool to home in on differences and then review these very similar documents at the same time, in a systematic fashion. In contrast, traditional review methods require reviewing all 10 documents individually and in their entirety. This more organized approach saves time and, therefore, money.

16. How does Document Grouping actually cut down the cost of review?
First, the review process is more coherent and organized. Second, within each set of near-duplicates, Equivio suggests a "pivot" document. This is the document the reviewer should read first. It's the most representative document of the set.
So you can prioritize the initial review by reading just the pivots to cover the entire collection. If the pivot is clearly irrelevant to the matter, you can skip the other documents in the near-duplicate set. After all, they differ by just a few words. This consistency and organization turns into faster review at a lower cost.

17. Is there a risk of missing relevant information by treating groups of documents in bulk?
On the contrary. Equivio's grouping of similar documents reduces the risk of inadvertently missing important data. If the pivot in a group of 50 similar documents is determined to be relevant, we don't need to read the document 50 times in its entirety. Once we've read the pivot, we can invoke the Text Compare utility, which highlights the differences. We might have a version of the document to which just two words were added, and we can immediately focus on those two words. By zooming in on the differences, we pinpoint each item of unique information in each document and, more importantly, reduce the risk of missing crucial data.

18. How much training is needed to use Equivio in conjunction with Catalyst CR?
If you are already familiar with using CR, the training is minimal. One of our project managers can get you up and running in about 15 to 30 minutes.

19. Will running Equivio delay posting new data to a site?
Not necessarily. The data upload can run simultaneously with Equivio processing, so reviewers can begin immediately. Once the overlay file has been extracted, the Equivio results will be available on the site.

20. As a partner, we already own Equivio. Can we do our own Equivio processing?
Yes. As long as an overlay file with the needed fields is provided with your data, it can be uploaded to CR.

21. We will be running Equivio in our site, but we might have additional data added later. Would we need to rerun the whole site when the new data arrives?
No. If you let us know that additional data will be run, we can create and maintain an incremental Equivio database that allows us to deliver rolling data to the site as it comes in.
22. How can we find out the statistics for our case data?
At the completion of each Equivio project, your Catalyst Project Manager will give you an HTML or Excel report that shows the number of near dupes and email threads in relation to the overall document population.
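As a closing illustration, the shingling comparison described in question 12 and the similarity threshold from question 13 can be sketched in a few lines of Python. This is not Equivio's implementation: the tokenizer, the SHA-1 hash, the score (shared shingles divided by all distinct shingles, a Jaccard-style measure), and the function names are assumptions made for the example, and the 0.75 default merely borrows Catalyst's default EquiLevel as a sample number.

```python
import hashlib
import re


def shingle_hashes(text, size=3):
    """Return the set of hashes of every run of `size` consecutive words."""
    words = re.findall(r"\w+", text.lower())
    shingles = (" ".join(words[i:i + size]) for i in range(len(words) - size + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles}


def resemblance(text_a, text_b, size=3):
    """Assumed score: shared shingle hashes divided by all distinct hashes."""
    a = shingle_hashes(text_a, size)
    b = shingle_hashes(text_b, size)
    if not a and not b:
        return 1.0  # two texts too short to shingle are treated as identical
    return len(a & b) / len(a | b)


def is_near_dupe(text_a, text_b, threshold=0.75):
    """Hypothetical threshold check; 0.75 is only a sample value."""
    return resemblance(text_a, text_b) >= threshold


doc_a = "We need to settle this case before the judge rules on their motion."
doc_b = "We need to settle this case before the judge decides their motion."

score = resemblance(doc_a, doc_b)
print(f"{score:.0%}")
print(is_near_dupe(doc_a, doc_b, threshold=0.40))
```

On these two one-sentence documents the sketch reports 50%: seven three-word shingles are shared out of fourteen distinct ones, because a one-word change in so short a text alters several shingles at once. The same edit in a multi-page document would leave the score close to 100%. How Equivio's EquiLevel percentage relates to a score of this kind is not stated in this FAQ.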