Discovery in the Digital Age: e-discovery Technology Overview. Chuck Rothman, P.Eng Wortzman Nickle Professional Corp.




Discovery in the Digital Age: e-discovery Technology Overview Chuck Rothman, P.Eng Wortzman Nickle Professional Corp. The Ontario e-discovery Institute 2013

Contents

1 Technology Overview
  1.1 Introduction
  1.2 Technology Applicable to EDRM Stages
    1.2.1 Identification
    1.2.2 Collection
    1.2.3 Pre-Processing Analysis
    1.2.4 Processing
    1.2.5 Data Analytics
    1.2.6 Review
2 Tools for Standard and Large Cases
  2.1 Introduction
  2.2 Standard Cases
    2.2.1 Self-Collection Tools
    2.2.2 Searching and Analysis Tools
    2.2.3 Review Tools
  2.3 Large Cases
    2.3.1 Data Analytics
3 Scalability
  3.1 What is Scalability
  3.2 Scalability in e-discovery

1 Technology Overview

1.1 Introduction

Discovery, for the purposes of this paper, can be defined as the process in a litigation or regulatory matter where the parties identify and exchange information and records that are relevant to the issues of the matter.

Electronic discovery, or e-discovery as it is more commonly referred to, is the combination of traditional discovery practices with computer technology. It involves identifying and exchanging information and records that may be stored in digital form on computers, smartphones, etc. It also involves the techniques and tools that can be used to identify the relevant information and records much more quickly and efficiently than examining each record manually.

This paper provides a summary of the different types of e-discovery related tools currently available to the legal practitioner, and discusses where and how those tools can best be used.

1.2 Technology Applicable to EDRM Stages

The EDRM [1] is a standard framework describing the various processes involved in e-discovery. It encompasses nine distinct stages, beginning with Information Management (how information and records are stored in the normal course of business) and ending with Presentation (how relevant information is presented at examinations, hearings, trials, etc.).

The EDRM does not specifically recommend the use of any technology, although technology is implied in many of its processes. The following is a list of tools appropriate to the main discovery-related stages of the EDRM.

1.2.1 Identification

Identification involves learning about your client's digital storage systems and the way they create and consume digital information. This knowledge is then overlaid with the issues of the matter in order to pinpoint where the potentially relevant records are located within your client's digital universe.

Very few corporations have standardized, rigid information management controls. As a result, in most organizations, individual custodians are free to store information in many different forms, and in a multitude of locations. This is most prevalent when it comes to email.

[1] Electronic Discovery Reference Model. For more information, see http://www.edrm.net/

Because of variation in record storage from individual to individual and from organization to organization, there is no all-encompassing tool that can be used to identify the type and location of relevant records.

Civil litigation does, however, include the principle of legal hold, a process whereby steps are taken to ensure that potentially relevant information is preserved. Since this is a fairly standard process, tools exist to assist in its implementation.

The legal hold process generally involves notifying custodians who control potentially relevant information of their preservation obligations. Legal hold best practices ensure that the custodians understand and acknowledge receipt of the notice, and receive periodic reminders of their ongoing obligations.

Legal hold software automates some of the process, such as emailing the notice to a specified group of people, keeping track of acknowledgements, and sending reminders. Some tools allow the custodians to complete on-line questionnaires to provide feedback detailing the types of records in their custody and control, and where those records are located.

More sophisticated legal hold tools include technology to lock down information within certain digital repositories, such as email archives, SharePoint sites, and certain file storage systems.

Some of these tools are stand-alone software applications that can either be accessed from a website or installed on a local workstation. Other legal hold tools are incorporated into end-to-end e-discovery applications.

1.2.2 Collection

At some point after potentially relevant information has been identified and preserved, it needs to be collected. This generally involves making a copy of the individual records. Due to the volatile nature of digital information compared to its paper equivalent, best practice dictates that steps should be taken to ensure that the collected records are exact copies of their originals, and that additional information is captured to allow the copied records to be validated and authenticated, if necessary, at a later time.

There are two broad methods of collecting digital information: collect individual records, or collect entire repositories of records.

The first method involves picking and choosing which records to collect. It is useful where time permits a more targeted identification and collection process, and is the preferred technique for the majority of e-discovery collections. This method is not appropriate when deleted digital information is needed, or when computer records related to use of the device (such as websites visited) need to be captured.

Software tools that assist in this collection process range from those included in the standard computer operating systems (such as the copy command), to purpose-built applications that automatically document the selection and collection process, log exceptions (such as records that could not be collected, or collected records that are not exact copies of their originals), and incorporate routines to generate digital signatures for authentication purposes.

Some of these tools are stand-alone computer programs that can be installed on a workstation or server. Others are web-based, and involve the installation of a small computer program on each custodian's computer. Targeted collection features are also included in some corporate-wide computer forensic tools and some end-to-end e-discovery tools.

A recent advancement in collection tools involves software installed directly onto portable storage devices, such as USB thumbdrives or USB hard drives. The software on the device is configured by a collection specialist so that it only collects the desired information. The device is then sent to a custodian, who simply plugs it into their computer. The collection process starts automatically, and once completed, the custodian disconnects the device and sends it back to the collection specialist. This type of technology is quickly gaining acceptance, as it dramatically reduces the cost and disruption involved with some older types of digital collection tools.

The second collection method is generally reserved for situations where time does not permit the selection of individual records (such as during the execution of an Anton Piller order), or where computer forensic information is required (such as deleted records). These tools create forensic images of the digital storage media, capturing all records stored on the media regardless of their potential relevancy. They also capture data stored in areas of the media that are not usually accessible to individual custodians, such as partially overwritten records.

The tools available to carry out forensic imaging collections are generally used by computer forensic professionals. They require some training, and usually need to have the digital device turned off and partially disassembled. The resulting data set will require significant follow-up work to separate out the relevant information, and is thus usually more costly to use.

1.2.3 Pre-Processing Analysis

Regardless of the method used to collect digital records, some non-relevant information will be collected. This is less of an issue when a targeted method is used. Since digital information incorporates metadata (information about the record), this can be used to sort and possibly cull some of the collected information before it is made ready for lawyer review.

Pre-processing analysis is also sometimes called early case assessment or early case analysis, and is abbreviated as ECA. There are a number of ECA tools available, some stand-alone and some incorporated into end-to-end solutions.

ECA tools all contain features that allow various properties of the records to be analysed and grouped together. For example, every digital record has an associated timestamp. For emails, this is usually the date and time the email was sent. For spreadsheets, it may be the date and time that the contents of the file were last edited. In any case, an ECA tool can group together records whose timestamps fall within a specified range. This is useful to filter out irrelevant information if the matter has a well-defined date range.

ECA tools vary in the properties they can analyse. Simple tools rely solely on metadata fields such as timestamps, record types and record sizes. More sophisticated tools allow basic keyword searching, so that the contents of the records can also be analysed. The most sophisticated tools incorporate advanced text analytics to group similar records together, and automatically classify records based on their contents.

1.2.4 Processing

Unlike paper records that are in a single medium, digital information comes in many forms. Each type of digital record requires a specific software application in order to review it. For example, a Word document requires Microsoft Word in order to open it and review its contents.

In order to streamline the lawyer review stage of e-discovery, digital records are processed into a uniform format, so that one software application can be used to review all the records. The processing system produces three distinct parts for each record:

a. The plain text contents of the record, with all formatting removed; this is indexed so that keywords can be searched and advanced text analysis can be performed;
b. Fielded data about the record, including metadata as well as objectively coded information;
c. A visual representation of the record as if it were viewed in its own application or printed to paper.

All end-to-end e-discovery solutions incorporate a processing tool. Processing is also incorporated into some ECA tools. A few stand-alone processing tools also exist.

Some processing tools incorporate more advanced features, such as automated decryption of password-protected records, grouping of email threads, near-duplicate analysis, automatic filtering of operating system files, and transcription of audio recordings.

1.2.5 Data Analytics

Although the features available within ECA tools are becoming more sophisticated, ECA is still primarily used to cull by date range and file type. Once records have been processed and added to a review environment, more sophisticated data analytics, sorting and culling can be performed.
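Culling by date range and file type, as described above, amounts to a filter over record metadata. The following is a minimal sketch only; the Record fields, the sample records and the list of excluded file types are illustrative assumptions and are not taken from any particular ECA tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """Minimal metadata for one collected record (field names are illustrative)."""
    name: str
    file_type: str       # e.g. "msg", "xlsx", "dll"
    timestamp: datetime  # sent date for email, last-edited date for loose files


def cull(records, start, end, excluded_types):
    """Keep records dated within [start, end] whose file type is not excluded."""
    kept = []
    for record in records:
        in_range = start <= record.timestamp <= end
        wanted_type = record.file_type.lower() not in excluded_types
        if in_range and wanted_type:
            kept.append(record)
    return kept


# Hypothetical sample data for a matter limited to the 2012 calendar year.
records = [
    Record("offer.msg", "msg", datetime(2012, 3, 14, 9, 30)),
    Record("budget.xlsx", "xlsx", datetime(2010, 1, 5, 16, 0)),
    Record("kernel32.dll", "dll", datetime(2012, 6, 1, 0, 0)),
]

kept = cull(
    records,
    start=datetime(2012, 1, 1),
    end=datetime(2012, 12, 31, 23, 59, 59),
    excluded_types={"dll", "exe", "sys"},
)
print([r.name for r in kept])  # ['offer.msg']
```

In this sketch the spreadsheet is dropped because it falls outside the date range, and the operating system file is dropped because of its type. A real tool would also log what was culled and why, so that the decision can be defended later.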

All review platforms incorporate some form of data analytics, ranging from simple keyword searching, through advanced multi-phrase keyword analysis and conceptual searching and classification, to human-assisted machine learning.

Simple keyword searching involves searching for a specific word or phrase, similar to the way searches are performed through a web-based search engine such as Google. Some systems allow proximity searching (one phrase within so many words of another phrase), Boolean searching (inclusive and exclusive searches), fuzzy searching (where one or several letters in the responsive word are different from those in the search phrase) and thesaurus searching (searching for other words with the same meaning as the search word).

Advanced keyword searching involves analysing the results of multiple keyword phrases together in order to determine how accurate each search phrase is. Some systems that employ this feature also incorporate sampling, so that a small subset of the search results can be reviewed in order to determine if the searches returned the required information. Sampling the non-responsive records is also available in some systems.

Conceptual analysis involves the computer analysing the text contents of each record and classifying the record based on the frequency and location of various terms within the record. In effect, the computer is able to recognize when different phrases are used to describe the same subject or concept. For example, conceptually searching for "cellular" could return records that contain "mobile", "smartphone", "cellphone" and "blackberry". Conceptual searching can be used to make keyword searches more accurate.

Conceptual analysis can also be used to group together records that contain the same or similar concepts. This can be used to cull records that do not contain any relevant concepts, and is particularly useful to organize records for review, so that all records of a given concept are assigned to one reviewer.
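The difference between the Boolean, proximity and fuzzy search types mentioned above can be illustrated with a few lines of code. This is a rough sketch using only the Python standard library; the function names, the sample sentence and the fuzzy cutoff of 0.8 are assumptions made for illustration, and commercial review platforms rely on full-text indexes rather than scanning each record in this way.

```python
import difflib
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def boolean_match(tokens, must_have=(), must_not_have=()):
    """Boolean search: every inclusive term present, no exclusive term present."""
    words = set(tokens)
    return (all(w in words for w in must_have)
            and not any(w in words for w in must_not_have))


def proximity_match(tokens, first, second, max_gap):
    """Proximity search: `first` occurs within `max_gap` words of `second`."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= max_gap
               for i in first_positions for j in second_positions)


def fuzzy_match(tokens, term, cutoff=0.8):
    """Fuzzy search: some word in the record is a close spelling of `term`."""
    return bool(difflib.get_close_matches(term, tokens, n=1, cutoff=cutoff))


tokens = tokenize("The contract was signed after the payment cleared.")

print(boolean_match(tokens, must_have=["contract"], must_not_have=["invoice"]))  # True
print(proximity_match(tokens, "contract", "payment", max_gap=5))                 # True
print(proximity_match(tokens, "contract", "payment", max_gap=3))                 # False
print(fuzzy_match(tokens, "contrct"))                                            # True
```

The misspelled search term "contrct" still finds the record under fuzzy matching, whereas an exact keyword search for it would not.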
Machine learning is a relatively new technology for e-discovery, although it has been used in other fields for more than twenty years. Essentially, a subject matter expert, typically a senior lawyer, trains a software application to identify relevant records. Once the application is sufficiently trained, it applies its knowledge to all the records in the collection, ranking them based on how relevant the system believes each record to be.

This process is variously called predictive coding, computer assisted review (CAR), and technology assisted review (TAR). A number of advanced review platforms now incorporate this method of analysing and classifying records.

1.2.6 Review

Once records have been processed and analysed, they are generally reviewed by a lawyer or team of lawyers in order to identify the relevant and privileged records, classify any records as confidential, issue code records, and derive knowledge of the matter from the record contents.

For small reviews (generally under a few thousand records), a simple review environment that keeps track of what has been reviewed and allows information, such as relevancy and issue codes, to be associated with records is sufficient.

When the number of records to be reviewed exceeds about 20,000, or the number of coding options is large, management of the review team begins to take on a significant role. Sophisticated review environments incorporate features such as review batch creation, grouping records based on email threads, near duplicates and conceptual clusters, and review progress reporting. Some also contain quality control features to measure the accuracy of each review lawyer and catch errors early in the review process.
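Review batch creation that respects email threads can be sketched as follows. This is an illustrative sketch only; the (record id, thread id) input format and the batch size are assumptions, and real review platforms apply further grouping rules such as near duplicates and conceptual clusters.

```python
from collections import defaultdict


def build_batches(records, batch_size):
    """Group records into review batches without splitting an email thread.

    `records` is a list of (record_id, thread_id) pairs; a record that is
    not part of a thread uses its own id as the thread id. A thread larger
    than `batch_size` is placed in a batch of its own.
    """
    threads = defaultdict(list)
    for record_id, thread_id in records:
        threads[thread_id].append(record_id)

    batches, current = [], []
    for members in threads.values():
        # Close the current batch if adding this thread would overfill it.
        if current and len(current) + len(members) > batch_size:
            batches.append(current)
            current = []
        current.extend(members)
    if current:
        batches.append(current)
    return batches


# Hypothetical records: two email threads and one loose document.
records = [
    ("e1", "t1"), ("e2", "t1"),
    ("e3", "t2"), ("e4", "t2"), ("e5", "t2"),
    ("d1", "d1"),
]
print(build_batches(records, batch_size=3))
# [['e1', 'e2'], ['e3', 'e4', 'e5'], ['d1']]
```

Keeping a thread inside a single batch means one reviewer sees the whole conversation, which tends to produce more consistent coding than spreading the same thread across several reviewers.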

2 Tools for Standard and Large Cases

2.1 Introduction

e-discovery tools come in all sizes. Choosing the right tool to fit the requirements of a specific matter can be difficult. This is especially true of the standard case, which may have a small technology budget, even if the e-discovery needs are large.

Unless your practice is sufficiently mixed with both large and standard cases, where a full complement of litigation support tools is probably available for any matter, being able to select the right tool for the job is paramount to keeping the budget under control.

2.2 Standard Cases

Regardless of the potential number of records that need to be analysed in a matter, if the discovery budget is small, process becomes paramount. In this respect, there are three EDRM phases where technology can assist in minimizing costs: Collection, Analysis and Review.

2.2.1 Self-Collection Tools

In standard cases, your client usually identifies and collects their own records. They generally have limited IT resources, and usually don't possess the technical know-how to appropriately preserve and collect electronic records in a form that aids in their review. In many cases, emails are printed out, or are forwarded to counsel. Neither of these methods allows the email metadata to be used to sort and analyse the records, and printing emails precludes simple keyword searching unless additional expense is incurred to scan and OCR the printed pages.

Fortunately, there are several technology solutions designed specifically to allow custodians to self-collect electronic records.

2.2.1.1 Cloud Collection Solutions

Cloud-based collection services are designed to allow custodians to identify responsive records, which are then forensically uploaded to a server where the lawyer can access and review them. At present, all of these services are offered by U.S.-based vendors, and the data is stored on servers located, for the most part, in the U.S.

Most of the cloud-based services are designed on a pay-as-you-go basis, and are generally priced based on the volume that is uploaded (per record or per gigabyte [2]).

The lawyer would set up the case and have the system email a link to each custodian. The custodian would receive an email with the link, click on the link, and be led through instructions to select the records to upload. These records could be emails and loose files stored anywhere that the custodian has access to from their computer (i.e. stored on their local computer, a local server, cloud-based email such as Gmail, or a cloud-based storage system such as Dropbox). Once the records are selected, an exact copy of each record is automatically uploaded to the secure server.

Once the custodian has finished collecting records, the lawyer can log into the server to search and organize the information. A web-based review environment is presented to facilitate quick review of the information. Records that need to be produced are flagged, and once the review is finished, the production set is sent to the lawyer as Adobe PDF documents (or alternatively as TIFF images). These can be copied to a DVD for delivery to the opposing party, or can be printed out.

[2] A gigabyte, abbreviated as GB, is a measure of volume of digital data, and is equal to one billion bytes. The number of records equivalent to a gigabyte of data varies; a one GB mailbox is about 10,000 emails, whereas one GB of Excel spreadsheets could equal anywhere from a few hundred to a few thousand separate files.

2.2.1.2 USB Based Collection Tools

If the custodian does not have access to a cloud-based server, or the information can't be stored on a cloud-based server, another inexpensive, simple solution exists. Several vendors offer automated collection software that can be installed onto a USB thumbdrive or a USB hard drive. Once the software is installed and the collection criteria are specified, the USB drive is sent to the custodian.

When received, the custodian plugs the drive into their computer. The software on the drive starts automatically, and collects the appropriate electronic records. When finished, the custodian unplugs the drive and sends it back to the lawyer.

Although this method is more secure than a cloud-based solution, it usually results in excess information being collected, which needs to be filtered later on.

2.2.1.3 Manual Collection of Electronic Records

Sophisticated collection tools avoid some of the manual steps required to ensure that collected electronic records are unaltered and can be authenticated. However, depending on the information that needs to be collected, standard software present on most computers can be used, combined with some manual housekeeping steps. A couple of examples:

Email stored in Microsoft Outlook can be copied to a separate Microsoft Outlook PST file using the Outlook software. Once copied, the PST file can be provided to the lawyer with all email metadata remaining intact.

Loose files can be put into a ZIP archive file using one of many ZIP software programs. Once the files have been archived, the ZIP file can be provided to the lawyer, with most of the file metadata intact.

When manually collecting electronic records, some information should be recorded, such as when the information was collected and the selection criteria or method used.

2.2.2 Searching and Analysis Tools

When the collected records are provided to the lawyer as discrete files (as opposed to using a cloud-based collection service), the information needs to be analysed to remove the clearly irrelevant data before any eyes-on review takes place. When the discovery budget is tight, the analysis options are more limited, but some technology is available to streamline this process.

Inexpensive off-the-shelf text indexing tools exist that can examine each record and extract the text contents for keyword searching. Similarly, file comparison software can be used to identify exact duplicates, and to quickly sort and filter records by file type, so that records that would not contain relevant information, such as operating system files, can be quickly identified and removed from the collection.

2.2.3 Review Tools

Unless the records were collected using a cloud-based service that includes a review environment, the individual emails and files will need to be reviewed. If the emails were provided as an Outlook PST, the PST can be opened in Outlook. However, this does not easily permit tagging and classification, and could increase the review costs due to additional housekeeping overhead.

A simple review solution for small volumes of records is to use a program to convert files and emails into Adobe PDF format. These PDF files can then be loaded into Adobe Acrobat and reviewed. The full version of Acrobat (as opposed to the free Reader version) includes classification and redaction tools. Once the records have been reviewed, the relevant PDF files can be copied to a DVD or printed out for production.

Some e-discovery software vendors offer light or scaled-down versions of their review environments for use with small volumes of records. These are generally programs that are installed on a single computer. Some of these include features to ingest individual emails and files. This type of solution is generally appropriate where several thousand records are involved; anything less can usually be handled by appropriate manual processes and Adobe Acrobat.

2.3 Large Cases

Cases involving hundreds of thousands or millions of records have their own unique challenges. It is impossible, or at least cost-prohibitive, to manually review millions of records. Fortunately, several technology solutions have been designed with million-plus records in mind.

The two most important aspects of e-discovery that come into play with large document collections are analysis (to filter out the low-hanging fruit and reduce the collection to a more manageable size) and review management (to ensure that the eyes-on review is carried out as efficiently and accurately as possible).

2.3.1 Data Analytics

Although there are several different data analytic techniques that are applicable to large document collections, the two most effective are conceptual clustering and machine learning. Most of the top-end review platforms include both of these features. However, each implementation is different. When examining review platforms, it is important that someone on your decision team understands the underlying technology being used.

For example, conceptual clustering can be performed using, among other things, taxonomies and ontologies, or latent semantic indexing (LSI). Taxonomies and ontologies are language specific, and would be ineffective when the records are in a language other than the one anticipated.

All implementations of machine learning involve iterative steps where the computer is trained. If the trainer does not know the subject matter well, the results will be inaccurate. However, in addition to the trainer's knowledge, the user interface plays a large role in how well the system is used.

To illustrate, one particular implementation of machine learning provides a report, after each iteration, informing the trainer of how closely the computer's predictions match the trainer's coding. When the difference between iterations becomes insignificant, the user interface tells the trainer that the system has been trained. However, unless the trainer reviews a different report, they will not know how well the system has been trained. It is possible with this system to have it predict correctly less than 50% of the time and still report that the system is trained.

A better implementation would, behind the scenes, analyse more of the results before reporting a successful training process, rather than leave it up to the trainer to figure out.
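The gap between "the results have stopped changing" and "the results are good" can be made concrete with a short sketch. The code below is a hypothetical illustration, not a description of any vendor's product: the function names, the 0.01 tolerance and the 0.8 minimum agreement rate are all assumptions chosen for the example.

```python
def agreement(predicted, coded):
    """Fraction of records where the prediction equals the trainer's coding."""
    matches = sum(1 for p, c in zip(predicted, coded) if p == c)
    return matches / len(coded)


def training_status(agreement_history, tolerance=0.01, minimum_agreement=0.8):
    """Return (stabilised, acceptable) for per-iteration agreement rates.

    stabilised: the agreement rate changed by less than `tolerance` between
                the last two iterations (the only test the system described
                above applies before reporting that it is trained).
    acceptable: stabilised AND the latest agreement rate is at least
                `minimum_agreement` (the additional check argued for above).
    """
    if len(agreement_history) < 2:
        return False, False
    change = abs(agreement_history[-1] - agreement_history[-2])
    stabilised = change < tolerance
    acceptable = stabilised and agreement_history[-1] >= minimum_agreement
    return stabilised, acceptable


# Hypothetical agreement rates after four training iterations.
print(training_status([0.31, 0.42, 0.47, 0.475]))  # (True, False)
print(training_status([0.60, 0.80, 0.90, 0.905]))  # (True, True)
```

In the first history the agreement rate has levelled off below 50%, so a system that tests only for stability would report successful training even though its predictions are wrong more often than they are right.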

3 Scalability

3.1 What is Scalability

Scalability is defined as the ability to adapt to increased demands. Technologies that employ scalability usually succeed, while those that don't generally never make it off the ground. Google and Facebook, for instance, have grown into practical institutions thanks in large part to their ability to deliver the same user experience and performance no matter how much information is poured into each system.

3.2 Scalability in e-discovery

In e-discovery, scalability is usually addressed in the context of rising data volumes. The technology being used to collect, process and analyze that data must be scalable in order to handle the volume of digital information at each stage of a matter.

Achieving scalability is easier said than done. In many cases, technological requirements grow exponentially with the amount of data input into a given system.

The concept of scalability in e-discovery can best be explained through an example:

A company is being sued for poor customer service. They carry out an initial assessment of what information is needed to defend the matter, and collect 25 GB of data from their customer service department. This is put into a standalone e-discovery application running on a law clerk's computer.

Once the lawyer begins to review the information, it becomes very clear that additional information, from people in other company departments, will need to be collected. This results in an additional 100 GB of data. This amount of data exceeds the capacity of the standalone e-discovery software. Fortunately, the law firm also employs a server-based e-discovery solution for large matters. The additional data is loaded into this system, along with the initially collected 25 GB, together with all lawyer coding of the initial data up to that point. Due to the increased volume, several lawyers are tasked with the review.

It is soon discovered that the company used to outsource their customer service function, and the timeframe of the lawsuit would include this outsourced data. They ultimately collect another 400 GB of data. The capacity of the in-house e-discovery system is stretched beyond its limits, and a decision is made to host all of the data (525 GB) with a third-party vendor using a cloud-based e-discovery system that can import all of the existing lawyer work as well as the client data.

In this example, as the volume of data increased, the technology required to efficiently handle the data was augmented or changed. At each stage, the work already carried out could be transferred to the upgraded technology, avoiding having to start over.
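The volumes in this example can be tallied in a few lines. The sketch below adds up the three collections and converts each running total into a rough email count using the approximation from the footnote on gigabytes (about 10,000 emails per GB of mailbox data). That ratio applies to mailboxes only, so the counts are order-of-magnitude illustrations rather than estimates for a real matter, and the function name is an assumption.

```python
EMAILS_PER_GB = 10_000  # rough mailbox ratio from the footnote; real ratios vary


def cumulative_volumes(additions_gb):
    """Return the running total, in GB, after each successive collection."""
    totals, running = [], 0
    for added in additions_gb:
        running += added
        totals.append(running)
    return totals


# Collections from the example: customer service, other departments, outsourcer.
additions = [25, 100, 400]
platforms = ["standalone application", "server-based system", "hosted cloud system"]

for platform, total in zip(platforms, cumulative_volumes(additions)):
    print(f"{platform}: {total} GB, roughly {total * EMAILS_PER_GB:,} emails")
# standalone application: 25 GB, roughly 250,000 emails
# server-based system: 125 GB, roughly 1,250,000 emails
# hosted cloud system: 525 GB, roughly 5,250,000 emails
```

The matter grows from 25 GB to 525 GB, a twenty-one-fold increase, which is why the platform that was adequate at the outset could not be expected to carry the matter to completion.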

Scalability is a very important factor to consider when choosing e-discovery software. The software must be able to adapt to sudden increases in data volume without losing existing work product, while still producing the same accurate results in a given period of time. Scalability in e-discovery software allows law firms and companies with in-house legal departments to attend to small legal cases with the same quality and accuracy as they would larger legal cases. This characteristic of e-discovery software should always be considered before making any commitments to vendors and providers of such services.