Toolbox 4.3
February 2015

Contents
Introduction
Requirements for Toolbox 4.3
  Toolbox Applications
  Installing on Multiple Computers
  Concurrent Loading, Importing, Processing
Client
  Software
  Hardware
Server and Administrator Computers
  Software
  Hardware - distributed environments
  Hardware - standalone machines
Source Data Location Requirements
  Multiple Loader Instances
  Importer
Infrastructure Options
  Scenario 1 - Toolbox on the Go
  Scenario 2 - Traditional Toolbox
  Scenario 3 - Concurrent Toolbox Ingestion
  Scenario 4 - Distributed Toolbox
Optical Character Recognition
Virus Protection

Commercial In Confidence
Introduction

Toolbox provides a solution for litigation, dispute resolution and investigations. This document summarises the system requirements for Toolbox applications, including Reviewer, Analyst and supporting components.

Toolbox runs on Microsoft Windows and Microsoft .NET technologies. It should be installed on dedicated computers, not on hardware shared with large enterprise applications or production systems.

The configuration of your environment will depend on the products you decide to deploy, the extent to which you want to make them available inside and outside your organisation, and the document volumes to be processed and hosted. As a general rule of thumb, obtain the highest specification (RAM, disk space, CPU, etc.) your budget will accommodate. Other environmental factors, such as network utilisation and Internet connection speed, also influence performance and should be configured to allow the highest possible data throughput and concurrent user access. Your network administrator will be the best source of guidance on these network issues.

The information in this document should be read as a guideline only and as a starting point for consideration by your network administrators and technology advisors. Please contact support@discoveredt.com with any queries, or to arrange for one of our technical team to contact you to discuss your specific requirements and options.
Requirements for Toolbox 4.3

Toolbox uses Microsoft SQL Server (2008 R2 or 2012) as a central store. The data for each case is stored in its own SQL Server database, and the Analyst and Reviewer web applications both use this shared data store. Toolbox does not store copies of all documents inside its databases, so disk storage for the original source data is also necessary. See the Source Data Location Requirements section below for more information.

Toolbox Applications

The applications included in Toolbox are summarised below.

Analyst (target users: Analysts etc.; app type: Web)
An early case assessment tool that provides quick access to data profile graphs, cost and time estimates, and tools to investigate and implement various data culling strategies.

Reviewer (target users: Legal Reviewers; app type: Web)
An online, web-based review platform that can also initiate productions for exchange purposes.

Loader (target users: Administrators; app type: Windows)
Extracts metadata and text from source data in its native form and adds it to case databases. Loader is a Windows application.

Importer (target users: Administrators; app type: Windows)
Imports structured load files into the database. Imported documents appear in Reviewer; they do not appear in Analyst.

QA Manager (target users: Administrators; app type: Windows)
A tool for fixing and managing documents that caused exceptions during loading, such as corrupt or password-protected documents. Fixed documents can then be processed, completing the ingestion process begun by Loader.

Agent Service (target users: none; app type: Windows Service*)
Performs most of the ongoing processing of documents after the initial load, such as creating PDFs, TIFFs and disclosure packages. The Agent should be running at all times in case any work is queued for it.

Toolbox Server (target users: none; app type: Windows Service*)
Performs tasks that require elevated access, such as creating new case databases.

Web Manager (target users: none; app type: Windows Service*)
Monitors and cleans up temporary and cached files on the IIS (web) server.

* Windows Services are applications that typically start when the operating system boots and run in the background as long as Windows is running. They are designed to run without direct user interaction and can be managed using Services in the Windows Control Panel.

Installing on Multiple Computers

While all Toolbox applications can run on a single computer, in production environments, and especially in internet-facing situations, the applications are typically installed on multiple machines. For example, in internet-facing situations the Web Application (IIS) server and the SQL Database Server are often installed on different machines so that firewalls and other security solutions can be established.

Toolbox has a flexible architecture that allows the applications to be installed in different configurations, subject to the restrictions below.

Configuration Restrictions:
- The Loader and Agent must have access to both the source data and the Common File Store (CFS) through the same drive mapping. For example, if the source data was loaded from mapped drive X, the Agent must be able to access the data through the same mapped drive X. Because the Agent is a Windows service (as of Toolbox 4.3), setting up mapped drives may require some assistance; contact EDT support for more information.
- The Loader and Agent require write access to the Common File Store.
- Only one copy of the Agent should be installed and running. It can be installed on the SQL server, the web server or an applications server, or on a dedicated server to reduce load on another machine.
- It is highly recommended that the computer(s) running the Loader and the Agent are dedicated to that task, that is, they are not also used for non-Toolbox functions.
- The Toolbox Server must be installed on the computer running SQL Server. On the SQL server, for optimal performance, the databases, transaction logs, Common File Store and tempdb should be on separate volumes, correctly configured across dedicated spindles.
- The Web Manager must be installed on the IIS (web) server. If there is more than one web server, Web Manager must be installed on each of them. Web Manager is not installed on the SQL server.
- The Importer can run on any machine, or on multiple machines simultaneously. However, if your site does a lot of loading or importing, consider installing the Importer on a machine separate from the Loader/Agent so that you can load/process and import simultaneously without being constrained by the resources of a single computer.

Concurrent Loading, Importing, Processing

A Toolbox installation currently supports:
- One Agent per site. The Agent runs continuously, waiting for work to perform. Support for multiple Agents working concurrently on a case is planned for a future release of Toolbox.
- Multiple Loaders. The Loader can be installed on multiple computers to load data into Toolbox, and multiple Loaders can process and load data into the same case simultaneously. Running multiple copies of the Loader on a single machine is not supported.
- Multiple Importers. Multiple Importers can be installed on separate computers to import load files into Toolbox, and multiple Importers can import into the same case simultaneously.
The Agent, multiple Loaders and multiple Importers can all run at the same time. However, depending on the system configuration, concurrent loading and importing with an active Agent may put significant load on SQL Server, which can affect the performance of the Analyst and Reviewer web applications. If this happens, we recommend carrying out loading and importing at times when the web applications are not heavily used.
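The concurrency rules above can be expressed as a simple check on a planned deployment. The following is a minimal Python sketch, assuming a hypothetical plan format of (machine, component) pairs for a single site; the function name `check_deployment` and the machine names are illustrative and are not part of Toolbox. It flags a site that does not have exactly one Agent, and any machine planned to run more than one Loader; Importers are not limited by this check.

```python
from collections import Counter


def check_deployment(plan):
    """Check a planned single-site deployment against the Toolbox 4.3
    concurrency rules. `plan` is a list of (machine, component) pairs;
    returns a list of problems (empty if none were found)."""
    problems = []

    # Rule: one Agent per site.
    agent_count = sum(1 for _, component in plan if component == "Agent")
    if agent_count != 1:
        problems.append(f"expected exactly one Agent, found {agent_count}")

    # Rule: multiple Loaders are allowed, but only one per machine.
    loaders_per_machine = Counter(
        machine for machine, component in plan if component == "Loader"
    )
    for machine, count in sorted(loaders_per_machine.items()):
        if count > 1:
            problems.append(
                f"{machine} runs {count} Loaders; only one per machine is supported"
            )

    # Importers are not restricted by this check.
    return problems


# Hypothetical plan: the Agent and one Loader on APP01, a second Loader
# and an Importer on LOAD02.
plan = [
    ("APP01", "Agent"),
    ("APP01", "Loader"),
    ("LOAD02", "Loader"),
    ("LOAD02", "Importer"),
]
print(check_deployment(plan))
```

The plan above satisfies the rules, so the check returns an empty list; adding a second `("APP01", "Loader")` entry would produce a per-machine Loader warning.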
Client

Software

End-user computers accessing the Analyst or Reviewer web applications require the following software (some items are optional):
- Windows 7 or later
- Microsoft Internet Explorer 9 or 10, with JavaScript enabled
- Adobe Acrobat Reader 8+
- Avantstar Quick View Plus 11 or 12 (Professional); the 32-bit version is recommended

Hardware

No specific hardware is required apart from that necessary to run the software listed above.
Server and Administrator Computers

Software

The software required for the server and administrator applications depends on the role of each computer. The roles are listed below; the components in parentheses show an example configuration of which Toolbox applications run on which server.
- Web Application (IIS) Server (Toolbox Web Manager, Analyst and Reviewer web apps)
- SQL Database Server (Toolbox Server)
- Loader
- Agent Service
- Importer
- QA Manager

Software (required or optional, depending on the role):
- Operating system: Windows Server 2008 SP2+ (64-bit) or Windows 7 Professional (see note 1)
- IIS 7+ (Internet Information Services)
- SQL Server 2008 R2 or SQL Server 2012 (see notes 2 and 3)
- Ghostscript 9.05+
- PDF printer, one of the following: Adobe Acrobat 8, 9 or X Professional; Bullzip PDF Printer; biopdf PDF Writer
- IBM Lotus Notes Client 6.5 (Standalone) or 8.0 (Standalone, Messaging) (see note 4)
- Avantstar Quick View Plus 12 (Professional Edition, Software Bundle)
- Microsoft Access Database Engine 2010 (64-bit)
- Microsoft .NET Framework 4.5

Notes:
1. Windows 7 is suitable for standalone installations or when running a trial of the software; 64-bit is preferred.
2. The Standard Edition of SQL Server is typically sufficient for running Toolbox. However, Standard Edition has some limitations, such as using a maximum of 4 CPUs (in SQL Server 2008 R2) and 64 GB of memory, which should be considered when choosing an edition.
3. Proximity keyword searching is only supported with SQL Server 2012.
4. The Lotus Notes client is required to load .nsf files.
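Note 2 matters when sizing the SQL Server: the Larger and Very Large hardware guidelines in this document specify 128 GB and 512 GB of RAM, more than the 64 GB that Standard Edition can use. The minimal Python sketch below compares a planned server with the Standard Edition limits stated in note 2. The function name `standard_edition_warnings` is hypothetical, and the sketch assumes that "CPUs" means physical processor sockets, as in Microsoft's edition limits for SQL Server 2008 R2; check Microsoft's documentation for the edition and version you plan to use.

```python
# Standard Edition limits as stated in note 2 (SQL Server 2008 R2).
STANDARD_MAX_CPUS = 4
STANDARD_MAX_MEMORY_GB = 64


def standard_edition_warnings(cpu_sockets, memory_gb):
    """Return a warning for each planned resource that SQL Server Standard
    Edition could not fully use. Assumes CPUs are counted as sockets."""
    warnings = []
    if cpu_sockets > STANDARD_MAX_CPUS:
        warnings.append(
            f"{cpu_sockets} CPUs planned; Standard Edition uses at most {STANDARD_MAX_CPUS}"
        )
    if memory_gb > STANDARD_MAX_MEMORY_GB:
        warnings.append(
            f"{memory_gb} GB of memory planned; Standard Edition uses at most "
            f"{STANDARD_MAX_MEMORY_GB} GB"
        )
    return warnings


# Hypothetical two-socket server with the Larger Matters memory guideline.
print(standard_edition_warnings(cpu_sockets=2, memory_gb=128))
```

A non-empty result does not mean the server cannot run Toolbox; it indicates that a higher edition would be needed to use all of the planned hardware.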
Hardware - distributed environments

This information is provided as a guideline only, as there are many different ways the solution can be implemented to serve different client environments and requirements. Detailed discussions should take place with an EDT technical consultant before infrastructure procurement and implementation to ensure capacity will meet client needs.

Moderately Sized Matters
Assumptions: 1-20 users (reviewers); ingestion of 100 GB per day; expected case size 500 GB (~5,000,000 docs); collective size of cases 1 TB; production of 20,000 docs per day.
Hardware:
- Agent, Loader and Importer server(s): 4 x 3+ GHz cores, 16 GB RAM
- Web server(s): 4 x 2.4+ GHz cores, 8 GB RAM
- SQL Server(s): 8 x 2.4+ GHz cores, 32 GB RAM
- Storage: 3+ TB, with separate spindles for source data, exports and databases

Larger Matters
Assumptions: 20-75 users (reviewers); ingestion of 500 GB per day; expected case size 1 TB (~10,000,000 docs); collective size of cases 5 TB; production of 100,000 docs per day.
Hardware:
- Agent, Loader and Importer server(s): 8 x 3+ GHz cores, 64 GB RAM
- Web server(s): 4 x 2.4+ GHz cores, 8 GB RAM, x2 machines, load balanced
- SQL Server(s): 24 x 2.4+ GHz cores, 128 GB RAM
- Storage: 15+ TB, 6 Gbit/s SAS connection with separate spindles for source data, exports, the Common File Store, databases, tempdb and transaction logs

Very Large Matters
Assumptions: 75+ users (reviewers); ingestion of up to 1 TB per day (metadata only, filtering on load, and multiple case loading); expected case size 4 TB (~40,000,000 docs); collective size of cases 55 TB; production of 350,000 docs per day.
Hardware:
- Agent, Loader and Importer server(s): 8 x 3+ GHz cores, 64 GB RAM, x4 machines loading into separate cases
- Web server(s): 8 x 2.4+ GHz cores, 16 GB RAM, x4 machines, load balanced
- SQL Server(s): 64 x 2.4+ GHz cores, 512 GB RAM
- Storage: 165+ TB, 6 Gbit/s SAS connection with separate spindles for source data, exports, the Common File Store, databases, tempdb and transaction logs

General hardware recommendations:
- Obtain the fastest CPUs available within your budget; some case-level database processes are CPU-intensive.
- Additional memory on the SQL Server will improve the execution time of case-level operations. More memory allows SQL Server to cache more database content, thereby increasing performance.
- Follow other SQL Server best practices, for example distributing database-related files and logs across multiple LUNs, spindles or RAID arrays.

General virtualisation recommendations:
- Dedicate sockets, memory and separate hard disks to each virtual machine; that is, do not share or over-allocate physical resources.
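To pick a starting point among the three guideline tiers above, the planned workload can be compared with each tier's assumptions. The minimal Python sketch below returns the smallest tier whose assumptions cover the workload, or `None` if the workload exceeds all of them, in which case the sizing should be discussed with an EDT technical consultant. The names `Tier` and `guideline_tier` are hypothetical; the sketch treats 1 TB as 1,000 GB, treats the 75+ user tier as having no upper bound, and ignores the collective size of cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_users: Optional[int]  # None means no stated upper bound (75+)
    max_ingestion_gb_per_day: int
    max_case_size_gb: int
    max_production_docs_per_day: int


# Upper bounds taken from the tier assumptions (1 TB treated as 1,000 GB).
TIERS = [
    Tier("Moderately Sized Matters", 20, 100, 500, 20_000),
    Tier("Larger Matters", 75, 500, 1_000, 100_000),
    Tier("Very Large Matters", None, 1_000, 4_000, 350_000),
]


def guideline_tier(users, ingestion_gb_per_day, case_size_gb, production_docs_per_day):
    """Return the name of the smallest tier whose assumptions cover the
    planned workload, or None if the workload exceeds every tier."""
    for tier in TIERS:
        if tier.max_users is not None and users > tier.max_users:
            continue
        if (
            ingestion_gb_per_day <= tier.max_ingestion_gb_per_day
            and case_size_gb <= tier.max_case_size_gb
            and production_docs_per_day <= tier.max_production_docs_per_day
        ):
            return tier.name
    return None


# Hypothetical workload: 50 reviewers, 200 GB/day, 800 GB case, 50,000 docs/day.
print(guideline_tier(50, 200, 800, 50_000))
```

Checking every dimension against the same tier means a single demanding figure (for example, a high daily ingestion volume) moves the whole recommendation up a tier, which matches the intent of sizing for the most demanding requirement.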
General storage recommendations:
- Use fast drives, such as 15K RPM SAS and solid state disk (SSD) drives, with RAID configurations where appropriate, to maximise disk I/O performance.
- As a rule of thumb, the storage space required for each case is three times the original source data size. This storage should initially be distributed among the source data, the Common File Store and the database server. Additional storage is required by the web server for file caching and by the Agent for the export destination.
- Segregate data storage repositories by physical hard drive spindles. The repositories include the source data for each case, the Common File Store data, the export destination and the databases. Segregating the data reduces competition for storage resources.
- Additional storage may be required for external Optical Character Recognition (OCR) applications.

Hardware - standalone machines

This information is provided as a guideline only. Detailed discussions should take place with an EDT technical consultant before infrastructure procurement and implementation to ensure capacity will meet client needs.

Entry-level laptop for small or off-site jobs, demos, or several reviewers. Example hardware:
- Intel Core i5 2410M 2.3 GHz (2.9 GHz Turbo), 16 GB RAM, 120 GB solid state drive
- Intel quad-core i7 3.5 GHz CPU, 16 GB DDR3 RAM, 1 TB SATA3 hard drive (3 separate hard drives are preferable to spread SQL load)

More powerful workstation for a single-server licence running multiple cases and larger review teams. Example hardware:
- 2 x six-core CPUs and 32 GB RAM
- 2 x 128 GB Enterprise SSD, 2 x 256 GB Enterprise SSD, 2 x 512 GB Enterprise SSD
- 2 x HP 1.2 TB SAS 10k hard drives
- 1 x RDX backup drive (external, USB3)
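The three-times rule of thumb in the general storage recommendations is simple arithmetic, and it agrees with the distributed-environment guidelines: the storage figures for the three tiers (3+, 15+ and 165+ TB) are three times the collective case sizes (1, 5 and 55 TB). The minimal Python sketch below applies the rule; the function name `estimated_storage_tb` is hypothetical. The estimate covers the source data, the Common File Store and the database, and excludes the web server's file cache, the Agent's export destination and any OCR storage.

```python
# Rule of thumb from the general storage recommendations: each case needs
# roughly three times its original source data size, spread across the
# source data, the Common File Store and the database server.
STORAGE_MULTIPLIER = 3


def estimated_storage_tb(source_data_tb):
    """Estimate storage for the given source data volume. Excludes the web
    server's file cache, the Agent's export destination and any OCR storage."""
    return STORAGE_MULTIPLIER * source_data_tb


# Collective case sizes from the three distributed-environment tiers.
for source_tb in (1, 5, 55):
    print(f"{source_tb} TB of source data -> about {estimated_storage_tb(source_tb)} TB")
```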
Source Data Location Requirements

When the Loader is used to load source data into Toolbox, it initially loads only the document metadata and (optionally) the text of the documents. At this point, the database does not contain native files or images. As documents are moved into Reviewer, Disclosure Packages are created and so on, native files and images are retrieved or generated and stored in the database as needed. Toolbox applications (the Agent and the Toolbox Server) therefore require ongoing access to the source data in its original location (the location it was in at the point of loading) to carry out further on-demand processing, production and file extraction.

Multiple Loader Instances

As described in the Installing on Multiple Computers section, the Loader and Agent can be installed on separate machines. In addition, to increase loading performance, the Loader can be installed on multiple machines for loading into a single case or multiple cases. The source data and Common File Store drive mappings on each Loader machine must be identical to the drive mappings on the machine running the Agent.

Importer

If the Importer is used to import a load file, all of the relevant content in the load file is copied into the case database, so the load file does not need to remain on the system.
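Because a mismatched drive mapping on a Loader machine is easy to miss, it can help to compare mappings with the Agent machine before loading. The minimal Python sketch below assumes that each machine's mappings have already been collected into a dictionary of drive letter to network path (for example, from the output of `net use` on each machine); the function names, drive letters and server paths are hypothetical. Paths are compared case-insensitively and without trailing backslashes, since Windows paths are case-insensitive.

```python
def _normalise(path):
    """Normalise a Windows network path for comparison (case-insensitive,
    ignoring trailing backslashes). Returns None for an unmapped drive."""
    if path is None:
        return None
    return path.rstrip("\\").casefold()


def mapping_mismatches(agent_mappings, loader_mappings, required_drives):
    """Return (drive, agent_path, loader_path) for each required drive that
    is unmapped on the Agent machine, or that maps to a different path on
    the Loader machine."""
    mismatches = []
    for drive in required_drives:
        agent_path = agent_mappings.get(drive)
        loader_path = loader_mappings.get(drive)
        if agent_path is None or _normalise(agent_path) != _normalise(loader_path):
            mismatches.append((drive, agent_path, loader_path))
    return mismatches


# Hypothetical mappings: X: holds the source data, Y: the Common File Store.
agent = {"X:": r"\\fileserver\source", "Y:": r"\\fileserver\cfs"}
loader = {"X:": r"\\FileServer\Source", "Y:": r"\\otherserver\cfs"}

for drive, agent_path, loader_path in mapping_mismatches(agent, loader, ["X:", "Y:"]):
    print(f"{drive} is {agent_path} on the Agent but {loader_path} on the Loader")
```

In this example X: matches despite the different capitalisation, while Y: is reported because the Loader's Common File Store mapping points to a different server.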
Infrastructure Options

Toolbox's flexibility can accommodate almost any infrastructure requirement, from an on-site single instance to a multi-client distributed processing and review service. Our multithreaded, concurrent architecture allows infrastructure to scale with demand, which is essential when growing an eDiscovery capacity, fully leveraging cloud-based solutions, or supporting Infrastructure as a Service and Software as a Service models.

Scenario 1 - Toolbox on the Go

All applications and services are installed on a single machine.

Scenario 2 - Traditional Toolbox

Traditional segregation of the Toolbox applications, web server and SQL database on a single site.
Scenario 3 - Concurrent Toolbox Ingestion

An expansion of Scenario 2 that runs Loader and Importer applications concurrently, on a single machine and across machines, to ingest data more quickly.

Scenario 4 - Distributed Toolbox
A Toolbox Site is a way to provide multiple tenants with isolated Toolbox instances. Each site has its own web user accounts and its own Analyst and Reviewer applications. When Toolbox is installed for the first time, a single Toolbox Site is created. The Toolbox installation programs do not currently support creating multiple sites; these must be created manually by EDT support engineers. Multiple sites are typically only used by service providers who want to set up a site for each customer.

Advanced models of Toolbox allow multiple sites to be connected to a single web server and database server. Each site can carry different branding and can be configured to serve a single client. Each site is invisible to other sites due to a Virtual Wall. The separation of sites enables concurrent Loader and Importer applications to serve a single case, as well as concurrent rendering and exporting across multiple sites.

The Distributed Toolbox offers Toolbox as a scalable service in which sites are established per client and additional processing power can be allocated when required. This architecture is particularly suitable for SaaS or cloud-based delivery. It allows multiple clients to be served concurrently from the same infrastructure and delivers efficiency, scalability and performance by deploying processing power dynamically, as and when it is required, across multiple cases and sites.

Optical Character Recognition

Toolbox supports an OCR workflow using third-party products. We recommend ABBYY Recognition Server, running on a dedicated workstation on the same network as Toolbox.

Virus Protection

We also recommend that all data to be loaded into Toolbox is first checked for viruses and other malware. This is often done on a dedicated workstation before the data is loaded.