Open Text Archive Server and Microsoft Windows Azure Storage


Whitepaper
Open Text, December 23rd, 2009

Contents

Executive Summary / Introduction
  Overview
About the Open Text Archive Server
  Architecture
  Scalability and Distribution
Features of the Open Text Archive Server
  Single Instance Archiving
  Compression
  Encryption of the stored data
  Secure Data Transport
  Data transport secured with checksums
  Retention Handling
Storage Management
  Logical archives
  Hardware abstraction
  Supported storage media
Backup, Replication, High Availability and Disaster Recovery
  Backup
  Disaster recovery
  Remote standby
  High Availability
About Microsoft Windows Azure Storage
Archive Server integration with Azure Storage
Business Case
  How can Open Text customers profit from Microsoft Azure Storage?
  What are the benefits for the customer?
Performance Measurements
  Test scenarios
  Test environment
    Host system
    Virtual test clients
    Archive Server
    Network connection
Performance Results
  Load on the Archive Server
  Iteration with 10 kb documents
  Iteration with 20 kb documents
  Iteration with 50 kb documents
  Iteration with 100 kb documents
  Iteration with 200 kb documents
  Iteration with 500 kb documents
  Iteration with 1000 kb documents
Summary
  Outlook
Microsoft Windows Azure Update
About Open Text

Executive Summary / Introduction

Overview

This white paper describes how the Open Text Archive Server integrates with Microsoft Windows Azure Storage. Azure Storage is not only a newly supported storage platform for the Archive Server; it also brings new options for the deployment of an ECM environment.

Traditional storage platforms are optical jukeboxes or hard disk systems installed at the customer's site. The customer has to purchase the hardware together with maintenance contracts and, in addition to these investments, provide a UPS (uninterruptible power supply), cooling and space in the data center. Microsoft Azure Storage relieves the customer from buying expensive hardware that is outdated after only a few years. With Azure Storage the customer gets virtually unlimited storage through a web service interface. The performance of local storage will be better than that of cloud storage, but for long-term storage cost factors can outweigh high-performance requirements.

About the Open Text Archive Server

The Open Text Archive Server is a core component of the Open Text ECM Suite and constitutes the archiving foundation for enterprise-wide ECM solutions. It enables ingestion, storage and retrieval of archived content. The archiving functionality is an integral part of the Open Text Enterprise Library. Open Text offers several connectors to expand the archiving functionality. These connectors allow you to manage business documents in different applications and to link them to the business processes, e.g. Open Text Archiving for Microsoft Exchange and Open Text Storage Services for Microsoft SharePoint.

Architecture

The Open Text Archive Server comprises multiple services and processes, such as the Document Service, the Administration Server and the Storage Manager. The Document Service provides document management functionality, storage of technical metadata, and secure communication with archiving clients. The Storage Manager is responsible for managing external devices. The Administration Server offers an API to administer the archive environment, tools and jobs.

Figure: Open Text Archive Server architecture

Scalability and Distribution

The Archive Server is built for enterprise-wide deployments. This means the Archive Server has:

- Strong capabilities in the sense of scalability in document volumes.

- Strong capabilities to distribute the system to all business regions.
- Flexibility to run the system on existing databases and operating systems.
- Flexibility to connect the system to existing or new storage hardware.

The Archive Server client/server architecture provides versatile options for configuring, scaling and distributing an enterprise-wide archive system.

Features of the Open Text Archive Server

Single Instance Archiving

In groupware scenarios, identical documents risk wasting storage space, e.g. when e-mails with attachments are sent to hundreds of recipients. The Archive Server provides single instance archiving (SIA), keeping the same document only once on the connected storage platform (see the conceptual sketch at the end of this section).

Compression

In order to save storage space, content can be compressed before it is written to the storage system. Compression can be activated for different content types and can reduce storage consumption by more than 30 percent.

Encryption of the stored data

By encrypting data, e.g. critical data such as salary tables, the content on the storage is secured and cannot be read without the archive system.

Secure Data Transport

Use of SSL ensures authorized and encrypted communication.

Data transport secured with checksums

Checksums are used to recognize and reveal unwanted modifications to content on its way from creation to long-term storage. Checksums are verified, and errors are reported.

Retention Handling

The Archive Server allows applying retention periods to content. Retention periods are handled by the Archive Server and are passed to the storage platform, as far as the storage platform supports the notion of retention.
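Single instance archiving and checksum verification can be illustrated with a small, purely conceptual sketch; this is not Open Text's implementation, and the class and digest choice are assumptions for illustration only. Identical content is detected via a cryptographic digest and stored once; the same digest doubles as a checksum on retrieval.

```python
import hashlib

class SingleInstanceStore:
    """Conceptual sketch: identical content is detected by a digest
    and stored only once; the digest also serves as a checksum."""

    def __init__(self):
        self._blobs = {}      # digest -> content (stored once)
        self._refcount = {}   # digest -> number of archived references

    def archive(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:            # first occurrence: store it
            self._blobs[digest] = content
        self._refcount[digest] = self._refcount.get(digest, 0) + 1
        return digest                             # reference handed to callers

    def retrieve(self, digest: str) -> bytes:
        content = self._blobs[digest]
        # Checksum verification: reveal unwanted modification on the storage path
        if hashlib.sha256(content).hexdigest() != digest:
            raise ValueError("checksum mismatch: content was modified")
        return content
```

Archiving the same 1 MB attachment sent to 500 recipients would then consume the space of a single copy plus 500 small references.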

Storage Management

Logical archives

A logical archive is an area on the Archive Server in which documents that belong together can be stored. Each logical archive can be configured to represent a different archiving strategy appropriate to the types of documents archived exclusively there. Logical archives make it possible to store documents in a structured way. You can organize archived documents in different logical archives according to various criteria, e.g.:

- Compliance requirements
- The archiving and cache strategy
- Storage platforms
- Customer relations (for ASPs)
- Security requirements

Hardware abstraction

A key task of the Archive Server is hiding specific hardware characteristics from leading applications, providing transparent access, and optimizing storage resources. The Archive Server can handle various types of storage hardware and provides hardware abstraction by offering a unified storage interface. If a hardware vendor's storage API changes, or if new versions come up, it is not necessary to change all the leading applications using the hardware, but only the Archive Server's interface to the storage device (see the sketch at the end of this section).

Supported storage media

The Archive Server supports a wide range of different storage media and devices: cloud storage, standard hard disk storage, hard disk write-once media and optical media.
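As a minimal sketch of the hardware-abstraction idea (conceptual only; the names are invented for illustration and are not Archive Server APIs): leading applications program against one interface, and a device-specific backend is swapped in behind it.

```python
import os
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """One unified interface; leading applications never see device details."""
    @abstractmethod
    def write(self, volume: str, doc_id: str, data: bytes) -> None: ...
    @abstractmethod
    def read(self, volume: str, doc_id: str) -> bytes: ...

class LocalDiskBackend(StorageBackend):
    """Hard disk device: a volume is simply a directory."""
    def __init__(self, root: str):
        self.root = root
    def write(self, volume: str, doc_id: str, data: bytes) -> None:
        os.makedirs(os.path.join(self.root, volume), exist_ok=True)
        with open(os.path.join(self.root, volume, doc_id), "wb") as f:
            f.write(data)
    def read(self, volume: str, doc_id: str) -> bytes:
        with open(os.path.join(self.root, volume, doc_id), "rb") as f:
            return f.read()

# A cloud backend would implement the same two methods with HTTP calls
# (see the REST sketch in the Azure section below). If a vendor's storage
# API changes, only that backend class changes, not the applications.
```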

Backup, Replication, High Availability and Disaster Recovery

Backup

Power outages, physical damage, outdated media, hardware faults or usage errors can unexpectedly shut down IT operations at any time. The Archive Server provides a variety of options to optimize the availability of business documents.

The Archive Server can create copies of volumes as backups. The copies may be produced on the local Archive Server or on a remote backup or standby server. To avoid losing data in the event of a hard disk failure and to resume using the Archive Server immediately, we recommend using RAID (Redundant Array of Independent Disks) technology as an additional data backup mechanism. In addition to document content, administrative information is synchronized between original and backup systems.

Disaster recovery

The Archive Server stores the technical metadata together with the content on the storage media (e.g. DocId, aid, timestamp). This allows the Archive Server to completely restore access to archived documents in case the Archive Server hardware suffers a major breakdown or has been destroyed (a conceptual sketch of this self-describing approach follows at the end of this section).

Remote standby

With a remote standby server, all the documents in an archive are duplicated on a second Archive Server, the remote standby server, via a WAN connection for geographic separation. If the production Archive Server fails, the remote standby server continues to provide read access to all the documents. Physically separating the two servers also provides optimal protection against fire, flood and other catastrophic loss.

High Availability

To eliminate long downtimes, the Archive Server offers active-passive high availability. High availability is a two-node cluster solution in which a fully equipped Archive Server node monitors the current production system via heartbeat. If a node fails, the other node automatically assumes all activities, with full transparency for end users. If the production system fails, users can continue to work normally on the secondary archive system. In contrast to the remote standby server scenario, both read (retrieval) and write (archiving) access to documents is possible in this configuration.
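To make the self-describing-media idea behind disaster recovery concrete, here is a conceptual sketch; Open Text's actual on-media format is not public, and the JSON sidecar file is an assumption for illustration. Technical metadata is written next to the content, so a lost document index can be rebuilt by scanning the media.

```python
import json, os, time

def write_self_describing(volume_dir: str, doc_id: str, content: bytes) -> None:
    """Conceptual sketch: store technical metadata next to the content so the
    archive database can be rebuilt from the media alone after a disaster."""
    os.makedirs(volume_dir, exist_ok=True)
    with open(os.path.join(volume_dir, doc_id), "wb") as f:
        f.write(content)
    meta = {"doc_id": doc_id, "timestamp": time.time(), "size": len(content)}
    with open(os.path.join(volume_dir, doc_id + ".meta"), "w") as f:
        json.dump(meta, f)

def rebuild_index(volume_dir: str) -> dict:
    """Scan the media and reconstruct the document index from sidecar files."""
    index = {}
    for name in os.listdir(volume_dir):
        if name.endswith(".meta"):
            with open(os.path.join(volume_dir, name)) as f:
                meta = json.load(f)
            index[meta["doc_id"]] = meta
    return index
```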

About Microsoft Windows Azure Storage

Microsoft Windows Azure Storage is the storage service of the Windows Azure platform. It provides blobs for unstructured data such as documents, tables for structured data, and queues for messaging. Blobs are organized in containers that belong to a storage account and are accessed through a REST interface over HTTP or HTTPS; requests are authenticated with the storage account name and a secret access key. The service is hosted in Microsoft data centers, scales on demand, and stores data redundantly for durability. Customers pay only for the storage space actually used and for the data transfer.
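The Archive Server talks to Azure through the Open Text Azure library, whose internals are not public. To illustrate the kind of documented REST call such a library issues underneath, here is a rough Python sketch that stores one document as a block blob; the account name and key are hypothetical, and the Shared Key Lite authentication scheme is used for brevity.

```python
import base64, hashlib, hmac
from datetime import datetime, timezone
from urllib.request import Request, urlopen

ACCOUNT = "myaccount"                               # hypothetical account name
ACCESS_KEY = base64.b64decode("PGJhc2U2NC1rZXk+")   # hypothetical access key

def put_blob(container, blob_name, data, content_type="application/octet-stream"):
    """Store one document as a block blob via the Blob service REST API."""
    headers = {
        "x-ms-date": datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT"),
        "x-ms-version": "2009-09-19",
        "x-ms-blob-type": "BlockBlob",
        "Content-Type": content_type,
    }
    # Canonicalized x-ms-* headers: lowercase, sorted, one "name:value\n" each
    canonical_headers = "".join(
        f"{name}:{headers[name]}\n"
        for name in sorted(h for h in headers if h.startswith("x-ms-"))
    )
    canonical_resource = f"/{ACCOUNT}/{container}/{blob_name}"
    # Shared Key Lite string-to-sign: VERB, Content-MD5 (empty), Content-Type,
    # Date (empty because x-ms-date is set), headers, resource
    string_to_sign = f"PUT\n\n{content_type}\n\n{canonical_headers}{canonical_resource}"
    signature = base64.b64encode(
        hmac.new(ACCESS_KEY, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    ).decode()
    headers["Authorization"] = f"SharedKeyLite {ACCOUNT}:{signature}"
    url = f"https://{ACCOUNT}.blob.core.windows.net/{container}/{blob_name}"
    with urlopen(Request(url, data=data, headers=headers, method="PUT")) as response:
        return response.status  # 201 Created on success
```

A GET to the same URL retrieves the blob and DELETE removes it; the benchmark scenarios later in this paper map onto exactly these verbs.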

Archive Server integration with Azure Storage

The Archive Server treats Microsoft Windows Azure Storage as a storage device where single documents can be stored. To configure the connection to Azure Storage, a file <device>.setup has to be created. This setup file (e.g. azure.setup) is the link between the Archive Server and the Azure system. It contains all necessary information to access the Azure servers. The first line contains the connection info, which is needed to load the corresponding Azure library. This library is provided by Open Text and establishes the connection to the Azure storage. If everything is installed and configured correctly, you will see an entry in the Administration Client under Devices showing the Azure storage device.

Storage space in Archive Server devices is accessed through volumes. Volumes are attached to logical archives, thus providing dedicated storage space to logical archives. Volumes in Azure devices are closely related to Azure containers. A container is basically the top-level directory in the Azure cloud in which data is stored. One or more volumes can be associated with one Azure container. The linkage between Azure containers and volumes is configured in the setup file: so-called GS-partitions are linked to the containers, and GS-partitions have a one-to-one relation to Archive Server volumes. To access an Azure container, an account name and access key are necessary.

The following picture shows an Azure device named OTCloud. Five volumes are configured.

Figure 1 Configuration of Microsoft Windows Azure as storage device

The volumes market_vol1, market_vol2 and market_vol3 are used in the logical archive HH_LA_4.

Figure 2 Cloud volumes are attached to a pool
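The exact syntax of the setup file is defined in the Archive Server documentation and is not reproduced here. Purely as a hypothetical illustration of the pieces it ties together (the library reference on the first line, the account credentials, and the container-to-partition linkage described above), it can be pictured like this:

```text
# hypothetical sketch only -- not the real azure.setup syntax
<connection info for loading the Open Text Azure library>   # first line
account:    myaccount                 # Azure storage account name
accesskey:  <base64-access-key>       # Azure access key
container:  otcloud_container         # top-level Azure container
# GS-partitions (one per Archive Server volume) linked to the container:
partition:  market_vol1 -> otcloud_container
partition:  market_vol2 -> otcloud_container
partition:  market_vol3 -> otcloud_container
```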

Business Case

How can Open Text customers profit from Microsoft Azure Storage?

Any customer using an application based on the Open Text ECM Suite and the Archive Server is a candidate for using Azure Storage. Customers have to upgrade to a current Archive Server release. Use of Azure Storage is not restricted to Microsoft Windows platforms; it is also available for Unix operating systems such as Sun Solaris, IBM AIX, HP HP-UX and Linux. The Archive Server runs on premise at the customer's site, whereas the Azure Storage is provided over the Internet. To use Azure Storage, customers need to contact Microsoft for an account. With the account, the customer can configure the storage environment (see the integration chapter above) and start using the cloud. The Archive Server comes with a built-in Volume Migration tool which allows transparently migrating existing content from local hardware to Azure Storage.

What are the benefits for the customer?

Customers only buy what they need and can grow continuously. They pay only for the storage space in use and for the upload and download traffic. With Azure Storage, customers have a small initial investment, no maintenance fees, and pay only for what they really need.
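As a purely illustrative calculation (the actual Azure prices are set by Microsoft and change over time): assuming a pay-per-use price of USD 0.15 per GB per month, a 1 TB archive would cost about 1,024 GB × 0.15 ≈ USD 154 per month for storage, plus transfer fees for uploads and downloads. There is no upfront hardware purchase, and the monthly fee grows only as the archive grows.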

Performance Measurements

Performance tests were done using the Open Text XOTE test tool. The XOTE test tool is an internal test suite developed by the Open Text Quality Assurance department to run performance and benchmark tests with the Archive Server and storage platforms. The tool allows creating arbitrary documents of different sizes, supports automated test scenarios, and collects results in log files for evaluation. Within the benchmark the following test cases were set up.

Test scenarios

1. Write documents to the disk buffer
2. Read documents from the disk buffer (verify)
3. Write the documents to Microsoft Windows Azure
4. Purge documents from the disk buffer (not evaluated in this white paper)
5. Read documents from Microsoft Windows Azure
6. Delete documents from Microsoft Windows Azure

The tests were performed with document sizes of 10, 20, 50, 100, 200, 500 and 1000 kb. Measured times are extracted from the log files of the tool with a precision of milliseconds. The start and end times are given in GMT+1 (CET).
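XOTE itself is an internal Open Text tool, so the following is only a schematic Python sketch of such a benchmark loop; the archive client object and its six methods are hypothetical stand-ins. Each scenario step is timed per document with millisecond precision and collected for later evaluation.

```python
import random, time

def make_document(size_kb: int) -> bytes:
    """Generate an arbitrary test document of the requested size."""
    return random.randbytes(size_kb * 1024)

def timed_ms(action) -> float:
    """Duration of one test step in milliseconds."""
    start = time.perf_counter()
    action()
    return (time.perf_counter() - start) * 1000.0

def run_iteration(archive, size_kb: int, count: int) -> dict:
    """Run the six scenario steps for `count` documents, collecting timings."""
    steps = ("write_buffer", "verify_buffer", "write_azure",
             "purge_buffer", "read_azure", "delete_azure")
    results = {step: [] for step in steps}
    for i in range(count):
        doc_id, doc = f"doc-{size_kb}kb-{i}", make_document(size_kb)
        results["write_buffer"].append(timed_ms(lambda: archive.write_buffer(doc_id, doc)))
        results["verify_buffer"].append(timed_ms(lambda: archive.read_buffer(doc_id)))
        results["write_azure"].append(timed_ms(lambda: archive.write_azure(doc_id)))
        results["purge_buffer"].append(timed_ms(lambda: archive.purge_buffer(doc_id)))
        results["read_azure"].append(timed_ms(lambda: archive.read_azure(doc_id)))
        results["delete_azure"].append(timed_ms(lambda: archive.delete_azure(doc_id)))
    return results
```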

Test environment

The test setup consists of one Archive Server connected to Microsoft Windows Azure storage. Microsoft Windows Azure is configured as a storage device on the Archive Server, and documents are written to the storage using a so-called Single file pool. Compression and single instance archiving are disabled. The pool is configured with 15 threads to the OT Azure library, and 15 connections were configured between the OT Azure library and Microsoft Windows Azure. SSL was used to connect to Microsoft Azure Storage.

Figure 3 Archive Server and Microsoft Windows Azure connection (Document Service → libazure HTTP client → 15 connections → Microsoft Azure Storage HTTP server)

There are four test PCs, each hosting 5 virtual clients that send parallel read and write requests to the Archive Server; in sum, 20 parallel clients send read and write requests. All servers are hosted on a Hyper-V server with Microsoft Windows 2008 Server as operating system.

Host system

- 2 x Quad Core Opteron 2376 (2,3 GHz, 6 MB)
- 32 GB (8 x 4 GB dual-rank DIMMs), 667 MHz
- 450 GB SAS hard disks
- Gigabit Ethernet network

Virtual test clients

- Windows Server 2008 R2 Standard (64 bit)
- 2 (virtual) CPUs, 2,3 GHz Opteron 2376

Archive Server

- Version 9.7.1, Patch AS
- Windows Server 2008 R2 Standard (64 bit)
- 4 (virtual) CPUs, 2,3 GHz Opteron 2376
- 4 GB memory

Network connection

The Archive Server is connected via Gigabit Ethernet to the Open Text network in Munich, Germany. The Open Text network (Ethernet backbone) connects via the Internet (155 Mbit) to the cloud storage located in the South US region. The latency and throughput between the Archive Server and Windows Azure are therefore dominated by a combination of (a) the latency of the connection between Munich and South US and (b) the bandwidth between the two sites.

Figure 4 Test environment and deployment (four client PCs in Munich, each running 5 archive clients, 20 clients in total, connected via Gigabit Ethernet to the Archive Server running on Hyper-V; the Archive Server connects via the Internet to Microsoft Azure Storage in South US)

Performance Results

Load on the Archive Server

The following figures show the load of the server during the different phases. These figures did not change with different document sizes.

Figure 5 Archive Server Task Manager while archiving documents to the disk buffer

Figure 6 Archive Server Task Manager while verifying documents on the disk buffer

Figure 7 Archive Server Task Manager while writing documents to Azure

Figure 8 Archive Server Task Manager while purging documents from the disk buffer

Figure 9 Archive Server Task Manager while verifying documents from Microsoft Windows Azure

Figure 10 Archive Server Task Manager while deleting documents from Microsoft Windows Azure

Iteration with 10 kb documents

The minimum and maximum values for the different scenarios can vary extremely. This can be due to temporary additional load on the server or on the network.

Action                 AVG (ms)
Write to disk buffer   19,75
Read from disk buffer  ~87
Write to Azure         ~1.239
Read from Azure        ~825
Delete from Azure      ~1.153

Table 1 Overview of results for 10 kb documents

Iteration duration: 6,5 hours (Friday to Saturday).

The cause for the maximum value is unknown. The average was calculated over all documents of the iteration. The minimum value (578 ms) for reading from Azure is an upper bound for the latency time. Figure 11 shows the average time per step during the test in a graphical view.

Figure 11 Graphical overview of results for 10 kb documents (average time per step)

Iteration with 20 kb documents

Action                 AVG (ms)
Write to disk buffer   20,00
Read from disk buffer  ~121
Write to Azure         ~1.213
Read from Azure        ~822
Delete from Azure      ~1.125

Table 2 Overview of results for 20 kb documents

Figure 12 shows the values of Table 2 in a graphical view.

Figure 12 Graphical overview of results for 20 kb documents

Iteration duration: approx. 6,5 hours (Friday to Saturday).

Iteration with 50 kb documents

Action                 AVG (ms)
Write to disk buffer   ~37
Read from disk buffer  ~200
Write to Azure         ~1.362
Read from Azure        ~881
Delete from Azure      ~1.141

Table 3 Overview of results for 50 kb documents

The graphical overview is shown in Figure 13 below.

Figure 13 Graphical overview of results for 50 kb documents

Iteration duration: approx. 7 hours (Saturday).

Iteration with 100 kb documents

Action                 AVG (ms)
Write to disk buffer   ~54
Read from disk buffer  ~350
Write to Azure         ~1.402
Read from Azure        ~1.310
Delete from Azure      ~1.114

Table 4 Overview of results for 100 kb documents

Figure 14 Graphical overview of results for 100 kb documents

The graphic shows that the read request takes longer than the delete request and almost as long as the write request. The following iterations show that the time of the read process increases with file size.

Iteration duration: approx. 7 hours (Saturday to Sunday).

Iteration with 200 kb documents

Action                 AVG (ms)
Write to disk buffer   ~91
Read from disk buffer  ~707
Write to Azure         ~1.650
Read from Azure        ~2.269
Delete from Azure      ~935

Table 5 Overview of results for 200 kb documents

As already described for the 100 kb iteration, the time consumption of the read process grows significantly with the size of the documents.

Figure 15 Graphical overview of results for 200 kb documents

Iteration duration: approx. 10,5 hours (Sunday).

Iteration with 500 kb documents

Action                 AVG (ms)
Write to disk buffer   ~199
Read from disk buffer  ~1.551
Write to Azure         ~2.530
Read from Azure        ~4.016
Delete from Azure      ~1.089

Table 6 Overview of time per step for 500 kb documents

The graphical overview is shown in Figure 16.

Figure 16 Graphical overview of results for 500 kb documents

Iteration duration: approx. 10 hours (Monday to Tuesday).

Iteration with 1000 kb documents

Action                 AVG (ms)
Write to disk buffer   ~383
Read from disk buffer  ~3.281
Write to Azure         ~3.866
Read from Azure        ~7.249
Delete from Azure      ~1.074

Table 7 Overview of results for 1000 kb documents

The graphical overview is shown in Figure 17.

Figure 17 Graphical overview of results for 1000 kb documents

Iteration duration: approx. 11 hours (Friday).

Summary

Any interpretation of the results has to be done with care; many known and unknown parameters influence them. In particular:

- The throughput capacity of the Internet connection, and its variation, is unknown.
- The variation of throughput capacity with the time of day is unknown.
- The dependency of performance on the number of HTTP client connections is unknown. Because of the high network latency (response time), throughput strongly depends on the number of parallel requests, i.e. the number of parallel HTTP connections.

Outlook

The results show that Internet latency seems to be the limiting factor for write and read requests. To prove this assumption and to overcome the limitation, several steps are possible: installing client and Archive Server in the U.S., or using a European data center while keeping the clients in Munich. As read performance from Azure Storage decreases with document size, a read cache on the Archive Server can improve performance for larger documents (a conceptual sketch follows below).
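A read cache of the kind suggested above could, conceptually, look as follows; this is a minimal LRU sketch and not an Archive Server feature, with the fetch callable standing in for the Azure read path. Repeated reads of a document are then served locally instead of paying the cloud round trip again.

```python
from collections import OrderedDict

class ReadCache:
    """Minimal LRU read cache sketch: keeps recently retrieved documents
    locally so repeated reads avoid the high-latency cloud round trip."""

    def __init__(self, fetch, capacity_bytes: int):
        self.fetch = fetch                    # e.g. a function reading from Azure
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()          # doc_id -> content, in LRU order

    def read(self, doc_id: str) -> bytes:
        if doc_id in self.entries:
            self.entries.move_to_end(doc_id)  # cache hit: mark as recently used
            return self.entries[doc_id]
        content = self.fetch(doc_id)          # cache miss: one cloud round trip
        self.entries[doc_id] = content
        self.used += len(content)
        while self.used > self.capacity:      # evict least recently used entries
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        return content
```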

Size      Write to disk buffer  Read from disk buffer  Write to Azure  Read from Azure  Delete from Azure
10 kb     19,75                 ~87                    ~1.239          ~825             ~1.153
20 kb     20,00                 ~121                   ~1.213          ~822             ~1.125
50 kb     ~37                   ~200                   ~1.362          ~881             ~1.141
100 kb    ~54                   ~350                   ~1.402          ~1.310           ~935 (see Table 5: ~935 for 200 kb; 100 kb: ~1.114)
500 kb    ~199                  ~1.551                 ~2.530          ~4.016           ~1.089
1000 kb   ~383                  ~3.281                 ~3.866          ~7.249           ~1.074

Table 8 Overall measurement results (average time in ms per step)

The following graphic shows an overall view of the different test runs.

Figure 18 Overall view of benchmark tests with Microsoft Windows Azure

The following findings can be deduced from Figure 18:

- Read and write requests to local disk are significantly faster than requests sent to the cloud.
- The average time to write documents increases slightly with larger files. This applies to writing to the disk buffer as well as to writing to Microsoft Windows Azure.
- The dependency on document size is stronger for write requests to the cloud than to local disk. This is probably due to the latency of the HTTP requests and the network bandwidth.

- The deletion of documents is mainly independent of the document size; the time is constant over the different iterations.
- The document retrieval (read) time increases for documents larger than 64 kb. This applies to reading from the disk buffer as well as to reading from the cloud. The effect is due to the fact that the Archive Server reads documents in 64 kb chunks. It did not matter up to now, as it only becomes significant when latency times are high; it could be resolved by implementing a read cache on the Archive Server.

Figure 19 Write rates for Microsoft Windows Azure

The write rate to Azure is calculated from the number of connections to Azure, the document size and the write time per document:

Write rate = (number of connections × document size) / write time per document

The number of connections for writing was 15. The rate did not yet reach saturation, i.e. more connections or larger documents could lead to higher write rates.
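As a rough cross-check of this formula with the averages from Table 7: for 1000 kb documents the average write time to Azure was about 3,9 seconds, giving 15 × 1.000 kb / 3,9 s ≈ 3,8 MB/s, which matches the upper end of Figure 19. The 64 kb chunking discussed above can be estimated the same way: a 1000 kb document requires about 16 sequential chunk reads, so with per-request latencies of a few hundred milliseconds (the 10 kb iteration showed a minimum read time of 578 ms), the round trips alone explain most of the measured average read time of roughly 7,2 seconds.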

Figure: Read rates for Microsoft Windows Azure

The read rate is calculated analogously:

Read rate = (number of clients × document size) / read time per document

The number of clients for reading was 20; for 1000 kb documents, for example, this gives 20 × 1.000 kb / 7,2 s ≈ 2,8 MB/s. The decrease of the rate for 20 kb documents is unclear. Due to the 64 kb block reads, the read rate is lower than the write rate.

Microsoft Windows Azure Update

On November 11th, 2009, Microsoft released a new version of Windows Azure. Open Text was not aware of the upgrade in advance, so some of the tests were performed with the new release already in place. The results did not show a significant change. The following diagram shows the results on a logarithmic scale, which gives a better overview of the results for small documents.

Figure 20 Overall view of benchmark tests with Microsoft Windows Azure cloud storage (logarithmic scale)

About Open Text

Open Text is a leader in Enterprise Content Management (ECM). With two decades of experience helping organizations overcome the challenges associated with managing and gaining the true value of their business content, Open Text stands unmatched in the market.

Together with our customers and partners, we are truly The Content Experts, supporting 46,000 organizations and millions of users in 114 countries around the globe. We know how organizations work. We have a keen understanding of how content flows throughout an enterprise, and of the business challenges that organizations face today.

It is this knowledge that gives us our unique ability to develop the richest array of tailored content management applications and solutions in the industry. Our unique and collaborative approach helps us provide guidance so that our customers can effectively address business challenges and leverage content to drive growth, mitigate risk, increase brand equity, automate processes, manage compliance, and generate competitive advantage. Organizations can trust the management of their vital business content to Open Text, The Content Experts.

For more information about Open Text products and services, visit www.opentext.com. Open Text is a publicly traded company on both NASDAQ (OTEX) and the TSX (OTC).

Copyright 2009 by Open Text Corporation. Open Text and The Content Experts are trademarks or registered trademarks of Open Text Corporation. This list is not exhaustive. All other trademarks or registered trademarks are the property of their respective owners. All rights reserved.