Building Cost-Effective Storage Clouds: A Metrics-based Approach


Ning Zhang (Computer Sciences Department, University of Wisconsin-Madison, Madison, WI, USA; nzhang@cs.wisc.edu) and Chander Kant (Zmanda Inc., Sunnyvale, CA, USA; ck@zmanda.com)

Abstract. Various application classes, such as archiving, backup of thousands of nodes in an organization, video sharing, and Dropbox-like storage, critically need highly reliable and scalable storage systems. In the new cloud world, these systems can be built fairly economically (and in house). The challenge, then, is how to balance the cost and performance of building and operating such a storage system. The focus of this work is on proposing an effective solution and valuable observations for this challenge within the context of the OpenStack Object Storage (Swift) system.

I. INTRODUCTION

Traditional filesystem-based data stores don't scale and don't have the access characteristics needed by many modern IT services, such as archiving, data backup, video sharing, and Dropbox-like storage. RESTful API-based cloud storage platforms provide many of the characteristics needed by such applications. This trend is already evident in the success of several large-scale cloud storage services such as Amazon S3 and Google Cloud Storage. With the help of new cloud computing technologies, it is now feasible to build in-house cloud storage systems, i.e. private cloud storage. But there is a lack of available research and tools to guide storage cloud builders in how best to build such storage clouds. Cloud-based storage is a crucial part of the future of big data, and warrants a rigorous study to provide insights and best practices for achieving high availability, durability and scalability.

For our study, we choose OpenStack Object Storage (Swift) [1], a massively scalable, redundant key-value storage system. Swift can run on commodity hardware, provide high durability, and scale from a few servers to thousands of nodes with petascale storage capacity. Since Swift is a free and popular open-source project that has been deployed in production environments, such as HP Cloud [2] and Rackspace Cloud Files [3], using Swift as the cloud storage infrastructure gives us a platform that is a sound practical starting point for storage cloud builders.

Similar to other large-scale data processing systems, Swift is a sophisticated key-value store with heterogeneous types of nodes, with different responsibilities assigned to each type. Thus, cloud operators who want to build a Swift-based cloud storage system (or simply a Swift cluster) with low upfront cost, while adhering to a certain Service-Level Agreement (SLA) on performance and availability, need to address some arduous challenges with overlapping implications: (1) How many servers should be provisioned for each type of node? (2) How should hardware resources (e.g. CPU, network, I/O devices) be provisioned for each type of node?

In this paper, we present our method, called the Swift Advisor, to answer these questions. Based on our evaluation results, we observe that the Swift Advisor is effective in providing appropriate configurations for different workloads. Thus, with our Swift Advisor, storage cloud architects now have a tool that easily allows them to build modern cloud-based storage systems.

Another important challenge that cloud operators need to consider is the impact of failures in the cloud infrastructure
(e.g. hardware or software failures). In many cases, cloud-based storage is deployed to serve storage needs spread over a long period of time. Given the size and duration of such infrastructure, periodic failures are inevitable. So, it is important to consider how a Swift cluster performs in a degraded mode. Can clusters be designed to ensure that performance doesn't drop below the required SLA even when faced with some failures? In this paper, we also address this question and describe how to operate a Swift cluster to minimize the negative impact caused by common failures.

Builders of storage clouds with high accessibility requirements also need to consider the impact of site failures; such storage clouds need to be geo-redundant. In this paper, we also present a method and performance considerations for building a storage cloud with a remote replica.

On a cautionary note, we acknowledge that in this initial paper we only focus on a small part of the provisioning problem in storage clouds. For example, we focus only on hardware provisioning, ignoring software tunings. The area of resource provisioning in storage clouds is fairly new and there are many open unsolved problems. We hope that this work seeds other work in this area to examine and solve

these and other open problems.

Fig. 1: Swift architecture (applications reach the proxy nodes through a load balancer; a PUT must reach a quorum of the storage nodes holding copies #1-#3 of an object, while a GET is served from a single copy).

The remainder of this paper is organized as follows: Section II introduces some background material. Section III describes the Swift Advisor. Detailed empirical results are presented in Section IV. In Section V, we describe how to deal with various failure scenarios. In Section VI we present and evaluate our solution for geo-mirroring. Related work is described in Section VII, and Section VIII contains our concluding remarks and directions for future work.

II. SWIFT OVERVIEW

Swift is a highly scalable and durable key-value storage system designed to store large amounts of unstructured data. Swift does not have a central brain or master point of control, and hence there is no single point of failure. The Swift architecture contains two types of nodes, proxy nodes and storage nodes, as shown in Fig. 1. The proxy nodes intercept the incoming requests. Once a request reaches a proxy node, the proxy figures out which storage nodes should handle it. For a small Swift deployment, a minimum of two proxy nodes is recommended for redundancy: if one proxy node goes down, the other can take over its workload. The storage nodes take requests from the proxy nodes and provide the actual storage space for the objects, which are replicated (the default replication factor is three). The three copies of each object are evenly spread over all storage nodes via consistent hashing.

In the Swift architecture, a zone is used to isolate failure boundaries. Each copy of the data resides in a separate zone, so that the three copies are stored in three different zones. A minimum Swift deployment must have five storage nodes (and two proxy nodes), with each storage node representing its own zone; by default, a Swift cluster has five zones. In a large Swift deployment, a zone could be an entire rack of storage nodes. The objective of the zone concept is to tolerate the sudden unavailability of some storage nodes: if a zone becomes inaccessible, the data in that zone can still be served from, and re-replicated across, the other zones.

Similar to other key-value stores, the PUT, GET and DEL operations are used to upload, retrieve and delete objects from a Swift cluster via a RESTful API.

PUT operation: a PUT triggers three concurrent writes to push three copies of a data item to the storage nodes. If two writes complete successfully, Swift declares success and returns a completion message to the client. The third write can be delayed, and Swift relies on its replication process to ensure the third replica is created.

GET operation: a GET fetches one copy of the data. If it fails (e.g. due to data corruption), the GET request fetches the data item from another replica in the system.

DEL operation: a DEL request is sent to the storage nodes to remove the (three) copies of the data.

As each object in a Swift cluster is replicated three times by default, cloud operators need to prepare a Swift cluster whose raw capacity is roughly three times the total size of all objects that will be stored. Planning and increasing the storage capacity is simple: the cloud operators simply attach new disks to the storage nodes as needed, without affecting the existing data or normal Swift operations.
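To make the API concrete, the following is a minimal sketch of the three operations using the python-swiftclient library; the endpoint URL, the TempAuth credentials, and the container and object names are illustrative assumptions, not values from the paper.

```python
from swiftclient.client import Connection

# Connect to a Swift proxy endpoint (URL and credentials are hypothetical).
conn = Connection(authurl='http://proxy.example.com:8080/auth/v1.0',
                  user='test:tester', key='testing')

conn.put_container('demo')                      # containers hold objects

# PUT: the proxy writes three replicas and acks once a quorum (two) succeed.
conn.put_object('demo', 'hello.txt', contents=b'hello swift',
                content_type='text/plain')

# GET: served from a single replica; Swift retries another copy on failure.
headers, body = conn.get_object('demo', 'hello.txt')

# DEL: removes all three replicas.
conn.delete_object('demo', 'hello.txt')
```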
III. SWIFT ADVISOR

Within the context of this paper, a Swift configuration includes: (1) how many proxy nodes and storage nodes to allocate in a Swift cluster, and (2) how to provision the hardware, including CPU, memory, network and I/O, for the proxy and storage nodes.

A naive way to determine the optimal Swift configuration is to enumerate and validate every possible configuration. However, this approach is expensive: with M hardware choices for proxy nodes, N hardware choices for storage nodes, and any combination of the numbers of proxy and storage nodes within a cluster, the search space of all possible configurations is huge, and building and evaluating each of these hardware configurations is prohibitively expensive. Given a specific workload and its performance SLA, our Swift Advisor takes a simple but practical approach, described in Section III-A.

A. Solution Description

The Swift Advisor consists of two steps. First, it explores the space of small-sized Swift clusters and determines the best configurations. A small-sized Swift cluster does not have to meet the performance SLA; it simply needs to conform to the minimum requirement on cluster size: at least two proxy nodes and five storage nodes (as mentioned in Section II, this is the minimum configuration for a Swift cluster). There is no fixed upper limit on the size of small-sized Swift clusters, but in our practice the total number of nodes is usually less than 20. As the cluster size is small, we can quickly deploy and benchmark each Swift configuration against the target workloads, and find the best ones (e.g. the top 3 configurations) that lead to a high performance/cost ratio. (Here, cost means the upfront cost, and performance is measured by benchmarking. A sketch of this step appears below; it is formalized as Algorithm 1 in the next section.)
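The enumerate-benchmark-rank loop of this first step can be summarized in a few lines of Python. This is a minimal sketch, not the authors' implementation: the hardware menus and prices mirror Table II below, and benchmark() stands in for the manual deploy-and-measure step.

```python
import itertools

# Candidate hardware and per-node prices in $/hour (from Table II below).
PROXY_HW = {"HPC": 1.30, "High-CPU": 0.66}
STORAGE_HW = {"High-CPU": 0.66, "Large": 0.32}
PROXY_COUNTS = [2, 3, 4]          # at least two proxy nodes
STORAGE_COUNTS = [5, 10, 15]      # at least five storage nodes (one per zone)

def benchmark(proxy_hw, n_proxy, storage_hw, n_storage):
    """Deploy a small-sized cluster and run the target workload against it.
    Placeholder for the manual deploy-and-measure step; returns ops/second."""
    raise NotImplementedError

def recommend(top_n=3):
    ranked = []
    for p_hw, s_hw, n_p, n_s in itertools.product(
            PROXY_HW, STORAGE_HW, PROXY_COUNTS, STORAGE_COUNTS):
        cost = PROXY_HW[p_hw] * n_p + STORAGE_HW[s_hw] * n_s  # upfront $/hour
        perf = benchmark(p_hw, n_p, s_hw, n_s)
        ranked.append((perf / cost, (p_hw, n_p, s_hw, n_s)))
    ranked.sort(reverse=True)                 # highest performance/cost first
    return ranked[:top_n]
```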

Algorithm 1: Recommend the best Swift configurations for small-sized Swift clusters
Input: a Swift workload
1: for each hardware configuration for proxy nodes do
2:   for each hardware configuration for storage nodes do
3:     for each value of the number of proxy nodes do
4:       for each value of the number of storage nodes do
5:         deploy a small-sized Swift cluster with the given hardware settings
6:         run the workload against the cluster and calculate performance/cost
7:       end for
8:     end for
9:   end for
10: end for
Output: top-n recommended Swift configurations with the highest performance/cost

In the second step, the Swift Advisor scales out each of the best small-sized Swift clusters (selected in step 1) to a large-sized Swift cluster deployment, so that the large-sized cluster has enough resources to meet the performance SLA on the target workload. Note that the configurations of the large-sized Swift clusters are the same as the best configurations selected in step 1, except for the number of nodes in the cluster. After the second step, we can easily identify one large-sized configuration as our final solution: the one that incurs the lowest upfront cost while providing the minimum performance guarantee.

The Swift Advisor described above is fairly simple, and hence has great appeal in practice. As the experimental results below show, it is quite effective in producing configurations close to the optimal configuration found by exhaustive search.

Based on the above discussion, the key operation in the Swift Advisor is step 1, namely recommending the best Swift configurations for small-sized Swift clusters; we formally present its details in Algorithm 1. Several factors determine how close the recommended configuration is to the optimum. First, Lines 1 and 2 of Algorithm 1 decide the hardware range considered for the proxy and storage nodes. If this range is large, a better configuration may be found, but the runtime of the algorithm grows. Lines 3 and 4 of Algorithm 1 explore the possible numbers of proxy and storage nodes within a small-sized Swift cluster. If we increase the range of these two searches, a small-sized Swift cluster with a higher performance/cost may be found, but at the price of a longer running time. Even though it is up to the users to control the search space, we present some heuristic techniques in Section III-B that can significantly reduce the runtime cost.

We also note that in the future it may be possible to collect the results of step 1 in a community-maintained database, so that this step can simply be converted into a search on the community database. Of course, this raises a number of issues, such as how to maintain and search such community databases, and how to deal with issues such as anonymization. In this initial paper we do not consider these issues, but there is likely a rich direction of future work in this area.

B. Heuristic Techniques

We can use a number of heuristic techniques to prune the computation in Algorithm 1. In the interest of space, we omit the pseudocode for the algorithm with these heuristics; the discussion below describes each heuristic in the context of the individual lines of Algorithm 1 that it affects.

Lines 1 and 2 of Algorithm 1 enumerate all possible hardware choices for the proxy and storage nodes.
However, a good rule of thumb [] is that the proxy nodes are usually CPU and network intensive, while the storage nodes are disk and network intensive. We can use this rule of thumb to limit and prune the hardware search space; for example, we do not consider configurations that have low CPU and network resources for the proxy nodes.

Lines 3 and 4 of Algorithm 1 explore different numbers of proxy and storage nodes within a small-sized Swift cluster. In a production environment, the number of proxy nodes is usually far smaller than the number of storage nodes [7]. (For example, the 1 PB Swift cluster in [7] has dozens of storage nodes but only 6 proxy nodes.) So, within a small-sized Swift cluster, we usually do not consider more than 4 proxy nodes in Algorithm 1. Another heuristic, applied to Line 4, is to start the number of storage nodes at 5 and increase it by 5 each time, so that it is always a multiple of the number of zones (by default, 5 zones). More importantly, the iteration in Line 4 terminates when adding more storage nodes does not further increase the performance of the cluster. This is because once the proxy nodes are fixed (Lines 1 and 3 of Algorithm 1), they will eventually be saturated if we keep adding storage nodes, and the Swift cluster will not produce a higher performance/cost once the proxy nodes are the bottleneck. In our experience (Section IV), 15 storage nodes can overwhelm 2 proxy nodes in most cases.

Overall, we extend Algorithm 1 with the above heuristics to allow it to find a good solution with a smaller number of iterations of the innermost loop (Lines 5-6) of Algorithm 1. Recall that the innermost step is a manual step of deploying and testing a cluster, so reducing the number of iterations has a huge impact in practice. In Section IV, we use different workloads to show the effectiveness of the Swift Advisor, which is essentially Algorithm 1 with all the heuristics described here.

IV. EVALUATION OF THE SWIFT ADVISOR

In this section, we use different workloads to verify the effectiveness of the Swift Advisor. We first introduce the hardware configurations considered in our experiments.

A. Hardware Configurations and Upfront Cost

Because it is very expensive to directly purchase large amounts of hardware for evaluation (before arriving at the final cluster recommendation), we chose to take advantage of the virtualized hardware offered by the Amazon Elastic Compute Cloud [5] (EC2). The benefits of using EC2 are: (1) EC2 provides many types of compute instances with different resource capacities (e.g. CPU, memory and network), ranging from commodity compute instances to High Performance Compute (HPC) instances, so we can explore many hardware configurations while paying only for what we use. (2) EC2 provides reasonably structured pricing, which also provides a good framework for estimating the upfront cost of actually building the recommended private Swift cluster. (3) The hardware specification of an EC2 instance is clearly defined. Once a Swift cluster is deployed on EC2 instances with the expected performance, the EC2 hardware specifications can effectively guide the purchase of physical hardware, because our final goal is to deploy a cost-efficient Swift cluster on physical servers.

Given the EC2 resources, the Swift Advisor chooses the hardware for the proxy nodes from two EC2 instance types: (1) the Cluster Compute Quadruple Extra Large Instance (or simply HPC) and (2) the High-CPU Extra Large Instance (or simply High-CPU). For the storage nodes, the hardware options are: (1) the Large Instance (or simply Large) and (2) High-CPU. The hardware specifications and pricing of the HPC, High-CPU and Large instances are shown in Table II. The CPU speed is presented in terms of EC2 Compute Units, which EC2 uses to calibrate CPU resources. Note that the network bandwidth of 1 Gigabit Ethernet (1 GE) translates to 125 MB/s, and 10 Gigabit Ethernet (10 GE) can deliver 1,250 MB/s of network bandwidth.

TABLE II: Specifications of EC2 instances

             HPC        High-CPU   Large
  CPU speed  33.5 ECU   20 ECU     4 ECU
  Memory     23 GB      7 GB       7.5 GB
  Network    10 GE      1 GE       1 GE
  Pricing    $1.30/hr   $0.66/hr   $0.32/hr

(The pricing is based on On-Demand Instances in the US East (N. Virginia) Region as of October 2012.)

Since some Swift workloads are I/O intensive (e.g. uploading an object triggers three writes of the data replicas), using one disk per storage node can be a serious performance bottleneck. In addition, we want to know the impact of the number of disks per storage node on performance under different workloads. To simulate a physical server with multiple disks in EC2, we attach several block devices (backed by the Elastic Block Store, EBS) to an EC2 instance. In our experiments, we consider configurations with 2, 4 or 8 EBS volumes per storage node.

We are aware of the additional cost incurred by using EBS. For simplicity, in our experiments we only count the aggregate EC2 instance cost (in $/hour) as the upfront cost of a cluster. Also, as is widely known, Amazon EC2 and EBS may suffer from variable performance. To alleviate this problem, we use provisioned IOPS for EBS [6], which guarantees a certain number of IOPS (e.g. 1,000 IOPS in our experiments) per EBS volume; we then run each experiment several times and report the average performance.
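Under this simplification, the upfront cost of a candidate cluster is just the node counts weighted by the Table II prices. A one-line helper, shown only to pin down the cost model used in the rest of the evaluation:

```python
# Hourly On-Demand prices from Table II (US East, October 2012); EBS cost
# is deliberately ignored, matching the paper's simplification.
PRICE = {"HPC": 1.30, "High-CPU": 0.66, "Large": 0.32}

def cluster_cost(proxy_type, n_proxy, storage_type, n_storage):
    """Aggregate EC2 instance cost of a cluster in $/hour."""
    return PRICE[proxy_type] * n_proxy + PRICE[storage_type] * n_storage

# Example: 2 High-CPU proxy nodes + 5 High-CPU storage nodes.
print(cluster_cost("High-CPU", 2, "High-CPU", 5))   # -> 4.62 $/hour
```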
B. Performance SLA

In this paper, we define the performance SLA metric for a Swift cluster in terms of the duration between the initiation and completion of an operation (the response time). For example, an SLA might say that the system must return at least 80% of the responses to the client within 1 second. We use response time in our SLA specification because it is an intuitive metric that is commonly used by end users.

C. Workloads

As shown in Table I, we define four Swift workloads based on two parameters: the ratio between PUT, GET and DELETE operations, and the range of object sizes. When a workload generates small objects (object sizes from 1 KB to 100 KB), we launch a large number of concurrent client threads to load the Swift cluster; fewer concurrent threads are needed when the object size is large (from 1 MB to 10 MB). In each case, we make sure that the workload threads (running on the client machines) are not the system bottleneck and generate enough work to fully saturate the Swift cluster. In our experiments, the workloads are generated using COSBench [17], which uploads, downloads and deletes objects from 6 containers (a container is called a bucket in Amazon S3 terminology) in the Swift cluster. Considering dynamic workloads and different SLA definitions is future work.
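As a stand-in for the COSBench setup, the following sketch issues operations in the Table I ratios against a Swift cluster. The mix shown is Workload A's; the container name is hypothetical, and conn is a python-swiftclient Connection as in the earlier example.

```python
import random

def run_workload(conn, n_ops=1000,
                 mix=(("PUT", 0.90), ("GET", 0.05), ("DEL", 0.05)),
                 size_range=(1 << 10, 100 << 10)):  # Workload A: 1 KB - 100 KB
    """Issue a randomized PUT/GET/DEL mix against the 'bench' container."""
    stored = []
    ops, weights = zip(*mix)
    for i in range(n_ops):
        op = random.choices(ops, weights=weights)[0]
        if op == "PUT" or not stored:                # bootstrap with a PUT
            body = random.randbytes(random.randint(*size_range))
            name = f"obj-{i}"
            conn.put_object("bench", name, contents=body)
            stored.append(name)
        elif op == "GET":
            headers, body = conn.get_object("bench", random.choice(stored))
        else:
            conn.delete_object("bench", stored.pop())
```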

D. Key Result: Overview of the Evaluation

In this section, we present an overview of our key results, and follow up with more detailed results in the subsequent sections.

TABLE I: Experimental Swift workloads

Workload A: Intensive Upload (GET 5%, PUT 90%, DEL 5%), small objects (1 KB-100 KB). Example: game hosting, where game sessions are periodically saved in small files to record user profiles with timestamps.
Workload B: Intensive Upload (GET 5%, PUT 90%, DEL 5%), large objects (1 MB-10 MB). Example: enterprise backup, where files are compressed into large archives and backed up; occasionally, files need to be recovered and deleted.
Workload C: Intensive Download (GET 90%, PUT 5%, DEL 5%), small objects (1 KB-100 KB). Example: website hosting, where once a new webpage is published by its owner, many read requests are made to that page.
Workload D: Intensive Download (GET 90%, PUT 5%, DEL 5%), large objects (1 MB-10 MB). Example: video sharing, where large download traffic is generated when people watch popular videos.

The Swift configurations recommended by the Swift Advisor for small-sized Swift clusters are shown in Tables III, IV, V and VI for Workloads A, B, C and D, respectively. In each table, the two hardware columns give the three best hardware configurations for the proxy and storage nodes out of all configuration possibilities, where "best" is determined by performance/cost and the configurations are sorted from high to low values. (A configuration with a higher performance/cost value is preferred, as explained in Section III.) The hardware configuration for the proxy or storage nodes is presented as a number of EC2 instances of a given type plus the number of disks per instance; for example, 5 x (High-CPU + 8 EBS) means five High-CPU instances, each with 8 attached EBS volumes.

TABLE III: Recommended configurations for Workload A (Upload, Small Objects)
  Rank  Proxy nodes   Storage nodes             Throughput/cost
  1st   2 High-CPU    5 x (High-CPU + 8 EBS)    151
  2nd   2 HPC         5 x (High-CPU + 8 EBS)    135
  3rd   2 HPC         10 x (High-CPU + 8 EBS)   123

TABLE IV: Recommended configurations for Workload B (Upload, Large Objects)
  Rank  Proxy nodes   Storage nodes             Throughput/cost
  1st   2 HPC         10 x (Large + 8 EBS)      5.6
  2nd   2 HPC         5 x (Large + 8 EBS)       4.9
  3rd   2 High-CPU    5 x (Large + 8 EBS)       4.7

TABLE V: Recommended configurations for Workload C (Download, Small Objects)
  Rank  Proxy nodes   Storage nodes             Throughput/cost
  1st   2 High-CPU    5 x (Large + 2 EBS)       737
  2nd   2 HPC         5 x (Large + 2 EBS)       572
  3rd   2 High-CPU    10 x (Large + 2 EBS)      513

TABLE VI: Recommended configurations for Workload D (Download, Large Objects)
  Rank  Proxy nodes   Storage nodes             Throughput/cost
  1st   2 HPC         5 x (Large + 4 EBS)       16
  2nd   2 HPC         10 x (Large + 4 EBS)      14.5
  3rd   2 High-CPU    5 x (Large + 4 EBS)       12.9

Now, we analyze the recommended Swift configurations for each workload.

For Workload A, the hardware recommendations are shown in Table III. All three recommended configurations provision the storage nodes with the High-CPU instance and 8 attached disks per node (the "Storage nodes" column of Table III). This is because when small objects are concurrently uploaded, the response time of a single upload request is very short, so the storage nodes must handle a large number of requests per second, and CPU is the key resource determining performance. Moreover, each object is written in triplicate, so I/O activity is also intensive at the storage nodes. The High-CPU instance (with 5X more CPU resources at only about twice the price of the Large instance) with 8 disks is therefore the preferred storage node hardware.

To provision the proxy nodes, the Swift Advisor chooses the High-CPU instance in the best configuration (the "Proxy nodes" column of Table III). Even though the HPC instance has 10X larger network bandwidth (10 GE) than the High-CPU instance (1 GE), Workload A does not draw much network bandwidth through the proxy nodes, due to the small object size: for example, 1,250 operations/s incur only about 60 MB/s of incoming traffic and 60 x 3 MB/s of outgoing traffic through the proxy nodes (the replication factor is 3). Thus, paying for the more expensive HPC instance and its 10 GE network at the proxy nodes does not pay off. Finally, after choosing the best hardware configurations for the proxy and storage nodes, the Swift Advisor also figures out how many storage nodes should work with the two proxy nodes to maximize the cluster-wide performance/cost; for example, in Table III the 3rd recommended configuration advises using 10 storage nodes and 2 proxy nodes in the small-sized cluster.
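The proxy-bandwidth argument above is simple arithmetic. A sketch with the paper's Workload A numbers, assuming (as an illustration) an even split of traffic across the proxy nodes:

```python
GE_1, GE_10 = 125, 1250   # approximate usable MB/s for 1 GE and 10 GE NICs

def per_proxy_traffic(ops_per_s, avg_obj_mb, n_proxy=2, replicas=3):
    """Rough per-proxy traffic (MB/s) for an upload-heavy workload:
    each object comes in once from clients and goes out `replicas` times."""
    incoming = ops_per_s * avg_obj_mb / n_proxy
    return incoming, incoming * replicas

# Workload A: ~1,250 uploads/s of ~48 KB objects over 2 proxy nodes
# -> (30.0, 90.0) MB/s per proxy, comfortably inside a 1 GE link,
# so paying for 10 GE (HPC) buys nothing here.
print(per_proxy_traffic(1250, 0.048))
```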
For Workload B (see Table IV), the storage nodes of the top three configurations are all based on the Large instance, with 8 attached disks per node. The reason for choosing the Large instance instead of the High-CPU instance is as follows: in contrast to small objects, when the object size is large, a longer response time (mostly I/O time) is incurred to finish each upload request, which slows down the request rate. Because of the low request rate, the CPU is no longer the critical bottleneck resource, so provisioning the storage nodes with the Large instance and 8 disks provides sufficient CPU and I/O resources to attain good performance at a lower price. On the other hand, since many large objects are uploaded at the same time, the network traffic through the two proxy nodes is heavy: for example, 32 operations/s cause about 160 MB/s of incoming traffic and 160 x 3 MB/s of outgoing traffic through the proxy nodes. Thus, it pays off to use the HPC instance for the proxy nodes, as its 10 GE network is necessary to handle the large network traffic. Finally, to maximize performance/cost, 10 storage nodes are needed in the 1st configuration to keep up with the 2 HPC-based proxy nodes.

For Workload C (see Table V), the storage nodes of the top three recommended configurations are all based on the Large instance, and only 2 disks are needed per storage node. Since downloading an object only retrieves one copy of the data from the storage nodes (see Section II), the load on both the CPU and the I/O resources at the storage nodes is much smaller than under the upload-intensive workloads (Workloads A and B). Thus, provisioning each storage node with the Large instance and 2 disks is sufficient to maintain good performance. Furthermore, concurrently retrieving many small objects consumes only a small amount of network bandwidth at the proxy nodes. Hence, in the best configuration, using the High-CPU instance

(and 1 GE) for the proxy nodes reduces the cost without throttling performance.

Finally, for Workload D, Table VI shows the recommended configurations. All three prefer the Large instance for the storage nodes, with 4 attached disks per node. Since this workload is not CPU bound at the storage nodes, provisioning the Large instance with 4 disks provides sufficient CPU and I/O resources to attain good performance at a lower price. When choosing the hardware for the proxy nodes, we noticed that, similar to Workload B, there is a large network bandwidth demand at the proxy nodes (in total, about 640 MB/s of incoming and outgoing traffic). Thus, it pays off to use the HPC instance (and 10 GE) for the proxy nodes.

E. Workload A: In Detail

1) Small-Sized Swift Clusters: Fig. 2 shows the results of the top three recommended configurations, with the 1st configuration shown in bold.

Fig. 2: Workload A: top three recommended Swift configurations for the small-sized Swift cluster (cost in $/hour versus throughput in operations/second). The dark dots represent the 1st configuration (shown in Table III) with a varying number of disks attached to the storage nodes.

Recall from Section IV-D that for this workload the CPU at the storage nodes is the critical resource for performance. Given the two hardware choices for the storage nodes, the Large instance has 4 EC2 Compute Units at $0.32 per hour, while the High-CPU instance has 5X more CPU resources but is only about twice as expensive. Since the High-CPU instance provides more CPU resources per cost, the Swift Advisor chooses the High-CPU instance for the storage nodes.

Also from Fig. 2, we observe that the throughput (operations/second) of the cluster is very high, while the network bandwidth of the cluster is low. (The bandwidth is not shown in Fig. 2, but it can be calculated by multiplying the average object size by the throughput.) For example, 1,250 operations/second incur only around 60 MB/s of incoming traffic and 60 x 3 MB/s of outgoing traffic. Given the two hardware choices for the proxy nodes, the High-CPU instance has 20 EC2 Compute Units and 1 GE networking, at $0.66 per hour. In contrast, the
As large amounts of small objects are uploaded per second and each object is written in triplicate, the I/O activity is very intensive for the storage nodes. Using multiple s can evenly distribute the I/O load and greatly reduces the average I/O time to improve the overall performance. For example, we monitored that using 2 s per storage node always gives 1% utilization on each, comparing to 3% utilization when s are used per storage node. So far we explained how the Swift Advisor recommends configurations within the small-sized Swift clusters. In the following section, we explain how the Swift Advisor produces configurations for a large-sized Swift cluster with low upfront cost, while adhering to performance SLA. For this workload, we assume that the SLA is to service at least % of the request in 1 ms. 2) Large-Sized Swift Clusters: Referring to the three recommended Swift configurations shown in Table III, we deploy the large-sized Swift clusters by proportionally scaling out the numbers of proxy and storage nodes. The costs and response times for % requests is shown in Fig. 3. From Fig. 3, we notice that all three Swift configurations can be scaled linearly; e.g., from to or in Fig. 3. If the SLA requires % of requests should be served within 1 ms (as shown As mentioned in [7], for a Swift cluster with 5 storage nodes, it can provide 12 TB capacity. So we believe the large-sized Swift clusters in our experiments, e.g. proxy nodes and 2 storage nodes, can hold.5 PB of data a large enough capacity for many use cases. 6

F. Workload B: In Detail

1) Small-Sized Swift Clusters: For Workload B, the results of the top three recommended configurations are presented in Fig. 4, with the 1st configuration highlighted in bold.

Fig. 4: Workload B: top three recommended Swift configurations for the small-sized Swift cluster (cost in $/hour versus throughput in operations/second). The dark dots represent the 1st configuration (shown in Table IV) with a varying number of disks attached to the storage nodes.

In contrast to the recommended configurations for Workload A, here the Large instance is preferred for the storage nodes. The reason is as follows: when uploading large objects is the dominant operation in the workload, the throughput of the cluster is very low (compare Fig. 4 to Fig. 2), so the CPU in the storage nodes only needs to handle a small number of requests per second and is no longer the crucial bottleneck resource. Given the two hardware choices for the storage nodes, the Large instance is about 50% cheaper than the High-CPU instance and both have the same 1 GE network. Thus, choosing the Large instance for the storage nodes provides sufficient CPU resources to attain high performance at a lower cost.

On the other hand, we notice that the aggregate network bandwidth at the proxy nodes is the critical resource for performance. As shown in Fig. 4, 32 operations/s cause approximately 160 MB/s of incoming traffic and 160 x 3 MB/s of outgoing traffic through the proxy nodes, simply because the object size is large. Based on this observation, using 1 GE networking on the proxy nodes would seriously throttle the throughput. Thus, the HPC instance (and its 10 GE networking) is chosen in the 1st configuration to handle the heavy network traffic, even though it is more expensive than the High-CPU instance. In addition, the 1st configuration requires 10 storage nodes to keep up with the 2 HPC-based proxy nodes. As in Workload A, each object is replicated three times, so the I/O is intensive at the storage nodes, and using 8 disks per storage node provides better performance than using fewer disks (as shown in Fig. 4).
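The disks-per-node effect has a simple first-order model: replicated uploads multiply the write load, and the disks divide it. A rough sketch, assuming the even placement that consistent hashing approximates:

```python
def writes_per_disk(client_puts_per_s, n_storage, disks_per_node, replicas=3):
    """First-order write load per disk: every client PUT lands `replicas`
    times, spread roughly evenly across all disks by consistent hashing."""
    return client_puts_per_s * replicas / (n_storage * disks_per_node)

# Workload B's 1st configuration (10 storage nodes), ~32 uploads/s of
# multi-MB objects: 8 disks/node gives 1.2 writes/s per disk, versus 4.8
# with 2 disks/node, a 4x difference in per-disk I/O pressure.
print(writes_per_disk(32, 10, 8), writes_per_disk(32, 10, 2))
```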
Fig. 5: Workload B: large-sized Swift clusters (cost in $/hour versus the response time in ms of 80% of requests).

2) Large-Sized Swift Clusters: Again, based on the three recommended Swift configurations, large-sized Swift clusters are deployed; the results are shown in Fig. 5. As observed from Fig. 5, each Swift configuration scales nearly linearly. If the SLA requires the cluster to respond to 80% of the requests within 5,000 ms (shown as the dotted line in Fig. 5), the Swift cluster with 3 HPC-based proxy nodes and 15 Large-based storage nodes (highlighted by the circle in Fig. 5) has the lowest cost. In this case, the optimal configuration found by an exhaustive search is the same as the best solution returned by the Swift Advisor.

G. Workload C: In Detail

1) Small-Sized Swift Clusters: For Workload C, the results of the top three recommended configurations are drawn in Fig. 6, with the 1st configuration highlighted in bold. As most of the operations in this workload are downloads of small objects, only one copy of an object has to be retrieved for each download request. In addition, the main memory of the storage nodes can cache some metadata to speed up data retrieval. Thus, the consumption of CPU and I/O resources is only moderate at the storage nodes. Given the two hardware choices, the Swift Advisor prefers to provision the storage nodes with the Large instance, which can provide

adequate resources to attain high performance while incurring only about 50% of the cost of the High-CPU instance. Moreover, due to the caching effect and the lighter I/O burden, 2 disks per storage node are sufficient to handle the I/O load caused by the concurrent download requests.

Fig. 6: Workload C: top three recommended Swift configurations for the small-sized Swift cluster (cost in $/hour versus throughput in operations/second).

In addition, we notice that this workload consumes only a small amount of network bandwidth at the proxy nodes: for example, as shown in Fig. 6, 2,400 operations/s cause only around 120 MB/s of incoming and 120 MB/s of outgoing traffic. Thus, the High-CPU instance and its 1 GE networking can sufficiently power the proxy nodes in the 1st configuration, at a lower cost than the HPC instance. Moreover, five Large-based storage nodes can fully saturate the two High-CPU-based proxy nodes in the 1st configuration, so adding more storage nodes does not deliver a higher performance/cost.

2) Large-Sized Swift Clusters: Fig. 7 shows the results of the large-sized Swift clusters for Workload C.

Fig. 7: Workload C: large-sized Swift clusters (cost in $/hour versus the response time in ms of 80% of requests).

As before, the three recommended Swift configurations scale nicely as more nodes are added to the clusters. If our SLA mandates that the cluster respond to 80% of the requests within 200 ms (shown as the dotted line in Fig. 7), then the Swift cluster with 6 High-CPU-based proxy nodes and 15 Large-based storage nodes (highlighted by the circle in Fig. 7, and also the optimal solution found by an exhaustive search) has the lowest cost while adhering to the performance SLA.

H. Workload D: In Detail

1) Small-Sized Swift Clusters: For Workload D, we present the results of the top three recommended Swift configurations in Fig. 8, with the 1st configuration in bold.

Fig. 8: Workload D: top three recommended Swift configurations for the small-sized Swift cluster (cost in $/hour versus throughput in operations/second).

Similar to Workload C, the Swift Advisor prefers the Large instance for the storage nodes, because the Large instance provides sufficient CPU resources to attain high performance at a lower cost than the High-CPU instance. However, unlike the 1st configuration for Workload C, the proxy nodes are better served by the HPC instance. The reason is that when most of the operations download large objects, the amount of network traffic at the proxy nodes is large: for example, as shown in Fig. 8, 70 operations/s can cause about 320 MB/s of incoming and 320 MB/s of outgoing traffic. Thus, in the 1st configuration, it pays off to choose the HPC instance and its 10 GE networking to power the proxy nodes. We note that 5 Large-based storage nodes and 2 HPC-based proxy nodes deliver the best performance/cost in this configuration. Besides, each storage node needs 4 attached disks.
The reason is that reading large objects triggers a more intensive I/O load than reading small objects, and using only 2 disks per storage node tends to throttle the throughput of the cluster.

2) Large-Sized Swift Clusters: The results of the large-sized Swift clusters are presented in Fig. 9, and we can see that the linear scaling still holds for the three recommended

configurations.

Fig. 9: Workload D: large-sized Swift clusters (cost in $/hour versus the response time in ms of 80% of requests).

When the SLA requires the cluster to respond to 80% of the requests within 10,000 ms (shown as the dotted line in Fig. 9), we notice that the Swift cluster with 8 HPC-based proxy nodes and 20 Large-based storage nodes (highlighted by the circle in Fig. 9, and also the optimal solution found by an exhaustive search) produces the lowest cost while abiding by the SLA.

I. Best Practices from the Empirical Results

From our empirical results, we observe that the Swift Advisor is effective in recommending appropriate (near-optimal) Swift configurations for different workloads. When most of the operations in the workload upload small objects (Workload A), it is better to provision the storage nodes with high-end CPUs and to avoid the high-end (and expensive) 10 GE networking for the proxy nodes. However, when the object size becomes larger (Workload B), it is worth using 10 GE for the proxy nodes, and it is sufficient to provision the storage nodes with commodity hardware (e.g. low-end CPUs) to achieve high performance. For both upload-intensive workloads (A and B), the Swift Advisor recommends providing more I/O resources (more disks) for the storage nodes to maximize overall performance.

On the other hand, when the dominant operation in the workload downloads small objects (Workload C), the Swift Advisor does not provision the proxy nodes with 10 GE networking, because the workload consumes only a small amount of network bandwidth. However, when the object size becomes larger (Workload D), it pays off to power the proxy nodes with 10 GE, because transferring large objects incurs heavy network traffic through the proxy nodes. In addition, for both download-intensive workloads (C and D), using commodity hardware (e.g. low-end CPUs) for the storage nodes effectively reduces the cost while attaining high performance.

V. FAILURES IN A SWIFT CLUSTER

Cloud operators face periodic hardware and software failures. A Swift cluster can tolerate up to 2 zones (or, equivalently, 40% of the storage nodes) failing at the same time without losing any data [1]. However, if the objective of the cloud operators is to ensure that their clusters always perform above a certain level even in the presence of failures, they should consider several failure scenarios upfront and determine solutions that minimize the negative impact of such failures. In this section, the negative impact is interpreted as performance degradation, and we only consider failure scenarios that don't result in permanent loss of data. In the interest of space, we study two common types of failures: entire-storage-node failures and disk failures. We choose these two scenarios because the storage nodes and their disks are the major components of a Swift cluster and are usually provisioned with commodity, inexpensive hardware; hence, they have a higher probability of failing. In fact, disk failures are by far the most common type of failure in large clusters.

A. Experiment Setup

We use Workload B (Intensive Upload, Large Objects, as defined in Table I and discussed in Section IV-C), because it is the most popular workload that we have seen for cloud storage platforms today. This evaluation is done on a Swift cluster with 4 proxy nodes and 20 storage nodes, where each storage node has 8 attached disks. We still use EC2 instances to power the Swift cluster: the proxy nodes are always based on the HPC instance, while for the storage nodes we consider two different configurations.

When evaluating the entire-storage-node failure scenario, we compare two configurations for the storage nodes: the Large instance and the High-CPU instance. To simulate node failures at different scales, we randomly shut down 10%, 20%, 30% or 40% of the storage nodes across the Swift cluster. This experiment mimics events like power distribution unit failures, which usually take down several racks of servers at the same time.

When studying the disk failure scenario, we always use the Large instance for the storage nodes. To simulate disk failures at different scales, 10%, 20%, 30% or 40% of the disks are randomly shut down (by unmounting them) at the same time across all storage nodes. This scenario mimics what happens when the disks attached to the storage nodes are implemented by several external NAS (network-attached storage) devices: when one NAS device goes down, many disks vanish instantaneously from the storage nodes.
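The failure injection itself is simple to script. A sketch of the node-failure case, where the host names and the shutdown mechanism are illustrative assumptions:

```python
import random
import subprocess

def fail_storage_nodes(hosts, fraction):
    """Randomly power off a fraction of the storage nodes at once,
    mimicking a rack-level event such as a PDU failure."""
    victims = random.sample(hosts, int(len(hosts) * fraction))
    for host in victims:
        subprocess.run(["ssh", host, "sudo", "poweroff"], check=False)
    return victims

# e.g., take down 20% of a 20-node cluster, rerun the workload, and
# compare throughput against the no-failure baseline:
# failed = fail_storage_nodes([f"storage-{i:02d}" for i in range(20)], 0.20)
```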
B. Results for Entire-Storage-Node Failures

In Fig. 10, we compare the throughput of the cluster as the failure rate changes, with the storage nodes based on either the Large instance or the High-CPU instance. As shown in Fig. 10, when there are no failures, the two hardware configurations achieve the same throughput (shown as "no failure" in Fig. 10). However, when the storage nodes start to fail and the Large instance is used for the storage nodes (the white bars in Fig. 10), the throughput drops very quickly from the 10% to the 40% failure scenario. On the other hand, when the storage nodes are based on the High-CPU instance (the grey bars in Fig. 10), the throughput does not drop from the 10% through the 30% failure scenarios, and only starts to degrade at the 40% scenario.

To explain this observation, we monitored the CPU usage and network bandwidth on the unaffected storage nodes, shown in Figs. 11 and 12. As observed from Fig. 11, when the storage nodes are powered by the High-CPU instance, the CPU usage on the unaffected storage nodes keeps increasing as the failure rate goes up. However, when the storage nodes are based on the Large instance, the CPU usage on the unaffected storage nodes is always above 90%, indicating that the CPU resources are fully utilized even when there is no failure.

Fig. 10: Entire-storage-node failures: comparing the throughput (operations/sec) at different failure rates (no failure, 10%, 20%, 30%, 40%) for Large and High-CPU based storage nodes.

Fig. 11: CPU usage on the unaffected storage nodes.

Fig. 12: Network bandwidth (MB/s) on the unaffected storage nodes.

Fig. 13: Disk failures: comparing the throughput (operations/sec) at different disk failure rates.

As introduced in Table II, the High-CPU instance has 5X more CPU resources than the Large instance. When the storage nodes are backed by the High-CPU instance and some of them fail, the unaffected nodes quickly take over the workload originally assigned to the failed ones, and the extra CPU resources provided by the High-CPU instance can be used to minimize the performance degradation. However, if the storage nodes are provisioned with Large instances, there are no extra CPU resources to draw on when failures happen, and performance degrades rapidly.

The network bandwidth usage on the unaffected storage nodes, shown in Fig. 12, exhibits behavior similar to Fig. 11. When the High-CPU instance powers the storage nodes and some of them suddenly become unavailable, the network bandwidth of the unaffected nodes increases as more storage nodes go down, indicating that the unaffected storage nodes immediately pick up the workload originally assigned to the failed nodes.

Overall, we conclude that if cloud operators can foresee potential downtime of some storage nodes but still have to strictly ensure the performance SLA in the face of failures, they should consider provisioning more resources (e.g. CPU) for the storage nodes. In case some nodes fail, the extra resources on the unaffected storage nodes can be used to mitigate the performance degradation. In addition, we can easily extend the Swift Advisor to consider failures when recommending Swift configurations: for example, in Line 6 of Algorithm 1, we can intentionally shut down a certain fraction of the storage nodes to measure the degraded performance of the small-sized Swift cluster.

C. Results for Disk Failures

In Fig. 13, we compare the throughput of the cluster when different percentages of the disks in the cluster fail simultaneously. From this figure, we observe that as the disk failure rate increases from 10% to 40%, the performance degrades only slightly. To explain this observation, we monitored the utilization of the unaffected disks and noted that they get busier as more disks fail. Based on this observation, we infer that the I/O workload assigned to the failed disks is redistributed over the unaffected disks, which helps limit the performance degradation. In addition, we notice that, compared to the case where 40% of the storage nodes fail (the dark grey bar in Fig. 13), the Swift cluster with 40% failed disks has a much higher throughput, because all the storage nodes are still accessible and functional (even with some failed disks).

D. Best Practices from the Empirical Results

Based on the above results, we summarize our observations as follows. First, if cloud operators must guarantee the performance SLA even in the face of failures, it makes sense to provision more resources for each storage node to reduce the performance degradation when some nodes suddenly become unavailable.
Second, given a large enough number of disks per storage node, a certain scale of disk failure does not cause a large performance degradation.
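The first observation implies a simple sizing rule: to hold the SLA while a fraction f of the storage nodes is down, each node needs enough headroom for the survivors to absorb the full load. A minimal sketch of that arithmetic (the numbers are illustrative, not measurements from the paper):

```python
def required_per_node_capacity(peak_load_ops, n_nodes, f_failed):
    """Per-node capacity needed so that the surviving (1 - f) fraction of
    nodes can still carry the full peak load."""
    return peak_load_ops / (n_nodes * (1.0 - f_failed))

# 20 storage nodes serving 1,000 ops/s at peak, sized to survive a 30%
# outage: each node must handle ~71 ops/s instead of 50, i.e. roughly
# 43% spare capacity in normal operation.
print(required_per_node_capacity(1000, 20, 0.30))   # -> 71.4...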


More information

Performance Benchmark for Cloud Block Storage

Performance Benchmark for Cloud Block Storage Performance Benchmark for Cloud Block Storage J.R. Arredondo vjune2013 Contents Fundamentals of performance in block storage Description of the Performance Benchmark test Cost of performance comparison

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

References. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline

References. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline References Introduction to Database Systems CSE 444 Lecture 24: Databases as a Service YongChul Kwon Amazon SimpleDB Website Part of the Amazon Web services Google App Engine Datastore Website Part of

More information

Introduction to Database Systems CSE 444

Introduction to Database Systems CSE 444 Introduction to Database Systems CSE 444 Lecture 24: Databases as a Service YongChul Kwon References Amazon SimpleDB Website Part of the Amazon Web services Google App Engine Datastore Website Part of

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Tableau Server Scalability Explained

Tableau Server Scalability Explained Tableau Server Scalability Explained Author: Neelesh Kamkolkar Tableau Software July 2013 p2 Executive Summary In March 2013, we ran scalability tests to understand the scalability of Tableau 8.0. We wanted

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

DELL s Oracle Database Advisor

DELL s Oracle Database Advisor DELL s Oracle Database Advisor Underlying Methodology A Dell Technical White Paper Database Solutions Engineering By Roger Lopez Phani MV Dell Product Group January 2010 THIS WHITE PAPER IS FOR INFORMATIONAL

More information

PostgreSQL Performance Characteristics on Joyent and Amazon EC2

PostgreSQL Performance Characteristics on Joyent and Amazon EC2 OVERVIEW In today's big data world, high performance databases are not only required but are a major part of any critical business function. With the advent of mobile devices, users are consuming data

More information

OPTIMIZING SERVER VIRTUALIZATION

OPTIMIZING SERVER VIRTUALIZATION OPTIMIZING SERVER VIRTUALIZATION HP MULTI-PORT SERVER ADAPTERS BASED ON INTEL ETHERNET TECHNOLOGY As enterprise-class server infrastructures adopt virtualization to improve total cost of ownership (TCO)

More information

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers BASEL UNIVERSITY COMPUTER SCIENCE DEPARTMENT Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers Distributed Information Systems (CS341/HS2010) Report based on D.Kassman, T.Kraska,

More information

Cloud Computing For Bioinformatics

Cloud Computing For Bioinformatics Cloud Computing For Bioinformatics Cloud Computing: what is it? Cloud Computing is a distributed infrastructure where resources, software, and data are provided in an on-demand fashion. Cloud Computing

More information

GraySort on Apache Spark by Databricks

GraySort on Apache Spark by Databricks GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner

More information

Hadoop & its Usage at Facebook

Hadoop & its Usage at Facebook Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System dhruba@apache.org Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction

More information

Storage Systems Autumn 2009. Chapter 6: Distributed Hash Tables and their Applications André Brinkmann

Storage Systems Autumn 2009. Chapter 6: Distributed Hash Tables and their Applications André Brinkmann Storage Systems Autumn 2009 Chapter 6: Distributed Hash Tables and their Applications André Brinkmann Scaling RAID architectures Using traditional RAID architecture does not scale Adding news disk implies

More information

Figure 1. The cloud scales: Amazon EC2 growth [2].

Figure 1. The cloud scales: Amazon EC2 growth [2]. - Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 shinji10343@hotmail.com, kwang@cs.nctu.edu.tw Abstract One of the most important issues

More information

Comparison of Cloud vs. Tape Backup Performance and Costs with Oracle Database

Comparison of Cloud vs. Tape Backup Performance and Costs with Oracle Database JIOS, VOL. 35, NO. 1 (2011) SUBMITTED 02/11; ACCEPTED 06/11 UDC 004.75 Comparison of Cloud vs. Tape Backup Performance and Costs with Oracle Database University of Ljubljana Faculty of Computer and Information

More information

Preparing Your IT for the Holidays. A quick start guide to take your e-commerce to the Cloud

Preparing Your IT for the Holidays. A quick start guide to take your e-commerce to the Cloud Preparing Your IT for the Holidays A quick start guide to take your e-commerce to the Cloud September 2011 Preparing your IT for the Holidays: Contents Introduction E-Commerce Landscape...2 Introduction

More information

Everything you need to know about flash storage performance

Everything you need to know about flash storage performance Everything you need to know about flash storage performance The unique characteristics of flash make performance validation testing immensely challenging and critically important; follow these best practices

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Case Study - I. Industry: Social Networking Website Technology : J2EE AJAX, Spring, MySQL, Weblogic, Windows Server 2008.

Case Study - I. Industry: Social Networking Website Technology : J2EE AJAX, Spring, MySQL, Weblogic, Windows Server 2008. Case Study - I Industry: Social Networking Website Technology : J2EE AJAX, Spring, MySQL, Weblogic, Windows Server 2008 Challenges The scalability of the database servers to execute batch processes under

More information

Amazon Web Services Primer. William Strickland COP 6938 Fall 2012 University of Central Florida

Amazon Web Services Primer. William Strickland COP 6938 Fall 2012 University of Central Florida Amazon Web Services Primer William Strickland COP 6938 Fall 2012 University of Central Florida AWS Overview Amazon Web Services (AWS) is a collection of varying remote computing provided by Amazon.com.

More information

SwiftStack Global Cluster Deployment Guide

SwiftStack Global Cluster Deployment Guide OpenStack Swift SwiftStack Global Cluster Deployment Guide Table of Contents Planning Creating Regions Regions Connectivity Requirements Private Connectivity Bandwidth Sizing VPN Connectivity Proxy Read

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Mauro Fruet University of Trento - Italy 2011/12/19 Mauro Fruet (UniTN) Distributed File Systems 2011/12/19 1 / 39 Outline 1 Distributed File Systems 2 The Google File System (GFS)

More information

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How

More information

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT 1 SARIKA K B, 2 S SUBASREE 1 Department of Computer Science, Nehru College of Engineering and Research Centre, Thrissur, Kerala 2 Professor and Head,

More information

2. Research and Development on the Autonomic Operation. Control Infrastructure Technologies in the Cloud Computing Environment

2. Research and Development on the Autonomic Operation. Control Infrastructure Technologies in the Cloud Computing Environment R&D supporting future cloud computing infrastructure technologies Research and Development on Autonomic Operation Control Infrastructure Technologies in the Cloud Computing Environment DEMPO Hiroshi, KAMI

More information

AWS Storage: Minimizing Costs While Retaining Functionality

AWS Storage: Minimizing Costs While Retaining Functionality AWS Storage: Minimizing Costs While Retaining Functionality This whitepaper, the second in our Cost Series, discusses persistent storage with Amazon Web Services. It will focus upon Elastic Block Store

More information

Cloud Gateway. Agenda. Cloud concepts Gateway concepts My work. Monica Stebbins

Cloud Gateway. Agenda. Cloud concepts Gateway concepts My work. Monica Stebbins Approved for Public Release; Distribution Unlimited. Case Number 15 0196 Cloud Gateway Monica Stebbins Agenda 2 Cloud concepts Gateway concepts My work 3 Cloud concepts What is Cloud 4 Similar to hosted

More information

Amazon EC2 Product Details Page 1 of 5

Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of

More information

Parallels Cloud Storage

Parallels Cloud Storage Parallels Cloud Storage White Paper Best Practices for Configuring a Parallels Cloud Storage Cluster www.parallels.com Table of Contents Introduction... 3 How Parallels Cloud Storage Works... 3 Deploying

More information

Cloud Computing. Adam Barker

Cloud Computing. Adam Barker Cloud Computing Adam Barker 1 Overview Introduction to Cloud computing Enabling technologies Different types of cloud: IaaS, PaaS and SaaS Cloud terminology Interacting with a cloud: management consoles

More information

Service Description Cloud Storage Openstack Swift

Service Description Cloud Storage Openstack Swift Service Description Cloud Storage Openstack Swift Table of Contents Overview iomart Cloud Storage... 3 iomart Cloud Storage Features... 3 Technical Features... 3 Proxy... 3 Storage Servers... 4 Consistency

More information

How To Make A Backup System More Efficient

How To Make A Backup System More Efficient Identifying the Hidden Risk of Data De-duplication: How the HYDRAstor Solution Proactively Solves the Problem October, 2006 Introduction Data de-duplication has recently gained significant industry attention,

More information

Zadara Storage Cloud A whitepaper. @ZadaraStorage

Zadara Storage Cloud A whitepaper. @ZadaraStorage Zadara Storage Cloud A whitepaper @ZadaraStorage Zadara delivers two solutions to its customers: On- premises storage arrays Storage as a service from 31 locations globally (and counting) Some Zadara customers

More information

Apache Hadoop FileSystem and its Usage in Facebook

Apache Hadoop FileSystem and its Usage in Facebook Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs

More information

Parallels Cloud Server 6.0

Parallels Cloud Server 6.0 Parallels Cloud Server 6.0 Parallels Cloud Storage I/O Benchmarking Guide September 05, 2014 Copyright 1999-2014 Parallels IP Holdings GmbH and its affiliates. All rights reserved. Parallels IP Holdings

More information

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000 Alexandra Carpen-Amarie Diana Moise Bogdan Nicolae KerData Team, INRIA Outline

More information

The Cloud Hosting Revolution: Learn How to Cut Costs and Eliminate Downtime with GlowHost's Cloud Hosting Services

The Cloud Hosting Revolution: Learn How to Cut Costs and Eliminate Downtime with GlowHost's Cloud Hosting Services The Cloud Hosting Revolution: Learn How to Cut Costs and Eliminate Downtime with GlowHost's Cloud Hosting Services For years, companies have struggled to find an affordable and effective method of building

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics

More information

Data Backups in the Clouds

Data Backups in the Clouds ELEKTROTEHNIŠKI VESTNIK 78(3): 118-122, 2011 ENGLISH EDITION Data Backups in the Clouds Aljaž Zrnec University of Ljubljana, Faculty of Computer and Information Science, Trzaska 25, 1000 Ljubljana, Slovenia

More information

Accelerating Web-Based SQL Server Applications with SafePeak Plug and Play Dynamic Database Caching

Accelerating Web-Based SQL Server Applications with SafePeak Plug and Play Dynamic Database Caching Accelerating Web-Based SQL Server Applications with SafePeak Plug and Play Dynamic Database Caching A SafePeak Whitepaper February 2014 www.safepeak.com Copyright. SafePeak Technologies 2014 Contents Objective...

More information

G22.3250-001. Porcupine. Robert Grimm New York University

G22.3250-001. Porcupine. Robert Grimm New York University G22.3250-001 Porcupine Robert Grimm New York University Altogether Now: The Three Questions! What is the problem?! What is new or different?! What are the contributions and limitations? Porcupine from

More information

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

More information

The State of Cloud Storage

The State of Cloud Storage 203 Industry Report A Benchmark Comparison of Performance, Availability and Scalability Executive Summary In the last year, Cloud Storage Providers (CSPs) delivered over an exabyte of data under contract.

More information

Intro to AWS: Storage Services

Intro to AWS: Storage Services Intro to AWS: Storage Services Matt McClean, AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved AWS storage options Scalable object storage Inexpensive archive

More information

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation Top Ten Questions to Ask Your Primary Storage Provider About Their Data Efficiency May 2014 Copyright 2014 Permabit Technology Corporation Introduction The value of data efficiency technologies, namely

More information

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage Volume 2, No.4, July August 2013 International Journal of Information Systems and Computer Sciences ISSN 2319 7595 Tejaswini S L Jayanthy et al., Available International Online Journal at http://warse.org/pdfs/ijiscs03242013.pdf

More information

The Effect of Priorities on LUN Management Operations

The Effect of Priorities on LUN Management Operations Abstract This white paper describes the effect of each of the four Priorities (ASAP, High, Medium, and Low) on overall EMC CLARiiON performance in executing. The LUN Management Operations are migrate,

More information

DataStax Enterprise, powered by Apache Cassandra (TM)

DataStax Enterprise, powered by Apache Cassandra (TM) PerfAccel (TM) Performance Benchmark on Amazon: DataStax Enterprise, powered by Apache Cassandra (TM) Disclaimer: All of the documentation provided in this document, is copyright Datagres Technologies

More information

Windows Server Performance Monitoring

Windows Server Performance Monitoring Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly

More information

The State of Cloud Storage 2013 Industry Report

The State of Cloud Storage 2013 Industry Report 2013 Industry Report A Benchmark Comparison of Performance, Availability and Scalability www.nasuni.com Executive Summary In the last year, Cloud Storage Providers (CSPs) delivered over an exabyte of data

More information

Scaling in the Cloud with AWS. By: Eli White (CTO & Co-Founder @ mojolive) eliw.com - @eliw - mojolive.com

Scaling in the Cloud with AWS. By: Eli White (CTO & Co-Founder @ mojolive) eliw.com - @eliw - mojolive.com Scaling in the Cloud with AWS By: Eli White (CTO & Co-Founder @ mojolive) eliw.com - @eliw - mojolive.com Welcome! Why is this guy talking to us? Please ask questions! 2 What is Scaling anyway? Enabling

More information

Is Hyperconverged Cost-Competitive with the Cloud?

Is Hyperconverged Cost-Competitive with the Cloud? Economic Insight Paper Is Hyperconverged Cost-Competitive with the Cloud? An Evaluator Group TCO Analysis Comparing AWS and SimpliVity By Eric Slack, Sr. Analyst January 2016 Enabling you to make the best

More information

Multi-Datacenter Replication

Multi-Datacenter Replication www.basho.com Multi-Datacenter Replication A Technical Overview & Use Cases Table of Contents Table of Contents... 1 Introduction... 1 How It Works... 1 Default Mode...1 Advanced Mode...2 Architectural

More information

SQL Server Consolidation Using Cisco Unified Computing System and Microsoft Hyper-V

SQL Server Consolidation Using Cisco Unified Computing System and Microsoft Hyper-V SQL Server Consolidation Using Cisco Unified Computing System and Microsoft Hyper-V White Paper July 2011 Contents Executive Summary... 3 Introduction... 3 Audience and Scope... 4 Today s Challenges...

More information

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

1. Comments on reviews a. Need to avoid just summarizing web page asks you for: 1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of

More information

TECHNICAL WHITE PAPER: ELASTIC CLOUD STORAGE SOFTWARE ARCHITECTURE

TECHNICAL WHITE PAPER: ELASTIC CLOUD STORAGE SOFTWARE ARCHITECTURE TECHNICAL WHITE PAPER: ELASTIC CLOUD STORAGE SOFTWARE ARCHITECTURE Deploy a modern hyperscale storage platform on commodity infrastructure ABSTRACT This document provides a detailed overview of the EMC

More information

Hardware Configuration Guide

Hardware Configuration Guide Hardware Configuration Guide Contents Contents... 1 Annotation... 1 Factors to consider... 2 Machine Count... 2 Data Size... 2 Data Size Total... 2 Daily Backup Data Size... 2 Unique Data Percentage...

More information

Designing a Cloud Storage System

Designing a Cloud Storage System Designing a Cloud Storage System End to End Cloud Storage When designing a cloud storage system, there is value in decoupling the system s archival capacity (its ability to persistently store large volumes

More information

AMAZON S3: ARCHITECTING FOR RESILIENCY IN THE FACE OF FAILURES Jason McHugh

AMAZON S3: ARCHITECTING FOR RESILIENCY IN THE FACE OF FAILURES Jason McHugh AMAZON S3: ARCHITECTING FOR RESILIENCY IN THE FACE OF FAILURES Jason McHugh CAN YOUR S ERVICE S URVIVE? CAN YOUR S ERVICE S URVIVE? CAN YOUR SERVICE SURVIVE? Datacenter loss of connectivity Flood Tornado

More information

Recommendations for Performance Benchmarking

Recommendations for Performance Benchmarking Recommendations for Performance Benchmarking Shikhar Puri Abstract Performance benchmarking of applications is increasingly becoming essential before deployment. This paper covers recommendations and best

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

How to Choose your Red Hat Enterprise Linux Filesystem

How to Choose your Red Hat Enterprise Linux Filesystem How to Choose your Red Hat Enterprise Linux Filesystem EXECUTIVE SUMMARY Choosing the Red Hat Enterprise Linux filesystem that is appropriate for your application is often a non-trivial decision due to

More information

An Esri White Paper January 2011 Estimating the Cost of a GIS in the Amazon Cloud

An Esri White Paper January 2011 Estimating the Cost of a GIS in the Amazon Cloud An Esri White Paper January 2011 Estimating the Cost of a GIS in the Amazon Cloud Esri, 380 New York St., Redlands, CA 92373-8100 USA TEL 909-793-2853 FAX 909-793-5953 E-MAIL info@esri.com WEB esri.com

More information

Tableau Server 7.0 scalability

Tableau Server 7.0 scalability Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different

More information

Introduction to AWS Economics

Introduction to AWS Economics Introduction to AWS Economics Reducing Costs and Complexity May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes

More information

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform

The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing Platform Fong-Hao Liu, Ya-Ruei Liou, Hsiang-Fu Lo, Ko-Chin Chang, and Wei-Tsong Lee Abstract Virtualization platform solutions

More information