Tape Cloud: Scalable and Cost Efficient Big Data Infrastructure for Cloud Computing


Varun S. Prakash, Yuanfeng Wen, Weidong Shi
Department of Computer Science, University of Houston, Houston, TX, U.S.A.

Abstract: Magnetic tapes have long been a primary medium of backup storage in many organizations. In this paper, we evaluate the possibility of establishing an inter-network accessible, centralized, tape based data backup facility. Our motive is to develop a cloud storage service that organizations can use for long term storage of big data, which is typically Write-Once-Read-Many. This Infrastructure-as-a-Service (IaaS) cloud can provide the much needed cost effectiveness in storing huge amounts of data, exempting client organizations from high infrastructure investments. We attempt to understand some of the limitations induced by the use of tapes by studying the latency of tape libraries in scenarios most likely faced during backup, in comparison to their hard disk counterparts. The result of this study is an outline of methods to overcome these limitations by adopting novel tape storage architectures, a filesystem, and schedulers that manage data transaction requests from various clients, and by developing faster ways to retrieve requested data so that applications can extend beyond backup. We use commercially available tapes and a tape library to perform latency tests and understand the basic operations of tape. With the optimistic backing of statistics that suggest the continued extensive usage of tapes today and in the future, we propose an architecture to provide data backup to a large and diverse client base.

I. INTRODUCTION

The last decade has seen an explosion of data generated by individuals and organizations. For instance, the video captured by a single HD surveillance camera recording at fps for days requires TB of storage space []. The number of CCTV cameras in the UK alone is estimated to be in the millions []. Organizations consider more than one factor before investing in a certain type of storage infrastructure [] []: (1) longevity of the data, or the intended period for which the data needs to be stored or backed up; (2) durability of the storage media, which should have low susceptibility to physical damage and tolerate a wide range of environmental conditions without data loss; (3) obsolescence of the storage technology or, inversely, the technology's ability to be easily updated as newer solutions become available; (4) cost, or the overall expense of ownership, including the cost of purchasing and maintaining the necessary hardware, software, and media; (5) capacity, or the overall amount of data that needs to be stored or backed up for the organization's future use; and (6) data criticalness, or the importance of the data that needs to be stored []. Based on these factors, organizations requiring data storage solutions can either run an in-house data backup facility or rely on a service provider to carry out the task efficiently and economically []. However, there are many intermediate considerations that both players need to weigh. The choice of storage media cannot be made on capacity alone because, although capacity is a nearly irrefutable factor, it is closely tied to overall cost. Storage service providers and organizations are, in most cases, forced to decide on a tradeoff between smaller, high speed, expensive storage media and larger, low speed, inexpensive storage media [].
A high initial investment in operating hardware and software may force smaller organizations to discard the option even though the cost of the media itself is very economical. Similarly, high media costs can create scalability issues for organizations that need storage expansion on a regular basis, even in a nearly automated environment. Magnetic tapes, which started off as a primary storage medium decades ago, have long been preferred for backing up the data generated by organizations. There has been continuous development in the quality, form factor, capacity, and robustness of the storage cartridges []. Tape also continues to be a very economical type of storage media. The data stored on tapes has a property that justifies this choice: it may or may not be accessed in the near future. For instance, studies of enterprise file servers show that over a three month period, more than percent of the data on the servers was never accessed []. Tapes serve their purpose well in situations where information is lost due to natural calamity, human error, or system failure. For this reason, trained personnel are hired to maintain and service tape drives and tape libraries []. Despite the advantages of tapes, their usage has not increased, because the high initial investment in operating hardware and software forces smaller organizations to discard the option even though the media itself is very economical []. This defeats the very purpose of affordable backup and limits the storage bandwidth significantly. Another important reason for the flat rate of increase in tape usage is its inability to promise high data rate transactions. Because tapes are linear access media, processes waiting for tape input/output can stay idle for long periods, which weighs against using tapes for data that needs a more volatile storage environment than backup or archival data.

The main contributions of our work are as follows:
- We propose and evaluate, for the first time in the literature, a new tape based big data storage model that allows organizations to benefit from a large storage bandwidth without having to invest in the infrastructure itself. The crux of this approach lies in optimizing the usage of the most affordable media. This allows storage service providers to focus on the technicalities, such as using affordable media and the infrastructure and manpower required to handle it, and permits client organizations to direct monetary expenditure away from the fundamentals of data backup and storage.
- We present the design of a cloud based storage framework, provided as a service, which implements the middleware needed to seamlessly integrate large scale multi-user tape libraries with the cloud model. The tape cloud is designed to work autonomously or in conjunction with current cloud infrastructure.
- We attempt to optimize the performance of the tape cloud with a multi-level hybrid storage model in which hard disks provide cache space for data to be stored in or retrieved from the tape libraries (one LTO tape library can provide a storage capacity from TB to more than PB), and we report the observations.
- We devise solutions to support efficient multi-user access to the tape cloud using optimized hardware configurations and combinations, and high level scheduling approaches, which are evaluated for their performance.

Fig. Overview of Tape Cloud: a client side interface (optionally reached through services such as Dropbox or Google Cloud), a disk based buffer with a periodic backup engine, and tape based storage for backup and archive. Tape Cloud is a cloud storage service that uses magnetic tapes as the main storage media to store unstructured and big data, unlike most of the commercial cloud storage solutions available today.

The proposed tape cloud framework points to a new direction for creating service oriented, cost effective, massive scale infrastructure to meet the growing storage challenge in the coming era of big data enabled industries and research. The paper is organized as follows. A study of the economic impact of tapes is given in section II. The system design of the tape cloud, along with operational details, is provided in section III. We propose models for testing hypotheses on tape I/O performance, the details of which are discussed in section IV and the results of which are presented in section V. A note on work related to our research is given in section VI, and the conclusion and our future plans are given in section VII.

II. BACKGROUND

Magnetic tapes have been in active use for decades, a period that is comparatively long for a storage technology in the growing IT age. This has been possible due to continuous improvement in the quality, capacity, durability, and areas of application of tapes. The dominant motivation to use tapes, however, arises from the fact that tapes are highly economical and ideal for storing large amounts of data that may or may not be useful to the organization in the near future. Linear Tape-Open (LTO) is a set of standards that directs development and manages licensing and certification of media and mechanism manufacturers. The standard form factor of LTO technology goes by the name Ultrium; the original version could hold GB of data in a cartridge, while a recent LTO version can hold TB in a cartridge of the same size as its predecessors. Another very important storage medium that has been around for a long time is the hard disk drive.
The similarity between hard disks and tapes ends at the principle of using magnetic material to store data; hard disks rose to prominence as a better option for data that needed faster retrieval. The cost of operating hard disks has been an attractive tradeoff against the cost of the media itself. This has also proved to be the tape killer, since the hardware required to operate tapes and the personnel required to maintain them cost unreasonably much. A comparative study shows that the initial investment involved creates an obstruction for many small organizations wanting tape backup solutions despite tape's competitive cost advantage per unit of media; the results are shown in Table I.

TABLE I. Comparison of the cost of storing unit data on different storage media (solid state, hard disk, and tape), broken down into media, capital, maintenance, facilities, and personnel costs, in $/TB/yr and as a percentage of the total.

Another contributor to cost is the overall energy consumption of the storage media and the associated infrastructure. Power inefficiencies in storage systems can arise at two stages: when the system uses a lot of energy even in the idle state and when the operating temperatures of the system create the need for external conditioning, as shown in the figures below. A study performed with a tape drive and six hard disk drives of different specifications and manufacturers reveals that tape is much more power efficient than hard disk when used at large scale, such as in data centers. The fact that hard disks need to be electrically powered to operate is a mammoth disadvantage compared to tapes, which are non-powered static storage with very low power consumption per unit of data [].

Fig. Operating temperatures (in centigrade) over time of a tape drive and of hard disk drives from Hitachi, IBM, Fujitsu, and Seagate.

Fig. Unused state (idle and standby) energy consumption, in joules, of the same storage media.

The next figure shows the relative cost of acquiring the two kinds of media compared to the operational cost of the power required to use them.

Fig. Tape and disk: acquisition and energy cost, from an INSIC study []. The five year energy cost of disks is comparable with the acquisition cost of tapes of equal storage capacity.
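As a rough illustration of this comparison, the sketch below computes the multi-year electricity cost of keeping a set of disks powered against the acquisition cost of tape cartridges of equal capacity. Every number in it is an illustrative assumption for the sake of the arithmetic, not a value taken from Table I or from the INSIC study.

import math

def energy_cost_usd(avg_power_w, years, usd_per_kwh):
    # Electricity cost of keeping one device powered continuously for `years`.
    hours = years * 365 * 24
    return avg_power_w / 1000.0 * hours * usd_per_kwh

def units_needed(total_tb, tb_per_unit):
    return math.ceil(total_tb / tb_per_unit)

total_tb = 500                                   # assumed archive size
disks = units_needed(total_tb, 4)                # assumed 4 TB per disk
cartridges = units_needed(total_tb, 2.5)         # assumed 2.5 TB per cartridge

disk_energy = disks * energy_cost_usd(avg_power_w=8.0, years=5, usd_per_kwh=0.12)
tape_acquisition = cartridges * 30.0             # assumed $30 per cartridge
print(f"5-year disk energy cost:    ${disk_energy:,.0f}")
print(f"tape cartridge acquisition: ${tape_acquisition:,.0f}")

With these assumed values, the five-year energy bill of the disks lands in the same range as the purchase price of the cartridges, which is the shape of the comparison the figure above makes.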

From the perspective of the storage infrastructure service provider, scalability becomes very expensive [] [] []. The infrastructure needed to use tapes, which in most cases is the tape library, introduces a new kind of expense in the form of delays that must be incurred because of the robotics inside the device. These delays can slow down I/O processes and thereby affect the business processes of the organization []. We study some of these device specific delays. We conducted experiments on commercial LTO tape libraries (e.g., the Tandberg tape library []). The figure below gives a diagrammatic representation of a simple tape library (the number of tape slots is a parameter that varies among tape library products). The numbered slots are tape cartridge holders; the robotic cart runs on a rail in front of the tape drive and loads tapes into the drive.

Fig. Representation of the Tandberg T library: cartridge slots, the drive read/write head, the robotic cart and cart rail, and the controller, with a single LTO tape cartridge shown at the bottom right.

To complete our analysis, we made multi-trial recordings of the delay of the various operations performed within the library. From these we can understand the basic principles of operation and build a time profile of the operations, which helps in creating faster and more efficient hardware. Table II in section V shows the delay incurred in moving tape cartridges from the numbered slots to the drive and back to the slots after the operation completes. The average time taken to transport cartridges in both cases is more than a minute. Once the tape is in place, it takes nearly seconds to load and be ready to read or write. The tape transportation cart has an upward path time of seconds and a total end-to-end path time of seconds. The robot usually performs both together during a load or unload operation from a slot at the very end of the library, so time can be saved. Based on these numbers, we get a clear time profile of the tape library operations. One of our design objectives is to reduce the in-library time spent on tasks such as moving tapes from the slots to the drive.
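To make the notion of a time profile concrete, the sketch below aggregates multi-trial load and unload measurements into per-operation averages. The slot names and timings are placeholders chosen for the example, not the values measured on the Tandberg library.

from collections import defaultdict
from statistics import mean

# Each trial records (operation, slot, motion_seconds, load_seconds).
# The values below are placeholders; real trials come from the robot's log.
trials = [
    ("LOAD",   "slot-1", 42.0, 19.0),
    ("LOAD",   "slot-8", 55.0, 19.5),
    ("UNLOAD", "slot-1", 40.5, 18.0),
    ("UNLOAD", "slot-8", 53.0, 18.5),
]

profile = defaultdict(list)
for op, slot, motion_s, load_s in trials:
    profile[op].append(motion_s + load_s)

# The resulting time profile drives scheduling decisions: the longer an
# operation takes on average, the harder the scheduler should try to avoid it.
for op, totals in profile.items():
    print(f"{op}: {mean(totals):.1f} s average end-to-end, over {len(totals)} trials")
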
III. SYSTEM DESIGN

The implementation architecture figure in this section gives a bird's eye view of the tape cloud architecture. Our effort to create cloud storage with a medium that is known to sit at the bottom of the performance ladder calls for alterations to the hardware assembly and design. To get the best results, we design the hardware that cases the tapes, along with special provisions for multiple reader/rewinder sets, aiming to keep the setup cost effective. Although unit operations such as writing data onto tape remain similar, we consider some augmentative enhancements specific to our case. The software of the tape cloud can effectively be termed a middleware that operates between faster yet smaller hard disk buffers and comparatively slower yet larger tape backend storage. This middleware functions as an agent arbitrating among the various components in order to reduce the overhead caused by using a slower backend medium []. It also performs various other tasks such as data set segmentation, scheduling, encryption, load balancing, and management of the database containing the metadata and block IDs of the data stored on tapes. The middleware sits between the distributed file system and the tape storage: it helps the distributed file system overcome the latencies of a slower medium and also acts as an abstraction layer in clouds that use hybrid storage infrastructure.

A. An Abstraction for Library Hardware

To make the workflow seamless and application independent, special considerations apply to the hardware setup and configuration. The massive scale we target encompasses multiple drives, high speed robotics that seek, grab, and load tapes into drives, and hazard resistant tape storage space. To make data retrieval and deposition more efficient, we consider splitting a conventional drive into two parts. One part rewinds the subsequently scheduled tapes to the correct position, and the other part reads from and writes to tapes (the read/write head also retains rewinding functionality for small, quick seeks, so this amounts to adding a rewinder to the system). This way we can pipeline the rewinding process and isolate the rewind latency from the overall read operation.
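The split between a rewinder unit and a read/write unit can be pictured as a two-stage pipeline: while the reader services the current tape, the rewinder pre-positions the next scheduled tape. The sketch below illustrates that overlap with two worker threads and a hand-off queue; the timing constants and function names are our own illustrative assumptions, not part of any tape library API.

import queue, threading, time

# Hypothetical per-tape work: the rewinder positions a tape, the reader then
# transfers its data. Durations are illustrative placeholders (in seconds).
REWIND_S, TRANSFER_S = 2.0, 3.0

positioned = queue.Queue(maxsize=1)   # hand-off between the two pipeline stages

def rewinder(tape_ids):
    for tape in tape_ids:
        time.sleep(REWIND_S)          # pre-position (seek/wind) the next tape
        positioned.put(tape)
    positioned.put(None)              # sentinel: no more tapes

def reader():
    while True:
        tape = positioned.get()
        if tape is None:
            break
        time.sleep(TRANSFER_S)        # read/write while the rewinder works ahead
        print(f"finished transfer on {tape}")

tapes = ["tape-A", "tape-B", "tape-C"]
t0 = time.time()
threads = [threading.Thread(target=rewinder, args=(tapes,)),
           threading.Thread(target=reader)]
for t in threads: t.start()
for t in threads: t.join()
# Pipelined total is roughly REWIND_S + len(tapes) * TRANSFER_S, instead of
# len(tapes) * (REWIND_S + TRANSFER_S) when the two stages run back to back.
print(f"elapsed: {time.time() - t0:.1f} s")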

Fig. Implementation architecture of Tape Cloud: clients reach PUT-Collection and GET-Collection servers (obtaining client data over the network or through shipped media, under blocking and encryption policies), a load balancing server with a metaserver/memcache and a central block database coordinates them, and Tape Interface Machines with a storage block database and I/O scheduling policies stand in front of the tape library storage. The arrows represent the direction of data flow.

Fig. Stages and functions of each stage of the filesystem for Tape Cloud. Although distributed by functionality, the filesystem is monolithic across the storage system.

B. Multi Tier File System

FUSE [] is a framework that helps developers build customized file systems. The FUSE module has been officially merged into the mainline Linux kernel tree, and FUSE provides interfaces that fully comply with POSIX file operations. We design a file system using FUSE that is used at different tiers of the architecture. The file system is monolithic but logically distributed and staged by functionality, as shown in the figure above; the block representation below shows the filesystem as an important part of the middleware. The collection servers (the PUT-Collection servers and GET-Collection servers) implement modules that acquire the client data to be written to or retrieved from tapes. Based on client specific policies, the data to be stored on tapes is encrypted and segmented, and each segment is identified and accounted for in local databases. Similarly, to retrieve data from tapes, the filesystem queries the local databases, requests the corresponding blocks from the tapes, and reassembles them into the pristine data. The Tape Interface Machines (TIM) are networked to the collection servers, and blocks of data are sent and received over high speed connections. The load balancing server manages a workload based scheduling system in order to distribute data efficiently to the various TIMs and avoid I/O bottlenecks. The filesystem is highly customizable in the sense that client data can be blocked and stored according to client preferences. For example, video surveillance data must be handled differently if the client wants to retrieve it by the hour than if it retrieves it by the day, so the collection servers are responsible for blocking the data in a manner that makes the requested granularity easy to retrieve.

Fig. Block representation of the middleware by roles and responsibilities within the system: a data stream receiver and sender behind the host adapter, a disk buffer and storage manager holding compressed blocks, read and write request queues feeding a realtime scheduler, and file object lookup and index structures kept in embedded SQLite databases, in front of the tape robots.

C. Load Balancing

The duties of the load balancing server are by far the most critical in the system. It provides services to all other entities and ensures that data does not clog at any point in the network. First, it acts as the contact point for all clients who wish to store or retrieve data from the tapes. After a client is authorized, the load balancing server requests a memory allocation on the collection servers where the user data is going to be collected. It also creates and manages database entries for the blocks that are created, and it manages redundancy to avoid data loss and increase availability. The load balancing servers connect and talk directly with the metaserver and the central database, which contain information about the load on the Tape Interface Machines and the currently serviced clients.
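A minimal sketch of the load balancer's two core decisions is given below: granting an upload session by reserving space on a collection server, and routing a block to the Tape Interface Machine with the shallowest I/O queue. The class and field names are illustrative assumptions made for the sketch and do not correspond to a released implementation.

from dataclasses import dataclass, field

@dataclass
class CollectionServer:
    name: str
    free_bytes: int

@dataclass
class TapeInterfaceMachine:
    name: str
    queue_depth: int = 0          # outstanding read/write requests

@dataclass
class LoadBalancer:
    collectors: list
    tims: list
    block_index: dict = field(default_factory=dict)   # block id -> TIM name

    def open_session(self, client_id, size_bytes):
        # Reserve space on the first collection server that can hold the upload.
        for c in self.collectors:
            if c.free_bytes >= size_bytes:
                c.free_bytes -= size_bytes
                return {"client": client_id, "collector": c.name, "bytes": size_bytes}
        raise RuntimeError("no collection server has enough free space")

    def route_block(self, block_id):
        # Send the block to the TIM with the fewest outstanding requests,
        # so writes do not pile up behind a single drive.
        tim = min(self.tims, key=lambda t: t.queue_depth)
        tim.queue_depth += 1
        self.block_index[block_id] = tim.name
        return tim.name

lb = LoadBalancer(
    collectors=[CollectionServer("put-1", 10 * 2**30)],
    tims=[TapeInterfaceMachine("tim-1"), TapeInterfaceMachine("tim-2")],
)
session = lb.open_session("client-42", 2 * 2**30)
print(session, lb.route_block("blk-0001"), lb.route_block("blk-0002"))
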
D. Data Handling

a) Data Acquisition: The system collects data from users in two main ways. Data sets of smaller sizes, which can be uploaded directly over the internet, are stored on the collection servers. The client side runs an application that requests a data upload session from the load balancing server, which then allocates the required amount of space on a collection server and directs the client to upload its data to the specified location. Another data acquisition protocol is designed for very large data sets, where the data is received in the form of the storage media units themselves. In this case, the collection servers provide a docking interface through which the media can be mounted directly on the servers, and the extracted data is treated in the same way as uploaded data.
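To illustrate what blocking of client data at a PUT-Collection server could look like, the sketch below splits an uploaded byte stream into fixed-size blocks, applies a trivial XOR placeholder in place of whatever client-specific cipher the policy selects, and records block IDs in a local SQLite table. Block size, table layout, and the placeholder cipher are all assumptions made for the example.

import hashlib, sqlite3

BLOCK_SIZE = 4 * 1024 * 1024          # assumed block size for the sketch

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Placeholder cipher; a real deployment would apply the client's policy cipher.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def ingest(stream: bytes, client_id: str, key: bytes, db: sqlite3.Connection):
    # Split client data into blocks, encrypt them, and record them locally.
    db.execute("CREATE TABLE IF NOT EXISTS blocks "
               "(block_id TEXT PRIMARY KEY, client TEXT, seq INTEGER, size INTEGER)")
    for offset in range(0, len(stream), BLOCK_SIZE):
        chunk = toy_encrypt(stream[offset:offset + BLOCK_SIZE], key)
        block_id = hashlib.sha256(chunk).hexdigest()[:16]
        db.execute("INSERT OR REPLACE INTO blocks VALUES (?, ?, ?, ?)",
                   (block_id, client_id, offset // BLOCK_SIZE, len(chunk)))
        yield block_id, chunk          # each chunk is handed to a TIM for writing

db = sqlite3.connect(":memory:")
for block_id, chunk in ingest(b"x" * (9 * 1024 * 1024), "client-42", b"secret", db):
    print(block_id, len(chunk))
db.commit()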

Fig. Logical diagram of a tape cloud node showing the relationship among the functional units within the FUSE enabled tape cloud infrastructure: the FUSE kernel module and DFS interface with mounted-path metadata, the storage manager and disk buffer, the realtime scheduler with the filesystem index and metadata in an embedded database, and the ports to the tape library drives.

Fig. The Closest Process First schedule: read/write head movements before scheduling (top) and after scheduling (bottom).

b) Data Distribution to Tape Interface Machines (TIM): Once the data has been segmented at the collection servers, it is distributed to the TIM computers. Data is sent to a TIM based on its current I/O queue depth; overloading a single TIM with more write requests than necessary causes long read latencies as requests pile up at that machine. A record, or map, of these segments is made and stored in the metaserver.

c) Distributed Write/Read: Like any other cloud service, the architecture is complex, with a large number of tape drives. To increase I/O bandwidth, parallel reading and writing of data across multiple tapes can be envisioned. This consideration, however, needs to be analyzed from more than one perspective. The larger the number of subdivisions, the larger the latency induced by seeking for the data; yet a huge single read or write to tape increases the waiting time of other requests. Many task scheduling algorithms have been designed [] to improve tape I/O performance and better utilize drive resources, but little has been done with the latest LTO tape drives. Based on the latency test results of the tape device, we see that the seek time of the robotics takes considerably longer than reading or writing the data itself. To improve time efficiency, we schedule the I/O requests such that each request spends the least amount of time performing seek operations and longer periods reading or writing.

E. Closest Process First Scheduling

From the latency analysis, we can see that the largest time consumption comes from the seek, grab, and load-into-drive operations. To make the system time efficient, we need to let it spend most of its time reading from or writing to the tapes. For this reason, we schedule read and write operations so that the next operation to be performed is on a tape that is either the same tape or the one closest to the current tape. The scheduling algorithm is applied at the Tape Interface Machines after a TIM obtains data from the collection servers or receives a read request from the load balancer. The figure above shows how the scheduling is done at the TIMs. We evaluate our design and latency model with a scheduler called Closest Process First that does exactly this. The algorithm is shown below.
Algorithm 1 Closest Process First Scheduler
Input: m tapes: Γ_t = {tape_1, tape_2, ..., tape_m}; n requests: Γ_r = {request_1, request_2, ..., request_n}
Output: Drive/rewinder seek path
1: New request req_x
2: Location value of req_x: L_x = Infinity
3: Record the current location of the drive/rewinder, L_c
4: while true do
5:   wait for a request
6:   if a new request req_x arrives then
7:     get the location L_x of the operation to be performed
8:     for i = 1 to n do
9:       if L_x is closer to L_c than L_i then
10:        place req_x before request i in the schedule queue
11:      else
12:        move to the next request, L_{i+1}
13:      end if
14:    end for
15:  end if
16: end while
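A compact executable rendering of the Closest Process First idea is sketched below: pending requests are kept in a list, and the next one serviced is the request whose tape is closest to the drive's current position. Slot positions are modeled as plain integers; this is our reading of the pseudocode above, not a drop-in scheduler for a real library.

from dataclasses import dataclass

@dataclass
class Request:
    req_id: str
    slot: int                 # slot index of the tape the request needs

class ClosestProcessFirst:
    # Serve next the request whose tape slot is nearest the drive's position.

    def __init__(self, drive_slot: int = 0):
        self.drive_slot = drive_slot
        self.pending = []

    def submit(self, req: Request):
        self.pending.append(req)

    def next_request(self):
        if not self.pending:
            return None
        req = min(self.pending, key=lambda r: abs(r.slot - self.drive_slot))
        self.pending.remove(req)
        self.drive_slot = req.slot          # the drive moves to that tape
        return req

sched = ClosestProcessFirst(drive_slot=3)
for r in [Request("r1", 10), Request("r2", 4), Request("r3", 1)]:
    sched.submit(r)
while (nxt := sched.next_request()) is not None:
    print(nxt.req_id, "-> slot", nxt.slot)
# Expected order starting from slot 3: r2 (slot 4), r3 (slot 1), r1 (slot 10).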

IV. SYSTEM MODELS

In this section, we describe several models for analyzing system response time. The assumptions and notation for the models are as follows.

T_seek(i) is the time to move the tape reader to the correct tape for the i-th request and load the tape into the reader.

The data a user requests is loaded in blocks. Once a block is loaded, the user can start processing it, for example by applying video analysis to the loaded block of video, while the next block is loaded simultaneously. The size of each block is chosen to minimize the average response time and is denoted BLK.

T_wind(i, j) is the time to wind the tape fast forward to the first byte of the j-th block to be read for the i-th request. T_wind(i) is the average time to wind the tape fast forward to the first byte read for the i-th request.

T_transfer(i, j) is the time to read the j-th block from the tape for the i-th request. T_transfer(i) is the average time to read a block from the tape for the i-th request.

R_transfer is the data transfer rate of the tape; T_transfer = Size_transfer / R_transfer. R_transfer varies from MB/s to MB/s [].

R_consume(i) is the data consumption rate of the i-th request, which describes how fast the user can finish processing the data they requested.

A. Response Time Model

The response time of a user request is defined as the length of the period starting from when the request is released and ending at the point when all the data has been fully consumed by the user. For example, for a single user requesting GB of data, the response time follows directly from the corresponding T_seek, T_wind, T_transfer, and T_consume values. If multiple requests are queued and each is processed, in order of arrival, only after the previous request completes, the average response time becomes very long. As discussed before, the data consumption rate is usually much smaller than the data transfer rate; therefore, it is beneficial to read data partially in order to reduce the average response time of each request.

B. System I/O Model

Based on the experiments conducted on the tape library, the I/O time can be analyzed as the sum of three major components: T_I/O = T_seek + T_wind + T_transfer. T_seek is proportional to the distance between the tape drive and the location, in the library, of the tape that the robot has to fetch. The seeking process is very time consuming, so we have to minimize seek time by reducing both the number of seeks and the time spent moving the tape reader. For the first purpose, the system chooses a large block size; for the second, the system has to use an optimal schedule for processing the requests.

Block Size: The data stored on tape is logically organized as blocks, and a block is the minimum unit when reading from or writing to the tapes. The block size can be neither too large nor too small. If it is too large, it blocks the following requests or makes them wait much longer; if it is too small, frequent seeking between I/Os degrades performance. Generally, the block size is set so that the transfer time of one block is larger (or much larger) than the average seek time. Here is an example of how the block size is determined. Assume there are two requests in the queue, each asking to load D GB of data. Since the system uses a block size of BLK, after the j-th block of the first request is loaded from tape to the hard drive, it takes BLK/R_consume units of time to process. During this period, the reader can switch to the other request and load its k-th block, which takes T_seek(2) + T_wind(2, k) + T_transfer(2, k). The reader then switches back and loads the (j+1)-th block for the first request, which takes T_seek(1) + T_wind(1, j+1) + T_transfer(1, j+1). Ideally, if BLK/R_consume >= T_seek(2) + T_wind(2, k) + T_transfer(2, k) + T_seek(1) + T_wind(1, j+1) + T_transfer(1, j+1), the first request will not even notice that its data loading process has been interrupted. Generally, if there are w requests waiting in the queue, then as long as BLK/R_consume(i) >= Σ_{p=1}^{w} (T_seek(p) + T_wind(p) + T_transfer(p)), the i-th request will be processed smoothly without noticing that its data loading has been interrupted.
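The condition above can be turned into a small helper that checks whether a block size keeps a request streaming without a visible stall, together with an idealized response-time estimate in which only the first block must be loaded before consumption starts. Every numeric value in the example is an assumption chosen only to exercise the formulas.

def smooth(block_gb, consume_mb_s, other_loads_s):
    # True if consuming one block takes at least as long as loading one block
    # for every other queued request (the inequality above).
    consume_time_s = block_gb * 1024 / consume_mb_s
    return consume_time_s >= sum(other_loads_s)

def response_time_s(data_gb, block_gb, seek_s, wind_s, transfer_mb_s, consume_mb_s):
    # Idealized response time when the first block is loaded up front and
    # later blocks are fetched while earlier ones are being consumed.
    first_block_s = seek_s + wind_s + block_gb * 1024 / transfer_mb_s
    return first_block_s + data_gb * 1024 / consume_mb_s

# Assumed numbers: one competing request whose per-block load (seek + wind +
# transfer) costs about 300 s, a consumption rate of 60 MB/s, blocks of 16/32 GB.
print(smooth(16, consume_mb_s=60, other_loads_s=[300.0]),
      smooth(32, consume_mb_s=60, other_loads_s=[300.0]))
rt_min = response_time_s(data_gb=1024, block_gb=32, seek_s=90, wind_s=60,
                         transfer_mb_s=140, consume_mb_s=60) / 60
print(f"estimated response time: {rt_min:.0f} minutes")
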
C. Multiple Reader Model

If more than one reader is available, the scheduling problem is known to be NP-hard. Since it is impractical to find the optimal solution, we propose a partitioned solution for multiple reader scheduling, as shown in Alg. 2. In this design, each reader is assigned responsibility for a specific set of tapes: it only stops at the tapes assigned to it and skips the others. For example, with two readers, the first reader may take care of the first half of the tapes while the second reader is in charge of the rest. The tapes in one set should be physically close to each other. However, this design may result in hot spots, where one reader is busy all the time while the others are idle. To solve this, the system allows idle readers to help process requests, but they must return to their own duties as soon as requests for their assigned tapes arrive.

Algorithm 2 Partitioned Task Scheduling Algorithm for Multiple Readers
Input: m tapes: Γ_t = {tape_1, tape_2, ..., tape_m}; n tape readers: Γ_tr = {reader_1, reader_2, ..., reader_n}
Output: Online schedule for each incoming request
1: Assign the m tapes to the n readers; each reader takes care of at most m/n tapes that are close to each other.
2: Store the assignments in the global schedule manager.
3: while true do
4:   Wait for the next request, req.
5:   Get the meta-information of req from the database and find the tape id for req.
6:   Forward req to the reader in charge of requests for that tape id.
7:   Schedule req at that reader locally using the elevator scheduling algorithm.
8: end while
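Below is a minimal executable sketch of the partitioned policy: tapes are divided into contiguous ranges, each reader owns one range, an incoming request is forwarded to the owning reader, and an idle reader may steal work from the busiest queue. Names are our assumptions, and the local ordering (the algorithm above uses an elevator scan) is simplified to FIFO here.

import math
from collections import deque

class PartitionedDispatcher:
    # Assign each reader a contiguous range of tapes; route requests accordingly.

    def __init__(self, num_tapes, num_readers):
        self.per_reader = math.ceil(num_tapes / num_readers)
        self.queues = [deque() for _ in range(num_readers)]

    def owner(self, tape_id):
        return min(tape_id // self.per_reader, len(self.queues) - 1)

    def submit(self, tape_id):
        self.queues[self.owner(tape_id)].append(tape_id)

    def next_for(self, reader):
        # A reader first serves its own partition...
        if self.queues[reader]:
            return self.queues[reader].popleft()
        # ...and only helps the busiest neighbour when it would otherwise sit idle.
        busiest = max(range(len(self.queues)), key=lambda r: len(self.queues[r]))
        return self.queues[busiest].popleft() if self.queues[busiest] else None

d = PartitionedDispatcher(num_tapes=100, num_readers=2)
for tape in (3, 97, 55, 12):
    d.submit(tape)
print(d.next_for(0), d.next_for(1), d.next_for(1), d.next_for(1))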

TABLE II. Tandberg T load and unload delays: motion and load times, in seconds, for LOAD operations (slot to drive) and UNLOAD operations (drive to slot) for each numbered slot, together with the averages.

Fig. Average response time (minutes) with and without a globally partitioned workspace, as a function of the number of requests.

Fig. Average response time (minutes) under different block sizes (GB).

V. EVALUATION

We simulate a tape cloud node with tapes and up to tape readers working simultaneously to evaluate our design. The parameters used in the simulation are measured from an actual Tandberg LTO tape library [], []. The physical capacity per cartridge is TB; the data transfer rate is MB/s; the rewind speed is meters/sec; the tape length is meters; the cartridge memory is KB; and the load and unload delays are summarized in Table II.

In the first set of experiments, the average response time under different block sizes is studied. Each user request asks for TB of tape data in total. In this experiment, the number of tape readers is one, user requests are randomly generated, data is read and processed in blocks, and the consumption rate is MB/s. As discussed in section IV-B, once one block of data is loaded, the tape reader switches to serve the next request. As shown in the corresponding figure, when the block size is between GB and GB, the average response time is minimized.

Fig. Average response time (minutes) under different scheduling algorithms (FCFS and CPF), as a function of the number of requests.

Fig. Average response time (minutes) using partitioned and unpartitioned scheduling algorithms, as a function of the number of drives.

In addition, if the block size is too large, the average response time increases significantly: the average response time with the largest block size tested is noticeably higher than with the best block size. Second, the average response time under two different scheduling algorithms, First Come First Serve (FCFS) and Closest Process First (CPF), is shown in the corresponding figure. Requests are generated and their response times measured; according to the results, the CPF scheduling algorithm saves around % of the response time. When handling multiple tape drives, we have the option of either creating an any-tape to any-drive environment, in which a drive has access to any tape in a limited system, or partitioning the system such that a drive services only a specific set of tapes, referred to as global partitioning. Global partitioning also avoids uncontrolled tape seek time T_seek which, from our study, is the most expensive component. Employing global partitioning decreases the average response time of requests, as shown in the corresponding figure. We also found that the scheduling algorithms are affected when global partitioning is used. The average response time when using multiple readers is evaluated next. One or more tape readers are enabled to work simultaneously, and two different scheduling algorithms are compared: the partitioned algorithm shown in Alg. 2 and the unpartitioned algorithm, which allows each reader to serve any tape in the library. The results are presented in the corresponding figure.
As the number of working tape readers increases, the average response time is reduced accordingly. However, beyond a certain number of tape readers, both scheduling approaches show little change in response time. This is because, under the current workload, the readers are already able to serve the requests efficiently; even if more readers are added, some of them will be idle most of the time. The results also indicate that the partitioned scheduling algorithm performs much better than the unpartitioned one.

VI. RELATED WORK

In spite of their potential to be the most scalable and cost efficient solution for meeting the growing storage demand, cloud based services leveraging tapes have received little or almost no attention from the academic research community. To the best of our knowledge, our paper is the first and only one that provides a complete tape based service model and integrates large scale tape infrastructure with the cloud. In industry, there is a long history of improving existing tape libraries and designing new ones for enterprise customers (e.g., Hewlett-Packard StorageWorks ESL/EML, IBM TS/TS, Quantum Scalar, Oracle StorageTek). These tape libraries are designed as end-user products for enterprises that want to own their own on-premise tape infrastructure. Our efforts represent the opposite, off-premise, cloud based direction, with the objective of providing low cost tape based storage to users without requiring them to own a private tape infrastructure.

VII. CONCLUSION AND FUTURE WORK

We present and evaluate a design for a cloud storage system with tape media as the backend, the first of its kind to implement a centralized data storage facility using tape media. We recognize some of the barriers to using tape technology, both economic and operational. Some solutions and ideas are evaluated in order to reduce the consequences of these barriers and provide smooth performance in the data storage and retrieval process. From the results obtained in the design evaluation, we can see an increase in I/O throughput when using the scheduling and parallel I/O models discussed in the paper, paving the way for large scale operations using these techniques. The speed of data retrieval is critical for a cloud service. Tape, by nature, has been used for backing up data and creating archives, which means that requests for data stored on tapes arrive with a probability well below that of other day-to-day cloud storage. In this setting, tape cloud data centers provide good security and safety against theft, fire, and natural calamity compared to on-site backup. One of the most exciting aspects of our research is the opportunities it presents for future work. Understanding the economics of revisiting a legacy system to solve today's data explosion problems requires an overhaul of nearly every piece of technology associated with the storage system. An interesting study would be the relation between the number of media units and the number of drives used in large scale deployments. We would also like to implement an easy and direct interfacing system that allows the tape cloud to be connected directly to major cloud storage providers such as Dropbox or Google Cloud, and to see the effects of using tape media in conjunction with the storage media used by other providers. Other plans for extension include evaluating the tape cloud in applications that require much higher I/O throughput.

VIII. ACKNOWLEDGEMENT

We would like to thank the reviewers for their comments, which significantly improved the paper.
This research is partially supported by the National Science Foundation under Award Number CNS. The conclusions contained in this document are those of the authors and should not be interpreted as representing the opinions or policies of the NSF.

REFERENCES

[] Seagate, Video surveillance storage: How much is enough?
[] County of cameras: Cheshire constabulary aims to count every private camera in the county, CCTV Image Online.
[] D. Ally, Choosing the appropriate storage media to collect video-based evidentiary data, Digital Ally Inc., Tech. Rep.
[] W. Purvis, The hot new storage technology is tape? Research Note, Data Mobility Group.
[] J.-H. Lee, T. Feng, W. Shi, A. Bedagkar-Gala, S. K. Shah, and H. Yoshida, Towards quality aware collaborative video analytic cloud, in IEEE CLOUD.
[] D. S. Rosenthal, D. Rosenthal, E. L. Miller, I. Adams, M. W. Storer, and E. Zadok, The economics of long-term digital storage, in The Memory of the World in the Digital Age: Digitization and Preservation.
[] I. Foster, Globus Online: Accelerating and democratizing science through cloud-based services, IEEE Internet Computing.
[] A. J. Argumedo, D. Berman, R. G. Biskeborn, G. Cherubini, R. D. Cideciyan, E. Eleftheriou, W. Häberle, D. J. Hellman, R. Hutchins, W. Imaino, J. Jelitto, K. Judd, P.-O. Jubert, M. A. Lantz, G. M. McClelland, T. Mittelholzer, C. Narayan, S. Ölçer, and P. J. Seger, Scaling tape-recording areal densities to Gb/in^2, IBM J. Res. Dev.
[] J. Jackson, Most network data sits untouched, Government Computer News. [Online]. Available: /Most-network-data-sits-untouched.aspx
[] F. Moore, New game, new rules: tape re-architects for the 21st century data explosion. [Online]. Available: stcentury.pdf
[] D. Reine, In search of the long-term archiving solution: tape delivers significant TCO advantage over disk, The Clipper Group. [Online]. Available: research/tcg.pdf
[] IDC, Worldwide storage in the cloud forecast: Growth in public cloud storage services continues as firms decapitalize IT.
[] International magnetic tape storage roadmap, Information Storage Industry Consortium.
[] Two thirds of disk-only users look to add tape into storage infrastructure, Storage Newsletter. [Online].
[] Tandberg StorageLibrary T. [Online]. Available: storagelibrary/storagelibrary-t/
[] I. Drago, M. Mellia, M. M. Munafo, A. Sperotto, R. Sadre, and A. Pras, Inside Dropbox: understanding personal cloud storage services, in Proceedings of the ACM Internet Measurement Conference (IMC).
[] FUSE filesystem project. [Online].
[] O. Sandsta and R. Midtstraum, Improving the access time performance of serpentine tape drives, in Proceedings of the International Conference on Data Engineering.
[] Linear Tape-Open. [Online].


More information

Comparison of Native Fibre Channel Tape and SAS Tape Connected to a Fibre Channel to SAS Bridge. White Paper

Comparison of Native Fibre Channel Tape and SAS Tape Connected to a Fibre Channel to SAS Bridge. White Paper Comparison of Native Fibre Channel Tape and SAS Tape Connected to a Fibre Channel to SAS Bridge White Paper Introduction IT Managers may approach backing up Storage Area Networks (SANs) with the idea of

More information

Demystifying the Cloud Computing 02.22.2012

Demystifying the Cloud Computing 02.22.2012 Demystifying the Cloud Computing 02.22.2012 Speaker Introduction Victor Lang Enterprise Technology Consulting Services Victor Lang joined Smartbridge in early 2003 as the company s third employee and currently

More information

OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006

OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006 OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006 EXECUTIVE SUMMARY Microsoft Exchange Server is a disk-intensive application that requires high speed storage to deliver

More information

Cloud Based Application Architectures using Smart Computing

Cloud Based Application Architectures using Smart Computing Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products

More information

Technology Insight Series

Technology Insight Series Evaluating Storage Technologies for Virtual Server Environments Russ Fellows June, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved Executive Summary

More information

Hadoop in the Hybrid Cloud

Hadoop in the Hybrid Cloud Presented by Hortonworks and Microsoft Introduction An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure. Big

More information

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage sponsored by Dan Sullivan Chapter 1: Advantages of Hybrid Storage... 1 Overview of Flash Deployment in Hybrid Storage Systems...

More information

How AWS Pricing Works May 2015

How AWS Pricing Works May 2015 How AWS Pricing Works May 2015 (Please consult http://aws.amazon.com/whitepapers/ for the latest version of this paper) Page 1 of 15 Table of Contents Table of Contents... 2 Abstract... 3 Introduction...

More information

Frequently Asked Questions

Frequently Asked Questions Frequently Asked Questions 1. Q: What is the Network Data Tunnel? A: Network Data Tunnel (NDT) is a software-based solution that accelerates data transfer in point-to-point or point-to-multipoint network

More information

Delivering Quality in Software Performance and Scalability Testing

Delivering Quality in Software Performance and Scalability Testing Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,

More information

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a

More information

UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure

UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure Authors: A O Jaunsen, G S Dahiya, H A Eide, E Midttun Date: Dec 15, 2015 Summary Uninett Sigma2 provides High

More information

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures 1 Refreshing Your Data Protection Environment with Next-Generation Architectures Dale Rhine, Principal Sales Consultant Kelly Boeckman, Product Marketing Analyst Program Agenda Storage

More information

Red Hat Storage Server

Red Hat Storage Server Red Hat Storage Server Marcel Hergaarden Solution Architect, Red Hat marcel.hergaarden@redhat.com May 23, 2013 Unstoppable, OpenSource Software-based Storage Solution The Foundation for the Modern Hybrid

More information

Keys to Successfully Architecting your DSI9000 Virtual Tape Library. By Chris Johnson Dynamic Solutions International

Keys to Successfully Architecting your DSI9000 Virtual Tape Library. By Chris Johnson Dynamic Solutions International Keys to Successfully Architecting your DSI9000 Virtual Tape Library By Chris Johnson Dynamic Solutions International July 2009 Section 1 Executive Summary Over the last twenty years the problem of data

More information

Storage Systems Autumn 2009. Chapter 6: Distributed Hash Tables and their Applications André Brinkmann

Storage Systems Autumn 2009. Chapter 6: Distributed Hash Tables and their Applications André Brinkmann Storage Systems Autumn 2009 Chapter 6: Distributed Hash Tables and their Applications André Brinkmann Scaling RAID architectures Using traditional RAID architecture does not scale Adding news disk implies

More information

A Comparative Study of cloud and mcloud Computing

A Comparative Study of cloud and mcloud Computing A Comparative Study of cloud and mcloud Computing Ms.S.Gowri* Ms.S.Latha* Ms.A.Nirmala Devi* * Department of Computer Science, K.S.Rangasamy College of Arts and Science, Tiruchengode. s.gowri@ksrcas.edu

More information

A Hybrid Scheduling Approach for Scalable Heterogeneous Hadoop Systems

A Hybrid Scheduling Approach for Scalable Heterogeneous Hadoop Systems A Hybrid Scheduling Approach for Scalable Heterogeneous Hadoop Systems Aysan Rasooli Department of Computing and Software McMaster University Hamilton, Canada Email: rasooa@mcmaster.ca Douglas G. Down

More information

Getting performance & scalability on standard platforms, the Object vs Block storage debate. Copyright 2013 MPSTOR LTD. All rights reserved.

Getting performance & scalability on standard platforms, the Object vs Block storage debate. Copyright 2013 MPSTOR LTD. All rights reserved. Getting performance & scalability on standard platforms, the Object vs Block storage debate 1 December Webinar Session Getting performance & scalability on standard platforms, the Object vs Block storage

More information

PARALLELS CLOUD STORAGE

PARALLELS CLOUD STORAGE PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...

More information

SAN Conceptual and Design Basics

SAN Conceptual and Design Basics TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer

More information

How To Create A Large Enterprise Cloud Storage System From A Large Server (Cisco Mds 9000) Family 2 (Cio) 2 (Mds) 2) (Cisa) 2-Year-Old (Cica) 2.5

How To Create A Large Enterprise Cloud Storage System From A Large Server (Cisco Mds 9000) Family 2 (Cio) 2 (Mds) 2) (Cisa) 2-Year-Old (Cica) 2.5 Cisco MDS 9000 Family Solution for Cloud Storage All enterprises are experiencing data growth. IDC reports that enterprise data stores will grow an average of 40 to 60 percent annually over the next 5

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Comparison of Cloud vs. Tape Backup Performance and Costs with Oracle Database

Comparison of Cloud vs. Tape Backup Performance and Costs with Oracle Database JIOS, VOL. 35, NO. 1 (2011) SUBMITTED 02/11; ACCEPTED 06/11 UDC 004.75 Comparison of Cloud vs. Tape Backup Performance and Costs with Oracle Database University of Ljubljana Faculty of Computer and Information

More information

Riverbed Whitewater/Amazon Glacier ROI for Backup and Archiving

Riverbed Whitewater/Amazon Glacier ROI for Backup and Archiving Riverbed Whitewater/Amazon Glacier ROI for Backup and Archiving November, 2013 Saqib Jang Abstract This white paper demonstrates how to increase profitability by reducing the operating costs of backup

More information

Implementing a Digital Video Archive Using XenData Software and a Spectra Logic Archive

Implementing a Digital Video Archive Using XenData Software and a Spectra Logic Archive Using XenData Software and a Spectra Logic Archive With the Video Edition of XenData Archive Series software on a Windows server and a Spectra Logic T-Series digital archive, broadcast organizations have

More information

DELL s Oracle Database Advisor

DELL s Oracle Database Advisor DELL s Oracle Database Advisor Underlying Methodology A Dell Technical White Paper Database Solutions Engineering By Roger Lopez Phani MV Dell Product Group January 2010 THIS WHITE PAPER IS FOR INFORMATIONAL

More information

C2030-280.Examcollection.Premium.Exam.34q

C2030-280.Examcollection.Premium.Exam.34q C2030-280.Examcollection.Premium.Exam.34q Number: C2030-280 Passing Score: 800 Time Limit: 120 min File Version: 32.2 http://www.gratisexam.com/ Exam Code: C2030-280 Exam Name: IBM Cloud Computing Infrastructure

More information

OPTIMIZING VIRTUAL TAPE PERFORMANCE: IMPROVING EFFICIENCY WITH DISK STORAGE SYSTEMS

OPTIMIZING VIRTUAL TAPE PERFORMANCE: IMPROVING EFFICIENCY WITH DISK STORAGE SYSTEMS W H I T E P A P E R OPTIMIZING VIRTUAL TAPE PERFORMANCE: IMPROVING EFFICIENCY WITH DISK STORAGE SYSTEMS By: David J. Cuddihy Principal Engineer Embedded Software Group June, 2007 155 CrossPoint Parkway

More information

RECOVERY SCALABLE STORAGE

RECOVERY SCALABLE STORAGE RETENTION RETRIEVAL RECOVERY SCALABLE STORAGE IMATION SCALABLE STORAGE RETENTION RECOVERY RETRIEVAL We work with small and medium-sized businesses that are caught between a rock and hard spot: they are

More information

Four Ways High-Speed Data Transfer Can Transform Oil and Gas WHITE PAPER

Four Ways High-Speed Data Transfer Can Transform Oil and Gas WHITE PAPER Transform Oil and Gas WHITE PAPER TABLE OF CONTENTS Overview Four Ways to Accelerate the Acquisition of Remote Sensing Data Maximize HPC Utilization Simplify and Optimize Data Distribution Improve Business

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

Cray: Enabling Real-Time Discovery in Big Data

Cray: Enabling Real-Time Discovery in Big Data Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects

More information

Implementing a Digital Video Archive Based on XenData Software

Implementing a Digital Video Archive Based on XenData Software Based on XenData Software The Video Edition of XenData Archive Series software manages a digital tape library on a Windows Server 2003 platform to create a digital video archive that is ideal for the demanding

More information

EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server

EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server White Paper EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server Abstract This white paper addresses the challenges currently facing business executives to store and process the growing

More information

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Sean Lee Solution Architect, SDI, IBM Systems SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Agenda Converging Technology Forces New Generation Applications Data Management Challenges

More information

Scala Storage Scale-Out Clustered Storage White Paper

Scala Storage Scale-Out Clustered Storage White Paper White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current

More information

Unprecedented Performance and Scalability Demonstrated For Meter Data Management:

Unprecedented Performance and Scalability Demonstrated For Meter Data Management: Unprecedented Performance and Scalability Demonstrated For Meter Data Management: Ten Million Meters Scalable to One Hundred Million Meters For Five Billion Daily Meter Readings Performance testing results

More information

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT 1 SARIKA K B, 2 S SUBASREE 1 Department of Computer Science, Nehru College of Engineering and Research Centre, Thrissur, Kerala 2 Professor and Head,

More information

Data Backups in the Clouds

Data Backups in the Clouds ELEKTROTEHNIŠKI VESTNIK 78(3): 118-122, 2011 ENGLISH EDITION Data Backups in the Clouds Aljaž Zrnec University of Ljubljana, Faculty of Computer and Information Science, Trzaska 25, 1000 Ljubljana, Slovenia

More information

Backup architectures in the modern data center. Author: Edmond van As edmond@competa.com Competa IT b.v.

Backup architectures in the modern data center. Author: Edmond van As edmond@competa.com Competa IT b.v. Backup architectures in the modern data center. Author: Edmond van As edmond@competa.com Competa IT b.v. Existing backup methods Most companies see an explosive growth in the amount of data that they have

More information

Implementing Offline Digital Video Storage using XenData Software

Implementing Offline Digital Video Storage using XenData Software using XenData Software XenData software manages data tape drives, optionally combined with a tape library, on a Windows Server 2003 platform to create an attractive offline storage solution for professional

More information

Data Storage Solutions

Data Storage Solutions Data Storage Solutions Module 1.2 2006 EMC Corporation. All rights reserved. Data Storage Solutions - 1 Data Storage Solutions Upon completion of this module, you will be able to: List the common storage

More information

Navigating Among the Clouds. Evaluating Public, Private and Hybrid Cloud Computing Approaches

Navigating Among the Clouds. Evaluating Public, Private and Hybrid Cloud Computing Approaches Navigating Among the Clouds Evaluating Public, Private and Hybrid Cloud Computing Approaches June 2012 Much like the winds of change that continue to alter the cloud landscape in the skies above, a powerful

More information