Revision 7.12-4
HIE ELECTRONICS: HYBRID ARCHITECTURE IN THE CLOUD
Discussion of the Cloud and Tiered Architecture
Hie Electronics, Inc. Copyright 2012
Introduction

This paper compares some of the most popular cloud storage options and introduces a new option in tiered data storage for the cloud. SSD has changed the cloud storage environment by enabling a high-speed access tier, but not every customer needs high-speed access. Only certain customers will use cloud storage rather than onsite storage; a provider that owns a disaster recovery and backup tier would increase the number of these customers. Today, the bottom line is return on investment and, for those who provide cloud services, economies of scale. There are many ways to design the internal architecture of a service, and just as many options for a customer choosing the right service. There are two consistent bottlenecks: first, access to data over the network and, second, the efficiency of the tiered storage architecture. Hie Electronics can expand cloud storage environments to a new tier.

A tiered storage architecture is required to realize economies of scale. A study by Carnegie Mellon documents that data does not need to be online at all times; in fact, data follows the Pareto principle, and about 80% of all data is considered sleeping data. Sleeping data, or cold data, is rarely accessed or changed (Gibson, 2007). The other 20% is real-time, or hot, data that must be constantly and instantaneously accessible. To illustrate how this affects the cloud service industry, an EMC expert defines the timeline for sleeping data: 80% of the world's data does not change after it is 90 days old (Herzog, 2010). What happens to data that is 90 days old when it resides in a SAS environment? Should that data stay online in an expensive Storage Area Network (SAN) or Network Attached Storage (NAS) appliance? The logical answer is no. Best practice for cloud computing services, and for the majority of the IT world, is a tiered and actively managed storage architecture.
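The 90-day observation above can be turned into a simple classification rule. The following is a minimal sketch, not a description of any product's logic: it assumes the 90-day figure is used directly as the hot/cold cutoff, and the names `COLD_AFTER`, `is_cold`, and `cold_fraction`, along with the sample catalogue, are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the 90-day window quoted above is used as the hot/cold cutoff.
COLD_AFTER = timedelta(days=90)


def is_cold(last_modified: datetime, now: datetime,
            cold_after: timedelta = COLD_AFTER) -> bool:
    """Return True when the item has gone unchanged for longer than the cutoff."""
    return now - last_modified > cold_after


def cold_fraction(last_modified_times: list[datetime], now: datetime) -> float:
    """Share of items (0.0 to 1.0) that the rule classifies as cold."""
    if not last_modified_times:
        return 0.0
    cold = sum(1 for ts in last_modified_times if is_cold(ts, now))
    return cold / len(last_modified_times)


if __name__ == "__main__":
    now = datetime(2012, 6, 30)
    # Hypothetical catalogue: days since each item last changed.
    ages_in_days = [1, 10, 120, 200, 365, 400, 500, 730, 900, 1000]
    stamps = [now - timedelta(days=age) for age in ages_in_days]
    print(f"cold share: {cold_fraction(stamps, now):.0%}")
```

A provider could run a rule like this over its own catalogue to check how closely its data follows the 80/20 split before sizing each tier.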
The best Active Archive architectures use hybrid media models that combine and exploit the best available storage technologies.

Tiered Storage in the Cloud

In this example, the tiers are defined as follows:

Tier 1: SSD (Instant Access)
Tier 2: SAS HDD 15K (Hot/High Priority)
Tier 3: SATA HDD 7.2K (Warm Data)
Tier 4: Blu-ray Disc (BD) (Cold/Sleeping)

Modern storage controllers can use different forms of disk and disc storage; SAS cloud storage providers use media ranging from slow spinning disk up to SSD technology (Robinson, 2012). Cloud service providers place sleeping or infrequently accessed data, also known as cold data, on slow-spinning SAS and SATA disk, the lowest-performance disk. The highest-performance SAS and SSD storage is used for data that needs instant access, known as hot data. Based on input/output (I/O) requirements, there is a new option that Hie Electronics can exploit: a nearline medium that is not possible with any spinning-disk technology, forming a lower tier that is not currently used in the market. Cold/sleeping data will reside on
this energy-passive disc. The TeraStack Solution will become Tier 3 and 4 in this example. With the TeraStack Solution, IT personnel set rules for when data is considered cold or sleeping, and that data is then moved into the system automatically. Because the process is automated, manual placement errors are avoided and sleeping data lands on the correct storage device. Data resides on the cheapest storage medium while remaining accessible if needed. The TeraStack Solution also allows sleeping data to become hot or active again. When such data is accessed, it is marked hot and made available for instant access (and it can later become cold again). Because of the very affordable cost-to-capacity ratio of the TeraStack Solution, this can be accomplished with immediate cost savings that grow over the long term.

Utilizing the Cloud

The SAS cloud service provider must build a storage architecture that uses all the tiers to give customers the best cost option for their needs. Many industry reports, such as those by ESG, conclude that instant access to 100% of a customer's data is no longer a common need or desire. The balance of cost versus performance is the customer's calculation. How do we approach an issue that is so complicated and so customer specific? The answer is simple. Because cloud providers must offer Service Level Agreements (SLAs), they must offer guidelines that meet all major requirements.
These include the following top five must-have requirements:

1. Instant access to data accessed and added within the past 90 days
2. Guidelines to access data that is older than 90 days
3. Speed of access and access privileges
4. Disaster Recovery Data Model that matches the customer's needs, minimum 2 copies available
5. Disaster Location Management Plan, minimum of 2 different unique and independent sites

Providing this information to the customer allows the SAS cloud provider to deliver service that is sufficient for the customer and meets the provider's internal requirements.

Economies of scale are the purpose of the cloud. A cloud provider can offer SAS economically only by using a tiered architecture. A massive Tier 1 front end requires a substantial, and potentially even exponentially larger, back end. This back end is typically found in Tier 2 technology such as SATA HDD or slower HDD devices. Typically, Tier 2 can have a wide array of options: virtual libraries, slower disk arrays, optical technology, or a wall of tapes. Why not use an automated system to manage the movement of data from active to sleeping and back, i.e., from Tier 1 to Tier 2 and back to Tier 1?

Nearline Data in the Cloud

Cloud SAS storage offerings are missing one possible option for their customers. The majority of SAS customers do not need fast access to their data. These customers need Active Archive for data stored in the cloud. What if this data could be stored on energy-passive media and still be available for customer retrieval and use within a minute or two? This option offers data availability with slightly increased access time. Customers who would choose this level of service use the cloud for pure long-term data storage, archives that meet legal data retention statutes, backup and disaster
recovery, or do not need instant access to their data. Because hard drive storage is expensive to maintain, it should not be used for this type of data. The TeraStack Solution would tier this infrequently accessed data while still allowing access in a relatively short time.

The Hybrid Solution

Hie Electronics is a leader in the Active Archive data storage system technology industry and the manufacturer of the TeraStack Solution, an Active Archive processing, data backup, and archiving system. It allows for application hosting, 50-100 terabytes of nearline-accessible data on Blu-ray optical media, and up to 42 terabytes of data on online hard drives. The company has been recognized by Frost and Sullivan with the American Video Surveillance Product Innovation of the Year Award and the Data Storage Technologies Green Excellence Award in Technology Innovation, for its ability to reduce energy costs with its storage technologies. A leader in Sustainable IT technology, the Hie Electronics TeraStack Solution product line delivers a 90 percent energy cost savings when compared with that of current technology. Hie Electronics is an Energy Star Small Business Network participant and a stakeholder in the Energy Star Enterprise Storage Initiative.

Comparison of Cost and ROI

The following example (Figure 1.1) shows different providers that meet all the basic requirements: stable internet access with practical WAN access speeds (approximately 2TB/day), multiple data storage facilities, two thirds of the storage uses network egress, ten percent uses network ingress, normal Get and Head requests, and normal service requests. The pricing is a generalization based on customer experience or published pricing. It may not include all associated network costs.
Service Provider      Back-Up Required (TB)   $ / Month     $ / GB   Total Cost for 4 Years   Total Cost for 8 Years   Total Cost for 10 Years   Source
Amazon S3             50                      $5,515.00     $0.11    $264K                    $528K                    $662K                     (Amazon, 2012)
Atmos (Value)         50                      $7,500.00     $0.15    $360K                    $720K                    $900K                     (Peer1 hosting, 2011)
Barracuda             50                      $12,500.00    $0.25    $600K                    $1,200K                  $1,500K
Google (World Wide)   75                      $13,968.12    $0.19    $670K                    $1,341K                  $1,676K                   (Google, 2012)

Figure 1.1: Example of Cloud Pricing Options

The cloud SAS provider must decide what technical solution to engineer. For a cloud provider, this is an investment that cannot be a year-to-year decision or a reactive process. Rather, it is best to look at the total cost of ownership. Take, for example, a comparison of the TeraStack Solution with a generic hard drive setup such as 3par, as well as with a turnkey solution such as KOMpliance. This comparison can be found below in Figure 1.2. In this comparison, factors such as the infrastructure,
software, and management are excluded. Rather, it assumes 24/7 operation with 99% uptime, average national electricity costs with no increase, and upfront purchase of the product. Notice that the hard drive setup must be repurchased every four years because of the industry-standard failure rate for HDDs. The comparative return on investment is evident immediately after the first data migration required of the cloud provider in year 4.

                                                  Year 0      Year 4      Year 8      Year 10

Generic Cluster HDD SAN Turnkey (150TB Cluster Online System)
Hardware Cost                                     $313,158    $313,158    $313,158    -
Estimated Power Usage @ 675 W                     -           $710        $710        $710
Estimated Cooling Costs                           -           $759        $759        $759
Total Cost                                        $313,158    $314,627    $314,627    $1,469
Total Accrued Costs (150TB On-line System)        $313,158    $632,191    $951,224    $954,162
Cost per GB/month, cents (150TB On-line System)   -           8.78        6.61        5.30

Generic HDD SAN Competitor (150TB Online System)
Hardware Costs                                    $142,083    $142,083    $142,083    -
Estimated Power Usage @ 7,870 W                   -           $8,273      $8,273      $8,273
Estimated Cooling Costs                           -           $8,852      $8,852      $8,852
Total                                             $142,083    $159,208    $159,208    $17,125
Total Accrued Costs (150TB On-line System)        $142,083    $352,666    $563,249    $597,499
Cost per GB/month, cents (150TB On-line System)   -           4.90        3.91        3.32

TeraStack Solution (142TB Online & Near-line Solution)
Hardware Cost                                     $241,571    $5,600      $5,600      -
Estimated Power Usage @ 425 W                     -           $477        $477        $477
Maximum Cooling Costs                             -           $478        $478        $478
Total Costs (142TB Near-line System)              $241,571    $6,555      $6,555      $955
Total Accrued Cost (142TB Near-line System)       $241,571    $250,961    $260,381    $262,291
Cost per GB/month, cents (142TB Near-line System) -           3.68        1.91        1.54

Figure 1.2: Cloud Comparison Cost per GB over 10 years

As time progresses, the hybrid architecture of the TeraStack Solution allows a substantial decrease in hardware and running costs. A cloud provider that invests in this solution will see the investment as a win.
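The accrued totals and per-gigabyte costs in Figure 1.2 can be approximated with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the method used to build the figure: the accrual rule (hardware bought in year 0, a refresh purchase every four years, power and cooling paid once per year) is inferred from the figure, 1 TB is taken as 1000 GB, and the function and variable names are hypothetical. Under those assumptions the results agree with Figure 1.2 to within about $30.

```python
def accrued_cost(years: int, initial_hw: float, refresh_hw: float,
                 annual_running: float, refresh_every: int = 4) -> float:
    """Total spend through the end of `years`.

    Assumed rule (inferred from Figure 1.2, not stated in the paper):
    hardware is bought in year 0, a refresh is bought every `refresh_every`
    years after that, and power plus cooling are paid once per year.
    """
    refreshes = years // refresh_every
    return initial_hw + refreshes * refresh_hw + years * annual_running


def cents_per_gb_month(total_cost: float, capacity_tb: float, years: int) -> float:
    """Accrued cost spread over capacity and elapsed months, in cents (1 TB = 1000 GB)."""
    return 100 * total_cost / (capacity_tb * 1000 * years * 12)


# Inputs copied from Figure 1.2:
# (initial hardware, refresh hardware, yearly power + cooling, capacity in TB)
SYSTEMS = {
    "Turnkey HDD SAN": (313_158, 313_158, 710 + 759, 150),
    "Generic HDD SAN": (142_083, 142_083, 8_273 + 8_852, 150),
    "TeraStack": (241_571, 5_600, 477 + 478, 142),
}

if __name__ == "__main__":
    for name, (initial, refresh, running, capacity) in SYSTEMS.items():
        for years in (4, 8, 10):
            total = accrued_cost(years, initial, refresh, running)
            rate = cents_per_gb_month(total, capacity, years)
            print(f"{name:16s} year {years:2d}: ${total:>9,.0f}  {rate:.2f} cents/GB/month")
```

Keeping the refresh price separate from the initial price is what lets one function cover both the HDD systems, which repurchase the full hardware, and the TeraStack row, whose refresh cost is much smaller.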
Speed of a Download

Today's CIOs and enterprise storage engineers, the end users of cloud-based storage, must weigh access speed over a slower Wide Area Network (WAN) against data access speed when the storage appliance is attached to a Local Area Network (LAN): internet access vs. intranet access. What performance does your customer require? Let's look at a customer example with three
downloads of the same 15MB file copied to each cloud storage service provider. After each download was complete, the customer cleared the Gladinet cache and then initiated the next download from the Command Prompt via the specific cloud provider (Huang, 2010).

Provider                       Upload (sec)   Download #1 (sec)   Download #2 (sec)   Download #3 (sec)   Average Download (sec)
Amazon S3                      83             18                  20                  17                  18
AT&T Synaptic Storage          84             37                  49                  45                  43
Google Storage for Developer   94             15                  17                  17                  16
Peer1 CloudOne                 85             20                  16                  19                  18
Windows Azure Blob Storage     86             48                  72                  52                  57

Figure 1.3: Example of Customer Experience to Speed of Access to Downloaded Data

Given the time differences and the way the study was done, the gap between providers can be drastic. But who won? Was it Google or Amazon? The answer depends on how the customer set up each service and whether the setups were identical. If all is equal, then the real winner is the provider that met the basic requirements and earned the greatest profit for the performance delivered. More than likely Azure won the battle by using their tiered architecture and automated data retrieval system. Using an automated solution such as the TeraStack Solution is a win-win: it delivers data to the customer well within the specs and at the lowest cost to the provider.

Other Cloud Provider Concerns

Application agnosticism is another concern for some cloud providers: can the cloud provider use any application desired for internal results and fluidity? The answer should be yes, which explains why larger cloud solutions typically have sections of their infrastructure designed and separated by purpose and installed applications. The data environment is always changing. It is time to take an approach that adapts to how data behaves during the Data Life Cycle: a hybrid solution that recognizes the Pareto principle, with 80% of the data sleeping and 20% active.
It is time to address the problem that all companies face: how to keep 100% of a customer's data for as long as the customer needs to keep it.

Summary

The SAS cloud providers building capacity today face what all data centers must cope with: the need to make the most efficient use of capital and operational funds, assets, and resources. How to go about this is always a balance between cost and performance. SAS cloud providers must select the best technical and lowest-cost solutions to achieve maximum ROI. They can do so by understanding the lifespan of the customer's data and using that window of time as the basis over which to depreciate storage operating costs. Based on the industry average, a typical document (excluding emails) lasts 7-10 years. The cost of storing this normal data should therefore be depreciated over a 10-year
period. The most proven method to support the lowest total cost of ownership is to use a hybrid architecture that exploits online, nearline, and offline capabilities. Cloud providers looking for these economies of scale and economic efficiency can find the solution in the hybrid architecture of the TeraStack Solution. Consider testing this solution for your private SAS cloud, or select a TeraStack Solution cloud provider as your SAS provider.
Works Cited

Amazon. (2012). Amazon S3 Pricing. Retrieved June 17, 2012, from Amazon Web Services: http://aws.amazon.com/s3/pricing/

EMC. (2010, April). Retrieved June 17, 2012, from EMC Centera Content-Addressed Storage System: http://www.emc.com/collateral/hardware/data-sheet/c931-emc-centera-cas-ds.pdf

Google. (2012, June 16). Google. Retrieved June 19, 2012, from Pricingandterms: https://developers.google.com/storage/docs/pricingandterms

Herzog, E. (2010). Cloud Tiering Appliance. (EMC, Interviewer)

Peer1 hosting. (2011). CloudOne Storage powered by EMC Atmos. Retrieved June 18, 2012, from Ping & People: http://www.peer1.com/sites/default/files/pdf/datasheets/cloudone_storage_consumer.pdf

Robinson, K. (2012, May 31). Tiering in RAID Storage Environments. Retrieved May 31, 2012, from Storage Newsletter: http://www.storagenewsletter.com/news/systems/lsi-tiering-in-raid

TechTarget. (2012). Find a Tech Definition. Retrieved June 18, 2012, from WhatIs.com: http://whatis.techtarget.com/

For more information about Hie Electronics and the innovative TeraStack Solution, visit the company's website at www.hie-electronics.com. For further questions, please contact Hie Electronics at info@hie-electronics.com or (972)542-2327.