Building Storage Clouds for Online Applications: A Case for Optimized Object Storage
Agenda
- Introduction: storage facts and trends
- A call for more online storage
- AmpliStor: Optimized Object Storage
- Cost reduction through erasure coding
- Use case: Massive Media
- Questions
Amplidata Confidential
Introduction: storage facts and trends
Introduction, facts and trends
Studies show that data storage capacities will likely increase more than 30X over the coming decade, to over 35 zettabytes by 2020.
[Chart: storage consumption growing 30X to 35 ZB by 2020; drivers: high-capacity drives, less staff per TB, unstructured data]
Introduction, facts and trends
The number of qualified people to manage this data will stay nearly flat (~1.5X). Efficiency is key: automate and reduce overhead.
[Chart: capacity per budget vs. storage budget over time]
Introduction, facts and trends
Much of that growth (80%) is driven by unstructured data: billions of files, such as active archives, online images, large files, medical images, online storage, and online movies.
Introduction, facts and trends
Traditional storage technologies require too much overhead, power, and management. As a result, there is growing interest in object storage, and erasure coding is the proclaimed successor of RAID.
Introduction, facts and trends
Most current storage technologies require >200% overhead to provide five-nines availability, e.g. RAID 6 plus replication, or 3 copies in the cloud.
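As a rough sketch of how such overhead figures are computed (the 8+2 RAID 6 group size and two-site mirroring below are assumptions for illustration, not the slide's exact configuration):

```python
def overhead_pct(raw_tb: float, usable_tb: float) -> float:
    """Extra raw capacity beyond usable data, as a percentage."""
    return (raw_tb / usable_tb - 1) * 100

# Three full copies in the cloud: 3 TB raw per 1 TB of data.
three_copies = overhead_pct(3.0, 1.0)           # 200% overhead

# RAID 6 (assumed 8+2 group: 10 drives, 8 carrying data) mirrored
# to a second site: 2 * 10/8 = 2.5 TB raw per 1 TB of data.
raid6_mirrored = overhead_pct(2 * 10 / 8, 1.0)  # 150% overhead

print(f"3 copies: {three_copies:.0f}% overhead")
print(f"mirrored RAID 6 (8+2): {raid6_mirrored:.0f}% overhead")
```

Every extra percentage point of overhead is raw disk that must be bought, powered, and cooled without storing new user data.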
Introduction, facts and trends
Storage currently accounts for 37-40% of overall data center energy consumption from hardware. Energy consumption will increasingly influence technology procurement criteria.
[Chart: data center power usage]
Introduction, facts and trends
Storage maintenance processes need to be more automated. For example, data migration will soon take longer than the lifetime of the media: it's like painting the Golden Gate Bridge while the bridge keeps getting longer.
A Call for more online storage
A Call for more online storage
The public cloud industry is far ahead in the storage growth statistics:
- AWS S3 will soon have 800 billion objects stored
- Facebook has over 250 million photos uploaded per day, over 7 billion per month
- YouTube receives over 24 hours of new video every minute
A Call for more online storage
- Backup and recovery is increasingly moving to the cloud
- Document sharing is HOT
- Archives are moving back off tape; online archives are BIG
- Big data is taking many shapes
- Social-local-mobile will keep stimulating digital data growth
A Call for more online storage
800 billion objects: don't you want some of that?
A Call for more online storage
800 billion objects: so how would you store that?
Object Storage for Online Applications
Object Storage: What are the requirements?
- Data has to be always available online
- Direct interface to the applications
- Petabyte scalability
- Extreme reliability and integrity
- Cost efficiency
- Security
Commodity-HDD storage + REST API (cloud-enabled) + erasure coding = Optimized Object Storage
Storage Clouds
- Storage cloud infrastructures: private or public setup, providing the highest availability
- Applications: file systems are obsolete; use the REST API
[Diagram: multiple applications accessing a massively scalable storage pool through a REST API]
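As a sketch of what a direct REST interface looks like from the application side, the snippet below builds (without sending) an HTTP PUT for an object; the endpoint, URL layout, and names are hypothetical, not AmpliStor's actual API:

```python
import urllib.request

BASE = "http://storage.example.com"  # hypothetical endpoint

def build_put(namespace: str, key: str, data: bytes) -> urllib.request.Request:
    """Build an HTTP PUT request storing `data` under `key` in a
    flat namespace; no file system paths are involved."""
    url = f"{BASE}/namespace/{namespace}/object/{key}"
    return urllib.request.Request(url, data=data, method="PUT")

req = build_put("photos", "user42-photo-0001.jpg", b"\xff\xd8\xff")
print(req.get_method(), req.full_url)
```

The application addresses objects by name in a flat namespace over HTTP verbs (PUT/GET/DELETE), rather than mounting a file system.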
Petabyte Scalability
Object storage systems will scale:
- Beyond petabytes of data
- Beyond billions of data objects
Systems should scale uniformly:
- Add resources incrementally
- Scale performance and capacity separately
Petabyte Scalability
- Scalable metadata repository (capacity & performance)
- Lightweight metadata, designed to scale up to billions of objects
- Flat namespace
Data Integrity
To ensure the integrity of long-term unstructured data, new data protection algorithms are required to:
- Address the increasing capacity of disk drives
- Solve issues related to long RAID rebuild windows
Object storage systems based on erasure coding can protect data not only against higher numbers of drive failures, but also against the failure of entire storage modules.
Cost-efficient
- Power, cooling, and floor-space requirements are paramount concerns: erasure coding drastically reduces overhead numbers
- Systems need to be self-managing
- The system needs to be hardware independent: data migration needs to be an automatic, continuous background process
Cost-efficient
Eliminate the need for manual disk swaps and move to higher-level container management tasks. The system should automatically manage allocation to the underlying disks.
Security
- Multi-tenant authentication/authorization with per-operation rights: read, read/write, list
- Auditing & logging
- Secure protocols/encryption (HTTPS)
- Individual disks cannot be misused: data is encoded and spread
So, what is this erasure coding?
Erasure Coding, simply explained
BitSpread:
- Encodes data in linear equations
- Distributes the equations across disks, storage nodes, racks, and data centers
- The original data can always be uniquely determined from a subset of the equations
- Uses 4K variables, independent of object size
- Extra blocks can be generated without knowing which ones are missing
Simplified mathematics: the object "75" is decomposed into x = 7 and y = 5, then encoded as the equations x + y = 12, x - y = 2, and 2x + y = 19. Any 2 of the 3 equations uniquely determine the object.
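The slide's toy example can be run directly. This sketch encodes the object "75" as the three equations above and checks that every 2-of-3 subset recovers it; production erasure codes work over finite fields rather than the rationals, but the principle is the same:

```python
from itertools import combinations

# The object "75" is decomposed into two variables: x = 7, y = 5.
# Each stored block is an equation (a, b, c) meaning a*x + b*y = c.
equations = [
    (1, 1, 12),   # x + y   = 12
    (1, -1, 2),   # x - y   = 2
    (2, 1, 19),   # 2x + y  = 19
]

def solve(eq1, eq2):
    """Solve a 2x2 linear system by Cramer's rule."""
    a1, b1, c1 = eq1
    a2, b2, c2 = eq2
    det = a1 * b2 - a2 * b1
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# Any 2 of the 3 equations recover the data, so losing any one
# "disk" (equation) loses nothing.
for pair in combinations(equations, 2):
    assert solve(*pair) == (7.0, 5.0)
print("object recovered from every 2-of-3 subset: x=7, y=5")
```

Because any sufficiently large subset of equations determines the data, replacement blocks can be generated without first identifying which blocks were lost.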
AmpliStor System
Controller Nodes (3+):
- Dual quad-core Xeon processors, 16 GB RAM, 2 x 200 GB SSD, 2 x 10 Gigabit Ethernet network interfaces
- Object-based interfaces: HTTP/REST API, C API, Python CLI, WebDAV
- Minimum 3 controllers per system; can be scaled up for performance (fully shared metadata & storage pool)
AS20 Low-Power Storage Nodes (8+):
- 1U rack-mount chassis with 20 TB capacity
- 2 x 1 Gigabit network interfaces
- Low-power processor (Intel Atom)
- 10 x 2 TB low-power green SATA disk drives
- Low power: 65-140 watts per node (3.5-7 watts per TB)
Core Software Technology Components
BitSpread Distributed Encoder/Decoder:
- RAID-replacement technology based on a unique variant of erasure coding
- Dial-in fault tolerance through namespace-level policies: a 16/4 policy protects against any 4 failures in 16 disks; an 18/6 policy protects against any 6 failures in 18 disks
- Provides availability and reliability even during failures; policies can be changed dynamically
BitDynamics Maintenance & Self-Healing Agent:
- Out-of-band operations agent for disk monitoring, integrity verification, and object self-healing
- Performs automated tasks: scrubs, verifies, self-heals, repairs, and optimizes data on disk
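Reading a spread/safety policy as "any `safety` of `spread` blocks may be lost" (so the payload occupies `spread - safety` blocks, an assumption consistent with the slide's description), the storage cost of each policy can be sketched as:

```python
def policy_stats(spread: int, safety: int) -> tuple:
    """Raw-byte expansion and percentage overhead of a
    spread/safety erasure-coding policy."""
    data_blocks = spread - safety        # blocks carrying payload
    expansion = spread / data_blocks     # raw bytes per user byte
    overhead = (expansion - 1) * 100     # extra raw capacity, in %
    return expansion, overhead

for spread, safety in [(16, 4), (18, 6)]:
    ex, oh = policy_stats(spread, safety)
    print(f"{spread}/{safety}: survives any {safety} failures, "
          f"{ex:.2f}x raw per user byte ({oh:.0f}% overhead)")
```

A 16/4 policy thus stores roughly 1.33x raw per user byte, versus 3x for three full copies, while still tolerating four simultaneous failures.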
AmpliStor for Big Unstructured Data
- Turnkey storage solution for BIG unstructured data: scales beyond petabytes with a global object namespace; throughput scales with the amount of resources
- Policy-driven storage durability: ten nines of durability (99.99999999%) and beyond through policies; eliminates the reliability exposures of RAID on high-density disk drives; eliminates data corruption or loss due to bit errors
- 50-70% improvement in storage efficiency: 70% reduction in storage footprint compared to three copies in the cloud; 50% reduction compared to mirrored RAID; drives proportional reductions in data center floor space & power
- Automated management: self-healing design manages data integrity assurance and auto-repairs data
- 50-70% reduction in TCO: storage footprint (capex), power, data center space & management costs
AmpliStor Use Case: Massive Media
Massive Media
- The most successful social media group in EMEA: social networking, gaming, dating
- Massive Media storage cloud: half a billion objects, 80 million users, highest level of data availability
- Storage requirements: high availability without copying data (250% overhead is unacceptable); low power (3.5 W/TB); fast migration (REST API)
Thank You! Tom Leyden, Director of Alliances & Marketing Twitter.com/tomme