CLOUDIAN HYPERSTORE: Smart Data Storage
Neil Stobart, Cloudian Inc.
Storage is changing forever
Traditional SAN/NAS (scale up, terabytes, flash host/array) versus object storage (scale out, petabytes, built for scalability and big data):
- Price per GB: high (SAN/NAS) vs. low (object storage)
- Speed / latency: high speed vs. lower speed
- Transaction volume: limited vs. large
- Access: local vs. remote
- Robustness / DR: limited vs. DR capable
Smart Data Storage
- Data storage today: passive, siloed, delayed analytics, static data
- Smart data storage: active, timely insight, meaning, actionable business value
- (Diagram: silos of blocks become a HyperStore object store feeding analytics and information)
Blueprint for Smart Data Storage
- Cost efficiency: software-defined + commodity servers
- Hybrid cloud: private cloud + public cloud
- Smart data: big data storage + analytics
= Cloudian HyperStore
HyperStore Smart Data Storage
- Your Data at webscale economics: use of commodity servers, webscale simplicity, scale out
- Your Cloud with hybrid flexibility: on-prem & off-prem, single/multiple regions, multi-tenant & open
- Your Way with enterprise control: user access control, reporting & alerting, encryption & compression
Your Data at Webscale Economics
- Commodity servers, scale out, durable, simple to use
- Heterogeneous nodes (CPU, disks, network) of mixed capacity (e.g., 100 TB and 300 TB) in one cluster
- Multiple tenants (Tenant A, B, C) on HyperStore software-defined storage
Your Cloud with Hybrid Flexibility
- Open S3 ecosystem, hybrid, multi-tenant, multi-region
- Tenants (A, B, C) on HyperStore software-defined storage
- Auto-tiering to Amazon S3 and Amazon Glacier
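Because HyperStore exposes the S3 API natively, standard S3 tooling can talk to it by pointing at the cluster endpoint. A minimal sketch with boto3; the endpoint URL, credentials, and bucket name are hypothetical placeholders, not values from the deck:

```python
import boto3

# Point a standard S3 client at a HyperStore endpoint instead of AWS.
# Endpoint, credentials, and names below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://hyperstore.example.com",
    aws_access_key_id="TENANT_A_ACCESS_KEY",
    aws_secret_access_key="TENANT_A_SECRET_KEY",
)

s3.create_bucket(Bucket="media-archive")
s3.put_object(Bucket="media-archive", Key="videos/intro.mp4",
              Body=open("intro.mp4", "rb"))

# The same client code works unchanged against Amazon S3, which is what
# makes hybrid setups and auto-tiering to S3/Glacier practical.
```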
Your Way with Enterprise Control
- Role-based access: system/group/user level, per bucket or object, bucket or object ACLs
- Quality of service: quotas per group/region/user, auto enforcement
- Monitoring: SNMP & JMX, charge-back, reports
- Encryption: AES-256, transparent
- Programmable APIs on HyperStore software-defined storage
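The AES-256 encryption and per-object ACLs above can be requested through the standard S3 API. A sketch, again with placeholder endpoint, credentials, and object names, assuming the cluster accepts the usual S3 server-side-encryption header:

```python
import boto3

# Same S3-compatible client as before; endpoint and credentials are placeholders.
s3 = boto3.client("s3", endpoint_url="https://hyperstore.example.com",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")

# Request AES-256 server-side encryption for one object via the standard S3 header.
s3.put_object(Bucket="media-archive", Key="reports/q3.pdf",
              Body=open("q3.pdf", "rb"),
              ServerSideEncryption="AES256")

# Per-object access control uses standard S3 ACLs, e.g. a canned ACL:
s3.put_object_acl(Bucket="media-archive", Key="reports/q3.pdf", ACL="private")
```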
What's Inside
Distributed & Elastic
- Geo cluster: server <-> vnodes <-> disks
- Add a node -> automatic rebalance
- Peer-to-peer system = no single point of failure (SPOF)
- Distributed everything: data, metadata, configuration
- User-defined location affinity across data centers (DC1, DC2)
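The vnode layout behaves like a consistent-hash token ring: each server owns many virtual nodes, so adding a server only moves the token ranges it takes over. A toy sketch of that idea, not Cloudian's actual implementation (which builds on Cassandra's ring):

```python
import bisect
import hashlib

class TokenRing:
    """Toy consistent-hash ring with virtual nodes (vnodes)."""

    def __init__(self, vnodes_per_server=32):
        self.vnodes_per_server = vnodes_per_server
        self.ring = []  # sorted list of (token, server) pairs

    def _token(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server: str):
        # Each server gets many vnodes; only keys whose tokens fall inside the
        # newly owned ranges move, which is the "auto rebalance" effect.
        for i in range(self.vnodes_per_server):
            bisect.insort(self.ring, (self._token(f"{server}#{i}"), server))

    def owner(self, object_key: str) -> str:
        token = self._token(object_key)
        idx = bisect.bisect(self.ring, (token, "")) % len(self.ring)
        return self.ring[idx][1]

ring = TokenRing()
for s in ["node1", "node2", "node3"]:
    ring.add_server(s)
print(ring.owner("bucket1/object.jpg"))  # deterministic placement, no central lookup
```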
Multi-Data-Center
- Replicas can span data centers
- Replicas with location constraint
- Topologies: single data center / single region; multi-data-center / single region; multi-data-center / multi-region
Data Flow: An S3 PUT
1. The S3 request is sent to any node. That node acts as the coordinator and applies the storage policy (e.g., 2 replicas in DC1, 1 replica in DC2).
2. Cassandra computes a token for the object and selects the set of data nodes.
3. The request is sent to the data nodes in parallel.
4. At each data node (in DC1 and DC2), the data is written to disk and a response is sent back to the coordinator.
5. Once the required number of responses (based on the consistency level) has been received from the data nodes, the coordinator responds to the S3 client.
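A schematic of the coordinator's write path under a consistency level. This is purely illustrative, not Cloudian's code; the data nodes are assumed to expose a hypothetical write() method that returns True on a successful disk write:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def coordinate_put(object_key, data, data_nodes, required_acks):
    """Illustrative coordinator write path: send the write to all replicas in
    parallel and acknowledge the client once `required_acks` replicas confirm."""
    pool = ThreadPoolExecutor(max_workers=len(data_nodes))
    futures = [pool.submit(node.write, object_key, data) for node in data_nodes]
    acks = 0
    for future in as_completed(futures):
        if future.result():            # replica wrote the data to disk
            acks += 1
        if acks >= required_acks:      # e.g. QUORUM over 3 replicas needs 2 acks
            pool.shutdown(wait=False)  # stop waiting; stragglers finish on their own
            return "200 OK"            # respond to the S3 client now
    pool.shutdown(wait=False)
    return "503 Service Unavailable"   # consistency level could not be met
```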
HyperStore Dynamic Consistency Levels
Conventional approach: static and inflexible
- Synchronous replication. Pro: no data loss. Con: every write must reach a remote location.
- Asynchronous replication. Pro: works without access to a remote location. Con: risk of data loss when a disaster hits a data center.
Cloudian multi-DC: static but flexible setting
- Synchronous replication via consistency levels: EACH_QUORUM, LOCAL_QUORUM, or QUORUM
- Asynchronous replication
Cloudian 5.x multi-DC: dynamic change, flexible setting
- Consistency can step down at run time: #1 EACH_QUORUM / LOCAL_QUORUM, #2 QUORUM, #3 asynchronous replication
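A rough sketch of how those consistency levels translate into required acknowledgements, using the usual Cassandra-style quorum arithmetic (quorum = floor(n / 2) + 1); the exact behaviour in HyperStore may differ:

```python
def required_acks(level, replicas_per_dc, local_dc="DC1"):
    """Acknowledgements the coordinator needs before answering the client.
    `replicas_per_dc` maps data center -> replica count, e.g. {"DC1": 2, "DC2": 1}."""
    total = sum(replicas_per_dc.values())
    if level == "EACH_QUORUM":    # synchronous: a quorum in every data center
        return {dc: n // 2 + 1 for dc, n in replicas_per_dc.items()}
    if level == "LOCAL_QUORUM":   # a quorum in the coordinator's own data center
        return {local_dc: replicas_per_dc[local_dc] // 2 + 1}
    if level == "QUORUM":         # a quorum of all replicas, regardless of location
        return {"any": total // 2 + 1}
    if level == "ASYNC":          # asynchronous replication: reply immediately
        return {"any": 0}
    raise ValueError(f"unknown consistency level: {level}")

print(required_acks("EACH_QUORUM", {"DC1": 2, "DC2": 1}))  # {'DC1': 2, 'DC2': 1}
print(required_acks("QUORUM", {"DC1": 2, "DC2": 1}))       # {'any': 2}
```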
Configurable Data Protection
- Optimal data storage for all data types and sizes: metadata, small objects, large objects
- Replication: for active content, optimized for performance with chunking / multipart uploading
- Erasure coding: for deep archives, optimized for space consumption, with per-object data protection selection
Configurable HyperStore Erasure Coding
- Based on Vandermonde Reed-Solomon error correction
- Geographical protection against failure at a lower footprint cost
- Suitable for archive data, backup, large content and multimedia
- Objects are divided into k data fragments and m coded (parity) fragments
Cloudian differentiators:
- Configurable per object
- Configurable per bucket (storage pool)
- Fully configurable; default is (4,2): 4 data fragments + 2 checksum fragments spread across 6 servers
- For example, a 4+2 scheme can survive the failure of any 2 disks or nodes
(Credit: http://nisl.wayne.edu/papers/tech/dsn-2009.pdf, page 2)
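To see why erasure coding lowers the footprint, compare the raw-capacity overhead of a 4+2 scheme against 3-way replication; simple arithmetic, not a Cloudian sizing tool:

```python
def storage_overhead(k, m):
    """Raw bytes stored per logical byte for a k+m erasure-coding scheme."""
    return (k + m) / k

def replication_overhead(rf):
    """Raw bytes stored per logical byte for rf-way replication."""
    return rf

# 4+2 erasure coding: 1.5x raw capacity, tolerates loss of any 2 fragments.
print(storage_overhead(4, 2))   # 1.5
# 3-way replication: 3.0x raw capacity, tolerates loss of any 2 replicas.
print(replication_overhead(3))  # 3
# Storing 100 TB of objects: 150 TB raw with EC 4+2 vs. 300 TB raw with RF=3.
```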
HyperStore Adaptive Policy Engine (multi-tenant, continuous, adaptive)
Policy dimensions per virtual storage pool:
- Object size optimization (e.g., objects < 50 KB)
- Erasure coding (N+1, 2, 3, 4)
- Replication (RF = 1, 2, 3, 4)
- Access control (ACLs, expiration)
- Cloud archive (S3 tier, > 24 hr)
Policy-controlled virtual storage pools (buckets, as in Amazon):
- Scale or reduce storage on demand
- Multi-tenanted, with many application tenants on the same infrastructure
- Dynamically adjust protection policies
- Optimize for small objects by policy
- Cloud archiving by virtual pool
(Virtual pools 1-3 share one HyperStore storage pool of heterogeneous nodes; a policy-selection sketch follows below.)
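A schematic of how such a policy engine might pick a protection treatment per object, using only the dimensions listed above. The thresholds and names are illustrative, not Cloudian defaults, apart from the < 50 KB small-object hint on the slide:

```python
from dataclasses import dataclass

@dataclass
class BucketPolicy:
    small_object_limit: int = 50 * 1024   # objects below this size are replicated
    replication_factor: int = 3           # RF for small / hot objects
    ec_scheme: tuple = (4, 2)             # k+m erasure coding for large objects
    archive_after_hours: int = 24         # tier cold objects to a public cloud

def protection_for(object_size: int, age_hours: float, policy: BucketPolicy) -> str:
    """Pick a storage treatment for one object under its bucket's policy."""
    if age_hours > policy.archive_after_hours:
        return "tier to cloud archive (e.g. Amazon S3 / Glacier)"
    if object_size < policy.small_object_limit:
        return f"replicate, RF={policy.replication_factor}"
    k, m = policy.ec_scheme
    return f"erasure code, {k}+{m}"

policy = BucketPolicy()
print(protection_for(8_000, 1.0, policy))         # replicate, RF=3
print(protection_for(500_000_000, 1.0, policy))   # erasure code, 4+2
print(protection_for(500_000_000, 48.0, policy))  # tier to cloud archive ...
```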
Control with QoS
- Define limits with warning and maximum levels, per user and per group:
  - Storage bytes
  - Storage objects
  - Requests per minute
  - Inbound bytes per minute
  - Outbound bytes per minute
- If a limit is reached, requests are rejected until the next window
- Exposed through programmable APIs on HyperStore software-defined storage
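The per-minute limits behave like a fixed-window rate limiter: requests beyond the maximum are rejected until the window rolls over. A toy illustration of that behaviour, not the HyperStore admin API:

```python
import time

class FixedWindowLimit:
    """Reject requests over `max_per_window` until the next window starts."""

    def __init__(self, max_per_window: int, window_seconds: int = 60,
                 warning_level: float = 0.8):
        self.max = max_per_window
        self.window = window_seconds
        self.warning = int(max_per_window * warning_level)
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:  # new window: reset the counter
            self.window_start, self.count = now, 0
        self.count += 1
        if self.count == self.warning:
            print("warning level reached")          # e.g. raise an alert
        return self.count <= self.max               # False -> reject the request

requests_per_min = FixedWindowLimit(max_per_window=100)
print(all(requests_per_min.allow() for _ in range(100)))  # True: within quota
print(requests_per_min.allow())                           # False: rejected until next window
```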
HyperStore Success Stories
- 2,300,000 users storing data on a single HyperStore system
- 3,500 enterprises on a single HyperStore system
HyperStore Features
- Natively S3
- Hybrid storage cloud
- Extreme durability
- Multi-tenant
- Geo-distribution
- Scale out
- Intelligence in software
- Smart support
- Billing & reporting
- Data protection
- Quality of service
- Programmable
Software Defined?
- Hardware: preconfigured commodity server, available in 3 options (entry level, capacity optimized, performance optimized)
- OR Software: choose your server and install on supported RedHat and CentOS
HyperStore Use Cases
- Media content store
- Backup
- Archive
- Data analytics
- Private cloud
- File distribution & sharing
THANK YOU. Questions?
www.cloudian.com
Cloud Storage for Everyone