Christian Bolik (bolik@de.ibm.com), IBM Storage Software Development
Software-Defined Storage (SDS)
Objectives
- Understand the driving forces behind the desire to move to SDS
- Understand the purpose of SDS and its relation to SDE and Cloud
- Gain insight into the two primary perspectives on SDS
- Learn what is required to establish an SDS
The increasing complexity, volume, and value of data: 8 zettabytes of digital content created by 2015
The Information Explosion: the information explosion meets budget reality
- Information doubling every 18-24 months
- Storage growing 20-40% per year
- Storage budgets up 1%-5%
Drivers:
- Velocity of change: acquisitions, mergers, consolidations
- ILM, data retention initiatives
- "Born on the Web" type applications
- Legal requirements: regulations demanding data to be retained for many years
- Ever increasing variety of data stored digitally
[Chart: data volume growing from gigabytes through terabytes, petabytes, and exabytes to zettabytes, 2000-2015]
Big Data
Source: http://www.domo.com/blog/2012/06/how-much-data-is-created-every-minute/
Storage management pain points
- Top pain points are dominated by growth management, cost, and complexity
- Problems seem even more severe for midsize enterprises than for large enterprises
Source: The InfoPro Storage Study 1H12, 451 Research
Managing increasing amounts of storage takes time and money
- Survey respondents reported that 77% of storage staff time is devoted to administration of ongoing operations, that is, to tasks that could be automated.
Source: The InfoPro Storage Study 1H12, 451 Research
The special needs of Virtual Servers
- 60% of storage spend in 2011 was for attachment to virtual servers (2011 storage spend: $20.082 virtualized, $13.331 non-virtualized; IDC Storage Workloads, 10/2012)
- Nearly all customers reported some storage issue with VM usage.
- Virtual servers bring their own unique storage requirements and need special consideration for: new capacity; new operational processes; DR; new performance management; planning considerations
Source: Research Report: 2012 Storage Market Survey, Enterprise Strategy Group. Created for Connie Bright, IBM. 2012 Enterprise Strategy Group, Inc. All Rights Reserved
Changing Workload Requirements
Systems of Engagement (Situational Need): Agility & Rapid Scale, Born on Cloud
- Orchestration across compute/network/storage for provisioning, deployment and management of workloads (DevOps)
- Dynamic scalability as applications and data requirements grow
- Cost-optimized storage via disks embedded in servers
- Multi-tenant security at a fine-grained, highly scaled level
- Open support of industry standards and APIs
Systems of Record (Traditional Operations): Workload Optimized & Transaction Integrity, Enabled for Cloud
- Orchestration across compute/network/storage for provisioning, deployment, and management of workloads
- Automation of provisioning and configuration of storage based on application requirements, with ongoing adjustments based on policies/SLA
- Programmable adjustments to storage (via APIs) as application needs change
- Heterogeneous environment support
- Efficient management of data copies (backup/archive/compliance)
Value is shifting to software to provide the dynamic and agile storage environment required by these workloads
Introducing: Software-Defined Storage (SDS)
IDC Definition of SDS: "A software-defined data center is...a loosely coupled set of software components that seek to virtualize and federate datacenter-wide hardware resources such as storage, compute, and network resources... The goal for a software-defined datacenter is to...make the datacenter available in the form of an integrated service..."
Key attributes:
- It is software: flexibility, lower cost
- Offers a full suite of storage services: abstraction of storage capabilities
- Federates physical storage capacity from multiple locations/technologies: flexibility through virtualisation
Based on IDC's Worldwide Software-Based (Software-Defined) Storage Taxonomy, 2013
Software Defined Storage = programmable smart storage
Today's World, with No Software Defined Storage
1. A Workload Definition Layer (or application) defines storage capacity requirements
2. Storage administrators define logical volumes with the required storage capacity and make a best guess at performance requirements
3. Storage administrators map the logical volumes to the application
4. All of the following events need storage administrators' intervention:
   a) Storage capacity needs to be increased or decreased
   b) Application performance degrades due to resource contention
   c) Performance requirements change (increase or decrease)
   d) Data protection needs change
   e) Replication policies change
   f) RPO and RTO of the data change
   g) Backup and archive policy changes
The New World with Software Defined Storage: Programmable Storage
1. A Workload Definition Layer (or application) will specify storage requirements explicitly:
   a) Performance
   b) Capacity
   c) RPO/RTO
   d) Replication, etc.
2. A Workload Orchestrator will schedule the workload with appropriate compute, network and storage resources to satisfy Service Level Objectives
3. If the performance of an application is impacted, the storage service will automatically detect it and adjust resources to maintain its Service Level Agreements
4. If the requirements change, applications will communicate with storage via APIs, and the storage service will adjust the resources accordingly
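The "programmable storage" flow above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names StorageRequirements, StorageService, provision, update and enforce are hypothetical and do not correspond to any real product API, and the in-memory class only stands in for an SDS control plane.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StorageRequirements:
    """Requirements a workload states explicitly (hypothetical field names)."""
    capacity_gb: int
    min_iops: int
    rpo_minutes: int
    rto_minutes: int
    replicas: int


class StorageService:
    """Hypothetical in-memory stand-in for an SDS control plane."""

    def __init__(self):
        self._volumes = {}  # volume name -> StorageRequirements

    def provision(self, name, requirements):
        # Step 1/2: the workload hands over its requirements; no admin guesses.
        self._volumes[name] = requirements
        return name

    def update(self, name, **changes):
        # Step 4: the application changes its requirements through the API.
        self._volumes[name] = replace(self._volumes[name], **changes)
        return self._volumes[name]

    def enforce(self, name, observed_iops):
        # Step 3: compare observed behaviour with the stated objective and
        # report the action the service would take on its own.
        if observed_iops < self._volumes[name].min_iops:
            return "rebalance"
        return "ok"
```

A workload would call provision once, and later call update when its needs change; enforce represents the periodic check that replaces manual administrator intervention.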
Key characteristics of an SDS-enabled Storage Service
- Commoditized persistent data storage (lower cost)
- Service-based infrastructure (easy to consume)
- Platform based on open standards and interfaces (no vendor lock-in)
- Focus on the solution rather than the technical platform (application-oriented)
- Scalable (capacity, throughput, performance)
- Resilient (always available)
- Workload-aware (fit for purpose, optimized)
- Covering block, file and object storage
- Cost-efficient and highly automated
SDS in SDE (Software Defined Environments)
- Tighter coordination between applications and storage/network:
  - Exposing storage capabilities so that software can dynamically provision storage with the most suitable characteristics
  - Introducing new operations between software and storage to let storage better adapt to the needs of software
  - Integrating storage functions into the software to leverage higher-level knowledge
- Control planes are separated from the hardware and moved to the software layer
- Unified control planes allow rich resource abstractions to assemble purpose-fit systems
- Programmable infrastructures allow for dynamic optimization to respond to business requirements
[Diagram: SDE Unified Control Plane (cross-domain orchestration, workload abstraction, resource abstraction) on top of SDC, SDN, and SDS control and config; below them, via APIs, the data plane with heterogeneous compute resources and capabilities, virtualized network, and heterogeneous storage resources and capabilities]
Relation of SDS and SDE to Cloud Layering
- Business Process as a Service: enabling business transformation (Business Process Solutions)
- Software as a Service: marketplace of high value consumable business applications (External Ecosystem, Industry, Collaboration, Human Resources, Big Data & Analytics, Commerce, Marketing)
- Platform as a Service: composable and integrated application development platform, built using open standards (Development, Big Data & Analytics, Security, Integration, Mobile, Social, Traditional Workloads)
- Infrastructure as a Service: enterprise class, optimized infrastructure via Software-Defined Environments (SDE), built using open standards (Software-Defined Compute, Software-Defined Storage, Software-Defined Networking)
Public. Private. Dynamic hybrid.
Different views of the same coin... Expectations on a Storage Service:
Consumer
- Self-service
- Flexible and dynamic, elastic
- Cost-efficient, no overprovisioning
- Charged by capacity and service level used
- Reliable and always available
- No need to have any knowledge of infrastructure details
Provider
- Highly automated storage lifecycle management
- Virtualized and standards-based, simple capacity planning and forecasting
- Automated and optimized, space-efficient
- Capacity reporting and metering, multi-tenancy-enabled
- Highly available, replicated, self-monitoring and self-healing, secure
- Automated mapping of consumer requirements to infrastructure capabilities
Key Aspect of IT Service Management in General: Mapping Business Requirements (consumer) to Infrastructure Capabilities (provider), with separation of concerns between the two
What this means for Storage Service Management: Mapping Business Requirements to Infrastructure Capabilities
- Business requirements: capacity, accessibility, availability, performance, security, retention/compliance
- Infrastructure capabilities: media type, disk technologies, RAID levels, encryption, compression, thin provisioning, number of copies, access latency, access protocols, backup/replication, etc.
Establishing a service catalog of supported service classes from which service consumers can choose
- Service Catalog: Service Class Platinum ($$$$), Service Class Gold ($$$), Service Class Silver ($$), Service Class Bronze ($)
- Different service classes map to different levels of service in some or all of the service level categories: Accessibility, Availability, Performance, Consistency, Retention / Compliance, Security
Defining Requirements for Storage Services: Service Level Categories and Service Level Objectives (SLOs)
- Accessibility: Initial Access Time; Data Sharing; Requires Access Transparency; Max. Out-Of-Space Duration
- Availability: Availability Period; Planned Downtime; Max. Unplanned Downtime Aggregate; Max. Unplanned Downtime Per Instance; Recovery Point Objective (RPO); Recovery Time Objective (RTO)
- Consistency: Number Of Copies; Number Of Versions; Retain Deleted
- Performance: Avg. I/O Rate; Avg. Data Throughput
- Retention / Compliance: Immutability; Disposal; Durability; Retention Period
- Security: Accountability; Integrity; Authenticity; Confidentiality; Physical Security
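The service catalog and SLO ideas above can be sketched in Python as a lookup that returns the cheapest service class still meeting a consumer's objectives. The class names come from the slides; every number (prices, RPO/RTO minutes, I/O rates, copies) and the names ServiceClass, CATALOG and cheapest_class are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceClass:
    name: str
    price_per_gb_month: float  # illustrative price, not a real tariff
    rpo_minutes: int           # Recovery Point Objective the class commits to
    rto_minutes: int           # Recovery Time Objective the class commits to
    avg_iops: int              # average I/O rate the class commits to
    copies: int                # number of copies kept


# Hypothetical catalog; only the four class names are taken from the slides.
CATALOG = [
    ServiceClass("Platinum", 0.40, 0, 15, 20000, 3),
    ServiceClass("Gold", 0.25, 15, 60, 10000, 2),
    ServiceClass("Silver", 0.12, 240, 480, 2000, 2),
    ServiceClass("Bronze", 0.05, 1440, 2880, 500, 1),
]


def cheapest_class(max_rpo, max_rto, min_iops, min_copies, catalog=CATALOG):
    """Return the lowest-priced class satisfying all objectives, or None."""
    candidates = [
        c for c in catalog
        if c.rpo_minutes <= max_rpo
        and c.rto_minutes <= max_rto
        and c.avg_iops >= min_iops
        and c.copies >= min_copies
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.price_per_gb_month)
```

The consumer states objectives only; which disks, RAID levels or replication mechanisms back a class stays a provider concern.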
Mapping storage resource and management capabilities to SLOs
- Accessibility (Initial Access Time; Data Sharing; Requires Access Transparency; Max. Out-Of-Space Duration): Tape/Disk/Flash, HSM, NAS exports, vaulting, thin provisioning, ...
- Availability (Availability Period; Planned Downtime; Max. Unplanned Downtime Aggregate; Max. Unplanned Downtime Per Instance; Recovery Point Objective (RPO); Recovery Time Objective (RTO)) and Consistency (Number Of Copies; Number Of Versions; Retain Deleted): Metro Mirror, Global Mirror, Snapshots (app-aware?), Backup/Restore (file/image-level), versioning, ...
- Performance (Avg. I/O Rate; Avg. Data Throughput): different disk media (RPMs etc.), tape, flash, RAID levels, cache, ...
- Retention / Compliance (Immutability; Disposal; Durability; Retention Period): WORM storage, automated deletion, data shredding, media lifetime, ...
- Security (Accountability; Integrity; Authenticity; Confidentiality; Physical Security): encryption, key management, access controls, lockable cabinets, etc.
Provider's Goal: Maximize storage capacity, minimize down-time
Store data at as little cost as possible while maintaining committed SLAs (Service Level Agreements). How?
- Thin provisioning: Allocate only as much storage as is used, expand allocation as needed
- Compression: Reduce actual capacity used
- Data deduplication: Store only one copy of files/blocks containing the same data
- Tiering: Always place data on the lowest cost storage tier which still fulfills customer requirements, optimize continuously
- Monitoring: Threshold-based alerting to detect impending performance bottlenecks early, balance volumes to address them
- Virtualization: Employ virtualization to have the freedom of moving data to lower cost storage without any downtime
Optimal Storage Tier Distribution: Tier 0: 1-5%; Tier 1: 15-20%; Tier 2: 20-25%; Tier 3: 50-60%
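Of the techniques listed above, block-level deduplication is easy to show in a few lines. The following Python sketch is a toy illustration under stated assumptions: the DedupStore class, its method names and the fixed block size are hypothetical, and real systems add reference counting, persistence and collision handling that are left out here.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one physical copy per distinct block."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # SHA-256 digest -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if no identical block is present yet.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Reassemble the file from its block references.
        return b"".join(self.blocks[d] for d in self.files[name])

    def logical_bytes(self):
        # Capacity as seen by consumers.
        return sum(len(self.blocks[d]) for ds in self.files.values() for d in ds)

    def physical_bytes(self):
        # Capacity actually consumed on the backing storage.
        return sum(len(block) for block in self.blocks.values())
```

The gap between logical_bytes and physical_bytes is the capacity the provider saves; thin provisioning and compression widen the same gap by other means.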
Flexibility through Storage Virtualization
Traditional Storage
- Capacity is isolated in SAN islands
- Multiple management points
- Potentially poor capacity utilization
- Capacity is purchased for and owned by individual applications
With Storage Virtualization
- Combines storage capacity into one large storage pool
- Single management point
- Uses storage assets more efficiently
- Capacity purchases can be deferred
- Plus: non-disruptive data migration between storage resources
[Diagram: without virtualization, SAN islands on HDS, IBM, EMC, and HP storage at 95%, 20%, and 50% capacity; with a storage hypervisor, one pool over the same storage at 55% capacity]
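The capacity figures on this slide (95%, 20% and 50% in isolated islands versus 55% in the pool) are consistent with simple averaging. The Python sketch below assumes three islands of equal size; that equal-size assumption and the function name pooled_utilization are not from the slides.

```python
def pooled_utilization(islands):
    """Utilization of one pool built from (capacity, used_fraction) islands."""
    total_capacity = sum(capacity for capacity, _ in islands)
    total_used = sum(capacity * used for capacity, used in islands)
    return total_used / total_capacity


# Three equally sized islands (100 TB each is an arbitrary choice).
islands = [(100, 0.95), (100, 0.20), (100, 0.50)]
print(f"{pooled_utilization(islands):.0%}")
```

In the isolated case the island at 95% forces a purchase even though the others are largely empty; in the pooled case the same data occupies 55% of the combined capacity, which is why purchases can be deferred.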
Storage Management Interface Abstraction via SMI-S
- SMI-S (Storage Management Initiative Specification) was started in 2002 with the goal of standardizing the management interfaces of storage devices
- SMI-S is currently supported by 21 different vendors (http://www.snia.org/ctp/conformingproviders/index.html)
- SMI-S is developed by a workgroup of the SNIA (Storage Networking Industry Association); it has since been accepted as both an ANSI and an ISO standard
- SMI-S builds on CIM (Common Information Model), defined by the DMTF; it uses XML for formatting the payload and HTTP as the transport mechanism
OSLC: Built on Linked Data
"Linked Data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried" [1]
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards
4. Include links to other URIs, so that they can discover more things
OSLC (Open Services for Lifecycle Collaboration, http://open-services.net/) describes a method for integrating disparate tools across domains by providing a set of integration services through which other tools can be discovered and more information about resources retrieved. This is enabled by Linked Data.
[1] Bizer, Heath, Berners-Lee (2009). "Linked Data - The Story So Far"
OpenStack provides an open mechanism for provisioning/managing storage to workloads and is driving a rapidly developing ecosystem
- OpenStack storage includes Cinder (provision and manage block storage for compute), Swift (object storage), and Manila (future: file storage)
- IBM provides support for OpenStack, and provides extensions through standard mechanisms
- OpenStack provides a common, open interface for ISVs, applications and cloud admins to provision and manage storage resources
- Integrated with compute and networking management
[Diagram: OpenStack Cinder (commands, capabilities, drivers) and OpenStack Swift (object middleware), with columns Community, Enable & Extend, and Differentiate. Labels: community/competitor storage support; IBM storage; TPC; IBM solution for internal storage; IBM Enterprise Object Storage Solution Platform; Smarter Data Protection; Smarter Mgmt on any storage; integration with SDN and SDC]
In Summary...
- The exponential and ongoing growth in data storage requirements calls for new, more flexible storage management methods
- Software-Defined Storage promises to provide the flexibility, service orientation, and cost-efficiency required to address today's requirements
- By abstracting storage resource capabilities through service classes and APIs, SDS can snap into an SDE
- The two primary views on SDS are those of the service consumer and the service provider, with overlapping goals and expectations
- The main challenges for the provider are to map consumer-specified business requirements to storage infrastructure capabilities, and to maintain committed SLAs
- Various technologies and methods exist that enable the provider to offer an attractively priced, yet sustainable storage service
BACKUP
What's the problem with storage these days? Data growth is exponential
- The world's total data per person: 0.8 GB/person (2003), 24 GB/person (2006), 128 GB/person (2010)
- Digital information created, captured, replicated worldwide: 2006: 180 exabytes; 2007: 280 exabytes; ...; 2011: 1800 exabytes (1800 billion gigabytes)
- Expected compound annual growth rate is almost 60%
- Drivers: velocity of change (acquisitions, mergers, consolidations); ILM, data retention initiatives; "Born on the Web" type applications; legal requirements (regulations demanding data to be retained for many years); ever increasing variety of data stored digitally
Sources: IDC, Worldwide Disk Storage Systems 2007-2011 Forecast Update, Doc #209490; IDC Whitepaper: The Diverse and Exploding Digital Universe, March 2008
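The "almost 60%" growth rate can be checked against the exabyte figures quoted on this slide. The short Python sketch below uses the standard compound annual growth rate formula; the function names are chosen here for illustration, and the doubling-time helper relates the result to the "doubling every 18 months" claim made elsewhere in the deck.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def annual_growth_from_doubling(months_to_double):
    """Annual growth rate implied by a given doubling time in months."""
    return 2 ** (12 / months_to_double) - 1


# 180 exabytes in 2006 growing to 1800 exabytes in 2011 (five years).
print(f"CAGR 2006-2011: {cagr(180, 1800, 5):.1%}")
print(f"Doubling every 18 months: {annual_growth_from_doubling(18):.1%}")
print(f"Doubling every 24 months: {annual_growth_from_doubling(24):.1%}")
```

A tenfold increase over five years works out to roughly 58.5% per year, and doubling every 18 months corresponds to roughly 58.7% per year, so the two statements describe essentially the same growth.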
Globally, storage requirement is 80% file-based unstructured data, and growing
[Chart: Worldwide Storage Capacity Shipped by Segment, 2008-2013]
- Explosion of data, transactions, and digitally-aware devices strains IT infrastructure and operations.
- Storage capacity is doubling every 18 months.
- The majority of this data is unstructured and file-based, such as user files, medical images, web and rich media content, growing at 63%
- Block storage, while still well suited for existing OLTP/database workloads, is not where the majority of strategic analytics-based applications and strategic storage initiatives are being deployed
Source: IDC, State of File-Based Storage Use in Organizations: Results from IDC's 2009 Trends in File-Based Storage Survey, Dec 2009, Doc #221138
Customer Storage Needs - General
- Across a range of customer types, rapid growth of unstructured data, the complexity of data protection, and hardware costs are the biggest challenges.
- There is a list of other issues, made worse by the size and growth rates of data: space constraints, poor utilization; management tasks; long implementation times; lack of skills; staff costs
- Several system trends show through to storage: support for virtual server environments; support for VDI
Source: Research Report: 2012 Storage Market Survey, Enterprise Strategy Group. Created for Connie Bright, IBM. 2012 Enterprise Strategy Group, Inc. All Rights Reserved
Storage Services Layer
- Service classes: Platinum, Gold, Silver, Bronze
- Security and Availability: Authentication/Auditing, Encryption, Mirroring/DR, High Availability, Backup & Recovery
- Performance and Opt.: Striping, Clustering, Compression, Deduplication, Tiering/ILM
- Layers: Workload Abstraction, Resource Abstraction, Mapping to Resource, Continuous Optimization
- Domains: Software Defined Compute, Software Defined Storage, Software Defined Network
- Software Defined Storage functions:
  - Resiliency: storage replication, disaster recovery, consistency groups, backup
  - Heterogeneity: storage abstraction, storage provisioning, storage monitoring, SAN/GPFS/NAS/DAS
  - Capability optimization: storage tiers, performance aware placement, continuous optimizations, migration
  - Fabric management: FC/FCoE/iSCSI/Infiniband, zone management
What is needed for Software Defined Storage?
- Control Plane (incl. resource abstraction) - Management: Storage Resource Management, Storage Service Management, Business Continuity Management, Data Protection Management
- Data Plane - I/O:
  - Devices: Block Storage Systems / Storage Arrays, File Storage Systems / NAS Filers, Object Storage Systems, Tape Systems / Archive Systems, Storage Virtualizers, Storage Networks
  - Services: Thin Provisioning, De-Duplication, Data Replication, Encryption, Compression, ...
IBM SmartCloud Virtual Storage Center: Example of a Software Defined Storage Platform
Key attributes (IDC):
- It is software
- It offers a full suite of storage services
- It federates physical storage capacity
[Diagram: Feature Options. Control Plane Layer: Tivoli Storage Productivity Center / FlashCopy Manager as Management Software Platform (policy-based management and automation, snapshot and backup management). Data Plane Layer: IBM Storwize as Storage Software Platform, with Security and Availability (Authentication/Auditing, Encryption, Mirroring/DR, High Availability, Backup & Recovery), Performance and Opt. (Striping, Clustering, Compression, Deduplication, Tiering/ILM), Object Storage, Cluster File System, and Block Virtualization, on top of the Storage Infrastructure]