IBM Research - Zurich
GPFS Cloud ILM
Storage Research Technology Outlook
Dr. Thomas Weigold (twe@zurich.ibm.com)
Manager Cloud Storage & Security, IBM Research Zurich
Why Cloud Storage? Economics!
- Lower total cost
  - No capital expenditure
  - Greatly reduced IT staff
  - No floor space, heating, cooling, storage
  - Pay for actual usage, not unused capacity
  - Opportunity cost
- Reliable/Available/Distributed/Maintained
  - Someone else worries about backups!
  - No h/w maintenance, service, upgrades, troubleshooting
  - Economies of scale at service provider
- Unlimited capacity
  - On-demand capacity rather than forecasting usage
- Reduced time for deployment
  - Reduced time for procurement
From Cloud to Intercloud
Cloud-based object storage systems are a success story...
- Prices and scale that cannot be met with traditional architectures
- Popular and successful (Amazon S3 exceeded 2 trillion objects in April 2013)
- Simple APIs (Put/Get, Key/Value)
...with several drawbacks (main customer concerns: security, resilience, vendor lock-in)
- Security: poor passive security of stored data (unless client-side encryption is used)
- Resilience: cloud downtime causes data unavailability
- Vendor lock-in: each cloud provider has its own API, so switching means large effort and risk
- Storage simplicity goes hand in hand with a lack of enterprise features
Storage in the intercloud
- Provider-agnostic API: one API exposed to the client, different APIs in the background
- Added client-side intelligence offers enterprise-grade features
- Use multiple clouds (public/private)
  - Limits trust in a single provider
  - Heterogeneity ensures genuine failure independence
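The intercloud idea (one API for the client, several providers behind it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `BlobStore`, `InMemoryStore`, and `ReplicatedStore` are hypothetical and are not the toolkit's actual API, and the in-memory backends stand in for real provider REST clients.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical provider-agnostic interface: put/get by key."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(BlobStore):
    """Stand-in for one cloud provider; a real backend would call its REST API."""

    def __init__(self, name: str):
        self.name = name
        self.objects = {}
        self.available = True  # set to False to simulate a provider outage

    def put(self, key, data):
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        self.objects[key] = data

    def get(self, key):
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        return self.objects[key]


class ReplicatedStore(BlobStore):
    """Writes every object to all backends; reads from the first that answers."""

    def __init__(self, backends):
        self.backends = list(backends)

    def put(self, key, data):
        for backend in self.backends:
            backend.put(key, data)

    def get(self, key):
        last_error = None
        for backend in self.backends:
            try:
                return backend.get(key)
            except (ConnectionError, KeyError) as err:
                last_error = err
        raise LookupError(f"no backend could serve {key!r}") from last_error
```

With two backends, an outage of one provider leaves the data readable through the other, which is the failure independence the slide refers to. Full replication is only one possible strategy; it is used here because it is the simplest to show.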
Multi-Cloud Storage Toolkit in a Nutshell
- What: A software-defined enterprise cloud storage gateway
- Why: Address customer concerns regarding cloud security, resilience, and vendor lock-in
- Goal: Enable storage products to natively support public/private cloud storage
Architecture (diagram):
- Cloud-enabled storage products (e.g. GPFS/GSS, Storwize)
- Toolkit library: provider-agnostic API (Java, C); keys (local, cloud, KMIP)
- Cloud storage providers, reached via provider-specific REST APIs: private cloud, Rackspace, IBM Softlayer, Microsoft Azure, Amazon S3
A software-defined enterprise cloud storage gateway for:
- transparent data migration
- backup and disaster recovery
- security and high availability
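As a rough illustration of what client-side processing in such a gateway might look like, the sketch below compresses data and attaches an integrity tag before it would be uploaded. This is not the toolkit's implementation: the function names `protect` and `unprotect` are hypothetical, and encryption is deliberately left out because the Python standard library ships no cipher; a real gateway would add it at this point.

```python
import hashlib
import hmac
import zlib

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def protect(data: bytes, key: bytes) -> bytes:
    """Compress, then prepend an HMAC-SHA256 tag over the compressed bytes."""
    body = zlib.compress(data)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def unprotect(blob: bytes, key: bytes) -> bytes:
    """Verify the tag before decompressing; raise ValueError on tampering."""
    tag, body = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return zlib.decompress(body)
```

Because the key never leaves the client, the provider cannot modify a stored object without the change being detected on retrieval.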
Value proposition
- No capital expenditure
- Lower total cost
- Pay-as-you-go
- Reliable/Available
- Distributed/Maintained
- Unlimited capacity
- Reduced time for deployment
Diagram labels: Cloud Storage; Hosts; Gold Storage Pool; Silver Storage Pool; Bronze Storage Pool; Tape Storage; read/write; read only; SHARING
GPFS Information Lifecycle Management (ILM)
Use case 1: GPFS Cloud ILM
- Goal: Enable a secure, reliable, transparent cloud storage tier in GPFS (GSS, Storwize Unified, etc.)
- Motivation: Manage data growth by placing file data in the right tier at the right time according to its value, while keeping it available under one common namespace at any time and leveraging the economies of scale of the cloud
- A single namespace across disk/tape/cloud for tiering, migration, backup, DR, data sharing, ...
Diagram labels: Single Name Space (AFM, POSIX, NFS, CIFS, Object); GPFS Cluster; GPFS; External Stores; Metadata; Policies for backup and tiering; ICStore Toolkit (Resilience, Integrity, Encryption, Keys); SSD; Fast Disk; Slow Disk; LTFS Tape; IBM Softlayer; Amazon S3; MS Azure; Private Cloud
GPFS Cloud ILM Customer Value & Research Directions
Seamless file migration between disk and cloud
- Cloud storage becomes a GPFS storage pool
- There can be multiple cloud pools with different properties, e.g. plain storage for a private cloud, compression and encryption for a public cloud
- Mix and match public and private clouds
- GPFS policy selects files for migration and defines the target cloud pool
- File system metadata always remains online
- File metadata includes cloud container/object ID
- Data is retrieved transparently on demand
- Cloud ILM can coexist with tape (TSM HSM/LTFS EE)
- Cloud acts as cold data store with low latency
Full file system backup for disaster recovery (DR)
- Scale out backup/restore (SOBAR)
  - Pre-migrate all files to the cloud (copy on disk and cloud)
  - Create snapshot of file system and create metadata image using SOBAR
  - Store the metadata image to the cloud
- Rapid restore by restoring metadata only (low RTO)
- Restore files selectively (e.g. most important first)
Diagram labels: Hosts; Gold Storage Pool; Silver Storage Pool; Bronze Storage Pool; Off-line Storage; Cloud Store
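The migration flow on this slide (a policy selects files, metadata stays online and records the object ID, data comes back on demand) can be modelled in a few lines. This is a toy model, not GPFS policy syntax or GPFS code: `FileEntry`, `select_for_migration`, `migrate`, and `read` are hypothetical names, the idle-days rule is an assumed example policy, and a plain dict stands in for the cloud pool.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileEntry:
    """File system metadata; it stays online even after the data is migrated."""
    name: str
    days_since_access: int
    data: Optional[bytes]            # None once the data lives only in the cloud
    object_id: Optional[str] = None  # cloud object ID recorded in the metadata


def select_for_migration(files, min_idle_days):
    """Assumed example policy: pick resident files idle for at least min_idle_days."""
    return [f for f in files
            if f.data is not None and f.days_since_access >= min_idle_days]


def migrate(entry, cloud):
    """Copy the data to the cloud pool and leave only metadata behind."""
    entry.object_id = uuid.uuid4().hex
    cloud[entry.object_id] = entry.data
    entry.data = None


def read(entry, cloud):
    """Transparent recall: fetch the data from the cloud if it is not resident."""
    if entry.data is None:
        entry.data = cloud[entry.object_id]
    return entry.data
```

A caller of `read` does not need to know whether the file was resident or recalled, which is the sense in which migration is transparent.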
GPFS Cloud ILM Customer Value & Research Directions
Efficient data sharing between remote clusters
- Pre-migrate files to the cloud
- Replicate file system metadata to target cluster via active file management (AFM)
- Target cluster can now access files in the cloud
Run workloads locally or in the cloud
- Run workloads on local files
- Migrate files to the cloud
- Run workloads in the cloud (e.g. for scale out)
- Access files outside GPFS
- Propagate changes back to the local GPFS cluster
Seamless data migration between cloud providers
- Switch to new cloud provider immediately
- Data is transparently migrated in the background
File-level backup/restore
- Selective incremental backup and restore on file granularity
Diagram labels: data; metadata
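One way to picture "switch immediately, migrate in the background" is a wrapper that sends new writes to the new provider, falls back to the old provider on reads, and moves objects over one at a time. This is a sketch under assumptions: `MigratingStore` is a hypothetical name, and two dicts stand in for the old and new providers.

```python
class MigratingStore:
    """Serves requests during a provider switch (hypothetical illustration)."""

    def __init__(self, old: dict, new: dict):
        self.old = old
        self.new = new

    def put(self, key, data):
        # New writes go straight to the new provider.
        self.new[key] = data
        self.old.pop(key, None)  # drop any stale copy at the old provider

    def get(self, key):
        # Prefer the new provider; fall back to objects not yet migrated.
        if key in self.new:
            return self.new[key]
        return self.old[key]

    def migrate_step(self) -> bool:
        """Move one object to the new provider; return False when done."""
        if not self.old:
            return False
        key, data = self.old.popitem()
        self.new.setdefault(key, data)
        return True
```

A background task would call `migrate_step` until it returns False, while clients keep using `put` and `get` throughout.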
GPFS Cloud ILM Demo (featured at Edge 2013 and CeBIT 2014)
- Threshold-based space management on a cloud-enabled Storwize V7000 Unified (migrating files to a local Swift cluster and S3)
- Video: http://youtu.be/iz8sze9gros
Diagram labels: disks; GPFS exported over Samba; GPFS ILM Connector; Policy Enforcer; Multi-cloud Storage Toolkit; jclouds; S3 API; Swift API; Amazon S3; Private Cloud; Cloud storage
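Threshold-based space management is commonly described with a high and a low watermark: once usage passes the high mark, files are migrated until usage drops to the low mark. The sketch below assumes that scheme and a least-recently-accessed ordering; the function name `files_to_migrate` and the default thresholds are illustrative choices, not values from the demo.

```python
def files_to_migrate(files, capacity, high=0.8, low=0.6):
    """Pick files to migrate once disk usage exceeds the high watermark.

    files: list of (name, size, last_access) tuples; a smaller last_access
    means the file was used longer ago. Returns the names to migrate,
    oldest first, until usage is at or below the low watermark.
    """
    used = sum(size for _, size, _ in files)
    if used <= high * capacity:
        return []  # below the high watermark: nothing to do
    chosen = []
    for name, size, _ in sorted(files, key=lambda f: f[2]):
        if used <= low * capacity:
            break
        chosen.append(name)
        used -= size
    return chosen
```

Using two thresholds instead of one avoids triggering a migration run again right after the previous one finishes.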
Use Case 2: SVC Native Cloud Backup
(Video: http://www.youtube.com/watch?v=6zrhbqakmdi&feature=youtu.be)
- The multi-cloud storage toolkit runs alongside SVC and stores full/incremental snapshots of SVC volumes to the cloud.
- The toolkit applies encryption, integrity protection, etc. as configured.
- Snapshots can be restored from the cloud to the original or to a new SVC volume.
- Cloud backup management is integrated in the SVC GUI.
A cloud-based time machine for enterprise block storage
Built-in, easy-to-use feature for various use cases:
- Backup
- Disaster recovery
- Data sharing
- Migration/archiving
- Compliance/auditing
Diagram labels: GUI; CLI; SVC/V7000; production volumes; Feature; NEW; FlashCopy; Toolkit (Compression, Encryption, Integrity, Resilience, Keys); Metadata; full/incremental snapshots; Private Cloud; Public Cloud
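To show why incremental snapshots are cheap to store, the sketch below uploads only blocks the store has not seen before and records a manifest per snapshot. Note the substitution: it detects changes by content hash, which is an assumption chosen for brevity and not necessarily how the SVC prototype tracks changed blocks. The names `snapshot` and `restore` and the tiny block size are illustrative only.

```python
import hashlib

BLOCK = 4  # unrealistically small block size, chosen to keep the example readable


def snapshot(volume: bytes, store: dict) -> list:
    """Upload blocks not yet in the store; return the manifest of block hashes."""
    manifest = []
    for offset in range(0, len(volume), BLOCK):
        block = volume[offset:offset + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block  # only new or changed blocks are uploaded
        manifest.append(digest)
    return manifest


def restore(manifest: list, store: dict) -> bytes:
    """Rebuild a volume image from a snapshot manifest."""
    return b"".join(store[digest] for digest in manifest)
```

The first call acts as a full snapshot; later calls add only the changed blocks, yet every manifest can be restored on its own, to the original volume or to a new one.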
SVC Native Cloud Backup Prototype (featured at CeBIT 2014)
Demo: A V7k using an OpenStack Swift private cloud as backup target
- Schedule/trigger backups
- Monitor progress
- Browse and restore backups
Diagram labels: Host server (exports the volume via CIFS); FC; Storwize GUI; V7000; production volume; Native Cloud Backup; full/incremental snapshots; backup/restore (Ethernet); OpenStack Swift Cluster
Thanks!