Utilizing the SDSC Cloud Storage Service
PASIG Conference, January 13, 2012
Richard L. Moore, rlm@sdsc.edu
San Diego Supercomputer Center, University of California San Diego
Traditional Supercomputer Center Storage Systems
- Tape-based archival system: built for capacity. We've extended the archive beyond HPC simulation data to experimental data and other digital assets, and as a node in geographically distributed digital preservation systems (e.g., Chronopolis).
- High-bandwidth parallel file system: built for speed; transient data, single-copy reliability.
- Home directory system (e.g., NFS): built for robustness and reliability; regular backups.
Limitations
- Archival data is difficult to access: high latency, lower bandwidth, limited user interfaces.
- Difficult for multiple users to share archival data.
- All too often archived data, particularly HPC simulations, is write-once-read-never.
- Not sustainable, and no incentives for users to retain only high-value data.
Adapting to Emerging Requirements and Changing Technologies
- Exponential data growth, and analysis of that data, are increasingly important to the research enterprise. This requires ready access to data, with low latency and high bandwidth.
- Collaborative team science demands easy data sharing.
- Consumer product development drives prices: disk capacities are increasing quickly, and flash memory is becoming more affordable. The Gordon compute system, just now being deployed, includes 0.25 PB of flash to fill the latency gap between DRAM and spinning disk.
- For HPC systems with historical byte/flop ratios, storage would be an increasingly significant fraction of total system cost.
- We can't afford open-ended archival storage; we must develop methods to place value on data, especially for long-term, high-reliability storage.
SDSC Is Deploying a New Repertoire of Storage Systems
- SDSC Cloud: storage of digital data for ubiquitous access and high durability. Access: multi-platform web interface, S3 interfaces, backup software.
- Data Oasis (PFS): high-performance transient parallel file system for HPC. Access: Lustre on HPC systems (Gordon, Trestles, Triton).
- Project Storage: typical project/user file server storage needs. Access: NFS/CIFS, iSCSI.
A Paradigm Shift for Long-Term Storage: Access, Sharing and Collaboration
SDSC Cloud: http://cloud.sdsc.edu
- Launched September 2011.
- Largest, highest-performance known academic cloud: 5.5 petabytes (raw), 8 GB/sec.
- System can upload 500 GB in ~1 minute.
- Automatic dual-copy and verification.
- Capacity and performance scale linearly to 100s of petabytes.
- Open-source platform based on NASA and Rackspace software.
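The headline numbers above are internally consistent: at the quoted 8 GB/sec aggregate bandwidth, a 500 GB upload takes about a minute. A quick back-of-the-envelope check (plain arithmetic, nothing SDSC-specific assumed):

```python
# Back-of-the-envelope check of the quoted SDSC Cloud transfer rate.
def transfer_time_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to move size_gb gigabytes at bandwidth_gb_per_s GB/sec."""
    return size_gb / bandwidth_gb_per_s

# 500 GB at the quoted 8 GB/sec aggregate bandwidth:
t = transfer_time_seconds(500, 8)
print(f"{t:.1f} s (~{t / 60:.1f} min)")  # 62.5 s (~1.0 min)
```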
Key Features of the SDSC Cloud
- Always-there, disk-based availability of data: tape latency and multi-user issues addressed.
- High reliability: disk RAID; automatic dual-copy; continuous background checksum verification/restoration; offsite replication soon.
- Simple data-owner interfaces to data, its management, its access, and setting permissions for sharing data.
- Easy access to shared data for any users with permission, under a range of mechanisms (HTTP, APIs, portals, gateways).
- Encryption readily incorporated, addressing issues of storing HIPAA/proprietary data.
- Transaction history is logged: track usage, assess utility, support provenance.
- Scalable system in both capacity and bandwidth.
- Interfaces to commercial and open-source products.
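The background checksum verification listed above can be sketched in a few lines. This is a simplified illustration of the general technique (recompute a stored object's checksum and compare it to the one recorded at write time), not SDSC's actual auditor implementation:

```python
import hashlib

def object_is_intact(data: bytes, stored_md5_hex: str) -> bool:
    """Recompute an object's MD5 and compare it to the checksum
    recorded at write time; a mismatch would flag this copy for
    restoration from the other replica."""
    return hashlib.md5(data).hexdigest() == stored_md5_hex

# A background auditor would walk stored objects and re-verify each:
good = b"simulation output"
checksum = hashlib.md5(good).hexdigest()
print(object_is_intact(good, checksum))           # True
print(object_is_intact(b"bit-rotted", checksum))  # False
```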
Applications of the SDSC Cloud
- Shared/published/curated data collections
- HPC simulation data storage and sharing
- Web/portal applications and site hosting
- Application integration using supported APIs
- Serving images/videos
- Backup services
Why OpenStack Swift Cloud Software?
Software evaluated:
- OpenStack Swift: open source, community support, highly configurable.
- Eucalyptus: highly flexible, but compute-focused.
- Caringo Castor: commercial software, long development cycle.
Why OpenStack:
- Industry standard: more than 100 leading companies from over a dozen countries are participating in OpenStack, including Cisco, Citrix, Dell, Intel and Microsoft.
- Highly compatible: compatibility with public OpenStack clouds means it's easy to migrate data and apps to public clouds when desired, based on security policies, economics, and other key business criteria.
- Proven software: the OpenStack cloud operating system is the same software that powers many large public and private clouds, including Rackspace Cloud Storage.
- Control and flexibility: the open-source platform means no lock-in to a proprietary vendor, and the modular design can integrate with legacy or third-party technologies. The OpenStack project is provided under the Apache 2.0 license.
SDSC Cloud Interfaces
- Data owners use traditional clients: GUI applications, the command line, and the SDSC web interface.
- All traffic flows through load-balanced proxy servers into the Swift object storage cluster.
- External users reach data through web services APIs: Amazon S3 and Rackspace CloudFiles / OpenStack APIs.
- Commercial backup products (CommVault, Amanda, CrashPlan) and user-developed web portals/gateways build on the same interfaces.
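As a concrete illustration of the OpenStack/CloudFiles-style API path above: once a client has authenticated, reading an object is a single HTTP GET against `{storage_url}/{container}/{object}` with an `X-Auth-Token` header. The sketch below only constructs the URL and headers for such a Swift v1-style request; the endpoint and token shown are placeholders, not real SDSC values:

```python
def swift_get_request(storage_url: str, token: str,
                      container: str, obj: str) -> tuple[str, dict]:
    """Build the URL and headers for a Swift v1-style object GET.
    The token comes from a prior request to the cluster's auth
    endpoint (sent with X-Auth-User / X-Auth-Key headers)."""
    url = f"{storage_url.rstrip('/')}/{container}/{obj}"
    headers = {"X-Auth-Token": token}
    return url, headers

# Placeholder endpoint and token, for illustration only:
url, headers = swift_get_request(
    "https://cloud.example.org/v1/AUTH_acct", "tok123",
    "mycontainer", "results.dat")
print(url)      # https://cloud.example.org/v1/AUTH_acct/mycontainer/results.dat
print(headers)  # {'X-Auth-Token': 'tok123'}
```

The same URL with PUT/DELETE (and the same token header) covers upload and removal, which is what lets off-the-shelf tools and user-built portals share one interface.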
SDSC Cloud Explorer
Rates and Funding Mechanisms
See https://cloud.sdsc.edu/hp/pricing.php for current pricing; hardware costs are subject to market volatility. Contact services@sdsc.edu if interested in the service.
On-Demand Cloud Storage
- Pay monthly per GB used (high-water mark).
- University of California users: $X/TB-year dual-copy, plus applicable indirect costs, plus a 50% premium for an additional off-site copy (when available).
- Users external to UC: 2×$X/TB-year dual-copy; 3×$X for dual-copy plus one off-site copy.
Condo Cloud Storage
- Recipient buys hardware that is integrated into the storage service and pays annual operating costs for maintenance and system administration.
- Purchase condo hardware at $Y market price (pre-configured head node and disk array; currently 2 TB drives with 8.5 TB usable dual-copy; space will increase over time).
- Annual operating cost: $Z/year per condo, plus applicable indirect costs and UC-external factors.
- User has the right to use the condo for 5 years; TCO per condo over 5 years = $Y + 5×$Z.
*Encryption and HIPAA-compliant storage are available with both options.
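The condo cost model above reduces to simple arithmetic: a one-time hardware purchase of $Y plus five annual operating payments of $Z. A minimal sketch, with the dollar figures left as parameters (the demo values below are made-up placeholders, not SDSC's actual $Y and $Z):

```python
def condo_tco(hw_cost_y: float, annual_op_z: float, years: int = 5) -> float:
    """Total cost of ownership for a condo allocation: up-front
    hardware ($Y) plus `years` of operating costs ($Z/year).
    Indirect costs and UC-external premiums are excluded."""
    return hw_cost_y + years * annual_op_z

# Hypothetical placeholder values, for illustration only:
print(condo_tco(hw_cost_y=10_000, annual_op_z=1_000))  # 15000
```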
Questions?
Get a trial account with a .edu email address at cloud.sdsc.edu (no charges for the first 30 days).