Cloud storage with Apache jclouds



Similar documents
Technical Overview Simple, Scalable, Object Storage Software

Aspera Direct-to-Cloud Storage WHITE PAPER

SWIFT. Page:1. Openstack Swift. Object Store Cloud built from the grounds up. David Hadas Swift ATC. HRL 2012 IBM Corporation

Aspera Direct-to-Cloud Storage WHITE PAPER

A programming model in Cloud: MapReduce

SMB in the Cloud David Disseldorp

Hadoop & its Usage at Facebook

Gladinet Cloud Access Solution Simple, Secure Access to Online Storage

Amazon Glacier. Developer Guide API Version

Hadoop & its Usage at Facebook

Windows Azure Storage Essential Cloud Storage Services

Practical Data Integrity Protection in Network-Coded Cloud Storage

Client-aware Cloud Storage

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Maginatics Cloud Storage Platform Feature Primer

Backing up to the Cloud

HOST EUROPE CLOUD STORAGE REST API DEVELOPER REFERENCE

Cloud Computing with Windows Azure using your Preferred Technology

Cloud and Big Data initiatives. Mark O Connell, EMC

COSC 6397 Big Data Analytics. Distributed File Systems (II) Edgar Gabriel Spring HDFS Basics

Apache Hadoop FileSystem and its Usage in Facebook

CloudFTP: A free Storage Cloud

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

IBM Spectrum Protect in the Cloud

WP4: Cloud Hosting Chapter Object Storage Generic Enabler

Apache Hadoop FileSystem Internals

ONLINE BACKUP AND RECOVERY USING AMAZON S3

CLOUD STORAGE USING HADOOP AND PLAY

Intro to AWS: Storage Services

NetApp Data Fabric: Secured Backup to Public Cloud. Sonny Afen Senior Technical Consultant NetApp Indonesia

Service Description Cloud Storage Openstack Swift

5 HDFS - Hadoop Distributed System

Introduction to Azure: Microsoft s Cloud OS

Herve Roggero 3/3/2015

SECURE BACKUP SYSTEM DESKTOP AND MOBILE-PHONE SECURE BACKUP SYSTEM HOSTED ON A STORAGE CLOUD

Job Reference Guide. SLAMD Distributed Load Generation Engine. Version 1.8.2

Distributed File Systems

Flight Workflow User's Guide. Release

Platforms in the Cloud

CSE 344 Introduction to Data Management. Section 9: AWS, Hadoop, Pig Latin TA: Yi-Shu Wei

Transforming cloud infrastructure to support Big Data Ying Xu Aspera, Inc

Clouds and Other Computa1onal Frameworks. Evere7 Toews, Cybera Inc. Todd King, UCLA

A block based storage model for remote online backups in a trust no one environment

Finding a needle in Haystack: Facebook s photo storage IBM Haifa Research Storage Systems

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

Amazon Simple Notification Service. Developer Guide API Version

GDC Data Transfer Tool User s Guide. NCI Genomic Data Commons (GDC)

Hadoop Distributed File System. T Seminar On Multimedia Eero Kurkela

STORAGE S3 API. Storage S3 API. Programmer s Guide. Revision 1.0 (20/04/2012) Lunacloud Tel: Southampton Row info@lunacloud.

WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

How To Image A Single Vm For Forensic Analysis On Vmwarehouse.Com

References. Introduction to Database Systems CSE 444. Motivation. Basic Features. Outline: Database in the Cloud. Outline

Introduction to Database Systems CSE 444

OpenStack. Orgad Kimchi. Principal Software Engineer. Oracle ISV Engineering. 1 Copyright 2013, Oracle and/or its affiliates. All rights reserved.

GeoCloud Project Report GEOSS Clearinghouse

Amazon Hosted ESRI GeoPortal Server. GeoCloud Project Report

Hadoop on OpenStack Cloud. Dmitry Mescheryakov Software

User Guide. Gladinet

Implementing Cloud Storage with OpenStack Swift

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

Distributed Filesystems

UForge 3.4 Release Notes

Storage Made Easy Enterprise File Share and Sync (EFSS) Cloud Control Gateway Architecture

The State of Cloud Storage

SDFS Overview. By Sam Silverberg

High Performance Computing OpenStack Options. September 22, 2015

Hadoop-based Open Source ediscovery: FreeEed. (Easy as popcorn)

Cloud Computing. Up until now

Cloud Computing Trends

OpenStack Introduction. November 4, 2015

Mobile Cloud Computing T Open Source IaaS

WELCOME TO CITUS CLOUD LOAD TEST

The OpenStack TM Object Storage system

Cloud on TEIN Part I: OpenStack Cloud Deployment. Vasinee Siripoonya Electronic Government Agency of Thailand Kasidit Chanchio Thammasat University

How To Choose Cloud Computing

Open Text Archive Server and Microsoft Windows Azure Storage

A SHORT INTRODUCTION TO CYBERDUCK WITH CLOUD OBJECT STORAGE. Version

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

CLOUD CRUISER FOR WINDOWS AZURE PACK

Hadoop Big Data for Processing Data and Performing Workload

Cloud Elements! Events Management BETA! API Version 2.0

Manage cloud infrastructures using Zend Framework

Cloud Sync White Paper. Based on DSM 6.0

StorReduce Technical White Paper Cloud-based Data Deduplication

From Internet Data Centers to Data Centers in the Cloud

Reduction of Data at Namenode in HDFS using harballing Technique

Introduction to Cloud : Cloud and Cloud Storage. Lecture 2. Dr. Dalit Naor IBM Haifa Research Storage Systems. Dalit Naor, IBM Haifa Research

Service Level Agreement for Windows Azure operated by 21Vianet

Transcription:

Cloud storage with Apache jclouds Andrew Gaul 19 November 2014 http://jclouds.apache.org/ http://gaul.org/ 1 / 29

Overview What is Apache jclouds What is object storage Basic concepts How to put and get objects Advanced topics 2 / 29

What is Apache jclouds Interface with public and private cloud services via Java Provider-specific implementations, e.g., Amazon S3, OpenStack Swift Cross-provider, portable abstractions, e.g., BlobStore Supports all major providers 3 / 29

Why object storage Scale out to billions of objects and petabytes of data Lower cost, higher performance, better reliability Applications and tenants can share one object store Many competitive public and private cloud offerings 4 / 29

Why not object storage Must rewrite existing applications against new interfaces Distributed systems introduce complexity and additional failure modes Lacks features that databases and filesystems have 5 / 29

What is object storage Distributed system of storage nodes Supports narrow interface: Create and delete containers Put, get, and delete objects List objects in a container Modify and retrieve metadata 6 / 29

Example applications Store photos and videos for a social networking site Archive large data sets, e.g., scientific measurements 7 / 29

Basic concepts APIs and Providers Regions Containers Objects 8 / 29

APIs and Providers (1) API defines how to communicate with a cloud Provider specializes a given API, e.g., regions Support major APIs: Atmos, Azure, Google, S3, Swift Support many providers: Glacier, HP, Rackspace 9 / 29

APIs and Providers (2) 10 / 29

Regions Most public providers offer multiple geographic regions Usually associate a container with a given region Some providers can replicate between regions automatically, others offer a per-object cross-region copy 11 / 29

Containers Containers can contain many named objects Some providers have partial directory support Some providers allow publicread access containers 12 / 29

Objects Objects have a name, stream of bytes, and metadata map Must write entire stream of bytes but can read ranges Metadata map includes standard HTTP headers, e.g., Content-Type: text/plain and arbitrary user headers, e.g., x- amz-meta-foo: bar 13 / 29

Example put and get try (BlobStoreContext context = ContextBuilder.newBuilder("transient").credentials("identity", "credential").buildview(blobstorecontext.class)) { BlobStore blobstore = context.getblobstore(); ByteSource payload = ByteSource.wrap( new byte[] { 1, 2, 3, 4 }); Blob blob = blobstore.blobbuilder(blobname).payload(payload).contentlength(payload.size()).build(); blobstore.putblob(containername, blob); blob = blobstore.getblob( containername, blobname); try (InputStream is = blob.getpayload().openstream()) { ByteStreams.copy(is, os); } } 14 / 29

Consistency (1) Some blobstores have weak semantics and can return stale data for some time putblob(name, data1), putblob(name, data2), getblob(name) can return data1 or data2 15 / 29

Consistency (2) Applications may need retry loops or other fallback logic S3 (standard region) and Swift have various forms of eventual consistency Atmos, Azure, and GCS have strong consistency, more like a file system 16 / 29

Data integrity (1) Provider guarantees object integrity at-rest via content hash Application can guarantee object integrity on-the-wire by providing and verifying Content-MD5 header Guava has useful helpers 17 / 29

Data integrity (2) Guarantee integrity during putblob: ByteSource payload = ByteSource.wrap( new byte[] { 1, 2, 3, 4 }); Blob blob = blobstore.blobbuilder(blobname).payload(payload).contentlength(payload.size()).contentmd5(payload.hash(hashing.md5())).build(); blobstore.putblob(containername, blob); This reads InputStream twice 18 / 29

Data integrity (3) Guarantee integrity during getblob: Blob blob = blobstore.getblob(containername, blob); HashCode md5 = HashCode.fromBytes(blob.getMetadata().getContentMetadata().getContentMD5()); try (HashingInputStream his = new HashingInputStream( blob.getpayload().openstream()), Hashing.md5()) { ByteStreams.copy(his, os); if (md5.equals(his.hash())) { throw new IOException(); } } 19 / 29

Large object sizes Some objects may not fit in memory or in a byte[] Non-repeatable payloads, e.g., InputStream Repeatable payloads, e.g., Guava ByteSource, can retry after network failure 20 / 29

Multi-part upload Some objects are too large for a single put operation, e.g., AWS S3 > 5 GB, Azure > 64 MB jclouds abstracts these details via: Blob blob =... blobstore.putblob(containername, blob, new PutOptions().multipart()); 21 / 29

Many objects Amazon S3 supports an arbitrary number of objects per container Swift recommends max 500,000 objects per container Atmos recommends max 65,536 objects per directory Solution: shard over multiple containers or directories 22 / 29

URL signing (1) Most object stores allow creating a URL which your application can vend to external clients client 1. request URL 2. vend signed URL 3. access object jclouds blobstore 23 / 29

URL signing (2) URL is cryptographically signed; allows time-limited access to a single object Clients interact directly with the object store, removing your application as the bottleneck 24 / 29

Amazon Glacier Optimized for storage, not retrieval Read requests can take several hours to complete Least expensive public cloud provider, often cheaper than private cloud 25 / 29

Local blobstores In-memory (transient) appropriate for unit testing Filesystem has production uses with limited number of objects or quantity of data Remote clients require running a HTTP server and implementing authorization 26 / 29

jclouds-cli Command-line tool useful for administrative and debugging tasks: jclouds blobstore container-list \ --provider aws-s3 \ --identity $IDENTITY \ --credential $CREDENTIAL jclouds blobstore write \ --provider aws-s3 \ --identity $IDENTITY \ --credential $CREDENTIAL \ $CONTAINER_NAME $OBJECT_NAME $FILE_NAME 27 / 29

References https://github.com/jclouds/jcloudsexamples https://github.com/andrewgaul/s3proxy http://gaul.org/object-storecomparison/ 28 / 29

Thank you! http://jclouds.apache.org/ http://gaul.org/ 29 / 29