www.basho.com Technical Overview Simple, Scalable, Object Storage Software



Similar documents
Multi-Datacenter Replication

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Introduction to Cloud : Cloud and Cloud Storage. Lecture 2. Dr. Dalit Naor IBM Haifa Research Storage Systems. Dalit Naor, IBM Haifa Research

CLOUD BASED SERVICE (CBS STORAGE)

The OpenStack TM Object Storage system

Simple Storage Service (S3)

Diagram 1: Islands of storage across a digital broadcast workflow

How to Choose Between Hadoop, NoSQL and RDBMS

Alfresco Enterprise on AWS: Reference Architecture

Apache Hadoop. Alexandru Costan

TABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models

TECHNOLOGY WHITE PAPER Jun 2012

Designing a Cloud Storage System

Scalable Architecture on Amazon AWS Cloud

Hadoop Distributed File System. Dhruba Borthakur June, 2007

vsphere Replication for Disaster Recovery to Cloud

VMware vsphere Data Protection

Aspera Direct-to-Cloud Storage WHITE PAPER

Information Retrieval Elasticsearch

The last 18 months. AutoScale. IaaS. BizTalk Services Hyper-V Disaster Recovery Support. Multi-Factor Auth. Hyper-V Recovery.

Enterprise Private Cloud Storage

Implementing Multi-Tenanted Storage for Service Providers with Cloudian HyperStore. The Challenge SOLUTION GUIDE

SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

StorReduce Technical White Paper Cloud-based Data Deduplication

Reference Model for Cloud Applications CONSIDERATIONS FOR SW VENDORS BUILDING A SAAS SOLUTION

Assignment # 1 (Cloud Computing Security)

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Release Notes. LiveVault. Contents. Version Revision 0

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

Creating a Cloud Backup Service. Deon George

TECHNOLOGY WHITE PAPER Jan 2016

Maginatics Cloud Storage Platform Feature Primer

Epimorphics Linked Data Publishing Platform

The Google File System

Service Catalogue. virtual services, real results

HDFS Users Guide. Table of contents

Amazon Glacier. Developer Guide API Version

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems

Enterprise GIS Architecture Deployment Options. Andrew Sakowicz

Hitachi Content Platform (HCP)

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Federated Application Centric Infrastructure (ACI) Fabrics for Dual Data Center Deployments

Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH

Evolution from Big Data to Smart Data

SWIFT. Page:1. Openstack Swift. Object Store Cloud built from the grounds up. David Hadas Swift ATC. HRL 2012 IBM Corporation

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Hadoop & its Usage at Facebook

Features of AnyShare

Web Application Deployment in the Cloud Using Amazon Web Services From Infancy to Maturity

HDFS: Hadoop Distributed File System

EMC SCALEIO OPERATION OVERVIEW

Building Storage Service in a Private Cloud

Introduction to Cloud Computing

Storage Made Easy Enterprise File Share and Sync (EFSS) Cloud Control Gateway Architecture

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Cloud storage with Apache jclouds

ZooKeeper. Table of contents

vsphere Replication for Disaster Recovery to Cloud

Exposing the Cloud: It It s More than a Buzzword Tim Connors, Director, AT&T AT&T

The deployment of OHMS TM. in private cloud

Site Recovery Manager Installation and Configuration

A programming model in Cloud: MapReduce

Informatica Cloud & Redshift Getting Started User Guide

Cloud Models and Platforms

Microsoft Private Cloud Fast Track

STeP-IN SUMMIT June 18 21, 2013 at Bangalore, INDIA. Performance Testing of an IAAS Cloud Software (A CloudStack Use Case)

POWER ALL GLOBAL FILE SYSTEM (PGFS)

Hadoop IST 734 SS CHUNG

Vess A2000 Series HA Surveillance with Milestone XProtect VMS Version 1.0

Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations

12 Key File Sync and Share Advantages of Transporter Over Box for Enterprise

owncloud Architecture Overview

Architecture Overview

NoSQL and Hadoop Technologies On Oracle Cloud

Technical. Overview. ~ a ~ irods version 4.x

Managing your Red Hat Enterprise Linux guests with RHN Satellite

Hadoop in the Hybrid Cloud

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Amazon Cloud Storage Options

Amazon Elastic Beanstalk

CTERA Agent for Mac OS-X

Hadoop & its Usage at Facebook

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hypertable Architecture Overview

319 MANAGED HOSTING TECHNICAL DETAILS

Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000

Search and Real-Time Analytics on Big Data

Transcription:

www.basho.com Technical Overview Simple, Scalable, Object Storage Software

Table of Contents Table of Contents... 1 Introduction & Overview... 1 Architecture... 2 How it Works... 2 APIs and Interfaces... 3 Administration and Users... 3 Storage API... 3 Multipart Upload... 3 Storage and Usage Statistics... 3 Operations... 4 Monitoring and Metrics... 4 Scaling Out... 4 Riak S2 Enterprise:... 4 Multi- Datacenter Replication... 4 Next Steps... 5 Getting Started... 5

Introduction & Overview Riak S2 (formerly Riak CS) is enterprise object storage software. It is open source and built on top of Riak, offering S3- compatible, multi- tenant large object storage. It can be used for building public or private clouds, or as reliable storage for applications and platforms. Use cases for Riak S2 include: Storage for images, text, documents, videos and other large objects Internal storage providing on- demand capacity for business units A foundational storage layer for public clouds and cloud services Infrastructure for moving applications off of Amazon S3, or for hybrid solutions Redundant storage for backup and disaster recovery scenarios The primary features of Riak S2 are: Highly available, fault- tolerant storage Large object support S3- compatible API, authentication, and authorization Multi- tenancy with per- user reporting on usage and network I/O Simple operational model for adding capacity Riak S2 Enterprise, a commercial offering, adds: Multi- datacenter replication for active backups, disaster recover, and data locality Per- node or capacity- based pricing, inclusive of 24/7 support This technical brief covers Riak S2 architecture, APIs, and operations with a brief note on Riak S2 Enterprise pricing. Additional technical details on Riak S2 can be found in the online documentation. 1

Architecture Riak S2 is a cloud storage product built on top of the open source, distributed database Riak. Riak S2 handles large file uploads and exposes an S3- compatible API, user and administrative functions, and usage reporting. The underlying Riak cluster stores objects and metadata for retrieval - providing replication, fault tolerance and high availability for data stored in the system. We recommend a one- to- one mapping of Riak KV (formerly named Riak) to Riak S2 nodes, so there should be one Riak KV node for every Riak S2 node. Ideally, Riak KV and Riak S2 nodes should be run on the same physical machine to minimize unnecessary network traffic. How it Works In a Riak S2 system, any node can respond to client requests - there is no master node and each node has the same responsibilities. Since data is replicated (three replicas per object by default) and other nodes automatically take over the responsibility of failed or non- communicative nodes, data remains available even in the event of node failure or network partition. More information on how Riak handles failure conditions can be found in the Riak documentation. When an object is uploaded via the storage API, Riak S2 breaks the object into smaller chunks that are streamed, written, and replicated in Riak. Each chunk is associated with metadata for later retrieval. The diagram below provides a visualization. 2

APIs and Interfaces Riak S2 has HTTP interfaces for handling storage operations, authentication, administration, and per- user storage and bandwidth statistics. Below is an overview and additional details can be found in the Riak S2 documentation. Administration and Users Riak S2 exposes an interface for user creation, disablement, and credential management. Once created and issued credentials, users are able to authenticate, create buckets, upload and download files to their account, retrieve account information, obtain new credentials, or disable their account through the API. Data access can also be defined based on IP address. Riak S2 can be configured so only administrators can create users, or so that anyone can create a user directly. Administrator accounts have additional special privileges: they can retrieve a list of all users in the system and query the user account information of any user. Riak S2 uses a request serializer to ensure that global entities (Riak S2 users and bucket names) are unique in the system. One instance of this serializer (Stanchion) must be installed in every Riak S2 system. Riak S2 supports the standard S3 authentication scheme, with support for header and query parameter authorization. In addition, integration with OpenStack's Keystone authentication is available. Storage API The Riak S2 storage interface is S3- compatible and can be configured for use with existing S3 tools, including s3cmd, official S3 libraries, and clients such as Fog and Boto. Like S3, the API has simple, RESTful GET, PUT, Copy, and DELETE operations for objects and buckets in addition to support GET range queries. Riak S2 also supports the addition of arbitrary metadata to objects so users can store more useful information about their objects. Additionally, S3- style ACLs are provided for managing object and bucket permissions. Riak S2 also supports integration with OpenStack's Object Storage API. Multipart Upload As of the Riak S2 1.3 release, Riak S2 supports multipart uploads. At its simplest, multipart upload allows the uploading of a single large object as a series of parts. Once the individual parts have been uploaded, Riak S2 presents the data as a single object. The parts can be between 5MB and 5GB in size. Storage and Usage Statistics Operators can use the Riak S2 storage, usage, and network statistics to support use cases like accounting, subscription, billing, or multi- group utilization for public or private clouds. Riak S2 will report information on how much storage a named user is consuming and the network operations related to accessing it. This data can be queried on the default timespan now, or as a range from start time through end time. Access statistics are reported as bytes in and bytes out for both object and bucket operations. Reporting of this information can be scheduled for a set interval or manually triggered. Additional details billing and usage reporting are available in the docs. 3

Operations Riak S2 is designed to be easy to operate and scale, with monitoring and metrics solutions and a simple model for adding capacity as needed. Monitoring and Metrics Riak S2 exposes stats on critical operations that are accessible via HTTP request. Additionally, Riak S2 and Riak nodes both have DTrace support for analysis of running systems. In version 1.5 Syslog support was added to facilitate log aggregation. Scaling Out Adding new capacity to a Riak S2 cluster requires installing both Riak S2 and Riak on a new physical node and joining the node to the cluster using several simple management commands. Riak automatically redistributes data in the system so all nodes have equal responsibility, preventing storage hot spots and decreasing the operational burden of adding new nodes. As part of the Riak S2 Enterprise license, Basho offers set up, configuration, and operational support for the implementation. Basho also provides Chef recipes to automatically install Riak S2 on all nodes in the system. Riak S2 Enterprise: Multi- Datacenter Replication Riak S2 Enterprise is a commercial extension of Riak S2 that provides multi- datacenter replication and 24/7 support. Multi- datacenter replication for Riak S2 Enterprise provides two modes of object replication: full- sync and real- time sync. Data is streamed over a TCP connection, and multi- datacenter replication has support for SSL so data can be securely replicated between sites. In Riak S2, large objects are broken into blocks and streamed to the underlying Riak cluster on write, where they are replicated for high availability (three replicas by default). A manifest for each object is maintained so that blocks can be retrieved from the cluster and the full object presented to clients. For multi- datacenter replication in Riak S2 Enterprise, global information for users, bucket information, and manifests are streamed in real- time from a primary implementation to a secondary site, so global state is maintained across locations. Objects can then be replicated in either full- sync or real- time sync mode. In full- sync, objects are replicated from a primary Riak S2 Enterprise implementation to a secondary site on a configurable interval the default is six hours. During full- sync replication, each cluster computes a hash for each key s block value. Key/block pairs are compared, and the primary site streams any missing blocks or updates needed to the secondary site. 4

Real- time sync is triggered when an update is sent from a client to a primary Riak S2 Enterprise implementation. Once replicated in the first location, the updates are streamed in real- time to the secondary site. But what if a client requests an object from the secondary cluster and not all of its blocks have been replicated to that cluster? With Riak S2 Enterprise multi- datacenter replication, the secondary cluster will request any missing blocks from the primary cluster so the client request can be served quickly and accurately. Riak S2 Enterprise has two pricing models. Riak S2 Enterprise can be purchased on a per- node, annual subscription or perpetual license. The per- node price is inclusive of the Riak S2 nodes as well as the underlying Riak nodes required. Riak S2 Enterprise can also be purchased in blocks of raw storage. Under this model, at higher storage tiers, the effective per- gig price of the license is lower. Both models include 24/7 support. Request a developer trial of Riak S2 Enterprise to try it on your own infrastructure. Next Steps Getting Started Riak S2 packages are available on our downloads page and source code is available on Github. If you are interested in Riak S2 Enterprise, please contact us. We would love to talk to you about your possible use case and needs. ABOUT BASHO Basho is a distributed systems company dedicated to developing disruptive technology that simplify enterprises most critical data management challenges. Basho has attracted one of the most talented groups of engineers and technical experts ever assembled devoted exclusively to solving some of the most complex issues presented by scaling distributed systems. Basho s distributed database, Riak KV, the industry leading distributed NoSQL database, and Basho s cloud storage software, Riak S2, are used by fast growing Web businesses and by one third of the Fortune 50 to power their critical Web, mobile and social applications. The Basho Data Platform helps enterprises reduce the complexity of supporting Big Data applications by integrating Riak KV and Riak S2 with Apache Spark TM, Redis and Apache Solr TM. Basho is the organizer of RICON a distributed systems conference. Riak is the registered trademark of Basho Technologies, inc. 5