Lambert: Achieve High Durability, Low Cost & Flexibility at the Same Time. An open source storage engine for exabyte-scale data at Alibaba. Coly Li <bosong.ly@alibaba-inc.com> Robin Dong <haodong@alibaba-inc.com>
About Speakers Coly Li Member of technical architecture committee, in charge of storage engineering of AIS (Alibaba Infrastructure Service) Robin Dong Chief software engineer, cold data storage of AIS (Alibaba Infrastructure Service).
Content
- Cold Data in the Real World
- Workload Characteristics and Technical Challenges
- System Design and Topology
- Motivation for Using Open Source Technology
- Improvement & Contribution to the Sheepdog Project
- Credit to the Open Source Community
Cold Data in the Real World
From an internal storage system, we observe an access pattern unlike existing backup/restore usage. In the last 18 months:
- Less than 5% of the data was accessed (read/delete)
- Each read access does not touch the whole data object
- Only a small range of data is accessed from a single data object
This is a typical cold data access pattern; existing backup/restore systems are expensive for handling a cold data storage workload.
Workload Characteristics
High performance is NOT the first priority:
- Internal customers don't require high throughput or low latency
- Access latency in hours is acceptable
High durability is critical, not high availability:
- After a failure (network, power supply, mainboard) is recovered, data must still be safe on the storage media
Workload Characteristics (Cont.)
Extremely low cost is required:
- Data must remain accessible for many years, even for the whole company life cycle
- The data set grows very fast, especially when online business goes well (US $9.3 billion in GMV on the 11.11 Shopping Festival)
- Extremely low storage cost is critical for exabyte-scale cold data storage
Technical Challenge
High durability & low cost at the same time:
- Reliable & cheap storage media
- Multiple copies and fast failure recovery (a worked overhead comparison follows)
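To make the durability-versus-cost trade-off concrete, here is a hedged back-of-the-envelope comparison of replication against erasure coding (the deck introduces erasure code on later slides); the replication factor and (k, m) parameters below are illustrative examples, not Lambert's production configuration:

```python
# Illustrative overhead comparison: plain replication vs. a Reed-Solomon
# style erasure code. Parameters are hypothetical, not Lambert's settings.

def replication_overhead(copies: int) -> float:
    # Raw bytes stored per byte of user data when keeping full copies.
    return float(copies)

def ec_overhead(k: int, m: int) -> float:
    # Systematic code: k data strips plus m parity strips; any k of the
    # k + m strips are enough to reconstruct the data.
    return (k + m) / k

print(replication_overhead(3))  # 3.0x raw storage, tolerates 2 lost copies
print(ec_overhead(8, 3))        # 1.375x raw storage, tolerates 3 lost strips
```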
Technical Challenge (Cont.)
Deployment in third-party data centers:
- Power supply, cooling and rack capacity vary across data centers
- Cold storage hardware is required to scale from petabyte to exabyte, under different power budgets (e.g. from 4 kW to 8 kW)
- Not every data center we have is as perfect as the one pictured
Technical Challenge (Cont.)
Tape and Blu-ray discs do not currently work perfectly in our case:
- Tape is cheap, but an automatic tape library is expensive unless deployed at large scale
- Is Blu-ray cheaper in the long term? We need help from the industry to prove it under our workload
In our current situation, we prefer mechanical hard disks as the storage media for cold data.
Tape library with automatic robot: http://www.boston.com/bigpicture/2009/11/large_hadron_collider_ready_to.html
Facebook Blu-ray storage for cold data: http://www.burnworld.com/blu-ray-the-future-of-data-storage/
System Design and Topology
Hardware is designed for low-cost & high-density storage:
- 18 hard disks (3.5 inch) in a single 1U case
- 4 TB or 8 TB low-performance hard disks
- 32 1U cases in a single rack
- Low-cost, low-power CPU and memory
The hardware design is called Project Scorpio.
Project Scorpio storage hardware reported by ZDNet (Chinese tech news media): http://solution.zdnet.com.cn/2014/0724/3028228.shtml
Check the Open Data Center Committee website for more information in Chinese: http://www.opendatacenter.cn/data/manual/index.html
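For a feel of the density, here is the raw capacity implied by the numbers above, assuming the 8 TB disk option (the next slide defines a deployment unit as 4 Scorpio racks):

```python
# Raw capacity implied by the Scorpio specs above, using the 8 TB option.
disks_per_case = 18
tb_per_disk = 8
cases_per_rack = 32

tb_per_rack = disks_per_case * tb_per_disk * cases_per_rack
print(tb_per_rack)        # 4608 TB, roughly 4.6 PB raw per rack
print(tb_per_rack * 4)    # roughly 18.4 PB raw per 4-rack deployment unit
```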
System Design and Topology (Cont.)
Deployment unit: hardware is extended in deployment units (e.g. 4 Scorpio racks). Within each unit there are several distributed storage clusters. The sub-cluster is the minimum unit of software-defined storage. A single data object is stored only within one specific sub-cluster. When a sub-cluster is full, it is turned into the sealed state.
System Design and Topology (Cont.)
Simple software-defined storage: a sub-cluster is a distributed storage cluster acting as the minimum software storage unit. To extend storage capacity, just add more sub-clusters (deployment units). No matter how large a cold storage system grows, we only face the scalability issues of a single sub-cluster, which is much easier. Small storage clusters mean simplicity, and simplicity means reliability and stability in large-scale systems. (A minimal placement sketch follows.)
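A minimal sketch of the placement policy just described, with invented names (this is not sheepdog or Lambert code): a data object lives in exactly one sub-cluster, and a full sub-cluster is sealed and takes no new writes.

```python
# Hypothetical sketch of sub-cluster placement and sealing (names invented
# for illustration; not sheepdog/Lambert code).

class SubCluster:
    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity   # bytes
        self.used = 0
        self.sealed = False        # sealed sub-clusters take no new writes

    def try_store(self, size: int) -> bool:
        if self.sealed:
            return False
        if self.used + size > self.capacity:
            self.sealed = True     # full: turn it into the sealed state
            return False
        self.used += size
        return True

def store(subclusters, obj_size):
    # A single data object is stored within one specific sub-cluster only.
    for sc in subclusters:
        if sc.try_store(obj_size):
            return sc.name
    raise RuntimeError("all sub-clusters sealed: add a new deployment unit")

units = [SubCluster("sub-0", 100), SubCluster("sub-1", 100)]
print(store(units, 80))   # lands in sub-0
print(store(units, 80))   # sub-0 becomes sealed, object lands in sub-1
```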
System Design and Topology (Cont.)
Deployed at large scale, only a small group of sub-clusters is in the working state at any time.
Motivation for Using Open Source Technology
For the distributed storage system inside a sub-cluster, we need:
- Simplicity: easy to improve, optimize and maintain
- Consistent hashing: distributed block storage (see the sketch below)
- Erasure code: less data duplication with higher durability
- RESTful API: Swift interface for data store, access and control
Rather than developing from scratch, it is more efficient to start from a simple open source project as the code base for the cold data storage engine. The code name of this cold data storage engine is Lambert.
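As a refresher on the consistent hashing requirement above, here is a minimal hash-ring sketch; it is illustrative only, not sheepdog's actual implementation (which lives in its C code base):

```python
# Minimal consistent-hashing sketch: object IDs map onto a ring of nodes,
# so adding or removing a node moves only a small fraction of objects.
# Illustrative only; not sheepdog's actual implementation.

import hashlib
from bisect import bisect

def _hash(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # vnodes: virtual nodes per physical node, for smoother balance.
        self.ring = sorted((_hash("%s:%d" % (n, i)), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def locate(self, object_id: str) -> str:
        # The first virtual node clockwise from the object's hash owns it.
        i = bisect(self.keys, _hash(object_id)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.locate("object-0x1234"))
```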
Motivation for Using Open Source Technology (Cont.)
Sheepdog volumes: the code base we started from. https://github.com/sheepdog/sheepdog/wiki/sheepdog-design
Improvement & Contribution to the Sheepdog Project
In early 2013, the sheepdog project had only the consistent hashing storage framework. Since June 2013, Alibaba has contributed engineering resources to improve sheepdog for cold data storage:
- Yuan Liu implemented erasure code support with zfec, plus many other sheepdog improvements
- Robin Dong implemented (1) the RESTful API, complying with the OpenStack Swift interface spec; (2) hyper volume, for big data object storage; (3) data recovery performance improvements
All general patches go back to sheepdog upstream. In Nov 2014, Lambert started online service in selected data centers.
Zfec project: http://freecode.com/projects/zfec
OpenStack Swift project: https://swiftstack.com/openstack-swift/
Improvement & Contribution to the Sheepdog Project (Cont.)
Erasure Code
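The detail of the erasure code work was on the slide's figure. As a hedged illustration of how zfec's systematic encoder/decoder is typically driven, a short sketch follows; the parameters and data are made up, and exact API details should be checked against zfec's own documentation:

```python
# Sketch of systematic erasure coding with zfec (illustrative parameters,
# not Lambert's production settings). Per zfec's documented API, encode()
# takes the k primary blocks and returns the m - k check blocks, and
# decode() rebuilds the k primary blocks from any k surviving blocks.

import zfec

k, m = 4, 6                              # survive the loss of any 2 blocks
data = b"0123456789abcdef" * 16          # length must be a multiple of k
size = len(data) // k
primary = [data[i * size:(i + 1) * size] for i in range(k)]

check = zfec.Encoder(k, m).encode(primary)        # the m - k check blocks

# Simulate losing primary blocks 0 and 2, then rebuild from any k blocks:
survivors = [primary[1], primary[3], check[0], check[1]]
sharenums = [1, 3, 4, 5]                 # block numbers of the survivors
recovered = zfec.Decoder(k, m).decode(survivors, sharenums)
assert b"".join(recovered) == data
```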
Improvement & Contribution to the Sheepdog Project (Cont.)
Hyper volume implementation
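The hyper volume details were also on the slide's figure. As a conceptual stand-in only, the sketch below shows how a byte offset in one big object maps onto sheepdog's fixed 4 MB data objects; this is not sheepdog's actual index structure, and the point of hyper volume, as we understand it, is to let a single volume address far more data objects than a standard VDI:

```python
# Conceptual sketch only: a big object is split into sheepdog's fixed-size
# 4 MB data objects, so a byte offset maps to (object index, offset within
# that object). Not sheepdog's actual index structure.

OBJECT_SIZE = 4 * 1024 * 1024            # sheepdog's 4 MB data object

def locate(offset: int):
    return offset // OBJECT_SIZE, offset % OBJECT_SIZE

idx, off = locate(10 * 1024 ** 3 + 123)  # the byte at 10 GiB + 123
print(idx, off)                          # object 2560, offset 123 inside it
```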
Improvement & Contribution to the Sheepdog Project (Cont.)
Swift interface implementation
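To make the Swift interface concrete, here is a hedged usage sketch against a Swift-compatible endpoint; the host, port, account, container, object names and auth token are all hypothetical, not Lambert's actual deployment:

```python
# Hypothetical usage of the Swift-compatible RESTful interface; endpoint,
# account, token and object names are invented for illustration.

import requests

BASE = "http://sheep-gateway:8000/v1/AUTH_demo"   # hypothetical endpoint
HEADERS = {"X-Auth-Token": "demo-token"}          # hypothetical token

# Create a container, then upload an object into it (Swift-style paths):
requests.put(BASE + "/coldlogs", headers=HEADERS)
requests.put(BASE + "/coldlogs/2014-11-11/orders.log",
             headers=HEADERS, data=b"...object bytes...")

# Cold reads touch only a small range of a data object (see the cold data
# slide earlier), which maps naturally onto an HTTP Range request:
r = requests.get(BASE + "/coldlogs/2014-11-11/orders.log",
                 headers=dict(HEADERS, Range="bytes=0-1023"))
print(r.status_code, len(r.content))
```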
Improvement & Contribution to the Sheepdog Project (Cont.)
Data recovery optimization:
- Enable multiple threads for data object recovery
- Optimize the recovery mode:
  Before (node mode): if one disk failed, that server fetched data from the other servers and did the erasure-code computation to recover its own lost data
  After (disk mode): all servers in the sub-cluster participate in the recovery work
Recovery performance increased 4x, which results in much better data storage durability.
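A back-of-the-envelope model of why disk mode helps; the per-writer throughput below is an invented illustrative figure, and only the roughly 4x speedup comes from the slide above:

```python
# Toy model of recovery time: node mode has a single rebuilding server,
# disk mode spreads the rebuild over every server in the sub-cluster.
# Throughput numbers are illustrative, not measurements.

def recovery_hours(lost_tb, writers, mb_per_s_per_writer=50):
    total_mb = lost_tb * 1024 * 1024
    return total_mb / (writers * mb_per_s_per_writer) / 3600

print(recovery_hours(8, writers=1))   # node mode: ~46.6 h for an 8 TB disk
print(recovery_hours(8, writers=4))   # disk mode with 4 rebuilders: ~11.7 h,
                                      # the ~4x gain quoted on the slide
```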
Hats off to Lambert Engineering Team Coly Li Guining Li Robin Dong Kai Zhou Yuan Liu Bingpeng Zhu Meng An
Credit to the Open Source Community
Without cooperation with the open source community, we would not be able to move forward this fast.
The sheepdog community built a simple and elegant code base for our cold data storage engine. Top 10 contributors to the sheepdog project [1]: liuy, kazum, mitake, levin108, RobinDong, fujita, liangry, kylezh, yunkai, arachsys
The Intel open source engineering team helped us accelerate the performance of generic storage algorithms [2] on Intel ATOM processors.
[1] Some of the top 10 contributors work or worked at Alibaba as well: liuy, levin108, RobinDong, yunkai.
[2] Performance acceleration includes, but is not limited to, erasure code, hashing, and other storage-related algorithms.
Great Thanks to You All!