Embracing Open Source: Practice and Experience from Alibaba. Wensong Zhang The 11 th Northeast Asia OSS Promotion Forum 2012.11.13



Similar documents
Influence of Open Source on Engineer Culture. Shen jindi Alibaba Group

Introduction to Red Hat Storage. January, 2012

ZingMe Practice For Building Scalable PHP Website. By Chau Nguyen Nhat Thanh ZingMe Technical Manager Web Technical - VNG

Open Source Project from China. Northeast Asia Open Source SoftwareCompetition Nov. 2012

(Scale Out NAS System)

High Availability Databases based on Oracle 10g RAC on Linux

The Design and Implementation of the Zetta Storage Service. October 27, 2009

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Scala Storage Scale-Out Clustered Storage White Paper

How to Deploy OpenStack on TH-2 Supercomputer Yusong Tan, Bao Li National Supercomputing Center in Guangzhou April 10, 2014

Wikimedia architecture. Mark Bergsma Wikimedia Foundation Inc.

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Table of Contents. Overview... 1 Introduction... 2 Common Architectures Technical Challenges with Magento ChinaNetCloud's Experience...

Week Overview. Installing Linux Linux on your Desktop Virtualization Basic Linux system administration

Agenda. Brief introduction to Canonical. Ubuntu LTS Desktop Edition. Ubuntu LTS Netbook Edition. Ubuntu Long Term Support

Belgacom Group Carrier & Wholesale Solutions. ICT to drive Your Business. Hosting Solutions. Datacenter Services

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

CHOOSING THE RIGHT STORAGE PLATFORM FOR SPLUNK ENTERPRISE

Terms of Reference Microsoft Exchange and Domain Controller/ AD implementation

Cisco Prime Home 5.0 Minimum System Requirements (Standalone and High Availability)

Random Walk Shoes. Setting Up a Web Server

STeP-IN SUMMIT June 18 21, 2013 at Bangalore, INDIA. Performance Testing of an IAAS Cloud Software (A CloudStack Use Case)

Distributed File System Choices: Red Hat Storage, GFS2 & pnfs

Efficient Network Marketing - Fabien Hermenier A.M.a.a.a.C.

RED HAT ENTERPRISE VIRTUALIZATION FOR SERVERS: COMPETITIVE FEATURES

Introduction to Gluster. Versions 3.0.x

High Performance Computing OpenStack Options. September 22, 2015

Virtualised Cloud Storage Driver for Business ROI. Raymond Ng Principal Consultant Storage Solutions, Asia Pacific Huawei Symantec Technologies

How To Store Data On A Server Or Hard Drive (For A Cloud)

Migration and Disaster Recovery Underground in the NEC / Iron Mountain National Data Center with the RackWare Management Module

Huawei and Open Source. Industry development department Shi Hao

owncloud Enterprise Edition on IBM Infrastructure

Oracle Exadata Database Machine for SAP Systems - Innovation Provided by SAP and Oracle for Joint Customers

Providing Self-Service, Life-cycle Management for Databases with VMware vfabric Data Director

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Managing your Red Hat Enterprise Linux guests with RHN Satellite

Big Data Processing: Past, Present and Future

Serving 4 million page requests an hour with Magento Enterprise

Web Application Deployment in the Cloud Using Amazon Web Services From Infancy to Maturity

Product Overview. UNIFIED COMPUTING Managed Hosting Compute Data Sheet

Dell Reference Configuration for Hortonworks Data Platform

Linux Performance Optimizations for Big Data Environments

Parallels Cloud Storage

Drupal Performance Tips and Tricks. Khalid Baheyeldin. Drupal Camp Toronto 2014

Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage

BIG DATA USING HADOOP

Enabling Technologies for Distributed Computing

NetApp Replication-based Backup

Cost-Effective Business Intelligence with Red Hat and Open Source

Dimension Data Enabling the Journey to the Cloud

SLIDE 1 Previous Next Exit

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Accenture Cloud Enterprise Services

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage

SQL Server Virtualization 101. David Klee, Group Principal and Practice Lead. SQL PASS Virtualization VC,

Cisco Solutions for Big Data and Analytics

FOR SERVERS 2.2: FEATURE matrix

Taking the Plunge: Desktop Infrastructure. Rich Clifton

The High-Performance Cloud Infrastructure Company! 2011 Joyent, Inc. Contains Joyent Restricted Secrets. Not for Public Disclosure. Patents Pending.!

Mirror File System for Cloud Computing

Hadoop & its Usage at Facebook

Silverton Consulting, Inc. StorInt Briefing

Red Hat Enterprise Linux 6. Stanislav Polášek ELOS Technologies

Using Red Hat Network Satellite Server to Manage Dell PowerEdge Servers

Technical Overview Simple, Scalable, Object Storage Software

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000

Building and Managing a Standard Operating Environment

Zadara Storage Cloud A

Tuning Tableau Server for High Performance

THE HADOOP DISTRIBUTED FILE SYSTEM

ServerCentral Cloud Services Reliable. Adaptable. Robust.

The Future of Data Management

Big Data, SAP HANA. SUSE Linux Enterprise Server for SAP Applications. Kim Aaltonen

Big Data Performance Growth on the Rise

Red Hat Cluster Suite

Traditional v/s CONVRGD

ebay Storage, From Good to Great

Building All-Flash Software Defined Storages for Datacenters. Ji Hyuck Yun Storage Tech. Lab SK Telecom

HP reference configuration for entry-level SAS Grid Manager solutions

RFP-MM Enterprise Storage Addendum 1

Leading Entertainment Provider Optimizes Offsite Disaster Recovery with Silver Peak

CSE-E5430 Scalable Cloud Computing Lecture 2

Sheepdog: distributed storage system for QEMU

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage

Liferay Portal Performance. Benchmark Study of Liferay Portal Enterprise Edition

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (

BORG DIGITAL High Availability

Siemens SAP Cloud - Infrastructure as a Service

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Scale-Out File Server. Subtitle

Datasheet NetApp FAS8000 Series

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Current Status of FEFS for the K computer

OPTIMIZING PRIMARY STORAGE WHITE PAPER FILE ARCHIVING SOLUTIONS FROM QSTAR AND CLOUDIAN

Enabling Technologies for Distributed and Cloud Computing

Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Transcription:

Embracing Open Source: Practice and Experience from Alibaba Wensong Zhang The 11 th Northeast Asia OSS Promotion Forum 2012.11.13 1

Agenda 1. Overview of Alibaba 2. Taobao Software Infrastructure 3. Cases: Storage, CDN & DB 4. OSS from Alibaba 5. Open Source Procedure 6. Conclusions 2

Alibaba Group Service Portfolio C2C B2C Search & pricing comparison Global e-commerce platform for small businesses E-commerce platform for Chinese small businesses payment Group Shopping Cloud OS + Service China Yahoo! Data Analytics Digital Media (video, music, book, like itune) Mobile e- commence (Ali Cloud phone, Taobao & Alipay client) 3

Taobao.com Overview The merchandise value of online shopping in China reached about 785 billion RMB in 2011, in which Taobao took about 80% market share Taobao.com create 2.7 million direct job opportunity in 2011. Alexa Traffic Rank: 13 (12~18)in the world In 12 December 2011, there were over 120M unique visitors, the peak traffic of CDN is 856Gbps Over 800 applications on the web sites 4

Agenda 1. Overview of Alibaba 2. Taobao Software Infrastructure 3. Cases: Storage, CDN & DB 4. OSS from Alibaba 5. Open Source Procedure 6. Conclusions 5

Taobao System Architecture 6

Software Infrastructure 7

Major Software CDN: the largest picture CDN system based on open source LVS+Haproxy+Squid+Bind capability of shipping 2400Gbps traffic TFS: distributed object storage (home grown) Capacity of 12PB space, in which over 8PB is used Every GB picture space cost 3.4RMB/Y, ->2.5 RMB/Y TAIR: distributed memory cache & K/V system Integrate open source Redis and LevelDB Provide disaster tolerant support OceanBase: Distributed Table System (home grown) Support 100 billion records in a table, and transaction 8

Major Software(2) Big Data: Using Hadoop platform One single Hadoop cluster has about 3000 servers About 70PB space, and 48PB is used Running over 150,000 jobs a day Database: optimizing MySQL with high speed nonvolatile memory and multiple level tuning WangWang IM:Home grown, over 10M simultaneous users, 99.97% available in 2011 Web Server Platform: Nginx is deployed for over 150 applications, over 2000 servers; TMD system; Tengine open source project 9

Major Software(3) Underlying Software Develop Taobao JVM based on OpenJDK Maintain Taobao Linux kernel based on Red Hat KVM + Sheepdog for internal testing platform Load balancing solution based on LVS Network Mirror system based on open source software Taobao Platform is completely built on open source and home grown software 10

Agenda 1. Overview of Alibaba 2. Taobao Software Infrastructure 3. Cases: Storage, CDN & DB 4. OSS from Alibaba 5. Open Source Procedure 6. Conclusions 11

Picture Storage before 2007 Upload Server Admin Server Image Server 滨 江 联 通 备 份 中 心 SnapMirror SnapMirror SnapVault 远 程 数 据 冗 灾 NearStore R200 DR: A+B+C+D SnapShot FAS980C SnapShot SnapShot FAS980C SnapShot A: Online Image B: Online Image 杭 州 网 通 IDC C: Online Image D: Online Image 注 : 红 框 中 设 备 是 2006 年 的 新 增 设 备, 明 年 还 得 在 原 有 的 NetApp 980C/R200 12 存 储 上 增 加 20TB 左 右 新 的 硬 盘 容 量

System Requirement System requirements Data safety is becoming more and more important Data is tripled every year Commercial storage products could not meet requirements any more There is no optimization for little object The number of files is too huge to support The number of network connections reach its limit The cost is too high, 10T NAS cost over 1M RMB at that time There is no grantee in disaster tolerance 13

TFS 1.0 June 2007 TFS(Taobao File System)1.0 went to production Distributed storage for massive small objects 200 PC Servers(146G*6 SAS 15K Raid5) File Number: several billions Deployed: 140TB Used: 50TB 14

TFS 1.0 Architecture 15

TFS1.0 Features Master-backup Name Servers and multiple Data Servers Data Server runs on Linux servers with directly attached Each block is 64M in size Filename (Object ID) contains metadata information, in order to keep metadata extremely small in Name Servers E.g. T2auNFXXBaXXXXXXXX_!!140680281.jpg include its block_no and object_no. Each block can have multiple copies. Use ext3 file system to store block 16

TFS 1.3 June 2009 TFS(Taobao File System)1.3 on production TFS Cluster(2010.8.22) 440 台 PC Server (300G*12 SAS 15K RPM) + 30 台 PC Server (600G*12 SAS 15K RPM) File Number: tens of billions Deployed: 1800 TB Used: 995TB Name Server only uses 217MB memory for metadata 17

TFS Development TFS was released in open source in September 2010 TFS 2.0 has already been used in production Support large files Support directory using external MySQL Add resource center for QoS Onging Performance tuning, and cost saving Hybrid storage(ssd/sata), object migration Implementing Erasure Coding 18

Agenda 1. Overview of Alibaba 2. Taobao Software Infrastructure 3. Cases: Storage, CDN & DB 4. OSS from Alibaba 5. Open Source Procedure 6. Conclusions 19

Taobao CDN System Issues Commercial products: performance bottleneck, less features, and instability Challenges in scale, performance, availability and manageability Develop own CDN system New architecture and tuning on CDN site CDN monitoring platform Global load balancing system CDN real-time object purging system CDN configuration management system 20

client CDN Site Old Architecture VIP1 NetScaler(7LB) VIP2 NetScaler(7LB) 频 道 1 频 道 2 频 道 频 道 n squid squid squid squid 源 站 21

client CDN Site New Architecture VIP1 LVS(L4) 心 跳 VIP2 LVS (L4) Haproxy(L7) haproxy(l7) Haproxy(L7) 所 有 频 道 统 一 调 度 squid squid squid squid 源 站 22

CDN Site Comparison Features New Architecture Old Architecture Traffic Distribution Maintenance Anti-attacking Self-Control Price Scalability Flexibility Scalability:one VIP address can ship 100G traffic with 10Gigabit NIC Flexibility: consistent hashing scheduling can make it easy to add/remove servers, where only 1/(n+1) objects needs moving

Squid Optimization Implement TCOSS based on COSS, FIFO + hot objects is kept, supporting 1T size file Squid memory optimization: 10M objects can save about 1250M memory in index Use sendfile to send objects in disk IO optimization: one request will need about 0.9 IO operation in average Use SSD+SAS+SATA hybrid storage, develop dynamic object migration algorithm 24

Agenda 1. Overview of Alibaba 2. Taobao Software Infrastructure 3. Cases: Storage, CDN & DB 4. OSS from Alibaba 5. Open Source Procedure 6. Conclusions 25

OSS from Alibaba Alibaba have released over 100 pieces of software in open source so far, including frontend, backend, database, file system, hardware, and so on Alibaba also contribute changes back to upstream 26

Agenda 1. Overview of Alibaba 2. Taobao Software Infrastructure 3. Cases: Storage, CDN & DB 4. OSS from Alibaba 5. Open Source Procedure 6. Conclusions 27

Open Source Committee Found Alibaba Open Source Committee with ten persons, who is from Developers mainly Lawyer Security analysis The goal of Alibaba Open Source Committee is to help promote open source Ensure open source procedure Give advice on open source

Alibaba Open Source Procedur Engineers initiate open source request Need approval from their manager and the head of big division Security analysis team will check document and code for any security issues Alibaba Open Source Committee record this software in open source list Offer advices on open source license issue Host project on Taocode and also synchronize it to github

Encourage Open Source Accomplishment in open source project is the best encouragement to engineers. Host internal open source meetings from time to time, for sharing and brain storming. Reward excellent open source projects. 30

Open Source Licenses GPL vs. BSD vs. MIT vs. Apache GPL is better to guarantee development of open source project Alibaba s consideration on licenses Most projects choose GPL Some libraries use BSD or Apache license Some just follow the license from upstream Alibaba Group is the copyright owner (C) 2007-2012 Alibaba Group Holding Limited

Open Source Platform code.taobao.org Goal The platform itself is open source too Fast access in China Status There are 373 projects Mature open source projects is mainly from Taobao Non-Taobao projects are becoming active, some is good.

code.alibabatech.com Open Source Platform

Agenda 1. Overview of Alibaba 2. Taobao Software Infrastructure 3. Cases: Storage, CDN & DB 4. OSS from Alibaba 5. Open Source Procedure 6. Conclusions 34

From Engineer Side Alibaba engineers gain some benefits from doing open source projects More accomplishment Working with more developers and hackers, good for skill improvement Getting more users Make a longer lifecycle of their code Happy to see how their code is used in different ways

From Company Side Improve software quality through open source More user requirements and feedbacks More users tesing and bug reports Patches from external developers Receive industry recognition on technology capability and open spirit Engineers have strong sense of identity Attracting more talents to join Alibaba

Conclusion Alibaba Group is the beneficiary of open source system, and actively participate in building open source eco-system. Hope that the company can accumulate better, and attract more talents to meet the greater technical challenges in the future. Alibaba Group hopes to do more technology innovation with the industry in a more open manner. 37

Q&A Q&A Thanks 38