Big Data, Deep Learning and Other Allegories: Scalability and Fault-tolerance of Parallel and Distributed Infrastructures.




Big Data, Deep Learning and Other Allegories: Scalability and Fault-tolerance of Parallel and Distributed Infrastructures

Divy Agrawal
Professor of Computer Science, UC Santa Barbara
Research Director, Data Analytics, Qatar Computing Research Institute
Visiting Scientist, Ads Data Infrastructure, Google Inc.

With: Sanjay Chawla et al. (QCRI), Amr El Abbadi et al. (UCSB), & Shiv Venkataraman et al. (Google)

Motivation

Availability of vast amounts of data:
- Hundreds of billions of text documents
- Billions of images/videos with descriptive annotations
- Tens of trillions of log records capturing human activity

Machine Learning + Big Data transforming fiction into reality:
- Self-driven automobiles
- Automated image understanding
- And most recently, deep learning to simulate a human brain

Big Data Challenges

(Slide shows an n×d data matrix with rows x_1 … x_n and features X_1 … X_d.)
- Statistical Hardness
- Computational Complexity

Data Analytics, Data Mining, and Machine Learning

Data: "The apple of my eye is hooked on Apple's smart phone and loves apple and yogurt"
- Database Query: how many times does "apple" appear in the data?
- Data Mining Query: what are the most frequent items that appear together in the data?
- Machine Learning: how many times does the fruit:<apple> appear in the data?

Applications

Analysis, Reporting, Learning, Discovery (over the n×d data matrix).

BIG DATA MANAGEMENT (UCSB)

Paradigm Shift in Computing

Cloud Computing: Why?

Experience with very large datacenters:
- Unprecedented economies of scale
- Transfer of risk

Technology factors:
- Pervasive broadband Internet
- Maturity in virtualization technology

Business factors:
- Minimal capital expenditure
- Pay-as-you-go billing model

Economics of Cloud Computing

Pay by use instead of provisioning for peak. (Figure: a static data center provisions capacity at peak demand, while a data center in the cloud tracks demand over time.)
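The pay-by-use argument can be made concrete with a small back-of-the-envelope calculation (the demand curve and unit price below are made-up numbers, not from the talk):

```python
# Back-of-the-envelope comparison of static peak provisioning vs. pay-by-use.
# All numbers are hypothetical.

demand = [20, 35, 80, 100, 60, 30]   # servers needed in each time period
price = 1.0                          # cost per server per period (same for both)

# Static data center: provision for peak demand in every period.
static_cost = max(demand) * price * len(demand)

# Data center in the cloud: pay only for what each period actually uses.
elastic_cost = sum(demand) * price

print(static_cost, elastic_cost)  # 600.0 325.0
```

The gap grows with the peak-to-average ratio of the demand curve, which is exactly the regime the slide's figure depicts.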

Scaling in the Cloud

Client sites → HAProxy (load balancer) with Elastic IP → Apache + App Servers → MySQL Master DB, with replication to a MySQL Slave DB. The database becomes the scalability bottleneck: it cannot leverage elasticity.

Scaling in the Cloud

The same web tier, now backed by scalable and elastic key value stores — but with limited consistency and operational flexibility.

Two Approaches to Scalability

Scale-up — classical enterprise setting (RDBMS):
- Flexible ACID transactions
- Transactions in a single node

Scale-out — cloud friendly (key value stores):
- Execution at a single server
- Limited functionality & guarantees
- No multi-row or multi-step transactions

Key-value Stores: Design Principles
- Separate system and application state
- Limit application interactions to a single node
- Decouple ownership from data storage
- Limited distributed synchronization is practical

Scalable Data Management in the Cloud

RDBMSs scale out through fission; key value stores scale up through fusion:
- Fission: ElasTraS [HotCloud'09, TODS'13], Cloud SQL Server [ICDE'11], Relational Cloud [CIDR'11], Google F1 [SIGMOD'12, VLDB'13]
- Fusion: G-Store [SoCC'10], MegaStore [CIDR'11], ecStore [VLDB'10]

Data Fission

Basic building block: data partitioning (table level → distributed transactions). Three example systems:
- ElasTraS (UCSB)
- SQL Azure (MSR)
- Relational Cloud (MIT)

Schema Level Partitioning
- Pre-defined partitioning scheme, e.g., tree schema: ElasTraS, SQL Azure, F1
- Workload-driven partitioning scheme, e.g., Schism in Relational Cloud

ElasTraS Architecture

(Diagram: a TM Master performs health and load management and lease management over a set of OTMs; a Metadata Manager tracks the master and MM proxies; each OTM runs a transaction manager over database partitions P_1 … P_n, issuing durable writes through a log manager to distributed fault-tolerant storage.)

Data Fusion

Key value stores give atomicity guarantees only on single keys. Data fusion combines individual key-value pairs into larger granules of transactional access:
- Megastore: granules statically defined by the apps
- G-Store: granules dynamically defined by the apps

G-Store: Transactions on Groups

Transactions on a Key Group without distributed transactions. Grouping protocol:
- Ownership of keys at a single node
- One key selected as the leader
- Followers transfer ownership of keys to the leader

Elasticity

A database system built over a pay-per-use infrastructure (Infrastructure as a Service, for instance):
- Scale up and down system size on demand
- Utilize peaks and troughs in load
- Minimize operating cost while ensuring good performance

Elasticity in the Database Layer

Elasticity in the Database Layer

Capacity expansion to deal with high load — guarantee good performance.

Elasticity in the Database Layer

Consolidation during periods of low load — cost minimization.

Live Database Migration

No prior work on database migration. The state of the art uses VM migration [Clark et al., NSDI 2005], [Liu et al., HPDC 2009]:
- Requires executing DB-in-VM
- High performance overhead
- Poor performance and consolidation ratio [Curino et al., CIDR 2011]

Shared Disk Architecture: Albatross

Migrates the cached DB state and transaction state of a tenant/DB partition from the source DBMS node to the destination over a shared persistent image. Compared to the naïve solution:
- No failed requests vs. 100s of failed requests
- Much shorter unavailability window
- 15-30% increase in latency vs. 200-400% latency increase

Shared-Nothing Architecture: Zephyr

Freezes index structures while migrating from source to destination. Compared to the naïve solution:
- 50-100 failed requests vs. 1000s of failed requests
- No unavailability window vs. 5-8 seconds
- 20% increase in latency vs. 15% latency increase

BIG DATA ANALYTICS (QCRI): LEARNING

Learning: Model Fitting

An application fits a model Ω to a dataset of observations.

Machine Learning in One Slide

Find a hyperplane y = wx + b that best separates the positive examples (x_i, y_i = +1) from the negative examples (x_j, y_j = -1):

    w* = min_w Σ_{i=1}^{n} ℓ(x_i, y_i, w) + Γ(w)

where ℓ is the loss function and Γ(w) is the regularization term. Big Data setting: n ≈ 10^12 examples of dimension ≈ 10^9.
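A minimal sketch of evaluating this objective, assuming hinge loss for ℓ and L2 regularization for Γ (the slide does not fix these choices, and the data below is toy data):

```python
import numpy as np

def objective(w, X, y, lam=0.1):
    """sum_i hinge(x_i, y_i, w) + lam * ||w||^2 (assumed loss/regularizer)."""
    margins = y * (X @ w)                        # y_i * <w, x_i>
    loss = np.maximum(0.0, 1.0 - margins).sum()  # hinge loss term
    reg = lam * np.dot(w, w)                     # Gamma(w)
    return loss + reg

X = np.array([[1.0, 2.0], [-1.0, -1.5]])  # two toy examples
y = np.array([+1.0, -1.0])                # labels
w = np.array([0.5, 0.5])
print(objective(w, X, y))  # 0.05: both margins >= 1, only regularization remains
```

Minimizing this sum over n ≈ 10^12 terms is what forces the parallel and distributed formulations in the following slides.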


Gradient Descent: Sequential Computation

    repeat until convergence {
        read(θ_0, θ_1, …, θ_m);
        compute gradient;
        write(θ_0, θ_1, …, θ_m);
    }
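As a concrete sketch of this read/compute/write cycle, a least-squares version might look like the following (the loss function, step size, and fixed iteration count are assumptions, not specified on the slide):

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, iters=200):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):              # "repeat until convergence" (fixed count here)
        grad = X.T @ (X @ theta - y)    # read theta, compute least-squares gradient
        theta = theta - lr * grad       # write the updated theta
    return theta

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([2.0, 3.0])
print(gradient_descent(X, y))  # converges to [2. 3.]
```

The key point for the rest of the section: every iteration reads and writes the entire parameter vector, which is what makes naïve parallelization contentious.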

Scaling Machine Learning Algorithms

Leverage data and feature partitioning to parallelize the computations.

Parallel Version

Shared parameters π_1, π_2, …, π_d. Worker i at iteration α executes:

    read synchronization
    read(π_1, π_2, …, π_d);
    compute gradient projection;
    write synchronization
    write(π_i);

Worker j runs the same sequence, writing π_j.

Parallel Version

With the synchronization points expressed as ordering constraints on reads and writes, worker i at iteration α executes:

    ∀i,j,k: w_k[π_k][α−1] < r_i[π_j][α]
    read(π_1, π_2, …, π_d);
    compute gradient projection;
    ∀i,j,k: r_k[π_j][α] < w_i[π_i][α]
    write(π_i);

Worker j runs the same sequence, writing π_j.

Straggler or Last Reducer Problem

(Figure: updates to π_1 and π_2 over time; one slow worker delays every iteration for all others.)

Scaling ML Algorithms

Current approaches [e.g., Parameter Server]:
- Allow synchronization violations, albeit bounded
- Not equivalent to semantics of sequential executions
- Use function-centric arguments to demonstrate convergence
- Higher tolerance to imprecision (nature of ML)

Process-based synchronization → data-centric approaches:
- Model read and write of parameter variables as database actions
- 2-phase locking during each iteration for fine-grained concurrency

Unfortunately, this does not work:
- Need a new framework for data-centric synchronization!

Barrier Relaxation

(Figure: updates to π_1 and π_2 proceeding without a strict per-iteration barrier.)

Data-centric View of Synchronization

Read constraint: worker i can read π_j in iteration α only after worker j has completed the writes of iteration α−1:

    ∀i,j: w_j[π_j][α−1] < r_i[π_j][α]

Write constraint: worker j can write π_j in iteration α only after all the workers have read it:

    ∀i,j: r_i[π_j][α] < w_j[π_j][α]
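Constraints of this form can be checked mechanically against an execution trace. The sketch below assumes a hypothetical log format of (timestamp, op, worker, parameter, iteration) tuples and verifies only the read constraint:

```python
def satisfies_read_constraint(events):
    """Events are (timestamp, op, worker, param, iteration) tuples.
    Checks that every read of pi_j in iteration a is preceded by the
    write of pi_j from iteration a-1, per the read constraint above."""
    write_time = {}  # (param, iteration) -> timestamp of its write
    for t, op, worker, param, it in sorted(events):
        if op == "write":
            write_time[(param, it)] = t
        elif op == "read" and it > 0:
            prev = write_time.get((param, it - 1))
            if prev is None or prev >= t:
                return False   # read raced ahead of the needed write
    return True

log = [
    (0, "write", 1, "pi1", 0),
    (1, "write", 2, "pi2", 0),
    (2, "read",  1, "pi2", 1),
    (3, "read",  2, "pi1", 1),
]
print(satisfies_read_constraint(log))  # True
```

A protocol that enforces these orderings at runtime, rather than checking them after the fact, is what the work-in-progress list below calls for.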


Data Partitioning

Three schemes over the data matrix A:
- Row-oriented: A^(1), A^(2), …, A^(m)
- Column-oriented (feature partitioning): A_1, A_2, A_3
- Mixed (fully distributed): blocks A_11, A_12, A_21, A_22
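The three schemes can be illustrated on a small matrix with NumPy (the matrix and split counts are arbitrary):

```python
import numpy as np

A = np.arange(16).reshape(4, 4)  # toy 4x4 data matrix

row_parts = np.split(A, 2, axis=0)                    # A^(1), A^(2): shape (2, 4)
col_parts = np.split(A, 2, axis=1)                    # A_1, A_2: shape (4, 2)
blocks = [np.hsplit(h, 2) for h in np.vsplit(A, 2)]   # A_11..A_22: shape (2, 2)

print(row_parts[0].shape, col_parts[0].shape, blocks[0][0].shape)
# (2, 4) (4, 2) (2, 2)
```

Row partitioning distributes examples, column partitioning distributes features, and the mixed scheme distributes both — which of the three fits best depends on whether the gradient computation is dominated by examples or by parameters.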

Work in Progress
1. Characterization of sequentially consistent executions
2. Data-centric constraints → sequentially consistent
3. Process-centric synchronization → sequentially consistent
4. Qualitative analysis of different classes of executions
5. Develop protocols that enforce data-centric constraints
6. Experimental evaluation

BIG DATA PRAGMATICS: DATA-CENTERS, DATA PIPELINES AND MULTI-HOMING (GOOGLE)

The Scalability Challenge: DropBox Case Study

Datacenter is a New Substrate: Why?

Dis-aggregation (& virtualization) of resources:
- Processing elements
- Storage elements
→ The classical model of CPU + disk is not tenable.

Resource Dis-aggregation?
- At odds with the cloud computing model (scalability and elasticity): scaling-out & scaling-in
- At odds with the utility model (fault tolerance): failures

Architecting DBMSs in Datacenters

(Diagram: the single-server DBMS is replaced by a pool of DBMS controllers and DB worker pools running over distributed shared memory and a distributed storage layer.)

Notes on the Architecture
- Network and I/O latency mitigation: batching or pipelining of data accesses; leverage parallelism at the distributed storage layer
- Query execution plans: an additional degree of freedom (the underlying resource platform is dynamic, e.g., 10 vs. 100 machines)
- Data replication: block level vs. DBMS level?

HIGH-VOLUME TRANSACTION PROCESSING

Internet Backend Architecture (Current)

Internet users and advertisers + publishers issue ads queries and ad clicks against Ads Serving (accounts, campaigns, budget) and against reporting, billing, and analysis. User interactions flow through a DBMS into persistent storage — (1) advertiser data, (2) publisher data, (3) ad statistical data — while event logs feed log joining, aggregation, and injection.

Internet Back-end Architecture (Postulated)

The same pipeline, but user interactions are applied through μtransactions directly against persistent storage rather than through a monolithic DBMS.

MULTI-HOMING (GEO-REPLICATION)

Cross-datacenter Replication

(Diagram: cloud applications run over a cross-datacenter replication layer (Spanner) on top of the distributed storage layer.)

Cross-Datacenter Replication?

Current state:
- Google's technologies: MegaStore, Spanner, and PaxosDB
- Typically, passive replication of ALL data

Sustainable approach:
- Critical data should be based on synchronous techniques
- Most data, especially application data, should be updated using active replication (i.e., by executing operations redundantly at each datacenter)

Why?
- Fault-deterrence (by executing actions redundantly)
- Operation latency not dependent on a single master (rather the fastest quorum)
→ Cross-datacenter latencies: 100s of milliseconds
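The "fastest quorum" point can be illustrated with a toy calculation: a majority quorum completes at the latency of the median replica rather than the slowest one (the latencies below are made-up cross-datacenter values, not measurements from the talk):

```python
def quorum_latency(replica_latencies_ms):
    """Latency of a majority-quorum operation: the time at which the
    (n // 2 + 1)-th fastest replica acknowledges."""
    quorum = len(replica_latencies_ms) // 2 + 1
    return sorted(replica_latencies_ms)[quorum - 1]

# Made-up cross-datacenter round-trip times (ms) for five replicas.
latencies = [35, 120, 300, 90, 450]
print(quorum_latency(latencies))  # 120 -- the slowest replicas are not waited on
```

With a single master, every operation pays that master's latency; with a quorum, one slow or distant datacenter never sits on the critical path.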

Cross-datacenter Replication: Distributed Logs

(Diagram: cloud applications write through a distributed log service to distributed log storage; the logs are replicated to other datacenters, and logs arrive from other datacenters in turn.)

Big Data Pragmas

Debunk the single machine → the datacenter is a computer:
- Computation is already disaggregated
- Disaggregation of storage resources:
  - Disk storage: the local-disk assumption is seriously flawed
  - Flash storage: will be integral in the storage hierarchy
  - Main memory: likely to meet the same fate (local vs. remote)
- Networking:
  - Intra-datacenter latencies < 0.5 ms (big opportunity)
  - Inter-datacenter latency still remains a challenge (need innovation)

Concluding Remarks

Cloud computing challenge: scalability, reliability, and elasticity
- Re-architecting DBMS technology

Big Data analysis and learning:
- Scaling iterative computation over Big Data
- DBMS-like platform for machine learning

Big Data pragmatics:
- Complex data processing pipelines
- Multi-homing and geo-replication