Big Data Technology CS 236620, Technion, Spring 2014

1 Big Data Technology CS 236620, Technion, Spring 2014. System Design Principles. Edward Bortnikov & Ronny Lempel, Yahoo Labs, Haifa

2 Data = Systems: We need to move, store, and process data

3 Big Data = Big Systems

4 How to Get the Big Systems Right? A multidisciplinary science in its own right: distributed computing; networking; hardware and software architecture; operations research, measurement, and performance evaluation; power management; and even civil engineering. This course covers the aspects related to computer science. We'll start with some principles and see how they manifest in real systems.

5 An Ideal System Should: 1. Scale

6 Keeping up With the Growth

7 Partitioning = Parallelism = Scalability

8 Architect's Dream - Throughput: How many requests can be served in a unit of time?

9 Architect's Dream - Latency: How long does a single request take?
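The two metrics on slides 8 and 9 can be told apart with a small calculation. The sketch below is illustrative only: the function name and the request timestamps are made up for this example, and it assumes each request is recorded as a (start, end) pair in seconds. It shows that serving the same four requests in parallel raises throughput while leaving per-request latency unchanged.

```python
from statistics import mean


def throughput_and_latency(requests):
    """Return (requests per second, mean latency in seconds).

    `requests` is a list of (start_s, end_s) timestamps of completed
    requests. The record format is an assumption made for this sketch.
    """
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    throughput = len(requests) / window
    latency = mean(end - start for start, end in requests)
    return throughput, latency


# Hypothetical traces: four one-second requests, served one after another
# on a single server, or all at once on four partitions.
serial = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
parallel = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
```

Calling `throughput_and_latency(serial)` and `throughput_and_latency(parallel)` gives the same mean latency but a fourfold difference in throughput, which is the sense in which partitioning buys scalability.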

10 Scaling Up? Scaling Out? (Figure: scale up vs. scale out)

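11 Example: Network Filesystems. Monolithic (e.g., historical NFS): a single NFS server handles every access, e.g., server:/a/b/z.txt. Distributed (e.g., Hadoop FS): the metadata service (namenode) maps a path such as /users/bob/courses/cs101.txt to a block location such as <server_123, block 20>, and the R/W request then goes to the data service (datanode).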

12 Scale-Out Philosophy. Scalability through decoupling: whatever is split can be scaled independently. HDFS: metadata and data accesses are decoupled. Minimize centralized processing: metadata accesses are coordinated but lean. Maximize I/O parallelism: clients access the data nodes concurrently.
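The decoupling described on slide 12 can be sketched in a few lines. This is not the HDFS API: `MetadataService`, `DataService`, and `read_file` are hypothetical stand-ins invented for illustration, and the in-memory dictionaries replace real network calls. The point is the shape of the read path: one lean metadata lookup, followed by concurrent block reads that go straight to the data nodes.

```python
from concurrent.futures import ThreadPoolExecutor


class MetadataService:
    """Hypothetical namenode stand-in: knows where blocks live, holds no data."""

    def __init__(self):
        self.block_map = {}

    def add_file(self, path, locations):
        # locations: ordered list of (server_name, block_id) pairs
        self.block_map[path] = list(locations)

    def locate(self, path):
        return self.block_map[path]


class DataService:
    """Hypothetical datanode stand-in: stores block contents by block id."""

    def __init__(self):
        self.blocks = {}

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(path, metadata, datanodes):
    """One metadata lookup, then parallel block reads from the data nodes."""
    locations = metadata.locate(path)
    with ThreadPoolExecutor(max_workers=max(1, len(locations))) as pool:
        # pool.map preserves the order of `locations`, so blocks join correctly.
        chunks = pool.map(
            lambda loc: datanodes[loc[0]].read_block(loc[1]), locations
        )
        return b"".join(chunks)
```

Because the metadata service never touches block contents, adding data nodes increases aggregate I/O bandwidth without adding load to the central component beyond the lookup itself.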

13 The Peer-to-Peer Approach. Completely server-less: all nodes and functions are fully symmetric. E.g., in a distributed data store, every node has both a serving function and a management function. Less favored in managed data center environments: very hard to maintain consistency guarantees, and very hard to optimize globally. Lightweight centralized critical services prevail.

14 An Ideal System Should: 2. Be Resilient

15 Protecting the Critical Services

16 Resilience = Redundancy

17 The Tail at Scale. Problems are aggravated in large systems: component-level variability is amplified by scale, and failures and slow components are part of normal life, not an exception. Two ways of addressing service variability: (1) prevent bad things from happening by detecting and isolating the slow or flawed components; (2) contain bad things through redundancy, e.g., hedged/tied requests and speculative task execution.
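A hedged request, one of the containment techniques named on slide 17, can be sketched as follows. The example is a simulation under stated assumptions: `call_replica` only sleeps for a configured delay instead of contacting a real server, the replica names and delays are invented, and the policy shown (send a second copy if the first has not answered within `hedge_after_s`, then take whichever answer arrives first) is one simple variant rather than a definitive implementation.

```python
import asyncio


async def call_replica(name, delay_s):
    """Simulated replica call: waits `delay_s` seconds, then answers with its name."""
    await asyncio.sleep(delay_s)
    return name


async def hedged_request(replicas, hedge_after_s):
    """Query the first replica; if it is slow, also query the second.

    `replicas` is a list of two (name, simulated_delay_s) pairs.
    Returns the first answer to arrive and cancels the loser.
    """
    primary = asyncio.create_task(call_replica(*replicas[0]))
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        return primary.result()  # fast path: no hedge needed

    # The primary is slow: send the same request to a second replica.
    backup = asyncio.create_task(call_replica(*replicas[1]))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the request that lost the race
    return done.pop().result()


def run_hedged(replicas, hedge_after_s):
    """Synchronous convenience wrapper around hedged_request."""
    return asyncio.run(hedged_request(replicas, hedge_after_s))
```

The hedge delay trades extra load for lower tail latency: a short delay duplicates many requests, while a long one rescues only the slowest.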

18 Redundancy Means Synchronization

19 An Ideal System Should: 3. Be Designed for the Right Goal

20 Expected Workload Matters. Latency-oriented: interactive, user-facing systems (example: Web search serving). Throughput-oriented: back-end heavyweights (example: Web search indexing).

21 Data Accessibility Matters: Stream vs. Warehouse

22 Access Patterns Matter. Data analytics: throughput-oriented applications; write-once (typically, append); read-many (typically, large sequential reads). Online Transaction Processing (OLTP): latency-oriented applications; write-intensive; typically, many small direct accesses. There is a huge gray area in between.

23 Hardware Constraints Matter

24 Compute- or Data-Intensive? (Figure: compute vs. storage)

25 Locality Matters. Can computation and storage be aligned? Optimization? How repetitive is the workload? Optimization? Power-law distribution: a few dominant items and a long tail, $\Pr(x > X) \sim X^{-\alpha}$.
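To see why a repetitive workload invites optimization, the sketch below estimates what share of requests the most popular items attract. It assumes item popularity follows a Zipf-like rank-frequency law (weight proportional to 1/rank^alpha), which is a common stand-in for the power law on slide 25 but is not taken from the slides; the function name and parameters are likewise illustrative.

```python
def top_k_request_share(num_items, k, alpha=1.0):
    """Fraction of requests that hit the k most popular of num_items items.

    Assumes the item at popularity rank r is requested with weight
    1 / r**alpha (a Zipf-like model chosen for this sketch).
    """
    weights = [1.0 / rank ** alpha for rank in range(1, num_items + 1)]
    return sum(weights[:k]) / sum(weights)
```

With `alpha=1.0`, the top 10 of 1,000 items attract roughly 39% of all requests, whereas with `alpha=0.0` (uniform popularity) they attract only 1%. Under the skewed model, a small cache or replica set for the dominant items absorbs a large share of the load.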

26 Consistency Matters. Stricter properties = stronger consistency. Are you prepared to handle weird stuff? Consider stock alerts: is it okay to lose an event once in a while? Consider a social network: Bob deletes photos with his ex-date Alice, then Bob befriends Carol. Can Carol observe these events in reverse order?
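The social network scenario on slide 26 can be replayed in a toy model. Everything here is hypothetical: the event names, the photo file name, and the single-replica replay loop are invented for illustration. The function reports what Carol could have seen at any point while a replica applied the two events in a given order.

```python
def visible_to_carol(events):
    """Replay events on one replica and collect every photo Carol could see.

    Toy model: Carol sees Bob's photos only while she is his friend.
    The event names and the photo name are made up for this sketch.
    """
    photos = {"bob_and_alice.jpg"}
    friends = set()
    seen = set()
    for event in events:
        if event == "delete_photos":
            photos.clear()
        elif event == "befriend_carol":
            friends.add("carol")
        if "carol" in friends:
            seen |= photos  # whatever exists now is visible to Carol
    return seen
```

Applied in the order Bob issued them, `["delete_photos", "befriend_carol"]`, the events expose nothing. Applied in reverse, Carol sees the photo Bob meant to delete first, which is the anomaly a weaker consistency model may permit.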

27 A Dialogue in the Wild. Engineer: We are afraid of any kind of synchronization. Scientist: What kind of guarantee do you want to get? Engineer: Let's build something simple. Relax your consistency models. We want the systems to be eventually consistent. Scientist: This is an interesting problem. Are you really sure this is what you want to get?

28 Example: Amazon's Outage. Weak consistency models can lead to data loss.

29 Services Over the Network

30 Elasticity Matters. Resource demands are often unknown in advance, driven by application popularity. Goal: enable organic growth, i.e., add (and pay) as you grow. Economies of scale: pool multiple datasets and services in huge data centers for better use of shared resources (personnel, real estate, electricity, network, compute, and storage).

31 Cloud Computing. Computing resources delivered over a network. Infrastructure issues abstracted away. *-as-a-service: SaaS, PaaS, IaaS, etc.

32 A Word on Data Center Management

33 Designing the Air Flows (figure; source: 42u Consulting)

34 Power Efficiency - Surprising Facts. At Facebook's Prineville, OR, facility, ambient air flows into the building, passing first through a series of filters to remove bugs, dust, and other contaminants. Previous estimates suggested that electricity consumption in massive server farms would double between 2005 and [...]. Instead, the number rose by 56% worldwide, and merely 36% in the US. The most efficient data centers now hover at temperatures closer to 80 degrees Fahrenheit, and instead of sweaters, the technicians walk around in shorts.

35 Summary. Design for scale. Design for fault-tolerance. Know what you design for. Be aware of the environment.

36 Further Reading: "Lessons of Scale at Facebook"; "Redesigning the Data Center" (CACM)
