Google Bing Daytona Microsoft Research

Transcription

1

2 Google Bing Daytona Microsoft Research

3

4 Raise your hand Great, you can help answer questions ;-) Sit with these people during lunch...

5

6 An increased number and variety of data sources that generate large quantities of data Sensors (e.g. location, acoustical, ) Web 2.0 (e.g. twitter, wikis, ) Web clicks Realization that data was too valuable to delete Dramatic decline in the cost of hardware, especially storage If storage was still $00/GB there would be no big data revolution underway

7 Old guard Use a parallel database system ebay 0PB on 256 nodes Young turks Use a NoSQL system Facebook - 20PB on 2700 nodes Bing 50PB on 40K nodes Just a subset of the data these companies manage

8 NOSQL What s in the name...

9 NO to SQL It s not about saying that SQL should never be used, or that SQL is dead...

10 NOT Only SQL It s about recognizing that for some problems other storage solutions are better suited! Would have been more intuitive if named NO JOINS

11 More data model flexibility JSON as a data model, simple grammar, maps to ds, No schema first requirement Relaxed consistency models such as eventual consistency Willing to trade consistency for availability and speed See Sebastian Burckhardt s talk(s) Low upfront software costs Never learned anything but C/Java in school Hate declarative languages like SQL Faster time to insight from data acquisition

12 SQL: Data Arrives Cleanse the data Transform the data Load the data Derive a schema 2 RDBMS SQL Queries 6 NoSQL: Data Arrives No cleansing! No ETL! No load! NOSQL System Application Program 2 Analyze data where it lands!

13 Key/Value Stores Examples: MongoDB, CouchBase, Cassandra, Windows Azure, Flexible data model such as JSON Records sharded across nodes in a cluster by hashing on key Single record retrievals based on key Hadoop Scalable fault tolerant framework for storing and processing MASSIVE data sets Typically no data model Records stored in distributed file system

14 Relational Databases vs. Hadoop? (what s the future?) Relational databases and Hadoop are designed to meet different needs 4

15 Structured & Unstructured Relational DB Systems Structured data w/ known schema ACID Transactions SQL Rigid Consistency Model ETL Longer time to insight Maturity, stability, efficiency NoSQL Systems (Un)(Semi)structured data w/o schema No ACID No transactions No SQL Eventual consistency No ETL Faster time to insight Flexibility

16 2003 MR/GFS 2006 Hadoop Requirements: Store Hadoop = HDFS + MapReduce Massive amounts of click stream data that had to be stored and analyzed Store Scalable to PBs and 000s of nodes Highly fault tolerant Simple to program against Process GFS + MapReduce distributed $ fault-tolerant Process new programming paradigm

17 Scalability and a high degree of fault tolerance Ability to quickly analyze massive collections of records without forcing data to first be modeled, cleansed, and loaded Easy to use programming paradigm for writing and executing analysis programs that scale to 000s of nodes and PBs of data Low, up front software and hardware costs Think data warehousing for Big Data

18 Zookeeper Avro (Serialization) HDFS distributed, fault tolerant file system MapReduce framework for writing/executing distributed, fault tolerant algorithms Hive & Pig SQL-like declarative languages Sqoop package for moving data between HDFS and relational DB systems + Others BI Reporting ETL Tools RDBMS Hive & Pig Map/ Reduce HBase Sqoop HDFS

19 Underpinnings of the entire Hadoop ecosystem HDFS design goals: Scalable to 000s of nodes Assume failures (hardware and software) are common Targeted towards small numbers of very large files Write once, read multiple times Traditional hierarchical file organization of directories and files Highly portable Hive & Pig Map/ Reduce HDFS Sqoop

20 Large File Let s color-code them 6440MB Block Block 2 Block 3 Block 4 Block 5 Block 6 Block 00 Block 0 64MB 64MB 64MB 64MB 64MB 64MB 64MB 40MB e.g., Block Size = 64MB Files are composed of set of blocks Typically 64MB in size Each block is stored as a separate file in the local file system (e.g. NTFS)

21 Block Block 2 Block 3 Block 2 Block 2 Block 3 Block Block 3 Block Default placement policy: Node Node 2 e.g., Replication factor = 3 Node 3 Node 4 Node 5 First copy is written to the node creating the file (write affinity) Second copy is written to a data node within the same rack (to minimize cross-rack network traffic) Third copy is written to a data node in a different rack (to tolerate switch failures)

22 NameNode one instance per cluster Responsible for filesystem metadata operations on cluster, replication and locations of file blocks NameNode Master Backup Node responsible for backup of NameNode CheckpointNode or BackupNode (backups) DataNodes one instance on each node of the cluster Responsible for storage of file blocks Serving actual file data to client DataNode DataNode DataNode DataNode Slaves

23 NameNode BackupNode namespace backups (heartbeat, balancing, replication, etc.) DataNode DataNode DataNode DataNode DataNode nodes write to local disk

24 Giant File HDFS Client {node2, {node, node3, node4, node2, node 4} 5} 3} (based on replication factor ) (3 by default) Name node tells client where to store each block of the file Client transfers block directly to assigned data nodes NameNode BackupNode and so on DataNode DataNode DataNode DataNode DataNode

25 Giant File HDFS Client return locations of blocks of file stream blocks from data nodes NameNode BackupNode DataNode DataNode DataNode DataNode DataNode

26 HDFS was designed with the expectation that failures (both hardware and software) would occur frequently Failure types: Disk errors and failures DataNode failures Switch/Rack failures NameNode failures Datacenter failures DataNode NameNode

27 NameNode BackupNode Blocks are NameNode auto-replicated detects DataNode on remaining loss nodes to satisfy replication factor DataNode DataNode DataNode DataNode DataNode

28 NameNode BackupNode NameNode Blocks detects are re-balanced new DataNode and is added re-distributed to cluster DataNode DataNode DataNode DataNode DataNode

29 Highly scalable 000s of nodes and massive (00s of TB) files Large block sizes to maximize sequential I/O performance No use of mirroring or RAID. Reduce cost Use one mechanism (triply replicated blocks) to deal with a wide variety of failure types rather than multiple different mechanisms Negatives Why? Block locations and record placement is invisible to higher level software Makes it impossible to employ many optimizations successfully employed by parallel DB systems

30 Zookeeper Avro (Serialization) HDFS distributed, fault tolerant file system MapReduce framework for writing/executing distributed, fault tolerant algorithms Hive & Pig SQL-like declarative languages Sqoop package for moving data between HDFS and relational DB systems + Others BI Reporting Hive & Pig ETL Tools Map/ Reduce RDBMS Sqoop HDFS

31 Programming framework (library and runtime) for analyzing data sets stored in HDFS MapReduce jobs are composed of two functions: map() reduce() sub-divide & conquer combine & reduce cardinality User only writes the Map and Reduce functions MR framework provides all the glue and coordinates the execution of the Map and Reduce jobs on the cluster. Fault tolerant Scalable

32 REDUCE MAP Essentially, it s. Take a large problem and divide it into sub-problems 2. Perform the same function on all sub-problems DoWork() DoWork() DoWork() 3. Combine the output from all sub-problems Output

33 DataNode <key i, value i > Mapper <key A, value a > <key B, value b > <key C, value c > <key A, list(value a, value b, value c, )> Reducer DataNode DataNode <key i, value i > <key i, value i > Mapper Mapper <key A, value a > <key B, value b > <key C, value c > <key A, value a > <key B, value b > <key C, value c > Sort and group by key <key B, list(value a, value b, value c, )> Reducer Output <key C, list(value a, value b, value c, )> <key i, value i > Mapper <key A, value a > <key B, value b > <key C, value c > Reducer

34 - Coordinates all M/R tasks & events - Manages job queues and scheduling - Maintains and Controls TaskTrackers - Moves/restarts map/reduce tasks if needed - Uses checkpointing to combat single points of failure Master hadoop-namenode JobTracker NameNode JobTracker controls and heartbeats TaskTracker nodes MapReduce Layer HDFS Layer Execute individual map and reduce tasks as assigned by JobTracker MapReduce (in separate JVM) Layer HDFS Layer Slaves hadoopdatanode hadoopdatanode2 hadoopdatanode3 hadoopdatanode4 TaskTracker TaskTracker TaskTracker TaskTracker TaskTrackers store temp data DataNode Temporary DataNode data stored to DataNode local file system DataNode

35 MR Client Submit jobs to JobTracker JobTracker jobs get queued map() s are assigned to TaskTrackers (HDFS DataNode reduce phase locality begins aware) mappers spawned in separate JVM and execute TaskTracker TaskTracker TaskTracker TaskTracker Mapper Mapper Mapper Mapper Reducer Reducer Reducer Reducer mappers store temp results Temporary data stored to local file system

36 DataNode DataNode2 DataNode3 Get sum sales grouped by zipcode (custid, zipcode, amount) $75 Blocks of the Sales file in HDFS $ $ $ $ $ $ $ $ $ $65 $ $ $ $ $ $5 Mapper Group By $ $ $ $ $ $ $5 One output bucket per reduce task $ $ $5 Mapper $ $25 Group By 025 $5 025 $ $ $ $5 Map tasks 4433 $ $25

37 Mapper Shuffle Mapper Reducer $ $ $ $ $ $5 Sort $ $5 SUM $ $ $ $55 Reducer $ $ $ $ $95 Sort 0025 $ $ $ $25 SUM 0025 $ $ $ $ $55 Reduce tasks 4433 $55 Reducer 025 $5 025 $ $ $ $ $ $5 025 $5 Sort 025 $5 025 $ $ $22 SUM 025 $ $97

38 Actual number of Map tasks M is generally made much larger than the number of nodes used. Why? Helps deal with data skew and failure Example: Say M = 0,000 and W = 00 (W is number of Map workers) Each worker does (0,000 / 00) = 00 Map tasks If it suffers from skew or fails the uncompleted work can easily be shifted to another worker Skew with Reducers is still a problem Example: In a query = get sales by zipcodes, some zipcodes (e.g. 0025) may have many more sales records than others

39 Like HDFS, MapReduce framework designed to be highly fault tolerant Worker (Map or Reduce) failures Detected by periodic Master pings Map or Reduce jobs that fail are reset and then given to a different node If a node failure occurs after the Map job has completed, the job is redone and all Reduce jobs are notified Master failure (Hadoop 2..0-beta 8/5/203) If the master fails there is now failover

40 Highly fault tolerant Relatively easy to write arbitrary distributed computations over very large amounts of data MR framework removes burden of dealing with failures from programmer Schema embedded in application code A lack of shared schema Makes sharing data between applications difficult Makes lots of DBMS goodies such as indices, integrity constraints, views, impossible No declarative query language

41 Pace of cloud-scale data processing available in the open source community is accelerating Workflow Management Chubby Azkaban ZooKeeper Big Top Data Analytics Data Processing Protoco l Buffers Pig Dremel & Pregel Giraph FlumeJava Crunch Storage and File Systems BigTable Voldemort Spanner GFS MapReduce Colossus

42 HBase saw a sharp rise in it s code base due, in large part, to contributions from Cloudera, Twitter, and Facebook. 570K Lines of code Lines of Code & 50K 00+ Adopters # of Adopters 5+ New Committers Cumulative Committers Note: Number of adopters is non-exhaustive Represents one committer

43 Increased integration with Hadoop helped adoption of Cassandra grow from 20+ to 68+ companies between 20 and K 20K Lines of code Lines of Code 68+ Adopters & # of Adopters 50K 75K 20+ New Committers Cumulative Committers Note: Number of adopters is non-exhaustive Represents one committer

44 Apache Mahout is driving a new era of machine learning analytics. Lines of Code 50K 250K 300K 26+ Adopters Lines of code & # of Adopters 0K 50K New Committers Cumulative Committers Represents one contributor

45 At LinkedIn in product development and operations. People You May Know What it takes 20B New daily relationships Who s Viewed My Profile 82 Hadoop jobs 6TB ~5 Intermediate data Weekly test algorithms Career Explorer

46 46