Understanding Hadoop Clusters and the Network
Part 1: Introduction and Overview
Brad Hedlund
http://bradhedlund.com | http://www.linkedin.com/in/bradhedlund | @bradhedlund
Hadoop Server Roles
Clients
Masters: Job Tracker (distributed data analytics, Map Reduce); Name Node and Secondary Name Node (distributed data storage, HDFS)
Slaves: each slave server runs a Data Node & Task Tracker pair
Hadoop Cluster
World, Job Tracker, Name Node, Secondary NN, Client
Rack 1, Rack 2, Rack 3, Rack 4 ... Rack N (racks of slave servers)
Typical Workflow
Load data into the cluster (HDFS writes)
Analyze the data (Map Reduce)
Store results in the cluster (HDFS writes)
Read the results from the cluster (HDFS reads)
Sample scenario: How many times did our customers type the word "Fraud" into emails sent to customer service? (a huge file containing all emails sent to customer service)
Writing Files to HDFS
Client: "I want to write Blocks A, B, C of my file."
Name Node: "OK. Write to DataNodes 1, 5, 6."
Client consults the Name Node
Client writes a block directly to one DataNode
DataNodes replicate the block
Cycle repeats for the next block
Hadoop Rack Awareness: Why?
Racks of DataNodes: Rack 1 (DN 1, 2, 3, ...), Rack 5 (DN 5, 6, 7, 8, ...), Rack 9 (DN 9, 10, 11, 12, ...)
Rack aware: Rack 1: 1, 2, 3; Rack 5: 5, 6, 7; ...
Name Node metadata: Blk A: DN1, DN5, DN6; Blk B: DN7, DN1, DN2; Blk C: DN5, DN8, DN9
Never lose all data if an entire rack fails
Keep bulky flows in-rack when possible
Assumption: in-rack is higher bandwidth, lower latency
Preparing HDFS Writes
Client: "I want to write Blocks A, B, C." Name Node: "OK. Write to DataNodes 1, 5, 6."
Client asks DN1: "Ready DNs 5, 6?"; the question is relayed down the pipeline ("Ready?") and each DataNode answers "Ready!"
Rack aware: Rack 1: 1; Rack 5: 5, 6
Name Node picks two nodes in the same rack, one node in a different rack
Data protection, plus locality for M/R
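The placement rule above (two replicas in one rack, a third in a different rack) can be sketched in Python. This is an illustrative toy, not HDFS's actual placement code; the function name, the `racks` mapping, and the preference for the client's own rack are all assumptions.

```python
import random

def place_replicas(racks, client_rack):
    """Pick three DataNodes for one block: two in a single rack,
    one in a different rack (toy version of the slide's rule).

    racks: dict mapping rack name -> list of DataNode ids
           (the chosen "local" rack needs at least two nodes)
    client_rack: rack the writing client sits in (assumed preference)
    """
    # Prefer the client's rack for the pair of replicas when possible.
    local = client_rack if client_rack in racks else random.choice(list(racks))
    # Any other rack holds the third replica, protecting against rack failure.
    remote = random.choice([r for r in racks if r != local])
    first_two = random.sample(racks[local], 2)
    third = random.choice(racks[remote])
    return first_two + [third]
```

For example, `place_replicas({"rack1": ["dn1", "dn2", "dn3"], "rack5": ["dn5", "dn6"]}, "rack1")` returns two nodes from rack1 and one from rack5.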
Pipelined Write
Client writes Block A to DN1; DataNodes 1 & 5 pass the data along as it is received (TCP port 50010)
Rack aware: Rack 1: 1; Rack 5: 5, 6
Pipelined Write (completion)
Each DataNode reports "Block received" to the Name Node
Name Node metadata: Blk A: DN1, DN2, DN3 (Rack 1: 1, 2; Rack 5: 3)
Client gets "Success"
Multi-block Replication Pipeline
1TB file = 3TB storage, 3TB network traffic
Blocks A, B, and C each flow down their own replication pipeline, spread across Rack 1, Rack 4, and Rack 5
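The 3x arithmetic on this slide is just the replication factor at work; a minimal sketch of the model (function name assumed):

```python
def write_cost(file_size_tb, replication=3):
    """Storage consumed and network traffic generated by an HDFS write.

    Simplified model from the slide: each of the `replication` copies
    is stored once and crosses the network once in the write pipeline
    (client -> DN1 -> DN5 -> DN6 for replication=3).
    """
    storage_tb = file_size_tb * replication
    traffic_tb = file_size_tb * replication
    return storage_tb, traffic_tb
```

So `write_cost(1)` gives `(3, 3)`: a 1TB file consumes 3TB of storage and generates 3TB of write traffic.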
Client Writes Span the HDFS Cluster
Client writes land on DataNodes across Rack 1, Rack 2, Rack 3, Rack 4 ... Rack N
Factors: block size, file size
More blocks = wider spread
Data Node writes span itself, and other racks
A DataNode writing Results.txt (Blocks A, B, C) keeps one replica on itself and pipelines the others to nodes in other racks (Rack 1 ... Rack N)
Name Node
DataNodes send heartbeats to the Name Node over TCP every 3 seconds: "I'm alive! I have blocks A, C."
Every 10th heartbeat is a Block report
Name Node builds its metadata from Block reports: file system metadata (file = Blks A, B, C) plus block locations (DN1: A, C; DN2: A, C; DN3: A, C; ...)
Name Node: "Awesome! Thanks."
If the Name Node is down, HDFS is down
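The heartbeat and block-report bookkeeping above can be sketched as a toy class; all names here are illustrative, and the real NameNode internals are far more involved:

```python
class NameNode:
    """Toy NameNode that builds block metadata from block reports."""

    def __init__(self):
        self.block_map = {}   # block id -> set of DataNode ids holding it
        self.last_seen = {}   # DataNode id -> number of heartbeats received

    def heartbeat(self, dn):
        """A DataNode says "I'm alive!" (every 3 seconds in the slides)."""
        self.last_seen[dn] = self.last_seen.get(dn, 0) + 1

    def block_report(self, dn, blocks):
        """Every 10th heartbeat carries the node's full block list;
        the NameNode folds it into its block map."""
        for b in blocks:
            self.block_map.setdefault(b, set()).add(dn)
```

A quick run: after `block_report("DN1", ["A", "C"])` and `block_report("DN2", ["A", "C"])`, the NameNode knows block A lives on DN1 and DN2.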
Re-replicating Missing Replicas
Missing heartbeats signify lost DataNodes
Name Node: "Uh oh! Missing replicas."
Name Node consults its metadata (DN1: A, C; DN2: A, C; DN3: A, C; ...), finds the affected data
Name Node consults the Rack Awareness script (Rack 1: DN1, DN2; Rack 5: DN3; Rack 9: DN8)
Name Node tells a DataNode to re-replicate: "Copy blocks A, C to Node 8"
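Finding the under-replicated blocks after a node loss amounts to a set difference per block; a sketch, with names assumed:

```python
def missing_replicas(block_map, dead_node, replication=3):
    """Return {block: copies_needed} for blocks that fall below the
    replication factor when `dead_node` stops heartbeating.

    block_map: dict mapping block id -> set of DataNode ids holding it
    """
    under = {}
    for block, nodes in block_map.items():
        live = nodes - {dead_node}          # replicas that survive the loss
        if len(live) < replication:
            under[block] = replication - len(live)
    return under
```

Each entry in the result corresponds to one "copy block X to node Y" order the NameNode would issue.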
Secondary Name Node
Secondary NN: "It's been an hour, give me your metadata."
Not a hot standby for the Name Node
Connects to the Name Node every hour*
Housekeeping, backup of Name Node metadata (file system metadata and block locations)
Saved metadata can rebuild a failed Name Node
Client Reading Files from HDFS
Client: "Tell me the block locations of Results.txt."
Name Node: "Blk A = 1, 5, 6; Blk B = 8, 1, 2; Blk C = 5, 8, 9."
Name Node metadata: Results.txt = Blk A: DN1, DN5, DN6; Blk B: DN7, DN1, DN2; Blk C: DN5, DN8, DN9
Client receives a DataNode list for each block
Client picks the first DataNode for each block
Client reads blocks sequentially
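The client-side read logic described here, take the first DataNode offered for each block and read the blocks in order, can be sketched as:

```python
def read_file(locations):
    """Build a sequential read plan from the NameNode's answer.

    locations: ordered list of (block_id, [datanode, ...]) pairs,
               as returned for one file (names are illustrative).
    Returns the (block, datanode) pairs the client would read, in order.
    """
    plan = []
    for block, datanodes in locations:
        plan.append((block, datanodes[0]))  # pick the first node offered
    return plan
```

For the slide's answer, the plan reads Blk A from DN1, Blk B from DN8, Blk C from DN5, one block after another.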
Data Node Reading Files from HDFS
DataNode: "Tell me the locations of Block A of the file." Name Node: "Block A = 1, 5, 6."
Rack aware: Rack 1: 1, 2, 3; Rack 5: 5, ...
Name Node metadata: Blk A: DN1, DN5, DN6; Blk B: DN7, DN1, DN2; Blk C: DN5, DN8, DN9
Name Node provides rack-local Nodes first
Leverage in-rack bandwidth, single hop
Data Processing: Map
Client: "How many times does 'Fraud' appear in the file?"
Job Tracker: "Count 'Fraud' in your local block."
Map Tasks on DN1, DN5, DN9 return: Fraud = 3, Fraud = 0, Fraud = 11
Map: run this computation on your local data
Job Tracker delivers Java code to the Nodes holding the data locally
What If Data Isn't Local?
Job Tracker: "Count 'Fraud' in Block A."
The Nodes holding Block A have no Map tasks left, so DN2 runs the task and must fetch the block: "I need Block A."
Job Tracker tries to select a Node in the same rack as the data (rack awareness)
Data Processing: Reduce
Job Tracker: "Sum 'Fraud'."
Reduce Task collects the Map results (X, Y, Z) over the network and writes Results.txt (Fraud = 14) to HDFS
Reduce: run this computation across the Map results
Map Tasks deliver their output data over the network
Reduce Task output is written to, and read from, HDFS
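The Map and Reduce steps of the "Fraud" scenario boil down to a distributed word count; a minimal sketch (this is not the actual Hadoop API, where jobs implement Mapper and Reducer classes in Java):

```python
def map_task(block_lines, word="Fraud"):
    """Map: count occurrences of `word` in one locally stored block,
    given as a list of text lines."""
    return sum(line.split().count(word) for line in block_lines)

def reduce_task(partial_counts):
    """Reduce: sum the per-block counts shipped over the network."""
    return sum(partial_counts)
```

With the slide's per-block results, `reduce_task([3, 0, 11])` yields the final answer of 14.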
Unbalanced Cluster
New Rack nodes: "I'm bored!"*
Existing node: "I was assigned a Map Task but don't have the block. Guess I need to get it."**
Hadoop prefers local processing if possible
* New servers underutilized for Map Reduce and HDFS
** Might see more network bandwidth, slower job times
Cluster Balancing
brad@cloudera-1:~$ hadoop balancer
Balancer utility (if used) runs in the background
Does not interfere with Map Reduce or HDFS
Default speed limit 1 MB/s
Thanks! Narrated at: http://bradhedlund.com/?p=3108