Apache: Big Data Europe 2015
HDFS 2015: Past, Present, and Future
9/30/2015
Akira Ajisaka, NTT DATA Corporation
Copyright 2015 NTT DATA Corporation
Self introduction
Akira Ajisaka (NTT DATA), Apache Hadoop Committer
130+ commits in 2015
Working on usability: 80+ documentation patches
"Open-Source Professional Services" team: has deployed and supported 10k+ nodes of Hadoop clusters overall for 7 years
Contributing to Apache Hadoop: 6th in the world, with NTT [1]
[1] The Activities of Apache Hadoop Community 2014: http://ajisakaa.blogspot.com/2015/02/the-activities-of-apache-hadoop.html
About
Similar to the "YARN 2015" presentation by @tshooter
HDFS is developed faster than YARN
[Chart: resolved issues in 2015 (cumulative), HDFS vs. YARN, January through September 2015]
Need a summary of HDFS new features
Agenda
Past
Present
Future
Past
Past releases
2.X is the release branch; 1.X and 0.23.X are no longer maintained
[Release timeline, 2009-2015: branch-1 (branch-0.20): 0.20.1, 0.20.205, 1.0.0, 1.1.0, 1.2.1 (stable); 0.21.0, 0.22.0, 0.23.0, 0.23.11 (final); branch-2: 2.0.0-alpha, 2.1.0-beta, 2.2.0 (GA), 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0; trunk. Milestones noted on the timeline: New append, Security, NameNode Federation, YARN, NameNode HA]
Hadoop 2.2 (2013-10-13)
NameNode High Availability: no single point of failure
Federation: multiple NameNodes, multiple namespaces; improves scalability
Snapshots: read-only point-in-time copies (copy-on-write)
NFSv3 mount
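Snapshots are managed from the command line; a minimal sketch, using a hypothetical /data directory and snapshot name:
# Allow snapshots on a directory (admin operation), then take a read-only snapshot
$ hdfs dfsadmin -allowSnapshot /data
$ hdfs dfs -createSnapshot /data snap-20150930
# The snapshot appears under the hidden .snapshot directory
$ hdfs dfs -ls /data/.snapshot/snap-20150930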
Hadoop 2.3 (2014-02-20)
Heterogeneous Storages (Phase 1)
In-memory caching: introduces memory locality and makes efficient use of memory in DataNodes
[Diagram: 1. the DFSClient asks the NameNode to cache a file; 2. the NameNode asks DataNodes to cache the file's blocks in memory; 3. if the data is cached locally, the client reads it directly from memory and skips checksum calculation]
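In-memory caching is driven through the centralized cache management CLI; a minimal sketch, assuming dfs.datanode.max.locked.memory is configured on the DataNodes (pool and path names are hypothetical):
# Create a cache pool and ask the NameNode to cache a file's blocks in DataNode memory
$ hdfs cacheadmin -addPool hot-pool
$ hdfs cacheadmin -addDirective -path /data/hot/part-00000 -pool hot-pool
# Check which directives exist and how much data has been cached
$ hdfs cacheadmin -listDirectives -stats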
Hadoop 2.4 (2014-04-07)
Rolling Upgrades: upgrade the cluster without stopping it (no need to wait for hours of downtime)
ACLs: more fine-grained permissions, similar to POSIX ACLs
-rw-rw-r-- 3 tester hadoop 129 2015-09-15 12:00 /user/tester/test.txt
$ hdfs dfs -setfacl -m group:hive:rw- /user/tester/test.txt
gives write permission to the hive group
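A sketch of checking the resulting ACL and of the rolling-upgrade flow (the exact getfacl output may differ slightly):
# Inspect the ACL added above; the hive group appears as a named group entry
$ hdfs dfs -getfacl /user/tester/test.txt
# file: /user/tester/test.txt
# owner: tester
# group: hadoop
user::rw-
group::rw-
group:hive:rw-
mask::rw-
other::r--
# Rolling upgrade: prepare a rollback image, upgrade nodes one by one, then finalize
$ hdfs dfsadmin -rollingUpgrade prepare
$ hdfs dfsadmin -rollingUpgrade query
$ hdfs dfsadmin -rollingUpgrade finalize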
Hadoop 2.5 (2014-08-11)
Extended Attributes (XAttrs): similar to extended attributes in Linux
-rw-r--r-- 3 tester hadoop 129 2015-09-15 12:00 /user/tester/test.txt
Set XAttrs:
$ hdfs dfs -setfattr -n user.locale -v jp /user/tester/test.txt
$ hdfs dfs -setfattr -n user.city -v tokyo /user/tester/test.txt
Get XAttrs:
$ hdfs dfs -getfattr -d /user/tester/test.txt
# file: /user/tester/test.txt
user.locale="jp"
user.city="tokyo"
Currently used by transparent encryption
Hadoop 2.6 (2014-11-18)
Hot swap volumes: recover from disk failures without stopping DataNodes
Integrate Apache HTrace (incubating): trace RPCs inside HDFS
[Diagram: spans A-D on nodes 1-3 share trace id 12345 and record their parent span, so parent-child relations are easy to find]
Finding bottlenecks becomes easier
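Hot swap works by editing dfs.datanode.data.dir on the DataNode and asking it to reconfigure itself; a sketch (host and port are placeholders):
# After changing dfs.datanode.data.dir in the DataNode's hdfs-site.xml,
# trigger and monitor the reconfiguration without restarting the DataNode
$ hdfs dfsadmin -reconfig datanode <datanode_host:ipc_port> start
$ hdfs dfsadmin -reconfig datanode <datanode_host:ipc_port> status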
Hadoop 2.6 (2014-11-18) (cont'd)
Heterogeneous Storages (Phase 2): Archival Storage, memory as a storage tier
Transparent Encryption
Heterogeneous Storages (Hadoop 2.6)
Problem: SSD is getting cheaper; we want to store hot data on SSD to achieve higher throughput
Solution: introduce storage types and block placement policies
Storage types: DISK (HDD), SSD, ARCHIVE, ...
Policies: One_SSD, HOT, WARM, COLD, ...
Example: A -> One_SSD, B -> HOT
[Diagram: file A (One_SSD) has one replica on an SSD and the rest on DISK; file B (HOT) has all replicas on DISK, spread across DN1-DN3]
Heterogeneous Storages: how to use (Hadoop 2.6)
Configure HDFS to recognize the storage type of each disk:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[SSD]file:///data/ssd,[DISK]file:///data/hdd</value>
</property>
Set a block placement policy on an HDFS path (the policy can also be reset after the data has been written):
$ hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>
The Mover will move blocks to satisfy the policy, taking rack awareness into account
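A sketch of applying a policy and then migrating existing blocks (the /data/hot path is hypothetical; available policy names can be listed with -listPolicies):
# Set and verify the storage policy on a path
$ hdfs storagepolicies -setStoragePolicy -path /data/hot -policy ONE_SSD
$ hdfs storagepolicies -getStoragePolicy -path /data/hot
# Move already-written blocks so that they satisfy the new policy
$ hdfs mover -p /data/hot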
Archival Storage (Hadoop 2.6)
DISK or ARCHIVE? ARCHIVE is for cold data
eBay reduced cost/GB by 5x [1]
Use low-spec DataNodes for ARCHIVE; no need to split the cluster!
Regular node: 12 HDDs, 32 cores, 128 GB memory, runs NodeManager
Archival node: 60 HDDs, 4 cores, 64 GB memory, no NodeManager
[1] Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature: http://www.slideshare.net/hadoop_summit/reduce-storage-costs-by-5x-using-the-new-hdfstiered-storage-feature
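A sketch of tiering cold data onto archival nodes (paths are hypothetical):
# On archival DataNodes, tag the data directories as ARCHIVE storage, e.g.
#   dfs.datanode.data.dir = [ARCHIVE]file:///data/archive
# Then mark cold data as COLD and let the Mover migrate it
$ hdfs storagepolicies -setStoragePolicy -path /logs/2014 -policy COLD
$ hdfs mover -p /logs/2014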
Transparent Encryption (Hadoop 2.6)
Problem: cannot guard data from OS-level attacks; the DataTransferProtocol can be encrypted, but the data the DataNode writes to disk is NOT encrypted
Solution: provide end-to-end encryption; encrypt/decrypt data transparently, with no need to rewrite user applications
Transparent Encryption: how data is encrypted (Hadoop 2.6)
DEK (Data Encryption Key): a unique key for each file in an EZ (Encryption Zone), stored encrypted (as an EDEK) in an XAttr of the file
KMS (Key Management Server): a proxy to the underlying key provider; ACLs on a per-key basis; bundled with the Hadoop package
[Diagram: 1. the client creates a file in an EZ; 2. the NameNode gets an EDEK from the KMS; 3. the NameNode stores the EDEK in the file's metadata; 4. the EDEK is returned to the client; 5. the client calls the KMS to decrypt the EDEK into a DEK; 6. the client writes data encrypted with the DEK to the DataNode]
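A sketch of setting up an encryption zone (the key name and path are hypothetical; assumes a KMS is configured as the key provider):
# Create a key in the KMS, then create an encryption zone backed by that key
$ hadoop key create mykey
$ hdfs dfs -mkdir /secure
$ hdfs crypto -createZone -keyName mykey -path /secure
$ hdfs crypto -listZones
# Files written under /secure are encrypted and decrypted transparently
$ hdfs dfs -put localfile /secure/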
Transparent Encryption: very low overhead (Hadoop 2.6)
Simple benchmark with 3 slaves (m3.xlarge, 4-core Xeon E5-2670 v2), using AES-NI:
1 GB Teragen: 17 sec (encryption off) vs. 18 sec (encryption on)
1 GB Terasort: 47 sec (encryption off) vs. 49 sec (encryption on)
Known issue: encryption is sometimes done incorrectly (HADOOP-11343); 2.7.1 or 2.6.1 is recommended
Present
Hadoop 2.7 (2015-04-21)
Quota per storage type
Truncate API
Files with variable-length blocks
Web UI for the NFS gateway
NNTop: a top-like tool for the NameNode; lists top users for each operation, exposed via metrics
fsck -blockId option: prints the file that a block ID belongs to
INotify
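Sketches of the new truncate and fsck options (the file path and block ID are hypothetical):
# Truncate a file to 1000 bytes and wait for the truncate to complete
$ hdfs dfs -truncate -w 1000 /user/tester/test.txt
# Find out which file a given block belongs to and check its health
$ hdfs fsck / -blockId blk_1073741825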
INotify for HDFS (Hadoop 2.7)
Problem: some components cache HDFS metadata (Hive caches path names, Impala caches block locations); when should the cache be invalidated?
Solution: introduce a mechanism similar to Linux inotify; clients can monitor events without parsing the NameNode log or edit log
INotify for HDFS: technical approach (Hadoop 2.7)
The client polls the NameNode periodically (not a push model):
1. the client polls for any events after #XX
2. the NameNode returns the events after #XX
The client caches the highest event number it has seen
Known issue: truncate is not notified (HDFS-8742); fixed in 2.8.0
Future
Many features are being developed
2.8 (not released):
Support OAuth2 in WebHDFS
RPC congestion control
Feature branches:
Erasure Coding (HDFS-7285)
Ozone: object store (HDFS-7240)
BlockManager scalability improvements (HDFS-7836)
HTTP/2 support for DataTransferProtocol (HDFS-7966)
Async pure C++ HDFS client (HDFS-8707)
RPC Congestion Control (Hadoop 2.8)
Problem: the NameNode RPC queue is FIFO, so a DDoS can kill the entire cluster
while (true) { dfs.exists("/data"); }  // don't do this!
Solution:
Fair scheduling for the RPC queue (2.6.0)
Retriable exception with exponential backoff (2.8.0)
To be enabled by default in 2.8
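A sketch of switching the NameNode to the fair call queue (assuming the NameNode RPC port is 8020; the property name follows the ipc.<port>.callqueue.impl convention):
# In core-site.xml on the NameNode, replace the FIFO call queue:
#   ipc.8020.callqueue.impl = org.apache.hadoop.ipc.FairCallQueue
# Swap the call queue in without restarting the NameNode
$ hdfs dfsadmin -refreshCallQueue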
Erasure Coding
Problem: want to reduce storage costs; blocks are replicated to 3 DataNodes, and the 3x storage overhead is costly
Solution: use an erasure code
3-replication: tolerates 2 failures, 3x disk usage
(6,3)-Reed-Solomon: tolerates 3 failures, 1.5x disk usage (6 data blocks plus 3 parity blocks store 6 blocks' worth of data, so the overhead is 9/6 = 1.5x)
Erasure Coding: writing files using (6,3)-Reed-Solomon
The ECClient writes incoming data to 9 DataNodes in parallel: 6 data blocks (DN1-DN6) and 3 parity blocks (DN7-DN9)
Erasure Coding: reading files
The ECClient reads data from 6 DataNodes in parallel
Erasure Coding: reading files when a DataNode fails
The ECClient reads data from an arbitrary 6 of the DataNodes in parallel and reconstructs the original data
Erasure Coding: current status
Suitable for cold data: no data locality, very low cost/GB with archival storage
Now preparing for merge
Follow-on work:
Intel ISA-L support for faster encoding
Support for append/truncate/hflush/hsync
More coding schemes
Pipeline error handling
Support for a contiguous layout (HDFS EC Phase 2)
Summary
Many features are still in development, and I cannot predict when each feature will be available
If you want a feature, join in and contribute to it to make its development faster
There are many ways to contribute:
Creating/testing/reviewing patches
Reporting bugs
Writing documentation
Discussing architecture and design
https://wiki.apache.org/hadoop/howtocontribute
References
Apache Hadoop Docs: http://hadoop.apache.org/docs/current/
In-memory caching (HDFS-4949): In-memory Caching in HDFS: Lower Latency, Same Great Taste: http://www.slideshare.net/hadoop_summit/inmemory-caching-inhdfs-lower-latency-same-great-taste-33921794
Heterogeneous Storages (HDFS-5682): Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature: http://www.slideshare.net/hadoop_summit/reducestorage-costs-by-5x-using-the-new-hdfs-tiered-storage-feature
Transparent Encryption (HDFS-6134): Transparent Encryption in HDFS: http://www.slideshare.net/hadoop_summit/transparentencryption-in-hdfs
INotify (HDFS-6634): Keep Me in the Loop: Introducing HDFS Inotify: http://www.slideshare.net/hadoop_summit/keep-me-in-the-loopinotify-in-hdfs
References (cont'd)
RPC congestion control (HADOOP-9640, HADOOP-10597, HDFS-8820): Improving HDFS Availability with Hadoop RPC Quality of Service: http://www.slideshare.net/mingma4/hadooprpcqoshadoopsummit2015
Erasure Coding (HDFS-7285): HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency: http://www.slideshare.net/hadoop_summit/hdfserasure-code-storage-same-reliability-at-better-storage-efficiency