Extended Attributes and Transparent Encryption in Apache Hadoop
Uma Maheswara Rao G
Yi Liu (刘轶)
Who are we?
Uma Maheswara Rao G
- umamahesh@apache.org
- Software Engineer at Intel
- PMC/committer, Apache Hadoop
- PMC/committer, Apache BookKeeper
Yi Liu (刘轶)
- yliu@apache.org
- Software Engineer at Intel
- Active committer, Apache Hadoop
- PMC/committer, Apache Tajo
- Senior security expert in big data
Intel BigData Team
- Global team, local focus
  - Worldwide (China, US and India) teams, >80% in China
  - Local collaborations (industry & academic) a high priority
- Greater impact through open source
  - Active open source development (Spark, Hadoop, HBase, Storm, etc.)
  - Widely used in the industry (from Facebook to Alibaba to Cloudera to China Mobile)
  - Strong influence in the open source community: ~10 project committers in the team
- Technology and innovation oriented
  - Next generations of Big Data technologies
  - Real-time, in-memory, complex analytics (statistical modeling, machine learning, graph analysis, etc.)
  - Bridging advanced research and real-world applications
Agenda
- Extended Attributes
- Transparent Encryption
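Hadoop Ecosystem
[Diagram of the Hadoop stack, top to bottom:]
- Workloads: Batch Processing (MapReduce, Hive, Pig), Search, SQL, Stream, Spark, Machine Learning
- Services alongside the stack: ZooKeeper, HBase
- YARN (Resource Management)
- HDFS (Hadoop Distributed File System)
- Data Integration (Sqoop, Flume, etc.)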
HDFS Extended Attributes (HDFS-2006)
Introduction
- Extended attributes (XAttrs) let users associate additional metadata with files and directories
- XAttrs can be set as key-value pairs on any INode
- XAttrs are not interpreted by the file system
- Derived from the Linux XAttrs feature, so functionally similar
- Users can choose a custom encoding format for XAttrs
Namespaces of XAttrs
- XAttr names must be prefixed with a namespace
- HDFS supports five XAttr namespaces
- USER
  - Access is governed by the file/directory permission bits
  - For sticky directories, only the owner and privileged users can write
- TRUSTED
  - Visible and accessible only to privileged users
- SYSTEM
  - Not visible to users
  - Available only to the system kernel
- SECURITY
  - Not visible to users
  - Available only to the system kernel, for storing security information
- RAW
  - Like SYSTEM attributes, but accessible only by superusers and only on files/directories under /.reserved/raw
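The prefix rule above can be sketched in a few lines of Java. This is only an illustration, not the NameNode's actual code: the class name XAttrNameSketch and the method namespaceOf are hypothetical, and matching the prefix case-insensitively is an assumption of this sketch.

```java
import java.util.Locale;

// Hypothetical sketch: split "<namespace>.<name>" and check the namespace.
class XAttrNameSketch {
    /** The five namespaces listed on the slide. */
    public enum Namespace { USER, TRUSTED, SYSTEM, SECURITY, RAW }

    /** Returns the namespace of a prefixed name such as "user.origin". */
    public static Namespace namespaceOf(String prefixedName) {
        int dot = prefixedName.indexOf('.');
        // Reject names with no prefix or with nothing after the dot.
        if (dot <= 0 || dot == prefixedName.length() - 1) {
            throw new IllegalArgumentException(
                "expected <namespace>.<name>, got: " + prefixedName);
        }
        // Assumption: the prefix is compared case-insensitively.
        String prefix = prefixedName.substring(0, dot).toUpperCase(Locale.ROOT);
        for (Namespace ns : Namespace.values()) {
            if (ns.name().equals(prefix)) {
                return ns;
            }
        }
        throw new IllegalArgumentException("unknown XAttr namespace: " + prefix);
    }

    public static void main(String[] args) {
        System.out.println(namespaceOf("user.origin"));
        System.out.println(namespaceOf("TRUSTED.checksum"));
    }
}
```

A name without a recognized prefix, such as "origin" or "bogus.x", is rejected with an IllegalArgumentException in this sketch.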
Implementation details
- XAttrs are implemented as a separate INode feature in the NameNode
- XAttrs are persisted as part of the INode information
- XAttrs are validated against the namespaces at the NameNode
- No compatibility issues: upgrades are handled automatically because XAttrs are stored as an INode feature
- XAttrs development was tracked under HDFS-2006
Configuration
- dfs.namenode.xattrs.enabled
  - Whether support for XAttrs is enabled in HDFS
- dfs.namenode.fs-limits.max-xattrs-per-inode
  - Maximum number of XAttrs per INode. Default: 32
- dfs.namenode.fs-limits.max-xattr-size
  - Maximum combined size of the name and value of an XAttr. Default: 16384 bytes
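To make the two limits concrete, here is a minimal sketch of how such checks could be applied when an XAttr is set. The class XAttrLimitsSketch is hypothetical and uses the default values from the slide; how HDFS itself counts the name (for example, with or without the namespace prefix) is not stated in the slides, so this sketch simply counts the UTF-8 bytes of the name as given.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of per-inode XAttr limits, using the slide's defaults.
class XAttrLimitsSketch {
    public static final int MAX_XATTRS_PER_INODE = 32;   // max-xattrs-per-inode
    public static final int MAX_XATTR_SIZE = 16384;      // max-xattr-size, bytes

    private final Map<String, byte[]> xattrs = new LinkedHashMap<>();

    /** Sets or overwrites an XAttr, enforcing both limits. */
    public void set(String name, byte[] value) {
        // Assumption: size = UTF-8 bytes of the full name + value bytes.
        int size = name.getBytes(StandardCharsets.UTF_8).length + value.length;
        if (size > MAX_XATTR_SIZE) {
            throw new IllegalArgumentException(
                "XAttr too large: " + size + " > " + MAX_XATTR_SIZE);
        }
        // Overwriting an existing name does not increase the count.
        if (!xattrs.containsKey(name) && xattrs.size() >= MAX_XATTRS_PER_INODE) {
            throw new IllegalStateException(
                "too many XAttrs on this inode: limit is " + MAX_XATTRS_PER_INODE);
        }
        xattrs.put(name, value.clone());
    }

    public int count() {
        return xattrs.size();
    }

    public static void main(String[] args) {
        XAttrLimitsSketch inode = new XAttrLimitsSketch();
        inode.set("user.origin", "sensor-7".getBytes(StandardCharsets.UTF_8));
        System.out.println(inode.count());
    }
}
```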
Use Cases
- Storing encrypted data encryption keys as XAttrs in an HDFS encrypted cluster environment
- Storing the storage policy for heterogeneous storage
Release
- HDFS-2006 branch merged to trunk and branch-2
- Feature released in hadoop-2.5.0
How to use?
- Java API
- Command line
Transparent Encryption in Hadoop (HADOOP-10150 & HDFS-6134)
Outline
- Transparent to upper-layer applications; all HDFS clients get transparent access to encrypted files.
- High performance: encryption is not a bottleneck.
- Encryption is independent of file type and data format.
- Scalable key management.
- End-to-end encryption: data can only be encrypted and decrypted by the client. This satisfies two typical requirements for encryption: at-rest encryption and in-transit encryption.
- Security: HDFS never handles unencrypted data or unencrypted data encryption keys.
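Write file
[Diagram: DFS Client, KMS with backing keystore, NameNodes (NN), DataNodes (DN), HDFS. Steps recoverable from the figure:]
- Fill EDEK cache in background
- 2. Take an EDEK from the cache and persist it to the file metadata
- 4. Decrypt the EDEK and get the DEK
- 5. Encrypt data using the DEK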
Read file
[Diagram: DFS Client, KMS with backing keystore, NameNodes (NN), DataNodes (DN), HDFS. Steps recoverable from the figure:]
- 2. Read the EDEK from the file metadata
- 4. Decrypt the EDEK and get the DEK
- 6. Decrypt data using the DEK
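The write and read flows both rest on the same idea: the file metadata only ever holds an encrypted DEK (EDEK), and the client must ask the key service to turn it back into a DEK. A minimal sketch of that key handling follows. Everything named here is hypothetical (ToyKms, Edek, aesCtr), the in-memory ToyKms merely stands in for a real KMS, and encrypting the DEK with AES-CTR under a random IV with 128-bit keys is an illustrative choice of this sketch rather than a statement about Hadoop's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of EDEK/DEK handling; not Hadoop's actual classes.
class EnvelopeKeySketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** What the file metadata would hold: the encrypted DEK and its IV. */
    public static final class Edek {
        public final byte[] iv;
        public final byte[] encryptedKey;

        Edek(byte[] iv, byte[] encryptedKey) {
            this.iv = iv;
            this.encryptedKey = encryptedKey;
        }
    }

    /** Toy key service: only it ever sees the zone key. */
    public static final class ToyKms {
        private final byte[] zoneKey = randomBytes(16);

        /** Generates a fresh DEK and returns it only in encrypted form. */
        public Edek generateEdek() {
            byte[] dek = randomBytes(16);
            byte[] iv = randomBytes(16);
            return new Edek(iv, aesCtr(Cipher.ENCRYPT_MODE, zoneKey, iv, dek));
        }

        /** Decrypts an EDEK back into the DEK for an authorized client. */
        public byte[] decryptEdek(Edek edek) {
            return aesCtr(Cipher.DECRYPT_MODE, zoneKey, edek.iv, edek.encryptedKey);
        }
    }

    static byte[] randomBytes(int n) {
        byte[] b = new byte[n];
        RANDOM.nextBytes(b);
        return b;
    }

    /** AES-CTR over the whole input; checked exceptions are wrapped. */
    public static byte[] aesCtr(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return c.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyKms kms = new ToyKms();
        Edek edek = kms.generateEdek();           // stored with the file
        byte[] fileIv = randomBytes(16);
        // Write path: client obtains the DEK and encrypts the data.
        byte[] dek = kms.decryptEdek(edek);
        byte[] cipherText = aesCtr(Cipher.ENCRYPT_MODE, dek, fileIv,
            "helloworld".getBytes(StandardCharsets.UTF_8));
        // Read path: client obtains the DEK again and decrypts.
        byte[] plain = aesCtr(Cipher.DECRYPT_MODE, kms.decryptEdek(edek),
            fileIv, cipherText);
        System.out.println(new String(plain, StandardCharsets.UTF_8));
    }
}
```

In this sketch the storage side would only ever see the Edek and the ciphertext, which mirrors the claim that HDFS never handles unencrypted data or unencrypted keys.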
Implementation details
- Pread (positioned read) is supported.
- With AES-CTR, the original file and the cipher file have the same length and correspond byte for byte (1:1).
- AES-NI on Intel platforms is used to improve encryption performance (20x speedup).
- We define encryption zones; files in a zone are transparently encrypted and decrypted.
- We use two layers of keys: the encryption zone key (EZK) and the data encryption key (DEK), which is encrypted by the EZK. Each file has a different DEK.
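Why AES-CTR makes positioned reads cheap can be shown with the standard javax.crypto API. In CTR mode the keystream for byte offset pos depends only on the key and on the counter value IV + pos / 16, so a reader can start decrypting in the middle of a file. The sketch below is an illustration under assumptions: the class CtrPreadSketch and its methods are hypothetical names, the data is held in a byte array rather than read from HDFS, and the counter is treated as a 128-bit big-endian integer.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: length-preserving AES-CTR with random-access decrypt.
class CtrPreadSketch {
    private static final int BLOCK = 16; // AES block size in bytes

    /** Encrypts the whole input; output length equals input length. */
    public static byte[] encrypt(byte[] key, byte[] iv, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plaintext);
    }

    /** Decrypts len bytes starting at byte offset pos, without touching earlier bytes. */
    public static byte[] decryptRange(byte[] key, byte[] iv, byte[] ciphertext,
                                      int pos, int len) {
        int skip = pos % BLOCK; // offset inside the first block
        // Pad the front so the range lines up with the block's keystream.
        byte[] padded = new byte[skip + len];
        System.arraycopy(ciphertext, pos, padded, skip, len);
        byte[] out = run(Cipher.DECRYPT_MODE, key, counterAt(iv, pos / BLOCK), padded);
        return Arrays.copyOfRange(out, skip, skip + len);
    }

    /** Returns iv + blockIndex, treating the IV as a 128-bit big-endian counter. */
    static byte[] counterAt(byte[] iv, long blockIndex) {
        byte[] counter = iv.clone();
        long carry = 0;
        for (int i = counter.length - 1; i >= 0; i--) {
            long sum = (counter[i] & 0xffL) + (blockIndex & 0xffL) + carry;
            counter[i] = (byte) sum;
            carry = sum >>> 8;
            blockIndex >>>= 8;
        }
        return counter;
    }

    private static byte[] run(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return c.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        byte[] plain = new byte[100];
        for (int i = 0; i < plain.length; i++) {
            plain[i] = (byte) i;
        }
        byte[] cipherText = encrypt(key, iv, plain);
        byte[] part = decryptRange(key, iv, cipherText, 37, 20);
        System.out.println(cipherText.length == plain.length);
        System.out.println(Arrays.equals(part, Arrays.copyOfRange(plain, 37, 57)));
    }
}
```

The all-zero key and IV in main are for demonstration only. Because no padding is added, offsets in the cipher file map directly to offsets in the original file, which is what allows a positioned read to decrypt just the requested range.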
Encryption/Decryption for HDFS Blocks
User Ops
- Create Key
    hadoop key create <keyname> [-cipher <cipher>] [-size <size>] [-description <description>] [-attr <attribute=value>] [-provider <provider>]
- Roll Key
    hadoop key roll <keyname> [-provider <provider>]
- Delete Key
    hadoop key delete <keyname> [-provider <provider>]
- List Keys
    hadoop key list [-provider <provider>] [-metadata]
Admin Ops
- Create Encryption Zone
    hdfs crypto -createzone -keyname <keyname> -path <path>
- List Encryption Zones
    hdfs crypto -listzones
Usage Example
- As a normal user, create a new encryption key:
    $ hadoop key create mykey
- As the super user, create a new empty directory and make it an encryption zone:
    $ sudo -u hdfs hadoop fs -mkdir /zone
    $ sudo -u hdfs hdfs crypto -createzone -keyname mykey -path /zone
- Change its ownership to the normal user:
    $ sudo -u hdfs hadoop fs -chown myuser:myuser /zone
- As the normal user, put a file in and read it back out:
    $ hadoop fs -put helloworld /zone
    $ hadoop fs -cat /zone/helloworld
Release
- fs-encryption branch merged to trunk and branch-2
- Feature released in hadoop-2.6.0
Performance
- TestDFSIO benchmark with AES-NI enabled
Call for Collaborations
- Close collaborations with local ecosystems
  - Intel Big Data engineering teams, industry partners and academic research
- Building next generations of Big Data technologies
  - Real-time, in-memory, complex analytics, etc.
- Bridging advanced research and real-world applications
  - Highly impactful through open source, university research (e.g., UC Berkeley) and industry adoption (e.g., Alibaba, Cloudera, etc.)
Q & A Thanks!