Hadoop Applications on High Performance Computing Devaraj Kavali devaraj@apache.org
About Me Apache Hadoop Committer Yarn/MapReduce Contributor Senior Software Engineer @Intel Corporation 2
Agenda Objectives HDFS Applications with HPC File Systems YARN Application MapReduce Job HPC Schedulers YARN Protocols Log Aggregation Shuffle Implementation Q&A 3
Objectives Use an existing HPC cluster for running Hadoop applications Use any of the HPC file systems, like Lustre, PVFS, IBRIX Fusion, etc. Use any of the HPC schedulers, like Slurm, Moab, PBS Pro, etc. Combine Hadoop workloads with HPC workloads No code changes to existing Hadoop (HDFS/YARN/MR) applications Minimal Hadoop configuration changes 4
HDFS Applications Using HPC File Systems An adapter bridges HDFS applications to the HPC file system: public class <HPC>Adapter extends AbstractFileSystem { ... } (HDFS Application → <HPC>Adapter → HPC FS) 5
Hadoop Configurations for File System
<property>
  <name>fs.defaultFS</name>
  <value>${hpc-uri}:///</value>
</property>
<property>
  <name>fs.AbstractFileSystem.${hpc-uri}.impl</name>
  <value>HPCFileSystemAdapter</value>
</property> 6
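As a concrete sketch, a deployment backed by Lustre might register its adapter under a `lustre` scheme. The scheme and the `LustreFileSystemAdapter` class name below are illustrative placeholders, not classes shipped with Hadoop:

```xml
<!-- Illustrative only: the "lustre" scheme and adapter class are hypothetical -->
<property>
  <name>fs.defaultFS</name>
  <value>lustre:///</value>
</property>
<property>
  <name>fs.AbstractFileSystem.lustre.impl</name>
  <value>com.example.fs.LustreFileSystemAdapter</value>
</property>
```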
YARN Application (architecture diagram): the Client talks to the Resource Manager; Node Managers host the App Master and the Containers. 7
YARN Application with HPC Scheduler (architecture diagram): the Client talks to the HPC Scheduler Master; HPC Scheduler Slaves host the App Master and the Containers. 8
YARN Application Submission with HPC Scheduler (flow): 1. YARN Client: submitApplication() 2. HPCApplicationClientProtocolImpl: submit() to the HPC scheduler 3. Run 4. Launch the Application Master 5. Allocate resources via HPCApplicationMasterProtocolImpl (AMRM Client) 6. Run Child Task 7. Launch the Yarn Child container via HPCContainerManagementProtocolImpl (NM Client) 8. Yarn Child reports progress 9
MapReduce Job with HPC Scheduler (flow): 1. Job Client: submit() 2. Job Submitter: submitJobInternal() 3. YarnClientProtocolProvider: submitJob() to the YARN Runner (LocalClientProtocolProvider/LocalJobRunner is the local alternative) 4. YARN Runner: submit() via ResourceMgrDelegate 5. YARN Client: submit() 6. HPCApplicationClientProtocolImpl: submit() 7. HPC Scheduler: run() (job history is recorded by the Job History Server) 10
Yarn Protocols Configurations RPC class Configuration
<property>
  <description>RPC class implementation</description>
  <name>yarn.ipc.rpc.class</name>
  <value>HadoopYarnHPCRPC</value>
</property> 11
Yarn Protocols Configurations
public class HadoopYarnHPCRPC extends HadoopYarnProtoRPC {
  @Override
  public Object getProxy(Class protocol, InetSocketAddress address,
      Configuration conf) {
    Object proxy;
    if (protocol == ApplicationClientProtocol.class) {
      proxy = new HPCApplicationClientProtocolImpl(conf);
    } else if (protocol == ApplicationMasterProtocol.class) {
      proxy = new HPCApplicationMasterProtocolImpl(conf);
    } else if (protocol == ContainerManagementProtocol.class) {
      proxy = new HPCContainerManagementProtocolImpl(conf);
    } else {
      proxy = super.getProxy(protocol, address, conf);
    }
    return proxy;
  }
} 12
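The essence of the override above is dispatch on the protocol class, with a fallback to the parent factory. A minimal, self-contained sketch of that pattern (all types here are illustrative stand-ins, not real Hadoop/YARN classes):

```java
import java.net.InetSocketAddress;

// Stand-ins for the YARN protocol interfaces.
interface ApplicationClientProtocol {}
interface ApplicationMasterProtocol {}

// Stand-in for HadoopYarnProtoRPC: the default proxy factory.
class DefaultRpc {
  public Object getProxy(Class<?> protocol, InetSocketAddress address) {
    return "default-proxy:" + protocol.getSimpleName();
  }
}

// Stand-in for HadoopYarnHPCRPC: intercepts known protocols, delegates
// everything else to the default factory.
class HpcRpc extends DefaultRpc {
  @Override
  public Object getProxy(Class<?> protocol, InetSocketAddress address) {
    if (protocol == ApplicationClientProtocol.class) {
      return "hpc-client-proxy";
    }
    return super.getProxy(protocol, address);
  }
}

public class RpcDispatchDemo {
  public static void main(String[] args) {
    HpcRpc rpc = new HpcRpc();
    System.out.println(rpc.getProxy(ApplicationClientProtocol.class, null));
    System.out.println(rpc.getProxy(ApplicationMasterProtocol.class, null));
  }
}
```

Because only `yarn.ipc.rpc.class` changes, unmodified YARN applications transparently pick up the HPC-backed protocol implementations.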
Application Client Protocol YARN Application Client Protocol Flow: YARN Application Client → RMClient → RM ApplicationClientProtocol proxy → Resource Manager. Calls: 1. getNewApplication() 2. submitApplication() 3. forceKillApplication() 4. getClusterMetrics() 5. getClusterNodes() 6. getQueueInfo() ... 13
Application Client Protocol YARN Application Client HPC Scheduler Flow: YARN Application Client → RMClient → HPCApplicationClientProtocolImpl → HPC Scheduler. Commands: 1. Allocate Resources cmd 2. Submit Job (Batch) cmd 3. Cancel Job cmd 4. Cluster Info cmd 5. Get Jobs Report cmd ... 14
Application Client Protocol APIs for interaction 1. getNewApplication() The interface used by clients to obtain a new ApplicationId for submitting new applications. 2. submitApplication() The interface used by clients to submit a new application to the ResourceManager. 3. forceKillApplication() The interface used by clients to request the ResourceManager to abort a submitted application. 4. getClusterMetrics() 5. getClusterNodes() 6. getQueueInfo() 15
Application Master Protocol YARN Application Master Flow Diagram: Application Master → RMClient → RM ApplicationMasterService protocol proxy → Resource Manager. Calls: 1. registerApplicationMaster() 2. allocate() 3. finishApplicationMaster() 16
Application Master Protocol HPC Scheduler Application Master Flow Diagram: Application Master → NMClient → HPCApplicationMasterProtocolImpl → HPC Scheduler. Commands: 1. Cluster Info cmd 2. Resource Allocation cmd 3. Finish Tasks cmds 17
Application Master Protocol APIs for interaction 1. registerApplicationMaster() The interface used by a new ApplicationMaster to register with the ResourceManager. 2. allocate() The main interface between an ApplicationMaster and the ResourceManager. 3. finishApplicationMaster() The interface used by an ApplicationMaster to notify the ResourceManager about its completion (success or failure). 18
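To make the mapping concrete: an HPC-backed implementation of allocate() has to translate a container request into a batch-scheduler resource command. A self-contained sketch of that translation, assuming Slurm-style salloc flags (the command syntax is an assumption for illustration, not the actual implementation):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: translate an allocate()-style container request into a
// batch-scheduler command line. The Slurm-like "salloc" syntax is an
// illustrative assumption.
public class AllocateMapping {
  // Build (but do not execute) a resource-allocation command for the
  // requested number of containers, memory per container (MB), and vcores.
  static List<String> toSchedulerCommand(int containers, int memMb, int vcores) {
    return Arrays.asList(
        "salloc",
        "-n", Integer.toString(containers),  // one scheduler task per container
        "--mem-per-cpu=" + memMb,            // memory request per CPU (MB)
        "-c", Integer.toString(vcores));     // CPUs per task
  }

  public static void main(String[] args) {
    System.out.println(toSchedulerCommand(4, 2048, 2));
  }
}
```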
Container Management Protocol YARN Container Management Flow: Application Master → NMClient → NM ContainerManagementProtocol proxy → Node Manager. Calls: 1. startContainers() 2. stopContainers() 3. getContainerStatuses() 19
Container Management Protocol HPC Scheduler Task Management Flow: Application Master → NMClient → HPCContainerManagementProtocolImpl → HPC Scheduler. Commands: 1. Start Containers cmd 2. Stop Containers cmd 3. Get Container Statuses cmd 20
Container Management Protocol APIs for interaction 1. startContainers() The ApplicationMaster provides a list of StartContainerRequests to a NodeManager to start the Containers allocated to it. 2. stopContainers() The ApplicationMaster requests a NodeManager to stop a list of Containers allocated to it. 3. getContainerStatuses() The API used by the ApplicationMaster to request the current statuses of Containers from the NodeManager. 21
Yarn Log Aggregation
Log Aggregation by Node Manager
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
Log Aggregation with HPC Scheduler: issue an HPC scheduler command to execute on all nodes where application tasks ran, as part of ApplicationMasterProtocol.finishApplicationMaster(), to aggregate the application logs. 22
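The step above amounts to fanning out one copy command per node at application finish. A self-contained sketch of building such a command — the pdsh fan-out and the directory layout are illustrative assumptions, not the actual implementation:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: the command an HPC-scheduler-based finishApplicationMaster()
// might run on every node to aggregate a finished application's logs
// onto the shared parallel file system.
public class LogAggregationCmd {
  // Build (but do not execute) a command that copies the application's
  // local log directory to a per-host folder on the shared file system.
  static List<String> buildCommand(String nodeList, String appId,
                                   String localLogDir, String sharedLogDir) {
    String remote = String.format("cp -r %s/%s %s/%s.$(hostname)",
        localLogDir, appId, sharedLogDir, appId);
    return Arrays.asList("pdsh", "-w", nodeList, remote);
  }

  public static void main(String[] args) {
    System.out.println(buildCommand("node[01-04]", "application_1_0001",
        "/tmp/yarn-logs", "/lustre/yarn-logs"));
  }
}
```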
Shuffle Handling Hadoop (flow diagram): 1. MRAppMaster assigns the Map Task 2. Map Task writes its final map output to the Node Manager's local dirs 3. Task completed 4. The Reduce Task's EventFetcher gets map completion events 5. Map completed 6. The Shuffle Consumer reads the map output from the Node Manager 23
Shuffle Handling HPC File Systems (flow diagram): 1. MRAppMaster assigns the Map Task 2. Map Task writes its final map output to the parallel file system 3. Task completed 4. The Reduce Task's EventFetcher gets map completion events 5. Map completed 6. The Shuffle Consumer reads the map output directly from the parallel file system 24
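With map output on a shared parallel file system, a reducer can locate a mapper's output by path instead of fetching it over HTTP from the NodeManager's ShuffleHandler. A minimal sketch, with a purely illustrative directory layout (the real layout is implementation-defined):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: resolve a mapper's final output for a given reducer directly
// on the shared file system. Directory layout is a hypothetical example.
public class ShufflePath {
  static Path mapOutputPath(String shuffleDir, String jobId,
                            int mapId, int reduceId) {
    return Paths.get(shuffleDir, jobId,
        String.format("map_%05d", mapId),        // per-map output directory
        String.format("file.out.%d", reduceId)); // partition for this reducer
  }

  public static void main(String[] args) {
    System.out.println(mapOutputPath("/lustre/shuffle", "job_1_0001", 3, 7));
  }
}
```

This removes the NodeManager from the shuffle path entirely, which is what makes shuffle possible when tasks run under an HPC scheduler instead of NodeManagers.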
Shuffle Handling Shuffle Handler
<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.MapTask$MapOutputBuffer</value>
  <description>
    The MapOutputCollector implementation(s) to use. This may be a
    comma-separated list of class names, in which case the map task will
    try to initialize each of the collectors in turn. The first to
    successfully initialize will be used.
  </description>
</property> 25
Shuffle Handling Shuffle Consumer
<property>
  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
  <value>org.apache.hadoop.mapreduce.task.reduce.Shuffle</value>
  <description>
    Name of the class whose instance will be used to send shuffle
    requests by reduce tasks of this job. The class must be an instance
    of org.apache.hadoop.mapred.ShuffleConsumerPlugin.
  </description>
</property> 26
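To shuffle through the parallel file system, both pluggable sides could be swapped for HPC-aware implementations. The class names below are hypothetical placeholders for such plugins, not part of stock Hadoop:

```xml
<!-- Hypothetical plugin classes, shown only to illustrate the plugin points -->
<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>com.example.shuffle.FileSystemMapOutputCollector</value>
</property>
<property>
  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
  <value>com.example.shuffle.FileSystemShuffleConsumer</value>
</property>
```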
Summary HDFS configuration for a new File System HPC Schedulers YARN Protocols M/R Shuffle Implementation YARN Log Aggregation 27
Q & A 28
Thank You devaraj@apache.org 29
Notices and Disclaimers Copyright 2014 Intel Corporation. Intel, the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. See Trademarks on intel.com for full list of Intel trademarks. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. Intel technologies may require enabled hardware, specific software, or services activation. Check with your system manufacturer or retailer. No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting from such losses. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. 
You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. 30