Pilot-Streaming: Design Considerations for a Stream Processing Framework for High-Performance Computing

1 Pilot-Streaming: Design Considerations for a Stream Processing Framework for High-Performance Computing
Andre Luckow, Peter M. Kasson, Shantenu Jha
STREAMING 2016, 03/23/2016
RADICAL, Rutgers

2 Motivation
There is a need to couple data sources, HPC, and analytics; 20+ applications were identified at STREAM16.
Challenges:
- Data applications and pipelines are complex
- Scalability and elasticity: dynamic changes in resource demands
- Scheduling and provisioning of resources: the right amount of resources at the right time
- Programming models: HPC (MPI, OpenMP, GPU) vs. Big Data (Java, Python, R)
- Interoperability: data sources and sinks are often in different environments (IoT, cloud, HPC, HPDC) than compute
Current state:
- Streaming (in the sciences) is often implemented at the application level (with limited re-use)
- Manifold landscape of streaming tools (Apache open source tools, cloud tools)

3 Workload Characteristics
[Figure labels: HPC Resource (Simulation, Analysis); HPC Resource 1 and HPC Resource 2 (Simulation, Analysis)]

4 Workload Characteristics
[Figure labels: HPC Resource 1 (Simulation); Message Broker; HPC Resource 2 and HPC Resource 3 (Analysis 1, Analysis 2)]
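The workload on this slide, a simulation feeding two analyses through a message broker, can be illustrated with a small in-memory sketch. It is not taken from the slides: `ToyBroker`, the `frames` topic, and the `energy` field are hypothetical names, and the class is only a stand-in for a real broker such as Kafka. The point it shows is that the producer never references its consumers, and each consumer tracks its own read position.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a message broker (hypothetical, not Kafka)."""

    def __init__(self):
        self._log = defaultdict(list)     # topic -> append-only message log
        self._offsets = defaultdict(int)  # (topic, consumer) -> next index to read

    def publish(self, topic, message):
        self._log[topic].append(message)

    def poll(self, topic, consumer):
        """Return messages this consumer has not seen yet and advance its offset."""
        start = self._offsets[(topic, consumer)]
        messages = self._log[topic][start:]
        self._offsets[(topic, consumer)] = len(self._log[topic])
        return messages


def simulation(broker, steps):
    # The producer only knows the topic name, not who reads from it.
    for step in range(steps):
        broker.publish("frames", {"step": step, "energy": step * 0.5})


def analysis_mean(broker):
    frames = broker.poll("frames", "analysis-1")
    return sum(f["energy"] for f in frames) / len(frames)


def analysis_max(broker):
    frames = broker.poll("frames", "analysis-2")
    return max(f["energy"] for f in frames)


broker = ToyBroker()
simulation(broker, 4)
mean_energy = analysis_mean(broker)
max_energy = analysis_max(broker)
```

Because both analyses read the same log at independent offsets, either one could run later, or on a different resource, without any change to the simulation.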

5 Introduction Pilot Abstraction
[Figure labels:
- User Space: User Application, Pilot-Job System, Pilot-Job, Pilot-Job
- System Space: Policies, Resource Manager, Resource A, Resource B, Resource C, Resource D]
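A minimal sketch of the idea behind this slide, assuming the usual reading of the pilot abstraction: a pilot is a placeholder job that holds resources obtained from the resource manager, and the application then schedules its own tasks onto those resources. The `Pilot` class and `schedule` function are hypothetical and do not reproduce any actual Pilot-Job API.

```python
from dataclasses import dataclass, field


@dataclass
class Pilot:
    """Placeholder job holding cores obtained from a resource manager (toy model)."""
    resource: str
    cores: int
    running: list = field(default_factory=list)  # (task name, cores) pairs

    def free_cores(self):
        return self.cores - sum(task_cores for _, task_cores in self.running)


def schedule(pilots, tasks):
    """Application-level scheduling: the first pilot with enough free cores wins."""
    placement, waiting = {}, []
    for name, cores in tasks:
        target = next((p for p in pilots if p.free_cores() >= cores), None)
        if target is None:
            waiting.append(name)  # no capacity left; the task stays queued
        else:
            target.running.append((name, cores))
            placement[name] = target.resource
    return placement, waiting


pilots = [Pilot("Resource A", cores=4), Pilot("Resource B", cores=2)]
tasks = [("simulation", 4), ("analysis-1", 2), ("analysis-2", 2)]
placement, waiting = schedule(pilots, tasks)
```

In this toy run the simulation fills Resource A, the first analysis fits on Resource B, and the second analysis waits. The tasks never go through the system-level resource manager themselves; only the pilots do.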

6 The Convergence of HPC and Data Intensive Computing
[Figure: layered comparison of the High-Performance Computing stack and the Apache Hadoop Big Data stack. Layers: Applications; Data Processing, Analytics, Orchestration; Higher-Level Runtime Environment; Communication; Resource Management; Resource Fabric.]
High-Performance Computing:
- Applications
- MPI Frameworks for Advanced Analytics & Machine Learning (BLAS, ScaLAPACK, CompLearn, PETSc, BLAST)
- Orchestration (Pegasus, Taverna, Dryad, Swift)
- Advanced Analytics & Machine Learning (Pilot-KMeans, Replica Exchange)
- MapReduce Frameworks (Pilot-MapReduce)
- Declarative Languages (Swift)
- Workload Management (Pilots, Condor)
- MPI, RDMA
- Cluster Resource Manager (Slurm, Torque, SGE)
- Compute Resources (Nodes, Cores, VMs)
- Data Access (Virtual Filesystem, GridFTP, SSH)
- Storage Management (iRODS, SRM, GFFS)
- Storage Resources (Lustre, GPFS)
Apache Hadoop Big Data:
- Applications
- Advanced Analytics & Machine Learning (Mahout, R, MLBase)
- SQL-Engines (Impala, Hive, Shark, Phoenix)
- Data Store & Processing (HBase)
- Scheduler
- Orchestration (Oozie, Pig)
- In-Memory (Spark), Spark Scheduler
- MapReduce, Map Reduce Scheduler
- Twister MapReduce, Twister Scheduler
- Higher-Level Workload Management (TEZ, LLama)
- Hadoop Shuffle/Reduction, HARP Collectives
- Cluster Resource Manager (YARN, Mesos)
- Compute and Data Resources (Nodes, Cores, HDFS)
Source: A Tale of Two Data-Intensive Paradigms: Data Intensive Applications, Abstractions and Architectures. In collaboration with Geoffrey Fox (Indiana),

7 Pilot-Abstraction for HPC and Hadoop Interoperability
[Figure labels:
- Application: Map Reduce, Spark-App, Other YARN App, Hadoop/Spark App, HPC App (e.g. MPI)
- Application-level Scheduling: Pilot-Job, YARN, Spark, Hadoop, Application Scheduler (e.g. Spark, Tez, LLama)
- System-level Scheduling: HPC Scheduler (Slurm, Torque, SGE), YARN/HDFS
- Modes: Mode I: Hadoop on HPC; Mode II: HPC on Hadoop]

8 Streaming and Batch Computing
[Figure labels: Data; Message Broker (Broker, Broker, Broker); Streaming Framework; Stream Processing; Compute (e.g. YARN, SLURM, Torque, PBS); ETL; Hadoop; SQL; Machine Learning; Other; Storage; Storage and Format (e.g. Lustre, HDFS, ): Raw Text, HDF5, Columnar, Mutable/Random Access]
Questions:
- How to manage batch and streaming frameworks side-by-side?
- How to enable interoperability between different programming systems/models/middleware/schedulers?
- How to enable elasticity?

9 Pilot-Streaming
[Figure: Pilot-Streaming architecture, spanning user space and infrastructure. Labels:
- User-Space: Distributed Application, Pilot API (Pilot Compute, Pilot Data), SAGA
- HPC: Local/Parallel FS (SSH/GO); nodes with Pilot Agent (SSH)
- HTC (OSG/EGI): GFFS, Local (iRODS), SRM (iRODS); nodes with Pilot Agent (SSH)
- Cloud: EC2 VM nodes with Pilot Agent (SSH); Local / EBS (SSH), S3 (HTTP)
- Hadoop: YARN nodes with Pilot Agent (SSH); HDFS (WebHDFS)
- Other labels: Cloud, YARN, SSH, iRODS, Globus Online, HDFS, Kafka]

10 Conclusion
1. Pilot-Jobs enable the co-location of HPC/simulations and Big Data tools (Hadoop, Spark, higher-level tools)
2. Pilot-Streaming will support message brokers as data sources/sinks, which enables the de-coupling of applications
3. Dynamic resource management provided by the Pilot-Abstraction is critical for streaming environments
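To make the third point concrete, here is a minimal sketch of one possible elasticity rule: size the pool of pilots so that the current broker backlog drains within a target time. The rule, the function name `pilots_needed`, and all parameters are illustrative assumptions, not something the slides specify.

```python
import math


def pilots_needed(backlog, rate_per_pilot, target_seconds, min_pilots=1, max_pilots=8):
    """Number of pilots needed to drain `backlog` messages within `target_seconds`.

    Hypothetical rule: each pilot processes `rate_per_pilot` messages per second,
    and the result is clamped to the range [min_pilots, max_pilots].
    """
    required = math.ceil(backlog / (rate_per_pilot * target_seconds))
    return max(min_pilots, min(max_pilots, required))


steady = pilots_needed(backlog=1000, rate_per_pilot=50, target_seconds=5)
idle = pilots_needed(backlog=0, rate_per_pilot=50, target_seconds=5)
burst = pilots_needed(backlog=10000, rate_per_pilot=50, target_seconds=5)
```

A controller could evaluate such a rule periodically and start or cancel pilots to match the result; the upper bound stands in for an allocation limit on the target resource.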

11 Thank you!
