OpenPOWER Software Stack with Big Data Example March 2014

Similar documents
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Introducing the IBM Software Development Kit for PowerLinux

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

Data Center and Cloud Computing Market Landscape and Challenges

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Data Centric Systems (DCS)

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Infrastructure Matters: POWER8 vs. Xeon x86

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

Multi-core Programming System Overview

Implement Hadoop jobs to extract business value from large and varied data sets

Applied Micro development platform. ZT Systems (ST based) HP Redstone platform. Mitac Dell Copper platform. ARM in Servers

The Virtualization Practice

HPC Wales Skills Academy Course Catalogue 2015

Development With ARM DS-5. Mervyn Liu FAE Aug. 2015

Linux Performance Optimizations for Big Data Environments

High Performance or Cycle Accuracy?

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Le langage OCaml et la programmation des GPU

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

Introduction to GPU Programming Languages

Android Development: a System Perspective. Javier Orensanz

The red hat enterprise linux developer program

Take full advantage of IBM s IDEs for end- to- end mobile development

Building Applications Using Micro Focus COBOL

Eddy Integrated Development Environment, LemonIDE for Embedded Software System Development

NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS

The Top Six Advantages of CUDA-Ready Clusters. Ian Lumb Bright Evangelist

Multi-/Many-core Modeling at Freescale

Chapter 2 System Structures

High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach

Manjrasoft Market Oriented Cloud Computing Platform

Parallel Computing for Data Science

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

Enhanced Project Management for Embedded C/C++ Programming using Software Components

Bringing Big Data Modelling into the Hands of Domain Experts

IBM Platform Computing : infrastructure management for HPC solutions on OpenPOWER Jing Li, Software Development Manager IBM

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

An Easier Way for Cross-Platform Data Acquisition Application Development

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE

U N C L A S S I F I E D

Effective Java Programming. efficient software development

Optimizing Application Performance with CUDA Profiling Tools

ODROID Multithreading in Android

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams

GPU Tools Sandra Wienke

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

OpenACC 2.0 and the PGI Accelerator Compilers

Manjrasoft Market Oriented Cloud Computing Platform

COM 444 Cloud Computing

Enhancing Cloud-based Servers by GPU/CPU Virtualization Management

BSC vision on Big Data and extreme scale computing

Open Source for Cloud Infrastructure

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING

I/O Considerations in Big Data Analytics

PHP vs. Java. In this paper, I am not discussing following two issues since each is currently hotly debated in various communities:

GETTING STARTED WITH ANDROID DEVELOPMENT FOR EMBEDDED SYSTEMS

Reminders. Lab opens from today. Many students want to use the extra I/O pins on

The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

HIGH PERFORMANCE BIG DATA ANALYTICS

File S1: Supplementary Information of CloudDOE

Data Centers and Cloud Computing

ORACLE MOBILE APPLICATION FRAMEWORK DATA SHEET

ITG Software Engineering

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

Characteristics of Java (Optional) Y. Daniel Liang Supplement for Introduction to Java Programming

Amazon EC2 Product Details Page 1 of 5

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

Virtual Machines.

Integrating TAU With Eclipse: A Performance Analysis System in an Integrated Development Environment

IBM i25 Trends & Directions

Parallel Computing with MATLAB

System Structures. Services Interface Structure

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com

Big Data Analytics - Accelerated. stream-horizon.com

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Packet-based Network Traffic Monitoring and Analysis with GPUs

A survey on platforms for big data analytics

Introduction to Cloud Computing

Enabling Technologies for Distributed and Cloud Computing

Example of Standard API

Chapter 3 Operating-System Structures

Lecture 1: an introduction to CUDA

<Insert Picture Here> Java, the language for the future

GPU Parallel Computing Architecture and CUDA Programming Model

Best of breed Multi Architecture Cloud

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Distributed Computing and Big Data: Hadoop and MapReduce

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Transcription:

OpenPOWER Software Stack with Big Data Example March 2014

Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise, investment, and server-class intellectual property to serve the evolving needs of customers. Opening the architecture to give the industry the ability to innovate across the full Hardware and Software stack Simplify system design with alternative architecture Includes SOC design, Bus Specifications, Reference Designs, FW OS and Hypervisor Open Source Little Endian Linux to ease the migration of software to POWER Driving an expansion of enterprise class Hardware and Software stack for the data center Building a complete ecosystem to provide customers with the flexibility to build servers best suited to the Power architecture

Giving ecosystem partners a license to innovate OpenPOWER will enable data centers to rethink their approach to technology. Member companies may use POWER for custom open servers and components for Linux based cloud data centers. OpenPOWER ecosystem partners can optimize the interactions of server building blocks microprocessors, networking, I/O & other components to tune performance. How will the OpenPOWER Foundation benefit clients? OpenPOWER technology creates greater choice for customers Open and collaborative development model on the Power platform will create more opportunity for innovation New innovators will broaden the capability and value of the Power platform What does this mean to the industry? Game changer on the competitive landscape of the server industry Will enable and drive innovation in the industry Provide more choice in the industry Platinum Members

PCIe Proposed Ecosystem Enablement Power Open Source Software Stack Components Software Cloud Software Standard Operating Environment (System Mgmt) Operating System / KVM Firmware XCAT OpenPOWER Firmware Existing Open Source Software Communitie s New OSS Community System Operating Environment Software Stack A modern development environment is emerging based on this type of tools and services Hardware OpenPOWER Technology Multiple Options to Design with POWER Technology Within OpenPOWER POWER8 Framework to Integrate System IP on Chip Industry IP License Model Hardware CAPP CAPI over PCIe Standard POWER Products 2014 Customizable Custom POWER SoC Future

Software Stack

OpenPOWER Enables Hybrid Cloud OpenPOWER SW Stack Commonality Linux to provide commonality for: Operating system management Operating system features Application programming model Little Endian to provide source code and data commonality KVM to provide virtualization management and feature commonality Firmware interfaces to provide platform management commonality Migrating Software to OpenPOWER Software written in interpreted languages (Javascript, PHP, Perl, Python, Ruby, Java, etc.) Generally, no work is required. Software written in compiled languages (C/C++, Fortran, etc.) Generally, this requires just a simple recompile for POWER. Rarely, dependencies on specific behaviors can require source code modification: Multi-threaded applications that don t use standard synchronization models and depend upon specific memory ordering behavior (unusual) Applications that depend upon specific memory page sizes (rare)

Development for PowerLinux 7 x86 Platform Architect & Develop Eclipse-based PowerLinux SDK Advanced toolchain Gnu tools, LLVM, Gold linker IBM JVM, OpenJDK Edit, Compile, & Debug SDK Source Code Advisor SDK integration of oprofile, Helgrind, Valgrind, FDPR, CPI breakdown, etc. Profile & Tune SDK support for dynamic languages SDK Migration Advisor Power-Linux support for SOE stacks Open source development infrastructure support Migrate / Publish SDK remote deployment control SDK Power emulator for x86 HW Access (Local/ Remote) Profile & Tune Dev/Test Cloud Eval Boards POWER System HW

LE Toolchain & Development Ecosystem Build Tools GCC (+binutils+glibc) toolchain (AT native and cross) config.guess, config.sub, M4, automake, autoconf, libtool, pkg-config, Gcov, rpm, Perf, SystemTap Profiling and Tracing Tools Oprofile (operf, ocount, perf-events), Valgrind (memcheck, massiff, cachegrind, hellgrind, itrace) PowerLinux Multi-core Runtime TBB, CBB, userrcu, SPHDE, tcmalloc, Boost,... IBM Tools (binary only) FDPR (bprober) Pthread_mon (sync_mon) Mambo (simulator) Eclipse SDK Tools PPC64LE JVM (prereq - binary only) PPC64LE Eclipse IES: Native code + CDT, PTP, LTP,... PPC64LE Eclipse PowerLinux plugins: MA, SCA, CPI, TA,...

CUDA / GPU / POWER Technology Kit Technology Kit Available in 3Q2014 CUDA toolkit and runtime for POWER Linux for POWER NVIDIA Tesla GPUs POWER platform AIX Kernel Contact Richard Talbot, IBM Email rdtalbot <at> us.ibm.com

Java GPU Acceleration

Java GPU Acceleration on OpenPOWER IBM and NVIDIA partnering to bring GPU acceleration to mainstream Java workloads and enterprise environments on OpenPOWER Evolve the Java support to enable both transparent Java acceleration and Java programmer enablement to program explicitly for GPUs from Java Key compute-intensive algorithms in Java standard libraries and domain-specific libraries to be accelerated with GPU First step was to build a GPU control and runtime framework in Java Allow Java programmers to do data transfers and launch GPU kernels Gain experience in APIs and GPU programming models applied to Java Demo! 11 OpenPOWER Foundation 2014

Java GPU Framework Overview Exploring Java equivalents of CUDA concepts Reuse concepts defined by CUDA in Java Improve usability with familiar Java idioms Key mappings from Java to CUDA CudaDevice a CUDA-capable device CudaBuffer a region of device memory CudaException for when something goes wrong CudaModule code loaded onto a device CudaFunction a kernel entry point CudaKernel for launching a device function CudaEvent for timing or inter-stream synchronization CudaStream a CUDA stream Key Benefits Java write once, run anywhere extended to include GPU code Using exceptions removes the tedium of error checking and avoids problems of omissions Provides a foundation for building more general class libraries and reusing existing GPU code 12 OpenPOWER Foundation 2014

Java GPU Framework Device Management // how many devices do we have? int devicecount = CudaDevice.getCount(); // create a handle for the first device CudaDevice device = new CudaDevice(0); // how many multiprocessors does the device have? int smcount = device.getattribute( CudaDevice.ATTRIBUTE_MULTIPROCESSOR_COUNT); // add a Runnable callback (to the default stream) device.addcallback(action); // wait for a device to complete queued work device.synchronize(); Prototype API shown here is subject to change or withdrawal without notice, and represents goals and objectives only. 13 OpenPOWER Foundation 2014

Java GPU Framework Memory Management // allocate memory on a device CudaBuffer buffer = new CudaBuffer(device, bytecount); // copy data from Java array to device memory buffer.copyfrom(array, startindex, endindex); // copy data to Java array from device memory buffer.copyto(array, startindex, endindex); // clear device memory buffer.fillbyte(0, bytecount); // release device memory buffer.close(); 14 OpenPOWER Foundation 2014

Java GPU Framework Launching Kernels // load a module onto a device CudaModule module = new CudaModule(device, content); // locate a kernel function CudaKernel kernel = new CudaKernel(module, kernelname); // define grid size CudaGrid grid = new CudaGrid(gridDim, blockdim); // build bundle of launch arguments CudaParameters args = new CudaParameters(2); args.set(0, buffer); args.set(1, n); // launch kernel kernel.launch(grid, args); 15 OpenPOWER Foundation 2014

Java GPU Framework Events // create a new event CudaEvent event = new CudaEvent(); // queue an event device.record(event); stream.record(event); // wait until an event has occurred event.synchronize(); // check whether an event has occurred int errorcode = event.query(); // query the time between events float millis = event.elapsedtimesince(priorevent); // destroy an event event.close(); 16 OpenPOWER Foundation 2014

Java GPU Framework Streams // create a new stream CudaStream stream = new CudaStream(device); // make stream wait for an event stream.waitfor(event); // wait until all items have been processed stream.synchronize(); // check whether all items have been processed int errorcode = stream.query(); // destroy a stream stream.close(); 17 OpenPOWER Foundation 2014

Automatic Resource Management Relevant classes implement AutoCloseable to enable automatic release via Java try-with-resources statement Manual memory management with fewer errors int[] data =... try (CudaBuffer buffer = new CudaBuffer(device, data.length * 4)) { // copy from Java to device buffer.copyfrom(data, 0, data.length); // launch kernels, etc. // copy from device to Java buffer.copyto(data, 0, data.length); } // buffer.close() called automatically 18 OpenPOWER Foundation 2014

GTC Demo Motivation Large global retailers collect petabytes of data from customer transactions They want to identify customers with similar behavior, so they can create customized products and more effective marketing programs K-Means is a well-known algorithm for performing clustering analysis that can identify these segments Businesses that react quickly to changes in market segments will have an advantage: we want a faster implementation Apache Mahout is an open-source Java implementation which is applicable to Big Data because it is built on top of Apache Hadoop Can we do better using GPUs? 19 OpenPOWER Foundation 2014

GTC Demo K-Means Algorithm Overview Input consists of: D the number dimensions in each data point K the number of clusters sought a set of N sample data points a set of K points which are approximate representatives of the clusters The output is an improved set of cluster representatives The process iterates until the set of representatives stabilizes 20 OpenPOWER Foundation 2014

GTC Demo K-Means Map-Reduce Structure Each point is logically mapped to a matrix with K rows and 2D+1 columns (the table below assumes D=2) The non-zero row corresponds to the closest input cluster center Count X Y X^2 Y^2 0 0 0 0 0 0 0 0 0 1 x y x^2 y^2 0 0 0 0 0 0 0 0 0 The reduction phase computes the simple sum of these matrices. From this, the output centers and radii can be derived 21 OpenPOWER Foundation 2014

GTC Demo K-Means Map-Reduce Structure (continued) The map computation can be parallelized (at a finer granularity than managed by Hadoop); a new mapper class was created Batches of points are transferred to a GPU for mapping A partial reduction is returned to the host A single Mahout class was modified only to allow a system property to select the (new) mapper class 22 OpenPOWER Foundation 2014

GTC Demo Performance Comparison NVIDIA K40 GPU Speed-up factor vs. modern CPU (higher is better) Baseline is standard Mahout implementation Comparison excludes Hadoop framework and I/O D=1 D=2 D=4 D=8 D=16 D=32 D=64 K=2 197 120 85 56 38 29 20 K=4 183 119 92 61 40 27 19 K=8 286 134 104 66 42 26 18 K=16 400 160 117 77 44 24 17 K=32 529 261 155 92 49 22 22 K=64 661 388 201 113 55 34 24 Measurements completed in a controlled environment. Results may vary by customer based on individual workload, configuration and software levels. 23 OpenPOWER Foundation 2014

GTC Demo Clustering of Big Data 24 OpenPOWER Foundation 2014

What s Next? Java 8 platform adding lambda support Concise syntax for writing small snippets of code used in bulk operations (kernels) Perfect for GPUs, parallel operation and high core count machines Standard class libraries will leverage GPUs where it makes sense Bulk operations (map, reduce) on Java collections New parallel operations defined as part of a standard JCP process Java platform needs to evolve for best performance optimal layout and control of memory needed for highest performance IBM PackedObjects, Oracle value types evolving for future Java platform definitions Demo in booth #921 in the Exhibit Hall Related talks: NVIDIA compiling Java kernels to PTX as a proof of concept see their talks S4830 CUDA 6 and Beyond S4939 Accelerating Java on GPUs 25 OpenPOWER Foundation 2014

Questions?