Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child



Similar documents
GPU ACCELERATED DATABASES Database Driven OpenCL Programming. Tim Child 3DMashUp CEO

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Parallel Programming Survey

Introduction to GPU hardware and to CUDA

Introduction to GPU Computing

Introduction to GPU Programming Languages


CSE 6040 Computing for Data Analytics: Methods and Tools

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Next Generation GPU Architecture Code-named Fermi

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

HPC Wales Skills Academy Course Catalogue 2015

Introduction to GPGPU. Tiziano Diamanti

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Course materials. In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful:

GPUs for Scientific Computing

Case Study on Productivity and Performance of GPGPUs

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

Embedded Systems: map to FPGA, GPU, CPU?

OpenACC 2.0 and the PGI Accelerator Compilers

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

Evaluation of CUDA Fortran for the CFD code Strukti

Multi-core Programming System Overview

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

OpenACC Basics Directive-based GPGPU Programming

Le langage OCaml et la programmation des GPU

Introduction to GPU Architecture

Parallel Algorithm Engineering

Multi-Threading Performance on Commodity Multi-Core Processors

COSCO 2015 Heterogeneous Computing Programming

Mitglied der Helmholtz-Gemeinschaft. OpenCL Basics. Parallel Computing on GPU and CPU. Willi Homberg. 23. März 2011

CUDA programming on NVIDIA GPUs

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Introduction to OpenCL Programming. Training Guide

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Crosswalk: build world class hybrid mobile apps

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server

Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus

GPGPUs, CUDA and OpenCL

A quick tutorial on Intel's Xeon Phi Coprocessor

Computer Graphics Hardware An Overview

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

~ Greetings from WSU CAPPLab ~

Multi-core architectures. Jernej Barbic , Spring 2007 May 3, 2007

Intro to GPU computing. Spring 2015 Mark Silberstein, , Technion 1

Bringing Big Data Modelling into the Hands of Domain Experts

Debugging with TotalView

HPC-related R&D in 863 Program

Lecture 1 Introduction to Android

BLM 413E - Parallel Programming Lecture 3

Lecture 3. Optimising OpenCL performance

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab

Parallel Firewalls on General-Purpose Graphics Processing Units

GPU Acceleration of the SENSEI CFD Code Suite

Turbomachinery CFD on many-core platforms experiences and strategies

GPU Computing - CUDA

AMD APP SDK v2.8 FAQ. 1 General Questions

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

Enabling Technologies for Distributed Computing

OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING

Java GPU Computing. Maarten Steur & Arjan Lamers

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015

Binary search tree with SIMD bandwidth optimization using SSE

INTEL PARALLEL STUDIO EVALUATION GUIDE. Intel Cilk Plus: A Simple Path to Parallelism

Enabling Technologies for Distributed and Cloud Computing

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration

Topics. Introduction. Java History CS 146. Introduction to Programming and Algorithms Module 1. Module Objectives

HIGH PERFORMANCE BIG DATA ANALYTICS

OpenCL for programming shared memory multicore CPUs

CUDA Programming. Week 4. Shared memory and register

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

Accelerating sequential computer vision algorithms using OpenMP and OpenCL on commodity parallel hardware

Software: Systems and. Application Software. Software and Hardware. Types of Software. Software can represent 75% or more of the total cost of an IS.

OpenACC Programming and Best Practices Guide

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Debugging in Heterogeneous Environments with TotalView. ECMWF HPC Workshop 30 th October 2014

SYCL for OpenCL. Andrew Richards, CEO Codeplay & Chair SYCL Working group GDC, March Copyright Khronos Group Page 1

Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.

Enhancing Cloud-based Servers by GPU/CPU Virtualization Management

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

GPU Parallel Computing Architecture and CUDA Programming Model

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Transcription:

Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc. VP Informix Leader at Illustra, Autodesk, Navteq, Intuit, 30+ years experience in 3D, CAD, GIS and DBMS

Terminology Term Description Procedure Language Language for SQL Procedures (e.g. PgPLSQL, Perl, TCL, Java, ) GPU Graphics Processing Unit (highly specialized CPU for graphics) GPGPU General Purpose GPU (non-graphics programming on a GPU) CUDA Nvidia s GPU programming environment APU Accelerated Processing Unit (AMD s Hybrid CPU & GPU chip) ISO C99 Modern standard version of the C language OpenCL Open Compute Language OpenMP Open Multi-Processing (parallelizing compilers) SIMD Single Instruction Multiple Data (Vector instructions ) SSE x86, x64 (Intel, AMD) Streaming SIMD Extensions xpu Any Processing Unit device (CPU, GPU, APU) Kernel Functions that execute on a OpenCL Device Work Item Instance of a Kernel Workgroup A group of Work Items FLOP Floating Point Operation (single = SQL real type ) MIC Many Integrated Cores (Intel s 50+ x86 Core chip architecture)

Some Technology Trends Impacting DBMS Solid State Storage Reduced Access Time, Lower Power, Increasing in capacity Virtualization Server consolidation, Specialized VM s, lowers direct costs Cloud Computing EC2, Azure, lowers capital requirements Multi-Core 2,4,6,8, 12,. Lots of benefits to multi-threaded applications xpu (GPU/APU) GPU >1000 Cores > 1T FLOP /s @ 2500 APU = CPU + GPU Chip Hybrids due in Mid 2011 2 T FLOP /s for $2.10 per hour (AWS EC2) Intel MIC Knights Corner > 50 x86 Cores

Compute Intensive xpu Database Applications Bioinformatics Signal/Audio/Image Processing/Video Data Mining & Analytics Searching Sorting Spatial Selections and Joins Map/Reduce Scientific Computing Many Others

GPU vs CPU Vendor Architecture NVidia Fermi Cores 448 Simple ATI Radeon Evergreen 1600 Simple Intel Nehalem 4 Complex Transistors 3.1 B 2.15 B 731 M Clock 1.5 G Hz 851 M Hz 3 G Hz Peak Float Performance Peak Double Performance Memory Bandwidth Power Consumption SIMD / Vector Instructions 1500 G FLOP / s 750 G FLOP / s 2720 G FLOP / s 544 G FLOP / s 96 G FLOP / s 48 G FLOP / s ~ 190 G / s ~ 153 G / s ~ 30 G / s 250 W > 250 W 80 W Many Many SSE4+

Multi-Core Performance Source NVidia

Future (Mid 2011) APU Based PC APU (Accelerated Processing Unit) APU Chip CPU CPU ~20 GB/s System RAM North Bridge ~20 GB/s Embedded GPU PCIE ~12 GB/s APU s Adds an Embedded GPU Discrete GPU 150 GB/s Graphic RAM Source AMD

Scalar vs. SIMD Scalar Instruction C = A + B 1 + 2 3 = SIMD Instruction Vector C = Vector A + Vector B 1 3 5 7 + 2 4 6 8 = 3 7 11 15 OpenCL Vector lengths 2,4,8,16 for char, short, int, float, double

Summarizing xpu Trends Many more xpu Cores in our Future Compute Environment becoming Hybrid CPU and GPU s Need CPU to give access to GPU power GPU Capabilities Lots of cores Vector/SIMD Instructions Fast Memory GPU Futures Virtual Memory Multi-tasking / Pre-emption

Scaling PostgreSQL Queries on xpu s Multi-Core CPU Many Core GPU Threads Threads Threads Threads Threads Threads Threads Thread Thread Postgres Process Threads Threads Threads Thread Thread Threads Threads Threads Thread Thread Using More Transistors

Parallel Programming Systems Category CUDA OpenMP OpenCL Language C C, Fortran C Cross Platform X Standard Vendor OpenMP Khronos CPU X GPU X Clusters X X Compilation / Link Static Static Dynamic

What is OpenCL? OpenCL - Open Compute Language Subset of C 99 Open Specification Proposed by Apple Many Companies Collaborated on the Specification Portable, Device Agnostic Specification maintained by Khronos Group OpenCL as a PostgreSQL Procedural Language

System Overview DBMS Server Web Browser HTTP Web Server SQL Statement SQL Procedure TCP/IP PCIe X2 Bus App Server PostgreSQL GPGPU PostgreSQL Client TCP/IP Disk I/O Tables

A subset of ISO C99 OpenCL Language - But without some C99 features such as standard C99 headers, function pointers, recursion, variable length arrays, and bit fields A superset of ISO C99 with additions for: - Work-items and Workgroups - Vector types - Synchronization - Address space qualifiers Also includes a large set of built-in functions - Image manipulation - Work-item manipulation, - Specialized math routines, etc.

Components New PostgreSQL Procedural Language Language handler Maps arguments Calls function Returns results Language validator Creates Function with parameter & syntax checking Compiles Function to a Binary format New data types cl_double4, cl_double8,. System Admin Pseudo-Tables Platform, Device, Run-Time,

Admin

PGOpenCL Function Declaration CREATE or REPLACE FUNCTION VectorAdd(IN a float[], IN B float[], OUT c float[]) AS $BODY$ #pragma PGOPENCL Platform : ATI Stream #pragma PGOPENCL Device : CPU kernel attribute ((reqd_work_group_size(64, 1, 1))) void VectorAdd( global const float *a, global const float *b, global float *c) { int i = get_global_id(0); } c[i] = a[i] + b[i]; $BODY$ Language ;

Execution Model Table A B Select Table to Array 100 s - 1000 s of Threads (Kernels) xpu VectorAdd(A, B) Returns C + = A B C Table Copy Copy Unnest Array To Table C C C C C C C C C C C C C

Table of Arrays Using Re-Shaped Tables 100 s - 1000 s of Threads (Kernels) A + B = C Table of Arrays A B A B Copy xpu VectorAdd(A, B) Returns C C C C C C C C C Copy

Today s GPGPU Challenges No Pre-emptive Multi-Tasking No Virtual Memory Limited Bandwidth to discrete GPGPU 1 8 G/s over PCIe Bus Hard to Program New Parallel Algorithms and constructs New C language dialect Immature Tools Compilers, IDE, Debuggers, Profilers - early years Data organization really matters Types, Structure, and Alignment SQL needs to Shape the Data Profiling and Debugging is not easy Solves Well for Problem Sets with the Right Shape!

Making a Problem Work for You Determine % Parallelism Possible for ( i = 0, i <, i++) for ( j = 0; j < ; j++ ) for ( k = 0; k < ; k++ ) Arrange data to fit available GPU RAM Ensure calculation time >> I/O transfer overhead Learn about Parallel Algorithms and the OpenCL language Learn new tools Carefully choose Data Types, Organization and Alignments Profile and Measure at Every Stage

System Requirements PostgreSQL 9.x For GPU s AMD ATI OpenCL Stream SDK 2.x NVidia CUDA 3.x SDK Recent Macs with O/S 11.6 For CPU s (Pentium M or more recent) AMD ATI OpenCL Stream SDK 2.x Intel OpenCL SDK Alpha Release (x86) Recent Macs with O/S 11.6

Today Prototype 1Q 2011 Beta PGOpenCL Status 2010 2011 Wish List Beta Testers Existing OpenCL App? Have a GPU App? Contributors Code server side functions? Sponsors & Supporters AMD Fusion Fund? Khronos?

Future Plans Increase Platform Support Scatter/Gather Functions Additional Type Support Image Types Sparse Matrices Run-Time Asynchronous Events Profiling Debugging

APU Chip Using the Whole Brain PgOpenCl CPU CPU Postgres North Bridge CPU PgOpenCl You can t be in a parallel universe with a single brain! ~20 GB/s Embedded GPU PgOpenCl PgOpenCl PgOpenCl PgOpenCl Heterogeneous Compute Environments CPU s, GPU s, APU s Expect 100 s 1000 s of cores The Future Is Parallel: What's a Programmer to Do?

Summarizing Supports Heterogeneous Parallel Compute Environments CPU s, GPU s, APU s OpenCL Portable and high-performance framework Ideal for computationally intensive algorithms Access to all compute resources (CPU, APU, GPU) Well-defined computation/memory model Efficient parallel programming language C99 with extensions for task and data parallelism Rich set of built-in functions Open standard for heterogeneous parallel computing Integrates PostgreSQL with OpenCL Provides Easy SQL Access to xpu s APU, CPU, GPGPU Integrates OpenCL SQL + Web Apps(PHP, Ruby, )

PGOpenCL Twitter @3DMashUp OpenCL www.khronos.org/opencl/ More Information www.amd.com/us/products/technologies/stream-technology/opencl/ http://software.intel.com/en-us/articles/intel-opencl-sdk http://www.nvidia.com/object/cuda_opencl_new.html http://developer.apple.com/technologies/mac/snowleopard/opencl.html

Q & A Using Parallel Applications? Benefits of OpenCL /? Want to Collaborate on?