COSCO 2015 Heterogeneous Computing Programming


1 COSCO 2015 Heterogeneous Computing Programming
Michael Meyer, Shunsuke Ishikuro
Supporters: Kazuaki Sasamoto, Ryunosuke Murakami
July 24th, 2015

2 Heterogeneous Computing Programming
1. Overview
2. Methodology
3. Example and Evaluation
4. Conclusion


4 Heterogeneous Architecture: What Is It?
Heterogeneous: diverse in character or content. A heterogeneous architecture combines different types of components in one package.
Types of heterogeneity:
- Different types of cores
- Different types of processing (GPU and CPU)
- Different functions (CPU and memory)
- Different communication media (optical, electrical, RF, or sensors)

5 Heterogeneous Architecture: Different Types of Cores
Example: CELL
- The SPEs (Synergistic Processing Elements) handle floating-point calculations.
- The Power Processing Element (PPE) handles all other major functions.

6 Heterogeneous Architecture: Different Types of Processing (GPU and CPU)

7 Heterogeneous Architecture: Different Functions (CPU and Memory)

8 Heterogeneous Architecture: Communication Media (Optical, Electrical, RF, or Sensors)


10 2. Methodology
Parallel Computing
- Hardware
- Software
OpenCL's Approach
- Basic idea
- Programming model
Development Environment

11 Parallel Computing

12 Why Parallel?
A problem can be divided into small tasks.
- Serial: the tasks run one after another.
- Parallel: the tasks run at the same time.

13 Hardware: Flynn's Taxonomy

14 Hardware: Memory
- Shared memory
- Distributed memory

15 Hardware: Accelerator
A host combined with an accelerator: GPU, DSP, or FPGA

16 Software
[Figure: the same workload (Task A, Task B, Task C, Task D, Sum, repeated) executed sequentially and on multiple cores]

17 Software: 1. Analysis
[Figure: the workload as a repeated sequence of Task A, Task B, Task C, Task D, Sum]

18 Software: 1. Analysis
Amdahl's Law
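The slide names Amdahl's Law without stating it. In its usual form, if a fraction p of the run time can be parallelized over N workers, the speedup is 1 / ((1 - p) + p / N). A minimal sketch follows; the function names `amdahl_speedup` and `amdahl_limit` are hypothetical and not taken from the deck.

```cpp
#include <stdexcept>

// Amdahl's Law: speedup when a fraction `parallel_fraction` of the run time
// is spread over `workers` processors and the remainder stays serial.
// (Hypothetical helper, for illustration only.)
double amdahl_speedup(double parallel_fraction, unsigned workers) {
    if (parallel_fraction < 0.0 || parallel_fraction > 1.0 || workers == 0) {
        throw std::invalid_argument("fraction must be in [0,1] and workers > 0");
    }
    const double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / workers);
}

// Upper bound on the speedup as the number of workers grows without limit.
// Only meaningful for parallel_fraction < 1.
double amdahl_limit(double parallel_fraction) {
    if (parallel_fraction < 0.0 || parallel_fraction >= 1.0) {
        throw std::invalid_argument("fraction must be in [0,1)");
    }
    return 1.0 / (1.0 - parallel_fraction);
}
```

For example, with 90% of the work parallelizable, four workers give a speedup of about 3.08, and no number of workers can exceed a speedup of 10.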

19 Software: 2. Algorithm
- Data parallelism
- Task parallelism
[Figure: two ways of assigning Task A, Task B, Task C, and Task D to four cores, followed by Sum]
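The two decompositions can be sketched with standard C++ threads. This is an illustration under assumptions, not the deck's code: the function names are hypothetical, and `std::thread` stands in for whichever API (pthread, OpenMP, OpenCL) is actually used.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Data parallelism: every worker runs the SAME operation (a partial sum)
// on a DIFFERENT slice of the input.
long long data_parallel_sum(const std::vector<int>& data, std::size_t workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&data, &partial, w, chunk] {
            const std::size_t begin = std::min(w * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& t : threads) t.join();
    // Final "Sum" step: combine the partial results.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}

// Task parallelism: each worker runs a DIFFERENT task; the results are
// combined after all tasks have finished.
long long task_parallel_sum(const std::vector<std::function<long long()>>& tasks) {
    std::vector<long long> results(tasks.size(), 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        threads.emplace_back([&tasks, &results, i] { results[i] = tasks[i](); });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(results.begin(), results.end(), 0LL);
}
```

Each worker writes only to its own slot of the result vector, so no locking is needed before the final sum.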

20 Software: 3. Programming
- OS API: pthread
- Framework: OpenMP, CUDA, OpenCL

21 OpenCL's Approach

22 Basic Idea

23 Basic Idea
[Figure: one host connected to three OpenCL devices]

24 Basic Idea
[Figure: one host connected to three OpenCL devices]
Key points: Common API, Portable, Optimization

25 OpenCL C and the OpenCL Runtime
- Host: C/C++ with the OpenCL runtime library
- OpenCL devices: OpenCL C language

26 OpenCL Device
[Figure: a host connected to an OpenCL device; the device contains compute units, and each compute unit contains processing elements]

27 Programming Model
[Figure: an OpenCL device fed by a command queue; work-groups #0 to #2, each containing work-items #0 to #4]
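The relationship between work-groups and work-items can be emulated on the host. In OpenCL C, a kernel reads these values through the built-ins `get_group_id`, `get_local_id`, `get_local_size`, and `get_global_id`; the sketch below reproduces the one-dimensional relation global_id = group_id * local_size + local_id, assuming a zero global offset. The struct and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Identifiers a work-item sees in a one-dimensional index space.
// (Hypothetical type, for illustration only.)
struct WorkItemId {
    std::size_t group_id;   // which work-group the item belongs to
    std::size_t local_id;   // position inside that work-group
    std::size_t global_id;  // position in the whole index space
};

// Enumerate every work-item serially. A real device would execute them in
// parallel; only the index arithmetic is modelled here.
std::vector<WorkItemId> enumerate_work_items(std::size_t num_groups,
                                             std::size_t local_size) {
    std::vector<WorkItemId> items;
    items.reserve(num_groups * local_size);
    for (std::size_t g = 0; g < num_groups; ++g) {
        for (std::size_t l = 0; l < local_size; ++l) {
            items.push_back({g, l, g * local_size + l});
        }
    }
    return items;
}
```

With three work-groups of five work-items each, as drawn on the slide, the global IDs run from 0 to 14, and global ID 7 is work-item #2 of work-group #1.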

28 Memory Model
- Global memory and constant memory: shared across the OpenCL device
- Local memory: one per compute unit
- Private memory: one per processing element

29 Comparison
OpenMP vs OpenCL
- OpenMP: multiprocessors
- OpenCL: multiprocessors and accelerators
CUDA vs OpenCL
- CUDA: NVIDIA GPUs only
- OpenCL: supports AMD, Intel, and NVIDIA GPUs

30 Comparison
HSA vs OpenCL
- HSA: framework for hardware vendors
- OpenCL: better development environment and materials

31 Development Environment
- Intel: Intel multicore processor + Intel OpenCL SDK
- NVIDIA: NVIDIA GPU + CUDA
- Apple: Intel Mac + Xcode
- Altera: Altera PCIe FPGA + Altera SDK for OpenCL


33 Hello World
See the program list handout.
- hello.cl: kernel code that runs on the OpenCL device
- hello.cpp: host program that runs on the host machine

34 hello.cl (runs on the OpenCL device)

    #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

    /* Writes the greeting into a buffer in global memory. */
    __kernel void hello(__global char* string)
    {
        string[0] = 'H';  string[1] = 'e';  string[2] = 'l';
        string[3] = 'l';  string[4] = 'o';  string[5] = ',';
        string[6] = ' ';  string[7] = 'W';  string[8] = 'o';
        string[9] = 'r';  string[10] = 'l'; string[11] = 'd';
        string[12] = '!';
        string[13] = '\0';  /* string terminator */
    }

35 hello.cpp

    FILE *fp;
    char filename[] = "./hello.cl";
    char *source_str;
    size_t source_size;

    /* Load the source code containing the kernel */
    fp = fopen(filename, "r");

36 hello.cpp

    /* Get platform and device information */
    ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
    ret = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_DEFAULT, 1,
                         &device_id, &ret_num_devices);

    /* Create an OpenCL context */
    context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &ret);

There is no platform dependency.

37 hello.cpp

    /* Execute the OpenCL kernel */
    ret = clEnqueueTask(command_queue, kernel, 0, NULL, NULL);

    /* Get the result data from the memory buffer */
    ret = clEnqueueReadBuffer(command_queue, memobj, CL_TRUE, 0,
                              MEM_SIZE * sizeof(char), string, 0, NULL, NULL);

    /* Display the result */
    puts(string);

38 Hello World: Build and Run
Build (NVIDIA):
    $ g++ -I/usr/local/cuda/include -o hello hello.cpp -lOpenCL
Run:
    $ ./hello
    Hello, World!

39 Image Processing Edge Filter

40 FFT: Fourier Transformation
Twiddle factor: $W = \exp(-2\pi i / n)$
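The twiddle factors used by an n-point FFT are the powers of W. The sketch below tabulates the first n/2 of them, assuming the common forward-transform convention in which W to the power k equals exp(-2πik/n). The sign and the function name `make_twiddles` are assumptions; the slide's formula is too damaged to confirm either.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Precompute the twiddle factors W^k = exp(-2*pi*i*k/n) for k = 0 .. n/2 - 1.
// The negative sign is the usual forward-transform convention (an assumption).
std::vector<std::complex<double>> make_twiddles(std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> w(n / 2);
    for (std::size_t k = 0; k < w.size(); ++k) {
        const double angle = -2.0 * pi * static_cast<double>(k)
                             / static_cast<double>(n);
        w[k] = std::polar(1.0, angle);  // unit magnitude, phase = angle
    }
    return w;
}
```

Only n/2 factors are stored because the remaining powers are the negatives of these.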

41 FFT: Inverse Fourier Transformation

42 FFT: Processing Flow
Main flow:
start, Generate Twiddle Factors, FFT Core, Transpose Matrix, FFT Core, Filter, FFT Core (Inverse), Transpose Matrix, FFT Core (Inverse), end
FFT Core flow:
start, loop while count < log2 N: Butterfly Calc, then Normalize if (Inverse), end
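The FFT core in this flow (butterfly stages repeated log2 N times, with normalization for the inverse case) can be sketched on the host as a one-dimensional, in-place radix-2 transform. This is a serial illustration under assumptions, not the handout's fft.cl: it computes twiddle factors on the fly instead of reading a precomputed table, and the name `fft` and its signature are hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// In-place iterative radix-2 FFT. `inverse` selects the inverse transform,
// which uses conjugated twiddle factors and divides by N at the end.
void fft(std::vector<Complex>& a, bool inverse) {
    const std::size_t n = a.size();
    if (n == 0 || (n & (n - 1)) != 0) {
        throw std::invalid_argument("size must be a power of two");
    }

    // Step 1: bit-reversal permutation of the input.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }

    // Step 2: log2(N) butterfly stages; `len` doubles at every stage.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle = (inverse ? 2.0 : -2.0) * pi
                             / static_cast<double>(len);
        const Complex w_len = std::polar(1.0, angle);
        for (std::size_t start = 0; start < n; start += len) {
            Complex w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const Complex even = a[start + k];
                const Complex odd = a[start + k + len / 2] * w;
                a[start + k] = even + odd;            // butterfly, upper half
                a[start + k + len / 2] = even - odd;  // butterfly, lower half
                w *= w_len;
            }
        }
    }

    // Step 3: normalize only for the inverse transform.
    if (inverse) {
        for (auto& x : a) x /= static_cast<double>(n);
    }
}
```

On an OpenCL device, the iterations of the two inner loops within one stage are independent, so they are natural candidates for work-items; the stages themselves must run in order.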

43 FFT: Source Code
See the program list handout.
- fft.cl: kernel code that runs on the OpenCL device
- fft.cpp: host program that runs on the host machine

44 FFT: Evaluation
Device: Tesla C2050 (NVIDIA)
Table: number of work-items and execution time (ms)
Columns: num of workitems, membuf_write, spinfactor, bitreverse, butterfly, normalize, highpassfilter, membuf_read

45 Conclusion
- Heterogeneous computing is one form of parallel computing.
- Parallel computing requires knowledge of both hardware and software characteristics.
- The OpenCL framework supports heterogeneous computing with a portable API.

