Towards OpenMP Support in LLVM



Similar documents
Intel Media SDK Library Distribution and Dispatching Process

Keys to node-level performance analysis and threading in HPC applications

INTEL PARALLEL STUDIO XE EVALUATION GUIDE

The ROI from Optimizing Software Performance with Intel Parallel Studio XE

OpenMP* 4.0 for HPC in a Nutshell

Scaling up to Production

High Performance Computing and Big Data: The coming wave.

Intel Many Integrated Core Architecture: An Overview and Programming Models

Intel Platform and Big Data: Making big data work for you.

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

OpenMP & MPI CISC 879. Tristan Vanderbruggen & John Cavazos Dept of Computer & Information Sciences University of Delaware

Vendor Update Intel 49 th IDC HPC User Forum. Mike Lafferty HPC Marketing Intel Americas Corp.

Overview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus

Intel Media Server Studio Professional Edition for Windows* Server

Parallel Computing. Parallel shared memory computing with OpenMP

Improve Fortran Code Quality with Static Analysis

Floating-point control in the Intel compiler and libraries or Why doesn t my application always give the expected answer?

Intel Cyber Security Briefing: Trends, Solutions, and Opportunities. Matthew Rosenquist, Cyber Security Strategist, Intel Corp

Large-Data Software Defined Visualization on CPUs

New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC

Implementation and Performance of AES-NI in CyaSSL. Embedded SSL

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

INTEL PARALLEL STUDIO EVALUATION GUIDE. Intel Cilk Plus: A Simple Path to Parallelism

The Foundation for Better Business Intelligence

Intel Perceptual Computing SDK My First C++ Application

Finding Performance and Power Issues on Android Systems. By Eric W Moore

Parallel Computing. Shared memory parallel programming with OpenMP

Multi-Threading Performance on Commodity Multi-Core Processors

Big Data for Big Science. Bernard Doering Business Development, EMEA Big Data Software

Overview

Recovery BIOS Update Instructions for Intel Desktop Boards

High Performance Computing

How To Write A Program In Anieme Frontend (For A Non-Programmable Language)

HPC & Big Data THE TIME HAS COME FOR A SCALABLE FRAMEWORK

Intel SSD 520 Series Specification Update

Accomplish Optimal I/O Performance on SAS 9.3 with

Extended Attributes and Transparent Encryption in Apache Hadoop

Developing High-Performance, Scalable, cost effective storage solutions with Intel Cloud Edition Lustre* and Amazon Web Services

COMP/CS 605: Introduction to Parallel Computing Lecture 21: Shared Memory Programming with OpenMP

Revealing the performance aspects in your code. Intel VTune Amplifier XE Generics. Rev.: Sep 1, 2013

Fast, Low-Overhead Encryption for Apache Hadoop*

Improve Fortran Code Quality with Static Security Analysis (SSA)

Intel Integrated Native Developer Experience (INDE): IDE Integration for Android*

Accelerating Business Intelligence with Large-Scale System Memory

XDB Intel System Debugger 2015 Overview Training. Robert Mueller-Albrecht, TCE, SSG DPD ECDL

Benchmarking Cloud Storage through a Standard Approach Wang, Yaguang Intel Corporation

How To Install An Intel System Studio 2015 For Windows* For Free On A Computer Or Mac Or Ipa (For Free)

Cloud-based Analytics and Map Reduce

Multicore Parallel Computing with OpenMP

Intel Desktop Board DP55WB

OpenMP C and C++ Application Program Interface

Parallel Programming Survey

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

Intel Desktop Board DG43RK

SAP * Mobile Platform 3.0 Scaling on Intel Xeon Processor E5 v2 Family

Intel Desktop Board DG41WV

Scalability evaluation of barrier algorithms for OpenMP

Accelerating Business Intelligence with Large-Scale System Memory

An Introduction to Parallel Programming with OpenMP

Intel Media SDK Features in Microsoft Windows 7* Multi- Monitor Configurations on 2 nd Generation Intel Core Processor-Based Platforms

Floating Point C Compiler: Tips and Tricks Part I

Instructions for Recovery BIOS Update

The Transition to PCI Express* for Client SSDs

Pexip Speeds Videoconferencing with Intel Parallel Studio XE

BLM 413E - Parallel Programming Lecture 3

* * * Intel RealSense SDK Architecture

Cloud based Holdfast Electronic Sports Game Platform

Intel Cyber-Security Briefing: Trends, Solutions, and Opportunities

Intel Service Assurance Administrator. Product Overview

Intel Desktop Board DQ43AP

Intel Desktop Board DG33TL

Eliminate Memory Errors and Improve Program Stability

Programming the Intel Xeon Phi Coprocessor

A Superior Hardware Platform for Server Virtualization

Intel Desktop Board D945GCPE

A Crash course to (The) Bighouse

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service. Eddie Dong, Tao Hong, Xiaowei Yang

Intel Software Guard Extensions(Intel SGX) Carlos Rozas Intel Labs November 6, 2013

Device Management API for Windows* and Linux* Operating Systems

CLOUD SECURITY: Secure Your Infrastructure

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

RAID and Storage Options Available on Intel Server Boards and Systems

Intel Desktop Board DQ35JO

University of Amsterdam - SURFsara. High Performance Computing and Big Data Course

Intel Core TM i3 Processor Series Embedded Application Power Guideline Addendum

Oracle Solaris Studio Code Analyzer

Running Windows 8 on top of Android with KVM. 21 October Zhi Wang, Jun Nakajima, Jack Ren

Intel Desktop Board DG41BI

Intel Storage System SSR212CC Enclosure Management Software Installation Guide For Red Hat* Enterprise Linux

Ensure that the AMD APP SDK Samples package has been installed before proceeding.

Transcription:

Towards OpenMP Support in LLVM Alexey Bataev, Andrey Bokhanko, James Cownie Intel 1

Agenda What is the OpenMP * language? Who Can Benefit from the OpenMP language? OpenMP Language Support Early / Late Outlining History OpenMP Runtime OpenMP support in Clang * 2

What is the OpenMP Language? Industry-wide standard for shared memory multiprocessing programming Vendor-neutral, platform-neutral, portable, managed by an independent consortium Supports C, C++ and Fortran Implemented in GCC *, ICC, Open64 *, Visual C++ *, But not in Clang / LLVM *! Current version is 3.1 4.0 under development www.openmp.org #pragma omp parallel for for (i = 0; i < N; i++) {... } 3

USED Who Can Benefit from the OpenMP Language? Anyone who uses a multi-core processor Your phone almost certainly has more than 1 core! Must have for HPC Without OpenMP support, LLVM is at a disadvantage in this area Becomes a must have for power clients You can hardly find a desktop / notebook with a single core NOT USED 4

OpenMP Support Compile-time Run-time *.cpp with OpenMP C / C++ Front-End (clang) Back-End (llvm) a.out Three essential parts: Front-end Back-End Library OpenMP RTL Two approaches: Early / late outlining 5

Early / Late Outlining Parallel regions are put into separate routines To be executed in separate threads This can be done either in front-end or back-end float a,x,y,z; #pragma omp parallel for for (i = 0; i < N; i++) { a[i] = x * y * z;... // rest of loop } omp_parallel_for(0, N, N/omp_get_num_threads(), forb)... void forb(int L, int U, R *r) { for (i = L; i < U; i++) { r->a[i] = r->x * r->y * r->z;... // rest of loop } } 6

OpenMP in LLVM: A Brief History 2H 2012: Several proposals with late outlining From Intel, Hal Finkel, others All of them involve changes to LLVM IR and thus, require modifications of LLVM phases None of them got enough support in the community October 2012: OpenMP in Clang project Started by AMD *, continued by Intel Early outlining OpenMP RTL calls generated in Clang No changes to LLVM IR 7

OpenMP Support Compile-time Run-time *.cpp with OpenMP C / C++ Front-End (clang) Back-End (llvm) a.out Three essential parts: Front-end Back-End Library OpenMP RTL Two approaches: Early / late outlining 8

OpenMP Runtime Fortunately, there is libgomp Unfortunately, it is under GPLv3 * Copyleft license Clang / LLVM uses UoI / NCSA OSL * Permissive (aka BSD-style) free software license Permissively licensed free runtime library is needed 9

Intel OpenMP Runtime Intel OpenMP runtime was released in April with LLVM compatible 3-clause BSD license This is Intel s production runtime used by icc and ifort Continual development/tuning since before the OpenMP language existed (>15 years) Highly scalable (used on Intel Xeon Phi coprocessor with 244 threads, large SGI * and Bull * ccnuma SMP machines) 10

Intel OpenMP Runtime Supports OpenMP 3.1 (and parts of OpenMP 4.0 [work in progress]) ABI compatible with Intel Compilers (icc, icpc, ifort) GCC so gcc compiled code can be linked in without libgomp to avoid issues if there are multiple OpenMP runtimes in the same process Doxygen * documentation in the source Available from www.openmprtl.org 11

OpenMP Support Compile-time Run-time *.cpp with OpenMP C / C++ Front-End (clang) Back-End (llvm) a.out Three essential parts: Front-end Back-End Library OpenMP RTL Two approaches: Early / late outlining 12

OpenMP Support in Clang First approach is to represent OpenMP directives as C++11 attributes (Olaf Krzikalla, Nov. 2012) Currently may require two parsing passes May need to change code generation for standard statements Second approach is to use standard pragma parsing harness Declarative directive is represented as a special kind of declaration Executable directives and clauses are represented as a special kind of statements 13

Representation in AST Declarative Directives Variable OMPThreadPrivateDecl Variable 14

Representation in AST Executable Directives OMPClause OMPExecutableDirective OMPParallelDirective OMPExecutableDirective OMPClause OMPForDirective CapturedStmt 15

Representation in AST Clauses OMPClause OMPIfClause Expr OMPIfClause Variable OMPPrivateClause OMPPrivateClause Variable 16

Representation in AST Statements And Variables Statements are Structured Statements with protected regions A single statement for #pragma omp parallel One or more for-loops for #pragma omp for Statements are represented as CapturedStmt to capture local variables Special processing for threadprivate variables Private variables are constructed by default Shared variables are captured by reference Special processing for firstprivate, lastprivate, reduction variables 17

An Example #pragma omp parallel if(a) private(argc,b) foo(); -OMPParallelDirective <line:9:2, col:43> -OMPIfClause <col:22, col:27> `-ImplicitCastExpr <col:25> '_Bool' <IntegralToBoolean> `-ImplicitCastExpr <col:25> 'int' <LValueToRValue> `-DeclRefExpr <col:25> 'int' lvalue Var 'a' 'int' -OMPPrivateClause <col:28, col:43> -DeclRefExpr <col:36> 'int' lvalue ParmVar 'argc' 'int' `-DeclRefExpr <col:41> 'int' lvalue Var 'b' 'int' `-CapturedStmt <line:10:2, col:7> `-CallExpr <col:2, col:7> 'void' `-ImplicitCastExpr <col:2> 'void (*)(void)' <FunctionToPointerDecay> `-DeclRefExpr <col:2> 'void (void)' lvalue Function 'foo' 'void (void)' 18

Code Generation All variables are combined into an autogenerated record according to their datasharing attributes (predetermined, explicit or implicit) OpenMP regions are outlined as functions with a single argument pointer to the record LLVM IR code is generated to use captured variables instead of original ones 19

Current Status and Plans Implemented and committed: -fopenmp option #pragma omp threadprivate Parsing and semantic analysis, AST representation Implemented, under code review: All pragmas (parallel, for, sections, task etc.) Parsing and semantic analysis, data-sharing attributes analysis, AST representation Under development CodeGen for all OpenMP constructs 20

Acknowledgements Thank you to all code reviewers! Especially to Dmitri Gribenko, Hal Finkel and Doug Gregor Your contribution is welcomed! 21

Legal Disclaimer & INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. Intel s compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 22

23