F# Applications to Computational Finance and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61





Today
- Why care about F#? Just another fashion?
- Three success stories
- How Alea.cuBase turns F# into a first-class CUDA language
- Live coding examples, including CUDA scripting in Excel
- Pointers to more information

About Us
- Software development
  - Quantitative finance and risk management
  - Derivative pricing and modeling
  - Numerical computing
  - High performance computing (clusters, grid, GPUs)
  - Software engineering (C++, F#, Scala, ...)
  - Early adopter of GPUs
  - Creator of Alea.cuBase
- Consulting and advisory
  - Risk management, quantitative finance, reporting, regulations
  - Banks, insurance firms, asset management firms, energy and commodity trading firms
  - Interdisciplinary team with expertise in methodology, implementation, IT architecture
  - Using QuantAlea technology in projects

Challenges
- Software applications in the financial industry are
  - Highly complex
  - Heterogeneous
  - Strongly interlinked
  - Numerically intense
  - Subject to continuous change
- Challenges in building and maintaining financial software
  - Time to market
  - Correctness of results
  - Robustness of applications
  - Performance and efficient execution
  - Extensibility of solutions
  - Total cost of ownership

Needs
How to face these challenges?
- Flexible, high-productivity software engineering tools
- Modern programming languages and concepts
- Solid framework and library components
- Support for iterative development and rapid prototyping
- Building on top of modern technologies like .NET or the JVM
- Executing as fast as compiled Fortran or C/C++
F# with Visual Studio tooling has been specifically designed for these needs!

How does F# achieve this?
- Strongly typed, functional-first language with OO elements
- Compact and clear syntax
- Designed for
  - High productivity
  - Development of correct, robust and efficient code
- F# is highly flexible and extensible
  - Computation expressions (workflows), code quotations, type providers, ...
- F# is ideal for
  - Numerical computing and GPU computing
  - Domain-specific languages, code generation and transformation
- F# has a lot of potential in computational finance and big data
- Good tool support through Visual Studio, Tsunami, Excel (FCell)
  - IntelliSense, F# Interactive
  - Vast .NET ecosystem

Three success stories
- Real project examples are more convincing than listings of technical benefits
- Illustrate capabilities of F# with 3 project examples

F# success story I
- Derivative pricing library
  - Self-contained pricing library developed in F#
  - Innovative design based on the general idea of perturbations
  - Framework to compute Greeks and more general calculations
  - GPU Monte Carlo
  - GPU-accelerated local volatility calibration and pricing
  - PDE pricing algorithms
  - Main focus on equity derivatives, some FX
- F# advantages
  - Pattern matching logic for complex dispatching
  - Computation expressions (workflows) for data preprocessing
  - Async workflows for parallel processing
  - Rich data containers

F# success story II
- Grid infrastructure for derivative pricing
  - High throughput CPU / GPU cluster
  - Based on ICE from ZeroC
  - Integration in Sungard Front Arena
- F# advantages
  - Multithreaded server-side programming
  - Async workflows
  - Callback handling with F# functions and closures
  - Code generation and transformation from meta information and reflection

F# success story III
- Alea.cuBase
  - Dynamically compile F# code to fast CUDA code
  - Based on LLVM technology
  - Started May 2012
  - First prototype ready after 4 months
  - First production release after 7 months
  - Working on Nsight GPU debugging support in Visual Studio
- F# advantages
  - Code quotations
  - AST processing with pattern matching
  - Extensibility of F# with computation expressions (workflows)
  - High productivity

Our lessons
- Lessons
  - Significant productivity improvement
  - Less code for more features
  - Fewer bugs, less testing and debugging
  - Better code quality
  - Increased reusability of components
- Consequences
  - Smaller development team
  - Increased project agility
  - Reduced project costs
  - Faster time to market
- Today F# is our preferred development language

What are GPUs?

Schematic GPU Architecture
- GPU is a coprocessor
- Integrates several streaming multiprocessors (SM)
- Complex memory hierarchy
- Execution of hundreds of threads in parallel
- Thread creation, scheduling and switching in hardware
- Minimal thread scheduling unit is a warp of 32 threads
- Zero overhead for warp scheduling
[Figure: schematic GPU architecture. Components shown: host controller, data bus, device memory, and multiprocessors 1 to n, each with processors 1 to m and their registers, an instruction unit, shared memory, a texture cache and a constant cache. Latency labels in the figure: 1 cycle, 1-2 cycles, 400-600 cycles.]
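The warp granularity mentioned on this slide has a practical consequence for choosing a block size, which the following minimal Python sketch illustrates. It only does the arithmetic; the function names `warps_per_block` and `idle_lanes` are hypothetical helpers introduced for this note and are not part of CUDA or Alea.cuBase.

```python
WARP_SIZE = 32  # threads per warp, as stated on the slide


def warps_per_block(threads_per_block: int) -> int:
    """Number of warps needed to hold one thread block (rounded up)."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE


def idle_lanes(threads_per_block: int) -> int:
    """Thread slots in the last warp that have no thread assigned."""
    return warps_per_block(threads_per_block) * WARP_SIZE - threads_per_block


for size in (32, 100, 256):
    print(size, warps_per_block(size), idle_lanes(size))
```

Because scheduling happens per warp, a block size that is a multiple of 32 leaves no slots unused, whereas a block of 100 threads occupies four warps with 28 empty slots.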

Programming GPUs
- CUDA programming model abstracts the GPU device
- A kernel is a function executing sequential code in parallel with a large number of threads
- Two-level thread hierarchy
  - Each thread obtains its thread id within its block and its block id within the grid
  - Three-dimensional block and grid layout
[Figure: thread hierarchy: thread, thread block, grid.]
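To make the two-level thread hierarchy concrete, here is a small CPU-only Python sketch that emulates a one-dimensional launch sequentially. It is an illustration under simplifying assumptions, not how a GPU executes code: the `launch` and `scale_kernel` functions are hypothetical names, and real threads run in parallel rather than in a loop.

```python
from typing import Callable, List

Kernel = Callable[[int, int, int, List[float]], None]


def launch(kernel: Kernel, grid_dim: int, block_dim: int, data: List[float]) -> None:
    """Emulate a 1-D launch: run the kernel once per (block, thread) pair."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, data)


def scale_kernel(block_idx: int, thread_idx: int, block_dim: int, data: List[float]) -> None:
    """Each emulated thread doubles the single element it is responsible for."""
    i = block_idx * block_dim + thread_idx  # global index from block id and thread id
    if i < len(data):  # guard: the last block may have more threads than elements
        data[i] *= 2.0


values = [float(k) for k in range(10)]
block_dim = 4
grid_dim = (len(values) + block_dim - 1) // block_dim  # enough blocks to cover all elements
launch(scale_kernel, grid_dim, block_dim, values)
print(values)
```

The kernel body is ordinary sequential code; the parallelism comes entirely from running it for every combination of block id and thread id.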

What is Alea.cuBase?

Alea.cuBase
- Alea.cuBase extends F# to a first-class CUDA language
- Based on LLVM and CUDA 5 technology
- Noninvasive single-language solution for host and GPU programming
- No language additions required, in particular no <<< >>>
- Extensible
  - Basis for creating higher-level GPU-aware DSLs
- Benefits: dynamic code generation, GPU algorithm scripting, industry-grade performance, rapid development, solid framework for reusability, advanced CUDA programming

Benefits: Dynamic code generation
- Generate GPU code programmatically at run time
- Usable from any other .NET language
- Use generics and F# code quotation splicing for flexible kernels
- Foundation for developing GPU-aware domain-specific languages
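The idea of generating GPU code at run time can be illustrated with a deliberately simple stand-in. The Python sketch below builds CUDA C source text from parameters using plain string templating. This is not the mechanism Alea.cuBase uses (the slides describe F# code quotations compiled through LLVM); the function `make_elementwise_kernel` and the kernel names are hypothetical and serve only to show one kernel template being specialized for several element types.

```python
def make_elementwise_kernel(name: str, ctype: str, expr: str) -> str:
    """Return CUDA C source for an element-wise kernel.

    `expr` is a C expression in terms of the input element `x`.
    """
    return (
        f'extern "C" __global__ void {name}(const {ctype}* in, {ctype}* out, int n)\n'
        "{\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < n) {\n"
        f"        {ctype} x = in[i];\n"
        f"        out[i] = {expr};\n"
        "    }\n"
        "}\n"
    )


# The same template specialized for two element types at run time.
square_f32 = make_elementwise_kernel("square_f32", "float", "x * x")
square_f64 = make_elementwise_kernel("square_f64", "double", "x * x")
```

The resulting strings would still have to be passed to a runtime compiler before they could run on a device; that step is left out here.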

Benefits: Rapid development
- Easy and quick setup of the development environment, no need to install the NVIDIA nvcc compiler tools
- Rapid prototyping in F# Interactive
- Iteratively improve CUDA kernel algorithms without time-consuming build cycles
- Simple deployment

Benefits: GPU algorithm scripting
- Execute F# scripts with GPU algorithms on the command line or in F# Interactive
- GPU scripting in Excel
- Integrate Alea.cuBase directly with Python

Benefits: Solid framework for reusability
- Framework for type-safe definition of GPU resources
- CUDA monad to specify GPU resources together with launch logic in a unified manner
- Reuse GPU kernel code and compose it into modular GPU kernel libraries
[Figure: cuda { kernel_a launch logic } and cuda { kernel_b launch logic }]

Benefits: Industry-grade performance
- Generates performance-optimized code that is on par with compiled CUDA C/C++ code
- Low-level device functions and special math functions
- Built-in occupancy calculator to identify the optimal thread block layout
[Figure: Segmented Scan by Key, Alea.cuExtension against CUDA Thrust. Vertical axis from 0.00% to 800.00%; series int32, float32, float64; input sizes 2097152, 8388608, 16777216, 33554432.]

Benefits: Advanced CUDA programming
- Support for texture, constant and shared memory
- Pointer operations to partition array data
- Special pointer types such as volatile pointers
- Runtime compilation control, e.g. fast math
- Multiple streams
- Thread-safe use of multiple GPUs
- Inline PTX assembly instructions

Ecosystem
- CUDA C ecosystem: User applications, Thrust / CUDPP, CUDA Runtime API, CUDA Driver API
- Alea ecosystem: User applications, Alea.cuExtension, Alea.cuBase, CUDA Driver API

How does Alea.cuBase work?

Development Process
- Four steps to a CUDA kernel with F# and Alea.cuBase
[Figure: compilation process, with steps numbered 1 to 4. Labels: cuda { constant, array, texture, kernel, ... launch logic }, PTemplate, NVVM IR module, PTX module, CUDA cubin module, CUDA module, CUDA context, CUDA device, device worker, launch function, PModule.]

How easily can I use Alea.cuBase?

Live Coding
- We can run code
  - As an executable
  - In Visual Studio F# Interactive
  - As a script on the command line
  - Smoothly integrated with Python, Matlab, ...
  - Can use the new IDE Tsunami, integrated into Excel, to do GPU kernel scripting within Excel
- Consider the following GPU examples
  - Simple kernel
  - Simplistic Monte Carlo simulation to calculate Pi
  - PDE solver for the 2D heat equation on the GPU
  - Excel GPU scripting with Alea.cuBase and Alea.cuExtension, in Tsunami IDE and FCell
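The live demos themselves are not part of this transcript. As a rough idea of the numerical content of two of the listed examples, here is a CPU-only Python sketch of a Monte Carlo estimate of Pi and of one explicit finite-difference step for the 2D heat equation. The function names `estimate_pi` and `heat_step`, the fixed-boundary treatment and the choice of an explicit scheme are assumptions made for this note; the original demos ran as F# GPU kernels and may have used different schemes.

```python
import random
from typing import List


def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Estimate Pi from the fraction of random points in the unit square
    that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


def heat_step(u: List[List[float]], r: float) -> List[List[float]]:
    """One explicit time step of the 2D heat equation on a uniform grid.

    r = alpha * dt / dx**2; the explicit scheme is stable for r <= 0.25.
    Boundary values are kept fixed.
    """
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]  # copy, so updates read only old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
            new[i][j] = u[i][j] + r * (neighbours - 4.0 * u[i][j])
    return new


pi_estimate = estimate_pi(200_000)

# A single hot cell in the middle of a cold 5x5 plate.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
grid = heat_step(grid, r=0.1)
```

Both computations map naturally onto a GPU: every Monte Carlo sample and every interior grid cell can be handled by its own thread, since each update reads only values from the previous step.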

Conclusion
- F# combined with modern programming technologies
  - Allows seamless integration of modern hardware acceleration technologies such as GPUs and CUDA
  - Significantly improves reusability and composition of components
  - Enhances correctness and robustness
  - Solves problems with less code
  - Requires fewer developers
  - Shortens time to market
  - Reduces development and maintenance costs
- If properly used, virtually no performance difference compared to Fortran and C/C++

More Resources and Free Licenses
- More resources
  - https://www.quantalea.net/products/resources/
  - https://github.com/quantalea
- How to set up
  - Fermi device or higher
  - Windows with .NET 4 and F# 2.0
  - CUDA 5 driver
  - Install Alea.cuBase
  - No need for the CUDA toolkit or NVCC compiler
- Apply for PERSONAL licenses
  - https://www.quantalea.net/news/22/

Thank you