RIPL. An Image Processing DSL. Rob Stewart & Deepayan Bhowmik. 1st May, 2014. Heriot Watt University

1 RIPL An Image Processing DSL Rob Stewart & Deepayan Bhowmik Heriot Watt University 1st May, 2014

2 Rathlin = Image processing language + FPGA

3 Motivation

4 Application scenario

5 FPGAs good fit for remote image processing
  reconfigurable
  energy efficient
FPGA constraints
  Memory
  Task scheduling
Language design tradeoffs
Solution: Small DSL closely coupled to FPGA instruction set

6 Requirements

7 Design

8 High level imperative language Language choices image algebra Existing languages/libraries FPGA instruction set abstraction Platform independent reference interpreter GPUs & CPUs

9 RIPL Language Features
Functions and procedures
  let rgb img a = foo(..) {.. } ;
  action(..) {.. } ;
Assignment
  let rgb image a =.. ;
  var grey image b;
  b :=.. ;
Iteration & conditional branching
  for i in 0.. n {.. } ;
  if (.. ) {.. } else {.. } ;
Image algebra implementation
  b := (a (+) s)^2 ;
  c := max( sum(b), d) ;
Overloading
  let rgb image a =.. ;
  let rgb img b =.. ;
  let rgb image c = a - b ;
  let int x = 3 ;
  let int y = 4 ;
  let int z = x - y ;

10

13 Constructing Images in RIPL
image : pointset → valueset
/* RGB img */
let rgb image a = [1:3,1:2] {(1,56,35),(94,22,42),(155,134,99),
                             (56,7,21),(245,1,32),(42,211,111)};
/* Same RGB img to image algebra notation b ∈ F^Y */
let ptset Y = [1:3,1:2] ;
let valset F = {(1,56,35),(94,22,42),(155,134,99),
                (56,7,21),(245,1,32),(42,211,111)};
let rgb image b = F^Y ;
/* mutable variables */
var grey image c;
c := [1:2,1:3] {221,244,230,165,102,124};
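The idea that an image is a function from a point set to a value set can be sketched outside RIPL. The Python below is only an illustration, not RIPL's implementation: `rect_pointset` and `make_image` are hypothetical helper names, and the order in which values are paired with points (first coordinate varying slowest) is an assumption, since the slide does not state it.

```python
from itertools import product

def rect_pointset(xs, ys):
    """Points of the rectangle [x_lo:x_hi, y_lo:y_hi], both ends inclusive."""
    (x_lo, x_hi), (y_lo, y_hi) = xs, ys
    return list(product(range(x_lo, x_hi + 1), range(y_lo, y_hi + 1)))

def make_image(pointset, values):
    """Pair each point with one value: an image is a map pointset -> valueset."""
    if len(pointset) != len(values):
        raise ValueError("need exactly one value per point")
    return dict(zip(pointset, values))

# Mirrors `let ptset Y = [1:3,1:2]` and `let valset F = {...}` from the slide.
Y = rect_pointset((1, 3), (1, 2))
F = [(1, 56, 35), (94, 22, 42), (155, 134, 99),
     (56, 7, 21), (245, 1, 32), (42, 211, 111)]

# Mirrors `let rgb image b = F^Y` (pairing order is an assumption).
b = make_image(Y, F)
```

Looking up `b[(1, 1)]` then plays the role of evaluating the image at a point.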

14 Overloaded Operations
/* add two integers */
let int i = 3;
let int j = 4;
print(i+j);
/* add two value sets */
let valset v1 = {3,4,5};
let valset v2 = {1,2,6};
print(v1+v2);
/* add two point sets */
let ptset pt1 = {(1,2),(4,3)};
let ptset pt2 = {(3,1),(5,2)};
print(pt1+pt2);
/* add a dog and a cat */
var rgb image dog, cat, friends;
cat = readfile("cat.bmp");
dog = readfile("dog.bmp");
friends = cat + dog;
writefile(friends,"out.bmp");
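Overloading `+` so that it works on images as well as integers can be illustrated with a small Python class. This is a sketch under assumptions: the `Image` class is hypothetical, it models a grey image as a dict from points to values, and it adds pixel values pointwise without any clamping, which may differ from what RIPL does.

```python
class Image:
    """Hypothetical image type: a dict from (x, y) points to grey values."""

    def __init__(self, pixels):
        self.pixels = dict(pixels)

    def __add__(self, other):
        # Only image + image is defined here; other operand types are rejected.
        if not isinstance(other, Image):
            return NotImplemented
        if self.pixels.keys() != other.pixels.keys():
            raise ValueError("images must share a point set")
        # Pointwise addition over the common point set (no clamping).
        return Image({p: v + other.pixels[p] for p, v in self.pixels.items()})

# Tiny stand-ins for the images read from cat.bmp and dog.bmp on the slide.
cat = Image({(1, 1): 10, (1, 2): 20})
dog = Image({(1, 1): 1, (1, 2): 2})
friends = cat + dog   # same `+` symbol as integer addition, different meaning
```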

15

16 Thresholding
Segment into regions of interest
Semi thresholding: pixels within threshold are retained
var grey image a;
var grey image b;
a := readfile("pumpkin.jpg") ;
b := X[ ](a) ;
writefile(b,"segmented.jpg") ;

17 Edge detection
Contour with abrupt brightness change
Important for segmentation & scene analysis
Convolve two kernels over original image to calculate approximations of the X & Y derivatives
$G = \sqrt{s^2 + t^2}$

18
let grey image a = readfile("pumpkin.bmp") ;
let int template s = [3,3] {-1, 0, 1,
                            -2, 0, 2,
                            -1, 0, 1 } ;
let int template t = [3,3] {-1,-2,-1,
                             0, 0, 0,
                             1, 2, 1 } ;
let grey image newimg = (((a (+) s)^2) + (((a (+) t)^2)))^(1/2) ;
writefile(newimg,"out.bmp");

19 RIPL
let rgb image a = readfile("images/bike.bmp");
/* Sobel template definitions */
let grey image newimg = (((a(+)s)^2) + (((a(+)t)^2)))^(1/2);
writefile(newimg,"pumpkin-edges.bmp");
OpenCV
Mat src, dst, grad_x, grad_y, abs_grad_x, abs_grad_y;
src = imread("pumpkin.bmp");
Sobel(src, grad_x, ddepth, 1, 0, 3, 1, 0, BORDER_DEFAULT);
convertScaleAbs(grad_x, abs_grad_x);
Sobel(src, grad_y, ddepth, 0, 1, 3, 1, 0, BORDER_DEFAULT);
convertScaleAbs(grad_y, abs_grad_y);
addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0, dst);
imwrite("pumpkin-edges.bmp", dst);

20 Tackling the pyramid with image algebra
  Image enhancement
  Edge detection
  Thresholding
  Connected components
  Morphological transformations
  Shape detection
  Image features

21 Image Algebra
Operations to express all image-to-image transformations
Small number of concise & simple operations
  25 point operations
  15 point set operations
  9 value set operations
  30 image operations
  37 template operations
  4 neighbourhood operations
Amenable to optimisation techniques that are machine:
  independent: formal mathematical systems
  dependent: FPGAs, CPUs & GPUs

22 Image Algebra
Value sets: Numeric data for points, of types $\mathbb{Z}$, $\mathbb{R}$, or $\mathbb{Z}_{2^k}$
Point sets: Spatial relationship between points
Image pixels: Tuple of point & value function $(x, a(x))$
Image: Function from points to values
  $\mathbb{F}$-valued image on $X$ is $a : X \to \mathbb{F}$, or $a \in \mathbb{F}^X$.
  Rectangular point set $X = \mathbb{Z}_m^+ \times \mathbb{Z}_n^+$ where
  $\mathbb{Z}_m^+ \times \mathbb{Z}_n^+ = \{(x_1, x_2) \in \mathbb{Z}^2 : 1 \le x_1 \le m,\ 1 \le x_2 \le n\}$

23 Thresholding
For source image $a \in \mathbb{R}^X$ and threshold range $[h, k]$, semithreshold image $b \in \mathbb{R}^X$ is given by:
$$b(x) = \begin{cases} a(x) & \text{if } h \le a(x) \le k \\ 0 & \text{otherwise} \end{cases}$$
Semithresholded image $b \in \mathbb{R}^X$ over $[100, 255]$ is $b := a \cdot \chi_{[100,255]}(a)$
var grey image a;
var grey image b;
a := readfile("pumpkin.jpg") ;
b := X[ ](a) ;
writefile(b,"segmented.jpg") ;
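The semithreshold definition above translates directly into a few lines of Python. This is a minimal sketch rather than RIPL's own code: the function name `semithreshold` and the dict-based image representation are assumptions made for illustration.

```python
def semithreshold(a, h, k):
    """Return b with b(x) = a(x) if h <= a(x) <= k, else 0.

    This equals a * chi_[h,k](a), where the characteristic function
    chi_[h,k] maps values inside [h, k] to 1 and all others to 0.
    """
    return {x: (v if h <= v <= k else 0) for x, v in a.items()}

# A four-pixel grey image, keyed by (x1, x2) points.
a = {(1, 1): 221, (1, 2): 44, (2, 1): 100, (2, 2): 99}

# Keep only pixels in [100, 255], as in the slide's example range.
b = semithreshold(a, 100, 255)
```

Pixels at 221 and 100 survive unchanged, while 44 and 99 fall below the range and become 0.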

24 Edge detection
Edge enhanced image $b \in \mathbb{R}^Y$ is
$$b := \left[ (a \oplus s)^2 + (a \oplus t)^2 \right]^{1/2}$$
$$s_{(i,j)}(x) = \begin{cases} -2 & x = (i-1, j) \\ -1 & x = (i-1, j-1),\ (i-1, j+1) \\ 1 & x = (i+1, j-1),\ (i+1, j+1) \\ 2 & x = (i+1, j) \\ 0 & \text{otherwise} \end{cases}$$
$$t_{(i,j)}(x) = \begin{cases} 2 & x = (i, j+1) \\ 1 & x = (i-1, j+1),\ (i+1, j+1) \\ -1 & x = (i-1, j-1),\ (i+1, j-1) \\ -2 & x = (i, j-1) \\ 0 & \text{otherwise} \end{cases}$$

25
let grey image a = readfile("pumpkin.bmp") ;
let int template s = [3,3] {-1, 0, 1,
                            -2, 0, 2,
                            -1, 0, 1 } ;
let int template t = [3,3] {-1,-2,-1,
                             0, 0, 0,
                             1, 2, 1 } ;
let grey image newimg = (((a (+) s)^2) + (((a (+) t)^2)))^(1/2) ;
$\left[ (a \oplus s)^2 + (a \oplus t)^2 \right]^{1/2}$
writefile(newimg,"pumpkin-edges.bmp");
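For readers without a RIPL interpreter, the same computation can be approximated in plain Python. This is a sketch under stated assumptions: images are lists of rows, the `[3,3] {...}` template values are read in row-major order, and border pixels are simply left at 0; `apply_template` and `sobel_magnitude` are hypothetical names, and RIPL's actual border handling is not described in the slides.

```python
import math

# Templates s and t as written on the slide, read row by row (an assumption).
S = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
T = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply_template(img, tpl):
    """Weighted sum of each pixel's 3x3 neighbourhood, i.e. (a (+) tpl).

    Only interior pixels are computed; border pixels stay 0 (an assumption).
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(tpl[dr + 1][dc + 1] * img[r + dr][c + dc]
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    return out

def sobel_magnitude(img):
    """Pointwise ((a (+) s)^2 + (a (+) t)^2)^(1/2)."""
    gs, gt = apply_template(img, S), apply_template(img, T)
    return [[math.sqrt(x * x + y * y) for x, y in zip(row_s, row_t)]
            for row_s, row_t in zip(gs, gt)]

# A vertical step edge: dark left half, bright right half.
img = [[0, 0, 10, 10]] * 4
edges = sobel_magnitude(img)
```

On this step edge the `S` response at the interior pixels is 40 and the `T` response is 0, so the magnitude there is 40.0.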

26 Tool Support

27 Syntax highlighting & code completion

28 Rendering RIPL programs as image algebra Video demonstration

29 Implementation

30 RIPL syntax described in labelled BNF notation
Prog.   Program ::= [Decl] Body ;
CmdIf.  Command ::= SelectionStm ;
EENorm. Exp ::= "[[" Exp "]]2" ;
BNot.   Exp ::= " " Exp ;
ESumIA. Exp ::= "\\sum" Exp ;
...
Compiled to lexer, parser & AST
RIPL Interpreter traverses user program using AST
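How an interpreter traverses an AST can be sketched in a few lines of Python. Everything here is hypothetical: the node classes `Lit`, `Var` and `Call`, the `BUILTINS` table and `evaluate` are illustrative names that are not taken from the RIPL grammar or its generated AST, and only two built-ins are modelled.

```python
from dataclasses import dataclass

@dataclass
class Lit:          # a literal value, e.g. a number or a list of pixel values
    value: object

@dataclass
class Var:          # a reference to a named variable
    name: str

@dataclass
class Call:         # a built-in applied to argument expressions
    func: str
    args: list

# Hypothetical built-ins; a real interpreter would cover many more operations.
BUILTINS = {"sum": lambda xs: sum(xs), "max": lambda x, y: max(x, y)}

def evaluate(node, env):
    """Recursively evaluate an expression tree against a variable environment."""
    if isinstance(node, Lit):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Call):
        return BUILTINS[node.func](*(evaluate(arg, env) for arg in node.args))
    raise TypeError(f"unknown node: {node!r}")

# Tree for the expression `max( sum(b), d)` from the language features slide.
ast = Call("max", [Call("sum", [Var("b")]), Var("d")])
c = evaluate(ast, {"b": [1, 2, 3], "d": 5})
```

Each grammar rule label would correspond to one node type, and the interpreter dispatches on that type as it walks the tree.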

31 BNF Converter ELisp backend

32 Symbolic RIPL Markup
Operation        RIPL          RIPL-IA       IA symbol
Negation         -x            -x            -x
Ceiling          ceil(x)       \ceil*{x}     ⌈x⌉
Floor            floor(x)      \floor*{x}    ⌊x⌋
Rounding         [x]           [x]           [x]
Projection       p(i,x)        p_i(x)        p_i(x)
Sum              sum(x)        \sum x        Σ x
Product          product(x)    \Pi x         Π x
Maximum          max(x)        \vee x        ∨ x
Minimum          min(x)        \wedge x      ∧ x
Euclidean norm   [[x]]2        \|x\|_2       ‖x‖_2
Characteristic   X_x(z)        \chi_x(z)     χ_X(z)

33 Symbolic RIPL Markup

34 RIPL Interpreter

35 Evaluation

36 Parallel image processing
Current status:
  Profiling RIPL on Heriot-Watt Beowulf cluster
    CPU: 2 GHz Intel Xeon, 12 GB memory
    GPU: 1.6 GHz GeForce GT 610, 1 GB memory
  Feeding repa & accelerate benchmarks to community
Goal: to match OpenCV performance
(Optimising data parallel code is hard)

37 Future Work
  Full coverage of image algebra operations
  Implement algorithm libraries in RIPL
  RIPL dataflow compiler
  Hardware support for low-level image algebra operations

38 References
Handbook of Computer Vision Algorithms in Image Algebra, 2nd Ed., G. Ritter & J. Wilson.
Supporting image algebra in the Matlab programming language for compression research, M. Schmalz et al., SPIE.
Efficient parallel stencil convolution in Haskell, B. Lippmeier & G. Keller, Haskell.
Accelerating Haskell array codes with multicore GPUs, M. Chakravarty et al., DAMP, 2011.

39 Thanks


Practical Generic Programming with OCaml

Practical Generic Programming with OCaml Practical Generic Programming with OCaml Jeremy Yallop LFCS, University of Edinburgh ML Workshop 2007 Instead of this... type α tree = Node of α Branch of (α tree) (α tree) val show_tree : (α string) (α

More information

Intelligent Heuristic Construction with Active Learning

Intelligent Heuristic Construction with Active Learning Intelligent Heuristic Construction with Active Learning William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather E H U N I V E R S I T Y T O H F G R E D I N B U Space is BIG! Hubble Ultra-Deep Field

More information

CUDA in the Cloud Enabling HPC Workloads in OpenStack With special thanks to Andrew Younge (Indiana Univ.) and Massimo Bernaschi (IAC-CNR)

CUDA in the Cloud Enabling HPC Workloads in OpenStack With special thanks to Andrew Younge (Indiana Univ.) and Massimo Bernaschi (IAC-CNR) CUDA in the Cloud Enabling HPC Workloads in OpenStack John Paul Walters Computer Scien5st, USC Informa5on Sciences Ins5tute [email protected] With special thanks to Andrew Younge (Indiana Univ.) and Massimo

More information

Low power GPUs a view from the industry. Edvard Sørgård

Low power GPUs a view from the industry. Edvard Sørgård Low power GPUs a view from the industry Edvard Sørgård 1 ARM in Trondheim Graphics technology design centre From 2006 acquisition of Falanx Microsystems AS Origin of the ARM Mali GPUs Main activities today

More information

Parallel Computing with Mathematica UVACSE Short Course

Parallel Computing with Mathematica UVACSE Short Course UVACSE Short Course E Hall 1 1 University of Virginia Alliance for Computational Science and Engineering [email protected] October 8, 2014 (UVACSE) October 8, 2014 1 / 46 Outline 1 NX Client for Remote

More information

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview

More information

Interactive Level-Set Deformation On the GPU

Interactive Level-Set Deformation On the GPU Interactive Level-Set Deformation On the GPU Institute for Data Analysis and Visualization University of California, Davis Problem Statement Goal Interactive system for deformable surface manipulation

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Big Data Visualization on the MIC

Big Data Visualization on the MIC Big Data Visualization on the MIC Tim Dykes School of Creative Technologies University of Portsmouth [email protected] Many-Core Seminar Series 26/02/14 Splotch Team Tim Dykes, University of Portsmouth

More information

Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter

Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter Daniel Weingaertner Informatics Department Federal University of Paraná - Brazil Hochschule Regensburg 02.05.2011 Daniel

More information

How To Build An Ark Processor With An Nvidia Gpu And An African Processor

How To Build An Ark Processor With An Nvidia Gpu And An African Processor Project Denver Processor to Usher in a New Era of Computing Bill Dally January 5, 2011 http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/ Project Denver Announced

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

CS231M Project Report - Automated Real-Time Face Tracking and Blending

CS231M Project Report - Automated Real-Time Face Tracking and Blending CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, [email protected] June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android

More information

Cisco Enhanced Device Interface 2.2

Cisco Enhanced Device Interface 2.2 Cisco Enhanced Device Interface 2.2 Product Features Q. What is Cisco Enhanced Device Interface (EDI)? A. Cisco EDI is an external implementation and extension of the Cisco network element interface designed

More information

Computational Mathematics with Python

Computational Mathematics with Python Boolean Arrays Classes Computational Mathematics with Python Basics Olivier Verdier and Claus Führer 2009-03-24 Olivier Verdier and Claus Führer Computational Mathematics with Python 2009-03-24 1 / 40

More information

WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math

WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Textbook Correlation WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Following Directions Unit FIRST QUARTER AND SECOND QUARTER Logic Unit

More information

Computer Vision Technology. Dave Bolme and Steve O Hara

Computer Vision Technology. Dave Bolme and Steve O Hara Computer Vision Technology Dave Bolme and Steve O Hara Today we ll discuss... The OpenCV Computer Vision Library Python scripting for Computer Vision Python OpenCV bindings SciPy / Matlab-like Python capabilities

More information

Chapter One Introduction to Programming

Chapter One Introduction to Programming Chapter One Introduction to Programming 1-1 Algorithm and Flowchart Algorithm is a step-by-step procedure for calculation. More precisely, algorithm is an effective method expressed as a finite list of

More information

Clustering Billions of Data Points Using GPUs

Clustering Billions of Data Points Using GPUs Clustering Billions of Data Points Using GPUs Ren Wu [email protected] Bin Zhang [email protected] Meichun Hsu [email protected] ABSTRACT In this paper, we report our research on using GPUs to accelerate

More information

Data Center and Cloud Computing Market Landscape and Challenges

Data Center and Cloud Computing Market Landscape and Challenges Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information