RIPL. An Image Processing DSL. Rob Stewart & Deepayan Bhowmik. 1st May, Heriot Watt University
1 RIPL: An Image Processing DSL. Rob Stewart & Deepayan Bhowmik, Heriot-Watt University. 1st May, 2014
2 Rathlin = Image processing language + FPGA
3 Motivation
4 Application scenario
5 FPGAs are a good fit for remote image processing: reconfigurable, energy efficient.
FPGA constraints: memory, task scheduling.
Language design tradeoffs.
Solution: small DSL closely coupled to the FPGA instruction set.
6 Requirements
7 Design
8 High level imperative language.
Language choices: image algebra; existing languages/libraries.
FPGA instruction set abstraction.
Platform independent reference interpreter: GPUs & CPUs.
9 RIPL Language Features
Functions and procedures
  let rgb img a = foo(..) {.. } ;
  action(..) {.. } ;
Assignment
  let rgb image a =.. ;
  var grey image b;
  b :=.. ;
Iteration & conditional branching
  for i in 0.. n {.. } ;
  if (.. ) {.. } else {.. } ;
Image algebra implementation
  b := (a (+) s)^2 ;
  c := max( sum(b), d) ;
Overloading
  let rgb image a =.. ;
  let rgb img b =.. ;
  let rgb image c = a - b ;
  let int x = 3 ;
  let int y = 4 ;
  let int z = x - y ;
10
13 Constructing Images in RIPL
image : pointset → valueset
  /* RGB img */
  let rgb image a = [1:3,1:2] {(1,56,35),(94,22,42),(155,134,99),
                               (56,7,21),(245,1,32),(42,211,111)};
  /* Same RGB img to image algebra notation b ∈ F^Y */
  let ptset Y = [1:3,1:2] ;
  let valset F = {(1,56,35),(94,22,42),(155,134,99),
                  (56,7,21),(245,1,32),(42,211,111)};
  let rgb image b = F^Y ;
  /* mutable variables */
  var grey image c;
  c := [1:2,1:3] {221,244,230,165,102,124};
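The idea that an image is a function from a point set to a value set can be sketched in Python. This is an illustration only, not RIPL's implementation: the names `point_set` and `make_image` are hypothetical, and the sketch assumes that `[1:3,1:2]` enumerates 1-based coordinates with the first index varying slowest, which the slides do not state.

```python
from itertools import product


def point_set(n1, n2):
    """Rectangular point set [1:n1, 1:n2] with 1-based coordinates.

    Assumption: points are enumerated with the first index varying slowest.
    """
    return [(i, j) for i, j in product(range(1, n1 + 1), range(1, n2 + 1))]


def make_image(points, values):
    """Pair each point with a value, giving an image as a point -> value mapping."""
    if len(points) != len(values):
        raise ValueError("point set and value list must have the same size")
    return dict(zip(points, values))


# The RGB example from the slide: Y is the point set, F the list of values.
Y = point_set(3, 2)
F = [(1, 56, 35), (94, 22, 42), (155, 134, 99),
     (56, 7, 21), (245, 1, 32), (42, 211, 111)]
b = make_image(Y, F)
```

Looking up `b[(1, 1)]` then plays the role of evaluating the image at a point.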
14 Overloaded Operations
  /* add two integers */
  let int i = 3;
  let int j = 4;
  print(i+j);
  /* add two value sets */
  let valset v1 = {3,4,5};
  let valset v2 = {1,2,6};
  print(v1+v2);
  /* add two point sets */
  let ptset pt1 = {(1,2),(4,3)};
  let ptset pt2 = {(3,1),(5,2)};
  print(pt1+pt2);
  /* add a dog and a cat */
  var rgb image dog, cat, friends;
  cat := readfile("cat.bmp");
  dog := readfile("dog.bmp");
  friends := cat + dog;
  writefile(friends,"out.bmp");
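Overloading `+` for images can be sketched with a small Python class. This is a hypothetical illustration, not RIPL's implementation: it assumes image addition is pointwise over a shared point set, as in image algebra, and it does not model how RIPL adds value sets or point sets, which the slides do not specify.

```python
class Image:
    """A grey image stored as a mapping from points to numeric values."""

    def __init__(self, pixels):
        self.pixels = dict(pixels)

    def __add__(self, other):
        # Image + image: pointwise addition over the same point set.
        if isinstance(other, Image):
            if self.pixels.keys() != other.pixels.keys():
                raise ValueError("images must share the same point set")
            return Image({p: v + other.pixels[p] for p, v in self.pixels.items()})
        # Image + scalar: add the scalar to every pixel value.
        if isinstance(other, (int, float)):
            return Image({p: v + other for p, v in self.pixels.items()})
        return NotImplemented


cat = Image({(1, 1): 10, (1, 2): 20})
dog = Image({(1, 1): 1, (1, 2): 2})
friends = cat + dog
```

The same `+` symbol dispatches on the operand types, which mirrors the way the slide reuses `+` for integers, sets and images.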
15
16 Thresholding
Segment into regions of interest.
Semi thresholding: pixels within threshold are retained.
  var grey image a;
  var grey image b;
  a := readfile("pumpkin.jpg") ;
  b := X[ ](a) ;
  writefile(b,"segmented.jpg") ;
17 Edge detection
Contour with abrupt brightness change.
Important for segmentation & scene analysis.
Convolve two kernels over the original image to calculate approximations of the X & Y derivatives: $G = \sqrt{s^2 + t^2}$
18
  let grey image a = readfile("pumpkin.bmp") ;
  let int template s = [3,3] {-1, 0, 1, -2, 0, 2, -1, 0, 1 } ;
  let int template t = [3,3] {-1,-2,-1, 0, 0, 0, 1, 2, 1 } ;
  let grey image newimg = (((a (+) s)^2) + (((a (+) t)^2)))^(1/2) ;
  writefile(newimg,"out.bmp");
19 RIPL
  let rgb image a = readfile("images/bike.bmp");
  /* Sobel template definitions */
  let grey image newimg = (((a(+)s)^2) + (((a(+)t)^2)))^(1/2);
  writefile(newimg,"pumpkin-edges.bmp");
OpenCV
  Mat src, dst, grad_x, grad_y, abs_grad_x, abs_grad_y;
  src = imread("pumpkin.bmp");
  Sobel(src, grad_x, ddepth, 1, 0, 3, 1, 0, BORDER_DEFAULT);
  convertScaleAbs(grad_x, abs_grad_x);
  Sobel(src, grad_y, ddepth, 0, 1, 3, 1, 0, BORDER_DEFAULT);
  convertScaleAbs(grad_y, abs_grad_y);
  addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0, dst);
  imwrite("pumpkin-edges.bmp", dst);
20 Tackling the pyramid with image algebra
- Image enhancement
- Edge detection
- Thresholding
- Connected components
- Morphological transformations
- Shape detection
- Image features
21 Image Algebra
Operations to express all image-to-image transformations.
Small number of concise & simple operations:
- 25 point operations
- 15 point set operations
- 9 value set operations
- 30 image operations
- 37 template operations
- 4 neighbourhood operations
Amenable to optimisation techniques that are:
- machine independent: formal mathematical systems
- machine dependent: FPGAs, CPUs & GPUs
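22 Image Algebra
- Value sets: numeric data for points, of types $\mathbb{Z}$, $\mathbb{R}$, or $\mathbb{Z}_{2^k}$
- Point sets: spatial relationship between points
- Image pixels: tuple of point & value function, $(x, a(x))$
- Image: function from points to values. An $\mathbb{F}$-valued image on $X$ is $a : X \to \mathbb{F}$, or $a \in \mathbb{F}^X$.
- Rectangular point set: $X = \mathbb{Z}_m^+ \times \mathbb{Z}_n^+$ where $\mathbb{Z}_m^+ \times \mathbb{Z}_n^+ = \{(x_1, x_2) \in \mathbb{Z}^2 : 1 \le x_1 \le m,\ 1 \le x_2 \le n\}$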
23 Thresholding
For source image $a \in \mathbb{R}^X$ and threshold range $[h, k]$, semithreshold image $b \in \mathbb{R}^X$ is given by:
\[ b(x) = \begin{cases} a(x) & \text{if } h \le a(x) \le k \\ 0 & \text{otherwise} \end{cases} \]
Semithresholded image $b \in \mathbb{R}^X$ over $[100, 255]$ is $b := a \cdot \chi_{[100,255]}(a)$
  var grey image a;
  var grey image b;
  a := readfile("pumpkin.jpg") ;
  b := X[ ](a) ;
  writefile(b,"segmented.jpg") ;
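The semithreshold definition above can be sketched in Python as the product of an image with its characteristic function. This is a minimal illustration under assumptions: the function names are hypothetical, and images are modelled as plain dictionaries from points to grey values rather than RIPL images.

```python
def characteristic(image, h, k):
    """chi_[h,k](a): 1 where h <= a(x) <= k, and 0 elsewhere."""
    return {x: 1 if h <= v <= k else 0 for x, v in image.items()}


def semithreshold(image, h, k):
    """b := a * chi_[h,k](a): keep in-range pixels, zero the rest."""
    chi = characteristic(image, h, k)
    return {x: image[x] * chi[x] for x in image}


# A 2x2 grey image thresholded over [100, 255], as in the slide.
a = {(1, 1): 50, (1, 2): 100, (2, 1): 200, (2, 2): 255}
b = semithreshold(a, 100, 255)
```

Only the pixel with value 50 falls outside the range, so it is the only one set to 0.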
24 Edge detection
Edge enhanced image $b \in \mathbb{R}^Y$ is $b := \left[(a \oplus s)^2 + (a \oplus t)^2\right]^{1/2}$, where
\[ s_{(i,j)}(x) = \begin{cases} -2 & x = (i-1, j) \\ -1 & x = (i-1, j-1),\ (i-1, j+1) \\ 1 & x = (i+1, j-1),\ (i+1, j+1) \\ 2 & x = (i+1, j) \\ 0 & \text{otherwise} \end{cases} \]
\[ t_{(i,j)}(x) = \begin{cases} 2 & x = (i, j+1) \\ 1 & x = (i-1, j+1),\ (i+1, j+1) \\ -1 & x = (i-1, j-1),\ (i+1, j-1) \\ -2 & x = (i, j-1) \\ 0 & \text{otherwise} \end{cases} \]
25
  let grey image a = readfile("pumpkin.bmp") ;
  let int template s = [3,3] {-1, 0, 1, -2, 0, 2, -1, 0, 1 } ;
  let int template t = [3,3] {-1,-2,-1, 0, 0, 0, 1, 2, 1 } ;
  let grey image newimg = (((a (+) s)^2) + (((a (+) t)^2)))^(1/2) ;
  writefile(newimg,"pumpkin-edges.bmp");
$\left[(a \oplus s)^2 + (a \oplus t)^2\right]^{1/2}$
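The Sobel expression can be sketched in plain Python to show what the two template applications and the square root compute. This is a hypothetical illustration, not the RIPL interpreter: it assumes the 3x3 templates are listed row by row, that `(+)` sums pixel values weighted by the template centred on each point, and that neighbours outside the image contribute nothing.

```python
import math

# Templates s and t from the slide, assumed to be listed row by row.
S = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
T = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_template(image, template):
    """Weighted sum of each pixel's 3x3 neighbourhood (outside pixels skipped)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += image[r][c] * template[di + 1][dj + 1]
            out[i][j] = acc
    return out


def sobel_magnitude(image):
    """[(a (+) s)^2 + (a (+) t)^2]^(1/2), computed pixel by pixel."""
    gs = apply_template(image, S)
    gt = apply_template(image, T)
    return [[math.sqrt(x * x + y * y) for x, y in zip(row_s, row_t)]
            for row_s, row_t in zip(gs, gt)]


# A 4x4 image with a vertical edge between columns 1 and 2.
edge = [[0, 0, 10, 10] for _ in range(4)]
magnitude = sobel_magnitude(edge)
```

Interior pixels next to the edge get a large response, while a constant image gives zero at interior points.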
26 Tool Support
27 Syntax highlighting & code completion
28 Rendering RIPL programs as image algebra Video demonstration
29 Implementation
30 RIPL syntax described in labelled BNF notation
  Prog.   Program ::= [Decl] Body ;
  CmdIf.  Command ::= SelectionStm ;
  EENorm. Exp     ::= "[[" Exp "]]2" ;
  BNot.   Exp     ::= " " Exp ;
  ESumIA. Exp     ::= "\\sum" Exp ;
  ...
Compiled to lexer, parser & AST.
RIPL interpreter traverses the user program using the AST.
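How an interpreter traverses an AST can be sketched with a few node types and a recursive evaluator. The slides do not show the interpreter itself, so everything here is hypothetical: the node names `ENum`, `EVals`, `ESum` and `EMax` only loosely echo the BNF labels and are not taken from the RIPL grammar.

```python
from dataclasses import dataclass


@dataclass
class ENum:
    """A numeric literal."""
    value: float


@dataclass
class EVals:
    """A literal list of values, standing in for a value set."""
    values: list


@dataclass
class ESum:
    """sum(e): add up all values produced by the argument."""
    arg: object


@dataclass
class EMax:
    """max(e1, e2): the larger of two numeric results."""
    left: object
    right: object


def evaluate(exp):
    """Walk the tree recursively, evaluating children before their parent."""
    if isinstance(exp, ENum):
        return exp.value
    if isinstance(exp, EVals):
        return list(exp.values)
    if isinstance(exp, ESum):
        return sum(evaluate(exp.arg))
    if isinstance(exp, EMax):
        return max(evaluate(exp.left), evaluate(exp.right))
    raise TypeError(f"unknown AST node: {exp!r}")


# Tree for an expression shaped like max(sum(b), d) with b = {3,4,5}, d = 10.
tree = EMax(ESum(EVals([3, 4, 5])), ENum(10))
result = evaluate(tree)
```

Each grammar rule label becomes one node type, so adding an operation means adding one class and one branch in `evaluate`.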
31 BNF Converter ELisp backend
32 Symbolic RIPL Markup
Operation        RIPL         RIPL-IA      IA symbol
Negation         -x           -x           $-x$
Ceiling          ceil(x)      \ceil*{x}    $\lceil x \rceil$
Floor            floor(x)     \floor*{x}   $\lfloor x \rfloor$
Rounding         [x]          [x]          $[x]$
Projection       p(i,x)       p i(x)       $p_i(x)$
Sum              sum(x)       \sum x       $\sum x$
Product          product(x)   \Pi x        $\Pi x$
Maximum          max(x)       \vee x       $\bigvee x$
Minimum          min(x)       \wedge x     $\bigwedge x$
Euclidean norm   [[x]]2       ‖x‖2         $\|x\|_2$
Characteristic   X x (z)      \chi x(z)    $\chi_X(z)$
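The mapping from RIPL operations to RIPL-IA markup can be sketched as a lookup table. This is an illustrative assumption about how such a renderer might work, not the tool shown in the talk: `RIPL_IA_MARKUP` and `to_ripl_ia` are hypothetical names, and only the rows of the table that are clearly legible are included.

```python
# Markup templates copied from the table; "<arg>" marks where the operand goes.
RIPL_IA_MARKUP = {
    "ceil": r"\ceil*{<arg>}",
    "floor": r"\floor*{<arg>}",
    "sum": r"\sum <arg>",
    "product": r"\Pi <arg>",
    "max": r"\vee <arg>",
    "min": r"\wedge <arg>",
}


def to_ripl_ia(operation, operand):
    """Render one RIPL operation applied to an operand as RIPL-IA markup."""
    if operation not in RIPL_IA_MARKUP:
        raise KeyError(f"no markup known for operation {operation!r}")
    return RIPL_IA_MARKUP[operation].replace("<arg>", operand)


# Nested call: the inner result becomes the operand of the outer operation.
nested = to_ripl_ia("max", to_ripl_ia("sum", "b"))
```

Because each call returns a string, nested expressions are rendered by feeding one result into the next call.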
33 Symbolic RIPL Markup
34 RIPL Interpreter
35 Evaluation
36 Parallel image processing
Current status:
- Profiling RIPL on the Heriot-Watt Beowulf cluster
  CPU: 2 GHz Intel Xeon, 12 GB memory
  GPU: 1.6 GHz GeForce GT 610, 1 GB memory
- Feeding repa & accelerate benchmarks to the community
Goal: to match OpenCV performance
(Optimising data parallel code is hard)
37 Future Work
- Full coverage of image algebra operations
- Implement algorithm libraries in RIPL
- RIPL dataflow compiler
- Hardware support for low-level image algebra operations
38 References
- Handbook of Computer Vision Algorithms in Image Algebra, 2nd Ed., G. Ritter & J. Wilson.
- Supporting image algebra in the Matlab programming language for compression research, M. Schmalz et al., SPIE.
- Efficient parallel stencil convolution in Haskell, B. Lippmeier & G. Keller, Haskell.
- Accelerating Haskell array codes with multicore GPUs, M. Chakravarty et al., DAMP, 2011.
39 Thanks
