U N C L A S S I F I E D
|
|
|
- Mary Townsend
- 10 years ago
- Views:
Transcription
1 CUDA and Java GPU Computing in a Cross Platform Application Scot Halverson [email protected] LA-UR Slide 1
2 What s the goal? Run GPU code alongside Java code Take advantage of high parallelization Utilize GPU technology with existing Java code Minimal modifications Cross platform Slide 2
3 How? Separate processes communicating directly or indirectly Shared memory, file I/O Take advantage of Java Native Interface directly Use an intermediate library JCUDA, JOCL, etc Provide software interfaces to device Use an additional processing/compiling tool Rootbeer, aparapi, etc Converts some Java code to device code Slide 3
4 Separate Processes Advantages Java code is still cross platform Disadvantages C/CUDA side must be compiled for each platform Interprocess communication can be awkward Slide 4
5 Java Native Interface Advantages No management of communication between processes All code can be contained in one project Disadvantages Cross platform capabilities of Java are much more complicated Might require JNI code to be written for each platform supported Requires embedded C/C++ & CUDA code Slide 5
6 Intermediate Library Advantages Maintains build once, run anywhere Java ideal No recompiling for different platforms Any platform that supports Java and CUDA should work Disadvantages Requires bundled CUDA code Requires bundled library Slide 6
7 Additional Processing or Compiling tool Advantages Write only Java/Java-like code Fewer low-level concerns Memory management, pointers, etc. Disadvantages Relatively new & experimental Existing support info may not apply Few developers with experience Long term availability Less control Kernels and interfacing code are written by tool Slide 7
8 BeSSy Motivation for CUDA and Java interoperability at LANL D-3 BeSSy is a sensor siting tool Used to find optimal/near optimal placement of a set of sensors Uses computationally expensive algorithms Simulated Annealing Genetic Algorithm (experimental) Written in Java We d like to improve on this with GPU computing A significant amount of developer time has been spent on existing code Rewriting everything is not an option Slide 8
9 Optimization Example Population Density High Medium Low Zero
10 Optimization Example same city, same weather, same scenario; only the number of collectors differs Slide 10
11 BeSSy Using Separate Processes Advantages Code can be distributed and used independently Disadvantages Multiple processes are cumbersome Still requires C/CUDA executable to be recompiled and possibly modified upon platform change Slide 11
12 BeSSy Java Native Interface Advantages Probably don t need separate methods per platform Code all in one project Disadvantages Still not platform independent Requires native code to be compiled for each supported platform Additional build steps Producing C/C++ headers Compiling C/C++ code Compiling CUDA code Slide 12
13 BeSSy Intermediate Library Advantages Platform independent As long as library is supported on platform No intermediate C/C++ code between Java and CUDA Disadvantages Requires library to be distributed Additional build step for CUDA code Slide 13
14 BeSSy Additional Processing or Compiling Tool Advantages No C/C++ or CUDA Write Java-like code to run on GPU Disadvantages Requires additional build step Requires use of tool(s) with which few people are familiar Relatively untested systems Longevity in question due to competing tools, openacc Depending on tool, might still need to recompile per platform Slide 14
15 BeSSy GPU Parallelization Chose Intermediate library as best option Specifically, JCUDA JCUDA is open source As long as Java and CUDA are supported on a platform, JCUDA will probably work JCUDA is mature and well maintained Frequent updates, supports CUDA 5.0 (latest at time of writing) Working project since at least 2009 an eternity in GPU Computing Slide 15
16 BeSSy GPU Parallelization No recompiling of our own code Java is platform independent, and CUDA can be compiled to PTX supporting wide range of CUDA GPUs No need to distribute BeSSy source code The only set of code that is platform dependent is the library Existing Java and CUDA developers are already familiar with the necessary tools Slide 16
17 BeSSy Traveling Salesman Analog BeSSy is a sensitive tool not to be utilized outside of official use cases Traveling Salesman Problem is an effective analog Both are NP-Hard problems Simulated Annealing and Genetic Algorithms are reasonable approaches for generating solutions Traveling Salesman Problem is well known in CS world TSP is not sensitive to Gov t work Source code can be shown and distributed Slide 17
18 Traveling Salesman Problem n number of cities Salesman must visit each city exactly once Goal is to produce an optimal or near optimal sequence of cities to path Optimal defined as shortest overall distance traveled For n cities, there are ~n! possible solutions Solver Slide 18
19 TSP Simulated Annealing 1 - Generate set of solutions 2 - Modify solutions based on some Temperature (T) Higher T produces greater change on average 3 - Evaluate solutions 4 - Replace bad solutions with copies of good solutions 5 Repeat starting at 2, until some number of iterations is reached Slide 19
20 TSP Genetic Algorithm 1 Generate set of parent solutions 2 Generate child solutions for each parent solution 3 Correct children solutions 4 Find best between parent and children and replace parent 5 Repeat starting at 2, until some number of unchanged iterations reached Slide 20
21 Traveling Salesman Problem Ideal for GPU parallelization Highly parallelizable Low memory requirements Low device è host and host ç device transfers Algorithm porting from CPU to GPU is nearly direct Substantial Performance benefits ~20x improvement for Genetic Algorithm ~80x improvement for Simulated Annealing Slide 21
22 TSP Initializing the GPU Nearly identical to C/C++ GPU initialization Create CUdevice object Create CUcontext object using Cudevice object Load PTX on device Create device function pointers Allocate memory on device Copy data to device Slide 22
23 TSP Initializing the GPU cuinit(0); //Initialize the driver device = new CUdevice(); //create a context for the first device cudeviceget(device, 0); context = new CUcontext(); cuctxcreate(context, 0, device); module = new CUmodule(); //Load the ptx file cumoduleload(module, ptxfilename); initparents = new CUfunction(); //initialize CUDA global function pointers cumodulegetfunction(initparents, module, "initparents"); d_x = new CUdeviceptr(); d_y = new CUdeviceptr(); //initialize device array pointers cumemalloc(d_x, citycount * Sizeof.FLOAT); //allocate device memory for arrays cumemalloc(d_y, citycount * Sizeof.FLOAT); cumemcpyhtod(d_x, Pointer.to(x), citycount * Sizeof.FLOAT); //copy city positions to device cumemcpyhtod(d_y, Pointer.to(y), citycount * Sizeof.FLOAT); Slide 23
24 TSP Example Kernel extern "C" global void reject(int *solutions, float *values, int citycount, int solutioncount, int *mi, float *mv){ int index = blockidx.x * blockdim.x + threadidx.x; if(index < solutioncount){ } float rejectionmultiplier = 1.15f; if(values[index] > rejectionmultiplier * mv[0]){ } values[index] = mv[0]; for(int a = 0; a < citycount; a++){ } solutions[index * citycount + a] = solutions[mi[0] * citycount + a]; } Slide 24
25 TSP Kernel Launch Setup Create Pointer to set of kernel arguments Copy any necessary data to device Call culaunchkernel Set grid and block size Set Shared memory utilization Set kernel, extra parameters Copy any necessary data to host Slide 25
26 TSP Kernel Launch Example //create set of parameters for call to reject on device kernelparameters = Pointer.to( Pointer.to(d_solutions), // Solution array Pointer.to(d_values), // Value array Pointer.to(new int[]{citycount}), // Number of cities Pointer.to(new int[]{solutioncount}), // Number of solutions Pointer.to(d_mi), // index of best solution Pointer.to(d_mv) // value of best solution ); // Call reject on device. blocksizex = 256; gridsizex = (int)math.ceil((double)solutioncount / blocksizex); culaunchkernel(reject, gridsizex, 1, 1, // Grid dimension blocksizex, 1, 1, // Block dimension 0, null, // Shared memory size and stream kernelparameters, null // Kernel- and extra parameters Slide 26
27 Combining Java and CUDA Notes Major programming concerns result from programming styles associated with two languages Garbage collection in Java Manual memory management in CUDA Pointers are uncommon and avoided in Java Pointers are critical in CUDA Typical CUDA parallelization issues Setting up kernels rather than just calling host functions Copying data to/from host Designing code to run efficiently in parallel Slide 27
28 Conclusion Number of options for programming CUDA & Java Best option for BeSSy is intermediate library CUDA programming with Java is very similar to CUDA programming for C/C++ Most Functions behave identically or near identically Java programming model is somewhat different than C/C++ & CUDA Some extra difficulty is introduced for programmers not familiar with either Java or CUDA Slide 28
29 Source Code Code soon to be available at: Slide 29
PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)
PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon
Rootbeer: Seamlessly using GPUs from Java
Rootbeer: Seamlessly using GPUs from Java Phil Pratt-Szeliga. Dr. Jim Fawcett. Dr. Roy Welch. Syracuse University. Rootbeer Overview and Motivation Rootbeer allows a developer to program a GPU in Java
Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
CUDA Debugging. GPGPU Workshop, August 2012. Sandra Wienke Center for Computing and Communication, RWTH Aachen University
CUDA Debugging GPGPU Workshop, August 2012 Sandra Wienke Center for Computing and Communication, RWTH Aachen University Nikolay Piskun, Chris Gottbrath Rogue Wave Software Rechen- und Kommunikationszentrum
OpenACC Basics Directive-based GPGPU Programming
OpenACC Basics Directive-based GPGPU Programming Sandra Wienke, M.Sc. [email protected] Center for Computing and Communication RWTH Aachen University Rechen- und Kommunikationszentrum (RZ) PPCES,
Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga
Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.
CUDA Basics. Murphy Stein New York University
CUDA Basics Murphy Stein New York University Overview Device Architecture CUDA Programming Model Matrix Transpose in CUDA Further Reading What is CUDA? CUDA stands for: Compute Unified Device Architecture
Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles [email protected] hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
Learn CUDA in an Afternoon: Hands-on Practical Exercises
Learn CUDA in an Afternoon: Hands-on Practical Exercises Alan Gray and James Perry, EPCC, The University of Edinburgh Introduction This document forms the hands-on practical component of the Learn CUDA
GPU Parallel Computing Architecture and CUDA Programming Model
GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel
OpenACC 2.0 and the PGI Accelerator Compilers
OpenACC 2.0 and the PGI Accelerator Compilers Michael Wolfe The Portland Group [email protected] This presentation discusses the additions made to the OpenACC API in Version 2.0. I will also present
Monitoring, Tracing, Debugging (Under Construction)
Monitoring, Tracing, Debugging (Under Construction) I was already tempted to drop this topic from my lecture on operating systems when I found Stephan Siemen's article "Top Speed" in Linux World 10/2003.
GPU Tools Sandra Wienke
Sandra Wienke Center for Computing and Communication, RWTH Aachen University MATSE HPC Battle 2012/13 Rechen- und Kommunikationszentrum (RZ) Agenda IDE Eclipse Debugging (CUDA) TotalView Profiling (CUDA
Cross-Platform GP with Organic Vectory BV Project Services Consultancy Services Expertise Markets 3D Visualization Architecture/Design Computing Embedded Software GIS Finance George van Venrooij Organic
Introduction to CUDA C
Introduction to CUDA C What is CUDA? CUDA Architecture Expose general-purpose GPU computing as first-class capability Retain traditional DirectX/OpenGL graphics performance CUDA C Based on industry-standard
OpenACC Programming on GPUs
OpenACC Programming on GPUs Directive-based GPGPU Programming Sandra Wienke, M.Sc. [email protected] Center for Computing and Communication RWTH Aachen University Rechen- und Kommunikationszentrum
Hands-on CUDA exercises
Hands-on CUDA exercises CUDA Exercises We have provided skeletons and solutions for 6 hands-on CUDA exercises In each exercise (except for #5), you have to implement the missing portions of the code Finished
HPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
High Performance CUDA Accelerated Local Optimization in Traveling Salesman Problem
High Performance CUDA Accelerated Local Optimization in Traveling Salesman Problem Kamil Rocki, PhD Department of Computer Science Graduate School of Information Science and Technology The University of
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v5.5 July 2013 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2. About
Le langage OCaml et la programmation des GPU
Le langage OCaml et la programmation des GPU GPU programming with OCaml Mathias Bourgoin - Emmanuel Chailloux - Jean-Luc Lamotte Le projet OpenGPU : un an plus tard Ecole Polytechnique - 8 juin 2011 Outline
To Java SE 8, and Beyond (Plan B)
11-12-13 To Java SE 8, and Beyond (Plan B) Francisco Morero Peyrona EMEA Java Community Leader 8 9...2012 2020? Priorities for the Java Platforms Grow Developer Base Grow Adoption
Introduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
CUDA Programming. Week 4. Shared memory and register
CUDA Programming Week 4. Shared memory and register Outline Shared memory and bank confliction Memory padding Register allocation Example of matrix-matrix multiplication Homework SHARED MEMORY AND BANK
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v6.5 August 2014 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2. About
GPU Hardware and Programming Models. Jeremy Appleyard, September 2015
GPU Hardware and Programming Models Jeremy Appleyard, September 2015 A brief history of GPUs In this talk Hardware Overview Programming Models Ask questions at any point! 2 A Brief History of GPUs 3 Once
Embedded Software development Process and Tools: Lesson-4 Linking and Locating Software
Embedded Software development Process and Tools: Lesson-4 Linking and Locating Software 1 1. Linker 2 Linker Links the compiled codes of application software, object codes from library and OS kernel functions.
CUDA SKILLS. Yu-Hang Tang. June 23-26, 2015 CSRC, Beijing
CUDA SKILLS Yu-Hang Tang June 23-26, 2015 CSRC, Beijing day1.pdf at /home/ytang/slides Referece solutions coming soon Online CUDA API documentation http://docs.nvidia.com/cuda/index.html Yu-Hang Tang @
Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child
Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.
HPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture
Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts
Evaluation of CUDA Fortran for the CFD code Strukti
Evaluation of CUDA Fortran for the CFD code Strukti Practical term report from Stephan Soller High performance computing center Stuttgart 1 Stuttgart Media University 2 High performance computing center
Virtual Machines. www.viplavkambli.com
1 Virtual Machines A virtual machine (VM) is a "completely isolated guest operating system installation within a normal host operating system". Modern virtual machines are implemented with either software
NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS
NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS DU-05349-001_v6.0 February 2014 Installation and Verification on TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2.
GPU Computing with CUDA Lecture 4 - Optimizations. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile
GPU Computing with CUDA Lecture 4 - Optimizations Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 3 Control flow Coalescing Latency hiding
Using EDA Databases: Milkyway & OpenAccess
Using EDA Databases: Milkyway & OpenAccess Enabling and Using Scripting Languages with Milkyway and OpenAccess Don Amundson Khosro Khakzadi 2006 LSI Logic Corporation 1 Outline History Choice Of Scripting
Java GPU Computing. Maarten Steur & Arjan Lamers
Java GPU Computing Maarten Steur & Arjan Lamers Overzicht OpenCL Simpel voorbeeld Casus Tips & tricks Vragen Waarom GPU Computing Afkortingen CPU, GPU, APU Khronos: OpenCL, OpenGL Nvidia: CUDA JogAmp JOCL,
Experiences on using GPU accelerators for data analysis in ROOT/RooFit
Experiences on using GPU accelerators for data analysis in ROOT/RooFit Sverre Jarp, Alfio Lazzaro, Julien Leduc, Yngve Sneen Lindal, Andrzej Nowak European Organization for Nuclear Research (CERN), Geneva,
System Structures. Services Interface Structure
System Structures Services Interface Structure Operating system services (1) Operating system services (2) Functions that are helpful to the user User interface Command line interpreter Batch interface
An Easier Way for Cross-Platform Data Acquisition Application Development
An Easier Way for Cross-Platform Data Acquisition Application Development For industrial automation and measurement system developers, software technology continues making rapid progress. Software engineers
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
12 Tips for Maximum Performance with PGI Directives in C
12 Tips for Maximum Performance with PGI Directives in C List of Tricks 1. Eliminate Pointer Arithmetic 2. Privatize Arrays 3. Make While Loops Parallelizable 4. Rectangles are Better Than Triangles 5.
Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment
Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment Nuno A. Carvalho, João Bordalo, Filipe Campos and José Pereira HASLab / INESC TEC Universidade do Minho MW4SOC 11 December
Eastern Washington University Department of Computer Science. Questionnaire for Prospective Masters in Computer Science Students
Eastern Washington University Department of Computer Science Questionnaire for Prospective Masters in Computer Science Students I. Personal Information Name: Last First M.I. Mailing Address: Permanent
Lecture 1 Introduction to Android
These slides are by Dr. Jaerock Kwon at. The original URL is http://kettering.jrkwon.com/sites/default/files/2011-2/ce-491/lecture/alecture-01.pdf so please use that instead of pointing to this local copy
How To Write Portable Programs In C
Writing Portable Programs COS 217 1 Goals of Today s Class Writing portable programs in C Sources of heterogeneity Data types, evaluation order, byte order, char set, Reading period and final exam Important
Lecture 1: an introduction to CUDA
Lecture 1: an introduction to CUDA Mike Giles [email protected] Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Overview hardware view software view CUDA programming
Optimizing Parallel Reduction in CUDA. Mark Harris NVIDIA Developer Technology
Optimizing Parallel Reduction in CUDA Mark Harris NVIDIA Developer Technology Parallel Reduction Common and important data parallel primitive Easy to implement in CUDA Harder to get it right Serves as
The C Programming Language course syllabus associate level
TECHNOLOGIES The C Programming Language course syllabus associate level Course description The course fully covers the basics of programming in the C programming language and demonstrates fundamental programming
How do Users and Processes interact with the Operating System? Services for Processes. OS Structure with Services. Services for the OS Itself
How do Users and Processes interact with the Operating System? Users interact indirectly through a collection of system programs that make up the operating system interface. The interface could be: A GUI,
Programming Languages
Programming Languages Qing Yi Course web site: www.cs.utsa.edu/~qingyi/cs3723 cs3723 1 A little about myself Qing Yi Ph.D. Rice University, USA. Assistant Professor, Department of Computer Science Office:
An Implementation Of Multiprocessor Linux
An Implementation Of Multiprocessor Linux This document describes the implementation of a simple SMP Linux kernel extension and how to use this to develop SMP Linux kernels for architectures other than
GPU Computing with CUDA Lecture 3 - Efficient Shared Memory Use. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile
GPU Computing with CUDA Lecture 3 - Efficient Shared Memory Use Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 2 Shared memory in detail
Install Java Development Kit (JDK) 1.8 http://www.oracle.com/technetwork/java/javase/downloads/index.html
CS 259: Data Structures with Java Hello World with the IntelliJ IDE Instructor: Joel Castellanos e-mail: joel.unm.edu Web: http://cs.unm.edu/~joel/ Office: Farris Engineering Center 319 8/19/2015 Install
Parallelization: Binary Tree Traversal
By Aaron Weeden and Patrick Royal Shodor Education Foundation, Inc. August 2012 Introduction: According to Moore s law, the number of transistors on a computer chip doubles roughly every two years. First
GPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.
Parallel Computing: Strategies and Implications Dori Exterman CTO IncrediBuild. In this session we will discuss Multi-threaded vs. Multi-Process Choosing between Multi-Core or Multi- Threaded development
ProTrack: A Simple Provenance-tracking Filesystem
ProTrack: A Simple Provenance-tracking Filesystem Somak Das Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology [email protected] Abstract Provenance describes a file
Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries
Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute
Introduction to Parallel Programming (w/ JAVA)
Introduction to Parallel Programming (w/ JAVA) Dec. 21st, 2015 Christian Terboven IT Center, RWTH Aachen University [email protected] 1 Moore s Law Cramming More Components onto Integrated Circuits
Cloud Computing. Up until now
Cloud Computing Lecture 11 Virtualization 2011-2012 Up until now Introduction. Definition of Cloud Computing Grid Computing Content Distribution Networks Map Reduce Cycle-Sharing 1 Process Virtual Machines
Tool - 1: Health Center
Tool - 1: Health Center Joseph Amrith Raj http://facebook.com/webspherelibrary 2 Tool - 1: Health Center Table of Contents WebSphere Application Server Troubleshooting... Error! Bookmark not defined. About
AI: A Modern Approach, Chpts. 3-4 Russell and Norvig
AI: A Modern Approach, Chpts. 3-4 Russell and Norvig Sequential Decision Making in Robotics CS 599 Geoffrey Hollinger and Gaurav Sukhatme (Some slide content from Stuart Russell and HweeTou Ng) Spring,
OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA
OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization
Black-Scholes option pricing. Victor Podlozhnyuk [email protected]
Black-Scholes option pricing Victor Podlozhnyuk [email protected] June 007 Document Change History Version Date Responsible Reason for Change 0.9 007/03/19 vpodlozhnyuk Initial release 1.0 007/04/06
Eastern Washington University Department of Computer Science. Questionnaire for Prospective Masters in Computer Science Students
Eastern Washington University Department of Computer Science Questionnaire for Prospective Masters in Computer Science Students I. Personal Information Name: Last First M.I. Mailing Address: Permanent
HP ProLiant SL270s Gen8 Server. Evaluation Report
HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich [email protected]
OpenACC Programming and Best Practices Guide
OpenACC Programming and Best Practices Guide June 2015 2015 openacc-standard.org. All Rights Reserved. Contents 1 Introduction 3 Writing Portable Code........................................... 3 What
How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)
TN203 Porting a Program to Dynamic C Introduction Dynamic C has a number of improvements and differences compared to many other C compiler systems. This application note gives instructions and suggestions
Unit Testing & JUnit
Unit Testing & JUnit Lecture Outline Communicating your Classes Introduction to JUnit4 Selecting test cases UML Class Diagrams Rectangle height : int width : int resize(double,double) getarea(): int getperimeter():int
OpenPOWER Software Stack with Big Data Example March 2014
OpenPOWER Software Stack with Big Data Example March 2014 Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,
BSc (Hons) Business Information Systems, BSc (Hons) Computer Science with Network Security. & BSc. (Hons.) Software Engineering
BSc (Hons) Business Information Systems, BSc (Hons) Computer Science with Network Security & BSc. (Hons.) Software Engineering Cohort: BIS/05/FT BCNS/05/FT BSE/05/FT Examinations for 2005-2006 / Semester
Research and Design of Universal and Open Software Development Platform for Digital Home
Research and Design of Universal and Open Software Development Platform for Digital Home CaiFeng Cao School of Computer Wuyi University, Jiangmen 529020, China [email protected] Abstract. With the development
An Incomplete C++ Primer. University of Wyoming MA 5310
An Incomplete C++ Primer University of Wyoming MA 5310 Professor Craig C. Douglas http://www.mgnet.org/~douglas/classes/na-sc/notes/c++primer.pdf C++ is a legacy programming language, as is other languages
Writing Applications for the GPU Using the RapidMind Development Platform
Writing Applications for the GPU Using the RapidMind Development Platform Contents Introduction... 1 Graphics Processing Units... 1 RapidMind Development Platform... 2 Writing RapidMind Enabled Applications...
GPGPU Parallel Merge Sort Algorithm
GPGPU Parallel Merge Sort Algorithm Jim Kukunas and James Devine May 4, 2009 Abstract The increasingly high data throughput and computational power of today s Graphics Processing Units (GPUs), has led
Mutual Exclusion using Monitors
Mutual Exclusion using Monitors Some programming languages, such as Concurrent Pascal, Modula-2 and Java provide mutual exclusion facilities called monitors. They are similar to modules in languages that
Sensors and the Zaurus S Hardware
The Zaurus Software Development Guide Robert Christy August 29, 2003 Contents 1 Overview 1 2 Writing Software for the Zaurus 2 3 Using the bathw Library 3 3.1 Using the fans.............................
Fachbereich Informatik und Elektrotechnik SunSPOT. Ubiquitous Computing. Ubiquitous Computing, Helmut Dispert
Ubiquitous Computing Ubiquitous Computing The Sensor Network System Sun SPOT: The Sun Small Programmable Object Technology Technology-Based Wireless Sensor Networks a Java Platform for Developing Applications
TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING
TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING NVIDIA DEVELOPER TOOLS BUILD. DEBUG. PROFILE. C/C++ IDE INTEGRATION STANDALONE TOOLS HARDWARE SUPPORT CPU AND GPU DEBUGGING & PROFILING
Experimental Comparison of Concolic and Random Testing for Java Card Applets
Experimental Comparison of Concolic and Random Testing for Java Card Applets Kari Kähkönen, Roland Kindermann, Keijo Heljanko, and Ilkka Niemelä Aalto University, Department of Information and Computer
High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach
High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach Beniamino Di Martino, Antonio Esposito and Andrea Barbato Department of Industrial and Information Engineering Second University of Naples
Test Driven Development of Embedded Systems Using Existing Software Test Infrastructure
Test Driven Development of Embedded Systems Using Existing Software Test Infrastructure Micah Dowty University of Colorado at Boulder [email protected] March 26, 2004 Abstract Traditional software development
Freescale Semiconductor, I
nc. Application Note 6/2002 8-Bit Software Development Kit By Jiri Ryba Introduction 8-Bit SDK Overview This application note describes the features and advantages of the 8-bit SDK (software development
Java Embedded Applications
TM a One-Stop Shop for Java Embedded Applications GeeseWare offer brings Java in your constrained embedded systems. You develop and simulate your Java application on PC, and enjoy a seamless hardware validation.
Tools Page 1 of 13 ON PROGRAM TRANSLATION. A priori, we have two translation mechanisms available:
Tools Page 1 of 13 ON PROGRAM TRANSLATION A priori, we have two translation mechanisms available: Interpretation Compilation On interpretation: Statements are translated one at a time and executed immediately.
MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS. Julien Demouth, NVIDIA
MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS Julien Demouth, NVIDIA STAC-A2 BENCHMARK STAC-A2 Benchmark Developed by banks Macro and micro, performance and accuracy Pricing and Greeks for American
Part I Courses Syllabus
Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment
CS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions
CS 2112 Spring 2014 Assignment 3 Data Structures and Web Filtering Due: March 4, 2014 11:59 PM Implementing spam blacklists and web filters requires matching candidate domain names and URLs very rapidly
GPU Accelerated Monte Carlo Simulations and Time Series Analysis
GPU Accelerated Monte Carlo Simulations and Time Series Analysis Institute of Physics, Johannes Gutenberg-University of Mainz Center for Polymer Studies, Department of Physics, Boston University Artemis
VOC Documentation. Release 0.1. Russell Keith-Magee
VOC Documentation Release 0.1 Russell Keith-Magee February 07, 2016 Contents 1 About VOC 3 1.1 The VOC Developer and User community................................ 3 1.2 Frequently Asked Questions.......................................
The Top Six Advantages of CUDA-Ready Clusters. Ian Lumb Bright Evangelist
The Top Six Advantages of CUDA-Ready Clusters Ian Lumb Bright Evangelist GTC Express Webinar January 21, 2015 We scientists are time-constrained, said Dr. Yamanaka. Our priority is our research, not managing
Case Study on Productivity and Performance of GPGPUs
Case Study on Productivity and Performance of GPGPUs Sandra Wienke [email protected] ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia
Multi-core Programming System Overview
Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
General Introduction
Managed Runtime Technology: General Introduction Xiao-Feng Li ([email protected]) 2012-10-10 Agenda Virtual machines Managed runtime systems EE and MM (JIT and GC) Summary 10/10/2012 Managed Runtime
Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA
Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA Dissertation submitted in partial fulfillment of the requirements for the degree of Master of Technology, Computer Engineering by Amol
