How To Test Your Code On A Cuda Gdb (Gdb) On A Linux Computer With A Gbd (Gbd) And Gbbd Gbdu (Gdb) (Gdu) (Cuda
|
|
|
- Winifred Franklin
- 5 years ago
- Views:
Transcription
1 Mitglied der Helmholtz-Gemeinschaft Hands On CUDA Tools and Performance-Optimization JSC GPU Programming Course 26. März 2011 Dominic Eschweiler
2 Outline of This Talk Introduction Setup CUDA-GDB Profiling Performance 26. März 2011 Dominic Eschweiler Folie 2
3 Introduction Every section (and subsection) of the exercise paper is a task for the hands on. The task description tells You which file is to open. After opening the file You have to look after the parts which are marked with TODO. 1 / TODO : Some d e s c r i p t i o n s / / TODO / Every informational part on this slides has bullets, every task has a number: März 2011 Dominic Eschweiler Folie 3
4 Setup Your Tools (Exercise 2) 1 Please use ssh -X jugipsy to open a remote session to our GPU system. 2 Try out if X-forwarding works (type cudaprof) 3 Extract tar -xzf testbed.tar.gz to your home directory and change to its folder. 26. März 2011 Dominic Eschweiler Folie 4
5 CUDA-GDB Debugging CUDA Programs is a hard task because the compute kernel is a blackbox. Printf (except Fermi) or systemcalls are not available for kernels. NVIDIA offers a special version of GDB for debugging, which is able to debug kernels. 26. März 2011 Dominic Eschweiler Folie 5
6 CUDA-GDB (Exercise 2.1) 1 Change to the gdb subfolder cd gdb. 2 Type make and do a ls. 3 Open the file gdb test.cu with an text editor of your choice. 4 Now start it with the cuda-gdb by typing cuda-gdb./gdb test. 26. März 2011 Dominic Eschweiler Folie 6
7 (CUDA-)GDB Commands (Exercise 2.1) break <filename.cu>:<linenumber> break <functionname> run continue next step print <variable> Add a breakpoint on this line in the named file. Add a breakpoint on the beginning of this function. Execute the program and hold on the first breakpoint. Go on to the next breakpoint. Step to the next line. Step into the current function. Print out the current value of this variable. 26. März 2011 Dominic Eschweiler Folie 7
8 Profiling To measure durations of different function calls during runtime is a hard job, if the programmer only uses printf() and gettimeofday(). Introduces alien code into the program. Is not very fault tolerant. It is not clear if one can trust the results.... A profiler is a performance analysis tool that, most commonly, measures the frequency and duration of function calls. cudaprof is the profiler from NVIDIA for CUDA 26. März 2011 Dominic Eschweiler Folie 8
9 Profile Your Code (Exercise 2.2) 1 Start the profiler (run cudaprof). 2 After that You will see the main window. Abbildung: CUDA profiler main window. 26. März 2011 Dominic Eschweiler Folie 9
10 Profile Your Code (Exercise 2.2) 1 Add a new project by clicking on file and then on new. 2 Type in a name and press ok. Abbildung: New project dialog. 26. März 2011 Dominic Eschweiler Folie 10
11 Profile Your Code (Exercise 2.2) 1 Klick start and see the profiling output. 2 Open the cudaprof.pdf and find out what each counter mean. Abbildung: CUDA profiler with the project dialog. 26. März 2011 Dominic Eschweiler Folie 11
12 Performance GPU Architecture Host Input Assembler Setup/Rstr/ZCull Vtx Thread Issue Geom Thread Issue Pixel Thread Issue SP SP SP SP SP SP SP SP TF TF TF TF TF TF TF TF Thread Processor L1 L1 L1 L1 L1 L1 L1 L1 R L2 R L2 R L2 R L2 R L2 R L2 FB FB FB FB FB FB Abbildung: The G80 architecture. 26. März 2011 Dominic Eschweiler Folie 12
13 Performance Memory Layer Shared Memory Global Memory Host Memory Abbildung: Memory hierarchy. Thread 1 4c Thread 1 4c Thread n 4c... Thread n 4c 600c 600c 600c 600c Local Memory Global Memory Local Memory Abbildung: Memory organization. 26. März 2011 Dominic Eschweiler Folie 13
14 Performance Make (Exercise 3.1) 1 Please go to the main folder of the testbed. 2 Type make debug. 1 ptxas i n f o : Compiling e n t r y f u n c t i o n Z15P1 Fixed KernelPfS S 2 ptxas i n f o : Used 6 r e g i s t e r s, bytes smem, 12 bytes cmem[ 1 ] 3 ptxas i n f o : Compiling e n t r y f u n c t i o n Z16P1 Broken KernelPfS S 4 ptxas i n f o : Used 6 r e g i s t e r s, bytes smem, 12 bytes cmem[ 1 ] 3 Go to the subfolder cudals and type make again. 4 Type./cudals and find out how many registers the GPU have and how many threads per blocks could be launched. 26. März 2011 Dominic Eschweiler Folie 14
15 Performance Excessive Global Memory Usage (Exercise 3.2) Thread Shared memory Global memory A A A A A A A A A A A Thread Shared memory Global memory A A' A' A' A' A' A' A' A' A' Runtime benefit Abbildung: Shared memory is much faster than global memory. This part demonstrates a performance issue where the kernel program only uses global memory for performing calculations. The better way is to use registers or shared memory to store intermediate results. 26. März 2011 Dominic Eschweiler Folie 15
16 Performance Bank Conflicts (Exercise 3.3) Thread 0 Thread 1 Shared memory Thread 0 Thread 1 Shared memory Bank 0 Bank 1 Bank 0 Bank 1 Runtime benefit Shared memory is a parallel memory which is distributed over several banks, where every bank can only accessed by one thread at the same time. If more than one thread try to access the same bank of the shared memory, the execution is serialized. Abbildung: Bank Conflicts. 26. März 2011 Dominic Eschweiler Folie 16
17 Performance Memory Coalescing (Exercise 3.4) Thread Shared memory Global memory Thread Shared memory Global memory A1' A2' A A2A A1 AA3 A1 A2A A3 A4 A1' A2' A3' A4' A3' Runtime benefit Abbildung: Sometimes memory accesses can be coalesced. A A4 A4' One transfer cycle between global and shared memory has always the size of 128 Bit. The compiler performs automatically every smaller access with a 128 Bit transfer. The better way is to coalesce smaller transfers to bigger ones (if possible). 26. März 2011 Dominic Eschweiler Folie 17
18 Performance Scattered Host Transfer (Exercise 3.5) Global memory Host memory Global memory Host memory A1 A2 A3 A4 init A1 init A2 init A3 init A4 init A1 A2 A3 A4 A1 A2 A3 A4 Runtime benefit Abbildung: Scattered against non scattered host transfer. It could happen that the input data is not located in a consecutive buffer on the host. To scatter the copies directly between host and device memory is a bad idea. 26. März 2011 Dominic Eschweiler Folie 18
19 Performance Thread Register Imbalance (Exercise 3.6) Thread block slot 0 Thread block slot 1 Thread block slot 2 Thread block slot 3 Thread block slot 0 Thread block slot 1 Thread block slot 2 Thread block slot 3 N Thread N Thread N Thread N Thread reg. block 0 reg. block 1 reg. block 2 reg. block 3 Multiprocessor Runtime benefit N/4 Thread reg. block 0 N/4 Thread reg. block 1 N/4 Thread reg. block 2 N/4 Thread reg. block 3 Multiprocessor A kernel should always use every available thread slot on the shared multiprocessor. This can be limited by the number of registers which are used per thread (see compiler output). Abbildung: Keep the multiprocessors busy. 26. März 2011 Dominic Eschweiler Folie 19
20 Performance Wait at Barrier (Exercise 3.7) Thread 0 Barrier Thread 1 Barrier Thread 2 Barrier Thread 3 Barrier Runtime benefit Thread 0 Barrier Thread 1 Barrier Thread 2 Barrier Thread 3 Barrier Abbildung: Wait at barrier. Sometimes there is some initialization needed, which must be divided with an barrier from the calculation steps. If shared memory is used in this initialization step, an easy way to reduce barrier waiting time is to let only one thread do the initialization 26. März 2011 Dominic Eschweiler Folie 20
21 Performance Branch Diverging (Exercise 3.8) Warp 0 Warp 1 Warp 0 Warp 1 B B B B If-case If-case If-case Else-case Abbildung: Branch Diverging. Else-case Else-case Runtime benefit Branching is traditionally a hard job for SIMD architectures. Branches in CUDA do only have no impact on the performance, if they are aligned to the warp boarders. 26. März 2011 Dominic Eschweiler Folie 21
Hands-on CUDA exercises
Hands-on CUDA exercises CUDA Exercises We have provided skeletons and solutions for 6 hands-on CUDA exercises In each exercise (except for #5), you have to implement the missing portions of the code Finished
Optimizing Application Performance with CUDA Profiling Tools
Optimizing Application Performance with CUDA Profiling Tools Why Profile? Application Code GPU Compute-Intensive Functions Rest of Sequential CPU Code CPU 100 s of cores 10,000 s of threads Great memory
CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA
CUDA Optimization with NVIDIA Tools Julien Demouth, NVIDIA What Will You Learn? An iterative method to optimize your GPU code A way to conduct that method with Nvidia Tools 2 What Does the Application
Learn CUDA in an Afternoon: Hands-on Practical Exercises
Learn CUDA in an Afternoon: Hands-on Practical Exercises Alan Gray and James Perry, EPCC, The University of Edinburgh Introduction This document forms the hands-on practical component of the Learn CUDA
CUDA Tools for Debugging and Profiling. Jiri Kraus (NVIDIA)
Mitglied der Helmholtz-Gemeinschaft CUDA Tools for Debugging and Profiling Jiri Kraus (NVIDIA) GPU Programming@Jülich Supercomputing Centre Jülich 7-9 April 2014 What you will learn How to use cuda-memcheck
OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA
OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization
GPU Tools Sandra Wienke
Sandra Wienke Center for Computing and Communication, RWTH Aachen University MATSE HPC Battle 2012/13 Rechen- und Kommunikationszentrum (RZ) Agenda IDE Eclipse Debugging (CUDA) TotalView Profiling (CUDA
GPU Computing with CUDA Lecture 4 - Optimizations. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile
GPU Computing with CUDA Lecture 4 - Optimizations Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 3 Control flow Coalescing Latency hiding
GPU Performance Analysis and Optimisation
GPU Performance Analysis and Optimisation Thomas Bradley, NVIDIA Corporation Outline What limits performance? Analysing performance: GPU profiling Exposing sufficient parallelism Optimising for Kepler
Debugging CUDA Applications Przetwarzanie Równoległe CUDA/CELL
Debugging CUDA Applications Przetwarzanie Równoległe CUDA/CELL Michał Wójcik, Tomasz Boiński Katedra Architektury Systemów Komputerowych Wydział Elektroniki, Telekomunikacji i Informatyki Politechnika
GPGPU Computing. Yong Cao
GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power
1. If we need to use each thread to calculate one output element of a vector addition, what would
Quiz questions Lecture 2: 1. If we need to use each thread to calculate one output element of a vector addition, what would be the expression for mapping the thread/block indices to data index: (A) i=threadidx.x
ANDROID DEVELOPER TOOLS TRAINING GTC 2014. Sébastien Dominé, NVIDIA
ANDROID DEVELOPER TOOLS TRAINING GTC 2014 Sébastien Dominé, NVIDIA AGENDA NVIDIA Developer Tools Introduction Multi-core CPU tools Graphics Developer Tools Compute Developer Tools NVIDIA Developer Tools
Computer Graphics Hardware An Overview
Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and
Guided Performance Analysis with the NVIDIA Visual Profiler
Guided Performance Analysis with the NVIDIA Visual Profiler Identifying Performance Opportunities NVIDIA Nsight Eclipse Edition (nsight) NVIDIA Visual Profiler (nvvp) nvprof command-line profiler Guided
Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1
Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v5.5 July 2013 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2. About
GPU Parallel Computing Architecture and CUDA Programming Model
GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel
CUDA Debugging. GPGPU Workshop, August 2012. Sandra Wienke Center for Computing and Communication, RWTH Aachen University
CUDA Debugging GPGPU Workshop, August 2012 Sandra Wienke Center for Computing and Communication, RWTH Aachen University Nikolay Piskun, Chris Gottbrath Rogue Wave Software Rechen- und Kommunikationszentrum
E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices
E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,
TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING
TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING NVIDIA DEVELOPER TOOLS BUILD. DEBUG. PROFILE. C/C++ IDE INTEGRATION STANDALONE TOOLS HARDWARE SUPPORT CPU AND GPU DEBUGGING & PROFILING
Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
GPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
GPUs for Scientific Computing
GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles [email protected] Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research
CUDA SKILLS. Yu-Hang Tang. June 23-26, 2015 CSRC, Beijing
CUDA SKILLS Yu-Hang Tang June 23-26, 2015 CSRC, Beijing day1.pdf at /home/ytang/slides Referece solutions coming soon Online CUDA API documentation http://docs.nvidia.com/cuda/index.html Yu-Hang Tang @
Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com
CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com Modern GPU
Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga
Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.
Optimizing Parallel Reduction in CUDA. Mark Harris NVIDIA Developer Technology
Optimizing Parallel Reduction in CUDA Mark Harris NVIDIA Developer Technology Parallel Reduction Common and important data parallel primitive Easy to implement in CUDA Harder to get it right Serves as
NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS
NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS DU-05349-001_v6.0 February 2014 Installation and Verification on TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2.
Getting Started with CodeXL
AMD Developer Tools Team Advanced Micro Devices, Inc. Table of Contents Introduction... 2 Install CodeXL... 2 Validate CodeXL installation... 3 CodeXL help... 5 Run the Teapot Sample project... 5 Basic
Case Study on Productivity and Performance of GPGPUs
Case Study on Productivity and Performance of GPGPUs Sandra Wienke [email protected] ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia
Debugging with TotalView
Tim Cramer 17.03.2015 IT Center der RWTH Aachen University Why to use a Debugger? If your program goes haywire, you may... ( wand (... buy a magic... read the source code again and again and...... enrich
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v6.5 August 2014 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2. About
Developing, Deploying, and Debugging Applications on Windows Embedded Standard 7
Developing, Deploying, and Debugging Applications on Windows Embedded Standard 7 Contents Overview... 1 The application... 2 Motivation... 2 Code and Environment... 2 Preparing the Windows Embedded Standard
GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile
GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy
Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011
Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis
Part I Courses Syllabus
Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment
MONITORING PERFORMANCE IN WINDOWS 7
MONITORING PERFORMANCE IN WINDOWS 7 Performance Monitor In this demo we will take a look at how we can use the Performance Monitor to capture information about our machine performance. We can access Performance
GPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
Introduction to GPU Programming Languages
CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure
GPU Programming Strategies and Trends in GPU Computing
GPU Programming Strategies and Trends in GPU Computing André R. Brodtkorb 1 Trond R. Hagen 1,2 Martin L. Sætra 2 1 SINTEF, Dept. Appl. Math., P.O. Box 124, Blindern, NO-0314 Oslo, Norway 2 Center of Mathematics
Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005
Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005 Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005... 1
RecoveryVault Express Client User Manual
For Linux distributions Software version 4.1.7 Version 2.0 Disclaimer This document is compiled with the greatest possible care. However, errors might have been introduced caused by human mistakes or by
NVIDIA Tools For Profiling And Monitoring. David Goodwin
NVIDIA Tools For Profiling And Monitoring David Goodwin Outline CUDA Profiling and Monitoring Libraries Tools Technologies Directions CScADS Summer 2012 Workshop on Performance Tools for Extreme Scale
CS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 2: Operating System Structures Prof. Alan Mislove ([email protected]) Operating System Services Operating systems provide an environment for
Next Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
AMD CodeXL 1.7 GA Release Notes
AMD CodeXL 1.7 GA Release Notes Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on
GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics
GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),
Online Backup Linux Client User Manual
Online Backup Linux Client User Manual Software version 4.0.x For Linux distributions August 2011 Version 1.0 Disclaimer This document is compiled with the greatest possible care. However, errors might
SKP16C62P Tutorial 1 Software Development Process using HEW. Renesas Technology America Inc.
SKP16C62P Tutorial 1 Software Development Process using HEW Renesas Technology America Inc. 1 Overview The following tutorial is a brief introduction on how to develop and debug programs using HEW (Highperformance
Online Backup Client User Manual
For Linux distributions Software version 4.1.7 Version 2.0 Disclaimer This document is compiled with the greatest possible care. However, errors might have been introduced caused by human mistakes or by
Introduction to Android Development
2013 Introduction to Android Development Keshav Bahadoor An basic guide to setting up and building native Android applications Science Technology Workshop & Exposition University of Nigeria, Nsukka Keshav
Operating Systems 4 th Class
Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science
Scalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
Introduction. What is an Operating System?
Introduction What is an Operating System? 1 What is an Operating System? 2 Why is an Operating System Needed? 3 How Did They Develop? Historical Approach Affect of Architecture 4 Efficient Utilization
Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles [email protected] hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
Example Program for Crestron - Setup Guide -
Example Program for Crestron - Setup Guide - May, 2010 Introduction This guide is a step-by-step setup guide to setting up the Yamaha Commercial Audio demonstration programming for Crestron. The program
Cosmic Board for phycore AM335x System on Module and Carrier Board. Application Development User Manual
Cosmic Board for phycore AM335x System on Module and Carrier Board Application Development User Manual Product No: PCL-051/POB-002 SOM PCB No: 1397.0 CB PCB No: 1396.1 Edition: October,2013 In this manual
1. Product Information
ORIXCLOUD BACKUP CLIENT USER MANUAL LINUX 1. Product Information Product: Orixcloud Backup Client for Linux Version: 4.1.7 1.1 System Requirements Linux (RedHat, SuSE, Debian and Debian based systems such
Tools Page 1 of 13 ON PROGRAM TRANSLATION. A priori, we have two translation mechanisms available:
Tools Page 1 of 13 ON PROGRAM TRANSLATION A priori, we have two translation mechanisms available: Interpretation Compilation On interpretation: Statements are translated one at a time and executed immediately.
OpenACC 2.0 and the PGI Accelerator Compilers
OpenACC 2.0 and the PGI Accelerator Compilers Michael Wolfe The Portland Group [email protected] This presentation discusses the additions made to the OpenACC API in Version 2.0. I will also present
Online Backup Client User Manual Linux
Online Backup Client User Manual Linux 1. Product Information Product: Online Backup Client for Linux Version: 4.1.7 1.1 System Requirements Operating System Linux (RedHat, SuSE, Debian and Debian based
Debugging and Profiling Lab. Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma [email protected]
Debugging and Profiling Lab Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma [email protected] Setup Login to Ranger: - ssh -X [email protected] Make sure you can export graphics
Mitglied der Helmholtz-Gemeinschaft. System monitoring with LLview and the Parallel Tools Platform
Mitglied der Helmholtz-Gemeinschaft System monitoring with LLview and the Parallel Tools Platform November 25, 2014 Carsten Karbach Content 1 LLview 2 Parallel Tools Platform (PTP) 3 Latest features 4
Online Backup Client User Manual
Online Backup Client User Manual Software version 3.21 For Linux distributions January 2011 Version 2.0 Disclaimer This document is compiled with the greatest possible care. However, errors might have
ultra fast SOM using CUDA
ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A
CSC 2405: Computer Systems II
CSC 2405: Computer Systems II Spring 2013 (TR 8:30-9:45 in G86) Mirela Damian http://www.csc.villanova.edu/~mdamian/csc2405/ Introductions Mirela Damian Room 167A in the Mendel Science Building [email protected]
Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model
Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA Part 1: Hardware design and programming model Amin Safi Faculty of Mathematics, TU dortmund January 22, 2016 Table of Contents Set
CodeWarrior Development Studio for Freescale S12(X) Microcontrollers Quick Start
CodeWarrior Development Studio for Freescale S12(X) Microcontrollers Quick Start SYSTEM REQUIREMENTS Hardware Operating System Disk Space PC with 1 GHz Intel Pentum -compatible processor 512 MB of RAM
RWTH GPU Cluster. Sandra Wienke [email protected] November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky
RWTH GPU Cluster Fotos: Christian Iwainsky Sandra Wienke [email protected] November 2012 Rechen- und Kommunikationszentrum (RZ) The RWTH GPU Cluster GPU Cluster: 57 Nvidia Quadro 6000 (Fermi) innovative
SimbaEngine SDK 9.4. Build a C++ ODBC Driver for SQL-Based Data Sources in 5 Days. Last Revised: October 2014. Simba Technologies Inc.
Build a C++ ODBC Driver for SQL-Based Data Sources in 5 Days Last Revised: October 2014 Simba Technologies Inc. Copyright 2014 Simba Technologies Inc. All Rights Reserved. Information in this document
Using the CoreSight ITM for debug and testing in RTX applications
Using the CoreSight ITM for debug and testing in RTX applications Outline This document outlines a basic scheme for detecting runtime errors during development of an RTX application and an approach to
OpenCL Programming for the CUDA Architecture. Version 2.3
OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different
Installation Guide. (Version 2014.1) Midland Valley Exploration Ltd 144 West George Street Glasgow G2 2HG United Kingdom
Installation Guide (Version 2014.1) Midland Valley Exploration Ltd 144 West George Street Glasgow G2 2HG United Kingdom Tel: +44 (0) 141 3322681 Fax: +44 (0) 141 3326792 www.mve.com Table of Contents 1.
STLinux Software development environment
STLinux Software development environment Development environment The STLinux Development Environment is a comprehensive set of tools and packages for developing Linux-based applications on ST s consumer
Yocto Project Eclipse plug-in and Developer Tools Hands-on Lab
Yocto Project Eclipse plug-in and Developer Tools Hands-on Lab Yocto Project Developer Day San Francisco, 2013 Jessica Zhang Introduction Welcome to the Yocto Project Eclipse plug-in
IN STA LLIN G A VA LA N C HE REMOTE C O N TROL 4. 1
IN STA LLIN G A VA LA N C HE REMOTE C O N TROL 4. 1 Remote Control comes as two separate files: the Remote Control Server installation file (.exe) and the Remote Control software package (.ava). The installation
LOG MANAGEMENT Update Log Setup Screen Update Log Options Use Update Log to track edits, adds and deletes Accept List Cancel
Log Management - Page 22-1 LOG MANAGEMENT There are various log options throughout ZonePro. A log file is a file that store information about changes that have been made while using ZonePro. Log files
Introduction to Embedded Systems. Software Update Problem
Introduction to Embedded Systems CS/ECE 6780/5780 Al Davis logistics minor Today s topics: more software development issues 1 CS 5780 Software Update Problem Lab machines work let us know if they don t
Eliminate Memory Errors and Improve Program Stability
Eliminate Memory Errors and Improve Program Stability with Intel Parallel Studio XE Can running one simple tool make a difference? Yes, in many cases. You can find errors that cause complex, intermittent
APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE
APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE Tuyou Peng 1, Jun Peng 2 1 Electronics and information Technology Department Jiangmen Polytechnic, Jiangmen, Guangdong, China, [email protected]
XID ERRORS. vr352 May 2015. XID Errors
ID ERRORS vr352 May 2015 ID Errors Introduction... 1 1.1. What Is an id Message... 1 1.2. How to Use id Messages... 1 Working with id Errors... 2 2.1. Viewing id Error Messages... 2 2.2. Tools That Provide
RA MPI Compilers Debuggers Profiling. March 25, 2009
RA MPI Compilers Debuggers Profiling March 25, 2009 Examples and Slides To download examples on RA 1. mkdir class 2. cd class 3. wget http://geco.mines.edu/workshop/class2/examples/examples.tgz 4. tar
Fast Implementations of AES on Various Platforms
Fast Implementations of AES on Various Platforms Joppe W. Bos 1 Dag Arne Osvik 1 Deian Stefan 2 1 EPFL IC IIF LACAL, Station 14, CH-1015 Lausanne, Switzerland {joppe.bos, dagarne.osvik}@epfl.ch 2 Dept.
Embedded Linux development training 4 days session
Embedded Linux development training 4 days session Title Overview Duration Trainer Language Audience Prerequisites Embedded Linux development training Understanding the Linux kernel Building the Linux
CS 455 Spring 2015. Word Count Example
CS 455 Spring 2015 Word Count Example Before starting, make sure that you have HDFS and Yarn running, using sbin/start-dfs.sh and sbin/start-yarn.sh Download text copies of at least 3 books from Project
Lab 1 Beginning C Program
Lab 1 Beginning C Program Overview This lab covers the basics of compiling a basic C application program from a command line. Basic functions including printf() and scanf() are used. Simple command line
Introduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
Keil Debugger Tutorial
Keil Debugger Tutorial Yifeng Zhu December 17, 2014 Software vs Hardware Debug There are two methods to debug your program: software debug and hardware debug. By using the software debug, you do not have
Novell ZENworks Asset Management 7.5
Novell ZENworks Asset Management 7.5 w w w. n o v e l l. c o m October 2006 INSTALLATION GUIDE Table Of Contents 1. Installation Overview... 1 If you are upgrading... 1 Installation Choices... 1 ZENworks
Installing S500 Power Monitor Software and LabVIEW Run-time Engine
EigenLight S500 Power Monitor Software Manual Software Installation... 1 Installing S500 Power Monitor Software and LabVIEW Run-time Engine... 1 Install Drivers for Windows XP... 4 Install VISA run-time...
The installation of pylon for Linux is described in the INSTALL text document.
pylon 4 Camera Software Suite for Linux for Use with Basler Gigabit Ethernet(GigE) and Basler USB 3.0 Cameras (U3V) ==== System Requirements ==== GigE ---- A GigE network adapter that supports jumbo frames
Example of Standard API
16 Example of Standard API System Call Implementation Typically, a number associated with each system call System call interface maintains a table indexed according to these numbers The system call interface
Introduction to Running Computations on the High Performance Clusters at the Center for Computational Research
! Introduction to Running Computations on the High Performance Clusters at the Center for Computational Research! Cynthia Cornelius! Center for Computational Research University at Buffalo, SUNY! cdc at
Lab 2-2: Exploring Threads
Lab 2-2: Exploring Threads Objectives Prerequisites After completing this lab, you will be able to: Add profiling support to a Windows CE OS Design Locate files associated with Windows CE profiling Operate
Chapter 6, The Operating System Machine Level
Chapter 6, The Operating System Machine Level 6.1 Virtual Memory 6.2 Virtual I/O Instructions 6.3 Virtual Instructions For Parallel Processing 6.4 Example Operating Systems 6.5 Summary Virtual Memory General
Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1
Intro to GPU computing Spring 2015 Mark Silberstein, 048661, Technion 1 Serial vs. parallel program One instruction at a time Multiple instructions in parallel Spring 2015 Mark Silberstein, 048661, Technion
MetaMorph Microscopy Automation & Image Analysis Software Super-Resolution Module
MetaMorph Microscopy Automation & Image Analysis Software Super-Resolution Module Version 7 Installation Instructions July 2013 This document is provided to customers who have purchased Molecular Devices
