
CONTENTS

I Introduction
II Preparation
    II-A Hardware
    II-B Checking Versions for Hardware and Drivers
III Software
    III-A Installation
    III-B Checking Software Versions
    III-C Running CUDA Examples
IV Creating Your Own CUDA Program
V Integrate CUBLAS to ATLAS
    V-A Sources for Sdot
    V-B Makefiles
    V-C ATLAS Source Modification
    V-D Compilation and Execution
VI User BLAS Implementation on CUDA
    VI-A Directory Structure
    VI-B Makefiles
    VI-C Sources for Sdot
    VI-D ATLAS Source Modification
    VI-E Compilation and Execution
    VI-F Test Runs
References

ATLAS GPU Tutorial

Chia-Tien Dan Lo
Department of Computer Science, University of Texas at San Antonio

Abstract

This paper describes how to run Basic Linear Algebra Subprograms (BLAS) routines on Graphics Processing Units (GPUs) within the Automatically Tuned Linear Algebra Software (ATLAS) project. The BLAS implementations invoked in this tutorial are CUBLAS 2.2, a BLAS implementation in Compute Unified Device Architecture (CUDA), and CUBLASLO 1.0, our own BLAS implementation. We integrate CUBLAS and CUBLASLO into ATLAS via a test program, l1blastst.c, defined in the ATLAS source tree. A test run illustrates this integration, and its results are reported.

I. INTRODUCTION

This tutorial uses a Linux box running 64-bit Ubuntu 8.10. The steps can also be applied to a Windows machine with Cygwin installed to emulate the Linux environment. Software used in this tutorial includes GNU Make 3.8.1 and GCC. The GPU used for computation is an Nvidia e-GeForce 8600 GTS with 512 MB DDR3 on PCIe x16. Other GPUs may also work. The Linux box also contains a second VGA card, an Nvidia GeForce 7200 GS with 256 MB RAM, which drives the main display. The 7-series Nvidia card is not supported by CUDA, so the CUDA computation runs on the 8600 GTS, which is not connected to any monitor. This configuration avoids the 5-second run-time limit that the CUDA driver imposes on CUDA programs when the GPU is attached to a display.

II. PREPARATION

Before starting the tutorial, make sure the required software and hardware are installed and are the correct versions. You may well get stuck at some point during the tutorial if your software or hardware differs from what is used on my machine. If the box does not have ATLAS and LAPACK, follow the ATLAS installation guide [1] to install both packages. If the installed versions are different, remove them and install the correct versions.
We will refer to two directories in the ATLAS source tree: SRCDIR and BLDDIR. On my box, I download and unzip ATLAS to danlo@etl-corei74b: /atlas/atlas , and make a build directory danlo@etl-corei74b: /atlas/atlas /linux_i7_64_build. SRCDIR refers to the former directory and BLDDIR to the latter throughout this tutorial.

A. Hardware

For GPU computation, of course, a GPU has to be installed on the motherboard. To bypass the 5-second limit, it is strongly recommended that you add an extra GPU card dedicated to CUDA computation. The motherboard therefore needs at least two PCIe 2.0 slots: one for the normal VGA card that connects to a display, and one for the CUDA-enabled GPU. The motherboard used in this tutorial is an ASUS P6T Deluxe with two PCIe 2.0 slots. At the time of this writing, the ASUS L1N64-SLI WS Dual L(1207FX) provides 4 PCIe x16 slots. On Windows machines, both GPU cards have to be the same brand, i.e., Nvidia, because of driver issues. Linux boxes, however, may take a different VGA card alongside a CUDA-enabled GPU card, though this configuration is not validated in this tutorial. It is highly recommended that both cards be Nvidia cards to avoid such hurdles.

Installing the GPU card involves two steps: first, open the computer case and insert the card into the motherboard; second, install the device driver. Make sure the power cord is detached in addition to powering off the machine, because power is still present when only the power switch is off.

The latest Nvidia device driver at the time of this writing is cudatoolkit_2.2_linux_64_ubuntu8.10.run from Nvidia. There are a number of Nvidia drivers for a variety of operating systems and machines, so make sure you get the right one for your system. You have to remove old drivers first. Use the following command to remove old Nvidia drivers:

    sudo apt-get remove nvidia*

Then download the new driver from Nvidia's website and install it using the following command:

    sudo cudadriver_2.2_linux_64_ beta.run

Since you need to close your X session to do this, I recommend that you ssh into your box from another machine so that you get a usable terminal for Nvidia's installer (it seems to have problems when run from the console; I got flashing text and other nonsense). Once you have loaded the new driver with modprobe, install the software described in the next section.

B. Checking Versions for Hardware and Drivers

To find out what VGA hardware is installed in your system, use the following command:

    danlo@etl-corei74b:~$ lspci | grep VGA
    02:00.0 VGA compatible controller: nvidia Corporation G72 [GeForce 7300 SE] (re
    03:00.0 VGA compatible controller: nvidia Corporation GeForce 8600 GTS (rev a1)

In my machine, there are two VGA cards: GeForce 7200 and GeForce 8600 GTS. The first one is not supported by CUDA; only the second one is a CUDA-enabled GPU. Also note the PCI bus information prefixed to each device. When configuring the X server, this information has to be set manually in /etc/X11/xorg.conf.

An NVIDIA Linux display driver is needed to run CUDA code on a CUDA-enabled NVIDIA GPU. CUDA 2.2 requires a sufficiently recent version of the Linux NVIDIA display driver; see the NVIDIA CUDA Toolkit 2.2 release notes for the exact minimum version.
We can use the following command to verify the version of the running device driver:

    danlo@etl-corei74b:/proc$ less /proc/driver/nvidia/version
    NVRM version: NVIDIA UNIX x86_64 Kernel Module Thu Apr 30 15:48:49
    GCC version: gcc version (Ubuntu ubuntu12)

For information on installing NVIDIA Linux display drivers, please refer to the NVIDIA Accelerated Linux Driver Set README and Installation Guide.

III. SOFTWARE

The software required to run CUDA programs includes the CUDA toolkit, SDK, and debugger. At the time of this writing, the latest version is 2.2. The toolkit contains Nvidia's C compiler (nvcc), assembler (ptxas), debugger (cuda-gdb), and other tools. The SDK is composed of a series of sample CUDA programs and a template for creating user programs.

A. Installation

Nvidia provides *.run self-extracting archives to install these tools. In the following steps, we install the required software and test it by running the SDK examples.

1) Install version 2.2 of the NVIDIA CUDA Toolkit by executing the file NVIDIA_CUDA_Toolkit_2.2-*.run corresponding to your Linux distribution.

   It is recommended that you run the installer as root and use the default install path (/usr/local). Then add the location of the CUDA binaries (such as nvcc) to your PATH environment variable and the location of the CUDA libraries (such as libcuda.so) to your LD_LIBRARY_PATH environment variable. In the bash shell, one way to do this is to add the following lines to the file .bash_profile in your home directory:

       PATH=$PATH:<CUDA_INSTALL_PATH>/bin
       LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<CUDA_INSTALL_PATH>/lib
       export PATH
       export LD_LIBRARY_PATH

2) Install version 2.2 of the NVIDIA CUDA SDK by executing the file NVIDIA_CUDA_SDK_2.2-*.run. The installer will prompt you to enter an installation path for the SDK or accept the default. We will refer to the path you choose as SDK_INSTALL_PATH.

3) Build the SDK project examples:

       cd <SDK_INSTALL_PATH>
       make

4) Run the examples:

       cd <SDK_INSTALL_PATH>/bin/linux32/release
       matrixMul

   (or any of the other executables in that directory)

The SDK package consists of a single .run file, NVIDIA_CUDA_SDK_2.2-*.run (the NVIDIA CUDA SDK installer). It is a self-extracting archive that decompresses its contents to a temporary folder and then installs them to a path that you specify.

B. Checking Software Versions

To confirm that the installation succeeded, or to find out which CUDA version is currently installed, run the following command:

    danlo@etl-corei74b:~$ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) NVIDIA Corporation
    Built on Thu_Apr 9_05:05:52_PDT_2009
    Cuda compilation tools, release 2.2, V

The command nvcc is CUDA's C compiler. Later on, you will use nvcc to compile your CUDA programs.

C. Running CUDA Examples

Before writing any CUDA program, it is highly recommended that you compile and run the sample programs that come with the SDK.
If these sample programs build and run successfully, your system is ready for developing CUDA programs. To build the SDK project examples, follow these steps:

1) Go to <SDK_INSTALL_PATH> (cd <SDK_INSTALL_PATH>).
2) Build:
   - release configuration by typing make

   - debug configuration by typing make dbg=1
   - emurelease configuration by typing make emu=1
   - emudebug configuration by typing make emu=1 dbg=1

Running make at the top level first builds libcutil, a utility library used by the SDK examples (libcutil is simply for convenience; it is not a part of CUDA and is not required for your own CUDA programs). Make then builds each of the projects in the SDK.

NOTES:

The release and debug configurations require a CUDA-capable GPU to run properly (see Appendix A.1 of the CUDA Programming Guide [2] for a complete list of CUDA-capable GPUs). The emurelease and emudebug configurations run in device emulation mode, and therefore do not require a CUDA-capable GPU to run properly.

You can build an individual sample by typing make (or make emu=1, etc.) in that sample's project directory. For example:

    cd <SDK_INSTALL_PATH>/projects/matrixMul
    make emu=1

And then execute the sample with:

    <SDK_INSTALL_PATH>/bin/linux32/emurelease/matrixMul

To build just libcutil, type make (or make dbg=1) in the common subdirectory:

    cd <SDK_INSTALL_PATH>/common
    make

Run the examples from the release, debug, emurelease, or emudebug directories located in <SDK_INSTALL_PATH>/bin/linux32/[release|debug|emurelease|emudebug].

IV. CREATING YOUR OWN CUDA PROGRAM

Creating a new CUDA program using the NVIDIA CUDA SDK infrastructure is easy. Nvidia provides a template project that you can copy and modify to suit your needs. Just follow these steps:

1) Copy the template project:

       cd <SDK_INSTALL_PATH>/projects
       cp -r template <myproject>

2) Rename the files of the project to suit your needs:

       mv template.cu myproject.cu
       mv template_kernel.cu myproject_kernel.cu
       mv template_gold.cpp myproject_gold.cpp

3) Edit the Makefile and source files. Just search and replace all occurrences of template with myproject.

4) Build the project:

       make

   You can build a debug version with make dbg=1, an emulation version with make emu=1, and a debug emulation version with make dbg=1 emu=1.
5) Run the program:

       ../../bin/linux32/release/myproject

   (It should print "Test PASSED".)

6) Now modify the code to perform the computation you require. See the CUDA Programming Guide for details of programming in CUDA.
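When you replace the template's computation in step 6, it helps to keep a CPU reference to compare against, which is presumably the role of the renamed myproject_gold.cpp file. The following is a minimal sketch in plain C of such a comparison. The helper name count_mismatches and the tolerance policy are assumptions made for illustration; they are not part of the SDK.

```c
#include <stddef.h>

/* Absolute value without <math.h>, so no extra link flags are needed. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Hypothetical helper (not an SDK function): count the elements of a GPU
 * result that differ from the CPU reference by more than rel_tol, measured
 * relative to the larger magnitude. The scale has a floor of 1.0, so values
 * near zero are effectively compared with an absolute tolerance. */
size_t count_mismatches(const float *gpu, const float *gold,
                        size_t n, float rel_tol)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        float a = abs_f(gpu[i]);
        float b = abs_f(gold[i]);
        float scale = (a > b) ? a : b;
        if (scale < 1.0f)
            scale = 1.0f;
        if (abs_f(gpu[i] - gold[i]) > rel_tol * scale)
            bad++;
    }
    return bad;
}
```

A return value of zero would correspond to the "Test PASSED" outcome. A relative tolerance is used because GPU and CPU floating-point results are not guaranteed to be bit-identical.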

V. INTEGRATE CUBLAS TO ATLAS

CUBLAS is a CUDA implementation of BLAS. To write programs that call CUBLAS, we have to implement a wrapper that initializes the GPU and the CUDA library and then makes calls to the CUBLAS library. Remember that BLDDIR refers to the build directory of ATLAS. We will create a directory for the wrapper. In this tutorial, I created one in BLDDIR/bin/DanLo/ATL_CUBLAS. Under this directory, I keep one subdirectory per BLAS library function. Since I use sdot as the example, I created a subdirectory sdot, in which all the source files related to sdot reside.

A. Sources for Sdot

The source of sdot.c is listed after the steps below for your reference. Before calling a CUBLAS library function, the GPU and the CUBLAS library have to be initialized. The wrapper to be tested is composed of the following steps:

1) Headers. Two headers are needed: cublas.h and cuda_runtime.h. The former declares the CUBLAS library functions, whereas the latter declares the CUDA runtime API, whose functions start with cuda*. Note that CUDA driver API functions start with cu*. Lines 1 and 2 include the two headers. Line 3 includes util.h, a header in which I keep my own utilities such as defines, wrappers, and the like.

2) Initialize the GPU and the CUBLAS library. There is a cutilDeviceInit() defined in cutil_inline_runtime.h that parses command-line switches and initializes the GPU by calling cudaSetDevice(). For CUBLAS, there are helper functions for device initialization (cublasInit()), cleanup after execution (cublasShutdown()), error handling (cublasGetError()), memory management (cublasAlloc(), cublasFree()), and data movement (cublasSetVector(), cublasGetVector(), cublasSetMatrix(), cublasGetMatrix()). We use cublasInit() to initialize the CUBLAS library and bind CUBLAS to the currently attached GPU. A static variable cublasinitflag guards against multiple initializations. Lines 11-16 implement the initialization.

3) GPU memory allocation. Before we ask the GPU to compute something, we have to request space to hold the data to be processed. Lines 18-23 allocate memory for two vectors on the GPU device. In this implementation, we take advantage of the CUBLAS helper routine cublasAlloc() to request device memory.

4) Data movement from host to device. Once we have device memory, we have to initialize it with the data to be processed. In this implementation (lines 25-29), the CUBLAS helper function cublasSetVector() does the job. The function check_cuda() is a utility I wrote in util.h to verify that the previous library call succeeded. Lines 31-32, commented out, show another way of moving the data using cudaMemcpy().

5) Computation. Line 35 is the statement that actually calls the CUBLAS library routine; in this tutorial, cublasSdot() is called. Since cublasSdot() returns a float [3], a local variable result is used to hold it. Note that the current CUDA implementation (version 2.2) does not support return values for kernel functions. CUBLAS library functions that return a value therefore have to ship the result back to the CPU themselves, which means the following step is carried out inside the CUBLAS library function.

6) Data movement from device to host. Since cublasSdot() returns a float, we do not need to move the result back to the CPU. If the result contains more than one value, e.g., a matrix, we have to move it back to the CPU manually.

7) Housekeeping. After the GPU computation, the allocated memory is freed for the next computation. Here we call cublasFree() to release the space used for the two vectors (lines 38-39). If the memory is not returned, sooner or later the GPU memory will be full of garbage and further memory requests will no longer be granted. So always keep a good habit of housekeeping.
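To check a wrapper built from the steps above independently of the ATLAS tester, a small host-side reference can be useful. The sketch below is plain C and illustrates the strided dot product that sdot is defined to compute. The name ref_sdot is hypothetical, the handling of negative increments follows the usual reference BLAS convention, and accumulating in double is a choice made here for accuracy rather than something the tutorial prescribes.

```c
#include <stddef.h>

/* Hypothetical CPU reference for sdot (not part of ATLAS or CUBLAS):
 * returns the sum over i of x[i*incx] * y[i*incy] for i = 0..n-1.
 * With a negative increment the vector is traversed backwards, starting
 * from its last logical element, as in the reference BLAS. */
float ref_sdot(int n, const float *x, int incx, const float *y, int incy)
{
    if (n <= 0)
        return 0.0f;

    /* Starting offsets: 0 for forward strides, (1-n)*inc for backward. */
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;

    double acc = 0.0;  /* assumption: double accumulation to limit rounding */
    for (int i = 0; i < n; i++) {
        acc += (double)x[ix] * (double)y[iy];
        ix += incx;
        iy += incy;
    }
    return (float)acc;
}
```

Comparing the GPU result against such a reference for a few stride combinations (for example incx = 2 with incy = 1) is a quick way to confirm that the host-to-device copies and the strides passed to the device routine agree with each other.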

     1  #include "cublas.h"
     2  #include "cuda_runtime.h"
     3  #include "util.h"
     4
     5  static int cublasinitflag = 0;
     6
     7  float ATL_CUBLAS_sdot(int n, const float *x, int incx, const float *y, int incy) {
     8      /* Dan Lo */
     9      float result;
    10      float *x_d, *y_d;
    11      /* Dan Lo: init GPU */
    12      if (!cublasinitflag) {
    13          cublasInit();
    14          printf("cublas Init...\n");
    15          cublasinitflag = 1;
    16      }
    17
    18      /* allocate GPU memory */
    19      cublasStatus stat;
    20      stat = cublasAlloc(n, sizeof(*x_d)*incx, (void**)&x_d);
    21      check_cuda(stat, "alloc");
    22      stat = cublasAlloc(n, sizeof(*y_d)*incy, (void**)&y_d);
    23      check_cuda(stat, "alloc");
    24
    25      /* move data */
    26      stat = cublasSetVector(n, sizeof(*x_d), x, incx, x_d, incx);
    27      check_cuda(stat, "setvector");
    28      stat = cublasSetVector(n, sizeof(*y_d), y, incy, y_d, incy);
    29      check_cuda(stat, "setvector");
    30      /*
    31      cudaMemcpy(x_d, x, sizeof(*x_d)*incx*n, cudaMemcpyHostToDevice);
    32      cudaMemcpy(y_d, y, sizeof(*y_d)*incy*n, cudaMemcpyHostToDevice);
    33      */
    34      /* computation: the device copies keep the host strides */
    35      result = cublasSdot(n, x_d, incx, y_d, incy);
    36
    37      /* free GPU memory */
    38      cublasFree(x_d);
    39      cublasFree(y_d);
    40      return result;
    41  }/* ATL_CUBLAS_sdot() */

B. Makefiles

In order to integrate the added code into the ATLAS make system, we have to modify the file BLDDIR/bin/Makefile and add one Makefile for each added subdirectory. Since we will build the level-1 BLAS tester xsl1blastst, the following modification should be made to BLDDIR/bin/Makefile:

    DanLoStuff:
        (cd DanLo;$(MAKE))

    xsl1blastst : sl1blastst.o sl1lib ststlib DanLoStuff
        $(FLINKER) $(FCLINKFLAGS) -o $@ sl1blastst.o \
            $(TESTlib) $(BLASlib) $(ATLASlib) $(LIBS) \
            -lcudart -L/usr/local/cuda/lib -latl_cublas -L./DanLo

The DanLoStuff rule simply tells the make utility to work on the subdirectory DanLo. Therefore, we need to add the file BLDDIR/bin/DanLo/Makefile as follows:

    libatl_cublas.a: MUSTDO
        ar -ru libatl_cublas.a ATL_CUBLAS/sdot/*.o
        ranlib libatl_cublas.a

    MUSTDO:
        (cd ATL_CUBLAS;$(MAKE))

Also add the Makefile BLDDIR/bin/DanLo/ATL_CUBLAS/Makefile as follows:

    MUSTDO:
        (cd sdot;$(MAKE))

Finally, add the following Makefile, BLDDIR/bin/DanLo/ATL_CUBLAS/sdot/Makefile, to actually compile the sdot sources:

    sdot.o: sdot.c util.h
        gcc -g -c sdot.c -I/usr/local/cuda/include

C. ATLAS Source Modification

Because we use ATLAS's timers to test the sdot implementation, the ATLAS source SRCDIR/bin/l1blastst.c has to be modified. We can simply change the define for trusted_dot to our sdot function. The following shows the modification. The original Mjoin (commented out) is replaced with our sdot implementation ATL_CUBLAS_sdot.

    #define trusted_dot( N, X, ix, Y, iy ) \
        ATL_CUBLAS_sdot (N, X, ix, Y, iy )
        //Mjoin( TP1, dot )( N, X, ix, Y, iy )

D. Compilation and Execution

Once the makefiles are in place, change directory to BLDDIR/bin and type the following command:

    make xsl1blastst

To run the test program, simply type the following command:

    ./xsl1blastst -R dot

More options can be found by typing the following:

    ./xsl1blastst help

VI. USER BLAS IMPLEMENTATION ON CUDA

In this section, I show my own implementation of sdot on CUDA. The flow is much the same as the integration of CUBLAS into ATLAS described in the previous section.

A. Directory Structure

The following directory structure is built:

    BLDDIR/bin/DanLo/ATL_CUBLASLO/sdot

B. Makefiles

The Makefiles are similar to the ones used in the previous section, except that any occurrence of CUBLAS should be replaced with CUBLASLO. The one in BLDDIR/bin/DanLo/ATL_CUBLASLO/sdot/Makefile is listed as follows:

    CUDA_INSTALL_PATH := /usr/local/cuda
    ROOTDIR := /home/danlo/nvidia_cuda_sdk
    COMMONDIR := $(ROOTDIR)/common
    INCLUDE := -I. -I$(CUDA_INSTALL_PATH)/include -I$(COMMONDIR)/inc
    # debug -g
    COMMONFLAG := $(INCLUDE) -DUNIX -g
    # debug -D_DEBUG
    NVCCFLAG := -D_DEBUG
    CUBIN_ARCH_FLAG := -m64
    SMVERSION := -arch sm_11
    # if use driver API -lcuda
    LIB := -lcudart
    OBJS := main.cu.o kernel.cu.o util.c.o

    all: $(OBJS)

    %.cu.o :: %.cu
        nvcc $(NVCCFLAG) -c $< $(SMVERSION) $(INCLUDE) $(COMMONFLAG) -o $@

    %.c.o :: %.c
        gcc -g -c $< -o $@

    clean:
        rm $(OBJS)

It is worth noting that this compilation does not involve linking. Therefore, I do not include the CUDA runtime libraries here. They are specified in the file BLDDIR/bin/Makefile as follows:

    xsl1blastst : sl1blastst.o sl1lib ststlib DanLoStuff
        $(FLINKER) $(FCLINKFLAGS) -o $@ sl1blastst.o \
            $(TESTlib) $(BLASlib) $(ATLASlib) \
            $(LIBS) -lcudart -L/usr/local/cuda/lib \
            -latl_cublaslo -L./DanLo

This is where the CUDA runtime library and my own library libatl_cublaslo.a get linked.

C. Sources for Sdot

The implementation of our sdot() function is composed of two sources: main.cu and kernel.cu. Listed below is kernel.cu, which implements two kernels: dot_product_kernel() and sum_reduction(). The first kernel performs the element-wise multiplications, and the second sums the products within each thread block.

    /* This file implements kernel functions for dot product test.
     *
     * Chia-Tien Dan Lo
     * May 1,
     *
     */

    #ifndef KRNEL_CU
    #define KRNEL_CU

    #include <stdio.h>
    #include <global.h>

    __global__ void dot_product_kernel(vector_t *v1, vector_t *v2, vector_t *p) {
        // find which element to work with from block and thread info
        unsigned int idx = blockIdx.x * BLOCK_X + threadIdx.x;
        /* each thread works on a product */
        p[idx] = v1[idx]*v2[idx];
    }/* dot_product_kernel() */

    /* sum each block of v and store the block's partial sum in ps[blockIdx.x] */
    /* assume vector size is multiple of 2*BLOCK_X */
    /* 5/17/2009 remove multiple of 2*BLOCK_X limit */
    __global__ void sum_reduction(vector_t *v, unsigned int n, vector_t *ps) {
        unsigned int t = threadIdx.x;
        unsigned int idx;
        idx = blockIdx.x*BLOCK_X + t;
        for (unsigned int stride = BLOCK_X/2; stride > 0; stride = stride >> 1) {
            if (t < stride) {
                v[idx] += v[idx+stride];
            } /* if */
            __syncthreads();
        }/* for stride */
        /* only this guy will collect partial sum! */
        if (t == 0)
            ps[blockIdx.x] = v[idx];
    }/* sum_reduction() */

    #endif
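The two-kernel scheme can be traced on the CPU before running it on a device. The sketch below is plain C, not CUDA: it pads the product vector with zeros up to a multiple of the block size, runs the same halving-stride reduction once per block, and adds up the per-block partial sums on the host. The block size of 4 and the names round_up_to_block, reduce_block, and emulated_dot are assumptions chosen for illustration; the real BLOCK_X is defined in global.h, which is not shown in this tutorial.

```c
#include <stdlib.h>

#define BLOCK 4u  /* hypothetical block size; must be a power of two */

/* Round n up to the next multiple of BLOCK (valid because BLOCK is 2^k). */
unsigned int round_up_to_block(unsigned int n)
{
    return (n + BLOCK - 1u) & ~(BLOCK - 1u);
}

/* Emulate one thread block of the tree reduction. For each stride, the
 * "threads" with t < stride add in the element stride positions away.
 * Reads and writes never overlap within a step, so running the threads
 * one after another gives the same result as running them in parallel. */
static float reduce_block(float *v)
{
    for (unsigned int stride = BLOCK / 2u; stride > 0u; stride >>= 1)
        for (unsigned int t = 0u; t < stride; t++)
            v[t] += v[t + stride];
    return v[0];
}

/* Products -> zero-padded buffer -> per-block partial sums -> host sum.
 * Returns 0 on success and -1 if the scratch buffer cannot be allocated. */
int emulated_dot(const float *a, const float *b, unsigned int n, float *out)
{
    unsigned int N = round_up_to_block(n);
    float *p = calloc(N ? N : 1u, sizeof *p);  /* padding stays zero */
    if (p == NULL)
        return -1;

    for (unsigned int i = 0u; i < n; i++)      /* dot_product_kernel stage */
        p[i] = a[i] * b[i];

    float result = 0.0f;
    for (unsigned int blk = 0u; blk < N / BLOCK; blk++)  /* one "block" each */
        result += reduce_block(p + blk * BLOCK);

    free(p);
    *out = result;
    return 0;
}
```

The zero padding is what allows vector lengths that are not a multiple of the block size: the extra products are zero and do not change any partial sum.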

The only function defined in main.cu is ATL_CUBLASLO_sdot(), a wrapper similar to the one implemented to call the CUBLAS function. This wrapper initializes the GPU and the CUDA runtime library, allocates device memory, copies data to the device, calls the kernel functions, moves the result back to the CPU, frees any memory allocated, and finally returns the result to the caller.

    /*
     * Dot product operation in libatl_cublaslo.a.
     *
     *
     * Chia-Tien Dan Lo
     * May 16,
     *
     */

    #include <stdio.h>
    #include <cutil_inline.h>
    //#include "oracle.h"
    #include "util.h"
    #include "global.h"
    #include "kernel.h"

    static int cuinitflag = 0;

    /* dot_product_cu starts here */
    /* we may have to init device later!
    int main(int argc, char **argv) { */
    /* note: incx and incy are not used below; unit stride is assumed */
    extern "C" vector_t ATL_CUBLASLO_sdot(const int n, const vector_t *v1, const int incx,
                                          const vector_t *v2, const int incy) {
        /* vector_t v1[n], v2[n], p[n+1]={}; */
        int i;
        vector_t *p, *ps;
        vector_t *zero;
        vector_t result=0;
        unsigned int N;
        /* init GPU */
        if (!cuinitflag) {
            cuinitflag = 1;
            /* set highest device for computation */
            cudaSetDevice( cutGetMaxGflopsDeviceId() );
        }
        /* round up N to multiple of BLOCK_X */
        N = (n + BLOCK_X_1);
        N = N & BLOCK_X_MASK;
        /* allocate memory */
        /*
        v1 = (vector_t*)dan_malloc(sizeof(vector_t)*n);
        v1 = (vector_t*)dan_malloc(sizeof(vector_t)*n);
        */
        /* N+1 elements so that p[n] is valid even when n == N */
        p = (vector_t*)dan_calloc(sizeof(vector_t)*(N+1));
        ps = (vector_t*)dan_calloc(sizeof(vector_t)*(N/BLOCK_X));
        /* used to zero extra GPU memory for both vectors */
        zero = (vector_t*)dan_calloc(sizeof(vector_t)*(N-n));
        p[n] = 0;
        // random generate vectors
        /*
        vector_gen(v1, N, MAX_VALUE);
        vector_gen(v2, N, MAX_VALUE);
        */
        // test out
        // vector_out(v, 100);
        /* init device */
        // CUT_DEVICE_INIT(argc, argv);
        /* allocate device memory */
        vector_t *v1d, *v2d, *pd, *psd;
        cutilSafeCall(cudaMalloc((void**)&v1d, N*sizeof(vector_t)));
        cutilSafeCall(cudaMalloc((void**)&v2d, N*sizeof(vector_t)));
        cutilSafeCall(cudaMalloc((void**)&pd, (N)*sizeof(vector_t)));
        cutilSafeCall(cudaMalloc((void**)&psd, (N/BLOCK_X)*sizeof(vector_t)));
        // create and start timer
        /*
        unsigned int timer = 0;
        cutilCheckError( cutCreateTimer( &timer));
        cutilCheckError( cutStartTimer( timer));
        */
        /* move data to device */
        cutilSafeCall(cudaMemcpy(v1d, v1, n*sizeof(vector_t), cudaMemcpyHostToDevice));
        cutilSafeCall(cudaMemcpy(v2d, v2, n*sizeof(vector_t), cudaMemcpyHostToDevice));
        /* zero the accumulate partial sum */
        cutilSafeCall(cudaMemcpy(psd, ps, (N/BLOCK_X)*sizeof(vector_t), cudaMemcpyHostToDevice));
        cutilSafeCall(cudaMemcpy(v1d+n, zero, (N-n)*sizeof(vector_t), cudaMemcpyHostToDevice));
        cutilSafeCall(cudaMemcpy(v2d+n, zero, (N-n)*sizeof(vector_t), cudaMemcpyHostToDevice));
        /* setup execution parameters */
        dim3 block(BLOCK_X, BLOCK_Y);
        dim3 grid(N/block.x, block.y);
        /* invoke kernel */
        dot_product_kernel<<<grid, block>>>(v1d, v2d, pd);
        sum_reduction<<<grid, block>>>(pd, N, psd);
        /* copy result from device to host */
        cutilSafeCall( cudaMemcpy( ps, psd, (N/BLOCK_X)*sizeof(vector_t), cudaMemcpyDeviceToHost) );
        /* set result */
        for (i=0;i<N/BLOCK_X;i++)
            result += ps[i];

        /* clean memory */
        cudaFree(v1d);
        cudaFree(v2d);
        cudaFree(pd);
        cudaFree(psd);
        free(p);
        free(ps);
        free(zero);
        /* return result */
        return result;
    }/* ATL_CUBLASLO_sdot() */

D. ATLAS Source Modification

Because we use ATLAS's timers to test the sdot implementation, the ATLAS source SRCDIR/bin/l1blastst.c has to be modified. We can simply change the define for trusted_dot to our sdot function. The following shows the modification. The original Mjoin (commented out) is replaced with our sdot implementation ATL_CUBLASLO_sdot.

    #define trusted_dot( N, X, ix, Y, iy ) \
        ATL_CUBLASLO_sdot (N, X, ix, Y, iy )
        //Mjoin( TP1, dot )( N, X, ix, Y, iy )

E. Compilation and Execution

Once the makefiles are in place, change directory to BLDDIR/bin and type the following command:

    make xsl1blastst

To run the test program, simply type the following command:

    ./xsl1blastst -R dot

More options can be found by typing the following:

    ./xsl1blastst help

F. Test Runs

The following test runs show that the implementation is correct (the PASSes) and report the performance of our BLAS (the SpUp column).

    danlo@etl-corei74b: /atlas/atlas /linux_i7_64_build/bin$ ./xsl1blastst -R dot -N

    TST#     N  INCX  INCY    TIME  MFLOP   SpUp   TEST
    ====  ====  ====  ====  ======  =====  =====  =====
                                                   PASS

                                                   PASS
                                                   PASS
                                                   PASS
                                                   PASS
                                                   PASS
                                                   PASS
                                                   PASS
                                                   PASS
                                                   PASS

    10 tests run, 10 passed

REFERENCES

[1] R. C. Whaley, "ATLAS installation guide," Technical Report CS-TR, CS at UTSA.
[2] "Nvidia CUDA programming guide version 2.2," develop.html.
[3] "CUDA CUBLAS library v2.2," develop.html, 2008.


More information

Introduction to the CUDA Toolkit for Building Applications. Adam DeConinck HPC Systems Engineer, NVIDIA

Introduction to the CUDA Toolkit for Building Applications. Adam DeConinck HPC Systems Engineer, NVIDIA Introduction to the CUDA Toolkit for Building Applications Adam DeConinck HPC Systems Engineer, NVIDIA ! What this talk will cover: The CUDA 5 Toolkit as a toolchain for HPC applications, focused on the

More information

Rootbeer: Seamlessly using GPUs from Java

Rootbeer: Seamlessly using GPUs from Java Rootbeer: Seamlessly using GPUs from Java Phil Pratt-Szeliga. Dr. Jim Fawcett. Dr. Roy Welch. Syracuse University. Rootbeer Overview and Motivation Rootbeer allows a developer to program a GPU in Java

More information

GIVE WINGS TO YOUR IDEAS TUTORIAL

GIVE WINGS TO YOUR IDEAS TUTORIAL GIVE WINGS TO YOUR IDEAS TUTORIAL PLUG IN TO THE WIRELESS WORLD Tutorial Version: 001 / 1.0 Date: October 30, 2001 Reference: WM_SW_OAT_UGD_001 confidential Page: 1 / 18 (THIS PAGE IS INTENTIONALY LEFT

More information

TNM093 Practical Data Visualization and Virtual Reality Laboratory Platform

TNM093 Practical Data Visualization and Virtual Reality Laboratory Platform October 6, 2015 1 Introduction The laboratory exercises in this course are to be conducted in an environment that might not be familiar to many of you. It is based on open source software. We use an open

More information

PetaLinux SDK User Guide. Application Development Guide

PetaLinux SDK User Guide. Application Development Guide PetaLinux SDK User Guide Application Development Guide Notice of Disclaimer The information disclosed to you hereunder (the "Materials") is provided solely for the selection and use of Xilinx products.

More information

Setting up PostgreSQL

Setting up PostgreSQL Setting up PostgreSQL 1 Introduction to PostgreSQL PostgreSQL is an object-relational database management system based on POSTGRES, which was developed at the University of California at Berkeley. PostgreSQL

More information

CUDAMat: a CUDA-based matrix class for Python

CUDAMat: a CUDA-based matrix class for Python Department of Computer Science 6 King s College Rd, Toronto University of Toronto M5S 3G4, Canada http://learning.cs.toronto.edu fax: +1 416 978 1455 November 25, 2009 UTML TR 2009 004 CUDAMat: a CUDA-based

More information

Supported platforms & compilers Required software Where to download the packages Geant4 toolkit installation (release 9.6)

Supported platforms & compilers Required software Where to download the packages Geant4 toolkit installation (release 9.6) Supported platforms & compilers Required software Where to download the packages Geant4 toolkit installation (release 9.6) Configuring the environment manually Using CMake CLHEP full version installation

More information

Code Estimation Tools Directions for a Services Engagement

Code Estimation Tools Directions for a Services Engagement Code Estimation Tools Directions for a Services Engagement Summary Black Duck software provides two tools to calculate size, number, and category of files in a code base. This information is necessary

More information

How To Run A Password Manager On A 32 Bit Computer (For 64 Bit) On A 64 Bit Computer With A Password Logger (For 32 Bit) (For Linux) ( For 64 Bit (Foramd64) (Amd64 (For Pc

How To Run A Password Manager On A 32 Bit Computer (For 64 Bit) On A 64 Bit Computer With A Password Logger (For 32 Bit) (For Linux) ( For 64 Bit (Foramd64) (Amd64 (For Pc SafeNet Authentication Client (Linux) Administrator s Guide Version 8.1 Revision A Copyright 2011, SafeNet, Inc. All rights reserved. All attempts have been made to make the information in this document

More information

PKCS #11 opencryptoki for Linux HOWTO

PKCS #11 opencryptoki for Linux HOWTO PKCS #11 opencryptoki for Linux HOWTO Kristin Thomas kristint@us.ibm.com This HOWTO describes the implementation of the RSA Security Inc. Public Key Cryptographic Standard #11 (PKCS #11) cryptoki application

More information

Andreas Burghart 6 October 2014 v1.0

Andreas Burghart 6 October 2014 v1.0 Yocto Qt Application Development Andreas Burghart 6 October 2014 Contents 1.0 Introduction... 3 1.1 Qt for Embedded Linux... 3 1.2 Outline... 4 1.3 Assumptions... 5 1.4 Corrections... 5 1.5 Version...

More information

Table of Contents. Overview... 1. Features... 1. Applications... 1. Hardware requirement... 1. Card dimensions... 1. Software Installation...

Table of Contents. Overview... 1. Features... 1. Applications... 1. Hardware requirement... 1. Card dimensions... 1. Software Installation... Table of Contents Overview... 1 Features... 1 Applications... 1 Hardware requirement... 1 Card dimensions... 1 Software Installation... 1 Software Configuration... 4 E1/T1/MFCR2 mode settings... 4 E1 Mode...

More information

Leak Check Version 2.1 for Linux TM

Leak Check Version 2.1 for Linux TM Leak Check Version 2.1 for Linux TM User s Guide Including Leak Analyzer For x86 Servers Document Number DLC20-L-021-1 Copyright 2003-2009 Dynamic Memory Solutions LLC www.dynamic-memory.com Notices Information

More information

Zynq-7000 Platform Software Development Using the ARM DS-5 Toolchain Authors: Simon George and Prushothaman Palanichamy

Zynq-7000 Platform Software Development Using the ARM DS-5 Toolchain Authors: Simon George and Prushothaman Palanichamy Application Note: Zynq-7000 All Programmable Soc XAPP1185 (v2.0) May 6, 2014 Zynq-7000 Platform Software Development Using the ARM DS-5 Toolchain Authors: Simon George and Prushothaman Palanichamy Summary

More information

Introduction to CUDA C

Introduction to CUDA C Introduction to CUDA C What is CUDA? CUDA Architecture Expose general-purpose GPU computing as first-class capability Retain traditional DirectX/OpenGL graphics performance CUDA C Based on industry-standard

More information

What is CUDA?... 3. Why do I care about CUDA?... 3. What is CUDA not?... 4. Getting started... 5. Nvidia-Drivers... 5. Overclocking...

What is CUDA?... 3. Why do I care about CUDA?... 3. What is CUDA not?... 4. Getting started... 5. Nvidia-Drivers... 5. Overclocking... 1 Table of Contents What is CUDA?... 3 Supported GPUs...3 Why do I care about CUDA?... 3 Where can I get this CUDA thing?. 4 What is CUDA not?... 4 Getting started... 5 Nvidia-Drivers.... 5 Overclocking...

More information

SheevaPlug Development Kit README Rev. 1.2

SheevaPlug Development Kit README Rev. 1.2 SheevaPlug Development Kit README Rev. 1.2 Introduction... 3 Flow to use the Software Development Kit packages... 3 Appendix A... 5 GCC cross-compiler... 5 Appendix B... 6 Mini-USB debug driver installation

More information

Code::Blocks Student Manual

Code::Blocks Student Manual Code::Blocks Student Manual Lawrence Goetz, Network Administrator Yedidyah Langsam, Professor and Theodore Raphan, Distinguished Professor Dept. of Computer and Information Science Brooklyn College of

More information

ANDROID DEVELOPER TOOLS TRAINING GTC 2014. Sébastien Dominé, NVIDIA

ANDROID DEVELOPER TOOLS TRAINING GTC 2014. Sébastien Dominé, NVIDIA ANDROID DEVELOPER TOOLS TRAINING GTC 2014 Sébastien Dominé, NVIDIA AGENDA NVIDIA Developer Tools Introduction Multi-core CPU tools Graphics Developer Tools Compute Developer Tools NVIDIA Developer Tools

More information

Outside In Image Export Technology SDK Quick Start Guide

Outside In Image Export Technology SDK Quick Start Guide Reference: 2009/02/06-8.3 Outside In Image Export Technology SDK Quick Start Guide This document provides an overview of the Outside In Image Export Software Developer s Kit (SDK). It includes download

More information

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint) TN203 Porting a Program to Dynamic C Introduction Dynamic C has a number of improvements and differences compared to many other C compiler systems. This application note gives instructions and suggestions

More information

Embedded Software Development

Embedded Software Development Linköpings Tekniska Högskola Institutionen för Datavetanskap (IDA), Software and Systems (SaS) TDDI11, Embedded Software 2010-04-22 Embedded Software Development Host and Target Machine Typical embedded

More information

The embedded Linux quick start guide lab notes

The embedded Linux quick start guide lab notes The embedded Linux quick start guide lab notes Embedded Linux Conference Europe 2010 Date: Tuesday 26th October Location: DeVere University of Arms Hotel, Cambridge Room: Churchill Suite Presenter: Chris

More information

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING NVIDIA DEVELOPER TOOLS BUILD. DEBUG. PROFILE. C/C++ IDE INTEGRATION STANDALONE TOOLS HARDWARE SUPPORT CPU AND GPU DEBUGGING & PROFILING

More information

Lazy OpenCV installation and use with Visual Studio

Lazy OpenCV installation and use with Visual Studio Lazy OpenCV installation and use with Visual Studio Overview This tutorial will walk you through: How to install OpenCV on Windows, both: The pre-built version (useful if you won t be modifying the OpenCV

More information

CSC230 Getting Starting in C. Tyler Bletsch

CSC230 Getting Starting in C. Tyler Bletsch CSC230 Getting Starting in C Tyler Bletsch What is C? The language of UNIX Procedural language (no classes) Low-level access to memory Easy to map to machine language Not much run-time stuff needed Surprisingly

More information

GIVE WINGS TO YOUR IDEAS TOOLS MANUAL

GIVE WINGS TO YOUR IDEAS TOOLS MANUAL GIVE WINGS TO YOUR IDEAS TOOLS MANUAL PLUG IN TO THE WIRELESS WORLD Version: 001 / 1.0 Date: October 30, 2001 Reference: WM_TOO_OAT_UGD_001 confidential Page: 1 / 22 (THIS PAGE IS INTENTIONALY LEFT BLANK)

More information

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE Tuyou Peng 1, Jun Peng 2 1 Electronics and information Technology Department Jiangmen Polytechnic, Jiangmen, Guangdong, China, typeng2001@yahoo.com

More information

IBM 4765 PCIe Cryptographic Coprocessor Custom Software Developer's Toolkit Guide

IBM 4765 PCIe Cryptographic Coprocessor Custom Software Developer's Toolkit Guide IBM 4765 PCIe Cryptographic Coprocessor Custom Software Developer's Toolkit Guide Note: Before using this information and the products it supports, be sure to read the general information under Notices

More information

Retour d expérience : portage d une application haute-performance vers un langage de haut niveau

Retour d expérience : portage d une application haute-performance vers un langage de haut niveau Retour d expérience : portage d une application haute-performance vers un langage de haut niveau ComPAS/RenPar 2013 Mathias Bourgoin - Emmanuel Chailloux - Jean-Luc Lamotte 16 Janvier 2013 Our Goals Globally

More information

Università Degli Studi di Parma. Distributed Systems Group. Android Development. Lecture 1 Android SDK & Development Environment. Marco Picone - 2012

Università Degli Studi di Parma. Distributed Systems Group. Android Development. Lecture 1 Android SDK & Development Environment. Marco Picone - 2012 Android Development Lecture 1 Android SDK & Development Environment Università Degli Studi di Parma Lecture Summary - 2 The Android Platform Android Environment Setup SDK Eclipse & ADT SDK Manager Android

More information

PrimeRail Installation Notes Version A-2008.06 June 9, 2008 1

PrimeRail Installation Notes Version A-2008.06 June 9, 2008 1 PrimeRail Installation Notes Version A-2008.06 June 9, 2008 1 These installation notes present information about installing PrimeRail version A-2008.06 in the following sections: Media Availability and

More information

Cross-Platform GP with Organic Vectory BV Project Services Consultancy Services Expertise Markets 3D Visualization Architecture/Design Computing Embedded Software GIS Finance George van Venrooij Organic

More information

IUCLID 5 Guidance and support. Installation Guide Distributed Version. Linux - Apache Tomcat - PostgreSQL

IUCLID 5 Guidance and support. Installation Guide Distributed Version. Linux - Apache Tomcat - PostgreSQL IUCLID 5 Guidance and support Installation Guide Distributed Version Linux - Apache Tomcat - PostgreSQL June 2009 Legal Notice Neither the European Chemicals Agency nor any person acting on behalf of the

More information

GRID VGPU FOR VMWARE VSPHERE

GRID VGPU FOR VMWARE VSPHERE GRID VGPU FOR VMWARE VSPHERE DU-07354-001 March 2015 Quick Start Guide DOCUMENT CHANGE HISTORY DU-07354-001 Version Date Authors Description of Change 0.1 7/1/2014 AC Initial draft for vgpu early access

More information

10 STEPS TO YOUR FIRST QNX PROGRAM. QUICKSTART GUIDE Second Edition

10 STEPS TO YOUR FIRST QNX PROGRAM. QUICKSTART GUIDE Second Edition 10 STEPS TO YOUR FIRST QNX PROGRAM QUICKSTART GUIDE Second Edition QNX QUICKSTART GUIDE A guide to help you install and configure the QNX Momentics tools and the QNX Neutrino operating system, so you can

More information

Android Setup Phase 2

Android Setup Phase 2 Android Setup Phase 2 Instructor: Trish Cornez CS260 Fall 2012 Phase 2: Install the Android Components In this phase you will add the Android components to the existing Java setup. This phase must be completed

More information

Code::Block manual. for CS101x course. Department of Computer Science and Engineering Indian Institute of Technology - Bombay Mumbai - 400076.

Code::Block manual. for CS101x course. Department of Computer Science and Engineering Indian Institute of Technology - Bombay Mumbai - 400076. Code::Block manual for CS101x course Department of Computer Science and Engineering Indian Institute of Technology - Bombay Mumbai - 400076. April 9, 2014 Contents 1 Introduction 1 1.1 Code::Blocks...........................................

More information

Table of Contents. The RCS MINI HOWTO

Table of Contents. The RCS MINI HOWTO Table of Contents The RCS MINI HOWTO...1 Robert Kiesling...1 1. Overview of RCS...1 2. System requirements...1 3. Compiling RCS from Source...1 4. Creating and maintaining archives...1 5. ci(1) and co(1)...1

More information

TESLA C2050/2070 COMPUTING PROCESSOR INSTALLATION GUIDE

TESLA C2050/2070 COMPUTING PROCESSOR INSTALLATION GUIDE TESLA C2050/2070 COMPUTING PROCESSOR INSTALLATION GUIDE TESLA C2050 INSTALLATION GUIDE NVIDIA Tesla C2050/2070 TABLE OF CONTENTS TABLE OF CONTENTS Introduction 1 About This Guide 1 Minimum System Requirements

More information

INTEL PARALLEL STUDIO EVALUATION GUIDE. Intel Cilk Plus: A Simple Path to Parallelism

INTEL PARALLEL STUDIO EVALUATION GUIDE. Intel Cilk Plus: A Simple Path to Parallelism Intel Cilk Plus: A Simple Path to Parallelism Compiler extensions to simplify task and data parallelism Intel Cilk Plus adds simple language extensions to express data and task parallelism to the C and

More information

LSN 10 Linux Overview

LSN 10 Linux Overview LSN 10 Linux Overview ECT362 Operating Systems Department of Engineering Technology LSN 10 Linux Overview Linux Contemporary open source implementation of UNIX available for free on the Internet Introduced

More information

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

RWTH GPU Cluster. Sandra Wienke wienke@rz.rwth-aachen.de November 2012. Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky RWTH GPU Cluster Fotos: Christian Iwainsky Sandra Wienke wienke@rz.rwth-aachen.de November 2012 Rechen- und Kommunikationszentrum (RZ) The RWTH GPU Cluster GPU Cluster: 57 Nvidia Quadro 6000 (Fermi) innovative

More information

OpenGeo Suite for Linux Release 3.0

OpenGeo Suite for Linux Release 3.0 OpenGeo Suite for Linux Release 3.0 OpenGeo October 02, 2012 Contents 1 Installing OpenGeo Suite on Ubuntu i 1.1 Installing OpenGeo Suite Enterprise Edition............................... ii 1.2 Upgrading.................................................

More information

Installation Guide for Basler pylon 2.3.x for Linux

Installation Guide for Basler pylon 2.3.x for Linux Installation Guide for Basler pylon 2.3.x for Linux Version 2.3.x Document ID Number: AW00100401000 Revision Date: May 27, 2011 Subject to Change Without Notice Basler Vision Technologies Installation

More information

Freescale Semiconductor, I

Freescale Semiconductor, I nc. Application Note 6/2002 8-Bit Software Development Kit By Jiri Ryba Introduction 8-Bit SDK Overview This application note describes the features and advantages of the 8-bit SDK (software development

More information

In-System Programmer USER MANUAL RN-ISP-UM RN-WIFLYCR-UM-.01. www.rovingnetworks.com 1

In-System Programmer USER MANUAL RN-ISP-UM RN-WIFLYCR-UM-.01. www.rovingnetworks.com 1 RN-WIFLYCR-UM-.01 RN-ISP-UM In-System Programmer 2012 Roving Networks. All rights reserved. Version 1.1 1/19/2012 USER MANUAL www.rovingnetworks.com 1 OVERVIEW You use Roving Networks In-System-Programmer

More information

INTEGRAL OFF-LINE SCIENTIFIC ANALYSIS

INTEGRAL OFF-LINE SCIENTIFIC ANALYSIS I N T E G R A L C S E C N I T E R N E C E D A INTEGRAL OFF-LINE SCIENTIFIC ANALYSIS INSTALLATION GUIDE Issue 10.2 December 2015 INTEGRAL Science Data Centre Chemin d Ecogia 16 CH-1290 Versoix isdc.unige.ch

More information

Before entering the lab, make sure that you have your own UNIX and PC accounts and that you can log into them.

Before entering the lab, make sure that you have your own UNIX and PC accounts and that you can log into them. 1 Objective Texas A&M University College of Engineering Computer Science Department CPSC 321:501 506 Computer Architecture Fall Semester 2004 Lab1 Introduction to SPIM Simulator for the MIPS Assembly Language

More information

GPU Profiling with AMD CodeXL

GPU Profiling with AMD CodeXL GPU Profiling with AMD CodeXL Software Profiling Course Hannes Würfel OUTLINE 1. Motivation 2. GPU Recap 3. OpenCL 4. CodeXL Overview 5. CodeXL Internals 6. CodeXL Profiling 7. CodeXL Debugging 8. Sources

More information

NVIDIA CUDA INSTALLATION GUIDE FOR MICROSOFT WINDOWS

NVIDIA CUDA INSTALLATION GUIDE FOR MICROSOFT WINDOWS NVIDIA CUDA INSTALLATION GUIDE FOR MICROSOFT WINDOWS DU-05349-001_v7.5 September 2015 Installation and Verification on Windows TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements...

More information

================================================================== CONTENTS ==================================================================

================================================================== CONTENTS ================================================================== Disney Epic Mickey 2 : The Power of Two Read Me File ( Disney) Thank you for purchasing Disney Epic Mickey 2 : The Power of Two. This readme file contains last minute information that did not make it into

More information

Introduction to NaviGenie SDK Client API for Android

Introduction to NaviGenie SDK Client API for Android Introduction to NaviGenie SDK Client API for Android Overview 3 Data access solutions. 3 Use your own data in a highly optimized form 3 Hardware acceleration support.. 3 Package contents.. 4 Libraries.

More information

CUDA Programming. Week 4. Shared memory and register

CUDA Programming. Week 4. Shared memory and register CUDA Programming Week 4. Shared memory and register Outline Shared memory and bank confliction Memory padding Register allocation Example of matrix-matrix multiplication Homework SHARED MEMORY AND BANK

More information

Partek Flow Installation Guide

Partek Flow Installation Guide Partek Flow Installation Guide Partek Flow is a web based application for genomic data analysis and visualization, which can be installed on a desktop computer, compute cluster or cloud. Users can access

More information

Installing Java. Table of contents

Installing Java. Table of contents Table of contents 1 Jargon...3 2 Introduction...4 3 How to install the JDK...4 3.1 Microsoft Windows 95... 4 3.1.1 Installing the JDK... 4 3.1.2 Setting the Path Variable...5 3.2 Microsoft Windows 98...

More information

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase

More information

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon

More information

Waspmote IDE. User Guide

Waspmote IDE. User Guide Waspmote IDE User Guide Index Document Version: v4.1-01/2014 Libelium Comunicaciones Distribuidas S.L. INDEX 1. Introduction... 3 1.1. New features...3 1.2. Other notes...3 2. Installation... 4 2.1. Windows...4

More information

DE4 NetFPGA Packet Generator Design User Guide

DE4 NetFPGA Packet Generator Design User Guide DE4 NetFPGA Packet Generator Design User Guide Revision History Date Comment Author 01/30/2012 Initial draft Harikrishnan Contents 1. Introduction... 4 2. System Requirements... 4 3. Installing DE4 NetFPGA

More information

Using C to Access Data Stored in Program Space Memory on the TMS320C24x DSP

Using C to Access Data Stored in Program Space Memory on the TMS320C24x DSP Application Report SPRA380 April 2002 Using C to Access Data Stored in Program Space Memory on the TMS320C24x DSP David M. Alter DSP Applications - Semiconductor Group ABSTRACT Efficient utilization of

More information

SAS 9.4 In-Database Products

SAS 9.4 In-Database Products SAS 9.4 In-Database Products Administrator s Guide Fifth Edition SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS 9.4 In-Database Products:

More information

1 The Installation of GPGPUsim

1 The Installation of GPGPUsim 1 The Installation of GPGPUsim I. System Environment. I installed GPGPUsim along with CUDA in my virtual machine system. The system environment is given as following Host OS Mac OS X 10.9 Host CPU Intel

More information

APPLICATION NOTE. How to build pylon applications for ARM

APPLICATION NOTE. How to build pylon applications for ARM APPLICATION NOTE Version: 01 Language: 000 (English) Release Date: 31 January 2014 Application Note Table of Contents 1 Introduction... 2 2 Steps... 2 1 Introduction This document explains how pylon applications

More information

Le langage OCaml et la programmation des GPU

Le langage OCaml et la programmation des GPU Le langage OCaml et la programmation des GPU GPU programming with OCaml Mathias Bourgoin - Emmanuel Chailloux - Jean-Luc Lamotte Le projet OpenGPU : un an plus tard Ecole Polytechnique - 8 juin 2011 Outline

More information

An Embedded Wireless Mini-Server with Database Support

An Embedded Wireless Mini-Server with Database Support An Embedded Wireless Mini-Server with Database Support Hungchi Chang, Sy-Yen Kuo and Yennun Huang Department of Electrical Engineering National Taiwan University Taipei, Taiwan, R.O.C. Abstract Due to

More information

Informatica e Sistemi in Tempo Reale

Informatica e Sistemi in Tempo Reale Informatica e Sistemi in Tempo Reale Introduction to C programming Giuseppe Lipari http://retis.sssup.it/~lipari Scuola Superiore Sant Anna Pisa October 25, 2010 G. Lipari (Scuola Superiore Sant Anna)

More information

An Implementation Of Multiprocessor Linux

An Implementation Of Multiprocessor Linux An Implementation Of Multiprocessor Linux This document describes the implementation of a simple SMP Linux kernel extension and how to use this to develop SMP Linux kernels for architectures other than

More information

Android: How To. Thanks. Aman Nijhawan

Android: How To. Thanks. Aman Nijhawan Android: How To. This is just a collection of useful information and tricks that I used during the time I was developing on the android ADP1. In some cases the information might be a little old and new

More information

Installing Eclipse C++ for Windows

Installing Eclipse C++ for Windows Installing Eclipse C++ for Windows I. Introduction... 2 II. Installing and/or Enabling the 32-bit JRE (Java Runtime Environment)... 2 A. Windows 32-bit Operating System Environment... 2 B. Windows 64-bit

More information

How To Install Acronis Backup & Recovery 11.5 On A Linux Computer

How To Install Acronis Backup & Recovery 11.5 On A Linux Computer Acronis Backup & Recovery 11.5 Server for Linux Update 2 Installation Guide Copyright Statement Copyright Acronis International GmbH, 2002-2013. All rights reserved. Acronis and Acronis Secure Zone are

More information

Tutorial. Reference http://www.openflowswitch.org/foswiki/bin/view/openflow/mininetgettingstarted for more thorough Mininet walkthrough if desired

Tutorial. Reference http://www.openflowswitch.org/foswiki/bin/view/openflow/mininetgettingstarted for more thorough Mininet walkthrough if desired Setup Tutorial Reference http://www.openflowswitch.org/foswiki/bin/view/openflow/mininetgettingstarted for more thorough Mininet walkthrough if desired Necessary Downloads 1. Download VM at http://www.cs.princeton.edu/courses/archive/fall10/cos561/assignments/cos561tutorial.zip

More information

Procedure to Create and Duplicate Master LiveUSB Stick

Procedure to Create and Duplicate Master LiveUSB Stick Procedure to Create and Duplicate Master LiveUSB Stick A. Creating a Master LiveUSB stick using 64 GB USB Flash Drive 1. Formatting USB stick having Linux partition (skip this step if you are using a new

More information

Acronis Backup & Recovery 10 Server for Linux. Update 5. Installation Guide

Acronis Backup & Recovery 10 Server for Linux. Update 5. Installation Guide Acronis Backup & Recovery 10 Server for Linux Update 5 Installation Guide Table of contents 1 Before installation...3 1.1 Acronis Backup & Recovery 10 components... 3 1.1.1 Agent for Linux... 3 1.1.2 Management

More information