Large-scale Virtual Acoustics Simulation at Audio Rates Using Three Dimensional Finite Difference Time Domain and Multiple GPUs
Craig J. Webb (1,2) and Alan Gray (2)
(1) Acoustics Group, University of Edinburgh
(2) Edinburgh Parallel Computing Centre, University of Edinburgh
C.J.Webb-2@sms.ed.ac.uk

To be presented at the 21st International Congress on Acoustics, Montréal, Canada, 2013

Abstract

The computation of large-scale virtual acoustics using the 3D finite difference time domain (FDTD) method is prohibitively computationally expensive, especially at high audio sample rates, when using traditional CPUs. In recent years the computer gaming industry has driven the development of extremely powerful Graphics Processing Units (GPUs). Through specialised development and tuning we can exploit the highly parallel GPU architecture to make such FDTD computations feasible. This paper describes the simultaneous use of multiple NVIDIA GPUs to compute schemes containing over a billion grid points. We examine the use of asynchronous halo transfers between cards, to hide the latency involved in transferring data, and overall computation time is considered with respect to variation in the size of the partition layers. As hardware memory poses limitations on the size of the room to be rendered, we also investigate the use of single precision arithmetic. This allows twice the domain space, compared with double precision, but results in phase shifting of the output with possible audible artefacts. Using these techniques, large-scale spaces of several thousand cubic metres can be computed at 44.1 kHz in a usable time frame, making their use in room acoustics rendering and auralization applications possible in the near future.

INTRODUCTION

High fidelity virtual room acoustics can be approached through direct numerical simulation of wave propagation in a defined space. Unlike ray-based [1] or image source [2] techniques, this approach seeks to model the entire acoustic field within the simulation domain. Three-dimensional Finite Difference Time Domain (FDTD) schemes can be employed; however, at audio sample rates such schemes are extremely computationally expensive [3], prohibitively so for serial computation.

Recent advances in graphics processing unit (GPU) architectures allow for general purpose computation to be performed on these forms of highly parallel hardware. Whilst central processing units (CPUs) may contain a small number of cores, such as four or eight, GPUs contain hundreds of processing cores that can be used to perform parallel computation. Using this architecture, the data independence of FDTD schemes can be leveraged to gain significant acceleration over single-threaded implementations [4], and this allows large-scale simulations to be computed in time scales that are actually usable for performing research.

For scientific computing, Nvidia's Tesla GPUs are typically used in a workstation or compute node that can be configured with four GPUs connected across the same PCIe bus. This paper examines the simultaneous use of these four-GPU systems to render virtual acoustic simulations. This allows greater acceleration of existing models, or the combined use of all available memory across four GPUs to render large-scale domains containing billions of grid points. Recent versions of the CUDA language facilitate this process [5], without recourse to MPI programming techniques.

The first section details the FDTD schemes being used, followed by an outline of the CUDA programming model for the simultaneous use of multiple GPUs. We then describe the implementation of the schemes using both non-asynchronous and asynchronous approaches.
Finally, we detail experimental testing in terms of floating-point precision and overall computation times for various configurations, including large-scale simulations that use maximum memory.

VIRTUAL ACOUSTICS USING FINITE DIFFERENCE METHOD

The starting point for acoustic FDTD simulations is the 3D wave equation, which in second order form is given by:

\[ \frac{\partial^2 \Psi}{\partial t^2} = c^2 \nabla^2 \Psi \tag{1} \]

Here $\Psi$ is the target acoustical field quantity, $c$ is the wave speed in air, and $\nabla^2$ is the 3D Laplacian. Simple first-order boundary conditions are used, where:

\[ \frac{\partial \Psi}{\partial t} = c \beta \, \mathbf{n} \cdot \nabla \Psi \tag{2} \]

Here $\mathbf{n}$ is a unit normal to a wall or obstacle, and $\beta$ is an absorption coefficient. The standard FDTD discretisation leads to the following update equation, which includes boundary loss terms using a single reflection coefficient:

\[ w^{n+1}_{l,m,p} = \frac{1}{1+\lambda\beta} \left( (2 - K\lambda^2)\, w^{n}_{l,m,p} + \lambda^2 S^{n}_{l,m,p} - (1 - \lambda\beta)\, w^{n-1}_{l,m,p} \right) \tag{3} \]

where $w_{l,m,p}$ is the discrete acoustic field, $K$ is 6 in free space, 5 at a face, 4 at an edge and 3 at a corner, $\lambda = cT/X$, $\beta$ is the coefficient for boundary reflection losses, and $S^{n}_{l,m,p}$ is the sum of the six nearest neighbours:

\[ S^{n}_{l,m,p} = w^{n}_{l+1,m,p} + w^{n}_{l-1,m,p} + w^{n}_{l,m+1,p} + w^{n}_{l,m-1,p} + w^{n}_{l,m,p+1} + w^{n}_{l,m,p-1} \]

The stability condition for the scheme follows from von Neumann analysis [6], such that for a given time step $T$ the grid spacing $X$ must satisfy:

\[ X \geq \sqrt{3 c^2 T^2} \tag{4} \]

The basic scheme can be extended to include the effect of viscosity, which gives a frequency dependent damping [4]:

\[ \frac{\partial^2 \Psi}{\partial t^2} = c^2 \nabla^2 \Psi + c \alpha \nabla^2 \frac{\partial \Psi}{\partial t} \tag{5} \]
Here $\alpha$ is a viscosity coefficient. This leads to an update equation of the form:

\[ w^{n+1}_{l,m,p} = \frac{1}{1+\lambda\beta} \left( (2 - K\lambda^2)\, w^{n}_{l,m,p} + \lambda^2 S^{n}_{l,m,p} - (1 - \lambda\beta)\, w^{n-1}_{l,m,p} + c k \alpha \left( S^{n}_{l,m,p} - S^{n-1}_{l,m,p} \right) \right) \tag{6} \]

Note that this update uses the nearest neighbours from two time steps ago, thus requiring the use of three data grids. The basic scheme, which only uses the centre point from two time steps ago, can be implemented using only two grids and a read, then overwrite procedure. These systems are referred to as the "2 Grid" and "3 Grid" schemes throughout this paper.

PARALLEL COMPUTING AND USE OF MULTIPLE GPUS IN CUDA

CUDA is Nvidia's programming architecture for implementing highly-threaded GPU code. In a serial implementation of equation 3, loops would be used to iterate over the computation domain, applying the update equation to each grid point. In CUDA, we issue a large number of kernel threads that implement the SIMD (Single Instruction Multiple Data) operation, and these threads are scheduled to execute using a large number of parallel processing cores.

The program code contains a mixture of host serial C code and device CUDA code. Whilst the host uses standard CPU memory, the device has multiple memory types. Global memory is the core data store, and is typically in the range of 3 to 6 GB per GPU. CUDA threads make use of local register memory, and also have a small amount of shared memory per thread block. They can also communicate directly with the global and (read-only) constant memory. Each type has different access speeds, with global memory being the slowest, and shared and constant memory being fast. With a four GPU server, we have four instances of this memory model which are independent. The GPUs are connected in a pair-wise manner over the PCIe bus, as shown in figure 1.

FIGURE 1: PCIe connections for four GPUs in a single compute node.

Version four and above of the CUDA architecture contains functionality that allows multiple GPUs that are connected in such a manner to be used concurrently [5].
Peer-to-peer communication allows data transfer between GPUs that bypasses the host altogether (transferring data from device to host, then back to another device, is an expensive operation). This can be combined with the use of multiple streams of execution and asynchronous scheduling to achieve scalable speedups when using multiple GPUs.

IMPLEMENTATION OF THREE-DIMENSIONAL FDTD SCHEMES

This section gives a detailed description of the implementation of the basic 2 Grid FDTD scheme with its first-order boundaries. We start with a single GPU version, then extend this to an initial multiple GPU implementation. This is then developed to include the use of asynchronous data transfers of the partition halos.
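
As a point of reference before the GPU versions, the serial form of equation 3 mentioned earlier can be sketched in plain host C. This is a minimal sketch under stated assumptions rather than the authors' code: the function names `boundary_k` and `update_2grid`, the flat array layout (row-major layers stacked along Z, with one layer of non-updated ghost points on every side), and the choice to pass λ and β as arguments are all illustrative. Loss terms are applied only where K < 6, and the second grid is read, then overwritten.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of neighbours that lie inside the room.
   Updated points have indices 1..N-2; indices 0 and N-1 are ghost points.
   Each comparison contributes 1 when the point is NOT on that wall, so the
   result is 6 in free space, 5 at a face, 4 at an edge and 3 at a corner. */
int boundary_k(int x, int y, int z, int nx, int ny, int nz)
{
    return (x != 1) + (x != nx - 2) + (y != 1) + (y != ny - 2)
         + (z != 1) + (z != nz - 2);
}

/* One time step of equation 3 (2 Grid scheme).
   On entry u holds w^(n-1) and u1 holds w^n; on exit u holds w^(n+1). */
void update_2grid(double *u, const double *u1, int nx, int ny, int nz,
                  double lambda, double beta)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    const size_t area = (size_t)nx * (size_t)ny;   /* points per Z layer */
    const double l2 = lambda * lambda;

    for (int z = 1; z < nz - 1; z++) {
        for (int y = 1; y < ny - 1; y++) {
            for (int x = 1; x < nx - 1; x++) {
                size_t cp = (size_t)z * area + (size_t)y * (size_t)nx + (size_t)x;
                int k = boundary_k(x, y, z, nx, ny, nz);
                /* Boundary loss only applies when at least one wall is adjacent. */
                double loss = (k < 6) ? lambda * beta : 0.0;
                /* Sum of the six nearest neighbours; ghost points stay at zero. */
                double s = u1[cp - 1] + u1[cp + 1]
                         + u1[cp - nx] + u1[cp + nx]
                         + u1[cp - area] + u1[cp + area];
                u[cp] = ((2.0 - k * l2) * u1[cp] + l2 * s - (1.0 - loss) * u[cp])
                        / (1.0 + loss);
            }
        }
    }
}
```

After each call the caller would swap the `u` and `u1` pointers, as in the time loops shown in the listings. The triple loop is the part that the CUDA kernels replace with one thread per grid point.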

Single GPU implementation

The CUDA programming model makes use of threads that are grouped first into blocks, and then into a grid. Both of these objects can be one, two, or three-dimensional in shape. Given a three-dimensional data domain, there are many possible approaches to mapping the threading model over the domain. Prior to the FERMI architecture, the standard approach was to utilise shared memory by issuing threads to cover a 2D layer of data. Each thread itself would then iterate over the final dimension, reusing data and shared memory [7]. Post FERMI, the caching system negates the benefits of this approach, and one can simply issue threads that cover the whole data domain [8]. Three-dimensional thread blocks can be used, for example 32 x 4 x 2, and a three-dimensional thread grid placed over the data. The time loop for the simulation contains a single kernel launch, then the input/output is processed, followed by swapping the pointers to the data grids, as shown in listing 1.

LISTING 1: Time loop for single GPU implementation.

    for (n = 0; n < NF; n++)
    {
        UpDateScheme<<<dimGridInt, dimBlockInt>>>(u_d, u1_d);
        // perform I/O
        inout<<<dimGridIO, dimBlockIO>>>(u_d, out_d, ins, n);
        // update pointers
        dummy_ptr = u1_d; u1_d = u_d; u_d = dummy_ptr;
    }

The thread kernel itself implements both the interior and boundary update in a single SIMD operation, as shown in listing 2.

LISTING 2: Kernel code for single GPU implementation.

     1  __global__ void UpDateScheme(real *u, real *u1) {
     2      // Get X, Y, Z from 3D thread and block Ids
     3      int X = blockIdx.x * Bx + threadIdx.x;
     4      int Y = blockIdx.y * By + threadIdx.y;
     5      int Z = blockIdx.z * Bz + threadIdx.z + 1;
     6
     7      // Test that not at halo, Z block excludes Z halo
     8      if ((X > 0) && (X < (Nx - 1)) && (Y > 0) && (Y < (Ny - 1))) {
     9          // Calculate linear centre position
    10          int cp = Z * area + (Y * Nx + X);
    11          int K = (0 || (X - 1)) + (0 || (X - (Nx - 2))) + (0 || (Y - 1)) + (0 || (Y - (Ny - 2))) + (0 || (Z - 1)) + (0 || (Z - (Nz - 2)));
    12          real cf = 1.0;
    13          real cf2 = 1.0;
    14          // set loss coefficients if at a boundary
    15          if (K < 6) { cf = cf_d[0].loss1; cf2 = cf_d[0].loss2; }
    16          // Get sum of neighbour points
    17          real S = u1[cp - 1] + u1[cp + 1] + u1[cp - Nx] + u1[cp + Nx] + u1[cp - area] + u1[cp + area];
    18          // Calculate the update
    19          u[cp] = cf * ((2.0 - K * cf_d[0].l2) * u1[cp] + cf_d[0].l2 * S - cf2 * u[cp]);
    20      }
    21  }

The kernel keeps the use of conditional statements to a minimum. A layer of non-updated "ghost" points is used around the data domain, and so line 8 employs a conditional to check for this. A single further conditional is used at line 15, to load the coefficients used at a boundary. The logical expression at line 11 computes the boundary position in an efficient manner, without the need for a lengthy IF-ELSEIF statement.

Non-asynchronous implementation using multiple GPUs

In transitioning from a single GPU to the use of four GPUs, the data domain needs to be partitioned. The individual GPUs have discrete memory, and so the 3D data needs to be separated into four segments. A further complication is that the FDTD scheme requires neighbouring points in all dimensions, and so overlap halos will be required. The 3D data itself is decomposed using a row-major alignment for each layer of the Z dimension, with consecutive layers in series. In this format, each layer occupies contiguous memory locations. Thus, the most natural partitioning is across the Z dimension, as shown in figure 2. The overlap halos are individual Z layers, and so can be transferred as a single contiguous block of memory.
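
The Z-dimension partitioning described above can be sketched as a small host-side helper. This is an illustrative sketch, not the paper's setup code: the `Partition` struct and the functions `make_partition` and `layer_offset` are hypothetical names, and it assumes that layers are divided as evenly as possible and that each GPU stores one halo layer per neighbouring GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the slab of Z layers held by one GPU. */
typedef struct {
    int z_first;      /* first global Z layer owned (updated) by this GPU */
    int z_count;      /* number of owned layers */
    int halo_below;   /* 1 if a halo layer from the previous GPU is stored */
    int halo_above;   /* 1 if a halo layer from the next GPU is stored */
    size_t elements;  /* grid points stored on this GPU, halos included */
} Partition;

/* Split nz layers across num_gpus devices, giving any remainder layers
   to the lowest-numbered GPUs. */
Partition make_partition(int gpu, int num_gpus, int nx, int ny, int nz)
{
    assert(num_gpus > 0 && gpu >= 0 && gpu < num_gpus && nz >= num_gpus);
    Partition p;
    int base = nz / num_gpus;
    int extra = nz % num_gpus;
    p.z_count = base + (gpu < extra ? 1 : 0);
    p.z_first = gpu * base + (gpu < extra ? gpu : extra);
    p.halo_below = (gpu > 0);
    p.halo_above = (gpu < num_gpus - 1);
    p.elements = (size_t)nx * (size_t)ny
               * (size_t)(p.z_count + p.halo_below + p.halo_above);
    return p;
}

/* Linear offset of a local layer within a GPU's array. Each layer is one
   contiguous run of nx*ny values, so a halo can be copied in one transfer. */
size_t layer_offset(int local_layer, int nx, int ny)
{
    return (size_t)local_layer * (size_t)nx * (size_t)ny;
}
```

With four GPUs the two end devices hold one halo each and the two middle devices hold two, which gives the six halo layers exchanged per time step in the multi-GPU implementations.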

FIGURE 2: Data partitioning across the Z dimension using four GPUs, with overlap halos.

Whilst the domain partitioning is straightforward, the CUDA code itself requires many extensions compared to the single GPU case. In terms of the pre-time loop setup code, individual commands such as cudaMalloc() become embedded in loops over the four GPUs. A call is made to cudaSetDevice() at each iteration, to perform the operation on individual GPUs. Single pointers to device memory become arrays of pointers, and constant memory has to be allocated to each GPU. The halo offset locations have to be calculated as linear positions across memory, and finally the peer-to-peer access has to be initialised. In a non-asynchronous implementation, the time loop operates as follows:

1. Loop over the GPUs, issuing a kernel launch to compute the data on that GPU.
2. Synchronize all GPUs.
3. Perform peer-to-peer data transfer for overlap halos.
4. Perform input/output.
5. Synchronize and swap data pointers.

Each GPU computes its data simultaneously, but only when all have completed do we then perform the data transfers of the individual overlap halos.

Implementation using asynchronous data transfers

The above implementation contains an inherent time lag, as the GPUs are idle during the data transfers across the PCIe bus. To eliminate this, we can make use of asynchronous behaviour and streams. The approach used is based on that outlined by Nvidia [9], but is extended here to operate with the large-scale halo layers that occur in the 3D case. An individual Z layer can contain millions of floating-point values, and six layers have to be transferred between GPUs at each time step. A stream is simply a sequence of CUDA events that occur in series. However, multiple streams can be used so that events can execute in a concurrent and asynchronous manner.
As the FDTD scheme is data-independent at each time step, the overlap halo layers can be computed and the data transfers performed at the same time as the larger interior data segments on each GPU. This is accomplished by using one stream of events for the halos and transfers on each GPU, whilst a second stream is used for the interior. The streams are identified in the kernel launches, as shown in the time loop code detailed in listing 3.

LISTING 3: Time loop for asynchronous four GPU implementation.

    for (n = 0; n < NF; n++)
    {
        // Compute halo layers, then interior
        p = 0;
        for (i = 0; i < num_gpus; i++) {
            cudaSetDevice(gpu[i]);
            UpDateHalo<<<dimGridHalo, dimBlockHalo, 0, stream_halo[i]>>>(u_d[i], u1_d[i], pos[p]);
            p++;
            if (i > 0 && i < num_gpus - 1) {
                UpDateHalo<<<dimGridHalo, dimBlockHalo, 0, stream_halo[i]>>>(u_d[i], u1_d[i], pos[p]);
                p++;
            }
            cudaStreamQuery(stream_halo[i]);
            UpDateInterior<<<dimGridInt, dimBlockInt, 0, stream_int[i]>>>(u_d[i], u1_d[i], i);
        }
        // Exchange Halos
        cudaMemcpyPeerAsync(u_d[1], gpu[1], &u_d[0][pos[0]], gpu[0], area_size, stream_halo[0]);
        ...
        // perform I/O
        ...
        // Synchronise and update pointers
        ...
    }

Initially, we iterate over the GPUs and launch the kernels required for the halos using stream_halo. Note that GPUs 0 and 3 have a single halo, whilst GPUs 1 and 2 contain two halos. Then the main interior data kernels are launched, using stream_int. The data transfer events are then pushed into stream_halo, and will execute once the halos have been computed. In this manner, the data transfers proceed at the same time as the interior computation is being performed. The GPUs are then synchronized before swapping the data pointers.

EXPERIMENTAL TESTING

Initial testing is performed using data grids containing 100 million points each. At double precision this requires 0.8 GB of data per grid (1.6 GB for the whole 2 Grid simulation), and so allows a comparison to be made between single GPU and four GPU implementations using Tesla C2050 GPUs that have 3 GB of global memory. Using a sample rate of 44.1 kHz, the domain size is 244 m³.

Computation times

The 2 Grid scheme is used to compare computation times for the single GPU, basic (non-asynchronous) four GPU, and asynchronous four GPU implementations.
The simulations are computed for 4,410 samples in each case, at 44.1 kHz, and for both single and double precision floating-point accuracy. Table 1 shows the resulting times. The data grids are of size Nx: 960 points, Ny: 396 points, and Nz: 264 points.

TABLE 1: Computation times and speedups for double (DP) and single (SP) precision.

    Setup            DP Time (sec)   Speedup   SP Time (sec)   Speedup
    Single GPU
    Basic four GPU   59.4            x                         x2.5
    Async four GPU   48.1            x                         x3.0

The basic four GPU implementation only achieves a speedup of x2.5, whilst the asynchronous version gets to x3. The grid sizes for these initial tests contain a very large Z layer (960 x 396 = 380,160 points). As six overlap halos of this size have to be transferred between GPUs at each time step, this is still a limiting factor. To test the effect of the Z layer size, the double precision simulation is performed for decreasing sizes whilst keeping the same overall domain size of 244 m³, ranging from 380,160 points down to 76,032 points. Figure 3 shows the effect in terms of the dimensions of the space.

FIGURE 3: Variations in Z layer sizes for a domain of 244 m³, from a Z layer of 380,160 points (Nx: 960, Ny: 396) down to a Z layer of 76,032 points.

Table 2 shows the timing results. As the Z layer size decreases we get closer to the x4 scalable speedup.

TABLE 2: Effect of variation in the size of the Z layer on computation time.

    Z layer size (points)   Time (sec)   Speedup over single GPU
    380,160                              x
    ,                                    x
    ,                                    x
    ,                                    x
    ,                                    x
    ,                                    x
    76,032                               x3.52

Floating-point precision

Single precision floating-point variables require 32 bits of memory, compared to double precision which requires 64 bits. So, using single precision we can effectively double the size of the computation domain using the same amount of memory. There is also an additional benefit, as GPUs offer greater peak performance at single precision. However, testing on the 100 million point domain reveals stability issues when running at, or very close to, the Courant limit for the scheme. Figure 4 shows the outputs for a 40,000 time step simulation at 44.1 kHz using a DC-blocked audio input and grid spacing set at the Courant limit. The single precision output (blue) is stable initially, but shows phase and amplitude differences compared to the double precision output (red). After 30,000 samples, the single precision output begins to diverge, and finally becomes unstable after 40,000 samples. Backing away from the Courant limit by around 0.05% ensures stability in single precision, at the cost of introducing greater dispersion.
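
The Courant limit referred to above follows directly from equation 4, and the arithmetic can be sketched in a few lines of C. This is a sketch under stated assumptions: the function names are hypothetical, a wave speed of c = 343 m/s is assumed (the text does not state the value used), and "backing away by around 0.05%" is interpreted here as increasing the grid spacing X by that fraction.

```c
#include <assert.h>
#include <math.h>

/* Smallest stable grid spacing from equation 4: X = sqrt(3) * c * T,
   with time step T = 1 / sample_rate. */
double courant_spacing(double c, double sample_rate)
{
    assert(c > 0.0 && sample_rate > 0.0);
    return sqrt(3.0) * c / sample_rate;
}

/* Spacing increased by a relative margin, e.g. 0.0005 for 0.05%
   (assumed interpretation of backing away from the limit). */
double backed_off_spacing(double c, double sample_rate, double margin)
{
    assert(margin >= 0.0);
    return courant_spacing(c, sample_rate) * (1.0 + margin);
}

/* Courant number lambda = c * T / X; equals 1/sqrt(3) at the limit. */
double courant_number(double c, double sample_rate, double spacing)
{
    assert(spacing > 0.0);
    return c / (sample_rate * spacing);
}

/* Volume in cubic metres covered by a grid of `points` cells. */
double domain_volume(double points, double spacing)
{
    return points * spacing * spacing * spacing;
}
```

Under these assumptions the spacing at 44.1 kHz is about 13.5 mm, and 100 million points cover a volume close to the 244 m³ quoted for the test domain. Increasing X lowers λ below 1/√3, which is the margin that restores stability in single precision.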

FIGURE 4: Double (red) vs single (blue) precision at the Courant limit over 40,000 time steps. Axes: normalised level against time (samples, x 10^4).

LARGE-SCALE ACOUSTIC SIMULATIONS

Having detailed the efficiency of the asynchronous four GPU implementation, we can now consider the use of maximum available memory to perform large-scale simulations. Nvidia's Tesla GPUs come with various amounts of global memory, and so table 3 shows the maximum simulation sizes for various configurations, at a sample rate of 44.1 kHz. Note that the GPUs have less available memory than is actually labelled, for example a 3 GB C2050 has a usable global memory of around 2.8 GB.

TABLE 3: Maximum simulation sizes in points per grid (millions) and cubic metres, at 44.1 kHz.

    GPU        2 Grid SP   m³      2 Grid DP   m³      3 Grid SP   m³      3 Grid DP   m³
    3 GB
    5 GB       595         1,
    6 GB       722         1,
    4 x 3 GB   1,409       3,                                                          ,160
    4 x 5 GB   2,380       5,844   1,189       2,918   1,582       3,                  ,960
    4 x 6 GB   2,889       7,096   1,444       3,546   1,920       4,                  ,357

Whilst the table shows the maximum sizes for both the 2 Grid and 3 Grid schemes, in practice this has to be reduced to allow for storage of audio output arrays and, in the four GPU case, overlap halos of variable size. Four Tesla C2050 GPUs are used for the testing here, each of which has 3 GB of global memory. Thus for the basic 2 Grid scheme at single precision we can compute simulations using 1.4 billion grid points, with a resulting simulation size of 3,350 m³. For the 3 Grid scheme including viscosity, the grids contain just under a billion points. Table 4 shows the computation times for maximum memory simulations, running for 44,100 samples at 44.1 kHz.

TABLE 4: Maximum memory computation times for 44,100 samples at 44.1 kHz.

    Simulation                      Size (m³)   Time (min)
    2 Grid DP (double precision)    1,
    2 Grid SP (single precision)    3,
    3 Grid DP (double precision)    1,
    3 Grid SP (single precision)    2,
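
The memory limits behind these maximum sizes can be estimated with simple arithmetic, sketched below in C. The function names are hypothetical, and the figure of 2.8e9 usable bytes per 3 GB card is taken from the approximate value quoted in the text. The sketch ignores the audio output arrays and overlap halos, so it gives an upper bound rather than the sizes actually run.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to store one grid of `points` values
   (4 bytes per value in single precision, 8 in double). */
double grid_bytes(double points, size_t bytes_per_value)
{
    assert(points >= 0.0 && bytes_per_value > 0);
    return points * (double)bytes_per_value;
}

/* Upper bound on points per grid when all usable global memory on
   num_gpus cards is shared between num_grids grids (2 or 3). */
double max_points_per_grid(double usable_bytes_per_gpu, int num_gpus,
                           int num_grids, size_t bytes_per_value)
{
    assert(usable_bytes_per_gpu > 0.0 && num_gpus > 0);
    assert(num_grids > 0 && bytes_per_value > 0);
    return usable_bytes_per_gpu * (double)num_gpus
         / ((double)num_grids * (double)bytes_per_value);
}
```

For example, `max_points_per_grid(2.8e9, 4, 2, 4)` gives about 1.4 billion points for the 2 Grid scheme at single precision on four 3 GB cards, the same call with 3 grids gives just under a billion, and `grid_bytes(100e6, 8)` gives the 0.8 GB per grid quoted for the 100 million point test domain.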

CONCLUSIONS

The use of asynchronous data transfers and concurrent execution allows multiple GPUs to be used effectively to achieve near-scalable speedups in three-dimensional FDTD schemes, typically ranging from x3 to x3.5 when using four GPUs, depending on the size of the overlap halos. By using all available memory on a four GPU compute node, we can perform virtual acoustic simulations using billions of grid points. At audio rates such as 44.1 kHz, this allows the modelling of large rooms and halls of several thousand cubic metres.

Stability becomes an issue when running at single precision to maximise memory usage. Computing schemes at the Courant limit using single precision can lead to instability over time, although they may appear stable initially. Backing away from the Courant limit with a small increase in the grid spacing resolves this behaviour.

Computation times for large-scale maximum memory simulations are around forty to fifty minutes per second of output at 44.1 kHz, using four Tesla C2050 GPUs. Initial testing on the latest Kepler architecture GPUs shows a near two-fold speedup over the FERMI Tesla GPUs used here, and so should bring computation times down to under half an hour.

ACKNOWLEDGEMENTS

This work is supported by the European Research Council, under Grant StG NESS.

REFERENCES

[1] N. Röber, U. Kaminski, and M. Masuch, "Ray acoustics using computer graphics technology", in Proc. of the 10th Int. Conf. on Digital Audio Effects (DAFx-07, Bordeaux, France) (2007).
[2] E. Lehmann and A. Johansson, "Diffuse reverberation model for efficient image-source simulation of room impulse responses", in IEEE Transactions on Audio, Speech and Language Processing, volume 18(6) (2010).
[3] L. Savioja, D. Manocha, and M. Lin, "Use of GPUs in room acoustic modeling and auralization", in Proc. Int. Symposium on Room Acoustics (Melbourne, Australia) (2010).
[4] C. Webb and S. Bilbao, "Computing room acoustics with CUDA - 3D FDTD schemes with boundary losses and viscosity", in Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (Prague, Czech Republic) (2011).
[5] Nvidia, "CUDA C programming guide", CUDA toolkit documentation. [Online][Cited: 8th Jan 2013.] http://docs.nvidia.com/cuda/ (2012).
[6] J. Strikwerda, Finite Difference Schemes and Partial Differential Equations (Wadsworth and Brooks/Cole Advanced Books and Software, Pacific Grove, California) (1989).
[7] P. Micikevicius, "3D finite difference computation on GPUs using CUDA", in Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2 (New York, NY, USA) (2009).
[8] C. Webb and S. Bilbao, "Virtual room acoustics: A comparison of techniques for computing 3D FDTD schemes using CUDA", in Proc. 130th Convention of the Audio Engineering Society (AES) (London, UK) (2011).
[9] P. Micikevicius, "Multi-GPU Programming", Nvidia CUDA webinars. [Online][Cited: 6th Jan 2013.] http://developer.download.nvidia.com/cuda/training/ (2011).
More informationSAT Math Must-Know Facts & Formulas
SAT Mat Must-Know Facts & Formuas Numbers, Sequences, Factors Integers:..., -3, -2, -1, 0, 1, 2, 3,... Rationas: fractions, tat is, anyting expressabe as a ratio of integers Reas: integers pus rationas
More informationCUDA programming on NVIDIA GPUs
p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view
More informationThe Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Qingyu Meng, Alan Humphrey, Martin Berzins Thanks to: John Schmidt and J. Davison de St. Germain, SCI Institute Justin Luitjens
More informationGPU Parallel Computing Architecture and CUDA Programming Model
GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel
More information2.12 Student Transportation. Introduction
Introduction Figure 1 At 31 Marc 2003, tere were approximately 84,000 students enrolled in scools in te Province of Newfoundland and Labrador, of wic an estimated 57,000 were transported by scool buses.
More informationNext Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
More informationNAFN NEWS SPRING2011 ISSUE 7. Welcome to the Spring edition of the NAFN Newsletter! INDEX. Service Updates Follow That Car! Turn Back The Clock
NAFN NEWS ISSUE 7 SPRING2011 Welcome to te Spring edition of te NAFN Newsletter! Spring is in te air at NAFN as we see several new services cropping up. Driving and transport emerged as a natural teme
More informationParallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
More informationShell and Tube Heat Exchanger
Sell and Tube Heat Excanger MECH595 Introduction to Heat Transfer Professor M. Zenouzi Prepared by: Andrew Demedeiros, Ryan Ferguson, Bradford Powers November 19, 2009 1 Abstract 2 Contents Discussion
More informationA GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS
A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 g_suhakaran@vssc.gov.in THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833
More informationDerivatives Math 120 Calculus I D Joyce, Fall 2013
Derivatives Mat 20 Calculus I D Joyce, Fall 203 Since we ave a good understanding of its, we can develop derivatives very quickly. Recall tat we defined te derivative f x of a function f at x to be te
More informationHigh Performance Computing in CST STUDIO SUITE
High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver
More informationHis solution? Federal law that requires government agencies and private industry to encrypt, or digitally scramble, sensitive data.
NET GAIN Scoring points for your financial future AS SEEN IN USA TODAY S MONEY SECTION, FEBRUARY 9, 2007 Tec experts plot to catc identity tieves Politicians to security gurus offer ideas to prevent data
More informationOPTIMAL FLEET SELECTION FOR EARTHMOVING OPERATIONS
New Developments in Structural Engineering and Construction Yazdani, S. and Sing, A. (eds.) ISEC-7, Honolulu, June 18-23, 2013 OPTIMAL FLEET SELECTION FOR EARTHMOVING OPERATIONS JIALI FU 1, ERIK JENELIUS
More informationStream Processing on GPUs Using Distributed Multimedia Middleware
Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research
More informationCase Study on Productivity and Performance of GPGPUs
Case Study on Productivity and Performance of GPGPUs Sandra Wienke wienke@rz.rwth-aachen.de ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia
More informationBinary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
More informationGPU Acceleration of the SENSEI CFD Code Suite
GPU Acceleration of the SENSEI CFD Code Suite Chris Roy, Brent Pickering, Chip Jackson, Joe Derlaga, Xiao Xu Aerospace and Ocean Engineering Primary Collaborators: Tom Scogland, Wu Feng (Computer Science)
More informationTurbomachinery CFD on many-core platforms experiences and strategies
Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29
More informationResearch on Risk Assessment of PFI Projects Based on Grid-fuzzy Borda Number
Researc on Risk Assessent of PFI Projects Based on Grid-fuzzy Borda Nuber LI Hailing 1, SHI Bensan 2 1. Scool of Arcitecture and Civil Engineering, Xiua University, Cina, 610039 2. Scool of Econoics and
More informationComputer Science and Engineering, UCSD October 7, 1999 Goldreic-Levin Teorem Autor: Bellare Te Goldreic-Levin Teorem 1 Te problem We æx a an integer n for te lengt of te strings involved. If a is an n-bit
More informationApplications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
More informationTangent Lines and Rates of Change
Tangent Lines and Rates of Cange 9-2-2005 Given a function y = f(x), ow do you find te slope of te tangent line to te grap at te point P(a, f(a))? (I m tinking of te tangent line as a line tat just skims
More informationHPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
More information- 1 - Handout #22 May 23, 2012 Huffman Encoding and Data Compression. CS106B Spring 2012. Handout by Julie Zelenski with minor edits by Keith Schwarz
CS106B Spring 01 Handout # May 3, 01 Huffman Encoding and Data Compression Handout by Julie Zelenski wit minor edits by Keit Scwarz In te early 1980s, personal computers ad ard disks tat were no larger
More informationA New Cement to Glue Nonconforming Grids with Robin Interface Conditions: The Finite Element Case
A New Cement to Glue Nonconforming Grids wit Robin Interface Conditions: Te Finite Element Case Martin J. Gander, Caroline Japet 2, Yvon Maday 3, and Frédéric Nataf 4 McGill University, Dept. of Matematics
More informationReferendum-led Immigration Policy in the Welfare State
Referendum-led Immigration Policy in te Welfare State YUJI TAMURA Department of Economics, University of Warwick, UK First version: 12 December 2003 Updated: 16 Marc 2004 Abstract Preferences of eterogeneous
More informationOptimizing a 3D-FWT code in a cluster of CPUs+GPUs
Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Gregorio Bernabé Javier Cuenca Domingo Giménez Universidad de Murcia Scientific Computing and Parallel Programming Group XXIX Simposium Nacional de la
More informationRobust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code
Robust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code F. Rossi, S. Sinigardi, P. Londrillo & G. Turchetti University of Bologna & INFN GPU2014, Rome, Sept 17th
More informationParallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs
Parallel Smooters for Matrix-based Multigrid Metods on Unstructured Meses Using Multicore CPUs and GPUs Vincent Heuveline Dimitar Lukarski Nico Trost Jan-Pilipp Weiss No. 2-9 Preprint Series of te Engineering
More informationCan a Lump-Sum Transfer Make Everyone Enjoy the Gains. from Free Trade?
Can a Lump-Sum Transfer Make Everyone Enjoy te Gains from Free Trade? Yasukazu Icino Department of Economics, Konan University June 30, 2010 Abstract I examine lump-sum transfer rules to redistribute te
More informationHP ProLiant SL270s Gen8 Server. Evaluation Report
HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich schoenemeyer@cscs.ch
More informationTheoretical calculation of the heat capacity
eoretical calculation of te eat capacity Principle of equipartition of energy Heat capacity of ideal and real gases Heat capacity of solids: Dulong-Petit, Einstein, Debye models Heat capacity of metals
More informationOpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA
OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization
More informationGPGPU accelerated Computational Fluid Dynamics
t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
More informationGPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile
GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy
More informationTis Problem and Retail Inventory Management
Optimizing Inventory Replenisment of Retail Fasion Products Marsall Fiser Kumar Rajaram Anant Raman Te Warton Scool, University of Pennsylvania, 3620 Locust Walk, 3207 SH-DH, Piladelpia, Pennsylvania 19104-6366
More informationDesign and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures
Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy Perspectives of GPU Computing in Physics
More informationNote: Principal version Modification Modification Complete version from 1 October 2014 Business Law Corporate and Contract Law
Note: Te following curriculum is a consolidated version. It is legally non-binding and for informational purposes only. Te legally binding versions are found in te University of Innsbruck Bulletins (in
More informationwww.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING
www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging
More informationHETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK
HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance
More informationNote nine: Linear programming CSE 101. 1 Linear constraints and objective functions. 1.1 Introductory example. Copyright c Sanjoy Dasgupta 1
Copyrigt c Sanjoy Dasgupta Figure. (a) Te feasible region for a linear program wit two variables (see tet for details). (b) Contour lines of te objective function: for different values of (profit). Te
More informationScalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age
Scalable and High Performance Computing for Big Data Analytics in Understanding the Human Dynamics in the Mobile Age Xuan Shi GRA: Bowei Xue University of Arkansas Spatiotemporal Modeling of Human Dynamics
More informationE6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices
E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,
More informationA strong credit score can help you score a lower rate on a mortgage
NET GAIN Scoring points for your financial future AS SEEN IN USA TODAY S MONEY SECTION, JULY 3, 2007 A strong credit score can elp you score a lower rate on a mortgage By Sandra Block Sales of existing
More information13 PERIMETER AND AREA OF 2D SHAPES
13 PERIMETER AND AREA OF D SHAPES 13.1 You can find te perimeter of sapes Key Points Te perimeter of a two-dimensional (D) sape is te total distance around te edge of te sape. l To work out te perimeter
More informationOverview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
More informationWhat is Advanced Corporate Finance? What is finance? What is Corporate Finance? Deciding how to optimally manage a firm s assets and liabilities.
Wat is? Spring 2008 Note: Slides are on te web Wat is finance? Deciding ow to optimally manage a firm s assets and liabilities. Managing te costs and benefits associated wit te timing of cas in- and outflows
More informationAccelerating CFD using OpenFOAM with GPUs
Accelerating CFD using OpenFOAM with GPUs Authors: Saeed Iqbal and Kevin Tubbs The OpenFOAM CFD Toolbox is a free, open source CFD software package produced by OpenCFD Ltd. Its user base represents a wide
More informationHigh Performance Matrix Inversion with Several GPUs
High Performance Matrix Inversion on a Multi-core Platform with Several GPUs Pablo Ezzatti 1, Enrique S. Quintana-Ortí 2 and Alfredo Remón 2 1 Centro de Cálculo-Instituto de Computación, Univ. de la República
More informationMixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms
Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms Björn Rocker Hamburg, June 17th 2010 Engineering Mathematics and Computing Lab (EMCL) KIT University of the State
More informationReal-time Visual Tracker by Stream Processing
Real-time Visual Tracker by Stream Processing Simultaneous and Fast 3D Tracking of Multiple Faces in Video Sequences by Using a Particle Filter Oscar Mateo Lozano & Kuzahiro Otsuka presented by Piotr Rudol
More informationCollege Planning Using Cash Value Life Insurance
College Planning Using Cas Value Life Insurance CAUTION: Te advisor is urged to be extremely cautious of anoter college funding veicle wic provides a guaranteed return of premium immediately if funded
More informationMulti-GPU Load Balancing for In-situ Visualization
Multi-GPU Load Balancing for In-situ Visualization R. Hagan and Y. Cao Department of Computer Science, Virginia Tech, Blacksburg, VA, USA Abstract Real-time visualization is an important tool for immediately
More informationIntroduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1
Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?
More informationCatalogue no. 12-001-XIE. Survey Methodology. December 2004
Catalogue no. 1-001-XIE Survey Metodology December 004 How to obtain more information Specific inquiries about tis product and related statistics or services sould be directed to: Business Survey Metods
More informationDesign and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms
Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,
More informationUnemployment insurance/severance payments and informality in developing countries
Unemployment insurance/severance payments and informality in developing countries David Bardey y and Fernando Jaramillo z First version: September 2011. Tis version: November 2011. Abstract We analyze
More informationDetermine the perimeter of a triangle using algebra Find the area of a triangle using the formula
Student Name: Date: Contact Person Name: Pone Number: Lesson 0 Perimeter, Area, and Similarity of Triangles Objectives Determine te perimeter of a triangle using algebra Find te area of a triangle using
More information