FFT OpenCL and Polynomial Multiplication


CSE 5211 Design and Analysis of Algorithms
D. Eiland, Y. Duan & S. Wang
12/1/2011

OpenCL based Polynomial Multiplication

OpenCL

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. It supports parallel computing through both task-based and data-based parallelism, and it has been adopted by Intel, AMD, Nvidia, and ARM. OpenCL is still a very new technology, so this section begins by showing how to write a "hello world" program.

Headers

Just like any other external API used in C++, a header file must be included when using the OpenCL API. For the C++ bindings we use cl.hpp. OpenCL can also be used from Java through third-party bindings, but for this project our team decided to use C++. A small number of additional C++ headers, which are agnostic to OpenCL, are also used.

Figure 1: Headers in Hello World

Errors

A common property of most OpenCL API calls is that they either return an error code (type cl_int) as the result of the function itself, or store the error code at a location passed by the user as a parameter to the call. It is therefore important for the application to check the result of every call, and convenient to define a single function that handles the error code each time, as shown in Figure 2.
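A minimal sketch of such an error-handling helper follows. It is not the authors' Figure 2. To stay compilable without the OpenCL headers, it uses a 32-bit signed integer as a stand-in for cl_int and 0 as a stand-in for CL_SUCCESS. The names ErrorCode, kSuccess and checkErr are illustrative assumptions, as is the choice to throw an exception rather than exit.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-in for cl_int (a 32-bit signed integer in the OpenCL headers).
using ErrorCode = std::int32_t;

// Stand-in for CL_SUCCESS, which is defined as 0.
constexpr ErrorCode kSuccess = 0;

// Throws if `err` is not the success code; `what` names the call that failed.
inline void checkErr(ErrorCode err, const std::string& what) {
    if (err != kSuccess) {
        throw std::runtime_error(what + " failed with error code " +
                                 std::to_string(err));
    }
}
```

With the real API, a call such as checkErr(err, "cl::Context") would follow each OpenCL call that reports a cl_int, so that a failure is caught at the call that caused it.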

Figure 2

Context

The steps above are preparation. The next step in initializing and using OpenCL is to create a context. The rest of the OpenCL work (creating devices and memory, compiling and running programs) is performed within this context. A context can have a number of associated devices (for example, CPU or GPU devices), and, within a context, OpenCL guarantees a relaxed memory consistency between devices. Before creating the context, however, we need to choose a platform from the platform list.

Figure 3: Platform and context

You can pass either CL_DEVICE_TYPE_GPU or CL_DEVICE_TYPE_CPU to select the kind of device OpenCL runs on, and you can choose whichever platform you prefer from the platform list.

Buffer

Before delving into compute devices, where the real work happens, an OpenCL buffer should be allocated to hold the result of the kernel that will be run on the device. The buffer is created with the flag CL_MEM_USE_HOST_PTR, which tells OpenCL to use the host memory referenced by the supplied pointer as the storage for the buffer.

Figure 4

Devices

In OpenCL, although many operations are performed with respect to a given context, there are also device-specific operations. OpenCL provides the ability to query information about particular objects; in the C++ API this takes the form object.getInfo<CL_OBJECT_QUERY>().

Figure 5

After obtaining the proper device, the kernel file should be loaded and built for this device.

Kernel

The first few lines of the following code simply load the OpenCL device program from disk, convert it to a string, and create a cl::Program::Sources object using the helper constructor. Given an object of type cl::Program::Sources, a cl::Program object is created and associated with a context, then built for a particular set of devices.

Figure 6

EntryPoint

A given program can have many entry points, called kernels. There is assumed to be a straightforward mapping from kernel names, represented as strings, to functions defined with the __kernel attribute in the compute program.
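The file-loading step described under Kernel (read the device program from disk and convert it to a string) needs only the C++ standard library. The sketch below is one way to write it and is not the authors' Figure 6. The function name loadKernelSource is an assumption, as is the choice to throw when the file cannot be opened.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Reads the whole file at `path` into a string.
// Throws std::runtime_error if the file cannot be opened.
inline std::string loadKernelSource(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file.is_open()) {
        throw std::runtime_error("cannot open kernel file: " + path);
    }
    // Copy every byte from the stream into the returned string.
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

The returned string would then be handed to cl::Program::Sources, which in cl.hpp holds (pointer, length) pairs. The string must therefore stay alive until the program object has been created.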

To run a kernel, a cl::Kernel object, kernel, is built from the program and the name of the entry point. Kernel arguments are set using the C++ API with kernel.setArg(), which takes the index and value for the particular argument.

Figure 7

CommandQueue

Each command queue is associated with exactly one device; it is created with the associated context using a call to the constructor for the class cl::CommandQueue. Given a cl::CommandQueue queue, kernels can be queued using queue.enqueueNDRangeKernel(). This queues a kernel for execution on the associated device.

Figure 8

Event

The final argument to the enqueueNDRangeKernel call above was a cl::Event object, which can be used to query the status of the command with which it is associated. It supports the method wait(), which blocks until the command has completed. This is required to ensure the kernel has finished execution before reading the result back into host memory with queue.enqueueReadBuffer(). With the compute result back in host memory, it is simply a matter of outputting the result to std::cout and exiting the program.

Figure 9

KernelFile

Before showing the code, it is necessary to introduce the OpenCL memory model, which also matters when writing the kernel file. OpenCL 1.0 defines four memory spaces: private, local, constant and global. The figure below shows a diagram of the memory hierarchy defined by OpenCL.

Private memory is memory that can only be used by a single work-item. This is similar to registers in a single compute unit or a single CPU core.

Local memory is memory that can be used by the work-items in a work-group. This is similar to the local data share that is available on the current generation of AMD GPUs.

Constant memory is memory that can be used to store constant data for read-only access by all of the compute units in the device during the execution of a kernel. The host processor is responsible for allocating and initializing the memory objects that reside in this memory space. This is similar to the constant caches that are available on AMD GPUs.

Finally, global memory is memory that can be used by all the compute units on the device. This is similar to the off-chip GPU memory that is available on AMD GPUs.

Figure 10: Memory model

Overview

We implemented a polynomial multiplication tool that uses the properties of the Discrete Fourier Transform (DFT) to perform the bulk of the work.

DFT-based Polynomial Multiplication

The product of two polynomials (A*B) is normally an O(n^2) operation; by using the DFT, however, it can be reduced to an O(n log n) operation. This is done by first doubling the size of the polynomials (A, B) and transforming them using a DFT operation. These results are then multiplied together (using a point-wise operation), and an inverse DFT is performed, which yields the expected polynomial coefficients.

To achieve O(n log n) speed for the transformation, the DFT operation is replaced with the more efficient Fast Fourier Transform (FFT). The FFT relies on the butterfly operation, which determines how elements (or values) are combined during the transformation. The primary difference between our iterative and parallel implementations is the execution of the butterfly operation. In the iterative version each operation is performed one after another, while the parallel version executes each stage of (size n) operations simultaneously.

OpenCL implementation

OpenCL has three major concepts:

Buffers: Memory that can be accessed within an OpenCL execution context
Devices: Commanded to execute code in parallel
Kernels: Code that can be executed (on Devices)

We used the following algorithm for our OpenCL implementation:

1. Create random data sets A and B (size = n) and pad with zeros (size = 2n)
2. Load OpenCL Device
3. Load FFT, Point-Wise Multiply and Inverse FFT Kernels
4. Copy A and B to Buffers on OpenCL Device
5. For log2(2n) iterations: Execute (n times in parallel) FFT Kernel on A and B
6. Execute (n times in parallel) Point-Wise Multiply Kernel on FFT output
7. Execute (n times in parallel) Inverse FFT on Point-Wise Multiply output
8. Copy Inverse FFT output from Buffers on OpenCL Device

Results

For our tests, we compared the following implementations:

Iterative
Parallel w/ OpenCL CPU Device
Parallel w/ OpenCL GPU Device

All tests were conducted on the following hardware/software configuration:

OS: Linux (Fedora Core 15)
CPU: AMD Phenom X4 (2.6 GHz)
RAM: 4 GB
GPU: Radeon 46XX (1 GB Video RAM)
Compiler: GCC
OpenCL: AMD APP 2.3
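The pad, transform, point-wise multiply and inverse-transform pipeline described under "DFT-based Polynomial Multiplication" can be sketched in plain, serial C++ as follows. This is a host-only illustration under stated assumptions and not the authors' OpenCL kernels. The names fft and multiply are hypothetical, and the input is padded to the next power of two that holds the full product rather than to exactly 2n. Within each stage the butterflies are independent of one another, which is the property a parallel version exploits.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// In-place iterative radix-2 FFT; a.size() must be a power of two.
// invert = true computes the inverse transform, including the 1/n scaling.
inline void fft(std::vector<Complex>& a, bool invert) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);

    // Bit-reversal permutation so the butterflies can run in place.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }

    const double pi = std::acos(-1.0);
    // log2(n) stages; each stage performs n/2 independent butterflies.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle =
            (invert ? 2.0 : -2.0) * pi / static_cast<double>(len);
        const Complex wlen(std::cos(angle), std::sin(angle));
        for (std::size_t start = 0; start < n; start += len) {
            Complex w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const Complex u = a[start + k];
                const Complex v = a[start + k + len / 2] * w;
                a[start + k] = u + v;            // butterfly: upper output
                a[start + k + len / 2] = u - v;  // butterfly: lower output
                w *= wlen;
            }
        }
    }
    if (invert) {
        for (Complex& x : a) x /= static_cast<double>(n);
    }
}

// Multiplies two polynomials given as coefficient vectors (lowest degree first).
inline std::vector<double> multiply(const std::vector<double>& a,
                                    const std::vector<double>& b) {
    if (a.empty() || b.empty()) return {};
    const std::size_t resultSize = a.size() + b.size() - 1;
    std::size_t n = 1;
    while (n < resultSize) n <<= 1;  // zero-pad to a power of two

    std::vector<Complex> fa(a.begin(), a.end());
    std::vector<Complex> fb(b.begin(), b.end());
    fa.resize(n);
    fb.resize(n);

    fft(fa, false);
    fft(fb, false);
    for (std::size_t i = 0; i < n; ++i) fa[i] *= fb[i];  // point-wise multiply
    fft(fa, true);

    std::vector<double> result(resultSize);
    for (std::size_t i = 0; i < resultSize; ++i) result[i] = fa[i].real();
    return result;
}
```

For example, multiply({1, 2, 3}, {4, 5}) computes (1 + 2x + 3x^2)(4 + 5x) and returns coefficients close to 4, 13, 22, 15, up to floating-point rounding.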

To test the FFT, a coefficient polynomial was supplied and the device was an Nvidia GMS 360; the result is shown in Figure 11.

Figure 11

To test the polynomial multiplication, the coefficient size was 2^4; the result is shown in Figure 12.

Figure 12

Table 1 shows the overall run-time results. Once the input reaches a certain size (2^10), both parallel OpenCL versions become faster than the iterative version, and in the final run they are 60x faster. However, profiling the run-time of specific portions of the algorithms does not clarify why the OpenCL versions require such a large input. While there is some overhead from loading the OpenCL kernel and loading the compute device with memory, the majority of the time is always spent within parallel (FFT) execution. This might be related to some overhead for launching the threads, or possibly to an inefficient FFT implementation. Whatever the case, it is clear that, given a large enough problem, OpenCL parallelization has the potential to significantly speed up computation times.

Table 1 - Total Run-Time (columns: Input Size (2^n); Total Run-Time for Iterative, Parallel (CPU), Parallel (GPU))

Table 2 - Iterative Run-Time Break Down (columns: Input Size (2^n), FFT, Point-wise Multiplication, Inverse FFT, Total Run-Time)

Table 3 - OpenCL (CPU) Timing Break Down (columns: Input Size (2^n), Kernel Load / Compile Time, Buffer Copy (CPU -> GPU), Buffer Copy (GPU -> CPU), FFT, Point-wise Multiply, Inverse FFT, Total Run-Time)

Table 4 - OpenCL (GPU) Timing Break Down (columns: Input Size (2^n), Kernel Load / Compile Time, Buffer Copy (CPU -> GPU), Buffer Copy (GPU -> CPU), FFT, Point-wise Multiply, Inverse FFT, Total Run-Time)

11

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.

More information

Cross-Platform GP with Organic Vectory BV Project Services Consultancy Services Expertise Markets 3D Visualization Architecture/Design Computing Embedded Software GIS Finance George van Venrooij Organic

More information

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Experiences on using GPU accelerators for data analysis in ROOT/RooFit Experiences on using GPU accelerators for data analysis in ROOT/RooFit Sverre Jarp, Alfio Lazzaro, Julien Leduc, Yngve Sneen Lindal, Andrzej Nowak European Organization for Nuclear Research (CERN), Geneva,

More information

Course materials. In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful:

Course materials. In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful: Course materials In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful: OpenCL C 1.2 Reference Card OpenCL C++ 1.2 Reference Card These cards will

More information

COSCO 2015 Heterogeneous Computing Programming

COSCO 2015 Heterogeneous Computing Programming COSCO 2015 Heterogeneous Computing Programming Michael Meyer, Shunsuke Ishikuro Supporters: Kazuaki Sasamoto, Ryunosuke Murakami July 24th, 2015 Heterogeneous Computing Programming 1. Overview 2. Methodology

More information

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...

More information

ultra fast SOM using CUDA

ultra fast SOM using CUDA ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A

More information

Introduction to GPU Computing

Introduction to GPU Computing Matthis Hauschild Universität Hamburg Fakultät für Mathematik, Informatik und Naturwissenschaften Technische Aspekte Multimodaler Systeme December 4, 2014 M. Hauschild - 1 Table of Contents 1. Architecture

More information

Hard Disk Drive vs. Kingston SSDNow V+ 200 Series 240GB: Comparative Test

Hard Disk Drive vs. Kingston SSDNow V+ 200 Series 240GB: Comparative Test Hard Disk Drive vs. Kingston Now V+ 200 Series 240GB: Comparative Test Contents Hard Disk Drive vs. Kingston Now V+ 200 Series 240GB: Comparative Test... 1 Hard Disk Drive vs. Solid State Drive: Comparative

More information

GPUs for Scientific Computing

GPUs for Scientific Computing GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

More information

OpenACC 2.0 and the PGI Accelerator Compilers

OpenACC 2.0 and the PGI Accelerator Compilers OpenACC 2.0 and the PGI Accelerator Compilers Michael Wolfe The Portland Group michael.wolfe@pgroup.com This presentation discusses the additions made to the OpenACC API in Version 2.0. I will also present

More information

Introduction to OpenCL Programming. Training Guide

Introduction to OpenCL Programming. Training Guide Introduction to OpenCL Programming Training Guide Publication #: 137-41768-10 Rev: A Issue Date: May, 2010 Introduction to OpenCL Programming PID: 137-41768-10 Rev: A May, 2010 2010 Advanced Micro Devices

More information

Reliable Systolic Computing through Redundancy

Reliable Systolic Computing through Redundancy Reliable Systolic Computing through Redundancy Kunio Okuda 1, Siang Wun Song 1, and Marcos Tatsuo Yamamoto 1 Universidade de São Paulo, Brazil, {kunio,song,mty}@ime.usp.br, http://www.ime.usp.br/ song/

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Ensure that the AMD APP SDK Samples package has been installed before proceeding.

Ensure that the AMD APP SDK Samples package has been installed before proceeding. AMD APP SDK v2.6 Getting Started 1 How to Build a Sample 1.1 On Windows Ensure that the AMD APP SDK Samples package has been installed before proceeding. Building With Visual Studio Solution Files The

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

OpenCL. Administrivia. From Monday. Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011. Assignment 5 Posted. Project

OpenCL. Administrivia. From Monday. Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011. Assignment 5 Posted. Project Administrivia OpenCL Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Assignment 5 Posted Due Friday, 03/25, at 11:59pm Project One page pitch due Sunday, 03/20, at 11:59pm 10 minute pitch

More information

Multi-core Programming System Overview

Multi-core Programming System Overview Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Intelligent Heuristic Construction with Active Learning

Intelligent Heuristic Construction with Active Learning Intelligent Heuristic Construction with Active Learning William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather E H U N I V E R S I T Y T O H F G R E D I N B U Space is BIG! Hubble Ultra-Deep Field

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

More information

I/O Management. General Computer Architecture. Goals for I/O. Levels of I/O. Naming. I/O Management. COMP755 Advanced Operating Systems 1

I/O Management. General Computer Architecture. Goals for I/O. Levels of I/O. Naming. I/O Management. COMP755 Advanced Operating Systems 1 General Computer Architecture I/O Management COMP755 Advanced Operating Systems Goals for I/O Users should access all devices in a uniform manner. Devices should be named in a uniform manner. The OS, without

More information

Leveraging Aparapi to Help Improve Financial Java Application Performance

Leveraging Aparapi to Help Improve Financial Java Application Performance Leveraging Aparapi to Help Improve Financial Java Application Performance Shrinivas Joshi, Software Performance Engineer Abstract Graphics Processing Unit (GPU) and Accelerated Processing Unit (APU) offload

More information

RTOS Debugger for ecos

RTOS Debugger for ecos RTOS Debugger for ecos TRACE32 Online Help TRACE32 Directory TRACE32 Index TRACE32 Documents... RTOS Debugger... RTOS Debugger for ecos... 1 Overview... 2 Brief Overview of Documents for New Users... 3

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

How to choose a suitable computer

How to choose a suitable computer How to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and post-processing your data with Artec Studio. While

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

More information

Lecture 3. Optimising OpenCL performance

Lecture 3. Optimising OpenCL performance Lecture 3 Optimising OpenCL performance Based on material by Benedict Gaster and Lee Howes (AMD), Tim Mattson (Intel) and several others. - Page 1 Agenda Heterogeneous computing and the origins of OpenCL

More information

Streamline Computing Linux Cluster User Training. ( Nottingham University)

Streamline Computing Linux Cluster User Training. ( Nottingham University) 1 Streamline Computing Linux Cluster User Training ( Nottingham University) 3 User Training Agenda System Overview System Access Description of Cluster Environment Code Development Job Schedulers Running

More information

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Overview Metrics Monitor is part of Intel Media Server Studio 2015 for Linux Server. Metrics Monitor is a user space shared library

More information

Application Performance Monitoring: Trade-Off between Overhead Reduction and Maintainability

Application Performance Monitoring: Trade-Off between Overhead Reduction and Maintainability Application Performance Monitoring: Trade-Off between Overhead Reduction and Maintainability Jan Waller, Florian Fittkau, and Wilhelm Hasselbring 2014-11-27 Waller, Fittkau, Hasselbring Application Performance

More information

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v6.5 August 2014 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2. About

More information

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015 GPU Hardware and Programming Models Jeremy Appleyard, September 2015 A brief history of GPUs In this talk Hardware Overview Programming Models Ask questions at any point! 2 A Brief History of GPUs 3 Once

More information

farmerswife Contents Hourline Display Lists 1.1 Server Application 1.2 Client Application farmerswife.com

farmerswife Contents Hourline Display Lists 1.1 Server Application 1.2 Client Application farmerswife.com Contents 2 1 System requirements 2 1.1 Server Application 3 1.2 Client Application.com 1 1 Ensure that the computers on which you are going to install the Server and Client applications meet the system

More information

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy

More information

9/26/2011. What is Virtualization? What are the different types of virtualization.

9/26/2011. What is Virtualization? What are the different types of virtualization. CSE 501 Monday, September 26, 2011 Kevin Cleary kpcleary@buffalo.edu What is Virtualization? What are the different types of virtualization. Practical Uses Popular virtualization products Demo Question,

More information

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12 XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines A.Zydroń 18 April 2009 Page 1 of 12 1. Introduction...3 2. XTM Database...4 3. JVM and Tomcat considerations...5 4. XTM Engine...5

More information

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0

PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0 PERFORMANCE ENHANCEMENTS IN TreeAge Pro 2014 R1.0 15 th January 2014 Al Chrosny Director, Software Engineering TreeAge Software, Inc. achrosny@treeage.com Andrew Munzer Director, Training and Customer

More information

Oracle9i Release 2 Database Architecture on Windows. An Oracle Technical White Paper April 2003

Oracle9i Release 2 Database Architecture on Windows. An Oracle Technical White Paper April 2003 Oracle9i Release 2 Database Architecture on Windows An Oracle Technical White Paper April 2003 Oracle9i Release 2 Database Architecture on Windows Executive Overview... 3 Introduction... 3 Oracle9i Release

More information

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN 1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction

More information

New Technology Introduction: Android Studio with PushBot

New Technology Introduction: Android Studio with PushBot FIRST Tech Challenge New Technology Introduction: Android Studio with PushBot Carol Chiang, Stephen O Keefe 12 September 2015 Overview Android Studio What is it? Android Studio system requirements Android

More information

gpus1 Ubuntu 10.04 Available via ssh

gpus1 Ubuntu 10.04 Available via ssh gpus1 Ubuntu 10.04 Available via ssh root@gpus1:[~]#lspci -v grep VGA 01:04.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a) 03:00.0 VGA compatible controller: nvidia Corporation

More information

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

EECS 678: Introduction to Operating Systems

EECS 678: Introduction to Operating Systems EECS 678: Introduction to Operating Systems 1 About Me Heechul Yun, Assistant Prof., Dept. of EECS Office: 3040 Eaton, 236 Nichols Email: heechul.yun@ku.edu Research Areas Operating systems and architecture

More information

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices WS on Models, Algorithms and Methodologies for Hierarchical Parallelism in new HPC Systems The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

More information

Virtual Machines. COMP 3361: Operating Systems I Winter 2015 http://www.cs.du.edu/3361

Virtual Machines. COMP 3361: Operating Systems I Winter 2015 http://www.cs.du.edu/3361 s COMP 3361: Operating Systems I Winter 2015 http://www.cs.du.edu/3361 1 Virtualization! Create illusion of multiple machines on the same physical hardware! Single computer hosts multiple virtual machines

More information

Try Linux: Brief Guide for Rookies

Try Linux: Brief Guide for Rookies Try Linux: Brief Guide for Rookies December 8, 2010 Outline 1 2 3 4 5 Many people are afraid of technical difficulties of Linux. Many people fear that installing Linux may screw up their computer. Two

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

Mitglied der Helmholtz-Gemeinschaft. OpenCL Basics. Parallel Computing on GPU and CPU. Willi Homberg. 23. März 2011

Mitglied der Helmholtz-Gemeinschaft. OpenCL Basics. Parallel Computing on GPU and CPU. Willi Homberg. 23. März 2011 Mitglied der Helmholtz-Gemeinschaft OpenCL Basics Parallel Computing on GPU and CPU Willi Homberg Agenda Introduction OpenCL architecture Platform model Execution model Memory model Programming model Platform

More information

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are

More information

1. Computer System Structure and Components

1. Computer System Structure and Components 1 Computer System Structure and Components Computer System Layers Various Computer Programs OS System Calls (eg, fork, execv, write, etc) KERNEL/Behavior or CPU Device Drivers Device Controllers Devices

More information

Testing Database Performance with HelperCore on Multi-Core Processors

Testing Database Performance with HelperCore on Multi-Core Processors Project Report on Testing Database Performance with HelperCore on Multi-Core Processors Submitted by Mayuresh P. Kunjir M.E. (CSA) Mahesh R. Bale M.E. (CSA) Under Guidance of Dr. T. Matthew Jacob Problem

More information

Computer Architecture. Secure communication and encryption.

Computer Architecture. Secure communication and encryption. Computer Architecture. Secure communication and encryption. Eugeniy E. Mikhailov The College of William & Mary Lecture 28 Eugeniy Mikhailov (W&M) Practical Computing Lecture 28 1 / 13 Computer architecture

More information
