Hardware Acceleration for Just-In-Time Compilation on Heterogeneous Embedded Systems

A. Carbon, Y. Lhuillier, H.-P. Charles
CEA LIST, DACLE Division, Embedded Computing and Embedded Software Laboratories, France
Contact:
24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), June, Washington D.C., USA.

Outline
Context: virtualization and JIT emergence; JIT optimization opportunities (based on the LLVM framework); hardware accelerator proposal; experimental results; conclusion.

Context: parallelism emergence
Parallelism is emerging through multi-core CPUs [figure: five CPU cores]. Heterogeneity is developing in computing systems, which combine ILP/TLP-based multi-cores with wide-SIMD GPUs [figure: four CPUs and a GPU], and in embedded systems, with the emergence of heterogeneous asymmetric many-core processors (AMPs) [figure: CPU, DSP, CPU, HW accelerator, GPU]. PPE is a major concern of embedded systems designers.

Context: virtualization emergence
Deploying code on such architectures is costly: efficiently targeting even one AMP has a high development cost, and code portability has become a major issue [figure: software above hardware, with a question mark between them]. Virtualization abstraction layers have emerged in response. They were first mentioned in the 1960s (IBM 360/67); Java Virtual Machines, the CLI and LLVM then drove the development of virtual machines (VMs) [figure: a virtualization layer between software and hardware]. These layers were initially based on interpretation, which suffers from considerable performance overheads [figure: the virtualization layer implemented by interpretation].

Context: Just-In-Time emergence
Today, virtualization layers couple interpretation with Just-In-Time (JIT) compilation [figure: software on a virtualization layer combining interpretation and JIT compilation, on hardware]. JIT compilation is widely used on general-purpose processors (GPPs), but it incurs performance and consumption overheads in embedded systems. Two kinds of existing optimizations reduce the JIT impact:
Software optimizations: JIT compilation complexity (pointer-based algorithms) limits their performance gains [1].
System design specialization: specialized dedicated resources, or additional standard dedicated resources [1].
Our proposal: tuned hardware associated with these dedicated resources to manage JIT compilation algorithms.
[1] Ting Cao et al., "The yin and yang of power and performance for asymmetric hardware and managed software", ISCA.

LLVM framework
The LLVM bytecode compiler (LLC) is used in many projects. We profiled it to identify LLC's most critical parts, running experiments on an ARM Cortex-A5 model: associative array management and dynamic memory allocation account, on average, for 24% of LLC execution time.

LLVM: existing software optimizations
New abstract data types (optimized versions of the STL C++ ADTs) provide [multi]map and [multi]set ADTs using hash table implementations rather than sorted trees; the STL C++ containers are still used where performance is not a key issue. Specialized allocators (e.g. RecyclingAllocator) keep track of recently deallocated objects to reuse them, avoiding frequent allocations and deallocations. Despite these software optimizations, associative arrays and memory allocation still account for a large share of execution time.
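
The recycling idea can be illustrated with a minimal C++ sketch, assuming fixed-size blocks: freed blocks are pushed onto an intrusive free list and handed back by the next allocation, so the underlying allocator is only called when the list is empty. The class `FreeListRecycler` and its methods are hypothetical names chosen for this example; this is not the interface of LLVM's RecyclingAllocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size recycler (illustration only, not LLVM's API):
// freed blocks are kept on an intrusive singly linked free list and
// reused by later allocations instead of going back to the heap.
template <std::size_t BlockSize>
class FreeListRecycler {
  struct FreeNode { FreeNode* next; };
  static_assert(BlockSize >= sizeof(FreeNode), "a block must hold a link");

  FreeNode* head_ = nullptr;   // most recently freed block
  std::size_t fresh_ = 0;      // blocks obtained from operator new

 public:
  FreeListRecycler() = default;
  FreeListRecycler(const FreeListRecycler&) = delete;
  FreeListRecycler& operator=(const FreeListRecycler&) = delete;

  // Releases the blocks currently on the free list; blocks still held
  // by callers remain their responsibility.
  ~FreeListRecycler() {
    while (head_ != nullptr) {
      FreeNode* n = head_;
      head_ = n->next;
      ::operator delete(n);
    }
  }

  void* allocate() {
    if (head_ != nullptr) {    // reuse a recently freed block
      FreeNode* n = head_;
      head_ = n->next;
      return n;
    }
    ++fresh_;                  // free list empty: fall back to the heap
    return ::operator new(BlockSize);
  }

  void deallocate(void* p) {
    assert(p != nullptr);
    head_ = ::new (p) FreeNode{head_};   // remember the block for reuse
  }

  std::size_t fresh_allocations() const { return fresh_; }
};
```

Reuse is LIFO, so the most recently freed block, which is likely still in cache, is returned first.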

LLVM normalization
Our goal is to propose an alternative way of accelerating associative arrays and dynamic memory allocation. The means is a standardization of LLC, i.e. a solution based on standard libraries so that it can be reused: only the STL C++ library for the [multi]map and [multi]set ADTs, and the C standard memory allocation library (dlmalloc-based). The optimizations are then transferred to a hardware accelerator. The solution is portable because the accelerator can be reused: it benefits all pointer-based algorithms that make massive use of associative arrays and dynamic memory allocation.

RB-tree hardware accelerator
In the current implementation, the standard C++ map and set libraries use a red-black tree (RB-tree) representation, i.e. a binary search tree with a coloring property. The C memory allocator, dlmalloc, uses associative arrays to associate data sizes with free memory chunks, implemented with a hash table and doubly linked lists. To use RB-trees systematically, we propose an implementation of dlmalloc based on RB-trees, modifying the allocator without modifying its user interface.
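
Keeping free chunks in a size-ordered tree turns best-fit allocation into a lower-bound query. The following C++ sketch illustrates that idea, not the authors' dlmalloc variant: it uses `std::multimap` (a red-black tree in the common standard library implementations, although the standard only mandates logarithmic operations), manages offsets into a single hypothetical arena, splits chunks on allocation, and omits the coalescing, alignment and chunk headers a real allocator needs. The class `BestFitArena` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// Hypothetical best-fit allocator over one arena of `capacity` bytes.
// Free chunks are indexed by size in an ordered tree (size -> offset).
class BestFitArena {
  std::multimap<std::size_t, std::size_t> free_by_size_;

 public:
  explicit BestFitArena(std::size_t capacity) {
    free_by_size_.emplace(capacity, 0);
  }

  // malloc-like: the smallest free chunk with size >= n is found with a
  // lower_bound query on the tree.
  std::optional<std::size_t> allocate(std::size_t n) {
    assert(n > 0);
    auto it = free_by_size_.lower_bound(n);
    if (it == free_by_size_.end()) return std::nullopt;  // no chunk fits
    std::size_t size = it->first;
    std::size_t offset = it->second;
    free_by_size_.erase(it);                              // tree erase
    if (size > n) {
      free_by_size_.emplace(size - n, offset + n);        // keep the remainder
    }
    return offset;
  }

  // free-like: the chunk goes back into the tree (tree insert). The caller
  // passes the size; a real allocator would read it from a chunk header.
  void release(std::size_t offset, std::size_t n) {
    free_by_size_.emplace(n, offset);
  }

  std::size_t free_chunks() const { return free_by_size_.size(); }
};
```

Each allocation performs a lower-bound lookup, an erase and an insert on the tree, which lines up with the instructions the ISA extension lists for malloc (lower bound, rebalance for erase, insert and rebalance).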

Hardware acceleration description
A new RB-tree node structure is held in 128 bits: the key is replaced by a 31-bit digest and the color is stored in the last bit, preserving the sorting order. [Figure: (a) initial structure: COLOR, PARENT, LEFT and RIGHT rb_tree_node_t* links, plus a KEY of Key_size; (b) proposed structure: one 32-bit word holding D_KEY (bits 31 to 1) and C (bit 0), plus the PARENT, LEFT and RIGHT links, 128 bits in total.]
We propose hardware-accelerated instructions: specialized instructions for RB-tree management functions, accelerating traversals, key look-ups, and balanced insertion and removal.
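
The packing can be sketched in C++ as follows. Several details are assumptions, not taken from the slides: the digest function (here, the 31 most significant bits of a 32-bit key), the color encoding, the field order, and the use of 32-bit indices instead of pointers so that the node stays at 128 bits on a 64-bit host.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit node: one 32-bit word holding a 31-bit digest key
// (bits 31..1) and the color (bit 0), followed by three 32-bit links.
// Links are array indices here so the node is 128 bits on any host; on a
// 32-bit ARM target they could be rb_tree_node_t* pointers.
struct PackedNode {
  std::uint32_t dkey_color;
  std::uint32_t parent;
  std::uint32_t left;
  std::uint32_t right;
};
static_assert(sizeof(PackedNode) == 16, "node must fit in 128 bits");

enum Color : std::uint32_t { kBlack = 0, kRed = 1 };  // assumed encoding

// Assumed digest: the 31 most significant bits of a 32-bit key. It is
// monotonic (a <= b implies digest31(a) <= digest31(b)), so it preserves
// the sorting order, but distinct keys may share a digest.
inline std::uint32_t digest31(std::uint32_t key) { return key >> 1; }

inline std::uint32_t pack(std::uint32_t digest, Color c) {
  assert(digest <= 0x7FFFFFFFu);  // the digest must fit in 31 bits
  return (digest << 1) | static_cast<std::uint32_t>(c);
}

inline std::uint32_t digest_of(std::uint32_t word) { return word >> 1; }

inline Color color_of(std::uint32_t word) {
  return static_cast<Color>(word & 1u);
}

// Orders nodes by digest only; the color bit never influences the result.
inline bool digest_less(const PackedNode& a, const PackedNode& b) {
  return digest_of(a.dkey_color) < digest_of(b.dkey_color);
}
```

Because the digest occupies the high bits and the color only bit 0, comparing `word >> 1` orders nodes by digest; keys that share a digest (here, keys differing only in their lowest bit) would still need a full-key comparison.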

18 Cliquez pour modifier Proposed le ISA style extension du titre 15 new instructions over 400 in ARM ISA Instruction Function Used by RBTINC Rd, Rm increment map::iterator, set::iterator RBTDEC Rd, Rm decrement map::iterator, set::iterator RBTLOW Rd, Rn, Rm lower bound map::lower_bound, set::lower_bound, malloc, realloc RBTUP Rd, Rn, Rm upper_bound map::upper_bound, set::upper_bound RBTDEL Rn, Rm rebalance for erase map::erase, set::erase, malloc, realloc, free RBTINS <L R> Rn, Rm, Rs insert rebalance map::insert, set::insert, malloc, realloc, free Read at most 3 registers, write at most 1 register Multi-cycles instructions: hiding iterative computations 18

HW accelerator implementation
[Figure: new Cortex-A5 pipeline integrating the proposed HW accelerator.] Full FSM size estimation: 161 states, 223 transitions.

Experimental results
Setup: an instrumented ARM Cortex-A5 instruction-set simulator (ISS) with a cache simulator. Speedups obtained for the SW- and HW-accelerated versions of LLC, relative to the standardized LLC version: 29% gain for SW, 50% gain for HW. Compared with SW acceleration, HW acceleration gains 15%.

Experimental results
The time spent in memory allocation and associative array management, relative to total execution time, drops from 41% (standardized LLC) to 24% (SW) and 12% (HW). The raw speedup on memory allocation and associative arrays is 5x.
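
Read through Amdahl's law, the figures of this slide and the previous one are mutually consistent. The check below rests on reading assumptions that the slides do not state explicitly: the standardized LLC is the baseline, with normalized execution time 1; a fraction f = 0.41 of it is memory allocation and associative array management; the 5x raw speedup (s = 5) applies to that fraction only; and the remaining 1 - f is unchanged in both the SW and the HW versions.

```latex
\begin{aligned}
T_{\mathrm{HW}} &= (1-f) + \frac{f}{s} = 0.59 + \frac{0.41}{5} = 0.672
  &&\Rightarrow\ \frac{1}{T_{\mathrm{HW}}} \approx 1.49 \ (\approx 50\%\ \text{gain}),\\
\frac{f/s}{T_{\mathrm{HW}}} &= \frac{0.082}{0.672} \approx 0.12
  &&\Rightarrow\ \approx 12\%\ \text{of the HW run},\\
T_{\mathrm{SW}} &= \frac{1}{1.29} \approx 0.775
  &&\Rightarrow\ \frac{T_{\mathrm{SW}} - (1-f)}{T_{\mathrm{SW}}} \approx \frac{0.185}{0.775} \approx 0.24,\\
\frac{T_{\mathrm{SW}}}{T_{\mathrm{HW}}} &\approx \frac{0.775}{0.672} \approx 1.15
  &&\Rightarrow\ \approx 15\%\ \text{gain of HW over SW}.
\end{aligned}
```

Under these assumptions, the 50% HW gain and the 12% share follow from the 41% share and the 5x raw speedup, while the 24% SW share and the 15% HW-over-SW gain follow from the 29% SW gain, up to rounding.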

Conclusion
Dedicated resources are of interest for virtualization services, but software optimizations give limited gains for JIT compilation because dynamic memory allocation and associative array management weigh heavily on execution time. We propose tuned hardware for JIT compilation, coupled with dedicated resources: an ISA extension for RB-tree management functions, with the HW accelerator hidden behind standard libraries, which makes it valuable for all pointer-based algorithms. Results: a 15% gain compared with software optimizations in LLVM code generation, and a 5x raw speedup for memory allocation and associative array management. Next acceleration opportunities: instruction graph handling.

Thank you. Questions?
Centre de Grenoble, 17 rue des Martyrs, Grenoble Cedex
Centre de Saclay, Nano-Innov, PC, Gif-sur-Yvette Cedex


More information

ProTrack: A Simple Provenance-tracking Filesystem

ProTrack: A Simple Provenance-tracking Filesystem ProTrack: A Simple Provenance-tracking Filesystem Somak Das Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology das@mit.edu Abstract Provenance describes a file

More information

Multi-GPU Load Balancing for Simulation and Rendering

Multi-GPU Load Balancing for Simulation and Rendering Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks

More information

INTEL IPP REALISTIC RENDERING MOBILE PLATFORM SOFTWARE DEVELOPMENT KIT

INTEL IPP REALISTIC RENDERING MOBILE PLATFORM SOFTWARE DEVELOPMENT KIT INTEL IPP REALISTIC RENDERING MOBILE PLATFORM SOFTWARE DEVELOPMENT KIT Department of computer science and engineering, Sogang university 2008. 7. 22 Deukhyun Cha INTEL PERFORMANCE LIBRARY: INTEGRATED PERFORMANCE

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU Efficient Parallel Graph Exploration on Multi-Core CPU and GPU Pervasive Parallelism Laboratory Stanford University Sungpack Hong, Tayo Oguntebi, and Kunle Olukotun Graph and its Applications Graph Fundamental

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

A Case Study - Scaling Legacy Code on Next Generation Platforms

A Case Study - Scaling Legacy Code on Next Generation Platforms Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 00 (2015) 000 000 www.elsevier.com/locate/procedia 24th International Meshing Roundtable (IMR24) A Case Study - Scaling Legacy

More information

PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE

PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE Sudha M 1, Harish G M 2, Nandan A 3, Usha J 4 1 Department of MCA, R V College of Engineering, Bangalore : 560059, India sudha.mooki@gmail.com 2 Department

More information

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab FPGA Accelerator Virtualization in an OpenPOWER cloud Fei Chen, Yonghua Lin IBM China Research Lab Trend of Acceleration Technology Acceleration in Cloud is Taking Off Used FPGA to accelerate Bing search

More information

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001 Agenda Introduzione Il mercato Dal circuito integrato al System on a Chip (SoC) La progettazione di un SoC La tecnologia Una fabbrica di circuiti integrati 28 How to handle complexity G The engineering

More information

C++ Programming Language

C++ Programming Language C++ Programming Language Lecturer: Yuri Nefedov 7th and 8th semesters Lectures: 34 hours (7th semester); 32 hours (8th semester). Seminars: 34 hours (7th semester); 32 hours (8th semester). Course abstract

More information

Supporting OpenMP on Cell

Supporting OpenMP on Cell Supporting OpenMP on Cell Kevin O Brien, Kathryn O Brien, Zehra Sura, Tong Chen and Tao Zhang IBM T. J Watson Research Abstract. The Cell processor is a heterogeneous multi-core processor with one Power

More information

General Introduction

General Introduction Managed Runtime Technology: General Introduction Xiao-Feng Li (xiaofeng.li@gmail.com) 2012-10-10 Agenda Virtual machines Managed runtime systems EE and MM (JIT and GC) Summary 10/10/2012 Managed Runtime

More information

GPUs for Scientific Computing

GPUs for Scientific Computing GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

More information

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.

More information

Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner

Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner Research Group Scientific Computing Faculty of Computer Science University of Vienna AUSTRIA http://www.par.univie.ac.at

More information

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Techniques for Real-System Characterization of Java Virtual Machine Energy and Power Behavior

Techniques for Real-System Characterization of Java Virtual Machine Energy and Power Behavior Techniques for Real-System Characterization of Java Virtual Machine Energy and Power Behavior Gilberto Contreras Margaret Martonosi Department of Electrical Engineering Princeton University 1 Why Study

More information

Energiatehokas laskenta Ubi-sovelluksissa

Energiatehokas laskenta Ubi-sovelluksissa Energiatehokas laskenta Ubi-sovelluksissa Jarmo Takala Tampereen teknillinen yliopisto Tietokonetekniikan laitos email: jarmo.takala@tut.fi Energy-Efficiency Comparison: VGA 30 frames/s, 512kbit/s Software

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase

More information

Secured Embedded Many-Core Accelerator for Big Data Processing

Secured Embedded Many-Core Accelerator for Big Data Processing Secured Embedded Many- Accelerator for Big Data Processing Amey Kulkarni PhD Candidate Advisor: Professor Tinoosh Mohsenin Energy Efficient High Performance Computing (EEHPC) Lab University of Maryland,

More information

PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts

PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts Dan Connors, Kyle Dunn, and Ryan Bueter Department of Electrical Engineering University of Colorado Denver Denver, Colorado

More information

CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015

CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 1. Goals and Overview 1. In this MP you will design a Dynamic Load Balancer architecture for a Distributed System 2. You will

More information

The SpiceC Parallel Programming System of Computer Systems

The SpiceC Parallel Programming System of Computer Systems UNIVERSITY OF CALIFORNIA RIVERSIDE The SpiceC Parallel Programming System A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science

More information

International Workshop on Field Programmable Logic and Applications, FPL '99

International Workshop on Field Programmable Logic and Applications, FPL '99 International Workshop on Field Programmable Logic and Applications, FPL '99 DRIVE: An Interpretive Simulation and Visualization Environment for Dynamically Reconægurable Systems? Kiran Bondalapati and

More information

my forecasted needs. The constraint of asymmetrical processing was offset two ways. The first was by configuring the SAN and all hosts to utilize

my forecasted needs. The constraint of asymmetrical processing was offset two ways. The first was by configuring the SAN and all hosts to utilize 1) Disk performance When factoring in disk performance, one of the larger impacts on a VM is determined by the type of disk you opt to use for your VMs in Hyper-v manager/scvmm such as fixed vs dynamic.

More information