Hardware Acceleration for Just-In-Time Compilation on Heterogeneous Embedded Systems
|
|
|
- Chad Owen
- 10 years ago
- Views:
Transcription
1 Hardware Acceleration for Just-In-Time Compilation on Heterogeneous Embedded Systems A. Carbon, Y. Lhuillier, H.-P. Charles CEA LIST DACLE division Embedded Computing Embedded Software Laboratories France Contact: 24th IEEE International Conference on Applicationspecific Systems, Architectures and Processors June , Washington D.C., USA.
2 Cliquez pour modifier le style du Outline titre Context : Virtualization JIT emergence JIT optimization opportunities (based on the LLVM framework) Hardware accelerator proposal Experiments results Conclusion 2
3 Cliquez pour modifier le style Context du titre Parallelism emergence CPU CPU CPU CPU CPU 3
4 Cliquez pour modifier le style Context du titre Parallelism emergence Heterogeneity development in computing systems ILT/TLP-based multi-cores and wide SIMD GPUs CPU CPU CPU CPU GPU 4
5 Cliquez pour modifier le style Context du titre Parallelism emergence Heterogeneity development in embedded systems Heterogeneous asymmetric many-core processors (AMP) emergence CPU DSP CPU HW acc. GPU PPE: a major concern of embedded systems designers 5
6 Cliquez Context: pour Virtualization modifier le style emergence du titre Code deployment on such architectures High development cost to efficiently target one AMP Code portability has become a major issue Software? Hardware 6
7 Cliquez Context: pour Virtualization modifier le style emergence du titre Emergence of virtualization abstraction layers First mention in 60 s (IBM 360/67) Java Virtual Machines, CLI, LLVM => Virtual Machines (VMs) development Software Virtualization layer Hardware 7
8 Cliquez Context: pour Virtualization modifier le style emergence du titre Emergence of virtualization abstraction layers Based initially on interpretation Suffer from considerable performance overheads Software Virtualization Interpretation layer Hardware 8
9 Cliquez Context: pour modifier Just-In-Time le style emergence du titre Emergence of virtualization abstraction layers Coupling today interpretation and Just-In-Time compilation Software Interpretation Virtualization layer Just-In-Time compilation Hardware 9
10 Cliquez Context: pour modifier Just-In-Time le style emergence du titre JIT compilation: widely used in GPP Performance consumption overheads in embedded 2 kinds of existing optimizations to reduce JIT impact Software optimizations System design specialization Specialized dedicated resources Additional standard dedicated resources 1 JIT compilation complexity limits performance gains 1 Pointer-based algorithms Proposing tuned hardware associated to these dedicated resources to manage JIT compilation algorithms 1 Ting Cao et al. «The yin and yang of power and performance for asymmetric hardware and managed software», ISCA
11 Cliquez pour modifier le style du Outline titre Context : Virtualization JIT emergence JIT optimization opportunities (based on the LLVM framework) Hardware accelerator proposal Experiments results Conclusion 11
12 Cliquez pour modifier LLVM le style framework du titre LLVM bytecode compiler (LLC): used in many projects Profiling: identifying LLC s most critical parts Experiments on a ARM Cortex-A5 model Associative array management dynamic memory allocation: on average 24% of LLC execution time 12
13 LLVM: Cliquez existing pour modifier software le optimizations style du titre New abstract-data-types (optimized versions of STL C++ ADT) Provide [multi]map multi[set] abstract data types Hash table implementations rather than sorted-tree STL C++: still used when performance is not a key issue Specialized allocators (eg. RecycleAllocator) Keeps track of recently dealloccated objects to reuse them Avoiding frequent allocations and deallocations Despite these software optimizations, associative arrays memory allocation still prevail 13
14 Cliquez pour modifier LLVM le normalization style du titre Our goal: proposing an alternative acceleration of associative arrays dynamic memory allocation Mean: standardization of LLC Proposing a solution based on standard libraries for reuse Using only STL C++ library for [multi]map multi[set] ADT C s memory allocation standard library (dlmalloc-based) Transferring optimizations to a hardware accelerator Solution portability: accelerator reuse Benefit to all pointer-based algorithms using massively associative arrays and dynamic memory allocation 14
15 Cliquez pour modifier le style du Outline titre Context : Virtualization JIT emergence JIT optimization opportunities (based on the LLVM framework) Hardware accelerator proposal Experiments results Conclusion 15
16 Cliquez RB-Tree pour modifier Hardware le style accelerator du titre Current implementation of associative arrays memory allocation Standard C++ Map Set libraries Using Red-Black Tree representation Binary tree with coloring property C s memory allocator: dlmalloc Using associative arrays to associate data sizes with free memory chunks Using hash table double linked-lists Systematic usage of RB-Trees Proposing an implementation of dlmalloc using RB-Trees Modifying the allocator without modifying user interface 16
17 Cliquez Hardware pour modifier acceleration style description du titre New RB-Tree node structure: held in 128-bits Digest key in 31-bits, color in the last bit, preserving the sorting order rb_tree_node_t* X rb_tree_node_t* X 31 0 COLOR PARENT LEFT RIGHT KEY D_KEY C 128 bits PARENT LEFT 128 bits RIGHT Key_size (a) initial structure (b) proposed structure Proposing hardware accelerated instructions Specialized instructions for RB-Tree management functions Accelerating traversals, key look-ups, balanced insertion and removal 17
18 Cliquez pour modifier Proposed le ISA style extension du titre 15 new instructions over 400 in ARM ISA Instruction Function Used by RBTINC Rd, Rm increment map::iterator, set::iterator RBTDEC Rd, Rm decrement map::iterator, set::iterator RBTLOW Rd, Rn, Rm lower bound map::lower_bound, set::lower_bound, malloc, realloc RBTUP Rd, Rn, Rm upper_bound map::upper_bound, set::upper_bound RBTDEL Rn, Rm rebalance for erase map::erase, set::erase, malloc, realloc, free RBTINS <L R> Rn, Rm, Rs insert rebalance map::insert, set::insert, malloc, realloc, free Read at most 3 registers, write at most 1 register Multi-cycles instructions: hiding iterative computations 18
19 Cliquez HW pour accelerator modifier implementation style du titre New Cortex-A5 pipeline HW accelerator proposal Full FSM size estimation: 161 states, 223 transitions 19
20 Cliquez pour modifier le style du Outline titre Context : Virtualization JIT emergence JIT optimization opportunities (based on the LLVM framework) Hardware accelerator proposal Experiments results Conclusion 20
21 Cliquez pour modifier Experiments le style du Results titre Instrumented ARM Cortex-A5 ISS with a cache simulator Speedups obtained for SW and HW accelerated versions of LLC SW: 29% of gain, HW: 50% / LLC standardized version 15% of gain for HW acc / SW acc 21
22 Cliquez pour modifier Experiments le style du Results titre Evolution of time spent in memory allocation associative array management (relative to total execution time) From 41% to 24% (SW) 12% (HW) Raw speedup on memory allocation associative arrays: 5x 22
23 Cliquez pour modifier le style du Outline titre Context : Virtualization JIT emergence JIT optimization opportunities (based on the LLVM framework) Hardware accelerator proposal Experiments results Conclusion 23
24 Cliquez pour modifier le style Conclusion du titre Interest of dedicated resources for virtualization services Limited gains for JIT compilation with SW optimizations Impact of dynamic memory allocation associative array management on execution time Proposing tuned HW for JIT compilation, coupled to dedicated resources HW accelerator hidden behind standard libraries Valuable for all pointer-based algorithms ISA extension for RB-Tree management functions 15% of gain comparing to SW opt in LLVM code gen 5x raw speedup for memory allocation associative array management Next acceleration opportunities: instruction graph handling 24
25 Thank you Questions? Centre de Grenoble 17 rue des Martyrs Grenoble Cedex Centre de Saclay Nano-Innov PC Gif sur Yvette Cedex
Secure data processing: Blind Hypervision
Secure data processing: Blind Hypervision P. Dubrulle, R. Sirdey, E. Ohayon, P. Dore and M. Aichouch CEA LIST Contact : [email protected] www.cea.fr Cliquez pour modifier le style Introduction titre
On-Line Diagnosis using Orthogonal Multi-Tone Time Domain Reflectometry in a Lossy Cable
On-Line Diagnosis using Orthogonal Multi-Tone Time Domain Reflectometry in a Lossy Cable Wafa BEN HASSEN*, Fabrice AUZANNEAU*, Luca INCARBONE*, François PERES** and Ayeley P. TCHANGANI** (*) CEA, LIST,
Runtime Code Generation for Code Polymorphism
Runtime Code Generation for Code Polymorphism Workshop on Runtime Code Generation for Secured Embedded Devices Damien Couroussé 2015-12-03 www.cea.fr Runtime Cliquez pour Code modifier Generation: le style
Code generation under Control
Code generation under Control Rencontres sur la compilation / Saint Hippolyte Henri-Pierre Charles CEA Laboratoire LaSTRE / Grenoble 12 décembre 2011 Introduction Présentation Henri-Pierre Charles, two
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,
Jonathan Worthington Scarborough Linux User Group
Jonathan Worthington Scarborough Linux User Group Introduction What does a Virtual Machine do? Hides away the details of the hardware platform and operating system. Defines a common set of instructions.
CEA LIST activity on Cable Monitoring and Diagnosis
CEA LIST activity on Cable Monitoring and Diagnosis Contact : Mr Josy COHEN Responsable de projet ingénieur chercheur [email protected] +33 1 69 08 78 07 www.cea.fr Cliquez CEA pour experience modifier
Driving force. What future software needs. Potential research topics
Improving Software Robustness and Efficiency Driving force Processor core clock speed reach practical limit ~4GHz (power issue) Percentage of sustainable # of active transistors decrease; Increase in #
Coherent sub-thz transmission systems in Silicon technologies: design challenges for frequency synthesis
Coherent sub-thz transmission systems in Silicon technologies: design challenges for frequency synthesis Alexandre Siligaris www.cea.fr Cliquez pour modifier le style du Outline titre Introduction-context
Stream Processing on GPUs Using Distributed Multimedia Middleware
Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research
Seeking Opportunities for Hardware Acceleration in Big Data Analytics
Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who
A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey
A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:
Cloud Computing. Up until now
Cloud Computing Lecture 11 Virtualization 2011-2012 Up until now Introduction. Definition of Cloud Computing Grid Computing Content Distribution Networks Map Reduce Cycle-Sharing 1 Process Virtual Machines
PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)
PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon
picojava TM : A Hardware Implementation of the Java Virtual Machine
picojava TM : A Hardware Implementation of the Java Virtual Machine Marc Tremblay and Michael O Connor Sun Microelectronics Slide 1 The Java picojava Synergy Java s origins lie in improving the consumer
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
Binary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
Introduction to Virtual Machines
Introduction to Virtual Machines Introduction Abstraction and interfaces Virtualization Computer system architecture Process virtual machines System virtual machines 1 Abstraction Mechanism to manage complexity
Eastern Washington University Department of Computer Science. Questionnaire for Prospective Masters in Computer Science Students
Eastern Washington University Department of Computer Science Questionnaire for Prospective Masters in Computer Science Students I. Personal Information Name: Last First M.I. Mailing Address: Permanent
Eastern Washington University Department of Computer Science. Questionnaire for Prospective Masters in Computer Science Students
Eastern Washington University Department of Computer Science Questionnaire for Prospective Masters in Computer Science Students I. Personal Information Name: Last First M.I. Mailing Address: Permanent
Seven Challenges of Embedded Software Development
Corporate Technology Seven Challenges of Embedded Software Development EC consultation meeting New Platforms addressing mixed criticalities Brussels, Feb. 3, 2012 Urs Gleim Siemens AG Corporate Technology
12. Introduction to Virtual Machines
12. Introduction to Virtual Machines 12. Introduction to Virtual Machines Modern Applications Challenges of Virtual Machine Monitors Historical Perspective Classification 332 / 352 12. Introduction to
Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries
Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute
Virtual Machines. www.viplavkambli.com
1 Virtual Machines A virtual machine (VM) is a "completely isolated guest operating system installation within a normal host operating system". Modern virtual machines are implemented with either software
Virtual Machine Learning: Thinking Like a Computer Architect
Virtual Machine Learning: Thinking Like a Computer Architect Michael Hind IBM T.J. Watson Research Center March 21, 2005 CGO 05 Keynote 2005 IBM Corporation What is this talk about? Virtual Machines? 2
SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing
SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing Won-Jong Lee, Shi-Hwa Lee, Jae-Ho Nah *, Jin-Woo Kim *, Youngsam Shin, Jaedon Lee, Seok-Yoon Jung SAIT, SAMSUNG Electronics, Yonsei Univ. *,
COM 444 Cloud Computing
COM 444 Cloud Computing Lec 3: Virtual Machines and Virtualization of Clusters and Datacenters Prof. Dr. Halûk Gümüşkaya [email protected] [email protected] http://www.gumuskaya.com Virtual
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
Applied Micro development platform. ZT Systems (ST based) HP Redstone platform. Mitac Dell Copper platform. ARM in Servers
ZT Systems (ST based) Applied Micro development platform HP Redstone platform Mitac Dell Copper platform ARM in Servers 1 Server Ecosystem Momentum 2009: Internal ARM trials hosting part of website on
SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri
SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable
Recent Advances in Periscope for Performance Analysis and Tuning
Recent Advances in Periscope for Performance Analysis and Tuning Isaias Compres, Michael Firbach, Michael Gerndt Robert Mijakovic, Yury Oleynik, Ventsislav Petkov Technische Universität München Yury Oleynik,
Java Virtual Machine: the key for accurated memory prefetching
Java Virtual Machine: the key for accurated memory prefetching Yolanda Becerra Jordi Garcia Toni Cortes Nacho Navarro Computer Architecture Department Universitat Politècnica de Catalunya Barcelona, Spain
Optimizing Code for Accelerators: The Long Road to High Performance
Optimizing Code for Accelerators: The Long Road to High Performance Hans Vandierendonck Mons GPU Day November 9 th, 2010 The Age of Accelerators 2 Accelerators in Real Life 3 Latency (ps/inst) Why Accelerators?
PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts
PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts Workshop on Computer Architecture Education 2015 Dan Connors, Kyle Dunn, Ryan Bueter Department of Electrical Engineering University
MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu
1 MapReduce on GPUs Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 2 MapReduce MAP Shuffle Reduce 3 Hadoop Open-source MapReduce framework from Apache, written in Java Used by Yahoo!, Facebook, Ebay,
GPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab
Performance monitoring at CERN openlab July 20 th 2012 Andrzej Nowak, CERN openlab Data flow Reconstruction Selection and reconstruction Online triggering and filtering in detectors Raw Data (100%) Event
FPGA-based Multithreading for In-Memory Hash Joins
FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded
High Performance or Cycle Accuracy?
CHIP DESIGN High Performance or Cycle Accuracy? You can have both! Bill Neifert, Carbon Design Systems Rob Kaye, ARM ATC-100 AGENDA Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing
Linux Scheduler Analysis and Tuning for Parallel Processing on the Raspberry PI Platform. Ed Spetka Mike Kohler
Linux Scheduler Analysis and Tuning for Parallel Processing on the Raspberry PI Platform Ed Spetka Mike Kohler Outline Abstract Hardware Overview Completely Fair Scheduler Design Theory Breakdown of the
Accelerate Cloud Computing with the Xilinx Zynq SoC
X C E L L E N C E I N N E W A P P L I C AT I O N S Accelerate Cloud Computing with the Xilinx Zynq SoC A novel reconfigurable hardware accelerator speeds the processing of applications based on the MapReduce
A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin
A Comparison Of Shared Memory Parallel Programming Models Jace A Mogill David Haglin 1 Parallel Programming Gap Not many innovations... Memory semantics unchanged for over 50 years 2010 Multi-Core x86
Chapter 3 Operating-System Structures
Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual
ADVANCED COMPUTER ARCHITECTURE
ADVANCED COMPUTER ARCHITECTURE Marco Ferretti Tel. Ufficio: 0382 985365 E-mail: [email protected] Web: www.unipv.it/mferretti, eecs.unipv.it 1 Course syllabus and motivations This course covers the
A Unified View of Virtual Machines
A Unified View of Virtual Machines First ACM/USENIX Conference on Virtual Execution Environments J. E. Smith June 2005 Introduction Why are virtual machines interesting? They allow transcending of interfaces
Hybrid and Custom Data Structures: Evolution of the Data Structures Course
Hybrid and Custom Data Structures: Evolution of the Data Structures Course Daniel J. Ernst, Daniel E. Stevenson, and Paul Wagner Department of Computer Science University of Wisconsin Eau Claire Eau Claire,
www.quilogic.com SQL/XML-IMDBg GPU boosted In-Memory Database for ultra fast data management Harald Frick CEO QuiLogic In-Memory DB Technology
SQL/XML-IMDBg GPU boosted In-Memory Database for ultra fast data management Harald Frick CEO QuiLogic In-Memory DB Technology The parallel revolution Future computing systems are parallel, but Programmers
CFD Implementation with In-Socket FPGA Accelerators
CFD Implementation with In-Socket FPGA Accelerators Ivan Gonzalez UAM Team at DOVRES FuSim-E Programme Symposium: CFD on Future Architectures C 2 A 2 S 2 E DLR Braunschweig 14 th -15 th October 2009 Outline
1/20/2016 INTRODUCTION
INTRODUCTION 1 Programming languages have common concepts that are seen in all languages This course will discuss and illustrate these common concepts: Syntax Names Types Semantics Memory Management We
2015 The MathWorks, Inc. 1
25 The MathWorks, Inc. 빅 데이터 및 다양한 데이터 처리 위한 MATLAB의 인터페이스 환경 및 새로운 기능 엄준상 대리 Application Engineer MathWorks 25 The MathWorks, Inc. 2 Challenges of Data Any collection of data sets so large and complex
Parallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers
Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu [email protected] High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University
Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1
Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?
Java Embedded Applications
TM a One-Stop Shop for Java Embedded Applications GeeseWare offer brings Java in your constrained embedded systems. You develop and simulate your Java application on PC, and enjoy a seamless hardware validation.
Next Generation Operating Systems
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the
What s New in MATLAB and Simulink
What s New in MATLAB and Simulink Kevin Cohan Product Marketing, MATLAB Michael Carone Product Marketing, Simulink 2015 The MathWorks, Inc. 1 What was new for Simulink in R2012b? 2 What Was New for MATLAB
Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008
Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer
Scalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
Scientific Computing Programming with Parallel Objects
Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore
Parallel Computing. Benson Muite. [email protected] http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
Parallel Computing Benson Muite [email protected] http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
FPGA area allocation for parallel C applications
1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University
SOC architecture and design
SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external
Garbage Collection in the Java HotSpot Virtual Machine
http://www.devx.com Printed from http://www.devx.com/java/article/21977/1954 Garbage Collection in the Java HotSpot Virtual Machine Gain a better understanding of how garbage collection in the Java HotSpot
Week 1 out-of-class notes, discussions and sample problems
Week 1 out-of-class notes, discussions and sample problems Although we will primarily concentrate on RISC processors as found in some desktop/laptop computers, here we take a look at the varying types
Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga
Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.
ProTrack: A Simple Provenance-tracking Filesystem
ProTrack: A Simple Provenance-tracking Filesystem Somak Das Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology [email protected] Abstract Provenance describes a file
Multi-GPU Load Balancing for Simulation and Rendering
Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks
Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011
Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis
Efficient Parallel Graph Exploration on Multi-Core CPU and GPU
Efficient Parallel Graph Exploration on Multi-Core CPU and GPU Pervasive Parallelism Laboratory Stanford University Sungpack Hong, Tayo Oguntebi, and Kunle Olukotun Graph and its Applications Graph Fundamental
Introduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
A Case Study - Scaling Legacy Code on Next Generation Platforms
Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 00 (2015) 000 000 www.elsevier.com/locate/procedia 24th International Meshing Roundtable (IMR24) A Case Study - Scaling Legacy
PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE
PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE Sudha M 1, Harish G M 2, Nandan A 3, Usha J 4 1 Department of MCA, R V College of Engineering, Bangalore : 560059, India [email protected] 2 Department
FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab
FPGA Accelerator Virtualization in an OpenPOWER cloud Fei Chen, Yonghua Lin IBM China Research Lab Trend of Acceleration Technology Acceleration in Cloud is Taking Off Used FPGA to accelerate Bing search
Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001
Agenda Introduzione Il mercato Dal circuito integrato al System on a Chip (SoC) La progettazione di un SoC La tecnologia Una fabbrica di circuiti integrati 28 How to handle complexity G The engineering
C++ Programming Language
C++ Programming Language Lecturer: Yuri Nefedov 7th and 8th semesters Lectures: 34 hours (7th semester); 32 hours (8th semester). Seminars: 34 hours (7th semester); 32 hours (8th semester). Course abstract
General Introduction
Managed Runtime Technology: General Introduction Xiao-Feng Li ([email protected]) 2012-10-10 Agenda Virtual machines Managed runtime systems EE and MM (JIT and GC) Summary 10/10/2012 Managed Runtime
GPUs for Scientific Computing
GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles [email protected] Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research
Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters
Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.
Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner
Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner Research Group Scientific Computing Faculty of Computer Science University of Vienna AUSTRIA http://www.par.univie.ac.at
News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
Next Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada [email protected] Micaela Serra
Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
Secured Embedded Many-Core Accelerator for Big Data Processing
Secured Embedded Many- Accelerator for Big Data Processing Amey Kulkarni PhD Candidate Advisor: Professor Tinoosh Mohsenin Energy Efficient High Performance Computing (EEHPC) Lab University of Maryland,
PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts
PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts Dan Connors, Kyle Dunn, and Ryan Bueter Department of Electrical Engineering University of Colorado Denver Denver, Colorado
CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015
CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 1. Goals and Overview 1. In this MP you will design a Dynamic Load Balancer architecture for a Distributed System 2. You will
International Workshop on Field Programmable Logic and Applications, FPL '99
International Workshop on Field Programmable Logic and Applications, FPL '99 DRIVE: An Interpretive Simulation and Visualization Environment for Dynamically Reconægurable Systems? Kiran Bondalapati and
my forecasted needs. The constraint of asymmetrical processing was offset two ways. The first was by configuring the SAN and all hosts to utilize
1) Disk performance When factoring in disk performance, one of the larger impacts on a VM is determined by the type of disk you opt to use for your VMs in Hyper-v manager/scvmm such as fixed vs dynamic.
