Advanced Computer Architecture Project Proposals
- Theresa Welch
- 7 years ago
POLITECNICO DI MILANO
Dipartimento di Elettronica, Informazione e Bioingegneria
Giovanni Agosta, Amir H. Ashouri, Alessandro Barenghi, Davide Cerotti, Stefano Cherubin, Alessandro Di Federico, Davide Gadioli, Gianluca Palermo, Gerardo Pelosi, Ioannis Stamelakos, Emanuele Vitali
HEAP Laboratory
Advanced Computer Architecture 2016
README FIRST: Project Overview
Project tags
- S: Security projects
- I: Internet of Things projects
- F: FPGA projects
- H: High Performance Computing projects
- M: Machine learning for compilation/decompilation projects
- D: Design Space Exploration (DSE) projects
Procedure
1. Contact both the TA of the course and the researcher in charge of the project (listed under Contacts), stating your interest in the specific project by its tag and number. Additional tools, data sets, frameworks, etc. may be sent to you if needed.
2. The maximum points awarded and the number of students who can undertake each project are stated in each project block.
Project S1: Parallel computing on embedded platforms with OpenCL
(up to 2 people, up to 4 pts per person)
- Parallelize the execution of a few block ciphers on a high-end mobile platform using the OpenCL programming model
- Automate (script) the performance evaluation of the implementations on all the OpenCL runtimes available on the platform (host and device)
Preferred reference platform: Exynos 5410 ARM big.LITTLE SoC with a PowerVR SGX544MP3 OpenCL-enabled GPU
1. Port the C implementation of AES-GCM to OpenCL
2. Check correct execution with the supplied test vectors on a common PC
3. Collect performance figures on the reference platform
Recommended skills: C or C++ programming in a Linux environment
Contacts: {alessandro.barenghi,gerardo.pelosi}@polimi.it
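The automation and test-vector steps could be organized around a small driver script. The sketch below is a minimal, hedged example, assuming each OpenCL runtime has already been wrapped as a Python callable that encrypts a byte string with a fixed key and IV; the names `check_vectors`, `benchmark`, and `evaluate_all` are hypothetical and not part of any existing tool.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, Tuple

# Assumption: each runtime is wrapped as a callable bytes -> bytes with key/IV fixed.
Cipher = Callable[[bytes], bytes]


def check_vectors(encrypt: Cipher, vectors: Iterable[Tuple[bytes, bytes]]) -> bool:
    """Return True if encrypt(plaintext) matches the expected ciphertext for every vector."""
    return all(encrypt(pt) == ct for pt, ct in vectors)


def benchmark(encrypt: Cipher, payload: bytes, repeats: int = 5) -> float:
    """Return the median throughput of `encrypt` in MB/s over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        encrypt(payload)
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
        samples.append(len(payload) / elapsed / 1e6)
    return statistics.median(samples)


def evaluate_all(runtimes: Dict[str, Cipher], payload: bytes) -> Dict[str, float]:
    """Benchmark every available runtime (host and device) on the same payload."""
    return {name: benchmark(fn, payload) for name, fn in runtimes.items()}


if __name__ == "__main__":
    # Placeholder "runtime" (identity function) standing in for a real AES-GCM binding.
    dummy = {"host-identity": lambda data: data}
    print(evaluate_all(dummy, bytes(1 << 20)))
```

Using the median rather than the mean makes the figure less sensitive to one-off scheduling noise on the big.LITTLE cores.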
Project S2: Parallel computing on FPGAs with OpenCL
(up to 2 people, up to 4 pts per person)
- Start from an existing OpenCL implementation of a block cipher (e.g., AES)
- Compile it using the Xilinx Vivado tools, modifying it if needed to satisfy the platform requirements
- Automate and run the performance evaluation
Preferred reference platform: Xilinx Zynq board
1. Port the application to the FPGA using the Xilinx tools
2. Check correct execution with the supplied test vectors
3. Collect performance figures on the reference platform
Recommended skills: OpenCL programming
Contacts: agosta@acm.org, {alessandro.barenghi,gerardo.pelosi}@polimi.it
Project I1: Fixed Point vs Floating Point Benchmarks
(up to 2 people, up to 8 pts per person)
- Develop benchmarks (algorithm + testing code) comparing the performance of floating-point and fixed-point arithmetic, starting from an existing floating-point reference
- Report on existing support tools for programming fixed-point platforms
Tools: GeCoS
Testing platforms: MIPS Creator CI20, Raspberry Pi (ARM)
1. Target OpenCV image processing kernels: Canny, contour
2. Rewrite the benchmarks using fixed-point arithmetic
3. Compare the efficiency and quality of results
Recommended skills: C/C++ programming in a Linux environment
Contacts: agosta@acm.org
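To make the float/fixed comparison concrete, the sketch below models Q16.16 fixed-point arithmetic (16 integer bits, 16 fractional bits, saturating 32-bit range) with Python integers and applies it to a 3-tap smoothing filter like the Gaussian step of Canny edge detection. The Q16.16 format, the saturation policy, and the helper names are assumptions for illustration; a benchmark for the target boards would implement the same operations in C with `int32_t`/`int64_t`.

```python
FRAC_BITS = 16                    # assumed Q16.16 format
ONE = 1 << FRAC_BITS
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate(q: int) -> int:
    """Clamp to the signed 32-bit range, as a saturating fixed-point unit would."""
    return max(INT32_MIN, min(INT32_MAX, q))


def to_fixed(x: float) -> int:
    return saturate(round(x * ONE))


def to_float(q: int) -> float:
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values: the wide product has 32 fractional bits,
    so add half an LSB and shift right by 16 to round back to Q16.16."""
    return saturate((a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS)


def filter_float(signal, taps):
    """Sliding-window weighted sum in floating point (the reference result)."""
    n = len(taps)
    return [sum(signal[i + k] * taps[k] for k in range(n))
            for i in range(len(signal) - n + 1)]


def filter_fixed(signal, taps):
    """Same filter with every operand and accumulation in Q16.16."""
    qs = [to_fixed(x) for x in signal]
    qt = [to_fixed(t) for t in taps]
    n = len(taps)
    out = []
    for i in range(len(qs) - n + 1):
        acc = 0
        for k in range(n):
            acc = saturate(acc + fixed_mul(qs[i + k], qt[k]))
        out.append(to_float(acc))
    return out


def max_abs_error(a, b):
    """Quality-of-result metric: worst-case deviation between two outputs."""
    return max(abs(x - y) for x, y in zip(a, b))
```

For example, filtering `[0.1, 0.4, 0.9, 0.4, 0.1]` with taps `[0.25, 0.5, 0.25]` both ways and calling `max_abs_error` on the two outputs quantifies the accuracy cost of the fixed-point version.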
Project I3: Simulating Computer Architecture ILP Techniques: Multistage Pipeline
(up to 2 people, up to 8 pts per person)
- Revise and test the existing Java simulator, and add another technique to it
- The goal is to complete the current simulator and make it available on the web using Java applets
Resources: ACA Simulator source code
1. Test the integrity of the existing code and of the techniques already simulated (Pipeline, Scoreboard, and Tomasulo)
2. Add a branch-prediction feature
3. Add a VLIW processor based on the ACA course
4. Using Java applets, make the current version executable on the web
Recommended skills: Java programming
Contacts: amirhossein.ashouri@polimi.it
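For the branch-prediction step, one common scheme a simulator could model is a table of 2-bit saturating counters indexed by the low bits of the branch address. The sketch below is written in Python for brevity (the ACA simulator itself is in Java); the class and function names, the table size, and the initial "weakly not-taken" state are assumptions, not taken from the existing code.

```python
from typing import Iterable, Tuple


class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by the low bits of the branch PC.
    Counter values 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # every entry starts weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor: TwoBitPredictor, trace: Iterable[Tuple[int, bool]]) -> float:
    """Replay a trace of (pc, taken) outcomes and return the fraction predicted correctly."""
    correct = total = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
        total += 1
    return correct / total
```

A loop branch taken nine times and then not taken, repeated three times, is predicted correctly 26 times out of 30: the first iteration pays one warm-up miss, and every loop exit is mispredicted once, but the counter's hysteresis keeps the single not-taken outcome from flipping the next iteration's prediction.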
Project F1: MinSoC Open Source CPU Support for the SAKURA-G Board
(up to 2 people, up to 12 pts per person)
- MinSoC is a RISC-based system on chip with a development and compilation toolchain and JTAG debug support
- Add the support required for MinSoC to run on the SAKURA-G development board (connect the JTAG interface and map the pins)
Reference platform: SAKURA-G standard side-channel evaluation board (Xilinx Spartan-6 LX75 FPGA)
1. Familiarize yourself with the MinSoC environment and compilation toolchain
2. Analyze the JTAG interface of Spartan-6 FPGAs from the reference manual
3. Wire the JTAG interface module of MinSoC to obtain a functional JTAG interface, and test it with the gdb JTAG bridge
Recommended skills: working knowledge of VHDL/Verilog, use of gdb via a JTAG bridge
Contacts: {alessandro.barenghi,gerardo.pelosi}@polimi.it
Project H1: Compiler and Precision Tuning for High Performance Computing
(up to 2 people, up to 12 pts per person)
- The ANTAREX framework supports compiling multiple versions of the same kernel with different parameters and floating-point precisions
- Modify one of the Rodinia benchmarks to take advantage of the framework, and explore the performance obtained with different options
Platform: Intel Xeon dual-socket server, possibly with NVIDIA GPGPUs
1. Familiarize yourself with the ANTAREX framework
2. Analyze the chosen Rodinia benchmark
3. Modify the benchmark to take advantage of the framework, and run it
4. For groups of two: also modify the benchmark to switch dynamically between OpenCL and OpenMP parallelization
Recommended skills: C/C++, possibly OpenMP/OpenCL
Contacts: agosta@acm.org, stefano.cherubin@polimi.it
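Precision tuning trades accuracy for speed by lowering the floating-point precision of selected kernels, so each candidate version needs an error measurement next to its timing. The sketch below is an assumption-based illustration, independent of the ANTAREX framework's actual interface: it emulates single precision in Python by rounding every intermediate value to binary32 with `struct`, and measures the relative error of a dot product against a `math.fsum` reference.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(xs, ys, single: bool = False) -> float:
    """Dot product accumulated either in double or in emulated single precision."""
    acc = 0.0
    for x, y in zip(xs, ys):
        if single:
            # Round operands, product, and running sum to binary32, like a float kernel.
            acc = to_f32(acc + to_f32(to_f32(x) * to_f32(y)))
        else:
            acc += x * y
    return acc


def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact) if exact else abs(approx)


if __name__ == "__main__":
    xs, ys = [0.1] * 1000, [1.0] * 1000
    ref = math.fsum(x * y for x, y in zip(xs, ys))
    for label, single in (("double", False), ("float", True)):
        print(label, relative_error(dot(xs, ys, single), ref))
```

In the real benchmark the same comparison would be made between the kernel versions generated for each precision, with the measured execution times placed alongside the error figures.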
Project H2: AutoTuning for High Performance Computing
(up to 2 people, up to 12 pts per person)
- The ANTAREX framework supports tuning application parameters to reach specific goals (performance, power, accuracy, etc.)
- Modify one of the Rodinia benchmarks to take advantage of the framework, and explore the trade-offs obtained with different options
Platform: Intel Xeon dual-socket server, possibly with NVIDIA GPGPUs
1. Familiarize yourself with the ANTAREX framework
2. Analyze the chosen Rodinia benchmark
3. Modify the benchmark to take advantage of the framework, and run it
Recommended skills: C++, possibly OpenMP/OpenCL
Contacts: {gianluca.palermo,davide.gadioli}@polimi.it
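At its core, tuning toward a goal means choosing, among measured configurations, the best one that satisfies the constraints. The sketch below illustrates that selection step in Python; the `OperatingPoint` record, the knob names, and the "minimize time subject to power and error bounds" policy are assumptions for illustration and do not reproduce the ANTAREX framework's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OperatingPoint:
    knobs: Dict[str, int] = field(default_factory=dict)  # hypothetical knobs, e.g. {"threads": 8}
    time_s: float = 0.0     # measured execution time
    power_w: float = 0.0    # measured average power
    error: float = 0.0      # accuracy loss with respect to the reference run


def select(points: List[OperatingPoint],
           max_power_w: float = float("inf"),
           max_error: float = float("inf")) -> Optional[OperatingPoint]:
    """Return the fastest point meeting both constraints, or None if none qualifies."""
    feasible = [p for p in points if p.power_w <= max_power_w and p.error <= max_error]
    return min(feasible, key=lambda p: p.time_s, default=None)
```

Changing the goal (for instance, minimizing power under a time bound) only changes the filter and the `key` function, which is one way to explore the trade-offs the project asks for.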
Project H3: Inter-dependencies Between CPUs and GPGPUs
(up to 2 people, up to 4 pts per person)
- GPGPU applications use the CPU to dispatch tasks to the GPU cores
- Investigate how performance is affected when CPU-intensive and GPGPU-intensive applications run concurrently on the same platform
Platform: Intel Xeon dual-socket server, possibly with NVIDIA GPGPUs
1. Examine the OpenCL framework in order to implement the simultaneous execution of CPU and GPGPU applications
2. Collect the performance of the same OpenCL benchmark running only on the CPU and only on the GPGPU
3. Collect the performance of the two versions of the benchmark running simultaneously, and compare the results
Recommended skills: C++, possibly OpenMP/OpenCL
Contacts: davide.cerotti@polimi.it
Project H4: Evaluating Memory Interference in Manycore Platforms
(up to 2 people, up to 4 pts per person)
- In modern multicore architectures, where different applications run simultaneously, memory interference can be a serious bottleneck
- Students will run highly parallel benchmarks from the SPLASH-2 and PARSEC suites and measure the impact of application co-scheduling on the shared last-level cache (LL$) and on main memory (DRAM)
Tools: Sniper architecture simulator; SPLASH-2 and PARSEC benchmarks
1. Install and run a manycore simulator
2. Extract the metrics from the simulation traces produced
3. Analyze the output
Recommended skills: Bash scripting
Contact: ioannis.stamelakos@polimi.it
Project H5: Approximate Computing for HPC
(up to 2 people, up to 12 pts per person)
- Approximate computing exploits the possibility of trading off power, performance, and accuracy by computing only as precisely as needed
- Analyze and modify some of the Rodinia benchmark applications to introduce approximate computation, and define a quality metric to evaluate the approximation
Platform: any Intel machine
1. Familiarize yourself with the concept of approximate computing (papers will be provided)
2. Analyze the chosen Rodinia benchmarks
3. Introduce the approximation and evaluate the trade-offs
Recommended skills: C/C++
Contacts: {gianluca.palermo,davide.gadioli}@polimi.it
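One well-known approximation technique, loop perforation, skips a fraction of loop iterations and accepts the resulting error. The sketch below illustrates that technique (not necessarily the approach in the papers that will be provided) on a simple averaging kernel, with relative error as one possible quality metric; the function names and the `stride` parameter are hypothetical.

```python
def mean_perforated(values, stride: int = 1) -> float:
    """Average computed over every `stride`-th element only (stride=1 is the exact loop)."""
    sampled = values[::stride]
    return sum(sampled) / len(sampled)


def relative_error(approx: float, exact: float) -> float:
    """Quality metric: relative deviation from the exact result."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    data = [float(i % 17) for i in range(10_000)]
    exact = mean_perforated(data)
    for stride in (1, 2, 4, 8):
        approx = mean_perforated(data, stride)
        print(f"work={1 / stride:.3f}  error={relative_error(approx, exact):.2e}")
```

The printed table pairs the fraction of work performed with the resulting error, which is the trade-off curve the project would build for each modified Rodinia kernel.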
Project H6: AOP for AutoTuning in HPC
(up to 2 people, up to 12 pts per person)
- Aspect-oriented programming (AOP) is a programming paradigm that aims to increase modularity by allowing the separation of concerns
- This project uses AOP concepts (AspectC++, aspectc.org) to introduce monitoring and auto-tuning code into the functional code of an application
Platform: any Intel machine
1. Familiarize yourself with the concept of aspect-oriented programming
2. Familiarize yourself with the ANTAREX framework for auto-tuning
3. Adopt AOP in simple programs and evaluate its intrusiveness
Recommended skills: C/C++
Contacts: {gianluca.palermo,davide.gadioli}@polimi.it
Project H7: Application Characterization in HPC
(up to 2 people, up to 10 pts per person; extendable to an M.Sc. thesis)
- Extend an existing Pin-based program characterization plugin to support parallel characterization (instrumentation) of OpenMP and MPI programs
Tools: MICA dynamic characterization tool; Pin dynamic instrumentation tool for Linux
1. Familiarize yourself with the Linux Pin tools
2. Familiarize yourself with the Pin plugin MICA
3. Extend MICA with parallel instrumentation within the Pin platform
Recommended skills: basic C++, Linux environment, basic parallel programming
Contacts: amirhossein.ashouri@polimi.it
Project H8: Application Characterization in HPC
Parallel characterization of programs using the PAPI library
(up to 2 people, up to 6 pts per person)
Resources: PAPI page, INRIA ParaSuite
1. Familiarize yourself with the PAPI library
2. Familiarize yourself with INRIA ParaSuite
3. Run the benchmarks and collect dynamic features with PAPI
4. Perform a sensitivity analysis on the collected results
Recommended skills: basic C, Linux environment, basic parallel programming
Contacts: amirhossein.ashouri@polimi.it
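A first-pass sensitivity analysis can rank each collected counter by how strongly it correlates with execution time across runs. The sketch below uses the Pearson correlation coefficient for that ranking; the feature dictionary layout and the sample values are assumptions, and the event names only mirror PAPI preset events such as `PAPI_TOT_INS` and `PAPI_L2_TCM`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(features: Dict[str, Sequence[float]],
                  runtime: Sequence[float]) -> List[Tuple[str, float]]:
    """Sort counters by |correlation| with runtime, strongest first."""
    scores = {name: pearson(vals, runtime) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical measurements: one entry per benchmark run.
    runtime = [1.2, 2.3, 3.1, 4.4]
    counters = {"PAPI_TOT_INS": [1.0e9, 2.1e9, 2.9e9, 4.2e9],
                "PAPI_L2_TCM": [3.0e6, 2.5e6, 3.1e6, 2.8e6]}
    for name, r in rank_features(counters, runtime):
        print(f"{name:>14}  r = {r:+.3f}")
```

Pearson's coefficient only captures linear relations; a rank-based measure would be a natural next step if the counters relate to runtime non-linearly.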
Project H9: Finding the Thermal Safe Power Budget for Manycore Platforms
(up to 2 people, up to 6 pts per person)
- As technology scales, thermal management of multicore architectures becomes a critical challenge due to increasing power density. The efficiency of power budgeting techniques is limited by a fixed chip-wide power budget. To overcome this limitation, a thermal safe power (TSP) model can be used to compute the per-core power capacity for different mappings of active cores.
- Students will use academic thermal estimation tools such as HotSpot and TSP to estimate the thermal safe power budget of a manycore platform with up to 128 cores
Tools: HotSpot temperature modeling tool; Thermal Safe Power (TSP) tool
1. Install and get familiar with HotSpot and/or TSP
2. Use the tools to obtain the thermal safe power budget from the power traces that will be provided
Recommended skills: basic computer architecture knowledge
Contacts: ioannis.stamelakos@polimi.it
Project D1: Automatic Design Space Exploration for Systems on Chip
(1 student, up to 6 pts)
- The scope of this work is to automate design space exploration for mixed-criticality systems on chip
- The objective is to automate multiple searches by transforming a source XML file into the format accepted by the interference checker
Requirements: none in particular; ground truths are provided as part of the data set
1. Create a script that automatically generates a compliant XML file for each possible decision. Only characteristic decisions need to be handled at this point.
2. Given the XML description of a single SoC with some mapping decisions still open, generate all possible mappings for those open decisions, run the program on each, collect the output, and report which designs are viable and which are not.
Recommended skills: basic Linux, Bash scripting
Contacts: emanuele.vitali@polimi.it
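Generating all mappings for the open decisions is essentially a Cartesian product. The sketch below enumerates every complete mapping with `itertools.product` and writes it into a copy of the SoC description using `xml.etree.ElementTree`; the `<soc>`/`<task name=... core=...>` schema, the decision dictionary, and the function names are hypothetical, since the input format of the interference checker is not given here. Each generated document would then be passed to the checker and its verdict recorded.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List


def enumerate_mappings(open_decisions: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield every complete assignment {task: core} for the tasks still unmapped."""
    tasks = sorted(open_decisions)
    for choice in itertools.product(*(open_decisions[t] for t in tasks)):
        yield dict(zip(tasks, choice))


def apply_mapping(template_xml: str, mapping: Dict[str, str]) -> str:
    """Return a copy of the SoC description with the given tasks bound to cores.

    Assumes a hypothetical schema: <soc><task name="..." core="..."/>...</soc>.
    """
    root = ET.fromstring(template_xml)
    for task in root.iter("task"):
        name = task.get("name")
        if name in mapping:
            task.set("core", mapping[name])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    template = '<soc><task name="t1" core=""/><task name="t2" core=""/></soc>'
    decisions = {"t1": ["c0", "c1"], "t2": ["c0", "c1", "c2"]}
    for i, mapping in enumerate(enumerate_mappings(decisions)):
        print(i, apply_mapping(template, mapping))
```

The number of designs grows as the product of the option counts, so for larger SoCs the generator form (one mapping at a time) avoids holding the whole design space in memory.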
Project M3: Machine Learning for Compiler Auto-tuning
(up to 2 people, up to 8 pts per person; extendable to an M.Sc. thesis)
- When dealing with auto-tuning and machine learning, identifying the best program features to collect plays an important role in the final prediction
- Understand which program characteristics are most important to monitor
- Analyze the relation between the type of characteristics and the different optimizations already applied
Tools: MICA dynamic characterization tool
1. Familiarize yourself with the Linux Pin tools
2. Familiarize yourself with the MICA tool
3. Perform a sensitivity analysis on the relation between the optimized code and the type of characterization technique
Recommended skills: basic C++, Linux environment, machine learning
Contacts: amirhossein.ashouri@polimi.it
Project M4: Machine Learning for Compiler Auto-tuning
(up to 2 people, up to 8 pts per person; extendable to an M.Sc. thesis)
- When dealing with auto-tuning and machine learning, understanding the best set of compiler optimizations and their effects on the code plays an important role in the resulting performance
- Analyze the existing optimization levels and cluster the promising optimizations together
Tools: LLVM compilation framework
1. Familiarize yourself with the LLVM optimization passes (opt and Clang)
2. Familiarize yourself with machine learning and clustering techniques
3. Apply clustering techniques to the existing optimizations to group the promising optimizations that further improve performance
Recommended skills: machine learning
Contacts: amirhossein.ashouri@polimi.it
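As one possible starting point for the clustering step, each optimization pass can be described by a vector of measured speedups (one entry per benchmark), and passes with similar vectors grouped by k-means. The sketch below uses k-means with a deterministic farthest-point initialization; the choice of k-means and the feature vectors are illustrative assumptions, and while `-licm`, `-loop-unroll`, `-inline`, and `-gvn` are real LLVM `opt` pass flags, the speedup values in the usage example are invented placeholders.

```python
from typing import List, Sequence


def _dist2(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Cluster points into k groups; returns one cluster label per point."""
    # Deterministic farthest-point initialisation: start from the first point, then
    # repeatedly add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Invented speedups of each pass on three benchmarks.
    passes = {"-licm": (1.10, 1.05, 1.20), "-loop-unroll": (1.12, 1.04, 1.18),
              "-inline": (1.30, 1.45, 1.01), "-gvn": (1.28, 1.40, 1.03)}
    names = list(passes)
    for name, label in zip(names, kmeans([passes[n] for n in names], k=2)):
        print(label, name)
```

Passes that land in the same cluster affect the benchmarks in similar ways, which is one heuristic for grouping them into candidate optimization sets.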
More informationInformal methods A personal search for practical alternatives to moral improvement through suffering in systems research
Informal methods A personal search for practical alternatives to moral improvement through suffering in systems research (lightning-talk version) Robert N. M. Watson University of Cambridge Computer Laboratory
More informationBusiness white paper. HP Process Automation. Version 7.0. Server performance
Business white paper HP Process Automation Version 7.0 Server performance Table of contents 3 Summary of results 4 Benchmark profile 5 Benchmark environmant 6 Performance metrics 6 Process throughput 6
More informationCUDA programming on NVIDIA GPUs
p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view
More informationHigh Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates
High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of
More informationExperiences on using GPU accelerators for data analysis in ROOT/RooFit
Experiences on using GPU accelerators for data analysis in ROOT/RooFit Sverre Jarp, Alfio Lazzaro, Julien Leduc, Yngve Sneen Lindal, Andrzej Nowak European Organization for Nuclear Research (CERN), Geneva,
More informationIntroduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1
Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?
More informationData Center and Cloud Computing Market Landscape and Challenges
Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution
More informationFLOW-3D Performance Benchmark and Profiling. September 2012
FLOW-3D Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: FLOW-3D, Dell, Intel, Mellanox Compute
More informationJonathan C. Sevy. Software and Systems Engineering Experience
Jonathan C. Sevy jsevy@cs.drexel.edu http://gicl.cs.drexel.edu/people/sevy Software and Systems Engineering Experience Experienced in all phases of software development, including requirements, architecture
More informationLattice QCD Performance. on Multi core Linux Servers
Lattice QCD Performance on Multi core Linux Servers Yang Suli * Department of Physics, Peking University, Beijing, 100871 Abstract At the moment, lattice quantum chromodynamics (lattice QCD) is the most
More information~ Greetings from WSU CAPPLab ~
~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)
More informationA Survey on ARM Cortex A Processors. Wei Wang Tanima Dey
A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:
More informationRecent Advances in Periscope for Performance Analysis and Tuning
Recent Advances in Periscope for Performance Analysis and Tuning Isaias Compres, Michael Firbach, Michael Gerndt Robert Mijakovic, Yury Oleynik, Ventsislav Petkov Technische Universität München Yury Oleynik,
More informationA general-purpose virtualization service for HPC on cloud computing: an application to GPUs
A general-purpose virtualization service for HPC on cloud computing: an application to GPUs R.Montella, G.Coviello, G.Giunta* G. Laccetti #, F. Isaila, J. Garcia Blas *Department of Applied Science University
More informationOptimizing a 3D-FWT code in a cluster of CPUs+GPUs
Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Gregorio Bernabé Javier Cuenca Domingo Giménez Universidad de Murcia Scientific Computing and Parallel Programming Group XXIX Simposium Nacional de la
More informationDesigning and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of
More informationDriving force. What future software needs. Potential research topics
Improving Software Robustness and Efficiency Driving force Processor core clock speed reach practical limit ~4GHz (power issue) Percentage of sustainable # of active transistors decrease; Increase in #
More informationOptimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server
Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server Technology brief Introduction... 2 GPU-based computing... 2 ProLiant SL390s GPU-enabled architecture... 2 Optimizing
More informationOptimizing Shared Resource Contention in HPC Clusters
Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs
More informationMulti-core Programming System Overview
Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
More informationCERN openlab III. Major Review Platform CC. Sverre Jarp Alfio Lazzaro Julien Leduc Andrzej Nowak
CERN openlab III Major Review Platform CC Sverre Jarp Alfio Lazzaro Julien Leduc Andrzej Nowak Teaching (1) 3 workshops already held this year: Computer Architecture and Performance Tuning: 17/18 February
More informationQUADRICS IN LINUX CLUSTERS
QUADRICS IN LINUX CLUSTERS John Taylor Motivation QLC 21/11/00 Quadrics Cluster Products Performance Case Studies Development Activities Super-Cluster Performance Landscape CPLANT ~600 GF? 128 64 32 16
More informationKeys to node-level performance analysis and threading in HPC applications
Keys to node-level performance analysis and threading in HPC applications Thomas GUILLET (Intel; Exascale Computing Research) IFERC seminar, 18 March 2015 Legal Disclaimer & Optimization Notice INFORMATION
More informationMichele Tartara. Brief summary. Position and Education RECORD OF EMPLOYMENT
Michele Tartara Name Michele Tartara Date of birth February, 24th 1984 Citizenship Italian Address Via Oberdan 22, Abbiategrasso (MI) Email michele.tartara@gmail.com Italian Phone +39-3409202134 LinkedIn
More informationPyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts
PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts Dan Connors, Kyle Dunn, and Ryan Bueter Department of Electrical Engineering University of Colorado Denver Denver, Colorado
More informationGETTING STARTED WITH ANDROID DEVELOPMENT FOR EMBEDDED SYSTEMS
Embedded Systems White Paper GETTING STARTED WITH ANDROID DEVELOPMENT FOR EMBEDDED SYSTEMS September 2009 ABSTRACT Android is an open source platform built by Google that includes an operating system,
More informationProgramming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga
Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.
More informationReconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra
More informationModel-based system-on-chip design on Altera and Xilinx platforms
CO-DEVELOPMENT MANUFACTURING INNOVATION & SUPPORT Model-based system-on-chip design on Altera and Xilinx platforms Ronald Grootelaar, System Architect RJA.Grootelaar@3t.nl Agenda 3T Company profile Technology
More information