Advanced Computer Architecture Project Proposals


POLITECNICO DI MILANO
Dipartimento di Elettronica, Informazione e Bioingegneria
Giovanni Agosta, Amir H. Ashouri, Alessandro Barenghi, Davide Cerotti, Stefano Cherubin, Alessandro Di Federico, Davide Gadioli, Gianluca Palermo, Gerardo Pelosi, Ioannis Stamelakos, Emanuele Vitali
HEAP Laboratory
Advanced Computer Architecture 2016

README FIRST: Project Overview

Project Tags
S: Security projects
I: Internet of Things projects
F: FPGA projects
H: High Performance Computing projects
M: Machine learning for compilation/decompilation projects
D: Design Space Exploration (DSE) projects

Procedure
1. Email both the TA of the course and the researcher in charge of the project (listed under Contacts), stating your interest in the specific project by its tag and number. Additional tools, data sets, frameworks, etc. may be sent to you if needed.
2. The maximum points awarded and the number of students who can undertake the project are stated in each project block.

Project S1: Parallel computing on embedded platforms with OpenCL
(up to 2 people, up to 4 pts per person)
Parallelize the execution of a few block ciphers on a high-end mobile platform using the OpenCL programming model.
Automate (script) the performance evaluation of the implementations on all the OpenCL runtimes available on the platform (host + device).
Preferred Reference Platform: Exynos 5410 ARM big.LITTLE SoC with a PowerVR SGX544MP3 OpenCL-enabled GPU
1. Port the C implementation of AES-GCM to OpenCL
2. Check correct execution with the supplied test vectors on a common PC
3. Collect performance figures on the reference platform
Recommended skills: C or C++ programming in a Linux environment
Contacts: {alessandro.barenghi,gerardo.pelosi}@polimi.it
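The counter-mode part of AES-GCM is what makes step 1 parallelize well: the keystream for block i depends only on its counter block, so each OpenCL work-item can derive its own counter from the initial counter block J0 and its index, independently of the others. A minimal C sketch of that derivation, following the inc32 rule of NIST SP 800-38D; the helper name `gcm_counter_block` is hypothetical, the AES block encryption is omitted, and GHASH (which chains the blocks) is not addressed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: derive counter block CB_i = inc32^i(J0).
 * The rightmost 32 bits of J0, read as a big-endian integer, are
 * incremented by i modulo 2^32; the leftmost 96 bits are copied as-is.
 * Plaintext block i (1-based) is then XORed with AES_K(CB_i). */
void gcm_counter_block(const uint8_t j0[16], uint32_t i, uint8_t out[16])
{
    uint32_t ctr = ((uint32_t)j0[12] << 24) | ((uint32_t)j0[13] << 16) |
                   ((uint32_t)j0[14] << 8)  |  (uint32_t)j0[15];
    ctr += i; /* unsigned wrap-around gives the required mod 2^32 */
    memcpy(out, j0, 12);
    out[12] = (uint8_t)(ctr >> 24);
    out[13] = (uint8_t)(ctr >> 16);
    out[14] = (uint8_t)(ctr >> 8);
    out[15] = (uint8_t)ctr;
}
```

In a kernel, i would typically be `get_global_id(0) + 1`, since the first plaintext block uses inc32(J0).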

Project S2: Parallel computing on FPGAs with OpenCL
(up to 2 people, up to 4 pts per person)
Start from an existing OpenCL implementation of a block cipher (e.g., AES).
Compile it using the Xilinx Vivado tools (modifying it, if needed, to satisfy the platform requirements).
Automate and run the performance evaluation.
Preferred Reference Platform: Xilinx Zynq board
1. Port the application to the FPGA using the Xilinx tools
2. Check correct execution with the supplied test vectors
3. Collect performance figures on the reference platform
Recommended skills: OpenCL programming
Contacts: agosta@acm.org, {alessandro.barenghi,gerardo.pelosi}@polimi.it

Project I1: Fixed Point vs Floating Point Benchmarks
(up to 2 people, up to 8 pts per person)
Develop benchmarks (algorithm + testing code) comparing the performance of floating-point and fixed-point arithmetic, starting from an existing floating-point reference.
Report on existing support tools for programming fixed-point platforms, such as GeCoS.
Testing Platforms: MIPS Creator CI20, Raspberry Pi (ARM)
1. Target the OpenCV image processing kernels: Canny, contour
2. Rewrite the benchmarks using fixed-point arithmetic
3. Compare the efficiency and quality of the results
Recommended skills: C/C++ programming in a Linux environment
Contacts: agosta@acm.org
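The heart of step 2 is replacing floating-point operations with integer operations on a scaled representation. A minimal sketch, assuming a Q16.16 format (16 integer bits, 16 fractional bits); the format choice and the helper names are illustrative assumptions, not prescribed by the project.

```c
#include <stdint.h>

/* Q16.16 fixed point stored in a 32-bit signed integer. */
typedef int32_t q16_16;
#define Q_FRAC_BITS 16
#define Q_ONE (1 << Q_FRAC_BITS)

/* Convert from double, rounding to nearest; assumes x fits the range. */
static inline q16_16 q_from_double(double x)
{
    return (q16_16)(x >= 0 ? x * Q_ONE + 0.5 : x * Q_ONE - 0.5);
}

static inline double q_to_double(q16_16 q)
{
    return (double)q / Q_ONE;
}

/* Addition needs no rescaling: both operands share the same scale. */
static inline q16_16 q_add(q16_16 a, q16_16 b)
{
    return a + b;
}

/* Multiply in 64 bits to avoid overflow, then shift the extra 16
 * fractional bits away (truncation; right-shifting a negative value is
 * implementation-defined in C but arithmetic on common targets). */
static inline q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) >> Q_FRAC_BITS);
}
```

For step 3, converting the fixed-point outputs back with `q_to_double` lets them be compared against the floating-point reference with whatever quality metric the kernel calls for.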

Project I3: Simulating Computer Architecture ILP Techniques: Multistage Pipeline
(up to 2 people, up to 8 pts per person)
Revise and test the existing Java simulator, and add another technique to it.
The goal is to complete the current simulator and make it available on the web using Java applets.
ACA Simulator source code
1. Test the integrity of the existing code and of the techniques already simulated (Pipeline, Scoreboard and Tomasulo)
2. Add a branch-prediction feature
3. Add a VLIW processor based on the ACA course
4. Using Java applets, make the current version executable on the web
Recommended skills: Java programming
Contacts: amirhossein.ashouri@polimi.it
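For step 2, one classic scheme from the ACA course is a branch history table of 2-bit saturating counters. The sketch below is in C for consistency with the other examples, although the simulator itself is in Java; the table size, the indexing by PC bits, and the function names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 2-bit saturating counters: 0,1 predict not taken; 2,3 predict taken. */
#define BHT_ENTRIES 1024 /* assumed size; must be a power of two */

typedef struct {
    uint8_t counter[BHT_ENTRIES];
} bht_t;

void bht_init(bht_t *t)
{
    for (size_t k = 0; k < BHT_ENTRIES; k++)
        t->counter[k] = 1; /* start in "weakly not taken" */
}

/* Drop the 2 byte-offset bits of word-aligned PCs, keep the low bits. */
static size_t bht_index(uint32_t pc)
{
    return (pc >> 2) & (BHT_ENTRIES - 1);
}

bool bht_predict(const bht_t *t, uint32_t pc)
{
    return t->counter[bht_index(pc)] >= 2;
}

/* Move the counter toward the actual outcome, saturating at 0 and 3. */
void bht_update(bht_t *t, uint32_t pc, bool taken)
{
    uint8_t *c = &t->counter[bht_index(pc)];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}
```

The two-bit hysteresis means a loop branch that is mispredicted once at loop exit is still predicted taken the next time the loop is entered.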

Project F1: MinSoC open-source CPU support for the SAKURA-G board
(up to 2 people, up to 12 pts per person)
The MinSoC is a RISC-based system on chip with a development and compilation toolchain and JTAG debug support.
Add the required support (connect the JTAG interface and map the pins) for the MinSoC to run on the SAKURA-G development board.
Reference Platform: standard side-channel evaluation board SAKURA-G (Xilinx Spartan-6 LX75 FPGA)
1. Familiarize yourself with the MinSoC environment and compilation toolchain
2. Analyze the JTAG interface of Spartan-6 FPGAs from the reference manual
3. Wire the JTAG interface module of the MinSoC to obtain a functional JTAG interface, and test it with the gdb JTAG bridge
Recommended skills: working knowledge of VHDL/Verilog, use of gdb via a JTAG bridge
Contacts: {alessandro.barenghi,gerardo.pelosi}@polimi.it
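When checking the wiring in step 3, it helps to have a software model of the standard IEEE 1149.1 TAP controller that the gdb bridge drives through TMS. A C sketch of its 16-state machine; the enum and function names are hypothetical, and this is a reference model for testing, not MinSoC code.

```c
/* IEEE 1149.1 TAP controller states. */
typedef enum {
    TLR, RTI,
    SEL_DR, CAP_DR, SHIFT_DR, EXIT1_DR, PAUSE_DR, EXIT2_DR, UPD_DR,
    SEL_IR, CAP_IR, SHIFT_IR, EXIT1_IR, PAUSE_IR, EXIT2_IR, UPD_IR,
    TAP_NUM_STATES
} tap_state;

/* next_state[s][tms]: state entered on a rising TCK edge with TMS = tms. */
static const tap_state next_state[TAP_NUM_STATES][2] = {
    [TLR]      = { RTI,      TLR      },
    [RTI]      = { RTI,      SEL_DR   },
    [SEL_DR]   = { CAP_DR,   SEL_IR   },
    [CAP_DR]   = { SHIFT_DR, EXIT1_DR },
    [SHIFT_DR] = { SHIFT_DR, EXIT1_DR },
    [EXIT1_DR] = { PAUSE_DR, UPD_DR   },
    [PAUSE_DR] = { PAUSE_DR, EXIT2_DR },
    [EXIT2_DR] = { SHIFT_DR, UPD_DR   },
    [UPD_DR]   = { RTI,      SEL_DR   },
    [SEL_IR]   = { CAP_IR,   TLR      },
    [CAP_IR]   = { SHIFT_IR, EXIT1_IR },
    [SHIFT_IR] = { SHIFT_IR, EXIT1_IR },
    [EXIT1_IR] = { PAUSE_IR, UPD_IR   },
    [PAUSE_IR] = { PAUSE_IR, EXIT2_IR },
    [EXIT2_IR] = { SHIFT_IR, UPD_IR   },
    [UPD_IR]   = { RTI,      SEL_DR   },
};

tap_state tap_step(tap_state s, int tms)
{
    return next_state[s][tms ? 1 : 0];
}
```

A useful property to test against the hardware: five TCK cycles with TMS held high bring the controller to Test-Logic-Reset from any state.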

Project H1: Compiler and Precision Tuning for High Performance Computing
(up to 2 people, up to 12 pts per person)
The ANTAREX framework provides support for compiling multiple versions of the same kernels with different parameters and floating-point precisions.
Modify one of the Rodinia benchmarks to take advantage of the framework and explore the performance obtained with different options.
Platform: Intel Xeon dual-socket server, possibly NVIDIA GPGPUs
1. Familiarize yourself with the ANTAREX framework
2. Analyze the chosen Rodinia benchmark
3. Modify the benchmark to take advantage of the framework and run it
4. If two people take the project, also modify the benchmark to switch dynamically from OpenCL to OpenMP parallelisation
Recommended skills: C/C++, possibly OpenMP/OpenCL
Contacts: agosta@acm.org, stefano.cherubin@polimi.it
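To make "multiple versions of the same kernel with different precisions" concrete, here is a hand-written C sketch that instantiates one reduction kernel for both `float` and `double`. The macro and function names are hypothetical, and this only mimics the idea of multi-versioning; it is not the ANTAREX interface.

```c
#include <stddef.h>

/* Hypothetical multi-versioning by macro: the same kernel body is
 * instantiated once per floating-point type. The loop adds the small
 * terms first (i from n down to 1), which limits rounding error. */
#define DEFINE_SUM_KERNEL(T, SUFFIX)                 \
    T sum_inverse_squares_##SUFFIX(size_t n)         \
    {                                                \
        T acc = (T)0;                                \
        for (size_t i = n; i >= 1; i--)              \
            acc += (T)1 / ((T)i * (T)i);             \
        return acc;                                  \
    }

DEFINE_SUM_KERNEL(float, f32)   /* single-precision version */
DEFINE_SUM_KERNEL(double, f64)  /* double-precision reference */
```

Timing each version and measuring its deviation from the double-precision result gives the two axes (speed and accuracy) that the exploration in step 3 trades off.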

Project H2: AutoTuning for High Performance Computing
(up to 2 people, up to 12 pts per person)
The ANTAREX framework provides support for tuning the parameters of an application to reach specific goals (performance, power, accuracy, etc.).
Modify one of the Rodinia benchmarks to take advantage of the framework and explore the trade-offs obtained with different options.
Platform: Intel Xeon dual-socket server, possibly NVIDIA GPGPUs
1. Familiarize yourself with the ANTAREX framework
2. Analyze the chosen Rodinia benchmark
3. Modify the benchmark to take advantage of the framework and run it
Recommended skills: C++, possibly OpenMP/OpenCL
Contacts: {gianluca.palermo,davide.gadioli}@polimi.it
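The core decision an autotuner makes can be pictured as choosing, among profiled configurations of the application's parameters (knobs), the best one that satisfies a constraint. A minimal C sketch, assuming the goal "fastest configuration whose error stays below a threshold"; the struct fields and function name are hypothetical and do not reflect the ANTAREX API.

```c
#include <stddef.h>

/* One profiled configuration ("operating point"): knob values plus the
 * metrics measured for them. Field names are illustrative. */
typedef struct {
    int    threads;     /* knob */
    int    block_size;  /* knob */
    double time_s;      /* measured execution time */
    double error;       /* measured accuracy loss */
} op_point;

/* Return the index of the fastest point whose error does not exceed
 * max_error, or -1 if no point satisfies the constraint. */
int select_fastest(const op_point *pts, size_t n, double max_error)
{
    int best = -1;
    for (size_t k = 0; k < n; k++) {
        if (pts[k].error > max_error)
            continue; /* violates the accuracy constraint */
        if (best < 0 || pts[k].time_s < pts[best].time_s)
            best = (int)k;
    }
    return best;
}
```

Relaxing or tightening `max_error` and observing which configuration is selected is one simple way to expose the trade-offs the project asks to explore.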

Project H3: Inter-dependencies between CPU and GPGPUs
(up to 2 people, up to 4 pts per person)
GPGPU applications use the CPU to dispatch tasks to the GPU cores.
Investigate how performance is affected when CPU-intensive and GPGPU-intensive applications run concurrently on the same platform.
Platform: Intel Xeon dual-socket server, possibly NVIDIA GPGPUs
1. Examine the OpenCL framework in order to implement the simultaneous execution of CPU and GPGPU applications
2. Collect the performance of the same OpenCL benchmark running only on the CPU and only on the GPGPU
3. Collect the performance of the two versions of the benchmark running simultaneously, and compare the results
Recommended skills: C++, possibly OpenMP/OpenCL
Contacts: davide.cerotti@polimi.it
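Steps 2 and 3 boil down to timing each benchmark version alone and then co-running, and reporting the ratio. A minimal C sketch using the POSIX monotonic clock; the helper names are hypothetical, and the slowdown ratio is one common way to express interference, not a metric mandated by the project.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Wall-clock seconds from a monotonic clock (POSIX clock_gettime). */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Slowdown of one application when co-running with the other:
 * 1.0 means no interference, 2.0 means it took twice as long. */
double slowdown(double t_alone, double t_concurrent)
{
    return t_concurrent / t_alone;
}
```

Bracketing each run with two `now_seconds()` calls gives the elapsed times; computing the slowdown separately for the CPU and the GPGPU version shows which side suffers more from sharing the host.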

Project H4: Evaluating memory interference in manycore platforms
(up to 2 people, up to 4 pts per person)
In modern multicore architectures, where different applications run simultaneously, memory interference can be a serious bottleneck.
The students will experiment with running highly parallel benchmarks from the SPLASH-2 and PARSEC suites and measure the impact of application co-scheduling on the shared last-level cache (LL$) and on the memory (DRAM).
Sniper architecture simulator; benchmark suites
1. Install and run a manycore simulator
2. Extract the metrics from the simulation traces produced
3. Analyze the output
Recommended skills: Bash scripting
Contact: ioannis.stamelakos@polimi.it
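Step 2 reduces the simulator output to a few per-application metrics. A C sketch, assuming the instruction count, LLC accesses, and LLC misses have already been extracted for each run; the struct and function names are hypothetical, and parsing Sniper's output is not shown. Comparing a benchmark's MPKI when run alone and when co-scheduled gives one view of the interference.

```c
#include <stdint.h>

/* Per-application counters taken from one simulation run (hypothetical). */
typedef struct {
    uint64_t instructions;
    uint64_t llc_accesses;
    uint64_t llc_misses;
} app_counters;

/* LLC misses per kilo-instruction: normalizes for run length. */
double llc_mpki(const app_counters *c)
{
    return c->instructions
        ? 1000.0 * (double)c->llc_misses / (double)c->instructions
        : 0.0;
}

/* Fraction of LLC accesses that miss. */
double llc_miss_rate(const app_counters *c)
{
    return c->llc_accesses
        ? (double)c->llc_misses / (double)c->llc_accesses
        : 0.0;
}
```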

Project H5: Approximate Computing for HPC
(up to 2 people, up to 12 pts per person)
Approximate computing is a technique that trades off power, performance, and accuracy by computing results only as accurately as needed.
Analyze and modify some of the applications in the Rodinia benchmarks to introduce approximate computation, and define a quality metric to evaluate the approximation.
Platform: any Intel machine
1. Familiarize yourself with the concept of approximate computing (papers will be provided)
2. Analyze the chosen Rodinia benchmarks
3. Introduce the approximation and evaluate the trade-offs
Recommended skills: C/C++
Contacts: {gianluca.palermo,davide.gadioli}@polimi.it
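One well-known approximation technique that fits step 3 is loop perforation: skip a fraction of the loop iterations and accept the resulting error. A C sketch on a toy mean kernel, paired with a relative-error quality metric; loop perforation and relative error are our illustrative choices, not techniques prescribed by the project.

```c
#include <stddef.h>

/* Exact kernel: arithmetic mean of n samples. */
double mean_exact(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return n ? s / (double)n : 0.0;
}

/* Loop perforation: visit only every stride-th sample,
 * doing roughly 1/stride of the work. */
double mean_perforated(const double *x, size_t n, size_t stride)
{
    if (stride == 0)
        stride = 1; /* avoid an infinite loop */
    double s = 0.0;
    size_t used = 0;
    for (size_t i = 0; i < n; i += stride) {
        s += x[i];
        used++;
    }
    return used ? s / (double)used : 0.0;
}

/* Quality metric: relative error of the approximate result
 * (absolute error if the exact value is zero). */
double relative_error(double exact, double approx)
{
    double d = approx - exact;
    if (d < 0)
        d = -d;
    double mag = exact < 0 ? -exact : exact;
    return mag != 0.0 ? d / mag : d;
}
```

Sweeping `stride` and plotting execution time against `relative_error` gives the kind of trade-off curve step 3 asks for.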

Project H6: AOP for AutoTuning in HPC
(up to 2 people, up to 12 pts per person)
Aspect-oriented programming (AOP) is a programming paradigm that aims to increase modularity by allowing the separation of concerns.
The project aims to use AOP concepts (AspectC++, aspectc.org) to introduce monitoring and auto-tuning code into the functional code of an application.
Platform: any Intel machine
1. Familiarize yourself with the concept of aspect-oriented programming
2. Familiarize yourself with the ANTAREX framework for auto-tuning
3. Adopt AOP in simple codes and evaluate its intrusiveness
Recommended skills: C/C++
Contacts: {gianluca.palermo,davide.gadioli}@polimi.it
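To judge intrusiveness in step 3, it helps to first write the monitoring concern by hand. The C sketch below wraps a kernel with timing and call counting; with AspectC++ the equivalent "around advice" would be woven in by the aspect weaver instead of being coded at each call site. All names are hypothetical, and this is plain C, not AspectC++ syntax.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* State that a monitoring aspect would own. */
typedef struct {
    unsigned long calls;
    double total_seconds;
} monitor_t;

static double mono_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Functional code: knows nothing about monitoring. */
long kernel(long n)
{
    long acc = 0;
    for (long i = 0; i < n; i++)
        acc += i;
    return acc;
}

/* Hand-written "around advice": the cross-cutting code that AOP
 * would inject automatically around every matching call. */
long kernel_monitored(monitor_t *m, long n)
{
    double t0 = mono_seconds();
    long r = kernel(n);
    m->total_seconds += mono_seconds() - t0;
    m->calls++;
    return r;
}
```

Every call site that wants monitoring must be changed to call `kernel_monitored`; that manual rewiring is exactly what the aspect-oriented version should avoid.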

Project H7: Application Characterization in HPC
(up to 2 people, up to 10 pts per person; extendable to an M.Sc. thesis)
Extend an existing program characterization plugin based on Intel Pin to support parallel characterization (instrumentation) of OpenMP and MPI programs.
MICA: dynamic characterization tool
Pin: dynamic binary instrumentation tool (Linux)
1. Familiarize yourself with the Pin tools on Linux
2. Familiarize yourself with the Pin plugin MICA
3. Extend MICA within the Pin platform to support parallel instrumentation
Recommended skills: basic C++, Linux environment, basic parallel programming
Contacts: amirhossein.ashouri@polimi.it

Project H8: Application Characterization in HPC: Parallel characterization of programs using the PAPI library
(up to 2 people, up to 6 pts per person)
INRIA ParaSuite
1. Familiarize yourself with the PAPI library
2. Familiarize yourself with INRIA ParaSuite
3. Run the benchmarks and collect dynamic features with PAPI
4. Perform a sensitivity analysis on the collected results
Recommended skills: basic C, Linux environment, basic parallel programming
Contacts: amirhossein.ashouri@polimi.it
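One simple form of the sensitivity analysis in step 4 is to measure how strongly each collected feature tracks a performance metric across runs. A C sketch computing the squared Pearson correlation, assuming the PAPI counter values have already been collected into arrays (reading them through PAPI is not shown); the function name is hypothetical, and the project may well call for other analyses.

```c
#include <stddef.h>

/* Squared Pearson correlation between one feature x (e.g. a counter
 * value per run) and a metric y (e.g. execution time) over n runs.
 * Result is in [0, 1]; returns 0 if either series is constant. */
double r_squared(const double *x, const double *y, size_t n)
{
    if (n < 2)
        return 0.0;
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) {
        mx += x[i];
        my += y[i];
    }
    mx /= (double)n;
    my /= (double)n;

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0)
        return 0.0;
    return (sxy * sxy) / (sxx * syy);
}
```

Ranking features by this value gives a first, linear-only indication of which counters matter; it will miss non-linear relationships.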

Project H9: Finding the Thermal Safe Power Budget for Manycore Platforms
(up to 2 people, up to 6 pts per person)
As technology scales, thermal management for multicore architectures becomes a critical challenge due to increasing power density.
The efficiency of power budgeting techniques is limited by a fixed chip-wide power budget. To overcome this limitation, a thermal safe power (TSP) model can be used to compute the per-core power budget for different mappings of active cores.
The students will use academic thermal estimation tools such as HotSpot and TSP to estimate the thermal safe power budget of a manycore platform with up to 128 cores.
HotSpot Temperature Modeling Tool
Thermal Safe Power (TSP) Tool
1. Install and familiarize yourself with HotSpot and/or TSP
2. Use the tools to obtain the thermal safe power budget from the power traces that will be provided
Recommended skills: basic computer architecture knowledge
Contacts: ioannis.stamelakos@polimi.it
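The intuition behind a per-mapping budget can be shown with a simplified steady-state linear thermal model, T = T_amb + B p, where B[i][j] is the temperature rise of core i per watt dissipated by core j. If every active core in mapping M dissipates the same power P and inactive cores dissipate nothing (a simplifying assumption; the TSP tool's model is more detailed), the hottest core stays below T_crit when P = (T_crit - T_amb) / max_i sum_{j in M} B[i][j]. A C sketch of that simplified computation; the function name and parameters are hypothetical.

```c
#include <stddef.h>

/* Simplified thermal-safe per-core power for one mapping of active cores.
 * B is an n x n row-major matrix: B[i*n + j] = steady-state temperature
 * rise (K) of core i per watt dissipated by core j.
 * active[j] != 0 marks core j as active; inactive cores are assumed to
 * dissipate no power. Returns -1.0 if no core is active. */
double tsp_uniform(const double *B, const int *active, size_t n,
                   double t_crit, double t_amb)
{
    int any_active = 0;
    for (size_t j = 0; j < n; j++)
        if (active[j]) any_active = 1;
    if (!any_active)
        return -1.0;

    /* Worst temperature rise per watt of per-core power, over all cores i. */
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double rise_per_watt = 0.0;
        for (size_t j = 0; j < n; j++)
            if (active[j]) rise_per_watt += B[i * n + j];
        if (rise_per_watt > worst) worst = rise_per_watt;
    }
    if (worst <= 0.0)
        return -1.0;
    return (t_crit - t_amb) / worst;
}
```

With fewer active cores the worst-case row sum shrinks, so each active core may dissipate more power: this mapping dependence is what a single fixed chip-wide budget ignores.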

Project D1: Automatic Design Space Exploration for System on Chip
(1 student, up to 6 pts)
The scope of this work is to automate the design space exploration for mixed-criticality systems on chip.
No particular requirements; ground truths are provided as part of the data set.
The objective of this project is to automate multiple searches by transforming a source XML file into the format accepted by the interference checker.
1. Create a script able to automatically generate a compliant XML file for each possible decision. Only characteristic decisions have to be handled at this point.
2. Given an XML file describing a single SoC with some mapping decisions still open, generate all the possible mappings for those open decisions, run the program and collect its output, declaring which designs are viable and which are not.
Recommended skills: basic Linux, Bash scripting
Contacts: emanuele.vitali@polimi.it
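Step 2 amounts to a Cartesian product over the open decisions: if decision d has n_d options, there are n_1 × n_2 × ... × n_k candidate mappings. A C sketch of that enumeration as an odometer-style counter; `enumerate_mappings` and the visitor callback are hypothetical, writing the XML variant and invoking the checker are left to the visitor, and in practice the same logic could equally be nested loops in a Bash script.

```c
#include <stddef.h>

/* Called once per complete assignment; e.g. it could write one XML
 * variant and run the interference checker on it (not shown). */
typedef void (*visit_fn)(const size_t *choice, size_t n_decisions, void *ctx);

/* Enumerate every assignment of the open decisions. Decision d has
 * n_options[d] choices; choice[d] holds the current option index.
 * visit may be NULL to only count. Returns the number of assignments. */
size_t enumerate_mappings(const size_t *n_options, size_t n_decisions,
                          size_t *choice, visit_fn visit, void *ctx)
{
    for (size_t d = 0; d < n_decisions; d++) {
        if (n_options[d] == 0)
            return 0; /* a decision with no option: nothing is viable */
        choice[d] = 0;
    }
    size_t count = 0;
    for (;;) {
        if (visit)
            visit(choice, n_decisions, ctx);
        count++;
        /* Advance like an odometer: bump the last digit, carry leftwards. */
        size_t d = n_decisions;
        for (;;) {
            if (d == 0)
                return count; /* every digit wrapped around: done */
            d--;
            if (++choice[d] < n_options[d])
                break;
            choice[d] = 0;
        }
    }
}
```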

Project M3: Machine learning for compiler auto-tuning
(up to 2 people, up to 8 pts per person; extendable to an M.Sc. thesis)
When dealing with auto-tuning and machine learning, choosing the right program features to collect plays an important role in the final prediction.
The goal is to understand which program characteristics are the most important to monitor, by analyzing the relation between the type of characteristics and the different optimizations already applied.
MICA: dynamic characterization tool
1. Familiarize yourself with the Pin tools on Linux
2. Familiarize yourself with the MICA tool
3. Perform a sensitivity analysis on the relation between the optimized codes and the type of characterization technique
Recommended skills: basic C++, Linux environment, machine learning
Contacts: amirhossein.ashouri@polimi.it

Project M4: Machine learning for compiler auto-tuning
(up to 2 people, up to 8 pts per person; extendable to an M.Sc. thesis)
When dealing with auto-tuning and machine learning, understanding the best set of compiler optimizations and their effects on the code plays an important role in the resulting performance.
Analyze the existing optimization levels and cluster the promising optimizations together.
LLVM compilation framework
1. Familiarize yourself with the LLVM (opt and Clang) optimization passes
2. Familiarize yourself with machine learning and clustering techniques
3. Apply clustering techniques to the existing optimizations to group the promising optimizations that further improve performance
Recommended skills: machine learning
Contacts: amirhossein.ashouri@polimi.it
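As a starting point for step 3, each optimization could be represented by a feature vector (for example, its measured speedup on each benchmark; this representation is our assumption) and grouped with k-means, one of the simplest clustering techniques. A C sketch of Lloyd's k-means with caller-supplied initial centroids; the function name and parameters are hypothetical.

```c
#include <float.h>
#include <stddef.h>

/* Lloyd's k-means on n points of dimension dim (row-major in pts).
 * centroids (k x dim, row-major) must be initialized by the caller and
 * are updated in place; label[i] receives the cluster of point i.
 * Stops after max_iter iterations or when no assignment changes. */
void kmeans(const double *pts, size_t n, size_t dim, size_t k,
            double *centroids, size_t *label, int max_iter)
{
    for (size_t i = 0; i < n; i++)
        label[i] = (size_t)-1; /* force a first assignment */

    for (int it = 0; it < max_iter; it++) {
        int changed = 0;
        /* Assignment step: nearest centroid by squared Euclidean distance. */
        for (size_t i = 0; i < n; i++) {
            size_t best = 0;
            double best_d = DBL_MAX;
            for (size_t c = 0; c < k; c++) {
                double d = 0.0;
                for (size_t j = 0; j < dim; j++) {
                    double diff = pts[i * dim + j] - centroids[c * dim + j];
                    d += diff * diff;
                }
                if (d < best_d) {
                    best_d = d;
                    best = c;
                }
            }
            if (label[i] != best) {
                label[i] = best;
                changed = 1;
            }
        }
        if (!changed)
            break;
        /* Update step: move each centroid to the mean of its points. */
        for (size_t c = 0; c < k; c++)
            for (size_t j = 0; j < dim; j++) {
                double s = 0.0;
                size_t members = 0;
                for (size_t i = 0; i < n; i++)
                    if (label[i] == c) {
                        s += pts[i * dim + j];
                        members++;
                    }
                if (members > 0) /* empty clusters keep their centroid */
                    centroids[c * dim + j] = s / (double)members;
            }
    }
}
```

k-means needs k fixed in advance and is sensitive to the initial centroids, so trying several values of k and initializations is advisable before drawing conclusions about which optimizations belong together.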