Program Optimization for Multi-core Architectures




Program Optimization for Multi-core Architectures

Sanjeev K Aggarwal (ska@iitk.ac.in), M Chaudhuri (mainak@iitk.ac.in), R Moona (moona@iitk.ac.in)
Department of Computer Science and Engineering, IIT Kanpur - 208016, India

Acknowledgement: This course has been developed with support from Intel Semiconductors (US) Limited

Motivation for doing this course
- Processors and computer systems are becoming more and more powerful
- Applications are becoming more and more demanding
- Important applications:
  - Engineering simulations (FM, CFD, structures)
  - Biology (genetics, cell structure, molecular biology)
  - Particle and nuclear physics, high energy physics, astrophysics, weather prediction, drug design, tomography, molecular dynamics
  - Entertainment industry

Motivation
- These applications require enormous compute power
- The compute power is available!
- However, software systems (algorithms, compilers, libraries, debuggers) AND programmers (mainly trained in sequential programming) are unable to exploit it

Motivation
- Solving these complex problems and programming these architectures require a different methodology
- Better algorithms, compilers, libraries, profilers, application tuners, debuggers, etc. have to be designed
- Programmers have to be trained to use these tools

High Performance Systems
The power of high performance systems comes from:
- Hardware technology: faster machines, more cache, lower latencies between devices
- Multilevel architectural parallelism:
  - Pipelines: out-of-order execution
  - Vector: handles arrays with a single instruction
  - Parallel: many processors, each capable of executing an independent instruction stream
  - VLIW: issues many instructions in a single cycle
  - Clusters: large number of systems on a very fast network
  - Grid: cooperation of a large number of systems

Processor design problems
- Processor frequency and power consumption seem to scale in lockstep: how can machines stay on the historic performance curve without burning down the system?
- Moore's Law is still applicable
- Physics and chemistry at the nano-scale:
  - Materials
  - Transistor leakage current, quantum effects coming into play
  - Wire lengths have to be reduced

Processor design problems
- With 200 MHz frequency steps, stepping the frequency back two steps cuts power consumption by ~40%
- A dual core running at step n-2 therefore fits in the same thermal envelope as a single processor running at step n

Sources and types of Parallelism
- Structured: identical tasks on different data sets
- Unstructured: different data streams and different instructions
- Algorithm level: appropriate algorithms and data structures
- Programming:
  - specify parallelism in parallel languages
  - write sequential code and use parallelizing compilers
  - use coarse-grain parallelism: independent modules
  - use medium-grain parallelism: loop level
  - use fine-grain parallelism: basic block or statement
- Expressing parallelism in programs:
  - no good languages
  - most applications are not multi-threaded
  - writing multi-threaded code increases software costs
  - programmers are unable to exploit whatever little parallelism is available

What will we learn in the course?
- Processor architectures, with focus on memory hierarchy, instruction level parallelism and multicore architectures
- Program analysis techniques for redundancy removal and optimization for high performance architectures
- Concurrency and operating systems issues in using these architectures
- Programming techniques for exploiting parallelism (use of message passing libraries)
- Tools for code analysis and optimization (Intel compilers, profilers and application tuning tools)

What will we learn
- Understand paradigms for programming high-end machines, compilers and runtime systems
- Application requirements
- Shared-memory programming
- Optimistic and pessimistic parallelization
- Memory hierarchy optimization
- Focus on the software problem for multicore processors

What do we expect to achieve by the end of the course?
- Faculty who can teach this course and conduct research in this area
- Students who can design, develop, understand, modify/enhance, and maintain complex applications which run on high performance architectures (in addition to doing research!)
- A set of slides, notes, projects and laboratory exercises which can be used for teaching this course in the future, both at IITK and at other universities

Organization of the course
- Approximately 40 lectures of one hour each (both by faculty and students)
- One term paper/project, to be done individually; it is important to start early (30% credit)
- Some small programming assignments for the laboratory (20% credit)
- One mid-semester examination (20% credit)
- One end-semester examination (30% credit)
- Everyone is expected to participate in the discussions in class

Ethical Issues
- Copying material from the internet and other sources:
  - DO NOT use cut-and-paste technology to prepare your reports
  - DO NOT copy assignments
  - You can borrow ideas (after giving due credit) but not the text/programs
- Look up the word "Plagiarism" at www.dictionary.com: "a piece of writing that has been copied from someone else and is presented as being your own work"

Ethical Issues
- Look up the word "Plagiarism" at http://en.wikipedia.org/wiki/plagiarism: "Plagiarism is a form of cheating, and within academia is seen as academic dishonesty. It is a matter of deceit: fooling a reader into believing that certain written material is original when it is not. Plagiarism is a serious and punishable academic offense when the goal is to obtain some sort of personal academic credit or personal recognition."

Background required
- Knowledge of basic compilers, operating systems and computer organization
- Focus on the back-end of the compiler and concurrency issues in operating systems
- Computer organization reference: "Computer Organization and Design" by Patterson and Hennessy
- Compiler reference (the Dragon book): "Compilers: Principles, Techniques, and Tools" by Aho, Sethi and Ullman
- Operating systems reference: "Operating System Concepts" by Silberschatz and Galvin

No specific text book!!
- Material has been collected from various sources: books, research papers, position papers, etc.
- We will make the material available as we go along
- Our slides and class notes will be useful material

Some useful references
ACKNOWLEDGEMENT: Some of the figures have been taken from Wolfe's book, High Performance Compilers for Parallel Computing. The material on OpenMP is from the tutorials given by Tim Mattson (Intel) and Rudolf Eigenmann (Purdue University) at Supercomputing 2001.
Computer architecture:
- J. L. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 3rd Edition.
- D. E. Culler, J. P. Singh, with A. Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Morgan Kaufmann Publishers, 2nd Edition.

Some useful references
Redundancy removal:
- Aho A, Lam M S, Sethi R, Ullman J D, "Compilers: Principles, Techniques, and Tools", Addison-Wesley, 2007.
- Hecht M S, "Flow Analysis of Computer Programs", Elsevier North-Holland, 1977.
High performance compilers, data dependence analysis:
- Wolfe M, "High Performance Compilers for Parallel Computing", Addison-Wesley, 1996.
- Muchnick S S, "Advanced Compiler Design and Implementation", Morgan Kaufmann, 1997.
- Allen R and Kennedy K, "Optimizing Compilers for Modern Architectures", Morgan Kaufmann, 2002.
Operating systems:
- Tanenbaum A S, "Distributed Operating Systems", Prentice Hall.
- Coulouris, Dollimore and Kindberg, "Distributed Systems: Concepts and Design", Addison-Wesley.
- Silberschatz and Galvin, "Operating System Concepts", Addison-Wesley.