Using Power to Improve C Programming Education




Jonas Skeppstedt
Department of Computer Science, Lund University, Lund, Sweden
jonas.skeppstedt@cs.lth.se
jonasskeppstedt.net
2016

Outline
- Background and Problem
- Our approach
- Forsete: an Automatic Grader
- Advantages with the Power Architecture
- Conclusion and near future

Background and Problem 1(2)
- There are two courses on C in Lund:
  - C Programming: focus on clean code plus the ISO C Standard.
  - Algorithm Implementation: focus on efficient C.
- The C11 atomic types, memory model, and multithreading are taught in the Multicore Programming course.

Background and Problem 2(2)
- Previously the programming assignments were graded manually by teaching assistants during weekends.
- The grading is very strict, so most students need multiple iterations.
- Problems with this approach:
  - 72-hour latency from hand-in to a reject, with an occasional pass.
  - Since passed assignments are required for writing the exam, it often became stressful for some students.
  - It costs money to pay the TAs.

A Different Approach: Automatic Grading
- To eliminate these problems I wrote an automatic grader which cuts the latency to a few minutes (email plus a grading queue).
- Students can try any number of times, and almost all were finished before the written exam.
- A new challenge is to motivate students to perform their best even though only a machine sees their code.

A Competition: Memory Efficiency
- Assignments which pass all tests are assigned a score.
- The score is the size of the static data and code of their file.
- Assignments with the same score are sorted by timestamp.
- There is a new assignment each week and ranks are accumulated:
  - RPN calculator
  - Find longest word in input
  - Polynomial multiplication
- After two assignments, three students each had accumulated four points.
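The ranking rule on this slide (smaller score first, ties broken by timestamp) can be sketched as a qsort comparator. This is only an illustration, not Forsete's actual code: the struct, its field names, and the assumption that the earlier hand-in wins a tie are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical record for one passing submission; names are illustrative. */
struct submission {
    const char *user;     /* student identifier */
    size_t      score;    /* bytes of code plus static data; smaller is better */
    time_t      received; /* hand-in time; assumed: earlier wins a tie */
};

/* qsort comparator: ascending score, then ascending timestamp. */
static int compare_submissions(const void *a, const void *b)
{
    const struct submission *x = a;
    const struct submission *y = b;

    if (x->score != y->score)
        return x->score < y->score ? -1 : 1;
    if (x->received != y->received)
        return x->received < y->received ? -1 : 1;
    return 0;
}

/* Sort the high score list in place, best submission first. */
void rank_submissions(struct submission *list, size_t n)
{
    assert(list != NULL || n == 0);
    if (n > 1)
        qsort(list, n, sizeof list[0], compare_submissions);
}
```

Calling rank_submissions on an array of records leaves the best entry at index 0; the comparisons are written out explicitly instead of subtracting, so large sizes and timestamps cannot overflow.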

The Prize

The Automatic Grader Forsete (forsete.net)
- Forsete is a judge in Nordic mythology who is always fair.
- The Forsete program runs as root, fetches mails, and grades the code.
- The score is then sent back with disassembled Power machine code.
- You can try it by sending an email of the form:
    To: cbook@forsete.net
    Subject: assignment poly by username
- Make the code as small as possible; my score is 735 bytes.
- Sample input can be found at the site of the course book Writing Efficient C Code: A Thorough Introduction, 2nd ed.: writing-efficient-c-code.com

Forsete
- Checks the source code against the Linux kernel style guide.
- Creates a random problem for input, runs a reference implementation, and records heap usage by the reference implementation.
- Forks to compile the code.
- Forks, limits the stack area, changes root directory, and switches Unix uid.
- Executes the program and checks heap usage and output for the random input and for corner cases.
- A timeout kills programs that are too slow; this happened often and was very valuable to many students.
- At most 4 times the reference heap size is allowed, and no leaks.
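The confinement steps listed above (fork, stack limit, new root directory, uid switch, timeout) map onto standard POSIX calls. The following is a minimal sketch under stated assumptions, not Forsete's implementation: the struct sandbox type, the function names, and the exit codes are hypothetical, the chroot and setuid steps are skipped when the caller is not root, and a real grader would also need to drop group privileges and measure heap usage, which is not shown.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for chroot() under strict -std modes on glibc */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical limits; the real grader's values are not given in the slides. */
struct sandbox {
    rlim_t       stack_bytes; /* stack limit for the child */
    unsigned int timeout_sec; /* seconds before SIGALRM ends the child */
    const char  *jail;        /* new root directory, or NULL to skip */
    uid_t        uid;         /* unprivileged uid to switch to */
};

/* Fork, confine the child, and exec argv. Returns the wait status, or -1. */
int run_sandboxed(const struct sandbox *sb, char *const argv[])
{
    assert(sb != NULL && argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit rl = { .rlim_cur = sb->stack_bytes,
                             .rlim_max = sb->stack_bytes };
        if (setrlimit(RLIMIT_STACK, &rl) != 0)
            _exit(126);

        /* chroot and setuid need root, so this sketch skips them otherwise. */
        if (geteuid() == 0 && sb->jail != NULL) {
            if (chroot(sb->jail) != 0 || chdir("/") != 0)
                _exit(126);
            if (setuid(sb->uid) != 0)
                _exit(126);
        }

        alarm(sb->timeout_sec); /* a pending alarm is kept across execv */
        execv(argv[0], argv);
        _exit(127);             /* only reached if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

/* Heap rule from the slide: at most 4x the reference usage and no leaks. */
int heap_usage_ok(size_t peak, size_t reference_peak, size_t leaked)
{
    if (leaked != 0)
        return 0;
    if (reference_peak > SIZE_MAX / 4)
        return 1; /* 4x the reference exceeds any representable size */
    return peak <= 4 * reference_peak;
}
```

A caller can tell a timeout from a normal exit by inspecting the returned status with WIFSIGNALED and WTERMSIG (SIGALRM for a timeout) or WIFEXITED and WEXITSTATUS.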

Encouraging Simplicity and Elegance
- We want students to learn to write simple and elegant C code.
- Code efficiency is the focus of a different course (EDAF15).
- Elegant code is often memory efficient.
- Checking the high score list gives important feedback.
- For the Longest Word, the scores ranged from 189 to 767 bytes.
- So we need to create a desire to scrutinize machine code. How?

Advantages with the Power Architecture 1(3)
- The generated code should be relatively predictable and easy to read, including register usage.
- Power advantages: fixed-size instructions simplify reasoning about size, a large register set, and regular addressing modes.
- Availability of mature optimizing compilers: gcc -Os is great.
- Anton Klarén, winner of the 2015 EDAA25 Lund University Memory Efficient C Code Programming Competition: "The gcc compiler for Power does not generate any instruction that you don't understand what it does or why it is there!"
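To illustrate the kind of comparison a student might make when chasing a smaller score, here are two equivalent bit-counting functions, one with a lookup table and one without. The example is hypothetical and not taken from the course assignments. With fixed-size instructions, the code size of each variant is simply its instruction count times the instruction width, and the table adds 16 bytes of read-only data; which variant ends up smaller is not claimed here and is best checked by compiling with gcc -Os -c and inspecting the object file with size and objdump -d.

```c
#include <assert.h>

/* Variant A: lookup table. The 16-byte table is static data in the file. */
static const unsigned char nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

unsigned popcount_table(unsigned x)
{
    unsigned n = 0;

    while (x != 0) {
        n += nibble_bits[x & 15u]; /* count the low four bits */
        x >>= 4;
    }
    return n;
}

/* Variant B: no table; each iteration clears the lowest set bit. */
unsigned popcount_loop(unsigned x)
{
    unsigned n = 0;

    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Self-check helper: both variants must agree on any input. */
void check_popcount(unsigned x)
{
    assert(popcount_table(x) == popcount_loop(x));
}
```

Both functions return the same value for every input, so only their size differs; reading the disassembly side by side shows where the bytes go.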

Advantages with the Power Architecture 2(3)
- Easy access to detailed online documentation was also important.
- Also, the course book introduces Power.
- Availability of good development platforms: e.g. a POWER8 server or, as in Lund, several 4-way multiprocessors based on IBM's 970MP clocked at 2.5 GHz.
- We use Power not only for the C Programming course but also in Multicore Programming and Algorithm Implementation, where development machines with good performance are essential.

Advantages with the Power Architecture 3(3)
- In the Multicore Programming course, the advanced memory model of Power lets students explore what theory really means in terms of performance.
- Forsete was used for a parallel graph problem (dataflow analysis), and here the score is execution time. The winners were Valdemar Roxling and, again, Anton Klarén.
- Availability of detailed pipeline simulators is yet another important advantage for Power when selecting a platform for CS education.
- The pipeline visualizer (scrollpv) from IBM Austin has been invaluable in making students understand the performance of superscalar processors as well as branch prediction, the reorder buffer (global completion table), and rename registers.
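As an illustration of what the memory model means in practice, here is a minimal message-passing sketch using the C11 atomics mentioned earlier together with POSIX threads. It is not course material: the function and variable names are hypothetical. The release store and acquire load are what make reading payload safe; on a weakly ordered machine such as Power the compiler typically has to emit explicit barrier instructions for them, which is the cost students can then observe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;      /* plain data, published through the flag */
static atomic_int ready; /* 0 until payload is valid */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the write to payload becomes visible before the flag. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;

    /* Acquire: once the flag is seen, payload is guaranteed to be valid. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ; /* spin until the producer publishes */
    *out = payload;
    return NULL;
}

/* Runs one producer and one consumer; returns the value the consumer saw. */
int run_message_passing(void)
{
    pthread_t p, c;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);

    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);

    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Replacing both orderings with memory_order_relaxed would remove the guarantee that the consumer sees the payload, and the program would then contain a data race under the C11 rules. On some systems the program must be built with -pthread.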

MSc Theses Using Power
- Karl Hylén: Processor Models for Instruction Scheduling using Constraint Programming. First ever work in the area with measurements on a real machine.
- Anton Botvalde and Andreas Larsson: Performance Evaluation of ISO C restrict on the Power Architecture. Noted why a deleted floating point load instruction could make a program slower, a valuable insight for compiler writers.
- For both of these, using the Power architecture was crucial, primarily because of the interesting machine, the detailed documentation from IBM of the 970MP pipelines, and IBM's pipeline simulator.

Conclusion and Near Future
- Also for universities, the Power Architecture is a fantastic platform.
- In the Optimizing Compilers course in September we will use Power.
- My book An Introduction to the Theory of Optimizing Compilers with Performance Measurements on Power will be available in August.
- It will have comparisons of clang, gcc, and my C compiler, which was validated for ISO C99 conformance in 2003.
- The mentioned M.Sc. theses can be downloaded from: jonasskeppstedt.net/theses

Resources and Remarks
- The pipeline simulator is available as the Performance Simulator for Linux on Power (sim_ppc) in the SDK for Linux on Power at IBM.
- Programming assignments with math problems are most appreciated.
- It can obviously be dangerous to execute unknown arbitrary code. Changing the root directory, switching Unix uid, and disabling the network for this uid make it safe.
- It is important to make competitions intensive: the best competing students tend to spend a lot of time on this, and that cannot go on for much more than three weeks.