Code Density Concerns for New Architectures

Vincent M. Weaver, Cornell University
Sally A. McKee, Chalmers University of Technology
7 October 2009

Introduction
- Benchmark ported to 21 different assembly languages
- Hand-optimized for minimum size
- Code tested and works on all architectures
- What ISA features lead to high code density?
- Can this help designers of new ISAs?

New ISAs? Really?
- ISA design still a concern
- FPGAs make it easy
- Embedded architectures want dense code
- Linux has 12 embedded architectures and counting

Benefits of Code Density
- L1 icache holds more instructions
- More data fits in unified L2 cache
- Less bandwidth required to memory and disk
- Fewer TLB misses
- Compact loops can be executed from instruction buffer
- Smaller cache footprint can lead to energy savings
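As a rough illustration of the first benefit, the sketch below counts how many fixed-length instructions fit in an instruction cache. The 32 KiB cache size and the function name are assumptions chosen for illustration; neither comes from the talk.

```python
def instructions_in_cache(cache_bytes: int, instruction_bytes: int) -> int:
    """Return how many whole fixed-length instructions fit in the cache."""
    if cache_bytes < 0 or instruction_bytes <= 0:
        raise ValueError("sizes must be positive")
    return cache_bytes // instruction_bytes


# Assumed cache size for illustration only (not a figure from the slides).
L1_ICACHE_BYTES = 32 * 1024

# 4-byte encoding (typical of the RISC group) versus
# 2-byte encoding (typical of the embedded group).
print(instructions_in_cache(L1_ICACHE_BYTES, 4))
print(instructions_in_cache(L1_ICACHE_BYTES, 2))
```

Halving the encoding size doubles the number of instructions the same cache can hold, provided the denser ISA does not need more instructions for the same work.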

What about Performance?
- Hard to optimize performance
- Varies across implementations
- Dense code often performs well

The Benchmark
Linux Version 2.6.11.4-21.17-smp, Compiled #1 SMP Fri Apr 6 08:42:34 UTC 2007
Two 2791MHz Intel(R) Xeon(TM) Processors, 2027M RAM, 5521.40 Bogomips Total
sampaka
- LZSS decompression
- System calls, including disk I/O
- String manipulation and search
- Integer to ASCII conversion
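To give a feel for the LZSS decompression step, here is a minimal sketch of a byte-oriented LZSS decoder. The packed format used here (a flag byte followed by up to eight items, each either a literal byte or a two-byte distance/length back-reference) is an assumption made for illustration; the slides do not describe the benchmark's actual encoding.

```python
def lzss_decompress(data: bytes) -> bytes:
    """Decode a hypothetical LZSS stream (format assumed, not the benchmark's).

    Each group starts with a flag byte. Bit k (LSB first) describes item k:
    1 = one literal byte, 0 = two bytes (distance, length) copied from output.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        flags = data[i]
        i += 1
        for bit in range(8):
            if i >= len(data):
                break  # the last group may hold fewer than eight items
            if flags & (1 << bit):
                out.append(data[i])  # literal byte
                i += 1
            else:
                if i + 1 >= len(data):
                    raise ValueError("truncated back-reference")
                distance, length = data[i], data[i + 1]
                i += 2
                if distance == 0 or distance > len(out):
                    raise ValueError("invalid back-reference distance")
                # Copy byte by byte so overlapping references repeat correctly.
                for _ in range(length):
                    out.append(out[-distance])
    return bytes(out)


# Three literals "abc", then copy 6 bytes starting 3 bytes back.
packed = bytes([0b00000111, ord("a"), ord("b"), ord("c"), 3, 6])
print(lzss_decompress(packed))
```

The inner copy loop is the kind of code where auto-incrementing loads and string instructions can shrink a hand-written assembly version.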

ISA Categories
- VLIW
- CISC
- RISC
- Embedded
- 8/16-bit

VLIW Processors
- ia64
- 16-byte bundle holds 3 instructions
- Instruction has 3 arguments
- Hundreds of integer registers
- Predication

CISC Processors
- m68k, s390, VAX, x86, x86_64
- Variable instruction length, 1-54 bytes
- Instruction has 2 arguments
- 16 integer registers (x86 has 8)
- Status flags
- Unaligned loads
- Complex addressing modes

RISC Processors
- Alpha, ARM, m88k, microblaze, MIPS, PA-RISC, PPC, SPARC
- 4-byte instruction length
- Instruction has 3 arguments
- 32 integer registers (except ARM, SPARC)
- Most have a zero register
- Many have branch delay slot

Embedded Processors
- avr32, crisv32, sh3, ARM Thumb
- 2-byte instruction length
- Instruction has 2 arguments
- Most have 16 integer registers
- Auto-incrementing loads
- Status flags

8/16-bit Processors
- 6502, PDP-11, z80
- Variable instruction length (1-6 bytes)
- Instruction has 1-2 arguments
- Status flags
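The encoding sizes quoted for the fixed-length categories translate directly into bytes per instruction. The sketch below works through that arithmetic for a hypothetical 30-instruction routine. It assumes the same instruction count on every ISA, which is a simplification: a 2-argument ISA may need extra instructions, and ia64 bundles may need padding.

```python
from fractions import Fraction

# Bytes per instruction, taken from the category slides above.
ENCODINGS = {
    "VLIW (ia64)": Fraction(16, 3),  # 16-byte bundle holds 3 instructions
    "RISC": Fraction(4),             # 4-byte instruction length
    "Embedded": Fraction(2),         # 2-byte instruction length
}


def code_size_bytes(instruction_count: int, bytes_per_instruction: Fraction) -> float:
    """Estimate code size, assuming every instruction uses the same encoding size."""
    return float(instruction_count * bytes_per_instruction)


# Hypothetical routine length, chosen only for illustration.
ROUTINE_INSTRUCTIONS = 30

for name, size in ENCODINGS.items():
    print(f"{name}: {code_size_bytes(ROUTINE_INSTRUCTIONS, size):.0f} bytes")
```

The variable-length CISC and 8/16-bit groups are left out because their size depends on the instruction mix, not on a single constant.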

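Results: LZSS Decompression
[Bar chart: code size in bytes (axis 0 to 256) per architecture, grouped as VLIW, RISC, CISC, embedded, 8/16-bit.]
x-axis labels as extracted: ia64, alpha, mips, parisc, sparc, mblaze, 6502, m88k, arm, s390, ppc, pdp-11, vax, z80, m68k, thumb, avr32, sh3, x86_64, crisv32, i386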

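Results: String Concatenation
[Bar chart: code size in bytes (axis 0 to 64) per architecture, grouped as VLIW, RISC, CISC, embedded, 8/16-bit.]
x-axis labels as extracted: ia64, alpha, sparc, mips, parisc, m88k, mblaze, 6502, arm, ppc, vax, thumb, sh3, avr32, m68k, s390, z80, crisv32, x86_64, pdp-11, i386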

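Results: String Search
[Bar chart: code size in bytes (axis 0 to 256) per architecture, grouped as VLIW, RISC, CISC, embedded, 8/16-bit.]
x-axis labels as extracted: ia64, alpha, sparc, mblaze, m88k, parisc, mips, arm, ppc, 6502, pdp-11, m68k, sh3, thumb, vax, z80, x86_64, i386, crisv32, avr32, s390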

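Results: Integer to ASCII
[Bar chart: code size in bytes (axis 0 to 256) per architecture, grouped as VLIW, RISC, CISC, embedded, 8/16-bit. Annotation on chart: No HW divide.]
x-axis labels as extracted: ia64, parisc, 6502, alpha, arm, m88k, sparc, z80, sh3, ppc, mips, thumb, mblaze, m68k, crisv32, pdp-11, s390, avr32, vax, x86_64, i386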

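Results: Overall
[Bar chart: total code size in bytes (axis 0 to 2560) per architecture, grouped as VLIW, RISC, CISC, embedded, 8/16-bit.]
x-axis labels as extracted: ia64, alpha, parisc, sparc, mblaze, mips, m88k, arm, ppc, 6502, s390, x86_64, vax, sh3, m68k, i386, thumb, z80, avr32, crisv32, pdp-11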

Correlations

Correlation Coefficient   Architectural Parameter
0.9381                    Smallest possible instruction length
0.9116                    Low number of integer registers
0.7823                    Low virtual address of first instruction
0.6607                    Architecture lacks a zero register
0.6159                    Low bit-width
0.4982                    Few operands in each instruction
0.3854                    Hardware divide in ALU

More Correlations

Correlation Coefficient   Architectural Parameter
0.3653                    Unaligned load/store available
0.3129                    Year the architecture was introduced
0.2521                    Hardware status flags (zero/overflow/etc.)
0.2121                    Auto-incrementing addressing scheme
0.0809                    Machine is big-endian
0.0021                    Branch delay slot
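The slides do not say which correlation measure produced these coefficients. Assuming it is the usual Pearson coefficient, the sketch below shows how one architectural parameter could be correlated against total code size. The two data lists are made-up placeholder values, not measurements from the talk.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)


# Hypothetical placeholder data (NOT from the slides): smallest instruction
# length in bytes and total benchmark size in bytes for six imaginary ISAs.
min_instruction_bytes = [1, 1, 2, 2, 4, 4]
total_size_bytes = [900, 1000, 1100, 1200, 1900, 2100]

print(round(pearson(min_instruction_bytes, total_size_bytes), 4))
```

A value near 1 means the parameter tracks code size closely, as the 0.9381 reported for smallest possible instruction length suggests; a value near 0, like the branch delay slot entry, means almost no linear relationship.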

Results: C Comparison (x86/linux)
[Bar chart: executable size in bytes (axis 1024 to 8192, with one bar marked ~500kB) for C builds grouped as GLIBC / STATIC, GLIBC / DYNAMIC, uclibc, and SYSCALL ONLY, compared with hand-optimized ASM. Bars are labeled by compiler and flag: gcc -O3, gcc -Os, intel -Os, sun -Os.]

What is holding back the C version?
- Stack frame (calling convention)
- Pointer aliasing
- Full program register allocation
- Constant loading optimizations
- String instructions

Related Work
- RISC code compression
- Kozuch and Wolfe: investigate VAX, MIPS, SPARC, m68k, RS6000, PPC
- Hasegawa et al.: gcc generated code on m68k, x86, i960, Sparclite, SPARC, MIPS, AMD29k, m88k, Alpha, RS6000
- Flynn et al.: synthetic architectures

Conclusions / Future Work
- New ISAs are continually being developed; code density is still a concern
- Short instruction codings are key
- High code density requires co-operation of ISA, operating system, system libraries, and compiler
- More architectures should be investigated, as well as more and larger benchmarks

Questions?
All code is available: http://www.deater.net/weave/vmwprod/asm/ll