Andreas Herrmann. AMD Operating System Research Center

Similar documents
64-Bit NASM Notes. Invoking 64-Bit NASM

Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z)

W4118 Operating Systems. Junfeng Yang

The programming language C. sws1 1

5. Calling conventions for different C++ compilers and operating systems

Lecture 27 C and Assembly

OS Virtualization Frank Hofmann

Hacking Techniques & Intrusion Detection. Ali Al-Shemery arabnix [at] gmail

CHAPTER 6 TASK MANAGEMENT

Return-oriented programming without returns

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

4.1 Introduction 4.2 Explain the purpose of an operating system Describe characteristics of modern operating systems Control Hardware Access

Performance Characterization of SPEC CPU2006 Integer Benchmarks on x Architecture

Intel 8086 architecture

Instruction Set Architecture

Introduction to Virtual Machines

VMware and CPU Virtualization Technology. Jack Lo Sr. Director, R&D

The Plan Today... System Calls and API's Basics of OS design Virtual Machines

How To Write Portable Programs In C

CS222: Systems Programming

Multi-Threading Performance on Commodity Multi-Core Processors

System Requirements Table of contents

Outline. Outline. Why virtualization? Why not virtualize? Today s data center. Cloud computing. Virtual resource pool

X86-64 Architecture Guide

High-speed image processing algorithms using MMX hardware

Virtualization Technologies

PARALLELS SERVER 4 BARE METAL README

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

Assembly Language: Function Calls" Jennifer Rexford!

StACC: St Andrews Cloud Computing Co laboratory. A Performance Comparison of Clouds. Amazon EC2 and Ubuntu Enterprise Cloud

Using VMware Player. VMware Player. What Is VMware Player?

Operating System Overview. Otto J. Anshus

Computer Organization and Architecture

PARALLELS SERVER BARE METAL 5.0 README

CS61: Systems Programing and Machine Organization

Understanding a Simple Operating System

picojava TM : A Hardware Implementation of the Java Virtual Machine

Full and Para Virtualization

CS 152 Computer Architecture and Engineering. Lecture 22: Virtual Machines

Fall Lecture 1. Operating Systems: Configuration & Use CIS345. Introduction to Operating Systems. Mostafa Z. Ali. mzali@just.edu.

Parallels Desktop 4 for Windows and Linux Read Me

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

x64 Cheat Sheet Fall 2015

Example of Standard API

sys socketcall: Network systems calls on Linux

Data Types in the Kernel

System Requirements G E N E R A L S Y S T E M R E C O M M E N D A T I O N S

Autodesk Revit 2016 Product Line System Requirements and Recommendations

An Implementation Of Multiprocessor Linux

Active Fabric Manager (AFM) Plug-in for VMware vcenter Virtual Distributed Switch (VDS) CLI Guide

Virtualizing a Virtual Machine

Laboratory Report. An Appendix to SELinux & grsecurity: A Side-by-Side Comparison of Mandatory Access Control & Access Control List Implementations

How To Write A Test Drive For Kaspersky Anti Virus 6.0 For Windows Server (For Windows)

DNA Data and Program Representation. Alexandre David

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

Table of Contents. Introduction...9. Installation Program Tour The Program Components...10 Main Program Features...11

CS:APP Chapter 4 Computer Architecture Instruction Set Architecture. CS:APP2e

inforouter V8.0 Server & Client Requirements

UEFI on Dell BizClient Platforms

Instruction Set Architecture (ISA)

KIP AutoCAD Installation and User Guide. KIP AutoCAD Installation and User Guide

Dell Fabric Manager Installation Guide 1.0.0

Language Processing Systems

THE VELOX STACK Patrick Marlier (UniNE)

Parallels Virtual Automation 6.1

Embedded Software development Process and Tools: Lesson-4 Linking and Locating Software

Gigabit Ethernet Packet Capture. User s Guide

Backup Exec 2014 Software Compatibility List (SCL)

Virtualization for Cloud Computing

System Calls and Standard I/O

The Microsoft Windows Hypervisor High Level Architecture

DCPS STUDENT OPTION HOME USE PROGRAM SIGN UP INSTRUCTIONS

Linux Kernel Architecture

Autodesk Inventor on the Macintosh

Windows XP Professional x64 Edition for HP Workstations - FAQ

Computer Architecture. Secure communication and encryption.

Parallels Desktop for Mac

Oracle Database Scalability in VMware ESX VMware ESX 3.5

MPLAB TM C30 Managed PSV Pointers. Beta support included with MPLAB C30 V3.00

DELL. Virtual Desktop Infrastructure Study END-TO-END COMPUTING. Dell Enterprise Solutions Engineering

Novell Open Workgroup Suite

Virtualization and the U2 Databases

User Manual Version p BETA III December 23rd, 2015

About Parallels Desktop 10 for Mac

Microsoft Dynamics CRM 2011 Guide to features and requirements

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

CSC 2405: Computer Systems II

TODAY, FEW PROGRAMMERS USE ASSEMBLY LANGUAGE. Higher-level languages such

Upgrade to Microsoft Windows Server 2012 R2 on Dell PowerEdge Servers Abstract

SAN Conceptual and Design Basics

ELEC 377. Operating Systems. Week 1 Class 3

Autodesk 3ds Max 2010 Boot Camp FAQ

Computer Architectures

Jorix kernel: real-time scheduling

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Parallels Virtuozzo Containers 4.7 for Linux Readme

Transcription:

Myth and facts about 64-bit Linux Andreas Herrmann André Przywara AMD Operating System Research Center March 2nd, 2008

Myths... You don't need 64-bit software with less than 3 GB RAM. There are less drivers for 64-bit OS. You will need all new software, all 64-bit.... 64-bit software being twice as fast... 640K ought to be enough for everybody ;-) 2 03/02/2008

... and facts: The agenda 64-bit x86 hardware Availability Architecture extensions Software support ABI extensions and GCC compiler Linux kernel 32-bit compatibility Hardware support Linux compat layer 32 => 64-bit porting issues Benchmarks and performance considerations 3 03/02/2008

64-bit Hardware 4 03/02/2008

64-bit x86 Hardware Availability Every AMD Opteron processor Every AMD Athlon 64 processor Every AMD Turion 64 processor Every AMD Phenom processor Newer AMD Sempron processors (since about 2005) Every Intel Core2 processor Newer Intel Pentium 4 processors Newer Intel Xeon processors Some Intel Celeron processors => almost every nowadays sold x86 PC processor 5 03/02/2008

Architecture extension: Operation modes Long mode introduced Two sub-modes: Compatibility mode: similar to protected mode 64-bit mode: full 64-bit capability Sub-modes are noted in one bit of CS descriptor Allows to run 32-bit binaries within a 64-bit environment but requires explicit gateway to 64-bit code parts Legacy mode maintains 100% compatibility for 32-bit OSes Long mode Legacy mode Modes: Operating System: Applications: 6 03/02/2008 64-bit Compat Protected Virtual 8086 Real 64-bit (32-bit) 32-bit 16 64-bit 32-bit 32-bit 16-bit

Architecture extensions: Address space Pointers (in registers) are always 64-bit 48 bits are used for virtual addressing Virtual address translated to physical address with paging Paging compatible to PAE mode (but extending to 4 levels) Restricts physical address to 52bit Current CPUs have a limited address bus anyway: AMD Athlon 64: 40bits AMD Phenom : 48bits CR3 sign ext. 64-bit pointer 48-bit virtual address Intel Core2 : 36bits Intel Xeon : 40bits Extended virtual address space helps much (memory mapping) 52-bit physical address 40-bit to DRAM 7 03/02/2008

Architecture extensions: Register set General purpose registers extended to 64-bits Replacing the e with an r : eax -> rax, esp -> rsp,... 32-bit parts still available, simply use the e -name Number of registers doubled: r8 r15 introduced reason for performance improvements of many applications Lower 32-bit can be accessed separately SSE2 is the new natural floating point unit Number of SSE registers doubled: xmm8-xmm15 SSE registers stay at 128bits (2*64-bit, 4*32-bit, 16*8bit) eax ebx ecx edx esi edi ebp rax rbx rcx rdx rsi rdi rbp rsp esp r8 r9 r10 r11 r12 r13 r14 r15 xmm0-xmm7 xmm8-xmm15 8 03/02/2008

Architecture extensions: Instruction set In long mode some obsolete x86 features have been removed: Segmentation: base and limit are not checked Hardware task switching Call gates New addressing mode: RIP relative Default operand size and immediates stay at 32-bits Special opcodes for loading 64-bit values (movabs) REX-Prefix byte for overriding default operand size (replacing one-byte inc and dec opcodes) 9 03/02/2008

Software support for AMD64 10 03/02/2008

gcc Compiler support for AMD64 most work done by SuSE engineers ABI describes conventions for binaries, compiler complies to Data types: int=32-bit, long int=64-bit, pointer=64-bit Parameter are passed in registers (6 integer, 8 floating point) Stack frame layout is simplified (-fomit-frame-pointers) Takes advantage of extended number of registers Uses 32-bit operands when possible to save space Contains cross-compiler functionality for i386 (-m32) 11 03/02/2008

Linux x86-64 architecture Officially introduced in Linux 2.6 as arch/x86-64 Merged with i386 in 2.6.24 (now: x86) No legacy code for older processors Introduces compat layer for i386 binaries 12 03/02/2008

32-bit compatibility 13 03/02/2008

32-bit compat: Hardware support Compatibility mode allows to run unmodified 32-bit binaries in a 64-bit environment (no 16-bit binaries!) Almost equal to 32-bit protected mode (including segmentation!) Syscalls switch between 32 and 64-bit Addresses get zero-extended to 64-bit Applications can use the lower 4 GB of virtual address space Which can be mapped to any physical address range! 14 03/02/2008

Linux compat layer Linux kernel maintains compat layer Allows user-space programs to be 32-bit Kernel cares about bitness Invokes appropriate /lib{32,64}/ld.so Booting a 32-bit installation with a 64-bit kernel works! Allows for several 32-bit apps to take more than 4GB Syscalls get translated to match size, pointers, structure layout and numbering 15 03/02/2008

Linux compat in real life Switching between 32-bit and 64-bit applications is automatic 32-bit apps look for libraries in a different directory Actual algorithm depends on distribution SuSE, RedHat: 32-bit: /lib, 64-bit: /lib64 Debian, Ubuntu: 32-bit: /lib32, 64-bit: /lib64 (symlinked to /lib) 32-bit apps need separate libraries, often called lib32* This applies to all libraries used (dependency chain!) Results in growing size of lib[32] directory linux32 tool to switch uname output for easier configuration (uses Linux personality feature) 16 03/02/2008

Porting overview Many programs just need to be recompiled......but some may create problems Programmer visible changes sizeof(long)==8 sizeof(void*)==8 Look for Usage of type long Assignment of pointers to ints and vice versa Dumping of data structures to disk / network printf/scanf format specifiers Care about all warnings! 17 03/02/2008

Porting issues: practical advices Don't assign pointers to ints and vice versa! (size mismatch) use [u]intptr_t instead Don't dump structures to disk or over network (padding!) use packed structs or better: write every single member use explicit types: [u]int32_t, [u]int64_t, [u]int16_t don't dump system structures like struct stat or struct tm! Use printf format specifiers from <inttypes.h> or casts int64_t foo; printf ( Number: % PRId64 \n, foo); off_t ofs; printf ( Pos: %lli\n, (long long) ofs); 18 03/02/2008

Benchmarks 19 03/02/2008

Benchmarks Real world benchmarks: povray, oggenc, mencoder kernel compilation Comparing apples and oranges compiling mozilla-firefox compiling gtk+ Microbenchmarks: Syscall performance Function call and 64-bit arithmetics 20 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Benchmark setup Same hardware for all tests Dual-core K8, 2x1024 kbyte L2 cache, 1024 MByte RAM 2 Gentoo Linux installations (32-bit, 64-bit): Identical USE flags, same packages installed 64-bit: CFLAGS="-march=k8 -O2 -pipe" 32-bit: CFLAGS="-march=k8 -O2 -pipe -fomit-frame-pointer" 3 host architectures as test environments: 64-bit installation 32-bit installation 32-bit compat: 64-bit kernel with 32-bit installation and linux32 Cross-compilers created using crossdev - stable - target... 21 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Benchmarks: povray, oggenc, mencoder 100,0 90,0 101,8 100,0 100,0 100,1 100,0 100,6 93,0 96,5 80,0 75,7 time (relative to 32-bit) 70,0 60,0 50,0 40,0 30,0 20,0 Host architecture 32-bit 32-bit compat 64-bit 10,0 0,0 POV-Ray oggenc mencoder Time for a benchmark run, less is better 22 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Benchmarks: apples and oranges 120,0 114,4 110,0 100,0 100,0 101,5 100,0 107,2 time (relative to 32-bit) 90,0 80,0 70,0 60,0 50,0 40,0 80,2 Host architecture 32-bit 32-bit compat 64-bit 30,0 20,0 10,0 0,0 emerge firefox emerge gtk+ Time for a benchmark run, less is better 23 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Benchmarks: Kernel compile (ARCH=i386) time (percentage to 32-bit, i686-gcc) 110 100 90 80 70 60 50 40 30 20 100 101 108 107 108 105 Host architecture 32-bit 32-bit compat 64-bit 10 0 CROSS_COMPILE=i686-pc-linux-gnu- CROSS_COMPILE=x86_64-pc-linux-gnu- Time for a (cross) kernel compile, less is better 24 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Benchmarks: Kernel compile (ARCH=x86_64) 110 100 100 101 98 time (percentage to 32-bit) 90 80 70 60 50 40 30 20 10 Host architecture 32-bit 32-bit compat 64-bit 0 CROSS_COMPILE=x86_64-pc-linux-gnu- Time for a (cross) kernel compile, less is better 25 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Microbenchmarks: Syscalls on compat 240 220 200 Time (relative to x86-64) 180 160 140 120 100 80 60 40 20 x86-64 i686 on x86-64 0 getpid() fstat(/dev/zero) read(/dev/zero,1) syscall used Time for calling 10 million syscalls in a loop, less is better 26 03/02/2008

Microbenchmarks: Arithmetics 275 250 Time (relative to each int i686) 225 200 175 150 125 100 75 50 25 int i686 long i686 long long i686 int amd64 long amd64 long long amd64 0 -O0 -O1 -O2 Compiler optimization switch Doing additions of several numbers in a function called in a loop 27 03/02/2008

Example: 64-bit arithmetics and function calls uint64_t do_add (uint64_t a, uint64_t b) { return a+b; } gcc -S -m32 -O2 add.c gcc -S -m64 -O2 add.c do_add: pushl movl %ebp %esp, %ebp do_add: leaq (%rdi,%rsi), %rax ret movl addl movl adcl 8(%ebp), %eax 16(%ebp), %eax 12(%ebp), %edx 20(%ebp), %edx leave ret instruction size: 17 bytes 28 03/02/2008 instruction size: 5 bytes

Performance aspects Pro x86 64-bit Linux Extended register set Parameter passing in registers Native 64-bit arithmetics Enhanced virtual address space Contra x86 64-bit Linux Larger memory footprint (larger binaries, 64-bit operands) Cache utilization 29 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Conclusions 30 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Myths revisited You don't need 64-bit software with less than 3 GB RAM. Performance advantages even on lesser equipped machines. There are less drivers for 64-bit OS. Mostly irrelevant for Linux (hail Open Source). You will need all new software, all 64-bit. 32-bit compat mode performs very well and is transparent.... 64-bit software being twice as fast... Only in very rare cases. (Lots of software is optimized for 32-bit.) 640K ought to be enough for everybody ;-) 2 GB or 4GB barrier is just around the corner. 31 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Conclusion Use a 64-bit system and use 32-bit apps in compat mode if necessary. 32 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

References Comparing 32-bit vs. 64-bit performance Phoronix: "AMD Phenom 32-bit vs. 64-bit Performance" (http://www.phoronix.com/scan.php?page=article&item=998&num=1) Zdnet: Vista 32-bit vs. 64-bit & RTM vs. SP1 (http://blogs.zdnet.com/hardware/?p=1367) Geekpatrol: 32-bit vs 64-bit Performance Under Mac OS X (http://www.geekpatrol.ca/2006/09/32-bit-vs-64-bit-performance/) Jan Hubicka: Porting GCC to the AMD64 architecture (http://www.ucw.cz/~hubicka/papers/amd64/index.html) System V Application Binary Interface for AMD64 (http://www.x86-64.org/documentation/abi.pdf) Which safe compile flags to use for your Gentoo installation: (http://gentoo-wiki.com/safe_cflags) 33 02/27/2008 64-bit Linux - Andreas Herrmann, André Przywara, OSRC

Trademark Attribution AMD, the AMD Arrow logo, AMD Opteron, AMD Athlon, AMD Sempron, AMD Turion and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners. 2008 Advanced Micro Devices, Inc. All rights reserved. 34 03/02/2008

Porting: Usage of long Explicit usage of long is rare ANSI-C 90: sizeof(long)==sizeof(void*) This is still mostly true, but C99 drops this assurance! Better replace long by int or by intptr_t Think about hidden longs (typedefs like size_t, off_t)! Avoid unnecessary broadening to 64-bit Windows has complete different understanding of this! Advice: Track usage of long in program! Better replace it! 35 03/02/2008

Porting: Assignment of pointers to ints Many programs use void* for opaque types Passing int types into those variables Compiler warns: incompatible size! Use [u]intptr_t (C99 type) #include <stdint.h> 36 03/02/2008

Porting: Dumping to disk Writing longs and pointers to disk or network breaks protocol Protocol assumes 32-bit size, but variable is 64-bit! Hardcoded size (4) works with write(little endian!), but is ugly Hardcoded size may work on reading, upper 32-bit undefined Portable size (sizeof(long)) will break it Replace types in structures with explicit types [u]int32_t, [u]int64_t, [u]int16_t, [u]int8_t Then both hardcoded size and sizeof() work Alignment may change! Use attribute (( packed )) on gcc Better avoid dumping or mapping of structs or variables at all 37 03/02/2008

Porting: printf/scanf issues Source of many warnings: typedef long int64_t; // typedef long long int64_t for 32-bit typedef int64_t off_t; off_t filepos; filepos=lseek (fd, SEEK_CUR, 0); printf ( currently at %lli bytes\n, filepos); Warning on 64-bit: long long int specifier, long int argument! Solutions: use %z for size_t, %p for pointers cast arguments to long long and use %lli Specify explicit types with macros (#include <stdint.h>) printf ( % PRIi64 \n, myint64); 38 03/02/2008