CS:APP Chapter 4: Computer Architecture
Instruction Set Architecture
Randal E. Bryant, Carnegie Mellon University
http://csapp.cs.cmu.edu
Adapted by J. Bétréma

Instruction Set Architecture

The ISA is the instruction set of the processor, or rather of the processor model.

  Above the ISA: Application Program, Compiler, OS
  Below the ISA: CPU Design, Circuit Design, Chip Layout

X86 Evolution: Programmer's View

Name     Date  Transistors
8086     1978  29K
  16-bit processor. Basis for IBM PC & DOS.
  Limited to 1MB address space. DOS only gives you 640K.
386      1985  275K
  Extended to 32 bits. Added flat addressing.
  Capable of running Unix.
  Linux/gcc uses no instructions introduced in later models.
Pentium  1993  3.1M

X86 Evolution: Programmer's View (continued)

Name         Date  Transistors
Pentium III  1999  8.2M
  Added streaming SIMD instructions for operating on 128-bit vectors
  of 1, 2, or 4 byte integer or floating point data.
  Our fish machines.
Pentium 4    2001  42M
  Added 8-byte formats and 144 new instructions for streaming SIMD mode.

X86 Evolution: Clones

Advanced Micro Devices (AMD)
  Historically
    AMD has followed just behind Intel.
    A little bit slower, a lot cheaper.
  Recently
    Recruited top circuit designers from Digital Equipment Corp.
    Exploited the fact that Intel was distracted by IA64.
    Now a close competitor to Intel.
    Developing its own extension to 64 bits.

New Species: IA64

Name       Date  Transistors
Itanium    2001  10M
  Extends to IA64, a 64-bit architecture.
  Radically new instruction set designed for high performance.
  Will be able to run existing IA32 programs (on-board x86 engine).
  Joint project with Hewlett-Packard.
Itanium 2  2002  221M
  Big performance boost.

Y86 Processor State

  Program registers: %eax %ecx %edx %ebx %esi %edi %esp %ebp
  Condition codes:   OF ZF SF
  PC
  Memory

Program Registers
  Same 8 as with IA32. Each 32 bits.
Program Counter
  Indicates address of instruction.

Condition Codes

Single-bit flags set by arithmetic or logical instructions:
  OF: Overflow
  ZF: Zero
  SF: Negative

CF (carry) is missing, so we work on signed integers.
Correspondence with the NZVC notation:
  N = SF   Z = ZF   V = OF

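As a rough illustration of how the three flags could be derived for a 32-bit addition, here is a minimal Python sketch. The function name `add_with_flags` is hypothetical, and the overflow rule used (operands of the same sign, result of the opposite sign) is the usual two's-complement rule rather than something stated on the slide.

```python
MASK = 0xFFFFFFFF  # Y86 registers are 32 bits wide


def add_with_flags(a, b):
    """Return (result, ZF, SF, OF) for a 32-bit two's-complement addition."""
    a &= MASK
    b &= MASK
    result = (a + b) & MASK
    zf = int(result == 0)              # ZF: result is zero
    sf = result >> 31                  # SF: sign bit of the result
    sa, sb = a >> 31, b >> 31
    of = int(sa == sb and sa != sf)    # OF: same-sign operands, opposite-sign result
    return result, zf, sf, of


print(add_with_flags(0x7FFFFFFF, 1))   # largest positive + 1 overflows
print(add_with_flags(5, -5))           # result is zero
```

With N = SF, Z = ZF and V = OF, the same tuple reads directly in NZVC notation (without C).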
Memory

Byte-addressable storage array.
Words stored in little-endian byte order.

Machine Words

Machine has a word size
  Nominal size of integer-valued data, including addresses.
  Most current machines are 32 bits (4 bytes).
    Limits addresses to 4GB.
    Becoming too small for memory-intensive applications.
  High-end systems are 64 bits (8 bytes).
    Potentially address about 1.8 × 10^19 bytes.
  Machines support multiple data formats.
    Fractions or multiples of word size.
    Always an integral number of bytes.

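The two address-space figures follow from the word size: an n-bit address can name 2^n distinct bytes. A small sketch of the arithmetic (the helper name `address_space` is an assumption made for this example):

```python
def address_space(bits):
    """Number of distinct byte addresses available with a given word size."""
    return 2 ** bits


for bits in (32, 64):
    n = address_space(bits)
    print(f"{bits}-bit addresses: {n} bytes, about {n:.1e}")
```

For 32 bits this gives 4 × 2^30 bytes, the 4GB limit quoted above.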
Word-Oriented Memory Organization

Addresses specify byte locations
  The address of a word is the address of its first byte.
  Addresses of successive words differ by 4 (32-bit) or 8 (64-bit).

Bytes          32-bit words   64-bit words
0000 to 0003   Addr = 0000    Addr = 0000
0004 to 0007   Addr = 0004
0008 to 000b   Addr = 0008    Addr = 0008
000c to 000f   Addr = 000c

Byte Ordering Example

Big Endian: least significant byte has highest address.
Little Endian: least significant byte has lowest address.

Example
  Variable n has 4-byte representation 0x01234567.
  Address given by &n is 0x100.

Address        0x100  0x101  0x102  0x103
Big Endian     01     23     45     67
Little Endian  67     45     23     01

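The two layouts in the example can be reproduced with Python's built-in `int.to_bytes`. This is only a sketch: the helper name `layout` is made up for the illustration, and the bytes are listed from the lowest address (0x100) to the highest (0x103).

```python
def layout(value, byteorder):
    """Bytes of a 4-byte word, listed from lowest to highest address."""
    return " ".join(f"{b:02x}" for b in value.to_bytes(4, byteorder))


n = 0x01234567
print("big endian:   ", layout(n, "big"))     # 01 23 45 67
print("little endian:", layout(n, "little"))  # 67 45 23 01
```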
Y86 Instructions

Format
  1 to 6 bytes of information read from memory.
    Can determine instruction length from first byte.
    Not as many instruction types, and simpler encoding than with IA32.
  Each accesses and modifies some part(s) of the program state.
  The PC is incremented by the length of the instruction, so that it points to the next instruction.

Encoding Registers

Each register has a 4-bit ID:

  %eax 0    %esi 6
  %ecx 1    %edi 7
  %edx 2    %esp 4
  %ebx 3    %ebp 5

Same encoding as in IA32 (IA32 uses only 3 bits to encode a register).
Register ID 8 indicates no register.
  Will use this in our hardware design in multiple places.

Instruction Example

Addition instruction
  Generic form:            addl ra, rb
  Encoded representation:  6 0 ra rb

  Add value in register ra to that in register rb.
    Store result in register rb.
    Note that Y86 only allows addition to be applied to register data.
  Set condition codes based on result.
  e.g., addl %eax,%esi   Encoding: 60 06
  Two-byte encoding
    First byte indicates instruction type.
    Second byte gives source and destination registers.

Arithmetic and Logical Operations

Instruction            Generic form   Encoding
Add                    addl ra, rb    6 0 ra rb
Subtract (ra from rb)  subl ra, rb    6 1 ra rb
And                    andl ra, rb    6 2 ra rb
Exclusive-Or           xorl ra, rb    6 3 ra rb

In each encoding the first nibble (6) is the instruction code and the second is the function code.
Refer to generically as OPl.
Encodings differ only by function code.
Set condition codes as side effect.
Missing (among others): inc (increment), dec, sh (shift).

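A possible way to see the two-byte OPl encoding at work is to build it from the register IDs and function codes listed in the slides. The function name `encode_opl` and the dictionaries are illustrative only, not part of any Y86 tool.

```python
# Register IDs and OPl function codes as given in the slides.
REG = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
       "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}
FN = {"addl": 0, "subl": 1, "andl": 2, "xorl": 3}


def encode_opl(op, ra, rb):
    """Two bytes: (instruction code 6, function code), then (ra, rb)."""
    return bytes([(0x6 << 4) | FN[op], (REG[ra] << 4) | REG[rb]])


print(encode_opl("addl", "%eax", "%esi").hex(" "))  # 60 06
```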
Arithmetic and Logical Operations (2)

In RISC (Reduced Instruction Set Computer) architectures, operations work on 3 registers:
  two source registers for the operands,
  one destination register for the result.

Caution: here (x86 or Y86) the second register also serves as the destination, so the second operand is overwritten.

Move Operations

Generic form       Encoding       Direction
rrmovl ra, rb      2 0 ra rb      Register --> Register
irmovl V, rb       3 0 8 rb V     Immediate --> Register
rmmovl ra, D(rb)   4 0 ra rb D    Register --> Memory
mrmovl D(rb), ra   5 0 ra rb D    Memory --> Register

Like the IA32 movl instruction.
Simpler format for memory addresses.
Give different names to keep them distinct.

Move Instruction Examples

IA32                     Y86                        Encoding
movl $0xabcd, %edx       irmovl $0xabcd, %edx       30 82 cd ab 00 00
movl %esp, %ebx          rrmovl %esp, %ebx          20 43
movl -12(%ebp),%ecx      mrmovl -12(%ebp),%ecx      50 15 f4 ff ff ff
movl %esi,0x41c(%esp)    rmmovl %esi,0x41c(%esp)    40 64 1c 04 00 00

IA32 moves with no Y86 counterpart listed:
  movl $0xabcd, (%eax)
  movl %eax, 12(%eax,%edx)
  movl (%ebp,%eax,4),%ecx

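The encodings in the table can be rebuilt from the formats on the Move Operations slide: an instruction byte, a register byte, and (except for rrmovl) a 4-byte little-endian constant. The sketch below uses hypothetical helper names; argument order follows the assembly operand order.

```python
REG = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
       "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}
NO_REG = 8  # register ID meaning "no register"


def word(v):
    """32-bit constant in little-endian order (negative values wrap)."""
    return (v & 0xFFFFFFFF).to_bytes(4, "little")


def rrmovl(ra, rb):
    return bytes([0x20, (REG[ra] << 4) | REG[rb]])


def irmovl(v, rb):
    return bytes([0x30, (NO_REG << 4) | REG[rb]]) + word(v)


def rmmovl(ra, d, rb):          # rmmovl ra, D(rb)
    return bytes([0x40, (REG[ra] << 4) | REG[rb]]) + word(d)


def mrmovl(d, rb, ra):          # mrmovl D(rb), ra
    return bytes([0x50, (REG[ra] << 4) | REG[rb]]) + word(d)


print(irmovl(0xABCD, "%edx").hex(" "))        # 30 82 cd ab 00 00
print(mrmovl(-12, "%ebp", "%ecx").hex(" "))   # 50 15 f4 ff ff ff
```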
Load

  mrmovl 12(%ebp),%ecx      general case (displacement)
  mrmovl (%ebp),%ecx        register ebp serves as a pointer
  mrmovl 0x1b8,%ecx         absolute read address

Store

  rmmovl %esi,0x41c(%esp)   general case (displacement)
  rmmovl %esi,(%esp)        register esp serves as a pointer
  rmmovl %esi,0x41c         absolute write address

These are the expensive instructions!

Displacement Addressing

  mrmovl 12(%ebp),%ecx      general case (displacement)
    Loads the word at address ebp + 12 into register ecx.
    Load = read from memory.

  rmmovl %esi,0x41c(%esp)   general case (displacement)
    Stores register esi at address esp + 0x41c.
    Store = write to memory.

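To make the address computation concrete, here is a small sketch that models memory as a byte array with little-endian words. The register values are chosen arbitrarily for the illustration so that esp + 0x41c and ebp + 12 name the same address; `load_word` and `store_word` are hypothetical helpers.

```python
MASK = 0xFFFFFFFF


def load_word(mem, addr):
    """Read the 4-byte little-endian word at addr."""
    return int.from_bytes(mem[addr:addr + 4], "little")


def store_word(mem, addr, value):
    """Write value as a 4-byte little-endian word at addr."""
    mem[addr:addr + 4] = (value & MASK).to_bytes(4, "little")


mem = bytearray(0x1000)  # byte-addressable memory, all zeros
regs = {"%esp": 0x200, "%ebp": 0x610, "%esi": 0x01234567, "%ecx": 0}

# rmmovl %esi,0x41c(%esp): store esi at address esp + 0x41c
store_word(mem, regs["%esp"] + 0x41C, regs["%esi"])

# mrmovl 12(%ebp),%ecx: load the word at address ebp + 12 into ecx
regs["%ecx"] = load_word(mem, regs["%ebp"] + 12)

print(hex(regs["%ecx"]))
```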
Jump Instructions

Instruction                  Generic form   Encoding
Jump Unconditionally         jmp Dest       7 0 Dest
Jump When Less or Equal      jle Dest       7 1 Dest
Jump When Less               jl Dest        7 2 Dest
Jump When Equal              je Dest        7 3 Dest
Jump When Not Equal          jne Dest       7 4 Dest
Jump When Greater or Equal   jge Dest       7 5 Dest
Jump When Greater            jg Dest        7 6 Dest

Refer to generically as jxx.
Encodings differ only by function code.
Based on values of condition codes.
Same as IA32 counterparts.
Encode full destination address
  Unlike PC-relative addressing seen in IA32.

Conditional Jumps

Jump When Less: the jump is taken if the result of the last operation (addl, subl, andl, xorl) is negative, taking overflow into account:
  ( SF = 1 and OF = 0 ) or ( SF = 0 and OF = 1 ), in short SF ≠ OF
Jump When Equal: the jump is taken if the result of the last operation is zero, that is ZF = 1
Jump When Less or Equal: SF ≠ OF or ZF = 1

Negations:
  Jump When Greater or Equal: SF = OF
  Jump When Not Equal: ZF = 0
  Jump When Greater: SF = OF and ZF = 0

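The conditions above translate almost literally into a lookup on the three flags. A minimal sketch, where the function name `jump_taken` is an assumption made for the example:

```python
def jump_taken(jxx, zf, sf, of):
    """Decide whether a Y86 jump is taken, given the condition codes."""
    less = sf != of      # "less": SF differs from OF
    zero = zf == 1
    conditions = {
        "jmp": True,
        "jle": less or zero,
        "jl": less,
        "je": zero,
        "jne": not zero,
        "jge": not less,
        "jg": not less and not zero,
    }
    return conditions[jxx]


# Negative result without overflow: "less" holds.
print(jump_taken("jl", zf=0, sf=1, of=0))
# Positive-looking result produced by an overflow: still "less".
print(jump_taken("jl", zf=0, sf=0, of=1))
```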
Jumps (conclusion)

A jump is performed by modifying the PC.
The instructions rrmovl, irmovl, rmmovl, mrmovl never modify the condition codes.
Logical operations andl, xorl: ZF and SF are set according to the result of the operation, and OF = 0 (cleared).

  andl %eax, %eax    # does not modify the register
  je ...             # jump taken if eax = 0
  jl ...             # jump taken if eax < 0

  xorl %eax, %ebx
  je ...             # jump taken if eax = ebx (but this test overwrites ebx)

Miscellaneous Instructions

nop    0 0
  Don't do anything.
halt   1 0
  Stop executing instructions.
  IA32 has a comparable instruction, but it can't be executed in user mode.
  We will use it to stop the simulator.

Object Code

Code for sum

  0x401040 <sum>:
    0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3

  Total of 13 bytes.
  Each instruction 1, 2, or 3 bytes.
  Starts at address 0x401040.

In real life:

Assembler
  Translates .s into .o.
  Binary encoding of each instruction.
  Nearly-complete image of executable code.
  Missing linkages between code in different files.

Linker
  Resolves references between files.
  Combines with static run-time libraries
    E.g., code for malloc, printf.
  Some libraries are dynamically linked
    Linking occurs when program begins execution.

Turning C into Object Code

Code in files p1.c p2.c
Compile with command: gcc -O p1.c p2.c -o p
  Use optimizations (-O).
  Put resulting binary in file p.

  text     C program (p1.c p2.c)
             Compiler (gcc -S)
  text     Asm program (p1.s p2.s)
             Assembler (gcc or as)
  binary   Object program (p1.o p2.o), Static libraries (.a)
             Linker (gcc or ld)
  binary   Executable program (p)

Disassembling Object Code

Disassembled

  00401040 <_sum>:
     0:  55          push   %ebp
     1:  89 e5       mov    %esp,%ebp
     3:  8b 45 0c    mov    0xc(%ebp),%eax
     6:  03 45 08    add    0x8(%ebp),%eax
     9:  89 ec       mov    %ebp,%esp
     b:  5d          pop    %ebp
     c:  c3          ret
     d:  8d 76 00    lea    0x0(%esi),%esi

Disassembler
  objdump -d p
  Useful tool for examining object code.
  Analyzes bit pattern of series of instructions.
  Produces approximate rendition of assembly code.
  Can be run on either a.out (complete executable) or .o file.

Pseudo Object Code: the Y86 Simulator

Source file (.ys), input to the yas assembler:

  Loop:
    mrmovl (%ecx),%esi   # get *Start
    addl %esi,%eax       # add to sum
    irmovl $4,%ebx
    addl %ebx,%ecx       # Start++
    irmovl $-1,%ebx
    addl %ebx,%edx       # Count--
    jne Loop             # Stop when 0

Resulting .yo file:

  0x057: 506100000000  Loop: mrmovl (%ecx),%esi
  0x05d: 6060                addl %esi,%eax
  0x05f: 308304000000        irmovl $4,%ebx
  0x065: 6031                addl %ebx,%ecx
  0x067: 3083ffffffff        irmovl $-1,%ebx
  0x06d: 6032                addl %ebx,%edx
  0x06f: 7457000000          jne Loop

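To show what a simulator has to do with this loop, the sketch below mirrors each instruction with one Python statement. It is not the Y86 simulator itself: the function name `run_loop` is hypothetical, and it assumes that %eax starts at 0, that %ecx holds Start and %edx holds Count on entry, and that Count is at least 1 (the excerpt does not show the code before the label).

```python
MASK = 0xFFFFFFFF


def run_loop(mem, start, count):
    """Mimic the Loop body: sum `count` little-endian words from `start`."""
    eax, ecx, edx = 0, start, count   # assumed initial register values
    while True:
        esi = int.from_bytes(mem[ecx:ecx + 4], "little")  # mrmovl (%ecx),%esi
        eax = (eax + esi) & MASK                          # addl %esi,%eax
        ebx = 4                                           # irmovl $4,%ebx
        ecx = (ecx + ebx) & MASK                          # addl %ebx,%ecx
        ebx = MASK                                        # irmovl $-1,%ebx
        edx = (edx + ebx) & MASK                          # addl %ebx,%edx
        zf = edx == 0                                     # ZF set by the addl
        if zf:                                            # jne Loop falls through
            break
    return eax


# Four example words stored from address 0x100 (values chosen for the demo).
mem = bytearray(0x200)
for i, w in enumerate([0xD, 0xC0, 0xB00, 0xA000]):
    mem[0x100 + 4 * i:0x104 + 4 * i] = w.to_bytes(4, "little")

print(hex(run_loop(mem, 0x100, 4)))
```

Note that the jump tests the flags left by the last addl, which is why the Count-- step has to come immediately before jne.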
.yo file

  0x057: 506100000000  Loop: mrmovl (%ecx),%esi
  0x05d: 6060                addl %esi,%eax
  0x05f: 308304000000        irmovl $4,%ebx
  0x065: 6031                addl %ebx,%ecx
  0x067: 3083ffffffff        irmovl $-1,%ebx
  0x06d: 6032                addl %ebx,%edx
  0x06f: 7457000000          jne Loop

1. A label is an address computed by the assembler (true of any assembler).
2. The file is "executed" by a simulator.

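How an assembler might compute label addresses can be sketched as a first pass that adds up instruction lengths. This is an illustration, not the yas implementation: the names are hypothetical, the length table covers only the instructions shown in these slides, and the starting address 0x057 is taken from the listing.

```python
# Instruction lengths in bytes, derived from the encodings in the slides.
LENGTH = {"nop": 1, "halt": 1, "rrmovl": 2, "irmovl": 6, "rmmovl": 6, "mrmovl": 6,
          "addl": 2, "subl": 2, "andl": 2, "xorl": 2,
          "jmp": 5, "jle": 5, "jl": 5, "je": 5, "jne": 5, "jge": 5, "jg": 5}


def assign_addresses(lines, origin):
    """First pass: give each instruction an address and record the labels."""
    addr, labels, placed = origin, {}, []
    for line in lines:
        line = line.split("#")[0].strip()      # drop comments
        if ":" in line:                        # "Label:" prefix
            label, line = line.split(":", 1)
            labels[label.strip()] = addr
            line = line.strip()
        if line:
            placed.append((addr, line))
            addr += LENGTH[line.split()[0]]
    return labels, placed


source = [
    "Loop: mrmovl (%ecx),%esi  # get *Start",
    "addl %esi,%eax            # add to sum",
    "irmovl $4,%ebx",
    "addl %ebx,%ecx            # Start++",
    "irmovl $-1,%ebx",
    "addl %ebx,%edx            # Count--",
    "jne Loop                  # Stop when 0",
]
labels, placed = assign_addresses(source, 0x057)
for addr, text in placed:
    print(f"0x{addr:03x}: {text}")

# Second pass for the jump: code 7, function 4, then the full label address.
jne = bytes([0x74]) + labels["Loop"].to_bytes(4, "little")
print(jne.hex())  # 7457000000
```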
CISC Instruction Sets

Complex Instruction Set Computer
Dominant style through mid-80s

Stack-oriented instruction set
  Use stack to pass arguments, save program counter.
  Explicit push and pop instructions.
Arithmetic instructions can access memory
  addl %eax, 12(%ebx,%ecx,4)
    Requires memory read and write.
    Complex address calculation.
Condition codes
  Set as side effect of arithmetic and logical instructions.
Philosophy
  Add instructions to perform typical programming tasks.

RISC Instruction Sets

Reduced Instruction Set Computer
Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)

Fewer, simpler instructions
  Might take more to get given task done.
  Can execute them with small and fast hardware.
Register-oriented instruction set
  Many more (typically 32) registers.
  Use for arguments, return pointer, temporaries.
Only load and store instructions can access memory
  Similar to Y86 mrmovl and rmmovl.
No condition codes
  Test instructions return 0/1 in register.

MIPS Registers (figure not reproduced)

MIPS Instruction Examples

R-R format:        Op  Ra  Rb  Rd  00000  Fn
  addu $3,$2,$1      # Register add: $3 = $2+$1

R-I format:        Op  Ra  Rb  Immediate
  addu $3,$2, 3145   # Immediate add: $3 = $2+3145
  sll $3,$2,2        # Shift left: $3 = $2 << 2

Branch format:     Op  Ra  Rb  Offset
  beq $3,$2,dest     # Branch when $3 = $2

Load/Store format: Op  Ra  Rb  Offset
  lw $3,16($2)       # Load Word: $3 = M[$2+16]
  sw $3,16($2)       # Store Word: M[$2+16] = $3

CISC vs. RISC

Original Debate
  Strong opinions!
  CISC proponents: easy for compiler, fewer code bytes.
  RISC proponents: better for optimizing compilers, can make run fast with simple chip design.

Current Status
  For desktop processors, choice of ISA not a technical issue
    With enough hardware, can make anything run fast.
    Code compatibility more important.
  For embedded processors, RISC makes sense
    Smaller, cheaper, less power.

Summary

Y86 Instruction Set Architecture
  Similar state and instructions as IA32.
  Simpler encodings.
  Somewhere between CISC and RISC.

How Important is ISA Design?
  Less now than before
    With enough hardware, can make almost anything go fast.
  Intel is moving away from IA32
    Does not allow enough parallel execution.
    Introduced IA64:
      64-bit word sizes (overcome address space limitations)
      Radically different style of instruction set with explicit parallelism
      Requires sophisticated compilers