Example of a high-end processor architecture: Intel Haswell
- Bertram Lambert

Instruction Set Architecture

- Intel64 Instruction Set Architecture
  - can run in both 64-bit and 32-bit mode
- Binary compatible with the 32-bit IA-32 ISA
  - existing applications can be executed directly, without recompilation
- Binary compatible with the 16-bit 8086 processor (from 1978)
  - with some limitations
- Very large CISC-like instruction set
  - instruction encoding is very complex and irregular
- The execution mechanism (the microarchitecture) is very RISC-like
  - load-store architecture
  - machine instructions are translated into micro-operations (µops) immediately after they are fetched

General-purpose registers

- In 64-bit mode, the processor has 16 64-bit general-purpose integer registers
  - RAX, RBX, RCX, RDX, RBP, RSI, RDI, RSP, R8-R15
  - in 32-bit mode it has 8 32-bit registers: EAX, EBX, ECX, EDX, EBP, ESI, EDI, ESP
- Instruction pointer, RIP (EIP in 32-bit mode)
  - points to the next instruction to be executed
- Stack pointer, RSP
  - points to the top of the stack
- Base pointer, RBP
  - points to data on the stack
- Status flags, RFLAGS (EFLAGS in 32-bit mode)
  - consists of status bits describing the current status of the processor
  - Carry, Parity, Auxiliary Carry, Zero, Sign, Overflow, ...

General-purpose registers (cont.)

- Can also refer to 32-, 16- and 8-bit parts of the registers
  - 32-bit registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, EFLAGS, EIP
  - 16-bit registers: AX, BX, CX, DX, SI, DI, BP, SP, FLAGS, IP
  - 8-bit registers: AH/AL, BH/BL, CH/CL, DH/DL
- Example: EAX is the low 32 bits of RAX, AX is the low 16 bits of EAX, and AH and AL are the high and low bytes of AX

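The register aliasing described on the slide can be illustrated with plain bit masks. The following is only a sketch: the helper name `sub_registers` and the sample value are made up for illustration, and it models reading the sub-registers, not writing them.

```python
# Sketch: how the narrower register names alias parts of one 64-bit register.
# `sub_registers` is a hypothetical helper, not part of any real API.

def sub_registers(rax: int) -> dict:
    """Return the values visible through the narrower names of RAX."""
    rax &= 0xFFFF_FFFF_FFFF_FFFF        # keep only 64 bits
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,       # low 32 bits of RAX
        "AX": rax & 0xFFFF,             # low 16 bits of EAX
        "AH": (rax >> 8) & 0xFF,        # high byte of AX (bits 15..8)
        "AL": rax & 0xFF,               # low byte of AX (bits 7..0)
    }

parts = sub_registers(0x1122334455667788)
for name, value in parts.items():
    print(f"{name:>3} = {value:#x}")
```

With the sample value, EAX reads as 0x55667788, AX as 0x7788, AH as 0x77 and AL as 0x88.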
Vector and floating-point registers

- 16 256-bit registers for scalar floating-point and vector operations
  - called YMM0-YMM15
- The AVX2 extension is the latest vector extension, introduced in the Haswell microarchitecture
  - earlier extensions: MMX, SSE, SSE2, AVX
  - AVX-512 will further extend the vector registers to 512 bits
- There are also 8 80-bit floating-point registers in the x87 floating-point unit
  - modern compilers generally do not use the x87 FPU for floating-point instructions
  - instead they use scalar floating-point operations on the vector registers

Floating-point operations with the AVX unit

- The AVX vector unit is used for floating-point operations
  - can operate on both scalar and vector data
  - scalar operations are used for normal (non-vectorized) floating-point operations
- Scalar operation: x = x + y
  - arithmetic instructions on one single floating-point value
  - only the lowest element is computed: the result holds X3, X2, X1, X0+Y0
- Vector operation: X = X + Y
  - arithmetic instructions on a short vector of 2, 4 or 8 floating-point values
  - all elements are computed in parallel: the result holds X3+Y3, X2+Y2, X1+Y1, X0+Y0

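The difference between the scalar and the vector operation can be mimicked on ordinary Python lists. This is a toy model of the slide's diagram, with element 0 standing for X0/Y0. The function names are illustrative, and nothing here uses real AVX instructions.

```python
# Toy model of scalar vs. packed addition on a 4-element register.
# Index 0 corresponds to the lowest element (X0, Y0) in the slide's diagram.

def add_scalar(x, y):
    """Scalar add: only element 0 is computed, the rest of x passes through."""
    return [x[0] + y[0]] + list(x[1:])

def add_packed(x, y):
    """Packed (vector) add: every element pair is added."""
    return [a + b for a, b in zip(x, y)]

X = [1.0, 2.0, 3.0, 4.0]       # X0..X3
Y = [10.0, 20.0, 30.0, 40.0]   # Y0..Y3

scalar_result = add_scalar(X, Y)   # [11.0, 2.0, 3.0, 4.0]
packed_result = add_packed(X, Y)   # [11.0, 22.0, 33.0, 44.0]
print(scalar_result, packed_result)
```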
YMM registers

- 16 256-bit YMM registers, YMM0-YMM15
  - can hold scalar or vector values
- Independent of the general-purpose registers
- Can only be used for operations on data, not addresses
- Scalar floating-point instructions have a suffix that describes the size of the operand
  - d = double-precision, s = single-precision
- Examples:
  - mulss: multiply scalar single-precision floating-point value
  - mulsd: multiply scalar double-precision floating-point value
  - movss: move scalar single-precision floating-point value
- Names of vector instructions contain the letter p (stands for packed)
  - Example: mulpd: multiply packed double-precision floating-point values

Instruction set

- Very large CISC-like instruction set
- Instructions can roughly be divided into the following groups
  - data transfer instructions (MOV, CMOV): copy data between registers, or between registers and memory
  - data conversion (CBW, CWD): convert data between different formats
  - arithmetic (ADD, SUB, MUL, IMUL, DIV, IDIV)
  - rotate and shift (ROL, ROR, SAL, SAR): cyclic and non-cyclic shifts
  - logical instructions (AND, OR, XOR, NOT): bitwise logical operations
  - compare and test (CMP, TEST): compare values and set bits in FLAGS
  - control transfer instructions (JMP, JZ, JNZ, CALL, RET): branches, some of them conditional on status bits in FLAGS
  - miscellaneous instructions (NOP, CPUID, LEA)

Instruction format

- Instructions are encoded into binary opcodes of length between 1 and 15 bytes
- Fields, in order: Prefixes, Opcode, ModR/M, SIB, Displacement, Immediate
  - up to four prefix bytes: prefixes modify an instruction's default address or operand size or segment, or invoke some special function of the operation
  - 1-2 opcode bytes
  - 1 byte optional ModR/M (Mode-Register-Memory) and SIB (Scale-Index-Base): describe the registers and addressing mode used
  - 1, 2 or 4 bytes displacement
  - 1, 2 or 4 (or 8) bytes immediate
- Commonly used instructions and instructions with fewer operands have a shorter encoding

Intel Haswell microarchitecture

- Introduced in 2013
  - designed to be scalable up to large numbers of cores
  - highly flexible design that can support different market segments, including low-power mobile devices
- Multicore design with 2-way hyper-threading
  - available in versions with 2 to 18 cores
  - 1.4 billion transistors, 22 nm process technology
  - clock frequency up to 3.6 GHz
  - advanced power management
- Can execute at most 8 µops each clock cycle
  - however, typical pipeline throughput is 4 µops/cycle
- Improved memory bandwidth
  - integrated on-chip memory controller
  - 48-bit virtual addresses, 40-bit physical addresses
  - L1 and L2 caches private to each core
  - unified L3 cache shared by all cores

Pipeline organization

- Pipeline length (number of stages) depends on the instruction
- 4 instruction decoders
- µop cache holds decoded µops
- Reorder buffer, 192 µops
- Large register file
- 8 ports through which µops can be issued to the functional units

Instruction fetch and decode

- Instruction fetch unit
  - can fetch 16 bytes of code per clock cycle from the L1 instruction cache
  - fetched x86 instructions are placed in 2 instruction queues of length 20 instructions, one for each thread
- 4 instruction decoders
  - one for complex instructions (generating 1-4 fused µops / instruction)
  - three for simple instructions (generating 1 µop / instruction)
- Instructions generating more than 4 µops are decoded from microcode
- Decoded instructions are placed in a queue of size 56 µops
  - shared by both threads
- µop cache of size 1.5 K µops stores already decoded instructions
  - acts like an additional L0 instruction cache, but holds decoded µops
  - small loops can be executed without repeated decoding

Macro-op and micro-op fusion

- The instruction decoding uses two techniques that improve instruction execution
- Macro-op fusion
  - commonly used sequences of two assembly-language instructions are decoded and combined into one single µop
  - can be executed and retired as one single µop
  - Example: CMP and Jcc (compare and branch conditionally)
- Micro-op fusion
  - two closely related µops are encoded into one micro-operation
  - Example: ADD [MEM], RAX    # add the value in RAX to memory location MEM
  - without micro-op fusion this would generate three µops (tmp is an internal temporary register):
      load  tmp, [MEM]
      add   tmp, RAX
      store [MEM], tmp
  - with micro-op fusion the load and add are combined into one single µop

Branch prediction

- The branch prediction mechanism has been improved in Haswell, but Intel does not publish any details about it
- Observations indicate that there are two branch prediction methods
  - one that predicts branches for code executed from the µop cache: fast, but uses a rather small history buffer
  - one for other branches, executed from the instruction cache: based on a larger branch target buffer, but slower
  - the second-level predictor can handle much more complex patterns than the first-level predictor
- Branch misprediction penalty is … clock cycles
- The return address stack is of size …

Register renaming

- The renaming mechanism maps architectural registers onto the physical register file
  - architectural register names are replaced by registers in the internal register file
  - load/store operations are also renamed
  - eliminates name dependences
  - some register-to-register moves can be executed by the renamer without using any functional unit
- Reorder buffer is of size 192 µops
- The Branch Order Buffer is used to resolve branch mispredictions
  - contains information about the last consistent architectural state
- Scheduler with 60 entries for all types of µops
  - stores decoded and renamed operations that are waiting to be issued
  - similar to reservation stations
  - when a µop is ready it is issued by the scheduler through a dispatch port to a functional unit

Execution ports

- Superscalar out-of-order instruction execution
  - can issue at most 8 µops in one clock cycle
  - 4 µops can be retired per clock cycle
- The scheduler issues µops to the functional units through 8 execution ports
- Many operations with a long latency are pipelined to improve instruction throughput
- Efficient support for vector execution (AVX2)
  - most vector instructions have a throughput of one instruction per clock cycle
- Supports a new vectorized FMA instruction
  - Fused Multiply-Add: x = x + y*z

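What "fused" means for x = x + y*z can be made concrete: a fused multiply-add rounds once, after the exact product and sum, whereas a separate multiply and add round twice. The sketch below models the single rounding with exact rational arithmetic. It is an illustration in Python, not how the hardware computes, and the names `fma` and `vector_fma` are chosen here for the example.

```python
from fractions import Fraction

def fma(x: float, y: float, z: float) -> float:
    """Model of a fused multiply-add: x + y*z computed exactly, rounded once."""
    return float(Fraction(x) + Fraction(y) * Fraction(z))

def vector_fma(X, Y, Z):
    """Element-wise FMA over short vectors, as a vectorized instruction would do."""
    return [fma(x, y, z) for x, y, z in zip(X, Y, Z)]

# Inputs chosen so that the low bits of y*z are lost when the product is rounded.
y = z = 1.0 + 2.0**-30
x = -(1.0 + 2.0**-29)

unfused = x + y * z      # product rounded first, then the sum: gives 0.0
fused = fma(x, y, z)     # single rounding keeps the 2**-60 term

print(unfused, fused)
print(vector_fma([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]))  # [16.0, 26.0]
```

Besides the extra accuracy, one FMA counts as two floating-point operations (a multiply and an add), which is why it raises the peak FLOP rate.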
Execution units

- 8 execution ports
  - some of the execution units are duplicated, so two µops of the same kind can be executed per clock cycle
- Two new ports added since the previous microarchitecture (Sandy Bridge)
  - an integer ALU port and a memory port
- The 8 oldest ready µops can be issued each clock cycle
- Both ports 0 and 1 can do a vector FMA each clock cycle
  - doubles the number of floating-point operations per clock cycle, compared to Sandy Bridge
  - can do 16 double-precision FLOPs per clock cycle

Load/store units

- Load and store buffers hold µops that do memory accesses
  - issued by the scheduler to the load/store units
- 4 load/store units
  - 3 address generation units (AGUs) for both load and store operations
  - 1 unit dedicated to store operations
- 2-level Translation Lookaside Buffer (TLB)
  - L1 DTLB is only for data accesses
    - can hold addresses for 4 KB, 2 MB and 1 GB pages
    - 4-way set associative
  - L2 TLB holds 1024 addresses
    - 8-way set associative
- Improved access to unaligned data
- Advanced memory prefetch
  - recognizes address patterns and prefetches both instructions and data into L1 or L2 cache

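The page sizes handled by the TLB determine how a virtual address is split into a virtual page number (the part the TLB translates) and a page offset (passed through unchanged). The sketch below assumes the 48-bit virtual addresses mentioned earlier. The function name and the sample address are illustrative, and the reach figure assumes every L2 TLB entry maps a 4 KB page.

```python
# Sketch: splitting a 48-bit virtual address for the three page sizes.
VIRTUAL_ADDRESS_BITS = 48
PAGE_OFFSET_BITS = {"4 KB": 12, "2 MB": 21, "1 GB": 30}   # log2 of the page size

def split_virtual_address(vaddr: int, page_size: str):
    """Return (virtual page number, page offset) for the given page size."""
    bits = PAGE_OFFSET_BITS[page_size]
    vaddr &= (1 << VIRTUAL_ADDRESS_BITS) - 1
    return vaddr >> bits, vaddr & ((1 << bits) - 1)

vaddr = 0x7FFF12345678
for size in PAGE_OFFSET_BITS:
    vpn, offset = split_virtual_address(vaddr, size)
    print(f"{size}: page number {vpn:#x}, offset {offset:#x}")

# Memory covered by 1024 L2 TLB entries if each maps a 4 KB page (assumption).
l2_tlb_reach = 1024 * 4 * 1024      # 4 MB
```

Larger pages leave fewer bits to translate and let the same number of TLB entries cover far more memory.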
Cache hierarchy

- Private L1 cache for each core
  - L1 instruction cache: 32 KB, 8-way set associative
  - L1 data cache: 32 KB, 8-way set associative, writeback, latency 4 clock cycles
- Unified L2 cache for each core
  - 256 KB, 8-way set associative, writeback
  - access time 11 clock cycles
- Cache line size 64 bytes
- Up to 8 MB L3 cache, shared by all cores
  - 16-way set associative
  - access time about 30 clock cycles

Power management

- Advanced power control
  - built-in real-time sensors for temperature, current and power
  - also uses information about OS requests for each core
  - uses this information to make decisions about when a core can be powered down
  - individual cores can be shut down to save power
- Turbo Boost technology
  - automatically increases the clock frequency when the system detects that it is running below its power limit
  - based on information about the number of active cores, type of workload, current consumption, power consumption and temperature
  - dynamic on-demand overclocking

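Returning to the cache hierarchy figures: size, associativity and line size together fix the number of sets in each cache, and hence how an address is split into tag, set index and line offset. The sketch below is plain arithmetic on the numbers from the slide. The helper names are illustrative, and it treats the L3 cache as one simple set-associative array, which is a simplification.

```python
# Sketch: cache geometry implied by size, associativity and 64-byte lines.
LINE_SIZE = 64

def num_sets(size_bytes: int, ways: int, line_size: int = LINE_SIZE) -> int:
    """Number of sets = capacity / (ways * line size)."""
    return size_bytes // (ways * line_size)

caches = {
    "L1D": (32 * 1024, 8),
    "L2": (256 * 1024, 8),
    "L3": (8 * 1024 * 1024, 16),   # modelled as a single array (assumption)
}
sets = {name: num_sets(size, ways) for name, (size, ways) in caches.items()}
print(sets)   # {'L1D': 64, 'L2': 512, 'L3': 8192}

def l1d_fields(addr: int):
    """Split an address into (tag, set index, line offset) for the L1 data cache."""
    offset = addr % LINE_SIZE
    index = (addr // LINE_SIZE) % sets["L1D"]
    tag = addr // (LINE_SIZE * sets["L1D"])
    return tag, index, offset

print(l1d_fields(0x12345))   # (18, 13, 5)
```

For the L1 data cache, 64 sets of 64-byte lines mean that the set index and line offset together occupy the low 12 address bits.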
Introduction MICROPROCESSOR AND MICROCOMPUTER BASICS At present there are many types and sizes of computers available. These computers are designed and constructed based on digital and Integrated Circuit
More informationCS352H: Computer Systems Architecture
CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Data Hazards in ALU Instructions
More informationNotes on x86-64 programming
Notes on x86-64 programming This document gives a brief summary of the x86-64 architecture and instruction set. It concentrates on features likely to be useful to compiler writing. It makes no aims at
More informationComputer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.tw Review Computers in mid 50 s Hardware was expensive
More informationThe 80x86 Instruction Set
Thi d t t d ith F M k 4 0 2 The 80x86 Instruction Set Chapter Six Until now, there has been little discussion of the instructions available on the 80x86 microprocessor. This chapter rectifies this situation.
More informationCPU performance monitoring using the Time-Stamp Counter register
CPU performance monitoring using the Time-Stamp Counter register This laboratory work introduces basic information on the Time-Stamp Counter CPU register, which is used for performance monitoring. The
More informationIntroduction to RISC Processor. ni logic Pvt. Ltd., Pune
Introduction to RISC Processor ni logic Pvt. Ltd., Pune AGENDA What is RISC & its History What is meant by RISC Architecture of MIPS-R4000 Processor Difference Between RISC and CISC Pros and Cons of RISC
More informationThe Microarchitecture of the Pentium 4 Processor
The Microarchitecture of the Pentium 4 Processor Glenn Hinton, Desktop Platforms Group, Intel Corp. Dave Sager, Desktop Platforms Group, Intel Corp. Mike Upton, Desktop Platforms Group, Intel Corp. Darrell
More informationIn the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days
RISC vs CISC 66 In the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days Initially, the focus was on usability by humans. Lots of user-friendly instructions (remember
More informationMachine-Level Programming I: Basics
Machine-Level Programming I: Basics 15-213/18-213: Introduction to Computer Systems 5 th Lecture, May 25, 2016 Instructor: Brian Railing 1 Today: Machine Programming I: Basics History of Intel processors
More informationl C-Programming l A real computer language l Data Representation l Everything goes down to bits and bytes l Machine representation Language
198:211 Computer Architecture Topics: Processor Design Where are we now? C-Programming A real computer language Data Representation Everything goes down to bits and bytes Machine representation Language
More informationMachine-Level Programming II: Arithmetic & Control
Mellon Machine-Level Programming II: Arithmetic & Control 15-213 / 18-213: Introduction to Computer Systems 6 th Lecture, Jan 29, 2015 Instructors: Seth Copen Goldstein, Franz Franchetti, Greg Kesden 1
More informationIMAGE SIGNAL PROCESSING PERFORMANCE ON 2 ND GENERATION INTEL CORE MICROARCHITECTURE PRESENTATION PETER CARLSTON, EMBEDDED & COMMUNICATIONS GROUP
IMAGE SIGNAL PROCESSING PERFORMANCE ON 2 ND GENERATION INTEL CORE MICROARCHITECTURE PRESENTATION PETER CARLSTON, EMBEDDED & COMMUNICATIONS GROUP Q3 2011 325877-001 1 Legal Notices and Disclaimers INFORMATION
More informationIA-32 Intel Architecture Software Developer s Manual
IA-32 Intel Architecture Software Developer s Manual Volume 2B: Instruction Set Reference, N-Z NOTE: The IA-32 Intel Architecture Software Developer s Manual consists of four volumes: Basic Architecture,
More informationAMD PhenomII. Architecture for Multimedia System -2010. Prof. Cristina Silvano. Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923
AMD PhenomII Architecture for Multimedia System -2010 Prof. Cristina Silvano Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923 Outline Introduction Features Key architectures References AMD Phenom
More informationPerformance Analysis Guide for Intel Core i7 Processor and Intel Xeon 5500 processors By Dr David Levinthal PhD. Version 1.0
Performance Analysis Guide for Intel Core i7 Processor and Intel Xeon 5500 processors By Dr David Levinthal PhD. Version 1.0 1 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.
More informationArchitecture of Hitachi SR-8000
Architecture of Hitachi SR-8000 University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Slide 1 Most of the slides from Hitachi Slide 2 the problem modern computer are data
More information