How To Understand The Design Of A Microprocessor
|
|
|
- Pierce McKinney
- 5 years ago
- Views:
Transcription
1 Computer Architecture R. Poss 1
2 What is computer architecture? 2
3 Your ideas and expectations What is part of computer architecture, what is not? Who are computer architects, what is their job? What is the role of computer architecture in science? Why do you need to know about computer architecture? 3
4 Vocabulary List words/concepts that you want explained during this course: About the hardware/software interface About hardware components About choices in design About how to be a better programmer etc. - anything you think is relevant 4
5 An engineering domain layers of composition and complexity: from parts to whole refined matter electronics logic circuits components platforms systems petro-chemicals packaging Backplanes Processors metals metal oxides GaA/Si/SiGe/SiC crystals semiconductors magnetic substrates CMOS NMOS ELECTRONIC ENGINEERING You are here The hidden partner activity: compilers, operating systems Links Functional Units Storage Memories Caches Networks COMPUTER ARCHITECTURE Algorithms Frameworks Operating software SOFTWARE ENGINEERING Computing platforms Software programs Computational clusters Embedded systems Personal computers Game consoles SYSTEMS ARCHITECTURE 5
6 Current state of affairs Power vs chip area: before: power free, transistors expensive now: power expensive, transistors cheap Storage vs computation: before: computation slow, storage fast now: storage slow, computation fast Computation vs storage cost: before: small storage, ok to compute more to save space (eg compression) now: large storage, expensive to compute more 6
7 Trends - the free lunch is over This is the story of uniprocessor performance This is the power wall This is the memory wall + seq. performance wall 7
8 Current state of affairs (cont) A dramatic change in processor chips: Memory wall: processors much faster than memories Power wall: can t power all transistors lest the chip will fry Sequential performance wall: more transistors don t help sequential performance any more This course will suggest how we got here, why these problems are happening and what we can do about it 8
9 Example questions You are in charge of selecting a processor chip for a new web server. You have a choice between two chips, a 1-core running at 2GHz and a 2-core running at 1 GHz. They use the same core microarchitecture, have nearly the same price. How do you choose? Why? The web server will support encrypted SSL connections. You can choose a 4-core processor, 4MB of cache on chip with a cryptographic accelerator with 4 channels. Or you can choose a 4-core processor with 8MB of cache on chip but no accelerator. How do you choose? 9
10 Course & organization 10
11 Aims of this course The aims of this course are: to introduce the notion of hardware/software interface and variations in ISA design to give a thorough understanding of modern microprocessor design and related issues to introduce parallelism in computer architecture to introduce simulators & architecture models 11
12 Bibliography The main course text is: Computer architecture - a quantitative approach, Hennessy & Patterson, 4th Edition, ISBN Other useful texts are: Processor Architecture, Silc, Robic and Ungerer, Springer, ISBN D. Sima, T. Fountain and P. Kacsuk, Advanced Computer Architecture a Design space approach (Addison-Wesley) Your own search-fu -- use Google Scholar! 12
13 Assessment Assessment will be by coursework assignments and exam The following weighting will be used Homework 20% Exam 30% Lab assignments 50% Assignment labs start on Sep 3rd, supervised by M. van Drunen, R. Ooms and F. Treurniet Assignment assessment will be based on demonstration to the lab supervisors (60%) and a brief report (40%) to me on the observations of your results, deadline in details 13
14 Assessment Lab1 Lab2 Lab3 Exam Homework Lab1 10% Lab2 20% Lab3 20% Homework 20% Exam 30% 14
15 Communication & contact Please use the mailing list Prefer group discussions to one-on-one interactions There are no stupid questions, don t be ashamed However, try to find if your question has already been answered first 15
16 Course overview The following topics will be covered in a bottom-up approach to the subject 1. Hardware/software interface, ISAs 2. Processors & implicit concurrency Pipelined processors. superscalar microprocessors 3. Memory, caches, interconnects & topology 4. Explicit concurrency & parallelism in systems Vector processors, VLIW, multi-cores, hardware multi-threading 5. Co-processors and accelerators 16
17 Getting started 17
18 Contribution of Comp. Arch. Quantitative principles of design Take advantage of parallelism Principle of locality Focus on the common case Amdahl s laws Careful, quantitative comparison: define, quantify, summarize Anticipating and exploiting advances in technology Well-defined interfaces, carefully implemented and thoroughly defined 18
19 Parallelism Three main strategies: Increase bandwidth and throughput by duplicating storage and data paths Use pipelining, ie assembly line Perform operations out of order, including simultaneously Fundamental limits: pipeline hazards time and data dependencies = mandatory order 19
20 Locality Principle: individual programs access a relatively small portion of memory in a small amount of time Two different types: Temporal locality: if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse) Spatial locality: if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access) Caches are a fundamental mechanism to take advantage of locality 20
21 Memory hierarchy and latency Registers - on-chip SRAM L1 cache - on-chip SRAM L2 cache - on-chip SRAM off-chip SRAM L3 cache - off-chip SRAM Main memory - DRAM Distributed memory 1/cycle time 100MHz clocks < GHz clocks < Size 21
22 Focus on the common case In making a design trade-off, favor the frequent case over the infrequent case E.g., Instruction fetch and decode unit used more frequently than multiplier, so optimize it 1st E.g., If database server has 50 disks / processor, storage dependability dominates system dependability, so optimize it 1st Frequent case is often simpler and can be done faster than the infrequent case What is frequent case and how much performance improved by making case faster => Amdahl s Law 22
23 Amdahl s law on speedup Consider a computation P which contains two parts A and B in sequence A can be enhanced (eg more parallelism, more performance); B cannot T(P) = T(A) + T(B) ( T = time to complete ) Imagine we can accelerate A infinitely so that T(A) becomes 0 Intuitively: overall speedup is limited by T(B) If the complexity ratio between A and B is P[A/B] (proportion), and A can be accelerated by a factor SA (speedup), Amdalh s law says: Soverall = 1 / ( (1 - P[A/B]) + (P[A/B] / SA) ) 23
24 Amdahl s law example An algorithm contains a sequential section and a parallel section The parallel section contains 20% of the computation steps (P=0.2) The parallel section can be accelerated by a factor N by using N processors/cores Maximum speedup with N cores = 1 / ( ( 1-0.2) + (0.2 / N) ) With N = 100, speedup = 1.24X (100 cores, yet only 24% perf increase!) This is the fundamental limit to parallelism: to maximize performance gains, need to first increase the proportion of the parallel section. 24
25 Amdahl s law on design A balanced system design should provision 1 bit per second of external bandwidth for each potential instruction per second Too little external bandwidth: I/O bound Too little instructions/second: compute-bound Desktop computers are traditionally I/O bound Mainframes are the other way around Multi-cores require huge amount of bandwidth to stay balanced 25
26 Lab assignments 26
27 Lab assignments 1st series: getting to know the hardware/software interface, make your own MIPS code 2nd series: implement your own MIPS-like ISA in a simulator/emulator 3rd series: implement your own cache protocol 4th series: implement your own branch predictor Total: 50% of final grade; First series: 10% of final grade; 2nd: 15%, 3rd: 15%, 4th: 10% 27
28 How do programs run? General computer model: processor + memory + interconnect + I/O devices Software is just bits, so is data How does software translate into behavior? ie. communication, computation and control? Your take here 28
29 Computers and interpreters A computer processes inputs and produces output Both inputs and outputs are just bits of data A universal computer also reads what to do (program) as data - it interprets the program code step by step as instructions Software = data = program code + input NB: The behavior of software comes from the machine that interprets it 29
30 Multiple layers of interpreters Java bytecode and real hardware: Java VM is an interpreter for Java bytecode Hardware processor is an interpreter for Java VM program code Python bytecode and real hardware: CPython is an interpreter for Python bytecode Hardware processor is an interpreter for CPython System simulators/emulators and real hardware MGSim is an interpreter for Alpha/SPARC/MIPS program code Hardware processor is an interpreter for MGSim 30
31 System initialization What happens when you switch the computer on? Define/explain the relationships between: Reset signal Initial program counter Boot ROM Boot code Disks I/O interface CPU Operating system Start-up storage ROM RAM 31
32 What does this code do? sum: L3: L2:.ent sum mov $31,$0 ble $16,L2 mov $31,$1 ldl $2,0($17) addl $2,$0,$0 addl $1,1,$1 lda $17,4($17) cmpeq $1,$16,$2 beq $2,L3 ret $31,($26),1.end sum $31 is a special register zero Alpha assembly uses left-to-right operands - except for ldx and br/jsr/ret Function arguments passed in $16-$21 ble = branch if lower or equal lda = load address, a form of add $26 is also called ra for return address 32
33 What does this code do? sum: L3: L2:.ent sum mov $rz,$0 ble $a0,l2 mov $rz,$1 ldl $2,0($a1) addl $2,$0,$0 addl $1,1,$1 lda $a1,4($a1) cmpeq $1,$a0,$2 beq $2,L3 ret $rz,($ra),1.end sum $a0 is an alias for the concrete register name $16 The assembler translates the former to the latter automatically ret A, (B) means place the current PC in A, then jump to the address in B 33
34 How did we get this code? C source for sum.c: int sum(int n, int x[]) { int s = 0; for (int i = 0; i < n; ++i) s += x[i]; return s; } Compile: alpha-cc -S -o sum.s sum.c Assemble: alpha-as -o sum.o sum.c Link: alpha-ld -o demo sum.o... 34
35 Let s run this You probably don t have an Alpha processor at hand First, let s try to compile/assemble/link/run natively eg. using the x86 instruction set But this course is about RISC - the example uses the Alpha instruction set so let s use an emulator instead! Emulator = program running on processor A that interprets program code made for another processor B Simulator = program that mimics the behavior of another system 35
36 Why simulators? Why not native? This course will talk about the processor(s) in your desktops/ laptop machines But all x86 processors are really RISC under the hood More useful to study RISC to understand the main problems Also: for your lab assignments you will modify an architecture and study its parameters Easier with a simulator than real hardware! 36
37 Intro to MGSim Developed at the University of Amsterdam Purpose is to simulate multi-cores and do architecture research Simulates: cores, memory, interconnect, I/O devices i.e. it is a full-system simulator It emulates the Alpha and SPARC instruction sets you will add support for MIPS in your lab assignments 37
38 Intro to MGSim See the manual page mgsimdoc(7) System diagram for minisim.ini: bootrom cpu0 memory command to start: mgsim -c minisim.ini yourprogram.bin Add -i for an interactive prompt 38
39 Summary Seen today: What is computer architecture and why it is important Some general principles of architecture Intro to the hardware/software interface, machine instructions and system initialization How to compile/assemble/link/run a program in a simulated environment 39
Introducción. Diseño de sistemas digitales.1
Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1
İSTANBUL AYDIN UNIVERSITY
İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER
EE361: Digital Computer Organization Course Syllabus
EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)
Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.
Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how
Processor Architectures
ECPE 170 Jeff Shafer University of the Pacific Processor Architectures 2 Schedule Exam 3 Tuesday, December 6 th Caches Virtual Memory Input / Output OperaKng Systems Compilers & Assemblers Processor Architecture
Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek
Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture
Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan
Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software
Architectures and Platforms
Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation
SOC architecture and design
SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external
A Lab Course on Computer Architecture
A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,
what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?
Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the
This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo
Computer Systems Structure Main Memory Organization
Computer Systems Structure Main Memory Organization Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Storage/Memory
Performance evaluation
Performance evaluation Arquitecturas Avanzadas de Computadores - 2547021 Departamento de Ingeniería Electrónica y de Telecomunicaciones Facultad de Ingeniería 2015-1 Bibliography and evaluation Bibliography
In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller
In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency
Instruction Set Design
Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,
picojava TM : A Hardware Implementation of the Java Virtual Machine
picojava TM : A Hardware Implementation of the Java Virtual Machine Marc Tremblay and Michael O Connor Sun Microelectronics Slide 1 The Java picojava Synergy Java s origins lie in improving the consumer
Quiz for Chapter 1 Computer Abstractions and Technology 3.10
Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,
CHAPTER 7: The CPU and Memory
CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey
A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:
Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2
Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of
Enabling Technologies for Distributed Computing
Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies
Computer Organization and Components
Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides
Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit
Unit A451: Computer systems and programming Section 2: Computing Hardware 1/5: Central Processing Unit Section Objectives Candidates should be able to: (a) State the purpose of the CPU (b) Understand the
ADVANCED COMPUTER ARCHITECTURE
ADVANCED COMPUTER ARCHITECTURE Marco Ferretti Tel. Ufficio: 0382 985365 E-mail: [email protected] Web: www.unipv.it/mferretti, eecs.unipv.it 1 Course syllabus and motivations This course covers the
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit
Enabling Technologies for Distributed and Cloud Computing
Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Multi-core CPUs and Multithreading
Parallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
LSN 2 Computer Processors
LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2
COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING
COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING 2013/2014 1 st Semester Sample Exam January 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc.
Introduction to RISC Processor. ni logic Pvt. Ltd., Pune
Introduction to RISC Processor ni logic Pvt. Ltd., Pune AGENDA What is RISC & its History What is meant by RISC Architecture of MIPS-R4000 Processor Difference Between RISC and CISC Pros and Cons of RISC
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University [email protected].
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University [email protected] Review Computers in mid 50 s Hardware was expensive
COMPUTER HARDWARE. Input- Output and Communication Memory Systems
COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)
Computer Organization
Computer Organization and Architecture Designing for Performance Ninth Edition William Stallings International Edition contributions by R. Mohan National Institute of Technology, Tiruchirappalli PEARSON
Introduction to Microprocessors
Introduction to Microprocessors Yuri Baida [email protected] [email protected] October 2, 2010 Moscow Institute of Physics and Technology Agenda Background and History What is a microprocessor?
Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?
Lecture 3: Evaluating Computer Architectures Announcements - Reminder: Homework 1 due Thursday 2/2 Last Time technology back ground Computer elements Circuits and timing Virtuous cycle of the past and
on an system with an infinite number of processors. Calculate the speedup of
1. Amdahl s law Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30 Speedup2 = 20 Speedup3 = 10 Only one enhancement is usable at a time. a) If enhancements
OC By Arsene Fansi T. POLIMI 2008 1
IBM POWER 6 MICROPROCESSOR OC By Arsene Fansi T. POLIMI 2008 1 WHAT S IBM POWER 6 MICROPOCESSOR The IBM POWER6 microprocessor powers the new IBM i-series* and p-series* systems. It s based on IBM POWER5
The Central Processing Unit:
The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Objectives Identify the components of the central processing unit and how they work together and interact with memory Describe how
Memory Hierarchy. Arquitectura de Computadoras. Centro de Investigación n y de Estudios Avanzados del IPN. [email protected]. MemoryHierarchy- 1
Hierarchy Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN [email protected] Hierarchy- 1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor
Design Cycle for Microprocessors
Cycle for Microprocessors Raúl Martínez Intel Barcelona Research Center Cursos de Verano 2010 UCLM Intel Corporation, 2010 Agenda Introduction plan Architecture Microarchitecture Logic Silicon ramp Types
VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU
VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU Martin Straka Doctoral Degree Programme (1), FIT BUT E-mail: [email protected] Supervised by: Zdeněk Kotásek E-mail: [email protected]
Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27
Logistics Week 1: Wednesday, Jan 27 Because of overcrowding, we will be changing to a new room on Monday (Snee 1120). Accounts on the class cluster (crocus.csuglab.cornell.edu) will be available next week.
Unit 4: Performance & Benchmarking. Performance Metrics. This Unit. CIS 501: Computer Architecture. Performance: Latency vs.
This Unit CIS 501: Computer Architecture Unit 4: Performance & Benchmarking Metrics Latency and throughput Speedup Averaging CPU Performance Performance Pitfalls Slides'developed'by'Milo'Mar0n'&'Amir'Roth'at'the'University'of'Pennsylvania'
CSEE W4824 Computer Architecture Fall 2012
CSEE W4824 Computer Architecture Fall 2012 Lecture 2 Performance Metrics and Quantitative Principles of Computer Design Luca Carloni Department of Computer Science Columbia University in the City of New
Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng
Slide Set 8 for ENCM 369 Winter 2015 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2015 ENCM 369 W15 Section
Chapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
Computer Architecture TDTS10
why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers
1. Memory technology & Hierarchy
1. Memory technology & Hierarchy RAM types Advances in Computer Architecture Andy D. Pimentel Memory wall Memory wall = divergence between CPU and RAM speed We can increase bandwidth by introducing concurrency
Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance
What You Will Learn... Computers Are Your Future Chapter 6 Understand how computers represent data Understand the measurements used to describe data transfer rates and data storage capacity List the components
Operating Systems 4 th Class
Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science
Thread level parallelism
Thread level parallelism ILP is used in straight line code or loops Cache miss (off-chip cache and main memory) is unlikely to be hidden using ILP. Thread level parallelism is used instead. Thread: process
CISC, RISC, and DSP Microprocessors
CISC, RISC, and DSP Microprocessors Douglas L. Jones ECE 497 Spring 2000 4/6/00 CISC, RISC, and DSP D.L. Jones 1 Outline Microprocessors circa 1984 RISC vs. CISC Microprocessors circa 1999 Perspective:
VLIW Processors. VLIW Processors
1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW
Operating System Impact on SMT Architecture
Operating System Impact on SMT Architecture The work published in An Analysis of Operating System Behavior on a Simultaneous Multithreaded Architecture, Josh Redstone et al., in Proceedings of the 9th
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada [email protected] Micaela Serra
Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah
(DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de [email protected] NIOS II 1 1 What is Nios II? Altera s Second Generation
Chapter 2 Heterogeneous Multicore Architecture
Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is
Computer organization
Computer organization Computer design an application of digital logic design procedures Computer = processing unit + memory system Processing unit = control + datapath Control = finite state machine inputs
Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level
System: User s View System Components: High Level View Input Output 1 System: Motherboard Level 2 Components: Interconnection I/O MEMORY 3 4 Organization Registers ALU CU 5 6 1 Input/Output I/O MEMORY
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution
Management Challenge. Managing Hardware Assets. Central Processing Unit. What is a Computer System?
Management Challenge Managing Hardware Assets What computer processing and storage capability does our organization need to handle its information and business transactions? What arrangement of computers
7a. System-on-chip design and prototyping platforms
7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit
Week 1 out-of-class notes, discussions and sample problems
Week 1 out-of-class notes, discussions and sample problems Although we will primarily concentrate on RISC processors as found in some desktop/laptop computers, here we take a look at the varying types
Introduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,
COMPUTER SCIENCE AND ENGINEERING - Microprocessor Systems - Mitchell Aaron Thornton
MICROPROCESSOR SYSTEMS Mitchell Aaron Thornton, Department of Electrical and Computer Engineering, Mississippi State University, PO Box 9571, Mississippi State, MS, 39762-9571, United States. Keywords:
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
High Performance Processor Architecture. André Seznec IRISA/INRIA ALF project-team
High Performance Processor Architecture André Seznec IRISA/INRIA ALF project-team 1 2 Moore s «Law» Nb of transistors on a micro processor chip doubles every 18 months 1972: 2000 transistors (Intel 4004)
OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING
OBJECTIVE ANALYSIS WHITE PAPER MATCH ATCHING FLASH TO THE PROCESSOR Why Multithreading Requires Parallelized Flash T he computing community is at an important juncture: flash memory is now generally accepted
Chapter 2 Logic Gates and Introduction to Computer Architecture
Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are
Program Optimization for Multi-core Architectures
Program Optimization for Multi-core Architectures Sanjeev K Aggarwal ([email protected]) M Chaudhuri ([email protected]) R Moona ([email protected]) Department of Computer Science and Engineering, IIT Kanpur
More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
Hardware/Software Co-Design of a Java Virtual Machine
Hardware/Software Co-Design of a Java Virtual Machine Kenneth B. Kent University of Victoria Dept. of Computer Science Victoria, British Columbia, Canada [email protected] Micaela Serra University of Victoria
CSE 141L Computer Architecture Lab Fall 2003. Lecture 2
CSE 141L Computer Architecture Lab Fall 2003 Lecture 2 Pramod V. Argade CSE141L: Computer Architecture Lab Instructor: TA: Readers: Pramod V. Argade ([email protected]) Office Hour: Tue./Thu. 9:30-10:30
Memory Systems. Static Random Access Memory (SRAM) Cell
Memory Systems This chapter begins the discussion of memory systems from the implementation of a single bit. The architecture of memory chips is then constructed using arrays of bit implementations coupled
CSC 2405: Computer Systems II
CSC 2405: Computer Systems II Spring 2013 (TR 8:30-9:45 in G86) Mirela Damian http://www.csc.villanova.edu/~mdamian/csc2405/ Introductions Mirela Damian Room 167A in the Mendel Science Building [email protected]
Parallelism and Cloud Computing
Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication
A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications
1 A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications Simon McIntosh-Smith Director of Architecture 2 Multi-Threaded Array Processing Architecture
IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus
Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,
Price/performance Modern Memory Hierarchy
Lecture 21: Storage Administration Take QUIZ 15 over P&H 6.1-4, 6.8-9 before 11:59pm today Project: Cache Simulator, Due April 29, 2010 NEW OFFICE HOUR TIME: Tuesday 1-2, McKinley Last Time Exam discussion
Chapter 2: Computer-System Structures. Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General System Architecture
Chapter 2: Computer-System Structures Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General System Architecture Operating System Concepts 2.1 Computer-System Architecture
Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:
55 Topic 3 Computer Performance Contents 3.1 Introduction...................................... 56 3.2 Measuring performance............................... 56 3.2.1 Clock Speed.................................
ARM Microprocessor and ARM-Based Microcontrollers
ARM Microprocessor and ARM-Based Microcontrollers Nguatem William 24th May 2006 A Microcontroller-Based Embedded System Roadmap 1 Introduction ARM ARM Basics 2 ARM Extensions Thumb Jazelle NEON & DSP Enhancement
Administrative Issues
CSC 3210 Computer Organization and Programming Introduction and Overview Dr. Anu Bourgeois (modified by Yuan Long) Administrative Issues Required Prerequisites CSc 2010 Intro to CSc CSc 2310 Java Programming
Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin
BUS ARCHITECTURES Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin Keywords: Bus standards, PCI bus, ISA bus, Bus protocols, Serial Buses, USB, IEEE 1394
Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory
Computer Organization and Architecture Chapter 4 Cache Memory Characteristics of Memory Systems Note: Appendix 4A will not be covered in class, but the material is interesting reading and may be used in
What is a System on a Chip?
What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex
SPARC64 VIIIfx: CPU for the K computer
SPARC64 VIIIfx: CPU for the K computer Toshio Yoshida Mikio Hondo Ryuji Kan Go Sugizaki SPARC64 VIIIfx, which was developed as a processor for the K computer, uses Fujitsu Semiconductor Ltd. s 45-nm CMOS
Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1
Module 2 Embedded Processors and Memory Version 2 EE IIT, Kharagpur 1 Lesson 5 Memory-I Version 2 EE IIT, Kharagpur 2 Instructional Objectives After going through this lesson the student would Pre-Requisite
Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design
Learning Outcomes Simple CPU Operation and Buses Dr Eddie Edwards [email protected] At the end of this lecture you will Understand how a CPU might be put together Be able to name the basic components
Intel 8086 architecture
Intel 8086 architecture Today we ll take a look at Intel s 8086, which is one of the oldest and yet most prevalent processor architectures around. We ll make many comparisons between the MIPS and 8086
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
Computer Systems Design and Architecture by V. Heuring and H. Jordan
1-1 Chapter 1 - The General Purpose Machine Computer Systems Design and Architecture Vincent P. Heuring and Harry F. Jordan Department of Electrical and Computer Engineering University of Colorado - Boulder
Seeking Opportunities for Hardware Acceleration in Big Data Analytics
Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who
Binary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
Parallel Computing. Benson Muite. [email protected] http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
Parallel Computing Benson Muite [email protected] http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
18-447 Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013
18-447 Computer Architecture Lecture 3: ISA Tradeoffs Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013 Reminder: Homeworks for Next Two Weeks Homework 0 Due next Wednesday (Jan 23), right
