Challenges for Worst-case Execution Time Analysis of Multi-core Architectures

Size: px
Start display at page:

Download "Challenges for Worst-case Execution Time Analysis of Multi-core Architectures"

Transcription

1 Challenges for Worst-case Execution Time Analysis of Multi-core Architectures Jan saarland university computer science Intel, Braunschweig April 29, 2013

2 The Context: Hard Real-Time Systems Safety-critical applications: Avionics, automotive, train industries, manufacturing Side airbag in car Side airbag in car Reaction in < 10 msec Crankshaft-synchronous tasks Reaction in < 45 microsec Embedded controllers must finish their tasks within given time bounds. Developers would like to know the Worst-Case Execution Time (WCET) to give a guarantee. Jan Reineke, Saarland 2

3 The Timing Analysis Problem Embedded + Software? Timing Requirements Microarchitecture Jan Reineke, Saarland 3

4 What does the execution time depend on? The input, determining which path is taken through the program. The state of the hardware platform: l Due to caches, pipelining, speculation, etc. Interference from the environment: l External interference as seen from the analyzed task on shared busses, caches, memory. Simple CPU Memory Jan Reineke, Saarland 4

5 What does the execution time depend on? The input, determining which path is taken through the program. The state of the hardware platform: l Due to caches, pipelining, speculation, etc. Interference from the environment: l External interference as seen from the analyzed task on shared busses, caches, memory. Complex CPU (out-of-order Simple execution, branch CPU prediction, etc.) L1 Cache Memory Main Memory Jan Reineke, Saarland 5

6 What does the execution time depend on? The input, determining which path is taken through the program. The state of the hardware platform: l Due to caches, pipelining, speculation, etc. Interference from the environment: l External interference as seen from the analyzed task on shared busses, caches, memory. Complex Complex CPU CPU (out-of-order Simple execution,... branch CPU prediction, etc.) Complex CPU L1 Cache L1 Cache L1 Cache L2 Memory Cache Main Main Memory Jan Reineke, Saarland 6

7 Example of Influence of Microarchitectural State x=a+b; LOAD LOAD ADD r2, _a r1, _b r3,r2,r1 PowerPC 755 Courte Jan Reineke, Saarland 7

8 Example of Influence of Corunning Tasks in Multicores Radojkovic et al. (ACM TACO, 2012) on Intel Atom and Intel Core 2 Quad: up to 14x slow-down due to interference on shared L2 cache and memory controller Jan Reineke, Saarland 8

9 Challenges 1. Modeling How to construct sound timing models? 2. Analysis How to precisely & efficiently bound the WCET? 3. Design How to design microarchitectures that enable precise & efficient WCET analysis? Jan Reineke, Saarland 9

10 The Modeling Challenge Microarchitecture? Timing Model Timing model = Formal specification of microarchitecture s timing Incorrect timing model à possibly incorrect WCET bound. Jan Reineke, Saarland 10

11 Current Process of Deriving Timing Model Microarchitecture? Timing Model Jan Reineke, Saarland 11

12 Current Process of Deriving Timing Model Microarchitecture? Timing Model Jan Reineke, Saarland 12

13 Current Process of Deriving Timing Model Microarchitecture? Timing Model Jan Reineke, Saarland 13

14 Current Process of Deriving Timing Model Microarchitecture? Timing Model Jan Reineke, Saarland 14

15 Current Process of Deriving Timing Model Microarchitecture? Timing Model à Time-consuming, and à error-prone. Jan Reineke, Saarland 15

16 Current Process of Deriving Timing Model Microarchitecture? Timing Model à Time-consuming, and à error-prone. Jan Reineke, Saarland 16

17 1. Future Process of Deriving Timing Model Microarchitecture VHDL Model Timing Model Jan Reineke, Saarland 17

18 1. Future Process of Deriving Timing Model Microarchitecture VHDL Model Timing Model Derive timing model automatically from formal specification of microarchitecture. à Less manual effort, thus less time-consuming, and à provably correct. Jan Reineke, Saarland 18

19 1. Future Process of Deriving Timing Model Microarchitecture VHDL Model Timing Model Derive timing model automatically from formal specification of microarchitecture. à Less manual effort, thus less time-consuming, and à provably correct. Jan Reineke, Saarland 19

20 1. Future Process of Deriving Timing Model Microarchitecture VHDL Model Timing Model Derive timing model automatically from formal specification of microarchitecture. à Less manual effort, thus less time-consuming, and à provably correct. Jan Reineke, Saarland 20

21 2. Future Process of Deriving Timing Model Microarchitecture Perform measurements on hardware Infer model Timing Model Jan Reineke, Saarland 21

22 2. Future Process of Deriving Timing Model Microarchitecture Perform measurements on hardware Infer model Timing Model Derive timing model automatically from measurements on the hardware using ideas from automata learning. à No manual effort, and à (under certain assumptions) provably correct. à Also useful to validate assumptions about microarch. Jan Reineke, Saarland 22

23 2. Future Process of Deriving Timing Model Microarchitecture Perform measurements on hardware Infer model Timing Model Derive timing model automatically from measurements on the hardware using ideas from automata learning. à No manual effort, and à (under certain assumptions) provably correct. à Also useful to validate assumptions about microarch. Jan Reineke, Saarland 23

24 2. Future Process of Deriving Timing Model Microarchitecture Perform measurements on hardware Infer model Timing Model Derive timing model automatically from measurements on the hardware using ideas from automata learning. à No manual effort, and à (under certain assumptions) provably correct. à Also useful to validate assumptions about microarch. Jan Reineke, Saarland 24

25 2. Future Process of Deriving Timing Model Microarchitecture Perform measurements on hardware Infer model Timing Model Derive timing model automatically from measurements on the hardware using ideas from automata learning. à No manual effort, and à (under certain assumptions) provably correct. à Also useful to validate assumptions about microarch. Jan Reineke, Saarland 25

26 Proof-of-concept: Automatic Modeling of the Cache Hierarchy Cache Model is important part of Timing Model Can be characterized by a few parameters: l ABC: associativity, block size, capacity l Replacement policy B = Block Size Tag Data Tag Data Tag Data A = Associativity Tag Tag Data Data Tag Tag Data Data... Tag Tag Data Data Tag Data Tag Data Tag Data N = Number of Cache Sets chi [Abel and Reineke, RTAS 2013] derives all of these parameters fully automatically. Jan Reineke, Saarland 26

27 Example: Intel Core 2 Duo E6750, L1 Data Cache Misses L1 Misses Size Jan Reineke, Saarland 27

28 Example: Intel Core 2 Duo E6750, L1 Data Cache Misses L1 Misses Size Capacity = 32 KB Jan Reineke, Saarland 28

29 Example: Intel Core 2 Duo E6750, L1 Data Cache Way Size = 4 KB Misses L1 Misses Size Capacity = 32 KB Jan Reineke, Saarland 29

30 Replacement Policy Approach inspired by methods to learn finite automata. Heavily specialized to problem domain. Jan Reineke, Saarland 30

31 Replacement Policy Approach inspired by methods to learn finite automata. Heavily specialized to problem domain. Discovered to our knowledge undocumented policy of the Intel Atom D525: d x a b c d e f d c a b e f x e d c a b More information: Jan Reineke, Saarland 31

32 Modeling Challenge: Future Work Extend automation to other parts of the microarchitecture: Translation lookaside buffers, branch predictors Shared caches in multicores including their coherency protocols Out-of-order pipelines? Jan Reineke, Saarland 32

33 The Analysis Challenge Microarchitecture! Timing Model? Precise & Efficient Timing Analysis Consider all possible program inputs Consider all possible initial states of the hardware WCET H (P ):= max i2inputs max ET H(P, i, h) h2states(h) Jan Reineke, Saarland 33

34 The Analysis Challenge Consider all possible program inputs Consider all possible initial states of the hardware WCET H (P ):= max i2inputs max ET H(P, i, h) h2states(h) Explicitly evaluating ET for all inputs and all hardware states is not feasible in practice: There are simply too many. è Need for abstraction and thus approximation! Jan Reineke, Saarland 34

35 The Analysis Challenge: State of the Art Component Caches, Branch Target Buffers Analysis Status Precise & efficient abstractions, for LRU [Ferdinand, 1999] Not-so-precise but efficient abstractions, for FIFO, PLRU, MRU [Grund and Reineke, ] Complex Pipelines Precise but very inefficient; little abstraction Major challenge: timing anomalies Shared resources, e.g. busses, shared caches, DRAM No realistic approaches yet Major challenge: interference between hardware threads à execution time depends on corunning tasks Jan Reineke, Saarland 35

36 Timing Anomalies Timing Anomaly = Local worst-case does not imply Global worst-case Resource 1 A D E Resource 2 C B Resource 1 A D E Resource 2 B C Scheduling Anomaly Jan Reineke, Saarland 36

37 Timing Anomalies Timing Anomaly = Local worst-case does not imply Global worst-case Branch Condition Evaluated Cache Hit A Prefetch B - Miss C Cache Miss A C Speculation Anomaly Jan Reineke, Saarland 37

38 The Design Challenge Wanted: Multi-/many-core architecture with No timing anomalies à precise & efficient analysis of individual cores Temporal isolation between cores à independent/incremental development & analysis and high performance! Jan Reineke, Saarland 38

39 Approaches to the Design Challenge At the level of individual cores: Simple in-order pipelines, with static or no branch prediction Scratchpad Memories or LRU Caches Softwarecontrolled caches Jan Reineke, Saarland 39

40 Approaches to the Design Challenge For resources shared among multiple cores: Temporal partitioning, e.g. l TDMA arbitration of buses, shared memories l Thread-interleaved pipeline in PRET Spatial partitioning, e.g. l Partition shared caches l Partition shared DRAM à Temporal isolation Jan Reineke, Saarland 40

41 Design Challenge: Predictable Pipelining from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, Jan Reineke, Saarland 41

42 Pipelining: Hazards from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, Jan Reineke, Saarland 42

43 Forwarding helps, but not all the time LD R1, 45(r2) DADD R5, R1, R7 BE R5, R3, R0 ST R5, 48(R2) Unpipelined The Dream F D E M W F D E M W F D E M W F D E M W F D E M W F D E M W F D E M W F D E M W F D E M W The Reality F D E M W Memory Hazard F D E M W Data Hazard F D E M W Branch Hazard Jan Reineke, Saarland 43

44 Solution: PTARM Thread-interleaved Pipelines [Lickly et al., CASES 2008] + T1: F D E M W F D E M W T2: F D E M W F D E M W T3: F D E M W F D E M W T4: F D E M W F D E M W T5: F D E M W F D E M W Each thread occupies only one stage of the pipeline at a time à No hazards; perfect utilization of pipeline à Simple hardware implementation (no forwarding, etc.) à Each instruction takes the same amount of time à Temporal isolation between different hardware threads Drawback: reduced single-thread performance Jan Reineke, Saarland 44

45 Design Challenge: DRAM Controller Translates sequences of memory accesses by Clients (CPUs and I/O) into legal sequences of DRAM commands l Needs to obey all timing constraints l Needs to insert refresh commands sufficiently often l Needs to translate physical memory addresses into row/column/ bank tuples CPU1... CPU1 Interconnect + Arbitration Memory Controller DRAM Module I/O Jan Reineke, Saarland 45

46 Dynamic RAM Timing Constraints DRAM Memory Controllers have to conform to different timing constraints that define minimal distances between consecutive DRAM commands. Almost all of these constraints are due to the sharing of resources at different levels of the hierarchy: addr+cmd DIMM Bit line Word line Transistor Capacitor Bank Row Address Row Decoder DRAM Array Sense Amplifiers and Row Buffer Column Decoder/ Multiplexer command chip select address Control Logic Mode Register Address Register DRAM Device Row Address Mux Refresh Counter Bank Bank Bank Bank I/O Gating I/O Registers + Data I/O data 16 data 64 data 16 data 16 data 16 data 16 x16 Device x16 Device x16 Device x16 Device x16 Device x16 Device x16 Device x16 Device chip select 0 chip select 1 Rank 0 Rank 1 Needs to insert refresh commands sufficiently often Rows within a bank share sense amplifiers Banks within a DRAM device share I/O gating and control logic Different ranks share data/address/ command busses Jan Reineke, Saarland 46

47 General-Purpose DRAM Controllers Schedule DRAM commands dynamically Timing hard to predict even for single client: l Timing of request depends on past requests: Request to same/different bank? Request to open/closed row within bank? Controller might reorder requests to minimize latency l Controllers dynamically schedule refreshes No temporal isolation. Timing depends on behavior of other clients: l They influence sequence of past requests l Arbitration may or may not provide guarantees Jan Reineke, Saarland 47

48 General-Purpose DRAM Controllers Thread 1 Thread 2 Load B1.R3.C2 Load B2.R4.C3 Store B4.R3.C5 Load B3.R3.C2 Load B3.R5.C3 Store B2.R3.C5 Arbitration Load B1.R3.C2 Load B3.R3.C2 Load B2.R4.C3 Store B4.R3.C5 Load B3.R5.C3 Store B2.R3.C5? Memory Controller Jan Reineke, Saarland 48

49 General-Purpose DRAM Controllers Thread 1 Thread 2 Load B1.R3.C2 Load B2.R4.C3 Store B4.R3.C5 Load B3.R3.C2 Load B3.R5.C3 Store B2.R3.C5 Arbitration Load B3.R3.C2 Load B1.R3.C2 Load B2.R4.C3 Load B3.R5.C3 Store B4.R3.C5 Store B2.R3.C5? Memory Controller Jan Reineke, Saarland 49

50 General-Purpose DRAM Controllers Load B1.R3.C2 Load B1.R4.C3 Load B1.R3.C5 Memory Controller RAS B1.R3 CAS B1.C2 RAS B1.R4 CAS B1.C3 RAS B1.R3 CAS B1.C5? RAS B1.R3 CAS B1.C2 CAS B1.C5 RAS B1.R4 CAS B1.C3 Jan Reineke, Saarland 50

51 PRET DRAM Controller: Three Innovations [Reineke et al., CODES+ISSS 2011] Expose internal structure of DRAM devices: l Expose individual banks within DRAM device as multiple independent resources CPU1... CPU1 Interconnect + Arbitration PRET DRAM Controller DRAM Bank DRAM Module DRAM Module DRAM Module I/O Defer refreshes to the end of transactions l Allows to hide refresh latency Perform refreshes manually : l Replace standard refresh command with multiple reads Jan Reineke, Saarland 51

52 PRET DRAM Controller: Exploiting Internal Structure of DRAM Module l Consists of 4-8 banks in 1-2 ranks Share only command and data bus, otherwise independent l Partition banks into four groups in alternating ranks l Cycle through groups in a time-triggered fashion Rank 0: Bank 0 Bank 1 Bank 2 Bank 3 Rank 1: Bank 0 Bank 1 Bank 2 Bank 3 Jan Reineke, Saarland 52

53 PRET DRAM Controller: Exploiting Internal Structure of DRAM Module l Consists of 4-8 banks in 1-2 ranks Share only command and data bus, otherwise independent l Partition banks into four groups in alternating ranks l Cycle through groups in a time-triggered fashion Rank 0: Bank 0 Bank 1 Bank 2 Bank 3 Successive accesses to same group obey timing constraints Reads/writes to different groups do not interfere Rank 1: Bank 0 Bank 1 Bank 2 Bank 3 Jan Reineke, Saarland 53

54 PRET DRAM Controller: Exploiting Internal Structure of DRAM Module l Consists of 4-8 banks in 1-2 ranks Share only command and data bus, otherwise independent l Partition banks into four groups in alternating ranks l Cycle through groups in a time-triggered fashion Rank 0: Bank 0 Bank 1 Bank 2 Bank 3 Successive accesses to same group obey timing constraints Reads/writes to different groups do not interfere Rank 1: Bank 0 Bank 1 Bank 2 Bank 3 Provides four independent and predictable resources Jan Reineke, Saarland 54

55 Conventional DRAM Controller (DRAMSim2) vs PRET DRAM Controller: Latency Evaluation latency [cycles] 3,000 2,000 1,000 0 Varying interference for fixed transfer size: Interference [# of other threads occupied] 4096B transfers, conventional controller 4096B transfers, PRET controller 1024B transfers, conventional controller 1024B transfers, PRET controller Figure 9: Latencies of conventional and PRET memory con- average latency [cycles] 3,000 2,000 1,000 0 Varying transfer size at maximal interference: Conventional controller PRET controller 0 1,000 2,000 3,000 4,000 transfer size [bytes] Figure 10: Latencies of conventional and PRET memory con- More information: Jan Reineke, Saarland 55

56 Emerging Challenge: Microarchitecture Selection & Configuration Embedded Software with? Timing Requirements Family of Microarchitectures = Platform Jan Reineke, Saarland 56

57 Emerging Challenge: Microarchitecture Selection & Configuration Embedded Software with? Timing Requirements Choices: Processor frequency Sizes and latencies of local memories Latency and bandwidth of interconnect Presence of floatingpoint unit Family of Microarchitectures = Platform Jan Reineke, Saarland 57

58 Emerging Challenge: Microarchitecture Selection & Configuration Embedded Software Timing Requirements Choices: Processor frequency Sizes and latencies of local memories Latency and bandwidth of interconnect Presence of floatingpoint unit? Select a microarchitecture that a) satisfies all timing requirements, and with b) minimizes cost/size/energy. Family of Microarchitectures = Platform Jan Reineke, Saarland 58

59 Conclusions Challenges in modeling analysis design remain. Jan Reineke, Saarland 59

60 Conclusions Challenges in modeling analysis design remain. Progress based on automation abstraction partitioning has been made. Jan Reineke, Saarland 60

61 Conclusions Challenges in modeling analysis design remain. Progress based on automation abstraction partitioning has been made. Thank you for your attention! Jan Reineke, Saarland 61

Designing Predictable Multicore Architectures for Avionics and Automotive Systems extended abstract

Designing Predictable Multicore Architectures for Avionics and Automotive Systems extended abstract Designing Predictable Multicore Architectures for Avionics and Automotive Systems extended abstract Reinhard Wilhelm, Christian Ferdinand, Christoph Cullmann, Daniel Grund, Jan Reineke, Benôit Triquet

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1. Memory technology & Hierarchy RAM types Advances in Computer Architecture Andy D. Pimentel Memory wall Memory wall = divergence between CPU and RAM speed We can increase bandwidth by introducing concurrency

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

Intel X38 Express Chipset Memory Technology and Configuration Guide

Intel X38 Express Chipset Memory Technology and Configuration Guide Intel X38 Express Chipset Memory Technology and Configuration Guide White Paper January 2008 Document Number: 318469-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

OpenSPARC T1 Processor

OpenSPARC T1 Processor OpenSPARC T1 Processor The OpenSPARC T1 processor is the first chip multiprocessor that fully implements the Sun Throughput Computing Initiative. Each of the eight SPARC processor cores has full hardware

More information

Motivation: Smartphone Market

Motivation: Smartphone Market Motivation: Smartphone Market Smartphone Systems External Display Device Display Smartphone Systems Smartphone-like system Main Camera Front-facing Camera Central Processing Unit Device Display Graphics

More information

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

More information

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Putting it all together: Intel Nehalem http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Intel Nehalem Review entire term by looking at most recent microprocessor from Intel Nehalem is code

More information

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide White Paper August 2007 Document Number: 316971-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

OC By Arsene Fansi T. POLIMI 2008 1

OC By Arsene Fansi T. POLIMI 2008 1 IBM POWER 6 MICROPROCESSOR OC By Arsene Fansi T. POLIMI 2008 1 WHAT S IBM POWER 6 MICROPOCESSOR The IBM POWER6 microprocessor powers the new IBM i-series* and p-series* systems. It s based on IBM POWER5

More information

Memory Systems. Static Random Access Memory (SRAM) Cell

Memory Systems. Static Random Access Memory (SRAM) Cell Memory Systems This chapter begins the discussion of memory systems from the implementation of a single bit. The architecture of memory chips is then constructed using arrays of bit implementations coupled

More information

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

More information

361 Computer Architecture Lecture 14: Cache Memory

361 Computer Architecture Lecture 14: Cache Memory 1 361 Computer Architecture Lecture 14 Memory cache.1 The Motivation for s Memory System Processor DRAM Motivation Large memories (DRAM) are slow Small memories (SRAM) are fast Make the average access

More information

A Dual-Layer Bus Arbiter for Mixed-Criticality Systems with Hypervisors

A Dual-Layer Bus Arbiter for Mixed-Criticality Systems with Hypervisors A Dual-Layer Bus Arbiter for Mixed-Criticality Systems with Hypervisors Bekim Cilku, Bernhard Frömel, Peter Puschner Institute of Computer Engineering Vienna University of Technology A1040 Wien, Austria

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 11 Memory Management Computer Architecture Part 11 page 1 of 44 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin

More information

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

Pentium vs. Power PC Computer Architecture and PCI Bus Interface Pentium vs. Power PC Computer Architecture and PCI Bus Interface CSE 3322 1 Pentium vs. Power PC Computer Architecture and PCI Bus Interface Nowadays, there are two major types of microprocessors in the

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

PikeOS: Multi-Core RTOS for IMA. Dr. Sergey Tverdyshev SYSGO AG 29.10.2012, Moscow

PikeOS: Multi-Core RTOS for IMA. Dr. Sergey Tverdyshev SYSGO AG 29.10.2012, Moscow PikeOS: Multi-Core RTOS for IMA Dr. Sergey Tverdyshev SYSGO AG 29.10.2012, Moscow Contents Multi Core Overview Hardware Considerations Multi Core Software Design Certification Consideratins PikeOS Multi-Core

More information

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution

More information

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

COMPUTER HARDWARE. Input- Output and Communication Memory Systems COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)

More information

VLIW Processors. VLIW Processors

VLIW Processors. VLIW Processors 1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW

More information

Energy-Efficient, High-Performance Heterogeneous Core Design

Energy-Efficient, High-Performance Heterogeneous Core Design Energy-Efficient, High-Performance Heterogeneous Core Design Raj Parihar Core Design Session, MICRO - 2012 Advanced Computer Architecture Lab, UofR, Rochester April 18, 2013 Raj Parihar Energy-Efficient,

More information

The Orca Chip... Heart of IBM s RISC System/6000 Value Servers

The Orca Chip... Heart of IBM s RISC System/6000 Value Servers The Orca Chip... Heart of IBM s RISC System/6000 Value Servers Ravi Arimilli IBM RISC System/6000 Division 1 Agenda. Server Background. Cache Heirarchy Performance Study. RS/6000 Value Server System Structure.

More information

Multithreading Lin Gao cs9244 report, 2006

Multithreading Lin Gao cs9244 report, 2006 Multithreading Lin Gao cs9244 report, 2006 2 Contents 1 Introduction 5 2 Multithreading Technology 7 2.1 Fine-grained multithreading (FGMT)............. 8 2.2 Coarse-grained multithreading (CGMT)............

More information

VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU

VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU Martin Straka Doctoral Degree Programme (1), FIT BUT E-mail: strakam@fit.vutbr.cz Supervised by: Zdeněk Kotásek E-mail: kotasek@fit.vutbr.cz

More information

TIME PREDICTABLE CPU AND DMA SHARED MEMORY ACCESS

TIME PREDICTABLE CPU AND DMA SHARED MEMORY ACCESS TIME PREDICTABLE CPU AND DMA SHARED MEMORY ACCESS Christof Pitter Institute of Computer Engineering Vienna University of Technology, Austria cpitter@mail.tuwien.ac.at Martin Schoeberl Institute of Computer

More information

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,

More information

Operating Systems. Virtual Memory

Operating Systems. Virtual Memory Operating Systems Virtual Memory Virtual Memory Topics. Memory Hierarchy. Why Virtual Memory. Virtual Memory Issues. Virtual Memory Solutions. Locality of Reference. Virtual Memory with Segmentation. Page

More information

A Lab Course on Computer Architecture

A Lab Course on Computer Architecture A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,

More information

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access?

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access? ECE337 / CS341, Fall 2005 Introduction to Computer Architecture and Organization Instructor: Victor Manuel Murray Herrera Date assigned: 09/19/05, 05:00 PM Due back: 09/30/05, 8:00 AM Homework # 2 Solutions

More information

AES Power Attack Based on Induced Cache Miss and Countermeasure

AES Power Attack Based on Induced Cache Miss and Countermeasure AES Power Attack Based on Induced Cache Miss and Countermeasure Guido Bertoni, Vittorio Zaccaria STMicroelectronics, Advanced System Technology Agrate Brianza - Milano, Italy, {guido.bertoni, vittorio.zaccaria}@st.com

More information

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy The Quest for Speed - Memory Cache Memory CSE 4, Spring 25 Computer Systems http://www.cs.washington.edu/4 If all memory accesses (IF/lw/sw) accessed main memory, programs would run 20 times slower And

More information

CHAPTER 7: The CPU and Memory

CHAPTER 7: The CPU and Memory CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides

More information

Intel 965 Express Chipset Family Memory Technology and Configuration Guide

Intel 965 Express Chipset Family Memory Technology and Configuration Guide Intel 965 Express Chipset Family Memory Technology and Configuration Guide White Paper - For the Intel 82Q965, 82Q963, 82G965 Graphics and Memory Controller Hub (GMCH) and Intel 82P965 Memory Controller

More information

CS/COE1541: Introduction to Computer Architecture. Memory hierarchy. Sangyeun Cho. Computer Science Department University of Pittsburgh

CS/COE1541: Introduction to Computer Architecture. Memory hierarchy. Sangyeun Cho. Computer Science Department University of Pittsburgh CS/COE1541: Introduction to Computer Architecture Memory hierarchy Sangyeun Cho Computer Science Department CPU clock rate Apple II MOS 6502 (1975) 1~2MHz Original IBM PC (1981) Intel 8080 4.77MHz Intel

More information

HIPEAC 2015. Segregation of Subsystems with Different Criticalities on Networked Multi-Core Chips in the DREAMS Architecture

HIPEAC 2015. Segregation of Subsystems with Different Criticalities on Networked Multi-Core Chips in the DREAMS Architecture HIPEAC 2015 Segregation of Subsystems with Different Criticalities on Networked Multi-Core Chips in the DREAMS Architecture University of Siegen Roman Obermaisser Overview Mixed-Criticality Systems Modular

More information

The Classical Architecture. Storage 1 / 36

The Classical Architecture. Storage 1 / 36 1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage

More information

A N. O N Output/Input-output connection

A N. O N Output/Input-output connection Memory Types Two basic types: ROM: Read-only memory RAM: Read-Write memory Four commonly used memories: ROM Flash, EEPROM Static RAM (SRAM) Dynamic RAM (DRAM), SDRAM, RAMBUS, DDR RAM Generic pin configuration:

More information

Switch Fabric Implementation Using Shared Memory

Switch Fabric Implementation Using Shared Memory Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo

More information

Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi

Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi University of Utah & HP Labs 1 Large Caches Cache hierarchies

More information

Exploring the Design of the Cortex-A15 Processor ARM s next generation mobile applications processor. Travis Lanier Senior Product Manager

Exploring the Design of the Cortex-A15 Processor ARM s next generation mobile applications processor. Travis Lanier Senior Product Manager Exploring the Design of the Cortex-A15 Processor ARM s next generation mobile applications processor Travis Lanier Senior Product Manager 1 Cortex-A15: Next Generation Leadership Cortex-A class multi-processor

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar

Memory ICS 233. Computer Architecture and Assembly Language Prof. Muhamed Mudawar Memory ICS 233 Computer Architecture and Assembly Language Prof. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline Random

More information

21152 PCI-to-PCI Bridge

21152 PCI-to-PCI Bridge Product Features Brief Datasheet Intel s second-generation 21152 PCI-to-PCI Bridge is fully compliant with PCI Local Bus Specification, Revision 2.1. The 21152 is pin-to-pin compatible with Intel s 21052,

More information

Computer Systems Structure Main Memory Organization

Computer Systems Structure Main Memory Organization Computer Systems Structure Main Memory Organization Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Storage/Memory

More information

Memory Basics. SRAM/DRAM Basics

Memory Basics. SRAM/DRAM Basics Memory Basics RAM: Random Access Memory historically defined as memory array with individual bit access refers to memory with both Read and Write capabilities ROM: Read Only Memory no capabilities for

More information

Computer Architecture

Computer Architecture Computer Architecture Random Access Memory Technologies 2015. április 2. Budapest Gábor Horváth associate professor BUTE Dept. Of Networked Systems and Services ghorvath@hit.bme.hu 2 Storing data Possible

More information

Byte Ordering of Multibyte Data Items

Byte Ordering of Multibyte Data Items Byte Ordering of Multibyte Data Items Most Significant Byte (MSB) Least Significant Byte (LSB) Big Endian Byte Addresses +0 +1 +2 +3 +4 +5 +6 +7 VALUE (8-byte) Least Significant Byte (LSB) Most Significant

More information

What is a System on a Chip?

What is a System on a Chip? What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex

More information

A Mixed Time-Criticality SDRAM Controller

A Mixed Time-Criticality SDRAM Controller NEST COBRA CA4 A Mixed Time-Criticality SDRAM Controller MeAOW 3-9-23 Sven Goossens, Benny Akesson, Kees Goossens Mixed Time-Criticality 2/5 Embedded multi-core systems are getting more complex: Integrating

More information

Mixed-Criticality: Integration of Different Models of Computation. University of Siegen, Roman Obermaisser

Mixed-Criticality: Integration of Different Models of Computation. University of Siegen, Roman Obermaisser Workshop on "Challenges in Mixed Criticality, Real-time, and Reliability in Networked Complex Embedded Systems" Mixed-Criticality: Integration of Different Models of Computation University of Siegen, Roman

More information

Intel Itanium Quad-Core Architecture for the Enterprise. Lambert Schaelicke Eric DeLano

Intel Itanium Quad-Core Architecture for the Enterprise. Lambert Schaelicke Eric DeLano Intel Itanium Quad-Core Architecture for the Enterprise Lambert Schaelicke Eric DeLano Agenda Introduction Intel Itanium Roadmap Intel Itanium Processor 9300 Series Overview Key Features Pipeline Overview

More information

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Slide Set 8. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng Slide Set 8 for ENCM 369 Winter 2015 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2015 ENCM 369 W15 Section

More information

Parallel Computing 37 (2011) 26 41. Contents lists available at ScienceDirect. Parallel Computing. journal homepage: www.elsevier.

Parallel Computing 37 (2011) 26 41. Contents lists available at ScienceDirect. Parallel Computing. journal homepage: www.elsevier. Parallel Computing 37 (2011) 26 41 Contents lists available at ScienceDirect Parallel Computing journal homepage: www.elsevier.com/locate/parco Architectural support for thread communications in multi-core

More information

Worst-Case Execution Time Prediction by Static Program Analysis

Worst-Case Execution Time Prediction by Static Program Analysis Worst-Case Execution Time Prediction by Static Program Analysis Reinhold Heckmann Christian Ferdinand AbsInt Angewandte Informatik GmbH Science Park 1, D-66123 Saarbrücken, Germany Phone: +49-681-38360-0

More information

Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality

Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Heechul Yun +, Gang Yao +, Rodolfo Pellizzoni *, Marco Caccamo +, Lui Sha + University of Illinois at Urbana and Champaign

More information

CPU Shielding: Investigating Real-Time Guarantees via Resource Partitioning

CPU Shielding: Investigating Real-Time Guarantees via Resource Partitioning CPU Shielding: Investigating Real-Time Guarantees via Resource Partitioning Progress Report 1 John Scott Tillman jstillma@ncsu.edu CSC714 Real-Time Computer Systems North Carolina State University Instructor:

More information

DDR4 Memory Technology on HP Z Workstations

DDR4 Memory Technology on HP Z Workstations Technical white paper DDR4 Memory Technology on HP Z Workstations DDR4 is the latest memory technology available for main memory on mobile, desktops, workstations, and server computers. DDR stands for

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software

More information

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Modern GPU

More information

Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory

Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory Computer Organization and Architecture Chapter 4 Cache Memory Characteristics of Memory Systems Note: Appendix 4A will not be covered in class, but the material is interesting reading and may be used in

More information

İSTANBUL AYDIN UNIVERSITY

İSTANBUL AYDIN UNIVERSITY İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER

More information

High Performance Processor Architecture. André Seznec IRISA/INRIA ALF project-team

High Performance Processor Architecture. André Seznec IRISA/INRIA ALF project-team High Performance Processor Architecture André Seznec IRISA/INRIA ALF project-team 1 2 Moore s «Law» Nb of transistors on a micro processor chip doubles every 18 months 1972: 2000 transistors (Intel 4004)

More information

SPARC64 VIIIfx: CPU for the K computer

SPARC64 VIIIfx: CPU for the K computer SPARC64 VIIIfx: CPU for the K computer Toshio Yoshida Mikio Hondo Ryuji Kan Go Sugizaki SPARC64 VIIIfx, which was developed as a processor for the K computer, uses Fujitsu Semiconductor Ltd. s 45-nm CMOS

More information

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune Introduction to RISC Processor ni logic Pvt. Ltd., Pune AGENDA What is RISC & its History What is meant by RISC Architecture of MIPS-R4000 Processor Difference Between RISC and CISC Pros and Cons of RISC

More information

Multicore partitioned systems based on hypervisor

Multicore partitioned systems based on hypervisor Preprints of the 19th World Congress The International Federation of Automatic Control Multicore partitioned systems based on hypervisor A. Crespo M. Masmano J. Coronel S. Peiró P. Balbastre J. Simó Universitat

More information

The Big Picture. Cache Memory CSE 675.02. Memory Hierarchy (1/3) Disk

The Big Picture. Cache Memory CSE 675.02. Memory Hierarchy (1/3) Disk The Big Picture Cache Memory CSE 5.2 Computer Processor Memory (active) (passive) Control (where ( brain ) programs, Datapath data live ( brawn ) when running) Devices Input Output Keyboard, Mouse Disk,

More information

Design Cycle for Microprocessors

Design Cycle for Microprocessors Cycle for Microprocessors Raúl Martínez Intel Barcelona Research Center Cursos de Verano 2010 UCLM Intel Corporation, 2010 Agenda Introduction plan Architecture Microarchitecture Logic Silicon ramp Types

More information

COS 318: Operating Systems. Virtual Memory and Address Translation

COS 318: Operating Systems. Virtual Memory and Address Translation COS 318: Operating Systems Virtual Memory and Address Translation Today s Topics Midterm Results Virtual Memory Virtualization Protection Address Translation Base and bound Segmentation Paging Translation

More information

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano cristina.silvano@polimi.it Prof. Silvano, Politecnico di Milano

More information

Central Processing Unit

Central Processing Unit Chapter 4 Central Processing Unit 1. CPU organization and operation flowchart 1.1. General concepts The primary function of the Central Processing Unit is to execute sequences of instructions representing

More information

CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont.

CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont. CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont. Instructors: Vladimir Stojanovic & Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/ 1 Bare 5-Stage Pipeline Physical Address PC Inst.

More information

<Insert Picture Here> T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing

<Insert Picture Here> T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing Robert Golla Senior Hardware Architect Paul Jordan Senior Principal Hardware Engineer Oracle

More information

IA-64 Application Developer s Architecture Guide

IA-64 Application Developer s Architecture Guide IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve

More information

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX Overview CISC Developments Over Twenty Years Classic CISC design: Digital VAX VAXÕs RISC successor: PRISM/Alpha IntelÕs ubiquitous 80x86 architecture Ð 8086 through the Pentium Pro (P6) RJS 2/3/97 Philosophy

More information

What is a bus? A Bus is: Advantages of Buses. Disadvantage of Buses. Master versus Slave. The General Organization of a Bus

What is a bus? A Bus is: Advantages of Buses. Disadvantage of Buses. Master versus Slave. The General Organization of a Bus Datorteknik F1 bild 1 What is a bus? Slow vehicle that many people ride together well, true... A bunch of wires... A is: a shared communication link a single set of wires used to connect multiple subsystems

More information

Making Dynamic Memory Allocation Static To Support WCET Analyses

Making Dynamic Memory Allocation Static To Support WCET Analyses Making Dynamic Memory Allocation Static To Support WCET Analyses Jörg Herter Jan Reineke Department of Computer Science Saarland University WCET Workshop, June 2009 Jörg Herter, Jan Reineke Making Dynamic

More information

POSIX. RTOSes Part I. POSIX Versions. POSIX Versions (2)

POSIX. RTOSes Part I. POSIX Versions. POSIX Versions (2) RTOSes Part I Christopher Kenna September 24, 2010 POSIX Portable Operating System for UnIX Application portability at source-code level POSIX Family formally known as IEEE 1003 Originally 17 separate

More information

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),

More information

Configuring and using DDR3 memory with HP ProLiant Gen8 Servers

Configuring and using DDR3 memory with HP ProLiant Gen8 Servers Engineering white paper, 2 nd Edition Configuring and using DDR3 memory with HP ProLiant Gen8 Servers Best Practice Guidelines for ProLiant servers with Intel Xeon processors Table of contents Introduction

More information

Enterprise Applications

Enterprise Applications Enterprise Applications Chi Ho Yue Sorav Bansal Shivnath Babu Amin Firoozshahian EE392C Emerging Applications Study Spring 2003 Functionality Online Transaction Processing (OLTP) Users/apps interacting

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based

More information

Central Processing Unit (CPU)

Central Processing Unit (CPU) Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following

More information

Introducción. Diseño de sistemas digitales.1

Introducción. Diseño de sistemas digitales.1 Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1

More information

The Microarchitecture of Superscalar Processors

The Microarchitecture of Superscalar Processors The Microarchitecture of Superscalar Processors James E. Smith Department of Electrical and Computer Engineering 1415 Johnson Drive Madison, WI 53706 ph: (608)-265-5737 fax:(608)-262-1267 email: jes@ece.wisc.edu

More information

This Unit: Multithreading (MT) CIS 501 Computer Architecture. Performance And Utilization. Readings

This Unit: Multithreading (MT) CIS 501 Computer Architecture. Performance And Utilization. Readings This Unit: Multithreading (MT) CIS 501 Computer Architecture Unit 10: Hardware Multithreading Application OS Compiler Firmware CU I/O Memory Digital Circuits Gates & Transistors Why multithreading (MT)?

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

More information

RAM & ROM Based Digital Design. ECE 152A Winter 2012

RAM & ROM Based Digital Design. ECE 152A Winter 2012 RAM & ROM Based Digital Design ECE 152A Winter 212 Reading Assignment Brown and Vranesic 1 Digital System Design 1.1 Building Block Circuits 1.1.3 Static Random Access Memory (SRAM) 1.1.4 SRAM Blocks in

More information

ADVANCED COMPUTER ARCHITECTURE

ADVANCED COMPUTER ARCHITECTURE ADVANCED COMPUTER ARCHITECTURE Marco Ferretti Tel. Ufficio: 0382 985365 E-mail: marco.ferretti@unipv.it Web: www.unipv.it/mferretti, eecs.unipv.it 1 Course syllabus and motivations This course covers the

More information

Concept of Cache in web proxies

Concept of Cache in web proxies Concept of Cache in web proxies Chan Kit Wai and Somasundaram Meiyappan 1. Introduction Caching is an effective performance enhancing technique that has been used in computer systems for decades. However,

More information

With respect to the way of data access we can classify memories as:

With respect to the way of data access we can classify memories as: Memory Classification With respect to the way of data access we can classify memories as: - random access memories (RAM), - sequentially accessible memory (SAM), - direct access memory (DAM), - contents

More information

LSN 2 Computer Processors

LSN 2 Computer Processors LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

More information

Parallel Processing and Software Performance. Lukáš Marek

Parallel Processing and Software Performance. Lukáš Marek Parallel Processing and Software Performance Lukáš Marek DISTRIBUTED SYSTEMS RESEARCH GROUP http://dsrg.mff.cuni.cz CHARLES UNIVERSITY PRAGUE Faculty of Mathematics and Physics Benchmarking in parallel

More information

HyperThreading Support in VMware ESX Server 2.1

HyperThreading Support in VMware ESX Server 2.1 HyperThreading Support in VMware ESX Server 2.1 Summary VMware ESX Server 2.1 now fully supports Intel s new Hyper-Threading Technology (HT). This paper explains the changes that an administrator can expect

More information

150127-Microprocessor & Assembly Language

150127-Microprocessor & Assembly Language Chapter 3 Z80 Microprocessor Architecture The Z 80 is one of the most talented 8 bit microprocessors, and many microprocessor-based systems are designed around the Z80. The Z80 microprocessor needs an

More information