Practical Benefits of Multi-threaded RISC/DSP Processing

Size: px
Start display at page:

Download "Practical Benefits of Multi-threaded RISC/DSP Processing"

Transcription

1 Practical Benefits of Multi-threaded RISC/DSP Processing David W. Knox Vice President, Software. Imagination Technologies The Imagination Technologies Ltd. (IMG) META processor is the first commercially available multi-threaded IP (Intellectual Property) core. A common architecture for both RISC and DSP instructions, in combination with hardware multi-threading, allows complex systems to be built around a single processor core, where previously two or more different processors would have been used together. This has already proved beneficial in volume real-time systems supporting broadcast and multimedia requirements, including digital radio and digital television receivers. These systems used early versions of META It is expected that the licensable META 120 IP core will form the basis of many future SoC products. Introduction The Imagination Technologies Ltd. (IMG) META processor core is the first commercially available multi-threaded IP (Intellectual Property) core. It is available either stand-alone, in the form of synthesisable VHDL, or as a component of System-on-Chip (SoC) designs. Architecture Overview The processor contains two 32-bit data units, two 32-bit address units and a control unit. Each data unit provides both conventional ALU operations and, optionally, extended DSP arithmetic operations. The address units are cut-down data units, with operations limited to those required for a range of indexed and auto-incremented memory accesses via pointer registers. The control unit looks after program flow, with features to support branch-prediction, lowoverhead looping, exceptions, interrupts, and synchronous triggers. Main memory is accessed via two caches, one for instructions and one for data. Optionally, the data units may include closely coupled DSP RAM which is accessed like an extended register file, bypassing the main memory system. The processor is intended for high volume, multi-function consumer SoC applications. Many SoC systems have performance critical functions which are best implemented as custom hardware accelerators. Accelerators may be memory mapped devices or they may be implemented as coprocessors connected to the CPU s coprocessor port. Co-processor devices benefit from special instructions to optimise the flow of data between CPU registers and the accelerator. DSP Features The standard processor configuration provides a 32-bit fixed point RISC architecture suitable for general purpose applications (typically compiled C code). All processor options benefit from a 64-bit memory architecture which allows two 32-bit operands to be fetched or stored in the same cycle. In addition, an optional package of DSP extensions includes: Larger register files in the data and address units for additional scratch variable storage 40-bit accumulators with sustainable single cycle Multiply-Accumulate (MAC) performance. Extended arithmetic modes including saturation, rounding and scaling. Zero overhead hardware loop support in the control unit. Parallel ALU operation and memory access. DSP RAM memory banks, built into the data units to provide scratch RAM space with register style access

2 Non-linear address modes (bit-reversed and modulo addressing) in the address units. Split arithmetic: a form of SIMD in which a data unit treats a 32-bit register as either two 16-bit values or four 8-bit values and applies the same operation to all of them. Dual bank operation: a form of SIMD in which both data units perform the same operation in parallel. By using both dual bank operation and split-16 arithmetic together, a sustained performance of 4 MACs per cycle is readily achievable. A system of DSP instruction Template registers which allows a rich DSP instruction set to be made available without resorting to an excessively long instruction word. The DSP Template registers implement a table driven VLIW system which keeps the number of DSP op-codes required low enough for both general purpose (RISC) and DSP instructions to be combined in a unified instruction stream. Hardware Multi-Threading The most significant feature of the processor is that it maintains a number of separate hardware execution threads. System designers can tailor the processor implementation by choosing the number of threads and how many of the threads have DSP capability. A typical implementation might have 4 threads of which two would be DSP-capable. A lightweight system might have one general purpose thread and one DSP-capable thread. A heavyweight system might have 4 DSP-capable threads. Although the processing resources (multiplier, adder etc) are shared, each thread has its own registers, pipeline and program counter. On every cycle, a hardware scheduler looks at the next candidate instruction from each thread and chooses which to execute. This choice is based on availability of processing resources and a thread prioritisation system. There is an opportunity for a context switch on every instruction cycle, without overhead. Over fifty internal resources are considered by the scheduler to determine whether a thread can run on the next cycle but most significant, from a programmers viewpoint, is whether a previous memory access has completed the time for a memory access can be lengthy (depending on memory type) and unpredictable (depending on cache state, refresh cycles etc). For high performance DSP code, programmers will attempt to control these memory stalls using techniques such as cache pre-fetches and pipelined address pre-issue. With hardware multi-threading, memory stalls are much less of a problem. When one thread stalls, other threads can execute. Simply dividing a system into a few major subsystems, with one executing on each thread, can be all that is required to eliminate wasted cycles from the system as a whole. Figure 1 and Figure 2 show the difference between the conventional and the multithreaded approaches. By way of example, we have a system of four tasks, two are hard real time tasks in the sense that they must be scheduled in response to external events and complete before the next event arrives. Two further tasks are lower priority continuous background activities. The histogram in Figure 1 shows, as a function of time, the proportion of CPU cycles on which an instruction is issued for a particular CPU and memory combination. This histogram does not reveal all the details of the task scheduling, but we can see that, while the utilisation varies a little as the software scheduler switches tasks, more than 25% of the available CPU cycles are unused. Figure 1 Single Threaded Operation In Figure 2 we see what happens when we use hardware multi-threading to run each of the four tasks on its own hardware thread. The tasks on threads 0 and 1 are the repetitive real-time tasks and these have been given higher priority to ensure they meet their deadlines. The tasks on threads 2 and 3 are the background load tasks

3 We can see that, even though it has a lower priority, thread 1 makes some progress when thread 0 is running by fitting into the unusable (for thread 0) cycles. More significantly, the total CPU utilisation has increased to nearly 100% as shown by the lowest graph. Figure 2 Prioritised Multi-Threading Super-Threading Although there is only one set of processing resources shared between all the threads, not every resource is needed by every instruction. In principle, instructions which do not conflict in their resource requirements can execute simultaneously. Often, more than one thread can issue an instruction on each cycle. We call this super-threading. The performance improvement obtained in practice depends on the particular instruction sequences in the program, but throughput improvements from 50% to 100% relative to simple single-instruction multithreading are not uncommon. Figure 3 shows the additional improvement obtained in our example system as a result of super-threading. This system is identical to that illustrated in Figure 2, but superthreading has increased instruction issues to about 150% of the clock rate. Figure 3 Super-Threading Automatic MIPS Allocation For simplicity, in the examples shown so far, we have used a fixed priority scheme to control the relative execution rates of the four threads. While this is adequate for a system of regular processing in response to periodic events, real systems are more complex and more sophisticated control is required. The scheduler includes a patented system called Automatic MIPS Allocation (AMA ), which dynamically adjusts thread priorities to give each thread, a controlled proportion of the CPU capacity. The control can be applied to both short and long term observation periods, for example to allow a thread to respond to a real-time event with a burst of activity before settling back to its background activity level. Figure 4 illustrates a simple AMA rate control configuration using the same system as in the previous examples. Here we have limited threads 0 and 1 to an instruction issue rate which is just sufficient for them to meet their real-time deadlines (about 40% each). Thread 2 has been set at 20% and thread 3 has been allowed to free-run and absorb the remaining capacity. In particular, this has enabled us to control the relative resource use of threads 2 and 3 with greater precision than simple fixed priorities allowed

4 Figure 4 AMA Rate Control Figure 5 shows a more realistic configuration in which we have used a combination of thread prioritisation and AMA to achieve the desired behaviour. Threads 0 and 1 have elevated priorities for timely real-time response. Thread 2 is controlled by AMA for an execution rate of about 40% instructions/clock and thread 3 is a free-running background task. Because of the interference from the higher priority threads, at times thread 2 cannot achieve the desired run rate and builds up a processing deficit. Whenever possible, for example immediately after the thread 1 activity burst is complete, thread 2 s execution rate is increased to make up the deficit. After catching up, the execution rate returns to the set value. Over short periods, the desired execution rate can not be provided, due to total resource demand exceeding what is available. However, over longer periods (the major time divisions) the average rate of 40% instructions/clock is maintained. Figure 5 AMA & Priority Combination Benefits In hardware multi-threading, execution contexts are replicated but processing logic, external interfaces etc. are shared. This means that META systems introduce new points in the set of cost/performance options available to a system designer. The combination of multi-threading and a unified instruction set for both DSP and RISC operations means that we can often replace a multi-processor RISC+DSP system with a single multi-threaded core. It turns out that the performance benefits of the multi-threaded architecture are easy to realise at the system design level. Often a fairly simplistic partitioning across two or three hardware threads is all that is required to convert a significant proportion of stall cycles into useful processing. If fine-tuning, through load balancing, is required, then the fact that each thread executes the same instruction set makes it simple to move functions from one thread to another. A second benefit of hardware multithreading occurs during system construction and integration. Software authors can develop each thread as a separate subsystem, largely independent of the others. A software tool combines the separate thread images into a single executable system unit at the end of the build process. Each thread can have a completely different software environment. It - 4 -

5 is even possible to run different software operating systems simultaneously on different threads. Hardware multi-threading allows us to separate real-time tasks having widely differing scheduling requirements into different software scheduling domains. For example a subsystem based on a complex multi-functional OS could run on one thread at the same time as a real-time data-driven DSP task runs on another. The DSP task might be based on a simple synchronous IO driven scheduling strategy which would be completely independent of the interrupts and device drivers in the other subsystem. This approach avoids the problems which arise when trying to schedule tasks with wildly differing event rates and activity patterns under a common OS. Such difficulties are another reason why many systems use two or more processors. causes no interference or break-up on the DAB audio an excellent example of how software tasks on different threads can operate independently on the META core. Devices based on META and currently reaching production are applying the same design approaches to bring the cost of digital TV decoders down to a level where they can be integrated into TV sets rather than requiring a separate set-top box. Now, with the new META 120 core, IMG have a set of standard IP configurations which can be licensed alone or as a component of SoC designs. With the core design and architecture validated by real consumer products in volume manufacture, and the availability of comprehensive software algorithm support, IMG expect that META will form the basis of many future high volume products. Application Examples The META core has been used in System on Chip designs for a range of consumer entertainment products. Perhaps the best known, so far, is the Frontier Silicon CHORUS device which implements a Digital Audio Broadcast (DAB) receiver for the internationally adopted Eureka-147 system. In this design three threads are deployed as follows: (i) COFDM symbol demodulation, (ii) DAB protocol and MPEG Audio decoding, (iii) support and control of a dedicated Viterbi decoder hardware device. The fourth thread looks after the user interface keyboard, display, volume and tuning controls. Many different variants of the fourth thread software have been developed to give a variety of product and brand identities ranging from Hi-Fi rack tuners to portable battery powered devices. The CHORUS chip has rapidly established itself as the world s leading DAB receiver. More recent versions of the DAB receiver software have been designed to use only three threads, leaving the fourth available for completely separate applications. For example, both Linux and an MPEG-4 video decoder have been run on the fourth thread at the same time as the DAB radio receiver. Booting Linux on the fourth thread when DAB is being decoded on the other threads - 5 -

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

Embedded System Hardware - Processing (Part II)

Embedded System Hardware - Processing (Part II) 12 Embedded System Hardware - Processing (Part II) Jian-Jia Chen (Slides are based on Peter Marwedel) Informatik 12 TU Dortmund Germany Springer, 2010 2014 年 11 月 11 日 These slides use Microsoft clip arts.

More information

Operating Systems 4 th Class

Operating Systems 4 th Class Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science

More information

(Refer Slide Time: 00:01:16 min)

(Refer Slide Time: 00:01:16 min) Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control

More information

FLIX: Fast Relief for Performance-Hungry Embedded Applications

FLIX: Fast Relief for Performance-Hungry Embedded Applications FLIX: Fast Relief for Performance-Hungry Embedded Applications Tensilica Inc. February 25 25 Tensilica, Inc. 25 Tensilica, Inc. ii Contents FLIX: Fast Relief for Performance-Hungry Embedded Applications...

More information

ARM Microprocessor and ARM-Based Microcontrollers

ARM Microprocessor and ARM-Based Microcontrollers ARM Microprocessor and ARM-Based Microcontrollers Nguatem William 24th May 2006 A Microcontroller-Based Embedded System Roadmap 1 Introduction ARM ARM Basics 2 ARM Extensions Thumb Jazelle NEON & DSP Enhancement

More information

İSTANBUL AYDIN UNIVERSITY

İSTANBUL AYDIN UNIVERSITY İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER

More information

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Introduction to the Latest Tensilica Baseband Solutions

Introduction to the Latest Tensilica Baseband Solutions Introduction to the Latest Tensilica Baseband Solutions Dr. Chris Rowen Founder and Chief Technology Officer Tensilica Inc. Outline The Mobile Wireless Challenge Multi-standard Baseband Tensilica Fits

More information

MPSoC Designs: Driving Memory and Storage Management IP to Critical Importance

MPSoC Designs: Driving Memory and Storage Management IP to Critical Importance MPSoC Designs: Driving Storage Management IP to Critical Importance Design IP has become an essential part of SoC realization it is a powerful resource multiplier that allows SoC design teams to focus

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: Embedded Systems - , Raj Kamal, Publs.: McGraw-Hill Education Lesson 7: SYSTEM-ON ON-CHIP (SoC( SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY 1 VLSI chip Integration of high-level components Possess gate-level sophistication in circuits above that of the counter,

More information

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007 Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007 1 Single-core computer 2 Single-core CPU chip the single core 3 Multi-core architectures This lecture is about a new trend in computer

More information

GSM/GPRS PHYSICAL LAYER ON SANDBLASTER DSP

GSM/GPRS PHYSICAL LAYER ON SANDBLASTER DSP GSM/GPRS PHYSICAL LAYER ON SANDBLASTER DSP Raghunath Kalavai, Murugappan Senthilvelan, Sitij Agrawal, Sanjay Jinturkar, John Glossner Sandbridge Technologies, 1 North Lexington Avenue, White Plains, NY

More information

7a. System-on-chip design and prototyping platforms

7a. System-on-chip design and prototyping platforms 7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

Using On-chip Networks to Minimize Software Development Costs

Using On-chip Networks to Minimize Software Development Costs Using On-chip Networks to Minimize Software Development Costs Today s multi-core SoCs face rapidly escalating costs driven by the increasing number of cores on chips. It is common to see code bases for

More information

CHAPTER 7: The CPU and Memory

CHAPTER 7: The CPU and Memory CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides

More information

LSN 2 Computer Processors

LSN 2 Computer Processors LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

More information

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

A Generic Network Interface Architecture for a Networked Processor Array (NePA) A Generic Network Interface Architecture for a Networked Processor Array (NePA) Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh EECS @ University of California, Irvine Outline Introduction

More information

Operating System Impact on SMT Architecture

Operating System Impact on SMT Architecture Operating System Impact on SMT Architecture The work published in An Analysis of Operating System Behavior on a Simultaneous Multithreaded Architecture, Josh Redstone et al., in Proceedings of the 9th

More information

Study Plan Masters of Science in Computer Engineering and Networks (Thesis Track)

Study Plan Masters of Science in Computer Engineering and Networks (Thesis Track) Plan Number 2009 Study Plan Masters of Science in Computer Engineering and Networks (Thesis Track) I. General Rules and Conditions 1. This plan conforms to the regulations of the general frame of programs

More information

An Introduction to the ARM 7 Architecture

An Introduction to the ARM 7 Architecture An Introduction to the ARM 7 Architecture Trevor Martin CEng, MIEE Technical Director This article gives an overview of the ARM 7 architecture and a description of its major features for a developer new

More information

150127-Microprocessor & Assembly Language

150127-Microprocessor & Assembly Language Chapter 3 Z80 Microprocessor Architecture The Z 80 is one of the most talented 8 bit microprocessors, and many microprocessor-based systems are designed around the Z80. The Z80 microprocessor needs an

More information

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern: Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

Central Processing Unit (CPU)

Central Processing Unit (CPU) Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following

More information

VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU

VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN- ARCHITECTURE CPU Martin Straka Doctoral Degree Programme (1), FIT BUT E-mail: strakam@fit.vutbr.cz Supervised by: Zdeněk Kotásek E-mail: kotasek@fit.vutbr.cz

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization

More information

Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 2 Logic Gates and Introduction to Computer Architecture Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are

More information

Computer Architecture TDTS10

Computer Architecture TDTS10 why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers

More information

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS Structure Page Nos. 2.0 Introduction 27 2.1 Objectives 27 2.2 Types of Classification 28 2.3 Flynn s Classification 28 2.3.1 Instruction Cycle 2.3.2 Instruction

More information

A Lab Course on Computer Architecture

A Lab Course on Computer Architecture A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,

More information

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor Von der Hardware zur Software in FPGAs mit Embedded Prozessoren Alexander Hahn Senior Field Application Engineer Lattice Semiconductor AGENDA Overview Mico32 Embedded Processor Development Tool Chain HW/SW

More information

MICROPROCESSOR AND MICROCOMPUTER BASICS

MICROPROCESSOR AND MICROCOMPUTER BASICS Introduction MICROPROCESSOR AND MICROCOMPUTER BASICS At present there are many types and sizes of computers available. These computers are designed and constructed based on digital and Integrated Circuit

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Mobile Processors: Future Trends

Mobile Processors: Future Trends Mobile Processors: Future Trends Mário André Pinto Ferreira de Araújo Departamento de Informática, Universidade do Minho 4710-057 Braga, Portugal maaraujo@mail.pt Abstract. Mobile devices, such as handhelds,

More information

An Implementation Of Multiprocessor Linux

An Implementation Of Multiprocessor Linux An Implementation Of Multiprocessor Linux This document describes the implementation of a simple SMP Linux kernel extension and how to use this to develop SMP Linux kernels for architectures other than

More information

Let s put together a Manual Processor

Let s put together a Manual Processor Lecture 14 Let s put together a Manual Processor Hardware Lecture 14 Slide 1 The processor Inside every computer there is at least one processor which can take an instruction, some operands and produce

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

About Parallels Desktop 7 for Mac

About Parallels Desktop 7 for Mac About Parallels Desktop 7 for Mac Parallels Desktop 7 for Mac is a major upgrade to Parallels' award-winning software for running Windows on a Mac. About this Update This update for Parallels Desktop for

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

An Overview of Stack Architecture and the PSC 1000 Microprocessor

An Overview of Stack Architecture and the PSC 1000 Microprocessor An Overview of Stack Architecture and the PSC 1000 Microprocessor Introduction A stack is an important data handling structure used in computing. Specifically, a stack is a dynamic set of elements in which

More information

Five Families of ARM Processor IP

Five Families of ARM Processor IP ARM1026EJ-S Synthesizable ARM10E Family Processor Core Eric Schorn CPU Product Manager ARM Austin Design Center Five Families of ARM Processor IP Performance ARM preserves SW & HW investment through code

More information

Instruction Set Architecture

Instruction Set Architecture Instruction Set Architecture Consider x := y+z. (x, y, z are memory variables) 1-address instructions 2-address instructions LOAD y (r :=y) ADD y,z (y := y+z) ADD z (r:=r+z) MOVE x,y (x := y) STORE x (x:=r)

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

Instruction Set Design

Instruction Set Design Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,

More information

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit. Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how

More information

System Considerations

System Considerations System Considerations Interfacing Performance Power Size Ease-of Use Programming Interfacing Debugging Cost Device cost System cost Development cost Time to market Integration Peripherals Different Needs?

More information

Tensilica Software Development Toolkit (SDK)

Tensilica Software Development Toolkit (SDK) Tensilica Datasheet Tensilica Software Development Toolkit (SDK) Quickly develop application code Features Cadence Tensilica Xtensa Xplorer Integrated Development Environment (IDE) with full graphical

More information

Switch Fabric Implementation Using Shared Memory

Switch Fabric Implementation Using Shared Memory Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today

More information

CHAPTER 4 MARIE: An Introduction to a Simple Computer

CHAPTER 4 MARIE: An Introduction to a Simple Computer CHAPTER 4 MARIE: An Introduction to a Simple Computer 4.1 Introduction 195 4.2 CPU Basics and Organization 195 4.2.1 The Registers 196 4.2.2 The ALU 197 4.2.3 The Control Unit 197 4.3 The Bus 197 4.4 Clocks

More information

Mobile Operating Systems Lesson 05 Windows CE Part 1

Mobile Operating Systems Lesson 05 Windows CE Part 1 Mobile Operating Systems Lesson 05 Windows CE Part 1 Oxford University Press 2007. All rights reserved. 1 Windows CE A 32 bit OS from Microsoft Customized for each specific hardware and processor in order

More information

Computer System Design. System-on-Chip

Computer System Design. System-on-Chip Brochure More information from http://www.researchandmarkets.com/reports/2171000/ Computer System Design. System-on-Chip Description: The next generation of computer system designers will be less concerned

More information

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single

More information

Central Processing Unit

Central Processing Unit Chapter 4 Central Processing Unit 1. CPU organization and operation flowchart 1.1. General concepts The primary function of the Central Processing Unit is to execute sequences of instructions representing

More information

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah (DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de jens_onno.krah@fh-koeln.de NIOS II 1 1 What is Nios II? Altera s Second Generation

More information

CPU Organisation and Operation

CPU Organisation and Operation CPU Organisation and Operation The Fetch-Execute Cycle The operation of the CPU 1 is usually described in terms of the Fetch-Execute cycle. 2 Fetch-Execute Cycle Fetch the Instruction Increment the Program

More information

What is a System on a Chip?

What is a System on a Chip? What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex

More information

The Central Processing Unit:

The Central Processing Unit: The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Objectives Identify the components of the central processing unit and how they work together and interact with memory Describe how

More information

Module 8. Industrial Embedded and Communication Systems. Version 2 EE IIT, Kharagpur 1

Module 8. Industrial Embedded and Communication Systems. Version 2 EE IIT, Kharagpur 1 Module 8 Industrial Embedded and Communication Systems Version 2 EE IIT, Kharagpur 1 Lesson 37 Real-Time Operating Systems: Introduction and Process Management Version 2 EE IIT, Kharagpur 2 Instructional

More information

Spacecraft Computer Systems. Colonel John E. Keesee

Spacecraft Computer Systems. Colonel John E. Keesee Spacecraft Computer Systems Colonel John E. Keesee Overview Spacecraft data processing requires microcomputers and interfaces that are functionally similar to desktop systems However, space systems require:

More information

Performance Comparison of RTOS

Performance Comparison of RTOS Performance Comparison of RTOS Shahmil Merchant, Kalpen Dedhia Dept Of Computer Science. Columbia University Abstract: Embedded systems are becoming an integral part of commercial products today. Mobile

More information

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances:

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances: Scheduling Scheduling Scheduling levels Long-term scheduling. Selects which jobs shall be allowed to enter the system. Only used in batch systems. Medium-term scheduling. Performs swapin-swapout operations

More information

Digital Signal Controller Based Automatic Transfer Switch

Digital Signal Controller Based Automatic Transfer Switch Digital Signal Controller Based Automatic Transfer Switch by Venkat Anant Senior Staff Applications Engineer Freescale Semiconductor, Inc. Abstract: An automatic transfer switch (ATS) enables backup generators,

More information

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

EE361: Digital Computer Organization Course Syllabus

EE361: Digital Computer Organization Course Syllabus EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)

More information

IA-64 Application Developer s Architecture Guide

IA-64 Application Developer s Architecture Guide IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve

More information

CISC, RISC, and DSP Microprocessors

CISC, RISC, and DSP Microprocessors CISC, RISC, and DSP Microprocessors Douglas L. Jones ECE 497 Spring 2000 4/6/00 CISC, RISC, and DSP D.L. Jones 1 Outline Microprocessors circa 1984 RISC vs. CISC Microprocessors circa 1999 Perspective:

More information

18-447 Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013

18-447 Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013 18-447 Computer Architecture Lecture 3: ISA Tradeoffs Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013 Reminder: Homeworks for Next Two Weeks Homework 0 Due next Wednesday (Jan 23), right

More information

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 26 Real - Time POSIX. (Contd.) Ok Good morning, so let us get

More information

LynxOS RTOS (Real-Time Operating System)

LynxOS RTOS (Real-Time Operating System) LynxOS RTOS (Real-Time Operating System) Stephen J. Franz CS-550 Section 1 Fall 2005 1 Summary LynxOS is one of two real time operating systems (RTOS) developed and marketed by LynuxWorks of San José,

More information

MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN. zl2211@columbia.edu. ml3088@columbia.edu

MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN. zl2211@columbia.edu. ml3088@columbia.edu MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN Zheng Lai Zhao Liu Meng Li Quan Yuan zl2215@columbia.edu zl2211@columbia.edu ml3088@columbia.edu qy2123@columbia.edu I. Overview Architecture The purpose

More information

1. Computer System Structure and Components

1. Computer System Structure and Components 1 Computer System Structure and Components Computer System Layers Various Computer Programs OS System Calls (eg, fork, execv, write, etc) KERNEL/Behavior or CPU Device Drivers Device Controllers Devices

More information

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution

More information

Management Challenge. Managing Hardware Assets. Central Processing Unit. What is a Computer System?

Management Challenge. Managing Hardware Assets. Central Processing Unit. What is a Computer System? Management Challenge Managing Hardware Assets What computer processing and storage capability does our organization need to handle its information and business transactions? What arrangement of computers

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

THROUGHPUTER. Parallel Program Development and Execution Platform as a Service

THROUGHPUTER. Parallel Program Development and Execution Platform as a Service THROUGHPUTER Parallel Program Development and Execution Platform as a Service Many Cloud Computing Challenge - Technical Example: Average demands by applications sharing a 16- processor app1 12.5% Actual

More information

Cortex-A9 MPCore Software Development

Cortex-A9 MPCore Software Development Cortex-A9 MPCore Software Development Course Description Cortex-A9 MPCore software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop

More information

Counters and Decoders

Counters and Decoders Physics 3330 Experiment #10 Fall 1999 Purpose Counters and Decoders In this experiment, you will design and construct a 4-bit ripple-through decade counter with a decimal read-out display. Such a counter

More information

Infrastructure Matters: POWER8 vs. Xeon x86

Infrastructure Matters: POWER8 vs. Xeon x86 Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report

More information

Qsys and IP Core Integration

Qsys and IP Core Integration Qsys and IP Core Integration Prof. David Lariviere Columbia University Spring 2014 Overview What are IP Cores? Altera Design Tools for using and integrating IP Cores Overview of various IP Core Interconnect

More information

S7 for Windows S7-300/400

S7 for Windows S7-300/400 S7 for Windows S7-300/400 A Programming System for the Siemens S7 300 / 400 PLC s IBHsoftec has an efficient and straight-forward programming system for the Simatic S7-300 and ern controller concept can

More information

Which ARM Cortex Core Is Right for Your Application: A, R or M?

Which ARM Cortex Core Is Right for Your Application: A, R or M? Which ARM Cortex Core Is Right for Your Application: A, R or M? Introduction The ARM Cortex series of cores encompasses a very wide range of scalable performance options offering designers a great deal

More information

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC

More information

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit Unit A451: Computer systems and programming Section 2: Computing Hardware 1/5: Central Processing Unit Section Objectives Candidates should be able to: (a) State the purpose of the CPU (b) Understand the

More information

Design and Implementation of the Heterogeneous Multikernel Operating System

Design and Implementation of the Heterogeneous Multikernel Operating System 223 Design and Implementation of the Heterogeneous Multikernel Operating System Yauhen KLIMIANKOU Department of Computer Systems and Networks, Belarusian State University of Informatics and Radioelectronics,

More information

ESE566 REPORT3. Design Methodologies for Core-based System-on-Chip HUA TANG OVIDIU CARNU

ESE566 REPORT3. Design Methodologies for Core-based System-on-Chip HUA TANG OVIDIU CARNU ESE566 REPORT3 Design Methodologies for Core-based System-on-Chip HUA TANG OVIDIU CARNU Nov 19th, 2002 ABSTRACT: In this report, we discuss several recent published papers on design methodologies of core-based

More information

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study CS 377: Operating Systems Lecture 25 - Linux Case Study Guest Lecturer: Tim Wood Outline Linux History Design Principles System Overview Process Scheduling Memory Management File Systems A review of what

More information

HyperThreading Support in VMware ESX Server 2.1

HyperThreading Support in VMware ESX Server 2.1 HyperThreading Support in VMware ESX Server 2.1 Summary VMware ESX Server 2.1 now fully supports Intel s new Hyper-Threading Technology (HT). This paper explains the changes that an administrator can expect

More information

SPARC64 VIIIfx: CPU for the K computer

SPARC64 VIIIfx: CPU for the K computer SPARC64 VIIIfx: CPU for the K computer Toshio Yoshida Mikio Hondo Ryuji Kan Go Sugizaki SPARC64 VIIIfx, which was developed as a processor for the K computer, uses Fujitsu Semiconductor Ltd. s 45-nm CMOS

More information

CS352H: Computer Systems Architecture

CS352H: Computer Systems Architecture CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Data Hazards in ALU Instructions

More information

A Powerful solution for next generation Pcs

A Powerful solution for next generation Pcs Product Brief 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k A Powerful solution for next generation Pcs Looking for

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Question Bank Subject Name: EC6504 - Microprocessor & Microcontroller Year/Sem : II/IV

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Question Bank Subject Name: EC6504 - Microprocessor & Microcontroller Year/Sem : II/IV DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Question Bank Subject Name: EC6504 - Microprocessor & Microcontroller Year/Sem : II/IV UNIT I THE 8086 MICROPROCESSOR 1. What is the purpose of segment registers

More information

Week 1 out-of-class notes, discussions and sample problems

Week 1 out-of-class notes, discussions and sample problems Week 1 out-of-class notes, discussions and sample problems Although we will primarily concentrate on RISC processors as found in some desktop/laptop computers, here we take a look at the varying types

More information