Practical Benefits of Multi-threaded RISC/DSP Processing
David W. Knox
Vice President, Software, Imagination Technologies

The Imagination Technologies Ltd. (IMG) META processor is the first commercially available multi-threaded IP (Intellectual Property) core. A common architecture for both RISC and DSP instructions, in combination with hardware multi-threading, allows complex systems to be built around a single processor core, where previously two or more different processors would have been used together. This has already proved beneficial in volume real-time systems supporting broadcast and multimedia requirements, including digital radio and digital television receivers. These systems used early versions of META. It is expected that the licensable META 120 IP core will form the basis of many future SoC products.

Introduction

The Imagination Technologies Ltd. (IMG) META processor core is the first commercially available multi-threaded IP (Intellectual Property) core. It is available either stand-alone, in the form of synthesisable VHDL, or as a component of System-on-Chip (SoC) designs.

Architecture Overview

The processor contains two 32-bit data units, two 32-bit address units and a control unit. Each data unit provides both conventional ALU operations and, optionally, extended DSP arithmetic operations. The address units are cut-down data units, with operations limited to those required for a range of indexed and auto-incremented memory accesses via pointer registers. The control unit looks after program flow, with features to support branch prediction, low-overhead looping, exceptions, interrupts, and synchronous triggers.

Main memory is accessed via two caches, one for instructions and one for data. Optionally, the data units may include closely coupled DSP RAM which is accessed like an extended register file, bypassing the main memory system.

The processor is intended for high volume, multi-function consumer SoC applications.
Many SoC systems have performance-critical functions which are best implemented as custom hardware accelerators. Accelerators may be memory-mapped devices or they may be implemented as coprocessors connected to the CPU's coprocessor port. Coprocessor devices benefit from special instructions to optimise the flow of data between CPU registers and the accelerator.

DSP Features

The standard processor configuration provides a 32-bit fixed-point RISC architecture suitable for general purpose applications (typically compiled C code). All processor options benefit from a 64-bit memory architecture which allows two 32-bit operands to be fetched or stored in the same cycle. In addition, an optional package of DSP extensions includes:

- Larger register files in the data and address units for additional scratch variable storage.
- 40-bit accumulators with sustainable single-cycle Multiply-Accumulate (MAC) performance.
- Extended arithmetic modes including saturation, rounding and scaling.
- Zero-overhead hardware loop support in the control unit.
- Parallel ALU operation and memory access.
- DSP RAM memory banks, built into the data units to provide scratch RAM space with register-style access.
- Non-linear address modes (bit-reversed and modulo addressing) in the address units.
- Split arithmetic: a form of SIMD in which a data unit treats a 32-bit register as either two 16-bit values or four 8-bit values and applies the same operation to all of them.
- Dual bank operation: a form of SIMD in which both data units perform the same operation in parallel. By using both dual bank operation and split-16 arithmetic together, a sustained performance of 4 MACs per cycle is readily achievable.
- A system of DSP instruction Template registers which allows a rich DSP instruction set to be made available without resorting to an excessively long instruction word. The DSP Template registers implement a table-driven VLIW system which keeps the number of DSP op-codes required low enough for both general purpose (RISC) and DSP instructions to be combined in a unified instruction stream.

Hardware Multi-Threading

The most significant feature of the processor is that it maintains a number of separate hardware execution threads. System designers can tailor the processor implementation by choosing the number of threads and how many of the threads have DSP capability. A typical implementation might have four threads, of which two would be DSP-capable. A lightweight system might have one general purpose thread and one DSP-capable thread. A heavyweight system might have four DSP-capable threads.

Although the processing resources (multiplier, adder etc.) are shared, each thread has its own registers, pipeline and program counter. On every cycle, a hardware scheduler looks at the next candidate instruction from each thread and chooses which to execute. This choice is based on availability of processing resources and a thread prioritisation system. There is an opportunity for a context switch on every instruction cycle, without overhead.
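The per-cycle choice made by the hardware scheduler can be illustrated with a small simulation. This is only a toy model under stated assumptions, not a description of the META scheduler itself: the `Thread` fields, the `run` function, the "larger number means higher priority" convention and the regular stall pattern are all invented for illustration, and the only resource modelled is an outstanding memory access.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int           # assumed convention: larger value = higher priority
    stall_every: int        # toy model: stall after this many issued instructions
    stall_cycles: int       # length of each memory stall, in cycles
    issued: int = 0
    stalled_until: int = 0  # first cycle on which the thread is ready again


def run(threads, cycles):
    """Issue at most one instruction per cycle, taken from the
    highest-priority thread that is not waiting on memory.
    Returns the fraction of cycles on which an instruction was issued."""
    busy = 0
    for cycle in range(cycles):
        ready = [t for t in threads if t.stalled_until <= cycle]
        if not ready:
            continue  # every thread is stalled, so this cycle is wasted
        chosen = max(ready, key=lambda t: t.priority)
        chosen.issued += 1
        busy += 1
        if chosen.issued % chosen.stall_every == 0:
            # The thread now waits for memory; others may use the gap.
            chosen.stalled_until = cycle + 1 + chosen.stall_cycles
    return busy / cycles


if __name__ == "__main__":
    solo = [Thread("real-time", 2, 3, 1)]
    pair = [Thread("real-time", 2, 3, 1), Thread("background", 1, 3, 1)]
    print("one thread :", run(solo, 1000))
    print("two threads:", run(pair, 1000))
```

In this model a single thread leaves its stall cycles idle, while adding a second, lower-priority thread fills those cycles without delaying the first thread, because the switch between threads costs nothing.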
Over fifty internal resources are considered by the scheduler to determine whether a thread can run on the next cycle, but the most significant, from a programmer's viewpoint, is whether a previous memory access has completed. The time for a memory access can be lengthy (depending on memory type) and unpredictable (depending on cache state, refresh cycles etc.). For high performance DSP code, programmers will attempt to control these memory stalls using techniques such as cache pre-fetches and pipelined address pre-issue.

With hardware multi-threading, memory stalls are much less of a problem. When one thread stalls, other threads can execute. Simply dividing a system into a few major subsystems, with one executing on each thread, can be all that is required to eliminate wasted cycles from the system as a whole.

Figure 1 and Figure 2 show the difference between the conventional and the multi-threaded approaches. By way of example, we have a system of four tasks. Two are hard real-time tasks, in the sense that they must be scheduled in response to external events and complete before the next event arrives. The other two are lower priority continuous background activities.

The histogram in Figure 1 shows, as a function of time, the proportion of CPU cycles on which an instruction is issued for a particular CPU and memory combination. This histogram does not reveal all the details of the task scheduling, but we can see that, while the utilisation varies a little as the software scheduler switches tasks, more than 25% of the available CPU cycles are unused.

Figure 1 Single Threaded Operation

In Figure 2 we see what happens when we use hardware multi-threading to run each of the four tasks on its own hardware thread. The tasks on threads 0 and 1 are the repetitive real-time tasks and these have been given higher priority to ensure they meet their deadlines. The tasks on threads 2 and 3 are the background load tasks.
We can see that, even though it has a lower priority, thread 1 makes some progress when thread 0 is running by fitting into the cycles that thread 0 cannot use. More significantly, the total CPU utilisation has increased to nearly 100%, as shown by the lowest graph.

Figure 2 Prioritised Multi-Threading

Super-Threading

Although there is only one set of processing resources shared between all the threads, not every resource is needed by every instruction. In principle, instructions which do not conflict in their resource requirements can execute simultaneously. Often, more than one thread can issue an instruction on each cycle. We call this super-threading. The performance improvement obtained in practice depends on the particular instruction sequences in the program, but throughput improvements from 50% to 100% relative to simple single-instruction multi-threading are not uncommon.

Figure 3 shows the additional improvement obtained in our example system as a result of super-threading. This system is identical to that illustrated in Figure 2, but super-threading has increased instruction issues to about 150% of the clock rate.

Figure 3 Super-Threading

Automatic MIPS Allocation

For simplicity, in the examples shown so far, we have used a fixed priority scheme to control the relative execution rates of the four threads. While this is adequate for a system of regular processing in response to periodic events, real systems are more complex and more sophisticated control is required.

The scheduler includes a patented system called Automatic MIPS Allocation (AMA), which dynamically adjusts thread priorities to give each thread a controlled proportion of the CPU capacity. The control can be applied to both short and long term observation periods, for example to allow a thread to respond to a real-time event with a burst of activity before settling back to its background activity level.
Figure 4 illustrates a simple AMA rate control configuration using the same system as in the previous examples. Here we have limited threads 0 and 1 to an instruction issue rate which is just sufficient for them to meet their real-time deadlines (about 40% each). Thread 2 has been set at 20% and thread 3 has been allowed to free-run and absorb the remaining capacity. In particular, this has enabled us to control the relative resource use of threads 2 and 3 with greater precision than simple fixed priorities allowed.
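The idea of giving each thread a controlled share of the issue slots can be sketched with a simple credit counter. The internal mechanism of AMA is not described here, so the following is a stand-in technique chosen for illustration, not the patented scheme: the `allocate` function, the credit threshold of 100 and the thread names are all assumptions. The sketch also issues only one instruction per clock (no super-threading), so the example rates are chosen to sum to less than 100% and differ from the figures quoted above.

```python
def allocate(rates_percent, free_runner, cycles):
    """Share single-issue cycles between rate-controlled threads.

    rates_percent: dict mapping thread name -> target issue rate,
                   in instructions per 100 clocks (an assumed unit).
    free_runner:   name of the thread that absorbs any unused cycles.
    Returns a dict mapping thread name -> instructions issued.
    """
    credit = {name: 0 for name in rates_percent}
    issued = {name: 0 for name in rates_percent}
    issued[free_runner] = 0

    for _ in range(cycles):
        # Every controlled thread earns credit at its target rate.
        for name, rate in rates_percent.items():
            credit[name] += rate
        # The controlled thread with the most credit issues if it has
        # earned a whole instruction; otherwise the free runner issues.
        best = max(credit, key=credit.get)
        if credit[best] >= 100:
            credit[best] -= 100
            issued[best] += 1
        else:
            issued[free_runner] += 1
    return issued


if __name__ == "__main__":
    shares = allocate({"thread0": 30, "thread1": 30, "thread2": 20},
                      free_runner="thread3", cycles=1000)
    for name, count in shares.items():
        print(name, count)
```

Because unused credit is carried forward, a controlled thread that has been held back for a few cycles issues more often afterwards until it returns to its target rate, while the free-running thread receives whatever is left over.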
Figure 4 AMA Rate Control

Figure 5 shows a more realistic configuration in which we have used a combination of thread prioritisation and AMA to achieve the desired behaviour. Threads 0 and 1 have elevated priorities for timely real-time response. Thread 2 is controlled by AMA for an execution rate of about 40% instructions/clock and thread 3 is a free-running background task.

Because of the interference from the higher priority threads, at times thread 2 cannot achieve the desired run rate and builds up a processing deficit. Whenever possible, for example immediately after the thread 1 activity burst is complete, thread 2's execution rate is increased to make up the deficit. After catching up, the execution rate returns to the set value. Over short periods, the desired execution rate cannot be provided, because total resource demand exceeds what is available. However, over longer periods (the major time divisions) the average rate of 40% instructions/clock is maintained.

Figure 5 AMA & Priority Combination

Benefits

In hardware multi-threading, execution contexts are replicated but processing logic, external interfaces etc. are shared. This means that META systems introduce new points in the set of cost/performance options available to a system designer. The combination of multi-threading and a unified instruction set for both DSP and RISC operations means that we can often replace a multi-processor RISC+DSP system with a single multi-threaded core.

It turns out that the performance benefits of the multi-threaded architecture are easy to realise at the system design level. Often a fairly simplistic partitioning across two or three hardware threads is all that is required to convert a significant proportion of stall cycles into useful processing. If fine-tuning through load balancing is required, then the fact that each thread executes the same instruction set makes it simple to move functions from one thread to another.
A second benefit of hardware multi-threading occurs during system construction and integration. Software authors can develop each thread as a separate subsystem, largely independent of the others. A software tool combines the separate thread images into a single executable system unit at the end of the build process. Each thread can have a completely different software environment. It is even possible to run different operating systems simultaneously on different threads.

Hardware multi-threading allows us to separate real-time tasks having widely differing scheduling requirements into different software scheduling domains. For example, a subsystem based on a complex multi-functional OS could run on one thread at the same time as a real-time data-driven DSP task runs on another. The DSP task might be based on a simple synchronous IO-driven scheduling strategy which would be completely independent of the interrupts and device drivers in the other subsystem. This approach avoids the problems which arise when trying to schedule tasks with wildly differing event rates and activity patterns under a common OS. Such difficulties are another reason why many systems use two or more processors.

Application Examples

The META core has been used in System-on-Chip designs for a range of consumer entertainment products. Perhaps the best known, so far, is the Frontier Silicon CHORUS device which implements a Digital Audio Broadcast (DAB) receiver for the internationally adopted Eureka-147 system. In this design three threads are deployed as follows: (i) COFDM symbol demodulation, (ii) DAB protocol and MPEG Audio decoding, (iii) support and control of a dedicated Viterbi decoder hardware device. The fourth thread looks after the user interface: keyboard, display, volume and tuning controls. Many different variants of the fourth thread software have been developed to give a variety of product and brand identities ranging from Hi-Fi rack tuners to portable battery powered devices. The CHORUS chip has rapidly established itself as the world's leading DAB receiver.

More recent versions of the DAB receiver software have been designed to use only three threads, leaving the fourth available for completely separate applications. For example, both Linux and an MPEG-4 video decoder have been run on the fourth thread at the same time as the DAB radio receiver. Booting Linux on the fourth thread when DAB is being decoded on the other threads causes no interference or break-up on the DAB audio, an excellent example of how software tasks on different threads can operate independently on the META core.

Devices based on META and currently reaching production are applying the same design approaches to bring the cost of digital TV decoders down to a level where they can be integrated into TV sets rather than requiring a separate set-top box.

Now, with the new META 120 core, IMG have a set of standard IP configurations which can be licensed alone or as a component of SoC designs. With the core design and architecture validated by real consumer products in volume manufacture, and the availability of comprehensive software algorithm support, IMG expect that META will form the basis of many future high volume products.
More informationEE361: Digital Computer Organization Course Syllabus
EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)
More informationIA-64 Application Developer s Architecture Guide
IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve
More informationCISC, RISC, and DSP Microprocessors
CISC, RISC, and DSP Microprocessors Douglas L. Jones ECE 497 Spring 2000 4/6/00 CISC, RISC, and DSP D.L. Jones 1 Outline Microprocessors circa 1984 RISC vs. CISC Microprocessors circa 1999 Perspective:
More information18-447 Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013
18-447 Computer Architecture Lecture 3: ISA Tradeoffs Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013 Reminder: Homeworks for Next Two Weeks Homework 0 Due next Wednesday (Jan 23), right
More informationReal-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 26 Real - Time POSIX. (Contd.) Ok Good morning, so let us get
More informationLynxOS RTOS (Real-Time Operating System)
LynxOS RTOS (Real-Time Operating System) Stephen J. Franz CS-550 Section 1 Fall 2005 1 Summary LynxOS is one of two real time operating systems (RTOS) developed and marketed by LynuxWorks of San José,
More informationMP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN. zl2211@columbia.edu. ml3088@columbia.edu
MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN Zheng Lai Zhao Liu Meng Li Quan Yuan zl2215@columbia.edu zl2211@columbia.edu ml3088@columbia.edu qy2123@columbia.edu I. Overview Architecture The purpose
More information1. Computer System Structure and Components
1 Computer System Structure and Components Computer System Layers Various Computer Programs OS System Calls (eg, fork, execv, write, etc) KERNEL/Behavior or CPU Device Drivers Device Controllers Devices
More informationEE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution
More informationManagement Challenge. Managing Hardware Assets. Central Processing Unit. What is a Computer System?
Management Challenge Managing Hardware Assets What computer processing and storage capability does our organization need to handle its information and business transactions? What arrangement of computers
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
More informationTHROUGHPUTER. Parallel Program Development and Execution Platform as a Service
THROUGHPUTER Parallel Program Development and Execution Platform as a Service Many Cloud Computing Challenge - Technical Example: Average demands by applications sharing a 16- processor app1 12.5% Actual
More informationCortex-A9 MPCore Software Development
Cortex-A9 MPCore Software Development Course Description Cortex-A9 MPCore software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop
More informationCounters and Decoders
Physics 3330 Experiment #10 Fall 1999 Purpose Counters and Decoders In this experiment, you will design and construct a 4-bit ripple-through decade counter with a decimal read-out display. Such a counter
More informationInfrastructure Matters: POWER8 vs. Xeon x86
Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report
More informationQsys and IP Core Integration
Qsys and IP Core Integration Prof. David Lariviere Columbia University Spring 2014 Overview What are IP Cores? Altera Design Tools for using and integrating IP Cores Overview of various IP Core Interconnect
More informationS7 for Windows S7-300/400
S7 for Windows S7-300/400 A Programming System for the Siemens S7 300 / 400 PLC s IBHsoftec has an efficient and straight-forward programming system for the Simatic S7-300 and ern controller concept can
More informationWhich ARM Cortex Core Is Right for Your Application: A, R or M?
Which ARM Cortex Core Is Right for Your Application: A, R or M? Introduction The ARM Cortex series of cores encompasses a very wide range of scalable performance options offering designers a great deal
More informationOutline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip
Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC
More informationUnit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit
Unit A451: Computer systems and programming Section 2: Computing Hardware 1/5: Central Processing Unit Section Objectives Candidates should be able to: (a) State the purpose of the CPU (b) Understand the
More informationDesign and Implementation of the Heterogeneous Multikernel Operating System
223 Design and Implementation of the Heterogeneous Multikernel Operating System Yauhen KLIMIANKOU Department of Computer Systems and Networks, Belarusian State University of Informatics and Radioelectronics,
More informationESE566 REPORT3. Design Methodologies for Core-based System-on-Chip HUA TANG OVIDIU CARNU
ESE566 REPORT3 Design Methodologies for Core-based System-on-Chip HUA TANG OVIDIU CARNU Nov 19th, 2002 ABSTRACT: In this report, we discuss several recent published papers on design methodologies of core-based
More informationCS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study
CS 377: Operating Systems Lecture 25 - Linux Case Study Guest Lecturer: Tim Wood Outline Linux History Design Principles System Overview Process Scheduling Memory Management File Systems A review of what
More informationHyperThreading Support in VMware ESX Server 2.1
HyperThreading Support in VMware ESX Server 2.1 Summary VMware ESX Server 2.1 now fully supports Intel s new Hyper-Threading Technology (HT). This paper explains the changes that an administrator can expect
More informationSPARC64 VIIIfx: CPU for the K computer
SPARC64 VIIIfx: CPU for the K computer Toshio Yoshida Mikio Hondo Ryuji Kan Go Sugizaki SPARC64 VIIIfx, which was developed as a processor for the K computer, uses Fujitsu Semiconductor Ltd. s 45-nm CMOS
More informationCS352H: Computer Systems Architecture
CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Data Hazards in ALU Instructions
More informationA Powerful solution for next generation Pcs
Product Brief 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k A Powerful solution for next generation Pcs Looking for
More informationMeasuring Cache and Memory Latency and CPU to Memory Bandwidth
White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary
More informationDEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Question Bank Subject Name: EC6504 - Microprocessor & Microcontroller Year/Sem : II/IV
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Question Bank Subject Name: EC6504 - Microprocessor & Microcontroller Year/Sem : II/IV UNIT I THE 8086 MICROPROCESSOR 1. What is the purpose of segment registers
More informationWeek 1 out-of-class notes, discussions and sample problems
Week 1 out-of-class notes, discussions and sample problems Although we will primarily concentrate on RISC processors as found in some desktop/laptop computers, here we take a look at the varying types
More information