
FLIX: Fast Relief for Performance-Hungry Embedded Applications

Tensilica, Inc.
February

© 2005 Tensilica, Inc.

Contents

FLIX: Fast Relief for Performance-Hungry Embedded Applications
Four Applications in Search of Acceleration
Works on Large Code Blocks Too
Conclusion

Figures

Figure 1: Designer-defined FLIX instructions for the Xtensa LX processor can be either 32 or 64 bits wide and can encode several independent operations in one instruction word.
Figure 2: Bit Manipulator application performance versus gate-count.
Figure 3: H.264 deblocking filter application performance versus gate-count.
Figure 4: MPEG-4 decoder application performance versus gate-count.
Figure 5: SAD (sum of absolute differences) application performance versus gate-count.

Tables

Table 1: Results for Bit Manipulator application.
Table 2: Results for H.264 deblocking filter.
Table 3: Results for MPEG-4 decoder.
Table 4: Results for SAD (sum of absolute differences) engine.


By Steven Leibson and John Massingham, Tensilica, Inc.

Microprocessors are great building blocks for all types of embedded systems because they're so flexible. Compile some code for them and they can decode and play digital audio, route IP network packets, or decompress video (to name just a few applications). If microprocessors were infinitely fast, there'd never be a need to design any other hardware.

However, microprocessors aren't infinitely fast. Often, they're not even fast enough to meet project goals. One bottleneck that keeps general-purpose microprocessors from meeting performance goals is that they execute one operation at a time. Modern RISC processors partially address this problem through pipelining, which allows several instructions to be in different pipeline stages simultaneously. However, most RISC designs remain single-instruction-issue machines.

To get past this bottleneck, processor designers sometimes develop designs that issue and execute multiple independent operations simultaneously. These processors are often called VLIW (very long instruction word) machines because they encode multiple independent operations into one long instruction word. Many classes of programs benefit from the increased instruction parallelism that VLIW designs provide. However, VLIW instruction words must often be hundreds of bits long to encode many simultaneous independent operations. As a result, VLIW programs tend to be large; large code is the usual price of encoding multiple independent operations.

Tensilica has developed a VLIW-like technology called FLIX (flexible-length instruction extensions) for its Xtensa processor core family. FLIX offers developers a way to realize the performance of VLIW instructions without the usual VLIW code bloat.
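To make the code-size argument concrete, here is a minimal Python sketch that counts instruction-memory bits for a hypothetical schedule under two encodings. It assumes a fixed 128-bit VLIW word (a width chosen only for illustration) in which every issue cycle consumes a full word padded with NOPs, and a FLIX-style mix in which a single-operation cycle uses a 24-bit native instruction while a multi-operation cycle uses one 64-bit multi-slot word. The function names, widths, and schedule are illustrative assumptions, not Tensilica's encoding rules.

```python
"""Instruction-memory footprint: fixed-width VLIW vs. mixed-width encoding."""

from typing import Sequence


def fixed_vliw_bits(ops_per_cycle: Sequence[int], word_bits: int = 128) -> int:
    """Every issue cycle occupies a full VLIW word; unused slots hold NOPs.

    word_bits=128 is an illustrative assumption, not a real product width.
    """
    return len(ops_per_cycle) * word_bits


def mixed_width_bits(ops_per_cycle: Sequence[int],
                     narrow_bits: int = 24, wide_bits: int = 64) -> int:
    """Cycles issuing at most one operation use a narrow native instruction;
    cycles issuing several operations use one wide multi-slot word."""
    return sum(narrow_bits if n <= 1 else wide_bits for n in ops_per_cycle)


if __name__ == "__main__":
    # Hypothetical schedule: a short parallel loop body, then serial code.
    schedule = [3, 3, 2, 1, 1, 1, 1, 1]
    print("fixed VLIW :", fixed_vliw_bits(schedule), "bits")
    print("mixed width:", mixed_width_bits(schedule), "bits")
```

For this schedule the fixed-width encoding needs 1024 bits and the mixed-width encoding 312 bits. Most of the gap comes from the serial cycles, where a fixed VLIW word is mostly NOPs. The sketch ignores the Xtensa 16-bit instructions, which would shrink the serial portion further.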
With Tensilica's XPRES Compiler, SOC designers don't have to become processor designers to use this technology: the XPRES Compiler exploits FLIX capabilities when it automatically generates processor configurations. FLIX instruction formats can be either 32 or 64 bits wide and can encode many independent operations in designer-defined operation slots within the FLIX instruction word, as shown in Figure 1. Note that as the number of independent operations encoded in each FLIX instruction increases, the number of bits available in each operation slot decreases, because the total number of bits in the instruction is fixed. With fewer encoding bits, general-purpose instructions become more specialized because fewer bits are available to specify source and destination operands and immediate values. This fact is important to remember when analyzing the test results presented in this white paper.
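The bit-budget arithmetic behind this point can be sketched in a few lines of Python. This is a deliberately simplified model: it splits the instruction word evenly among the slots and treats any bits reserved for identifying the instruction format as a parameter (`format_bits`, defaulting to 0) whose real value depends on the configuration and is assumed here. Real FLIX formats need not divide bits evenly among slots.

```python
"""Even-split estimate of encoding bits per FLIX-style operation slot."""


def bits_per_slot(word_bits: int, slots: int, format_bits: int = 0) -> int:
    """Bits left for each slot when the payload is split evenly.

    format_bits stands for any bits reserved to identify the instruction
    format; its real value depends on the processor configuration and is
    an assumption here (default 0).
    """
    if slots < 1:
        raise ValueError("need at least one slot")
    payload = word_bits - format_bits
    if payload < slots:
        raise ValueError("not enough bits for the requested number of slots")
    return payload // slots


if __name__ == "__main__":
    for word, slots in ((64, 3), (64, 5), (32, 4)):
        print(f"{word}-bit word, {slots} slots: "
              f"about {bits_per_slot(word, slots)} bits per slot")
```

Under this even-split assumption, the three example formats (3 operations in 64 bits, 5 in 64 bits, 4 in 32 bits) leave roughly 21, 12, and 8 bits per slot. For scale, if each register operand takes a 4-bit field (enough to name 16 registers, assumed for illustration), a three-register operation needs 12 bits for operands alone, the entire budget of a 12-bit slot, before any opcode bits are counted.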

[Figure: Designer-Defined FLIX Instruction Formats with Designer-Defined Number of Operations. Panels: 3-operation, 64-bit instruction format; 5-operation, 64-bit instruction format; 4-operation, 32-bit instruction format.]
These three examples show (from top to bottom) a 3-operation, 64-bit FLIX instruction; a 5-operation, 64-bit FLIX instruction; and a 4-operation, 32-bit FLIX instruction. Note that as the number of operations in a FLIX instruction increases, the number of bits available to encode each operation decreases.
Figure 1: Designer-defined FLIX instructions for the Xtensa LX processor can be either 32 or 64 bits wide and can encode several independent operations in one instruction word.

The Xtensa LX C/C++ compiler that is generated along with a FLIX-enhanced Xtensa LX processor core can exploit the operational parallelism provided by FLIX instructions. Thus FLIX instructions can be used selectively to improve application performance where needed, while the processor's native 24- and 16-bit instructions are used in other sections of the code where parallelism isn't needed. This flexibility allows the compiler to generate compact code in sections of the application where the high performance of multiple operations per clock isn't required.

Four Applications in Search of Acceleration

To demonstrate the ability of FLIX instructions to accelerate code, Tensilica used the XPRES Compiler to automatically analyze the C code of four different applications. The XPRES Compiler is an ideal tool for this sort of architectural exploration. In well under an hour, the XPRES Compiler analyzes the instruction flow within a C program and then generates hundreds of thousands or even millions of candidate processor architectures, all based on Tensilica's Xtensa LX processor core.
It then selects the best candidates based on silicon area (cost) and performance criteria and presents the final candidates to the system designer, who selects a final architecture based on project goals.

For each of the four applications considered in this white paper, we generated baseline performance numbers using a single-instruction-issue processor configuration with one load/store unit that executes the full Xtensa LX instruction set. We then allowed the XPRES Compiler to generate eight additional processor configurations with FLIX enhancements (four with one load/store unit and four with two load/store units). In each group, the XPRES Compiler generated versions of the Xtensa LX processor with 2, 3, 4, and 5 operation slots in the FLIX instruction word.

Adding a second load/store unit allows the Xtensa LX processor to emulate XY memory operation, a popular performance-enhancing feature found in many DSP processors. A second load/store unit requires FLIX technology because each load/store unit needs its own operation field.

For these experiments, we restricted the XPRES Compiler to only one of its three optimization methods: the addition of FLIX instructions. The XPRES Compiler can also create new kinds of instructions using operator fusion and SIMD vectorization techniques; these additional optimizations are discussed in another Tensilica white paper, "The XPRES Compiler: triple-threat solution to code performance challenges." For this white paper, however, we constrained the XPRES Compiler to FLIX optimizations only, and no new operations were created. The XPRES Compiler was allowed only to replicate baseline processor instructions in the additional FLIX operation slots, and only where the additional parallelism could increase the application's performance. The results from these experiments show which of the four application programs benefit from an extra load/store unit and which benefit from additional operation slots.

The four test applications are:

- Bit Manipulator, a simple multi-operation algorithm that takes two numbers, masks each, shifts each, and then adds them together in a loop
- An H.264 (video) deblocking filter
- A SAD (sum of absolute differences) algorithm for video motion estimation
- An MPEG-4 video decoder algorithm

Cycle counts for these four algorithms running on the baseline single-operation-per-clock Xtensa LX processor range from a few tens of thousands to hundreds of millions of cycles. Performance improvements from FLIX extensions range from cycle-count reductions of as much as 63% (the code runs nearly three times faster) down to about 6%, which shows that not all code benefits from the availability of multiple simultaneous operations. Some code is stubbornly serial and cannot be accelerated through the operational parallelism of SIMD units or even big VLIW architectures.

It's important to note that the XPRES Compiler allowed this design-space exploration to proceed very quickly: XPRES can examine a block of code and generate multiple processor designs in less than an hour. Even a 6% performance improvement could help many projects meet performance goals. Tripling an algorithm's speed in the time it takes to go out for a meal, however, is a truly remarkable result.
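Because the results here are stated as cycle-count reductions, it helps to see how a reduction r translates into speedup: speedup = baseline cycles / new cycles = 1 / (1 - r), which grows quickly as r approaches 1. A minimal Python sketch of that conversion (the function names are illustrative):

```python
"""Convert between cycle-count reduction and speedup."""


def speedup_from_reduction(reduction: float) -> float:
    """Speedup implied by a fractional cycle-count reduction r, 0 <= r < 1."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_cycles(base_cycles: int, new_cycles: int) -> float:
    """Fractional cycle-count reduction from a baseline run to an enhanced run."""
    if base_cycles <= 0 or new_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    return 1.0 - new_cycles / base_cycles


if __name__ == "__main__":
    for r in (0.63, 0.06):
        print(f"{r:.0%} fewer cycles -> {speedup_from_reduction(r):.2f}x speedup")
```

A 63% reduction is thus about a 2.7x speedup, which is why that result is described as running "nearly three times faster," whereas a 6% reduction corresponds to about 1.06x.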
Tables 1 through 4 (which appear at the end of this white paper) list the raw performance numbers for the four applications. These tables show the performance of the unaugmented Xtensa LX processor core (a very competent 32-bit embedded RISC processor even without application-specific extensions) and of the enhanced 2-, 3-, 4-, and 5-slot Xtensa LX processors created by the XPRES Compiler.
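The selection step described earlier, keeping the candidates that trade silicon area against cycle count efficiently, can be illustrated with a Pareto-front filter. The Python sketch below is an assumption about how such a filter might look, not a description of the XPRES Compiler's internals, and the candidate numbers are invented for illustration (they are not taken from the tables at the end of this paper).

```python
"""Pareto-front filter over (extra gates, cycles) design candidates."""

from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    extra_gates: int   # incremental gate count over the baseline core
    cycles: int        # cycles needed to run the application


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both axes and better on at least one."""
    no_worse = a.extra_gates <= b.extra_gates and a.cycles <= b.cycles
    better = a.extra_gates < b.extra_gates or a.cycles < b.cycles
    return no_worse and better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Candidates not dominated by any other candidate, cheapest first."""
    front = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(front, key=lambda c: c.extra_gates)


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    candidates = [
        Config("baseline", 0, 1000),
        Config("2-slot, 1 load/store unit", 12000, 700),
        Config("3-slot, 1 load/store unit", 20000, 500),
        Config("4-slot, 1 load/store unit", 21000, 500),
        Config("2-slot, 2 load/store units", 35000, 700),
        Config("3-slot, 2 load/store units", 45000, 480),
    ]
    for c in pareto_front(candidates):
        print(f"{c.name:28s} +{c.extra_gates:>6} gates  {c.cycles} cycles")
```

In this invented data, the 4-slot, one-load/store-unit candidate costs more gates than the 3-slot one without saving cycles, and the 2-slot, two-load/store-unit candidate costs more than its one-unit counterpart for the same cycle count, so the filter drops both. The designer then chooses among the remaining points according to project goals.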

[Figure: cycle count versus incremental gate count. A plot for an Xtensa LX base processor and processors with 2-, 3-, 4-, and 5-slot FLIX extensions and one or two load/store units.]
Figure 2: Bit Manipulator application performance versus gate-count.

The Bit Manipulator application results appear in Figure 2 and are perhaps the easiest to understand. The dark line in Figure 2 shows performance results for the baseline Xtensa LX processor and for processors enhanced with 2-, 3-, 4-, and 5-slot FLIX instructions. All of the operation slots in these processors are filled with instances of Xtensa LX baseline instructions, but the processors with FLIX enhancements can execute multiple operations during each clock cycle. All of the processors represented by the dark line in Figure 2 have one load/store unit. The lighter line plots the same results, but all of the processors on that line (except for the baseline processor) have two load/store units. The graph plots the cycle count required to execute the application versus the number of additional gates required to add the multiple execution units and the second load/store unit.

As the graph shows, adding the ability to execute multiple simultaneous operations greatly accelerates the Bit Manipulator application. This application places load, mask, shift, and store operations within a loop, and the Xtensa C/C++ compiler is able to use the additional parallel execution resources profitably to accelerate loop performance. A processor with 3-slot FLIX instructions requires only about 37% of the execution cycles needed by the baseline Xtensa LX processor to execute this application code; it's nearly three times faster for this application.
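The white paper does not list the Bit Manipulator source, so the following Python reference model is a hypothetical reconstruction from its one-line description (load two numbers, mask each, shift each, add, and store, in a loop). The mask values, shift amounts, and 32-bit wraparound are assumptions chosen for illustration. What matters is the dependency structure: within an iteration the mask-and-shift work on the two inputs is independent, and successive iterations are independent of each other, which is the kind of parallelism a multi-slot FLIX processor can exploit.

```python
"""Hypothetical reference model of the Bit Manipulator kernel."""

from typing import List, Sequence

MASK_A = 0x0000FFFF   # hypothetical mask for the first operand
MASK_B = 0xFFFF0000   # hypothetical mask for the second operand
SHIFT_A = 4           # hypothetical left shift for the first operand
SHIFT_B = 8           # hypothetical right shift for the second operand
WORD = 0xFFFFFFFF     # emulate 32-bit unsigned arithmetic


def bit_manipulator(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Mask, shift, and add corresponding elements of a and b."""
    if len(a) != len(b):
        raise ValueError("input sequences must have the same length")
    out: List[int] = []
    for x, y in zip(a, b):                        # two loads per iteration
        xs = ((x & MASK_A) << SHIFT_A) & WORD     # mask and shift x ...
        ys = (y & MASK_B) >> SHIFT_B              # ... independently of y
        out.append((xs + ys) & WORD)              # add, then store the result
    return out


if __name__ == "__main__":
    print([hex(v) for v in bit_manipulator([0x1234], [0x00AB0000])])
```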
The addition of a FLIX instruction format with three operation slots is a general sort of extension and can be usefully employed to accelerate a wide range of application code.

There are three additional factors to note with respect to Figure 2:

1. Essentially all of the benefit from parallel operations is realized in the processor with 3-slot FLIX instructions. More operation slots add more hardware parallelism, but the Xtensa C/C++ compiler is unable to exploit the additional instruction-level parallelism for this particular application program.

2. Processors with a second load/store unit (results shown by the lighter line in Figure 2) are no faster than the same processor configurations with one load/store unit. This result indicates that the Bit Manipulator application is compute-intensive and that the load/store unit is not a bottleneck, so the additional cost (in silicon area) of a second load/store unit is not merited in this case.

3. The 5-slot version of the Xtensa LX processor actually exhibits slightly degraded performance compared to the 3- and 4-slot versions. This result shows that forcing the XPRES Compiler to add more than the required number of operation slots can reduce operation efficiency, because it reduces the number of operation-encoding bits available to each operation slot. Having more encoding bits available per operation allows the XPRES Compiler to create more comprehensive operations, so the C compiler needs fewer of these operations to execute a task.

Works on Large Code Blocks Too

Figure 3 illustrates these same trends for a much larger application: an H.264 deblocking filter. This application program requires nearly 200 million cycles to run on an unaugmented Xtensa LX processor. The XPRES Compiler achieves about a 6% performance improvement (eliminating millions of execution cycles from the application in a few minutes) by adding a FLIX instruction format with two operation slots. Based on the results, this particular application appears to be limited more by data movement than by a lack of computational resources, because additional operation slots do not appear to improve performance. In fact, adding more than two operation slots appears to slightly degrade performance compared to the 2-slot result. However, the addition of a second load/store unit nearly doubles the achieved performance improvement, as shown by the lighter line in Figure 3.
This result differs from the one observed for the Bit Manipulator application, demonstrating that different applications really do benefit from different processor optimizations.
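The contrast between a compute-bound loop like the Bit Manipulator and a data-movement-bound one like the H.264 filter can be illustrated with a toy resource-constrained list scheduler. The Python sketch below is a simplified model, not the Xtensa compiler's algorithm: every operation takes one cycle, each cycle can issue up to `slots` ready operations, and at most `lsus` of them may be loads or stores. The two loop bodies are hypothetical rather than the paper's applications, and the model ignores the encoding-bit pressure that made the 5-slot configurations slightly slower.

```python
"""Toy resource-constrained list scheduler (unit latency)."""

from typing import Dict, List, Sequence, Tuple

# An operation is (name, kind, names of the operations it depends on);
# kind is "mem" for loads and stores and "alu" for everything else.
Op = Tuple[str, str, Tuple[str, ...]]


def schedule_length(ops: Sequence[Op], slots: int, lsus: int) -> int:
    """Cycles used by a greedy in-order list scheduler.

    Each cycle issues at most `slots` ready operations, at most `lsus` of
    which may be memory operations; results are available next cycle.
    """
    if slots < 1 or lsus < 1:
        raise ValueError("need at least one slot and one load/store unit")
    names = {name for name, _, _ in ops}
    for name, _, deps in ops:
        unknown = [d for d in deps if d not in names]
        if unknown:
            raise ValueError(f"{name} depends on unknown operations {unknown}")
    issued_in: Dict[str, int] = {}        # op name -> cycle it was issued in
    pending: List[Op] = list(ops)
    cycle = 0
    while pending:
        this_cycle: List[Op] = []
        mem_used = 0
        for op in pending:
            if len(this_cycle) == slots:
                break                     # every operation slot is full
            _, kind, deps = op
            if not all(d in issued_in for d in deps):
                continue                  # an input is not ready yet
            if kind == "mem":
                if mem_used == lsus:
                    continue              # every load/store unit is busy
                mem_used += 1
            this_cycle.append(op)
        if not this_cycle:
            raise ValueError("dependency cycle: nothing can issue")
        for op in this_cycle:
            issued_in[op[0]] = cycle
            pending.remove(op)
        cycle += 1
    return cycle


# Hypothetical loop bodies, not taken from the paper's applications.
COMPUTE_HEAVY: List[Op] = [
    ("ld", "mem", ()),
    ("o1", "alu", ("ld",)), ("o2", "alu", ("ld",)),
    ("o3", "alu", ("ld",)), ("o4", "alu", ("ld",)),
    ("s1", "alu", ("o1", "o2")), ("s2", "alu", ("o3", "o4")),
    ("s", "alu", ("s1", "s2")),
    ("st", "mem", ("s",)),
]
MEMORY_HEAVY: List[Op] = [
    ("ld1", "mem", ()), ("ld2", "mem", ()),
    ("ld3", "mem", ()), ("ld4", "mem", ()),
    ("a1", "alu", ("ld1", "ld2")), ("a2", "alu", ("ld3", "ld4")),
    ("a", "alu", ("a1", "a2")),
    ("st", "mem", ("a",)),
]

if __name__ == "__main__":
    for label, body in (("compute-heavy", COMPUTE_HEAVY),
                        ("memory-heavy", MEMORY_HEAVY)):
        for lsus in (1, 2):
            lengths = [schedule_length(body, s, lsus) for s in range(1, 6)]
            print(f"{label}, {lsus} load/store unit(s), 1-5 slots: {lengths}")
```

For these toy bodies, the compute-heavy loop improves from 9 cycles to 5 as slots are added, stops improving at four slots, and gains nothing from a second load/store unit. With one load/store unit the memory-heavy loop never gets below 7 cycles however many slots are added, but with two units it drops to 5 from two slots up.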

[Figure: cycle count versus incremental gate count. A plot for an Xtensa LX base processor and processors with 2-, 3-, 4-, and 5-slot FLIX extensions and one or two load/store units.]
Figure 3: H.264 deblocking filter application performance versus gate-count.

Another video application, an MPEG-4 video decoder, exhibits the same sort of results as the H.264 deblocking filter. Results for this application appear in Figure 4. The pattern of results for the processors with one load/store unit is very similar to that obtained for the H.264 deblocking filter, but the MPEG-4 video decoder code benefits more from the XPRES Compiler's optimizations: adding a second operation slot to the Xtensa LX processor reduces the cycle count by 23%. In addition, a second load/store unit further increases performance, and the processor with two load/store units can gainfully exploit three operation slots for yet more performance.

Again, it's important to remember that all of this design-space exploration consumes only a few minutes because the XPRES Compiler automates it. Normally, a design team would not be able to conduct this sort of extensive architectural research, because hand-designed processor variants require months of design time, not minutes.

[Figure: cycle count versus incremental gate count. A plot for an Xtensa LX base processor and processors with 2-, 3-, 4-, and 5-slot FLIX extensions and one or two load/store units.]
Figure 4: MPEG-4 decoder application performance versus gate-count.

The fourth application, also from the video domain, is a SAD (sum of absolute differences) engine used for video motion estimation. Results for this application (shown in Figure 5) are very similar to those for the Bit Manipulator application, although the SAD application consumes about 20 times as many cycles. Adding 3-slot FLIX instructions cuts the cycle count by about 63%, and adding a second load/store unit provides no benefit to the SAD engine code.
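For readers unfamiliar with the kernel, the following Python sketch shows what a SAD engine computes: the sum of absolute pixel differences between a current block and a candidate block, evaluated over a search window to find the best-matching motion vector. It is a plain full-search reference model under assumed parameters (block size taken from the input, a hypothetical square search radius), not the code Tensilica measured. Each absolute difference is independent of the others, which is why this kernel maps well onto multi-slot instructions.

```python
"""Full-search SAD motion estimation (reference model)."""

from typing import List, Optional, Sequence, Tuple

Block = Sequence[Sequence[int]]   # rows of pixel values, e.g. 8-bit luma


def sad(cur: Block, ref: Block) -> int:
    """Sum of absolute differences between two blocks of equal size."""
    if len(cur) != len(ref) or any(len(a) != len(b) for a, b in zip(cur, ref)):
        raise ValueError("blocks must have the same dimensions")
    return sum(abs(c - r) for crow, rrow in zip(cur, ref)
               for c, r in zip(crow, rrow))


def full_search(cur: Block, frame: Block, top: int, left: int,
                radius: int) -> Tuple[int, int, int]:
    """Exhaustive motion search around (top, left).

    Returns (best_sad, dy, dx) for the candidate block in `frame` that best
    matches `cur`, considering offsets within +/- radius that stay inside
    the frame. On ties, the first candidate found is kept.
    """
    h, w = len(cur), len(cur[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(frame) or x + w > len(frame[0]):
                continue                  # candidate falls off the frame
            cand: List[Sequence[int]] = [row[x:x + w] for row in frame[y:y + h]]
            score = sad(cur, cand)
            if best is None or score < best[0]:
                best = (score, dy, dx)
    if best is None:
        raise ValueError("no candidate block fits inside the frame")
    return best
```

Video codecs typically work on 16x16 or 8x8 blocks and often use faster search strategies than this exhaustive loop, which is here only to make the arithmetic explicit.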

[Figure: cycle count versus incremental gate count. A plot for an Xtensa LX base processor and processors with 2-, 3-, 4-, and 5-slot FLIX extensions and one or two load/store units.]
Figure 5: SAD (sum of absolute differences) application performance versus gate-count.

Conclusion

Use of the XPRES Compiler in this white paper was artificially constrained, but the results are real. They demonstrate both the ability of FLIX multi-operation instructions to accelerate various applications and the ability of the XPRES Compiler to rapidly explore processor configurations and extensions that accelerate the execution of critical code blocks in a system. Each XPRES Compiler run for the applications discussed in this white paper took far less than an hour and produced performance enhancements ranging from about 6% to a tripling of code execution speed. The automated XPRES Compiler lets the developer discover how much performance benefit FLIX instruction extensions can provide to an application in the time it takes to eat lunch. Some applications benefit only mildly from FLIX-type processor enhancement; others benefit substantially.

Results from experiments that Tensilica customers have conducted with the XPRES Compiler have prompted some design teams to restructure their application code. This restructuring, which takes only a day or two, has substantially boosted application performance in some cases.

Table 1: Results for the Bit Manipulator application

Instruction slots | One load/store unit: cycles | One load/store unit: gates | Two load/store units: cycles | Two load/store units: gates
1 slot | 3,746 | | NA | NA
2-slot FLIX | 5,384 | 2,263 | 5,384 | 35,284
3-slot FLIX | ,29 | 24,662 | ,287 | 48,46
4-slot FLIX | ,29 | 24,73 | ,287 | 49,43
5-slot FLIX | ,287 | 26,366 | 2,34 | 49,946

Table 2: Results for the H.264 deblocking filter

Instruction slots | One load/store unit: cycles | One load/store unit: gates | Two load/store units: cycles | Two load/store units: gates
1 slot | 93,788,7 | | NA | NA
2-slot FLIX | 8,4,93 | 2, | 74,638,399 | 36,92
3-slot FLIX | 82,552,788 | 2,55 | 74,9,533 | 49,26
4-slot FLIX | 82,679,72 | 2,684 | 77,978,299 | 48,75
5-slot FLIX | 82,988,82 | 2,765 | 79,85,489 | 48,837

Table 3: Results for the MPEG-4 decoder

Instruction slots | One load/store unit: cycles | One load/store unit: gates | Two load/store units: cycles | Two load/store units: gates
1 slot | 22,8,59 | | NA | NA
2-slot FLIX | 93,684,776 | 4,74 | 9,69,824 | 38,833
3-slot FLIX | ,669,84 | 27,992 | 85,259,477 | 5,56
4-slot FLIX | ,7,48 | 28,9 | 9,647,42 | 5,549
5-slot FLIX | ,36,58 | 28,29 | 87,847,53 | 52,439

Table 4: Results for the SAD (sum of absolute differences) engine

Instruction slots | One load/store unit: cycles | One load/store unit: gates | Two load/store units: cycles | Two load/store units: gates
1 slot | 729,322 | | NA | NA
2-slot FLIX | 369,37 | 2,2 | 369,37 | 35,23
3-slot FLIX | 287,943 | 22, | ,943 | 45,76
4-slot FLIX | 27,43 | 22,747 | 27,43 | 45,862
5-slot FLIX | 287,993 | 24,76 | 287,993 | 47,89


Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

An Overview of Stack Architecture and the PSC 1000 Microprocessor

An Overview of Stack Architecture and the PSC 1000 Microprocessor An Overview of Stack Architecture and the PSC 1000 Microprocessor Introduction A stack is an important data handling structure used in computing. Specifically, a stack is a dynamic set of elements in which

More information

Week 1 out-of-class notes, discussions and sample problems

Week 1 out-of-class notes, discussions and sample problems Week 1 out-of-class notes, discussions and sample problems Although we will primarily concentrate on RISC processors as found in some desktop/laptop computers, here we take a look at the varying types

More information

Instruction Set Architecture (ISA) Design. Classification Categories

Instruction Set Architecture (ISA) Design. Classification Categories Instruction Set Architecture (ISA) Design Overview» Classify Instruction set architectures» Look at how applications use ISAs» Examine a modern RISC ISA (DLX)» Measurement of ISA usage in real computers

More information

Increasing performance and lowering the cost of storage for VDI With Virsto, Citrix, and Microsoft

Increasing performance and lowering the cost of storage for VDI With Virsto, Citrix, and Microsoft Increasing performance and lowering the cost of storage for VDI With Virsto, Citrix, and Microsoft 2010 Virsto www.virsto.com Virsto: Improving VDI with Citrix and Microsoft Virsto Software, developer

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

Flexible VDSL2 datapath IP for SOC designs provides ready access to the VDSL2 chip market

Flexible VDSL2 datapath IP for SOC designs provides ready access to the VDSL2 chip market An UpZide White Paper Flexible datapath IP for SOC designs provides ready access to the chip market The fundamentals of the ITU-T G.993.2 recommendation was accepted by the ITU in mid 2005 and the rush

More information

Introduction to Digital System Design

Introduction to Digital System Design Introduction to Digital System Design Chapter 1 1 Outline 1. Why Digital? 2. Device Technologies 3. System Representation 4. Abstraction 5. Development Tasks 6. Development Flow Chapter 1 2 1. Why Digital

More information

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit Unit A451: Computer systems and programming Section 2: Computing Hardware 1/5: Central Processing Unit Section Objectives Candidates should be able to: (a) State the purpose of the CPU (b) Understand the

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

PROBLEMS. which was discussed in Section 1.6.3.

PROBLEMS. which was discussed in Section 1.6.3. 22 CHAPTER 1 BASIC STRUCTURE OF COMPUTERS (Corrisponde al cap. 1 - Introduzione al calcolatore) PROBLEMS 1.1 List the steps needed to execute the machine instruction LOCA,R0 in terms of transfers between

More information

The Evolution of CCD Clock Sequencers at MIT: Looking to the Future through History

The Evolution of CCD Clock Sequencers at MIT: Looking to the Future through History The Evolution of CCD Clock Sequencers at MIT: Looking to the Future through History John P. Doty, Noqsi Aerospace, Ltd. This work is Copyright 2007 Noqsi Aerospace, Ltd. This work is licensed under the

More information

Computer Architecture TDTS10

Computer Architecture TDTS10 why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers

More information

Router Architectures

Router Architectures Router Architectures An overview of router architectures. Introduction What is a Packet Switch? Basic Architectural Components Some Example Packet Switches The Evolution of IP Routers 2 1 Router Components

More information

Computer Organization and Components

Computer Organization and Components Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides

More information

Pexip Speeds Videoconferencing with Intel Parallel Studio XE

Pexip Speeds Videoconferencing with Intel Parallel Studio XE 1 Pexip Speeds Videoconferencing with Intel Parallel Studio XE by Stephen Blair-Chappell, Technical Consulting Engineer, Intel Over the last 18 months, Pexip s software engineers have been optimizing Pexip

More information

5Get rid of hackers and viruses for

5Get rid of hackers and viruses for Reprint from TechWorld /2007 TEChWoRLd ISSuE 2007 ThEBIG: 5 FIREWaLLS TEChWoRLd ISSuE 2007 ThEBIG: 5 FIREWaLLS TEChWoRLd ISSuE 2007 ThEBIG: 5 FIREWaLLS # # # Load balancing is basically a simple task where

More information

ELEC 5260/6260/6266 Embedded Computing Systems

ELEC 5260/6260/6266 Embedded Computing Systems ELEC 5260/6260/6266 Embedded Computing Systems Spring 2016 Victor P. Nelson Text: Computers as Components, 3 rd Edition Prof. Marilyn Wolf (Georgia Tech) Course Topics Embedded system design & modeling

More information

A PPENDIX H RITERIA FOR AES E VALUATION C RITERIA FOR

A PPENDIX H RITERIA FOR AES E VALUATION C RITERIA FOR A PPENDIX H RITERIA FOR AES E VALUATION C RITERIA FOR William Stallings Copyright 20010 H.1 THE ORIGINS OF AES...2 H.2 AES EVALUATION...3 Supplement to Cryptography and Network Security, Fifth Edition

More information

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano cristina.silvano@polimi.it Prof. Silvano, Politecnico di Milano

More information

Processor Architectures

Processor Architectures ECPE 170 Jeff Shafer University of the Pacific Processor Architectures 2 Schedule Exam 3 Tuesday, December 6 th Caches Virtual Memory Input / Output OperaKng Systems Compilers & Assemblers Processor Architecture

More information

Management Challenge. Managing Hardware Assets. Central Processing Unit. What is a Computer System?

Management Challenge. Managing Hardware Assets. Central Processing Unit. What is a Computer System? Management Challenge Managing Hardware Assets What computer processing and storage capability does our organization need to handle its information and business transactions? What arrangement of computers

More information

Influence of Load Balancing on Quality of Real Time Data Transmission*

Influence of Load Balancing on Quality of Real Time Data Transmission* SERBIAN JOURNAL OF ELECTRICAL ENGINEERING Vol. 6, No. 3, December 2009, 515-524 UDK: 004.738.2 Influence of Load Balancing on Quality of Real Time Data Transmission* Nataša Maksić 1,a, Petar Knežević 2,

More information

How To Build A Cloud Computer

How To Build A Cloud Computer Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,

More information

CS2101a Foundations of Programming for High Performance Computing

CS2101a Foundations of Programming for High Performance Computing CS2101a Foundations of Programming for High Performance Computing Marc Moreno Maza & Ning Xie University of Western Ontario, London, Ontario (Canada) CS2101 Plan 1 Course Overview 2 Hardware Acceleration

More information

PROBLEMS #20,R0,R1 #$3A,R2,R4

PROBLEMS #20,R0,R1 #$3A,R2,R4 506 CHAPTER 8 PIPELINING (Corrisponde al cap. 11 - Introduzione al pipelining) PROBLEMS 8.1 Consider the following sequence of instructions Mul And #20,R0,R1 #3,R2,R3 #$3A,R2,R4 R0,R2,R5 In all instructions,

More information

The Importance of Software License Server Monitoring

The Importance of Software License Server Monitoring The Importance of Software License Server Monitoring NetworkComputer How Shorter Running Jobs Can Help In Optimizing Your Resource Utilization White Paper Introduction Semiconductor companies typically

More information

Introduction to Embedded Systems. Software Update Problem

Introduction to Embedded Systems. Software Update Problem Introduction to Embedded Systems CS/ECE 6780/5780 Al Davis logistics minor Today s topics: more software development issues 1 CS 5780 Software Update Problem Lab machines work let us know if they don t

More information

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS Volume 2, No. 3, March 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE

More information

Hardware/Software Co-Design of a Java Virtual Machine

Hardware/Software Co-Design of a Java Virtual Machine Hardware/Software Co-Design of a Java Virtual Machine Kenneth B. Kent University of Victoria Dept. of Computer Science Victoria, British Columbia, Canada ken@csc.uvic.ca Micaela Serra University of Victoria

More information

Instruction Set Architecture

Instruction Set Architecture Instruction Set Architecture Consider x := y+z. (x, y, z are memory variables) 1-address instructions 2-address instructions LOAD y (r :=y) ADD y,z (y := y+z) ADD z (r:=r+z) MOVE x,y (x := y) STORE x (x:=r)

More information

Enhancing SQL Server Performance

Enhancing SQL Server Performance Enhancing SQL Server Performance Bradley Ball, Jason Strate and Roger Wolter In the ever-evolving data world, improving database performance is a constant challenge for administrators. End user satisfaction

More information

Understanding Video Latency What is video latency and why do we care about it?

Understanding Video Latency What is video latency and why do we care about it? By Pete Eberlein, Sensoray Company, Inc. Understanding Video Latency What is video latency and why do we care about it? When choosing components for a video system, it is important to understand how the

More information

Amazon EC2 XenApp Scalability Analysis

Amazon EC2 XenApp Scalability Analysis WHITE PAPER Citrix XenApp Amazon EC2 XenApp Scalability Analysis www.citrix.com Table of Contents Introduction...3 Results Summary...3 Detailed Results...4 Methods of Determining Results...4 Amazon EC2

More information

Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis Martínez, Gerardo Fernández-Escribano, José M. Claver and José Luis Sánchez

Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis Martínez, Gerardo Fernández-Escribano, José M. Claver and José Luis Sánchez Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis artínez, Gerardo Fernández-Escribano, José. Claver and José Luis Sánchez 1. Introduction 2. Technical Background 3. Proposed DVC to H.264/AVC

More information

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah (DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de jens_onno.krah@fh-koeln.de NIOS II 1 1 What is Nios II? Altera s Second Generation

More information

Table of Contents. Cisco How Does Load Balancing Work?

Table of Contents. Cisco How Does Load Balancing Work? Table of Contents How Does Load Balancing Work?...1 Document ID: 5212...1 Introduction...1 Prerequisites...1 Requirements...1 Components Used...1 Conventions...1 Load Balancing...1 Per Destination and

More information

TriMedia CPU64 Application Development Environment

TriMedia CPU64 Application Development Environment Published at ICCD 1999, International Conference on Computer Design, October 10-13, 1999, Austin Texas, pp. 593-598. TriMedia CPU64 Application Development Environment E.J.D. Pol, B.J.M. Aarts, J.T.J.

More information

Uptime Infrastructure Monitor. Installation Guide

Uptime Infrastructure Monitor. Installation Guide Uptime Infrastructure Monitor Installation Guide This guide will walk through each step of installation for Uptime Infrastructure Monitor software on a Windows server. Uptime Infrastructure Monitor is

More information

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:

Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern: Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove

More information

The new 32-bit MSP432 MCU platform from Texas

The new 32-bit MSP432 MCU platform from Texas Technology Trend MSP432 TM microcontrollers: Bringing high performance to low-power applications The new 32-bit MSP432 MCU platform from Texas Instruments leverages its more than 20 years of lowpower leadership

More information

CPU Organization and Assembly Language

CPU Organization and Assembly Language COS 140 Foundations of Computer Science School of Computing and Information Science University of Maine October 2, 2015 Outline 1 2 3 4 5 6 7 8 Homework and announcements Reading: Chapter 12 Homework:

More information

Systolic Computing. Fundamentals

Systolic Computing. Fundamentals Systolic Computing Fundamentals Motivations for Systolic Processing PARALLEL ALGORITHMS WHICH MODEL OF COMPUTATION IS THE BETTER TO USE? HOW MUCH TIME WE EXPECT TO SAVE USING A PARALLEL ALGORITHM? HOW

More information

IEC 61131-3. The Fast Guide to Open Control Software

IEC 61131-3. The Fast Guide to Open Control Software IEC 61131-3 The Fast Guide to Open Control Software 1 IEC 61131-3 The Fast Guide to Open Control Software Introduction IEC 61131-3 is the first vendor-independent standardized programming language for

More information

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language Chapter 4 Register Transfer and Microoperations Section 4.1 Register Transfer Language Digital systems are composed of modules that are constructed from digital components, such as registers, decoders,

More information

In the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days

In the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days RISC vs CISC 66 In the Beginning... 1964 -- The first ISA appears on the IBM System 360 In the good old days Initially, the focus was on usability by humans. Lots of user-friendly instructions (remember

More information

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance. Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance

More information

Eight Ways to Increase GPIB System Performance

Eight Ways to Increase GPIB System Performance Application Note 133 Eight Ways to Increase GPIB System Performance Amar Patel Introduction When building an automated measurement system, you can never have too much performance. Increasing performance

More information

CPU Organisation and Operation

CPU Organisation and Operation CPU Organisation and Operation The Fetch-Execute Cycle The operation of the CPU 1 is usually described in terms of the Fetch-Execute cycle. 2 Fetch-Execute Cycle Fetch the Instruction Increment the Program

More information

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution

Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution Jonathan Halstuch, COO, RackTop Systems JHalstuch@racktopsystems.com Big Data Invasion We hear so much on Big Data and

More information

FPGA. AT6000 FPGAs. Application Note AT6000 FPGAs. 3x3 Convolver with Run-Time Reconfigurable Vector Multiplier in Atmel AT6000 FPGAs.

FPGA. AT6000 FPGAs. Application Note AT6000 FPGAs. 3x3 Convolver with Run-Time Reconfigurable Vector Multiplier in Atmel AT6000 FPGAs. 3x3 Convolver with Run-Time Reconfigurable Vector Multiplier in Atmel AT6000 s Introduction Convolution is one of the basic and most common operations in both analog and digital domain signal processing.

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

The Design of the Inferno Virtual Machine. Introduction

The Design of the Inferno Virtual Machine. Introduction The Design of the Inferno Virtual Machine Phil Winterbottom Rob Pike Bell Labs, Lucent Technologies {philw, rob}@plan9.bell-labs.com http://www.lucent.com/inferno Introduction Virtual Machine are topical

More information