TMS320C6000 Programmer's Guide


Literature Number: SPRU198K
Revised: July 2011


Preface

Read This First

About This Manual

This manual is a reference for programming TMS320C6000 digital signal processor (DSP) devices. Before you use this book, you should install your code generation and debugging tools.

This book is organized in five major parts:

Part I: Introduction includes a brief description of the C6000 architecture and code development flow. It also includes a tutorial that introduces you to the tools you will use in each phase of development and an optimization checklist to help you achieve optimal performance from your code.

Part II: C Code includes C code examples and discusses optimization methods for the code. This information can help you choose the most appropriate optimization techniques for your code.

Part III: Assembly Code describes the structure of assembly code. It provides examples and discusses optimizations for assembly code. It also includes a chapter on interrupt subroutines.

Part IV: C64x Programming Techniques describes programming considerations for the C64x.

Related Documentation From Texas Instruments

The following books describe the TMS320C6000 devices and related support tools. To obtain a copy of any of these TI documents, call the Texas Instruments Literature Response Center at (800) When ordering, please identify the book by its title and literature number.

TMS320C6000 Assembly Language Tools User's Guide (literature number SPRU186) describes the assembly language tools (assembler, linker, and other tools used to develop assembly language code), assembler directives, macros, common object file format, and symbolic debugging directives for the C6000 generation of devices.

TMS320C6000 Optimizing C Compiler User's Guide (literature number SPRU187) describes the C6000 C compiler and the assembly optimizer. This C compiler accepts ANSI standard C source code and produces assembly language source code for the C6000 generation of devices. The assembly optimizer helps you optimize your assembly code.

TMS320C6000 CPU and Instruction Set Reference Guide (literature number SPRU189) describes the C6000 CPU architecture, instruction set, pipeline, and interrupts for these digital signal processors.

TMS320C6000 Peripherals Reference Guide (literature number SPRU190) describes common peripherals available on the TMS320C6201/6701 digital signal processors. This book includes information on the internal data and program memories, the external memory interface (EMIF), the host port interface (HPI), multichannel buffered serial ports (McBSPs), direct memory access (DMA), enhanced DMA (EDMA), expansion bus, clocking and phase-locked loop (PLL), and the power-down modes.

TMS320C64x Technical Overview (literature number SPRU395) gives an introduction to the C64x digital signal processor and discusses the application areas that are enhanced by the C64x VelociTI.

Trademarks

Solaris and SunOS are trademarks of Sun Microsystems, Inc.

VelociTI is a trademark of Texas Instruments Incorporated.

Windows and Windows NT are trademarks of Microsoft Corporation.

The Texas Instruments logo and Texas Instruments are registered trademarks of Texas Instruments Incorporated. Trademarks of Texas Instruments include: TI, XDS, Code Composer, Code Composer Studio, TMS320, TMS320C6000 and 320 Hotline On-line.

All other brand or product names are trademarks or registered trademarks of their respective companies or organizations.


Contents

1 Introduction
Introduces some features of the C6000 microprocessor and discusses the basic process for creating code and understanding feedback.
    TMS320C6000 Architecture
    TMS320C6000 Pipeline
    Code Development Flow to Increase Performance

2 Optimizing C/C++ Code
Explains how to maximize C performance by using compiler options, intrinsics, and code transformations.
    Writing C/C++ Code
    Tips on Data Types
    Analyzing C Code Performance
    Compiling C/C++ Code
    Compiler Options
    Memory Dependencies
    Performing Program-Level Optimization (pst Option)
    Profiling Your Code
    Using the Standalone Simulator (load6x) to Profile
    Refining C/C++ Code
    Using Intrinsics
    Wider Memory Access for Smaller Data Widths
    Software Pipelining

3 Compiler Optimization Tutorial
Uses example code to walk you through the code development flow for the TMS320C
    Introduction: Simple C Tuning
    Project Familiarization
    Getting Ready for Lesson
    Lesson 1: Loop Carry Path From Memory Pointers
    Lesson 2: Balancing Resources With Dual-Data Paths
    Lesson 3: Packed Data Optimization of Memory Bandwidth
    Lesson 4: Program Level Optimization
    Lesson 5: Writing Linear Assembly

4 Feedback Solutions
Provides a quick reference to techniques to optimize loops.
    Understanding Feedback
    Stage 1: Qualify the Loop for Software Pipelining
    Stage 2: Collect Loop Resource and Dependency Graph Information
    Stage 3: Software Pipeline the Loop
    Loop Disqualification Messages
    Bad Loop Structure
    Loop Contains a Call
    Too Many Instructions
    Software Pipelining Disabled
    Uninitialized Trip Counter
    Suppressed to Prevent Code Expansion
    Loop Carried Dependency Bound Too Large
    Cannot Identify Trip Counter
    Pipeline Failure Messages
    Address Increment Too Large
    Cannot Allocate Machine Registers
    Cycle Count Too High. Not Profitable
    Did Not Find Schedule
    Iterations in Parallel > Max. Trip Count
    Speculative Threshold Exceeded
    Iterations in Parallel > Min. Trip Count
    Register is Live Too Long
    Too Many Predicates Live on One Side
    Too Many Reads of One Register
    Trip var. Used in Loop Can't Adjust Trip Count
    Investigative Feedback
    Loop Carried Dependency Bound is Much Larger Than Unpartitioned Resource Bound
    Two Loops are Generated, One Not Software Pipelined
    Uneven Resources
    Larger Outer Loop Overhead in Nested Loop
    There are Memory Bank Conflicts
    T Address Paths Are Resource Bound

5 Optimizing Assembly Code via Linear Assembly
Describes methods that help you develop more efficient assembly language programs.
    Linear Assembly Code
    Assembly Optimizer Options and Directives
    The on Option
    The mt Option and the .no_mdep Directive
    The .mdep Directive
    The .mptr Directive
    The .trip Directive
    Writing Parallel Code
    Dot Product C Code
    Translating C Code to Linear Assembly
    Linear Assembly Resource Allocation
    Drawing a Dependency Graph
    Nonparallel Versus Parallel Assembly Code
    Comparing Performance
    Using Word Access for Short Data and Doubleword Access for Floating-Point Data
    Unrolled Dot Product C Code
    Translating C Code to Linear Assembly
    Drawing a Dependency Graph
    Linear Assembly Resource Allocation
    Final Assembly
    Comparing Performance
    Software Pipelining
    Modulo Iteration Interval Scheduling
    Using the Assembly Optimizer to Create Optimized Loops
    Final Assembly
    Comparing Performance
    Modulo Scheduling of Multicycle Loops
    Weighted Vector Sum C Code
    Translating C Code to Linear Assembly
    Determining the Minimum Iteration Interval
    Drawing a Dependency Graph
    Linear Assembly Resource Allocation
    Modulo Iteration Interval Scheduling
    Using the Assembly Optimizer for the Weighted Vector Sum
    Final Assembly
    Loop Carry Paths
    IIR Filter C Code
    Translating C Code to Linear Assembly (Inner Loop)
    Drawing a Dependency Graph
    Determining the Minimum Iteration Interval
    Linear Assembly Resource Allocation
    Modulo Iteration Interval Scheduling
    Using the Assembly Optimizer for the IIR Filter
    Final Assembly
    If-Then-Else Statements in a Loop
    If-Then-Else C Code
    Translating C Code to Linear Assembly
    Drawing a Dependency Graph
    Determining the Minimum Iteration Interval
    Linear Assembly Resource Allocation
    Final Assembly
    Comparing Performance
    Loop Unrolling
    Unrolled If-Then-Else C Code
    Translating C Code to Linear Assembly
    Drawing a Dependency Graph
    Determining the Minimum Iteration Interval
    Linear Assembly Resource Allocation
    Final Assembly
    Comparing Performance
    Live-Too-Long Issues
    C Code With Live-Too-Long Problem
    Translating C Code to Linear Assembly
    Drawing a Dependency Graph
    Determining the Minimum Iteration Interval
    Linear Assembly Resource Allocation
    Final Assembly With Move Instructions
    Redundant Load Elimination
    FIR Filter C Code
    Translating C Code to Linear Assembly
    Drawing a Dependency Graph
    Determining the Minimum Iteration Interval
    Linear Assembly Resource Allocation
    Final Assembly
    Memory Banks
    FIR Filter Inner Loop
    Unrolled FIR Filter C Code
    Translating C Code to Linear Assembly
    Drawing a Dependency Graph
    Linear Assembly for Unrolled FIR Inner Loop With .mptr Directive
    Linear Assembly Resource Allocation
    Determining the Minimum Iteration Interval
    Final Assembly
    Comparing Performance
    Software Pipelining the Outer Loop
    Unrolled FIR Filter C Code
    Making the Outer Loop Parallel With the Inner Loop Epilog and Prolog
    Final Assembly
    Comparing Performance
    Outer Loop Conditionally Executed With Inner Loop
    Unrolled FIR Filter C Code
    Translating C Code to Linear Assembly (Inner Loop)
    Translating C Code to Linear Assembly (Outer Loop)
    Unrolled FIR Filter C Code
    Translating C Code to Linear Assembly (Inner Loop)
    Translating C Code to Linear Assembly (Inner Loop and Outer Loop)
    Determining the Minimum Iteration Interval
    Final Assembly
    Comparing Performance

6 C64x Programming Considerations
Describes programming considerations for the C64x.
    Overview of C64x Architectural Enhancements
    Improved Scheduling Flexibility
    Greater Memory Bandwidth
    Support for Packed Data Types
    Non-Aligned Memory Accesses
    Additional Specialized Instructions
    Accessing Packed-Data Processing on the C64x
    Packed Data Types
    Storing Multiple Elements in a Single Register
    Packing and Unpacking Data
    Optimizing for Packed Data Processing
    Vectorizing With Packed Data Processing
    Combining Multiple Operations in a Single Instruction
    Non-Aligned Memory Accesses
    Performing Conditional Operations With Packed Data
    Linear Assembly Considerations
    Using BDEC and BPOS in Linear Assembly
    Avoiding Cross Path Stalls

7 C64x+ Programming Considerations
Describes programming considerations for the C64x
    Overview of C64x+ Architectural Enhancements
    Improved Scheduling Flexibility
    Additional Specialized Instructions
    Software Pipelined Loop (SPLOOP) Buffer
    Exceptions
    Compact Instruction Set
    Utilizing Additional Instructions
    Improving Multiply Throughput
    Combining Addition and Subtraction Instructions
    Improved Packed Data Instructions
    Software Pipelined Loop (SPLOOP) Buffer
    Introduction to SPLOOP Buffer
    Terminology
    Hardware Support for SPLOOP
    The Compiler Supports SPLOOP
    Single Loop With SPLOOP(D)
    Nested Loop With SPLOOP(D)
    Do While Loops Using SPLOOPW
    Considerations for Interruptible SPLOOP Code
    Compact Instructions

8 Structure of Assembly Code
Describes the structure of the assembly code, including labels, conditions, instructions, functional units, operands, and comments.
    Labels
    Parallel Bars
    Conditions
    Instructions
    Functional Units
    Operands
    Comments

9 Interrupts
Describes interrupts from a software programming point of view.
    Overview of Interrupts
    Single Assignment vs. Multiple Assignment
    Interruptible Loops
    Interruptible Code Generation
    Level 0 - Specified Code is Guaranteed to Not Be Interrupted
    Level 1 - Specified Code Interruptible at All Times
    Level 2 - Specified Code Interruptible Within Threshold Cycles
    Getting the Most Performance Out of Interruptible Code
    Interrupt Subroutines
    ISR with the C/C++ Compiler
    ISR with Hand-Coded Assembly
    Nested Interrupts

10 Linking Issues
Explains linker messages and how to use run-time-support functions.
    How to Use Linker Error Messages
    How to Find The Problem
    Executable Flag
    How to Save On-Chip Memory by Placing Run-Time Support Off-Chip
    How to Compile
    Must #include Header Files
    Run-Time-Support Data
    How to Link
    Example Compiler Invocation
    Header File Details
    Changing Run-Time-Support Data to near

Figures

    Dependency Graph for Vector Sum #
    Software-Pipelined Loop
    Dependency Graph for Lesson_c.c
    Dependency Graph of Fixed-Point Dot Product
    Dependency Graph of Floating-Point Dot Product
    Dependency Graph of Fixed-Point Dot Product With Parallel Assembly
    Dependency Graph of Floating-Point Dot Product With Parallel Assembly
    Dependency Graph of Fixed-Point Dot Product With LDW
    Dependency Graph of Floating-Point Dot Product With LDDW
    Dependency Graph of Fixed-Point Dot Product With LDW (Showing Functional Units)
    Dependency Graph of Floating-Point Dot Product With LDDW (Showing Functional Units)
    Dependency Graph of Fixed-Point Dot Product With LDW (Showing Functional Units)
    Dependency Graph of Floating-Point Dot Product With LDDW (Showing Functional Units)
    Dependency Graph of Weighted Vector Sum
    Dependency Graph of Weighted Vector Sum (Showing Resource Conflict)
    Dependency Graph of Weighted Vector Sum (With Resource Conflict Resolved)
    Dependency Graph of Weighted Vector Sum (Scheduling ci +1)
    Dependency Graph of IIR Filter
    Dependency Graph of IIR Filter (With Smaller Loop Carry)
    Dependency Graph of If-Then-Else Code
    Dependency Graph of If-Then-Else Code (Unrolled)
    Dependency Graph of Live-Too-Long Code
    Dependency Graph of Live-Too-Long Code (Split-Join Path Resolved)
    Dependency Graph of FIR Filter (With Redundant Load Elimination)
    Bank Interleaved Memory
    Bank Interleaved Memory With Two Memory Blocks
    Dependency Graph of FIR Filter (With Even and Odd Elements of Each Array on Same Loop Cycle)
    Dependency Graph of FIR Filter (With No Memory Hits)
    Four Bytes Packed Into a Single General Purpose Register
    Two Halfwords Packed Into a Single General Purpose Register
    Graphical Representation of _packxx2 Intrinsics
    Graphical Representation of _spack
    Graphical Representation of 8-Bit Packs (_packx4 and _spacku4)
    Graphical Representation of 8-Bit Unpacks (_unpkxu4)
    Graphical Representation of (_shlmb, _shrmb, and _swap4)
    Graphical Representation of a Simple Vector Operation
    Graphical Representation of Dot Product
    Graphical Representation of a Single Iteration of Vector Complex Multiply
    Array Access in Vector Sum by LDDW
    Array Access in Vector Sum by STDW
    Vector Addition
    Graphical Representation of a Single Iteration of Vector Multiply
    Packed Multiplies Using _mpy
    Fine Tuning Vector Multiply (shift > 16)
    Fine Tuning Vector Multiply (shift < 16)
    Graphical Representation of the _dotp2 Intrinsic c = _dotp2(b, a)
    The _dotpn2 Intrinsic Performing Real Portion of Complex Multiply
    _packlh2 and _dotp2 Working Together
    Graphical Illustration of _cmpxx2 Intrinsics
    Graphical Illustration of _cmpxx4 Intrinsics
    Graphical Illustration of _xpnd2 Intrinsic
    Graphical Illustration of _xpnd4 Intrinsic
    C64x Data Cross Paths
    Nested Loops Flow
    Labels in Assembly Code
    Parallel Bars in Assembly Code
    Conditions in Assembly Code
    Instructions in Assembly Code
    TMS320C6x Functional Units
    Units in the Assembly Code
    Operands in the Assembly Code
    Operands in Instructions
    Comments in Assembly Code

Tables

    Three Phases of Software Development
    Code Development Steps
    Compiler Options to Avoid on Performance Critical Code
    Compiler Options for Performance
    Compiler Options That Slightly Degrade Performance and Improve Code Size
    Compiler Options for Control Code
    Compiler Options for Information
    TMS320C6000 C/C++ Compiler Intrinsics
    TMS320C64x/C64x+ C/C++ Compiler Intrinsics
    TMS320C64x+ C/C++ Compiler Intrinsics
    TMS320C67x C/C++ Compiler Intrinsics
    Memory Access Intrinsics
    Status Update: Tutorial example lesson_c lesson1_c
    Status Update: Tutorial example lesson_c lesson1_c lesson2_c
    Status Update: Tutorial example lesson_c lesson1_c lesson2_c lesson3_c
    Status Update: Tutorial example lesson_c lesson1_c lesson2_c lesson3_c
    Comparison of Nonparallel and Parallel Assembly Code for Fixed-Point Dot Product
    Comparison of Nonparallel and Parallel Assembly Code for Floating-Point Dot Product
    Comparison of Fixed-Point Dot Product Code With Use of LDW
    Comparison of Floating-Point Dot Product Code With Use of LDDW
    Modulo Iteration Interval Scheduling Table for Fixed-Point Dot Product (Before Software Pipelining)
    Modulo Iteration Interval Scheduling Table for Floating-Point Dot Product (Before Software Pipelining)
    Modulo Iteration Interval Table for Fixed-Point Dot Product (After Software Pipelining)
    Modulo Iteration Interval Table for Floating-Point Dot Product (After Software Pipelining)
    Software Pipeline Accumulation Staggered Results Due to Three-Cycle Delay
    Comparison of Fixed-Point Dot Product Code Examples
    Comparison of Floating-Point Dot Product Code Examples
    Modulo Iteration Interval for Weighted Vector Sum (2-Cycle Loop)
    Modulo Iteration Interval for Weighted Vector Sum With SHR Instructions
    Modulo Iteration Interval for Weighted Vector Sum (2-Cycle Loop)
    Modulo Iteration Interval for Weighted Vector Sum (2-Cycle Loop)
    Resource Table for IIR Filter
    Modulo Iteration Interval Table for IIR (4-Cycle Loop)
    Resource Table for If-Then-Else Code
    Comparison of If-Then-Else Code Examples
    Resource Table for Unrolled If-Then-Else Code
    Comparison of If-Then-Else Code Examples
    Resource Table for Live-Too-Long Code
    Resource Table for FIR Filter Code
    Resource Table for FIR Filter Code
    Comparison of FIR Filter Code
    Comparison of FIR Filter Code
    Resource Table for FIR Filter Code
    Comparison of FIR Filter Code
    Packed Data Types
    Supported Operations on Packed Data Types
    Instructions for Manipulating Packed Data Types
    Unpacking Packed 16-Bit Quantities to 32-Bit Values
    Intrinsics Which Combine Multiple Operations in One Instruction
    Comparison Between Aligned and Non-Aligned Memory Accesses
    Intrinsics With Increased Multiply Throughput
    Intrinsics Combining Addition and Subtraction Instructions on Common Inputs
    Intrinsics Combining Multiple Data Manipulation Instructions
    SPLOOPD ILC Values
    Execution Unit Use for SPLOOP of Autocorrelation
    Execution Unit Use for Outer Loop of SPLOOP of Autocorrelation
    Selected TMS320C6x Directives
    Functional Units and Operations Performed
    Definitions
    Command Line Options for Run-Time-Support Calls
    How _FAR_RTS is Defined in Linkage.h With mr

Examples

    Compiler and/or Assembly Optimizer Feedback
    Basic Vector Sum
    Use of the Restrict Type Qualifier With Pointers
    Use of the Restrict Type Qualifier With Arrays
    Incorrect Use of the restrict Keyword
    Including the clock( ) Function
    Saturated Add Without Intrinsics
    Saturated Add With Intrinsics
    Vector Sum With restrict Keywords, MUST_ITERATE, Word Reads
    Vector Sum with Type-Casting
    Casting Breaking Default Assumptions
    Additional Casting Breaking Default Assumptions
    Rewritten Using Memory Access Intrinsics
    Vector Sum With Non-Aligned Word Accesses to Memory
    Vector Sum With restrict Keyword, MUST_ITERATE Pragma, and Word Reads (Generic Version)
    Dot Product Using Intrinsics
    FIR Filter Original Form
    FIR Filter Optimized Form
    Basic Float Dot Product
    Float Dot Product Using Intrinsics
    Float Dot Product With Peak Performance
    Int Dot Product with Nonaligned Doubleword Reads
    Using the Compiler to Generate a Dot Product With Word Accesses
    Using the _nassert() Intrinsic to Generate Word Accesses for Vector Sum
    Using _nassert() Intrinsic to Generate Word Accesses for FIR Filter
    Compiler Output From
    Compiler Output From
    Compiler Output From
    Automatic Use of Word Accesses Without the _nassert Intrinsic
    Assembly File Resulting From
    Trip Counters
    Vector Sum With Three Memory Operations
    Word-Aligned Vector Sum
    Vector Sum Using const Keywords, MUST_ITERATE pragma, Word Reads, and Loop Unrolling
    FIR_Type2 Original Form
    FIR_Type2 Inner Loop Completely Unrolled
    Vector Sum
    Use of If Statements in Float Collision Detection (Original Code)
    Use of If Statements in Float Collision Detection (Modified Code)
    Vector Summation of Two Weighted Vectors
    lesson_c.c
    Feedback From lesson_c.asm
    lesson_c.asm
    lesson1_c.c
    lesson1_c.asm
    lesson1_c.asm
    lesson2_c.c
    lesson2_c.asm
    lesson2_c.asm
    lesson3_c.c
    lesson3_c.asm
    Profile Statistics
    Using the iircas4 Function in C
    Software Pipelining Feedback From the iircas4 C Code
    Rewriting the iircas4( ) Function in Linear Assembly
    Software Pipeline Feedback from Linear Assembly
    Stage 1 Feedback
    Stage Two Feedback
    Stage 3 Feedback
    Linear Assembly Block Copy
    Block Copy With .mdep
    Linear Assembly Dot Product
    Linear Assembly Dot Product With .mptr
    Fixed-Point Dot Product C Code
    Floating-Point Dot Product C Code
    List of Assembly Instructions for Fixed-Point Dot Product
    List of Assembly Instructions for Floating-Point Dot Product
    Nonparallel Assembly Code for Fixed-Point Dot Product
    Parallel Assembly Code for Fixed-Point Dot Product
    Nonparallel Assembly Code for Floating-Point Dot Product
    Parallel Assembly Code for Floating-Point Dot Product
    Fixed-Point Dot Product C Code (Unrolled)
    Floating-Point Dot Product C Code (Unrolled)
    Linear Assembly for Fixed-Point Dot Product Inner Loop With LDW
    Linear Assembly for Floating-Point Dot Product Inner Loop With LDDW
    Linear Assembly for Fixed-Point Dot Product Inner Loop With LDW (With Allocated Resources)
    Linear Assembly for Floating-Point Dot Product Inner Loop With LDDW (With Allocated Resources)
    Assembly Code for Fixed-Point Dot Product With LDW (Before Software Pipelining)
    Assembly Code for Floating-Point Dot Product With LDDW (Before Software Pipelining)
    Linear Assembly for Fixed-Point Dot Product Inner Loop (With Conditional SUB Instruction)
    Linear Assembly for Floating-Point Dot Product Inner Loop (With Conditional SUB Instruction)
    Pseudo-Code for Single-Cycle Accumulator With ADDSP
    Linear Assembly for Full Fixed-Point Dot Product
    Linear Assembly for Full Floating-Point Dot Product
    Assembly Code for Fixed-Point Dot Product (Software Pipelined)
    Assembly Code for Floating-Point Dot Product (Software Pipelined)
    Assembly Code for Fixed-Point Dot Product (Software Pipelined With No Extraneous Loads)
    Assembly Code for Floating-Point Dot Product (Software Pipelined With No Extraneous Loads)
    Assembly Code for Fixed-Point Dot Product (Software Pipelined With Removal of Prolog and Epilog)
    Assembly Code for Floating-Point Dot Product (Software Pipelined With Removal of Prolog and Epilog)
    Assembly Code for Fixed-Point Dot Product (Software Pipelined With Smallest Code Size)
    Assembly Code for Floating-Point Dot Product (Software Pipelined With Smallest Code Size)
    Weighted Vector Sum C Code
    Linear Assembly for Weighted Vector Sum Inner Loop
    Weighted Vector Sum C Code (Unrolled)
    Linear Assembly for Weighted Vector Sum Using LDW
    Linear Assembly for Weighted Vector Sum With Resources Allocated
    Linear Assembly for Weighted Vector Sum
    Assembly Code for Weighted Vector Sum
    IIR Filter C Code
    Linear Assembly for IIR Inner Loop
    Linear Assembly for IIR Inner Loop With Reduced Loop Carry Path
    Linear Assembly for IIR Inner Loop (With Allocated Resources)
    Linear Assembly for IIR Filter
    Assembly Code for IIR Filter
    If-Then-Else C Code
    Linear Assembly for If-Then-Else Inner Loop
    Linear Assembly for Full If-Then-Else Code
    Assembly Code for If-Then-Else
    Assembly Code for If-Then-Else With Loop Count Greater Than
    If-Then-Else C Code (Unrolled)
    Linear Assembly for Unrolled If-Then-Else Inner Loop
    Linear Assembly for Full Unrolled If-Then-Else Code
    Assembly Code for Unrolled If-Then-Else
    Live-Too-Long C Code
    Linear Assembly for Live-Too-Long Inner Loop
    Linear Assembly for Full Live-Too-Long Code
    Assembly Code for Live-Too-Long With Move Instructions
    FIR Filter C Code
    FIR Filter C Code With Redundant Load Elimination
    Linear Assembly for FIR Inner Loop
    Linear Assembly for Full FIR Code
    Final Assembly Code for FIR Filter With Redundant Load Elimination
    Final Assembly Code for Inner Loop of FIR Filter
    FIR Filter C Code (Unrolled)
    Linear Assembly for Unrolled FIR Inner Loop
    Linear Assembly for Full Unrolled FIR Filter
    Final Assembly Code for FIR Filter With Redundant Load Elimination and No Memory Hits
    Unrolled FIR Filter C Code
    Final Assembly Code for FIR Filter With Redundant Load Elimination and No Memory Hits With Outer Loop Software-Pipelined
    Unrolled FIR Filter C Code
    Linear Assembly for Unrolled FIR Inner Loop
    Linear Assembly for FIR Outer Loop
    Unrolled FIR Filter C Code
    Linear Assembly for FIR With Outer Loop Conditionally Executed With Inner Loop
    Linear Assembly for FIR With Outer Loop Conditionally Executed With Inner Loop (With Functional Units)
    Final Assembly Code for FIR Filter
    Vector Sum
    Vector Multiply
    Dot Product
    Vector Complex Multiply
    Vectorization: Using LDDW and STDW in Vector Sum
    Vector Addition (Complete)
    Using LDDW and STDW in Vector Multiply
    Using _mpy2() and _pack2() to Perform the Vector Multiply
    Vectorized Form of the Dot Product Kernel
    Vectorized Form of the Dot Product Kernel
    Final Assembly Code for Dot-Product Kernel's Inner Loop
    Vectorized Form of the Vector Complex Multiply Kernel
    Vectorized Form of the Vector Complex Multiply
    Non-Aligned Memory Access With _mem4 and _memd
    Vector Sum Modified to Use Non-Aligned Memory Accesses
    Clear Below Threshold Kernel
    Clear Below Threshold Kernel, Using _cmpgtu4 and _xpnd4 Intrinsics


More information

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

More information

Learn AX: A Beginner s Guide to Microsoft Dynamics AX. Managing Users and Role Based Security in Microsoft Dynamics AX 2012. Dynamics101 ACADEMY

Learn AX: A Beginner s Guide to Microsoft Dynamics AX. Managing Users and Role Based Security in Microsoft Dynamics AX 2012. Dynamics101 ACADEMY Learn AX: A Beginner s Guide to Microsoft Dynamics AX Managing Users and Role Based Security in Microsoft Dynamics AX 2012 About.com is a Rand Group Knowledge Center intended to provide our clients, and

More information

PART B QUESTIONS AND ANSWERS UNIT I

PART B QUESTIONS AND ANSWERS UNIT I PART B QUESTIONS AND ANSWERS UNIT I 1. Explain the architecture of 8085 microprocessor? Logic pin out of 8085 microprocessor Address bus: unidirectional bus, used as high order bus Data bus: bi-directional

More information

MICROPROCESSOR AND MICROCOMPUTER BASICS

MICROPROCESSOR AND MICROCOMPUTER BASICS Introduction MICROPROCESSOR AND MICROCOMPUTER BASICS At present there are many types and sizes of computers available. These computers are designed and constructed based on digital and Integrated Circuit

More information

Application Note 195. ARM11 performance monitor unit. Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007

Application Note 195. ARM11 performance monitor unit. Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007 Application Note 195 ARM11 performance monitor unit Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007 Copyright 2007 ARM Limited. All rights reserved. Application Note

More information

Faculty of Engineering Student Number:

Faculty of Engineering Student Number: Philadelphia University Student Name: Faculty of Engineering Student Number: Dept. of Computer Engineering Final Exam, First Semester: 2012/2013 Course Title: Microprocessors Date: 17/01//2013 Course No:

More information

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Question Bank Subject Name: EC6504 - Microprocessor & Microcontroller Year/Sem : II/IV

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Question Bank Subject Name: EC6504 - Microprocessor & Microcontroller Year/Sem : II/IV DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Question Bank Subject Name: EC6504 - Microprocessor & Microcontroller Year/Sem : II/IV UNIT I THE 8086 MICROPROCESSOR 1. What is the purpose of segment registers

More information

Software Programmable DSP Platform Analysis Episode 7, Monday 19 March 2007, Ingredients. Software Pipelining. Data Dependence. Resource Constraints

Software Programmable DSP Platform Analysis Episode 7, Monday 19 March 2007, Ingredients. Software Pipelining. Data Dependence. Resource Constraints Software Programmable DSP Platform Analysis Episode 7, Monday 19 March 7, Ingredients Software Pipelining Data & Resource Constraints Resource Constraints in C67x Loop Scheduling Without Resource Bounds

More information

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013

CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013 Oct 4, 2013, p 1 Name: CS 141: Introduction to (Java) Programming: Exam 1 Jenny Orr Willamette University Fall 2013 1. (max 18) 4. (max 16) 2. (max 12) 5. (max 12) 3. (max 24) 6. (max 18) Total: (max 100)

More information

Parallel and Distributed Computing Programming Assignment 1

Parallel and Distributed Computing Programming Assignment 1 Parallel and Distributed Computing Programming Assignment 1 Due Monday, February 7 For programming assignment 1, you should write two C programs. One should provide an estimate of the performance of ping-pong

More information

AN10850. LPC1700 timer triggered memory to GPIO data transfer. Document information. LPC1700, GPIO, DMA, Timer0, Sleep Mode

AN10850. LPC1700 timer triggered memory to GPIO data transfer. Document information. LPC1700, GPIO, DMA, Timer0, Sleep Mode LPC1700 timer triggered memory to GPIO data transfer Rev. 01 16 July 2009 Application note Document information Info Keywords Abstract Content LPC1700, GPIO, DMA, Timer0, Sleep Mode This application note

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

An Introduction to Assembly Programming with the ARM 32-bit Processor Family

An Introduction to Assembly Programming with the ARM 32-bit Processor Family An Introduction to Assembly Programming with the ARM 32-bit Processor Family G. Agosta Politecnico di Milano December 3, 2011 Contents 1 Introduction 1 1.1 Prerequisites............................. 2

More information

Using C to Access Data Stored in Program Space Memory on the TMS320C24x DSP

Using C to Access Data Stored in Program Space Memory on the TMS320C24x DSP Application Report SPRA380 April 2002 Using C to Access Data Stored in Program Space Memory on the TMS320C24x DSP David M. Alter DSP Applications - Semiconductor Group ABSTRACT Efficient utilization of

More information

Eliminate Memory Errors and Improve Program Stability

Eliminate Memory Errors and Improve Program Stability Eliminate Memory Errors and Improve Program Stability with Intel Parallel Studio XE Can running one simple tool make a difference? Yes, in many cases. You can find errors that cause complex, intermittent

More information

Altera SDK for OpenCL

Altera SDK for OpenCL Altera SDK for OpenCL Best Practices Guide Subscribe OCL003-15.0.0 101 Innovation Drive San Jose, CA 95134 www.altera.com TOC-2 Contents...1-1 Introduction...1-1 FPGA Overview...1-1 Pipelines... 1-2 Single

More information

OpenCL Programming for the CUDA Architecture. Version 2.3

OpenCL Programming for the CUDA Architecture. Version 2.3 OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different

More information

System Design Issues in Embedded Processing

System Design Issues in Embedded Processing System Design Issues in Embedded Processing 9/16/10 Jacob Borgeson 1 Agenda What does TI do? From MCU to MPU to DSP: What are some trends? Design Challenges Tools to Help 2 TI - the complete system The

More information

Programming Language Pragmatics

Programming Language Pragmatics Programming Language Pragmatics THIRD EDITION Michael L. Scott Department of Computer Science University of Rochester ^ШШШШШ AMSTERDAM BOSTON HEIDELBERG LONDON, '-*i» ЩЛ< ^ ' m H NEW YORK «OXFORD «PARIS»SAN

More information

Floating Point C Compiler: Tips and Tricks Part I

Floating Point C Compiler: Tips and Tricks Part I TMS320 DSP DESIGNER S NOTEBOOK Floating Point C Compiler: Tips and Tricks Part I APPLICATION BRIEF: SPRA229 Karen Baldwin Digital Signal Processing Products Semiconductor Group Texas Instruments June 1993

More information

Software Pipelining. Y.N. Srikant. NPTEL Course on Compiler Design. Department of Computer Science Indian Institute of Science Bangalore 560 012

Software Pipelining. Y.N. Srikant. NPTEL Course on Compiler Design. Department of Computer Science Indian Institute of Science Bangalore 560 012 Department of Computer Science Indian Institute of Science Bangalore 560 2 NPTEL Course on Compiler Design Introduction to Overlaps execution of instructions from multiple iterations of a loop Executes

More information

COMPUTERS ORGANIZATION 2ND YEAR COMPUTE SCIENCE MANAGEMENT ENGINEERING JOSÉ GARCÍA RODRÍGUEZ JOSÉ ANTONIO SERRA PÉREZ

COMPUTERS ORGANIZATION 2ND YEAR COMPUTE SCIENCE MANAGEMENT ENGINEERING JOSÉ GARCÍA RODRÍGUEZ JOSÉ ANTONIO SERRA PÉREZ COMPUTERS ORGANIZATION 2ND YEAR COMPUTE SCIENCE MANAGEMENT ENGINEERING UNIT 1 - INTRODUCTION JOSÉ GARCÍA RODRÍGUEZ JOSÉ ANTONIO SERRA PÉREZ Unit 1.MaNoTaS 1 Definitions (I) Description A computer is: A

More information

Digital signal processor fundamentals and system design

Digital signal processor fundamentals and system design Digital signal processor fundamentals and system design M.E. Angoletta CERN, Geneva, Switzerland Abstract Digital Signal Processors (DSPs) have been used in accelerator systems for more than fifteen years

More information

MAQAO Performance Analysis and Optimization Tool

MAQAO Performance Analysis and Optimization Tool MAQAO Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL andres.charif@uvsq.fr Performance Evaluation Team, University of Versailles S-Q-Y http://www.maqao.org VI-HPS 18 th Grenoble 18/22

More information

Embedded System Hardware - Processing (Part II)

Embedded System Hardware - Processing (Part II) 12 Embedded System Hardware - Processing (Part II) Jian-Jia Chen (Slides are based on Peter Marwedel) Informatik 12 TU Dortmund Germany Springer, 2010 2014 年 11 月 11 日 These slides use Microsoft clip arts.

More information

2) Write in detail the issues in the design of code generator.

2) Write in detail the issues in the design of code generator. COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage

More information

13. Publishing Component Information to Embedded Software

13. Publishing Component Information to Embedded Software February 2011 NII52018-10.1.0 13. Publishing Component Information to Embedded Software NII52018-10.1.0 This document describes how to publish SOPC Builder component information for embedded software tools.

More information

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language Chapter 4 Register Transfer and Microoperations Section 4.1 Register Transfer Language Digital systems are composed of modules that are constructed from digital components, such as registers, decoders,

More information

LMS is a simple but powerful algorithm and can be implemented to take advantage of the Lattice FPGA architecture.

LMS is a simple but powerful algorithm and can be implemented to take advantage of the Lattice FPGA architecture. February 2012 Introduction Reference Design RD1031 Adaptive algorithms have become a mainstay in DSP. They are used in wide ranging applications including wireless channel estimation, radar guidance systems,

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

1 Classical Universal Computer 3

1 Classical Universal Computer 3 Chapter 6: Machine Language and Assembler Christian Jacob 1 Classical Universal Computer 3 1.1 Von Neumann Architecture 3 1.2 CPU and RAM 5 1.3 Arithmetic Logical Unit (ALU) 6 1.4 Arithmetic Logical Unit

More information

Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors

Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors Lesson 05: Array Processors Objective To learn how the array processes in multiple pipelines 2 Array Processor

More information

150127-Microprocessor & Assembly Language

150127-Microprocessor & Assembly Language Chapter 3 Z80 Microprocessor Architecture The Z 80 is one of the most talented 8 bit microprocessors, and many microprocessor-based systems are designed around the Z80. The Z80 microprocessor needs an

More information

FHE DEFINITIVE GUIDE. ^phihri^^lv JEFFREY GARBUS. Joe Celko. Alvin Chang. PLAMEN ratchev JONES & BARTLETT LEARN IN G. y ti rvrrtuttnrr i t i r

FHE DEFINITIVE GUIDE. ^phihri^^lv JEFFREY GARBUS. Joe Celko. Alvin Chang. PLAMEN ratchev JONES & BARTLETT LEARN IN G. y ti rvrrtuttnrr i t i r : 1. FHE DEFINITIVE GUIDE fir y ti rvrrtuttnrr i t i r ^phihri^^lv ;\}'\^X$:^u^'! :: ^ : ',!.4 '. JEFFREY GARBUS PLAMEN ratchev Alvin Chang Joe Celko g JONES & BARTLETT LEARN IN G Contents About the Authors

More information

Monitoring TMS320C240 Peripheral Registers in the Debugger Software

Monitoring TMS320C240 Peripheral Registers in the Debugger Software TMS320 DSP DESIGNER S NOTEBOOK Monitoring TMS320C240 Peripheral Registers in the Debugger Software APPLICATION BRIEF: SPRA276 Jeff Crankshaw Digital Signal Processor Solutions May 1997 IMPORTANT NOTICE

More information

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C 1 An essential part of any embedded system design Programming 2 Programming in Assembly or HLL Processor and memory-sensitive

More information

IA-64 Application Developer s Architecture Guide

IA-64 Application Developer s Architecture Guide IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve

More information

(Refer Slide Time: 00:01:16 min)

(Refer Slide Time: 00:01:16 min) Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control

More information

System Administration of Windchill 10.2

System Administration of Windchill 10.2 System Administration of Windchill 10.2 Overview Course Code Course Length TRN-4340-T 3 Days In this course, you will gain an understanding of how to perform routine Windchill system administration tasks,

More information

FPGA area allocation for parallel C applications

FPGA area allocation for parallel C applications 1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University

More information

Building Applications Using Micro Focus COBOL

Building Applications Using Micro Focus COBOL Building Applications Using Micro Focus COBOL Abstract If you look through the Micro Focus COBOL documentation, you will see many different executable file types referenced: int, gnt, exe, dll and others.

More information

NEC Storage Manager User's Manual

NEC Storage Manager User's Manual NEC Storage Manager User's Manual NEC Corporation 2001-2003 No part of the contents of this book may be reproduced or transmitted in any form without permission of NEC Corporation. The contents of this

More information

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS Perhaad Mistry, Yash Ukidave, Dana Schaa, David Kaeli Department of Electrical and Computer Engineering Northeastern University,

More information

NEC Storage Performance Monitor/Optimizer User s Manual

NEC Storage Performance Monitor/Optimizer User s Manual NEC Storage Performance Monitor/Optimizer User s Manual NEC Corporation 2001-2003 No part of the contents of this book may be reproduced or transmitted in any form without permission of NEC Corporation.

More information

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and

More information

Teaching DSP through the Practical Case Study of an FSK Modem

Teaching DSP through the Practical Case Study of an FSK Modem Disclaimer: This document was part of the First European DSP Education and Research Conference. It may have been written by someone whose native language is not English. TI assumes no liability for the

More information

OpenSPARC T1 Processor

OpenSPARC T1 Processor OpenSPARC T1 Processor The OpenSPARC T1 processor is the first chip multiprocessor that fully implements the Sun Throughput Computing Initiative. Each of the eight SPARC processor cores has full hardware

More information

Implementing and Administering an Enterprise SharePoint Environment

Implementing and Administering an Enterprise SharePoint Environment Implementing and Administering an Enterprise SharePoint Environment There are numerous planning and management issues that your team needs to address when deploying SharePoint. This process can be simplified

More information

THE FLORIDA STATE UNIVERSITY COLLEGE OF ARTS AND SCIENCES APPLICATION CONFIGURABLE PROCESSORS CHRISTOPHER J. ZIMMER

THE FLORIDA STATE UNIVERSITY COLLEGE OF ARTS AND SCIENCES APPLICATION CONFIGURABLE PROCESSORS CHRISTOPHER J. ZIMMER THE FLORIDA STATE UNIVERSITY COLLEGE OF ARTS AND SCIENCES APPLICATION CONFIGURABLE PROCESSORS By CHRISTOPHER J. ZIMMER A Thesis submitted to the Department of Computer Science In partial fulfillment of

More information

MICROPROCESSOR. Exclusive for IACE Students www.iace.co.in iacehyd.blogspot.in Ph: 9700077455/422 Page 1

MICROPROCESSOR. Exclusive for IACE Students www.iace.co.in iacehyd.blogspot.in Ph: 9700077455/422 Page 1 MICROPROCESSOR A microprocessor incorporates the functions of a computer s central processing unit (CPU) on a single Integrated (IC), or at most a few integrated circuit. It is a multipurpose, programmable

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem

More information

CUDA Programming. Week 4. Shared memory and register

CUDA Programming. Week 4. Shared memory and register CUDA Programming Week 4. Shared memory and register Outline Shared memory and bank confliction Memory padding Register allocation Example of matrix-matrix multiplication Homework SHARED MEMORY AND BANK

More information

Traditional IBM Mainframe Operating Principles

Traditional IBM Mainframe Operating Principles C H A P T E R 1 7 Traditional IBM Mainframe Operating Principles WHEN YOU FINISH READING THIS CHAPTER YOU SHOULD BE ABLE TO: Distinguish between an absolute address and a relative address. Briefly explain

More information

TMS320C6000 Code Composer Studio Tutorial

TMS320C6000 Code Composer Studio Tutorial TMS320C6000 Code Composer Studio Tutorial Literature Number: SPRU301C February 2000 Printed on Recycled Paper IMPORTANT NOTICE Texas Instruments and its subsidiaries (TI) reserves the right to make changes

More information

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program. Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to

More information

Beginning C# 5.0. Databases. Vidya Vrat Agarwal. Second Edition

Beginning C# 5.0. Databases. Vidya Vrat Agarwal. Second Edition Beginning C# 5.0 Databases Second Edition Vidya Vrat Agarwal Contents J About the Author About the Technical Reviewer Acknowledgments Introduction xviii xix xx xxi Part I: Understanding Tools and Fundamentals

More information

StrongARM** SA-110 Microprocessor Instruction Timing

StrongARM** SA-110 Microprocessor Instruction Timing StrongARM** SA-110 Microprocessor Instruction Timing Application Note September 1998 Order Number: 278194-001 Information in this document is provided in connection with Intel products. No license, express

More information

Web Services Performance: Comparing Java 2 TM Enterprise Edition (J2EE TM platform) and the Microsoft.NET Framework

Web Services Performance: Comparing Java 2 TM Enterprise Edition (J2EE TM platform) and the Microsoft.NET Framework Web Services Performance: Comparing 2 TM Enterprise Edition (J2EE TM platform) and the Microsoft Framework A Response to Sun Microsystem s Benchmark Microsoft Corporation July 24 Introduction In June 24,

More information

Levels of Programming Languages. Gerald Penn CSC 324

Levels of Programming Languages. Gerald Penn CSC 324 Levels of Programming Languages Gerald Penn CSC 324 Levels of Programming Language Microcode Machine code Assembly Language Low-level Programming Language High-level Programming Language Levels of Programming

More information

Basic System. Vyatta System. REFERENCE GUIDE Using the CLI Working with Configuration System Management User Management Logging VYATTA, INC.

Basic System. Vyatta System. REFERENCE GUIDE Using the CLI Working with Configuration System Management User Management Logging VYATTA, INC. VYATTA, INC. Vyatta System Basic System REFERENCE GUIDE Using the CLI Working with Configuration System Management User Management Logging Vyatta Suite 200 1301 Shoreway Road Belmont, CA 94002 vyatta.com

More information

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage Outline 1 Computer Architecture hardware components programming environments 2 Getting Started with Python installing Python executing Python code 3 Number Systems decimal and binary notations running

More information

CHAPTER 4 MARIE: An Introduction to a Simple Computer

CHAPTER 4 MARIE: An Introduction to a Simple Computer CHAPTER 4 MARIE: An Introduction to a Simple Computer 4.1 Introduction 195 4.2 CPU Basics and Organization 195 4.2.1 The Registers 196 4.2.2 The ALU 197 4.2.3 The Control Unit 197 4.3 The Bus 197 4.4 Clocks

More information

An Introduction to the ARM 7 Architecture

An Introduction to the ARM 7 Architecture An Introduction to the ARM 7 Architecture Trevor Martin CEng, MIEE Technical Director This article gives an overview of the ARM 7 architecture and a description of its major features for a developer new

More information

Code Composer Studio Getting Started Guide

Code Composer Studio Getting Started Guide Code Composer Studio Getting Started Guide Literature Number: SPRU509 May 2001 Printed on Recycled Paper IMPORTANT NOTICE Texas Instruments and its subsidiaries (TI) reserve the right to make changes to

More information

Operating Systems 4 th Class

Operating Systems 4 th Class Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science

More information

GPU Hardware Performance. Fall 2015

GPU Hardware Performance. Fall 2015 Fall 2015 Atomic operations performs read-modify-write operations on shared or global memory no interference with other threads for 32-bit and 64-bit integers (c. c. 1.2), float addition (c. c. 2.0) using

More information

A Static Analyzer for Large Safety-Critical Software. Considered Programs and Semantics. Automatic Program Verification by Abstract Interpretation

A Static Analyzer for Large Safety-Critical Software. Considered Programs and Semantics. Automatic Program Verification by Abstract Interpretation PLDI 03 A Static Analyzer for Large Safety-Critical Software B. Blanchet, P. Cousot, R. Cousot, J. Feret L. Mauborgne, A. Miné, D. Monniaux,. Rival CNRS École normale supérieure École polytechnique Paris

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Abstract. Cycle Domain Simulator for Phase-Locked Loops

Abstract. Cycle Domain Simulator for Phase-Locked Loops Abstract Cycle Domain Simulator for Phase-Locked Loops Norman James December 1999 As computers become faster and more complex, clock synthesis becomes critical. Due to the relatively slower bus clocks

More information

Chapter 7D The Java Virtual Machine

Chapter 7D The Java Virtual Machine This sub chapter discusses another architecture, that of the JVM (Java Virtual Machine). In general, a VM (Virtual Machine) is a hypothetical machine (implemented in either hardware or software) that directly

More information

Real Time Programming: Concepts

Real Time Programming: Concepts Real Time Programming: Concepts Radek Pelánek Plan at first we will study basic concepts related to real time programming then we will have a look at specific programming languages and study how they realize

More information