Embedded System Hardware - Processing (Part II)
|
|
|
- Hilary Strickland
- 10 years ago
- Views:
Transcription
1 12 Embedded System Hardware - Processing (Part II) Jian-Jia Chen (Slides are based on Peter Marwedel) Informatik 12 TU Dortmund Germany Springer, 年 11 月 11 日 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.
2 Embedded System Hardware Embedded system hardware is frequently used in a loop ( hardware in a loop ): F cyber-physical systems - 2 -
3 Key requirement #2: Code-size efficiency Overview: codecompression/codecompression/node1.html Compression techniques: key idea - 3 -
4 Code-size efficiency Compression techniques (continued): 2nd instruction set, e.g. ARM Thumb instruction set: Rd Constant major opcode minor opcode source= destination 16-bit Thumb instr. ADD Rd #constant zero extended Rd 0 Rd 0000 Constant Reduction to % of original code size 130% of ARM performance with 8/16 bit memory 85% of ARM performance with 32-bit memory Same approach for LSI TinyRisc, Requires support by compiler, assembler etc. Dynamically decoded at run-time [ARM, R. Gupta] - 4 -
5 Dictionary approach, two level control store (indirect addressing of instructions) Dictionary-based coding schemes cover a wide range of various coders and compressors. Their common feature is that the methods use some kind of a dictionary that contains parts of the input sequence which frequently appear. The encoded sequence in turn contains references to the dictionary elements rather than containing these over and over. [Á. Beszédes et al.: Survey of Code size Reduction Methods, Survey of Code-Size Reduction Methods, ACM Computing Surveys, Vol. 35, Sept. 2003, pp ] - 5 -
6 Key idea (for d bit instructions) instruction address c 2 b CPU b S a b «d bit For each instruction address, S contains table address of instruction. table of used instructions ( dictionary ) d bit Uncompressed storage of a d-bit-wide instructions requires a x d bits. In compressed code, each instruction pattern is stored only once. small Hopefully, axb+cxd < axd. Called nanoprogramming in the Motorola
7 Key requirement #3: Run-time efficiency - Domain-oriented architectures - Example: Filtering in Digital signal processing (DSP) Signal at t=t s (sampling points) - 7 -
8 Key Requirement #4: Real-time capability Timing behavior has to be predictable Features that cause problems: Unpredictable access to shared resources - Caches with difficult to predict replacement strategies - Unified caches (conflicts between instructions and data) Pipelines with difficult to predict stall cycles ("bubbles") Unpredictable communication times for multiprocessors Branch prediction, speculative execution Interrupts that are possible any time Memory refreshes that are possible any time Instructions that have data-dependent execution times F Trying to avoid as many of these as possible. [Dagstuhl workshop on predictability, Nov , 2003] - 8 -
9 12 DSP and Multimedia Processors Springer, 2010 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.
10 Saturating arithmetic Returns largest/smallest number in case of over/ underflows Example: a 0111 b standard wrap around arithmetic (1)0000 saturating arithmetic 1111 (a+b)/2: correct 1000 wrap around arithmetic 0000 saturating arithmetic + shifted 0111 Appropriate for DSP/multimedia applications: No timeliness of results if interrupts are generated for overflows Precise values less important Wrap around arithmetic would be worse. almost correct
11 On-Site Exercise
12 Fixed-point arithmetic Shifting required after multiplications and divisions in order to maintain binary point
13 On-Site Exercise
14 Multimedia-Instructions, Short vector extensions, Streaming extensions, SIMD instructions Multimedia instructions exploit that many registers, adders etc are quite wide (32/64 bit), whereas most multimedia data types are narrow F 2-8 values can be stored per register and added. E.g.: a1 32 bits a2 b1 32 bits b2 c bits c2 2 additions per instruction; no carry at bit 16 Cheap way of using parallelism F SSE instruction set extensions, SIMD instructions
15 Key idea of very long instruction word (VLIW) computers (1) Instructions included in long instruction packets. Multiple instruction are assumed to be executed in parallel. Fixed association of packet bits with functional units. Compiler is assumed to generate these parallel packets
16 Key idea of very long instruction word (VLIW) computers (2) Complexity of finding parallelism is moved from the hardware (RISC/CISC processors) to the compiler; Ideally, this avoids the overhead (silicon, energy,..) of identifying parallelism at run-time. F A lot of expectations into VLIW machines However, possibly low code efficiency, due to many NOPs F Explicitly parallel instruction set computers (EPICs) are an extension of VLIW architectures: parallelism detected by compiler, but no need to encode parallelism in 1 word
17 EPIC: TMS 320C6xxx as an example 1 Bit per instruction encodes end of parallel exec Instr. A Instr. B Instr. C Instr. D Instr. E Instr. F Instr. G Cycle Instruction 1 A 2 B C D 3 E F G Instructions B, C and D use disjoint functional units, cross paths and other data path resources. The same is also true for E, F and G
18 Partitioned register files Many memory ports are required to supply enough operands per cycle. Memories with many ports are expensive. F Registers are partitioned into (typically 2) sets, e.g. for TI C6xxx:
19 Microcontrollers - MHS 80C51 as an example - 8-bit CPU optimised for control applications Extensive Boolean processing capabilities 64 k Program Memory address space 64 k Data Memory address space 4 k bytes of on chip Program Memory 128 bytes of on chip data RAM 32 bi-directional and individually addressable I/O lines Two 16-bit timers/counters Full duplex UART 6 sources/5-vector interrupt structure with 2 priority levels On chip clock oscillators Very popular CPU with many different variations
20 Energy Efficiency of FPGAs Hugo De Man, IMEC, Philips,
21 Reconfigurable Logic Custom HW may be too expensive, SW too slow. Combine the speed of HW with the flexibility of SW F HW with programmable functions and interconnect. F Use of configurable hardware; common form: field programmable gate arrays (FPGAs) Applications: algorithms like de/encryption, pattern matching in bioinformatics, high speed event filtering (high energy physics), high speed special purpose hardware. Very popular devices from XILINX, Actel, Altera and others
22 Fine-Grained Reconfigurable Fabric CLB: Configurable Logic Block PSM: Programmable Switch Matrix Additionally: I/O Blocks, RAM Blocks, Multiplier, CPUs, src: Xilinx Data Sheets
23 Fine-Grained Reconfigurable Fabric Logic Layer: Perform Computations Configuration Layer: Which Computations to Perform Partial Reconfiguration: Only parts of the Fabric are reconfigured Configuration Data Logic layer Logic layer Configuration Layer Configuration Memory (off-chip) src: Kalenteridis et al. A complete platform and toolset for system implementation on fine-grained reconfigurable hardware, Microprocessors and Microsystems
24 Coarse-Grained Reconfigurable Fabric Pros: Reduced Area Higher Frequency Reduced reconfiguration latency Cons Reduced Flexibility Area Wastage à In case computations do not suite the fabric (e.g. bit-level operations, bit shuffling) src: PACT s XPP 64-A1 architecture
25 Reconfigurable Computing: an example Core Processor (CISA) Reconfiguration Reconfigurable HW SIs are executed using core Processor during Reconfiguration Exec. of ME Exec. of EE FPGA partitioned into Containers Reconfiguration Reconfiguration Exec. of LF ME Motion Estimation EE LF LEGEND: Encoding Engine Loop Filter TIME Runtime reconfigurable processor: + Efficient use of hardware: through high degree of parallelism, multiplexed used of reconf. fabric => energy efficient Reconfiguration Latency and Energy
26 Summary Processing VLIW/EPIC processors MPSoCs FPGAs
27 Spare Slides
28 Filtering in digital signal processing ADSP outer loop over -- sampling times t s { MR:=0; A1:=1; A2:=s-1; MX:=w[s]; MY:=a[0]; for (k=0; k <= (n 1); k++) { MR:=MR + MX * MY; MX:=w[A2]; MY:=a[A1]; A1++; A2--; } x[s]:=mr; } Maps nicely
29 DSP-Processors: multiply/accumulate (MAC) and zero-overhead loop (ZOL) instructions MR:=0; A1:=1; A2:=s-1; MX:=w[s]; MY:=a[0]; for ( k:=0 <= n-1) {MR:=MR+MX*MY; MY:=a[A1]; MX:=w[A2]; A1++; A2--} Multiply/accumulate (MAC) instruction Zero-overhead loop (ZOL) instruction preceding MAC instruction. Loop testing done in parallel to MAC operations
30 Heterogeneous registers Example (ADSP 210x): D P Addressregisters A0, A1, A2.. Address generation unit (AGU) AX AY +,-,.. AR AF MX * +,- MR MY MF Different functionality of registers An, AX, AY, AF,MX, MY, MF, MR
31 Separate address generation units (AGUs) Example (ADSP 210x): Data memory can only be fetched with address contained in A, but this can be done in parallel with operation in main data path (takes effectively 0 time). A := A ± 1 also takes 0 time, same for A := A ± M; A := <immediate in instruction> requires extra instruction F Minimize load immediates F Optimization in optimization chapter
32 Modulo addressing Modulo addressing: Am++ Am:=(Am+1) mod n (implements ring or circular buffer in memory) w sliding window t1 t n most recent values.. w[t1-1] w[t1] w[t1-n+1] w[t1-n+2].... w[t1-1] w[t1] w[t1+1] w[t1-n+2].. Memory, t=t1 Memory, t2= t
Embedded System Hardware - Processing -
12 Embedded System Hardware - Processing - Peter Marwedel Informatik 12 TU Dortmund Germany Graphics: Alexandra Nolte, Gesine Marwedel, 2003 2010 年 11 月 15 日 These slides use Microsoft clip arts. Microsoft
1. Introduction to Embedded System Design
1. Introduction to Embedded System Design Lothar Thiele ETH Zurich, Switzerland 1-1 Contents of Lectures (Lothar Thiele) 1. Introduction to Embedded System Design 2. Software for Embedded Systems 3. Real-Time
Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah
(DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de [email protected] NIOS II 1 1 What is Nios II? Altera s Second Generation
Architectures and Platforms
Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation
Reconfigurable Computing. Reconfigurable Architectures. Chapter 3.2
Reconfigurable Architectures Chapter 3.2 Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software-Co-Design Coarse-Grained Reconfigurable Devices Recall: 1. Brief Historically development (Estrin Fix-Plus
Computer Architecture TDTS10
why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers
PROBLEMS #20,R0,R1 #$3A,R2,R4
506 CHAPTER 8 PIPELINING (Corrisponde al cap. 11 - Introduzione al pipelining) PROBLEMS 8.1 Consider the following sequence of instructions Mul And #20,R0,R1 #3,R2,R3 #$3A,R2,R4 R0,R2,R5 In all instructions,
CHAPTER 7: The CPU and Memory
CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada [email protected] Micaela Serra
ARM Microprocessor and ARM-Based Microcontrollers
ARM Microprocessor and ARM-Based Microcontrollers Nguatem William 24th May 2006 A Microcontroller-Based Embedded System Roadmap 1 Introduction ARM ARM Basics 2 ARM Extensions Thumb Jazelle NEON & DSP Enhancement
Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education
Lesson 7: SYSTEM-ON ON-CHIP (SoC( SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY 1 VLSI chip Integration of high-level components Possess gate-level sophistication in circuits above that of the counter,
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit
Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems
12 Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund, Informatik 12 Springer, 2010 2015 年 10 月 21 日 These slides
Let s put together a Manual Processor
Lecture 14 Let s put together a Manual Processor Hardware Lecture 14 Slide 1 The processor Inside every computer there is at least one processor which can take an instruction, some operands and produce
BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA
BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single
A Lab Course on Computer Architecture
A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 ISSN 0976 6464(Print)
UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering. EEC180B Lab 7: MISP Processor Design Spring 1995
UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering EEC180B Lab 7: MISP Processor Design Spring 1995 Objective: In this lab, you will complete the design of the MISP processor,
Instruction Set Architecture
Instruction Set Architecture Consider x := y+z. (x, y, z are memory variables) 1-address instructions 2-address instructions LOAD y (r :=y) ADD y,z (y := y+z) ADD z (r:=r+z) MOVE x,y (x := y) STORE x (x:=r)
what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?
Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the
COMPUTER HARDWARE. Input- Output and Communication Memory Systems
COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)
150127-Microprocessor & Assembly Language
Chapter 3 Z80 Microprocessor Architecture The Z 80 is one of the most talented 8 bit microprocessors, and many microprocessor-based systems are designed around the Z80. The Z80 microprocessor needs an
Chapter 2 Heterogeneous Multicore Architecture
Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is
Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2
Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of
LSN 2 Computer Processors
LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2
Instruction Set Design
Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,
Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:
Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove
(Refer Slide Time: 00:01:16 min)
Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control
Introduction to the Latest Tensilica Baseband Solutions
Introduction to the Latest Tensilica Baseband Solutions Dr. Chris Rowen Founder and Chief Technology Officer Tensilica Inc. Outline The Mobile Wireless Challenge Multi-standard Baseband Tensilica Fits
Chapter 2 Logic Gates and Introduction to Computer Architecture
Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are
Aperiodic Task Scheduling
Aperiodic Task Scheduling Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund, Informatik 12 Germany Springer, 2010 2014 年 11 月 19 日 These slides use Microsoft clip arts. Microsoft copyright
Lecture N -1- PHYS 3330. Microcontrollers
Lecture N -1- PHYS 3330 Microcontrollers If you need more than a handful of logic gates to accomplish the task at hand, you likely should use a microcontroller instead of discrete logic gates 1. Microcontrollers
MICROPROCESSOR. Exclusive for IACE Students www.iace.co.in iacehyd.blogspot.in Ph: 9700077455/422 Page 1
MICROPROCESSOR A microprocessor incorporates the functions of a computer s central processing unit (CPU) on a single Integrated (IC), or at most a few integrated circuit. It is a multipurpose, programmable
CS352H: Computer Systems Architecture
CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Data Hazards in ALU Instructions
UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS
UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS Structure Page Nos. 2.0 Introduction 27 2.1 Objectives 27 2.2 Types of Classification 28 2.3 Flynn s Classification 28 2.3.1 Instruction Cycle 2.3.2 Instruction
İSTANBUL AYDIN UNIVERSITY
İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER
More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
FLIX: Fast Relief for Performance-Hungry Embedded Applications
FLIX: Fast Relief for Performance-Hungry Embedded Applications Tensilica Inc. February 25 25 Tensilica, Inc. 25 Tensilica, Inc. ii Contents FLIX: Fast Relief for Performance-Hungry Embedded Applications...
Systolic Computing. Fundamentals
Systolic Computing Fundamentals Motivations for Systolic Processing PARALLEL ALGORITHMS WHICH MODEL OF COMPUTATION IS THE BETTER TO USE? HOW MUCH TIME WE EXPECT TO SAVE USING A PARALLEL ALGORITHM? HOW
Pipelining Review and Its Limitations
Pipelining Review and Its Limitations Yuri Baida [email protected] [email protected] October 16, 2010 Moscow Institute of Physics and Technology Agenda Review Instruction set architecture Basic
Networking Virtualization Using FPGAs
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,
7a. System-on-chip design and prototyping platforms
7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit
A Generic Network Interface Architecture for a Networked Processor Array (NePA)
A Generic Network Interface Architecture for a Networked Processor Array (NePA) Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh EECS @ University of California, Irvine Outline Introduction
Chapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language
Chapter 4 Register Transfer and Microoperations Section 4.1 Register Transfer Language Digital systems are composed of modules that are constructed from digital components, such as registers, decoders,
BASIC COMPUTER ORGANIZATION AND DESIGN
1 BASIC COMPUTER ORGANIZATION AND DESIGN Instruction Codes Computer Registers Computer Instructions Timing and Control Instruction Cycle Memory Reference Instructions Input-Output and Interrupt Complete
Scalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
Memory Systems. Static Random Access Memory (SRAM) Cell
Memory Systems This chapter begins the discussion of memory systems from the implementation of a single bit. The architecture of memory chips is then constructed using arrays of bit implementations coupled
SOC architecture and design
SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external
Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng
Architectural Level Power Consumption of Network Presenter: YUAN Zheng Why Architectural Low Power Design? High-speed and large volume communication among different parts on a chip Problem: Power consumption
Xeon+FPGA Platform for the Data Center
Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system
Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:
Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):
IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus
Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,
MICROPROCESSOR AND MICROCOMPUTER BASICS
Introduction MICROPROCESSOR AND MICROCOMPUTER BASICS At present there are many types and sizes of computers available. These computers are designed and constructed based on digital and Integrated Circuit
Seeking Opportunities for Hardware Acceleration in Big Data Analytics
Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who
AMD Opteron Quad-Core
AMD Opteron Quad-Core a brief overview Daniele Magliozzi Politecnico di Milano Opteron Memory Architecture native quad-core design (four cores on a single die for more efficient data sharing) enhanced
Central Processing Unit (CPU)
Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following
The Evolution of CCD Clock Sequencers at MIT: Looking to the Future through History
The Evolution of CCD Clock Sequencers at MIT: Looking to the Future through History John P. Doty, Noqsi Aerospace, Ltd. This work is Copyright 2007 Noqsi Aerospace, Ltd. This work is licensed under the
FPGA. AT6000 FPGAs. Application Note AT6000 FPGAs. 3x3 Convolver with Run-Time Reconfigurable Vector Multiplier in Atmel AT6000 FPGAs.
3x3 Convolver with Run-Time Reconfigurable Vector Multiplier in Atmel AT6000 s Introduction Convolution is one of the basic and most common operations in both analog and digital domain signal processing.
Computer Organization and Components
Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides
All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule
All Programmable Logic Hans-Joachim Gelke Institute of Embedded Systems Institute of Embedded Systems 31 Assistants 10 Professors 7 Technical Employees 2 Secretaries www.ines.zhaw.ch Research: Education:
FPGA area allocation for parallel C applications
1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University
IA-64 Application Developer s Architecture Guide
IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution
OC By Arsene Fansi T. POLIMI 2008 1
IBM POWER 6 MICROPROCESSOR OC By Arsene Fansi T. POLIMI 2008 1 WHAT S IBM POWER 6 MICROPOCESSOR The IBM POWER6 microprocessor powers the new IBM i-series* and p-series* systems. It s based on IBM POWER5
Float to Fix conversion
www.thalesgroup.com Float to Fix conversion Fabrice Lemonnier Research & Technology 2 / Thales Research & Technology : Research center of Thales Objective: to propose technological breakthrough for the
Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor
Von der Hardware zur Software in FPGAs mit Embedded Prozessoren Alexander Hahn Senior Field Application Engineer Lattice Semiconductor AGENDA Overview Mico32 Embedded Processor Development Tool Chain HW/SW
Middleware. Peter Marwedel TU Dortmund, Informatik 12 Germany. technische universität dortmund. fakultät für informatik informatik 12
Universität Dortmund 12 Middleware Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: Alexandra Nolte, Gesine Marwedel, 2003 2010 年 11 月 26 日 These slides use Microsoft clip arts. Microsoft copyright
VLIW Processors. VLIW Processors
1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW
NIOS II Based Embedded Web Server Development for Networking Applications
NIOS II Based Embedded Web Server Development for Networking Applications 1 Sheetal Bhoyar, 2 Dr. D. V. Padole 1 Research Scholar, G. H. Raisoni College of Engineering, Nagpur, India 2 Professor, G. H.
Pipeline Hazards. Structure hazard Data hazard. ComputerArchitecture_PipelineHazard1
Pipeline Hazards Structure hazard Data hazard Pipeline hazard: the major hurdle A hazard is a condition that prevents an instruction in the pipe from executing its next scheduled pipe stage Taxonomy of
Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip
Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Ms Lavanya Thunuguntla 1, Saritha Sapa 2 1 Associate Professor, Department of ECE, HITAM, Telangana
IMPLEMENTATION OF FPGA CARD IN CONTENT FILTERING SOLUTIONS FOR SECURING COMPUTER NETWORKS. Received May 2010; accepted July 2010
ICIC Express Letters Part B: Applications ICIC International c 2010 ISSN 2185-2766 Volume 1, Number 1, September 2010 pp. 71 76 IMPLEMENTATION OF FPGA CARD IN CONTENT FILTERING SOLUTIONS FOR SECURING COMPUTER
Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek
Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture
Embedded Systems: map to FPGA, GPU, CPU?
Embedded Systems: map to FPGA, GPU, CPU? Jos van Eijndhoven [email protected] Bits&Chips Embedded systems Nov 7, 2013 # of transistors Moore s law versus Amdahl s law Computational Capacity Hardware
Bindel, Spring 2010 Applications of Parallel Computers (CS 5220) Week 1: Wednesday, Jan 27
Logistics Week 1: Wednesday, Jan 27 Because of overcrowding, we will be changing to a new room on Monday (Snee 1120). Accounts on the class cluster (crocus.csuglab.cornell.edu) will be available next week.
Driving force. What future software needs. Potential research topics
Improving Software Robustness and Efficiency Driving force Processor core clock speed reach practical limit ~4GHz (power issue) Percentage of sustainable # of active transistors decrease; Increase in #
Digital Hardware Design Decisions and Trade-offs for Software Radio Systems
Digital Hardware Design Decisions and Trade-offs for Software Radio Systems John Patrick Farrell This thesis is submitted to the Faculty of Virginia Polytechnic Institute and State University in partial
ELEC 5260/6260/6266 Embedded Computing Systems
ELEC 5260/6260/6266 Embedded Computing Systems Spring 2016 Victor P. Nelson Text: Computers as Components, 3 rd Edition Prof. Marilyn Wolf (Georgia Tech) Course Topics Embedded system design & modeling
CPU Organisation and Operation
CPU Organisation and Operation The Fetch-Execute Cycle The operation of the CPU 1 is usually described in terms of the Fetch-Execute cycle. 2 Fetch-Execute Cycle Fetch the Instruction Increment the Program
General Pipeline System Setup Information
Product Sheet General Pipeline Information Because of Pipeline s unique network attached architecture it is important to understand each component of a Pipeline system in order to create a system that
ARM Architecture. ARM history. Why ARM? ARM Ltd. 1983 developed by Acorn computers. Computer Organization and Assembly Languages Yung-Yu Chuang
ARM history ARM Architecture Computer Organization and Assembly Languages g Yung-Yu Chuang 1983 developed by Acorn computers To replace 6502 in BBC computers 4-man VLSI design team Its simplicity it comes
INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER
Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano [email protected] Prof. Silvano, Politecnico di Milano
Spacecraft Computer Systems. Colonel John E. Keesee
Spacecraft Computer Systems Colonel John E. Keesee Overview Spacecraft data processing requires microcomputers and interfaces that are functionally similar to desktop systems However, space systems require:
International Workshop on Field Programmable Logic and Applications, FPL '99
International Workshop on Field Programmable Logic and Applications, FPL '99 DRIVE: An Interpretive Simulation and Visualization Environment for Dynamically Reconægurable Systems? Kiran Bondalapati and
LMS is a simple but powerful algorithm and can be implemented to take advantage of the Lattice FPGA architecture.
February 2012 Introduction Reference Design RD1031 Adaptive algorithms have become a mainstay in DSP. They are used in wide ranging applications including wireless channel estimation, radar guidance systems,
MACHINE ARCHITECTURE & LANGUAGE
in the name of God the compassionate, the merciful notes on MACHINE ARCHITECTURE & LANGUAGE compiled by Jumong Chap. 9 Microprocessor Fundamentals A system designer should consider a microprocessor-based
Operating Systems 4 th Class
Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science
An Overview of Stack Architecture and the PSC 1000 Microprocessor
An Overview of Stack Architecture and the PSC 1000 Microprocessor Introduction A stack is an important data handling structure used in computing. Specifically, a stack is a dynamic set of elements in which
Software Pipelining. for (i=1, i<100, i++) { x := A[i]; x := x+1; A[i] := x
Software Pipelining for (i=1, i
Aims and Objectives. E 3.05 Digital System Design. Course Syllabus. Course Syllabus (1) Programmable Logic
Aims and Objectives E 3.05 Digital System Design Peter Cheung Department of Electrical & Electronic Engineering Imperial College London URL: www.ee.ic.ac.uk/pcheung/ E-mail: [email protected] How to go
Computer organization
Computer organization Computer design an application of digital logic design procedures Computer = processing unit + memory system Processing unit = control + datapath Control = finite state machine inputs
Course on Advanced Computer Architectures
Course on Advanced Computer Architectures Surname (Cognome) Name (Nome) POLIMI ID Number Signature (Firma) SOLUTION Politecnico di Milano, September 3rd, 2015 Prof. C. Silvano EX1A ( 2 points) EX1B ( 2
A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications
1 A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications Simon McIntosh-Smith Director of Architecture 2 Multi-Threaded Array Processing Architecture
Introduction to System-on-Chip
Introduction to System-on-Chip COE838: Systems-on-Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University
