Embedded System Hardware - Processing -
|
|
|
- Gyles Dickerson
- 9 years ago
- Views:
Transcription
1 12 Embedded System Hardware - Processing - Peter Marwedel Informatik 12 TU Dortmund Germany Graphics: Alexandra Nolte, Gesine Marwedel, 年 11 月 15 日 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.
2 Key idea of very long instruction word (VLIW) computers Instructions included in long instruction packets. Instruction packets are assumed to be executed in parallel. Fixed association of packet bits with functional units
3 Very long instruction word (VLIW) architectures Very long instruction word ( instruction packet ) contains several instructions, all of which are assumed to be executed in parallel. Compiler is assumed to generate these parallel packets Complexity of finding parallelism is moved from the hardware (RISC/CISC processors) to the compiler; Ideally, this avoids the overhead (silicon, energy,..) of identifying parallelism at run-time. A lot of expectations into VLIW machines Explicitly parallel instruction set computers (EPICs) are an extension of VLIW architectures: parallelism detected by compiler, but no need to encode parallelism in 1 word
4 EPIC: TMS 320C6xx as an example Bit in each instruction encodes end of parallel execution Instr. A Instr. B Instr. C Instr. D Instr. E Instr. F Instr. G Cycle Instruction 1 A 2 B C D 3 E F G Instructions B, C and D use disjoint functional units, cross paths and other data path resources. The same is also true for E, F and G. Parallel execution cannot span several packets
5 Partitioned register files Many memory ports are required to supply enough operands per cycle. Memories with many ports are expensive. Registers are partitioned into (typically 2) sets, e.g. for TI C60x: - 5 -
6 More encoding flexibility with IA-64 Itanium 3 instructions per bundle: instruc 1 instruc 2 instruc 3 template There are 5 instruction types: A: common ALU instructions I: more special integer instructions (e.g. shifts) M: Memory instructions F: floating point instructions B: branches The following combinations can be encoded in templates: MII, MMI, MFI, MIB, MMB, MFB, MMF, MBB, BBB, MLX with LX = move 64-bit immediate encoded in 2 slots Instruction grouping information - 6 -
7 Templates and instruction types End of parallel execution called stops. Stops are denoted by underscores. Example: bundle 1 bundle 2 MMI M_II MFI_ MII MMI MIB_ Group 1 Group 2 Group 3 Very restricted placement of stops within bundle. Parallel execution within groups possible. Parallel execution can span several bundles - 7 -
8 Instruction types are mapped to functional unit types There are 4 functional unit (FU) types: M: Memory Unit I: Integer Unit F: Floating-Point Unit B: Branch Unit Instruction types corresponding FU type, except type A (mapping to either I or M-functional units)
9 L3 cache Implementation: Itanium 2 (2003) 410M transistors 374 mm 2 die size 6MB on-die L3 cache 1.5 GHz at 1.3V [ftp://download.intel.com/design/itaniu m2/download/madison_slides_r1.pdf] Intel,
10 Philips TriMedia- Processor For multimediaapplications, up up to to 5 instructions/ cycle. datasheets/pnx15xx_ser_n_3. pdf (incompatible with firefox?) NXP
11 Large # of delay slots, a problem of VLIW processors add sub and or sub mult xor div ld st mv beq
12 Large # of delay slots, a problem of VLIW processors add sub and or sub mult xor div ld st mv beq
13 Large # of delay slots, a problem of VLIW processors add sub and or sub mult xor div ld st mv beq The execution of many instructions has been started before it is realized that a branch was required. Nullifying those instructions would waste compute power Executing those instructions is declared a feature, not a bug. How to fill all delay slots with useful instructions? Avoid branches wherever possible
14 Predicated execution: Implementing IF-statements branch-free Conditional Instruction [c] I consists of: condition c instruction I c = true => I executed c = false => NOP
15 Predicated execution: Implementing IF-statements branch-free : TI C6x if (c) { a = x + y; b = x + z; } else { a = x - y; b = x - z; } Conditional branch [c] B L1 NOP 5 B L2 NOP 4 SUB x,y,a SUB x,z,b L1: ADD x,y,a ADD x,z,b L2: max. 12 cycles Predicated execution [c] ADD x,y,a [c] ADD x,z,b [!c] SUB x,y,a [!c] SUB x,z,b 1 cycle
16 Microcontrollers - MHS 80C51 as an example - 8-bit CPU optimised for control applications Extensive Boolean processing capabilities 64 k Program Memory address space 64 k Data Memory address space 4 k bytes of on chip Program Memory 128 bytes of on chip data RAM 32 bi-directional and individually addressable I/O lines Two 16-bit timers/counters Full duplex UART 6 sources/5-vector interrupt structure with 2 priority levels On chip clock oscillators Very popular CPU with many different variations Features for Embedded Systems Moved from
17 Trend: multiprocessor systems-on-a-chip (MPSoCs)
18 Multiprocessor systems-on-a-chip (MPSoCs) (2)
19 Multiprocessor systems-on-a-chip (MPSoCs) (3)
20 Multiprocessor systems-on-a-chip (MPSoCs) (4) Hugo De Man, IMEC, 2007 ~50% inherent power efficiency of silicon
21 12 Embedded System Hardware - Reconfigurable Hardware - Peter Marwedel Informatik 12 TU Dortmund Germany Graphics: Alexandra Nolte, Gesine Marwedel, 年 06 月 12 日 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.
22 Energy Efficiency of FPGAs inherent power efficiency of silicon Hugo De Man, IMEC, Philips,
23 Reconfigurable Logic Full custom chips may be too expensive, software too slow. Combine the speed of HW with the flexibility of SW HW with programmable functions and interconnect. Use of configurable hardware; common form: field programmable gate arrays (FPGAs) Applications: bit-oriented algorithms like encryption, fast object recognition (medical and military) Adapting mobile phones to different standards. Very popular devices from XILINX (XILINX Vertex II are recent devices) Actel, Altera and others
24 Floor-plan of VIRTEX II FPGAs More recent: Virtex 5, but no floor-plan found for Virtex
25 Virtex 5 Configurable Logic Block (CLB)
26 Virtex 5 Slice (simplified) Memories typically used as look-up tables to implement any Boolean function of 6 variables
27 Virtex 5 SliceM SliceM supports using memories for storing data and as shift registers
28 Resources available in Virtex 5 devices [ and source: Xilinx Inc.: Virtex 5 FPGA User Guide, May, 2009 //
29 Interconnect for Virtex II Hierarchical Routing Resources; no routing plan found for Virtex 5.
30 Virtex II Pro Devices include up to 4 PowerPC processor cores Virtex 5 Devices include up to 2 PowerPC processor cores [ and source: Xilinx Inc.: Virtex-II Pro Platform FPGAs: Functional Description, Sept. 2002, //
31 12 Memory Peter Marwedel Informatik 12 TU Dortmund Germany 2009/11/22
32 Memory Memories? Oops! Memories! For the memory, efficiency is again a concern: speed (latency and throughput); predictable timing energy efficiency size cost other attributes (volatile vs. persistent, etc)
33 Access times and energy consumption increases with the size of the memory Example (CACTI Model): "Currently, the size of some applications is doubling every 10 months" [STMicroelectronics, Medea+ Workshop, Stuttgart, Nov. 2003]
34 Access times and energy consumption for multi-ported register files Cycle Time (ns) Area (λ 2 x10 6 ) Power (W) Register File Size Register File Size GP6M2 GP6M3 Rixner s et al. model [HPCA 00], Technology of 0.18 µm Source and H. Valero,
35 Memory system frequently consumes >50 % of the energy used for processing 29% Cache ($)-less monoprocessor Processor Energy Main Mem. Energy 71% Multiprocessor with cache ($) 51,9% 28,1% Proc. Energy I-Cache Energy D-Cache Energy Main Mem. Energy Average over 200 benchmarks analyzed by Verma (U. Dortmund) 5,2% 14,8% [M. Verma, P. Marwedel: Advanced Memory Optimization Techniques for Low-Power Embedded Processors, Springer, 2007]
36 Similar information according to other sources EBOX 8% DMMU 8% Clock 10% IMMU 9% Others 5% Ibox 18% Strong ARM IEEE Journal of SSC Nov. 96 Icache 26% Dcache 16% [Based on slide by and : Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, U. of Massachusetts, Amherst, 2001] SysCtl 3% CP 15 2% PATag RAM 1% Clocks 4% BIU 8% arm9 25% Other 4% I MMU 4% D Cache 19% I Cache 25% D MMU 5% [Segars 01 according to Vahid@ISSS01]
37 Energy consumption in mobile devices [O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source
38 Trends for the Speeds Speed gap between processor and main DRAM increases Speed 1 CPU Performance (1.5-2 p.a.) 2x every 2 years DRAM (1.07 p.a.) years Similar problems also for embedded systems & MPSoCs In the future: Memory access times >> processor cycle times Memory wall problem [P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]
39 Set-associative cache n-way cache Set = 2 Address Tag Index way 0 $ ( ) way 1 Tags data block Tags data block = = 1 Data
40 Hierarchical memories using scratch pad memories (SPM) SPM is a small, physically separate memory mapped into the address space Hierarchy main SPM processor 0 FFF.. Address space scratch pad memory no tag memory select SPM Selection is by an appropriate address decoder (simple!) Example ARM7TDMI cores, wellknown for low power consumption
41 Comparison of currents using measurements E.g.: ATMEL board with ARM7TDMI and ext. SRAM Current 32 Bit-Load Instruction (Thumb) 200 ma ,2 82,2 1,16 48,2 50,9 44,4 53,1 Prog Main/ Data Main Prog Main/ Data SPM Prog SPM/ Data Main Prog SPM/ Data SPM Core+SPM (ma) Main Memory Current (ma)
42 Why not just use a cache? 2. Energy for parallel access of sets, in comparators, muxes. 9 Energy per access [nj] Scratch pad Cache, 2way, 4GB space Cache, 2way, 16 MB space Cache, 2way, 1 MB space memory size [R. Banakar, S. Steinke, B.-S. Lee, 2001]
43 Influence of the associativity Parameters different from previous slides [P. Marwedel et al., ASPDAC, 2004]
44 Summary Processing VLIW/EPIC processors MPSoCs FPGAs Memories Small is beautiful (in terms of energy consumption, access times, size)
Embedded System Hardware - Processing (Part II)
12 Embedded System Hardware - Processing (Part II) Jian-Jia Chen (Slides are based on Peter Marwedel) Informatik 12 TU Dortmund Germany Springer, 2010 2014 年 11 月 11 日 These slides use Microsoft clip arts.
7a. System-on-chip design and prototyping platforms
7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit
Networking Virtualization Using FPGAs
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,
VLIW Processors. VLIW Processors
1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW
Kirchhoff Institute for Physics Heidelberg
Kirchhoff Institute for Physics Heidelberg Norbert Abel FPGA: (re-)configuration and embedded Linux 1 Linux Front-end electronics based on ADC and digital signal processing Slow control implemented as
Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:
Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):
Introduction to Microprocessors
Introduction to Microprocessors Yuri Baida [email protected] [email protected] October 2, 2010 Moscow Institute of Physics and Technology Agenda Background and History What is a microprocessor?
COMPUTER HARDWARE. Input- Output and Communication Memory Systems
COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)
Architectures and Platforms
Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation
CHAPTER 7: The CPU and Memory
CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides
Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip
Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC
BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA
BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single
Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems
12 Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund, Informatik 12 Springer, 2010 2015 年 10 月 21 日 These slides
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit
Computer Architectures
Computer Architectures 2. Instruction Set Architectures 2015. február 12. Budapest Gábor Horváth associate professor BUTE Dept. of Networked Systems and Services [email protected] 2 Instruction set architectures
1. Introduction to Embedded System Design
1. Introduction to Embedded System Design Lothar Thiele ETH Zurich, Switzerland 1-1 Contents of Lectures (Lothar Thiele) 1. Introduction to Embedded System Design 2. Software for Embedded Systems 3. Real-Time
FPGA-based MapReduce Framework for Machine Learning
FPGA-based MapReduce Framework for Machine Learning Bo WANG 1, Yi SHAN 1, Jing YAN 2, Yu WANG 1, Ningyi XU 2, Huangzhong YANG 1 1 Department of Electronic Engineering Tsinghua University, Beijing, China
Aperiodic Task Scheduling
Aperiodic Task Scheduling Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund, Informatik 12 Germany Springer, 2010 2014 年 11 月 19 日 These slides use Microsoft clip arts. Microsoft copyright
Data Center and Cloud Computing Market Landscape and Challenges
Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada [email protected] Micaela Serra
All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule
All Programmable Logic Hans-Joachim Gelke Institute of Embedded Systems Institute of Embedded Systems 31 Assistants 10 Professors 7 Technical Employees 2 Secretaries www.ines.zhaw.ch Research: Education:
Central Processing Unit (CPU)
Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following
A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications
1 A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications Simon McIntosh-Smith Director of Architecture 2 Multi-Threaded Array Processing Architecture
FPGA Synthesis Example: Counter
FPGA Synthesis Example: Counter Peter Marwedel Informatik XII, U. Dortmund Gliederung Einführung SystemC Vorlesungen und Programmierung FPGAs - Vorlesungen - VHDL-basierte Konfiguration von FPGAs mit dem
IA-64 Application Developer s Architecture Guide
IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve
what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?
Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the
ARM Microprocessor and ARM-Based Microcontrollers
ARM Microprocessor and ARM-Based Microcontrollers Nguatem William 24th May 2006 A Microcontroller-Based Embedded System Roadmap 1 Introduction ARM ARM Basics 2 ARM Extensions Thumb Jazelle NEON & DSP Enhancement
This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo
Middleware. Peter Marwedel TU Dortmund, Informatik 12 Germany. technische universität dortmund. fakultät für informatik informatik 12
Universität Dortmund 12 Middleware Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: Alexandra Nolte, Gesine Marwedel, 2003 2010 年 11 月 26 日 These slides use Microsoft clip arts. Microsoft copyright
Am186ER/Am188ER AMD Continues 16-bit Innovation
Am186ER/Am188ER AMD Continues 16-bit Innovation 386-Class Performance, Enhanced System Integration, and Built-in SRAM Problem with External RAM All embedded systems require RAM Low density SRAM moving
Computer Organization and Architecture. Characteristics of Memory Systems. Chapter 4 Cache Memory. Location CPU Registers and control unit memory
Computer Organization and Architecture Chapter 4 Cache Memory Characteristics of Memory Systems Note: Appendix 4A will not be covered in class, but the material is interesting reading and may be used in
Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com
CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com Modern GPU
Chapter 2 Logic Gates and Introduction to Computer Architecture
Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are
Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik
Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik Contents Überblick: Aufbau moderner FPGA Einblick: Eigenschaften
Compiling PCRE to FPGA for Accelerating SNORT IDS
Compiling PCRE to FPGA for Accelerating SNORT IDS Abhishek Mitra Walid Najjar Laxmi N Bhuyan QuickTime and a QuickTime and a decompressor decompressor are needed to see this picture. are needed to see
FPGA area allocation for parallel C applications
1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University
Chapter 7 Memory and Programmable Logic
NCNU_2013_DD_7_1 Chapter 7 Memory and Programmable Logic 71I 7.1 Introduction ti 7.2 Random Access Memory 7.3 Memory Decoding 7.5 Read Only Memory 7.6 Programmable Logic Array 77P 7.7 Programmable Array
Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.
Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how
Design Cycle for Microprocessors
Cycle for Microprocessors Raúl Martínez Intel Barcelona Research Center Cursos de Verano 2010 UCLM Intel Corporation, 2010 Agenda Introduction plan Architecture Microarchitecture Logic Silicon ramp Types
AMD Opteron Quad-Core
AMD Opteron Quad-Core a brief overview Daniele Magliozzi Politecnico di Milano Opteron Memory Architecture native quad-core design (four cores on a single die for more efficient data sharing) enhanced
Instruction Set Design
Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,
Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah
(DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de [email protected] NIOS II 1 1 What is Nios II? Altera s Second Generation
Seeking Opportunities for Hardware Acceleration in Big Data Analytics
Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who
picojava TM : A Hardware Implementation of the Java Virtual Machine
picojava TM : A Hardware Implementation of the Java Virtual Machine Marc Tremblay and Michael O Connor Sun Microelectronics Slide 1 The Java picojava Synergy Java s origins lie in improving the consumer
MPSoC Virtual Platforms
CASTNESS 2007 Workshop MPSoC Virtual Platforms Rainer Leupers Software for Systems on Silicon (SSS) RWTH Aachen University Institute for Integrated Signal Processing Systems Why focus on virtual platforms?
1. Memory technology & Hierarchy
1. Memory technology & Hierarchy RAM types Advances in Computer Architecture Andy D. Pimentel Memory wall Memory wall = divergence between CPU and RAM speed We can increase bandwidth by introducing concurrency
OpenSPARC T1 Processor
OpenSPARC T1 Processor The OpenSPARC T1 processor is the first chip multiprocessor that fully implements the Sun Throughput Computing Initiative. Each of the eight SPARC processor cores has full hardware
Accelerate Cloud Computing with the Xilinx Zynq SoC
X C E L L E N C E I N N E W A P P L I C AT I O N S Accelerate Cloud Computing with the Xilinx Zynq SoC A novel reconfigurable hardware accelerator speeds the processing of applications based on the MapReduce
A SystemC Transaction Level Model for the MIPS R3000 Processor
SETIT 2007 4 th International Conference: Sciences of Electronic, Technologies of Information and Telecommunications March 25-29, 2007 TUNISIA A SystemC Transaction Level Model for the MIPS R3000 Processor
Price/performance Modern Memory Hierarchy
Lecture 21: Storage Administration Take QUIZ 15 over P&H 6.1-4, 6.8-9 before 11:59pm today Project: Cache Simulator, Due April 29, 2010 NEW OFFICE HOUR TIME: Tuesday 1-2, McKinley Last Time Exam discussion
More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
CPU Organization and Assembly Language
COS 140 Foundations of Computer Science School of Computing and Information Science University of Maine October 2, 2015 Outline 1 2 3 4 5 6 7 8 Homework and announcements Reading: Chapter 12 Homework:
Introducción. Diseño de sistemas digitales.1
Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1
SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing
SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing Won-Jong Lee 1, Youngsam Shin 1, Jaedon Lee 1, Jin-Woo Kim 2, Jae-Ho Nah 3, Seokyoon Jung 1, Shihwa Lee 1, Hyun-Sang Park 4, Tack-Don Han 2 1 SAMSUNG
FlexPath Network Processor
FlexPath Network Processor Rainer Ohlendorf Thomas Wild Andreas Herkersdorf Prof. Dr. Andreas Herkersdorf Arcisstraße 21 80290 München http://www.lis.ei.tum.de Agenda FlexPath Introduction Work Packages
SOC architecture and design
SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external
A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey
A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:
Performance of Host Identity Protocol on Nokia Internet Tablet
Performance of Host Identity Protocol on Nokia Internet Tablet Andrey Khurri Helsinki Institute for Information Technology HIP Research Group IETF 68 Prague March 23, 2007
İSTANBUL AYDIN UNIVERSITY
İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER
Software Pipelining. for (i=1, i<100, i++) { x := A[i]; x := x+1; A[i] := x
Software Pipelining for (i=1, i
Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001
Agenda Introduzione Il mercato Dal circuito integrato al System on a Chip (SoC) La progettazione di un SoC La tecnologia Una fabbrica di circuiti integrati 28 How to handle complexity G The engineering
Reconfigurable Computing. Reconfigurable Architectures. Chapter 3.2
Reconfigurable Architectures Chapter 3.2 Prof. Dr.-Ing. Jürgen Teich Lehrstuhl für Hardware-Software-Co-Design Coarse-Grained Reconfigurable Devices Recall: 1. Brief Historically development (Estrin Fix-Plus
Computer Architecture
Cache Memory Gábor Horváth 2016. április 27. Budapest associate professor BUTE Dept. Of Networked Systems and Services [email protected] It is the memory,... The memory is a serious bottleneck of Neumann
路 論 Chapter 15 System-Level Physical Design
Introduction to VLSI Circuits and Systems 路 論 Chapter 15 System-Level Physical Design Dept. of Electronic Engineering National Chin-Yi University of Technology Fall 2007 Outline Clocked Flip-flops CMOS
ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT
216 ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT *P.Nirmalkumar, **J.Raja Paul Perinbam, @S.Ravi and #B.Rajan *Research Scholar,
OPENSPARC T1 OVERVIEW
Chapter Four OPENSPARC T1 OVERVIEW Denis Sheahan Distinguished Engineer Niagara Architecture Group Sun Microsystems Creative Commons 3.0United United States License Creative CommonsAttribution-Share Attribution-Share
FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25
FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25 December 2014 FPGAs in the news» Catapult» Accelerate BING» 2x search acceleration:» ½ the number of servers»
DS1104 R&D Controller Board
DS1104 R&D Controller Board Cost-effective system for controller development Highlights Single-board system with real-time hardware and comprehensive I/O Cost-effective PCI hardware for use in PCs Application
Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor
Von der Hardware zur Software in FPGAs mit Embedded Prozessoren Alexander Hahn Senior Field Application Engineer Lattice Semiconductor AGENDA Overview Mico32 Embedded Processor Development Tool Chain HW/SW
- Nishad Nerurkar. - Aniket Mhatre
- Nishad Nerurkar - Aniket Mhatre Single Chip Cloud Computer is a project developed by Intel. It was developed by Intel Lab Bangalore, Intel Lab America and Intel Lab Germany. It is part of a larger project,
In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller
In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency
Memory Architecture and Management in a NoC Platform
Architecture and Management in a NoC Platform Axel Jantsch Xiaowen Chen Zhonghai Lu Chaochao Feng Abdul Nameed Yuang Zhang Ahmed Hemani DATE 2011 Overview Motivation State of the Art Data Management Engine
9/14/2011 14.9.2011 8:38
Algorithms and Implementation Platforms for Wireless Communications TLT-9706/ TKT-9636 (Seminar Course) BASICS OF FIELD PROGRAMMABLE GATE ARRAYS Waqar Hussain [email protected] Department of Computer
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University [email protected].
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University [email protected] Review Computers in mid 50 s Hardware was expensive
Defining Platform-Based Design. System Definition. Platform Based Design What is it? Platform-Based Design Definitions: Three Perspectives
Based Design What is it? Question: How many definitions of Based Design are there? Defining -Based Design Answer: How many people to you ask? What does the confusion mean? It is a definition in transition
Computer Architecture TDTS10
why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers
GPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
Computer Organization and Components
Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides
System on Chip Platform Based on OpenCores for Telecommunication Applications
System on Chip Platform Based on OpenCores for Telecommunication Applications N. Izeboudjen, K. Kaci, S. Titri, L. Sahli, D. Lazib, F. Louiz, M. Bengherabi, *N. Idirene Centre de Développement des Technologies
Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education
Lesson 7: SYSTEM-ON ON-CHIP (SoC( SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY 1 VLSI chip Integration of high-level components Possess gate-level sophistication in circuits above that of the counter,
Q. Consider a dynamic instruction execution (an execution trace, in other words) that consists of repeats of code in this pattern:
Pipelining HW Q. Can a MIPS SW instruction executing in a simple 5-stage pipelined implementation have a data dependency hazard of any type resulting in a nop bubble? If so, show an example; if not, prove
FPGA Design From Scratch It all started more than 40 years ago
FPGA Design From Scratch It all started more than 40 years ago Presented at FPGA Forum in Trondheim 14-15 February 2012 Sven-Åke Andersson Realtime Embedded 1 Agenda Moore s Law Processor, Memory and Computer
Network Traffic Monitoring an architecture using associative processing.
Network Traffic Monitoring an architecture using associative processing. Gerald Tripp Technical Report: 7-99 Computing Laboratory, University of Kent 1 st September 1999 Abstract This paper investigates
Chapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
Hardware Task Scheduling and Placement in Operating Systems for Dynamically Reconfigurable SoC
Hardware Task Scheduling and Placement in Operating Systems for Dynamically Reconfigurable SoC Yuan-Hsiu Chen and Pao-Ann Hsiung National Chung Cheng University, Chiayi, Taiwan 621, ROC. [email protected]
White Paper FPGA Performance Benchmarking Methodology
White Paper Introduction This paper presents a rigorous methodology for benchmarking the capabilities of an FPGA family. The goal of benchmarking is to compare the results for one FPGA family versus another
EE361: Digital Computer Organization Course Syllabus
EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)
How To Write Security Enhanced Linux On Embedded Systems (Es) On A Microsoft Linux 2.2.2 (Amd64) (Amd32) (A Microsoft Microsoft 2.3.2) (For Microsoft) (Or
Security Enhanced Linux on Embedded Systems: a Hardware-accelerated Implementation Leandro Fiorin, Alberto Ferrante Konstantinos Padarnitsas, Francesco Regazzoni University of Lugano Lugano, Switzerland
Power Reduction Techniques in the SoC Clock Network. Clock Power
Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a
Model-based system-on-chip design on Altera and Xilinx platforms
CO-DEVELOPMENT MANUFACTURING INNOVATION & SUPPORT Model-based system-on-chip design on Altera and Xilinx platforms Ronald Grootelaar, System Architect [email protected] Agenda 3T Company profile Technology
Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi University of Utah & HP Labs 1 Large Caches Cache hierarchies
Five Families of ARM Processor IP
ARM1026EJ-S Synthesizable ARM10E Family Processor Core Eric Schorn CPU Product Manager ARM Austin Design Center Five Families of ARM Processor IP Performance ARM preserves SW & HW investment through code
Microprocessor and Microcontroller Architecture
Microprocessor and Microcontroller Architecture 1 Von Neumann Architecture Stored-Program Digital Computer Digital computation in ALU Programmable via set of standard instructions input memory output Internal
ARM Architecture. ARM history. Why ARM? ARM Ltd. 1983 developed by Acorn computers. Computer Organization and Assembly Languages Yung-Yu Chuang
ARM history ARM Architecture Computer Organization and Assembly Languages g Yung-Yu Chuang 1983 developed by Acorn computers To replace 6502 in BBC computers 4-man VLSI design team Its simplicity it comes
Optimizing Configuration and Application Mapping for MPSoC Architectures
Optimizing Configuration and Application Mapping for MPSoC Architectures École Polytechnique de Montréal, Canada Email : [email protected] 1 Multi-Processor Systems on Chip (MPSoC) Design Trends
Building Blocks for PRU Development
Building Blocks for PRU Development Module 1 PRU Hardware Overview This session covers a hardware overview of the PRU-ICSS Subsystem. Author: Texas Instruments, Sitara ARM Processors Oct 2014 2 ARM SoC
Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng
Architectural Level Power Consumption of Network Presenter: YUAN Zheng Why Architectural Low Power Design? High-speed and large volume communication among different parts on a chip Problem: Power consumption
