Thread Level Parallelism (TLP)


Thread Level Parallelism (TLP)
Calcolatori Elettronici 2

TLP: SUN Microsystems vision (2004)

Roberto Giorgi, Universita di Siena, C208L15, Slide 2

Estimated Industry Trends
- Moore's Law allows for the rapid increase in transistors per core. TLP optimised cores will start out much simpler, and may grow complex more slowly.
- The trend is for chips and CPU cores to get smaller, though TLP optimised ones will start much smaller.
- Growth rates in maximum power for "fat" CPUs have levelled off a bit. For "thin" cores, the number of CPU cores per chip will probably increase rather than the power consumption per core.
- "Fat" cores need lots of cache to reduce memory latency. TLP optimised designs are less latency sensitive, so less cache is needed.
- Better process technology helps both types to increase, though the simpler, slower clocked "thin" cores will be slower on more traditional benchmarks.
- "Fat" cores will benefit from TLP techniques and general improvements, but not as much as "thin" cores.

Current 4-way SMP
- An illustration of a 4-way system today. The only TLP comes from having multiple chips.

Toward NIAGARA chips
- [Figure: an illustration of a system with a heavily optimised TLP design]

Niagara: A Torrent of Threads
- [Figure: Niagara floorplan]

First Niagara Chips: November 2005
- UltraSPARC T1
- Niagara systems deliver 14 times the performance of an UltraSPARC IIIi system
- Systems with the single-chip Niagara 2: 35 times
- Systems with Victoria Falls: 65 times

EMBEDDED SYSTEM TRENDS

Global Embedded Systems Revenue (by Region)
- AAGR: average annual growth rate
- [Figure: global embedded systems revenue ($ billions) and AAGR (%) by region: Americas, Europe, Japan, Asia-Pacific]
- Source: Future of Embedded Systems Technology, BCC Co, Inc., 2005

Global Embedded Systems Revenue (by Application)
- [Figure: world embedded systems revenue ($ billions) and AAGR (%) by application: Telecomm, Consumer, Automotive, Medical/Office, Industrial/Milit]
- Source: Future of Embedded Systems Technology, BCC Co, Inc., 2005

Global Embedded HW Revenue
- MPU: microprocessors
- MCU: microcontrollers
- [Figure: global embedded hardware revenue ($ billions) and AAGR (%) by category: MPU, MCU, DSP, Memory, ASIC/PLD, Analog]
- Source: Future of Embedded Systems Technology, BCC Co, Inc., 2005

Projected Technology Progress
- [Figure: transistor density of MPUs (including SRAM), in Mtransistors/cm², by year]
- Source: Process Integration, Devices and Structures, ITRS, 2005
- Transistor number will continue to scale for some time

Embedded Platforms Roadmap
- [Figure: use of embedded processors in FPGAs (0% to 100%): hard FPGA processor, soft FPGA processor, no FPGA processor]
- Hardwired logic (ASIC-like) is being replaced by embedded processor devices
- Source: Survey of System Design Trends, Celoxica Inc., August 2005

Embedded Processors: Innovation driven by Technology + Architecture Advances
- Multi-processing: higher throughput with less speed
- Source: The Era of Tera, Pat Gelsinger, Intel, 2005

Case Study: ITRS Mobile Handheld Roadmap
- [Table rows: year of production, process technology (nm), supply voltage (V), clock frequency (MHz), processing performance (GOPS), average power (W), standby power (mW), applications (real-time video codec, TV telephone)]
- Source: System Drivers, ITRS, 2003
- Performance and energy efficiency (GOPS/W) increase by 200x

ITRS Low-Power SoC
- Source: System Drivers, ITRS, 2005
- Many Processing Elements (PEs)
- Reusability and multi-standard requirements drive the move to programmable (processor-based) solutions
- (Heterogeneous) Multi-Processor systems-on-a-chip (SoC)

ITRS Low-Power SoC Processing/Performance Trends
- Source: System Drivers, ITRS, 2005
- > 100 Processing Elements in 2011!

Future Embedded System Design Trends
- The mobile handset market is the driving commercial factor
- New applications and wireless transmission standards require high-performance, low-power embedded computing
- ITRS foresees 3x magnitude improvement in performance and energy efficiency over the next 10 years
- (Heterogeneous) Multi-Processor system-on-chip platforms
- Compiler technologies for high-performance, low-power embedded computing will be needed
- Compiler and system-design tools for heterogeneous, massively parallel processing systems and networks

Network of Excellence HiPEAC
- High-Performance Embedded Architectures and Compilers
- IST
- Web Site

What We Have Now
- ACACES
  - Extranet (program, practical info, ...)
  - Participant management
- HiPEAC Conference
  - Extranet (committees, call for papers, practical info, ...)
  - Paper submission (Commence)

SARC: Scalable ARChitectures
- Web site:

Paradigm shift
- Tiled architecture, built from fixed size nodes
- The architecture scales up by adding nodes, NOT by growing the node size
- The node becomes the processor
- The processors become the functional units

Programming model features
- Programming model will have tagged procedure calls
- Define local and global (shared) variables
  - Defines address range(s) to copy to local store
  - Automatic programming of DMA transfers
  - Defines address range(s) to watch for interference
- Set procedure properties
  - Has secondary effects (modifies global state)
  - Reads global space
  - Writes global space
  - Requires atomicity
    - Regarding local variables
    - Regarding global variables
- Processor functionality requirements
  - Supports a specific ISA extension (or a different ISA)
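The idea of tagging procedures with their properties can be sketched in a few lines of Python. This is only an illustration of the concept: the slides do not define a concrete syntax, so the names `ProcTags`, `tagged` and `may_run_concurrently`, the property names, and the example address range are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical names throughout: the slides give no concrete syntax
# for tagged procedure calls.

@dataclass(frozen=True)
class ProcTags:
    reads_global: bool = False
    writes_global: bool = False
    side_effects: bool = False   # "has secondary effects"
    atomic: bool = False
    # (start, length) address ranges to copy into the local store via DMA
    local_ranges: Tuple[Tuple[int, int], ...] = ()

def tagged(**props) -> Callable:
    """Attach a ProcTags record to a procedure without changing its behaviour."""
    tags = ProcTags(**props)
    def wrap(fn):
        fn.tags = tags
        return fn
    return wrap

def may_run_concurrently(a, b) -> bool:
    """Conservative check a runtime could make before offloading two calls:
    they conflict if either has side effects, or if one writes global
    space while the other reads or writes it."""
    ta, tb = a.tags, b.tags
    if ta.side_effects or tb.side_effects:
        return False
    a_touches = ta.reads_global or ta.writes_global
    b_touches = tb.reads_global or tb.writes_global
    if ta.writes_global and b_touches:
        return False
    if tb.writes_global and a_touches:
        return False
    return True

@tagged(reads_global=True, local_ranges=((0x1000, 256),))
def dot_product(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

@tagged(writes_global=True, atomic=True)
def update_total(store, value):
    store["total"] = store.get("total", 0) + value
    return store["total"]
```

With these tags, two `dot_product` calls are reported as safe to run side by side, while `dot_product` and `update_total` are not, because one reads global space that the other writes.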

Intra-node memory hierarchy
- Architecture must be easy to program for: shared memory
- Accelerators may have:
  - Local memory
    - Private, non-coherent
  - DMA controller
    - Bridge between global memory and local memory
- Accelerators must have:
  - Global memory access
    - Directly, or through cache hierarchy
- Single load/store instruction
  - Address range differentiates local memory from global memory
- [Figure: accelerators (ACC) with local memory, DMA and cache(s), connected through a local interconnect to an outer shared cache]

Intra-node memory hierarchy (II)
- All caches inside a node must be coherent
- All outer caches (from each node) should also be coherent
- Caches work as shared distributed memory
- If threads do not share memory
  - There's no coherence traffic, nor overhead
  - There's no memory waste
- If threads share memory
  - Turning off coherence results in wrong execution
- What is the benefit of turning off coherence?
  - The hardware must be there anyway
  - Turn it off for power savings?
  - Lower memory access latency in non-shared mode?
  - Use the hardware for something else? (what else? additional storage?)
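The rule that a single load/store instruction is steered to local or global memory by its address range can be modelled roughly as follows. This is a toy sketch, not the SARC design: `NodeMemory`, `LOCAL_BASE` and `LOCAL_SIZE` are made-up names and values, memory is byte-granular, and the DMA engine is reduced to a copy loop.

```python
# LOCAL_BASE and LOCAL_SIZE are invented values for illustration only.
LOCAL_BASE = 0x1000_0000
LOCAL_SIZE = 256 * 1024  # 256 KiB private local store

class NodeMemory:
    """Toy model: one load/store interface, routed by address range."""

    def __init__(self):
        self.local = bytearray(LOCAL_SIZE)  # private, non-coherent
        self.global_mem = {}                # sparse global memory: addr -> byte

    @staticmethod
    def is_local(addr):
        return LOCAL_BASE <= addr < LOCAL_BASE + LOCAL_SIZE

    def load(self, addr):
        if self.is_local(addr):
            return self.local[addr - LOCAL_BASE]
        return self.global_mem.get(addr, 0)

    def store(self, addr, value):
        if self.is_local(addr):
            self.local[addr - LOCAL_BASE] = value & 0xFF
        else:
            self.global_mem[addr] = value & 0xFF

    def dma_get(self, global_addr, local_addr, n):
        """DMA as the bridge: copy n bytes from global to local memory."""
        if n <= 0 or not (self.is_local(local_addr)
                          and self.is_local(local_addr + n - 1)):
            raise ValueError("destination must lie inside the local store")
        for i in range(n):
            self.local[local_addr - LOCAL_BASE + i] = \
                self.global_mem.get(global_addr + i, 0)
```

Because the local store is private and non-coherent, a later store to the local copy does not change the global original; software has to write the data back explicitly if it wants the update to be visible.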

Examples for intra-node memory
- [Figure: example node configurations combining accelerators (ACC), local memory, DMA and cache(s) over a local interconnect and an outer shared cache]

Determine the node size
- If node size is fixed, we must determine its size
- Split available area among
  - Shared cache
  - Local interconnect
  - General purpose processor
  - Accelerators
- Fixed or flexible distribution?
  - Fixed GPP, cache, interconnect
  - Reconfigurable accelerator area
- How many accelerators can a thread actually exploit?
  - Streaming computation
  - Parallel computation
  - Task offloading
- [Figure: node with outer cache memory, local interconnect and GPP]

Node examples
- Sea of simple cores
  - Niagara
  - Cell
- Few complex cores
  - Power5
- Single vector/media/bio accelerator
- Multiple accelerators
- [Figure: node with outer cache memory, local interconnect and GPP]

GPP Accelerator interface
- For the processor to become the functional unit, task offloading must have minimum overhead
- Accelerator as ISA extension
  - Shares PC, Fetch & Decode with a general purpose CPU
  - Issue logic sends instructions to CPU or Accelerator Units
  - Implements an extension of the base ISA
- Accelerator as a new CPU
  - Has a separate PC, Fetch, Decode engine
  - May implement a completely different ISA
    - VLIW, SIMD, Stack, 16-bit
- [Figure: the two interface options, a shared Fetch & Dispatch feeding GPP and ACC versus an ACC with its own F & D, attached to the local interconnect and outer cache memory]

Memory Hierarchy
- [Figure: chip with DRAM, I/O, L3 cache and control blocks]
- Set of coherent (processor-shared?) L1 caches inside the nodes
  - C x Node
- Set of coherent node-shared L2 caches inside the chip (one from each node)
  - 1 x Node, N x Chip
- Chip-shared L3 cache
  - 1 x Chip
- Off-chip DRAM (or other memory technology)

Motivation
- Hard to further scale uniprocessors
  - Brought back focus to multiprocessors
- Different applications profit from different techniques/types of parallelism
  - ILP, TLP, DLP
- Motivates a customizable system with
  - complex cores
  - simple cores
  - domain-specific accelerators

Motivation (2)
- Parallelism type exhibited by application and suitable architecture:
- [Figure labels: TLP, SSC, CMP+ vector, DLP, SMT vector, FCC, ILP]

SARC?
- complex cores
- simple cores
- accelerators

ISA considerations
- Complex cores and simple cores have the same ISA (allows threads to move from one to another [for real-time performance, power, ...], and simplifies programming and compilation)
- ISA-agnostic
  - approaches applicable to basically any ISA (ARM, PowerPC, ...)
- Accelerator ISAs
  - extensions of GPP ISA
  - single instruction stream (co-processor instructions) or multiple instruction streams

How to realize customization?
- At design-time: the right mix of simple cores, complex cores and accelerators is determined at design-time
  - Pro: highest performance for specific application domains
  - Con: after fabrication, only for specific application domains
- At run-time: there will be many processing cores on a chip; for temperature reasons some will have to be powered down anyhow
  - Pro: allows good performance and low power on many applications
  - Con: performance not as high as at design-time

Levels of Abstraction
- Levels of abstraction:
  1. Architecture
  2. Microarchitecture
  3. Implementation
  4. Realization
- SARC WP1 focuses mainly on levels 1 and 2

SARC node architecture
- [Figure: SARC node architecture]

Architectures of Domain Specific Accelerators
- SARC specifically targets (but is not limited to) the application domains
  - scientific computing (supercomputing)
  - bioinformatics
  - multimedia
  - internet and transaction processing
- These domains contain code pieces responsible for a large fraction of execution time
- Performance and power-efficiency can be improved significantly by employing domain-specific accelerators

Scientific Computing Vector Accelerator Architecture
- For applications dominated by loops with vector operands
- What are the innovations:
  - Matrix by matrix operations (at least 2D)
  - Dimensionality not encoded in the instructions (novel register file to support this)
  - Sparse and dense matrices considered identically
  - Auto-indexing and sectioning addressing mechanisms (link to WP2)
  - (possible) on-chip distributed vector facility
- ISA, data formats, register file organization and memory addressing scheme under investigation

Scientific Computing Vector Accelerator Architecture (cont)
- ISA (check the document)
- Operand types: vectors, matrices (sparse and dense), bit vectors and scalars (in sparse mode, ½ of the available registers are used as index vectors)
- Data formats: 64 bit FP; 8, 16, 32 and 64 bit INT and BOOL
- Auto indexing for rectangular patterns (dense): [Figure]

Scientific Computing Vector Accelerator Architecture (cont)
- Register file: the SARC vector register file is a parameterizable register file, which can be logically reorganized by the programmer to support multiple register dimensions and sizes simultaneously
- Scalar reg. file shared with GPP
  1) Vector registers can overlap (think about it)
  2) Scalar registers can be used for conditional branches on the GPP side
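One way to picture a register file that the programmer reorganizes logically, and in which vector registers can overlap, is a flat physical storage with named windows defined on top of it. The sketch below is an assumption-laden illustration only: `VectorRegisterFile`, `define`, `get` and `set` are hypothetical names, row-major layout is assumed, and nothing here reflects the actual SARC register file organization.

```python
class VectorRegisterFile:
    """Flat physical storage with programmer-defined logical registers.

    Hypothetical model: each logical register is a (base, rows, cols)
    window over the same storage, laid out row-major.
    """

    def __init__(self, n_elements):
        self.storage = [0.0] * n_elements
        self.shape = {}  # register name -> (base, rows, cols)

    def define(self, name, base, rows, cols=1):
        if base < 0 or rows < 1 or cols < 1 \
                or base + rows * cols > len(self.storage):
            raise ValueError("register does not fit in physical storage")
        self.shape[name] = (base, rows, cols)

    def _index(self, name, r, c):
        base, rows, cols = self.shape[name]
        if not (0 <= r < rows and 0 <= c < cols):
            raise IndexError((r, c))
        return base + r * cols + c

    def get(self, name, r, c=0):
        return self.storage[self._index(name, r, c)]

    def set(self, name, r, c, value):
        self.storage[self._index(name, r, c)] = value


rf = VectorRegisterFile(16)
rf.define("M0", base=0, rows=2, cols=4)   # a 2x4 matrix register
rf.define("V0", base=4, rows=4)           # a 1D register over M0's second row
rf.set("V0", 1, 0, 7.0)                   # write through the vector view
```

After the last line, `rf.get("M0", 1, 1)` reads the same physical element that was written through `V0`, which is one consequence of overlapping registers worth thinking about: a row of a matrix register can be addressed as a vector without copying.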

Bioinformatics Accelerator
- Will have a scalar and a vector-SIMD part
- (Multiple) sequence alignment algorithms require:
  - support for efficient unaligned memory accesses
  - strided memory accesses
  - vector reduction operations, etc.
- In structure prediction, Monte Carlo or molecular dynamics simulations are common
  - can profit from earlier ASIC/FPGA work
- Docking profits from architectural features incorporated for structure prediction, but also from matrix rotations, transposes, ...

Multimedia accelerator
- Vector-SIMD architecture
- Architecture agnostic to physical vector length
- Avoid packing/unpacking, reorganization overhead
  - unpacking while loading
  - packing while storing
  - flexible access to register file
- Use more dimensions

Micro-architectural considerations
- Simple/complex GPP mixture
- Scalable cache coherence
- Support for (existing) sequential, single-threaded applications
  - Thread-level speculation
  - Kilo-instruction processors

I/O and Communication Subsystem
- Overheads of system call, context switch, interrupt, network protocol no longer justified
- With fewer threads than processing cores, no reason for switching execution context
- OS must not run on same processor as user applications
  - requires extra-low communication latency

Interconnection Network
- LANs/SANs are so fast that switching and routing have to be provided in hardware
  - but reliability and congestion control, left to the end-nodes, still need to be addressed
- Power considerations also
- Applies to multi-chip interconnection networks, but NoCs have to solve similar problems in a much more constrained environment

TRANSACTIONAL MEMORY
- The most difficult task when developing multithreaded applications is making sure that the program works (e.g. deadlocks may occur when combining correct code fragments)
- Transactional memory is a concurrency control mechanism for controlling access to shared memory
- A transaction is a piece of code that executes a series of reads and writes to shared memory, which logically occur at a single instant in time, and are typically implemented in a lock-free way
- Transactional memory is optimistic: every thread completes its modifications to shared memory without regard for what other threads might be doing, recording every read and write that it makes in a log; the log is validated in the commit stage
- Implementing part of the system memory as transactional memory could be the solution for storing shared data in parallel applications while simplifying programming
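The optimistic scheme described above (log every read and write, validate at commit, retry on conflict) can be sketched as a tiny software transactional memory. This is a minimal illustration under stated assumptions, not a real TM implementation: `TMemory`, `Transaction`, `Conflict` and `atomically` are hypothetical names, writes are buffered until commit, and the commit step is serialized by a single lock for simplicity, so unlike the typical implementations the slide mentions it is not lock-free.

```python
import threading

class Conflict(Exception):
    """Raised when read-set validation fails at commit time."""

class TMemory:
    """Shared memory: key -> (value, version). Hypothetical toy model."""
    def __init__(self):
        self._data = {}
        self._commit_lock = threading.Lock()  # simplification: not lock-free

    def read_versioned(self, key):
        return self._data.get(key, (0, 0))

class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_log = {}   # key -> version observed at first read
        self.write_log = {}  # key -> buffered new value

    def read(self, key):
        if key in self.write_log:          # read your own writes
            return self.write_log[key]
        value, version = self.mem.read_versioned(key)
        self.read_log.setdefault(key, version)
        return value

    def write(self, key, value):
        self.write_log[key] = value        # invisible to others until commit

    def commit(self):
        with self.mem._commit_lock:
            # Validate: everything we read must still be at the same version.
            for key, seen in self.read_log.items():
                if self.mem.read_versioned(key)[1] != seen:
                    raise Conflict(key)
            # Publish buffered writes, bumping each version.
            for key, value in self.write_log.items():
                _, version = self.mem.read_versioned(key)
                self.mem._data[key] = (value, version + 1)

def atomically(mem, body, max_retries=1000):
    """Run body(tx) as a transaction, re-executing it on conflict."""
    for _ in range(max_retries):
        tx = Transaction(mem)
        result = body(tx)
        try:
            tx.commit()
            return result
        except Conflict:
            continue
    raise RuntimeError("transaction aborted too many times")
```

A counter increment then becomes `atomically(mem, lambda tx: tx.write("x", tx.read("x") + 1))`: the body never takes a lock itself, and a transaction that raced with another writer is simply re-executed.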

Reflection
- PROBLEM: THINKING IN PARALLEL IS HARD!
- Perhaps: THINKING is hard! (YALE PATT - Sep.2007)


More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Xeon+FPGA Platform for the Data Center

Xeon+FPGA Platform for the Data Center Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system

More information

The new 32-bit MSP432 MCU platform from Texas

The new 32-bit MSP432 MCU platform from Texas Technology Trend MSP432 TM microcontrollers: Bringing high performance to low-power applications The new 32-bit MSP432 MCU platform from Texas Instruments leverages its more than 20 years of lowpower leadership

More information

LSN 2 Computer Processors

LSN 2 Computer Processors LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

More information

Optimizing Configuration and Application Mapping for MPSoC Architectures

Optimizing Configuration and Application Mapping for MPSoC Architectures Optimizing Configuration and Application Mapping for MPSoC Architectures École Polytechnique de Montréal, Canada Email : Sebastien.Le-Beux@polymtl.ca 1 Multi-Processor Systems on Chip (MPSoC) Design Trends

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

This Unit: Multithreading (MT) CIS 501 Computer Architecture. Performance And Utilization. Readings

This Unit: Multithreading (MT) CIS 501 Computer Architecture. Performance And Utilization. Readings This Unit: Multithreading (MT) CIS 501 Computer Architecture Unit 10: Hardware Multithreading Application OS Compiler Firmware CU I/O Memory Digital Circuits Gates & Transistors Why multithreading (MT)?

More information

COEN-4720 Embedded Systems Design Lecture 1 Introduction Fall 2016. Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University

COEN-4720 Embedded Systems Design Lecture 1 Introduction Fall 2016. Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University COEN-4720 Embedded Systems Design Lecture 1 Introduction Fall 2016 Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University 1 Outline What is an Embedded System (ES) Examples

More information

Multiprocessor System-on-Chip

Multiprocessor System-on-Chip http://www.artistembedded.org/fp6/ ARTIST Workshop at DATE 06 W4: Design Issues in Distributed, CommunicationCentric Systems Modelling Networked Embedded Systems: From MPSoC to Sensor Networks Jan Madsen

More information

Multithreading Lin Gao cs9244 report, 2006

Multithreading Lin Gao cs9244 report, 2006 Multithreading Lin Gao cs9244 report, 2006 2 Contents 1 Introduction 5 2 Multithreading Technology 7 2.1 Fine-grained multithreading (FGMT)............. 8 2.2 Coarse-grained multithreading (CGMT)............

More information

MCA Standards For Closely Distributed Multicore

MCA Standards For Closely Distributed Multicore MCA Standards For Closely Distributed Multicore Sven Brehmer Multicore Association, cofounder, board member, and MCAPI WG Chair CEO of PolyCore Software 2 Embedded Systems Spans the computing industry

More information

Processor Architectures

Processor Architectures ECPE 170 Jeff Shafer University of the Pacific Processor Architectures 2 Schedule Exam 3 Tuesday, December 6 th Caches Virtual Memory Input / Output OperaKng Systems Compilers & Assemblers Processor Architecture

More information

Microwatt to Megawatt - Transforming Edge to Data Centre Insights

Microwatt to Megawatt - Transforming Edge to Data Centre Insights Security Level: Public Microwatt to Megawatt - Transforming Edge to Data Centre Insights Steve Langridge steve.langridge@huawei.com May 3, 2015 www.huawei.com Agenda HW Acceleration System thinking Big

More information

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single

More information

GPUs for Scientific Computing

GPUs for Scientific Computing GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

More information

Networking Virtualization Using FPGAs

Networking Virtualization Using FPGAs Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

26 April (Next Friday)

26 April (Next Friday) MAXIMUM ADDITIONAL SCORE: 2 points Description: 1. Selection of a research paper of interest from a given list 2. Study of the selected paper and the referenced material 3. Presentation of the paper in

More information

An Overview of Stack Architecture and the PSC 1000 Microprocessor

An Overview of Stack Architecture and the PSC 1000 Microprocessor An Overview of Stack Architecture and the PSC 1000 Microprocessor Introduction A stack is an important data handling structure used in computing. Specifically, a stack is a dynamic set of elements in which

More information

Principles and characteristics of distributed systems and environments

Principles and characteristics of distributed systems and environments Principles and characteristics of distributed systems and environments Definition of a distributed system Distributed system is a collection of independent computers that appears to its users as a single

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

Power Reduction Techniques in the SoC Clock Network. Clock Power

Power Reduction Techniques in the SoC Clock Network. Clock Power Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a

More information

Multi-core Systems What can we buy today?

Multi-core Systems What can we buy today? Multi-core Systems What can we buy today? Ian Watson & Mikel Lujan Advanced Processor Technologies Group COMP60012 Future Multi-core Computing 1 A Bit of History AMD Opteron introduced in 2003 Hypertransport

More information

Introducing the Singlechip Cloud Computer

Introducing the Singlechip Cloud Computer Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology

More information

VLIW Processors. VLIW Processors

VLIW Processors. VLIW Processors 1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications 1 A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications Simon McIntosh-Smith Director of Architecture 2 Multi-Threaded Array Processing Architecture

More information

Operating Systems 4 th Class

Operating Systems 4 th Class Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science

More information

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP

More information

Architecture of Hitachi SR-8000

Architecture of Hitachi SR-8000 Architecture of Hitachi SR-8000 University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Slide 1 Most of the slides from Hitachi Slide 2 the problem modern computer are data

More information

Processor to Usher in a New Era of Computing

Processor to Usher in a New Era of Computing Project Denver Processor to Usher in a New Era of Computing Bill Dally January 5, 2011 http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/ Project Denver Announced

More information

Memory Architecture and Management in a NoC Platform

Memory Architecture and Management in a NoC Platform Architecture and Management in a NoC Platform Axel Jantsch Xiaowen Chen Zhonghai Lu Chaochao Feng Abdul Nameed Yuang Zhang Ahmed Hemani DATE 2011 Overview Motivation State of the Art Data Management Engine

More information

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001 Agenda Introduzione Il mercato Dal circuito integrato al System on a Chip (SoC) La progettazione di un SoC La tecnologia Una fabbrica di circuiti integrati 28 How to handle complexity G The engineering

More information

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

More information

Multicore and Parallel Processing

Multicore and Parallel Processing Multicore and Parallel Processing Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University P & H Chapter 4.10 11, 7.1 6 Administrivia FlameWar Games Night Next Friday, April 27 th 5pm

More information

Extending the Power of FPGAs. Salil Raje, Xilinx

Extending the Power of FPGAs. Salil Raje, Xilinx Extending the Power of FPGAs Salil Raje, Xilinx Extending the Power of FPGAs The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Development Agenda The Evolution of

More information

Embedded Systems: map to FPGA, GPU, CPU?

Embedded Systems: map to FPGA, GPU, CPU? Embedded Systems: map to FPGA, GPU, CPU? Jos van Eijndhoven jos@vectorfabrics.com Bits&Chips Embedded systems Nov 7, 2013 # of transistors Moore s law versus Amdahl s law Computational Capacity Hardware

More information

Hardware accelerated Virtualization in the ARM Cortex Processors

Hardware accelerated Virtualization in the ARM Cortex Processors Hardware accelerated Virtualization in the ARM Cortex Processors John Goodacre Director, Program Management ARM Processor Division ARM Ltd. Cambridge UK 2nd November 2010 Sponsored by: & & New Capabilities

More information

Computer Architecture. R. Poss

Computer Architecture. R. Poss Computer Architecture R. Poss 1 What is computer architecture? 2 Your ideas and expectations What is part of computer architecture, what is not? Who are computer architects, what is their job? What is

More information

The 5G Infrastructure Public-Private Partnership

The 5G Infrastructure Public-Private Partnership The 5G Infrastructure Public-Private Partnership NetFutures 2015 5G PPP Vision 25/03/2015 19/06/2015 1 5G new service capabilities User experience continuity in challenging situations such as high mobility

More information

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo

More information

Multicore Architectures

Multicore Architectures Multicore Architectures Week 1, Lecture 2 Multicore Landscape Intel Dual and quad-core Pentium family. 80-core demonstration last year. AMD Dual, triple (?!), and quad-core Opteron family. IBM Dual and

More information

Going Linux on Massive Multicore

Going Linux on Massive Multicore Embedded Linux Conference Europe 2013 Going Linux on Massive Multicore Marta Rybczyńska 24th October, 2013 Agenda Architecture Linux Port Core Peripherals Debugging Summary and Future Plans 2 Agenda Architecture

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano cristina.silvano@polimi.it Prof. Silvano, Politecnico di Milano

More information

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing Won-Jong Lee, Shi-Hwa Lee, Jae-Ho Nah *, Jin-Woo Kim *, Youngsam Shin, Jaedon Lee, Seok-Yoon Jung SAIT, SAMSUNG Electronics, Yonsei Univ. *,

More information