First In Vivo Medical Images Using Photon-Counting, Real-Time GPU Reconstruction

1 First In Vivo Medical Images Using Photon-Counting, Real-Time GPU Reconstruction
A.P. Lowell, P. Kahn, J. Ku
25 March 2014

2 Overview
- Application
- Algorithms
- History and Limitations of Traditional Processors
- GPU Solution

4 Application General: Cardiac Fluoroscopy System

5 Application Makes Live Video of a Beating Heart

6 Application: General: Overview
- Used for non-surgical cardiac procedures
  - Assessment of stenosis
  - Angioplasty
  - Stent placement
- Real-time digital X-ray imaging
- High data throughput and processing
- Multiple equipment racks and custom enclosures across several rooms

7 Application: Functional Blocks
[Block diagram. Areas: Control Room, Equipment Space, Exam Area, Facility Installations. Blocks: Image Displays, Display Processor, User Controls, Equipment Racks, Imaging Chassis, I/O, Gantry, Gantry C-Arm, X-ray Detector, X-ray Tube, Heat Exchanger, HVPS, PDU, UPS, Gantry Pedestal Motion Control, Patient Table Motion Control.]

8 Application: Background
- Triple Ring Technologies is a contract R&D firm specializing in sensor-based systems
- Project was funded by:
  - NovaRay, Inc.
  - National Institutes of Health
- Clinical work at University of Wisconsin, Madison
- Initial implementation by MultiCoreWare, Inc.

9 Application: Performance Summary
Real-time tomosynthesis
- Input: continuous sensor images
  - ~123 billion rays/second
  - ~40 Gbps sensor downlink rate
  - 640x320 photon-counting sensor array
  - 10,000 scanned source locations
  - 1.28 μs/snapshot
- Output: live video
  - x1000-pixel focal planes internally
  - x1000-pixel best-focus image output
  - 30 frames/second
- ~1.4 trillion mathematical operations per second
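The timing figures above imply how often the source array can be re-scanned within one video frame. The following is a minimal sketch of that arithmetic, assuming one detector snapshot per source location (the slide does not state this explicitly). The function names are illustrative only.

```python
# Figures quoted on the performance-summary slide.
SOURCE_LOCATIONS = 10_000    # scanned source locations
SNAPSHOT_SECONDS = 1.28e-6   # 1.28 microseconds per snapshot
FRAME_RATE_HZ = 30           # output frames per second


def full_scan_seconds(locations=SOURCE_LOCATIONS, snapshot=SNAPSHOT_SECONDS):
    """Time for one pass over every source location.

    Assumption: exactly one snapshot is taken per source location.
    """
    return locations * snapshot


def frame_period_seconds(rate_hz=FRAME_RATE_HZ):
    """Duration of one output video frame."""
    return 1.0 / rate_hz


scan = full_scan_seconds()
frame = frame_period_seconds()
passes_per_frame = frame / scan

print(f"one pass over all source locations: {scan * 1e3:.1f} ms")
print(f"frame period at {FRAME_RATE_HZ} fps: {frame * 1e3:.1f} ms")
print(f"passes that fit in one frame: {passes_per_frame:.2f}")
```

Under that assumption a single pass takes about 12.8 ms, so more than one pass fits in each 33.3 ms frame, which is consistent with the re-scan requirement discussed under Temporal Constraints.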

10 Application: Technological Novelty: Scanning-Beam Digital X-ray (SBDX)
- Geometry
  - Traditional X-ray: point X-ray source; large-area detector close to patient
  - SBDX: large-area X-ray source; small-area detector far from patient
- Imaging information
  - Traditional X-ray: flat projection
  - SBDX: 3-D
- Dose
  - Traditional X-ray: acceptable
  - SBDX: 80% to 90% less

11 Application: Technological Novelty: Reverse Geometry
[Figure panels: Standard Geometry, Reverse Geometry]

12 Application: Technological Novelty: Reverse Geometry
- Scattered x-rays miss the detector: less noise!
[Figure panels: Standard Geometry, Reverse Geometry]

13 Application: Technological Novelty: Reverse Geometry
- Multiple source perspectives
- 3D tomography
[Figure panels: Standard Geometry, Reverse Geometry]

15 Algorithms: Tomosynthesis: Focal Planes
- Images must be reconstructed
- Within a focal plane: rays from a set of source/detector combinations converge to the same pixel, giving constructive reinforcement
- Outside the focal plane: rays from the same set of source/detector combinations diverge into different pixels; the result is blurring
- Rate of divergence defines depth-of-field
- Requires multiple focal planes to image the full volume
[Figure labels: Source Locations S1, S2, S3; D1, D2, D3; Detector Plane; High Plane; Focal-Plane; Low Plane]
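The focus and blur behaviour described above can be illustrated with a toy 1-D back-projection. This is a sketch under simplifying assumptions (an idealised point absorber, a continuous detector coordinate, nearest-pixel accumulation, made-up dimensions), not the reconstruction used in the system. All names and numbers are hypothetical.

```python
import math

D = 100.0  # distance from source plane (z=0) to detector plane (z=D); arbitrary units, assumed


def ray_x(s, d, z):
    """Lateral position where the ray from source s (z=0) to detector point d (z=D) crosses height z."""
    return s + (d - s) * z / D


def rays_through_point(x0, z0, sources, det_lo, det_hi):
    """Return (s, d) pairs for rays that pass through (x0, z0) and land on the detector."""
    rays = []
    for s in sources:
        d = s + (x0 - s) * D / z0
        if det_lo <= d <= det_hi:
            rays.append((s, d))
    return rays


def back_project(rays, z, n_pixels):
    """Accumulate one count per ray into the nearest pixel of the plane at height z."""
    image = [0.0] * n_pixels
    for s, d in rays:
        idx = math.floor(ray_x(s, d, z) + 0.5)
        if 0 <= idx < n_pixels:
            image[idx] += 1.0
    return image


# A wide scanned source and a small detector, as in the reverse geometry.
sources = [float(s) for s in range(0, 101, 5)]
rays = rays_through_point(x0=50.0, z0=40.0, sources=sources, det_lo=20.0, det_hi=80.0)

in_focus = back_project(rays, z=40.0, n_pixels=101)      # plane containing the object
out_of_focus = back_project(rays, z=70.0, n_pixels=101)  # a different plane

print("rays through the object:       ", len(rays))
print("peak in the focal plane:       ", max(in_focus))
print("peak in an out-of-focus plane: ", max(out_of_focus))
```

In the plane that contains the object every ray lands on the same pixel, while in the other plane the same rays spread across separate pixels, which is the blurring the slide describes.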

16 Algorithms: Tomosynthesis: Digital Lens
- Mapping of rays to image pixels is the virtual equivalent of having a physical lens at the detector plane to bend the rays onto a focal plane
- Changing the bending characteristic of the virtual lens (i.e., the mapping function) creates different focal planes
[Figure labels: Source Plane, Focal-Plane, Detector Plane, Virtual Lens, Virtual Image Plane, A, B]

17 Algorithms: Tomosynthesis: Digital Lens
[Two figure panels. Focal Plane A: A in focus on the virtual image plane. Focal Plane B: B in focus on the virtual image plane. Labels in both panels: Source Plane, Focal-Plane A, Focal-Plane B, Detector Plane, Virtual Image Plane.]

18 Application Tomosynthesis: Focal Plane Example

19 Algorithms: Reconstruction By Projection
- Geometric projection based on ray-tracing
- Both projection coefficients and extent vary with focal plane

20 Algorithms: Reconstruction by Projection: Basic Geometry
- Rays from each source location to each detector element intersect the focal plane within some window that spans a (typically) non-integer number of pixels
[Figure labels: Detector Elements, Focal-Plane Pixels, Source Locations]

21 Algorithms: Tomosynthesis: Basic Geometry
- Windows from adjacent detector elements will (in general) overlap at the boundary pixels
- Overlap is not constant: the projection kernel varies between detector elements
[Figure labels: Detector Elements, Focal-Plane Pixels, Source Locations]
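The projection windows described on the two Basic Geometry slides above can be sketched in one dimension. This is an illustrative example under assumed geometry (source plane at z=0, detector plane at z=D, unit-width pixels), not the system's projection kernel. The helper names and the numbers are hypothetical.

```python
import math


def project_window(s, d_lo, d_hi, z, D):
    """Project a detector element with edges [d_lo, d_hi] from source s onto the plane at height z."""
    a = s + (d_lo - s) * z / D
    b = s + (d_hi - s) * z / D
    return (min(a, b), max(a, b))


def pixel_weights(x_lo, x_hi, pixel_size=1.0):
    """Fraction of the window [x_lo, x_hi] that falls in each pixel it touches."""
    width = x_hi - x_lo
    weights = {}
    first = math.floor(x_lo / pixel_size)
    last = math.ceil(x_hi / pixel_size)
    for p in range(first, last):
        left = max(x_lo, p * pixel_size)
        right = min(x_hi, (p + 1) * pixel_size)
        if right > left:
            weights[p] = (right - left) / width
    return weights


# Two adjacent detector elements seen from the same source location.
w_first = pixel_weights(*project_window(s=0.0, d_lo=10.0, d_hi=12.0, z=70.0, D=100.0))
w_second = pixel_weights(*project_window(s=0.0, d_lo=12.0, d_hi=14.0, z=70.0, D=100.0))

print("element 1 weights:", w_first)
print("element 2 weights:", w_second)
print("shared pixels:", sorted(set(w_first) & set(w_second)))
```

Each window here is 1.4 pixels wide, so it cannot align with the pixel grid, and the two adjacent elements both contribute to the boundary pixel with different weights.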

22 Algorithms: Reconstruction by Projection: Basic Geometry
- Windows from adjacent source locations will overlap
- Multiple detector samples for each reconstructed image pixel
[Figure labels: Detector Elements, Focal-Plane Pixels, Source Locations]

23 Algorithms Reconstruction By Projection: Rotated Detector Rotation of detector improves sampling as projection advances across the image

24 Algorithms: Reconstruction By Projection: Rotated Detector
- Rotation of detector improves sampling as projection advances across the image
- However, a given detector row or column no longer maps consistently onto a pixel row or column: the pixel row indices change with detector column, and vice-versa

25 Algorithms: Tomosynthesis = CT? No
- Perspective
  - CT: parallel to rays
  - SBDX: perpendicular to rays
- Sample rate
  - CT: <~500 Msps (high-end)
  - SBDX: 7.7 Gsps
- Response time
  - CT: ASAP
  - SBDX: 30 fps, < 100 ms latency
- Projection geometry
  - CT: irregular; varies with rotation angle
  - SBDX: regular; integer source step-size
- Reconstruction
  - CT: correct geometric distortion; filtered back-projection
  - SBDX: allow geometric distortion; unfiltered back-projection

26 Algorithms Plane-Selection Single Focal-Plane Best Focus

27 Algorithms: Plane-Selection
- Detect features of interest (things "in focus") in each focal plane
  - Algorithms may include matched filters, gradient estimation, topological operators, ...
  - Calculate figures-of-merit
- Major impediments
  - High levels of Poisson noise in dark regions
  - Low contrast for small features
- Select which plane to display in final image on a pixel-by-pixel basis
  - Plane-to-plane comparison over a large number of planes
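The pixel-by-pixel selection step above can be sketched as a per-pixel comparison of figures-of-merit across planes. The example below uses a simple gradient estimate as the merit and works on 1-D rows for brevity. The merit function, the data, and the names are assumptions for illustration; the slides do not specify which detectors the system uses.

```python
def gradient_merit(plane):
    """Per-pixel figure of merit: absolute central difference (a hypothetical choice)."""
    n = len(plane)
    merit = []
    for i in range(n):
        left = plane[max(i - 1, 0)]
        right = plane[min(i + 1, n - 1)]
        merit.append(abs(right - left))
    return merit


def best_focus(planes):
    """For each pixel, pick the plane with the highest merit; ties go to the lowest plane index."""
    merits = [gradient_merit(p) for p in planes]
    image, chosen = [], []
    for i in range(len(planes[0])):
        k = max(range(len(planes)), key=lambda j: merits[j][i])
        chosen.append(k)
        image.append(planes[k][i])
    return image, chosen


# Plane 0 has a sharp edge on the left, plane 1 has sharp edges on the right.
plane_a = [0, 0, 8, 8, 3, 3, 3, 3]
plane_b = [2, 2, 2, 2, 2, 9, 9, 2]

image, chosen = best_focus([plane_a, plane_b])
print("plane chosen per pixel:", chosen)
print("best-focus row:        ", image)
```

A real detector would also have to cope with the impediments listed above; a bare gradient like this one responds strongly to Poisson noise in dark regions.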

28 Application Live Image from GPU system

29 Algorithms: Other Processing
- Artifact removal (per focal plane)
  - Residue of reconstruction methods: pattern noise, gain corrections
- Dynamic range adjustment (per focal plane)
  - Typical image dynamic range is far in excess of display capabilities and of the human visual system
- Noise management
  - Noise is dominated by photon statistics rather than by scatter
- User-applied filters
  - Temporal averaging with motion-detection
  - Edge enhancement
  - Contrast enhancement
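As an illustration of the "temporal averaging with motion-detection" filter listed above, the sketch below blends each pixel with its previous value unless the change is large enough to be treated as motion. The blend factor, the threshold, and the function name are assumed values for demonstration, not parameters from the system.

```python
def temporal_filter(previous, current, alpha=0.25, motion_threshold=20.0):
    """Recursive temporal average that resets where motion is detected.

    previous: filtered pixel values from the last frame
    current:  new pixel values for this frame
    alpha:    weight given to the new sample when averaging (assumed)
    motion_threshold: change above which a pixel is treated as moving (assumed)
    """
    out = []
    for p, c in zip(previous, current):
        if abs(c - p) > motion_threshold:
            out.append(float(c))             # motion: follow the new sample to avoid lag
        else:
            out.append(p + alpha * (c - p))  # static: average to suppress noise
    return out


filtered = temporal_filter([100.0, 100.0], [104.0, 200.0])
print(filtered)
```

The first pixel changes by a small amount and is smoothed, while the second changes by more than the threshold and is passed through unaveraged.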

30 Algorithms: Temporal Constraints
- Thermal loading of x-ray target mandates re-scan of source locations
  - Previously-reconstructed pixels must be re-visited
  - Requires a large fraction of the final image to remain resident in memory for re-scanning
- Real-time feedback of physical manipulations: hand-eye coordination for the surgeon
  - Imposes maximum latency requirement of < ~100 ms along with sustained 30 Hz frame rate

32 History: Previous Implementations
~10x increase in resolution/calculations per generation
- 1st and 2nd Generations
  - FPGAs: fully-custom parallel pipelines
  - > $15k/focal-plane x 16 focal-planes = > $240k/system
  - Memory-constrained
  - Development and maintenance difficult
- 3rd Generation
  - MPPA (Ambric/Nethra): 336 processors with local memory and flexible data distribution mesh
  - Obsolete architecture
  - Still used FPGAs for input formatting/post-processing
  - ~$1500/focal-plane x 32 focal-planes = ~$48k/system
  - Proprietary development environment

33 History: Generation 2
- Blue: FPGAs, 1 focal plane/board
- Green: FPGAs
  - Artifact removal
  - Dynamic range management
- Separate board for plane selection

34 History: Generation 3
- Blue: MPPAs, 1 focal plane/chip
- Green: FPGAs
  - Data input/format
  - Artifact removal
  - Dynamic range management
- Same board used for plane selection

35 History: Traditional Processors
- Previous attempts to map algorithms to common commercial processors failed: DSP, Cell, GPU
- Limitations:
  - I/O: bandwidth
  - Memory: available resources (buffer results for many focal planes)
  - Memory: cache sizing (fall off the cache)
  - Memory: burst optimization; 2-D array access gives adjacent accesses in one dimension but not in the other
  - Degree of management required by slower host processors
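The burst-optimization limitation above follows from how a 2-D image is laid out in linear memory. A small sketch, assuming row-major storage, a 1000-pixel row, and 4-byte pixels (all three are assumptions for illustration):

```python
WIDTH = 1000          # pixels per row (assumed)
BYTES_PER_PIXEL = 4   # assumed 32-bit pixels


def linear_index(row, col, width=WIDTH):
    """Row-major position of pixel (row, col) in a flat buffer."""
    return row * width + col


# Address step when moving to the neighbouring pixel in each dimension.
step_along_row = (linear_index(0, 1) - linear_index(0, 0)) * BYTES_PER_PIXEL
step_along_column = (linear_index(1, 0) - linear_index(0, 0)) * BYTES_PER_PIXEL

print("step between neighbours in a row:   ", step_along_row, "bytes")
print("step between neighbours in a column:", step_along_column, "bytes")
```

Neighbours along a row sit next to each other and can share a memory burst, while neighbours along a column are a full row apart, so a projection window that spans both dimensions cannot be fetched with purely contiguous accesses.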

37 What is our configuration? GPU Solution

38 GPU Solution: What is our configuration?
- 9x K20: ~$850/focal-plane x 32 planes = ~$27k/system
- 1x GTX680 (for managing displays)
- PCIe 2.0 backplane
- Red Hat on Supermicro
- CUDA 5
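The per-system costs quoted on this slide and on the Previous Implementations slide can be checked with a few lines of arithmetic. The per-plane prices and plane counts are taken from the slides (the FPGA figure is stated there as a lower bound); the labels and the helper name are illustrative.

```python
# (label, approximate cost per focal plane in USD, focal planes per system)
generations = [
    ("Gen 1-2 (FPGA)", 15_000, 16),  # slide gives "> $15k", so this is a lower bound
    ("Gen 3 (MPPA)", 1_500, 32),
    ("GPU (K20)", 850, 32),
]


def system_cost(per_plane, planes):
    """Processing cost for one system: cost per focal plane times number of planes."""
    return per_plane * planes


costs = {label: system_cost(per_plane, planes) for label, per_plane, planes in generations}
for label, cost in costs.items():
    print(f"{label}: ${cost:,}")
```

The products come to $240,000, $48,000, and $27,200, matching the rounded figures on the slides.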

39 GPU Solution: Logical Configuration and Data Flow
[Data-flow diagram. Blocks: X-ray Detector, X-ray Source, Re-scan Aggregator, System Controller (Mediation), Ethernet Switch, Image Reconstruction (8x K20), Supermicro host, K20 (Artifact Removal, Dynamic Range Management, Plane-Selection), GTX680, Disk Array, Display, External System. Links: Fiber (framing), PCIe (multi-cast, RDMA), 1000-base-T, HDMI, GigE Vision.]

40 GPU Solution: Physical Configuration
[Wiring diagram. Blocks: X-ray Detector, X-ray Source, Re-scan Aggregator, Image Reconstruction in two PCIe Chassis (Cubix 8) with K20 GPUs on x16 links behind PCIe Switches (Gen 2), Host Computer (µp, K20, GTX 680, PCIe Switch (Gen 3); Artifact Removal, Dynamic Range Management, Plane-Selection, Display), Disk Array, Ethernet Switch, System Controller. Links: x8 (Gen 1) interconnect, x16 PCIe carrying Multi-Cast Sensor Data, 1000-base-T, GigE Vision Display.]

41 GPU Solution: What is new now that allows it to work?
- Gen 2 PCIe interface with multi-cast
  - High-enough bandwidth
  - All planes use the same data set and must receive the same data stream
- GPU Direct or Remote DMA (RDMA)
  - Allows source data streaming directly to GPUs, bypassing the host
- Dynamic Parallelism
  - Decreases latency by allowing management of parallel operations without host intervention
- Significant increase in fast shared memory
- Significant increase in core density
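A rough bandwidth estimate shows why multi-cast matters when every GPU needs the same sensor stream. The sketch below uses the ~40 Gbps sensor rate from the performance summary and the standard PCIe Gen 2 figures (5 GT/s per lane, 8b/10b encoding). The count of eight reconstruction GPUs is read off the data-flow slide and is an assumption here, and packet overhead is ignored.

```python
SENSOR_GBPS = 40.0  # ~40 Gbps sensor downlink rate (from the performance summary)
RECON_GPUS = 8      # assumption: eight reconstruction K20s, as drawn on the data-flow slide


def pcie_link_gbps(gigatransfers_per_s, lanes):
    """Bit rate of a PCIe Gen 1/2 link after 8b/10b encoding, ignoring packet overhead."""
    return gigatransfers_per_s * lanes * 8 / 10


gen2_x16 = pcie_link_gbps(5.0, 16)

# Without multi-cast the sender must transmit one copy per GPU.
unicast_source_load = SENSOR_GBPS * RECON_GPUS
# With multi-cast the sender transmits once and the switches replicate the stream.
multicast_source_load = SENSOR_GBPS

print(f"Gen 2 x16 link:         {gen2_x16:.0f} Gbps")
print(f"source load, unicast:   {unicast_source_load:.0f} Gbps")
print(f"source load, multicast: {multicast_source_load:.0f} Gbps")
```

Under these assumptions a single copy of the stream fits within one Gen 2 x16 link, whereas sending eight separate copies from one source would not.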

42 Application Live Image from GPU system

43 GPU Solution: What would make it better?
- More shared memory: we are still bandwidth-limited!
- RDMA improvements
  - Bidirectional: we have to get the images out as well as getting the data in
  - Peer-to-peer (GPU-to-GPU) communication/coordination without host intervention
- Better support for real-time operations
  - Timeouts/host waits
  - Support for code executing on streaming multiprocessor
  - CUDA API is optimized for batch operations, not streaming operations
- Better debugging support for multi-GPU systems
  - Ability to isolate reporting to subsets

44 Reference
S4363: "Accelerated X-ray Imaging: Real-Time Multi-Plane Image Reconstruction with CUDA" discusses an alternate implementation of the reconstruction algorithm

45 Thanks To
- Paul Kahn, Jamie Ku, and the rest of the TRT team
- NovaRay, Inc.
- NIH
- University of Wisconsin at Madison
- MultiCoreWare

NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines

More information

Advantages of CT in 3D Scanning of Industrial Parts

Advantages of CT in 3D Scanning of Industrial Parts Advantages of CT in 3D Scanning of Industrial Parts Julien Noel, North Star Imaging Inc C omputed tomography (CT) has come along way since its public inception in 1972. The rapid improvement of computer

More information

Router Architectures

Router Architectures Router Architectures An overview of router architectures. Introduction What is a Packet Switch? Basic Architectural Components Some Example Packet Switches The Evolution of IP Routers 2 1 Router Components

More information

Comp 410/510. Computer Graphics Spring 2016. Introduction to Graphics Systems

Comp 410/510. Computer Graphics Spring 2016. Introduction to Graphics Systems Comp 410/510 Computer Graphics Spring 2016 Introduction to Graphics Systems Computer Graphics Computer graphics deals with all aspects of creating images with a computer Hardware (PC with graphics card)

More information

Router Architectures

Router Architectures Router Architectures An overview of router architectures. Introduction What is a Packet Switch? Basic Architectural Components Some Example Packet Switches The Evolution of IP Routers Copyright 1999. All

More information

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

More information

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

More information

1. If we need to use each thread to calculate one output element of a vector addition, what would

1. If we need to use each thread to calculate one output element of a vector addition, what would Quiz questions Lecture 2: 1. If we need to use each thread to calculate one output element of a vector addition, what would be the expression for mapping the thread/block indices to data index: (A) i=threadidx.x

More information

Next Generation Operating Systems

Next Generation Operating Systems Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the

More information

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt. Medical Image Processing on the GPU Past, Present and Future Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.edu Outline Motivation why do we need GPUs? Past - how was GPU programming

More information

CHAPTER 3: DIGITAL IMAGING IN DIAGNOSTIC RADIOLOGY. 3.1 Basic Concepts of Digital Imaging

CHAPTER 3: DIGITAL IMAGING IN DIAGNOSTIC RADIOLOGY. 3.1 Basic Concepts of Digital Imaging Physics of Medical X-Ray Imaging (1) Chapter 3 CHAPTER 3: DIGITAL IMAGING IN DIAGNOSTIC RADIOLOGY 3.1 Basic Concepts of Digital Imaging Unlike conventional radiography that generates images on film through

More information

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1 System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Direct GPU/FPGA Communication Via PCI Express

Direct GPU/FPGA Communication Via PCI Express Direct GPU/FPGA Communication Via PCI Express Ray Bittner, Erik Ruf Microsoft Research Redmond, USA {raybit,erikruf}@microsoft.com Abstract Parallel processing has hit mainstream computing in the form

More information

Graphical displays are generally of two types: vector displays and raster displays. Vector displays

Graphical displays are generally of two types: vector displays and raster displays. Vector displays Display technology Graphical displays are generally of two types: vector displays and raster displays. Vector displays Vector displays generally display lines, specified by their endpoints. Vector display

More information

Advances in scmos Camera Technology Benefit Bio Research

Advances in scmos Camera Technology Benefit Bio Research Advances in scmos Camera Technology Benefit Bio Research scmos camera technology is gaining in popularity - Why? In recent years, cell biology has emphasized live cell dynamics, mechanisms and electrochemical

More information

Packet-based Network Traffic Monitoring and Analysis with GPUs

Packet-based Network Traffic Monitoring and Analysis with GPUs Packet-based Network Traffic Monitoring and Analysis with GPUs Wenji Wu, Phil DeMar wenji@fnal.gov, demar@fnal.gov GPU Technology Conference 2014 March 24-27, 2014 SAN JOSE, CALIFORNIA Background Main

More information

Chapter 3 SYSTEM SCANNING HARDWARE OVERVIEW

Chapter 3 SYSTEM SCANNING HARDWARE OVERVIEW Qiang Lu Chapter 3. System Scanning Hardware Overview 79 Chapter 3 SYSTEM SCANNING HARDWARE OVERVIEW Since all the image data need in this research were collected from the highly modified AS&E 101ZZ system,

More information

Accelerating Wavelet-Based Video Coding on Graphics Hardware

Accelerating Wavelet-Based Video Coding on Graphics Hardware Wladimir J. van der Laan, Andrei C. Jalba, and Jos B.T.M. Roerdink. Accelerating Wavelet-Based Video Coding on Graphics Hardware using CUDA. In Proc. 6th International Symposium on Image and Signal Processing

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

More information

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering

More information

REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT

REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT SPOT THE ODD ONE BEFORE IT IS OUT flexaware.net Streaming analytics: from data to action Do you need actionable insights from various data streams fast?

More information

Network Traffic Monitoring & Analysis with GPUs

Network Traffic Monitoring & Analysis with GPUs Network Traffic Monitoring & Analysis with GPUs Wenji Wu, Phil DeMar wenji@fnal.gov, demar@fnal.gov GPU Technology Conference 2013 March 18-21, 2013 SAN JOSE, CALIFORNIA Background Main uses for network

More information

PTask: Operating System Abstractions To Manage GPUs as Compute Devices

PTask: Operating System Abstractions To Manage GPUs as Compute Devices PTask: Operating System Abstractions To Manage GPUs as Compute Devices C.J. Rossbach, J. Currey - Microsoft Research B. Ray, E. Witchel - University of Texas M.Silberstein - Technion Presentation: Adam

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization

More information

GPU-based Decompression for Medical Imaging Applications

GPU-based Decompression for Medical Imaging Applications GPU-based Decompression for Medical Imaging Applications Al Wegener, CTO Samplify Systems 160 Saratoga Ave. Suite 150 Santa Clara, CA 95051 sales@samplify.com (888) LESS-BITS +1 (408) 249-1500 1 Outline

More information

Time and Frequency Synchronizations in Broadcast Video

Time and Frequency Synchronizations in Broadcast Video Time and Frequency Synchronizations in Broadcast Video Introduction Synchronization has always been important in broadcast video. Even analog TVs relied on synchronization pulses embedded into analog video

More information

Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct

Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 Direct Increased Performance, Scaling and Resiliency July 2012 Motti Beck, Director, Enterprise Market Development Motti@mellanox.com

More information

Computed Tomography Resolution Enhancement by Integrating High-Resolution 2D X-Ray Images into the CT reconstruction

Computed Tomography Resolution Enhancement by Integrating High-Resolution 2D X-Ray Images into the CT reconstruction Digital Industrial Radiology and Computed Tomography (DIR 2015) 22-25 June 2015, Belgium, Ghent - www.ndt.net/app.dir2015 More Info at Open Access Database www.ndt.net/?id=18046 Computed Tomography Resolution

More information

NVIDIA Quadro M4000 Sync PNY Part Number: VCQM4000SYNC-PB. User Guide

NVIDIA Quadro M4000 Sync PNY Part Number: VCQM4000SYNC-PB. User Guide NVIDIA Quadro M4000 Sync PNY Part Number: VCQM4000SYNC-PB User Guide PNY 100 Jefferson Road Parsippany NJ 07054-0218 973-515-9700 www.pny.com/quadro Features and specifications are subject to change without

More information

USB readout board for PEBS Performance test

USB readout board for PEBS Performance test June 11, 2009 Version 1.0 USB readout board for PEBS Performance test Guido Haefeli 1 Li Liang 2 Abstract In the context of the PEBS [1] experiment a readout board was developed in order to facilitate

More information

Cloud Data Center Acceleration 2015

Cloud Data Center Acceleration 2015 Cloud Data Center Acceleration 2015 Agenda! Computer & Storage Trends! Server and Storage System - Memory and Homogenous Architecture - Direct Attachment! Memory Trends! Acceleration Introduction! FPGA

More information

Touchstone -A Fresh Approach to Multimedia for the PC

Touchstone -A Fresh Approach to Multimedia for the PC Touchstone -A Fresh Approach to Multimedia for the PC Emmett Kilgariff Martin Randall Silicon Engineering, Inc Presentation Outline Touchstone Background Chipset Overview Sprite Chip Tiler Chip Compressed

More information

Unified Computing Systems

Unified Computing Systems Unified Computing Systems Cisco Unified Computing Systems simplify your data center architecture; reduce the number of devices to purchase, deploy, and maintain; and improve speed and agility. Cisco Unified

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Using PCIe & intelligent DMA to achieve blazing data rates in real-time recording instruments

Using PCIe & intelligent DMA to achieve blazing data rates in real-time recording instruments August 17, 2011 Design Article Using PCIe & intelligent DMA to achieve blazing data rates in real-time recording instruments Chris Tojeira Chris Tojeira of Pentek describes how the use of PCIe, intelligent

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

COMPUTER HARDWARE. Input- Output and Communication Memory Systems COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)

More information

Chapter 1 Reading Organizer

Chapter 1 Reading Organizer Chapter 1 Reading Organizer After completion of this chapter, you should be able to: Describe convergence of data, voice and video in the context of switched networks Describe a switched network in a small

More information

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Seeking Opportunities for Hardware Acceleration in Big Data Analytics Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who

More information

Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre

Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre Graphics Processing Unit (GPU) Memory Hierarchy Presented by Vu Dinh and Donald MacIntyre 1 Agenda Introduction to Graphics Processing CPU Memory Hierarchy GPU Memory Hierarchy GPU Architecture Comparison

More information

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010 Flash Memory Arrays Enabling the Virtualized Data Center July 2010 2 Flash Memory Arrays Enabling the Virtualized Data Center This White Paper describes a new product category, the flash Memory Array,

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

MRC High Resolution. MR-compatible digital HD video camera. User manual

MRC High Resolution. MR-compatible digital HD video camera. User manual MRC High Resolution MR-compatible digital HD video camera User manual page 1 of 12 Contents 1. Intended use...2 2. System components...3 3. Video camera and lens...4 4. Interface...4 5. Installation...5

More information

NVIDIA VIDEO ENCODER 5.0

NVIDIA VIDEO ENCODER 5.0 NVIDIA VIDEO ENCODER 5.0 NVENC_DA-06209-001_v06 November 2014 Application Note NVENC - NVIDIA Hardware Video Encoder 5.0 NVENC_DA-06209-001_v06 i DOCUMENT CHANGE HISTORY NVENC_DA-06209-001_v06 Version

More information

Intel DPDK Boosts Server Appliance Performance White Paper

Intel DPDK Boosts Server Appliance Performance White Paper Intel DPDK Boosts Server Appliance Performance Intel DPDK Boosts Server Appliance Performance Introduction As network speeds increase to 40G and above, both in the enterprise and data center, the bottlenecks

More information

Performance of Software Switching

Performance of Software Switching Performance of Software Switching Based on papers in IEEE HPSR 2011 and IFIP/ACM Performance 2011 Nuutti Varis, Jukka Manner Department of Communications and Networking (COMNET) Agenda Motivation Performance

More information

HyperQ Hybrid Flash Storage Made Easy White Paper

HyperQ Hybrid Flash Storage Made Easy White Paper HyperQ Hybrid Flash Storage Made Easy White Paper Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com sales@parseclabs.com

More information

3D MODEL DRIVEN DISTANT ASSEMBLY

3D MODEL DRIVEN DISTANT ASSEMBLY 3D MODEL DRIVEN DISTANT ASSEMBLY Final report Bachelor Degree Project in Automation Spring term 2012 Carlos Gil Camacho Juan Cana Quijada Supervisor: Abdullah Mohammed Examiner: Lihui Wang 1 Executive

More information

Data Center and Cloud Computing Market Landscape and Challenges

Data Center and Cloud Computing Market Landscape and Challenges Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operatin g Systems: Internals and Design Principle s Chapter 11 I/O Management and Disk Scheduling Seventh Edition By William Stallings Operating Systems: Internals and Design Principles An artifact can

More information

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS) PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters from One Stop Systems (OSS) PCIe Over Cable PCIe provides greater performance 8 7 6 5 GBytes/s 4

More information

Processor to Usher in a New Era of Computing

Processor to Usher in a New Era of Computing Project Denver Processor to Usher in a New Era of Computing Bill Dally January 5, 2011 http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/ Project Denver Announced

More information

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture.

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture. Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture. Chirag Gupta,Sumod Mohan K cgupta@clemson.edu, sumodm@clemson.edu Abstract In this project we propose a method to improve

More information

Data and Control Plane Interconnect solutions for SDN & NFV Networks Raghu Kondapalli August 2014

Data and Control Plane Interconnect solutions for SDN & NFV Networks Raghu Kondapalli August 2014 Data and Control Plane Interconnect solutions for SDN & NFV Networks Raghu Kondapalli August 2014 Title & Abstract Title: Data & Control Plane Interconnect for SDN & NFV networks Abstract: Software defined

More information
