NVIDIA Jetson TK1 Development Kit

Similar documents
Standardization with ARM on COM Qseven. Zeljko Loncaric, Marketing engineer congatec

NVIDIA GeForce GTX 580 GPU Datasheet

Whitepaper. NVIDIA Miracast Wireless Display Architecture

NVIDIA VIDEO ENCODER 5.0

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

ARM Processors for Computer-On-Modules. Christian Eder Marketing Manager congatec AG

TEGRA LINUX DRIVER PACKAGE R21.1

NVIDIA AUTOMOTIVE. Driving Innovation

QuickSpecs. NVIDIA Quadro M GB Graphics INTRODUCTION. NVIDIA Quadro M GB Graphics. Overview

High Performance GPGPU Computer for Embedded Systems

SABRE Lite Development Kit

NVIDIA GeForce GTX 750 Ti

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

NVIDIA GRID OVERVIEW SERVER POWERED BY NVIDIA GRID. WHY GPUs FOR VIRTUAL DESKTOPS AND APPLICATIONS? WHAT IS A VIRTUAL DESKTOP?

ARM Cortex -A8 SBC with MIPI CSI Camera and Spartan -6 FPGA SBC1654

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

Accelerating CFD using OpenFOAM with GPUs

Technical Brief. DualNet with Teaming Advanced Networking. October 2006 TB _v02

Java Embedded Applications

NVIDIA GeForce Experience

NVIDIA Tesla K20-K20X GPU Accelerators Benchmarks Application Performance Technical Brief

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

HP Workstations graphics card options

NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS

TESLA C2050/2070 COMPUTING PROCESSOR INSTALLATION GUIDE

EVGA Z97 Classified Specs and Initial Installation (Part 1)

Windows Embedded Security and Surveillance Solutions

Designing Feature-Rich User Interfaces for Home and Industrial Controllers

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

TESLA M2075 DUAL-SLOT COMPUTING PROCESSOR MODULE

GeoImaging Accelerator Pansharp Test Results

HP Workstations graphics card options

Intel architecture. Platform Basics. White Paper Todd Langley Systems Engineer/ Architect Intel Corporation. September 2010

Video/Cameras, High Bandwidth Data Handling on imx6 Cortex-A9 Single Board Computer

QULU VMS AND SERVERS Elegantly simple, Ultimately scalable

THE AMD MISSION 2 AN INTRODUCTION TO AMD NOVEMBER 2014

White Paper. Intel Sandy Bridge Brings Many Benefits to the PC/104 Form Factor

NVIDIA GRID DASSAULT CATIA V5/V6 SCALABILITY GUIDE. NVIDIA Performance Engineering Labs PerfEngDoc-SG-DSC01v1 March 2016

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Overview. NVIDIA Quadro K5200 8GB Graphics J3G90AA

HPC with Multicore and GPUs

The GPU Accelerated Data Center. Marc Hamilton, August 27, 2015

Intel RAID SSD Cache Controller RCS25ZB040

PCI Express* Ethernet Networking

Intel X58 Express Chipset

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

TESLA M2050 AND TESLA M2070 DUAL-SLOT COMPUTING PROCESSOR MODULES

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Which ARM Cortex Core Is Right for Your Application: A, R or M?

Technical Brief. Introducing Hybrid SLI Technology. March 11, 2008 TB _v01

Intel Gateway Solutions for the Internet of Things. Intel Quark SoC X1000 Applications Marketing Seminar Anaheim, California Oct.

NVIDIA GRID 2.0 ENTERPRISE SOFTWARE

Board Specification. Tesla C1060 Computing Processor Board. September 2008 BD _v03

Windows Embedded Compact 7: RemoteFX and Remote Experience Thin Client Integration

Using Mobile Processors for Cost Effective Live Video Streaming to the Internet

Achieving Real-Time Performance on a Virtualized Industrial Control Platform

EVGA X99 Classified Specs and Initial Installation (Part 1)

Embedded Development Tools

AFDtek Energy Dashboard

VDI Clients. Delivering Tomorrow's Virtual Desktop Today

QUADRO POWER GUIDELINES

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage

The Future of the ARM Processor in Military Operations

TESLA K10 GPU ACCELERATOR

White Paper EMBEDDED GPUS ARE NO GAMBLE

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

VDI Clients Delivering Tomorrow's Virtual Desktop Today

Installation Guide. (Version ) Midland Valley Exploration Ltd 144 West George Street Glasgow G2 2HG United Kingdom

GRID LICENSING. DU April User Guide

SBC8600B Single Board Computer

PNY Professional Solutions NVIDIA GRID - GPU Acceleration for the Cloud

Cloud Gaming & Application Delivery with NVIDIA GRID Technologies. Franck DIARD, Ph.D. GRID Architect, NVIDIA

How To Build An Ark Processor With An Nvidia Gpu And An African Processor

Lecture 2: Computer Hardware and Ports.

Computer Graphics Hardware An Overview

CUBIX ACCEL-APP SYSTEMS Linux2U Rackmount Elite

Technical Brief. NVIDIA nview Display Management Software. May 2009 TB _v02

Light Industrial Panel PC

TESLA K20X GPU ACCELERATOR

THE WORLD LEADER IN VISUAL COMPUTING

Technical Brief. Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output. April 2008 TB _v01

Several tips on how to choose a suitable computer

Intel Desktop Board D925XECV2 Specification Update

System Design Issues in Embedded Processing

How to choose a suitable computer

Super-Accelerated, Simultaneous Transport & Transcoding

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Our innovation, Your Applications. Your Own Custom Embedded Board in 5 weeks!

NVIDIA Tesla Compute Cluster Driver for Windows

BIOS Update Release Notes

SNAPPIN.IO. FWR is a Hardware & Software Factory, which designs and develops digital platforms.

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)

Setting a new standard

ARM* to Intel Atom Microarchitecture - A Migration Study

HOW MANY USERS CAN I GET ON A SERVER? This is a typical conversation we have with customers considering NVIDIA GRID vgpu:

Several tips on how to choose a suitable computer

Tegra Android Accelerometer Whitepaper

Intel Processors in Industrial Control and Automation Applications Top-to-bottom processing solutions from the enterprise to the factory floor

Transcription:

Technical Brief NVIDIA Jetson TK1 Development Kit Bringing GPU-accelerated computing to Embedded Systems

P a g e 2 V1.0

P a g e 3 Table of Contents... 1 Introduction... 4 NVIDIA Tegra K1 A New Era in Mobile Computing... 5 Kepler GPU in Tegra K1... 6 Jetson TK1 Development Kit... 7 Hardware Platform... 7 Software Platform... 7 VisionWorks Computer Vision Toolkit... 8 Jetson TK1 Delivering Exceptional Performance and Power Efficiency... 9 Conclusion... 11 Appendix... 12 Jetson K1 Platform Peripheral and IO ports... 12 Jetson K1 Development Platform Power Consumption... 12 Document Revision History... 14

P a g e 4 Introduction GPU-accelerated computing is rapidly increasing the velocity of innovation in the fields of science, medicine, finance and engineering. GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate scientific, engineering, and enterprise applications. Pioneered by NVIDIA, GPU-accelerated computing offers unprecedented application performance by offloading compute-intensive portions of the application to the GPU, while the remainder of the code still runs on the CPU. From a user's perspective, applications simply run significantly faster. NVIDIA also defined the NVIDIA CUDA parallel programming platform to standardize and refine GPUaccelerated computing. CUDA has become the world s leading GPU computing platform used by millions of users for high-performance computing across a range of industries and sciences, including usage in many of the top supercomputers in the world. GPU computing delivers unprecedented levels of performance speedups by parallelizing application workloads and running them on the GPU. NVIDIA s Fermi and Kepler GPUs have already redefined and accelerated High Performance Computing (HPC) capabilities in areas such as seismic processing, biochemistry simulations, weather and climate modeling, computational finance, computer aided engineering, computational fluid dynamics and data analysis. The Kepler Compute architecture together with NVIDIA s CUDA parallel programming platform delivers tremendous performance speedup not only for numerous high-performance computing applications, but also for applications such as speech recognition, live video processing, computer vision, augmented reality, and of course computer gaming. Embedded Computing is the next frontier where GPUs can help accelerate the pace of innovation and deliver significant benefits in the fields of computer vision, robotics, automotive, image signal processing, network security, medicine, and many others. The Jetson TK1 Development Kit is specifically designed to enable rapid development of GPU-accelerated embedded applications, bringing significant parallel processing performance and exceptional power efficiency to embedded applications. The Jetson TK1 Development kit is designed around the revolutionary 192-core NVIDIA Tegra K1 mobile processor. Tegra K1 is based on the same NVIDIA Kepler GPU architecture used in supercomputers and High Performance Computing systems around the world. The Jetson Development Kit delivers a fully functional NVIDIA CUDA platform and includes the Board Support Package, CUDA 6, OpenGL 4.4, and the NVIDIA VisionWorks toolkit. With a complete suite of development and profiling tools, plus out-ofthe-box support for cameras and other peripherals, the NVIDIA Jetson TK1 Development Kit is the ideal development platform to shape a brand new future for Embedded Computing. Note: Details on Jetson TK1 IO ports and Platform Power is provided in the Appendix

P a g e 5 NVIDIA Tegra K1 A New Era in Mobile Computing NVIDIA s latest and most advanced mobile processor, the Tegra K1, creates a major discontinuity in the state of mobile graphics by bringing the powerful NVIDIA Kepler GPU architecture to mobile and delivering tremendous visual computing capabilities and breakthrough power efficiency. The NVIDIA Tegra K1 mobile processor is designed from the ground up to create a major discontinuity in the capabilities of mobile processors, and delivers the industry s fastest and most power efficient implementation of mobile CPUs, PC-class graphics, and advanced GPU-accelerated computing capabilities. Some of the key features of the Tegra K1 SoC (System-on-a-Chip) architecture are: 4-PLUS-1 Cortex A15 r3 CPU architecture that delivers higher performance and is more power efficient than the previous generation. Kepler GPU architecture that utilizes 192 CUDA cores to deliver advanced graphics capabilities, GPU computing with NVIDIA CUDA 6 support, breakthrough power efficiency and performance for the next generation of gaming and GPU-accelerated computing applications. Dual ISP Core that delivers 1.2 Giga Pixels per second of raw processing power supporting camera sensors up to 100 Megapixels. Advanced Display Engine that is capable of simultaneously driving both the 4K local display and a 4K external monitors via HDMI Built on the TSMC 28 nm HPM process to deliver excellent performance and power efficiency. Figure 1 NVIDIA Tegra K1 Mobile Processor

P a g e 6 Kepler GPU in Tegra K1 The Kepler GPU in Tegra K1 is built on the same high performance, energy efficient Kepler GPU architecture that is found in our high-end GeForce, Quadro, and Tesla GPUs for graphics and computing. As a result, Tegra K1 is the only mobile processor that supports CUDA 6 for computing and full desktop OpenGL 4.4 for graphics. Kepler delivers the most advanced graphics for mobile graphics applications, and is the first modern mobile GPU capable of supporting all the GPU compute APIs. Tegra K1 with the Kepler GPU architecture is a parallel processor capable of over 300 GFLOPS of 32-bit floating point computations. Tegra K1 is also very power efficient and delivers almost fifty percent higher performance per watt compared to competing mobile processors. More importantly, Tegra K1 s support of CUDA and desktop graphics APIs means that much of your existing compute and graphics software will port quite easily to Jetson TK1. TEGRA K1 Kepler Graphics GEFORCE GTX 770 OpenGL ES 3.1 OpenGL 4.4 DX12 Tessellation CUDA 6.0 Figure 2 Tegra K1's Kepler GPU supports the feature set found in high-end NVIDIA GPUs The Kepler GPU delivers the graphics features, rich APIs, and compute architecture of its desktop counterpart, and has additional power optimizations for mobile usage. The Kepler GPU in Tegra K1 is a significant milestone in the history of computing and computer graphics and will drive a revolutionary change in mobile visual computing. More details on Tegra K1 and its Kepler GPU are provided in the Tegra K1 whitepaper.

P a g e 7 Jetson TK1 Development Kit Figure 3 Jetson TK1 Development board Top View Hardware Platform The Jetson TK1 development Kit is a 5 wide by 5 long PCB with a Tegra K1 processor, 2 GB of RAM, 16GB of on-board storage and numerous peripherals and IO ports (see Appendix). To enable development of computer vision and other camera based embedded applications, Jetson K1 is capable of supporting multiple cameras through a variety of interfaces. The USB 3.0 and Gigabit Ethernet ports can be used to hook up cameras that communicate via these interfaces. In addition, the PCIe x1 port on the PCB can be used to connect an additional camera via an Ethernet to PCIe adapter. The CSI 1x4 and 1x1 buses available through the expansion port can be used to feed camera images directly into the Image Signal Processor (ISP) on Tegra K1 (bypassing memory), for direct image processing. Software Platform The Jetson TK1 Development Kit runs Linux for Tegra (L4T), a modified Ubuntu 14.04 Linux distribution provided by NVIDIA. The software provided by NVIDIA includes the Board Support Package (BSP) and the software stack that includes CUDA 6 Toolkit, OpenGL 4.4 drivers and the NVIDIA VisionWorks Toolkit.

P a g e 8 VisionWorks Computer Vision Toolkit Many of the interesting embedded computing domains that Jetson TK1 will support rely on computer vision. Jetson TK1 supports NVIDIA s new VisionWorks Computer Vision Toolkit. VisionWorks is an SDK that provides a rich set of algorithms optimized for NVIDIA CUDA-capable GPUs and SOCs such as Tegra K1, giving you the power to realize CV applications quickly on a scalable and flexible platform. The core VisionWorks algorithms are engineered for solutions in advanced driver assistance systems (ADAS), augmented reality (AR), computational photography, human-machine interaction (HMI), and robotics. The VisionWorks Toolkit includes libraries of algorithms and primitives to enable highly optimized pipelines on Tegra K1, example code, and full documentation. Having the level of performance and energy efficiency Jetson TK1 offers can potentially support the development of robots with real-time object recognition and compelling autonomous navigation capabilities - Chris Jones, Director of Strategic Technology Development irobot Corporation

P a g e 9 Jetson TK1 Delivering Exceptional Performance and Power Efficiency The architecture of the Kepler GPU in Tegra K1 is virtually identical to the Kepler GPU architecture used in high-end systems, but also includes a number of optimizations for mobile system usage to conserve power and deliver industry-leading mobile GPU performance. While the highest-end Kepler GPUs in desktop, workstation, and supercomputers include up to 2880 single-precision floating point CUDA cores and consume a few hundred watts of power, the Jetson TK1 platform with Tegra K1 includes 192 CUDA cores and consumes significantly lower power. Note that the Tegra K1 Kepler GPU has more cores than many entry-level to mainstream desktop GPUs of just a few years ago. Delivering higher power efficiency and much lower platform power consumption, the Jetson TK1 platform is ideally suited for applications in the embedded space that require exceptional power efficiency, low thermal dissipation and significantly higher performance than current FPGA and x86-based embedded solutions. Tegra K1 can change what s possible in the rugged and industrial embedded market. We expect to be able to offer solutions in the sub-10 watt space that previously consumed 100 watts or more. - Simon Collins, Product Manager GE Intelligent Platforms When compared to current generation mobile processors, the Tegra K1 powered Jetson TK1 platform delivers almost 2.5x the peak performance of competing mobile processors. When limited to match the power consumption of competing mobile processors, Tegra K1 delivers almost 50% higher performance per Watt.

Performance per Watt Performance Speedup P a g e 10 3 2.5 2 Jetson TK1 performance on GFXBench3.0 Manhattan 1080p offscreen Jetson TK1 delivers 2.5x the graphics performance of competing processors 1.5 1 0.5 0 Apple A7 Tegra K1 Figure 4 Jetson TK1 delivers 2.5X higher graphics performance than competing processors 1 8 7 6 5 4 Tegra K1 Performance Per Watt on GFXBench 3.0 Manhattan Offscreen 5.08 FPS/Watt Tegra K1 delivers higher performance while consuming same power as A7 7.29 FPS/Watt 3 2 1 0 Apple A7 Tegra K1 Figure 5 Tegra K1 delivers almost 1.5X performance per watt compared to Apple A7 23 1 Tegra K1 performance measured on Jetson TK1 platform running Linux for Tegra using Linux version of GFXBench 3.0. Apple A7 performance measured on iphone 5S running ios 7.1 2 Tegra K1 performance was measured by constraining Tegra K1 AP+DRAM power to match the AP+DRAM power consumption of iphone 5S while running GFXBench3.0 Manhattan 1080p offscreen test. 3 Tegra K1 Performance per watt represented in this chart were collected on a mobile optimized Tegra K1 platform (that uses LPDDR3, smart panel, and other mobile optimized platform components). Jetson TK1 platform is not optimized for mobile power levels.

P a g e 11 Conclusion The NVIDIA Jetson TK1 Development kit is the world s first mobile supercomputer for embedded systems and opens the door for embedded system designs to harness the power of GPU-accelerated computing. Jetson TK1 will enable a new generation of applications for computer vision, robotics, medical imaging, automotive, and many other areas. Powered by the revolutionary 192-core NVIDIA Tegra K1 mobile processor, the Jetson platform delivers over 300 GFLOPS of performance that is almost three times more than any similar embedded platform. The fully programmable 192 CUDA cores in Tegra K1 along with CUDA 6 Toolkit support makes programming on Jetson TK1 much easier than on FPGA, Custom ASIC, and DSP processors that are commonly used in current embedded systems. The CUDA programming model is used by over 100,000 developers at over 8,000 institutions worldwide. The Jetson TK1 Developer Kit comes with full support of the CUDA 6.0 developer tool suite, including debuggers and profilers, to allow development of powerful embedded applications. Jetson TK1 fast tracks embedded computing into a future where machines interact and adapt to their environments in real time, and deliver whole new experiences in various fields such as robotics, augmented reality, computational photography, human-computer interface, and advanced driver assistance systems.

P a g e 12 Appendix Jetson K1 Platform Peripheral and IO ports Standard ports on Jetson TK1 Development board 1 Half mini-pcie slot 1 Full-size SD/MMC connector 1 Full-size HDMI port 1 USB 2.0 port, micro AB 1 USB 3.0 port, A 1 RS232 serial port 1 ALC5639 Realtek Audio Codec with Mic in and Line out 1 RTL8111GS Realtek GigE LAN 1 SATA data port SPI 4MByte boot flash Additional ports available via Expansion port DP/LVDS Touch SPI 1x4 + 1x1 CSI-2 GPIOs, UART, HSIC and I2C Table 1 Peripheral and IO Ports on Jetson TK1 Development board Jetson TK1 Development Platform Power Consumption The Jetson TK1 Development platform is primarily designed to enable the development of GPUaccelerated embedded applications and is not optimized to deliver the low power consumption required on mobile devices such as tablets and smartphones. Being a development platform, the Jetson TK1 has numerous hardware, IO, and peripheral interfaces that add to the total platform power consumption. The core software package is optimized to deliver the performance required for embedded applications such as computer vision, robotics, and image processing. For example, mobile platforms that run on battery power may use power-efficient LPDDR3 memory, smart panel displays, PMICs, and other low power components. The Jetson development platform is designed to run on AC power and therefore uses standard DDR3L memory, Ethernet adapters, standard HDMI, and other platform components that are not optimized for low power.

P a g e 13 Due to these design choices, the Jetson TK1 development platform should not be used evaluate the power efficiency of the Tegra K1 mobile processor for mobile implementations and applications. Developers and OEMs who want to power profile applications for mobile are encouraged to contact our NVIDIA Developer Relations team to request mobile platform support. In addition, the Developer Relations team can provide guidance for power efficiency optimization for mobile and embedded applications. The following table breaks down the Jetson TK1 platform power consumption at idle state and at peak performance. Platform Component Idle State Platform Power (milli-watts) Platform power when AP+DRAM is constrained to Apple A7 power when running GFXBench 3.0 4 (milli-watts) Platform power when delivering peak GFXBench 3.0 Performance(milli- Watts) AP+DRAM 5 660 3660 6 6980 Power consumed by Platform components such as Fan, HDMI, PCIE, SATA and others SoC Voltage regulator Efficiency loss Total Power at DC input of the board 2080 2090 2090 280 960 1790 3020 6710 10860 AC to DC conversion loss (15%) 7 450 1180 1630 Power consumed at AC outlet 8 3470 7890 12490 Table 2 Jetson TK1 Development Platform Power 4 Power was measured while running GFXBench 3.0 Manhattan 1080p Offscreen test 5 Jetson TK1 uses standard DDR3L memory and the L4T processor governors are not optimized for power efficiency. Conclusions based on these numbers on Tegra K1 power consumption for mobile applications will be incorrect. 6 Apple A7 consumed 2560 mw to deliver 13 fps on iphone 5S. An equivalent Tegra K1 mobile platform consumes 2605 mw to deliver 1.45x higher performance. Since Jetson uses DDR3L and OS is not tuned for mobile, it consumes 3658 mw to deliver an equivalent 1.45x performance advantage over Apple A7. 7 AC to DC power conversion efficiency of the power adapter will have some variance that is also influenced by the power consumed by the board. 8 AC power meters may have considerable measurement variances and it is recommended that power be measured at the DC power input of the board.

P a g e 14 Document Revision History Initial release 1.0

P a g e 15 Notice ALL INFORMATION PROVIDED IN THIS WHITE PAPER, INCLUDING COMMENTARY, OPINION, NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, MATERIALS ) ARE BEING PROVIDED AS IS. NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation. Trademarks NVIDIA, the NVIDIA logo, Chimera, Tegra, TegraZone are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated. Copyright 2014 NVIDIA Corporation. All rights reserved.