Choosing the Right DSP for High-Resolution Imaging in Mobile and Wearable Applications




By Pulin Desai, Cadence Design Systems

From smartphones to smart watches, from advanced driver assistance systems (ADAS) to virtual-reality gaming consoles, drone control, and a host of security devices, the application areas that rely on high-resolution imaging (1080p, 4K, and beyond) are growing. There's great opportunity to further enhance high-resolution imaging quality to increase the effectiveness of computer vision applications, especially those for mobile and wearable devices. What's needed is a next-generation DSP that can strike the right balance between power consumption and performance. In this paper, we discuss criteria to consider when choosing a DSP for high-resolution imaging in mobile and wearable applications.

Contents
Introduction
Solving the Power/Performance Riddle
What Kind of DSP Can Support Computer Vision Applications?
New Image and Vision DSP
Use Cases
Summary
For Further Information
Footnote

Introduction

When Steve Sasson, an Eastman Kodak engineer, unveiled the first digital camera in 1975, he gave the world a whole new way to look at photography. Who knew back then how much image resolution would improve and how ubiquitous cameras would become? Today, high-resolution imaging is so sophisticated that we're relying on it for everything from face detection in smartphones to face recognition in security systems to traffic sign recognition in our vehicles, and for the autonomous vehicle of the future.

According to Yole Developpement, the CMOS image sensor industry is expected to grow at a CAGR of 10.6% to become a US$16.2B market by 2020. While smartphone applications comprise the bulk of the market, automotive, medical, and surveillance applications also present market opportunities. On the technology side, camera module height and pixel sizes (currently as small as 0.9μm) are reaching their limits.
With the increased pace of change in sensor technologies, new image processing methods are needed to extract the highest possible image fidelity. Figure 1 outlines the latest approach for the camera pipeline. Image and vision processing algorithms can be implemented in an electronic system using general-purpose CPUs, GPUs, or both, or using a hardware pipeline implemented in RTL. Because sensor technologies are changing at a rapid rate and vary widely among suppliers, the RTL pipeline struggles to keep pace in terms of size and functionality.
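
As a rough illustration of why a programmable pipeline adapts more easily than fixed logic, the sketch below models two of the stages named in Figure 1 (demosaic and color matrix) as ordinary functions chained in a list. Everything in it is an assumption made for illustration: the function names, the RGGB mosaic layout, and the deliberately crude demosaic that bins each 2x2 cell into one pixel (real pipelines interpolate at full resolution). None of it is Cadence code.

```python
from typing import Any, Callable, List, Sequence, Tuple

Bayer = List[List[int]]                 # raw sensor values, assumed RGGB mosaic
RGB = Tuple[float, float, float]
RGBImage = List[List[RGB]]


def demosaic_rggb_binned(raw: Bayer) -> RGBImage:
    """Toy demosaic: collapse each 2x2 RGGB cell into one RGB pixel.

    This halves the resolution; it is a stand-in for real interpolation.
    """
    out: RGBImage = []
    for y in range(0, len(raw) - 1, 2):
        row: List[RGB] = []
        for x in range(0, len(raw[y]) - 1, 2):
            r = float(raw[y][x])                          # top-left: red
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2       # two greens, averaged
            b = float(raw[y + 1][x + 1])                  # bottom-right: blue
            row.append((r, g, b))
        out.append(row)
    return out


def color_matrix(ccm: Sequence[Sequence[float]]) -> Callable[[RGBImage], RGBImage]:
    """Return a stage that multiplies every RGB pixel by a 3x3 matrix."""
    def stage(img: RGBImage) -> RGBImage:
        return [
            [tuple(sum(ccm[i][j] * px[j] for j in range(3)) for i in range(3))
             for px in row]
            for row in img
        ]
    return stage


def run_pipeline(data: Any, stages: Sequence[Callable[[Any], Any]]) -> Any:
    """Apply each stage in order; changing the pipeline means editing this list."""
    for stage in stages:
        data = stage(data)
    return data
```

In this picture, supporting a new sensor layout or a revised algorithm means replacing one function in the list, which is the kind of flexibility a software-programmable processor offers and fixed gates do not.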

[Figure 1: The latest approach for the camera pipeline. Blocks shown: sensor; ISP hardware pipeline (Bayer to RGB); post-processing (YUV); image/video analysis. Functions shown: focus, lens shading, defect correction, demosaic, color matrix, 2D/3D noise reduction, image stabilization, HDR/WDR, super resolution, face detection, people detection, object detection, gesture detection, motion detection.]

In this paper, we'll discuss a more viable method for image and vision processing at 32MP and above: offloading these functions to specially designed DSPs. By choosing the right DSP, you can maintain the functionality needed for your computer vision design without sacrificing performance or power.

Solving the Power/Performance Riddle

While the CPU/GPU route is flexible, one of its biggest drawbacks is the performance/power tradeoff. As Figure 2 shows, running computationally demanding image and vision processing algorithms taxes the CPU/GPU, weighing down performance for other tasks and consuming a substantial amount of power. Moreover, if your end device is a wearable, the CPU/GPU option will not support the long battery life needed and will likely cause the device to run hot (forcing the CPU/GPU to lower its clock frequency), particularly for applications such as video chat.

[Figure 2: Offloading image and vision processing functions to a DSP results in substantial energy consumption savings. Chart title: "CPU/GPU Offload Energy Comparison: Noise Reduction". Vertical axis: energy consumption. Bars: Host CPU (4 cores); Host CPU (4 cores) + 3 pipe GPU (4 cores); Vision DSP. Annotation: Vision DSP is 25X more energy-efficient than CPU.]
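
Energy per frame is average power multiplied by processing time, which is why a processor that is both lower-power and faster on a given kernel can show a gap as large as the one in Figure 2. The sketch below simply makes that arithmetic explicit. The power and time values are hypothetical placeholders, chosen only so that the ratio comes out to 25; they are not measurements from this paper or from any real device.

```python
def energy_uj(power_mw: int, time_ms: int) -> int:
    """Energy in microjoules, since 1 mW x 1 ms = 1 uJ."""
    return power_mw * time_ms


def efficiency_gain(baseline_uj: float, candidate_uj: float) -> float:
    """How many times less energy the candidate uses than the baseline."""
    return baseline_uj / candidate_uj


# HYPOTHETICAL per-frame figures for one noise-reduction pass (not measured data).
CPU_POWER_MW, CPU_TIME_MS = 2000, 50
DSP_POWER_MW, DSP_TIME_MS = 200, 20

cpu_energy = energy_uj(CPU_POWER_MW, CPU_TIME_MS)   # 100000 uJ
dsp_energy = energy_uj(DSP_POWER_MW, DSP_TIME_MS)   # 4000 uJ
gain = efficiency_gain(cpu_energy, dsp_energy)      # 25.0
```

With these made-up inputs, a 10X reduction in power combined with a 2.5X reduction in run time gives the 25X energy ratio; other combinations of power and time could produce the same ratio.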

The other option, running the algorithms on an RTL-based hardware design, provides a performance/power advantage but limits your flexibility. Hardware design requires a lengthy design and verification process, and each function would require its own gates, a costly proposition. And should anything change in your algorithm (to address a new requirement or sensor, or to fix a bug, for example), you'd have to perform an expensive hardware re-spin.

Offloading to a specially designed DSP presents a solution that balances performance and power requirements, provided you have the right DSP. A camera pipeline with a DSP consumes a smaller amount of total logic, as the DSP can be reused for multiple algorithms.

What Kind of DSP Can Support Computer Vision Applications?

Computer vision, which involves duplicating the abilities of human vision by electronically perceiving and understanding an image, has been around since the 1950s. First used in academia, computer vision is becoming prevalent in mobile and wearable devices, as well as in automotive and drone applications. In today's consumer devices, computer vision is driven by low energy consumption (for a longer battery life), high performance, and killer use cases. Wouldn't you enjoy video chatting with a far-away family member on a tablet featuring razor-sharp 4K resolution? Or, consider the possibilities if the same technology could support functional measurements so that users could determine, for instance, whether a piece of furniture seen online could fit in their bedroom.

Camera processing systems for these applications consist of two key components, image processing and vision processing, each with its own requirements that vary depending on the end device.
Image
- Sophisticated noise reduction to accommodate small pixel size
- Video image stabilization capabilities
- Fast auto-focus capabilities
- High dynamic range (HDR)
- Support for dual cameras in the same device
- Support for wide dynamic range (WDR), digital video stabilization, and multiple advanced noise reduction methods

Vision
- Face detection, with resolution ranging from the standard video graphics array (VGA) level to 720p
- People and object detection
- Gesture tracking
- Eye tracking capabilities
- Motion detection
- Convolutional neural network (CNN)

Continually improving image resolution is also driving requirements for high memory bandwidth. While today's smartphones can deliver as many as 16MP, tomorrow's phones are expected to have resolution as high as 24MP to 32MP. Similarly, smartphones are already delivering 1080p video and will eventually move to 4K, with videos running at 120fps.

With these types of imaging and vision processing functions and these memory bandwidth demands, flexibility is key for a computer vision DSP. After all, optics, sensors, and algorithms are constantly changing. Meeting these demands requires a highly flexible computer vision DSP with a variety of features:
- Support for multiple algorithms, such as face detection, gesture detection, motion detection, image stabilization, HDR, WDR, and 2D/3D noise reduction
- Programmability, so that a single DSP can support multiple end applications (design requirements will vary)
- High performance and low power consumption
- Ability to be placed inside the ISP processing path for processing in the Bayer domain, to support functions such as noise reduction and HDR or new types of sensors (IR sensors, or those that detect a mix of R, G, B, and IR pixels, for example)
- Small form factor

Developing such DSPs calls for highly sophisticated development tools and a highly optimized imaging and vision library. Some applications require high-precision math, so the solution would need to be able to perform vector floating-point operations.

New Image and Vision DSP

Cadence's Tensilica Vision P5, the newest DSP in the Tensilica family, is an example of a processor designed specifically for image and vision processing applications. Figure 3 shows the architecture and key features of this fourth-generation DSP.

[Figure 3: Tensilica Vision P5 architecture. Block diagram labels include: data memory 0 and data memory 1 (two banks each, 512-bit paths), load/store units, iDMA, vector register file (32), accumulator register file (4), predicate register file, SuperGather, optional VFPU, cache controller, memory mux, AXI4 interfaces (128-bit), timers, interrupts, performance counters, power management, and pipeline management.]

The Tensilica Vision P5 offers features suited to image processing and analytics as well as video pre- and post-processing:
- A step increase in vision performance over the previous-generation Tensilica IVP-EP DSP
- 64-way SIMD engine
- 1024-bit memory interface with SuperGather technology
- Enriched support for 32-bit data types
- Customizable for even higher performance
- Support for a large number of OpenCV and OpenVX library functions, as well as a number of imaging/vision kernels for filtering, object detection, and optical flow
- Multi-processor ready
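
The resolution and frame-rate figures quoted earlier translate directly into pixel throughput, which is what drives both the memory bandwidth demand and the appeal of a wide SIMD engine. The back-of-the-envelope sketch below compares 1080p with 4K at 120fps. Several inputs are assumptions rather than figures from this paper: a 30fps baseline for 1080p, 4K taken as 3840x2160, 1.5 bytes per pixel (8-bit YUV 4:2:0), a single read pass over each frame, and ideal use of 64 lanes with one pixel per lane.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw pixel throughput for a video stream."""
    return width * height * fps


def bytes_per_second(width: int, height: int, fps: int,
                     bytes_per_pixel: float) -> float:
    """Bandwidth for ONE pass over every frame (real pipelines make several)."""
    return pixels_per_second(width, height, fps) * bytes_per_pixel


def vector_ops_per_second(width: int, height: int, fps: int, lanes: int) -> int:
    """Idealized lower bound on vector operations for one per-pixel operation,
    assuming every lane is busy and each lane handles one pixel."""
    ops_per_frame = -(-(width * height) // lanes)   # ceiling division
    return ops_per_frame * fps


FHD_30 = pixels_per_second(1920, 1080, 30)       # 62,208,000 pixels/s
UHD_120 = pixels_per_second(3840, 2160, 120)     # 995,328,000 pixels/s

growth = UHD_120 / FHD_30                                   # 16.0
uhd_bandwidth = bytes_per_second(3840, 2160, 120, 1.5)      # ~1.49e9 bytes/s
uhd_vector_ops = vector_ops_per_second(3840, 2160, 120, 64) # 15,552,000 ops/s
```

Under these assumptions, moving from 1080p at 30fps to 4K at 120fps multiplies pixel throughput by 16, and a single pass over the video already needs about 1.5GB/s. Multi-pass algorithms such as noise reduction or stabilization multiply that figure again.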

Developed on the configurable Cadence Tensilica Xtensa processor, the Tensilica Vision P5 has an integrated 2D iDMA that provides high-bandwidth data transfer between local and system memory. The processor features an optional vector floating-point unit, error correction code (ECC) support for automotive qualification, and an ARM AMBA AXI4 interface to the system bus. The Tensilica Vision P5 offers a significant performance improvement over its predecessor (see Figure 4).

[Figure 4: Performance enhancement of Tensilica Vision P5 vs. previous-generation DSP. Vertical axis: improvement on P5 (including clock increase). Benchmarks shown: Hough Line Transform, Canny, Lens Distortion Correction, Perspective Warp, Harris Corners, Tree Classifier, Affine Warp, Face Detection, ORB, Advanced Noise Reduction. Annotation: Tensilica Vision P5 speed over Tensilica IVP-EP is 3X to 13X on various imaging and vision applications.]

Use Cases

Morpho, Inc., an image processing technology company based in Tokyo, Japan, used an earlier-generation Tensilica image/vision processing DSP for its video WDR, a single image-based luminance enhancement technology. By porting the video WDR to the DSP, Morpho achieved an 8X performance gain compared to its previous CPU-based solution. On the CPU, 1080p resolution at 30fps required 2.2GHz, compared to only 282MHz with the DSP, as shown in Figure 5. Since the Tensilica Vision P5 provides 2X to 4X better application-level performance, the required clock rate could be as low as 70MHz.

[Figure 5: Morpho achieved a substantial performance gain in its video WDR application after migrating to Tensilica DSP technology. Chart title: "Morpho Video WDR: 1080P 30FPS". Bars: IVP-EP at 282MHz; CPU at 2.2GHz. Horizontal axis: MHz.]
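
The clock-rate figures in the Morpho example can be checked with a few lines of arithmetic. The sketch below treats the clock frequency required for 1080p at 30fps as a proxy for performance, as the text does; the helper names are ours, and the 2X and 4X factors are simply the two ends of the range quoted above.

```python
CPU_MHZ = 2200.0      # CPU clock needed for 1080p at 30fps (from the text)
IVP_EP_MHZ = 282.0    # IVP-EP DSP clock needed for the same workload (from the text)


def speedup(baseline_mhz: float, new_mhz: float) -> float:
    """Ratio of required clock rates for the same workload."""
    return baseline_mhz / new_mhz


def scaled_clock(clock_mhz: float, performance_factor: float) -> float:
    """Clock needed if per-cycle application performance improves by the factor."""
    return clock_mhz / performance_factor


gain_over_cpu = speedup(CPU_MHZ, IVP_EP_MHZ)   # about 7.8, quoted as 8X
p5_at_2x = scaled_clock(IVP_EP_MHZ, 2.0)       # 141.0 MHz
p5_at_4x = scaled_clock(IVP_EP_MHZ, 4.0)       # 70.5 MHz, quoted as 70MHz
```

The results line up with the rounded figures in the text: 2200/282 is roughly 7.8 (the quoted 8X), and a 4X improvement over 282MHz gives about 70MHz, with the 2X end of the range giving 141MHz.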

Uurmi Systems also took advantage of the energy efficiency of the Tensilica image/vision processing DSP family, achieving a 15X improvement in fps per watt over a mobile CPU+GPU solution in its fog removal application, as shown in Figure 6. Originally implemented on the previous-generation Tensilica IVP-EP DSP, this application is expected to achieve even greater benefits with the Tensilica Vision P5 because the new DSP is 5X more energy-efficient than its predecessor.

[Figure 6: Uurmi Systems lowers energy consumption in fog removal application. Chart title: "Fog Removal Energy Efficiency". Vertical axis: fog removal fps per watt. Bars: Dual Mobile CPU; Mobile CPU + Mobile GPU; Vision DSP. Annotation: 15X higher fps per watt vs. mobile CPU + mobile GPU.]

Summary

Running computer vision algorithms on CPUs or on a combination of CPUs and GPUs is a common approach, but it has limitations, particularly with power consumption in mobile and wearable applications. In contrast, offloading to a DSP such as the Tensilica Vision P5 can yield substantial energy savings while supporting the high workloads of image and vision processing applications.

For Further Information

Learn more about the Tensilica Vision P5 DSP at http://ip.cadence.com/ipportfolio/tensilica-ip/image-visionprocessing

Footnote

1 Yole Developpement. (2015). The CMOS Image Sensors industry is about to change, with major investment in manufacturing & design [Press release]. Retrieved from http://www.yole.fr/cis_2015.aspx#.veidtm6asm4

Cadence Design Systems enables global electronic design innovation and plays an essential role in the creation of today's electronics. Customers use Cadence software, hardware, IP, and expertise to design and verify today's mobile, cloud, and connectivity applications. www.cadence.com

2015 Cadence Design Systems, Inc. All rights reserved. Cadence, the Cadence logo, Tensilica, and Xtensa are registered trademarks of Cadence Design Systems, Inc. ARM and AMBA are registered trademarks of ARM Ltd. (or its subsidiaries) in the EU and/or elsewhere. All others are properties of their respective holders.