DESIGN SOLUTION: A CUSTOMER SUCCESS STORY

Fujisoft solves graphics acceleration for the Android platform

by Hiroyuki Ito, Senior Engineer, Embedded Core Technology Department, Solution Business Division, Fujisoft

The Design Team: Fujisoft is a design services and IP provider with deep experience in packaging FPGA-based hardware with software stacks to provide complete solutions to embedded design teams.

Challenge: Embedded systems employing the Android platform frequently need graphics acceleration for the Android user interface, but most embedded hardware lacks the performance to provide a smooth experience.

Solution: Fujisoft has developed a compact, low-powered graphics engine with the specific accelerations Android requires, based on a small FPGA.

For more information:
Fujisoft: www.fsi.co.jp/solution/android/e/
Email: global_embedded@fsi.co.jp
Altera: www.altera.com/devices/processor/soc-fpga/cyclone-v-soc/cyclone-v-soc.html

The Project: an Android SoC

The Android software platform is becoming increasingly appealing to embedded designers because it provides a turnkey environment for developing a high-performance Linux-based system with rich user-interface and communications libraries. But many embedded developers are not sufficiently familiar with Android to be comfortable shopping for hardware components and bringing up their own hardware to support the platform. So Fujisoft is providing a package.

Specifically, Fujisoft is providing the Graphics Accelerator IP Core needed to provide a smooth-running Android OS on the Altera SoC FPGAs. On top of that we will provide the Android Platform that includes the standard open source Android OS version 4.0.4 ICS running on top of the Linux Kernel from Altera's BSP.
We expect that many FPGA users may not be familiar with Android and Linux, so we also offer OS support services as well as full hardware/software designs (Figure 1). Together, the Graphics Accelerator, Android PF, and OS Support enable design teams to start developing their Android-based products using an Altera SoC. We can then support whatever development they're doing with full-system design services.

Figure 1: The Android Stack. From top to bottom: applications; middleware; Android (open source 4.0 OS with the FSI PF driver); Linux with its drivers, including the graphics accelerator control driver; and the Altera SoC, with the graphics accelerator IP in the FPGA alongside the HPS. The corresponding Fujisoft offerings are full design services (applications, middleware, OS, drivers, boards, FPGA, and IP), the Android PF and driver for Altera SoC, the Android and Linux OS support service, the Graphics Accelerator IP Core and driver, and full customization and design service.
The Design Challenge

Android has grown to be the number one smartphone operating system. While it remains the most popular OS for smartphones and tablets, it is also being adopted for other embedded devices and industrial equipment. However, embedded-system designers rarely employ high-performance CPUs, and running Android on a relatively low-performance CPU slows down the Android drawing process and thus degrades the user experience.

Typical Android functions such as movement, rotation, and scaling (shrink and magnification) are an essential part of the user-interface metaphor. But these actions appear slow and jerky when the system relies on the ARM CPU cores alone, because the cores lack the processing performance to keep up.

The obvious solution is a hardware graphics accelerator. But today's SoCs with hardware graphics acceleration for Android are typically aimed at smartphones, not embedded applications. And integrating a stand-alone graphics engine into an embedded design is fraught with device-choice, bandwidth, power, and verification issues, especially since proper operation of the graphics is so fundamental to the Android experience.

The Design Solution

Our solution to this challenge was to implement specific accelerations for Android graphics. Through analysis we isolated the functions that consumed the most CPU effort: the heaviest Android drawing functions. By exploiting the very high bandwidth between the ARM hardware core and the logic fabric in the SoC FPGA, we were able to extract these functions from the ARM software and implement them inside the FPGA fabric. This division of labor not only speeds graphics operation, restoring the smooth, natural feel to the user interface, but also eases the processing bottleneck, so application code runs faster as well.
Figure 2: Selecting graphics functions for acceleration. Testing done on Multi-touch veek (P013), 800x400.

Accelerated functions: α blending, image rotation, shrink and magnification, layer mixing.

Performance                    Frame rate   Android Drawing share of total CPU usage
With Graphic Accelerator       55 fps       19%
Without Graphic Accelerator    33 fps       49%

Android Drawing refers to Application layer and User Interaction and does not include Linux and Android System.

Size    LE      ALM     SRAM
2DGE    5500    2400    262 Kbit (1024 bit x 32 bit x 8 line) + DSP 18 bit x 38
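As a quick check on the Figure 2 numbers, the short sketch below works out the frame-rate gain, the CPU share freed, and the SRAM size. It is only illustrative arithmetic: the function names are hypothetical, and reading "1024 bit x 32 bit x 8 line" as a simple product of three dimensions is an assumption.

```python
# Illustrative arithmetic on the values quoted in Figure 2.
# All function names here are hypothetical helpers, not part of any Fujisoft tool.

FPS_WITH, FPS_WITHOUT = 55, 33   # frames per second
CPU_WITH, CPU_WITHOUT = 19, 49   # Android Drawing share of total CPU usage, percent


def frame_rate_gain(with_accel, without_accel):
    """Ratio of the accelerated frame rate to the software-only frame rate."""
    return with_accel / without_accel


def cpu_share_freed(with_accel, without_accel):
    """Percentage points of CPU usage no longer spent on Android Drawing."""
    return without_accel - with_accel


def sram_bits(words, width_bits, lines):
    """SRAM size in bits, ASSUMING the three quoted dimensions simply multiply."""
    return words * width_bits * lines


if __name__ == "__main__":
    print(round(frame_rate_gain(FPS_WITH, FPS_WITHOUT), 2))  # 1.67
    print(cpu_share_freed(CPU_WITH, CPU_WITHOUT))            # 30
    print(sram_bits(1024, 32, 8))                            # 262144
```

Under that reading, 262,144 bits agrees with the 262 Kbit quoted in the size table.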
Our IP processes the four heaviest Android drawing functions (α blending, image rotation, scaling, and layer mixing) in the Altera SoC FPGA fabric, as shown in Figure 2. The result is a higher frame rate (frames per second, fps) and lower CPU usage.

Theory of Operation

The following explains the processing flow in Android from the point where an application starts drawing to the point where its output appears on the LCD.

Every application renders images into its own image rendering buffer. As is standard with Android, it does so with an image-rendering software library accessed through an API such as OpenGL ES, as shown in Figure 3.

Figure 3: Graphics Accelerator IP Core Theory of Operation. The figure traces frames from the application processes (App 1 through App X) to the LCD output, comparing the previous software-library sequence (Points 1 and 2) with the accelerated sequence using the Graphics Accelerator (Points 3 and 4). The three stages are:
1. Every application uses software libraries (OpenGL ES, etc.) to perform 3D image rendering, etc., on the application's image rendering buffer.
2. The data from the application's image rendering buffer is blended to the image synthesizing buffer.
3. The data from the image synthesizing buffer is transferred to the LCD output buffer.
Next, the SurfaceFlinger function blends together the multiple layers that have been written to the applications' image rendering buffers. To create one frame, the layers are blended into an image-synthesizing buffer using an image-rendering software library through the OpenGL ES API (see Point 1 in Figure 3). At this point, the ARM CPU alone traditionally performs the extremely heavy processing needed for Android drawing functions such as α-blending, scaling (shrink and magnification), and layer mixing. This causes a bottleneck that slows down the Android user experience.

Lastly, the picture blended in the image-synthesizing buffer is transferred to the LCD output buffer. The final LCD picture is sent through the EGL API of the standard Android image-rendering software library, at Point 2 in Figure 3.

Implementing our graphics accelerator eases the bottleneck that occurs at Points 1 and 2 above. The accelerator takes the heaviest Android drawing functions (α-blending, scaling, layer mixing, and rotation) away from the ARM CPU and processes them within the FPGA fabric; see Points 3 and 4 in Figure 3. The red line through Points 3 and 4 in the diagram represents the improved Android drawing process, in which the ARM CPU is complemented by the accelerator inside the FPGA. The result is a higher frame rate and a user experience similar to that of a smartphone.

Figure 4: Accelerator Block Diagram. Blocks shown: CPU, DRAM, host interface, memory, DMA control, scaler, and mixer, connected through Avalon-MM interfaces.
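To make the composition step concrete, here is a minimal software sketch of α-blending and layer mixing into a synthesizing buffer, followed by the copy to an output buffer. It is not Fujisoft's IP or Android's SurfaceFlinger code. The names `blend_pixel`, `mix_layers`, and `present` are hypothetical, and the 8-bit "source over" blend formula is an assumption chosen for illustration.

```python
# Illustrative software model of layer mixing with alpha blending.
# Pixels are plain tuples; all names below are hypothetical.


def blend_pixel(src, dst):
    """Blend one RGBA source pixel over one RGB destination pixel.

    Assumed formula, per channel with 8-bit values:
        out = (src * a + dst * (255 - a)) / 255, rounded to nearest.
    """
    r, g, b, a = src
    inv = 255 - a
    return tuple((s * a + d * inv + 127) // 255 for s, d in zip((r, g, b), dst))


def mix_layers(layers, width, height, background=(0, 0, 0)):
    """Compose RGBA layers (bottom layer first) into one RGB frame.

    Plays the role of the image synthesizing buffer in the flow above.
    """
    frame = [[background for _ in range(width)] for _ in range(height)]
    for layer in layers:
        for y in range(height):
            for x in range(width):
                frame[y][x] = blend_pixel(layer[y][x], frame[y][x])
    return frame


def present(frame):
    """Copy the synthesized frame into a separate output buffer."""
    return [row[:] for row in frame]


if __name__ == "__main__":
    red_opaque = [[(255, 0, 0, 255), (255, 0, 0, 255)]]
    blue_overlay = [[(0, 0, 255, 128), (0, 0, 0, 0)]]  # half-transparent, then clear
    lcd = present(mix_layers([red_opaque, blue_overlay], width=2, height=1))
    print(lcd)  # [[(127, 0, 128), (255, 0, 0)]]
```

Every pixel of every layer costs several multiplies and adds per frame, which suggests why this loop is a natural candidate for moving from the CPU into parallel hardware.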
Actual implementation of the accelerator exploits the high-bandwidth connections from the ARM system bus into the logic fabric, as shown in Figure 4. We employed three memory-mapped interfaces between the ARM core and the fabric: two bus-master interfaces to handle the image-processing data flows, and one slave interface to handle the set-up/control flow.

Results

We implemented the graphics accelerator and Android platform on an Altera Cyclone V SoC FPGA, using the Altera Cyclone V SoC development kit and a compatible Terasic multi-touch LCD module, with a 4 GByte (or greater) MicroSD card for storage. The package as implemented includes Android 4.0.4 (ICS), Linux kernel 3.7.0, and U-boot version 2012.10. In this system we measured the following results:

                     With Graphics Accelerator    Without Graphics Accelerator
                     for Android                  for Android
CPU Usage (%)        19.0                         48.8
Frame Rate (fps)     55                           33

These figures show that we have indeed achieved significant off-loading of the ARM CPU. And user interaction with the device verifies that the look and feel of the user-interface dynamics are the same as one would experience on a smartphone.

We believe that this implementation of selected graphics primitives in the fabric of an FPGA illustrates the importance of a very high-bandwidth, low-latency connection between the Android host CPU and the accelerators, and the value of programmable logic as a means of implementing hardware acceleration in an embedded environment.

Altera Corporation
101 Innovation Drive, San Jose, CA 95134, USA
www.altera.com

Altera European Headquarters
Holmers Farm Way, High Wycombe, Buckinghamshire HP12 4XF, United Kingdom
Telephone: (44) 1494 602000

Altera Japan Ltd.
Shinjuku i-land Tower 32F, 6-5-1, Nishi-Shinjuku, Shinjuku-ku, Tokyo 163-1332, Japan
Telephone: (81) 3 3340 9480
www.altera.co.jp

Altera International Ltd.
Unit 11-18, 9/F, Millennium City 1, Tower 1, 388 Kwun Tong Road, Kwun Tong, Kowloon, Hong Kong
Telephone: (852) 2945 7000

© 2014 Altera Corporation.
All rights reserved. ALTERA, ARRIA, CYCLONE, ENPIRION, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark Office and are trademarks or registered trademarks in other countries. All other words and logos identified as trademarks or service marks are the property of their respective holders as described at www.altera.com/legal. DS-1000