GEDAE™ - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications




Harris Z. Zebrowitz
Lockheed Martin Advanced Technology Laboratories
1 Federal Street, Camden, NJ 08102

Introduction

GEDAE™ is an advanced graphical development and automatic code generation tool that is revolutionizing the design and development of digital signal processing systems. It enables designers to capture signal-processing applications in a hardware-independent graphical representation. Designers can then partition and map the application to a variety of commercial multiprocessor embedded hardware architectures and generate real-time software using target-specific, vendor-supplied vector libraries. For truly embedded systems, the application can be controlled from an external program that is independent of the development environment. The GEDAE™ visualization tools display all hardware and software activity on the target embedded system, including processing, interprocessor communications, and buffer activity, enabling a level of optimization equivalent to or surpassing that achievable through hand coding. This paper describes the capabilities and features of GEDAE™.

Design Process

The process for designing embedded signal processors using GEDAE™ is shown in Figure 1. It begins with an architecture-independent data flow graph representing the signal processing algorithms for the intended application. Each function box, or node, in the graph represents a processing function, such as an FFT or FIR filter. The lines in the graph represent data flowing between the nodes. The graph nodes are mapped to the multiple processors in the architecture, and performance estimates are generated by simulation. This allocation is initially performed using engineering judgement, but it may be modified as virtual prototyping trade-off studies proceed.
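As a rough illustration of the mapping step described above, the sketch below models a small data flow graph, an initial node-to-processor assignment, and a crude per-processor load estimate. It is not GEDAE's internal representation: the node names, the cost figures, and the helpers `estimate_load` and `cross_processor_edges` are all hypothetical and chosen only to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical data flow graph: nodes are processing functions, edges carry data.
# Costs are made-up execution times per firing, in microseconds.
node_cost_us = {"source": 5, "fir": 40, "fft": 120, "detect": 30, "sink": 5}
edges = [("source", "fir"), ("fir", "fft"), ("fft", "detect"), ("detect", "sink")]

# Initial mapping chosen by engineering judgement: node -> processor id.
mapping = {"source": 0, "fir": 0, "fft": 1, "detect": 1, "sink": 1}


def estimate_load(node_cost_us, mapping):
    """Sum the per-firing compute time assigned to each processor."""
    load = defaultdict(int)
    for node, proc in mapping.items():
        load[proc] += node_cost_us[node]
    return dict(load)


def cross_processor_edges(edges, mapping):
    """Return the edges whose endpoints sit on different processors."""
    return [(a, b) for a, b in edges if mapping[a] != mapping[b]]


print(estimate_load(node_cost_us, mapping))
print(cross_processor_edges(edges, mapping))
```

Changing `mapping` and rerunning the two helpers mimics, in a very simplified way, the trade-off loop in which a designer moves nodes between processors and compares the resulting load and communication.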
Virtual prototyping provides the ability to simulate the hardware and software design together before the hardware is built. It is used to evaluate alternative approaches to partitioning the processing nodes and mapping the partitions to processors. Once a satisfactory partitioning and mapping scheme is determined, the architecture-independent data flow graph is transformed into an architecture-specific set of software executables by autocoding.

Autocoding is the process of generating software automatically from the partitioned and mapped data flow graph. The functions executed by the nodes in the graph are reusable library elements. The function libraries are completely architecture independent, and autocoding converts them into architecture-specific calls to optimized vendor libraries. Both the processing functions and the communication are autocoded from the data flow graph representation. As depicted in Figure 1, the design process is iterative.

Figure 1. GEDAE Design Process. (Diagram blocks: Application Data Flow Graph, Partitioning / Mapping, Scheduling, Simulation, Analysis, Autocoding, Target Hardware, Execution on Real Hardware.)

Design Environment

GEDAE provides a unified graphical environment for developing signal processing systems. Typical process improvements are summarized in Figure 2. GEDAE comprises a workstation development environment and target-specific run-time kernels for embedded targets. The workstation development environment provides the capability required for developing data flow graphs and validating their functionality. Included is support for mapping the data flow graph to multiple processors, autocoding the application to run on those processors, and visualizing performance.

Cost/Performance               Improvement
Development Time               >5x
Integration and Test Time      >10x
Processor/Memory Efficiency    ~1x

Figure 2. GEDAE vs. Conventional Process Improvement

The user environment is common to both workstation and embedded multiprocessor applications, so it is not necessary to switch tools when moving from algorithm development to the generation and optimization of code for embedded systems. The application developer never needs to write any interprocessor communication software for multiprocessor implementations.
In fact, this may be the greatest benefit of graph-based programming for multiprocessors, because multiprocessor communication is responsible for most of the debugging problems in large applications.

Algorithm Capture: Algorithms are captured in GEDAE by placing processing function boxes extracted from a library on a work area and interconnecting them. Designers can create graphs by selecting from a large library of standard functions. Templates are provided to create new library primitives and new data types. Custom primitives are created using standard C syntax. Graphs can be hierarchical to any depth required by the application. A unique graphical syntax consisting of families and route boxes supports succinct description of parallelism. The algebraic description of graphs via parameterized families and routing enables automatic graph restructuring to support parallelism. The upper left corner of Figure 3 shows an example of a GEDAE flow graph.

Data Flow and Functional Validation: Execution of data flow graphs is controlled through the same interface used to construct the graphs. There are several ways to observe the execution of a graph from both a hardware and a software perspective. Dynamic displays let users see what is occurring while the graph is executing, and static displays collect detailed information in the background for subsequent analysis. Scopes, such as the one shown in the upper right corner of Figure 3, and monitors can be inserted into a graph to facilitate observations during execution. Event timing data can be collected in the background while a graph is executing, and the information is stored until the developer requests the Trace Table display. The Trace Table, shown in the lower left corner of Figure 3, contains detailed time line information for system analysis.

Figure 3. Example Application Graph Generated Using GEDAE

Virtual Prototyping: CSIM is a C language based virtual prototyping tool that is currently being integrated with GEDAE. CSIM provides a natural and powerful way to describe the mapping of a parallel processing algorithm onto a modeled architecture. It can describe the function of each device in a system in terms of time delays for computation and I/O and its interaction with the rest of the system. It supports interconnecting the models of each device according to arbitrary topologies and running discrete event simulations of the described system. Finally, using the resulting system model, CSIM can be used to investigate the effects of link bandwidth in conjunction with the network architecture (buses, rings, meshes, etc.) and the performance of algorithm mappings onto the modeled architectures. The completed interface between CSIM and GEDAE will permit a user to develop an application, establish correct functionality, graphically define a virtual architecture, map the application to the virtual architecture, predict performance on the virtual architecture, and autocode the partitioned and mapped system for execution on the target hardware.

Embedded Code Generation: Once the data flow and functionality have been verified and a partitioning and mapping scheme has been determined, GEDAE generates the execution schedules for each of the embedded processors. As shown in the lower right corner of Figure 3, the mapping table is used to specify partition assignments.
The schedule generation process maximizes the use of static scheduling to minimize overhead, but it preserves dynamic behavior where required. A schedule may be divided into multiple sub-schedules, which may all operate at different firing granularities to optimize performance. The code is then automatically compiled, linked, loaded and executed on the embedded hardware. The library functions used to construct the graph are linked to the optimized math library provided by the hardware vendor to achieve optimum performance. A Run-Time Kernel residing on each of the embedded processors supports the execution of the autocoded application.
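The paper does not describe how GEDAE builds its schedules, so the following is only a minimal sketch of the general idea of static scheduling: for a partition whose nodes fire at fixed rates, a firing order can be computed once, ahead of time, so that every node runs after the nodes that produce its inputs. The graph contents and the `static_schedule` helper are hypothetical; the ordering uses Python's standard `graphlib` module rather than anything specific to GEDAE.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical partition: each node maps to the set of nodes it depends on.
deps = {
    "fir": {"source"},
    "fft": {"fir"},
    "detect": {"fft"},
    "sink": {"detect"},
}


def static_schedule(deps):
    """Return a fixed firing order in which every node follows its producers."""
    return list(TopologicalSorter(deps).static_order())


schedule = static_schedule(deps)
print(schedule)
```

Because the order is fixed before execution, the run-time cost of deciding what to fire next disappears for this part of the graph; nodes with data-dependent behavior would still need a dynamic mechanism, which this sketch does not attempt to model.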

Schedules can be viewed using the Schedule Display. Schedules are presented with the graph functions listed down the left side of the table in their order of execution. For each entry in the Schedule Display, memory information and execution time are presented. When executing on multiple processors, the Trace Table reflects the presence of additional processors and the fact that communication occurs between them. Computation time, data flow activity (queues filling and emptying), and communication (sending, receiving, and local memory copies) are all detailed in the Trace Table.

Optimization: The types of optimization that are supported for embedded execution include interactive partitioning and mapping, memory usage, selection of communication mechanisms for inter-partition links, schedule firing granularity, queue capacities, and scheduling options. The group control dialog is the interface to all of these optimization mechanisms. It gives designers control over the optimization and execution characteristics of applications and assists them in attaining optimized performance.

Stand-Alone Operation: Embedded applications must be capable of executing independently of a workstation and display. GEDAE enables autocoded applications to be targeted for stand-alone operation. To support this mode of operation, GEDAE provides a software API that facilitates controlling graphs from other software, such as higher-level control software. The API provides a set of functions that may be called to start and stop graphs, set parameters, read and write data to the graph, and connect graphs to other graphs. These capabilities make it possible to develop applications using the analysis facilities of the development environment and then divorce the application from that environment and control it from external software. Currently, GEDAE provides a set of functions used to instantiate, control, and configure the application graph.
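The paper lists the kinds of operations the control API offers but not its function names or signatures, so the sketch below is purely illustrative. `GraphHandle` and all of its methods are invented stand-ins, and the "processing" is a placeholder gain stage; the only point is to show the shape of external control software that starts a graph, sets a parameter, writes and reads data, and stops the graph.

```python
from collections import deque


class GraphHandle:
    """In-memory stand-in for an autocoded graph (hypothetical, not the GEDAE API)."""

    def __init__(self, name):
        self.name = name
        self.running = False
        self.params = {}
        self._outbox = deque()

    def set_param(self, key, value):
        """Record a named parameter for the graph."""
        self.params[key] = value

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def write(self, samples):
        """Accept a block of input and 'process' it with a placeholder gain stage."""
        if not self.running:
            raise RuntimeError("graph is not running")
        gain = self.params.get("gain", 1.0)
        self._outbox.append([gain * s for s in samples])

    def read(self):
        """Return the oldest block of output, or None if nothing is ready."""
        return self._outbox.popleft() if self._outbox else None


# Control software drives the graph without the development environment.
g = GraphHandle("beamformer")
g.set_param("gain", 2.0)
g.start()
g.write([1.0, 2.0, 3.0])
result = g.read()
g.stop()
print(result)
```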
Future improvements will extend support for control software development to include some fine-grain control in GEDAE autocoding. A prototype tool known as the Application Interface Builder (AIB), which autocodes control software, has been developed. Near-term efforts on control software autocoding will focus on refining the AIB tool with the intent of incorporating it into the development environment. Longer-term efforts include the development of graphical methods for specifying the control software and providing co-simulation with data flow graphs.

Demonstration and Benefits

GEDAE has been shown to provide many benefits, including increased productivity and easier application retargeting, which allows designers to leverage the hardware technology curve.

Rapid Prototyping/Portability: A synthetic aperture radar (SAR) application was originally hand-coded for the Mercury Computer Systems RaceWay architecture and then re-implemented using GEDAE. The resulting autocoded application achieved the same execution and memory efficiency as the hand-coded version with about a 10x reduction in implementation time. The same GEDAE application was correctly remapped to several different commercial signal processing architectures, including Mercury PowerPC, SHARC, and i860; Ixthos SHARC; and Alex SHARC, by simply repartitioning and remapping the application to the new architecture. These remappings were accomplished in hours.

Re-Use of Legacy Software: A fifty-thousand-line sonar algorithm, developed by the Navy, was converted into GEDAE data flow graphs in less than twelve weeks. Once converted, the application was distributed for real-time operation on a Mercury PowerPC architecture. Test, integration, and optimization on the target architecture took four weeks.

Optimization of Large Systems: The Semi-Automated IMINT Processing (SAIP) application used four Alex Computer Systems SHARC boards with 18 SHARCs per board to meet real-time performance requirements. As depicted in Figure 4, the GEDAE™ virtual prototyping and autocoding process enabled efficient implementation of this 72-processor system. Detailed virtual prototyping verified the HW/SW mapping and network communication bandwidth performance, and it established the final executable timing and memory specification. In the final design, SHARC memory was over 90% utilized, as was the processor loading. Applying virtual prototyping and autocoding to the SAIP benchmark delivered a 100x improvement in throughput density and reduced the hardware cost enough to offset development costs for the first system.

Figure 4. Virtual Prototype System. (Diagram blocks: System Architecture Model, HighClass, Data Flow Graph Software Model, Alex SHARC Board Model, Alex SharcPac Model, Final System Hardware Configuration.)

Summary

A hardware/software codesign methodology utilizing virtual prototyping and autocoding tools reduces system costs. Productivity improvements of 5x in software development and 10x in integration and test have been demonstrated. Such improvements lead to lower system cost and faster time to market. Improved application portability and retargetability significantly reduce the cost of migrating applications from one hardware platform to another and make it easy to leverage the hardware technology development curve. Because communication software is automatically generated, retargeting applications to new hardware and reoptimizing can be achieved in weeks or even days.