GEDAE™ - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

Harris Z. Zebrowitz
Lockheed Martin Advanced Technology Laboratories
1 Federal Street
Camden, NJ 08102

Introduction

GEDAE™ is an advanced graphical development and automatic code generation tool that is revolutionizing the design and development of digital signal processing systems. It enables designers to capture signal-processing applications in a hardware-independent graphical representation. Designers can then partition and map the application to a variety of commercial multiprocessor embedded hardware architectures and generate real-time software using target-specific, vendor-supplied vector libraries. For truly embedded systems, the application can be controlled from an external program that is independent of the development environment. The GEDAE™ visualization tools display all hardware and software activity on the target embedded system, including processing, interprocessor communications, and buffer activity, enabling a level of optimization equivalent to or surpassing that achievable through hand coding. This paper describes the capabilities and features of GEDAE™.

Design Process

The process for designing embedded signal processors using GEDAE™ is shown in Figure 1. The design process begins with an architecture-independent data flow graph representing the signal processing algorithms for the intended application. Each function box, or node, in the graph represents a processing function, such as an FFT or FIR filter. The lines in the graph represent data flowing between the nodes. The graph nodes are mapped to the multiple processors in the architecture, and performance estimates are generated by simulation. This allocation is initially performed using engineering judgement, but it may be modified as virtual prototyping trade-off studies proceed.
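
As a rough illustration of the steps just described (capture a graph, map its nodes to processors, estimate the load), the following Python sketch may help. It is not GEDAE code. The names `FlowGraph` and `estimate_load`, the per-node costs, and the link cost per byte are all assumptions invented for this example, and the estimate is simple addition rather than a simulation.

```python
from dataclasses import dataclass, field


@dataclass
class FlowGraph:
    """Architecture-independent graph: nodes are functions, edges carry data."""
    costs: dict = field(default_factory=dict)   # node name -> assumed time per firing (us)
    edges: list = field(default_factory=list)   # (producer, consumer, bytes per firing)

    def add_node(self, name, cost_us):
        self.costs[name] = cost_us

    def connect(self, src, dst, nbytes):
        self.edges.append((src, dst, nbytes))


def estimate_load(graph, mapping, link_us_per_byte=0.01):
    """Sum compute time per processor and transfer time over inter-processor edges.

    `mapping` assigns each node name to a processor id. Edges whose endpoints
    share a processor are treated as free, which is a simplifying assumption.
    """
    load = {}
    for name, cost in graph.costs.items():
        proc = mapping[name]
        load[proc] = load.get(proc, 0.0) + cost
    comm_us = sum(
        nbytes * link_us_per_byte
        for src, dst, nbytes in graph.edges
        if mapping[src] != mapping[dst]
    )
    return load, comm_us


# A four-node chain with made-up costs, split across two processors.
g = FlowGraph()
g.add_node("source", 5.0)
g.add_node("fir", 40.0)
g.add_node("fft", 60.0)
g.add_node("sink", 5.0)
g.connect("source", "fir", 4096)
g.connect("fir", "fft", 4096)
g.connect("fft", "sink", 8192)

mapping = {"source": 0, "fir": 0, "fft": 1, "sink": 1}
load, comm_us = estimate_load(g, mapping)
print(load, comm_us)
```

Changing only the `mapping` dictionary and re-running the estimate mirrors, in a very reduced form, the trade-off loop in which the graph itself stays fixed while the allocation is revised.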
Figure 1. GEDAE Design Process
(Figure labels: Candidate Architecture(s), Application Data Flow Graph, Partitioning / Mapping, Virtual Prototyping, Simulation, Scheduling Analysis, Autocoding, Target Hardware, Execution on Real Hardware)

Virtual prototyping provides the ability to simultaneously simulate the hardware and software design prior to building the hardware. Virtual prototyping is used to evaluate
alternative approaches to partitioning the processing nodes and mapping the partitions to processors. Once a satisfactory partitioning and mapping scheme is determined, the architecture-independent data flow graph is transformed into an architecture-specific set of software executables by autocoding. Autocoding is the process of generating software automatically from the partitioned and mapped data flow graph. The functions executed by the nodes in the graph are reusable library elements. The function libraries are completely architecture independent and are converted by autocoding into calls to the optimized library of the specific target vendor. Both the processing functions and the communication are autocoded from the data flow graph representation. As depicted in Figure 1, the design process is iterative.

Design Environment

GEDAE provides a unified graphical environment for developing signal processing systems. Typical process improvements are summarized in Figure 2. GEDAE comprises a workstation development environment and target-specific run-time kernels for embedded targets. The workstation development environment provides the capability required for developing data flow graphs and validating their functionality. Included is support for mapping the data flow graph to multiple processors, autocoding the application to run on those processors, and visualizing performance.

Cost/Performance               Improvement
Development Time               >5x
Integration and Test Time      >10x
Processor/Memory Efficiency    ~1x

Figure 2. GEDAE vs. Conventional Process Improvement

The user environment is common to both workstation and embedded multiprocessor applications, so it is not necessary to switch tools when moving from algorithm development to the generation and optimization of code for embedded systems. The application developer never needs to write any interprocessor communication software for multiprocessor implementations.
In fact, this may be the greatest benefit of graph-based programming for multiprocessors, because multiprocessor communication is responsible for most of the debugging problems in large applications.

Algorithm Capture: Algorithms are captured in GEDAE by placing processing function boxes extracted from a library on a work area and interconnecting them. Designers can create graphs by selecting from a large library of standard functions. Templates are provided to create new library primitives and new data types. Custom primitives are created using standard C syntax. Graphs can be hierarchical to any depth required by the application. A unique graphical syntax consisting of families and route boxes supports a succinct description of parallelism. The algebraic description of graphs via parameterized families and routing enables automatic graph restructuring to support parallelism. The upper left corner of Figure 3 shows an example of a GEDAE flow graph.

Figure 3. Example Application Graph Generated Using GEDAE

Data Flow and Functional Validation: Execution of data flow graphs is controlled through the same interface used to construct the graphs. There are several ways to observe the execution of a graph from both a hardware and a software perspective. There are dynamic displays that let users see what is occurring while the graph is executing, and static displays that collect detailed information in the background for subsequent
analysis. Scopes, such as the one shown in the upper right corner of Figure 3, and monitors can be inserted into a graph to facilitate observations during execution. Event timing data can be collected in the background while a graph is executing, and the information is stored until the developer requests the Trace Table display. The Trace Table, shown in the lower left corner of Figure 3, contains detailed time line information for system analysis.

Virtual Prototyping: CSIM is a C language based virtual prototyping tool that is currently being integrated with GEDAE. CSIM provides a natural and powerful way to describe the mapping of a parallel-processor algorithm onto a described architecture. It can describe the function of each device in a system in terms of time delays for computation and I/O and its interaction with the rest of the system. It can interconnect the models of each device according to arbitrary topologies and run discrete event simulations of the described system. Finally, using the resulting system model, CSIM can be used to investigate the effects of link bandwidth in conjunction with the network architecture (buses, rings, meshes, etc.) and the performance of algorithm mappings onto the modeled architectures. The completed interface between CSIM and GEDAE will permit a user to develop an application, establish correct functionality, graphically define a virtual architecture, map the application to the virtual architecture, predict performance on the virtual architecture, and autocode the partitioned and mapped system for execution on the target hardware.

Embedded Code Generation: Once the data flow and functionality have been verified and a partitioning and mapping scheme has been determined, GEDAE generates the execution schedules for each of the embedded processors. As shown in the lower right corner of Figure 3, the mapping table is used to specify partition assignments.
The schedule generation process maximizes the use of static scheduling to minimize overhead, but it preserves dynamic behavior where required. A schedule may be divided into multiple sub-schedules, which may all operate at different firing granularities to optimize performance. The code is then automatically compiled, linked, loaded and executed on the embedded hardware. The library functions used to construct the graph are linked to the optimized math library provided by the hardware vendor to achieve optimum performance. A Run-Time Kernel residing on each of the embedded processors supports the execution of the autocoded application.
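
The idea of a static schedule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GEDAE scheduler: the function names `static_schedule` and `expand` are invented here, the ordering uses Kahn's topological sort as a stand-in technique, and "firing granularity" is interpreted as the number of consecutive firings of each node per pass through the schedule.

```python
from collections import deque


def static_schedule(nodes, edges):
    """Kahn's algorithm: order nodes so every producer runs before its consumers."""
    indegree = {n: 0 for n in nodes}
    consumers = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        consumers[src].append(dst)

    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in consumers[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; no static order exists")
    return order


def expand(order, granularity):
    """Fire each node `granularity` times back to back within one schedule pass.

    Larger values amortize per-pass overhead but assume the queues between
    nodes can hold `granularity` blocks.
    """
    return [(node, i) for node in order for i in range(granularity)]


nodes = ["source", "fir", "fft", "sink"]
edges = [("source", "fir"), ("fir", "fft"), ("fft", "sink")]

order = static_schedule(nodes, edges)
firings = expand(order, 2)
print(order)
print(firings)
```

Because the order is fixed before execution, no run-time decision is needed to pick the next node, which is the overhead saving that static scheduling aims for. Behavior that depends on the data would still need a dynamic mechanism, which this sketch does not model.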
Schedules can be viewed using the Schedule Display. Schedules are presented with the graph functions listed down the left side of the table in their order of execution. For each entry in the Schedule Display, memory information and execution time are presented. When executing on multiple processors, the Trace Table reflects the presence of additional processors and the fact that communication occurs between them. Computation time, data flow activity (queues filling and emptying), and communication (sending, receiving, and local memory copies) are all detailed in the Trace Table.

Optimization: The types of optimization supported for embedded execution include interactive partitioning and mapping, memory usage, selection of communication mechanisms for inter-partition links, schedule firing granularity, queue capacities, and scheduling options. The group control dialog is the interface to all optimization mechanisms. It gives designers control over the optimization and execution characteristics of applications and assists them in attaining optimized performance.

Stand-Alone Operation: Embedded applications must be capable of executing independently of a workstation and display. GEDAE enables autocoded applications to be targeted for stand-alone operation. To support this mode of operation, GEDAE provides a software API that facilitates controlling graphs from other software, such as higher-level control software. The API provides a set of functions that may be called to start and stop graphs, set parameters, read and write data to the graph, and connect graphs to other graphs. These capabilities make it possible to develop applications using the analysis facilities of the development environment and then divorce the application from that environment and control it from external software. Currently, GEDAE provides a set of functions used to instantiate, control, and configure the application graph.
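
To make the control operations listed above concrete (start and stop, set parameters, write and read data, connect graphs), here is a small Python mock. It is purely hypothetical: `GraphHandle` and every method name on it are invented for this sketch and do not correspond to the actual GEDAE API, whose function names and signatures are not given in this paper. Each "graph" here is just a Python function applied to a block of samples.

```python
class GraphHandle:
    """Hypothetical stand-in for a handle to an autocoded graph (not the GEDAE API)."""

    def __init__(self, name, process):
        self.name = name
        self._process = process      # callable(block, params) -> block
        self.params = {}
        self.running = False
        self._output = []            # results waiting to be read
        self._downstream = None      # another GraphHandle, if connected

    def set_parameter(self, key, value):
        self.params[key] = value

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def connect(self, other):
        """Route this graph's output into another graph's input."""
        self._downstream = other

    def write(self, block):
        if not self.running:
            raise RuntimeError(f"graph {self.name!r} is not running")
        result = self._process(block, self.params)
        if self._downstream is not None:
            self._downstream.write(result)
        else:
            self._output.append(result)

    def read(self):
        """Return the oldest unread result block."""
        return self._output.pop(0)


# External control code: configure two graphs, chain them, push one block through.
scale = GraphHandle("scale", lambda block, p: [x * p["gain"] for x in block])
offset = GraphHandle("offset", lambda block, p: [x + p["bias"] for x in block])

scale.set_parameter("gain", 2)
offset.set_parameter("bias", 1)
scale.connect(offset)

offset.start()
scale.start()
scale.write([1, 2, 3])
result = offset.read()
print(result)

scale.stop()
offset.stop()
```

The point of the sketch is the division of labor: the controlling program only holds handles and calls a handful of operations, while the processing inside each graph stays opaque to it.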
Future improvements will extend support for control software development to include some fine-grain control in GEDAE autocoding. A prototype tool known as the Application Interface Builder (AIB), which autocodes control software, has been developed. Near-term efforts on control software autocoding will focus on refining the AIB tool with the intent of incorporating it into the development environment. Longer-term efforts include the development of graphical methods for specifying the control software and providing co-simulation with data flow graphs.

Demonstration and Benefits

GEDAE has been shown to provide many benefits, including increased productivity and easier application retargeting, which allows designers to leverage the hardware technology curve.

Rapid Prototyping/Portability: A synthetic aperture radar (SAR) application was originally hand-coded for the Mercury Computer Systems RaceWay architecture and was then re-implemented using GEDAE. The resulting autocoded application achieved the same execution and memory efficiency as the hand-coded version with about a 10x reduction in implementation time. The same GEDAE application was correctly remapped to several different commercial signal processing architectures, including Mercury PowerPC, Sharc, and i860, Ixthos Sharc, and Alex Sharc, by simply repartitioning and remapping the application to the new
architecture. These remappings were accomplished in hours.

Re-Use of Legacy Software: A fifty-thousand-line sonar algorithm, developed by the Navy, was converted into GEDAE data flow graphs in less than twelve weeks. Once converted, the application was distributed for real-time operation on a Mercury PowerPC architecture. Test, integration, and optimization on the target architecture took four weeks.

Optimization of Large Systems: The Semi-Automated IMINT Processing (SAIP) application used four Alex Computer Systems Sharc boards with 18 Sharcs per board to meet real-time performance requirements. As depicted in Figure 4, the GEDAE™ virtual prototyping and autocoding process enabled efficient implementation of this 72-processor system. Detailed virtual prototyping verified the HW/SW mapping and network communication bandwidth performance, and it established the final executable timing and memory specification. In the final design, Sharc memory was over 90% utilized, as was the processor loading. Applying virtual prototyping and autocoding to the SAIP benchmark delivered a 100x improvement in throughput density and reduced the hardware cost enough to offset development costs for the first system.

Figure 4.
(Figure labels: System Architecture Model, HighClass Data Flow Graph Software Model, Virtual Prototype System, Alex SHARC Board Model, Alex SharcPac Model, Final System Hardware Configuration)

Summary

A hardware/software codesign methodology utilizing virtual prototyping and autocoding tools reduces system costs. Productivity improvements of 5x in software development and 10x in integration and test have been demonstrated. Such improvements lead to lower system cost and faster time to market. Improved application portability and retargetability significantly reduce the cost of migrating applications from one hardware platform to another and provide the ability to easily leverage the hardware technology development curve. Because communication software is automatically generated, retargeting applications to new hardware and reoptimizing can be achieved in weeks or even days.