Analysis of Hardware and Software Approaches to Embedded In-Circuit Emulation of Microprocessors

Hsin-Ming Chen, Chung-Fu Kao and Ing-Jer Huang
Dept. of Computer Science and Engineering
National Sun Yat-Sen University
Kaohsiung, Taiwan
{hmchen,cfkao,ijhuang}@cse.nsysu.edu.tw

Abstract
This paper investigates various approaches to embedding the functionality of in-circuit emulation (ICE) into microprocessor cores in system-on-chip (SoC) designs. Three styles of ICE (hardware-oriented, software-oriented and hybrid) are defined and implemented. They are integrated with a synthesizable ARM7 microprocessor core and synthesized to gate level to quantitatively analyze and compare their performance, cost and debugging features.

1. Introduction
An in-circuit emulator (ICE) is an important tool for the development of microprocessor-based systems. In past years, approaches such as external ICE boxes with probing sockets or ROM monitors (small monitoring programs residing in ROM) have been commonly used. External ICE boxes are complex and expensive pieces of hardware, so their use is usually limited to the debugging phases that involve hardware/software integration or the investigation of real-time I/O or bus events. A ROM monitor, on the other hand, is a software-oriented approach that is less expensive but provides less observability of the microprocessor's operations.

In either case, the design of the ICE device is usually independent of the design of the microprocessor itself. The performance and cost of the ICE are not relevant to the microprocessor, since they are two different entities. The ICE is used only during the development and debugging of the microprocessor-based system, by substituting the ICE for the original microprocessor in its socket. The ICE is unplugged after debugging and the original microprocessor is placed back into the socket for normal operation of the system.
Therefore, the performance and cost of the ICE do not affect those of the microprocessor-based system, because the ICE is not present in the system during normal operation.

However, in the era of system-on-chip (SoC), the ICE and the microprocessor can no longer be considered separately. In an SoC implementation of a microprocessor-based system, the in-circuit emulator has to be integrated with the microprocessor core early in the design stage, and it permanently resides in the SoC chip during both the development phase and the normal operation phase. It cannot be removed from the chip. Therefore, the architecture and implementation of the ICE deeply affect the performance and cost of the SoC chip.

In this paper we investigate the possible approaches to embedding an ICE into a microprocessor in an SoC chip. We then define three feasible solutions (software, hardware and hybrid) and integrate them with a synthesizable ARM7 microprocessor core. The microprocessors with the embedded ICEs are synthesized and simulated to analyze and compare the hardware/software performance, cost and debugging features of these approaches.

1.1. Background
The purpose of in-circuit emulation is to emulate the behavior of a microprocessor to support the development of embedded software and hardware (Apply Microsystems). An in-circuit emulator accepts run-control commands to execute debugging functions, such as loading, stopping or executing the user program, reading or writing registers and memory, single stepping, and setting breakpoints. The development system is shown in Fig. 1.

[Fig. 1 blocks: Host Computer (Edit, Compile and Link; Downloader; Debug Software), Protocol Converter, Target System (in-circuit emulation, Target Processor). Labels: Download Program; Control Command (Go, Stop, Step, Breakpoint); Set Breakpoint; Debug Status; Dump Register or Memory.]
Fig. 1 Microprocessor based development system

Copyright 2002, Australian Computer Society, Inc. This paper appeared at the Seventh Asia-Pacific Computer Systems Architecture Conference (ACSAC 2002), Melbourne, Australia. Conferences in Research and Practice in Information Technology, Vol. 6. Feipei Lai and John Morris, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.
2. Classification of In-Circuit Emulation Approaches

2.1. The Classification Methodology
We divide the operations of in-circuit emulation into two operation modes.

i. ForeGround Debug Mode (FGDM)
In this mode, the in-circuit emulator has full control of the target processor. It accepts commands from the host to control the execution of the target processor or to access its internal information. The FGDM can be implemented in hardware or in software. In the software approach, the in-circuit emulator is a piece of software executed by the target processor. It needs a passive communication device, such as a UART, driven by the target processor to communicate with the host computer. In the hardware approach, the in-circuit emulator has logic circuitry that generates control signals to halt or release the execution of the target processor, and an active communication device that communicates with the host or emulator to execute other debug functions while the processor is halted.

ii. BackGround Debug Mode (BGDM)
In this mode, the target processor executes the normal user program while the in-circuit emulator monitors the system status in the background. If the system status matches an event value previously defined by the user in the FGDM, the in-circuit emulator switches the operation mode back to FGDM. The BGDM can also be implemented in either software or hardware. There are two software approaches: software monitor and instruction patching. The software monitor uses software to check the execution result of each user instruction. It requires the processor to provide a single-step exception so that the check runs automatically after every instruction. Instruction patching, on the other hand, inserts special instructions at specific locations of the user program. When a patched instruction is executed, the debug mode switches to FGDM.
In the hardware approach, a hardware comparator usually monitors the address, data, and some control signals.

iii. Debug mode switching
When the debug operation switches from FGDM to BGDM, the user application program resumes normal execution as if nothing had happened. When the debug operation switches from BGDM to FGDM, the execution of the user application program is interrupted and control passes to the FGDM. At this point, the important task is to save the system status for further investigation.

iv. The classification of emulation approaches
Emulation approaches can be classified into Software Emulation, Hardware Emulation, or Hybrid Emulation. The classification is shown in Table 1. In the following sections, we define the approaches with corresponding industrial examples.

Approach | FGDM | BGDM | Examples
Software Emulation | SW | SW | BUFFALO Monitor Program
Hardware Emulation | HW | HW | ARM7TDMI
Hybrid Emulation | SW | HW | Intel x86 Debug Register; Motorola's Background Debug Mode (BDM)
Table 1 The classification of emulation approaches

2.2. Software Emulation
Software emulation means that both FGDM and BGDM are implemented using the software approach. Table 2 compares Intel x86 debugging and the BUFFALO Monitor Program.

Item | Intel x86 Debugging | BUFFALO Monitor Program
Processor | Intel 386, 486, Pentium | Motorola 68HC11 EVB
Patching code (Breakpoint) | 0CCh (INT 3) | SWI
Single-Step | Set the TF flag to enable the Single-Step trap | Set the OC5 timer counter value to cause an interrupt after a prescribed number of clock cycles
Table 2 The comparison of two software approaches

i. Intel x86 DEBUGGING
Intel's x86 processors have three debug functions for the BGDM. The following two mechanisms are used for the software approach: Breakpoint Instruction and Single-Step Trap (INTEL Ltd. 1994).

Breakpoint Instruction
The breakpoint trap is caused by executing the INT3 instruction. Typically, a debugger sets a breakpoint by replacing the first opcode byte of the instruction at the breakpoint location with the opcode of the INT3 instruction.
Single-Step Trap
This trap occurs after an instruction executes if the TF flag was set before that instruction was executed. The trap invokes the INT 1 interrupt handler.

ii. THE BUFFALO MONITOR PROGRAM
The Buffalo monitor is part of the 68HC11 evaluation board (JOSEPH D. G. and WILLIAM C. W. 1988). The debug routines usually include several utility functions. When this monitor program is included in the system, it must reside at the top of memory. The RESET vector directs the processor to start executing the initialization routine. The monitor then enters the command loop and waits for input commands from the keyboard. When breakpoints are set, the breakpoint routines temporarily insert a software interrupt (SWI) instruction in place of a user program instruction. When the program reaches the SWI instruction, the SWI service routine displays the contents of the processor registers and returns to the debug command loop. Memory and registers can then be examined to see whether they contain the expected data.

2.3. Hardware Emulation
Hardware emulation means that both FGDM and BGDM are implemented in hardware. This kind of BGDM uses a hardware comparator to detect breakpoint events, and active debug circuitry to control the execution of the processor. Many modern hardware emulators are built upon the IEEE 1149.1 JTAG port, which is a specification for boundary-scan testing at chip or board level (TOM WILLIAMS 2000, ING-JER HUANG AND TAI-AN LU 1999, P. C. CHING, Y. H. CHENG and M. H. KO 1994). It consists of scan chains that provide test access by allowing data to be shifted into or out of the chain serially. Although JTAG was not originally intended for software debugging, it can be used for this purpose by inserting an appropriate command or opcode into an internal register to control processor execution. The disadvantage of JTAG-based debugging is its speed: the scan chains are long, and an operation may require the chain to be traversed multiple times. The advantage of debugging via the JTAG port is that it uses processor pins that are already dedicated to it and therefore does not require any additional pins.

For example, the D and I of ARM7TDMI denote the optional on-chip debugging facilities supported by ARM Ltd. The D is the Debug Interface, which is based on JTAG IEEE Std.
1149.1 and extended to allow the core to be halted by ICEBreaker or by a debug-request signal. The I of ARM7TDMI is ICEBreaker, which is a comparator. It stores breakpoint values received from the Debug Interface (STEVE FURBER, 2000).

Another example is the AMDebug interface, which is included on the ÉlanSC520 microcontroller and provides the product design team with two different communication paths. The Serial AMDebug technology uses a serial connection based on an enhanced JTAG protocol. The Parallel AMDebug technology uses a 25-pin parallel debug port to exchange commands and data between the ÉlanSC520 microcontroller and the host (Advanced Micro Devices Inc. 1999).

2.4. Hybrid Emulation
Hybrid emulation means that FGDM and BGDM are implemented with different approaches. Most such processors use hardware comparators for BGDM and a software approach for FGDM. The notable feature is that the hardware comparator raises an exception when a matching event occurs, and the processor then executes the exception handler to switch into FGDM. Two examples follow.

i. Intel x86 DEBUG REGISTER
This is the third on-chip debugging function provided by x86 processors (INTEL Ltd. 1994). There are six debug registers, accessed with forms of the MOV instruction. The debug registers are privileged resources: the MOV instructions that access them can be executed only at privilege level 0. There are four debug address registers (DR0-DR3). Each of these registers holds the linear address for one of the four breakpoints, which means that breakpoint comparisons are made before physical address translation occurs. Each breakpoint condition is specified further by the contents of the debug control register (DR7).

ii. Motorola MCF5407 Real-Time Debug Support
The ColdFire family provides support for debugging real-time applications. For these types of embedded systems, the processor must continue to operate during debug. Debug interrupts let real-time systems execute an interrupt routine that quickly saves the contents of key registers and variables, after which program execution resumes.

3. Case Study: Implementations of the software, hardware and hybrid approaches with a synthesizable ARM7 microprocessor core
In this section, we use a synthesizable ARM7 processor to implement the three emulation approaches. We present the algorithm of the software approach and the architecture of the hardware approach. We also illustrate how software and hardware cooperate in the Hybrid Emulation.

3.1. Software Emulation
We design the FGDM using the software approach. We call this software FGDM SoftFGDM; it resides in the Undefined exception handler. When the processor is in the FGDM, the debug program receives commands and executes the corresponding debug functions. The commands are described in Table 3.
Command syntax | Operation description
GO | Resume user program execution
BP0 <address> | Set the first breakpoint of the Breakpoint Table
BP1 <address> | Set the second breakpoint of the Breakpoint Table
ST <In/Over> | Single-step
REG | Display register content
MEM <lowbound> <upbound> | Display memory content. The display range is from <lowbound> to <upbound>.
Table 3 The system commands of Software FGDM

We have to allocate memory space to store the contents of the register file, the breakpoint table, the debug status, and the command receiver. The memory map is shown in Fig. 2. On entering FGDM, the primary job is to preserve the register file by saving it into the registers preserve area. When the REG command is entered, the debug program displays the memory content of the registers preserve area. The breakpoint table can hold four breakpoints. We assume that a passive communication device is available, which communicates through the output buffer and input buffer fields.

[Fig. 2 memory map regions: software emulation (foreground debug mode) code, registers preserve area, breakpoint table, output buffer, input buffer. The buffers connect to the Host.]
Fig. 2 Software Emulation of memory map diagram

We use the patching code approach to implement BGDM. When the BP0 or BP1 command is entered, the debug program scans the user program to find the breakpoint location. It then saves the original instruction at that location and replaces it with the breakpoint code, which is the Undefined code in the ARM7 processor. When the user program executes the Undefined code, the processor enters the FGDM and restores the replaced instruction. When the processor exits FGDM, it resumes the main program from the restored instruction. This kind of BGDM can only set breakpoint events on instruction addresses. The patching procedure is shown in Fig. 3.

[Fig. 3 labels: Breakpoint Address = 0x458; Breakpoint Table entry 0x00000458; Search; Store; Replace; Main Program addresses 0x400 and 0x458; original instruction 0xe2433001 (SUB R3,R3,#1); Undefined Code = 0xe6000010.]
Fig. 3 The diagram of patching procedure

To implement the single-step function, we treat single-stepping as a sequence of consecutive breakpoints. To single-step from the current breakpoint, we need to know where the next instruction is. First, we determine whether the breakpoint instruction is a branch instruction and whether its condition code is satisfied. If it is a taken branch, we calculate the destination address; otherwise we simply increment the breakpoint instruction address. The resulting address is where the next breakpoint is patched.

Another case is the step-into function, which continues single-stepping inside a subroutine. As with a branch instruction, this operation needs to calculate the subroutine entry location. The algorithm is shown in Fig. 4.

[Fig. 4 flowchart steps: receive argument; find breakpoint instruction; Is it "B"?; Is it "BL"?; Want Step-Into?; take offset field; Is it negative?; extend sign bit; add in address; increase the address; patch in target instruction; exit FGDM.]
Fig. 4 The single-step algorithm diagram
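The patching and single-step procedures of this section can be illustrated with a short Python model. This is a minimal sketch under stated assumptions, not the authors' ARM assembly monitor: memory is modelled as a dictionary of 32-bit words, the class and function names (SoftBreakpoints, next_address) are hypothetical, and the condition-code check is reduced to a boolean argument. Only B and BL are treated as branches, following the decisions listed in Fig. 4; the target address uses the standard ARM encoding, in which the signed 24-bit offset is shifted left by two and added to the branch address plus 8.

```python
UNDEFINED_CODE = 0xE6000010  # breakpoint (Undefined) code shown in Fig. 3


class SoftBreakpoints:
    """Hypothetical model of the breakpoint table and instruction patching."""

    def __init__(self, memory):
        self.memory = memory  # dict: word address -> 32-bit instruction
        self.table = {}       # breakpoint table: address -> saved instruction

    def set_breakpoint(self, address):
        # Save the original instruction, then patch in the Undefined code.
        if address in self.table:
            return  # already patched; do not save the Undefined code itself
        self.table[address] = self.memory[address]
        self.memory[address] = UNDEFINED_CODE

    def restore(self, address):
        # On entering FGDM, put the replaced instruction back.
        self.memory[address] = self.table.pop(address)


def sign_extend_24(value):
    """Interpret the low 24 bits of value as a two's-complement number."""
    value &= 0xFFFFFF
    return value - (1 << 24) if value & 0x800000 else value


def next_address(address, instruction, condition_passed=True, step_into=True):
    """Return the address where the next single-step breakpoint goes.

    condition_passed is an assumed input: the caller has already compared
    the instruction's condition field with the saved status flags.
    """
    is_branch = ((instruction >> 25) & 0x7) == 0b101  # B or BL
    is_link = bool(instruction & (1 << 24))           # BL
    if not is_branch or not condition_passed:
        return address + 4                            # increase the address
    if is_link and not step_into:
        return address + 4                            # step over the call
    offset = sign_extend_24(instruction) << 2         # take and extend offset
    return (address + 8 + offset) & 0xFFFFFFFF        # add in address
```

With the values of Fig. 3, calling set_breakpoint(0x458) on a memory holding 0xE2433001 at 0x458 leaves 0xE6000010 there and keeps the original word in the table until restore(0x458) is called. Instructions that write the program counter by other means (for example data-processing or load instructions with the PC as destination) are outside this sketch.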
3.2. Hardware Emulation
In the hardware emulation (STEVE FURBER 2000), we adopt the Debug Interface and ICEBreaker of the ARM7TDMI. There are three major modules: FGDM Controller, Hardware Monitor and Debug Mode Switching Control. The Hardware Emulation architecture is shown in Fig. 5.

[Fig. 5 blocks: ARM7TM_Like core, Hardware Monitor, TAP controller, Scan Chain 0, Scan Chain 1. Signals: nopc, nrw, MAS[1:0], ntrans, A[31:0], DOUT[31:0], DIN[31:0], all other signals. JTAG pins: TCK, TMS, ntrst, TDI, TDO.]
Fig. 5 The Hardware Emulation Architecture

The FGDM Controller is a TAP controller, shown in Fig. 6. It supports standard TAP functions, such as BYPASS, INTEST and EXTEST, and several additional functions for debugging, such as BKPT and RESTART. The commands are described in Table 4. It uses the IEEE 1149.1 JTAG test mechanism to communicate with the host and to receive commands that run debug operations in FGDM.

[Fig. 6 blocks: IR register, IR decoder, IR control, TAP control, Bypass register, Breakpoint register, DR control, TDO selector. Pins: TDI, TRST, TCK, TMS, TDO. Decoded signals: INTEST, EXTEST, CHAN0_en, CHAN1_en, Bk_en, restart, bypass. TDO selector inputs: IR tdo, Bk_tdo, Bypass tdo, Scan Chain 0 tdo, Scan Chain 1 tdo. Outputs go to the scan chains, to the DBGACK control, and to the Hardware Monitor.]
Fig. 6 The FGDM controller architecture diagram

We can access the ARM7 register file and memory content by using CHAN1 of the boundary scan chain, which lies on the data input and output buses of the ARM7, to insert ARM7 Load/Store instruction codes. Breakpoints can be set in FGDM from the JTAG port. We designed an extended TAP instruction, BKPT, which selects the breakpoint register as the test data register, so that the breakpoint value shifted in next is placed in the breakpoint register. The breakpoint value is then stored into the Hardware Monitor. The breakpoint event types include Address and Data events, which makes the debug capability more flexible and powerful.
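The comparison performed by the Hardware Monitor can be sketched as a value/mask match on the address, data and control buses, using the register fields named in Fig. 7. The following Python model is illustrative only and is not the authors' Verilog: the class name WatchpointSet, the function breakpoint_signal, and in particular the mask polarity (a mask bit of 1 means the bit is compared) are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class WatchpointSet:
    """Hypothetical model of one watchpoint set register of the monitor."""
    addr_value: int = 0
    addr_mask: int = 0   # assumed polarity: 1 = compare this bit
    data_value: int = 0
    data_mask: int = 0
    ctrl_value: int = 0
    ctrl_mask: int = 0
    enable: bool = False

    def matches(self, address, data, control):
        # A disabled watchpoint never fires.
        if not self.enable:
            return False
        # Each bus matches when all compared bits equal the stored value.
        return (
            ((address ^ self.addr_value) & self.addr_mask) == 0
            and ((data ^ self.data_value) & self.data_mask) == 0
            and ((control ^ self.ctrl_value) & self.ctrl_mask) == 0
        )


def breakpoint_signal(watchpoints, address, data, control):
    """Breakpoint output: asserted when any enabled watchpoint matches."""
    return any(w.matches(address, data, control) for w in watchpoints)
```

An address-only event is obtained by setting the address mask to all ones and leaving the data and control masks at zero; a content-dependent event additionally sets the data value and mask. Under the assumed polarity, an enabled watchpoint with all masks at zero matches every bus cycle.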
Command | Description
INTEST | The selected scan chain is connected between TDI and TDO and placed in the internal test mode. The test pattern is shifted in from TDI and applied to the input pins, and the result on the output pins is shifted out to TDO. The scan chain is selected by the CHAN0 or CHAN1 command.
BKPT | Selects the Breakpoint register as the test data register connected between TDI and TDO.
RESTART | Restarts the processor to exit from FGDM.
BYPASS | Selects the BYPASS register, a 1-bit shift register, as the test data register connected between TDI and TDO. It lets all scan chains exit from any test mode, and the processor exits FGDM temporarily for system-speed instruction execution.
CHAN0 | Selects the scan chain that spans the data, address and control buses for testing.
CHAN1 | Selects the scan chain that spans only the data bus, which is a part of CHAN0.
Table 4 The FGDM Controller commands list

[Fig. 7 blocks: Watchpoint Set Register 0 and Watchpoint Set Register 1. Register fields: address value, address mask, data value, data mask, control value, control mask. Inputs: DBGEN, Address[31:0], Data[31:0], Control[7:0], ENABLE. Output: BREAKPTI.]
Fig. 7 The Hardware Monitor architecture diagram

The debug mode switching control module handles all breakpoint-generating conditions to control the timing of debug mode switches. The DBGRQ pin forces the processor to switch into FGDM. When a breakpoint occurs, the module accounts for the ARM7 pipeline and delays the halting of the processor accordingly. When the debug work in FGDM is finished, the RESTART TAP instruction switches the processor back into BGDM.

[Fig. 8 blocks: Enter_FGDM module and Exit_FGDM module. Signals: BREAKPTI, nopc, DBGREQ, bypass, bit33, eoi, flush, DCLK, restart, breakpoint, watchpoint, dbg_request, sys_speed, ExitDbg, EnterDbg, DBGACK.]
Fig. 8 The debug mode switching module diagram

3.3. Hybrid Emulation
In the Hybrid Emulation, we use the software FGDM emulation together with the Hardware Monitor of the Hardware Emulation as the BGDM hardware. Breakpoints are set with Load/Store instructions. The breakpoint signal is connected to an input pin of the ARM core to raise an exception; we use this feature to emulate the function of the Hardware Emulation. When a breakpoint is matched, it causes an external interrupt, which switches the debug mode into FGDM. The FGDM is the same software as in Section 3.1. The hybrid emulation architecture is shown in Fig. 9.

[Fig. 9 blocks: ARM7TM_Like core, Memory Mapping Logic, Hardware Monitor. Signals: Address, Write data, Read data, RW, Addr[31:0], DataOut[31:0], DataIn[31:0], BREAKPT, Abort.]
Fig. 9 The hybrid emulation architecture

4. Analysis and comparison
In this section, we synthesize the three versions and analyze the advantages and disadvantages of the three emulation approaches. Table 5 shows the experimental results of implementing these emulation approaches with a synthesizable ARM7 processor. The software part is written in ARM assembly code and is assembled and linked with the ARM SDT v2.5 development tool. The machine code is loaded into the embedded memory (SRAM) on the SoC chip. The whole development system is implemented as Verilog RTL code and synthesized with the 0.35 µm TSMC cell library. Only relevant data are shown in the table. For example, the gate count of the software part in the embedded memory is only that of the memory cells storing the software monitor, not of the normal user programs. Similarly, the gate count of the hardware part covers only the ICE, not the microprocessor core itself.

Operation Time is the total time for executing all the debug functions. The HW approach uses the JTAG port (boundary scan cells) to feed the debug instructions and to communicate with the outside world. The JTAG port operates serially. On the other hand, the SW and Hybrid approaches use the instruction bus to feed instructions that control the ICE and use the memory data bus to communicate with the outside world.
Therefore, the SW and Hybrid approaches require fewer cycles than the JTAG port, since the instruction/data buses provide larger bandwidth than the JTAG port. However, the clock frequency for JTAG port operations may be higher than that of the SW or Hybrid approaches, since the HW operations are mainly operations of the JTAG circuitry, which is much simpler, whereas the SW or Hybrid operations are mainly operations of the microprocessor core, which is more complex. Therefore, it is possible to drive the JTAG port with a faster clock. Another way to speed up JTAG port access is to use the instruction/data buses as the communication channel, as in the SW and Hybrid cases. However, this would deviate from the standard JTAG serial access method.

Software Resource refers to the system software resources required by the debug software. As described previously, the SW Emulation uses the Undefined exception to debug and patch the user programs. The Hybrid Emulation needs the Abort exception for hardware comparison. The HW emulation, on the other hand, uses a hardware circuit to switch between the normal and debug modes, and therefore does not consume any exception space, leaving the greatest flexibility for the designers of system and user software.

Breakpoint Quantity is the number of breakpoints that the ICE can support. In the SW emulation, the number of breakpoints is limited by the size of the breakpoint table allocated in the FGDM. If this allocation can be reconfigured or dynamically adjusted, the number can be adjusted according to the debug needs. On the other hand, the number of breakpoints, two in our implementations, is fixed in the HW or Hybrid Emulation once the hardware is finished. Unless the hardware is redesigned and re-fabricated, there is no way to increase the number of breakpoints.

Breakpoint Condition refers to the supported conditions under which the microprocessor can be halted at a breakpoint.
In the HW and Hybrid Emulations, the sophisticated comparison logic makes it possible to break on both data and instruction locations, on specific contents stored at the target memory locations, and even on Boolean combinations of these conditions. The SW Emulation, on the other hand, breaks only on the instruction location, because it relies on the instruction exception mechanism. It may be possible to support more breakpoint conditions through software checking in the BGDM. However, this would require many cycles to determine a possible breakpoint and would hence heavily degrade software performance.

Communication Device refers to the device through which the ICE communicates with the outside world. In the SW or Hybrid Emulation, the instruction and data buses are used for communication. The advantage is that the high bandwidth of these buses provides high-speed communication. The disadvantage is that the communication mechanisms might not be consistent from one microprocessor to another. The HW emulation, on the other hand, uses the industry-standard JTAG port. The advantage is that it is standardized. The disadvantage is that the communication speed is limited by the serial access style of the JTAG port. If standardization is not an issue, the instruction/data buses or other I/O pins can be used to improve the communication speed.

Design Complexity refers to the design effort of the ICE. The SW Emulation is a piece of software residing in memory. It can be revised and downloaded to the memory (unless the memory is a ROM). Therefore, it requires only software programming effort, not hardware design effort. The HW emulation, on the other hand, implements both its FGDM and BGDM in hardware, and thus requires significant effort in designing the ICE components, integrating the ICE with the microprocessor core, and verification. As its name suggests, the design effort of the Hybrid emulation lies between these two extremes.

Suitable Application Domains are the application domains that suit each implemented ICE approach, based on its hardware and software features. The SW emulation provides the most basic debugging support, which is good for debugging software but not for debugging hardware. In addition, its software-oriented debugging mechanism is good for functional debugging, but not for debugging timing or I/O related activities. The Hybrid emulation uses hardware as the BGDM to detect breakpoints, making debugging at system speed possible. Therefore, it is good for real-time debugging. However, since the FGDM is still in software, the accessibility of hardware internal signals and status is limited to what can be accessed by software, i.e., through the instruction set. Finally, since the HW emulation implements both FGDM and BGDM in hardware, real-time debugging of both hardware and software is achievable. This approach provides the finest observability of the hardware internal signals and status, which can be accessed through either the JTAG port or the instruction set.
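The trade-off between cycle count and clock frequency discussed under Operation Time can be made concrete with a small calculation on the operation cycles reported in Table 5 (SW 1293, Hybrid 1596, HW 4382). The sketch below is an illustration only: the function names are hypothetical, and the paper reports no clock frequencies, so any frequency passed to operation_time_us is an assumed value rather than a measurement.

```python
# Operation cycles as read from Table 5.
OPERATION_CYCLES = {"SW": 1293, "Hybrid": 1596, "HW": 4382}


def break_even_clock_ratio(serial_cycles, bus_cycles):
    """How many times faster the serial (JTAG) clock must run so that the
    serial approach finishes in the same time as the bus-based approach."""
    return serial_cycles / bus_cycles


def operation_time_us(cycles, clock_mhz):
    """Operation time in microseconds for an assumed clock frequency in MHz."""
    return cycles / clock_mhz


ratio_vs_sw = break_even_clock_ratio(OPERATION_CYCLES["HW"], OPERATION_CYCLES["SW"])
ratio_vs_hybrid = break_even_clock_ratio(OPERATION_CYCLES["HW"], OPERATION_CYCLES["Hybrid"])
print(f"JTAG clock needed vs SW:     {ratio_vs_sw:.2f}x")
print(f"JTAG clock needed vs Hybrid: {ratio_vs_hybrid:.2f}x")
```

On these cycle counts, the JTAG clock would have to run roughly 3.4 times faster than the processor clock to match the SW approach and roughly 2.7 times faster to match the Hybrid approach. Whether such a ratio is attainable depends on the timing of the synthesized design, which is not reported here.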
Item | SW | Hybrid | HW
Code size (lines/bytes) | 222/888 | 230/1040 | -
Gate count (SW part, in terms of gates of the required memory cells) | 14208 | 14720 | -
Gate count (HW part) | - | 5320 | 6115
Operation cycles | 1293 | 1596 | 4382
Software resource used | Undefined exception | Abort exception | -
Breakpoint quantity | Flexible, as required by the user | 2, fixed | 2, fixed
Breakpoint condition | Only instruction address | Instruction/data access, content dependent | Instruction/data access, content dependent
Communication device | Parallel or serial port | Parallel or serial port | JTAG
Design complexity | Simplest | Simpler | Complex
Suitable application domain | Functional debugging | Real-time debugging | Real-time low-level HW/SW debugging
Table 5 The experimental result of emulation approaches

5. Conclusion
We have motivated the need to investigate the possible in-circuit emulation (ICE) approaches for embedded microprocessors in system-on-chip (SoC) designs, because the adopted ICE approach may heavily affect the performance, cost and debugging features of the SoC chip. We then characterized the possible approaches into three categories, i.e., software emulation, hardware emulation and hybrid emulation, and matched each category with available commercial solutions. We further defined and implemented one feasible solution for each category and embedded it with a synthesizable ARM7 microprocessor core. The microprocessors with the embedded ICEs were synthesized, simulated and compared. The comparisons show that each solution has its own merits in terms of hardware/software performance and cost, verification effort, debug features and flexibility. SoC designers can choose an appropriate embedded ICE solution based upon their specific hardware and software needs.

6. Reference
STEVE FURBER (2000): ARM system-on-chip architecture, second edition. Addison Wesley.
ING-JER HUANG AND TAI-AN LU (1999): ICEBERG: An embedded in-circuit emulator synthesizer for microcontrollers. Proc. of the 36th Design Automation Conference, New Orleans, USA.
JOSEPH D. G. and WILLIAM C. W. (1988): Using microprocessors and microcomputers: the Motorola family, second edition. Prentice Hall.
INTEL Ltd. (1994): Pentium processor family user's manual, volume 3: architecture and programming manual. Intel Corporation.
TOM WILLIAMS (2000): On-chip debug support gets to the heart of code. Embedded Systems Development. http://www.esdonline.com/2000/jan0500/18/jan2000-18.htm.
Apply Microsystems: Basics of Embedded Debugging - Essential Concepts of Embedded Systems Development. http://www.amc.com/products/sware_hware_enhanced_debug_tools/debuggers/app_notes/basics.html
Advanced Micro Devices Inc. (1999): Élan SC520 Microcontroller User's Manual. http://www.amd.com/products/epd/processors/4.32bitcont/14.lan5xxfam/24.lansc520/22004/22004b.pdf
P. C. CHING, Y. H. CHENG and M. H. KO (1994): An in-circuit emulator for TMS320C25. IEEE Transactions on Education, Feb. 51-56.