Reduce RTOS latency in interrupt-intensive apps

Nick Lethaby, Texas Instruments - June 05, 2009

In hard real-time applications such as motor control, failure to respond in a timely manner to critical interrupts may result in equipment damage or failure. As a result, developers of such applications have tended to shy away from use of third-party real-time operating systems (RTOS). However, as communications peripherals such as Ethernet and USB become more pervasive in control applications, software complexity is pushing developers toward greater RTOS use.

In this article, we will examine the potential for an RTOS to introduce greater interrupt overhead when used inappropriately and then describe a practical design technique to bring the benefits of a multithreading operating system to control applications with high interrupt rates. We will begin by providing a detailed overview of interrupt management inside an RTOS and compare it with how an interrupt might be handled outside an RTOS, including a comparison of overhead for use cases at different interrupt rates. We will conclude with an illustration, including example code, of a hybrid interrupt-handling approach that greatly reduces interrupt overhead.

Understanding interrupt processing in an RTOS

When using an RTOS, the typical approach for responding to an interrupt involves the RTOS interrupt dispatcher invoking a user-defined interrupt service routine (ISR), which does a minimal amount of work before deferring most processing to another thread such as a task. The RTOS will usually enable a developer to write the ISR in C since it automatically handles low-level operations, such as context save and restore operations, further simplifying ISR development. In applications where interrupt rates are relatively low, there is little reason to consider deviating from this methodology.

While RTOS vendors highly optimize their interrupt handling code, a commercial RTOS must address a broad variety of applications.
As a result, a number of the operations performed by an RTOS may not be needed for a specific interrupt or application. In applications where interrupts are more frequent, developers need to consider whether the impact of RTOS overhead, such as context switches, may cause the application to miss real-time deadlines.

Although commercial RTOS vendors provide detailed performance benchmarks that knowledgeable developers can use to determine if real-time deadlines can be met, it's important to fully understand the operations performed by an RTOS when handling an interrupt. It is equally important to understand the different options provided by an RTOS and what design trade-offs have been made, since these can affect interrupt processing times significantly. By and large there are four main areas in which an RTOS may introduce latency or overhead into interrupt processing times:

Operating system (OS) interrupt latency: An RTOS must sometimes disable interrupts while accessing critical OS data structures. The maximum time that an RTOS disables interrupts is referred to as the OS interrupt latency. Although this overhead will not be incurred on most interrupts since the RTOS disables interrupts relatively infrequently, developers must always factor in this interrupt latency to understand the worst-case scenario. It should be noted that some RTOS implementations may allow some interrupts to always be enabled, thus avoiding any OS-induced latency for those interrupts. However, these interrupts are not able to interact with the operating system directly, since they would then potentially cause corruption in the RTOS's critical regions. In some applications, there may also be application code, or even some ISRs, that disable interrupts for long periods. If the period for which user code disables interrupts is longer than the OS interrupt latency, then developers do not need to concern themselves with the impact of the RTOS in this respect.

Low-level interrupt-related operations: When an interrupt occurs, the context must be initially saved and then later restored after the interrupt processing has been completed. The amount of context that needs to be saved depends on how many registers would potentially be modified by the ISR. If the RTOS interrupt dispatcher is designed to enable the developer to call an arbitrary C function or the full range of RTOS system calls from within the ISR, it must save and restore the entire scratch register context required by arbitrary C programs. If an ISR only needs to use a few registers, saving the complete context introduces overhead. Support for nested interrupts can be another source of potential overhead in an RTOS interrupt dispatcher.
By default most microprocessors disable (in hardware) all interrupts when an interrupt is asserted. If an RTOS wants to enable nested interrupts, it must update the interrupt mask and then re-enable interrupts prior to calling the ISR. Obviously, if an application does not require interrupt nesting, these operations represent unneeded overhead in the interrupt handling process. An RTOS may offer configuration options on whether to support nesting and restrictions on which system calls can be made from ISRs that significantly reduce these potential overheads.

Enabling the ISR to interact with the RTOS: An ISR will typically interact with an RTOS by making a system call such as a semaphore post. To ensure the ISR function can complete and exit before any context switch to a task is made, the RTOS interrupt dispatcher must disable preemption before calling the ISR function. Once the ISR function completes, preemption is re-enabled and the application will context switch to the highest priority thread that is ready to run. If there is no need for an ISR to make an RTOS system call, the disable/enable kernel preemption operations would again add overhead. It is logical to handle such an ISR outside of the RTOS.

Context switching: When an ISR defers processing to an RTOS task or other thread, a context switch needs to occur for the task to run. It is also important to note that to complete the interrupt processing, another context switch must occur out of the task that completed the processing. The amount of time required by the context switch will depend on the type of thread to which the processing is deferred. It is not uncommon for an RTOS to offer additional low-overhead thread types that may be used for interrupt processing as an alternative to tasks. Examples of such low-overhead thread types are DSP/BIOS software interrupts and OSE interrupt processes. A developer may therefore lower this overhead by selecting an optimal thread type.
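
As a rough way to gauge how much these per-interrupt costs matter at a given interrupt rate, one can compare the overhead cycles per interrupt with the cycles available between interrupts. The following is a minimal sketch of that arithmetic. The function names and, in particular, the two cycle counts are illustrative assumptions, not measured DSP/BIOS figures; real values should come from the RTOS vendor's benchmarks.

```c
#include <assert.h>

/* HYPOTHETICAL placeholder costs, in CPU cycles. These are NOT DSP/BIOS
 * benchmark numbers; substitute measured values for a real estimate. */
#define DISPATCH_CYCLES    100UL  /* assumed dispatcher cost per interrupt    */
#define CTX_SWITCH_CYCLES  400UL  /* assumed cost of switching into a task
                                     and back out again                       */

/* Overhead cycles per interrupt: the dispatcher cost, plus the context
 * switches if the ISR defers its processing to a task. */
unsigned long overhead_cycles(int defers_to_task)
{
    return DISPATCH_CYCLES + (defers_to_task ? CTX_SWITCH_CYCLES : 0UL);
}

/* CPU load caused by interrupt overhead, in tenths of a percent.
 * cpu_mhz is the clock rate (cycles per microsecond) and period_us is the
 * interval between interrupts, so their product is the cycle budget
 * available between two consecutive interrupts. */
unsigned long load_permille(unsigned long cpu_mhz,
                            unsigned long period_us,
                            unsigned long cycles_per_irq)
{
    assert(cpu_mhz > 0 && period_us > 0);
    unsigned long budget = cpu_mhz * period_us;
    return (cycles_per_irq * 1000UL) / budget;
}
```

With these placeholder figures, `load_permille(150, 30, overhead_cycles(1))` evaluates to 111, that is, roughly 11% of a 150 MHz CPU at a 30 microsecond interrupt period, while the same call with `overhead_cycles(0)` gives 22. The absolute numbers are invented; the point is only that the load scales inversely with the interrupt period.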
Nevertheless, context switching will still typically be the largest part of any RTOS-related interrupt processing overhead.

As discussed, when calculating what overhead or latency is being introduced by the RTOS, it's important to consider the specific needs of the application. If an application requires nested interrupt handling and has complex algorithms that disable interrupts for hundreds of microseconds, then some of the "overheads" I've described here will not be applicable. In addition, programming convenience must also be considered. If the system has a relatively low interrupt rate and sufficient processor headroom, it makes sense to handle all the interrupts through the RTOS. However, in systems with high interrupt rates, even small overheads can rapidly compound to consume a significant amount of CPU resources.

Figure 1 shows the CPU resources consumed by different interrupt handling approaches using the DSP/BIOS operating system running on a 32-bit C28x MCU. In these benchmarks, the standard DSP/BIOS interrupt dispatcher was used and no advantage was taken of the alternative mechanisms it provides for reducing interrupt overhead.

Figure 1 illustrates a couple of important points about RTOS interrupt handling overhead. First, the additional cycles consumed by an RTOS interrupt dispatcher are in the "noise" except when interrupts start occurring at intervals shorter than 30 microseconds (on a 150 MHz part). Second, the real overhead is incurred in the context switches. As a result, an RTOS that offers alternative thread types useful for handling interrupts with lower context switch times can certainly provide an advantage. Figure 1 also shows the necessity of avoiding context switches when handling interrupts that occur at high frequency, since the CPU may be "thrashed" to the point where it is doing nothing but context switching.

High-frequency interrupts

Developers implementing applications that combine one or more very high-frequency interrupts and a number of other interrupt-driven background functions may be tempted to avoid using an RTOS because of concerns about the interrupt overhead.
However, this approach is akin to "throwing the baby out with the bathwater." A better approach is to take advantage of an RTOS to implement the majority of the system function, but to handle the high-frequency interrupts outside of the RTOS, enabling the specific ISRs to be highly optimized.

The drawback of this approach is that a "non-RTOS" ISR cannot interact with the threads managed by the RTOS. If the ISR functionality is truly standalone, this is not an issue. However, in many applications it might be necessary for the high-frequency ISR to pass some data back into an RTOS thread for background processing. An example of this might be a console application that displays some statistics on motor control performance. The high-frequency motor control interrupt must be handled outside of the RTOS to minimize overhead but also needs to periodically pass some data into the part of the application running on the OS.

When faced with such a requirement, the best approach for a developer is to employ a two-level interrupt technique that enables some buffering of data from the high-frequency ISR. After a certain number of data samples have been collected, the high-frequency ISR will then trigger another interrupt that is handled by the RTOS interrupt dispatcher. This second ISR can then pass the data to a background thread. Most processors offer a way to trigger an interrupt from software. On a C28x device, for example, this can be done by writing to the Interrupt Flag Register (IFR).

Looking back to Figure 1, note that the context switch overhead begins to rise dramatically once interrupts occur at intervals shorter than about 40 microseconds. Therefore even buffering up as few as four data samples will greatly decrease overhead, although buffering up eight samples would yield further significant savings.

When adopting such an approach, some thought should be given to how the data is passed up from the non-RTOS ISR. The simplest approach is to have the second ISR copy it into a new structure that is then passed up to the thread.
While data copies may seem wasteful, they are likely much more efficient than sharing the same data structure between the thread and the non-RTOS ISR. Sharing would require interrupts to be disabled around accesses to the data structure to prevent data corruption, as well as more complex program logic to ensure that the thread knows it is reading valid data.

The skeleton code, shown in Listing 1, demonstrates how a high-frequency interrupt can trigger a secondary interrupt to defer some processing to a task running on the DSP/BIOS kernel. In this demonstration, the program handles a high-frequency timer interrupt (interrupt 13) in myisr. After every HWIPRD invocations of myisr, however, myisr triggers another interrupt that is handled by myhwi.
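
The two-level pattern just described can be sketched in portable C as follows. This is a host-side simulation under stated assumptions, not the article's Listing 1: the names myisr, myhwi, HWIPRD, count and snapshot come from the text, while the value of HWIPRD, the buffers, and the helpers trigger_secondary_interrupt, ticks_since_snapshot and simulate are hypothetical names introduced for illustration. On a real C28x target the trigger would be a write to the PIE group 11 IFR bit and myhwi would run under the RTOS dispatcher.

```c
#include <assert.h>
#include <string.h>

#define HWIPRD 4u  /* samples buffered per secondary interrupt (value assumed) */

static volatile unsigned count;     /* number of myisr invocations             */
static volatile unsigned snapshot;  /* count when the 2nd interrupt was raised */
static volatile int pending;        /* host stand-in for the INT11.1 flag bit  */
static int isr_buf[HWIPRD];         /* filled by the non-RTOS ISR              */
static int task_buf[HWIPRD];        /* private copy handed to the RTOS thread  */
static unsigned hwi_runs;           /* how many times myhwi has run            */

/* Hypothetical helper: on target this would set the interrupt's flag bit. */
static void trigger_secondary_interrupt(void) { pending = 1; }

/* High-frequency ISR handled outside the RTOS: it makes no RTOS calls. */
void myisr(int sample)
{
    isr_buf[count % HWIPRD] = sample;
    count++;
    if (count % HWIPRD == 0u) {       /* buffer full: defer to the RTOS */
        snapshot = count;
        trigger_secondary_interrupt();
    }
}

/* Second-level ISR. It copies the batch so the thread never shares a
 * buffer with myisr. On DSP/BIOS, a call such as SEM_post() could then
 * wake the task; that call is omitted in this host sketch. */
void myhwi(void)
{
    assert(sizeof task_buf == sizeof isr_buf);
    memcpy(task_buf, isr_buf, sizeof task_buf);
    hwi_runs++;
}

/* What the task would compute: timer interrupts since the snapshot.
 * On target, interrupts must be disabled around this read so that
 * myisr cannot change count in between. */
unsigned ticks_since_snapshot(void) { return count - snapshot; }

/* Host-only driver: delivers `ticks` timer interrupts, services any
 * pending secondary interrupt, and returns how often myhwi ran. */
unsigned simulate(unsigned ticks)
{
    unsigned before = hwi_runs;
    for (unsigned i = 0; i < ticks; i++) {
        myisr((int)i);
        if (pending) {
            pending = 0;
            myhwi();
        }
    }
    return hwi_runs - before;
}
```

In this sketch, eight simulated timer interrupts result in only two runs of myhwi, so any context switch into the task would occur once per HWIPRD samples rather than once per interrupt.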
The PIE table (part of the interrupt management mechanism on TMS320C28x devices) entry corresponding to INT11.1 (PIE group 11, interrupt 1) has been plugged with myhwi using the HWI_dispatchPlug DSP/BIOS API call (this specific piece of code is not shown here). Using the HWI_dispatchPlug call causes myhwi to be invoked by the dispatcher whenever interrupt INT11.1 occurs. myisr can then trigger interrupt INT11.1 by manually flagging the appropriate bit in PIE group 11's IFR. The program therefore raises INT11.1 once every HWIPRD invocations of myisr.

The value count keeps track of the number of times myisr has been invoked. The value snapshot is equal to the value of count at the point at which the hardware interrupt handled by myhwi is triggered. If snapshot does not equal count when mytsk executes the statements that output the log text, one or more timer interrupts have occurred during the course of the interrupt routine involving myhwi. In fact, the number of timer interrupts that have occurred during this period is equal to the difference between snapshot and count. mytsk must disable interrupts during the calculation of the difference to prevent myisr from modifying the value of count; otherwise the code would not be thread safe.

Overhead control

These different illustrations of how an RTOS can add latency and overhead into interrupt processing times have shown that such overheads break down largely into context switch time and a lesser component directly associated with handling the ISR within the RTOS. The first priority for developers of applications with a high-frequency interrupt must be to avoid always deferring processing from such an ISR to an RTOS thread. Otherwise, the context switch overhead can become prohibitive. Further savings may be obtained by handling the high-frequency interrupt outside of the RTOS and having its ISR periodically trigger an RTOS ISR to defer some processing to an RTOS thread.

Nick Lethaby is the operating system product manager at Texas Instruments, where he is responsible for product requirements definition for the DSP/BIOS real-time operating system and multimedia SDKs. Lethaby has over 20 years of applications engineering, product management, and marketing experience in embedded systems development tools. He graduated from the University of London with a bachelor of science in computer science. He may be reached at nlethaby@ti.com.