Modular Real-Time Linux

Shinpei Kato
Department of Information and Computer Science, Keio University
3-14-1 Hiyoshi, Kohoku, Yokohama, Japan
shinpei@ny.ics.keio.ac.jp

Nobuyuki Yamasaki
Department of Information and Computer Science, Keio University
3-14-1 Hiyoshi, Kohoku, Yokohama, Japan
yamasaki@ny.ics.keio.ac.jp

Abstract

In this paper, we develop a kernel module that boosts the real-time capability of the Linux kernel in high-load situations. While traditional Real-Time Linux requires major modifications to the kernel source code, the developed kernel module requires only a minor one (the Linux kernel 2.6.25 needs just one line changed), and thus it offers high scalability. The module overwrites part of the Linux scheduler so that the pick_next_task member of the rt_sched_class structure, which is a pointer to the function that chooses the next scheduled task, refers to our own function implemented in the module. That function implements a scheduling algorithm which traces the remaining execution time of each task and dynamically assigns the highest priority to a job that would miss its deadline under its current priority. Using this framework, other scheduling algorithms can also be easily installed into the Linux kernel to improve real-time capability.

1 Introduction

With the dramatic development of embedded systems technology in recent years, the underlying operating systems play more significant roles than ever before. In embedded systems that exploit multimedia computation or many I/O devices, it is essential to make use of software libraries and device drivers. Unfortunately, traditional dedicated lightweight operating systems are no longer capable of providing sufficient functions. Thus, embedded systems have raised expectations for the Linux kernel, one of the most versatile open-source operating systems.
The Linux kernel enables us to make use of its huge software resources, such as libraries and drivers. In addition, many programmers are now used to Linux programming, which results in little educational cost for system development. The Linux kernel thus offers great benefits for embedded systems. An issue of concern here is that the Linux kernel was originally developed for general-purpose systems, and hence some parts of its design are not suitable for embedded systems. For instance, the Linux kernel does not offer the real-time capability needed by most embedded systems. Therefore, many companies and institutions have developed Real-Time Linux. In traditional Real-Time Linux, real-time capability is attached by modifying the kernel source code, such as the scheduler and timer functions. This approach is reasonable in that performance is optimized in accordance with the intended use. However, with respect to future scalability, it has problems.

(Footnote: This work was supported in part by the Grant-in-Aid for the Global Center of Excellence Program "Center for Education and Research of Symbiotic, Safe and Secure System Design" from the Ministry of Education, Culture, Sports, Science and Technology in Japan. This work was also supported in part by the fund of Core Research for Evolutional Science and Technology, Japan Science and Technology Agency.)

The Linux kernel is open-source software that is modified and improved day by day. It
is not uncommon for the kernel design to change widely even in a minor version upgrade. For example, the version upgrade from 2.6.22 to 2.6.23 drastically modified the scheduler to implement the Completely Fair Scheduler (CFS). The same is true of the version upgrade from 2.4 to 2.6, which implemented the O(1) scheduler. In order to make efficient use of the latest Linux kernel, we need to design Real-Time Linux with sufficient scalability. In this paper, we design a Modular Real-Time Linux called Linux T-ReX (Linux The Real-time extension), in which real-time capability is attached by kernel modules without major modification to the kernel. With respect to the Linux kernel 2.6.25, the designed framework requires only one line of the kernel to be modified. As a first element of the kernel modules, we develop a real-time scheduling module that boosts the real-time capability of the Linux kernel in high-load situations.

2 Linux T-ReX

This section describes the framework of Linux T-ReX. Linux T-ReX is composed of the Linux kernel with a minimum modification, called the Linux+ kernel in this paper, and the kernel modules for real-time computing. It can be downloaded from our website http://www.ny.ics.keio.ac.jp/ shinpei/t-rex/.

FIGURE 1: Framework of Linux T-ReX (the Linux kernel with a minimum modification, module interfaces, and modules for the scheduler, aperiodic server, DVFS, resource reservation, locking protocol, and multicore support).

Figure 1 illustrates the framework of Linux T-ReX. In fact, most real-time computing techniques depend on real-time scheduling. For instance, there are different aperiodic server algorithms, which improve the responsiveness to aperiodic tasks, for fixed-priority scheduling and for dynamic-priority scheduling. We therefore let only the scheduler module connect directly to the kernel; the other modules are connected via the scheduler module.
Such an approach greatly simplifies the framework, since we only need to modify the kernel so that it connects to the scheduler module. With the Linux kernel 2.6.25, this can be done by modifying only one line of the kernel. Linux T-ReX targets soft real-time systems rather than hard real-time systems, since the native Linux kernel was originally developed for non-real-time systems, and some of its implementations, such as I/O interrupt handling, make it quite difficult to apply to hard real-time systems. Therefore, Linux T-ReX does its best to efficiently schedule a task set so as to meet as many deadlines as possible even in high-load situations. At the same time, it also aims to (i) minimize the response time of non-real-time tasks and aperiodic tasks, (ii) reduce the power consumption as much as possible, and (iii) maintain the quality of service as much as possible. To achieve this, we currently plan to implement the following modules:

- Scheduler
- Aperiodic server
- Dynamic voltage scaling
- Resource reservation
- Locking protocol
- Multicore support

3 Scheduler Module

This section presents the Resch (Real-time scheduler) module, which is the core of Linux T-ReX. Task scheduling in the Linux kernel after version 2.6.23 falls into three classes: rt_sched_class, fair_sched_class, and idle_sched_class. Due to this classification, the scheduling of real-time tasks is now independent from the scheduling of other tasks. Since this classified scheduling is well suited to building a modular scheduler, we adopt version 2.6.25 in this paper.

3.1 Kernel Modification

In the Linux kernel, the scheduling decision is always made in the schedule() function, which proceeds as follows.

1. Disable preemption and lock the run queue.
2. Check the status of the current task (prev).
3. Select the next running task (next).
4. Switch prev to next.
5. Enable preemption and unlock the run queue.
A scheduling algorithm is what decides the next running task, so it is related only to the third step. The third step is implemented as the pick_next_task() function. The following shows its pseudo code.
    pick_next_task() {
        class = rt_sched_class;
        for (;;) {
            p = class->pick_next_task();
            if (p)
                return p;
            class = class->next;
        }
    }

The pick_next_task() function first executes the pick_next_task() function of rt_sched_class, the real-time scheduling class. If there are no ready real-time tasks, the return value p is NULL, and the function then executes the pick_next_task() function of the next scheduling class. The three scheduling classes are linked by the next member in the order rt_sched_class, fair_sched_class, idle_sched_class, and rt_sched_class is always consulted in priority to the other classes. Here, we notice that the pick_next_task member of each class is a function pointer, which in fact refers to the pick_next_task_rt() function. So, if we can change the reference of this pointer, we can install our own pick_next_task() function into the Linux kernel. The rt_sched_class structure is declared in the kernel as follows.

    const struct sched_class rt_sched_class;

Hence, we modify the declaration as follows, dropping the const qualifier and exporting the symbol from the kernel, so that a kernel module can overwrite the pick_next_task pointer.

    struct sched_class rt_sched_class;
    EXPORT_SYMBOL(rt_sched_class);

This is the only modification we make in the Linux+ kernel developed in this work. If this modification is applied to the native Linux kernel in a later version, our scheduler module can be installed without any kernel modification.

3.2 Module Interface

FIGURE 2: Installation of the Resch module (the pick_next_task pointer of rt_sched_class refers to pick_next_task_rt before the install and to pick_next_task_resch after the install).

Figure 2 illustrates the installation of the Resch module.
Since the pick_next_task function pointer of the rt_sched_class structure can now be overwritten from kernel space thanks to the above kernel modification, we can embed our own scheduling algorithm: on installing the module, the function pointer is changed to refer to the pick_next_task_resch() function implemented within the Resch module instead of the original pick_next_task_rt() function. The scheduling algorithm of the pick_next_task_resch() function is presented in Section 3.3. In order to make use of the Resch module, we mainly manipulate the following four API functions.

    int resch_init(long priority);
    int resch_exit(void);
    int resch_run(long period, long timeout);
    int resch_yield(void);

The resch_init() function inserts the current task, i.e., the one executing the calling application program, into the run queue managed by the Resch module. The task priority is set to the argument. When a task finishes its real-time execution under the Resch module, it calls the resch_exit() function. The resch_run() function starts periodic real-time execution with the specified period once the specified timeout expires. The resch_yield() function yields the CPU to another task; periodic tasks must call this function at the end of every period. Notice that the Resch module runs in kernel space while user tasks run in user space, so a user task cannot directly call the functions implemented within the Resch module. In order to enable a user task to call the module functions, we prepare a virtual character device (/dev/resch) and implement the Resch module as its device driver. When the Resch module is initialized, it registers the resch_write() function to be executed when the write() system call is issued to /dev/resch. Thus, the above API functions manipulate the Resch module through the write() system call. For instance, assume the resch_init() function is called as follows.

    resch_init(99);

This call deploys the following code.
    ((long *)data)[0] = ID_INIT;
    ((long *)data)[1] = 99;
    fd = open("/dev/resch", O_RDWR);
    write(fd, data, sizeof(data));

Here, ID_INIT is a constant that indicates the ID of the resch_init() function. The Resch module then parses ID_INIT and carries out the corresponding processing.
In the Resch module, we need to associate data such as the deadline and execution time with each task for real-time scheduling. These data are defined by the rt_data structure, whose main members are as follows.

    struct rt_data {
        long period;
        long deadline;
        long wcet;
        long remaining_time;
    };

The wcet member indicates the worst-case execution time. The Resch module tracks the execution time of a task: when the resch_yield() function is called, wcet is updated if the execution time of the task in the current period is greater than any observed before. The remaining_time member indicates the remaining execution time of the task with respect to wcet. Note that we do not make any modification to the task control block in the kernel (task_struct). We associate the rt_data structure with the task_struct structure through the time_slice member of the rt scheduling entity member of task_struct. In fact, the time_slice member is never used under the SCHED_FIFO real-time scheduling policy, so we use it to store the pointer to the rt_data structure as follows.

    struct rt_data *rt = kmalloc(sizeof(*rt), GFP_KERNEL);
    current->rt.time_slice = (int)rt;

Finally, by executing the sched_setscheduler() function of the Linux kernel with the specified priority and the SCHED_FIFO policy as arguments, the task is scheduled according to the scheduling algorithm of the Resch module, which is described in the next section.

3.3 Scheduling Algorithm

The native Linux kernel offers two scheduling policies for real-time tasks: SCHED_FIFO and SCHED_RR. The difference between them is that SCHED_FIFO never preempts the current task unless higher priority tasks are ready, while SCHED_RR switches the current task when its time slice expires if there are ready tasks with the same priority. In essence, they are the same algorithm in terms of fixed-priority scheduling.
Under either SCHED_FIFO or SCHED_RR, context switches occur only in the following two cases:

- when higher priority tasks are released;
- when the current task stops (completion or sleep).

If the scheduling points of a scheduling algorithm are limited to these two cases, all we have to do to add the algorithm to the Linux kernel is to provide a new pick_next_task_resch() function. In this paper, we design a scheduling algorithm that maintains real-time capability even in high-load situations, and we implement it in the Resch module.

A fixed-priority algorithm has many advantages, such as simplicity, predictability, and small jitter. However, a main disadvantage of fixed-priority scheduling is that a deadline may be missed even though the CPU utilization is not very high. Rate Monotonic (RM) [1], an optimal fixed-priority scheduling algorithm, may miss deadlines if the CPU utilization is higher than 69% in the worst case. Meanwhile, Earliest Deadline First (EDF) [1], an optimal dynamic-priority scheduling algorithm, always meets deadlines if the CPU utilization does not exceed 100%. However, in EDF scheduling the priorities of tasks change dynamically, so its implementation is not well suited to the O(1) scheduler of the Linux kernel. Besides, it is widely known that in EDF scheduling one deadline miss can cause subsequent deadline misses if the CPU is overloaded, which is often called the Domino Effect. Therefore, we design a scheduling algorithm based on the RM algorithm that offers high CPU utilization while retaining the advantages of fixed-priority scheduling: simplicity, predictability, and small jitter.

FIGURE 3: A deadline miss in RM scheduling (three tasks T1, T2, and T3).

First, we consider RM scheduling. Figure 3 shows a deadline miss in RM scheduling when three tasks T1, T2, and T3 are scheduled.
The low priority task T3 is blocked by the higher priority tasks T1 and T2, and thereby misses a deadline. Here, we realize that the deadline miss could be avoided if the gray execution portion of T3 were scheduled in priority to the third job of T1 in the figure. In order to achieve this scheduling, we introduce the concept of Critical Laxity [2] into RM scheduling. Critical Laxity is a derivative of Zero Laxity [3, 4]. The laxity of task T_i at time t is denoted by x_i(t) and computed as follows, where d_i is the deadline of T_i and e_i is the remaining execution time of T_i at time t:

    x_i(t) = d_i - (t + e_i)
Let t_s be any scheduling point of fixed-priority scheduling, that is, a time instant at which some job is released or completes. The laxity of T_i is said to be Critical Laxity if the following condition holds at time t_s, where e_hp denotes the remaining execution time of the highest priority task:

    x_i(t_s) < e_hp

FIGURE 4: Critical laxity (task T_i reaches critical laxity at t_s; its laxity reaches zero before the highest priority task T_hp completes).

In RM scheduling, a task reaching Critical Laxity will surely miss its deadline unless it is assigned the highest priority on the spot. Figure 4 depicts an example. Assume that task T_i reaches Critical Laxity at time t_s. Due to x_i(t_s) < e_hp, the laxity of T_i will reach zero before the highest priority task T_hp completes, and thus T_i will never meet its deadline. In other words, if the highest priority is dynamically assigned to a task with Critical Laxity, the deadline miss can be avoided. Such a scheduling algorithm is defined as Rate Monotonic Critical Laxity (RMCL) in this paper.

FIGURE 5: RMCL scheduling (the same three tasks as in Figure 3; T3 is promoted on reaching critical laxity).

Figure 5 shows the RMCL scheduling of the three tasks. Although T3 misses a deadline in RM scheduling as shown in Figure 3, it meets its deadline in RMCL scheduling. In fact, RMCL scheduling is at least as effective as RM scheduling: any task set that is successfully scheduled by RM is also successfully scheduled by RMCL. Besides, we can see that no additional preemptions occur in the figure; the number of scheduler invocations in RMCL scheduling is the same as in RM scheduling. Since RMCL scheduling behaves as RM scheduling until some task reaches Critical Laxity, it retains all the advantages of RM scheduling, such as simplicity, predictability, and small jitter. In consequence, we believe that RMCL scheduling is very effective for the Linux kernel.

Finally, we show an implementation of the RMCL algorithm in the Resch module as pseudo code.

    /* t is the current time. */
    pick_next_task_resch() {
        T_hp = pick_next_task_rt();
        for (each task T_i) {
            if (d_i - (t + e_i) < e_hp && d_hp - (t + e_hp) >= e_i)
                return T_i;
        }
        return T_hp;
    }

First, the module calls the pick_next_task_rt() function originally implemented in the kernel to get the highest priority task T_hp. Next, it verifies whether there exists a task that has Critical Laxity. If a task T_i has Critical Laxity, the module assigns the highest priority to T_i, unless this assignment would cause T_hp to miss its own deadline (the second condition in the if statement). Otherwise, T_hp remains the highest priority task.

4 Conclusion

In this paper, we designed and implemented a modular real-time Linux called Linux T-ReX. Linux T-ReX requires only a one-line modification to the original Linux kernel, and its real-time capability is improved by kernel modules. As a starting point, we developed the Resch module for efficient real-time scheduling in Linux. The RMCL algorithm implemented in the Resch module maintains real-time capability even in high-load situations, compared to the traditional RM algorithm. In the future, we plan to develop other kernel modules to support more sophisticated real-time computing.

5 References

[1] C. L. Liu and J. W. Layland. Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment. Journal of the ACM, 20(1):46-61, 1973.

[2] S. Kato and N. Yamasaki. Global EDF-based Scheduling with Efficient Priority Promotion. In Proceedings of the IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2008.

[3] S. Cho, S. K. Lee, A. Han, and K.-J. Lin. Efficient Real-Time Scheduling Algorithms for Multiprocessor Systems. IEICE Transactions on Communications, E85-B(12):2859-2867, 2002.

[4] M. Cirinei and T. P. Baker. EDZL Scheduling Analysis. In Proceedings of the Euromicro Conference on Real-Time Systems, pages 9-18, 2007.