EE8205: Embedded Computer System
Electrical and Computer Engineering, Ryerson University

Multitasking ARM-Applications with µVision and RTX

1. Objectives

The purpose of this lab is to introduce students to µVision and the ARM Cortex-M3's various RTX-based Real-Time Operating System (RTOS) capabilities. Specifically, students will learn how to schedule round-robin, priority pre-emptive, and non-pre-emptive applications using µVision and its supporting libraries.

2. Working with µVision and RTX

2.1 Setting up an RTX Project

Launch the µVision application. Create a new project "RTX_Demo" in your "S:\" folder. Select the LPC1768 chip. Copy the files provided to you in the course directory "U:\ee8205\project\labs\rtx" to your project directory. Configure your project workspace to resemble that of Fig. 1. Do not use the LED, LCD, etc. files provided for this section.

To configure the "RTX_Library" folder in the project tree, navigate to the directory C:\Keil\ARM\CMSIS\uVision\LIB\ARM\ and find the file RTX_CM3.lib. When searching for the .lib file, ensure that the field "Files of type:" is set to "All files (*.*)" so that files of type .lib are visible.

To manage the structure of your project workspace (add, delete, and move the Targets, Groups, and associated files), use the project manager. To access it, click on Project >> Manage >> Components, Environment, Books. Manipulate the Project components until they exactly match Fig. 1.

Fig. 1: Project Workspace for Demo

Once the workspace is set up, follow these steps to set up the RTX kernel:

1. To specify the RTX target options in µVision, either click the Options for Target toolbar icon or select Project >> Options for Target "LPC1700".
2. Click on the Target tab. In the Operating System dropdown box, select RTX Kernel.
3. In the Code Generation box, check Use MicroLIB. This helps the compiler generate smaller code, which is ideal for embedded applications. All other configuration options should be left at their default values.
4. Click OK to close the window. Select File >> Save All.
RTX must now be configured with specifications such as the frequency of the CPU's SysTick timer used for scheduling and the arbitration techniques desired for our multi-threaded applications.

2.2 Configuring RTX

1. Open the file RTX_Conf_CM.c either by selecting File >> Open or by double-clicking the file in the "Configuration" sub-tree in the Project workspace.
2. Once the file has opened, select the Configuration Wizard tab found at the bottom of the coding window. Click Expand All to see all the options that the RTX kernel provides.
3. There is a heading in the option tree entitled RTX Kernel Timer Tick Configuration. This option lets the user employ the internal Cortex-M timer as the RTX kernel's timer. For scheduling purposes, selecting this option eases the programmer's responsibilities (as opposed to directly setting up timer interrupts in the code). Therefore ensure that the option Use Cortex-M SysTick timer as RTX Kernel Timer is selected.
4. Under this option, there should also be a subheading entitled Timer clock value [Hz]. Ensure that this option is set to 10000000 (10 MHz), and that the subsequent Timer tick value [us] option is set to 10000 (10ms).

What exactly do these numbers mean? We set the frequency to 10 MHz. Therefore a timer clock cycle occurs every 0.1 us. Then we set the timer's tick value to 10000. Therefore the "RTX clock" ticks every 10000 * 0.1 us = 1000 us = 1 ms.

Why do we need an RTX clock? Real-time systems are not usually designed for applications that deal with the nanosecond time scales typical of conventional processors. They revolve around more practical applications and human-like response times. Systems designed for such an environment will more than likely require a coarser unit of time, such as milliseconds.
By adding an RTX clock, the application can easily be coded for a variety of possible scheduling algorithms, without the need to explicitly poll for timer flags or use interrupts to schedule the tasks. RTX does this for us.

Fig. 2: RTX_Conf_CM.c Configuration Wizard

5. Round-Robin Scheduling: The RTX_Conf_CM.c file is especially useful for programming multithreaded applications with round-robin scheduling algorithms. There is an option under the .c file's System Configuration heading entitled Round-Robin Thread Switching. We will be programming a round-robin application in this demo, so ensure that this box is checked. Under this subheading there should also be a field entitled Round-Robin Timeout [ticks]. We will use a time slice of 10 ms for each task. Since every "tick" is equivalent to 1 ms, set the timeout field to 10. Also ensure that the "user timers" option is checked. Your configuration file should now resemble that of Fig. 2.
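The relationship between the Round-Robin Timeout field and the resulting time slice can be sketched in plain C that runs on a PC rather than on the target. This is only an illustration of the arithmetic: the helper names slice_us() and timeout_ticks_for() are hypothetical and are not part of RTX, and the 1 ms tick is the value assumed by this tutorial.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; they are NOT part of RTX. */

/* Length of one round-robin time slice, in microseconds. */
unsigned long slice_us(unsigned long timeout_ticks, unsigned long tick_us)
{
    return timeout_ticks * tick_us;
}

/* Timeout [ticks] needed for a desired slice, rounded up to whole ticks. */
unsigned long timeout_ticks_for(unsigned long wanted_slice_us,
                                unsigned long tick_us)
{
    assert(tick_us > 0);   /* a zero-length tick makes no sense */
    return (wanted_slice_us + tick_us - 1) / tick_us;
}
```

With a 1 ms (1000 us) tick, slice_us(10, 1000) evaluates to 10000 us, i.e. the 10 ms slice configured above, and timeout_ticks_for(15000, 1000) evaluates to 15 ticks for a 15 ms slice.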
Note: When a change is made in the Configuration Wizard, the code in the Text Editor tab also changes automatically according to the options you selected. Ensure that you Save All to save these changes and include them during compilation.

3. Programming with µVision and RTX

3.1 Understanding the RTX Program

Open the Demo.c file and examine the code. We will create an RTX application consisting of two simple threads, each executing its own task. osThreadCreate() and osThreadDef() will create the threads and set their priorities, respectively. Task1 and Task2 will loop infinitely using a round-robin scheduling technique. This timing specification was also included in the config file (RTX_Conf_CM.c) from the previous section. osKernelInitialize() and osKernelStart() will set up the round-robin scheduling definition and execute the kernel, respectively. Compile the application and enter Debug mode. We will now use the µVision tools to analyze the RTX program.

3.2 Analyzing the RTX Project

a) Watch Windows

Keep track of the variables counta and countb:

1. Open the watch window by selecting View >> Watch Window >> Watch 1.
2. Find the column entitled "Name" in the Watch 1 window. The rows under this column should read <Enter expression>. Highlight the field and press backspace. Enter "counta" in the first row and "countb" in the second.
3. When you click the RUN icon to execute the program, the values of counta and countb should increment alternately, depending on the task that is currently executing.
4. It is also possible to change the value of counta or countb while it is incrementing during execution. If you enter a '0' in the value field, you modify the variable's value without affecting the CPU cycles during execution. This technique works both in simulation and while executing on the CPU.

b) Watchpoints (Access Breaks)

Similar to the Watch window, watchpoints allow a program to stop when a variable has reached a specific value.
This can also be very useful for debugging purposes.

1. Select Debug >> Breakpoints.
2. In this window we can select the variable value at which we would like the program to halt. Enter the following in the expression field: counta == 0xFFFF. Check the "Write" box in the "Access" field. Click on "Define" to set this as a watchpoint under Current Breakpoints. Click Close.
3. Go back to the Watch window and set counta to 0x0 (zero). Click RUN. The processor (and simulator) will then stop when counta is equal to 0xFFFF.

To remove the watchpoint, open the Breakpoints window once again and select "Kill All". Select Close.

c) Performance Analyzer

1. Select View >> Analysis Windows >> Performance Analyzer.
2. Expand the "Demo" function in the PA window by pressing the "+" sign located next to the heading. There should be a list of functions under this heading tree. There should also be another subheading entitled "Demo". Press the "+" sign again to expand the demo tree further. There you can see task1 and task2, along with the main function.
3. Reset the program (ensure that the program has been stopped first). Click RUN.
4. Watch the program execute and how the functions are called.

Exercise: Set a watchpoint for the variable counta at 0xFFFF again. Reset the program and click RUN. At what time does the overall program halt? What task was the program executing when the watchpoint was reached? How long was the task running for?

d) RTX Event Viewer

The Event Viewer is a graphical representation of a program's task execution timeline. An example is given in Fig. 3. The Event Viewer runs on the Keil simulator but must be configured properly for CPU execution using a Serial Wire Viewer (SWV). To use this feature:

Fig. 3: Event Viewer

1. In the main menu select Debug >> OS Support >> Event Viewer. A window should appear.
2. Ensure that you have killed all the breakpoints from the previous steps by selecting Debug >> Kill All Breakpoints.
3. Click RUN. Click the "All" button under the zoom menu in the Event Viewer window. You may also select "In" or "Out" to adjust the view of the timeline, which updates dynamically as the program continues to execute. Note the tasks other than task1 and task2 that are also present in the execution timeline.
4. Let the program execute for approximately 50 msec. Click STOP. Your window should now look similar to that of Fig. 3.
5. Check "Task Info" in the Event Viewer menu. Hover the mouse over one of the task time slices (blue blocks indicating execution of the task). You will see statistics for the task appear. The statistics should agree with the round-robin scheduling we set up in RTX_Conf_CM.c (i.e. 10 ms time slices).
6. Go back to the RTX_Conf_CM.c file and change the timing settings of the round-robin scheduler. Rebuild the project and run it again in Debug mode.
See if the Event Viewer reflects the changes you made to the file.

e) RTX Tasks and System Window

This window provides an RTX kernel summary that details the settings of the RTX_Conf_CM.c file, along with execution profiling information for the tasks that are executing. An example window is provided in Fig. 4. The information shown in this window comes from the Cortex-M3 DAP (Debug Access Port). The DAP acquires this information by continuously reading and writing memory locations through the JTAG port. To use this feature:
Fig. 4: RTX Tasks and System Window

1. Select Debug >> OS Support >> RTX Tasks and System.
2. As you run the program (or Reset and RUN), the state under the "Tasks" heading will change dynamically. The "System" information, however, will remain the same, since these values were specified prior to runtime in RTX_Conf_CM.c.

4. Revisiting Demo.c

Now that we have coded a simple multi-threaded application and analyzed various performance features using µVision, it is time to revisit the Demo.c file. While evaluating the code with the various µVision tools, we may have overlooked some technicalities. For instance, the RTL.H and cmsis_os.h headers must be included to define and access all RTX features. Take a look at the code once more. Let's go through it step by step:

1. Re-execute the code and take a look at the Event Viewer. What task executes first? The osTimerThread() thread initializes and executes. This thread is responsible for executing the time management functions specified by ARM's RTOS configuration.
2. The program starts execution from main(), which:
   a. Initializes the Cortex-M3 system
   b. Initializes the OS kernel for interfacing software to hardware
   c. Creates the threads to execute task1 and task2
   d. Starts the kernel to begin thread switching
3. Task1 executes for its round-robin time slice. After 10 msec the timer thread forces control to task2.
4. Task2 executes during its time slice for 10 msec, then is forced to stop so that task1 can execute again. This repeats indefinitely.
5. Data exchange between tasks should be done globally (based on scheduling), else local data will be overwritten when the task's time slice has been reached. If you are implementing loops within pre-emptive tasks, it is wise to consider a global while loop over a local for loop implementation.
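The alternation described in steps 3 and 4 can be mimicked with a small plain-C simulation that runs on a PC. It is a toy model, not RTX code: the function name simulate_round_robin() and the counters array are hypothetical, one unit of "work" is assumed to be one counter increment per tick, and the timer and idle threads seen in the Event Viewer are not modelled.

```c
#include <assert.h>

#define NUM_TASKS 2

/* Simulated global counters, standing in for counta and countb in Demo.c. */
unsigned long counters[NUM_TASKS];

/*
 * Toy round-robin model (hypothetical, not part of RTX).
 * Runs for total_ticks kernel ticks, switching to the next task every
 * slice_ticks ticks. Each tick, the running task increments its counter.
 * Returns the index of the task that ran during the final tick,
 * or -1 if total_ticks is 0.
 */
int simulate_round_robin(unsigned long total_ticks, unsigned long slice_ticks)
{
    int current = 0;          /* task created first runs first */
    int last = -1;
    unsigned long used = 0;   /* ticks consumed in the current slice */
    unsigned long t;
    int i;

    assert(slice_ticks > 0);
    for (i = 0; i < NUM_TASKS; i++) {
        counters[i] = 0;
    }

    for (t = 0; t < total_ticks; t++) {
        if (used == slice_ticks) {               /* slice expired: switch */
            current = (current + 1) % NUM_TASKS;
            used = 0;
        }
        counters[current]++;
        used++;
        last = current;
    }
    return last;
}
```

Calling simulate_round_robin(50, 10) models 50 ticks with 10-tick slices: the first task receives three slices (30 ticks), the second receives two (20 ticks), and the first task is the one running at the end.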
4.1 Processor Idling Time

When there are no tasks to run, a processor will typically transition to an idle state and wait until a new task is ready for execution. In RTX mode, the ARM Cortex core and the RTX kernel automatically create an "Idle Demon" task for the idle state. The actual task definition for Idle Demon is located in the RTX_Conf_CM.c file, in the function entitled os_idle_demon(void). The task may be altered to do useful work when idling, or to count the idle cycles in order to determine the utilization of the processor. As an exercise, let us determine the idling time of the code we have been working with:
1. Exit Debug mode and open the RTX_Conf_CM.c file. Under the line #include <cmsis_os.h>, insert the definition of the global variable:

   unsigned int countidle = 0;

2. In the same .c file, find the task entitled os_idle_demon(void) and adjust the function's code so that it resembles the following:

   /*--------------------------- os_idle_demon ---------------------------------*/
   void os_idle_demon (void) {
     for (;;) {
       countidle++;
     }
   }

3. Save the file and compile the project. Re-enter Debug mode.
4. Open the Watch window. Add counta, countb, and countidle to the list of expressions to watch during execution. Click Reset, and then RUN.
5. Observe the Watch 1 window: counta and countb increment, but the countidle variable does not. What does this mean? It means that the CPU is at 100% utilization by the task threads. Note that Idle Demon is set with the lowest priority in the task list. You can verify this by using the RTX Tasks and System tool.

4.2 Implementing Different Scheduling Algorithms

Exercise 1 - Setting Priority:

Exit Debug mode to access the Demo.c file. Change the line:

   osThreadDef(task2, osPriorityNormal, 1, 0);

to

   osThreadDef(task2, osPriorityAboveNormal, 1, 0);

Compile the program and return to Debug mode. Run the program and open the Event Viewer window. What do you notice?

By giving task2 a higher priority than task1, we have created the basis of a priority pre-emptive (interruptible) scheduling technique, in which the higher-priority task executes to completion first. Since task1 was created first, it was also expected to run first. Task1, however, will never be executed, because of its "Normal" priority setting (in comparison to task2's "AboveNormal") and the fact that task2 executes infinitely. Conversely, if the code were written so that task2 terminates after a finite time (when its workload completes), task1 would then be able to execute.
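The starvation effect described above can be reproduced with a small plain-C model that runs on a PC. This is a sketch under stated assumptions and not how RTX is implemented: the sim_task structure and the functions pick_next() and run_ticks() are hypothetical, a larger priority value is taken to mean higher priority, a negative work_left stands for a task that loops forever, and round-robin sharing between equal priorities is not modelled (ties simply go to the task created first).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record for illustration; NOT an RTX data structure. */
typedef struct {
    const char   *name;
    int           priority;   /* larger value = higher priority (assumption) */
    long          work_left;  /* ticks of work remaining; negative = infinite */
    unsigned long ran;        /* ticks of CPU time received so far */
} sim_task;

/* Return the index of the highest-priority task that still has work,
 * or -1 if every task has finished. Ties go to the lowest index. */
int pick_next(const sim_task *tasks, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (tasks[i].work_left == 0) {
            continue;                       /* finished, not ready */
        }
        if (best < 0 || tasks[i].priority > tasks[best].priority) {
            best = (int)i;
        }
    }
    return best;
}

/* Run the model for the given number of ticks. */
void run_ticks(sim_task *tasks, size_t n, unsigned long ticks)
{
    unsigned long t;

    for (t = 0; t < ticks; t++) {
        int i = pick_next(tasks, n);
        if (i < 0) {
            return;                         /* nothing ready: system would idle */
        }
        tasks[i].ran++;
        if (tasks[i].work_left > 0) {
            tasks[i].work_left--;
        }
    }
}
```

If task2 has the higher priority and infinite work, task1 receives no ticks at all. If task2 instead has only 30 ticks of work, then over 100 ticks task2 runs for the first 30 and task1 receives the remaining 70.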
Note that it is also possible to create a non-pre-emptive scheduling algorithm by assigning appropriate priority levels to the tasks.

Exercise 2 - Pre-emptive Scheduling:

Exit Debug mode to access the Demo.c file again. Change the task1 and task2 function code to the following:

   __task void task1 (void const *arg) {  // __task is an RTX keyword
     for (;;) {                           // Infinite loop runs while task1 runs.
       counta++;                          // Increment global variable counta indefinitely
       os_tsk_pass();
     }
   }

   __task void task2 (void const *arg) {
     for (;;) {
       countb++;
       os_tsk_pass();
     }
   }

Also make sure to change:
   osThreadDef(task2, osPriorityAboveNormal, 1, 0);

back to

   osThreadDef(task2, osPriorityNormal, 1, 0);

Recompile the files. Enter Debug mode. Open a Watch window to track the counta and countb variables, along with the Event Viewer. Reset the program and click RUN. How does the execution of the code using os_tsk_pass() differ from round-robin?

If you were successful, you will observe short execution time slices per task in the Event Viewer, where it almost appears as if the tasks were running as round-robin (after several msec). With the changes made to the program, each task should simply increment its counter by one and pass control (or the 'OS token') to the next task. os_tsk_pass() allows tasks of various priorities to pass control to the next task. Specifically, you should observe that on average a single task runs for 2.52 us before passing control to the next task (which is the time spent entering the task, incrementing the counter, and passing control). Note that if there were many finite tasks with variable priority within an application, os_tsk_pass() would allow for a pre-emptive scheduling algorithm, as it is able to pass control to other tasks.

What is the utilization of the processor? Check the Idle Demon variable and task using the performance-based tools. Try replacing os_tsk_pass() with osThreadYield(). What do you notice?

Exercise 3:

Stop the previous program and exit Debug mode to gain access to the Demo.c file. Remove the os_tsk_pass() calls you added in the last exercise. Update task1 and task2 with the following code:

   __task void task1 (void const *arg) {  // __task is an RTX keyword
     for (;;) {                           // Infinite loop runs while task1 runs.
       os_dly_wait(2);
       counta++;                          // Increment global variable counta indefinitely
     }
   }

   __task void task2 (void const *arg) {
     for (;;) {
       os_dly_wait(1);
       countb++;
     }
   }

Recompile the files and enter Debug mode. Set up the Watch 1 window with the variables counta, countb, and countidle.
RUN the program. Based on the Watch window, what do you think the os_dly_wait() function does? Use the Performance Analyzer and Event Viewer to verify your findings. What is the utilization of the CPU?

Note on Pre-emptive and Non-Pre-emptive Scheduling

To implement pre-emptive or non-pre-emptive scheduling techniques, the RTX_Conf_CM.c file must be adjusted. Specifically, in the Configuration Wizard, the option System Configuration >> Round-Robin Thread Switching must be disabled. Ensure, however, that the SysTick timer remains enabled.

5. Optional Assignment

The following outlines the specifications for 3 different scheduling applications (Questions 1, 2, and 3). You may create TWO versions of each application: 1) an analysis version and 2) a demo version. The analysis version will be used in Debug mode to analyze the performance of your applications for your report, and must not include any LED or LCD code. The demo version will include LCD and LED functions for your demo (see footnote 1). You will be marked on both versions of the code, but are only required to submit the analysis version for grading.

1. Write a round-robin scheduling example using 3 different tasks. Each task should be allotted a time slice of 15 msec. Note: Your code must perform a different function from the one provided in this demo. Marks will be awarded for creativity. Ensure that the tasks do not run infinitely and that they have a finite workload with respect to time. For the demo version only, use the LEDs and the LCD to indicate the threads that are currently executing in your program.

TABLE I: LIST OF PRE-EMPTIVE TASKS
Task Functionality Thread Priority A A = [ + ( +2)] B 2 3! 1 B = C C = D D = 1 + + + + +!!!!! E E = + 2 + 3 + + 12 3

2. Table I provides a list of pre-emptive tasks, with their function and priority listed. Note: The lower the number in the Priority column, the higher the priority. Write the pre-emptive code for a scheduling algorithm which invokes the tasks and functionalities in Table I based on their priority level (i.e. Task C should finish computing first, etc.). Each task should print its final result to stdout. For the demo version only, use the LEDs and the LCD to indicate the threads that are currently executing in your program.

3. (2 + 1) 2+4 ( +2) [ + ( 2)] 10 ( + 3) = 8!
Develop a multithreaded program to calculate the formula above using a non-pre-emptive algorithm. Ensure that BEDMAS is followed, using a minimum of 5 different tasks. This will require you to experiment with different priority levels and to ensure that producer instructions are computed ahead of time to provide data to their consumers. The final result should be written to stdout. For the demo version only, use the LEDs and the LCD to indicate the threads that are currently executing.

4. Write a half-page report describing the differences and similarities between the scheduling algorithms you have implemented.
In particular, what did you notice when coding Questions 1, 2, and 3? What were the differences in implementation? Explain.

You may hand in a printout of the analysis version of your .c code, the RTX_Conf_CM.c Configuration Wizard file, and snapshots of your Event Viewer and Performance Analyzer windows for each application. You may demonstrate the demo version of each application.

References

1. "The Keil RTX Real Time Operating System and µVision", www.keil.com, An ARM Company.
2. Robert Boys, "Keil µVision and Microsemi SmartFusion" Cortex-M3 Lab, www.keil.com.
3. Robert Boys, "Keil RTX RTOS the easy way", www.keil.com.

Footnote 1: You may need to increase the stack size in RTX_Conf_CM.c to accommodate the LED/LCD code.