Comparison of Service Call Implementations in an AUTOSAR Multi-core OS


Christian Bradatsch, Florian Kluge, Theo Ungerer
Systems and Networking, Department of Computer Science, University of Augsburg, Augsburg, Germany
{bradatsch,kluge,ungerer}@informatik.uni-augsburg.de

Abstract

Multi-core processors are gaining a foothold in the domain of embedded automotive systems. AUTOSAR Release 4.1 establishes a common standard for the use of multi-core processors in automotive systems. While interfaces and functionalities are well defined in the specification, the actual implementation is left open to the software manufacturers. We exploit the freedom the specification leaves for the implementation of cross-core service calls. In this paper, we compare two opposing implementation approaches that can be used in shared-memory multi-core processors. The actual execution of a service call takes place either on the affected core or on the invoking core. Our performance evaluations indicate an advantage of a lock-based approach with execution on the invoking core.

I. INTRODUCTION

Multi-core processors are standard in the desktop area and are also widespread in embedded systems domains. In embedded systems, the demand for more processing power is increasing constantly. The two main reasons for this are more complex algorithms and the integration of multiple distributed electronic units into a single processing module. Since AUTOSAR Release 4.0 in December 2009, multi-cores are present in the automotive domain too. With the publication of Release 4.1 in March 2013, the support of multi-core systems was further improved. AUTOSAR establishes a common standard and defines standardized interfaces and functionality, but leaves the implementation open to the manufacturers. In this paper we investigate some service calls that were extended in the AUTOSAR 4.0 standard to support cross-core execution.
We consider two fundamentally different implementation approaches for such service calls crossing core boundaries. On the one side, a strict message based approach is used; on the other side, we use a lock based approach derived from OS approaches for SMP systems. Between these two approaches, different combinations containing parts from both sides are conceivable. We evaluate the two approaches on the basis of one exemplary service call, comparing execution speed and considering real-time constraints. As evaluation platform, we use the many-core simulator of the parMERASA project [1]. For our test scenarios we set up the simulator in a dual-core and a quad-core configuration. Hence, the results are comparable to embedded COTS processors.

A. Related Work

One other line of work addresses a similar topic. In [2], the authors examine multi-core strategies for the AUTOSAR system software to run legacy code with as few modifications as possible. In their paper, they list several approaches to achieving synchronization in the system software. The main focus lies on how to prevent concurrent access to higher level system services, for example services for sending and receiving CAN messages. In [3], two of these approaches are compared. In contrast, this paper examines the basic OS service calls with regard to multi-core execution. Those basic OS services are also used in other service calls, e.g. CAN, and hence reside on a lower level. The OS services are not only applied to other system software services, but are also used for application level communication.

The remainder of this paper is structured as follows: Section II gives an overview of AUTOSAR OS and the services relevant for multi-core execution. In section III the two different approaches are compared and in section IV the timing behavior is discussed. Section V details the corresponding implementations. The results of the evaluation are discussed in section VI.
At the end, a short outlook on future work is presented and the paper is concluded with a summary in section VII.

II. AUTOSAR OS

AUTOSAR [4] consists of three main parts: software architecture, methodology, and application interface. The software architecture part can be divided into runtime environment (RTE) and basic software (BSW). The BSW is organized in BSW modules providing standardized services from application to driver level for various purposes, such as system services, communication interfaces, and peripheral access. The OS module is one of these BSW modules; it is a central module that cannot be omitted. AUTOSAR OS [5] is an event-triggered real-time operating system descending from the OSEK/VDX OS [6]. It schedules tasks according to a fixed priority preemptive policy. It also inherits mechanisms to allow for time triggered scheduling. Since the last revision of AUTOSAR Release 4.0, multi-core support is an inherent part of the OS specification. The AUTOSAR Specification of Operating System [5] describes the general multi-core concept, the API and functionality of

OS services and their extensions for multi-core support. It also states the extensions required for OS services adopted from the OSEK specification. However, implementation details are left open.

An OS service is a system or function call with the following properties: it can have input and output parameters and it has a return status. Almost all services have an input parameter passing an object ID, e.g. TaskID, which determines the OS object affected by the service call. Fig. 1 shows the general structure of an OS service. The processing sequence starts with the service call. Next, certain conditions are checked, for example whether the passed input parameters are valid. If the check fails, the service returns immediately; otherwise the service performs its specific action and returns. A corresponding error value is handed back at return. The program execution is delayed until the service returns, i.e. the service call is synchronous. A few OS services only request some system status information, whereas others make major changes to the system state and additionally require a rescheduling at service return.

Fig. 1. General OS service call processing sequence: After the service call, several conditions are checked. If the check is passed, the corresponding service call action is performed and the call returns. Otherwise, the service call returns immediately with an error code. [Flowchart nodes: Service Call, Check passed (yes/no), Action, Return.]

The AUTOSAR OS specification lists 19 former OS services that have been extended to support cross-core usage and 8 new services especially for usage in multi-cores. A few other services had to be adapted to work correctly. The OS object ID parameter of a service helps to identify the affected core. Some OS services do not require an object ID, so they are executed on the same core where they were called.
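The check/action/return structure of Fig. 1 and the static mapping from object ID to core can be sketched in C as follows. This is a minimal illustration only: the table `task_core`, the structure `TaskRuntime`, and the functions `core_of_task` and `service_activate` are hypothetical names, the status codes are an illustrative subset, and the check shown is a simplified stand-in for the conditions a real service evaluates.

```c
#include <stdbool.h>

/* Illustrative subset of status codes; numeric values are assumptions. */
typedef enum { E_OK = 0, E_OS_ID, E_OS_LIMIT } StatusType;
typedef unsigned TaskType;

enum { NUM_TASKS = 4 };

/* Hypothetical static configuration: the core each task is assigned to. */
static const unsigned task_core[NUM_TASKS] = { 0, 0, 1, 1 };

/* Hypothetical per-task runtime data. */
typedef struct { bool ready; } TaskRuntime;
static TaskRuntime task_rt[NUM_TASKS];

/* The affected core follows from the object ID alone (id must be valid). */
unsigned core_of_task(TaskType id)
{
    return task_core[id];
}

/* Synchronous service: check phase first, then action phase. */
StatusType service_activate(TaskType id)
{
    /* Check phase: return immediately with an error code on failure. */
    if (id >= NUM_TASKS)
        return E_OS_ID;
    if (task_rt[id].ready)
        return E_OS_LIMIT;

    /* Action phase: reached only if every check passed. */
    task_rt[id].ready = true;
    return E_OK;
}
```

Because the assignment of objects to cores is fixed at configuration time, the lookup in `core_of_task` needs no synchronization; only the runtime data touched in the check and action phases does.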
If the action of an OS service called on one core (source core) is related to an OS object on a different core (target core), the action still has to be performed immediately. Only the core where the service was called gets the appropriate return status, regardless of whether the action was executed on another core. Nearly all automotive embedded systems impose hard real-time requirements [7]. To comply with these requirements and to simplify the analysis of real-time constraints, the whole AUTOSAR OS is statically configured. This means that the required amount of memory for all OS objects is directly derived from the configuration and known at compilation time. Furthermore, on a multi-core processor each OS object, such as a task, is assigned to one specific core and each core has its own scheduler.

III. CONCEPT

The AUTOSAR OS Specification leaves the implementation of the operating system open. For multi-core systems, we see two basic implementation approaches for cross-core service calls that we want to compare in this paper. Each service call reads OS runtime data during its check phase and reads and/or writes it during its action phase. For example, activating a task changes, among other things, the scheduling table on the core where the task is located. The scheduling table is part of the OS runtime data of each core. Fig. 2 contrasts the two access methods crossing core boundaries addressed in this paper.

Fig. 2. Comparison of access across cores: the dashed rectangles symbolize the shared memory where the OS runtime data is stored. The dashed arrows indicate the accesses if source core (SC) and target core (TC) are swapped. Left side (message based access across core): OS runtime data is only accessed from the corresponding core. The SC sends a message to the TC (arrow between SC and TC), which performs the corresponding service call and updates the relevant OS runtime data. Right side (lock based direct access across core): OS runtime data can be accessed directly from the SC. Hence, the data must be protected by mutual exclusion.
We assume that all cores are connected to a shared memory and that a global address space is available. On each core an OS kernel is executed, which administrates its own OS runtime data. The OS runtime data of all cores is stored in the same shared memory.

The left side of Fig. 2 shows a message based approach, where the target core (TC) is notified by a message about an OS service call and the call is processed on the TC. Since every service call is performed on the TC, there is no need to access the TC's OS runtime data from any other core. For this approach, it is not mandatory that the OS runtime data of all kernels is accessible from each core, but it is also not forbidden.

The right side of Fig. 2 presents a lock based approach. Each service call is processed on the source core (SC). To read or write the OS runtime data affected by the service call, direct access to the OS runtime data of the TC is required. Therefore the OS runtime data of all kernels must be accessible from each core and thus be stored in shared memory. To preserve data consistency, the access has to be secured by mutual exclusion.

The two approaches are illustrated in Fig. 3 and Fig. 4. Both figures show the general processing sequence of an OS service call as explained in section II and presented in Fig. 1. There are two time lines, the upper one representing the source core and the lower one the target core. The main difference is that the service check and action are processed either on the SC or on the TC.

A. Approach 1: Message based

Fig. 3 represents the first approach using messages. The calling task on the SC calls a service with the ID of an OS object located on the TC as parameter. The SC sends a message to

the TC, containing the service tag, e.g. ActivateTask, and the corresponding parameters. At this time, the TC executes task T2. The TC is interrupted immediately and starts checking the service constraints. After a positive check, the service call cannot fail anymore. Regardless of the outcome of the check, at this point a message with the return status is sent back to the SC. On the SC, the service call returns and the program execution of the calling task continues. If the check has passed, the TC processes the specific action of the OS service. Depending on the system state of the TC and the called service, a rescheduling may take place afterwards, so either the execution of T2 continues or another task gets swapped in. Since a service call is synchronous, the calling core is blocked for the same period as the TC. If a second service targeting an OS object on the TC is called during the check or action phase, that service must wait until the action is finished.

Fig. 3. Cross Core Service Call Processing Sequence using a Message based Approach. [Timing diagram for SC and TC. Labels: Service Call, Service Return, M. Service Call, M. Return Status, check, action, T2, Tx; intervals t_SC, t_TC, t_Service.]

Fig. 4. Cross Core Service Call Processing Sequence using a Lock based Approach. [Timing diagram for SC and TC. Labels: Service Call, Lock, Unlock, Service Return, check, action, M. Reschedule, OS runtime data locked by SC, T2, Tx; intervals t_SC, t_TC, t_Service.]

B. Approach 2: Lock based

The second approach using locks is presented in Fig. 4. The calling task on the SC calls a service targeting an OS object located on the TC. The service tries to get the lock for the OS runtime data of the TC. After acquiring it, the preconditions are checked. For this, the SC accesses the data of the OS object that is assigned to the TC by configuration. If all conditions are met, the action is performed on this object data. For example, in the case of an OS service requesting the current state of a task bound to another core, the service action reads the status from the task control block on the TC.
If the service does not require a rescheduling (cf. [6] p. 20 ff.), the lock is released and the service returns. The calling task is resumed on the SC and T2 continues on the TC. Otherwise, the SC checks whether a task other than the currently running task T2 has to be swapped in. If so, a message with the service tag Reschedule is sent to the TC. Afterwards, the SC releases the lock and resumes execution of the calling task. Task T2 gets interrupted by the message, and the TC checks again whether another task has to be started or resumed. Depending on the result, T2 is resumed or the highest priority task is loaded. The check for a task swap is repeated because the system state on the TC might have been changed by other OS service calls between the unlock and the reception of the reschedule message. During the lock/unlock period, all service calls regarding an OS object on the TC, even those called on the TC itself, are blocked until the lock is released.

C. Summary

In approach 1, the TC is interrupted for each OS service call affecting an OS object on it. The second approach only interrupts the TC if a rescheduling is required. Both approaches have in common that during an OS service call a competing service call cannot access the OS object data of the TC. In the first case the access is prevented by executing the service on the TC (cf. Fig. 3), in the second case by locking the OS object data (cf. Fig. 4).

IV. DISCUSSION OF TIMING BEHAVIOR

All AUTOSAR services can be used by application software that has to operate under hard real-time conditions. Therefore, the timing behavior of a service call is of paramount significance for the system. Hard real-time constraints require that the execution time of any service call is statically analyzable and that safe timing bounds can be computed. Furthermore, possible interferences between different tasks must be boundable if they cannot be avoided completely.
Concerning the execution of a service call, we argue as follows: the concepts underlying the service calls are already used in today's single-core implementations, where they provide timing predictability. This means that if the phases check and action are executed on a single-core processor, a WCET bound can be derived. Since the core implementation also applies to the multi-core variants of the service calls, the WCET predictability is conserved. The migration to multi-core introduces some additional mechanisms into the service call that require further examination. In particular, we have to examine interferences that cannot happen in the single-core case, but most probably will in the multi-core implementation. In approach 1, where the service call is executed on the target core, we have to examine what can happen while the service call message is being transmitted. Obviously, the worst case would be if similar messages were issued from other cores, and all would be processed before the one under

examination. In such a scenario, the processing of the service call will be delayed. However, the delay can be bounded: service calls are executed synchronously, meaning that a task that has a pending service call is blocked until the call returns. Thus, no task can have more than one service call outstanding at any time. As the system itself is configured statically, the total number of tasks $n_t$ is already known at integration time, thus providing an upper bound on the number of service calls that might be active at a target core. An even better bound is obtained through the maximum number of tasks that can execute concurrently on different cores, which is bounded by the number of cores $n_c$ available for task execution. The service call messages themselves are processed in order of their arrival. Therefore, a naive bound for the worst-case waiting time (WCWT, the time between issuing a service call and the start of the actual processing) would be $n_c \cdot w_S$, with $w_S$ being the worst-case execution time of the most complex service call that can occur. Further analysis of the system will most probably yield the additional information that not all of the $n_c$ tasks issue a service call to the specific target core, which leads to a smaller WCWT.

In approach 2, the source core might also experience a waiting time until access to the lock is granted. To bound waiting times, a fair locking mechanism is required. A fair locking mechanism in this context is defined as follows [8]: if more than one thread competes to enter a critical section, an analysable ordering (e.g. first-come, first-served (FCFS)) that allows each thread to enter (and leave) the critical section with bounded waiting must be assured. Due to this requirement, the same considerations as for approach 1 can be applied here. Therefore, timing predictability of the service call is preserved for both implementations.
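The naive waiting-time bound discussed above can be written as a small helper. The function name `wcwt_naive_bound` and the numbers in the usage note are hypothetical and serve only to make the arithmetic concrete; taking the smaller of the task count and the core count reflects the argument that both limit the number of concurrently pending calls.

```c
#include <stdint.h>

/* Naive worst-case waiting time bound in cycles: n * w_S, where n is the
 * smaller of the task count n_t and the core count n_c, and w_S is the
 * WCET of the most complex service call. */
uint64_t wcwt_naive_bound(uint32_t n_t, uint32_t n_c, uint64_t w_s)
{
    uint32_t n = (n_t < n_c) ? n_t : n_c;
    return (uint64_t)n * w_s;
}
```

For an assumed system with 20 tasks, 4 cores, and a worst-case service execution time of 10000 cycles, `wcwt_naive_bound(20, 4, 10000)` evaluates to 40000 cycles. Under the fair-lock requirement, the same helper would apply to the lock waiting time of approach 2.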
Additionally, we have to check the interferences that task T2 running on the target core will experience. In approach 1, T2 gets interrupted any time a task running on another core issues a service call to the target core. A thorough analysis of the system and its behaviour can yield information about the occurrence rate of these interruptions. Using well-known techniques such as event streams [9], it is possible to account for these interruptions in the schedulability analysis for T2 and thus still achieve a predictable timing behaviour. In approach 2, where the service call is executed on the SC, T2 will only be interrupted if a rescheduling is necessary, i.e. if a task with a higher priority became ready. This behavior is no different from the one that can be observed in today's single-core implementations: tasks with low priorities might get preempted by tasks with higher priority. Such behaviour must be covered by regular schedulability analysis, e.g. rate monotonic analysis [10]. Furthermore, T2 may experience an additional blocking time if it executes a (local) service call while another task is holding the lock for the core's data structures. Similar to the above discussion, this waiting time is bounded by the maximum number of tasks that might access the lock at the same time. The locking behaviour is similar to the multiprocessor priority ceiling protocol (MPCP) [11]: manipulation of the TC's OS runtime data can be seen as a global critical section, which is prioritized over all local processing performed on the same core.

Although both approaches appear feasible under real-time constraints, approach 2 seems more appealing from the following viewpoint: approach 1 interrupts the target core in any case, regardless of whether execution on that core actually has to change due to the service call. Thus, we expect approach 2 to achieve at least a higher average performance.

V. IMPLEMENTATION

For the implementation, appropriate support from the system is needed.
The first approach requires a synchronous inter-core notification mechanism and the second one a fair locking mechanism. The platform architecture of the parMERASA project [1] covers both requirements. The parMERASA many-core simulator simulates the parMERASA many-core architecture and is based on SoCLib [12], which is an open platform for virtual prototyping of multi-processor systems on chip. SoCLib is cycle accurate and provides a global address space. The simulated architecture is divided into several clusters, each having 4 to 8 cores sharing a cluster memory. Between cores, interrupts can be used for inter-core notification. The memory architecture inside a cluster can be compared to embedded multi-core processors that have a shared memory, for example the Freescale Qorivva MPC5643L. So it is possible to transfer evaluations performed on the simulator to current multi-cores.

Furthermore, the parMERASA platform provides a system software concept, which consists of a Library [13] and different domain-specific runtime environments. For the automotive runtime environment, we implemented a subset of AUTOSAR functions, called TinyAUTOSAR, on the parMERASA simulator. The Library offers a fair locking mechanism in the form of a ticket lock implementation [14] based on a fetch-and-add instruction (cf. [8] p. 36).

In the TinyAUTOSAR OS implementation, separate OS runtime data exists for each core, for example scheduling lists or current OS runtime information concerning the core. Each OS object, e.g. core or task, has a unique ID inside the TinyAUTOSAR OS. It also possesses statically configured properties as well as individual runtime information. Each OS service, e.g. ActivateTask, has its own service type ID.

The inter-core notification mechanism is paired with a highly reduced message passing interface. It provides an adapted blocking send (MessageSend) and receive (MessageRecv) operation.
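To illustrate why a ticket lock built on fetch-and-add gives first-come, first-served ordering, the following is a minimal sketch using C11 atomics. It is not the parMERASA Library code; the type `ticket_lock_t` and the three functions are hypothetical names chosen for this example.

```c
#include <stdatomic.h>

/* Hypothetical ticket lock: waiters are served in ticket order (FCFS). */
typedef struct {
    atomic_uint next_ticket;  /* next ticket number to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to enter */
} ticket_lock_t;

void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->next_ticket, 0u);
    atomic_init(&l->now_serving, 0u);
}

void ticket_lock_acquire(ticket_lock_t *l)
{
    /* Fetch-and-add draws a unique ticket, which fixes the entry order. */
    unsigned my_ticket =
        atomic_fetch_add_explicit(&l->next_ticket, 1u, memory_order_relaxed);

    /* Spin until it is this ticket's turn. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire)
           != my_ticket) {
        /* busy wait */
    }
}

void ticket_lock_release(ticket_lock_t *l)
{
    /* Pass the lock on to the holder of the next ticket. */
    atomic_fetch_add_explicit(&l->now_serving, 1u, memory_order_release);
}
```

Since each waiter is served strictly in the order in which it drew its ticket, the number of critical sections a waiter has to sit through is bounded by the number of competing cores, which is the property the timing argument in section IV relies on.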
Both operations have the following parameters: a pointer to the send/receive buffer containing the message, the buffer (respectively message) size, the destination/source core, and a message tag.

    void MessageSend(void *buf, int size, int dest, int tag);
    void MessageRecv(void *buf, int size, int source, int tag);

The MessageSend parameter dest indicates the target core, and the MessageRecv parameter source is used to accept only messages from the specified source core. The message tag contains the OS service ID to distinguish between the various OS services. A message is received only if the tags on the sender and receiver sides match. MessageRecv also accepts ANY_SOURCE and ANY_TAG as values for source and tag, so it can accept any message from an arbitrary source. The message itself has variable size and content, depending on the OS service. It is stored in a local buffer with the size of the message and referenced by buf. Common to all OS services, the message contains the source core ID, the OS object ID, and the required service call parameters. Only the OS services StartOS, ShutdownOS, and ShutdownAllCores do not have an object ID.

On the sender side, a local buffer contains the message to be sent. On the receiver side, the received message is stored in a local buffer. These are two distinct local buffers. For each receiving core a separate, globally accessible message buffer exists, called global receiving buffer. Each global receiving buffer is organized as a list with the number of elements equal to the number of sending cores. Each list element is assigned to exactly one core. For example, in a quad-core processor where all cores exchange messages with each other, there are four global buffers, each with four list elements.

When MessageSend is called, it checks whether the corresponding list element of the global receiving buffer of the destination core is empty. If so, the global receiving buffer of that core is locked. The message is copied from the local buffer to the global receiving buffer. The list element is inserted at the head of the list and marked as not empty. Afterwards, the lock is released. Otherwise, MessageSend blocks until the list element is free.
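The buffer organization described above can be sketched as follows. This is a simplified, single-threaded illustration and not the TinyAUTOSAR code: the non-blocking functions `try_send` and `try_recv`, the explicit `self` parameter, the fixed `MAX_MSG` size, and the numeric values of ANY_SOURCE and ANY_TAG are assumptions, the list is reduced to an array with one slot per sending core, and the lock around each buffer access is omitted.

```c
#include <stdbool.h>
#include <string.h>

enum { NUM_CORES = 4, MAX_MSG = 32 };
enum { ANY_SOURCE = -1, ANY_TAG = -1 };   /* assumed wildcard values */

typedef struct {
    bool full;                    /* "not empty" marker of the element */
    int tag;                      /* OS service type ID */
    int size;
    unsigned char data[MAX_MSG];
} slot_t;

/* rx_buffer[r][s]: element of receiving core r reserved for sender s. */
static slot_t rx_buffer[NUM_CORES][NUM_CORES];

/* Non-blocking send: returns false where MessageSend would block. */
bool try_send(int self, const void *buf, int size, int dest, int tag)
{
    slot_t *s = &rx_buffer[dest][self];
    if (s->full || size < 0 || size > MAX_MSG)
        return false;
    memcpy(s->data, buf, (size_t)size);
    s->size = size;
    s->tag = tag;
    s->full = true;
    return true;
}

/* Non-blocking receive: returns the source core, or -1 if no message
 * matches the requested source and tag. */
int try_recv(int self, void *buf, int size, int source, int tag)
{
    for (int src = 0; src < NUM_CORES; ++src) {
        slot_t *s = &rx_buffer[self][src];
        if (!s->full)
            continue;
        if (source != ANY_SOURCE && source != src)
            continue;
        if (tag != ANY_TAG && tag != s->tag)
            continue;
        if (s->size > size)
            continue;             /* receive buffer too small */
        memcpy(buf, s->data, (size_t)s->size);
        s->full = false;          /* mark the element as empty again */
        return src;
    }
    return -1;
}
```

Reserving one element per sending core means a sender can only ever be blocked by its own previous message, so the memory needed for the buffers is fixed at configuration time.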
When MessageRecv is called, it checks whether the global receiving buffer contains a message with the matching source and tag. If the passed parameters are ANY_SOURCE and ANY_TAG, MessageRecv only checks whether the list is not empty. If no appropriate message is available, MessageRecv waits until one arrives. It then acquires the lock. The message is copied from the global receiving buffer to the local buffer. The corresponding list element is removed from the list and marked as empty. Then the lock is released.

A. Approach 1: Message based

In the first approach, each service call is forwarded to and executed on the target core. At the beginning, each service routine checks whether the passed OS object ID is valid. With its help, the target core of the OS object identified by this ID is determined. This can be done on the core where the service was called, since the object ID information is statically configured. If the OS object resides on the local core, the service is processed as in the single-core implementation. Otherwise, a message is prepared that includes the source (local) core ID, the OS object ID, and possibly additional service call parameters. Afterwards, MessageSend is called. The destination core is notified by an inter-core interrupt. The source core then waits for a reply containing the return status to finish the service call. To this end, MessageRecv is called with the target core ID as source parameter and the service type ID of ReturnStatus as message tag.

On the target core side, an interrupt service routine (ISR) is responsible for receiving and analyzing messages. Because the ISR does not know from which core or of which type the next message might be, it has to accept every message. Thus, MessageRecv is called with the ANY_SOURCE and ANY_TAG parameters. After receiving a message, the ISR evaluates the service type ID and calls the corresponding OS service with the parameters encapsulated in the message.
Then, the service checks whether certain conditions are met. At this point, the return status can be determined, since the service action cannot fail anymore, except for an abnormal OS service termination. The status is transmitted to the source core by calling MessageSend. Next, the service action is performed and, depending on the OS service, a rescheduling may take place. Otherwise, the task that was running before the interruption is resumed.

B. Approach 2: Lock based

The lock based approach starts in the same way as the message based one. First, the passed OS object ID is checked for validity; if it is invalid, the service returns with an appropriate error. If the ID is valid, the SC enters a critical section by acquiring a lock for the OS runtime data of the TC. To guarantee fairness and meet real-time constraints, a ticket lock implementation was chosen. The SC then checks whether several conditions are met. To do so, the SC reads the corresponding values from the TC. If the checks pass, the SC executes the service action. Depending on the specific service call, the SC has to read and/or write values located on the TC. If the OS service requires a rescheduling, the SC checks whether a task swap has to be performed on the TC. If so, MessageSend is called with the TC as destination and the service type ID of the OS service Reschedule. Afterwards, a software interrupt is triggered at the TC to notify it immediately about the message. The lock is released and the service returns with the corresponding status. If one of the checks does not pass, the service returns immediately, indicating an error. As stated in section V-A, an interrupt service routine on the TC side processes incoming messages. After evaluating the service ID tag of the message, the rescheduling service on the TC is called.

VI. EVALUATION

For our evaluations, we decided to analyze the OS service ActivateTask. ActivateTask and SetEvent are the only cross-core service calls leading to a rescheduling on the target core.
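The lock based sequence described in section V-B (validate the ID, lock the TC's runtime data, check, act, request a rescheduling if needed, unlock) might look roughly like the sketch below. All names are hypothetical: `activate_task_lock_based`, `CoreRuntime`, and `TaskConfig` are illustrative, a simple test-and-set spin lock stands in for the fair ticket lock, the `resched_requested` flag stands in for the Reschedule message plus software interrupt, and the "already ready" check is a simplified stand-in for the real preconditions.

```c
#include <stdbool.h>
#include <stdatomic.h>

typedef enum { E_OK = 0, E_OS_ID, E_OS_LIMIT } StatusType; /* subset */
enum { NUM_CORES = 2, NUM_TASKS = 4 };

/* Hypothetical static configuration: owning core and priority per task. */
typedef struct { unsigned core; unsigned prio; } TaskConfig;
static const TaskConfig task_cfg[NUM_TASKS] =
    { {0, 1}, {0, 2}, {1, 1}, {1, 3} };

/* Hypothetical per-core OS runtime data kept in shared memory. */
typedef struct {
    atomic_bool locked;        /* stand-in for the fair ticket lock */
    bool ready[NUM_TASKS];
    unsigned running_prio;     /* priority of the currently running task */
    bool resched_requested;    /* stand-in for message + interrupt */
} CoreRuntime;
static CoreRuntime core_rt[NUM_CORES];

static void lock_core(CoreRuntime *c)
{
    while (atomic_exchange_explicit(&c->locked, true, memory_order_acquire)) {
        /* spin until the lock is free */
    }
}

static void unlock_core(CoreRuntime *c)
{
    atomic_store_explicit(&c->locked, false, memory_order_release);
}

/* Executed entirely on the source core. */
StatusType activate_task_lock_based(unsigned id)
{
    if (id >= NUM_TASKS)               /* ID check uses static data only */
        return E_OS_ID;

    CoreRuntime *tc = &core_rt[task_cfg[id].core];
    StatusType status = E_OK;

    lock_core(tc);
    if (tc->ready[id]) {               /* check phase on the TC's data */
        status = E_OS_LIMIT;
    } else {
        tc->ready[id] = true;          /* action phase */
        if (task_cfg[id].prio > tc->running_prio)
            tc->resched_requested = true;  /* only now is the TC disturbed */
    }
    unlock_core(tc);                   /* released on every path */
    return status;
}
```

The sketch makes the key property of approach 2 visible: the target core is only notified when the newly ready task would actually preempt the running one, and it would still have to re-evaluate that decision itself after the lock is released.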
For these two calls, the scheduling table and the task control block on the TC side have to be updated, which makes them the most complex service calls. Since the SetEvent processing sequence is the same as that of ActivateTask, it is not evaluated separately. One reason for evaluating ActivateTask is its complexity. The other is that it is often used when data is passed between tasks ([15] p. 6 ff.). More precisely, it is relevant for inter-partition (cf. [16] p. 16) and inter OS-Application (cf. [5] p. 120 ff.) communication. Inter OS-Application communication is used whenever user applications communicate across core boundaries.

[Fig. 5. Sender Receiver Communication across Cores. Elements shown: Software Component A and RTE on Core 1 (RTE_Send_..., Activate Task), Buffer, Task: Read Buffer and Forward to Software Component B with RTE on Core 2.]

TABLE I
SERVICE CALL EXECUTION TIMES IN CYCLES ON DUAL-CORE CONFIGURATION

             Message based   Lock based   Overhead
t_SC         41513           12499        232.1%
t_TC         18728           8550         119.0%
t_Service    42098           9489         343.7%

ChainTask is nearly the same as ActivateTask, with the difference that the task calling ChainTask is terminated first and then a specific successor task is activated. CancelAlarm and the Get... and Set... service calls have the same general processing sequence, but are less complex. StartCore, StartNonAutosarCore, ShutdownAllCores, StartOS, and ShutdownOS are only called at the beginning and the end of the program and thus have less impact on the overall performance. GetSpinlock, TryToGetSpinlock, and ReleaseSpinlock are listed as services supporting multi-core systems in the AUTOSAR specification (cf. [5] pp. 96-98). However, they operate only on the OS runtime data of the core on which they are called. They do not need to manipulate data of other cores, which distinguishes these three service calls from the others. Therefore, they are not covered in this paper.

ActivateTask prepares a task for execution and puts the desired task into the ready state. After rescheduling, the ready task with the highest priority is executed. The ActivateTask processing sequence (cf. Fig. 1) is as follows: First, the prerequisites are checked to determine whether it is permitted to set the task's state to ready. If the check passes, the task is set to ready and all associated actions on the OS data structures are performed.
Afterwards, a rescheduling is performed and the ActivateTask service call returns, signaling that no errors occurred. If the check does not pass, the service call returns directly with an error code.

Fig. 5 shows a sender/receiver communication scenario between two software components (SW-Cs) located on different cores. This kind of mechanism is used not only for sender/receiver communication with notification, but also for client/server communication. A lot of inter-core communication is performed in this way. In Fig. 5, SW-C A on core 1 wants to send data to SW-C B on core 2 and notify it. SW-C A calls RTE_Send_... and the RTE takes care of writing the data into a shared buffer. The RTE itself calls ActivateTask to immediately start a reader task on core 2, which reads the data out of the buffer. In turn, the reader task forwards the data to SW-C B. Without the call to ActivateTask, SW-C B would have to check periodically whether data is available in the buffer.

To compare our two approaches, we measured the following values, as illustrated in Fig. 3 and Fig. 4: In both approaches, the duration of the service call on the source core (t_SC) is measured, and in approach 1 also the duration on the target core (t_TC). In the second approach, the duration on the target core (t_TC) is the time during which the OS runtime data of the target core is locked by the access of the source core. In addition, the time from calling the service until its action is finished is measured (t_Service). These values give us information about the blocking times on the source and target core, and about the latency until an OS service call crossing core boundaries is performed. For all measurements, the service call ActivateTask was executed 10 times and the results were averaged.

In our first test scenario, a dual-core configuration is used. The processing sequence of a service call on a dual-core is equivalent to the sequences shown in Fig. 3 and Fig. 4.
In both cases, a task on the source core calls ActivateTask to activate a task on the target core, which currently executes task T2. We assume that T2 is not performing another OS service call at the same time. In both approaches, this would delay the service call by a certain amount of time. In the message based approach, the service call message would not be processed until the OS service called by T2 had returned. In the lock based approach, acquiring the lock would take until the OS service called by T2 had released the lock on the same data. This would increase all measured values likewise.

For our measurements, it does not matter whether a rescheduling happens at the end of the service. The actual OS service action is completed before a rescheduling is initiated. Rescheduling is also the task of the scheduler, not of the service. Furthermore, not all OS services require a rescheduling.

The results are detailed in Table I. The times also include the time consumed by the messaging and locking functions. They are stated in clock cycles, and the last column shows the overhead of the message based compared to the lock based approach. As Table I shows, all times of approach 1 are higher. Furthermore, the SC as well as the TC are blocked while the service executes. In contrast, in approach 2 only the SC is occupied by the service, whereas the TC executes T2 without interruption. There is one limitation: the TC cannot access its OS runtime data for the duration t_TC. So, in a dual-core scenario the lock based approach performs better. It has an additional advantage in the case of OS services requiring a rescheduling: if the precondition check of the service fails, the TC is not interrupted, whereas in the message based approach it is always interrupted. This also has an influence on the WCET of the TC.
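The overhead column of Table I can be reproduced from the two cycle counts in each row. The formula below is inferred from the published numbers rather than stated in the text, and the function name overhead_percent and the dictionary TABLE_I are hypothetical:

```python
def overhead_percent(message_based: int, lock_based: int) -> float:
    """Relative overhead of the message based over the lock based approach."""
    return round((message_based / lock_based - 1.0) * 100.0, 1)


# Cycle counts from Table I (dual-core): (message based, lock based)
TABLE_I = {
    "t_SC": (41513, 12499),
    "t_TC": (18728, 8550),
    "t_Service": (42098, 9489),
}

for name, (msg_cycles, lock_cycles) in TABLE_I.items():
    print(f"{name}: {overhead_percent(msg_cycles, lock_cycles)}%")
```

For t_SC, for example, 41513 / 12499 is about 3.321, so the message based approach needs about 232.1% more cycles than the lock based one.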

TABLE II
SERVICE CALL EXECUTION TIMES IN CYCLES ON QUAD-CORE CONFIGURATION

             Message based   Lock based   Overhead
t_SC         56726           18104        213.3%
t_TC         32156           10024        220.8%
t_Service    57176           11081        416.0%

TABLE III
TIME OVERHEAD OF THE DUAL-CORE COMPARED TO THE QUAD-CORE CONFIGURATION

             Message based   Lock based
t_SC         36.6%           44.8%
t_TC         71.7%           17.2%
t_Service    35.8%           16.8%

The second test scenario examines the behavior on a quad-core configuration. Three SCs try to activate the same task on the TC at the same time. In the case of the message based approach, this means that the service call messages of all SCs arrive nearly simultaneously at the TC. We only measure the times of the SC whose message arrives first. The other SCs must wait until the first service call returns, so the first call is not interrupted by the subsequent calls. For the lock based approach, we likewise only measure the times of the SC that obtains the lock first. The results in Table II illustrate that the lock based approach has a clear advantage over the message based approach. Table III shows the relative increase in execution time when moving from the dual-core to the quad-core configuration. The increase is smaller for the lock based approach, except for t_SC.

VII. CONCLUSION

We have discussed two contrasting implementation approaches for cross-core service calls in an AUTOSAR multi-core system. The first approach executes the service on the target core, interrupting the currently executing task in any case. In the second approach, the service call is executed on the invoking core and manipulates the target core's data structures residing in shared memory. The target core is only interrupted if a rescheduling needs to take place. The discussion of the timing behavior shows the feasibility of both approaches from the point of view of a WCET analysis. Measurements performed on a prototype implementation on the parMERASA many-core simulator indicate a significant advantage of the lock based approach.
We therefore conclude that for multi-core processors with up to four cores and a shared memory architecture, a lock based approach offers better performance. It also has advantages regarding WCET analyzability, because no remote blocking times occur when no rescheduling is required. The results of the message based approach rely heavily on the message passing mechanism. Hence, a more optimized message passing mechanism might lead to better results. We will investigate different mechanisms in future work. Furthermore, we will examine the behavior on multi-core processors providing core-local memory. This might also improve the OS service call durations in the message based approach.

ACKNOWLEDGMENT

The research leading to these results has received funding from the European Union Seventh Framework Programme under grant agreement no. 287519 (parMERASA).

REFERENCES

[1] T. Ungerer et al., "parMERASA Multi-Core Execution of Parallelised Hard Real-Time Applications Supporting Analysability," in 16th Euromicro Conference on Digital System Design (DSD). IEEE, Sep. 2013, pp. 363-370.
[2] N. Böhm, D. Lohmann, and W. Schröder-Preikschat, "Multi-Core Processors in the Automotive Domain: An AUTOSAR Case Study," in Proceedings Work-in-Progress Session of the 22nd Euromicro Conference on Real-Time Systems (ECRTS), Jul. 2010, pp. 25-28.
[3] N. Böhm, D. Lohmann, and W. Schröder-Preikschat, "A Comparison of Pragmatic Multi-Core Adaptations of the AUTOSAR System," in 7th annual Workshop on Operating System Platforms for Embedded Real-Time Applications (OSPERT), Jul. 2011, pp. 16-22.
[4] AUTOSAR. (2014, Feb.). [Online]. Available: http://www.autosar.org/
[5] AUTOSAR, "Specification of Operating System (Version 5.2.0)," Automotive Open System Architecture GbR, Tech. Rep., Oct. 2013.
[6] OSEK/VDX, "Operating System (Version 2.2.3)," OSEK Group, Tech. Rep., Feb. 2005.
[7] K. Kavi, R. Akl, and A. Hurson, Real-Time Systems: An Introduction and the State-of-the-Art. John Wiley & Sons, Inc., 2009, ch. Wiley Encyclopedia of Computer Science and Engineering, pp. 2369-2377.
[8] M. Gerdes, "Timing Analysable Synchronisation Techniques for Parallel Programs on Embedded Multi-Cores," Ph.D. dissertation, University of Augsburg, Oct. 2013. [Online]. Available: http://opus.bibliothek.uni-augsburg.de/opus4/frontdoor/index/index/docid/2396
[9] K. Gresser, "An Event Model for Deadline Verification of Hard Real-Time Systems," in Proceedings of the fifth Euromicro Workshop on Real-Time Systems. IEEE, Jun. 1993, pp. 118-123.
[10] C. L. Liu and J. W. Layland, "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment," J. ACM, vol. 20, no. 1, pp. 46-61, Jan. 1973.
[11] R. Rajkumar, L. Sha, and J. Lehoczky, "Real-time synchronization protocols for multiprocessors," in Real-Time Systems Symposium, 1988., Proceedings., Dec. 1988, pp. 259-269.
[12] Soclib, "Soclib: an open platform for virtual prototyping of multiprocessors system on chip," Soclib Consortium and Others, Tech. Rep., 2008. [Online]. Available: http://www.soclib.fr
[13] C. Bradatsch, F. Kluge, and T. Ungerer, "A Cross-Domain System Architecture for Embedded Hard Real-Time Many-Core Systems," in 11th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC-13). IEEE, Nov. 2013.
[14] M. Gerdes, F. Kluge, T. Ungerer, C. Rochange, and P. Sainrat, "Time Analysable Synchronisation Techniques for Parallelised Hard Real-Time Applications," in Design, Automation Test in Europe Conference Exhibition (DATE). IEEE, Mar. 2012, pp. 671-676.
[15] ETAS Group Automotive LifeCycle Solutions. (2009, Sep.) Multi-core Automotive ECUs: Software and Hardware Implications. [Online]. Available: http://www.etas.com/en/downloadcenter/11421.php
[16] AUTOSAR, "Guide to Multi-Core Systems (Version 1.0.0)," Automotive Open System Architecture GbR, Tech. Rep., Jan. 2013.