Adaptive Allocation of Software and Hardware Real-Time Tasks for FPGA-based Embedded Systems


Rodolfo Pellizzoni and Marco Caccamo
Department of Computer Science, University of Illinois at Urbana-Champaign

Abstract

Operating systems for reconfigurable devices enable the development of embedded systems where software tasks, running on a CPU, can coexist with hardware tasks running on a reconfigurable hardware device (FPGA). Furthermore, in such systems relocatable tasks can be migrated from software to hardware and vice versa. The combination of the high performance and predictability of hardware execution with software flexibility makes this architecture especially suitable for implementing high-performance real-time embedded systems. In this work, we first discuss design and scheduling issues for relocatable tasks. We then concentrate on the on-line admission control problem. Task allocation and migration between the CPU and the reconfigurable device is discussed and sufficient feasibility tests are derived. Finally, the effectiveness of our relocation strategy is shown through a series of synthetic simulations.

1 Introduction

As systems-on-chips (SoCs) become more widely used due to their improved performance, both in terms of speed and power consumption, reconfigurable devices, and in particular field-programmable gate arrays (FPGAs), are becoming more and more popular in the development of embedded systems where issues such as short time-to-market and update capabilities after deployment are critical. Recent developments in the field of operating systems for reconfigurable devices (OSRD) [18, 25, 26] enable a highly dynamic use of partially reconfigurable FPGAs, running multiple concurrent circuits (hardware tasks) with full multitasking capabilities. Furthermore, the introduction of embedded devices comprised of an FPGA and one or possibly several CPUs makes it possible to run both software and hardware tasks on the same silicon device, achieving even greater flexibility.
While a lot of work has been done in the design of suitable operating system abstractions and in the development of working prototypes for OSRD, much more remains to be done to obtain a practically usable platform. (This work is supported in part by NSF grant CCR, NSF grant CCR, and NSF CNS.) In particular, the important topic of real-time resource management has received little attention. In this work, we first introduce our vision for a reconfigurable platform that enables relocation (i.e., task migration between software and hardware) as a way to improve the system's ability to cope with dynamic workloads. We then propose a novel allocation and admission control scheme that improves the usage of system resources while preserving all timing constraints. In particular, our main contribution is the development of a relocation scheme, with proven feasibility conditions, that is suitable for real-time applications.

The paper is organized as follows. In Section 2 we introduce our system abstraction, discussing its applicability and practical limitations, and we further describe our resource management scheme. In Sections 3 and 4 we present our solutions to the allocation and relocation problems, providing simulation results in Section 5. Finally, in Section 6 we discuss related work and in Section 7 we provide concluding remarks and future work.

2 System Model

We consider a system comprised of a general-purpose CPU and a partially Reconfigurable Device (RD), together with main memory and I/O devices. Modern devices, like the Xilinx Virtex-II Pro and Virtex-IV families of FPGAs [27], implement all of the above on a single configurable SoC. An OSRD is used to manage the entire system; prototypes have been proposed in [18, 25, 26]. Tasks can be provided to the system in both a software and a hardware configuration.
The software configuration is a traditional software program that runs on the CPU, while the hardware configuration is implemented as a hardware circuit on the RD. Codesign tools can be used to generate both configurations given an initial specification in a high-level language [10]. Since the RD is partially reconfigurable, it is possible to reconfigure a single hardware task at run-time by downloading its configuration data (known as a bitstream) without affecting the remaining hardware configurations. Tasks are relocatable, i.e. they can migrate from software to hardware and vice versa (relocation is implemented by [18]).

Tasks can be dynamically activated and terminated. Furthermore, we assume that tasks are subject to real-time constraints, i.e. once activated they are periodically executed and each task instance must terminate before a given deadline. We believe that this architectural model suits a variety of systems, including micro unmanned aerial vehicles (MAVs) [9], wearable computing [20] and sensor networks for complex tracking and surveillance applications [22], which exhibit characteristics that make solutions based on single or multiple CPUs or on fixed hardware unsuitable:

Dynamic workload with high computational demands: As an example, in applications based on multiple unmanned aerial vehicles coordinating on a global mission, each vehicle must perform multiple concurrent control tasks together with complex multi-vehicle coordination [12], wireless processing and data aggregation, sensor processing, and target tracking and localization. The workload is extremely dynamic, depending on both the vehicle and the mission status. Due to multiple target tracking and multiple surveillance objectives, different tasks contend for system resources. Wearable computing and sensor networks for tracking applications share similar characteristics (dynamic tasks with event-based workload surges); moreover, due to their long deployment time, software updates are often necessary. Because of these intrinsic dynamic aspects, static task allocation on fixed hardware is hardly possible. At the same time, general-purpose processors cannot provide the required level of performance. The proposed model can constitute a valid solution, combining the flexibility of general-purpose systems with the performance of hardware solutions; in particular, the advantages of relocation are thoroughly discussed in [17, 20].
Energy and cost constraints: All the proposed systems are severely energy constrained, since they draw power from either batteries or solar cells; the amount of energy used for computation is a significant percentage of the total energy consumption (see [9] for details on MAVs). FPGAs have been proven to provide better performance and to be more power efficient than both general-purpose and application-specific processors for a variety of applications [20]. Furthermore, all the proposed systems are deployed in large numbers and are therefore cost-sensitive. The use of a single SoC including a high-performance FPGA can easily replace a number of discrete chips, thus lowering board complexity and helping to reduce costs. Both energy and cost constraints imply that the available computational resources must be used efficiently, i.e. by maximizing the amount of computation (tasks) that the system can handle.

Real-time constraints: MAV applications, wearable computing systems and sensor networks can include critical monitoring and targeting tasks for which proven delay bounds must be guaranteed. Furthermore, flight control on MAVs is implemented through hard real-time tasks.

Following the discussion above, the overall goal of our resource management scheme can be stated as: maximize the number of tasks simultaneously running in the system, while guaranteeing all real-time constraints. Therefore, we need to provide an admission control test: every time a task is presented to the system, we run the test to check whether the new task can be admitted while still guaranteeing all the already running tasks. We introduce the details of our management scheme in Section 2.3, after we discuss some key model limitations in Section 2.1 and our task model in Section 2.2. It is important to note that our management goal assumes that, from a computational point of view, the software and hardware configurations of a task provide equal performance.
In other applications (for example, a multimedia terminal) it can be more useful to consider hardware configurations that provide better service than the corresponding software configurations. In this case the overall system goal would be to maximize the quality of service perceived by the user. Although we do not consider such a scenario in this work, we are currently investigating it, and our current results show that the admission control scheme can be applied unchanged to this further case.

2.1 Model Limitations

When considering a practical implementation of the proposed abstraction, several limitations of currently available reconfigurable devices and operating systems need to be taken into account. First of all, hardware configurations must be constrained to rectangular areas. Two area models are employed by current OSRD prototypes. In the slotted area model, the device is divided into a series of slots, each of which has the same dimensions. Each task is partitioned by means of suitable design tools into some number of slots, which can be positioned anywhere on the device. The slotted area model incurs internal fragmentation: some area on the device can be wasted if the area occupied by a task is not a multiple of the defined slot area. The slotted area model is employed by [17, 18]. In the 1D area model, each task occupies a rectangular area on the device. The vertical dimension is fixed and spans the height of the device, while the horizontal dimension can vary. The 1D area model incurs both internal and external fragmentation: the total area available on the device can be greater than the area required by a task, but placing it can be impossible if the free area is divided into smaller unconnected stripes. The 1D area model is employed by [25]. A more complex 2D area model is further discussed in the literature, but to the best of our knowledge, no working OSRD prototype is able to employ such a model. In this work, we will only consider the slotted area model.
Task communication is also a major issue, and different solutions, including buses and packet-switched networks, have been proposed [25, 15]. Communication is particularly critical in the slotted model, since slots pertaining to the same task need strict synchronization. Real-time constraints for bus-based systems are introduced in [6]. Since in this work we are mainly concerned with the management and relocation problem, we will assume that the system provides enough communication resources to meet the needs of all tasks, and we reserve a more thorough analysis for future work.

An important issue regards the reconfiguration capability of the RD. Each time a new task is started on the device, its bitstream needs to be loaded into the device's configuration SRAM through the configuration interface; while a bitstream is being downloaded, the area occupied by it clearly cannot be used (other tasks can still run undisturbed). The load time is proportional to the task area; for modern, large devices, it is not negligible, on the order of tens of milliseconds to reconfigure a task that occupies the entire device [23]. This imposes severe constraints on how hardware tasks are managed. In particular, hardware tasks cannot be scheduled like periodic software tasks. Consider the slotted area model and suppose that hardware tasks are scheduled like periodic tasks, i.e. each hardware task is defined by a period and an execution time and is periodically activated. For simplicity, assume that all tasks occupy only one slot and have the same period p and execution time e. Let T_rec be the time needed to reconfigure the entire device, and t_rec = T_rec/A be the time needed to reconfigure a single slot, where A is the total number of slots. If we serialize slot reconfigurations, while a slot is reconfigured in t_rec time all other tasks can keep running; therefore, we define the task utilization as U = (e + t_rec)/p. It is then easy to see that if we want to keep the device constantly busy, we need to reconfigure the entire device 1/U times every p seconds, thus the following inequality must hold: p ≥ T_rec/U. Supposing a typical time T_rec = 50 ms [23] and U = 1/4, we cannot achieve frequencies greater than 5 Hz. Therefore, in order to reduce the reconfiguration overhead, we will impose that each hardware configuration executes for the entirety of its period, so that no reconfiguration is needed if no new task is activated. This is not a major limitation, since different synthesis parameters in tools permit a tradeoff between occupied area and execution time.
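The bound above can be replayed numerically; the short sketch below just reproduces the paper's arithmetic (T_rec = 50 ms, U = 1/4) to recover the 5 Hz ceiling.

```python
# Replaying the reconfiguration bound p >= T_rec / U with the paper's
# example numbers: T_rec = 50 ms [23] and task utilization U = 1/4.
T_REC = 0.050  # time to reconfigure the entire device (seconds)
U = 0.25       # task utilization (e + t_rec) / p

p_min = T_REC / U    # minimum feasible period: 0.2 s
f_max = 1.0 / p_min  # maximum achievable activation frequency: 5 Hz
print(p_min, f_max)
```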
This means that although the hardware configuration executes longer than the software one, it occupies a much smaller area than the one needed by an equivalent CPU dedicated to running the software configuration.

The last issue regards hardware/software relocation. While suspending and migrating a software task between homogeneous CPUs is relatively easy, since the state of a software task can easily be saved, saving the state of a hardware task is more complex, since it involves saving the state of all its internal registers. While this is not technically impossible [21], it can nevertheless incur an unbearable overhead. Instead, a different approach to relocation will be used. We assume that each task, in either its software or hardware configuration, eventually reaches a point at which the execution of its next periodic instance does not depend on the state of the task after the completion of its previous instance, i.e. no internal state must be preserved between two successive activations. When this point is reached, the task can be relocated at the end of its period. Note, however, that reconfiguration constraints must be taken into account: while we can usually safely assume that starting a task on the CPU takes zero time, this is not true for the RD. Therefore, the OSRD must first begin loading the task bitstream into the RD, which can possibly last for multiple task periods. When the loading operation completes and the end of a period is reached, the software configuration is terminated on the CPU and the hardware configuration is started on the RD. If a stateless point is never reached, we can add some additional logic to the task in order to save and restore state between instance activations. The resulting overhead is still much lower than that of allowing the state of the task to be saved at any time [16].
2.2 Task Model

Each relocatable task τ_i is defined by a period p_i, a relative deadline D_i and two configurations: τ_i^s (software), defined by an execution time e_i, and τ_i^h (hardware), defined by an area a_i. We assume relative deadlines equal to periods, i.e. ∀i, D_i = p_i. The execution time of a software configuration can be either a worst-case parameter (for hard tasks) or an average-case parameter (for soft tasks). Furthermore, let U_i = e_i/p_i be the task's software utilization. Hardware configurations have no associated execution time: each periodic instance (also called a job) of a hardware configuration runs for the entire period. Since hardware configurations cannot be preempted, they always meet their deadlines as long as configuration changes (relocations) are only allowed between jobs. The area parameter depends on the area model of the RD: under the slotted area model, we denote with A the total number of slots on the RD and with a_i the number of slots occupied by τ_i^h. We assume that communication among tasks follows a synchronous dataflow approach, i.e. all inputs to a job are made available by the OSRD before the job starts and all outputs are propagated at the end of the job to subsequent tasks in the data graph. The dataflow model has several advantages. First, it enables transparency between hardware and software configurations, since all data can be held in buffers managed by the operating system. Second, many commercially available languages and tools for hardware specification follow the dataflow model [3, 11]. Third, there is no need to account for blocking time due to critical sections during the execution of a task. Finally, the careful placement of buffers takes care of delays in data propagation along the communication infrastructure; in particular, precedence constraints among successive tasks can be removed by buffering one full task period.
Software tasks can be scheduled on the CPU using any real-time scheduler with proven schedulability bounds and suitable isolation mechanisms. In this paper we will consider the EDF scheduler [14] in conjunction with the well-known Constant Bandwidth Server (CBS) [1]. The CBS provides isolation between hard and soft tasks, so that all jobs of hard tasks are proven to complete within their deadlines if a feasibility condition is met. For a fixed task set T_S of software tasks, the following is a necessary and sufficient feasibility condition, provided that kernel overhead is included in task execution times:

U = Σ_{τ_i ∈ T_S} U_i ≤ 1,   (1)

where U is known as the total software utilization.
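Condition (1) amounts to a one-line utilization check. A minimal sketch (the task parameters below are made up for illustration):

```python
def edf_feasible(task_set):
    """Eq. (1): EDF (with CBS isolation) is feasible iff the total
    software utilization does not exceed 1.

    task_set: iterable of (e_i, p_i) pairs for the software configurations.
    """
    return sum(e / p for e, p in task_set) <= 1.0

# Hypothetical software task set: (execution time, period) pairs.
print(edf_feasible([(1.0, 4.0), (2.0, 8.0), (1.0, 5.0)]))  # U = 0.7  -> True
print(edf_feasible([(3.0, 4.0), (2.0, 5.0)]))              # U = 1.15 -> False
```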

τ_i — i-th task
τ_i^s — i-th task, software configuration
τ_i^h — i-th task, hardware configuration
p_i — task period
D_i — task relative deadline
e_i — software configuration execution time
U_i — task utilization
a_i — hardware configuration area
A — RD area
T — task set
T_S — set of tasks in software configuration
T_H — set of tasks in hardware configuration
A_T = {T_S, T_H} — allocation for task set T
U_T = Σ_{τ_i ∈ T} U_i — total utilization of tasks in T
U_{A_T} = U_{T_S} — total utilization of tasks in software configuration
a_T = Σ_{τ_i ∈ T} a_i — total area of tasks in T

Table 1. System notation

In the same way, in order to be schedulable on the RD, hardware configurations must meet placement constraints.

Definition 1 (Slotted feasible placement) For the slotted area model, given a set T_H of hardware configurations scheduled on the RD, we say that their placement is feasible iff:

Σ_{τ_i ∈ T_H} a_i ≤ A.   (2)

Tasks can dynamically join and leave the system. The activation time of a task corresponds to the activation of its first job. The termination time of a task corresponds to the deadline of its last job. At any time t, T(t) is the set of currently active tasks. Furthermore, let T_S(t) be the set of software tasks running on the CPU at time t and T_H(t) be the set of hardware tasks placed on the RD; then A_T(t) = {T_S(t), T_H(t)} is the allocation for T(t) iff T_S(t) ∪ T_H(t) = T(t). Hence, the allocation of a task set defines how tasks are partitioned between the CPU and the RD. T is said to be feasible iff each job of the tasks in T gets executed on either the CPU or the RD and A_T(t) results in both a feasible schedule and a feasible placement. Table 1 summarizes the notation used throughout this work.

2.3 Management Scheme

The following overall management strategy will be used. When a task or a group of tasks arrives in the system, an admission test is run to determine whether it can be admitted. If the test succeeds, then the task is immediately activated on the CPU; in fact, loading a hardware configuration on the RD would delay the activation of the task.
After the new task is activated on the CPU, or whenever a task terminates, the system performs a relocation phase. The goal of the relocation phase is to relocate tasks, including the newly admitted one, in order to minimize the total software utilization while preserving all feasibility constraints. We feel that this optimization objective is sensible for multiple reasons:

- Since newly activated tasks are admitted on the CPU to avoid the RD configuration overhead, minimizing the CPU utilization maximizes the probability of passing the admission test.
- Although we only consider relocatable tasks in this work, real systems would probably also comprise software-only tasks that cannot be placed on the RD.
- Although we are only concerned with the admission control problem in this work, we can envision situations in which hardware configurations provide services with better performance and lower power consumption compared to the corresponding software configurations.
- The OS needs to run both the admission test and further computations to drive the relocation phase and to load hardware configurations. This added overhead can be considered as an additional utilization term on the CPU.

We split the problem as follows. In Section 3, we discuss the problem of finding an optimal allocation for a task set subject to the slotted area model, assuming that no task is already running and therefore no relocation is required. In the subsequent Section 4 we show how a pseudo-optimal solution can be used to drive the relocation phase. Due to space constraints, theorem proofs are not reported; they can be found in [19].

3 Allocation Problem

Given a task set T of relocatable tasks, the optimal allocation problem consists in determining the feasible allocation A_T that minimizes the total software utilization on the CPU, supposing that no task is already running in the system. The problem can be stated as an integer linear programming optimization problem.
Let us introduce, for each task τ_i in T, two indicator variables r_i and c_i: r_i is set to one if τ_i is placed on the RD, while c_i is set to one if the task is scheduled on the CPU. The optimal allocation problem can then be represented as follows:

Definition 2 (ILP_ALLOC) Minimize Σ_{τ_i ∈ T} c_i U_i, subject to the following constraints and the restriction that the variables r_i, c_i take integer values only:

∀τ_i ∈ T, c_i + r_i = 1   (3)
Σ_{τ_i ∈ T} r_i a_i ≤ A   (4)
∀τ_i ∈ T, 0 ≤ r_i ≤ 1   (5)
∀τ_i ∈ T, 0 ≤ c_i ≤ 1   (6)

Lemma 1 ([19]) Any optimal solution to ILP_ALLOC is an optimal solution for the allocation problem under the slotted area model, supposing that no task is already running.

Now note that since ∀τ_i, r_i + c_i = 1, min Σ c_i U_i = min Σ (1 − r_i) U_i = Σ U_i − max Σ r_i U_i. Therefore, the ILP_ALLOC problem can be restated as the following equivalent ILP_KNAP problem:

Definition 3 (ILP_KNAP) Maximize Σ_{τ_i ∈ T} r_i U_i, subject to the following constraints and the restriction that the variables r_i take integer values only:

Σ_{τ_i ∈ T} r_i a_i ≤ A   (7)
∀τ_i ∈ T, 0 ≤ r_i ≤ 1   (8)

Problem ILP_KNAP is in the form of the well-known 0-1 KNAPSACK problem [13], which is known to be NP-hard in the weak sense. This means that pseudo-polynomial exact algorithms exist for the problem. However, since we are required to solve the allocation problem at run-time, even pseudo-polynomial algorithms can be excessively costly. Furthermore, as we will discuss in Section 4.1, using an optimal algorithm does not lead to a significant increase in performance. We will therefore use the simple greedy algorithm for 0-1 KNAPSACK to obtain a pseudo-optimal solution. The greedy algorithm works as follows, where R is a helper variable:

1. Order all tasks in decreasing order of U_i/a_i.
2. Assign R ← A.
3. Starting from the first task τ_i in the defined order: if R ≥ a_i, then set r_i ← 1 and R ← R − a_i; else set r_i ← 0. Proceed to the next task.

Since we need to order the tasks, the complexity of the algorithm is O(N log N), where N is the number of tasks in T. In order to characterize the performance of the algorithm, let LP_KNAP be the linear relaxation of ILP_KNAP (obtained by removing the constraint that the r_i be integer), let OPT(ILP_KNAP) and OPT(LP_KNAP) be the optimal solutions to the ILP_KNAP and LP_KNAP problems respectively, and let GREEDY(ILP_KNAP) be the greedy solution to ILP_KNAP. Furthermore, let τ_c be the critical task, i.e. the first task, in decreasing order of U_i/a_i, such that r_c = 0 in the greedy solution.
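The greedy steps above can be sketched as follows (task names and parameters are illustrative, not taken from the paper):

```python
def greedy_alloc(tasks, A):
    """Greedy 0-1 KNAPSACK allocation: place tasks on the RD in decreasing
    order of utilization density U_i / a_i while enough slots remain.

    tasks: list of (name, U_i, a_i) tuples; A: total number of RD slots.
    Returns (tasks on RD, tasks on CPU, resulting CPU utilization)."""
    on_rd, on_cpu, R = [], [], A
    for name, u, a in sorted(tasks, key=lambda t: t[1] / t[2], reverse=True):
        if a <= R:            # r_i <- 1: the task fits on the RD
            on_rd.append(name)
            R -= a
        else:                 # r_i <- 0: the task stays on the CPU
            on_cpu.append(name)
    cpu_util = sum(u for name, u, _ in tasks if name in on_cpu)
    return on_rd, on_cpu, cpu_util

# Hypothetical task set on an RD with A = 3 slots.
tasks = [("t1", 0.4, 2), ("t2", 0.3, 1), ("t3", 0.2, 2), ("t4", 0.1, 1)]
print(greedy_alloc(tasks, A=3))  # t2 (density 0.3) and t1 (0.2) fill the RD
```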
It can be seen that the only difference between the OPT(LP_KNAP) and GREEDY(ILP_KNAP) solutions is that while τ_c is partially placed on the RD in the optimal linear solution, the greedy algorithm places it entirely on the CPU. Therefore, since OPT(LP_KNAP) ≥ OPT(ILP_KNAP) ≥ GREEDY(ILP_KNAP), the following inequalities hold:

GREEDY(ILP_KNAP) > OPT(LP_KNAP) − U_c   (9)
GREEDY(ILP_KNAP) > OPT(ILP_KNAP) − U_c   (10)

Given GREEDY(ILP_KNAP), the total CPU utilization U can be computed as Σ_{τ_i ∈ T} (1 − r_i) U_i = Σ_{τ_i ∈ T} U_i − GREEDY(ILP_KNAP). The task set can then be admitted if U ≤ 1. However, if some tasks are already running in the system, then a relocation phase is needed to reach the newly computed allocation A_T. The following section details the relocation phase and how to combine it with the admission test.

4 Relocation Phase

In this section we discuss how task relocation can be performed without violating any feasibility constraints under the slotted area model. We consider a general relocation problem of the following type: given a task set T, relocatable task sets T_{S→H} ⊆ T_S and T_{H→S} ⊆ T_H, and an allocation A_T = {T_S, T_H}, we want to relocate tasks in order to obtain a new allocation A'_T = {T'_S, T'_H}. Hence, T_S ∩ T'_S and T_H ∩ T'_H represent the sets of tasks that are kept on the CPU and on the RD respectively, while T_{S→H} and T_{H→S} represent the sets of tasks that are relocated from the CPU to the RD and from the RD to the CPU respectively. RD constraints must be considered when performing relocation. Consider a simple example in which T_{H→S} = {τ_i}, T_{S→H} = {τ_j}, τ_i and τ_j have the same area, and the slotted area model is used. Also suppose that the RD area is fully occupied. Then, in order to perform relocation, we really need to swap the two tasks between CPU and RD. However, tasks can only be relocated at the beginning of a job, and there may be no time instant at which two jobs of τ_i, τ_j start simultaneously. Furthermore, reconfiguring the device takes time.
Therefore, the only feasible approach is as follows: first, at the beginning of some job, τ_i^s is activated on the CPU while τ_i^h is suspended. Then, the bitstream of τ_j^h is loaded into the device. Finally, at the beginning of some job, τ_j^s is terminated and τ_j^h is started. Note that for some time both tasks' software configurations are running on the CPU. This implies that relocation incurs an overhead in terms of CPU utilization, in the sense that in order to perform relocation in a feasible way we need to leave some free computational power on the CPU so that an additional software configuration can be feasibly scheduled. Feasibility constraints for software tasks are typically expressed for fixed task sets, while in our case the set of active software configurations T_S running on the CPU frequently changes. However, it can be trivially proven that, if a software configuration is considered to be active on the CPU until the deadline of its last software job, then the classic EDF utilization bound:

∀t ≥ 0, U_{T_S(t)} ≤ 1   (11)

can still be applied. Relocating tasks that have different areas is more difficult. We could always perform relocation by first activating on the CPU the software configurations of all tasks

executed on the RD and then reconfiguring the whole RD, but this is unlikely to be feasible without violating software schedulability. We will therefore use the following idea: first, we partition both T_{S→H} and T_{H→S} into an equal number of sets of tasks that we call swapping groups, and we further create pairs of such swapping groups. Then, for each swapping pair, we perform relocation in a way similar to the single-task case described before: first, we activate the software configurations of all tasks in the pair's swapping group from T_{H→S}; then, we load and activate all hardware configurations in the pair's swapping group from T_{S→H}. The key idea is that we build the swapping groups in such a way as to minimize the CPU overhead required by the relocation process.

In order to determine the swapping groups in a way that is consistent and simple enough to be applicable at run-time, we impose further constraints on task areas. In particular, a task area can only be chosen from a defined set of K areas {a^1, ..., a^K}, such that the area of the device is a multiple of a^K and, ∀1 < k ≤ K, a^k is a multiple of a^{k−1}. For example, for a typical value of A and K = 6, {a^1 = 1, a^2 = 3, a^3 = 6, a^4 = 12, a^5 = 24, a^6 = 48} are possible values. While this may seem a major limitation, the system designer is free to choose K and the value set based on the tasks in the system; furthermore, following the dataflow model, each task can be decomposed into possibly several subtasks to better fit the area constraints (tools often provide functionality to partition logical functions in hardware). Note that in both allocations A_T and A'_T the RD may not be fully utilized, i.e. some space can be unallocated. Since this complicates the analysis, we solve the problem by introducing the concept of a placeholder task. A placeholder task τ_i is by definition a task with a_i = 1, U_i = 0 and no associated bitstream/code. A placeholder task never executes: it is merely used to mark a certain area on the RD as occupied.
We can then define new task sets T̄_{S→H} and T̄_{H→S} as follows:

Definition 4 Given the task set T_{S→H} (respectively T_{H→S}), T̄_{S→H} (T̄_{H→S}) is the task set comprised of all the tasks in T_{S→H} (T_{H→S}) plus A − a_{T_{S→H}} − a_{T_H ∩ T'_H} (A − a_{T_{H→S}} − a_{T_H ∩ T'_H}) placeholder tasks.

Lemma 2 ([19]) For each allocation A'_T = {T'_S, T'_H}:

a_{T̄_{S→H}} = a_{T̄_{H→S}}   (12)

Note that since the allocation algorithm tries to place as many tasks as possible on the RD, the number of placeholder tasks is generally small. We can now define our swapping groups as follows:

Lemma 3 ([19]) Let a_max = max_{τ_i} {a_i}. Then each of T̄_{S→H}, T̄_{H→S} can be partitioned into M = ⌊a_{T̄_{S→H}}/a_max⌋ sets {S^1_{S→H}, ..., S^M_{S→H}}, {S^1_{H→S}, ..., S^M_{H→S}} of area a_max and at most one leftover set S^{M+1}_{S→H}, S^{M+1}_{H→S} of area a_{T̄_{S→H}} mod a_max; furthermore, if a_min = min_{τ_i} {a_i}, then a_{T̄_{S→H}} mod a_max ≤ a_max − a_min and a_{T̄_{H→S}} mod a_max ≤ a_max − a_min.

Note that the above lemma also suggests a constructive way to build the swapping groups; hence, the algorithm GROUP_PARTITION can be defined as follows: starting from the smallest tasks, at each step we group them so as to form tasks of the immediately greater size, setting the leftover aside. We continue grouping until we reach the maximum area, and we combine all leftovers to produce the unique leftover group. Thanks to the introduction of placeholder tasks, Lemmas 2 and 3 ensure that the resulting groups for T̄_{S→H} and T̄_{H→S} are of the same size. Note that GROUP_PARTITION has a complexity of O(N^2), where N is the total number of tasks in the set we are partitioning: after we sort all tasks by area in O(N log N), at each step the number of newly produced groups is at most half the number of tasks of that step, hence the quadratic complexity.

Once the swapping groups have been created, we need to define swapping pairs. The two leftover groups S^{M+1}_{S→H}, S^{M+1}_{H→S} constitute a pair. Furthermore, suppose that the M swapping groups S^1_{S→H}, ..., S^M_{S→H} of T̄_{S→H} are arranged such that ∀k, 1 ≤ k < M: U_{S^k_{S→H}} ≥ U_{S^{k+1}_{S→H}}, and similarly that the M swapping groups S^1_{H→S}, ..., S^M_{H→S} of T̄_{H→S} are arranged such that ∀k, 1 ≤ k < M: U_{S^k_{H→S}} ≤ U_{S^{k+1}_{H→S}}.
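One compact way to realize a GROUP_PARTITION-style grouping is first-fit over tasks sorted by decreasing area: because every task area divides a_max (the areas form a divisibility chain), each group fills completely except at most one, which becomes the leftover group. A sketch under that assumption (task names are illustrative):

```python
def group_partition(tasks, a_max):
    """Partition tasks into groups of area a_max plus at most one leftover.

    Simplified sketch of the paper's GROUP_PARTITION: since every task
    area divides a_max (divisibility-chain areas), first-fit over tasks
    sorted by decreasing area leaves at most one partially filled group.
    tasks: list of (name, area) tuples."""
    groups = []  # each group: {"members": [...], "free": remaining area}
    for name, area in sorted(tasks, key=lambda t: -t[1]):
        for g in groups:
            if g["free"] >= area:
                g["members"].append(name)
                g["free"] -= area
                break
        else:  # no open group can host the task: open a new one
            groups.append({"members": [name], "free": a_max - area})
    full = [g["members"] for g in groups if g["free"] == 0]
    leftover = [g["members"] for g in groups if g["free"] > 0]
    return full, leftover

# Areas drawn from the chain {1, 3}; a_max = 3, total area 5 -> one leftover.
print(group_partition([("t9", 3), ("t10", 1), ("t11", 1)], a_max=3))
```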
We can then form the pairs {P_1 = (S^1_{S→H}, S^1_{H→S}), ..., P_M = (S^M_{S→H}, S^M_{H→S})} and swap the groups one pair at a time, starting from P_1 up to P_M. The following theorems express sufficient feasibility conditions for relocation.

Theorem 4 ([19]) Under the slotted area model, consider allocations A_T = {T_S, T_H}, A'_T = {T'_S, T'_H} and the associated swapping pairs {P_1, ..., P_M}, with no leftover group. Then the following are sufficient feasibility conditions to relocate A_T to A'_T:

1. U_{A_T} + U_{S^1_{H→S}} ≤ 1;
2. and U_{A'_T} + U_{S^M_{S→H}} ≤ 1.

Theorem 5 ([19]) Under the slotted area model, consider allocations A_T = {T_S, T_H}, A'_T = {T'_S, T'_H}, the associated swapping pairs {P_1, ..., P_M} and P_{M+1} = (S^{M+1}_{S→H}, S^{M+1}_{H→S}); furthermore, suppose that P_{M+1} is swapped before P_1. Then the following are sufficient feasibility conditions to relocate A_T to A'_T:

1. U_{A_T} + U_{S^{M+1}_{H→S}} ≤ 1;
2. and U_{A_T} + U_{S^{M+1}_{H→S}} − U_{S^{M+1}_{S→H}} + U_{S^1_{H→S}} ≤ 1;
3. and U_{A'_T} + U_{S^M_{S→H}} ≤ 1.
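The conditions of Theorems 4 and 5 can be checked directly from the group utilizations. A sketch (parameter names are ours; the leftover utilizations default to zero, which collapses the test to the two conditions of Theorem 4):

```python
def relocation_feasible(U_A, U_A_new, U_sh, U_hs,
                        U_sh_left=0.0, U_hs_left=0.0):
    """Check the sufficient relocation conditions of Theorems 4 and 5.

    U_A, U_A_new: total software utilization of the old / new allocation.
    U_sh: utilizations of groups S^1..S^M of the S->H set (decreasing).
    U_hs: utilizations of groups S^1..S^M of the H->S set (increasing).
    U_sh_left, U_hs_left: leftover-group utilizations (0 = no leftover)."""
    # Condition 1: the leftover pair (if any) is swapped first.
    cond1 = U_A + U_hs_left <= 1.0
    # Condition 2: transient CPU load while the first regular pair swaps.
    cond2 = U_A + U_hs_left - U_sh_left + U_hs[0] <= 1.0
    # Condition 3: transient CPU load while the last regular pair swaps.
    cond3 = U_A_new + U_sh[-1] <= 1.0
    return cond1 and cond2 and cond3

# Hypothetical utilizations: a feasible relocation with two regular pairs.
print(relocation_feasible(U_A=0.8, U_A_new=0.5,
                          U_sh=[0.4, 0.3], U_hs=[0.05, 0.1]))  # True
```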

Note that Theorem 5 basically means that the feasibility of the relocation phase only depends on the utilization of the leftover groups and on the smallest utilization of any two other groups in T̄_{S→H}, T̄_{H→S} (including the placeholder tasks). Furthermore, note that Theorem 4 does not depend on the assumption U_{A'_T} ≤ U_{A_T}, which makes it applicable to any kind of relocation. Theorems 4 and 5 rely on the assumption that the tasks are already partitioned into swapping groups. However, since we can choose how to create the partition, we can maximize the probability of a relocation being feasible by using the following guidelines:

1. minimize U_{S^{M+1}_{H→S}} and U_{S^1_{H→S}};
2. minimize U_{S^M_{S→H}};
3. maximize U_{S^{M+1}_{S→H}}.

The swapping groups can then be created according to the above guidelines with the algorithm GROUP_PARTITION, by simply ordering all tasks by area and utilization. The complexity remains bounded by O(N^2), since at each step we can merge the newly created groups while preserving the ordering in linear time.

Table 2. Example: task set (areas a_i and utilizations U_i of tasks τ_1 through τ_10)

4.1 Admission Control

Using the feasibility tests from Theorems 4 and 5, an admission test can be run along the lines introduced in Section 3. Provided that all new tasks can be initially allocated on the CPU, we first run the GREEDY allocation algorithm to obtain a new pseudo-optimal allocation. Then, we create the swapping pairs. Finally, we check the feasibility conditions. If they hold, then relocation is possible. If not, we can choose between accepting the modified task set without relocation or rejecting the modification. The choice can depend on the criticality of the newly arrived tasks, although accepting them without performing relocation may clearly compromise future system performance in terms of admitted tasks. Note that we do not expect an optimal allocation algorithm to perform any better than the greedy solution.
To understand the reason, consider that the only difference between OPT (LP KNAP) and GREEDY (ILP KNAP), as detailed in Section 3, lies in the allocation of the critical task τ_c. Therefore, while the greedy solution has a higher CPU utilization, it also has either free area on the RD or area occupied by tasks with lower utilization. We can thus expect that the minimum-utilization swapping group has lower utilization. As long as the critical task area a_c is not greater than a_max, these two factors typically balance out in the feasibility conditions.

Also note that our relocation scheme can take a non-negligible time to reconfigure the entire system in the presence of many swapping groups. This is not a main concern in multimedia systems, where task arrivals and terminations are triggered by user interaction, but could be a problem in systems with short interarrival times, since a new task could arrive before relocation is finished. We plan to address this problem as part of our future work, modifying our scheme to allow tasks to be admitted even during a relocation phase.

A final note regards the management of software-only tasks, i.e. tasks that can only be scheduled on the CPU. Such tasks can be trivially included in our framework by simply forcing them to be allocated in T_{S→S}.

[Figure 1. Example relocation: (a) initial allocation, (b) intermediate allocation, (c) final allocation.]

4.2 Example

In this section we provide a comprehensive example of the admission control and relocation procedure. We assume an RD area A = 9 slots, task areas in {1, 3} and an optimal allocation algorithm. The task set is reported in Table 2. Task parameters were chosen to keep the example simple and easily understandable; they should not be considered as real task cases. The initial situation is depicted in Figure 1(a), where the width of each task on the RD represents the number of slots occupied by its hardware configuration. Tasks τ_1 through τ_7 are running on the RD while tasks τ_8, τ_9 are running on the

CPU. Note that since τ_8, τ_9 have the lowest U_i/a_i ratio among the running tasks and there is no free space on the RD, the allocation is optimal. This situation changes when τ_6 and τ_7 terminate and, simultaneously, a new task τ_10 arrives in the system. Since U_8 + U_9 + U_10 ≤ 1, the task can be safely admitted on the CPU, producing the allocation shown in Figure 1(b). A new allocation is then computed for the task set T = {τ_1, τ_2, τ_3, τ_4, τ_5, τ_8, τ_9, τ_10}. It is easy to see that in the optimal solution τ_1, τ_2, τ_3, τ_9, τ_10 are allocated on the RD and τ_4, τ_5, τ_8 on the CPU. Therefore we can derive the following sets:

T_{H→H} = {τ_1, τ_2, τ_3}, T_{H→S} = {τ_4, τ_5}, T_{S→S} = {τ_8}, T_{S→H} = {τ_9, τ_10}.

Note that since a_{H→H} + a_{S→H} = 7, we add two placeholder tasks τ_11, τ_12 with area 1 and utilization 0 to T_{H→S} (no placeholder is necessary for T_{S→H}). We can then run the GROUP PARTITION algorithm, producing the following swapping groups:

S_2 = {τ_11}, S_1 = {τ_12, τ_4, τ_5}, S'_2 = {τ_10}, S'_1 = {τ_9}.

Once swapping groups have been defined, we check the feasibility conditions:
1. U_8 + U_9 + U_10 + U_11 ≤ 1;
2. U_8 + U_9 + U_10 + U_11 − U_10 + U_12 + U_4 + U_5 ≤ 1;
3. U_8 + U_4 + U_5 + U_9 ≤ 1.

We can finally relocate tasks as described in Section 4 by first swapping S_2 with S'_2 and then swapping S_1 with S'_1. Since all feasibility conditions hold, according to Theorem 5 no task misses its deadline. The final resulting allocation is shown in Figure 1(c).
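The example's optimal allocator fills the RD with the tasks of highest utilization density U_i/a_i. A toy density-ordered heuristic in that spirit (our own sketch, not the paper's GREEDY algorithm; task tuples and names are assumptions) can be written as:

```python
def greedy_allocation(tasks, rd_area):
    # Tasks are (name, area, utilization) tuples. Place the highest
    # U_i/a_i tasks on the RD while area remains; everything else runs
    # on the CPU. Density-ordered knapsack heuristic.
    by_density = sorted(tasks, key=lambda t: t[2] / t[1], reverse=True)
    rd, cpu, free = [], [], rd_area
    for t in by_density:
        if t[1] <= free:
            rd.append(t)
            free -= t[1]
        else:
            cpu.append(t)
    # The CPU side is schedulable under EDF if total utilization <= 1.
    cpu_feasible = sum(u for (_, _, u) in cpu) <= 1.0
    return rd, cpu, cpu_feasible
```

As in the example, the tasks left on the CPU are exactly those with the lowest utilization density among the admitted set.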
It is worth noticing that, to the best of our knowledge, no better test exists in the literature to perform admission control for the described system; our comparison choice is in fact a trivial extension of the reference algorithm shown in [23]. For each test, we have simulated the arrival of 100,000 synthetic tasks, and determined the rejection rate in terms of the percentage of the area of rejected tasks with respect to all tasks arrived in the system. For each synthetic task τ_i, the area a_i is randomly chosen to account for tasks with very different computational requirements, and the utilization U_i is randomly generated with mean proportional to a_i. We define the load L of the system in a given interval of time [t_1, t_2] as the load offered by all tasks activated in [t_1, t_2]:

L([t_1, t_2]) = ( Σ_{τ_i activated in [t_1, t_2]} U_i · Δ ) / (t_2 − t_1)    (13)

where Δ is the average time that a task remains in the system; therefore, U_i · Δ is the mean execution time required by all jobs of τ_i. Δ is computed, and task terminations are randomly generated, such that the average load is equal to a given value L. Note that since the mean value of U_i is proportional to a_i, we could redefine the offered load in terms of a_i/A instead of U_i, as in [24]. Furthermore, since the system is comprised of both an RD and a CPU, a load L ≤ 2 should lead to no rejection (in practice, since task arrivals and terminations constitute a random process, rejections happen even for L ≤ 2). Figures 2(a), 2(b), 2(c) and 2(d) show a subset of the results for an RD area of 192 slots (Xilinx XC4VFX140), with L ranging from 1 to 4; a more comprehensive set of graphs can be found in [19]. In Figures 2(a) and 2(b) the task area is chosen in the set {1, 2, 4, 8, 16, 32}, with smaller areas being extracted with higher probability than bigger ones (the average task area is 3.05). In Figure 2(c) the area is chosen in the set {1, 2, 4, 8, 16, 32, 64}, with each element being given equal probability (the average task area is 18.14).
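Eq. (13) can be computed directly from a trace of activations. In the sketch below, the trace representation and the `mean_residence` parameter (the average residence time Δ, which the simulator in the text derives from the target load L) are our own assumptions:

```python
def offered_load(tasks, t1, t2, mean_residence):
    # Eq. (13): tasks is a list of (activation_time, utilization)
    # pairs; sum the utilizations of tasks activated in [t1, t2],
    # scale by the average residence time Delta, and normalize by
    # the interval length.
    total_u = sum(u for (act, u) in tasks if t1 <= act <= t2)
    return total_u * mean_residence / (t2 - t1)
```

For instance, two tasks of utilization 0.5 activated within a 10-unit window, each staying 10 units on average, offer a load of 1.0.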
Task utilization is randomly generated with standard deviation a_i/A in Figures 2(a) and 2(c), and 0.5·a_i/A in Figure 2(b). Figure 2(d) uses the same parameters as 2(a), but results are shown in terms of the percentage of rejected tasks instead of rejected area. In all figures, relocation and relocation optim refer to our new admission test with relocation, while reference is the reference test. The new allocation is computed using the GREEDY algorithm in relocation, while an optimal dynamic programming algorithm [13] is used in relocation optim. The average time needed to perform a single admission test is 243 µs for relocation and 151 ms for relocation optim on our test system (a Pentium IV at 2.8 GHz). Note that the graphs do not saturate, since we plot them as a function of the load offered to the system and not of the load of accepted tasks. Results in terms of rejected tasks and rejected task area show similar trends; the percentage of rejected area is higher since tasks with bigger area are clearly more likely to be rejected. In all tests relocation clearly outperforms reference, rejecting less than one third of the tasks/area with respect to reference in the most favorable case of Figure 2(a). The performance of relocation clearly depends on both the average task area and the utilization standard deviation, with relocation performing better in the presence of small tasks with big differences between area and utilization; note that since all systems proposed in Section 1 include different kinds of activities, we expect that some tasks, like signal processing, can be much better optimized for hardware execution than others. The performance trend is expected, since the optimized allocation algorithm leads to better results as the utilization standard deviation increases. In the same way, smaller tasks lead to smaller swapping groups

with lower total utilization; therefore the reallocation phase is more likely to be feasibly executable. Finally, note that relocation optim does not provide any performance improvement over relocation, as predicted in Section 4.1. Since the run-time overhead of relocation optim is about three orders of magnitude greater than that of relocation, using the simpler GREEDY algorithm in the allocation phase is the best choice.

[Figure 2. Experimental results: percentage of rejected area (rejected tasks in (d)) versus load, for reference, relocation and relocation optim. (a) Max area 32, standard deviation a_i/A. (b) Max area 32, standard deviation 0.5·a_i/A. (c) Max area 64, standard deviation a_i/A. (d) Max area 32, standard deviation a_i/A.]

6 Related Work

To the best of our knowledge, no previous work on combined scheduling of software/hardware tasks has been published. The closest related work is presented in [24, 23], dealing with the admission control problem for real-time hardware tasks, and in [7], dealing with scheduling algorithms for periodic hardware tasks. However, only hard aperiodic tasks are considered in [24, 23], and furthermore none of the mentioned works takes configuration overheads into account. The problem of on-line task allocation for non-real-time hardware tasks, with the goal of minimizing task activation delay, has received more attention [2], including schemes that relocate (i.e. move) tasks on the RD, mainly in the interest of avoiding external fragmentation in the 1D and 2D area models [4, 5]. However, whenever relocation is performed, tasks are assumed to be suspendable at any time, which can be difficult to achieve, and possibly for significant periods of time, which is unacceptable for real-time execution.
In [8] a technique to relocate tasks on FPGAs without suspending them is introduced, but there is no analysis of the overhead in terms of the area that needs to be left free on the RD to relocate a task.

7 Conclusions and Future Work

In this work, we have first proposed a pseudo-optimal allocation algorithm and a relocation scheme for relocatable tasks. We have then derived feasibility conditions for both software and hardware scheduling, and we have defined an admission control test based on such conditions. Finally, the performance benefits of relocation have been measured through a series of synthetic simulations. Although we only considered systems comprised of a single CPU, we believe that our scheme can be easily adapted to multi-CPU systems by modifying the allocation algorithm. As future work, we first plan to extend our analysis to the 1D and possibly 2D area models. Since such models are

affected by external fragmentation, a suitable defragmentation scheme is needed to place tasks in a pseudo-optimal way. However, we believe that defragmentation can be easily accounted for in the schedulability analysis. Finally, as a long-term objective, we intend to develop an implementation of the proposed techniques on a working OSRD prototype.

References

[1] L. Abeni and G. Buttazzo. Integrating multimedia applications in hard real-time systems. In Proceedings of the 19th IEEE Real-Time Systems Symposium, Madrid, Spain, December 1998.
[2] K. Bazargan, R. Kastner, and M. Sarrafzadeh. Fast template placement for reconfigurable computing systems. IEEE Design and Test of Computers, 17(1):68–83, 2000.
[3] G. Berry, S. Moisan, and J.-P. Rigault. Esterel: Towards a synchronous and semantically sound high-level language for real-time applications. In Proc. IEEE Real-Time Systems Symposium, pages 30–40, Arlington, Virginia, 1983.
[4] G. Brebner and O. Diessel. Chip-based reconfigurable task management. In Proceedings of the 11th International Conference on Field-Programmable Logic and Applications (FPL), 2001.
[5] K. Compton, Z. Li, J. Cooley, S. Knol, and S. Hauck. Configuration relocation and defragmentation for run-time reconfigurable computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 10(3):209–220, June 2002.
[6] K. Danne and M. Platzner. Memory-demanding periodic real-time applications on FPGA computers. In Work-in-Progress Session of the 17th Euromicro Conference on Real-Time Systems (ECRTS), Palma de Mallorca, Spain, July 2005.
[7] K. Danne and M. Platzner. Periodic real-time scheduling for FPGA computers. In Third IEEE Int'l Workshop on Intelligent Solutions in Embedded Systems (WISES), May 2005.
[8] M. Gericota, G. Alves, M. Silva, and J. Ferreira. Online defragmentation for run-time partially reconfigurable FPGAs. In Proceedings of the 12th International Conference on Field-Programmable Logic and Applications (FPL), Montpellier, France, September 2002.
[9] J. M.
Grasmeyer and M. T. Keennon. Development of the Black Widow micro air vehicle. In Proceedings of the AIAA Aerospace Sciences Meeting, 2001.
[10] G. Vanmeerbeeck, P. Schaumont, S. Vernalde, M. Engels, and I. Bolsens. Hardware/software partitioning for embedded systems in OCAPI-xl. In CODES'01, Copenhagen, Denmark, April 2001.
[11] N. Halbwachs, P. Caspi, and D. Pilaud. The synchronous dataflow programming language Lustre. In Another Look at Real Time Programming, Proceedings of the IEEE, Special Issue, September 1991.
[12] A. Howard, M. J. Matarić, and G. S. Sukhatme. An incremental self-deployment algorithm for mobile sensor networks. Autonomous Robots, 13(2), 2002.
[13] H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack Problems. Springer, 2004.
[14] C. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the Association for Computing Machinery, 20(1), 1973.
[15] T. Marescaux, A. Bartic, D. Verkest, S. Vernalde, and R. Lauwereins. Interconnection networks enable fine-grain dynamic multi-tasking on FPGAs. In Proc. of the 12th International Conference on Field-Programmable Logic and Applications (FPL), Montpellier, France, September 2002.
[16] J.-Y. Mignolet, V. Nollet, P. Coene, D. Verkest, S. Vernalde, and R. Lauwereins. Infrastructure for design and management of relocatable tasks in a heterogeneous reconfigurable system-on-chip. In Proceedings of the DATE'03 Conference, Munich, Germany, March 2003.
[17] J.-Y. Mignolet, S. Vernalde, D. Verkest, and R. Lauwereins. Enabling hardware-software multitasking on a reconfigurable computing platform for networked portable multimedia appliances. In Proceedings of the International Conference on Engineering Reconfigurable Systems and Algorithms, Las Vegas, June 2002.
[18] V. Nollet, P. Coene, D. Verkest, S. Vernalde, and R. Lauwereins. Designing an operating system for a heterogeneous reconfigurable SoC. In Proceedings of the RAW'03 Workshop, Nice, France, April 2003.
[19] R. Pellizzoni and M. Caccamo.
Adaptive real-time management of relocatable tasks for FPGA-based embedded systems. Technical report, University of Illinois, mcaccamo/papers/.
[20] C. Plessl, R. Enzler, H. Walder, J. Beutel, M. Platzner, L. Thiele, and G. Tröster. The case for reconfigurable hardware in wearable computing. Personal and Ubiquitous Computing, October 2003.
[21] H. Simmler, L. Levinson, and R. Männer. Multitasking on FPGA coprocessors. In Proc. 10th Int'l Conf. on Field-Programmable Logic and Applications, Villach, Austria, August 2000.
[22] G. Simon, M. Maróti, Á. Lédeczi, G. Balogh, B. Kusy, A. Nádas, G. Pap, J. Sallai, and K. Frampton. Sensor network-based countersniper system. In Proceedings of the ACM Second International Conference on Embedded Networked Sensor Systems (SenSys), 2004.
[23] C. Steiger, H. Walder, and M. Platzner. Operating systems for reconfigurable embedded platforms: Online scheduling of real-time tasks. IEEE Transactions on Computers, 53(11), 2004.
[24] C. Steiger, H. Walder, M. Platzner, and L. Thiele. Online scheduling and placement of real-time tasks to partially reconfigurable devices. In Proceedings of the 24th IEEE Real-Time Systems Symposium, Cancun, Mexico, December 2003.
[25] H. Walder and M. Platzner. Reconfigurable hardware operating systems: From concepts to realizations. In Proc. Int'l Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA), 2003.
[26] G. Wigley and D. Kearney. The development of an operating system for reconfigurable computing. In Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM), 2001.
[27] Xilinx, Inc. Virtex-4, Virtex-II Pro and Virtex-II Pro X FPGA User Guide.


More information

SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs

SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND -Based SSDs Sangwook Shane Hahn, Sungjin Lee, and Jihong Kim Department of Computer Science and Engineering, Seoul National University,

More information

How To Design An Image Processing System On A Chip

How To Design An Image Processing System On A Chip RAPID PROTOTYPING PLATFORM FOR RECONFIGURABLE IMAGE PROCESSING B.Kovář 1, J. Kloub 1, J. Schier 1, A. Heřmánek 1, P. Zemčík 2, A. Herout 2 (1) Institute of Information Theory and Automation Academy of

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

A Dynamic Link Allocation Router

A Dynamic Link Allocation Router A Dynamic Link Allocation Router Wei Song and Doug Edwards School of Computer Science, the University of Manchester Oxford Road, Manchester M13 9PL, UK {songw, doug}@cs.man.ac.uk Abstract The connection

More information

Rackspace Cloud Databases and Container-based Virtualization

Rackspace Cloud Databases and Container-based Virtualization Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many

More information

3. Scheduling issues. Common approaches /1. Common approaches /2. Common approaches /3. 2012/13 UniPD / T. Vardanega 23/01/2013. Real-Time Systems 1

3. Scheduling issues. Common approaches /1. Common approaches /2. Common approaches /3. 2012/13 UniPD / T. Vardanega 23/01/2013. Real-Time Systems 1 Common approaches /1 3. Scheduling issues Clock-driven (time-driven) scheduling Scheduling decisions are made beforehand (off line) and carried out at predefined time instants The time instants normally

More information

International Journal of Advancements in Research & Technology, Volume 2, Issue3, March -2013 1 ISSN 2278-7763

International Journal of Advancements in Research & Technology, Volume 2, Issue3, March -2013 1 ISSN 2278-7763 International Journal of Advancements in Research & Technology, Volume 2, Issue3, March -2013 1 FPGA IMPLEMENTATION OF HARDWARE TASK MANAGEMENT STRATEGIES Assistant professor Sharan Kumar Electronics Department

More information

Multi-objective Design Space Exploration based on UML

Multi-objective Design Space Exploration based on UML Multi-objective Design Space Exploration based on UML Marcio F. da S. Oliveira, Eduardo W. Brião, Francisco A. Nascimento, Instituto de Informática, Universidade Federal do Rio Grande do Sul (UFRGS), Brazil

More information

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

More information

Global Multiprocessor Real-Time Scheduling as a Constraint Satisfaction Problem

Global Multiprocessor Real-Time Scheduling as a Constraint Satisfaction Problem Global Multiprocessor Real-Time Scheduling as a Constraint Satisfaction Problem Liliana Cucu-Grosean & Olivier Buffet INRIA Nancy Grand-Est 615 rue du Jardin Botanique 54600 Villers-lès-Nancy, France firstname.lastname@loria.fr

More information

How To Compare Real Time Scheduling To A Scheduled Scheduler On Linux On A Computer System With A Visualization System

How To Compare Real Time Scheduling To A Scheduled Scheduler On Linux On A Computer System With A Visualization System MS project proposal: A comparison of real-time scheduling algorithms using visualization of tasks and evaluation of real-time extensions to Linux Kevin Churnetski Computer Science-RIT 8/21/2003 Abstract:

More information

A General Framework for Tracking Objects in a Multi-Camera Environment

A General Framework for Tracking Objects in a Multi-Camera Environment A General Framework for Tracking Objects in a Multi-Camera Environment Karlene Nguyen, Gavin Yeung, Soheil Ghiasi, Majid Sarrafzadeh {karlene, gavin, soheil, majid}@cs.ucla.edu Abstract We present a framework

More information

Partitioned real-time scheduling on heterogeneous shared-memory multiprocessors

Partitioned real-time scheduling on heterogeneous shared-memory multiprocessors Partitioned real-time scheduling on heterogeneous shared-memory multiprocessors Martin Niemeier École Polytechnique Fédérale de Lausanne Discrete Optimization Group Lausanne, Switzerland martin.niemeier@epfl.ch

More information

Cloud Management: Knowing is Half The Battle

Cloud Management: Knowing is Half The Battle Cloud Management: Knowing is Half The Battle Raouf BOUTABA David R. Cheriton School of Computer Science University of Waterloo Joint work with Qi Zhang, Faten Zhani (University of Waterloo) and Joseph

More information

Compositional Real-Time Scheduling Framework with Periodic Model

Compositional Real-Time Scheduling Framework with Periodic Model Compositional Real-Time Scheduling Framework with Periodic Model INSIK SHIN and INSUP LEE University of Pennsylvania It is desirable to develop large complex systems using components based on systematic

More information

R u t c o r Research R e p o r t. A Method to Schedule Both Transportation and Production at the Same Time in a Special FMS.

R u t c o r Research R e p o r t. A Method to Schedule Both Transportation and Production at the Same Time in a Special FMS. R u t c o r Research R e p o r t A Method to Schedule Both Transportation and Production at the Same Time in a Special FMS Navid Hashemian a Béla Vizvári b RRR 3-2011, February 21, 2011 RUTCOR Rutgers

More information

Efficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration

Efficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration Efficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration 1 Harish H G, 2 Dr. R Girisha 1 PG Student, 2 Professor, Department of CSE, PESCE Mandya (An Autonomous Institution under

More information

PFP Technology White Paper

PFP Technology White Paper PFP Technology White Paper Summary PFP Cybersecurity solution is an intrusion detection solution based on observing tiny patterns on the processor power consumption. PFP is capable of detecting intrusions

More information

Online Scheduling and Placement of Real-time Tasks to Partially Reconfigurable Devices

Online Scheduling and Placement of Real-time Tasks to Partially Reconfigurable Devices Online Scheduling and Placement of Real-time Tasks to Partially Reconfigurable Devices Christoph Steiger, Herbert Walder, Marco Platzner, Lothar Thiele Computer Engineering and Networks Lab Swiss Federal

More information

OpenMosix Presented by Dr. Moshe Bar and MAASK [01]

OpenMosix Presented by Dr. Moshe Bar and MAASK [01] OpenMosix Presented by Dr. Moshe Bar and MAASK [01] openmosix is a kernel extension for single-system image clustering. openmosix [24] is a tool for a Unix-like kernel, such as Linux, consisting of adaptive

More information

Capacity Planning Process Estimating the load Initial configuration

Capacity Planning Process Estimating the load Initial configuration Capacity Planning Any data warehouse solution will grow over time, sometimes quite dramatically. It is essential that the components of the solution (hardware, software, and database) are capable of supporting

More information

A Hardware-Software Cosynthesis Technique Based on Heterogeneous Multiprocessor Scheduling

A Hardware-Software Cosynthesis Technique Based on Heterogeneous Multiprocessor Scheduling A Hardware-Software Cosynthesis Technique Based on Heterogeneous Multiprocessor Scheduling ABSTRACT Hyunok Oh cosynthesis problem targeting the system-on-chip (SOC) design. The proposed algorithm covers

More information

ELEC 5260/6260/6266 Embedded Computing Systems

ELEC 5260/6260/6266 Embedded Computing Systems ELEC 5260/6260/6266 Embedded Computing Systems Spring 2016 Victor P. Nelson Text: Computers as Components, 3 rd Edition Prof. Marilyn Wolf (Georgia Tech) Course Topics Embedded system design & modeling

More information

The Trip Scheduling Problem

The Trip Scheduling Problem The Trip Scheduling Problem Claudia Archetti Department of Quantitative Methods, University of Brescia Contrada Santa Chiara 50, 25122 Brescia, Italy Martin Savelsbergh School of Industrial and Systems

More information

Multi-core real-time scheduling

Multi-core real-time scheduling Multi-core real-time scheduling Credits: Anne-Marie Déplanche, Irccyn, Nantes (many slides come from her presentation at ETR, Brest, September 2011) 1 Multi-core real-time scheduling! Introduction: problem

More information

Locality Based Protocol for MultiWriter Replication systems

Locality Based Protocol for MultiWriter Replication systems Locality Based Protocol for MultiWriter Replication systems Lei Gao Department of Computer Science The University of Texas at Austin lgao@cs.utexas.edu One of the challenging problems in building replication

More information

Advanced Operating Systems (M) Dr Colin Perkins School of Computing Science University of Glasgow

Advanced Operating Systems (M) Dr Colin Perkins School of Computing Science University of Glasgow Advanced Operating Systems (M) Dr Colin Perkins School of Computing Science University of Glasgow Rationale Radical changes to computing landscape; Desktop PC becoming irrelevant Heterogeneous, multicore,

More information

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT 1 SARIKA K B, 2 S SUBASREE 1 Department of Computer Science, Nehru College of Engineering and Research Centre, Thrissur, Kerala 2 Professor and Head,

More information

The Truth Behind IBM AIX LPAR Performance

The Truth Behind IBM AIX LPAR Performance The Truth Behind IBM AIX LPAR Performance Yann Guernion, VP Technology EMEA HEADQUARTERS AMERICAS HEADQUARTERS Tour Franklin 92042 Paris La Défense Cedex France +33 [0] 1 47 73 12 12 info@orsyp.com www.orsyp.com

More information

Profit Maximization and Power Management of Green Data Centers Supporting Multiple SLAs

Profit Maximization and Power Management of Green Data Centers Supporting Multiple SLAs Profit Maximization and Power Management of Green Data Centers Supporting Multiple SLAs Mahdi Ghamkhari and Hamed Mohsenian-Rad Department of Electrical Engineering University of California at Riverside,

More information

Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum

Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum Scheduling Yücel Saygın These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum 1 Scheduling Introduction to Scheduling (1) Bursts of CPU usage alternate with periods

More information

Passive Discovery Algorithms

Passive Discovery Algorithms t t Technische Universität Berlin Telecommunication Networks Group arxiv:1506.05255v1 [cs.ni] 17 Jun 2015 Optimized Asynchronous Passive Multi-Channel Discovery of Beacon-Enabled Networks Niels Karowski,

More information

Chapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

Chapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one

More information

Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems

Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems 215 IEEE International Conference on Big Data (Big Data) Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems Guoxin Liu and Haiying Shen and Haoyu Wang Department of Electrical

More information

Hardware/Software Co-Design of a Java Virtual Machine

Hardware/Software Co-Design of a Java Virtual Machine Hardware/Software Co-Design of a Java Virtual Machine Kenneth B. Kent University of Victoria Dept. of Computer Science Victoria, British Columbia, Canada ken@csc.uvic.ca Micaela Serra University of Victoria

More information

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com

More information

Operating Systems, 6 th ed. Test Bank Chapter 7

Operating Systems, 6 th ed. Test Bank Chapter 7 True / False Questions: Chapter 7 Memory Management 1. T / F In a multiprogramming system, main memory is divided into multiple sections: one for the operating system (resident monitor, kernel) and one

More information

FPGA-based MapReduce Framework for Machine Learning

FPGA-based MapReduce Framework for Machine Learning FPGA-based MapReduce Framework for Machine Learning Bo WANG 1, Yi SHAN 1, Jing YAN 2, Yu WANG 1, Ningyi XU 2, Huangzhong YANG 1 1 Department of Electronic Engineering Tsinghua University, Beijing, China

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

TPCalc : a throughput calculator for computer architecture studies

TPCalc : a throughput calculator for computer architecture studies TPCalc : a throughput calculator for computer architecture studies Pierre Michaud Stijn Eyerman Wouter Rogiest IRISA/INRIA Ghent University Ghent University pierre.michaud@inria.fr Stijn.Eyerman@elis.UGent.be

More information

Contributions to Gang Scheduling

Contributions to Gang Scheduling CHAPTER 7 Contributions to Gang Scheduling In this Chapter, we present two techniques to improve Gang Scheduling policies by adopting the ideas of this Thesis. The first one, Performance- Driven Gang Scheduling,

More information

Online Scheduling for Cloud Computing and Different Service Levels

Online Scheduling for Cloud Computing and Different Service Levels 2012 IEEE 201226th IEEE International 26th International Parallel Parallel and Distributed and Distributed Processing Processing Symposium Symposium Workshops Workshops & PhD Forum Online Scheduling for

More information

Lec. 7: Real-Time Scheduling

Lec. 7: Real-Time Scheduling Lec. 7: Real-Time Scheduling Part 1: Fixed Priority Assignment Vijay Raghunathan ECE568/CS590/ECE495/CS490 Spring 2011 Reading List: RM Scheduling 2 [Balarin98] F. Balarin, L. Lavagno, P. Murthy, and A.

More information

Schedulability Analysis for Memory Bandwidth Regulated Multicore Real-Time Systems

Schedulability Analysis for Memory Bandwidth Regulated Multicore Real-Time Systems Schedulability for Memory Bandwidth Regulated Multicore Real-Time Systems Gang Yao, Heechul Yun, Zheng Pei Wu, Rodolfo Pellizzoni, Marco Caccamo, Lui Sha University of Illinois at Urbana-Champaign, USA.

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

More information

Technical Investigation of Computational Resource Interdependencies

Technical Investigation of Computational Resource Interdependencies Technical Investigation of Computational Resource Interdependencies By Lars-Eric Windhab Table of Contents 1. Introduction and Motivation... 2 2. Problem to be solved... 2 3. Discussion of design choices...

More information

Quality of Service su Linux: Passato Presente e Futuro

Quality of Service su Linux: Passato Presente e Futuro Quality of Service su Linux: Passato Presente e Futuro Luca Abeni luca.abeni@unitn.it Università di Trento Quality of Service su Linux:Passato Presente e Futuro p. 1 Quality of Service Time Sensitive applications

More information

White Paper FPGA Performance Benchmarking Methodology

White Paper FPGA Performance Benchmarking Methodology White Paper Introduction This paper presents a rigorous methodology for benchmarking the capabilities of an FPGA family. The goal of benchmarking is to compare the results for one FPGA family versus another

More information

Cache-aware compositional analysis of real-time multicore virtualization platforms

Cache-aware compositional analysis of real-time multicore virtualization platforms DOI 10.1007/s11241-015-9223-2 Cache-aware compositional analysis of real-time multicore virtualization platforms Meng Xu 1 Linh Thi Xuan Phan 1 Oleg Sokolsky 1 Sisu Xi 2 Chenyang Lu 2 Christopher Gill

More information