Dynamic Memory Management for Embedded Real-Time Systems

Similar documents

Lecture 10: Dynamic Memory Allocation 1: Into the jaws of malloc()

Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm

The Relative Worst Order Ratio for On-Line Algorithms

Multi-core real-time scheduling

Memory management basics (1) Requirements (1) Objectives. Operating Systems Part of E1.9 - Principles of Computers and Software Engineering

Chapter 7 Memory Management

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Chapter 7 Memory Management

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging

Real-Time Software. Basic Scheduling and Response-Time Analysis. René Rydhof Hansen. 21. september 2010

CS5460: Operating Systems

Predictable response times in event-driven real-time systems

Getting Started with HC Exchange Module

Lecture 17: Virtual Memory II. Goals of virtual memory

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

DATABASE. Pervasive PSQL Performance. Key Performance Features of Pervasive PSQL. Pervasive PSQL White Paper

Real-Time Scheduling (Part 1) (Working Draft) Real-Time System Example

Introduction to Operating Systems. Perspective of the Computer. System Software. Indiana University Chen Yu

Embedded Systems. 6. Real-Time Operating Systems

Physical Data Organization

Architecture of distributed network processors: specifics of application in information security systems

Practical Performance Understanding the Performance of Your Application

Competitive Analysis of QoS Networks

ELEC 377. Operating Systems. Week 1 Class 3

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

Making Dynamic Memory Allocation Static To Support WCET Analyses

6. Storage and File Structures

Virtuoso and Database Scalability

Title: XtratuM: a Hypervisor for Safety Critical Embedded Systems. Authors:M. Masmano, I. Ripoll, A. Crespo and J.J. Metge

COS 318: Operating Systems. Virtual Memory and Address Translation

4 Internet QoS Management

IA-64 Application Developer s Architecture Guide

Cognos8 Deployment Best Practices for Performance/Scalability. Barnaby Cole Practice Lead, Technical Services

Real-Time Component Software. slide credits: H. Kopetz, P. Puschner

Operating Systems, 6 th ed. Test Bank Chapter 7

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Weighted Total Mark. Weighted Exam Mark

Certification Authorities Software Team (CAST) Position Paper CAST-13

Fastboot Techniques for x86 Architectures. Marcus Bortel Field Application Engineer QNX Software Systems

MS SQL Performance (Tuning) Best Practices:

COURSE OUTLINE Survey of Operating Systems

Resource Reservation & Resource Servers. Problems to solve

Long-term monitoring of apparent latency in PREEMPT RT Linux real-time systems

MDM Multidomain Edition (Version 9.6.0) For Microsoft SQL Server Performance Tuning

Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality

From Control Loops to Software

OPERATING SYSTEM - VIRTUAL MEMORY

Chapter 3 Operating-System Structures

DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop

Embedded & Real-time Operating Systems

XtratuM: a Hypervisor for Safety Critical Embedded Systems

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

Tuning Tableau Server for High Performance

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

Performance Comparison of RTOS

CS5460: Operating Systems. Lecture: Virtualization 2. Anton Burtsev March, 2013

Full and Para Virtualization

Dongwoo Kim : Hyeon-jeong Lee s Husband

Free-Space Management

SYSTEM ecos Embedded Configurable Operating System

Operating System Overview. Otto J. Anshus

W I S E. SQL Server 2008/2008 R2 Advanced DBA Performance & WISE LTD.

Multi-objective Design Space Exploration based on UML

Summary: This paper examines the performance of an XtremIO All Flash array in an I/O intensive BI environment.

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

9/26/2011. What is Virtualization? What are the different types of virtualization.

Resource Utilization of Middleware Components in Embedded Systems

Java Virtual-Machine Support for Portable Worst-Case Execution-Time Analysis

On Demand Loading of Code in MMUless Embedded System

Trends in Embedded Software Engineering

Real-time Debugging using GDB Tracepoints and other Eclipse features

11.1 inspectit inspectit

Motivation: Smartphone Market

Tuning WebSphere Application Server ND 7.0. Royal Cyber Inc.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

I-Motion SQL Server admin concerns

Open Source Implementation of Hierarchical Scheduling for Integrated Modular Avionics

Siebel Application Deployment Manager Guide. Siebel Innovation Pack 2013 Version 8.1/8.2 September 2013

CHAPTER 17: File Management

Computer Architecture

Troubleshooting Common Issues in VoIP

Real-time processing the basis for PC Control

IBM DB2: LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

4. Fixed-Priority Scheduling

Real Time Scheduling Basic Concepts. Radek Pelánek

Hardware Acceleration for Just-In-Time Compilation on Heterogeneous Embedded Systems

Validating Java for Safety-Critical Applications

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages

Offline sorting buffers on Line

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

Introduction to Virtual Machines

EMBEDDED SOFTWARE DEVELOPMENT: COMPONENTS AND CONTRACTS

Configuring Apache Derby for Performance and Durability Olav Sandstå

Multicore partitioned systems based on hypervisor

Real Time Programming: Concepts

Scalable Prefix Matching for Internet Packet Forwarding

Transcription:

Dynamic Memory Management for Embedded Real-Time Systems Alfons Crespo, Ismael Ripoll and Miguel Masmano Grupo de Informática Industrial Sistemas de Tiempo Real Universidad Politécnica de Valencia Instituto de Automática e Informática Industrial http://www.gii.upv.es

Outline Introduction Basic concepts DSA requirements for real-time systems Allocator classification TLSF description Evaluation Conclusions

Introduction Nowadays embedded systems are used in a wide range of industrial sectors requiring appropriate functionalities. The main advantages of embedded systems are their reduced price and size, broadening the scope of possible applications, but the main problem for their use is the limited computational capabilities they can offer. Optimal use of the resources is an important issue

Introduction Dynamic memory management or Dynamic Storage Allocation (DSA) is one part of the software system that influences the performance and the cost of a product the most. The system must be optimized due to the limitation of memory. Real-time deadlines must be respected: the dynamic memory management system must allocate and deallocate blocks in due time.

Introduction But there is a general misunderstanding of the use of Dynamic Memory Allocation It is not used in real-time systems Other techniques (ad-hoc) have been used It is mostly forgotten in QoS techniques Processor, Network, Energy,... Applications Multimedia systems Mobile phones Applications using AI techniques Languages: Java, RTJava

Misunderstandings about DSA 1. Long running programs will fragment the heap more and more, consuming unbounded memory. 2. Memory request operations (malloc / free) are inherently slow and unbounded 3. It is usually better to implement your own ad-hoc memory allocator than use a known allocator Consequence: It is not used in real-time systems

Dynamic Memory and RT Systems Currently, RT-Systems do not use explicit dynamic memory because Allocation response time is either unbounded or very long The fragmentation problem However, currently, several factors such as RTJava, the existence of more and more complex applications will force the use of dynamic memory

Explicit Dynamic Memory Management Dynamic memory allocation consists in managing, in execution time, a free area of memory (heap) to satisfy a sequence of requests (allocation/deallocation) without any knowledge of future requests This is an on-line optimisation problem: space optimisation Competitive analysis it is not possible to design an optimum algorithm There exist many memory allocators (First-Fit, Best-Fit, Binary-Buddy, etc) (Robson, 80) The off-line version of the problem is NP-Hard

Basic concepts M Memory pool / Heap Task malloc Buffer Task free 0

Basic concepts M Allocated blocks Free blocks Memory pool / Heap Task malloc Buffer Task free

Grupo de Informática Industrial Sistemas de Tiempo-Real Basic concepts Fragmentation: the inability to reuse memory that is free or the amount of wasted memory at a steady state M Allocated blocks Free blocks Memory pool / Heap

Grupo de Informática Industrial Sistemas de Tiempo-Real Basic concepts Fragmentation: There are two sources of wasted memory: internal fragmentation external fragmentation M Allocated blocks Free blocks Memory pool

Basic concepts Finding free blocks: - best fit: extensive search - good fit: find a free block near the best. M Memory pool Allocated blocks Free blocks

Requirements for DSA Bounded response time. The worst-case execution time (WCET) of memory allocation and deallocation has to be known in advance and be independent of application data. Fast response time. Besides, having a bounded response time, the response time has to be fast enough to be usable. Memory requests need to be always satisfied through an efficient use of memory. The allocator has to handle memory efficiently, that is, the amount of wasted memory should be as small as possible.

Allocators classification Policy Allocation First-fit Best-fit Good-fit Next-fit Worst-fit Deallocation Immediate coalescence Deferred coalescence No coalescence Mechanism Sequential fits (linked lists) Segregated lists (set of free lists) Buddy systems (Segregated free lists) Indexed structures (AVL, Cartesian trees) Bitmaps Several mechanisms (bitmaps + segregated list)

Selected Dynamic Memory Allocators Reference algorithms First-Fit and Best-Fit Labelled as RT-Systems allocators Binary Buddy Widely used allocators Doug Lea s malloc (DLmalloc) Designed for RT-Systems Half-Fit and TLSF

Most used/known allocators Allocator First-fit Allocation Policy Deallocation Policy Mechanism First fit Immediate coalescence Linked List Best-fit Best fit Immediate coalescence Linked List Binary-buddy Best fit Immediate coalescence Buddy systems AVL Best fit Immediate coalescence Indexed Lists DLmalloc < 512b Exact fit No coalescence Bitmaps 512b Best fit Deferred coalescence Linked List Half-fit Good fit Immediate coalescence TLSF Good fit Immediate coalescence Bitmaps + Segregated list Bitmaps + Segregated list

Worst /Bad Case Costs First-fit/Best-fit Binary-buddy DLmalloc AVL Allocator Half-fit/TLSF Allocation O( M ( 2 n) O( log 2 O( ( M M O(1) n n ) ) )) Deallocation O(1) O( log2 ( O(1) O(1) M O( 2.44 log2 ( M n )) O( 4.32 log2 ( M n )) M: Maximum memory size (Heap) n: Largest allocated block n ))

Grupo de Informática Industrial Sistemas de Tiempo-Real TLSF Design TLSF (Two Level Segregated Lists) TLSF was designed and implemented in the EU project OCERA (Open Components for Real-Time Embedded Applications) (http://www.ocera.org) It performs inmediate coalescense of free blocks Uses Segregated list & bitmaps Uses the Good fit policy

TLSF Design Uses Segregated list & bitmaps Free Blocks 104b 109b Second Level Directory 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 [480..511] [448..479] [416..447] [384..415] [352..383] [320..351] [288..319] [255..287] 256 8 0 0 1 1 7 128 64 6 32 5 36b 38b [240..255] [224..239] [208..223] [192..207] [176..191] [160..175] [144..159] [128..143] [120..127] [112..119] [104..111] [96..103] [88..95] [80..87] [72..79] [64..71] [60..64] [56..59] [52..55] [48..51] [44..47] [40..43] [36..39] [32..35] 7 6 5 4 3 2 1 0 First Level Directory

Implementation issues Two configuration parameters (I, J) I: 2 I maximum block size J: 2 J number of second level lists Each set of list have associated a bitmap Bitmap search using bit search forward and reverse (bsf y bsr in IA32) [64, 128[ 31 30 29 8 7 6 5 0 0 0 0 0 1 0 31 30 29 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 31 30 29 2 1 0 0 0 0 1 0 0 31 30 29 2 1 0 [68, 70[ 69 List (i, j) has blocks in the range i i J i i J ( i, j) =[ 2 + j 2,2 + ( j+ 1) 2 [ Ex:. J=5, the list (i: 10, j: 5) has blocks of size in range [1084, 1216[

Implementation issues Two translation functions are provided Search : returns the first list in the range higher or equal to r log () r J 2 buscar(r) search(r) = i : log 2 r + 2 j : r + 2 Insert: return the list which range includes r insertar(r insert(r) () ( ) log 2 r J i i J 1 2 / 2 ) = i : log 2 j : ( r ) i J ( r 2 )/( 2 i ) 1

Evaluation Worst / Bad Case Execution Time Specific scenarios to achieve the worst or bad case and measure it (cycles and number of instructions) Synthetic loads (real-time loads?) Measures of average, st_dev, maximum and minimum of several tests (> 20 experiences)

Worst / Bad scenario evaluation Identification of each worst/bad case Definition of a load to achieve the w/b case Execute the load Measure Worst-case (WC) and Bad-case (BC) allocation Malloc FF BF BB DL HF TLSF Processor instructions 81995 98385 1403 721108 164 197

Evaluation loads Real load There are not examples of dynamic memory use in real-time systems Available loads of classical programs using dynamic memory: compilers (gcc, perl,..), applications (gs, espresso, cfrac, ) Synthetic loads Several general purpose models Generated from periodic memory use models

Synthetic load for periodic models Periodic task model extension Each task is defined as T i =(c i, p i, d i, g i, h i ) g i : Maximum amount of memory per period h i : Holding time Memory pool Gmax... Htmax Attributes: Gmax - maximum amount of memory allocated by pe Htmax - maximum holding time

Grupo de Informática Industrial Sistemas de Tiempo-Real Load generation A load generator produces set of task with different profiles. Huge blocks Small blocks Hybrid (small and large) blocks Load examples

Evaluation Temporal measures Number of cycles: Highly dependent of the processor used (AMD, Intel) and of the data caches, Number of instructions: Unaffected by cache, TLBs, processor, but it is hard to measure (Processor switched to single-step mode) Spatial measures Fragmentation: very huge number of operations

Evaluation: Number of proc. cycles In order to reduce the system (hardware, os, interrupts,..) interferences Each test has been executed with interrupt disabled A trace has been generated and used for 4 replicas in order to avoid processor interferences and cache missing trace replicas compare and extract minimum Result

Evaluation: Number of proc. cycles Temporal cost of the allocator operations in processor cycles alloc Test1 Test2 Test3 lloc. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. irst-fit 182 218 977 101 150 286 1018 95 169 282 2059 10 est-fit 479 481 2341 114 344 348 2357 98 1115 1115 6413 10 inary-buddy 169 465 1264 140 156 228 656 104 162 225 1294 11 Lmalloc 344 347 1314 123 268 290 2911 79 309 312 2087 8 alf-fit 189 331 592 131 148 577 657 108 157 552 626 13 LSF 206 256 371 135 169 268 324 109 191 229 349 11 ree Test1 Test2 Test3 lloc. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. irst-fit 162 188 1432 87 148 182 1412 86 174 195 1512 9 est-fit 152 188 1419 88 121 235 1287 87 145 179 1450 9 inary-buddy 152 302 1127 126 155 300 760 126 150 306 824 12 Lmalloc 122 211 342 87 101 328 335 75 127 186 335 7 alf-fit 181 212 518 104 171 233 870 103 182 210 1025 10 LSF 192 215 624 109 161 211 523 106 187 214 552 10

Evaluation: Number of instructions A process (parent) counts (PTRACE) the executed instructions of another process (process that performs the mallocs and frees) Monitor PTRACE step by step Process to be analysed

Grupo de Informática Industrial Sistemas de Tiempo-Real Evaluation: Number of instructions 2500 First-fit 6000 Best-fit 900 Binary-Buddy using real load: cfrac, gawk, perl 1400 DLmalloc TLSF presentation 125 Half-fit 170 TLSF 32

Evaluation: Number of instructions Temporal cost of the allocator operations in processor instructions alloc Test1 Test2 Test3 lloc. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. irst-fit 204 21 478 71 201 17 818 70 203 23 957 70 est-fit 582 69 798 76 442 130 1006 76 805 179 1539 76 inary-buddy 169 17 843 157 136 22 1113 95 153 24 1113 95 Lmalloc 279 107 921 64 161 126 933 49 232 152 1277 57 alf-fit 118 1 123 115 116 7 123 76 118 1 123 82 LSF 147 13 164 104 118 25 164 84 133 22 164 84 ree Test1 Test2 Test3 lloc. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. irst-fit 93 96 128 59 90 92 128 57 92 95 128 57 est-fit 91 115 126 57 69 148 126 57 79 198 128 57 inary-buddy 68 70 225 65 68 72 277 65 69 73 228 65 Lmalloc 70 128 77 53 59 177 77 39 67 168 77 39 alf-fit 117 117 167 73 115 116 165 76 117 117 167 76 LSF 140 140 217 91 107 110 216 87 120 122 217 87

Evaluation: Fragmentation It is measured to a factor F which is computed as the point of the maximum memory used by the allocator relative to the point of the maximum amount of memory used by the load (live memory). 1 2 1 2 F = 2 Metric proposed by Johnstone et al. (Johnstone et al., 98)

Grupo de Informática Industrial Sistemas de Tiempo-Real Evaluation: Fragmentation

Evaluation: Fragmentation Fragmentation results Factor F Test1 Test2 Test3 c. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. Avg. Stdv. Max. Min. t-fit 93,25 3,99 99,58 87,57 83,21 9,04 98,17 70,67 87,63 4,41 94,82 70,7 st-fit 10,26 1,25 14,23 7,2 21,51 2,73 26,77 17,17 11,76 1,32 14,14 9,7 ary-buddy 73,56 6,36 85,25 66,61 61,97 1,97 65,06 58,79 77,58 5,39 84,34 64,8 malloc 10,11 1,55 12,9 7,39 17,13 2,07 21,75 14,71 11,79 1,39 13,72 9 lf-fit 84,67 3,02 90,07 80,4 71,5 3,44 75,45 65,02 98,14 3,12 104,67 94,2 F 10,49 1,66 11,79 6,51 14,86 2,15 18,56 9,86 11,15 1,1 13,91 7,4

Conclusions TLSF ia an allocator that performs operations with predictable cost very efficient (fast) low fragmentation It permits: To consider dynamic storage allocation in real-time systems (predictable operations) The analysis of a resource as memory for systems with memory constraints (embedded systems)