DYNAMIC MEMORY ALLOCATION AND FRAGMENTATION IN C & C++ COLIN WALLS, MENTOR GRAPHICS

DYNAMIC MEMORY ALLOCATION AND FRAGMENTATION IN C & C++ COLIN WALLS, MENTOR GRAPHICS E M B E D D E D S O F T W A R E W H I T E P A P E R w w w. m e n t o r. c o m

INTRODUCTION In C and C++, it can be very convenient to allocate and de-allocate blocks of memory as and when needed. This is certainly standard practice in both languages and almost unavoidable in C++. However, the handling of such dynamic memory can be problematic and inefficient. For desktop applications, where memory is freely available, these difficulties can be ignored. For real-time embedded systems, ignoring the issues is not an option. Dynamic memory allocation tends to be non-deterministic; the time taken to allocate memory may not be predictable and the memory pool may become fragmented, resulting in unexpected allocation failures. In this paper the problems will be outlined in detail. Facilities in the Nucleus RTOS for handling dynamic memory are outlined and an approach to deterministic dynamic memory allocation detailed. C/C++ MEMORY SPACES It may be useful to think in terms of data memory in C and C++ as being divided into three separate spaces: 1. Static memory. This is where variables, which are defined outside of functions, are located. The keyword static does not generally affect where such variables are located; it specifies their scope to be local to the current module. Variables that are defined inside of a function, which are explicitly declared static, are also stored in static memory. Commonly, static memory is located at the beginning of the RAM area. The actual allocation of addresses to variables is performed by the embedded software development toolkit: a collaboration between the compiler and the linker. Normally, program sections are used to control placement, but more advanced techniques, like Fine Grain Allocation, give more control. Commonly, all the remaining memory, which is not used for static storage, is used to constitute the dynamic storage area, which accommodates the other two memory spaces. 2. Automatic variables. Variables defined inside a function, which are not declared static, are automatic. There is a keyword to explicitly declare such a variable -auto- but it is almost never used. Automatic variables (and function parameters) are usually stored on the stack. The stack is normally located using the linker. The end of the dynamic storage area is typically used for the stack. Compiler optimizations may result in variables being stored in registers for part or all of their lifetimes; this may also be suggested by using the keyword register. 3. The heap. The remainder of the dynamic storage area is commonly allocated to the heap, from which application programs may dynamically allocate memory, as required. Figure 1 illustrates a typical memory layout for the three C/C++ data memory spaces. If an operating system like Nucleus is used, each task normally has its own private stack. DYNAMIC MEMORY IN C In C, dynamic memory is allocated from the heap using some standard library functions. The two key dynamic memory functions are malloc() and free(). Figure 1: C/C++ data memory spaces. 2

The malloc() function takes a single parameter, which is the size of the requested memory area in bytes. It returns a pointer to the allocated memory. If the allocation fails, it returns NULL. The prototype for the standard library function is like this: void *malloc(size_t size); The free() function takes the pointer returned by malloc() and de-allocates the memory. No indication of success or failure is returned. The function prototype is like this: void free(void *pointer); To illustrate the use of these functions, here is some code to statically define an array and set the fourth element s value: int my_array[10]; my_array[3] = 99; The following code does the same job using dynamic memory allocation: int *pointer; pointer = malloc(10 * sizeof(int)); *(pointer+3) = 99; The pointer de-referencing syntax is hard to read, so normal array referencing syntax may be used, as [ and ] are just operators: pointer[3] = 99; When the array is no longer needed, the memory may be de-allocated thus: free(pointer); pointer = NULL; Assigning NULL to the pointer is not compulsory, but is good practice, as it will cause an error to be generated if the pointer is erroneous utilized after the memory has been de-allocated. The amount of heap space actually allocated by malloc() is normally one word larger than that requested. The additional word is used to hold the size of the allocation and is for later use by free(). This size word precedes the data area to which malloc() returns a pointer. There are two other variants of the malloc() function: calloc() and realloc(). The calloc() function does basically the same job as malloc(), except that it takes two parameters the number of array elements and the size of each element instead of a single parameter (which is the product of these two values). The allocated memory is also initialized to zeros. Here is the prototype: void *calloc(size_t nelements, size_t elementsize); 3

The realloc() function resizes a memory allocation previously made by malloc(). It takes as parameters a pointer to the memory area and the new size that is required. If the size is reduced, data may be lost. If the size is increased and the function is unable to extend the existing allocation, it will automatically allocate a new memory area and copy data across. In any case, it returns a pointer to the allocated memory. Here is the prototype: void *realloc(void *pointer, size_t size); DYNAMIC MEMORY IN C++ Management of dynamic memory in C++ is quite similar to C in most respects. Although the library functions are likely to be available, C++ has two additional operators -new and delete- which enable code to be written more clearly, succinctly and flexibly, with less likelihood of errors. The new operator can be used in three ways: p_var = new typename; p_var = new type(initializer); p_array = new type [size]; In the first two cases, space for a single object is allocated; the second one includes initialization. The third case is the mechanism for allocating space for an array of objects. The delete operator can be invoked in two ways: delete p_var; delete[] p_array; The first is for a single object; the second de-allocates the space used by an array. It is very important to use the correct de-allocator in each case. There is no operator that provides the functionality of the C realloc() function. Here is the code to dynamically allocate an array and initialize the fourth element: int* pointer; pointer = new int[10]; pointer[3] = 99; Using the array access notation is natural. De-allocation is performed thus: delete[] pointer; pointer = NULL; Again, assigning NULL to the pointer after de-allocation is just good programming practice. Another option for managing dynamic memory in C++ is the use the Standard Template Library. This may be inadvisable for real-time embedded systems. 4

ISSUES AND PROBLEMS As a general rule, dynamic behavior is troublesome in real-time embedded systems. The two key areas of concern are determination of the action to be taken on resource exhaustion and non-deterministic execution performance. When considering dynamic memory utilization, there are two likely problem areas: stacks and the use of malloc(). STACKS A stack is located in memory, set up to be a particular size and the stack pointer initialized to point to the word following the allocated memory. Thereafter, problems occur if stack operations result in data being written outside of the allocated memory space. This may be stack overflow, where more data is pushed on to the stack than it has capacity to accommodate. Or it can be stack underflow, where data is popped off of an empty stack (as a result of a programming error). In both cases the problems can be very hard to locate, as they typically manifest themselves later in completely unrelated code. It may even result is problems in another, unrelated task. USE OF MALLOC() There are a number of problems with dynamic memory allocation in a real time system. First, the standard library functions (malloc() and free()) are not normally reentrant, which would be problematic in a multithreaded application. If the source code is available, this should be straightforward to rectify by locking resources using RTOS facilities (like a semaphore). A more intractable problem is associated with the performance of malloc(). Its behavior is unpredictable, as the time it takes to allocate memory is extremely variable. Such non-deterministic behavior is intolerable in real time systems. Without great care, it is easy to introduce memory leaks into application code implemented using malloc() and free(). This is caused by memory being allocated and never being de-allocated. Such errors tend to cause a gradual performance degradation and eventual failure. This type of bug can be very hard to locate. Memory allocation failure is a concern. Unlike a desktop application, most embedded systems do not have the opportunity to pop up a dialog and discuss options with the user. Often, resetting is the only option, which is unattractive. If allocation failures are encountered during testing, care must be taken with diagnosing their cause. It may be that there is simply insufficient memory available this suggests various courses of action. However, it may be that there is sufficient memory, but not available in one contiguous chunk that can satisfy the allocation request. This situation is called memory fragmentation. MEMORY FRAGMENTATION The best way to understand memory fragmentation is to look at an example. For this example, it is assumed that there is a 10K heap. First, an area of 3K is requested, thus: #define K (1024) char *p1; p1 = malloc(3*k); Then, a further 4K is requested: 5

p2 = malloc(4*k); The resulting situation is shown in Figure 2 3K of memory is now free. Some time later, the first memory allocation, pointed to by p1, is de-allocated: free(p1); This leaves 6K of memory free in two 3K chunks, as illustrated in Figure 3. A further request for a 4K allocation is issued: p1 = malloc(4*k); This results in a failure - NULL is returned into p1 - because, even though 6K of memory is available, there is not a 4K contiguous block available. This is memory fragmentation. Figure 2: Two memory areas allocated. Figure 3: Non-contiguous free memory. It would seem that an obvious solution would be to de-fragment the memory, merging the two 3K blocks to make a single one of 6K. However, this is not possible because it would entail moving the 4K block to which p2 points. Moving it would change its address, so any code that has taken a copy of the pointer would then be broken. In other languages (such as Visual Basic, Java and C#), there are de-fragmentation (or garbage collection ) facilities. This is only possible because these languages do not support direct pointers, so moving the data has no adverse effect upon application code. This de-fragmentation may occur when a memory allocation fails or there may be a periodic garbage collection process that is run. In either case, this would severely compromise real time performance and determinism. MEMORY WITH AN RTOS Memory management facilities that are compatible with real-time requirements i.e. they are deterministic are usually provided with commercial real time operating systems. A widely used scheme which allocates blocks or partitions of memory under the control of the OS is employed by Nucleus. NUCLEUS RTOS BLOCK/PARTITION MEMORY ALLOCATION Block memory allocation is performed using a partition pool, which is defined statically or dynamically and configured to contain a specified number of blocks of a specified fixed size. For Nucleus OS, the API call to define a partition pool has the following prototype: STATUS NU_Create_Partition_Pool(NU_PARTITION_POOL *pool, CHAR *name, VOID *start_address, UNSIGNED pool_size, UNSIGNED partition_size, OPTION suspend_type); This is most clearly understood by means of an example: 6

status = NU_Create_Partition_Pool(&MyPool, any name, (VOID *) 0xB000, 2000, 40, NU_FIFO); This creates a partition pool with the descriptor MyPool, containing 2000 bytes of memory, filled with partitions of size 40 bytes (i.e. there are 50 partitions). The pool is located at address 0xB000. The pool is configured such that, if a task attempts to allocate a block, when there are none available, and it requests to be suspended on the allocation API call, suspended tasks will be woken up in a first-in, first-out order. The other option would have been task priority order. Another API call is available to request allocation of a partition. Here is an example using Nucleus RTOS: status = NU_Allocate_Partition(&MyPool, &ptr, NU_SUSPEND); This requests the allocation of a partition from MyPool. When successful, a pointer to the allocated block is returned in ptr. If no memory is available, the task is suspended, because NU _ SUSPEND was specified; other options, which may have been selected, would have been to suspend with a timeout or to simply return with an error. When the partition is no longer required, it may be de-allocated thus: status = NU_Deallocate_Partition(ptr); If a task of higher priority was suspended pending availability of a partition, it would now be run. There is no possibility for fragmentation, as only fixed size blocks are available. The only failure mode is true resource exhaustion, which may be controlled and contained using task suspend, as shown. Additional API calls are available which can provide the application code with the means to handle and obtain information about the status of partitions: NU _ Delete _ Partition _ Pool() removes a partition pool. NU _ Partition _ Pool _ Information() returns a variety of information about a partition pool. NU _ Established _ Partition _ Pools() returns the number of partition pools [currently] in the system. NU _ Partition _ Pool _ Pointers() returns pointers to all the partition pools in the system. Care is required in allocating and de-allocating partitions, as the possibility for the introduction of memory leaks remains. MEMORY LEAK DETECTION The potential for programmer error resulting in a memory leak when using partition pools is recognized by vendors of real time operating systems. Typically, a profiler tool is available which assists with the location and rectification of such bugs. Figure 4 illustrates a screen shot from the Mentor Graphics EDGE Profiler, working with Nucleus RTOS. The graph shows the amount of memory that has been allocated from the pool. Each dot shows an allocation or de-allocation; clicking on an allocation results in its corresponding de-allocation being highlighted and vice versa. Unmatched allocations are shown in red and may indicate a memory leak. 7

Figure 4: Mentor Graphics EDGE Profiler. REAL-TIME MEMORY SOLUTIONS STACKS Having identified a number of problems with dynamic memory behavior in real-time systems, some possible solutions and better approaches can be proposed. Although it could be a result of a programming error, the most likely reason for stack overflow is that the stack space allocation was insufficient. Specifying stack size is very challenging. It is almost impossible to estimate statically for most systems. The simplest approach is to make measurements on a running system. Allocate a generous amount of space for the stack, fill it with a known value non-zero, odd numbers make most sense, as they less likely to be a valid address. Run the code for a reasonable period of time and inspect the stack to determine how much was used. Nucleus RTOS includes facilities for stack monitoring specifically an API call: NU _ Check _ Stack(). Error checking in the kernel may also be configured using the symbol NU _ ENABLE _ STACK _ CHECK. To monitor at run time for stack overflow/ underflow, a good technique is to place a guard word at either end of the stack space, as illustrated in Figure 5. These may be initialized to a known value (again a non-zero, odd number is best) and a background (low priority) Figure 5: Stack guard words. 8

task used to monitor the words for changes during execution. Alternatively, a memory management unit may be used to trap an error if the guard words are accessed. DYNAMIC MEMORY CONCLUSION It is possible to use partition memory allocation to implement malloc() in a robust and deterministic fashion. The idea is to define a series of partition pools with block sizes in a geometric progression; e.g. 32, 64, 128, 256 bytes. A malloc() function may be written to deterministically select the correct pool to provide enough space for a given allocation request. For example, if 56 bytes are requested, a 64 byte partition would be used; for 99 bytes, a 128 byte partition. Unfortunately, a 65 byte requested would be satisfied with a 128 byte partition. This approach takes advantage of the deterministic behavior of the partition allocation API call, the robust error handling (e.g. task suspend) and the immunity from fragmentation offered by block memory. The choice of partition sizes, being powers of 2, is to facilitate efficient code using a bit mask. C and C++ use memory in various ways, both static and dynamic. Dynamic memory includes stack and heap. Dynamic behavior in embedded real-time systems is generally a source of concern, as it tends to be nondeterministic and failure is hard to contain. Using the facilities provided by most real-time operating systems, including Nucleus RTOS, a dynamic memory facility may be implemented which is deterministic, immune from fragmentation and with good error handling. F o r t h e l a t e s t p r o d u c t i n f o r m a t i o n, c a l l u s o r v i s i t : w w w. m e n t o r. c o m 2010 Mentor Graphics Corporation, all rights reserved. This document contains information that is proprietary to Mentor Graphics Corporation and may be duplicated in whole or in part by the original recipient for internal business purposes only, provided that this entire notice appears in all copies. In accepting this document, the recipient agrees to make every reasonable effort to prevent unauthorized use of this information. All trademarks mentioned in this document are the trademarks of their respective owners. MGC 05-10 TECH9040-w