Best Practices for Parallels Containers for Linux: Q1 2013
Using Virtual Swap to Maximize Container Performance
www.parallels.com
Table of Contents
Introduction
How VSwap Works
Using VSwap with UBC
Comparison Between UBC and VSwap
Best Practices
Conclusion
Introduction

Virtual Swap (VSwap) is an easy-to-use memory management scheme that lets you limit the maximum amount of memory allocated to a container by specifying how much physical memory and virtual swap space the container may use. It supersedes the earlier service-level management (SLM) scheme and coexists with the traditional user beancounter (UBC) memory scheme.

The VSwap memory management scheme, which was first introduced in Parallels Virtuozzo Containers for Linux (PVCfL) 4.7 and in Parallels Server Bare Metal 5 with Kernel 2.6.32 and is now available in Parallels Cloud Server 6.0, allows a container to behave more like a dedicated server, giving you the ability to map files or devices into memory. Previously, trying to do this with UBC or SLM was a complicated process, and one that was not always successful.

VSwap has two key benefits. First, it allows you to run memory-intensive applications, such as MongoDB, inside containers without having to first tweak their configuration. Second, it reduces much of the overhead and misconfiguration that UBC caused, dramatically increasing the system's performance while also increasing its stability.

This paper explains how VSwap works, the benefits it provides, and the best practices for using it.

How VSwap Works

VSwap is managed with two primary parameters, physpages and swappages:

The physpages parameter limits the amount of physical memory (RAM) available to processes inside a container, including user memory, kernel memory, and the page cache. When you set a physpages value, the system will ignore the barrier limit and continue allocating memory to applications running in the container until it reaches the physpages value. At that point, the application will get no more memory and will stop working.

The swappages parameter limits the amount of swap space that can be allocated to processes running in a container.
There is also a third parameter, vm_overcommit, which allows you to overcommit memory above the assigned limit. The overcommitted memory is calculated as (physpages + swappages) × vm_overcommit. By default, this parameter is equal to 1, which means there is no overcommitment. We recommend keeping it at this default value because if an application attempts to use the overcommitted memory, the kernel won't allow it and may end up terminating the application. The only situation in which you might want to set this parameter to a value greater than 1 is when an application requires the assignment of substantially more memory than it will actually use.

At a minimum, you need to specify the physpages value. If you want the container to make use of swap space, you also need to set the swappages value; otherwise, VSwap will assign it a default value of 0. For VSwap purposes, you don't need any of the other UBC parameters.

When the physpages limit is reached, memory pages belonging to the container are pushed out to a virtual swap space (VSwap). When the swappages limit is reached, the container will run into an out-of-memory (OOM) situation, and the kernel will start terminating applications inside it to free up more memory.
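The arithmetic behind these parameters can be sketched as follows. This is only an illustration of the formula (physpages + swappages) × vm_overcommit, using the page size of 4,096 bytes stated in this paper; the function names are hypothetical and are not part of any Parallels tool.

```python
PAGE_SIZE = 4096  # bytes per page, as stated in this paper


def mb_to_pages(megabytes: int) -> int:
    """Convert a size in megabytes to a number of 4,096-byte pages."""
    return megabytes * 1024 * 1024 // PAGE_SIZE


def overcommit_limit_pages(physpages: int, swappages: int = 0,
                           vm_overcommit: float = 1.0) -> int:
    """Return (physpages + swappages) x vm_overcommit, in pages.

    Hypothetical helper: it only reproduces the formula from the text.
    """
    return int((physpages + swappages) * vm_overcommit)


physpages = mb_to_pages(512)  # 131072 pages
swappages = mb_to_pages(128)  # 32768 pages

# Default vm_overcommit of 1: no overcommitment.
print(overcommit_limit_pages(physpages, swappages))       # 163840
# An assumed vm_overcommit of 1.5, for illustration only.
print(overcommit_limit_pages(physpages, swappages, 1.5))  # 245760
```

With the recommended default of 1, the result is simply the sum of the two limits, which is why no overcommitment takes place.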
One key difference between normal swap and VSwap is that with VSwap no actual disk I/O occurs. Instead, the container that exceeded the physpages limit is artificially slowed down to emulate the effect of real swapping. Actual swap-out occurs only if there is a global memory shortage on the system.

Another difference is that with VSwap, the disk page cache is limited to the physpages value. This approach improves container isolation, as it prevents a container that is reading a lot of disk data from preempting the cache of other containers.

A third difference is the existence of the new swappages parameter, which allows you, for the first time, to specify how much memory you want to allocate to swap space.

Using VSwap with UBC

Since the VSwap memory management model coexists with the traditional UBC model, you can use the two approaches together. By default, VSwap will set all primary and secondary UBC parameters to unlimited, as physical servers are limited only by the physical CPU and memory resources, and it's important to match the behavior of physical servers as closely as possible to ensure that applications will run properly.

However, you can still use the UBC parameters to limit a particular resource (e.g., the number of processes) within the overall memory limits. For example, you might set VSwap to 512 MB [1] and set the number of processes to 10, meaning that only 10 application processes can run inside the container at any given time. In this case, whichever limit is hit first (512 MB or 10 processes) will take effect.

To retain backward compatibility with UBC, VSwap uses some of the existing UBC parameters, as well as the new parameters discussed in the preceding section. Table 1 shows how VSwap sets the value for each UBC parameter. We recommend that you keep these default values, since if you change them, you won't be able to use VSwap and will revert to the UBC functionality.
Once you set the physpages parameter in VSwap (and, optionally, the swappages parameter), VSwap will use these values to automatically calculate all the UBC parameters listed in Table 1.

Table 1. VSwap's Use of UBC Parameters

UBC Value: VMGUARPAGES
VSwap Value: VSwap sets this value to the sum of physpages + swappages.

UBC Value: OOMGUARPAGES
VSwap Value: VSwap sets this value to the physpages value, guaranteeing the physical memory in situations where the hostnode runs out of memory. So, for example, if you have physpages set at 512 MB and the hostnode runs out of memory, the kernel usually won't terminate any processes running within the 512 MB.

UBC Value: PRIVVMPAGES
VSwap Value: VSwap sets this value to the product of vmguarpages × vm_overcommit. If you set the vm_overcommit value to 1, as recommended above, VSwap will set the privvmpages parameter to unlimited, meaning that it will be ignored in any memory-related decision (such as terminating applications in an OOM situation).

UBC Value: LOCKEDPAGES
VSwap Value: VSwap sets this value to the physpages value to guarantee that locked memory will not be swapped out (e.g., in the case of applications like passwd).

[1] Note: Throughout the paper, all values represent pages, and one page = 4,096 bytes.
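The rules in Table 1 can be summarized in a few lines of code. The sketch below is a reading of the table only, not the kernel's actual implementation; the function name derive_ubc and the "unlimited" marker are assumptions made for illustration.

```python
from typing import Dict, Union

UNLIMITED = "unlimited"  # illustrative marker, not a real kernel constant


def derive_ubc(physpages: int, swappages: int = 0,
               vm_overcommit: float = 1.0) -> Dict[str, Union[int, str]]:
    """Derive the UBC values of Table 1 from the VSwap parameters (in pages)."""
    vmguarpages = physpages + swappages
    if vm_overcommit == 1:
        # With the recommended default, privvmpages is effectively ignored.
        privvmpages: Union[int, str] = UNLIMITED
    else:
        privvmpages = int(vmguarpages * vm_overcommit)
    return {
        "vmguarpages": vmguarpages,
        "oomguarpages": physpages,   # physical memory guaranteed under host OOM
        "privvmpages": privvmpages,
        "lockedpages": physpages,    # locked memory is not swapped out
    }


# 512 MB of RAM and 128 MB of swap, expressed in 4,096-byte pages.
for name, value in derive_ubc(131072, 32768).items():
    print(name, value)
```

Because every value follows from physpages, swappages, and vm_overcommit, there is normally no reason to set these UBC parameters by hand.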
Comparison Between UBC and VSwap

To illustrate the differences between UBC and VSwap, we ran a simple test on two systems: one with PVCfL 4.6 running UBC, and one with PVCfL 4.7 running VSwap. The test involved two C functions: exec and fork. An exec operation replaces the current process with another process, specified by its filename. A fork operation duplicates the calling process, spawning a child process with its own memory and process space, as occurs, for example, when an application drops privileges and runs under a different user. We chose these two functions because the kernel spends a lot of time executing them, as it needs to both allocate memory and guarantee its availability.

We ran version 0.99.23 of the test on both systems, running it multiple times, both as a single-process application and as a multi-process application, using a larger number of parallel processes each time so that it would utilize all the CPU cores. For both the exec and fork functions, the test with 32 simultaneous processes using UBC ran out of memory and failed.

Both the UBC and VSwap tests ran on hexa-core Xeon X5650 CPUs with Hyper-Threading enabled. Table 2 shows the configuration details for each machine, with the text in red highlighting configuration differences.

Figure 1 shows the results of the exec test, specifying the number of processes that each test system could execute within 1 second. Figure 2 provides the same information for the fork test. From the results in these two figures, you can see that VSwap scales very well on multi-core systems, while UBC does not, due to the nature of its design.

Table 2. Configuration of Test Systems

Configuration #1: PVC 4.6 @ Kernel 2.6.18 (UBC)
Configuration #2: PVC 4.7 @ Kernel 2.6.32 (VSwap)

Container
  CPU
    #1: vcpu @ 2.67 GHz
    #2: vcpu @ 2.67 GHz
  RAM
    #1: 256 MB
    #2: 256 MB
  OS
    #1: Linux 2.6.18-028stab077.1 #1 SMP Mon Nov 1 19:26:08 MSK 2010, Red Hat 5.6 Tikanga x86_64
    #2: Linux 2.6.32-042stab036.6 #1 SMP Tue Sep 13 19:37:36 MSD 2011, Red Hat 6.1 Santiago x86_64

Host
  CPU
    #1: 24 SMP @ 2.67 GHz
    #2: 24 SMP @ 2.67 GHz
  RAM
    #1: 48 GB
    #2: 48 GB
  OS
    #1: Linux 2.6.18-028stab077.1 #1 SMP Mon Nov 1 19:26:08 MSK 2010, Red Hat 5.4 Final x86_64
    #2: Linux 2.6.32-042stab036.6 #1 SMP Tue Sep 13 19:37:36 MSD 2011, Red Hat 5.0.0 1066 x86_64
Figure 1. Differences between UBC and VSwap in times required to run the exec test.

Figure 2. Differences between UBC and VSwap in times required to run the fork test.
Best Practices

VSwap makes the memory management of containers much easier. Instead of having to set a large number of primary and secondary UBC parameters, you just use the physpages and swappages parameters to set limits for the amount of memory and swap space available to each container.

Best Practice: Do not overcommit memory. That is, the sum of the physpages values for all containers should not exceed the memory available on the physical server, and the sum of the swappages values should not be greater than 25% of the hostnode's physical memory.

As an illustration of this point, consider the following example:

Physical server = 32 GB RAM
Number of containers = 64
Physpages per container = ~512 MB RAM
Swappages per container = ~128 MB (25% of 512 MB)

In this case, the physical memory of all containers (64 × 512 MB = 32 GB) is exactly the same as the hostnode's memory, which will prevent an OOM situation from terminating an application or process inside the container. As for the swappages value, it does create a 25% chance that an OOM situation will terminate one or more processes in at least one container, and a 0.39% chance (25% divided by 64) that an OOM situation will terminate one or more processes in a specific container. However, since it's unlikely that all containers will request the maximum memory at the same time, you may decide that this small risk is worth taking, since it will improve density on the server.

If you do decide to overcommit, it's important to monitor the containers' memory usage closely and adjust it if you see that certain containers are driving overall memory usage too high. It's also important not to overcommit too much of the hostnode's memory, as the greater the overcommitment, the greater the probability that the kernel will terminate applications.
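The sizing rule above is easy to check mechanically. The following sketch assumes a hypothetical Container record and check_plan helper (neither is part of any Parallels tool) and applies the two conditions from the best practice to the 32 GB example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Container:
    phys_mb: int  # physpages limit, expressed in MB
    swap_mb: int  # swappages limit, expressed in MB


def check_plan(host_ram_mb: int, containers: List[Container],
               swap_fraction: float = 0.25) -> dict:
    """Check that total RAM fits the host and total swap stays within 25%."""
    total_phys = sum(c.phys_mb for c in containers)
    total_swap = sum(c.swap_mb for c in containers)
    return {
        "total_phys_mb": total_phys,
        "total_swap_mb": total_swap,
        "phys_ok": total_phys <= host_ram_mb,
        "swap_ok": total_swap <= host_ram_mb * swap_fraction,
    }


# The example from the text: 64 containers on a 32 GB server.
plan = check_plan(32 * 1024, [Container(512, 128)] * 64)
print(plan)
```

For the example configuration both checks pass: the containers' physical memory adds up to exactly 32 GB and their swap to exactly 8 GB, which is 25% of the host's RAM.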
Note that VSwap will automatically virtualize the /proc/meminfo output inside each container, so that the maximum allowable memory for that container is visible to the applications within the container. That way, applications will know how much memory is available to them and will not try to use more than the allotted amount. In contrast, before VSwap was available, the /proc/meminfo value was manually assigned and could be completely different from the actual available memory. In such a case, when the customer tried to use this additional memory, the application would fail.

As for vm_overcommit, the best practice is simply to use the default value of 1, so you don't have to worry about your application being terminated if it requests more memory than is available for the container.

Table 3 provides an overview of the differences between using UBC and VSwap with regard to memory available to the container.

Table 3. Memory Available to Containers With UBC vs. VSwap

Memory Management / Hostnode / Container (what is seen)

UBC without meminfo virtualization
UBC with meminfo virtualization
VSwap without overcommit (recommended configuration)
VSwap with overcommit

configurable up to *
configurable up to *
configurable up to
configurable up to
configurable up to 16 GB Swap**
configurable up to **

* But only the amount specified in PRIVVMPAGES would be actually usable.
** But only the amount specified in PHYSPAGES and SWAPPAGES would be actually usable.
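To see what an application inside a container would observe through the virtualized /proc/meminfo, a small parser such as the one below can be used. It is a sketch only: the sample text is made up for illustration (it assumes a container limited to 512 MB of RAM and 128 MB of swap), and parse_meminfo is a hypothetical helper.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])  # memory fields are in kB
    return values


# Made-up sample of what a container might report; not real measured output.
SAMPLE = """MemTotal:         524288 kB
MemFree:          262144 kB
SwapTotal:        131072 kB
SwapFree:         131072 kB
"""

info = parse_meminfo(SAMPLE)
print(info["MemTotal"] // 1024, "MB RAM visible")    # 512 MB RAM visible
print(info["SwapTotal"] // 1024, "MB swap visible")  # 128 MB swap visible
```

On a Linux system, the same function can be applied to the contents of /proc/meminfo read with open("/proc/meminfo").read().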
One general suggestion: it's not a good idea to make container owners aware of resources that they can't use (e.g., there's no point in letting them know that the server has 128 GB of RAM when their container is allowed to use only 16 GB).

Conclusion

VSwap, which is available with Parallels Cloud Server 6.0, offers significant ease-of-use, performance, and stability improvements over legacy memory management schemes. Instead of having to set numerous parameters, you only have to set two: physpages and swappages. In addition, VSwap lets you run memory-intensive applications inside containers without having to first tweak their configuration, and you can run parallel processes in far less time than you could with UBC, without running out of memory. For all these reasons, VSwap is a much better approach than UBC for managing memory inside containers.