Memory-Shielded RAM Disk Support in RedHawk Linux 6.0




A Concurrent Real-Time White Paper

2881 Gateway Drive
Pompano Beach, FL 33069
(800) 666-4544
real-time.ccur.com

Memory-Shielded RAM Disk Support in RedHawk Linux 6.0

By: John Blackwood, Concurrent Real-Time Linux Development Team
March 23, 2011

Abstract

This white paper describes a feature of RedHawk Linux that provides the ability to control the NUMA node page placement of RAM disk data. Placing RAM disk memory in the same NUMA node where the RAM disk applications execute can result in higher performance and reduced system-wide memory contention. The paper describes several test case scenarios that demonstrate the benefits of proper RAM disk page placement.

Overview

In Non-Uniform Memory Architecture (NUMA) computer systems, memory access time depends upon the memory location relative to a processor. NUMA multiprocessor systems are made up of two or more NUMA nodes, where each node contains one or more processors and memory. In NUMA systems, all of memory is accessible to all processors. Memory accesses can be either local or remote: local accesses go to memory that resides on the same node as the accessing processor, and remote accesses go to memory that resides in a different node. Remote accesses can be more expensive in terms of latency and increased memory bus contention, especially where the remote memory resides several NUMA node bus interconnections away from the accessing processor.

Historically, RedHawk Linux has provided the ability to memory-shield a NUMA node. Only user-space memory belonging to applications that are biased to execute on the memory-shielded NUMA node will reside in that node's memory. Memory-shielding also moves the user-space memory of tasks executing on other NUMA nodes out of the memory-shielded node's memory. RedHawk memory-shielding support greatly increases the number of local memory accesses that a memory-shielded application makes and reduces the amount of cross-node memory bus contention.

A RAM disk is a portion of RAM that is used as if it were a disk drive. RAM disks have fixed sizes and act like regular disk partitions. Access time is much faster for a RAM disk than for a physical disk. Data stored on a RAM disk is lost when the system is powered down, so RAM disks provide a good method for storing temporary data that requires fast access.

In RedHawk 6.0, support has been added that gives the user the ability to specify the NUMA node(s) where the memory of a RAM disk will reside. A user may control, on a per-RAM-disk basis, the NUMA nodes that will hold the memory pages of the RAM disk. This new feature can further enhance the performance of memory-shielded applications.

RAM Disk NUMA Node Interface

A user can specify the NUMA node page placement of a RAM disk through a /proc file interface. Additionally, the per-node page count values for a RAM disk may also be viewed.

The /proc/driver/ram/ram[n]/nodemask file may be used to view and to set the NUMA nodes where the RAM disk pages should reside, where the nodemask is a hexadecimal bitmask value. By default, RAM disk pages may reside on any NUMA node. For example, the default nodemask for RAM disk 2 on an 8 NUMA node system would be:

    # cat /proc/driver/ram/ram2/nodemask
    00000000,000000ff

To set up RAM disk 2 so that its pages are allocated in NUMA node 3, the following commands are issued to set and then check the nodemask value:

    # /bin/echo 8 > /proc/driver/ram/ram2/nodemask
    # cat /proc/driver/ram/ram2/nodemask
    00000000,00000008

After pages have been allocated for a RAM disk, the /proc/driver/ram/ram[n]/stat file may be used to view the per-node page counts. For example:

    # cat /proc/driver/ram/ram2/stat
    Preferred node 3 : 2048
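
The nodemask can also be set from a program by writing to the same /proc file. Below is a minimal C sketch equivalent to the /bin/echo command above; the file path and node number match that example:

    #include <stdio.h>

    /* Restrict RAM disk 2's pages to NUMA node 3, equivalent to:
     *     /bin/echo 8 > /proc/driver/ram/ram2/nodemask
     * The value written is a hexadecimal bitmask; bit 3 selects node 3. */
    int main(void)
    {
        FILE *fp = fopen("/proc/driver/ram/ram2/nodemask", "w");
        if (fp == NULL) {
            perror("fopen");
            return 1;
        }
        fprintf(fp, "%x\n", 1u << 3);   /* writes "8" */
        fclose(fp);
        return 0;
    }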

Performance Measurements

This paper discusses two test scenarios that demonstrate the potential performance and determinism advantages of using RAM disk memory that is local to the processes accessing the RAM disk, compared to RAM disk memory that resides in a remote NUMA node. The RedHawk Linux memory-shielded RAM disk support was used in these tests to control the NUMA node placement of the RAM disk pages. Although the test scenarios are not necessarily typical of the memory and RAM disk usage patterns of an actual application, some portion of the differences observed in these test scenarios would be seen in actual real-world applications.

The tests were executed on a quad-processor Opteron system with four NUMA nodes and four cores per node. The tests described below utilized two of the four NUMA nodes in the system.

Testing Methodology

A file write/read program was used in both test scenarios. This program opens either a /dev/ram[n] file directly, or a file that resides within an already mounted RAM disk file system. The O_DIRECT open(2) flag is used so that the RAM disk accesses bypass the file system page cache and cause direct file data copies between the user's buffers in user-space and the RAM disk memory in kernel-space.

The program first obtains a starting time using clock_gettime(2), and then executes either a loop of lseek(2) and write(2) calls, or a loop of lseek(2) and read(2) calls, on the open file. After executing this set of system service calls, clock_gettime(2) is again used to obtain the ending time. The pseudo-code for the lseek and write loop is shown below:

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (loop = 0; loop < loopcount; loop++) {
        lseek(fd, 0, SEEK_SET);
        write(fd, buffer, size);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
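
For reference, a self-contained version of this write-timing loop might look like the sketch below. The device path and the 8 MB transfer size match Test Scenario 1, but the loop count is illustrative and the paper does not show this exact program. Note that O_DIRECT requires a suitably aligned user buffer, hence posix_memalign(3):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t size = 8 * 1024 * 1024;  /* 8 MB, as in Test Scenario 1 */
        const int loopcount = 100;            /* illustrative loop count */
        struct timespec start, end;
        void *buffer;
        int loop;

        /* O_DIRECT bypasses the file system page cache, so data is
           copied directly between this buffer and the RAM disk. */
        int fd = open("/dev/ram0", O_RDWR | O_DIRECT);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* O_DIRECT I/O requires an aligned user buffer; page-size
           (4096-byte) alignment satisfies typical requirements. */
        if (posix_memalign(&buffer, 4096, size) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            return 1;
        }
        memset(buffer, 0xab, size);

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (loop = 0; loop < loopcount; loop++) {
            lseek(fd, 0, SEEK_SET);
            if (write(fd, buffer, size) != (ssize_t)size) {
                perror("write");
                return 1;
            }
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        printf("average write time: %.3f ms\n",
               ((end.tv_sec - start.tv_sec) * 1e3 +
                (end.tv_nsec - start.tv_nsec) / 1e6) / loopcount);
        return 0;
    }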

In addition to this RAM disk write/read loop test, a CPU memory write/read test that continually writes and reads to/from user-space memory buffers was used in the first test scenario below in order to generate CPU memory access traffic. This test was written in a way that makes its CPU memory accesses tend to cause CPU cache misses, thus resulting in more actual physical (non-cached) memory accesses.

Test Scenario 1

In this scenario, one file write/read program was executed on the first NUMA node. The program opened /dev/ram0 and issued 8 MB writes and reads directly to the raw RAM disk device.

Using Remote RAM Disk Pages

[Figure: the file write/read program (1) runs on the first NUMA node; the four CPU memory write/read programs and the RAM disk pages reside on the second NUMA node.]

Initially, the RAM disk pages were set up to reside on a second (remote) NUMA node. With no other activity on the remote node, the timings for accessing the RAM disk were:

    writes: 8.916 milliseconds
    reads:  8.700 milliseconds

Then, when four copies (one copy per CPU) of the CPU memory write/read program were executed on the remote NUMA node where the RAM disk pages resided, the timings for accessing the RAM disk increased to:

    writes: 13.647 milliseconds
    reads:  12.981 milliseconds

Thus, memory contention when accessing the remote RAM disk caused an increase of 53% for the write calls and 49% for the read calls in this test scenario.

Using Local RAM Disk Pages

[Figure: the file write/read program (1) and the RAM disk pages reside on the first NUMA node; the four CPU memory write/read programs run on the second NUMA node.]

The RAM disk pages were then placed in the local NUMA node, and the same single file write/read program was executed on the first NUMA node. When there was no activity on the second NUMA node, the timings were:

    writes: 8.414 milliseconds
    reads:  8.501 milliseconds

Then, when the same four copies of the CPU memory write/read program were executed on the second NUMA node, the timings increased to:

    writes: 8.800 milliseconds
    reads:  8.917 milliseconds

When the RAM disk page accesses were to local NUMA node memory, intense memory activity on the second NUMA node increased the RAM disk file write and read times by only about 5%. This test scenario demonstrates the benefits of localizing RAM disk accesses on a NUMA system when memory activity is occurring on other NUMA nodes.
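
The paper does not show how the test programs themselves were biased to particular NUMA nodes; RedHawk provides its own shielding and CPU biasing tools for this purpose. Purely as an illustration, on a generic NUMA-aware Linux system a similar placement could be approximated with numactl(8), where the program name filewr is hypothetical:

    # numactl --cpunodebind=0 --membind=0 ./filewr /dev/ram0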

Test Scenario 2

In this test scenario:

- Two RAM disks were used: one disk's pages resided in the first NUMA node, and the second disk's pages resided in the second NUMA node.
- An ext3 file system was created in each RAM disk, and both ext3 file systems were mounted (a plausible command sequence is sketched below).
- Four copies of the file write/read program were executed on each of the two NUMA nodes, for a total of eight copies simultaneously executing on the system.
- Each of the eight copies of the program wrote and read its own 2 MB file located on one of the mounted RAM disk file systems. Each write and read was 2 MB in size.
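
The exact commands used to create and mount the file systems are not listed in the paper; a plausible sequence, with hypothetical mount points, is:

    # mkfs.ext3 -q /dev/ram0
    # mkfs.ext3 -q /dev/ram1
    # mkdir -p /mnt/ram0 /mnt/ram1
    # mount /dev/ram0 /mnt/ram0
    # mount /dev/ram1 /mnt/ram1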

Local RAM Disk Files

[Figure: four file write/read programs run on each of the two NUMA nodes; each group of programs accesses the RAM disk whose pages reside on its own node.]

In the first case, each of the eight file write/read programs accessed a RAM disk file that resided in local NUMA node memory. The highest timings observed were:

    writes: 6.968 milliseconds
    reads:  7.050 milliseconds

Remote Memory RAM Disk Files

[Figure: four file write/read programs run on each of the two NUMA nodes; each group of programs accesses the RAM disk whose pages reside on the other node.]

In the second case, each of the eight file write/read programs accessed a RAM disk file that resided in the remote NUMA node's memory. The highest timings observed in this case were:

    writes: 8.301 milliseconds
    reads:  8.207 milliseconds

Performance Difference

When the RAM disk ext3 file system files were accessed remotely, the writes took 19% longer and the reads took 16% longer.

Summary

The performance measurements discussed in this paper have shown that using the RedHawk Linux memory-shielded RAM disk support to place RAM disk pages in the same NUMA node where the RAM disk is most frequently accessed can result in higher performance and reduced memory contention.

About Concurrent Computer Corporation

Concurrent (NASDAQ: CCUR) is a global leader in innovative solutions serving the aerospace and defense, automotive, broadcasting, cable and broadband, financial, IPTV, and telecom industries. Concurrent Real-Time is one of the industry's foremost providers of high-performance real-time computer systems, solutions, and software for commercial and government markets, focusing on areas that include hardware-in-the-loop and man-in-the-loop simulation, data acquisition, and industrial systems and software. Operating worldwide, Concurrent provides sales and support from offices throughout North America, Europe, Asia and Australia. For more information, please visit Concurrent Real-Time Linux Solutions at http://real-time.ccur.com or call (800) 666-4544.

© 2011 Concurrent Computer Corporation. Concurrent Computer Corporation and its logo are registered trademarks of Concurrent. All other Concurrent product names are trademarks of Concurrent, while all other product names are trademarks or registered trademarks of their respective owners. Linux is used pursuant to a sublicense from the Linux Mark Institute.