Realtime Linux Kernel Features
Tim Burke, Red Hat, Director, Emerging Technologies
Special guest appearance: Ted Tso of IBM
Realtime: what does it mean to you?
Agenda
- What? Terminology, target capabilities
- How? Highlights of upstream realtime kernel development
- Performance results
- IBM realtime Java, by Ted Tso
Throughput & variability
- Average times vary considerably
- No prioritization
- Favors throughput (traffic volume / time)
Predictability, deterministic
- High-speed lane
- Prioritization
- Determinism = consistent response time = low latency
- Downside: average throughput may decrease
Example use cases
- Financial services: time is money, consistency is law
- Government: command & control
- Telco & network infrastructure
- Industrial automation
Low latency response time strategies
- Increase concurrency
- Remove serialization
- Split up lengthy operations
- Downside: not everyone needs a 4-person tire-changing pit crew
What? Deterministic latency
- Predictability in response time
  - Ensure that highest priority processes run first
  - Deterministic upper bound on latency
- Not guaranteed response time, but greatly reduced probability of poor response
- Example: 100,000 transactions in a minute; 99,999 at 2 ms and 1 at 20 ms isn't good enough
- Avoid non-deterministic kernel black holes and serialization points
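The transaction example can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical and the nearest-rank percentile is one common convention, not something the slides specify. It shows that the mean, and even the 99.999th percentile, look fine while the worst case is ten times slower.

```python
def nearest_rank(ordered, numerator, denominator):
    """Nearest-rank percentile (numerator/denominator) of an ascending list."""
    # Integer ceiling of len * numerator / denominator, avoiding float rounding.
    rank = (len(ordered) * numerator + denominator - 1) // denominator
    return ordered[max(rank, 1) - 1]


def latency_summary(latencies_ms):
    """Summarize response times by mean, 99.999th percentile, and worst case."""
    ordered = sorted(latencies_ms)
    return {
        "mean_ms": sum(ordered) / len(ordered),
        "p99_999_ms": nearest_rank(ordered, 99_999, 100_000),
        "max_ms": ordered[-1],
    }


# The slide's example: 99,999 transactions at 2 ms and one at 20 ms.
samples = [2.0] * 99_999 + [20.0]
summary = latency_summary(samples)
print(summary)
```

Only the maximum exposes the 20 ms outlier, which is why the target is stated as an upper bound rather than an average.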
What? Timing targets
- Deterministic upper bound on latency
- 10 µs sleep time, vs 2 ms in stock RHEL5
- Context switch latency under 25 µs, 99.9999% under 20 µs (from interrupt to commence running new process)
- Highly accurate gettimeofday() with nanosecond resolution
  - Average of 38% improvement vs stock RHEL5
  - Can be tuned to µs accuracy, vs the ms-based system call performance optimization
- Note: lower bound on timing constraints is hardware dependent
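A rough way to see sleep-time behavior on a given system is to request a short sleep repeatedly and record how late each wakeup is. This is a minimal sketch, not the test method behind the figures above: the function name is hypothetical, and interpreter overhead means the numbers will be well above what a C test program would report.

```python
import time


def measure_sleep_overshoot_us(requested_us, samples=200):
    """Return how far each sleep ran past the requested time, in microseconds."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(requested_us / 1_000_000)
        elapsed_ns = time.perf_counter_ns() - start
        overshoots.append(elapsed_ns / 1000 - requested_us)
    return overshoots


results = measure_sleep_overshoot_us(100, samples=50)
ordered = sorted(results)
print(f"median overshoot: {ordered[len(ordered) // 2]:.1f} us")
print(f"worst overshoot:  {ordered[-1]:.1f} us")
```

The worst value, not the median, is the one to compare against a latency target.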
How? Upstream community development: the rt patchset
- A working sandbox for a community of developers & users
- The recognized Linux epicenter for RT; active work in progress
- Comprehensive RT enablement, not just a quick VM/scheduler hack
- A growing collection of many features. Not just 1 thing.
- Many of its features & enabling cleanups are already upstream.
- Large patch set in core kernel codepaths (interrupts, scheduler, locks, etc).
  - 120,000 lines at its height. 80K after 2.6.21.
  - Many more in 2.6.19-21. Other parts will go into .22, .23, ...
- x86 & x86_64 (Intel & AMD)
How? Community Kernel Project
Upstream rt developer participants and approximate contribution rate:
- 45% Ingo Molnar: lead developer and overall upstream leader. Focus on scheduler, locking, interrupts. Red Hat fulltime employee
- 35% Thomas Gleixner: developer, primarily concentrating on timers. Contractor to Red Hat
- 10% Steven Rostedt: developer. Red Hat fulltime employee
- 10% all other participants (IBM, Monta Vista, Timesys, Novell, etc)
All efforts are ultimately shaped towards long-term mainstream inclusion.
Additional Red Hat crew:
- Testing & test development
- Fixing
- Tool development: system mgt & performance monitoring
Real time kernel work upstream
- Work over the last 2+ years. 90% Red Hat, 1.2 million lines
- Mostly maintained in rt tree
- Over the last year, many patches have moved from rt to mainline kernel:
  - BKL preemptable (2.6.8)
  - Mutex patch (2.6.16)
  - Semaphore to Mutex conversion (ongoing, ~85% done)
  - Hrtimers subsystem (2.6.16)
  - Robust futexes (2.6.17)
  - Lock validator (2.6.18)
  - Priority inheritance futexes (PI futex) (2.6.18)
  - Generic IRQ layer (2.6.18)
  - Core time rewrite (2.6.18)
  - Sleepable RCU (2.6.19)
  - Latency Tracer (2.6.19)
  - High res + dynticks (2.6.21)
  - Conversion of spin locks to mutex (2.6.22+)
  - All interrupt handling in threads (~2.6.22+)
  - Full rt preempt (~2.6.23+)
- Items 2.6.18 and prior are in RHEL5
Upstream success: improve kernel lock synchronization
- Improve granularity: identify and correct contention points
- Mutex rather than semaphores
  - Mutexes are lighter weight
- Lock validator
  - Efficient runtime confirmation of lock ordering
  - Can detect race conditions without actually hitting them
- Priority Inheritance (PI)
  - Prevents low priority processes blocking higher priority.
  - Problem scenario:
    - Low priority process takes lock
    - High priority process needs lock, but must wait
    - Long running medium priority process preempts low priority process
  - Solution: temporarily boost low pri process to allow completion
  - Required for realtime Java: 1000's of threads
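The priority inheritance scenario above can be replayed with a toy, tick-based scheduler. This is a simplified model under stated assumptions, not kernel code: one CPU, one lock, strict priority scheduling, and hypothetical task parameters chosen only to make the effect visible.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int     # larger number = more important
    release: int      # tick at which the task becomes ready
    work: int         # ticks of CPU time needed
    needs_lock: bool  # True if the task holds the shared lock for all its work


def simulate(tasks, priority_inheritance):
    """Return the tick at which each task finishes."""
    remaining = {t.name: t.work for t in tasks}
    finish = {}
    owner = None  # task currently holding the lock
    tick = 0
    while len(finish) < len(tasks):
        ready = [t for t in tasks if t.release <= tick and remaining[t.name] > 0]
        blocked = [t for t in ready
                   if t.needs_lock and owner is not None and owner is not t]
        runnable = [t for t in ready if t not in blocked]

        def effective(t):
            # With PI, the lock owner borrows the priority of its highest waiter.
            if priority_inheritance and t is owner and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        if runnable:
            current = max(runnable, key=effective)
            if current.needs_lock and owner is None:
                owner = current
            remaining[current.name] -= 1
            if remaining[current.name] == 0:
                finish[current.name] = tick + 1
                if owner is current:
                    owner = None  # release the lock on completion
        tick += 1
    return finish


tasks = [
    Task("low", priority=1, release=0, work=4, needs_lock=True),
    Task("high", priority=3, release=1, work=2, needs_lock=True),
    Task("medium", priority=2, release=2, work=5, needs_lock=False),
]
without_pi = simulate(tasks, priority_inheritance=False)
with_pi = simulate(tasks, priority_inheritance=True)
print("high finishes at tick", without_pi["high"], "without PI")
print("high finishes at tick", with_pi["high"], "with PI")
```

Without the boost, the high priority task waits behind the medium one even though they share no lock. With the boost, the low priority task finishes its locked work first and the high priority task runs next.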
Upstream success: timer precision & interrupt handling
- Timer enhancements
  - Infrastructure cleanup: factor common code, increase fields to represent nanosecond precision
  - Timer precision: utilize high resolution hardware timers at microsecond precision rather than approximate periodic timer interrupt at millisecond precision
  - Generic time-of-day: cleanly accommodate diverse clock sources
  - VDSO gettimeofday(): performance enhancement for millisecond accuracy
  - Dynamic ticks: power savings, no need to take a timer interrupt 1000 times per second on an idle system; transition to low power state (great for OLPC)
- Interrupt handling
  - Generic IRQ mechanism: infrastructure cleanup, factor common code
  - More fine-grained hardware interrupt control
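To see what timer resolution a running system reports to userspace, the clocks can be queried directly. A small sketch using Python's standard `time.get_clock_info()`; the helper name is hypothetical, and the reported resolution reflects the clock API, not a measured wakeup latency.

```python
import time


def describe_clock(name):
    """Return the implementation and advertised resolution of a named clock."""
    info = time.get_clock_info(name)
    return {
        "clock": name,
        "implementation": info.implementation,
        "resolution_s": info.resolution,
        "monotonic": info.monotonic,
    }


for clock_name in ("time", "monotonic", "perf_counter"):
    print(describe_clock(clock_name))
```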
Realtime kernel performance (01/2007)
Testing
- x86_64: em64t, amd64
  - 4-socket dual-core AMD64 2.2 GHz
  - 2-socket dual-core Intel Woodcrest 3 GHz
- Workload: Tibco latency test using standard, unmodified application
- RT kernel 2.6.20-rt4-0096 vs RHEL5 (both tuned)
Results
- Tibco EMS App
  - Improve spikes and variability by 10x
  - Maintain within 20% overall response time
- Lmbench
  - Improvements of context switch rates: +20 to 61%
  - Degradation in file system perf: 21%, due to RT nature of handling file I/O
Red Hat Confidential
RHEL5 tuned vs RT tuned 05/2007
Histogram of Tibco EMS Response Times (Tibco App)
Number of samples per 10k messages, by response time

Response time    RHEL5        rt8
<1 ms            9997642.5    9997630.0
1-2 ms           622.0        28.0
2-5 ms           8264.0       189.0
5-10 ms          11482.0      542.0
10-20 ms         4925.0       400.0
20-50 ms         225.0        211.0
50-100 ms        3.0          0.0
>200 ms          2.0          0.0
Peak (ms)        202.0        42.0
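A histogram like this one can be produced by sorting each measured response time into fixed bins. The sketch below is an assumption about the method, not the tooling used for these results: the names are hypothetical, the sample values are made up, and a 100-200 ms bin is added that the slide does not label.

```python
from bisect import bisect_right

# Bin edges in milliseconds; bins are half-open, [low, high).
BIN_EDGES_MS = [1, 2, 5, 10, 20, 50, 100, 200]
BIN_LABELS = [
    "<1 ms", "1-2 ms", "2-5 ms", "5-10 ms", "10-20 ms",
    "20-50 ms", "50-100 ms", "100-200 ms", ">=200 ms",
]


def bucket_latencies(latencies_ms):
    """Count samples per bin and report the peak response time."""
    counts = [0] * len(BIN_LABELS)
    for value in latencies_ms:
        # bisect_right gives the index of the first edge greater than value.
        counts[bisect_right(BIN_EDGES_MS, value)] += 1
    return dict(zip(BIN_LABELS, counts)), max(latencies_ms)


# Hypothetical measurements, for illustration only.
histogram, peak_ms = bucket_latencies([0.4, 0.7, 1.5, 3.0, 12.0, 42.0])
print(histogram)
print("peak:", peak_ms, "ms")
```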
How? Performance monitoring tools
- Existing standard RHEL5 based performance monitoring tools remain relevant
  - Gdb, OProfile, Frysk: source level debuggers & profiler
  - SystemTap, kprobe: kernel event tracing and dynamic data collection
  - kexec/kdump: standard kernel dump / savecore capabilities
- Latency Tracer: new feature post RHEL5
  - Runtime trace capture of longest latency codepaths
  - Peak detector
  - Selectable triggers for threshold tracing
  - Detailed kernel profiles based on latency triggers
How? No application changes required
- All of the realtime enhancements are in the kernel, under the hood from an application perspective.
- No application changes are required to benefit from realtime enhancements.
  - Applications which are latency bottlenecked due to kernel scheduling and interrupt handling will see benefit.
  - Latencies introduced entirely in userspace (sub-optimal application code, unbounded Java garbage collection, etc) are not eliminated.
- Recompilation is not required (same gcc/glibc as standard RHEL5)
  - Applications recompiled on RHEL5 benefit from pi mutex glibc implementation enhancements to avoid syscall overhead on uncontested locks.
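An unmodified application still has to run at a suitable priority to be scheduled ahead of other work. The slides do not say how that is arranged; one possibility, assumed here, is a small launcher that requests the SCHED_FIFO policy and then execs the unchanged program. The function names and the default priority of 50 are hypothetical, and the request normally needs root or CAP_SYS_NICE.

```python
import os


def clamp_priority(priority, low, high):
    """Keep a requested priority inside the range the policy allows."""
    return max(low, min(high, priority))


def try_set_fifo(priority=50):
    """Try to move the calling process to SCHED_FIFO; return True on success."""
    if not hasattr(os, "sched_setscheduler") or not hasattr(os, "SCHED_FIFO"):
        return False  # platform without the Linux scheduling calls
    low = os.sched_get_priority_min(os.SCHED_FIFO)
    high = os.sched_get_priority_max(os.SCHED_FIFO)
    param = os.sched_param(clamp_priority(priority, low, high))
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, param)  # 0 = this process
    except OSError:
        return False  # typically PermissionError without the privilege
    return True


def launch(argv, priority=50):
    """Request SCHED_FIFO, then replace this process with the target program."""
    try_set_fifo(priority)
    # The scheduling policy is kept across exec, so the program inherits it.
    os.execvp(argv[0], argv)
```

A call such as `launch(["./my_app", "--flag"])` (a hypothetical program) would start the application without touching its code.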
What? The broader picture
- Most use cases are not isolated systems
- Commonly a networked pipeline of systems
[Diagram: Job Scheduler, Load Balancer, Clients, Computation, Storage, Archival (file/db)]
Realtime & Messaging go hand in hand
- Most of the projects that drive realtime requirements include messaging
- Realtime has advantages for any message system, but unique advantages when combined with RHM.
- Red Hat is a co-founder in open standards and implementation of messaging technology
  - Requirements cover: predictability, high speed, reliability, multi-vendor interoperability, security and scalability.
- AMQP (Advanced Message Queuing Protocol): an open standard for message-oriented middleware.
  - In development 3+ years, went public June 2006.
  - End user participation
- Qpid: Apache project implementation of AMQP; upstream for RHM
- Red Hat Enterprise Messaging product offering coming soon.
More info
- Upstream realtime development: http://rt.wiki.kernel.org/
  - Mailing list
  - Kernel source download
  - Test programs
  - Presentations
- Discuss interest with Red Hat sales & marketing
Real Time in the Real World
- The Navy contracted Raytheon to design the computer systems on the next-generation destroyer, the DDG 1000 project. http://www.naval-technology.com/projects/dd21
- Real-time Java from IBM running on real-time Linux was key to Raytheon's choice to partner with IBM instead of Sun.
- IBM blades and the real-time JVM will be used to build the central data centers.
IBM's Websphere Real Time
- A full J2SE JVM with RTSJ support
- With an IBM-exclusive real-time garbage collector
  - 1 ms maximum pause time
  - Uses at most 30% CPU utilization over any 10 ms window
- Available today on a modified 32-bit RHEL4 system from IBM.
- IBM is working with Red Hat to make this available on a productized real-time version of RHEL5
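The two garbage collector limits can be read as a budget: no single pause longer than 1 ms, and no more than 3 ms of collector time in any 10 ms window. The sketch below checks a list of pause intervals against that reading. It is an interpretation of the slide, not IBM's definition: the function names are hypothetical, times are integer microseconds, and pauses are assumed not to overlap.

```python
def max_window_busy_us(pauses_us, window_us):
    """Largest total pause time that falls inside any window of the given length."""
    # An optimal window can always be slid so that it begins at a pause start,
    # so only those start points need to be tried.
    best = 0
    for w_start, _ in pauses_us:
        w_end = w_start + window_us
        busy = sum(max(0, min(end, w_end) - max(start, w_start))
                   for start, end in pauses_us)
        best = max(best, busy)
    return best


def meets_budget(pauses_us, max_pause_us=1_000, window_us=10_000, max_percent=30):
    """Check the longest pause and the worst sliding-window share."""
    longest = max(end - start for start, end in pauses_us)
    busy = max_window_busy_us(pauses_us, window_us)
    # Integer comparison: busy / window <= max_percent / 100.
    return longest <= max_pause_us and busy * 100 <= max_percent * window_us


# Hypothetical pause schedules as (start, end) pairs in microseconds.
within = [(0, 1_000), (4_000, 5_000), (8_000, 9_000)]
too_dense = [(0, 1_000), (2_000, 3_000), (4_000, 5_000), (6_000, 7_000)]
too_long = [(0, 1_500)]
print(meets_budget(within), meets_budget(too_dense), meets_budget(too_long))
```

The first schedule uses exactly 3 ms in its worst 10 ms window, the second uses 4 ms, and the third has a single pause over the 1 ms limit.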
The complete enterprise realtime picture