Introduction. Application Performance in the QLinux Multimedia Operating System. Solution: QLinux. Introduction. Outline. QLinux Design Principles



Similar documents
Application Performance in the QLinux Multimedia Operating System

Deciding which process to run. (Deciding which thread to run) Deciding how long the chosen process can run

OPERATING SYSTEMS SCHEDULING

Datacenter Operating Systems

Deep Dive: Maximizing EC2 & EBS Performance

Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum

ICS Principles of Operating Systems

Measuring Wireless Network Performance: Data Rates vs. Signal Strength

Gigabit Ethernet Design

Performance Analysis of IPv4 v/s IPv6 in Virtual Environment Using UBUNTU

Operating Systems 4 th Class

Enabling Technologies for Distributed Computing

PERFORMANCE TUNING ORACLE RAC ON LINUX

Main Points. Scheduling policy: what to do next, when there are multiple threads ready to run. Definitions. Uniprocessor policies

Improving Quality of Service

CHAPTER 1 INTRODUCTION

Using Linux Clusters as VoD Servers

CPU Scheduling. Basic Concepts. Basic Concepts (2) Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems

Quality of Service in the Internet. QoS Parameters. Keeping the QoS. Traffic Shaping: Leaky Bucket Algorithm

Process Scheduling CS 241. February 24, Copyright University of Illinois CS 241 Staff

Objectives. Chapter 5: CPU Scheduling. CPU Scheduler. Non-preemptive and preemptive. Dispatcher. Alternating Sequence of CPU And I/O Bursts

Cloud Computing through Virtualization and HPC technologies

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

Chapter 5 Process Scheduling

Enabling Technologies for Distributed and Cloud Computing

Where IT perceptions are reality. Test Report. OCe14000 Performance. Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine

Xen and the Art of. Virtualization. Ian Pratt

CPU Scheduling. CPU Scheduling

Ready Time Observations

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Distributed File System Performance. Milind Saraph / Rich Sudlow Office of Information Technologies University of Notre Dame

Chapter 1: Introduction. What is an Operating System?

Network Performance Optimisation and Load Balancing. Wulf Thannhaeuser

W4118 Operating Systems. Instructor: Junfeng Yang

Linux Process Scheduling Policy

Improving the Database Logging Performance of the Snort Network Intrusion Detection Sensor

Azure VM Performance Considerations Running SQL Server

TRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes

Question: 3 When using Application Intelligence, Server Time may be defined as.

Operating Systems Concepts: Chapter 7: Scheduling Strategies

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

Effects of Filler Traffic In IP Networks. Adam Feldman April 5, 2001 Master s Project

Design Issues in a Bare PC Web Server

Oracle Database Scalability in VMware ESX VMware ESX 3.5

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

Xen Live Migration. Networks and Distributed Systems Seminar, 24 April Matúš Harvan Xen Live Migration 1

QoS Parameters. Quality of Service in the Internet. Traffic Shaping: Congestion Control. Keeping the QoS

CPU Scheduling Outline

Chapter 5 Cloud Resource Virtualization

Handling Multimedia Under Desktop Virtualization for Knowledge Workers

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

- An Essential Building Block for Stable and Reliable Compute Clusters

VIA CONNECT PRO Deployment Guide

InterScan Web Security Virtual Appliance

Comparison of Web Server Architectures: a Measurement Study

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

Operating Systems Lecture #6: Process Management

Operating System: Scheduling

LCMON Network Traffic Analysis

Road Map. Scheduling. Types of Scheduling. Scheduling. CPU Scheduling. Job Scheduling. Dickinson College Computer Science 354 Spring 2010.

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?

ò Paper reading assigned for next Thursday ò Lab 2 due next Friday ò What is cooperative multitasking? ò What is preemptive multitasking?

Petascale Software Challenges. Piyush Chaudhary High Performance Computing

Lecture 16: Quality of Service. CSE 123: Computer Networks Stefan Savage

VIA COLLAGE Deployment Guide

SIDN Server Measurements

I/O virtualization. Jussi Hanhirova Aalto University, Helsinki, Finland Hanhirova CS/Aalto

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Quantum StorNext. Product Brief: Distributed LAN Client

CHAPTER 15: Operating Systems: An Overview

A Performance Study on Internet Server Provider Mail Servers

Optimizing TCP Forwarding

GraySort on Apache Spark by Databricks

Readings for this topic: Silberschatz/Galvin/Gagne Chapter 5

<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

How To Monitor And Test An Ethernet Network On A Computer Or Network Card

HPC performance applications on Virtual Clusters

Capacity Estimation for Linux Workloads

1 Storage Devices Summary

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen

TCP Offload Engines. As network interconnect speeds advance to Gigabit. Introduction to

How to Perform Real-Time Processing on the Raspberry Pi. Steven Doran SCALE 13X

RevoScaleR Speed and Scalability

Introduction to the NI Real-Time Hypervisor

Storage I/O Control: Proportional Allocation of Shared Storage Resources

Putting it on the NIC: A Case Study on application offloading to a Network Interface Card (NIC)

Real-Time Scheduling 1 / 39

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

Processor Scheduling. Queues Recall OS maintains various queues

Types Of Operating Systems

Cloud Operating Systems for Servers

20 Command Line Tools to Monitor Linux Performance

Transcription:

Application Performance in the QLinux Multimedia Operating System Sundaram, A. Chandra, P. Goyal, P. Shenoy, J. Sahni and H. Vin Umass Amherst, U of Texas Austin ACM Multimedia, 2000 Introduction General purpose operating systems handline diverse set of tasks Conventional best-effort with low response time + Ex: word processor Throughput intensive applications + Ex: compilation Soft real-time applications + Ex: streaming media Many studies show can do one at a time, but when do two or more grossly inadequate MPEG-2 when compiling has a lot of jitter Introduction Reason? Lack of service differentiation Provide best-effort to all Special-purpose operating systems are similarly inadequate for other mixes Need OS that: Multiplexes resources in a predictable manner Service differentiation to meet individual application requirements Solution: QLinux Solution: QLinux (the Q is for Quality) Enhance standard Linux Hierarchical schedulers + classes of applications or individual applications CPU, Network, Disk Outline QLinux philosophy CPU Scheduler Evaluation Packet Scheduler Evaluation Disk Scheduler Evaluation Lazy Receiver Processing Evaluation Conclusion QLinux Design Principles Support for Multiple Service Classes Interactive, Throughput-Intensive, Soft Real-time Low average response times, high aggregate throughput, performance guarantees Predictable Resource Allocation Priority not enough (starvation of others) Ex: mpeg_decoder at highest can starve kernel QLinux uses rate-based rather than priority based + Weight based on rate for each: w i / Σ j w j Not static partitioning since unused can be used by others 1

QLinux Design Principles Service Differentiation Within a class, applications treated differently Uses hierarchical schedulers Top level gives resources to class In each class, can allocate resources appropriately among all applications Support for Legacy Applications Support binaries of all existing applications (no special system calls required) No worse performance (but may be better) QLinux Design Principles Proper Accounting of Resource Usage Application level CPU easy Kernel resources hard + Load from interrupts difficult to charge to process + Many kernel tasks are system-wide Lazy receiver processing + Defer packet processing when receiver asks CPU scheduler allocation holds even when kernel uses up various amounts of CPU QLinux Components Hierarchical Start-time Fair Queuing (H-SFQ) CPU Scheduler (Typical OS?) Uses a tree Each thread belongs to 1 leaf Each leaf is an application class Weights are of parent class Each node has own scheduler Uses Start-Time Fair Queuing at top for time for each H-SFQ CPU Scheduler Nodes can be created on the fly Threads can move from node to node Defaults to top-level fair scheduler if not specified Utilities to do external from application Allow support of legacy apps without modifying source Experimental Setup (for all) Cluster of PCs P2-350 MHz 64 MB RAM RedHat 6.1 QLinux based on Linux 2.2.0 Network 100 Mb/s 3-Com Ethernet 3Com Superstack II switch (100 Mb/s) Assume machines and net lightly loaded 2

Experimental Workloads Inf: executes infinite loop Compute-intensive, Best effort Mpeg_play: Berkeley MPEG-1 decoder Compute-intensive, Soft real-time Apache Web Server and Client I/O intensive, Best effort Streaming media server I/O intensive, Soft real-time Net_Inf: send UDP as fast as possible I/O instensive, Best effort Dhrystone: measure CPU performance Compute-instensive, Best effort Lmbench: measure I/O, cache, memory perf CPU Scheduler Evaluation-1 Two classes, run Inf for each Assign weights to each (ex: 1:1, 1:2, 1:4) Count the number of loops CPU Scheduler Evaluation-1 CPU Scheduler Evaluation-2 Two classes, equal weights (1:1) Run two Inf Suspend one at t=250 seconds Restart at t=330 seconds Note count count is proportional to CPU bandwidth allocated CPU Scheduler Evaluation-2 CPU Scheduler Evaluation-3 Two classes: soft real-time & best effort (1:1) Run: MPEG_PLAY in real-time (1.49 Mbps) Dhrystone in best effort Increase Dhrystone s from 1 to 2 to 3 Note MPEG bandwidth Re-run experiment with Vanilla Linux (Counts twice as fast when other suspended) 3

CPU Scheduler Evaluation-3 CPU Scheduler Evaluation-4 Explore another best-effort case Run two Web servers (representing, say 2 different domains) Have clients generate many requests See if CPU bandwidth allocation is proportional CPU Scheduler Evaluation-4 CPU Scheduler Overhead Evaluation Scheduler takes some overhead since recursively called Run Inf at increasing depth in scheduler hierarchy tree Record count for 300 seconds CPU Scheduler Overhead Evaluation QLinux Components 4

H-SFQ Packet Scheduler Typical OS uses FIFO scheduler for outgoing packets Use H-SFQ (Fair Queue) to schedule Each leaf is one or more queues of packets Weights for queues Unused bandwidth to others H-SFQ Packet Scheduler Operations on the fly Associate with queue via setsockopt() Packet Scheduler Evaluation-1 Two classes using Net_inf Run two receivers to count received packets 8KB packets Packet Scheduler Evaluation-1 (Different packets sizes?) Packet Scheduler Evaluation-2 Packet Scheduler Evaluation-3 Real-world applicatis Streaming media server in soft real-time class Increasing number of Net_inf apps Compare QLinux with Vanilla Linux 5

Packet Scheduler Evaluation-3 Packet Scheduler Overhead Evaluation (Me note, degradation not linear) Combined Packet and Scheduler Evaluation Web server and several I/O intensive apps Two classes in CPU and Packet scheduler Web server in one All I/O intensive Net_inf in other Web server driven by trace (ClarkNet) Increase number of Net_inf Compare to Vanilla Linux Packet/CPU Evaluation Qlinux degrades at 8 ideas why? QLinux Components Cello Disk Scheduler Typical OS uses SCAN for disk Cello 2 levels: class independ, class specific 3 classes Class specific decides when and how many to move Class ind puts where Lastly moved FCFS (Badri s thesis) 6

Cello Disk Scheduler Evaluation QLinux Components (None in this paper) (Previous paper at SIGMetrics) Lazy Receiver Processing (LRP) Process A running Packet arrives for process B Interrupt, IP, TCP, Enqueue gets charged to A! LRP postpones until process does a read Tricky! Some steps, e.g. TCP ack, requires it to happen right away Special thread for each process for packets QLinux uses special queues, decodes only as far as needed Special queue for ICMP, ARP LRP Evaluation and Run 2 Apache Web Servers Lightly loaded, retrieve 2KB file in 51ms Bombard 1 server with DoS by sending 300 requests/sec Other server load went to 70ms Re-run with Vanilla Linux Other server load went to 80ms QLinux Total System Evaluation Run lmbench System call overhead Context switch times Network I/O File I/O Memory perofrmance QLinux vs. Vanilla Linux QLinux Total System Evaluation Not much overall. Context switch overhead, but 100 ms time slice QLinux untuned, so could be better 7

Conclusion Future Work? Qlinux provides CPU scheduler Packet scheduler Disk scheduler Proper I/O processing Provide fair and predictable allocation Multimedia and Web applications can benefit Overhead is low All conventional operating systems should incorporate Future Work Disk scheduler results Multiprocessors Fair allocation of other I/O interrupts Other devices since Cello disk specific RAID, tape, 8