Datacenter Operating Systems

Similar documents
Next Generation Operating Systems

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

Big Data Technologies for Ultra-High-Speed Data Transfer and Processing

Arrakis: The Operating System is the Control Plane

D1.2 Network Load Balancing

How Solace Message Routers Reduce the Cost of IT Infrastructure

ODP Application proof point: OpenFastPath. ODP mini-summit

Network Virtualization Technologies and their Effect on Performance

OpenFlow with Intel Voravit Tanyingyong, Markus Hidell, Peter Sjödin

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

ioscale: The Holy Grail for Hyperscale

Part V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts

A Close Look at PCI Express SSDs. Shirish Jamthe Director of System Engineering Virident Systems, Inc. August 2011

Intel DPDK Boosts Server Appliance Performance White Paper

Broadcom Ethernet Network Controller Enhanced Virtualization Functionality

Enhancing Hypervisor and Cloud Solutions Using Embedded Linux Iisko Lappalainen MontaVista

VMWARE WHITE PAPER 1

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

DPDK Summit 2014 DPDK in a Virtual World

Virtualization for Cloud Computing

Linux Performance Optimizations for Big Data Environments

Evaluation Report: Emulex OCe GbE and OCe GbE Adapter Comparison with Intel X710 10GbE and XL710 40GbE Adapters

Leveraging NIC Technology to Improve Network Performance in VMware vsphere

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Delivering Quality in Software Performance and Scalability Testing

Eloquence Training What s new in Eloquence B.08.00

Performance Management for Cloudbased STC 2012

10Gb Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Latency-Sensitive Applications

InfiniBand Software and Protocols Enable Seamless Off-the-shelf Applications Deployment

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University

State of the Art Cloud Infrastructure

Hypertable Architecture Overview

The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links. Filippo Costa on behalf of the ALICE DAQ group

I3: Maximizing Packet Capture Performance. Andrew Brown

High-Density Network Flow Monitoring

Cloud Optimize Your IT

The virtualization of SAP environments to accommodate standardization and easier management is gaining momentum in data centers.

Cloud Computing through Virtualization and HPC technologies

Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan

Scala Storage Scale-Out Clustered Storage White Paper

Web Technologies: RAMCloud and Fiz. John Ousterhout Stanford University

High-performance vnic framework for hypervisor-based NFV with userspace vswitch Yoshihiro Nakajima, Hitoshi Masutani, Hirokazu Takahashi NTT Labs.

Scaling from Datacenter to Client

Advanced Computer Networks. Network I/O Virtualization

The Flash Transformed Data Center & the Unlimited Future of Flash John Scaramuzzo Sr. Vice President & General Manager, Enterprise Storage Solutions

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Violin: A Framework for Extensible Block-level Storage

Can High-Performance Interconnects Benefit Memcached and Hadoop?

10G Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Future Cloud Applications

Virtual Network Exceleration OCe14000 Ethernet Network Adapters

Achieving Performance Isolation with Lightweight Co-Kernels

Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308

I/O virtualization. Jussi Hanhirova Aalto University, Helsinki, Finland Hanhirova CS/Aalto

Introduction. Application Performance in the QLinux Multimedia Operating System. Solution: QLinux. Introduction. Outline. QLinux Design Principles

Optimizing SQL Server Storage Performance with the PowerEdge R720

DISTRIBUTED COMPUTER SYSTEMS CLOUD COMPUTING INTRODUCTION

Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1.

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

High Frequency Trading and NoSQL. Peter Lawrey CEO, Principal Consultant Higher Frequency Trading

Data Centers and Cloud Computing

LS DYNA Performance Benchmarks and Profiling. January 2009

Client-aware Cloud Storage

Understanding Data Locality in VMware Virtual SAN

Resource Efficient Computing for Warehouse-scale Datacenters

SharePoint 2010 Performance and Capacity Planning Best Practices

IO Visor: Programmable and Flexible Data Plane for Datacenter s I/O

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

Enabling High performance Big Data platform with RDMA

Microkernels, virtualization, exokernels. Tutorial 1 CSC469

Scaling in a Hypervisor Environment

Oracle Database Deployments with EMC CLARiiON AX4 Storage Systems

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

THE CLOUD STORAGE ARGUMENT

Xen and XenServer Storage Performance

RoCE vs. iwarp Competitive Analysis

Distributed File Systems

Virtualization. Dr. Yingwu Zhu

How Linux kernel enables MidoNet s overlay networks for virtualized environments. LinuxTag Berlin, May 2014

Dell s SAP HANA Appliance

Improving the performance of data servers on multicore architectures. Fabien Gaud

Cray DVS: Data Virtualization Service

White Paper. Recording Server Virtualization

Broadcom Ethernet Network Controller Enhanced Virtualization Functionality

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

Server-i-zation of Storage Jeff Sosa Sr. Director of Product Management, Virident

SMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation

Feature Comparison. Windows Server 2008 R2 Hyper-V and Windows Server 2012 Hyper-V

MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?

High-performance vswitch of the user, by the user, for the user

Removing Failure Points and Increasing Scalability for the Engine that Drives webmd.com

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

Operating Systems Design 16. Networking: Sockets

Converged storage architecture for Oracle RAC based on NVMe SSDs and standard x86 servers

HRG Assessment: Stratus everrun Enterprise

Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture

A Deduplication File System & Course Review

Achieving a High-Performance Virtual Network Infrastructure with PLUMgrid IO Visor & Mellanox ConnectX -3 Pro

Transcription:

Datacenter Operating Systems CSE451 Simon Peter With thanks to Timothy Roscoe (ETH Zurich) Autumn 2015

This Lecture What s a datacenter Why datacenters Types of datacenters Hyperscale datacenters Major problem: Server I/O performance Arrakis, a datacenter OS Addresses the I/O performance problem (for now)

What s a Datacenter? Large facility to house computer systems 10,000s of machines Independently powered Consumes as much power as a small town First built in the early 2000s In the wake of the Internet Runs a large portion of the digital economy

Why Datacenters? Consolidation Run many people s workloads on the same infrastructure Use infrastructure more efficiently (higher utilization) Leverage workload synergies (eg., caching) Virtualization Build your own private infrastructure quickly and cheaply Move it around anywhere, anytime Automation No need for expensive, skilled IT workers Expertise is provided by the datacenter vendor

Types of Datacenters Supercomputers Compute intensive Scientific computing: weather forecast, simulations, Hyperscale (this lecture) I/O intensive => Makes for cool OS problems Large-scale web services: Google, Facebook, Twitter, Cloud Virtualization intensive Everything else: Smaller businesses (eg., Netflix)

Hyperscale Datacenters Hyperscale: Provide services to billions of users Users expect response at interactive timescales Within milliseconds Examples: Web search, Gmail, Facebook, Twitter Built as multi-tier application Front end services: Load balancer, web server Back end services: database, locking, replication Hundreds of servers contacted for 1 user request Millions of requests per second per server

Hyperscale: I/O Problems Hardware trend Network & stoage speeds keep on increasing 10-100 Gb/s Ethernet Flash storage CPU frequencies don t 2-4 GHz Example system: Dell PowerEdge R520 + + = Intel X520 Intel RS3 RAID 10G NIC 1GB flash-backed cache 2 us / 1KB packet 25 us / 1KB write Sandy Bridge CPU 6 cores, 2.2 GHz

Hyperscale: OS I/O Problems OS problem Traditional OS: Kernel-level I/O processing => slow Shared I/O stack => Complex Layered design => Lots of indirection Lots of copies

Receiving a packet in BSD Stream Datagram TCP UDP ICMP IP Receive queue Kernel Network interface

Receiving a packet in BSD Stream Datagram TCP UDP ICMP Kernel IP Receive queue 1. Interrupt 1.1 Allocate mbuf 1.2 Enqueue packet 1.3 Post s/w interrupt Network interface

Receiving a packet in BSD Stream Datagram TCP UDP ICMP IP 2. S/W Interrupt High priority IP processing TCP processing Enqueue on Receive queue Kernel Network interface

Receiving a packet in BSD Stream Datagram 3. Access control Copy mbuf to user space TCP UDP ICMP IP Receive queue Kernel Network interface

Sending a packet in BSD Stream Datagram TCP UDP ICMP IP Receive queue Kernel Network interface

Sending a packet in BSD Stream 1. Datagram Access control Copy from user space to mbuf Call TCP code and process Possible enqueue on queue TCP UDP ICMP IP Receive queue Kernel Network interface

Sending a packet in BSD Stream Datagram TCP UDP ICMP IP 2. S/W Interrupt Remaining TCP processing IP processing Enqueue on NIC queue Receive queue Kernel Network interface

Sending a packet in BSD Stream Datagram TCP UDP ICMP IP Kernel Network interface Receive queue 3. Interrupt Send packet Free mbuf

Linux I/O Performance % OF 1KB REQUEST TIME SPENT Redis G ET SET HW 18% HW 13% Kernel 62% Kernel 84% App 20% App 3% 9 us 163 us API Multiplexing Kernel Naming Access control I/O Processing Protection Resource limits I/O Scheduling Copying Data Path 10G NIC 2 us / 1KB packet RAID Storage 25 us / 1KB write

Arrakis Datacenter OS Can we deliver performance closer to hardware? Goal: Skip kernel & deliver I/O directly to applications Reduce OS overhead Keep classical server OS features Process protection Resource limits I/O protocol flexibility Global naming The hardware can help us

Hardware I/O Virtualization Standard on NIC, emerging on RAID Multiplexing SR-IOV: Virtual PCI devices w/ own registers, queues, INTs Protection IOMMU: Devices use app virtual memory Packet filters, logical disks: Only allow eligible I/O I/O Scheduling NIC rate limiter, packet schedulers User-level VNIC 1 SR-IOV NIC Rate limiters Packet filters Network User-level VNIC 2

How to skip the kernel? Redis Library API Multiplexing Kernel Naming Access control I/O Processing Protection Resource limits I/O Scheduling Copying Data Path I/O Devices

Arrakis I/O Architecture Control Plane Data Plane Kernel Naming Access control Resource limits Redis Redis API I/O Processing Data Path I/O Devices Protection Multiplexing I/O Scheduling

Arrakis Control Plane Access control Do once when configuring data plane Enforced via NIC filters, logical disks Resource limits Program hardware I/O schedulers Global naming Virtual file system still in kernel Storage implementation in applications

Global Naming Redis emacs Fast HW ops Indirect IPC interface Virtual Storage Area /tmp/lockfile /var/lib/key_value.db /etc/config.rc open( /etc/config.rc ) Kernel VFS Logical disk

Storage Data Plane: Persistent Data Structures Examples: log, queue Operations immediately persistent on disk Benefits: In-memory = on-disk layout Eliminates marshaling Metadata in data structure Early allocation Spatial locality Data structure specific caching/prefetching Modified Redis to use persistent log: 109 LOC changed

Redis Latency Reduced in-memory GET latency by 65% Linux HW 18% Kernel 62% App 20% 9 us Arrakis HW 33% libio 35% App 32% 4 us Reduced persistent SET latency by 81% Linux (ext4) HW 13% Kernel 84% App 3% 163 us Arrakis HW 77% libio 7% App 15% 31 us

Redis Throughput Improved GET throughput by 1.75x Linux: 143k transactions/s Arrakis: 250k transactions/s Improved SET throughput by 9x Linux: 7k transactions/s Arrakis: 63k transactions/s

memcached Scalability 1200 10Gb/s interface limit 3.1x 1000 800 Throughput (k transactions/s) 600 2x 400 1.8x 200 0 1 2 4 Number of CPU cores Linux Arrakis

Summary OS is becoming an I/O bottleneck Globally shared I/O stacks are slow on data path Arrakis: Split OS into control/data plane Direct application I/O on data path Specialized I/O libraries -level I/O stacks deliver great performance Redis: up to 9x throughput, 81% speedup Memcached scales linearly to 3x throughput

Interested? I am recruiting PhD students I work at UT Austin Apply to UT Austin s PhD program: http://services.cs.utexas.edu/recruit/grad/frontmatter/announcement.html