Linux Kernel Networking. Raoul Rivas



Similar documents
Operating Systems Design 16. Networking: Sockets

Red Hat Linux Internals

Linux Kernel Architecture

Linux Driver Devices. Why, When, Which, How?

Network packet capture in Linux kernelspace

SYSTEM ecos Embedded Configurable Operating System

An Implementation Of Multiprocessor Linux

REAL TIME OPERATING SYSTEM PROGRAMMING-II: II: Windows CE, OSEK and Real time Linux. Lesson-12: Real Time Linux

Linux Scheduler. Linux Scheduler

Operating Systems. Design and Implementation. Andrew S. Tanenbaum Melanie Rieback Arno Bakker. Vrije Universiteit Amsterdam

Outline. Operating Systems Design and Implementation. Chap 1 - Overview. What is an OS? 28/10/2014. Introduction

Kernel Synchronization and Interrupt Handling

Threads Scheduling on Linux Operating Systems

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

IMPROVING PERFORMANCE OF SMTP RELAY SERVERS AN IN-KERNEL APPROACH MAYURESH KASTURE. (Under the Direction of Kang Li) ABSTRACT

Jorix kernel: real-time scheduling

Linux Firewall Lab. 1 Overview. 2 Lab Tasks. 2.1 Task 1: Firewall Policies. Laboratory for Computer Security Education 1

Virtual Private Systems for FreeBSD

Tasks Schedule Analysis in RTAI/Linux-GPL

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

ELEC 377. Operating Systems. Week 1 Class 3

CSC 2405: Computer Systems II

A Survey of Parallel Processing in Linux

Linux scheduler history. We will be talking about the O(1) scheduler

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Processes and Non-Preemptive Scheduling. Otto J. Anshus

Session NM059. TCP/IP Programming on VMS. Geoff Bryant Process Software

Module 20: The Linux System

Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6

W4118 Operating Systems. Junfeng Yang

TCP/IP Fundamentals. OSI Seven Layer Model & Seminar Outline

Shared Address Space Computing: Programming

CS 416: Opera-ng Systems Design

Chapter 6, The Operating System Machine Level

Datacenter Operating Systems

Operating System Components and Services

Project No. 2: Process Scheduling in Linux Submission due: April 28, 2014, 11:59pm

COS 318: Operating Systems

Xen and XenServer Storage Performance

Operating System Manual. Realtime Communication System for netx. Kernel API Function Reference.

Linux LKM Firewall v 0.95 (2/5/2010)

Kernel Intrusion Detection System

Lecture 3 Theoretical Foundations of RTOS

Improving DNS performance using Stateless TCP in FreeBSD 9

Have both hardware and software. Want to hide the details from the programmer (user).

Lecture 25 Symbian OS

What is best for embedded development? Do most embedded projects still need an RTOS?

What is an RTOS? Introduction to Real-Time Operating Systems. So what is an RTOS?(contd)

Real Time Programming: Concepts

Enabling Linux* Network Support of Hardware Multiqueue Devices

Virtual Servers. Virtual machines. Virtualization. Design of IBM s VM. Virtual machine systems can give everyone the OS (and hardware) that they want.

Scaling Networking Applications to Multiple Cores

Network Simulation Traffic, Paths and Impairment

Network Attached Storage. Jinfeng Yang Oct/19/2015

The Linux Kernel: Process Management. CS591 (Spring 2001)

Network File System (NFS) Pradipta De

Migration of Process Credentials

Real-time KVM from the ground up

Chapter 3 Operating-System Structures

KVM Architecture Overview

A SIMPLE WAY TO CAPTURE NETWORK TRAFFIC: THE WINDOWS PACKET CAPTURE (WINPCAP) ARCHITECTURE. Mihai Dorobanţu, M.Sc., Mihai L. Mocanu, Ph.D.

SEP Packet Capturing Using the Linux Netfilter Framework Ivan Pronchev

IPv6 Challenges for Embedded Systems István Gyürki

Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308

REAL TIME OPERATING SYSTEMS. Lesson-10:

POSIX. RTOSes Part I. POSIX Versions. POSIX Versions (2)

Protocols. Packets. What's in an IP packet

The Performance Analysis of Linux Networking Packet Receiving

Lab 6: Building Your Own Firewall

Chapter 9. IP Secure

Data Communication Networks and Converged Networks

Safety measures in Linux

I/O. Input/Output. Types of devices. Interface. Computer hardware

Real-Time Scheduling 1 / 39

Chapter 10 Case Study 1: LINUX

Enhancing the Monitoring of Real-Time Performance in Linux

A Packet Forwarding Method for the ISCSI Virtualization Switch

Accelerate In-Line Packet Processing Using Fast Queue

Operating System Overview. Otto J. Anshus

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

Optimizing TCP Forwarding

Mutual Exclusion using Monitors

Project 4: IP over DNS Due: 11:59 PM, Dec 14, 2015

Application Note: AN00121 Using XMOS TCP/IP Library for UDP-based Networking

Introduction To Computer Networking

Network Virtualization Technologies and their Effect on Performance

Going Linux on Massive Multicore

Priority Based Implementation in Pintos

Porting Lustre to Operating Systems other than Linux. Ken Hornstein US Naval Research Laboratory April 16, 2010

Port of the Java Virtual Machine Kaffe to DROPS by using L4Env

Design of an Application Programming Interface for IP Network Monitoring

Integrity of In-memory Data Mirroring in Distributed Systems Tejas Wanjari EMC Data Domain

Microkernels, virtualization, exokernels. Tutorial 1 CSC469

CS161: Operating Systems

Page 1 of 5. IS 335: Information Technology in Business Lecture Outline Operating Systems

Hard Real-Time Linux

Transcription:

Linux Kernel Networking Raoul Rivas

Kernel vs Application Programming No memory protection Memory Protection We share memory with devices, scheduler Sometimes no preemption Can hog the CPU Segmentation Fault Preemption Scheduling isn't our responsibility Concurrency is difficult Signals (Control-C) No libraries Libraries Printf, fopen No security descriptors In Linux no access to files Direct access to hardware Security Descriptors In Linux everything is a file descriptor Access to hardware as files

Outline User Space and Kernel Space Running Context in the Kernel Locking Deferring Work Linux Network Architecture Sockets, Families and Protocols Packet Creation Fragmentation and Routing Data Link Layer and Packet Scheduling High Performance Networking

System Calls A system call is an interrupt syscall(number, arguments) Syscall table 0xFFFF50 Kernel Space sys_write() The kernel runs in a different address space Data must be copied back and forth copy_to_user(), copy_from_user() Never directly dereference any pointer from user space INT 0x80 Copy_from_user() syscall(write, ptr, size) write(ptr, size); User Space ptr 0x011075

Context Kernel Context Process Context Interrupt Context Preemptible Yes Yes No PID Itself Application PID No Can Sleep? Yes Yes No Example Kernel Thread System Call Timer Interrupt Context: Entity whom the kernel is running code on behalf of Process context and Kernel Context are preemptible Interrupts cannot sleep and should be small They are all concurrent Process context and Kernel context have a PID: Struct task_struct* current

Race Conditions Process context, Kernel Context and Interrupts run concurrently How to protect critical zones from race conditions? Spinlocks Mutex Semaphores Reader-Writer Locks (Mutex, Semaphores) Reader-Writer Spinlocks

THE SPINLOCK SPINS... THE MUTEX SLEEPS Inside Locking Primitives Spinlock //spinlock_lock: disable_interrupts(); while(locked==true); //critical region //spinlock_unlock: enable_interrupts(); locked=false; We can't sleep while the spinlock is locked! We can't use a mutex in an interrupt because interrupts can't sleep! Mutex //mutex_lock: If (locked==true) { Enqueue(this); Yield(); } locked=true; //critical region //mutex_unlock: If!isEmpty(waitqueue) { wakeup(dequeue()); } Else locked=false;

When to use what? Mutex Spinlock Short Lock Time Long Lock Time Interrupt Context Sleeping Usually functions that handle memory, user space or devices and scheduling sleep Kmalloc, printk, copy_to_user, schedule wake_up_process does not sleep

Extensibility Linux Kernel Modules Ideally you don't want to patch but build a kernel module Separate Compilation Runtime-Linkage Entry and Exit Functions Run in Process Context LKM Hello-World #define MODULE #define LINUX #define KERNEL #include <linux/module.h> #include <linux/kernel.h> #include <linux/init.h> static int init myinit(void) { printk(kern_alert "Hello, world\n"); Return 0; } static void exit myexit(void) { printk(kern_alert "Goodbye, world\n"); } module_init(myinit); module_exit(myexit); MODULE_LICENSE("GPL");

The Kernel Loop The Linux kernel uses the concept of jiffies to measure time Inside the kernel there is a loop to measure time and preempt tasks A jiffy is the period at which the timer in this loop is triggered Varies from system to system 100 Hz, 250 Hz, 1000 Hz. Use the variable HZ to get the value. The schedule function is the function that preempts tasks schedule() Timer 1/HZ tick_periodic: add_timer(1 jiffy) jiffies++ scheduler_tick()

Deferring Work / Two Halves Kernel Timers are used to create timed events They use jiffies to measure time Timers are interrupts We can't do much in them! Solution: Divide the work in two parts Use the timer handler to signal a thread. (TOP HALF) Let the kernel thread do the real job. (BOTTOM HALF) Timer Interrupt context Kernel context TOP HALF Timer Handler: wake_up(thread); Thread: While(1) { Do work(); Schedule(); } BOTTOM HALF

Linux Kernel Map

Linux Network Architecture File Access Socket Access VFS Protocol Families INET UNIX Network Storage NFS SMB iscsi Socket Splice Logical Filesystem EXT4 Protocols UDP TCP IP Network Interface ethernet 802.11 Network Device Driver

Socket Access Contains the system call functions like socket, connect, accept, bind Implements the POSIX socket interface Independent of protocols or socket types Responsible of mapping socket data structures to integer handlers Calls the underlying layer functions sys_socket() sock_create socket Socket create sys_socket Integer handler Handler table

Protocol Families Implements different socket families INET, UNIX Extensible through the use of pointers to functions and modules. Allocates memory for the socket *pf net_proto_family inet_create AF_LOCAL AF_UNIX Calls net_proto_familiy create for familiy specific initilization

Socket Splice Unix uses the abstraction of Files as first class objects Linux supports to send entire files between file descriptors. A descriptor can be a socket Also Unix supports Network File Systems NFS, Samba, Coda, Andrew The socket splice is responsible of handling these abstractions

Protocols Families have multiple protocols INET: TCP, UDP Protocol functions are stored in proto_ops socket proto_ops inet_stream_ops inet_bind inet_listen inet_stream_connect Some functions are not used in that protocol so they point to dummies inet_dgram_ops inet_bind Some functions are the same across many protocols and can be shared NULL inet_dgram_connect

Packet Creation At the sending function, the buffer is packetized. Packets are represented by the sk_buff data structure Contains pointers the: transport layer header Link-layer header Received Timestamp Device we received it Some fields can be NULL char* Struct sk_buf Struct sk_buf TCP Header tcp_send_msg tcp_transmit_skb ip_queue_xmit

Fragmentation and Routing Fragmentation is performed inside ip_fragment If the packet does not have a route it is filled in by ip_route_output_flow There are routing mechanisms used Route Cache Forwarding Information Base Slow Routing Y ip_forward (packet forwarding) ip_fragment ip_route_output_flow Y Y N N N Y forward N Route cache FIB Slow routing dev_queue_xmit (queue packet)

Data Link Layer The Data Link Layer is responsible of packet scheduling The dev_queue_xmit is responsible of enqueing packets for transmission in the qdisc of the device Then in process context it is tried to send If the device is busy we schedule the send for a later time The dev_hard_start_xmit is responsible for sending to the device Dev_queue_xmit(sk_buf) Dev qdisc enqueue Dev qdisc dequeue dev_hard_start_xmit()

Case Study: inet INET is an EDF (Earliest Deadline First) packet scheduler Each Packet has a deadline specified in the TOS field We implemented it as a Linux Kernel Module We implement a packet scheduler at the qdisc level. Replace qdisc enqueue and dequeue functions Enqueued packets are put in a heap sorted by deadline enqueue(sk_buf) dequeue(sk_buf) HW Deadline heap

High-Performance Network Stacks Minimize copying Zero copy technique Page remapping Use good data structures Inet v0.1 used a list instead of a heap Optimize the common case Branch optimization Avoid process migration or cache misses Avoid dynamic assignment of interrupts to different CPUs Combine Operations within the same layer to minimize passes to the data Checksum + data copying

High-Performance Network Stacks Cache/Reuse as much as you can Headers, SLAB allocator Hierarchical Design + Information Hiding Data encapsulation Separation of concerns Interrupt Moderation/Mitigation Receive packets in timed intervals only (e.g. ATM) Packet Mitigation Similar but at the packet level

Conclusion The Linux kernel has 3 main contexts: Kernel, Process and Interrupt. Use spinlock for interrupt context and mutexes if you plan to sleep holding the lock Implement a module avoid patching the kernel main tree To defer work implement two halves. Timers + Threads Socket families are implemented through pointers to functions (net_proto_family and proto_ops) Packets are represented by the sk_buf structure Packet scheduling is done at the qdisc level in the Link Layer

References Linux Kernel Map http://www.makelinux.net/kernel_map A. Chimata, Path of a Packet in the Linux Kernel Stack, University of Kansas, 2005 Linux Kernel Cross Reference Source R. Love, Linux Kernel Development, 2 nd Edition, Novell Press, 2006 H. Nguyen, R. Rivas, idsrt: Integrated Dynamic Soft Realtime Architecture for Critical Infrastructure Data Delivery over WAN, Qshine 2009 M. Hassan and R. Jain, High Performance TCP/IP Networking: Concepts, Issues, and Solutions, Prentice-Hall, 2003 K. Ilhwan, Timer-Based Interrupt Mitigation for High Performance Packet Processing, HPC, 2001 Anand V., TCPIP Network Stack Performance in Linux Kernel 2.4 and 2.5, IBM Linux Technology Center