Linux Networking Stack

Linux Networking Stack Kiran Divekar 28th May 2014

Agenda
* System calls in the networking world
* Client-server model
* Linux networking stack
* Evolution of the networking stack
* Driver interface
* Introduction to the Wifi stack
* Wifi stack as an example
* Future...?

Simple router: control plane. [Figure: Module 1 to Module n are processes in control-plane user space; control-plane kernel space holds the interconnect protocol and the network driver; an intercard communication mechanism connects the control plane to the data plane.]

Problem definition
* All CP modules communicate with each other over IPC.
* Control plane / data plane communication happens over a high-speed network link.
* Line cards can interact with other line cards or control cards.
* And the router crashes???

Things to look out for...
* Are the kernel and the network driver alive? Check the kernel log and the crash dump.
* See whether a particular IRQ is screaming in /proc/interrupts.
* /proc/sys/net/* : networking information.
* Check top and free output: is any process hogging the CPU?
* Check ps to confirm that the expected processes/threads are alive (status of the CP processes).
* Try to get some info from /proc/net/stat/nf_conntrack to see whether a particular type of error packet is being reported.
* Check firewall rules and interface/route configuration: iptables -L, ifconfig, route.
* Look for kernel/application log entries indicating an error: /var/log/syslog.

Going deeper...
* Check the files and sockets owned by each process: /proc/$pid/fd, /proc/$pid/wchan.
* /proc/net/tcp, /proc/net/udp
* netstat -apeen
* lsof (-i for networking)
* Socket status, e.g. the error [socket operation on non-socket].
* Kernel modules that dump information on data structures such as task_struct and struct net_device.

Knowing the full stack...
To understand the complete stack, kernel knowledge is necessary:
* User space applications: threads, socket types.
* Kernel interface through system calls.
* TCP/IP stack inside the kernel.
* Interaction with the network device driver and the other kernel subsystems.

Do you know this? User space Kernel space

User space kernel space...

Standard Socket Sequence
* The server application: socket(), bind(), listen(), accept().
* The client application: socket(), connect().
* connect() on the client and accept() on the server complete the 3-way handshake.
* Client write() / server read(): data flow to server. Server write() / client read(): data flow to client.
* close() on both sides: 4-way handshake.
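The call order can be sketched in user space as follows. This is a minimal illustration, not taken from the slides: it runs both roles in one process over loopback (which works because the kernel completes the handshake before accept() is called), and the function name echo_roundtrip is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends msg from a client socket to a server socket on 127.0.0.1 and
 * echoes it back. Returns 0 on success, -1 on error.
 * Assumption: msg is short and non-empty, so one read() gets all of it. */
int echo_roundtrip(const char *msg, char *reply, size_t reply_len)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char buf[256];
    ssize_t n;
    int lfd, cfd = -1, sfd = -1, rc = -1;

    assert(msg && reply && reply_len > 0);
    assert(strlen(msg) > 0 && strlen(msg) < sizeof(buf));

    /* Server side: socket(), bind(), listen() */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    if (listen(lfd, 1) < 0) goto out;
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0) goto out;

    /* Client side: socket(), connect() (no explicit bind) */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0) goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;

    /* Server side: accept() takes the established connection */
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0) goto out;

    /* Data flow to server, then back to client */
    if (write(cfd, msg, strlen(msg)) < 0) goto out;
    n = read(sfd, buf, sizeof(buf));
    if (n <= 0) goto out;
    if (write(sfd, buf, (size_t)n) != n) goto out;
    n = read(cfd, reply, reply_len - 1);
    if (n < 0) goto out;
    reply[n] = '\0';
    rc = 0;
out:
    /* close() on both sides tears the connection down */
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    close(lfd);
    return rc;
}
```

A real server and client would normally be separate processes; the single-process layout is only there to keep the sketch self-contained.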

Socket() in kernel
For every socket created by a userspace application, there is a corresponding struct socket and struct sock in the kernel.
int socket(int family, int type, int protocol);
Types: SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW.
This system call eventually invokes the sock_create() method in the kernel.
struct socket { /* ONLY important members */
    socket_state state;
    unsigned long flags;
    struct fasync_struct *fasync_list;
    wait_queue_head_t wait;
    struct file *file;
    struct sock *sk;
    const struct proto_ops *ops;
};

socket queues

bind() in kernel
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
This system call eventually invokes the inet_bind() method in the kernel. The bind system call associates a local network transport address with a socket. For a client process, it is not mandatory to issue a bind call: the kernel does an implicit bind when the client process issues the connect system call.
The kernel function sys_bind() does the following:
    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    sock->ops->bind(sock, (struct sockaddr *)address, addrlen);
Point to note: binding to privileged ports (<1024) requires root privileges or the CAP_NET_BIND_SERVICE capability.
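The implicit bind can be observed from user space. The sketch below uses a UDP socket so that connect() sends no packets; the helper names and the choice of destination port 9 are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the local port of fd in host byte order, or -1 on error. */
static int local_port(int fd)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);

    if (getsockname(fd, (struct sockaddr *)&a, &len) < 0)
        return -1;
    return ntohs(a.sin_port);
}

/* Reports the local port before and after connect() on a socket that was
 * never passed to bind(). Returns 0 on success, -1 on error. */
int implicit_bind_ports(int *before, int *after)
{
    struct sockaddr_in dst;
    int fd, rc = -1;

    assert(before && after);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dst.sin_port = htons(9);            /* arbitrary; nothing is sent */

    *before = local_port(fd);           /* unbound socket: port 0 */
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0) {
        *after = local_port(fd);        /* kernel has chosen a source port */
        rc = 0;
    }
    close(fd);
    return rc;
}
```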

listen() in kernel
int listen(int sockfd, int backlog);
The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow. Linux uses two queues: a SYN queue (or incomplete connection queue) and an accept queue (or complete connection queue). Connections in state SYN RECEIVED are added to the SYN queue and later moved to the accept queue when their state changes to ESTABLISHED, i.e. when the ACK packet of the 3-way handshake is received. As the name implies, the accept call then simply consumes connections from the accept queue.
* SYN queue: size specified by a system-wide setting, /proc/sys/net/ipv4/tcp_max_syn_backlog.
* Accept queue: size specified by the application through the backlog argument of listen().
The implementation is in the inet_listen() kernel function.
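The system-wide SYN queue limit can be read like any other /proc entry. The helper below is a small sketch; read_sysctl_long is a hypothetical name, and the /proc path is only present on Linux.

```c
#include <assert.h>
#include <stdio.h>

/* Reads a single integer from a /proc/sys style file.
 * Returns the value, or -1 if the file is missing or unparsable. */
long read_sysctl_long(const char *path)
{
    FILE *f;
    long value = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, read_sysctl_long("/proc/sys/net/ipv4/tcp_max_syn_backlog") returns the SYN queue limit. On Linux the backlog passed to listen() is additionally capped by /proc/sys/net/core/somaxconn, which can be read the same way.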

connect() in kernel
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
Calls inet_autobind() to pick an available source port if the socket is not yet bound. Fills in the destination in inet_sock and calls inet_stream_connect() or inet_dgram_connect() (for IPv4). Routing is done by the ip_route_connect() function (L3).

accept() in kernel int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen); This system call eventually invokes the inet_accept() method in the kernel.

TCP 3-way handshake (SYN)

TCP 3-way handshake (SYN-ACK)

TCP 3-way handshake (ACK)

close() in kernel
int shutdown(int sockfd, int how);
int close(int sockfd);
shutdown() can bring down one direction of the connection (half-close). At that point, the queues associated with the socket are not purged. Hence, it is still necessary to call the close() function.
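The half-close behaviour can be sketched with a connected socket pair. To stay self-contained the example assumes an AF_UNIX stream pair instead of a TCP connection; half_close_demo is a hypothetical helper.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Shuts down the write side of one end, then shows that the peer sees
 * end-of-file while the reverse direction still carries data.
 * Stores the byte received on the half-closed end in *received.
 * Returns 0 if everything behaved as described, -1 otherwise. */
int half_close_demo(char *received)
{
    int sv[2], rc = -1;
    char c;

    assert(received != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (shutdown(sv[0], SHUT_WR) < 0) goto out;    /* sv[0] stops sending */
    if (read(sv[1], &c, 1) != 0) goto out;         /* peer reads EOF */
    if (write(sv[1], "x", 1) != 1) goto out;       /* other direction is open */
    if (read(sv[0], received, 1) != 1) goto out;
    rc = 0;
out:
    /* shutdown() did not release the descriptors; close() does. */
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```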

Network stack Architecture. [Figure: layered diagram. User space: Application. Kernel: Socket Layer (socket interface; families PF_INET, PF_PACKET, PF_UNIX, PF_...; types SOCK_STREAM/TCP, SOCK_DGRAM/UDP, SOCK_RAW), Protocol Layers (IPv4, ...), Network Device Layer (Ethernet, PPP, WLAN, ...), Device Layer (e.g. Intel E1000). Below the kernel: Hardware.]

Socket Data Structures
For every socket created by a user space application, there is a corresponding struct socket and struct sock in the kernel.
* struct socket (include/linux/net.h): data common to the BSD socket layer. Has only 8 members. Any variable named sock always refers to a struct socket.
* struct sock (include/net/sock.h): data common to the network protocol layer (i.e., AF_INET). Any variable named sk always refers to a struct sock.

AF Interface
Main data structures: struct net_proto_family, struct proto_ops.
Key function: sock_register(struct net_proto_family *ops).
Each address family:
* Implements the struct net_proto_family.
* Calls the function sock_register() when the protocol family is initialized.
* Implements the struct proto_ops that binds the BSD socket layer to the protocol family layer.

INET and PACKET proto_family
static const struct net_proto_family inet_family_ops = {
    .family = PF_INET,
    .create = inet_create,
    .owner  = THIS_MODULE,
}; /* af_inet.c */
static const struct net_proto_family packet_family_ops = {
    .family = PF_PACKET,
    .create = packet_create,
    .owner  = THIS_MODULE,
}; /* af_packet.c */

PF_INET proto_ops (net/ipv4/af_inet.c)

member        inet_stream_ops (TCP)   inet_dgram_ops (UDP)    inet_sockraw_ops (RAW)
.family       PF_INET                 PF_INET                 PF_INET
.owner        THIS_MODULE             THIS_MODULE             THIS_MODULE
.release      inet_release            inet_release            inet_release
.bind         inet_bind               inet_bind               inet_bind
.connect      inet_stream_connect     inet_dgram_connect      inet_dgram_connect
.socketpair   sock_no_socketpair      sock_no_socketpair      sock_no_socketpair
.accept       inet_accept             sock_no_accept          sock_no_accept
.getname      inet_getname            inet_getname            inet_getname
.poll         tcp_poll                udp_poll                datagram_poll
.ioctl        inet_ioctl              inet_ioctl              inet_ioctl
.listen       inet_listen             sock_no_listen          sock_no_listen
.shutdown     inet_shutdown           inet_shutdown           inet_shutdown
.setsockopt   sock_common_setsockopt  sock_common_setsockopt  sock_common_setsockopt
.getsockopt   sock_common_getsockopt  sock_common_getsockopt  sock_common_getsockopt
.sendmsg      tcp_sendmsg             inet_sendmsg            inet_sendmsg
.recvmsg      sock_common_recvmsg     sock_common_recvmsg     sock_common_recvmsg
.mmap         sock_no_mmap            sock_no_mmap            sock_no_mmap
.sendpage     tcp_sendpage            inet_sendpage           inet_sendpage
.splice_read  tcp_splice_read         --                      --
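The registration and dispatch pattern behind these tables can be modelled in user space. The sketch below is a toy: every toy_* name is hypothetical and the structures are drastically simplified compared with the kernel's net_proto_family and proto_ops. It only shows how a family's create hook selects an ops table, and how an unsupported operation (listen on a datagram socket) resolves to a stub that returns -EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NPROTO 4
enum { TOY_PF_INET = 1 };
enum { TOY_STREAM = 1, TOY_DGRAM = 2 };

struct toy_socket;

struct toy_proto_ops {                  /* stands in for struct proto_ops */
    int (*listen)(struct toy_socket *sock, int backlog);
};

struct toy_socket {                     /* stands in for struct socket */
    int type;
    int backlog;
    const struct toy_proto_ops *ops;
};

struct toy_proto_family {               /* stands in for net_proto_family */
    int family;
    int (*create)(struct toy_socket *sock, int type);
};

static const struct toy_proto_family *toy_families[TOY_NPROTO];

/* Counterpart of sock_register(): one slot per address family. */
int toy_sock_register(const struct toy_proto_family *pf)
{
    assert(pf && pf->create);
    if (pf->family < 0 || pf->family >= TOY_NPROTO)
        return -ENOBUFS;
    if (toy_families[pf->family])
        return -EEXIST;
    toy_families[pf->family] = pf;
    return 0;
}

static int toy_stream_listen(struct toy_socket *sock, int backlog)
{
    sock->backlog = backlog;
    return 0;
}

static int toy_no_listen(struct toy_socket *sock, int backlog)
{
    (void)sock;
    (void)backlog;
    return -EOPNOTSUPP;                 /* same idea as sock_no_listen */
}

static const struct toy_proto_ops toy_stream_ops = { .listen = toy_stream_listen };
static const struct toy_proto_ops toy_dgram_ops  = { .listen = toy_no_listen };

/* Family create hook: picks the ops table from the socket type. */
static int toy_inet_create(struct toy_socket *sock, int type)
{
    switch (type) {
    case TOY_STREAM: sock->ops = &toy_stream_ops; break;
    case TOY_DGRAM:  sock->ops = &toy_dgram_ops;  break;
    default:         return -ESOCKTNOSUPPORT;
    }
    sock->type = type;
    sock->backlog = 0;
    return 0;
}

const struct toy_proto_family toy_inet_family = {
    .family = TOY_PF_INET,
    .create = toy_inet_create,
};

/* Generic layer: look up the family, then let it fill in the socket. */
int toy_socket_create(int family, int type, struct toy_socket *sock)
{
    if (family < 0 || family >= TOY_NPROTO || !toy_families[family])
        return -EAFNOSUPPORT;
    return toy_families[family]->create(sock, type);
}

/* Generic layer: dispatch through the ops table, like sock->ops->listen(). */
int toy_listen(struct toy_socket *sock, int backlog)
{
    return sock->ops->listen(sock, backlog);
}
```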

udp_prot (net/ipv4/udp.c)
struct proto udp_prot = {
    .name             = "UDP",
    .owner            = THIS_MODULE,
    .close            = udp_lib_close,
    .connect          = ip4_datagram_connect,
    .disconnect       = udp_disconnect,
    .ioctl            = udp_ioctl,
    .destroy          = udp_destroy_sock,
    .setsockopt       = udp_setsockopt,
    .getsockopt       = udp_getsockopt,
    .sendmsg          = udp_sendmsg,
    .recvmsg          = udp_recvmsg,
    .sendpage         = udp_sendpage,
    .backlog_rcv      = udp_queue_rcv_skb,
    .hash             = udp_lib_hash,
    .unhash           = udp_lib_unhash,
    .get_port         = udp_v4_get_port,
    .memory_allocated = &udp_memory_allocated,
    .sysctl_mem       = sysctl_udp_mem,
    .sysctl_wmem      = &sysctl_udp_wmem_min,
    .sysctl_rmem      = &sysctl_udp_rmem_min,
    .obj_size         = sizeof(struct udp_sock),
    .slab_flags       = SLAB_DESTROY_BY_RCU,
    .h.udp_table      = &udp_table,
#ifdef CONFIG_COMPAT
    .compat_setsockopt = compat_udp_setsockopt,
    .compat_getsockopt = compat_udp_getsockopt,
#endif
};

Relationship. [Figure: how the structures link together. struct socket (state, type, flags, fasync_list, wait, file, sk, ops) points through ops to struct proto_ops (inet_release, inet_bind, inet_accept, ...; af_inet.c, PF_INET) and through sk to struct sock (sk_common, sk_lock, sk_backlog, ..., sk_prot_creator, sk_socket, sk_send_head, ...). struct sock embeds struct sock_common (skc_node, skc_refcnt, skc_hash, ..., skc_proto, skc_net) and refers to struct proto (udp_lib_close, ip4_datagram_connect, udp_sendmsg, udp_recvmsg, ...).]

protocol handlers

Key structure: packet_type
struct packet_type {
    unsigned short type;      /* htons(ether_type) */
    struct net_device *dev;   /* NULL means all devices */
    int (*func)(...);         /* handler address */
    void *data;               /* private data */
    struct list_head list;
};
There are two exported kernel functions for adding and removing handlers:
void dev_add_pack(struct packet_type *pt);
void dev_remove_pack(struct packet_type *pt);
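How such handlers are matched against incoming frames can be illustrated with a user-space toy model. All toy_* names are hypothetical and the table is a fixed array instead of the kernel's lists; the matching rule (EtherType stored in network byte order, NULL device meaning "all devices") follows the structure comments above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_P_IP     0x0800
#define TOY_MAX_HANDLERS 8

struct toy_packet_type {
    uint16_t type;            /* htons(ether_type) */
    const char *dev;          /* NULL means all devices */
    int (*func)(const uint8_t *frame, size_t len, void *data);
    void *data;               /* private data passed to func */
};

static struct toy_packet_type *toy_handlers[TOY_MAX_HANDLERS];

/* Registers a handler; returns 0, or -1 if the table is full. */
int toy_dev_add_pack(struct toy_packet_type *pt)
{
    size_t i;

    assert(pt && pt->func);
    for (i = 0; i < TOY_MAX_HANDLERS; i++) {
        if (!toy_handlers[i]) {
            toy_handlers[i] = pt;
            return 0;
        }
    }
    return -1;
}

/* Unregisters a handler if it is present. */
void toy_dev_remove_pack(struct toy_packet_type *pt)
{
    size_t i;

    for (i = 0; i < TOY_MAX_HANDLERS; i++)
        if (toy_handlers[i] == pt)
            toy_handlers[i] = NULL;
}

/* Hands an Ethernet frame to every matching handler.
 * Returns the number of handlers that were called. */
int toy_deliver(const char *dev, const uint8_t *frame, size_t len)
{
    uint16_t type;
    size_t i;
    int called = 0;

    if (len < 14)                               /* shorter than an Ethernet header */
        return 0;
    memcpy(&type, frame + 12, sizeof(type));    /* EtherType, network order */

    for (i = 0; i < TOY_MAX_HANDLERS; i++) {
        struct toy_packet_type *pt = toy_handlers[i];

        if (!pt || pt->type != type)
            continue;
        if (pt->dev && strcmp(pt->dev, dev) != 0)
            continue;
        pt->func(frame, len, pt->data);
        called++;
    }
    return called;
}

/* Example handler: counts frames in the int that data points to. */
int toy_count_frame(const uint8_t *frame, size_t len, void *data)
{
    (void)frame;
    (void)len;
    ++*(int *)data;
    return 0;
}
```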

sk_buff structure...

sk_buff
* Kernel buffer that stores packets. Contains headers for all network layers.
* Creation: when an application sends data to a socket, or when a packet arrives at a network interface.
* Copying: data is copied between user space and kernel space, and between kernel space and the NIC.
* Send: headroom is reserved with skb_reserve() and each layer prepends its header with skb_push().
* Receive: the data pointer moves from header to header.

sk_buff (cont...)
* sk_buff represents data and headers.
* sk_buff API (examples): allocation is done with alloc_skb() or dev_alloc_skb(); drivers use dev_alloc_skb(). Buffers are freed with kfree_skb() and dev_kfree_skb().
* unsigned char *data: points to the current header.
* skb_pull(int len) removes data from the start of a buffer by advancing data to data+len and by decreasing len.
* sk_buff instances almost always appear as skb in the kernel code.

sk_buff functions skb_headroom(), skb_tailroom() Prototype / Description int skb_headroom(const struct sk_buff *skb); bytes at buffer head. int skb_tailroom(const struct sk_buff *skb); bytes at buffer end.

sk_buff functions skb_reserve() Prototype void skb_reserve(struct sk_buff *skb, unsigned int len); Description adjust headroom. Used to make reservation for the header. When setting up receive packets that an ethernet device will DMA into, skb_reserve(skb, NET_IP_ALIGN) is called. This makes it so that, after the ethernet header, the protocol header will be aligned on at least a 4-byte boundary

sk_buff functions skb_push() Prototype unsigned char *skb_push(struct sk_buff *skb, unsigned int len); Description add data to the start of a buffer. skb_push() decrements 'skb->data' and increments 'skb->len'. e.g. adding the Ethernet header in front of the IP and TCP headers.

sk_buff functions skb_pull() Prototype unsigned char *skb_pull(struct sk_buff *skb, unsigned int len); Description remove data from the start of a buffer

sk_buff functions skb_put() Prototype unsigned char *skb_put(struct sk_buff *skb, unsigned int len); Description add data to a buffer. skb_put() advances 'skb->tail' by the specified number of bytes and increments 'skb->len' by the same number of bytes. Make sure that enough tailroom is available; otherwise skb_over_panic() is triggered.

sk_buff functions skb_trim() Prototype void skb_trim(struct sk_buff *skb, unsigned int len); Description remove end from a buffer
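The pointer arithmetic behind these functions can be sketched with a user-space toy buffer. The toy_* names are hypothetical, and the model deliberately differs from the kernel in one respect: where the kernel would panic on overrun, these helpers return NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* head <= data <= tail <= end, and len == tail - data */
struct toy_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

struct toy_skb *toy_alloc_skb(unsigned int size)
{
    struct toy_skb *skb = malloc(sizeof(*skb));

    if (!skb)
        return NULL;
    skb->head = malloc(size ? size : 1);
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head;  /* empty: all space is tailroom */
    skb->end = skb->head + size;
    skb->len = 0;
    return skb;
}

void toy_kfree_skb(struct toy_skb *skb)
{
    if (skb) {
        free(skb->head);
        free(skb);
    }
}

unsigned int toy_skb_headroom(const struct toy_skb *skb)
{
    return (unsigned int)(skb->data - skb->head);
}

unsigned int toy_skb_tailroom(const struct toy_skb *skb)
{
    return (unsigned int)(skb->end - skb->tail);
}

/* Reserve headroom in an empty buffer by moving data and tail forward. */
void toy_skb_reserve(struct toy_skb *skb, unsigned int len)
{
    assert(skb && skb->len == 0 && len <= toy_skb_tailroom(skb));
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes at the tail; returns the start of the new area. */
unsigned char *toy_skb_put(struct toy_skb *skb, unsigned int len)
{
    unsigned char *old_tail = skb->tail;

    if (len > toy_skb_tailroom(skb))
        return NULL;
    skb->tail += len;
    skb->len += len;
    return old_tail;
}

/* Prepend len bytes (e.g. a header) by moving data backwards. */
unsigned char *toy_skb_push(struct toy_skb *skb, unsigned int len)
{
    if (len > toy_skb_headroom(skb))
        return NULL;
    skb->data -= len;
    skb->len += len;
    return skb->data;
}

/* Strip len bytes from the front (e.g. a parsed header). */
unsigned char *toy_skb_pull(struct toy_skb *skb, unsigned int len)
{
    if (len > skb->len)
        return NULL;
    skb->data += len;
    skb->len -= len;
    return skb->data;
}

/* Cut the buffer down to len bytes by moving tail backwards. */
void toy_skb_trim(struct toy_skb *skb, unsigned int len)
{
    if (len < skb->len) {
        skb->len = len;
        skb->tail = skb->data + len;
    }
}
```

A transmit path in this model would call toy_skb_reserve() once, toy_skb_put() for the payload, and then toy_skb_push() once per header, innermost first.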

Network device drivers net_device registration hard_start_xmit function pointer Interrupt handler for packet reception Bus Interaction (e.g. PCI) NAPI context

net_device structure net_device represents a network interface card. It is used to represent physical or virtual devices, e.g. loopback devices, or bonding devices used for load balancing or high availability. Such devices are implemented using the private data of the device (the void *priv member of net_device).

net_device structure (cont...)
* unsigned int mtu: Maximum Transmission Unit, the maximum size of frame the device can handle.
* unsigned int flags, dev_addr[6].
* void *ip_ptr: IPv4-specific data. This pointer is assigned to a pointer to in_device in inetdev_init() (net/ipv4/devinet.c).
* struct in_device: contains a member named cnf (an instance of ipv4_devconf), which holds settings such as /proc/sys/net/ipv4/conf/all/forwarding.

Packet Transmission TCP/IP stack calls dev_queue_xmit function to queue the packet in the device queue. The device driver has a Tx handler registered as hard_start_xmit() function pointer. This function transmits the packet over wire or air and waits for completion callback. This completion callback is generally used to free the sk_buff associated with the packet.

Packet Transmission (cont...) A routing lookup is also performed on transmission; it is done by ip_route_output_key(). If the packet is for a remote host, dst->output is set to ip_output(). ip_output() will call ip_finish_output(). This is the NF_IP_POST_ROUTING point.

Packet Reception When working in the interrupt-driven model, the NIC driver registers an interrupt handler for the IRQ the device uses by calling request_irq(). This interrupt handler is called when a frame is received. The same interrupt handler is also called when transmission of a frame has finished and under other conditions such as errors, so the handler should verify the interrupt cause. Control is transferred to the TCP/IP stack using netif_rx() or netif_rx_ni().

Packet Reception (cont...) In the interrupt handler, an sk_buff is allocated by calling dev_alloc_skb(), and eth_type_trans() is called. eth_type_trans() also advances the data pointer of the sk_buff to point to the IP header using skb_pull(skb, ETH_HLEN).

Network Packets Handling

Physical ( Ethernet ) [L1] NIC generates an Interrupt Request ( IRQ ) The card driver is the Interrupt Service Routine ( ISR ) - disables interrupts Allocates a new sk_buff structure Fetches packet data from card buffer to freshly allocated sk_buff ( using DMA ) Invokes netif_rx() When netif_rx() returns, the Interrupts are reenabled and the ISR is terminated

Journey of a packet. [Figure: receive path, bottom to top. Ethernet driver -> netif_rx() (low-level packet Rx) -> net_rx_action() (deferred packet reception) -> *_rcv() layer 3 handlers: ip_rcv() for AF_INET (IP), packet_rcv() for AF_PACKET, others -> tcp_rcv() / udp_rcv() / icmp_rcv() -> data_ready() at socket level -> wake_up_interruptible() -> receiving process.]

TCP/IP stack Minimize copying Zero copy technique Page remapping Branch optimization Avoid process migration or cache misses Avoid dynamic assignment of interrupts to different CPUs Combine Operations within the same layer to minimize passes to the data

Wifi Networking stack

Wifi Programming Steps for programming the wireless extensions: Open a network socket. (PF_INET, SOCK_DGRAM). Setup the wireless request using struct iwreq. Set device name. Set wireless request data. Set subioctl_no. Invoke device ioctl. Wait for the response. [ Blocking Call ] Wireless events are received over netlink socket. ( PF_NETLINK )
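The steps above can be sketched with the simplest standard wireless-extensions request, SIOCGIWNAME, which asks the driver for its protocol name. This is an illustration under assumptions: it relies on the Linux header linux/wireless.h, uses a standard ioctl instead of a driver-private sub-ioctl, and wireless_protocol_name is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/wireless.h>

/* Copies the wireless protocol name of ifname (for example an
 * "IEEE 802.11" string) into name. Returns 0 on success, or -1 if the
 * interface is missing or does not support wireless extensions. */
int wireless_protocol_name(const char *ifname, char *name, size_t name_len)
{
    struct iwreq wrq;
    int fd, rc = -1;

    assert(ifname && name && name_len > 0);

    /* Step 1: open a network socket to carry the ioctl. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Step 2: set up the wireless request and the device name. */
    memset(&wrq, 0, sizeof(wrq));
    strncpy(wrq.ifr_name, ifname, IFNAMSIZ - 1);

    /* Step 3: invoke the device ioctl; the call blocks until it returns. */
    if (ioctl(fd, SIOCGIWNAME, &wrq) == 0) {
        strncpy(name, wrq.u.name, name_len - 1);
        name[name_len - 1] = '\0';
        rc = 0;
    }
    close(fd);
    return rc;
}
```

On a non-wireless interface such as lo the ioctl fails and the helper returns -1.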

Wifi kernel handling Kernel space handling: * The kernel ioctl handler transfers control to the ioctl handler of the wireless device driver. * The driver invokes the appropriate wireless extension call based on the ioctl command. * The wireless extension call transfers control to the wireless firmware using a special command interface over the USB/SDIO/MMC bus. * The wireless driver can receive events from the firmware (e.g. a link_loss event).

Driver firmware interface What is firmware? * Firmware is wireless networking software that runs on the wireless chipset. * The wireless device driver downloads the firmware to the wireless chipset upon initialization. * All low-level wireless operations are performed by the firmware. * It works in two modes: synchronous (request/response protocol) and asynchronous (events from the firmware). * The firmware resides in /lib/firmware/, e.g. /lib/firmware/iwl-3945.ucode

Need for NGW Next Generation Wireless Centralized control for all wireless work Drivers implement small set of configuration methods Semantics as per flows in the IEEE specifications Various modes of operation Station, AP, Monitor, IBSS, WDS, Mesh, P2P

Mac80211, cfg80211
* Mac80211 is a Linux kernel subsystem.
* It implements shared code for soft MAC and half MAC devices.
* It contains the MLME (Media Access Control (MAC) Layer Management Entity): Authenticate, Deauthenticate, Associate, Disassociate, Reassociate, Beacon, Probe.
* Cfg80211 is the layer between user space and mac80211.

architecture

END OF PART I Questions...