sys socketcall: Network systems calls on Linux



Similar documents
Socket Programming. Srinidhi Varadarajan

Porting applications & DNS issues. socket interface extensions for IPv6. Eva M. Castro. ecastro@dit.upm.es. dit. Porting applications & DNS issues UPM

ELEN 602: Computer Communications and Networking. Socket Programming Basics

Socket Programming. Request. Reply. Figure 1. Client-Server paradigm

INTRODUCTION UNIX NETWORK PROGRAMMING Vol 1, Third Edition by Richard Stevens

NS3 Lab 1 TCP/IP Network Programming in C

Operating Systems Design 16. Networking: Sockets

Socket Programming in C/C++

W4118 Operating Systems. Junfeng Yang

The POSIX Socket API

Introduction to Socket Programming Part I : TCP Clients, Servers; Host information

How To Write Portable Programs In C

Introduction to Socket programming using C

Tutorial on Socket Programming

Implementing Network Software

Socket Programming. Kameswari Chebrolu Dept. of Electrical Engineering, IIT Kanpur

Unix Network Programming

UNIX. Sockets. mgr inż. Marcin Borkowski

Session NM059. TCP/IP Programming on VMS. Geoff Bryant Process Software

CSE 333 SECTION 6. Networking and sockets

C++ INTERVIEW QUESTIONS

VMCI Sockets Programming Guide VMware ESX/ESXi 4.x VMware Workstation 7.x VMware Server 2.0

The C Programming Language course syllabus associate level

TCP/IP - Socket Programming

The programming language C. sws1 1

Lab 4: Socket Programming: netcat part

Active-Active Servers and Connection Synchronisation for LVS

TCP/IP Illustrated, Volume 2 The Implementation

An Introduction to the ARM 7 Architecture

Networking in NSA Security-Enhanced Linux

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

UNIX Sockets. COS 461 Precept 1

Application Architecture

1 Abstract Data Types Information Hiding

3.5. cmsg Developer s Guide. Data Acquisition Group JEFFERSON LAB. Version

A deeper look at Inline functions

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

MatrixSSL Porting Guide

MatrixSSL Developer s Guide

Improving DNS performance using Stateless TCP in FreeBSD 9

The Plan Today... System Calls and API's Basics of OS design Virtual Machines

Programming with TCP/IP Best Practices

Assembly Language: Function Calls" Jennifer Rexford!

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Implementing and testing tftp

IBM i Version 7.2. Programming Socket programming IBM

Limi Kalita / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (3), 2014, Socket Programming

Keil C51 Cross Compiler

Toasterkit - A NetBSD Rootkit. Anthony Martinez Thomas Bowen

Binary compatibility for library developers. Thiago Macieira, Qt Core Maintainer LinuxCon North America, New Orleans, Sept. 2013

CHAPTER 6 TASK MANAGEMENT

Lecture 5. User-Mode Linux. Jeff Dike. November 7, Operating Systems Practical. OSP Lecture 5, UML 1/33

Linux Kernel Architecture

Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z)

IT304 Experiment 2 To understand the concept of IPC, Pipes, Signals, Multi-Threading and Multiprocessing in the context of networking.

Concurrent Server Design Alternatives

Optimizing Point-to-Point Ethernet Cluster Communication

TFTP Usage and Design. Diskless Workstation Booting 1. TFTP Usage and Design (cont.) CSCE 515: Computer Network Programming TFTP + Errors

ICT SEcurity BASICS. Course: Software Defined Radio. Angelo Liguori. SP4TE lab.

Writing Client/Server Programs in C Using Sockets (A Tutorial) Part I. Session Greg Granger grgran@sas. sas.com. SAS/C & C++ Support

Apache Thrift and Ruby

Andreas Herrmann. AMD Operating System Research Center

User-level processes (clients) request services from the kernel (server) via special protected procedure calls

Programmation Systèmes Cours 9 UNIX Domain Sockets

Jorix kernel: real-time scheduling

µtasker Document FTP Client

Lab 6: Building Your Own Firewall

Crash Course in Java

CS 416: Opera-ng Systems Design

Sources: On the Web: Slides will be available on:

Parameter Passing. Standard mechanisms. Call by value-result Call by name, result

Introduction to Data Structures

How do Users and Processes interact with the Operating System? Services for Processes. OS Structure with Services. Services for the OS Itself

Java Interview Questions and Answers

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

Data Types in the Kernel

Encryption Wrapper. on OSX

Lecture 22: C Programming 4 Embedded Systems

Semantic Analysis: Types and Type Checking

Chapter 3: Operating-System Structures. System Components Operating System Services System Calls System Programs System Structure Virtual Machines

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Kernel Types System Calls. Operating Systems. Autumn 2013 CS4023

Python, C++ and SWIG

Between Mutual Trust and Mutual Distrust: Practical Fine-grained Privilege Separation in Multithreaded Applications

picojava TM : A Hardware Implementation of the Java Virtual Machine

Hacking Techniques & Intrusion Detection. Ali Al-Shemery arabnix [at] gmail

Networks class CS144 Introduction to Computer Networking Goal: Teach the concepts underlying networks Prerequisites:

Base Conversion written by Cathy Saxton

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation

User guide for the Error & Warning LabVIEW toolset

Project Adding a System Call to the Linux Kernel

Numbering Systems. InThisAppendix...

Intel 8086 architecture

Generalised Socket Addresses for Unix Squeak

Transcription:

sys socketcall: Network systems calls on Linux Daniel Noé April 9, 2008 The method used by Linux for system calls is explored in detail in Understanding the Linux Kernel. However, the book does not adequately describe the idiosyncrasies of networking system calls. When the BSD Socket interface was added to the kernel seventeen system calls were added at once. The Linux programmers decided to use an additional level of indirection in order to conserve system call numbers. This unusual technique remains in the kernel in many architectures, including i386. Some newer architectures such as x86 64 avoid the extra layer of indirection and use the same technique as other system calls. In this short article we will explore the sys socketcall system call dispatch method. To match Understanding the Linux Kernel all examples here are for Linux Kernel version 2.6.11 on the i386 architecture. We will assume the reader is familiar with the standard system call mechanisms on i386. As you may recall, the kernel contains a large jump table in arch/i386/kernel/entry.s which maps system call numbers to functions. By convention these functions begin with sys. But there are no sys socket or sys bind or any other Berkeley Sockets API system calls listed directly in this table. When a user space library wishes to make a normal system call, it places the corresponding system call number (0-228 currently) in the eax register, then performs whatever action is required to switch to the kernel. On older x86 systems this was done with the int 0x80 instruction. Newer systems provide the sysenter instruction which is more efficient. Libraries generally use the vsyscall mechanism, which allows the kernel to automatically select the best entry mechanism. If the system call requires arguments, these are passed in additional registers starting with ebx. Socket calls use work differently. Instead of one system call per userspace call, for socket calls everything is wrapped through the sys socketcall function. This is defined in the entry.s jump table as system call 102. The 1

/* Argument list sizes for sys_socketcall */ #define AL(x) ((x) * sizeof(unsigned long)) static unsigned char nargs[18]={al(0),al(3),al(3),al(3),al(2),al(3), AL(3),AL(3),AL(4),AL(4),AL(4),AL(6), AL(6),AL(2),AL(5),AL(5),AL(3),AL(3)}; #undef AL Figure 1: Argument list sizes for sys socketcall actual desired user space socket call number (in the range 1-17) is passed in ebx. The socket call numbers are shown in Table 1 and are found in linux/net.h. Any arguments given to the user space socket call are passed using an args array of type unsigned long. The address of the first element of this array is placed in ecx and a system call to sys socketcall is made via the usual means. The sys socketcall function is defined in net/socket.c and takes two parameters: an integer call number and the unsigned long pointer args. Via the normal system call convention, this function will be called with the value from ebx in the first argument and ecx in the second. Four local stack variables are declared in sys socketcall. The first is an unsigned long array a of size 6. Two more unsigned long variables a0 and a1 are also declared. Finally, an integer err is declared to hold return values (if the return value is less than 0 it indicates an error and it is negated and placed in errno by user space code). The first thing sys socketcall does is to check if the call number passed in ebx is reasonable. If it is less than 1 or more than SYS RECVMSG (17, the highest call code numerically) then -EINVAL is returned ( Invalid argument ). Next, the arguments array is copied from user space to kernel space using the copy from user() function. The arguments to this function are the kernel stack array a, the user space array args and the number of bytes to copy. In order to determine the number of bytes to pass as the third argument of copy from user(), the code consults the static unsigned char array called nargs[], shown in Figure 1. This array is constructed with the help of a macro AL(x) which returns x times the size of an unsigned long. The nargs array contains one entry for each socket call code. It is initialized using the AL macro with the number of unsigned long arguments of each socket call, respectively. This way the nargs array contains the number of bytes in the userspace arguments array for each socket call. The nargs array 2

Userspace function Call code (from linux/net.h) Kernel function socket() SYS SOCKET sys socket() bind() SYS BIND sys bind() connect() SYS CONNECT sys connect() listen() SYS LISTEN sys listen() accept() SYS ACCEPT sys accept() getsockname() SYS GETSOCKNAME sys getsockname() getpeername() SYS GETPEERNAME sys getpeername() socketpair() SYS SOCKETPAIR sys socketpair() send() SYS SEND sys send() sendto() SYS SENDTO sys sendto() recv() SYS RECV sys recv() recvfrom() SYS RECVFROM sys recvfrom() shutdown() SYS SHUTDOWN sys shutdown() setsockopt() SYS SETSOCKOPT sys setsockopt() getsockopt() SYS GETSOCKOPT sys getsockopt() sendmsg() SYS SENDMSG sys sendmsg() recvmsg() SYS RECVMSG sys recvmsg() Table 1: Mappings from switch(call) in net/socket.c: sys socketcall() is subscripted with the socket call number in order to pass the number of bytes to copy from user(). See Figure 1 for a listing. Notice that the AL macro is undefined after it is used to avoid polluting the namespace. The next operation is to copy the a[0] and a[1] values into the a0 and a1 variables. I m not exactly sure what the point of this is, since it simply provides a different way to refer to these argument values. It may be there in order to provide a hint to the compiler that the first two arguments should be placed into registers. Finally, sys socketcall enters a large switch statement. It contains one case for each of the seventeen socket call codes listed in Table 1. There is also a default case, but this should never be reached since the call code was already checked before the copy was performed. Each case of the switch statement calls the appropriate kernel function with the correct number of arguments from the a array. The return value of the function is placed in err, which is returned at the end of the switch statement. Note that the sys socketcall abstraction described here is not used on 3

struct sockaddr { sa_family_t sa_family; /* address family, AF_xxx */ char sa_data[14]; /* 14 bytes of protocol address */ }; Figure 2: struct sockaddr defined in linux/socket.h all architectures. There is a #ifdef ARCH WANT SYS SOCKETCALL which disables the definition of sys socketcall and the nargs static array if the architecture elects to directly call the functions from Table 1 instead of using sys socketcall. Note that certain architectures such as x86 64 define this macro even though they don t seem to be using the sys socketcall mechanism. I think this is because the sys socketcall code is used for ia32 compatibility routines in the kernel (for example, 32 bit binaries running on a 64 bit install). One major data structure is commonly seen in the socket system calls. It is struct sockaddr, seen in Figure 2. What seems to be a very simple structure is in fact a placeholder for something much more complex (those used to fancy abstract programming interfaces may even say sinister!). In order to allow the socket system calls to operate on a diverse array of different protocols, the first field of the struct sockaddr is a short int which represents the address family. These address families are defined as macros in linux/socket.h. Some of them are well known and common (AF INET), some haven t come of age yet (AF INET6), and the sun is setting on some (AF SNA, maintained by the Linux SNA Project, which socket.h describes as nutters! ). By reading the first two bytes of the struct sockaddr the address family and thus address format can be determined. This means the same socket interface can be used for many different network protocols. The sa data field of the structure is marked as 14 bytes, but it doesn t have to be. For example, an IPv6 address is 16 bytes alone, even without additional information required for the socket (such as the port). Typically an address family specific structure such as sockaddr in is allocated and populated. This is then cast to a struct sockaddr and passed to the socket system call. The receiving end peeks at the first two bytes of the structure and then knows that the struct sockaddr is really a particular protocol specific structure. It can then perform a cast to access the struct sockaddr as the protocol specific structure. This technique seems rather ugly at first glance when compared to mod- 4

ern object-oriented techniques but there is a certain simplicity to it. It also keeps data compact, which improves cache locality. The only issue is that the casting must be performed manually, and this means it is easy to shoot yourself in the foot. For example, the struct sockaddr itself does not have enough space to fit an IPv6 address (but it does have enough for IPv4). It is necessary to allocate space for the larger structure required by IPv6 if you wish to use the same allocation for both IPv6 and IPv4 addresses. The consequences of trying to put an IPv6 address into a 14 byte structure will likely be severe and difficult to debug. The sys socketcall interface provides an interesting look into the historical development of the Linux kernel. It is hardly necessary to do this kind of indirection out of an attempt to conserve system calls. There have certainly been numerous system calls added since sys socketcall yet the kernel is not close to running out. Yet, the sys socketcall interface remains. It provides negligible overhead and changing the interface would require changes to system libraries. This small example of kernel engineering will remain in the kernel for a long time. 5