Operating Systems Project: Device Drivers




Jordi Garcia and Yolanda Becerra (note 1)
Department of Computer Architecture, Universitat Politècnica de Catalunya
September 2012

1. Introduction

The main aim of this project is to study the internal functions of an operating system in depth. You will learn how to modify the basic data structures of an OS and extend its functionality. In this second project, a generic Linux distribution (specifically, kernel version 2.6) will be used, and several kernel modules that add new functionality will be implemented. When you start your PC in the laboratory, boot Ubuntu with the image labeled proso, as usual.

In [1] you can find all the documentation about Linux Kernel Modules (LKM). Modules allow parts of the kernel to be added or modified dynamically while Linux is running, without recompiling or relinking it as you had to do in the previous project. You will thus learn another way of modifying system code. Obviously, some functions have restricted access, and some functionality cannot be inserted into the kernel in this way.

The most common changes made to a running system concern device drivers. Making them requires privileged access, since not every user is allowed to change the system. The printer driver is a typical example. Imagine a laptop that is used at home and at work: it will have several printer drivers installed, because people rarely have the same printer at home as at work. Even though the drivers are always installed, the physical devices (the printers) are not always available.

Note 1: This document was drawn up with the support of professors on previous courses: Julita Corbalán, Juan José Costa, Marisa Gil, Jordi Guitart, Amador Millan, Gemma Reig, Silvia LLorente, Pablo Chacín and Rubén González.

Specifically, in this project you will add a monitoring mechanism for some Linux system calls. This monitoring will be added dynamically by means of a module, without recompiling the Linux kernel. Once the calls are monitored, a new device will be added so that users can consult the statistics they are interested in. It will therefore be necessary to write a driver for this device, again as a module so that the kernel does not have to be recompiled. A summary of the essential concepts and of the basic code for creating modules, devices and drivers is given below.

2. Previous concepts

2.1. Linux Kernel Modules (LKM)

LKM is a Linux mechanism for dynamically adding a set of routines and data structures to the system. Each module is an object file that can be dynamically loaded (inserted) into the running kernel using the insmod program and unloaded (removed) using the rmmod program.

2.1.1 Module definition

In general, a module only needs to define an initialization function and an exit function, as can be seen in Figure 1, which shows the functions mymodule_init and mymodule_exit.

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>

    MODULE_LICENSE("GPL");

    /*
     * Initialize the module.
     */
    static int __init mymodule_init(void)
    {
        /* Initialization code */
        printk(KERN_DEBUG "Mymodule successfully loaded\n");
        return 0;   /* 0 if everything is OK, < 0 in case of error */
    }

    /*
     * Unload the module.
     */
    static void __exit mymodule_exit(void)
    {
        /* Finalization code */
    }

    module_init(mymodule_init);
    module_exit(mymodule_exit);

Figure 1. Basic code of a module (mymodule.c)

The optional markers __init and __exit tell the kernel that these functions are only used while the module is being initialized or removed (for example, the memory occupied by an __init function can be freed once loading has finished). The routines registered with the module_init and module_exit macros are executed automatically when the module is loaded and unloaded, respectively. These macros are mandatory.

2.1.2 Defining module parameters at load time

Linux 2.6 lets programmers define parameters whose values are set when the module is loaded. The interface is quite simple:

- module_param: defines the parameter, its type and the access rights of the corresponding sysfs (note 2) file that is created so that users can access the parameter (in our case 0, which means that no file is created).
- MODULE_PARM_DESC: adds a short description of the parameter, which can be consulted later with the modinfo command.
- MODULE_AUTHOR: includes the author's name in the module.
- MODULE_DESCRIPTION: includes a description of the module.
- MODULE_LICENSE: states the module's license (GPL, BSD, etc.).

Below is a small example. An integer parameter (the PID) whose value can be set at load time is defined in the module's source code:

    #include <linux/moduleparam.h>
    ...
    int pid = 1;
    module_param(pid, int, 0);
    MODULE_PARM_DESC(pid, "Process ID to monitor (default 1)");
    ...
    MODULE_AUTHOR("Joe Bloggs <joe.bloggs@somewhere>");
    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("ProSO driver");
    ...
Note 2: sysfs is a file system, usually mounted at /sys, that the kernel uses to export information about devices, modules, etc. to user space. You can find further information in Chapter 2 of Linux Device Drivers, listed in the bibliography.

2.1.3 Compiling a module

To compile the code in Figure 1 (saved in a file named mymodule.c), create a Makefile like the following (the command lines under all: and clean: must start with a tab):

    obj-m += mymodule.o

    all:
    	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

    clean:
    	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Running make produces an ELF file named mymodule.ko (ko = kernel object).

2.1.4 Utilities that Linux has for managing modules

Linux provides the insmod, lsmod and rmmod commands, which install, list and remove modules, respectively. In addition, modprobe loads a module by name together with the modules it depends on (see Section 2.1.7), and modinfo displays information about a module.

To load a module, use the following command:

    # insmod mymodule.ko

A value for a defined parameter can be given at load time:

    # insmod mymodule.ko pid=1

Also try executing the command:

    # modinfo mymodule.ko

2.1.5 What does insmod do internally?

The program loads the given file into the operating system's address space and resolves the file's remaining undefined symbols against the kernel symbol table. It also makes it possible to set the values of integer or string variables of the object (its parameters), so that the module (and the driver it contains) can be configured at load time.

2.1.6 When can a module be unloaded?

A module can only be unloaded when no one is using it. To know whether it is in use, the kernel maintains a reference counter that must be kept up to date. For instance, every function of a module that can be called from other modules must increment this counter when it is called and decrement it when it returns. To maintain the counter, the programmer can use the following macros:

- try_module_get(THIS_MODULE): increments the counter. It returns 0 (failure) if the module is being removed, in which case the module must not be used.
- module_put(THIS_MODULE): decrements the counter.

The counters can be checked in the special file /proc/modules. While a module's counter is not 0, the module cannot be unloaded, so it is important to keep the number of gets and puts balanced.
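The get/put discipline can be rehearsed in user space before writing kernel code. The sketch below is only a model under stated assumptions: struct fake_module, fake_try_get, fake_put, fake_try_unload and exported_service are hypothetical names standing in for the real counter, try_module_get, module_put and the check made when rmmod is run; it does not reproduce the kernel's implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of a module's reference counter.
 * It only mimics the bookkeeping; the real counter lives in struct module. */
struct fake_module {
    int refcnt;      /* number of users currently inside the module */
    bool unloading;  /* set once removal has started */
};

/* Models try_module_get(): fails if the module is going away. */
static bool fake_try_get(struct fake_module *m)
{
    if (m->unloading)
        return false;
    m->refcnt++;
    return true;
}

/* Models module_put(): every successful get is paired with one put. */
static void fake_put(struct fake_module *m)
{
    if (m->refcnt > 0)
        m->refcnt--;
}

/* Removal is only allowed when nobody holds a reference. */
static bool fake_try_unload(struct fake_module *m)
{
    if (m->refcnt != 0)
        return false;
    m->unloading = true;
    return true;
}

/* A function that other modules may call: get on entry, put on every exit. */
static int exported_service(struct fake_module *m)
{
    if (!fake_try_get(m))
        return -ENODEV;   /* module is being removed */
    /* ... work done on behalf of the caller ... */
    fake_put(m);          /* balances the get above */
    return 0;
}
```

The point the model makes is that a missing put on any return path leaves the counter above 0 forever, so the module could never be unloaded.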
2.1.7 Expressing dependencies between modules

In some cases, a module needs the functionality of another module, so the first module cannot be installed until the second one has been. Linux allows these dependencies to be expressed in the file /lib/modules/<kernel version>/modules.dep (for instance, /lib/modules/2.6.27-proso/modules.dep). For example, if module modulea requires module moduleb, this can be expressed as:

    /absolute_path/modulea.ko: /absolute_path/moduleb.ko
    /absolute_path/moduleb.ko:

Note that each path must be the absolute path of the module's object file. With this information, the modprobe command simplifies loading modules: when the following command is executed,

    # modprobe modulea

the modules are loaded in the proper order (first moduleb, then modulea).

2.2. Devices

A device is a real or virtual peripheral that users can use to perform input/output operations or to interact with the OS kernel.

2.2.1. What is a device driver?

It is the set of routines and variables that implements the operations of a device (open, release, read, write, etc.), as shown in Figure 2.

Figure 2. Data structures for device management in Linux

Usually, the routines that control a device need to execute instructions (such as in/out) or access addresses that are not allowed in user mode. To use these instructions and addresses, the code must run in system mode and, therefore, the driver is included in the OS code.

2.2.2. How to install a device driver in the system

There are two possible mechanisms:

- Statically, by recompiling the whole system, including the new driver routines.
- Dynamically, by using system calls or software that make it possible to include object files in the OS kernel at run time (for example, a module). Sections 2.1.3 and 2.1.4 show how a module is compiled and installed.

2.2.3. Defining device operations

To define the driver, only the set of valid operations for the device has to be defined. The operations that can be defined are listed in the file_operations structure, declared in the header file <linux/fs.h>. Its format is:

    struct file_operations {
        struct module *owner;
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
        ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t, loff_t);
        int (*readdir) (struct file *, void *, filldir_t);
        unsigned int (*poll) (struct file *, struct poll_table_struct *);
        int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *);
        int (*release) (struct inode *, struct file *);
        int (*fsync) (struct file *, struct dentry *, int datasync);
        int (*aio_fsync) (struct kiocb *, int datasync);
        int (*fasync) (int, struct file *, int);
        int (*lock) (struct file *, int, struct file_lock *);
        ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
        ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
        ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, void __user *);
        ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
        unsigned long (*get_unmapped_area) (struct file *, unsigned long, unsigned long,
                                            unsigned long, unsigned long);
    };

You will use the open, release (which corresponds to close), read and ioctl operations. The open function makes the device available to the program and the release function ends access to it. Both return 0 if everything is correct and < 0 in case of error. The arguments of the read function are: the user's buffer where the characters read are stored, the number of characters to read (size), and an offset in/out parameter that holds the current position of the read/write pointer before the read and is updated to the new position after it. The call returns the number of bytes read, 0 if the end of the file has been reached, or < 0 if an error has occurred. The ioctl function returns 0 if everything is correct or < 0 if an error has occurred.

The first field of the file_operations structure, owner, is used when the driver is installed as a module. If it is set to the macro THIS_MODULE, the kernel maintains the reference counter of the driver's module automatically (see Section 2.1.6), and the programmer does not need to call try_module_get and module_put (note 3). The header file also contains the definitions of struct inode and struct file.

To make the code easier to read, only the required fields of the structure need to be initialized, each one tagged with its field name, as below:

    struct file_operations mymod_fops = {
        owner:   THIS_MODULE,
        read:    mymod_read,
        ioctl:   mymod_ioctl,
        open:    mymod_open,
        release: mymod_release,
    };

Note that this syntax is not standard C but an extension of the GNU compiler (standard C99 offers the equivalent designated-initializer form .read = mymod_read). If neither is available, our old friend NULL has to be written in the fields that are not to be initialized. Finally, only the operations that the driver actually implements need to be defined.

2.2.4. Device identification

How does the system know that a specific system call refers to a specific driver? By default, it does not. When the driver is installed, an identifier must be explicitly specified. This identifier is unique and is formed by two integers: the major and the minor (traditionally, the major identified the device type and the minor the subtype).
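The idea that the system finds a driver's operations through its identifier can be modeled in user space. In this hypothetical sketch, struct fake_fops is a cut-down stand-in for struct file_operations (the real read also receives a struct file * and writes into a char __user * buffer), the registry array and lookup_ops are invented for illustration, and major 250 is an arbitrary example value. The table uses C99 designated initializers, the standard counterpart of the GNU field: syntax; fields that are not named, such as write here, are set to NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified operations table in the spirit of file_operations. */
struct fake_fops {
    int     (*open)(void);
    ssize_t (*read)(char *buf, size_t count, long *offset);
    ssize_t (*write)(const char *buf, size_t count, long *offset);
    int     (*release)(void);
};

static const char msg[] = "hello from mydev\n";

/* read(): copy from the current offset, advance it, and return the number
 * of bytes copied, or 0 once the end of the data has been reached. */
static ssize_t mydev_read(char *buf, size_t count, long *offset)
{
    size_t len = sizeof(msg) - 1;         /* data length without the '\0' */

    if (*offset < 0)
        return -EINVAL;                   /* invalid position */
    if ((size_t)*offset >= len)
        return 0;                         /* end of file */
    if (count > len - (size_t)*offset)
        count = len - (size_t)*offset;    /* do not read past the end */
    memcpy(buf, msg + *offset, count);    /* kernel code must use copy_to_user() */
    *offset += (long)count;               /* advance the read/write pointer */
    return (ssize_t)count;
}

static int mydev_open(void)    { return 0; }
static int mydev_release(void) { return 0; }

/* Only the implemented operations are named; write stays NULL. */
static const struct fake_fops mydev_fops = {
    .open    = mydev_open,
    .read    = mydev_read,
    .release = mydev_release,
};

/* Hypothetical registry: (major, minor) -> operations of the driver. */
struct registration {
    unsigned int major, minor;
    const struct fake_fops *ops;
};

static const struct registration registry[] = {
    { 250, 0, &mydev_fops },
};

/* Conceptually what happens on each operation: the device's identifier
 * selects the driver whose operations are then called. */
static const struct fake_fops *lookup_ops(unsigned int major, unsigned int minor)
{
    for (size_t i = 0; i < sizeof(registry) / sizeof(registry[0]); i++)
        if (registry[i].major == major && registry[i].minor == minor)
            return registry[i].ops;
    return NULL;                          /* no driver registered for it */
}
```

A caller would do ops = lookup_ops(250, 0) and then call ops->read(buf, n, &off) repeatedly until it returns 0, mirroring the return-value rules of read described in Section 2.2.3.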
Associating a driver with its identifier is called registering the driver, and it is performed in two steps: first the identifier is reserved, and then the operations are associated with it. To build the identifier of a driver, use the MKDEV macro, which, given a major and a minor, returns the corresponding value of type dev_t:

    dev_t MKDEV(unsigned int major, unsigned int minor);

The definition of the type dev_t can be found in the header file <linux/types.h>. Furthermore, when a new device is added to the system, the major and minor of the driver that manages it must also be specified. Thus, each time an operation is performed on a device, the system uses this identifier to find the operations of its driver.

Note 3: Keep in mind that a driver is considered in use from the moment the device is opened until it is released.

2.2.5. How to register a device driver

The first step in registering a device driver is to reserve its identifier. The register_chrdev_region function, declared in the header file <linux/fs.h>, reserves a range of device driver identifiers:

    int register_chrdev_region(dev_t first, unsigned int count, const char *name);

The arguments are the first identifier of the region to be reserved (first), which must previously have been built from a major and a minor with the MKDEV macro; the number of identifiers to reserve (count); and the name of the device (name), which will be shown in /proc/devices. A negative return value means that an error has occurred. This function reserves count identifiers, all with the same major (taken from first) and with consecutive minors starting at the minor of first (note 4).

To release the driver's identifiers so that they can be reused later, use the unregister_chrdev_region function:

    void unregister_chrdev_region(dev_t first, unsigned int count);

Its arguments are the first identifier of the region (first) and the number of identifiers in the region (count).

Once the identifiers have been reserved, they must be associated with the driver's operations. To do so, a structure of type struct cdev, defined in the header file <linux/cdev.h>, is used.
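The arithmetic behind MKDEV, MAJOR and MINOR, and behind the consecutive identifiers reserved by register_chrdev_region, can be checked with a small user-space copy of the macros. This sketch assumes the internal 2.6 layout of include/linux/kdev_t.h (major in the upper 12 bits of a 32-bit dev_t, minor in the lower 20 bits); kdev_sketch_t and region_id are hypothetical names, and driver code should always use the kernel macros instead of relying on the layout.

```c
#include <stdint.h>

/* User-space copy of the Linux 2.6 internal dev_t encoding. */
typedef uint32_t kdev_sketch_t;

#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)

#define MAJOR(dev)    ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)    ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) (((kdev_sketch_t)(ma) << MINORBITS) | (kdev_sketch_t)(mi))

/* register_chrdev_region(first, count, name) reserves count identifiers with
 * the major of first and consecutive minors starting at MINOR(first).
 * This hypothetical helper returns the i-th identifier of such a region. */
static kdev_sketch_t region_id(kdev_sketch_t first, unsigned int i)
{
    return MKDEV(MAJOR(first), MINOR(first) + i);
}
```

User-space programs do not see this layout: the C library provides its own major(), minor() and makedev() in <sys/sysmacros.h> for the dev_t values returned by stat().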
First, declare a pointer to a cdev structure:

    struct cdev *my_cdev;

Then reserve memory for the structure using the following function (note 5):

    struct cdev *cdev_alloc(void);

Two of its fields must then be initialized: owner, which the system uses to maintain a counter of references to the structure and which must be set to the macro THIS_MODULE; and ops, which must point to the file_operations structure that contains the driver's operations. Finally, the structure must be added to the device structures registered in the system with the following function:

    int cdev_add(struct cdev *dev, dev_t num, unsigned int count);

Its parameters are the structure that contains the driver's operations (dev), the first identifier of the region (num) and the number of identifiers of the region to be associated with these operations (count). The function returns a negative value if an error occurs. Until it has been executed successfully, the driver is not visible to the system and its operations cannot be used. Below is a short example in which a new cdev is declared and initialized:

    struct cdev *my_cdev;
    int ret;

    my_cdev = cdev_alloc();
    if (my_cdev == NULL)
        return -ENOMEM;
    my_cdev->owner = THIS_MODULE;
    my_cdev->ops = &my_fops;   /* my_fops: a static struct file_operations
                                  initialized with the driver's operations */
    ret = cdev_add(my_cdev, dev, ndev);
    if (ret < 0)
        return ret;            /* errors must be checked; undo previous steps */

When the driver is no longer in use, its cdev structure must be removed from the system:

    void cdev_del(struct cdev *dev);

2.2.6. How to select the major and the minor

The device driver identifier is derived from the major and the minor, so a combination that no other device is using is required. The file /proc/devices lists all installed device drivers with their major (note 6). Although Linux 2.6 allows different drivers to share a major, selecting a major that is not currently assigned guarantees a new major-minor combination, so any minor can then be used with the confidence that the combination is not already in use.

Alternatively, the programmer can be freed from selecting these numbers by asking the system to reserve a range of driver identifiers dynamically, which implicitly selects the major and the minors of the region (note 7). Note that in this case the driver identifiers may change each time the driver is installed, which must be taken into account when the device files are created.

Note 4: The system can also assign all the identifiers of a driver, without the first identifier of the range being passed, but this requires the alloc_chrdev_region function instead of register_chrdev_region. For further details about this alternative function, see Chapter 3 of Linux Device Drivers, cited in the bibliography.

Note 5: If the variable of type struct cdev is defined statically rather than as a pointer, the cdev_init function must be used instead of cdev_alloc. You will find the definition of this function in Linux Device Drivers, cited in the bibliography.

Note 6: In previous versions of Linux, the major identified the device driver, and the minor was only used internally by the driver to distinguish between the different devices it managed. In version 2.6 and later, both numbers (major and minor) are needed to identify the operations associated with a device. However, /proc/devices still only shows the major of each driver.

Note 7: For further details, see Chapter 3 of Linux Device Drivers, cited in the bibliography.

2.2.7. How the major and the minor are obtained inside the driver

With the following macros:

    int MAJOR(dev_t dev);
    int MINOR(dev_t dev);

The value of the parameter dev is taken from the inode, which is one of the parameters received by the driver's operations (see Section 2.2.3).

2.2.8. How a file is associated with a device

Devices are visible from the file system (by default, the files in /dev/* are devices). Device files are created with the mknod command (which relies on the system call of the same name):

    # mknod file type major minor

The arguments are: file, the name of the file that will represent the device; type (c to create a character device); and major and minor, the integers that identify the device in the system (see Section 2.2.6 for further details; any minor can be used to begin with). To see the devices that already exist in the system, check the file /proc/devices, where the available devices (majors and registered names) are grouped by device type. Once the device file has been created, its functionality (i.e. which operations the system allows on the device) must be defined; this is done by the device driver.

2.2.9. How users access the device

Users access the device with the usual file system calls: open, close, read, write and ioctl.
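From user space, a device is accessed like any other file. The sketch below uses only standard POSIX calls; read_device is a hypothetical helper, and /dev/zero is used in place of the project's own device file (whatever path is created with mknod) because it exists on every Linux system.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to count bytes from a device file.
 * Returns the number of bytes read, or -1 with errno set on error. */
static ssize_t read_device(const char *path, char *buf, size_t count)
{
    int fd = open(path, O_RDONLY);     /* ends up in the driver's open */
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, count);  /* ends up in the driver's read */
    int saved_errno = errno;           /* close() must not hide a read error */

    close(fd);                         /* the last close reaches the driver's release */
    errno = saved_errno;
    return n;
}
```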

Of these system calls, the only one that really depends on the peripheral is ioctl, which lets users perform device-specific operations selected through its last two arguments (a command and its argument).

2.3. Linux

2.3.1. How to find the Linux source code

The directory /usr/src/linux contains the system's source code. The headers for the system version are in /usr/src/linux/include. The routines and structures that Linux uses to manage processes (for_each_process, find_task_by_pid, etc.) are described in Chapter 3 of [2] (see also http://lxr.linux.no).

2.3.2. Symbols

By symbols, we mean variable names and routine names. The symbols of an object file can be listed with the nm command. Another kind of symbol is one defined with #define, such as the macro current, which returns a pointer to the struct task_struct that holds the control data of the running process. This structure is known as the PCB (Process Control Block); see http://lxr.linux.no/source/include/asm-generic/current.h.

2.3.3. The Linux symbol table

Linux exports a set of symbols so that modules can use and reference them. For each exported symbol (variable, routine, etc.), the system keeps in a table the name and the memory address (system logical address) where it is located. This symbol table is created at compile time, since both the name of the symbol and its address are needed. To export a symbol, the macro EXPORT_SYMBOL(symbol_name) must be used and the kernel recompiled (an example can be seen at http://lxr.linux.no). Since modules are part of the kernel, they can also export symbols with this macro. The system's architecture-independent symbol table is defined in the file kernel/ksyms.c, and the exported symbols that depend on the Intel architecture are in the file arch/i386/kernel/i386_ksyms.c.

2.3.4. How to find out the contents of the system's symbol table

There are two ways:

- Reading the file /proc/ksyms (on 2.6 kernels, /proc/kallsyms).
- Using the command ksyms -a.
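On 2.6 kernels the symbol table can be read as text from /proc/kallsyms, where each line holds an address, a one-letter symbol type and a name (followed by the module name in brackets for module symbols). The parser below is a hypothetical helper, not a kernel or library API; it works on any stream in that format.

```c
#include <stdio.h>
#include <string.h>

/* One entry of the kernel symbol table as printed by /proc/kallsyms:
 * "<address> <type> <name> [module]". */
struct ksym {
    unsigned long addr;
    char type;          /* e.g. 'T' global text, 't' local text, 'D' data */
    char name[128];
};

/* Parse one line; returns 1 on success, 0 if the line is malformed. */
static int parse_ksym_line(const char *line, struct ksym *out)
{
    return sscanf(line, "%lx %c %127s", &out->addr, &out->type, out->name) == 3;
}

/* Scan a kallsyms-style stream for a symbol; returns 1 if found. */
static int find_ksym(FILE *f, const char *name, struct ksym *out)
{
    char line[256];

    while (fgets(line, sizeof line, f))
        if (parse_ksym_line(line, out) && strcmp(out->name, name) == 0)
            return 1;
    return 0;
}
```

Opened on /proc/kallsyms, find_ksym(f, "sys_open", &s) would report where that routine lives in the running kernel (reading real addresses may require root privileges on some systems).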

2.3.5. What must internal system routines return? Who receives this information?

By Linux convention, a negative value (< 0) is returned when an error occurs; otherwise a non-negative value (>= 0) is returned. The type of error is the absolute value of the returned code; its meaning can be looked up in the header file <sys/errno.h>.

2.3.6. What action should be taken when an internal system routine returns an error? What type of error should be returned?

Unless a specific treatment is needed, return the same error code that the system routine returned.

2.3.7. User space and system space

While running, the system uses two different address translation functions, selected according to the processor's execution mode. Usually, there is a single translation function for the system code and a specific translation function for each running process. This mechanism guarantees system security: users cannot modify system data from their applications, and applications cannot modify each other, because they cannot access other applications' address spaces.

Likewise, the memory access mechanism depends on the execution mode. If the user address space must be accessed while running in system mode (to read system call parameters, for example), special instructions are needed to tell the processor to use the user address space even though system mode is active. Remember that in each case you will have to check that the user address can actually be accessed before copying the information, as you did in Project 1.

2.3.8. Operations to copy data between address spaces

There are basically two operations, declared in <asm-i386/uaccess.h>:

    unsigned long copy_from_user(void *to, const void __user *from, unsigned long count);

copies from user space to system space, and

    unsigned long copy_to_user(void __user *to, const void *from, unsigned long count);

copies from system space to user space. Both return the number of bytes that could not be copied, so 0 means success.
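The conventions of Sections 2.3.5 to 2.3.8 usually meet in one idiom: if copy_to_user (or copy_from_user) reports that some bytes could not be copied, the routine returns -EFAULT. The user-space sketch below models that idiom; fake_copy_to_user and stats_read are hypothetical, and the accessible parameter merely simulates a user buffer that is only partly writable (the real functions detect invalid addresses through the memory management hardware).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for copy_to_user(): copies at most `accessible`
 * bytes (pretending the rest of the user buffer is unmapped) and, like the
 * real routine, returns the number of bytes that could NOT be copied. */
static unsigned long fake_copy_to_user(void *to, const void *from,
                                       unsigned long n, unsigned long accessible)
{
    unsigned long ok = n < accessible ? n : accessible;

    memcpy(to, from, ok);
    return n - ok;
}

/* A read-style routine following the kernel conventions:
 * >= 0 on success (bytes transferred), -errno on failure. */
static ssize_t stats_read(char *ubuf, size_t count, unsigned long accessible)
{
    static const char stats[] = "open: 42\n";
    size_t len = sizeof(stats) - 1;

    if (ubuf == NULL)
        return -EINVAL;                  /* bad argument */
    if (count > len)
        count = len;
    if (fake_copy_to_user(ubuf, stats, count, accessible) != 0)
        return -EFAULT;                  /* user buffer not (fully) writable */
    return (ssize_t)count;
}
```

The caller can recover the error type as the absolute value of the result, exactly as described in Section 2.3.5.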
See the parameters and return values at http://lxr.linux.no.

2.3.9. How printk works

printk is the routine used inside the kernel to write information to the computer console. However, its excessive use is discouraged. Its format and parameters are a limited version of those of the printf routine of the C library. It is line-buffered: it does not write data until it finds a newline character. printk has a special feature: the first characters of the string are interpreted as the priority of the message. The format of this information is:

    printk("<N>Goodbye cruel world\n");
    printk(KERN_EMERG "Goodbye cruel world\n");

where N is a number between 0 (KERN_EMERG, the highest priority) and 7 (KERN_DEBUG, the lowest). Depending on the priority level, the message appears in a different place: the computer console, a log file (for example, /var/log/messages or /var/log/kern.log), etc. The log file name depends on the system configuration (/etc/syslog.conf). Macros such as KERN_EMERG that define the different priorities can be found in the file <linux/kernel.h>. If the priority value is lower than console_loglevel, the message is printed on the console. If syslogd and klogd are running, the message is also written to the log file, whether or not it is printed on the console. All these kernel messages are kept in a structure called the kernel ring buffer, which can be read with the dmesg command. The size of this buffer is limited, so old messages are overwritten by new ones (a circular queue). More information can be found in the appendix of this document or in the man pages:

    man dmesg
    man syslogd
    man syslog.conf

3. Description of work

In this project, you have to modify the system so that it collects usage statistics. Current operating systems keep various statistics about themselves so that problems can be easily identified and the appropriate actions taken. In our particular case, the task is to find out the system's mean response time, focusing on system calls. To do so, the entry point of each system call must be modified by introducing new code (instrumentation). Information will be kept for the following calls: open, write, clone, close and lseek. For each of them, the following is needed:

- Number of times the call is started
- Number of times it ends correctly
- Number of times it ends with an error
- Time the call spends running

Adding this instrumentation to each system call may make the system slightly slower.
You must therefore implement this instrumentation dynamically, so that it can be enabled or disabled. To do this, the Linux system call table will be intercepted, and each function to be measured will be replaced by a local function. This local function measures the time the original function takes to execute, and it must have the same interface as the corresponding system call (you can find this interface in the Linux source code).
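The interception scheme described above (save the original entry, install a local function with the same interface that times the original, and restore the entry on unload) can be tried out in ordinary user-space C before touching the kernel. The following is only a minimal model, not module code: fake_table, fake_open, POS_OPEN, fake_clock and struct call_stats are hypothetical stand-ins for sys_call_table, the original open handler, its position in the table, proso_get_cycles() and the per-call counters.

```c
/* User-space model of intercepting one entry of a system call table.
 * All names here are hypothetical stand-ins, not kernel symbols. */

typedef void (*any_fn)(void);                 /* generic table entry type */
typedef long (*open_fn)(const char *filename, int flags, int mode);

static unsigned long long fake_clock;         /* simulated cycle counter */

/* Stand-in for the original call: "takes" 100 cycles, fails on empty name. */
static long fake_open(const char *filename, int flags, int mode)
{
    (void)flags;
    (void)mode;
    fake_clock += 100;
    return filename[0] != '\0' ? 3 : -2;      /* fd, or a -ENOENT-like error */
}

enum { POS_OPEN = 0, TABLE_SIZE = 1 };
static any_fn fake_table[TABLE_SIZE] = { (any_fn)fake_open };

/* Counters required for each monitored call. */
struct call_stats {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};
static struct call_stats open_stats;

static open_fn sys_open_old;                  /* saved original entry */

/* Same interface as the original: mark start, call it, measure, classify. */
static long sys_open_local(const char *filename, int flags, int mode)
{
    unsigned long long start = fake_clock;
    long ret;

    open_stats.num_entries++;
    ret = sys_open_old(filename, flags, mode);
    if (ret >= 0)
        open_stats.num_exits_ok++;
    else
        open_stats.num_exits_error++;
    open_stats.total_time += fake_clock - start;
    return ret;
}

static void install_wrapper(void)             /* done when the module loads */
{
    sys_open_old = (open_fn)fake_table[POS_OPEN];
    fake_table[POS_OPEN] = (any_fn)sys_open_local;
}

static void remove_wrapper(void)              /* done when the module unloads */
{
    fake_table[POS_OPEN] = (any_fn)sys_open_old;
}
```

The model uses a generic function pointer type for the table so that the casts are valid ISO C, whereas the kernel declares sys_call_table as an array of void *. Classifying results by sign follows the kernel convention that a system call handler reports an error as a negative value.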

In summary, you will have to implement two modules: Module 1, which intercepts system calls and measures the time spent in them, and Module 2, which gives access to the system statistics.

3.1. Module 1: Intercepting and measuring

This module will intercept the system call table (that is, modify it): it inserts the instrumentation functions when the module is loaded into the system (enabling instrumentation) and removes them when the module is unloaded. The instrumentation functions are part of the module and update the system call counters of the current process. To measure times, you can use the mechanism explained in Section 3.1.6, How to measure time. In the system call table, the original functions of the calls to be monitored will be replaced by the routines you create. These monitoring routines must (see Figure 3):
1. Mark the beginning of the system call.
2. Execute the original system call.
3. Calculate the total running time and obtain the result of the call.

Figure 3. Diagram of system call interceptions (the running program makes a system call; your routine is executed and checks the time; the original system call runs and ends; the time is checked again and the total duration calculated; the system call returns)

3.1.1. Intercepting the system call table

The system call table is called sys_call_table and is an array of function pointers. Therefore, if the module declares

    extern void *sys_call_table[];

it obtains a symbol that references the system call table (see http://lxr.linux.no/).

3.1.2. Intercepting system calls

Once the system call table has been obtained, you only need to save the original entries of the functions to be monitored. For example, the original open entry is stored in a variable as follows:

    sys_open_old = sys_call_table[pos_syscall_open];

From then on, the variable sys_open_old contains a pointer to the original open system call.

3.1.3. Modifying the system call table

The table is modified in the same way. If a function called sys_open_local has been defined to monitor open, it is enough to execute:

    sys_call_table[pos_syscall_open] = sys_open_local;

3.1.4. Headers of the system calls to trace

As an example, the headers of the system calls to be traced are shown below. Note that the kernel headers differ from the user-level system call headers:

    long sys_open(const char __user *filename, int flags, int mode);
    long sys_close(unsigned int fd);
    ssize_t sys_write(unsigned int fd, const char __user *buf, size_t count);
    off_t sys_lseek(unsigned int fd, off_t offset, unsigned int origin);
    int sys_clone(struct pt_regs regs);

The headers of the other system calls can be found at http://www.jollen.org/blog/2006/10/linux_2611_system_calls_table.html

3.1.5. Where to maintain statistics

In kernel 2.6, the PCB is managed differently than in version 2.4, and therefore differently than in ZeOS. From version 2.6 onwards, the PCB is split into two components: task_struct and thread_info.

Figure 4. Sharing of two pages by the kernel stack and the thread_union (labels in the figure: current_thread_info(), current)

The task_struct contains the process information, such as the open files and a pointer to the thread_info. The thread_info is the structure that shares its memory space with the kernel stack; it contains the thread's execution state and a pointer to the task_struct. Its definition can be found at http://lxr.linux.no/#linux+v2.6.34.1/arch/x86/include/asm/thread_info.h. The definition of the thread_union (the union of the thread_info and the kernel stack) is at http://lxr.linux.no/#linux+v2.6.34.1/include/linux/sched.h#l1939. See also Figure 4. The macro current must be used to obtain the address of the task_struct, and the routine current_thread_info() must be used to obtain the base address of the thread_info. As the statistics require little space, they will be stored just above the thread_info structure, so that they can easily be associated with the process. To do so, a new structure called my_thread must be created, containing the thread_info structure and the statistics of the process (see Figure 5).

Figure 5. Where statistics are stored

For each process, it must be determined whether or not its statistics have been initialized. Note that when it creates a new process, the kernel reuses data structures previously associated with a dead process, so this structure may still hold the statistics of the previous process. Therefore, an additional field can be defined in the data structure to store the PID of the process to which the stored data belong. If this PID does not match the PID of the current process, the statistics have not been initialized yet: they must be initialized and the stored PID updated.

3.1.6. How to measure time

You can use the proso_get_cycles function to measure time. It is implemented as follows:

    /* Read the x86 time-stamp counter: low 32 bits in EAX, high 32 bits in EDX. */
    #define proso_rdtsc(low, high) \
        asm volatile ("rdtsc" : "=a" (low), "=d" (high))

    /* Return the 64-bit counter value (cycles, not seconds). */
    static inline unsigned long long proso_get_cycles(void)
    {
        unsigned long eax, edx;

        proso_rdtsc(eax, edx);
        return ((unsigned long long) edx << 32) + eax;
    }

3.1.7. Other restrictions

Each process must have its own counters for the system calls, so the counters must be reset to zero for each new process. As a final requirement, you must check that everything works properly. To this end, when the module is removed from the system, it must print on screen the statistics of the process whose PID was passed as a parameter when the module was inserted. The module must not be unloaded while any process is executing an intercepted call (you can use try_module_get and module_put, as explained in Section 2.1.6).
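The PID check of Section 3.1.5 and the per-process reset required in Section 3.1.7 can be combined in a single lookup routine. The sketch below is a user-space model under stated assumptions: struct fake_thread_info is a placeholder for struct thread_info, get_stats and the field names are hypothetical, and in the module the struct my_thread pointer would be derived from current_thread_info() and the PID taken from current->pid.

```c
#include <string.h>

/* User-space model of the per-process statistics area (hypothetical names). */

enum { NUM_CALLS = 5 };                  /* open, write, lseek, close, clone */

struct t_info {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};

struct fake_thread_info {                /* placeholder for struct thread_info */
    int dummy;
};

struct my_thread {
    struct fake_thread_info info;        /* first, so a thread_info pointer can be cast */
    int pid;                             /* process that owns the stored statistics */
    struct t_info stats[NUM_CALLS];
};

/* Return the statistics of process `pid`, clearing them first if the area
 * still belongs to a previous (dead) process that used the same memory. */
static struct t_info *get_stats(struct my_thread *t, int pid)
{
    if (t->pid != pid) {
        memset(t->stats, 0, sizeof t->stats);
        t->pid = pid;
    }
    return t->stats;
}
```

Tagging the area with the owner's PID makes initialization lazy: nothing has to hook process creation, because the first intercepted call of a new process notices the mismatch and resets the counters.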

3.1.8. Tests

To check that the module works properly, a number of tests must be run. Below is a skeleton of the tests:
1. Begin the test and print a start message.
2. Print the PID of the current process and block the process until a key is pressed. (Load the module, passing the PID of the test as a parameter.)
3. Press a key to continue with the test.
4. Exercise all the monitored system calls.
5. Print an end message.
6. Block the process until a key is pressed. (Unload the module, which prints the statistics of the process.)
7. Check that the results correspond to the test.
8. End the test.

3.2. Module 2: Accessing system information

The main aim of this part of the project is to build a module that gives access to the information collected by the previous module. With the previous module, all the processes created in the system can be monitored; with this module, a new device will be created to access this information. To create this device, you must select a major and a minor number to identify the driver; they will be used to register the driver and to create the file that represents the device in the file system. This module allows the user to select the process and the system call whose information is to be obtained (even though all the processes and all five system calls are being monitored).

A new device must be created that supports the following operations:

    ssize_t read(struct file *f, char __user *buffer, size_t s, loff_t *off);
    int ioctl(struct inode *i, struct file *f, unsigned int arg1, unsigned long arg2);
    int open(struct inode *i, struct file *f);
    int release(struct inode *i, struct file *f);

open. The device can be opened by only one process at a time, and only by the root user (uid == 0).
read. A read on this device returns to user space (in buffer) a structure with information about the currently selected system call for the process currently being monitored. The user must allocate the structure before calling read. The number of bytes read will be the minimum of the s parameter and sizeof(struct t_info).
ioctl. Users can change the device's settings with this call (selected process, selected system call, etc.).
release. This call releases the device.
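The rules for open, read and release can be checked in user space before writing the driver. The model below is a sketch: the function names, the device_busy flag and the plain uid argument are hypothetical, memcpy() stands in for copy_to_user(), and only the single-opener rule, the root-only rule and the min(s, sizeof(struct t_info)) size come from the text above.

```c
#include <errno.h>
#include <string.h>

/* User-space model of the Module 2 device rules (hypothetical names). */

struct t_info {
    int num_entries;
    int num_exits_ok;
    int num_exits_error;
    unsigned long long total_time;
};

static int device_busy;                   /* only one opener at a time */

static int dev_open_model(unsigned int uid)
{
    if (uid != 0)
        return -EPERM;                    /* only root may open the device */
    if (device_busy)
        return -EBUSY;                    /* already opened by another process */
    device_busy = 1;
    return 0;
}

/* Copy at most min(s, sizeof(struct t_info)) bytes of the selected statistics. */
static long dev_read_model(const struct t_info *selected, char *buffer, size_t s)
{
    size_t n = s < sizeof(struct t_info) ? s : sizeof(struct t_info);

    memcpy(buffer, selected, n);          /* copy_to_user() in the real driver */
    return (long)n;                       /* number of bytes read */
}

static int dev_release_model(void)
{
    device_busy = 0;                      /* another process may now open it */
    return 0;
}
```

In the driver itself, the busy flag would have to be tested and set atomically (for example, with an atomic variable or a lock), since two processes may call open concurrently.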

By default, read returns the statistics of the open system call for the process that opened the device. The structure returned to the user has the following type:

    struct t_info {
        int num_entries;
        int num_exits_ok;
        int num_exits_error;
        unsigned long long total_time;
    };

To control the behavior of this new device through the ioctl call, the following commands must be defined (the values in parentheses are constants):

CHANGE_PROCESS (arg1 = 0). The third parameter (arg2) passes, by reference, the identifier of the process to be analyzed. If the pointer is NULL (zero), the target process for read is once again the one that opened the device. If the requested process does not exist, the call must return an error.
CHANGE_SYSCALL (arg1 = 1). Changes the target system call for read. The third parameter (arg2) selects the call: OPEN (0), WRITE (1), LSEEK (2), CLOSE (3), CLONE (4).
RESET_VALUES (arg1 = 2). Resets the statistics of the process currently being analyzed.
RESET_VALUES_ALL_PROCESSES (arg1 = 3). Resets the statistics of all processes.

The call returns zero if everything worked properly and a negative value (the corresponding error code) if an error occurred.

4. Dynamic monitoring

The aim of this stage of the project is to make the instrumentation mechanisms described so far more dynamic. To do so, the modules you have created must be modified.
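Making the instrumentation dynamic essentially means switching each entry of the system call table, at run time, between the original routine and the monitoring routine. The user-space model below sketches the two Module 1 functions described in Section 4.1 under stated assumptions: model_table stands in for sys_call_table, the two stubs stand in for real handlers, and all names are hypothetical; only the call codes OPEN (0) to CLONE (4) and the rule that a negative number selects every call come from this document.

```c
/* User-space model of dynamic monitoring (hypothetical names throughout).
 * In the real module, the kernel's system call numbers, not these codes,
 * are the positions in sys_call_table. */

typedef void (*any_fn)(void);

enum { OPEN = 0, WRITE = 1, LSEEK = 2, CLOSE = 3, CLONE = 4, NUM_CALLS = 5 };

static int last_called;                        /* which stub ran last */
static void orig_stub(void)    { last_called = 1; }  /* an original call */
static void monitor_stub(void) { last_called = 2; }  /* a monitoring routine */

static any_fn model_table[NUM_CALLS];          /* plays the role of sys_call_table */
static any_fn original_calls[NUM_CALLS];       /* saved original addresses */
static any_fn monitor_calls[NUM_CALLS];        /* local monitoring routines */

static void init_tables(void)                  /* module load: monitor everything */
{
    int i;

    for (i = 0; i < NUM_CALLS; i++) {
        model_table[i] = orig_stub;
        monitor_calls[i] = monitor_stub;
        original_calls[i] = model_table[i];    /* save before replacing */
        model_table[i] = monitor_calls[i];
    }
}

/* num < 0 selects all calls; returns 0, or -1 for an invalid number. */
static int set_monitoring(int num, int enable)
{
    int i, first = num, last = num;

    if (num >= NUM_CALLS)
        return -1;
    if (num < 0) {
        first = 0;
        last = NUM_CALLS - 1;
    }
    for (i = first; i <= last; i++)
        model_table[i] = enable ? monitor_calls[i] : original_calls[i];
    return 0;
}

static int enable_monitoring(int num)  { return set_monitoring(num, 1); }
static int disable_monitoring(int num) { return set_monitoring(num, 0); }
```

Because the original addresses are saved once, in a table, switching a call back never requires looking them up again, and the module's exit routine could restore every entry with disable_monitoring(-1).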

4.1. Changes in Module 1

The monitoring of system calls must be activated and deactivated dynamically. The new behavior will be as follows. By default, all five system calls are monitored, as before. Two new functions will be added to activate and deactivate the monitoring of system calls, and Module 2 will use these two functions. The addresses of the system calls must be kept in a table (penalties will be imposed if such a table is not implemented).

4.2. Changes in Module 2

To allow users to activate and deactivate the monitoring of a system call, the ioctl function of the device must be extended using the new functions added to Module 1, so that the instrumentation of each system call can be enabled and disabled selectively. The ioctl call must implement two new operations:
ENABLE_SYS_CALL (arg1 = 4 and arg2 = num). Enables the instrumentation of system call num (or of all of them if num is negative).
DISABLE_SYS_CALL (arg1 = 5 and arg2 = num). Disables the instrumentation of system call num (or of all of them if num is negative).
Users must be able to specify the system call to be instrumented easily, for example by using constants.

5. Deliverables

You should deliver all of the source files (including Makefiles) you have created and the test suite you used to test the modules. Additionally, you must submit a README file describing your test suite and the instructions to execute it.

6. References

[1] Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman. Linux Device Drivers, Third Edition. February 2005. http://lwn.net/kernel/ldd3
[2] Daniel P. Bovet, Marco Cesati. Understanding the Linux Kernel. O'Reilly, November 2005.
[3] http://tlpd.org/lpd/kmpg/2.6/html/lkmpg.html#aen245

7. Appendix: how to export the sys_call_table symbol

In the kernel you will use in the laboratory, the symbol sys_call_table has already been exported. However, if you want to test your project at home, follow the steps below to generate a new kernel image (named linux.2.6.xx-proso, where xx is the sublevel of the kernel version you have installed).
1. Go to the kernel source code directory (assumed to be /usr/src/linux).
    # cd /usr/src/linux
2. Modify the file arch/i386/kernel/i386_ksyms.c by adding the following lines:
    extern void * sys_call_table[];
    EXPORT_SYMBOL(sys_call_table);
3. Edit the Makefile and set the variable EXTRAVERSION, which determines the image name:
    # vi Makefile
    ...
    EXTRAVERSION=-proso
    ...
4. Use the current configuration file from the /boot directory.
    # cp /boot/config-2.6.xxxx .config
5. Prepare the environment to compile the kernel.
    # make oldconfig
6. Recompile the kernel.
    # make
7. Install the modules.
    # make modules_install

8. Install the kernel image (vmlinuz-2.6.xxx-proso) and the symbol map (System.map-2.6.xx-proso).
    # make install
9. Generate an initial RAM disk image (initrd) containing the required modules (otherwise, the system will not boot).
    # mkinitramfs -o /boot/initrd.img-2.6.xxx-proso 2.6.XXX-proso
10. Modify GRUB's boot file, /boot/grub/menu.lst, to add the new image.
    # vi /boot/grub/menu.lst
11. Modify the following fields of the entry so that it refers to the new image and the new initrd file: title, kernel and initrd.