File Management. Chapter 12




Chapter 12 File Management
The file is the basic element of most applications, since the input to an application, as well as its output, is usually a file. Files also typically outlive the execution of the applications that use them. Users want to access files, save them, and maintain the integrity of their data. Hence, virtually every OS provides some sort of file management facility. A file management system consists of system utility programs that run as special applications, but they all need certain special services from the OS; sometimes the file manager is simply part of the OS, such as Windows Explorer and the one embedded in UNIX.

Overview of file structure
A field is the basic element of data. It is characterized by its length and by the type of the value it is supposed to contain. It can be of fixed or variable length. A record is a collection of fields that can be treated as a unit by a program. For example, an employee record may contain such fields as name, SSN, DOB, date of hire, etc. A file is a collection of records. It is treated as a single entity by users and applications, and can be referenced by a unique name. Needless to say, a file can be created, deleted, and modified. Access control is usually done at the file level, to give different users different access to different files. Finally, a database is a collection of related data files. (Do we really need to talk about it?)

Typical operations
A file management system usually provides the following operations: Retrieve all the records of a file, e.g., when an application has to produce a summary of the information contained in a file. Retrieve one record, as frequently required in database transactions. Retrieve the next (or previous) record, which may be required when filling in a form. Insert a new record into a file, delete an existing record, update a record, or retrieve a number of records. We saw all of these in the context of a database transaction.

Objectives of a file system
1. To meet the data management needs and requirements of the user, e.g., the ability to perform the above operations.
2. To guarantee that the data in the file are valid.
3. To optimize performance, e.g., in terms of system throughput and response time to the user.
4. To provide I/O support for a variety of devices.
5. To provide a uniform set of I/O interface routines.
6. To provide I/O support for multiple users, in the case of a multi-user system.

More specifically...
An interactive, general-purpose file management system should allow each user to:
1. create, delete, read, and change files;
2. perhaps have access to other users' files, with certain restrictions;
3. control the access rights to her own files;
4. restructure her files into a form appropriate to the problem;
5. move data between files;
6. back up and restore her files in case of damage; and
7. access her files by using symbolic (logical) names.

File system architecture
The figure below shows a typical architecture for a file management system. At the lowest level, such a system provides drivers for storage devices such as disk (direct access) and tape (sequential access). The next level, the basic file system, is the primary interface with the environment outside the computer system. It is concerned with placing blocks of data on the secondary storage device and with buffering those blocks in main memory.

The other stuff
The basic I/O supervisor is responsible for initiating and terminating all file I/O. It is also concerned with selecting an appropriate device for a given file I/O operation, based on the nature of the file to be processed. It is responsible for various scheduling and optimization activities as well. Buffering and secondary memory allocation are also dealt with at this level. This is part of the operating system. The logical I/O layer enables users and applications to access records, instead of the data blocks that are handled by the basic file system. Finally, the level closest to the user is often called the access method, which leads to different file structures and different ways of accessing and processing the data.

File organization and access methods
In choosing a file organization, we have to consider the following factors: access speed, ease of update, economy of storage, simple maintenance, and, last but not least, reliability. The relative importance of these factors varies with the application. For example, when a file is only processed in batch mode, with all the records accessed every time, rapid access to a single record is not important. These factors may also conflict with each other. For example, to achieve economy of storage, there has to be minimum redundancy in the data. On the other hand, redundancy is the chief means of speeding up data access and raising reliability.

Structure of a file
A file can be organized as a pile, a sequential file, an indexed sequential file, an indexed file, or a direct file. With a pile organization, data records are just collected in the order in which they arrive, sort of like a set. Such records may have different numbers of fields, or even similar fields in different orders. Hence, each field of a record in a pile has to describe itself, and its length must be indicated in some way, e.g., by delimiters or by an explicit length subfield. Since there is no structure, the only access method for a pile is an exhaustive search.

Sequential file
This is the most common form of file structure, where a fixed format is used for records. All records have the same length and consist of the same number of fixed-length fields in a given order. One of the fields is referred to as the key field, which is used to uniquely identify the records. Records are typically stored in the order of their key values. It is the only structure suitable for tape media, and is the optimal choice for applications that have to access all the records. This structure does pose some issues when we have to deal with dynamic data. Typically, new records are collected in a log file, which is later merged with the sequential file. A linked structure, in which each block points to the next one in key order, is also a viable alternative.
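The log-and-merge idea can be sketched in a few lines. This is only an illustration under simple assumptions: records are in-memory `(key, payload)` tuples rather than disk blocks, and the names `merge_log`, `main_file`, and `log_file` are made up for this example.

```python
import heapq

def merge_log(main_file, log_file):
    """Merge a batch of logged records into a key-ordered sequential file."""
    # The log holds records in arrival order, so sort it by key first.
    sorted_log = sorted(log_file, key=lambda rec: rec[0])
    # heapq.merge combines two already-sorted inputs into one sorted stream.
    return list(heapq.merge(main_file, sorted_log, key=lambda rec: rec[0]))

main_file = [(10, "Ann"), (30, "Cho"), (50, "Eve")]   # already in key order
log_file = [(40, "Dan"), (20, "Bob")]                  # arrival order
merged = merge_log(main_file, log_file)
merged_keys = [key for key, _ in merged]
```

After the merge, the log is emptied and the new main file is again in key order, so batch applications can keep reading it front to back.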

Indexed sequential file
The indexed sequential structure is a popular alternative. It maintains the key characteristic of the sequential structure, with two additional features: 1) An index is added to support random access, by providing a way to look up the vicinity of a desired record. 2) An overflow file is added, with a pointer mechanism, so that the records in the overflow part can be reached via pointers from their predecessors in the main file. When only one level of index is used, an index record consists of a key and a pointer into the main file. To find a specific record, the index is searched for the highest key value that is equal to, or precedes, the desired key value; then the search continues in the main file, starting from the associated pointer.
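The lookup procedure just described might look as follows. This is a sketch, not a real file system routine: the main file is an in-memory list of `(key, payload)` tuples, the helper names `build_index` and `lookup` are hypothetical, and the index is searched with binary search (`bisect`) instead of the linear scan a naive implementation would use.

```python
from bisect import bisect_right

def build_index(main_file, step):
    """Create one (key, position) index entry for every `step`-th record."""
    return [(main_file[pos][0], pos) for pos in range(0, len(main_file), step)]

def lookup(main_file, index, key):
    index_keys = [k for k, _ in index]
    # Find the highest index key that is equal to, or precedes, the desired key.
    slot = bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # the key precedes every record in the file
    start = index[slot][1]
    end = index[slot + 1][1] if slot + 1 < len(index) else len(main_file)
    # Continue sequentially in the main file from the pointed-to position.
    for pos in range(start, end):
        if main_file[pos][0] == key:
            return main_file[pos]
    return None

main_file = [(k, f"record-{k}") for k in range(0, 100, 2)]  # even keys 0..98
index = build_index(main_file, step=10)
found = lookup(main_file, index, 42)
missing = lookup(main_file, index, 43)
```

Only the segment between two neighbouring index entries is ever scanned, which is where the saving over a plain sequential search comes from.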

An example
Consider a sequential file with one million records. Looking for a specific key value with the sequential search algorithm will take, on average, half a million comparisons. (You might think of sorting them out, but external sorting takes a lot more time, although in the same order.) Now assume that we add an index with 1,000 entries. If the keys in the index are more or less evenly distributed over the main file, a search for a specific record will now make, on average, 500 accesses to the index file, followed by another 500 accesses to the related segment of the main file, to eventually find the record. This reduces the number of comparisons from 500,000 to 1,000, at the cost of an additional index file.
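The arithmetic can be checked with a small calculation. The function name `expected_comparisons` is made up here, and the model assumes, as the slide does, that both the index and the selected segment are scanned linearly and that index keys are evenly spread.

```python
def expected_comparisons(n_records, n_index_entries=0):
    """Average comparisons for a successful search, scanning linearly."""
    if n_index_entries == 0:
        return n_records / 2            # plain sequential search
    segment = n_records / n_index_entries
    # Half the index on average, then half of one segment on average.
    return n_index_entries / 2 + segment / 2

plain = expected_comparisons(1_000_000)
indexed = expected_comparisons(1_000_000, 1_000)
```

With these inputs `plain` is 500,000 and `indexed` is 1,000, matching the figures above.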

Just a bit of maintenance
Each record in such a file contains a link field, invisible to the application, that points into the associated overflow file. When a record is added to an indexed sequential file, it is placed in the overflow file, and the record in the main file whose key immediately precedes that of the new record is updated to contain a pointer to the new record. If that immediately preceding record is itself in the overflow file, then its pointer is the one that gets updated. As with the sequential file, the overflow file is occasionally merged with the main file. Such a file is pretty flexible, and greatly increases the efficiency of file management. Multiple levels of indices can be used, which leads to further improvement.
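A toy version of this pointer maintenance is sketched below. It assumes in-memory `Record` objects instead of disk records, assumes the new key is not smaller than the first key in the main file, and uses hypothetical names (`Record`, `insert`, `scan_keys`) throughout.

```python
class Record:
    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.overflow = None  # link to the next record in the overflow chain

def insert(main_file, key, data):
    """Add a record to the overflow area and fix up its predecessor's link."""
    new = Record(key, data)
    # Find the main-file record whose key immediately precedes the new key.
    prev = main_file[0]
    for rec in main_file:
        if rec.key <= key:
            prev = rec
        else:
            break
    # If the true predecessor already lives in the overflow chain, walk to it.
    while prev.overflow is not None and prev.overflow.key < key:
        prev = prev.overflow
    new.overflow = prev.overflow
    prev.overflow = new

def scan_keys(main_file):
    """Visit all records in key order, following overflow links."""
    keys = []
    for rec in main_file:
        node = rec
        while node is not None:
            keys.append(node.key)
            node = node.overflow
    return keys

main_file = [Record(10, "a"), Record(20, "b"), Record(30, "c")]
for k in (25, 22, 27):
    insert(main_file, k, "new")
ordered = scan_keys(main_file)
```

The main file itself is never reshuffled; only one pointer changes per insertion, which is why the occasional merge is needed to keep the chains short.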

Another issue and its solution
The indexed sequential file keeps one limitation of the sequential file: effective access depends on a single key field. A search on any attribute other than the key field is not easily done. To achieve such flexibility, we usually use a structure containing multiple indexes, one for each searchable field. Records organized in such a structure are accessed only through the corresponding indexes. Thus, it no longer matters where a record is placed, and variable-length records can be used. These types of files are referred to as indexed files, and are used in applications where timeliness is a critical issue and data are rarely processed exhaustively. Did we go through this stuff in DB?

Direct access
Finally, the direct, or hashed, file exploits the capability of disks to directly access any block with a known address. Again, a key field is used, but the file does not need any sequential ordering. A typical way to search for a record is to use a hash table, which allows constant access time, if we are not too greedy. (Still remember anything about it, such as load factors?) This type of organization is often used when rapid access is required and records are always accessed one at a time. Examples include price lists, name lists, schedules, etc.
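As a reminder of how hashing maps a key straight to a location, here is a minimal sketch of a direct file. Everything in it is an assumption for illustration: buckets are Python lists standing in for disk blocks, the hash function is a deliberately simple byte sum, and `NUM_BUCKETS`, `bucket_of`, and `DirectFile` are made-up names.

```python
NUM_BUCKETS = 8  # hypothetical number of blocks reserved for the file

def bucket_of(key):
    # Toy hash: sum of the key's bytes, reduced to a bucket number.
    return sum(key.encode("utf-8")) % NUM_BUCKETS

class DirectFile:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]
        self.count = 0

    def put(self, key, value):
        bucket = self.buckets[bucket_of(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update an existing record
                return
        bucket.append((key, value))       # colliding keys share a bucket
        self.count += 1

    def get(self, key):
        # Only one bucket is examined, regardless of the file size.
        for k, v in self.buckets[bucket_of(key)]:
            if k == key:
                return v
        return None

    def load_factor(self):
        return self.count / NUM_BUCKETS

prices = DirectFile()
prices.put("apple", 0.50)
prices.put("pear", 0.75)
prices.put("apple", 0.60)
```

Access stays close to constant time only while the load factor is modest; as it grows, buckets fill up and each lookup has more colliding records to scan.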

File directories
Associated with any file management system is a file directory structure, which contains information about the files: their size, location, ownership, access rights, etc. The directory itself is a file, owned by the OS and accessed by various file management routines. A very simple way to organize a directory is to use a list of entries, one for each file. Although this was actually used in some earlier systems, it is certainly inadequate in any realistic sense. For example, a user might want to organize her files by type or by project, which calls for a more sophisticated organization. A flat list also forces users to pick different names for files that would naturally share a name but differ in type.

A no-brainer
Hence, the hierarchical, or tree, structure, being a much more flexible and natural directory structure, is almost universally adopted. (Is any explanation necessary on what such a structure looks like?) As we have discussed many times, the key property of such a structure is that the path between any two nodes is unique. Each directory, at a lower level, can itself be organized as a subtree or stored as a sequential file; for a bigger directory, we can follow the approach of using a direct (hashed) file.

What else?
With a tree as the directory structure, naming becomes pretty simple and logical. We simply follow the path from the root all the way down to the file itself. On the other hand, it is not very convenient to always have to use the whole path. This leads to the concept of a working directory and relative paths: all references to files are then relative to the current, or working, directory. I recently found out that some earlier systems did not support changing the working directory (:-().
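How a relative name is turned into a full path can be shown with Python's standard `posixpath` module. The helper name `resolve` and the sample paths are made up for this illustration; the resolution here is purely textual and does not touch a real file system.

```python
import posixpath

def resolve(working_dir, name):
    """Turn a possibly relative name into a full path from the root."""
    # posixpath.join ignores working_dir when name is already absolute;
    # normpath then collapses "." and ".." components.
    return posixpath.normpath(posixpath.join(working_dir, name))

cwd = "/home/ann/project"
full_relative = resolve(cwd, "notes.txt")
full_parent = resolve(cwd, "../music/a.mp3")
full_absolute = resolve(cwd, "/etc/passwd")
```

Here `full_relative` is `/home/ann/project/notes.txt`, `full_parent` is `/home/ann/music/a.mp3`, and the absolute name `/etc/passwd` is returned unchanged.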

File sharing
In a multiuser system, there is almost always a requirement that files can be shared among many users. To ensure the integrity of such sharing, we have to address two issues: access rights and simultaneous access. The file system should provide a number of options so that we can control the way a file is shared. Typically, users or groups of users are granted certain access rights to a file. Still remember the chmod stuff?
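As a refresher on the chmod stuff, the sketch below decodes UNIX-style permission bits with Python's standard `stat` module. The mode value `0o640` and the helper `allowed` are chosen just for this example; no real file is touched.

```python
import stat

# A regular file with mode 640: owner may read and write, group may read.
mode = stat.S_IFREG | 0o640

PERMISSION_BITS = {
    ("owner", "read"): stat.S_IRUSR,
    ("owner", "write"): stat.S_IWUSR,
    ("owner", "execute"): stat.S_IXUSR,
    ("group", "read"): stat.S_IRGRP,
    ("group", "write"): stat.S_IWGRP,
    ("group", "execute"): stat.S_IXGRP,
    ("other", "read"): stat.S_IROTH,
    ("other", "write"): stat.S_IWOTH,
    ("other", "execute"): stat.S_IXOTH,
}

def allowed(mode, who, right):
    """Check whether one class of user holds one right under this mode."""
    return bool(mode & PERMISSION_BITS[(who, right)])

listing = stat.filemode(mode)  # the string that `ls -l` would show
```

For this mode, `listing` is `-rw-r-----`: the owner can read and write, the group can only read, and everybody else gets nothing.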

File sharing examples
Such rights form a hierarchy: none, where a user is not even aware of the existence of the file; knowledge, where a user knows that the file exists and who owns it, so she might ask the owner for additional rights; execution, where a user can load and execute the application; reading, where a user can access its content; appending, where a user can add stuff to it; updating, where a user can even change things; changing protection, where a user can change the access rights granted to other users; and deletion.

Ownership, etc.
One user is initially designated as the owner of a given file. The owner is granted all the above rights and can also assign rights to other users, classified as follows: specific user, namely an individual user; user groups, namely a group of users; and all, namely everybody gets access, as with public files. When the right to append to or update a file is granted to more than one user, the OS must make sure that only one user can get access to the file at a time, to maintain data consistency. Thus, such issues as mutual exclusion and locks, dead or alive, have to be addressed.

Record blocking
Although records are the logical units for accessing a file, blocks are the units an I/O device deals with. To carry out an I/O operation, records must first be grouped into blocks. The question is thus how to block records. On most systems, blocks are of fixed length. This simplifies I/O operations, buffer allocation in main memory, and block organization on secondary storage. But it may lead to internal fragmentation... We went through this in the last chapter, didn't we? (Homework 11.7)

Everything is a tradeoff.
When considering the block size relative to the average record size, we notice that the larger the block, the more records can be passed in with just one I/O operation, which is ideal for files of a sequential nature. On the other hand, if records are accessed randomly and no particular locality of reference exists, then a larger block leads to unnecessary transfer of unused data. Similar concerns came up as internal (and external) fragmentation in the memory management part. Also, larger blocks lead to larger buffers, which are not as easy to manage.

Actual methods
With fixed blocking, fixed-length records are used, and an integral number of records is kept in each block. There may be unused space at the end of each block, leading to internal fragmentation. (For example, with 512-byte blocks and 120-byte records, 512 = 4 × 120 + 32, so 32 bytes per block are wasted.) With variable-length spanned blocking, variable-length records are used and packed together to get rid of unused space. Thus, some records will span two blocks, with a pointer linking the first part to its continuation. Finally, there is variable-length unspanned blocking. Similar to the fixed blocking technique, it can lead to wasted space, external fragmentation this time: the space left at the end of a block is too small to be useful.
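The fixed-blocking arithmetic is easy to reproduce. The function names below are made up for this sketch; the 512-byte block and 120-byte record come from the slide, while the 1,000-record file is an extra assumption used only to show the block count.

```python
def fixed_blocking(block_size, record_size):
    """Return (records per block, wasted bytes per block)."""
    return divmod(block_size, record_size)

def blocks_needed(n_records, block_size, record_size):
    per_block, _ = fixed_blocking(block_size, record_size)
    # Round up: a partially filled last block still occupies a whole block.
    return -(-n_records // per_block)

per_block, wasted = fixed_blocking(512, 120)
total_blocks = blocks_needed(1000, 512, 120)
```

This gives 4 records and 32 wasted bytes per block, so a hypothetical 1,000-record file would occupy 250 blocks.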

The UNIX way
The UNIX kernel views all files as streams of bytes. It categorizes files into four types: besides the usual ordinary files, it also has directory files, which list file names together with pointers to their inodes and are read-only for users; special files, used to access various I/O devices, e.g., the terminal behind stdin and stdout; and named pipes, the stuff that we ran into in Project 2. Files of all these types are managed by UNIX in terms of inodes. An inode (index node) is a control structure that contains the key information needed by the OS to manage a particular file. Several file names may be associated with one inode. But an active inode is associated with exactly one file, and each file is controlled by exactly one inode.

File allocation
File allocation is done in units of blocks. It is dynamic, i.e., blocks are assigned as needed. An indexed allocation method is used to keep track of each file, with the index being stored in the inode. Each inode contains 39 bytes of address information, organized as thirteen 3-byte addresses. The first 10 refer to the first 10 blocks of the file. If more blocks are allocated, the 11th address points to a single indirect block, which contains pointers to the succeeding blocks. If that is not enough, then the 12th and the 13th addresses point to a double and a triple indirect block, respectively. With such a mechanism, a file can be bigger than 16GB.
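The "bigger than 16GB" figure can be reproduced with a short calculation. Two values are assumptions not stated on the slide: a block size of 1 KB and 256 block addresses per indirect block (the numbers commonly quoted for this scheme). The calculation also ignores any limit imposed by the width of the addresses themselves.

```python
BLOCK_SIZE = 1024          # assumed: 1 KB blocks
ADDRS_PER_BLOCK = 256      # assumed: addresses held by one indirect block
DIRECT_ADDRESSES = 10      # from the slide: first 10 inode addresses

direct = DIRECT_ADDRESSES
single = ADDRS_PER_BLOCK           # blocks reachable via the 11th address
double = ADDRS_PER_BLOCK ** 2      # via the 12th address
triple = ADDRS_PER_BLOCK ** 3      # via the 13th address

total_blocks = direct + single + double + triple
max_bytes = total_blocks * BLOCK_SIZE
max_gib = max_bytes / 2**30
```

Under these assumptions the triple indirect block alone covers 16 GB, and the direct, single, and double levels add a little more on top, which is why the limit is slightly above 16GB.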

I did not make that up.
The figure below shows the structure we just went through.