Operating Systems CSE 410, Spring 2004 File Management Stephen Wagner Michigan State University
File Management File management system has traditionally been considered part of the operating system. Applications are stored as files Input for application is stored in files Output from application is stored in files File management has become increasingly distributed and removed from the operating system
Objectives for a File Management System Guarantee that the data in the file are valid Optimize performance Provide I/O support for a variety of storage device types Minimize or eliminate the potential for lost or destroyed data Provide a standardized set of I/O interface routines Provide I/O support for multiple users Security
User Program Pile Sequential Indexed Sequential Indexed Hashed Logical I/O Basic I/O Supervisor Basic File System Disk Device Driver Tape Device Driver Figure 12.1 File System Software Architecture [GROS86]
Device Drivers Lowest level Communicates directly with peripheral devices Responsible for starting I/O operations on a device Processes the completion of an I/O request
Basic File System Physical I/O Deals with exchanging blocks Concerned with the placement of blocks Concerned with buffering blocks in main memory
File Structure Directory management Access method Records Blocking Physical blocks in main memory buffers Disk scheduling Physical blocks in secondary storage (disk) User & program comands Operation, File name File manipulation functions I/O Free storage management File allocation User access control File management concerns Operating system concerns Figure 12.2 Elements of File Management
File Organization Character Files Piles Sequential Files Indexed Sequential Files Indexed File Direct, or Hashed, File
Character Files File is organized as a sequence of characters Basic Operations Seek. Each character has a position in the file Seek may allow random access or be more restricted Read. User is allowed to read any number of bytes At lower level, blocks are read Overwrite Append Truncate Very common file organization
File Organization with Records A more sophisticated way is to organize files as records A record consists of a fixed set of fields (a structure) Each field has a data type and length All records do not necessarily have the same fields Character Files can be viewed as a file composed of very simples Records
Piles Data are collected in the order they arrive Purpose is to accumulate a mass of data and save it Records may have different fields This requires records to be self describing At the very least the length of each record needs to be stored Record access is by exhaustive search
Piles Variable-length records Variable set of fields Chronological order (a) Pile File Fixed-length records Fixed set of fields in fix Sequential order based (b) Seq Exhaustive index Exhaustive index Index levels n Main File
Sequential Files Fixed format used for records All records are identical Same length, same fields in same order Field names and lengths are attributes of the file Can support random access One field is the key field Uniquely identifies the record Records often stored in key sequence
Sequential Files h records fields order a) Pile File Fixed-length records Fixed set of fields in fixed order Sequential order based on key field (b) Sequential File Exhaustive index Exhaustive index Partial index Main File
Sequential Files If files are stored in key sequence, adding a record becomes expensive Requires moving records in the file To reduce the cost, records are not immediately added to the file New records are placed in a log file or transaction file Batch update is performed to merge the log file with the master file A special case needs to be made to search the log file
Indexed Sequential File Index provides provides a lookup capability to quickly search the file by key Index contains a key field and a pointer to the main file Not every single entry need be indexed Index is used to get close to desired record Sequential search is used to find desired record
Variable-length records Variable set of fields Chronological order Fixed-length records Fixed set of fields in fix Sequential order based Indexed Sequential (a) Pile File File Exhaustive index (b) Seq Exhaustive index Index levels 2 1 n Index Main File Overflow File (c) Indexed Sequential File Primary F (variable-length (d) Indexe Figure 12.3 Common File Organizati
Indexed Sequential File New records are stored in overflow file Overflow file is periodically merged with main file Main file includes pointers to entries in overflow file
Advantage of Index Example: a file contains 1 million records On average 500,000 accesses are required to find a record in a sequential file If an index contains 1000 entries it will take 500 accesses on average to find nearest key in index, followed by 500 accesses on average in the main file. A binary search can be used to find the nearest key in index with 10 accesses, followed by 500 accesses to main file
Indexed File Uses multiple indices for different key fields May contain an exhaustive index that contains one entry for every record in the main file May also contain a partial index
h records fields order Fixed-length records Fixed set of fields in fixed order Sequential order based on key field a) Pile File Exhaustive index Indexed (b) Sequential File Exhaustive index Partial index Main File Overflow File ed Sequential File Primary File (variable-length records) (d) Indexed File re 12.3 Common File Organizations
Direct File Similar to a sequential file, but records are not ordered Records are accessed directly by key Similar to an STL map Implemented using a hash table Hash function is used to find location of record in file Some sort of chaining is used to handle collisions
Implementation of Different File Types Most OS s do not provide direct support for the above file organizations All of the above organizations can be built on top of character files Piles can be used for web logs Relational databases can be built out of indexed sequential files UNIX dmb files are direct files
Files have several attributes Name Access Privileges Location File Directories To make locating files easy, files are organized into directories The Directory is a file Provides a mapping between file names and files A Directory is a sequential file sorted by the file name key.
Hierarchical Directories Directories can contain files an other directories The directories form a tree Their is some master director The root directory in UNIX Logical Organization Allows users to organize files and directories Physical Organization Allows system to quickly access traverse hierarchy
Master Directory Subirectory Subirectory Subirectory Subirectory Subirectory File File File File Figure 12.4 Tree-Structured Directory
Master Directory System User A User B User C Directory "User C" Directory "Word" Unit A Directory "User B" Draw Word Directory "User A" Directory "Draw" ABC Directory "Unit A" File "ABC" ABC File "ABC" Pathname: /User B/Word/UnitA/ABC Figure 12.5 Example of Tree-Structured Directory
Naming Conventions Files are located by following a path from root directory Several files can have the same file name as long they have unique path names Files can be referenced relative to the current directory
File Sharing In multiuser system, files may be shared among users Two issues Access rights Management of simultaneous access
None Access Rights User may not know of the existence of the file User is not allowed to read the user directory that includes the file Knowledge User can only determine that the file exists and who its owner is
Access Rights Execution The user can load and execute a program but not copy it Reading The user can read the file for any purpose, including copying and execution Appending The user can add data to the file but cannot modify or delete any of the file s contents
Access Rights Updating The user can modify, delete and add data. Changing Protection User can change access rights granted to other users Deletion User can delete file
Files have an owner Access Rights Owner has all rights previously listed Owner may grant rights to others users Specific users User groups Everybody
Simultaneous Access System may provide ability to lock a file locks may be at the file level user may be able to lock individual records Mutual exclusion and deadlock are issues for shared access
Record Blocking Records must be organized as blocks for I/O to be performed Records are the logical unit of access of a file Blocks are the unit of I/O with secondary storage Two issues: Should blocks be fixed or variable length? What should be relative size of a block compared to the average record size? Tradeoffs: Smaller block, more I/O operations needed Larger block, more records in one I/O operation
R1 R2 R3 R4 R4 R5 R6 Track 1 R6 R7 Fixed Blocking R9 R9 R10 R8 R11 R12 R13 Track 2 Variable Blocking: Spanned R1 R2 R3 R4 R5 R1 R2 R3 R4 Track 1 Track 1 R6 R5 R7 R6 R8 R7 R9 R10 R8 Track 2 Track 2 Variable Blocking: Unspanned Fixed Blocking Data Waste due to record fit to block size Gaps R1 due to hardware R2 design R3 R4 Waste R4 due to block R5 size constraint R6 from fixed record size Waste due to block fit to track size Track 1 R6 R7 R8 R9 R9 R10 R11 R12 R13 Figure 12.6 Record Blocking Methods [WIED87] Track 2 Variable Blocking: Spanned
R6 R7 R8 R9 R9 R10 R11 R12 R13 R5 R6 R7 R8 Variable Variable Blocking: Spanned Spanned Fixed Blocking Track 2 Track 2 R1 R2 R3 R4 R5 R1 R2 R3 R4 R4 R5 R6 Track 1 Track 1 R6 R6 R7 R7 R8 R9 R8 R9 R10 R9 R10 R11 R12 R13 Track 2 Track 2 Variable Blocking: Unspanned Variable Blocking: Spanned Data Waste due to record fit to block size Gaps R1 due to hardware R2 design R3 Waste R4 due to block size R5 constraint from fixed record size Waste due to block fit to track size Track 1 R6 R7 Figure 12.6 Record Blocking Methods [WIED87] R8 Variable Blocking: Unspanned R9 R10 Track 2 Data Waste due to record fit to block size
R1 R2 R3 R4 R4 R5 R6 Track 1 Variable R6 R7 Blocking: R8 R9 R9Unspanned R10 R11 R12 R13 Track 2 Variable Blocking: Spanned R1 R2 R3 R4 R5 Track 1 R6 R7 R8 R9 R10 Track 2 Variable Blocking: Unspanned Data Gaps due to hardware design Waste due to block fit to track size Waste due to record fit to block size Waste due to block size constraint from fixed record size Figure 12.6 Record Blocking Methods [WIED87]
Secondary Storage Management Space must be allocated to files Must keep track of the space available for allocation
Preallocation VS Dynamic Allocation Preallocation Need maximum size for the file at creation time Difficult to reliably estimate the maximum potential size of the file Tend to overestimate file size so we do not run out of space Dynamic Allocation Allocate space to a file in portions as needed
Portion Size Portion Size: the size of the portion allocated to a file Variable, large contiguous: Advantages: better performance, avoids waste, file allocation tables are small Disadvantage: space is hard to reuse Blocks: Advantages: greater flexibility Disadvantage: requires large tables or complex structures for their allocation
Free Space Fragmentation Similar problem as allocating dynamic partitions First Fit Best Fit Nearest Fit Choose the unused group of sufficient size that is closest to the previous allocation for the file to increase locality
Methods of File Allocation Contiguous allocation Single set of blocks is allocated to a file at the time of creation Only a single entry in the file allocation table: starting block and length of file External fragmentation will occur, compaction is needed Need to declare the size of the file at the time of creation
File A 0 1 2 3 4 5 6 7 8 9 File B 10 11 12 13 14 File Name File A File B File C File D File E File Allocation Table Start Block 2 9 18 30 26 Length 3 5 8 2 3 15 16 17 18 19 20 21 File C 22 23 24 File E 25 26 27 28 29 File D 30 31 32 33 34 Figure 12.7 Contiguous File Allocation
File A 0 1 2 3 4 File B 5 6 7 8 9 File C 10 11 12 13 14 File E File D 15 16 17 18 19 File Name File A File B File C File D File E File Allocation Table Start Block 0 3 8 19 16 Length 3 5 8 2 3 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 Figure 12.8 Contiguous File Allocation (After Compaction)
Chained allocation Methods of File Allocation Allocation on basis of individual block Each block contains a pointer to the next block in the chain Only single entry in the file allocation table: starting block and length of file No external fragmentation Best for sequential files Does not take advantage of locality
File B 0 1 2 3 4 5 6 7 8 9 File Name File B File Allocation Table Start Block 1 Length 5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 Figure 12.9 Chained Allocation
File B 0 1 2 3 4 5 6 7 8 9 File Name File B File Allocation Table Start Block 0 Length 5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 Figure 12.10 Chained Allocation (After Consolidation)
Indexed allocation Methods of File Allocation File allocation table contains a separate on-level index for each file The index has one entry for each portion to the file The file allocation table contains block number for the index
File B 0 1 2 3 4 5 6 7 8 9 File Allocation Table File Name File B Index Block 24 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 1 8 3 14 28 30 31 32 33 34 Figure 12.11 Indexed Allocation with Block Portions
File B 0 1 2 3 4 5 6 7 8 9 File Allocation Table File Name File B Index Block 24 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 Start Block 1 28 14 Length 3 4 1 30 31 32 33 34 Figure 12.12 Indexed Allocation with Variable-Length Portions
Bit Tables Chained Free Portions Indexing Free Block List Free Space Management
Types of files Ordinary Directory Special Named I-nodes UNIX File Management A control structure that contains the key information needed by the OS for a particular file.
Direct(0) Direct(1) Direct(2) Direct(3) Direct(4) Direct(5) Direct(6) Direct(7) Direct(8) Direct(9) single indirect double indirect triple indirect Inode address fields Blocks on disk Figure 12.13 UNIX Block Addressing Scheme
Flush the log file Log File Service Write the cache Log the transaction Read/write the file I/O Manager NTFS Driver Fault Tolerant Driver Disk Driver Read/write a mirrored or striped volume Read/write the disk Cache Manager Access the mapped file or flush the cache Load data from disk into memory Virtual Memory Manager Figure 12.15 Windows NTFS Components [CUST94]