Distributed File Systems, Part I
Daniel A. Menascé

Issues in Centralized File Systems
File Naming
- c:\courses\cs571\procs.ps (MS-DOS)
- /usr/menasce/courses/cs571/processes.ps (UNIX)
File Structure
- bit stream or byte stream
- record oriented (record = key + data)
- indexed (e.g., B*-trees, as in IBM VSAM)
B*-Tree Files
[Figure: B*-tree file with a level of index nodes (separator keys a, b) above the leaf nodes]

Issues in Centralized File Systems
File Types
- text (e.g., ASCII)
- binary (e.g., executables, images)
Directory Structures
- flat
- hierarchical (tree)
- graph
Directories
- hierarchical
- graph
[Figure: the same directory structure drawn as a hierarchy and as a graph: menasce contains courses and papers; courses contains CS571 and INFS601, which hold files such as intro.ps, procs.ps, grcs571.xls, and grinfs601.xls]

Hierarchical Directories
[Figure: menasce contains courses and papers; courses contains CS571 and INFS601; CS571 holds intro.ps, procs.ps, and grcs571.xls, and INFS601 holds its own intro.ps, procs.ps, and grinfs601.xls]
~menasce/courses/cs571/intro.ps and ~menasce/courses/infs601/intro.ps name two different files.
Graph Directories
[Figure: same structure, but CS571 and INFS601 share a single intro.ps and procs.ps, while each keeps its own grade file (grcs571.xls, grinfs601.xls)]
~menasce/courses/cs571/intro.ps and ~menasce/courses/infs601/intro.ps refer to the same file.

Issues in Centralized File Systems
Allocation of File to Disk Blocks
- contiguous
- linked
- indexed
- i-node (UNIX)
Contiguous Allocation of File to Disk Blocks
[Figure: logical blocks 0, 1, 2 occupy disk blocks 101, 102, 103; the reserved area extends to disk block 150 (logical block 49)]
start address = 101, no. of used blocks = 3, last reserved block = 150
- simple mapping
- bad use of disk space
- hard to expand if the maximum allocation is exceeded

Linked Allocation of File to Disk Blocks
[Figure: logical blocks 0, 1, 2 are stored in disk blocks 154, 35, 237, each block pointing to the next]
directory info: first block address = 154, last block address = 237, number of blocks = 3
- good use of disk space
- bad performance for direct access (e.g., reading the k-th block requires reading k blocks)
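The difference in direct-access cost can be sketched in a few lines. This is a minimal illustration, assuming the block numbers from the two figures; the function names and the -1 end-of-chain marker are hypothetical. Contiguous allocation computes the disk block with one addition, while linked allocation must read every block on the chain up to the one requested.

```python
# Hypothetical sketch: logical-to-physical block mapping under contiguous
# and linked allocation, using the block numbers from the figures.

def contiguous_lookup(start, used, k):
    """Disk block holding logical block k (0-based): one addition, no extra reads."""
    if not 0 <= k < used:
        raise IndexError("logical block out of range")
    return start + k

def linked_lookup(first, next_ptr, k):
    """Follow the chain from the first block; returns (disk block, blocks read)."""
    block, reads = first, 1              # the first block must be read
    for _ in range(k):
        block = next_ptr[block]          # pointer stored inside the block just read
        if block == -1:                  # -1 is an assumed end-of-chain marker
            raise IndexError("logical block out of range")
        reads += 1
    return block, reads

# Chain from the linked-allocation figure: 154 -> 35 -> 237.
chain = {154: 35, 35: 237, 237: -1}

print(contiguous_lookup(101, 3, 2))      # 103
print(linked_lookup(154, chain, 2))      # (237, 3)
```

Reaching logical block 2 (the third block) costs three block reads under linked allocation, which is the "k-th block requires reading k blocks" penalty.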
Indexed Allocation of File to Disk Blocks
[Figure: index table kept in main memory with entries 0 → 154, 1 → 35, 2 → 237 and entries 3 through 511 set to -1 (unused); data blocks 35, 154, 237 are on disk]
- efficient direct access
- good use of disk space
- inadequate for very large files (very large index)

UNIX I-node
- item type (e.g., file, directory)
- item size in bytes
- time the file's i-node was last modified
- time the file's contents were last modified
- time the file was last accessed
- reference count: number of file names
- file's owner (a UID)
- file's group (a GID)
- file's mode bits (r, w, x)
- pointers to data on disk
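A sketch of the indexed lookup, assuming the 512-entry index and the -1 "unused" marker from the figure; `make_index` and `indexed_lookup` are hypothetical names. Because the index is in main memory, direct access to any block is one table lookup followed by a single disk read.

```python
# Hypothetical sketch of indexed allocation: a 512-entry index table,
# kept in main memory, maps logical block numbers to disk blocks.
INDEX_SIZE = 512      # entries 0..511, as in the figure
UNUSED = -1           # marks an unused index entry

def make_index(disk_blocks):
    """Build an index table whose first entries are the given disk blocks."""
    if len(disk_blocks) > INDEX_SIZE:
        raise ValueError("file too large for a single index")
    return list(disk_blocks) + [UNUSED] * (INDEX_SIZE - len(disk_blocks))

def indexed_lookup(index, k):
    """Direct access: one in-memory lookup; the caller then reads one disk block."""
    if not 0 <= k < len(index) or index[k] == UNUSED:
        raise IndexError("logical block not allocated")
    return index[k]

index = make_index([154, 35, 237])
print(indexed_lookup(index, 2))   # 237
```

The fixed table size is also where the slide's weakness shows: a file needing more than 512 blocks cannot be described by this single index.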
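UNIX Directories
[Figure: directories foo and bar; their entries notes and doc refer to the same i-node]
notes and doc are the same file.

I-node Allocation of File to Disk Blocks
[Figure: i-node layout: file attributes, direct pointers 0, 1, 2, 3, ..., then the SIP, DIP, and TIP (labels 509, 510, 511)]
- SIP = single indirect pointer
- DIP = double indirect pointer
- TIP = triple indirect pointer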
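The UNIX Directories slide (notes and doc being the same file) and the i-node's reference count fit together as follows. This is a simplified, hypothetical model: directories are dictionaries from names to i-node numbers, and an i-node is freed when its last name is removed (a real kernel also waits until no process has the file open).

```python
# Hypothetical sketch: directories map names to i-node numbers; the
# i-node's reference count tracks how many names refer to it.

class FileSystem:
    def __init__(self):
        self.inodes = {}     # i-node number -> {"refcount": int, "data": bytes}
        self.dirs = {}       # directory name -> {entry name: i-node number}
        self.next_inode = 1

    def mkdir(self, d):
        self.dirs[d] = {}

    def create(self, d, name, data=b""):
        ino = self.next_inode
        self.next_inode += 1
        self.inodes[ino] = {"refcount": 1, "data": data}
        self.dirs[d][name] = ino
        return ino

    def link(self, src_dir, src_name, dst_dir, dst_name):
        """Give an existing file a second name (a hard link)."""
        ino = self.dirs[src_dir][src_name]
        self.dirs[dst_dir][dst_name] = ino
        self.inodes[ino]["refcount"] += 1

    def unlink(self, d, name):
        """Remove one name; free the i-node when no names remain."""
        ino = self.dirs[d].pop(name)
        self.inodes[ino]["refcount"] -= 1
        if self.inodes[ino]["refcount"] == 0:
            del self.inodes[ino]

fs = FileSystem()
fs.mkdir("foo")
fs.mkdir("bar")
ino = fs.create("foo", "notes", b"lecture notes")
fs.link("foo", "notes", "bar", "doc")
print(fs.dirs["bar"]["doc"] == ino, fs.inodes[ino]["refcount"])   # True 2
```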
I-node Allocation of File to Disk Blocks
Efficient access to the data blocks of small (direct pointers in the i-node), medium (single indirect block), large (double indirect blocks), and huge (triple indirect blocks) files.
Maximum file size, assuming 512-byte blocks and 4-byte pointers, so that each indirect block holds 512/4 = 128 pointers (one term per pointer level):
$(120 + 128 + 128^2 + 128^3) \times 512 \approx 1.08 \times 10^9$ bytes, i.e., about 1 GByte

Security in Centralized Systems
- What is security?
- Storing protection data.
- UNIX file protection.
- Authentication methods.
- Users, groups, and the superuser.
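The size arithmetic can be checked mechanically. This sketch takes the slide's parameters as given (512-byte blocks, 4-byte pointers, 120 direct pointers); `level_of` is a hypothetical helper that reports which pointer level reaches a given logical block.

```python
# Sketch of the i-node size arithmetic, using the slide's assumptions.
BLOCK = 512
PTRS = BLOCK // 4          # 128 pointers fit in one indirect block
DIRECT = 120               # direct pointers, as in the slide's formula

LEVELS = [
    ("direct", DIRECT),
    ("single indirect", PTRS),
    ("double indirect", PTRS ** 2),
    ("triple indirect", PTRS ** 3),
]

def max_file_size():
    """Total addressable blocks times the block size, in bytes."""
    return sum(count for _, count in LEVELS) * BLOCK

def level_of(k):
    """Which pointer level reaches logical block k (0-based)."""
    for name, count in LEVELS:
        if k < count:
            return name
        k -= count
    raise ValueError("beyond maximum file size")

print(max_file_size())                             # 1082257408
print(level_of(119), level_of(120), level_of(248))
```

Block 119 is the last one reachable directly, block 120 is the first behind the SIP, and block 248 (120 + 128) is the first that needs the DIP.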
What Is Security?
- Confidentiality: protecting information from being read or copied by unauthorized users.
- Data integrity: protecting information from being deleted or altered without permission.
- Availability: avoiding denial of service.
- Access control: controlling who has access to the system.
- Accountability: keeping track of unauthorized accesses on an audit trail.

Storing Protection Data
- Protection matrix
- Access control lists
- Capabilities
[Figure: protection matrix with users usr1 ... usr n, files file 1, file 2, ..., file m, and access rights as entries (e.g., rw, r, rwx, -)]
Access Control Lists and Capabilities
[Figure: protection matrix with users usr1 ... usr n, files file 1 ... file m, and access rights (rw, r, rwx, -) as entries]
- capabilities: for each user, the list of objects and the user's access rights to them.
- access control list: for each object, the list of users and their access rights.

UNIX Protection Model
[Figure: the same protection matrix]
- access control list: list of users and access rights per object.
UNIX implements a coarse-grained version of ACLs. Users are divided into three classes:
- owner
- group
- world
Protection bits (r, w, x) are associated with each class.
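ACLs and capabilities are two ways of storing the same sparse protection matrix: grouped by object or grouped by user. A minimal sketch; the user names, file names, and rights below are illustrative, not taken from the slides.

```python
# Hypothetical sketch: one protection matrix stored two ways.
# Missing entries mean "no access" (the "-" in the figure).
matrix = {
    ("usr1", "file1"): "rw",
    ("usr1", "file2"): "r",
    ("usr2", "file1"): "rwx",
    ("usr2", "file3"): "rw",
}

def acls(m):
    """Access control lists: for each object, the users and their rights."""
    out = {}
    for (user, obj), rights in m.items():
        out.setdefault(obj, {})[user] = rights
    return out

def capabilities(m):
    """Capability lists: for each user, the objects and their rights."""
    out = {}
    for (user, obj), rights in m.items():
        out.setdefault(user, {})[obj] = rights
    return out

def allowed(m, user, obj, right):
    """right is a single letter: 'r', 'w', or 'x'."""
    return right in m.get((user, obj), "")

print(acls(matrix)["file1"])          # {'usr1': 'rw', 'usr2': 'rwx'}
print(capabilities(matrix)["usr1"])   # {'file1': 'rw', 'file2': 'r'}
```

An ACL makes "who may touch this file?" cheap to answer; a capability list makes "what may this user touch?" cheap. UNIX's owner/group/world bits compress each file's ACL to three fixed entries.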
Protection Bits for Files
drwx--s---  2 menasce   512 Nov  4 13:49 grades/
-rw-rw-r--  1 menasce   684 Nov  4 13:48 project_ideas
-rw-------  1 menasce   509 Nov  4 13:48 student_mail
-rw-r--r--  1 menasce  3063 Nov  4 13:49 syllabus
In each mode string, the first character is the entry type (- file; d directory), followed by three characters each for the owner's rights, the group's rights, and others' rights.

Authentication Methods
- Something that you know: a password.
- Something that you have: a card key.
- Something that you are: a fingerprint.
- Combinations: card key and password; card key and weight.
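The mode strings in the Protection Bits listing can be decoded and checked as sketched below. This is a simplification under stated assumptions: it ignores the superuser (who bypasses these checks) and supplementary groups, and the uid/gid values in the example are hypothetical. As in UNIX, exactly one class applies: owner if the UIDs match, else group if the GIDs match, else others.

```python
# Sketch: decode an ls-style mode string and check access by class.

def parse_mode(mode):
    """Split e.g. '-rw-r--r--' into entry type and per-class rights."""
    if len(mode) != 10:
        raise ValueError("expected a 10-character mode string")
    kind = "directory" if mode[0] == "d" else "file" if mode[0] == "-" else mode[0]
    classes = {"owner": mode[1:4], "group": mode[4:7], "other": mode[7:10]}
    return kind, classes

def can(mode, right, uid, gid, file_uid, file_gid):
    """right is 'r', 'w', or 'x'; only the first matching class is consulted."""
    _, classes = parse_mode(mode)
    if uid == file_uid:
        bits = classes["owner"]
    elif gid == file_gid:
        bits = classes["group"]
    else:
        bits = classes["other"]
    pos = "rwx".index(right)
    # Lowercase 's' or 't' in the execute slot means set-ID/sticky plus execute.
    return bits[pos] == right or (right == "x" and bits[2] in "st")

print(parse_mode("-rw-r--r--"))
# ('file', {'owner': 'rw-', 'group': 'r--', 'other': 'r--'})
print(can("-rw-rw-r--", "w", uid=200, gid=40, file_uid=100, file_gid=40))   # True
```

For grades/ (drwx--s---), group members may search the directory but not list it, and everyone else has no access at all.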
Passwords
- Passwords are stored in password files (/etc/passwd in UNIX) in encrypted form (one-way encryption).
- Users should select hard-to-crack passwords: use combinations of lower- and upper-case letters, punctuation signs (!$#?;:), and digits.
  Good password: A$1c;:mE
  Bad password: sunshine
- To make a password easy to remember, base it on a phrase.
- Change passwords regularly.

Users, User IDs, and the Superuser
Every user in UNIX has a username and a user identifier (UID), which is a number. Common users in UNIX systems:
- root: the superuser; performs accounting and low-level functions.
- daemon: handles network aspects.
- agent: handles e-mail.
- guest: for visitors.
- ftp: for anonymous ftp.
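One-way password storage can be sketched as below. Note the swapped-in technique: this uses salted PBKDF2-HMAC-SHA256 from Python's standard library, not the crypt(3) scheme early UNIX systems used, and the iteration count is an illustrative value. The system stores only the salt and digest; to check a login it recomputes the one-way function and compares results.

```python
import hashlib
import hmac
import os

# Sketch of one-way password storage with PBKDF2 (not historical crypt(3)).
ITERATIONS = 100_000    # illustrative work factor

def hash_password(password, salt=None):
    """Return (salt, digest); only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)    # random salt: equal passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password, salt, stored):
    """Recompute the one-way function and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored)

salt, stored = hash_password("A$1c;:mE")
print(check_password("A$1c;:mE", salt, stored))    # True
print(check_password("sunshine", salt, stored))    # False
```

The salt and the deliberate slowness of the function are what make dictionary attacks against weak choices such as "sunshine" expensive, though they cannot make a weak password strong.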
Groups and Group Identifiers
- Every UNIX user belongs to one or more groups.
- Groups have a group name and a group ID (GID).
- Each user belongs to a primary group, whose GID is stored in the user's /etc/passwd entry.
- All groups are listed in the /etc/group file in UNIX.

Groups and Group Identifiers
[Figure: example groups: student group (gid 40), admin group (gid 0), users group (gid 104), and ftp group (gid 10), with users peter, root, john, mary, susan, jill, and ftp as members]
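A user's full set of groups combines the primary group from /etc/passwd with the memberships listed in /etc/group. A sketch using the standard field layouts ("name:password:UID:GID:GECOS:home:shell" and "name:password:GID:members"); the sample entries are hypothetical, loosely modeled on the figure.

```python
# Sketch: compute a user's groups from /etc/passwd- and /etc/group-format text.
# The entries below are hypothetical examples, not real system data.
PASSWD = """\
root:x:0:0:Superuser:/root:/bin/sh
peter:x:1001:40:Peter:/home/peter:/bin/sh
mary:x:1002:104:Mary:/home/mary:/bin/sh
"""
GROUP = """\
admin:x:0:root
ftp:x:10:
student:x:40:peter,john
users:x:104:mary,susan,jill,peter
"""

def parse(text):
    """Split each non-empty line into its colon-separated fields."""
    return [line.split(":") for line in text.splitlines() if line.strip()]

def groups_of(user):
    gid_to_name = {int(g[2]): g[0] for g in parse(GROUP)}
    names = set()
    for p in parse(PASSWD):
        if p[0] == user:
            names.add(gid_to_name[int(p[3])])   # primary group (4th passwd field)
    for g in parse(GROUP):
        if user in g[3].split(","):
            names.add(g[0])                     # memberships listed in /etc/group
    return sorted(names)

print(groups_of("peter"))   # ['student', 'users']
```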
The Superuser
- Every UNIX system has a special user with UID = 0, usually called root.
- root is used by the OS to accomplish its basic functions.
- root has access to all system resources!
- More than one user can be the superuser (they just need to have UID = 0).
- The superuser is the main security weakness in UNIX.

Distributed File Systems
File Service Interface:
- upload/download model
  [Figure: the client sends get file and put file requests to the server]
  - entire files are retrieved from the server and accessed at the client.
  - once the client is done, the file is stored back at the server.
  - typical of mass storage systems, e.g., Unitree.
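Because superuser status depends only on UID = 0, an administrator can audit for extra superuser accounts by scanning the password file. A sketch over /etc/passwd-format text; the sample entries, including the second superuser "toor", are hypothetical.

```python
# Sketch: find every account that is effectively the superuser (UID 0).
PASSWD = """\
root:x:0:0:Superuser:/root:/bin/sh
daemon:x:1:1:Daemon:/:/usr/sbin/nologin
toor:x:0:0:Second superuser:/root:/bin/sh
guest:x:1003:104:Guest:/home/guest:/bin/sh
"""

def superusers(passwd_text):
    accounts = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # The third field is the UID; any account with UID 0 is a superuser.
        if len(fields) >= 3 and fields[2].isdigit() and int(fields[2]) == 0:
            accounts.append(fields[0])
    return accounts

print(superusers(PASSWD))   # ['root', 'toor']
```

On a live UNIX machine, Python's pwd module gives the same answer from the account database: [p.pw_name for p in pwd.getpwall() if p.pw_uid == 0].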
Distributed File Systems
File Service Interface:
- remote access model
  [Figure: the client sends read block and write block requests to the server]
  - only the needed blocks of files are retrieved from the server.
  - once the client is done with a block, it is written back to the server.
  - example: NFS.

Distributed File Systems: Directory Service Interface
[Figure: file server 1 holds directory A (containing B and C); file server 2 holds directory D (containing E and F); client 1 and client 2 mount these directories at different points under their own roots, so the two clients see different name spaces]
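The two file service interfaces differ in how much data crosses the network. This toy sketch is hypothetical throughout (class name, 4-byte blocks, byte-string files); the server counts the blocks it ships to clients, so whole-file download can be compared with block-level remote access.

```python
# Hypothetical sketch contrasting the upload/download and remote access models.
BLOCK_SIZE = 4   # tiny blocks, for illustration only

class FileServer:
    def __init__(self):
        self.files = {}
        self.blocks_sent = 0    # blocks shipped from server to clients

    # Upload/download model: whole files move in both directions.
    def get_file(self, name):
        data = self.files[name]
        self.blocks_sent += -(-len(data) // BLOCK_SIZE)   # ceiling division
        return data

    def put_file(self, name, data):
        self.files[name] = data

    # Remote access model: only the requested block moves.
    def read_block(self, name, k):
        self.blocks_sent += 1
        return self.files[name][k * BLOCK_SIZE:(k + 1) * BLOCK_SIZE]

    def write_block(self, name, k, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("write exactly one block")
        f = self.files[name]
        self.files[name] = f[:k * BLOCK_SIZE] + data + f[(k + 1) * BLOCK_SIZE:]

server = FileServer()
server.put_file("report", b"0123456789abcdef")    # 4 blocks

data = bytearray(server.get_file("report"))       # download: 4 blocks sent
data[4:8] = b"WXYZ"                                # edit the local copy
server.put_file("report", bytes(data))            # upload the whole file back

print(server.read_block("report", 1))             # b'WXYZ'
print(server.blocks_sent)                         # 5
```

Changing one block cost a four-block download under the upload/download model, while the remote read fetched only the single block it needed.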
Distributed File Systems: Naming
- Location transparency: the path name does not reveal where the file is located.
  e.g., /servera/dir1/dir2/x does not say where servera is located.
- Location independence: files can be moved and all references to them remain valid.
  e.g., /servera/dir1/dir2/x is not location independent (moving x to another server changes its name).
Distributed File Systems: Two-Level Naming
- Symbolic names: human readable, e.g., /courses/slides/files.ps
- Binary names: machine-readable names, easier to manipulate, e.g., a UNIX i-node number, or server IP address:i-node number.
- The symbolic-to-binary name mapping may be one-to-many in a distributed system (file replication).

Semantics of File Sharing
UNIX semantics: used in centralized systems.
- a read that follows a write sees the value written by the write.
[Figure: a process writes x to block a at time t1; a later read of block a at t2 gets x]
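Two-level naming with replication can be sketched as a directory service that maps one symbolic name to several binary names of the form (server IP address, i-node number). The addresses (from the 192.0.2.0/24 documentation range), i-node numbers, and helper names are all made up for illustration. Moving a replica changes only the binary name, so the symbolic name remains valid.

```python
# Hypothetical sketch of two-level naming with replication.
from typing import Dict, List, Tuple

BinaryName = Tuple[str, int]   # (server IP address, i-node number)

names: Dict[str, List[BinaryName]] = {
    "/courses/slides/files.ps": [("192.0.2.1", 4711), ("192.0.2.2", 982)],
}

def resolve(path, unavailable=frozenset()):
    """Return the first replica whose server is reachable."""
    for server, inode in names[path]:
        if server not in unavailable:
            return server, inode
    raise FileNotFoundError(path)

def move_replica(path, old, new):
    """Relocate one copy; the symbolic name stays valid."""
    names[path] = [new if b == old else b for b in names[path]]

print(resolve("/courses/slides/files.ps"))                             # ('192.0.2.1', 4711)
print(resolve("/courses/slides/files.ps", unavailable={"192.0.2.1"}))  # ('192.0.2.2', 982)
```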
Semantics of File Sharing
UNIX semantics:
- a read that follows two writes in quick succession sees the result of the last write.
[Figure: writes to block a at t1 and t2, followed by a read of block a at t3, which gets the value written at t2]

Semantics of File Sharing: Issues in Distributed File Systems
Single file server:
- no client caching
- easy to implement UNIX semantics
Client file caching:
- improves performance by decreasing demand at the server
- updates to the cached file are not seen by other clients.
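Why client caching breaks UNIX semantics can be sketched with two toy clients; all class names are hypothetical. Without caching, every write goes straight to the single server and the next read anywhere sees it. With caching, the write stays in the writer's private cache and another client reads the old value from the server.

```python
# Hypothetical sketch: client caching vs. UNIX semantics.

class Server:
    def __init__(self):
        self.blocks = {}

class Client:
    def __init__(self, server, caching):
        self.server, self.caching, self.cache = server, caching, {}

    def write(self, block, value):
        if self.caching:
            self.cache[block] = value            # stays local for now
        else:
            self.server.blocks[block] = value    # visible to everyone at once

    def read(self, block):
        if self.caching and block in self.cache:
            return self.cache[block]
        value = self.server.blocks.get(block)
        if self.caching:
            self.cache[block] = value
        return value

for caching in (False, True):
    server = Server()
    a, b = Client(server, caching), Client(server, caching)
    a.write("a", "x")
    print("caching" if caching else "no caching", "->", b.read("a"))
# no caching -> x     (the read sees the write)
# caching -> None     (b does not see a's cached update)
```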
Semantics of File Sharing
Session semantics (relaxed semantics):
- changes to an open file are initially visible only to the process that modified the file.
- when the file is closed, the changes become visible to other processes; the closed file is sent back to the server.

Semantics of File Sharing
Session semantics:
- what if two or more clients are caching and modifying a file?
  - the final result depends on who closes last, or
  - an arbitrary rule decides who wins.
- file pointer sharing is not possible when a process and its children run on different machines.
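Session semantics and the "who closes last" outcome can be sketched as follows; the class names are hypothetical. Opening a file takes a private copy, writes touch only that copy, and closing ships the whole copy back to the server.

```python
# Hypothetical sketch of session semantics: last close wins.

class Server:
    def __init__(self):
        self.files = {}

class Session:
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.copy = server.files.get(name, "")     # snapshot taken at open

    def write(self, text):
        self.copy = text                           # invisible to other clients

    def close(self):
        self.server.files[self.name] = self.copy   # changes published on close

server = Server()
server.files["f"] = "v0"
s1, s2 = Session(server, "f"), Session(server, "f")
s1.write("from client 1")
s2.write("from client 2")
print(Session(server, "f").copy)   # v0 (nothing closed yet)
s2.close()
s1.close()
print(server.files["f"])           # from client 1 (client 1 closed last)
```

Client 2's changes are silently lost; this is the situation the slide resolves either by "who closes last" or by some other arbitrary rule.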
Semantics of File Sharing
No-file-updates semantics:
- files are never updated.
- allowed file operations: CREATE and READ.
- files are atomically replaced in the directory.
- problem: what if two clients want to replace a file at the same time? Take the last one, or use any nondeterministic rule.

Semantics of File Sharing
Transaction semantics:
- all file changes are delimited by Begin Transaction and End Transaction.
- all file requests within the transaction are carried out in order.
- the transaction is either carried out completely or not at all (atomicity).
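Both ideas can be sketched locally. For no-updates semantics, this sketch assumes a POSIX-like file system, where os.replace() swaps a directory entry atomically: a brand-new file is created and then substituted for the old name, so readers see either the old version or the new one, never a mix. The Transaction class is a hypothetical single-process illustration that buffers writes and applies all or none of them at End Transaction; it is not a real DFS mechanism.

```python
import os
import tempfile

def replace_file(path, data):
    """No-updates style: CREATE a new file, then atomically swap it in."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)    # brand-new file, same directory
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)                    # atomic swap of the directory entry
    except BaseException:
        os.unlink(tmp)                           # clean up the unused new version
        raise

class Transaction:
    """Hypothetical: buffer writes between begin and end; apply all or none."""
    def __init__(self, store):
        self.store, self.pending = store, {}

    def write(self, name, data):
        self.pending[name] = data                # requests kept in order

    def end(self, commit=True):
        if commit:
            self.store.update(self.pending)      # all changes together
        self.pending.clear()                     # on abort: none of them

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "syllabus")
    replace_file(p, "version 1")
    replace_file(p, "version 2")
    with open(p) as f:
        print(f.read())                          # version 2

store = {"a": "1", "b": "2"}
t = Transaction(store)
t.write("a", "10")
t.write("b", "20")
t.end(commit=False)
print(store)                                     # {'a': '1', 'b': '2'}
```

If two clients call replace_file on the same name at once, both renames succeed and the last one wins, which matches the "take the last one" rule on the slide.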
Semantics of File Sharing
- UNIX semantics: every operation is instantly visible to others.
- Session semantics: no changes are visible until the file is closed.
- No-updates semantics: no file updates are allowed.
- Transactions: atomic updates.