CSC 2233: Topics in Computer System Performance and Reliability: Storage Systems
Note: some of the slides in today's lecture are borrowed from a course taught by Greg Ganger and Garth Gibson at Carnegie Mellon University
Who am I?
What makes storage systems so cool?
1. Combines so many topic areas: hardware meets OS meets networking meets distributed systems meets security meets AI meets HCI
What makes storage systems so cool?
1. Combines so many topic areas
2. This is where great jobs are!
- Designers and implementers still needed, not just testing
- Continuing growth area for the future: the Internet is a network, but the web is a storage system
- Strong existing companies: EMC, NetApp, ...
- Core competency for Internet services: Google, Microsoft, Amazon; and still support for start-ups
What makes storage systems so cool?
1. Combines so many topic areas
2. Great careers
3. Still so much room to contribute:
- Performance actually matters here; in fact, it dominates other parts of system performance in many cases
- And reliability too
- Storage management is wide open
- And storage is starting to take over computation: Big Data
- Lots is and will be happening: solid state drives and other technologies?
Amdahl's Law
Speedup limited to fraction improved
- An obvious, but fundamental, observation
- Example (two components of 50 time units each): a 90% reduction in one (50 down to 5) yields only a 45% reduction in total (100 down to 55)
What does this mean for storage systems?
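In equation form (the standard statement of the law, with the slide's numbers plugged in; f is the fraction of time improved, s the speedup of that fraction):

```latex
\[
  \text{Speedup}_{\text{overall}} = \frac{1}{(1 - f) + f/s}
\]
% Slide example: f = 0.5, s = 10 (the 90% reduction of one half):
% Speedup = 1 / (0.5 + 0.5/10) = 1/0.55 \approx 1.82,
% i.e. total time falls from 100 to 55 -- only a 45% overall reduction.
```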
Technology Trends
[Figure: values normalized to 2000, log scale from 1 to 100, plotted over 2000-2010: CPU performance, network bandwidth, memory bandwidth, disk bandwidth, network latency, disk latency]
Consequence: storage performance dominates
[Figure: two bar charts splitting total time (0-100) into CPU time vs. I/O time across five systems; as CPU time shrinks, I/O time comes to dominate the total]
Example of Amdahl's Law
Storage systems: fun quotes
"I/O certainly has been lagging in the last decade" (Seymour Cray, 1976)
"Also, I/O needs a lot of work" (David Kuck, 1988)
"In 3 to 5 years, we will start seeing servers as peripherals to storage" (SUN Chief Technology Officer, 1998)
"Scalable I/O is perhaps the most overlooked area of high-performance computing R&D" (suggested R&D topic, report for 2005-2009)
Logistics & Administrivia
Class time: Thu 10am-12pm
Office hours: by appointment
Class web page: www.cs.toronto.edu/~bianca/csc2233.html
Grading
30% class participation
- Participation in class discussions (read all papers prior to class)
- Class presentation of a research paper
70% class project
No exams, no homework, no paper summaries
Class project
Can be done in a team of two or alone
- Start looking for a partner now!
On a research project you pick
- I will suggest possible projects (see course web page)
- You can propose your own
- Start thinking about it soon; proposal due in ~3 weeks
Output: workshop-quality research paper (10-12 pages)
- Even better: conference-quality paper
- Use the LaTeX template on the course web page
- All reports will be published as tech reports
Class project
Output: workshop-quality research paper (10-12 pages)
I will help you get there with multiple milestones:
- Project proposal
- Related work
- Status reports
- Final report
- And meetings with the instructor
Topic of class project
Project topic must be related to the topic of the class
Is it OK to have overlap with my research / my course project in another course?
- You cannot get academic credit for the same piece of work twice
Paper presentation
Each of you will present one or two papers in class
Format of the presentation:
- 30 min presentation of the paper
- 5-15 min paper review: good points, bad points
- 10 min class discussion that you lead! Prepare questions!
Paper presentation
What I do not want:
- A long laundry list of all the things the paper did
What I do want:
- A lecture-style presentation of the paper, including background material your fellow classmates might need to understand the paper
- A critical discussion of the paper: strengths & weaknesses
- Prepare questions!
Purpose of presentation
Wrong answers:
- To give a verbal version of the paper, cramming all its content into 30 min
- To impress people with your technical depth and thoroughness
In fact, no one cares about these things
The goal is to filter out the main points of the paper and present them well
- By the end, everybody in the audience should remember 2-3 take-home messages
What's on each slide?
Each slide should have one basic point
There should NOT be tons of text
- Use sentence fragments
Use pictures everywhere you possibly can!
- A picture says more than 1000 words
- Saves text and thus slides
- Much easier to process
Rest of today: Some review
What are storage systems all about?
Memory/storage hierarchy
Memory/storage hierarchies
Balancing performance with cost
- Small memories are fast but expensive
- Large memories are slow but cheap
Exploit locality to get the best of both worlds
- Locality = re-use/nearness of accesses
- Allows most accesses to use small, fast memory
[Figure: hierarchy pyramid, capacity growing toward the bottom, performance toward the top]
Example memory hierarchy values
[Table of per-level access times/capacities not reproduced]
Notice the huge access time gap between DRAM and disk
Where will SSDs go?
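As a rough, order-of-magnitude guide (illustrative figures, not the slide's exact numbers):
- CPU register / L1 cache: ~1 ns
- DRAM: ~100 ns
- Flash SSD: ~100 us
- Hard disk: ~5-10 ms
The DRAM-to-disk gap is roughly five orders of magnitude; SSDs land squarely in the middle of it.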
What are storage systems all about?
Memory/storage hierarchy
- Combining many technologies to balance costs/benefits
- No longer the focal point of storage system design
- Still important though, maybe more so with new technologies arriving on the market
What are storage systems all about?
Memory/storage hierarchy
- Combining many technologies to balance costs/benefits
- No longer the focal point of storage system design
- Still important though, maybe more so with new technologies arriving on the market
Persistence
- Storing data for lengthy periods of time
- To be useful, it must also be possible to find it again later; this brings in data organization, consistency, and management issues
- This is where the serious action is, and it does relate to the memory/storage hierarchy
Why persistence is important
Some statistics:
- Among companies that lose data in a disaster, 50% never re-open and 90% are out of business within two years
- Even smaller incidents can be costly: reproducing some tens of megabytes of accounting data can take several weeks and cost tens of thousands of dollars
- Bad PR!
What is a storage system: Big Picture
The application gives data objects & their IDs to storage; the storage system keeps the data objects and returns one upon request (by ID)
[Figure: Application exchanging objects Bob1-Bob4 with the Storage System, which holds Bob1, Bob2, Bob3, Bob4]
Storage Systems & Interfaces
What is a Storage System?
- Hardware (devices, controllers, interconnect) and software (file system, device drivers, firmware) dedicated to providing management of and access to persistent storage
One view: defined by a collection of interfaces
Storage Software Interfaces
[Figure: the storage software stack, from high level of abstraction down to no abstraction]
- Program (high level of abstraction)
- File system (understands files and directories)
- Device driver
- I/O controller
- Physical media (no abstraction; the HDD understands platters, cylinders, tracks, sectors)
OS sees storage as a linear array of blocks
[Figure: OS's view of the storage device as numbered blocks, e.g. 5, 6, 7, 12, 23]
- Common disk block size: 512 bytes
- Number of blocks: device capacity / block size
- Common OS-to-storage requests defined by a few fields: R/W, block #, # of blocks, memory source/dest (see the sketch below)
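A minimal sketch of such a request as a C struct (field names are illustrative, not from any real driver interface):

```c
#include <stdint.h>

/* Hypothetical block I/O request: one struct per OS-to-device command. */
enum io_op { IO_READ, IO_WRITE };

struct blk_request {
    enum io_op op;          /* R/W */
    uint64_t   block_num;   /* starting logical block number (LBN) */
    uint32_t   num_blocks;  /* length of the transfer, in blocks */
    void      *mem_buf;     /* memory source (write) or destination (read) */
};
```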
OS sees storage as a linear array of blocks
How does the OS implement the abstraction of files and directories on top of this logical array of disk blocks?
File System Implementation
File systems define a block size (e.g., 4KB)
- Disk space is allocated in granularity of blocks
- Notice the terminology clash here: "block" is used for different things by the file system and by the disk interface, and this kind of thing is common in storage systems!!
[Figure: default usage of LBN space: superblock, bitmap, then space to store files and directories]
File System Implementation
File systems define a block size (e.g., 4KB)
- Disk space is allocated in granularity of blocks
A master block (aka superblock) determines the location of the root directory
- Always at a well-known disk location
- Often replicated across the disk for reliability
A free map determines which blocks are free or allocated
- Usually a bitmap, one bit per block on the disk
- Also stored on disk, cached in memory for performance
Remaining disk blocks are used to store files (and dirs); there are many ways to do this
[Figure: default usage of LBN space: superblock, bitmap, then space to store files and directories]
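A minimal sketch of the free-map idea in C (illustrative, not any real file system's code): one bit per block; scan for a zero bit and set it to allocate.

```c
#include <stdint.h>

#define FS_NBLOCKS 8192                   /* illustrative volume size */
static uint8_t free_map[FS_NBLOCKS / 8];  /* one bit per disk block; 0 = free */

/* Find a free block, mark it allocated, return its number (-1 if full). */
static int alloc_block(void)
{
    for (int b = 0; b < FS_NBLOCKS; b++) {
        if (!(free_map[b / 8] & (1 << (b % 8)))) {
            free_map[b / 8] |= (uint8_t)(1 << (b % 8));
            return b;   /* a real FS would also write the bitmap to disk */
        }
    }
    return -1;
}
```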
Disk Layout Strategies
Files span multiple blocks
How do you allocate the blocks for a file?
1. Contiguous allocation
Contiguous Allocation
[Figure: disk blocks 0-29, with each file occupying one contiguous run of blocks]
Directory:
File Name | Start Blk | Length
File A    |    2      |   3
File B    |    9      |   5
File C    |   18      |   8
File D    |   27      |   2
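A sketch in C of why contiguous allocation is fast (names are illustrative): mapping a file offset to a disk block is pure arithmetic, with no extra disk reads.

```c
/* Contiguous allocation: the directory entry is (start_blk, length). */
long contig_block_for_offset(long start_blk, long length,
                             long block_size, long file_offset)
{
    long rel = file_offset / block_size;   /* block index within the file */
    if (rel >= length)
        return -1;                         /* offset past end of file */
    return start_blk + rel;                /* e.g. File B: block 9 + rel */
}
```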
Disk Layout Strategies
Files span multiple disk blocks
How do you find all of the blocks for a file?
1. Contiguous allocation
- Like memory
- Fast, simplifies directory access
- Inflexible, causes fragmentation, needs compaction
2. Linked, or chained, structure
Linked Allocation
[Figure: disk blocks 0-29, with File B's blocks chained together by per-block pointers]
Directory:
File Name | Start Blk | Last Blk
File B    |    1      |   22
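Random access shows the cost of chaining: reaching block n of a file takes n pointer hops. A sketch in C, assuming the per-block next pointers have been gathered into an in-memory table (as a FAT does):

```c
/* next[b] holds the block that follows block b in its file; -1 = end. */
int nth_block(const int *next, int start_blk, int n)
{
    int b = start_blk;
    while (n-- > 0 && b != -1)
        b = next[b];    /* one hop (and, on disk, one read) per block */
    return b;           /* -1 if the file has fewer than n+1 blocks */
}
```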
Disk Layout Strategies
Files span multiple disk blocks
How do you find all of the blocks for a file?
1. Contiguous allocation
- Like memory
- Fast, simplifies directory access
- Inflexible, causes fragmentation, needs compaction
2. Linked, or chained, structure
- Each block points to the next; the directory points to the first
- Good for sequential access, bad for all others
3. Indexed structure (indirection, hierarchy)
- An index block contains pointers to many other blocks
- Handles random access better, still good for sequential
- May need multiple index blocks (linked together)
Indexed Allocation: Unix Inodes
Unix inodes implement an indexed structure for files
- Each file is represented by an inode
- Each inode contains 15 block pointers
- First 12 are direct block pointers (e.g., to 4 KB data blocks)
- Then single, double, and triple indirect pointers
[Figure: inode with direct pointers 0-11 and indirect pointers 12-14]
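A worked example of how far this structure reaches, assuming 4 KB blocks and 4-byte block pointers (so 1024 pointers per indirect block; exact sizes vary across Unix variants):

```c
#include <stdio.h>

int main(void)
{
    long long blk  = 4096;     /* assumed block size: 4 KB */
    long long ptrs = blk / 4;  /* assumed 4-byte pointers: 1024 per indirect block */

    long long blocks = 12                  /* direct */
                     + ptrs                /* single indirect */
                     + ptrs * ptrs         /* double indirect */
                     + ptrs * ptrs * ptrs; /* triple indirect */

    /* Prints ~4 TB: 15 pointers in the inode address over a billion blocks. */
    printf("max file size = %lld bytes (~%lld GB)\n",
           blocks * blk, (blocks * blk) >> 30);
    return 0;
}
```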
Unix Inodes and Path Search
Unix inodes are not directories
- They describe where on the disk the blocks for a file are placed
- Directories are files, so inodes also describe where the blocks for directories are placed on the disk
Directory entries map file names to inodes
To open /one:
- Use the master block to find the inode for / on disk, and read that inode into memory
- The inode allows us to find the data block for directory /
- Read /, look for the entry for one
- This entry locates the inode for one
- Read the inode for one into memory
- The inode says where the first data block is on disk
- Read that block into memory to access the data in the file
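A toy, in-memory version of the central step (a directory's data block maps names to inode numbers); everything here is illustrative, with the disk reads replaced by array lookups:

```c
#include <stdio.h>
#include <string.h>

struct dirent { char name[12]; int ino; };    /* directory entry */
struct inode  { int data_blk; };              /* one data block per file */

static struct inode  inodes[4]    = { [0] = { .data_blk = 0 } }; /* inode 0 = "/" */
static struct dirent blocks[4][4] = { [0] = { { "one", 1 } } };  /* "/" contents */

/* Look up a name in a directory: inode -> data block -> matching entry. */
static int lookup(int dir_ino, const char *name)
{
    struct dirent *d = blocks[inodes[dir_ino].data_blk];
    for (int i = 0; i < 4; i++)
        if (strcmp(d[i].name, name) == 0)
            return d[i].ino;
    return -1;
}

int main(void)
{
    printf("inode for /one = %d\n", lookup(0, "one"));   /* prints 1 */
    return 0;
}
```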
Data and Inode Placement
Original Unix FS had two placement problems:
1. Data blocks allocated randomly in aging file systems
- Blocks for the same file allocated sequentially when the FS is new
- As the FS ages and fills, need to allocate into blocks freed up when other files are deleted
- Problem: deleted files are essentially randomly placed, so blocks for new files become scattered across the disk
2. Inodes allocated far from blocks
- All inodes at the beginning of the disk, far from data
- Traversing file name paths and manipulating files and directories requires going back and forth from inodes to data blocks
Both of these problems generate many long seeks
[Figure: default usage of LBN space: superblock, bitmap, inodes, data blocks]
Cylinder Groups
BSD Fast File System (FFS) addressed the placement problems using the notion of a cylinder group (aka allocation groups in lots of modern FSs)
- Disk partitioned into groups of cylinders
- Data blocks in the same file allocated in the same cylinder group
- Files in the same directory allocated in the same cylinder group
- Inodes for files allocated in the same cylinder group as the file's data blocks
[Figure: cylinder group organization, with a superblock copy per group]
More FFS solutions
Small blocks (1K) in the original Unix FS caused two problems:
- Low bandwidth utilization
- Small max file size (a function of block size)
=> Fixed by using a larger block size (4K)
Problem: media failures
- Replicate the master block (superblock)
Problem: device oblivious
- Parameterize according to device characteristics
File Buffer Cache
Applications exhibit significant locality for reading and writing files
Idea: cache file blocks in memory to capture locality
- This is called the file buffer cache
- Cache is system wide, used and shared by all processes
- Reading from the cache makes a disk perform like memory
- Even a 4 MB cache can be very effective
Issues:
- The file buffer cache competes with VM (tradeoff here)
- Like VM, it has limited size
- Need replacement algorithms
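A minimal sketch of the lookup path in C (illustrative; a real buffer cache uses proper collision handling and LRU replacement, and disk_read stands in for a device driver call). The static array below happens to be about 4 MB:

```c
#define CACHE_SLOTS 1024
#define BLK 4096

struct buf { long blkno; int valid; char data[BLK]; };
static struct buf cache[CACHE_SLOTS];          /* ~4 MB of cached blocks */

extern void disk_read(long blkno, char *dst);  /* assumed device routine */

/* Return a pointer to block blkno's data, reading from disk on a miss. */
char *bread(long blkno)
{
    struct buf *b = &cache[blkno % CACHE_SLOTS];
    if (!(b->valid && b->blkno == blkno)) {    /* miss: go to disk */
        disk_read(blkno, b->data);
        b->blkno = blkno;
        b->valid = 1;
    }
    return b->data;                            /* hit: memory speed */
}
```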
Read Ahead
Many file systems implement read ahead
- FS predicts that the process will request the next block
- FS goes ahead and requests it from the disk
- This can happen while the process is computing on the previous block: overlap I/O with execution
- When the process requests the block, it will already be in the cache
- Complements the on-disk cache, which also does read ahead
For sequentially accessed files this can be a big win
- Unless blocks for the file are scattered across the disk
- File systems try to prevent that, though (during allocation)
Caching Writes
On a write, some applications assume that data makes it through the buffer cache and onto the disk
- As a result, writes are often slow even with caching
Several ways to compensate for this:
- Write-behind: maintain a queue of uncommitted blocks and periodically flush the queue to disk; unreliable
- Battery-backed RAM (NVRAM): as with write-behind, but maintain the queue in NVRAM; expensive
- Log-structured file system: always write contiguously at the end of the previous write
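A sketch of write-behind in C (illustrative; disk_write stands in for a device driver call, and a real kernel flushes from a daemon every few seconds):

```c
#define QMAX 128

extern void disk_write(long blkno);   /* assumed device routine */

static long dirty[QMAX];   /* queue of uncommitted block numbers */
static int  ndirty;        /* lost on a crash: hence "unreliable" */

void write_behind(long blkno)
{
    if (ndirty < QMAX)
        dirty[ndirty++] = blkno;   /* fast path: no disk I/O at all */
}

void flush_queue(void)             /* called periodically */
{
    for (int i = 0; i < ndirty; i++)
        disk_write(dirty[i]);
    ndirty = 0;
}
```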
Remainder of the course
- Other optimizations
- Other file system designs: log-structured, journaling
- Devices: hard disks & solid state drives
- Reliability & fault tolerance
- Performance modeling
- Distributed file systems: Google & NetApp
- Parallel file systems: GPFS & PanFS
- Storage for data-intensive computing