EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors
1 EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors
Junfeng Yang
Joint work with Can Sar, Paul Twohey, Ben Pfaff, Dawson Engler and Madan Musuvathi

2 Mom, Google Ate My GMail!

4 Why check storage systems?
- Storage system errors are some of the most serious:
  - machine crash
  - data loss
  - data corruption
- Code is complicated, hard to get right
  - Conflicting goals: speed, reliability (recover from any failures and crashes)
- Typical ways to find these errors are ineffective
  - Manual inspection: strenuous, erratic
  - Randomized testing (e.g. unplug the power cord): blindly throwing darts
  - Error reports from mad users

5 Goal: build tools to automatically find storage system errors
- Sub-goal: comprehensive, lightweight, general

6 EXPLODE [OSDI06]
- Comprehensive: adapt ideas from model checking
- General, real: check live systems
  - Can run (on Linux, BSD), can check, even w/o source code
- Fast, easy
  - Check a new storage system: 200 lines of C++ code
  - Port to a new OS: 1 kernel module + optional modification
- Effective
  - 17 storage systems: 10 Linux FS, Linux NFS, Soft-RAID, 3 version control, Berkeley DB, VMware
  - Found serious data-loss bugs in all
- Subsumes FiSC [OSDI04, best paper]

7 Outline
- Overview
- Checking process
- Implementation
- Example check: crashes during recovery are recoverable
- Results

8 Long-lived bug fixed in 2 days in the IBM Journaling file system (JFS)
- Serious
  - Loss of an entire FS!
- Fixed in 2 days with our complete trace
- Hard to find
  - 3 years old, ever since the first version
- Dave Kleikamp (IBM JFS): "I really appreciate your work finding and recreating this bug. I'm sure this has bitten us before, but it's usually hard to go back and find out what causes the file system to get messed up so bad."

9 Events to trigger the JFS bug
- creat("/f");  (5-char system call, not a typo)
- flush /f
- crash!
- fsck.jfs  (file system recovery utility, run after reboot)
[Diagram: contents of the Buffer Cache (in mem) and of the Disk]
- Orphan file removed. Legal behavior for file systems

10 Events to trigger the JFS bug
- creat("/f");
- bug under low mem (design flaw): flush /
- crash!
- fsck.jfs  (file system recovery utility, run after reboot)
[Diagram: contents of the Buffer Cache (in mem) and of the Disk; on disk, / points to an f that was never written]
- Dangling pointer! fsck.jfs fixes it by zeroing, entire FS gone!

11 Overview
[Diagram: the User-written Checker and the EXPLODE Runtime sit above the Linux Kernel (containing JFS and the EKM) and the Hardware. After creat("/f"), each generated crash-disk is recovered with fsck.jfs and then passed to check(). Legend: User code, Our code. EKM = EXPLODE kernel module]

    void mutate() {
        creat("/old");
        sync();
        creat("/f");
        check_crash_now();
    }

    void check() {
        int fd = open("/old", O_RDONLY);
        if (fd < 0)
            error("lost old!");
        close(fd);
    }

- Toy checker: a crash after creat("/f") should not lose any old file that is already persistent on disk
- User-written checker: can be either very sophisticated or very simple

12 Outline
- Overview
- Checking process
- Implementation
- Example check: crashes during recovery are recoverable
- Results

13 One core idea from model checking: explore all choices
- Bugs are often triggered by corner cases
- How to find? Drive execution down to these tricky corner cases
- Principle: when execution reaches a point in the program that can do one of N different actions, fork execution and in the first child do the first action, in the second do the second, etc.
- Result: rare events appear as often as common ones

14 Crashes (Overview slide revisit)
[Diagram repeated from the Overview slide: User-written Checker, EXPLODE Runtime, JFS, EKM, Linux Kernel, Hardware; each crash-disk is recovered with fsck.jfs and then passed to check()]

15 External choices
- Fork and do every possible operation
- Explore generated states as well
- Users write code to check that the FS is valid. EXPLODE amplifies
[Diagram: tree of states with edges labeled a, b, c]

16 Internal choices
- Fork and explore all internal choices

    struct block* read_block(int i) {
        struct block *b;
        if ((b = cache_lookup(i)))
            return b;
        return disk_read(i);
    }

[Diagram: tree of states with edges labeled a, b]

17 Users expose choices using choose(n)
- To explore an N-choice point, users instrument code using choose(n) (also used in other model checkers)
- choose(n): N-way fork, return K in the K-th kid

    struct block* read_block(int i) {
        struct block *b;
        if ((b = cache_lookup(i)))
            return b;
        return disk_read(i);
    }

    cache_lookup(int i) {
        if (choose(2) == 0)
            return NULL;
        // normal lookup
    }

- Optional. Instrumented only 7 places in Linux

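The choose(n) idea can be sketched in user space without fork(). The block below is only an illustration under stated assumptions: it swaps the N-way fork for re-execution with a recorded prefix of decisions, it requires the explored body to be deterministic, and the names Explorer, explore and count_read_block_paths are hypothetical, not part of EXPLODE.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical explorer: enumerates every path through a function that
// calls choose(n) by re-running it with a recorded prefix of decisions.
class Explorer {
public:
    // Returns a value in [0, n). Recorded decisions are replayed first;
    // a new choice point starts at 0 and remembers its arity.
    int choose(int n) {
        if (pos_ < trail_.size()) return trail_[pos_++].taken;
        trail_.push_back({0, n});
        ++pos_;
        return 0;
    }

    // Runs `body` once per distinct sequence of choose() results and
    // returns the number of paths explored.
    int explore(const std::function<void(Explorer&)>& body) {
        int paths = 0;
        trail_.clear();
        do {
            pos_ = 0;
            body(*this);
            ++paths;
        } while (advance());
        return paths;
    }

private:
    struct Choice { int taken; int arity; };

    // Moves to the next unexplored path: drop exhausted trailing choices,
    // then bump the deepest choice that still has an alternative left.
    bool advance() {
        while (!trail_.empty() && trail_.back().taken + 1 == trail_.back().arity)
            trail_.pop_back();
        if (trail_.empty()) return false;
        ++trail_.back().taken;
        return true;
    }

    std::vector<Choice> trail_;
    std::size_t pos_ = 0;
};

// Toy stand-in for the slide's cache_lookup(): choose(2) == 0 forces a
// cache miss, so both the miss path and the hit path get explored.
inline int count_read_block_paths() {
    int hits = 0, misses = 0;
    Explorer ex;
    ex.explore([&](Explorer& e) {
        if (e.choose(2) == 0) ++misses; else ++hits;
    });
    return hits * 10 + misses;  // 11 means one hit path and one miss path
}
```

A body that calls choose(2) and then choose(3) is run six times, once per combination, which is the "rare events appear as often as common ones" effect in miniature.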
18 Crash × External × Internal
[Diagram: tree of states with edges labeled a, b, c]

19 Speed: skip same states
- Abstract and hash a state, discard if seen.
[Diagram: tree of states with edges labeled a, b, c; already-seen states are not explored again]

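A minimal sketch of "abstract and hash a state, discard if seen", assuming a toy state made of file paths and contents. FsState, FileInfo, abstract_state and SeenStates are hypothetical names for illustration; the abstraction shown (drop mtime, keep path and data) is just one possible choice, not the one EXPLODE uses.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_set>

// Hypothetical concrete state: per-file contents plus a detail (mtime)
// that the abstraction deliberately ignores.
struct FileInfo {
    std::string data;
    long mtime;
};
using FsState = std::map<std::string, FileInfo>;

// Abstraction: keep only path and contents, in sorted path order, so two
// states that differ only in mtime map to the same string.
inline std::string abstract_state(const FsState& s) {
    std::string out;
    for (const auto& kv : s) {
        out += kv.first;
        out += '\0';
        out += kv.second.data;
        out += '\0';
    }
    return out;
}

class SeenStates {
public:
    // Returns true the first time an abstract state is offered and false
    // afterwards. Only the hash is stored, so a hash collision would
    // wrongly discard a new state; that is the usual space/precision trade.
    bool insert_if_new(const FsState& s) {
        std::size_t h = std::hash<std::string>{}(abstract_state(s));
        return seen_.insert(h).second;
    }

private:
    std::unordered_set<std::size_t> seen_;
};
```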
20 Outline
- Overview
- Checking process
- Implementation
  - FiSC, File System Checker, [OSDI04], best paper
  - EXPLODE, storage system checker, [OSDI06]
- Example check: crashes during recovery are recoverable
- Results

21 Checking process

    S0 = checkpoint()
    enqueue(S0)
    while (queue not empty) {
        S = dequeue()
        for each action in S {
            restore(S)
            do action
            S' = checkpoint()
            if (S' is new)
                enqueue(S')
        }
    }

- How to checkpoint and restore a live OS?

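The loop above can be written as a small runnable sketch. The assumptions are loud ones: the "system" is a toy whose state is a single int, each action is a pure function from state to state, and explore_all is a hypothetical name. Because the toy state is a value, restore(S) and checkpoint() collapse into passing and returning that value.

```cpp
#include <deque>
#include <functional>
#include <set>
#include <vector>

// Toy stand-in for the system under test: the state is just an int and
// each action maps a state to its successor.
using State = int;
using Action = std::function<State(State)>;

// Breadth-first exploration: returns every distinct state reachable from
// `initial` by applying `actions` in any order.
inline std::set<State> explore_all(State initial, const std::vector<Action>& actions) {
    std::set<State> seen{initial};
    std::deque<State> queue{initial};
    while (!queue.empty()) {
        State s = queue.front();              // S = dequeue()
        queue.pop_front();
        for (const Action& act : actions) {
            State next = act(s);              // restore(S); do action; S' = checkpoint()
            if (seen.insert(next).second)     // if (S' is new)
                queue.push_back(next);        //     enqueue(S')
        }
    }
    return seen;
}
```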
22 FiSC: jam OS into tool
[Diagram: User-written Checker and JFS run inside User Mode Linux, which runs inside FiSC, above the Linux Kernel and Hardware. Legend: User code, Our code]
- Pros
  - Comprehensive, effective
  - No model, check code
  - Checkpoint and restore: easy
- Cons
  - Intrusive. Build fake environment. Hard to check anything new. Months for new OS, 1 week for new FS
  - Many tricks, so complicated that we won best paper OSDI 04

23 EXPLODE: jam tool into OS
[Diagram, side by side. FiSC: User-written Checker, JFS, User Mode Linux, FiSC, Linux Kernel, Hardware. EXPLODE: User-written Checker, EXPLODE Runtime, JFS, EKM, Linux Kernel, Hardware. Legend: User code, Our code]
- EKM = EXPLODE kernel module

24 EKM lines of code

    OS            Lines of code
    Linux 2.6     1,915
    FreeBSD 6.0   1,210

- EXPLODE kernel modules (EKM) are small and easy to write

25 How to checkpoint and restore a live OS kernel?
[Diagram: Checker, EXPLODE Runtime, JFS, EKM, Linux Kernel, Hardware]
- Hard to checkpoint live kernel memory
- Virtual machine? No
  - VMware: no source
  - Xen: not portable
  - heavyweight
- There's a better solution for storage systems

26 Checkpoint: save actions instead of bits
- state = list of actions
  - checkpoint S = save (creat, cache miss)
  - restore = re-initialize, creat, cache miss
  - re-initialize = unmount, mkfs
    - unmount: utility that clears the in-mem state of a storage system
    - mkfs: utility to create an empty FS
- Redo-to-restore-state idea used in model checking to trade time for space (stateless search).
- We use it only to reduce intrusiveness
[Diagram: S0 --creat--> S]

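A minimal sketch of "state = list of actions", assuming a toy in-memory file system in place of the real one. ToyFs, Op and ReplayChecker are hypothetical names; clearing the map stands in for the slide's re-initialize step (unmount, mkfs), and the replay only works because the toy operations are deterministic.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory "file system" standing in for the real one.
using ToyFs = std::map<std::string, std::string>;

struct Op {
    enum Kind { Creat, Write } kind;
    std::string path;
    std::string data;  // used by Write only
};

class ReplayChecker {
public:
    // Performs an action and appends it to the history.
    void apply(const Op& op) {
        run(op);
        history_.push_back(op);
    }

    // checkpoint = the list of actions taken so far, not a memory image.
    std::vector<Op> checkpoint() const { return history_; }

    // restore = re-initialize (stand-in for unmount + mkfs), then redo
    // every saved action in order.
    void restore(std::vector<Op> saved) {
        fs_.clear();
        history_.clear();
        for (const Op& op : saved) apply(op);
    }

    const ToyFs& fs() const { return fs_; }

private:
    void run(const Op& op) {
        if (op.kind == Op::Creat) fs_[op.path] = "";
        else fs_[op.path] += op.data;
    }

    ToyFs fs_;
    std::vector<Op> history_;
};
```

The checkpoint costs a few bytes per action instead of a copy of kernel memory; the price is the time spent redoing the actions on every restore.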
27 Deterministic replay
- Storage system: isolated subsystem
- Non-deterministic kernel scheduling decision
  - Opportunistic fix: priorities
- Non-deterministic interrupt
  - Fix: use RAM disks, no interrupt for checked system
- Non-deterministic kernel choose() calls by other code
  - Fix: filter by thread IDs. No choose() in interrupt
- Worked well in practice
  - Mostly deterministic
  - Worst case: auto-detect & ignore non-repeatable errors

28 Outline
- Overview
- Checking process
- Implementation
- Example check: crashes during recovery are recoverable
- Results

29 Why check crashes during recovery?
- Crashes are highly correlated
  - Often caused by kernel bugs, hardware errors
  - Reboot, hit same bug/error
- What to check? fsck once == fsck & crash, re-run fsck
  - fsck(crash-disk) to completion, /a recovered
  - fsck(crash-disk) and crash, fsck, /a gone: Bug!
- Powerful heuristic, found interesting bugs (wait until results)

30 How to check crashes during recovery?
[Diagram: the EXPLODE Runtime runs fsck.jfs on the crash-disk; crashing during that run yields crash-crash-disks; fsck.jfs is re-run on each one and the result is compared ("same as?") with the uninterrupted run]
- Problem: N blocks => 2^N crash-crash-disks. Too many!
- Can prune many crash-crash-disks

31 Simplified example
- 3-block disk: B1, B2, B3; each block is either 0 or 1
- crash-disk = 000 (B1 to B3)
- Trace of fsck(000):

    Read(B1) = 0
    Write(B2, 1)      buffer cache: B2=1
    Write(B3, 1)      buffer cache: B2=1, B3=1
    Read(B3) = 1
    Write(B1, 1)      buffer cache: B2=1, B3=1, B1=1

- fsck(000) = 111

32 Naïve strategy: 7 crash-crash-disks
- crash-disk = 000, fsck(000) = 111
- Trace of fsck(000): Read(B1) = 0, Write(B2, 1), Write(B3, 1), Read(B3) = 1, Write(B1, 1)
- buffer cache: B2=1, B3=1, B1=1
- Flushing each non-empty subset of the buffer cache (e.g. {B2=1}) onto the crash-disk gives one crash-crash-disk to check:

    fsck(010) == 111?
    fsck(001) == 111?
    fsck(011) == 111?
    fsck(100) == 111?
    fsck(110) == 111?
    fsck(101) == 111?
    fsck(111) == 111?

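The naïve enumeration can be sketched as follows, assuming a disk is a string of '0'/'1' characters and the buffer cache is a list of pending writes in the order fsck issued them. Write and crash_crash_disks are hypothetical names, and block indices are 0-based here (B1 is index 0).

```cpp
#include <string>
#include <utility>
#include <vector>

// A pending write in the buffer cache: (0-based block index, new value).
using Write = std::pair<int, char>;

// Every disk obtained by flushing a non-empty subset of `cache` onto
// `disk`. With N cached writes this yields 2^N - 1 candidates, so the
// sketch assumes N is small (fewer than 32 writes).
inline std::vector<std::string> crash_crash_disks(const std::string& disk,
                                                  const std::vector<Write>& cache) {
    std::vector<std::string> out;
    const unsigned n = static_cast<unsigned>(cache.size());
    for (unsigned mask = 1; mask < (1u << n); ++mask) {
        std::string d = disk;
        for (unsigned i = 0; i < n; ++i)
            if (mask & (1u << i))
                d[cache[i].first] = cache[i].second;  // flush write i
        out.push_back(d);
    }
    return out;
}
```

Called with disk "000" and the cache {B2=1, B3=1, B1=1}, written as {{1,'1'}, {2,'1'}, {0,'1'}}, it produces the seven disks of the slide in the same order: 010, 001, 011, 100, 110, 101, 111.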
33 Optimization: exploiting determinism
- crash-disk = 000, fsck(000) = 111
- Trace of fsck(000): Read(B1) = 0, Write(B2, 1), Write(B3, 1), Read(B3) = 1, Write(B1, 1)
- For all practical purposes, fsck is deterministic: read same blocks => write same blocks
- {B2=1}: fsck(010) == 111?
  - fsck(000) doesn't read B2
  - So, fsck(010) = 111

34 What blocks does fsck(000) actually read?
- crash-disk = 000, fsck(000) = 111
- Trace of fsck(000): Read(B1) = 0, Write(B2, 1), Write(B3, 1), Read(B3) = 1, Write(B1, 1)
- Read of B3 will get what we just wrote. Can't depend on B3
- fsck(000) reads/depends only on B1. It doesn't matter what we write to the other blocks.
- fsck(0**) = 111

35 Prune crash-crash-disks matching 0**
- crash-disk = 000, fsck(000) = 111
- Trace of fsck(000): Read(B1) = 0, Write(B2, 1), Write(B3, 1), Read(B3) = 1, Write(B1, 1)
- buffer cache: B2=1, B3=1, B1=1
- Candidates (those matching 0** need no re-check):

    fsck(010) == 111?   matches 0**, pruned
    fsck(001) == 111?   matches 0**, pruned
    fsck(011) == 111?   matches 0**, pruned
    fsck(100) == 111?
    fsck(110) == 111?
    fsck(101) == 111?
    fsck(111) == 111?

- Can further optimize using this and other ideas

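The pruning rule can be sketched as two small functions, assuming fsck is deterministic as the slides argue. TraceOp, dependent_reads and prune are hypothetical names, block indices are 0-based (B1 is index 0), and disks are strings of '0'/'1' characters.

```cpp
#include <set>
#include <string>
#include <vector>

// One step of the recorded fsck run: a read or a write of one block.
struct TraceOp {
    bool is_write;
    int block;  // 0-based block index
};

// Blocks whose on-disk value the run actually depended on: those read
// before the run itself wrote them. A read after a write just returns
// what was written, so it adds no dependency.
inline std::set<int> dependent_reads(const std::vector<TraceOp>& trace) {
    std::set<int> written, deps;
    for (const TraceOp& op : trace) {
        if (op.is_write) written.insert(op.block);
        else if (!written.count(op.block)) deps.insert(op.block);
    }
    return deps;
}

// Keeps only the candidates that differ from `crash_disk` on at least one
// depended-on block. The others would drive a deterministic fsck through
// the same reads and writes, so re-checking them is redundant.
inline std::vector<std::string> prune(const std::string& crash_disk,
                                      const std::vector<std::string>& candidates,
                                      const std::set<int>& deps) {
    std::vector<std::string> keep;
    for (const std::string& d : candidates) {
        bool same = true;
        for (int b : deps)
            if (d[b] != crash_disk[b]) same = false;
        if (!same) keep.push_back(d);
    }
    return keep;
}
```

For the slide's trace the dependency set is {B1}, so the three candidates matching 0** are dropped and four remain: 100, 110, 101, 111.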
36 Outline
- Overview
- Checking process
- Implementation
- Example check: crashes during recovery are recoverable
- Results

37 Bugs caused by crashes during recovery
- Found data-loss bugs in all three FS that use logging (ext3, JFS, ReiserFS), total 5
- Strict order under normal operation:
  - First, write operation to log, commit
  - Second, apply operation to actual file system
- Strict (reverse) order during recovery:
  - First, replay log to patch actual file system
  - Second, clear log
- No order => corrupted FS and no log to patch it!

38 Bug in fsck.ext3

    recover_ext3_journal(...) {
        // ...
        retval = -journal_recover(journal);
        // ...
        // clear the journal
        e2fsck_journal_release(...)
        // ...
    }

    journal_recover(...) {
        // replay the journal
        // ...
        // sync modifications to disk
        fsync_no_super(...)
    }

    // Error! Empty macro, doesn't sync data!
    #define fsync_no_super(dev) do {} while (0)

- Code directly adapted from the kernel
- But, fsync_no_super defined as NOP: hard to implement

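Why the missing sync matters can be shown with a toy model. This is a sketch under explicit assumptions, not the ext3 code: ToyDisk and survives_crash are hypothetical names, the file system blocks and the journal are modeled as two separate devices with volatile write caches, and the journal clear is assumed to reach stable storage before the replayed data does.

```cpp
#include <map>
#include <string>

// Toy disk with a volatile write cache; only synced data survives a crash.
class ToyDisk {
public:
    void write(const std::string& key, const std::string& val) { cache_[key] = val; }
    void sync() {
        for (const auto& kv : cache_) stable_[kv.first] = kv.second;
        cache_.clear();
    }
    void crash() { cache_.clear(); }  // cached writes are lost
    std::string read_stable(const std::string& key) const {
        auto it = stable_.find(key);
        return it == stable_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> cache_;
    std::map<std::string, std::string> stable_;
};

// Simulates recovery followed by a crash right after the journal is
// cleared. Returns true if "/a" is still recoverable afterwards, either
// because its data is on disk or because the log record is still there.
inline bool survives_crash(bool sync_before_clear) {
    ToyDisk fs_dev;   // file system blocks
    ToyDisk journal;  // journal blocks

    journal.write("op", "create /a");  // committed log record
    journal.sync();

    fs_dev.write("/a", "data");              // first: replay the log (cached only)
    if (sync_before_clear) fs_dev.sync();    // the step the empty macro skipped
    journal.write("op", "");                 // second: clear the log
    journal.sync();

    fs_dev.crash();
    journal.crash();

    return fs_dev.read_stable("/a") == "data" ||
           journal.read_stable("op") == "create /a";
}
```

With the sync in place the replayed data is stable before the log disappears; without it the crash leaves neither the data nor a log to replay.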
39 FiSC Results (can reproduce in EXPLODE)
[Table: error counts by type (rows: Data loss, False clean, Security, Crashes, Other, Total) and by file system (columns: VFS, ext2, ext3, JFS, ReiserFS, total); Data loss and False clean each have two N/A entries]
- In total: 21 fixed, 9 of the remaining 11 confirmed

40 EXPLODE checkers lines of code and errors found

    Storage System Checked      Checker     Bugs
    10 file systems             5,
    Storage applications
      CVS                       68          1
      Subversion                69          1
      EXPENSIVE
      Berkeley DB
    Transparent subsystems
      RAID                      FS
      NFS                       FS          4
      VMware GSX/Linux          FS          1
    Total                       6,

- bugs per 1,000 lines of checker code

41 Related work
- FS Testing
- Static (compile-time) analysis
- Software model checking

42 Conclusion
- EXPLODE
  - Comprehensive: adapt ideas from model checking
  - General, real: check live systems in situ, w/o source code
  - Fast, easy: simple C++ checking interface
- Results
  - Checked 17 widely-used, well-tested, real-world storage systems: 10 Linux FS, Linux NFS, Soft-RAID, 3 version control, Berkeley DB, VMware
  - Found serious data-loss bugs in all, over 70 bugs in total
  - Many bug reports led to immediate kernel patches
