ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
- Osborn Beasley
1 ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
Part II: Data Center Software Architecture
Topic 1: Distributed File Systems
Finding a needle in Haystack: Facebook's photo storage
2 Overview
- Haystack is an object store designed for sharing photos on Facebook, where data is written once, read often, never modified, and rarely deleted.
- Using a network-attached storage (NAS) appliance mounted over NFS has disadvantages in how metadata is handled.
- For the Photos application most of this metadata is unused; the more significant cost is that a file's metadata must be read from disk into memory in order to find the file itself.
- Multiplied over billions of photos, accessing metadata is the throughput bottleneck: disk I/Os spent on metadata are the limiting factor for read throughput.
- Haystack serves photos quickly, using at most one disk operation per read. This is possible because all metadata is kept in main memory.
- Acknowledgment: a few slides are adapted from slides made by Ewa Syta.
3 Facebook Photos in Numbers
- The biggest photo sharing website in the world
- For each photo, Facebook generates and stores four images
- Currently stores over 260 billion images, which amounts to over 20 petabytes of data
- One billion new photos uploaded each week
- One million images served per second at peak
4 How are Facebook Photos used?
- Profile pictures and recently uploaded pictures
  o Very frequently accessed right after being uploaded
  o Likely to be accessed by different users
  o More likely to be deleted
  o Likely to be cached
- Long tail: album photos and older photos
  o Less popular but still frequently accessed
  o Often requested in a sequence by the same user
  o Likely not to be in cache and to be retrieved from the storage hosts
5 Typical Design
How web servers, content delivery networks, and storage systems interact to serve photos on a popular site:
- An HTTP request is sent to a web server
- The web server generates the markup for the browser to render; for each image it constructs a URL directing the browser to a location from which to download the data
- This URL points to a CDN; if the CDN has the image cached, the CDN responds with the data
- Otherwise the CDN, using information embedded in the URL, retrieves the photo from the site's storage system
- The CDN updates its cached data
- The CDN sends the image to the user's browser
6 NFS-based Design
- Stores each photo in its own file on a set of commercial NAS appliances
- Photo Store servers mount all the volumes exported by the NAS appliances over NFS. Each Photo Store server:
  o Processes HTTP requests for images
  o Extracts the volume and full path to the file from the image's URL
  o Reads the data over NFS
  o Returns the result to the CDN
7 NFS Design's Issues
- Thousands of files stored in each directory of an NFS volume
- Excessive number of disk operations to read even a single image, because of metadata lookups
  o Most of the metadata is not used for photos, which wastes storage capacity
  o Disk reads are required just to find the file itself
- It was common to incur more than 10 disk operations to retrieve a single image
- The key problem: accessing metadata is the throughput bottleneck
8 NFS Design's Issues (contd)
Two attempts to reduce disk operations:
- Reducing directory sizes to hundreds of images per directory. This decreased the disk operations per read to 3:
  I. read the directory metadata into memory,
  II. load the inode into memory,
  III. read the file contents
- Letting the Photo Store servers explicitly cache the file handles returned by the NAS appliances
Focusing only on caching has limited impact on reducing disk operations. The storage system ends up processing the long tail of requests for less popular photos, which are not available in the CDN and are thus likely to miss in the Photo Store caches as well.
Using disk I/Os for metadata is the limiting factor for read throughput.
9 Why build a custom storage system?
- Traditional filesystems perform poorly under Facebook's workload
- Existing systems lack the right RAM-to-disk ratio
- Would having enough main memory to cache all the filesystem metadata help? No, that is not cost-effective in this approach: one photo corresponds to one file, and each file requires at least one inode, which is hundreds of bytes large. Serving photo requests in the long tail remains a problem in that case.
- Facebook decided to build a custom storage system that reduces the amount of filesystem metadata per photo, so that having enough main memory is dramatically more cost-effective.
10 Haystack
Haystack is designed to achieve four main goals:
- High throughput and low latency
  o Keep up with users' requests
  o Facilitate a good user experience by serving photos quickly
- Fault-tolerant
  o Users should not experience errors despite the inevitable server crashes and hard drive failures
- Cost-effective
  o Cost per terabyte of usable storage
  o Read rate normalized for each terabyte of usable storage
- Simple
  o Straightforward design
11 Design
- Use a CDN to serve popular images
- Leverage Haystack to respond efficiently to photo requests in the long tail
- Reduce the memory used for filesystem metadata
  o Store multiple photos in a single file, and therefore maintain very large files
- Two kinds of metadata:
  o Application metadata: information needed to construct a URL for the browser
  o Filesystem metadata: data necessary to retrieve a photo from a host's disk
12 Design (contd)
3 core components:
- Haystack Directory
- Haystack Cache
- Haystack Store
(The three components fit into the canonical interactions between a user's browser, web server, CDN, and storage system.)
The browser can be directed to either the CDN or the Cache. Having an internal caching infrastructure gives Facebook the ability to reduce its dependence on external CDNs.
13 Haystack Directory
- Provides a mapping from logical volumes to physical volumes. Web servers use it when uploading photos and when constructing the image URLs for a page request.
- id>/<logical volume, Photo>
- The URL contains several pieces of information, each piece corresponding to one step in the sequence from when a user's browser contacts the CDN (or Cache) to ultimately retrieving a photo from a machine in the Store.
- Load-balances writes across logical volumes and reads across physical volumes.
- Determines whether a photo request should be handled by the CDN or by the Cache.
- Identifies read-only logical volumes
  o A machine is marked read-only when it exhausts its capacity or for operational reasons
- The Directory stores its information in a replicated database accessed via a PHP interface that leverages memcache to reduce latency.
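The mapping and load-balancing duties listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Directory` class, its method names, and the machine names are hypothetical, and a plain dictionary stands in for the replicated database mentioned on the slide.

```python
import random


class Directory:
    """Toy stand-in for the Haystack Directory (hypothetical API)."""

    def __init__(self, seed=0):
        self.logical_to_physical = {}  # logical volume id -> Store machines holding a replica
        self.read_only = set()         # logical volumes closed to writes
        self._rng = random.Random(seed)

    def add_volume(self, logical_id, machines):
        self.logical_to_physical[logical_id] = list(machines)

    def mark_read_only(self, logical_id):
        self.read_only.add(logical_id)

    def pick_write_volume(self):
        # Writes are balanced across write-enabled logical volumes only.
        writable = [v for v in self.logical_to_physical if v not in self.read_only]
        if not writable:
            raise RuntimeError("no write-enabled logical volume")
        return self._rng.choice(writable)

    def pick_read_machine(self, logical_id):
        # Reads are balanced across the physical replicas of one logical volume.
        return self._rng.choice(self.logical_to_physical[logical_id])


directory = Directory()
directory.add_volume(1, ["store-a", "store-b", "store-c"])
directory.add_volume(2, ["store-d", "store-e", "store-f"])
directory.mark_read_only(1)
```

With volume 1 marked read-only, uploads can only land on volume 2, while reads of volume 1 are still spread over its three replicas.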
14 Haystack Cache
- Functions as an internal CDN
- Organized as a distributed hash table that uses a photo's id as the key to locate cached data
- It caches a photo only if:
  o The request comes directly from a user and not from the CDN
    (post-CDN caching is ineffective)
  o The photo is fetched from a write-enabled Store machine
    (photos are most heavily accessed soon after they are uploaded, and Haystack performs better when doing either reads or writes, but not both)
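The two caching conditions above amount to a single predicate. The sketch below assumes hypothetical names (`should_cache`, `PhotoCache`) and uses a local dictionary in place of the distributed hash table.

```python
def should_cache(request_via_cdn: bool, store_write_enabled: bool) -> bool:
    """Cache only direct user requests served by a write-enabled Store machine."""
    return (not request_via_cdn) and store_write_enabled


class PhotoCache:
    """Toy cache keyed by photo id (a dict stands in for the distributed hash table)."""

    def __init__(self):
        self.entries = {}

    def on_fetch(self, photo_id, data, request_via_cdn, store_write_enabled):
        # Called after a miss has been served from the Store.
        if should_cache(request_via_cdn, store_write_enabled):
            self.entries[photo_id] = data
        return data


cache = PhotoCache()
cache.on_fetch(101, b"fresh", request_via_cdn=False, store_write_enabled=True)
cache.on_fetch(102, b"via-cdn", request_via_cdn=True, store_write_enabled=True)
cache.on_fetch(103, b"old-volume", request_via_cdn=False, store_write_enabled=False)
```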
15 Haystack Store
- Encapsulates the storage system for photos and manages the filesystem metadata for photos
- Organized by physical volumes
  o e.g. a server's 10 terabytes of physical storage divided into 100 physical volumes of 100 gigabytes each
- Physical volumes on different machines are grouped into logical volumes
  o Mitigates data loss
- Manages operations
  o Read
  o Write
  o Delete
16 Haystack Store (contd)
- Accesses a photo quickly using only the id of the corresponding logical volume and the file offset at which the photo resides.
- Keystone of the Haystack design: retrieving the filename, offset, and size for a particular photo without needing disk operations.
- Keeps open file descriptors for each physical volume, and also an in-memory mapping of photo ids to the filesystem metadata critical for retrieving that photo.
17 Physical Volume Layout
- A Store machine represents a physical volume as a large file consisting of a superblock followed by a sequence of needles.
- Each needle represents a photo stored in Haystack.
- Think of a physical volume as a very large file saved as /hay/haystack/<logical volume id>
- To retrieve needles quickly, each Store machine maintains an in-memory data structure for each of its volumes, which maps pairs of (key, alternate key) to the corresponding needle's flags, size in bytes, and volume offset.
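As a rough illustration of the per-volume in-memory structure just described, the mapping can be modeled as a dictionary from (key, alternate key) to a small record. The names `NeedleMeta`, `volume_index`, and `locate`, and the sample values, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class NeedleMeta:
    flags: int   # e.g. a deleted bit
    size: int    # data size in bytes
    offset: int  # byte offset of the needle inside the volume file


# One such mapping exists per physical volume on a Store machine.
volume_index: Dict[Tuple[int, int], NeedleMeta] = {}
volume_index[(42, 1)] = NeedleMeta(flags=0, size=2048, offset=4096)


def locate(key: int, alternate_key: int) -> Optional[NeedleMeta]:
    """Pure in-memory lookup: no disk operation is needed to find a needle."""
    return volume_index.get((key, alternate_key))
```

A lookup such as `locate(42, 1)` yields the offset and size needed to issue a single read against the volume file.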
18 Physical Volume Layout
19 Photo Read
- When a Cache machine requests a photo it supplies the logical volume id, key, alternate key, and cookie
  o The cookie's value is randomly assigned by, and stored in, the Directory at the time the photo is uploaded
  o It is used to defeat attacks aimed at guessing valid URLs for photos
- The Store machine:
  o Looks up the relevant metadata in its in-memory mappings
  o Checks that the photo has not been deleted
  o Seeks to the appropriate offset in the volume file
  o Reads the entire needle from disk
  o Verifies the cookie and the integrity of the data
  o Returns the photo if the checks pass
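The read steps above can be traced in a small sketch. The needle encoding here (header fields, their widths, and a CRC32 checksum) is an assumption chosen for illustration, not the on-disk format used by Haystack; `pack_needle` and `read_photo` are hypothetical helpers, and an in-memory `BytesIO` stands in for the volume file.

```python
import io
import struct
import zlib

# Assumed header: cookie, key, alternate key, flags, data size (little-endian, unpadded).
HEADER = struct.Struct("<QQIBI")
DELETED = 1


def pack_needle(cookie, key, alt_key, data, flags=0):
    checksum = struct.pack("<I", zlib.crc32(data))
    return HEADER.pack(cookie, key, alt_key, flags, len(data)) + data + checksum


def read_photo(volume, mapping, key, alt_key, cookie):
    meta = mapping.get((key, alt_key))          # 1. in-memory lookup
    if meta is None:
        return None
    flags, size, offset = meta
    if flags & DELETED:                          # 2. deleted check
        return None
    volume.seek(offset)                          # 3. seek to the needle
    raw = volume.read(HEADER.size + size + 4)    # 4. read the entire needle
    stored_cookie, _, _, disk_flags, disk_size = HEADER.unpack_from(raw)
    data = raw[HEADER.size:HEADER.size + disk_size]
    (checksum,) = struct.unpack_from("<I", raw, HEADER.size + disk_size)
    # 5. verify the cookie and the integrity of the data
    if disk_flags & DELETED or stored_cookie != cookie or zlib.crc32(data) != checksum:
        return None
    return data                                  # 6. return the photo


volume = io.BytesIO()
volume.write(b"SUPERBLK")                        # placeholder superblock
needle_offset = volume.tell()
volume.write(pack_needle(cookie=0xC0FFEE, key=7, alt_key=2, data=b"jpeg-bytes"))
mapping = {(7, 2): (0, len(b"jpeg-bytes"), needle_offset)}
```

Only step 4 touches the disk, which is how the design keeps a read to at most one disk operation.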
20 Photo Write
- Web servers provide the logical volume id, key, alternate key, cookie, and data to the Store machines
- Each machine synchronously appends needle images to its physical volume files and updates its in-memory mappings as needed
- Volumes are append-only, so a photo can only be modified by adding an updated needle with the same key and alternate key
  o Different logical volume: the Directory updates its application metadata, and future requests will never fetch the older version
  o Same logical volume: duplicates are distinguished by their offsets; the highest offset is the latest version
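A minimal sketch of the append-only behaviour, assuming a hypothetical `Volume` class that stores raw photo bytes without needle headers: updating a photo appends a new copy, and the in-memory mapping simply moves to the higher offset.

```python
import io


class Volume:
    """Append-only volume with an in-memory (key, alternate key) -> (offset, size) map."""

    def __init__(self):
        self.file = io.BytesIO()   # stands in for the physical volume file
        self.mapping = {}

    def append(self, key, alt_key, data):
        offset = self.file.seek(0, io.SEEK_END)   # always write at the end
        self.file.write(data)
        # A later needle with the same keys has a higher offset and wins.
        self.mapping[(key, alt_key)] = (offset, len(data))
        return offset

    def read(self, key, alt_key):
        offset, size = self.mapping[(key, alt_key)]
        self.file.seek(offset)
        return self.file.read(size)


vol = Volume()
first_offset = vol.append(1, 1, b"old")
second_offset = vol.append(1, 1, b"newer")   # same keys: an update
```

The older bytes stay in the file until compaction; only the mapping decides which copy is served.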
21 Photo Write (contd) Uploading a Photo
22 Photo Delete
- Very straightforward: sets the delete flag in the in-memory mapping and, synchronously, in the volume file
- Space occupied by deleted needles is lost for some time and reclaimed later via compaction
  o Compaction is an online operation that reclaims the space used by deleted and duplicate needles
  o Needles are copied into a new file, and the new file replaces the current file
- The pattern for deletes is similar to that for photo views
  o Young photos are much more likely to be deleted
  o Over the course of a year, about 25% of the photos get deleted
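Delete and compaction can be sketched together. Here the volume is modeled as a Python list of needle dictionaries rather than a file, and `delete_photo` and `compact` are hypothetical helpers; the point is only that compaction skips deleted needles and superseded duplicates.

```python
def delete_photo(volume, mapping, key):
    """Set the delete flag both in memory and in the (modeled) volume file."""
    entry = mapping[key]
    entry["deleted"] = True
    volume[entry["pos"]]["deleted"] = True


def compact(volume, mapping):
    """Copy live, latest needles into a new volume and build its mapping."""
    new_volume, new_mapping = [], {}
    for pos, needle in enumerate(volume):
        entry = mapping.get(needle["key"])
        is_latest = entry is not None and entry["pos"] == pos
        if needle["deleted"] or not is_latest:
            continue  # skip deleted needles and older duplicates
        new_mapping[needle["key"]] = {"pos": len(new_volume), "deleted": False}
        new_volume.append(dict(needle))
    return new_volume, new_mapping


volume = [
    {"key": "a", "data": b"a1", "deleted": False},
    {"key": "b", "data": b"b1", "deleted": False},
    {"key": "a", "data": b"a2", "deleted": False},  # newer duplicate of "a"
]
mapping = {"a": {"pos": 2, "deleted": False}, "b": {"pos": 1, "deleted": False}}

delete_photo(volume, mapping, "b")
new_volume, new_mapping = compact(volume, mapping)
```

After compaction only the newest copy of "a" survives; in the real system the new file would then replace the current one.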
23 Index File
- Store machines maintain an index file for each of their volumes
- The index file is a checkpoint of the in-memory data structures used to locate needles efficiently on disk
- It allows a Store machine to build its in-memory mappings quickly, shortening restart time
24 Index File (contd) Layout of Haystack Index file
25 Index File (contd)
- The index file is updated asynchronously
  o Allows write and delete operations to return faster
- Two side effects:
  o Needles can exist without corresponding index records (orphans)
    The last record in the index corresponds to the last non-orphan needle in the volume file
  o Index records do not reflect deleted photos
    If a needle read from disk is marked as deleted, the Store machine updates its in-memory mapping and notifies the Cache
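The orphan case can be made concrete with a small restart sketch. It assumes that index records and needles are both available as (key, offset) pairs in append order; `rebuild_mapping` is a hypothetical helper, not an actual Haystack routine.

```python
def rebuild_mapping(index_records, volume_needles):
    """Rebuild the in-memory mapping on restart.

    index_records:  list of (key, offset) from the possibly stale index file.
    volume_needles: list of (key, offset) as they appear in the volume file.
    """
    mapping = {key: offset for key, offset in index_records}
    # The last index record marks the last non-orphan needle.
    last_indexed = index_records[-1][1] if index_records else -1
    orphans = [(key, offset) for key, offset in volume_needles if offset > last_indexed]
    for key, offset in orphans:
        mapping[key] = offset
        index_records.append((key, offset))  # bring the index up to date
    return mapping, orphans


index_records = [("a", 8), ("b", 40)]
volume_needles = [("a", 8), ("b", 40), ("c", 72), ("d", 104)]
mapping, orphans = rebuild_mapping(index_records, volume_needles)
```

Only the tail of the volume file after the last indexed needle has to be scanned, which keeps restarts short.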
26 Filesystem
- Store machines use XFS, an extent-based file system
- The filesystem should not need much memory to perform random seeks within a large file quickly
- The blockmaps for several contiguous large files can be small enough to be stored in main memory
- XFS provides efficient file preallocation, mitigating fragmentation
27 Recovery from failures
- Haystack needs to tolerate a variety of failures: faulty hard drives, misbehaving RAID controllers, bad motherboards, etc.
- Two straightforward techniques to tolerate failures:
  o Detection: a background task, pitchfork, periodically checks the availability of each volume file, tests the connection to each Store machine, and attempts to read data. If a check fails, the affected logical volumes are marked read-only.
  o Repair: fixing the problem sometimes (a few times each month) requires a bulk sync operation, in which the data of a Store machine is reset using the volume files supplied by a replica.
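The detection step can be sketched as a single pass over the Store machines. The function name `pitchfork_pass`, the health-check callback, and the machine names are assumptions for illustration; the real task also probes volume files and performs test reads.

```python
def pitchfork_pass(volumes_by_machine, is_healthy, read_only):
    """Mark every logical volume on an unhealthy Store machine as read-only."""
    for machine, logical_volumes in volumes_by_machine.items():
        if not is_healthy(machine):
            read_only.update(logical_volumes)
    return read_only


volumes_by_machine = {"store-a": [1, 2], "store-b": [2, 3], "store-c": [4]}
failed = {"store-b"}  # pretend this machine fails its health check
read_only = pitchfork_pass(volumes_by_machine, lambda m: m not in failed, set())
```

Because a logical volume spans several machines, one failing replica is enough to close the whole logical volume to writes until it is repaired.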
28 Haystack Optimizations
- Haystack is an object store designed for sharing photos on Facebook, where data is written once, read often, never modified, and rarely deleted.
- It keeps all metadata in main memory and requires at most one disk operation per read
- The per-photo metadata needed to find a photo on disk is reduced
- It replicates each photo in geographically distinct locations
- Each usable terabyte costs ~28% less and processes ~4x more reads per second than an equivalent terabyte on a NAS appliance
29 Haystack Optimizations (contd)
- Store machines reduce their main memory footprint by 20%:
  o For deleted photos the offset is set to 0, so flags need no in-memory representation
  o Cookies are not kept in memory; the supplied cookie is checked after reading a needle from disk
- Haystack uses on average 10 bytes of main memory per photo
  o Each photo is scaled to four photos with the same key (64 bits), different alternate keys (32 bits), and different data sizes (16 bits)
  o In addition, about 2 bytes per image go to hash table overheads, bringing the total for the four scaled photos of the same image to 40 bytes
- For comparison, the xfs_inode_t structure in Linux is 536 bytes
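The 40-byte and 10-byte figures follow from the field widths quoted above. The short calculation below spells this out, under the interpretation that the key is stored once per uploaded image while the alternate key, size, and hash-table overhead are paid once per scaled photo; the constant names are made up for the example.

```python
KEY_BYTES = 64 // 8            # shared by the four scaled photos
ALT_KEY_BYTES = 32 // 8        # one per scaled photo
SIZE_BYTES = 16 // 8           # one per scaled photo
HASH_OVERHEAD_BYTES = 2        # approximate hash-table overhead per scaled photo
SCALED_COPIES = 4
XFS_INODE_BYTES = 536          # size of xfs_inode_t quoted on the slide

per_uploaded_image = (
    KEY_BYTES
    + SCALED_COPIES * (ALT_KEY_BYTES + SIZE_BYTES)
    + SCALED_COPIES * HASH_OVERHEAD_BYTES
)
per_scaled_photo = per_uploaded_image / SCALED_COPIES
inode_ratio = XFS_INODE_BYTES / per_scaled_photo

print(per_uploaded_image, per_scaled_photo, inode_ratio)
```

Under these assumptions an XFS inode alone costs roughly fifty times the memory that Haystack spends per stored photo.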
30 Traffic Volume
- The number of Haystack photos written is 12 times the number of photos uploaded
- Haystack responds to approximately 10% of all photo requests from CDNs
- The smaller images account for most of the photos viewed
- Reading smaller images is typically a more latency-sensitive operation for Facebook
31 Evaluation: Directory
- The Directory balances writes across Store machines very effectively, giving well-balanced behavior
32 Evaluation: Cache
- The Cache is effective in dramatically reducing the read request rate for the machines that would be most affected
- These photos are relatively recent, which explains the high hit rates of ~80%
33 Evaluation: Store
- Two benchmarks: Randomio and Haystress
- Haystack delivers 85% of the raw throughput of the device while incurring only 17% higher latency
- The Store delivers high read throughput even in the presence of writes
34 Evaluation: Store (contd)
- The latency of multi-write operations is fairly low and stable even as the volume of traffic varies dramatically
35 Questions
- Haystack sets its goal as having at most one disk operation per read. To this end, it must keep all metadata in main memory. What are the barriers to achieving this objective? (Section 1)
- What does "long tail" refer to regarding Haystack's workload and access pattern? Why does the long tail make the caches in the Photo Store servers, deployed in front of the NAS servers, less effective? (Section 2)
- To retrieve needles quickly, what does the in-memory mapping data structure include? Imagine that each photo is stored as a file in a conventional file system: what metadata is required to locate the photo (or its data)? (Section 3.4)
- Haystack apparently needs much smaller metadata. Do you think it is necessary to introduce Haystack's technique into conventional, general-purpose file systems to improve performance? Why? If it were adopted, what would the disadvantages be?
Distributed Filesystems Amir H. Payberah Swedish Institute of Computer Science amir@sics.se April 8, 2014 Amir H. Payberah (SICS) Distributed Filesystems April 8, 2014 1 / 32 What is Filesystem? Controls
More informationAvoiding the Disk Bottleneck in the Data Domain Deduplication File System
Avoiding the Disk Bottleneck in the Data Domain Deduplication File System Benjamin Zhu Data Domain, Inc. Kai Li Data Domain, Inc. and Princeton University Hugo Patterson Data Domain, Inc. Abstract Disk-based
More information1 Storage Devices Summary
Chapter 1 Storage Devices Summary Dependability is vital Suitable measures Latency how long to the first bit arrives Bandwidth/throughput how fast does stuff come through after the latency period Obvious
More informationLecture 18: Reliable Storage
CS 422/522 Design & Implementation of Operating Systems Lecture 18: Reliable Storage Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous versions of
More informationThe OpenStack TM Object Storage system
The OpenStack TM Object Storage system Deploying and managing a scalable, open- source cloud storage system with the SwiftStack Platform By SwiftStack, Inc. contact@swiftstack.com Contents Introduction...
More informationSocial Networking at Scale. Sanjeev Kumar Facebook
Social Networking at Scale Sanjeev Kumar Facebook Outline 1 What makes scaling Facebook challenging? 2 Evolution of Software Architecture 3 Evolution of Datacenter Architecture 845M users worldwide 2004
More informationJournal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra
More informationSynology High Availability (SHA)
Synology High Availability (SHA) Based on DSM 5.1 Synology Inc. Synology_SHAWP_ 20141106 Table of Contents Chapter 1: Introduction... 3 Chapter 2: High-Availability Clustering... 4 2.1 Synology High-Availability
More informationWe mean.network File System
We mean.network File System Introduction: Remote File-systems When networking became widely available users wanting to share files had to log in across the net to a central machine This central machine
More informationWith DDN Big Data Storage
DDN Solution Brief Accelerate > ISR With DDN Big Data Storage The Way to Capture and Analyze the Growing Amount of Data Created by New Technologies 2012 DataDirect Networks. All Rights Reserved. The Big
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationDeploying Silver Peak VXOA with EMC Isilon SyncIQ. February 2012. www.silver-peak.com
Deploying Silver Peak VXOA with EMC Isilon SyncIQ February 2012 www.silver-peak.com Table of Contents Table of Contents Overview... 3 Solution Components... 3 EMC Isilon...3 Isilon SyncIQ... 3 Silver Peak
More informationHow To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)
WHITE PAPER Oracle NoSQL Database and SanDisk Offer Cost-Effective Extreme Performance for Big Data 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Abstract... 3 What Is Big Data?...
More informationSimplified HA/DR Using Storage Solutions
Simplified HA/DR Using Storage Solutions Agnes Jacob, NetApp and Tom Tyler, Perforce Software MERGE 2013 THE PERFORCE CONFERENCE SAN FRANCISCO APRIL 24 26 2 SIMPLIFIED HA/DR USING STORAGE SOLUTIONS INTRODUCTION
More informationZooKeeper. Table of contents
by Table of contents 1 ZooKeeper: A Distributed Coordination Service for Distributed Applications... 2 1.1 Design Goals...2 1.2 Data model and the hierarchical namespace...3 1.3 Nodes and ephemeral nodes...
More information09'Linux Plumbers Conference
09'Linux Plumbers Conference Data de duplication Mingming Cao IBM Linux Technology Center cmm@us.ibm.com 2009 09 25 Current storage challenges Our world is facing data explosion. Data is growing in a amazing
More informationObject Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.
Object Storage: A Growing Opportunity for Service Providers Prepared for: White Paper 2012 Neovise, LLC. All Rights Reserved. Introduction For service providers, the rise of cloud computing is both a threat
More informationA Deduplication File System & Course Review
A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror
More informationHadoop Scalability at Facebook. Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011
Hadoop Scalability at Facebook Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011 How Facebook uses Hadoop Hadoop Scalability Hadoop High Availability HDFS Raid How Facebook uses Hadoop Usages
More informationThe Value of a Content Delivery Network
September 2010 White Paper The Value of a Content Delivery Network Table of Contents Introduction... 3 Performance... 3 The Second Generation of CDNs... 6 Conclusion... 7 About NTT America... 8 Introduction
More informationFile Management Chapters 10, 11, 12
File Management Chapters 10, 11, 12 Requirements For long-term storage: possible to store large amount of info. info must survive termination of processes multiple processes must be able to access concurrently
More informationProtecting Information in a Smarter Data Center with the Performance of Flash
89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com 212.367.7400 Protecting Information in a Smarter Data Center with the Performance of Flash IBM FlashSystem and IBM ProtecTIER Printed in
More informationA block based storage model for remote online backups in a trust no one environment
A block based storage model for remote online backups in a trust no one environment http://www.duplicati.com/ Kenneth Skovhede (author, kenneth@duplicati.com) René Stach (editor, rene@duplicati.com) Abstract
More informationTableau Server Scalability Explained
Tableau Server Scalability Explained Author: Neelesh Kamkolkar Tableau Software July 2013 p2 Executive Summary In March 2013, we ran scalability tests to understand the scalability of Tableau 8.0. We wanted
More informationBerkeley Ninja Architecture
Berkeley Ninja Architecture ACID vs BASE 1.Strong Consistency 2. Availability not considered 3. Conservative 1. Weak consistency 2. Availability is a primary design element 3. Aggressive --> Traditional
More informationCouchbase Server Under the Hood
Couchbase Server Under the Hood An Architectural Overview Couchbase Server is an open-source distributed NoSQL document-oriented database for interactive applications, uniquely suited for those needing
More informationDesign and Evolution of the Apache Hadoop File System(HDFS)
Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop
More informationData Center Performance Insurance
Data Center Performance Insurance How NFS Caching Guarantees Rapid Response Times During Peak Workloads November 2010 2 Saving Millions By Making It Easier And Faster Every year slow data centers and application
More informationDependable Systems. 9. Redundant arrays of. Prof. Dr. Miroslaw Malek. Wintersemester 2004/05 www.informatik.hu-berlin.de/rok/zs
Dependable Systems 9. Redundant arrays of inexpensive disks (RAID) Prof. Dr. Miroslaw Malek Wintersemester 2004/05 www.informatik.hu-berlin.de/rok/zs Redundant Arrays of Inexpensive Disks (RAID) RAID is
More informationHow To Virtualize A Storage Area Network (San) With Virtualization
A New Method of SAN Storage Virtualization Table of Contents 1 - ABSTRACT 2 - THE NEED FOR STORAGE VIRTUALIZATION 3 - EXISTING STORAGE VIRTUALIZATION METHODS 4 - A NEW METHOD OF VIRTUALIZATION: Storage
More informationIntroduction to NetApp Infinite Volume
Technical Report Introduction to NetApp Infinite Volume Sandra Moulton, Reena Gupta, NetApp April 2013 TR-4037 Summary This document provides an overview of NetApp Infinite Volume, a new innovation in
More informationHadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013
Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 * Other names and brands may be claimed as the property of others. Agenda Hadoop Intro Why run Hadoop on Lustre?
More information