Distributed Storage Networks and Computer Forensics


1 Distributed Storage Networks and Computer Forensics
7 Storage Virtualization and DHT
Technical Faculty, Winter Semester 2011/12

2 Overview
- Concept of Virtualization
- Storage Area Networks
  - Principles
  - Optimization
- Distributed File Systems
  - Without virtualization, e.g. Network File Systems
  - With virtualization, e.g. Google File System
- Distributed Wide Area Storage Networks
  - Distributed Hash Tables
  - Peer-to-Peer Storage

3 Concept of Virtualization
- Principle
  - A virtual storage layer handles all application accesses to the file system
  - The virtual disk partitions files and stores their blocks over several (physical) hard disks
  - Control mechanisms allow redundancy and failure repair
- Control
  - The virtualization server assigns data, e.g. blocks of files, to hard disks (address space remapping)
  - It controls the replication and redundancy strategy
  - It adds and removes storage devices
[Figure: a file is mapped through the virtual disk onto several hard disks]
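The address space remapping described on this slide can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class name `VirtualDisk`, the round-robin placement rule, and the repair policy are hypothetical and not taken from the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualDisk:
    """Toy virtualization layer (hypothetical): maps (file, block) to physical disks."""
    disks: list[str]
    replicas: int = 2
    # Remapping table: (file name, block index) -> list of disks holding a copy.
    table: dict[tuple[str, int], list[str]] = field(default_factory=dict)

    def place(self, file: str, block: int) -> list[str]:
        """Assign a block to `replicas` distinct disks (assumed round-robin rule)."""
        key = (file, block)
        if key not in self.table:
            start = len(self.table) % len(self.disks)
            self.table[key] = [
                self.disks[(start + r) % len(self.disks)]
                for r in range(self.replicas)
            ]
        return self.table[key]

    def remove_disk(self, disk: str) -> None:
        """Failure repair: re-replicate every block that had a copy on `disk`."""
        self.disks.remove(disk)
        for locations in self.table.values():
            if disk in locations:
                # Pick any remaining disk that does not already hold this block.
                spare = next(d for d in self.disks if d not in locations)
                locations[locations.index(disk)] = spare


vd = VirtualDisk(["d0", "d1", "d2"])
for block in range(3):
    vd.place("report.txt", block)
vd.remove_disk("d1")
```

The application only ever sees `(file, block)` addresses; which physical disks hold the copies is decided, and can be changed, entirely inside the virtualization layer.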

4 Storage Virtualization
- Capabilities
  - Replication
  - Pooling
  - Disk management
- Advantages
  - Data migration
  - Higher availability
  - Simple maintenance
  - Scalability
- Disadvantages
  - Uninstalling is time-consuming
  - Compatibility and interoperability
  - Complexity of the system
- Classic implementation
  - Host-based: Logical Volume Management; file systems, e.g. NFS
  - Storage-device-based: RAID
  - Network-based: Storage Area Network
- New approaches
  - Distributed Wide Area Storage Networks
  - Distributed Hash Tables
  - Peer-to-Peer Storage

5 Storage Area Networks
- Virtual block devices
  - without file system
  - connects hard disks
- Advantages
  - simpler storage administration
  - more flexible
  - servers can boot from the SAN
  - effective disaster recovery
  - allows storage replication
- Compatibility problems between hard disks and the virtualization server

6 SAN Networking
- FCP (Fibre Channel Protocol): SCSI over Fibre Channel
- iSCSI (SCSI over TCP/IP)
- HyperSCSI (SCSI over Ethernet)
- ATA over Ethernet
- Fibre Channel over Ethernet
- iSCSI over InfiniBand
- FCP over IP

7 SAN File Systems
- File system for concurrent read and write operations by multiple computers
  - without conventional file locking
  - concurrent direct access to blocks by servers
- Examples
  - Veritas Cluster File System
  - Xsan
  - Global File System
  - Oracle Cluster File System
  - VMware VMFS
  - IBM General Parallel File System

8 Distributed File Systems (without Virtualization)
- Also known as network file systems
- Support sharing of files, tapes, printers, etc.
- Allow multiple client processes on multiple hosts to read and write the same files
  - concurrency control or locking mechanisms are necessary
- Examples
  - Network File System (NFS)
  - Server Message Block (SMB), Samba
  - Apple Filing Protocol (AFP)
  - Amazon Simple Storage Service (S3)

9 Distributed File Systems with Virtualization
- Example: Google File System
  - File system on top of other file systems with built-in virtualization
  - System built from cheap standard components (with high failure rates)
  - Few large files
  - Only operations: read, create, append, delete
    - concurrent appends and reads must be handled
  - High bandwidth is important
- Replication strategy
  - chunk replication
  - master replication
[Figure 1: GFS Architecture. The application's GFS client sends (file name, chunk index) to the GFS master and receives (chunk handle, chunk locations); it then sends (chunk handle, byte range) to a GFS chunkserver and receives chunk data. The master holds the file namespace, sends instructions to the chunkservers and receives chunkserver state; each chunkserver stores its chunks in a Linux file system.]
[Figure 2: Write Control and Data Flow between client, master, primary replica, and secondary replicas A and B. Legend: data messages, control messages.]
[The figures are shown on top of excerpts from the paper. The readable parts state that clients never read or write file data through the master, that a client asks the master which chunkservers to contact and caches this information for a limited time, and that the chosen chunk size is 64 MB.]
Source of the figures: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System"
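The read path in the GFS architecture figure, where the client turns a file name and byte offset into a chunk index, asks the master once, and then talks to chunkservers directly, can be sketched as follows. The 64 MB chunk size is the value given in the GFS paper; the classes `ToyMaster` and `ToyClient`, their methods, and the chunkserver names are hypothetical stand-ins, not the real GFS interfaces.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated in the GFS paper


def chunk_index(byte_offset: int) -> int:
    """Translate a byte offset within a file into a chunk index."""
    return byte_offset // CHUNK_SIZE


class ToyMaster:
    """Hypothetical master: only stores metadata, never file data."""

    def __init__(self) -> None:
        # (file name, chunk index) -> (chunk handle, replica locations)
        self.chunks: dict[tuple[str, int], tuple[str, list[str]]] = {}

    def lookup(self, name: str, index: int) -> tuple[str, list[str]]:
        return self.chunks[(name, index)]


class ToyClient:
    """Hypothetical client that caches the master's replies."""

    def __init__(self, master: ToyMaster) -> None:
        self.master = master
        self.cache: dict[tuple[str, int], tuple[str, list[str]]] = {}
        self.master_requests = 0

    def locate(self, name: str, offset: int) -> tuple[str, list[str], int]:
        """Return (chunk handle, replica locations, offset inside the chunk)."""
        key = (name, chunk_index(offset))
        if key not in self.cache:  # contact the master only on a cache miss
            self.master_requests += 1
            self.cache[key] = self.master.lookup(*key)
        handle, locations = self.cache[key]
        return handle, locations, offset % CHUNK_SIZE


master = ToyMaster()
master.chunks[("/foo/bar", 1)] = ("2ef0", ["cs1", "cs2", "cs3"])
client = ToyClient(master)
first = client.locate("/foo/bar", CHUNK_SIZE + 10)
second = client.locate("/foo/bar", CHUNK_SIZE + 100)
```

Both lookups fall into the same chunk, so the second one is answered from the client's cache and the master is contacted only once. Cache expiry, which the paper mentions, is left out of this sketch.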

10 Distributed Wide Area Storage Networks
- Distributed Hash Tables
  - Relieving hot spots in the Internet
  - Caching strategies for web servers
- Peer-to-Peer Networks
  - Distributed file lookup and download in overlay networks
  - Most (or the best) of them use DHTs

11 WWW Load Balancing
- Web surfing
  - Web servers offer web pages
  - Web clients request web pages
- Most of the time these requests are independent
- Requests use resources of the web servers
  - bandwidth
  - computation time
[Figure: web clients requesting pages from web servers]

12 Load
- Some web servers always have high load
  - for permanently high loads, servers must be sufficiently powerful
- Some suffer under high fluctuations
  - e.g. special events: jpl.nasa.gov (Mars mission), cnn.com (terrorist attack)
- Extending the servers for the worst case is not reasonable
- Serving the requests is desired
[Figure: server load on Monday, Tuesday, Wednesday]

13 Load Balancing in the WWW
- Fluctuations target some servers
- (Commercial) solution
  - Service providers offer exchange servers
  - Many requests will be distributed among these servers
- But how?
[Figure: load of servers A and B on Monday, Tuesday, Wednesday]

14 Literature
- Leighton, Lewin, et al., STOC 97: "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web"
- Used by Akamai (founded 1997)

15 Start Situation
- Without load balancing
- Advantage: simple
- Disadvantage: servers must be designed for worst-case situations
[Figure: web clients send requests to the web server, which delivers the web pages]

16 Site Caching
- The whole website is copied to different web caches
- Browsers send their requests to the web server
- The web server redirects requests to a web cache
- The web cache delivers the web pages
- Advantage: good load balancing
- Disadvantages
  - bottleneck: redirect
  - large overhead for complete website replication
[Figure: the web server redirects web clients to web caches]

17 Proxy Caching
- Each web page is distributed to a few web caches
- Only the first request is sent to the web server
- Links refer to pages in the web cache
- Then the web client surfs in the web cache
- Advantage: no bottleneck
- Disadvantages
  - Load balancing is only implicit
  - High requirements for placement
[Figure: web client, web server and web cache, with steps labeled redirect, 2. link, 3. request]

18 Requirements
- Balance: fair balancing of web pages
- Dynamics: efficient insertion and deletion of web-cache servers and files
- Views: web clients see different sets of web caches
[Figure: a new web cache is added and others are removed (marked X)]

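19 Hash Functions
- Set of items: $I$
- Set of buckets: $B$
- A hash function $f: I \to B$ assigns every item to a bucket
- Example: [formula missing in the transcription]
[Figure: items mapped to buckets]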

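20 Ranged Hash Functions
- Given
  - Items $I$
  - Number of caches (buckets) $C$
  - Bucket set $B$
  - Views $V \subseteq B$
- Ranged hash function: $f: 2^{B} \times I \to B$, written $f_V(i)$
- Prerequisite: $f_V(i) \in V$ for all views $V$
[Figure: buckets, a view, and items]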

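21 First Idea: Hash Function
- Algorithm: choose a hash function, e.g. $f(i) = (3i + 1) \bmod n$, where $n$ is the number of cache servers
- Balance: very good
- Dynamics
  - Inserting or removing a single cache server
  - requires a new hash function and total rehashing
  - Very expensive!
[Figure: items reassigned after a cache server is removed (marked X); labels include "i + 2 mod"]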
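How expensive total rehashing is can be checked with a few lines. The sketch below uses the hash function from the slide, $(3i + 1) \bmod n$; the function name `moved_fraction` and the chosen numbers of items and servers are assumptions for illustration.

```python
def bucket(item: int, n: int) -> int:
    """Hash function from the slide: (3i + 1) mod n, n = number of cache servers."""
    return (3 * item + 1) % n


def moved_fraction(num_items: int, n_before: int, n_after: int) -> float:
    """Fraction of items whose cache server changes when n changes."""
    moved = sum(
        1 for i in range(num_items)
        if bucket(i, n_before) != bucket(i, n_after)
    )
    return moved / num_items


# Going from 10 to 11 cache servers (hypothetical sizes).
fraction = moved_fraction(10_000, 10, 11)
```

An item keeps its server only if $(3i + 1) \bmod 110 < 10$, which holds for 10 out of every 110 consecutive items, so roughly 10/11 of all pages move when a single server is added. This is the cost that the ranged hash functions on the following slides are designed to avoid.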

22 Requirements of the Ranged Hash Functions
- Monotony: after adding or removing caches (buckets), no pages (items) should be moved between the caches that remain
- Balance: all caches should have the same load
- Spread: a page should be distributed to a bounded number of caches
- Load: no cache should have substantially more load than the average

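23 Monotony
- After adding or removing caches (buckets), no pages (items) should be moved between caches that are present in both views
- Formally: for all $V_1 \subseteq V_2 \subseteq B$: $f_{V_2}(i) \in V_1 \implies f_{V_1}(i) = f_{V_2}(i)$
[Figure: pages assigned to caches in View 1 and View 2]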

24 Balance
- For every view $V$, the function $f_V(i)$ is balanced
- For a constant $c$ and all $b \in V$: $\Pr[f_V(i) = b] \le \frac{c}{|V|}$
[Figure: pages assigned to caches in View 1 and View 2]

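25 Spread
- The spread $\sigma(i)$ of a page $i$ is the overall number of all necessary copies (over all views): $\sigma(i) = |\{ f_V(i) : V \text{ is a view} \}|$
[Figure: the caches holding page $i$ in View 1, View 2 and View 3]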

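26 Load
- The load $\lambda(b)$ of a cache $b$ is the overall number of all copies (over all views): $\lambda(b) = \left| \bigcup_V f_V^{-1}(b) \right|$
- where $f_V^{-1}(b)$ := set of all pages assigned to bucket $b$ in view $V$
[Figure: three views with caches $b_1$ and $b_2$; $\lambda(b_1) = 2$, $\lambda(b_2) = 3$]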
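Spread and load as defined on the last two slides can be computed directly once the assignment of pages to caches is known for every view. The sketch below is only an illustration: the data layout (one dictionary per view) and the example assignment are assumptions and do not reproduce the slide's figure.

```python
# One dictionary per view: page -> cache that the view assigns the page to.
Assignment = dict[str, str]


def spread(views: list[Assignment], page: str) -> int:
    """sigma(i): number of distinct caches that hold page i over all views."""
    return len({view[page] for view in views if page in view})


def load(views: list[Assignment], cache: str) -> int:
    """lambda(b): number of distinct pages assigned to cache b over all views."""
    return len({page for view in views
                for page, b in view.items() if b == cache})


# Hypothetical example with three views, two caches and three pages.
views = [
    {"p1": "b1", "p2": "b2"},
    {"p1": "b2", "p2": "b2", "p3": "b1"},
    {"p3": "b2"},
]
spread_p1 = spread(views, "p1")
load_b1 = load(views, "b1")
load_b2 = load(views, "b2")
```

In this example page `p1` is needed on both caches because two views disagree about where it belongs, which is exactly the effect that a bounded spread is meant to limit.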

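27 Distributed Hash Tables
- Theorem: There exists a family $F$ of hash functions with the following properties
  - Each function $f \in F$ is monotone
  - Balance: for every view $V$: $\Pr[f_V(i) = b] \le \frac{c}{|V|}$
  - Spread: for each page $i$: $\sigma(i) = O(t \log C)$ with probability $1 - C^{-\Omega(1)}$
  - Load: for each cache $b$: $\lambda(b) = O(t \log C)$ with probability $1 - C^{-\Omega(1)}$
- Parameters
  - $C$: number of caches (buckets)
  - $C/t$: minimum number of caches per view
  - $V/C$ = constant (#views / #caches)
  - $I = C$ (# pages = # caches)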

28 The Design
- 2 hash functions onto the reals [0,1]
  - $r_B$ maps $k \log C$ copies of cache $b$ randomly to [0,1]
  - $r_I$ maps web page $i$ randomly to the interval [0,1]
- $f_V(i)$ := cache $b \in V$ which minimizes $|r_B(b) - r_I(i)|$
[Figure: caches (buckets) of two views and web pages (items) placed on the interval [0,1]]
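The design on this slide can be sketched in a few lines: every cache gets about $k \log C$ points on [0,1], every page gets one point, and a view assigns the page to the cache in the view with the nearest point. This is a minimal sketch under stated assumptions: SHA-256 stands in for the random maps, the logarithm is taken to base 2, $k = 3$ is an arbitrary choice, and all function names are hypothetical.

```python
import hashlib
import math


def unit_hash(key: str) -> float:
    """Map a string to [0, 1); SHA-256 stands in for a random function."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def cache_points(cache: str, total_caches: int, k: int = 3) -> list[float]:
    """Positions of the k * log2(C) copies of a cache on [0, 1] (assumed base 2)."""
    copies = max(1, math.ceil(k * math.log2(total_caches)))
    return [unit_hash(f"{cache}#{j}") for j in range(copies)]


def assign(page: str, view: list[str], total_caches: int, k: int = 3) -> str:
    """f_V(i): the cache in the view whose nearest copy is closest to the page."""
    x = unit_hash(f"page:{page}")
    return min(
        view,
        key=lambda b: min(abs(p - x) for p in cache_points(b, total_caches, k)),
    )


caches = [f"c{j}" for j in range(8)]
small_view = caches[:4]   # a client that only knows four caches
large_view = caches       # a client that knows all eight
moved_between_old = [
    page for page in map(str, range(200))
    if assign(page, large_view, 8) in small_view
    and assign(page, small_view, 8) != assign(page, large_view, 8)
]
```

The list `moved_between_old` stays empty: if the nearest cache in the larger view already belongs to the smaller view, it is also the nearest one there, which is the monotony property. A practical implementation would precompute and sort the cache points instead of rehashing them on every lookup.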

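29 1. Monotony
- $f_V(i)$ := cache $b \in V$ which minimizes $|r_B(b) - r_I(i)|$
- For all $V_1 \subseteq V_2 \subseteq B$: $f_{V_2}(i) \in V_1 \implies f_{V_1}(i) = f_{V_2}(i)$
- Observe: the blue interval is empty in $V_2$ and therefore also in $V_1$
[Figure: two views on the interval [0,1] with the blue interval between the page and its nearest cache]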

30 2. Balance
- Balance: for all views $V$: $\Pr[f_V(i) = b] \le \frac{c}{|V|}$
- Choose a fixed view and a web page $i$
- Apply the hash functions $r_B$ and $r_I$
- Under the assumption that the mapping is random, every cache is chosen with the same probability
[Figure: caches (buckets) of a view and web pages (items) on the interval [0,1]]

31 3. Spread
- $\sigma(i)$ = number of all necessary copies (over all views)
- Parameters
  - $C$: number of caches (buckets)
  - $C/t$: minimum number of caches per view
  - $V/C$ = constant (#views / #caches)
  - $I = C$ (# pages = # caches)
- Every user knows at least a fraction $1/t$ of the caches
- For every page $i$: $\sigma(i) = O(t \log C)$ with probability $1 - C^{-\Omega(1)}$
- Proof sketch
  - Every view has a cache in an interval of length $t/C$ (with high probability)
  - The number of caches in this interval gives an upper bound for the spread
[Figure: the interval [0,1] divided at 0, $t/C$, $2t/C$, ..., 1]
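The two steps of the proof sketch can be made slightly more explicit. The following is a hedged reconstruction, not the lecture's own calculation: it assumes that every cache has $k \ln C$ independent, uniformly random points on [0,1] (natural logarithm), that every view $V$ contains at least $C/t$ caches with $t \le C$, that the number of views is $\rho C$ for a constant $\rho$, and it ignores boundary effects at 0 and 1. Let $J$ be the interval of length $t/C$ centered at the point of page $i$.

```latex
\begin{align*}
\Pr[\text{no point of view } V \text{ lies in } J]
  &\le \left(1 - \frac{t}{C}\right)^{\frac{C}{t} \cdot k \ln C}
   \le e^{-k \ln C} = C^{-k}, \\
\Pr[\text{some view has no point in } J]
  &\le \rho C \cdot C^{-k} = \rho\, C^{1-k}, \\
\mathbb{E}[\#\text{cache points in } J]
  &= C \cdot k \ln C \cdot \frac{t}{C} = t\, k \ln C .
\end{align*}
```

If every view has a point in $J$, then the point nearest to page $i$ also lies in $J$, so only caches with a point in $J$ can ever receive the page. A Chernoff bound on the number of such points then gives $\sigma(i) = O(t \log C)$ with high probability for a sufficiently large constant $k$.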

32 4. Load
- Load: $\lambda(b)$ = number of copies over all views, $\lambda(b) = \left| \bigcup_V f_V^{-1}(b) \right|$
  - where $f_V^{-1}(b)$ := set of pages assigned to bucket $b$ under view $V$
- For every cache $b$ we observe $\lambda(b) = O(t \log C)$ with probability $1 - C^{-\Omega(1)}$
- Proof sketch
  - Consider intervals of length $t/C$
  - With high probability a cache of every view falls into one of these intervals
  - The number of items in the interval gives an upper bound for the load
[Figure: the interval [0,1] divided at 0, $t/C$, $2t/C$, ..., 1]

33 Summary
- A Distributed Hash Table
  - is a distributed data structure for virtualization
  - with fair balance
  - provides dynamic behavior
- Standard data structure for dynamic distributed storage


Algorithms and Methods for Distributed Storage Networks 8 Storage Virtualization and DHT Christian Schindelhauer

Algorithms and Methods for Distributed Storage Networks 8 Storage Virtualization and DHT Christian Schindelhauer Algorithms and Methods for Distributed Storage Networks 8 Storage Virtualization and DHT Institut für Informatik Wintersemester 2007/08 Overview Concept of Virtualization Storage Area Networks Principles

More information

Christian Schindelhauer Technical Faculty Computer-Networks and Telematics University of Freiburg

Christian Schindelhauer Technical Faculty Computer-Networks and Telematics University of Freiburg DAAD Summerschool Curitiba 2011 Aspects of Large Scale High Speed Computing Building Blocks of a Cloud Storage Networks 2: Virtualization of Storage: RAID, SAN and Virtualization Christian Schindelhauer

More information

The Google File System

The Google File System The Google File System By Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Presented at SOSP 2003) Introduction Google search engine. Applications process lots of data. Need good file system. Solution:

More information

The Google File System

The Google File System The Google File System Motivations of NFS NFS (Network File System) Allow to access files in other systems as local files Actually a network protocol (initially only one server) Simple and fast server

More information

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 5: GFS & HDFS!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

RAID. Tiffany Yu-Han Chen. # The performance of different RAID levels # read/write/reliability (fault-tolerant)/overhead

RAID. Tiffany Yu-Han Chen. # The performance of different RAID levels # read/write/reliability (fault-tolerant)/overhead RAID # The performance of different RAID levels # read/write/reliability (fault-tolerant)/overhead Tiffany Yu-Han Chen (These slides modified from Hao-Hua Chu National Taiwan University) RAID 0 - Striping

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Mauro Fruet University of Trento - Italy 2011/12/19 Mauro Fruet (UniTN) Distributed File Systems 2011/12/19 1 / 39 Outline 1 Distributed File Systems 2 The Google File System (GFS)

More information

Massive Data Storage

Massive Data Storage Massive Data Storage Storage on the "Cloud" and the Google File System paper by: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung presentation by: Joshua Michalczak COP 4810 - Topics in Computer Science

More information

August 2009. Transforming your Information Infrastructure with IBM s Storage Cloud Solution

August 2009. Transforming your Information Infrastructure with IBM s Storage Cloud Solution August 2009 Transforming your Information Infrastructure with IBM s Storage Cloud Solution Page 2 Table of Contents Executive summary... 3 Introduction... 4 A Story or three for inspiration... 6 Oops,

More information

IBM Global Technology Services November 2009. Successfully implementing a private storage cloud to help reduce total cost of ownership

IBM Global Technology Services November 2009. Successfully implementing a private storage cloud to help reduce total cost of ownership IBM Global Technology Services November 2009 Successfully implementing a private storage cloud to help reduce total cost of ownership Page 2 Contents 2 Executive summary 3 What is a storage cloud? 3 A

More information

Storage Systems Autumn 2009. Chapter 6: Distributed Hash Tables and their Applications André Brinkmann

Storage Systems Autumn 2009. Chapter 6: Distributed Hash Tables and their Applications André Brinkmann Storage Systems Autumn 2009 Chapter 6: Distributed Hash Tables and their Applications André Brinkmann Scaling RAID architectures Using traditional RAID architecture does not scale Adding news disk implies

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

Google File System. Web and scalability

Google File System. Web and scalability Google File System Web and scalability The web: - How big is the Web right now? No one knows. - Number of pages that are crawled: o 100,000 pages in 1994 o 8 million pages in 2005 - Crawlable pages might

More information

Moving Virtual Storage to the Cloud

Moving Virtual Storage to the Cloud Moving Virtual Storage to the Cloud White Paper Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage www.parallels.com Table of Contents Overview... 3 Understanding the Storage

More information

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

1. Comments on reviews a. Need to avoid just summarizing web page asks you for: 1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of

More information

Network Attached Storage. Jinfeng Yang Oct/19/2015

Network Attached Storage. Jinfeng Yang Oct/19/2015 Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability

More information

Moving Virtual Storage to the Cloud. Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage

Moving Virtual Storage to the Cloud. Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage Moving Virtual Storage to the Cloud Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage Table of Contents Overview... 1 Understanding the Storage Problem... 1 What Makes

More information

Quantum StorNext. Product Brief: Distributed LAN Client

Quantum StorNext. Product Brief: Distributed LAN Client Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without

More information

V:Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System

V:Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System V:Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System André Brinkmann, Michael Heidebuer, Friedhelm Meyer auf der Heide, Ulrich Rückert, Kay Salzwedel, and Mario Vodisek Paderborn

More information

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra

More information

Jeffrey D. Ullman slides. MapReduce for data intensive computing

Jeffrey D. Ullman slides. MapReduce for data intensive computing Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very

More information

Big Table A Distributed Storage System For Data

Big Table A Distributed Storage System For Data Big Table A Distributed Storage System For Data OSDI 2006 Fay Chang, Jeffrey Dean, Sanjay Ghemawat et.al. Presented by Rahul Malviya Why BigTable? Lots of (semi-)structured data at Google - - URLs: Contents,

More information

How To Back Up A Computer To A Backup On A Hard Drive On A Microsoft Macbook (Or Ipad) With A Backup From A Flash Drive To A Flash Memory (Or A Flash) On A Flash (Or Macbook) On

How To Back Up A Computer To A Backup On A Hard Drive On A Microsoft Macbook (Or Ipad) With A Backup From A Flash Drive To A Flash Memory (Or A Flash) On A Flash (Or Macbook) On Solutions with Open-E Data Storage Software (DSS V6) Software Version: DSS ver. 6.00 up40 Presentation updated: September 2010 Different s opportunities using Open-E DSS The storage market is still growing

More information

Storage Networking Overview

Storage Networking Overview Networking Overview iscsi Attached LAN Networking SAN NAS Gateway NAS Attached SAN Attached IBM Total Module Flow Business Challenges Networking Trends and Directions What is Networking? Technological

More information

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance. Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance

More information

Big Data Processing in the Cloud. Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center

Big Data Processing in the Cloud. Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center Big Data Processing in the Cloud Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center Data is ONLY as useful as the decisions it enables 2 Data is ONLY as useful as the decisions it enables

More information

Scala Storage Scale-Out Clustered Storage White Paper

Scala Storage Scale-Out Clustered Storage White Paper White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current

More information

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

The Hadoop Distributed File System

The Hadoop Distributed File System The Hadoop Distributed File System The Hadoop Distributed File System, Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, Yahoo, 2010 Agenda Topic 1: Introduction Topic 2: Architecture

More information

Ultimate Guide to Oracle Storage

Ultimate Guide to Oracle Storage Ultimate Guide to Oracle Storage Presented by George Trujillo George.Trujillo@trubix.com George Trujillo Twenty two years IT experience with 19 years Oracle experience. Advanced database solutions such

More information

Web Email DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing)

Web Email DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing) 1 1 Distributed Systems What are distributed systems? How would you characterize them? Components of the system are located at networked computers Cooperate to provide some service No shared memory Communication

More information

Distributed File Systems

Distributed File Systems Distributed File Systems File Characteristics From Andrew File System work: most files are small transfer files rather than disk blocks? reading more common than writing most access is sequential most

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Alemnew Sheferaw Asrese University of Trento - Italy December 12, 2012 Acknowledgement: Mauro Fruet Alemnew S. Asrese (UniTN) Distributed File Systems 2012/12/12 1 / 55 Outline

More information

Large Scale Storage. Orlando Richards, Information Services orlando.richards@ed.ac.uk. LCFG Users Day, University of Edinburgh 18 th January 2013

Large Scale Storage. Orlando Richards, Information Services orlando.richards@ed.ac.uk. LCFG Users Day, University of Edinburgh 18 th January 2013 Large Scale Storage Orlando Richards, Information Services orlando.richards@ed.ac.uk LCFG Users Day, University of Edinburgh 18 th January 2013 Overview My history of storage services What is (and is not)

More information

POWER ALL GLOBAL FILE SYSTEM (PGFS)

POWER ALL GLOBAL FILE SYSTEM (PGFS) POWER ALL GLOBAL FILE SYSTEM (PGFS) Defining next generation of global storage grid Power All Networks Ltd. Technical Whitepaper April 2008, version 1.01 Table of Content 1. Introduction.. 3 2. Paradigm

More information

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System CS341: Operating System Lect 36: 1 st Nov 2014 Dr. A. Sahu Dept of Comp. Sc. & Engg. Indian Institute of Technology Guwahati File System & Device Drive Mass Storage Disk Structure Disk Arm Scheduling RAID

More information

MASSIVE DATA PROCESSING (THE GOOGLE WAY ) 27/04/2015. Fundamentals of Distributed Systems. Inside Google circa 2015

MASSIVE DATA PROCESSING (THE GOOGLE WAY ) 27/04/2015. Fundamentals of Distributed Systems. Inside Google circa 2015 7/04/05 Fundamentals of Distributed Systems CC5- PROCESAMIENTO MASIVO DE DATOS OTOÑO 05 Lecture 4: DFS & MapReduce I Aidan Hogan aidhog@gmail.com Inside Google circa 997/98 MASSIVE DATA PROCESSING (THE

More information

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server

More information

Ceph. A file system a little bit different. Udo Seidel

Ceph. A file system a little bit different. Udo Seidel Ceph A file system a little bit different Udo Seidel Ceph what? So-called parallel distributed cluster file system Started as part of PhD studies at UCSC Public announcement in 2006 at 7 th OSDI File system

More information

Outline. Failure Types

Outline. Failure Types Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 11 1 2 Conclusion Acknowledgements: The slides are provided by Nikolaus Augsten

More information

AIX NFS Client Performance Improvements for Databases on NAS

AIX NFS Client Performance Improvements for Databases on NAS AIX NFS Client Performance Improvements for Databases on NAS October 20, 2005 Sanjay Gulabani Sr. Performance Engineer Network Appliance, Inc. gulabani@netapp.com Diane Flemming Advisory Software Engineer

More information

PARALLELS CLOUD STORAGE

PARALLELS CLOUD STORAGE PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...

More information

Sunita Suralkar, Ashwini Mujumdar, Gayatri Masiwal, Manasi Kulkarni Department of Computer Technology, Veermata Jijabai Technological Institute

Sunita Suralkar, Ashwini Mujumdar, Gayatri Masiwal, Manasi Kulkarni Department of Computer Technology, Veermata Jijabai Technological Institute Review of Distributed File Systems: Case Studies Sunita Suralkar, Ashwini Mujumdar, Gayatri Masiwal, Manasi Kulkarni Department of Computer Technology, Veermata Jijabai Technological Institute Abstract

More information

Distributed Data Stores

Distributed Data Stores Distributed Data Stores 1 Distributed Persistent State MapReduce addresses distributed processing of aggregation-based queries Persistent state across a large number of machines? Distributed DBMS High

More information
