Storage Systems Autumn Chapter 6: Distributed Hash Tables and their Applications André Brinkmann
1 Storage Systems Autumn 2009 Chapter 6: Distributed Hash Tables and their Applications André Brinkmann
2 Scaling RAID architectures
- Traditional RAID architectures do not scale
- Adding new disks implies reorganizing all stored data
- Re-striping requires moving all data blocks
- The time t_striping for the re-layout grows linearly in the stored capacity
- Trend: t_striping = k * C_old, where k is a constant and C_old is the already stored capacity
- The newly integrated capacity C_new is always smaller than C_old
3 How expensive is re-striping?
Assumptions:
- 36 GByte of data can be redistributed per hour
- 100 GByte of new capacity C_new has to be added
- The already existing capacity C_old is between 100 GByte and 1 PByte
[Figure: re-striping time (hours) vs. existing capacity (TBytes)]
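The assumptions above can be turned into a small worked calculation. The sketch below assumes that re-striping moves the complete existing capacity C_old at the stated rate of 36 GByte per hour (so k = 1/36 hours per GByte) and that 1 PByte equals 1,000,000 GByte; the function name is illustrative only.

```python
# Worked example for t_striping = k * C_old.
# Assumptions: the whole existing capacity is moved, at 36 GByte per hour
# (rate taken from the slide), and decimal units (1 PByte = 1,000,000 GByte).
RATE_GB_PER_HOUR = 36.0


def restriping_hours(c_old_gb: float, rate_gb_per_hour: float = RATE_GB_PER_HOUR) -> float:
    """Return the re-striping time in hours for an existing capacity in GByte."""
    return c_old_gb / rate_gb_per_hour


if __name__ == "__main__":
    for c_old_gb in (100, 1_000, 10_000, 100_000, 1_000_000):
        hours = restriping_hours(c_old_gb)
        print(f"C_old = {c_old_gb:>9} GByte -> {hours:10.1f} hours ({hours / 24:8.1f} days)")
```

Under these assumptions, integrating the same 100 GByte of new capacity costs under three hours for a 100 GByte system but more than three years of data movement for a 1 PByte system.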
4 Introduction: Randomization
- Deterministic data placement schemes have long suffered from several drawbacks:
  - Heterogeneity has been an issue
  - Adapting to new storage systems has been costly
  - Storage-on-demand concepts are difficult to support
- Is there an alternative to deterministic schemes?
- Yes: randomization can help to overcome these drawbacks, but it may introduce new challenges!
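5 Balls into Bins Games II
Basic results:
- Assign n balls to n bins
- For every ball, choose one bin independently and uniformly at random
- The maximum load L is sharply concentrated: $L = \Theta(\ln n / \ln \ln n)$ w.h.p., where w.h.p. abbreviates "with high probability", i.e. with probability at least $1 - n^{-\alpha}$ for any fixed $\alpha \geq 1$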
6 Balls into Bins Games I
Basic task of balls-into-bins games:
- Assign a set of m balls to n bins
Motivation:
- Bins = hard disks
- Balls = data items
- L = maximum number of data items on any disk
- Where should I place the next item?
- Idea: just take a random position!
7 Balls into Bins Games III
This sounds terrible:
- The most heavily loaded hard disk stores $\Theta(\ln n / \ln \ln n)$ times more data than the average
- This seems not to be scalable, or ...
- ... the model assumes that only very few data items are stored in the environment, whereas each disk is able to store many objects
- Let's assume that "many objects" means $m \geq n \ln n$. Perfect! Then it holds w.h.p. that $L = m/n + \Theta(\sqrt{(m/n) \ln n})$, where the second term is an additional offset
See, e.g., M. Raab, A. Steger: Balls into Bins - A Simple and Tight Analysis
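The effect described on the balls-into-bins slides can be observed with a small simulation. This is a minimal sketch, not part of the lecture: the bin count, the ball counts and the fixed seed are arbitrary choices made for illustration.

```python
import random


def max_load(m: int, n: int, rng: random.Random) -> int:
    """Throw m balls into n bins uniformly at random and return the fullest bin's load."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return max(loads)


rng = random.Random(42)  # fixed seed so that runs are repeatable
n = 1000                 # number of disks (bins), chosen for illustration

light = max_load(n, n, rng)        # few items: m = n, average load 1
heavy = max_load(100 * n, n, rng)  # many items: m = 100 * n, average load 100

print(f"m = n:     max load {light}, i.e. {light / 1:.2f} times the average")
print(f"m = 100 n: max load {heavy}, i.e. {heavy / 100:.2f} times the average")
```

With few items per disk the fullest disk holds a multiple of the average, while with many items per disk the relative deviation from the average becomes small.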
8 Distributed Hash Tables
- Randomization introduces some (well-known) challenges
- The key questions are the key tasks of Distributed Hash Tables (DHTs):
  - How can we retrieve a stored data item?
  - How can we adapt to a changing number of disks?
  - How can we handle heterogeneity?
  - How can we support redundancy?
9 Consistent Hashing I
- Introduced in the context of Web caching
- Bins are mapped by a pseudo-random hash function h onto a ring (of length 1)
- Bins become responsible for their interval
- Balls are mapped by an additional hash function g onto the ring
- Each bin stores the balls in its interval
[Figure: bins 1 to 6 placed on the ring]
See D. Karger, E. Lehman et al.: Consistent Hashing and Random Trees: Tools for Relieving Hot Spots on the World Wide Web
10 Consistent Hashing II
- The average load of each bin is $m/n$, but the deviation from the average can be high: the maximum arc length on the ring becomes $\Theta(\log n / n)$ w.h.p.
- Solution: each bin is mapped by a set of independent hash functions to multiple points on the ring
- The maximum arc length assigned to a bin can be reduced to $(1 + \varepsilon) \cdot 1/n$ for an arbitrarily small constant $\varepsilon > 0$ if $\Theta(\log n)$ virtual bins are used for each physical bin
See I. Stoica, R. Morris, et al.: Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications.
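The ring construction and the effect of virtual bins can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the scheme of any particular system: SHA-256 stands in for the pseudo-random hash functions, a ball is stored on the first bin point clockwise from its position (the slides do not fix the direction), and the class and function names are hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(label: str) -> float:
    """Map a label to a pseudo-random point in [0, 1), using 53 bits of SHA-256."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class ConsistentHashRing:
    """Ring of length 1; each physical bin owns `virtual_bins` points on it."""

    def __init__(self, bins, virtual_bins=1):
        self.virtual_bins = virtual_bins
        self._points = []  # sorted list of (position, bin name) pairs
        for b in bins:
            self.add_bin(b)

    def add_bin(self, b: str) -> None:
        for v in range(self.virtual_bins):
            bisect.insort(self._points, (ring_position(f"bin:{b}:{v}"), b))

    def remove_bin(self, b: str) -> None:
        self._points = [(p, owner) for p, owner in self._points if owner != b]

    def lookup(self, ball: str) -> str:
        """Return the bin owning the first point at or after the ball's position."""
        pos = ring_position(f"ball:{ball}")
        idx = bisect.bisect_left(self._points, (pos,))
        return self._points[idx % len(self._points)][1]  # wrap around the ring


def load_spread(virtual_bins: int, n_bins: int = 20, n_balls: int = 20000) -> float:
    """Ratio of the maximum bin load to the average bin load."""
    ring = ConsistentHashRing([f"disk{i}" for i in range(n_bins)], virtual_bins)
    counts = Counter(ring.lookup(f"item{i}") for i in range(n_balls))
    return max(counts.values()) / (n_balls / n_bins)


if __name__ == "__main__":
    for v in (1, 10, 100):
        print(f"{v:>3} virtual bins per disk: max load / average load = {load_spread(v):.2f}")
```

Removing a bin only deletes its own points, so every ball that was stored elsewhere keeps its location; only the balls of the removed bin move to ring neighbors.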
11 Join and Leave Operations I
- In a dynamic network, nodes can join and leave at any time
- The main goal of a DHT is to be able to locate every key in the network at (nearly) any time
- The (planned) removal of a bin changes only the length of its neighbor's interval; its data has to be moved to that neighbor
- The insertion of a bin likewise changes only the interval length of its new neighbor
12 Join and Leave Operations II
- Definition of a view V: a view V is the set of bins of which a particular client is aware
- Monotonicity: a ranged hash function f is monotone if, for all views $V_1 \subseteq V_2$, $f_{V_2}(b) \in V_1$ implies $f_{V_1}(b) = f_{V_2}(b)$
- Monotonicity implies that when a bin i joins, all moved data items have destination i
- Consistent Hashing is monotone
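The monotonicity property can be checked directly on a small example. The sketch below models a ranged hash function for Consistent Hashing with one point per bin; SHA-256, the successor rule and all names are assumptions made for illustration.

```python
import hashlib


def point(label: str) -> float:
    """Pseudo-random position in [0, 1) derived from SHA-256."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def f(view, ball: str) -> str:
    """Ranged hash function f_V(b): the bin in view V whose point follows the ball clockwise."""
    pos = point("ball:" + ball)
    # clockwise distance from the ball to each bin point; sorted() makes ties deterministic
    return min(sorted(view), key=lambda b: (point("bin:" + b) - pos) % 1.0)


balls = [f"x{i}" for i in range(2000)]
v1 = {"d1", "d2", "d3", "d4"}
v2 = v1 | {"d5"}  # bin d5 joins, so V1 is a subset of V2

moved = [b for b in balls if f(v1, b) != f(v2, b)]
print(f"{len(moved)} of {len(balls)} balls moved, all of them to d5:",
      all(f(v2, b) == "d5" for b in moved))
```

Every ball whose location under the larger view is an old bin keeps exactly that location under the smaller view, which is the defining condition of monotonicity.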
13 Heterogeneous Bins
- Consistent Hashing is (nearly) optimally suited for homogeneous environments, in which all bins (disks) have the same capacity and performance
- Heterogeneous bins can be mapped to Consistent Hashing by using a different number of virtual bins for each physical bin
- However, the ratio between the numbers of virtual bins of the different physical bins changes constantly
- Monotonicity (and some other properties) cannot be maintained
14 Why is heterogeneity an issue?
- Definition: a heterogeneous set of disks is a set of disks with different performance and capacity characteristics
- Such sets are becoming a common configuration:
  - Replacing an old disk
  - Adding new disks
  - Clusters built from already existing (heterogeneous) components
15 Traditional solution
- Many systems just ignore heterogeneity: all disks are treated as equal
  - The usable size of every disk is that of the smallest one
  - The performance of every disk is assumed to be that of the slowest one
- Implications:
  - No performance gain is obtained (except for some implicit side effects)
  - Not all of the potential capacity gain is obtained
  - Some systems use the unused disk space to build a virtual disk
16 Disk capacity
[Figure from R. Yellin: The Data Storage Evolution. Has disk capacity outgrown its usefulness? Teradata Magazine, 2006]
17 Disk performance
[Figure from R. Yellin: The Data Storage Evolution. Has disk capacity outgrown its usefulness? Teradata Magazine, 2006]
18 Capacity vs. performance
[Figure from R. Yellin: The Data Storage Evolution. Has disk capacity outgrown its usefulness? Teradata Magazine, 2006]
19 Growth of storage needs
- Information point of view: increase of 30% each year (P. Lyman, H. R. Varian: How Much Information? 2003, School of Information Management and Systems, University of California at Berkeley)
- Manufacturers' point of view: capacity increases by 50% each year (drive manufacturers; R. Yellin: The Data Storage Evolution. Has disk capacity outgrown its usefulness? Teradata Magazine, 2006)
20 Share Strategy I
- The Share strategy tries to map the heterogeneous problem to a homogeneous solution
- Each bin d is assigned by a hash function g to a start point g(d) inside the [0,1) interval
- The length l(c_d) of the interval is proportional to the capacity c_d (or performance, or another metric) of bin d
[Figure: interval of length l(c_d) starting at g(d) on the [0,1) line]
See A. Brinkmann, K. Salzwedel, C. Scheideler: Compact, adaptive placement schemes for non-uniform distribution requirements.
21 Share Strategy II
- How can the location of a data item x be retrieved in this heterogeneous setting?
- Use a hash function h to map x to the [0,1) interval
- Use a DHT for homogeneous bins to retrieve the location of x from all intervals cutting h(x)
[Figure: position h(x) of item x on the [0,1) line]
22 Share Strategy III
Properties:
- (Arbitrarily close to) optimal distribution of balls over bins
- Computational complexity in O(1)
- The competitive ratio concerning join and leave is $(1 + \varepsilon)$ for arbitrary $\varepsilon > 0$
But:
- Share has been optimized for usage in data center environments
- Share is not monotone and only partially suited for P2P networks
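The two Share steps (capacity-proportional intervals, then a homogeneous strategy among the bins whose intervals cut h(x)) can be sketched as follows. This is a simplified illustration, not the published algorithm: the `stretch` factor that scales the interval lengths, the cap of each interval at the full ring, the fallback to all bins when no interval covers h(x), and the use of rendezvous hashing as the homogeneous strategy are all assumptions, as are the function names.

```python
import hashlib
from collections import Counter


def unit_hash(label: str) -> float:
    """Pseudo-random value in [0, 1) derived from SHA-256."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def covers(start: float, length: float, p: float) -> bool:
    """True if the ring interval [start, start + length) contains p (length <= 1)."""
    return (p - start) % 1.0 < length


def share_lookup(item: str, capacities: dict, stretch: float) -> str:
    """Return the bin for `item`; interval length is stretch * relative capacity."""
    total = sum(capacities.values())
    p = unit_hash("item:" + item)  # h(x)
    candidates = [
        d for d, c in sorted(capacities.items())
        if covers(unit_hash("bin:" + d), min(1.0, stretch * c / total), p)  # g(d), l(c_d)
    ]
    if not candidates:  # assumption: fall back to all bins if no interval cuts h(x)
        candidates = sorted(capacities)
    # homogeneous placement among the candidates (rendezvous hashing as a stand-in)
    return max(candidates, key=lambda d: unit_hash(f"pair:{d}:{item}"))


if __name__ == "__main__":
    # ten big disks (capacity 4) and ten small disks (capacity 1), chosen for illustration
    caps = {f"big{i}": 4 for i in range(10)}
    caps.update({f"small{i}": 1 for i in range(10)})
    counts = Counter(share_lookup(f"block{i}", caps, stretch=5.0) for i in range(5000))
    big = sum(n for d, n in counts.items() if d.startswith("big"))
    print(f"big disks hold {big / 5000:.0%} of the blocks (capacity share: 80%)")
```

With few bins or a small stretch factor the distribution deviates noticeably from the capacity shares, which is why the sketch should be read as an illustration of the lookup path rather than of Share's fairness guarantees.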
23 V:Drive
- V:Drive is an out-of-band virtualization environment
- Each (Linux) server includes an additional block-level driver module
- A metadata appliance (MDA) ensures a consistent view of storage and servers
- The Share strategy is used as the data distribution strategy
[Figure: V:Drive architecture with SAN and MDA]
See A. Brinkmann, S. Effert, et al.: Influence of Adaptive Data Layouts on Performance in dynamically changing Storage Environments
24 Performance V:Drive - Static
[Figure: throughput (MB/s) and average latency (ms) for physical volumes, VDrive and LVM; synthetic random I/O benchmark, static configuration]
25 Performance V:Drive - Dynamic
[Figure: throughput (MB/s) and average latency (ms) for physical volumes, VDrive and LVM; synthetic random I/O benchmark, dynamic configuration]
26 V:Drive - Reconfiguration Overhead
[Figure: throughput (MByte/s) and average latency (ms) over time (minutes)]
27 Randomization and Redundancy
- Randomized data distribution schemes do not include mechanisms to protect data against disk failures
- Question: how can randomization and RAID schemes be used together?
- Assumption: n copies of a data block have to be distributed over n disks
- No two copies of a data block may be stored on the same disk
28 Trivial Solutions
- Trivial Solution I:
  - Divide the storage system into n storage pools
  - Distribute the first copies over the first pool, ..., the n-th copies over the n-th pool
  - Drawback: missing flexibility
- Trivial Solution II:
  - The first copy is distributed over all disks
  - The second copy is distributed over all but the previously chosen disk, ...
  - Drawback: not able to use capacity efficiently
[Figure: placement of first copy and second copy]
29 Observation
- Trivial Solution II is not able to use capacity efficiently, because big storage devices are penalized compared to smaller devices
- Theorem: Assume a trivial replication strategy that has to distribute k copies of m balls over n > k bins. Furthermore, let the capacity $c_{max}$ of the biggest bin be at least $(1 + \varepsilon) c_j$, where j is the next biggest bin. Then the expected load of the biggest bin is smaller than the expected load required for optimal capacity efficiency.
See A. Brinkmann, S. Effert, et al.: Dynamic and Redundant Data Placement, ICDCS 2007
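The theorem can be illustrated with an exact calculation for k = 2. The sketch below assumes that Trivial Solution II picks the first copy with probability proportional to capacity and the second copy proportionally among the remaining bins; the example capacities and the function names are illustrative.

```python
from fractions import Fraction


def trivial_two_copy_load(capacities):
    """Expected number of copies per ball on each bin under Trivial Solution II (k = 2)."""
    total = sum(capacities)
    share = [Fraction(c, total) for c in capacities]
    load = list(share)  # first copy: chosen proportionally to capacity
    for first, p_first in enumerate(share):
        rest = 1 - p_first  # capacity share of the bins still allowed for the second copy
        for second, p_second in enumerate(share):
            if second != first:
                load[second] += p_first * p_second / rest
    return load


def fair_two_copy_load(capacities):
    """Expected copies per ball on each bin if capacity were used perfectly (k = 2)."""
    total = sum(capacities)
    return [Fraction(2 * c, total) for c in capacities]


if __name__ == "__main__":
    capacities = [2, 1, 1]  # the biggest bin has twice the capacity of the others
    for c, got, fair in zip(capacities,
                            trivial_two_copy_load(capacities),
                            fair_two_copy_load(capacities)):
        print(f"capacity {c}: trivial strategy {got}, fair share {fair}")
```

For capacities 2, 1, 1 the biggest bin receives 5/6 copies per ball instead of the fair share of 1, while each small bin receives 7/12 instead of 1/2, so the small bins fill up first.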
30 Idea
- The algorithm has to ensure that bigger bins get data items according to their capacities
- This can be ensured by an algorithm that iterates over a sorted list of bins:
  1. At each iteration, the algorithm randomly decides whether or not to place the ball
  2. Once one of the k copies of a ball has been placed, use the optimal strategy for (k-1) copies with the remaining bins as input
- Challenge: how to make the random decision in step 1 of each iteration
31 LinMirror
32 Example for Mirroring (k=2)
[Figure: example disks annotated with three per-disk quantities: the relative capacity of disk i to all disks, the relative capacity of disk i to all disks starting with index i, and the weight for the random decision]
33 Example for Mirroring (k=2)
- If, e.g., disk 2 is chosen for the first copy of a mirror, just distribute the second copy according to Share over disks 3, 4, and 5
- Some adaptation is necessary if disk 3 is chosen, because the weight of disk 4 is greater than 1
34 Observations
- LinMirror is 4-competitive concerning the insertion and deletion of a bin
- The strategy can easily be extended to arbitrary k
- Lower and upper bound is (k+1)/2 for homogeneous bins (can be improved to 1-competitive)
- Data distribution is optimal
- Redistribution of data in a dynamic environment is (ln n)-competitive for arbitrary k
- Computational complexity can be reduced to O(k)
35 Fairness of k-fold Replication
[Figure: usage in % per disk for different numbers of disks]
36 Adaptivity of k-fold Replication
[Figure: competitiveness vs. number of disks, for adding a disk as the biggest and as the smallest]
37 Metadata Management
- The assignment of data items to disks can be solved efficiently for random data distribution schemes:
  - Very good distribution of data and requests
  - Low computational complexity
  - Adaptivity to new infrastructures is optimal without redundancy and acceptable with redundancy
  - Over-provisioning can be integrated efficiently
- But how can the position of a data item on the disks be found?
  - This is equivalent to the dictionary problem
  - It requires O(n) entries to find the location of n objects!
  - These entries make up the bulk of the metadata
38 Dictionary Problem
Dictionary size by volume size (rows) and extent size (columns):

Volume   4 KB     16 KB    256 KB   4 MB     16 MB    256 MB    1 GB
1 GB     8 MB     2 MB     128 KB   8 KB     2 KB     128 Byte  32 Byte
64 GB    512 MB   128 MB   8 MB     512 KB   128 KB   8 KB      2 KB
1 TB     8 GB     2 GB     128 MB   8 MB     2 MB     128 KB    32 KB
64 TB    512 GB   128 GB   8 GB     512 MB   128 MB   8 MB      2 MB
1 PB     8 TB     2 TB     128 GB   8 GB     2 GB     128 MB    32 MB

- Extent: the smallest contiguous unit that can be addressed by the virtualization solution
- For small extent sizes, the dictionary easily becomes too big to be stored inside each server system
- Solutions:
  - Caching
  - Huge extent sizes
  - Object-based storage systems
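The table entries follow from a simple calculation: the number of extents in the volume times the size of one dictionary entry. The sketch below assumes 32 Byte per entry, a value inferred from the table itself (a 1 GB volume with 1 GB extents needs 32 Byte), and binary units; the helper names are illustrative.

```python
ENTRY_BYTES = 32  # assumption: inferred from the table, not stated on the slide
UNITS = {"KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40, "PB": 2**50}


def parse_size(text: str) -> int:
    """Convert a string such as '64 GB' into a number of bytes."""
    value, unit = text.split()
    return int(value) * UNITS[unit]


def dictionary_bytes(volume: int, extent: int, entry_bytes: int = ENTRY_BYTES) -> int:
    """Dictionary size: one entry per extent of the volume."""
    return (volume // extent) * entry_bytes


def human(n: int) -> str:
    """Format a byte count with the largest unit that fits."""
    for unit in ("PB", "TB", "GB", "MB", "KB"):
        if n >= UNITS[unit]:
            return f"{n // UNITS[unit]} {unit}"
    return f"{n} Byte"


if __name__ == "__main__":
    extents = ["4 KB", "16 KB", "256 KB", "4 MB", "16 MB", "256 MB", "1 GB"]
    for volume in ["1 GB", "64 GB", "1 TB", "64 TB", "1 PB"]:
        row = [human(dictionary_bytes(parse_size(volume), parse_size(e))) for e in extents]
        print(f"{volume:>6}: " + ", ".join(row))
```

The calculation makes the trade-off explicit: each factor by which the extent size grows shrinks the dictionary by the same factor.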
39 Key-Value Storage
- To meet reliability and scaling needs, Amazon has developed a number of storage technologies, e.g. the Amazon Simple Storage Service (S3)
- Many services on Amazon's platform only need primary-key access to a data store: best-seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog
- Key-value stores provide a simple primary-key-only interface to meet the requirements of these applications
See DeCandia et al.: Dynamo: Amazon's Highly Available Key-value Store
40 Dynamo
- Dynamo uses a synthesis of well-known techniques to achieve scalability and availability:
  - Data is partitioned and replicated using consistent hashing
  - Consistency is facilitated by object versioning
  - Consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol
  - Gossip-based distributed failure detection and membership protocol
- Dynamo is a completely decentralized system with minimal need for manual administration
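Two of the techniques listed for Dynamo can be sketched briefly: replication along the consistent hashing ring and the quorum condition. The code is a simplified illustration based on the general description in the Dynamo paper (each key is stored on N distinct nodes found by walking the ring clockwise, and read and write quorums overlap if R + W > N); node names, the token count and all function names are hypothetical.

```python
import bisect
import hashlib


def ring_position(label: str) -> float:
    """Pseudo-random position in [0, 1) derived from SHA-256."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def build_ring(nodes, tokens_per_node: int):
    """Sorted (position, node) pairs; each node owns several points (virtual nodes)."""
    return sorted(
        (ring_position(f"node:{node}:{t}"), node)
        for node in nodes
        for t in range(tokens_per_node)
    )


def preference_list(ring, key: str, n: int):
    """Walk clockwise from the key's position and collect up to n distinct nodes."""
    start = bisect.bisect_left(ring, (ring_position("key:" + key),))
    replicas = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in replicas:  # skip further virtual nodes of an already chosen node
            replicas.append(node)
            if len(replicas) == n:
                break
    return replicas


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum of size r intersects every write quorum of size w."""
    return r + w > n


if __name__ == "__main__":
    ring = build_ring(["A", "B", "C", "D", "E"], tokens_per_node=16)
    print("replicas for cart:42 ->", preference_list(ring, "cart:42", n=3))
    print("N=3, R=2, W=2 overlap:", quorums_overlap(3, 2, 2))
    print("N=3, R=1, W=1 overlap:", quorums_overlap(3, 1, 1))
```

Skipping repeated physical nodes matters because, with virtual nodes, consecutive points on the ring may belong to the same machine, which would otherwise defeat the purpose of replication.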
41 Assumptions and Requirements
Query model:
- Simple read and write operations on data that is uniquely identified by a key
- State is stored as binary objects (i.e., blobs)
- No operations span multiple data items, and there is no need for a relational schema
42 Assumptions and Requirements
ACID properties:
- ACID stands for Atomicity, Consistency, Isolation, Durability
- Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor availability
- Dynamo targets applications that can operate with weaker consistency (the C in ACID) if this results in high availability
- Dynamo does not provide any isolation guarantees and permits only single-key updates
- The environment is assumed to be non-hostile
Graph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform
On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...
2009 Oracle Corporation 1
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,
How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)
WHITE PAPER Oracle NoSQL Database and SanDisk Offer Cost-Effective Extreme Performance for Big Data 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Abstract... 3 What Is Big Data?...
International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.
RESEARCH ARTICLE ISSN: 2321-7758 GLOBAL LOAD DISTRIBUTION USING SKIP GRAPH, BATON AND CHORD J.K.JEEVITHA, B.KARTHIKA* Information Technology,PSNA College of Engineering & Technology, Dindigul, India Article
The IntelliMagic White Paper on: Storage Performance Analysis for an IBM San Volume Controller (SVC) (IBM V7000)
The IntelliMagic White Paper on: Storage Performance Analysis for an IBM San Volume Controller (SVC) (IBM V7000) IntelliMagic, Inc. 558 Silicon Drive Ste 101 Southlake, Texas 76092 USA Tel: 214-432-7920
DB2 Database Layout and Configuration for SAP NetWeaver based Systems
IBM Software Group - IBM SAP DB2 Center of Excellence DB2 Database Layout and Configuration for SAP NetWeaver based Systems Helmut Tessarek DB2 Performance, IBM Toronto Lab IBM SAP DB2 Center of Excellence
Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems
215 IEEE International Conference on Big Data (Big Data) Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems Guoxin Liu and Haiying Shen and Haoyu Wang Department of Electrical
An Optimization Model of Load Balancing in P2P SIP Architecture
An Optimization Model of Load Balancing in P2P SIP Architecture 1 Kai Shuang, 2 Liying Chen *1, First Author, Corresponding Author Beijing University of Posts and Telecommunications, [email protected]
Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com
Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...
Evaluation of NoSQL databases for large-scale decentralized microblogging
Evaluation of NoSQL databases for large-scale decentralized microblogging Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Decentralized Systems - 2nd semester 2012/2013 Universitat Politècnica
Cray DVS: Data Virtualization Service
Cray : Data Virtualization Service Stephen Sugiyama and David Wallace, Cray Inc. ABSTRACT: Cray, the Cray Data Virtualization Service, is a new capability being added to the XT software environment with
Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015
Milestone Solution Partner IT Infrastructure MTP Certification Report Scality RING Software-Defined Storage 11-16-2015 Table of Contents Introduction... 4 Certified Products... 4 Key Findings... 5 Solution
DFSgc. Distributed File System for Multipurpose Grid Applications and Cloud Computing
DFSgc Distributed File System for Multipurpose Grid Applications and Cloud Computing Introduction to DFSgc. Motivation: Grid Computing currently needs support for managing huge quantities of storage. Lacks
Load Balancing in Structured Peer to Peer Systems
Load Balancing in Structured Peer to Peer Systems DR.K.P.KALIYAMURTHIE 1, D.PARAMESWARI 2 Professor and Head, Dept. of IT, Bharath University, Chennai-600 073 1 Asst. Prof. (SG), Dept. of Computer Applications,
Original-page small file oriented EXT3 file storage system
Original-page small file oriented EXT3 file storage system Zhang Weizhe, Hui He, Zhang Qizhen School of Computer Science and Technology, Harbin Institute of Technology, Harbin E-mail: [email protected]
Designing a Cloud Storage System
Designing a Cloud Storage System End to End Cloud Storage When designing a cloud storage system, there is value in decoupling the system s archival capacity (its ability to persistently store large volumes
