Lecture 5 Distributed Database and BigTable
1 Lecture 5 Distributed Database and BigTable
  922EU3870 Cloud Computing and Mobile Platforms, Autumn 2009 (2009/10/12)
  Ping Yeh (葉平), Google, Inc.
2 Numbers real world engineers should know
  L1 cache reference                                        0.5 ns
  Branch mispredict                                           5 ns
  L2 cache reference                                          7 ns
  Mutex lock/unlock                                         100 ns
  Main memory reference                                     100 ns
  Compress 1 KB with Zippy                               10,000 ns
  Send 2 KB through 1 Gbps network                       20,000 ns
  Read 1 MB sequentially from memory                    250,000 ns
  Round trip within the same data center                500,000 ns
  Disk seek                                          10,000,000 ns
  Read 1 MB sequentially from network                10,000,000 ns
  Read 1 MB sequentially from disk                   30,000,000 ns
  Round trip between California and Netherlands     150,000,000 ns
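These numbers are most useful for back-of-the-envelope estimates. The following is a minimal sketch that reuses the figures from the slide; the function names are made up for illustration, and real hardware will deviate from these round numbers.

```python
NS_PER_MS = 1_000_000

# Cost in nanoseconds of reading 1 MB sequentially, taken from the slide.
READ_1MB_NS = {"memory": 250_000, "network": 10_000_000, "disk": 30_000_000}
DISK_SEEK_NS = 10_000_000


def sequential_read_ms(megabytes, source):
    """Estimated time in milliseconds to read `megabytes` sequentially."""
    return megabytes * READ_1MB_NS[source] / NS_PER_MS


def seeks_ms(count):
    """Estimated time in milliseconds for `count` independent disk seeks."""
    return count * DISK_SEEK_NS / NS_PER_MS


if __name__ == "__main__":
    print("100 MB from disk:  ", sequential_read_ms(100, "disk"), "ms")
    print("100 MB from memory:", sequential_read_ms(100, "memory"), "ms")
    print("100 disk seeks:    ", seeks_ms(100), "ms")
```

Under these figures, 100 random seeks cost about as much as reading 33 MB sequentially from disk, which is one reason the later slides favor sequential, block-oriented layouts.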
3 The Joys of Real Hardware
  Typical first year for a new cluster:
    ~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
    ~1 PDU failure (~ machines suddenly disappear, ~6 hours to come back)
    ~1 rack-move (plenty of warning, ~ machines powered down, ~6 hours)
    ~1 network rewiring (rolling ~5% of machines down over 2-day span)
    ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
    ~5 racks go wonky (40-80 machines see 50% packet loss)
    ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
    ~12 router reloads (takes out DNS and external VIPs for a couple of minutes)
    ~3 router failures (have to immediately pull traffic for an hour)
    ~dozens of minor 30-second blips for DNS
    ~1000 individual machine failures
    ~thousands of hard drive failures
    slow disks, bad memory, misconfigured machines, flaky machines, etc.
4 Overview of Distributed Database
  Silberschatz, Korth, Sudarshan, Database System Concepts, 4th ed., McGraw Hill
  M. Tamer Özsu, DISTRIBUTED DATABASE SYSTEMS, doi= &rep=rep1&type=pdf
5 Glossary
  Transaction: a unit of consistent and atomic execution against the database.
  Termination protocol: a protocol by which individual sites can decide how to terminate a particular transaction when they cannot communicate with other sites where the transaction executes.
  Other terms: Concurrency control algorithm, Distributed DBMS, Locking, Logging protocol, One-copy equivalence, Query processing, Query optimization, Quorum-based voting algorithm, Read-once, write-all protocol, Serializability, Transparency, Two-phase commit, Two-phase locking
6 ACID properties of transactions
  Atomicity: either all the operations of a transaction are executed or none of them are (all-or-nothing).
  Consistency: the database is in a legal state before and after a transaction.
  Isolation: the effects of one transaction on the database are isolated from other transactions until the first completes its execution.
  Durability: the effects of successfully completed (i.e., committed) transactions endure subsequent failures.
7 Introduction
  Database Management System (DBMS)
  Distributed database system = distributed database + distributed DBMS
    Distributed database: a collection of multiple logically interrelated databases distributed over a computer network.
    Distributed DBMS: a DBMS that can manage a distributed database and make the distribution transparent to users.
  Consists of:
    query nodes: user interface routines
    data nodes: data storage
  Loosely coupled: connected by a network; each node has its own storage, processor, and operating system
8 Database System Architectures
  Centralized: one host for everything; multi-processor is possible, but a transaction gets only one processor.
  Parallel: a transaction may be processed by multiple processors.
  Client-Server: database stored on one server host for multiple clients, centrally managed.
  Distributed: database stored on multiple hosts, transparent to clients.
  Peer to Peer: each node is a client and a server; requires sophisticated protocols, still in development.
9 Data Models
  Hierarchical Model: data organized in a tree namespace
  Network Model: like the Hierarchical Model, but a data item may have multiple parents
  Entity-Relationship Model: data are organized in entities which can have relationships among them
  Object-Oriented Model: database capability in an object-oriented language
  Semi-structured Model: schema is contained in the data (often associated with "self-describing" and XML)
  etc.
10 Data distribution
  Data is physically distributed among data nodes
    Fragmentation: divide data onto data nodes
    Replication: copy data among data nodes
  Fragmentation
    enables placing data close to clients
    may reduce size of data involved
    may reduce transmission cost
  Replication
    preferable when the same data are accessed from applications that run at multiple nodes
    may be more cost-effective to duplicate data at multiple nodes rather than continuously moving it between them
  Many different schemes of fragmentation and replication
11 Fragmentation
  Horizontal fragmentation: split by rows based on a fragmentation predicate.

    Last name   First name   Department         ID
    Chang       Three        Computer Science   X12045
    Lee         Four         Law                Y34098
    Chang       Frank        Medicine           Z99441
    Wang        Andy         Medicine           S94717

  Vertical fragmentation: split by columns based on attributes.

    Last name   First name   Department         ID
    Chang       Three        Computer Science   X12045
    Lee         Four         Law                Y34098
    Chang       Frank        Medicine           Z99441
    Wang        Andy         Medicine           S94717

  Also called partition in some literature
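The two kinds of fragmentation can be illustrated on the example table. This is a minimal sketch: the dictionary keys, the function names, and the choice of ID as the key repeated in every vertical fragment are assumptions made for illustration.

```python
ROWS = [
    {"last": "Chang", "first": "Three", "dept": "Computer Science", "id": "X12045"},
    {"last": "Lee", "first": "Four", "dept": "Law", "id": "Y34098"},
    {"last": "Chang", "first": "Frank", "dept": "Medicine", "id": "Z99441"},
    {"last": "Wang", "first": "Andy", "dept": "Medicine", "id": "S94717"},
]


def horizontal_fragments(rows, predicate):
    """Split rows into (matching, rest) according to a fragmentation predicate."""
    matching = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matching, rest


def vertical_fragments(rows, key, column_groups):
    """Split columns into groups; every fragment keeps `key` so rows can be rejoined."""
    return [
        [{c: r[c] for c in (key, *group)} for r in rows]
        for group in column_groups
    ]


if __name__ == "__main__":
    medicine, others = horizontal_fragments(ROWS, lambda r: r["dept"] == "Medicine")
    names, depts = vertical_fragments(ROWS, "id", [("last", "first"), ("dept",)])
    print(medicine)
    print(names[0], depts[0])
```

Each fragment could then be placed on a different data node; rejoining vertical fragments requires the shared key.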
12 Other properties of Distributed Databases
  Concurrency control
    Ensure the distributed database is in a consistent state after a transaction
  Reliability protocols
    Ensure that transactions terminate in the face of failures (system failure, storage failure, lost message, network partition, etc.)
  One-copy equivalence
    The value of a data item must be the same in all replicas
13 Query Optimization
  Looking for the best execution strategy for a given query
  Typically done in 4 steps
    query decomposition: translate the query to relational algebra (for a relational database) and analyze/simplify it
    data localization: decide which fragments are involved and generate local queries to fragments
    global optimization: find the best execution strategy of queries and messages to fragments
    local optimization: optimize the query at a node for a fragment
  A sophisticated topic
14 B+ Tree
  A data structure often used for indices of file systems or databases
  Data are indexed by keys
  Leaf nodes store data; nodes in interior levels store links
  Usually one disk block per node, to reduce disk seeks
  Except for the root node, the number of links or data items in each node is bounded in [d/2, d]
    d = order of the B+ tree, typically large
  [Figure: example B+ tree with keys 1, 4, 5, 30, 33, 34, 822, 823]
15 Insertion to a B+ tree
  InsertToTree(key, value, bplus_tree):
      node = Find(key, bplus_tree)    # find the leaf node to insert into
      Insert(key, value, node)        # insert

  Insert(key, value, node):
      AddData(key, value, node)
      if Size(node) > d:
          new_node = Split(node)      # node -> new_node + node
          Insert(new_node.lastkey(), new_node, Parent(node))

  May produce a new root node (not shown)
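The insertion pseudocode can be turned into a small runnable sketch. It is not the lecture's implementation: it recurses from the root instead of following parent pointers, it uses the first key of the new right-hand node as the separator (the slide passes the last key of the new node upward), and `ORDER`, `Node`, `insert` and `scan` are names chosen here for illustration.

```python
import bisect

ORDER = 4  # assumed maximum number of keys per node (the slide's d)


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []    # sorted keys (leaf) or separator keys (interior)
        self.values = []  # data (leaf) or child nodes (interior)
        self.next = None  # link from one leaf to the next, for range scans


def _split(node):
    """Move the upper half of `node` into a new right sibling."""
    mid = len(node.keys) // 2
    right = Node(node.leaf)
    if node.leaf:
        right.keys, right.values = node.keys[mid:], node.values[mid:]
        node.keys, node.values = node.keys[:mid], node.values[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right
    separator = node.keys[mid]  # moves up to the parent
    right.keys, right.values = node.keys[mid + 1:], node.values[mid + 1:]
    node.keys, node.values = node.keys[:mid], node.values[:mid + 1]
    return separator, right


def _insert(node, key, value):
    """Insert into the subtree; return (separator, new_node) if `node` split."""
    if node.leaf:
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            node.values[i] = value  # overwrite an existing key
            return None
        node.keys.insert(i, key)
        node.values.insert(i, value)
    else:
        i = bisect.bisect_right(node.keys, key)
        split = _insert(node.values[i], key, value)
        if split is None:
            return None
        separator, right = split
        node.keys.insert(i, separator)
        node.values.insert(i + 1, right)
    return _split(node) if len(node.keys) > ORDER else None


def insert(root, key, value):
    """Insert and return the (possibly new) root."""
    split = _insert(root, key, value)
    if split is None:
        return root
    separator, right = split
    new_root = Node(leaf=False)  # the "new root node" case from the slide
    new_root.keys, new_root.values = [separator], [root, right]
    return new_root


def scan(root):
    """Yield all (key, value) pairs in key order by walking the leaf links."""
    node = root
    while not node.leaf:
        node = node.values[0]
    while node is not None:
        yield from zip(node.keys, node.values)
        node = node.next


if __name__ == "__main__":
    root = Node(leaf=True)
    for k in [30, 1, 822, 4, 34, 5, 823, 33]:
        root = insert(root, k, f"d{k}")
    print(list(scan(root)))
```

With `ORDER = 4` the fifth insertion splits the only leaf and creates a new root, which is the case the slide marks as "not shown".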
16 Deletion in a B+ tree
  DeleteInTree(key, bplus_tree):
      node = Find(key, bplus_tree)    # find the node to delete from
      if not node:
          return False
      Delete(key, node)               # delete
      return True

  Delete(key, node):
      RemoveData(key, node)
      if Size(node) < d/2:
          RedistributeOrMerge(node, Parent(node))

  [Figure: example B+ tree with keys 1, 4, 30, 33, 34, 822, 823]
17 Features of a B+ Tree
  Good fit for sorted data stored in block storage devices
  Fast search: O(log_d N) with large d
  Fast range scan with links from one leaf node to the next: O(log_d N + k), where k = number of elements
  Insertion may cause splitting of nodes
  Deletion may cause merging of nodes
  Many optimizations exist (with pros vs. cons)
    data structure of a node (array, binary tree, linked list, etc.)
    compression of keys in a node
    lazy deletion
    RAM resident
    etc.
18 Compressing Data in a B+ Tree
  How to use less space in nodes?
  Compressing all keys together
    most space efficient
    reading 10 bytes requires uncompressing the whole node
  Split the keys into blocks and compress each block
    less space efficient
    faster in small reads
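The trade-off can be tried out with Python's standard `zlib` module. This is a minimal sketch, not the scheme of any particular B+ tree implementation: keys are assumed to be byte strings without newlines, and the block size is counted in keys rather than bytes to keep the code short.

```python
import zlib


def compress_whole(keys):
    """Compress all keys of a node together (smallest, but all-or-nothing reads)."""
    return zlib.compress(b"\n".join(keys))


def compress_blocks(keys, keys_per_block):
    """Compress the keys in independent blocks of `keys_per_block` keys each."""
    return [
        zlib.compress(b"\n".join(keys[i:i + keys_per_block]))
        for i in range(0, len(keys), keys_per_block)
    ]


def read_key(blocks, keys_per_block, index):
    """Fetch one key while decompressing only the block that contains it."""
    block = zlib.decompress(blocks[index // keys_per_block])
    return block.split(b"\n")[index % keys_per_block]


if __name__ == "__main__":
    keys = [b"com.example.www/page%04d" % i for i in range(1000)]
    blocks = compress_blocks(keys, 50)
    print("whole node:", len(compress_whole(keys)), "bytes")
    print("20 blocks: ", sum(len(b) for b in blocks), "bytes")
    print(read_key(blocks, 50, 123))
```

Printing both sizes shows how much space the per-block variant gives up in exchange for small reads that touch a single block.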
19 BigTable: A Distributed Storage System for Structured Data
  Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
  OSDI
20 Motivation
  Lots of (semi-)structured data at Google
    Web: contents, crawl metadata, links/anchors/pagerank, ...
    Per-user data: user preference settings, recent queries, search results, ...
    Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, ...
  Scale is large
    billions of URLs, many versions/page (~20K/version)
    Hundreds of millions of users, thousands of q/sec
    100TB+ of satellite image data
  Need data both for offline data processing and online serving
21 Why not use a commercial DB?
  Scale is too large for most commercial databases
  Even if it weren't, cost would be very high
    Building internally means the system can be applied across many projects for low incremental cost
  Low-level storage optimizations help performance significantly
    Much harder to do when running on top of a database layer
  Also fun and challenging to build large-scale systems :)
22 Goals
  Wide applicability
    by many Google products and projects
    Often want to examine data changes over time, e.g., contents of a web page over multiple crawls
    both throughput-oriented batch-processing jobs and latency-sensitive serving of data to end users
  Scalability
    Handful to thousands of servers, hundreds of TB to PB
  High performance
    Very high read/write rates (millions of ops per second)
    Efficient scans over all or interesting subsets of data
  High availability
    Want access to most current data at any time
23 BigTable
  Distributed multi-level map
    With an interesting data model
  Fault-tolerant, persistent
  Scalable
    Thousands of servers
    Terabytes of in-memory data
    Petabytes of disk-based data
    Millions of reads/writes per second, efficient scans
  Self-managing
    Servers can be added/removed dynamically
    Servers adjust to load imbalance
24 Status
  Design/initial implementation started beginning of 2004
  Production use or active development for many projects:
    Google Analytics
    Personal Search History
    Crawling/indexing pipeline
    Google Maps/Google Earth
    Blogger
  ~100 BigTable cells; the largest cell manages ~200TB of data spread over several thousand machines circa
25 Building Blocks of BigTable
  Distributed File System (GFS): stores persistent state
  Scheduler (not published): schedules jobs onto machines
    BigTable jobs are among all kinds of jobs
  Lock service (Chubby): distributed lock manager
    Can also reliably hold small files with high availability
    Master election, location bootstrapping
  Data processing (MapReduce): simplified large-scale data processing
    Often used to read/write BigTable data (not a building block of BigTable, but uses BigTable heavily)
26 Google File System (GFS)
  Master manages metadata
  Data transfers happen directly between clients and chunkservers
  Files broken into chunks (typically 64 MB)
  Chunks triplicated across three machines for safety
  See SOSP '03 paper at
  [Figure: one GFS master, several chunkservers, several clients]
27 Chubby
  Distributed lock service with a file system for small files
  Usually has 5 servers running the Paxos algorithm to maintain consistency
    fault-tolerant
  master election
  event notification mechanism
  Also used for name resolution in the cluster
28 Key Jobs in a BigTable Cluster
  Master
    schedules tablet assignments
    quota management
    health check of tablet servers
    garbage collection management
  Tablet servers
    serve data for reads and writes (one tablet is assigned to exactly one tablet server)
    compaction
    replication
    etc.
  monitor
29 Typical Cluster
  [Figure: a typical cluster. Shared services: cluster scheduling master, lock service, GFS master. Machines 1 to N each run Linux, a scheduler slave and a GFS chunkserver, plus a mix of user apps and BigTable servers; one machine also runs the BigTable master.]
30 BigTable Overview
  Data Model
  Implementation Structure
    Tablets, compactions, locality groups, ...
  API
  Details
    Shared logs, compression, replication, ...
  Current/Future Work
31 Basic Data Model
  Semi-structured: multi-dimensional sparse map
    (row, column, timestamp) -> cell contents
  [Figure: row com.cnn.www with columns contents: and inlinks:; the contents: cell holds <html>... at timestamps t3, t11, t17]
  Good match for most of Google's applications
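The data model can be mimicked with an ordinary dictionary keyed by (row, column, timestamp). This is only a sketch of the model, not of BigTable's storage: `put` and `get_latest` are illustrative names, and integer timestamps stand in for the t3, t11, t17 of the figure.

```python
def put(table, row, column, timestamp, value):
    """Store one cell version; rows and columns come into existence implicitly."""
    table[(row, column, timestamp)] = value


def get_latest(table, row, column):
    """Return the value with the highest timestamp, or None if the cell is empty."""
    versions = [
        (ts, value)
        for (r, c, ts), value in table.items()
        if r == row and c == column
    ]
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


if __name__ == "__main__":
    table = {}
    put(table, "com.cnn.www", "contents:", 3, "<html>v1")
    put(table, "com.cnn.www", "contents:", 11, "<html>v2")
    put(table, "com.cnn.www", "contents:", 17, "<html>v3")
    print(get_latest(table, "com.cnn.www", "contents:"))
    print(get_latest(table, "com.cnn.www", "inlinks:"))
```

The map is sparse in the sense that only cells that were written occupy any space; a row with no cells simply has no entries.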
32 Rows
  Everything is a string
  Every row has a single key
    An arbitrary string (how about numerical keys?)
  Access to data in a row is atomic
  Row creation is implicit upon storing data
  Rows ordered lexicographically by key
    Rows close together lexicographically are usually on one or a small number of machines
  Question: key distribution? Hot rows?
  No such thing as an empty row (see Columns page)
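On the question of numerical keys: because rows are ordered lexicographically, numbers stored as plain decimal strings do not sort numerically. One common workaround, shown here as an assumption rather than something the lecture prescribes, is fixed-width zero padding; `numeric_key` and the width of 10 are illustrative.

```python
def numeric_key(n, width=10):
    """Encode a non-negative integer so that string order matches numeric order."""
    return str(n).zfill(width)


if __name__ == "__main__":
    print(sorted(["9", "10", "100"]))                          # lexicographic order
    print(sorted(numeric_key(n) for n in [9, 10, 100]))        # numeric order
```

Note that sequential numeric keys also place all new writes at the end of the key range, which relates to the slide's question about hot rows.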
33 Columns
  Arbitrary number of columns
    organized into column families, then locality groups
    data in the same locality group are stored together (more later)
  Don't predefine columns (compare: schema)
    multi-map, not table; column names are arbitrary strings
    sparse: a row contains only the columns that have data
34 Column Family
  Must be created before any column in the family can be written
    Has a type: string, protocol buffer, ...
  Basic unit of access control and usage accounting
    different applications need access to different column families
    careful with sensitive data
  A column key is named as family:qualifier
    family: printable; qualifier: any string
    usually not a lot of column families in a BigTable cluster (hundreds)
      one "anchor:" column family for all anchors of incoming links
    but unlimited columns for each column family
      columns: anchor:cnn.com, anchor:news.yahoo.com, anchor:someone.blogger.com, ...
35 BigTable operations
  Reading
    selection by a combination of row, column or timestamp ranges
  Writing
    Write to individual cell versions (row, column, timestamp)
    Delete at different granularities, up to a whole row
    Applied atomically within a row
36 Read API
  Scanner: read arbitrary cells in a bigtable
    Each row read is atomic
    Can restrict returned rows to a particular range
    Can ask for just data from 1 row (Lookup), all rows, etc.
    Can ask for all columns, just certain column families, specific columns, timestamp ranges (ScanStream)

  Scanner scanner(T);
  ScanStream *stream;
  stream = scanner.FetchColumnFamily("anchor");
  stream->SetReturnAllVersions();
  scanner.Lookup("com.cnn.www");
  for (; !stream->Done(); stream->Next()) {
    // One line per cell version: row, column, timestamp, value.
    printf("%s %s %lld %s\n",
           scanner.RowName(),
           stream->ColumnName(),
           stream->MicroTimestamp(),
           stream->Value());
  }
37 Write API
  Metadata operations
    Create/delete tables, column families, change metadata
  Row mutation
    Apply: single row only, atomic, sequence of sets and deletes
  APIs exist for bulk updates: updates are grouped and sent with one RPC call.

  Table *T = OpenOrDie("/bigtable/web/webtable");
  // Write a new anchor and delete an old anchor
  RowMutation r1(T, "com.cnn.www");
  r1.Set("anchor:...", "CNN");
  r1.Delete("anchor:...");
  Operation op;
  Apply(&op, &r1);
38 Tablets
  Large tables broken into tablets at row boundaries
    A tablet holds a contiguous range of rows
      Clients can often choose row keys to achieve locality
    Aim for ~100MB to 200MB of data per tablet
  Serving machine responsible for ~100 tablets
    Fast recovery: 100 machines each pick up 1 tablet from the failed machine
    Fine-grained load balancing:
      Migrate tablets away from an overloaded machine
      Master makes load-balancing decisions
39 Tablets
  Dynamic fragmentation of rows
    Unit of load balancing
    Distributed over tablet servers
    Tablets split and merge
      automatically, based on size and load
      or manually
    Clients can choose row keys to achieve locality
40 Tablets & Splitting
  [Figure: a table with columns language: and contents:, and rows aaa.com, cnn.com (EN, <html>...), cnn.com/sports.html, ..., website.com, ..., yahoo.com/kids.html, yahoo.com/kids.html\0, ..., zuppa.com/menu.html, split into tablets at row boundaries]
41 Locality Groups
  Dynamic fragmentation of column families
    segregates data within a tablet
    different locality groups -> different SSTable files on GFS
    scans over one locality group are O(bytes_in_locality_group), not O(bytes_in_table)
  Provides control over storage layout
    memory mapping of locality groups
    choice of compression algorithms
    client-controlled block size
42 Locality Groups
  [Figure: column families contents:, language: and pagerank: assigned to locality groups; example cell values <html> and EN]
43 Timestamps
  Used to store different versions of data in a cell
    New writes default to current time, but timestamps for writes can also be set explicitly by clients
  Lookup options:
    Return most recent K values
    Return all values in timestamp range (or all values)
  Column families can be marked with attributes:
    Only retain most recent K values in a cell
    Keep values until they are older than K seconds
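The two retention attributes can be sketched as filters over the versions of a single cell. This is an illustration only: the function names are made up, and timestamps are plain seconds here for readability.

```python
def keep_most_recent(versions, k):
    """Retain only the k newest (timestamp, value) pairs, newest first."""
    newest_first = sorted(versions, key=lambda version: version[0], reverse=True)
    return newest_first[:k]


def keep_newer_than(versions, now, max_age_seconds):
    """Retain values until they are older than max_age_seconds."""
    return [(ts, value) for ts, value in versions if now - ts <= max_age_seconds]


if __name__ == "__main__":
    versions = [(3, "a"), (11, "b"), (17, "c")]
    print(keep_most_recent(versions, 2))
    print(keep_newer_than(versions, now=20, max_age_seconds=10))
```

In a real system such policies would be applied lazily, for example while data is being rewritten, rather than on every write.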
44 Where are my Tablets?
  Tablets move around from one tablet server to another (why?)
  Question: given a row, how does a client find the right tablet server?
    Tablet server location is ip:port
    Need to find the tablet whose row range covers the target row
  One approach: could use the BigTable master
    A central server would almost certainly be a bottleneck in a large system
  Instead: store tablet location info in special tablets, similar to a B+ tree
45 Metadata Tablets
  Approach: 3-level B+-tree-like scheme for tablets
    1st level: Chubby, points to MD0 (root)
    2nd level: MD0 data points to the appropriate METADATA tablet
    3rd level: METADATA tablets point to data tablets
  METADATA tablets can be split when necessary
  MD0 never splits, so the number of levels is fixed
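The three-level lookup can be sketched with sorted lists and binary search. Everything concrete below is an assumption made for illustration: the `TabletIndex` class, the Chubby file name, the server addresses, and the `MAX_ROW` sentinel. The real METADATA schema is richer than a list of end rows.

```python
import bisect

MAX_ROW = "\uffff"  # hypothetical sentinel that sorts after every row key used here


class TabletIndex:
    """Sorted (end_row, target) entries; entry i covers rows up to and including end_row."""

    def __init__(self, entries):
        self.end_rows = [end_row for end_row, _ in entries]
        self.targets = [target for _, target in entries]

    def lookup(self, row):
        # First entry whose end row is >= the requested row.
        return self.targets[bisect.bisect_left(self.end_rows, row)]


# 3rd level: METADATA tablets map row ranges to tablet server locations (ip:port).
meta_a = TabletIndex([("cnn.com", "10.0.0.1:9000"), ("mit.edu", "10.0.0.2:9000")])
meta_b = TabletIndex([("yahoo.com", "10.0.0.3:9000"), (MAX_ROW, "10.0.0.4:9000")])
# 2nd level: MD0 (root) maps row ranges to METADATA tablets; it never splits.
md0 = TabletIndex([("mit.edu", meta_a), (MAX_ROW, meta_b)])
# 1st level: a small file in Chubby points to MD0 (the path is hypothetical).
chubby = {"/bigtable/root": md0}


def locate(row):
    """Resolve a row key to the location of the tablet server that serves it."""
    root = chubby["/bigtable/root"]
    metadata_tablet = root.lookup(row)
    return metadata_tablet.lookup(row)


if __name__ == "__main__":
    for row in ["aaa.com", "cnn.com/sports.html", "zuppa.com/menu.html"]:
        print(row, "->", locate(row))
```

Because MD0 never splits, every lookup takes the same number of steps no matter how many METADATA tablets exist.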
46 Finding Tablet Location
  The client caches tablet locations.
  If a location is not cached, finding it takes three network round-trips when the cache is empty, and up to six round-trips when the cache is stale.
  Tablet locations are stored in memory, so no GFS accesses are required
47 Tablet Storage
  Commit log on GFS
    Redo log
    buffered in the tablet server's memory
  A set of locality groups
    one locality group = a set of SSTable files on GFS
    key = <row, column, timestamp>, value = cell content
48 SSTable
  SSTable: string to string table.
    persistent, ordered, immutable map from keys to values.
    keys and values are arbitrary byte strings.
  contains a sequence of blocks (typical size = 64KB), with a block index at the end of the SSTable that is loaded at open time.
  one disk seek per block read.
  operations: lookup(key), iterate(key_range).
  an SSTable can be mapped into memory.
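The role of the block index can be shown with an in-memory stand-in. This sketch keeps everything in Python lists, sizes blocks by entry count instead of 64KB, and uses the last key of each block as the index entry; all of that is an assumption for illustration and not the on-disk SSTable format.

```python
import bisect


class SSTable:
    """Immutable sorted map, split into blocks with an index of each block's last key."""

    def __init__(self, mapping, block_size=2):
        items = sorted(mapping.items())
        self.blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        self.index = [block[-1][0] for block in self.blocks]  # loaded at open time

    def lookup(self, key):
        """Find the single block that may hold `key`, then search only that block."""
        i = bisect.bisect_left(self.index, key)
        if i == len(self.index):
            return None
        for k, value in self.blocks[i]:  # stands in for one block read from disk
            if k == key:
                return value
        return None

    def iterate(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        first = bisect.bisect_left(self.index, start)
        for block in self.blocks[first:]:
            for k, value in block:
                if k >= end:
                    return
                if k >= start:
                    yield k, value


if __name__ == "__main__":
    table = SSTable({"a": "1", "c": "3", "e": "5", "g": "7", "i": "9"})
    print(table.lookup("e"), table.lookup("d"))
    print(list(table.iterate("c", "h")))
```

Since the index is held in memory, a lookup touches at most one block, which matches the "one disk seek per block read" point above.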
49 Tablet Serving
  [Figure: tablet serving. Labels: write, read, memtable (in memory, random-access), append-only log on GFS, minor compaction, SSTables on GFS]
  SSTable: immutable on-disk ordered map from string -> string
    string keys: <row, column, timestamp> triples
50 Compactions
  Tablet state is represented as a set of immutable compacted SSTable files, plus the tail of the log (buffered in memory)
  Minor compaction:
    When in-memory state fills up, pick the tablet with the most data and write its contents to SSTables stored in GFS
  Major compaction:
    Periodically compact all SSTables for a tablet into a new base SSTable on GFS
    Storage reclaimed from deletions at this point (garbage collection)
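A toy tablet makes the two compactions concrete. This sketch leaves out the commit log, timestamps and GFS entirely; `Tablet`, `TOMBSTONE` and the tiny memtable limit are assumptions chosen so the behavior is visible in a few writes.

```python
TOMBSTONE = object()  # marks a deleted key until a major compaction drops it


class Tablet:
    def __init__(self, memtable_limit=2):
        self.memtable = {}   # recent writes, in memory
        self.sstables = []   # immutable tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def delete(self, key):
        self.write(key, TOMBSTONE)

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        for source in [self.memtable, *reversed(self.sstables)]:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None

    def minor_compaction(self):
        """Freeze the memtable into a new sorted SSTable and start a fresh one."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def major_compaction(self):
        """Merge all SSTables into one base SSTable, dropping deleted entries."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values win
            merged.update(table)
        base = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.sstables = [base]


if __name__ == "__main__":
    tablet = Tablet()
    tablet.write("a", "1")
    tablet.write("b", "2")   # memtable full: minor compaction
    tablet.write("a", "3")
    tablet.delete("b")       # memtable full again: second SSTable
    print(len(tablet.sstables), tablet.read("a"), tablet.read("b"))
    tablet.major_compaction()
    print(tablet.sstables)
```

Deletes are only markers until the major compaction runs, which is why storage is reclaimed at that point.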
51 System Structure
  [Figure: structure of a Bigtable cell]
    Bigtable client with Bigtable client library: Open(), read/write, metadata ops
    Bigtable master: performs metadata ops + load balancing
    Bigtable tablet servers: serve data
    Cluster scheduling system: handles failover, monitoring
    GFS: holds tablet data, logs
    Lock service: holds metadata, handles master-election
52 File Cleaning
  BigTable generates a lot of files
    dominated by SSTables
  SSTables are immutable: they can be created, read, or deleted, but not overwritten.
  Obsolete SSTables are deleted in a mark-and-sweep garbage collection run by the BigTable master
53 Chubby Interactions
  Master election: single Chubby lock
  Tablet server membership
    a tablet server creates and acquires an exclusive lock on a uniquely-named file in the servers directory of Chubby when it starts, and stops serving when the lock is lost.
    the master monitors the directory to find tablet servers
  Chubby stores the access control list
  Metadata
    Schema information (column family metadata)
    Tablet advertisement and metadata
    Replication metadata
54 Shared Logs
  Designed for 1M tablets, 1000s of tablet servers
    1M logs being simultaneously written performs badly
  Solution: shared logs
    Write one log file per tablet server instead of per tablet
    Updates for many tablets co-mingled in the same file
    Start new log chunks every so often (64 MB)
  Problem: during recovery, a server needs to read log data to apply mutations for a tablet
    Lots of wasted I/O if lots of machines need to read data for many tablets from the same log chunk
55 Shared Log Recovery
  Recovery:
    Servers inform the master of the log chunks they need to read
    Master aggregates and orchestrates sorting of the needed chunks
      Assigns log chunks to be sorted to different tablet servers
      Servers sort chunks by tablet and write the sorted data to local disk
    Other tablet servers ask the master which servers have the sorted chunks they need
    Tablet servers issue direct RPCs to peer tablet servers to read sorted data for their tablets
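The sorting step is what turns a co-mingled log chunk into something a recovering server can read without wasted I/O. A minimal sketch, assuming each log entry is a (sequence number, tablet, mutation) tuple; that record layout is invented here for illustration.

```python
from collections import defaultdict


def sort_log_chunk(entries):
    """Group a co-mingled log chunk by tablet, keeping each tablet's mutations in log order."""
    by_tablet = defaultdict(list)
    for _seq, tablet, mutation in sorted(entries, key=lambda e: (e[1], e[0])):
        by_tablet[tablet].append(mutation)
    return dict(by_tablet)


if __name__ == "__main__":
    chunk = [
        (1, "tablet2", "set x"),
        (2, "tablet1", "set y"),
        (3, "tablet2", "delete x"),
        (4, "tablet1", "set z"),
    ]
    print(sort_log_chunk(chunk))
```

After sorting, the mutations for one tablet are contiguous, so a server recovering that tablet reads only its own slice.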
56 BigTable Compression
  Keys:
    Sorted strings of (Row, Column, Timestamp): prefix compression
  Values:
    Group together values by type (e.g. column family name)
    BMDiff across all values in one family
      BMDiff output for values 1..N is the dictionary for value N+1
  Zippy as final pass over the whole block
    Catches more localized repetitions
    Also catches cross-column-family repetition, compresses keys
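Prefix compression of sorted keys can be sketched in a few lines. This shows the general technique only, with each key stored as (length of the prefix shared with the previous key, remaining suffix); it is not claimed to be BigTable's actual key encoding.

```python
def prefix_encode(sorted_keys):
    """Encode each key as (shared prefix length with the previous key, suffix)."""
    encoded, previous = [], ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(key), len(previous))
        while shared < limit and key[shared] == previous[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared length, suffix) pairs."""
    keys, previous = [], ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        keys.append(previous)
    return keys


if __name__ == "__main__":
    keys = ["com.cnn.www/index.html", "com.cnn.www/sports.html", "com.cnn.www/world.html"]
    encoded = prefix_encode(keys)
    print(encoded)
    print(prefix_decode(encoded) == keys)
```

Sorted keys share long prefixes with their neighbors, so most entries shrink to a small integer plus a short suffix.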
57 Compression
  Many opportunities for compression
    Similar values in the same row/column at different timestamps
    Similar values in different columns
    Similar values across adjacent rows
  Within each SSTable for a locality group, encode compressed blocks
    Keep blocks small for random access (~64KB compressed data)
    Exploit the fact that many values are very similar
    Needs to be low CPU cost for encoding/decoding
  Two building blocks: BMDiff, Zippy
58 BMDiff
  Bentley, McIlroy DCC'99: Data Compression Using Long Common Strings
  Input: dictionary + source
  Output: sequence of
    COPY: <x> bytes from offset <y>
    LITERAL: <literal text>
  Store hash at every 32-byte aligned boundary in
    dictionary
    source processed so far
  For every new source byte
    Compute incremental hash of last 32 bytes, look up in hash table
    On hit, expand match forwards & backwards, emit COPY
  Encode: ~100 MB/s, Decode: ~1000 MB/s
59 Zippy
  LZW-like: store hash of last four bytes in a 16K-entry table
  For every input byte:
    Compute hash of last four bytes
    Look up in table
    Emit COPY or LITERAL
  Differences from BMDiff:
    Much smaller compression window (local repetitions)
    Hash table is not associative
    Careful encoding of COPY/LITERAL tags and lengths
  Sloppy but fast:

    Algorithm   % remaining   Encoding   Decoding
    Gzip        13.4%         21 MB/s    118 MB/s
    LZO         20.5%         135 MB/s   410 MB/s
    Zippy       22.2%         172 MB/s   409 MB/s
60 Compression Effectiveness
  Experiment: store contents for a 2.1B page crawl in a BigTable instance
    Key: URL rearranged as com.cnn.www/index.html:http
    Groups pages from the same site together
      Good for compression
      Good for clients: efficient to scan over all pages on a web site
  One compression strategy: gzip each page: ~28% bytes remaining
  BigTable: BMDiff + Zippy:

    Type                Count (B)   Space (TB)   Compressed   % remaining
    Web page contents               TB           4.2 TB       9.2%
    Links                           TB           1.6 TB       13.9%
    Anchors                         TB           2.9 TB       12.7%
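The key rearrangement can be sketched with the standard `urllib.parse` module. The function name `row_key` is made up, and the handling of ports, queries and fragments is deliberately left out; only the host reversal and the `:scheme` suffix shown on the slide are reproduced.

```python
from urllib.parse import urlsplit


def row_key(url):
    """Rearrange a URL so that pages of the same site sort next to each other."""
    parts = urlsplit(url)
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    return f"{reversed_host}{parts.path}:{parts.scheme}"


if __name__ == "__main__":
    urls = [
        "http://www.cnn.com/index.html",
        "http://maps.google.com/",
        "http://sports.cnn.com/scores.html",
    ]
    for key in sorted(row_key(u) for u in urls):
        print(key)
```

After sorting, both cnn.com hosts are adjacent, so a scan over one site reads a contiguous row range and similar pages end up in the same compressed blocks.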
61 Bloom Filters
  A read may need to read many SSTables
  Idea: use a membership test to remove disk reads for non-existing data
    membership test: does (row, column) exist in the tablet?
  Algorithm: Bloom filter
    No false negatives. False positives: read to find out
    Update bit vector when new data is inserted. Delete?
  [Figure: data in the set {a_1, a_2, ..., a_N}; k independent hash functions h_1, h_2, ..., h_k; bit vector v with m positions; query: b]
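A Bloom filter with m positions and k hash functions fits in a few lines. This is a sketch of the textbook structure: deriving the k hash functions from SHA-256 with a numeric prefix, and spending one byte per position instead of one bit, are simplifications chosen for readability.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per position, for clarity

    def _positions(self, item):
        """Positions h_1(item) .. h_k(item) in the bit vector."""
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        """False means definitely absent; True means present or a false positive."""
        return all(self.bits[position] for position in self._positions(item))


if __name__ == "__main__":
    bloom = BloomFilter(m=1024, k=3)
    bloom.add("com.cnn.www/contents:")
    print(bloom.might_contain("com.cnn.www/contents:"))
    print(bloom.might_contain("com.cnn.www/inlinks:"))
```

On the slide's "Delete?" question: a plain Bloom filter cannot remove an item, because clearing a position could also erase evidence of other items that hash to it.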
62 Replication
  Often want updates replicated to many BigTable cells in different datacenters
    Low-latency access from anywhere in the world
    Disaster tolerance
  Optimistic replication scheme
    Writes in any of the on-line replicas are eventually propagated to the other replica clusters
      99.9% of writes replicated immediately (speed of light)
    Currently a thin layer above the BigTable client library
      Working to move support inside the BigTable system
  Replication deployed on My Search History
63 Performance
64 Application: Personalized Search
  Personalized search
    an opt-in service
    records queries and clicks of a user in Google (web search, image search, news, etc.)
    the user can edit the search history
    search history affects search results
  Implementation in BigTable
    one user per row, row name = user ID
    one column family per action
    analyzed with MapReduce to produce a user profile
    other products add column families later, quota system
65 Sample Usages
66 In Development/Future Plans
  More expressive data manipulation/access
    Allow sending small scripts to perform read/modify/write transactions so that they execute on the server (a kind of "stored procedures")
  Multi-row (i.e. distributed) transaction support
  General performance work for very large cells
  BigTable as a service
    Interesting issues of resource fairness, performance isolation, prioritization, etc. across different clients
    App Engine's DataStore
67 Conclusions
  Data model applicable to a broad range of clients
    Actively deployed in many of Google's services
  System provides a high-performance storage system on a large scale
    Self-managing
    Thousands of servers
    Millions of ops/second
    Multiple GB/s reading/writing
More informationRealtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens
Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at
More informationComp 5311 Database Management Systems. 16. Review 2 (Physical Level)
Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster
More informationDistributed Systems. Tutorial 12 Cassandra
Distributed Systems Tutorial 12 Cassandra written by Alex Libov Based on FOSDEM 2010 presentation winter semester, 2013-2014 Cassandra In Greek mythology, Cassandra had the power of prophecy and the curse
More informationParallel & Distributed Data Management
Parallel & Distributed Data Management Kai Shen Data Management Data management Efficiency: fast reads/writes Durability and consistency: data is safe and sound despite failures Usability: convenient interfaces
More informationData Centers and Cloud Computing. Data Centers
Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet
More informationBigtable is a proven design Underpins 100+ Google services:
Mastering Massive Data Volumes with Hypertable Doug Judd Talk Outline Overview Architecture Performance Evaluation Case Studies Hypertable Overview Massively Scalable Database Modeled after Google s Bigtable
More informationMapReduce Jeffrey Dean and Sanjay Ghemawat. Background context
MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics data 4
More informationData Management in the Cloud -
Data Management in the Cloud - current issues and research directions Patrick Valduriez Esther Pacitti DNAC Congress, Paris, nov. 2010 http://www.med-hoc-net-2010.org SOPHIA ANTIPOLIS - MÉDITERRANÉE Is
More informationScalability of web applications. CSCI 470: Web Science Keith Vertanen
Scalability of web applications CSCI 470: Web Science Keith Vertanen Scalability questions Overview What's important in order to build scalable web sites? High availability vs. load balancing Approaches
More informationJeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationHow To Improve Performance In A Database
Some issues on Conceptual Modeling and NoSQL/Big Data Tok Wang Ling National University of Singapore 1 Database Models File system - field, record, fixed length record Hierarchical Model (IMS) - fixed
More informationFAWN - a Fast Array of Wimpy Nodes
University of Warsaw January 12, 2011 Outline Introduction 1 Introduction 2 3 4 5 Key issues Introduction Growing CPU vs. I/O gap Contemporary systems must serve millions of users Electricity consumed
More informationTHE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
More informationCloud Computing mit mathematischen Anwendungen
Cloud Computing mit mathematischen Anwendungen Vorlesung SoSe 2009 Dr. Marcel Kunze Karlsruhe Institute of Technology (KIT) Steinbuch Centre for Computing (SCC) KIT the cooperation of Forschungszentrum
More informationBig Data Processing in the Cloud. Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center
Big Data Processing in the Cloud Shadi Ibrahim Inria, Rennes - Bretagne Atlantique Research Center Data is ONLY as useful as the decisions it enables 2 Data is ONLY as useful as the decisions it enables
More informationlow-level storage structures e.g. partitions underpinning the warehouse logical table structures
DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures
More informationMASSIVE DATA PROCESSING (THE GOOGLE WAY ) 27/04/2015. Fundamentals of Distributed Systems. Inside Google circa 2015
7/04/05 Fundamentals of Distributed Systems CC5- PROCESAMIENTO MASIVO DE DATOS OTOÑO 05 Lecture 4: DFS & MapReduce I Aidan Hogan aidhog@gmail.com Inside Google circa 997/98 MASSIVE DATA PROCESSING (THE
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More informationQuantcast Petabyte Storage at Half Price with QFS!
9-131 Quantcast Petabyte Storage at Half Price with QFS Presented by Silvius Rus, Director, Big Data Platforms September 2013 Quantcast File System (QFS) A high performance alternative to the Hadoop Distributed
More informationCloud Computing. Up until now
Cloud Computing Lecture 19 Cloud Programming 2011-2012 Up until now Introduction, Definition of Cloud Computing Pre-Cloud Large Scale Computing: Grid Computing Content Distribution Networks Cycle-Sharing
More informationPrinciples of Distributed Database Systems
M. Tamer Özsu Patrick Valduriez Principles of Distributed Database Systems Third Edition
More informationThe Hadoop Distributed File System
The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, XLDB Conference at Stanford University, Sept 2012
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, XLDB Conference at Stanford University, Sept 2012 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP)
More informationParallel Processing of cluster by Map Reduce
Parallel Processing of cluster by Map Reduce Abstract Madhavi Vaidya, Department of Computer Science Vivekanand College, Chembur, Mumbai vamadhavi04@yahoo.co.in MapReduce is a parallel programming model
More informationCOS 318: Operating Systems
COS 318: Operating Systems File Performance and Reliability Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall10/cos318/ Topics File buffer cache
More informationPractical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00
Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn
More informationCOSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters
COSC 6374 Parallel Computation Parallel I/O (I) I/O basics Spring 2008 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network
More informationCassandra A Decentralized, Structured Storage System
Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922
More informationComparative analysis of Google File System and Hadoop Distributed File System
Comparative analysis of Google File System and Hadoop Distributed File System R.Vijayakumari, R.Kirankumar, K.Gangadhara Rao Dept. of Computer Science, Krishna University, Machilipatnam, India, vijayakumari28@gmail.com
More informationRaima Database Manager Version 14.0 In-memory Database Engine
+ Raima Database Manager Version 14.0 In-memory Database Engine By Jeffrey R. Parsons, Senior Engineer January 2016 Abstract Raima Database Manager (RDM) v14.0 contains an all new data storage engine optimized
More informationBenchmarking Cassandra on Violin
Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract
More informationOverview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
More informationNoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
More informationMapReduce. Olivier Curé. January 6, 2014. Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France
Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 6, 2014 In more and more situations, data is being so big that it can not be processed on a single machine. Examples: Storage of
More informationDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases aka Just Enough Distributed Systems To Be Dangerous (in 40 minutes) Todd Lipcon (@tlipcon) Cloudera June 11, 2009 Introduction Common Underlying
More informationOverview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
More informationBigdata : Enabling the Semantic Web at Web Scale
Bigdata : Enabling the Semantic Web at Web Scale Presentation outline What is big data? Bigdata Architecture Bigdata RDF Database Performance Roadmap What is big data? Big data is a new way of thinking
More informationBig Data and Hadoop with components like Flume, Pig, Hive and Jaql
Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.
More informationA Review of Column-Oriented Datastores. By: Zach Pratt. Independent Study Dr. Maskarinec Spring 2011
A Review of Column-Oriented Datastores By: Zach Pratt Independent Study Dr. Maskarinec Spring 2011 Table of Contents 1 Introduction...1 2 Background...3 2.1 Basic Properties of an RDBMS...3 2.2 Example
More informationGraySort and MinuteSort at Yahoo on Hadoop 0.23
GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters
More informationWhat is Analytic Infrastructure and Why Should You Care?
What is Analytic Infrastructure and Why Should You Care? Robert L Grossman University of Illinois at Chicago and Open Data Group grossman@uic.edu ABSTRACT We define analytic infrastructure to be the services,
More informationThe Google File System
The Google File System Motivations of NFS NFS (Network File System) Allow to access files in other systems as local files Actually a network protocol (initially only one server) Simple and fast server
More informationMAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services
MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services Jiansheng Wei, Hong Jiang, Ke Zhou, Dan Feng School of Computer, Huazhong University of Science and Technology,
More informationLecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
More informationParquet. Columnar storage for the people
Parquet Columnar storage for the people Julien Le Dem @J_ Processing tools lead, analytics infrastructure at Twitter Nong Li nong@cloudera.com Software engineer, Cloudera Impala Outline Context from various
More informationMassive Data Storage
Massive Data Storage Storage on the "Cloud" and the Google File System paper by: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung presentation by: Joshua Michalczak COP 4810 - Topics in Computer Science
More informationRAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University
RAMCloud and the Low- Latency Datacenter John Ousterhout Stanford University Most important driver for innovation in computer systems: Rise of the datacenter Phase 1: large scale Phase 2: low latency Introduction
More informationIn-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller
In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency
More informationHBase Schema Design. NoSQL Ma4ers, Cologne, April 2013. Lars George Director EMEA Services
HBase Schema Design NoSQL Ma4ers, Cologne, April 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera ConsulFng on Hadoop projects (everywhere) Apache Commi4er HBase and Whirr
More informationCosmos. Big Data and Big Challenges. Pat Helland July 2011
Cosmos Big Data and Big Challenges Pat Helland July 2011 1 Outline Introduction Cosmos Overview The Structured s Project Some Other Exciting Projects Conclusion 2 What Is COSMOS? Petabyte Store and Computation
More informationThe Sierra Clustered Database Engine, the technology at the heart of
A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel
More informationOracle Database In-Memory The Next Big Thing
Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes
More informationBinary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
More informationHadoop Distributed File System (HDFS) Overview
2012 coreservlets.com and Dima May Hadoop Distributed File System (HDFS) Overview Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/ Also see the customized
More informationDistributed Lucene : A distributed free text index for Hadoop
Distributed Lucene : A distributed free text index for Hadoop Mark H. Butler and James Rutherford HP Laboratories HPL-2008-64 Keyword(s): distributed, high availability, free text, parallel, search Abstract:
More informationCouchbase Server Under the Hood
Couchbase Server Under the Hood An Architectural Overview Couchbase Server is an open-source distributed NoSQL document-oriented database for interactive applications, uniquely suited for those needing
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 1: Distributed File Systems Finding a needle in Haystack: Facebook
More informationStorage in Database Systems. CMPSCI 445 Fall 2010
Storage in Database Systems CMPSCI 445 Fall 2010 1 Storage Topics Architecture and Overview Disks Buffer management Files of records 2 DBMS Architecture Query Parser Query Rewriter Query Optimizer Query
More informationSlave. Master. Research Scholar, Bharathiar University
Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper online at: www.ijarcsse.com Study on Basically, and Eventually
More informationStoring Data: Disks and Files
Storing Data: Disks and Files (From Chapter 9 of textbook) Storing and Retrieving Data Database Management Systems need to: Store large volumes of data Store data reliably (so that data is not lost!) Retrieve
More informationAccelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
More informationCosmos. Big Data and Big Challenges. Ed Harris - Microsoft Online Services Division HTPS 2011
Cosmos Big Data and Big Challenges Ed Harris - Microsoft Online Services Division HTPS 2011 1 Outline Introduction Cosmos Overview The Structured s Project Conclusion 2 What Is COSMOS? Petabyte Store and
More informationScala Storage Scale-Out Clustered Storage White Paper
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationStudy and Comparison of Elastic Cloud Databases : Myth or Reality?
Université Catholique de Louvain Ecole Polytechnique de Louvain Computer Engineering Department Study and Comparison of Elastic Cloud Databases : Myth or Reality? Promoters: Peter Van Roy Sabri Skhiri
More informationHardware Configuration Guide
Hardware Configuration Guide Contents Contents... 1 Annotation... 1 Factors to consider... 2 Machine Count... 2 Data Size... 2 Data Size Total... 2 Daily Backup Data Size... 2 Unique Data Percentage...
More information