Cloud Data Yahoo!

Size: px
Start display at page:

Download "Cloud Data Management @ Yahoo!"

Transcription

1 Cloud Data Yahoo! Raghu Ramakrishnan Chief Scientist, Audience and Cloud Computing Brian Cooper Adam Silberstein Yahoo! Research Joint work with the Sherpa team in Cloud Computing 1

2 Cloud Use Cases Content Optimization Machine Learning (e.g. Spam filters) Search Index Ads Optimization Attachment Storage Image/Video Storage & Delivery 2

3 Yahoo! Cloud Stack EDGE YCS Horizontal YCPI Cloud Services Brooklyn Provisioning (Self-serve) Monitoring/Metering/Security VM/OS VM/OS WEB Horizontal yapache Cloud ServicesPHP APP Horizontal Serving Cloud Grid Services OPERATIONAL STORAGE PNUTS/Sherpa Horizontal MOBStor Cloud Services App Engine Data Highway Hadoop BATCH STORAGE Horizontal Cloud Services 3

4 Requirements for Cloud Services Multitenant. A cloud service must support multiple, organizationally distant customers. Elasticity. Tenants should be able to negotiate and receive resources/qos on-demand up to a large scale. Resource Sharing. Ideally, spare cloud resources should be transparently applied when a tenant s negotiated QoS is insufficient, e.g., due to spikes. Horizontal scaling. The cloud provider should be able to add cloud capacity in increments without affecting tenants of the service. Metering. A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service. Security. A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud. Availability. A cloud service should be highly available. Operability. A cloud service should be easy to operate, with few operators. Operating costs should scale linearly or better with the capacity of the service. Will discuss today 4

5 Web Data Management Warehousing, ML / data mining Scan oriented workloads Focus on sequential disk I/O $ per cpu cycle Large data analysis (Hadoop/Pig/Zebra) Blob storage (MObStor) Structured record storage (PNUTS/Sherpa) Object retrieval and streaming Scalable file storage $ per GB storage & bandwidth CRUD Point lookups and short scans Index organized table and random I/Os $ per latency 6

6 ACID or BASE? Litmus tests are colorful, but the picture is cloudy SCALABLE DATA SERVING 8

7 Yahoo! Serving Storage Problem Small records 100KB or less Structured records Lots of fields, evolving Extreme data scale - Tens of TB Extreme request scale - Tens of thousands of requests/sec Low latency globally datacenters worldwide High Availability - Outages cost $millions Variable usage patterns - Applications and users change 9 9

8 Typical Y! Applications User logins and profiles Including changes that must not be lost! Events But single-record transactions suffice Alerts (e.g., news, price drops) Social network activity (e.g., user goes offline) Ad clicks, article clicks Application-specific data Postings in message board Uploaded photos, tags Shopping carts 500M+ unique users per month Hundreds of petabytes of storage Hundreds of billions of objects Hundred of thousands of requests/sec Global, rapidly evolving workloads 10

9 VLSD Data Serving Stores Must partition data across machines How are partitions determined? Can partitions be changed easily? Affects elasticity, resource sharing, operability How are read/update requests routed? Range selections? Can requests span machines? Availability: What failures are handled? With what semantic guarantees on data access? (How) Is data replicated? Sync or async? Consistency model? Local or geo? How are updates made durable? How is data stored on a single machine? 11

10 PNUTS/SHERPA 12 12

11 The World Has Changed Web serving applications need: Scalability! Elastic on demand, commodity boxes Flexible schemas Geographic distribution/replication High availability Low latency Web serving applications willing to do without: Complex queries ACID transactions But still benefit from support for data consistency 13

12 What is PNUTS/Sherpa? A E B W C W D E E C F E CREATE TABLE Parts ( ID VARCHAR, StockNumberINT, Status VARCHAR ) Structured, flexible schema A E B W C W D E E C F E Parallel database A E B W C W D E E C F E Geographic replication Hosted, managed infrastructure 14 14

13 Architecture Local region Remote regions Clients REST API Routers Tablet Controller Tribble Storage units 15 15

14 PNUTS: Key Components Caches the maps from the TC Routes client requests to correct SU Maintains map from database.table.key-totablet-to-su Provides load balancing Stores records in tablets Services get/set/delete requests 16 16

15 Tablets Hash Table 0x0000 Name Description Price Grape Grapes are good to eat $12 Lime Limes are green $9 Apple Apple is wisdom $1 0x2AF3 Strawberry Orange Strawberry shortcake Arrgh! Don t get scurvy! $900 $2 Avocado But at what price? $3 Lemon How much did you pay for this lemon? $1 0x911F Tomato Banana Is this a vegetable? The perfect fruit $14 $2 0xFFFF Kiwi New Zealand $

16 Tablets Ordered Table A Name Description Price Apple Apple is wisdom $1 Avocado But at what price? $3 Banana The perfect fruit $2 H Grape Kiwi Grapes are good to eat New Zealand $12 $8 Lemon How much did you pay for this lemon? $1 Lime Limes are green $9 Q Orange Strawberry Arrgh! Don t get scurvy! Strawberry shortcake $2 $900 Z Tomato Is this a vegetable? $

17 Flexible Schema Posted date Listing id Item Price 6/1/ Couch $570 6/1/ Bike $86 6/3/ Car $1123 6/5/ Lamp $15 Color Red Condition Good Fair 19

18 PROCESSING READS & UPDATES 20 20

19 Updates 8 Sequence # for key k 1 Write key k Routers Message brokers 3 Write key k 7 Sequence # for key k 2 Write key k 4 SU SU SU 5 SUCCESS 6 Write key k 21 21

20 Accessing Data 4 Record for key k 1 Get key k 3 Record for key k 2 Get key k SU SU SU 22 22

21 Range Queries in YDOT Clustered, ordered retrieval of records Apple Avocado Banana Blueberry Grapefruit Pear? Canteloupe Grape Kiwi Lemon Lime Mango Orange Storage unit 1 Canteloupe Storage unit 3 Lime Storage unit 2 Strawberry Storage unit 1 Router Lime Pear? Grapefruit Lime? Apple Strawberry Avocado Tomato Banana Watermelon Blueberry Strawberry Tomato Watermelon Lime Mango Orange Canteloupe Grape Kiwi Lemon Storage unit 1 Storage unit 2 Storage unit 3 23

22 Bulk Load in YDOT YDOT bulk inserts can cause performance hotspots Solution: preallocate tablets 24

23 ELASTICITY, OPERABILITY, HORIZONTAL SCALING 25 25

24 Distribution 6/1/ /1/ Couch $570 Data Distribution shuffling for load parallelism balancing Car $1123 6/2/ Bike $86 6/5/ Chair $10 6/7/ Lamp $19 6/9/ Bike $56 6/11/ Scooter $18 6/11/ Hammer $8000 Server 1 Server 2 Server 3 Server

25 Tablet Splitting and Balancing Each storage unit has many tablets (horizontal partitions of the table) Storage unit may become a hotspot Storage unit Tablet Overfull tablets split Tablets may grow over time Shed load by moving tablets to other servers 27 27

26 ASYNCHRONOUS REPLICATION AND CONSISTENCY 28 28

27 Asynchronous Replication 29 29

28 Consistency Levels ACID consistency Transaction: Add Brian as Toby s friend, and Toby as Brian s friend Brian s Friends Brad PPSN Adam Utkarsh Toby s Friends Carol Ari Shelton Jay Brian s Friends Brad PPSN Adam Utkarsh Toby Toby s Friends Carol Ari Shelton Jay Brian States you will never see: Brian is Toby s friend but Toby is not Brian s friend Toby is Brian s friend but Brian is not Toby s friend 30

29 The CAP Theorem You have to give up one of the following in a distributed system (Brewer, PODC 2000; Gilbert/Lynch, SIGACT News 2002): Consistency of data Think serializability Availability Pinging a live node should produce results Partition tolerance Live nodes should not be blocked by partitions 31

30 BASE Approaches to CAP No ACID, use a single version of DB, reconcile later Defer transaction commit Until partitions fixed and distr xact can run Eventual consistency (e.g., Amazon Dynamo) Eventually, all copies of an object converge Restrict transactions (e.g., Sharded MySQL) 1-M/c Xacts: Objects in xact are on the same machine 1-Object Xacts: Xact can only read/write 1 object Object timelines (PNUTS) 32

31 Consistency: Social Alice West East Record Timeline User Alice Status Busy Network disruption: Alice redirected to East Busy User Status User Alice Status Free Free Alice Busy busy free User Status Alice??? User Status Alice??? Free 33

32 PNUTS Consistency Model Goal: Make it easier for applications to reason about updates and cope with asynchrony What happens to a record with primary key Alice? Record inserted Update Update Update Update Update Update Update Delete v. 1 v. 2 v. 3 v. 4 v. 5 v. 6 v. 7 v. 8 Generation 1 Time As the record is updated, copies may get out of sync

33 PNUTS Consistency Model Read Stale version Stale version Current version v. 1 v. 2 v. 3 v. 4 v. 5 v. 6 v. 7 v. 8 Generation 1 Time In general, reads are served using a local copy 35 35

34 PNUTS Consistency Model Read up-to-date Stale version Stale version Current version v. 1 v. 2 v. 3 v. 4 v. 5 v. 6 v. 7 v. 8 Generation 1 Time But application can request and get current version 36 36

35 PNUTS Consistency Model Read v.6 Stale version Stale version Current version v. 1 v. 2 v. 3 v. 4 v. 5 v. 6 v. 7 v. 8 Generation 1 Time Or variations such as read forward while copies may lag the master record, every copy goes through the same sequence of changes 37 37

36 PNUTS Consistency Model Write Stale version Stale version Current version v. 1 v. 2 v. 3 v. 4 v. 5 v. 6 v. 7 v. 8 Generation 1 Time Achieved via per-record primary copy protocol (To maximize availability, record masterships automatically transferred if site fails) Can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors) 38 38

37 PNUTS Consistency Model Write if = v.7 ERROR Stale version Stale version Current version v. 1 v. 2 v. 3 v. 4 v. 5 v. 6 v. 7 v. 8 Generation 1 Time Test-and-set writes facilitate per-record transactions 39 39

38 Consistency Techniques Per-record mastering Each record is assigned a master region May differ between records Updates to the record forwarded to the master region Ensures consistent ordering of updates Tablet-level mastering Each tablet is assigned a master region Inserts and deletes of records forwarded to the master region Master region decides tablet splits These details are hidden from the application Except for the latency impact! 43

39 Record Master A E B WE C W D E E C F E A E B W C W D E E C F E A E B WE C W D E E C F E 44 44

40 Tablet Master Key2: Region W Key E Key W Key E Key C Key E Region C Tablet master Key E Key W Key E Key C Key E Region E Key E Key W Key E Key C Key E 45 45

41 Tablet Mastership Tablet master Region W Region C Region E Key E Key E Key E Key W Key E Step 1: Forward Req to Tablet Master Key W Key E Key W Key E Key C Key C Key C Key E Step 2: Apply Insert to Tablet Master Key E Key E Key E Key W Key W Key E Key W Key W Key E Step 4: Apply Insert at Rec Master Key E Key C Key E Step 3: Replicate Insert to Other Sites Key E Key W Key W Key E Key C Key C Key E Key E 46 46

42 Write Interaction Consistency on (default) Insert: primary keys guaranteed to be unique Across all data replicas Update: updates are applied to a record in a given order Same order for all replicas Delete: deleted records will not re-appear, and records will not disappear without a delete Consistency off Above guarantees may not be enforced This decision is at the table level Once you do some consistency off stuff, cannot guarantee any consistency 47 47

43 Read Interaction Asynchronous replication means some data will be (temporarily) out of date All replicas of a record experience the same history of updates No guarantees offered for cross-record or whole-database consistency Read levels (assuming consistency on) decided per call: Read any Give me any available version of the record from that record s history Critical read Give me any version of the record, but no older than my last update Read forward Give me any version of the record, but no older than my last read Up to date Give me the most recent version of the record Though new updates may occur while the result is in transit back to me 48 48

44 Summary: Consistency Sherpa offers timeline and eventual consistency Can get per-record ACID using test-and-set Other systems offer other choices ACID Per-record ACID Eventual Other Oracle Weaker serializable levels Weaker serializable levels Supports quorum read/writes HBase Best effort Vertica HDFS Does not allow updates UDB Best-effort only 49

45 AVAILABILITY 50 50

46 Coping With Failures X X A E B W C W D E E C OVERRIDE W E F E A E B W C W D E E C F E A E B W C W D E E C F E 51 51

47 Possible Failure Modes Failure type Storage unit Consistency impact None Availability impact Degraded service (forwards) for some data. Updates and inserts fail for some records Resolution If data not lost: Reboot machine If data lost: Copy lost tablets from a remote replica Issue overrides to restore availability Time to resolve If data lost, hours or less (depending on tablet size and colo location). If no data lost, minutes. X Storage units 52

48 Possible Failure Modes Failure type Router Consistency impact None Availability impact None Resolution Boot router Time to resolve Minutes X Routers 53

49 Possible Failure Modes Failure Tablet controller Consistency impact XTablet controller None Availability impact Some actions (e.g., tablet copy) will be blocked Resolution Start secondary controller Time to resolve Minutes 54

50 Possible Failure Modes Failure One msg hub node Consistency impact X Msg Hubs None Availability impact Writes fail for some records until a new secondary node takes over Resolution Create new primary or secondary for lost topics Time to resolve Minutes 55

51 Possible Failure Modes Failure Colo power outage or partition Consistency impact Option to allow relaxed consistency to improve availability Availability impact Some inserts, updates and deletes cannot succeed Some critical reads fail Option to allow updates to proceed in relaxed consistency mode Resolution Major overrides to force mastership transfer; discard conflicting updates Time to resolve Hours Tablet map Load balancer Server monitor Tablet controller SU API Clients Routers WS API Tribble Storage units 56

52 Further PNutty Reading Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008) Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008) Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni Asynchronous View Maintenance for VLSD Databases (SIGMOD 2009) Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava and Raghu Ramakrishnan Cloud Storage Design in a PNUTShell Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava Beautiful Data, O Reilly Media, 2009 Adaptively Parallelizing Distributed Range Queries (VLDB 2009) Ymir Vigfusson, Adam Silberstein, Brian Cooper, Rodrigo Fonseca 57

53 Green Apples and Red Apples COMPARING SOME CLOUD SERVING STORES 58

54 Help! I have a huge amount of data. What should I do with it? UDB Sherpa Dryad DB2 59

55 Serving Stores Many cloud DB and nosql systems out there PNUTS BigTable HBase, Hypertable, HTable Azure Cassandra Megastore Amazon Web Services S3, SimpleDB, EBS And more: CouchDB, MongoDB, Voldemort, etc. How do they compare? Feature tradeoffs Performance tradeoffs Not clear! 60

56 The Contestants Baseline: Sharded MySQL Horizontally partition data among MySQL servers PNUTS/Sherpa Yahoo! s cloud database Cassandra BigTable + Dynamo HBase BigTable + Hadoop 61

57 SHARDED MYSQL 62 62

58 Architecture Our own implementation of sharding Client Client Client Client Client Shard Server Shard Server Shard Server Shard Server MySQL MySQL MySQL MySQL 63

59 Pros Simple Infinitely scalable Low latency Geo-replication Pros and Cons Cons Not elastic (Resharding is hard) Poor support for load balancing Failover? (Adds complexity) Replication unreliable (Async log shipping) 64

60 Azure SDS Cloud of SQL Server instances App partitions data into instance-sized pieces Transactions and queries within an instance Data SDS Instance Storage Per-field indexing 65

61 Google MegaStore Transactions across entity groups Entity-group: hierarchically linked records Ramakris Ramakris.preferences Ramakris.posts Ramakris.posts.aug Can transactionally update multiple records within an entity group Records may be on different servers Use Paxos to ensure ACID, deal with server failures Can join records within an entity group Other details Built on top of BigTable Supports schemas, column partitioning, some indexing Phil Bernstein, 66

62 PNUTS 67 67

63 Pros and Cons Pros Reliable geo-replication Scalable consistency model Elastic scaling Easy load balancing Cons System complexity relative to sharded MySQL to support geo-replication, consistency, etc. Latency added by router 68

64 HBASE 69 69

65 Architecture Client Client Client Client Client REST API Java Client HBaseMaster HRegionServer HRegionServer HRegionServer HRegionServer Disk Disk Disk Disk 70

66 HRegion Server Records partitioned into MemStores Each MemStore contains many HFiles All writes to MemStore applied to single memcache Reads consult HFiles and memcache Memcaches flushed as HFiles (HDFS files) when full Compactions limit number of HFiles writes HRegionServer Memcache reads MemStore HFiles Flush to disk 71

67 Pros and Cons Pros High write throughput Elastic scaling Easy load balancing Column storage for OLAP workloads Cons Writes not immediately persisted to disk Reads cross multiple disk, memory locations No geo-replication Latency/bottleneck of HBaseMaster when using REST 72

68 CASSANDRA 73 73

69 Architecture Facebook s storage system BigTable data model Dynamo partitioning and consistency model Peer-to-peer architecture Client Client Client Client Client Cassandra node Cassandra node Cassandra node Cassandra node Disk Disk Disk Disk 74

70 Cassandra Server Writes go to log and memory table Periodically memory table merged with disk table Update Cassandra node RAM Memtable (later) Log Disk SSTable file 75

71 Pros and Cons Pros Elastic scalability Easy management Peer-to-peer configuration BigTable model is nice Flexible schema, column groups for partitioning, versioning, etc. Eventual consistency is scalable Cons Eventual consistency is hard to program against No built-in support for geo-replication Gossip can work, but is not really optimized for, cross-datacenter Load balancing? Consistent hashing limits options System complexity P2P systems are complex; have complex corner cases 76

72 NUMBERS Keep in mind that these systems are in active development, and these numbers represent just a current snapshot! 77 77

73 Benchmark tiers Tier 1 Performance For constant hardware, increase offered throughput until saturation Measure resulting latency/throughput curve Sizeup in Wisconsin benchmark terminology Tier 2 Scalability Scaleup Increase hardware, data size and workload proportionally. Measure latency; should be constant Elastic speedup Run workload against N servers; while workload is running att N+1th server; measure timeseries of latencies (should drop after adding server) 78

74 Test setup Setup Six server-class machines 8 cores (2 x quadcore) 2.5 GHz CPUs, 8 GB RAM, 6 x 146GB 15K RPM SAS drives in RAID 1+0, Gigabit ethernet, RHEL 4 Plus extra machines for clients, routers, controllers, etc. Cassandra (0.6.0-beta2 for range queries) HBase MySQL organized into a sharded configuration Sherpa 1.8 with MySQL No replication; force updates to disk (except HBase, which primarily commits to memory) Workloads 120 million 1 KB records = 20 GB per server Reads retrieve whole record; updates write a single field 100 or more client threads Caveats Write performance would be improved for Sherpa, sharded MySQL and Cassandra with a dedicated log disk We tuned each system as well as we knew how, with assistance from the teams of developers 79

75 Workload A Update heavy 50/50 Read/update Workload A - Read latency Workload A - Update latency Average read latency (ms) Update latency (ms) Throughput (ops/sec) Throughput (ops/sec) Cassandra Hbase Sherpa MySQL Cassandra Hbase Sherpa MySQL Comment: Cassandra is optimized for writes, and achieves higher throughput and lower latency. Sherpa and MySQL achieve roughly comparable performance, as both are limited by MySQL s capabilities. HBase has good write latency, because of commits to memory, and somewhat higher read latency, because of the need to reconstruct records. 80

76 95/5 Read/update Workload B Read heavy Workload B - Read latency Workload B - Update latency Average read latency (ms) Throughput (operations/sec) Average update latency (ms) Throughput (operations/sec) Cassandra HBase Sherpa MySQL Cassandra Hbase Sherpa MySQL Comment: Sherpa does very well here, with better read latency only one lookup into a B- tree is needed for reads, unlike log-structured systems where records must be reconstructed. Cassandra also performs well, matching Sherpa until high throughputs. HBase does well also, although read time is higher. 81

77 Workload E short scans Scans of records of size 1KB 120 Workload E - Scan latency Average scan latency (ms) Throughput (operations/sec) Hbase Sherpa Cassandra Comment: HBase and Sherpa are roughly equivalent for latency and peak throughput, even though HBase is meant for scans. Cassandra s performance is poor, but the development team notes that many optimizations still need to be done. 82

78 Workload E range size Vary size of range scans Range size versus latency (Workload E) Average range scan latency (ms) Max range size (records) Hbase Sherpa Comment: For small ranges, queries are similar to random lookups; Sherpa is efficient for random lookups and does well. As range increases, HBase begins to perform better since it is optimized for large scans 83

79 Scale-up Read heavy workload with varying hardware Read latency during scale-up Average read latency (ms) Number of servers Cassandra Hbase Sherpa Comment: Sherpa and Casandra scale well, with flat latency as system size increases. HBase is very unstable; 3 servers or less performs very poorly. 84

80 Elasticity Run a read-heavy workload on 2 servers; add a 4 th, then 5 th, then 6 th server. 140 Sherpa Elasticity - 5th to 6th server 120 Read latency (ms) Duration of test (min) Comment: Sherpa shows variance in response time as tablets are moving, but after the data moves, it settles into an average that is faster than before the sixth server was added. 85

81 Elasticity Run a read-heavy workload on 2 servers; add a 4 th, then 5 th, then 6 th server. Cassandra Elasticity 5 th to 6 th server Read latency (ms) Duration of test (min) Comment: Cassandra shows lots of latency variance as it moves data to the new server, and takes multiple hours to stabilize 86

82 Elasticity Run a read-heavy workload on 2 servers; add a 4 th, then 5 th, then 6 th server. 250 HBase Elasticity - 5th to 6th Server 200 Read latency (ms) Test duration (min) Comment: HBase shows a small latency bump as the cluster reconfigures. But data is not moved to the new server until a compaction is performed (not shown in the graph) 87

83 Cassandra vs 0.5 Workload B - Read heavy Average latency (ms) Throughput (operations/sec) Cas 0.5 Read Cas 0.5 Update Cas Read Cas Update 88

84 For more information Contact: Brian Cooper Detailed writeup of benchmark: Open source YCSB tool coming soon (watch nt/ycsb) 89

85 Ongoing and Future Work ACID xacts over entity groups, coupled with entity group timelines across replicas Entity groups limited to a single machine, e.g., all records belonging to a user Auto tuning Multitenancy with SLAs Materialized views Alternative storage unit architectures LSM files (Stasis) Hybrid OLAP/OLTP workloads? Leveraging large (aggregate) main memories 95

86 New in 2010! SIGMOD and SIGOPS are starting a new annual conference, to be co-located alternately with SIGMOD and SOSP: ACM Symposium on Cloud Computing (SoCC) PC Chairs: Surajit Chaudhuri & Mendel Rosenblum Steering committee: Phil Bernstein, Ken Birman, Joe Hellerstein, John Ousterhout, Raghu Ramakrishnan, Doug Terry, John Wilkes 96

87 QUESTIONS? 97 97

Yahoo! Cloud Serving Benchmark

Yahoo! Cloud Serving Benchmark Yahoo! Cloud Serving Benchmark Overview and results March 31, 2010 Brian F. Cooper cooperb@yahoo-inc.com Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears System setup and

More information

Sherpa: Cloud Computing of the Third Kind

Sherpa: Cloud Computing of the Third Kind Sherpa: Cloud Computing of the Third Kind Raghu Ramakrishnan Yahoo! and Platform Engineering Team What s in a Name? Data Intensive Super Scalable Computing Grid Computing Super Computing Cloud Computing

More information

Perspectives on Cloud Computing

Perspectives on Cloud Computing Perspectives on Cloud Computing Raghu Ramakrishnan Yahoo! Fellow Chief Scientist, Audience and Cloud Computing (Many slides courtesy of others at Yahoo!) 1 Outline Several applications Some takeaways on

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

NoSQL Data Base Basics

NoSQL Data Base Basics NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

More information

Benchmarking and Analysis of NoSQL Technologies

Benchmarking and Analysis of NoSQL Technologies Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Amr El Abbadi. Computer Science, UC Santa Barbara amr@cs.ucsb.edu

Amr El Abbadi. Computer Science, UC Santa Barbara amr@cs.ucsb.edu Amr El Abbadi Computer Science, UC Santa Barbara amr@cs.ucsb.edu Collaborators: Divy Agrawal, Sudipto Das, Aaron Elmore, Hatem Mahmoud, Faisal Nawab, and Stacy Patterson. Client Site Client Site Client

More information

Benchmarking Cloud Serving Systems with YCSB

Benchmarking Cloud Serving Systems with YCSB Benchmarking Cloud Serving Systems with YCSB Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears Yahoo! Research Santa Clara, CA, USA {cooperb,silberst,etam,ramakris,sears}@yahoo-inc.com

More information

Benchmarking Cloud Serving Systems with YCSB

Benchmarking Cloud Serving Systems with YCSB Benchmarking Cloud Serving Systems with YCSB Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears Yahoo! Research Santa Clara, CA, USA {cooperb,silberst,etam,ramakris,sears}@yahoo-inc.com

More information

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store Oracle NoSQL Database A Distributed Key-Value Store Charles Lamb, Consulting MTS The following is intended to outline our general product direction. It is intended for information

More information

Building a Cloud for Yahoo!

Building a Cloud for Yahoo! Building a Cloud for Yahoo! Brian F. Cooper, Eric Baldeschwieler, Rodrigo Fonseca, James J. Kistler, P.P.S. Narayan, Chuck Neerdaels, Toby Negrin, Raghu Ramakrishnan, Adam Silberstein, Utkarsh Srivastava,

More information

The relative simplicity of common requests in Web. CAP and Cloud Data Management COVER FEATURE BACKGROUND: ACID AND CONSISTENCY

The relative simplicity of common requests in Web. CAP and Cloud Data Management COVER FEATURE BACKGROUND: ACID AND CONSISTENCY CAP and Cloud Data Management Raghu Ramakrishnan, Yahoo Novel systems that scale out on demand, relying on replicated data and massively distributed architectures with clusters of thousands of machines,

More information

Cloud data store services and NoSQL databases. Ricardo Vilaça Universidade do Minho Portugal

Cloud data store services and NoSQL databases. Ricardo Vilaça Universidade do Minho Portugal Cloud data store services and NoSQL databases Ricardo Vilaça Universidade do Minho Portugal Context Introduction Traditional RDBMS were not designed for massive scale. Storage of digital data has reached

More information

LARGE-SCALE DATA STORAGE APPLICATIONS

LARGE-SCALE DATA STORAGE APPLICATIONS BENCHMARKING AVAILABILITY AND FAILOVER PERFORMANCE OF LARGE-SCALE DATA STORAGE APPLICATIONS Wei Sun and Alexander Pokluda December 2, 2013 Outline Goal and Motivation Overview of Cassandra and Voldemort

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what

More information

Introduction to NOSQL

Introduction to NOSQL Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo

More information

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords

More information

MASTER PROJECT. Resource Provisioning for NoSQL Datastores

MASTER PROJECT. Resource Provisioning for NoSQL Datastores Vrije Universiteit Amsterdam MASTER PROJECT - Parallel and Distributed Computer Systems - Resource Provisioning for NoSQL Datastores Scientific Adviser Dr. Guillaume Pierre Author Eng. Mihai-Dorin Istin

More information

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,

More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk Benchmarking Couchbase Server for Interactive Applications By Alexey Diomin and Kirill Grigorchuk Contents 1. Introduction... 3 2. A brief overview of Cassandra, MongoDB, and Couchbase... 3 3. Key criteria

More information

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010 System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached

More information

Hosting Transaction Based Applications on Cloud

Hosting Transaction Based Applications on Cloud Proc. of Int. Conf. on Multimedia Processing, Communication& Info. Tech., MPCIT Hosting Transaction Based Applications on Cloud A.N.Diggikar 1, Dr. D.H.Rao 2 1 Jain College of Engineering, Belgaum, India

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...

More information

Distributed Systems. Tutorial 12 Cassandra

Distributed Systems. Tutorial 12 Cassandra Distributed Systems Tutorial 12 Cassandra written by Alex Libov Based on FOSDEM 2010 presentation winter semester, 2013-2014 Cassandra In Greek mythology, Cassandra had the power of prophecy and the curse

More information

Introduction to Apache Cassandra

Introduction to Apache Cassandra Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

Distributed Data Stores

Distributed Data Stores Distributed Data Stores 1 Distributed Persistent State MapReduce addresses distributed processing of aggregation-based queries Persistent state across a large number of machines? Distributed DBMS High

More information

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF Non-Stop for Apache HBase: -active region server clusters TECHNICAL BRIEF Technical Brief: -active region server clusters -active region server clusters HBase is a non-relational database that provides

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011 BookKeeper Flavio Junqueira Yahoo! Research, Barcelona Hadoop in China 2011 What s BookKeeper? Shared storage for writing fast sequences of byte arrays Data is replicated Writes are striped Many processes

More information

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Facebook: Cassandra Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/24 Outline 1 2 3 Smruti R. Sarangi Leader Election

More information

PNUTS: Yahoo! s Hosted Data Serving Platform

PNUTS: Yahoo! s Hosted Data Serving Platform PNUTS: Yahoo! s Hosted Data Serving Platform Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni Yahoo!

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Cassandra A Decentralized, Structured Storage System

Cassandra A Decentralized, Structured Storage System Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922

More information

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory) WHITE PAPER Oracle NoSQL Database and SanDisk Offer Cost-Effective Extreme Performance for Big Data 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Abstract... 3 What Is Big Data?...

More information

Cloud Computing Is In Your Future

Cloud Computing Is In Your Future Cloud Computing Is In Your Future Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com http://www.reliablesoftware.com/dasblog/default.aspx Cloud Computing is Utility Computing Illusion

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at

More information

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015 Cloud DBMS: An Overview Shan-Hung Wu, NetDB CS, NTHU Spring, 2015 Outline Definition and requirements S through partitioning A through replication Problems of traditional DDBMS Usage analysis: operational

More information

Cassandra A Decentralized Structured Storage System

Cassandra A Decentralized Structured Storage System Cassandra A Decentralized Structured Storage System Avinash Lakshman, Prashant Malik LADIS 2009 Anand Iyer CS 294-110, Fall 2015 Historic Context Early & mid 2000: Web applicaoons grow at tremendous rates

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Design Patterns for Distributed Non-Relational Databases

Design Patterns for Distributed Non-Relational Databases Design Patterns for Distributed Non-Relational Databases aka Just Enough Distributed Systems To Be Dangerous (in 40 minutes) Todd Lipcon (@tlipcon) Cloudera June 11, 2009 Introduction Common Underlying

More information

Cloud Based Application Architectures using Smart Computing

Cloud Based Application Architectures using Smart Computing Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, XLDB Conference at Stanford University, Sept 2012

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, XLDB Conference at Stanford University, Sept 2012 Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, XLDB Conference at Stanford University, Sept 2012 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP)

More information

Cloud Computing with Microsoft Azure

Cloud Computing with Microsoft Azure Cloud Computing with Microsoft Azure Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com http://www.reliablesoftware.com/dasblog/default.aspx Azure's Three Flavors Azure Operating

More information

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful. Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one

More information

Database Scalability and Oracle 12c

Database Scalability and Oracle 12c Database Scalability and Oracle 12c Marcelle Kratochvil CTO Piction ACE Director All Data/Any Data marcelle@piction.com Warning I will be covering topics and saying things that will cause a rethink in

More information

Enabling Database-as-a-Service (DBaaS) within Enterprises or Cloud Offerings

Enabling Database-as-a-Service (DBaaS) within Enterprises or Cloud Offerings Solution Brief Enabling Database-as-a-Service (DBaaS) within Enterprises or Cloud Offerings Introduction Accelerating time to market, increasing IT agility to enable business strategies, and improving

More information

A survey of big data architectures for handling massive data

A survey of big data architectures for handling massive data CSIT 6910 Independent Project A survey of big data architectures for handling massive data Jordy Domingos - jordydomingos@gmail.com Supervisor : Dr David Rossiter Content Table 1 - Introduction a - Context

More information

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012 Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics data 4

More information

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

Benchmarking Replication in NoSQL Data Stores

Benchmarking Replication in NoSQL Data Stores Imperial College London Department of Computing Benchmarking Replication in NoSQL Data Stores by Gerard Haughian (gh43) Submitted in partial fulfilment of the requirements for the MSc Degree in Computing

More information

High Availability for Database Systems in Cloud Computing Environments. Ashraf Aboulnaga University of Waterloo

High Availability for Database Systems in Cloud Computing Environments. Ashraf Aboulnaga University of Waterloo High Availability for Database Systems in Cloud Computing Environments Ashraf Aboulnaga University of Waterloo Acknowledgments University of Waterloo Prof. Kenneth Salem Umar Farooq Minhas Rui Liu (post-doctoral

More information

Introduction to Big Data Training

Introduction to Big Data Training Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB

More information

Design and Evolution of the Apache Hadoop File System(HDFS)

Design and Evolution of the Apache Hadoop File System(HDFS) Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop

More information

HDMQ :Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues. Dharmit Patel Faraj Khasib Shiva Srivastava

HDMQ :Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues. Dharmit Patel Faraj Khasib Shiva Srivastava HDMQ :Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues Dharmit Patel Faraj Khasib Shiva Srivastava Outline What is Distributed Queue Service? Major Queue Service

More information

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra dwd@fnal.gov Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359

More information

Using an In-Memory Data Grid for Near Real-Time Data Analysis

Using an In-Memory Data Grid for Near Real-Time Data Analysis SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses

More information

Preparing Your Data For Cloud

Preparing Your Data For Cloud Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability

More information

Social Networks and the Richness of Data

Social Networks and the Richness of Data Social Networks and the Richness of Data Getting distributed Webservices Done with NoSQL Fabrizio Schmidt, Lars George VZnet Netzwerke Ltd. Content Unique Challenges System Evolution Architecture Activity

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

Study and Comparison of Elastic Cloud Databases : Myth or Reality? Université Catholique de Louvain Ecole Polytechnique de Louvain Computer Engineering Department Study and Comparison of Elastic Cloud Databases : Myth or Reality? Promoters: Peter Van Roy Sabri Skhiri

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul

More information

High Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es

High Throughput Computing on P2P Networks. Carlos Pérez Miguel carlos.perezm@ehu.es High Throughput Computing on P2P Networks Carlos Pérez Miguel carlos.perezm@ehu.es Overview High Throughput Computing Motivation All things distributed: Peer-to-peer Non structured overlays Structured

More information

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014 Highly available, scalable and secure data with Cassandra and DataStax Enterprise GOTO Berlin 27 th February 2014 About Us Steve van den Berg Johnny Miller Solutions Architect Regional Director Western

More information

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2 DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing Slide 1 Slide 3 A style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

More information

Cloud Design and Implementation. Cheng Li MPI-SWS Nov 9 th, 2010

Cloud Design and Implementation. Cheng Li MPI-SWS Nov 9 th, 2010 Cloud Design and Implementation Cheng Li MPI-SWS Nov 9 th, 2010 1 Modern Computing CPU, Mem, Disk Academic computation Chemistry, Biology Large Data Set Analysis Online service Shopping Website Collaborative

More information

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

More information

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

More information

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment International Journal of Applied Information Systems (IJAIS) ISSN : 2249-868 Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment Yusuf Abubakar Department of Computer Science

More information

Data Management in the Cloud

Data Management in the Cloud Data Management in the Cloud Ryan Stern stern@cs.colostate.edu : Advanced Topics in Distributed Systems Department of Computer Science Colorado State University Outline Today Microsoft Cloud SQL Server

More information

How To Write A Database Program

How To Write A Database Program SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab Outline A brief history of DBMSs. OSs SQL NoSQL 1960/70 1980+ 2000+ Before Computers Database DBMS/Data Store

More information

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure

More information

Amazon EC2 Product Details Page 1 of 5

Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of

More information

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers BASEL UNIVERSITY COMPUTER SCIENCE DEPARTMENT Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers Distributed Information Systems (CS341/HS2010) Report based on D.Kassman, T.Kraska,

More information

bigdata Managing Scale in Ontological Systems

bigdata Managing Scale in Ontological Systems Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural

More information

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL

More information

NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management

NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management A B M Moniruzzaman Department of Computer Science and Engineering, Daffodil International

More information

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00 Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Apache Hadoop FileSystem and its Usage in Facebook

Apache Hadoop FileSystem and its Usage in Facebook Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs

More information

Benchmarking Failover Characteristics of Large-Scale Data Storage Applications: Cassandra and Voldemort

Benchmarking Failover Characteristics of Large-Scale Data Storage Applications: Cassandra and Voldemort Benchmarking Failover Characteristics of Large-Scale Data Storage Applications: Cassandra and Voldemort Alexander Pokluda Cheriton School of Computer Science University of Waterloo 2 University Avenue

More information

Overview: X5 Generation Database Machines

Overview: X5 Generation Database Machines Overview: X5 Generation Database Machines Spend Less by Doing More Spend Less by Paying Less Rob Kolb Exadata X5-2 Exadata X4-8 SuperCluster T5-8 SuperCluster M6-32 Big Memory Machine Oracle Exadata Database

More information

Big Fast Data Hadoop acceleration with Flash. June 2013

Big Fast Data Hadoop acceleration with Flash. June 2013 Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional

More information

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go

More information