LARGE-SCALE DATA STORAGE APPLICATIONS
Transcription
1 BENCHMARKING AVAILABILITY AND FAILOVER PERFORMANCE OF LARGE-SCALE DATA STORAGE APPLICATIONS
Wei Sun and Alexander Pokluda
December 2, 2013
2 Outline
- Goal and Motivation
- Overview of Cassandra and Voldemort Design
- Benchmark Setup and Methodology
- Preliminary Results and Status Report
- Conclusion
3 Goal and Motivation
- Goal: to understand the failover characteristics of large-scale data storage applications
- Few benchmarks of failover characteristics have been done
- We have chosen to study the following:
  - Cassandra
  - Voldemort
  - HBase (if time permits)
- Cassandra and Voldemort were chosen because both emphasize availability and performance
- HBase, which emphasizes consistency and performance, was chosen to cover a wider range of architectures
4 OVERVIEW OF CASSANDRA AND VOLDEMORT DESIGN
5 Cassandra Architecture
- Mixture of Dynamo and BigTable
- Consistent hashing
  - Order-preserving hash function
- Various replication options
  - Rack unaware, rack aware, datacenter shard
- Fault tolerance
  - Accrual failure detection
- Node failure
  - Down and up: Zookeeper
  - Down entirely: replacement
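The consistent-hashing idea named on this slide can be illustrated with a minimal sketch. This is not Cassandra's implementation: the class name `HashRing`, the `vnodes` parameter, and the use of MD5 are assumptions made for illustration, and MD5 is not an order-preserving hash function like the one the slide mentions. The sketch only shows how keys map to nodes on a ring and how replicas are chosen by walking clockwise.

```python
import bisect
import hashlib


def _token(key: str) -> int:
    # Map a string to a ring position. MD5 is an illustrative choice only;
    # it is NOT order preserving.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with a few virtual tokens per node."""

    def __init__(self, nodes, vnodes=8):
        self._ring = sorted(
            (_token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, key, n=3):
        """Walk clockwise from the key's token and collect n distinct nodes."""
        start = bisect.bisect_right(self._tokens, _token(key))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == n:
                break
        return found
```

The property that matters for failover is that removing a node only moves the keys that node owned; keys owned by the surviving nodes keep their primary.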
6 Voldemort
7 Voldemort
- Consistent hashing
- Zone-aware replication
  - User-defined per-zone replication factor
- Consistency
  - Read-repair & quorum
  - Vector-clock versioning
- Node failure
  - Hinted handoff
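Vector-clock versioning can be sketched in a few lines. The function names `increment` and `compare` are hypothetical and do not correspond to Voldemort's API; the sketch only shows the general technique of deciding whether one version precedes another or whether the two are concurrent and need reconciliation.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

For example, two writes that both start from the same version but are coordinated by different nodes compare as "concurrent", which is the case read-repair has to resolve.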
8 Cassandra vs. Voldemort
Data Model
- Cassandra: column database, multi-dimensional
- Voldemort: key-value datastore, hash table
Replication
- Cassandra: synchronous/asynchronous, chosen by application: rack unaware, rack aware, across datacenter
- Voldemort: zone-aware replication
Partitioning
- Cassandra: consistent hashing (order-preserving hash function)
- Voldemort: consistent hashing
Consistency Model
- Cassandra: tunable all the way from "writes never fail" to "block for all replicas to be readable", with the quorum level in the middle
- Voldemort: tunable; quorum, read-repair, hinted handoff
Data Storage
- Cassandra: disk
- Voldemort: pluggable storage engines: BDB-JE, MySQL, Read-Only
Developed
- Cassandra: by Facebook in Java
- Voldemort: by LinkedIn in Java
9 BENCHMARK SETUP AND METHODOLOGY
10 Methodology
- Using the Yahoo! Cloud Serving Benchmark (YCSB) for load generation and reporting
  - Extensible and generic framework for evaluation of key-value stores that has become an industry standard
  - Can generate synthetic workloads that consist of a configurable distribution of CRUD operations
- Measure latency for a variety of throughputs
- Measure throughput vs time and error count for blue-sky and failure scenarios
  - Node failures simulated using kill -9
  - Network failures simulated using tcpkill and firewalls
- Nodes run on AWS
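A throughput-vs-time series of the kind described above can be derived from per-operation completion timestamps. The sketch below is hypothetical post-processing, not part of YCSB: the function name `throughput_per_window`, its input format (non-negative completion times in milliseconds since the start of the run), and the default one-second window are all assumptions.

```python
from collections import Counter


def throughput_per_window(completion_times_ms, window_ms=1000):
    """Count completed operations per fixed window and return ops/sec per window.

    Windows with no completed operations are reported as 0.0 so that a stall
    after a node failure stays visible in the series.
    """
    if not completion_times_ms:
        return []
    counts = Counter(int(t // window_ms) for t in completion_times_ms)
    last_window = max(counts)
    scale = 1000.0 / window_ms
    return [counts.get(w, 0) * scale for w in range(last_window + 1)]
```

Reporting empty windows explicitly matters for failure scenarios, because a drop to zero throughput is exactly the event the benchmark is trying to observe.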
11 Cluster Configuration
Experimental setup consists of 5 nodes on AWS
- Database Servers: 4x m1.xlarge Spot Instances
  - 4 vCPU (8 ECU), 15 GiB RAM
  - 4 x 420 GB disks as RAID 1+0
- YCSB Load Generator: 1x m3.2xlarge Spot Instance
  - 8 vCPU (26 ECU), 30 GiB RAM
  - Network throughput between database and YCSB servers consistently > 960 Mbit/s
- NFS Server: 1x m1.small On-Demand Instance
  - 1 vCPU (1 ECU), 1.7 GiB RAM
  - 10 GB persistent EBS volume mounted at /home on all servers
12 Workloads and Parameters
- Workload A: Update-heavy workload
  - 50% reads, 50% writes
  - Example: session store recording recent actions
- Workload B: Read-mostly workload
  - 95% reads, 5% writes
  - Example: photo tagging
- Workload C: Read only
  - 100% reads
  - Example: user profile cache
13 Workloads and Parameters
- Workload D: Read-latest workload
  - New records inserted, reads mostly on latest inserted
  - Example: user status updates
- Workload E: Short ranges
  - Short ranges queried
  - Example: threaded conversations (clustered by thread ID)
- Workload F: Read-modify-write
  - Records read, modified and written back
  - Example: user database
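The read/write proportions given for Workloads A to C can be written down as a small table and sampled from. This is a sketch of the idea of a configurable operation mix, not YCSB's own configuration format; the names `WORKLOADS` and `sample_operations` are hypothetical. Workloads D to F are left out because the slides give no percentages for them.

```python
import random

# Operation mixes for the three workloads whose percentages the slides state.
WORKLOADS = {
    "A": {"read": 0.50, "write": 0.50},  # update heavy
    "B": {"read": 0.95, "write": 0.05},  # read mostly
    "C": {"read": 1.00},                 # read only
}


def sample_operations(mix, count, seed=0):
    """Draw `count` operation names according to the proportions in `mix`."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    operations = list(mix)
    weights = [mix[op] for op in operations]
    return rng.choices(operations, weights=weights, k=count)
```

A load generator would then issue each sampled operation against the store and record its latency and outcome.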
14 PRELIMINARY RESULTS AND STATUS REPORT
15 Cassandra: Latency vs Throughput
[Figure: Cassandra, no failures]
16
17 Cassandra: Throughput vs Time
- Left: No failures
- Right: 1 of 4 nodes killed at ms
18 Challenges
- The YCSB GitHub repository is out of date and the documentation is incomplete
  - Only Cassandra 0.5, 0.6, and 0.7 are supported
  - A patch was submitted for Cassandra but not documented
  - Only Voldemort 0.81 is supported (but this is not documented)
- After a long search I found someone's personal fork of YCSB with support for Cassandra 2.0 (CQL), and an updated patch for Voldemort 0.96 in a Voldemort contributor's GitHub repository
19 Status Report
- Need to re-run Cassandra tests with the correct threadcount parameter and the fork of YCSB that supports Cassandra 2.0
- ObsoleteVersionExceptions are preventing the Voldemort benchmark from progressing
  - Contacted Voldemort developers through the issue tracker: they said some ObsoleteVersionExceptions are normal
  - I'm working on patching the code based on stotch's recommendations
- Need to test failure scenarios
- Need to run HBase tests (time permitting)
20 CONCLUSION
21 Summary
- Our goal is to understand failover characteristics of large-scale data storage applications
- Few benchmarks of failover characteristics have been done
- Presented an overview of the design of Cassandra and Voldemort
- Presented preliminary benchmark results using YCSB on a small cluster of nodes running in AWS
22 Lessons Learned
- Learned how to use AWS EC2 and VPC
- Learned differences between EBS, Instance Store and S3 and how to create AMIs
- Learned about instance types, placement groups and on-demand vs spot instances
- Learned about Regions and availability zones and how AWS is designed
  - Designed to isolate failures
- Learned about the design and implementation of several NoSQL systems and how to install, configure and tune them
23 QUESTIONS?
