Benchmarking and Analysis of NoSQL Technologies

Size: px
Start display at page:

Download "Benchmarking and Analysis of NoSQL Technologies"

Transcription

1 Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune Abstract The social web generates terabytes of unstructured, user generated data, spread across thousands of commodity servers. The changed face of web based applications forced to invent new approaches to data management. The NoSQL databases were created as a mean to offer high performance (both in terms of speed and size) and high availability. They share the goals of massive scaling on demand (elasticity). Hence NoSQL is best suited solution to the big data needs posed by the evolving web application. With the advent of newer NoSQL technologies each day, developers face a serious issue in selecting the best suited datastore for their application as each of them place very different demands. The NoSQL databases haven t been tested for their claims. In this paper we discuss the benchmarking results of the 4 most popular NoSQL technologies namely- Cassandra, HBase, MongoDB and CouchDB. Also we analyze the results based on the performance of each datastore on various parameters. We are using the YCSB (Yahoo! Cloud Serving Benchmark) tool to compare Cassandra vs. HBase (key-value stores), and MongoDB vs. CouchDB (document-oriented databases). We are benchmarking the NoSQL datastore on tiers of Performance, Scalability and Availability. Keywords YCSB, Elasticity, Availability, Scale I. INTRODUCTION Relational database technology, invented in the 1970s are still in widespread use today, was optimized for the applications, users and infrastructure of that era. In some regards, it is the last domino to fall in the inevitable march towards fully-distributed software architecture. While a number of band aids have extended the useful life of the technology (horizontal and vertical sharding, distributed caching and data de-normalization), these tactics nullify key benefits of the relational model while increasing total system cost and complexity. In response to the lack of commercially available alternatives, organizations such as Google and Amazon developed their own kind of schema less stores- NoSQL data stores. These NoSQL or nonrelational database technologies are a better match for the needs of modern interactive software systems. But not every company can or should develop, maintain and support its own database technology. Our goal is to test the four technologies mentioned before using YCSB. YCSB test tool comes with a set of core workloads and a bunch of parameter files. The parameter file can be changed on a per DB instance. A workload represents the load that a given application will put on the database system. For benchmarking purposes, we define workloads that are relatively simple compared to real applications, so that we can better reason about the benchmarking results we get. However, a workload should be detailed enough so that once we measure the database s performance, we know what kinds of applications might experience similar performance. Thus we are writing our own new workloads by loading our own data on different datastores and running different operations on them. We are implementing aggregation as complex operations on the loaded data since aggregation operations are the most relevant and common ones. We are using node.js script that automates data generation as per the format specified in a file. Data up to 100GB has been produced using this script. The tier of scalability can be tested by adding newer nodes and data or removing existing ones and evaluating the behaviour of each datastore. II. PROBLEM CONTEXT We set the context of our work by briefly discussing the presence and need of NoSQL and architecture of YCSB. A. Presence and Need of NoSQL Just at the time when the database market seemed to many to be almost completely mature, a group of nonrelational data stores, collectively categorized as NoSQL databases, have attracted significant attention. These databases are often employed in public, massively scaled Web site scenarios, where traditional database features matter less, and fast fetching of relatively simple data sets matters most.eg Facebook with its 500,000,000 users or Twitter which accumulates Terabytes of data every single day. Many of these databases employ parallelized query mechanisms, horizontal partitioning and allow storage of heterogeneous, loosely-schematized data records. 422

2 In a NoSQL database, there is no fixed schema and no joins. A RDBMS "scales up" by getting faster and faster hardware and adding memory. NoSQL, on the other hand, can take advantage of "scaling out". Scaling out refers to spreading the load over many commodity systems. This is the component of NoSQL that makes it an inexpensive solution for large datasets. 1) Categories of NoSQL: The NoSQL technologies can be roughly classified into four categories Key- value stores, Column oriented stores, Document oriented and Graph based databases. Each of these has their idiosyncrasies and data model. However each of them use map-reduce for parallelized query and computation. The two kinds of categories that we are considering are Column oriented (HBase and Cassandra) and Document oriented data stores (MongoDB and CouchDB). B. YCSB YCSB Yahoo! Cloud Serving Benchmark is a benchmarking tool as well as a standard developed by Yahoo co. to test their PNUTS datastore, later open sourced in April A key design goal of this tool is extensibility, so that it can be used to benchmark new cloud database systems, and new workloads can be developed. YCSB benchmark differs from standard OLTP benchmarks like TPC-C. TPC-C contains several diverse types of queries meant to mimic a company warehouse environment. Some queries execute transactions over multiple tables. In contrast, the web applications we are benchmarking tend to run a huge number of extremely simple queries 1) Architecture of YCSB: The YCSB Client is a Java program for generating the data to be loaded to the database, and generating the operations which make up the workload. The architecture of the client is shown in Figure 1. The basic operation is that the workload executor drives multiple client threads. Each thread executes a sequential series of operations by making calls to the database interface layer, both to load the database (the load phase) and to execute the workload (the transaction phase). The threads throttle the rate at which they generate requests, so that we may directly control the offered load against the database. The threads also measure the latency and achieved throughput of their operations, and report these measurements to the statistics module. At the end of the experiment, the statistics module aggregates the measurements and reports average, 95th and 99th percentile latencies, and either a histogram or time series of the latencies. The Workload Executor contains code to execute both the load and transaction phases of the workload. The Database Interface Layer translates simple requests (such as read ()) from the client threads into calls against the database. C. Benchmarking Tiers Fig: 1 The cloud serving systems make different tradeoffs to make them suit a particular application scenario. There are many design decisions to make when building one of NoSQL systems, and those decisions have a huge impact on how the system performs for different workloads (e.g., read-heavy workloads vs. write-heavy workloads), how it scales, how it handles failures, ease of operation and tuning, etc. Hence we cannot have a cloud serving systems which is a panacea to all problems related to handling huge data sets. It hence becomes very vital to specify the criterions for evaluating the datastores. In this section we discuss the tiers on which we report about the NoSQL data stores briefly. 1) Tier 1- Performance: Performance tier measures the latency of requests when a database is under load. This is the most basic yet very crucial benchmarking tier, as the first question before any deployment is how much load the datastore will take per server. Latency of requests is strongly related to the throughput of the system. As the throughput is pushed higher, latency grows gradually. Eventually, as the system the system reaches saturation, latency jumps dramatically, and then the throughput can be pushed no higher. 2) Tier 2- Scalability: The big selling point of cloud serving systems is their ability to scale upward by adding more servers. 423

3 If the system offers low latency at small scale, as we proportionally increase workload size and the number of servers, latency should remain constant. If this is not the case, this is a hint the system might have bottlenecks that surface at scale. Elasticity is basically the characteristics of cloud serving systems to handle data while more nodes and servers are added while the system is running. 3) Tier 3 Availability: The tier of availability tests the capability of the datastore to provide data to the user despite of node failure. In a real life scenario, failure of commodity hardware s or servers is very common. Such failures can be caused due to network fails or other reasons. Usually the data stored in a cluster is replicated over multiple servers. The tier of availability is tested by Ssh to kill nodes of a cluster and checking the availability of data. The results are based on a workload specification. The Workloads are designed according to certain real life scenarios, specifying different parameters like read/write mix, record count, distribution etc. The difference in results, 95th and 99th percentile latencies, minimum, maximum and average latencies, and achieved throughput are a result of the tradeoffs made by each cloud serving system. As depicted by the figure 3.1, HBase could perform up to 7 operations within 0 seconds, whereas MongoDB displayed a spectacular performance by having 4223 operations done within the same time span. The results for the maximum and minimum latencies showed by the two cloud serving datastores HBase and MongoDB can be seen in the graph below (Fig 3.2). III. DESIGN AND IMPLEMENTATION We now present the specifics of our work. For our experiments, we have used 32-bit 2.5GHz Intel Dual-Core processors, 4GB of RAM to run each system. We also tested Cassandra 0.8 and HBase on multiple nodes (up to five nodes were created on a single machine). The YCSB Client was run with up to 100 threads i.e. client instances. Also we use Amazon Elastic Compute Cloud (Amazon EC2), which is basically a web service that provides resizable compute capacity in the cloud, to test the datastores with varying processing and data capacities. Fig: 3.2 Fig: 3.1 The graph depicted illustrates the performance measure in terms of percentile latencies and throughput for HBase v/s MongoDB. IV. EXPERIMENTAL EVALUATION We present benchmarking results for the systems tested by us. While Cassandra and HBase share similar data model, their underlying implementations are quite different. In our tests, we ran the workloads designed by us, both to measure performance (benchmark tier 1) and to measure scalability and elasticity (benchmark tier 2). Cassandra has a lower Read Latency than HBase on a readheavy workload. Whereas HBase emerges as a winner while running a write-heavy workload as it has an Update Latency lower than Cassandra. Cassandra scaled well as the number of nodes increased. It should be noted that the results reported here are for particular versions of datastore systems that are undergoing continuous development, and that performance may change and improve in the future. 424

4 V. DISCUSSIONS AND FUTURE WORK One aspect of the NoSQL movement has been a move away from trying to maintain completely perfect consistency across distributed servers which is achieved using ACID (Atomicity, Consistency, Isolation, and Durability) properties. To reduce this burden it places on databases, particularly in distributed systems, NoSQL databases have adopted BASE (Basically Available, Soft state, Eventually Consistent) properties and follow CAP theorem. The now famous CAP theorem states that of consistency (all nodes see the same data at the same time), availability (a guarantee that every request receives a response about whether it was successful or failed), and network partitioning (the system continues to operate despite arbitrary message loss); only two can be guaranteed at any time. Traditional relational databases have kept strict transactional semantics to preserve consistency, but many NoSQL databases are moving towards a more scalable architecture that relaxes consistency. Relaxing consistency is often called eventual consistency. This permits much more scalable distributed storage systems where writes can occur without using two phase commits or system-wide locks. Map Reduce is a software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers. Map reduce is a programming model that serves for processing large data set in a massively parallel manner. Map reduce is used by almost all NoSQL technologies in some or the other way. "Map" step: The master node takes the input, partitions it up into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node. "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output the answer to the problem it was originally trying to solve. In addition to performance comparisons, it is important to examine other aspects of cloud serving systems. Replication can be considered as one more benchmarking tier to examine these NoSQL technologies. Cloud systems use replication for both availability and performance. Replicas can be used for failover, and to spread out read load. VI. CONCLUSION We have presented the Yahoo Cloud Serving Benchmark results for the 4 technologies. This benchmark is designed to provide tools for apples-to apples comparison of different serving data stores. One contribution of the benchmark is an extensible workload generator, the YCSB Client, which can be used to load datasets and execute workloads across a variety of data serving systems. Another contribution is the definition of five core workloads, which begin to fill out the space of performance tradeoffs made by these systems. New workloads can be easily created, including generalized workloads to examine system fundamentals, as well as more domain-specific workloads to model particular applications. As an open-source package, the YCSB Client is available for developers to use and extend in order to effectively evaluate cloud systems. We have used this tool to benchmark the performance of four cloud serving systems, and observed that there are clear tradeoffs between read and write performance that result from each system s architectural decisions. REFERENCES [1] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears: Benchmarking Cloud Serving Systems with YCSB. In Yahoo! Research, Santa Clara, CA, USA. [2] Apache Cassandra. [3] Apache CouchDB. [4] Apache HBase. [5] MongoDB. [6] Neal Leavitt: Will NoSQL Databases Live Upto Their Promise?. Published by IEEE Computer Society, [7] Bogdan George Tudorica, Cristian Bucur: A comparison between several NoSQL databases with comments and notes. In Department for Economical Mathematics and Economical Informatics Petroleum-Gas University of Ploiesti, Ploiesti, Romania. [8] Yahoo! Cloud Serving Benchmark. [9] NoSQL Benchmarking. [10] Bringing Big Data to the Enterprise ibm.com/software/data/bigdata/. [11] What is HBase? [12] Java development 2.0: MongoDB: A NoSQL datastore with (all the right) RDBMS moves. [13] Exploring CouchDB

5 [14] Key-Value:Cassandra. [15] YCSB. [16] Lars George: HBase-The Definitive Guide. Published by O REILLY. [17] Eben Hewitt: Cassandra-The Definitive Guide. Published by O RILLY. [18] Kristina Chodorow and Michael Dirolf: MongoDB-The Definitive Guide. Published by O REILLY. [19] J. Chris Anderson, Jan Lehnardt and Noah Slater: CouchDB-The Definitive Guide. Published by O REILLY 426

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Yahoo! Cloud Serving Benchmark

Yahoo! Cloud Serving Benchmark Yahoo! Cloud Serving Benchmark Overview and results March 31, 2010 Brian F. Cooper [email protected] Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears System setup and

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

NoSQL in der Cloud Why? Andreas Hartmann

NoSQL in der Cloud Why? Andreas Hartmann NoSQL in der Cloud Why? Andreas Hartmann 17.04.2013 17.04.2013 2 NoSQL in der Cloud Why? Quelle: http://res.sys-con.com/story/mar12/2188748/cloudbigdata_0_0.jpg Why Cloud??? 17.04.2013 3 NoSQL in der Cloud

More information

An Approach to Implement Map Reduce with NoSQL Databases

An Approach to Implement Map Reduce with NoSQL Databases www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems

More information

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk Benchmarking Couchbase Server for Interactive Applications By Alexey Diomin and Kirill Grigorchuk Contents 1. Introduction... 3 2. A brief overview of Cassandra, MongoDB, and Couchbase... 3 3. Key criteria

More information

MongoDB in the NoSQL and SQL world. Horst Rechner [email protected] Berlin, 2012-05-15

MongoDB in the NoSQL and SQL world. Horst Rechner horst.rechner@fokus.fraunhofer.de Berlin, 2012-05-15 MongoDB in the NoSQL and SQL world. Horst Rechner [email protected] Berlin, 2012-05-15 1 MongoDB in the NoSQL and SQL world. NoSQL What? Why? - How? Say goodbye to ACID, hello BASE You

More information

Big Systems, Big Data

Big Systems, Big Data Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment International Journal of Applied Information Systems (IJAIS) ISSN : 2249-868 Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment Yusuf Abubakar Department of Computer Science

More information

bigdata Managing Scale in Ontological Systems

bigdata Managing Scale in Ontological Systems Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural

More information

LARGE-SCALE DATA STORAGE APPLICATIONS

LARGE-SCALE DATA STORAGE APPLICATIONS BENCHMARKING AVAILABILITY AND FAILOVER PERFORMANCE OF LARGE-SCALE DATA STORAGE APPLICATIONS Wei Sun and Alexander Pokluda December 2, 2013 Outline Goal and Motivation Overview of Cassandra and Voldemort

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI

More information

The Quest for Extreme Scalability

The Quest for Extreme Scalability The Quest for Extreme Scalability In times of a growing audience, very successful internet applications have all been facing the same database issue: while web servers can be multiplied without too many

More information

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information

More information

A survey of big data architectures for handling massive data

A survey of big data architectures for handling massive data CSIT 6910 Independent Project A survey of big data architectures for handling massive data Jordy Domingos - [email protected] Supervisor : Dr David Rossiter Content Table 1 - Introduction a - Context

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

NoSQL Data Base Basics

NoSQL Data Base Basics NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

Benchmarking Replication in NoSQL Data Stores

Benchmarking Replication in NoSQL Data Stores Imperial College London Department of Computing Benchmarking Replication in NoSQL Data Stores by Gerard Haughian (gh43) Submitted in partial fulfilment of the requirements for the MSc Degree in Computing

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions Big Data Solutions Portal Development with MongoDB and Liferay Solutions Introduction Companies have made huge investments in Business Intelligence and analytics to better understand their clients and

More information

Accelerating Big Data: Using SanDisk SSDs for MongoDB Workloads

Accelerating Big Data: Using SanDisk SSDs for MongoDB Workloads WHITE PAPER Accelerating Big Data: Using SanDisk s for MongoDB Workloads December 214 951 SanDisk Drive, Milpitas, CA 9535 214 SanDIsk Corporation. All rights reserved www.sandisk.com Accelerating Big

More information

Introduction to NOSQL

Introduction to NOSQL Introduction to NOSQL Université Paris-Est Marne la Vallée, LIGM UMR CNRS 8049, France January 31, 2014 Motivations NOSQL stands for Not Only SQL Motivations Exponential growth of data set size (161Eo

More information

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) Not Relational Models For The Management of Large Amount of Astronomical Data Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF) What is a DBMS A Data Base Management System is a software infrastructure

More information

Performance Analysis for NoSQL and SQL

Performance Analysis for NoSQL and SQL Available online at www.ijiere.com International Journal of Innovative and Emerging Research in Engineering e-issn: 2394-3343 p-issn: 2394-5494 Performance Analysis for NoSQL and SQL Ms. Megha Katkar ME

More information

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING BRAC UNIVERSITY SCHOOL OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING 12-12-2012 Investigating Cloud Data Storage Sumaiya Binte Mostafa (ID 08301001) Firoza Tabassum (ID 09101028) BRAC University

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

wow CPSC350 relational schemas table normalization practical use of relational algebraic operators tuple relational calculus and their expression in a declarative query language relational schemas CPSC350

More information

Preparing Your Data For Cloud

Preparing Your Data For Cloud Preparing Your Data For Cloud Narinder Kumar Inphina Technologies 1 Agenda Relational DBMS's : Pros & Cons Non-Relational DBMS's : Pros & Cons Types of Non-Relational DBMS's Current Market State Applicability

More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

NoSQL Database Options

NoSQL Database Options NoSQL Database Options Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has

More information

nosql and Non Relational Databases

nosql and Non Relational Databases nosql and Non Relational Databases Image src: http://www.pentaho.com/big-data/nosql/ Matthias Lee Johns Hopkins University What NoSQL? Yes no SQL.. Atleast not only SQL Large class of Non Relaltional Databases

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Cloud Scale Distributed Data Storage. Jürmo Mehine

Cloud Scale Distributed Data Storage. Jürmo Mehine Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented

More information

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL [email protected] / @marcua

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg Adam Marcus MIT CSAIL [email protected] / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in

More information

BIG DATA TOOLS. Top 10 open source technologies for Big Data

BIG DATA TOOLS. Top 10 open source technologies for Big Data BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed

More information

MASTER PROJECT. Resource Provisioning for NoSQL Datastores

MASTER PROJECT. Resource Provisioning for NoSQL Datastores Vrije Universiteit Amsterdam MASTER PROJECT - Parallel and Distributed Computer Systems - Resource Provisioning for NoSQL Datastores Scientific Adviser Dr. Guillaume Pierre Author Eng. Mihai-Dorin Istin

More information

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010 System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached

More information

these three NoSQL databases because I wanted to see a the two different sides of the CAP

these three NoSQL databases because I wanted to see a the two different sides of the CAP Michael Sharp Big Data CS401r Lab 3 For this paper I decided to do research on MongoDB, Cassandra, and Dynamo. I chose these three NoSQL databases because I wanted to see a the two different sides of the

More information

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,

More information

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY II MANNING Shelter Island contents foreword preface xvii xix acknowledgments xxi about this book xxii Part 1 Introduction

More information

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

Introduction to Big Data Training

Introduction to Big Data Training Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB

More information

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

NoSQL. Thomas Neumann 1 / 22

NoSQL. Thomas Neumann 1 / 22 NoSQL Thomas Neumann 1 / 22 What are NoSQL databases? hard to say more a theme than a well defined thing Usually some or all of the following: no SQL interface no relational model / no schema no joins,

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

NoSQL Systems for Big Data Management

NoSQL Systems for Big Data Management NoSQL Systems for Big Data Management Venkat N Gudivada East Carolina University Greenville, North Carolina USA Venkat Gudivada NoSQL Systems for Big Data Management 1/28 Outline 1 An Overview of NoSQL

More information

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB bankmark UG (haftungsbeschränkt) Bahnhofstraße 1 9432 Passau Germany www.bankmark.de [email protected] T +49 851 25 49 49 F +49 851 25 49 499 NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB,

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

NoSQL Databases. Nikos Parlavantzas

NoSQL Databases. Nikos Parlavantzas !!!! NoSQL Databases Nikos Parlavantzas Lecture overview 2 Objective! Present the main concepts necessary for understanding NoSQL databases! Provide an overview of current NoSQL technologies Outline 3!

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

INTRODUCTION TO CASSANDRA

INTRODUCTION TO CASSANDRA INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open

More information

NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management

NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management A B M Moniruzzaman Department of Computer Science and Engineering, Daffodil International

More information

CloudDB: A Data Store for all Sizes in the Cloud

CloudDB: A Data Store for all Sizes in the Cloud CloudDB: A Data Store for all Sizes in the Cloud Hakan Hacigumus Data Management Research NEC Laboratories America http://www.nec-labs.com/dm www.nec-labs.com What I will try to cover Historical perspective

More information

Introduction to Apache Cassandra

Introduction to Apache Cassandra Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating

More information

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory) WHITE PAPER Oracle NoSQL Database and SanDisk Offer Cost-Effective Extreme Performance for Big Data 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Abstract... 3 What Is Big Data?...

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Transactions and ACID in MongoDB

Transactions and ACID in MongoDB Transactions and ACID in MongoDB Kevin Swingler Contents Recap of ACID transactions in RDBMSs Transactions and ACID in MongoDB 1 Concurrency Databases are almost always accessed by multiple users concurrently

More information

Performance Testing of Big Data Applications

Performance Testing of Big Data Applications Paper submitted for STC 2013 Performance Testing of Big Data Applications Author: Mustafa Batterywala: Performance Architect Impetus Technologies [email protected] Shirish Bhale: Director of Engineering

More information

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra [email protected] Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359

More information

MongoDB Developer and Administrator Certification Course Agenda

MongoDB Developer and Administrator Certification Course Agenda MongoDB Developer and Administrator Certification Course Agenda Lesson 1: NoSQL Database Introduction What is NoSQL? Why NoSQL? Difference Between RDBMS and NoSQL Databases Benefits of NoSQL Types of NoSQL

More information

Benchmarking Failover Characteristics of Large-Scale Data Storage Applications: Cassandra and Voldemort

Benchmarking Failover Characteristics of Large-Scale Data Storage Applications: Cassandra and Voldemort Benchmarking Failover Characteristics of Large-Scale Data Storage Applications: Cassandra and Voldemort Alexander Pokluda Cheriton School of Computer Science University of Waterloo 2 University Avenue

More information

Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf

Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Rong Gu,Qianhao Dong 2014/09/05 0. Introduction As we want to have a performance framework for Tachyon, we need to consider two aspects

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive

More information

Practical Cassandra. Vitalii Tymchyshyn [email protected] @tivv00

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00 Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn

More information

MongoDB and Couchbase

MongoDB and Couchbase Benchmarking MongoDB and Couchbase No-SQL Databases Alex Voss Chris Choi University of St Andrews TOP 2 Questions Should a social scientist buy MORE or UPGRADE computers? Which DATABASE(s)? Document Oriented

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Slave. Master. Research Scholar, Bharathiar University

Slave. Master. Research Scholar, Bharathiar University Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper online at: www.ijarcsse.com Study on Basically, and Eventually

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

Design and Evolution of the Apache Hadoop File System(HDFS)

Design and Evolution of the Apache Hadoop File System(HDFS) Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop

More information

Big Fast Data Hadoop acceleration with Flash. June 2013

Big Fast Data Hadoop acceleration with Flash. June 2013 Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional

More information

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful. Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

More information

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group NoSQL Evaluator s Guide McKnight Consulting Group William McKnight is the former IT VP of a Fortune 50 company and the author of Information Management: Strategies for Gaining a Competitive Advantage with

More information