NoSQL databases. Mrs. Archana Kalia, Lecturer in Information Technology Dept., VPM's Polytechnic College, Thane


Abstract: NoSQL ("Not only SQL") refers to a class of databases used to store large amounts of data. NoSQL databases are distributed, non-relational, often open source, and horizontally scalable, ideally in a linear way. They generally do not provide the ACID properties that SQL databases follow. This paper surveys NoSQL: its background and its fundamentals such as ACID, BASE and the CAP theorem. Since it is difficult to choose a suitable database for a specific use case, the paper also evaluates the underlying techniques of NoSQL databases with regard to their applicability to particular requirements.

Introduction: Web and business applications produce enormous amounts of data every day. A large share of this data is handled by relational database management systems (RDBMS). The relational model was introduced in E. F. Codd's 1970 paper "A relational model of data for large shared data banks", which made data modeling and application programming much easier. Beyond its intended benefits, the relational model is well suited to client-server programming, and today it is the predominant technology for storing structured data in web and business applications.

NoSQL denotes non-relational database management systems that differ from traditional relational database management systems in some significant ways. They are designed for distributed data stores with very large storage needs (for example Google or Facebook, which collect terabytes of user data every day). This kind of data storage may not require a fixed schema, avoids join operations and typically scales horizontally.

Today data is becoming easier to access and capture through third parties such as Facebook, Google+ and others. Personal user information, social graphs, geolocation data, user-generated content and machine logging data are just a few examples of data that has been increasing exponentially. Delivering such services properly requires processing huge amounts of data, a task for which SQL databases were never designed. NoSQL databases evolved to handle this data properly.

NoSQL systems generally have six key features:
1. the ability to horizontally scale simple operation throughput over many servers,
2. the ability to replicate and to distribute (partition) data over many servers,
3. a simple call level interface or protocol (in contrast to a SQL binding),
4. a weaker concurrency model than the ACID transactions of most relational (SQL) database systems,
5. efficient use of distributed indexes and RAM for data storage, and
6. the ability to dynamically add new attributes to data records.
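Features 1, 2 and 6 above can be illustrated with a small in-memory sketch. It is not taken from any particular NoSQL product: the names `owners` and `Cluster`, the server labels and the choice of SHA-256 with modulo placement are all assumptions made for illustration. Each key is hashed to a primary server and copied to the next server in the list, and records are plain dictionaries, so new attributes can be added at any time.

```python
import hashlib


def owners(key, servers, replicas=2):
    """Pick the servers that hold `key`: a hashed primary plus its successors.

    Hypothetical placement rule (hash modulo server count), for illustration only.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    count = min(replicas, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(count)]


class Cluster:
    """Toy cluster: every server is just a dictionary held in this process."""

    def __init__(self, servers, replicas=2):
        self.servers = list(servers)
        self.replicas = replicas
        self.storage = {name: {} for name in self.servers}

    def put(self, key, record):
        # Partitioning: only the owning servers store the record.
        # Replication: each owner receives its own copy.
        for name in owners(key, self.servers, self.replicas):
            self.storage[name][key] = dict(record)

    def get(self, key):
        # Read from the first owner that has the key.
        for name in owners(key, self.servers, self.replicas):
            if key in self.storage[name]:
                return self.storage[name][key]
        return None


cluster = Cluster(["s1", "s2", "s3"])
cluster.put("user:1", {"name": "Ann"})
# No fixed schema: this record carries an attribute the first one lacks.
cluster.put("user:2", {"name": "Bob", "city": "Thane"})
```

With plain modulo placement, adding a server changes the owner of most keys; production systems usually rely on schemes such as consistent hashing to limit that movement.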

ACID free
ACID stands for Atomicity, Consistency, Isolation and Durability. The ACID concept comes from the SQL environment. NoSQL systems generally do not provide full ACID guarantees, mainly because of the consistency requirement. In a distributed environment data is spread across different machines, each machine stores its own data, and consistency has to be maintained between them. For example, if one tuple of a table changes, the change must be applied on every machine on which that data resides. If information about an update spreads immediately, consistency is preserved; if not, inconsistency results.

BASE
BASE stands for Basically Available, Soft state, and Eventual consistency. BASE is the opposite of ACID, and NoSQL databases are placed at different points along the road from ACID to BASE. After a transaction the system is in a soft state, not a solid state. The main focus of BASE is permanent availability. Consider the databases of a bank: if two people access the same account from different cities, updates must not only arrive in time but ideally in real time, and they have to be applied frequently on all machines. Other examples are online railway reservation and online book trading.

SCALABILITY
In electronics (including hardware, communication and software), scalability is the ability of a system to expand to meet business needs. For example, scaling a web application is all about allowing more people to use the application. A system is scaled either by upgrading the existing hardware without changing much of the application, or by adding extra hardware. There are two ways of scaling: vertical and horizontal.

Vertical scaling
To scale vertically (or scale up) means to add resources within the same logical unit to increase capacity, for example adding CPUs to an existing server, increasing the memory of the system or expanding storage by adding hard drives.
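Returning to the BASE section above, the difference between immediate and eventual consistency can be sketched with two replicas of a bank balance. This is a toy model under stated assumptions: the classes `Replica` and `EventuallyConsistentStore`, the single logical clock and the "highest version wins" rule are hypothetical simplifications, not the protocol of any specific database.

```python
class Replica:
    """One copy of the data, storing (version, value) per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, version, value):
        # Keep an update only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Writes are acknowledged by one replica and delivered to the rest later."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = []   # updates not yet delivered: (replica, key, version, value)
        self.clock = 0      # simple logical clock used as version number

    def write(self, key, value):
        self.clock += 1
        first, *others = self.replicas
        first.apply(key, self.clock, value)   # available immediately
        for replica in others:                # soft state until delivered
            self.pending.append((replica, key, self.clock, value))

    def propagate(self):
        # Deliver all outstanding updates; afterwards every replica agrees.
        for replica, key, version, value in self.pending:
            replica.apply(key, version, value)
        self.pending.clear()


store = EventuallyConsistentStore(["mumbai", "delhi"])
store.write("balance", 100)
stale = store.replicas[1].read("balance")   # not yet visible on the second replica
store.propagate()
fresh = store.replicas[1].read("balance")   # now matches the first replica
```

Between `write` and `propagate` the second replica answers with old data, which is the soft state; once propagation finishes, all replicas converge, which is the eventual consistency.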

Horizontal scaling
To scale horizontally (or scale out) means to add more nodes to a system, such as adding a new computer to a distributed software application. A NoSQL data store can be much faster because it takes advantage of scaling out, that is, it adds more nodes to the system and distributes the load over those nodes.

CAP
CAP stands for Consistency, Availability and Partition tolerance. The CAP theorem is concerned with three properties:
Consistency - The data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
Availability - The system is always on (guaranteed service availability), with no downtime.

Partition Tolerance - The system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.

In theory it is impossible to fulfil all three requirements at once. According to the CAP theorem, a distributed system can satisfy at most two of the three. Current NoSQL databases therefore follow different combinations of C, A and P. Here is a brief description of the three combinations CA, CP and AP:
CA - Single site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.
CP - Some data may not be accessible, but the rest is still consistent/accurate.
AP - The system is still available under partitioning, but some of the data returned may be inaccurate.

NoSQL data store types
On the basis of the CAP theorem, NoSQL databases are divided into a number of categories. There are four different types of data stores in NoSQL:

A. Key Value Stores
Key value stores are similar to maps or dictionaries in which data is addressed by a unique key. Since values are uninterpreted byte arrays, which are completely opaque to the system, keys are the only way to retrieve stored data. Values are isolated and independent from each other, so relationships must be handled in application logic. Due to this very simple data structure, key value stores are completely schema free. New values of any kind can be added at runtime without conflicting with any other stored data and without affecting system availability. Grouping key value pairs into collections is the only way offered to add some kind of structure to the data model. Key value stores are useful for simple operations that are based on key attributes only. For example, to speed up the rendering of a user-specific web page, parts of the page can be computed in advance and served quickly and easily from the store by user ID when needed. Since most key value stores hold their dataset in memory, they are often used for caching the results of more time-intensive SQL queries.

B. Document Stores
Document stores encapsulate key value pairs in JSON or JSON-like documents. Within a document, keys have to be unique. Every document contains a special key "ID", which is also unique within a collection of documents and therefore identifies a document explicitly. In contrast to key value stores, values are not opaque to the system and can be queried as well. Complex data structures such as nested objects can therefore be handled more conveniently. Storing data in interpretable JSON documents has the additional advantage of supporting data types, which makes document stores very developer-friendly. Similar to key value stores, document stores do not have any schema restrictions. Storing new documents containing any kind of attributes is as easy as adding new attributes to existing documents at runtime. Document stores offer multi-attribute lookups on records that may have completely different kinds of key value pairs. These systems are therefore very convenient in data integration and schema migration tasks. (JSON is the data structure of the Web. It is a simple data format that allows programmers to store and communicate sets of values, lists, and key-value mappings across systems. As JSON adoption has grown, database vendors have sprung up offering JSON-centric document databases.)
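The contrast between the two store types described above can be sketched in a few lines. The classes `KeyValueStore` and `DocumentCollection` and their methods are hypothetical stand-ins written for this illustration; they do not reproduce the API of any real product. The key value store only accepts opaque bytes and can be read by key alone, while the document collection understands its JSON-like documents and supports lookups on several attributes.

```python
import json


class KeyValueStore:
    """Values are opaque bytes; the key is the only way to get them back."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class DocumentCollection:
    """Schema-free collection of JSON-like documents with a unique "ID" key."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        # Round-trip through JSON: deep copy that rejects non-JSON values.
        stored = json.loads(json.dumps(doc))
        if "ID" not in stored:
            while self._next_id in self._docs:
                self._next_id += 1
            stored["ID"] = self._next_id
        if stored["ID"] in self._docs:
            raise ValueError("duplicate ID: %r" % (stored["ID"],))
        self._docs[stored["ID"]] = stored
        return stored["ID"]

    def find(self, **criteria):
        # Multi-attribute lookup; documents lacking an attribute simply do not match.
        return [
            doc for doc in self._docs.values()
            if all(key in doc and doc[key] == value for key, value in criteria.items())
        ]


cache = KeyValueStore()
cache.put("page:42", b"<html>pre-rendered fragment</html>")

people = DocumentCollection()
people.insert({"name": "Ann", "city": "Thane"})
people.insert({"name": "Bob", "city": "Thane", "address": {"pin": "400601"}})
in_thane = people.find(city="Thane")
```

The second document has a nested `address` attribute that the first one lacks, and both can still be found with the same query, which is the flexibility the text attributes to document stores.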

C. Column Family Stores
Column family stores are also known as column oriented stores, extensible record stores and wide columnar stores. All of them are inspired by Google's Bigtable, which is a "distributed storage system for managing structured data that is designed to scale to a very large size". Bigtable is used in many Google projects with requirements ranging from high throughput to latency-sensitive data serving. The data model is described as a "sparse, distributed, persistent multidimensional sorted map". In this map, an arbitrary number of key value pairs can be stored within rows. Since values cannot be interpreted by the system, relationships between datasets and any data types other than strings are not supported natively. As with key value stores, these additional features have to be implemented in the application logic. Multiple versions of a value are stored in chronological order, to support versioning on the one hand and to achieve better performance and consistency on the other. Columns can be grouped into column families, which is especially important for data organization and partitioning.

D. Graph Databases
In contrast to relational databases and the key oriented NoSQL databases introduced above, graph databases specialize in the efficient management of heavily linked data. Applications based on data with many relationships are therefore better suited to graph databases, since cost-intensive operations such as recursive joins can be replaced by efficient traversals. Property graphs are distinct from Resource Description Framework (RDF) stores like Sesame [18] and Bigdata, which specialize in querying and analyzing subject-predicate-object statements. Since the whole set of triples can be represented as a directed multi-relational graph, RDF frameworks are considered a special form of graph database in this paper too. In contrast to property graphs, RDF graphs do not offer the possibility of adding additional key value pairs to edges and nodes. On the other hand, by use of RDF Schema and the Web Ontology Language it is possible to define a more complex and more expressive schema than property graph databases allow. Use cases for graph databases are location based services, knowledge representation, path finding problems raised in navigation systems, recommendation systems and all other use cases that involve complex relationships. Property graph databases are more suitable for large relationships over many nodes, whereas RDF is used for certain details in a graph.

CHARACTERISTICS OF NoSQL
* NoSQL does not use the relational data model and thus does not use the SQL language.
* NoSQL stores large volumes of data.
* NoSQL is designed for distributed environments, where data is spread across different machines; consistency there is typically eventual rather than immediate.
* If a fault or failure occurs on any machine, work continues without interruption.
* Many NoSQL databases are open source, i.e. their source code is available to everyone and free to use.
* NoSQL does not impose a fixed schema, so a record can hold any attributes.
* NoSQL does not use the concept of ACID properties.
* NoSQL is horizontally scalable, leading to performance that grows in a linear way.
* It has a more flexible structure.
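The claim in the graph database section that recursive joins can be replaced by traversals can be sketched with a tiny property graph held in memory. The class `PropertyGraph`, its methods and the sample labels are hypothetical and exist only for this illustration; the traversal is an ordinary breadth-first search, which finds a path with the fewest edges, optionally restricted to one edge label.

```python
from collections import deque


class PropertyGraph:
    """Directed graph whose nodes and edges both carry key value properties."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = {}   # node id -> list of (target id, label, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, target, label, **props):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].append((target, label, props))

    def shortest_path(self, start, goal, label=None):
        """Breadth-first traversal; returns a list of node ids or None."""
        if start not in self.nodes or goal not in self.nodes:
            return None
        previous = {start: None}   # also serves as the visited set
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for target, edge_label, _props in self.edges[node]:
                if (label is None or edge_label == label) and target not in previous:
                    previous[target] = node
                    queue.append(target)
        return None


g = PropertyGraph()
for place in ["thane", "dadar", "pune", "goa"]:
    g.add_node(place, kind="city")
g.add_edge("thane", "dadar", "road", km=25)
g.add_edge("dadar", "pune", "road", km=150)
g.add_edge("thane", "pune", "rail")
g.add_edge("pune", "goa", "road")
route = g.shortest_path("thane", "goa", label="road")
```

In a relational schema the same question would need one self-join of an edge table per hop, with the number of hops unknown in advance; here each step only follows the edges stored with the current node.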

SQL vs NoSQL
SQL (relational) versus NoSQL scalability is a controversial topic. This paper argues against both extremes. Here is some more background to support this position.

The argument for relational over NoSQL goes something like this:
* If new relational systems can do everything that a NoSQL system can, with analogous performance and scalability, and with the convenience of transactions and SQL, why would you choose a NoSQL system?
* Relational DBMSs have taken and retained majority market share over other competitors in the past 30 years: network, object, and XML DBMSs.
* Successful relational DBMSs have been built to handle other specific application loads in the past: read-only or read-mostly data warehousing, OLTP on multi-core multi-disk CPUs, in-memory databases, distributed databases, and now horizontally scaled databases.
* While we don't see "one size fits all" in the SQL products themselves, we do see a common interface with SQL, transactions, and relational schema that gives advantages in training, continuity, and data interchange.

The counter-argument for NoSQL goes something like this:
* We haven't yet seen good benchmarks showing that RDBMSs can achieve scaling comparable with NoSQL systems like Google's BigTable.
* If you only require a lookup of objects based on a single key, then a key-value store is adequate and probably easier to understand than a relational DBMS. Likewise for a document store on a simple application: you only pay the learning curve for the level of complexity you require.
* Some applications require a flexible schema, allowing each object in a collection to have different attributes. While some RDBMSs allow efficient packing of tuples with missing attributes, and some allow adding new attributes at runtime, this is uncommon.
* A relational DBMS makes expensive (multi-node, multi-table) operations too easy. NoSQL systems make them impossible or obviously expensive for programmers.
* While RDBMSs have maintained majority market share over the years, other products have established smaller but non-trivial markets in areas where there is a need for particular capabilities, e.g. indexed objects with products like BerkeleyDB, or graph-following operations with object-oriented DBMSs.

Both sides of this argument have merit.

CONCLUSION AND FUTURE WORK
The main aim of this paper is to give an overview of NoSQL databases, of how they have reduced the dominance of SQL, and of their background and characteristics. It also describes the fundamentals that form the basis of NoSQL databases, namely ACID, BASE and the CAP theorem. The ACID properties are not used in NoSQL databases because of the data consistency requirement, which highlights the trade-off NoSQL makes on consistency compared with SQL. On the basis of the CAP theorem we then described the different types of NoSQL databases: key-value databases, document store databases, column based databases and graph databases. Further research is going on into the new technologies that are arising around or after NoSQL, such as polyglot persistence.

REFERENCES
1. Scalable SQL and NoSQL Data Stores, by Rick Cattell. Originally published in 2010.
2. NoSQL databases: a step to database scalability in web environment, by Jaroslav Pokorny, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Praha, Czech Republic.
3. NoSQL Evaluation: A Use Case Oriented Survey, by Robin Hecht, Chair of Applied Computer Science IV, University of Bayreuth, Germany, robin.hecht@uni-bayreuth.de
4. Managing Schema Evolution in NoSQL Data Stores, by Stefanie Scherzinger, Regensburg University of Applied Sciences, stefanie.scherzinger@hs-regensburg.de; Meike Klettke, University of Rostock, meike.klettke@uni-rostock.de; Uta Störl, Darmstadt University of Applied Sciences, uta.stoerl@h-da.de