SQL vs. NoSQL. Slides adapted from Dr. Jennifer Widom, Stanford



Traditional Databases
- SQL = traditional relational DBMS
- Hugely popular among data analysts
- Widely adopted for transaction systems and business intelligence
- Not every data management/analysis problem is best solved exclusively using a traditional relational DBMS
- Does not scale linearly beyond a few nodes because of the need to provide a shared-memory abstraction
- In contrast: recent MPP databases (e.g., Greenplum, AsterDB, Hive, HadoopDB)

Greenplum: an MPP Database
- Builds on the open-source database PostgreSQL (non-parallel, single-threaded)
- Uses a shared-nothing, MPP architecture
- Automatic data partitioning across multiple segment servers, each owning a portion of the data
- Parallel query optimizer that considers the cost of moving data between nodes
- Execution plans include relational database operations as well as parallel motion operations

Greenplum Architecture (figure)

NoSQL: The Name
- NoSQL = not only using a traditional relational DBMS and SQL
- "No SQL" does not mean "don't use the SQL language"

What Does a DBMS Do Well?
A database management system (DBMS) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of consistent and persistent data.

NoSQL Systems
Alternative to traditional relational DBMS
+ Flexible schema
+ Quicker/cheaper to set up
+ Massive scalability
+ Relaxed consistency (consistency: all nodes see the same data at the same time) -> higher performance & availability
- No declarative query language -> more programming
- Relaxed consistency -> fewer guarantees

Transactions: ACID Properties
- Atomic: all of the work in a transaction completes (commit) or none of it completes
- Consistent: a transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
- Isolated: the results of any changes made during a transaction are not visible until the transaction has committed
- Durable: the results of a committed transaction survive failures
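As an illustration of atomicity and constraint-based consistency, here is a minimal sketch using Python's built-in `sqlite3` module. The `account` table, the names, and the `transfer` helper are hypothetical and chosen only for this example; the `CHECK` constraint stands in for "consistency is defined in terms of constraints".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a balance may never become negative.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer the balances are the same as after the first, successful one: the partial credit to `bob` never becomes visible.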

CAP Theorem
- Consistency: all nodes see the same data at the same time
- Availability: a guarantee that every request receives a response about whether it succeeded or failed
- Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system
CAP theorem: a distributed system cannot satisfy all three of these guarantees at the same time

BASE Transactions
- Basically Available
- Soft state (i.e., cached data)
- Eventually consistent
Characteristics:
- Weak consistency: stale data OK
- Availability first
- Sacrifice consistency to be immediately responsive, even if a failure prevents the first-tier service from connecting to some needed resource
- Criticized as increasing the complexity of distributed software applications
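The "stale data OK" trade-off can be sketched with two toy in-memory replicas. This is an illustration only: the `Replica` class, the integer version numbers, and the last-writer-wins merge in `anti_entropy` are assumptions made for the example, not the mechanism of any particular system.

```python
class Replica:
    """Toy replica holding key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Last-writer-wins: keep the entry with the highest version (assumption).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(a, b):
    """Exchange state so both replicas end up with the newest version per key."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], version=1)
anti_entropy(r1, r2)

r1.write("cart", ["book", "pen"], version=2)  # accepted at r1 only
stale = r2.read("cart")   # r2 stays available but answers with old data
anti_entropy(r1, r2)      # background synchronisation
fresh = r2.read("cart")   # replicas have converged
```

Between the write and the synchronisation step, `r2` remains available but returns the older value; afterwards both replicas agree.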

Examples of NoSQL Systems
- MapReduce framework (e.g., Hadoop, Hive)
- Key-value stores (e.g., HBase, BigTable, BerkeleyDB)
- Document stores (e.g., CouchDB, MongoDB)
- Graph database systems (e.g., Neo4j)

Data Manipulation and Analysis
- Data Retrieval
- Document store
- Key-value store
- Join, Group By, Aggregation, Recursive queries
- Schema and complex XQL queries
- Hive and Graph DBs
- Statistical machine learning methods
- Natural language processing
- Clustering and classification
- UDFs, MapReduce programs

Example #1: Web Log Analysis
- Each record: UserID, URL, timestamp, additional-info
- Task: count the number of records for
  - each UserID
  - each URL
  - each timestamp
  - a certain construct appearing in additional-info
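On a single machine, the counting task can be sketched in a few lines of Python. The sample records are made up, and treating the "certain construct" as a plain substring match is an assumption for illustration.

```python
from collections import Counter

# Hypothetical sample records: (UserID, URL, timestamp, additional-info)
log = [
    ("u1", "/home", "2014-06-01T10:00", "ref=search"),
    ("u2", "/home", "2014-06-01T10:00", ""),
    ("u1", "/about", "2014-06-01T10:05", "ref=search"),
]

count_by_user = Counter(user_id for user_id, _url, _ts, _info in log)
count_by_url = Counter(url for _uid, url, _ts, _info in log)
count_by_timestamp = Counter(ts for _uid, _url, ts, _info in log)

# "Certain construct" is modelled as a substring match (an assumption).
count_with_construct = sum(
    1 for _uid, _url, _ts, info in log if "ref=search" in info
)
```

The same grouping logic is what a distributed system has to perform when the log no longer fits on one node.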

Example #1: Web Log Analysis
- Each record: UserID, URL, timestamp, additional-info
- Separate records: UserID, name, age, gender, ...
- Task: find the average age of users accessing each URL
- Other tasks?
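This task needs a join between the log and the user records, which is where a declarative query language is convenient. A minimal sketch with Python's built-in `sqlite3` module follows; the table names, column names, and sample rows are hypothetical, and the average is taken per access record rather than per distinct user (an assumption, since the task does not say which).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log   (user_id TEXT, url TEXT, ts TEXT, info TEXT);
    CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT,
                        age INTEGER, gender TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    ("u1", "Ann", 30, "f"),
    ("u2", "Bob", 40, "m"),
])
conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", [
    ("u1", "/home", "t1", ""),
    ("u2", "/home", "t2", ""),
    ("u2", "/about", "t3", ""),
])

# Join each access to its user, then average the ages per URL.
avg_age = dict(conn.execute("""
    SELECT l.url, AVG(u.age)
    FROM log AS l
    JOIN users AS u ON u.user_id = l.user_id
    GROUP BY l.url
"""))
```

Without a declarative language, the join and the grouping would have to be written by hand in application code.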

Example #2: Social Network Graph
- Each record: UserID1, UserID2
- Separate records: UserID, name, age, gender, ...
- Task: find all friends of friends of friends of friends of a given user
- Other tasks?
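One way to read the task is "all users whose shortest friendship path from the given user is exactly four hops". Under that reading, and assuming friendship is symmetric, a breadth-first search is enough. The function name `within_hops` and the sample edges are hypothetical.

```python
from collections import defaultdict, deque


def within_hops(edges, start, hops):
    """Return users whose shortest distance from start is exactly `hops`."""
    graph = defaultdict(set)
    for a, b in edges:           # friendship treated as symmetric (assumption)
        graph[a].add(b)
        graph[b].add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue             # do not expand beyond the requested depth
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {user for user, d in distance.items() if d == hops}


# Hypothetical friendship records: (UserID1, UserID2)
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("a", "c")]
fourth_degree = within_hops(edges, "a", 4)
```

In a relational system the same query needs a four-way self-join or a recursive query, which is one motivation for graph databases.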

Example #3: Wikipedia Pages
- Large collection of documents
- Combination of structured and unstructured data
- Task: retrieve the introductory paragraph of all pages about U.S. presidents before 1900
- Other tasks?

MapReduce Framework
- Originally from Google; open-source implementation: Hadoop
- No data model; data stored in files
- User provides specific functions: map() and reduce()
- System provides data processing glue, fault tolerance, and scalability
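The division of labour between user code and system can be sketched in plain Python. Here `map_fn` and `reduce_fn` play the role of the user-provided functions, while `run_mapreduce` is a single-process stand-in for the framework's grouping ("glue"); it is not the Hadoop API, and it provides no distribution or fault tolerance. All names and the sample records are hypothetical.

```python
from collections import defaultdict


def map_fn(record):
    """User-provided map: emit (UserID, 1) for each log record."""
    user_id, _url, _ts, _info = record
    yield (user_id, 1)


def reduce_fn(key, values):
    """User-provided reduce: sum the counts collected for one key."""
    return (key, sum(values))


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy framework: apply map, group values by key, apply reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)          # the "shuffle" step
    return dict(reduce_fn(key, values) for key, values in groups.items())


# Hypothetical records: (UserID, URL, timestamp, additional-info)
records = [
    ("u1", "/home", "t1", ""),
    ("u2", "/home", "t2", ""),
    ("u1", "/about", "t3", ""),
]
records_per_user = run_mapreduce(records, map_fn, reduce_fn)
```

In a real deployment the map calls, the grouping, and the reduce calls run on different machines over file splits; the user-written functions stay this small.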

MapReduce Framework
- Schemas and declarative queries are missing
- Hive: schemas, SQL-like query language
- Pig: more imperative, but with relational operators
- Both compile to a workflow of Hadoop (MapReduce) jobs

Key-Value Stores
- Extremely simple interface
- Data model: (key, value) pairs
- Operations: Insert(key, value), Fetch(key), Update(key), Delete(key)
- Implementation: efficiency, scalability, fault tolerance
  - Records distributed to nodes based on key
  - Replication
  - Single-record transactions, eventual consistency
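The interface and the idea of distributing records to nodes by key can be sketched as follows. This is a toy, single-process model: each "node" is just a dictionary, placement uses a hash of the key modulo the node count (an assumption; real systems often use consistent hashing), and `update` takes the new value as a second argument. Replication is left out.

```python
import hashlib


class KeyValueStore:
    """Toy key-value store whose records are spread over in-memory 'nodes'."""

    def __init__(self, num_nodes=3):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Stable hash of the key decides which node owns the record.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def insert(self, key, value):
        self._node_for(key)[key] = value

    def fetch(self, key):
        return self._node_for(key).get(key)

    def update(self, key, value):
        node = self._node_for(key)
        if key not in node:
            raise KeyError(key)
        node[key] = value

    def delete(self, key):
        self._node_for(key).pop(key, None)


store = KeyValueStore(num_nodes=3)
store.insert("user:1", "Ann")
first = store.fetch("user:1")
store.update("user:1", "Anna")
second = store.fetch("user:1")
store.delete("user:1")
third = store.fetch("user:1")
```

Because every operation touches exactly one key, each request can be routed to a single node, which is what makes this interface easy to scale.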

Key-Value Stores
- Extremely simple interface
- Data model: (key, value) pairs
- Operations: Insert(key, value), Fetch(key), Update(key), Delete(key)
- Some allow (non-uniform) columns within value
- Some allow Fetch on a range of keys
- Example systems: Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, ...
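Range fetches and non-uniform columns can be illustrated with a small sorted-key store. The class name `SortedKeyValueStore`, the half-open range convention, and the sample keys are assumptions for this sketch; it does not reflect how any of the listed systems is implemented.

```python
import bisect


class SortedKeyValueStore:
    """Toy store that keeps keys sorted so ranges of keys can be fetched."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}    # key -> dict of columns (columns may differ per key)

    def insert(self, key, columns):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = columns

    def fetch(self, key):
        return self._values.get(key)

    def fetch_range(self, low, high):
        """Return (key, columns) pairs with low <= key < high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return [(k, self._values[k]) for k in self._keys[start:end]]


store = SortedKeyValueStore()
store.insert("user:2", {"name": "Bob", "age": 40})
store.insert("item:9", {"title": "Pen"})
store.insert("user:1", {"name": "Ann"})   # fewer columns than user:2

# ";" sorts directly after ":", so this range covers every "user:" key.
users = store.fetch_range("user:", "user;")
```

Keeping keys ordered is what makes range fetches cheap; a purely hash-partitioned store would have to ask every node.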

Document Stores
- Like key-value stores, except the value is a document
- Data model: (key, document) pairs
- Document: JSON, XML, or other semistructured formats
- Basic operations: Insert(key, document), Fetch(key), Update(key), Delete(key)
- Also Fetch based on document contents
- Example systems: CouchDB, MongoDB, SimpleDB, ...
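The difference from a plain key-value store, fetching by document contents, can be sketched with Python dictionaries standing in for JSON documents. The `DocumentStore` class, its `find` method, and the sample documents are hypothetical; real document stores offer much richer query languages and indexes.

```python
class DocumentStore:
    """Toy document store: key -> JSON-like dict, with content-based lookup."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        self._docs[key] = document

    def fetch(self, key):
        return self._docs.get(key)

    def delete(self, key):
        self._docs.pop(key, None)

    def find(self, **criteria):
        """Return keys of documents whose fields equal all given criteria."""
        return [
            key for key, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


docs = DocumentStore()
docs.insert("p1", {"title": "Page A", "kind": "person", "lang": "en"})
docs.insert("p2", {"title": "Page B", "kind": "place"})            # no "lang" field
docs.insert("p3", {"title": "Page C", "kind": "person", "lang": "de"})

people = docs.find(kind="person")
english_people = docs.find(kind="person", lang="en")
```

The store can look inside the value because documents are semistructured; a plain key-value store treats the value as opaque.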