Databases 2 (VU) (707.030)




Introduction to NoSQL
Denis Helic (KMI, TU Graz)
Oct 14, 2013

Outline
1. NoSQL Motivation
2. NoSQL Systems
3. NoSQL Examples
4. NoSQL Approaches

Slides are based on the Introduction to Databases course from Stanford University by Jennifer Widom.

NoSQL Motivation

NoSQL: What is in the name?
- Very often, SQL = traditional relational DBMS
- Experience from the past decade: not every data management/analysis problem is best solved using a traditional relational DBMS
- NoSQL = not using a traditional relational DBMS
- NoSQL does not mean "don't use the SQL language"
- NoSQL = Not Only SQL

Databases: Types of problems
- Not every data management/analysis problem is best solved using a traditional DBMS
- A Database Management System (DBMS) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data

Databases: Types of problems
- Not every data management/analysis problem is best solved using a traditional DBMS
- DBMS properties: convenient, multi-user, safe, persistent, reliable, massive, efficient

Databases: Types of problems
- Not every data management/analysis problem is best solved using a traditional DBMS
- E.g. a tagging system: http://delicious.com/, http://www.bibsonomy.org/
- The relation between tags and URLs is a many-to-many (m : n) relation

Tagging

Table: URL table
url_id | url
1      | kti.tugraz.at
2      | www.tugraz.at
...    | ...

Table: Tag table
tag_id | tag
1      | knowledge
2      | data
3      | school
...    | ...

Table: Bookmark table
tag_id | url_id
1      | 1
2      | 1
3      | 2
...    | ...

Tagging
- To retrieve tags and URLs, you JOIN the tables and select the records that fulfill a criterion
- E.g. to form a tag cloud: http://www.bibsonomy.org/
- How to retrieve related tags?

Tagging
- Retrieve all URLs for a tag, then for each URL retrieve all tags
- Hub tags have millions of URLs
- That is just the one-hop neighbors
- Two-hop, three-hop neighbors?
- What is the problem here?
- The relational data model is not well suited for tagging
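As a rough illustration of the relational approach, the sketch below loads the three example tables into an in-memory SQLite database and finds the one-hop related tags of a tag. The table and column names follow the tables above; the function name `related_tags` and the choice of SQLite are assumptions made for this example, not part of the lecture.

```python
import sqlite3

# In-memory database holding the example rows from the three tagging tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE url (url_id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE bookmark (tag_id INTEGER, url_id INTEGER);
INSERT INTO url VALUES (1, 'kti.tugraz.at'), (2, 'www.tugraz.at');
INSERT INTO tag VALUES (1, 'knowledge'), (2, 'data'), (3, 'school');
INSERT INTO bookmark VALUES (1, 1), (2, 1), (3, 2);
""")


def related_tags(conn, tag_name):
    """Return tags that share at least one URL with tag_name (one hop)."""
    rows = conn.execute(
        """
        SELECT DISTINCT t2.tag
        FROM tag t1
        JOIN bookmark b1 ON b1.tag_id = t1.tag_id   -- URLs of the given tag
        JOIN bookmark b2 ON b2.url_id = b1.url_id   -- other bookmarks of those URLs
        JOIN tag t2 ON t2.tag_id = b2.tag_id        -- names of the other tags
        WHERE t1.tag = ? AND t2.tag_id != t1.tag_id
        ORDER BY t2.tag
        """,
        (tag_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(related_tags(conn, "knowledge"))
```

Even the one-hop case already needs three joins. Each additional hop adds two more joins over the bookmark table, which is where hub tags with millions of URLs become expensive.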

Tagging
[Figure: tags (knowledge, data, school, ...) linked to URLs (kti.tugraz.at, www.tugraz.at, ...)]

Tagging
- A graph-based model is better suited
- Bipartite graph
- Finding related tags is equivalent to traversing the graph
- E.g. breadth-first search
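The traversal idea can be sketched as a breadth-first search over the bipartite tag/URL graph. This is a minimal in-memory sketch, not how a graph database stores data; the names `build_graph` and `related_tags` and the `hops` parameter are assumptions for illustration. One tag-to-tag hop corresponds to two edges (tag to URL, URL to tag).

```python
from collections import deque

# Bookmarks from the example tables, as (tag, url) pairs.
BOOKMARKS = [
    ("knowledge", "kti.tugraz.at"),
    ("data", "kti.tugraz.at"),
    ("school", "www.tugraz.at"),
]


def build_graph(bookmarks):
    """Build an undirected bipartite graph; nodes are ('tag', name) or ('url', name)."""
    graph = {}
    for tag, url in bookmarks:
        graph.setdefault(("tag", tag), set()).add(("url", url))
        graph.setdefault(("url", url), set()).add(("tag", tag))
    return graph


def related_tags(graph, tag, hops=1):
    """Breadth-first search; returns tags reachable within `hops` tag-to-tag hops."""
    start = ("tag", tag)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == 2 * hops:  # do not expand beyond the hop limit
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return sorted(
        name for (kind, name), d in distance.items() if kind == "tag" and d > 0
    )


graph = build_graph(BOOKMARKS)
print(related_tags(graph, "knowledge"))
```

Going from one hop to two or three hops only changes the `hops` argument; no additional joins have to be written.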

NoSQL Systems

NoSQL systems
- Alternative to traditional relational DBMS
- Advantages:
  - Flexible schema
  - Quicker/cheaper to set up
  - Massive scalability
  - Relaxed consistency → higher performance & availability

NoSQL systems
- Alternative to traditional relational DBMS
- Disadvantages:
  - No declarative query language → more programming
  - Relaxed consistency → fewer guarantees

NoSQL Examples

Example: Web log analysis
- Each record: UserID, URL, timestamp, additional-info
- Task: Load into database system

Example: Web log analysis
- Relational DBMS:
  1. Data cleaning
  2. Data extraction
  3. Verification
  4. Schema specification
- NoSQL:
  1. Nothing

Example: Web log analysis
- Each record: UserID, URL, timestamp, additional-info
- Task: Find all records for:
  - Given UserID
  - Given URL
  - Given timestamp
  - Certain construct appearing in additional-info
- None of the operations requires SQL: NoSQL
- Highly parallelizable
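These lookups can be sketched as plain filters over the records. The sample records and the helper names `find_by_field` and `find_by_info` are hypothetical; only the four field names come from the slide.

```python
import re

# Hypothetical sample records; the field names follow the slide.
LOG = [
    {"UserID": "u1", "URL": "kti.tugraz.at", "timestamp": 1381744800,
     "additional-info": "agent=Firefox"},
    {"UserID": "u2", "URL": "www.tugraz.at", "timestamp": 1381744860,
     "additional-info": "agent=Chrome ref=kti"},
    {"UserID": "u1", "URL": "www.tugraz.at", "timestamp": 1381744920,
     "additional-info": "agent=Firefox"},
]


def find_by_field(records, field, value):
    """Return all records whose `field` equals `value`."""
    return [r for r in records if r[field] == value]


def find_by_info(records, pattern):
    """Return all records whose additional-info matches a regular expression."""
    regex = re.compile(pattern)
    return [r for r in records if regex.search(r["additional-info"])]


print(find_by_field(LOG, "UserID", "u1"))
print(find_by_info(LOG, r"ref=\w+"))
```

Each record is tested independently of all others, so the log could be split into chunks and filtered on many machines at once, which is what makes these tasks highly parallelizable.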

Example: Web log analysis
- Each record: UserID, URL, timestamp, additional-info
- Task: Find all pairs of UserIDs accessing the same URL
- Here a self-join is required
- But it is very unlikely that we will have such a query at all
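For comparison, the self-join can be imitated in application code by grouping users per URL and pairing them up. This is only a sketch with hypothetical sample data; the function name `user_pairs_same_url` is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (UserID, URL) pairs extracted from the log.
ACCESSES = [("u1", "a"), ("u2", "a"), ("u3", "b"), ("u1", "b"), ("u2", "a")]


def user_pairs_same_url(accesses):
    """Return all unordered pairs of distinct users that accessed a common URL."""
    users_by_url = defaultdict(set)
    for user_id, url in accesses:
        users_by_url[url].add(user_id)
    pairs = set()
    for users in users_by_url.values():
        # Every 2-element combination of the users of one URL is a result pair.
        pairs.update(combinations(sorted(users), 2))
    return sorted(pairs)


print(user_pairs_same_url(ACCESSES))
```

Unlike the simple lookups, this task has to bring together records of different users, so it cannot be answered by looking at each record in isolation.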

Example: Web log analysis
- Each record: UserID, URL, timestamp, additional-info
- Separate records: UserID, name, age, gender, ...
- Task: Find the average age of users accessing a given URL
- SQL-like
- Some aspects of NoSQL: do we need consistency?
- Statistical operation: e.g. we could sample
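The sampling idea can be sketched as follows: instead of looking up every visitor, the average is estimated from a random subset. The sample data, the function name `average_age`, and the `sample_size` and `seed` parameters are assumptions for illustration.

```python
import random
from statistics import mean

# Hypothetical data: (UserID, URL) log entries and separate user records.
LOG = [("u1", "a"), ("u2", "a"), ("u3", "b"), ("u1", "a")]
USERS = {"u1": {"age": 20}, "u2": {"age": 30}, "u3": {"age": 40}}


def average_age(log, users, url, sample_size=None, seed=0):
    """Average age of distinct users who accessed `url`.

    If sample_size is given and smaller than the number of visitors,
    the average is estimated from a random sample of that size.
    Raises statistics.StatisticsError if nobody accessed the URL.
    """
    visitors = sorted({user_id for user_id, visited in log if visited == url})
    if sample_size is not None and sample_size < len(visitors):
        visitors = random.Random(seed).sample(visitors, sample_size)
    return mean(users[user_id]["age"] for user_id in visitors)


print(average_age(LOG, USERS, "a"))                 # exact average
print(average_age(LOG, USERS, "a", sample_size=1))  # estimate from a sample
```

Because the result is a statistic, a slightly stale or sampled view of the data usually changes it only a little, which is why strict consistency may not be needed here.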

Example: Social network
- Each record: UserID1, UserID2
- Separate records: UserID, name, age, gender, ...
- Task: Find all friends of a given user
- Simple operation: we do not need a complicated query language

Example: Social network
- Each record: UserID1, UserID2
- Separate records: UserID, name, age, gender, ...
- Task: Find all friends of friends of a given user
- The number of JOINs increases

Example: Social network
- Each record: UserID1, UserID2
- Separate records: UserID, name, age, gender, ...
- Task: Find all women friends of men friends of a given user
- The number of JOINs increases

Example: Social network
- Each record: UserID1, UserID2
- Separate records: UserID, name, age, gender, ...
- Task: Find all friends of friends of friends of ... friends of a given user
- The number of JOINs explodes
- Graph-based data model
- Do we need consistency?
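With a graph-based view, each additional "friends of" becomes one more expansion step rather than one more join. The sketch below assumes that friendship is symmetric, which the slide does not state; the sample data and the name `friends_at_distance` are hypothetical.

```python
# Hypothetical friendship records (UserID1, UserID2), assumed to be symmetric.
FRIENDSHIPS = [(1, 2), (2, 3), (3, 4), (1, 5)]


def friends_at_distance(friendships, user, hops):
    """Return the users first reached after exactly `hops` friendship steps."""
    adjacency = {}
    for a, b in friendships:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {user}
    frontier = {user}
    for _ in range(hops):
        # Expand the current frontier by one step, skipping users already seen.
        frontier = {f for u in frontier for f in adjacency.get(u, ())} - seen
        seen |= frontier
    return frontier


print(friends_at_distance(FRIENDSHIPS, 1, 1))  # friends
print(friends_at_distance(FRIENDSHIPS, 1, 2))  # friends of friends
```

The depth of the query is an ordinary parameter here, whereas a relational formulation needs one additional self-join per level.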

Example: Wikipedia
- Large collection of documents
- Combination of structured and unstructured data
- Task: Retrieve the introductory paragraph of all pages about U.S. presidents before 1900
- Not suitable for SQL systems because of the mix of structured and unstructured data
- Do we need consistency?

NoSQL systems
- Alternative to traditional relational DBMS
- Advantages:
  - Flexible schema
  - Quicker/cheaper to set up
  - Massive scalability
  - Relaxed consistency → higher performance & availability

NoSQL systems
- Alternative to traditional relational DBMS
- Disadvantages:
  - No declarative query language → more programming
  - Relaxed consistency → fewer guarantees

NoSQL Approaches

Types of NoSQL systems
- Map-Reduce framework
- Key-value stores
- Document stores
- Graph database systems

Map-Reduce Framework
- Originally from Google
- Open source: Hadoop
- No data model, data stored in files
- User provides specific functions

Map-Reduce Framework
- System provides data processing glue, fault tolerance, scalability
- Map and Reduce functions:
  - Map: Divide problem into subproblems
  - Reduce: Do work on subproblems, combine results
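The division of labor between user and system can be sketched in a single process: the user writes `map_fn` and `reduce_fn`, while `map_reduce` stands in for the glue that a real framework such as Hadoop provides in a distributed, fault-tolerant way. All names and the sample task (counting accesses per URL in a web log) are assumptions for illustration; this is not the Hadoop API.

```python
from collections import defaultdict

# Hypothetical web log records: (UserID, URL, timestamp).
LOG = [
    ("u1", "kti.tugraz.at", 1),
    ("u2", "www.tugraz.at", 2),
    ("u1", "www.tugraz.at", 3),
]


def map_fn(record):
    """User-provided map: emit one (URL, 1) pair per log record."""
    _user_id, url, _timestamp = record
    yield url, 1


def reduce_fn(key, values):
    """User-provided reduce: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(records, map_fn, reduce_fn):
    """Stand-in for the framework: run map, group by key, run reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


print(map_reduce(LOG, map_fn, reduce_fn))
```

In a real deployment the map calls run in parallel on different parts of the input files, and the grouping step moves all values of one key to the same reducer.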

Key-Value Stores
- Extremely simple interface
- Data model: (key, value) pairs
- Operations (CRUD): Create(key, value), Retrieve(key), Update(key), Delete(key)

Key-Value Stores
- Implementation: efficiency, scalability, fault tolerance
- Records distributed to nodes based on key
- Replication
- Single-record transactions, eventual consistency
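A minimal sketch of the CRUD interface combined with key-based distribution is shown below. Each "node" is just a dictionary in the same process, the node is chosen by hashing the key, and replication and consistency handling are left out entirely. The class name, the hash-modulo placement, and the `update(key, value)` signature are assumptions for illustration; real systems use their own placement schemes.

```python
import hashlib


class PartitionedKVStore:
    """Toy key-value store that spreads records over nodes by key hash."""

    def __init__(self, num_nodes=3):
        # Each node is modeled as an in-memory dict (no replication).
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        """Pick the node responsible for a key using a stable hash."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def create(self, key, value):
        node = self._node_for(key)
        if key in node:
            raise KeyError(f"key already exists: {key}")
        node[key] = value

    def retrieve(self, key):
        return self._node_for(key)[key]

    def update(self, key, value):
        node = self._node_for(key)
        if key not in node:
            raise KeyError(key)
        node[key] = value

    def delete(self, key):
        del self._node_for(key)[key]


store = PartitionedKVStore(num_nodes=3)
store.create("user:1", "Alice")
store.create("user:2", "Bob")
print(store.retrieve("user:1"))
```

Every operation touches exactly one node, which is why single-record operations scale well while multi-record transactions are typically not offered.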

Key-Value Stores
- Some allow (non-uniform) columns within value
- Some allow Retrieve on range of keys
- Example systems: Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, etc.

Document Stores
- Like Key-Value Stores, except the value is a document
- Data model: (key, document) pairs
- Document: JSON, XML, other semistructured formats
- Basic operations (CRUD): Create(key, document), Retrieve(key), Update(key), Delete(key)

Document Stores
- Also Retrieve based on document contents
- Example systems: CouchDB, MongoDB, SimpleDB, etc.
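The difference from a plain key-value store, retrieval by document contents, can be sketched with JSON-like dictionaries as documents. The class name `DocumentStore`, the `find` method, and the sample documents are hypothetical and do not reflect the query API of any of the systems listed.

```python
class DocumentStore:
    """Toy document store: (key, document) pairs with content-based lookup."""

    def __init__(self):
        self.documents = {}

    def create(self, key, document):
        self.documents[key] = document

    def retrieve(self, key):
        return self.documents[key]

    def find(self, **fields):
        """Return keys of documents containing all given field/value pairs."""
        return sorted(
            key
            for key, doc in self.documents.items()
            if all(name in doc and doc[name] == value
                   for name, value in fields.items())
        )


docs = DocumentStore()
# Documents need not share the same fields (flexible schema).
docs.create("p1", {"title": "George Washington", "type": "president", "term_start": 1789})
docs.create("p2", {"title": "Graz", "type": "city"})
print(docs.find(type="president"))
```

The store has to look inside the value to answer `find`, which a pure key-value store, treating values as opaque, cannot do.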

Graph database systems
- Data model: nodes and links
- Nodes may have properties (including ID)
- Links may have labels, roles, or other properties

Graph database systems
- Interfaces and query languages vary
- Single-step versus path expressions versus full recursion
- Example systems: Neo4j, FlockDB, Pregel, etc.
- RDF triple stores can map to graph databases
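The data model and the difference between a single-step query and a path expression can be sketched as follows. The class `PropertyGraph`, its methods, and the sample data are assumptions for illustration, and links are treated as directed here; actual graph databases expose their own query languages.

```python
class PropertyGraph:
    """Toy graph: nodes with properties, directed links with labels."""

    def __init__(self):
        self.nodes = {}  # node ID -> dict of properties
        self.links = []  # (source ID, label, target ID)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_link(self, source, label, target):
        self.links.append((source, label, target))

    def step(self, node_ids, label):
        """Single step: follow links with `label` from any node in node_ids."""
        return {t for s, l, t in self.links if s in node_ids and l == label}

    def path(self, start, labels):
        """Path expression: follow a fixed sequence of labels from `start`."""
        current = {start}
        for label in labels:
            current = self.step(current, label)
        return current


g = PropertyGraph()
g.add_node(1, name="Ann")
g.add_node(2, name="Bob")
g.add_node(3, name="Cem")
g.add_link(1, "friend", 2)
g.add_link(2, "friend", 3)
print(g.step({1}, "friend"))             # single step
print(g.path(1, ["friend", "friend"]))   # path expression of length two
```

Full recursion would repeat `step` until no new nodes are found, as in a breadth-first search, rather than following a path of fixed length.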