Examples of database sizes. Database Systems. RDBMS weakness. Most used DBMS. Non-relational databases. Semi-structured data (SSD) 12/12/16

Similar documents
Databases 2 (VU) ( )

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

BIG DATA TOOLS. Top 10 open source technologies for Big Data

Lecture Data Warehouse Systems

Cloud Scale Distributed Data Storage. Jürmo Mehine

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

An Open Source NoSQL solution for Internet Access Logs Analysis

How To Scale Out Of A Nosql Database

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Table of Contents. Développement logiciel pour le Cloud (TLC) Table of Contents. 5. NoSQL data models. Guillaume Pierre

Applications for Big Data Analytics

NoSQL. Thomas Neumann 1 / 22

NoSQL Databases. Nikos Parlavantzas

Introduction to NOSQL

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

NoSQL Data Base Basics

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

INTRODUCTION TO CASSANDRA

Slave. Master. Research Scholar, Bharathiar University

NoSQL: Going Beyond Structured Data and RDBMS

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University

Open source large scale distributed data management with Google s MapReduce and Bigtable

DROSS Distributed & Resilient Open Source Software

Big Systems, Big Data

Hadoop Ecosystem B Y R A H I M A.

Structured Data Storage

NoSQL for SQL Professionals William McKnight

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University


nosql and Non Relational Databases

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL

these three NoSQL databases because I wanted to see a the two different sides of the CAP

Infrastructures for big data

Can the Elephants Handle the NoSQL Onslaught?

NoSQL Systems for Big Data Management

Big Data Technologies Compared June 2014

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

An Approach to Implement Map Reduce with NoSQL Databases

Preparing Your Data For Cloud

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #13: NoSQL and MapReduce

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

Big Data With Hadoop

Introduction to NoSQL

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Department of Software Systems. Presenter: Saira Shaheen, Dated:

Challenges for Data Driven Systems

Lecture 10: HBase! Claudia Hauff (Web Information Systems)!

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Study concluded that success rate for penetration from outside threats higher in corporate data centers

So What s the Big Deal?

NoSQL Database Options

NoSQL and Hadoop Technologies On Oracle Cloud

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Open Source Technologies on Microsoft Azure

Integrating Big Data into the Computing Curricula

Databases : Lecture 11 : Beyond ACID/Relational databases Timothy G. Griffin Lent Term Apologies to Martin Fowler ( NoSQL Distilled )

MapReduce with Apache Hadoop Analysing Big Data

Comparing SQL and NOSQL databases

Current Data Security Issues of NoSQL Databases

Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Google Bing Daytona Microsoft Research

NoSQL Databases. Polyglot Persistence

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Understanding NoSQL Technologies on Windows Azure

Advanced Data Management Technologies

NOSQL, BIG DATA AND GRAPHS. Technology Choices for Today s Mission- Critical Applications

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Introduction to NoSQL

Cloud & Big Data a perfect marriage? Patrick Valduriez

Introduction to Apache Cassandra

Apache HBase. Crazy dances on the elephant back

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 12

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Emerging Requirements and DBMS Technologies:

How To Store Data In Nosql

A survey of big data architectures for handling massive data

Integration of Apache Hive and HBase

Constructing a Data Lake: Hadoop and Oracle Database United!

How To Handle Big Data With A Data Scientist

2.1.5 Storing your application s structured data in a cloud database

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Big Data Analytics. Rasoul Karimi

Hadoop and its Usage at Facebook. Dhruba Borthakur June 22 rd, 2009

Practical Cassandra. Vitalii

Scalable ecommerce with NoSQL. Dipali Trivedi

Enterprise Operational SQL on Hadoop Trafodion Overview

The Quest for Extreme Scalability

Transcription:

Examples of database sizes Database Systems NoSQL Digg: 3 TB just to store the up/down votes Twitter: 7 TB/day Facebook: 50 TB for the private messaging feature 1 PB photos ebay: 2 PB data overall RDBMS weakness Most used DBMS RDBMSs typically handle massive amounts of data in complex domains, with frequent small read/writes. The archetypical RDBMS serves a bank. Cassandra (NoSQL) can perform the store operation into a 50GB database 2500 faster than using MySQL Data-intensive applications don t fit this pattern: MASSIVE+++ amounts of data (e.g. ebay) Super-fast indexing of documents (e.g. Google) Serving pages on high-traffic websites (e.g. Facebook) Streaming media (e.g. Spotify) Non-relational databases Semi-structured data (SSD) MapReduce framework Google originally; Hadoop (Apache), Key-Value stores BigTable (Google), Cassandra (Apache), Document stores CouchDB, MongoDB, SimpleDB, Graph databases Neo4j, FlockDB, Semi-structured databases (Native) XML databases, More flexible than the relational model. The type of each entity is its own business. Labels indicate meanings of substructures. Semi-structured: it is structured, but not everything is structured the same way! Support for XML and XQuery in e.g. Oracle, DB2, SQL Server. Special case: Document databases 1

Document stores Roughly: Key-Value stores where the values are documents XML, JSON, mixed semistructured data sets Typically incorporate a query language for the document type. See previous lecture for discussion on XML querying. Document store implementations MongoDB Name short for Humongous Open source owned by 10gen JSON(-like) semi-structured storage JavaScript query language Supports MapReduce for aggregations Apache CouchDB MySQL Table Terminology and Concepts Many concepts in MySQL have close analogs in MongoDB. This table outlines some of the common concepts in each system. Row MongoDB Collection Document Key-Value Stores Key-Value stores is a fancy name for persistant maps (associative arrays, hash tables) Extremely simple interface extremely complex implementations. Values can be another {Key-value} documents Column Joins Field Embedded documents, linking Customer NoSQL Data Example I "id": 1, "timestamp": "2016.03.26-11.47.02.065", nid": "B1234455X", "name": Alice", Objects Factures facture": [{ "date": "26/03/2016", "total": 6.98 ]} 3 Entities. Joins? objects": [{ concept": Pencils", amount": 3.78} { id": 2, "concept": Folder", amount": 3.20} NoSQL Data Example II "id": 1, "timestamp": "2016.03.26-11.47.02.065", nid": "B1234455X", "name": Alice", facture": [{ "date": "26/03/2016", "total": 6.98 objects": [{ concept": Pencils", amount": 3.78} { id": 2, "concept": Folder", amount": 3.20} 2

MySQL INSERT INTO users (user_id, age, status) VALUES ('bcd001', 45, 'A') SELECT * FROM users UPDATE users SET status = 'C' WHERE age > 25 MongoDB db.users.insert({ user_id: 'bcd001', age: 45, status: 'A' }) db.users.find() db.users.update( { age: { $gt: 25 } { $set: { status: 'C' } { multi: true } ) Performance NoSQL Denormalized data No JOINs Complex information on a single query SQL Normalized schemas Redundance Complex queries to get complex data Scaling NoSQL Easy to distribute Easy to spread the data SQL Still a challenge nowadays Key-Value store implementations BigTable (Google) Sparse, distributed, multi-dimensional sorted map Proprietary used in Google s internals: Google Reader, Google Maps, YouTube, Blogger, Cassandra (Apache) Originally Facebook s PM database now Open Source (Apache top-level project) Used by Netflix, Digg, Reddit, Spotify, MapReduce MapReduce No data model all data stored in files Operations supplied by user: Reader :: file [input record] Map :: input record <key, value> Reduce :: <key, [value]> [output record] Writer :: [output record] file Everything else done behind the scenes: Consistency, atomicity, distribution and parallelism, glue Optimized for broad data analytics Running simple queries over all data at once 3

MapReduce implementations The secret behind Google s success Still going strong. Hadoop (Apache) Open Source implementation of the MapReduce framework Used by Ebay, Amazon, Last.fm, LinkedIn, Twitter, Yahoo, Facebook internal logs (~15PB), MongoDB CouchDB Graph Databases Data modeled in a graph structure Nodes = entities Properties = tags, attribute values Edges connect Nodes to nodes (relationships) Nodes to properties (attributes) Fast access to associative data sets All entities that share a common property Computing association paths Graph database implementations NoSQL a hype? Neo4j Developed in Malmö Specialized query language: Cypher FlockDB Initially developed by Twitter to store user relationships Apache licence NoSQL is not the right choice just because it s new! Relational DBMSs still rule at what they were first designed for: efficient access to large amounts of data in complex domains. That s still the vast majority! Where is SQL ideal? Requirements can be identified in advance Data integrity is a must Standards-based proven technology. Where is NoSQL ideal? Unrelated / Indeterminate / evolving data requirements Simpler objectives where time is a requirement Speed and scalability is a must NoSQL = Not only SQL Different data models optimized for different tasks MapReduce, Key-Value stores, Document stores, Graph databases, Typically: + efficiency, scalability, flexibility, fault tolerance - (no) query language, (less) consistency 4

The End 5