Dr. Chuck Cartledge. 27 Aug. 2015

Similar documents
NoSQL Databases. Nikos Parlavantzas

Introduction to NOSQL

Lecture Data Warehouse Systems

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

NoSQL Databases. Polyglot Persistence

Big Data Analytics. Rasoul Karimi

Structured Data Storage

Databases : Lecture 11 : Beyond ACID/Relational databases Timothy G. Griffin Lent Term Apologies to Martin Fowler ( NoSQL Distilled )

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Cloud Scale Distributed Data Storage. Jürmo Mehine

Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)

Big Data Management and NoSQL Databases

NoSQL Data Base Basics

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015

NoSQL Systems for Big Data Management

How To Write A Database Program

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Understanding NoSQL on Microsoft Azure

NOSQL, BIG DATA AND GRAPHS. Technology Choices for Today s Mission- Critical Applications

How graph databases started the multi-model revolution

An Approach to Implement Map Reduce with NoSQL Databases

How To Scale Out Of A Nosql Database

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

Can the Elephants Handle the NoSQL Onslaught?

NoSQL for SQL Professionals William McKnight

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Database Revolution: Old SQL, NewSQL, NoSQL Huh? Michael Bowers April 9, 2013 v2.9

Making Sense of NoSQL Dan McCreary Ann Kelly

A survey of big data architectures for handling massive data

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL

nosql and Non Relational Databases

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Big Data Database Revenue and Market Forecast,

Transactions and ACID in MongoDB

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

Big Data and Data Science: Behind the Buzz Words

Big Data Management. Big Data Management. (BDM) Autumn Povl Koch September 30,

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

BIG DATA APPLIANCES. July 23, TDWI. R Sathyanarayana. Enterprise Information Management & Analytics Practice EMC Consulting

Challenges for Data Driven Systems

Dr. Chuck Cartledge. 15 Oct. 2015

Emerging Requirements and DBMS Technologies:

Open Source Technologies on Microsoft Azure

Understanding NoSQL Technologies on Windows Azure

Choosing The Right Big Data Tools For The Job A Polyglot Approach

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Comparing SQL and NOSQL databases

NoSQL and Graph Database

AllegroGraph. a graph database. Gary King gwking@franz.com

Mobile + HA + Cloud. Eugene Ciurana! ! pr3d4t0r - irc.freenode.net! ##java, ##security, #awk, #python, #bitcoin! irc.oftc.net: #tor, #tor-dev, #tails!

Object Oriented Databases. OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar

Peninsula Strategy. Creating Strategy and Implementing Change

Overview of Database Management

NoSQL Database Options

Enterprise Operational SQL on Hadoop Trafodion Overview

Logistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do.

Large-Scale Web Applications

these three NoSQL databases because I wanted to see a the two different sides of the CAP

TIM 50 - Business Information Systems

Preparing Your Data For Cloud

Data Services Advisory

Crazy NoSQL Data Integration with Pentaho

Study concluded that success rate for penetration from outside threats higher in corporate data centers

bigdata Managing Scale in Ontological Systems

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

Anwendersoftware Anwendungssoftwares a. Data-Warehouse-, Data-Mining- and OLAP-Technologies. Online Analytic Processing

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

How To Improve Performance In A Database

Integrating Big Data into the Computing Curricula

What Next for DBAs in the Big Data Era

GRAPH DATABASE SYSTEMS. h_da Prof. Dr. Uta Störl Big Data Technologies: Graph Database Systems - SoSe

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

Making Sense of NoSQL Dan McCreary Wednesday, Nov. 13 th 2014

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

InfiniteGraph: The Distributed Graph Database


NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

NoSQL and Hadoop Technologies On Oracle Cloud

Hurtownie Danych i Business Intelligence: Big Data

Overview of Data Management

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.

Introduction to Big Data Training

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

MEAP Edition Manning Early Access Program Neo4j in Action MEAP version 3

INTRODUCTION TO CASSANDRA

Speed, scale, query: can NoSQL give us all three? Arun Matthew Couchbase

Open source, high performance database

CS Programming OLAP

CPS 516: Data-intensive Computing Systems. Instructor: Shivnath Babu TA: Zilong (Eric) Tan

Transcription:

CS-695 NoSQL Database Polyglot Persistence; Or, The Many Ways We Store Data Dr. Chuck Cartledge 27 Aug. 2015 1/37

Table of contents I 4 CRUDy stuff 1 A little history 2 A change in the air 3 Database layouts 5 Databases that I/we use 6 Conclusion 7 References 2/37

Hammer and nails......it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail. Abraham H. Maslow [8] 3/37

Miscellania Origin of polyglot... Popularized by Neal Ford [4]: Talked about software development How things are evolving (SQL, XML,.NET, etc.) How multi-threading is hard (concurrency, coordination, etc.) Promoted the idea of enterprise development via Java and.net Take away: choose the right tool for the job. Different languages will continue to exist because each is good at something and all are necessary. 4/37

Miscellania The world BC (Before Codd). Databases existed before Edgar Codd. Hierarchical approach alive and well in our file system Network approach currently underpinning ideas for graph databases These suffered because people had to know lots of details about how the database was implemented. 5/37

Miscellania The world after Codd. Separate representation from implementation Changes in database for optimization needn t affect data queries User interactions aren t cluttered by construction noise (including indexing and sorting) Codd s relational data bank hides all implementation information. Relational database management systems (RDBMS) hiding information about how data is stored. Data language is independent of how data is stored [3]. 6/37

Miscellania The world according to RDBMS. Everything is neat and tidy Everything can be defined in a set of tables that have relationships between them If you make the database large enough, you can store anything and ask any question Image from [10]. RDBMS reigned supreme for 30-40 years (starting in 1970). And then reality and Big Data started to hit. 7/37

How we turned and started to get to now. And then things started changing. Can t point a finger at a specific incident, might be a critical mass. The Internet made it easier to collect data. A new generation of people thought about things in a different way. The new data had three attributes: velocity, volume, variety [7]. New ways of looking at data encouraged new questions. People wanted answers faster. Many of these items couldn t be supported by a RDBMS. 8/37

Make things faster. Simple and complex ways How to get more processing power to answer database questions?? Basically: Scale up buy faster CPU and more RAM Scale out buy more CPUs and get them to work in parallel Scaling up with custom CPUs gets expensive very, very quickly. Image from [9]. Commodity CPUs are almost a dime a dozen. Leading to clusters, network services, distributed applications, etc. 9/37

Make things faster. Amdahl s Law [1] Division and measurement of serial and parallel operations appears time and again. (Shades of Mandelbrot.) Make the common fast. Make the fast common. Understand what parts have to be done serially. Understand what parts can be done in parallel. Need to factor in overhead costs when computing speed up. 10/37

Make things faster. Amdahl s Law (A summary) Time for serial execution def. = T(1) Portion that is NOT be paralyzable def. = B (0,1] Number of parallel resources def. = n T(n) = T(1) (B + 1 n (1 B)) Speed up def. = S(n) S(n) = T(1) = 1 T(n) B+ n 1(1 B) Dr. Gene Amdahl (circa 1960) 11/37

The questions changed. We knew that we didn t know. Our questions and our data changed. RDBMS had limitations: Supported ad hoc questions on predefined data Didn t support undefined or unstructured data Could scale up not out, so database size was practically limited SQL predicate calculus made logic awkward RDBMS are very, very good at somethings, but user needs were changing. 12/37

The questions changed. What happens when we ask a different question?? When the RDBMS database was designed, we thought we knew what we wanted to know. That was then. Now if we want to look at family relationships (parent, child, sibling, extended family, etc.) We can add a column to the table for up/down relationships We can add a column for side to side relationships We can add a column for extended family relationships The database doesn t look like how we think about the problem. When the data representation doesn t match how we think, then something has to change. 13/37

A collection of different database layouts. A RDBMS Can add well formed data easily Difficult to add new data fields or types Each row is expected to have the same data Supports unknown (ad hoc) queries well Scales up not out Popular RDBMS: Oracle, MySQL, MS SQL Server, PostgreSQL The King of the World for a very long time. (A version lives in your phone.) 14/37

A collection of different database layouts. A columnar database Takes the idea of a row orientated database and turns it on its side. Can add new columns easily Each row can have different use different columns Scales up and out Popular column oriented databases: IBM DB2, Sybase IQ, Teradata Image from [2]. 15/37

A collection of different database layouts. A Key-Value design A number (called the key) locates all other data (the value[s]). Use math on some data (may be more than one piece) The math (hash function) returns one value (the key) Use the key to find the rest of the data Locating data can be fast Hash function should return unique values Popular Key-Value DBMS: Redis, Memcached, Amazon DynamoDB, Riak Key-value databases are fast when using the hash function. Not so fast if you aren t. 16/37

A collection of different database layouts. An Online Analytical Processing (OLAP) design A way to visualize and analyze data using a data cube and basic functions: Basic functions: 1 Consolidation (roll-up) of the multi-dimensional data 2 Drill-down into the data 3 Slicing and dicing Fast execution time Incorporates aspects of navigational, hierarchical, and relational databases Popular OLAP databases: Hyperion Solutions, Cognos, MicroStrategy, Applix Image from [15]. Target users are business analysts and business process management. 17/37

A collection of different database layouts. A Graph design A very different way to think about data. Consists of two parts: 1 Node (something that exists as an entity in the database) 2 Arcs (something that describes a relationship between nodes) You can have nodes without arcs. You can not have arcs without nodes. Arcs can be unidirectional. Popular graph databases: Neo4j, OrientDB, Titan, Giraph Image from [6]. Questions are driven by the relationships between nodes vice the nodes themselves. 18/37

A collection of different database layouts. A document design Document oriented databases can be viewed, and can have internal document databases (recursively). Database is organized based on tags Tag s meaning is instance dependent Tags can be nested (recursively) Database structure maybe XML based and represented in different ways Popular document databases: MongoDB, CouchDB, Couchbase, MarkLogic Sometimes document databases show up in unexpected places. 19/37

Which design to use? If I had a hammer,... Questions to ask: 1 How much data will be in the database?? 2 Will I be reading mostly?? 3 Will I be writing mostly?? 4 How accurate must the data be?? 5 How many simultaneous readers and writers?? 6 How robust/resilient must the database be?? 7 How will the database be accessed?? 8 What about ACID vs. BASE?? So many choices. 20/37

Which design to use? ACID vs. BASE One is a design principle, the other is counter marketing. ACID [5] 1 A Atomicity - all or nothing 2 C Consistency - database is always valid 3 I Isolation - concurrent equal serial ops. 4 D Durable - the database is written to disk A database action will complete completely. BASE [12] 1 BA Basically Available 2 S Soft state - user guarantees consistency 3 E Eventually consistent A database action will probably complete eventually. ACID comes with SQL. BASE comes with NoSQL. 21/37

Which design to use? Consistency, Availability, Partition tolerance (CAP) Theorem Sharing data in distributed systems is hard. Data can be consistent across the system Data can be available across the system The system can continue to function if partitioned/split You only get to choose two. Image from [17]. RDBMS on a single machine means partition is undefined. Distributed systems only get two. 22/37

Create darkness was on the face of the deep. Ex nihilo nihil fit (out of nothing, nothing comes). The CRUD approach doesn t say what happened before the C. RDBMS CREATE DATABASE db name; CREATE TABLE table name ( column name1 data type(size), column name2 data type(size),...); Columnar CREATE DATABASE CREATE table name, column name1, column name2,...; Key-Value, Graph, Document CREATE DATABASE CREATE table name Graph, Document CREATE DATABASE Image from [11]. Implementation agnostic. 23/37

Create darkness was on the face of the deep. Create an entry RDBMS INSERT INTO table name VALUES (value1,value2,value3,...); Columnar PUT table name, row name, column name1:, value ; Key-Value ADD table name, key value, value; Graph CREATE relationship name, vertex name1, vertex name2 Document INSERT table name (GML/XML/JSON marked up data) 24/37

Report databases aren t much good if you can t get stuff out. Report/Retrieve data an entry from the database RDBMS SELECT column name,column name FROM table name; Columnar GET table name, row name1:, column name:; Key-Value GET table name, key value; Graph (pipe operations) GET VERTEX EDGE FILTER(expression) (...) Document FIND document id 25/37

Update things change. Update an entry RDBMS UPDATE table name SET column1=value1,column2=value2,... WHERE some column=some value; Columnar DELETE FROM table name WHERE [expression]; PUT table name, row name, column name1:, value ; Key-Value SET table name, key value, value; Graph GET VERTEX EDGE FILTER(expression) (...) REMOVE property ADD property Document UPDATE document id value (same format as CREATE) 26/37

Delete to remove that which once was. Delete an entry RDBMS DELETE FROM table name WHERE some column=some value; Columnar DELETE FROM table name WHERE [expression]; Key-Value DROP table name, key value; Graph GET VERTEX EDGE FILTER(expression) (...) REMOVE Document REMOVE document id value 27/37

Lots and they are hidden. Shopping as an example Firefox SQLite for browser history Shopping cart Key-Value based on session ID Recommended purchases graph database Credit card payment SQL database Excel record purchase document Save Excel file hierarchical database 28/37

A continuum. Things from a 50,000 foot perspective Ad-hoc Queries Doc. Col. OLAP RDBMS K-V Free text Rigid Messy Data Neat and tidy 29/37

A continuum. Notional strengths and weaknesses ACID BASE Ad-hoc queries Hardware Hardware failure Database type RDBMS K-V Col. Doc. Graph Supported Not supported by data model No statement 30/37

Where can I get these things?? Popular open source databases RDBMS MySQL, PostrgreSQL, SQLite Key-Value Redis, Memcached, Riak Columnar HBase, Accumulo, Hypertable Document MongoDB, CouchDB, Couchbase Graph Neo4j, OrientDB, Titan Image from [16]. Open source does not mean free; your time costs money. 31/37

In summary... What can we say?? 1 Each type of database design fills a specific need/niche. 2 Each type could do the work of the others 1 Each type has a data model tailored to its problem domain 2 Performance is tied to the hardware (CPU and I/O) RDBMS has been the King for a long time. Expect it to remain so due to inertia. 32/37

NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence by Sadalage and Fowler [14]. Book to be used and refered to during the course, ISBN 9780321826626. 33/37

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Redmon and Wilson [13]. A very nice and graspable tour of various NoSQL database types. Examples of each type is presented with exercises that can be completed in a weekend. Book to be used and refered to during the course, ISBn 9781934356920. 34/37

References I [1] Gene M Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the Spring Joint Computer Conference, ACM, 1967, pp. 483 485. [2] Dale Anderson, Column oriented database technologies, http://www.dbbest.com/blog/column-oriented-database-technologies/, 2012. [3] Edgar F. Codd, A relational model of data for large shared data banks, Communications of the ACM 13 (1970), no. 6, 377 387. [4] Neal Ford, Polyglot programming, http://memeagora.blogspot.com/2006/12/polyglot-programming.html, 2006. [5] Jim Gray, The transaction concept: Virtues and limitations, Very Large Databases, vol. 81, 1981, pp. 144 154. 35/37

References II [6] Andy Hogg, Whiteboard it the power of graph databases, http://www.computerweekly.com/feature/whiteboard-it-the-power-of-gr 2013. [7] Doug Laney, 3d data management: Controlling data volume, velocity and variety, META Group Research Note 6 (2001). [8] Abraham H. Maslow, The psychology of science, Henry Regency, 1966. [9] Andrea Mauro, Storage scale-up vs. scale-out, http://vinfrastructure.it/2014/06/scale-out-vs-scale-in/, 2014. [10] David Mertz, Xml matters: Putting xml in context with hierarchical, relational, and object-oriented models, http://www.ibm.com/developerworks/library/x-matters8/, 2001. 36/37

References III [11] Brian Panulla, If libraries were like relational databases, http://ghostednotes.com/2010/12/31/if-libraries-were-like-relationa 2010. [12] Dan Pritchett, Base: An acid alternative, Queue 6 (2008), no. 3, 48 55. [13] Eric Redmond and Jim R Wilson, Seven databases in seven weeks, Pragmatic Bookshelf, 2012. [14] Pramod J Sadalage and Martin Fowler, Nosql distilled, Pearson Education, 2012. [15] DatabaseJournal Staff, Examples of sql server implementations, Database Journal (2010). [16] Wikipedia Staff, Database, https://en.wikipedia.org/wiki/database, 2015. [17] Saeid Zebardast, Said experts, http://blog.zebardast.ir/, 2015. 37/37