Making Sense of NoSQL Dan McCreary Wednesday, Nov. 13 th 2014

Similar documents
Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Cloud Scale Distributed Data Storage. Jürmo Mehine

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Making Sense of NoSQL Dan McCreary Ann Kelly

Preparing Your Data For Cloud

NoSQL Databases. Nikos Parlavantzas

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Structured Data Storage

Lecture Data Warehouse Systems

NoSQL Data Base Basics

Advanced Data Management Technologies

Can the Elephants Handle the NoSQL Onslaught?

Dynamo: Amazon s Highly Available Key-value Store

Hands-on Cassandra. OSCON July 20, Eric

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

How To Scale Out Of A Nosql Database

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL

Big Data Analytics. Rasoul Karimi

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Introduction to NOSQL

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

A survey of big data architectures for handling massive data

NoSQL Systems for Big Data Management

NoSQL and Graph Database

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

How To Use Big Data For Telco (For A Telco)

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Slave. Master. Research Scholar, Bharathiar University

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

Benchmarking and Analysis of NoSQL Technologies

So What s the Big Deal?

An Approach to Implement Map Reduce with NoSQL Databases

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

NoSQL. Thomas Neumann 1 / 22

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Open Source Technologies on Microsoft Azure

Practical Cassandra. Vitalii

Big Data Technologies. Prof. Dr. Uta Störl Hochschule Darmstadt Fachbereich Informatik Sommersemester 2015

Big Systems, Big Data

NoSQL for SQL Professionals William McKnight

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Joining Cassandra. Luiz Fernando M. Schlindwein Computer Science Department University of Crete Heraklion, Greece

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

Understanding NoSQL on Microsoft Azure

How To Write A Database Program

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

these three NoSQL databases because I wanted to see a the two different sides of the CAP

nosql and Non Relational Databases

Data Modeling for Big Data

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

bigdata Managing Scale in Ontological Systems

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

NoSQL and Hadoop Technologies On Oracle Cloud

The Quest for Extreme Scalability

Open source large scale distributed data management with Google s MapReduce and Bigtable

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #13: NoSQL and MapReduce

High Throughput Computing on P2P Networks. Carlos Pérez Miguel

NoSQL Database Systems and their Security Challenges

NoSQL Evaluation. A Use Case Oriented Survey

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Introduction to Apache Cassandra

Scalable Multiple NameNodes Hadoop Cloud Storage System

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Hypertable Architecture Overview

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Big Data and Data Science: Behind the Buzz Words

Challenges for Data Driven Systems

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

An Open Source NoSQL solution for Internet Access Logs Analysis

NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management

Sentimental Analysis using Hadoop Phase 2: Week 2

A REVIEW ON EFFICIENT DATA ANALYSIS FRAMEWORK FOR INCREASING THROUGHPUT IN BIG DATA. Technology, Coimbatore. Engineering and Technology, Coimbatore.

Understanding NoSQL Technologies on Windows Azure

NOSQL DATABASES AND CASSANDRA

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

CloudDB: A Data Store for all Sizes in the Cloud

A Comparative Analysis of Different NoSQL Databases on Data Model, Query Model and Replication Model

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Comparing SQL and NOSQL databases

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

BRAC. Investigating Cloud Data Storage UNIVERSITY SCHOOL OF ENGINEERING. SUPERVISOR: Dr. Mumit Khan DEPARTMENT OF COMPUTER SCIENCE AND ENGEENIRING

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

Integrating Big Data into the Computing Curricula

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

Transcription:

Making Sense of NoSQL Dan McCreary Wednesday, Nov. 13 th 2014

Agenda Why NoSQL? What are the key NoSQL architectures? How are they different from traditional RDBMS Systems? What types of problems do they solve? How to learn more Copyright Kelly-McCreary & Associates, LLC 2

Background for Dan McCreary Co-founder of the NoSQL Now! conference Background Bell Labs NeXT Computer (Steve Jobs) Owner of 75-person software consulting firm US Federal data integration (National Information Exchange Model NIEM.gov) Native XML/XQuery for metadata management since 2006 Advocate of web standards, NoSQL and XRX systems As of Monday Principal Engineer with MarkLogic Copyright Kelly-McCreary & Associates, LLC 3

Making Sense of NoSQL Coauthor (with Ann Kelly) of "Making Sense of NoSQL" Guide for managers and architects Focus on NoSQL architectural tradeoff analysis Basis for 40 hour course on database architectures http://manning.com/mccreary Copyright Kelly-McCreary & Associates, LLC 4

The Story of Property Tax Forms Alex Bleasdale Arun Batchu How did I get into NoSQL? In 2006 a state agency in Minnesota wanted to standardize property tax records across 87 counties A story about standards XML, XML Schema, XForms, XQuery, NEIM A story about agility A story about new technology adoption Kelly-McCreary & Associates 5

2006 ecrv Case Study Kelly-McCreary & Associates 6

7 Kelly-McCreary & Associates

8 Kelly-McCreary & Associates

9 Kelly-McCreary & Associates

10 Kelly-McCreary & Associates

Four Translations T1 T2 Web Browser T3 Object Middle Tier T1 HTML into Java Objects T2 Java Objects into SQL Tables T3 Tables into Objects T4 Objects into HTML T4 Relational Database 11 Kelly-McCreary & Associates

Kurt's Suggestion Use a Native XML Database! Web Browser Web Form Save Kurt Cagle exist-db store($collection, $file-name, $data) 12 Kelly-McCreary & Associates

Zero Translation XForms Web Browser XML database XML lives in the web browser (XForms) REST interfaces XML in the database (Native XML, XQuery) XRX Web Application Architecture No "impedance mismatch", No translation! Department tried it and then went back to HTML, Java and SQL but I was forever changed I began to question everything I had been taught about databases 13 Kelly-McCreary & Associates

Anger, Wiki, Conference, Book 2011, 2012, 2013, 2014 Kelly-McCreary & Associates

NoSQL on Google Trends RDBMS NoSQL http://www.google.com/trends/explore?q=nosql%2c+rdbms#q=nosql%2c%20rdbms&cmpt=q Kelly-McCreary & Associates, LLC 15

The NO-SQL Universe Key-Value Stores Document Stores Column-Family Stores Graph/Triple Stores XML Copyright Kelly-McCreary & Associates, LLC 16

Sample of NoSQL Jargon Document orientation Schema free MapReduce Horizontal scaling Sharding and auto-sharding Brewer's CAP Theorem Consistency Reliability Partition tolerance Single-point-of-failure Object-Relational mapping Key-value stores Column stores Document-stores Memcached Indexing B-Tree Configurable durability Documents for archives Functional programming Document Transformation Document Indexing and Search Alternate Query Languages Aggregates OLAP XQuery MDX RDF SPARQL Architecture Tradeoff Modeling ATAM Erlang Note that within the context of NoSQL many of these terms have different meanings! Copyright Kelly-McCreary & Associates, LLC 17

Before NoSQL Relational Analytical (OLAP) Copyright Kelly-McCreary & Associates, LLC 18

After NoSQL Relational Analytical (OLAP) Key-Value key key key key value value value value Column-Family Graph Document Copyright Kelly-McCreary & Associates, LLC 19

Food for thought What percentage of database transactions run on RDBMSs in the following organizations? What percentage of all transactions in Minnesota run on RDBMSs? Why is this number different? Is our data fundamentally different? Copyright Kelly-McCreary & Associates, LLC 20

NoSQL The Big Tent http://www.flickr.com/photos/morgennebel/2933723145/ "NoSQL" a label for a "meme" that now encompasses a large body of innovative ideas on data management "Not Only SQL" Focus on non-relational databases and hybrids A community where new ideas are quickly recombined to create innovative new business solutions Copyright Kelly-McCreary & Associates, LLC 21

Historical Context Mainframe Era Commodity Processors 1 CPU COBOL and FORTRAN Punchcards and flat files $10,000 per CPU hour 10,000 CPUs Functional programming MapReduce "farms" Pennies per CPU hour Copyright Kelly-McCreary & Associates, LLC 22

Two Approaches to Computation 1930s and 40s John von Neumann Manage state with a program counter. Alonzo Church Make computations act like math functions. Which is simpler? Which is cheaper? Which will scale to 10,000 CPUs? Copyright 2010 Dan McCreary & Associates 23

Standard vs. MapReduce Prices John's Way Alonzo's Way http://aws.amazon.com/elasticmapreduce/#pricing Copyright Kelly-McCreary & Associates, LLC 24

MapReduce CPUs Cost Less! 14 12 10 8 6 4 2 0 Cost Per CPU Hour (Cents) EC2 MapReduce Cut cost from 12 to 3 cents per CPU hour! Perhaps Alonzo was right! Why? (hint: how "shareable" is this process) http://aws.amazon.com/elasticmapreduce/#pricing Copyright Kelly-McCreary & Associates, LLC 25

Pressures on Single Node RDBMS Architectures Scalability OLAP/BI/Data Warehouse Single Node RDBMS Social Networks Agile Schema Free Copyright Kelly-McCreary & Associates, LLC 26

An evolving tree of data types RDBMS BI/DW Web Crawlers Open Linked Data Transactional Log Files JSON Read Mostly Read/Write Binary Graph XML Structured Unstructured Documents Copyright Kelly-McCreary & Associates, LLC 27

Many Uses of Data Transactions (OLTP) Analysis (OLAP) Search and Findability Enterprise Agility Discovery and Insight Speed and Reliability Consistency and Availability Copyright Kelly-McCreary & Associates, LLC 28

Three Eras of Enterprise Data Siloed Systems ERP Drives BI/DW NoSQL and Documents HR Finance ERP OLTP Documents ERP Documents Inventory Sales BI/Data Warehouse OLAP BI/Data Warehouse NoSQL 1970s, 80s, 90s 1990s, 2000s Today NoSQL will not replace ERP or BI/DW systems but they will complement them and also facilitate the integration of unstructured document data Copyright Kelly-McCreary & Associates, LLC 29

Simplicity is a Virtue Photo from flickr by PSNZ Images Many modern systems derive their strength by dramatically limiting the features in their system and focus on a specific task Simplicity allows database designer to focus on the primary business drivers Simplicity promotes "separation of concerns" Copyright Kelly-McCreary & Associates, LLC 30

Google MapReduce 2004 paper that had huge impact of functional programming on the entire community Copied by many organizations, including Yahoo 31

Google Bigtable Paper 2006 paper that gave focus to scaleable databases designed to reliably scale to petabytes of data and thousands of machines 32

Amazon's Dynamo Paper Werner Vogels CTO - Amazon.com October 2, 2007 Used to power Amazon's Web Sites One of the most influential papers in the NoSQL movement Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, Dynamo: Amazon's Highly Available Key-Value Store, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007. 33

Scale Up vs. Scale Out Scale Up Make a single CPU as fast as possible Increase clock speed Add RAM Make disk I/O go faster Scale Out Make Many CPUs work together Learn how to divide your problems into independent threads Copyright Kelly-McCreary & Associates, LLC 34

Automatic Sharding Before Shard Warning processor at 90% capacity! Time to "Shard" copy ½ data to a new processor After Shard Each processor gets ½ the load Original processor New processor When one node in a cluster has too much of a load the system should be able to automatically rebalance the data distribution Note: Auto-sharding is not the same as replication! Copyright Kelly-McCreary & Associates, LLC 35

Schema-Free Integration XML v1 XML v2 XML v3 Enterprise Messaging System NoSQL Database "We can easily store the data that we actually get, not the data we thought we would get." 36

Horizontal Scalability Performance Linear scalable architectures provide a constant rate of additional performance as the number of processors increases Non-scalable systems reach a plateau of performance where adding new processors does not add incremental performance Number of processors Figure 6.2

Shared Nothing Architecture Every node in the cluster has its own CPU, RAM and disk

Master-Slave vs. Peer to Peer Master-Slave Peer-to-Peer requests requests Master Used only if primary master fails Standby Master Node Node Node Node Node Node Node The Master node may become a bottleneck in large clusters Many newer NoSQL architectures are moving toward a true peer-to-peer system Copyright Kelly-McCreary & Associates, LLC 39

The Tunable SLA inputs output Max Read Time Max Write Time Reads Per Second Writes Per Second Duplicate Copies Multiple Datacenters Transaction Guarantees Estimated Price/Month $435.50 NoSQL systems that use many commodity processors can be precisely tuned to meet an organizations service level agreements 40

Key-Value Stores key key key key value value value value Examples: Berkley DB, Memcache, DynamoDB, S3, Redis, Riak Keys used to access opaque blobs of data Values can contain any type of data (images, video) Pros: scalable, simple API (put, get, delete) Cons: no way to query based on the content of the value Copyright Kelly-McCreary & Associates, LLC 41

Column-Family Examples: Cassandra, HBase, Hypertable, Apache Accumulo, Bigtable Key includes a row, column family and column name Store versioned blobs in one large table Queries can be done on rows, column families and column names Pros: Good scale out, versioning Cons: Cannot query blob content, row and column designs are critical Copyright Kelly-McCreary & Associates, LLC 42

Column store key-value Key Row-ID Column Family Column Name Timestamp Value The key is composed of: row id (string) Column family (grouping of columns) Column name (string) Timestamp (64-bit value) Value any blob (byte stream)

Graph Store Examples: Neo4j, AllegroGraph, Bigdata triple store, InfiniteGraph, StarDog Data is stored in a series of nodes, relationships and properties Queries are really graph traversals Ideal when relationships between data is key: e.g. social networks Pros: fast network search, works with public linked data sets Cons: Poor scalability when graphs don't fit into RAM, specialized query languages (RDF uses SPARQL) Copyright Kelly-McCreary & Associates, LLC 44

Document Store Examples: MarkLogic, MongoDB, Couchbase, CouchDB, RavenDB, exist-db Data stored in nested hierarchies Logical data remains stored together as a unit Any item in the document can be queried Pros: No object-relational mapping layer, ideal for search Cons: Complex to implement, incompatible with SQL Copyright Kelly-McCreary & Associates, LLC 45

Two Models "Bag of Words" "Retained Structure" doc-id keywords keywords 'love' keywords keywords 'new' 'hate' keywords 'fear' keywords All keywords in a single container Only count frequencies are stored with each word Keywords associated with each sub-document component 46

Keywords and Node IDs document-id Node-id keywords Node-id keywords Node-id keywords Node-id keywords Node-id keywords Node-id keywords Keywords in the reverse index are now associated with the node-id in every document 47

Using the Wrong Architecture Start Finish Credit: Isaac Homelund MN Office of the Revisor

Using the Right Architecture Start Finish Find ways to remove barriers and empower the non programmers on your team.

Further Reading and Questions Dan McCreary dan@danmccreary.com @dmccreary http://manning.com/mccreary Copyright Kelly-McCreary & Associates, LLC 50