Overview of Databases on macOS
Karl Kuehn, Automation Engineer, RethinkDB



Session Goals
- Introduce database concepts
- Show example players
Not Goals:
- Cover non-macOS systems (Oracle)
- Teach you SQL
- Answer "what should I use?"

What is a Database?
"I know it when I see it" - U.S. Supreme Court Justice Potter Stewart

What is a Database?
- Data storage for multi-user apps
- Conservative philosophy
  - Failure is better than partial success
  - All errors should be reported
- Connects multiple simultaneous clients
- Favors total throughput over single-request speed
- Currently dominated by open source
  - Oracle is an exception

Concepts

Durability
- Safe on disk before acknowledged
- Reliably saved despite:
  - abrupt termination
  - power failure
- Disk failure should be detected
- Recovery often takes a long time
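A minimal sketch of "safe on disk before acknowledged", assuming a plain append-only file stands in for a database journal. The function name `durable_append` and the file name are hypothetical; real databases layer checksums and recovery logic on top of this idea.

```python
import os
import tempfile

def durable_append(path: str, record: str) -> None:
    """Append one record and return only after asking the OS to write it out."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()              # push Python's buffer to the operating system
        os.fsync(f.fileno())   # ask the operating system to push its cache to the disk

# Hypothetical journal file in a scratch directory
log_path = os.path.join(tempfile.mkdtemp(), "journal.log")
durable_append(log_path, "set balance=100")   # returning here is the "acknowledgement"
durable_append(log_path, "set balance=80")

with open(log_path, encoding="utf-8") as f:
    saved = f.read().splitlines()
print(saved)
```

The acknowledgement is simply the function returning. On macOS, `fsync` may still leave data in the drive's own cache; `fcntl` with `F_FULLFSYNC` is the stricter request, at a cost in speed.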

Atomicity
- Saves are all-or-nothing
- Data is rolled back on errors
- Know the atom for your database

Queries
- Read or change the data
  - Filtering, aggregating, calculations
  - Insert, update, delete, replace
- Typically do not change the records
- Move the problem, not the data
- A transaction is an atom of queries
  - All queries succeed or fail
  - Wrapped up by a commit/rollback
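The all-or-nothing behavior of a transaction can be sketched with Python's built-in `sqlite3` module. The `account` table, the names in it, and the `transfer` helper are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("sam", 100), ("ron", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two rows; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "sam", "ron", 30)       # both updates are committed
failed = transfer(conn, "sam", "ron", 500)  # second update breaks the CHECK rule
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to `ron` has already run when the debit is rejected, and the rollback undoes it, so no half-finished transfer is ever saved.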

Isolation
- Transactions build on each other
- Simulate serialization
- Roll back conflicting transactions
- Not visible to others until commit
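"Not visible to others until commit" can be observed with two connections to the same SQLite file. This is a sketch under the assumption of SQLite's default journal mode and the `sqlite3` module's default transaction handling; the file and table names are made up.

```python
import os
import sqlite3
import tempfile

# Two clients of the same database file (hypothetical scratch location)
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(db_path)
reader = sqlite3.connect(db_path)

writer.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
writer.commit()

# The INSERT opens a transaction on the writer; nothing is committed yet.
writer.execute("INSERT INTO people (name) VALUES ('Sam')")

# The other client still sees the last committed state.
seen_before_commit = reader.execute("SELECT count(*) FROM people").fetchall()[0][0]

writer.commit()
seen_after_commit = reader.execute("SELECT count(*) FROM people").fetchall()[0][0]

print(seen_before_commit, seen_after_commit)
```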

Consistency
- Saved data must fit defined rules
  - Never allowed to not fit the rules
  - One good state to another
- Rules can be programs
- Does not guarantee correct data
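A small sketch of rules enforced by the database, using `NOT NULL` and `CHECK` constraints in SQLite. The table layout and the `try_insert` helper are assumptions for illustration. The last insert shows why fitting the rules is not the same as being correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE people (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150)
    )
""")

def try_insert(name, age):
    """Return 'accepted' if the row fits the table's rules, else 'rejected'."""
    try:
        with conn:
            conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", (name, age))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert("Sam", 32),      # fits the rules
    try_insert("Ron", -5),      # breaks the CHECK rule
    try_insert(None, 40),       # breaks NOT NULL
    try_insert("Abigail", 82),  # fits the rules even if her real age is 28
]
print(results)
```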

ACID
The gold standard for databases
- Atomicity
- Consistency
- Isolation
- Durability

Organization
- Database: top-level container
- Table and Record/Row
- Primary Key
- Columns
- Rows

  ID  Name      Age
  1   Sam       32
  2   Abigail   28
  3   Ron       23
  4   Jennifer  47

- Primary key is required
  - One or more columns

Indexes
- Quick access to data and ranges
- Usually implemented as a b-tree
  - Lists data in order
  - Search is O(log n)
  - 1 million records -> 6 steps (log base 10)
  - Easy access to next and previous
- Multiple indexes for a single table
- Can take up more space than the data
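The step count depends on how many children each tree node has. The sketch below, which is not a real b-tree, counts levels for a given fan-out and uses a sorted list with `bisect` to stand in for an ordered index. The names `tree_levels` and `ids_in_age_range` are made up for this example.

```python
import bisect

def tree_levels(records: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= records."""
    levels, capacity = 0, 1
    while capacity < records:
        capacity *= fanout
        levels += 1
    return levels

# A sorted list of (age, id) pairs stands in for an index on the age column.
index = sorted([(32, 1), (28, 2), (23, 3), (47, 4)])

def ids_in_age_range(low: int, high: int) -> list[int]:
    """Find the first match by binary search, then walk the neighbours in order."""
    start = bisect.bisect_left(index, (low, 0))
    end = bisect.bisect_right(index, (high, float("inf")))
    return [row_id for _, row_id in index[start:end]]

print(tree_levels(1_000_000, 10))   # fan-out 10, matching the 6 steps above
print(tree_levels(1_000_000, 2))    # a plain binary search needs more steps
print(ids_in_age_range(25, 40))
```

With ten children per node, one million records need six levels. Real b-trees often use a much larger fan-out, so they need even fewer.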

Drivers
- Specific to the database software
- API to connect to and use the database
- Multiple programming languages
- Can allow network connections

FileMaker Pro

FileMaker Pro
- Includes both app and DB layers
- Create forms without developers
- Relational, but not SQL
- Less programming, more clicking
- Frustrates many SQL developers
- Suitable for smaller data sets (10K)

Relational Databases
A.K.A. SQL Databases

Structured Query Language (SQL)
- The most common form of database
- Standardized, but many dialects
- Declarative language
- Examples (each query names the columns to return, the table, and optional limits):
    SELECT id, name FROM people
    SELECT id, name FROM people WHERE age > 21
    SELECT count(*) FROM people WHERE age > 21
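These example queries can be tried against SQLite, which ships with Python. The sketch assumes the `people` table holds the four sample rows from the Organization slide; the helper names and the second threshold of 30 are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (id, name, age) VALUES (?, ?, ?)",
    [(1, "Sam", 32), (2, "Abigail", 28), (3, "Ron", 23), (4, "Jennifer", 47)],
)

def names_older_than(min_age):
    """SELECT with a limit: only rows whose age exceeds min_age."""
    rows = conn.execute(
        "SELECT id, name FROM people WHERE age > ? ORDER BY id", (min_age,)
    )
    return rows.fetchall()

def count_older_than(min_age):
    """Aggregate: the database counts the rows instead of returning them."""
    row = conn.execute("SELECT count(*) FROM people WHERE age > ?", (min_age,))
    return row.fetchone()[0]

everyone = conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()
print(everyone)
print(names_older_than(30))
print(count_older_than(21), count_older_than(30))
```

The queries say what is wanted, not how to find it. That is the declarative part, and it leaves the database free to choose an index or a full scan.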

Schemas
- Tables are rigidly defined
- Columns each take one data type
- Data storage can be very efficient
- Types:
  - Strings (fixed width and varchar)
  - Ints, floats, bytes
  - Vary by vendor

Joins
- Data matched between tables
    SELECT person.name, phone.number
    FROM person, phone
    WHERE person.id = phone.person_id
- Returns only where data matched
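The join above can be run as-is in SQLite. The sample rows below are invented: one person has two phone numbers and one has none, which shows what "returns only where data matched" means in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phone  (person_id INTEGER REFERENCES person(id), number TEXT);
    INSERT INTO person VALUES (1, 'Sam'), (2, 'Abigail'), (3, 'Ron');
    INSERT INTO phone  VALUES (1, '555-0100'), (1, '555-0101'), (3, '555-0199');
""")

rows = conn.execute("""
    SELECT person.name, phone.number
    FROM person, phone
    WHERE person.id = phone.person_id
    ORDER BY person.name, phone.number
""").fetchall()

for name, number in rows:
    print(name, number)
```

Sam appears twice, once per phone. Abigail has no phone row, so she does not appear at all.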

Replication
- Data is copied to multiple servers
- Available even with downed servers
[Diagram: the same data held on servers A and B]

MySQL Replication
- Master-slave replication
  - One master that allows changes
  - Tree of slaves that allow reads
- Near-line server or load balancing
- Slaves slightly behind master
[Diagram: writes go to the master, reads go to the slaves]
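A toy in-memory model of why slaves run slightly behind the master. This is a sketch of the general idea, not of MySQL's actual replication code, and the `Master` and `Slave` classes are invented for the illustration.

```python
class Master:
    """Accepts writes and keeps an ordered log of changes to ship to slaves."""
    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Slave:
    """Read-only copy that replays the master's log when it gets the chance."""
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0   # how many log entries have been replayed so far

    def catch_up(self):
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)

    def read(self, key):
        return self.data.get(key)

master = Master()
slave = Slave(master)

master.write("sam_age", 32)
slave.catch_up()
master.write("sam_age", 33)        # the slave has not replayed this change yet
stale = slave.read("sam_age")
slave.catch_up()
fresh = slave.read("sam_age")
print(stale, fresh)
```

A client that writes to the master and then immediately reads from a slave can get the older value, which is the price of spreading reads across servers.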

Vendors

Vendor        License        Features
Oracle MySQL  GPL            Well supported; Oracle backing
MariaDB       GPL            Many table types; more experimental
PerconaDB     GPL            Takes from Oracle and MariaDB
PostgreSQL    BSD-style      Standards based; JSON columns
SQLite        Public Domain  Small and ubiquitous

Scale Up vs. Scale Out

Scale Up
A few cheap computers have more aggregate power than a single expensive one.
- Take advantage of hardware progress
  - CPU speed
  - CPU cores / multiple CPUs
  - Memory increase
  - SSDs
  - Faster networking

Scale Out - Facebook
[Diagram: PHP servers -> query routers -> databases partitioned A-G, H-R, S-Z]

Scale Out
Pros:
- More CPU, memory, and storage
- Fits well with cloud servers
Cons:
- Coordinating servers costs time
- Cluster can partially fail
  - Single server failure
  - Network outages
- Complexity

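CAP Theorem
Choose two:
- Consistency: all nodes see the same thing
- Availability: always get success or failure
- Partition tolerance: handles node and network errors
No such thing as CA: a multi-server system cannot opt out of partitions, so the real choice is between C and A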

Partitioning
- Servers each take one part of a table
- Data routed to the proper server
[Diagram: keys A-K stored on server A, keys L-Z on server B]
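Routing by key range can be sketched in a few lines. The boundaries follow the A-K / L-Z split on this slide; the server names and the `route` function are hypothetical.

```python
import bisect

# First letter at which each partition starts (assumed layout for this sketch)
BOUNDARIES = ["A", "L"]                 # A-K -> first server, L-Z -> second server
SERVERS = ["server-a", "server-b"]      # hypothetical server names

def route(key: str) -> str:
    """Pick the server whose key range contains the first letter of key."""
    first = key[0].upper()
    idx = bisect.bisect_right(BOUNDARIES, first) - 1
    return SERVERS[max(idx, 0)]         # anything sorting before "A" goes to the first server

for name in ["Abigail", "Kim", "Linda", "Sam", "Xavier"]:
    print(name, "->", route(name))
```

Range partitioning keeps neighbouring keys on the same server, which helps range scans. Hash partitioning spreads load more evenly but loses that ordering.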

Map-Reduce
- Function run on every record (Map)
  - Can filter or manipulate records
- Reduce function run to aggregate
  - First run on each server
  - Those results are then aggregated
- Used on enormous data sets
- Results are stored in a table
- Examples: Hadoop and Apache Spark
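The two-stage reduce described above can be sketched in plain Python, with two lists standing in for two servers. The data, the age filter, and the function names are assumptions for illustration. Hadoop and Spark do the same kind of work across real machines.

```python
from functools import reduce

# Each inner list plays the role of the records held by one server.
servers = [
    [{"name": "Sam", "age": 32}, {"name": "Abigail", "age": 28}],
    [{"name": "Ron", "age": 23}, {"name": "Jennifer", "age": 47}],
]

def map_record(record):
    """Map: emit (count, total_age) for records over 25, and zeros otherwise."""
    return (1, record["age"]) if record["age"] > 25 else (0, 0)

def combine(a, b):
    """Reduce: add two partial (count, total_age) results together."""
    return (a[0] + b[0], a[1] + b[1])

# First the reduce runs on each server over its own mapped records...
per_server = [reduce(combine, map(map_record, records), (0, 0)) for records in servers]

# ...then the small per-server results are aggregated into the final answer.
count, total_age = reduce(combine, per_server, (0, 0))
average_age = total_age / count
print(per_server, count, total_age)
```

Only the small per-server tuples cross the network, never the records themselves, which is why the pattern copes with enormous data sets.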

NewSQL

NewSQL
- SQL language with multiple servers
- Very different approaches/strengths
- Usually a subset of SQL

NewSQL

Product          Features                                   Trade-off
Percona Cluster  Replication; complete SQL; SQL-fast reads  Writes are slower; full copies
Clustrix         Map-Reduce-like speed; most SQL            Slower on most queries
MySQL Cluster    Partitioning; replication; very fast       Very limited joins

Document Databases

JSON Databases
- Usually not ACID
  - No multi-record transactions
  - Atom is usually one record
- Partitioning and replication
  - Can survive failures
- Tables are key + JSON value
  - No schema, so records can be mixed
- Settings for speed vs. safety

JSON
- JavaScript Object Notation
- Strings, numbers, arrays, and dicts
- More types: binary, files, time, etc.
- Complex data without schemas
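A "key + JSON value" table with mixed records can be imitated with a dictionary and the standard `json` module. This is only a sketch of the data model, not of any particular product; the `put` and `get` helpers and the sample documents are invented.

```python
import json

table = {}   # key -> JSON text, standing in for a document table

def put(key, document):
    """Store one record as JSON text under its key."""
    table[key] = json.dumps(document)

def get(key):
    """Load one record back into Python dicts, lists, strings, and numbers."""
    return json.loads(table[key])

# No schema: the two records have different fields and nesting.
put("sam", {"name": "Sam", "age": 32})
put("abigail", {
    "name": "Abigail",
    "phones": ["555-0100", "555-0101"],
    "address": {"city": "Portland"},
})

with_phones = sorted(key for key in table if "phones" in get(key))
print(get("sam")["age"], get("abigail")["address"]["city"], with_phones)
```

Because nothing enforces a shape, code that reads the records has to cope with missing fields, as the `"phones" in get(key)` test does here.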

JSON Databases

Database        Features
MongoDB         Map-Reduce; hash sharding
RethinkDB       Map-Reduce; joins; changefeeds
Cassandra       High speed
Apache CouchDB  HTTP data interface; eventual consistency

Other Types
- Timeseries
  - Looking for patterns in time
  - Needle in a haystack
- In memory
  - Very fast but small datasets
  - Limited queries
- Graph
  - Store relations between records
  - Friend-of-a-friend type problems

Conclusion

           Use Cases                       Limitations
SQL        Default; well structured data   No partitioning; limited size; rigid structure
NewSQL     Small queries on large data     Limits on queries
MapReduce  Huge data                       Difficult to use; batch operation; not ACID
JSON       Large data; evolving structure  Not ACID

Questions?
Karl Kuehn, Automation Engineer, RethinkDB