Open source, high performance database

Similar documents

Building Your First MongoDB Application

Introduction to NoSQL and MongoDB. Kathleen Durant Lesson 20 CS 3200 Northeastern University

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

MongoDB Developer and Administrator Certification Course Agenda

MongoDB. Or how I learned to stop worrying and love the database. Mathias Stearn. N*SQL Berlin October 22th, gen

Dr. Chuck Cartledge. 15 Oct. 2015

Comparing SQL and NOSQL databases

Getting Started with MongoDB

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Cloud Scale Distributed Data Storage. Jürmo Mehine

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Can the Elephants Handle the NoSQL Onslaught?

L7_L10. MongoDB. Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD.

NoSQL for SQL Professionals William McKnight

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

Lecture Data Warehouse Systems

How To Scale Out Of A Nosql Database

Integrating Big Data into the Computing Curricula

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

NoSQL: Going Beyond Structured Data and RDBMS

NoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011

Open Source Technologies on Microsoft Azure

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group

Challenges for Data Driven Systems

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)

Big Data & Data Science Course Example using MapReduce. Presented by Juan C. Vega

How graph databases started the multi-model revolution

NoSQL Database Options

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

.NET User Group Bern

NoSQL. Thomas Neumann 1 / 22

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Using distributed technologies to analyze Big Data

NoSQL Databases. Nikos Parlavantzas

NoSQL Data Base Basics

NoSQL in der Cloud Why? Andreas Hartmann

Big Data Explained. An introduction to Big Data Science.

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel Jaykrushna Patel

Database Scalability and Oracle 12c

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

Introduction to Apache Cassandra

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

NoSQL storage and management of geospatial data with emphasis on serving geospatial data using standard geospatial web services

Practical Cassandra. Vitalii

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

An Approach to Implement Map Reduce with NoSQL Databases

Oracle Big Data SQL Technical Update

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Hadoop Ecosystem B Y R A H I M A.

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

NoSQL Systems for Big Data Management

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

How To Improve Performance In A Database

nosql and Non Relational Databases

Advanced Data Management Technologies

Big Data With Hadoop

How To Use Big Data For Telco (For A Telco)

A survey of big data architectures for handling massive data

BIG DATA What it is and how to use?

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

The MongoDB Tutorial Introduction for MySQL Users. Stephane Combaudon April 1st, 2014

INTRODUCTION TO CASSANDRA

So What s the Big Deal?

Scaling up = getting a better machine. Scaling out = use another server and add it to your cluster.

Luncheon Webinar Series May 13, 2013

NoSQL and Hadoop Technologies On Oracle Cloud

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Structured Data Storage

Business Intelligence for Big Data

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Information Architecture

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Making Sense of NoSQL Dan McCreary Ann Kelly

Big Data and Transactional Databases Exploding Data Volume is Creating New Stresses on Traditional Transactional Databases

Oracle Database 12c Plug In. Switch On. Get SMART.

NOSQL DATABASES AND CASSANDRA

Preparing Your Data For Cloud

Sentimental Analysis using Hadoop Phase 2: Week 2

Do Relational Databases Belong in the Cloud? Michael Stiefel

Big Data and Data Science: Behind the Buzz Words

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #13: NoSQL and MapReduce

Slave. Master. Research Scholar, Bharathiar University

How To Handle Big Data With A Data Scientist

Applications for Big Data Analytics

Transcription:

Open source, high performance database Anti-social Databases: NoSQL and MongoDB Will LaForest Senior Director of 10gen Federal will@10gen.com @WLaForest 1

SQL invented Dynamic Web Content released IBM s IMS Oracle founded Client Server Web applications SOA 10gen founded BigTable 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 Codd publishes relational model paper in 1970 PC s gain traction WWW born 3 tier architecture Brewer s Cap born Cloud Computing NoSQL Movement 2

Attribute Tuple Relation 3

Category N 1 BlogCategory N 1 Author N 1 N 1 Blog BlogTag N N Content Tag 4

Data stored in a RDBMS is very compact (disk was more expensive) SQL and RDBMS made queries flexible with rigid schemas Rigid schemas helps opcmize joins and storage Massive ecosystem of tools, libraries and integracons Been around 40 years! 5

Sensor Data/SIGINT (volume, velocity) Asset Management (variety, velocity) Crowd Sourcing (volume, variety) Open Source (volume, variety, velocity) 6

VOLUME & NEW ARCHITECTURES Systems scaling horizontally, not verccally Commodity servers Cloud CompuCng DATA VARIETY & VOLATILITY Extremely difficult to find a single fixed schema Don t know data schema a- priori TRANSACTIONAL MODEL N x Inserts or updates Distributed transaccons 7

Non- relaconal been hanging around (heard of MUMPS?) Modern NoSQL theory and offerings started in early 2000s NoSQL = Not Only SQL A colleccon of very different products AlternaCves to relaconal databases when a bad fit Common mocvacons Horizontally scalable (commodity server/cloud compucng) Schema Flexibility 8

I group by data model Some people use or include data arangement Data arrangement is important and cuts across data models (I have separate talk on this) Column/Document Consistent hashing parcconing Range based parcconing Key/Value Big Table Descendents Document Oriented Graph 9

Value (Data) mapped to a key (think primary) Some typed, some just BLOBs Fast hash based queries No range queries, no ordering, simple data model Will Chris Will-obj will@10gen.com chris@10gen.com [4e61 6d65 3a57 10

Data stored on disk in a column oriented fashion Predominantly hash based indexing Rudimentary or no secondary indexes Range queries and ordering on one dimension (row key) Some consistent hashing some range based row keys column family contact column family personal row Will twitter : wlaforest email : will@10gen.com phone : 555-555-5555 bio : Will attended picture : row Chris email : chris@10gen.com phone : 555-555-5555 bio : picture : hobby : golf 11

Data modeled as documents or objects (XML and JSON) No fixed schema Richest data model Consistent hashing and range based parcconing {name: will, eyes: blue, birthplace: NY, aliases: [ bill, la ciacco ], gender:???, boss: ben } name: jeff, eyes: blue, height: 72, boss: ben } name: ben, hat: yes } {name: brendan, aliases: [ el diablo ]} {name: matt, pizza: DiGiorno, height: 72, boss: 555.555.1212} 12

Not a database! Map Reduce on HDFS or other data source Great for grinding through data When you want to use it Can t use a index DistribuCng custom algorithms ETL Many NoSQL offerings have nacve map reduce funcconality 13

14

2007 founded 2009 first release of MongoDB MongoDB is open source 10gen has a Redhat- like business model SubscripCons (subscriber build, support) Training ConsulCng > 80M in funding Sequoia, NEA, In- Q- Tel, Union Square Ventures, Flybridge Capital, 15

#2 on Indeed s Fastest Growing Jobs Jaspersom BigData Index Demand for MongoDB, the document- oriented NoSQL database, saw the biggest spike with over 200% growth in 2011. Google Searches 451 Group MongoDB increasing its dominance 16

#2 ON INDEED S FASTEST GROWING JOBS 17

MongoDB INCREASING ITS DOMINANCE 18

Scale horizontally over commodity hardware Agility essencal (schema free/heterogeneous interface) RDBMSs great so keep what works Rich data models Adhoc queries Fully featured indexes What doesn t distribute well? Long running mulc- row transaccons Join Both arcfacts of the relaconal data model 19

Data stored as documents (JSON) Schema free CRUD operacons (Create Read Update Delete) Atomic document operacons Consistent (but is tunable advanced topic) Rich indexing (secondary, geospacal, covered) Ad hoc Queries like SQL Equality Ranges Regular expression searches GeospaCal ReplicaCon HA, read scalability, geo centric reads Sharding (somecmes called parcconing) for scalability 20

RDBMS MongoDB Database Database Table Collection Row Document 21

22

var p = { author: roger, date: new Date(), text: Spirited Away, tags: [ Tezuka, Manga ]} > db.posts.save(p) 23

>db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Jul 24 2010 19:47:11 GMT- 0700 (PDT)", text : "Spirited Away", tags : [ "Tezuka", "Manga" ] } Notes: - _id is unique, but can be anything you d like 24

Create index on any Field in Document // 1 means ascending, -1 means descending >db.posts.ensureindex({author: 1}) >db.posts.find({author: 'roger'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger",... } 25

CondiConal Operators $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type $lt, $lte, $gt, $gte // find posts with any tags > db.posts.find( {tags: {$exists: true }} ) // find posts matching a regular expression > db.posts.find( {author: /^rog*/i } ) // count posts by an author before a certain date > db.posts.find( {author: roger, date:{ $lt: Sat }} ).count() 26

$set, $unset, $inc, $push, $pushall, $pull, $pullall, $bit > comment = { author: fred, date: new Date(), text: Best Movie Ever } > db.posts.update( { _id:... }, $push: {comments: comment} ); 27

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Jul 24 2010 19:47:11 GMT- 0700 (PDT)", text : "Spirited Away", tags : [ "Tezuka", "Manga" ], comments : [ { author : "Fred", date : "Sat Jul 24 2010 20:51:03 GMT- 0700 (PDT)", text : "Best Movie Ever" } ]} 28

// Index nested documents > db.posts.ensureindex( comments.author :1 ) Ø db.posts.find({ comments.author : Fred }) // Index on tags > db.posts.ensureindex( tags: 1) > db.posts.find( { tags: Manga } ) // geospacal index > db.posts.ensureindex( author.locacon : 2d ) > db.posts.find( author.locacon : { $near : [22,42] } ) 29

30

Do them client side (Python or R) NaCve Map/Reduce in JS in MongoDB Distributes across the cluster with good data locality New aggregacon framework DeclaraCve (no JS required) Pipeline approach (like Unix ps - ax tee processes.txt more) Hadoop Intersect the indexing of MongoDB with the brute force parallelizacon of hadoop Hadoop MongoDB connector BI/ReporCng Tools Pentaho, Jaspersom, Ikanow, Nucleon 31

32

Prod clusters on order of 100B objects 50k qps per server ~1400 nodes Better data locality In-Memory Caching Auto-Sharding Replication /HA Horizontal Scaling 33