The MongoDB Tutorial Introduction for MySQL Users. Stephane Combaudon April 1st, 2014



Similar documents
MongoDB Developer and Administrator Certification Course Agenda

Scaling up = getting a better machine. Scaling out = use another server and add it to your cluster.

MongoDB: document-oriented database

Getting Started with MongoDB

SURVEY ON MONGODB: AN OPEN- SOURCE DOCUMENT DATABASE

Big Data & Data Science Course Example using MapReduce. Presented by Juan C. Vega

MongoDB. The Definitive Guide to. The NoSQL Database for Cloud and Desktop Computing. Apress8. Eelco Plugge, Peter Membrey and Tim Hawkins

An Approach to Implement Map Reduce with NoSQL Databases

MongoDB Aggregation and Data Processing Release 3.0.4

MongoDB Aggregation and Data Processing

Introduction to NoSQL and MongoDB. Kathleen Durant Lesson 20 CS 3200 Northeastern University

MongoDB and Couchbase

In Memory Accelerator for MongoDB

L7_L10. MongoDB. Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD.

Big data and urban mobility

ADVANCED DATABASES PROJECT. Juan Manuel Benítez V Gledys Sulbaran

Can the Elephants Handle the NoSQL Onslaught?

Mongo: some travel note Notes taken by Dario Varotto about MongoDB while doing mongo101p, mongo102 and mongo101js courses by MongoDB university.

Perl & NoSQL Focus on MongoDB. Jean-Marie Gouarné jmgdoc@cpan.org

MongoDB Aggregation and Data Processing

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

Document Oriented Database

NoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011

NoSQL Databases. Nikos Parlavantzas

Cloud Scale Distributed Data Storage. Jürmo Mehine

NoSQL in der Cloud Why? Andreas Hartmann

Sharding and MongoDB. Release MongoDB, Inc.

Python and MongoDB. Why?

Dr. Chuck Cartledge. 15 Oct. 2015

Sharding and MongoDB. Release MongoDB, Inc.

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

Hadoop Job Oriented Training Agenda

Please ask questions! Have people used non-relational dbs before? MongoDB?

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

NoSQL: Going Beyond Structured Data and RDBMS

NoSQL Databases, part II

MongoDB. An introduction and performance analysis. Seminar Thesis

Integrating Big Data into the Computing Curricula

NoSQL web apps. w/ MongoDB, Node.js, AngularJS. Dr. Gerd Jungbluth, NoSQL UG Cologne,

Certified MongoDB Professional VS-1058

DocStore: Document Database for MySQL at Facebook. Peng Tian, Tian Xia 04/14/2015

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Comparing SQL and NOSQL databases

these three NoSQL databases because I wanted to see a the two different sides of the CAP

NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY DEPARTMENT OF CHEMICAL ENGINEERING ADVANCED PROCESS SIMULATION. SQL vs. NoSQL

MongoDB. Or how I learned to stop worrying and love the database. Mathias Stearn. N*SQL Berlin October 22th, gen

the missing log collector Treasure Data, Inc. Muga Nishizawa

RDBMS vs NoSQL: Performance and Scaling Comparison

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Application of NoSQL Database in Web Crawling

Big Data With Hadoop

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

MongoDB Security Guide

Database Administration with MySQL

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel Jaykrushna Patel

Cloud Computing at Google. Architecture

Querying MongoDB without programming using FUNQL

Log management with Logstash and Elasticsearch. Matteo Dessalvi

Scaling with MongoDB. by Michael Schurter Scaling with MongoDB by Michael Schurter - OS Bridge,

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Tobby Hagler, Phase2 Technology

CSCC09F Programming on the Web. Mongo DB

Brad Dayley. NoSQL with MongoDB

Why MySQL beats MongoDB

An Open Source NoSQL solution for Internet Access Logs Analysis

NoSQL and Hadoop Technologies On Oracle Cloud

Open source, high performance database

MongoDB Administration

.NET User Group Bern

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)

Building Your First MongoDB Application

MongoDB Security Guide Release 3.0.6

PHP and MongoDB Web Development Beginners Guide by Rubayeet Islam

Structured Data Storage

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

NoSQL Database - mongodb

Understanding NoSQL Technologies on Windows Azure

Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment

Practical Cassandra. Vitalii

Couchbase Server Under the Hood

NoSQL. Thomas Neumann 1 / 22

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

Lecture Data Warehouse Systems

Andrew Moore Amsterdam 2015

Comparing Scalable NOSQL Databases

Adding Indirection Enhances Functionality

Transcription:

The MongoDB Tutorial Introduction for MySQL Users Stephane Combaudon April 1st, 2014

Agenda 2 Introduction Install & First Steps CRUD Aggregation Framework Performance Tuning Replication and High Availability Sharding

Before we start 3 No knowledge of MongoDB is required Several interesting topics will intentionally be left out Any problem, any question? Raise your hand!

Agenda 4 Introduction Install & First Steps CRUD Aggregation Framework Performance Tuning Replication and High Availability Sharding

Typical problems with RDBMS 5 SQL is difficult to understand Changing the schema is difficult with large tables aka The relational model is not flexible Scaling out and HA are difficult Support for sharding is limited or non-existent Many HA solutions, which one to choose?

MongoDB solutions (1) 6 SQL is difficult to understand Everything is a JSON document in MongoDB No joins but a powerful way to write complex queries (aggregation framework) Queries are easy to read/write SQL: SELECT * FROM people WHERE name = 'Stephane' MongoDB: db.people.find({name:'stephane'})

MongoDB solutions (2) 7 The relational model is not flexible MongoDB is schemaless, no ALTER TABLE! db.people.insert({name: 'Stephane'}); db.people.insert({name: 'Joe',age:30}); But be careful, this is also allowed db.people.insert({name: 'Stephane') db.people.insert({n: 'Joe'})

MongoDB solutions (3) Scaling and HA are difficult 8 Replica set M... mongod mongod Shard 1 Shard 2 Shard N mongod Config servers mongos Application... mongos Application

Summary 9 Scalability & Performance Memcached MongoDB RDBMS Functionality

Agenda 10 Introduction Install & First Steps CRUD Aggregation Framework Performance Tuning Replication and High Availability Sharding

MongoDB setup 11 See mongod_setup.rst

Useful terminology 12 Relational Database Table Row MongoDB Database Collection (JSON) document

JSON 13 Stands for JavaScript Object Notation But not tied to Javascript Open standard for human-readable data interchange Lightweight alternative to XML Internally, docs are stored in BSON Binary JSON See http://bsonspec.org

Invoking the shell 14 $ mongo (--port=xxx, default 27017)

Inserting data (1) 15 use test db.people.insert({name:'stephane',country:'fr'}) This inserts a document In the people collection Collection is in the test database Collection is created if it did not exist

Inserting data (2) 16 The shell embeds a JS interpreter You can also use a variable to build the JSON document step by step

_id 17 Unique identifier of a doc You can set it explicitly db.people.insert({_id:'stephane',country:'fr'}) If you don't, it will be created for you "_id" : ObjectId("523ef7bf8108101415e7d1d1") Monotonically increasing on a single node Think auto_increment in MySQL

Structure of a document 18 Set of key/value pairs {'key1':'value1','key2':'value2',...} A value can be an array A value can be another document

Lab #1 19 See lab1.rst

Agenda 20 Introduction Install & First Steps CRUD Aggregation Framework Performance Tuning Replication and High Availability Sharding

Inserting a document 21 SQL INSERT INTO t (fn, ln) VALUES ('John', 'Doe') MongoDB db.t.insert({fn:'john', ln: 'Doe'}) No need to specify non-existing fields

Updating a document (1) 22 SQL UPDATE t SET ln='smith' WHERE ln='doe' MongoDB db.t.update({ln:'doe'},{$set:{ln:'smith'}}) Only 1 doc. is updated by default Specify {multi:true} for multi-doc updates db.t.update({ln:'doe'},{$set:{ln:'smith'}}, {multi:true})

Updating a document (2) 23 To add a new field No need to run ALTER TABLE! It is a regular update db.t.update({ln:'smith'},{$set:{age:30}})

Removing documents 24 SQL DELETE FROM t WHERE fn='john' MongoDB db.t.remove({fn:'john'})

Dropping a collection 25 SQL DROP TABLE t MongoDB db.t.drop()

Selecting all documents 26 SQL SELECT * FROM t MongoDB db.t.find() SQL SELECT id, name FROM t MongoDB db.t.find({},{name:1})

Specifying a WHERE clause 27 SQL SELECT * FROM t WHERE fn='john' MongoDB db.t.find({ln:'john'}) SQL SELECT id, age FROM t WHERE fn='john' AND ln='smith' MongoDB db.t.find({fn:'john',ln:'smith'},{age:1})

Sorts, limits 28 SQL SELECT country FROM t WHERE fn='john' ORDER BY age DESC LIMIT 10 MongoDB db.t.find( {fn:'john'}, {country:1, _id:0} ).sort({age:1}).limit(10)

Inequalities 29 SQL SELECT fn,ln FROM t WHERE age > 30 MongoDB db.t.find( ) {age:{$gt:30}}, {fn:1, ln:1, _id:0}

Conditions on subdocuments 30 { } _id:... fn:'john', ln:'doe' address:{ country:'us', state:'ca' } db.t.find( {address.country:'us'}, ) {fn:1, ln:1, _id:0}

Conditions on arrays 31 { } _id:... hobbies:['music', 'MongoDB', 'Python'] Exact match on array db.t.find({hobbies:['music', 'MongoDB', 'Python']}) Match on array element db.t.find({hobbies:'python'})

Counting 32 SQL SELECT COUNT(*) FROM t WHERE country='us' MongoDB db.t.count({country:'us'})

Lab #2 33 See lab2.rst file

Agenda 34 Introduction Install & First Steps CRUD Aggregation Framework Performance Tuning Replication and High Availability Sharding

Aggregation 35 Means GROUP BY, SUM(), Not possible with the operators we saw previously (exception: count) 2 ways to aggregate Map-Reduce # Not covered here Aggregation Framework

Aggregation Framework 36 Available from MongoDB 2.2 Easier and faster than Map-Reduce Some limitations AF is perfect for simple cases Map-Reduce can be needed for complex situations

10,000 feet overview 37 Documents are modified through pipelines Similar to Unix pipelines grep oo /etc/passwd sort -rn awk -F ':' '{print $1,$3,$4}' Stage 1 Stage 2 Stage 3 Input Output

First example 38 SQL SELECT fn,count(*) AS total FROM people WHERE fn >= 'M%' GROUP BY fn MongoDB db.people.aggregate({$match:{fn:{$gte:'m'}}},{$group: {_id:"$fn",total:{$sum:1}}}) You probably want some explanation!

First example, explained 39 db.people.aggregate( {$match:xxx}, {$group:xxx} ) # Pipeline 1, filtering criteria # Pipeline 2, group by Filtering condition like with find() $match:{fn:{$gte:'m'}} Specifying the grouping field $group:{_id:"$fn",...} Specifying the aggregated fields $group:{..., total:{$sum:1}}

Pipeline order matters 40 db.people.aggregate( {$match:...}, {$group:...} ) vs db.people.aggregate( {$group:...}, {$match:...} )

SQL analogy 41 SELECT FROM people WHERE GROUP BY vs SELECT FROM people GROUP BY HAVING

Clean the output 42 The projecting operator allows renaming keys/values db.people.aggregate( {$match:...}, {$group:...}, {$project:{_id:0, name:{$toupper: $_id }, total:1}} )

Final query 43 SQL SELECT UPPER(fn) AS name, count(*) AS total FROM people WHERE fn >= 'M%' GROUP BY fn MongoDB db.people.aggregate( {$match:{fn:{$gte:'m'}}}, {$group:{_id:"$fn",total:{$sum:1}}}, {$project:{_id:0, name:{$toupper: $_id }, total:1}} )

Other operators 44 $sort {$sort: {total: -1}} $limit {$limit: 10} $skip {$skip: 100} $unwind Splits a document with an array into multiple documents

Lab #3 45 See lab3.rst

Agenda 46 Introduction Install & First Steps CRUD Aggregation Framework Performance Tuning Replication and High Availability Sharding

Indexes 47 MongoDB supports indexes At the collection level Similar to indexes on RDBMS Can be used for More efficient filtering More efficient sorting Index-only queries (covering index)

Main types of indexes 48 Single-field/multiple field index db.t.ensureindex({fn:1}) db.t.ensureindex({fn:1, age:-1}) Unique index All collections have an index on _id db.t.ensureindex({username:1}, {unique:true}) Indexes on arrays, embedded fields, subdocuments are also supported

Compound indexes (1) 49 Sort order Traversal in either direction db.t.find().sort({country:1,age:-1}) Good indexes {country:1, age:-1} {country:-1, age:1} Bad index {country:1, age:1}

Compound indexes (2) 50 Prefixes Leftmost prefixes are supported db.t.ensureindex({age:1,country:1}) Good for db.t.find({age:30}) db.t.find({age:30,country:'us'}) Not good for db.t.find({country:'us'})

Lab #4 51 See lab4.rst

Agenda 52 Introduction Install & First Steps CRUD Aggregation Framework Performance Tuning Replication and High Availability Sharding

Overview 53 Replication is asynchronous All writes go to the master Secondary can accept reads A new master is elected if the current master fails An arbiter (no data) can be set up for the election process

54 Writes Primary Heartbeat Secondary Secondary

Setup of a 3 node replica set 55 Start 3 instances with --replset rsname Then on the master rsconf = { } _id: "RS", members: [{ _id: 0, host: "localhost:30001" }] rs.initiate(rsconf) rs.add("localhost:30002") rs.add("localhost:30003")

If the master crashes 56 If a heartbeat does not return within 10s, the master is considered unavailable Election of a new master starts then You can influence the choice by setting a priority (between 0 and 100)

Setting priorities 57 cfg = rs.conf() cfg.members[1].priority = 0 cfg.members[2].priority = 10 rs.reconfig(cfg) A node set to have higher priority than the master will be elected the new master

Write concerns (aka w option) 58 How many nodes should ack a write? w=1 Primary only (default setting) w=2 Primary and one secondary w=majority Majority of the nodes

Setting a write concern 59 cfg = rs.conf() cfg.settings.getlasterrordefaults = {w: "majority"} rs.reconfig(cfg)

Read preferences 60 Can be any of primary primarypreferred secondary secondarypreferred nearest Or you can use a custom tag

Lab #5 61 See lab5.rst

Agenda 62 Introduction Install & First Steps CRUD Aggregation Framework Performance Tuning Replication and High Availability Sharding

When to shard 63 When the dataset no longer fits in a single server When write activity exceeds the capacity of a single node When the working set no longer fits in memory

Architecture of a sharded cluster 64 Replica set M... mongod mongod Shard 1 Shard 2 Shard N mongod Config servers mongos Application... mongos Application

Setup (1) 65 Start the config servers mongod --configsvr --dbpath <path> --port 40001 mongod --configsvr --dbpath <path> --port 40002 mongod --configsvr --dbpath <path> --port 40003 Start a router mongos configdb localhost:40001, localhost:40002, localhost:40003 --port 31000 Connect to mongos mongo --host localhost --port 31000

Setup (2) 66 Add shards sh.addshard('rs/localhost:30001') Enable sharding for a database sh.enablesharding('test') Enable sharding for a collection sh.shardcollection('test.people',{'fn':1})

Balancing (1) 67 Data is stored in chunks sh.addshard(shard1) - 32 33 45 46 82 83 + Shard 1

Balancing (2) 68 sh.addshard(shard2) - 32 33 45 46 82 83 + Shard 1 Shard 2

Balancing (3) 69 sh.addshard(shard3) - 32 33 45 46 82 83 + Shard 1 Shard 2 Shard 3

Lab #6 70 See lab6.rst

71 Thank you for your attention! stephane.combaudon@percona.com