Humongous MongoDB. Sean Corfield World Singles llc



Similar documents
The MongoDB Tutorial Introduction for MySQL Users. Stephane Combaudon April 1st, 2014

Scaling up = getting a better machine. Scaling out = use another server and add it to your cluster.

In Memory Accelerator for MongoDB

MongoDB Developer and Administrator Certification Course Agenda

MongoDB and Couchbase

MongoDB: document-oriented database

Sharding and MongoDB. Release MongoDB, Inc.

Sharding and MongoDB. Release MongoDB, Inc.

Big Data & Data Science Course Example using MapReduce. Presented by Juan C. Vega

Getting Started with MongoDB

Big data and urban mobility

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

MongoDB. The Definitive Guide to. The NoSQL Database for Cloud and Desktop Computing. Apress8. Eelco Plugge, Peter Membrey and Tim Hawkins

Distributed Databases

MongoDB Aggregation and Data Processing Release 3.0.4

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

MONGODB - THE NOSQL DATABASE

Scaling with MongoDB. by Michael Schurter Scaling with MongoDB by Michael Schurter - OS Bridge,

Data Management in the Cloud

MongoDB Aggregation and Data Processing

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

a year with a talk by Armin '@mitsuhiko' Ronacher for PyGrunn 2013

Department of Software Systems. Presenter: Saira Shaheen, Dated:

Python and MongoDB. Why?

Integrating VoltDB with Hadoop

Hadoop IST 734 SS CHUNG

HDFS Users Guide. Table of contents

NoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011

ArcGIS for Server Deployment Scenarios An ArcGIS Server s architecture tour

Pertino HA Cluster Deployment: Enabling a Multi- Tier Web Application Using Amazon EC2 and Google CE. A Pertino Deployment Guide

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Introduction to NoSQL and MongoDB. Kathleen Durant Lesson 20 CS 3200 Northeastern University

A programming model in Cloud: MapReduce

MongoDB Security Guide

FioranoMQ 9. High Availability Guide

Mongo: some travel note Notes taken by Dario Varotto about MongoDB while doing mongo101p, mongo102 and mongo101js courses by MongoDB university.

Document Oriented Database

Workflow performance analysis tests

MongoDB Administration

NoSQL web apps. w/ MongoDB, Node.js, AngularJS. Dr. Gerd Jungbluth, NoSQL UG Cologne,

Can the Elephants Handle the NoSQL Onslaught?

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

MongoDB Aggregation and Data Processing

Building Your First MongoDB Application

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

FILECLOUD HIGH AVAILABILITY

AOL CUSTOMER SUCCESS STORY

MapReduce Job Processing

Introduction to Big Data Training

Introduction to HDFS. Prasanth Kothuri, CERN

Hadoop and Map-Reduce. Swati Gore

MongoDB. Or how I learned to stop worrying and love the database. Mathias Stearn. N*SQL Berlin October 22th, gen

Getting Started with SandStorm NoSQL Benchmark

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

Solving Large-Scale Database Administration with Tungsten

NoSQL Databases. Nikos Parlavantzas

SURVEY ON MONGODB: AN OPEN- SOURCE DOCUMENT DATABASE

Spark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1

OBIEE 11g Scaleout & Clustering

MarkLogic Server. Database Replication Guide. MarkLogic 8 February, Copyright 2015 MarkLogic Corporation. All rights reserved.

NoSQL Database Options

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Evaluation of NoSQL databases for large-scale decentralized microblogging

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Intellicus Enterprise Reporting and BI Platform

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, Anthony Yeh, Software Engineer, YouTube.

Perl & NoSQL Focus on MongoDB. Jean-Marie Gouarné jmgdoc@cpan.org

Deploying Hadoop with Manager

SQL Server AlwaysOn. Michal Tinthofer 11. Praha What to avoid and how to optimize, deploy and operate.

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

RDBMS vs NoSQL: Performance and Scaling Comparison

Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc.

Apache HBase: the Hadoop Database

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

Comparing SQL and NOSQL databases

Set Up Hortonworks Hadoop with SQL Anywhere

MarkLogic Server. MarkLogic Connector for Hadoop Developer s Guide. MarkLogic 8 February, 2015

Please ask questions! Have people used non-relational dbs before? MongoDB?

Mesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II)

Database Scalability {Patterns} / Robert Treat

Frictionless Persistence in.net with MongoDB. Mogens Heller Grabe Trifork

Couchbase Server Under the Hood

Benchmarking Top NoSQL Databases Apache Cassandra, Couchbase, HBase, and MongoDB Originally Published: April 13, 2015 Revised: May 27, 2015

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

SCF/FEF Evaluation of Nagios and Zabbix Monitoring Systems. Ed Simmonds and Jason Harrington 7/20/2009

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

HDFS Cluster Installation Automation for TupleWare

Log management with Logstash and Elasticsearch. Matteo Dessalvi

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

Common Server Setups For Your Web Application - Part II

HO5604 Deploying MongoDB. A Scalable, Distributed Database with SUSE Cloud. Alejandro Bonilla. Sales Engineer abonilla@suse.com

ORACLE NOSQL DATABASE HANDS-ON WORKSHOP Cluster Deployment and Management

L7_L10. MongoDB. Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD.

Big Data Management. Big Data Management. (BDM) Autumn Povl Koch September 16,

16.1 MAPREDUCE. For personal use only, not for distribution. 333

Study and Comparison of Elastic Cloud Databases : Myth or Reality?

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Turn Big Data to Small Data

Certified MongoDB Professional VS-1058

Transcription:

Humongous MongoDB Sean Corfield World Singles llc 1

Agenda Scaling MongoDB - Concepts Replica Sets & Sharding Read Preference, Write Concern, Etc Map/Reduce Aggregation 2

You Might Prefer... Queries and Looping and Grouping Enterprise JavaScript Workflows Using PhoneGap to Build Mobile Apps Deep Dive - two hours, halfway thru! Caching in ColdFusion 10 3

Me Functional Programming in the 80's Object-Oriented Programming in the 90's Web / Dynamic Programming in the 00's Mostly Clojure today 4

Me & MongoDB Using MongoDB in production (2 years) Took 10gen "MongoDB for Developers" course (Python + MongoDB) Lead maintainer for Clojure's MongoDB wrapper "CongoMongo" 5

Agenda Scaling MongoDB - Concepts Replica Sets & Sharding Read Preference, Write Concern, Etc Map/Reduce Aggregation 6

Scaling Concepts Master / slave for traditional DBs One master One or more slaves Usually scale "up" not "out" 7

Scaling Concepts MongoDB replaces "master / slave" with "replica set" for failover & load distribution MongoDB adds sharded clusters to support very large data sets - horizontal scale out 8

Replica Set Concepts A replica set contains any number of (mostly) identical nodes A subset of these vote to elect a primary All remaining nodes are then secondaries Writes go to the primary node and replicate to the secondary nodes 9

Replica Set Concepts Reads generally go to the primary node but can be performed against secondaries Per connection or per operation Can also be targeted to specific nodes Tagged of nodes and reads 10

Replica Set Concepts Nodes may also be Secondary only Hidden Arbiter Non-voting Delayed 11

Sharding Concepts Multiple "shards" Servers or clusters (replica sets) Configuration server (or a cluster) Shard server proxy (mongos process) Lightweight, can have one per app server 12

app app app mongos mongos mongos config config shard primary shard primary config shard primary shard sec'dary shard sec'dary shard sec'dary shard sec'dary shard sec'dary shard sec'dary 13

Sharding Concepts A collection is split across the shards Automatic, based on a key column Automatically balanced across shards Reads directed to appropriate shards May run on all shards, then aggregate 14

Agenda Scaling MongoDB - Concepts Replica Sets & Sharding Read Preference, Write Concern, Etc Map/Reduce Aggregation 15

Replica Set Setup Start MongoDB servers as replica nodes add --replset {rsname} specify unique ports and --dbpath folders! 16

Replica Set Setup Connect via mongo shell to one server initiate the replica set add the other servers to that 17

Replica Sets Either: conf = { _id: "rsname", members: [ { _id: 0, rs.initiate( conf ) host: "server-x:27017" } ] } 18

Replica Set Setup Or: rs.initiate() Creates default rs.conf() Now add the others: rs.add( "server-y:27017" ) rs.add( "server-z:27017" ) 19

Replica Set Demo Let's see a real replica set running locally! Must use local machine name Must use different ports for each instance 20

Sharding Setup Start config server(s) mongod --configsvr --dbpath /data/cfg Start mongos process(es) mongos --configdb server-c:27019 Start server (or replica set) for each shard mongod --dbpath /data/sh1 21

Sharding Setup Add each shard to configuration sh.addshard("server-s:27100") Enable sharding for the database sh.enablesharding("mydb") Enable sharding for a collection sh.shardcollection("mydb.coll",{thekey:1}) 22

Sharding In Use Connect to the mongos server (or cluster) Interact with the database as usual Data automatically moved between shards Reads automatically routed based on key 23

Sharding In Use If a query includes the shard key, it will be routed directly to the appropriate server If a query does not include the shard key (or uses a range), the query will be sent to multiple servers and the results merged in the mongos process 24

Sharding Demo Let's see it running! 25

Agenda Scaling MongoDB - Concepts Replica Sets & Sharding Read Preference, Write Concern, Etc Map/Reduce Aggregation 26

Read Preference Normally all reads go to the primary Just like writes As we saw, you cannot normally read from a secondary unless you explicitly allow it Sometimes you want to spread the read load or read from nearby servers (in a geographically distributed cluster) 27

Read Preference Available options: primary - default primarypreferred secondary secondarypreferred nearest 28

Read Preference Secondaries can return stale data! Can specify preference Per connection Per collection Per operation Not currently supported by cfmongodb 29

Write Concern By default, receipt of a write command is acknowledged by the server (but may not yet have been written to disk) You can control what operations you wait for before a write command returns 30

Write Concern Errors Ignored - do not use! Unacknowledged Fire and forget Network errors are detected Used to be the default 31

Write Concern Acknowledged Current default (as of late 2012) Network errors, duplicate keys etc Journaled The update is written to local journal Durable - will survive shutdown / crash 32

Write Concern Replica Acknowledged The update is written to one or more secondaries in a replica set Can specify number or "majority" Specifying number will block until that many secondaries have the write (and therefore it can block "forever"!) 33

Write Concern cfmongodb supports per-connection only Not per-operation mongoclientoptions arg to MongoConfig writeconcern field of that struct 34

WriteConcern Retrieve from a Java class mongofactory.getobject( "com.mongodb.writeconcern" ).UNACKNOWLEDGED http://api.mongodb.org/java/2.10.1/com/ mongodb/writeconcern.html 35

Agenda Scaling MongoDB - Concepts Replica Sets & Sharding Read Preference, Write Concern, Etc Map/Reduce Aggregation 36

Map/Reduce Intended for complex data processing Batch operation, not real time! You provide map, reduce, finalize functions written in JavaScript (as strings!) 37

Map/Reduce people = mongo.getdbcollection("people"); people.mapreduce( map = "function(){...}", reduce = "function(key,values){... }", outputtarget =... ); For finalize you must use a DB command Examples in cfmongodb aggregation folder 38

Agenda Scaling MongoDB - Concepts Replica Sets & Sharding Read Preference, Write Concern, Etc Map/Reduce Aggregation 39

Aggregation Framework Added in MongoDB 2.2 Native, pipeline-based functions project (SELECT), match (WHERE), group (GROUP BY), sort (ORDER BY), unwind, skip, limit, geonear (new in 2.4) Simple aggregate() function takes each operation as an argument in order 40

Aggregation Framework result = musicians.aggregate( { "$group" : { "_id" : "$status", ); "total" : { "$sum" = 1 } } }, { "$project" : { "status" : "$_id", "numberofmusicians" : "$total", "_id" : 0 } } 41

Aggregation Framework Equivalent to SELECT COUNT(*) AS total, status FROM musicians GROUP BY status More examples in cfmongodb aggregation folder 42

Summary MongoDB Supports simple, robust clustering with automatic failover Supports data sharding to provide automatic horizontal scalability Provides plenty of control over reading and writing in a clustered environment 43

Summary cfmongodb supports Robust, scalable clustering Big Data manipulation through map/ reduce and the aggregation framework Write Concern (partially) It's open source - contribute! 44

Resources http://cfmongodb.riaforge.org http://mongodb.org http://docs.mongodb.org http://www.10gen.com 45

Q & A? @seancorfield http://corfield.org sean@corfield.org 46