
MongoDB

1. Introduction

MongoDB is a document-oriented database, not a relational one. It replaces the concept of a row with a document, which makes it possible to represent complex hierarchical relationships with a single record.
There are no predefined schemas, which makes adding or removing fields easier.
Data are easier to scale out. Scaling up = getting a better machine. Scaling out = using another server and adding it to your cluster.
Missing features: joins and complex multi-row transactions.
Whenever possible, the database server offloads processing and logic to the client side.

2. Getting Started

Document: ordered set of keys with associated values. Ex: {"foo" : 3}.
MongoDB is type- and case-sensitive.
No duplicate keys.
Collection: a group of documents. Analogous to a table.
Database: a group of collections. A database has its own permissions and is stored in separate files on disk.
Namespace: a database name + a collection name = a fully qualified collection name.

db : print the currently assigned db
use test2 : switch to the db test2
db.blog.insert(post) : insert the JS variable post
db.blog.find() : retrieve the content of a collection
db.blog.findOne() : retrieve a single document from a collection
db.blog.update({field : value}, post) : replace the document matching the criteria with post
db.blog.remove({field : value}) : remove the documents matching the criteria
show dbs
show collections

3. Creating, Updating, and Deleting Documents

db.foo.insert({"bar" : "bat"})
db.foo.batchInsert([{"_id" : 0}, {"_id" : 1}, {"_id" : 2}])
db.foo.remove()
db.foo.remove({"opt-out" : true})
db.foo.drop()
Conflicting updates: the last one wins.
$inc: increment the value of a key: {"$inc" : {"pageviews" : 1}}

$set: set the value of a field.
$unset: remove a key and its value.
$push: add elements to the end of an array.
$each: modifier available for use with $addToSet and $push.
$slice
$sort
$ne: not equal.
$addToSet: add an element only if it is not already present (prevents duplicates).
$pop: remove an element from either end of an array, like a queue or a stack.
$pull: remove the elements of an array that match the given criteria.
$: positional operator.
Upsert: update or insert.
$setOnInsert: only set the value of a field when the document is being inserted.
getLastError: return info on the last operation.
findAndModify: return the item and update it in a single operation.
Unacknowledged writes: do not return any status response (for low-value data).

4. Querying

The find method is used to perform queries. Which documents get returned is determined by the first argument. Ex: db.users.find({"age" : 27}).
Second argument: the keys you want. Ex: db.users.find({}, {"mail" : 1, "_id" : 0}).
Conditionals: $lt, $lte, $gt, $gte, $in, $nin, $or, $and, $exists
Ex: db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})
Ex: db.raffle.find({"ticket_no" : {"$in" : [725, 542, 390]}})
Ex: db.raffle.find({"$or" : [{"ticket_no" : 725}, {"winner" : true}]})
Regular expressions: db.users.find({"name" : /joe/i})
Arrays: each query clause can match a different array element.
Embedded documents: use the "dot" notation to access inner fields.
Where clause: allows executing arbitrary JS (dangerous and slow).
db.foo.find({"$where" : function() { return true; /* or false */ }});
Cursors: hasNext(), next(), forEach(function(x) { })
limit(), skip(), sort({})
snapshot(): make sure each document is returned only once (slower).
DB commands: db.runCommand({"drop" : "test"});
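To make the meaning of the conditionals concrete, here is a minimal sketch in plain JavaScript that evaluates a few of them against in-memory objects. It is an illustration only, not how MongoDB evaluates queries: the helpers matches and matchesCondition and the sample users array are assumptions made up for this example, and only $lt, $lte, $gt, $gte, $in, $nin and $or are handled.

```javascript
// Illustrative only: a tiny in-memory matcher, NOT MongoDB's query engine.
// `matchesCondition` and `matches` are hypothetical helpers.
function matchesCondition(value, condition) {
  // A plain value means equality, e.g. {"age" : 27}.
  if (condition === null || typeof condition !== "object") {
    return value === condition;
  }
  // An object of operators: every operator must hold.
  return Object.entries(condition).every(([op, arg]) => {
    switch (op) {
      case "$lt": return value < arg;
      case "$lte": return value <= arg;
      case "$gt": return value > arg;
      case "$gte": return value >= arg;
      case "$in": return arg.includes(value);
      case "$nin": return !arg.includes(value);
      default: throw new Error("unsupported operator: " + op);
    }
  });
}

function matches(doc, query) {
  // All top-level clauses must match; "$or" needs at least one sub-query to match.
  return Object.entries(query).every(([key, condition]) => {
    if (key === "$or") {
      return condition.some((sub) => matches(doc, sub));
    }
    return matchesCondition(doc[key], condition);
  });
}

// Made-up sample data.
const users = [
  { name: "joe", age: 27 },
  { name: "ann", age: 17 },
  { name: "bob", age: 35 },
];

const between18And30 = users.filter((u) => matches(u, { age: { $gte: 18, $lte: 30 } }));
const joeOrOver30 = users.filter((u) =>
  matches(u, { $or: [{ name: "joe" }, { age: { $gt: 30 } }] })
);

console.log(between18And30.map((u) => u.name)); // joe
console.log(joeOrOver30.map((u) => u.name)); // joe, bob
```

Several operators on the same key are combined with a logical AND, which is why {"$gte" : 18, "$lte" : 30} selects a range.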

5. Indexing

Table scan = query that does not use an index.
Creating an index: db.people.ensureIndex({"profession" : 1});
Compound index: db.users.ensureIndex({"age" : 1, "username" : 1});
Inefficient operators: $where, $exists, $ne, $not
explain() is a tool to diagnose slow queries.
Unique indexes: db.users.ensureIndex({"username" : 1}, {"unique" : true});
Sparse indexes: indexes that need not include every document as an entry.
db.users.ensureIndex({"email" : 1}, {"unique" : true, "sparse" : true})
To retrieve indexes: db.<collection>.getIndexes()

6. Special Index and Collection Types

Capped collections: fixed-size collections that behave like circular queues. Documents cannot be removed. Useful for logging.
Tailable cursors: cursors that continue to fetch new results as documents are added to the collection.
TTL indexes: remove documents after a given timeout.
Full-text indexes: indexing a large amount of text.
Geospatial indexes: use GeoJSON, a format for encoding a variety of geographic data structures.
GridFS: mechanism for storing large binary files in MongoDB.

7. Aggregation

The aggregation framework lets you transform and combine documents in a collection.
$project: extract fields from subdocuments, rename fields, perform operations on them
$match: filter documents
$group: group documents based on some fields
$sort
$limit
$unwind: split each element of an array into a separate document
Mathematical expressions: $sum, $add, $subtract, $multiply, $divide, $mod
Date expressions: $year, $month, $week, $dayOfMonth
String expressions: $substr
Logical expressions: $cmp, $eq, $lt, $and
Control statements: $cond, $ifNull
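One way to picture the stages listed above is as successive transformations of an array of documents. The sketch below uses plain JavaScript stand-ins for $project, $group (with a $sum of 1), $sort and $limit. The sample articles array is made up, and the code only mimics the effect of each stage; it is not the aggregation framework itself.

```javascript
// Illustrative only: plain-JavaScript stand-ins for pipeline stages.
// Made-up sample data.
const articles = [
  { author: "ann", title: "A" },
  { author: "bob", title: "B" },
  { author: "ann", title: "C" },
  { author: "cy", title: "D" },
  { author: "ann", title: "E" },
  { author: "bob", title: "F" },
];

// Like $project: keep only the "author" field of each document.
const projected = articles.map((doc) => ({ author: doc.author }));

// Like $group with {"$sum" : 1}: count the documents per author.
const counts = new Map();
for (const doc of projected) {
  counts.set(doc.author, (counts.get(doc.author) || 0) + 1);
}
const grouped = Array.from(counts, ([author, count]) => ({ _id: author, count }));

// Like $sort on count (descending) followed by $limit: keep the top 2.
const topAuthors = grouped.sort((a, b) => b.count - a.count).slice(0, 2);

console.log(topAuthors); // ann with 3 articles, then bob with 2
```

Each stage consumes the output of the previous one, so the order of the stages matters.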

Ex:
> db.articles.aggregate(
    {"$project" : {"author" : 1}},
    {"$group" : {"_id" : "$author", "count" : {"$sum" : 1}}},
    {"$sort" : {"count" : -1}},
    {"$limit" : 5})

Map Reduce

Powerful and flexible tool for aggregating data. It can be easily parallelized across multiple servers. It splits up a problem, sends chunks of it to different machines, and lets each machine solve its part of the problem. When all the machines are finished, they merge all the pieces of the solution back into a full solution.
Two steps: (1) Map: maps an operation onto every document in a collection. (2) Reduce: takes the list of values and reduces it to a single element.
db.runCommand({"mapreduce" : "foo", "map" : mymap, "reduce" : myreduce});

Other commands: count, distinct, group

8. Application Design

Normalization: dividing up data into multiple collections with references between collections. MongoDB has no joining facilities, so gathering documents from multiple collections requires multiple queries. Each piece of data lives in one collection, and multiple documents may reference it.
Denormalization: embedding all of the data in a single document. Many documents may have copies of the data. Multiple documents need to be updated if the information changes, but all related data can be fetched with a single query.
Normalizing makes writes faster, denormalizing makes reads faster.
Ex: user and address: best to embed the address in the user document (faster reads) since the address does not change often.
Cardinality: one-to-one, one-to-many, many-to-many. Split "many" into "many" and "few". "Few" relationships work better with embedding, "many" with references.
If a document has only one field that grows, try to keep it as the last field in the document.
When not to use MongoDB: when you need transactions or joins across many different types of data.
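The user-and-address trade-off above can be sketched with in-memory data. The collections, field names and the findUser and findAddress helpers are hypothetical, and each helper call stands in for one query to the server.

```javascript
// Illustrative only: hypothetical data and helpers; each helper call stands in for one query.

// Normalized: the address lives in its own collection and is referenced by id.
const addresses = new Map([[1, { _id: 1, city: "Geneva", street: "1 Main St" }]]);
const usersNormalized = [{ _id: "u1", name: "joe", addressId: 1 }];

// Denormalized: the address is embedded in the user document.
const usersEmbedded = [
  { _id: "u1", name: "joe", address: { city: "Geneva", street: "1 Main St" } },
];

let lookups = 0;
function findUser(collection, id) {
  lookups += 1;
  return collection.find((user) => user._id === id);
}
function findAddress(id) {
  lookups += 1;
  return addresses.get(id);
}

// Reading through a reference takes two lookups.
lookups = 0;
const referencedUser = findUser(usersNormalized, "u1");
const cityViaReference = findAddress(referencedUser.addressId).city;
const lookupsNormalized = lookups;

// Reading the embedded form takes one.
lookups = 0;
const cityEmbedded = findUser(usersEmbedded, "u1").address.city;
const lookupsEmbedded = lookups;

console.log(cityViaReference, lookupsNormalized); // Geneva 2
console.log(cityEmbedded, lookupsEmbedded); // Geneva 1
```

The write side is the mirror image: changing the address touches one document in the normalized form, but every user document holding a copy in the embedded form.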

9. Setting Up a Replica Set

Replication is a way of keeping identical copies of the data on multiple servers. A replica set is a group of servers with one primary, the server taking client requests, and multiple secondaries, servers that keep copies of the primary's data. If the primary crashes, the secondaries can elect a new primary from among themselves.

10. Components of a Replica Set

MongoDB takes care of the replication by keeping a log of operations (oplog), containing every write that a primary performs. The secondaries query this collection for operations to replicate.

11. Connecting to a Replica Set from Your Application

From an application's point of view, a replica set behaves much like a standalone server. To ensure that writes will be persisted no matter what happens to the set, you must ensure that the write propagates to a majority of the members of the set.
We can use the getLastError command to check that writes were successful: db.runCommand({"getLastError" : 1, "w" : "majority"})
Applications that require strongly consistent reads should not read from secondaries.

12. Administration

This chapter covers replica set administration.

13. Introduction to Sharding

Sharding refers to the process of splitting data up across machines (also called partitioning). By putting a subset of data on each machine, it becomes possible to store more data and handle more load (unlike replication, which creates an exact copy of the data on multiple servers).
MongoDB supports auto-sharding, which tries to abstract the architecture away from the application and simplify the administration of such a system. MongoDB automates balancing data across shards and makes it easier to add and remove capacity.
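As a rough picture of "a subset of data on each machine", the sketch below routes documents to shards by ranges of a numeric key. The shard names, the ranges, the userId field and the shardFor helper are all assumptions for illustration. A real cluster tracks and rebalances key ranges automatically, which this sketch does not attempt.

```javascript
// Illustrative only: hypothetical shards and fixed key ranges, no balancing.
// Each range covers keys in [min, max).
const ranges = [
  { shard: "shard0", min: -Infinity, max: 100 },
  { shard: "shard1", min: 100, max: 200 },
  { shard: "shard2", min: 200, max: Infinity },
];

// Find the shard whose range contains the given numeric key.
function shardFor(key) {
  const range = ranges.find((r) => key >= r.min && key < r.max);
  return range.shard;
}

// Made-up documents, keyed by a numeric userId.
const docs = [{ userId: 42 }, { userId: 150 }, { userId: 999 }, { userId: 100 }];

// Group the keys by the shard they would be stored on.
const placement = {};
for (const doc of docs) {
  const shard = shardFor(doc.userId);
  (placement[shard] = placement[shard] || []).push(doc.userId);
}

console.log(placement); // shard0: 42; shard1: 150, 100; shard2: 999
```

Each shard holds only its own slice of the documents, whereas replication would put all four documents on every server.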

14. Configuring Sharding

Do not shard too early, or too late. Use sharding to:
Increase available RAM
Increase available disk space
Reduce load on a server
Read or write data with greater throughput than a single mongod can handle

15. Choosing a Shard Key

The most important and difficult task when using sharding is choosing how your data will be distributed. A shard key is a field used to split up the data. Three types of keys: ascending, random, location-based.

16. Sharding Administration

This chapter gives advice on performing administrative tasks on all parts of a cluster, including: inspecting the cluster's state; adding, removing, and changing members of a cluster; administering data movement and manually moving data.

17. Seeing What Your Application Is Doing

Current operation: db.currentOp()
Killing an operation: db.killOp(id)
Profiling: db.system.profile.find().pretty()
Calculating size: Object.bsonsize(db.users.findOne())
Stats: db.users.stats()...

18. Data Administration

Adding a root user: use admin; db.addUser("root", "abcd");
To enable security: --auth command-line option
To authenticate: db.auth("user", "password");...

19. Durability

Durability is the guarantee that an operation that is committed will survive permanently. Use db.foo.validate() to check a collection for corruption....

20. Starting and Stopping MongoDB

Startup options: --dbpath, --port, --config
Closing: db.adminCommand({"shutdown" : 1})

21. Monitoring MongoDB

22. Making Backups

23. Deploying MongoDB