Building Your First MongoDB Application
Ross Lawley Python Engineer @ 10gen Web developer since 1999 Passionate about open source Agile methodology email: ross@10gen.com twitter: RossC0
Today's Talk Quick introduction to NoSQL Some Background about mongodb Using mongodb, general querying and usage Building your first app, creating a location-based app Data modelling in mongodb, queries, geospatial, updates and map reduce (examples work in mongodb JS shell)
What is NoSQL? Key / Value Column Graph Document
Key-Value Stores A mapping from a key to a value The store doesn't know anything about the the key or value The store doesn't know anything about the insides of the value Operations Set, get, or delete a key-value pair
Column-Oriented Stores Like a relational store, but flipped around: all data for a column is kept together An index provides a means to get a column value for a record Operations: Get, insert, delete records; updating fields Streaming column data in and out of Hadoop
Graph Databases Stores vertex-to-vertex edges Operations: Getting and setting edges Sometimes possible to annotate vertices or edges Query languages support finding paths between vertices, subject to various constraints
Document Stores The store is a container for documents Documents are made up of named fields Fields may or may not have type definitions e.g. XSDs for XML stores, vs. schema-less JSON stores Can create "secondary indexes" These provide the ability to query on any document field(s) Operations: Insert and delete documents Update fields within documents
What is? MongoDB is a scalable, high-performance, open source NoSQL database. Document-oriented storage Full Index Support Querying Fast In-Place Updates Replication & High Availability Auto-Sharding Map/Reduce GridFS
Database Landscape scalability & performance memcached key/value RDBMS depth of functionality
History First release February 2009 v1.0 - August 2009 v1.2 - December 2009 MapReduce, ++ v1.4 - March 2010 Concurrency, Geo V1.6 - August 2010 Sharding, Replica Sets V1.8 March 2011 Journaling, Geosphere V2.0 - Sep 2011 V1 Indexes, Concurrency V2.2 - Soon - Aggregation, Concurrency
Company behind mongodb (A)GPL license, own copyrights, engineering team support, consulting, commercial license Management Google/DoubleClick, Oracle, Apple, NetApp Funding: Sequoia, Union Square, Flybridge Offices in NYC, Palo Alto, London, Dublin 100+ employees
Where can you use it? MongoDB is Implemented in C++ Platforms 32/64 bit Windows, Linux, Mac OS-X, FreeBSD, Solaris Drivers are available in many languages 10gen supported C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala Community supported Clojure, ColdFusion, F#, Go, Groovy, Lua, R... http://www.mongodb.org/display/docs/drivers
Why use mongodb? Intrinsic support for agile development Super low latency access to your data Very little CPU overhead No additional caching layer required Built in replication and horizontal scaling support
Terminology RDBMS Table Row(s) Index Join Partition Partition Key MongoDB Collection JSON Document Index Embedding & Linking Shard Shard Key
Documents Blog Post Document > p = { author: "Ross", date: new Date(), text: "About MongoDB...", tags: ["tech", "databases"]} > db.posts.save(p)
Querying > db.posts.find() { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Ross", date : ISODate("2012-02-02T11:52:27.442Z"), text : "About MongoDB...", tags : [ "tech", "databases" ] } Notes: _id is unique, but can be anything you'd like
Introducing BSON JSON has powerful, but limited set of datatypes arrays, objects, strings, numbers and null BSON is a binary representation of JSON Adds extra dataypes with Date, Int types, Id, Optimized for performance and navigational abilities And compression MongoDB sends and stores data in BSON
BSON http://bsonspec.org
Query Operators Conditional Operators - $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type - $lt, $lte, $gt, $gte // find posts with any tags > db.posts.find({tags: {$exists: true }}) // find posts matching a regular expression > db.posts.find({author: /^ro*/i }) // count posts by author > db.posts.find({author: 'Ross'}).count()
Secondary Indexes Create index on any Field in Document // 1 means ascending, -1 means descending > db.posts.ensureindex({author: 1}) > db.posts.findone({author: 'Ross'}) { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross",... }
Compound Indexes Create index on multiple fields in a Document // 1 means ascending, -1 means descending > db.posts.ensureindex({author: 1, ts: -1}) > db.posts.find({author: 'Ross'}).sort({ts: -1}) [{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Ross",...}, { _id : ObjectId("4f61d325c496820ceba84124"), author: "Ross",...}]
Examine the query plan > db.posts.find({"author": 'Ross'}).explain() { "cursor" : "BtreeCursor author_1", "nscanned" : 1, "nscannedobjects" : 1, "n" : 1, "millis" : 0, "indexbounds" : { "author" : [ [ "Ross", "Ross" ] ] } }
Atomic Operations $set, $unset, $inc, $push, $pushall, $pull, $pullall, $bit // Create a comment > new_comment = { author: "Fred", date: new Date(), text: "Best Post Ever!"} // Add to post > db.posts.update({ _id: "..." }, {"$push": {comments: new_comment}, "$inc": {comments_count: 1} });
Nested Documents { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Ross", date : "Thu Feb 02 2012 11:50:01", text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [{ author : "Fred", date : "Fri Feb 03 2012 13:23:11", text : "Best Post Ever!" }], comment_count : 1 }
Nested Documents { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Ross", date : "Thu Feb 02 2012 11:50:01", text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [{ author : "Fred", date : "Fri Feb 03 2012 13:23:11", text : "Best Post Ever!" }], comment_count : 1 }
Secondary Indexes // Index nested documents > db.posts.ensureindex("comments.author": 1) > db.posts.find({"comments.author": "Fred"}) // Index on tags (multi-key index) > db.posts.ensureindex( tags: 1) > db.posts.find( { tags: "tech" } )
Geo Geo-spatial queries Require a geo index Find points near a given point Find points within a polygon/sphere // geospatial index > db.posts.ensureindex({"author.location": "2d"}) > db.posts.find({ "author.location" : { $near : [22, 42] }})
Map Reduce The caller provides map and reduce functions written in JavaScript // Emit each tag > map = "this['tags'].foreach( function(item) {emit(item, 1);} );" // Calculate totals > reduce = "function(key, values) { var total = 0; var valuessize = values.length; for (var i=0; i < valuessize; i++) { total += parseint(values[i], 10); } return total; };
Map Reduce // run the map reduce > db.posts.mapreduce(map, reduce, {"out": { inline : 1}}); { "results" : [ {"_id" : "databases", "value" : 1}, {"_id" : "tech", "value" : 1 } ], "timemillis" : 1, "counts" : { "input" : 1, "emit" : 2, "reduce" : 0, "output" : 2 }, "ok" : 1, }
Aggregation - coming in 2.2 // Count tags > agg = db.posts.aggregate( {$unwind: "$tags"}, {$group : {_id : "$tags", count : {$sum: 1}}} ) > agg.result [{"_id": "databases", "count": 1}, {"_id": "tech", "count": 1}]
GridFS Save files in mongodb Stream data back to the client // (Python) Create a new instance of GridFS >>> fs = gridfs.gridfs(db) // Save file to mongo >>> my_image = open('my_image.jpg', 'r') >>> file_id = fs.put(my_image) // Read file >>> fs.get(file_id).read()
Building Your First MongoDB App Want to build an app where users can check in to a location Leave notes or comments about that location Iterative Approach: Decide requirements Design documents Rinse, repeat :-)
Requirements "As a user I want to be able to find other locations nearby" Need to store locations (Offices, Restaurants etc) name, address, tags coordinates user generated content e.g. tips / notes
Requirements "As a user I want to be able to 'checkin' to a location" Checkins User should be able to 'check in' to a location Want to be able to generate statistics: Recent checkins Popular locations
Locations v1 > location_1 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402" }
Locations v1 > location_1 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402" } > db.locations.save(location_1) > db.locations.find({name: "Holiday Inn"})
Locations v2 > location_2 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", } tags: ["hotel", "conference", "mongodb"]
Locations v2 > location_2 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", } tags: ["hotel", "conference", "mongodb"] > db.locations.ensureindex({tags: 1}) > db.locations.find({tags: "hotel"})
Locations v3 > location_3 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"], } lat_long: [52.5184, 13.387]
Locations v3 > location_3 = { name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"], } lat_long: [52.5184, 13.387] > db.locations.ensureindex({lat_long: "2d"}) > db.locations.find({lat_long: {$near:[52.53, 13.4]}})
Finding locations // creating your indexes: > db.locations.ensureindex({name: 1}) > db.locations.ensureindex({tags: 1}) > db.locations.ensureindex({lat_long: "2d"}) // with regular expressions: > db.locations.find({name: /^holid/i}) // by tag: > db.locations.find({tag: "hotel"}) // finding places: > db.locations.find({lat_long: {$near:[52.53, 13.4]}})
Inserting locations - adding tips // initial data load: > db.locations.insert(location_3) // adding a tip with update: > db.locations.update( {name: "Holiday Inn"}, {$push: { tips: { user: "Ross", date: ISODate("05-04-2012"), tip: "Check out my replication talk later!"} }})
Tips added > db.locations.findone() { _id : ObjectId("5c4ba5c0672c685e5e8aabf3"), name: "Holiday Inn", address: "Engelhardsgasse 12", city: "Nürnberg ", post_code: "D-90402", tags: ["hotel", "conference", "mongodb"], lat_long: [52.5184, 13.387], } tips:[{ user: "Ross", date: ISODate("05-04-2012"), tip: "Check out my replication talk later!" }]
Requirements "As a user I want to be able to find other locations nearby" Need to store locations (Offices, Restaurants etc) name, address, tags coordinates user generated content e.g. tips / notes "As a user I want to be able to 'checkin' to a location" Checkins User should be able to 'check in' to a location Want to be able to generate statistics: Recent checkins Popular locations
Users and Checkins > user_1 = { _id: "ross@10gen.com", name: "Ross", twitter: "RossC0", checkins: [ {location: "Holiday Inn", ts: "25/04/2012"}, {location: "Munich Airport", ts: "24/04/2012"} ] }
Simple Stats // find all users who've checked in here: > db.users.find({"checkins.location":"holiday Inn"})
Simple Stats // find all users who've checked in here: > db.users.find({"checkins.location":"holiday Inn"}) // Can't find the last 10 checkins easily > db.users.find({"checkins.location":"holiday Inn"}).sort({"checkins.ts": -1}).limit(10) Schema hard to query for stats.
User and Checkins v2 > user_2 = { _id: "ross@10gen.com", name: "Ross", twitter: "RossC0", } > checkin_1 = { location: location_id, user: user_id, ts: ISODate("05-04-2012") } > db.checkins.find({user: user_id})
Simple Stats // find all users who've checked in here: > loc = db.locations.findone({"name":"holiday Inn"}) > checkins = db.checkins.find({location: loc._id}) > u_ids = checkins.map(function(c){return c.user}) > users = db.users.find({_id: {$in: u_ids}}) // find the last 10 checkins here: > db.checkins.find({location: loc._id}).sort({ts: -1}).limit(10) // count how many checked in today: > db.checkins.find({location: loc._id, ts: {$gt: midnight}} ).count()
Map Reduce // Find most popular locations > map_func = function() { emit(this.location, 1); } > reduce_func = function(key, values) { return Array.sum(values); } > db.checkins.mapreduce(map_func, reduce_func, {query: {ts: {$gt: now_minus_3_hrs}}, out: "result"}) > db.result.findone() {"_id": "Holiday Inn", "value" : 99}
Aggregation // Find most popular locations > agg = db.checkins.aggregate( {$match: {ts: {$gt: now_minus_3_hrs}}}, {$group: {_id: "$location", value: {$sum: 1}}} ) > agg.result [{"_id": "Holiday Inn", "value" : 17}]
Deployment Single server - need a strong backup plan P
Deployment Single server - need a strong backup plan Replica sets - High availability - Automatic failover P P S S
Deployment Single server - need a strong backup plan Replica sets - High availability - Automatic failover P P S S Sharded - Horizontally scale - Auto balancing P S S P S S
MongoDB Use Cases Archiving Content Management Ecommerce Finance Gaming Government Metadata Storage News & Media Online Advertising Online Collaboration Real-time stats/analytics Social Networks Telecommunications
Any Questions?
download at mongodb.org conferences, appearances, and meetups http://www.10gen.com/events Facebook Twitter LinkedIn http://bit.ly/mongofb @mongodb http://linkd.in/joinmongo support, training, and this talk brought to you by