.NET User Group Bern Roger Rudin bbv Software Services AG roger.rudin@bbv.ch
Agenda: What is NoSQL; Understanding the Motivation behind NoSQL; MongoDB: A Document Oriented Database; NoSQL Use Cases
What is NoSQL? NoSQL = Not only SQL
NoSQL Definition (http://nosql-database.org/): Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data, and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.
Who Uses NoSQL? Twitter uses FlockDB/MySQL and Cassandra; Cassandra is an open source project from Facebook; Digg and Reddit use Cassandra; bit.ly, foursquare, SourceForge, and the New York Times use MongoDB; Adobe, Alibaba, and eBay use Hadoop
UNDERSTANDING THE MOTIVATION BEHIND NOSQL
Why SQL sucks: O/R mapping is needed to bridge the object-relational impedance mismatch; data-model changes are hard and expensive; SQL databases are designed for high throughput, not low latency; SQL databases do not scale out well; Microsoft, Oracle, and IBM charge big bucks for databases, and then you need to hire a database admin. Take it from the context of Google, Twitter, Facebook and Amazon: your databases are among the biggest in the world and nobody pays you for that feature. Wasting profit!
What has NoSQL done? Implemented the most common use cases as a piece of software; designed for scalability and performance
Visual Guide To NoSQL http://blog.nahurst.com/visual-guide-to-nosql-systems
NoSQL Data Models: Key-Value; Document-Oriented; Column-Oriented/Tabular
MONGODB: A DOCUMENT ORIENTED DATABASE
NoSQL Data Model: Document Oriented. Data is stored as documents (we are not talking about Word documents); comparable to Aggregates in DDD; mostly schema-free structured data; can be queried; is easily mapped to OO systems (Domain Model, DDD); no joins: relations between documents have to be resolved in application code
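A minimal sketch of what "document as aggregate" can look like, using plain in-memory JavaScript objects rather than a MongoDB driver. The blog-post data and the helper names `findByTag` and `commentAuthors` are hypothetical and only serve the illustration.

```javascript
// Hypothetical blog posts, each stored as ONE document (an aggregate):
// the comments are embedded instead of living in a separate table.
const posts = [
  {
    _id: 1,
    title: "Hello NoSQL",
    tags: ["nosql", "mongodb"],
    comments: [
      { author: "anna", text: "Nice intro" },
      { author: "ben", text: "More examples please" },
    ],
  },
  {
    _id: 2,
    title: "Schema free",
    tags: ["nosql"],
    // No comments field at all: documents in one collection may differ.
  },
];

// "Query" the collection in memory: all posts that carry a given tag.
function findByTag(collection, tag) {
  return collection.filter((doc) => (doc.tags || []).includes(tag));
}

// Reading the whole aggregate needs no join: comments travel with the post.
function commentAuthors(post) {
  return (post.comments || []).map((comment) => comment.author);
}
```

The second post simply omits `comments`, which is what "mostly schema free" means in practice: the reading code has to tolerate missing fields.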
Network Communications: REST/JSON; TCP/BSON (client driver). BSON [bee sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.
Client Drivers (Apache License) MongoDB currently has client support for the following programming languages: C, C++, Erlang, Haskell, Java, JavaScript, .NET (C#, F#, PowerShell, etc.), Perl, PHP, Python, Ruby, Scala
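Why the extra types matter can be seen with plain JSON alone. The sketch below uses only the standard `JSON` object, no BSON library; the document and its field names are made up for the illustration.

```javascript
// A document with a date and some raw bytes, two kinds of value
// that the JSON spec has no type for.
const original = {
  name: "report",
  created: new Date(0), // 1970-01-01T00:00:00.000Z
  payload: new Uint8Array([1, 2, 3]),
};

// Round trip through plain JSON text.
const roundTripped = JSON.parse(JSON.stringify(original));

// The date comes back as a string, not as a Date.
const createdIsStillDate = roundTripped.created instanceof Date;
const createdType = typeof roundTripped.created;

// The bytes come back as a plain object with index keys, not as binary data.
const payloadIsStillBytes = roundTripped.payload instanceof Uint8Array;
```

After the round trip `createdIsStillDate` and `payloadIsStillBytes` are both false. A format with native Date and BinData types avoids this loss.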
Collections vs. Capped Collections
Collections (comparable to tables in SQL): blog.posts, blog.comments, forum.users, etc.
Capped collections (fixed-size ring buffers): logging, caching, archiving
db.createCollection("log", {capped: true, size: <bytes>, max: <docs>});
Indexes: Every field in the document can be indexed
Simple indexes: db.cities.ensureIndex({city: 1});
Compound indexes: db.cities.ensureIndex({city: 1, zip: 1});
Unique indexes: db.cities.ensureIndex({city: 1, zip: 1}, {unique: true});
Sort order: 1 = ascending, -1 = descending
Relations
ObjectId: db.users.insert({name: "Umbert", car_id: ObjectId("<GUID>")});
DBRef: db.users.insert({name: "Umbert", car: new DBRef("cars", ObjectId("<GUID>"))});
db.users.findOne({name: "Umbert"}).car.fetch().name;
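The ring-buffer behaviour of a capped collection can be sketched in memory. `CappedLog` below is a hypothetical class written for this illustration, not part of MongoDB or any driver, and it caps only by document count (the `max` option), not by bytes.

```javascript
// In-memory stand-in for a capped collection: fixed number of slots,
// and once full, every insert overwrites the oldest document.
class CappedLog {
  constructor(maxDocs) {
    if (!Number.isInteger(maxDocs) || maxDocs < 1) {
      throw new RangeError("maxDocs must be a positive integer");
    }
    this.maxDocs = maxDocs;
    this.slots = new Array(maxDocs);
    this.next = 0; // slot that the next insert writes to
    this.count = 0; // number of documents currently stored
  }

  insert(doc) {
    this.slots[this.next] = doc;
    this.next = (this.next + 1) % this.maxDocs;
    this.count = Math.min(this.count + 1, this.maxDocs);
  }

  // Returns the stored documents in insertion order, oldest first.
  find() {
    const start = this.count < this.maxDocs ? 0 : this.next;
    const result = [];
    for (let i = 0; i < this.count; i++) {
      result.push(this.slots[(start + i) % this.maxDocs]);
    }
    return result;
  }
}
```

With `new CappedLog(3)` and five inserts, `find()` returns only the last three documents. That is why the structure suits logging and caching: old entries age out without an explicit delete.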
Queries (1)
Queries (Regular Expressions)
{field: /regular.*expression/i}
// get all cities that start with atl and end with a (e.g. atlanta)
db.cities.count({city: /^atl.*a$/i});
Queries (2): LINQ https://github.com/craiggwilson/fluent-mongo
Equals: x => x.Age == 21 will translate to {"Age": 21}
Greater Than, $gt: x => x.Age > 18 will translate to {"Age": {$gt: 18}}
Greater Than Or Equal, $gte: x => x.Age >= 18 will translate to {"Age": {$gte: 18}}
Less Than, $lt: x => x.Age < 18 will translate to {"Age": {$lt: 18}}
Less Than Or Equal, $lte: x => x.Age <= 18 will translate to {"Age": {$lte: 18}}
Not Equal, $ne: x => x.Age != 18 will translate to {"Age": {$ne: 18}}
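Whether the pattern is anchored decides if "starts with atl and ends with a" is really what gets matched. A small sketch with plain JavaScript regular expressions; the city list is made up for the illustration.

```javascript
// Hypothetical sample data.
const cities = ["Atlanta", "Atlantic City", "Matlacha", "Atlantis", "Manhattan"];

// Unanchored: "atl" may appear anywhere, followed later by some "a".
const unanchored = /atl.*a/i;

// Anchored: the value must start with "atl" and end with "a".
const anchored = /^atl.*a$/i;

const looseMatches = cities.filter((city) => unanchored.test(city));
const strictMatches = cities.filter((city) => anchored.test(city));
```

Here `looseMatches` contains Atlanta, Atlantic City, Matlacha and Atlantis, while `strictMatches` contains only Atlanta.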
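What the translated query documents mean can be sketched as a small in-memory matcher. This is an illustration only: `matches` and the sample data are hypothetical, it covers just the operators listed on the slide, and real MongoDB matching has more rules (missing fields, arrays, nested documents).

```javascript
// The comparison operators from the slide, as plain predicates.
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $ne: (value, arg) => value !== arg,
};

// True if the document satisfies every condition in the query document.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    const isOperatorObject = condition !== null && typeof condition === "object";
    if (!isOperatorObject) {
      return value === condition; // plain equality, e.g. {"Age": 21}
    }
    return Object.entries(condition).every(([op, arg]) => {
      if (!Object.prototype.hasOwnProperty.call(operators, op)) {
        throw new Error("unsupported operator: " + op);
      }
      return operators[op](value, arg);
    });
  });
}

// Hypothetical data: the filter plays the role of x => x.Age >= 18.
const people = [
  { Name: "Ann", Age: 17 },
  { Name: "Bob", Age: 18 },
  { Name: "Cid", Age: 21 },
];
const adults = people.filter((person) => matches(person, { Age: { $gte: 18 } }));
```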
Atomic Operations (Optimistic Locking) Update if current: Fetch the object. Modify the object locally. Send an update request that says "update the object to this new value if it still matches its old value".
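The three update-if-current steps can be sketched against an in-memory store. The `store` map and the helpers `fetchDoc` and `updateIfCurrent` are hypothetical stand-ins, not a MongoDB driver API. The comparison via `JSON.stringify` assumes flat documents with a stable key order.

```javascript
// In-memory stand-in for a collection, keyed by _id.
const store = new Map([["abc", { _id: "abc", qty: 5 }]]);

// Step 1: fetch a private copy of the object.
function fetchDoc(id) {
  const doc = store.get(id);
  return doc ? { ...doc } : undefined;
}

// Step 3: write the new value only if the stored document still
// equals the version the caller fetched. Returns true on success.
function updateIfCurrent(id, expected, replacement) {
  const current = store.get(id);
  const unchanged =
    current !== undefined &&
    JSON.stringify(current) === JSON.stringify(expected);
  if (!unchanged) return false;
  store.set(id, { ...replacement });
  return true;
}

// One round: fetch, modify locally (step 2), then try to write.
const mine = fetchDoc("abc");
const changed = { ...mine, qty: mine.qty - 1 };

// Meanwhile another client changes the same document.
store.set("abc", { _id: "abc", qty: 1 });

const firstAttempt = updateIfCurrent("abc", mine, changed); // stale, rejected

// Retry with a fresh copy.
const fresh = fetchDoc("abc");
const secondAttempt = updateIfCurrent("abc", fresh, { ...fresh, qty: fresh.qty - 1 });
```

No lock is held between fetch and update. A conflicting write is detected at update time and the caller retries, which is the "optimistic" part.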
Atomic Operations: Sample
> t = db.inventory
> s = t.findOne({sku: 'abc'})
{"_id" : "49df4d3c9664d32c73ea865a", "sku" : "abc", "qty" : 1}
> t.update({sku: "abc", qty: {$gt: 0}}, {$inc: {qty: -1}});
> db.$cmd.findOne({getlasterror: 1})
{"err" : null, "updatedExisting" : true, "n" : 1, "ok" : 1} // it has worked
> t.update({sku: "abcz", qty: {$gt: 0}}, {$inc: {qty: -1}});
> db.$cmd.findOne({getlasterror: 1})
{"err" : null, "updatedExisting" : false, "n" : 0, "ok" : 1} // did not work
Atomic Operations: multiple items
db.products.update(
  {cat: "boots", $atomic: 1},
  {$inc: {price: 10.0}},
  false, // no upsert
  true   // update multiple
);
Replica set (1): Automatic failover; automatic recovery of servers that were offline; distribution over more than one datacenter; automatic nomination of a new master server in case of a failure; up to 7 servers in one replica set
Replica set (2) [Diagram: replica set with member states PRIMARY, RECOVERING, PRIMARY, DOWN]
Mongo Sharding: Partitioning data across multiple physical servers to provide application scale-out; can distribute databases, collections or objects in a collection; you choose how to partition the data (shard key); balancing, migrations, and management are all automatic; range based; can convert from a single master to a sharded system with zero downtime; often works in conjunction with object replication (failover)
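Range-based partitioning by shard key can be sketched as a lookup in a chunk table. The table, the shard names and `shardFor` are hypothetical, and the sketch assumes lowercase string keys. In MongoDB the routing and rebalancing are done by the cluster itself.

```javascript
// Hypothetical chunk table: each shard owns the keys in [min, max).
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Route a document to a shard by comparing its shard key value
// against the key ranges.
function shardFor(shardKeyValue) {
  const chunk = chunks.find(
    (c) => shardKeyValue >= c.min && shardKeyValue < c.max
  );
  if (!chunk) {
    throw new RangeError("no chunk covers key: " + shardKeyValue);
  }
  return chunk.shard;
}
```

With this table, `shardFor("bern")` routes to shard0 and `shardFor("zurich")` to shard2. Because neighbouring keys land on the same shard, range queries on the shard key touch few servers, while a poorly chosen key can concentrate writes on one shard.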
Sharding-Cluster
Map Reduce http://www.joelonsoftware.com/items/2006/08/01.html It is a two-step calculation: the first step (map) simplifies the data, and the second step (reduce) summarizes it
Map Reduce Sample
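A minimal in-memory sketch of the two steps, in plain JavaScript. The data and the helper `mapReduce` are hypothetical. MongoDB's own map-reduce differs in detail: its map function calls `emit()` and its reduce function may be invoked more than once per key.

```javascript
// Hypothetical input documents.
const members = [
  { lastName: "Meier", age: 30 },
  { lastName: "Muster", age: 40 },
  { lastName: "Rudin", age: 25 },
];

// Map step: simplify each document to a (key, value) pair.
function mapMember(doc) {
  return [doc.lastName[0], doc.age];
}

// Reduce step: summarize all values that share a key.
function sumAges(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Tiny single-pass driver: group the mapped pairs by key, then reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    const [key, value] = mapFn(doc);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

// Sum of ages per first letter of the last name.
const ageSumByInitial = mapReduce(members, mapMember, sumAges);
```

For the sample data `ageSumByInitial` is `{ M: 70, R: 25 }`.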
Map Reduce using LINQ https://github.com/craiggwilson/fluent-mongo/wiki/map-reduce
LINQ is by far an easier way to compose map-reduce functions.
// Compose a map reduce to get the sum of everyone's ages.
var sum = collection.AsQueryable().Sum(x => x.Age);
// Compose a map reduce to get the age range of everyone grouped by the first letter of their last name.
var ageRanges = from p in collection.AsQueryable()
                group p by p.LastName[0] into g
                select new
                {
                    FirstLetter = g.Key,
                    AverageAge = g.Average(x => x.Age),
                    MinAge = g.Min(x => x.Age),
                    MaxAge = g.Max(x => x.Age)
                };
Store large Files: GridFS. The database supports native storage of binary data within BSON objects (limited in size to 4-16 MB). GridFS is a specification for storing large files in MongoDB. Comparable to the Amazon S3 online storage service when used in combination with replication and sharding
Performance: On MySQL, SourceForge was reaching its limits of performance at its current user load. Using some of the easy scale-out options in MongoDB, they fully replaced MySQL and found MongoDB could handle the current user load easily. In fact, after some testing, they found their site can now handle 100 times the number of users it currently supports. It means you can charge a lot less per user of your application and get the same revenue. Think about it.
Performance http://www.michaelckennedy.net/blog/2010/04/29/mongodbvssqlserver2008performanceshowdown.aspx It's the inserts where the differences between MongoDB and SQL Server are most obvious (MongoDB is about 30x-50x faster than SQL Server)
Administration: MongoVUE (Windows)
Administration: Monitoring MongoDB Monitoring Service
NOSQL USE CASES
Use Cases: Well Suited: Archiving and event logging; document and content management systems; e-commerce; gaming (high-performance small reads/writes, geospatial indexes); high-volume problems; mobile (specifically, the server-side infrastructure of mobile systems); projects using iterative/agile development methodologies; real-time stats/analytics
Use Cases: Less Well Suited: Systems with a heavy emphasis on complex transactions such as banking systems and accounting (multi-object transactions); traditional non-realtime data warehousing; problems requiring SQL
Questions? roger.rudin@bbv.ch