Big Data & Data Science Course Example using MapReduce Presented by
Agenda
- What is Mongo?
- Why Mongo?
- Mongo Model
- Mongo Deployment
- Mongo Query Language
- Built-In MapReduce
- Demo
- Q & A
Founders Max Schireson Eliot Horowitz
What is Mongo?
What is Mongo?
Document Oriented
- Objects map to programming language types
- Embedded documents & arrays reduce the need for joins
- No joins and no multi-document transactions (increases performance)
High Performance
- No joins, plus embedding, make reads and writes fast
- Indexes, including indexes on keys in embedded documents
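To make the embedding point concrete, here is a minimal sketch of a document with an embedded document and embedded arrays. The blog-post shape and every field name in it are hypothetical, chosen only for illustration; the point is that one read returns the whole object and nested values are reached with ordinary property access.

```javascript
// Hypothetical blog-post document: the author and comments are embedded,
// so a single read returns everything needed to render the post.
const post = {
  _id: 1,
  title: "Why documents?",
  author: { name: "Ann", email: "ann@example.com" }, // embedded document
  tags: ["mongodb", "modeling"],                      // embedded array
  comments: [                                         // array of embedded documents
    { by: "Bob", text: "Nice overview." },
    { by: "Cy", text: "What about joins?" }
  ]
};

// Nested values are reached with plain property access, no join required.
const authorName = post.author.name;
const commenters = post.comments.map(function (c) { return c.by; });

console.log(authorName, commenters);
```

In a relational design the author and comments would typically live in separate tables and be joined at read time.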
What is Mongo?
High Availability
- Replicated servers with automatic master failover
Easy Scalability
- Auto-sharding (data partitioning across servers)
- Eventually consistent reads distributed over replicated servers
Why Mongo?
Why Mongo?
MongoDB focuses on four main things:
- Flexibility
- Power
- Speed
- Ease of Use
Why Mongo? Flexibility
- Data stored in JSON documents (serialized to BSON)
- Schema-less
- Maps to native programming language types
- The design is not governed by ERDs (as it is in an RDBMS)
Why Mongo? Power
- Supports secondary indexes
- Dynamic queries
- Sorting
- Rich updates
- Easy aggregations
- Upserts: update if the document exists, insert if it doesn't
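The upsert rule can be sketched without a server. The snippet below is an in-memory illustration only: `collection` is a plain `Map` and `upsert` is a hypothetical helper, not a MongoDB API. In MongoDB itself the same behavior is requested by issuing an update with the upsert option enabled.

```javascript
// In-memory stand-in for a collection, keyed by _id (illustration only).
const collection = new Map();

// Hypothetical helper mimicking upsert semantics:
// update the document if it exists, insert it if it doesn't.
function upsert(id, changes) {
  const existing = collection.get(id);
  if (existing) {
    Object.assign(existing, changes); // document found: apply the update
    return "updated";
  }
  collection.set(id, Object.assign({ _id: id }, changes)); // not found: insert
  return "inserted";
}

const first = upsert("hamburger", { views: 1 });
const second = upsert("hamburger", { views: 2 });

console.log(first, second, collection.get("hamburger"));
```

The caller never has to check for existence first, which is the convenience the slide is pointing at.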
Why Mongo? Speed/Scaling
- Related data is kept together in documents
- No need to join multiple tables
- Auto-sharding allows clusters to scale linearly
- Capacity can be increased with no downtime
Why Mongo? Ease of Use
- Very easy to install, configure, maintain, and use
- Very few configuration options
- Works right out of the box
- No need for fine-tuning obscure database configurations
Mongo Model
Mongo Model: Design Philosophy
- New database technologies are needed to facilitate horizontal scaling of the data layer, easier development, and the ability to store order(s) of magnitude more data than was used in the past.
- A non-relational approach is the best path to database solutions which scale horizontally to many machines.
- It is unacceptable if these new technologies make writing applications harder. Writing code should be faster, easier, and more agile.
Mongo Model: Design Philosophy (continued)
- The document data model (JSON/BSON) is easy to code to, easy to manage (schema-less), and yields excellent performance by grouping relevant data together internally.
- It is important to keep deep functionality to keep programming fast and simple. While some things must be left out, keep as much as possible: for example, secondary indexes, unique key constraints, atomic operations, and multi-document updates.
- Database technology should run anywhere: on your own servers or VMs, and also as a pay-for-what-you-use cloud service.
Mongo Deployment
Mongo Deployment
- Download the zip file: 32-bit or 64-bit (recommended)
- Unzip the download
- Create a data directory
Mongo Deployment
- Run & connect to the server
- That's it!
Mongo Query Language
Built-In MapReduce
MapReduce
- Part of the aggregation functionality
- Similar to Select Many and Group By results
- Two-phase approach:
  - Map phase: processes each document and emits one or more objects for each document
  - Reduce phase: combines the output of the map operation
- Finalize (optional): used to make final modifications to the output
MapReduce Data processing paradigm for condensing large volumes of data into useful aggregated results.
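The phases can be sketched in plain JavaScript. Everything here is an assumption made for illustration: `runMapReduce` is a hypothetical helper, not a MongoDB API, and the order documents are invented. The sketch calls reduce exactly once per key, whereas a real server may skip reduce for a key with a single value or call it several times on partial results, so a reduce function should return the same shape that map emits.

```javascript
// Hypothetical helper: runs map, reduce, and an optional finalize over an array.
function runMapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  // Map phase: each document may emit zero or more (key, value) pairs.
  docs.forEach(function (doc) {
    map(doc, function emit(key, value) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  });
  const results = [];
  groups.forEach(function (values, key) {
    // Reduce phase: combine all values emitted under the same key.
    let value = reduce(key, values);
    // Finalize phase (optional): last adjustment to the reduced value.
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value: value });
  });
  return results;
}

// Invented sample data: average order amount per customer.
const orders = [
  { customer: "a", amount: 10 },
  { customer: "a", amount: 20 },
  { customer: "b", amount: 5 }
];

const averages = runMapReduce(
  orders,
  function (doc, emit) { emit(doc.customer, { sum: doc.amount, count: 1 }); },
  function (key, values) {
    // Returns the same { sum, count } shape that map emits.
    return values.reduce(function (acc, v) {
      return { sum: acc.sum + v.sum, count: acc.count + v.count };
    }, { sum: 0, count: 0 });
  },
  function (key, reduced) { return reduced.sum / reduced.count; }
);

console.log(averages);
```

Finalize is the natural place for the division, because an average cannot be combined correctly across partial reduce results while a sum and a count can.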
Demo
MapReduce Problem to Solve:
{ "Recipe Title" : "Hamburger", "content" : "...", "tags" : [ "Fast Food", "Beef", "Grab-N-Go" ] }
We want to end up with a "tags" collection that has documents that look like this:
{"_id" : "Fast Food", "value" : 4}
{"_id" : "Beef", "value" : 2}
{"_id" : "Grab-N-Go", "value" : 7}
{"_id" : "Group", "value" : 1}
MapReduce Step 1 - MAP:
map = function() {
    // Skip recipes that have no tags.
    if (!this.tags) {
        return;
    }
    // Emit a count of 1 for every tag on this recipe.
    for (var index in this.tags) {
        emit(this.tags[index], 1);
    }
};
MapReduce Step 2 - REDUCE:
reduce = function(key, values) {
    // key is the emitted tag; values is the array of counts emitted for it.
    var count = 0;
    for (var index in values) {
        count += values[index];
    }
    return count;
};
MapReduce Final Execute / Get Results:
result = db.runCommand({
    "mapreduce" : "recipes",
    "map" : map,
    "reduce" : reduce,
    "out" : "tags"
});
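The demo's logic can be checked without a running server. The sketch below assumes the map function reads the "tags" field shown in the sample recipe, and it supplies a stand-in `emit` and invented recipe documents. Only the grouping is simulated; this is not how the server executes the command.

```javascript
// Sample documents shaped like the recipe on the slide (contents are made up).
const recipes = [
  { title: "Hamburger", tags: ["Fast Food", "Beef", "Grab-N-Go"] },
  { title: "Steak", tags: ["Beef"] },
  { title: "Fries", tags: ["Fast Food", "Grab-N-Go"] },
  { title: "Water" } // no tags: map should skip it
];

// Stand-in for the emit() that MongoDB provides inside map functions.
const emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

// Map: inside MongoDB, `this` is the document being processed.
const map = function () {
  if (!this.tags) { return; }
  for (var index in this.tags) { emit(this.tags[index], 1); }
};

// Reduce: sum the counts emitted for one tag.
const reduce = function (key, values) {
  var count = 0;
  for (var index in values) { count += values[index]; }
  return count;
};

// Simulate the run: bind each document as `this`, then reduce per key.
recipes.forEach(function (doc) { map.call(doc); });
const tags = Object.keys(emitted).map(function (key) {
  return { _id: key, value: reduce(key, emitted[key]) };
});

console.log(tags);
```

Each resulting object has the same `_id`/`value` shape as the documents the demo aims to produce, with one entry per distinct tag.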
Q & A
Credits http://www.mongodb.com