Big Data & Data Science Course Example using MapReduce. Presented by Juan C. Vega


Agenda
  What is Mongo?
  Why Mongo?
  Mongo Model
  Mongo Deployment
  Mongo Query Language
  Built-In MapReduce
  Demo
  Q & A

Founders Max Schireson Eliot Horowitz

What is Mongo?


Document Oriented
  Objects map to programming language types
  Embedded documents & arrays reduce the need for joins
  No joins and no multi-document transactions (increases performance)
High Performance
  No joins and embedding make reads/writes fast
  Indexes, including indexes on embedded keys
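The point about embedding can be illustrated without a database. The following is a minimal sketch in plain JavaScript, assuming hypothetical sample data and helper names (`loadWithJoin`, `loadEmbedded`); it only contrasts the two layouts and is not how MongoDB itself stores or reads documents.

```javascript
// Hypothetical sample data; field names are illustrative only.

// Relational-style layout: comments live in a separate "table"
// and must be joined to their recipe by recipeId.
const recipesTable = [{ id: 1, title: "Hamburger" }];
const commentsTable = [
  { recipeId: 1, text: "Great on the grill" },
  { recipeId: 1, text: "Add more onions" },
];

function loadWithJoin(recipeId) {
  const recipe = recipesTable.find((r) => r.id === recipeId);
  const comments = commentsTable
    .filter((c) => c.recipeId === recipeId)
    .map((c) => c.text);
  return { title: recipe.title, comments };
}

// Document-style layout: the comments are embedded in the recipe,
// so a single lookup returns everything.
const recipeDocuments = [
  {
    _id: 1,
    title: "Hamburger",
    comments: ["Great on the grill", "Add more onions"],
  },
];

function loadEmbedded(recipeId) {
  const doc = recipeDocuments.find((d) => d._id === recipeId);
  return { title: doc.title, comments: doc.comments };
}

console.log(loadWithJoin(1));
console.log(loadEmbedded(1));
```

Both functions return the same result; the embedded version needs one lookup instead of two.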

High Availability
  Replicated servers with automatic master failover
Easy Scalability
  Auto-sharding (data partitioning across servers)
  Eventually consistent reads distributed over replicated servers
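As a rough illustration of "data partitioning across servers", here is a minimal sketch of range-based routing on a shard key. The chunk table, the shard names, and the `shardFor` helper are all hypothetical; a real cluster keeps and rebalances this metadata itself, so treat this only as a picture of the idea.

```javascript
// Hypothetical chunk table: each shard owns a contiguous range of the
// shard key (here, a lowercased recipe title).
// Bounds are [min, max): min inclusive, max exclusive.
const chunks = [
  { min: "", max: "h", shard: "shardA" },
  { min: "h", max: "q", shard: "shardB" },
  { min: "q", max: "\uffff", shard: "shardC" },
];

// Find the shard whose key range contains the given key.
function shardFor(key) {
  const k = key.toLowerCase();
  const chunk = chunks.find((c) => k >= c.min && k < c.max);
  return chunk.shard;
}

console.log(shardFor("Apple Pie"));
console.log(shardFor("Hamburger"));
console.log(shardFor("Tacos"));
```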

Why Mongo?

MongoDB focuses on four main things:
  Flexibility
  Power
  Speed
  Ease of Use

Flexibility
  Data stored in JSON documents (serialized to BSON)
  Schema-less
  Maps to native programming language types
  The design is not governed by ERDs (as it is in an RDBMS)

Power
  Supports secondary indexes
  Dynamic queries
  Sorting
  Rich updates
  Easy aggregations
  Upserts: update if the document exists, insert if it doesn't
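The upsert rule in the list above can be sketched in a few lines of plain JavaScript. This is an in-memory stand-in, not the database's own logic: the `upsert` helper, the `Map`-based collection, and the `increment` update are assumptions made for illustration.

```javascript
// In-memory stand-in for a collection, keyed by _id (hypothetical).
const tagCounts = new Map();

// Update the document if it exists; insert it (from defaults) otherwise.
function upsert(collection, id, applyUpdate, defaults) {
  const existing = collection.get(id);
  const base = existing !== undefined ? existing : { _id: id, ...defaults };
  collection.set(id, applyUpdate(base));
  return existing === undefined ? "inserted" : "updated";
}

// Example update: add one to the document's counter.
const increment = (doc) => ({ ...doc, value: doc.value + 1 });

console.log(upsert(tagCounts, "Beef", increment, { value: 0 })); // inserted
console.log(upsert(tagCounts, "Beef", increment, { value: 0 })); // updated
console.log(tagCounts.get("Beef")); // { _id: 'Beef', value: 2 }
```

In the mongo shell, the same behavior is requested by passing `{ upsert: true }` as the options argument of an update.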

Speed/Scaling
  Related data kept together in documents
  No need to join tables
  Auto-sharding allows clusters to scale linearly
  Capacity can be increased with no downtime

Ease of Use
  Very easy to install, configure, maintain, and use
  Very few configuration options
  Works right out of the box
  No need for fine-tuning obscure database configurations

Mongo Model

Design Philosophy
  New database technologies are needed to facilitate horizontal scaling of the data layer, easier development, and the ability to store orders of magnitude more data than in the past.
  A non-relational approach is the best path to database solutions that scale horizontally to many machines.
  It is unacceptable if these new technologies make writing applications harder.
  Writing code should be faster, easier, and more agile.

Design Philosophy (continued)
  The document data model (JSON/BSON) is easy to code to, easy to manage (schema-less), and yields excellent performance by grouping relevant data together internally.
  It is important to keep deep functionality to keep programming fast and simple. While some things must be left out, keep as much as possible: for example, secondary indexes, unique key constraints, atomic operations, and multi-document updates.
  Database technology should run anywhere, being available both for running on your own servers or VMs and as a pay-for-what-you-use cloud service.


Mongo Deployment

Download the Zip File
  32-Bit or 64-Bit (recommended)
Unzip the Download
Create a Data Directory

Run & Connect to the Server
That's It!

Mongo Query Language


Built-In MapReduce

MapReduce
  Part of the aggregation functionality
  Similar to Select Many and Group By results
  Two-phase approach:
    Map phase: processes each document and emits one or more objects for each document
    Reduce phase: combines the output of the map operation
    Finalize (optional): used to make final modifications to the output
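The phases described above can be imitated in memory to see how they fit together. This is a minimal sketch, not MongoDB's implementation: `runMapReduce`, the sample recipes, and the `minutes` field are hypothetical, and `map` receives the document and `emit` as arguments here rather than through `this` as in the shell. The example computes an average per tag, a case where a finalize step is useful because the reduce step can only carry running totals.

```javascript
// Minimal in-memory imitation of the map, reduce, and finalize phases.
function runMapReduce(docs, map, reduce, finalize) {
  // Map phase: each document may emit zero or more (key, value) pairs.
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);

  const results = [];
  for (const [key, values] of groups) {
    // Reduce phase: combine all values emitted for the same key.
    // A key with a single value is passed through without calling reduce.
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    // Finalize phase (optional): post-process the reduced value.
    results.push({ _id: key, value: finalize ? finalize(key, reduced) : reduced });
  }
  return results;
}

// Hypothetical sample data: preparation time in minutes per recipe.
const recipes = [
  { title: "Hamburger", minutes: 20, tags: ["Fast Food", "Beef"] },
  { title: "Steak", minutes: 40, tags: ["Beef"] },
  { title: "Fries", minutes: 10, tags: ["Fast Food"] },
];

const map = (doc, emit) => {
  for (const tag of doc.tags) emit(tag, { total: doc.minutes, count: 1 });
};
const reduce = (key, values) =>
  values.reduce((a, b) => ({ total: a.total + b.total, count: a.count + b.count }));
const finalize = (key, v) => v.total / v.count;

const averages = runMapReduce(recipes, map, reduce, finalize);
console.log(averages);
// [ { _id: 'Fast Food', value: 15 }, { _id: 'Beef', value: 30 } ]
```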

MapReduce: a data processing paradigm for condensing large volumes of data into useful aggregated results.


Demo

Problem to Solve: MapReduce

    { "Recipe Title" : "Hamburger", "content" : "...", "tags" : [ "Fast Food", "Beef", "Grab-N-Go" ] }

We want to end up with a "tags" collection that has documents that look like this:

    {"_id" : "Fast Food", "value" : 4}
    {"_id" : "Beef", "value" : 2}
    {"_id" : "Grab-N-Go", "value" : 7}
    {"_id" : "Group", "value" : 1}

MapReduce Step 1 - MAP:

    map = function() {
        // Skip recipes that have no tags.
        if (!this.tags) {
            return;
        }
        // Emit a count of 1 for every tag on this recipe.
        for (var index in this.tags) {
            emit(this.tags[index], 1);
        }
    }

MapReduce Step 2 - REDUCE:

    reduce = function(previous, current) {
        // "previous" is the key (a tag); "current" is the array of values emitted for it.
        var count = 0;
        for (var index in current) {
            count += current[index];
        }
        return count;
    }

MapReduce Final Execute / Get Results:

    result = db.runCommand({
        "mapreduce" : "recipes",
        "map" : map,
        "reduce" : reduce,
        "out" : "tags"
    })
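The map and reduce steps above can be tried without a server. The sketch below is plain JavaScript under stated assumptions: the sample recipes are made up, the labels are assumed to live in a `tags` array as in the problem statement, and `emit` is a local stand-in for the function the shell provides. It also checks one property worth knowing: the reduce function may be called again on its own partial results, so it has to give the same answer either way.

```javascript
// Hypothetical sample documents; only the `tags` field matters here.
const sampleRecipes = [
  { "Recipe Title": "Hamburger", tags: ["Fast Food", "Beef", "Grab-N-Go"] },
  { "Recipe Title": "Hot Dog", tags: ["Fast Food", "Grab-N-Go"] },
  { "Recipe Title": "Pot Roast", tags: ["Beef"] },
];

// Local stand-in for the shell's emit(): collect values per key.
const emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

// Shell-style map: the current document is bound to `this`.
function mapTags() {
  if (!this.tags) return;
  for (const tag of this.tags) emit(tag, 1);
}

// Shell-style reduce: receives a key and the array of values for it.
function reduceCounts(key, values) {
  let count = 0;
  for (const v of values) count += v;
  return count;
}

for (const doc of sampleRecipes) mapTags.call(doc);

const counts = Object.keys(emitted).map((key) => ({
  _id: key,
  value: reduceCounts(key, emitted[key]),
}));
console.log(counts);

// Re-reduce check: reducing a partial result together with a remaining
// value gives the same total as reducing all values at once.
const partial = reduceCounts("Fast Food", [1]);
const rereduced = reduceCounts("Fast Food", [partial, 1]);
console.log(rereduced === reduceCounts("Fast Food", [1, 1])); // true
```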

Q & A

Credits: http://www.mongodb.com