In Memory Accelerator for MongoDB



Similar documents
In-Memory BigData. Summer 2012, Technology Overview

MongoDB Developer and Administrator Certification Course Agenda

GridGain In- Memory Data Fabric: UlCmate Speed and Scale for TransacCons and AnalyCcs

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

MongoDB and Couchbase

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

Sharding and MongoDB. Release MongoDB, Inc.

Sharding and MongoDB. Release MongoDB, Inc.

Bigdata High Availability (HA) Architecture

Cloud Based Application Architectures using Smart Computing

Hadoop IST 734 SS CHUNG

Hadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010


NoSQL and Hadoop Technologies On Oracle Cloud

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Can the Elephants Handle the NoSQL Onslaught?

A survey of big data architectures for handling massive data

The Sierra Clustered Database Engine, the technology at the heart of

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Ground up Introduction to In-Memory Data (Grids)

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

Data Management in the Cloud

Hadoop Architecture. Part 1

Table Of Contents. 1. GridGain In-Memory Database

Hadoop and Map-Reduce. Swati Gore

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Scaling up = getting a better machine. Scaling out = use another server and add it to your cluster.

Practical Cassandra. Vitalii

Apache Hadoop. Alexandru Costan

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Daniel J. Adabi. Workshop presentation by Lukas Probst

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Adding Indirection Enhances Functionality

Hadoop: Embracing future hardware

Mark Bennett. Search and the Virtual Machine

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Distributed File Systems

An Approach to Implement Map Reduce with NoSQL Databases

Diagram 1: Islands of storage across a digital broadcast workflow

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

JBoss & Infinispan open source data grids for the cloud era

GraySort on Apache Spark by Databricks

High Availability Solutions for the MariaDB and MySQL Database

CitusDB Architecture for Real-Time Big Data

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Using RDBMS, NoSQL or Hadoop?

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

A Performance Analysis of Distributed Indexing using Terrier

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Project Convergence: Integrating Data Grids and Compute Grids. Eugene Steinberg, CTO Grid Dynamics May, 2008

IN-MEMORY DATA FABRIC: Data Grid

HDFS Users Guide. Table of contents

Unified Big Data Analytics Pipeline. 连 城

Extending Hadoop beyond MapReduce

PTC System Monitor Solution Training

Avoid a single point of failure by replicating the server Increase scalability by sharing the load among replicas

HGST Virident Solutions 2.0

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Outline. Database Management and Tuning. Overview. Hardware Tuning. Johann Gamper. Unit 12

Liferay Performance Tuning

InfiniteGraph: The Distributed Graph Database

Transactions and ACID in MongoDB

<Insert Picture Here> Oracle In-Memory Database Cache Overview

Availability Digest. MySQL Clusters Go Active/Active. December 2006

Hypertable Goes Realtime at Baidu. Yang Dong Sherlock Yang(

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

NoSQL in der Cloud Why? Andreas Hartmann

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Managing MySQL Scale Through Consolidation

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. project.org. University of California, Berkeley UC BERKELEY

White Paper. Optimizing the Performance Of MySQL Cluster

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

Big Fast Data Hadoop acceleration with Flash. June 2013

Enterprise GIS Architecture Deployment Options. Andrew Sakowicz

DELL RAID PRIMER DELL PERC RAID CONTROLLERS. Joe H. Trickey III. Dell Storage RAID Product Marketing. John Seward. Dell Storage RAID Engineering

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

MySQL Cluster New Features. Johan Andersson MySQL Cluster Consulting johan.andersson@sun.com

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan

Oracle Primavera P6 Enterprise Project Portfolio Management Performance and Sizing Guide. An Oracle White Paper October 2010

Data Center Performance Insurance

In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015

Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB

Hadoop Distributed File System (HDFS) Overview

Overview: X5 Generation Database Machines

Unified Big Data Processing with Apache Spark. Matei

JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers

IBM WebSphere Distributed Caching Products

Transcription:

In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems

GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000 starts globally 2

Agenda GridGain Technology Overview Why In Memory Accelerator For MongoDB? How to Accelerate Feature And Performance Comparison QA 3

GridGain: Complete In Memory Stack 4

GridGain: Complete In Memory Stack 5

GridGain: Complete In Memory Stack 6

GridGain: Complete In Memory Stack 7

GridGain: Complete In Memory Stack 8

GridGain: In Memory DataBase Local, Replicated, Partitioned MVCC Based Concurrency HyperLocking Support Off Heap Memory Support Write Behind Cache ACID Transactions Pluggable Data Indexing SQL and Lucene Querying, custom SQL Functions 9

GridGain: In Memory High Performance Computing Affinity Collocation Zero Deployment Fault Tolerance Load Balancing Collision Resolution Job Checkpointing Distributed Continuations 10

In Memory Accelerator For MongoDB: Disclaimer We Do Not Compete with MongoDB MongoDB is Very Popular Product And Constantly Growing We Like the API and Features 11

In Memory Accelerator For MongoDB: Why? But In Some Cases Mongo Does Not Work Database Level Read/Write Locks Disk IO Complex Sharded Deployment Hardly Scalable 12

In Memory Accelerator For MongoDB: Why? What Can Be Done to Address These Problems? 13

In Memory Accelerator For MongoDB: Idea Implement Server Supporting The MongoDB Wire Protocol Store Documents In IMDB In PARTITIONED Mode Store Metadata In IMDB In REPLICATED Mode Server(s) Routes Queries To Nodes, Aggregates And Sends Results Back 14

In Memory Accelerator For MongoDB: Goals Up to 10x Better Performance Allow Easier Horizontal Scalability to 1,000+ Nodes Allow Vertical Scalability to Terabytes Of RAM Fully Concurrent In Memory Processing Efficient & Balanced Memory Management Transparent Background Repartitioning Embedded Mongo API for In Memory Computing 15

MongoDB: Sample Use Case Sample Deployment 5 Shards, 1 TB Each 3 Replicas in Each Shard Problem Statement Small Portion of 1TB is in Memory on Each Server Slower Performance Due to Constant Paging Slower Performance Due to Lack of Concurrency Queries Span Multiple Shards Some Queries Take Minutes 15 Servers Total, 3 Config Servers 16

MongoDB: Sharding Pros Some Parallelism on Writes Bigger Throughput Cons Writes are Still Not Fully Parallel Query Router Does More Work Complex Deployment Load May Not Be Even Across Shards 17

MongoDB: Replica Sets Pros Fault Tolerant If Primary Fails, Secondary Becomes Primary Queries Can Run in Parallel Cons Excessive Memory Paging Difficult to Fit the Whole Dataset In Memory Writes are Not Concurrent Data is Not Hot on Servers 18

In Memory Accelerator For MongoDB: Plug N Play Integration Plug N Play Integration Zero Code Change Any Mongo Client Full MongoDB Operation Support Aggregation Framework Supported Distributed Sorting and Grouping Algorithms 19

In Memory Mongo DB Accelerator: Memory First Optimized For Memory First vs Disk First Remove Disk I/O Remove Memory Paging On Heap and Off Heap Document Caching Direct Memory Access vs Block Data Access 20

In Memory Accelerator For MongoDB: Concurrency And Indexing Fully Concurrent No Global Write Locks All Writes and Reads Happen Concurrently Contention Free MVCC based Approach Comprehensive Document Indexing Fully Concurrent Indexes Based on SnapTrees On Heap and Off Heap Indexing 21

In Memory Accelerator For MongoDB: Data Partitioning No Sharding Data is Split Into Ranges, Not Shards No Need for Query Router mongos Data Is Evenly Distributed Load Is Evenly Distributed Backups For Redundancy And Querying Writes and Reads are Fully Concurrent 22

In Memory Accelerator For MongoDB: Operation Modes & Memory Management Dual Operation Mode PRIMARY Only In Memory PROXY Only In Native MongoDB DUAL Sync Or Async Writes to MongoDB Efficient Memory Management Pick DBs Or Collections to Store in Memory Custom Commands to Load and Unload Data Field Name Compaction 23

In Memory Accelerator For MongoDB: Embedded Mode Embedded Mode for JVM Remove Unnecessary Network Hops Gain Another Performance Increase Same API as for MongoDB Java Driver Java Based In Memory MapReduce Extra APIs for Better Affinity Collocation Logic and Data in the Same Memory Space 24

In Memory Accelerator For MongoDB: Sample Use Case Need to Accelerate? How to Accelerate? Handle Large Amount Of Queries In Real Time Utilize Existing Servers GridGain Node On Every Server Handle 3 5 Terabytes Of Data Only Put Highly Utilized Collections In Memory Updates Are As Frequent As Reads Data Is Fully In Memory No Memory Paging Even Load And Data Distribution Real Time Repartitioning Better Performance For Queries Under 1 second SLAs 25

Comparison: Scalability GridGain s In Memory Accelerator MongoDB Memory First Disk First Scales to 1000s Of Nodes Best on Under 40 Nodes Simple Unified Deployment Complex Sharded Deployment Data and Load are Evenly Distributed Data and Load Distribution are Uneven Per Document Redundancy Per Server Redundancy Elastically Add and Remove Nodes Adding and Removing Nodes is Inefficient 26

Comparison: Concurrency GridGain s In Memory Accelerator MongoDB No Global Write Locks Database Wide Write Locks Fully Concurrent Writes Writes Are Sequential Reads Are Concurrent With Writes Reads Must Wait For Writes 27

Comparison: Memory Management GridGain s In Memory Accelerator MongoDB Field Name Compaction No Field Name Compaction Data is Always In Memory Only Small Subset of Data is in Memory No Memory Paging Constant Paging In and Out Of Memory Data is Always Hot Secondaries Do Not Keep Hot Data in RAM Faster Querying Queries Are Slow When Hitting Disk Partial Indexes No Partial Indexes 28

In Memory Accelerator For MongoDB Questions and Answers Private beta: http:// Follow GridGain: @gridgain 30