A Journey through the MongoDB Internals?



Similar documents
File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

Virtuoso and Database Scalability

In Memory Accelerator for MongoDB

Open Mic on IBM Notes Traveler Best Practices. Date: 11 July, 2013

Installing Sun's VirtualBox on Windows XP and setting up an Ubuntu VM

DMS Performance Tuning Guide for SQL Server

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 5 - DBMS Architecture

Tech Tip: Understanding Server Memory Counters

Oracle to SQL Server 2005 Migration

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

Topics in Computer System Performance and Reliability: Storage Systems!

The Classical Architecture. Storage 1 / 36

Scaling up = getting a better machine. Scaling out = use another server and add it to your cluster.

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni

ZooKeeper Administrator's Guide

COS 318: Operating Systems. Virtual Memory and Address Translation

Two Parts. Filesystem Interface. Filesystem design. Interface the user sees. Implementing the interface

Ground up Introduction to In-Memory Data (Grids)

From 1000/day to 1000/sec

Scaling with MongoDB. by Michael Schurter Scaling with MongoDB by Michael Schurter - OS Bridge,

Hypertable Architecture Overview

This document will list the ManageEngine Applications Manager best practices

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

How To Make Your Database More Efficient By Virtualizing It On A Server

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University

Outline. Failure Types

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

Transactions and ACID in MongoDB

Update: About Apple RAID Version 1.5 About this update

Best Practices for Monitoring Databases on VMware. Dean Richards Senior DBA, Confio Software

e-config Data Migration Guidelines Version 1.1 Author: e-config Team Owner: e-config Team

Microsoft SQL Server Guide. Best Practices and Backup Procedures

File Systems Management and Examples

Best Practices for Architecting Storage in Virtualized Environments

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Local File Systems in the Cloud. Michael Rubin

HDFS Architecture Guide

Pushing the Limits of Windows: Physical Memory Mark Russinovich (From Mark Russinovich Blog)

Azure VM Performance Considerations Running SQL Server

This talk is mostly about Data Center Replication, but along the way we'll have to talk about why you'd want transactionality arnd the Low-Level API.

Analyzing & Optimizing T-SQL Query Performance Part1: using SET and DBCC. Kevin Kline Senior Product Architect for SQL Server Quest Software

User Installation Guide

Storage Architectures for Big Data in the Cloud

DATABASE. Pervasive PSQL Performance. Key Performance Features of Pervasive PSQL. Pervasive PSQL White Paper

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

Snapshot Point-in-time Copy in a Flash

BEST PRACTICES GUIDE: VMware on Nimble Storage

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

Install Instructions and Deployment Options

CS 153 Design of Operating Systems Spring 2015

NoSQL web apps. w/ MongoDB, Node.js, AngularJS. Dr. Gerd Jungbluth, NoSQL UG Cologne,

MySQL Cluster New Features. Johan Andersson MySQL Cluster Consulting johan.andersson@sun.com

Chapter 11: File System Implementation. Operating System Concepts with Java 8 th Edition

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

PCVITA Express Migrator for SharePoint(Exchange Public Folder) Table of Contents

Using the SQL Server Linked Server Capability

Raima Database Manager Version 14.0 In-memory Database Engine

Filesystems Performance in GNU/Linux Multi-Disk Data Storage

Investigating the Use of Virtual Servers to Improve the Restoration Process of an Active Directory Forest

High Frequency Trading and NoSQL. Peter Lawrey CEO, Principal Consultant Higher Frequency Trading

Cognos8 Deployment Best Practices for Performance/Scalability. Barnaby Cole Practice Lead, Technical Services

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Installing FileMaker Pro 11 in Windows

Effects of VC RunTimes in App-V 5 SP2 with HotFix 4 Deployment Performance

TORNADO ONLINE BACKUP ADMINISTRATOR S GUIDE

TECHNICAL BRIEF. Primary Storage Compression with Storage Foundation 6.0

Distributed File Systems

DataBlitz Main Memory DataBase System

File System Management

The Google File System

Recover EDB and Export Exchange Database to PST 2010

SAPIP GUI INSTALLATION. Table of Contents

Rational Application Developer Performance Tips Introduction

In and Out of the PostgreSQL Shared Buffer Cache

Web Security Log Server Error Reference

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

Chapter 11: File System Implementation. Operating System Concepts 8 th Edition

HADOOP PERFORMANCE TUNING

Can the Elephants Handle the NoSQL Onslaught?

Installation, Report Environment Set-up and DB Maintenance

Configuring Apache Derby for Performance and Durability Olav Sandstå

Whitepaper: performance of SqlBulkCopy

Local Caching Servers (LCS): User Manual

I. General Database Server Performance Information. Knowledge Base Article. Database Server Performance Best Practices Guide

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

About Parallels Desktop 10 for Mac

Update on filesystems for flash storage

Chapter 15 Windows Operating Systems

Lecture 16: Storage Devices

VMWare Workstation 11 Installation MICROSOFT WINDOWS SERVER 2008 R2 STANDARD ENTERPRISE ED.

Benchmarking FreeBSD. Ivan Voras

4 TECHNICAL INFORMATION. 4.1 Software and Hardware Requirements. 4.2 Installing the Rivers Database application

Chapter 12 File Management

There and Back Again As Quick As a Flash

How to choose High Availability solutions for MySQL MySQL UC 2010 Yves Trudeau Read by Peter Zaitsev. Percona Inc MySQLPerformanceBlog.

Understanding Disk Storage in Tivoli Storage Manager

Configuring Apache Derby for Performance and Durability Olav Sandstå

Installation Instruction STATISTICA Enterprise Small Business

CMS Performance Tuning Guide

Transcription:

#nosql13 A Journey through the MongoDB Internals? Christian Amor Kvalheim Engineering Team lead, MonboDB Inc, @christkv

Who Am I Driver Engineering lead at MongoDB Inc Work on the Mongo DB Node.js driver Started working for MongoDB Inc June 2011 @christkv christkv@mongodb.com

Why Pop the Hood? Understanding data safety Estimating RAM / disk requirements Optimizing performance

Storage Layout

Directory Layout drwxr-xr-x 136 Nov 19 10:12 journal -rw------- 16777216 Oct 25 14:58 test.0 -rw------- 134217728 Mar 13 2012 test.1 -rw------- 268435456 Mar 13 2012 test.2 -rw------- 536870912 May 11 2012 test.3 -rw------- 1073741824 May 11 2012 test.4 -rw------- 2146435072 Nov 19 10:14 test.5 -rw------- 16777216 Nov 19 10:13 test.ns

Directory Layout Aggressive pre-allocation (always 1 spare file) There is one namespace file per db which can hold 18000 entries per default A namespace is a collection or an index

Tuning with Options Use --directoryperdb to separate dbs into own folders which allows to use different volumes (isolation, performance) Use --smallfiles to keep data files smaller If using many databases, use nopreallocate and -- smallfiles to reduce storage size If using thousands of collections & indexes, increase namespace capacity with --nssize

Internal Structure

Internal File Format

Extent Structure

Extents and Records Journaling and Storage

To Sum Up: Internal File Format Files on disk are broken into extents which contain the documents A collection has one or more extents Extent grow exponentially up to 2GB Namespace entries in the ns file point to the first extent for that collection

What About Indexes?

Indexes Indexes are BTree structures serialized to disk They are stored in the same files as data but using own extents

The DB Stats > db.stats() { "db" : "test", "collections" : 22, "objects" : 17000383, ## number of documents "avgobjsize" : 44.33690276272011, "datasize" : 753744328, ## size of data "storagesize" : 1159569408, ## size of all containing extents "numextents" : 81, "indexes" : 85, "indexsize" : 624204896, ## separate index storage size "filesize" : 4176478208, ## size of data files on disk "nssizemb" : 16, "ok" : 1 }

The Collection Stats > db.large.stats() { "ns" : "test.large", "count" : 5000000, ## number of documents "size" : 280000024, ## size of data "avgobjsize" : 56.0000048, "storagesize" : 409206784, ## size of all containing extents "numextents" : 18, "nindexes" : 1, "lastextentsize" : 74846208, "paddingfactor" : 1, ## amount of padding "systemflags" : 0, "userflags" : 0, "totalindexsize" : 162228192, ## separate index storage size "indexsizes" : { "_id_" : 162228192 }, "ok" : 1 }

What s Memory Mapping?

Memory Mapped Files All data files are memory mapped to Virtual Memory by the OS MongoDB just reads / writes to RAM in the filesystem cache OS takes care of the rest! Virtual process size = total files size + overhead (connections, heap) If journal is on, the virtual size will be roughly doubled

Virtual Address Space

Memory Map, Love It or Hate It Pros: No complex memory / disk code in MongoDB, huge win! The OS is very good at caching for any type of storage Least Recently Used behavior Cache stays warm across MongoDB restarts Cons: RAM usage is affected by disk fragmentation RAM usage is affected by high read-ahead LRU behavior does not prioritize things (like indexes)

How Much Data is in RAM? Resident memory the best indicator of how much data in RAM Resident is: process overhead (connections, heap) + FS pages in RAM that were accessed Means that it resets to 0 upon restart even though data is still in RAM due to FS cache Use free command to check on FS cache size Can be affected by fragmentation and read-ahead

Journaling

The Problem Changes in memory mapped files are not applied in order and different parts of the file can be from different points in time! You want a consistent point-in-time snapshot when restarting after a crash

?

corruption

Solution Use a Journal Data gets written to a journal before making it to the data files Operations written to a journal buffer in RAM that gets flushed every 100ms by default or 100MB Once the journal is written to disk, the data is safe Journal prevents corruption and allows durability Can be turned off, but don t!

Journal Format

Can I Lose Data on a Hard Crash? Maximum data loss is 100ms (journal flush). This can be reduced with journalcommitinterval For durability (data is on disk when ack ed) use the JOURNAL_SAFE write concern ( j option). Note that replication can reduce the data loss further. Use the REPLICAS_SAFE write concern ( w option). As write guarantees increase, latency increases. To maintain performance, use more connections!

What is the Cost of a Journal? On read-heavy systems, no impact Write performance is reduced by 5-30% If using separate drive for journal, as low as 3% For apps that are write-heavy (1000+ writes per server) there can be slowdown due to mix of journal and data flushes. Use a separate drive!

Fragmentation

What it Looks Like Both on disk and in RAM!

Fragmentation Files can get fragmented over time if remove() and update() are issued. It gets worse if documents have varied sizes Fragmentation wastes disk space and RAM Also makes writes scattered and slower Fragmentation can be checked by comparing size to storagesize in the collection s stats.

How to Combat Fragmentation compact command (maintenance op) Normalize schema more (documents don t grow) Pre-pad documents (documents don t grow) Use separate collections over time, then use collection.drop() instead of collection.remove(query) --usepowerof2sizes option makes disk buckets more reusable

In Review

In Review Understand disk layout and footprint See how much data is actually in RAM Memory mapping is cool Answer how much data is ok to lose Check on fragmentation and avoid it https://github.com/10gen-labs/storage-viz

#nosql13 Questions? Christian Amor Kvalheim Engineering Team lead, MongoDB Inc, @christkv