Accelerating Big Data: Using SanDisk SSDs for MongoDB Workloads




WHITE PAPER Accelerating Big Data: Using SanDisk SSDs for MongoDB Workloads. December 2014. 951 SanDisk Drive, Milpitas, CA 95035. 2014 SanDisk Corporation. All rights reserved. www.sandisk.com

Table of Contents
Executive Summary .... 3
SanDisk CloudSpeed SSDs .... 3
MongoDB .... 3
YCSB Benchmark .... 4
Methodology .... 5
Datasets .... 5
Test Environment .... 5
Technical Component Specifications .... 5
Testing Configuration Details .... 6
MongoDB Configuration .... 6
Test Workloads .... 6
Results Summary .... 7
Update Heavy .... 7
Read Heavy .... 9
Read Only .... 10
Conclusion .... 11
References .... 11

Executive Summary
In recent years, as Big Data workloads have grown in the data center, NoSQL databases have become more widely used to store and access non-structured data. Examples of these data types include multimedia (audio, video and photos), along with sensor data from the Internet of Things. Solid state drives (SSDs) have shown their value as storage for these NoSQL databases by dramatically improving performance compared to mechanically driven hard-disk drives (HDDs). To quantify this advantage, SanDisk tested MongoDB databases running on both SSD-enabled and HDD-enabled server platforms. These tests show that HDDs often become the bottleneck in these NoSQL systems: once the dataset size exceeds the memory capacity of the server, overall performance slows down. In contrast, SSDs improved the performance of the MongoDB NoSQL database workload. Importantly, we also looked at the impact on operational costs associated with data center space utilization, power and cooling, all of which decreased when SSD-based deployments were used.

SanDisk CloudSpeed SSDs
SanDisk, a global leader in flash storage solutions, partners with all the leading storage vendors to meet the IT industry's needs for flash-based products. The adoption of cloud computing is driving data growth, leading to an explosion in the volume of data that needs to be processed, stored and analyzed. These demanding Big Data workloads must be supported without compromising on performance, reliability, or longevity. SanDisk CloudSpeed SATA SSDs provide predictable performance and efficiency with superior reliability.

Figure 1: SanDisk CloudSpeed SATA SSDs

These SSDs are protected by SanDisk's Guardian technology, which increases durability and recoverability while preventing data loss and corruption.
MongoDB
MongoDB is an open-source NoSQL document store database that is used in a wide variety of workloads to support Mobility, Cloud Computing, Big Data/Analytics and other enterprise solutions. It provides application schema flexibility, which is not possible with relational databases. Relational databases (RDBMS) have schemas with tables and column attributes that must be defined before the data is loaded for processing. MongoDB offers a rich set of RDBMS-like functionality, such as secondary indexes, query capability and consistency for database transactions, but it does not require the upfront data-loading preparation associated with RDBMS. MongoDB provides scalability, performance and high availability, scaling from single-server deployments to large, complex multi-site architectures. That gives it a broad range of deployment scenarios. It leverages in-memory computing (IMC), and it provides high performance for both reads and writes. MongoDB's native replication and automated failover features enable enterprise-grade reliability and operational flexibility.
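A minimal sketch of the JSON document model and dynamic schemas described above. The field names and document shapes here are hypothetical illustrations, not taken from the tested workload; plain Python dicts stand in for MongoDB documents (MongoDB stores BSON, a binary superset of JSON).

```python
import json

# Hypothetical sensor document: MongoDB stores JSON-like records with
# dynamic schemas, so two documents in one collection may differ in shape.
sensor_reading = {
    "_id": "sensor-42-reading-0001",      # application-chosen primary key
    "sensor_id": 42,
    "type": "temperature",
    "value_c": 21.5,
    "tags": ["datacenter-1", "rack-7"],   # an array needs no join table
}

# A differently shaped document can live in the same collection.
media_item = {
    "_id": "photo-0001",
    "kind": "photo",
    "resolution": {"w": 1920, "h": 1080},  # nested sub-document
}

# Both serialize to JSON; no upfront table/column definition is required,
# unlike an RDBMS where the schema must exist before loading data.
for doc in (sensor_reading, media_item):
    print(json.dumps(doc)[:60])
```

This is precisely the flexibility the paper contrasts with RDBMS schemas: adding a new field to future documents requires no migration of existing ones.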

Some of the important MongoDB features include:
Data model: JSON data model with dynamic schemas
Scalability: auto-sharding for horizontal scalability
High availability: multiple copies are maintained with native replication
Query model: rich secondary indexes, including geospatial and TTL (Time-To-Live) indexes, an aggregation framework, and native map-reduce
Text search

YCSB Benchmark
The Yahoo! Cloud Serving Benchmark (hereafter called YCSB) is a standard benchmark framework for evaluating the performance of new-generation cloud data serving systems like MongoDB, Cassandra, and Apache HBase. The framework consists of a workload-generating client and a package of standard workloads. YCSB evaluates the performance and scalability of cloud-based systems: the performance section of the benchmark measures the throughput of the system for a defined latency (delay in processing due to I/O data transfer), while the scalability section focuses on the ability to scale elastically, so that these systems can handle more load as applications add more features or ramp up to support an increased number of business users. The YCSB benchmark also provides workload distribution options based on how real-world applications request operations of the system, such as insert/update/scan operations acting on a random set of data. The YCSB workload distribution options come in three main flavors, as described here:
Uniform: This option assumes that all the records in the database are accessed uniformly.
Zipfian: This is a statistical approach to handling requests to the database, based on the assumption that a popular record (e.g., a topic trending on Twitter, such as the World Cup final) is accessed far more often than the other records in the database.
Latest: This option is based on the assumption that the latest events are more popular, and are accessed more frequently than older events.

Along with the workload distribution and the type of database operation being selected, the following workload types are used for this benchmark:

Workload        Operations                  Record Selection/Distribution
Update Heavy    Read: 50% / Update: 50%     Uniform, Zipfian
Read Heavy      Read: 95% / Update: 5%      Uniform, Zipfian
Read Only       Read: 100%                  Uniform, Zipfian
Read Latest     Read: 95% / Insert: 5%      Latest

Figure 2: YCSB workloads
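The workload mixes and distribution options above can be sketched in a few lines of Python. This is an illustrative approximation, not YCSB's actual generator (YCSB internally uses a scrambled Zipfian generator); the Zipfian exponent of 0.99, the record count, and the operation count below are assumptions chosen for the sketch.

```python
import random

def zipf_weights(n, s=0.99):
    """Unnormalized Zipfian weights: rank r gets weight 1 / r**s."""
    return [1.0 / (r ** s) for r in range(1, n + 1)]

def pick_record(rng, n, dist, weights=None):
    """Select a record index under a YCSB-style distribution."""
    if dist == "uniform":    # every record equally likely
        return rng.randrange(n)
    if dist == "zipfian":    # a few 'hot' records dominate
        return rng.choices(range(n), weights=weights, k=1)[0]
    raise ValueError(dist)

rng = random.Random(0)
n = 1000                     # assumed record count for the sketch
w = zipf_weights(n)

# Workload A (Update Heavy): 50% reads / 50% updates over a Zipfian key space.
ops = [("read" if rng.random() < 0.5 else "update",
        pick_record(rng, n, "zipfian", w))
       for _ in range(10_000)]

reads = sum(1 for op, _ in ops if op == "read")
hot = sum(1 for _, i in ops if i == 0)
print("reads out of 10,000:", reads)          # roughly half
print("hits on hottest record:", hot)         # far above the ~10 expected under uniform
```

The skew is the point of the Zipfian option: under uniform selection each of the 1,000 records would see about 10 of the 10,000 operations, while the Zipfian hot record absorbs a large share, which is what stresses caching behavior differently in the two modes.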

Methodology
The following sections describe the methodology used as the YCSB benchmark tests were conducted with both HDD-enabled and SSD-enabled servers supporting the MongoDB workload.

Datasets
The data was loaded into MongoDB using the load phase of the YCSB benchmark tool.
Record description: each record consists of 10 character fields, each field 100 bytes long, plus a key assigned to each record, which serves as its primary key.
Record size: 1,024 bytes
MongoDB dataset sizes: 32GB, 256GB, and 1TB

Test Environment
The benchmark testing environment consists of one Dell PowerEdge R720 server with 24 Intel Xeon cores (two 12-core CPUs) and 96GB RAM hosting the MongoDB server, and one Dell PowerEdge R720 that serves as the client for the YCSB benchmark tool. A 1GbE network interconnect is used between the MongoDB server and the YCSB client. The local storage is varied between hard disk drives (HDDs) and solid state drives (SSDs). The dataset size of the YCSB tests was increased from 32GB, to 256GB, and to 1 terabyte (1TB). Figure 4 lists the complete hardware and software components that were used for this testing environment.

Technical Component Specifications
YCSB client server (parameter file: read/write mix, dataset, record size, Mongo plugin) connected over 1GbE to the MongoDB server, whose storage is either six HDDs or six CloudSpeed SSDs.
Figure 3: YCSB testing configuration
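A quick sanity check of the record layout and dataset sizes described above. The 24-byte key is an assumption chosen so that the total matches the stated 1,024-byte record size; the actual YCSB key length varies with the record number.

```python
# Record layout described above: 10 fields x 100 bytes, plus a key.
FIELDS = 10
FIELD_BYTES = 100
KEY_BYTES = 24   # assumption: chosen so the total equals the stated 1,024 bytes

record_bytes = FIELDS * FIELD_BYTES + KEY_BYTES   # 1,024 bytes per record

# Approximate record counts implied by the three dataset sizes under test.
GiB = 1024 ** 3
for label, size in [("32GB", 32 * GiB), ("256GB", 256 * GiB), ("1TB", 1024 * GiB)]:
    print(f"{label}: ~{size // record_bytes:,} records")
```

Only the 32GB dataset fits in the server's 96GB of RAM alongside MongoDB's working set; the 256GB and 1TB datasets are the ones that force storage I/O, which is why the paper splits results into in-memory and on-disk cases.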

Testing Configuration Details

Hardware                                      Software                    Purpose          Quantity
Dell PowerEdge R720: Intel Xeon E5-2620,      CentOS 5.1, 64-bit;         MongoDB server   1
two sockets, 24 cores (two 12-core            MongoDB 1.2.2
processors), 96GB memory
Dell PowerEdge R720: two Intel Xeon E5-2620   CentOS 5.1, 64-bit;         YCSB client      1
processors, 24 cores (two 12-core             YCSB 0.1.4
processors), 16GB memory
Dell PowerConnect 2824 24-port switch         1GbE network switch         Data network     1
500GB 7.2K RPM Dell SATA HDDs                 Used as just a bunch of     Data node        6
                                              disks (JBODs)               drives
480GB CloudSpeed 1000 SATA SSDs               JBODs                       Data node        6
                                                                          drives

Figure 4: Infrastructure details

MongoDB Configuration
MongoDB default configurations were used during the testing phase; the data path and log path were switched between HDD and SSD storage for each testing cycle.
HDD test: <MONGO_HOME>/bin/mongod --dbpath /sandisk/data/mongodb/data --logpath /sandisk/data/mongodb/log
SSD test: <MONGO_HOME>/bin/mongod --dbpath /sandisk/data/mongodb/data --logpath /sandisk/data/mongodb/log

Test Workloads
The primary objective of this benchmark test was to identify the advantage of using SanDisk SSDs for a MongoDB NoSQL store, and to provide performance data points for SSDs and HDDs. The benchmark consists of a single-node MongoDB database exercised with the standard YCSB workload types A, B and C, each tested with three different dataset sizes.
The YCSB workload types, based on the percentage of reads and writes:
Workload A: Update Heavy: 50% Update / 50% Read
Workload B: Read Heavy: 5% Update / 95% Read
Workload C: Read Only: 100% Read
The YCSB default data size: 1KB records (10 fields, 100 bytes each, plus key)
The dataset types are as follows:
In-memory dataset: 32GB
Disk dataset 1: 256GB
Disk dataset 2: 1TB
The YCSB workload distribution types are as follows:

Uniform: all database records are uniformly accessed
Zipfian: some records in the database are accessed more often than other records

Results Summary
Based on the test results, from an operations-per-second perspective, MongoDB performance on solid state drives (SSDs) is outstanding compared to hard disk drives (HDDs) for the same MongoDB configuration. This advantage is further highlighted when the dataset grows beyond the memory capacity of the MongoDB server. The latency metrics for SSDs, for both read and write operations, were the lowest across all workloads, which is an important factor in the scalability of a MongoDB server.

Update Heavy
In-memory dataset: Figure 5 shows update-heavy workload results for the 32GB dataset, which is smaller than the memory capacity of the MongoDB server. SSD performance shows higher throughput for both the Uniform and Zipfian workload types.

Figure 5: Throughput comparisons, update-heavy in-memory dataset

On-disk dataset: Figure 6 shows results for the 256GB and 1TB database sizes. These two datasets exceed the capacity of available memory and must reside on disk. As seen in Figure 6, SanDisk SSD performance is far superior to HDD performance in this scenario.

Figure 6: Throughput comparisons, update-heavy on-disk dataset

Throughput (operations/sec) for the update-heavy workload, by storage configuration and YCSB workload distribution:

Workload               Storage   Distribution   32GB     256GB   1TB
Workload A (50r/50w)   HDD       Uniform        19,49    95      66
Workload A (50r/50w)   HDD       Zipfian        2,3      165     17
Workload A (50r/50w)   SSD       Uniform        22,124   2,418   1,871
Workload A (50r/50w)   SSD       Zipfian        24,732   4,676   3,523

Figure 7: Throughput results, update-heavy workload

Latency: SanDisk CloudSpeed SSDs provide consistently low latency, even with large datasets, for both read and write operations.

Figure 8: Latency results for HDDs vs. SSDs, update-heavy workload

Read Heavy
SSDs provide excellent performance results for the read-heavy workload. As the dataset expands from 32GB to 256GB to 1TB, the SSD advantage is clearly highlighted, as shown in Figure 9 (on-disk dataset).

Figure 9: Throughput results, read-heavy workload

Latency: SSDs deliver minimal latency for the read-heavy workload, and this advantage is more pronounced for the large datasets (256GB and 1TB), as shown in Figure 10; for the same datasets, HDDs generate up to 2.3x higher latency.

Figure 10: Latency results, read-heavy workload

Read Only
This workload is exclusively read-only, fetching large amounts of data from the MongoDB server. As expected, the SSD gains a clear advantage in this workload as the size of the dataset exceeds available memory in going from 256GB to 1TB.

Figure 11A: Throughput results, read-only workload

Figure 11B: Latency results, read-only workload

Latency: SSDs deliver virtually no latency for in-memory datasets, and minimal latency for large datasets beyond memory size. For those large datasets, HDDs encounter up to 43x higher latency, highlighting the benefit of using SSDs for such workloads.

Conclusion
SanDisk CloudSpeed SSDs deliver superior throughput, and they do so with consistently low latency across all workload and dataset types. A platform with this combination of high performance and low latency helps the MongoDB database complete all of its operations in shorter time intervals than it would with HDDs, thereby reducing the number of MongoDB servers needed in a given clustered-server environment. Reducing MongoDB cluster density reduces both capital expenses (CAPEX) and operational expenses (OPEX), with fewer MongoDB database instances to manage and administer. SanDisk's Guardian technology, which ships with SanDisk SSDs, provides data protection, securing the customer's investment in these solid state drives.

References
SanDisk corporate website, including information about Guardian technology: www.sandisk.com
MongoDB: www.mongodb.org
Yahoo! Cloud Serving Benchmark (Yahoo Labs): http://research.yahoo.com/web_information_management/ycsb/
MongoDB memory-mapping mechanism: www.polyspot.com/en/blog/2012/understanding-mongodb-storage/

Products, samples and prototypes are subject to update and change for technological and manufacturing purposes. SanDisk Corporation's general policy does not recommend the use of its products in life support applications wherein a failure or malfunction of the product may directly threaten life or injury. Without limitation to the foregoing, SanDisk shall not be liable for any loss, injury or damage caused by use of its products in any of the following applications:
Special applications such as military-related equipment, nuclear reactor control, and aerospace
Control devices for automotive vehicles, train, ship and traffic equipment
Safety systems for disaster prevention and crime prevention
Medical-related equipment, including medical measurement devices
Accordingly, in any use of SanDisk products in life support systems or other applications where failure could cause damage, injury or loss of life, the products should only be incorporated in systems designed with appropriate redundancy, fault tolerant or back-up features. Per SanDisk Terms and Conditions of Sale, the user of SanDisk products in life support or other such applications assumes all risk of such use and agrees to indemnify, defend and hold harmless SanDisk Corporation and its affiliates against all damages. Security safeguards, by their nature, are capable of circumvention. SanDisk cannot, and does not, guarantee that data will not be accessed by unauthorized persons, and SanDisk disclaims any warranties to that effect to the fullest extent permitted by law. This document and related material is for information use only and is subject to change without prior notice. SanDisk Corporation assumes no responsibility for any errors that may appear in this document or related material, nor for any damages or claims resulting from the furnishing, performance or use of this document or related material.
SanDisk Corporation explicitly disclaims any express and implied warranties and indemnities of any kind that may or could be associated with this document and related material, and any user of this document or related material agrees to such disclaimer as a precondition to receipt and usage hereof. EACH USER OF THIS DOCUMENT EXPRESSLY WAIVES ALL GUARANTIES AND WARRANTIES OF ANY KIND ASSOCIATED WITH THIS DOCUMENT AND/OR RELATED MATERIALS, WHETHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE OR INFRINGEMENT, TOGETHER WITH ANY LIABILITY OF SANDISK CORPORATION AND ITS AFFILIATES UNDER ANY CONTRACT, NEGLIGENCE, STRICT LIABILITY OR OTHER LEGAL OR EQUITABLE THEORY FOR LOSS OF USE, REVENUE, OR PROFIT OR OTHER INCIDENTAL, PUNITIVE, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION PHYSICAL INJURY OR DEATH, PROPERTY DAMAGE, LOST DATA, OR COSTS OF PROCUREMENT OF SUBSTITUTE GOODS, TECHNOLOGY OR SERVICES. No part of this document may be reproduced, transmitted, transcribed, stored in a retrievable manner or translated into any language or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual or otherwise, without the prior written consent of an officer of SanDisk Corporation. All parts of the SanDisk documentation are protected by copyright law and all rights are reserved. For more information, please visit: www.sandisk.com/enterprise At SanDisk, we're expanding the possibilities of data storage. For more than 25 years, SanDisk's ideas have helped transform the industry, delivering next generation storage solutions for consumers and businesses around the globe. 951 SanDisk Drive, Milpitas, CA 95035, USA. 2014 SanDisk Corporation. All rights reserved. SanDisk is a trademark of SanDisk Corporation, registered in the United States and other countries.
CloudSpeed, CloudSpeed Eco, CloudSpeed Ascend, CloudSpeed Ultra, Guardian Technology, FlashGuard, DataGuard and EverGuard are trademarks of SanDisk Enterprise IP LLC. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their holder(s). 121514