WHITE PAPER

Oracle NoSQL Database and SanDisk Offer Cost-Effective Extreme Performance for Big Data

951 SanDisk Drive, Milpitas, CA 95035
www.sandisk.com

Table of Contents

Abstract
What Is Big Data?
About Oracle NoSQL Database
Big Data: The Problem With Conventional Technology
The Flash Memory Solution
About The Tests
Test Results
Interpreting The Results
Oracle NoSQL Database And SanDisk: A Winning Combination

Abstract

This paper describes the benefits of storing Oracle NoSQL Database data on SanDisk's Fusion ioMemory products. Oracle and SanDisk partnered to test, validate, and deliver extreme-performance big data solutions for real-time applications. The superior performance of Fusion ioMemory devices complements the scalability, reliability, and simplicity of Oracle NoSQL Database, dramatically improving throughput and response times for serving key-value data. The combination of Oracle NoSQL Database and Fusion ioMemory products provides a compelling and cost-effective solution in a variety of scenarios.

Testing showed that using an ioDrive2 device for data storage delivered nearly 30 times as many operations per second as a 300GB 10k SAS disk on a 95% read and 5% write workload, and about ten times as many on a 50% read and 50% write workload. Equally impressive, the ioDrive2 device cut insert latency by a factor of more than seven in the 95% read and 5% write workload, and cut read latency by a factor of more than 58 in the 50% read and 50% write workload.

What Is Big Data?

Big Data is an informal term that encompasses all sorts of data, including Web logs, sensor data, tweets, blogs, user reviews, and SMS messages. It is characterized by volume of hundreds of terabytes or more; wide data variety with no inherent structure (one row looks very different from another); and high velocity, on the order of hundreds of thousands of operations per second.

Often, big data is processed using purpose-built software designed to address a specific data processing requirement. This category of big data processing solutions is generally referred to as NoSQL ("not SQL" or "Not Only SQL"). Although it is possible to process big data using traditional SQL-based products and solutions, NoSQL databases provide a more cost-effective and horizontally scalable alternative.
NoSQL databases complement SQL-based solutions, providing significant new business advantages to the enterprise.

Recently, there has been a huge surge of interest in big data processing solutions. As enterprises have embraced big data processing for business benefit, open source and commercial vendors have responded by providing a variety of solutions aimed at addressing specific big data processing needs. In October 2011, Oracle announced a suite of complementary products and technologies that provide a complete and comprehensive solution to address the big data processing needs of the market.

Big data processing falls into two major categories: interactive processing and batch processing. In most big data processing applications, both kinds of data processing are required. Oracle NoSQL Database (NoSQL DB for short), also released in October 2011, is a scalable, highly available key-value store that can be used to acquire and manage vast amounts of interactive information.

About Oracle NoSQL Database

Oracle NoSQL Database is a highly available, linearly scalable, high-performance key-value database server. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

Oracle NoSQL Database provides a very simple data model to the application developer. Each row is identified by a unique key and has a value of arbitrary length, which is interpreted by the application. The application can manipulate (insert, delete, update, read) a single row in a transaction. The application can also perform an iterative, non-transactional scan of all the rows in the database. The simplicity of this data model and access provides tremendous flexibility and performance benefits over an SQL-based solution for big data processing.
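The data model described above can be illustrated with a small sketch. This is a toy in-memory stand-in written in Python, not the Oracle NoSQL Database API: the class name ToyKeyValueStore, its method names, and the sample keys are all hypothetical and chosen only for illustration. It shows the two access patterns the text mentions: single-row operations on an opaque value, and an iterative scan over all rows.

```python
from typing import Dict, Iterator, Optional, Tuple


class ToyKeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store (not Oracle's API)."""

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Insert or update one row; the store never interprets the value.
        self._rows[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Read one row, or None if the key is absent.
        return self._rows.get(key)

    def delete(self, key: str) -> bool:
        # Delete one row; report whether it existed.
        return self._rows.pop(key, None) is not None

    def scan(self) -> Iterator[Tuple[str, bytes]]:
        # Iterate over a sorted snapshot of all rows (no transactional guarantee).
        for key, value in sorted(self._rows.items()):
            yield key, value


store = ToyKeyValueStore()
# Rows need not share any structure: the application decides how to read each value.
store.put("user/42/profile", b'{"name": "Ada"}')
store.put("sensor/7/reading", b"\x00\x10\x9c")
```

Because the value is just a byte string, one row can hold JSON and the next a binary sensor reading, which is the flexibility the key-value paradigm is meant to provide.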

As mentioned earlier, big data is characterized by variety, volume, and velocity. The key-value paradigm permits the application to manage any kind of data: one row can be structurally very different from another row. The volume of data managed might change dramatically from one day to the next. For example, if e-commerce transactions are being managed in Oracle NoSQL Database, the volume of transactions and data can increase more than ten-fold during a busy shopping season, such as the weeks before the Christmas holiday. The data management system needs to scale easily to handle the change in workload without compromising performance. Similarly, high throughput and low response time are critical in many big data processing applications such as e-commerce, targeted advertising, and any application that provides interactive access to the customer.

NoSQL DB is a sharded system: each shard manages a subset of the data. Typically, a shard is composed of three independent nodes to provide high availability. One of the nodes in the shard is designated as the master, meaning it can serve read as well as write requests. Changes to data on the master node are continually propagated to the other nodes (the replicas) in the shard in order to keep the replicas up to date. Replicas can serve read requests; if the master node fails, one of the surviving replicas is elected as the master, and processing continues without any interruption in database activity.

Figure 1 illustrates the architecture of a typical NoSQL Database configuration with two clients. Note that the number of clients can vary, depending on application requirements.

Figure 1: NoSQL Database system architecture

Each node (master or replica) uses Berkeley DB Java Edition HA as the underlying data manager. Berkeley DB Java Edition uses a log-structured storage format to store the records and indices in the database. Log-structured storage is naturally optimized for write performance and can deliver extremely high write throughput. Through a combination of clever optimizations and effective use of memory, Berkeley DB Java Edition delivers excellent read performance as well.
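The shard behavior described above (one master taking writes, replicas kept up to date, a replica promoted on failure) can be sketched in a few lines of Python. This is a simplified illustration, not how Oracle NoSQL Database is implemented: the class names are hypothetical, replication is done synchronously in-process, the "election" simply picks the first surviving node, and routing keys to shards by a CRC32 hash is an assumption made for the example.

```python
import zlib
from typing import Dict, List, Optional


class Node:
    """One storage node; a plain dict stands in for its on-disk data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, bytes] = {}
        self.alive = True


class Shard:
    """A shard of three nodes by default: one master plus replicas."""

    def __init__(self, name: str, nodes_per_shard: int = 3) -> None:
        self.nodes: List[Node] = [Node(f"{name}-node{i}") for i in range(nodes_per_shard)]
        self.master: Node = self.nodes[0]

    def _live_nodes(self) -> List[Node]:
        return [n for n in self.nodes if n.alive]

    def _ensure_master(self) -> None:
        # Simplified election: promote the first surviving node if the master is down.
        if self.master.alive:
            return
        live = self._live_nodes()
        if not live:
            raise RuntimeError("no live nodes left in shard")
        self.master = live[0]

    def write(self, key: str, value: bytes) -> None:
        # Writes go to the master, then are propagated to every live replica.
        self._ensure_master()
        self.master.data[key] = value
        for node in self._live_nodes():
            if node is not self.master:
                node.data[key] = value

    def read(self, key: str) -> Optional[bytes]:
        # Any live node may serve a read; here the last live node is used.
        live = self._live_nodes()
        if not live:
            raise RuntimeError("no live nodes left in shard")
        return live[-1].data.get(key)


class ShardedStore:
    """Routes each key to one shard (hash routing is an assumption of this sketch)."""

    def __init__(self, num_shards: int = 2) -> None:
        self.shards = [Shard(f"shard{i}") for i in range(num_shards)]

    def shard_for(self, key: str) -> Shard:
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key: str, value: bytes) -> None:
        self.shard_for(key).write(key, value)

    def get(self, key: str) -> Optional[bytes]:
        return self.shard_for(key).read(key)
```

Marking a shard's master as failed (setting alive to False) and then writing again exercises the failover path: the write lands on a promoted replica, and earlier data remains readable from the survivors.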

Big Data: The Problem With Conventional Technology

Transactional semantics, high availability, scalable throughput, and predictable latency are must-have requirements for the interactive (or "real-time") big data processing for which Oracle NoSQL Database is designed. For example, a retail e-commerce application must respond to user requests in under one or two seconds to ensure high user retention. Similarly, an in-home health care application must be able to capture and monitor data from multiple sensors, while processing and responding to critical medical events reliably and predictably without data loss.

A common technique to ensure high throughput and low latency is to store all the information in memory. Due to the high and unpredictable volumes of data, however, an in-memory solution is not cost-effective for big data processing. Typically, big data solutions store the vast majority of the information on disk, and use memory for caching the most frequently accessed subsets of data.

The performance of storing and retrieving data from disk often limits the throughput and response time achievable by the system. In particular, the number of input-output operations per second (IOPS) that a disk can deliver will dictate the performance characteristics of the system. Modern spinning disks are able to deliver fast sequential access, but poor sustained random performance of approximately 100 IOPS. Most often, the requirements of a NoSQL database application far exceed the capacity of a single disk. Consequently, high-performance solutions often use multiple disks per machine in order to get additional I/O bandwidth. This can work adequately for smaller data sets, but as the volume of data to be processed increases, applications require external arrays, and the cost of hardware and maintenance to scale systems quickly becomes impractical.
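The scaling problem can be made concrete with a back-of-the-envelope calculation. The Python sketch below uses the figure of approximately 100 random IOPS per spinning disk quoted above; the function name and the target of 10,000 random IOPS are hypothetical values chosen for illustration, and the estimate ignores caching, RAID overhead, and controller limits.

```python
HDD_RANDOM_IOPS = 100  # approximate sustained random IOPS per spinning disk (from the text)


def disks_needed(target_random_iops: int, iops_per_disk: int = HDD_RANDOM_IOPS) -> int:
    """Estimate how many disks are needed to sustain a target random-I/O rate."""
    if target_random_iops < 0 or iops_per_disk <= 0:
        raise ValueError("target must be >= 0 and iops_per_disk must be > 0")
    # Integer ceiling division: a partially used disk still has to be bought.
    return -(-target_random_iops // iops_per_disk)


# Hypothetical workload: 10,000 random I/Os per second that miss the memory cache.
spindles = disks_needed(10_000)  # 100 disks at ~100 IOPS each
```

Even under these generous assumptions, a modest random-I/O target translates into a disk count that no longer fits inside a single server, which is the point at which external arrays enter the picture.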
The Flash Memory Solution

SanDisk's Fusion ioMemory platform delivers the microsecond-latency access that interactive big data applications need to maintain real-time response times across tens of terabytes of capacity, something that in-memory databases cannot practically do. It provides persistent storage and the I/O performance that disk arrays cannot achieve without racks of infrastructure and high-bandwidth networking. Oracle and SanDisk have partnered to test the benefits of the Fusion ioMemory solution for Oracle NoSQL Database.

About The Tests

The tests were run on a single shard consisting of three nodes. Each node was a Sun Fire X4170 M2 configured with two Intel 2.93GHz 6-Core Xeon E5670 processors, 72GB of DRAM, a 300GB 10k SAS hard disk, and a 1.2TB Fusion ioMemory ioDrive2 card. The machines were configured with Oracle Linux Server release 5.7 and a pre-release version of NoSQL Database 2.0.

The test driver consisted of a single Yahoo! Cloud Serving Benchmark (YCSB) client. The YCSB software was modified to use a larger key space for better distribution of keys when scaling up to large data sets. Tests were conducted on an Oracle system that was not tuned for flash.

There were three sets of tests:

1. Pure insert: Insert 100 million records, with an average key size of 13 bytes and an average value size of 1108 bytes.
2. 50/50 R/W: Ten million operations consisting of a 50% read and 50% update mix, using the 100 million record store created by the insert test.
3. 95/5 R/W: Ten million operations consisting of a 95% read and 5% update mix, again using the 100 million record store created by the insert test.

The above tests were run using both the SAS hard disk and the ioDrive2 card. Throughput and latency were measured by the YCSB client during these tests and are summarized in the tables below.

Test Results

                                   300GB SAS Disk    ioDrive2    Improvement
Throughput (operations/sec)        23,308            24,150      3.60%
Average insert latency (msec) [1]  5.07              4.96        2.20%
Average read latency (msec) [1]    N/A               N/A         N/A

Table 1. Pure insert test: insert 100 million 1108-byte records

                                   300GB SAS Disk    ioDrive2    Improvement
Throughput (operations/sec)        3,342             33,693      908%
Average insert latency (msec) [1]  36.88             6.42        574%
Average read latency (msec) [1]    35.6              0.61        5836%

Table 2. 50/50 read/update mix. 100 million 1108-byte records in the database

                                   300GB SAS Disk    ioDrive2    Improvement
Throughput (operations/sec)        3,583             106,616     2975%
Average insert latency (msec) [1]  34.57             4.79        721%
Average read latency (msec) [1]    33.16             0.91        3643%

Table 3. 95/5 read/update mix. 100 million 1108-byte records in the database

[1] Latency results include Java application overhead. Raw ioMemory access latency is typically in the microsecond range. More efficient applications will see even faster response times.
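As a reading aid, the raw figures in the three tables can be turned into plain ratios. The Python sketch below copies the measured values from the tables and computes two quantities: a throughput speedup (ioDrive2 divided by SAS) and a latency reduction factor (SAS divided by ioDrive2). The dictionary layout and function names are hypothetical; only the numbers come from the test results.

```python
# Measured values copied from Tables 1-3 as (SAS disk, ioDrive2) pairs.
RESULTS = {
    "pure insert": {
        "throughput_ops": (23308, 24150),
        "insert_latency_ms": (5.07, 4.96),
    },
    "50/50": {
        "throughput_ops": (3342, 33693),
        "insert_latency_ms": (36.88, 6.42),
        "read_latency_ms": (35.6, 0.61),
    },
    "95/5": {
        "throughput_ops": (3583, 106616),
        "insert_latency_ms": (34.57, 4.79),
        "read_latency_ms": (33.16, 0.91),
    },
}


def throughput_speedup(sas: float, flash: float) -> float:
    """How many times more operations per second the flash device delivered."""
    return flash / sas


def latency_reduction(sas: float, flash: float) -> float:
    """How many times lower the flash latency was than the disk latency."""
    return sas / flash


for test, metrics in RESULTS.items():
    for name, (sas, flash) in metrics.items():
        # Throughput is better when higher, latency is better when lower.
        fn = throughput_speedup if name == "throughput_ops" else latency_reduction
        print(f"{test:12s} {name:18s} {fn(sas, flash):6.1f}x")
```

Expressing every row as a simple "times better" factor makes the tests directly comparable: the pure insert test comes out close to 1x, while the read-heavy tests show factors in the tens.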

Interpreting The Results

For the pure insert scenario, the performance of the disk and the ioDrive2 device is similar. This similarity is not surprising, since the underlying log-structured storage architecture of Oracle NoSQL Database is optimized for write operations on hard disks.

However, we see a dramatic difference in the read/update mix tests. Read operations require random I/Os (seeks) on conventional disks; consequently, both throughput and latency suffer. In the case of the ioDrive2, by contrast, the cost of random I/O and sequential I/O is almost identical. In other words, any I/O operation in a sequence of operations is equally fast. In the 95/5 test, throughput improves by a factor of nearly 30 (nearly 3,000%).

Notice that the overall throughput improves as the ratio of reads to writes increases. This happens because the benefits of log-structured storage have less of an impact when the relative proportion of writes to reads is small.

Oracle NoSQL Database And SanDisk: A Winning Combination

From these performance tests, it is clear that the ioDrive2 card provides dramatic improvements in performance for interactive big data applications. Disk drives simply cannot achieve the number of IOPS that an ioDrive2 device can. The superior performance of Oracle NoSQL Database using an ioDrive2 card is essential for many mission-critical applications such as e-retail, online advertising, home health care monitoring, financial services, and security and surveillance.

Though the initial capital cost of flash storage-based technology is higher, a system using disk-based storage that delivers comparable performance will need a large number of disk spindles to deliver the required throughput, and may not be able to deliver the required latency at all. Further, the operational costs of flash-based technology, including the amount of hardware required, power consumption, and cooling, are much lower than those of comparable disk-based solutions.
Finally, there are intangible benefits of deploying a super-high performance, low latency, and reliable NoSQL application, including customer and user loyalty and trust, competitive advantage, and lower operational costs.

Oracle NoSQL Database with Fusion ioMemory ioDrive2 technology provides an enterprise-grade, highly reliable, highly scalable, high performance, and low-latency solution for the most demanding big data applications today.

FOR MORE INFORMATION

Contact a SanDisk representative, 1-800-578-6007 or fusion-sales@sandisk.com

The performance results discussed herein are based on testing and use of the above described products. Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.

© 2014 SanDisk Corporation. All rights reserved. SanDisk is a trademark of SanDisk Corporation, registered in the United States and other countries. Fusion ioMemory, ioDrive and others are trademarks of SanDisk Enterprise IP LLC. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s).