Cisco UCS and Fusion-io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL Database




Built on Cisco's big data common platform architecture (CPA), a highly scalable architecture designed to meet a variety of scale-out application demands with transparent data integration and management integration capabilities, enhanced with flash technology.

Abstract

This paper describes the benefits of running Oracle NoSQL Database workloads on Cisco Unified Computing System (Cisco UCS) and Fusion ioDrive2 technology. Cisco, Fusion-io, and Oracle have partnered to test, validate, and deliver extremely high-performance big data solutions for real-time applications. The performance, scalability, and manageability of Cisco UCS C-Series Servers and Fusion-io's ioDrive2 complement Oracle NoSQL Database, dramatically improving throughput and response times for serving key-value data. The combination of Cisco UCS, ioMemory, and Oracle NoSQL Database provides a compelling and cost-effective solution in a variety of scenarios.

Testing showed that the system delivered over 1.2 million operations per second on a 95 percent read and 5 percent write workload using the Yahoo! Cloud Serving Benchmark (YCSB), a commonly used benchmark for evaluating the performance of key-value databases and cloud serving stores. Equally impressive, the system achieved an average latency of 0.88 milliseconds for reads and about 4.5 milliseconds for update operations.

Introduction to Big Data

Big Data is an informal term that encompasses a variety of data, including Web logs, sensor data, tweets, blogs, user reviews, and text messages. It is characterized by:

- High volume: hundreds of terabytes or more
- Wide data variety with no inherent structure (one row looks very different from another)
- High velocity: on the order of hundreds of thousands of operations per second
Often, big data is processed using purpose-built software designed to address a specific data processing requirement. This category of big data processing solutions is generally referred to as NoSQL ("not SQL" or "Not Only SQL"). Although it is possible to process big data using traditional SQL-based products and solutions, NoSQL databases provide a more cost-effective and horizontally scalable alternative. NoSQL databases complement SQL-based solutions, providing significant new business advantages to the enterprise.

Recently, there has been a huge surge of interest in big data processing solutions. As enterprises have embraced big data processing for business benefit, open source and commercial vendors have responded by providing a variety of solutions aimed at addressing specific big data processing needs.

Challenges

Delivering answers quickly under fluctuating workloads is a key requirement for big data processing. For example, a NoSQL solution is often used to manage user profiles for e-commerce web sites. The ability to look up a specific user's profile with extremely low latency (a few milliseconds) is critical to keeping users satisfied. Further, e-commerce activity can fluctuate significantly over time (for example, during the holiday shopping season) and can also be bursty. A big data solution must handle these fluctuations gracefully: it needs to deliver the required throughput with predictably low latency under widely varying workloads. NoSQL systems need to be highly available and horizontally scalable in order to adapt to such demanding workloads.

With the recent explosion of internet data, enterprises face the imminent challenge of coming up to speed on handling this data growth and, more importantly, extracting value for their mission-critical business applications. Since a NoSQL solution is a distributed system with many moving parts, a commercial solution is often preferable to open-source solutions for such interactive mission-critical applications.

Meeting the High Velocity Challenge

In the second half of 2011, Cisco and Oracle joined forces to provide innovative solutions to this critical problem. They delivered complementary technologies (hardware from Cisco and software from Oracle) to manage and process massive amounts of data to maximize business value. Cisco and Oracle extended this partnership in 2012 to work with Fusion-io in a three-way engineering effort to tackle pressing issues such as transaction rates and latency. Fusion-io takes the performance of Oracle NoSQL Database Community Edition on Cisco UCS to the next level, delivering unprecedented performance and scalability.
These complementary products combine to create a solution that provides significant advantages over the competition.

About Cisco UCS and Cisco's Partnership with Oracle NoSQL Database

Cisco Unified Computing System is a data center platform that has redefined enterprise computing. It brings together compute, network, storage access, and virtualization resources in a unified system designed to integrate technology in the data center and reduce total cost of ownership (TCO) for the end user. Within a few years of its first customer shipment, Cisco UCS has established a position as the leading mainstream application platform. With support for Oracle NoSQL Database, the Cisco UCS ecosystem extends unstructured data management capabilities that can coexist with and complement Oracle Database based applications. More information on the partnership and the joint solution can be found at http://www.cisco.com/en/us/solutions/collateral/ns340/ns517/ns224/ns944/le_34301_wp.pdf

About Oracle NoSQL Database

Oracle NoSQL Database is a highly available, linearly scalable, high-performance key-value database server. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

Oracle NoSQL Database provides a very simple data model to the application developer. Each row is identified by a unique key and has a value of arbitrary length, which is interpreted by the application. The application can manipulate (insert, delete, update, read) a single row in a transaction.
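The data model described above can be illustrated with a small in-memory sketch. This is not the Oracle NoSQL Database API: the class `KeyValueStore` and its method names are hypothetical, chosen only to mirror the four single-row operations named in the text, and the keys and values are made-up examples.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store.

    Keys are unique strings; values are opaque bytes of any length
    that only the application knows how to interpret.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def insert(self, key: str, value: bytes) -> bool:
        """Add a new row; return False if the key already exists."""
        if key in self._rows:
            return False
        self._rows[key] = value
        return True

    def read(self, key: str) -> Optional[bytes]:
        """Return the value for a key, or None if the row is absent."""
        return self._rows.get(key)

    def update(self, key: str, value: bytes) -> bool:
        """Replace the value of an existing row; return False if absent."""
        if key not in self._rows:
            return False
        self._rows[key] = value
        return True

    def delete(self, key: str) -> bool:
        """Remove a row; return True if it existed."""
        return self._rows.pop(key, None) is not None


# Two rows with structurally unrelated values live side by side.
store = KeyValueStore()
store.insert("user/1001", b'{"name": "Ada", "cart": ["book"]}')
store.insert("sensor/17", b"\x00\x2a")
store.update("user/1001", b'{"name": "Ada", "cart": []}')
profile = store.read("user/1001")
```

Because the store never inspects a value, a JSON user profile and a two-byte sensor reading can share the same table, which is the flexibility the key-value paradigm is credited with here.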

The application can also perform an iterative, non-transactional scan of all the rows in the database. The simplicity of this data model and access method provides tremendous flexibility and performance benefits over an SQL-based solution for big data processing.

As mentioned earlier, big data is characterized by variety, volume, and velocity. The key-value paradigm permits the application to manage any kind of data: one row can be structurally very different from another. The volume of data managed might change dramatically from one day to the next. For example, if e-commerce transactions are being managed in Oracle NoSQL Database, the volume of transactions and data can increase more than ten-fold during a busy shopping season, such as the weeks before the Christmas holiday. The data management system needs to scale easily to handle the change in workload without compromising performance. Similarly, high throughput and low response time are critical in many big data processing applications such as e-commerce, targeted advertising, and any application that provides interactive access to the customer.

Oracle NoSQL Database is a sharded system: each shard manages a subset of the data. Typically, a shard is composed of three independent nodes to provide high availability as well as read scalability. Shards can be added to provide additional capacity, and performance scales linearly as shards are added. If one of the nodes in a shard fails, the surviving nodes dynamically reconfigure the status of the shard, and processing continues without any interruption in database activity.

Figure 1 illustrates the architecture of a typical Oracle NoSQL Database configuration with two clients. Note that the number of clients can vary, depending on application requirements.

Figure 1: NoSQL Database system architecture

Each node uses Berkeley DB Java Edition HA as the underlying data manager.
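The sharding behavior described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Oracle NoSQL Database's actual partitioning or replication protocol, which this paper does not describe: routing by hash modulo the shard count, writing synchronously to every live replica, and the names `Shard` and `shard_for` are all hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


class Shard:
    """Hypothetical shard: a group of replica nodes holding the same rows."""

    def __init__(self, name: str, replicas: int = 3) -> None:
        self.name = name
        # Each node is modeled as a plain dict of key -> value.
        self.nodes: List[Dict[str, bytes]] = [{} for _ in range(replicas)]
        self.alive: List[bool] = [True] * replicas

    def write(self, key: str, value: bytes) -> None:
        # Assumption: every live replica receives the write immediately.
        for i, node in enumerate(self.nodes):
            if self.alive[i]:
                node[key] = value

    def read(self, key: str) -> Optional[bytes]:
        # Any surviving replica can serve the read.
        for i, node in enumerate(self.nodes):
            if self.alive[i]:
                return node.get(key)
        raise RuntimeError("no surviving node in " + self.name)


def shard_for(key: str, shards: List[Shard]) -> Shard:
    """Pick a shard from a stable hash of the key (assumed routing rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


shards = [Shard(f"shard-{i}") for i in range(4)]
target = shard_for("user/1001", shards)
target.write("user/1001", b"profile")
target.alive[0] = False            # simulate the failure of one node
value = target.read("user/1001")   # still served by a surviving replica
```

The sketch shows the two properties the text relies on: each key maps to exactly one shard, and a shard keeps answering reads after losing a node. Rebalancing data when shards are added, and bringing a failed node back in sync, are deliberately left out.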
Berkeley DB Java Edition uses a log-structured storage format to store the records and indices in the database. Log-structured storage is naturally optimized for write performance and can deliver extremely high write throughput. Through a combination of clever optimizations and effective use of memory, Berkeley DB Java Edition delivers excellent read performance as well.
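To make the idea of log-structured storage concrete, here is a minimal sketch assuming an append-only list as the log and an in-memory dictionary as the index. It is not Berkeley DB Java Edition's on-disk format; the class `LogStructuredStore` is hypothetical, and log cleaning (reclaiming space held by obsolete entries) is omitted.

```python
from typing import Dict, List, Optional, Tuple


class LogStructuredStore:
    """Hypothetical append-only store with an in-memory index."""

    def __init__(self) -> None:
        # The log only ever grows; existing entries are never rewritten.
        self._log: List[Tuple[str, Optional[bytes]]] = []
        # The index maps each live key to the position of its newest entry.
        self._index: Dict[str, int] = {}

    def put(self, key: str, value: bytes) -> None:
        """Insert or update: append a new entry and repoint the index."""
        self._log.append((key, value))
        self._index[key] = len(self._log) - 1

    def delete(self, key: str) -> None:
        """Append a tombstone (value None) and drop the key from the index."""
        self._log.append((key, None))
        self._index.pop(key, None)

    def get(self, key: str) -> Optional[bytes]:
        """Look up the newest entry for a key through the index."""
        position = self._index.get(key)
        return None if position is None else self._log[position][1]

    def log_length(self) -> int:
        return len(self._log)


db = LogStructuredStore()
db.put("a", b"1")
db.put("a", b"2")   # an update is a second append, not an overwrite
db.put("b", b"3")
db.delete("b")
latest = db.get("a")
```

Every write, including updates and deletes, is a sequential append, which is why this layout favors write throughput. Reads depend on the index to find the newest entry, so read performance hinges on how much of the index and data fits in memory.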

High Velocity: The Problem with Conventional Technology

Transactional semantics, high availability, scalable throughput, and predictable latency are must-have requirements for interactive (or "real-time") big data processing, for which Oracle NoSQL Database is designed. For example, a retail e-commerce application must respond to user requests in under one or two seconds to ensure high user retention. Similarly, an in-home health care application must be able to capture and monitor data from multiple sensors while processing and responding to critical medical events reliably and predictably, without data loss.

A common technique to ensure high throughput and low latency is to store all the information in memory. Due to the high and unpredictable volumes of data, however, an in-memory solution is not cost-effective for big data processing. Typically, big data solutions store the vast majority of the information on disk and use memory to cache the most frequently accessed subsets of data. The performance of storing and retrieving data from spinning media often limits the throughput and response time achievable by the system. In particular, the number of input/output operations per second (IOPS) that a disk can deliver dictates the performance characteristics of the system. While spinning media remains the most cost-effective solution for high-capacity applications, high-performance flash solutions can provide compelling price-performance for latency-sensitive transactional applications.
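The IOPS constraint described above can be turned into a back-of-the-envelope estimate. The sketch below assumes that every cache miss costs exactly one disk I/O and that load spreads evenly across disks. The function name `disks_needed` is hypothetical, and the input figures (100,000 operations per second, a 25 percent cache miss ratio, 100 IOPS per disk) are illustrative placeholders, not measurements from this paper.

```python
import math


def disks_needed(ops_per_second: float,
                 cache_miss_ratio: float,
                 iops_per_disk: float) -> int:
    """Estimate the spindles needed to absorb cache misses.

    Assumes one disk I/O per cache miss and perfectly even load.
    """
    if not 0.0 <= cache_miss_ratio <= 1.0:
        raise ValueError("cache_miss_ratio must be between 0 and 1")
    if iops_per_disk <= 0:
        raise ValueError("iops_per_disk must be positive")
    disk_ios_per_second = ops_per_second * cache_miss_ratio
    return math.ceil(disk_ios_per_second / iops_per_disk)


# Illustrative inputs only; substitute measured values for a real sizing.
estimate = disks_needed(ops_per_second=100_000,
                        cache_miss_ratio=0.25,
                        iops_per_disk=100)
```

With these placeholder inputs the estimate comes to 250 disks, which shows how quickly the spindle count grows once a meaningful share of requests misses the cache.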
The Cisco UCS Common Platform Architecture

The Cisco big data common platform architecture (CPA) is a highly scalable architecture designed to meet a variety of scale-out application demands, including Oracle NoSQL Database, with transparent data integration and management integration capabilities for new and existing Oracle applications. It is built using the following components:

- Cisco UCS 6200 Series Fabric Interconnects: The Cisco UCS 6200 Series Fabric Interconnects are a core part of Cisco UCS, providing both network connectivity and management capabilities across Cisco UCS 5100 Series Blade Server Chassis and Cisco UCS C-Series Rack Servers. Deployed in redundant pairs, the fabric interconnects offer line-rate, low-latency, lossless 10 Gigabit Ethernet connectivity and unified management with Cisco UCS Manager in a highly available management domain.

- Cisco UCS 2200 Series Fabric Extenders: Cisco UCS 2200 Series Fabric Extenders behave like remote line cards for a parent switch and provide a highly scalable and extremely cost-effective unified server-access platform.

- Cisco UCS rack servers: Specific models are used to support the base, high-performance, and high-capacity configurations:

  o Cisco UCS C210 M2 General-Purpose Rack Server: Cisco UCS C210 M2 servers are general-purpose 2-socket platforms based on the Intel Xeon processor 5600 series. These servers support up to 192 GB of main memory and 16 internal front-accessible, hot-swappable, Small Form Factor (SFF) disk drives, with a choice of one or two RAID controllers for data performance and protection.

  o Cisco UCS C240 M3 Rack Server: Cisco UCS C240 M3 servers are designed for both performance and expandability over a wide range of storage-intensive infrastructure workloads. Each server provides sockets for up to two processors from the Intel Xeon processor E5-2600 product family and up to 768 GB of main memory. Up to 24 SFF or 12 Large Form Factor (LFF) disk drives are supported, along with four Gigabit Ethernet LAN-on-motherboard (LOM) ports.

- Cisco UCS virtual interface cards (VICs): Unique to Cisco, Cisco UCS VICs incorporate next-generation converged network adapter (CNA) technology from Cisco and offer dual 10-Gbps ports designed for use with Cisco UCS C-Series Rack Servers. Optimized for virtualized networking, these cards deliver high performance and bandwidth utilization and support up to 256 virtual devices.

- Cisco UCS Manager: Cisco UCS Manager resides in the Cisco UCS 6200 Series Fabric Interconnects. It makes the system self-aware and self-integrating, managing all the system components as a single logical entity. Cisco UCS Manager can be accessed through an intuitive GUI, a command-line interface (CLI), or an XML API. Cisco UCS Manager uses service profiles to define the personality, configuration, and connectivity of all resources within Cisco UCS, radically simplifying provisioning of resources so that the process takes minutes instead of days. This simplification allows IT departments to shift their focus from constant maintenance to strategic business initiatives. It also provides the most streamlined, simplified approach commercially available today to firmware updating for all server components.

About the Tests

The benchmark system was based on the Cisco big data common platform architecture. It consisted of two fully redundant Cisco UCS 6248UP 48-Port Fabric Interconnects, two Cisco Nexus 2232PP 10GE Fabric Extenders, and fifteen Cisco UCS C240 M3 Rack Servers. Each server had two Intel Xeon E5-2665 processors, 256 GB of memory, 24 1-TB 7,200-rpm SFF SATA disk drives (not used in the benchmark), and two Fusion ioDrive2 365 GB MLC devices (30 in total). The workload was the Yahoo! Cloud Serving Benchmark (YCSB), modified to use a larger key space for better distribution of keys when scaling up to large data sets.
The benchmark generated a 95 percent read and 5 percent update workload in order to measure latency and throughput, using a 200-million-record store created by an insert run. Throughput and latency were measured by the YCSB clients during these tests and are summarized in the table below.

Test Results [i]

Number of Shards                             2          4          8           10
Mixed Read and Write Throughput (ops/sec)    302,152    558,569    1,028,868   1,244,550
Read Latency (ms)                            0.76       0.79       0.85        0.88
Write Latency (ms)                           3.08       3.82       4.29        4.47

Table 1: Key-value Store Operations per Second

[i] Latency results include Java application overhead. Raw ioMemory access latency is typically in the microsecond range. More efficient applications will see even faster response times.

Interpreting the Results

Read operations require random I/Os (seeks) on conventional disks. In the case of ioDrive2, however, the cost of random I/O and sequential I/O is almost identical. In other words, any I/O operation in a sequence of operations is equally fast. Consequently, the ioDrive2 delivers much better performance than conventional hard disk drives, scales nearly linearly, and also delivers much lower latency.

Cisco UCS and Fusion ioDrive2 for Oracle NoSQL Database: A Winning Combination

From these performance tests, it is clear that Cisco UCS C-Series servers and ioDrive2 provide dramatic improvements in performance for interactive big data applications. Disk drives cannot cost-effectively achieve the number of IOPS that an ioDrive2 can, and simply cannot approach the latency of flash storage. The performance and scalability of Oracle NoSQL Database on Cisco UCS C240 systems using ioDrive2 is critical for many mission-critical applications such as e-retail, online advertising, home health care monitoring, financial services, and security and surveillance.

Though the capital cost of flash-based storage is higher, a disk-based system that delivers comparable performance will need a large number of disk spindles to deliver the required throughput, and may not be able to deliver the required latency at all. Further, the operational costs of flash-based technology, including the amount of hardware required, power consumption, and cooling, are much lower than those of comparable disk-based solutions. Finally, there are intangible benefits of deploying a very high-performance, low-latency, and reliable NoSQL application, including customer and user loyalty and trust, competitive advantage, and lower operational costs.

Cisco UCS with Oracle NoSQL Database and Fusion ioDrive2 provides an enterprise-grade, highly reliable, highly scalable, high-performance, and low-latency solution for the most demanding big data applications today.
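For readers who want to check how close to linear the measured scaling is, the throughput figures in Table 1 can be run through a short calculation. The shard counts and throughput values below are copied from Table 1. The definition of scaling efficiency (measured throughput divided by the throughput expected if the 2-shard result scaled perfectly) is an assumption made for this sketch, not a metric defined in the paper.

```python
from typing import Dict

# Shard count -> mixed read/write throughput in operations per second (Table 1).
THROUGHPUT: Dict[int, int] = {
    2: 302_152,
    4: 558_569,
    8: 1_028_868,
    10: 1_244_550,
}


def per_shard_throughput(results: Dict[int, int]) -> Dict[int, float]:
    """Operations per second delivered by each shard on average."""
    return {shards: ops / shards for shards, ops in results.items()}


def scaling_efficiency(results: Dict[int, int],
                       baseline_shards: int = 2) -> Dict[int, float]:
    """Measured throughput relative to perfect linear scaling.

    Perfect scaling means throughput grows in proportion to shard count,
    starting from the baseline configuration (assumed: the 2-shard run).
    """
    base_per_shard = results[baseline_shards] / baseline_shards
    return {shards: ops / (base_per_shard * shards)
            for shards, ops in results.items()}


per_shard = per_shard_throughput(THROUGHPUT)
efficiency = scaling_efficiency(THROUGHPUT)

for shards in sorted(THROUGHPUT):
    print(f"{shards:>2} shards: {per_shard[shards]:>10,.0f} ops/sec per shard, "
          f"{efficiency[shards]:.0%} of linear")
```

Relative to the 2-shard run, the 4-, 8-, and 10-shard configurations come out at roughly 92, 85, and 82 percent of perfectly linear throughput, while total throughput grows about fourfold for a fivefold increase in shards.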