Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage



White Paper: A Benchmark Report, August 2011

Background

Objectivity/DB uses a powerful distributed processing architecture to manage localized, centralized, or distributed object databases. Panasas storage appliances accelerate time-to-results with highly reliable, massively scalable storage solutions for performance-driven applications in Government R&D, Oil & Gas, Manufacturing, Higher Education & Research, Life Sciences, CAE manufacturing, and Financial markets.

This white paper discusses the results of a benchmark run to demonstrate the performance scalability of Objectivity/DB when accessing files stored on a Panasas ActiveStor 12 scale-out NAS storage system. It covers the main features of Objectivity/DB and the Panasas equipment before explaining the benchmark software and the hardware configuration. It then presents and interprets the benchmark results. The benchmark demonstrated significant improvements in write and read speed over conventional disk storage.

Objectivity/DB Overview

Objectivity, Inc. is a leading provider of extremely large distributed database and data management technologies serving next-generation enterprise, government, web businesses, social networks, and data-intensive research in cloud, non-cloud, and embedded environments. The company's flagship product, Objectivity/DB, is primarily used in Extremely Large Databases in the DoD, DHS and DoE communities as well as in complex real-time process control and in telecom hardware and systems.

No Mapping Layer - Objects and the relationships between them are stored directly, rather than in the column and row format of relational databases. This avoids the expensive and inefficient mapping layers needed between object-oriented languages, such as C++, C# and Java, and the database storage mechanism. Objectivity/DB is extremely fast at navigating complex networks and tree structures as there is no relational join table processing.
Primary architectural features of Objectivity/DB include:

- A single logical view of distributed databases
- Simple, distributed server architecture
- Distributed Parallel Query Engine with replaceable components
- Predictable scalability

Single Logical View - Users see a single federated database that consists of local or distributed databases. The databases contain application-controllable clusters of objects, relationships, collections, and indices.

Simple, Distributed Server Architecture - Objectivity/DB is directly linked with client applications. It accesses simple, distributed servers that handle transaction management, remote data access, and fragments of parallel queries. It can be configured in standalone, networked, peer-to-peer, Service Oriented Architecture (SOA), and grid or cloud computing environments, and it takes care of all heterogeneity issues that arise because of differences in languages, file systems, operating systems, and hardware.

Distributed Parallel Query Engine - Clients can service queries locally or use a parallel query engine that distributes fragments of queries to simple query servers located close to the databases to be accessed. There are replaceable components for controlling how queries are split, executed and serviced. Query servers can execute filters to refine results before they are returned to the client. The query servers can be replaced with agents for accessing external data sources such as RDBMSs or the Web.

Predictable Scalability - The address space available within a single logical view of a distributed federation of databases covers billions of terabytes. Objectivity/DB customers have built some of the world's largest databases, including a petabyte+ system at Stanford Linear Accelerator Center. Objectivity/DB has been used in data fusion systems that ingest and correlate over one terabyte of data per hour with simultaneous deletion and queries.
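The Distributed Parallel Query Engine described above splits a query into fragments, runs each fragment on a query server close to the data, applies a filter there, and returns only the matches to the client. The following is a minimal sketch of that scatter-gather pattern. It does not use the Objectivity/DB API; the in-memory `DATABASES` dictionary, the record fields and the function names `split_query`, `query_server` and `parallel_query` are all hypothetical stand-ins chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for distributed databases: each one is a list of records.
DATABASES = {
    "db_east": [
        {"id": 1, "kind": "Node", "load": 0.90},
        {"id": 2, "kind": "Link", "load": 0.20},
    ],
    "db_west": [
        {"id": 3, "kind": "Node", "load": 0.40},
        {"id": 4, "kind": "Node", "load": 0.95},
    ],
}


def split_query(predicate, database_names):
    """Splitter component: produce one query fragment per database."""
    return [(name, predicate) for name in database_names]


def query_server(fragment):
    """Query server: run the filter next to the data and return only the matches."""
    name, predicate = fragment
    return [record for record in DATABASES[name] if predicate(record)]


def parallel_query(predicate):
    """Client side: scatter the fragments, then gather and merge the partial results."""
    fragments = split_query(predicate, sorted(DATABASES))
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partial_results = pool.map(query_server, fragments)  # order follows fragments
    return [record for part in partial_results for record in part]


hot_nodes = parallel_query(lambda r: r["kind"] == "Node" and r["load"] > 0.8)
print([r["id"] for r in hot_nodes])
```

Because the filter runs inside `query_server`, only matching records cross the boundary back to the client, which is the point of placing query servers near the databases. Swapping `split_query` or `query_server` for another implementation mirrors the replaceable components mentioned in the text.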

Panasas ActiveStor Overview

Panasas ActiveStor is the world's fastest parallel storage system, bringing plug-and-play simplicity to HPC storage deployments. Based on a fourth-generation storage blade architecture and the Panasas PanFS storage operating system, ActiveStor delivers unmatched parallel file system performance in addition to the scalability, manageability, reliability, and value required by demanding technical computing organizations in the energy, government, finance, manufacturing, bioscience, and other research sectors.

Primary architectural features include:

- A single logical view of data for simple administration and access to data.
- Easy-to-use, tightly integrated storage appliance.
- Distributed parallel file system and high-performance DirectFlow protocol.
- Object RAID on a storage blade hardware architecture, which provides the basis for scalable performance.

Attributes for superior scalability: Panasas parallel storage uses a single global namespace, allowing seamless manageability at massive scale. Panasas customers have successfully deployed large-capacity implementations, including a six-petabyte shared file system at Los Alamos National Laboratory.

Reliability: Panasas storage is known for its superior reliability and data availability. The patented object RAID architecture and tiered parity technology ensure no single point of failure and superior data recovery on RAID-protected data.

Benchmark Software

NetElem was selected to test the scalability of Objectivity/DB using Panasas storage because it generates huge amounts of data in a relatively short time. NetElem simulates typical telecommunications network entities (ports, cards, cables, etc.), which are efficiently represented and manipulated as objects. A telecommunications network has hundreds of thousands of such objects and therefore requires massive amounts of storage to store and manipulate them.
[Figure: Network Element Management System object class model. Classes shown include Object, Circuit, Node, Link, Position, Cabinet (or shelf), Card, Frame, Leg, Magazine, Space, Mdf, SwitchLine, DP, Switch and SubDp.]

The NetElem Suite

The object class model for the Network Element Management System application is shown above. Only the circled types are used in the benchmark. The benchmark has three phases:

- Create - Create the tree of network elements
- Traversal - Navigate the tree
- Scan - Scan one class of objects sequentially

NetElem Create Phase

In the first phase, NetElem creates 16 databases of 2GB each. Each one contains 86,482,160 objects of the following four types:

Object Type    Count
Circuits       6,144,000
Nodes          43,396,032
s              78,128
Links          36,864,000

The object instances are linked together using Objectivity/DB 1-to-many or many-to-many associations. The Links are a type of object, not a database relationship type.

NetElem Navigate Phase

The navigation function starts from a root node and navigates to all node type objects, using Objectivity/DB persistent relationships (associations). The number of objects visited during navigation is 43,396,032.

NetElem Scan Phase

The scan function scans the 16 databases for Circuit and objects. Scanning in this case is mostly sequential read, since the container is created once and never modified. The number of Circuit and objects read during the scan is 6,144,000 and 18,432,000, respectively.

Benchmark Hardware

Objectivity/DB (version 1.1) was configured as a standalone application, running on a multi-core, dual-socket server and accessing its database stored on Panasas ActiveStor 12. The Panasas storage equipment was configured with a single ActiveStor 12 storage shelf with one director blade and 10 storage blades running PanFS 4.0.1. Eight client nodes were used for the benchmarking, each with dual 2.4 GHz quad-core Intel Xeon processors and 24 GB of memory (reduced to 1GB to avoid local caching). The client OS was Red Hat Enterprise Linux 5, kernel 2.6.18.
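The three NetElem phases described above (create a tree of linked objects, navigate it from the root, then scan one class of objects sequentially) can be illustrated with a small in-memory sketch. This is not the NetElem source and does not use Objectivity/DB: the `Node` and `Circuit` classes, the `depth` and `fanout` parameters and the plain Python lists standing in for associations and containers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # stands in for a 1-to-many association


@dataclass
class Circuit:
    name: str
    endpoints: list = field(default_factory=list)  # the two nodes the circuit joins


def create(depth, fanout):
    """Create phase: build the node tree and fill a container of Circuit objects."""
    circuits = []  # written once here, never modified afterwards

    def build(name, level):
        node = Node(name)
        if level < depth:
            for i in range(fanout):
                child = build(f"{name}.{i}", level + 1)
                node.children.append(child)
                circuits.append(Circuit(f"{node.name}->{child.name}", [node, child]))
        return node

    return build("root", 0), circuits


def traverse(root):
    """Traversal phase: follow associations from the root and count nodes visited."""
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        stack.extend(node.children)
    return visited


def scan(container):
    """Scan phase: read every object of one class in storage order."""
    return sum(1 for _ in container)


root, circuits = create(depth=3, fanout=4)
print(traverse(root), scan(circuits))
```

The traversal hops from object to object through references, so its access pattern depends on how the tree is laid out, while the scan walks one container front to back. That difference is why the text describes the scan phase as mostly sequential read.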
Benchmark Results

Single Server Scaling

The first part of the benchmark measured the performance of an increasing number of processes running locally on one application server accessing data stored on a Panasas ActiveStor 12 storage system. The results showed that the time taken to create, traverse and scan the DB instances with ActiveStor 12 was significantly less than the time required when storing the data on conventional local storage (ActiveStor 12 was 7.8x faster for Ingest, 14x faster for Traversal and up to 39x faster for the Scan phases). This increased throughput is achieved by processing the data streams through PanFS, the Panasas parallel file system.

[Figure: Single Server with 8 Processes. Ingest, Traversal, ScanC and ScanP phases, Conventional Storage versus Panasas.]

Next, we examined the throughput (in MB/s) for each phase while increasing the number of processes from 1 to 8 with the large databases stored on the ActiveStor 12 scale-out NAS system. Note that the throughput increased almost linearly with the number of object databases, demonstrating the benefit of using ActiveStor 12 when processing multiple concurrent streams of data.
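The speedup factors and the "almost linear" throughput scaling quoted above rest on two simple ratios. The sketch below shows how such figures are typically derived. The timings and throughput values in it are hypothetical placeholders, not measurements from this benchmark, and the function names are illustrative.

```python
def speedup(baseline_seconds, improved_seconds):
    """How many times faster the improved run is than the baseline run."""
    return baseline_seconds / improved_seconds


def scaling_efficiency(throughput_by_processes):
    """Per-process throughput relative to the smallest process count.

    A value of 1.0 means perfectly linear scaling; lower values mean each
    added process contributes less than the first one did.
    """
    base_count = min(throughput_by_processes)
    base_rate = throughput_by_processes[base_count] / base_count
    return {
        count: (rate / count) / base_rate
        for count, rate in sorted(throughput_by_processes.items())
    }


# Hypothetical elapsed times (seconds) for one phase on each storage type.
conventional_seconds, panasas_seconds = 390.0, 50.0
print(f"{speedup(conventional_seconds, panasas_seconds):.1f}x faster")

# Hypothetical throughput (MB/s) measured at 1, 2, 4 and 8 processes.
throughput = {1: 100.0, 2: 195.0, 4: 380.0, 8: 720.0}
for count, efficiency in scaling_efficiency(throughput).items():
    print(f"{count} processes: {efficiency:.1%} of linear")
```

Reporting efficiency alongside raw throughput makes it easier to see where a curve starts to flatten, which a plot of MB/s alone can hide.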

[Figure: Single Server Throughput Scaling. Throughput versus number of processes (2 to 8) for Scan (s), Scan (Circuits), Ingest and Traversal.]

Multi-Server Scaling

The second part of the benchmark measured the effect of scaling the number of client nodes (each with two processes) accessing data on ActiveStor 12. What is particularly significant is that the throughput, especially the create (write) rate, increased linearly and did not reach a plateau with the number of processes and threads available.

[Figure: Multi Server Throughput Scaling. Throughput versus number of servers (2 to 8) for Scan (s), Scan (Circuits), Traversal and Ingest.]

Summary

The benchmark demonstrated the superior performance that can be achieved with Objectivity/DB using a Panasas scale-out NAS system, compared with local storage. Data ingest was 7.8 times faster, the traversal stage was 6.8x faster, and sequential scans were up to 39x faster with increasing numbers of processes. Furthermore, database I/O performance was demonstrated to scale linearly as the number of clients increased when using the Panasas PanFS file system. The combination of Objectivity/DB and Panasas ActiveStor 12 offers breakthrough performance for applications that manipulate or query large amounts of complex data. The benchmarked configuration is particularly attractive for NoSQL and other applications with extremely large storage requirements.

© 2011 Panasas Incorporated. All rights reserved. Panasas is a trademark of Panasas, Inc. in the United States and other countries.