White Paper. PiLab Technology Performance Test, Deep Queries. pilab.pl



Similar documents
Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Deep Dive: Maximizing EC2 & EBS Performance

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

1 Storage Devices Summary

Introduction to SQL for Data Scientists

InfoScale Storage & Media Server Workloads

SAP HANA. SAP HANA Performance Efficient Speed and Scale-Out for Real-Time Business Intelligence

How To Make A Backup System More Efficient

Innovative technology for big data analytics

White Paper February IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

White Paper. Optimizing the Performance Of MySQL Cluster

MS SQL Performance (Tuning) Best Practices:

Oracle Database In-Memory The Next Big Thing

Using HP StoreOnce D2D systems for Microsoft SQL Server backups

Database Administration with MySQL

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com

Virtuoso and Database Scalability

Performance and scalability of a large OLTP workload

Vectorwise 3.0 Fast Answers from Hadoop. Technical white paper

HP ProLiant DL580 Gen8 and HP LE PCIe Workload WHITE PAPER Accelerator 90TB Microsoft SQL Server Data Warehouse Fast Track Reference Architecture

EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen

Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem

Database Design Patterns. Winter Lecture 24

InfiniteGraph: The Distributed Graph Database

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

Introduction to Microsoft Jet SQL

EMC/Greenplum Driving the Future of Data Warehousing and Analytics

Benchmarking Cassandra on Violin

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

Relational Database: Additional Operations on Relations; SQL

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

BI 4.1 Quick Start Java User s Guide

Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

A Novel Data Placement Model for Highly-Available Storage Systems

Inge Os Sales Consulting Manager Oracle Norway

Product Brief: XenData X2500 LTO-6 Digital Video Archive System

Designing a Dimensional Model

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

Technical Paper. Performance of SAS In-Memory Statistics for Hadoop. A Benchmark Study. Allison Jennifer Ames Xiangxiang Meng Wayne Thompson

How To Create A Table In Sql (Ahem)

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

SQL Server 2012 Parallel Data Warehouse. Solution Brief

Deliverable Billion Triple dataset hosted on the LOD2 Knowledge Store Cluster. LOD2 Creating Knowledge out of Interlinked Data

Data Deduplication and Corporate PC Backup

SQL Server 2012 Performance White Paper

Creating BI solutions with BISM Tabular. Written By: Dan Clark

2009 Oracle Corporation 1

Optimizing Your Data Warehouse Design for Superior Performance

SQL Server Business Intelligence on HP ProLiant DL785 Server

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD METAMARKETS

Flash Storage Roles & Opportunities. L.A. Hoffman/Ed Delgado CIO & Senior Storage Engineer Goodwin Procter L.L.P.

Microsoft SQL Server 2000 Index Defragmentation Best Practices

Safe Harbor Statement

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

Unprecedented Performance and Scalability Demonstrated For Meter Data Management:

hmetrix Revolutionizing Healthcare Analytics with Vertica & Tableau

Brocade and EMC Solution for Microsoft Hyper-V and SharePoint Clusters

The MAX5 Advantage: Clients Benefit running Microsoft SQL Server Data Warehouse (Workloads) on IBM BladeCenter HX5 with IBM MAX5.

Database Performance Report Using PivotalVRP

Microsoft SQL Server OLTP Best Practice

Software-defined Storage Architecture for Analytics Computing

BI4Dynamics provides rich business intelligence capabilities to companies of all sizes and industries. From the first day on you can analyse your

Maximum performance, minimal risk for data warehousing

Workflow performance analysis tests

Queuing Theory. Long Term Averages. Assumptions. Interesting Values. Queuing Model

<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database

White Paper. Recording Server Virtualization

Parallel Data Preparation with the DS2 Programming Language

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

UCSF Academic Research Systems

The Vertica Analytic Database Technical Overview White Paper. A DBMS Architecture Optimized for Next-Generation Data Warehousing

Scaling Database Performance in Azure

PureSystems: Changing The Economics And Experience Of IT

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

EonStor DS remote replication feature guide

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

WHITE PAPER Optimizing Virtual Platform Disk Performance

Tackling The Challenges of Big Data. Tackling The Challenges of Big Data Big Data Systems. Security is a Negative Goal. Nickolai Zeldovich

Oracle Big Data SQL Technical Update

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database

RevoScaleR Speed and Scalability

Beyond Conventional Data Warehousing. Florian Waas Greenplum Inc.

Physical Design. Meeting the needs of the users is the gold standard against which we measure our success in creating a database.

Binary search tree with SIMD bandwidth optimization using SSE

Users are Complaining that the System is Slow What Should I Do Now? Part 1

When a variable is assigned as a Process Initialization variable its value is provided at the beginning of the process.

Transcription:

White Paper PiLab Technology Performance Test, Deep Queries pilab.pl

Table of Content Forward... 3 Summary of Results... 3 Technical Introduction... 3 Hardware... 4 Disk read/write speed... 4 Network speed test... 5 Software... 5 Database Size... 6 Test Results... 6 2

Forward PiLab is developing various technologies that could potentially be used in future PiLab Business Intelligence products. In this paper, we examine benchmark results for performing complex database operations as may be required in order to support an advanced Business Intelligence (BI) tool. Useful capabilities of an ideal next-generation BI product could include the ability to enable users to do ad-hoc testing of hypotheses (i.e., ask any question they want), to quickly browse through data, and to have more users simultaneously do more complex analyses. Enabling such capabilities in a Business Intelligence tool would require that in the background the BI system could quickly perform highly complex database operations that are far beyond the capabilities of today s tools. In this paper, we review performance benchmark results achieved with PiLab software technology for performing such complex database operations. Summary of Results PiLab software technology was shown to be able to execute queries with 600 database joins within the same resource set where traditional approaches would result in failure due to lack of resources after only nine joins. This represents a dramatic improvement of >60X in the ability to do deep, complex queries that could be very useful in enabling breakthrough capabilities in a next-generation BI tool. Technical Introduction The objective of this analysis was to measure the relative advantage PiLab software provides when doing deep queries. Before reviewing the complete results, we will first describe the test environment. 3

Hardware Name Quantity Additional information E6000H Integrated Frame (include Data Mangerment Board) 1 MM620 Shelf Management Module 2 BH622 V2 Sandy- Bridge Romley EP Server 4 BH640 V2 Sandy- Bridge Romley EP 4S Server 4 HardDisk-300GB-SAS- 10000RPM-2.5-16M 16 Disk read/write speed [root@master ~]# gpcheckperf -f /tmp/hostfile_exkeys -d /home/primary -d /home/mirror -r ds /usr/local/greenplum-db/./bin/gpcheckperf -f /tmp/hostfile_exkeys -d /home/primary -d /home/mirror -r ds disk write avg time (sec): 485.42 disk write tot bytes: 945169498112 disk write tot bandwidth (MB/s): 1857.61 disk write min bandwidth (MB/s): 0.00 [ seg5] disk write max bandwidth (MB/s): 273.91 [ seg4] disk read avg time (sec): 519.42 disk read tot bytes: 1080193712128 disk read tot bandwidth (MB/s): 2007.81 4

disk read min bandwidth (MB/s): 223.24 [ seg3] disk read max bandwidth (MB/s): 300.63 [ seg5] stream tot bandwidth (MB/s): 44376.52 stream min bandwidth (MB/s): 5392.65 [ seg4] stream max bandwidth (MB/s): 5723.33 [master] Network speed test [root@master ~]# gpcheckperf -f /tmp/hostfile_exkeys -r N -d /tmp /usr/local/greenplum-db/./bin/gpcheckperf -f /tmp/hostfile_exkeys -r N -d /tmp Netperf bisection bandwidth test master -> seg1 = 112.280000 seg2 -> seg3 = 112.270000 seg4 -> seg5 = 112.280000 seg6 -> seg7 = 112.280000 seg1 -> master = 112.280000 seg3 -> seg2 = 112.290000 seg5 -> seg4 = 112.290000 seg7 -> seg6 = 112.310000 Summary: sum = 898.28 MB/sec min = 112.27 MB/sec max = 112.31 MB/sec avg = 112.28 MB/sec median = 112.28 MB/sec Software Operating system: CentOS 6.4 Database system: Pivotal GreenPlum DB 4.3 with master node on BH640 server 5

Database Size The database was filled with data using DataFiller software: https://www.cri.ensmp.fr/people/coelho/datafiller.html The size of the database was: 1.3TB. First table: Rows: 110M Values: 5 integer columns all filed with random data Second table Rows: 180 Values: 5 integer columns, 4 varchar(4000) columns all filled with random data Third table Rows: 10 Values: 5 integer columns, 4 varchar(4000) columns all filled with random data Forth table Rows: 1.7G with 5% of empty values Values: all values were 12 ([a-za-z0-9]) chars (varchar2(4000)) strings generated randomly based on natural distribution Fifth table Rows: 500M Values: Values: 5 integer columns all filed with random data Test Results The test was to query the system while incrementing the depth of the query based on the inner joins and left joins statements. The specifications for the queries were generated based on sample reports as used in customer environments. The depth of the query is crucial, especially in large and complex database systems that need to join many different database structures to get the result. In the test environment, traditional SQL querying was inefficient, and the time and resources consumed by the system was growing exponentially as the number of joined structures increased. For this workload, the system was unable to do more than eight joins. Attempting to join nine structures resulted in the system declaring insufficient resources after 3.5 hours of processing, and attempting to do any further SQL querying did not generate any results. Patent pending technology created by PiLab enables spreading the overhead of complex queries into small pieces, which could enable a BI system residing on that database to do infinite drilling into the logical structure of the data. For the same test referenced above, as shown below PiLab software was able to: Execute the same joins dramatically faster Join up to 600 structures. 6

Time in milis pilab.pl 10000000 9000000 Depth traversing comparison 8000000 7000000 6000000 5000000 4000000 3000000 2000000 1000000 Depth 0 PILAB jumps Standard Join 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90 100 200 300 400 500 600 Figure 1: Traditional SQL querying execution times increased exponentially and a maximum of eight joins could be performed. PiLab software enabled far faster queries, with linear execution time through 600 queries. 7