White Paper - GPU-Based SQL Database. SQream Technologies. SQream DB GPU-Based SQL Database Technical Overview White Paper




SQream Technologies SQream DB GPU-Based SQL Database Technical Overview White Paper

Overview

SQream DB is an analytic database built from scratch to harness the performance of graphics processing units (GPUs) for handling petabyte-scale data, yielding significant savings in time and resources for its users. SQream DB's cost-effective solution provides enterprises with significant added value, empowering BI teams, data scientists, engineers and even marketing teams with new possibilities in big data analytics.

Running on one or more NVIDIA GPUs, SQream DB is capable of processing enormous data sets up to 100 times faster than other leading data warehouse solutions available today. It integrates easily with existing tools and relational SQL queries, boosting productivity while reducing infrastructure and operating costs. In tangible terms, running 100 times more queries while lowering the TCO makes SQream DB a valuable asset to any organization handling big data analytic workloads.

The SQream Advantage

With data creation exploding worldwide, organizations need to make use of, and stay on top of, the data they collect. They face a serious challenge in storing immense volumes of structured and semi-structured data, analyzing it and obtaining rapid, real-time, actionable insights from it. Entities with quickly scaling data need a high-performance solution that will continue to perform well when addressing multi-petabyte data sets and heavy workloads. SQream DB is designed to address such needs, with the following four main advantages:

Small Server Size

SQream DB is designed from the ground up to serve as a powerful database while requiring as little as a single standard tower server or a 2U rack-mount enclosure.
Compared with a full 42U rack vendor-supplied enclosure such as those from Teradata, Oracle Exadata and IBM PureData System for Analytics (formerly Netezza), a single 2U server is capable of yielding equal or better query execution performance. As for costs, the savings in hardware, power, floor space, cooling and maintenance are enormous. SQream DB is not limited to the 2U form factor and can scale to larger configurations supporting multiple GPUs.

Scale

GPU is a Massively Parallel Processor (MPP) on a Card

The idea behind SQream's architecture is to harness the readily available power of thousands of parallel processing cores in a cost-effective GPU, in order to compete with and overtake standard and parallel DBMS solutions running on dozens of expensive general-purpose processors.

[Figure: a multi-core CPU (up to 32 cores, with cache and RAM) compared with a GPU (up to 2880 cores, with RAM)]

A 32-core CPU installation (latency-oriented) requires a lot of power and can cost thousands of dollars. A single throughput-oriented GPU, on the other hand, can have nearly 3,000 onboard cores, delivering superior performance at a significantly lower cost and with 90% lower power consumption. With up to 20 times more processing power per node, suitable for aggressive data operations, and outstanding speed and scalability, it is easy to see how SQream DB benefits from the use of GPUs.

While other clustered solutions may be massively parallel through scaling out across computers, SQream DB is massively parallel through the thousands of cores on board the GPU. Moreover, several GPUs can be linked together inside the same enclosure, reducing both memory and network I/O while decreasing network load and latency.

Simplicity in Integration

With SQream DB, implementation could not be easier. SQream DB uses the familiar ANSI SQL syntax, meaning there is no need for any data remodeling, and no new skills need to be acquired. Employees don't need retraining and do not have to rewrite hundreds of queries. Even third-party ETL and BI tools can easily be connected and used via industry-standard ODBC/JDBC interfaces, without hiring integration specialists.

[At the time of writing this paper, SQream DB was tested to work with the following ETL and BI tools: Pentaho, Talend, Informatica, DataStage, SSIS, QlikView, Spotfire, Tableau, Business Objects and even Excel.]

Simplicity by Design

SQream DB is a columnar database, in which each column is stored as a collection of data chunks, each containing millions of values. SQream DB automates the creation of smart metadata on top of each column and every data chunk. This smart metadata replaces the common indexing used by most databases, thus eliminating the lengthy and limiting process of index creation while ingesting new data. The result is a smart grid for accessing any desired data on demand, at petabyte scale.

SQream Database Architecture

[Figure: SQream DB architecture, top to bottom]
Connectors: JDBC, .NET, ODBC
SQream Server: SQL Parser; Optimizer; Resource Manager; CPU/GPU Execution graph; Runtime; I/O Manager
SQream Storage: Metadata; ext4/NTFS
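To make the smart-metadata idea from the Simplicity by Design section more concrete, here is a minimal sketch in Python. The paper does not say what the metadata actually contains, so the per-chunk minimum, maximum and row count used here are an assumption, and every name (ChunkMeta, chunks_to_scan and so on) is hypothetical. The point is only to show how a small summary per chunk can stand in for an index by ruling out chunks that cannot match a filter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkMeta:
    """Hypothetical per-chunk summary; the real metadata fields are not documented in the paper."""
    min_value: int
    max_value: int
    row_count: int


def split_into_chunks(values: List[int], chunk_size: int) -> List[List[int]]:
    """Split one column into fixed-size chunks (the last chunk may be shorter)."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def build_metadata(chunks: List[List[int]]) -> List[ChunkMeta]:
    """Compute the summary for each chunk, as would happen once at ingestion time."""
    return [ChunkMeta(min(c), max(c), len(c)) for c in chunks]


def chunks_to_scan(metadata: List[ChunkMeta], low: int, high: int) -> List[int]:
    """Return indices of chunks whose [min, max] range overlaps the filter [low, high]."""
    return [i for i, m in enumerate(metadata)
            if m.max_value >= low and m.min_value <= high]


def count_in_range(chunks: List[List[int]], metadata: List[ChunkMeta],
                   low: int, high: int) -> int:
    """Count matching values while reading only the chunks the metadata cannot rule out."""
    total = 0
    for i in chunks_to_scan(metadata, low, high):
        total += sum(1 for v in chunks[i] if low <= v <= high)
    return total


# Illustrative data: hire years for nine employees, three values per chunk.
hire_years = [2012, 2012, 2012, 2013, 2013, 2013, 2014, 2014, 2014]
chunks = split_into_chunks(hire_years, 3)
metadata = build_metadata(chunks)
```

In this sketch a filter on 2014 only needs to read the last chunk, and a plain row count can be answered from the row_count fields without reading any data at all. Real chunks hold millions of values, so skipping even one saves substantial I/O.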

Relational Algebra

SQream DB utilizes a concept called relational algebra, first proposed by Edgar F. Codd of IBM Research in 1969. This powerful model is grounded in set theory and is used by many SQL engines. Its operations, such as filters and joins, are such strong concepts that they are comparable to mathematical basics like addition and multiplication. Relational algebra is therefore not only well studied, but also comprehensively battle-tested in real-world applications.

By transforming relational SQL queries into highly parallelizable relational algebra, SQream DB can efficiently perform complex operations on the massively parallel GPU cores. These operations are performed internally by the SQream DB compiler and require no user intervention.

Performance

Relational Algebra Optimizations

The SQream DB compiler does a lot of the heavy lifting. The compiler processes the given SQL query (from standard ODBC or JDBC connectors), creates an execution plan and then optimizes it. The result is an equivalent query that produces the same results, but runs a lot faster. Because SQream DB works in a massively parallel environment, most of the optimizations involve combining repeated work and choosing alternative paths that reduce repeated processor and I/O operations.

GPU Parallelism

SQream DB's main processing power comes from the massively parallel NVIDIA GPU. The execution plan that the compiler chooses is optimized specifically for the NVIDIA GPU, resulting in high-speed, real-time performance at scale. By using original patent-pending concepts, SQream DB's compiler and compressors are able to reduce the amount of I/O and repeated operations before the data is even transferred to the GPU, resulting in a substantial speed advantage with complex queries.

Storage

SQream DB utilizes powerful and robust columnar storage, split up into GPU-manageable chunks.
While some newer DBMS solutions are semi-columnar, SQream DB is fully columnar, including both the storage and the query engine.

Vertical partitioning - columnar storage
This feature allows selective access to the required subset of columns, reducing disk scan and memory I/O time compared with standard row storage. This seemingly straightforward concept enables SQream DB to operate very quickly.

Horizontal partitioning - extent storage
SQream DB automatically splits the storage horizontally into manageable chunks, enabling optimal use of the hardware resources and of the relatively small memory available in GPUs compared with CPU RAM.
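The two kinds of partitioning can be sketched in a few lines of Python. This is an illustration only, not SQream DB's storage format: the function names and the chunk size of three rows are assumptions chosen to keep the example small, and the sample rows are simple employee records.

```python
from typing import Dict, List, Sequence, Tuple

# Sample employee rows: (emp_no, dept_id, hire_date).
rows: List[Tuple[int, int, str]] = [
    (1, 1, "2012-01-01"),
    (2, 1, "2014-05-16"),
    (3, 1, "2014-01-22"),
    (4, 2, "2012-06-08"),
    (5, 2, "2013-04-25"),
    (6, 3, "2013-08-01"),
]
COLUMNS = ("emp_no", "dept_id", "hire_date")


def to_columns(table: Sequence[tuple], names: Sequence[str]) -> Dict[str, list]:
    """Vertical partitioning: store each column as its own list of values."""
    return {name: [row[i] for row in table] for i, name in enumerate(names)}


def to_chunks(column: list, chunk_rows: int) -> List[list]:
    """Horizontal partitioning: cut one column into chunks of at most chunk_rows values."""
    return [column[i:i + chunk_rows] for i in range(0, len(column), chunk_rows)]


def read_columns(store: Dict[str, List[list]], wanted: Sequence[str]) -> Dict[str, List[list]]:
    """Fetch only the columns a query refers to; the other columns are never touched."""
    return {name: store[name] for name in wanted}


columns = to_columns(rows, COLUMNS)
store = {name: to_chunks(values, 3) for name, values in columns.items()}

# A query that groups by department needs only dept_id, not names or dates.
selected = read_columns(store, ["dept_id"])
```

Each chunk is small enough to be handed to the GPU on its own, which is the motivation the paper gives for splitting horizontally as well as by column.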

Emp_no  Dept_id  Hire_date   Emp_in   Dept_in
1       1        2012-01-01  Smith    John
2       1        2014-05-16  Johnson  Barbara
3       1        2014-01-22  Miller   Amanda
4       2        2012-06-08  Taylor   Evelyn
5       2        2013-04-25  Wilson   Bob
6       3        2013-08-01  Brown    Jim

Horizontal partitioning (a chunk holding the first three rows):

1       1        2012-01-01  Smith    John
2       1        2014-05-16  Johnson  Barbara
3       1        2014-01-22  Miller   Amanda

Vertical partitioning (each column stored separately; first five rows shown):

Emp_no:     1           2           3           4           5
Dept_id:    1           1           1           2           2
Hire_date:  2012-01-01  2014-05-16  2014-01-22  2012-06-08  2013-04-25

Smart Metadata

Smart metadata is automatically generated on the fly for each chunk while data is ingested. It enables immediate pinpointing of the exact data required for each query. With leading RDBMS solutions, DBAs need to set up indexing on at least a few columns. With SQream DB's smart metadata, DBAs do not need to perform any data modeling or create indexes or primary keys, as these needs are handled automatically through the smart metadata during data ingestion. The result is a smart grid for accessing and querying any desired data on demand, at petabyte scale.

Smart metadata enables ultra-fast, sub-second responses to specific queries, such as SELECT COUNT or SELECT DISTINCT. SQream DB uses the smart metadata extensively, saving significant processing and I/O time by pinpointing the data chunks that are involved in processing each query.

SQream DB offers ultra-fast data ingestion. Processing is done on the GPU, leaving the CPU free to perform heavy I/O. Thus, up to 2 TB of ETL data may be ingested by the server each hour, even with a basic configuration consisting of a single GPU card.

Compression

By utilizing cutting-edge but well-established compression algorithms specially tuned for fast operations, SQream DB reduces disk storage size while still maintaining very fast queries. In fact, the compression algorithms are so fast that most hard drives will be the bottleneck of the compress/decompress process.
SQream's compression and decompression are performed on the fly on the GPU, 50 times faster than on a standard CPU. This is fast enough that SQream DB compresses and decompresses everything, whereas other leading databases compress only some of the data.
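The paper does not name the compression algorithms SQream DB uses. As a purely illustrative stand-in, the sketch below shows run-length encoding, a simple and well-known technique that works well on columnar data because values in one column often repeat. It runs on the CPU in plain Python; it is not SQream's implementation, and the function names are hypothetical.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def rle_encode(values: List[T]) -> List[Tuple[T, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[T, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same value as the previous one: extend the current run.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            # A new value starts a new run of length one.
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[T, int]]) -> List[T]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[T] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


# A department-id column: six values collapse into three runs.
dept_id = [1, 1, 1, 2, 2, 3]
encoded = rle_encode(dept_id)
```

Because a column holds values of a single type and often of low variety, column-at-a-time compression of this kind tends to be more effective than compressing whole rows, which is one reason columnar stores can afford to compress all of their data.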

Scaling

Linear scaling in performance
In other DBMSs, performance decreases as data volume increases beyond a certain threshold. SQream DB's technology, by contrast, allows for steady performance regardless of the data scale.

Scaling in storage
Storage may be enlarged easily by adding more drives to the server. SQream DB's algorithms handle the rest. Since SQream DB is throughput-intensive, it is optimized for multi-terabyte conventional hard drives and basic SSDs.

Scaling in GPUs, not CPUs or nodes
Adding compute power is simple. There is no need to replace the entire server, only to plug in additional NVIDIA GPU cards.

Interfaces and Integration

SQL Support

SQream DB supports the pure ANSI SQL language. Procedural SQL extensions for stored procedures, such as Microsoft T-SQL and Oracle PL/SQL, are not supported. SQream DB integrates easily into existing systems by supporting both ODBC and JDBC connectors. This means existing ETL and analytics tools and custom-developed applications can stay in place, minimizing the time needed to get up and running with SQream DB.

SQream DB may be introduced on its own, as a standalone petabyte-scale database, to meet all analytic needs. However, there is no need to throw away existing solutions. Instead of upgrading current solutions by procuring additional hardware that scales non-linearly, organizations may plug in SQream DB as a secondary database solution, creating an on/offloading system and getting more out of existing investments.

IT Monitoring

SQream DB runs on standard hardware and can easily integrate with any control and monitoring software already in use for tracking Linux-based machines.

Logging

SQream DB contains a built-in logger that tracks critical server information, enabling IT and security teams to gain insights from the server's operations, from failed login attempts and CPU time spent per query to read-write cycles and memory utilization.

Security

SQream DB offers username/password authentication at levels ranging from the cluster (multi-database) all the way down to the individual table.

Backup and Restore Operations

SQream DB offers backup and restore operations either via SQL statements or directly from the file system. The latter means that SQream DB can be backed up and restored using any external storage system (Data Replication Manager).

High Availability Configuration

Multiple SQream DB servers may be connected to a single external storage system, with only one server active at any point in time and the others passive. When the active server fails, a passive server mounts the shared storage and continues to respond to queries, without any data loss. [Active/Active and automatic fail-over are planned for the next release.]

Alternatively, SQream DB can run in a stand-alone cluster topology, in which two servers, each with identical internal direct-attached storage, are both active. The first server ingests new data and serves queries while continuously updating the second. If the first server fails, the second seamlessly takes control, with no downtime or data loss.

[Figure: active and passive servers attached to shared storage]

SQream vs. Other Big Data Solutions

Organizations may be considering a trendy new cluster or NoSQL solution. These are excellent for specific implementations, but they require experienced DBAs and new application development skills. Compared with these requirements, the painless and hassle-free integration of SQream DB is a clear benefit.

Summary

SQream DB delivers up to 100 times faster big data analytics compared with other key market players, while using a significantly smaller hardware footprint. SQream DB is the only solution that is truly capable of dealing with data at massive and escalating magnitudes (petabyte scale and hundreds of billions of rows), and of doing so with relative ease and at extraordinary value.

SQream DB opens up new opportunities for organizations to do much more with their data, in ways relevant to their unique business use cases. Petabyte-scale data insights over hundreds of billions of entries are now within reach. Organizations may integrate SQream DB as a standalone database solution or as a complementary analytics database, maximizing existing core IT investments.

The SQream DB hardware architecture enables significant cost savings through the use of GPUs and their massively parallel abilities instead of clustered servers and nodes, thus saving on hardware, infrastructure, utilities and maintenance costs.

The integration of SQream DB is extremely straightforward. It requires no massive rewrites of SQL queries and no additional skills, and the database plugs easily into the existing ecosystem, requiring little to no transition time and no investment in training.

All of the above translates into substantial gains for the organization by enabling it to run two orders of magnitude more queries, unlocking the critical business intelligence and information hidden in the big data the organization has collected. SQream DB gives organizations a leading edge while significantly reducing their hardware and operating costs.

For more information about SQream DB, visit www.sqream.com or call +972.3.544.4871.

Copyright 2010. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice.
This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced in any form, for any purpose, without our prior written permission.