Scaling Your Data to the Cloud
ZBDB Technical Overview White Paper
Powered by SQream DB

Overview

ZBDB (Zettabyte Database) is a new, fully managed cloud data warehouse from SQream Technologies. Built on proven SQream DB technology, ZBDB lets customers analyze large data sets in an easy, no-hassle, pay-as-you-go service. At ZBDB's core is SQream DB, an analytic database built from scratch to harness the unique performance of graphics processing units (GPUs) for petabyte-scale data. ZBDB runs on SoftLayer's bare-metal service, which allows flexible machine configurations and superior performance by eliminating unnecessary abstraction layers. In tangible terms, running 100 times more queries while lowering TCO makes ZBDB an outstandingly valuable solution for organizations with growing analytic workloads.

The ZBDB Advantage

With the boom in worldwide data creation, organizations need to make use of, and stay on top of, the data they collect. The dramatic growth in data volume challenges the way your organization stores immense volumes of structured and semi-structured data, analyzes it, and obtains rapid, real-time, actionable insights from it. Organizations with quickly scaling data need a high-performance solution that continues to perform well against multi-petabyte data sets and heavy workloads. SQream DB, the engine behind the ZBDB service, is designed to address these needs, with three main advantages:

Small Server Size

SQream DB is designed from the ground up to serve as a powerful database while requiring as little as a single standard tower server or a 2U rack-mount enclosure. A single 2U server can yield query execution performance equal to or better than a full 42U vendor-supplied rack such as Teradata, Oracle Exadata, or IBM PureData System for Analytics (formerly Netezza). As for costs, the savings in hardware, power, floor space, cooling, and maintenance are enormous. SQream DB is not limited to the 2U form factor and can scale to larger configurations supporting multiple GPUs.

Scale: The GPU is a Massively Parallel Processor (MPP) on a Card

The idea behind SQream's architecture is to harness the readily available power of thousands of parallel processing cores in a cost-effective GPU to compete with, and overtake, standard and parallel DBMS solutions running on dozens of expensive general-purpose processors.

[Figure: side-by-side comparison of a multi-CPU node (up to 32 cores, cache, RAM) and a GPU node (up to 2,880 cores, cache, RAM)]

A 32-core, latency-oriented CPU installation requires a lot of power and can cost thousands of dollars. A single throughput-oriented GPU, on the other hand, can carry close to 3,000 onboard cores and delivers superior performance at a significantly lower cost and reduced power consumption. With up to 20 times more processing power per node than a general-purpose CPU, outstanding high-speed performance and scalability, and roughly 90% lower power consumption, the GPU is well suited to aggressive data operations. This is how SQream DB benefits from GPUs: while other clustered solutions may be massively parallel by scaling out, SQream DB is massively parallel on a card, with thousands of cores. Moreover, several GPUs can be linked inside the same enclosure, reducing memory and network I/O while decreasing network load and latency, and scaling with ease.

Simplicity in Integration

With ZBDB, implementation could not be easier. Because ZBDB's underlying SQream DB uses familiar, standardized SQL syntax, data remodeling is avoidable and your DBAs need no new skills. Your employees will need only minor training and will not have to rewrite hundreds of queries. Even your third-party ETL and BI tools connect easily via industry-standard ODBC/JDBC interfaces. ZBDB has been tested with these popular ETL and BI tools: Pentaho, Talend, Informatica, DataStage, SSIS, QlikView, Spotfire, Tableau, Business Objects, and even Excel.
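Because integration happens over standard SQL, the queries your tools already issue typically run unchanged. The sketch below uses Python's built-in sqlite3 purely as a stand-in engine to show the kind of unmodified ANSI SQL a BI tool would send; with an actual ZBDB instance, the connection would instead come from your ODBC/JDBC driver configuration, and the table and column names here are invented for illustration.

```python
import sqlite3

# Stand-in engine; with ZBDB you would open an ODBC/JDBC connection instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)])

# Plain ANSI SQL -- no vendor-specific rewrite needed.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 50.0), ('EMEA', 200.0)]
```

The point is the query text, not the engine: the same aggregate query runs against any SQL database reachable over ODBC/JDBC, which is why existing reports and dashboards carry over without rewrites.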

Simplicity by Design

ZBDB uses SQream DB, a columnar database in which each column is stored as a collection of data chunks, each containing millions of values. SQream DB automatically creates smart metadata on top of each column and every data chunk. This smart metadata replaces the conventional indexing used by most databases, eliminating the lengthy and limiting index-creation process during ingestion of new data. The result is a smart grid for accessing any desired data on demand, at petabyte scale.

ZBDB's Architecture

[Figure: ZBDB architecture - JDBC/ODBC connectors feed the SQL parser, optimizer, and resource manager; a CPU/GPU execution graph runs in the runtime; the I/O manager accesses SQream storage and metadata on SoftLayer storage]

Relational Algebra

SQream DB is built on relational algebra, first proposed by Edgar F. Codd of IBM Research in 1969. This powerful model, grounded in set theory, underlies many SQL engines. Its operations, such as filters and joins, are as fundamental to data processing as addition and multiplication are to arithmetic. Relational algebra is therefore not only well studied but comprehensively battle-tested in real-world applications. By transforming your relational SQL queries into clever, highly parallelizable relational algebra, SQream DB can efficiently perform complex operations on massively parallel GPU cores. These transformations are performed internally by the SQream DB compiler and require no user intervention.

Performance

Relational Algebra Optimizations

The SQream DB compiler does much of the heavy lifting. It processes the incoming SQL query (from standard ODBC or JDBC connectors), creates an execution plan, and then optimizes it. The result is an equivalent query that produces the same results but runs much faster. Because SQream DB works in a massively parallel environment, most optimizations involve combining repeated work and choosing alternative paths that reduce repeated processor and I/O operations.

GPU Parallelism

SQream DB's main processing power comes from the massively parallel NVIDIA GPU. The execution plan chosen by the compiler is uniquely suited to, and optimized for, the NVIDIA GPU, resulting in high-speed, real-time performance at scale. Using original patent-pending concepts, SQream DB's compiler and compressors reduce the amount of I/O and repeated operations before the data is even transferred to the GPU, yielding a substantial speed advantage on complex queries.

Storage

SQream DB uses powerful, robust columnar storage, split into GPU-manageable chunks. While some newer DBMS solutions are semi-columnar, SQream DB is fully columnar in both its storage and its query engine.

Vertical partitioning (columnar storage): allows selective access to only the required subset of columns, reducing disk-scan and memory I/O time compared with standard row storage. This seemingly straightforward concept is part of what enables SQream DB to operate so quickly.

Horizontal partitioning (extent storage): SQream automatically splits storage horizontally into manageable chunks, making optimal use of hardware resources and of the relatively small memory available in GPUs compared with CPU RAM.
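The kind of rewrite described under "Relational Algebra Optimizations" can be made concrete with a toy example. The relations and predicate below are invented, and a real optimizer works on full execution plans rather than Python lists, but the principle is the same: pushing a selection below a join produces an equivalent result while joining far fewer rows.

```python
# Toy relational algebra: relations are lists of dicts.
employees = [{"emp_no": i, "dept_id": i % 3 + 1} for i in range(1, 101)]
depts = [{"dept_id": d, "name": n}
         for d, n in [(1, "R&D"), (2, "Sales"), (3, "HR")]]

def select(rel, pred):
    """Sigma: keep only rows satisfying the predicate."""
    return [r for r in rel if pred(r)]

def join(left, right, key):
    """Natural join on a single key (naive nested loop for clarity)."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

pred = lambda r: r["dept_id"] == 1

# Naive plan: join everything, then filter.
naive = select(join(employees, depts, "dept_id"), pred)

# Optimized plan: filter first, then join -- same result, far less work.
optimized = join(select(employees, pred), depts, "dept_id")

assert naive == optimized  # equivalent queries, different cost
```

In the naive plan the join examines all 100 employees; in the optimized plan it examines only the 33 that survive the filter. Compounding such rewrites across a whole plan is where most of the compiler's speedup comes from.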

[Figure: an example employee table (Emp_no, Dept_id, Hire_date, and name columns) shown partitioned both horizontally into row chunks and vertically into per-column chunks]

Smart Metadata

Smart metadata is generated automatically, on the fly, for each chunk as data is ingested, and enables immediate pinpointing of exactly the data each query requires. In leading RDBMS solutions, DBAs must set up indexing on at least a few columns. With SQream DB's smart metadata, the DBA does not need to perform data modeling or create indexes and primary keys; these concerns are handled automatically through the smart metadata during ingestion. The result is a cutting-edge smart grid for accessing and querying any desired data on demand, at petabyte scale. Smart metadata enables ultra-fast, sub-second responses to specific queries such as SELECT COUNT or SELECT DISTINCT. It is used extensively throughout SQream DB and significantly reduces processing and I/O time by pinpointing only the data chunks involved in each query.

SQream DB also offers ultra-fast data ingestion. Processing is done on the GPU, leaving the CPU free to perform heavy I/O, which means over 1 TB of ETL operations per hour per ZBDB instance.

Compression

By using cutting-edge yet well-established compression algorithms specially tuned for fast operation, SQream DB reduces on-disk storage size while maintaining blazing-fast queries. The compression algorithms are so fast that the hard drives themselves are usually the bottleneck of the compress/decompress process. Compression and decompression are performed on the fly on the GPU, 50 times faster than on a standard CPU. In fact, this is so fast that SQream DB compresses and decompresses everything, whereas other leading databases compress only some of the data.
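As an illustration of the lightweight, speed-oriented compression described above, run-length encoding is a classic example of a cheap columnar scheme; it is used here only as a stand-in, since SQream DB's actual algorithms are proprietary and GPU-tuned. A low-cardinality column collapses dramatically while remaining trivial to decompress.

```python
def rle_compress(values):
    """Run-length encode a column chunk into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([v, 1])    # start a new run
    return runs

def rle_decompress(runs):
    """Expand runs back into the original column values."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

column = ["EMEA"] * 5000 + ["APAC"] * 3000 + ["AMER"] * 2000
runs = rle_compress(column)        # 10,000 values collapse into 3 runs
assert rle_decompress(runs) == column
```

Columnar layouts make such schemes effective because values of one column sit together and tend to repeat, and because the encode/decode loops are simple enough to run at near-I/O speed, which is why the disks, not the codec, become the bottleneck.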

Scaling

Linear scaling in performance: SQream DB's innovative design allows it to scale linearly with the size of the data, so query times grow in proportion to your data rather than exponentially.

Scaling in storage: as easy as purchasing more storage. Just add more storage to your instance; our algorithms take care of the rest.

Scaling in GPUs, not CPUs or nodes: additional compute power is easy. You will not need to replace the entire server; plug in additional NVIDIA GPU cards and you are ready to go.

Interfaces and Integration

SQL Support: because ZBDB builds on standardized SQL, integration is simple. ZBDB fits into your existing systems through both ODBC and JDBC connectors, so your existing ETL tools, analytics tools, and in-house applications can stay in place, minimizing the time you need to get up and running.

Managed solution: traditional data warehouses are complicated and take a significant investment of time, resources, and hardware to get running. Even then, you have to hire expert DBAs to keep things running smoothly. ZBDB takes care of that for you, with backups and security built in.

Secure: your instance of ZBDB is protected and guaranteed to be visible only to you. Data is transferred over secure TLS or an encrypted VPN, whichever you prefer.

Scalable: because ZBDB is fully managed, it scales with your business. As your business and its data grow, just add more storage and we handle the rest for you.

(Super) fast: ZBDB builds on the proven, award-winning technology of SQream DB. With columnar storage, on-the-fly GPU compression, and the power of NVIDIA GPUs, ZBDB gives you unparalleled performance without breaking a sweat. You have a supercomputer at your fingertips.

Support: all ZBDB instances come with access to our extensive knowledge base and getting-started guides. Because ZBDB uses standardized SQL, you should be up and running within hours. If you do get stuck, one of our support personnel will help you out, free of charge.

Summary

ZBDB uses columnar storage and massively parallel processing on graphics processors to handle all of your data. ZBDB seamlessly distributes your data and queries over thousands of processing cores to deliver high-performance, high-throughput results, whether you have hundreds of gigabytes or hundreds of terabytes of data. ZBDB delivers faster, more cost-effective big data analytics than other key market players, and it is the only solution with predictable billing: no surprises, no hidden costs, and no bandwidth bills.

For more information about ZBDB, visit www.sqream.com or call +972.3.544.4871.

Copyright 2015. All rights reserved. This document is provided for information purposes only, and its contents are subject to change without notice. This document is not warranted to be error-free, nor is it subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced in any form, for any purpose, without our prior written permission.