ZBDB: Scaling Your Data to the Cloud
Technical Overview White Paper
Overview

ZBDB (Zettabyte Database) is a new, fully managed cloud data warehouse from SQream Technologies. Building on proven SQream DB technology, ZBDB lets customers analyze large data sets in an easy, no-hassle solution on a pay-as-you-go model.

SQream DB, the core of ZBDB, is an analytic database built from scratch to harness the performance of graphics processing units (GPUs) for handling petabyte-scale data. ZBDB runs on SoftLayer's bare metal service, which allows flexible machine configuration and delivers superior performance by eliminating unnecessary abstractions.

These properties translate into tangible gains: running 100 times more queries while lowering the TCO makes ZBDB an outstandingly valuable solution for organizations handling growing analytic workloads.

The ZBDB Advantage

With the boom in worldwide data creation, organizations need to make use of, and stay on top of, the data they collect. The dramatic growth in data volume challenges the way your organization stores immense volumes of structured and semi-structured data, analyzes it, and obtains rapid, real-time, actionable insights from it. Organizations with quickly scaling data need a high-performance solution that continues to perform well on multi-petabyte data sets and heavy workloads.

SQream DB, the power behind the ZBDB service, is designed to address these needs, with the following three main advantages:

Small Server Size

SQream DB is designed from the ground up to serve as a powerful database while requiring as little as a single standard tower server or a 2U rack-mount enclosure. Compared with a full 42U rack vendor-supplied enclosure such as Teradata, Oracle Exadata or IBM PureData System for Analytics (formerly Netezza), a single 2U server is capable of yielding equal or better query execution performance. As for costs, the savings in hardware, power, floor space, cooling and maintenance are enormous.
SQream DB is not limited to the 2U form factor and can scale to larger configurations supporting multiple GPUs.

Scale

The GPU is a Massively Parallel Processor (MPP) on a Card

The idea behind SQream's architecture is to harness the readily available power of thousands of parallel processing cores in a cost-effective GPU, to compete with and overtake standard and parallel DBMS solutions running on dozens of expensive general-purpose processors.
[Figure: a multi-core CPU (up to 32 cores) compared with a GPU (up to 2880 cores), shown with cache and RAM]

A 32-core CPU installation (latency-oriented) requires a lot of power and can cost thousands of dollars. A single throughput-oriented GPU, on the other hand, can have nearly 3000 onboard cores and gives superior performance at a significantly lower cost and with reduced power consumption. With up to 20 times more processing power per node than a general-purpose CPU, outstanding high-speed performance and scalability, and 90% less power consumption, the GPU is well suited to aggressive data operations.

This is how SQream DB benefits from the use of GPUs. While other clustered solutions might be massively parallel by scaling out, SQream DB is massively parallel on a card, with thousands of cores. Moreover, several GPUs can be linked together inside the same enclosure, which reduces both memory and network I/O, lowers network load and latency, and makes scaling easy.

Simplicity in Integration

With ZBDB, implementation could not be easier. Because ZBDB's underlying SQream DB uses familiar, standardized SQL syntax, data remodeling can be avoided and your DBAs do not need any new skills. Your employees will need only minor training and will not have to rewrite hundreds of queries. Even your third-party ETL and BI tools can be connected easily through industry-standard ODBC/JDBC interfaces. ZBDB has been tested to work with these popular ETL and BI tools: Pentaho, Talend, Informatica, DataStage, SSIS, QlikView, Spotfire, Tableau, Business Objects and even Excel.
Simplicity by Design

ZBDB uses SQream DB, a columnar database in which each column is stored as a collection of data chunks, each containing millions of values. SQream DB automatically creates smart metadata for each column and every data chunk. This smart metadata replaces the indexes used by most databases, eliminating the lengthy and limiting process of index creation while new data is ingested. The result is a smart grid for accessing any desired data on demand, at petabyte scale.

ZBDB's Architecture

[Figure: ZBDB architecture. Components shown: Connectors (JDBC, ODBC); ZBDB; SQL Parser; Optimizer; Resource Manager; CPU/GPU Execution graph; Runtime; I/O Manager; SQream Storage; Metadata; SoftLayer Storage]
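The smart metadata idea described under Simplicity by Design can be illustrated with a small sketch. This paper does not specify what the metadata contains, so the example assumes the simplest plausible form: a minimum and maximum value stored per chunk, which lets a range query skip chunks that cannot contain matching rows. The names `Chunk`, `build_chunks` and `count_in_range` are hypothetical and are not part of SQream DB.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """One chunk of a column plus its metadata (assumed here: min and max)."""
    values: List[int]
    min_value: int
    max_value: int


def build_chunks(column: List[int], chunk_size: int) -> List[Chunk]:
    """Split a column into fixed-size chunks and record min/max at ingestion."""
    chunks = []
    for start in range(0, len(column), chunk_size):
        part = column[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def count_in_range(chunks: List[Chunk], low: int, high: int) -> Tuple[int, int]:
    """Count values in [low, high]; return (matches, chunks actually scanned)."""
    matches = 0
    scanned = 0
    for chunk in chunks:
        # The metadata alone proves this chunk holds no match, so skip it.
        if chunk.max_value < low or chunk.min_value > high:
            continue
        scanned += 1
        matches += sum(1 for v in chunk.values if low <= v <= high)
    return matches, scanned


column = list(range(1, 13))          # values 1..12
chunks = build_chunks(column, 4)     # three chunks: 1-4, 5-8, 9-12
matches, scanned = count_in_range(chunks, 6, 7)
print(matches, scanned)              # 2 matches found by scanning 1 of 3 chunks
```

No index was created in this sketch: the metadata is a by-product of ingestion, and the query reads only the one chunk whose value range overlaps the predicate.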
Relational Algebra

SQream DB is built on relational algebra, first proposed by Edgar F. Codd of IBM Research in 1969. It is a powerful model grounded in set theory and used by many SQL engines. Its operations, such as filters and joins, are as fundamental to data processing as addition and multiplication are to arithmetic. Relational algebra is therefore not only well studied but also thoroughly battle-tested in real-world applications.

By transforming your SQL queries into highly parallelizable relational algebra, SQream DB can efficiently perform complex operations on the massively parallel GPU cores. These transformations are performed internally by the SQream DB compiler and require no user intervention.

Performance

Relational Algebra Optimizations

The SQream DB compiler does a lot of the heavy lifting. It processes the given SQL query (received through standard ODBC or JDBC connectors), creates an execution plan and then optimizes it. The result is an equivalent query that produces the same results but runs much faster. Because SQream DB works in a massively parallel environment, most of the optimizations involve combining repeated work and choosing alternative paths that reduce repeated processor and I/O operations.

GPU Parallelism

SQream DB's main processing power comes from the massively parallel NVIDIA GPU. The execution plan chosen by the compiler is optimized for the NVIDIA GPU, resulting in high-speed, real-time performance at scale. Using original patent-pending concepts, SQream DB's compiler and compressors reduce the amount of I/O and repeated operations before the data is even transferred to the GPU, resulting in a substantial speed advantage on complex queries.

Storage

SQream DB uses powerful and robust columnar storage, split into GPU-manageable chunks.
While some newer DBMS solutions are semi-columnar, SQream DB is fully columnar in both its storage and its query engine.

Vertical partitioning (columnar storage): This feature allows selective access to only the required subset of columns, reducing disk scan and memory I/O time compared with standard row storage. This seemingly straightforward concept is a large part of what enables SQream DB to operate so quickly.

Horizontal partitioning (extent storage): SQream DB automatically splits the storage horizontally into manageable chunks, enabling optimal use of the hardware resources and of the relatively small memory available in GPUs compared with CPU RAM.
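The two kinds of partitioning described above can be sketched in a few lines. This is an illustration only, not SQream DB's storage format: it runs in plain Python, the chunk size of three rows is chosen for readability (the paper speaks of millions of values per chunk), and the helper names `to_columnar` and `split_into_chunks` are hypothetical. The sample rows are the first three columns of the employee table used in this paper.

```python
# Row-oriented input: (emp_no, dept_id, hire_date)
rows = [
    (1, 1, "2012-01-01"),
    (2, 1, "2014-05-16"),
    (3, 1, "2014-01-22"),
    (4, 2, "2012-06-08"),
    (5, 2, "2013-04-25"),
    (6, 3, "2013-08-01"),
]
COLUMNS = ("emp_no", "dept_id", "hire_date")


def to_columnar(rows, names):
    """Vertical partitioning: store each column as its own list of values."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def split_into_chunks(values, chunk_size):
    """Horizontal partitioning: cut one column into fixed-size chunks."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


table = to_columnar(rows, COLUMNS)
chunked = {name: split_into_chunks(vals, 3) for name, vals in table.items()}

# A query that needs only dept_id reads only that column's chunks;
# emp_no and hire_date are never touched.
count_dept_1 = sum(chunk.count(1) for chunk in chunked["dept_id"])
print(chunked["dept_id"])   # [[1, 1, 1], [2, 2, 3]]
print(count_dept_1)         # 3
```

Each chunk is small enough to be processed independently, which is the property the paper relies on to fit work into the limited memory of a GPU.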
[Figure: a sample table, a horizontal partition holding its first three rows, and its first three columns stored vertically]

Emp_no  Dept_id  Hire_date   Emp_in   Dept_in
1       1        2012-01-01  Smith    John
2       1        2014-05-16  Johnson  Barbara
3       1        2014-01-22  Miller   Amanda
4       2        2012-06-08  Taylor   Evelyn
5       2        2013-04-25  Wilson   Bob
6       3        2013-08-01  Brown    Jim

Smart Metadata

Smart metadata is generated automatically, on the fly, for each chunk as data is ingested. It enables SQream DB to pinpoint immediately the exact data required by each query. In leading RDBMS solutions, DBAs need to set up indexing on at least a few columns. With SQream DB's smart metadata, the DBA does not need to perform data modeling or create indexes and primary keys, because these are handled automatically through the smart metadata during data ingestion. The result is a cutting-edge smart grid for accessing and querying any desired data on demand, at petabyte scale.

Smart metadata enables ultra-fast, sub-second responses to specific queries, such as SELECT COUNT or SELECT DISTINCT. It is used extensively in SQream DB and significantly reduces processing and I/O time by pinpointing the data chunks involved in each query.

SQream DB offers ultra-fast data ingestion. Processing is done on the GPU, leaving the CPU free to perform heavy I/O, which allows over 1 TB of ETL operations per hour per ZBDB instance.

Compression

By using cutting-edge but well-established compression algorithms specially tuned for fast operation, SQream DB reduces storage size on disk while still maintaining very fast queries. In fact, the compression algorithms are so fast that most hard drives will be the bottleneck of the compress/decompress process. Compression and decompression are performed on the fly on the GPU, 50 times faster than on a standard CPU.
In fact, it is so fast that SQream DB compresses and decompresses everything. Other leading databases compress only some of the data.
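To show why columnar data compresses well, here is a minimal sketch using run-length encoding. This paper does not name the compression algorithms SQream DB uses, so run-length encoding is only a stand-in chosen for its simplicity, and the sketch runs on the CPU in plain Python rather than on a GPU. The functions `rle_encode` and `rle_decode` are hypothetical helpers, and the input is the Dept_id column of the sample table.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original values."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


dept_id = [1, 1, 1, 2, 2, 3]
encoded = rle_encode(dept_id)
print(encoded)                        # [(1, 3), (2, 2), (3, 1)]
print(rle_decode(encoded) == dept_id) # True
```

Because a column holds values of one type that often repeat, six stored values shrink to three pairs here, and the round trip is lossless. Row storage interleaves unrelated values and offers far fewer such runs.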
Scaling

Linear scaling in performance

SQream DB's innovative design allows it to scale linearly with the size of the data, meaning that query time does not grow exponentially as your data grows linearly.

Scaling in storage

As easy as purchasing more storage! Just add more storage to your instance, and our algorithms take care of the rest.

Scaling in GPUs, not CPUs or nodes

Adding compute power is easy. You will not need to replace the entire server. Plug in additional NVIDIA GPU cards, and you are ready to go.

Interfaces and Integration

SQL Support

Since ZBDB builds on standardized SQL, integration is simple. ZBDB integrates easily into your existing systems by supporting both ODBC and JDBC connectors. This means your existing ETL and analytics tools and your own applications can stay, minimizing the time you need to get up and running.

Managed solution

Traditional data warehouses are complicated and take a significant investment of time, resources and hardware to get running. Not only that, but once you've made the investment, you have to hire expert DBAs to keep things running smoothly. ZBDB takes care of that for you, with backups and security built in.

Secure

Your instance of ZBDB is protected and guaranteed to be visible only to you. Data is transferred over secure TLS or an encrypted VPN, whichever you find more convenient.

Scalable

Because ZBDB is fully managed, it scales with your business. As your business and your data grow, just add more storage and we handle the rest for you.

(Super) Fast

ZBDB builds on the proven, award-winning technologies of SQream DB. By using columnar storage, fast on-the-fly GPU compression and the power of NVIDIA's GPUs, ZBDB gives you unparalleled performance without breaking a sweat. You have a supercomputer at your fingertips.

Support

All ZBDB instances come with access to our large knowledge base and guides to get you started.
Because ZBDB uses standardized SQL, you should be up and running within hours. If, however, you do get stuck, one of our support personnel will help you out, for free.
Summary

ZBDB uses columnar storage and massively parallel processing on a graphics processor to handle all of your data. ZBDB seamlessly distributes your data and queries over thousands of processor cores to deliver high-performance, high-throughput results, whether you have hundreds of gigabytes or hundreds of terabytes of data. ZBDB can deliver faster, more cost-effective Big Data analytics than other key market players. ZBDB is the only solution with predictable billing: no surprises, no hidden costs and no bandwidth bills.

For more information about ZBDB, visit www.sqream.com or call +972.3.544.4871.

Copyright 2015. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced in any form, for any purpose, without our prior written permission.