www.quilogic.com SQL/XML-IMDBg GPU boosted In-Memory Database for ultra fast data management Harald Frick CEO QuiLogic In-Memory DB Technology



Similar documents
In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

RevoScaleR Speed and Scalability

GPU Parallel Computing Architecture and CUDA Programming Model

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

Rethinking SIMD Vectorization for In-Memory Databases

Who am I? Copyright 2014, Oracle and/or its affiliates. All rights reserved. 3

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Safe Harbor Statement

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

SQL/XML-IMDB. In-Memory SQL / XML Database Component for Universal Data Management. White Paper. QuiLogic Inc

Partitioning under the hood in MySQL 5.5

White Paper. Optimizing the Performance Of MySQL Cluster

Key Attributes for Analytics in an IBM i environment

SQL Query Evaluation. Winter Lecture 23

DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop

Scaling Your Data to the Cloud

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

Pattern Insight Clone Detection

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Whitepaper: performance of SqlBulkCopy

Resource Utilization of Middleware Components in Embedded Systems

CitusDB Architecture for Real-Time Big Data

IBM DB2: LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

OpenACC 2.0 and the PGI Accelerator Compilers

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Integrating Apache Spark with an Enterprise Data Warehouse

Topics in basic DBMS course

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

Driving force. What future software needs. Potential research topics

Cloud Computing at Google. Architecture

In-Memory Data Management for Enterprise Applications

Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich

Spring 2011 Prof. Hyesoon Kim

Next Generation Operating Systems

Bringing Big Data Modelling into the Hands of Domain Experts

Architectures for Big Data Analytics A database perspective

ORACLE DATABASE 10G ENTERPRISE EDITION

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

Optimizing Parallel Reduction in CUDA. Mark Harris NVIDIA Developer Technology

MySQL Cluster New Features. Johan Andersson MySQL Cluster Consulting johan.andersson@sun.com

Fast Analytics on Big Data with H20

Increasing Flash Throughput for Big Data Applications (Data Management Track)

Binary search tree with SIMD bandwidth optimization using SSE

Next Generation GPU Architecture Code-named Fermi

Multi-GPU Load Balancing for Simulation and Rendering

Infrastructure Matters: POWER8 vs. Xeon x86

SQL Server 2012 Performance White Paper

Distributed R for Big Data

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

Oracle Database In-Memory The Next Big Thing

Performance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

Performance Verbesserung von SAP BW mit SQL Server Columnstore

SAP Business Suite powered by SAP HANA

DB2 for i5/os: Tuning for Performance

Oracle EXAM - 1Z Oracle Database 11g Release 2: SQL Tuning. Buy Full Product.

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

Oracle Database 11 g Performance Tuning. Recipes. Sam R. Alapati Darl Kuhn Bill Padfield. Apress*

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Raima Database Manager Version 14.0 In-memory Database Engine

ultra fast SOM using CUDA

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

In-memory Tables Technology overview and solutions

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances

HPC with Multicore and GPUs

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

CUDA SKILLS. Yu-Hang Tang. June 23-26, 2015 CSRC, Beijing

Evaluation of CUDA Fortran for the CFD code Strukti

Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier

Using In-Memory Computing to Simplify Big Data Analytics

Deploying De-Duplication on Ext4 File System

Why DBMSs Matter More than Ever in the Big Data Era

Parallel Computing. Benson Muite. benson.

Optimizing Performance. Training Division New Delhi

MS SQL Performance (Tuning) Best Practices:

New Features in MySQL 5.0, 5.1, and Beyond

Physical Data Organization

Introduction to Cloud Computing

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

ADVANCED COMPUTER ARCHITECTURE

A Pattern-Based Comparison of OpenACC & OpenMP for Accelerators

Netezza and Business Analytics Synergy

DB2 LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Computer Graphics Hardware An Overview

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database

Data Center and Cloud Computing Market Landscape and Challenges

In-Memory Databases MemSQL

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS

SWISSBOX REVISITING THE DATA PROCESSING SOFTWARE STACK

Transcription:

SQL/XML-IMDBg GPU boosted In-Memory Database for ultra fast data management Harald Frick CEO QuiLogic In-Memory DB Technology

The parallel revolution Future computing systems are parallel, but Programmers have to code the parallelism Lake of development systems and experience Radical departure from general programming wisdom Requires combining CPU and GPU code 4 cores Requires close to metal knowledge for max. performance The Hardware is evolving and changing rapidly How are we going to program such systems? QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 2

changes everything Database systems now are ubiquitous, but 4 cores The core DB technology is from the 70s! Database systems now hitting a performance-wall Computer memory grows faster than enterprise data Hardware is evolving rapidly and gets more heterogeneous Parallel computing is a necessity for higher performance How are we going to develop such systems? QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 3

The Database has been re-invented SQL/XML-IMDBg Complete re-engineering of the database architecture is required to make a DB kernel multi-core ready and scalable to 100 s of processors Parallel In-Memory Database Server Cost effective upgrade path Unlimited scalability Declarative parallel data management and processing QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 4

Bridging the technology gap with SQL/XML-IMDBg Domain Experts can use a standard interface (SQL) for massive parallel computation and data mining Database Server Cost effective Unlimited scalability Universal Data management QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 5

Agenda (1) Overview about general database architecture Short introduction to SQL/XML-IMDB Overview of the re-engineering tasks performed to make the IMDB database kernel many-core ready Three levels of re-engineering: 1. General database architecture 2. Relational algebra structures and functions 3. Coding level QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 6

Agenda (2) Architecture of the database kernel Explains the overall design principle chosen to make a database kernel ready for massive parallel processing and GPU ready Database Table structure and outline Explains the vertical partitioning layout and why this is one of the most important pre-requisites for successful GPU co-processing in database kernels Memory management The memory management subsystem architecture and how to manage memory on a GPU device with missing dynamic memory management facilities Query Optimizer Gives insights into the Split-Work architecture and explains the Optimizer architecture with a special focus on the dynamic re-optimizing steps performed during query processing Query Executer structure Explains the architecture of the query Executer with a special focus on processing the split-work plan for distributing the query work between CPU and GPU QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 7

Agenda (3) Summary of presentation: Lessons learned from re-engineering the database kernel Some important conclusions drawn from our experiences regarding parallelizing and re-engineering a complex application for massive parallel platforms like GPU s Outlook on future enhancements planned for the IMDB Questions QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 8

DB Architecture overview XQuery SQL XQuery Parser SQL Parser Internal Query-Tree / Optimizer Code Generator Executer XML Memory Manager SQL Memory Manager Index Manager Local/Shared/GPU/Flash Memory QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 9

SQL/XML-IMDb overview DB Library (DLL, LIB,.NET Assembly) Application embedded SQL/XQuery statements Functional API Declarative universal data management Kind of declarative STL library Data exchange between process boundaries 8 years of market experience WIN32, WIN64, Linux32, Linux64, QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 10

SQL/XML-IMDB application practice Shared Memory Flash Memory Function1() { db.execxql ( FOR $X RETURN $X) } //---------------------------------------- Function2() { db.execsql ( SELECT FROM ) } C++ Application in Process A while( db.next() ) { } VB Application in Process B Sub Command1_OnClick db.execxql ( FOR $X INSERT Y ) db.execsql ( SELECT FROM ) ------------------------------------------- Sub Command2_OnClick db.execxql ( LET $X ) while db.next() <> 0 Do Local Memory Local Memory Export/Import (persistence) SQL/XML Tables in disk based files QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 11

Vision: Integrating the GPU memory Shared Memory GPU Co-Processor + Memory Tables Function1() { db.execxql ( FOR $X RETURN $X) } //---------------------------------------- Function2() { db.execsql ( SELECT FROM ) } C++ Application in Process A while( db.next() ) { } VB Application in Process B Sub Command1_OnClick db.execxql ( FOR $X INSERT Y ) db.execsql ( SELECT FROM ) ------------------------------------------- Sub Command2_OnClick db.execxql ( LET $X ) while db.next() <> 0 Do Local Memory Local Memory Export/Import (persistence) SQL/XML Tables in disk based files QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 12

Vision: Seamless utilization of GPU power Automatically distribute SQL query work and relational data between standard CPU cores and high performance graphics GPU s In-Memory: CPU-Local Memory, CPU-Shared Memory, GPU-Memory SQL/XML-IMDBg Place tables (and indexes) where you want: 4 cores Create Table Local TR ( ) Create Table Shared TS ( ) Create Table GPU TG ( ) Select * From TR, TS, TG WHERE. Select TG.a * TG.b / TG.c From TR, TS, TG WHERE. QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 13

Re-inventing the Database by complete re-engineering of the old IMDB QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 14

Re-Engineering (overall) Vertical partitioned table structure Arrays instead of Lists, Hash-Tables, Trees Compression (dictionary based) Regular shaped data structures Only one Index structure Vector processing of Algebra functions Fork-Join model (OpenMP) QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 15

Re-Engineering (coding) Reduced program code complexity Instruction cache friendlier Reduced memory footprint From C++ back to C Arrays everywhere Supports memory prefetch Simple access functions Loop unrolling Simpler buffer management Dynamic, array like SQL data tables No space waste (no static pre-allocation) Strings represented by ID s whenever possible Simple access/scan functions Simplified index structures Array like Better parallelizable Co-Processor friendly QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 16

3 essentials for ultra high performance RISC like DB core structure Simple and repeating data structures Dynamic Arrays everywhere Dramatically reduced DB-kernel complexity Favors parallel algorithmic structures! Split-Work Optimizer Dynamic mid-query re-optimizing Schedules query execution between CPU and GPU Split-Work Query Executer Work duplication and parallel execution Materialized intermediate results (compressed bitmaps) Vector processing of relational algebra QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 17

Agenda (2) General architecture outline of database kernel Explains the overall design principle chosen to make a database kernel ready for massive parallel processing and GPU ready Database Table structure and outline Explains the vertical partitioning layout and why this is one of the most important pre-requisites for successful GPU co-processing in database kernels Memory management The memory management subsystem architecture and how to manage memory on a GPU device with missing dynamic memory management facilities Query Optimizer Gives insights into the Split-Work architecture and explains the Optimizer architecture with a special focus on the dynamic re-optimizing steps performed during query processing Query Executer structure Explains the architecture of the query Executer with a special focus on processing the split-work plan for distributing the query work between CPU and GPU QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 18

Regular Shaped Structures and Dynamic Arrays everywhere QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 19

DB Architecture (core layout) Master Array L-Table S-Table G-Table Cursor Index Header Descriptor 1 dynamic size Descriptor n Jump table Data 1 n QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 20

Very good DB / GPU structure fit Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 21

Agenda (2) General architecture outline of database kernel Explains the overall design principle chosen to make a database kernel ready for massive parallel processing and GPU ready Database Table structure and outline Explains the vertical partitioning layout and why this is one of the most important pre-requisites for successful GPU co-processing in database kernels Memory management The memory management subsystem architecture and how to manage memory on a GPU device with missing dynamic memory management facilities Query Optimizer Gives insights into the Split-Work architecture and explains the Optimizer architecture with a special focus on the dynamic re-optimizing steps performed during query processing Query Executer structure Explains the architecture of the query Executer with a special focus on processing the split-work plan for distributing the query work between CPU and GPU QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 22

Table Layout (1) Vertical decomposition - dense data storage REF This is a string INT UINT Row ID s 1 DOUBLE 2 3 4 5 6 Column 1 Column 2 Column 3 Column 4 Column Example: Bool Data Type 12345678.... 32 10100011010010010000111011000010 10100011010010010000111011000010 10100011010010010000111011000010 Row ID s 32 64 96 Array like columns allow: Coalesced access on GPU CPU cache friendly access 10100011010010010000111011000010 128 Distributed table layout QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 23

Big Problem How to manage huge sized arrays QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 24

Table Layout (2) Table Descriptor Break array in manageable chunks Column Count Header Flags Row IDs fixed number of rows Contains a pointer in case of variable sized data Data Type Column1 Char size Col name Flags Data Addr Addr. Addr. Addr. Addr. Addr. Addr. Addr. 0 1024 2048 3072 4096 6020 Row IDs dynamic fixed Column N Data Type Char size Col name Flags Data Addr Addr Addr Addr Addr Addr Addr Addr 0 1024 2048 3072 4096 6020 QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 25

Agenda (2) General architecture outline of database kernel Explains the overall design principle chosen to make a database kernel ready for massive parallel processing and GPU ready Database Table structure and outline Explains the vertical partitioning layout and why this is one of the most important pre-requisites for successful GPU co-processing in database kernels Memory management The memory management subsystem architecture and how to manage memory on a GPU device with missing dynamic memory management facilities Query Optimizer Gives insights into the Split-Work architecture and explains the Optimizer architecture with a special focus on the dynamic re-optimizing steps performed during query processing Query Executer structure Explains the architecture of the query Executer with a special focus on processing the split-work plan for distributing the query work between CPU and GPU QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 26

Memory management (CPU) Win-OS VirtualAlloc Local Memory pool Shared Memory pool TLFS Allocator (Two-Level Segregate Fit) Bitmap based chunk management Simple code structure (instruction cache!) Memory pools based! Fastest allocator for shared memory Good cache locality Small size POOL 1 n Flash Memory pool Source: Implementation of a constant-time dynamic storage allocator. Miguel Masmano, Ismael Ripoll, et al. Software: Practice and Experience. Volume 38 Issue 10, Pages 995-1026. 2008. QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 27

Memory management (GPU) Win-OS cudamallochost() for data transfers VirtualAlloc() Data transfer mem-pool cudamalloc (large chunk) GPU GPU Memory pool POOL GPU device memory *devptr = *hostptr cudamemcpy Host memory QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 28

Table management on CPU and GPU POOL 1 POOL n CPU GPU Coalesced memory access Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 29

Scales to multi GPU clusters Some problems do not fit within a single GPU memory CPU Memory POOL 1 Split-Work Executer Memory POOL n Split-Work Executer GPU QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 30

GPU / CPU Table split CPU GPU Table split between CPU and GPU memory Query Processing Query Processing 4 cores Create Table TR( ) Create Table Shared TS( ) Create Table GPU TG( ) Table split is done automatically depending on column data type: varchar, char, blob, QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 31

Index Layout Header Index Descriptor Column ID Table ID Flags Search procedure: 1.) Candidate search 2.) Exact search Fixed size Data Type Index desc. Char size sorted Data Addr 41 768 2048 5017 xxx xxx xxx Col ID Searching can be: 1.) Binary search 2.) Scan 3.) Position estimate Col ID Col ID 1024 259 768 unsorted or sorted QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 32

Index processing on GPU Index Descriptor Search procedure: 1.) Candidate search 2.) Exact search Table Row IDs GPU Kernel call Binary search Simple scan Estimate * (skip over - use statistics of value distribution) Row ID 41 768 2048 5017 * CPU GPU Table values Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors Thread Processors PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM PBSM QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 33

Agenda (2) General architecture outline of database kernel Explains the overall design principle chosen to make a database kernel ready for massive parallel processing and GPU ready Database Table structure and outline Explains the vertical partitioning layout and why this is one of the most important pre-requisites for successful GPU co-processing in database kernels Memory management The memory management subsystem architecture and how to manage memory on a GPU device with missing dynamic memory management facilities Query Optimizer Gives insights into the Split-Work architecture and explains the Optimizer architecture with a special focus on the dynamic re-optimizing steps performed during query processing Query Executer structure Explains the architecture of the query Executer with a special focus on processing the split-work plan for distributing the query work between CPU and GPU QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 34

Query Optimizer and Executer Re-optimizing steps performed during query execution Scanning stage Column array is parallel T1.Col2 40 T1.Col3 abcde T2.Col1 576 T2.Col2 333 Barrier sync. SCAN = SCAN = IDX SCAN IDX SCAN Selected Row ID s as compressed bitmap 00000000001000001000010 00000000001000001000010 00000000001000001000010 00000000001000001000010 Re-optimize 00000010010110010100100 00000010010110010100100 00000010010110010100100 00000010010110010100100 Join Join Join join stages are parallel 00000000001000001000010 00000000001000001000010 00000000001000001000010 GPU Re-optimize GPU Table array JOIN JOIN CPU QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 35

Why is re-optimizing important True size of intermediate result sets! Split-Work requires re-examination of intermediate results Schedules the query tasks either to CPU or GPU Should sorting be done on GPU or CPU? Leave intermediate results on GPU or move over? Perform join processing on GPU or CPU? Is the table persistent on GPU or on CPU? Tight integration of optimizer and executer necessary QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 36

Split-Work Reoptimize CPU GPU QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 37

Agenda (2) General architecture outline of database kernel Explains the overall design principle chosen to make a database kernel ready for massive parallel processing and GPU ready Database Table structure and outline Explains the vertical partitioning layout and why this is one of the most important pre-requisites for successful GPU co-processing in database kernels Memory management The memory management subsystem architecture and how to manage memory on a GPU device with missing dynamic memory management facilities Query Optimizer Gives insights into the Split-Work architecture and explains the Optimizer architecture with a special focus on the dynamic re-optimizing steps performed during query processing Query Executer structure Explains the architecture of the query Executer with a special focus on processing the split-work plan for distributing the query work between CPU and GPU QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 38

Query Executer - basic principle Select. FROM. Worker Thread Thread Pool Join Operator Tree SCAN Join 40 SCAN T1.Col2 ScanINT(*start) { val = start[1] *p = start[2] *bitmap = start[3] #pragma omp parallel for(.) } Pre-compiled operators Operator Library Query Compiler + Optimizer stop ScanINT_GPU(*start) {. // call GPU kernel. ScanInt <<<dimgrid, dimblock>>> (...). } QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 39

Executer Operator Library Function Data Re-optimize ScanINT(void *start) ScanUINT(void *start) ScanFLOAT(void *start) ScanDBL(void *start) ScanIntIDX(void *start) AddINT(void *start) AddDBL(void *start) Function call Vector processing of algebra functions Compiler optimization Filled cache lines Memory prefetch support No instruction cache misses Loop unrolling Branch prediction 4 cores ScanINT_GPU GPU Kernel call Coalesced memory access High bandwidth Massive parallel scanning Fast sorting Arithmetic processing Data local on GPU Loop unrolling QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 40

Split-Work scheduler scales to multi GPUs Query work scales linear with GPU count n copies of (partial) program array CPU thread 1 CPU thread 2 CPU thread n GPU kernel call GPU kernel call GPU kernel call QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 41

SQL query processing Select age, name, (salary * tax) From T Where age > 18 and salary > 10,000 and Task distribution to maximize performance! Project Result-Set 4 cores Join 4 cores Scan age name salary tax QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 42

Lessons learned Try to keep things simple and regular Classic Database technology to complex for GPU architecture Complete re-engineering required Regular structures favor parallelize and scalability Index and algorithmic structures needs to be revised Use regular shaped data structures for fast iterative access Re-optimizing is important for best query work scheduling Coalesced memory access required for high performance Keep the GPU bussy OpenMP is a good starting point Object oriented coding style is useless for GPU co-processing QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 43

Future plans GPU accelerated projection stage of SQL query is work in progress SELECT Tg.a * Tg.b + (1.323 * Tg.c) FROM. LINQ integration (Language INtegrated Query for Microsoft.NET-Framework) GPU based XML processing Self tuning (Experiment mode) Embedded devices? QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 44

Conclusions The RISC like regular-shaped DB technology scales to hundreds of parallel processor cores GPU usage as DB co-processor is a valid concept and boosts performance orders of magnitude Declarative programming style for application development is easier, faster and You get the GPU power for free! QuiLogic makes GPU power available for domain experts in a simple and declarative way (SQL) QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 45

Questions? harald.frick@chello.at office@quilogic.com QuiLogic In-Memory DB Technology SQL/XML-IMDBg: GPU boosted In-Memory Database 46