Database Performance with In-Memory Solutions
ABS Developer Days, January 17th and 18th, 2013, Unterföhring
metafinanz / Carsten Herbe
The goal of this presentation is to give you an understanding of in-memory databases: when they can be used, and when it is better to use other technologies to store your data.
Today we will look at
- disk-based and in-memory databases
- in-memory database products
- other data storage options
... but not at ABS!
Within our business line Business Intelligence & Risk, there are five groups: Risk, Insurance Reporting, Insurance Analytics, Customer Intelligence and Data Warehousing.
Data Warehousing: capabilities and technologies
We support the complete data warehouse lifecycle, from requirements gathering to tuning existing ETL processes. Our team includes skilled architects, developers and project managers with broad data warehousing experience.
- DWH Architectures & Dimensional Modeling
- Data Quality (Profiling & Cleansing)
- Databases (Oracle, in-memory, columnar)
- ETL Processes & Tools (OWB & SAS DI)
- Performance Tuning
- Project Management
- Big Data (Hadoop, NoSQL)
- Training (Oracle, DWH, Hadoop, etc.)
Turning Data into Information
Your contact: Carsten Herbe
- More than 8 years of data warehousing experience
- Strong technical skills in Oracle & OWB
- Certified Hadoop Developer
- Oracle partnership manager
Contents
1 Introduction
2 About databases
3 Oracle TimesTen
4 IBM solidDB
5 SAP HANA
6 SAS Visual Analytics
7 Architectures
8 Alternative storage technologies
9 Conclusions
1 Introduction
The main reason for introducing an in-memory database is performance!
What do in-memory technologies promise?
In-memory databases store all their data in RAM. Hard disk drives are used only for log files and backups.
In-memory databases & disks
RAM prices are decreasing continuously. Systems with 1 terabyte of RAM are affordable today.
Decreasing RAM prices
[Chart: RAM price per MB (y-axis, 0 to 0.40) by year (x-axis, 2002 to 2012). Sources: Computerbase, Heise, Golem, hardwareboard.eu]
2 About databases
Relational databases are the standard way of storing all kinds of data securely and consistently. Data is queried and manipulated using SQL (Structured Query Language).
Relational databases

Table PRODUCTS
PROD_ID  NAME       PRICE   CAT_ID
101      MyPhone 1  499.00  1
102      MyPad 1    799.00  2
103      MyPhone 2  699.00  1

Table CATEGORIES
CAT_ID  NAME
1       Mobiles
2       Tablets

Primary keys: PRODUCTS.PROD_ID and CATEGORIES.CAT_ID. Foreign key: PRODUCTS.CAT_ID references CATEGORIES.CAT_ID.

    SELECT p.name, c.name, p.price
    FROM products p
    JOIN categories c ON p.cat_id = c.cat_id
    WHERE p.price < 500;

SQL statements: SELECT, INSERT, UPDATE, (MERGE), DELETE
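The PRODUCTS/CATEGORIES example can be tried out with a few lines of Python. This is only a sketch: it uses SQLite's in-memory mode from the Python standard library as a stand-in for a relational database (none of the products discussed in this talk), and the function name cheap_products is made up for illustration.

```python
import sqlite3


def cheap_products(max_price=500):
    """Build the two example tables in memory and run the join from the slide."""
    conn = sqlite3.connect(":memory:")  # the database lives in RAM only
    conn.executescript(
        """
        CREATE TABLE categories (
            cat_id INTEGER PRIMARY KEY,
            name   TEXT NOT NULL
        );
        CREATE TABLE products (
            prod_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            price   REAL NOT NULL,
            cat_id  INTEGER NOT NULL REFERENCES categories(cat_id)
        );
        INSERT INTO categories VALUES (1, 'Mobiles'), (2, 'Tablets');
        INSERT INTO products VALUES
            (101, 'MyPhone 1', 499.00, 1),
            (102, 'MyPad 1',   799.00, 2),
            (103, 'MyPhone 2', 699.00, 1);
        """
    )
    rows = conn.execute(
        """
        SELECT p.name, c.name, p.price
        FROM products p
        JOIN categories c ON p.cat_id = c.cat_id
        WHERE p.price < ?
        """,
        (max_price,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for product, category, price in cheap_products():
        print(product, category, price)
```

With the default limit of 500, only the row for MyPhone 1 in category Mobiles qualifies.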
Databases are never 100% compatible.
Databases and compatibility
Databases
- The interface to a relational database is SQL
- The ANSI SQL standard is usually supported (ca. 99%)
- Many vendors add custom functions
- Scripting language support (like PL/SQL on Oracle)
- Some databases claim to be Oracle compatible (ca. 99%)
Compatibility...
- means that SQL statements can be compiled and executed on different systems
- Databases are compatible from a user/application developer perspective
BUT:
- SQL statements are executed differently on different systems
- Results are identical
- But runtime may vary!
- Database tuning and administration differ and require different skills
Data (including indexes) is not the only cause of disk I/O. UNDO is required for read consistency, REDO for restoring data.
Database architecture: disk-based
[Diagram: RDBMS with DB processes and a DATA cache in memory; DATA, UNDO and REDO on disk; DATA backup and REDO backup on a backup server]
- DATA: user & system data (tables, indexes, programs)
- UNDO: required for read consistency; used to undo changes made by other sessions after your own query has started
- REDO: required for durability; a data restore uses the backup and redo
An in-memory database holds all data in RAM. But REDO is still written to disk to guarantee durability. SELECTs are processed in memory only.
Database architecture: in-memory
[Diagram: RDBMS with DB processes, DATA and UNDO in memory; DATA snapshot and REDO on disk; DATA backup and REDO backup on a backup server]
- DATA: stored in RAM and written asynchronously to a snapshot on disk (used only for a restore)
- UNDO: in memory only
- REDO: required for durability; a data restore uses the data snapshot and redo
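The interplay of in-memory data, REDO and snapshot can be illustrated with a toy key-value store. This is a minimal sketch under simplifying assumptions, not how any of the products in this talk are implemented: the class ToyInMemoryStore and its file names are invented, the snapshot is written synchronously on request rather than asynchronously, and UNDO is left out entirely.

```python
import json
import os
import tempfile


class ToyInMemoryStore:
    """Keeps all data in a dict; uses disk only for a redo log and a snapshot."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.redo_path = os.path.join(directory, "redo.log")
        self.data = {}
        self._restore()

    def _restore(self):
        # A restore uses the data snapshot first, then replays the redo log.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.redo_path):
            with open(self.redo_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Durability: the change reaches the redo log on disk before it is applied.
        with open(self.redo_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def get(self, key):
        # Reads never touch the disk.
        return self.data.get(key)

    def checkpoint(self):
        # Write a new snapshot, then empty the redo log it makes obsolete.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.snapshot_path)
        open(self.redo_path, "w").close()


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    store = ToyInMemoryStore(directory)
    store.put("a", 1)
    store.checkpoint()
    store.put("b", 2)                        # only in the redo log so far
    restarted = ToyInMemoryStore(directory)  # simulates a restart
    print(restarted.get("a"), restarted.get("b"))
```

Replaying the redo log on top of the snapshot is safe here because each entry simply overwrites a key, so applying an entry twice gives the same result.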
Row-oriented databases are good at handling single rows in a transactional system; column-oriented databases are good at analyzing huge data sets in data warehouses.
Row- and column-oriented databases

Table (the logical view is the same in both cases)
PersNo  Last Name  First Name  Salary
1       Müller     Karl        45000
2       Bauer      Fritz       62000
3       Meier      Hans        54000
4       Schmidt    Paul        70000

Row-oriented storage
1, Müller, Karl, 45000
2, Bauer, Fritz, 62000
3, Meier, Hans, 54000
4, Schmidt, Paul, 70000

Column-oriented storage
1, 2, 3, 4
Müller, Bauer, Meier, Schmidt
Karl, Fritz, Hans, Paul
45000, 62000, 54000, 70000
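The difference between the two layouts can be sketched with plain Python data structures. The tuples and lists below only mimic the storage order of the example table; real databases add pages, compression and indexes on top, and all function names are made up for illustration.

```python
# Row-oriented: one record after the other, all attributes of a row together.
ROWS = [
    (1, "Müller", "Karl", 45000),
    (2, "Bauer", "Fritz", 62000),
    (3, "Meier", "Hans", 54000),
    (4, "Schmidt", "Paul", 70000),
]

# Column-oriented: one list per attribute, all values of a column together.
COLUMN_NAMES = ("pers_no", "last_name", "first_name", "salary")
COLUMNS = {
    name: [row[i] for row in ROWS] for i, name in enumerate(COLUMN_NAMES)
}


def fetch_person_from_rows(rows, pers_no):
    """Transactional access: the whole record sits in one place."""
    for row in rows:
        if row[0] == pers_no:
            return row
    return None


def fetch_person_from_columns(columns, pers_no):
    """The same record must be stitched together from every column."""
    try:
        position = columns["pers_no"].index(pers_no)
    except ValueError:
        return None
    return tuple(columns[name][position] for name in COLUMN_NAMES)


def total_salary_from_rows(rows):
    """Analytical access: every row is touched although one attribute is needed."""
    return sum(row[3] for row in rows)


def total_salary_from_columns(columns):
    """Only the salary column is read."""
    return sum(columns["salary"])


if __name__ == "__main__":
    print(fetch_person_from_rows(ROWS, 3))
    print(total_salary_from_columns(COLUMNS))
```

Both layouts return the same answers; what differs is how much data has to be read for a single-row lookup versus an aggregate over one column.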
3 Oracle TimesTen
Oracle TimesTen is a relational in-memory database that Oracle acquired in 2005. Its compatibility with the Oracle database has been improved ever since, including PL/SQL support.
TimesTen architecture
[Diagram: TimesTen server with DB processes, in-memory data, checkpoint files and log files; a direct-linked application using shared libraries; a client application connecting through a client driver over the network to a server process; a cache agent connecting over the network to an Oracle DB on an Oracle server]
Tables can be cached partially. If TimesTen cannot process a query, the query is passed through to the underlying Oracle database.
TimesTen Cache
Cached tables do not need to contain all attributes or all rows!
[Diagram: the Oracle DB holds Tab1, Tab2, Tab3 and Tab4; the TimesTen cache holds Cache Group 1 (Tab1, Tab2) and Cache Group 2 (Tab4); the client application connects via the server process, with passthrough to the Oracle DB]
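The idea of a partial cache with passthrough can be sketched independently of any product. The following is a toy model, not the TimesTen API: the class PartialTableCache, the dictionary standing in for the Oracle database, and the cache rules are all hypothetical, and only key lookups are modelled rather than full SQL.

```python
class PartialTableCache:
    """Toy read cache that holds only some tables, and only some of their rows."""

    def __init__(self, backend, cache_rules):
        # backend: {table: {key: row}}, stands in for the disk-based database.
        # cache_rules: {table: predicate(row)}, decides which rows are cached.
        self.backend = backend
        self.cache = {
            table: {k: row for k, row in backend[table].items() if keep(row)}
            for table, keep in cache_rules.items()
        }
        self.passthroughs = 0

    def lookup(self, table, key):
        cached_rows = self.cache.get(table)
        if cached_rows is not None and key in cached_rows:
            return cached_rows[key]  # answered from memory
        # Table or row is not cached: pass the request through to the backend.
        self.passthroughs += 1
        return self.backend[table].get(key)


if __name__ == "__main__":
    backend = {
        "tab1": {1: {"region": "EU"}, 2: {"region": "US"}},
        "tab3": {7: {"region": "EU"}},
    }
    # Cache only the EU rows of tab1; tab3 is not cached at all.
    cache = PartialTableCache(backend, {"tab1": lambda row: row["region"] == "EU"})
    print(cache.lookup("tab1", 1), cache.passthroughs)
    print(cache.lookup("tab3", 7), cache.passthroughs)
```

The application sees one interface for both cases; only the response time tells it whether a request was served from memory or passed through.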
4 IBM solidDB
IBM acquired solidDB in 2007. It works similarly to Oracle TimesTen, but it also supports disk-based tables. Direct linking from the application is supported through libraries.
IBM solidDB
Source: IBM Redbook "IBM solidDB: Delivering Data with Extreme Speed", 2011
IBM solidDB can be used as a cache for a disk-based RDBMS. Unlike TimesTen, it supports several different databases.
IBM solidDB Universal Cache
- Caching for Oracle, DB2, IDS, Sybase ASE, and Microsoft SQL Server
- Caches can be read-only or read/write
- Synchronisation using InfoSphere Change Data Capture (CDC) replication
Source: IBM Redbook "IBM solidDB: Delivering Data with Extreme Speed", 2011
5 SAP HANA
SAP HANA is developed by SAP and was first released in 2010. It makes use of the latest processor technology and therefore requires hardware certified by SAP.
SAP HANA
- High-Performance Analytic Appliance
- In-memory database
- Column-based and/or row-based storage
- Supports SQL and MDX
- Currently focused on SAP applications
This Fujitsu PRIMERGY RS 900 S2 was used for a PoC by metafinanz and AMOS in 2012.
This is what 1 TB of RAM looks like
- 1,024 GB (= 1 TB) RAM
- 80 cores
- 8 sockets
- 4,500 GB disk storage for data
- 1,280 GB SSD-based storage with Fusion I/O for logging
- 8U rack server
- x86-based, Intel Xeon E7-8800 processor family
6 SAS Visual Analytics
SAS Visual Analytics is not a relational database but an analytical in-memory solution that includes data storage and a graphical front end for analysis and reporting.
SAS Visual Analytics architecture
SAS Visual Analytics includes graphical tools for reporting and data exploration, and a mobile iOS app.
SAS Visual Analytics
Central entry point, integration, role-based views
- DATA PREPARATION: monitor the SAS LASR Analytic Server, load and join data, create calculated columns
- EXPLORER: perform ad-hoc analysis and data discovery
- DESIGNER: create dashboard-style reports for web or mobile
- MOBILE BI: native iOS application that delivers interactive reports created in the designer
[Base layer in the diagram: SAS LASR Analytic Server]
7 Architectures
A classical disk-based database is replaced by an in-memory database.
Complete replacement
[Diagram, three variants:
- classical architecture: Java application (JDBC) on an application server, connected over the network to a disk-based database on a database server
- in-memory dedicated: Java application (JDBC) on an application server, connected over the network to an in-memory database on a database server
- in-memory shared: Java application (JDBC, direct access) and in-memory database together on one application + db server]
On the application server, an in-memory database is used to cache currently used data.
Server cache
[Diagram, two variants:
- classical architecture: Java application (JDBC) on an application server, connected over the network to a disk-based database on a database server
- server cache: Java application (JDBC, direct access) with an in-memory database on an application + db server, connected over the network to a disk-based database on a database server]
In a distributed environment, the in-memory database acts as a local cache. Concurrency problems must be handled by the application.
Distributed cache
[Diagram, two variants:
- distributed architecture: Java applications (JDBC) at locations 1 to n access a disk-based database at location 0 over a slow or unreliable network
- distributed cache: the Java application (JDBC, direct access) at each location uses a local in-memory database, which is connected over the slow or unreliable network to the disk-based database at location 0]
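One common way for an application to handle such concurrency problems is optimistic locking with version numbers; this is a technique chosen here for illustration, not something the slide prescribes. In the sketch below, CentralStore stands in for the disk-based database at location 0 and LocalCache for the in-memory database at one location. Both classes are hypothetical.

```python
class CentralStore:
    """Stands in for the central database; every row carries a version number."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        return self._rows.get(key, (0, None))

    def write_if_unchanged(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # somebody else changed the row in the meantime
        self._rows[key] = (current_version + 1, value)
        return True


class LocalCache:
    """Stands in for the local in-memory database at one location."""

    def __init__(self, central):
        self._central = central
        self._rows = {}  # key -> (version seen when loaded, local value)

    def get(self, key):
        if key not in self._rows:
            self._rows[key] = self._central.read(key)
        return self._rows[key][1]

    def set(self, key, value):
        self.get(key)  # make sure we know which version we are changing
        version, _ = self._rows[key]
        self._rows[key] = (version, value)

    def sync(self, key):
        """Push the local change; on conflict, discard it and reload."""
        version, value = self._rows[key]
        if self._central.write_if_unchanged(key, version, value):
            self._rows[key] = (version + 1, value)
            return True
        self._rows[key] = self._central.read(key)
        return False


if __name__ == "__main__":
    central = CentralStore()
    central.write_if_unchanged("stock", 0, 10)
    location_1, location_2 = LocalCache(central), LocalCache(central)
    location_1.set("stock", 9)
    location_2.set("stock", 8)
    print(location_1.sync("stock"))  # first writer wins
    print(location_2.sync("stock"))  # conflict is detected, not silently lost
```

What to do after a detected conflict (retry, merge, ask the user) is exactly the part that remains the application's responsibility.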
A data mart, i.e. an (aggregated) subset of a data warehouse, is stored in the in-memory database.
In-memory data mart
[Diagram, three variants, each fed from a disk-based data warehouse on a DB server:
- classical BI architecture: BI application (JDBC) on a BI server, disk-based data mart on a DB server
- in-memory data mart: BI application (JDBC) on a BI server, in-memory data mart on a DB server
- in-memory BI: BI application (JDBC) with an integrated in-memory data mart on the BI server]
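Building an aggregated data mart from a warehouse table can be sketched as follows. The example uses SQLite from the Python standard library for both sides, so the "warehouse" here is only a stand-in for a disk-based database; the table names sales and sales_by_category and the sample rows are invented for illustration.

```python
import sqlite3


def build_data_mart(warehouse):
    """Aggregate the warehouse's sales table into a small in-memory data mart."""
    mart = sqlite3.connect(":memory:")
    mart.execute(
        "CREATE TABLE sales_by_category ("
        "category TEXT PRIMARY KEY, revenue REAL, items INTEGER)"
    )
    aggregated = warehouse.execute(
        "SELECT category, SUM(price), COUNT(*) "
        "FROM sales GROUP BY category ORDER BY category"
    ).fetchall()
    mart.executemany("INSERT INTO sales_by_category VALUES (?, ?, ?)", aggregated)
    mart.commit()
    return mart


def sample_warehouse():
    """Hypothetical warehouse content; a real one would live on disk."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (category TEXT, product TEXT, price REAL)")
    warehouse.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("Mobiles", "MyPhone 1", 499.0),
            ("Mobiles", "MyPhone 2", 699.0),
            ("Tablets", "MyPad 1", 799.0),
        ],
    )
    warehouse.commit()
    return warehouse


if __name__ == "__main__":
    mart = build_data_mart(sample_warehouse())
    for row in mart.execute("SELECT * FROM sales_by_category ORDER BY category"):
        print(row)
```

BI queries then run against the small aggregated table in memory, while the detailed data stays in the warehouse.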
8 Alternative storage technologies
During the last couple of years, many new data storage technologies have emerged. Still, relational databases are the most general-purpose option and the most widely used.
Other ways to store your data
Columnar DBs
- Data is stored in columns instead of rows
- Good compression and good for analytics
- For big servers or clusters of commodity hardware
OLAP
- Stores data in cubes (similar to an Excel pivot table)
- For analytics (slice & dice)
- Data can be added but not modified
NoSQL
- Not only SQL
- Often only eventually consistent
- Typical: key-value or wide-column store (like a hash table with multiple values)
- All types of special-purpose databases: documents, graphs, ...
Hadoop
- = HDFS (distributed file system) + MapReduce (parallel programming framework)
- Data (big files like 64 MB) is written only once and never modified
- Runs on clusters of commodity hardware
All product and vendor lists are only examples and are therefore incomplete.
First analyse the requirements, then choose an appropriate technology for your problem.
Classification
[Chart: technologies placed by data structure (weakly structured to strongly structured) and data usage (operational to analytical): disk-based & in-memory RDBMS, columnar RDBMS, OLAP, SAS Visual Analytics, key-value stores, Hadoop]
9 Conclusions
An in-memory database can boost your performance, but there are some points to consider.
In-memory DBs can boost your performance, but ...
Understand your current system!
- Don't use your database as a black box!
- Learn how your database works!
Tune the existing system!
- Tune the SQL!
- Tune the database instance!
- Tune the application (design)!
Plan...
- the migration, it is never easy!
- Understand the new technology and use it properly!
- New technology adds complexity in both development & operations!
- There is no 100% compatibility!
We offer open group trainings or customized trainings for individual companies.
metafinanz training
- Introduction to the Oracle in-memory database TimesTen
- Big Data with Hadoop
- Data Warehousing & Dimensional Modeling
- Oracle Warehouse Builder 11.2 New Features
- OWB Scripting with OMB*Plus
- Oracle SQL Tuning
- Introduction to Oracle: Architecture, SQL and PL/SQL
[Badges on the slide: NEW 2012, NEW 2013/Q2]
More details about our trainings can be found at http://www.metafinanz.de/arbeitsweise/was-trainieren-wir
All trainings are also available in English on request.
If you have any questions, ask now... or later.
Carsten Herbe, Head of Data Warehousing
Mail: carsten.herbe@metafinanz.de
Phone: +49 89 360531 5039
Thank you for your attention!
metafinanz Informationssysteme GmbH
Leopoldstr. 146, 80804 München
Phone: +49 89 360531-0
Fax: +49 89 350531-5015
Email: kontakt@metafinanz.de
www.metafinanz.de
Blog and forum on Solvency II: http://solvencyii.metafinanz.de