INTERSYSTEMS WHITE PAPER

INTERSYSTEMS CACHÉ AS AN ALTERNATIVE TO IN-MEMORY DATABASES

David Kaaret, InterSystems Corporation



Introduction

To overcome the performance limitations of traditional relational databases, applications ranging from those running on a single machine to large, interconnected grids often use in-memory databases to accelerate data access. While in-memory databases and caching products increase throughput, they suffer from a number of limitations, including lack of support for large data sets, excessive hardware requirements, and limits on scalability.

InterSystems Caché is a high-performance object database with a unique architecture that makes it suitable for applications that typically use in-memory databases. Caché's performance is comparable to that of in-memory databases, but Caché also provides:

- Persistence: data is not lost when a machine is turned off or crashes
- Rapid access to very large data sets
- The ability to scale to hundreds of computers and tens of thousands of users
- Simultaneous data access via SQL and objects: Java, C++, .NET, etc.

This paper explains why Caché is an attractive alternative to in-memory databases for companies that need high-speed access to large amounts of data.

Unique data engine enables persistence and high performance

Caché is a persistent database, which means that data maintained in RAM is written to disk by background processes. So how can Caché provide performance comparable to that of in-memory databases, which only periodically write data to some permanent data store? Part of the answer lies in Caché's unique architecture. Instead of the rows and columns of a traditional database, Caché uses multidimensional arrays whose structure is based on object definitions. Data is stored the way the architect designs it, and the same structures used for the in-memory cache are used on disk. Data that should be stored together is stored together. As a result, Caché can access data on disk very quickly.
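The multidimensional storage model can be pictured as a sparse, hierarchical key/value tree rather than as flat rows and columns. As a rough illustration only (this is not Caché's actual API), here is a Python sketch of storing an order and its line items together under nested subscripts:

```python
# Rough, hypothetical illustration of multidimensional (hierarchical)
# storage: data that belongs together is stored together under nested
# subscripts, so one logical record is read as a single subtree rather
# than joined from several tables.

from collections import defaultdict

def tree():
    """An auto-vivifying nested dictionary."""
    return defaultdict(tree)

db = tree()

# Store an order and its line items under one hierarchical key.
db["Order"][1001]["Customer"] = "Acme Corp"
db["Order"][1001]["Item"][1] = ("Widget", 5)
db["Order"][1001]["Item"][2] = ("Gadget", 2)

# Reading the whole order touches one subtree, not several tables.
order = db["Order"][1001]
print(order["Customer"])     # Acme Corp
print(dict(order["Item"]))
```

Because the on-disk layout mirrors this same shape, a read of one logical record stays within one region of storage instead of scattering across normalized tables.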
The requirement that multiple in-memory caches be synchronized when data is updated also reduces the performance of many distributed cache products. With Caché, the updating of data and the distribution of data to caches are logically separate. This gives it a much simpler workflow, which allows for superior performance. Caché also provides in-process bindings to C++ and Java that allow applications written in those languages to directly populate Caché's internal data structures.
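The separation of updating from distribution described above can be sketched as follows. This is a hypothetical illustration of the general pattern, not Caché's internals: a write commits to the primary store immediately, while invalidation of remote cached copies is queued as a separate, asynchronous step.

```python
# Hypothetical sketch: decoupling data updates from cache distribution.
# The write path commits to the primary store at once; stale remote
# copies are dropped later by a separate propagation step.

store = {"px:IBM": 182.5}
remote_caches = [{"px:IBM": 182.5}, {"px:IBM": 182.5}]
pending_invalidations = []

def update(key, value):
    store[key] = value                  # commit to the primary store now
    pending_invalidations.append(key)   # distribution is handled later

def propagate():
    """Asynchronous step: drop stale copies from remote caches."""
    while pending_invalidations:
        key = pending_invalidations.pop()
        for cache in remote_caches:
            cache.pop(key, None)

update("px:IBM", 183.0)
print(store["px:IBM"])      # the write is durable before any cache traffic
propagate()
print(remote_caches[0])     # stale copy removed
```

The point of the split is that the writer never waits on cache synchronization, which is the cost that slows many distributed cache products.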

The benefits of persistence

Given that Caché provides comparable performance, its ability to access data on disk confers some significant advantages over in-memory databases. The most obvious is that there is no need for a separate permanent data store. Caché is the permanent store, and it is always current. Data is not lost when a machine is turned off or crashes.

Another advantage is that, with Caché, the size of data sets is not limited by the amount of available RAM. If data is not in a local cache, it is seamlessly obtained from either a remote cache or disk. Because it is not RAM-limited, a Caché-based system can handle petabytes of data; in-memory databases cannot.

Adding RAM to a system in an attempt to increase capacity is more expensive than adding disk storage. (A terabyte of disk storage is cheaper than a terabyte of RAM.) In addition, many in-memory systems must keep redundant copies of data on separate machines to safeguard against the effects of a computer crash. Using a persistent database like Caché instead of a distributed cache system therefore often results in reduced hardware costs.

Seamless SQL and object data access

One problem shared by most in-memory databases is that, because their data structures are optimized for high-speed processing, the data is usually not readily accessible via SQL. To be compatible with most analysis and reporting tools, the data must first be mapped into relational tables. This is usually done when data is transferred from the in-memory database to the permanent data store, and typically involves an ETL (extract, transform, and load) process. (The processing overhead and additional time required for mapping is the main reason relational databases are not fast enough for extremely high-speed distributed applications, and why in-memory databases are often used instead.) A few in-memory databases are based on the relational model and offer SQL data access.
Such systems suffer from the opposite problem: their data is not readily accessible to the object-oriented technologies typically used for application development. In addition, most relational in-memory databases are not designed for multi-computer configurations. They run on only one machine, and are RAM-limited.

Caché is different, because the multidimensional arrays it uses can be exposed simultaneously as relational tables and as objects. Caché's Unified Data Architecture maintains both object and relational views of data at all times, without mapping.
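The idea of dual access — the same stored data reachable both as SQL rows and as application objects, with no ETL step in between — can be sketched in miniature. The following uses Python's sqlite3 purely as a stand-in store; the `Trade` class and this arrangement are illustrative, not Caché's Unified Data Architecture itself:

```python
import sqlite3

# Hypothetical sketch of dual SQL/object access to one copy of the data.

class Trade:
    """Application-side object view of a trade record."""
    def __init__(self, ticker, qty, price):
        self.ticker, self.qty, self.price = ticker, qty, price

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade (ticker TEXT, qty INTEGER, price REAL)")
conn.execute("INSERT INTO trade VALUES ('IBM', 100, 182.5)")

# Relational view: ordinary SQL, usable by reporting and analysis tools.
total = conn.execute("SELECT SUM(qty * price) FROM trade").fetchone()[0]
print(total)  # 18250.0

# Object view: the same row materialized as an application object,
# with no separate ETL/mapping pass in between.
row = conn.execute("SELECT ticker, qty, price FROM trade").fetchone()
t = Trade(*row)
print(t.ticker, t.qty)  # IBM 100
```

In this sketch both views read the same single copy of the data; the contrast is with architectures where the object store and the SQL-queryable store are separate systems kept in sync by ETL.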

Fig. 1: Caché's Unified Data Architecture enables multiple ways to access data

Caché's SQL access is compatible with both ODBC and JDBC. On the object side, Caché provides bindings to any number of object-oriented languages, including Java, .NET, and C++. Caché's object representation is full-featured and supports object-oriented concepts like inheritance, polymorphism, and encapsulation.

Enterprise Cache Protocol

In multi-computer applications, Caché automatically maintains caches by use of its Enterprise Cache Protocol (ECP). With ECP, Caché instances can be configured as data servers and/or application servers. Each piece of data is owned by a data server. Application servers understand where data is located and keep local caches of recently used data. If an application server cannot satisfy a request from its local cache, it requests the necessary data from a remote data server. ECP automatically manages cache consistency.

ECP requires no application changes; applications simply treat the entire database as if it were local. This is a major distinction from some distributed cache systems, where each client needs to specify what subset of data it is interested in before any queries are performed.

One machine, one cache

Another key difference between Caché and other distributed cache products is that most other products maintain a separate cache for each process running on a machine. For example, if a single machine has eight clients, then eight individual caches will be maintained on that machine.

In contrast, Caché maintains its cache in shared memory and provides bindings that allow processes running in their own memory address spaces to access the data. Data can be accessed simultaneously through TCP-based protocols like JDBC, through language bindings, and, for exceptionally high performance, through bindings that allow applications to directly manipulate the cache.

Allowing multiple clients to share a single cache provides a number of benefits. One is that a shared-cache system has reduced memory requirements. When, as is often the case, individual clients require access to overlapping data, other distributed cache products maintain multiple copies of that data. With Caché, only a single copy needs to be maintained per machine.

Having one cache per machine also results in reduced network I/O. In high-performance applications, the network traffic associated with cache maintenance can be a major issue. With a single cache per machine, however, only that cache needs to be updated as the underlying data changes, rather than making overlapping updates to multiple caches.

Even with multi-core processors, a Caché-based system uses only one shared cache per machine, resulting in superior scalability compared with other distributed cache products. For example, in a Caché-based system of 250 machines, each with 8 cores, only 250 caches need to communicate with each other to maintain cache coherence. But systems that require a separate cache for each core would need to coordinate 2,000 caches. As modern computers may have eight, sixteen, or even more cores, the scalability advantage of Caché becomes increasingly important.

Fig. 2a: Cache coherency without InterSystems Enterprise Cache Protocol.
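The scaling difference in the 250-machine example above is worth making concrete. If every cache must be able to coordinate with every other, the number of pairwise coherence channels grows quadratically with the number of caches:

```python
# Worked arithmetic for the 250-machine, 8-core example above.
# With all-to-all coordination, n caches need n * (n - 1) / 2
# pairwise coherence channels.

def coherence_channels(n_caches):
    return n_caches * (n_caches - 1) // 2

per_machine = coherence_channels(250)      # one shared cache per machine
per_core = coherence_channels(250 * 8)     # one cache per core

print(per_machine)                 # 31125
print(per_core)                    # 1999000
print(per_core // per_machine)     # roughly 64x more channels
```

Under this all-to-all assumption, going from 250 caches to 2,000 multiplies the coordination burden by roughly 64, not merely by 8, which is why one cache per machine scales so much better.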

Fig. 2b: Cache coherency in a Caché-based system.

Populating the cache

In many distributed cache applications, pre-loading the cache can be a lengthy process. This may be due to the sheer amount of data, and/or the time required to map data from a relational store into the object-oriented structures used by the application. For some data-intensive applications, more time is spent populating in-memory caches than actually running calculations against them.

Not so with Caché. Caché's exceptional SQL capabilities allow it to easily pull data from relational primary data sources. And of course, as a persistent database, Caché may itself be the primary source. In that case, there is no need to pre-load caches at all. Local caches will automatically load the data they need as queries are run.

Another consideration is how many machines are involved in the task of populating caches. With Caché, primary ownership of the data is held by a small percentage of the computers in a distributed grid environment. Populating that environment requires access only to the ECP data servers, and they can be loaded in the background while the other computers are used for other tasks. When the application servers come online, their caches are repopulated automatically as data is requested.

In contrast, when data is loaded in most in-memory products, it is partitioned to be spread across the distributed cache so that all, or virtually all, of the data is in the memory of at least one machine. As a result, it is often not feasible to do data loads with a small subset of the computers while bringing the rest online as needed.
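The on-demand population described above — an application server serves from its local cache when it can, and fetches from the owning data server on a miss — is essentially a read-through cache. A minimal Python sketch of that pattern follows; the `DataServer` and `AppServer` classes are hypothetical illustrations, not actual ECP interfaces:

```python
# Illustrative read-through caching, in the spirit of the ECP
# application server / data server split (class names are hypothetical).

class DataServer:
    """Owns the authoritative copy of the data."""
    def __init__(self, records):
        self.records = records
        self.fetches = 0            # count remote requests, for demonstration

    def fetch(self, key):
        self.fetches += 1
        return self.records[key]

class AppServer:
    """Keeps a local cache; misses go to the remote data server."""
    def __init__(self, data_server):
        self.remote = data_server
        self.cache = {}

    def get(self, key):
        if key not in self.cache:               # local miss
            self.cache[key] = self.remote.fetch(key)
        return self.cache[key]                  # local hit thereafter

ds = DataServer({"acct:1": 500})
app = AppServer(ds)
app.get("acct:1")   # miss: fetched from the data server
app.get("acct:1")   # hit: served locally
print(ds.fetches)   # prints 1
```

Note that nothing was pre-loaded: only the data server held data up front, and the application server's cache filled itself as queries arrived.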

Conclusion

The primary reason for using in-memory databases is speed. But although they are fast, in-memory databases often suffer from poor scalability, lack of SQL support, excessive hardware requirements, and the risk of losing data due to unplanned outages.

Caché is the only persistent database that provides performance equal to that of in-memory databases. It also supports extremely large data sets, seamlessly allows data access via both SQL and objects, enables distributed systems of hundreds of machines, and is highly reliable. All of this makes Caché an attractive alternative for applications that must process very high volumes of data at very high speed.

About InterSystems

InterSystems Corporation is a global software technology leader with headquarters in Cambridge, Massachusetts, and offices in 23 countries. InterSystems provides innovative products that enable fast development, deployment, and integration of enterprise-class applications. InterSystems Caché is a high-performance object database that makes applications faster and more scalable. InterSystems Ensemble is a rapid integration and development platform that enriches applications with new functionality and makes them connectable. InterSystems HealthShare is a platform that enables the fastest creation of an Electronic Health Record for regional or national health information exchange. InterSystems DeepSee is software that makes it possible to embed real-time business intelligence in transactional applications, enabling better operational decisions. For more information, visit InterSystems.com.

InterSystems Corporation
World Headquarters
One Memorial Drive
Cambridge, MA 02142-1356
Tel: +1.617.621.0600
Fax: +1.617.494.1631
InterSystems.com

InterSystems Ensemble and InterSystems Caché are registered trademarks of InterSystems Corporation. InterSystems DeepSee and InterSystems HealthShare are trademarks of InterSystems Corporation. Other product names are trademarks of their respective vendors.
Copyright © 2010 InterSystems Corporation. All rights reserved. 1-10