BigMemory: Providing competitive advantage through in-memory data management




BUSINESS WHITE PAPER

BigMemory: Ultra-fast RAM + big data = business power

TABLE OF CONTENTS
Introduction
BigMemory: two ways to drive real-time big data
BigMemory explained: how it works
Understanding performance gains
Managing scale with BigMemory
Conclusion

Introduction

The combination of plummeting RAM prices and the rise of big data is a game-changer in nearly every industry. With in-memory data management, credit card companies that previously ran risk analysis in 45-minute batch jobs can now detect fraud in the time it takes to swipe a card, saving billions of dollars. Media distributors can synchronize content on multiple devices down to the second. Retailers and customer service organizations can boost the number and quality of customer interactions by orders of magnitude. Simply put, companies that make the most of in-memory big data will win. Big.

BigMemory is the world's easiest, most powerful in-memory data management platform. BigMemory makes your data available in real time to your applications. Because it works without any proprietary virtual machines or special hardware, it's a snap to install. It's also wildly cost effective.

Before BigMemory, the maximum addressable memory of a single Java Virtual Machine (JVM), beyond which garbage collection tuning becomes impractically difficult, was around four gigabytes. With BigMemory, the maximum addressable memory of a single JVM is limited only by the amount of available RAM on your servers. At terabyte scale, BigMemory can slash application server footprint by 100x or more.

With BigMemory, you get:

• Real-time access to terabytes of in-memory data
• High throughput with low, predictable latency
• Support for Java, .NET/C# and C++ applications
• 99.999 percent uptime
• Linear scale
• Data consistency guarantees across multiple servers
• Optimized data storage across RAM and SSD
• SQL support for querying in-memory data
• Reduced infrastructure costs through maximum hardware utilization
• High-performance, persistent storage for durability and ultra-fast restart
• Advanced monitoring, management and control
• Ultra-fast in-memory data stores that automatically move data where it's needed
• Support for data replication across multiple data centers for disaster recovery

BigMemory: two ways to drive real-time big data

According to the McKinsey Global Institute¹, big data is overwhelming the ability of typical database software tools to capture, store, manage and analyze it. BigMemory lets you move terabytes of high-value data out of slow, expensive disk-bound databases and mainframes and into memory, where applications can use it most effectively.

BigMemory comes in two flavors: BigMemory Go and BigMemory Max. Both ship in a Java Archive (JAR) file that you can easily plug into your applications with a few lines of configuration. Your application then reads and writes data through a simple put/get/search API (a sketch follows below).

BigMemory Go (Figure 1) adds advanced in-memory data management capabilities to a single JVM. Traditionally, Java applications were limited to a heap of two to four gigabytes due to the limitations of garbage collection. BigMemory Go's unique off-heap data store gives your application easy, fast access to a terabyte or more of data in memory without garbage collection pauses.

BigMemory Max is for applications that need to access up to terabytes of in-memory big data across multiple application servers. BigMemory Go lets you scale up; BigMemory Max (Figure 2) lets you scale up and out. Because applications can access so much more memory in a single JVM (Figure 3), BigMemory Max reduces your applications' requirements for JVMs and server hardware by up to 90 percent. By storing more data in memory locally, BigMemory reduces reliance on back-end databases.

Figure 1: BigMemory Go - in-memory big data on a single JVM (commodity server with a Java/Ehcache API, off-heap store and disk store)

¹ McKinsey Global Institute, Big Data: The Next Frontier for Innovation, Competition, and Productivity, p. 10
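To make the put/get API concrete, here is a minimal sketch using the Ehcache 2.x API that BigMemory exposes. It assumes an ehcache.xml on the classpath that defines a cache named "quotes"; the cache name and the values are purely illustrative.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class QuickStart {
    public static void main(String[] args) {
        // Loads ehcache.xml from the classpath; the "quotes" cache and its
        // heap/off-heap sizing would be defined there.
        CacheManager manager = CacheManager.newInstance();
        Cache quotes = manager.getCache("quotes");

        // Values are stored as plain Java objects.
        quotes.put(new Element("ORCL", 41.52));

        Element hit = quotes.get("ORCL");
        if (hit != null) {
            System.out.println("ORCL = " + hit.getObjectValue());
        }

        manager.shutdown();
    }
}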

Figure 2: BigMemory Max - in-memory big data across multiple application servers (increased data in memory; reduced reliance on the back-end database)

BigMemory explained: how it works

In life, we're often faced with a tradeoff between volume and speed. It might be nice to keep a year's worth of paper towels in the house, but most people just can't afford the space. Better to make trips to the supermarket every so often.

Data management systems pose the same kind of tradeoff. CPUs shift data between small, high-speed memory and larger, slower memory to maximize efficiency. Hierarchical storage management systems migrate files from, say, high-performance SAN devices to slower, cheaper SATA disks and, finally, to tape.

BigMemory employs a similar tiered approach to managing your application's data in memory, automatically moving it between the different tiers as needed. The top two tiers, the JVM heap memory and the in-process, off-heap store, use the application server host's RAM. Since application server hardware typically ships with tens of gigabytes of RAM and can be inexpensively upgraded with hundreds of gigabytes or more, BigMemory can efficiently store terabytes of data in RAM, where your application can most readily use it. A configuration sketch for sizing these tiers follows below.

Figure 3: By maximizing hardware utilization, BigMemory Max can slash server hardware costs by up to 90 percent (a commodity server using 2 GB of heap without BigMemory versus 1 TB of RAM with BigMemory).
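As a rough illustration of how the heap and off-heap tiers are sized, the following sketch uses the Ehcache 2.x programmatic configuration classes. The cache name and the gigabyte figures are assumptions made for the example, not recommendations, and the same settings can equally be declared in ehcache.xml.

import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;
import net.sf.ehcache.config.MemoryUnit;

public class TieredConfig {
    public static void main(String[] args) {
        // Keep the Java heap tier small to avoid GC pressure, and put the
        // bulk of the data in the off-heap store, which uses host RAM
        // outside the garbage-collected heap (the JVM must be started with
        // a large enough -XX:MaxDirectMemorySize).
        Configuration managerConfig = new Configuration()
                .cache(new CacheConfiguration().name("customers")
                        .maxBytesLocalHeap(1, MemoryUnit.GIGABYTES)
                        .maxBytesLocalOffHeap(64, MemoryUnit.GIGABYTES));

        CacheManager manager = CacheManager.create(managerConfig);
        // ... use manager.getCache("customers") as in the earlier sketch ...
        manager.shutdown();
    }
}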

BigMemory Max, for applications that deploy on multiple application servers and use more data than will typically fit in memory on a single application server machine, uses tiers comprised of the combined RAM of a scalable array of distributed data servers. With BigMemory Max, the client interface also maintains a TCP connection to the server array, which manages the movement of data between the different tiers as needed by the application. BigMemory Hybrid is a hybrid mode that adds the flexibility to use Flash as a storage tier.

Figure 4 shows the topology of BigMemory Max's storage tiers. The top tier, the Java heap memory in the application JVM, contains a maximum of two gigabytes of data, accessible in nanoseconds. The local tier, the off-heap memory store in the application JVM, typically stores tens to hundreds of gigabytes of data, accessible in microseconds. Finally, the distributed BigMemory Max array keeps hundreds of gigabytes to terabytes of data accessible to multiple application servers in milliseconds. Figure 5 shows the speed of each tier for BigMemory Max.

Figure 4: BigMemory Max's distributed RAM store (application servers scale up; active and mirror Terracotta Server Array stripes on commodity hardware scale out over TCP)

Figure 5: Tiered memory store and architecture

Tier                                                    Speed (TPS)    Size (GB)
Heap store (DRAM)                                       2,000,000+     2
Off-heap store                                          1,000,000      1,000
Disk store (SSD/Flash, BigMemory Hybrid)                100,000        10,000+
External data source (e.g., database, Hadoop, data warehouse)   1,000s
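For the BigMemory Max topology described above, the client points its CacheManager at the Terracotta Server Array and marks its caches as distributed. The sketch below uses the Ehcache 2.x programmatic configuration; the host name, port and sizes are illustrative assumptions, and the equivalent can be declared in ehcache.xml.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;
import net.sf.ehcache.config.MemoryUnit;
import net.sf.ehcache.config.TerracottaClientConfiguration;
import net.sf.ehcache.config.TerracottaConfiguration;

public class MaxClient {
    public static void main(String[] args) {
        // Point the client at the Terracotta Server Array; host and port
        // below are illustrative.
        Configuration config = new Configuration()
                .terracotta(new TerracottaClientConfiguration().url("tsa-host:9510"))
                .cache(new CacheConfiguration().name("orders")
                        .maxBytesLocalHeap(512, MemoryUnit.MEGABYTES)
                        .maxBytesLocalOffHeap(32, MemoryUnit.GIGABYTES)
                        // Marking the cache as Terracotta-clustered keeps it
                        // in sync with the server array over TCP.
                        .terracotta(new TerracottaConfiguration()));

        CacheManager manager = CacheManager.create(config);
        Cache orders = manager.getCache("orders");
        // ... reads hit the local heap/off-heap tiers first, then the array ...
        manager.shutdown();
    }
}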

Understanding performance gains

BigMemory's tiered-store organization keeps data where applications need it for fast, predictable access, precisely when it's needed. Because local memory is fast and increasingly cheap and abundant, BigMemory keeps as much data locally as your available RAM permits.

Two kinds of fast: high throughput and predictably low latency

When measuring performance, it's tempting to look only at overall throughput: the average number of transactions or operations per unit of time. But average throughput tells only part of the story. For instance, a website may boast average throughput of thousands of requests per second, yet outlier responses can take minutes. For applications where response time is critical, such as risk analysis for online financial transactions, outlier response times are business killers.

In Java applications, most latency outliers can be traced to long garbage collection pauses associated with using large amounts of JVM heap memory. Back when servers shipped with less than 10 GB of RAM, you could make garbage collection pauses manageable if you had a development or dev-ops team with specialized tuning skills. Today's servers come with hundreds of gigabytes or more of RAM, but using all of it in your JVM heap is practically impossible: virtually no amount of tuning can prevent the garbage collector from freezing your applications for minutes at a time.

BigMemory keeps your application safe from the garbage collector by storing your data in memory, but not in the JVM heap. Applications using BigMemory can run with heaps small enough that the garbage collector never pauses the JVM, while keeping hundreds of gigabytes of data or more in memory. The result is a high-throughput, low-latency system unburdened by long, unpredictable garbage collection pauses.

Ultra-high availability: five 9's

BigMemory is fast. It's also highly reliable, delivering 99.999 percent uptime. One reason is that BigMemory Max's distributed architecture has no single point of failure. Each server in the BigMemory Max array replicates data in real time to a mirror. Should a server go offline, either for maintenance or due to unexpected hardware failure, its mirror will replace it with zero downtime. BigMemory servers also protect themselves by throttling unexpected load spikes, giving the system the ability to adjust to changing usage patterns rather than failing. And built-in security measures prevent unauthorized access to data and services.

Life-cycle management for flexible server deployment

Flexible server deployment is a key component of highly available systems. Servers that are fast and easy to tear down and bring up help avoid downtime. Servers that are slow to restart or reprovision create brittle systems that increase failure risk. At BigMemory's terabyte in-memory scale, managing the data life cycle is critical to run-time flexibility. BigMemory has comprehensive data life-cycle management capabilities that make server deployment fast and flexible.

Bulk loading

BigMemory bulk-loads data into RAM orders of magnitude faster than querying a database, achieving steady operating state quickly. In addition to bringing new servers online faster, BigMemory protects expensive database resources from the strain of serving terabytes of application data as those servers start up. This is especially important in systems with multiple application servers that would otherwise overwhelm a database with many simultaneous terabyte-scale queries, or require new servers to load from the database in series, which can slow deployment and add complexity. A warm-up sketch follows below.
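As a hedged illustration of bulk loading, the sketch below pre-loads a distributed cache using the bulk-load mode exposed by the Ehcache 2.x API; the cache name and the in-memory data source are assumptions made for the example.

import java.util.Map;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class Warmup {
    // Pre-load a distributed cache in bulk-load mode, which relaxes
    // consistency while data is streamed in, then restore normal operation.
    // The "source" map stands in for whatever supplies the initial data
    // (database query, file, Hadoop job, and so on).
    static void warmUp(CacheManager manager, Map<String, Object> source) {
        Cache cache = manager.getCache("orders");
        cache.setNodeBulkLoadEnabled(true);
        try {
            for (Map.Entry<String, Object> entry : source.entrySet()) {
                cache.put(new Element(entry.getKey(), entry.getValue()));
            }
        } finally {
            cache.setNodeBulkLoadEnabled(false);
            // Block until every node in the cluster has left bulk-load mode.
            cache.waitUntilClusterBulkLoadComplete();
        }
    }
}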

High-performance persistent restartable store

BigMemory automatically replicates in-memory data to a high-performance, persistent disk store so servers can restart and reach steady operating state in seconds. This adds to run-time flexibility and shields expensive database resources from the cost of redundant data-load operations when servers are restarted.

WAN replication

Enterprises that maintain operations in multiple data centers for high availability, geographic distribution or disaster recovery take advantage of BigMemory's WAN replication. BigMemory can be configured to keep data up to date across data centers, so that application load in one data center can be diverted to others as needed with no loss of data.

BigMemory's industry-standard API

Applications access data through BigMemory using the de facto Java standard Ehcache API. It combines the simple get and put methods of a key-value store with powerful query, search and analysis capabilities, giving applications unprecedented visibility into data that might otherwise be locked away in slow, expensive disk-bound databases (a search example appears at the end of this section).

BigMemory stores data as plain Java objects. This simplifies programming and enables applications to use data efficiently, without the overhead of the object-relational mapping transformation that comes with relational databases. Once data is in BigMemory, it stays in the format most easily used by the application.

BigMemory also works well in heterogeneous technology environments. Because BigMemory is deployed as an in-memory data service, our customers commonly use MOM, HTTP, REST and SOAP protocols to access it in a technology-agnostic way.

Managing scale with BigMemory

BigMemory is not only fast and reliable; it also helps you effectively scale up and out over time to meet the rapidly evolving data requirements of today's applications.

Linear scalability

BigMemory scales without bottlenecks along multiple dimensions. It scales up a single application server by maximizing hardware utilization and taking full advantage of the cheap, abundant memory on today's commodity hardware. BigMemory Max scales out the application server tier through seamless data management across application servers. The BigMemory Max server array scales out linearly, without impacting latency, providing ample headroom to support the growth of applications and data over time.

Data consistency at scale

When applications scale across multiple servers, it's crucial to manage data consistency between those servers. BigMemory offers a range of consistency guarantees, from strict XA-compliant transactions to eventual consistency, all configurable on a per-dataset basis. Whether in a single-JVM deployment (BigMemory Go) or distributed across multiple application servers (BigMemory Max), BigMemory manages data consistency over time according to your requirements.

Monitoring, management and control

BigMemory comes with a full suite of monitoring, management and control tools and capabilities. BigMemory's Automatic Resource Control (ARC) capability lets operators shape the allocation of memory on a per-dataset basis in each tier. When operators set a maximum memory allocation for a dataset in a tier, BigMemory automatically maintains the size of that dataset within the allocation parameters, migrating data between tiers as necessary. ARC also offers a data-pinning option to guarantee that critical, frequently used data is always available in local memory.
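Returning to the Ehcache API described above, here is a minimal sketch of the search capability. It assumes a cache configured as searchable with a "region" attribute; the cache name, attribute and value are illustrative.

import net.sf.ehcache.Cache;
import net.sf.ehcache.search.Attribute;
import net.sf.ehcache.search.Query;
import net.sf.ehcache.search.Result;
import net.sf.ehcache.search.Results;

public class RegionSearch {
    // Find the values whose "region" attribute equals "EMEA". The cache is
    // assumed to be declared searchable, with "region" defined as a search
    // attribute in its configuration.
    static void findEmea(Cache customers) {
        Attribute<String> region = customers.getSearchAttribute("region");
        Query query = customers.createQuery()
                .addCriteria(region.eq("EMEA"))
                .includeValues();
        Results results = query.execute();
        for (Result result : results.all()) {
            System.out.println(result.getValue());
        }
        results.discard();
    }
}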

Through the management console, operators get run-time visibility into memory allocation at each tier and can see application behavior and memory utilization, allowing them to make intelligent adjustments where necessary. BigMemory also captures run-time statistics on your server topology, server health, data access performance (by server and dataset), as well as remote JVM operating characteristics and thread dumps. All analytics are available through a number of formats, including log messages, a RESTful API, and our own management console for administration, monitoring and management. You can also get statistics as JMX events for surfacing metrics into your own dashboards and monitoring software.

Configurable run-time events alert operators to changes in data store sizes, data access rates and other performance indicators. Operators may then enable or disable data stores, adjust store sizes per tier by dataset, and adjust data freshness parameters. They can also control the remote server life cycle to adjust data center topology and initiate remote backup procedures.

BigMemory's run-time visibility and alerting capabilities provide insight into what's happening in the data center. Operators can then use BigMemory's remote server management and run-time control capabilities to take necessary action.

Conclusion

BigMemory is the easiest, most powerful way to take advantage of the in-memory revolution. With BigMemory, you get fast, predictable access to all of your data, up to hundreds of terabytes, without garbage collection pauses. BigMemory's two product editions, BigMemory Max and BigMemory Go, also give you all the reliability, availability and consistency that you've come to expect from traditional disk-based data management systems.

Find out how to power up your Digital Enterprise at www.softwareag.com

ABOUT SOFTWARE AG

Software AG helps organizations achieve their business objectives faster. The company's big data, integration and business process technologies enable customers to drive operational efficiency, modernize their systems and optimize processes for smarter decisions and better service. Building on over 40 years of customer-centric innovation, the company is ranked as a leader in 14 market categories, fueled by its core product families Adabas-Natural, Alfabet, Apama, ARIS, Terracotta and webMethods. Learn more at www.softwareag.com.

© 2014 Software AG. All rights reserved. Software AG and all Software AG products are either trademarks or registered trademarks of Software AG. Other product and company names mentioned herein may be the trademarks of their respective owners.

SAG_Terracotta_GoMax_8PG_WP_Aug14