Developing Scalable Java Applications with Cacheonix



Similar documents
JBoss & Infinispan open source data grids for the cloud era

BigMemory & Hybris : Working together to improve the e-commerce customer experience

In Memory Accelerator for MongoDB

MakeMyTrip CUSTOMER SUCCESS STORY

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Understanding Neo4j Scalability

Hadoop Distributed File System (HDFS) Overview

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

ORACLE COHERENCE 12CR2

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Chapter 18: Database System Architectures. Centralized Systems

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Practical Cassandra. Vitalii

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

<Insert Picture Here> Getting Coherence: Introduction to Data Grids South Florida User Group

Tier Architectures. Kathleen Durant CS 3200

<Insert Picture Here> Oracle In-Memory Database Cache Overview

BigMemory: Providing competitive advantage through in-memory data management

Ground up Introduction to In-Memory Data (Grids)

SCALABILITY AND AVAILABILITY

Client/Server and Distributed Computing

Principles and characteristics of distributed systems and environments

Big Data Analytics - Accelerated. stream-horizon.com

Hadoop: Embracing future hardware

Liferay Performance Tuning

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

Distributed Systems. REK s adaptation of Prof. Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 1

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Chapter 2 TOPOLOGY SELECTION. SYS-ED/ Computer Education Techniques, Inc.

Application Performance Management for Enterprise Applications

Scaling out a SharePoint Farm and Configuring Network Load Balancing on the Web Servers. Steve Smith Combined Knowledge MVP SharePoint Server

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Rakam: Distributed Analytics API

Copyright 1

Distributed File Systems

Client/Server Computing Distributed Processing, Client/Server, and Clusters

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Cloud Computing Trends

Advances in Flash Memory Technology & System Architecture to Achieve Savings in Data Center Power and TCO

Performance Testing of Java Enterprise Systems

WITH BIGMEMORY WEBMETHODS. Introduction

Active/Active DB2 Clusters for HA and Scalability

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

How To Virtualize A Storage Area Network (San) With Virtualization

IBM WebSphere Distributed Caching Products

Boost SQL Server Performance Buffer Pool Extensions & Delayed Durability

Distributed Systems LEEC (2005/06 2º Sem.)

CitusDB Architecture for Real-Time Big Data

1 How to Monitor Performance

Big Data Sharing with the Cloud - WebSphere extreme Scale and IBM Integration Bus Integration

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

How To Understand The Concept Of A Distributed System

Performance Tuning and Optimizing SQL Databases 2016

Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier

BBM467 Data Intensive ApplicaAons

Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks. An Oracle White Paper April 2003

Web DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing)

ScaleArc idb Solution for SQL Server Deployments

HDFS Users Guide. Table of contents

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Design and Evolution of the Apache Hadoop File System(HDFS)

An Oracle White Paper July Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide

Integrated Application and Data Protection. NEC ExpressCluster White Paper

White Paper ClearSCADA Architecture

I N T E R S Y S T E M S W H I T E P A P E R F O R F I N A N C I A L SERVICES EXECUTIVES. Deploying an elastic Data Fabric with caché

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE

Chapter 3. Database Environment - Objectives. Multi-user DBMS Architectures. Teleprocessing. File-Server

Cloud Operating Systems for Servers

Using VMWare VAAI for storage integration with Infortrend EonStor DS G7i

Tachyon: Reliable File Sharing at Memory- Speed Across Cluster Frameworks

Fast Data in the Era of Big Data: Twitter s Real-

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April Page 1 of 12

Network Infrastructure Services CS848 Project

A Performance Analysis of Distributed Indexing using Terrier

MySQL High-Availability and Scale-Out architectures

Project Convergence: Integrating Data Grids and Compute Grids. Eugene Steinberg, CTO Grid Dynamics May, 2008

JoramMQ, a distributed MQTT broker for the Internet of Things

From Spark to Ignition:

Hadoop. History and Introduction. Explained By Vaibhav Agarwal

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Shark Installation Guide Week 3 Report. Ankush Arora

WEB11 WebSphere extreme Scale et WebSphere DataPower XC10 Appliance : les solutions de caching élastique WebSphere

CommuniGate Pro White Paper. Dynamic Clustering Solution. For Reliable and Scalable. Messaging


DFSgc. Distributed File System for Multipurpose Grid Applications and Cloud Computing

Big Fast Data Hadoop acceleration with Flash. June 2013

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

Transcription:

Developing Scalable Java Applications with Cacheonix

Introduction Presenter: Slava Imeshev Founder and main committer, Cacheonix Frequent speaker on scalability simeshev@cacheonix.com www.cacheonix.com/blog/

Cacheonix A distributed data management framework Program your distributed applications as easy as if they were singe-jvm applications, with APIs for: Distributed cache Distributed HashMap In-memory data grid Distributed locks Distributed ConcurrentHashMap Distributed data processing Cluster management Open source (LGPL)

When Single Server Is Not Enough Sooner or later your application will have to process more requests than a single server can handle You need to distribute your application to multiple servers (LAN, AWS, etc) A.K.A. horizontal scalability

Scaling Horizontally

Distributed Systems Processes communicate over the network instead of local memory Distributed programming is easy to do poorly and surprisingly tricky to do well: The network in unreliable The latency varies wildly The bandwidth is limited Topology changes The network is nonuniform

Problems to be Solved by Distributed Applications Distributed applications must address a lot of concerns that don t exist in single-jvm applications 1. Scalability bottlenecks 2. Reliability 3. Concurrency 4. State sharing 5. Data consistency 6. Load balancing 7. Failure management 8. Make sure it is easy to develop!

Horizontal Scalability Horizontal scalability is an ability to handle additional load by adding more servers Horizontal scalability offers a much greater benefit as compared to vertical scalability (2-1000 times improvement in capacity)

Bottleneck Problem Horizontal scalability is hard to achieve because of ever-present bottlenecks A bottleneck is a shared server or a service that: All or most requests must go through Request latency is proportional number of requests (100 requests 1 ms/req., 1000 requests 5 ms/req.) Examples: Databases, Hadoop clusters, file systems, mainframes

Bottleneck-Free System

Systems That Cannot Scale Added 2 more app servers Expected x3 increase in capacity Got only x2 System hit scalability limit Capacity of the database or other data source is a bottleneck

Solution To Bottleneck Problem: Distributed Cache Cacheonix implements a distributed cache that provides a large clustered in-memory data store for hard-to-get, frequently-read data The application is reading from the cache instead of being stuck in reading from the slow database

Distributed Cache Cacheonix provides: Strict data consistency - the result of an update is immediately observed on all members of the cluster Load balancing cached data is distributed evenly among servers as members join and leave the cluster High availability - Cacheonix provides uninterrupted, consistent data access in presence of server failures and cluster reconfiguration

Distributed Cache Cacheonix offers: Cache coherence for strict data consistency Partitioning for load balancing Replication for high availability Ease of use: Standard java.util.map interface

Distributed Cache Cacheonix cache plugins for ORM frameworks: Hibernate MyBatis DataNucleus

Distributed Cache

Reliability Problem Reliability is an ability of the system to continue to function normally in presence of failures of cluster members Processing of user requests must be automatically picked up by operational servers Reliability is hard: Cluster members leave and join Networks fail Servers die

Solution to Reliability Problem Cacheonix provides: Data replication Even replica storage Unique replication protocol Instant recovery from failures

Distributed Concurrency Problem Threads must prevent reading partially updated shared objects Threads need to coordinate (synchronize) access to shared objects Distributed concurrency is hard: Servers communicate using a network Servers no longer share memory space Servers may fail while holding locks

Distributed Concurrency Solution Cacheonix provides: Distributed ReadWriteLocks Distributed ConcurrentHashMap

Distributed ReadWriteLocks Fault-tolerant for liveness Locks are released when a lock-holding server fails or leaves the cluster Reliable for high availability Locks are maintained as long as there is at least a single live server in the cluster Strictly consistent All servers immediately observe mutual exclusions New members of the cluster observe existing locks

Distributed ReadWriteLocks

Problem of Distributed State Sharing Threads need to access shared state in order to do useful work State sharing in a single JVM is trivial because of the local memory space Distributed state sharing is hard: Servers communicate using the network Distributed applications no longer share the memory space

Solution to Distributed State Sharing Problem Cacheonix provides: Distributed HashMap

Distributed HashMap Strictly consistent Guarantees that all servers immediately see the updates to the data Easy to use java.util.map interface Reliable Retains the data as servers fail or join the cluster

Designing for Running in Cluster Store state shared between threads in Maps. Convert the code below: to:

Failure Management Distributed applications experience failures not seen by single-jvm applications because networks are unreliable and servers die Event: Cluster partitioning causes a minority cluster to block Result: distributed operations may block for extended periods of time to avoid consistency errors Event: Cluster reconfiguration leads to leaving the minority cluster and joining the majority cluster Result: Locks and other consistent operations in progress are no longer valid and must be cancelled

Failure Management Cacheonix: Provides an ability to report a blocked cluster state for communicating it to the end user Detects change in cluster configuration (joining other cluster) and cancel consistent operations by throwing exceptions (lock()/unlock() and put ()/get()) Helps to prepare the application for dealing with such conditions, minimally gracefully reporting a error to the user.

Cluster Management and Data Distribution Protocol Cacheonix protocol: Symmetric clustering No single point of failure Wire-level Highest possible speed Data distribution Reliable Strictly consistent

Cluster Management and Data Distribution Protocol Cacheonix protocol enables: Distributed caching, Data replication, Reliable distributed locks, Consistent state sharing and Cluster management

Distributed Architecture

Tying It All Together: Distributed Data Management Framework Cacheonix

How about single-server applications?

Single-Server Architecture

Vertical Scalability Vertical scalability is handling additional load by adding more power to a single machine Vertical scalability is trivial to achieve. Just switch to a faster CPU, add more RAM or replace an HDD with an SSD Vertical scalability can be limited by bottlenecks: Databases Expensive calculations

Scaling Vertically with Cacheonix Cacheonix provides a fast local cache Eliminates database bottlenecks Improves performance Prepares for scaling in a cluster Use cases Local front cache Local query cache Local L2 cache for Hibernate, MyBatis and DataNucleus

Q & A

Cacheonix Open Source Distributed Data Management Framework Ease of development, Reliable distributed cache, Strict data consistency, Replicated distributed locks, State sharing in a cluster, Distributed ConcurrentHashMap, Cluster management, Fast local cache, And more!

Download Cacheonix at downloads.cacheonix.com Cacheonix wiki: wiki.cacheonix.com