In-Memory BigData. Summer 2012, Technology Overview
1 In-Memory BigData Summer 2012, Technology Overview
2 Company Vision
In-Memory Data Processing Leader:
> 5 years in production
> 100s of customers
> Starts every 10 secs worldwide
> Over 10,000,000 starts globally
> Unique in-memory compute + data grid technology
3 In-Memory Processing Facts
> 64-bit CPUs can address 16 exabytes
> Disk is up to 10^7 times slower than RAM
> RAM prices drop 30% every 18 months (projected in the sketch below)
> 1 GB costs < $1
> 1 TB RAM & 48-core cluster ~ $40K
> Multicore CPUs are ideal for in-memory parallelization
> Speed matters: Citi: 100ms == $1M; Google: 500ms == 20% traffic drop
In-memory will have an industry impact comparable to the web and the cloud. RAM is the new disk, and disk is the new tape.
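As a quick sanity check on the economics above, the following back-of-the-envelope Java snippet projects the raw RAM cost of holding a working set in memory, using the slide's figures (~$1 per GB today, prices falling ~30% every 18 months). The 10 TB dataset size and the five-period horizon are illustrative assumptions, not figures from the deck.

    // Back-of-the-envelope projection of in-memory capacity cost,
    // using the slide's figures: ~$1 per GB today, ~30% cheaper every 18 months.
    public class RamCostProjection {
        public static void main(String[] args) {
            double costPerGb = 1.0;      // USD per GB of RAM (slide figure)
            double drop = 0.30;          // price drop per 18-month period (slide figure)
            long datasetGb = 10_000;     // 10 TB working set (illustrative assumption)

            for (int period = 0; period <= 5; period++) {
                double total = costPerGb * datasetGb;
                System.out.printf("After %2d months: ~$%,.0f for %d GB of RAM%n",
                    period * 18, total, datasetGb);
                costPerGb *= (1.0 - drop);
            }
        }
    }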
4 GridGain 4: Three Editions
Different markets, customers, messages, needs:
> Compute Grid Edition
> Data Grid Edition
> Big Data Edition
5 GridGain 4: At A Glance
> Scalable In-Memory Data Platform
> Compute Grid + In-Memory Data Grid; Real-Time & Streaming MapReduce, CEP
> TBs of data and 1000s of nodes (typically 10s of TBs and 100s of nodes)
> In-Memory Speed, Database Reliability
> Native: Java, Scala and Groovy DSLs
> Clients: C++, .NET, iOS, Android, PHP, REST
> Distributed in-memory object store
6 GridGain 4: New Features
1. In-Memory Data Grid
2. In-Memory Compute Grid
3. Streaming MapReduce
4. Clustering
5. Messaging
6. Advanced Security
7. DevOps GUI Console
8. SPI Architecture
9. Zero Deployment
10. Native Client APIs
11. Java, Scala, Groovy
12. Advanced Load Balancing
13. Pluggable Fault Tolerance
14. Hadoop Integration
7 Clustering (GridGain 4)
Sophisticated clustering capabilities for the JVM, with the ability to connect and manage a heterogeneous set of computing devices:
> Pluggable cluster topology management & various consistency strategies
> Pluggable automatic discovery on LAN, WAN, and AWS
> Pluggable split-brain cluster segmentation resolution
> Unicast, broadcast, and actor-based cluster-wide message exchange
> Pluggable event storage and propagation
> Versioning
> Support for complex leader election algorithms (see the sketch below)
> On-demand and direct deployment
> Support for virtual clusters and grouping
> Integration with Hadoop ZooKeeper
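To make the leader-election bullet concrete, here is a toy model in plain Java: every node applies the same deterministic rule (smallest node ID wins) to the discovered topology, so all members agree on the coordinator without extra coordination traffic. This is an illustration of the idea only, not GridGain's actual election algorithm.

    import java.util.*;

    // Toy coordinator election: each node independently picks the smallest node ID
    // from the discovered topology, so every member reaches the same answer.
    // Illustration only -- not GridGain's actual election algorithm.
    public class LeaderElectionSketch {
        static UUID electCoordinator(Collection<UUID> topology) {
            return Collections.min(topology);   // UUID is Comparable, so min() is well defined
        }

        public static void main(String[] args) {
            Set<UUID> discoveredNodes = new HashSet<UUID>(Arrays.asList(
                UUID.randomUUID(), UUID.randomUUID(), UUID.randomUUID()));
            System.out.println("Coordinator: " + electCoordinator(discoveredNodes));
        }
    }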
8 Advanced Security (GridGain 4)
> Cluster Security
> Client Security
> JAAS-based Authentication (see the sketch below)
> Secure Session
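Since authentication is JAAS-based, a generic JAAS login flow using only the standard javax.security.auth classes gives a feel for what plugs in underneath. The login configuration entry name "GridLogin" and the credentials below are hypothetical; a real run requires a JAAS login module configured separately (for example via -Djava.security.auth.login.config).

    import javax.security.auth.Subject;
    import javax.security.auth.callback.*;
    import javax.security.auth.login.LoginContext;
    import javax.security.auth.login.LoginException;

    // Generic JAAS login flow (standard Java SE API). Assumes a JAAS configuration
    // entry named "GridLogin" has been set up separately.
    public class JaasLoginSketch {
        public static void main(String[] args) throws LoginException {
            CallbackHandler handler = new CallbackHandler() {
                public void handle(Callback[] callbacks) throws UnsupportedCallbackException {
                    for (Callback cb : callbacks) {
                        if (cb instanceof NameCallback) {
                            ((NameCallback) cb).setName("gridUser");           // hypothetical user
                        } else if (cb instanceof PasswordCallback) {
                            ((PasswordCallback) cb).setPassword("secret".toCharArray());
                        } else {
                            throw new UnsupportedCallbackException(cb);
                        }
                    }
                }
            };

            LoginContext ctx = new LoginContext("GridLogin", handler);  // hypothetical entry name
            ctx.login();                                                 // runs the configured login modules
            Subject subject = ctx.getSubject();
            System.out.println("Authenticated principals: " + subject.getPrincipals());
            ctx.logout();
        }
    }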
9 SPI Architecture (GridGain 4)
Fourteen SPIs provide plug-and-play capabilities to replace and customize every significant subsystem of the GridGain runtime (see the sketch below):
1. Checkpoint SPI
2. Collision SPI
3. Authentication SPI
4. Secure Session SPI
5. Indexing SPI
6. Load Balancing SPI
7. Communication SPI
8. Deployment SPI
9. Swap Space SPI
10. Metrics SPI
11. Discovery SPI
12. Failover SPI
13. Topology SPI
14. Event Storage SPI
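To illustrate what "pluggable SPI" means in practice, here is a simplified model (these are illustrative types, not GridGain's actual SPI interfaces): the runtime codes against a small load-balancing contract, and either implementation can be swapped in without touching the job-submission code.

    import java.util.List;
    import java.util.Random;
    import java.util.UUID;
    import java.util.concurrent.atomic.AtomicInteger;

    // Simplified model of a pluggable SPI: callers depend only on the interface,
    // so implementations can be replaced wholesale. Illustrative types only.
    public class SpiPluggabilitySketch {
        interface LoadBalancingSpi {
            UUID pickNode(List<UUID> topology);
        }

        static class RoundRobinSpi implements LoadBalancingSpi {
            private final AtomicInteger next = new AtomicInteger();
            public UUID pickNode(List<UUID> topology) {
                return topology.get(Math.floorMod(next.getAndIncrement(), topology.size()));
            }
        }

        static class RandomSpi implements LoadBalancingSpi {
            private final Random rnd = new Random();
            public UUID pickNode(List<UUID> topology) {
                return topology.get(rnd.nextInt(topology.size()));
            }
        }

        public static void main(String[] args) {
            List<UUID> topology = List.of(UUID.randomUUID(), UUID.randomUUID(), UUID.randomUUID());
            LoadBalancingSpi spi = new RoundRobinSpi();   // swap in RandomSpi without changing callers
            for (int i = 0; i < 4; i++) {
                System.out.println("Job " + i + " -> node " + spi.pickNode(topology));
            }
        }
    }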
10 Native Clients (GridGain 4)
> Java (EE & Android)
> C++
> .NET C#
> Objective-C
> REST
> Memcache
11 Java, Scala, Groovy (GridGain 4)
> Java 6
> Scala 2.9
> Groovy 1.8 and Groovy++
> Scalar - Scala DSL for GridGain
> Grover - Groovy++ DSL for GridGain
12 Hadoop Integration (GridGain 4)
> HBase cache store
> ZooKeeper discovery integration
> Distributed bulk data loader
> Hadoop-compatible distributed file system
> In-memory & high-performance alternative to HDFS
13 DevOps Console (GridGain 4)
14 Success Stories
> Trading Systems: handle large volumes of transactions
> Real-Time Risk Analysis: analysis of trading positions & risk
> Online Gaming: real-time online backbone for gaming
> Actuarial Analysis: insurance rating and modeling
> Geo Mapping: real-time geographical route and traffic information
> Bioinformatics: real-time DNA sequencing and matching
15 GridGain Customers
16 In-Memory Data Grid Features (1)
> Java-based distributed in-memory store
> Zero deployment for data
> Local, fully replicated, and partitioned cache types
> Pluggable expiration policies (LRU, LFU, FIFO, time-based and random)
> Read-through and write-through (see the sketch below)
> Pluggable cache store (SQL, ERP, Hadoop)
> Synchronous & asynchronous cache operations
> MVCC-based concurrency
> Pluggable data overflow storage
> PESSIMISTIC & OPTIMISTIC ACID transactions
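The read-through/write-through behaviour can be sketched as follows. The CacheStore interface and the map-backed "database" are simplified stand-ins for the pluggable cache store named above (which could be SQL, an ERP system, Hadoop, ...), not the actual GridGain API.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Simplified model of read-through / write-through caching with a pluggable store.
    public class ReadWriteThroughSketch {
        interface CacheStore<K, V> {
            V load(K key);            // called on a cache miss (read-through)
            void write(K key, V val); // called on every cache put (write-through)
        }

        static class MapBackedStore implements CacheStore<String, String> {
            private final Map<String, String> database = new HashMap<String, String>();
            public String load(String key)            { return database.get(key); }
            public void write(String key, String val) { database.put(key, val); }
        }

        static class Cache<K, V> {
            private final Map<K, V> entries = new ConcurrentHashMap<K, V>();
            private final CacheStore<K, V> store;
            Cache(CacheStore<K, V> store) { this.store = store; }

            V get(K key) {
                // Read-through: fall back to the underlying store on a miss.
                return entries.computeIfAbsent(key, store::load);
            }
            void put(K key, V val) {
                store.write(key, val);   // write-through: persist first
                entries.put(key, val);   // then update the in-memory copy
            }
        }

        public static void main(String[] args) {
            MapBackedStore store = new MapBackedStore();
            store.write("user:1", "Alice");                  // pre-existing "database" row
            Cache<String, String> cache = new Cache<String, String>(store);
            System.out.println(cache.get("user:1"));          // miss -> loaded from store
            cache.put("user:2", "Bob");                       // written through to the store
            System.out.println(store.load("user:2"));         // Bob
        }
    }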
17 In-Memory Data Grid Features (2)
> JTA/JTS integration
> Master/master data replication
> Master/master data invalidation
> Replication/invalidation in async/sync modes
> Write-behind cache store support (see the sketch below)
> Concurrent/delayed transactional preloading
> Affinity routing with the compute grid
> Partitioned cache with active backups (replicas)
> Structured and unstructured data
> Transactional datacenter replication
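Write-behind differs from write-through in that store updates are acknowledged immediately and flushed asynchronously in batches. A minimal model of that batching behaviour follows; the types, keys, and the 100 ms flush interval are illustrative assumptions, not GridGain's write-behind implementation.

    import java.util.Map;
    import java.util.concurrent.*;

    // Minimal model of write-behind caching: puts update memory and queue the key;
    // a background thread flushes pending changes to the backing store periodically.
    public class WriteBehindSketch {
        private final Map<String, String> cache = new ConcurrentHashMap<String, String>();
        private final Map<String, String> backingStore = new ConcurrentHashMap<String, String>();
        private final BlockingQueue<String> dirtyKeys = new LinkedBlockingQueue<String>();
        private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

        void start() {
            // Flush pending updates every 100 ms instead of on every put.
            flusher.scheduleAtFixedRate(this::flush, 100, 100, TimeUnit.MILLISECONDS);
        }

        void put(String key, String value) {
            cache.put(key, value);   // fast in-memory update
            dirtyKeys.offer(key);    // store write is deferred
        }

        private void flush() {
            String key;
            while ((key = dirtyKeys.poll()) != null) {
                backingStore.put(key, cache.get(key));
            }
        }

        public static void main(String[] args) throws InterruptedException {
            WriteBehindSketch grid = new WriteBehindSketch();
            grid.start();
            grid.put("order:42", "NEW");
            Thread.sleep(300);       // give the background flusher a chance to run
            System.out.println("Store now holds: " + grid.backingStore);
            grid.flusher.shutdown();
        }
    }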
18 In-Memory Data Grid Features (3)
> Customizable/pluggable data indexing
> JDBC driver for in-memory data
> Co-located cache mode
> BigMemory (off-heap allocation) support
> Tiered storage with on-heap, off-heap, swap, SQL and Hadoop
> Distributed in-memory query support
> SQL-based affinity co-located queries
> Lucene-based text affinity co-located queries
> H2-based text affinity co-located queries
> Predicate-based full scan queries
> Support for pagination
> Local & remote filtering, transformation and reduction for the execution plan (see the sketch below)
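The filter/transform/reduce stages of a predicate-based scan query can be modelled with plain Java streams over the in-memory entries. In the real data grid the filter and transform run remotely on each node's partition and only partial results travel back to the caller; here everything runs locally, and the Trade type and values are illustrative.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Models a predicate-based full-scan query: filter (predicate), transform
    // (projection), reduce (sum) -- the same three stages a distributed scan query
    // would execute per partition before aggregating on the caller.
    public class ScanQuerySketch {
        static class Trade {
            final String symbol; final double notional;
            Trade(String symbol, double notional) { this.symbol = symbol; this.notional = notional; }
        }

        public static void main(String[] args) {
            Map<Long, Trade> cache = new ConcurrentHashMap<Long, Trade>();
            cache.put(1L, new Trade("AAPL", 1_000_000));
            cache.put(2L, new Trade("GOOG", 250_000));
            cache.put(3L, new Trade("AAPL", 500_000));

            double aaplExposure = cache.values().stream()
                .filter(t -> t.symbol.equals("AAPL"))   // filter
                .mapToDouble(t -> t.notional)           // transform
                .sum();                                 // reduce

            System.out.println("Total AAPL exposure: " + aaplExposure);
        }
    }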
19 In-Memory Compute Grid Features (1)
> Direct API for map/split and reduce/aggregate (see the sketch below)
> Pluggable failover management
> Pluggable topology resolution
> Pluggable collision resolution
> Distributed task session
> Distributed continuations and recursive split
> Streaming MapReduce
> Complex Event Processing (CEP)
> Node-local cache
> AOP-based, OOP/FP-based, sync/async execution modes
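The map/split and reduce/aggregate flow can be modelled with standard Java concurrency, where a fixed thread pool stands in for remote grid nodes: the job is split into closures, the closures are executed "remotely", and the partial results are reduced on the caller. The word-count-style example is illustrative and does not use the GridGain API.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    // Models split -> execute -> reduce. The thread pool stands in for remote nodes;
    // on a grid each closure would run on a different node in parallel.
    public class MapReduceSketch {
        public static void main(String[] args) throws Exception {
            String phrase = "in memory map reduce on a compute grid";
            ExecutorService cluster = Executors.newFixedThreadPool(4);

            // Split/map: one closure per word.
            List<Callable<Integer>> closures = new ArrayList<Callable<Integer>>();
            for (final String word : phrase.split(" ")) {
                closures.add(new Callable<Integer>() {
                    public Integer call() { return word.length(); }
                });
            }

            // Execute the closures and reduce/aggregate the partial results.
            int totalChars = 0;
            for (Future<Integer> partial : cluster.invokeAll(closures)) {
                totalChars += partial.get();
            }
            System.out.println("Total non-space characters: " + totalChars);
            cluster.shutdown();
        }
    }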
20 In-Memory Compute Grid Features (2)
> Direct closure distribution in Java, Scala and Groovy
> Cron-based task scheduling
> Direct redundant mapping support
> Zero deployment with P2P on-demand distributed class loading
> Partial asynchronous reduction
> Weighted and dynamic adaptive mapping
> State checkpoints for long-running tasks
> Early and late load balancing
> Affinity routing with the data grid (see the sketch below)
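Affinity routing with the data grid means a closure travels to the node that already owns the key's partition, so computation runs next to its data instead of pulling the data across the network. The simplified key-to-partition-to-node mapping below illustrates the idea; the partition count and assignment rule are assumptions, not GridGain's actual affinity function.

    import java.util.List;
    import java.util.UUID;

    // Simplified affinity routing: key -> partition -> primary node; work for that
    // key is dispatched to that node. Illustrative only.
    public class AffinityRoutingSketch {
        static final int PARTITIONS = 1024;

        static int partition(Object key) {
            return Math.floorMod(key.hashCode(), PARTITIONS);
        }

        static UUID primaryNode(int partition, List<UUID> topology) {
            // Naive partition-to-node assignment; real grids use rendezvous/consistent
            // hashing so that topology changes move as few partitions as possible.
            return topology.get(partition % topology.size());
        }

        public static void main(String[] args) {
            List<UUID> topology = List.of(UUID.randomUUID(), UUID.randomUUID(), UUID.randomUUID());
            String key = "portfolio:314";
            int part = partition(key);
            UUID target = primaryNode(part, topology);
            // A compute closure for this key would now be routed to 'target'.
            System.out.println("Key " + key + " -> partition " + part + " -> node " + target);
        }
    }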
21 GridGain Systems
1065 East Hillsdale Blvd., Suite 230
Foster City, CA
Web: