In-Memory BigData. Summer 2012, Technology Overview

Company Vision
In-Memory Data Processing Leader:
> 5 years in production
> 100s of customers
> Starts every 10 secs worldwide
> Over 10,000,000 starts globally
> Unique in-memory compute + data grid technology

In-Memory Processing Facts
> 64-bit CPUs can address 16 exabytes
> Disk up to 10^7 times slower than RAM
> RAM prices drop 30% every 18 months
> 1GB costs < $1
> 1TB RAM & 48 cores cluster ~ $40K
> Multicore CPUs ideal for in-memory parallelization
> Speed matters
  > Citi: 100ms == $1M
  > Google: 500ms == 20% traffic drop
In-memory will have an industry impact comparable to web and cloud. RAM is a new disk, and disk is a new tape.
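The address-space and price-decline figures above can be checked with quick arithmetic. The sketch below is only a back-of-the-envelope check under two assumptions that the slide does not state: "exabyte" is read as 2^60 bytes, and the 30% drop compounds once per 18-month period. The function names are illustrative.

```python
# Back-of-the-envelope checks for two figures on the slide.
# Assumptions (not from the slide): "exabyte" means 2**60 bytes, and the
# 30% price drop compounds once per 18-month period.

BYTES_PER_EXABYTE = 2 ** 60


def addressable_exabytes(address_bits: int) -> int:
    """Bytes addressable with the given pointer width, in units of 2**60."""
    return (2 ** address_bits) // BYTES_PER_EXABYTE


def price_after(months: float, start_price: float, drop: float = 0.30,
                period_months: float = 18.0) -> float:
    """Price after compounding a fractional drop every period_months."""
    periods = months / period_months
    return start_price * (1.0 - drop) ** periods


if __name__ == "__main__":
    print(addressable_exabytes(64))        # 2**64 / 2**60
    print(round(price_after(36, 1.0), 2))  # two 18-month periods
```

Under these assumptions a 64-bit address space covers 16 units of 2^60 bytes, and a price falls to roughly half its starting value after three years.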

GridGain 4: Three Editions
> Different markets, customers, messages, needs:
  > Compute Grid Edition
  > Data Grid Edition
  > Big Data Edition

GridGain 4: At a Glance
> Scalable In-Memory Data Platform
> Compute Grid + In-Memory Data Grid
  Real Time & Streaming MapReduce, CEP
> TBs of data and 1000s of nodes
  Typical: 10s of TBs and 100s of nodes
> In-Memory Speed, Database Reliability
> Native: Java, Scala and Groovy DSLs
> Clients: C++, .NET, iOS, Android, PHP, REST
> Distributed in-memory object store

GridGain 4: New Features
1. In-Memory Data Grid
2. In-Memory Compute Grid
3. Streaming MapReduce
4. Clustering
5. Messaging
6. Advanced Security
7. DevOps GUI Console
8. SPI Architecture
9. Zero Deployment
10. Native Client APIs
11. Java, Scala, Groovy
12. Advanced Load Balancing
13. Pluggable Fault Tolerance
14. Hadoop Integration

GridGain 4: Clustering
Sophisticated clustering capabilities for the JVM, with the ability to connect and manage a heterogeneous set of computing devices
> Pluggable cluster topology management & various consistency strategies
> Pluggable automatic discovery on LAN, WAN, and AWS
> Pluggable split-brain cluster segmentation resolution
> Unicast, broadcast, and Actor-based cluster-wide message exchange
> Pluggable event storage and propagation
> Versioning
> Support for complex leader election algorithms
> On-demand and direct deployment
> Support for virtual clusters and grouping
> Integration with Hadoop ZooKeeper

GridGain 4: Advanced Security
> Cluster Security
> Client Security
> JAAS-based
> Authentication
> Secure Session

GridGain 4: SPI Architecture
Fourteen SPIs provide plug-and-play capabilities to replace and customize every significant subsystem of the GridGain runtime.
1. Checkpoint SPI
2. Collision SPI
3. Authentication SPI
4. Secure Session SPI
5. Indexing SPI
6. Load Balancing SPI
7. Communication SPI
8. Deployment SPI
9. Swap Space SPI
10. Metrics SPI
11. Discovery SPI
12. Failover SPI
13. Topology SPI
14. Event Storage SPI
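The plug-and-play idea behind an SPI can be illustrated with a small sketch. This is not GridGain's API: the names LoadBalancingSpi, RoundRobinLoadBalancing, HashLoadBalancing and Runtime are hypothetical, and the example only shows how a runtime can depend on an interface while the concrete strategy is swapped in at construction time.

```python
import zlib
from abc import ABC, abstractmethod
from itertools import cycle
from typing import Dict, Iterable, List


class LoadBalancingSpi(ABC):
    """Hypothetical plug-in point: picks a node for each job."""

    @abstractmethod
    def pick_node(self, job: str) -> str:
        ...


class RoundRobinLoadBalancing(LoadBalancingSpi):
    """Hands out nodes in turn, ignoring the job itself."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes = cycle(list(nodes))

    def pick_node(self, job: str) -> str:
        return next(self._nodes)


class HashLoadBalancing(LoadBalancingSpi):
    """Maps the same job name to the same node every time."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes: List[str] = list(nodes)

    def pick_node(self, job: str) -> str:
        index = zlib.crc32(job.encode("utf-8")) % len(self._nodes)
        return self._nodes[index]


class Runtime:
    """Toy runtime that only knows the SPI interface, not the strategy."""

    def __init__(self, load_balancing: LoadBalancingSpi) -> None:
        self._load_balancing = load_balancing

    def assign(self, jobs: Iterable[str]) -> Dict[str, str]:
        return {job: self._load_balancing.pick_node(job) for job in jobs}


if __name__ == "__main__":
    runtime = Runtime(RoundRobinLoadBalancing(["node-1", "node-2"]))
    print(runtime.assign(["job-a", "job-b", "job-c"]))
```

Replacing RoundRobinLoadBalancing with HashLoadBalancing changes the placement policy without touching Runtime, which is the property the slide attributes to the SPI design.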

GridGain 4: Native Clients
> Java (EE & Android)
> C++
> .NET C#
> Objective C
> REST
> Memcache

GridGain 4: Java, Scala, Groovy
> Java 6
> Scala 2.9
> Groovy 1.8 and Groovy++
> Scalar - Scala DSL for GridGain
> Grover - Groovy++ DSL for GridGain

GridGain 4: Hadoop Integration
> HBase cache store
> ZooKeeper discovery integration
> Distributed bulk data loader
> Hadoop-compatible Distributed File System
> In-memory & high performance alternative to HDFS

GridGain 4: DevOps Console

Success Stories
> Trading Systems: Handle large volumes of transactions
> Real-time Risk Analysis: Analysis of trading positions & risk
> Online Gaming: Online real-time backbone for gaming
> Actuarial Analysis: Insurance rating and modeling
> Geo Mapping: Real-time geographical route and traffic information
> Bioinformatics: Real-time DNA sequencing and matching

GridGain Customers

In-Memory Data Grid Features 1
> Java-based distributed in-memory store
> Zero deployment for data
> Local, fully replicated and partitioned cache types
> Pluggable expiration policies (LRU, LFU, FIFO, time-based and random)
> Read-through and write-through
> Pluggable cache store (SQL, ERP, Hadoop)
> Synchronous & asynchronous cache operations
> MVCC-based concurrency
> Pluggable data overflow storage
> PESSIMISTIC & OPTIMISTIC ACID transactions
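Three of the features listed above (LRU expiration, read-through and write-through against a pluggable store) can be sketched in a few lines. This is a single-process illustration, not GridGain code; DictStore and LruCache are hypothetical names, and the store is a plain dictionary standing in for a database.

```python
from collections import OrderedDict


class DictStore:
    """Stand-in for a pluggable cache store such as a database."""

    def __init__(self, data=None):
        self.data = dict(data or {})
        self.loads = 0  # counts how often the backing store is read

    def load(self, key):
        self.loads += 1
        return self.data.get(key)

    def save(self, key, value):
        self.data[key] = value


class LruCache:
    """Bounded cache with LRU eviction, read-through and write-through."""

    def __init__(self, store, max_entries):
        self._store = store
        self._max_entries = max_entries
        self._entries = OrderedDict()  # oldest entry first

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self._store.load(key)       # read-through on a miss
        if value is not None:
            self._put_local(key, value)
        return value

    def put(self, key, value):
        self._store.save(key, value)        # write-through to the store
        self._put_local(key, value)

    def _put_local(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def cached_keys(self):
        return list(self._entries)


if __name__ == "__main__":
    cache = LruCache(DictStore({"a": 1}), max_entries=2)
    print(cache.get("a"))
    cache.put("b", 2)
    cache.put("c", 3)
    print(cache.cached_keys())
```

An evicted entry is not lost: it stays in the backing store and is loaded again on the next read, which is why eviction and read-through are usually configured together.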

In-Memory Data Grid Features 2
> JTA/JTS integration
> Master/master data replication
> Master/master data invalidation
> Replication/invalidation in async/sync modes
> Write-behind cache store support
> Concurrent/Delayed transactional preloading
> Affinity routing with compute grid
> Partitioned cache with active backups (replicas)
> Structured and unstructured data
> Transactional datacenter replication
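The combination of a partitioned cache with backups and affinity routing can be pictured as a mapping from key to partition to an ordered list of owning nodes. The sketch below is an assumption-laden illustration, not the algorithm GridGain uses: partition_of, owners and route are hypothetical names, the partition count of 1024 is arbitrary, and backups are chosen by simply walking the node list as a ring.

```python
import zlib
from typing import List


def partition_of(key: str, partitions: int) -> int:
    """Stable key-to-partition mapping (CRC32 is used only for stability)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def owners(partition: int, nodes: List[str], backups: int) -> List[str]:
    """Primary node first, then backups, walking the node list as a ring."""
    count = min(backups + 1, len(nodes))
    return [nodes[(partition + i) % len(nodes)] for i in range(count)]


def route(key: str, nodes: List[str], partitions: int = 1024,
          backups: int = 1) -> List[str]:
    """Nodes holding the key; a job for this key would go to the first one."""
    return owners(partition_of(key, partitions), nodes, backups)


if __name__ == "__main__":
    cluster = ["node-1", "node-2", "node-3"]
    print(route("account:42", cluster))
```

Sending a computation to the first node returned by route keeps the job next to its data. A plain modulo over the node list reshuffles most partitions when a node joins or leaves, so production systems generally use a more stable scheme such as consistent or rendezvous hashing.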

In-Memory Data Grid Features 3
> Customizable/pluggable data indexing
> JDBC driver for in-memory data
> Co-located cache mode
> BigMemory (off-heap allocation) support
> Tiered storage with on-heap, off-heap, swap, SQL and Hadoop
> Distributed in-memory query support
> SQL-based affinity co-located queries
> Lucene-based text affinity co-located queries
> H2-based text affinity co-located queries
> Predicate-based full scan queries
> Support for pagination
> Local & remote filtering, transformation and reduction for execution plan

In-Memory Compute Grid Features 1
> Direct API for map/split and reduce/aggregate
> Pluggable failover management
> Pluggable topology resolution
> Pluggable collision resolution
> Distributed task session
> Distributed continuations and recursive split
> Streaming MapReduce
> Complex Event Processing (CEP)
> Node-local cache
> AOP-based, OOP/FP-based, sync/async execution modes
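The map/split and reduce/aggregate pattern named in the first bullet can be sketched locally. This is not the GridGain task API: a thread pool stands in for the grid nodes, and split, job, reduce_results and run_task are hypothetical names. The task here sums the lengths of all words in a text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split(text: str, parts: int) -> List[List[str]]:
    """Split step: cut the word list into at most `parts` chunks."""
    words = text.split()
    size = max(1, -(-len(words) // parts))  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def job(words: List[str]) -> int:
    """Work done independently on one chunk (on one node, in a real grid)."""
    return sum(len(word) for word in words)


def reduce_results(results: List[int]) -> int:
    """Reduce step: aggregate the partial results into one answer."""
    return sum(results)


def run_task(text: str, workers: int = 4) -> int:
    chunks = split(text, workers)
    # The thread pool is only a local stand-in for remote grid nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(job, chunks))
    return reduce_results(partials)


if __name__ == "__main__":
    print(run_task("in memory data grid"))
```

The split and reduce steps run on the caller, while each job depends only on its own chunk, which is what lets a grid schedule the jobs on different nodes.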

In-Memory Compute Grid Features 2
> Direct closure distribution in Java, Scala and Groovy
> Cron-based task scheduling
> Direct redundant mapping support
> Zero deployment with P2P on-demand distributed class loading
> Partial asynchronous reduction
> Weighted and dynamic adaptive mapping
> State checkpoints for long running tasks
> Early and late load balancing
> Affinity routing with data grid
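State checkpoints for long-running tasks can be illustrated with a job that records its progress after every step and resumes from the last record after a failure. This is a minimal sketch, not GridGain's checkpoint mechanism: run_with_checkpoints is a hypothetical name, a plain dictionary stands in for the checkpoint storage, and the fail_at parameter exists only to simulate a node failure.

```python
from typing import Dict, List, Optional


def run_with_checkpoints(items: List[int], checkpoints: Dict[str, dict],
                         task_id: str, fail_at: Optional[int] = None) -> int:
    """Sum items, saving progress after each one so a retry can resume."""
    state = checkpoints.get(task_id, {"next": 0, "total": 0})
    total = state["total"]
    for index in range(state["next"], len(items)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError("simulated node failure")
        total += items[index]
        # Save the index to resume from together with the partial result.
        checkpoints[task_id] = {"next": index + 1, "total": total}
    return total


if __name__ == "__main__":
    saved: Dict[str, dict] = {}
    data = [1, 2, 3, 4]
    try:
        run_with_checkpoints(data, saved, "sum-task", fail_at=2)
    except RuntimeError:
        pass  # a failover component would reschedule the task here
    print(saved["sum-task"])
    print(run_with_checkpoints(data, saved, "sum-task"))
```

The second call skips the two items that were already processed and continues from the saved partial sum, so the work done before the failure is not repeated.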

GridGain Systems
1065 East Hillsdale Blvd., Suite 230
Foster City, CA 94404
Web: www.gridgain.com
Email: info@gridgain.com
Twitter: @gridgain