Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

Similar documents

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

GridGain In- Memory Data Fabric: UlCmate Speed and Scale for TransacCons and AnalyCcs

INTRODUCING APACHE IGNITE An Apache Incubator Project

In-Memory BigData. Summer 2012, Technology Overview

In Memory Accelerator for MongoDB

How To Create A Data Visualization With Apache Spark And Zeppelin

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

GridGain gets open source in-memory accelerator out of the blocks

An Open Source Memory-Centric Distributed Storage System

Moving From Hadoop to Spark

Table Of Contents. 1. GridGain In-Memory Database

How Companies are! Using Spark

Unified Big Data Processing with Apache Spark. Matei

Hadoop IST 734 SS CHUNG

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. project.org. University of California, Berkeley UC BERKELEY

Large scale processing using Hadoop. Ján Vaňo

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Virtualizing Apache Hadoop. June, 2012

Scaling Out With Apache Spark. DTL Meeting Slides based on

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013

How To Choose A Data Flow Pipeline From A Data Processing Platform

Hadoop2, Spark Big Data, real time, machine learning & use cases. Cédric Carbone Twitter

Hadoop Architecture. Part 1

Workshop on Hadoop with Big Data

Introduction to Spark

Unified Big Data Analytics Pipeline. 连城

IN-MEMORY DATA FABRIC: Data Grid

ORACLE COHERENCE 12CR2

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop & Spark Using Amazon EMR

Apache Flink Next-gen data analysis. Kostas

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS

The Future of Data Management

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Case Study : 3 different hadoop cluster deployments

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Big Data Course Highlights

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi,

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex

HiBench Introduction. Carson Wang Software & Services Group

Big Data Primer. 1 Why Big Data? Alex Sverdlov alex@theparticle.com

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Oracle Database 12c Plug In. Switch On. Get SMART.

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Big Data Analytics - Accelerated. stream-horizon.com

A Performance Analysis of Distributed Indexing using Terrier

Analytics on Spark &

Next-Gen Big Data Analytics using the Spark stack

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Real Time Data Processing using Spark Streaming

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

Hadoop: Embracing future hardware

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

Hadoop in the Enterprise

Actian Vector in Hadoop

Big Data on Microsoft Platform

Ground up Introduction to In-Memory Data (Grids)

Elastic Application Platform for Market Data Real-Time Analytics. for E-Commerce

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

NoSQL for SQL Professionals William McKnight

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.

WHAT S NEW IN SAS 9.4

Big data blue print for cloud architecture

Introduction to Cloud Computing

JBoss & Infinispan open source data grids for the cloud era

HADOOP MOCK TEST HADOOP MOCK TEST I

Native Connectivity to Big Data Sources in MSTR 10

Trend Micro Big Data Platform and Apache Bigtop. 葉祐欣 (Evans Ye) Big Data Conference 2015

An Architecture Vision

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park

How To Use Big Data For Telco (For A Telco)

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

CAPTURING & PROCESSING REAL-TIME DATA ON AWS

Spark: Making Big Data Interactive & Real-Time

Cloud Computing. Big Data. High Performance Computing

Hadoop implementation of MapReduce computational model. Ján Vaňo

Transcription:

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source DMITRIY SETRAKYAN Founder, PPMC http://www.ignite.incubator.apache.org @apacheignite @dsetrakyan

Agenda About In- Memory Computing Apache Ignite (tm) In- Memory Data Fabric Advanced Clustering Data Grid Compute Grid Service Grid Ignite For Analytics Streaming & CEP Share State Across Spark Jobs In- Memory MapReduce Interactive SQL DevOps: Yarn and Mesos Q & A

Apache Ignite TM In- Memory Data Fabric: Strategic Approach to IMC Supports Applications of various types and languages Open Source Apache 2.0 Simple Java APIs 1 JAR Dependency High Performance & Scale Automatic Fault Tolerance Management/Monitoring Runs on Commodity Hardware Supports existing & new data sources No need to rip & replace

In- Memory Data Fabric: More Than Data Grid

Apache Ignite: Better Cloud Support Automatic Discovery Simple Configuration AWS/EC2/S3 Google Compute Engine (NEW) Other Clouds with JClouds (NEW) Docker Support Automatically Build and Deploy

Data Grid: JCache (JSR 107) JCache (JSR 107) Basic Cache Operations ConcurrentMap APIs Collocated Processing (EntryProcessor) Events and Metrics Pluggable Persistence Ignite Data Grid ACID Transactions SQL Queries (ANSI 99) In- Memory Indexes Automatic RDBMS Integration

Data Grid: Partitioned Cache

Data Grid: Replicated Cache

Data Grid: Off- Heap Memory Unlimited Vertical Scale Avoid Java Garbage Collection Pauses Small On- Heap Footprint Large Off- Heap Footprint Off- Heap Indexes Full RAM Utilization Simple Configuration

Data Grid: Ad- Hoc SQL (ANSI 99) ANSI- 99 SQL Always Consistent Fault Tolerant In- Memory Indexes (On- Heap and Off- Heap) Automatic Group By, Aggregations, Sorting Cross- Cache Joins, Unions, etc. Ad- Hoc SQL Support

SQL Cross- Cache JOIN Example

SQL Cross- Cache GROUP BY Example

In- Memory Compute Grid Direct API for MapReduce Direct API for ForkJoin Zero Deployment Cron- like Task Scheduling State Checkpoints Load Balancing Automatic Failover Full Cluster Management Pluggable SPI Design

In- Memory Streaming and CEP Streaming Data Never Ends Branching Pipelines Pluggable Routing Sliding Windows for CEP/Continuous Query SQL Queries (ANSI 99) Query Across Sliding Windows Real Time Analysis

In- Memory Service Grid Singletons on the Cluster Cluster Singleton Node Singleton Key Singleton Distribute any Data Structure Available Anywhere on the Grid Access Anywhere via Proxies Guaranteed Availability Auto Redeployment in Case of Failures

Apache Ignite for BI and Analytics

DevOps: Integration with Yarn and Mesos Automatic Resource Management Easy Data Center Installation Easy Data Center Configuration On- Demand Elasticity

Share RDDs Across Spark Jobs IgniteRDD Share RDD across jobs on the host Share RDD across jobs in the application Share RDD globally Faster SQL In- Memory Indexes SQL on top of Shared RDD

Ignite In- Memory File System Ignite In- Memory File System (IGFS) Hadoop- compliant Easy to Install On- Heap and Off- Heap Caching Layer for HDFS Write- through and Read- through HDFS Performance Boost

Ignite In- Memory Map Reduce In- Memory Native Performance Zero Code Change Use existing MR code Use existing Hive queries No Name Node No Network Noise In- Process Data Colocation Eager Push Scheduling

Interactive SQL with Apache Zeppelin

GridGain Enterprise & Apache Ignite Comparison Chart Features Apache Ignite Enterprise Edition In-Memory Data Grid In-Memory Compute Grid CHECK Real-Time Streaming & CEP Hadoop Acceleration Management & Monitoring GUI Portable Objects.Net and C++ APIs Enterprise-grade Security Network Segmentation Protection Local Restartable Store Rolling Production Updates Datacenter Replication 9x5 and 24x7 Support Long Term Support & Patches GridGain Enterprise Subscriptions include the following during the term of the subscription: > Right to use GridGain Enterprise Edition > Bug fixes, patches, updates and upgrades > 9x5 or 24x7 Support > Ability to procure Training and Consulting Services from GridGain > Confidence and protection, not provided under Open Source licensing, that only a commercial vendor can provide, such as indemnification

ANY QUESTIONS? Thank you for joining us. Follow the conversation. http://www.ignite.incubator.apache.org @apacheignite @dsetrakyan