Apache Ignite™ (Incubating) - In-Memory Data Fabric: Fast Data Meets Open Source

Apache Ignite™ (Incubating) - In-Memory Data Fabric: Fast Data Meets Open Source
DMITRIY SETRAKYAN, Founder, PPMC
http://www.ignite.incubator.apache.org
#apacheignite

Agenda
- Apache Ignite™ In-Memory Data Fabric
- Advanced Clustering
- Data Grid
- Compute Grid
- Service Grid
- Streaming & CEP
- Plug-n-Play Hadoop Accelerator
- Benchmarking
- Q & A

Apache Ignite™ In-Memory Data Fabric: Strategic Approach to IMC
- Supports applications of various types and languages
- Open Source, Apache 2.0
- Simple Java APIs
- 1 JAR Dependency
- High Performance & Scale
- Automatic Fault Tolerance
- Management/Monitoring
- Runs on Commodity Hardware
- Supports existing & new data sources
- No need to rip & replace

In-Memory Data Fabric: More Than Data Grid

In-Memory Data Fabric: Clustering
- Ease of Getting Started
- Automatic Discovery
- Any Environment
  - Public Cloud (AWS, OpenStack)
  - Private Cloud
  - Hybrid Cloud
  - Local Laptop
- Zero-Deployment
- Auto-Deploy Code
- Pluggable Design

In-Memory Data Fabric: Clustering

In-Memory Data Fabric: Data Grid
- Distributed In-Memory Key-Value Store
- Replicated and Partitioned data
- TBs of data, of any type
- Redundant Backups
- Distributed ACID Transactions
- SQL queries and JDBC driver (ANSI 99)
- Data Structures (Queue, AtomicLong, etc.)
- Collocation of Compute and Data
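
The "Partitioned data" and "Redundant Backups" points can be illustrated with a small sketch. This is plain JDK code, not the Ignite API: the class name `PartitionSketch`, the partition count, and the round-robin choice of backup nodes are all assumptions made for illustration, and a real data grid would likely use a more careful affinity function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-JDK illustration of partitioned data with backups; NOT the Ignite API.
class PartitionSketch {
    // Hypothetical number of partitions the key space is divided into.
    static final int PARTITIONS = 1024;

    // Map a key to a partition using its hash code (floorMod keeps it non-negative).
    static int partition(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Return the primary owner followed by up to `backups` backup owners.
    // Owners are picked round-robin starting from the partition number,
    // which is a simplification chosen for readability.
    static List<String> owners(Object key, List<String> nodes, int backups) {
        int copies = Math.min(backups + 1, nodes.size());
        int start = partition(key) % nodes.size();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            result.add(nodes.get((start + i) % nodes.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-A", "node-B", "node-C");
        for (String key : Arrays.asList("order:1", "order:2", "order:3")) {
            System.out.println(key + " -> " + owners(key, nodes, 1));
        }
    }
}
```

Because every node can compute the same key-to-owner mapping locally, a job can be sent to the node that already holds the key, which is the idea behind collocating compute and data.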

In-Memory Data Fabric: Off-Heap Memory
- Unlimited Vertical Scale
- Avoid Java Garbage Collection Pauses
- Small On-Heap Footprint
- Large Off-Heap Footprint
- Off-Heap Indexes
- Full RAM Utilization
- Simple Configuration
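
To make the off-heap idea concrete, here is a minimal sketch using the JDK's direct buffers. It is not how Ignite manages off-heap memory; the class `OffHeapSketch` and its fixed-size slot layout are hypothetical and only show why data kept outside the Java heap adds little garbage collection work.

```java
import java.nio.ByteBuffer;

// Plain-JDK illustration of off-heap storage; NOT the Ignite API.
class OffHeapSketch {
    private final ByteBuffer buf;

    OffHeapSketch(int slots) {
        // allocateDirect reserves memory outside the Java heap, so the
        // stored values are not objects the garbage collector has to trace.
        buf = ByteBuffer.allocateDirect(slots * Long.BYTES);
    }

    // Write a long into the given slot using an absolute byte offset.
    void put(int slot, long value) {
        buf.putLong(slot * Long.BYTES, value);
    }

    // Read the long stored in the given slot.
    long get(int slot) {
        return buf.getLong(slot * Long.BYTES);
    }

    boolean isOffHeap() {
        return buf.isDirect();
    }

    public static void main(String[] args) {
        OffHeapSketch store = new OffHeapSketch(1_000_000);
        store.put(123, 42L);
        System.out.println("slot 123 = " + store.get(123)
            + ", off-heap = " + store.isOffHeap());
    }
}
```

Only the small `ByteBuffer` handle lives on the heap, which matches the slide's "Small On-Heap Footprint, Large Off-Heap Footprint".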

Distributed Java Structures
- Distributed Map (cache)
- Distributed Set
- Distributed Queue
- CountDownLatch
- AtomicLong
- AtomicSequence
- AtomicReference
- Distributed ExecutorService
2014 GridGain Systems, Inc.

In-Memory Data Fabric: Compute Grid
- Direct API for MapReduce
- Direct API for ForkJoin
- Zero Deployment
- Cron-like Task Scheduling
- State Checkpoints
- Load Balancing
- Automatic Failover
- Full Cluster Management
- Pluggable SPI Design
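
The MapReduce style of the compute grid can be sketched on a single JVM with a thread pool standing in for cluster nodes. This is an assumption-laden illustration, not the Ignite compute API: `MapReduceSketch` and `totalLength` are hypothetical names, and a real grid would also handle load balancing and failover across machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-JDK illustration of split / map / reduce; NOT the Ignite API.
class MapReduceSketch {
    // Count the non-whitespace characters of a sentence by splitting it
    // into words, measuring each word in a separate job, and summing.
    static int totalLength(String sentence, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Split + map: one job per word, each returning the word's length.
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (String word : sentence.trim().split("\\s+")) {
                jobs.add(word::length);
            }
            // Reduce: sum the partial results once all jobs have finished.
            int total = 0;
            for (Future<Integer> partial : pool.invokeAll(jobs)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalLength("Count characters using callable", 2));
    }
}
```

The worker threads play the role of grid nodes here; on a cluster the same split, map, and reduce steps would run with each job shipped to a remote node.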

In-Memory Data Fabric: Service Grid
- Singletons on the Cluster
  - Cluster Singleton
  - Node Singleton
  - Key Singleton
- Distribute any Data Structure
- Available Anywhere on the Grid
- Access Anywhere via Proxies
- Guaranteed Availability
- Auto Redeployment in Case of Failures

In-Memory Data Fabric: Streaming and CEP
- Streaming Data Never Ends
- Branching Pipelines
- Pluggable Routing
- Sliding Windows
- CEP/Continuous Query
- SQL Query
- Real Time Analysis
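
Since streaming data never ends, analysis usually runs over a bounded sliding window of recent events. The sketch below shows a count-based window with a running average in plain JDK code. It is not the Ignite streaming API; the class `SlidingWindowSketch` and its count-based eviction rule are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-JDK illustration of a count-based sliding window; NOT the Ignite API.
class SlidingWindowSketch {
    private final int capacity;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum;

    SlidingWindowSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Add an event, evict the oldest one once the window is over capacity,
    // and return the average of the events currently in the window.
    double add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > capacity) {
            sum -= window.removeFirst();
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingWindowSketch w = new SlidingWindowSketch(3);
        for (double v : new double[] {1, 2, 3, 10}) {
            System.out.println("event " + v + " -> window average " + w.add(v));
        }
    }
}
```

Memory use stays bounded by the window size no matter how long the stream runs, which is what makes continuous queries over an endless stream practical.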

In-Memory File System & Accelerated MapReduce
- IGFS - In-Memory File System
- Zero Code Change
- Use existing MR code
- Use existing Hive queries
- No Name Node
- Write-through to HDFS
- In-Process Data Colocation
- Eager Push Scheduling

Yardstick: Distributed Benchmarking
- Use Yardstick Docker Containers
  - Apache Ignite
  - Hazelcast
  - Apache Spark (coming)
  - Apache Cassandra (coming)
- AWS Images Ready to Go
- View Results in S3 Bucket
https://github.com/yardstick-benchmarks/yardstick-docker

ASF: Spark vs Ignite

Apache Ignite
- Computational analytics
- In-Memory Indexes
- Real Streaming
- Transactional data processing
- Classic High Performance Compute

Apache Spark
- Interactive analytics
- Full Scans (no indexes)
- Discretized Streaming
- Machine learning
- Classic data science

Coding Examples
- Compute Grid
- Data Grid

Join us this year at the Industry-First
JUNE 29-30 - SAN FRANCISCO, CA
For networking, education and exchange of ideas.
Use promo code IMCS629M to register and get 30% off with the early bird rate at www.imcsummit.org/register-meetup/

ANY QUESTIONS? Thank you for joining us. Follow the conversation. http://www.ignite.incubator.apache.org #apacheignite