In-Memory Data Grids
|
|
|
- Peter Arnold Marsh
- 10 years ago
- Views:
Transcription
1 The Future of Real-time Big Data: In-Memory Data Grids Jacek Kruszelnicki, Numatica Corporation Phone:
2 Presenter Jacek Kruszelnicki President & CEO, Numatica Corporation Intellectuals solve problems. Geniuses prevent them. -- Albert Einstein 20+ years of shipping scalable, distributed enterprise software Vendor neutral. Merit-based. Biased towards simplicity/elegance. Author, mentor & conference speaker Founder and past persident of New England WebLogic User Group M.S. in Computer Science, Adjunct Professor at Northeastern University Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 2
3 Agenda Definitions: What is Big Data? What is Real-time Big Data Analytics? What is High-velocity data stream computing Business Drivers Big Data Universe: Hadoop, NoSQL and Distributed Data Grids Distributed Data Grids: What are Distributed Data Grids Charateristics of a Distributed Data Grid Major players, and the winner is... From Technlogies to Platforms The Ultimate: Unified Big Data processing Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 3
4 Not on Agenda Not an IMDG tutorial Not In-depth comparison of IMDGs (things change too fast) Data modeling, partitioning strategy (hub/spoke, data affinity, etc) (separate presentation) Analysis of competing frameworks suitable for Real-Time B.D.A. (separate presentation) CAP Theorem discussion Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 4
5 What is Big Data? Data sets too large and complex to process using standard DB tools and data processing applications Few dozen terabytes to Many petabytes of data in a single data set. High Volume High Variety Structured, semi-structured At-rest, In-flight Variety of formats High Velocity Millions of events per second Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 5
6 Real-time Big Data Analytics High Volume High Variety Focus on Velocity + Variety Rapidly changing data (at rest or in-motion): Business activities Examples: trades, orders High Velocity External stimuli Examples: market data, current customer location Operational monitoring Example: application log entries Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 6
7 Business Drivers Operational Intelligence: Analyze stream of business activities and external stimuli on-the fly React to them (preferably) instantaneously First-mover competitive advantage (new market conditions, customer wishes, etc) Real time data stream processing is critical, otherwise business value is lost. Real-time Big Data Analytics examples: Dynamic pricing (e-commerce) High-frequency trading Network security threats Credit card fraud prevention Factory floor data collection, RFID Mobile infrastructure, machine to machine (M2M) applications Prescriptive or Location-based applications Real-time dashboards, alerts, and reports Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 7
8 Growth (per Gartner) In-Memory Computing Racing Towards Mainstream Adoption Market to Reach $1 Billion by 2016 Issues so far: Lack of standards, Scarcity of skills, Relative architectural complexity, Monitoring and management challenges Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 8
9 Real-time Data Processing Advanced queries or complex transactions on huge datasets Get results in (near) real-time In-motion: Rapidly changing, massive stream(s) of data (events) Multiple sources, formats Needs to be processed in-flight Events persisted or not, depending on value At rest: Data already persisted, change notification Re-ingest, re-process (deltas only?) Typically, a combination of both Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 9
10 Big Data Universe Hadoop Spark Shark (Spark on Hive) 40x Still needs Hadoop: HDFS, YARN, Pig, Hive, HBase, JobTracker. TaskTracker,... Open-source, batch-oriented MPP DBs Teradata Vertica Greenplum Proprietary, complex, expensive NoSQL DBs Column: Cassandra, Hbase, Redshift Document/K-V: MongoDB, CouchDB, Riak Graph: Neo4j, etc. No tx, no referential integrity, triggers, foreign keys In-Memory Data Grids Coherence (commercial Hazelcast (OSS) Terracotta (commercial) Gemfire (commercial) GridGain (commercial) Gigaspaces XAP (OSS) Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 10
11 The Origins of In-Memory Data Grids Local Cache Distributed Cache (K-V Store) + Distributed List, Queue, Topic Computational Grids Reliable jobs execution, Scheduling Load balancing In-Memory Distributed Data/Compute Grid RAM is the new disk DISK is the new tape SQL (not Turing complete) MapReduce (low granularity, invasive), often not enough From: Disk IO optimization To: Data management over network (data in a grid) Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 11
12 The Concept Node 1 Node 2 Node 3 RAM/JVM RAM/JVM RAM/JVM Auxiliary Storage (RDBMS, FS) (Pluggable Serialization) Data partitioning: Logical (on key) % LOGICAL_PARTITION_COUNT (Hazelcast = 271) Physical (one or more logical) Partition ownership, replicas reassigned on membership change Can determine physical location by key, to dispatch logic Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 12
13 Real-time Big Data Universe Hadoop family Apache Storm Spark/Shark In-Memory Data Grids => Data/Compute Grids Coherence (commercial Hazelcast (OSS) Terracotta (commercial) Gemfire (commercial) GridGain (OSS, commercial) Gigaspaces XAP (OSS) Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 13
14 Traits of a Real-time Analytics platform Low Transactional Latency In-memory transaction speeds, minimize network trips, response in ms Low Data Latency Real-time data, rather than stale data weeks or months old High Throughput, High Scalability Horizontal and vertical Stream processing/messaging/cep Continuous Query Resilience Stateless, decentralized, peer-peer - no single point of failure, Partitioned with backup replicas. nodes can fail without data loss and join without any data corruption Elastic, simple cluster management No Mesos on top of ZooKeeper, etc. Programmatic, performant SQL-like querying (data affinity) Txs, including XA (when needed) Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 14
15 Major players - comparison Feature Hazelcast 3.2 GemFire 7 Oracle Coherence Distributed data/collections DistributedMap MultiMap Dist. Queue Dist. Events No Multimap No Multimap Distributed Concurrency Dist. Lock Dist. Atomic Ref Dist. Atomic Dist Semaphore No Dist. Atomic Ref. No Dist. Semaphore No Dist. Atomic Ref. No Dist. Semaphore Distributed Computing Executor Service, Key Affinity, Map/Reduce Convoluted, Invasive Convoluted, Invasive Resilience In-memory replicas Recovery WAN replication Elasticity/Cluster Management Yes Yes Yes Transactions Local + XA Local + XA Local + XA Yes Yes Distributed Querying SQL (subset) Predicate Continuous Query Yes Yes License Apache Commercial Commercial Commercial Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 15
16 And the winner is... The most intellectually elegant distributed in-memory data/compute grid. Minimalism as design aesthetic: Non-intrusive, No dependencies 2.6MB single jar library. Implements Java APIs (Map, List, Set, Queue, Lock) in a distributed manner Distributed Execution Framework (extension of Java s Executor Service) Distributed Queries (SQL/Predicate), Data affinity (execution on specific node execution) Cluster management Java API included (dead simple) Auto-discovery of nodes and intelligent synchronization Apache License 2, commercial extensions + support Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 16
17 Code example: Distributed Execution Framework (1) import java.util.concurrent.callable; public class Echo implements Callable<String>, Serializable { String input = null; } public Echo() { } public Echo(String input) { this.input = input; } public String call() { return Hazelcast.getCluster().getLocalMember().toString() + input; } ExecutorService executorservice = Executors.newSingleThreadExecutor(); Future<String> future = executorservice.submit (new Echo("myinput"));... String result = future.get(); Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 17
18 Code example Distributed Execution Framework (2) import com.hazelcast.core.iexecutorservice; import java.util.concurrent.callable; import java.util.concurrent.future; import java.util.set; public void echosomewhere(string input, Member member) { Callable<String> task = new Echo(input); HazelcastInstance hz = Hazelcast.newHazelcastInstance(); IExecutorService executorservice = hz.getexecutorservice("default"); Future<String> future = executorservice.submittomember(task, member); Future<String> future = executorservice.submittokeyowner(task, key); Map<Member, Future<String>> futures = executorservice.submittomembers(new Echo(input), members); String echoresult = future.get(); Execution callbacks, cancellation APIs provided Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 18
19 Code example: Data Affinity(1) HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg); Map mapa = instance.getmap("mapa"); Map mapb = instance.getmap("mapb"); Map mapc = instance.getmap("mapc"); // different entries, but the operation will take place on the same member mapa.put("key1", value); mapb.get("key1"); mapc.remove("key1"); // lock operation will still execute on the same member of the cluster instance.getlock ("key1").lock(); // consistent with distributed execution instance.getexecutorservice().executeonkeyowner(runnable, "key1"); Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 19
20 Code example: Data Affinity(2) public class OrderKey implements Serializable, PartitionAware { int customerid; int orderid; public OrderKey(int orderid, int customerid) {...} public Object getpartitionkey() { return customerid; } } Map mapcustomers = instance.getmap("customers") ; Map maporders = instance.getmap("orders"); mapcustomers.put(1, customer); maporders.put(new OrderKey(21, 1), order); public static class OrderDeletionTask implements Callable<Integer>, PartitionAware, Serializable {... } ExecutorService es = instance.getexecutorservice(); OrderDeletionTask task = new OrderDeletionTask(customerId, orderid); Future future = es.submit(task); int remainingorders = future.get(); Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 20
21 Code example: Distributed Query public class Employee implements Serializable { private String name; private int age; private boolean active; private double salary; } HazelcastInstance hz = Hazelcast.newHazelcastInstance(cfg); IMap map = hz.getmap("employeemap"); Set<Employee> employees = (Set<Employee>) map.values( new SqlPredicate("active AND age < 30")); // or JPA-like criteria Predicate predicate = e.is("active").and(e.get("age").lessthan(30)); Set<Employee> employees = (Set<Employee>) map.values(predicate); Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 21
22 Code example: Cluster Management HazelcastInstance hz = Hazelcast.newHazelcastInstance(cfg); Cluster cluster = hz.getcluster(); cluster.addmembershiplistener(new MembershipListener(){ public void memberadded(membershipevent membersipevent) { System.out.println("MemberAdded " + membersipevent); } } Member localmember = cluster.getlocalmember(); System.out.println ("my inetaddress= " + localmember.getinetaddress()); Set setmembers = cluster.getmembers(); for (Member member : setmembers) { System.out.println ("islocalmember " + member.localmember()); System.out.println ("member.inetaddress " + member.getinetaddress()); System.out.println ("member.port " + member.getport()); } Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 22
23 Example: Admin console Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 23
24 More than just Real-time Big Data Scalable Computing: DB Caching Web/Session Clustering High-speed K-V Data Store Distributed queues, topics (Fast) SOA Services Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 24
25 Limitations RAM currently maxes out at ~ 640GB/server Still more expensive than SSDs and HDDs Garbage collection Cost and capacity limitations will disappear over time Off-heap memory and specialized JVMs (Azul, etc.) Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 25
26 From Technologies to Platforms In-Memory Data/Compute Grids not enough: Backing DB(s) Enterprise Message Store(s) Example: GemFire GreenPlum DB Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 26
27 Holy Grail: Unified Big Data Processing Multi-Source Data Harvesting, Ingestion, Transformation Data-type agnostic: Highly Structured (RDBMS data) Semi-structured (documents, key-value tuples) In-Motion and At Rest Distributed Computing, not just Data Tx and non-tx Batch and/or Real-time Processing The fewer moving parts the better Data Warehousing -> Hadoop/MapReduce -> Unified Big Data Processing Relational + Batch Batch All of the above Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 27
28 Unified Architecture Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 28
29 Summary One order of magnitude faster Data And Logic colocated, affinity-able (with careful design) Flexibility: Turing complete SQL/Predicate queries MapReduce framework on top Scalable and Elastic, simply (automatic repartitioning) Current limitations (cost, GC) fading fast Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 29
30 Q & A Copyright (C) 2014 Numatica Corporation. All Rights Reserved. 30
31 Appendix 31
32 Berkeley Big-data Analytics Stack (BDAS) 32
Introduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
GridGain In- Memory Data Fabric: UlCmate Speed and Scale for TransacCons and AnalyCcs
GridGain In- Memory Data Fabric: UlCmate Speed and Scale for TransacCons and AnalyCcs DMITRIY SETRAKYAN Founder & EVP Engineering @dsetrakyan www.gridgain.com #gridgain Agenda EvoluCon of In- Memory CompuCng
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.
EMC Federation Big Data Solutions 1 Introduction to data analytics Federation offering 2 Traditional Analytics! Traditional type of data analysis, sometimes called Business Intelligence! Type of analytics
Moving From Hadoop to Spark
+ Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com [email protected] Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
How Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
In Memory Accelerator for MongoDB
In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000
Big Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based
Enterprise Operational SQL on Hadoop Trafodion Overview
Enterprise Operational SQL on Hadoop Trafodion Overview Rohit Jain Distinguished & Chief Technologist Strategic & Emerging Technologies Enterprise Database Solutions Copyright 2012 Hewlett-Packard Development
Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source
Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source DMITRIY SETRAKYAN Founder, PPMC http://www.ignite.incubator.apache.org #apacheignite Agenda Apache Ignite (tm) In- Memory
SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
Hadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
So What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya
Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services
Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source
Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source DMITRIY SETRAKYAN Founder, PPMC http://www.ignite.incubator.apache.org @apacheignite @dsetrakyan Agenda About In- Memory
Hadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
I/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
Real-time Big Data Analytics with Storm
Ron Bodkin Founder & CEO, Think Big June 2013 Real-time Big Data Analytics with Storm Leading Provider of Data Science and Engineering Services Accelerating Your Time to Value IMAGINE Strategy and Roadmap
Large scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
Native Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
In-Memory Analytics for Big Data
In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...
Hadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
INTRODUCING APACHE IGNITE An Apache Incubator Project
WHITE PAPER BY GRIDGAIN SYSTEMS FEBRUARY 2015 INTRODUCING APACHE IGNITE An Apache Incubator Project COPYRIGHT AND TRADEMARK INFORMATION 2015 GridGain Systems. All rights reserved. This document is provided
Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
MapReduce with Apache Hadoop Analysing Big Data
MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside [email protected] About Journey Dynamics Founded in 2006 to develop software technology to address the issues
<Insert Picture Here> Getting Coherence: Introduction to Data Grids South Florida User Group
Getting Coherence: Introduction to Data Grids South Florida User Group Cameron Purdy Cameron Purdy Vice President of Development Speaker Cameron Purdy is Vice President of Development
BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS
BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS WHAT IS BIG DATA? describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
Accelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily
Big Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012
Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation
Dominik Wagenknecht Accenture
Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization
Table Of Contents. 1. GridGain In-Memory Database
Table Of Contents 1. GridGain In-Memory Database 2. GridGain Installation 2.1 Check GridGain Installation 2.2 Running GridGain Examples 2.3 Configure GridGain Node Discovery 3. Starting Grid Nodes 4. Management
BIG DATA TOOLS. Top 10 open source technologies for Big Data
BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon [email protected] [email protected] XLDB
Scalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case
Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing
Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go
BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic
BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop
1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal s Full Approach It s More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap
Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies
Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights Big Data, Advanced Analytics:
NoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
Internals of Hadoop Application Framework and Distributed File System
International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop
Using an In-Memory Data Grid for Near Real-Time Data Analysis
SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses
extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010
System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached
Introduction to Big Data Training
Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB
Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН
Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Zettabytes Petabytes ABC Sharding A B C Id Fn Ln Addr 1 Fred Jones Liberty, NY 2 John Smith?????? 122+ NoSQL Database
IN-MEMORY DATA FABRIC: Data Grid
WHITE PAPER IN-MEMORY DATA FABRIC: Data Grid COPYRIGHT AND TRADEMARK INFORMATION 2014 GridGain Systems. All rights reserved. This document is provided as is. Information and views expressed in this document,
Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
Tap into Hadoop and Other No SQL Sources
Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data
Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
Real-Time Analytics for Big Market Data with XAP In-Memory Computing
Real-Time Analytics for Big Market Data with XAP In-Memory Computing March 2015 Real Time Analytics for Big Market Data Table of Contents Introduction 03 Main Industry Challenges....04 Achieving Real-Time
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
Open Source Technologies on Microsoft Azure
Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB
Overview of Databases On MacOS Karl Kuehn Automation Engineer RethinkDB Session Goals Introduce Database concepts Show example players Not Goals: Cover non-macos systems (Oracle) Teach you SQL Answer what
Getting Started with Hadoop. Raanan Dagan Paul Tibaldi
Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop
Big Data Introduction
Big Data Introduction Ralf Lange Global ISV & OEM Sales 1 Copyright 2012, Oracle and/or its affiliates. All rights Conventional infrastructure 2 Copyright 2012, Oracle and/or its affiliates. All rights
Big Data Success Step 1: Get the Technology Right
Big Data Success Step 1: Get the Technology Right TOM MATIJEVIC Director, Business Development ANDY MCNALIS Director, Data Management & Integration MetaScale is a subsidiary of Sears Holdings Corporation
Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.
Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
Big Data: Are You Ready? Kevin Lancaster
Big Data: Are You Ready? Kevin Lancaster Director, Engineered Systems Oracle Europe, Middle East & Africa 1 A Data Explosion... Traditional Data Sources Billing engines Custom developed New, Non-Traditional
Oracle Database 12c Plug In. Switch On. Get SMART.
Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.
Unified Big Data Analytics Pipeline. 连 城 [email protected]
Unified Big Data Analytics Pipeline 连 城 [email protected] What is A fast and general engine for large-scale data processing An open source implementation of Resilient Distributed Datasets (RDD) Has an
In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015
In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015 June 29-30, 2015 Contacts Alexandre Boudnik Senior Solution Architect, EPAM Systems [email protected]
Upcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC [email protected] Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
Real Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
TRAINING PROGRAM ON BIGDATA/HADOOP
Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,
Spark: Making Big Data Interactive & Real-Time
Spark: Making Big Data Interactive & Real-Time Matei Zaharia UC Berkeley / MIT www.spark-project.org What is Spark? Fast and expressive cluster computing system compatible with Apache Hadoop Improves efficiency
In-Memory Computing : Premiere
White Paper In-Memory Computing : Premiere Overview Modern business environments have been evolving rapidly, pushing IT needs and expectations to new highs. Businesses are not asking for performance alone,
GigaSpaces Real-Time Analytics for Big Data
GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and
How to Choose Between Hadoop, NoSQL and RDBMS
How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A
<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store
Oracle NoSQL Database A Distributed Key-Value Store Charles Lamb, Consulting MTS The following is intended to outline our general product direction. It is intended for information
Reference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
In-Memory BigData. Summer 2012, Technology Overview
In-Memory BigData Summer 2012, Technology Overview Company Vision In-Memory Data Processing Leader: > 5 years in production > 100s of customers > Starts every 10 secs worldwide > Over 10,000,000 starts
