CAPTURING & PROCESSING REAL-TIME DATA ON AWS




© 2015 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Agenda
- Real-Time Analytics
- Data Ingestion
- Data Processing
  - Architecture
  - AWS Lambda
- Customer Implementations

Real-Time Analytics

Real-time Ingest:
- Highly scalable
- Durable
- Elastic
- Replay-able reads

Continuous Processing FX:
- Load-balancing incoming streams
- Fault tolerance, checkpoint/replay
- Elastic
- Enable multiple apps to process in parallel

Continuous, real-time workloads:
- Low end-to-end latency
- Continuous data flow

Data Ingestion

Starting simple: a single application, foo-analysis.com, computes the global top-10.

Distributing the workload: foo-analysis.com runs on AWS Elastic Beanstalk, which spreads the work across a fleet of instances that together compute the global top-10.

Or using an elastic data broker: each worker computes a local top-10, and the local results are combined into the global top-10 served by foo-analysis.com on Elastic Beanstalk.
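The local/global split can be sketched in a few lines of Python. This is a minimal illustration, not the deck's implementation: it assumes events are plain string keys and that the broker routes each key to exactly one worker by a stable hash. The helper names `route_to_worker`, `local_top_k`, and `global_top_k` are hypothetical. Because each key lives on a single worker with its full count, merging the workers' local top-10 lists yields the correct global top-10 counts.

```python
import hashlib
import heapq
from collections import Counter


def route_to_worker(key: str, num_workers: int) -> int:
    """Hypothetical routing helper: the same key always goes to the same worker."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_workers


def local_top_k(keys, k=10):
    """Each worker counts only the keys routed to it and reports its own top-k."""
    return Counter(keys).most_common(k)


def global_top_k(events, num_workers=3, k=10):
    # Split the event stream across workers by key.
    partitions = [[] for _ in range(num_workers)]
    for key in events:
        partitions[route_to_worker(key, num_workers)].append(key)
    # Keys are disjoint across workers, so every global top-k key
    # appears (with its full count) in some worker's local top-k list.
    candidates = [pair for part in partitions for pair in local_top_k(part, k)]
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


events = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e"]
print(global_top_k(events, num_workers=3, k=3))  # [('a', 5), ('b', 4), ('c', 3)]
```

The routing by key is what makes the merge safe: if records were spread round-robin instead, one key's count would be split across workers and a merge of local top-10 lists could miss it.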

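Amazon Kinesis managed stream: foo-analysis.com (on Elastic Beanstalk) writes data records to an Amazon Kinesis stream. Each data record carries a partition key, which determines the shard it lands in, and receives a sequence number within that shard (the diagram shows sequence numbers 14, 17, 18, 21, 23). Each worker reads its shard, computes "my top-10", and the workers' results are combined into the global top-10.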
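Kinesis routes a record by taking the MD5 hash of its partition key as a 128-bit integer and placing it in the shard whose hash key range contains that value. The sketch below mimics this routing in memory. It assumes shards that split the hash space evenly, as in a newly created stream; `ToyStream` and its methods are hypothetical stand-ins, not the Kinesis API. Sequence numbers come from a single counter, so they increase within each shard without being contiguous, much like the numbers on the slide.

```python
import hashlib
import itertools
from collections import defaultdict

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def even_shard_ranges(num_shards: int):
    """Split [0, 2^128) into contiguous inclusive ranges (assumes an evenly split stream)."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


class ToyStream:
    """Hypothetical in-memory stream: routes records to shards by partition key."""

    def __init__(self, num_shards: int):
        self.ranges = even_shard_ranges(num_shards)
        self.shards = defaultdict(list)
        self._sequence = itertools.count(1)  # shared counter: per-shard order, gaps allowed

    def shard_for(self, partition_key: str) -> int:
        h = hash_key(partition_key)
        for shard_id, (start, end) in enumerate(self.ranges):
            if start <= h <= end:
                return shard_id
        raise ValueError("hash key not covered by any shard")

    def put_record(self, partition_key: str, data: bytes):
        shard_id = self.shard_for(partition_key)
        seq = next(self._sequence)
        self.shards[shard_id].append((seq, partition_key, data))
        return shard_id, seq
```

Because the shard depends only on the partition key, all records for one key (say, one user) land in the same shard in order, which is what lets a single worker keep an accurate per-key count.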

Amazon Kinesis as a common data broker: many data sources send records to an AWS endpoint, which stores them in shards 1 through N, replicated across three Availability Zones. Several applications consume the same stream in parallel, for example for data archiving, metric extraction, sliding-window analysis, and machine learning, and write their results to Amazon S3, Amazon DynamoDB, Amazon Redshift, and Amazon EMR.
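Several applications can share one stream because each consumer tracks its own read position. The minimal sketch below assumes a single shard held in memory. `SharedShard` and `ConsumerApp` are hypothetical names, and the two handlers merely stand in for the "data archive" and "metric extraction" applications. With the real service, the Kinesis Client Library keeps a separate checkpoint per application in a DynamoDB table to serve a similar purpose.

```python
class SharedShard:
    """Hypothetical append-only log that any number of consumers can read."""

    def __init__(self):
        self.records = []

    def append(self, data):
        self.records.append(data)

    def read(self, position: int, limit: int):
        return self.records[position:position + limit]


class ConsumerApp:
    """Each application keeps its own position, so consumers never interfere."""

    def __init__(self, name, shard, handler):
        self.name = name
        self.shard = shard
        self.handler = handler
        self.position = 0  # per-application checkpoint

    def poll(self, limit: int = 100) -> int:
        batch = self.shard.read(self.position, limit)
        for record in batch:
            self.handler(record)
        self.position += len(batch)
        return len(batch)


shard = SharedShard()
for value in [3, 1, 4, 1, 5]:
    shard.append(value)

archive = []          # stands in for the "data archive" application
totals = {"sum": 0}   # stands in for the "metric extraction" application


def add_to_total(value):
    totals["sum"] += value


archiver = ConsumerApp("archive", shard, archive.append)
metrics = ConsumerApp("metrics", shard, add_to_total)
archiver.poll()
metrics.poll(limit=2)  # a slower consumer can lag without affecting the archiver
```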

Amazon Kinesis distributed streams: from batch to continuous processing
- Scale shards elastically up or down without losing sequencing
- Workers can replay records for up to 24 hours
- Scale up to GB/sec without losing durability
- Records are stored across multiple Availability Zones
- Multiple parallel Kinesis apps can output to anything: RDBMS, S3, an in-house data warehouse, messaging, another stream, and so on, using the Java SDK, Python SDK, etc.
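Replay works because records remain in the stream for a retention period, and a consumer can restart from a sequence number it checkpointed earlier. The minimal in-memory sketch below assumes the 24-hour retention quoted on the slide and integer-second timestamps. `RetainedShard` and `replay_after` are hypothetical names. With the real service, the equivalent is requesting a shard iterator of type `AFTER_SEQUENCE_NUMBER` (or `TRIM_HORIZON` for the oldest retained record).

```python
import bisect

RETENTION_SECONDS = 24 * 60 * 60  # 24 hours, as stated on the slide


class RetainedShard:
    """Hypothetical shard that keeps records for a fixed retention window."""

    def __init__(self):
        self.records = []  # (sequence_number, arrival_time, data), ordered by sequence number
        self._next_seq = 1

    def put(self, data, now: int) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self.records.append((seq, now, data))
        return seq

    def _expire(self, now: int):
        # Drop records that have aged out of the retention window.
        cutoff = now - RETENTION_SECONDS
        self.records = [r for r in self.records if r[1] >= cutoff]

    def replay_after(self, checkpoint_seq: int, now: int):
        """Return every still-retained record with a sequence number above the checkpoint."""
        self._expire(now)
        seqs = [r[0] for r in self.records]
        start = bisect.bisect_right(seqs, checkpoint_seq)
        return [r[2] for r in self.records[start:]]
```

A worker that crashes after checkpointing sequence number 1 can therefore pick up at record 2, provided it restarts before that record ages out.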

Data Processing

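Emerging Architecture
Data streams feed both continuous and batch processing:
- Real time and micro batch: Spark, Storm, or the KCL process the streams for streaming analytics, notifications & alerts, APIs, and dashboards/visualizations
- Batch: a data archive, a data warehouse (DW), and Hadoop support batch analysis, dashboards/visualizations, and deep learning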

Real-time: event-based processing. Producer -> Amazon Kinesis -> Kinesis Storm Spout -> Apache Storm -> Amazon ElastiCache (Redis) -> Node.js client (D3). http://blogs.aws.amazon.com/bigdata/post/tx36lyscy2r0a9b/implement-a-Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache
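The core of this pipeline is a sliding-window count: how often each key appeared in the last N seconds. The referenced post implements that with Storm and Redis. The single-process sketch below shows only the windowing logic. It assumes timestamps arrive in non-decreasing order, and `SlidingWindowCounter` is a hypothetical name.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Hypothetical counter of keys seen in the last `window_seconds` (in-order timestamps assumed)."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key), oldest first
        self.counts = Counter()

    def add(self, timestamp: float, key: str):
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float):
        # Keep only events inside the window (now - window_seconds, now].
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n: int, now: float):
        self._evict(now)
        return self.counts.most_common(n)
```

Each event is appended once and evicted once, so maintaining the window costs amortized constant time per event. A dashboard can then call `top(10, now)` whenever it refreshes.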

Micro-batches: drip-feeding the data. http://blogs.aws.amazon.com/bigdata/post/tx2anln1pgeldju/best-Practices-for-Micro-Batch-Loading-on-Amazon-Redshift
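Micro-batch loading comes down to buffering stream records and flushing them as a group once a record count or a maximum age is reached. The sketch below assumes a caller-supplied `flush_fn` and an injectable clock for testing. `MicroBatcher`, `flush_fn`, and `maybe_flush` are hypothetical names. In a Redshift pipeline, `flush_fn` might write the batch to S3 and then issue a `COPY`, but that wiring is left out here.

```python
import time


class MicroBatcher:
    """Hypothetical buffer that hands records to flush_fn in size- or age-bounded batches."""

    def __init__(self, flush_fn, max_records=1000, max_age_seconds=60.0, clock=time.monotonic):
        self.flush_fn = flush_fn
        self.max_records = max_records
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.buffer = []
        self.first_added_at = None

    def add(self, record):
        if not self.buffer:
            self.first_added_at = self.clock()
        self.buffer.append(record)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the batch is full or its oldest record has waited too long; also call from a timer."""
        if not self.buffer:
            return
        too_big = len(self.buffer) >= self.max_records
        too_old = self.clock() - self.first_added_at >= self.max_age_seconds
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.first_added_at = None
            self.flush_fn(batch)
```

The age limit bounds latency when traffic is light, and the size limit keeps each load reasonably large when traffic is heavy. Both knobs are worth tuning against the target's load characteristics.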

Offline batch: Hadoop for discovery. Offline analysis: Producer -> Amazon Kinesis -> Kinesis application -> S3 -> EMR. Ad-hoc analysis: Amazon Kinesis -> EMR (Hive, Pig, Cascading, MapReduce). http://blogs.aws.amazon.com/bigdata/post/tx36lyscy2r0a9b/implement-a-Real-time-Sliding-Window-Application-Using-Amazon-Kinesis-and-Apache

Putting it together: a producer writes to Amazon Kinesis, which feeds three paths:
- Real time: Apache Storm -> DynamoDB -> app client
- Micro batch: KCL -> Redshift -> BI tools
- Batch: KCL -> S3 -> EMR

AWS Lambda: an event-driven computing service for dynamic applications. AWS Lambda functions can be triggered by data stream updates from Amazon Kinesis and Amazon DynamoDB. For instance, you can watch for a pattern, such as an address, and trigger an alert.
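Here is what "watch for a pattern and trigger an alert" could look like as a Python Lambda handler for Kinesis events. Lambda delivers each record's payload base64-encoded under `Records[i]["kinesis"]["data"]`. The watched pattern (an IP address from the documentation range) and the shape of the returned alerts are hypothetical. A deployed function would more likely publish each alert, for example to SNS, rather than return it.

```python
import base64
import re

# Hypothetical pattern to watch for: one specific IP address appearing in log lines.
WATCHED_ADDRESS = re.compile(r"\b203\.0\.113\.7\b")


def lambda_handler(event, context):
    alerts = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded in the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        if WATCHED_ADDRESS.search(payload):
            # Hypothetical alert shape; a real function might publish this to SNS instead.
            alerts.append({"sequenceNumber": record["kinesis"]["sequenceNumber"], "payload": payload})
    return {"alerts": alerts}
```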

A focus on functions, data, and events: S3 event notifications, DynamoDB Streams, Kinesis events, and custom events trigger cloud functions.
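Each of these sources delivers a different event shape. Records from S3, DynamoDB Streams, and Kinesis identify themselves through an `eventSource` field (`aws:s3`, `aws:dynamodb`, `aws:kinesis`), while a custom event is whatever payload the caller sends. The sketch below routes on that field only to show the shapes side by side. In practice each trigger usually gets its own function. The handler names and the tuples they return are hypothetical.

```python
def handle_s3(record):
    # S3 notifications carry the object key under s3.object.key.
    return ("s3", record["s3"]["object"]["key"])


def handle_dynamodb(record):
    # DynamoDB Streams records report INSERT, MODIFY, or REMOVE in eventName.
    return ("dynamodb", record["eventName"])


def handle_kinesis(record):
    return ("kinesis", record["kinesis"]["partitionKey"])


# Hypothetical routing table keyed by each record's eventSource.
HANDLERS = {
    "aws:s3": handle_s3,
    "aws:dynamodb": handle_dynamodb,
    "aws:kinesis": handle_kinesis,
}


def lambda_handler(event, context):
    records = event.get("Records") if isinstance(event, dict) else None
    if records is None:
        # Custom events (e.g. a direct invocation) carry whatever payload the caller sent.
        return [("custom", event)]
    return [HANDLERS[r["eventSource"]](r) for r in records]
```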

Putting AWS Lambda to work:
- Server-free back ends
- Data triggers
- IoT
- Stream processing
- Indexing & synchronization

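AWS Lambda for reactive computing: uploading a photo to an S3 bucket triggers an "extract metadata" cloud function, which stores the metadata in DynamoDB; a "trending" cloud function then updates a trending table in DynamoDB; and a "notify" cloud function sends a push notification through SNS.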
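The first step of this chain, the "extract metadata" function, could be sketched as below. The S3 event supplies the bucket name, the object key (URL-encoded, hence `unquote_plus`), the object size, and the event time. The item layout and the `make_handler`/`put_item` indirection are hypothetical. In a deployed function, `put_item` could wrap boto3's `Table.put_item(Item=...)` on the metadata table, and real image metadata (dimensions, EXIF) would need the object itself to be fetched, which is omitted here.

```python
from urllib.parse import unquote_plus


def extract_metadata(record):
    """Build a hypothetical metadata item from one S3 event record."""
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
    return {
        "photo_id": f"{bucket}/{key}",
        "bucket": bucket,
        "key": key,
        "size_bytes": record["s3"]["object"].get("size", 0),
        "uploaded_at": record["eventTime"],
    }


def make_handler(put_item):
    """put_item is injected so the sketch runs without AWS; e.g. a DynamoDB table's put_item."""

    def lambda_handler(event, context):
        items = [extract_metadata(r) for r in event["Records"] if r.get("eventSource") == "aws:s3"]
        for item in items:
            put_item(item)
        return {"stored": len(items)}

    return lambda_handler
```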

Processing events from Kinesis: write millions of events from Kinesis into Elasticsearch with only 60 lines of code. https://gist.github.com/tylr/e8baf45c07ced23ef013 http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-kinesis-events-adminuser.html
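The heart of such a function is turning a batch of Kinesis records into one Elasticsearch bulk request. The sketch below is not the linked gist's code and covers only that transformation. It assumes each payload is a JSON document and that the index name is chosen by the caller (`events` is a hypothetical default). Sending the body to the cluster's `_bulk` endpoint is left out. It uses each record's `eventID` (shard ID plus sequence number) as the document ID so that retried batches overwrite rather than duplicate. Older Elasticsearch versions also expect a `_type` in each action line.

```python
import base64
import json


def kinesis_event_to_bulk(event, index_name="events"):
    """Build a newline-delimited bulk body: one action line, then one document line, per record."""
    lines = []
    for record in event.get("Records", []):
        # Kinesis delivers payloads base64-encoded; assume each one is a JSON document.
        doc = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # eventID is unique per record, which makes re-sending the same batch idempotent.
        action = {"index": {"_index": index_name, "_id": record["eventID"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```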

Customer deployments on AWS

GREE International: re:Invent 2014 GAM301, Real-Time Game Analytics with Amazon Kinesis, Redshift, and DynamoDB. Session: https://www.youtube.com/watch?v=elpwlj6yi44 Slides: http://www.slideshare.net/amazonwebservices/gam301-realtime-game-analytics-with-amazon-kinesisamazon-redshift-and-amazon-dynamodb-awsreinvent-2014

Key Requirements for Analytics

Initial requirements:
- Data collection & streaming to a database
- Zero data loss
- Zero data corruption
- Guaranteed data delivery

New requirements:
- Near real-time data latency
- Real-time ad-hoc analysis
- Ease of adding consumers
- Managed service

Data Collection
Sources of data: mobile devices, game servers, ad networks
Data sizes:
- Size of event: ~1 KB
- 500M+ events/day
- 500 GB+/day and growing
- JSON format
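A rough, back-of-the-envelope check of these numbers: 500M events/day at ~1 KB each is about 500 GB/day, or roughly 5,800 events (about 5.8 MB) per second on average. The sketch below turns that load into a Kinesis shard count. It assumes the per-shard write limits of 1 MB/s and 1,000 records/s, decimal units, and a hypothetical peak factor. It is only a sizing estimate, not GREE's actual configuration.

```python
import math

SECONDS_PER_DAY = 86_400
SHARD_WRITE_BYTES_PER_SEC = 1_000_000  # per-shard ingest limit, ~1 MB/s (decimal units assumed)
SHARD_WRITE_RECORDS_PER_SEC = 1_000    # per-shard ingest limit, 1,000 records/s


def shards_needed(events_per_day, event_size_bytes, peak_factor=1.0):
    """Smallest shard count that satisfies both per-shard write limits at the given load."""
    events_per_sec = events_per_day / SECONDS_PER_DAY * peak_factor
    bytes_per_sec = events_per_sec * event_size_bytes
    return max(
        math.ceil(bytes_per_sec / SHARD_WRITE_BYTES_PER_SEC),
        math.ceil(events_per_sec / SHARD_WRITE_RECORDS_PER_SEC),
    )


print(shards_needed(500_000_000, 1_000))                   # average load: 6
print(shards_needed(500_000_000, 1_000, peak_factor=3.0))  # hypothetical 3x peak: 18
```

With ~1 KB events, the byte limit and the record limit bind at the same point, so both give the same count. Smaller events would make the 1,000 records/s limit the binding one.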

Architecture

SocialMetrix: re:Invent 2014 ARC202, Real-World Real-Time Analytics. Session: https://www.youtube.com/watch?v=nia33zwfa8e Slides: http://www.slideshare.net/zer0/arc202-arc202-real-world-real-time-analytics20141109mhfinaledit

Drivers for architecture evolution:
- More customers, bigger customers
- Add new features
- Keep costs under control

Requirements at the 4th iteration:
- Monitor millions of social media profiles
- Make data accessible (exploration, PoC)
- Improve UI response times
- Test our data pipelines
- Faster reprocessing

Architecture

[Figure: Cost over Architecture. Costs and number of active customers across architecture iterations #1 to #4.]

THANK YOU! http://aws.amazon.com/big-data