Building Real-Time Analytics Into Big Data Applications

Size: px
Start display at page:

Download "Building Real-Time Analytics Into Big Data Applications"

Transcription

1 Building Real-Time Analytics Into Big Data Applications Shawn Gandhi, Solutions 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

2 { } "payerid": "Joe", "productcode": "AmazonS3", "clientproductcode": "AmazonS3", "usagetype": "Bandwidth", "operation": "PUT", "value": "22490", "timestamp": " " Metering Record user-identifier frank [10/Oct/2000:13:55: ] "GET /apache_pb.gif HTTP/1.0" Common Log Entry <165> T22:14:15.003Z mymachine.example.com evntslog - ID47 [examplesdid@32473 iut="3" eventsource="application" eventid="1011"][examplepriority@32473 class="high"] Syslog Entry SeattlePublicWater/Kinesis/123/Realtime MQTT Record <R,AMZN,T,G,R1> NASDAQ OMX Record

3 Data volume Generated data Available for analysis Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011 IDC: Worldwide Business Analytics Software Forecast and 2011 Vendor Shares

4 Abraham Wald ( )

5

6

7 Big data: best served fresh Big data Hourly server logs: were your systems misbehaving one hour ago Weekly / Monthly Bill: what you spent this billing cycle Daily customer-preferences report from your website s click stream: what deal or ad to try next time Daily fraud reports: was there fraud yesterday Real-time big data Amazon CloudWatch metrics: what went wrong now Real-time spending alerts/caps: prevent overspending now Real-time analysis: what to offer the current customer now Real-time detection: block fraudulent use now

8 Talk outline A tour of Amazon Kinesis concepts in the context of a Twitter Trends service Implementing the Twitter Trends service Amazon Kinesis in the broader context of a big data ecosystem Q&A

9 Sending and reading data from Amazon Kinesis streams Sending Reading HTTP Post Get* APIs AWS SDK LOG4J Kinesis Client Library + Connector Library Apache Storm Flume Fluentd Amazon Elastic MapReduce

10 A Tour of Amazon Kinesis Concepts in the Context of a Twitter Trends Service

11 Demo

12 twitter-trends.com website twitter-trends.com Elastic Beanstalk twitter-trends.com

13 Too big to handle on one box twitter-trends.com

14 The solution: streaming map/reduce My top-10 twitter-trends.com My top-10 Global top-10 My top-10

15 Core concepts Data record Stream Partition key twitter-trends.com My top-10 My top-10 Global top-10 My top-10 Data record Shard Worker Shard: Sequence number

16 How this relates to Amazon Kinesis twitter-trends.com Kinesis Kinesis application

17 Core concepts recapped Data record ~ tweet Stream ~ all tweets (the Twitter Firehose) Partition key ~ Twitter topic (every tweet record belongs to exactly one of these) Shard ~ all the data records belonging to a set of Twitter topics that will get grouped together Sequence number ~ each data record gets one assigned when first ingested Worker ~ processes the records of a shard in sequence number order

18 Using the Amazon Kinesis API directly twitter-trends.com K I N E S I S process(records): iterator = getsharditerator(shardid, { LATEST); while for (record (true) { in records) { [records, updatelocaltop10(record); iterator] = } getnextrecords(iterator, maxrecstoreturn); if process(records); (timetodooutput()) { } writelocaltop10toddb(); } } } while (true) { localtop10lists = scanddbtable(); updateglobaltop10list( localtop10lists); sleep(10);

19 Challenges with using the Amazon Kinesis API directly Manual creation How many of workers and assignment How per many EC2 to shards instance? EC2 instances? twitter-trends.com K I N E S I S Kinesis application

20 Using the Amazon Kinesis Client Library twitter-trends.com K I N E S I S Shard mgmt table Kinesis application

21 Elastic scale and load balancing Auto scaling Group Amazon CloudWatch twitter-trends.com K I N E S I S Shard mgmt table

22 Shard management Auto scaling Group Amazon CloudWatch Amazon SNS AWS Console twitter-trends.com K I N E S I S Shard mgmt table

23 The challenges of fault tolerance K I N E S I S X X X Shard mgmt table twitter-trends.com

24 Fault tolerance support in KCL K I N E S I S Availability Zone X1 Shard mgmt table twitter-trends.com Availability Zone 3

25 Checkpoint, replay design pattern Amazon Kinesis Host A 18X 0310 Shard ID Shard-i Local top-10 {#Obama: 10235, #Congress: #Seahawks: 9910, 9835, } Shard-i Host B Shard ID Lock Seq num Shard-i Host AB 10 0

26 Catching up from delayed processing Amazon Kinesis Host A How long until we ve caught up? Shard-i X2 Host B Shard throughput SLA: - 1 MBps in - 2 MBps out => Catch up time ~ down time Unless your application can t process 2 MBps on a single host => Provision more shards

27 Concurrent processing applications Amazon Kinesis twitter-trends.com Shard-i spot-the-revolution.org

28 Why not combine applications? Amazon Kinesis twitter-trends.com Output state and checkpoint every 10 seconds twitter-stats.com Output state and checkpoint every hour

29 Implementing twitter-trends.com

30 Creating an Amazon Kinesis stream

31 How many shards? Additional considerations: - How much processing is needed per data record/data byte? - How quickly do you need to catch up if one or more of your workers stalls or fails? You can always dynamically reshard to adjust to changing workloads

32 Twitter Trends shard processing code Class TwitterTrendsShardProcessor implements IRecordProcessor { public TwitterTrendsShardProcessor() { public void initialize(string shardid) { public void processrecords(list<record> records, IRecordProcessorCheckpointer checkpointer) { } public void shutdown(irecordprocessorcheckpointer checkpointer, ShutdownReason reason) { }

33 Twitter Trends shard processing code Class TwitterTrendsShardProcessor implements IRecordProcessor { private Map<String, AtomicInteger> hashcount = new HashMap<>(); private long tweetsprocessed = 0; public void processrecords(list<record> records, IRecordProcessorCheckpointer checkpointer) computelocaltop10(records); } } if ((tweetsprocessed++) >= 2000) { } emittodynamodb(checkpointer);

34 Twitter Trends shard processing code private void computelocaltop10(list<record> records) { for (Record r : records) { String tweet = new String(r.getData().array()); String[] words = tweet.split(" \t"); for (String word : words) { if (word.startswith("#")) { if (!hashcount.containskey(word)) hashcount.put(word, new AtomicInteger(0)); hashcount.get(word).incrementandget(); } } } private void emittodynamodb(irecordprocessorcheckpointer checkpointer) { persistmaptodynamodb(hashcount); try { checkpointer.checkpoint(); } catch (IOException KinesisClientLibDependencyException InvalidStateException ThrottlingException ShutdownException e) { // Error handling } hashcount = new HashMap<>(); tweetsprocessed = 0; }

35 Amazon Kinesis as a gateway into the big data ecosystem

36 Big data has a lifecycle Twitter trends anonymized archive Twitter trends statistics Twitter trends processing application twitter-trends.com

37 Connector framework public interface IKinesisConnectorPipeline<T> { IEmitter<T> getemitter(kinesisconnectorconfiguration configuration); IBuffer<T> getbuffer(kinesisconnectorconfiguration configuration); ITransformer<T> gettransformer( KinesisConnectorConfiguration configuration); } IFilter<T> getfilter(kinesisconnectorconfiguration configuration);

38 Transformer and Filter interfaces public interface ITransformer<T> { public T toclass(record record) throws IOException; } public byte[] fromclass(t record) throws IOException; public interface IFilter<T> { } public boolean keeprecord(t record);

39 Tweet transformer for S3 archival public class TweetTransformer implements ITransformer<TwitterRecord> { private final JsonFactory fact = public TwitterRecord toclass(record record) { try { return fact.createjsonparser( record.getdata().array()).readvalueas(twitterrecord.class); } catch (IOException e) { throw new RuntimeException(e); } }

40 Example Amazon Redshift connector Amazon Redshift K I N E S I S Amazon S3

41 Example Amazon Redshift connector v2 K I N E S I S K I N E S I S Amazon Redshift Amazon S3

42

CAPTURING & PROCESSING REAL-TIME DATA ON AWS

CAPTURING & PROCESSING REAL-TIME DATA ON AWS CAPTURING & PROCESSING REAL-TIME DATA ON AWS @ 2015 Amazon.com, Inc. and Its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent

More information

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

Scalability in the Cloud HPC Convergence with Big Data in Design, Engineering, Manufacturing

Scalability in the Cloud HPC Convergence with Big Data in Design, Engineering, Manufacturing Scalability in the Cloud HPC Convergence with Big Data in Design, Engineering, Manufacturing July 7, 2014 David Pellerin, Business Development Principal Amazon Web Services What Do We Hear From Customers?

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

Thing Big: How to Scale Your Own Internet of Things. Walter'Pernstecher'-'[email protected]' Dr.'Markus'Schmidberger'-'schmidbe@amazon.

Thing Big: How to Scale Your Own Internet of Things. Walter'Pernstecher'-'pernstec@amazon.de' Dr.'Markus'Schmidberger'-'schmidbe@amazon. Thing Big: How to Scale Your Own Internet of Things Walter'Pernstecher'-'[email protected]' Dr.'Markus'Schmidberger'-'[email protected]' Internet of Things is the network of physical objects or "things"

More information

Building your Big Data Architecture on Amazon Web Services

Building your Big Data Architecture on Amazon Web Services Building your Big Data Architecture on Amazon Web Services Abhishek Sinha @abysinha [email protected] AWS Services Deployment & Administration Application Services Compute Storage Database Networking

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Amazon Web Services. Lawrence Berkeley LabTech Conference 9/10/15. Jamie Baker Federal Scientific Account Manager AWS WWPS bakjames@amazon.

Amazon Web Services. Lawrence Berkeley LabTech Conference 9/10/15. Jamie Baker Federal Scientific Account Manager AWS WWPS bakjames@amazon. Web Services Lawrence Berkeley LabTech Conference 9/10/15 Jamie Baker Federal Scientific Account Manager AWS WWPS [email protected] 2015, Web Services, Inc. or its Affiliates. All rights reserved. AWS

More information

Introduction to Amazon Web Services! Leo Zhadanovsky! @leozh [email protected]! Senior Solutions Architect

Introduction to Amazon Web Services! Leo Zhadanovsky! @leozh leo@amazon.com! Senior Solutions Architect Introduction to Amazon Web Services! Leo Zhadanovsky! @leozh [email protected]! Senior Solutions Architect AWS HISTORY About How didamazon Amazon Web Services! Deep experience in building and operating global

More information

Innovative Geschäftsmodelle Ermöglicht durch die AWS Cloud

Innovative Geschäftsmodelle Ermöglicht durch die AWS Cloud Innovative Geschäftsmodelle Ermöglicht durch die AWS Cloud Rolf Kersten Business Development Manager Amazon Web Services Germany GmbH 2. Juli 2014 2014 Software AG. All rights reserved. Sechs Dinge, die

More information

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH Real-time Data Analytics mit Elasticsearch Bernhard Pflugfelder inovex GmbH Bernhard Pflugfelder Big Data Engineer @ inovex Fields of interest: search analytics big data bi Working with: Lucene Solr Elasticsearch

More information

Amazon Web Services. 2015 Annual ALGIM Conference. Tim Dacombe-Bird Regional Sales Manager Amazon Web Services New Zealand

Amazon Web Services. 2015 Annual ALGIM Conference. Tim Dacombe-Bird Regional Sales Manager Amazon Web Services New Zealand Amazon Web Services 2015 Annual ALGIM Conference Tim Dacombe-Bird Regional Sales Manager Amazon Web Services New Zealand 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Who

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Amazon EC2 Product Details Page 1 of 5

Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of

More information

Background on Elastic Compute Cloud (EC2) AMI s to choose from including servers hosted on different Linux distros

Background on Elastic Compute Cloud (EC2) AMI s to choose from including servers hosted on different Linux distros David Moses January 2014 Paper on Cloud Computing I Background on Tools and Technologies in Amazon Web Services (AWS) In this paper I will highlight the technologies from the AWS cloud which enable you

More information

Introduction to AWS in Higher Ed

Introduction to AWS in Higher Ed Introduction to AWS in Higher Ed Lori Clithero [email protected] 206.227.5054 University of Washington Cloud Day 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 2 Cloud democratizes

More information

BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

BERLIN. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved BERLIN 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Build Your Mobile App Faster with AWS Mobile Services Jan Metzner AWS Solutions Architect @janmetzner Danilo Poccia AWS Technical

More information

Amazon Glacier. Developer Guide API Version 2012-06-01

Amazon Glacier. Developer Guide API Version 2012-06-01 Amazon Glacier Developer Guide Amazon Glacier: Developer Guide Copyright 2016 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress may not be used in

More information

19.10.11. Amazon Elastic Beanstalk

19.10.11. Amazon Elastic Beanstalk 19.10.11 Amazon Elastic Beanstalk A Short History of AWS Amazon started as an ECommerce startup Original architecture was restructured to be more scalable and easier to maintain Competitive pressure for

More information

LONDON. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

LONDON. 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved LONDON 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Best Practices for Building Partner Managed Services on AWS Kelly Hartman, Global Segment Leader, MSPs Kyle Lichtenberg, Solutions

More information

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze

More information

AWS Cloud for HPC and Big Data

AWS Cloud for HPC and Big Data AWS Cloud for HPC and Big Data David Pellerin, Business Development Principal IDC HPC User Forum September 16, 2014 AWS Regions US West (Oregon) US West (Northern California) GovCloud (ITAR Compliance)

More information

15-319 / 15-619 Cloud Computing. Recitation 5 September 29 th & October 1 st 2015

15-319 / 15-619 Cloud Computing. Recitation 5 September 29 th & October 1 st 2015 15-319 / 15-619 Cloud Computing Recitation 5 September 29 th & October 1 st 2015 1 Overview Administrative Stuff Last Week s Reflection Project 2.1, OLI Modules 5 & 6, Quiz 3 New concepts This week s schedule

More information

Amazon Kinesis and Apache Storm

Amazon Kinesis and Apache Storm Amazon Kinesis and Apache Storm Building a Real-Time Sliding-Window Dashboard over Streaming Data Rahul Bhartia October 2014 Contents Contents Abstract Introduction Reference Architecture Amazon Kinesis

More information

AIST Data Symposium. Ed Lenta. Managing Director, ANZ Amazon Web Services

AIST Data Symposium. Ed Lenta. Managing Director, ANZ Amazon Web Services AIST Data Symposium Ed Lenta Managing Director, ANZ Amazon Web Services Why are companies adopting cloud computing and AWS so quickly? #1: Agility The primary reason businesses are moving so quickly to

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

HADOOP BIG DATA DEVELOPER TRAINING AGENDA

HADOOP BIG DATA DEVELOPER TRAINING AGENDA HADOOP BIG DATA DEVELOPER TRAINING AGENDA About the Course This course is the most advanced course available to Software professionals This has been suitably designed to help Big Data Developers and experts

More information

CLOUD COMPUTING FOR THE ENTERPRISE AND GLOBAL COMPANIES Steve Midgley Head of AWS EMEA

CLOUD COMPUTING FOR THE ENTERPRISE AND GLOBAL COMPANIES Steve Midgley Head of AWS EMEA CLOUD COMPUTING FOR THE ENTERPRISE AND GLOBAL COMPANIES Steve Midgley Head of AWS EMEA AWS Introduction Why are enterprises choosing AWS? What are enterprises using AWS for? How are enterprise getting

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet [email protected] October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with

More information

AWS Lambda. Developer Guide

AWS Lambda. Developer Guide AWS Lambda Developer Guide AWS Lambda: Developer Guide Copyright 2015 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress may not be used in connection

More information

www.boost ur skills.com

www.boost ur skills.com www.boost ur skills.com AWS CLOUD COMPUTING WORKSHOP Write us at [email protected] BOOSTURSKILLS No 1736 1st Amrutha College Road Kasavanhalli,Off Sarjapur Road,Bangalore-35 1) Introduction &

More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

Amazon Elastic MapReduce. Jinesh Varia Peter Sirota Richard Cole

Amazon Elastic MapReduce. Jinesh Varia Peter Sirota Richard Cole Amazon Elastic MapReduce Jinesh Varia Peter Sirota Richard Cole Start End From IDE Command line Web Console Notify Input Data Get Results Start End From IDE Command line Web Console AWS EC2 Instance Notify

More information

Monitoring Linux and Windows Logs with Graylog Collector. Bernd Ahlers Graylog, Inc.

Monitoring Linux and Windows Logs with Graylog Collector. Bernd Ahlers Graylog, Inc. Monitoring Linux and Windows Logs with Graylog Collector Bernd Ahlers Graylog, Inc. Structured Logging & Introduction to Graylog Collector Bernd Ahlers Graylog, Inc. Introduction: Graylog Open source log

More information

Razvoj Java aplikacija u Amazon AWS Cloud: Praktična demonstracija

Razvoj Java aplikacija u Amazon AWS Cloud: Praktična demonstracija Razvoj Java aplikacija u Amazon AWS Cloud: Praktična demonstracija Robert Dukarić University of Ljubljana Faculty of Computer and Information Science Laboratory for information systems integration Competence

More information

How To Use Aws.Com

How To Use Aws.Com Crypto-Options on AWS Bertram Dorn Specialized Solutions Architect Security/Compliance Network/Databases Amazon Web Services Germany GmbH Amazon.com, Inc. and its affiliates. All rights reserved. Agenda

More information

Technology Enablement

Technology Enablement SOLUTION OVERVIEW 1 ABOUT TECHMILEAGE Founded in 2008 / Tempe, Arizona Over 100 engagements Full range of business & technology services Software Development, Big Data, Cloud/AWS, BI, Advanced Analytics

More information

Why should you look at your logs? Why ELK (Elasticsearch, Logstash, and Kibana)?

Why should you look at your logs? Why ELK (Elasticsearch, Logstash, and Kibana)? Authors Introduction This guide is designed to help developers, DevOps engineers, and operations teams that run and manage applications on top of AWS to effectively analyze their log data to get visibility

More information

Big Data Pipeline and Analytics Platform

Big Data Pipeline and Analytics Platform Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Source Software Sudhir Tonse (@stonse) Danny Yuan (@g9yuayon) Netflix is a log generating company that also happens to stream movies

More information

Amazon Web Services. Steve Spano, Brig Gen (Ret), USAF GM, Defense and National Security AWS, Worldwide Public Sector

Amazon Web Services. Steve Spano, Brig Gen (Ret), USAF GM, Defense and National Security AWS, Worldwide Public Sector Amazon Web Services Steve Spano, Brig Gen (Ret), USAF GM, Defense and National Security AWS, Worldwide Public Sector Overview Historical Perspective of IT Transformational Value of Cloud Why organizations

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information

More information

Logentries Insights: The State of Log Management & Analytics for AWS

Logentries Insights: The State of Log Management & Analytics for AWS Logentries Insights: The State of Log Management & Analytics for AWS Trevor Parsons Ph.D Co-founder & Chief Scientist Logentries 1 1. Introduction The Log Management industry was traditionally driven by

More information

Time series IoT data ingestion into Cassandra using Kaa

Time series IoT data ingestion into Cassandra using Kaa Time series IoT data ingestion into Cassandra using Kaa Andrew Shvayka [email protected] Agenda Data ingestion challenges Why Kaa? Why Cassandra? Reference architecture overview Hands-on Sandbox

More information

Preparing Your IT for the Holidays. A quick start guide to take your e-commerce to the Cloud

Preparing Your IT for the Holidays. A quick start guide to take your e-commerce to the Cloud Preparing Your IT for the Holidays A quick start guide to take your e-commerce to the Cloud September 2011 Preparing your IT for the Holidays: Contents Introduction E-Commerce Landscape...2 Introduction

More information

Agenda. ! Strengths of PostgreSQL. ! Strengths of Hadoop. ! Hadoop Community. ! Use Cases

Agenda. ! Strengths of PostgreSQL. ! Strengths of Hadoop. ! Hadoop Community. ! Use Cases Postgres & Hadoop Agenda! Strengths of PostgreSQL! Strengths of Hadoop! Hadoop Community! Use Cases Best of Both World Postgres Hadoop World s most advanced open source database solution Enterprise class

More information

Systems Integration in the Cloud Era with Apache Camel. Kai Wähner, Principal Consultant

Systems Integration in the Cloud Era with Apache Camel. Kai Wähner, Principal Consultant Systems Integration in the Cloud Era with Apache Camel Kai Wähner, Principal Consultant Kai Wähner Main Tasks Requirements Engineering Enterprise Architecture Management Business Process Management Architecture

More information

AWS Account Setup and Services Overview

AWS Account Setup and Services Overview AWS Account Setup and Services Overview 1. Purpose of the Lab Understand definitions of various Amazon Web Services (AWS) and their use in cloud computing based web applications that are accessible over

More information

Microservices on AWS

Microservices on AWS Microservices on AWS AWS Summit Berlin 2016 Matthias Jung, Solutions Architect Julien Simon, Evangelist April, 12 th, 2016 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda

More information

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation

More information

IAN MASSINGHAM. Technical Evangelist Amazon Web Services

IAN MASSINGHAM. Technical Evangelist Amazon Web Services IAN MASSINGHAM Technical Evangelist Amazon Web Services From 2014: Cloud computing has become the new normal Deploying new applications to the cloud by default Migrating existing applications as quickly

More information

Cloud Computing with Amazon Web Services and the DevOps Methodology. www.cloudreach.com

Cloud Computing with Amazon Web Services and the DevOps Methodology. www.cloudreach.com Cloud Computing with Amazon Web Services and the DevOps Methodology Who am I? Max Manders @maxmanders Systems Developer at Cloudreach @cloudreach Director / Co-Founder of Whisky Web @whiskyweb Who are

More information

Deep Dive: Maximizing EC2 & EBS Performance

Deep Dive: Maximizing EC2 & EBS Performance Deep Dive: Maximizing EC2 & EBS Performance Tom Maddox, Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved What we ll cover Amazon EBS overview Volumes Snapshots

More information

Professional Hadoop Solutions

Professional Hadoop Solutions Brochure More information from http://www.researchandmarkets.com/reports/2542488/ Professional Hadoop Solutions Description: The go-to guidebook for deploying Big Data solutions with Hadoop Today's enterprise

More information

Finding the Needle in a Big Data Haystack. Wolfgang Hoschek (@whoschek) JAX 2014

Finding the Needle in a Big Data Haystack. Wolfgang Hoschek (@whoschek) JAX 2014 Finding the Needle in a Big Data Haystack Wolfgang Hoschek (@whoschek) JAX 2014 1 About Wolfgang Software Engineer @ Cloudera Search Platform Team Previously CERN, Lawrence Berkeley National Laboratory,

More information

Bernd Ahlers Michael Friedrich. Log Monitoring Simplified Get the best out of Graylog2 & Icinga 2

Bernd Ahlers Michael Friedrich. Log Monitoring Simplified Get the best out of Graylog2 & Icinga 2 Bernd Ahlers Michael Friedrich Log Monitoring Simplified Get the best out of Graylog2 & Icinga 2 BEFORE WE START Agenda AGENDA Introduction Tools Log History Logs & Monitoring Demo The Future Resources

More information

AWS Directory Service. Simple AD Administration Guide Version 1.0

AWS Directory Service. Simple AD Administration Guide Version 1.0 AWS Directory Service Simple AD Administration Guide AWS Directory Service: Simple AD Administration Guide Copyright 2015 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's

More information

AWS Performance Tuning

AWS Performance Tuning AWS Performance Tuning Markus Albe @Percona Fernando Ipar @Percona Ryan Lowe @Square PLNY 2012 Amazon Web Services Cloud Formation CloudFront CloudSearch CloudWatch DirectConnect DynamoDB ec2 ElastiCache

More information

SECURITY IS JOB ZERO. Security The Forefront For Any Online Business Bill Murray Director AWS Security Programs

SECURITY IS JOB ZERO. Security The Forefront For Any Online Business Bill Murray Director AWS Security Programs SECURITY IS JOB ZERO Security The Forefront For Any Online Business Bill Murray Director AWS Security Programs Security is Job Zero Physical Security Network Security Platform Security People & Procedures

More information

Cloudera Certified Developer for Apache Hadoop

Cloudera Certified Developer for Apache Hadoop Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number

More information

ITIL Event Management in the Cloud

ITIL Event Management in the Cloud ITIL Event Management in the Cloud An AWS Cloud Adoption Framework Addendum July 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

Cloud computing - Architecting in the cloud

Cloud computing - Architecting in the cloud Cloud computing - Architecting in the cloud [email protected] 1 Outline Cloud computing What is? Levels of cloud computing: IaaS, PaaS, SaaS Moving to the cloud? Architecting in the cloud Best practices

More information

Migration Scenario: Migrating Batch Processes to the AWS Cloud

Migration Scenario: Migrating Batch Processes to the AWS Cloud Migration Scenario: Migrating Batch Processes to the AWS Cloud Produce Ingest Process Store Manage Distribute Asset Creation Data Ingestor Metadata Ingestor (Manual) Transcoder Encoder Asset Store Catalog

More information

Application Security Best Practices. Matt Tavis Principal Solutions Architect

Application Security Best Practices. Matt Tavis Principal Solutions Architect Application Security Best Practices Matt Tavis Principal Solutions Architect Application Security Best Practices is a Complex topic! Design scalable and fault tolerant applications See Architecting for

More information

Amazon Web Services. For Government, Education, and Nonprofit Organizations. Jakob Huhn. [email protected]. Partner Manager Benelux, Public Sector

Amazon Web Services. For Government, Education, and Nonprofit Organizations. Jakob Huhn. jakohuhn@amazon.lu. Partner Manager Benelux, Public Sector Amazon Web Services For Government, Education, and Nonprofit Organizations Jakob Huhn Partner Manager Benelux, Public Sector [email protected] 2015, Amazon Web Services, Inc. or its Affiliates. All rights

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

Headline Goes Here Hari Shreedharan Speaker Name or Subhead Goes Here

Headline Goes Here Hari Shreedharan Speaker Name or Subhead Goes Here Real Time Data Ingest into Hadoop DO NOT USE PUBLICLY using Flume PRIOR TO 10/23/12 Headline Goes Here Hari Shreedharan Speaker Name or Subhead Goes Here CommiDer and PMC Member, Apache Flume SoIware Engineer,

More information

MapReduce (in the cloud)

MapReduce (in the cloud) MapReduce (in the cloud) How to painlessly process terabytes of data by Irina Gordei MapReduce Presentation Outline What is MapReduce? Example How it works MapReduce in the cloud Conclusion Demo Motivation:

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

Cloud Big Data Architectures

Cloud Big Data Architectures Cloud Big Data Architectures Lynn Langit QCon Sao Paulo, Brazil 2016 About this Workshop Real-world Cloud Scenarios w/aws, Azure and GCP 1. Big Data Solution Types 2. Data Pipelines 3. ETL and Visualization

More information

Unified Batch & Stream Processing Platform

Unified Batch & Stream Processing Platform Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built

More information

Amazon Cloud Storage Options

Amazon Cloud Storage Options Amazon Cloud Storage Options Table of Contents 1. Overview of AWS Storage Options 02 2. Why you should use the AWS Storage 02 3. How to get Data into the AWS.03 4. Types of AWS Storage Options.03 5. Object

More information

Amazon Simple Notification Service. Developer Guide API Version 2010-03-31

Amazon Simple Notification Service. Developer Guide API Version 2010-03-31 Amazon Simple Notification Service Developer Guide Amazon Simple Notification Service: Developer Guide Copyright 2014 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. The following

More information

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS . 3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS Deliver fast actionable business insights for data scientists, rapid application creation for developers and enterprise-grade

More information

A programming model in Cloud: MapReduce

A programming model in Cloud: MapReduce A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value

More information

Xiaoming Gao Hui Li Thilina Gunarathne

Xiaoming Gao Hui Li Thilina Gunarathne Xiaoming Gao Hui Li Thilina Gunarathne Outline HBase and Bigtable Storage HBase Use Cases HBase vs RDBMS Hands-on: Load CSV file to Hbase table with MapReduce Motivation Lots of Semi structured data Horizontal

More information

Backup and Recovery of SAP Systems on Windows / SQL Server

Backup and Recovery of SAP Systems on Windows / SQL Server Backup and Recovery of SAP Systems on Windows / SQL Server Author: Version: Amazon Web Services sap- on- [email protected] 1.1 May 2012 2 Contents About this Guide... 4 What is not included in this guide...

More information

EEDC. Scalability Study of web apps in AWS. Execution Environments for Distributed Computing

EEDC. Scalability Study of web apps in AWS. Execution Environments for Distributed Computing EEDC Execution Environments for Distributed Computing 34330 Master in Computer Architecture, Networks and Systems - CANS Scalability Study of web apps in AWS Sergio Mendoza [email protected]

More information

Building Success on Acquia Cloud:

Building Success on Acquia Cloud: Building Success on Acquia Cloud: 10 Layers of PaaS TECHNICAL Guide Table of Contents Executive Summary.... 3 Introducing the 10 Layers of PaaS... 4 The Foundation: Five Layers of PaaS Infrastructure...

More information

Monitoring and Scaling My Application

Monitoring and Scaling My Application Monitoring and Scaling My Application In the last chapter, we looked at how we could use Amazon's queuing and notification services to add value to our existing application. We looked at how we could use

More information

Java SE 8 Programming

Java SE 8 Programming Oracle University Contact Us: 1.800.529.0165 Java SE 8 Programming Duration: 5 Days What you will learn This Java SE 8 Programming training covers the core language features and Application Programming

More information

How To Set Up Wiremock In Anhtml.Com On A Testnet On A Linux Server On A Microsoft Powerbook 2.5 (Powerbook) On A Powerbook 1.5 On A Macbook 2 (Powerbooks)

How To Set Up Wiremock In Anhtml.Com On A Testnet On A Linux Server On A Microsoft Powerbook 2.5 (Powerbook) On A Powerbook 1.5 On A Macbook 2 (Powerbooks) The Journey of Testing with Stubs and Proxies in AWS Lucy Chang [email protected] Abstract Intuit, a leader in small business and accountants software, is a strong AWS(Amazon Web Services) partner

More information

Edge Configuration Series Reporting Overview

Edge Configuration Series Reporting Overview Reporting Edge Configuration Series Reporting Overview The Reporting portion of the Edge appliance provides a number of enhanced network monitoring and reporting capabilities. WAN Reporting Provides detailed

More information

PERFORMANCE MODELS FOR APACHE ACCUMULO:

PERFORMANCE MODELS FOR APACHE ACCUMULO: Securely explore your data PERFORMANCE MODELS FOR APACHE ACCUMULO: THE HEAVY TAIL OF A SHAREDNOTHING ARCHITECTURE Chris McCubbin Director of Data Science Sqrrl Data, Inc. I M NOT ADAM FUCHS But perhaps

More information

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful. Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one

More information

StorReduce Technical White Paper Cloud-based Data Deduplication

StorReduce Technical White Paper Cloud-based Data Deduplication StorReduce Technical White Paper Cloud-based Data Deduplication See also at storreduce.com/docs StorReduce Quick Start Guide StorReduce FAQ StorReduce Solution Brief, and StorReduce Blog at storreduce.com/blog

More information

How to Leverage Cloud to Quickly Build Scalable Applications

How to Leverage Cloud to Quickly Build Scalable Applications How to Leverage Cloud to Quickly Build Scalable Applications Chris Keyser Principal Solution Architect David Polley Senior Director Cloud Product Management Cloud Growth Recent IDC cloud research shows

More information

Amazon Relational Database Service (RDS)

Amazon Relational Database Service (RDS) Amazon Relational Database Service (RDS) G-Cloud Service 1 1.An overview of the G-Cloud Service Arcus Global are approved to sell to the UK Public Sector as official Amazon Web Services resellers. Amazon

More information

Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect

Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect Matteo Migliavacca (mm53@kent) School of Computing Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect Simple past - Traditional

More information

Cloud Computing project Report

Cloud Computing project Report Name: Trasily Krishnan Shanmugham Cloud Computing project Report An online business can use load balancing and auto-scaling to support unexpected usage peaks and save money when usage is lower. Amazon

More information

The Inside Scoop on Hadoop

The Inside Scoop on Hadoop The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. [email protected] [email protected] @OrionGM The Inside Scoop

More information