Amazon Kinesis and Apache Storm
Building a Real-Time Sliding-Window Dashboard over Streaming Data

Rahul Bhartia

October 2014
Contents

Abstract
Introduction
Reference Architecture
    Amazon Kinesis
    Apache Storm
    Amazon ElastiCache
    Node.js
    Epoch and D3
Deploying on AWS
Streaming Data in Amazon Kinesis
    Accessing the Kinesis Stream
Continuous Processing Using Storm
    Implementing a Sliding Window
    Accessing the Storm User Interface
Building a Real-Time Data Visualization
    Generating Server-Side Events
    Visualization Using Epoch
    Accessing the Dashboard
Deleting the CloudFormation Stack
Conclusion
Abstract

Apache Storm developers can use Amazon Kinesis to quickly and cost-effectively build real-time analytics dashboards and applications that continuously process very high volumes of streaming data, such as clickstream log files and machine-generated data. This whitepaper outlines a reference architecture for a system that performs real-time sliding-window analysis over clickstream data using Amazon Kinesis and Apache Storm. We use Amazon Kinesis for managed ingestion of streaming data at scale, with the ability to replay past data, and we run sliding-window analytics in Apache Storm to power dashboards. We then build the system based on the reference architecture, demonstrating everything from ingesting, processing, and storing the data to visualizing it in real time.

Introduction

Streams of data are ubiquitous today: clickstreams, log data streams, event streams, and more. The need for real-time processing of high-volume data streams is pushing the limits of traditional data processing infrastructures. Building a clickstream monitoring system, for example, where data arrives as a continuous clickstream rather than as discrete data sets, requires continuous queries rather than ad-hoc one-time queries.

In this whitepaper, we propose a reference architecture for ingesting, analyzing, and processing vast amounts of clickstream data generated at very high rates in a smart and cost-efficient way using Amazon Kinesis with Apache Storm. We also explore the use of Amazon ElastiCache (Redis) as an in-memory data store for aggregated counters, and the use of its Pub/Sub facility to publish the counters to a simple dashboard.

We demonstrate the architecture by implementing a time-sensitive, sliding-window analysis over continuous clickstream data to determine how many visitors originated from specific referrers. Sliding-window analysis is a common way to analyze and identify trends in clickstream data: the term refers to a common pattern for analyzing real-time, continuous data that uses rolling counts of incoming data to examine trends, volumes, and rankings.
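The rolling counts behind such an analysis can be pictured as a ring of per-slot counters: the window is divided into fixed slots, and advancing the window wipes the oldest slot so its counts "slide" out of the total. The following class is a simplified, hypothetical sketch of that idea (the class and method names echo storm-starter's terminology but this is not the storm-starter implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sliding-window counter sketch: counts land in the current slot,
// and each advance expires the oldest slot's counts from the window total.
public class SlidingWindowCounterSketch {
    private final Map<String, long[]> counts = new HashMap<>();
    private final int numSlots;
    private int headSlot = 0;

    public SlidingWindowCounterSketch(int numSlots) {
        this.numSlots = numSlots;
    }

    // Count one occurrence of obj in the current slot.
    public void incrementCount(String obj) {
        counts.computeIfAbsent(obj, k -> new long[numSlots])[headSlot]++;
    }

    // Total per object over the whole window, then advance and wipe the oldest slot.
    public Map<String, Long> getCountsThenAdvanceWindow() {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            long sum = 0;
            for (long c : e.getValue()) sum += c;
            totals.put(e.getKey(), sum);
        }
        headSlot = (headSlot + 1) % numSlots;
        for (long[] slots : counts.values()) slots[headSlot] = 0; // expire oldest slot
        return totals;
    }

    public static void main(String[] args) {
        SlidingWindowCounterSketch counter = new SlidingWindowCounterSketch(3);
        counter.incrementCount("google");
        counter.incrementCount("google");
        System.out.println(counter.getCountsThenAdvanceWindow().get("google")); // 2
        counter.incrementCount("google");
        System.out.println(counter.getCountsThenAdvanceWindow().get("google")); // 3
        counter.incrementCount("google");
        counter.getCountsThenAdvanceWindow();
        // The two initial counts have now slid out of the 3-slot window.
        System.out.println(counter.getCountsThenAdvanceWindow().get("google")); // 2
    }
}
```

With three slots and one advance per emit, a count survives three emits before expiring, which is what gives the dashboard its rolling character.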
Reference Architecture

The reference architecture outlined below demonstrates using Amazon Kinesis for real-time data ingestion and Apache Storm for processing streaming data. To demonstrate the architecture, we create an application that puts simulated URLs and referrers into an Amazon Kinesis stream. We then use Apache Storm to process the stream and calculate how many visitors originated from a particular referrer. Storm persists the aggregated counters in ElastiCache and publishes them on a Pub/Sub channel. Using Node.js, we translate the messages from the Pub/Sub channel into server-side events (SSEs). Epoch, which is built on D3, consumes those events to build a real-time visualization.

Let's start by taking a look at each component of our reference architecture and the role it plays.

Amazon Kinesis

Amazon Kinesis handles streaming of high-volume, continuously generated data in our reference architecture. Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Using Amazon Kinesis, developers can continuously capture and store terabytes of data per hour from hundreds of thousands of sources, including website clickstreams, financial transaction data, social media feeds, IT logs, location-tracking events, and more. Developers can then write stream-processing applications that consume the Kinesis streams to act on real-time data: power real-time dashboards, generate alerts, implement real-time business logic, or even emit data to other big data services such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, and more. The Kinesis Client Library enables developers to build streaming data processing applications.
Leveraging the client library, developers can focus on business logic and let the library automatically handle complex issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault tolerance. The Amazon Kinesis Connector Library further helps developers integrate Amazon Kinesis with other AWS services, such as Amazon DynamoDB, Amazon Redshift, and Amazon S3. Amazon Kinesis also has connectors for other applications and distributed systems, like Apache Storm.

Apache Storm

Apache Storm handles continuous processing of the Amazon Kinesis streams in our reference architecture. Apache Storm is a real-time distributed computing technology for processing streaming messages on a continuous basis. Individual logical processing units (known as bolts in Storm terminology) are connected like a pipeline to express a series of transformations while also exposing opportunities for concurrent processing. Data streams are ingested into Storm via spouts. Each spout emits one or more sequences of tuples into a cluster for processing. A given bolt can consume any number of input spouts, process the tuples, and emit transformed tuples. A topology is a multistage distributed computation application composed of a network of spouts and bolts. A topology (which you can think of as a full application) runs continuously in order to process incoming data. We'll use the Amazon Kinesis Storm Spout to integrate Amazon Kinesis with Storm for real-time computation over our clickstream logs.

Amazon ElastiCache

Amazon ElastiCache handles the persistent storage of aggregated counters in an in-memory store and the publication of those counters in our reference architecture. ElastiCache, which forms the data store and notification layer, is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory caches instead of relying entirely on slower disk-based databases. ElastiCache supports Redis, a popular open-source in-memory key-value store that supports data structures such as sorted sets and lists. Redis also has a subscription model that client apps can use to receive notifications.
In this model, our client app (the visualization layer) is notified of each change and updates the visualizations to reflect it. While our stack mainly uses the Redis Pub/Sub facility to stream the updates to the visualization layer, you can also use Redis to persist the aggregated counters in the simple key-value store that Redis provides. Using such a persistent store, multiple applications across an enterprise can present different views or bootstrap to a known state in case of failures.
Node.js

Node.js is the component handling the conversion of published counters into server-side events in our reference architecture. Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, a good fit for data-intensive real-time applications that run across distributed devices. The Node.js application subscribes to the Redis Pub/Sub channel and translates those notifications into Server-Sent Events (SSEs). SSE is a W3C specification that describes how a server can push events to browser clients using the standard HTTP protocol.

Epoch and D3

Epoch is a real-time charting library built on D3; it handles the visualization of the server-side events to provide a real-time dashboard. D3.js is a JavaScript library for manipulating documents based on data. D3 helps bring data to life using HTML, SVG, and CSS. Epoch is a general-purpose library for application developers and visualization designers. It focuses on two aspects of visualization programming: basic charts for creating historical reports, and real-time charts for displaying frequently updating time-series data.

Deploying on AWS

We deploy our application using AWS CloudFormation, a service that gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion. The sample AWS CloudFormation template included here provisions three Amazon EC2 instances and starts all the applications on them. The stack also creates an IAM role that allows the application to authenticate to your account without you providing explicit credentials. See Using IAM Roles for EC2 Instances with the SDK for Java for more information. You can run the AWS CloudFormation template to create the stack we described.
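The sample template defines the actual resources; as a rough, hypothetical sketch of how an EC2 instance picks up such a role in a CloudFormation template (resource names, the AmiId parameter, and the instance type here are illustrative, not taken from the sample), the relevant pieces look like:

```json
{
  "Resources": {
    "AppRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Statement": [{
            "Effect": "Allow",
            "Principal": { "Service": ["ec2.amazonaws.com"] },
            "Action": ["sts:AssumeRole"]
          }]
        }
      }
    },
    "AppInstanceProfile": {
      "Type": "AWS::IAM::InstanceProfile",
      "Properties": { "Roles": [{ "Ref": "AppRole" }] }
    },
    "StormInstance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "InstanceType": "m1.medium",
        "ImageId": { "Ref": "AmiId" },
        "IamInstanceProfile": { "Ref": "AppInstanceProfile" }
      }
    }
  }
}
```

The AWS SDK for Java on the instance then obtains temporary credentials from the instance metadata service automatically, which is what lets the spout and bolts call AWS APIs without embedded keys.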
In the following sections, we walk through the details of processing in each individual layer.

Streaming Data in Amazon Kinesis

In a real-world scenario, your log servers, application servers, or apps running on smartphones and tablets produce the clickstream data captured by your Amazon Kinesis stream. For our application, we'll use a sample producer app, HttpReferrerStreamWriter.java, to generate simulated URLs in JSON format. This producer automatically creates an Amazon Kinesis stream for you and continuously emits simulated data into the stream.
This producer is also part of the Amazon Kinesis Data Visualization Sample Application, which demonstrates how to interact with Amazon Kinesis to generate meaningful statistics from a stream of data and visualize the results in real time.

Accessing the Kinesis Stream

After the AWS CloudFormation template completes deployment, navigate to the Amazon Kinesis console and click the stream name, which you will find prefixed by the template name, followed by 'KinesisStream', with two shards and 'Active' status. On the Stream Details page for the stream, the first two graphs display a write capacity of 2 MB/sec and a read capacity of 4 MB/sec, based on the two shards available for the stream, as shown in the following illustration. The Amazon Kinesis stream is now ingesting data submitted to it by the producer.

Continuous Processing Using Storm

The Amazon Kinesis Storm Spout fetches data records from Amazon Kinesis and emits them as tuples. The spout state is stored in Apache ZooKeeper, part of the Storm cluster, to track the current position in the stream. To use this spout, we start creating our Storm topology as shown in the following example:

public static void main(String[] args) throws Exception {
    propertiesFile = args[0];
    mode = args[1];
    configure(propertiesFile);

    final KinesisSpoutConfig config = new KinesisSpoutConfig(streamName, zookeeperEndpoint)
        .withZookeeperPrefix(zookeeperPrefix)
        .withInitialPositionInStream(initialPositionInStream)
        .withRegion(Regions.fromName(regionName));

    final KinesisSpout spout = new KinesisSpout(config,
        new CustomCredentialsProviderChain(), new ClientConfiguration());

    // Using number of shards as the parallelism hint for the spout.
    builder.setSpout("kinesis", spout, 2);

In the excerpt above, note the following:

- The configure function initializes the variables from the properties file passed as an argument to the application.
- KinesisSpoutConfig configures the spout, including the Storm topology name, the Amazon Kinesis stream name, the endpoint for connecting to ZooKeeper, and the prefix for the ZooKeeper paths where the spout state is stored.
- KinesisSpout constructs an instance of the spout, using your AWS credentials and the configuration specified in KinesisSpoutConfig. Each task executed by the spout operates on a distinct set of Amazon Kinesis shards and periodically commits its state to ZooKeeper.

By default, the Kinesis spout emits a tuple of (partitionKey, record), as specified by DefaultKinesisRecordScheme. If you want to emit more structured data, you can provide your own implementation of IKinesisRecordScheme. In this whitepaper, we use a parse bolt to process the record field back into its original referrer and resource components using the following code:

public void execute(Tuple input, BasicOutputCollector collector) {
    Record record = (Record) input.getValueByField(DefaultKinesisRecordScheme.FIELD_RECORD);
    ByteBuffer buffer = record.getData();
    String data = decoder.decode(buffer).toString();
    JSONObject jsonObject = new JSONObject(data);
    String referrer = jsonObject.getString("referrer");
    ...

We add this bolt to our topology and connect it with the Kinesis spout:

builder.setBolt("parse", new ParseReferrerBolt(), 6).shuffleGrouping("kinesis");

At this point, we have a partial Storm topology that reads data from Amazon Kinesis and emits the simulated referrer for each record generated by our producer application. In the next section, we implement a sliding-window counter using the storm-starter library and connect it to the Redis Pub/Sub channel.

Implementing a Sliding Window

The storm-starter project provides sample implementations of various real-time data processing topologies, including the RollingTopWords topology, which can be used for computing trending topics. In this whitepaper, we reuse most of the RollingCountBolt from that topology to perform rolling counts of incoming objects, i.e., sliding-window-based counting. The bolt is configured by two parameters:

- The length of the sliding window in seconds, which determines what the bolt counts over (i.e., its output data)
- The emit frequency in seconds, which determines how often the bolt outputs the latest window counts

The RollingCountBolt uses a tick tuple to emit the counts at the specified frequency; the frequency is configured in the bolt as shown in the following example:

public Map<String, Object> getComponentConfiguration() {
    Map<String, Object> conf = new HashMap<String, Object>();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, emitFrequencyInSeconds);
    return conf;
}

In the execute method of the bolt, we use the helper function isTickTuple to identify whether the incoming tuple is a tick tuple. If it is, we emit the current counts; otherwise, we increment the counter for the sliding window.

public void execute(Tuple tuple) {
    if (TupleHelpers.isTickTuple(tuple)) {
        LOG.debug("Received tick tuple, triggering emit of current window counts");
        emitCurrentWindowCounts();
    } else {
        countObjAndAck(tuple);
    }
}

The bolt uses two main data structures to implement a sliding-window counter:

- SlidingWindowCounter provides a sliding-window count for each tracked object. The window in our case, however, does not advance with time, but whenever (and only when) the method getCountsThenAdvanceWindow() is called during the emit.
- NthLastModifiedTimeTracker tracks the actual duration of the sliding window. It is included in case the actual sliding-window length differs from the expected length (as configured by the user).

private void emitCurrentWindowCounts() {
    Map<Object, Long> counts = counter.getCountsThenAdvanceWindow();
    int actualWindowLengthInSeconds = lastModifiedTracker.secondsSinceOldestModification();
    lastModifiedTracker.markAsModified();
    if (actualWindowLengthInSeconds != windowLengthInSeconds) {
        LOG.warn(String.format(WINDOW_LENGTH_WARNING_TEMPLATE,
            actualWindowLengthInSeconds, windowLengthInSeconds));
    }
    emit(counts, actualWindowLengthInSeconds);
}

private void countObjAndAck(Tuple tuple) {
    Object obj = tuple.getValue(0);
    counter.incrementCount(obj);
    collector.ack(tuple);
}

At this point, we have all the pieces in place to compute a sliding-window count of referrers arriving in a stream of data from Amazon Kinesis. Finally, we modify the emit function of the RollingCountBolt to publish the current counts over a Pub/Sub channel of an ElastiCache Redis node.

for (Entry<Object, Long> entry : counts.entrySet()) {
    JSONObject msg = new JSONObject();
    msg.put("name", referrer);
    msg.put("time", currentEpoch);
    msg.put("count", count);
    jedis.publish("ticker", msg.toString());
}

Accessing the Storm User Interface

The AWS CloudFormation template also deploys Storm and ZooKeeper on a single Amazon EC2 instance and starts the topology described above. You can find the URL for the Storm UI from the key named "StormURL" in the output section of the CloudFormation template. Navigate to the URL and click the topology named 'SampleTopology'. Under the topology, you should see that the spout named "Kinesis" has two tasks, each reading data in parallel from an individual shard.
Building a Real-Time Data Visualization

Next we move to the last task: presenting the streaming data. Over the years, web standards have made great progress toward building data-centric applications with ease. We'll use SSEs, which allow a web app to subscribe to an event stream instead of polling, to create real-time visualizations.

Generating Server-Side Events

We use Node.js for its event-driven model to generate the SSEs. Our code creates a basic web server and serves the content using the Connect middleware for Node.

connect()
    .use(connect.static(__dirname))
    .use(function(req, res) {
        if (req.url == '/ticker') {
            ticker(req, res);
        }
    })
    .listen(9000);

The ticker function subscribes to the Redis Pub/Sub channel, which is constantly updated by our Storm topology. When a new counter event arrives on the Pub/Sub channel, the Node server wraps the data and publishes it as a server-side event.

function ticker(req, res) {
    req.socket.setTimeout(Infinity);

    var subscriber = redis.createClient(6379, process.argv[2]);
    subscriber.subscribe("pubsubCounters");

    // When we receive a message from the redis connection
    subscriber.on("message", function(channel, message) {
        res.json(message);
    });

    // send headers for event-stream connection
    res.writeHead(200, {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive'
    });

    res.json = function(obj) {
        res.write("data: " + obj + "\n\n");
    };
    res.json(JSON.stringify({}));

    // The 'close' event is fired when a user closes their browser window.
    req.on("close", function() {
        subscriber.unsubscribe();
        subscriber.quit();
    });
}
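For reference, what the browser receives on the /ticker endpoint is a plain text/event-stream body: each counter notification becomes one data: line followed by a blank line, carrying the JSON message our Storm bolt published (the field values below are illustrative):

```
data: {"name":"amazon","time":1412899200,"count":42}

data: {"name":"google","time":1412899200,"count":17}

```

This framing is what the browser-side EventSource parses into discrete message events in the next section.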
Visualization Using Epoch

The index.html page, served as the default by the Connect middleware, contains the client-side JavaScript that uses Epoch to render the real-time chart. In the script, we create an empty chart bound to each referrer variable.

var siteArray = ["amazon", "yahoo", "bing", "reddit", "google", "stackoverflow"];
...
var chartDiv = document.createElement('div');
chartDiv.id = siteArray[i] + 'chart';
chartDiv.className = "epoch category20";
chartDiv.style.width = '89%';
chartDiv.style.height = '100%';
chartDiv.style.position = 'relative';
chartDiv.style.float = 'left';
...
eval("var " + siteArray[i] + "Data = [{label:\"" + siteArray[i] + "\", values:[{time:currentTime, y:0}]}];");
eval("var " + siteArray[i] + " = $('#" + siteArray[i] + "chart').epoch({type:'time.area', data:" + siteArray[i] + "Data, axes:['left','bottom','right']});");

The script adds an event listener on the event stream published by the Node server.

var source = new EventSource('/ticker');
source.addEventListener('message', tick);

The event-listener function parses the payload to find the time when the counter was generated and its value. The function then pushes the new data point into the dataset, and the chart is redrawn on every event.

function tick(e) {
    if (e) {
        var eventData = JSON.parse(e.data);
        window[eventData.name].push([{ time: eventData.time, y: eventData.count }]);
    }
}
Accessing the Dashboard

Our AWS CloudFormation template also deploys Node.js on a single-node Amazon EC2 instance for the server. You can find the URL for the Node server from the key named "VisualizationURL" under the output section of the AWS CloudFormation template. Navigate to the dashboard at that URL, and you should see the graph updating in real time as the data arrives.

Deleting the CloudFormation Stack

Once you've run through the stack and understand how the pieces of our reference architecture fit together, you can terminate the application resources created by the AWS CloudFormation stack. To do so, navigate to the AWS Management Console and select the AWS CloudFormation service. Select the stack that you created and then click Delete Stack. When AWS CloudFormation has finished removing all resources, the stack will be removed from the list.
Conclusion

In this whitepaper, we looked at how to implement real-time visualization of sliding-window analysis over streaming data. In actual scenarios, real-time streaming entails not only collecting data at scale but also processing, storing, and delivering the results in real time. This brings a number of challenges and calls for a decoupled architecture for streaming, processing, storage, and delivery. Using systems for each layer that can scale yet integrate easily with the other layers accommodates the ever-expanding data velocity of real-time streaming systems.

Amazon Kinesis also supports multiple reader applications, allowing you to build multiple paths for the same data, running at different latencies, from a single stream. Using Amazon Kinesis connectors, you can also deliver the same stream of data to other stores of your choice: Amazon S3, Amazon Redshift, and Amazon DynamoDB. Amazon S3 offers highly durable and available cloud storage for a variety of content, ranging from web applications to media files, and can be used to build a cost-effective archive by running an archiving application once every few hours. You can also use Amazon EMR clusters to read and process Amazon Kinesis streams directly, using familiar tools like Hive, Pig, and MapReduce to analyze your data.

2014, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Brochure More information from http://www.researchandmarkets.com/reports/3301786/ MEAN/Full Stack Web Development - Training Course Package Description: This course pack features a detailed exploration
Middleware- Driven Mobile Applications
Middleware- Driven Mobile Applications A motwin White Paper When Launching New Mobile Services, Middleware Offers the Fastest, Most Flexible Development Path for Sophisticated Apps 1 Executive Summary
Openbus Documentation
Openbus Documentation Release 1 Produban February 17, 2014 Contents i ii An open source architecture able to process the massive amount of events that occur in a banking IT Infraestructure. Contents:
Best Practices for Hadoop Data Analysis with Tableau
Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks
Simple Storage Service (S3)
Simple Storage Service (S3) Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers. Amazon S3 provides a simple web services interface that can be used
Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
TDAQ Analytics Dashboard
14 October 2010 ATL-DAQ-SLIDE-2010-397 TDAQ Analytics Dashboard A real time analytics web application Outline Messages in the ATLAS TDAQ infrastructure Importance of analysis A dashboard approach Architecture
Real-time Streaming Analysis for Hadoop and Flume. Aaron Kimball odiago, inc. OSCON Data 2011
Real-time Streaming Analysis for Hadoop and Flume Aaron Kimball odiago, inc. OSCON Data 2011 The plan Background: Flume introduction The need for online analytics Introducing FlumeBase Demo! FlumeBase
Scalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies [email protected] 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
GigaSpaces Real-Time Analytics for Big Data
GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence
BIG DATA ANALYTICS For REAL TIME SYSTEM
BIG DATA ANALYTICS For REAL TIME SYSTEM Where does big data come from? Big Data is often boiled down to three main varieties: Transactional data these include data from invoices, payment orders, storage
Visualizing a Neo4j Graph Database with KeyLines
Visualizing a Neo4j Graph Database with KeyLines Introduction 2! What is a graph database? 2! What is Neo4j? 2! Why visualize Neo4j? 3! Visualization Architecture 4! Benefits of the KeyLines/Neo4j architecture
Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15
Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH 2014-05-15 2014 Amazon.com, Inc. and its affiliates. All rights
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
Ad Hoc Analysis of Big Data Visualization
Ad Hoc Analysis of Big Data Visualization Dean Yao Director of Marketing Greg Harris Systems Engineer Follow us @Jinfonet #BigDataWebinar JReport Highlights Advanced, Embedded Data Visualization Platform:
White Paper. How Streaming Data Analytics Enables Real-Time Decisions
White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream
Monitor Your Key Performance Indicators using WSO2 Business Activity Monitor
Published on WSO2 Inc (http://wso2.com) Home > Stories > Monitor Your Key Performance Indicators using WSO2 Business Activity Monitor Monitor Your Key Performance Indicators using WSO2 Business Activity
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08
Processing of massive data: MapReduce. 2. Hadoop. New Trends In Distributed Systems MSc Software and Systems
Processing of massive data: MapReduce 2. Hadoop 1 MapReduce Implementations Google were the first that applied MapReduce for big data analysis Their idea was introduced in their seminal paper MapReduce:
Configuring user provisioning for Amazon Web Services (Amazon Specific)
Chapter 2 Configuring user provisioning for Amazon Web Services (Amazon Specific) Note If you re trying to configure provisioning for the Amazon Web Services: Amazon Specific + Provisioning app, you re
Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!
Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Streaming Analy.cs S S S Scale- up Database Data And Compute Grid
This course provides students with the knowledge and skills to develop ASP.NET MVC 4 web applications.
20486B: Developing ASP.NET MVC 4 Web Applications Course Overview This course provides students with the knowledge and skills to develop ASP.NET MVC 4 web applications. Course Introduction Course Introduction
Challenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
Big Data Visualization and Dashboards
Big Data Visualization and Dashboards Boney Pandya Marketing Manager Greg Harris Systems Engineer Follow us @Jinfonet #BigDataWebinar JReport Highlights Advanced, Embedded Data Visualization Platform:
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
Corporate Bill Analyzer
Corporate Bill Analyzer Product Description V 3.1 Contents Contents Introduction Platform Overview Core features Bill/Invoice presentment Corporate hierarchy support Billing Account hierarchy support Call
Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect
on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze
New Features for Sybase Mobile SDK and Runtime. Sybase Unwired Platform 2.1 ESD #2
New Features for Sybase Mobile SDK and Runtime Sybase Unwired Platform 2.1 ESD #2 DOCUMENT ID: DC60009-01-0212-02 LAST REVISED: March 2012 Copyright 2012 by Sybase, Inc. All rights reserved. This publication
DevOps Best Practices for Mobile Apps. Sanjeev Sharma IBM Software Group
DevOps Best Practices for Mobile Apps Sanjeev Sharma IBM Software Group Me 18 year in the software industry 15+ years he has been a solution architect with IBM Areas of work: o DevOps o Enterprise Architecture
ORACLE MOBILE SUITE. Complete Mobile Development Solution. Cross Device Solution. Shared Services Infrastructure for Mobility
ORACLE MOBILE SUITE COMPLETE MOBILE DEVELOPMENT AND DEPLOYMENT PLATFORM KEY FEATURES Productivity boosting mobile development framework Cross device/os deployment Lightweight and robust enterprise service
Agenda. Some Examples from Yahoo! Hadoop. Some Examples from Yahoo! Crawling. Cloud (data) management Ahmed Ali-Eldin. First part: Second part:
Cloud (data) management Ahmed Ali-Eldin First part: ZooKeeper (Yahoo!) Agenda A highly available, scalable, distributed, configuration, consensus, group membership, leader election, naming, and coordination
Introducing Storm 1 Core Storm concepts Topology design
Storm Applied brief contents 1 Introducing Storm 1 2 Core Storm concepts 12 3 Topology design 33 4 Creating robust topologies 76 5 Moving from local to remote topologies 102 6 Tuning in Storm 130 7 Resource
Developer Tutorial Version 1. 0 February 2015
Developer Tutorial Version 1. 0 Contents Introduction... 3 What is the Mapzania SDK?... 3 Features of Mapzania SDK... 4 Mapzania Applications... 5 Architecture... 6 Front-end application components...
Load and Performance Load Testing. RadView Software October 2015 www.radview.com
Load and Performance Load Testing RadView Software October 2015 www.radview.com Contents Introduction... 3 Key Components and Architecture... 4 Creating Load Tests... 5 Mobile Load Testing... 9 Test Execution...
Search and Real-Time Analytics on Big Data
Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its
What's New in SAS Data Management
Paper SAS034-2014 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC; Mike Frost, SAS Institute Inc., Cary, NC, Mike Ames, SAS Institute Inc., Cary ABSTRACT The latest releases
EXECUTIVE SUMMARY CONTENTS. 1. Summary 2. Objectives 3. Methodology and Approach 4. Results 5. Next Steps 6. Glossary 7. Appendix. 1.
CONTENTS 1. Summary 2. Objectives 3. Methodology and Approach 4. Results 5. Next Steps 6. Glossary 7. Appendix EXECUTIVE SUMMARY Tenzing Managed IT services has recently partnered with Amazon Web Services
DLT Solutions and Amazon Web Services
DLT Solutions and Amazon Web Services For a seamless, cost-effective migration to the cloud PREMIER CONSULTING PARTNER DLT Solutions 2411 Dulles Corner Park, Suite 800 Herndon, VA 20171 Duane Thorpe Phone:
<Insert Picture Here> Oracle Web Cache 11g Overview
Oracle Web Cache 11g Overview Oracle Web Cache Oracle Web Cache is a secure reverse proxy cache and a compression engine deployed between Browser and HTTP server Browser and Content
QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering
QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering June 2014 Page 1 Contents Introduction... 3 About Amazon Web Services (AWS)... 3 About Amazon Redshift... 3 QlikView on AWS...
Collaborative Open Market to Place Objects at your Service
Collaborative Open Market to Place Objects at your Service D6.2.1 Developer SDK First Version D6.2.2 Developer IDE First Version D6.3.1 Cross-platform GUI for end-user Fist Version Project Acronym Project
COURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
Visualizing an OrientDB Graph Database with KeyLines
Visualizing an OrientDB Graph Database with KeyLines Visualizing an OrientDB Graph Database with KeyLines 1! Introduction 2! What is a graph database? 2! What is OrientDB? 2! Why visualize OrientDB? 3!
BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS
WHITEPAPER BASHO DATA PLATFORM BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS INTRODUCTION Big Data applications and the Internet of Things (IoT) are changing and often improving our
Building Scalable Applications Using Microsoft Technologies
Building Scalable Applications Using Microsoft Technologies Padma Krishnan Senior Manager Introduction CIOs lay great emphasis on application scalability and performance and rightly so. As business grows,
Monitoring the Real End User Experience
An AppDynamics Business White Paper HOW MUCH REVENUE DOES IT GENERATE? Monitoring the Real End User Experience Web application performance is fundamentally associated in the mind of the end user; with
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
Integrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE
WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE Contents 1. Pattern Overview... 3 Features 3 Getting started with the Web Application Pattern... 3 Accepting the Web Application Pattern license agreement...
Wisdom from Crowds of Machines
Wisdom from Crowds of Machines Analytics and Big Data Summit September 19, 2013 Chetan Conikee Irfan Ahmad About Us CloudPhysics' mission is to discover the underlying principles that govern systems behavior
Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
XpoLog Center Suite Log Management & Analysis platform
XpoLog Center Suite Log Management & Analysis platform Summary: 1. End to End data management collects and indexes data in any format from any machine / device in the environment. 2. Logs Monitoring -
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
