Big Data for everyone: Democratizing big data with the cloud. Steffen Krause, Technical Evangelist, @AWS_Aktuell, skrause@amazon.de



Transcription:

Does this Data make me look big?

Overview: Designing big data solutions in the cloud. Not the only way to do it (but one that we have seen).

Big Data with AWS (Storage, Big Data, Compute): Challenges start at relatively small volumes. Scale shown: 100 GB to 1,000 PB.

Big Data with AWS (Storage, Big Data, Compute): When data sets and data analytics need to scale to the point that you have to start innovating around how to collect, store, organize, analyze and share it.

Invest in data centers?

Generation Collection & storage Analytics & computation Collaboration & sharing

Data has gravity, and inertia at volume: it is easier to move applications to the data. (Diagram labels: Storage, Big Data, Compute, App, Data.) Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/

S3 as a single source of truth. Courtesy: http://techblog.netflix.com/2013/01/hadoop-platform-as-service-incloud.html

Generation Collection & storage Analytics & computation Collaboration & sharing

Hadoop-based analysis. Components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR.

Amazon Elastic MapReduce (EMR)? EMR is Hadoop in the cloud.

1 instance for 100 hours = 100 instances for 1 hour

Small instance = $6

1 instance for 1000 hours = 1000 instances for 1 hour

Small instance = $60
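The arithmetic behind these two slides can be written out as a short sketch. The hourly rate of $0.06 is an assumption derived from the slide's own figures ($6 for 100 instance-hours); it is not a current AWS price, and `cluster_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Assumption: hourly rate implied by the slide ($6 for 100 instance-hours).
# This is not a current AWS price.
HOURLY_RATE_USD = Decimal("0.06")


def cluster_cost(instances: int, hours: int,
                 rate: Decimal = HOURLY_RATE_USD) -> Decimal:
    """Cost of running `instances` machines for `hours` hours each."""
    return instances * hours * rate


if __name__ == "__main__":
    # Same instance-hours, same cost: only the wall-clock time differs.
    for instances, hours in [(1, 100), (100, 1), (1, 1000), (1000, 1)]:
        cost = cluster_cost(instances, hours)
        print(f"{instances:>4} instance(s) x {hours:>4} h = ${cost:.2f}")
```

Under this simple per-hour model, cost depends only on the product of instances and hours, which is why a job can finish 100 times sooner for the same bill.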

When you turn off your cloud resources, you actually stop paying for them

SQL-based processing. Components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR (pre-processing framework), Amazon Redshift (petabyte-scale columnar data warehouse).

Generation Collection & storage Analytics & computation Collaboration & sharing

Sharing results and visualizations. Components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR, Amazon Redshift, business intelligence tools.

The complete architecture. Components: Amazon SQS, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon S3, Amazon EMR, Amazon Redshift, Amazon Data Pipeline, business intelligence tools, visualization tools, GIS tools, GIS tools on Hadoop.

Use cases

Lesson 1: Don't leave your Amazon account logged in at home. Lesson 2: Use the data you have to drive proactive processes.

Analyzing credit risk requires 5 million simulations. On AWS, the simulation time was reduced from 23 hours to 20 minutes.

3,000 cores for risk analysis (Monte Carlo); 300 cores during the weekend. [Chart: CPU cores (300 to 3,000) by day of week, Wed through Tue]
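Monte Carlo runs scale across many cores because each simulation is independent of the others. The following is a minimal sketch of that idea, not the customer's actual model: the portfolio (100 loans, 2% default probability, unit exposure) and all function names are hypothetical, and the chunks run sequentially here where a real deployment would send each one to its own core or instance.

```python
import random


def simulate_chunk(n_sims: int, seed: int, n_loans: int = 100,
                   default_prob: float = 0.02,
                   exposure: float = 1.0) -> list[float]:
    """Run n_sims independent portfolio-loss simulations with a private RNG."""
    rng = random.Random(seed)  # one RNG per chunk, so chunks do not interact
    losses = []
    for _ in range(n_sims):
        defaults = sum(1 for _ in range(n_loans) if rng.random() < default_prob)
        losses.append(defaults * exposure)
    return losses


def run_simulation(total_sims: int, n_workers: int) -> list[float]:
    """Split total_sims into n_workers chunks and collect all losses."""
    base, extra = divmod(total_sims, n_workers)
    losses: list[float] = []
    for worker in range(n_workers):
        chunk_size = base + (1 if worker < extra else 0)
        # In a cluster, this call would be dispatched to a separate core.
        losses.extend(simulate_chunk(chunk_size, seed=worker))
    return losses


def value_at_risk(losses: list[float], level: float = 0.99) -> float:
    """Empirical loss quantile at the given confidence level."""
    ordered = sorted(losses)
    index = min(len(ordered) - 1, int(level * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    losses = run_simulation(total_sims=10_000, n_workers=8)
    print("mean loss:", sum(losses) / len(losses))
    print("99% VaR:", value_at_risk(losses))
```

For scale, going from 23 hours (1,380 minutes) to 20 minutes is a 69-fold reduction in wall-clock time.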

In 60 minutes, Channel 4 can analyze and model in-session data to deliver highly targeted ads to viewers before a program ends. To get closer to growing video-on-demand (VOD) audiences and match them with advertisers, Channel 4 chose a cloud-based solution to help make sense of and monetize its unprecedented volumes of platform data.

Features powered by Amazon Elastic MapReduce: People Who Viewed This Also Viewed; review highlights; autocomplete as you type on search; search spelling suggestions; top searches; ads. 200 Elastic MapReduce jobs per day, processing 3 TB of data.

SkillPages customer use case: Everyone needs skilled people. At home. At work. In life. Repeatedly.

Data Architecture. Diagram labels: Web Servers; User Action Trace Events (Join via Facebook, Add a Skill Page, Invite Friends); Raw Data; Raw Events; Amazon S3; EMR Hive Scripts; Process Content; Aggregated Data; Get Data; Amazon Redshift; Data Analyst; Excel; Tableau; Internal Web. Process Content: Process log files with regular expressions to parse out the info we need. Process cookies into useful searchable data such as Session, UserId, API Security token. Filter surplus info like internal Varnish logging.
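The slide attributes this processing to Hive scripts on EMR; the sketch below shows the same three steps (regex parsing, cookie extraction, filtering) in plain Python instead. The log line format, the cookie key `ApiToken`, and the "varnish" marker are all invented for illustration, since the slide does not show the real log layout.

```python
import re

# Hypothetical log format, invented for illustration only:
#   <timestamp> <METHOD> <path> cookie="Key=value; Key=value"
LINE_RE = re.compile(
    r'^(?P<timestamp>\S+) (?P<method>[A-Z]+) (?P<path>\S+) '
    r'cookie="(?P<cookie>[^"]*)"$'
)
COOKIE_PAIR_RE = re.compile(r'(\w+)=([^;\s]+)')


def parse_line(line: str):
    """Return a dict of searchable fields, or None for lines to discard."""
    line = line.strip()
    # Assumption: internal Varnish logging can be spotted by this marker.
    if "varnish" in line.lower():
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None  # line does not follow the expected layout
    record = match.groupdict()
    cookies = dict(COOKIE_PAIR_RE.findall(record.pop("cookie")))
    record["session"] = cookies.get("Session")
    record["user_id"] = cookies.get("UserId")
    record["api_token"] = cookies.get("ApiToken")
    return record


if __name__ == "__main__":
    sample = [
        '2013-05-01T12:00:00Z GET /skills/add '
        'cookie="Session=abc123; UserId=42; ApiToken=t0k3n"',
        '2013-05-01T12:00:01Z VARNISH internal health check',
    ]
    for raw in sample:
        print(parse_line(raw))
```

Returning `None` for unwanted lines keeps filtering and parsing in one pass, so a caller can simply skip empty results before writing aggregated data.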

Foursquare: "We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution. With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It's very flexible." Jon Hoffman, Software Engineer, Foursquare. [Charts: Gender (Female, Male); Age; "When do people go to a place?" for Gorilla Coffee, Gray's Papaya, Amorino]

Stack analysis and sharing. Application Stack: Scala/Liftweb, Scala, Mongo/Postgres/Flat Files; WWW Machines, API Machines, Databases; Application code, Batch Jobs; mongoexport, postgres dump; Logs, Flume. Data Stack: Amazon S3 (Database Dumps, Log Files); Hadoop, Elastic MapReduce; Hive/Ruby/Mahout; MapReduce Jobs; Analytics Dashboard.

Everything that was a limited resource is now a programmable resource

Resources
Hadoop Technology and Use Cases: http://www.powerof60.com/
http://aws.amazon.com/de
Start with the Free Tier: http://aws.amazon.com/de/free/
US$25 in credits for new German customers: http://aws.amazon.com/de/campaigns/account/
Twitter: @AWS_Aktuell
Facebook: http://www.facebook.com/awsaktuell
Webinars: http://aws.amazon.com/de/about-aws/events/