Big Data on AWS. Services Overview. Bernie Nallamotu, Principal Solutions Architect




So what is it?
Big data is when your data sets become so large that you have to start innovating around how to collect, store, organize, analyze, and share them.

Challenges start at relatively small volumes
Scale shown: 100 GB to 1,000 PB

Unconstrained data growth
Scale shown: GB, TB, PB, EB, ZB
95% of the 1.2 zettabytes of data in the digital universe is unstructured.
70% of this is user-generated content.
Unstructured data growth is explosive, with estimates of compound annual growth rate (CAGR) at 62% from 2008 to 2012.
Source: IDC
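As a rough worked example of the growth figure above, the sketch below compounds the 62% CAGR over the period 2008 to 2012. It assumes four full compounding years, which the slide does not state, and the function name is purely illustrative.

```python
def growth_factor(cagr: float, years: int) -> float:
    """Total multiplicative growth after compounding `cagr` once per year."""
    return (1.0 + cagr) ** years


# 62% CAGR from the slide; four compounding periods is an assumption.
factor = growth_factor(0.62, 2012 - 2008)
print(f"Unstructured data grew roughly {factor:.2f}x over the period")
```

Under that assumption the volume of unstructured data grows nearly sevenfold in four years.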

Where does it come from?
Web sites: blogs, reviews, emails, pictures
Social graphs: Facebook, LinkedIn, contacts
Application server logs: web sites, games
Sensor data: weather, water, smart grids
Images/videos: traffic, security cameras
Twitter: 50m tweets/day, 1,400% growth/year

Why AWS and big data?
Storage and innovation: Amazon DynamoDB, Amazon Redshift, Amazon S3, HPC, Spot, EMR

Services
Amazon EMR (Elastic MapReduce): hosted Hadoop framework
AWS Data Pipeline: move data among AWS services and on-premises data sources
Amazon Redshift: petabyte-scale data warehouse service
AWS Worldwide Public Sector Team

How do you get your slice of it?
AWS Direct Connect: dedicated low-latency bandwidth
AWS Import/Export: physical media shipping
Queuing: highly scalable event buffering
AWS Storage Gateway: sync local storage to the cloud

Where do you put your slice of it?
Amazon Relational Database Service (RDS): fully managed database (MySQL, Oracle, MS SQL Server, PostgreSQL)
Amazon SimpleDB: NoSQL, schema-less, smaller datasets
Amazon DynamoDB: NoSQL, schema-less, provisioned-throughput database
Amazon S3: object datastore, up to 5 TB per object, 99.999999999% durability

Where do you put your slice of it? (continued)
Amazon Glacier: long-term cold storage, from $0.01 per GB/month, 99.999999999% durability
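To make the price and durability figures above concrete, here is a small back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000 GB), treats the "from $0.01" price as a flat rate, and reads 99.999999999% durability as a per-object annual loss probability of 1e-11. The function names are hypothetical, not part of any AWS API.

```python
GB_PER_TB = 1_000  # decimal units; an assumption, not stated on the slide


def glacier_monthly_cost_usd(size_tb: float, usd_per_gb_month: float = 0.01) -> float:
    """Monthly storage cost at a flat per-GB price (the slide's starting price)."""
    return size_tb * GB_PER_TB * usd_per_gb_month


def expected_objects_lost_per_year(num_objects: int,
                                   annual_loss_probability: float = 1e-11) -> float:
    """Expected number of objects lost per year at eleven nines of durability."""
    return num_objects * annual_loss_probability


cost = glacier_monthly_cost_usd(1_000)            # 1 PB of archives
lost = expected_objects_lost_per_year(10_000_000)  # ten million objects
print(f"1 PB in cold storage: about ${cost:,.0f} per month")
print(f"Expected loss: one object every {1 / lost:,.0f} years")
```

Under these assumptions, a petabyte of archives costs on the order of $10,000 per month, and ten million objects would see an expected loss of one object every 10,000 years.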

How quickly do you need to read it?
Amazon DynamoDB (single-digit ms): social-scale applications, provisioned throughput performance, flexible consistency models
Amazon S3 (10s-100s of ms): any object, any app, 99.999999999% durability, objects up to 5 TB in size
Amazon Glacier (<5 hours): media and asset archives, extremely low cost, S3 levels of durability
Performance, scale, price
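The latency bands above can be read as a simple selection rule. The sketch below is one possible way to encode it: the numeric bounds are an interpretation of the slide's ranges (under 10 ms, under 1 s, under 5 hours), and the idea of preferring the slowest tier that still meets the budget rests on the assumption that slower tiers are cheaper, which the slide only states explicitly for Glacier.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    worst_case_latency_s: float  # assumed upper bound for a read


# Ordered fastest to slowest; bounds are a reading of the slide, not AWS guarantees.
TIERS = [
    Tier("Amazon DynamoDB", 0.010),     # single-digit ms
    Tier("Amazon S3", 1.0),             # 10s-100s of ms
    Tier("Amazon Glacier", 5 * 3600),   # under 5 hours
]


def slowest_acceptable_tier(max_latency_s: float) -> str:
    """Return the slowest tier whose assumed worst-case latency fits the budget."""
    candidates = [t for t in TIERS if t.worst_case_latency_s <= max_latency_s]
    if not candidates:
        raise ValueError("no tier is fast enough for this latency budget")
    return candidates[-1].name


print(slowest_acceptable_tier(0.05))    # interactive lookup
print(slowest_acceptable_tier(2.0))     # object fetch
print(slowest_acceptable_tier(86_400))  # archive restore within a day
```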

Operate at any scale
Unlimited data
Performance, scale, price

Data has gravity, and inertia at volume, so it is easier to move applications to the data.
Diagram: apps are drawn toward data.
Source: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
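One way to see why data has inertia at volume is to estimate how long a bulk transfer takes. The sketch below is a simple bandwidth calculation under stated assumptions: decimal units, a fully utilized link by default, and no protocol overhead. The 1 Gbps link speed is a hypothetical example, not a figure from the slides.

```python
GB = 10**9   # bytes, decimal units
PB = 10**15  # bytes, decimal units


def transfer_days(size_bytes: float, link_bits_per_s: float,
                  utilization: float = 1.0) -> float:
    """Days needed to push `size_bytes` over a link at the given utilization."""
    seconds = size_bytes * 8 / (link_bits_per_s * utilization)
    return seconds / 86_400


one_gbps = 1e9  # hypothetical dedicated link
print(f"100 GB over 1 Gbps: {transfer_days(100 * GB, one_gbps) * 24 * 60:.1f} minutes")
print(f"1 PB over 1 Gbps:   {transfer_days(1 * PB, one_gbps):.1f} days")
```

At the small end of the range the transfer takes minutes; at the petabyte end it takes months, which is the case for moving the application instead.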

Bring compute capacity to the data
Very large dataset seeks strong and consistent compute for short-term relationship, possibly longer.

Flexible compute resources, on demand
Amazon Elastic Compute Cloud (EC2)
Basic unit of compute capacity
Range of CPU, memory and local disk options
27 instance types available, from micro through cluster compute to SSD-backed
Vertical scaling
From $0.02/hr

Feature            Details
Flexible           Run Windows or Linux distributions
Scalable           Wide range of instance types from micro to cluster compute
Machine images     Configurations can be saved as machine images (AMIs) from which new instances can be created
Full control       Full root or administrator rights
VM Import/Export   Import and export VM images to transfer configurations in and out of EC2
Monitoring         Publishes metrics to CloudWatch
Inexpensive        On-demand, Reserved and Spot instance types
Secure             Full firewall control via Security Groups

Elastic capacity as you need it
Usage patterns: on and off, fast growth, variable peaks, predictable peaks
With fixed capacity, each pattern produces WASTE (capacity above demand) or CUSTOMER DISSATISFACTION (demand above capacity).
Chart: capacity over time, comparing traditional IT capacity, elastic cloud capacity, and your IT needs
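The waste and dissatisfaction labels above can be quantified with a toy calculation. The sketch below compares a fixed capacity against a demand curve; the hourly demand numbers are invented for illustration, and the function name is hypothetical.

```python
def fixed_capacity_report(demand, capacity):
    """Return (wasted, unmet) capacity units summed over all time steps."""
    wasted = sum(max(capacity - d, 0) for d in demand)  # idle capacity: waste
    unmet = sum(max(d - capacity, 0) for d in demand)   # shortfall: unhappy customers
    return wasted, unmet


# Hypothetical hourly demand, in instances, with one sharp peak.
demand = [10, 12, 40, 90, 35, 11]

wasted, unmet = fixed_capacity_report(demand, capacity=40)
print(f"Fixed at 40 instances: {wasted} instance-hours wasted, {unmet} unmet")

# Elastic capacity tracks demand hour by hour, so both figures drop to zero.
elastic = [fixed_capacity_report([d], capacity=d) for d in demand]
print("Elastic:", sum(w for w, _ in elastic), "wasted,", sum(u for _, u in elastic), "unmet")
```

Raising the fixed capacity to cover the peak removes the shortfall but increases the waste, which is the trade-off the chart illustrates.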

From one instance to thousands
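Scaling from one instance to thousands changes how long a job takes, not necessarily what it costs. The sketch below illustrates this using the slide's starting price of $0.02/hr, under the assumptions that the workload parallelizes perfectly, that billing is per whole instance-hour, and that every instance costs the same. The `plan` function is hypothetical.

```python
def plan(instance_hours_needed: int, instances: int, cents_per_hour: int = 2):
    """Return (wall_clock_hours, cost_in_cents) for a perfectly parallel job."""
    wall_clock_hours = -(-instance_hours_needed // instances)  # ceiling division
    cost_cents = instances * wall_clock_hours * cents_per_hour
    return wall_clock_hours, cost_cents


for n in (1, 1000):
    hours, cents = plan(1000, n)
    print(f"{n:>4} instance(s): {hours:>4} hour(s), ${cents / 100:.2f}")
```

Both plans cost $20.00 here; the thousand-instance plan simply finishes in one hour instead of a thousand. Instance counts that do not divide the work evenly pay for some idle time under the hourly-billing assumption.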

Why AWS and big data?
Storage and innovation: DynamoDB, S3, Redshift, HPC, Spot, EMR

Amazon EMR: Elastic MapReduce

Amazon Elastic MapReduce
A key tool in the toolbox to help with big data challenges
Makes possible analytics processes that were previously not feasible
Cost effective when leveraged with the EC2 Spot market
Broad ecosystem of tools to handle specific use cases

What is EMR?
Hadoop-as-a-service
MapReduce engine
Integrated with tools
Massively parallel
Integrated with AWS services
Cost-effective AWS wrapper

HDFS: reliable storage
MapReduce: data analysis

One EC2 instance: input file -> map -> reduce -> output file
Many EC2 instances in parallel: each instance runs input file -> map -> reduce -> output file
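The per-instance pipeline above can be imitated on a single machine. In the sketch below, worker threads stand in for EC2 instances, each one maps and reduces its own input, and the per-worker outputs are merged at the end. Word counting is a stand-in workload chosen for illustration, and the input data is invented; this is not how EMR or Hadoop is implemented internally.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_reduce_one_file(lines):
    """Process one input: map each line to words, reduce to per-word counts."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())  # map: line -> words; reduce: sum per word
    return counts


# Hypothetical input files, one per worker.
input_files = [["a b a"], ["b c"], ["a c c"]]

# Each thread plays the role of one EC2 instance working on its own file.
with ThreadPoolExecutor(max_workers=len(input_files)) as pool:
    outputs = list(pool.map(map_reduce_one_file, input_files))

# Combine the per-worker output files into one result.
total = sum(outputs, Counter())
print(dict(total))
```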

Map? Reduce?

Input:
Person    Start     End
Bob       00:44:48  00:45:11
Charlie   02:16:02  02:16:18
Charlie   11:16:59  11:17:17
Charlie   11:17:24  11:17:38
Bob       11:23:10  11:23:25
Alice     16:26:46  16:26:54
David     17:20:28  17:20:45
Alice     18:16:53  18:17:00
Charlie   19:33:44  19:33:59
Bob       21:13:32  21:13:43
David     22:36:22  22:36:34
Alice     23:42:01  23:42:11

After map:
Person    Duration
Bob       23
Charlie   16
Charlie   18
Charlie   14
Bob       15
Alice     8
David     17
Alice     7
Charlie   15
Bob       11
David     12
Alice     10

After reduce:
Person    Total
Alice     25
Bob       49
Charlie   63
David     29
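The example above can be reproduced in a few lines. The sketch below is a plain-Python illustration of the same map and reduce steps, not Hadoop or EMR code: the map step turns each record into a (person, duration) pair and the reduce step sums durations per person. It assumes the durations are in seconds, which the slide does not state, and the function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime

# Records copied from the slide: (person, start, end).
records = [
    ("Bob", "00:44:48", "00:45:11"), ("Charlie", "02:16:02", "02:16:18"),
    ("Charlie", "11:16:59", "11:17:17"), ("Charlie", "11:17:24", "11:17:38"),
    ("Bob", "11:23:10", "11:23:25"), ("Alice", "16:26:46", "16:26:54"),
    ("David", "17:20:28", "17:20:45"), ("Alice", "18:16:53", "18:17:00"),
    ("Charlie", "19:33:44", "19:33:59"), ("Bob", "21:13:32", "21:13:43"),
    ("David", "22:36:22", "22:36:34"), ("Alice", "23:42:01", "23:42:11"),
]

TIME_FORMAT = "%H:%M:%S"


def map_duration(record):
    """Map step: one input record -> (person, duration in seconds)."""
    person, start, end = record
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return person, int(delta.total_seconds())


def reduce_totals(pairs):
    """Reduce step: sum the durations for each person."""
    totals = defaultdict(int)
    for person, duration in pairs:
        totals[person] += duration
    return dict(totals)


mapped = [map_duration(r) for r in records]
totals = reduce_totals(mapped)
for person in sorted(totals):
    print(person, totals[person])
```

Because each record is mapped independently, the map step can be spread across many machines, and only the much smaller (person, duration) pairs need to be brought together for the reduce step.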

AWS Elastic MapReduce Architecture

Architecture diagram (built up over several slides):
Amazon EMR, HDFS
Analytics languages: Pig
Data management
Connected services: Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Data Pipeline

Useful Resources & Links
AWS Big Data: http://aws.amazon.com/big-data
AWS HPC: http://aws.amazon.com/hpc-applications
Architecture Center: http://aws.amazon.com/architecture
Documentation: http://aws.amazon.com/documentation
Security Center: http://aws.amazon.com/security
Whitepapers: http://aws.amazon.com/whitepapers
Resources: http://aws.amazon.com/resources
Case Studies: http://aws.amazon.com/solutions/case-studies
Solution Providers: http://aws.amazon.com/solutions/global-solution-providers
Calculator: http://calculator.s3.amazonaws.com/calc5.html
TCO Calculator: http://aws.amazon.com/tco-calculator
AWS Blog: http://aws.typepad.com
The Power of 60: http://www.powerof60.com

Thank you! Tim Bixler Manager, Federal Solutions Architecture tbixler@amazon.com