Building your Big Data Architecture on Amazon Web Services




Building your Big Data Architecture on Amazon Web Services Abhishek Sinha @abysinha sinhaar@amazon.com

AWS Services: Deployment & Administration; Application Services; Compute; Storage; Database; Networking; AWS Global Infrastructure

AWS Global Infrastructure: 9 Regions, 25 Availability Zones, continuous expansion

A $5.2B retail business, 7,800 employees, a whole lot of servers. Every day, AWS adds enough server capacity to power that whole $5B enterprise.

Powering the Most Popular Internet Businesses

We have partners and technologies ready to help

Solving Problems for Organizations Around the World

Value proposition of the AWS cloud
No upfront investment: replace capital expenditure with variable expense
Speed and agility: infrastructure in minutes, not weeks
Low ongoing cost: customers leverage our economies of scale (37 price reductions)
Focus on business: not undifferentiated heavy lifting
Flexible capacity: no need to guess capacity requirements and overprovision
Global reach: go global in minutes and reach a global audience

Gartner Magic Quadrant for Cloud Infrastructure as a Service, Lydia Leong, Douglas Toombs, Bob Gill, Gregor Petri, Tiny Haynes, August 19, 2013. This Magic Quadrant graphic was published by Gartner, Inc. as part of a larger research note and should be evaluated in the context of the entire report. The Gartner report is available upon request from Steven Armstrong (asteven@amazon.com). Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

An engineer's definition: when your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share them

Generation Collection & storage Analytics & computation Collaboration & sharing

Generation: lower cost, higher throughput
Collection & storage, Analytics & computation, Collaboration & sharing: highly constrained

Data volume: generated data vs. data available for analysis
Sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012-2016 Forecast and 2011 Vendor Shares"

Amazon Web Services helps remove constraints

Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints

More than 25 million streaming members
50 billion events per day
30 million plays every day
2 billion hours of video in 3 months
4 million ratings per day
3 million searches
Device, location, time, day, week, etc.
Social data

10 TB of streaming data per day

Who buys video games?

Per day: 3.5 billion records, 13 TB of click stream logs, 71 million unique cookies

Today

Big Data tools Elastic MapReduce and Redshift

How does EMR work?
Choose: Hadoop distribution, number of nodes, types of nodes, custom configs, Hive/Pig/etc.
Put the data into S3
Launch the cluster using the EMR console, CLI, SDK, or APIs
Get the output from S3
You can also store everything in HDFS
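The choices listed above (number of nodes, node type, Hive, S3 input and output) can be sketched as a request for the SDK route. This is a minimal illustration, not the presenter's setup: the dictionary follows the shape of the arguments that boto3's `run_job_flow` accepts, while the bucket paths, cluster name, instance type, release label and the helper `build_emr_request` are all assumptions made for the example. The block only builds the request; nothing is sent to AWS.

```python
from typing import Any, Dict


def build_emr_request(input_uri: str, output_uri: str, script_uri: str,
                      node_count: int,
                      node_type: str = "m5.xlarge") -> Dict[str, Any]:
    """Assemble the cluster choices into one request dictionary.

    All concrete values (names, instance type, release label, role names)
    are illustrative assumptions, not values from the slides.
    """
    return {
        "Name": "example-cluster",                 # hypothetical name
        "ReleaseLabel": "emr-6.15.0",              # assumed release
        "Applications": [{"Name": "Hive"}],        # the "Hive/Pig/etc." choice
        "Instances": {
            "MasterInstanceType": node_type,       # types of nodes
            "SlaveInstanceType": node_type,
            "InstanceCount": node_count,           # number of nodes
            "KeepJobFlowAliveWhenNoSteps": False,  # shut down when done
        },
        "Steps": [{
            "Name": "run-hive-script",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script", "--args",
                         "-f", script_uri,
                         "-d", f"INPUT={input_uri}",    # data read from S3
                         "-d", f"OUTPUT={output_uri}"], # results written to S3
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_emr_request(
    input_uri="s3://example-bucket/input/",     # hypothetical bucket
    output_uri="s3://example-bucket/output/",
    script_uri="s3://example-bucket/scripts/query.q",
    node_count=10,
)
# With boto3 installed and credentials configured, the launch itself
# would be roughly: boto3.client("emr").run_job_flow(**request)
print(request["Instances"]["InstanceCount"], "nodes requested")
```

Keeping the request as plain data makes the same choices easy to reuse from the console, CLI or API routes mentioned on the slide.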

What can you run on EMR?

Resize nodes: you can easily add and remove nodes

A 10-node cluster running for 10 hours costs exactly the same as a 100-node cluster running for 1 hour
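The equivalence is plain node-hour arithmetic, sketched below. The hourly price is a made-up placeholder, and the sketch assumes every node is the same type and is billed per node-hour with no partial-hour effects.

```python
def cluster_cost(nodes: int, hours: float, price_per_node_hour: float) -> float:
    """Cost of a cluster billed per node-hour."""
    return nodes * hours * price_per_node_hour


PRICE = 0.50  # hypothetical price per node-hour, in US$

small_and_slow = cluster_cost(nodes=10, hours=10, price_per_node_hour=PRICE)
large_and_fast = cluster_cost(nodes=100, hours=1, price_per_node_hour=PRICE)

# Both runs consume 100 node-hours, so the bill is identical.
print(small_and_slow, large_and_fast)
```

The practical consequence is that, for work that parallelizes well, the larger cluster returns the answer ten times sooner at no extra cost.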

Big Data tools Elastic MapReduce and Redshift

Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud

Parallelize and distribute everything: MPP, load, query, resize, backup, restore
Dramatically reduce I/O: direct-attached storage, large data block sizes, column data store, data compression, zone maps
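To make the I/O reduction concrete, here is a toy sketch of the zone-map idea: each block of a column keeps its minimum and maximum value, so a scan can skip any block whose range cannot contain the value being searched for. This illustrates the general technique only; the `Block` class and `scan_equal` function are hypothetical and say nothing about how Redshift implements it internally.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """One block of a single column, with its zone map (min and max)."""
    values: List[int]
    lo: int = field(init=False)
    hi: int = field(init=False)

    def __post_init__(self) -> None:
        self.lo = min(self.values)
        self.hi = max(self.values)


def scan_equal(blocks: List[Block], target: int) -> Tuple[List[int], int]:
    """Return matching values and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        if target < block.lo or target > block.hi:
            continue  # zone map rules this block out; no read needed
        blocks_read += 1
        matches.extend(v for v in block.values if v == target)
    return matches, blocks_read


column = [Block([1, 2, 3]), Block([4, 5, 6]), Block([7, 8, 9])]
found, read = scan_equal(column, target=5)
print(found, "found after reading", read, "of", len(column), "blocks")
```

The benefit depends on how well the data is ordered: when values are scattered across blocks, the ranges overlap and fewer blocks can be skipped.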

Protect operations: Redshift data is encrypted, continuously backed up to S3, automatic node recovery, transparent disk failure
Simplify provisioning: create a cluster in minutes, automatic OS and software patching, scale up to 1.6PB with a few clicks and no downtime

Start small and grow big
Extra Large Node (XL): 3 spindles, 2TB, 15GiB RAM, 2 virtual cores, 10GigE; 1 node (2TB) or 2-32 node cluster (64TB)
8 Extra Large Node (8XL): 24 spindles, 16TB, 120GiB RAM, 16 virtual cores, 10GigE; 2-100 node cluster (1.6PB)
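The node limits above translate directly into a sizing calculation. The sketch below uses only the per-node storage and cluster-size limits quoted on the slide; the `NodeType` class and `nodes_needed` helper are hypothetical, and the estimate ignores compression, replication and working space, all of which change real capacity needs.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeType:
    name: str
    storage_tb: int   # storage per node, from the slide
    min_nodes: int
    max_nodes: int


XL = NodeType("XL", storage_tb=2, min_nodes=1, max_nodes=32)
EIGHT_XL = NodeType("8XL", storage_tb=16, min_nodes=2, max_nodes=100)


def nodes_needed(data_tb: float, node: NodeType) -> Optional[int]:
    """Smallest node count that holds data_tb, or None if it cannot fit."""
    count = max(node.min_nodes, math.ceil(data_tb / node.storage_tb))
    return count if count <= node.max_nodes else None


for size_tb in (10, 100, 1600):
    print(size_tb, "TB ->",
          "XL:", nodes_needed(size_tb, XL),
          "8XL:", nodes_needed(size_tb, EIGHT_XL))
```

The maximums reproduce the slide's figures: 32 XL nodes give 64TB and 100 8XL nodes give 1,600TB, or 1.6PB.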

Amazon Redshift summary:
Easy to provision and scale
No upfront costs, pay as you go
High performance at a low price
Open and flexible with support for popular BI tools

Price per hour for an XL node (US$), Sydney / Singapore / Tokyo:
On-Demand: $1.25
1 Year Reservation: $0.75
3 Year Reservation: $0.45

For example, 1 XL node reserved for 3 years: $0.45 x number of hours in a month = roughly $325 to $335 per month, depending on the length of the month. A 1 XL node cluster gives you 2 cores, 16 GB RAM and 2 TB disk, plus 2 TB of storage in S3 for backups and snapshots.
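The monthly figure is the hourly rate multiplied by the hours in a month, and the same arithmetic shows how much each reservation saves against on-demand. The rates below are the ones quoted for the XL node; the dictionary keys and the `monthly_cost` helper are illustrative names, and treating the reserved rate as a flat hourly price is an assumption. For a 31-day month the 3-year rate lands in the neighborhood of the monthly figure quoted above.

```python
HOURLY_RATE_USD = {
    "on_demand": 1.25,
    "reserved_1yr": 0.75,
    "reserved_3yr": 0.45,
}


def monthly_cost(rate_per_hour: float, days_in_month: int = 31) -> float:
    """Cost of running one node around the clock for a month."""
    return round(rate_per_hour * 24 * days_in_month, 2)


def saving_vs_on_demand(option: str) -> float:
    """Fractional saving of a pricing option relative to on-demand."""
    return round(1 - HOURLY_RATE_USD[option] / HOURLY_RATE_USD["on_demand"], 2)


for option, rate in HOURLY_RATE_USD.items():
    print(option, monthly_cost(rate), saving_vs_on_demand(option))
```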

Big Data + Cloud = Awesome Combination

Big data:
Potentially massive datasets
Iterative, experimental style of data manipulation and analysis
Frequently not a steady-state workload; peaks and valleys
Data is a combination of structured and unstructured data in many formats

AWS Cloud:
Massive, virtually unlimited capacity
Iterative, experimental style of infrastructure deployment/usage
At its most efficient with highly variable workloads
Tools for managing structured and unstructured data

THANK YOU
Please come visit us at the Solution Architects Corner at the AWS booth
sinhaar@amazon.com @abysinha