Instance Types. Standard Instances:


Instance Types
Standard Instances (1 EC2 Compute Unit, or ECU, provides the CPU capacity of a 1.0-1.2 GHz 2007 AMD Opteron or 2007 Intel Xeon processor):
- Small: 1.7 GB memory, 1 ECU, 160 GB local instance storage, 32- or 64-bit
- Medium: 3.75 GB memory, 2 ECUs, 410 GB local instance storage, 32- or 64-bit
- Large: 7.5 GB memory, 4 ECUs, 850 GB local instance storage, 64-bit
- Extra Large: 15 GB memory, 8 ECUs, 1690 GB local instance storage, 64-bit
Micro Instances: 613 MB memory, up to 2 ECUs (for short bursts), EBS storage only
High-Memory Instances: 17.1, 34.2, or 68.4 GB memory
High-CPU Instances: 5 or 20 ECUs
Cluster GPU Instances: 22 GB memory, 33.5 ECUs, 2 x NVIDIA Tesla Fermi M2050 GPUs, 1690 GB local instance storage, 10 Gigabit Ethernet
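As a worked example of reading these specs, the sketch below stores the four standard instance types from this slide and picks the smallest one that meets a job's memory and compute needs. The figures are copied from the slide (first-generation types; current EC2 offerings differ), and `STANDARD_TYPES` and `pick_instance` are hypothetical names, not part of any AWS tool.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    memory_gb: float
    ecu: int              # EC2 Compute Units
    storage_gb: int       # local instance storage
    supports_32bit: bool


# Hypothetical table holding the slide's standard instance types, smallest first.
STANDARD_TYPES = [
    InstanceType("Small", 1.7, 1, 160, True),
    InstanceType("Medium", 3.75, 2, 410, True),
    InstanceType("Large", 7.5, 4, 850, False),
    InstanceType("Extra Large", 15.0, 8, 1690, False),
]


def pick_instance(min_memory_gb: float, min_ecu: int,
                  need_32bit: bool = False) -> Optional[InstanceType]:
    """Return the smallest standard type meeting the requirements, or None."""
    for itype in STANDARD_TYPES:  # ordered from smallest to largest
        if (itype.memory_gb >= min_memory_gb and itype.ecu >= min_ecu
                and (itype.supports_32bit or not need_32bit)):
            return itype
    return None
```

For instance, a job needing 4 GB and 2 ECUs gets "Large", because Medium's 3.75 GB falls just short; anything above 15 GB needs one of the High-Memory types instead.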

Instance vs. VM
- Instance = VM + hardware (instance type)
- AMI (Amazon Machine Image) = VM image
- VM image = OS + software
- When launching an instance, users choose both the VM image (AMI) and the hardware (instance type).
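The equations on this slide can be written down as a tiny data model: an instance pairs a VM image (OS plus software) with a hardware profile. This is an illustrative sketch only; the class names and the AMI ID are hypothetical and do not come from any AWS SDK.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MachineImage:
    """VM image (AMI) = operating system + preinstalled software."""
    ami_id: str
    os: str
    software: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Instance:
    """Instance = VM image + hardware profile (instance type)."""
    image: MachineImage
    instance_type: str

    def describe(self) -> str:
        apps = ", ".join(self.image.software) or "no extra software"
        return (f"{self.instance_type} running {self.image.os} "
                f"({apps}) from {self.image.ami_id}")


# One image, two hardware choices made at launch time (hypothetical AMI ID).
hadoop_image = MachineImage("ami-00000000", "Linux", ("Hadoop",))
small_node = Instance(hadoop_image, "Small")
large_node = Instance(hadoop_image, "Large")
```

Keeping the image and the instance type separate mirrors the slide's point: the same AMI can be launched on any compatible instance type.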

OS and Software
Amazon Machine Images (AMIs) come preconfigured with an ever-growing list of operating systems; for Windows Server 2008 AMIs, the OS license is included in the instance price.

Pricing: On-Demand Instances

Data Transfer Charges

AWS's Free Usage Tier

Amazon S3 (Simple Storage Service) Basics
- Data is stored as objects (files) in buckets
- Each object is identified by a key: <bucket> + <path>
- No real directories, just path segments
- Well suited as persistent storage for data
- Durable: designed for 99.999999999% object durability
- Scalable to petabytes of data
- Fast, highly parallel requests
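To make "no real directories, just path segments" concrete: S3's object-listing API accepts a `Prefix` and a `Delimiter` and returns matching keys plus `CommonPrefixes`, which is how browsers and tools display folders. The pure-Python sketch below imitates that behaviour over an in-memory list of keys; `list_keys` and the sample keys are hypothetical, and nothing here calls AWS.

```python
from typing import Iterable, List, Tuple


def list_keys(keys: Iterable[str], prefix: str = "",
              delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Imitate an S3 listing: return (objects, common_prefixes) under `prefix`.

    A bucket is a flat namespace; "folders" are just the distinct key
    prefixes that end at the next delimiter.
    """
    objects, prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            objects.append(key)  # no further delimiter: a direct "file"
        else:
            prefixes.add(prefix + rest[:cut + len(delimiter)])  # a "folder"
    return objects, sorted(prefixes)


# Hypothetical keys in one bucket; no directory objects exist at all.
bucket_keys = ["logs/2012/11/25.log", "logs/2012/11/26.log",
               "logs/readme.txt", "jobs/wordcount.jar"]
```

Listing with prefix `"logs/"` returns `logs/readme.txt` as an object and `logs/2012/` as a common prefix, even though no object named `logs/2012/` was ever created.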

S3 Access
- Via your web browser
- Via command-line tools such as s3cmd
- Via the HTTP REST interface: Create (PUT/POST), Read (GET), Delete (DELETE)
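The REST mapping can be illustrated with Python's standard `urllib.request.Request`, which lets you set the HTTP method explicitly. The sketch only builds unsent requests for a hypothetical bucket and key using the virtual-hosted-style URL `https://<bucket>.s3.amazonaws.com/<key>`; real requests to a private bucket must also carry an AWS signature (for example Signature Version 4), which is deliberately omitted here.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def s3_request(bucket: str, key: str, method: str,
               body: Optional[bytes] = None) -> Request:
    """Build (but do not send) an unsigned S3 REST request for one object."""
    url = f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
    return Request(url, data=body, method=method)


# Create, read and delete map to PUT, GET and DELETE on the object's URL.
# "my-course-bucket" and "input/words.txt" are hypothetical names.
create = s3_request("my-course-bucket", "input/words.txt", "PUT", b"hello world")
read = s3_request("my-course-bucket", "input/words.txt", "GET")
delete = s3_request("my-course-bucket", "input/words.txt", "DELETE")
```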

S3 Limitations
- Objects can't be modified in place (no random writes or appends)
- Maximum object size is 5 TB (at most 5 GB per upload request)
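The per-request limit is why large files go through S3's multipart upload, for which AWS documents parts of 5 MiB to 5 GiB (the last part may be smaller) and at most 10,000 parts. The sketch below decides between a single PUT and a multipart upload and picks the smallest part size that stays within those limits; `plan_upload` is a hypothetical helper, and binary units are assumed for the limits.

```python
from typing import Tuple

MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TiB   # largest object S3 stores
MAX_SINGLE_PUT = 5 * GiB    # largest object in one upload request
MIN_PART_SIZE = 5 * MiB     # multipart lower bound (except the last part)
MAX_PARTS = 10_000          # multipart part-count limit


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def plan_upload(size: int) -> Tuple[str, int, int]:
    """Return (mode, part_size, part_count) for an object of `size` bytes."""
    if size > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the 5 TB S3 size limit")
    if size <= MAX_SINGLE_PUT:
        return "single", size, 1
    # Smallest part size that keeps the upload within 10,000 parts;
    # for objects up to 5 TiB this stays well under the 5 GiB part maximum.
    part_size = max(MIN_PART_SIZE, ceil_div(size, MAX_PARTS))
    return "multipart", part_size, ceil_div(size, part_size)
```

Because objects cannot be modified, "appending" to an object in practice means uploading a new, complete version of it.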

S3 Pricing
- Varies by region
- Data in is (currently) free
- Data out is free within the same region; otherwise it starts at $0.12/GB
- Storage is charged per GB-month, starting at $0.140/GB and dropping with volume
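A back-of-the-envelope monthly bill follows from the two starting rates on this slide. The sketch deliberately applies only those flat starting rates, so it overestimates larger workloads where volume tiers lower the price; the rates are the slide's (dated, region-dependent) figures, and `monthly_s3_cost` is a hypothetical helper.

```python
STORAGE_PER_GB_MONTH = 0.140  # starting storage rate from the slide, USD
TRANSFER_OUT_PER_GB = 0.12    # starting rate for data leaving the region, USD


def monthly_s3_cost(stored_gb: float, out_gb: float,
                    same_region: bool = False) -> float:
    """Estimate one month's S3 bill in USD using flat starting rates.

    Data in is free; data out is free when it stays in the same region
    (e.g., read by EC2/EMR nodes there).
    """
    storage = stored_gb * STORAGE_PER_GB_MONTH
    transfer = 0.0 if same_region else out_gb * TRANSFER_OUT_PER_GB
    return round(storage + transfer, 2)
```

For example, 100 GB stored for a month plus 50 GB downloaded to the Internet comes to about $20; if the 50 GB is instead read by an EMR cluster in the same region, the transfer is free and the bill is about $14.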

S3 Access Control List (ACL)
- Read/Write permissions on a per-bucket basis:
  - Read == list objects in the bucket
  - Write == create/overwrite/delete objects in the bucket
- Read/Write permissions on a per-object (file) basis:
  - Read == read object data & metadata
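These permission rules can be captured in a few lines. The sketch models bucket-level and object-level grants as sets of "READ"/"WRITE" strings per user and checks whether an action is allowed; it is a hypothetical illustration of this slide's rules, not the real S3 ACL format, which uses XML grants and additional permissions such as READ_ACP, WRITE_ACP and FULL_CONTROL.

```python
from typing import Dict, Set, Tuple

# Which grant (and at which level) each action needs, per the slide.
REQUIRED: Dict[str, Tuple[str, str]] = {
    "list_objects":  ("bucket", "READ"),
    "put_object":    ("bucket", "WRITE"),  # create or overwrite
    "delete_object": ("bucket", "WRITE"),
    "get_object":    ("object", "READ"),   # object data and metadata
}


def is_allowed(user: str, action: str,
               bucket_acl: Dict[str, Set[str]],
               object_acl: Dict[str, Set[str]]) -> bool:
    """Return True if `user` holds the grant that `action` requires."""
    level, permission = REQUIRED[action]
    acl = bucket_acl if level == "bucket" else object_acl
    return permission in acl.get(user, set())


# Hypothetical grants: alice can write to the bucket, bob can only read one object.
bucket_acl = {"alice": {"READ", "WRITE"}}
object_acl = {"alice": {"READ"}, "bob": {"READ"}}
```

Here bob can read the object without being able to list the bucket, since bucket and object permissions are checked independently.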

S3
The Amazon Web Services S3 API supports the ability to:
- Find buckets and objects (jar files, data files, etc.)
- Discover their metadata
- Create new buckets
- Upload new objects
- Delete existing buckets and objects
- Copy data from S3 to HDFS for computation (distcp/s3distcp)

Amazon EMR
- A web service that allows cost-effective processing of large data sets
- Runs Hadoop (HDFS + MapReduce) over EC2 and S3
- Mostly used for data-intensive tasks
- Examples: web indexing, data mining, log analysis, data warehousing, machine learning, financial analysis, scientific simulation, bioinformatics
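Since EMR runs Hadoop MapReduce, it helps to see the programming model in miniature. The sketch below runs map, shuffle (group by key), and reduce over a few log lines in a single Python process to count requests per HTTP status code, a typical log-analysis job. On EMR the same map and reduce logic would run distributed across the cluster (for example as a Hadoop Streaming step) with input and output in S3 or HDFS; the log format and function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (status_code, 1) for one log line (assumed format: "<path> <status> <bytes>")."""
    fields = line.split()
    if len(fields) >= 2:
        yield fields[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases in one process (no parallelism)."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


logs = ["/index.html 200 512", "/missing 404 0", "/index.html 200 512"]
```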

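Apache Hadoop Stack for Data Analytics
- Resource Management & Workflow
- HBase (NoSQL database on top of HDFS)
- Pig, Hive, Mahout (dataflow scripting, SQL-like queries, machine learning)
- MapReduce (batch processing)
- YARN (cluster resource management)
- ZooKeeper (coordination service)
- HDFS (distributed file system)
- Sqoop, Flume (data import from relational databases and from log streams)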

Why Use Elastic MapReduce?
- Reduce hardware & IT personnel costs
  - Pay for what you actually use
  - Don't pay for people you don't need
  - Don't pay for capacity you don't need
- More agility, less wait time for hardware
  - Don't waste time buying/racking/configuring servers
  - Many server classes to choose from (micro to massive)
- Less time doing Hadoop deployment & version management
  - Optimized Hadoop is pre-installed

Amazon Mechanical Turk
- A web service that exposes an on-demand global workforce ready to complete small tasks in exchange for micro-payments
- Frictionless: outsourcing per se is irrelevant
- A web services API
- Examples?

Identify Road Markings

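How It Works (www.mturk.com)
- Requester (developer): posts Human Intelligence Tasks (HITs) and sets the worker qualifications they require
- Workers: complete the HITs they are qualified for
- Completed HITs flow back to the requester's software ("artificial artificial intelligence": human judgment behind a software interface)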
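The requester/worker cycle can be sketched as a small in-memory marketplace: a requester posts HITs with a reward and a required qualification, only qualified workers can complete them, and completed HITs come back to the requester. Everything here (class names, qualification strings, reward values) is hypothetical and does not use the real Mechanical Turk API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class HIT:
    """A Human Intelligence Task posted by a requester (hypothetical model)."""
    hit_id: int
    question: str
    reward_usd: float
    required_qualification: str
    answer: Optional[str] = None
    worker: Optional[str] = None


@dataclass
class Marketplace:
    open_hits: List[HIT] = field(default_factory=list)
    completed: List[HIT] = field(default_factory=list)
    qualifications: Dict[str, Set[str]] = field(default_factory=dict)

    def post(self, hit: HIT) -> None:
        """Requester side: publish a task."""
        self.open_hits.append(hit)

    def work(self, worker: str, answer_fn: Callable[[str], str]) -> int:
        """Worker side: answer every open HIT the worker is qualified for."""
        quals = self.qualifications.get(worker, set())
        done = 0
        for hit in list(self.open_hits):  # iterate over a copy while removing
            if hit.required_qualification in quals:
                hit.answer, hit.worker = answer_fn(hit.question), worker
                self.open_hits.remove(hit)
                self.completed.append(hit)
                done += 1
        return done


# Hypothetical road-marking task; only worker "w1" holds the qualification.
market = Marketplace(qualifications={"w1": {"road-markings"}, "w2": set()})
market.post(HIT(1, "Is there a crosswalk in image 17?", 0.02, "road-markings"))
```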

Example Application: a podcast transcription service that transcribes audio into high-quality text
- Amazon Simple Storage Service (S3): stores the podcasts and related files
- Amazon Mechanical Turk + EMR: voice recognition algorithms transcribe the podcasts
- Amazon EMR: indexes the text for a search engine
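The slide does not say how the indexing step works. A common structure for making text searchable is an inverted index, which maps each word to the documents containing it; the single-process sketch below builds one over hypothetical transcripts and answers AND queries. It is a swapped-in illustration of the technique, not the service's actual EMR job, and the episode IDs and text are invented.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

WORD = re.compile(r"[a-z0-9']+")


def build_index(transcripts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of podcast IDs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for podcast_id, text in transcripts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(podcast_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return IDs of podcasts containing every query word, sorted."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical transcripts produced by the earlier pipeline stages.
transcripts = {
    "ep01": "Welcome to the cloud computing podcast.",
    "ep02": "Today: Hadoop on Amazon EMR and cloud storage.",
}
index = build_index(transcripts)
```

On a large archive the same word-to-document mapping would typically be produced by a MapReduce job, with the map phase emitting (word, podcast_id) pairs and the reduce phase collecting the IDs per word.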

Learn More About AWS
- AWS: http://aws.amazon.com
- EC2 Resources: http://docs.amazonwebservices.com/awseC2/latest/UserGuide/
- Amazon EMR: http://aws.amazon.com/elasticmapreduce/

Homework (Last Friday)
- Set up an AWS account
- Watch the AWS EMR videos:
  - Getting Started (11:04): signing up for an AWS account, generating a key pair, and setting up an S3 bucket
  - Running Jobs (14:47): creating, monitoring, and getting results from your EMR Job Flow
  - Clusters of Servers (10:50): EC2 instance types, pricing, and Hadoop cluster configuration
  - Dealing with Data (18:54): S3 architectures, pricing, and access control

Homework (Cont.)
- AWS Hands-on Lab 0: follow the tutorial instructions and repeat its tasks, including creating an account, working with S3, creating a cluster, running a job, and setting up instances
- Compile the jar file using the source code posted on the course website

Summary
- Cloud Computing
- AWS
- EC2 and S3
- EMR and AMT
- Hands-on Lab 0: warming up