



AstroCompute: AWS101 - using the cloud for Science. Brendan Bouffler ("boof"), Scientific Computing (SciCo) @ AWS. ska-astrocompute@amazon.com

AWS is hoping to contribute to the development of data processing, analysis and visualisation algorithms, tools and services for astronomy running in the cloud. Our primary objectives for this program are to expand global participation in the SKA and kickstart the creation of the "SKA App Store" on AWS.

AWS Global Impact Initiatives for Science

AWS Research Grants
Grants to initiate & support development of cloud-enabled technologies. Typically one-off grants of AWS resources like EC2 (compute) or S3 & EBS (storage) or more exotic like Kinesis & twitter feeds. Frequently results in reusable resources, like AMIs or open data, which we strongly encourage. Lowers the risk to try the cloud.

AWS Hosted Public Datasets
- Large and globally significant datasets hosted and paid for by AWS for community use.
- Data can be quickly and easily processed with elastic computing resources in the surrounding cloud.
- AWS hopes to enable more innovation, more quickly.
- Provided in partnership with content owners, who curate the data.

AWS SciCo Team
Dedicated team focusing on Scientific Computing & Research workloads. Globally focussed and engaged in Big Science projects like the SKA. Leveraging AWS resources all over the world. Ensuring the cloud is able to make a disruptive impact on science.

Astronomy in the Cloud
CHILES will produce the first HI deep field, to be carried out with the VLA in B array and covering a redshift range from z=0 to z=0.45. The field is centered at the COSMOS field. It will produce neutral hydrogen images of at least 300 galaxies spread over the entire redshift range. Working with AWS's SciCo team to exploit the Spot market in the cloud, the team at ICRAR in Australia have been able to implement the entire processing pipeline in the cloud for around $2,000 per month, which means the $1.75M they otherwise needed to spend on an HPC cluster can be spent on way cooler things that impact their research, like astronomers.
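For a rough sense of scale, the two dollar figures quoted for CHILES can be compared directly. This is a back-of-the-envelope sketch only: the amounts come from the slide, while ignoring power, staffing, hardware refresh and price changes is a simplifying assumption.

```python
# Figures quoted on the slide.
CLOUD_COST_PER_MONTH = 2_000      # USD, spot-based pipeline
CLUSTER_CAPITAL_COST = 1_750_000  # USD, avoided HPC cluster purchase


def months_covered(capital_cost: float, monthly_cost: float) -> float:
    """Months of cloud spend that add up to the capital cost."""
    return capital_cost / monthly_cost


months = months_covered(CLUSTER_CAPITAL_COST, CLOUD_COST_PER_MONTH)
print(f"{months:.0f} months, or about {months / 12:.0f} years")
# prints: 875 months, or about 73 years
```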

Global AWS Regions


Owning a computer (the spherical model) versus the cloud model. [Figure: # CPUs against time.]

Time travel. [Figure: two plots of # CPUs against time. Many CPUs for a short time: wall clock time ~1 hour. Few CPUs for a long time: wall clock time ~1 week. Cost: equal.]
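The "time travel" point is that cost follows the CPU-hours consumed, not the number of CPUs running at once. A minimal sketch of that arithmetic follows. It assumes the work splits perfectly across CPUs and is billed per CPU-hour; the workload size and the price are hypothetical.

```python
def wall_clock_hours(total_cpu_hours: float, n_cpus: int) -> float:
    """Elapsed time, assuming the work splits perfectly across n_cpus."""
    return total_cpu_hours / n_cpus


def cost(total_cpu_hours: float, price_per_cpu_hour: float) -> float:
    """Cost depends on CPU-hours consumed, not on how many CPUs ran at once."""
    return total_cpu_hours * price_per_cpu_hour


WORK = 168.0   # CPU-hours: one CPU for one week (hypothetical workload)
PRICE = 0.05   # USD per CPU-hour (hypothetical price)

for n_cpus in (1, 168):
    hours = wall_clock_hours(WORK, n_cpus)
    print(f"{n_cpus:>3} CPUs: {hours:6.1f} h wall clock, ${cost(WORK, PRICE):.2f}")
```

Both rows show the same cost. Only the wall clock time changes, from a week on one CPU to an hour on 168. Real workloads rarely scale this cleanly, so treat the one-hour figure as a lower bound.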

Spot Market. Our ultimate space filler. Spot Instances allow you to name your own price for spare AWS computing capacity. Great for workloads that aren't time sensitive, and especially popular in research (hint: it's really cheap). Often 10x cheaper than on-demand. [Figure: # CPUs against time, with a scale from 0.00 to 6.00.]
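To make "name your own price" concrete, here is a small simulation of one instance under a fixed bid. It uses a deliberately simplified model, stated as an assumption: the instance runs in every hour whose spot price is at or below the bid and is charged that hour's spot price. The price series, bid and on-demand rate are hypothetical.

```python
from typing import List, Tuple


def simulate_spot(prices: List[float], bid: float) -> Tuple[int, float]:
    """Return (hours run, total cost) for one instance under a fixed bid.

    Simplified model: the instance runs in every hour whose spot price is
    at or below the bid, and is charged that hour's spot price.
    """
    hours_run = 0
    total = 0.0
    for price in prices:
        if price <= bid:
            hours_run += 1
            total += price
    return hours_run, total


# Hypothetical hourly spot prices (USD) and on-demand price, for illustration.
spot_prices = [0.01, 0.01, 0.02, 0.08, 0.02, 0.01]
ON_DEMAND = 0.10
BID = 0.05

hours, spot_cost = simulate_spot(spot_prices, BID)
on_demand_cost = hours * ON_DEMAND
print(f"ran {hours} of {len(spot_prices)} hours")
print(f"spot ${spot_cost:.2f} vs on-demand ${on_demand_cost:.2f}")
```

In this run the instance sits out the one hour in which the price spikes above the bid. That gap is why spot capacity suits workloads that are not time sensitive.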

Applications: Virtual Desktops, Collaboration and Sharing.
Platform Services: Databases, Analytics, App Services, Deployment & Management, Mobile & Devices. Components shown: Relational, NoSQL, Columnar, Caching, Hadoop, Real-time, Data warehouse, Data Workflows, Queuing, Orchestration, App streaming, Transcoding, Email, Search, Containers, Dev/ops Tools, Resource Templates, Usage Tracking, Monitoring and Logs, Identity, Sync, Mobile Analytics, Notifications.
Foundation Services: Compute (VMs, Auto-scaling & Load Balancing), Storage (Object, Block and Archive), Security & Access Control, Networking.
Infrastructure: Regions, Availability Zones, CDN Points of Presence.

The AWS Console on the web

Instances. There are a couple of dozen EC2 compute instance types alone, each of which is optimized for different things. One size does not fit all. http://aws.amazon.com/ec2/instance-types/

AWS CLI - command line interface. Anything you can do in the GUI, you can do on the command line, and in most popular languages: Java, Python (Boto API), Ruby, PHP and Shell.

    as-create-launch-config spotlc-5cents --image-id ami-e565ba8c --instance-type m1.small --spot-price 0.05 ...

    as-create-auto-scaling-group spotasg --launch-configuration spotlc-5cents --availability-zones us-east-1a,us-east-1b --max-size 16 --min-size 1 --desired-capacity 3

http://aws.amazon.com/cli/
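When the same two commands need to be issued repeatedly with different sizes or prices, it can help to assemble them from a script. The sketch below only builds and prints the command lines and executes nothing. The command and flag names follow the slide, the options the slide elides are left out, the hyphenated `--desired-capacity` spelling is an assumption about the intended flag, and the helper function names are hypothetical.

```python
import shlex
from typing import List


def launch_config_cmd(name: str, image_id: str, instance_type: str,
                      spot_price: float) -> List[str]:
    """Argument list for the launch-configuration command shown on the slide."""
    return [
        "as-create-launch-config", name,
        "--image-id", image_id,
        "--instance-type", instance_type,
        "--spot-price", f"{spot_price:.2f}",
    ]


def auto_scaling_group_cmd(name: str, launch_config: str, zones: List[str],
                           min_size: int, max_size: int,
                           desired: int) -> List[str]:
    """Argument list for the auto-scaling-group command shown on the slide."""
    return [
        "as-create-auto-scaling-group", name,
        "--launch-configuration", launch_config,
        "--availability-zones", ",".join(zones),
        "--max-size", str(max_size),
        "--min-size", str(min_size),
        "--desired-capacity", str(desired),  # hyphenated spelling is assumed
    ]


lc = launch_config_cmd("spotlc-5cents", "ami-e565ba8c", "m1.small", 0.05)
asg = auto_scaling_group_cmd("spotasg", "spotlc-5cents",
                             ["us-east-1a", "us-east-1b"], 1, 16, 3)

# Print shell-quoted command lines; nothing is executed here.
for cmd in (lc, asg):
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments, rather than one string, avoids quoting mistakes if a list is later handed to `subprocess.run`.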

The AWS Marketplace. An online catalog of Amazon Machine Images (AMIs): aws.amazon.com/marketplace. Almost all major research-focused operating systems. Dozens of scientific applications. Need more!

cfncluster - provision an HPC cluster in minutes. cfncluster is a sample code framework that deploys and maintains clusters on AWS. It is reasonably agnostic to what the cluster is for and can easily be extended to support different frameworks. The CLI is stateless; everything is done using CloudFormation or resources within AWS. https://github.com/awslabs/cfncluster #cfncluster [Graphic: 5 minutes.]

Yes, it's a real HPC cluster

    arthur ~ [26] $ cfncluster create boof-cluster
    Starting: boof-cluster
    Status: cfncluster-boof-cluster - CREATE_COMPLETE
    Output:"MasterPrivateIP"="10.0.0.17"
    Output:"MasterPublicIP"="54.66.174.113"
    Output:"GangliaPrivateURL"="http://10.0.0.17/ganglia/"
    Output:"GangliaPublicURL"="http://54.66.174.113/ganglia/"
    arthur ~ [27] $ ssh ec2-user@54.66.174.113
    The authenticity of host '54.66.174.113 (54.66.174.113)' can't be established.
    RSA key fingerprint is 45:3e:17:76:1d:01:13:d8:d4:40:1a:74:91:77:73:31.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '54.66.174.113' (RSA) to the list of known hosts.
    [ec2-user@ip-10-0-0-17 ~]$ df
    Filesystem     1K-blocks    Used Available Use% Mounted on
    /dev/xvda1      10185764 7022736   2639040  73% /
    tmpfs             509312       0    509312   0% /dev/shm
    /dev/xvdf       20961280   32928  20928352   1% /shared
    [ec2-user@ip-10-0-0-17 ~]$ ed hw.qsub
    hw.qsub: No such file or directory
    a
    #!/bin/bash
    #
    #$ -cwd
    #$ -j y
    #$ -pe mpi 2
    #$ -S /bin/bash
    #
    module load openmpi-x86_64
    mpirun -np 2 hostname
    .
    w
    110
    q
    [ec2-user@ip-10-0-0-17 ~]$ ll
    total 4
    -rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
    [ec2-user@ip-10-0-0-17 ~]$ qsub hw.qsub
    Your job 1 ("hw.qsub") has been submitted
    [ec2-user@ip-10-0-0-17 ~]$ qstat
    job-id prior   name    user     state submit/start at     queue                          slots ja-task-id
    -------------------------------------------------------------------------------------------------
         1 0.55500 hw.qsub ec2-user r     02/01/2015 05:57:25 all.q@ip-10-0-0-44.ap-southeas     2
    [ec2-user@ip-10-0-0-17 ~]$ qhost
    HOSTNAME      ARCH     NCPU NSOC NCOR NTHR LOAD MEMTOT MEMUSE SWAPTO  SWAPUS
    ----------------------------------------------------------------------------------------------
    global        -        -    -    -    -    -    -      -      -       -
    ip-10-0-0-136 lx-amd64 8    1    4    8    -    14.6G  -      1024.0M -
    ip-10-0-0-154 lx-amd64 8    1    4    8    -    14.6G  -      1024.0M -
    [ec2-user@ip-10-0-0-17 ~]$ qstat
    [ec2-user@ip-10-0-0-17 ~]$ ls -l
    total 8
    -rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
    -rw-r--r-- 1 ec2-user ec2-user 26 Feb 1 05:57 hw.qsub.o1
    [ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub.o1
    ip-10-0-0-136
    ip-10-0-0-154
    [ec2-user@ip-10-0-0-17 ~]$

Now you have a cluster, probably running CentOS 6.x, with Sun Grid Engine as a default scheduler, and openmpi and a bunch of other stuff installed. You also have a shared filesystem in /shared and an autoscaling group ready to expand the number of compute nodes in the cluster when the existing ones get busy. You can customize quite a lot via the .cfncluster/config file - check out the comments.
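Typing a job script into ed works for a demo, but for anything repeated it is easier to generate the script. The sketch below reproduces the directives of hw.qsub from the session above. The function name and its parameters are hypothetical, and the module and parallel environment names are simply the ones used in the session.

```python
def sge_job_script(slots: int, command: str,
                   module: str = "openmpi-x86_64",
                   pe: str = "mpi") -> str:
    """Build an SGE job script with the same directives as hw.qsub."""
    lines = [
        "#!/bin/bash",
        "#",
        "#$ -cwd",                # run the job from the submission directory
        "#$ -j y",                # merge stderr into stdout
        f"#$ -pe {pe} {slots}",   # request slots in the parallel environment
        "#$ -S /bin/bash",        # shell used to run the script
        "#",
        f"module load {module}",
        f"mpirun -np {slots} {command}",
    ]
    return "\n".join(lines) + "\n"


script = sge_job_script(slots=2, command="hostname")
print(script, end="")
```

Saved as, say, make_job.py (a hypothetical filename), this could be run as `python make_job.py > hw.qsub` and submitted with `qsub hw.qsub`. For two slots and `hostname` the generated text is 110 bytes, the same size ed reported when writing the file.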

HowTo Guide for cfncluster. The documentation for cfncluster is available as a set of step-by-step slides (see the link below) which will guide you through setting up your first HPC cluster in the cloud. It also contains some tips on how to make the most of your cluster and how to minimise your costs, both during your experimental phase and in production. https://publicboof.s3.amazonaws.com/cfncluster/%40boofla%20-%20cfncluster%20example%20-%202015-05.pdf

Training. There's a lot of low-cost or no-cost training available via the self-paced labs on our training site. There's also a lot of great material on YouTube (https://www.youtube.com/user/AmazonWebServices) and lots of other web resources, provided by the community. There's also paid professional training provided by AWS and our partners regularly in many cities around the world. You'll find more about that under AWS Technical Essentials at our training website. http://aws.amazon.com/training

Using a grant. If you receive a grant from the AstroCompute program, you'll need to have an AWS account (i.e. a login) that you can use to receive the credits and start your work. The first time you establish an account with AWS, you'll need to provide a credit card number. As an institution, you'll likely be able to switch over to an invoice system afterwards (that's an easy thing to figure out between your institute and AWS before any charges are made). Once you have your account, AWS (on instruction from SKA) will drop the approved credits into your account, giving you a healthy positive balance :-) Just to be on the safe side, we recommend you first set up a billing alert, so you get notified when you have consumed your research credits and you're about to go into negative balance. http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-gettingstarted.html#d0e391
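A billing alert can also be created programmatically rather than through the console. The following is a minimal sketch, not the procedure from the linked guide. It builds the arguments for a CloudWatch alarm on the `EstimatedCharges` billing metric and, only if `create_billing_alarm` is called, submits them with boto3. The threshold, alarm name and SNS topic ARN are hypothetical. Billing alerts must already be enabled on the account, and how the metric accounts for credits should be checked against the billing documentation before choosing a threshold.

```python
from typing import Any, Dict


def billing_alarm_params(threshold_usd: float, sns_topic_arn: str) -> Dict[str, Any]:
    """Keyword arguments for CloudWatch put_metric_alarm on estimated charges."""
    return {
        "AlarmName": f"estimated-charges-over-{threshold_usd:.0f}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,             # six hours, in seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_billing_alarm(threshold_usd: float, sns_topic_arn: str) -> None:
    """Create the alarm. Needs boto3, AWS credentials and billing alerts enabled."""
    import boto3  # imported here so the rest of the sketch runs without it

    # Billing metrics are published in the us-east-1 region.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_params(threshold_usd, sns_topic_arn))


# Hypothetical values: a 900 USD threshold and a placeholder SNS topic ARN.
params = billing_alarm_params(900, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
print(params["AlarmName"], params["Threshold"])
```

Separating the parameter builder from the API call means the settings can be inspected, or kept under version control, without touching a live account.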

Thank You ska-astrocompute@amazon.com