Last time
  - General overview, motivation, expected outcomes, other formalities, etc.
    - Please register for the course online (if possible), or talk to the CS secretaries
  - Cloud computing introduction
    - General concepts
    - Cloud definitions
    - Different types of clouds
      - Infrastructure as a Service (IaaS)
      - Platform as a Service (PaaS)
      - Software as a Service (SaaS)

Today
  - How to develop applications (services) for clouds?
  - Focus on Infrastructure as a Service and Platform as a Service, including
    - Amazon Web Services EC2 (IaaS)
    - Cloud Foundry, Google App Engine, Amazon Elastic Beanstalk (PaaS)
  - Difference between using IaaS and PaaS?
  - What is different from developing self-hosted applications?
  - Lecture + in-lab tutorial (Amazon WS)

IaaS Providers
  - Offer compute (VMs), storage, and networking
    - Sometimes also various related services
  - Payment models
    - Per hour, discounts for longer times
    - For what? CPU, RAM, disk storage, network transfers, etc.
    - Cost estimation/comparison is hard
  - Pre-defined or tailored VM configurations?
    - Small/Medium/Large vs. custom
    - All in all, difficult to compare prices!
  - Example providers
    - Amazon AWS, first and by far largest
    - Google, Microsoft, Rackspace, Elastichosts, etc.
    - CityCloud, IPeer, etc. (Swedish providers)

Amazon Web Services, overview
  - Compute services
    - Elastic Compute Cloud (EC2): pay-per-use VMs
      - What we will use mostly
    - Map-reduce, auto scaling, elastic load balancing
  - Storage services
    - Simple Storage Service (S3): for any data
    - Elastic Block Store (EBS): for EC2 VMs
  - Networking: DNS and VPN-style tools
  - Databases: relational and NoSQL
  - Deployment and management
  - Application services
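Why price comparison is hard (see the IaaS payment models above) can be illustrated with a small sketch. All provider names and prices below are invented for illustration; only the three billed dimensions (VM hours, storage, network transfer) come from the slides.

```python
from dataclasses import dataclass


@dataclass
class PriceList:
    """Hypothetical per-unit prices for one provider (all values are made up)."""
    vm_hour: float           # price per VM hour
    storage_gb_month: float  # price per GB stored per month
    transfer_gb: float       # price per GB transferred out


def monthly_cost(prices: PriceList, vm_hours: float, storage_gb: float,
                 transfer_gb: float) -> float:
    """Sum the three billed dimensions for one month of usage."""
    return (prices.vm_hour * vm_hours
            + prices.storage_gb_month * storage_gb
            + prices.transfer_gb * transfer_gb)


# Two invented providers: A has cheaper VMs, B has cheaper network transfer.
provider_a = PriceList(vm_hour=0.05, storage_gb_month=0.10, transfer_gb=0.12)
provider_b = PriceList(vm_hour=0.08, storage_gb_month=0.10, transfer_gb=0.02)

# Same VM running all month (720 h), but very different outgoing traffic.
compute_heavy = dict(vm_hours=720, storage_gb=50, transfer_gb=10)
transfer_heavy = dict(vm_hours=720, storage_gb=50, transfer_gb=1000)

for name, workload in [("compute heavy", compute_heavy),
                       ("transfer heavy", transfer_heavy)]:
    cost_a = monthly_cost(provider_a, **workload)
    cost_b = monthly_cost(provider_b, **workload)
    print(f"{name}: A={cost_a:.2f}, B={cost_b:.2f}")
```

With these made-up numbers the cheaper provider flips depending on the workload, which is the point: a price list alone does not say which provider is cheaper for you.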
Amazon, compute concepts
  - Instances
    - Running VMs with certain hardware
    - ~30 (earlier ~15) different configurations: small, medium, large, etc.
    - Custom types with more memory, more CPU, GPUs, I/O optimized, SSD storage, etc.
  - Cost
    - Per hour (on-demand)
    - Reserved (1 or 3 years)
    - Spot prices (dynamically set price)
  - Regions and availability zones
    - Region: world location, e.g., US East, EU, Asia
    - Availability zone
      - Distinct locations for fault tolerance
      - Multiple per region
      - Big crash in 2011 revealed zone overlaps
    - Keep track of where you run your VMs, etc.!

More AWS compute concepts
  - Amazon Machine Images (AMIs)
    - Custom VM images with a particular operating system, platform tools, etc.
    - Some are more expensive due to software licenses
    - More than 500 pre-made AMIs available
  - Can define your own AMIs
    1. Select a suitable base image (right OS, etc.)
    2. Install the software you need
    3. Save image of running VM as new AMI (the rather tricky part)
    4. Create multiple instances of your custom VM
  - Note: this is not the way to keep data persistent

Amazon Compute (cont.)
  - AWS Marketplace
    - 3rd-party software offered as AMIs
  - Workspaces
    - Virtual desktops (Mac or PC) in the cloud
  - Auto-scaling
    - Automatically adds/removes VMs based on rules + monitoring data
  - Lambda
    - Code executed in response to events
    - Charged for each 100 ms of execution (no VM)

Amazon, storage concepts
  - VM local disk is not persistent
    - Each VM instance is booted from the AMI; any update to the local disk is lost when the VM terminates/stops
  - Elastic Block Store (EBS)
    - Persistent storage for EC2 images
    - Raw, unformatted device; can create file systems, etc. on top
    - Can be mounted like a device/file system by EC2 instances
      - But only by one instance at a time
      - An instance can have multiple block stores
    - Can be used to store snapshots
    - Bound to a particular availability zone
    - Cost per GB/month + per 10^6 I/O ops
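The storage rules above (local disk lost on termination, a block store attached to one instance at a time and bound to one availability zone) can be captured in a toy model. This is a sketch only: the `Volume` and `Instance` classes and the zone name are hypothetical and are not the AWS API.

```python
class Volume:
    """Toy model of a persistent block store (not the AWS API)."""

    def __init__(self, zone: str):
        self.zone = zone          # bound to one availability zone
        self.data = {}            # survives instance termination
        self.attached_to = None   # at most one instance at a time


class Instance:
    """Toy model of a VM whose local disk does not outlive the instance."""

    def __init__(self, name: str, zone: str):
        self.name = name
        self.zone = zone
        self.local_disk = {}      # lost when the instance terminates
        self.volumes = []         # an instance may attach several volumes

    def attach(self, volume: Volume) -> None:
        if volume.zone != self.zone:
            raise ValueError("volume and instance are in different zones")
        if volume.attached_to is not None:
            raise ValueError("volume is already attached to an instance")
        volume.attached_to = self
        self.volumes.append(volume)

    def terminate(self) -> None:
        self.local_disk.clear()            # local changes are gone
        for volume in self.volumes:
            volume.attached_to = None      # volume data is kept
        self.volumes.clear()


vol = Volume(zone="zone-a")
vm1 = Instance("vm1", zone="zone-a")
vm1.attach(vol)
vm1.local_disk["scratch"] = "temporary"
vol.data["db"] = "persistent"
vm1.terminate()

vm2 = Instance("vm2", zone="zone-a")
vm2.attach(vol)                            # a new instance picks up the volume
print(vm1.local_disk, vol.data)
```

After `vm1` terminates, its local disk is empty while the volume still holds its data and can be attached to `vm2`.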
Amazon, more storage
  - Simple Storage Service (S3)
    - Two-level naming
      - Bucket: object container with a unique name
      - Key: unique name within the bucket (like a file name)
    - Objects are indexed by bucket + key: http://s3.amazonaws.com/bucket/key
    - Operations
      - Create/Delete bucket
      - Put/Get/Delete data from bucket
    - Buckets are located in a specific region
    - Cost per GB/month, per 1K ops, per GB transferred

Amazon, even more storage
  - Amazon Glacier
    - Archiving service
    - Low-cost storage for backup and archiving
    - As low as $0.01 per GB and month
      - Cheap? Can buy disk for less than $0.05 per GB
      - Data transfer costs?
    - Glacier offers 99.99999999% availability
      - Roughly 0.003 s downtime per year
    - However, data retrieval is slow
      - 3-5 hours to retrieve data after request
      - Available for 24 h post request

Amazon CloudFront
  - A Content Delivery Network (CDN)
    - Online caching of read-only objects closer to end users
    - Latency and redundancy
  - Regions and locations
    - Regions: full set of services available (some limitations)
    - Edge locations: CloudFront, Route 53
  - Origin server (S3, a VM, etc., or your server)
  - Edge locations distributed around the world
  - Data pulled to edge locations on demand
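The Glacier figures can be checked with a little arithmetic. The sketch below converts an availability given as a number of nines into expected downtime per year, and computes after how many months the archive fee exceeds a one-off disk purchase. It deliberately ignores data transfer fees, redundancy, power, and operations, so the break-even is only a rough first look at the "Cheap?" question.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years


def downtime_seconds_per_year(nines: int) -> float:
    """Yearly downtime if availability has `nines` nines (e.g. 3 -> 99.9%)."""
    unavailability = 10.0 ** -nines
    return unavailability * SECONDS_PER_YEAR


def break_even_months(disk_price_per_gb: float,
                      archive_price_per_gb_month: float) -> float:
    """Months after which cumulative archive fees exceed a one-off disk purchase."""
    return disk_price_per_gb / archive_price_per_gb_month


# 99.99999999% has ten nines; nine nines is shown for comparison.
print(f"ten nines:  {downtime_seconds_per_year(10):.4f} s/year")
print(f"nine nines: {downtime_seconds_per_year(9):.4f} s/year")
# Prices taken from the slide: $0.05 per GB of disk, $0.01 per GB and month.
print(f"break-even: {break_even_months(0.05, 0.01):.1f} months")
```

By this arithmetic, ten nines corresponds to about 0.003 s of downtime per year and nine nines to about 0.03 s, and the raw storage fee overtakes the disk price after about five months.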
Amazon networking
  - Networking: general pricing scheme
    - Cost per GB ($0.00 - $0.12 / GB) for data transfer
    - Large volume discounts
    - Differences depending on source and destination
      - To AWS, from AWS, within AWS
  - Route 53
    - Domain Name System (DNS)
    - Integrated with other AWS services
    - Price: per zone + per 1M queries
  - Direct Connect
    - Dedicated network access to your site
    - For high and/or consistent network performance
    - Port speeds of 50M, 100M, 200M, ..., 1G, 10G
    - Can be cheaper than common network pricing

Amazon networking (cont.)
  - Elastic Load Balancing (ELB)
    - Redistributes requests across
      - VM instances
      - Availability zones
    - Performance and robustness
    - Integrates with auto-scaling
      - Why this type of auto-scaling?

Amazon networking (cont.)
  - Virtual Private Cloud (VPC)
    - EC2 resources
    - Custom network (subnets, routing, etc.)

Databases
  - RDS (Relational Database Service)
    - Relational DB with SQL
    - Similar to MySQL, PostgreSQL, Oracle, etc.
    - Simplified management (backups, etc.)
    - DB instances similar to VMs, but with connected storage, optional fast storage, etc.
  - DynamoDB
    - NoSQL, i.e., a key-value store
    - Fast and simple, but lacks the power of SQL
    - Not suitable for all types of applications
    - Pricing based on throughput
  - ElastiCache
    - In-memory cache for read-heavy applications
  - Redshift
    - Data warehouse (DB + reporting)
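The trade-off between a key-value store and a relational database can be sketched with the standard library alone. Here a plain dict stands in for the key-value store and an in-memory SQLite database for the relational one. Neither is DynamoDB or RDS, and the order data is invented; real DynamoDB offers more than a dict does, so this only illustrates the kind of question each model answers easily.

```python
import sqlite3

# Invented sample data: (order id, customer, amount).
orders = [
    ("o1", "alice", 30),
    ("o2", "bob", 20),
    ("o3", "alice", 50),
]

# Key-value style: lookup by key is direct and fast.
kv_store = {order_id: {"customer": customer, "amount": amount}
            for order_id, customer, amount in orders}
print(kv_store["o2"])

# Any question not phrased as "get by key" needs a scan in application code.
totals_kv = {}
for item in kv_store.values():
    totals_kv[item["customer"]] = totals_kv.get(item["customer"], 0) + item["amount"]

# Relational style: the same aggregate is one declarative SQL query.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, amount INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
totals_sql = dict(db.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))

print(totals_kv, totals_sql)
```

Both approaches give the same per-customer totals, but the key-value version pushes the aggregation logic into the application.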
Deployment & management
  - Amazon CloudWatch
    - Monitoring for cloud resources and applications
    - Aggregation and alarm functionality
  - Metrics
    - Custom for most Amazon services
      - VM CPU load, network traffic, etc.
    - Your own metrics can be integrated
  - Pricing based on:
    - Number of metrics, requests, and alarms
    - Monitoring frequency (1 min is the shortest interval)

More management
  - CloudFormation
    - Templates for easy infrastructure deployment
    - Bundles EC2, storage, DB, etc. into a single stack that can be started/stopped
    - Pre-defined templates exist for common stacks (WordPress, Joomla, etc.)
  - IAM (Identity and Access Management)
    - Define your own users, groups, roles, permissions
    - Used in this course for your accounts
  - CloudTrail
    - Tracks all API calls for policy and compliance
  - AWS OpsWorks
    - Custom deployment and application lifecycle management
  - And more

Application services
  - Utility services for building AWS applications
    - CloudSearch, email (SES), notifications (SNS), message queues (SQS)
    - Simple integration for Amazon-native applications
    - Many other options for these functionalities
    - Once again, consider lock-in
  - Elastic Transcoder
    - Codecs for audio and video stored in S3
  - AppStream
    - Adds streaming to latency-sensitive applications such as games
    - Custom application + client needed

Amazon Analytics
  - Elastic MapReduce (EMR)
    - Compute framework for parallel data analysis
    - Combines storage with high-performance VMs
    - More about MapReduce later in the course
  - Kinesis
    - Real-time processing of streaming data
  - Data Pipeline
    - Move, integrate, and process data across compute and storage in multiple regions
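The aggregation and alarm functionality described for CloudWatch can be illustrated with a small stand-alone sketch. This is not the CloudWatch API: the function name, its parameters, and the sample values are assumptions chosen for illustration, and real alarms have more states than the two used here.

```python
from statistics import mean


def alarm_state(samples, period, threshold, evaluation_periods):
    """Return "ALARM" if the average of each of the last `evaluation_periods`
    periods exceeds `threshold`, else "OK".

    samples: metric values, oldest first, one per monitoring interval
    period:  number of samples aggregated into one data point
    """
    # Aggregate raw samples into per-period averages, dropping an incomplete tail.
    points = [mean(samples[i:i + period])
              for i in range(0, len(samples) - period + 1, period)]
    recent = points[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "OK"  # not enough data yet (a simplification)
    return "ALARM" if all(p > threshold for p in recent) else "OK"


# Invented CPU load samples in percent, one per minute.
cpu = [20, 30, 85, 95, 90, 92]
print(alarm_state(cpu, period=2, threshold=80, evaluation_periods=2))
print(alarm_state([20, 99, 20, 20], period=2, threshold=80, evaluation_periods=1))
```

Requiring several consecutive periods above the threshold keeps a single short spike from triggering the alarm, which matters when the alarm drives auto-scaling.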
Amazon, getting started
  - Login credentials
    1. Username/password
       - Limited functionality available (only Web UI)
    2. Access ID + secret access key
       - Available for generation/download in the Web UI
       - Used by the Query API (REST), Elasticfox (Firefox plugin), and programming libraries (in Java etc.)
       - As environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_KEY
    3. Certificates
       - Login to VMs (using ssh)
       - Need to be generated by you
  - Our course set-up is a bit complicated
    - Make sure not to interfere with each other
    - More details in the tutorial
  - Security concerns: protect your access keys!
    - Do protect your security credentials

Amazon, interfacing
  1. Amazon Management Console
     - The Web UI
     - Useful for getting to know the services + debugging
  2. Command line tools
     - Windows or Linux; the Linux version is installed at CS
  3. Programmatically
     - Network interfaces: SOAP and REST
       - Low-level, can be tedious to use
     - Language bindings
       - SDKs for Java, Ruby, etc.
       - Eclipse plugins also available

Amazon, some observations
  - First and by far largest provider
    - Google, Microsoft, and others are challenging
  - Rapid roll-out of new services
  - Exponential growth over the last few years
  - This is not hosting, but a programmable infrastructure
    - Vendor lock-in?
  - Prices are reduced often
    - >40 price reductions over the last few years
    - Is it really getting cheaper?
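One way to keep access keys out of source code is to read them from the environment variables named under Login credentials. The sketch below uses only the standard library; the helper names and the example values are hypothetical. Note that some AWS tools expect `AWS_SECRET_ACCESS_KEY` instead of `AWS_SECRET_KEY`, so check the tool you use.

```python
import os


def load_credentials(env=os.environ):
    """Read the access key pair from environment variables."""
    try:
        access_id = env["AWS_ACCESS_KEY_ID"]
        secret = env["AWS_SECRET_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None
    return access_id, secret


def masked(secret: str) -> str:
    """Show only the last four characters so the secret never lands in logs."""
    return "*" * max(len(secret) - 4, 0) + secret[-4:]


# Example with a fake environment; real keys must never be hard-coded like this.
fake_env = {"AWS_ACCESS_KEY_ID": "EXAMPLE-ID", "AWS_SECRET_KEY": "not-a-real-secret"}
access_id, secret = load_credentials(fake_env)
print(access_id, masked(secret))
```

Failing early with a clear error when a variable is missing, and masking the secret whenever it is printed, are two cheap habits that help protect the keys.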
Development for the cloud
  - At large scale, errors are the norm, not the exception
    - How often does a data center disk fail?
  - How to avoid failure?
    - Design for redundancy and fault tolerance
    - Fail constantly!
  - Netflix's Chaos Monkey
    - Randomly kills VMs and services
    - Extended with Latency Monkey, Chaos Gorilla, Chaos Kong, etc.
  - Resource sharing is complicated
    - Multi-tenancy and noisy neighbors
  - What works on your own servers may not work in the cloud

PaaS Development
  - Tools for service development & deployment
    - Typically includes databases, application servers, etc.
    - One (or a few) programming languages
    - Often tightly coupled to IaaS
  - Exists for most platforms, such as:
    - Java: Cloud Foundry, Google App Engine, Jelastic, CloudBees, etc.
    - PHP: cloudControl, etc.
    - Python: Heroku, Google App Engine, etc.
    - .NET: Microsoft Azure

Cloud Foundry
  - VMware's PaaS initiative
    - Much attention recently
    - One of few PaaS solutions with clear separation from IaaS
  - Supports various application frameworks:
    - Spring: Java Inversion of Control (IoC) container with support for databases, XML processing, and almost everything else
    - Ruby: neat interpreted programming language
    - Various Web toolkits for Java or Ruby: Ruby on Rails, Sinatra, Grails, Node.js

Cloud Foundry, procedure
  1. Implement your application in Java or Ruby
  2. Connect it to Cloud Foundry services
  3. Deploy it to a Cloud Foundry VM
  - Tooling (for the above steps):
    - VMC: command line interface
    - Eclipse: Cloud Foundry extensions
    - STS: Spring framework extensions
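The "fail constantly" idea from Development for the cloud can be tried out on a toy model. The sketch below is not Netflix's tool: the `ReplicatedService` class, the `chaos_monkey` function, and all numbers are hypothetical. It only shows that a service with spare replicas keeps answering while random instances are killed, whereas a single-instance service does not.

```python
import random


class ReplicatedService:
    """Toy service that answers as long as at least one replica is alive."""

    def __init__(self, replicas: int):
        self.target = replicas
        self.alive = set(range(replicas))

    def handle_request(self) -> bool:
        return bool(self.alive)

    def heal(self) -> None:
        """Stand-in for auto-recovery: restart replicas up to the target count."""
        self.alive = set(range(self.target))


def chaos_monkey(service: ReplicatedService, rng: random.Random) -> None:
    """Kill one randomly chosen live replica, if any."""
    if service.alive:
        victim = rng.choice(sorted(service.alive))
        service.alive.discard(victim)


def survival_rate(replicas: int, rounds: int, kills_per_round: int,
                  seed: int = 0) -> float:
    """Fraction of rounds in which the service still answered after the kills."""
    rng = random.Random(seed)
    service = ReplicatedService(replicas)
    ok = 0
    for _ in range(rounds):
        for _ in range(kills_per_round):
            chaos_monkey(service, rng)
        ok += service.handle_request()
        service.heal()
    return ok / rounds


print("1 replica, 1 kill per round: ", survival_rate(1, 100, 1))
print("3 replicas, 2 kills per round:", survival_rate(3, 100, 2))
```

Running such kills continuously in a real system is what forces the redundancy and recovery paths to be exercised before a genuine failure does it for you.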
Cloud Foundry, services
  - MongoDB
    - NoSQL database
    - Scales to 1000+ nodes
  - MySQL
    - Standard relational database
    - Scales to ~10 nodes (if clustered)
  - RabbitMQ
    - Implementation of AMQP
    - Standard message queue for robust messaging (publish-subscribe)
  - Redis
    - Key-value store (like a NoSQL database)

Cloud Foundry, deployment and infrastructure
  - Micro Cloud Foundry
    - Your own VM, to be run locally
    - Packed as a VMware image, requires VMware Player (or similar) to run
    - Seamless transition to a real cloud
  - Run the VM at an IaaS provider
    - For example, Amazon EC2
    - Need to import the Micro Cloud Foundry VM

Google App Engine
  - Runs web apps (really well!)
    - One-trick pony
    - Handles HTTP(S) requests
    - No performance tuning
  - Everything scalable
    - Number of apps, reqs/s, storage
  - Pricing similar to Amazon
    - Pay for instances, storage, network, etc.
  - Some features
    - Persistent storage
    - Automatic scaling and load balancing
    - Asynchronous task queues, etc.

Google App Engine (cont.)
  - Supports Java, Python, PHP, Go
    - SDKs exist for the supported languages
  - Very restricted environment
    - Limited sandbox, Java class whitelist
    - Limitations to SQL
    - Performance limits: application boot time, response time < 30 seconds
    - Compare: SDKs for smartphone development
  - Development
    1. Build application locally
    2. Test locally in embedded Web server
    3. Upload application to App Engine
  - Notably: no VMs
    - Google does not use virtualization
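The build-locally, test-locally workflow can be sketched with a minimal WSGI application, on the assumption that the target Python runtime serves WSGI applications. The sketch uses only the Python standard library, not the App Engine SDK; the `call_locally` helper and the `/cms` path are hypothetical.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI app: answer every HTTP request with a plain-text greeting."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_locally(app, path="/"):
    """Invoke the app in-process with a fake request, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)   # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call_locally(application, "/cms"))
```

To try the same application over HTTP on your own machine, `wsgiref.simple_server.make_server("localhost", 8080, application).serve_forever()` starts a small development server. The application only handles requests and keeps no local state, which matches the restricted, request-driven model described above.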
Amazon Elastic Beanstalk
  - PaaS solution by Amazon, built on top of AWS
  - Elastic Beanstalk makes use of
    - Compute (EC2)
    - Storage (S3)
    - Notification (SNS)
    - Load balancing (ELB)
    - Auto-scaling
  - Languages: Node.js, PHP, Python, Java, .NET
    - Apache Tomcat, etc.

IaaS vs. PaaS
  - Developing for IaaS
    - Manage your application
    - Manage the infrastructure
    - Tedious, but greater flexibility
  - Developing for PaaS
    - Manage your application
    - Simpler
    - Limited tooling/application type restrictions
  - Elastic Beanstalk shows how PaaS can be built on top of IaaS

Next time
  - After break: in-lab tutorial (set up Amazon WS)
    - MA416 and MA426
  - Next lecture
    - Virtualization (and VMs)
  - Assignment 1: full description online now
    - Deadline: 2015-02-10
    - Implement a small content management system
      - Using IaaS: Amazon Web Services
      - Using PaaS: Google App Engine