German-French Summer University for Young Researchers 2011: Cloud Computing, Challenges and Opportunities (Deutsch-Französische Sommeruniversität für Nachwuchswissenschaftler / Université d'été franco-allemande pour jeunes chercheurs) Windows Azure as a Platform as a Service (PaaS) Jared Jackson, Microsoft Research, July 17-22, 2011
Before we begin: some results. [Figure: two pie charts comparing attendees' favorite ice cream flavors with industry ice cream consumption; Vanilla and Chocolate lead both. Source: International Ice Cream Association (makeicecream.com)]
Windows Azure Overview
Web Application Model Comparison
Ad hoc application model: machines running IIS / ASP.NET, machines running Windows Services, machines running SQL Server.
Windows Azure application model: Web Role instances, Worker Role instances, Azure Storage (Blob / Queue / Table), and SQL Azure.
Key Components
Fabric Controller: manages hardware and virtual machines for the service.
Compute: Web Roles (web application front end), Worker Roles (utility compute), VM Roles (custom compute role; you own and customize the VM).
Storage: Blobs (binary objects), Tables (entity storage), Queues (role coordination), SQL Azure (SQL in the cloud).
Key Components: Fabric Controller
Think of it as an automated IT department: a cloud layer on top of Windows Server 2008 and a custom version of Hyper-V called the Windows Azure Hypervisor, allowing automated management of virtual machines.
Its job is to provision, deploy, monitor, and maintain applications in the data centers. Applications have a shape and a configuration.
The configuration definition describes the shape of a service: role types, role VM sizes, external and internal endpoints, local storage.
The configuration settings configure a service: instance count, storage keys, application-specific settings.
Key Components: Fabric Controller
Manages nodes and edges in the fabric (the hardware): power-on automation devices, routers and switches, hardware load balancers, physical servers, virtual servers.
State transitions: current state vs. goal state; it does whatever is needed to reach and maintain the goal state.
It's the perfect IT employee: never sleeps, never asks for a raise, and always does what you tell it to do in the configuration definition and settings.
Creating a New Project
Windows Azure Compute
Key Components: Compute, Web Roles
The web front end: a cloud web server serving web pages and web services.
You can create the following types: ASP.NET web roles, ASP.NET MVC 2 web roles, WCF service web roles, worker roles, CGI-based web roles.
Key Components: Compute, Worker Roles
Utility compute on Windows Server 2008 for background processing.
Each role can define an amount of local storage: protected space on the local drive, considered volatile storage (see the sketch below).
May communicate with outside services: Azure Storage, SQL Azure, other web services.
Can expose external and internal endpoints.
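A minimal sketch of using that local storage from role code, assuming a LocalStorage resource named "scratch" has been declared in ServiceDefinition.csdef (the name is illustrative):

using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

// "scratch" must match a <LocalStorage> declaration in ServiceDefinition.csdef.
LocalResource scratch = RoleEnvironment.GetLocalResource("scratch");
string tempFile = Path.Combine(scratch.RootPath, "work.tmp");

// Volatile storage: the file is gone if the instance is recycled or moved.
File.WriteAllText(tempFile, "intermediate results");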
Suggested Application Model Using queues for reliable messaging
Scalable, Fault-Tolerant Applications
Queues are the application glue: they decouple parts of the application, making each easier to scale independently; they enable resource allocation through different priority queues and backend servers; and they mask faults in worker roles (reliable messaging). A sketch of the pattern follows.
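A minimal sketch of the glue, reusing the AccountInformation helper that appears later in this deck (queue name, ticket contents, and the handler are illustrative):

using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Web role: enqueue a small work ticket; the real payload lives in blob/table storage.
CloudQueue workQueue = new CloudStorageAccount(AccountInformation.Credentials, false)
    .CreateCloudQueueClient().GetQueueReference("work");
workQueue.CreateIfNotExist();
workQueue.AddMessage(new CloudQueueMessage("school/SummerSchoolAttendees.txt"));

// Worker role loop: a message stays invisible while being processed and is deleted
// only on success, so a crashed worker's message simply reappears (reliable messaging).
while (true)
{
    CloudQueueMessage msg = workQueue.GetMessage();
    if (msg == null) { Thread.Sleep(1000); continue; }
    ProcessTicket(msg.AsString);        // hypothetical handler
    workQueue.DeleteMessage(msg);
}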
Key Components: Compute, VM Roles
A customized role: you own the box. How it works: download the Guest OS to Server 2008 Hyper-V, customize the OS as you need to, and upload the differencing VHD. Azure runs your VM role using the base OS plus your differencing VHD.
Application Hosting
Grokking the Service Model
Imagine whiteboarding your service architecture with boxes for nodes and arrows describing how they communicate. The service model is the same diagram written down in a declarative format. You give the Fabric the service model and the binaries that go with each of those nodes, and the Fabric can provision, deploy, and manage that diagram for you: find a hardware home, copy and launch your app binaries, monitor your app and the hardware, and in case of failure take action, perhaps even relocating your app. At all times, the diagram stays whole.
Automated Service Management
Provide code plus a service model; the platform identifies and allocates resources, deploys the service, and manages service health. Configuration is handled by two files: ServiceDefinition.csdef and ServiceConfiguration.cscfg.
Service Definition
Service Configuration
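The slides show the two files in the Visual Studio editor. As a minimal sketch (role names, ports, counts, and settings are illustrative; element names follow the SDK 1.3+ schemas), the definition describes the shape and the configuration sets the per-deployment values:

<ServiceDefinition name="MyService" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebFrontEnd" vmsize="Small">
    <Endpoints>
      <InputEndpoint name="HttpIn" protocol="http" port="80" />
    </Endpoints>
    <LocalResources>
      <LocalStorage name="scratch" sizeInMB="128" cleanOnRoleRecycle="true" />
    </LocalResources>
  </WebRole>
  <WorkerRole name="BackEnd" vmsize="Medium" />
</ServiceDefinition>

<ServiceConfiguration serviceName="MyService" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebFrontEnd">
    <Instances count="2" />
    <ConfigurationSettings>
      <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=jjstore;AccountKey=..." />
    </ConfigurationSettings>
  </Role>
  <Role name="BackEnd">
    <Instances count="4" />
  </Role>
</ServiceConfiguration>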
GUI: double-click the role name in the Azure project.
Deploying to the Cloud
We can deploy from the portal or from script. Visual Studio builds two files: an encrypted package of your code, and your config file. You must create an Azure account, then a service, and then you deploy your code. Deployment can take up to 20 minutes (which is still better than six months).
Service Management API
A REST-based API to manage your services, with X509 certificates for authentication. Lets you create, delete, change, upgrade, swap, and more. There are lots of community- and Microsoft-built tools around the API, and it is easy to roll your own; a sketch follows.
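A minimal sketch of calling the API directly, assuming a management certificate already uploaded to the portal (the subscription ID, file names, and version date are illustrative):

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListHostedServices
{
    static void Main()
    {
        string subscriptionId = "00000000-0000-0000-0000-000000000000";
        var request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.Headers.Add("x-ms-version", "2010-10-28");          // required API version header
        request.ClientCertificates.Add(
            new X509Certificate2("management.pfx", "password"));    // X509 cert for authentication
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            Console.WriteLine(reader.ReadToEnd());                  // XML list of hosted services
    }
}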
The Secret Sauce: The Fabric
The Fabric is the brain behind Windows Azure.
1. Process the service model: determine resource requirements, create role images.
2. Allocate resources.
3. Prepare nodes: place role images on nodes, configure settings, start roles.
4. Configure load balancers.
5. Maintain service health: if a role fails, restart it based on policy; if a node fails, migrate its roles based on policy.
Storage
Durable Storage, At Massive Scale
Blobs: massive files, e.g. videos, logs. Drives: use standard file system APIs. Tables: non-relational, but with few scale limits (use SQL Azure for relational data). Queues: facilitate loosely coupled, reliable systems.
Blob Features and Functions
Store large objects (up to 1 TB in size); blobs can be served through the Windows Azure CDN service. Standard REST interface: PutBlob (inserts a new blob, overwrites an existing blob), GetBlob (gets a whole blob or a specific range), DeleteBlob, CopyBlob, SnapshotBlob, LeaseBlob.
Two Types of Blobs Under the Hood
Block blobs are targeted at streaming workloads: each blob consists of a sequence of blocks, each identified by a block ID; size limit 200 GB per blob (a block-upload sketch follows).
Page blobs are targeted at random read/write workloads: each blob consists of an array of pages, each identified by its offset from the start of the blob; size limit 1 TB per blob.
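A minimal sketch of uploading a block blob in pieces with the SDK 1.x StorageClient (chunk size and file names are illustrative): each block gets a base64 ID of uniform length, and PutBlockList commits them in order.

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure.StorageClient;

CloudBlockBlob blob = container.GetBlockBlobReference("video.mp4");
var blockIds = new List<string>();
int blockNum = 0;
using (FileStream file = File.OpenRead("video.mp4"))
{
    byte[] buffer = new byte[4 * 1024 * 1024];          // 4 MB blocks
    int read;
    while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        string id = Convert.ToBase64String(BitConverter.GetBytes(blockNum++));
        blob.PutBlock(id, new MemoryStream(buffer, 0, read), null);  // null = no MD5 check
        blockIds.Add(id);
    }
}
blob.PutBlockList(blockIds);                            // commits the blocks, in this order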
Windows Azure Drive
Provides a durable NTFS volume for Windows Azure applications: use existing NTFS APIs to access a durable drive, with durability and survival of data on application failover, enabling migration of existing NTFS applications to the cloud.
A Windows Azure Drive is a Page Blob. For example, mount the Page Blob http://<accountname>.blob.core.windows.net/<containername>/<blobname> as X:\. All writes to the drive are made durable to the Page Blob, the drive is made durable through standard Page Blob replication, and the Page Blob persists even when the drive is not mounted.
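A minimal sketch of the mount sequence, assuming the SDK 1.x CloudDrive API from the Microsoft.WindowsAzure.CloudDrive assembly (the blob path, sizes, and local cache resource are illustrative):

using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);

// Drives need a local read cache, carved out of the role's local storage.
LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");   // the backing page blob
drive.Create(512);                                      // size in MB; throws if it already exists
string root = drive.Mount(25, DriveMountOptions.None);  // 25 MB cache; returns e.g. "X:\"
File.WriteAllText(Path.Combine(root, "notes.txt"), "durable NTFS in the cloud");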
Windows Azure Tables
Provides structured storage: massively scalable tables with billions of entities (rows) and TBs of data, able to use thousands of servers as traffic grows. Highly available and durable: data is replicated several times. Familiar, easy-to-use API: WCF Data Services and OData, .NET classes and LINQ, or REST from any platform or language.
Windows Azure Queues
Queues are performance-efficient, highly available, and provide reliable message delivery. Simple, asynchronous work dispatch: the programming semantics ensure that a message is processed at least once. Access is provided via REST.
Storage Partitioning
Understanding partitioning is key to understanding performance. Every data object has a partition key, different for each data type (blobs, entities, queues), and it controls entity locality. The partition key is the unit of scale: a partition can be served by a single server, and the system load balances partitions based on traffic patterns. Load balancing can take a few minutes to kick in, and it can take a couple of seconds for a partition to become available on a different server, so use exponential backoff on "Server Busy". The system load balances to meet your traffic needs; "Server Busy" can also mean the limits of a single partition have been reached.
Partition Keys In Each Abstraction
Blobs: container name + blob name; every blob (and its snapshots) is its own partition. Example: container "image" with blobs "annarbor/bighouse.jpg" and "foxborough/gillette.jpg"; container "video" with blob "annarbor/bighouse.jpg".
Entities: table name + PartitionKey; entities with the same PartitionKey value are served from the same partition. Example, with PartitionKey = CustomerId and RowKey = RowKind:

PartitionKey | RowKey                | Name         | CreditCardNumber    | OrderTotal
1            | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1            | Order 1               |              |                     | $35.12
2            | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2            | Order 3               |              |                     | $10.00

Messages: queue name; all messages in a single queue belong to the same partition. Example: queue "jobs" (Message1, Message2) and queue "workflow" (Message1).
Scalability Targets
Storage account: capacity up to 100 TB, transactions up to a few thousand requests per second, bandwidth up to a few hundred megabytes per second. A single blob partition supports throughput up to 60 MB/s; a single queue or table partition supports up to 500 transactions per second. To go above these numbers, partition between multiple storage accounts and partitions. When a limit is hit, the app will see "503 Server Busy"; applications should implement exponential backoff.
Partitions and Partition Ranges
An example Movies table, partitioned by Category; ranges of partitions can be served by different servers:

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | ...       | 2009
Action                  | The Bourne Ultimatum     | ...       | 2007
Animation               | Open Season 2            | ...       | 2009
Animation               | The Ant Bully            | ...       | 2006
Comedy                  | Office Space             | ...       | 1999
SciFi                   | X-Men Origins: Wolverine | ...       | 2009
War                     | Defiance                 | ...       | 2008
Key Selection: Things to Consider
Scalability: distribute load as much as possible; hot partitions can be load balanced; the PartitionKey is critical for scalability.
Query efficiency and speed: avoid frequent large scans; parallelize queries; point queries are most efficient.
Entity group transactions: transactions across a single partition give transaction semantics and reduce round trips.
See http://www.microsoftpdc.com/2009/svc09 and http://azurescope.cloudapp.net for more information.
Expect Continuation Tokens. Seriously!
A query returns a continuation token at a maximum of 1000 rows in a response, at the end of a partition range boundary, or after a maximum of 5 seconds of query execution.
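With the SDK 1.x StorageClient, wrapping a LINQ query in AsTableServiceQuery() gives a CloudTableQuery<T> whose enumeration transparently follows continuation tokens across those boundaries; a sketch with a hypothetical MovieEntity type:

var query = (from m in context.CreateQuery<MovieEntity>("Movies")
             where m.PartitionKey == "Action"
             select m).AsTableServiceQuery();

// Enumeration lazily issues follow-up requests whenever the service
// returns a continuation token, so large range scans just work.
foreach (MovieEntity movie in query.Execute())
    Console.WriteLine(movie.RowKey);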
Tables Recap
Select a PartitionKey and RowKey that help scale: efficient for the frequently used queries, supports batch transactions, distributes load.
Avoid append-only patterns: distribute writes by using a hash or similar as a key prefix.
Always handle continuation tokens, and expect them for range queries.
OR predicates are not optimized: execute the queries that form the OR predicates as separate queries.
Implement a back-off strategy for retries on "Server Busy": either the system is load balancing partitions to meet traffic needs, or the load on a single partition has exceeded the limits.
WCF Data Services: use a new context for each logical operation; AddObject/AttachTo can throw an exception if the entity is already being tracked; a point query throws an exception if the resource does not exist, so use IgnoreResourceNotFoundException (see the sketch below).
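A hedged sketch of that last point, using the attendee types that appear later in this deck (context is the TableServiceContext; System.Linq is assumed):

// With IgnoreResourceNotFoundException set on the DataServiceContext, a point
// query that misses yields an empty result instead of throwing on the 404.
context.IgnoreResourceNotFoundException = true;
AttendeeEntity hit = (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                      where a.PartitionKey == "SummerSchool" && a.RowKey == email
                      select a).AsTableServiceQuery().Execute().FirstOrDefault();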
Queues: Their Unique Role in Building Reliable, Scalable Applications
You want roles that work closely together but are not bound together; tight coupling leads to brittleness, while decoupling aids scaling and performance. A queue can hold an unlimited number of messages; each message must be serializable as XML and is limited to 8 KB in size. We commonly use the work-ticket pattern, where the message points at the real work item in blob or table storage. Why not simply use a table? Queues add the dispatch and reliable-delivery semantics (visibility timeouts, at-least-once processing) that tables lack.
Queue Terminology
Message Lifecycle
A web role enqueues a message with PutMessage:

POST http://myaccount.queue.core.windows.net/myqueue/messages

A worker role retrieves it with GetMessage, which returns the message with a pop receipt and makes it invisible to other consumers until a timeout:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dG...dGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

After processing, the worker deletes the message with RemoveMessage; if the worker fails first, the visibility timeout expires and the message reappears in the queue:

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
Truncated Exponential Backoff Polling
Consider a backoff polling approach: each empty poll doubles the polling interval (up to a cap), and a successful poll resets the interval to its minimum. A sketch follows.
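A minimal sketch against a CloudQueue (the cap, base interval, and handler are illustrative):

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromMinutes(2);         // truncation point

while (true)
{
    CloudQueueMessage msg = queue.GetMessage();
    if (msg != null)
    {
        ProcessMessage(msg);                            // hypothetical handler
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);             // success: reset to the base interval
    }
    else
    {
        Thread.Sleep(interval);                         // empty poll: wait, then double up to the cap
        interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
    }
}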
Removing Poison Messages
A producer/consumer timeline showing how the visibility timeout and dequeue count interact (producers P1, P2; consumers C1, C2):
1. C1: GetMessage(Q, 30 s) returns msg 1.
2. C2: GetMessage(Q, 30 s) returns msg 2.
3. C2 consumes msg 2.
4. C2: DeleteMessage(Q, msg 2).
5. C1 crashes.
6. msg 1 becomes visible again 30 s after its dequeue.
7. C2: GetMessage(Q, 30 s) returns msg 1.
8. C2 crashes.
9. msg 1 becomes visible again 30 s after its dequeue.
10. C1 restarts.
11. C1: GetMessage(Q, 30 s) returns msg 1.
12. msg 1's dequeue count is now greater than 2: it is treated as a poison message.
13. DeleteMessage(Q, msg 1).
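A minimal sketch of the consumer side, assuming a separate queue for parking poison messages (names and the threshold are illustrative):

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));  // 30 s visibility timeout
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        // Dequeued too many times without being deleted: assume it poisons its consumers.
        poisonQueue.AddMessage(new CloudQueueMessage(msg.AsString));  // park it for inspection
        queue.DeleteMessage(msg);
    }
    else
    {
        Process(msg);               // hypothetical work
        queue.DeleteMessage(msg);   // delete only after successful processing
    }
}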
Queues Recap
Make message processing idempotent, so repeated delivery after failures is harmless.
Do not rely on order: invisible messages result in out-of-order delivery.
Use the dequeue count to remove poison messages; enforce a threshold on a message's dequeue count.
For messages over 8 KB, use a blob to store the message data with a reference in the message, and garbage collect orphaned blobs; batch messages where possible.
Use the message count to scale, dynamically increasing or reducing workers.
Windows Azure Storage Takeaways: Blobs, Drives, Tables, Queues.
More at http://blogs.msdn.com/windowsazurestorage/ and http://azurescope.cloudapp.net
A Quick Exercise. Then let's look at some code and some tools.
Code: AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "thisIsNotMyKey";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}
Code: BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}
Code: BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
Code: BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}
Application Code: Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<accountname>.blob.core.windows.net/<containername>/<blobname>, or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt
Code: TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
}
Code: TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}
Code: TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}
Code: TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}
Code: TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}
Code: TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.
Pseudo Code: TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}
Application Code: Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.
Tools: Fiddler2
Best Practices
Picking the Right VM Size
Having the correct VM size can make a big difference in costs. The fundamental choice: fewer, larger VMs vs. many smaller instances. If you scale better than linearly across cores, larger VMs could save you money, but it is pretty rare to see linear scaling across 8 cores. More instances may provide better uptime and reliability (more failures are needed to take your service down). The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you.
Using Your VM to the Maximum
Remember: one role instance == one VM running Windows, and one role instance != one specific task for your code. You're paying for the entire VM, so why not use it? A common mistake is splitting code into multiple roles, each barely using its CPU. Balance using up the CPU against keeping free capacity for times of need. There are multiple ways to use your CPU to the fullest.
Exploiting Concurrency
Spin up additional processes, each with a specific task or as a unit of concurrency; this may not be ideal if the number of active processes exceeds the number of cores. Use multithreading aggressively: in networking code, correct usage of NT IO completion ports lets the kernel schedule the precise number of threads; in .NET 4, use the Task Parallel Library for both data parallelism and task parallelism, as sketched below.
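A minimal sketch of both flavors with the .NET 4 Task Parallel Library (the work items and methods are illustrative):

using System.Threading.Tasks;

// Data parallelism: spread independent items across all cores of the instance.
Parallel.ForEach(workItems, item => Process(item));

// Task parallelism: run distinct operations concurrently, then join.
Task download = Task.Factory.StartNew(() => FetchFromStorage());
Task compute  = Task.Factory.StartNew(() => CrunchNumbers());
Task.WaitAll(download, compute);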
Finding Good Code Neighbors
Typically code falls into one or more of these categories: memory intensive, CPU intensive, network IO intensive, storage IO intensive. Find code that is intensive in different resources to live together. Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage IO-intensive code.
Scaling Appropriately
Monitor your application and make sure you're scaled appropriately (not over-scaled). Spinning VMs up and down automatically is good at large scale, but remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running, and being too aggressive in spinning down VMs can result in a poor user experience. It is a trade-off between the risk of failure or poor user experience from lacking excess capacity, and the cost of idling VMs: performance vs. cost.
Storage Costs
Understand an application's storage profile and how storage billing works, and make service choices based on your app's profile. E.g., SQL Azure has a flat fee while Windows Azure Tables charges per transaction, so the service choice can make a big cost difference depending on the profile. Caching and compressing help a lot with storage costs.
Saving Bandwidth Costs
Bandwidth costs are a huge part of any popular web app's bill, and saving bandwidth often leads to savings in other places: sending fewer things over the wire often means getting fewer things from storage, and it means your VM has time for other tasks. All of these tips have the side benefit of improving your web app's performance and user experience.
Compressing Content
1. Gzip all output content: all modern browsers can decompress on the fly, and compared to Compress, Gzip has much better compression and freedom from patented algorithms.
2. Trade compute costs for storage size.
3. Minimize image sizes: use Portable Network Graphics (PNGs), crush your PNGs, strip needless metadata, and make all PNGs palette PNGs.
The pipeline: uncompressed content goes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content. A sketch of the Gzip step in .NET follows.
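A minimal sketch of Gzip in .NET (in an ASP.NET role you would more likely enable IIS compression or wrap Response.Filter; this just shows the trade of compute for bytes):

using System.IO;
using System.IO.Compression;

static byte[] GzipBytes(byte[] uncompressed)
{
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
            gzip.Write(uncompressed, 0, uncompressed.Length);   // GZipStream flushes on Dispose
        return output.ToArray();
    }
}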
Best Practices Summary: doing less is the key to saving costs; measure everything; know your application profile in and out.
Research Examples in the Cloud (on another set of slides)
MapReduce on Azure
Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud. It is a Hadoop implementation; Hadoop has a long history, has been hardened for stability, and was originally designed for cluster systems. Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure: designed from the start to use cloud primitives, with built-in fault tolerance and a REST-based interface for writing your own clients.
Project Daytona: MapReduce on Azure. http://research.microsoft.com/en-us/projects/azure/daytona.aspx
Questions and Discussion. Thank you for hosting me at the Summer School!
BLAST (Basic Local Alignment Search Tool)
The most important software in bioinformatics: it identifies similarity between bio-sequences. Computationally intensive: a large number of pairwise alignment operations; a single BLAST run can take 700 to 1000 CPU hours. Sequence databases are growing exponentially: GenBank doubled in size in about 15 months.
It is easy to parallelize BLAST
Segment the input: segment processing (querying) is pleasingly parallel. Or segment the database (e.g., mpiBLAST), which needs special result-reduction processing. Either way there is a large volume of data: a normal BLAST database can be as large as 10 GB, so with 100 nodes the peak storage traffic could reach 1 TB, and the output of BLAST is usually 10-100x larger than the input.
Parallel BLAST engine on Azure
Uses the query-segmentation data-parallel pattern: split the input sequences, query the partitions in parallel, and merge the results together when done. Follows the generally suggested application model (Web Role + Queue + Worker) with special considerations: batch job management, and task parallelism on an elastic cloud.
Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), ACM, 21 June 2010.
A simple Split/Join pattern: a splitting task fans out BLAST tasks, and a merging task joins the results.
Leverage the multiple cores of each instance: the -a (threads) argument of NCBI-BLAST is set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes.
Task granularity: large partitions cause load imbalance, while small partitions add unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead). Best practice: use test runs for profiling, and set the partition size to mitigate the overhead.
The visibilityTimeout value for each BLAST task is essentially an estimate of the task run time: too small causes repeated computation; too large causes an unnecessarily long wait in case of instance failure. Best practice: estimate the value based on the number of pair-bases in the partition plus test runs, and watch out for the 2-hour maximum limitation.
Task size vs. performance: thanks to the warm-cache effect, 100 sequences per partition is the best choice.
Instance size vs. performance: super-linear speedup with larger worker instances, primarily due to memory capacity.
Task size and instance size vs. cost: the extra-large instance generated the best and most economical throughput by fully utilizing the resource.
[Architecture diagram: a Web Role (web portal and web service) handles job registration; a Job Management Role runs the scaling engine, job scheduler, and job registry (an Azure Table); a global dispatch queue feeds Worker instances that run the splitting task, BLAST tasks, and merging task; the NCBI and BLAST databases, temporary data, and a database-updating role use Azure Blob storage.]
The job portal is an ASP.NET program hosted by a web role instance: submit jobs, track each job's status and logs, with authentication/authorization based on Live ID. An accepted job is stored into the job registry table for fault tolerance, avoiding in-memory state.
Case study: R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW). Blasted ~5,000 proteins (700K sequences): against all NCBI non-redundant proteins, completed in 30 min; against ~5,000 proteins from another strain, completed in less than 30 sec. AzureBLAST significantly saved computing time.
Discovering Homologs
Discover the interrelationships of known protein sequences with an all-against-all query: the database is also the input query. The protein database is large (4.2 GB), with 9,865,668 sequences to be queried in total: theoretically, 100 billion sequence comparisons! Performance estimation, based on a sampling run on one extra-large Azure instance: it would require 3,216,731 minutes (6.1 years) on one desktop. This is one of the biggest BLAST jobs we know of; experiments at this scale are usually infeasible for most scientists.
We allocated a total of ~4000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters (two in the US, plus Western and Northern Europe), running 8 deployments of AzureBLAST, each with its own co-located storage service. The 10 million sequences were divided into multiple segments, each submitted to one deployment as one job for execution; each segment consists of smaller partitions. When load imbalances appeared, we redistributed the load manually.
The total size of the output result is ~230 GB, with 1,764,579,487 total hits. The run started on March 25th and the last task completed on April 8th (10 days of compute), but by our estimates the real working instance time was 6-8 days. We looked into the log data to analyze what took place.
A normal log record looks like:
3/31/2010 6:14 RD00155D3611B0 Executing the task 251523...
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553...
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600...
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
Otherwise, something is wrong (e.g., a task failed to complete):
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774...
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
North Europe datacenter, 34,256 tasks processed in total: all 62 compute nodes lost tasks and then came back in groups of ~6 nodes over ~30 minutes. This looks like an update domain rolling through.
West Europe datacenter: 30,976 tasks were completed before the job was killed. 35 nodes experienced blob-writing failures at the same time; a reasonable guess is that a fault domain was at work.
MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)
Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants. The Penman-Monteith (1964) equation:

$$ET = \frac{\Delta R_n + \rho_a c_p \,\delta q\, g_a}{\left(\Delta + \gamma\left(1 + \frac{g_a}{g_s}\right)\right)\lambda_v}$$

where ET = water volume evapotranspired (m^3 s^-1 m^-2), Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1), λ_v = latent heat of vaporization (J/g), R_n = net radiation (W m^-2), c_p = specific heat capacity of air (J kg^-1 K^-1), ρ_a = dry air density (kg m^-3), δq = vapor pressure deficit (Pa), g_a = conductivity of air, the inverse of r_a (m s^-1), g_s = conductivity of plant stoma to air, the inverse of r_s (m s^-1), and γ = psychrometric constant (γ ≈ 66 Pa K^-1).

Lots of inputs: a big data reduction. And some of the inputs are not so simple: estimating resistance/conductivity across a catchment can be tricky.
Input data volumes (20 US years = 1 global year):
Climate classification: ~1 MB (1 file)
FLUXNET curated sensor dataset: 30 GB (960 files)
Vegetative clumping: ~5 MB (1 file)
NCEP/NCAR: ~100 MB (4K files)
NASA MODIS imagery source archives: 5 TB (600K files)
FLUXNET curated field dataset: 2 KB (1 file)
The MODISAzure pipeline (http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx):
Data collection (map) stage: downloads requested input tiles from NASA ftp sites; includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile.
Reprojection (map) stage: converts source tile(s) to intermediate-result sinusoidal tiles using simple nearest-neighbor or spline algorithms.
Derivation reduction stage: the first stage visible to the scientist; computes ET in our initial use.
Analysis reduction stage: an optional second stage visible to the scientist; enables production of science analysis artifacts such as maps, tables, and virtual sensors.
The stages are connected by queues (download queue, reprojection queue, reduction #1 queue, reduction #2 queue) and driven by requests from the AzureMODIS service web role portal; scientists download the science results at the end.
The MODISAzure service is the web role front door: it receives all user requests and queues each one to the appropriate download, reprojection, or reduction job queue. The Service Monitor is a dedicated worker role: it parses all job requests into tasks (recoverable units of work) and persists the execution status of all jobs and tasks in <PipelineStage>JobStatus and <PipelineStage>TaskStatus tables, dispatching work onto the <PipelineStage>Task queues.
All work is actually done by a GenericWorker worker role: it dequeues tasks created by the Service Monitor, retries failed tasks 3 times, maintains all task status, sandboxes the science or other executable, and marshals all storage from/to Azure blob storage to/from local files on the Azure worker instance.
For reprojection, each entity in the ReprojectionJobStatus table specifies a single reprojection job request, and each entity in ReprojectionTaskStatus specifies a single reprojection task (i.e. a single tile). The ScanTimeList table is queried for the list of satellite scan times that cover a target tile, and SwathGranuleMeta for the geo-metadata (e.g. boundaries) of each swath tile; the swath source data itself lives in blob storage.
Total cost: $1420. Computational costs were driven by data scale and the need to run reductions multiple times; storage costs were driven by data scale and the 6-month project duration. Small with respect to the people costs, even at graduate student rates!

Stage                | Cost                                | Data                              | Compute
Data collection      | $50 upload + $450 storage           | 400-500 GB, 60K files, 10 MB/sec  | 11 hours, <10 workers
Reprojection         | $420 cpu + $60 download             | 400 GB, 45K files                 | 3500 hours, 20-100 workers
Derivation reduction | $216 cpu + $1 download + $6 storage | 5-7 GB, 5.5K files                | 1800 hours, 20-100 workers
Analysis reduction   | $216 cpu + $2 download + $9 storage | <10 GB, ~1K files                 | 1800 hours, 20-100 workers
Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems. Equally important, they can increase participation in research, providing needed resources to users and communities without ready access. Clouds are suitable for loosely coupled, data-parallel applications and can support many interesting programming patterns, but tightly coupled, low-latency applications do not perform optimally on clouds today. They provide valuable fault-tolerance and scalability abstractions, and they act as an amplifier for familiar client tools and on-premise compute. Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers.