German-French Summer University for Young Researchers 2011: Cloud Computing, Challenges and Opportunities (Deutsch-Französische Sommeruniversität für Nachwuchswissenschaftler / Université d'été franco-allemande pour jeunes chercheurs) Windows Azure as a Platform as a Service (PaaS) Jared Jackson, Microsoft Research, July 17-22, 2011
Before we begin: some results. [Figure: two pie charts comparing attendees' favorite ice cream flavors with industry ice cream consumption; Vanilla and Chocolate lead both. Source: International Ice Cream Association (makeicecream.com)]
Windows Azure Overview
Web Application Model Comparison
Ad hoc application model: machines running IIS / ASP.NET, machines running Windows Services, machines running SQL Server.
Windows Azure application model: Web Role instances, Worker Role instances, Azure Storage (Blob / Queue / Table), and SQL Azure.
Key Components
Fabric Controller: manages hardware and virtual machines for the service.
Compute: Web Roles (web application front end), Worker Roles (utility compute), VM Roles (custom compute role; you own and customize the VM).
Storage: Blobs (binary objects), Tables (entity storage), Queues (role coordination), SQL Azure (SQL in the cloud).
Key Components: Fabric Controller
Think of it as an automated IT department: a cloud layer on top of Windows Server 2008 and a custom version of Hyper-V called the Windows Azure Hypervisor, allowing automated management of virtual machines.
Its job is to provision, deploy, monitor, and maintain applications in the data centers. Applications have a shape and a configuration.
The configuration definition describes the shape of a service: role types, role VM sizes, external and internal endpoints, local storage.
The configuration settings configure a service: instance count, storage keys, application-specific settings.
Key Components: Fabric Controller
Manages nodes and edges in the fabric (the hardware): power-on automation devices, routers and switches, hardware load balancers, physical servers, virtual servers.
State transitions: current state vs. goal state; it does whatever is needed to reach and maintain the goal state.
It's the perfect IT employee: never sleeps, never asks for a raise, and always does what you tell it to do in the configuration definition and settings.
Creating a New Project
Windows Azure Compute
Key Components: Compute, Web Roles
The web front end: a cloud web server serving web pages and web services.
You can create the following types: ASP.NET web roles, ASP.NET MVC 2 web roles, WCF service web roles, worker roles, CGI-based web roles.
Key Components: Compute, Worker Roles
Utility compute on Windows Server 2008 for background processing.
Each role can define an amount of local storage: protected space on the local drive, considered volatile storage (see the sketch below).
May communicate with outside services: Azure Storage, SQL Azure, other web services.
Can expose external and internal endpoints.
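A minimal sketch of using that local storage from role code, assuming a LocalStorage resource named "scratch" has been declared in ServiceDefinition.csdef (the name is illustrative):

using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

// "scratch" must match a <LocalStorage> declaration in ServiceDefinition.csdef.
LocalResource scratch = RoleEnvironment.GetLocalResource("scratch");
string tempFile = Path.Combine(scratch.RootPath, "work.tmp");

// Volatile storage: the file is gone if the instance is recycled or moved.
File.WriteAllText(tempFile, "intermediate results");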
Suggested Application Model Using queues for reliable messaging
Scalable, Fault-Tolerant Applications
Queues are the application glue: they decouple parts of the application, making each easier to scale independently; they enable resource allocation through different priority queues and backend servers; and they mask faults in worker roles (reliable messaging). A sketch of the pattern follows.
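A minimal sketch of the glue, reusing the AccountInformation helper that appears later in this deck (queue name, ticket contents, and the handler are illustrative):

using System.Threading;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Web role: enqueue a small work ticket; the real payload lives in blob/table storage.
CloudQueue workQueue = new CloudStorageAccount(AccountInformation.Credentials, false)
    .CreateCloudQueueClient().GetQueueReference("work");
workQueue.CreateIfNotExist();
workQueue.AddMessage(new CloudQueueMessage("school/SummerSchoolAttendees.txt"));

// Worker role loop: a message stays invisible while being processed and is deleted
// only on success, so a crashed worker's message simply reappears (reliable messaging).
while (true)
{
    CloudQueueMessage msg = workQueue.GetMessage();
    if (msg == null) { Thread.Sleep(1000); continue; }
    ProcessTicket(msg.AsString);        // hypothetical handler
    workQueue.DeleteMessage(msg);
}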
Key Components: Compute, VM Roles
A customized role: you own the box. How it works: download the Guest OS to Server 2008 Hyper-V, customize the OS as you need to, and upload the differencing VHD. Azure runs your VM role using the base OS plus your differencing VHD.
Application Hosting
Grokking the Service Model
Imagine whiteboarding your service architecture with boxes for nodes and arrows describing how they communicate. The service model is the same diagram written down in a declarative format. You give the Fabric the service model and the binaries that go with each of those nodes, and the Fabric can provision, deploy, and manage that diagram for you: find a hardware home, copy and launch your app binaries, monitor your app and the hardware, and in case of failure take action, perhaps even relocating your app. At all times, the diagram stays whole.
Automated Service Management
Provide code plus a service model; the platform identifies and allocates resources, deploys the service, and manages service health. Configuration is handled by two files: ServiceDefinition.csdef and ServiceConfiguration.cscfg.
Service Definition
Service Configuration
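The slides show the two files in the Visual Studio editor. As a minimal sketch (role names, ports, counts, and settings are illustrative; element names follow the SDK 1.3+ schemas), the definition describes the shape and the configuration sets the per-deployment values:

<ServiceDefinition name="MyService" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebFrontEnd" vmsize="Small">
    <Endpoints>
      <InputEndpoint name="HttpIn" protocol="http" port="80" />
    </Endpoints>
    <LocalResources>
      <LocalStorage name="scratch" sizeInMB="128" cleanOnRoleRecycle="true" />
    </LocalResources>
  </WebRole>
  <WorkerRole name="BackEnd" vmsize="Medium" />
</ServiceDefinition>

<ServiceConfiguration serviceName="MyService" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebFrontEnd">
    <Instances count="2" />
    <ConfigurationSettings>
      <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=jjstore;AccountKey=..." />
    </ConfigurationSettings>
  </Role>
  <Role name="BackEnd">
    <Instances count="4" />
  </Role>
</ServiceConfiguration>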
GUI: double-click the role name in the Azure project.
Deploying to the Cloud
We can deploy from the portal or from script. Visual Studio builds two files: an encrypted package of your code, and your config file. You must create an Azure account, then a service, and then you deploy your code. Deployment can take up to 20 minutes (which is still better than six months).
Service Management API
A REST-based API to manage your services, with X509 certificates for authentication. Lets you create, delete, change, upgrade, swap, and more. There are lots of community- and Microsoft-built tools around the API, and it is easy to roll your own; a sketch follows.
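A minimal sketch of calling the API directly, assuming a management certificate already uploaded to the portal (the subscription ID, file names, and version date are illustrative):

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListHostedServices
{
    static void Main()
    {
        string subscriptionId = "00000000-0000-0000-0000-000000000000";
        var request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.Headers.Add("x-ms-version", "2010-10-28");          // required API version header
        request.ClientCertificates.Add(
            new X509Certificate2("management.pfx", "password"));    // X509 cert for authentication
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            Console.WriteLine(reader.ReadToEnd());                  // XML list of hosted services
    }
}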
The Secret Sauce: The Fabric
The Fabric is the brain behind Windows Azure.
1. Process the service model: determine resource requirements, create role images.
2. Allocate resources.
3. Prepare nodes: place role images on nodes, configure settings, start roles.
4. Configure load balancers.
5. Maintain service health: if a role fails, restart it based on policy; if a node fails, migrate its roles based on policy.
Storage
Durable Storage, At Massive Scale
Blobs: massive files, e.g. videos, logs. Drives: use standard file system APIs. Tables: non-relational, but with few scale limits (use SQL Azure for relational data). Queues: facilitate loosely coupled, reliable systems.
Blob Features and Functions
Store large objects (up to 1 TB in size); blobs can be served through the Windows Azure CDN service. Standard REST interface: PutBlob (inserts a new blob, overwrites an existing blob), GetBlob (gets a whole blob or a specific range), DeleteBlob, CopyBlob, SnapshotBlob, LeaseBlob.
Two Types of Blobs Under the Hood
Block blobs are targeted at streaming workloads: each blob consists of a sequence of blocks, each identified by a block ID; size limit 200 GB per blob (a block-upload sketch follows).
Page blobs are targeted at random read/write workloads: each blob consists of an array of pages, each identified by its offset from the start of the blob; size limit 1 TB per blob.
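A minimal sketch of uploading a block blob in pieces with the SDK 1.x StorageClient (chunk size and file names are illustrative): each block gets a base64 ID of uniform length, and PutBlockList commits them in order.

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure.StorageClient;

CloudBlockBlob blob = container.GetBlockBlobReference("video.mp4");
var blockIds = new List<string>();
int blockNum = 0;
using (FileStream file = File.OpenRead("video.mp4"))
{
    byte[] buffer = new byte[4 * 1024 * 1024];          // 4 MB blocks
    int read;
    while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        string id = Convert.ToBase64String(BitConverter.GetBytes(blockNum++));
        blob.PutBlock(id, new MemoryStream(buffer, 0, read), null);  // null = no MD5 check
        blockIds.Add(id);
    }
}
blob.PutBlockList(blockIds);                            // commits the blocks, in this order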
Windows Azure Drive
Provides a durable NTFS volume for Windows Azure applications: use existing NTFS APIs to access a durable drive, with durability and survival of data on application failover, enabling migration of existing NTFS applications to the cloud.
A Windows Azure Drive is a Page Blob. For example, mount the Page Blob http://<accountname>.blob.core.windows.net/<containername>/<blobname> as X:\. All writes to the drive are made durable to the Page Blob, the drive is made durable through standard Page Blob replication, and the Page Blob persists even when the drive is not mounted.
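A minimal sketch of the mount sequence, assuming the SDK 1.x CloudDrive API from the Microsoft.WindowsAzure.CloudDrive assembly (the blob path, sizes, and local cache resource are illustrative):

using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

CloudStorageAccount account = new CloudStorageAccount(AccountInformation.Credentials, false);

// Drives need a local read cache, carved out of the role's local storage.
LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

CloudDrive drive = account.CreateCloudDrive("drives/mydata.vhd");   // the backing page blob
drive.Create(512);                                      // size in MB; throws if it already exists
string root = drive.Mount(25, DriveMountOptions.None);  // 25 MB cache; returns e.g. "X:\"
File.WriteAllText(Path.Combine(root, "notes.txt"), "durable NTFS in the cloud");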
Windows Azure Tables
Provides structured storage: massively scalable tables with billions of entities (rows) and TBs of data, able to use thousands of servers as traffic grows. Highly available and durable: data is replicated several times. Familiar, easy-to-use API: WCF Data Services and OData, .NET classes and LINQ, or REST from any platform or language.
Windows Azure Queues
Queues are performance-efficient, highly available, and provide reliable message delivery. Simple, asynchronous work dispatch: the programming semantics ensure that a message is processed at least once. Access is provided via REST.
Storage Partitioning
Understanding partitioning is key to understanding performance. Every data object has a partition key, different for each data type (blobs, entities, queues), and it controls entity locality. The partition key is the unit of scale: a partition can be served by a single server, and the system load balances partitions based on traffic patterns. Load balancing can take a few minutes to kick in, and it can take a couple of seconds for a partition to become available on a different server, so use exponential backoff on "Server Busy". The system load balances to meet your traffic needs; "Server Busy" can also mean the limits of a single partition have been reached.
Partition Keys In Each Abstraction
Blobs: container name + blob name; every blob (and its snapshots) is its own partition. Example: container "image" with blobs "annarbor/bighouse.jpg" and "foxborough/gillette.jpg"; container "video" with blob "annarbor/bighouse.jpg".
Entities: table name + PartitionKey; entities with the same PartitionKey value are served from the same partition. Example, with PartitionKey = CustomerId and RowKey = RowKind:

PartitionKey | RowKey                | Name         | CreditCardNumber    | OrderTotal
1            | Customer-John Smith   | John Smith   | xxxx-xxxx-xxxx-xxxx |
1            | Order 1               |              |                     | $35.12
2            | Customer-Bill Johnson | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
2            | Order 3               |              |                     | $10.00

Messages: queue name; all messages in a single queue belong to the same partition. Example: queue "jobs" (Message1, Message2) and queue "workflow" (Message1).
Scalability Targets
Storage account: capacity up to 100 TB, transactions up to a few thousand requests per second, bandwidth up to a few hundred megabytes per second. A single blob partition supports throughput up to 60 MB/s; a single queue or table partition supports up to 500 transactions per second. To go above these numbers, partition between multiple storage accounts and partitions. When a limit is hit, the app will see "503 Server Busy"; applications should implement exponential backoff.
Partitions and Partition Ranges
An example Movies table, partitioned by Category; ranges of partitions can be served by different servers:

PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | Fast & Furious           | ...       | 2009
Action                  | The Bourne Ultimatum     | ...       | 2007
Animation               | Open Season 2            | ...       | 2009
Animation               | The Ant Bully            | ...       | 2006
Comedy                  | Office Space             | ...       | 1999
SciFi                   | X-Men Origins: Wolverine | ...       | 2009
War                     | Defiance                 | ...       | 2008
Key Selection: Things to Consider
Scalability: distribute load as much as possible; hot partitions can be load balanced; the PartitionKey is critical for scalability.
Query efficiency and speed: avoid frequent large scans; parallelize queries; point queries are most efficient.
Entity group transactions: transactions across a single partition give transaction semantics and reduce round trips.
See http://www.microsoftpdc.com/2009/svc09 and http://azurescope.cloudapp.net for more information.
Expect Continuation Tokens. Seriously!
A query returns a continuation token at a maximum of 1000 rows in a response, at the end of a partition range boundary, or after a maximum of 5 seconds of query execution.
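With the SDK 1.x StorageClient, wrapping a LINQ query in AsTableServiceQuery() gives a CloudTableQuery<T> whose enumeration transparently follows continuation tokens across those boundaries; a sketch with a hypothetical MovieEntity type:

var query = (from m in context.CreateQuery<MovieEntity>("Movies")
             where m.PartitionKey == "Action"
             select m).AsTableServiceQuery();

// Enumeration lazily issues follow-up requests whenever the service
// returns a continuation token, so large range scans just work.
foreach (MovieEntity movie in query.Execute())
    Console.WriteLine(movie.RowKey);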
Tables Recap
Select a PartitionKey and RowKey that help scale: efficient for the frequently used queries, supports batch transactions, distributes load.
Avoid append-only patterns: distribute writes by using a hash or similar as a key prefix.
Always handle continuation tokens, and expect them for range queries.
OR predicates are not optimized: execute the queries that form the OR predicates as separate queries.
Implement a back-off strategy for retries on "Server Busy": either the system is load balancing partitions to meet traffic needs, or the load on a single partition has exceeded the limits.
WCF Data Services: use a new context for each logical operation; AddObject/AttachTo can throw an exception if the entity is already being tracked; a point query throws an exception if the resource does not exist, so use IgnoreResourceNotFoundException (see the sketch below).
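A hedged sketch of that last point, using the attendee types that appear later in this deck (context is the TableServiceContext; System.Linq is assumed):

// With IgnoreResourceNotFoundException set on the DataServiceContext, a point
// query that misses yields an empty result instead of throwing on the 404.
context.IgnoreResourceNotFoundException = true;
AttendeeEntity hit = (from a in context.CreateQuery<AttendeeEntity>("Attendees")
                      where a.PartitionKey == "SummerSchool" && a.RowKey == email
                      select a).AsTableServiceQuery().Execute().FirstOrDefault();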
Queues: Their Unique Role in Building Reliable, Scalable Applications
You want roles that work closely together but are not bound together; tight coupling leads to brittleness, while decoupling aids scaling and performance. A queue can hold an unlimited number of messages; each message must be serializable as XML and is limited to 8 KB in size. We commonly use the work-ticket pattern, where the message points at the real work item in blob or table storage. Why not simply use a table? Queues add the dispatch and reliable-delivery semantics (visibility timeouts, at-least-once processing) that tables lack.
Queue Terminology
Message Lifecycle
A web role enqueues a message with PutMessage:

POST http://myaccount.queue.core.windows.net/myqueue/messages

A worker role retrieves it with GetMessage, which returns the message with a pop receipt and makes it invisible to other consumers until a timeout:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 09 Dec 2008 21:04:30 GMT
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dG...dGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

After processing, the worker deletes the message with RemoveMessage; if the worker fails first, the visibility timeout expires and the message reappears in the queue:

DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw
Truncated Exponential Backoff Polling
Consider a backoff polling approach: each empty poll doubles the polling interval (up to a cap), and a successful poll resets the interval to its minimum. A sketch follows.
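A minimal sketch against a CloudQueue (the cap, base interval, and handler are illustrative):

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

TimeSpan interval = TimeSpan.FromSeconds(1);
TimeSpan maxInterval = TimeSpan.FromMinutes(2);         // truncation point

while (true)
{
    CloudQueueMessage msg = queue.GetMessage();
    if (msg != null)
    {
        ProcessMessage(msg);                            // hypothetical handler
        queue.DeleteMessage(msg);
        interval = TimeSpan.FromSeconds(1);             // success: reset to the base interval
    }
    else
    {
        Thread.Sleep(interval);                         // empty poll: wait, then double up to the cap
        interval = TimeSpan.FromTicks(Math.Min(interval.Ticks * 2, maxInterval.Ticks));
    }
}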
Removing Poison Messages
A producer/consumer timeline showing how the visibility timeout and dequeue count interact (producers P1, P2; consumers C1, C2):
1. C1: GetMessage(Q, 30 s) returns msg 1.
2. C2: GetMessage(Q, 30 s) returns msg 2.
3. C2 consumes msg 2.
4. C2: DeleteMessage(Q, msg 2).
5. C1 crashes.
6. msg 1 becomes visible again 30 s after its dequeue.
7. C2: GetMessage(Q, 30 s) returns msg 1.
8. C2 crashes.
9. msg 1 becomes visible again 30 s after its dequeue.
10. C1 restarts.
11. C1: GetMessage(Q, 30 s) returns msg 1.
12. msg 1's dequeue count is now greater than 2: it is treated as a poison message.
13. DeleteMessage(Q, msg 1).
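A minimal sketch of the consumer side, assuming a separate queue for parking poison messages (names and the threshold are illustrative):

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));  // 30 s visibility timeout
if (msg != null)
{
    if (msg.DequeueCount > 2)
    {
        // Dequeued too many times without being deleted: assume it poisons its consumers.
        poisonQueue.AddMessage(new CloudQueueMessage(msg.AsString));  // park it for inspection
        queue.DeleteMessage(msg);
    }
    else
    {
        Process(msg);               // hypothetical work
        queue.DeleteMessage(msg);   // delete only after successful processing
    }
}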
Queues Recap
Make message processing idempotent, so repeated delivery after failures is harmless.
Do not rely on order: invisible messages result in out-of-order delivery.
Use the dequeue count to remove poison messages; enforce a threshold on a message's dequeue count.
For messages over 8 KB, use a blob to store the message data with a reference in the message, and garbage collect orphaned blobs; batch messages where possible.
Use the message count to scale, dynamically increasing or reducing workers.
Windows Azure Storage Takeaways: Blobs, Drives, Tables, Queues.
More at http://blogs.msdn.com/windowsazurestorage/ and http://azurescope.cloudapp.net
A Quick Exercise. Then let's look at some code and some tools.
Code: AccountInformation.cs

public class AccountInformation
{
    private static string storageKey = "thisIsNotMyKey";
    private static string accountName = "jjstore";
    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}
Code: BlobHelper.cs

public class BlobHelper
{
    private static string defaultContainerName = "school";
    private CloudBlobClient client = null;
    private CloudBlobContainer container = null;

    private void InitContainer()
    {
        if (client == null)
            client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
        container = client.GetContainerReference(defaultContainerName);
        container.CreateIfNotExist();

        BlobContainerPermissions permissions = container.GetPermissions();
        permissions.PublicAccess = BlobContainerPublicAccessType.Container;
        container.SetPermissions(permissions);
    }
}
Code: BlobHelper.cs

public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();

    FileInfo file = new FileInfo(filePath);
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.Properties.ContentType = GetContentType(file.Extension);
    blob.UploadFile(file.FullName);
}

// Or if you want to write a string, replace the last line with:
//     blob.UploadText(someString);
// and make sure you set the content type to the appropriate MIME type (e.g. "text/plain").
Code: BlobHelper.cs

public string GetBlobText(string blobName)
{
    if (client == null || container == null)
        InitContainer();

    CloudBlob blob = container.GetBlobReference(blobName);
    try
    {
        return blob.DownloadText();
    }
    catch (Exception)
    {
        // The blob probably does not exist or there is no connection available
        return null;
    }
}
Application Code: Blobs

private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
    StringBuilder buff = new StringBuilder();
    buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
    foreach (AttendeeEntity attendee in attendees)
    {
        buff.AppendLine(attendee.ToCsvString());
    }
    blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at http://<accountname>.blob.core.windows.net/<containername>/<blobname>, or in this case: http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt
Code: TableEntities

using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }
}
Code: TableEntities

public void UpdateFrom(AttendeeEntity other)
{
    FirstName = other.FirstName;
    LastName = other.LastName;
    Email = other.Email;
    Birthday = other.Birthday;
    FavoriteIceCream = other.FavoriteIceCream;
    YearsInPhD = other.YearsInPhD;
    Graduated = other.Graduated;
    UpdateKeys();
}

public void UpdateKeys()
{
    PartitionKey = "SummerSchool";
    RowKey = Email;
}
Code: TableHelper.cs

public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}
Code: TableHelper.cs

private void ReadAllAttendees()
{
    allAttendees = new Dictionary<string, AttendeeEntity>();
    CloudTableQuery<AttendeeEntity> query =
        Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
    try
    {
        foreach (AttendeeEntity attendee in query)
        {
            allAttendees[attendee.Email] = attendee;
        }
    }
    catch (Exception)
    {
        // No entries in table - or other exception
    }
}
Code: TableHelper.cs

public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;

    AttendeeEntity attendee = allAttendees[email];

    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();

    // Delete from the memory cache
    allAttendees.Remove(email);
}
Code: TableHelper.cs

public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory. This is one of many design patterns for working with tables.
Pseudo Code: TableHelper.cs

public void UpdateAttendees(List<AttendeeEntity> updatedAttendees)
{
    foreach (AttendeeEntity attendee in updatedAttendees)
    {
        UpdateAttendee(attendee, false);
    }
    Context.SaveChanges(SaveChangesOptions.Batch);
}

public void UpdateAttendee(AttendeeEntity attendee)
{
    UpdateAttendee(attendee, true);
}

private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges)
{
    if (allAttendees.ContainsKey(attendee.Email))
    {
        AttendeeEntity existingAttendee = allAttendees[attendee.Email];
        existingAttendee.UpdateFrom(attendee);
        Context.UpdateObject(existingAttendee);
    }
    else
    {
        Context.AddObject(tableName, attendee);
    }
    if (saveChanges)
        Context.SaveChanges();
}
Application Code: Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.
Tools: Fiddler2
Best Practices
Picking the Right VM Size
Having the correct VM size can make a big difference in costs. The fundamental choice: fewer, larger VMs vs. many smaller instances. If you scale better than linearly across cores, larger VMs could save you money, but it is pretty rare to see linear scaling across 8 cores. More instances may provide better uptime and reliability (more failures are needed to take your service down). The only real right answer: experiment with multiple sizes and instance counts to measure and find what is ideal for you.
Using Your VM to the Maximum
Remember: one role instance == one VM running Windows, and one role instance != one specific task for your code. You're paying for the entire VM, so why not use it? A common mistake is splitting code into multiple roles, each barely using its CPU. Balance using up the CPU against keeping free capacity for times of need. There are multiple ways to use your CPU to the fullest.
Exploiting Concurrency
Spin up additional processes, each with a specific task or as a unit of concurrency; this may not be ideal if the number of active processes exceeds the number of cores. Use multithreading aggressively: in networking code, correct usage of NT IO completion ports lets the kernel schedule the precise number of threads; in .NET 4, use the Task Parallel Library for both data parallelism and task parallelism, as sketched below.
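A minimal sketch of both flavors with the .NET 4 Task Parallel Library (the work items and methods are illustrative):

using System.Threading.Tasks;

// Data parallelism: spread independent items across all cores of the instance.
Parallel.ForEach(workItems, item => Process(item));

// Task parallelism: run distinct operations concurrently, then join.
Task download = Task.Factory.StartNew(() => FetchFromStorage());
Task compute  = Task.Factory.StartNew(() => CrunchNumbers());
Task.WaitAll(download, compute);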
Finding Good Code Neighbors
Typically code falls into one or more of these categories: memory intensive, CPU intensive, network IO intensive, storage IO intensive. Find code that is intensive in different resources to live together. Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage IO-intensive code.
Scaling Appropriately
Monitor your application and make sure you're scaled appropriately (not over-scaled). Spinning VMs up and down automatically is good at large scale, but remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running, and being too aggressive in spinning down VMs can result in a poor user experience. It is a trade-off between the risk of failure or poor user experience from lacking excess capacity, and the cost of idling VMs: performance vs. cost.
Storage Costs
Understand an application's storage profile and how storage billing works, and make service choices based on your app's profile. E.g., SQL Azure has a flat fee while Windows Azure Tables charges per transaction, so the service choice can make a big cost difference depending on the profile. Caching and compressing help a lot with storage costs.
Saving Bandwidth Costs
Bandwidth costs are a huge part of any popular web app's bill, and saving bandwidth often leads to savings in other places: sending fewer things over the wire often means getting fewer things from storage, and it means your VM has time for other tasks. All of these tips have the side benefit of improving your web app's performance and user experience.
Compressing Content
1. Gzip all output content: all modern browsers can decompress on the fly, and compared to Compress, Gzip has much better compression and freedom from patented algorithms.
2. Trade compute costs for storage size.
3. Minimize image sizes: use Portable Network Graphics (PNGs), crush your PNGs, strip needless metadata, and make all PNGs palette PNGs.
The pipeline: uncompressed content goes through Gzip, JavaScript minification, CSS minification, and image minification to become compressed content. A sketch of the Gzip step in .NET follows.
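A minimal sketch of Gzip in .NET (in an ASP.NET role you would more likely enable IIS compression or wrap Response.Filter; this just shows the trade of compute for bytes):

using System.IO;
using System.IO.Compression;

static byte[] GzipBytes(byte[] uncompressed)
{
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
            gzip.Write(uncompressed, 0, uncompressed.Length);   // GZipStream flushes on Dispose
        return output.ToArray();
    }
}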
Best Practices Summary: doing less is the key to saving costs; measure everything; know your application profile in and out.
Research Examples in the Cloud (on another set of slides)
MapReduce on Azure
Elastic MapReduce on Amazon Web Services has traditionally been the only option for MapReduce jobs in the cloud. It is a Hadoop implementation; Hadoop has a long history, has been hardened for stability, and was originally designed for cluster systems. Microsoft Research this week is announcing a project code-named Daytona for MapReduce jobs on Azure: designed from the start to use cloud primitives, with built-in fault tolerance and a REST-based interface for writing your own clients.
Project Daytona: MapReduce on Azure. http://research.microsoft.com/en-us/projects/azure/daytona.aspx
Questions and Discussion. Thank you for hosting me at the Summer School!
BLAST (Basic Local Alignment Search Tool)
The most important software in bioinformatics: it identifies similarity between bio-sequences. Computationally intensive: a large number of pairwise alignment operations; a single BLAST run can take 700 to 1000 CPU hours. Sequence databases are growing exponentially: GenBank doubled in size in about 15 months.
It is easy to parallelize BLAST
Segment the input: segment processing (querying) is pleasingly parallel. Or segment the database (e.g., mpiBLAST), which needs special result-reduction processing. Either way there is a large volume of data: a normal BLAST database can be as large as 10 GB, so with 100 nodes the peak storage traffic could reach 1 TB, and the output of BLAST is usually 10-100x larger than the input.
Parallel BLAST engine on Azure
Uses the query-segmentation data-parallel pattern: split the input sequences, query the partitions in parallel, and merge the results together when done. Follows the generally suggested application model (Web Role + Queue + Worker) with special considerations: batch job management, and task parallelism on an elastic cloud.
Wei Lu, Jared Jackson, and Roger Barga, "AzureBlast: A Case Study of Developing Science Applications on the Cloud," in Proceedings of the 1st Workshop on Scientific Cloud Computing (Science Cloud 2010), ACM, 21 June 2010.
A simple Split/Join pattern: a splitting task fans out BLAST tasks, and a merging task joins the results.
Leverage the multiple cores of each instance: the -a (threads) argument of NCBI-BLAST is set to 1, 2, 4, or 8 for the small, medium, large, and extra-large instance sizes.
Task granularity: large partitions cause load imbalance, while small partitions add unnecessary overheads (NCBI-BLAST startup overhead, data-transfer overhead). Best practice: use test runs for profiling, and set the partition size to mitigate the overhead.
The visibilityTimeout value for each BLAST task is essentially an estimate of the task run time: too small causes repeated computation; too large causes an unnecessarily long wait in case of instance failure. Best practice: estimate the value based on the number of pair-bases in the partition plus test runs, and watch out for the 2-hour maximum limitation.
Task size vs. performance: thanks to the warm-cache effect, 100 sequences per partition is the best choice.
Instance size vs. performance: super-linear speedup with larger worker instances, primarily due to memory capacity.
Task size and instance size vs. cost: the extra-large instance generated the best and most economical throughput by fully utilizing the resource.
[Architecture diagram: a Web Role (web portal and web service) handles job registration; a Job Management Role runs the scaling engine, job scheduler, and job registry (an Azure Table); a global dispatch queue feeds Worker instances that run the splitting task, BLAST tasks, and merging task; the NCBI and BLAST databases, temporary data, and a database-updating role use Azure Blob storage.]
The job portal is an ASP.NET program hosted by a web role instance: submit jobs, track each job's status and logs, with authentication/authorization based on Live ID. An accepted job is stored into the job registry table for fault tolerance, avoiding in-memory state.
Case study: R. palustris as a platform for H2 production (Eric Schadt, SAGE; Sam Phattarasukol, Harwood Lab, UW). Blasted ~5,000 proteins (700K sequences): against all NCBI non-redundant proteins, completed in 30 min; against ~5,000 proteins from another strain, completed in less than 30 sec. AzureBLAST significantly saved computing time.
Discovering Homologs
Discover the interrelationships of known protein sequences with an all-against-all query: the database is also the input query. The protein database is large (4.2 GB), with 9,865,668 sequences to be queried in total: theoretically, 100 billion sequence comparisons! Performance estimation, based on a sampling run on one extra-large Azure instance: it would require 3,216,731 minutes (6.1 years) on one desktop. This is one of the biggest BLAST jobs we know of; experiments at this scale are usually infeasible for most scientists.
We allocated a total of ~4000 instances: 475 extra-large VMs (8 cores per VM) across four datacenters (two in the US, plus Western and Northern Europe), running 8 deployments of AzureBLAST, each with its own co-located storage service. The 10 million sequences were divided into multiple segments, each submitted to one deployment as one job for execution; each segment consists of smaller partitions. When load imbalances appeared, we redistributed the load manually.
The total size of the output result is ~230 GB, with 1,764,579,487 total hits. The run started on March 25th and the last task completed on April 8th (10 days of compute), but by our estimates the real working instance time was 6-8 days. We looked into the log data to analyze what took place.
A normal log record looks like:
3/31/2010 6:14 RD00155D3611B0 Executing the task 251523...
3/31/2010 6:25 RD00155D3611B0 Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25 RD00155D3611B0 Executing the task 251553...
3/31/2010 6:44 RD00155D3611B0 Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44 RD00155D3611B0 Executing the task 251600...
3/31/2010 7:02 RD00155D3611B0 Execution of task 251600 is done, it took 17.27 mins
Otherwise, something is wrong (e.g., a task failed to complete):
3/31/2010 8:22 RD00155D3611B0 Executing the task 251774...
3/31/2010 9:50 RD00155D3611B0 Executing the task 251895...
3/31/2010 11:12 RD00155D3611B0 Execution of task 251895 is done, it took 82 mins
North Europe datacenter, 34,256 tasks processed in total: all 62 compute nodes lost tasks and then came back in groups of ~6 nodes over ~30 minutes. This looks like an update domain rolling through.
West Europe datacenter: 30,976 tasks were completed before the job was killed. 35 nodes experienced blob-writing failures at the same time; a reasonable guess is that a fault domain was at work.
MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)
Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and by transpiration, or evaporation through plant membranes, by plants. The Penman-Monteith (1964) equation:

$$ET = \frac{\Delta R_n + \rho_a c_p \,\delta q\, g_a}{\left(\Delta + \gamma\left(1 + \frac{g_a}{g_s}\right)\right)\lambda_v}$$

where ET = water volume evapotranspired (m^3 s^-1 m^-2), Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1), λ_v = latent heat of vaporization (J/g), R_n = net radiation (W m^-2), c_p = specific heat capacity of air (J kg^-1 K^-1), ρ_a = dry air density (kg m^-3), δq = vapor pressure deficit (Pa), g_a = conductivity of air, the inverse of r_a (m s^-1), g_s = conductivity of plant stoma to air, the inverse of r_s (m s^-1), and γ = psychrometric constant (γ ≈ 66 Pa K^-1).

Lots of inputs: a big data reduction. And some of the inputs are not so simple: estimating resistance/conductivity across a catchment can be tricky.
Input data volumes (20 US years = 1 global year):
Climate classification: ~1 MB (1 file)
FLUXNET curated sensor dataset: 30 GB (960 files)
Vegetative clumping: ~5 MB (1 file)
NCEP/NCAR: ~100 MB (4K files)
NASA MODIS imagery source archives: 5 TB (600K files)
FLUXNET curated field dataset: 2 KB (1 file)
The MODISAzure pipeline (http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx):
Data collection (map) stage: downloads requested input tiles from NASA ftp sites; includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile.
Reprojection (map) stage: converts source tile(s) to intermediate-result sinusoidal tiles using simple nearest-neighbor or spline algorithms.
Derivation reduction stage: the first stage visible to the scientist; computes ET in our initial use.
Analysis reduction stage: an optional second stage visible to the scientist; enables production of science analysis artifacts such as maps, tables, and virtual sensors.
The stages are connected by queues (download queue, reprojection queue, reduction #1 queue, reduction #2 queue) and driven by requests from the AzureMODIS service web role portal; scientists download the science results at the end.
The MODISAzure service is the web role front door: it receives all user requests and queues each one to the appropriate download, reprojection, or reduction job queue. The Service Monitor is a dedicated worker role: it parses all job requests into tasks (recoverable units of work) and persists the execution status of all jobs and tasks in <PipelineStage>JobStatus and <PipelineStage>TaskStatus tables, dispatching work onto the <PipelineStage>Task queues.
All work is actually done by a GenericWorker worker role: it dequeues tasks created by the Service Monitor, retries failed tasks 3 times, maintains all task status, sandboxes the science or other executable, and marshals all storage from/to Azure blob storage to/from local files on the Azure worker instance.
For reprojection, each entity in the ReprojectionJobStatus table specifies a single reprojection job request, and each entity in ReprojectionTaskStatus specifies a single reprojection task (i.e. a single tile). The ScanTimeList table is queried for the list of satellite scan times that cover a target tile, and SwathGranuleMeta for the geo-metadata (e.g. boundaries) of each swath tile; the swath source data itself lives in blob storage.
Total cost: $1420. Computational costs were driven by data scale and the need to run reductions multiple times; storage costs were driven by data scale and the 6-month project duration. Small with respect to the people costs, even at graduate student rates!

Stage                | Cost                                | Data                              | Compute
Data collection      | $50 upload + $450 storage           | 400-500 GB, 60K files, 10 MB/sec  | 11 hours, <10 workers
Reprojection         | $420 cpu + $60 download             | 400 GB, 45K files                 | 3500 hours, 20-100 workers
Derivation reduction | $216 cpu + $1 download + $6 storage | 5-7 GB, 5.5K files                | 1800 hours, 20-100 workers
Analysis reduction   | $216 cpu + $2 download + $9 storage | <10 GB, ~1K files                 | 1800 hours, 20-100 workers
Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems. Equally important, they can increase participation in research, providing needed resources to users and communities without ready access. Clouds are suitable for loosely coupled, data-parallel applications and can support many interesting programming patterns, but tightly coupled, low-latency applications do not perform optimally on clouds today. They provide valuable fault-tolerance and scalability abstractions, and they act as an amplifier for familiar client tools and on-premise compute. Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers.