Talk Outline
The data tsunami and the 4th paradigm of science
The challenges for the long tail of science
Where is the cloud being used now?
  The app marketplace
  SMEs
  Analytics as a service
What are the apps and services for science?
The Research as a Service business model
The data explosion is transforming science
Every area of science is now engaged in data-intensive research
Researchers need
  Technology to publish and share data in the cloud
  Data analytics tools to explore massive data collections
  A sustainable economic model for scientific analysis, collaboration and data curation
The Long Tail of Science
Collectively, long tail science is generating a lot of data
  Estimated at over 1 PB per year, and growing fast
Many funding agencies now require, or soon will require, that all data be made public
US universities are struggling with this new load
  Data must be preserved
  Data must be sharable, searchable, and analyzable
How Do We Help The Long Tail of Science?
Let scientists be scientists, but they need access to big data and analysis tools
What is the role of the cloud?
Look at where the cloud is having its most revolutionary impact.
Three key areas:
1. Enabling SMEs to create new businesses
  For Microsoft, over 30,000 customers, many of them SMEs
  The same holds for Amazon, Google and others
  Expect new public clouds in Europe and Asia to do the same
2. The mobile app to cloud explosion
  Mapping, data storage, communication, etc.
  From our FP7 Venus-C project
3. Analytics as a Service
  An important emerging standard: the Hadoop ecosystem
  http://datameer2.datameer.com/blog
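As a rough illustration of the MapReduce programming model at the core of the Hadoop ecosystem, here is a minimal single-process sketch in Python. It does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) pairs for one input record (a line of text)."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the talk.
    sample = ["big data in the cloud", "data analytics in the cloud"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data between them; the sketch only shows the shape of the computation.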
Engaging the broader scientific community
Cloud Science Stack
The challenge: design a platform for scientific data management and analysis that
  Is open and extensible
  Provides an economic sustainability model for data preservation and use
  Is easily accessed by simple desktop/web analysis apps
  Encourages scientific collaboration
  Leverages the capabilities of public clouds and on-campus resources
Can we build a demonstration project to test the feasibility of this?
  Build it using the tools the community wants and uses
Cloud apps for science
Microsoft Research has released a number of cloud apps for scientific communities
  FetchClimate
  ChronoZoom: an infinite canvas in time
Supporting community data collections
Dataverse from Harvard
Others like DataONE for environmental science
Inter-university Consortium for Political and Social Research (ICPSR)
  The ICPSR subscription business model has been working for 50 years
For the Spreadsheet Users: One Simple, Powerful Idea: DataUp
Most scientists do data collection and analysis using spreadsheets
  How do they share them? Preserve them? Generate metadata to store them?
DataUp is an open source Excel plug-in (or web tool) that helps researchers document, manage, and archive their tabular data
DataUp operates within the scientist's workflow and integrates with Microsoft Excel
Guides the user through basic metadata generation, then upload to the cloud
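To make "basic metadata generation" for tabular data concrete, here is a minimal sketch in Python that derives per-column metadata from a CSV export of a spreadsheet. This is not DataUp's implementation or API; the function names (`infer_type`, `describe_table`), the metadata fields, and the sample table are assumptions made for illustration only.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty cell values."""
    present = [v for v in values if v.strip() != ""]
    if not present:
        return "empty"
    # Try the strictest type first, then fall back.
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for v in present:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "text"


def describe_table(csv_text):
    """Return basic metadata for a CSV table that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = []
    for name in reader.fieldnames or []:
        # Short rows yield None for missing cells; treat them as empty.
        values = [(row[name] or "") for row in rows]
        columns.append({
            "name": name,
            "type": infer_type(values),
            "missing": sum(1 for v in values if v.strip() == ""),
        })
    return {"row_count": len(rows), "columns": columns}


if __name__ == "__main__":
    # Hypothetical sample table, not data from the talk.
    sample = "site,depth_m,temp_c\nA,10,4.5\nB,20,\n"
    print(describe_table(sample))
```

A real tool would also ask the researcher for what cannot be inferred from the cells, such as units, collection method, and contact details, before uploading the file and its metadata to a repository.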
IPython Notebook on Windows Azure
Browser/OS independent
Math rendering
Azure ready
Inline graphics & video
Notebooks stored in Azure Blobs
Interactive parallel computing & MPI
Engine on Azure via Windows or Linux VM
Support for R, LaTeX, PHP
Collaborate with colleagues
Big Data Analytics in Medicine
Hospital Readmissions (from Eric Horvitz of MSR)
  20% of patients were rehospitalized within 30 days of discharge, and 35% within 90 days
  Study of a large multi-year data set of hospitalizations
  Machine learning produced a model that can accurately predict the likelihood of a readmission given patient data
The Genetic Causes of Disease (David Heckerman)
  Used data from the Wellcome Trust for a GWAS on a large population, looking for causes of seven common diseases (bipolar disorder, rheumatoid arthritis, coronary artery disease, hypertension, ...)
  Confounding is a problem; a new algorithm was needed
  Ran on the Azure cloud using 35,000 cores in 3 weeks
First Target: Bioinformatics in the Cloud
Protein Folding
  The University of Washington is studying the ways proteins from salmonella virus inject DNA into cells. Used 2000 concurrent cores.
Joint Genetic and Neuroimaging Analysis
  France's premier research institute, INRIA, is using 1000 cores of Azure to study large cohorts of subjects to understand links between genetic patterns and brain anomalies.
Porting Bioinformatics to the Cloud
  Researchers at the Universidad Politécnica de Valencia have ported a powerful collection of bioinformatics tools to the Azure cloud for gene mutation research. We are adding large data collections to this.
Drug Discovery
  Researchers at Newcastle University in the U.K. are using Azure to model the properties (toxicity, solubility, biological activity) of molecules for potential use as drugs.
Systems Biology
  The University of Trento Centre for Computational and Systems Biology has developed an Azure-based tool, BetaSIM, for modeling and simulating biological systems.
Next Steps Bringing Communities Together
Concluding Remarks
Long tail science will need to be self-sustaining
  Well-curated research data has real value
  If high-quality analysis tools are also there, subscriptions will support it
Next Goal: Build a Research Marketplace
  Advanced data analytics and machine learning libraries
  Curated data collections (via Dataverse or DuraCloud)
  Data upload, curation and visualization tool (CDL project)
  A support platform for research challenge projects such as machine learning and medical image analysis
  Exploit the Azure Marketplace to provide limited free and paid access
© 2010 Microsoft Corporation. All rights reserved.