The data explosion is transforming science

Talk Outline
- The data tsunami and the 4th paradigm of science
- The challenges for the long tail of science
- Where is the cloud being used now?
  - The app marketplace
  - SMEs
  - Analytics as a service
- What are the apps and services for science?
- The Research as a Service business model

The data explosion is transforming science
- Every area of science is now engaged in data-intensive research
- Researchers need:
  - Technology to publish and share data in the cloud
  - Data analytics tools to explore massive data collections
  - A sustainable economic model for scientific analysis, collaboration and data curation

The Long Tail of Science
- Collectively, long tail science is generating a lot of data
  - Estimated at over 1 PB per year, and it is growing fast
- Many funding agencies now require, or soon will require, that all data be made public
- US universities are struggling with this new load
  - Data must be preserved
  - Data must be sharable, searchable, and analyzable

How Do We Help the Long Tail of Science?
- Let Scientists Be Scientists, but they need access to big data and analysis tools

What is the role of the cloud?
- Look at where the cloud is having its most revolutionary impact
- Three key areas:

1. Enabling SMEs to create new business
   - For Microsoft, over 30,000 customers, many of them SMEs
   - Same for Amazon, Google and others
   - Expect new public clouds in Europe and Asia to do the same
2. The mobile-app-to-cloud explosion
   - Mapping, data storage, communication, etc.
   - From our FP7 Venus-C project
3. Analytics as a Service
   - An important emerging standard: the Hadoop ecosystem
   - http://datameer2.datameer.com/blog
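The Hadoop ecosystem named above is built around the MapReduce programming model. As a rough illustration only, the sketch below imitates the three stages (map, shuffle, reduce) in plain Python on a word-count task. It does not use any Hadoop API; the function names and the sample input are hypothetical, and a real cluster would distribute each stage across many machines.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run the three stages in sequence on an in-memory list of strings."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["cloud data cloud", "data analytics"]
    print(word_count(sample))
```

The appeal for analytics as a service is that the user supplies only the map and reduce logic, while the platform handles partitioning and grouping.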

Engaging the broader scientific community

Cloud Science Stack
The challenge: design a platform for scientific data management and analysis that
- is open and extensible
- provides an economic sustainability model for data preservation and use
- is easily accessed by simple desktop/web analysis apps
- encourages scientific collaboration
- leverages the capabilities of public clouds and on-campus resources
Can we build a demonstration project to test the feasibility of this?
- Build it using the tools the community wants and uses

Cloud apps for science
Microsoft Research has released a number of cloud apps for scientific communities:
- FetchClimate
- ChronoZoom: an infinite canvas in time

Supporting community data collections
- Dataverse from Harvard
- Others like DataOne for environmental science
- Inter-university Consortium for Political and Social Research (ICPSR): subscription business model, working now for 50 years

For the Spreadsheet Users: One Simple, Powerful Idea: DataUp
- Most scientists do data collection and analysis using spreadsheets
  - How do they share them? Preserve them? Generate metadata to store them?
- DataUp is an open source Excel plug-in (or web tool) that helps researchers document, manage, and archive their tabular data
- DataUp operates within the scientist's workflow and integrates with Microsoft Excel
- It guides the user through basic metadata generation and then upload to the cloud
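To make "basic metadata generation" for tabular data concrete, here is a minimal sketch of what such a step could look like. It is not DataUp's implementation: the function names, the metadata fields (row count, column type, missing-value count) and the sample table are all assumptions chosen for illustration, using only the Python standard library.

```python
import csv
import io


def infer_type(values):
    """Return 'numeric' if every non-empty value parses as a float, else 'text'."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for v in non_empty:
        try:
            float(v)
        except ValueError:
            return "text"
    return "numeric"


def basic_metadata(csv_text):
    """Summarise a CSV table: row count plus name, type and gaps per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {
        "row_count": len(rows),
        "columns": [
            {
                "name": name,
                # Short rows yield None for missing cells; treat that as empty.
                "type": infer_type([(row[name] or "") for row in rows]),
                "missing": sum(1 for row in rows if not (row[name] or "").strip()),
            }
            for name in columns
        ],
    }


if __name__ == "__main__":
    # Hypothetical field observations, for illustration only.
    table = "site,temp_c,notes\nA,12.5,clear\nB,,windy\nC,14.1,\n"
    print(basic_metadata(table))
```

A summary like this could be stored next to the uploaded file so that the table stays searchable and interpretable after the spreadsheet's author has moved on.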

IPython Notebook on Windows Azure
- Browser/OS independent
- Math rendering
- Azure ready
- Inline graphics & video
- Notebooks stored in Azure Blobs
- Interactive parallel computing & MPI
- Engine on Azure via Windows or Linux VM
- Support for R, LaTeX, PHP, ...
- Collaborate with colleagues

Big Data Analytics in Medicine
Hospital Readmissions (from Eric Horvitz of MSR)
- 20% of patients were rehospitalized within 30 days of their discharge from hospital, and 35% were rehospitalized within 90 days
- Study of a large multi-year data set of hospitalizations
- Machine learning produced a predictive model that can accurately predict the likelihood of a readmission given patient data
The Genetic Causes of Disease (David Heckerman)
- Used data from the Wellcome Trust for a GWAS of a large population, looking for causes of seven common diseases (bipolar disorder, rheumatoid arthritis, coronary artery disease, hypertension, ...)
- Confounding is a problem; a new algorithm was needed
- Ran on the Azure cloud using 35,000 cores in 3 weeks
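To give a sense of scale for the GWAS run, the figures on the slide (35,000 cores, 3 weeks) can be turned into core-hours. The sketch below assumes every core was busy for the full three weeks, which the slide does not state, so the result is an upper bound rather than a measured figure. The function names are illustrative.

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7
HOURS_PER_YEAR = 365 * HOURS_PER_DAY


def core_hours(cores, weeks):
    """Upper-bound core-hours, assuming all cores run for the whole period."""
    return cores * weeks * DAYS_PER_WEEK * HOURS_PER_DAY


def single_core_years(total_core_hours):
    """Years one core would need to deliver the same number of core-hours."""
    return total_core_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    total = core_hours(cores=35_000, weeks=3)
    print(f"{total:,} core-hours")                      # 17,640,000 core-hours
    print(f"{single_core_years(total):,.0f} core-years")
```

Under that assumption the job amounts to roughly two thousand years of single-core computing, which is why an elastic public cloud was a practical way to run it.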

First Target: Bioinformatics in the Cloud
Protein Folding
- The University of Washington is studying the ways proteins from salmonella virus inject DNA into cells. Used 2000 concurrent cores.
Joint Genetic and Neuroimaging Analysis
- France's premier research institute INRIA is using 1000 cores of Azure to study large cohorts of subjects to understand links between genetic patterns and brain anomalies.
Porting Bioinformatics to the Cloud
- Researchers at the Universidad Politécnica de Valencia have ported a powerful collection of bioinformatics tools to the Azure cloud for gene mutation research. We are adding large data collections to this.
Drug Discovery
- Researchers at Newcastle University in the U.K. are using Azure to model the properties (toxicity, solubility, biological activity) of molecules for potential use as drugs.
Systems Biology
- The University of Trento Centre for Computational and Systems Biology has developed an Azure-based tool, BetaSIM, for modeling and simulating biological systems.

Next Steps Bringing Communities Together

Concluding Remarks
- Long tail science will need to be self-sustaining
  - Well-curated research data has real value
  - If high-quality analysis tools are also there, subscriptions will support it
- Next goal: build a Research Marketplace
  - Advanced data analytics and machine learning libraries
  - Curated data collections (via dataverse or duracloud)
  - Data upload, curation and visualization tool (CDL project)
  - A support platform for research challenge projects such as machine learning and medical image analysis
  - Exploit Azure Marketplace to provide limited free and paid access

2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.