DTI Image Processing Pipeline and Cloud Computing Environment

Similar documents
Large-scale Research Data Management and Analysis Using Globus Services. Ravi Madduri Argonne National Lab University of

Practical Solutions for Big Data Analytics

The Globus Galaxies Platform: Delivering Science Gateways as a Service

How To Write A Blog Post On Globus

Globus Genomics Tutorial GlobusWorld 2014

Cornell University Center for Advanced Computing

Cornell University Center for Advanced Computing A Sustainable Business Model for Advanced Research Computing

Deploying Business Virtual Appliances on Open Source Cloud Computing

Cornell University Center for Advanced Computing

GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid

Workflow Tools at NERSC. Debbie Bard NERSC Data and Analytics Services

Planning, Provisioning and Deploying Enterprise Clouds with Oracle Enterprise Manager 12c Kevin Patterson, Principal Sales Consultant, Enterprise

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons

Grid Computing vs Cloud

Cloud Computing and E-Commerce

Efficient Cloud Management for Parallel Data Processing In Private Cloud

INTRODUCTION TO CLOUD COMPUTING CEN483 PARALLEL AND DISTRIBUTED SYSTEMS

Introduction to Arvados. A Curoverse White Paper

Scale Cloud Across the Enterprise

How to Do/Evaluate Cloud Computing Research. Young Choon Lee

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix

globus online The Research Cloud Steve Tuecke Computation Institute University of Chicago and Argonne National Laboratory

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

An Introduction to Virtualization and Cloud Technologies to Support Grid Computing

Cloud and Virtualization to Support Grid Infrastructures

CONDOR CLUSTERS ON EC2

Globus Research Data Management: Introduction and Service Overview

Plug-and-play Virtual Appliance Clusters Running Hadoop. Dr. Renato Figueiredo ACIS Lab - University of Florida

Grid Computing Vs. Cloud Computing

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

Scaling up to Production

High Throughput Sequencing Data Analysis using Cloud Computing

DataCenter optimization for Cloud Computing

Scalable Architecture on Amazon AWS Cloud

Cloud Computing Trends

Introduction to Cloud Computing

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

Cloud Testing Testing on the Cloud

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

SUSE Cloud 2.0. Pete Chadwick. Douglas Jarvis. Senior Product Manager Product Marketing Manager

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

Description of Application

Globus for Data Management

Solution for private cloud computing

Outlook. Corporate Research and Technologies, Munich, Germany. 20 th May 2010

SOA and Cloud in practice - An Example Case Study

Simplified Private Cloud Management

Public Cloud Offerings and Private Cloud Options. Week 2 Lecture 4. M. Ali Babar

The Cloud at Crawford. Evaluating the pros and cons of cloud computing and its use in claims management

Nimbus: Cloud Computing with Science

Cloud computing - Architecting in the cloud

OpenNebula Open Souce Solution for DC Virtualization. C12G Labs. Online Webinar

Hadoop & Spark Using Amazon EMR

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers

Deploying a Geospatial Cloud

Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH

Cloud security CS642: Computer Security Professor Ristenpart h9p:// rist at cs dot wisc dot edu University of Wisconsin CS 642

IBM Solutions Grid for Business Partners Helping IBM Business Partners to Grid-enable applications for the next phase of e-business on demand

Cloud Infrastructure Pattern

CLEVER: a CLoud-Enabled Virtual EnviRonment

Evolution from the Traditional Data Center to Exalogic: An Operational Perspective

By Cloud Spectator July 2013

How To Understand Cloud Computing

SURVEY ON THE ALGORITHMS FOR WORKFLOW PLANNING AND EXECUTION

Hybrid Development and Test USE CASE

A Service for Data-Intensive Computations on Virtual Clusters

Cloud JPL Science Data Systems

On Demand Satellite Image Processing

BITDEFENDER SECURITY FOR AMAZON WEB SERVICES

Federal Secure Cloud Testing as a Service - TaaS Center of Excellence (CoE) Robert L. Linton

CHAPTER 8 CLOUD COMPUTING

How To Run A Cloud At Cornell Cac

Cloud Computing For Bioinformatics

Best Practices for Sharing Imagery using Amazon Web Services. Peter Becker

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

Transcription:

DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago and Argonne National Laboratory

Introduction DTI image analysis requires the use of many tools QC Registration ROI Marking Fiber Tracking QC, Registration, ROI Marking, Fiber Tracking,.. Constructing analyses is challenging Data & tooldiscovery, selection, orchestration,.. We have made huge strides in terms of data Data formats, repositories, i protocols, metadata, CDEs We now need infrastructure to reduce the barriers that exist between data providers, tool developers, researchers, and clinicians Big Science. Small Labs o We have exceptional infrastructure for the 1%, what about the 99%? 2 DTI Pipelines and Cloud Infrastructure

Common Approach to Analysis Modify Install (Re)Run Script Camino 3 DTI Pipelines and Cloud Infrastructure

How can we improve? We need a platform where users can easily construct ct and execute eec teanalyses Using best of bread tools and pipelines Abstracting low level infrastructure and platform heterogeneity Supporting automation and parallelism Supporting experimentation => Make existing tools and common analyses mundane building blocks 4 DTI Pipelines and Cloud Infrastructure

DTI Metric Reproducibility Pipeline Ultimate Goal: Investigate the feasibility of using DTI in clinical practice Automatic calculation of DTI metrics (FA, MD) from 48 automatically generated ROIs Using existing tools to create a reusable analysis workflow that can be easily repeated Investigate the ability to scale analyses over large datasets Explore the reproducibility over a group of 20 subjects with 4 scans spread over 2 sessions 5 DTI Pipelines and Cloud Infrastructure

DTI Processing Pipeline (1) BVEC & BVAL DTI T1 1. ECC DTI (FSL) 2. BET DTI (FSL) 3. BET T1 (FSL) 4. Linear Registration DTI / T1 (FSL FLIRT) 5. DTI Fitting (FSL/Camino) 7. Linear Registration T1/Template (FSLFLIRT) FLIRT) 7. Non linear Registration T1/Template (FSLFNIRT) FNIRT) Template 9. Transform FA/MD to MNI space (FSL Applywarp) p) 8. Calculate ROI Mean FA/MD (AFNI 3dmaskave) Atlas Mask 6 DTI Pipelines and Cloud Infrastructure

DTI Processing Pipeline (2) DTI BVEC & BVAL 1. ECC DTI (FSL) 2. BET DTI (FSL) FA Template FA image 3. DTI Fitting (FSL/Camino) Linear Registration (FSL FLIRT) Non Linear Registration (FSL FNIRT) MD image Apply Warp coefficient 7 DTI Pipelines and Cloud Infrastructure FA in MNI space 6. Calculate ROI Mean (3dmaskave) MD in MNI space Atlas Mask

Approaches for Implementing Pipelines Scripts XNAT Pipeline Engine Globus Genomics Bash scripts written to Defined by code (XML SaaS for genomics execute tools on a + scripts) Graphical interface for single computer Overhead to include creation and execution Time consuming, error prone, hard to transfer knowledge tools, develop interfaces and create pipelines Supports ondemand provisioning based on pricing gpolicies Little support for parallelization Difficult to change tools/pipelines Tools installed dynamically when Some support for parallelization li required 8 DTI Pipelines and Cloud Infrastructure

DTI Pipeline Platform Globus Endpoints Galaxy & Manager Globus Transfer Galaxy Condor Shared File System Dynamic Scheduler Dynamic Worker Pool 9 DTI Pipelines and Cloud Infrastructure

DTI Pipelines in the Cloud NFS Gluster GridFTP Condor Schedule 10 DTI Pipelines and Cloud Infrastructure Camino

DTI Pipelines in Galaxy 11 DTI Pipelines and Cloud Infrastructure

Cloud Computing Leverages economies of scale to facilitate utility ty models Pay only for resources used 1 * 100 hours == 100 * 1 hour On demand and elastic access to unlimited capacity Addresses fluctuating requirements Software as a Service Web access to data through Platform as a Service defined interfaces Platform as a Service No management of hardware or low level tools Infrastructure as a Service 12 DTI Pipelines and Cloud Infrastructure

Challenges Moving to the Cloud Resource Selection: Comparing price, capabilities, performance, instance types (EBS, S,Instance store), toolperformance Tool Selection and Management: Finding tools, installing, configuring and using them in different environments Analysis/Resource Management: Developing structured and repeatable analyseswith different tools. Data transfer: Moving large amounts of data in/out of Cloud environmentreliably reliably andefficiently Scale and Parallelism: Scaling analyses by efficiently parallelizing across elastic infrastructure Security: Data and computation security HIPAA? 13 DTI Pipelines and Cloud Infrastructure

Amazon EC2 Pricing System Specifications Pricing CPU Units CPU Cores Memory On Demand Spot (Low) Spot (High) m1.large 4 2 75 7.5 024 0.24 0.026026 55 5.5 m1.xlarge 8 4 15 0.48 0.052 0.64 m3.xlarge 13 4 15 0.5 0.058 0.058 m3.2xlarge 26 8 30 1 0.01150115 0.115 m2.xlarge 6.5 2 17.1 0.41 0.035 0.36 m2.2xlarge 13 4 34.2 0.82 0.07 3 m2.4xlarge 26 8 68.4 164 1.64 014 0.14 014 0.14 14 DTI Pipelines and Cloud Infrastructure

Spot Pricing Volatility 15 DTI Pipelines and Cloud Infrastructure

Instance Performance and Pricing Time (M Minutes ) 120 100 80 60 40 20 0 EBS Instance Store On Demand Spot (Low) Spot (High) 1.5 1.2 0.9 0.6 0.3 0 Cos st per Subjec ($) 16 DTI Pipelines and Cloud Infrastructure

Pricing Multiple Analyses Per Node Co ost per Subject ($) 0.5 0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 On Demand Spot (Low) Spot (High) 17 DTI Pipelines and Cloud Infrastructure

Elastic Startup Cost 1:15:00 Tim me 1:00:00 0:45:00 0:30:00 ROI Calculation Tensor Fitting ECC & Registration Contextualize 0:15:00 0:00:00 New Worker Existing Worker Spot Price Queue 18 DTI Pipelines and Cloud Infrastructure

Data Transfer with Globus Online Reliable file transfer, sharing, syncing. Easy fire and forget file transfers Automatic fault recovery High performance Across multiple l security domains In place sharing of files with users and groups No IT required. Software as a Service (SaaS) o No client software installation i o New features automatically available 19 DTI Pipelines and Cloud Infrastructure

Transfer Comparison 20 DTI Pipelines and Cloud Infrastructure

Summary Structured pipelines simplify creation, execution and sharing of complex analyses and sharing of complex analyses Hosted as a service can further reduce barriers By outsourcing pipeline execution on the Cloud we can reduce overhead and costs Previously we took weeks to process ~100 scans o Using this approach < 5 cents a subject ($5 for 1 hour) What's next? Can we deliver this as a service? o Billing, security, paradigm shift, interactive tools Developing toolsheds for sharing tools and pipelines pp 21 DTI Pipelines and Cloud Infrastructure

Acknowledgements Mike Vannier, Xia Jiang, Farid Dahi Globus Online Ian Foster, Steve Tuecke, Rachana Ananthakrishnan Globus Genomics o Ravi Madduri, Paul Dave, Dina Sulakhe, Lukasz Lacinski 22 DTI Pipelines and Cloud Infrastructure