Biomedical Informatics Applications, Big Data, & Cloud Computing



Similar documents
Data deluge (and it s applications) Gianluigi Zanetti. Data deluge. (and its applications) Gianluigi Zanetti

From Distributed Computing to Distributed Artificial Intelligence

Workprogramme

Linked Science as a producer and consumer of big data in the Earth Sciences

Research Data Networks: Privacy- Preserving Sharing of Protected Health Informa>on

High Performance Spatial Queries and Analytics for Spatial Big Data. Fusheng Wang. Department of Biomedical Informatics Emory University

Hadoop-GIS: A High Performance Spatial Query System for Analytical Medical Imaging with MapReduce

ebook Utilizing MapReduce to address Big Data Enterprise Needs Leveraging Big Data to shorten drug development cycles in Pharmaceutical industry.

EHR CURATION FOR MEDICAL MINING

Data Driven Discovery In the Social, Behavioral, and Economic Sciences

Solar Irradiance Forecasting Using Multi-layer Cloud Tracking and Numerical Weather Prediction

Big Data Processing and Analytics for Mouse Embryo Images

Environmental Remote Sensing GEOG 2021

Massive Labeled Solar Image Data Benchmarks for Automated Feature Recognition

Information Services for Smart Grids

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

The STC for Event Analysis: Scalability Issues

EVALUATING CLASSIFICATION POWER OF LINKED ADMISSION DATA SOURCES WITH TEXT MINING

Statistical Analysis and Visualization for Cyber Security

The Scientific Data Mining Process

Globus Genomics Tutorial GlobusWorld 2014

This Symposium brought to you by

MapReduce and Hadoop Distributed File System V I J A Y R A O

Enabling Science in the Cloud: A Remote Sensing Data Processing Service for Environmental Science Analysis

Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania)

Astrophysics with Terabyte Datasets. Alex Szalay, JHU and Jim Gray, Microsoft Research

Sanjeev Kumar. contribute

A pretty picture, or a measurement? Retinal Imaging

Predictive Analytics

Conquering the Astronomical Data Flood through Machine

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Scalable Developments for Big Data Analytics in Remote Sensing

Data Centric Systems (DCS)

Managing and Querying Whole Slide Images

Advancing Sustainability with Geospatial Steven Hagan, Vice President, Server Technologies João Paiva, Ph.D. Spatial Information and Science

Opportunities and Challenges in Big Data Neuroscience

CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION

Opportunities to Overcome Key Challenges

Software Description Technology

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

A Framework forhigh Performance Image AnalysisPipelines overcloud Resources

How To Create A Signal Data System

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Data Refinery with Big Data Aspects

Cloud JPL Science Data Systems

Graph Database Performance: An Oracle Perspective

Introduction to Data Mining

Predictive Analytics for Demand Forecasting and Planning Managers A Big Data Challenge Hans Levenbach, Delphus, Inc.

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

CloudRank-D:A Benchmark Suite for Private Cloud Systems

SAP Predictive Analytics: An Overview and Roadmap. Charles Gadalla, SESSION CODE: 603

A Learning Based Method for Super-Resolution of Low Resolution Images

ICT Perspectives on Big Data: Well Sorted Materials

IMPLICIT SHAPE MODELS FOR OBJECT DETECTION IN 3D POINT CLOUDS

Data Warehousing. Yeow Wei Choong Anne Laurent

Towards a New Model for the Infrastructure Grid

urika! Unlocking the Power of Big Data at PSC

Thermal Management of Datacenter

Big Workflow: More than Just Intelligent Workload Management for Big Data

European Data Infrastructure - EUDAT Data Services & Tools

Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage

TRANSLATIONAL BIOINFORMATICS 101

Principles for Working with Big Data"

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY

Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments

Curriculum Vitae. Mahesh Joshi. Education. Research Experience. Publications

Data Warehouse: Introduction

NASA s Big Data Challenges in Climate Science

Information Model Requirements of Post-Coordinated SNOMED CT Expressions for Structured Pathology Reports

System Models for Distributed and Cloud Computing

How To Make Sense Of Data With Altilia

How To Handle Big Data With A Data Scientist

On-demand Provisioning of Workflow Middleware and Services An Overview

Ibis: Scaling Python Analy=cs on Hadoop and Impala

TOLOMEO. ORFEO Toolbox. Jordi Inglada - CNES. TOoLs for Open Mul/- risk assessment using Earth Observa/on data TOLOMEO

From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data

ExArch: Climate analytics on distributed exascale data archives Martin Juckes, V. Balaji, B.N. Lawrence, M. Lautenschlager, S. Denvil, G. Aloisio, P.

A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace

Mission Need Statement for the Next Generation High Performance Production Computing System Project (NERSC-8)

Li Xiong, Emory University

CIESIN Columbia University

QoS-Aware Storage Virtualization for Cloud File Systems. Christoph Kleineweber (Speaker) Alexander Reinefeld Thorsten Schütt. Zuse Institute Berlin

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Disributed Query Processing KGRAM - Search Engine TOP 10

Rulex s Logic Learning Machines successfully meet biomedical challenges.

Information Processing, Big Data, and the Cloud

UniGR Workshop: Big Data «The challenge of visualizing big data»

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India

Embedded Systems in Healthcare. Pierre America Healthcare Systems Architecture Philips Research, Eindhoven, the Netherlands November 12, 2008

Proposal Title: Smart Analytic Health Plan Systems

Pallas Ludens. We inject human intelligence precisely where automation fails. Jonas Andrulis, Daniel Kondermann

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC

Find the signal in the noise

Cloud Computing and the Future of Internet Services. Wei-Ying Ma Principal Researcher, Research Area Manager Microsoft Research Asia

Accelerate > Converged Storage Infrastructure. DDN Case Study. ddn.com DataDirect Networks. All Rights Reserved

ISSN: CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

From Big Data to Smart Data Thomas Hahn

Age of Analytics: Competing in the 21 st Century

Transcription:

Biomedical Informatics Applications, Big Data, & Cloud Computing Patrick Widener, PhD Assistant Professor, Biomedical Engineering Senior Research Scientist, Center for Comprehensive Informatics Emory University patrick.widener@emory.edu

Squeezing Information from Temporal Spatial Datasets Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data Run lots of different algorithms to derive same features Run lots of algorithms to derive complementary features Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms

Pipelines to Carry out Feature Extraction and Classification in Brain Tumor Research

Exascale feature extraction / analysis Domain Knowledge Stroma-poor subclass Differentiating Neuroblastoma Hundreds of concepts 4 channels, 10 GB per image, 1000s of images Segmentation, Feature Extraction, Classification Semantic-access: Stroma-poor regions Millions of features 700K identified nuclei from single algorithm Object-relational access: Pixels in rectangle where R > 23 Images from Pathology Slides

Pipeline for Whole Slide Feature Characterization 10 10 pixels for each whole slide image 10 whole slide images per patient 10 8 image features per whole slide image 10,000 brain tumor patients 10 15 pixels 10 13 features Hundreds of algorithms Annotations and markups from dozens of humans

Integration of heterogeneous multiscale information Pathology, Radiology, omics, Clinical information Reproducible characterization at gross level (Radiology) and fine level (Pathology) Integration with multiple types of omic information Exploit synergies to improve ability to forecast survival & response. Omic Data Radiology Imaging Pathologic Features Patient Outcome

Slide from Walter Bosch

Analogous Feature Extraction and Classification Issues in Most Fields Astrophysics Material Science Which portions of a star s core are susceptible to implosion over time period [t1, t2]? Is crystalline growth likely to occur within range [p1, p2] of pressure conditions? Compute streamlines on vector field v within grid points [(x1,y1)-(x2,y2)] Compute likelihood of local cyclic relationships among nanoparticles within a frame Cancer studies Which regions of the tumor are undergoing active angiogenesis in response to hypoxia? Determine image regions where (blood vessel density > 20) and (nuclei and necrotic region are within 50 microns of each other)

Data Science Research Challenges Coordination and management of algorithms and metadata needed to carry out high throughput feature extraction and classification Interactive on-demand user interactivity with exascale complex multi-algorithm analysis frameworks Computer assisted annotation and markup for very large datasets development of the actual image analysis and machine learning algorithms Structural and semantic metadata management: how to manage tradeoff between flexibility and curation Data and semantic modeling infrastructures and policies able to scale to handle distributed systems with an aggregate of 10*9 or more data models/concepts

Biomedical informa.cs research challenges Integra)on of mul)ple data sources with conflic)ng metadata and conflic)ng data Efficient methods for seman)c query involving complex mul)- scale features associated with peta- /exascale ensembles of highly annotated images Systems to support combina)ons of structured and irregular accesses to exascale datasets Seman)c Query Engine Correla've analysis Image segmenta'on, classifica'on Distributed, federated query support Tac.cal research issues Data privacy/security Seeding petascale data into the cloud Data to computa)on or computa)on to data Access to specialized analysis engines GT / Emory Biomedical Research Cloud Jointly developed system soeware/middleware stack Leverage exis)ng work in virtualiza)on, power management Joint hardware, networking investments