Big Data and Graph Analytics in a Health Care Setting



Similar documents
Using Big Data in Healthcare

Find the signal in the noise

Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory

Adobe Insight, powered by Omniture

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering

bigdata Managing Scale in Ontological Systems

Cray: Enabling Real-Time Discovery in Big Data

Microsoft Amalga HIS Electronic Medical Record

Six Days in the Network Security Trenches at SC14. A Cray Graph Analytics Case Study

Optum One. The Intelligent Health Platform

Introduction to Data Mining

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

Using Ultra-Large Data Sets in Healthcare New Questions-New Answers

Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Using Data Mining and Machine Learning in Retail

THE VIRTUAL DATA WAREHOUSE (VDW) AND HOW TO USE IT

Integrating Genetic Data into Clinical Workflow with Clinical Decision Support Apps

Secondary Uses of Data for Comparative Effectiveness Research

Department of Veterans Affairs (VA) Electronic Health Record: Providing Information for Clinical Care and Data for VA Health Research

Data and Information Management in Public Health

Introduction. A. Bellaachia Page: 1

The Fusion of Supercomputing and Big Data: The Role of Global Memory Architectures in Future Large Scale Data Analytics

Workshop on Hadoop with Big Data

Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?

Design of a Multi Dimensional Database for the Archimed DataWarehouse

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

Patient Management Systems. Terrence Adam, BS Pharm,, MD, PhD Assistant Professor, PCHS University of Minnesota College of Pharmacy

Conquering the Astronomical Data Flood through Machine

Big Data :: Big Demand

Bringing Big Data Modelling into the Hands of Domain Experts

Data Isn't Everything

Where is... How do I get to...

Open source Google-style large scale data analysis with Hadoop

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

Healthcare Market Overview

Healthcare Big Data Exploration in Real-Time

Transformational Data-Driven Solutions for Healthcare

Innovative Information Visualization of Electronic Health Record Data: a Systematic Review

National Cancer Institute

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

Data Driven Discovery In the Social, Behavioral, and Economic Sciences

Bigdata : Enabling the Semantic Web at Web Scale

Chapter 3: Data Mining Driven Learning Apprentice System for Medical Billing Compliance

Presented by: Aaron Bossert, Cray Inc. Network Security Analytics, HPC Platforms, Hadoop, and Graphs Oh, My

SPATIAL DATA CLASSIFICATION AND DATA MINING

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC

Documentation and Medical Records Paper-based and Computer-based

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Environmental Health Science. Brian S. Schwartz, MD, MS

CHAPTER-6 DATA WAREHOUSE

Understanding applications using the BSC performance tools

Impact Intelligence. Flexibility. Security. Ease of use. White Paper

UNIVERSITY OF INFINITE AMBITIONS. MASTER OF SCIENCE COMPUTER SCIENCE DATA SCIENCE AND SMART SERVICES

[callout: no organization can afford to deny itself the power of business intelligence ]

Healthcare data analytics. Da-Wei Wang Institute of Information Science

SQL Server 2012 Business Intelligence Boot Camp

HadoopRDF : A Scalable RDF Data Analysis System

Associate Prof. Dr. Victor Onomza Waziri

ANALYTICS BUILT FOR INTERNET OF THINGS

4. Understanding Clinical Data and Workflow Understanding Surveillance Data Exchange Processes Guide and Worksheet

Meaningful Use Stage 2 Certification: A Guide for EHR Product Managers

EMPI: A BUILDING BLOCK FOR INTEROPERABILITY

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

GC3 Use cases for the Cloud

Basic Study Designs in Analytical Epidemiology For Observational Studies

GEMS: Graph database Engine for Multithreaded Systems

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

BMC ProactiveNet Performance Management: Delivering on the Promise of Predictive Control Across the Total IT Environment SOLUTION WHITE PAPER

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. project.org. University of California, Berkeley UC BERKELEY

Susan J Hyatt President and CEO HYATTDIO, Inc. Lorraine Fernandes, RHIA Global Healthcare Ambassador IBM Information Management

The Development of the Clinical Trial Ontology to standardize dissemination of clinical trial data. Ravi Shankar

How To Handle Big Data With A Data Scientist

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

Scaling Out With Apache Spark. DTL Meeting Slides based on

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Large-Scale Data Processing

Data Warehouse: Introduction

The Cubetree Storage Organization

Machine Learning for Data-Driven Discovery

Achieving Cost-Effective, Vendor-Neutral Archiving For Your Enterprise

Interoperability and Analytics February 29, 2016

Data Mining in the Swamp

Transcription:

Big Data and Graph Analytics in a Health Care Setting Supercomputing 12 November 15, 2012 Bob Techentin Mayo Clinic SPPDG Archive 43738-1

Archive 43738-2 What is the Mayo Clinic? Mayo Clinic Mission: To inspire hope and contribute to health and well-being by providing the best care to every patient through integrated clinical practice, education and research Primary Value: The needs of the patient come first The best interest of the patient is the only interest to be considered, and in order that the sick may have the benefit of advancing knowledge, union of forces is necessary. -- William J. Mayo, June 15, 1910

Archive 43738-3 Mayo Clinic Environment (2011 Data) Not-for-profit foundation for Health Care/Research/Education Total Mayo staff size: 58,300 in three locations (Rochester, MN; Jacksonville, FL; and Scottsdale, AZ) plus multiple (60+) regional sites near Rochester Mayo Rochester: 15 million square feet (about 3.5X Mall of America) 1,113,000 patient registrations, 123,000 Hospital admissions, 588,000 hospital days, 2,400+ hospital beds, 20 million+ yearly laboratory tests Total Mayo research staff size: 4,252 FTEs

Archive 43738-4 Clinical Data Analysis Challenges Many clinical analysis problems are virtually intractable by traditional computational means. Mayo Clinic s patient data includes 5,000,000 patient records, exceeding 10 PB of data storage. Medical records are a combination of numeric data (e.g., lab work), clinical observations (e.g., medical history), categorical data (e.g., clinical diagnosis), and longitudinal measurements (e.g., annual checkup). Developing query mechanisms for this complex data is challenging. Relationships between clinical factors is a found in a very small number of records. This is a low signal-to-noise relationship. Access to a large, wellorganized medical record database is critical. Analysis approaches differ for various applications Clinical decision support patient vs. population Research correlation of disparate factors Monitoring detect / predict population trends

Archive 43738-5 Clinical Data Analysis Example: Clinical Decision Support Patient vs. Population Clinical Decision Support Rules have been implemented at Mayo Clinic hospitals Rules are laboriously crafted by medical experts, and implemented by information systems staff Rules are back-tested against medical records, manually! Implemented rules fire thousands of times per week, with a high false positive rate Systems engineering approach now being applied to development of new rules Systematically identify failure modes Analyze the whole system (including Big Data) and connections between processes

Clinical Data Analysis Example: Research Correlation of Disparate Factors Discovery in 2006 that a huge (7-8%) decrease in incidence of breast cancer in one year (2003) was caused by nationwide cessation of hormone supplementation in menopausal women in 2002! First reported at San Antonio Breast Cancer Symposium, December 13-15, 2006 Discovered through laborious manual records reviews! Clinicians and medical researchers need technology to reduce investigations from years to months or days Many such reviews are simply not done because of limited manual resources SPPDG Archive 43738-6

Archive 43738-7 Characteristics of Mayo Enterprise Data Warehouse Complex Over 1000 departmental systems Many specialty domains Wide variety of data semantics Large Oldest formal medical record Complete conversion to electronic records (medical, surgical, radiology, billing, etc.) over 10 years ago Millions of patients / Terabytes of data Significant challenges Data sets are not well connected Noisy, incomplete, low reliability data / Low signal-to-noise ratio

SPPDG Archive 43738-8

Archive 43738-9 Example Medical Record Database: Rochester Epidemiology Project REP data captures a subset of medical record data for a relatively stable population in Southeast Minnesota ~ 500K Individuals, 40 year duration ~ 2 M medical records from many clinics and hospitals Medical record events limited to births, deaths, diagnoses, prescriptions, procedures Relatively small and simple 50 GB Sybase database is two orders of magnitude smaller than Mayo Electronic Medical Record Translation to semantic (RDF) model results in dozens of classes and only 3.5 billion triples

SPPDG Archive 43738-10

Archive 43738-11 Early Results From Semantic Analysis of Rochester Epidemiology Project Data Translation from multiple relational database sources into single semantic model is relatively straight forward 50 GB of relational data expands to 350 GB in NTRIPLES format Translation and data management can be slow and cumbersome Initial queries appear promising for research and clinical practice Cray XMT-2 and YarcData urika Query times of seconds, compared to hours for Sybase

Archive 43738-12 Expanding Clinical Record Analysis Beyond the Test Cases The Mayo Clinic Data Warehouse is two orders of magnitude larger than the REP databases, and the EMR is much larger yet Many more patients, across the U.S. and international Much more complex / richer dataset Aggregated data sets might exceed 300 B triples New systems supporting clinical practice need to scale in capacity and capability Medical centers might require 10,000-100,000 queries per day Integration of graph analytics with FLOPS - Logical AND numerical!

Archive 43738-13 Semantic Analysis Challenges for Clinical Applications Annotating facts with quality or reliability Reification adds substantial complexity to semantic model Identifying complex co-morbidity conditions Simple queries return N M combinations cluster analysis may be more effective Integration of logical and numerical processes Queries over ranges of values or dates (without FILTER!) Integrated analysis (with more sophistication than COUNT) Approximate matching Find a patient like me where me is 40,000 triples Temporal analysis

Archive 43738-14 Summary Both Big Data and Graph Analytics will have a role in future clinical practice and medical research Initial work with semantic analysis of medical record data using XMT-2 and urika appear promising Beginning to recognize limitations of current systems Scale and performance of future systems Expressing and querying data quality and reliability Integration of true graph analytics Tight coupling of logical and numerical analysis

A Few Dimensions of Evolution TODAY Applications Discrete Apps Situational Awareness Algorithms Graph Libraries Separate Monolithic Graph experts Logical and Numerical Incremental Coupled Subject-matter experts Audience Graph Primitives Static Graphs Dynamic Graphs Infrastructure/ Runtimes MPI, MapReduce Graph-specific Hardware Mass-market Current Graph-specific Emerging

A Few Dimensions of Evolution FUTURE Applications Discrete Apps Situational Awareness Algorithms Graph Libraries Separate Monolithic Graph experts Logical and Numerical Incremental Coupled Subject-matter experts Audience Graph Primitives Static Graphs Dynamic Graphs Infrastructure/ Runtimes MPI, MapReduce Graph-specific Hardware Mass-market Current Graph-specific Emerging