Using Big Data in Healthcare




Speaker, First Plenary Session: THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD?
David R. Holmes III, PhD
Mayo Clinic College of Medicine, Rochester, MN, USA

Using Big Data in Healthcare: Graph Databases and Graph Analytic Approaches
David R. Holmes III
ISPOR 19th Annual Meeting, June 2nd, 2014
2014 MFMER slide-2

Teamwork
Special Purpose Processor Development Group: Barry Gilbert, Ph.D.; Robert Techentin
Center for Science of Healthcare Delivery: Jeanne Huddleston, M.D.; Nilay Shah, Ph.D.
Rochester Epidemiology Project: Jennifer St. Sauver, Ph.D.
YarcData: Steve Reinhardt
Biomedical Imaging Resource
Will and Charlie Mayo, The Mayo Brothers

Graph Analytics

What is a graph?
Node 1 and Node 2 are related.
Node 1 is forward related to Node 2.
Node 1 is forward related to Node 2 and Node 3.
Node 1 is forward related to Node 2 via Edge A. Node 1 is forward related to Node 3 via Edge B.
Example (nodes: Smoking, Coffee Drinking, Heart Attack; edges: Correlates, Causes):
  Smoking is correlated with coffee drinking.
  Smoking may cause heart attacks.
  Smoking is a confounding variable.

Semantic Graphs / Databases
A semantic graph is a node-typed, edge-typed, directed graph.
Using the Resource Description Framework (RDF), we can describe each piece of information in the graph as a triple: <Subject> <Predicate> <Object>
  <Smoking> <corr. with> <Coffee Drinking>
  <Coffee Drinking> <corr. with> <Smoking>
  <Smoking> <causes> <Heart Attacks>
A semantic database is referred to as a triple-store (i.e. a collection of triples).
Semantic databases are queried using SPARQL (the semantic equivalent of SQL).
Inferential rules and ontologies can be applied dynamically to the data to further enrich the dataset.
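To make the triple idea concrete, here is a minimal sketch in plain Python. It is not a real triple-store or SPARQL engine: the `match` helper is a hypothetical function written for illustration, and it only mimics what a single SPARQL triple pattern does, with `None` playing the role of a query variable. The three triples are the ones from the slide.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The three example triples from the slide.
TRIPLES: Set[Triple] = {
    ("Smoking", "corr. with", "Coffee Drinking"),
    ("Coffee Drinking", "corr. with", "Smoking"),
    ("Smoking", "causes", "Heart Attacks"),
}


def match(
    triples: Set[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard.

    Hypothetical helper for illustration, not part of any RDF library.
    """
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "What does smoking cause?"
caused_by_smoking = match(TRIPLES, subject="Smoking", predicate="causes")

# "Which pairs of variables are correlated?"
correlated = match(TRIPLES, predicate="corr. with")
```

A production system would store the triples in a dedicated triple-store and express these patterns in SPARQL; the sketch only shows that a query is a pattern with some positions left open.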

Origins of Semantic Databases in Healthcare
Mishelevich, David J. "MEANINGEX: a computer-based semantic parse approach to the analysis of meaning." (1971); "Semantic analysis of medical records." (1972)
  Initial notion of an ontology and semantic (i.e. noun phrase) representation of medical data
Schmid, Hans Albrecht, and J. Richard Swenson. "On the semantics of the relational data model." (1975)
  Formalizing the graph-like nature of semantic data models
Lenz, Richard, Mario Beyer, and Klaus A. Kuhn. "Semantic integration in healthcare networks." (2007)
Timeline: 1970s, 1980s, 1990s, 2000s...

Benefits of Semantic Databases
Semantic databases center around the user's need to collect and interrogate heterogeneous data.
Flexible schema
  New variables can be added to the data model easily
Data type agnostic
  New variables are added with indifference to variables already in the data model
Expressibility
  Ability to query the database in a flexible manner without regard for the specific data model
  Can dynamically apply inferential rules and ontologies
  Whole graph algorithms can be applied in order to find unique relationships between variables
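Two of these benefits, the flexible schema and dynamically applied inferential rules, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `apply_symmetric_rule` helper, the "Patient 1" node, and the "has habit" predicate do not come from the talk, and a real system would express the rule in an ontology language rather than in application code.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def apply_symmetric_rule(triples: Set[Triple], predicate: str) -> Set[Triple]:
    """Return the triples plus the reverse of every triple using `predicate`.

    Hypothetical inference rule: "if A <predicate> B then B <predicate> A".
    """
    inferred = {(o, p, s) for (s, p, o) in triples if p == predicate}
    return set(triples) | inferred


graph: Set[Triple] = {
    ("Smoking", "corr. with", "Coffee Drinking"),
    ("Smoking", "causes", "Heart Attacks"),
}

# Flexible schema: a new kind of variable is just one more triple.
# No table needs to be altered. ("Patient 1" and "has habit" are made up.)
graph.add(("Patient 1", "has habit", "Smoking"))

# Inference: correlation is symmetric, causation is not.
enriched = apply_symmetric_rule(graph, "corr. with")
```

The stored data stays unchanged; the rule is applied at query time, which is the sense in which the enrichment is dynamic.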

Healthcare Semantification at Mayo
Rochester Epidemiology Project (population-based)
  Goal: Leverage the stable population to track health over time
  500K individuals, 40 year duration
  2M healthcare records
Bedside Patient Rescue (in-hospital)
  Goal: Early Warning Systems (EWS) for patient events
  115K patient encounters, 2 year duration
  38M records (labs, nursing evals, etc.)

Rochester Epidemiology Project


Whole Graph Algorithm: Diffusion Algorithm
A diffusion algorithm can find hidden relationships by exploiting connections in the semantic graph.
  Initial values are attached to specific seed nodes
  Values propagate over graph edges, and accumulate in different parts of the graph
  Sometimes results are unexpected
With a functioning graph diffusion algorithm, many possible searches can be performed.
For the REP, we can identify a representative example of cohort features and label the graph.
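The slides do not give the details of the diffusion algorithm, so the following is only a generic sketch of the idea: seed values spread along edges and accumulate at the nodes they reach. The `diffuse` function, its `decay` and `steps` parameters, and the patient and diagnosis node names are all assumptions made for this example, not the method used on the REP data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def diffuse(
    edges: Iterable[Tuple[str, str]],
    seeds: Dict[str, float],
    steps: int = 3,
    decay: float = 0.5,
) -> Dict[str, float]:
    """Spread seed values over undirected edges and accumulate totals.

    At each step every active node passes `decay` times its incoming
    value on, split equally among its neighbors.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    total = defaultdict(float)
    wave = dict(seeds)  # values arriving in the current step
    for node, value in wave.items():
        total[node] += value

    for _ in range(steps):
        next_wave = defaultdict(float)
        for node, value in wave.items():
            if not neighbors[node]:
                continue  # isolated node: nothing to pass on
            share = decay * value / len(neighbors[node])
            for nb in neighbors[node]:
                next_wave[nb] += share
        for node, value in next_wave.items():
            total[node] += value
        wave = next_wave
    return dict(total)


# Hypothetical toy graph linking patients to diagnoses.
edges = [
    ("patient_A", "diabetes"),
    ("patient_B", "diabetes"),
    ("patient_B", "hypertension"),
    ("patient_C", "hypertension"),
]
scores = diffuse(edges, seeds={"patient_A": 1.0}, steps=3)
```

In this toy graph the seed at `patient_A` reaches `patient_B` through the shared diagnosis, while `patient_C` is too far away to be reached in three steps. Ranking nodes by accumulated value is one way such an algorithm can surface related records.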

Bedside Patient Rescue



Just one algorithm? No
There are many whole graph algorithms which could be applied to healthcare data:
  PageRank: Google-developed algorithm that scores nodes by the links pointing to them, emphasizing important nodes in a graph
  Peer-pressure clustering: graph-based clustering algorithm to find groups based on both node and edge data
  Betweenness centrality: algorithm to determine key nodes in a graph, those lying on the most shortest paths between other nodes
  Clique detection: methods to find fully connected sub-graphs in a graph
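As an illustration of one of these whole graph algorithms, here is a small power-iteration sketch of PageRank in plain Python. The damping factor of 0.85 is the commonly used default, the handling of nodes without outgoing links is one common convention among several, and the three-node graph is made up for the example.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Compute PageRank scores by power iteration.

    `links` maps each node to the list of nodes it points to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives the same small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Node with no outgoing links: spread its rank evenly.
                share = damping * rank[node] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B both point to C, and C points back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
most_important = max(ranks, key=ranks.get)
```

Here `C` ends up with the highest score because two nodes point to it, and `B` the lowest because nothing points to it. Graph libraries ship tuned implementations of this and the other listed algorithms; the sketch is only meant to show the mechanics.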

Why doesn't everyone use Semantic Databases?
Migrating relational databases to semantic databases can be tricky.
Graph databases suffer from missing data and noisy data just like relational databases.
Graph databases are large, and graph algorithms are complex.

Migrating Relational Databases
Relational DBs, by definition, are an efficient tabular storage of information.
Care must be taken in developing a semantic model to ensure semantic richness:
  Data must be promoted correctly to subjects/objects
  Predicates must be semantically meaningful
  Standard nomenclature must be used to be compatible
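The promotion step can be sketched as follows: one relational row becomes a set of triples, with the primary key promoted to the subject and each remaining column mapped to a predicate. The `row_to_triples` helper, the `encounter` table, its columns, and the `hasHeartRate` predicate are hypothetical names chosen for illustration. A real migration would map columns to terms from a standard vocabulary and keep data types rather than converting every value to a string.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def row_to_triples(
    table: str,
    key_column: str,
    row: Dict[str, Any],
    predicate_names: Dict[str, str],
) -> List[Triple]:
    """Promote one relational row to triples.

    The key column becomes the subject; every other non-null column
    becomes a (subject, predicate, value) triple.
    """
    subject = f"{table}/{row[key_column]}"
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # missing values simply produce no triple
        # Fall back to the raw column name if no mapping is given.
        predicate = predicate_names.get(column, column)
        triples.append((subject, predicate, str(value)))
    return triples


# Hypothetical row from an "encounter" table.
row = {"encounter_id": 17, "heart_rate": 112, "unit": None}
triples = row_to_triples(
    "encounter",
    "encounter_id",
    row,
    predicate_names={"heart_rate": "hasHeartRate"},
)
```

The explicit `predicate_names` mapping is where the semantic modeling effort goes: a column name such as `heart_rate` carries little meaning outside its table, whereas a well-chosen predicate can be shared across datasets.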

Missing and Noisy Data
Missing data is just that: missing.
Graph algorithms need to be smarter about missing data. For example:
  Building latent variables into the data
  Using a priori models to address missing data
Healthcare data is notoriously noisy. Moreover, there is a lot of it.
Algorithms must be robust to noise and oversampling.
While pre-processing can address this, some useful information can be lost.
Algorithms need to intelligently weight the data to draw meaningful conclusions.
Figure: Connecting Two BPR Encounters

Graph Data is Large and Complex
For decades, the community didn't have the computational resources to deal with semantic data efficiently.
  Technology developers were unable to pack enough memory into a computer to hold the data
  Networks were too slow
  As a result, CPUs were data starved
New technologies address this issue specifically:
  Hadoop clusters
  Graph computers (8192 threads, 2 TB memory)

Figure: Progressively complex queries using graph computer vs standard SQL database

Final Thoughts
Graph databases for healthcare were proposed in the 1970s.
Over time, the conceptual model of graph databases / algorithms matured.
Technology has finally caught up.
The Jerry Springer Show
The technical community is now prepared to accept massive amounts of healthcare data and store it semantically.
Semantic graph databases change the way that we look at data.
Graph analytics will yield new insights into existing and soon-to-be collected datasets.
There are still challenges in data migration and data quality to be addressed.
Harass your favorite computer scientist / informaticist to make progress in these areas.
