First Plenary Session
THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD?

Speaker: David R. Holmes III, PhD
Mayo Clinic College of Medicine, Rochester, MN, USA

Using Big Data in Healthcare: Graph Databases and Graph Analytic Approaches
David R. Holmes III
ISPOR 19th Annual Meeting, June 2nd, 2014
2014 MFMER

Teamwork

- Special Purpose Processor Development Group: Barry Gilbert, Ph.D.; Robert Techentin
- Center for Science of Healthcare Delivery: Jeanne Huddleston, M.D.; Nilay Shah, Ph.D.
- Rochester Epidemiology Project: Jennifer St. Sauver, Ph.D.
- YarcData: Steve Reinhardt
- Biomedical Imaging Resource

Pictured: Will and Charlie Mayo, The Mayo Brothers

Graph Analytics

What is a graph?

- Undirected edge: Node 1 and Node 2 are related.
- Directed edge: Node 1 is forward related to Node 2.
- Multiple edges: Node 1 is forward related to Node 2 and Node 3.
- Labeled edges: Node 1 is forward related to Node 2 via Edge A. Node 1 is forward related to Node 3 via Edge B.

Example: Smoking --Correlates--> Coffee Drinking; Smoking --Causes--> Heart Attack.
Smoking is correlated with coffee drinking. Smoking may cause heart attacks. Smoking is a confounding variable.

Semantic Graphs / Databases

- A semantic graph is a node-typed, edge-typed, directed graph.
- Using the Resource Description Framework (RDF), we can describe each piece of information in the graph as a triple: <Subject> <Predicate> <Object>
  <Smoking> <corr. with> <Coffee Drinking>
  <Coffee Drinking> <corr. with> <Smoking>
  <Smoking> <causes> <Heart Attacks>
- A semantic database is referred to as a triple-store (i.e., a collection of triples).
- Semantic databases are queried using SPARQL (the semantic equivalent of SQL).
- Inferential rules and ontologies can be applied dynamically to the data to further enrich the dataset.
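
The triple idea above can be sketched in a few lines of plain Python. This is only an illustration, not how a production triple-store or SPARQL engine works: the `match` helper and the predicate spellings (`correlated_with`, `causes`) are assumptions made for this example, and `None` stands in for a SPARQL variable.

```python
# The three triples from the slide, stored as (subject, predicate, object).
TRIPLES = [
    ("Smoking", "correlated_with", "Coffee Drinking"),
    ("Coffee Drinking", "correlated_with", "Smoking"),
    ("Smoking", "causes", "Heart Attacks"),
]


def match(triples, s=None, p=None, o=None):
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for ts, tp, to in triples:
        if (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to):
            yield (ts, tp, to)


# Roughly analogous to the SPARQL pattern { <Smoking> <causes> ?o }.
caused_by_smoking = [o for _, _, o in match(TRIPLES, s="Smoking", p="causes")]

# Every correlation edge, regardless of subject or object.
correlations = list(match(TRIPLES, p="correlated_with"))
```

Because a pattern may leave any position open, the same helper answers "what does smoking cause?" and "what is correlated with anything?" without a fixed schema.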

Origins of Semantic Databases in Healthcare

Timeline: 1970s, 1980s, 1990s, 2000s...

- Mishelevich, David J. "MEANINGEX: a computer-based semantic parse approach to the analysis of meaning." (1971); "Semantic analysis of medical records." (1972)
  Initial notion of an ontology and semantic (i.e. noun phrase) representation of medical data.
- Schmid, Hans Albrecht, and J. Richard Swenson. "On the semantics of the relational data model." (1975)
  Formalizing the graph-like nature of semantic data models.
- Lenz, Richard, Mario Beyer, and Klaus A. Kuhn. "Semantic integration in healthcare networks." (2007)

Benefits of Semantic Databases

Semantic databases center on the user's need to collect and interrogate heterogeneous data.

- Flexible schema: new variables can be added to the data model easily.
- Data type agnostic: new variables are added with indifference to variables already in the data model.
- Expressibility:
  - Ability to query the database in a flexible manner without regard for the specific data model.
  - Can dynamically apply inferential rules and ontologies.
  - Whole graph algorithms can be applied in order to find unique relationships between variables.

Healthcare Semantification at Mayo

- Rochester Epidemiology Project (population-based)
  - Goal: Leverage the stable population to track health over time
  - 500K individuals, 40 year duration
  - 2M healthcare records
- Bedside Patient Rescue (in-hospital)
  - Goal: Early Warning Systems (EWS) for patient events
  - 115K patient encounters, 2 year duration
  - 38M records (labs, nursing evals, etc.)

Rochester Epidemiology Project

Whole Graph Algorithm: Diffusion Algorithm

- A diffusion algorithm can find hidden relationships by exploiting connections in the semantic graph.
  - Initial values are attached to specific seed nodes.
  - Values propagate over graph edges and accumulate in different parts of the graph.
  - Sometimes results are unexpected.
- With a functioning graph diffusion algorithm, many possible searches can be performed.
- For the REP, we can identify a representative example of cohort features and label the graph.
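
The slides do not give the update rule used in the talk, so the following is only a minimal sketch of the seed-propagate-accumulate idea. The equal split among neighbors, the `decay` factor, the fixed number of `steps`, and all node names are assumptions made for illustration.

```python
from collections import defaultdict


def diffuse(edges, seeds, steps=2, decay=0.5):
    """Spread seed values along directed edges and accumulate what each node receives."""
    neighbors = defaultdict(list)
    for src, dst in edges:
        neighbors[src].append(dst)

    accumulated = defaultdict(float, seeds)  # running total per node
    frontier = dict(seeds)                   # values still travelling
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            targets = neighbors.get(node, [])
            for target in targets:
                # Assumed rule: each neighbor gets an equal, decayed share.
                next_frontier[target] += decay * value / len(targets)
        for node, value in next_frontier.items():
            accumulated[node] += value
        frontier = next_frontier
    return dict(accumulated)


# Hypothetical toy graph: a seeded diagnosis linked to patients, who link to other features.
edges = [
    ("diabetes", "patient_1"),
    ("diabetes", "patient_2"),
    ("patient_1", "hypertension"),
    ("patient_2", "statin"),
]
scores = diffuse(edges, seeds={"diabetes": 1.0})
```

Nodes that end up with a nonzero score were reached from the seed, and the size of the score reflects how directly and how often they were reached. This is the sense in which the accumulated values can label a cohort.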

Bedside Patient Rescue

Just one algorithm? No.

There are many whole graph algorithms which could be applied to healthcare data:

- PageRank: Google-developed algorithm that scores nodes by the structure of the links pointing to them, emphasizing important nodes in a graph.
- Peer-pressure clustering: graph-based clustering algorithm to find groups based on both node and edge data.
- Betweenness centrality: algorithm to determine key nodes in a graph, namely those that lie on the most shortest paths between other nodes.
- Clique detection: methods to find fully connected sub-graphs in a graph.
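
As one concrete instance of a whole graph algorithm, here is a short power-iteration sketch of PageRank in plain Python. It is a textbook-style illustration, not the implementation used in the talk; the damping factor of 0.85, the iteration count, and the toy node names are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeated redistribution of rank along edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: both "a" and "b" point to "c", and "c" points back to "a".
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
most_important = max(ranks, key=ranks.get)
```

In this toy graph the node with two incoming links ends up with the highest score, while the node that nothing points to keeps only the baseline share.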

Why doesn't everyone use Semantic Databases?

- Migrating relational databases to semantic databases can be tricky.
- Graph databases suffer from missing data and noisy data just like relational databases.
- Graph databases are large, and graph algorithms are complex.

Migrating Relational Databases

- Relational DBs, by definition, are an efficient tabular storage of information.
- Care must be taken in developing a semantic model to ensure semantic richness:
  - Data must be promoted correctly to subjects/objects.
  - Predicates must be semantically meaningful.
  - Standard nomenclature must be used to be compatible.
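
To make the migration points concrete, the sketch below turns one relational row into triples. Everything in it is hypothetical: the `encounter` table, its columns, the predicate names, and the `table/id` naming scheme are invented for illustration. A real migration would take predicates from a standard vocabulary rather than ad hoc strings.

```python
def row_to_triples(table, row, key, mapping):
    """Convert one relational row into (subject, predicate, object) triples.

    mapping: column -> (predicate, target_table). When target_table is set,
    the column value is promoted to a node; otherwise it stays a literal.
    """
    subject = f"{table}/{row[key]}"
    triples = []
    for column, (predicate, target_table) in mapping.items():
        value = row.get(column)
        if value is None:
            continue  # a missing value simply produces no triple
        obj = f"{target_table}/{value}" if target_table else str(value)
        triples.append((subject, predicate, obj))
    return triples


# Hypothetical row and mapping.
encounter = {"encounter_id": 17, "patient_id": 42, "unit": "ICU", "discharge": None}
mapping = {
    "patient_id": ("hasPatient", "patient"),  # foreign key promoted to a node
    "unit": ("occurredIn", None),             # kept as a literal value
    "discharge": ("dischargedOn", None),
}
triples = row_to_triples("encounter", encounter, "encounter_id", mapping)
```

The mapping is where the modeling decisions named above live: which columns become nodes that other records can link to, and which predicate names carry the meaning of each column.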

Missing and Noisy Data

- Missing data is just that: missing.
  - Graph algorithms need to be smarter about missing data. For example:
    - Building latent variables into the data
    - Using a priori models to address missing data
- Healthcare data is notoriously noisy. Moreover, there is a lot of it.
  - Algorithms must be robust to noise and oversampling.
  - While pre-processing can address this, some useful information can be lost.
  - Algorithms need to intelligently weight the data to draw meaningful conclusions.

Figure: Connecting Two BPR Encounters

Graph Data is Large and Complex

- For decades, the community didn't have the computational resources to deal with semantic data efficiently.
  - Technology developers were unable to pack enough memory into a computer to hold the data.
  - Networks were too slow.
  - As a result, CPUs were data starved.
- New technologies address this issue specifically:
  - Hadoop clusters
  - Graph computers (8192 threads, 2 TB memory)

Figure: Progressively complex queries using graph computer vs standard SQL database

Final Thoughts

The Jerry Springer Show

- Graph databases for healthcare were proposed in the 1970s.
  - Over time, the conceptual model of graph databases / algorithms matured.
  - Technology has finally caught up.
- The technical community is now prepared to accept massive amounts of healthcare data and store it semantically.
- Semantic graph databases change the way that we look at data.
- Graph analytics will yield new insights into existing and soon-to-be collected datasets.
- There are still challenges in data migration and data quality to be addressed.
  - Harass your favorite computer scientist / informaticist to make progress in these areas.