Data Science Solutions for Healthcare. Delivering Evidence-Based Insights and Decision Support



Similar documents
ACCOUNTABLE CARE ANALYTICS: DEVELOPING A TRUSTED 360 DEGREE VIEW OF THE PATIENT

Big Data Analytics in Health Care

I n t e r S y S t e m S W h I t e P a P e r F O R H E A L T H C A R E IT E X E C U T I V E S. In accountable care

Values of Healthcare

Achieving meaningful use of healthcare information technology

What is your vision of population/public health practice in an era when the health care of all Americans is supported by EHRs?

Better planning and forecasting with IBM Predictive Analytics

Uncovering Value in Healthcare Data with Cognitive Analytics. Christine Livingston, Perficient Ken Dugan, IBM

Global Headquarters: 5 Speen Street Framingham, MA USA P F

Introduction to Management Information Systems

The Six A s. for Population Health Management. Suzanne Cogan, VP North American Sales, Orion Health

Eligible Professionals please see the document: MEDITECH Prepares You for Stage 2 of Meaningful Use: Eligible Professionals.

How To Use Predictive Analytics To Improve Health Care

Find the signal in the noise

Global Headquarters: 5 Speen Street Framingham, MA USA P F

Leveraging Big Data for the Next Generation of Health Care Ken Cunningham, VP Analytics Pam Jodock, Director Business Development

WHITE PAPER. QualityAnalytics. Bridging Clinical Documentation and Quality of Care

Leveraging EHR to Improve Patient Safety: A Davies Story

Making critical connections: predictive analytics in government

TRUSTED PATIENT EDUCATION FOR BETTER OUTCOMES. MICROMEDEX Patient Connect. Patient Education & Engagement

Meaningful Use and Engaging Patients: Beyond Checking the Box

BEYOND THE EHR MEANINGFUL USE, CONTENT MANAGEMENT AND BUSINESS INTELLIGENCE

Why Electronic Health Records are Ill-Suited for Population Health Management An InfoMC White Paper January 2016

Clintegrity 360 QualityAnalytics

HL7 and Meaningful Use

Data Mining Applications in Higher Education

Health Home Performance Enhancement through Novel Reuse of Syndromic Surveillance Data

Advanced Solutions for Accountable Care Organizations (ACOs)

Meaningful Use. Goals and Principles

Version 1.0. HEAL NY Phase 5 Health IT & Public Health Team. Version Released 1.0. HEAL NY Phase 5 Health

A Roadmap for Population Health: Best Practices for Achieving Operational Alignment

THE ACTIVELY CONNECTED PHYSICIAN

Commonwealth of Virginia

Overview. For Clinical Research and Clinical Trial Data Management. the protocol centric information system

The Scientific Data Mining Process

An Essential Ingredient for a Successful ACO: The Clinical Knowledge Exchange

Overview of MU Stage 2 Joel White, Health IT Now

HIMSS Interoperability Showcase 2011

TRUVEN HEALTH UNIFY. Population Health Management Enterprise Solution

Healthcare Measurement Analysis Using Data mining Techniques

Using EHRs, HIE, & Data Analytics to Support Accountable Care. Jonathan Shoemaker June 2014

Big Data Integration and Governance Considerations for Healthcare

Big Data Analytics- Innovations at the Edge

Clinical & Business Intelligence: An Analytics Executive Review

Visualization methods for patent data

TRUVEN HEALTH UNIFY. Population Health Management Enterprise Solution

Transforming Healthcare in Emerging Markets with EMR adoption

HL7 and Meaningful Use

SAP/PHEMI Big Data Warehouse and the Transformation to Value-Based Health Care

Collaboration is the Key for Health Plans in a Shared Risk Environment

Creating a national electronic health record: The Canada Health Infoway experience

Database Marketing, Business Intelligence and Knowledge Discovery

SOLUTION BRIEF. SAP/PHEMI Big Data Warehouse and the Transformation to Value-Based Health Care

The Evolving Comparative Analytics Market:

How To Create A Health Analytics Framework

New York ehealth Collaborative. Health Information Exchange and Interoperability April 2012

Exploration and Visualization of Post-Market Data

HL7 & Meaningful Use. Charles Jaffe, MD, PhD CEO Health Level Seven International. HIMSS 11 Orlando February 23, 2011

Public Health 101 Series

Healthcare Data Management

Presenters. How to Maximize Technology to Improve Care and Reduce Cost 9/17/2015

Empowering ACO Success with Integrated Analytics

A Population Health Management Approach in the Home and Community-based Settings

REAL-TIME INTELLIGENCE FOR FASTER PATIENT INTERVENTIONS. MICROMEDEX 360 Care Insights. Real-Time Patient Intervention

Active AnAlytics: Driving informed Decisions leading to Better clinical AnD financial outcomes

Meaningful Use - HL7 Version 2

Electronic Medical Records vs. Electronic Health Records: Yes, There Is a Difference. A HIMSS Analytics TM White Paper. By Dave Garets and Mike Davis

Data Driven Healthcare: the Canadian experience June 3, 2015

HIMSS Interoperability Showcase 2011

Impact Intelligence. Flexibility. Security. Ease of use. White Paper

A STRATIFIED APPROACH TO PATIENT SAFETY THROUGH HEALTH INFORMATION TECHNOLOGY

The Evolving Comparative Analytics Market:

Meaningful Use. Medicare and Medicaid EHR Incentive Programs

Opportunities and challenges for public health surveillance: a new world of interoperability with electronic health records

Apigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps

Meaningful Use as a Driver for EHR Adoption

Big Data and Healthcare Payers WHITE PAPER

Big Data 101: Harvest Real Value & Avoid Hollow Hype

Medical Big Data Interpretation

Draft Pan-Canadian Primary Health Care Electronic Medical Record Content Standard, Version 2.0 (PHC EMR CS) Frequently Asked Questions

7CHAPTER EXAMINATION/ ASSESSMENT NOTES, GRAPHICS, AND CHARTS

Healthcare data analytics. Da-Wei Wang Institute of Information Science

Public Health Reporting Initiative Functional Requirements Description

Numerical Algorithms Group

SPATIAL DATA CLASSIFICATION AND DATA MINING

Data Conversion Best Practices

IBM Content Analytics with Enterprise Search, Version 3.0

EMDEON INTERACTIVE CARE MANAGEMENT

Data Analytics: The Next Wave in Health IT. Big Data Conference June 17, Robert Wah, MD. Global Chief Medical Officer CSC

Data Mining for Successful Healthcare Organizations

Innovation Metrics: Measurement to Insight

Master of Science in Healthcare Informatics and Analytics Program Overview

Making Critical Connections: Predictive Analytics in Government

3M Health Information Systems. Take action: How predictive analytics can help you improve healthcare value

Tools for Managing and Measuring the Value of Big Data Projects

Meaningful Use Stage 2 Certification: A Guide for EHR Product Managers

Comments from ISDS on proposed rule for meaningful use March 15, 2010 Introduction Structure of comments

Meaningful Use - HL7 Version 2

Stakeholder Guide

Health IT: Better Information for Better Decisions. HSE Healthcare Leaders Masterclass April Robert Wah, MD

Transcription:

Data Science Solutions for Healthcare Delivering Evidence-Based Insights and Decision Support

Table of Contents Executive Summary... 3 The Data Challenge for Healthcare... 4 On the Nature of Raw Data... 6 Data Science Framework and Process.... 8 Use Case 1: Public Health Influenza Data Sharing... 12 Use Case 2: Analysis of Congestive Heart Failure Readmission Rates... 14 Conclusion... 16 2 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

Executive Summary The cost of healthcare in the United States is growing at an unsustainable rate. In 2010, U.S. healthcare spending was $2.58 trillion, or 17% of U.S. gross domestic product (GDP) a figure that on its own would be the fifth-largest economy in the world. By 2020, the percentage of GDP spent on healthcare is estimated to rise to nearly 20%. In this white paper, Leidos (formerly SAIC-maxIT- Vitalize) focuses on emerging capabilities in data science that can identify useful, actionable and perhaps unanticipated recommendations to improve healthcare delivery from large volumes of diverse and complex data. We present use case studies to examine some of these capabilities in action. Interoperable electronic health records (EHRs) for patient care hold tremendous potential to reduce the growth in costs. EHRs can help healthcare organizations improve chronic disease management, increase operating efficiencies, transform their finances, and improve patient outcomes. However, EHR implementations are in various stages of maturity across the country, and their benefits have not been fully realized. One of the primary challenges healthcare decision-makers face is how to make meaningful use of the data collected, available, and accessible in EHRs. By optimizing the use of data accessible in EHRs, we can uncover hidden relationships and identify patterns and trends in this diverse and complex information to improve chronic disease management, increase operating efficiencies, and transform healthcare organizations finances. 3 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

The Data Challenge for Healthcare The cost of healthcare in the United States is growing at an unsustainable rate. It is already the most costly in the world, accounting for 17% of U.S. gross domestic product in 2010, with estimates predicting that percentage will grow to nearly 20% by 2020. 1 Considered on its own, the $2.58 trillion spent annually would be the fifth-largest economy in the world. Interoperable electronic health records (EHRs) comprising structured lab records with clinical test results as well as unstructured records of clinicians notes are only beginning to play their expected role in reducing this growth rate. As more hospitals reach Stage 6 and Stage 7 of the HIMSS EMR Adoption Model, substantial benefits are possible, 2 including healthcare delivery improvements in several core areas: optimizing resource productivity, including clinical and administrative labor and equipment scheduling; reducing the number of adverse drug events; and managing the revenue cycle. 3 EHRs hold the potential to improve chronic disease management, increase operating efficiencies, and transform healthcare organizations finances through a combination of enhanced data management, knowledge development, knowledge use, and practice change. However, these benefits require that healthcare leaders be able to take action and make meaningful use 4 of the data collected, available, and accessible in EHRs. The cost of healthcare in the United States is growing at an unsustainable rate. Evolution of the Electronic Health Record EHRs have typically contained hard objective data in digital form from the analysis of lab tests, including numerical (both discrete and continuous) and categorical valuations (e.g., normal, above normal, below normal). More recently, providers are collecting many more structured attributes, such as vital signs, demographic information, and prescription histories, in-patient EHRs and are tracking these data over time. These structured data are often supplemented with unstructured (freeform) text, such as notes from clinicians, hospital physicians and nurses, and patients primary care physicians and specialists. EHRs also routinely include unstructured images, such as X-rays and MRI scans, and genomic data, all of which are examined by specialists, who append summary notes to the EHR. We anticipate that EHRs will further expand in scope to include content from a larger team of providers, including in-home caregivers, community social workers, family members, and the patients themselves. 1. https://www.cms.gov/research-statistics-data-and-systems/statistics-trends-and-reports/nationalhealthexpenddata/downloads/proj2010.pdf 2. EMR Benefits and Benefit Realization Methods of Stage 6 and 7 Hospitals: Hospitals with Advanced EMRs Report Numerous Benefits. February 2012. http://www.himssanalytics.org/research/assetdetail.aspx?pubid=79509&tid=122 3. Francois M. Laflamme, Wayne E. Pietraszek, and Nilesh V. Rajadhyax, Reforming Hospitals With IT Investment, McKinsey on Business Technology, Number 20, Summer 2010, pp. 27-33. http://www.mckinseyquarterly.com/reforming_hospitals_with_it_investment_2653 4. http://www.healthit.gov/policy-researchers-implementers/meaningful-use 4 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

From Raw Data to Decision-Making The variety and complexity of EHR data creates two fundamental challenges: collecting, storing, and providing access to EHR data, as well as exploring, inspecting, and consuming those data to uncover hidden relationships, identify trends and patterns, and improve patient outcomes. The first challenge is far from solved conclusively, especially as it needs to address legal and regulatory requirements that themselves may change. However, much progress has been made in developing and deploying robust EHR systems in real-world healthcare practice across the United States and throughout the world. This white paper focuses on the second challenge transforming the raw data accessible in EHRs into improved understanding of patients health, healthcare practices, and opportunities to enhance both. As Marchand and Peppard 5 note, simply deploying an IT data system does not inherently indicate how to use the information it generates to make better decisions or gain deeper and perhaps unanticipated insights into key aspects of the business. It is the exploration of these data, and subsequent explanation of the relevant exploratory findings, that is essential to effective decision-making and subsequent implementation of an action plan. We explore emerging capabilities in data science that can extract useful and actionable information from large volumes of diverse and complex data. We also use case studies to examine some of these capabilities in action. What Is Data Science? The term data science has been used in business for nearly a decade, encompassing the principally statistical analysis of historical, structured data. More recently, business intelligence has expanded to include data collected from social websites, such as determining purchase trends. However, it is only in the last few years that the term has been extended to define a broader field of study. 6 Leidos defines data science as the discipline of deriving relevant, actionable knowledge from raw data. These data are often collected for other purposes, such as facilitating business operations or complying with laws and regulations. This challenge has become more pervasive and problematic with the advances of technologies that collect, store, and manage very large, complex datasets. Data science is distinct from the general concept of big data in that it recognizes the increasingly tight coupling among three skill areas: (1) strong domain expertise and knowledge of the available data resources; (2) analytical skills and techniques, often automated and based on mathematics, statistics, and machine learning; and (3) computer science and programming skills. All three capabilities are required to work with large datasets that are not amenable to traditional processing by database management systems. Data science speaks to the process of extracting relevant insight from data through the use of technologies and tools, not simply the technologies and tools themselves. 5. Marchand D, Peppard J. Why IT Fumbles Analytics. Harvard Business Review. Jan-Feb 2013. http://hbr.org/2013/01/why-it-fumbles-analytics/ar/1 6. The late Jim Gray coined a new term for the development of scientific knowledge, the Fourth Paradigm of scientific research, the seminal paper leading to http://oreilly.com/data/radarreports/what-is-data-science.csp 5 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

On the Nature of Raw Data Opportunities to derive value from data depend to some extent on the nature of the raw data themselves. A dataset and the individual records within it can be characterized in several ways, including volume, velocity, and variety the 3Vs 7 model commonly used to describe big data. Type The data in a record or dataset may be text strings or numeric values of various kinds, such as real or imaginary numbers, expressed separately or in arrays or matrices. Data arrays may be indexed or timesequenced to form ordered data such as image, video, and audio records. Structure Any record or data element in a record may be unrelated to others (unstructured data), related in a fixed and formally defined way such as a hierarchy or relational linkage (structured data), or related in ways specific to the individual record or dataset (semistructured data). Volume Volume refers both to the number of records in a dataset and the size of an individual record. Roughly speaking, small datasets are those that can be managed, accessed, and analyzed with traditional file-based and commodity applications. Medium datasets require well-understood technologies, such as relational database management systems. Large datasets need specialized data management tools for analysis, storage and maintenance that must coordinate across processors and storage platforms. Note that the actual sizes of small, medium, and large datasets continue to change as data storage and retrieval technology advances. 8 Velocity Velocity refers to how often new data arrive for collection, storage, and analysis, as well as how much data arrive at any one instant. High-velocity applications such as retail purchase transactions may require specialized tools to adequately handle incoming floods of data. Variety Opportunities to derive value from raw data depend on the nature of the data. Variety describes both the number of datasets and the number of data types employed in analytics. Aggregating, aligning, linking, and correlating information across highly diverse datasets and data types is an ongoing challenge in data science for all applications, including emerging technologies to address the alignment of patient data across multiple EHRs. 7. http://www.gartner.com/newsroom/id/1731916 8. For example, decoding the human genome originally took 10 years to process; now it can be achieved in one week, which depends on new technologies with improved volume storage and retrieval velocity performance. 6 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

Completeness, Richness, and Balance These factors indicate how fully the dataset represents a desired population, activity, or other variable of interest. Completeness is the fraction of a given population represented within the dataset. Richness refers to the amount of data available for each entity (e.g., an individual or organization). Balance is a measure of how prevalent a given phenomenon is (or is estimated to be) in the dataset; for example, in a healthcare analytics context, a dataset including all in-hospital patients would have a different readmission rate than a dataset of patients with chronic diseases, since comorbidities are presumably more prevalent in the latter dataset. Labeling Labeling indicates to what extent objects, events, and other phenomena of interest are explicitly marked, e.g., an X-ray image with notes attached from a radiologist indicating whether a tumor was malignant or benign. Healthcare Data Features and Implications The data collected in healthcare operations exhibits tremendous variety and complexity, but fairly low velocity compared to extremely highvelocity applications such as credit-card processing. Similarly, although adoption of EHRs generates an unprecedented volume of data in healthcare delivery, the volume can be accommodated by technologies developed for much larger datasets managed in transactional systems, such as airline passenger data repositories. Because healthcare data are so diverse and information-rich, the innovative, yet relevant, use of analytics must be a key aspect of decision-makers approach to understanding and intelligently using their data. Simple and common data queries and matching of explicitly labeled features (typical firstorder approaches) are prohibitively time and resourceconsuming, and quite often miss a large percentage of the useful intelligence that lies in healthcare data, waiting to be revealed and acted upon. 7 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

Data Science Framework and Process Leidos has developed, implemented, and refined a framework for the practice of data science through extensive experience applying data science techniques to real-world problems in healthcare, national security, intelligence, commerce, and other fields. Success in these distinct domains relies on a framework of practice that is distinct from the general concept of big data in that it recognizes the tight coupling among three skill areas: ffstrong domain expertise and knowledge of the available data resources ffanalytical skills and techniques, based on mathematics, statistics, and machine learning ffcomputer science and programming skills. This framework comprises four fundamental stages, each of which is underpinned with a firm understanding of the business or clinical problem and healthcare delivery environment, though they may not proceed entirely linearly. As depicted in figure 1, the data science framework of Leidos is founded in resolving a business problem identified by decision-makers and results in actionable insight tied to this problem. It is not uncommon to proceed through each stage to establish a basic data science workflow, then revisit earlier stages to accommodate shifts in the problem definition or environment resources, new data sources, additional analyses, and explore new information. Data Collection Collection Data collection encompasses the activities from the moment raw data are gathered from their originating sources to the time of their placement (possibly in a transformed state) in repositories appropriate for storage, retrieval, and analysis. It also includes the planning and systems required for ongoing management and security of the collected data. Explanation INSIGHT AND ACTION Curation Data Curation Data curation, where data are prepared for meaningful analysis, is perhaps the most important step in determining whether a data science project has the potential to produce measurable business value. Analysis Figure 1. Data Science Framework The data science process should result in actionable insight for business problems. This stage includes, but is much broader than, the traditional extraction, transformation, and load (ETL) methods of both commodity and custom-developed big data management systems. Specifically, the first phase of data curation involves standardizing the raw data to uniform specifications. This typically involves data cleansing, in which potential errors (e.g., data corrupted in collection, data values outside meaningful or permissible ranges) are repaired 8 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

and/or removed from the dataset, as well as reformatting and conversion of data values to standard units and formats. It may also include transformations such as conversion of audio files to text transcripts. The second phase of data curation prepares standardized data for analysis. Data of different types and from different sources may be merged, aligned and linked, or aggregated into new records specific to an activity, process, or organization. Large datasets may be sampled to create smaller datasets that are more amenable to automated analysis and/ or to correct statistical imbalances in the full dataset. Sampling may be inappropriate for some analysis activities, such as rare event or anomaly detection. The dataset may be further enriched by the addition of new features generated from collected data. These features may be as simple as summary statistics (e.g., mean and standard deviation) computed from a numerical dataset, or as complex as a map generated from cellphone data of a criminal suspect s locations leading up to the commission of a crime, or medication data of pharmacy locations that visualizes this one factor of patient engagement. Enrichment may also generate new representations of data, such as a network graph of clinical protocols and medications administered among patients created from clinician order entry records. Alternate data representations enable privacy protection, especially when an analytics project is not allowed the same access to data as the process that originated the data. Computing these features is a key part of data curation. It relies on data scientists ability to explore and understand the data from both mathematical and business-problem-specific perspectives. They will use tools including SQL queries, off-the-shelf statistical packages, and multidimensional visualization applications to explore the data thoroughly, looking at both individual records and summary statistics and visualizations. At the end of this stage, counts and lists of sets of prearranged record attributes can be gathered for direct communication with a data analyst or processed by more sophisticated analysis methods intended to generate findings that detect, identify, and explain previously unknown patterns. Finally, data curation is where privacy protection is implemented, especially when an analytics project is not allowed the same access to data as the process that originated the data. Data Analysis This stage includes identification, selection, and execution of algorithmic and computational models and methods. It typically involves extension and novel integration of existing analytical approaches rather than the creation of new analytical methods. Successful data analysis solutions do not take a kitchen sink approach of applying analysis methods indiscriminately and hoping for good outcomes. Instead, they are driven by the organization s need for insightful and actionable intelligence and focus on producing relevant and accurate evidence informing such action. 9 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

As a general category, business intelligence analysis may use any of the many types of analysis to answer the question What is going on in my business today? It commonly focuses on obtaining answers to prescribed questions (e.g., How is my administrative budget tracking against plan? ) and/or measuring known performance criteria (e.g., readmission rates), rather than searching for novel or unknown meaning in the data. The paragraphs below describe several common analysis methodologies. ffinformation Retrieval Information retrieval uses direct manual or automated queries of a dataset to answer the question What in this dataset is relevant to the query? It typically relies on matching the query to explicitly represented features of the data (e.g., a patient s name) without inference or prediction based on hidden features (e.g., an undiagnosed disease that could be inferred from the patient s chart). ffclassification This method identifies to which of a prescribed set of classes a newly observed data point belongs. The classes may be explicitly defined by users, or may be learned by a classification algorithm based on other features of the data. ffclustering Clustering is similar to classification, but labels data points based on their similarity (in one or more features) with each other. Data points are then grouped into likely classes based on metrics that may be deterministic (e.g., hierarchical and k-means), statistical (e.g., multivariate distributions and likelihood maximization), or network graphbased (e.g., centrality and between-ness). ffpattern Recognition and Matching Pattern matching seeks to identify trends, sequences, and patterns in datasets. The patterns may be explicitly defined in advance, learned directly from data, and/or manually refined after initial learning from data. ffanomaly Detection Anomaly detection is the reverse of pattern matching looking for outliers from a known pattern, rather than the pattern itself. Anomalies may be structural, in which certain entities behavior is significantly different from that of peer entities (e.g., finding patient zero in a disease outbreak or fraudulent insurance claim filings), or temporal, when entities behavior changes significantly over time (e.g., the methods used by health payer companies to flag potentially fraudulent claims). ffpredictive Modeling Predictive modeling not only attempts to forecast future events, it also seeks to identify ways to optimize future outcomes (e.g., comparing a patient outcome forecast with and without a given medical intervention). The underlying characteristics of the original data determine the types of meaningful predictions and optimizations that are possible to generate. f f Multirelational Network Analysis Multirelational networks explore the connections and relationships among data points and entities represented in the data. Network analysis typically focuses on social networks, which analyze links among groups of people (e.g., provider referrals, patient satisfaction), and semantic networks, in which connections occur between types of knowledge (e.g., relationships between medical literature and the concepts and conclusions contained in clinicians notes). 10 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

Explanation via Interpretation and Communication of Analytic Findings The end game of a decision support solution based on data science practices is to provide decisionmakers with the intelligence they need to take action. This can happen only if leaders have timely access to clear, accurate, and relevant data analytic conclusions in a form that aligns with their missions, needs, and expectations. Communicating analytic conclusions may employ any of the following strategies. An example would be application of forensic analysis, for example longitudinal analysis of a drug s efficacy in clinical trials, to understand and explain how the drug might affect readmission rates, with the goal of informing a hospital s choice to incorporate it into their standard of care. ffdata Summarization Summarization distills key analytic findings relevant to a project s objective and aligned with the consumer s role and experience into a simpler and more concise form than a full report essentially an executive summary. Analytic summaries should focus on the key aspects of analytic findings while enabling examination of the underlying evidence and analysis logic. ffsynthesis of Analytic Findings Synthesis unites seemingly separate findings, whether complementary or conflicting, to explain their combined effect on the decision-making process. Given the diversity of data types and levels of accuracy, timeliness, and relevance, synthesis can be the most difficult to accomplish effectively. However, it is crucial to leaders understanding, confidence, and ability to select appropriate actions, including a decision that the data are insufficient to warrant a change to current operations. ffnarrative Creation Narratives place analytic findings in a larger context relevant to the target audience(s), who may have roles, experience, and perspectives different from data scientists, or even leaders of the organization performing data science efforts. An example would be an explanation of how decreasing readmission rates for a particular medical condition would likely affect (positively or negatively) not just the hospital where care is given but also the surrounding community. ffforensic Analysis Forensic analysis examines historical data to develop analytic findings that may provide insights on future plans over multiple time scales. Longitudinal analysis of a drug s efficacy in clinical trials, with the goal of informing a hospital s choice to incorporate it into their standard of care, is an example of forensic analysis. f f Visualization Visualization creates image-based static or dynamic representations of data and analytic findings (e.g., line charts, histograms, maps, network diagrams) to complement direct examination of the data themselves. These visual representations may also have interactive elements that enable viewers to select data subsets, observe data patterns over time, and otherwise perform basic analysis. Visualization requires care in selecting data types, as well as the cultural tolerance and acceptance of visuals as actionable evidence, but can be a powerful method of communicating findings and recommendations. 11 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

USE CASE 1: Public Health Influenza Data Sharing Homeland Security Presidential Directive 21 9 states the necessity of establishing a national biosurveillance system for human health that integrates state, regional, and community-level capabilities. To better support early detection and rapid response to major infectious disease outbreaks or pandemics, the Centers for Disease Control (CDC) wanted to expand the real-time exchange of clinical health information among health information exchanges (HIEs) and local, state, and federal public health organizations. CDC awarded Leidos a contract to design and develop an integrated HIE to connect public health with HIEs to improve public health s real-time understanding of communities population health and healthcare facility status. 10 Leidos led a team of bioinformaticists, systems engineers, HIE operators, and subject matter experts in state and local public health requirements to assemble the Northwest Public Health Information Exchange (NW-PHIE) to help achieve this goal. A key challenge was establishing a real-time, multidirectional information flow across federal, state, and local jurisdictions. Until this project, biosurveillance platforms such as RODS 11, ESSENCE 12, and BioSense 13 collected de-identified biosurveillance information, but did not share or visualize these data across platforms. No syndromic surveillance data feeds were being sent to public health from hospitals in central and eastern Washington State. The starting points for NW-PHIE s Collection syndromic surveillance requirements were the American Health Informatics Community (AHIC) Biosurveillance Use Case and the MBDS. NW-PHIE augmented the AHIC requirements with additional requirements from local and state public health agencies. The Leidos team developed a data format specification that described how the highly diverse clinical data would be transmitted to public health. We followed best practices and industry standards, such as those from HL7. Leidos next mapped the data fields in Curation the HL7 messages to the clinical care information provided by the various data sources to meet the data output specifications. Through these alignments, information for a single patient is split into many discrete biosurveillance HL7 messages triggered by events at a healthcare facility, such as registration, lab order, lab result, etc. To make clinical data useful for population health purposes, several analyses were Analysis needed. First, individual messages are reassembled into a longitudinal, comprehensive view of a visit encounter. Next, the encounters are classified according to surveillance criteria, counted over a fixed time interval, and paired with an appropriate denominator, such as total visit volume or catchment population. From these data, absolute counts and rates of illness within the patient population are obtained. 9. http://fas.org/programs/bio/resource/documents/hspd-21.pdf. 10. M. Trebatoski, J. Davies, D. Revere, D. Dobbs, Methods for Leveraging a Health Information Exchange for Public Health: Lessons Learned from the NW-PHIE Experience, Journal of Public Health Informatics, ISSN 1947-2579 * http://ojphi.org * Vol.2, No. 2, 2010. 11. Real-time Outbreak and Disease Surveillance Laboratories (RODS), https://www.rods.pitt.edu/site/ 12. The ESSENCE II Disease Surviellance Test Bed for National Capital Area, Jospeh Lombardo, http://www.jhuapl.edu/techdigest/td2404/lombardo. pdf. 13. http://www.cdc.gov/biosense 12 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

To assist public health departments in evaluating the clinical data, we developed Explanation tools and algorithms to help visualize data and support the development of appropriate public health responses. The resulting analysis is fed back to clinicians via health alerts, investigation findings, and case criteria updates. Now, thanks to NW-PHIE, 16 hospitals (representing close to a million patient encounters a year) are sending a full set of syndromic surveillance data to public health. These feeds proved to be vitally important in helping manage the 2009 H1N1 outbreak and are available to help manage future disease outbreaks. 13 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

USE CASE 2: Analysis of Congestive Heart Failure Readmission Rates Patients with a diagnosis of congestive heart failure (CHF) pose a challenge to health delivery systems. According to the Centers for Medicaid & Medicare Services (CMS), the rate of readmission within 30 days of discharge for CHF patients ranges from 18.2% to 33.8% 14, compared to the average readmission rate of approximately 24.8% for all patients in the United States. CMS views these variations as an opportunity to decrease the rate of CHF readmission. Beginning in October 2012, CMS has penalized payments to health systems that exceed the average rates of readmission. Other diagnosis categories, such as hypertension, also trigger penalties from CMS when readmission rates exceed prescribed norms. To address this business challenge, Leidos conducted a research and development effort to study whether our information-based health approach to clinical challenges could be used to create a model that could calculate an accurate CHF readmission probability based on a broad set of clinical and non-clinical factors. A multidisciplinary team including experts in commercial and federal healthcare, health informaticists, physicians, scientists, statisticians, and mathematicians conducted initial research into causes of CHF readmission.they identified three types of factors that may affect readmission of CHF patients: patient-specific factors (clinical and demographic), hospital-specific factors (practices and continuity), and socioeconomic factors (support and access). Our team used these SME findings to develop our pilot prediction model for Collection CHF readmission rates, in the form of a Bayesian inference network (INET) model with multiple nodes and relationships (links). Each node depicted on a network graph represented a grouping of data that affects CHF readmission (directly or indirectly), such as whether a patient has extensive comorbid conditions or a social support system after discharge. The model, which includes multi-path relationships (chains of inference), then generates predictive values by statistically weighting and combining the complementary or competing chains of inference. Curation Note that this use case focuses on analysis; requirements for data collection and curation would need to be considered based on a specific implementation of this proof-of-concept experiment. However, in preparation for a decision on technology investment into data Analysis collection and curation, the team also investigated whether electronic patient data were available to populate the model s parameters directly. Figure 2 illustrates the availability of EHR data in our statistical analysis of more than 5,000 CHF patient encounters in a 10-hospital system between 2010 and 2011. 14. CMS Hospital Compare Database, January 26, 2012, http://www.hospitalcompare.hhs.gov/hospital-search.aspx 14 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

# OF PATIENT FACTORS 50 40 30 20 10 0 Availability of EHR Data for Patient Factors 6 10 6 Not Collected Unstructured Text Discrete/Coded 32 26 4 6 6 All Typically Collected Sometimes Collected Not Collected FACTOR COLLECTION CATEGORY Figure 2. Availability of EHR Data for Patient Factors Having constructed the network and populated it with data, the team Explanation investigated a set of outcomes for three use cases identified by the team s clinical SMEs, using historical hard CHF data and subjective, but expert, judgments. The focus group members reviewed the outcomes against their own CHF readmission experience and concluded that the INET model would be a useful decision-support tool that provides insight for hospital clinical and administrative staff members responsible for identifying CHF patients at risk prior to discharge. The INET model s findings including outcome likelihood (chance the patient would readmit) and top N indicators that, if addressed correctly, would reduce the readmission were presented to a focus group consisting of physicians, executive leaders, informaticists, and statistical analysts from a 10-hospital university health system and a large, innercity safety net hospital. 15 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

Conclusion Healthcare organizations are transitioning to EHRs en masse for the clear benefits they offer in improving collaboration and information sharing, reducing inefficiency and administrative effort, and ultimately improving patient care. Implementation progress varies widely, but many organizations are already realizing some level of benefit simply by using EHRs instead of paper records. As healthcare organizations grow, adapt, and serve changing populations, the data science mix of methods can be expanded or tailored (including scaled-out methods to meet big data needs) to keep delivering benefits. At every point, data science can be harnessed in different ways to derive useful, actionable information from the complexity of data in EHRs. Data science is a deep, flexible, customizable next level of opportunity for organizations using EHRs. Healthcare decision-makers can choose the depth, breadth, and complexity of the data sharing, analytics, and explanation tools they deploy based on their missions, budgets, and patients needs. 16 DATA SCIENCE SOLUTIONS FOR HEALTHCARE

ABOUT LEIDOS Leidos is a FORTUNE 500 science and technology solutions leader working to address some of the world s toughest challenges in national security, health and engineering. The Company s 20,000 employees support vital missions for our government and the commercial sector, develop innovative solutions to drive better outcomes and defend our Nation s digital and physical infrastructure from new world threats. Headquartered in Reston, Virginia, Leidos reported annual revenues of approximately $5.77 billion for its fiscal year ended January 31, 2014 after giving effect to the spin-off of the company's technical services and information technology business. For more information, visit www.leidos.com. FOR MORE INFORMATION Julie A. Rosen, Ph.D. Chief Scientist, Leidos Health Julie.A.Rosen@leidos.com Visit us at www.leidos.com/health 15-0350 17 DATA SCIENCE SOLUTIONS FOR HEALTHCARE