Exploring the Challenges and Opportunities of Leveraging EMRs for Data-Driven Clinical Research

White Paper
Exploring the Challenges and Opportunities of Leveraging EMRs for Data-Driven Clinical Research
Because some knowledge is too important not to share.

Electronic medical records (EMRs) can facilitate faster and cheaper clinical research investigations. By collecting diagnostic, intervention, and outcomes data at all levels of care and across time, EMRs capture a richer picture of clinical effects, relationships, efficiency, and more. With increased clinical reliance on EMR systems, the question remains of how best to leverage EMR data for research purposes.

Data-driven research relying on EMR data must address two general areas of concern: data quality and data accessibility. These issues are the ones generally associated with secondary and exploratory analyses, and they take particular forms because EMRs are fundamentally designed as tools for patient care, not research. Managing them is becoming increasingly important as focus moves away from well-defined clinical variables (e.g., pulmonary function test score, mortality) and toward more complex care concepts. However, doing so can shift a researcher's attention away from scientific pursuits and onto data transformation tasks. Ultimately, this distraction must be mitigated with informatics and analyst tools that lie beyond off-the-shelf, commercial database software.

Data Quality and Validity

EMR-based research faces a number of data quality issues, largely those associated with secondary and observational analyses. Structured and unstructured data (e.g., discrete diagnostic codes and practitioner notes, respectively) are largely input into EMR systems by care providers. Ultimately, the reliability of this data depends on the precision, accuracy, and overall rigor of this data collection effort. EMR data is therefore not immune to relatively straightforward issues like human error and missingness.
More complex quality issues can stem from non-uniformity in the use of EMRs across healthcare networks, care settings, and individual practitioners. Younger physicians tend to employ EMR platforms more thoroughly, for example, as do healthcare networks serving higher-income populations [1]. Consequently, EMR data quality can be confounded by geography, socioeconomic status, and more. These factors pose major threats to the generalizability (i.e., external validity) of a research study's results. A researcher must therefore be cognizant of both inherent patient-to-patient variability and potentially significant practitioner-to-practitioner variability in data quality.

[1] J. Lin, T. Jiao, J.E. Biskupiak, & C. McAdam-Marx. Application of electronic medical record data for health outcomes research: A review of recent literature. Expert Rev. Pharmacoecon. Outcomes Res. 13(2), 191-200 (2013).
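As a rough illustration of the practitioner-to-practitioner variability described above, the sketch below profiles how often each field is missing per practitioner before any analysis begins. The record layout, field names, and practitioner identifiers are hypothetical and are not drawn from any particular EMR system.

```python
from collections import defaultdict

# Hypothetical extract: one dict per encounter; None marks a value never recorded.
encounters = [
    {"practitioner": "dr_a", "systolic_bp": 128, "pain_score": 4},
    {"practitioner": "dr_a", "systolic_bp": 135, "pain_score": None},
    {"practitioner": "dr_b", "systolic_bp": None, "pain_score": None},
    {"practitioner": "dr_b", "systolic_bp": 142, "pain_score": None},
]


def missingness_by_practitioner(rows, fields):
    """Return {practitioner: {field: fraction of encounters missing that field}}."""
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for row in rows:
        who = row["practitioner"]
        totals[who] += 1
        for field in fields:
            if row.get(field) is None:
                missing[who][field] += 1
    return {
        who: {field: missing[who][field] / totals[who] for field in fields}
        for who in totals
    }


rates = missingness_by_practitioner(encounters, ["systolic_bp", "pain_score"])
```

Large differences between practitioners in such a profile would suggest that missingness is not random, which bears directly on the external validity concerns raised above.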

EMR-based data-driven research must also consider construct validity, i.e., whether a set of values in an EMR database actually represents the phenomena of interest to a particular research project. As symptoms, diagnoses, and treatment components can overlap between health issues, as do their codes, researchers must find a way to distinguish which data is relevant to their individual needs. Similarly, EMR data reflects what comes up in an exchange between provider and patient, meaning information relevant to a research question may not be fully represented in EMRs. Furthermore, clinically meaningful data points are often not of the resolution preferred by researchers; for example, a patient reporting that she is experiencing pain is actionable information clinically, whereas a novel pain-related research study may have asked a patient to report pain level on a standardized 1-10 scale [2].

Accessing Meaningful Information

The task of actually extracting research-grade data from potentially fractured EMR databases is itself nontrivial. Many recent publications have relied on well-defined clinical outcomes (e.g., occurrence of a cardiac event) and covariates (e.g., vitals). These sorts of analyses take advantage of structured data, which can assume a set format (e.g., numeric values for weight) or one of a discrete set of values in a drop-down list, for example. However, up to 70% of clinically useful information is recorded in unstructured fields, such as physician notes input into text boxes [1]. The rate at which such unstructured clinical data has become available to researchers has outpaced the rate at which optimal methods to leverage it have been developed. This is largely due to the unrestricted nature of free text [3]: particularly with potential human error and syntax issues (e.g., acronym use, tense changes), reliable and comprehensive querying can be a major undertaking.
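The querying difficulty with free text can be illustrated with a small sketch. A naive keyword search misses notes that use an acronym or a misspelling, while even a crude normalization step recovers them. The abbreviation table, vocabulary, and notes below are invented for illustration; a real project would need a curated clinical lexicon and far more careful handling of ambiguous abbreviations.

```python
import re
from difflib import get_close_matches

# Hypothetical abbreviation table and vocabulary (assumptions, not a clinical standard).
ABBREVIATIONS = {
    "mi": "myocardial infarction",
    "htn": "hypertension",
    "sob": "shortness of breath",
}
VOCABULARY = ["hypertension", "myocardial", "infarction", "shortness", "breath"]


def normalize(note):
    """Lowercase, tokenize, expand abbreviations, and repair near-miss spellings."""
    tokens = re.findall(r"[a-z]+", note.lower())
    out = []
    for tok in tokens:
        if tok in ABBREVIATIONS:
            out.extend(ABBREVIATIONS[tok].split())
            continue
        # Attempt fuzzy repair only on longer tokens to limit false matches.
        match = get_close_matches(tok, VOCABULARY, n=1, cutoff=0.85) if len(tok) > 4 else []
        out.append(match[0] if match else tok)
    return " ".join(out)


notes = ["Pt has HTN, c/o SOB.", "History of hypertenson.", "Prior MI in 2009."]

# Naive search looks for the literal term; normalized search looks in the cleaned text.
naive_hits = [n for n in notes if "hypertension" in n.lower()]
normalized_hits = [n for n in notes if "hypertension" in normalize(n)]
```

Here the naive search finds no notes at all, while the normalized search finds the first two. The sketch deliberately ignores negation, tense, and context, which is part of why comprehensive free-text querying remains a major undertaking.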
Straightforward methods to manage such information accessibility challenges, such as having a subject matter expert annotate the free text, are expensive in man-hours and do not scale. Furthermore, even when focusing on structured data, EMR databases are designed to optimize queries that are patient-centric, not attribute-centric. This means that queries are optimized to return lists of patients seen by a practitioner on a given day, for example, rather than to return data on patients who experienced a specific set of symptoms and received a certain treatment. Consequently, queries to obtain focused research data sets can become computationally more difficult. This is particularly true if the queries involve logic or rule-based searches, such as returning data on patients whose baseline blood pressure fell within a certain range.

[2] S. Muller. Electronic medical records: The way forward for primary care research? Family Practice. 31(2): 127-129.
[3] P.M. Nadkarni, L. Ohno-Machado, & W.W. Chapman. Natural language processing: An introduction. J. Am. Med. Inform. Assoc. 18, 544-551 (2011).
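The contrast between patient-centric and attribute-centric queries can be sketched with an in-memory SQLite table. The schema, index, values, and the drug name are hypothetical; production EMR schemas are far more complex, and "baseline" is assumed here to mean a patient's earliest recorded encounter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE encounters (
    patient_id INTEGER, visit_date TEXT, practitioner TEXT,
    systolic_bp INTEGER, treatment TEXT
);
-- An index of this shape favors the patient-centric lookup below.
CREATE INDEX idx_practitioner_day ON encounters (practitioner, visit_date);
INSERT INTO encounters VALUES
    (1, '2014-01-05', 'dr_a', 152, NULL),
    (1, '2014-02-10', 'dr_a', 138, 'drug_x'),
    (2, '2014-01-05', 'dr_a', 118, NULL),
    (3, '2014-01-07', 'dr_b', 145, NULL),
    (3, '2014-03-01', 'dr_b', 141, 'drug_x');
""")

# Patient-centric: which patients did dr_a see on a given day?
seen = conn.execute(
    "SELECT patient_id FROM encounters "
    "WHERE practitioner = ? AND visit_date = ? ORDER BY patient_id",
    ("dr_a", "2014-01-05"),
).fetchall()

# Attribute-centric and rule-based: patients whose baseline systolic BP was
# between 140 and 160 and who received drug_x at a later encounter.
cohort = conn.execute("""
    SELECT b.patient_id
    FROM encounters AS b
    WHERE b.visit_date = (SELECT MIN(visit_date) FROM encounters
                          WHERE patient_id = b.patient_id)
      AND b.systolic_bp BETWEEN 140 AND 160
      AND EXISTS (SELECT 1 FROM encounters AS t
                  WHERE t.patient_id = b.patient_id
                    AND t.treatment = 'drug_x'
                    AND t.visit_date > b.visit_date)
    ORDER BY b.patient_id
""").fetchall()
```

The first query is a direct index lookup. The second needs a correlated subquery per candidate row to establish the baseline and another to check for later treatment, which hints at why cohort-style queries are more expensive on a store organized around individual patient visits.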

Unlocking the Power of EMR Data for Clinical Research

Faced with significant data quality and accessibility issues, how can the promise of EMRs for faster and cheaper clinical research be realized? The answer lies in emerging methods and customized tools that mitigate the data transformation demands placed on researchers, which could otherwise disrupt the actual pursuit of research.

In terms of managing unstructured data, structure is not necessarily the answer. As unstructured text fields tend to capture some of the most clinically relevant information, and as the archive of such EMR data grows, forcing structure risks losing valuable information. One computational approach, natural language processing (NLP), is increasingly being relied upon for processing unstructured EMR data. NLP brings together concepts from statistics, computer science, engineering, and clinical research to develop algorithms that automatically learn what is important information within unstructured text. This requires functionality for detecting the beginning and ending of words, grouping phrases into concepts, aggregating the most meaningful information into usable quantifications, and much more; this is done despite human error (e.g., misspellings) and complex syntax issues (e.g., abbreviations) within free text. While promising, the computational demands of NLP algorithms can approach the level of IBM's Watson computer, and efforts to cost-effectively introduce them into clinical research settings are therefore ongoing.

One approach for addressing data quality issues in both structured and unstructured data is to merge EMR datasets with those from other sources. Merging EMR datasets with medical claims data or pharmacy records, for example, can validate that a patient was prescribed a certain treatment. Similarly, identifying where an EMR database overlaps with medical registry information can give a subset of patients for whom some EMR data can be validated.
Such merging efforts could even result in a richer dataset than was provided by either source individually. However, coherently merging datasets often entails intensive legwork, as use of EMR software remains disparate across healthcare networks, clinical settings, and providers.

Given the legwork required to merge and curate databases, independent but overlapping efforts to do so are inefficient in a larger research context and highlight an opportunity to increase research productivity. With this in mind, an Architecture for Research Computing in Health (ARCH) strategy centralizes EMR, biobank, claims, electronic data capture, and other available data from diverse sources at an institutional level. By aggregating, organizing, and curating merged datasets at this level and producing local, customized datasets for individual research efforts, the data transformation burden is lifted from researchers and overall efficiency improves.
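The cross-source validation discussed in this section can be sketched as a set comparison between EMR prescription records and pharmacy claims. The identifiers and drug names are hypothetical, and the sketch assumes both sources already share a clean patient identifier, which in practice is often the hardest part of the merge.

```python
# Hypothetical extracts, each a set of (patient_id, drug) pairs.
emr_prescriptions = {("p1", "drug_x"), ("p2", "drug_y"), ("p3", "drug_x")}
pharmacy_claims = {("p1", "drug_x"), ("p3", "drug_z"), ("p4", "drug_x")}


def cross_validate(emr, claims):
    """Split records by whether the other source corroborates them."""
    confirmed = emr & claims      # recorded in both sources
    unconfirmed = emr - claims    # in the EMR only; no matching claim found
    claims_only = claims - emr    # in claims only; absent from the EMR
    return confirmed, unconfirmed, claims_only


confirmed, unconfirmed, claims_only = cross_validate(emr_prescriptions, pharmacy_claims)

# The union is the richer merged dataset that neither source provides alone.
merged = emr_prescriptions | pharmacy_claims
```

An unconfirmed record is not necessarily an error, since a prescription may simply never have been filled, so the split is best read as a prompt for review rather than a verdict.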

An ARCH strategy requires an informatics infrastructure not offered by off-the-shelf database platforms. RexDB by Prometheus Research is a customizable data repository specifically designed with clinical research in mind. This platform accepts clinical data from diverse sources as inputs, transforms it into usable forms, and provides localized, investigation-specific tools, datasets, and reports. These capabilities are clinically tested, as RexDB is the basis of a shared database infrastructure in a partnership with Weill Cornell Medical College and New York Presbyterian Hospital (NYPH): data from Epic EMR systems, Profiler Biobank, and Allscripts systems at the Center for Advanced Digestive Care and data from CompuRecord, Epic, and Allscripts systems from NYPH's anesthesiology department are loaded into a RexDB pipeline that aggregates and transforms the raw data to provide customized datasets for researchers as needed [4].

Subtleties and complexities associated with both clinical phenomena and raw EMR data itself make off-the-shelf data management platforms suboptimal tools for increasingly complex research. RexDB, however, is fundamentally based around a goal of facilitating data-driven clinical analyses. Its flexibility offers a scalable, customizable informatics infrastructure for diverse clinical research projects at both laboratory and institutional levels. The database configuration, straightforward querying, and automated reporting capabilities of the RexDB suite are also backed by a team of analysts supporting a researcher's data processing needs from beginning to end. Altogether, this technological and analyst toolbox takes on the legwork of turning raw EMR data into usable forms for clinical studies: despite the challenges of working with EMR data, RexDB lets researchers focus on research.

Ultimately, designing a good EMR-based data-driven study is not enough.
Care must be taken when implementing efforts to obtain research-quality datasets from EMRs. As interest in using EMRs for research purposes grows, so do demands for the tools to facilitate such efforts.

[4] S.B. Johnson, T.R. Campion, N.E. Pegoraro, L. Rozenblit, C. Tirrell, & C.L. Cole. An institutional strategy to support clinical research with centrally managed custom data repositories. American Medical Informatics Association 2014 Annual Symposium. Poster presentation (2014).

Additional Resources

US CORPORATE OFFICE
55 Church Street, 7th Floor
New Haven, CT 06510 USA

CONTACT US
+1 800 693 9057
+1 203 672 5800
contact@prometheusresearch.com

FOLLOW US
Twitter: @PrometheusRsrch
Facebook: www.facebook.com/prometheusresearch

WEB & MORE
For this and other white papers, academic presentations, and publications by Prometheus Research, please visit: www.prometheusresearch.com

RexDB is a registered trademark of Prometheus Research, LLC. Copyright 2015. All rights reserved.