Data-Driven Approaches to Prescription Medication Outcomes Analysis Using EMR
Nathan Manwaring
University of Utah
Master's Project Presentation, April 2012

Equation Consulting: Who We Are
Equation Consulting is a 45-person consulting firm based in Salt Lake City, UT, with a core focus on data-driven solutions to improve physician economics in hospital, private, and academic settings.
Our engagements are almost exclusively limited to hospital and physician billing data. Because of the Meaningful Use initiative of HITECH, however, more clients are asking about EMR data, and Equation is seeking to increase its exposure to clinical data.

Project Goals
Equation's goals were to:
1. Understand what information is available and discover possible applications in future products and projects that could be offered to Equation's clients. Measuring the effectiveness of antihypertensive medication was chosen as the first-look analysis.
2. Protect privacy by determining how to create meaningful research datasets that meet HIPAA requirements for de-identified data.
The researcher's goals were to:
1. Improve understanding of statistical techniques in data analysis, including the t-test, regression, and data-mining methods.
2. Work with Dr. Sheng to develop tools to identify potential adverse drug events from EMR data.

Background: What is EMR?
Electronic medical records (EMR/EHR) are computerized medical records kept in a hospital or physician's office. Major benefits over paper charts include more efficient and accurate data sharing, lower cost than storing cumbersome paper files, and the opportunity to prompt physicians with automated warnings before orders are performed.

Background: The HITECH Act and the EMR Explosion
Before Obamacare, Congress passed the Health Information Technology for Economic and Clinical Health Act of 2009 (HITECH), creating $34 billion in financial incentives for hospitals and physicians who make meaningful use of certified electronic health records (EHRs). The law also includes substantial payment reductions for those who are not meaningful users of health IT after 2015.
As of late 2011, physician EMR adoption ranged from 39% for solo practitioners to 77% for large multi-specialty practices.

Background: What to Do with All of This New Data?

Background: How to Protect Patient Privacy?
A Safe Harbor outlined by HIPAA for sharing patient data with researchers is to remove all PHI, that is, to create a de-identified dataset.
A de-identified dataset is created by removing all 18 elements that could be used to identify the individual or the individual's relatives, employers, or household members. There also must be no actual knowledge that the remaining information could be used alone or in combination with other information to identify the individual who is the subject of the information.
It is the goal of Equation and its clients to determine how to generate meaningful data extracts while still falling within these requirements for de-identification.
The 18 elements are:
1. Names.
2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP Code, and their equivalent geographical codes, except for the initial three digits of a ZIP Code.
3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, and date of death; and all ages over 89, except that such ages may be aggregated into a single category of age 90 or older.
4. Telephone numbers.
5. Facsimile numbers.
6. Electronic mail addresses.
7. Social security numbers.
8. Medical record numbers.
9. Health plan beneficiary numbers.
10. Account numbers.
11. Certificate/license numbers.
12. Vehicle identifiers and serial numbers, including license plate numbers.
13. Device identifiers and serial numbers.
14. Web universal resource locators (URLs).
15. Internet protocol (IP) address numbers.
16. Biometric identifiers, including fingerprints and voiceprints.
17. Full-face photographic images and any comparable images.
18. Any other unique identifying number, characteristic, or code, unless otherwise permitted by the Privacy Rule for reidentification.

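As a rough illustration of how a few of the 18 rules translate into a data transformation, here is a minimal Python sketch. It is not the extraction tooling used in this project. The field names (`name`, `mrn`, `zip`, `age`, `birth_date`) and the sample patient are hypothetical, and only rules 1-8 are partially covered. It also does not handle the regulation's additional requirement that three-digit ZIP prefixes covering 20,000 or fewer people be replaced with 000.

```python
from datetime import date

# Hypothetical field names; a real extract would map these to its own schema.
DIRECT_IDENTIFIERS = {"name", "phone", "fax", "email", "ssn", "mrn"}


def deidentify(record):
    """Apply a few Safe Harbor rules to one patient record (a dict)."""
    # Rules 1 and 4-8: drop direct identifiers outright.
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Rule 2: keep only the initial three digits of the ZIP code.
    if "zip" in clean:
        clean["zip3"] = str(clean.pop("zip"))[:3]

    # Rule 3: ages over 89 collapse into a single "90+" category.
    over_89 = clean.get("age") is not None and clean["age"] > 89
    if over_89:
        clean["age"] = "90+"

    # Rule 3: keep only the year of a date, and drop even the year when it
    # would reveal an age over 89.
    if "birth_date" in clean:
        birth_year = clean.pop("birth_date").year
        if not over_89:
            clean["birth_year"] = birth_year

    return clean


# Made-up example record, for illustration only.
patient = {
    "name": "Jane Doe",
    "mrn": "000123",
    "zip": "84112",
    "age": 93,
    "birth_date": date(1919, 2, 14),
    "systolic_bp": 148,
}
deidentified = deidentify(patient)
```

The clinical measurement (`systolic_bp`) passes through untouched, which is the point of the exercise: the identifying fields go, the research value stays.
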
Background: Why Hypertension?
Data perspective:
Blood pressure is one of the most basic and universal attributes collected by physicians using EMR.
Hypertension has a well-defined list of diagnosis codes that can easily be used to identify the primary patient population we wish to study.
Data collection doesn't require significant behavior change from providers, which increases data accuracy and availability.
Clinical perspective:
"...Even small improvements in blood pressure control can have major public health impact. A 1990 systematic review of 14 randomized treatment trials for hypertensive patients showed that lowering diastolic blood pressure (DBP) by 5 to 6 points reduced stroke rates by 42%. Another recent study showed that lowering DBP by only 2 points could result in a 6% reduction in the risk of coronary heart disease, along with a 15% reduction in the risk of stroke and one type of heart attack..."
http://www.ahrq.gov/qual/hypertengap.htm

Step 1: Preparing the Data
The Data
The source of this data is a subset of Epic's Clarity data warehouse. The subset contains data spread across 149 unique tables.
Items included in the analysis: patient demographics, medications, diagnosis codes, procedure codes, provider information, location and department information, dates of service, medication orders, etc.
The Process
The first goal was to generate a clean list of patients to serve as the starting universe for potential analysis. The criterion for inclusion is an encounter with a primary diagnosis of hypertension.
The second goal was to generate a dataset that ties together medications, BP readings, and physician encounters in an ordered way, so that consecutive items can be compared on a single line.
The grain of the dataset is a unique Patient/PrescriptionDate combination. Each row must include all medications prescribed, as well as detail on previous and future encounters and BP measurements from that time period.
Over 2,000 lines of SQL code were written to generate the desired tables.

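To illustrate the idea of placing the readings before and after a prescription on the same line, here is a small Python sketch. It is only an illustration of the lookup, not the project's SQL. The function name, the tuple layout, and the sample readings are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def readings_around_order(order_date, readings):
    """Find the BP readings just before, on, and just after an order date.

    `readings` is a list of (date, systolic, diastolic) tuples sorted by date.
    """
    dates = [r[0] for r in readings]
    start = bisect_left(dates, order_date)   # first reading on or after the order
    end = bisect_right(dates, order_date)    # first reading strictly after the order
    return {
        "previous": readings[start - 1] if start > 0 else None,
        "current": readings[start] if start < end else None,
        "next": readings[end] if end < len(readings) else None,
    }


# Made-up readings for one hypothetical patient.
bp = [
    (date(2011, 1, 5), 152, 96),
    (date(2011, 2, 10), 148, 94),
    (date(2011, 3, 15), 138, 88),
]
row = readings_around_order(date(2011, 2, 10), bp)
```

With the previous, current, and next readings side by side in one row, the change in BP around a prescription becomes a simple column subtraction.
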
Step 1: Preparing the Data

--------------------------------------------------------------------------------
-- 08 Create marker for the top 50 drugs (by simple generic code),
--    each residing in a separate column
--------------------------------------------------------------------------------
IF OBJECT_ID('TEMP_TOP_DRUGS') IS NOT NULL DROP TABLE TEMP_TOP_DRUGS
GO

-- Rank the simple generic codes by number of orders and keep the top 50.
SELECT TOP 50
    Order_MedicationSimpleGenericCode,
    COUNT(*) AS Cnt,
    ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS RowNum
INTO TEMP_TOP_DRUGS
FROM MedicationOrder
WHERE Order_MedicationSimpleGenericCode IS NOT NULL
GROUP BY Order_MedicationSimpleGenericCode
ORDER BY COUNT(*) DESC
GO

DECLARE @SimpleGenericCode VARCHAR(255)
DECLARE @ColumnName NVARCHAR(300)
DECLARE @SQLCode1 NVARCHAR(MAX)
DECLARE @SQLCode2 NVARCHAR(MAX)
DECLARE @RowNum INT
DECLARE @MAXRowNum INT

SET @RowNum = 1
SET @MAXRowNum = (SELECT MAX(RowNum) FROM TEMP_TOP_DRUGS)

-- For each top drug, add a 0/1 marker column to PatientOrder and populate it.
WHILE @RowNum <= @MAXRowNum
BEGIN
    SET @SimpleGenericCode = (SELECT Order_MedicationSimpleGenericCode
                              FROM TEMP_TOP_DRUGS
                              WHERE RowNum = @RowNum)
    -- QUOTENAME keeps the generated column name valid even if the code
    -- contains unusual characters.
    SET @ColumnName = QUOTENAME('Drug_' + @SimpleGenericCode + '_mrkr')

    SET @SQLCode1 = 'ALTER TABLE PatientOrder ADD ' + @ColumnName + ' TINYINT'
    EXECUTE sp_executesql @SQLCode1

    -- Pad MedicationList with a space on each side so the first and last
    -- codes in the list also match the space-delimited pattern.
    SET @SQLCode2 = 'UPDATE PatientOrder SET ' + @ColumnName
                  + ' = CASE WHEN '' '' + MedicationList + '' '' LIKE ''% '
                  + @SimpleGenericCode + ' %'' THEN 1 ELSE 0 END'
    EXECUTE sp_executesql @SQLCode2

    SET @RowNum = @RowNum + 1
END
GO

--------------------------------------------------------------------------------
-- 08 Create active medication list for all drug orders
--------------------------------------------------------------------------------
IF OBJECT_ID('TEMP_All_CSN_ID') IS NOT NULL DROP TABLE TEMP_All_CSN_ID

-- Collect every encounter (CSN) referenced around each order.
-- UNION already removes duplicates, so DISTINCT is not needed.
SELECT CSN_ID
INTO TEMP_All_CSN_ID
FROM (
    SELECT Order_BP_EncounterPrev04_CSN_ID AS CSN_ID FROM PatientOrder
    UNION SELECT Order_BP_EncounterPrev03_CSN_ID FROM PatientOrder
    UNION SELECT Order_BP_EncounterPrev02_CSN_ID FROM PatientOrder
    UNION SELECT Order_BP_EncounterPrev01_CSN_ID FROM PatientOrder
    UNION SELECT Order_BP_EncounterCurr00_CSN_ID FROM PatientOrder
    UNION SELECT Order_BP_EncounterNext01_CSN_ID FROM PatientOrder
    UNION SELECT Order_BP_EncounterNext02_CSN_ID FROM PatientOrder
    UNION SELECT Order_BP_EncounterNext03_CSN_ID FROM PatientOrder
    UNION SELECT Order_BP_EncounterNext04_CSN_ID FROM PatientOrder
) a
GO

ALTER TABLE TEMP_All_CSN_ID ADD MedList VARCHAR(500)
GO

DECLARE @Line INT
DECLARE @Max_Line INT

SET @Line = 1
SET @Max_Line = (SELECT MAX(LINE)
                 FROM db_0175_08_ods..pat_enc_curr_meds a
                 INNER JOIN TEMP_All_CSN_ID b
                     ON a.pat_enc_csn_id = b.csn_id
                    AND a.is_active_yn = 'Y')

-- Append one active medication per pass (medication line 1, 2, 3, ...),
-- building a space-delimited list of simple generic codes per encounter.
WHILE @Line <= @Max_Line
BEGIN
    UPDATE a
    SET MedList = ISNULL(a.MedList, '') + ' ' + d.simple_generic_c
    FROM TEMP_All_CSN_ID a
    INNER JOIN db_0175_08_ods..pat_enc_curr_meds b
        ON a.csn_id = b.pat_enc_csn_id
       AND b.line = @Line
       AND b.is_active_yn = 'Y'
    INNER JOIN db_0175_08_ods..clarity_medication d
        ON b.medication_id = d.medication_id
    -- Skip codes already in the list; matching on the space-delimited code
    -- avoids treating a code as present when it is only part of a longer code.
    WHERE CHARINDEX(' ' + d.simple_generic_c + ' ', ISNULL(a.MedList, '') + ' ') = 0

    SET @Line = @Line + 1
END

Step 2: Understanding the Data
High-Level Review of Data
Patient data: 28,178 patients; 29 demographics reviewed: median age 60 (range 40-80), 57% female, median weight 190 lbs, average of 7 BP measurements per patient.
Medication information: 6,818 drugs analyzed; 30,590 discontinued orders (potential adverse events).
Prescription data: 1.9 million unique prescriptions written; 20,568 unique order dates; top 50 drugs prescribed 19,151 times to the primary population.

Step 2: Data Analysis
Core Drug Comparison
Most common antihypertensive medications:
1. Lisinopril (LIS)
2. Hydrochlorothiazide (HCTZ)
3. Lisinopril-Hydrochlorothiazide combination drug
4. Furosemide
Exclude core drugs #1 and #4, because their unique patient populations make comparison less helpful.
Next Steps: Analysis
Compare all drugs: data mining and classification.
Analyze LIS and HCTZ: t-test comparisons, industry research, demographics analysis, outcomes analysis, detail.

Multiple Linear Regression: Introduction
Linear regression is a modeling approach that fits the closest linear relationship between one or more independent variables and a single dependent variable.
Regression modeling is facilitated through binary marker columns, each indicating the presence (1) or absence (0) of a particular independent variable, such as a medication.

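To make the marker-column idea concrete, here is a minimal Python sketch of ordinary least squares on binary marker columns, using only the standard library. It is not the software used in this project, which the slides do not name. The function name, the two drug markers, and the BP-change numbers are invented for illustration. The data are constructed so that the fit is exact: an intercept of -2, drug A at -8, and drug B at -5.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    `rows` holds the independent variables (here 0/1 drug markers);
    the returned list is [intercept, coefficient_1, coefficient_2, ...].
    """
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("marker columns are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Each row: (drug_A_mrkr, drug_B_mrkr); target: change in diastolic BP.
markers = [(0, 0), (1, 0), (0, 1), (1, 1), (1, 0), (0, 1)]
bp_change = [-2, -10, -7, -15, -10, -7]
intercept, coef_a, coef_b = fit_linear_regression(markers, bp_change)
```

Each coefficient is read as the estimated change in BP associated with that drug's marker being 1, holding the other markers fixed.
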
Analysis: Multiple Linear Regression
We initially attempted to model the relationship between many (14) medications and their impact on blood pressure.
While this method did successfully identify several drugs known to increase or decrease BP, it also indicated that furosemide increases diastolic BP.
Additional data scrubbing would show that furosemide is not associated with increased BP.
Conclusion
While we gain some useful information from linear regression modeling, it does not appear to be the most appropriate modeling method for this dataset.

Data Mining Classification: Introduction
Classification uses data-mining techniques (decision trees, neural networks) to assign an object to one of a set of predefined object classes.
Example: based on specific demographics, can we classify patients as churn or not?
Example: based on customer attributes, classify customers as will or won't purchase.

Analysis: Data Mining - Classification
We tried two data-mining models, the J48 decision tree and Naïve Bayes, to determine which patients would experience a favorable drop in their category of hypertension.
Variables given: Gender, Age, HypertensionDX_mrkr, DiagnosisGroup, DaysFromOrder, Weight, BMI, Temp, Pulse, DiastolicCategory, and the medication markers.
The J48 decision tree came closest, with only 65% of instances correctly classified.
This is definitely an area to continue exploring in the future.

Analysis: Lisinopril vs. Hydrochlorothiazide
Preliminary analysis shows comparable results from LIS and HCTZ, especially among the combined Category 2-3 populations. LIS has slightly better performance overall.

Lisinopril vs. Hydrochlorothiazide: T-test
The t-test is commonly used to assess the significance of an observed difference between the means of two samples.
The null hypothesis is that there is no difference between the two population means.
The t-test result (the p-value) is the probability of observing a difference at least as large as the one measured if the null hypothesis were true. It is not the probability that the null hypothesis itself is true.

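As a worked illustration of the calculation, here is a minimal Python sketch of a two-sample t statistic in its unequal-variance (Welch) form. The slides do not say which variant or tool was used, so this choice is an assumption. The sample values are invented. The p-value uses a normal approximation, which is reasonable only for large samples such as an EMR population. For a small sample like this one, a proper t distribution should be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """t statistic for the difference in means, allowing unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    return (mean(sample_a) - mean(sample_b)) / standard_error


def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation (large samples only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Made-up changes in diastolic BP after starting each drug.
lis_change = [-12, -8, -10, -9, -11]
hctz_change = [-7, -9, -8, -6, -10]

t_stat = welch_t(lis_change, hctz_change)
p_value = approx_two_sided_p(t_stat)
```

A small p-value says the observed gap between the two means would be unusual if the drugs truly had the same average effect. A large one says the data are consistent with no difference.
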
Analysis: Industry Research, Lisinopril vs. Hydrochlorothiazide
Two studies were found comparing the effectiveness of LIS and HCTZ at lowering blood pressure. Both found that LIS was 6-7 points more effective than HCTZ.
The EMR analysis shows only 2 points of difference between the impacts of the two drugs.
We need to examine potential causes of the difference between the EMR findings and the published research.
http://www.springerlink.com/content/r01716462m4864u8/
http://journals.lww.com/cardiovascularpharm/abstract/1987/00003/controlled_multicenter_study_of_the.10.aspx

Analysis: Difference 1 - Age Demographics
The study's age demographics significantly underrepresent the >64 category compared with the actual patient population.

Analysis: Difference 2 - Gender
The study includes significantly more males than the actual patient population.

Analysis: Impact of Demographic Variation is Important
Splitting the group comparison by gender reveals that males have less favorable outcomes from HCTZ than females.
T-test results suggest a high degree of similarity between the two medications for females, but not for males.
If the EMR data had demographics similar to the study's, the results of the comparison would be comparable.

Conclusion: Real World vs. Research
EMR research is not meant to replace clinical studies, nor does it compete with them. EMR tells the physician a different story:
EMR data story: What actually happens to my patient's BP when I prescribe this medication?
Clinical trial story: What happens in a highly controlled environment when a specific population of patients is given this medication?

Conclusion: Privacy Data Integrity Protection
After the initial SQL data scrubbing, none of the 18 Safe Harbor fields were required to complete the analysis, and very little value is lost by removing dates.
It is possible to construct a data extraction script that creates the Patient Summary table while retaining the order of, and the time between, encounters and measurements without including any actual dates.
Clients would control the execution of the extract and would certify that the end result is free from PHI.
By constructing special data extraction tools for clients, it's possible to both guarantee patient privacy and achieve rich data sets with high research value.
Furthermore, unless a specific query is created for a particular data set, it is unlikely that a hospital would be able to provide PHI-free data that retains this information.
Figure: Original Data vs. De-identified Data.

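One way to keep the order of, and time between, events without keeping any actual dates is to express each event as a day offset from the patient's first event. Here is a minimal Python sketch under that assumption. The function name, the event labels, and the dates are hypothetical, and the project's real extract was written in SQL.

```python
from datetime import date


def to_relative_days(events):
    """Replace each event date with days elapsed since the earliest event.

    `events` is a list of (date, label) tuples; the result is sorted by time
    and contains no calendar dates.
    """
    ordered = sorted(events, key=lambda e: e[0])
    first_date = ordered[0][0]
    return [((d - first_date).days, label) for d, label in ordered]


# Made-up timeline for one hypothetical patient.
timeline = [
    (date(2011, 3, 1), "BP reading"),
    (date(2011, 1, 10), "encounter"),
    (date(2011, 1, 24), "prescription"),
]
relative = to_relative_days(timeline)
```

The sequence and spacing of events survive intact, so before-and-after comparisons still work, while the dates themselves never leave the client's system.
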
Future Research Ideas
Framingham Study:
1. What are predictors of heart attack, stroke, diabetes, etc.?
2. What medications seem to have the best track record for reducing heart attack risk?
3. Which physicians seem to be the most efficient providers (best outcomes relative to patient cost)?
Adverse Events and Discontinued Medication:
1. What is the relationship between adverse events and discontinued medication codes?
2. Can we predict a decrease in medication effectiveness in key populations with higher adverse events reported?

Future Research: Adverse Events and Discontinued Medication
One possible factor influencing the comparison of LIS and HCTZ in large patient populations is the frequency of side effects, which can leave patients unable or unwilling to finish the prescription.
Figure panels: Lisinopril; Hydrochlorothiazide.

Future Research: Adverse Events and Discontinued Medication
EMR data captures information when a medication is discontinued due to dose adjustment, patient preference, allergic reactions, etc.
By analyzing this data, we can start to compare the picture to findings from the DrugInformer.com database.
The user of the tool simply sets the weighting criteria, and the Excel sheet then displays the top-ranked drugs by severity based on the chosen weighting.