Big Data Analytics for Mitigating Insider Risks in Electronic Medical Records



Bradley Malin, Ph.D.
Associate Prof. & Vice Chair of Biomedical Informatics, School of Medicine
Associate Prof. of Computer Science, School of Engineering
Vanderbilt University
21/8/2015

January 1, 2015: logged over 2,000,000 user interactions
[Figure: Alice's Electronic Medical Record]

January 2, 2015: logged over 2,000,000 user interactions

January 3, 2015: logged over 2,000,000 user interactions

Auditing Requirements, Federal (US)
1. Access control
2. Track & audit employee accesses
3. Store logs for 6 years

How (Not) to Use Access Control
Central Norway Health Region enabled "break the glass"
- The glass was broken for 1/2 of 99,000 patients
- 1/2 of 12,000 users broke the glass
- ~300K events in 1 month

Role               Users   Break Glass
Nurse              5633    36%
Doctor             2927    52%
Health Secretary   1876    52%
Physiotherapist     382    56%
Psychologist        194    58%

(Røstad & Øystein 2007)

Oct 2007: Palisades Medical Center, dozens of employees

July 8, 2011: UCLA, HHS investigation, $1 million fine

The Model is Wrong

Learning Suspicious EMR Access Behavior (Boxwala et al, JAMIA, 2011)
Manually select 505 potential cases / controls based on previous breaches at Partners Healthcare
Models: Support Vector Machines, Logistic Regression
Workflow:
- LABEL: Human experts label cases as + / -
- BUILD: Build classifier from labeled events
- PREDICT: Calculate the prediction probabilities using classifier on all events
- SELECT: New unlabeled events from DB

Learning Suspicious EMR Access Behavior (Boxwala et al, JAMIA, 2011)

Feature                         Coefficient   Odds Ratio
Works in the same department     3.16         23.5
Same street address              2.60         13.45
Same family name                 2.34         10.38
Over 200 accesses in a day       1.30          3.70
VIP Patient                      1.18          3.23
Same Zip Code                   -1.46          0.23
Is Provider                     -2.33          0.10
Care unit visit match           -3.40          0.03
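In a logistic regression, each odds ratio is the exponential of its coefficient. The sketch below checks the table above under that relationship. It assumes that the three features with odds ratios below 1 carry negative coefficients (the minus signs do not survive in the transcript); the dictionary and function names are illustrative only.

```python
import math

# Coefficients as listed on the slide. The negative signs on the last three
# entries are an assumption, inferred from their odds ratios being below 1.
coefficients = {
    "Works in the same department": 3.16,
    "Same street address": 2.60,
    "Same family name": 2.34,
    "Over 200 accesses in a day": 1.30,
    "VIP Patient": 1.18,
    "Same Zip Code": -1.46,
    "Is Provider": -2.33,
    "Care unit visit match": -3.40,
}


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


odds_ratios = {name: odds_ratio(beta) for name, beta in coefficients.items()}

for name, value in odds_ratios.items():
    print(f"{name:30s} {value:7.2f}")
```

Read this way, a positive coefficient raises the odds that an access is labeled suspicious, while a negative one (for example, a care unit visit match) lowers them.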

Role Refinement (Zhang, Gunter, Liebovitz, Tian, & Malin AMIA 2011)
Northwestern Memorial Hospital
3 months of access logs to inpatient records: 8K users, 16K patients, 1.1M accesses
Logged fields:
- User ID
- Patient ID
- User position
- Date / Time
- Number of Orders Entered
- Patient Location in hospital
- Service patient is on

Predictability is Job Dependent

MOST PREDICTABLE
Rank      Role                                      Accuracy   Users
1 (tie)   ED Assistant                              100%       26
1 (tie)   ED Physician CPOE                         100%       43
1 (tie)   NMH Resident/Fellow ID Clinic-CPOE        100%       10

LEAST PREDICTABLE
Rank      Role                                      Accuracy   Users
140       Patient Care Staff Nurse                  7.6%       1554
139       Rehab Occupational Therapist (OT)         14.3%      28
136       Patient Care Staff Nurse (Pilot)          22.1%      217

Where are We Going Wrong?

Actual Role                      Predicted Role                   Probability
Rehab Occupational Therapist     Rehab Physical Therapist         85.7%
Rehab Physical Therapist         Rehab Occupational Therapist     60.0%

Suspicious?

Suspicious or Anomalous?

Defining Access Control / Detecting Suspicious Behavior

January 1: EMR users linked if they accessed at least 1 patient in common (Malin, Nyemba, Paulett 2011)
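The linking rule on this slide can be sketched as a co-access network built from one day of logs. This is a minimal illustration, not the authors' implementation: the log format, the user and patient identifiers, and the function name are all hypothetical, and the threshold of at least one shared patient is an assumption about the rule.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical one-day access log of (user_id, patient_id) pairs.
access_log = [
    ("u1", "p1"), ("u2", "p1"), ("u3", "p1"),
    ("u2", "p2"), ("u4", "p2"),
    ("u5", "p3"),
]


def build_user_network(log):
    """Map each linked user pair to the number of patients they share."""
    users_by_patient = defaultdict(set)
    for user, patient in log:
        users_by_patient[patient].add(user)

    shared_patients = defaultdict(int)
    for users in users_by_patient.values():
        # Every pair of users who touched the same record gets an edge.
        for a, b in combinations(sorted(users), 2):
            shared_patients[(a, b)] += 1
    return dict(shared_patients)


network = build_user_network(access_log)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

In this toy log, u5 shares no patient with anyone and therefore has no edges.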

Mining to Model the System (Malin, Nyemba, Paulett 2011)
[Figure: users plotted on the 1st and 2nd principal components; groups labeled University Hospital and Children's Hospital]

Hypothesis!
- Collaborative systems are about social phenomena
- People should form communities
- We should be able to measure deviation from community structure
Note: other social phenomena could be studied (temporal workflow*, function invoked* if any, etc.)
(*Chen et al, IEEE TDSC 2012; Zhang et al. ACM SACMAT 2013; Zhang et al. ACM TMIS 2013)

Community Based Anomaly Detection (CADS) (Chen & Malin ACM CODASPY 2011)
[Figure: pipeline diagram with two stages]
Pattern Extraction: Access Logs, Social Relation Construction, User Communities
Anomaly Detection: Distance Measurement, User Specific Deviation Scores, Community Deviation, Deviation Measurement

Example 6-Nearest-Neighbor Network (1 day of accesses)
- The average clustering coefficient for this network is 0.48, which is significantly larger than the 0.001 expected for random networks
- Users exhibit collaborative behavior in the health information system
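For reference, the average clustering coefficient quoted on this slide is the mean, over all nodes, of the fraction of a node's neighbor pairs that are themselves connected. The sketch below computes it on a small hypothetical graph; the adjacency data and function name are illustrative and unrelated to the hospital network.

```python
from itertools import combinations


def average_clustering(adjacency):
    """Mean local clustering coefficient of an undirected graph.

    adjacency maps each node to the set of its neighbors.
    Nodes with fewer than two neighbors contribute 0.
    """
    if not adjacency:
        return 0.0
    total = 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue
        # Count neighbor pairs that are directly linked to each other.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


# Hypothetical graph: a triangle (a, b, c) with a pendant node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(round(average_clustering(toy_graph), 3))
```

A value far above what a random graph of the same size and density would give is what the slide reads as evidence of collaborative structure.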

Auditing Strategies of the Past
- Principal Components Analysis (PCA): graph based anomaly detection (Shyu et al 2003). How similar am I to spectral clusters of users?
- K Nearest Neighbor (KNN): nearest neighbor based anomaly detection (Liao et al 2002). How similar am I to my friends?
- High Volume Model (Gallagher et al 1998). Do I access way more people than my relations?
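To make the "How similar am I to my friends?" idea concrete, here is a generic nearest-neighbor anomaly score. It is a sketch under stated assumptions, not the method of Liao et al: each user is represented by the set of patients accessed, distance is Jaccard distance, and the score is the mean distance to the k closest users. All identifiers are hypothetical.

```python
def jaccard_distance(a, b):
    """1 minus the overlap ratio of two sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_anomaly_scores(patients_by_user, k=2):
    """Score each user by the mean distance to its k nearest other users."""
    scores = {}
    for user, patients in patients_by_user.items():
        distances = sorted(
            jaccard_distance(patients, other_patients)
            for other, other_patients in patients_by_user.items()
            if other != user
        )
        nearest = distances[:k]
        scores[user] = sum(nearest) / len(nearest) if nearest else 0.0
    return scores


# Hypothetical data: u4 accesses a patient nobody else touches.
patients_by_user = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p2", "p3"},
    "u4": {"p9"},
}

scores = knn_anomaly_scores(patients_by_user, k=2)
for user, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(user, round(score, 3))
```

Higher scores mean a user looks less like their closest peers; in this toy data u4 ranks first.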

Social Structure Wins the Day!
[Figure: ROC curves, True Positive Rate vs. False Positive Rate]

Gripes & Future Musings
- Different providers within the same ward have different behavior!
- Different wards within the same healthcare institution have different behavior!
- Different healthcare organizations use different languages!
- Logic (i.e., access control) and AI (i.e., data mining) need to play nicely together

Questions?
b.malin@vanderbilt.edu
Health Information Privacy Laboratory
http://www.hiplab.org/

High Confidence Rules

Rule                                                                  Support    Confidence   Weeks
Center for Patient & Professional Advocacy Hearing & Speech           0.000581   0.860        18
Practice City A Clinic City A                                         0.000193   0.673        21
Infectious Disease Clinic Infectious Disease                          0.000206   0.637        21
NICU Neonatology                                                      0.000613   0.629        17
VMG Family Practice Clinic City A                                     0.00132    0.628        21
Vanderbilt Hearing School Hearing & Speech                            0.00142    0.619        22
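The Support and Confidence columns are standard association-rule measures. The sketch below shows how they could be computed for a rule of the form "area A accessed a patient, so area B accessed the same patient". The slides do not define the transaction unit, so treating each patient as one transaction is an assumption, and the toy data and function name are hypothetical.

```python
def rule_metrics(areas_by_patient, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of patients accessed by both areas
    confidence = share of the antecedent's patients also accessed by the consequent
    """
    total = len(areas_by_patient)
    with_antecedent = sum(1 for areas in areas_by_patient.values() if antecedent in areas)
    with_both = sum(
        1 for areas in areas_by_patient.values()
        if antecedent in areas and consequent in areas
    )
    support = with_both / total if total else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Hypothetical week of accesses: patient -> organizational areas that accessed the record.
areas_by_patient = {
    "p1": {"NICU", "Neonatology"},
    "p2": {"NICU", "Neonatology"},
    "p3": {"NICU"},
    "p4": {"Anesthesiology"},
    "p5": {"Neonatology", "Anesthesiology"},
}

support, confidence = rule_metrics(areas_by_patient, "NICU", "Neonatology")
print(f"support={support:.3f} confidence={confidence:.3f}")
```

Under this reading, a rule can have very low support (few patients overall) and still have high confidence, which matches the pattern in the table above.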

Low Confidence Rules (but occur in at least 3 weeks)

Rule                                                      Support     Confidence   Weeks
Anesthesiology Vanderbilt Hearing School                  0.0000522   0.000581     6
Anesthesiology 4N Labor & Delivery                        0.0000526   0.000577     6
Anesthesiology Physician Liaison Program                  0.0000565   0.000574     4
Emergency Medicine Nutrition Clinic                       0.0000454   0.000572     4
Anesthesiology Cardiac Cath Lab                           0.0000590   0.000565     3
Emergency Medicine Diabetes Ctr                           0.0000458   0.000558     4
Anesthesiology Center for Clinical/Research Ethics        0.0000459   0.000528     7
Anesthesiology Infectious Disease Clinic                  0.0000454   0.000527     4
Anesthesiology Pediatric Immunology                       0.0000458   0.000514     4
Anesthesiology Mental Health Center                       0.0000453   0.000514     4

Big Data Audits Must Be Understandable to be Actionable

What Makes Sense?
- Dr. Smith's access of Peggy Johnson's medical record was strange
- Dr. Smith's access was 10 standard deviations away from normal behavior in his hospital
- Dr. Smith's access was strange because he is a neonatologist and he accessed the record of a 100 year old woman who, for the past year, has only been treated by gerontologists
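The second statement is the kind of output a purely statistical audit produces. As a rough illustration of where a figure like "10 standard deviations" could come from, the sketch below computes a z-score for one observation against a baseline. The baseline values and the choice of feature (some per-access behavior measure) are hypothetical.

```python
import statistics


def standard_deviations_from_mean(value, baseline):
    """How many population standard deviations `value` lies from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        raise ValueError("baseline has no variation; z-score is undefined")
    return (value - mean) / stdev


# Hypothetical baseline of a behavior measure for typical accesses in the hospital.
baseline = [2, 4, 4, 4, 5, 5, 7, 9]
observed = 25

z = standard_deviations_from_mean(observed, baseline)
print(f"{z:.1f} standard deviations from normal")
```

A number like this flags the access but does not say why it is odd, which is the contrast the third statement draws.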

So Do You Believe Inferred Patterns?

Hypothesis: Locally Knowledgeable of Class

Class                           Rule Set       High   Medium   Low
Anesthesiologists               Ane. Rules     10     10       10
Psychiatrists                   Psych. Rules   10     10       10
Coding & Charge Entry           Code Rules     10     10       10
Medical Information Services    MIS Rules      10     10       10

Survey
- Employees were presented with questions and asked to report the likelihood of rules on a 5 point Likert scale
- All employees were asked the same set of 120 questions (four sets of 30)
Example question: Someone from Anesthesiology accessed the record of patient John Doe. How likely is it that someone from the following organizational area accessed the same patient's record?
  Anesthesiology: Not at all / Slightly / Moderately / Very / Completely
  Psychiatry: Not at all / Slightly / Moderately / Very / Completely

Hypothesis: Locally Knowledgeable of Class
- Employees can distinguish between high, medium, and low for their own rules
- Example: Anesthesiologists evaluated with anesthesiology rules (Ane. Rules: High, Medium, Low)
- Tested hypothesis with linear mixed effects (LME) model

Hypothesis: Locally Knowledgeable of Class
Confirmed for every organizational area at 95% confidence level!

Area   Strength   p value
ANE    0.75       0.007
CODE   0.44       0.011
MIS    0.32       0.037
PSY    0.82       0.020

Predictability is Job Dependent
[Figure: Prediction Accuracy (0% to 100%) plotted against Number of Users in Role (0 to 2000); labeled points: Med Student CPOE, NMH Resident / Fellow CPOE, Patient Care Staff Nurse, Rehab OT]

Another Healthcare Environment
Vanderbilt EMR logs, 6 months
Arbitrary week:
- 2,500 users
- 35,000 patients
- 66,000 distinct <user, patient> accesses