Design and Implementation of an Automated Geocoding Infrastructure for the Duke Medicine Enterprise Data Warehouse
Shelley A. Rusincovitch, Sohayla Pruitt, MS, Rebecca Gray, DPhil, Kevin Li, PhD, Monique L. Anderson, MD, Stephanie W. Brinson, Jeffrey M. Ferranti, MD, MS
AMIA 2014 Joint Summits on Translational Science, April 7-11, 2014, San Francisco, California

Disclosure
Shelley Rusincovitch discloses that she has no relationships with commercial interests.
Learning Objective
After participating in this activity, the learner should be better able to understand and characterize the design, process, and implementation of automated geocoding within an enterprise data warehouse.

Introduction: GIS and a Geocoding Infrastructure
Geographic Information Systems (GIS)
GIS is widely used in industry and in the public health sphere. Its products include maps, location-specific analytics, and geospatial statistics. GIS has only recently been explored at the clinical practice and health system level, and it is fundamentally based upon geocoded data.
Context of the Duke Medicine Enterprise Data Warehouse

Enterprise Data Warehouse (EDW) Overview
The EDW contains data generated in healthcare delivery and reimbursement practices within Duke Medicine, including:
- Duke University Hospital (academic facility)
- Duke Regional Hospital (community hospital)
- Duke Raleigh Hospital (community hospital)
- More than 200 affiliated clinics
The EDW is supported by Duke Health Technology Solutions (DHTS), Enterprise Information Management (EIM). It is directed by Stephen Blackwelder, PhD, and is part of the Office of the Chief Information Officer, Jeff Ferranti, MD, MS.
Electronic Health Record (EHR)
Duke MAESTRO Care is an Epic platform-based EHR, implemented 2012-2014; deployment is now complete in all facilities.
MAESTRO: Medical Application Environments Supporting Transformation of Research and Operations

EDW Metrics
Electronic capture of the patient list extends back to 1979. Visits, diagnoses, and procedures for the main hospital are captured beginning in 1996.
As of March 2014:
- 4.37 million patients
- 91.62 million contacts
- 78.32 million orders
- 212.98 million results
An Enterprise Resource
The Duke Medicine EDW supports many purposes, including:
- Quality improvement initiatives
- Regulatory compliance reporting
- Clinical research
- Financial analysis and enterprise reporting

Automated Geocoding: The Foundation of an Infrastructure
Geocoding
Geocoding is the process of rendering textual address information into latitude/longitude coordinates. It is the basis for mapping and visualization, and it allows data from multiple sources to be integrated by geographic location. Sources include U.S. Census data, public health data, and data on the built environment, among others.

On-Demand Visualization from Our Self-Service Query Portal, DEDUCE
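Integrating data "by geographic location" typically means assigning each geocoded point to an area (for example, a census tract) and then joining area-level data on that area's identifier. The following is a minimal sketch of the assignment step using the standard ray-casting point-in-polygon test. The area identifiers and square boundaries are hypothetical, and a production system would use real boundary files and an established spatial library rather than this hand-written test.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray running east
    from pt; an odd number of crossings means pt lies inside."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through pt can be
        # crossed; this also guarantees y1 != y2 in the division below.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_to_area(pt: Point, areas: Dict[str, List[Point]]) -> Optional[str]:
    """Return the identifier of the first area containing pt, or None."""
    for area_id, polygon in areas.items():
        if point_in_polygon(pt, polygon):
            return area_id
    return None


# Hypothetical, adjacent square areas (longitude, latitude vertices).
AREAS: Dict[str, List[Point]] = {
    "AREA_A": [(-79.0, 36.0), (-78.9, 36.0), (-78.9, 36.1), (-79.0, 36.1)],
    "AREA_B": [(-78.9, 36.0), (-78.8, 36.0), (-78.8, 36.1), (-78.9, 36.1)],
}

if __name__ == "__main__":
    print(link_to_area((-78.95, 36.05), AREAS))  # AREA_A
    print(link_to_area((-78.85, 36.05), AREAS))  # AREA_B
    print(link_to_area((-77.0, 35.0), AREAS))    # None
```

The returned identifier is the key on which area-level tables (census, public health, or built-environment data) would then be joined to the patient record.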
The Geocoding Process
Geocoding is often semi-manual and iterative, requiring specialized personnel, and it is often deployed for subsets of data in a project-specific context. In contrast, our approach was an enterprise-level implementation:
- Design priority for scalability and economy of scale
- Maximize efficiency through automated address standardization and geocoding
- Enhance existing enterprise data assets and tools
- Keep protected health information (PHI) securely behind our firewall

PHI and GIS
Many elements of a patient address are PHI: HIPAA Privacy Rule, 45 CFR 164.514(e). http://www.gpo.gov/fdsys/pkg/cfr-2011-title45-vol1/pdf/cfr-2011-title45-vol1-sec164-514.pdf
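Address standardization, the step that precedes geocoding, rewrites free-text addresses into a consistent postal form so that equivalent addresses match the same reference record. The deployed system performed this with commercial tooling and a USPS knowledge base; the sketch below is only a toy illustration of the idea, assuming a hypothetical `standardize_address` function and a small subset of USPS Publication 28 street-suffix and directional abbreviations. It does not handle context (for example, a street actually named "North").

```python
import re

# Small subset of USPS Publication 28 abbreviations (illustrative only).
SUFFIXES = {
    "STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR",
    "BOULEVARD": "BLVD", "LANE": "LN", "COURT": "CT",
}
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def standardize_address(raw: str) -> str:
    """Hypothetical toy standardizer: uppercase, drop periods and commas,
    collapse whitespace, and abbreviate known directionals and suffixes."""
    text = re.sub(r"[.,]", " ", raw.upper())
    tokens = []
    for tok in text.split():  # split() also collapses repeated spaces
        tok = DIRECTIONALS.get(tok, tok)
        tok = SUFFIXES.get(tok, tok)
        tokens.append(tok)
    return " ".join(tokens)


if __name__ == "__main__":
    print(standardize_address("123 north  Main street."))  # 123 N MAIN ST
    print(standardize_address("45 Oak Avenue, Apt 2"))      # 45 OAK AVE APT 2
```

Because both spellings of an address collapse to the same string, a downstream geocoder can match them against a single reference entry.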
Design Decisions for Special Circumstances
Source data may lack veracity, and some addresses do not indicate an actual residence (e.g., a post office box).
Decision: Maintain a consistent level of precision and semantic definition. Data elements that cannot be processed at a sufficient level of accuracy are not written back to the EDW. As a result, the data are dependable and easily interpretable by non-GIS specialists.

Implementation
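The write-back rule in the design decisions can be expressed as a simple filter. The sketch below is illustrative only: the precision labels, the `GeocodeResult` record, and the choice of rooftop precision as the minimum accepted level are all assumptions, since the accuracy threshold actually used is not stated. Records that look like post office boxes, lack coordinates, or fall below the threshold are excluded rather than written back.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical precision labels, ranked from most (0) to least precise.
PRECISION_RANK = {"ROOFTOP": 0, "STREET": 1, "POSTAL_CODE": 2, "NONE": 3}
MIN_ACCEPTED = "ROOFTOP"  # assumed threshold, not taken from the source

# Matches "PO BOX", "P.O. Box", "Post Office Box" (case-insensitive).
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b|\bPOST\s+OFFICE\s+BOX\b",
                    re.IGNORECASE)


@dataclass
class GeocodeResult:  # hypothetical record layout
    record_id: str
    address: str
    precision: str
    lat: Optional[float]
    lon: Optional[float]


def eligible_for_writeback(r: GeocodeResult) -> bool:
    """Apply the consistent-precision rule to one geocoded record."""
    if PO_BOX.search(r.address):
        return False  # not an actual residence
    if r.lat is None or r.lon is None:
        return False  # geocoding produced no coordinates
    # Unknown labels rank 99, i.e. below any accepted precision.
    return PRECISION_RANK.get(r.precision, 99) <= PRECISION_RANK[MIN_ACCEPTED]


def select_for_writeback(results: List[GeocodeResult]) -> List[GeocodeResult]:
    return [r for r in results if eligible_for_writeback(r)]
```

Keeping a single threshold, instead of writing back a mix of rooftop and approximate locations, is what lets users who are not GIS specialists treat every stored coordinate the same way.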
Automated Geocoding
Deployed in August 2012 using SAS Data Management Studio version 2.1 with:
- United States Postal Service (USPS) knowledge base
- TomTom rooftop geocoding data pack
These reference data are refreshed on a quarterly basis.

Example of Process Stages
Process Diagram: Automated Address Verification, Standardization, and Geocoding

Results
- 4,080,966 patient address records (82.2% of total) have been verified and standardized.
- 87.7% of standardized patient address records have been geocoded.
- Geocoded records are 72.1% of total patient address records.
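The three reported results are linked: the geocoded share of all records equals the standardized share multiplied by the geocoded share of standardized records (0.822 x 0.877 = 0.721). A quick check with the reported figures is sketched below; the total and geocoded record counts it computes are back-calculated from rounded percentages, so they are approximations rather than reported values.

```python
# Reported figures from the Results slide.
standardized_records = 4_080_966
share_standardized = 0.822              # of all patient address records
share_geocoded_of_standardized = 0.877  # of standardized records

# Geocoded share of all records is the product of the two stage rates.
share_geocoded_of_total = share_standardized * share_geocoded_of_standardized

# Approximate counts implied by the rounded percentages.
approx_total = standardized_records / share_standardized
approx_geocoded = standardized_records * share_geocoded_of_standardized

print(f"{share_geocoded_of_total:.1%}")  # 72.1%
print(f"{approx_total:,.0f}")
print(f"{approx_geocoded:,.0f}")
```

This implies roughly 4.96 million patient address records in total and roughly 3.58 million geocoded, subject to rounding in the reported percentages.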
Discussion
Newly released IOM report on social determinants of health: "Capturing Social and Behavioral Domains in Electronic Health Records: Phase 1." Released April 8, 2014. http://www.iom.edu/reports/2014/capturing Social and Behavioral Domains in Electronic Health Records Phase 1.aspx
"A number of domains of special relevance to the social determinants of health can be characterized by use of the patient's residential address...the exponential growth of geocoded data sets will likely allow linkage to a large set of potentially useful variables in coming years." (page 70)
A Brief Example of Applications

Acknowledgements
The enterprise data warehouse and geospatial platform are products of the DHTS-EIM team and its collaborators. We gratefully acknowledge their work and individual contributions.
We thank the following individuals for their leadership:
- Robert Califf, MD, Duke Translational Medicine Institute
- Howard Shang, Duke Clinical Research Institute
We thank our partner organization for its support: SAS Institute (Armistead Sapp, Chris Ricciardi, David Franklin)
Questions/Discussion

Further Reading
Horvath MM, Winfield S, Evans S, Slopek S, Shang H, Ferranti J. The DEDUCE Guided Query tool: providing simplified access to clinical data for research and quality improvement. J Biomed Inform 2011;44(2):266-76. http://www.ncbi.nlm.nih.gov/pubmed/21130181
Ferranti JM, Gilbert W, McCall J, Shang H, Barros T, Horvath MM. The design and implementation of an open-source, data-driven cohort recruitment system: the Duke Integrated Subject Cohort and Enrollment Research Network (DISCERN). J Am Med Inform Assoc 2012;19(e1):e68-75. http://www.ncbi.nlm.nih.gov/pubmed/21946237
Danford CP, Horvath MM, Hammond WE, Ferranti JM. Does access modality matter? Evaluation of validity in reusing clinical care data. AMIA Annu Symp Proc 2013;278-83. http://www.ncbi.nlm.nih.gov/pubmed/24551337
Contact Information
Shelley A. Rusincovitch
Project Leader, Applied Informatics
Duke Translational Medicine Institute (DTMI)
shelley.rusincovitch@duke.edu