Python, Cadastral Geocoding, and EMR Data

Similar documents

Network Analysis with Python. Deelesh Mandloi

Geocoding in Law Enforcement Final Report

There are various ways to find data using the Hennepin County GIS Open Data site:

Workflow improvement with FME in Skedsmo municipality

ESRI Technical Certification Overview. Amy Daniels Instructor, Greenville Tech

Getting Started With Mortgage MarketSmart

Creating Pivot Tables

About Reference Data

6 Steps to Data Blending for Spatial Analytics

7 Steps to Successful Data Blending for Excel

NRG Software Postal Tool Kit v1.5

EMR Technology Checklist

State of Israel Ministry of Housing and Construction Survey of Israel. system for the Israeli real estate market

ACTIVITY & LOCATION BASED ANALYTICS APPLICATIONS

A GIS Portrait of Childrens Health System of Texas Population Health. ESRI Health GIS Conference November 3-5, 2014 Colorado Springs, Colorado

Guide to Creating and Editing Metadata in ArcGIS for Publishing to the MSDIS GeoPortal

Brett Gaines Senior Consultant, CGI Federal Geospatial and Data Analytics Lead Developer

Data Warehousing. Jens Teubner, TU Dortmund Winter 2014/15. Jens Teubner Data Warehousing Winter 2014/15 1

Bridging Urban Design and 3D GIS Infrastructure

Valassis Lists Data as an Indicator of Population Recovery in the New Orleans Area

What's new in gvsig Desktop 2.0

GEOGRAPHIC INFORMATION SYSTEMS

Implementing Geospatial Data in Parametric Environment Elçin ERTUĞRUL*

Copyright Soleran, Inc. esalestrack On-Demand CRM. Trademarks and all rights reserved. esalestrack is a Soleran product Privacy Statement

GIS I Business Exr02 (av 9-10) - Expand Market Share (v3b, Jul 2013)

Data Integration for ArcGIS Users Data Interoperability. Charmel Menzel, ESRI Don Murray, Safe Software

Visualization of LODES/OnTheMap Work Destination Data Using GIS and Statistical applications

GEOGRAPHIC INFORMATION SYSTEMS

Note: Hands On workshops are Bring Your Own Laptop (BYOL), unless otherwise noted. Some workshops are Bring Your Own Mobile Device(BYOD).

ESRI Business Analyst for Telecommunications

XTIVIA, Inc. Vicinity for Salesforce Installation Guide

SCALABILITY OF CONTEXTUAL GENERALIZATION PROCESSING USING PARTITIONING AND PARALLELIZATION. Marc-Olivier Briat, Jean-Luc Monnot, Edith M.

ArcGIS for. Intelligence

NaviCell Data Visualization Python API

TABLEAU COURSE CONTENT. Presented By 3S Business Corporation Inc Call us at : Mail us at : info@3sbc.com

Implementing ArcGIS for SharePoint Habitat for Humanity of Omaha April, 2013

What is GIS. What is GIS? University of Tsukuba. What do you image of GIS? Copyright(C) ESRI Japan Corporation. All rights reserved.

Esri Maps for Salesforce and Microsoft Dynamics CRM

SIEBEL SALES USER GUIDE

Introduction to REDCap for Clinical Data Collection Shannon M. Morrison, M.S. Quantitative Health Sciences Cleveland Clinic Foundation, Cleveland, OH

PharmaSUG Paper DG06

Quality Assessment Strategies for GIS Reference Data Used in Patient Address Geocoding

Premium Forwarding Service Residential (PFS-Residential ) Application

Vector analysis - introduction Spatial data management operations - Assembling datasets for analysis. Data management operations

The CFPB and HMDA Changes. QuestSoft CFPB HMDA Webinar 8/19/

Spotfire v6 New Features. TIBCO Spotfire Delta Training Jumpstart

From Chaos to Efficiency in Medical Practice: Disruptive InDxLogicTechnology Rocks! Noreen Headley Systems & Revenue Cycle Manager

Data Interoperability Extension Tutorial

Optum Physician EMR Administration Module Setup Guide for Clinical Toolbar

Geocoding and Buffering Addresses in ArcGIS

Cardinal Health Specialty Solutions. Cardinal Health Geographic Insights Maximize Market Opportunity with Actionable Insights from Data Visualization

How To Improve Gis Data Quality

Solutions for Central and Federal Governments

DESIGN AND IMPLEMENTATION OF A WEB-BASED GIS FOR PUBLIC HEALTHCARE DECISION SUPPORT SYSTEM IN ZARIA METROPOLIS

Downloading SSURGO Soil Data from Internet

Mailing Services. Made Easy. Providing the information and tools for your mail pieces from start to finish.

My City Houston Map Viewer, Version 2 Web Site

Every Door Direct Mail

Welch Allyn Connectivity SDK Development

GIS Training Express

Going the Distance with Behavioral Health Core Measures Workflow Changes to Canopy. Go-Live: July 23, 2015

Spatial Analyst Tutorial

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

Put all your results in one basket. Orchard Pathology simplifies the complexities of clinical, molecular, and pathology testing and reporting.

Model, Analyze and Optimize the Supply Chain

Ag-Analytics Documentation

A. System. City of Perrysburg, Ohio URISA EXEMPLARY SYSTEMS IN GOVERNMENT AWARD

Cancer Mapper: Michigan s Interactive Mapping Tool

PHARMACEUTICAL BIGDATA ANALYTICS

Open Source Desktop GIS Solutions for the Not-So Casual User

Premium Forwarding Service (PFS ) Application

Report and Export Options

Creating Geospatial Metadata. Kim Durante Geo4Lib Camp

Creating a Tableau Data Visualization on Cincinnati Crime By Jeffrey A. Shaffer

Business Analyst Server

LYON COUNTY GEOMOOSE 2 HELP DOCUMENT

An Application in GIS for a Sanitary Sewer Overflow Emergency Response Program

Tool User Guide DATA PREPARATION TOOL USER GUIDE A.1 INTRODUCTION A.2 INSTALLATION. A.2.1 Minimum System Requirements

NetCDF and HDF Data in ArcGIS

What s new in TIBCO Spotfire 6.5

Texas Wildfire Risk Assessment Portal (TxWRAP) User Manual. Texas A&M Forest Service

Kodiak Island Borough GIS Online Tools

Title: GIS in School Administration: K-12 School Facilities Planning

Transcription:

Python, Cadastral Geocoding, and EMR Data Amy Hughes* 12 Sandi L. Pruitt 23 Tammy Leonard 14 * Presenter Supervising Author 1 University of Texas Southwestern Department of Clinical Sciences 2 University of Texas at Dallas School of Economics, Political, and Policy Sciences 3 Harold C. Simmons Comprehensive Cancer Center 4 University of Dallas Department of Economics

Contents Introduction Problem statement A Python & SAS Solution Results Next Steps

Introduction: Research Interests Population-Based Research Optimizing Screening Through Personalized Regimens (PROSPR) Cancer screening behaviors Outreach RCTs Observational studies Neighborhood effects Residential mobility

Introduction: Electronic Medical Record (EMR) EMR an electronic copy of a paper medical chart HITECH Act (2009) government mandate for EMRs and EMR systems Number of health benefits Offers new, more comprehensive data for observational research

Introduction: PROSPR Data EMRs from county safety-net hospital during 2005-2012 CRC screening exclusion restrictions Data enters the EMR when: Patient visits any clinic within the hospital A test or procedure is ordered A diagnosis is given Records can be viewed as An address history A medical history

Our Problem This particular dataset: Address cleaning of 275,000+ records Cadastral geocoding across 9 counties and 7 years Addition of ancillary data to medical record Future datasets: Data instability Interchangeability of data preparation method Quick turn-around

Our Solution: Workflow SAS 9.3 Clean address data Divide into years Send to geocoding Address/Moving analysis Screening behavior analysis Python Build parcels Standardize Merge ZIPS Single House locator Geocode by year Add ancillary spatial data Merge all years R Within-patient fuzzy matching Additional within-patient matches

Our Solution: SAS I 1. Address validation: Flag addresses that do not contain both a street number and a street name Flag addresses corresponding to P.O. Boxes Detect addresses spread across two fields and combine into one 2. Temporal filtering Group addresses based on date stamp for cadastral geocoding

Our Solution: Python I Problem: patient addresses span multiple counties 3. Assemble parcel files 4. Standardize parcel addresses 5. Merge to cover entire metro area 6. Spatially merge in ZIP codes for cadastral zone geocoding

Our Solution: Python II 7. Cadastral zone geocoding Patient address files (by year) are geocoded using parcels from the following year Use ZIP codes as zones ZIP codes are used to score matches, but not to limit the search for matches Allows for correct disambiguation of the same street address in different counties 8. Merge results from all years

Our Solution: R Problem: inconsistent data entry The same address for a patient is recorded with different spellings across clinic visits 9. Fuzzy matching within-patient Unmatched addresses are compared to matched addresses for the same patient using Levenshtein string distance If the distances is small enough, we manually match the address to circumvent the typo

Our Solution: SAS II 10.Unique address analysis Use distance between geocoded points to determine unique addresses across years Use within-group analysis in SAS to calculate unique address summary stats for each patient 11.Screening behavior analysis

Our Solution: Implementation Currently not a geoprocessing tool Scripts are run within SAS analysis file Execution of Python and R scripts via batch to Windows command line Entire process requires the click of a single button for reproducibility

Results Can quickly geocode many addresses to the parcel level across 9 counties and 7 years Creation of input-driven code enables: Use of the same process for future datasets Analysis of data at the cadastral level from multiple EMR sources

Results Benefits: Account for changing structure of the urban area Use parcel attributes in subsequent screening behavior analyses Define our own neighborhoods for analyses Reproducibility Ease of implementation Speed

Results Drawbacks: Use of third-party software can be expensive Loss of data due to more rigorous geocoding processes Using dual-range locator, can geocode ~92% of all addresses (includes messy data) Using single-house locator and fuzzy matching across years, can geocode ~76% of data (so far)

Next Steps Development of geoprocessing tool and/or toolbox Increase ability to correctly geocode difficult addresses Customization of zone geocoding behavior in single house locator through XML Eliminate the use of R by conducting fuzzy matching in SAS

Acknowledgments: Project funded by CPRIT grant #PP100039 NCI grant #5U54CA163308 Special thanks to Bennett Grinsfelder and Erica Cuate at UT Southwestern Data provided through PROSPR at UTSW ESRI: http://www.esri.com SAS: www.sas.com R: www.r-project.org