ANALYZING THE TEXT IN MEDICAL RECORDS: A COLLECTIVE APPROACH USING VISUALIZATION. By W H Inmon

With the rising costs of medicine and the advent of an aging population, there has never been a better time for accurate and thorough medical research. For years doctors and hospitals have treated patients and kept records of the treatments, examinations, and outcomes of the care given. For a given patient that information has been adequate. But there is a wealth of information that can be gathered when those medical records are examined collectively. Looking at many medical records together can yield insight into patterns of disease and conditions that may not be apparent when looking at just one or two records. Until now, however, looking at multiple medical records on a collective basis has been challenging.

A patient seeks medical care for many reasons: examinations, diagnoses, procedures, tests, emergency care, and many more.

Fig 1. There are many episodes of care: diagnoses, checkups, procedures, blood tests, emergency room, elective surgery, and others.

Every time the patient undergoes an episode of care, careful records are taken.

Fig 2. For every episode of care, a textual record is created. Sample record: date; patient; Jim Jones; Dr Bowles; Littleton Swedish; blood pressure 140/82; heartbeat 72 per minute; weight 205; height 5'11"; general health good.

The essence of these records is text that describes the intricacies of the encounter or episode of care. Sometimes the text describing the encounter is verbose. Sometimes it is terse. The amount of text and the nature of the language depend on the physician, the nature of the encounter, and many other factors.

Over time these medical records are collected by doctors, hospitals, and other agencies. For a given patient, the collection of records forms that patient's personal medical history. There is much value to the patient in these records. But there is even greater value when the records are examined collectively. When a research organization can examine 10,000, 100,000, or even 1,000,000 records at a time, patterns start to emerge that say a lot about disease and the human condition in general, not just about a given patient.

Over time medical records are collected, oftentimes from different sources.

Fig 3. Over time many records are collected.

It is customary for these records to be collected electronically. They are typically stored on conventional systems such as Microsoft NT, IBM DB2, or Hadoop, among others, usually on disk storage media.

Fig 4. Many records are collected electronically.

While electronic storage of medical records has many advantages and many valid uses, it has one major drawback: the records can be usefully accessed and analyzed only one patient at a time.

Fig 5. The problem is that the records can only be meaningfully retrieved a record at a time.

There are several reasons for this limitation. The first is that the records are stored as text. Standard technology does not handle unstructured text well. It handles structured data, numerical data, and transactions quite well, but when it comes to text, standard technology is good for storing the text and not for retrieving and analyzing it. The lack of structure in the text defeats many of the advantages of standard technology.

A second reason why standard technology does not lend itself to collective textual analysis is that most of the data resides on very different sources and technologies. One source of medical records is housed in Microsoft's NT, another in IBM's DB2, another in Hadoop, and so forth. These technologies were never designed to work seamlessly with one another. It is therefore no surprise that looking at medical records collectively is a real challenge when the records are scattered over different technologies, as they often are.

Another major challenge when medical records are examined collectively is the difference in terminology. Orthopedic surgeons call a broken bone one thing and general practitioners call it something else. Conversely, the same term can mean different things: to a cardiologist the abbreviation "ha" means heart attack, while to an endocrinologist it means hepatitis A. So merely throwing a bunch of medical records together is no guarantee that a collective analysis will yield anything meaningful. All of these problems with the integration of text, and more, must be surmounted if a collective analysis of medical records is to yield anything useful.

Fortunately there is a solution to the need for looking at medical records collectively. That solution is Forest Rim Technology's Textual ETL technology.

Fig 6. Textual ETL reads and edits the text found in the medical records.

Fig 6 shows that Forest Rim Technology reads medical records wherever they are found, on whatever technology they reside. Forest Rim Technology doesn't care whether the data comes from IBM, Teradata, NT, Oracle, or any other source. As long as it is electronically readable text, Forest Rim Technology can handle it.

After the medical records are read, terminology differences (synonyms and homographs) are resolved. Forest Rim Technology has sophisticated logic to handle the integration of different terminologies. The data from multiple medical records is integrated into a single whole. Further edits, such as stop word removal (e.g., a, an, the, what, to, as) and stemming, are performed to make the text pliable and ready for integrated analysis. The result is an integrated foundation of medical data that can come from any electronically readable source.
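The conditioning steps described above (homograph and synonym resolution, stop word removal, stemming) can be illustrated with a minimal Python sketch. This is not Forest Rim Technology's Textual ETL logic, which the paper does not disclose. The lookup tables, the `condition` function, the use of the physician's specialty as context, and the crude suffix stripping are all assumptions made for illustration.

```python
import re

# Hypothetical lookup tables; a real system would need curated medical vocabularies.
# Homographs: the same abbreviation expands differently depending on specialty.
HOMOGRAPHS = {
    "ha": {"cardiology": "heart_attack", "endocrinology": "hepatitis_a"},
}
# Synonyms: different phrasings mapped to one canonical term.
SYNONYMS = {
    "broken bone": "fracture",
    "myocardial infarction": "heart_attack",
}
STOP_WORDS = {"a", "an", "the", "what", "to", "as", "of", "and", "has", "had"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude suffix stripping, not a real stemmer


def stem(word):
    """Strip one common suffix, keeping at least three leading characters."""
    if "_" in word:  # canonical multi-word terms are left untouched
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def condition(text, specialty):
    """Return a list of normalized tokens for one medical record."""
    text = text.lower()
    # Resolve homographs using the specialty of the authoring physician.
    for abbreviation, meanings in HOMOGRAPHS.items():
        if specialty in meanings:
            text = re.sub(rf"\b{re.escape(abbreviation)}\b", meanings[specialty], text)
    # Map synonyms to a canonical term.
    for phrase, canonical in SYNONYMS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    tokens = re.findall(r"[a-z0-9_]+", text)
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

Under these assumptions, the same note, "Patient had an HA", yields a heart attack token when the author is a cardiologist and a hepatitis A token when the author is an endocrinologist, which is the kind of disambiguation a collective analysis needs before records can be compared.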
After Forest Rim Technology finishes editing and conditioning the data, it can pass the data on to a reporting engine, SeePower. SeePower takes the conditioned data and produces a special kind of visualization: a SOM, or self-organizing map.

Fig 7. From the foundation work that IDS does, SeePower creates a SOM (self-organizing map).

SOMs are a very special kind of visualization. They reflect the entire mass of data that has been read and conditioned. SOMs are capable of representing thousands of documents and millions of words and phrases. In addition, the SOM that is produced is dynamically accessible. The basic idea behind a SOM is to group together text that is related and text that is aggregated. Fig 8 shows a SOM.

Fig 8. A SOM indicates where there is a concentration of information and where there is a sparsity of information.

In Fig 8 the SOM shows that there is a concentration of information in one place and a sparsity of information elsewhere. In addition, the SOM shows that there is a continuum of information from one type of information to the next. All of the text that has been read, every word and phrase from all of the documents, is represented in the SOM.

As an example, suppose the medical records were from women aged 20 to 50. There would be concentrations of information from thousands of medical records about childbirth, monthly cycles, and menopause. There would be less information about smoking, broken bones, and obesity. And there would be very little information about rare blood conditions, rare bone conditions, and other rare disorders. The information that occurs regularly across the many medical records would appear grouped together as a dark spot in the SOM. The information that occurs very infrequently would appear as a light spot in the SOM.

One of the most useful aspects of the SOM is the ability to drill down.
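To make the idea of a self-organizing map more concrete, here is a minimal sketch of the general SOM technique in pure Python. It is not SeePower's implementation, which the paper does not describe. The function names, grid size, learning-rate schedule, and neighborhood schedule are all assumptions. Each input vector stands for one conditioned document, for example a list of term counts.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Return the grid cell whose weight vector is closest to vec (squared Euclidean)."""
    return min(weights, key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], vec)))


def train_som(vectors, rows=4, cols=4, epochs=50, seed=0):
    """Train a small self-organizing map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for vec in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            progress = step / total_steps
            rate = 0.5 * (1.0 - progress)  # learning rate decays toward zero
            radius = 1.0 + (max(rows, cols) / 2.0) * (1.0 - progress)  # neighborhood shrinks
            bmu = best_matching_unit(weights, vec)
            for cell, w in weights.items():
                grid_dist_sq = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist_sq / (2.0 * radius ** 2))
                # Pull this cell toward the input, more strongly near the winning cell.
                for i in range(dim):
                    w[i] += rate * influence * (vec[i] - w[i])
            step += 1
    return weights


def hit_counts(weights, vectors):
    """Count how many input vectors land on each cell of the trained map."""
    counts = {cell: 0 for cell in weights}
    for vec in vectors:
        counts[best_matching_unit(weights, vec)] += 1
    return counts
```

In this sketch, cells with high hit counts play the role of the dark spots described above (concentrations of information), and cells with few or no hits play the role of the light spots.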

Fig 9. With a SOM you can drill down all the way to the source document.

When an analyst drills down, the analyst selects one word or phrase and explores it and its relationship with other words further. In addition, the analyst can drill across, seeing what text is closely related to what other text. All of this analysis is done by moving a cursor across the SOM.

As an example, suppose the analyst finds an unexpectedly large number of cases of emphysema. The analyst can isolate those cases and look at them in many ways: by geography, by age, by gender, by weight, by smoking habits, and so forth. The drill down can go to as low a level of detail as desired. Furthermore, if a really deep analysis is required, the analyst can look at the source documents that the word or phrase came from. In the case of emphysema, the analyst can go down to the actual medical record itself.

In short, the SOM gives the analyst the capability of exploring and analyzing thousands of medical records all at once in a visual and natural mode of exploration. But perhaps the most interesting aspect of a SOM is the ability to show correlations of text from thousands of medical records together.
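The drill-down on emphysema described above can be sketched in Python as an index from words back to source records, combined with attribute filters. This is only an illustration of the idea, not how SeePower works internally. The sample records, field names, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "age": 67, "gender": "M", "smoker": True, "text": "chronic cough, emphysema suspected"},
    {"id": 2, "age": 54, "gender": "F", "smoker": True, "text": "emphysema confirmed by imaging"},
    {"id": 3, "age": 41, "gender": "F", "smoker": False, "text": "routine checkup, general health good"},
    {"id": 4, "age": 72, "gender": "M", "smoker": False, "text": "emphysema, long-term follow-up"},
]


def build_index(records):
    """Map each word to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for record in records:
        for word in record["text"].lower().replace(",", " ").split():
            index[word].add(record["id"])
    return index


def drill_down(records, index, term, **filters):
    """Return the source records containing term that also match every filter."""
    ids = index.get(term, set())
    return [r for r in records
            if r["id"] in ids and all(r.get(k) == v for k, v in filters.items())]


def breakdown(records, field):
    """Count the selected records by one attribute, such as gender or smoking habit."""
    return Counter(r[field] for r in records)
```

For example, `drill_down(RECORDS, build_index(RECORDS), "emphysema", gender="M")` returns the full source records for the male emphysema cases, and `breakdown` summarizes any selection by a chosen attribute.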

Fig 10. Perhaps the most interesting aspect of a SOM is the ability to detect and display correlations.

When a SOM shows a concentration of information in one place and a concentration of information elsewhere, there is a correlation of information. Sometimes that correlation is weak. Sometimes it is strong. In any case the correlation shows up visually and clearly as a result of the examination of thousands of medical documents.

As an example, suppose an analyst has done a study of the records of a particular kind of cancer, say skin cancer. The analyst can immediately see the correlating factors: age, exposure to sunlight, skin type. But the analyst can also see other kinds of relationships, which may not be expected, such as ingestion of vitamin C, other medications, gender, occupation, and so forth. All of the correlating factors make their appearance if they have ever been captured in a medical record. Of course, once an analyst has detected such a correlation, it can be isolated and examined further.

Fig 11. Once an area of interest has been detected, it can be isolated and examined further.
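The paper does not say how correlation strength is measured. One simple stand-in, assumed here purely for illustration, is the lift of two terms: how much more often they appear in the same record than would be expected if they were independent. The function name and the input format (one set of conditioned terms per record) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_lift(documents):
    """documents: list of sets of terms. Returns {(term_a, term_b): lift} for co-occurring pairs."""
    n = len(documents)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in documents:
        term_counts.update(terms)
        # Sort so each pair is always recorded in the same order.
        pair_counts.update(combinations(sorted(terms), 2))
    return {
        pair: (count / n) / ((term_counts[pair[0]] / n) * (term_counts[pair[1]] / n))
        for pair, count in pair_counts.items()
    }
```

A lift above 1 suggests the two terms appear together more often than chance would predict, and a lift below 1 suggests the opposite. As with any correlation drawn from records, a high value is a prompt for further examination and not evidence of cause.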

There is one other thing Forest Rim Technology does that is of value to the research analyst. The output from Forest Rim Technology does not have to be used visually as described. Once edited and conditioned, the data from the medical records is available for further analysis using conventional analytical tools such as SAS, Business Objects, Cognos, Tableau, QlikView, etc.

Visualization, together with the access and conditioning of medical records, then becomes the key to looking at and analyzing medical records collectively.

Forest Rim Technology is located in Castle Rock, CO. Forest Rim Technology produces Textual ETL, a technology that allows unstructured text to be disambiguated and placed into a standard database where it can be analyzed. Forest Rim Technology was founded by Bill Inmon. For more information, see www.forestrimtech.com.