Big Data and Data Governance
Alicia d Empaire
Agenda
- Definition of Big Data
- Unique Challenges of Big Data
- Why Data Governance Is Needed
- Healthcare Use Cases of Big Data
- BHSF Data Governance and Big Data Initiatives
Defining Big Data
Big Data is a concept used to describe vast amounts of diverse data, both structured and unstructured, that organizations access quickly and analyze with new tools to pinpoint opportunities to better manage and improve value.
Big Data is also a term for the set of techniques and technologies needed to solve business problems that could not previously be addressed because of technology limitations, prohibitive cost, or both.
The 4 Vs of Big Data
- Volume: the amount of data being captured by enterprises
- Variety: the range of data types being captured by enterprises
- Velocity: the rate at which data is being generated
- Veracity: the trustworthiness of the data
Unique Challenges of Big Data
- A new technology stack that is fairly complex to learn
- Limited Big Data resources
- Poor Data Quality + Big Data = Big Problems
- Lack of project and data governance
Definition of Data Governance
Data governance is an emerging, cross-functional management program that treats data as an enterprise asset: a collection of corporate policies, standards, processes, people and technology essential to managing critical data to set goals (Maria Villar and Theresa Kushner).
Data governance is business-driven: how data governance is defined, and how the organization understands it, should come from the business.
Why Data Governance
- Data governance promotes data quality, integrity, consistency, timeliness and security, as well as information privacy, and thus increases the usability and reliability of information.
- A data governance program encourages understanding and managing data from both business and technical perspectives, and promotes data as a valuable resource, allowing the enterprise to use its data confidently to satisfy business needs.
Why Data Governance
Implement data governance to enhance data standardization and improve data quality.
Example #1: Account numbers (Invision Acct_Number and Lawson Acct_Number)
Source systems:
- Invision: Acct_Number (Patient Account Number)
- Lawson: Acct_Number (GL Account Number)
BHSF Data Warehouse:
- Acct_Number (Patient Account Number)
- GL_Acct_Number (GL Account Number)
The same column name means two different things in the two source systems; the warehouse gives each meaning its own name.
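The account-number example amounts to a governed source-to-target column mapping. The sketch below is a minimal illustration, assuming a simple in-memory mapping keyed by (source system, source column); `COLUMN_MAP`, `to_warehouse_row` and the sample account values are hypothetical and not part of any BHSF tool.

```python
# Hypothetical governed column mapping: (source system, source column) ->
# (warehouse column, business definition). The column names come from the
# example above; the mapping structure itself is an illustrative assumption.
COLUMN_MAP = {
    ("Invision", "Acct_Number"): ("Acct_Number", "Patient Account Number"),
    ("Lawson", "Acct_Number"): ("GL_Acct_Number", "GL Account Number"),
}


def to_warehouse_row(source_system: str, row: dict) -> dict:
    """Rename a source row's columns to their standardized warehouse names."""
    out = {}
    for column, value in row.items():
        key = (source_system, column)
        if key not in COLUMN_MAP:
            # Unmapped columns are rejected rather than guessed at.
            raise KeyError(f"No governed mapping for {source_system}.{column}")
        target_column, _definition = COLUMN_MAP[key]
        out[target_column] = value
    return out


# Hypothetical sample values: the same source column name lands in two
# different warehouse columns depending on the system it came from.
print(to_warehouse_row("Invision", {"Acct_Number": "A1001"}))  # {'Acct_Number': 'A1001'}
print(to_warehouse_row("Lawson", {"Acct_Number": "4000-100"}))  # {'GL_Acct_Number': '4000-100'}
```

Keeping the business definition next to each target column is one way a mapping like this can double as a small business-glossary entry.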
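Why Data Governance
Implement data governance to provide data standardization and improve data quality.
Example #2: Medical Record Number
Source systems:
- Invision: Med_Rec_Number (123456)
- Trendstar: Med_Rec_Number (00000123456)
- Horizon Lab: Med Rec Number (123456B)
BHSF Data Warehouse: Med_Rec_Number (123456)
The same medical record number arrives zero-padded from Trendstar and with a trailing letter from Horizon Lab; the warehouse stores one standardized value.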
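The medical record number example can be handled with a single standardization rule. This is a minimal sketch, assuming (only from the example values) that leading zeros are padding and that one trailing letter is a source-specific suffix; the slide does not say what Horizon Lab's "B" means, and `normalize_mrn` is a hypothetical helper.

```python
import re

# Assumed rule, inferred only from the example values: strip leading zeros
# and at most one trailing letter. The meaning of the letter suffix is unknown.
_MRN_PATTERN = re.compile(r"^0*(\d+)[A-Za-z]?$")


def normalize_mrn(raw: str) -> str:
    """Return the digits-only medical record number used in the warehouse."""
    match = _MRN_PATTERN.match(raw.strip())
    if match is None:
        # Values that fit no known pattern go to a data steward, not the warehouse.
        raise ValueError(f"Unrecognized medical record number: {raw!r}")
    return match.group(1)


sources = {"Invision": "123456", "Trendstar": "00000123456", "Horizon Lab": "123456B"}
for system, value in sources.items():
    print(system, normalize_mrn(value))  # each prints 123456
```

If the suffix turns out to carry meaning (for example, a facility code), it would need its own governed column rather than being discarded.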
Future BHSF BI Strategy
Implement data governance to provide data standardization and improve data quality.
Example #3: ICD-9 code 003.23 (Salmonella arthritis)
Source systems:
- Invision PA: ICD9 (003.23)
- Trendstar: ICD9 (00323)
- HDM HIM: ICD9 (003.23)
BHSF Data Warehouse: DSCH_DX_CODE (003.23)
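The ICD-9 example is a formatting difference: Trendstar drops the decimal point. A minimal sketch of a normalizer follows, relying on the ICD-9-CM convention that numeric and V diagnosis codes have three characters before the decimal point and E codes have four. `normalize_icd9_dx` is a hypothetical helper; it only reformats and does not check the code against the ICD-9-CM code set.

```python
def normalize_icd9_dx(raw: str) -> str:
    """Return an ICD-9-CM diagnosis code in dotted form, e.g. '00323' -> '003.23'."""
    code = raw.strip().upper()
    if "." in code:
        head, tail = code.split(".", 1)
        # A dotted numeric code that lost its leading zeros (e.g. '3.23') can be
        # re-padded, because the digits before the point form a 3-digit category.
        if head.isdigit():
            head = head.zfill(3)
        code = head + tail
    # E codes have 4 characters before the decimal point; numeric and V codes have 3.
    split = 4 if code.startswith("E") else 3
    if len(code) <= split:
        return code
    return f"{code[:split]}.{code[split:]}"


for value in ["003.23", "00323", "3.23"]:
    print(value, "->", normalize_icd9_dx(value))  # all map to 003.23
```

An undotted code that has already lost its leading zeros (for example "323") cannot be repaired this way, which is one reason to fix formats at the source rather than downstream.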
Healthcare Big Data Use Cases
- Predictive Analytics
- Disease Surveillance
- Sensor Data
- Genomics
- Medical Research
BHSF Big Data Initiatives
Existing Big Data initiatives:
- HL7 ADT real-time data: semi-structured HL7 messages, around 9.5 GB per month (Volume and Variety)
- Invision Patient Accounting data, around 7 GB per month (Volume)
- Relational databases, flat files, HTML, Excel and XML file sources and targets (Variety)
Upcoming Big Data initiatives:
- Natural Language Processing tools that will generate structured data from unstructured dictated or transcribed data (Volume, Variety and Velocity)
- Data Warehouse Appliance for Lifetime Clinical Repository data (Velocity)
- Machine sensor data for remote monitoring (Volume, Variety and Velocity)
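To show what "semi-structured" means for HL7 ADT messages: an HL7 v2 message is a sequence of segments (separated by carriage returns), each split into fields by "|" and components by "^". The sketch below is a deliberately simplified parser for illustration only; it ignores escape sequences, repetitions and custom delimiters, the sample message is hypothetical, and a production feed would normally go through an interface engine or a dedicated HL7 library.

```python
def parse_hl7_segments(message: str) -> dict[str, list[list[str]]]:
    """Split an HL7 v2 message into segments (by CR) and fields (by '|').

    Segments can repeat, so each segment ID maps to a list of field lists.
    Simplified sketch: escape sequences, '~' repetitions and custom
    delimiters declared in MSH are not handled.
    """
    segments: dict[str, list[list[str]]] = {}
    for line in message.replace("\n", "\r").split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def summarize_adt(message: str) -> dict[str, str]:
    seg = parse_hl7_segments(message)
    msh, pid = seg["MSH"][0], seg["PID"][0]
    # In MSH, field 1 is the '|' separator itself, so MSH-9 sits at list index 8.
    message_type = msh[8]
    # PID-3 is the patient identifier list; its first component is the ID.
    mrn = pid[3].split("^")[0]
    # PID-5 is the patient name: family^given.
    family, given = (pid[5].split("^") + ["", ""])[:2]
    return {"message_type": message_type, "mrn": mrn, "patient": f"{given} {family}".strip()}


# Hypothetical ADT^A01 (admit) message, not real BHSF data.
sample = ("MSH|^~\\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|201301011200||ADT^A01|MSG00001|P|2.3\r"
          "PID|1||123456^^^HOSP^MR||DOE^JANE")
print(summarize_adt(sample))
```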
Data Governance: BHSF BI
Subject areas: Clinical, Clinical/Operational, Revenue, Pricing, Operational, Third Party, Projects, Reconciliation, ERP
50+ sources, 10+ years of data
- Soarian (EHR): Orders, Results, Patient Management, Patient Accounting, Lifetime Clinical Record, Healthcare Query, Soarian Quality Measures (2 years of data)
- EPSI: Finance, Cost Accounting (3 years of data)
- ERP (PeopleSoft): HR, Payroll, Supply Chain, Finance
- CVIS: Operational Data, Cardiovascular Clinical Information
- External Data: Premier, Press Ganey, Predictive mortality (Apache), Care Discovery, Cardinal, AMS Integration, Cancer Center, Staples, US Foods
BHSF Data Governance
Data Governance Committee:
- Set up a committee with members from various operational areas
- Selected our first pilot data governance project (registration data)
Data Governance initiatives:
- Partnering with consultants to help build our data governance structure
- Seeking business involvement and an ongoing commitment to data governance from the operational side
- Implementing data management tools to support the governance process (metadata, business glossaries, data quality)
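Data quality tooling for a pilot such as registration data typically starts with rule-based profiling: each rule is checked against every record and a pass rate is reported per rule. The sketch below assumes hypothetical field names (`mrn`, `birth_date`, `zip_code`) and hypothetical rules; they are illustrations, not BHSF's actual registration standards or any specific vendor tool.

```python
import re
from datetime import date

# Hypothetical data quality rules for a registration-data pilot.
# Field names and rules are illustrative assumptions only.
RULES = {
    "mrn_present": lambda r: bool(r.get("mrn")),
    "birth_date_not_future": lambda r: (
        r.get("birth_date") is not None and r["birth_date"] <= date.today()
    ),
    "zip_is_5_digits": lambda r: bool(re.fullmatch(r"\d{5}", r.get("zip_code") or "")),
}


def profile(records: list[dict]) -> dict[str, float]:
    """Return the fraction of records that pass each rule."""
    if not records:
        raise ValueError("no records to profile")
    total = len(records)
    return {name: sum(1 for r in records if rule(r)) / total for name, rule in RULES.items()}


# Hypothetical sample records: the second has a missing MRN and a short ZIP code.
records = [
    {"mrn": "123456", "birth_date": date(1980, 5, 1), "zip_code": "33101"},
    {"mrn": "", "birth_date": date(1975, 2, 14), "zip_code": "3310"},
]
print(profile(records))
```

Tracking these pass rates over time gives the governance committee a concrete measure of whether the pilot is improving data quality.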
Questions