HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM. Aniket Bochare - aniketb1@umbc.edu. CMSC 601 - Presentation

HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM
Aniket Bochare - aniketb1@umbc.edu
CMSC 601 - Presentation
Date: 04/25/2011

AGENDA
- Introduction and Background
- Framework
- Heterogeneous Data
- Data Integration Model
- Problems Involved and Solution
- Personalized Web for Doctors and Patients
- Future Work
- Questions

CLINICAL DECISION SUPPORT SYSTEM?
A clinical decision support system (CDSS) is computer software designed to assist physicians and other health professionals with decision-making tasks, such as determining a diagnosis from patient data. Clinical decision support systems link health observations with health knowledge to influence clinicians' health choices, improving health care before and after diagnosis and helping to predict diseases.
Types:
- Knowledge-based (if-then rules, inference-based)
- Non-knowledge-based (machine learning)

Challenges: a patient's symptoms, medical history, family history and genetics must be understood in detail with the help of doctors. Choosing among methodologies such as Bayesian networks, neural networks, genetic algorithms and rule-based mechanisms is also a challenge.
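To make the knowledge-based (if-then) type concrete, here is a minimal sketch of a rule engine. The rule names, patient fields and thresholds are hypothetical, chosen only for illustration; they are not taken from the presentation and are not clinical guidance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Patient = Dict[str, float]


@dataclass
class Rule:
    """One if-then rule: when `condition` holds, emit `advice`."""
    name: str
    condition: Callable[[Patient], bool]
    advice: str


# Hypothetical rules with illustrative thresholds (assumptions, not guidance).
RULES: List[Rule] = [
    Rule(
        name="high_glucose",
        condition=lambda p: p.get("fasting_glucose_mg_dl", 0) >= 126,
        advice="Flag for diabetes work-up",
    ),
    Rule(
        name="family_risk",
        condition=lambda p: p.get("affected_relatives", 0) >= 2,
        advice="Consider genetic counselling",
    ),
]


def evaluate(patient: Patient) -> List[str]:
    """Return the advice of every rule whose condition matches the patient."""
    return [rule.advice for rule in RULES if rule.condition(patient)]


if __name__ == "__main__":
    print(evaluate({"fasting_glucose_mg_dl": 130, "affected_relatives": 0}))
```

A non-knowledge-based system would replace the hand-written `RULES` with a model learned from data.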

SINGLE-NUCLEOTIDE POLYMORPHISM (SNP)?
A variation that occurs when a single nucleotide (A, T, C, or G) in the genome differs between members of a species or between the paired chromosomes of an individual.
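The definition above can be illustrated with a small sketch that compares two already-aligned sequences of equal length and reports each position where a single nucleotide differs. The function name and the example sequences are assumptions for illustration; real SNP calling works on sequencing reads and is considerably more involved.

```python
from typing import List, Tuple


def snp_positions(seq_a: str, seq_b: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, base in seq_a, base in seq_b) for each mismatch.

    Assumes the two sequences are already aligned and have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


if __name__ == "__main__":
    # The two fragments differ only at index 4 (C versus T).
    print(snp_positions("AAGCCTA", "AAGCTTA"))
```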

HUMAN GENETIC VARIATION

PROPOSED SOLUTION
- Clinical Data
- Genomic Data
- Data Integration
- Data Mining
- Predictions and Knowledge
- Web-based Access

FRAMEWORK

HETEROGENEOUS NATURE OF DATA
1) Difference in DBMS
- Data Models (structures, constraints, query languages)
- System Level Support (concurrency control, commit, recovery)
- Semantic Heterogeneity
2) Operating System
- File Systems (naming, file types, operations)
- Transaction Support
- Interprocess Communication
3) Hardware
- Instruction Set
- Data Format & Representation
- Configuration

MEDICAL DATA FORMATS
- Clinical data (from clinics): HL7 (Health Level 7), XML, text, CSV
- Genetic data: XML, VCF, text, CSV, etc.
- Per patient: clinical data and genomic data are not available in most cases, and integration with the data that is available is a challenge.
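As a rough illustration of how different two of these formats are, the sketch below parses one VCF data line (tab-separated, with the eight fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and one HL7 v2 PID segment (pipe-separated fields, caret-separated components). The sample records are made up, and the parser handles only these simple cases; a production system would use a dedicated library.

```python
from typing import Dict, List

# The eight fixed columns that start every VCF data line.
VCF_COLUMNS: List[str] = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line: str) -> Dict[str, object]:
    """Parse the fixed columns of a single tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_COLUMNS):
        raise ValueError("VCF data line has fewer than 8 columns")
    record: Dict[str, object] = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(fields[1])      # 1-based position on the chromosome
    record["ALT"] = fields[4].split(",")  # several alternate alleles are comma-separated
    return record


def parse_hl7_pid(segment: str) -> Dict[str, str]:
    """Extract patient ID (PID-3) and name (PID-5) from an HL7 v2 PID segment."""
    fields = segment.split("|")
    if fields[0] != "PID" or len(fields) < 6:
        raise ValueError("not a usable PID segment")
    patient_id = fields[3].split("^")[0]
    family, given = (fields[5].split("^") + [""])[:2]
    return {"patient_id": patient_id, "family_name": family, "given_name": given}


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    print(parse_vcf_line("1\t12345\trs0000\tC\tT\t50\tPASS\t."))
    print(parse_hl7_pid("PID|1||12345^^^HOSP^MR||DOE^JANE"))
```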

EXISTING AND POTENTIAL DATABASES IN HEALTHCARE
- Electronic Patient Record
- Tissue Sample Bank
- Patient ID
- Genetic Profile
- Proteomics
- Imaging Data
- Laboratory Data
- Anonymity
- Drugs + side effects
- Toxicology
- Drug Interactions
- Molecular Mechanisms
- Clinical Guidelines

Data Model

DATABASE SCHEMA

EXTRACT, TRANSFORM, LOAD
Extract: pull data from various formats, e.g. relational or non-relational sources.
- Parsing, validation, etc.
Transform: reshape the data according to the target DB or business rules.
- Automated column selection
- Mapping fields
- Encoding values
- Merge and lookup
- Transposing or pivoting
- Proper data validation according to the target schema
Load: write the properly structured files into the target DB.
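The three phases above can be sketched end to end with the Python standard library. This is a minimal illustration, not the project's pipeline: the CSV content, the field mapping, the value encoding and the `lab` table are all hypothetical, and an in-memory SQLite database stands in for the target RDBMS.

```python
import csv
import io
import sqlite3
from typing import Dict, List

# Hypothetical source extract; the last row is missing a required value.
RAW_CSV = """patient_id,Sex,glucose
P001,M,98
P002,F,131
P003,F,
"""

FIELD_MAP = {"patient_id": "patient_id", "Sex": "sex", "glucose": "glucose_mg_dl"}
SEX_CODES = {"M": 0, "F": 1}  # illustrative value encoding


def extract(text: str) -> List[Dict[str, str]]:
    """Extract: parse the CSV source into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Transform: select and rename columns, validate, and encode values."""
    out: List[Dict[str, object]] = []
    for row in rows:
        mapped: Dict[str, object] = {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}
        if not mapped["glucose_mg_dl"]:
            continue  # validation: skip rows without the required measurement
        mapped["sex"] = SEX_CODES[str(mapped["sex"])]
        mapped["glucose_mg_dl"] = float(str(mapped["glucose_mg_dl"]))
        out.append(mapped)
    return out


def load(rows: List[Dict[str, object]], conn: sqlite3.Connection) -> None:
    """Load: insert the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lab ("
        "patient_id TEXT PRIMARY KEY, sex INTEGER, glucose_mg_dl REAL)"
    )
    conn.executemany(
        "INSERT INTO lab VALUES (:patient_id, :sex, :glucose_mg_dl)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
```

Keeping each phase as a separate function makes it easier to script and schedule them independently, which is the kind of automation the later slides call for.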

PROBLEMS
- Processing voluminous data demands advanced computational resources.
- Deployment and continuous monitoring of the database on the Bluegrit server.
- Extracting medical data from various sources and formats, and correlating them, is a challenge.
- The loading stage needs incoming data to be properly formatted for the application database (e.g. XML or HL7 to RDBMS).
- Integrity among the databases must be maintained, and redundant data must be merged on the basis of various attributes.
- Automating the ETL phase is a tough task.
- Scalability of the data model to accommodate more information (e.g. number of keys, indexes, partitions, tables; architecture; platform).
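The point about merging redundant data on the basis of attributes can be sketched as follows. The matching attributes (`patient_id`, `birth_date`) and the "first non-empty value wins" policy are assumptions made for illustration; the presentation does not say which attributes or conflict rules were used.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def merge_records(
    records: Iterable[Record],
    key_fields: Tuple[str, ...] = ("patient_id", "birth_date"),
) -> List[Record]:
    """Merge records that share the same key attributes.

    For every other field, the first non-empty value seen is kept.
    """
    merged: Dict[Tuple[Optional[str], ...], Record] = {}
    for rec in records:
        key = tuple(rec.get(field) for field in key_fields)
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if value not in (None, "") and field not in target:
                target[field] = value
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical rows from two sources describing the same patient.
    rows = [
        {"patient_id": "P001", "birth_date": "1970-01-01", "blood_type": "A", "genotype": ""},
        {"patient_id": "P001", "birth_date": "1970-01-01", "blood_type": "", "genotype": "CT"},
    ]
    print(merge_records(rows))
```

Matching on exact attribute values is the simplest option; records with typos or differing formats would need normalization before the merge.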

SOLUTIONS
- Use the MySQL Cluster version deployed on Bluegrit, with 1 management node and 2 data nodes, to speed up processing.
- The data can be correlated using metadata files, online journals and consultation with doctors.
- Extraction from the various data sources can proceed once controlled access to them has been obtained.
- The ELIXIR and MirthConnect tools can help convert data from XML and HL7, respectively, to RDBMS.
- Automate the extract, transform and load phases with scripts used in conjunction with the tools.
- Migrate to HBase/MongoDB at a later stage, since the data is huge and often not relational.
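To show what converting XML to an RDBMS involves, the sketch below flattens a nested XML document into two relational tables by hand. It does not use or imitate ELIXIR or MirthConnect; the XML layout, element names and table names are hypothetical, and SQLite stands in for the target database.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical XML export: each patient may carry zero or more SNP entries.
XML_DOC = """<patients>
  <patient id="P001">
    <name>Jane Doe</name>
    <snp rsid="rs0001" genotype="CT"/>
    <snp rsid="rs0002" genotype="AA"/>
  </patient>
  <patient id="P002">
    <name>John Roe</name>
  </patient>
</patients>"""

PatientRow = Tuple[Optional[str], str]
SnpRow = Tuple[Optional[str], Optional[str], Optional[str]]


def xml_to_rows(xml_text: str) -> Tuple[List[PatientRow], List[SnpRow]]:
    """Flatten nested XML into parent rows and child rows linked by patient id."""
    root = ET.fromstring(xml_text)
    patients: List[PatientRow] = []
    snps: List[SnpRow] = []
    for node in root.findall("patient"):
        pid = node.get("id")
        patients.append((pid, node.findtext("name", default="")))
        for snp in node.findall("snp"):
            snps.append((pid, snp.get("rsid"), snp.get("genotype")))
    return patients, snps


patients, snps = xml_to_rows(XML_DOC)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE snp (patient_id TEXT REFERENCES patient(patient_id), "
    "rsid TEXT, genotype TEXT)"
)
db.executemany("INSERT INTO patient VALUES (?, ?)", patients)
db.executemany("INSERT INTO snp VALUES (?, ?, ?)", snps)
db.commit()
```

The repeated `snp` elements become rows in a child table, which is the usual way to map a one-to-many nesting onto a relational schema.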

PERSONAL WEB FOR DOCTORS & PATIENTS
- Doctors and patients should be able to access data from different institutions with a single login.
- Doctors should be able to map various data sources.
- Add, modify and delete existing records (controlled access).
- Access to patients' historical clinical and genetic traits requires the data to be archived.
- Giving doctors access to recent journals and papers for decision making requires integrating various journal DBs in the future.
- Doctors and patients need personalized pages for alerts, reminders, etc.

FUTURE TASKS
- Optimize the web application's performance to handle more patient data from different institutions and for various diseases.
- Automate a variety of routine database tasks (especially ETL).
- Test various features of the data model with respect to integration of modules.
- Test the application's compatibility and scalability on various platforms (database, operating system).

QUESTIONS