Medical Data Mining Applications and its Uses




School of Computing Science and Engineering, VIT University, Vellore 632014, Tamil Nadu, India.

Abstract

Data mining is a relatively new method of data analysis used to discover new and useful knowledge. It helps researchers gain deep insight into large information repositories. Applied to large clinical databases, data mining can reveal new health care knowledge and in turn support sound clinical and administrative decision making. This paper first summarizes data mining in general, then describes the role of data mining in the KDD process and what distinguishes data mining in health care from other applications. It then briefly summarizes various data mining algorithms and guidelines for their use, and introduces the uses of data mining in health care. A successful data mining application in health care can support both clinical and administrative decision making. The paper concludes with the benefits associated with the clinical use of data mining by medical experts.

Key words: clinical databases, health care, decision making

1. Introduction

The practice of using concrete data and evidence-based medicine has existed for many centuries. The application of data mining in medicine and public health, however, is a relatively young field of study [1]. Data mining is considered a new data analysis method that can discover useful knowledge from large databases. Nowadays large volumes of data are collected and stored at enormous speed, and traditional data analysis techniques are infeasible at this scale. A human analyst may take a long time to discover useful information, and much of the data is never analyzed at all. Data mining can be regarded as the automated analysis of massive data sets. Compared with other data mining areas, medical data mining has some unique characteristics [1].
Since medical records relate to human subjects, privacy concerns are taken more seriously than in other data mining tasks. Data mining originated as an interdisciplinary field of statistics and machine learning and then advanced from these beginnings to include artificial intelligence, pattern recognition, high performance computing, database technology, visualization, etc. [1].

The term data mining can be defined in many ways [3]. One definition is the extraction of hidden, valid, previously unknown and valuable information from data in large data repositories. Data mining is also referred to by many other names, such as knowledge discovery in databases (KDD), knowledge extraction, data analysis, data archeology, data dredging and information harvesting [3].

2. Data mining in KDD process

Researchers regard data mining as one of the steps in the KDD process (Fig 1). Knowledge discovery in databases (KDD) is organized into data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge presentation [1, 3].
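The sequence of KDD steps listed above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the records, field names, threshold values and the trivial "mining" step are all invented, and data integration is omitted because the sketch uses a single data source.

```python
# Hypothetical toy records; field names and values are invented for illustration.
RAW_RECORDS = [
    {"patient_id": 1, "age": 54, "glucose": 148.0, "billing_code": "A1"},
    {"patient_id": 2, "age": 61, "glucose": None, "billing_code": "B7"},
    {"patient_id": 3, "age": 47, "glucose": 92.0, "billing_code": "A1"},
    {"patient_id": 4, "age": 70, "glucose": 171.0, "billing_code": "C3"},
]


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def select(records, fields):
    """Data selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: add glucose rescaled to the range [0, 1]."""
    values = [r["glucose"] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero when all values are equal
    return [dict(r, glucose_scaled=(r["glucose"] - low) / span) for r in records]


def mine(records, threshold=0.5):
    """Data mining (placeholder): keep records with high scaled glucose."""
    return [r for r in records if r["glucose_scaled"] > threshold]


def evaluate(patterns, minimum_age=50):
    """Pattern evaluation: keep findings that pass a simple interest rule."""
    return [p for p in patterns if p["age"] >= minimum_age]


def present(patterns):
    """Knowledge presentation: render findings as readable text."""
    return [f"age {p['age']}: glucose {p['glucose']}" for p in patterns]


def run_pipeline(records):
    """Run the KDD steps in order and return the presented findings."""
    cleaned = clean(records)
    selected = select(cleaned, ["age", "glucose"])
    transformed = transform(selected)
    return present(evaluate(mine(transformed)))


if __name__ == "__main__":
    for line in run_pipeline(RAW_RECORDS):
        print(line)
```

In a real system each step would be far more involved; the point of the sketch is only that data mining is one stage that operates on data already cleaned, selected and transformed by the earlier stages.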

Fig 1: Knowledge Discovery in Databases

Within the KDD process, data mining can be defined as the application of specific algorithms to extract patterns from a preprocessed dataset [5, 7]. A successful data mining application provides useful health care knowledge that can support clinical decisions such as diagnosis, prognosis prediction and therapy. It can also support administrative decisions such as health care coverage, staffing estimates and hospital resource usage [1].

3. Data mining in health care and its applications

Health care data are primarily generated from the delivery of patient care. Therefore, medical data mining involves both privacy and legal issues. Data mining in health care is essential because medical data are inherently incomplete: different people suffering from the same disease may not undergo the same tests and diagnostic procedures because of factors such as age, family history and work environment. A large number of diseases must now be considered in decision making, and there is increased demand for health care services such as creating disease awareness. The onset of disease has to be detected in a cost-effective, non-invasive and painless way. The quality of medical data is often poor because of missing values [11] and because a Hospital Information System (HIS) or database is frequently designed for financial or billing purposes.

Today, the health care industry generates large amounts of data about patients, hospital resources, disease diagnoses, electronic patient records, medical devices, etc. This data must be processed for knowledge extraction that supports cost savings and decision making. Data mining provides a set of methods that can be applied to the preprocessed data to discover hidden patterns, giving health care professionals an additional source of knowledge for making clinical and administrative decisions. Ultimately, however, the decision rests with health care professionals.
3.1 Medical Applications: Data mining in health care is mainly used for clinical and administrative decision making. This section describes the use of data mining in various medical applications.

3.1.1 Diagnosis: Data mining can assist decision making that involves a large number of inputs or takes place in stressful situations. It can perform automated analysis of pathological signals (ECG, EEG and EMG) and medical images (mammograms, ultrasound, X-ray, CT and MRI). Examples: diagnosis of heart attacks, chest pains and rheumatic disorders, detection of myocardial ischemia using the ST-T ECG complex, and detection of coronary artery disease using SPECT images.

3.1.2 Therapy: Based on modeled historical performance, data mining can help select the best treatment plans. Examples: using a patient model to predict the optimum medication dosage (e.g. for diabetics), and fusing data from various sensing modalities in ICUs to assist overburdened medical staff.

3.1.3 Prognosis: Accurate prognosis and risk assessment are essential for improved disease management and outcomes. Examples: survival analysis for AIDS patients, predicting pre-term birth risk, determining cardiac surgical risk, predicting ambulation following spinal cord injury, and breast cancer prognosis.

3.1.4 Biochemical/Biological Analysis: Data mining can automate analytical tasks such as urine and blood analysis, tracking glucose levels, determining ion levels in body fluids and detecting pathological conditions.

3.1.5 Epidemiological Studies: Epidemiology is the study of health, disease, morbidity, injuries and mortality in human communities. Data mining can discover patterns relating outcomes to exposures, study independence or correlation between diseases, and analyze public health survey data. Examples: assessing asthma strategies in inner-city children and predicting outbreaks in simulated populations.

3.1.6 Hospital Management: Data mining can optimize the allocation of resources and assist in future planning for improved services. Examples: forecasting patient volume, ambulance run volume, etc., and predicting length of stay for incoming patients.

4. Data mining algorithms and guidelines

Researchers must have a deep understanding of the various data mining algorithms and their functionality.
Data mining algorithms can be classified into two categories. The first is called descriptive or unsupervised learning, and the second is called predictive or supervised learning [3, 5, 7, 8, 9]. In descriptive data mining algorithms, the class labels of the training data are unknown; clustering and association rule mining fall into this category. In predictive data mining algorithms, the training data set is accompanied by class labels and new data is classified based on the training set; classification and regression methods fall into this category. Three of the most widely used data mining methods in health care are classification, clustering and association rule mining. Each is described below along with guidelines for its use.

4.1 Classification

Classification is used to predict the class label of a categorical attribute. The class is the dependent attribute in which users are most interested. Its value is predicted using the set of independent attributes. In the medical field, classification can be used to support diagnosis and prognosis prediction based on the symptoms and health conditions of the patient.
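As a minimal sketch of how a class label can be predicted from independent attributes, the following Python example learns a single threshold rule from labeled training data and then measures its accuracy on separate test data. The data, the attributes (age and resting blood pressure), the labels and the choice of a one-rule classifier (a decision stump) are all illustrative assumptions and are not taken from the paper.

```python
# Hypothetical toy data: (age, resting_blood_pressure) -> class label.
TRAIN = [
    ((35, 118), "low_risk"),
    ((42, 121), "low_risk"),
    ((48, 135), "low_risk"),
    ((55, 142), "high_risk"),
    ((63, 150), "high_risk"),
    ((67, 138), "high_risk"),
]
TEST = [
    ((39, 120), "low_risk"),
    ((58, 147), "high_risk"),
    ((50, 130), "low_risk"),
    ((71, 155), "high_risk"),
]


def learn_rule(training_data):
    """Learn the rule 'attribute > threshold -> label' with the fewest training errors."""
    labels = sorted({label for _, label in training_data})
    assert len(labels) == 2, "this sketch handles two classes only"
    best = None
    for attr in range(len(training_data[0][0])):
        for threshold in sorted({x[attr] for x, _ in training_data}):
            # Try both ways of assigning the two labels to the two sides.
            for above, below in (labels, labels[::-1]):
                errors = sum(
                    (above if x[attr] > threshold else below) != y
                    for x, y in training_data
                )
                if best is None or errors < best[0]:
                    best = (errors, attr, threshold, above, below)
    return best[1:]  # (attribute index, threshold, label above, label below)


def classify(rule, x):
    """Apply the learned rule to one tuple of independent attributes."""
    attr, threshold, above, below = rule
    return above if x[attr] > threshold else below


def accuracy(rule, test_data):
    """Percentage of test tuples whose class label is predicted correctly."""
    correct = sum(classify(rule, x) == y for x, y in test_data)
    return 100.0 * correct / len(test_data)


if __name__ == "__main__":
    rule = learn_rule(TRAIN)
    print("rule:", rule)
    print("accuracy on test data (%):", accuracy(rule, TEST))
```

The rule fits the training data perfectly but misclassifies one of the four test tuples, which illustrates why accuracy should be measured on data that was not used to build the classifier.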

Fig 2: Two step process of classification algorithms

Classification consists of two steps, the learning step and the classification step (Fig 2). The learning step takes training data as input and builds a classifier, which generates the classification rules. In the classification step, test data is used to estimate the predictive accuracy of the classifier, defined as the percentage of test tuples whose class labels are correctly predicted. If the accuracy is considered acceptable, the classifier is used to classify future data tuples.

4.1.1 Classification Guidelines

Before applying a classification algorithm to a data set, it is essential to identify redundant and irrelevant attributes. For example, age and date of birth are redundant attributes, since one can be derived from the other. Such attributes need to be identified and one of them removed from the data set. Totally irrelevant attributes should also be identified, because they may act as noise and reduce the efficiency and accuracy of the classification method. For example, the sex attribute is irrelevant when mining prostate cancer data.

Classification methods have high predictive power, which is why they are preferred in medical data mining. Many classification algorithms exist, but no single algorithm is best for every data set. If the set of attributes is changed by an attribute selection method, the most suitable classification algorithm for the data set may also change [12, 13]. It is therefore preferable to apply several classification algorithms to the training data set. The output of each should be tested with the testing data to measure the accuracy of the model, and the best algorithm should then be selected for future prediction.

4.2 Clustering

A cluster is a collection of similar data objects.
Objects in a cluster are similar to each other and dissimilar to the objects in other clusters (Fig 3). Cluster analysis finds similarities between data objects based on their characteristics and groups them into clusters. It is an unsupervised learning process that observes only the independent attributes in the data set. Clustering does not use class labels, which is why it is preferred for studies that encompass large amounts of data about which very little is known.
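The grouping of similar objects without class labels can be sketched with k-means, used here only as a familiar example; the paper does not prescribe a particular clustering algorithm. The two-dimensional points (imagined as patient locations), the number of clusters and the initial centroids are assumptions chosen to keep the example deterministic.

```python
from math import dist

# Hypothetical patient locations as (x, y) coordinates; values are invented.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def assign(points, centroids):
    """Assign each point to the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def update(points, assignment, centroids):
    """Move each centroid to the mean of its members; keep it if it has none."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignment) if a == i]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and update until the centroids stop changing."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        updated = update(points, assign(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


if __name__ == "__main__":
    labels, centroids = k_means(POINTS, [POINTS[0], POINTS[3]])
    print("cluster labels:", labels)
    print("centroids:", centroids)
```

Note that this sketch relies on a numeric distance, so it applies only to numerical attributes.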

Fig 3: Clusters of patients with respect to their location

Good clusters have high intra-class similarity and low inter-class similarity. The quality of a cluster depends on the similarity measure used. The quality of a clustering method is measured by its ability to discover some or all of the hidden patterns.

4.2.1 Clustering Guidelines

Clustering is generally preferred when very little or nothing is known about the data. That is why clustering is widely applied to the study of microarray data. Most clustering algorithms can handle only numerical attributes, yet most health care databases contain a number of categorical attributes. Although such attributes can be converted to numeric values, the conversion may distort the distances between categories. Some algorithms can handle categorical attributes without converting them into numerical attributes. For example, the FarthestFirst and Two-step Cluster analysis methods can handle categorical attributes [1].

4.3 Association Rule Mining

Association rule mining, also called frequent item set mining or frequent pattern mining, is often employed to find frequent patterns, associations and correlations among sets of items in a database. In the health care industry, this method can be used to discover underlying relationships among disease symptoms, among the health conditions of different patients suffering from the same disease, and among various diseases. Researchers can derive evidence-based hypotheses about the health conditions and symptoms that contribute to a disease or its complications [1]. Association rule mining algorithms are not evaluated on accuracy, because every association algorithm mines all association rules. Instead, they are evaluated on how efficiently they mine hidden rules from large data sets. Efficiency can be improved in many ways.
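To make this concrete, the sketch below mines rules by brute-force enumeration. It assumes the standard support and confidence measures with minimum thresholds, which the text above does not define; the transactions and item names are invented toy data with no clinical meaning. For fixed thresholds the resulting rule set is fully determined, so a faster algorithm would return the same rules and differ only in running time.

```python
from itertools import combinations

# Hypothetical toy transactions: the set of findings recorded per patient.
TRANSACTIONS = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"fever", "fatigue"},
    {"cough"},
    {"fever", "cough", "fatigue"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets whose support reaches min_support (brute force)."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[frozenset(candidate)] = s
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules antecedent -> consequent with confidence >= min_confidence."""
    rules = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                confidence = s / frequent[a]  # every subset of a frequent set is frequent
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


if __name__ == "__main__":
    frequent = frequent_itemsets(TRANSACTIONS, min_support=0.6)
    for a, c, conf in association_rules(frequent, min_confidence=0.9):
        print(sorted(a), "->", sorted(c), "confidence", conf)
```

Enumerating every candidate itemset grows exponentially with the number of items, which is exactly the cost that more efficient association mining algorithms aim to reduce.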
The efficiency of association mining algorithms can be improved by using hash functions and hash tables, by transaction reduction methods [14] and by sampling methods [15].

4.3.1 Association Guidelines

A concept hierarchy can be used to compress a large number of raw association rules, such as {smoking, drinking, helicobacter pylori infection} → {cancer}, into a manageable size. The main focus of a classification algorithm is the class label, whereas association mining algorithms are mainly used to discover relationships among all attributes. A good association mining method should ignore meaningless association rules. For example, {cervical cancer} → {male} is a meaningless rule that the association algorithm should ignore. Raw association rules have to be organized using a concept hierarchy.

5. Uses of health care mining

Many researchers have begun work on health care mining. Some of its uses are listed below.

5.1 Flagging of safety issues and prevention of hospital errors: When the health care industry applies data mining methods to its existing databases, it can discover valid, new, useful and life-saving information. Data preprocessing can reveal previously unknown medical errors. By mining patients' records, many safety issues can be flagged and addressed by hospital management.

5.2 Policy recommendations in public health: Data mining can inform policy making in public health. It can explore the consequences of failing to implement strict sanitation and sterilization measures, as well as the advantages of implementing such policies.

5.3 Cost savings at minimal extra cost: Data mining allows health care management and other organizations to get more out of existing data at minimal extra cost. For example, data mining can be applied to discover fraud in credit card transactions and insurance claims.

5.4 Early detection and prevention of diseases: Medical practitioners can identify patterns and outliers better by using data mining methods than by simply looking at tabulated data. For example, classification algorithms can help medical experts detect heart disease early.

5.5 Cost effective and painless diagnosis: Some diagnostic methods and lab test procedures are painful, costly and invasive for patients. For example, a biopsy to detect cervical cancer is a costly and painful procedure. Medical experts can first use data mining algorithms to decide whether or not to recommend a biopsy for a patient who is suspected of having cervical cancer.
5.6 Adverse Drug Events: Data mining can be used to discover knowledge about drug side effects. Some drugs that have been approved as non-harmful are later found to have harmful side effects. Data mining algorithms can be used to detect the side effects caused by a drug well in advance.

6. Conclusion

Data mining can help medical experts discover new life-saving information. When medical institutions apply it to their existing databases, they can uncover new, valid and useful knowledge that would otherwise have remained inert in those databases. Moreover, data mining can lead to better medical policies such as strict sterilization and sanitation, vaccination planning, etc. Before using data mining algorithms, however, a medical institution must formulate clear policies on the privacy, confidentiality and security of patient records.

Efficient screening tools reduce demand on costly health care resources. Data mining provides better insight into medical survey data and helps physicians cope with information overload. Health care insurers, health care organizations, physicians and patients all benefit from data mining. Health care insurers can detect fraud in medical insurance claims, health care organizations can make clear customer relationship management decisions, physicians can identify cost-effective and efficient treatment methods, and patients can receive better and more affordable health care services. With the help of data mining, health care professionals can provide better service to society. Ultimately, however, clinical decisions rest with the medical expert. Data mining in health care can change the world.

References

1. Illhoi Yoo, Patricia Alafaireet, Miroslav Marinov, Keila Pena-Hernandez, Rajitha Gopidi, Jia-Fu Chang, Lei Hua, Data Mining in Health Care and Biomedicine: A survey of the literature, J. Med. Syst., DOI 10.1007/s10916-011-9710-5, Springer, 2011.
2. Mirjana Pejic Bach, Dijana Cosic, Data mining usage in health care management: literature survey and decision tree application, Medicinski Glasnik, 2008.
3. Han, J., Kamber, M., Data mining: concepts and techniques, 2nd edition, The Morgan Kaufmann Series, 2010.
4. Dursun Delen, Christie Fuller, Charles McCann, Deepa Ray, Analysis of healthcare coverage: A data mining approach, Expert Systems with Applications 36:995-1003, Elsevier, 2009.
5. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., The KDD process for extracting useful knowledge from volumes of data, Commun. ACM 39(11):27-34, 1996.
6. Berger, A., and Berger, C., Data mining as a tool for research and knowledge development in nursing, Comput. Inform. Nurs. 22(3):123-131, 2004.
7. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., From data mining to knowledge discovery in databases, Commun. ACM 39(11):24-26, 1996.
8. Brachman, R. J., Khabaza, T., Kloesgen, W., Piatetsky-Shapiro, G., and Simoudis, E., Mining business databases, Commun. ACM 39(11):42-48, 1996.
9. Velickov, S., Solomatine, D., Predictive data mining: practical examples, 2nd Joint Workshop on Applied AI in Civil Engineering, Cottbus, Germany, March 2000.
10. Dunham, M., Data mining: Introductory and advanced topics, Pearson Education, 2003.
11. Ichise, R., and Numao, M., Learning first-order rules to handle medical data, NII Journal 2:9-14, 2001.
12. Potter, R., Comparison of classification algorithms applied to breast cancer diagnosis and prognosis, Advances in Data Mining, 7th Industrial Conference, ICDM 2007, Leipzig, Germany, July 2007, pp. 40-49.
13. Harper, P. R., A review and comparison of classification algorithms for medical decision making, Health Policy 71:315-331, 2005.
14. Park, J. S., Chen, M. S., Yu, P. S., An effective hash-based algorithm for mining association rules, Proceedings 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD '95), San Jose, CA (May 1995), pp. 175-186.
15. Toivonen, H., Sampling large databases for association rules, Proceedings 1996 International Conference on Very Large Databases (VLDB '96), Bombay, India (Sept. 1996), pp. 134-145.
16. Bellazzi, R., and Zupan, B., Predictive data mining in clinical medicine: current issues and guidelines, Int. J. Med. Inform. 77:81-97, 2008.
17. Übeyli, E. D., Comparison of different classification algorithms in clinical decision making, Expert Syst. 24(1):17-31, 2007.
18. Kaur, H., and Wasan, S. K., Empirical study on applications of data mining techniques in healthcare, J. Comput. Sci. 2(2):194-200, 2006.
19. Bertsimas, D., Bjarnadóttir, M. V., Kane, M. A., Kryder, J. C., Pandey, R., Vempala, S., and Wang, G., Algorithmic prediction of health-care costs, Oper. Res. 56(6):1382-1392, 2008.
20. Kerr, G., Ruskin, H. J., Crane, M., and Doolan, P., Techniques for clustering gene expression data, Comput. Biol. Med. 38(3):283-293, 2008.