Data mining approach to policy analysis in a health insurance domain
|
|
|
- Joanna Lawson
- 10 years ago
- Views:
Transcription
1 International Journal of Medical Informatics 62 (2001) Data mining approach to policy analysis in a health insurance domain Young Moon Chae a, *, Seung Hee Ho a, Kyoung Won Cho a, Dong Ha Lee b, Sun Ha Ji a a Graduate School of Health Policy and Administration, Yonsei Uni ersity, CPO Box 8044, Seoul, , South Korea b Department of Computer Science and Engineering, Pohang Uni ersity of Science and Technology, Pohang, South Korea Abstract This study examined the characteristics of the knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management using the Korea Medical Insurance Corporation database. Specifically, this study validated the predictive power of data mining algorithms by comparing the performance of logistic regression and two decision tree algorithms, CHIAD (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5) using the test set of 4588 beneficiaries and the training set of 13,689 beneficiaries. Contrary to the previous study, the CHIAD algorithm performed better than the logistic regression in predicting hypertension, and C5.0 had the lowest predictive power. In addition, the CHIAD algorithm and the association rule also provided the segment-specific information for the risk factors and target group that may be used in a policy analysis for hypertension management Elsevier Science Ireland Ltd. All rights reserved. Keywords: Knowledge management; Data mining; Logistic regression; Hypertension; Health insurance 1. Introduction * Corresponding author. Present address: Graduate School of Health Science and Management, Yonsei University, CPO Box 8044, Seoul, , South Korea. Fax: address: [email protected] (Y.M. Chae). Healthcare organizations are generally characterized as information-dependent organizations. Knowledge-intensive technology is vital to these information-dependent organizations as knowledge is becoming more important in healthcare organizations, with the impact of technology and the resultant complex, competitive environment. The process of systematically and actively managing and leveraging the stores of knowledge in an organization is called knowledge management [1]. Information systems can play a valuable role in knowledge management, in which it helps the organization optimize its flow of information and capture its knowledge base. The Korea Medical Insurance Corporation (KMIC) provides a health insurance to all /01/$ - see front matter 2001 Elsevier Science Ireland Ltd. All rights reserved. PII: S (01)00154-X
2 104 Y.M. Chae et al. / International Journal of Medical Informatics 62 (2001) civil service workers, teachers, and their dependants. All insured beneficiaries are required to participate in biannual medical examinations performed by KMIC, as a part of nation-wide health-promotion program. A questionnaire was also distributed to all participants 3 4 days before the examination to collect information on perceived health status, tobacco consumption, and exercise habits. While health promotion is a new concept in Korea, recent changes in the cause of death have piqued interest in health promotion. In 1995, a health-promotion law was passed that provides funds for health-promotion research through a tax on tobacco sales, requires that a small portion of the medical insurance funds be spent on prevention activities, and requires communities to develop health-promotion plans [2]. The KMIC database is of an enormous value in monitoring health status and developing national health-promotion programs because it contains health-utilization data as well as risk factors such as demographic data, biomedical data, and lifestyle data for the same beneficiaries over time. Despite its usefulness, KMIC failed to mobilize, exploit, and capitalize on the information available in the database. Such information is needed for developing policies and inducing business process change. The probable reason is that its ability to collect and store the data has grown proportionately faster than its ability to analyze data from a large temporal database. This paper presents the knowledge management tool called knowledge discovery and data mining to make effective use of the KMIC database to discover the untapped information using the hypertension-management program as an example of a health-promotion program. Data mining is a nontrivial process of identifying valid, novel, potentially useful, and an ultimately understandable pattern in data [3]. Typically, the applications involve large-scale information banks such as a data warehouse. In healthcare, insurance companies and large hospitals are ideal settings for the application of data mining. Some of the previous applications for data mining in healthcare were pathology information systems [4,5]. These systems, however, did not deal explicitly with policy analysis using various data-mining models. Long et al. [6] compared the performance of logistic regression to a popular data mining model, called C4.5 decision tree induction, in classifying patients as having acute cardiac ischemia, and found that logistic regression performed better than C4.5. In this paper, the performance of logistic regression and two decision tree algorithms, CHIAD (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5), in predicting hypertension were compared. In addition, this paper demonstrated how the decision tree algorithm and another data mining model, called association rule, could be used in a policy analysis for hypertension management in a health-insurance domain. 2. Methods 2.1. Subjects The subjects were randomly selected from a population of 127,886 beneficiaries who participated in a biannual medical examination conducted by KMIC in In selecting a sample for this study, 50% of the total population was randomly selected in the first stage, and 100% of the beneficiaries with hypertension (9,103) and an equal number of beneficiaries without hypertension were randomly selected in the second stage. This sample was further randomly divided into a training set (13,689) and a test set (4,588) (the ratio was approximately 3:1).
3 Y.M. Chae et al. / International Journal of Medical Informatics 62 (2001) Biometric data, including blood pressure, blood glucose, cholesterol, urinary glucose, urinary protein, and height and weight, were collected during the physical examination. Hypertension was defined as systolic blood pressure 140 mmhg or diastolic blood pressure as 90 mmhg, and low risk as values below these. A fasting blood specimen was drawn and analyzed for total cholesterol and blood glucose Knowledge models Logistic regression Logistic regression is a nonlinear regression method for predicting a dichotomous dependent variable. Logistic regression was performed to identify risk factors for hypertension using patient characteristics, history, lifestyle, and test results as independent variables and the hypertension status as the dependent variable. Stepwise selections of the independent variables were made, and the corresponding coefficients were computed. In producing the logistic regression equation, the maximum-likelihood ratio was used to determine the statistical significance of the variables. Logistic regression has proven to be very robust in a number of medical domains and proves an effective way of estimating probabilities from dichotomous variables Decision tree Decision trees are known as effective classifiers in a variety of domains. Logistic regression and decision-tree induction have different underlying assumptions. For logistic regression, it is assumed that the influence of a variable on the outcome is uniform across all subjects unless specific interactions with other variables are included. However, the decision tree assumes that the effect of a variable in the subset is unrelated to the effect of the variable in other subsets of subjects. In our example, the decision tree categorizes the entire subjects according to whether or not they are likely to have hypertension. CHIAD and C5.0 are two popular decision-tree inducers, based on the ID3 classification algorithm by Quinlan [7]. A CHAID tree is a decision tree that is constructed by splitting subsets of the space into two or more child nodes repeatedly, beginning with the entire data set. To determine the best split at any node, any allowable pair of categories of the predictor variables is merged until there is no statistically significant difference within the pair with respect to the target variable. In this paper, the CHIAD algorithm with growing criteria of the likelihood ratio chi-square statistic was used for building the tree and evaluating splits because most of our variables were ordinal and discretized continuous variables. To identify nodes of interest (that is, nodes with a relatively high probability), a gains chart was used. The gains chart shows the nodes sorted by the number of cases in the target category for each node. C5.0 uses a pruning strategy where a branch is pruned when the introduced error is one standard error of the existing errors adjusted for the continuity correction. C5.0 also uses a boosting technique to generate and combine multiple classifiers to give improved predictive accuracy. Compared with C4.5, the error rate of C5.0 s boosted classifiers is about one-third of the error rate of C4.5 s single classifiers [8]. Both algorithms also provide classification rules for identifying risk factors that may be used to develop programs for hypertension management Association rule An association rule gives an occurrence relationship among factors. In a well-known market basket problem, the association rule has been used to discover buying patterns
4 106 Y.M. Chae et al. / International Journal of Medical Informatics 62 (2001) such as two or more items that are often bought together. In this paper, the association rule was used to identify the occurrence relationship between hypertension and various modifiable risk factors, such as smoking or drinking, in an attempt to develop a hypertension-management program. An association rule is intended to capture a certain type of dependence among items. Suppose that i 1 = i 2, then support is a probability that i 1 and i 2 occur together confidence, of all the baskets containing i 1, is a probability of also containing i Results 3.1. Characteristics of the subjects The mean age of the study participants was 52.1 years among men and 51.4 among women. Among the 12,077 men, 5,762 (47.7%) were current smokers, and 1,994 (16.5%) were ex-smokers. However, among the 6,200 women, only 22 (0.4%) were current smokers and 22 (0.4%) former smokers. Most of them had moderate weight levels. Complete descriptive statistics for the modifiable risk factors are shown in Table Comparison of predicti e rates between logistic regression and decision tree algorithms The results of the logistic regression show that the biomedical variables were excellent predictors of hypertension (Table 2). While four biomedical variables (BMI, urinary protein, blood glucose, and cholesterol) were significant predictors, none of the lifestyle factors predicted hypertension, and age was the only significant predictor among demographic factors. A comparison of the sensitivity, specificity, and overall predictive rate for the three models is shown in Table 3. The CHAID algorithm had the best overall predictive rate (64.06%), followed by logistic regression (63.84%) and C5.0 (59.22%). CHIAD algorithm also had the best sensitivity (76.3%), followed by logistic regression (64.36%) and C5.0 (59.34%). However, CHAID had the lowest specificity (52.3%), and the logistic regression had the best specificity (63.33%) Policy analysis for hypertension management with data mining Decision tree As shown in Fig. 1, the decision tree has 75 leaf nodes. Each node depicted in the decision tree can be expressed in terms of an if then rule, as follows: /* Node 75 */ If ((gender=male) and BMI=obesity and (44 age =50) and (family history of stroke=yes) and (0 Urinary PH = 60) and (92 blood glucose =571)), then hypertension proportion=77.65% The gains chart produced by the decision tree can be used for a policy analysis for hypertension management. There are two parts to the gains chart: node-by-node statistics and cumulative statistics (Table 4). In the gains chart, nodes were sorted by the number of cases in the target category for each node. The first node in the table, node 67, contains 271 hypertensive cases out of 333 subjects, or 81.38% hypertension rate. For this type of gains chart, with a categorical target variable, the gain score equals the percentage of cases with the target category in this case, hypertension for the node. The Index score shows how the proportion of hypertension for this particular node compares to the overall proportion of hypertension. For node 65, the Index score is about 158.0%, meaning
5 Y.M. Chae et al. / International Journal of Medical Informatics 62 (2001) that the proportion of respondents for this node is about 1.6 times the hypertension rate for the overall sample. The cumulative statistics shows us how well we do at finding hypertensive cases by taking the best segments of the sample. If we only take the best node (node 67), we reach 3.95% of hypertensive cases by targeting only 2.43% of the sample. If we include the next best node as well (node 21), then we get 5.55% of the hypertensive cases from only 3.43% of the sample. Including the node 65 increases those values to 7.7% of hypertensive cases from 4.7% of the sample. At this stage, we are at the crossover point described above, where we start to see diminishing returns. Note what happens if we include the next node (node 40) we get 26.3% of hypertensive cases, but we must contact 17.8% of the sample to get them. The gains chart also provides valuable information about which segments to target Table 1 Descriptive statistics for the study sample Category Measure Value Men (n=12,077) Women (6200) Lifestyle variables Count Percentage Count Percentage Diet habits Irregular Salt intake Salted Preferred food Meat Alcohol Heavy Tobacco Former Current Exercise No Biometric variables BMI Overweight Obesity Disease Hypertension U. sugar Positive 11, U. protein Positive 11, U. RBC Positive 11, Blood glucose High ( 126 mg/dl) Total High ( 240 mg/dl) cholesterol Demographic variables Age group H.D Past history Stroke D.M Hypertension Family history Stroke H.D D.M H.D.: heart disease; D.M.: diabetes mellitus.
6 108 Y.M. Chae et al. / International Journal of Medical Informatics 62 (2001) Table 2 Results of logistic regression Category Variable Odds ratio Para. est. Pr 2 Lifestyle variables Salt intake Alcohol Tobacco Exercise Meat Biometric variables BMI U. protein Blood glucose T. cholesterol U. RBC U. sugar Demographic variables PHx of H.D PHx of stroke Age FHx of H.D FHx of D.M FHx of Stroke FHx of hypertension PHx of D.M PHx: past history; FHx: family history; H.D: heart disease; D.M: diabetes mellitus. Table 3 Comparison of the predictive rates for logistic regression, CHAID, and C5.0 Sensitivity (%) Specificity (%) Predictive rate (%) Logistic Regression CHAID C and which to avoid. We might base the decision on the number of prospects we want, the desired hypertension rate for the target sample, or the desired proportion of all potential hypertension cases we want to contact. In this example, suppose we want an estimated hypertension rate of at least 80%. To achieve this, we would target the first three nodes, nodes 67, 21, and 65. This segment-specific information can be used for planning for a hypertension-management program Association rule The association rules provide specific information about risk factors based on the following generalized rule induction (Table 5). If exercise=no and (43 age 48) and gender =female, then a probability of hypertension is 21%. (1.8% of the sample s both hypertension and the above risk factors) If exercise=no and smoke=yes and
7 Y.M. Chae et al. / International Journal of Medical Informatics 62 (2001) the probability of hypertension (from 22% to 26%) regardless of gender. 4. Discussion Fig. 1. Decision tree by the CHAID algorithm. (43 age 48) and gender=female, then a probability of hypertension is 22%. If exercise = no and smoke= yes and drink= yes, then a probability of hypertension is 26%. This shows that the existence of all three modifiable risk factors significantly increases This study examined the characteristics of the knowledge discovery and data-mining models to demonstrate how they can be used to predict health outcomes and provide policy information from the Korea Medical Insurance Corporation database using a hypertension-management program as an example. First, this study compared the performance of logistic regression and two decision-tree algorithms, CHIAD and C5.0, since logistic regression has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes based on the specific characteristics of each individual case. This comparison was performed using the test set of 4588 subjects and Table 4 Gains chart by CHIAD algorithm Node Node-by-node Cumulative Node Node: Resp Resp: Gain Index Node Resp Gain Index (n) (%) (n) (%) (%) (%) (%) (%) (%) (%) Resp.: Respondents.
8 110 Y.M. Chae et al. / International Journal of Medical Informatics 62 (2001) Table 5 Example of an association rule Gender Age Phx of heart BMI Smoke Drink Exercise Support (%) Confidence (%) disease Female 43 age 48 * * * * No Female 43 age 48 * * Yes * No Female 43 age 48 Yes * * * No Female 43 age 48 * * * Yes No Female 43 age 48 * * Yes * No * 43 age 48 * * Yes Yes No *Non-significant. the training set of 13,689 subjects that were used to develop the models. Contrary to the study by Long et al. [6], the CHIAD algorithm (64.06%) performed better than logistic regression (63.84%) in predicting overall hypertension, and it provided a much higher sensitivity (76.3%) than logistic regression (64.36%), but logistic regression performed better than C5.0. In a similar study by Chae et al. [10], discriminant analysis performed better than other data-mining methods, neural network and case-based reasoning. Second, we demonstrated how CHIAD could be used in a policy analysis for hypertension management. While logistic regression provides risk factors for hypertension, it does not provide specific information about the segment characteristics of age or risk factors that may be useful for policy analysis. The CHAID algorithm provided cumulative statistics that show how well we do at finding the hypertensive cases by taking the best segments of the sample. The gains chart also provided valuable information about which segments to target and which to avoid. In addition, we presented the association rules that provided an occurrence relationship among risk factors. For example, the association rule showed that the existence of all three modifiable risk factors (smoking, drinking and exercise) significantly increased the probability of hypertension regardless of gender. Such information, which could not be obtained from the logistic regression, can be used in examining the effects of individual (modifiable) risk factors on the specific segment of target population. Study limitations include a low specificity for the CHAID algorithm and low confidence measures for the association rule. The other limitations are weak measures for exercise behavior, absence of measures for nutrition, stress, and depression. In addition, the population was limited to teachers and civil servants, and thus was biased from the perspective of affluence. Future analyses will include an improvement of decision-tree algorithm and association rule. Another area of improvement in data mining is an application of a sequence rule. The sequence rule gives a temporal relationship among factors [9]. Since all insured workers in Korea are required to participate in biannual medical examinations performed by KMIC, and their biomedical as well as lifestyle data are well maintained in a temporal database at the KMIC, the sequence rule can be effectively applied to predict health outcomes based on the trends data. For example, the sequence rule provides information that blood pressure goes up if the BMI
9 Y.M. Chae et al. / International Journal of Medical Informatics 62 (2001) and cholesterol level both go up at two consecutive biannual medical examinations. Finally, cost information will be incorporated into a data-mining algorithm for each of the risk factors in order to estimate the budgets for providing hypertension management services for the specific target population. References [1] K.C. Laudon, J.P. Laudon, Management Information Systems, Prentice Hall, Englewood Cliffs, NJ, 1998, p [2] Ministry of Health and Welfare, Health Promotion Law. Republic of Korea, [3] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to knowledge discovery; an overview, in: U. Fayyad, G. Piatetsky-Shapiro (Eds.), Advances in Knowledge Discovery and Data Mining, MIT Press, Boston, MA, 1996, pp [4] J.M. McDonald, S. Brossette, S.A. Moser, Pathology information systems: data mining leads to knowledge discovery, Arch. Pathol. Lab. Med. 122 (1998) [5] S. Evans, S. Lemon, C.A. Deters, R.M. Fusaro, H.T. Lynch, Automated detection of hereditary syndromes using data mining, Comput. Biomed. Res. 30 (1997) [6] W.J. Long, J.L. Griffith, H.P. Selker, R.B. D Agostino, A comparison of logistic regression to decision-tree induction in a medical domain, Comput. Biomed. Res. 26 (1993) [7] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, [8] D.B. Biggs, B. de Ville, E. Suen, A method of choosing multi-way partitions for classification and decision trees, J. Appl. Stat. 8 (1991) [9] R. Arawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceeding of the ACM SIGMOD. International Conference on the Management of Data, 1993, pp [10] Y.M. Chae, S.H. Lee, S.H. Ho, M.Y. Bae, H.C. Ohrr, Medical decision support system for the management of hypertension, Informatica 21 (1997)
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
DATA MINING AND REPORTING IN HEALTHCARE
DATA MINING AND REPORTING IN HEALTHCARE Divya Gandhi 1, Pooja Asher 2, Harshada Chaudhari 3 1,2,3 Department of Information Technology, Sardar Patel Institute of Technology, Mumbai,(India) ABSTRACT The
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
USE OF DATA MINING TO DERIVE CRM STRATEGIES OF AN AUTOMOBILE REPAIR SERVICE CENTER IN KOREA
USE OF DATA MINING TO DERIVE CRM STRATEGIES OF AN AUTOMOBILE REPAIR SERVICE CENTER IN KOREA Youngsam Yoon and Yongmoo Suh, Korea University, {mryys, ymsuh}@korea.ac.kr ABSTRACT Problems of a Korean automobile
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
Your Results. For more information visit: www.sutton.gov.uk/healthchecks. Name: Date: In partnership with
Your Results Name: Date: For more information visit: www.sutton.gov.uk/healthchecks In partnership with Introduction Everyone is at risk of developing diabetes, heart disease, kidney disease, stroke and
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
Mortality Assessment Technology: A New Tool for Life Insurance Underwriting
Mortality Assessment Technology: A New Tool for Life Insurance Underwriting Guizhou Hu, MD, PhD BioSignia, Inc, Durham, North Carolina Abstract The ability to more accurately predict chronic disease morbidity
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
Easily Identify Your Best Customers
IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do
Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream
Evaluation of the health check up and guidance program through linkage with health insurance claims
Journal of the National Institute of Public Health 2013 Vol.62 No.1 p.13 30 Evaluation of the health check up and guidance program through linkage with health insurance claims Etsuji OKAMOTO, Yoshimune
Social inequalities in all cause and cause specific mortality in a country of the African region
Social inequalities in all cause and cause specific mortality in a country of the African region Silvia STRINGHINI 1, Valentin Rousson 1, Bharathi Viswanathan 2, Jude Gedeon 2, Fred Paccaud 1, Pascal Bovet
The association between health risk status and health care costs among the membership of an Australian health plan
HEALTH PROMOTION INTERNATIONAL Vol. 18, No. 1 Oxford University Press 2003. All rights reserved Printed in Great Britain The association between health risk status and health care costs among the membership
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
Performance Analysis of Decision Trees
Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India
Coronary Heart Disease (CHD) Brief
Coronary Heart Disease (CHD) Brief What is Coronary Heart Disease? Coronary Heart Disease (CHD), also called coronary artery disease 1, is the most common heart condition in the United States. It occurs
Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network
General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Absolute cardiovascular disease risk assessment
Quick reference guide for health professionals Absolute cardiovascular disease risk assessment This quick reference guide is a summary of the key steps involved in assessing absolute cardiovascular risk
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics
for Healthcare Analytics Si-Chi Chin,KiyanaZolfaghar,SenjutiBasuRoy,AnkurTeredesai,andPaulAmoroso Institute of Technology, The University of Washington -Tacoma,900CommerceStreet,Tacoma,WA980-00,U.S.A.
Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
Weather forecast prediction: a Data Mining application
Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,[email protected],8407974457 Abstract
High Blood Pressure in People with Diabetes:
Prepared in collaboration with High Blood Pressure in People with Diabetes: Are you at risk? Updated 2012 People with diabetes are more likely to have high blood pressure. What is blood pressure? The force
Prevention of and the Screening for Diabetes Part I Insulin Resistance By James L. Holly, MD Your Life Your Health The Examiner January 19, 2012
Prevention of and the Screening for Diabetes Part I Insulin Resistance By James L. Holly, MD Your Life Your Health The Examiner January 19, 2012 In 2002, SETMA began a relationship with Joslin Diabetes
Heart Disease Diagnosis Using Predictive Data mining
ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference
Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI
Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: [email protected] Data Mining a step in A KDD Process Data mining:
Data Driven Approaches to Prescription Medication Outcomes Analysis Using EMR
Data Driven Approaches to Prescription Medication Outcomes Analysis Using EMR Nathan Manwaring University of Utah Masters Project Presentation April 2012 Equation Consulting Who we are Equation Consulting
Optum research study: Reducing health risks with corporate health-management programs
Optum research study: Reducing health risks with corporate health-management programs Introduction Despite improvements in medical science over the past several years, the nation s health is not good.
Keywords data mining, prediction techniques, decision making.
Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analysis of Datamining
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
Using Adaptive Random Trees (ART) for optimal scorecard segmentation
A FAIR ISAAC WHITE PAPER Using Adaptive Random Trees (ART) for optimal scorecard segmentation By Chris Ralph Analytic Science Director April 2006 Summary Segmented systems of models are widely recognized
Benchmarking of different classes of models used for credit scoring
Benchmarking of different classes of models used for credit scoring We use this competition as an opportunity to compare the performance of different classes of predictive models. In particular we want
Data Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Interactive Exploration of Decision Tree Results
Interactive Exploration of Decision Tree Results 1 IRISA Campus de Beaulieu F35042 Rennes Cedex, France (email: pnguyenk,[email protected]) 2 INRIA Futurs L.R.I., University Paris-Sud F91405 ORSAY Cedex,
University of Michigan Health Risk Assessment (HRA) and Trend Management System (TMS)
University of Michigan Health Risk Assessment (HRA) and Trend Management System (TMS) CIGNA has entered into a long-term agreement with the University of Michigan, which gives it access to intellectual
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan
THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
WellStyle Rewards GET STARTED GUIDE
WellStyle Rewards GET STARTED GUIDE EFFECTIVE 1/1/14 2 CONGRATULATIONS! Your health plan from MVP Health Care includes WellStyle Rewards, a program that recognizes you for taking meaningful steps toward
Artificial Neural Network Approach for Classification of Heart Disease Dataset
Artificial Neural Network Approach for Classification of Heart Disease Dataset Manjusha B. Wadhonkar 1, Prof. P.A. Tijare 2 and Prof. S.N.Sawalkar 3 1 M.E Computer Engineering (Second Year)., Computer
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
Effective Data Mining Using Neural Networks
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 8, NO. 6, DECEMBER 1996 957 Effective Data Mining Using Neural Networks Hongjun Lu, Member, IEEE Computer Society, Rudy Setiono, and Huan Liu,
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
Data Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
Data Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com
SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress)
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress) Leo Pipino University of Massachusetts Lowell [email protected] David Kopcso Babson College [email protected] Abstract: A series of simulations
CHAPTER V DISCUSSION. normal life provided they keep their diabetes under control. Life style modifications
CHAPTER V DISCUSSION Background Diabetes mellitus is a chronic condition but people with diabetes can lead a normal life provided they keep their diabetes under control. Life style modifications (LSM)
Cardiovascular Disease Risk Factors
Cardiovascular Disease Risk Factors Risk factors are traits and life-style habits that increase a person's chances of having coronary artery and vascular disease. Some risk factors cannot be changed or
Facts about Diabetes in Massachusetts
Facts about Diabetes in Massachusetts Diabetes is a disease in which the body does not produce or properly use insulin (a hormone used to convert sugar, starches, and other food into the energy needed
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
An Overview and Guide to Healthy Living with Type 2 Diabetes
MEETING YOUR GOALS An Overview and Guide to Healthy Living with Type 2 Diabetes MEETING YOUR GOALS This brochure was designed to help you understand the health goals to live a healthy lifestyle with type
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
Data Mining On Diabetics
Data Mining On Diabetics Janani Sankari.M 1,Saravana priya.m 2 Assistant Professor 1,2 Department of Information Technology 1,Computer Engineering 2 Jeppiaar Engineering College,Chennai 1, D.Y.Patil College
A Basic Guide to Modeling Techniques for All Direct Marketing Challenges
A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview
Beware that Low Urine Creatinine! by Vera F. Dolan MSPH FALU, Michael Fulks MD, Robert L. Stout PhD
1 Beware that Low Urine Creatinine! by Vera F. Dolan MSPH FALU, Michael Fulks MD, Robert L. Stout PhD Executive Summary: The presence of low urine creatinine at insurance testing is associated with increased
Obesity in the United States Workforce. Findings from the National Health and Nutrition Examination Surveys (NHANES) III and 1999-2000
P F I Z E R F A C T S Obesity in the United States Workforce Findings from the National Health and Nutrition Examination Surveys (NHANES) III and 1999-2000 p p Obesity in The United States Workforce One
Start-up Companies Predictive Models Analysis. Boyan Yankov, Kaloyan Haralampiev, Petko Ruskov
Start-up Companies Predictive Models Analysis Boyan Yankov, Kaloyan Haralampiev, Petko Ruskov Abstract: A quantitative research is performed to derive a model for predicting the success of Bulgarian start-up
Protein Intake in Potentially Insulin Resistant Adults: Impact on Glycemic and Lipoprotein Profiles - NPB #01-075
Title: Protein Intake in Potentially Insulin Resistant Adults: Impact on Glycemic and Lipoprotein Profiles - NPB #01-075 Investigator: Institution: Gail Gates, PhD, RD/LD Oklahoma State University Date
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing
Outcomes-Based Health Risk Management: More Than a Wellness Program
Outcomes-Based Health Risk Management: More Than a Wellness Program Summer 2013 Lockton Companies Company health plan costs have been outpacing inflation, increasing over the past 10 years at an average
Nine Common Types of Data Mining Techniques Used in Predictive Analytics
1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better
NEURAL NETWORKS IN DATA MINING
NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
Standardization of Components, Products and Processes with Data Mining
B. Agard and A. Kusiak, Standardization of Components, Products and Processes with Data Mining, International Conference on Production Research Americas 2004, Santiago, Chile, August 1-4, 2004. Standardization
Kansas Behavioral Health Risk Bulletin
Kansas Behavioral Health Risk Bulletin Kansas Department of Health and Environment November 7, 1995 Bureau of Chronic Disease and Health Promotion Vol. 1 No. 12 Diabetes Mellitus in Kansas Diabetes mellitus
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil [email protected] 2 Network Engineering
Know Your Numbers. The Five-Point Plan
The Five-Point Plan Know Your Numbers 2 My husband didn t even know he had diabetes until he had a heart attack. Lupe Ontiveros Actress on Desperate Housewives 13 What does it mean to Know Your Numbers?
Applying Data Mining Technique to Sales Forecast
Applying Data Mining Technique to Sales Forecast 1 Erkin Guler, 2 Taner Ersoz and 1 Filiz Ersoz 1 Karabuk University, Department of Industrial Engineering, Karabuk, Turkey [email protected], [email protected]
NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN)
NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia Produced by: National Cardiovascular Intelligence Network (NCVIN) Date: August 2015 About Public Health England Public Health England
S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY
S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT Predictive modeling includes regression, both logistic and linear,
Analysis of Regression Based on Sampling Weights in Complex Sample Survey: Data from the Korea Youth Risk Behavior Web- Based Survey
, pp.65-74 http://dx.doi.org/10.14257/ijunesst.2015.8.10.07 Analysis of Regression Based on Sampling Weights in Complex Sample Survey: Data from the Korea Youth Risk Behavior Web- Based Survey Haewon Byeon
International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
MBA 8473 - Data Mining & Knowledge Discovery
MBA 8473 - Data Mining & Knowledge Discovery MBA 8473 1 Learning Objectives 55. Explain what is data mining? 56. Explain two basic types of applications of data mining. 55.1. Compare and contrast various
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Decision Tree Learning on Very Large Data Sets
Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Web-Based Heart Disease Decision Support System using Data Mining Classification Modeling Techniques
Web-Based Heart Disease Decision Support System using Data Mining Classification Modeling Techniques Sellappan Palaniappan 1 ), Rafiah Awang 2 ) Abstract The healthcare industry collects huge amounts of
Free Trial - BIRT Analytics - IAAs
Free Trial - BIRT Analytics - IAAs 11. Predict Customer Gender Once we log in to BIRT Analytics Free Trial we would see that we have some predefined advanced analysis ready to be used. Those saved analysis
FREQUENT PATTERN MINING FOR EFFICIENT LIBRARY MANAGEMENT
FREQUENT PATTERN MINING FOR EFFICIENT LIBRARY MANAGEMENT ANURADHA.T Assoc.prof, [email protected] SRI SAI KRISHNA.A [email protected] SATYATEJ.K [email protected] NAGA ANIL KUMAR.G
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
