How To Predict Diabetes In A Cost Bucket

Size: px
Start display at page:

Download "How To Predict Diabetes In A Cost Bucket"

Transcription

1 Paper PH An Analysis of Diabetes Risk Factors Using Data Mining Approach Akkarapol Sa-ngasoongsong and Jongsawas Chongwatpol Oklahoma State University, Stillwater, OK 74078, USA ABSTRACT Preventing the disease of diabetes is an ongoing area of interest to the healthcare community. Although many studies employ several data mining techniques to assess the leading causes of diabetes, only small sets of clinical risk factors are considered. Consequently, not only many potentially important variables such as pre-diabetes health conditions are neglected in their analysis, but the results produced by such techniques may not represent relevant risk factors and pattern recognition of diabetes appropriately. In this study, we categorize our analysis into three different focuses based on the patients healthcare costs. We then examine whether more complex analytical models sing several data mining techniques in SAS Enterprise Miner 7.1 can better predict and explain the causes of increasing diabetes in adult patients in each cost category. The preliminary analysis shows that high blood pressure, age, cholesterol, adult BMI, total income, sex, heart attack, marital status, dental checkup, and asthma diagnosis are among the key risk factors. 1. INTRODUCTION Preventing the disease of diabetes is an ongoing area of interest to the healthcare community. Based on the data from the 2011 National Diabetes Fact Sheet, diabetes affects an estimate of 25.8 million people in the US, which is about 8.3% of the population. Additionally, approximately 79 million people have been diagnosed with pre-diabetes [1]. Pre-diabetes refers to a group of people with higher blood glucose levels than normal but not high enough for a diagnosis of diabetes. Increased awareness and treatment of diabetes should begin with prevention. Much of the focus has been on the impact and importance of preventive measures on disease occurrence and especially cost savings resulted from such measures. Many studies regarding diabetes prediction have been conducted for several years. The main objectives are to predict what variables are the causes, at high risk, for diabetes and to provide a preventive action toward individual at increased risk for the disease. Several variables have been reported in literature as important indicators for diabetes prediction. Lindstrom and Tuomilehto (2003) develop the diabetes risk score model considering Age, BMI, waist circumference, history of antihypertensive drug treatment, high blood glucose, physical activity, and daily consumption of fruits, berries, or vegetables as categorical variables [2]. Park and Edington (2001) present a sequential neural network model for diabetes prediction. The authors indicate risk factors, in the final model, including blood pressure, cholesterol, back pain, fatty food, weight index or alcohol index [3]. Concaro et al, (2009) present the application of a data mining technique to a sample of diabetic patients. They consider the clinical variables such as BMI, blood pressure, glycaemia, cholesterol, or cardio-vascular risk in the model [4]. Although these studies employ several data mining techniques to assess the leading causes of diabetes, only small sets of clinical risk factors are considered. Consequently, not only many potentially important variables such as prediabetes health conditions are neglected in their analysis, but the results produced by such techniques may not represent relevant risk factors and pattern recognition of diabetes appropriately. This study seeks to fill this gap. Specially, the question arises What are the most important risk factors to be included in prognostic analysis to prevent prevalence of diabetes? To answer this research question, we examine whether more complex analytical models using several data mining techniques can better predict and explain the causes of increasing diabetes. In this study, we follow the CRISP-DM Model (Cross Industry Standard Process for Data Mining), which is used as a comprehensive data mining methodology and process model for conducting this data mining study. CRISP-DM breaks down this data mining project in to six phases: business understanding, data understanding, data preparation, modeling, evaluation, and development. 1

2 2. RESEARCH FRAMEWORK We provide the in-depth analysis on how data mining approach can be a great help. After understanding the domain of diabetes and developing the objectives of achieving prognostic analysis of diabetes though data mining approach, we begin our analysis by understanding the relevant data source, accessing data quality, and discovering first insights into the data. The next step is toward data preprocessing from the initial raw data to the final dataset, ready for the model development. This preprocessing step takes about 90% of time to clean, transform, construct, and format the relevant data. We then apply analytical data mining techniques to predict and explain factors that increase the prevalence of diabetes in the patient samples. However, we need to evaluate and assess the validity and the utility of our developed predictive models before deploying the data mining results into the domain as stated in the objectives of the study. Figure 1 presents the overall framework of our models to address the research question from the data understanding to model deployment. 2.1 Data Description The data source that is used to perform data mining analysis in this study is provided by SAS in the national 2010 SAS Data Mining Shootout competition. With 50,788 records, the dataset consists of 43 variables in which 35 variables are discrete variables and the other eight variables are continuous variables. This dataset is assumed to be representative of the population and used for analysis as a snapshot of the country and its health care costs at a point in time. Our first task in this study is to get a sense of the dataset for any inconsistencies, errors, or extreme values in the data. Frequency distribution, descriptive statistics, and cross-tab analysis are used in this section. Table 1 presents the summary of inapplicable data for nominal variables. Below are the key findings, which are very important for data preparation in the next section: Table 1: Summary of inapplicable data for nominal variables 2

3 Figure 1: Research Framework 3

4 Below are the key findings, which are very important for data preparation in the next section: - LAST_PSA, LAST_PAP_SMEAR, LAST_BREAST_EXAM, LAST MAMMOGRAM are only taken by one type of gender (Deleted). - More than 95% of the data in PERSON_BLIND and IS_DEAF are inapplicable % of patients with diabetes have no educational degree, compared to 17.81% of patients without diabetes % of patients with diabetes wear eyeglasses, compared to 44.58% of patients without diabetes % of patients with diabetes checked their cholesterol within last year, compared to % of patients without diabetes % of patients with diabetes checked their health within last year, compared to 38.92% of patients without diabetes % of patients with diabetes had flu-shot within last year, compared to 16.55% of patients without diabetes % of patients with diabetes had sigmoidoscopy or colonoscopy test, compared to 10.74% of patients without diabetes % of patients with diabetes have been diagnosed with high blood pressure, compared to % of patients without diabetes % of patients with diabetes have been diagnosed with coronary heart disease diagnosis, compared to 1.57% of patients without diabetes Based on these findings, we can observe that most diabetes patients have been diagnosed with other concurrent diseases such as high blood pressure, heart disease, and cholesterols, etc. 2.2 Data Preparation The next step in this analysis is to examine the quality of the data. We would like to figure out whether or not the data is complete or has missing values and what variables to be included in the model. We make an assumption that the model excludes variables with over 50% information missing; see similar methodology from Park and Edington (2001), [3]. Thus, the following seven variables are excluded from our analysis: (1) MORE_THAN_ONE_JOB, (2) PERSON_BIND, (3) IS_DEAF, (4) LAST_PSA, (5) LAST_PAP_SMEAR, (6) LAST_BREAST_EXAM, and (7) LAST_MAMMOGRAM. Below are two examples on how we analyze those excluded variables. Note that, besides looking at the descriptive statistics, we have actually tested whether or not these seven excluded variables are significant in the models (overall decision tree model or even the model by age and gender). Due to the high-missing-values issue, the results show that these seven variables are not included in any models tested. - For instance, a variable PERSON_BLIND has 48,544 missing observations (approximately 95% missing values) and out of its applicable data, there are only 37 observations indicating that those patients have been diagnosed with diabetes and blindness. Even though there has been claimed that roughly 40% of patients diagnosed with diabetes in the United States have some form of diabetic retinopathy [5], we still exclude that variable from our analysis due to the relatively-low relevant amount of data in this dataset. - Another critical variable to be excluded from our analysis is MORE_THAN_ONE_JOB. There have been several studies on how unhealthy eating or sleeping habits (sedentary life styles) for those people that work more than one job increases the risk factor for diabetes [6]. Only approximately less than 0.3% (68 out of 23,169 observations excluding all non-applicable data) indicates the group of patients who have been diagnosed with diabetes and worked more than one job (see Figure 2). Thus, we decide to exclude this variable from our analysis. Similar reasons for excluding such variable can be seen for the other five variables. Figure 2: Cross-Tab Analysis (Diabetes by PERSON_BIND and Diabetes by MORE_THAN_ONE_JOB) 4

5 Additionally, including such variables with high-missing values in the model or even applying missing value imputation method can lower the quality of our findings. In fact, we believe that the dataset is still quite large enough (over 30,000 observations) for the analysis after excluding these variables and any other missing values. 2.3 Missing Values, Possible Outliers, and Data Transformation After excluding suspected variables, we decide to impute missing values for better model prediction and to increase the total number of observations with diabetes. Although, imputation can bias the model prediction, omitting these missing values in the sample dataset may also produce an extremely biased prediction when applied to the new dataset. Another issue that should be addressed is the possible outliers in the data set. Some of the patients in the data set have income lower than zero. Also some of the most expensive patients have the total health care cost, Medicaid, or Medicare within very extreme percentile. In our model, we decide to filter the patients with these outliers out of the data set. Consequently, a total of 27,655 observations are ready for modeling analysis. For interval input variables, the missing values are replaced with the mean of the non-missing values. In contrast, the missing values for nominal variables are replaced with the most frequently occurring class variable value. Initially, we decide to transform the variables, which look right skewed, in order to improve the fit of a model to the data. We apply Maximize Normality method to the following six variables: AMOUNT_PAID_MEDICAID, AMOUNT_PAID_MEDICARE, NUMBER_VISITS, ADULT_BMI, TOTALEXP, and TOTAL_INCOME. However, it seems that only TOTAL_INCOME and NUMB_VISITS show significant improvement after the transformation, while the distribution of other variables remain closely the same. Thus, only TOTAL_INCOME and NUMB_VISITS variables are transformed for further analysis (see Figure 3). Figure 3: Variable Transformation Statistics 3. PREDICTION MODEL In this study, three popular data mining techniques including logistic regression, decision tree, and artificial neural network are applied and compared to each other based on their predictive accuracy on the hold-out sample. - Logistic regression is often used to predict an outcome variable that is binary or multi-class dependent variables. It allows the prediction of discrete variables (dependent variables) by a mix of continuous and discrete predictors as the relationship between dependent variables and independent variables is non-linear. It builds the model to predict the odds of its occurrence instead of point estimate event in the traditional linear regression model. - Decision tree is another data classification and prediction method commonly used due to its intuitive explainability characteristics. Decision tree divides the dataset into multiple groups by evaluating individual data record, which can be described by its attributes. It is also simple and easy to visualize the process of classification where the predicates return discrete values and can be explained by a series of nested if-then-else statements. - Artificial Neural Network (ANN) is a mathematical and computational model for pattern recognition and data classification through a learning process. It is a biologically inspired analytical technique, simulating biological systems, where learning algorithm indicates how learning takes place and involves adjustments to the synaptic connections between neurons. Data input can be discrete or real valued; meanwhile the output is in a form of vector of values and can be discrete or real valued as well. For a technical summary including both algorithm and its applications of each method in medical and health care areas see [7-11]. 5

6 3.1 Cost Bucketing and Model Classification In this study, we categorize our analysis into three different focuses based on the patients healthcare costs. We partition our patients into these three groups (see Table 2) in such a way that the sum of all patients healthcare costs in each group is approximately the same. The range of healthcare costs is varied significantly. According to the population s cumulative healthcare expenses, approximately 87% of the overall expense of the population originates from only 3% of the most expensive patients. - The cost bucket #1 can be interpreted as candidates for the low-risk group of patients diagnosed with diabetes. - The cost bucket #2 can be interpreted as the pre-diabetes group. - The cost bucket #3 can be interpreted as the high-risk group of patients diagnosed with diabetes. Table 2: Cost bucket information We decide to stratify our samples to construct a model set with approximately equal numbers of each target value (DIABETES_DIAG_BINARY) in each cost bucket. With a 50% adjusting for oversampling, we believe that the contrast between the two values is minimized, which make it easier and more reliable to recognize patterns in this dataset. The most important goals of this study are to predict the outcome probability of people with diabetes. In order to provide more accurate diabetes prediction, we apply and compare ANN, Logistic Regression, and Decision three models in each cost bucket as well. 3.2 Performance Measures The complicity of the model is controlled by fit statistics calculated on the testing dataset. We use three different criteria to select the best model on the testing dataset. These criteria include false negative, prediction accuracy, and misclassification rate. False negative (Target = 1 and Outcome = 0) represents the case of an error in the model prediction where model results indicate that diabetes occurrence is not present, when in reality, there is an incident. The false negative value should be as low as possible. The proportion of cases misclassified is very common in the predictive modeling. However, the observed misclassification rate should be also relatively low for model justification. Lastly, prediction accuracy is evaluated among the three models on the testing dataset. The higher the prediction accuracy rate, the better the model to be selected. 4. RESULTS AND DISCUSSIONS After excluding variables with outliers and high missing values, we first develop predictive models on the original sample dataset without categorizing our sample into three cost-bucketing groups as presented in Table 2. The results of the overall model are compared to those from each cost-bucketing group to determine whether important risk factors of diabetes are different. The dataset is allocated to the training (70%) and testing (30%) partitions. The binary variable of patients with diabetes (Target = 1 for patients with diabetes and Outcome = 0 for patients without diabetes) is the output variable of the prediction models. After recoding all categorical input variables, the selected variables are tested whether the association between the input variables and the logit of binary target variable satisfy the linearity assumption. The problematic variables are then transformed to satisfy such assumption. Different models are constructed and compared in order to predict patients with diabetes. Training dataset is only used to extract models by the data mining algorithms. Then, those models derived in the training data set are then applied on the testing dataset for the correct discovery of intrusions. In other words, this testing dataset is used to prune the models generated by the data mining process in the training dataset to avoid overfitting and instabilities in the classification accuracy. Statistical analyses are performed using SAS Enterprise Guide 4.3 for data preparation and SAS Enterprise Miner 7.1 for model development and comparison. 6

7 4.1 Overall Model Logistic regression produces the best results with overall misclassification rate of 22.89%. Although Artificial Neural Networks (ANN) has the lowest false negative rate of 20.55%, we still select the logistic regression model as our final model to predict patients with diabetes. After performing the stepwise logistic regression, which is a process of building a model where the choice of predictive variables is carried our based on the t-statistics of their estimated coefficients, the final model is presented in Figure 4. There are total 15 risk factors used to predict the prevalence of diabetes. Figure 4: Classification Table and Summary of Stepwise Selection (Overall Model) 4.2 Cost Bucket #1 (<$5,750) For the low-risk group (Cost Bucket #1), Decision tree produces the best results with overall misclassification rate of 24.56%, followed by the Logistic regression and artificial neural networks with misclassification rates of 25.43% and 25.68%, respectively. Figure 5 presents the final variable selection and classification table for Decision Tree model. Seven variables including age, cholesterol, adult BMI, high blood pressure, total income, sex, and asthma diagnosis are important factors to predict diabetes in this cost bucket group. Compared to the overall model, Only age, high blood pressure, and cholesterol are the common risk factors. 7

8 Figure 5: Classification Table and Variable Importance (Cost Bucket #1) 4.3 Cost Bucket #2 ($5,750 - $19,400) For the pre-diabetes group, Logistic Regression produces the best results with overall misclassification rate of 29.74%, followed by artificial neural networks and decision tree, respectively. Figure 6 presents the final variable selection based on the stepwise regression approach and classification table for Logistic regression model. Eight variables including cholesterol, adult BMI, high blood pressure, highest degree, last flu shot, heart attack, wears eyeglasses, and marital status are important factors to predict diabetes in this cost bucket group. Compared to the overall model, adult BMI, high blood pressure, last flu shot, heart attack, wears eyeglasses, and marital status are the common risk factors. On the other hand, only adult BMI and high blood pressure are the common risk factors between this group and the cost bucket # Cost Bucket #3 (>$19,400) For the high-risk group, Decision Tree produces the best results with overall misclassification rate of 41.80%, followed by artificial neural networks and Logistic Regression, respectively. Figure 7 presents the final variable selection and classification table for Decision Tree model. The final model indicates that adult BMI, age, wear eyeglasses, and marital status are among the key risk factors in this cost bucket group. Compared to the overall model, adult BMI, age, and, wears eyeglasses are the common risk factors. On the other hand, only adult BMI and age are the common risk factors between this group and the cost bucket #1; meanwhile adult BMI, wears eyeglasses, and marital status are the common risk factors compared to the cost bucket #2 8

9 Figure 6: Classification Table and Summary of Stepwise Selection (Cost Bucket #2) 5. CONCLUSION This study demonstrates that data mining based approaches can be used to assess predictor variables influencing the risk of diabetes in adult patients. As opposed to the traditional descriptive statistical analysis methods or the approaches adopting only expert-selected variables, the employment of logistic regression, decision trees, or neural network models provide an interesting list of risk factors that some have been included in the existing studies; meanwhile some others have been absent from the related literatures. We categorize our analysis into three different focuses based on the patients healthcare costs. Figure 8 presents the summary of the key findings of this study. Only adult BMI is the key risk factor among these four groups. Age is the most important factor for the overall model, cost buckets #1, and # 3; meanwhile, high blood pressure is the key indicator for the overall model, cost buckets #1 and #2. The results presented in Figure 8 confirm the previous literature to some extent. The following key risk factors: High blood pressure, adult BMI, cholesterol, and heart attack derived from our analysis are consistent with the studies from Lindstrom and Tuomilehto (2003), Park and Edington (2001), and Concaro et al. (2009). On the other hand, several interesting factors such as total income, asthma diagnosis, wears eyeglasses, and marital status reported to be critical in this study have not been included in the existing studies. A prospective study is needed to provide better understanding of the relationship between these undisclosed factors and the increased risk of diabetes. 9

10 Figure 7: Classification Table and Variable Importance (Cost Bucket #3) Figure 8: Summary of the Key Findings 10

11 REFERENCES [1] American Diabetes Association, "Diabetes Statistics," June 02, 2012, < [2] J. Lindstrom and J. Tuomilehto, "The Diabetes Risk Score: A practical tool to predict type 2 diabetes risk," Diabetes Care, 26:3 (2003), [3] J. Park and D. W. Edington, "A Sequential Neural Network Model for Diabetes Prediction," Artificial Intelligence in Medicine, 23 (2001), [4] S. Concaro, L. Sacchi, C. Cerra, and M. Stefanelli, "Temporal Data Mining for the Assessment of the Costs Related to Diabetes Mellitus Pharmacological Treatment," Proc. AMIA 2009 Symposium Proceedings, 2009, pp [5] American Optometric Association, "Diabetes is the Leading Cause of Blindness Among Most Adults," July 11, 2010, < [6] Eurekalert, "Insufficient sleep may be linked to increased diabetes risk," July 11, 2010, < [7] Jackson, J., Data mining: a conceptual overview. Communications of the Association for Information Systems, : p [8] Turban, E., R. Sharda, and D. Delen, Decision Support and Business Intelligence Systems. 2011, Pearson. [9] Delen, D., A. Oztekin, and Z.J. Kong, A machine learning-based approach to prognostic analysis of thoracic transplantations. Artificial Intelligence in Medicine, (1): p [10] Delen, D., A comparative analysis of machine learning techniques for student retention management. Decision Support Systems, (4): p [11] Delen, D., G. Walker, and A. Kadam, Predicting breast cancer survivability: a comparison of three data mining methods. Artificial Intelligence in Medicine, (2): p CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the authors at: Name: Akkarapol Sa-ngasoongsong Enterprise: Oklahoma State University Address: 245N. University Place#301 City, State ZIP: Stillwater, Oklahoma, 74075, USA akkarap@okstate.edu Name: Jongsawas Chongwatpol Enterprise: Oklahoma State University Address: Spears School of Business City, State ZIP: Stillwater, Oklahoma, 74078, USA jongsaw@ostat .okstate.edu Akkarapol Sa-ngasoongsong is a PhD student in School of Industrial Engineering and Management at Oklahoma State University. He has three years of professional experiences as R&D Engineer. He has earned two SAS certifications: SAS Certified Base Programmer for SAS 9 and Certified Predictive Modeler using SAS Enterprise Miner 6.1. He was 1 st place award winner of the M2010 conference s Data Mining Shootout, and recipient of SAS ambassador honorable mention award (SGF 2012) Jongsawas Chongwatpol has earned PhD degree in Management Science and Information Systems from Oklahoma State University. He was 1 st place award winner of the M2010 conference s Data Mining Shootout. He was also a recipient of the best teaching award from Oklahoma State University, and 2011 best cast study paper award from Decision Sciences Institute (DSI). SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 11

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct

More information

Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry

Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry Paper 1808-2014 Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry Kittipong Trongsawad and Jongsawas Chongwatpol NIDA Business School, National Institute

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

PharmaSUG2011 Paper HS03

PharmaSUG2011 Paper HS03 PharmaSUG2011 Paper HS03 Using SAS Predictive Modeling to Investigate the Asthma s Patient Future Hospitalization Risk Yehia H. Khalil, University of Louisville, Louisville, KY, US ABSTRACT The focus of

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

A Property & Casualty Insurance Predictive Modeling Process in SAS

A Property & Casualty Insurance Predictive Modeling Process in SAS Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

More information

Improving the Thermal Efficiency of Coal-Fired Power Plants: A Data Mining Approach

Improving the Thermal Efficiency of Coal-Fired Power Plants: A Data Mining Approach Paper 1805-2014 Improving the Thermal Efficiency of Coal-Fired Power Plants: A Data Mining Approach Thanrawee Phurithititanapong and Jongsawas Chongwatpol NIDA Business School, National Institute of Development

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Disease Prevention to Reduce New Hampshire Healthcare Claims and Costs: A Data Mining Approach

Disease Prevention to Reduce New Hampshire Healthcare Claims and Costs: A Data Mining Approach Disease Prevention to Reduce New Hampshire Healthcare Claims and Costs: A Data Mining Approach Disease Prevention to Reduce New Hampshire Healthcare Claims and Costs: A Data Mining Approach Rakesh Karn,

More information

Customer and Business Analytic

Customer and Business Analytic Customer and Business Analytic Applied Data Mining for Business Decision Making Using R Daniel S. Putler Robert E. Krider CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Data Mining On Diabetics

Data Mining On Diabetics Data Mining On Diabetics Janani Sankari.M 1,Saravana priya.m 2 Assistant Professor 1,2 Department of Information Technology 1,Computer Engineering 2 Jeppiaar Engineering College,Chennai 1, D.Y.Patil College

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics

Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics for Healthcare Analytics Si-Chi Chin,KiyanaZolfaghar,SenjutiBasuRoy,AnkurTeredesai,andPaulAmoroso Institute of Technology, The University of Washington -Tacoma,900CommerceStreet,Tacoma,WA980-00,U.S.A.

More information

A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND

A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND Paper D02-2009 A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND ABSTRACT This paper applies a decision tree model and logistic regression

More information

Journal of Information

Journal of Information Journal of Information journal homepage: http://www.pakinsight.com/?ic=journal&journal=104 PREDICT SURVIVAL OF PATIENTS WITH LUNG CANCER USING AN ENSEMBLE FEATURE SELECTION ALGORITHM AND CLASSIFICATION

More information

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information

An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century

An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO

More information

DATA MINING AND REPORTING IN HEALTHCARE

DATA MINING AND REPORTING IN HEALTHCARE DATA MINING AND REPORTING IN HEALTHCARE Divya Gandhi 1, Pooja Asher 2, Harshada Chaudhari 3 1,2,3 Department of Information Technology, Sardar Patel Institute of Technology, Mumbai,(India) ABSTRACT The

More information

Lowering social cost of car accidents by predicting high-risk drivers

Lowering social cost of car accidents by predicting high-risk drivers Lowering social cost of car accidents by predicting high-risk drivers Vannessa Peng Davin Tsai Shu-Min Yeh Why we do this? Traffic accident happened every day. In order to decrease the number of traffic

More information

Gerry Hobbs, Department of Statistics, West Virginia University

Gerry Hobbs, Department of Statistics, West Virginia University Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

More information

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

More information

Figure 1. The Research Framework on Hit Song Prediction

Figure 1. The Research Framework on Hit Song Prediction Paper 3368-2015 Determining the Key Success Factors for hit songs in the Billboard Music Charts Piboon Banpotsakun, MBA Student, NIDA Business School, Bangkok, Thailand Jongsawas Chongwatpol, Ph.D., NIDA

More information

COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS

COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS AMJAD HARB and RASHID JAYOUSI Faculty of Computer Science, Al-Quds University, Jerusalem, Palestine Abstract This study exploits

More information

ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node

ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node Enterprise Miner - Regression 1 ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node 1. Some background: Linear attempts to predict the value of a continuous

More information

Understanding Characteristics of Caravan Insurance Policy Buyer

Understanding Characteristics of Caravan Insurance Policy Buyer Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended

More information

Internet Gambling Behavioral Markers: Using the Power of SAS Enterprise Miner 12.1 to Predict High-Risk Internet Gamblers

Internet Gambling Behavioral Markers: Using the Power of SAS Enterprise Miner 12.1 to Predict High-Risk Internet Gamblers Paper 1863-2014 Internet Gambling Behavioral Markers: Using the Power of SAS Enterprise Miner 12.1 to Predict High-Risk Internet Gamblers Sai Vijay Kishore Movva, Vandana Reddy and Dr. Goutam Chakraborty;

More information

Data Mining Techniques Chapter 6: Decision Trees

Data Mining Techniques Chapter 6: Decision Trees Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................

More information

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH 205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology

More information

University of Michigan Health Risk Assessment (HRA) and Trend Management System (TMS)

University of Michigan Health Risk Assessment (HRA) and Trend Management System (TMS) University of Michigan Health Risk Assessment (HRA) and Trend Management System (TMS) CIGNA has entered into a long-term agreement with the University of Michigan, which gives it access to intellectual

More information

Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study

Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Tongshan Chang The University of California Office of the President CAIR Conference in Pasadena 11/13/2008

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d. EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models

More information

University of Michigan Dearborn Graduate Psychology Assessment Program

University of Michigan Dearborn Graduate Psychology Assessment Program University of Michigan Dearborn Graduate Psychology Assessment Program Graduate Clinical Health Psychology Program Goals 1 Psychotherapy Skills Acquisition: To train students in the skills and knowledge

More information

S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY

S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT Predictive modeling includes regression, both logistic and linear,

More information

SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH

SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH 330 SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH T. M. D.Saumya 1, T. Rupasinghe 2 and P. Abeysinghe 3 1 Department of Industrial Management, University of Kelaniya,

More information

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information

More information

Facts about Diabetes in Massachusetts

Facts about Diabetes in Massachusetts Facts about Diabetes in Massachusetts Diabetes is a disease in which the body does not produce or properly use insulin (a hormone used to convert sugar, starches, and other food into the energy needed

More information

Data Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved.

Data Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved. Data Mining with SAS Mathias Lanner mathias.lanner@swe.sas.com Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Data mining Introduction Data mining applications Data mining techniques SEMMA

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Addressing Analytics Challenges in the Insurance Industry. Noe Tuason California State Automobile Association

Addressing Analytics Challenges in the Insurance Industry. Noe Tuason California State Automobile Association Addressing Analytics Challenges in the Insurance Industry Noe Tuason California State Automobile Association Overview Two Challenges: 1. Identifying High/Medium Profit who are High/Low Risk of Flight Prospects

More information

Benchmarking of different classes of models used for credit scoring

Benchmarking of different classes of models used for credit scoring Benchmarking of different classes of models used for credit scoring We use this competition as an opportunity to compare the performance of different classes of predictive models. In particular we want

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

A Property and Casualty Insurance Predictive Modeling Process in SAS

A Property and Casualty Insurance Predictive Modeling Process in SAS Paper 11422-2016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly

More information

A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc.

A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc. A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc. Introduction: The Basel Capital Accord, ready for implementation in force around 2006, sets out

More information

Advanced analytics at your hands

Advanced analytics at your hands 2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

TDWI Best Practice BI & DW Predictive Analytics & Data Mining

TDWI Best Practice BI & DW Predictive Analytics & Data Mining TDWI Best Practice BI & DW Predictive Analytics & Data Mining Course Length : 9am to 5pm, 2 consecutive days 2012 Dates : Sydney: July 30 & 31 Melbourne: August 2 & 3 Canberra: August 6 & 7 Venue & Cost

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell

THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether

More information

Predicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS

Predicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS Paper 114-27 Predicting Customer in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS Junxiang Lu, Ph.D. Sprint Communications Company Overland Park, Kansas ABSTRACT

More information

Joseph Twagilimana, University of Louisville, Louisville, KY

Joseph Twagilimana, University of Louisville, Louisville, KY ST14 Comparing Time series, Generalized Linear Models and Artificial Neural Network Models for Transactional Data analysis Joseph Twagilimana, University of Louisville, Louisville, KY ABSTRACT The aim

More information

NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN)

NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN) NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia Produced by: National Cardiovascular Intelligence Network (NCVIN) Date: August 2015 About Public Health England Public Health England

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses

Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses Salford Systems Data Mining 2006 March 27-31 2006 San Diego, CA By Dean Abbott Abbott Analytics

More information

USING DATA SCIENCE TO DISCOVE INSIGHT OF MEDICAL PROVIDERS CHARGE FOR COMMON SERVICES

USING DATA SCIENCE TO DISCOVE INSIGHT OF MEDICAL PROVIDERS CHARGE FOR COMMON SERVICES USING DATA SCIENCE TO DISCOVE INSIGHT OF MEDICAL PROVIDERS CHARGE FOR COMMON SERVICES Irron Williams Northwestern University IrronWilliams2015@u.northwestern.edu Abstract--Data science is evolving. In

More information

!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"

!!!#$$%&'()*+$(,%!#$%$&'()*%(+,'-*&./#-$&'(-&(0*.$#-$1(2&.3$'45 !"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:

More information

Data Mining Applications in Fund Raising

Data Mining Applications in Fund Raising Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,

More information

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations

More information

Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal Bank of Scotland, Bridgeport, CT

Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal Bank of Scotland, Bridgeport, CT Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal ank of Scotland, ridgeport, CT ASTRACT The credit card industry is particular in its need for a wide variety

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION. Friday 14 March 2008 9.00-9.

SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION. Friday 14 March 2008 9.00-9. SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION Friday 14 March 2008 9.00-9.45 am Attempt all ten questions. For each question, choose the

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

Paper 3508-2015. Downtime of a truck = Truck repair end date - Truck repair start date

Paper 3508-2015. Downtime of a truck = Truck repair end date - Truck repair start date Paper 3508-2015 Using Text from Repair Tickets of a Truck Manufacturing Company to Predict Factors that Contribute to Truck Downtime Ayush Priyadarshi and Dr. Goutam Chakraborty, Oklahoma State University

More information

Leveraging Ensemble Models in SAS Enterprise Miner

Leveraging Ensemble Models in SAS Enterprise Miner ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

More information

Alex Vidras, David Tysinger. Merkle Inc.

Alex Vidras, David Tysinger. Merkle Inc. Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT

More information

Mortality Assessment Technology: A New Tool for Life Insurance Underwriting

Mortality Assessment Technology: A New Tool for Life Insurance Underwriting Mortality Assessment Technology: A New Tool for Life Insurance Underwriting Guizhou Hu, MD, PhD BioSignia, Inc, Durham, North Carolina Abstract The ability to more accurately predict chronic disease morbidity

More information

The SAS Analytics Shootout Annual Student Competition

The SAS Analytics Shootout Annual Student Competition The SAS Analytics Shootout Annual Student Competition Sponsored by SAS and The Institute for Health & Business Insight at Central Michigan University Joseph Pomerville, Research Analyst / Advanced Analytics

More information

Lecture 6 - Data Mining Processes

Lecture 6 - Data Mining Processes Lecture 6 - Data Mining Processes Dr. Songsri Tangsripairoj Dr.Benjarath Pupacdi Faculty of ICT, Mahidol University 1 Cross-Industry Standard Process for Data Mining (CRISP-DM) Example Application: Telephone

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

Analyzing Research Data Using Excel

Analyzing Research Data Using Excel Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial

More information

A fast, powerful data mining workbench designed for small to midsize organizations

A fast, powerful data mining workbench designed for small to midsize organizations FACT SHEET SAS Desktop Data Mining for Midsize Business A fast, powerful data mining workbench designed for small to midsize organizations What does SAS Desktop Data Mining for Midsize Business do? Business

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Easily Identify Your Best Customers

Easily Identify Your Best Customers IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

More information

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

Improving performance of Memory Based Reasoning model using Weight of Evidence coded categorical variables

Improving performance of Memory Based Reasoning model using Weight of Evidence coded categorical variables Paper 10961-2016 Improving performance of Memory Based Reasoning model using Weight of Evidence coded categorical variables Vinoth Kumar Raja, Vignesh Dhanabal and Dr. Goutam Chakraborty, Oklahoma State

More information

Artificial Neural Network Approach for Classification of Heart Disease Dataset

Artificial Neural Network Approach for Classification of Heart Disease Dataset Artificial Neural Network Approach for Classification of Heart Disease Dataset Manjusha B. Wadhonkar 1, Prof. P.A. Tijare 2 and Prof. S.N.Sawalkar 3 1 M.E Computer Engineering (Second Year)., Computer

More information

Master of Science in Healthcare Informatics and Analytics Program Overview

Master of Science in Healthcare Informatics and Analytics Program Overview Master of Science in Healthcare Informatics and Analytics Program Overview The program is a 60 credit, 100 week course of study that is designed to graduate students who: Understand and can apply the appropriate

More information

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4. Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics

More information

Predictive Modeling of Titanic Survivors: a Learning Competition

Predictive Modeling of Titanic Survivors: a Learning Competition SAS Analytics Day Predictive Modeling of Titanic Survivors: a Learning Competition Linda Schumacher Problem Introduction On April 15, 1912, the RMS Titanic sank resulting in the loss of 1502 out of 2224

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

The Predictive Data Mining Revolution in Scorecards:

The Predictive Data Mining Revolution in Scorecards: January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms

More information

TYPE 2 DIABETES IN THE AFRICAN AMERICAN COMMUNITY. Understanding the Complications That May Happen Without Proper Care

TYPE 2 DIABETES IN THE AFRICAN AMERICAN COMMUNITY. Understanding the Complications That May Happen Without Proper Care TYPE 2 DIABETES IN THE AFRICAN AMERICAN COMMUNITY Understanding the Complications That May Happen Without Proper Care STAYING HEALTHY THE IMPORTANCE OF PROPER MANAGEMENT OF TYPE 2 DIABETES Diabetes is

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

Prediction of Heart Disease Using Naïve Bayes Algorithm

Prediction of Heart Disease Using Naïve Bayes Algorithm Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,

More information

2015 Workshops for Professors

2015 Workshops for Professors SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market

More information

REVIEW ON PREDICTION SYSTEM FOR HEART DIAGNOSIS USING DATA MINING TECHNIQUES

REVIEW ON PREDICTION SYSTEM FOR HEART DIAGNOSIS USING DATA MINING TECHNIQUES International Journal of Latest Research in Engineering and Technology (IJLRET) ISSN: 2454-5031(Online) ǁ Volume 1 Issue 5ǁOctober 2015 ǁ PP 09-14 REVIEW ON PREDICTION SYSTEM FOR HEART DIAGNOSIS USING

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.

White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices. White Paper Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices. Contents Data Management: Why It s So Essential... 1 The Basics of Data Preparation... 1 1: Simplify Access

More information