Imputation Methods to Deal with Missing Values when Data Mining Trauma Injury Data

Kay I Penny
Centre for Mathematics and Statistics, Napier University, Craiglockhart Campus, Edinburgh, EH14 1DJ
k.penny@napier.ac.uk

Thomas Chesney
Nottingham University Business School, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB
Thomas.Chesney@nottingham.ac.uk

Abstract. Methods for analysing trauma injury data with missing values, collected at a UK hospital, are reported. One measure of injury severity, the Glasgow Coma Score, which is known to be associated with patient death, is missing for 12% of patients in the dataset. So that these patients can be included in the analysis, three different data imputation techniques are used to estimate the missing values. The imputed datasets are analysed using an artificial neural network and logistic regression, and the results are compared in terms of sensitivity, specificity, positive predictive value and negative predictive value.

Keywords. Data mining, missing data imputation, trauma injury.

1. Introduction

Trauma injury is the most common cause of death among those under forty [1]. In 1991 a trauma system was put in place at the North Staffordshire Hospital (NSH) in Stoke-on-Trent in the U.K. It records injury details including the Injury Severity Score (ISS) [2], Abbreviated Injury Scores (AIS) [3] and the Glasgow Coma Score (GCS) [4], together with the patient's sex and age, management and interventions, and the outcome of the treatment, including whether the patient lived or died during their hospital stay. North Staffordshire Hospital is a major trauma centre in the area and receives patient referrals from surrounding hospitals.

Oakley [5] analysed data for only the most severely injured patients admitted between 1992 and 1998, and found that determinants of mortality for this subset of patients included age, head AIS, chest AIS, abdominal AIS, external injury AIS, mechanism of injury, primary receiving hospital and calendar year of admission.
Further analysis included a comparison of several artificial neural network (ANN) models with logistic regression (LR) for predicting death during hospital stay [6]. Factors found to be important in the modelling were age, mechanism of injury, whether the patient was referred from another hospital, and several injury severity scores including the GCS motor and GCS verbal scores.

Missing data do not always cause concern when using data mining techniques; however, 12% of the GCS scores are missing in these data. Applying the standard practice of complete-case analysis therefore means that 12% of the dataset is excluded from the modelling, since these patients do not have recorded values for the three GCS scores. Excluding this subset of patients may bias the results, as patients whose GCS scores were not recorded may not be a representative sample of the population of trauma injury patients. For example, these patients may tend to be more seriously injured than the typical patient, so that the scores were not recorded due to lack of time, or they may have presented with a different type or combination of injuries.

The aim of this research is to investigate the accuracy of modelling patient death following trauma injury in conjunction with missing value imputation.

2. Methods

The study involves trauma audit data from patients treated at the North Staffordshire Hospital from 1993 to 1999 and from 2001 to [...]. The gap was due to a lack of resources, which affected data collection during this period. Only the most severely injured patients, i.e. patients with an ISS greater than 15, are included in this study, resulting in a total of 1658 patients in the dataset. Hence these results are generalisable to severely injured patients only.

Factors considered for inclusion in the analysis are summarised in Table 1.

Table 1. Factors considered for inclusion in the analyses

Sex: male or female
Age group (years): 0-15; 16-25; 26-35; 36-50; 51-70; over 70
Year of admission: 1992-8, [...]
Month of admission: Jan-Dec
Day of admission: Mon-Sun
Time of admission: [...]
Referred from another hospital: yes or no
Mechanism of injury group: motor vehicle crash; fall greater than 2m; fall less than 2m; assault; other
Type of trauma: blunt (yes or no); penetrating (yes or no)
Abbreviated injury scores (AIS): head; face; neck; chest; abdomen; external; upper limb; lower limb; spine; cervical spine; thoracic spine; lumbar spine
Glasgow coma scores (GCS): eye response; motor response; verbal response

Two different approaches to the statistical analysis of these data were used: data mining using an artificial neural network (ANN), and logistic regression (LR) modelling. All analysis was carried out using the statistical packages SPSS 12, Clementine 7.0, and Solas.

2.1. Data Mining Methods

ANNs attempt to mimic the biological structure and connectivity of a natural neural network, using the human brain as an analogy. Inputs are fed through the neurons in the network, which transform them to output a probability; in this case, the probability that a patient will die. An exhaustive prune was used to create the ANN. The network is a feed-forward multi-layer perceptron in which all the neurons are fully connected, and it uses the sigmoid transfer function [7]. The learning technique used is back propagation.
Exhaustive pruning means that, starting with the given topology, the network is trained, then a sensitivity analysis is performed on the hidden units and the weakest are removed. This cycle of training and removal is repeated for a set length of time. The ANN used in this study has 3 hidden layers with 30, 20 and 10 neurons respectively and the learning parameters alpha = 0.9 and eta = 0.3, as previous analysis found that this architecture works well for trauma injury data [6].

As well as data mining using an ANN, LR modelling is included for comparison. The LR models were developed to be parsimonious, i.e. to have good predictive ability while being as simple as possible. Hence this approach is more subjective than the ANN.

In medical applications it is often the case that a logistic regression model is developed using the complete dataset and then tested on the same data used to build it. This is not ideal, and so, to allow comparison with the data mining methods presented in this paper, k-fold cross-validation with k set to five was used to test all of the models. This technique is good practice when building neural networks with medical data [8]. The data were split into five subsets. Four subsets are used to train each model, and the fifth is used to test it. This is then repeated another four times so that each subset is used to test the models once. When splitting the dataset, patients who lived were selected independently of patients who died, in order to keep the same proportion of deaths in each of the k subsets. This is necessary because the outcome variable, patient death, is very imbalanced: 79% of patients lived and 21% died during their hospital stay.
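The stratified split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in Clementine or SPSS: the function name `stratified_folds`, the toy outcome vector and the random seed are all hypothetical. Records are shuffled separately within each outcome class and then dealt out to the folds in turn, so each fold keeps roughly the same proportion of deaths.

```python
import random
from collections import defaultdict

def stratified_folds(outcomes, k=5, seed=0):
    """Return a fold number (0..k-1) for each record, assigned within each outcome class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(outcomes):
        by_class[y].append(idx)
    fold_of = [None] * len(outcomes)
    for indices in by_class.values():
        rng.shuffle(indices)                      # random order within the class
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k           # deal records out to folds in turn
    return fold_of

# Toy data (hypothetical): 100 patients, 21 of whom died (coded 1).
outcomes = [1] * 21 + [0] * 79
fold_of = stratified_folds(outcomes, k=5)

deaths_per_fold = []
for fold in range(5):
    test_idx = [i for i, f in enumerate(fold_of) if f == fold]
    train_idx = [i for i, f in enumerate(fold_of) if f != fold]
    deaths_per_fold.append(sum(outcomes[i] for i in test_idx))
    print(fold, len(train_idx), len(test_idx), deaths_per_fold[-1])
```

Each pass of the loop gives one training set (four folds) and one validation set (the remaining fold), matching the five-fold design in the text.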

2.2. Missing value imputations

Previous work [6] compared the results of four different ANN models as well as LR to predict death during hospital stay following injury. Both GCS motor and GCS verbal were found to have high importance in two of the ANNs, and GCS motor was statistically significant in the LR model. In order for these variables to be included in the models, 12% of the sample, i.e. patients whose GCS scores were not recorded, were excluded from the analysis. Hence missing value imputation is considered here so that all patients can be included in the modelling process.

The GCS is a measurement of the severity of head injury and comprises three components, each measured on an ordinal scale: eye response (1-4), verbal response (1-5) and motor response (1-6). Three methods of data imputation are considered in this study:

1. Hot-deck imputation
2. Predictive model-based imputation
3. Propensity score imputation

Hot-deck imputation [9] involves substituting individual values drawn from patients with observed data who are similar to the patient with the missing value. For the GCS scores, this means imputing a GCS score drawn from a subset of patients who are similar to the patient with the missing GCS score. To impute a particular GCS score, this method sorts all patients, both those with observed values and those with missing values for this score, into a number of subsets according to a set of covariates which are associated with the GCS scores. In this application, the imputation subsets comprise patients with the same values of the injury severity scores AIS head, AIS chest, AIS lumbar spine and AIS cervical spine. Patients with missing GCS scores then have their missing values replaced with observed values selected at random, with replacement, from patients in the same subset, i.e. patients who are similar with respect to these covariates.
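A minimal sketch of the hot-deck step follows, including the fallback for a subset with no donors (the subset is collapsed by one level until a donor is found). It is an illustration only, not the Solas implementation. The field names (`ais_head`, `ais_chest`, `gcs_motor`) and the toy records are hypothetical, and "collapsing by one level" is interpreted here as dropping the last matching covariate, which is an assumption since the text does not specify the collapsing rule.

```python
import random

def hot_deck_impute(records, target, covariates, seed=0):
    """Fill missing `target` values with a random donor value from matching records."""
    rng = random.Random(seed)
    donors = [r for r in records if r[target] is not None]
    imputed = []
    for r in records:
        r = dict(r)                               # copy so the input is not modified
        if r[target] is None:
            keys = list(covariates)
            while True:
                pool = [d[target] for d in donors
                        if all(d[c] == r[c] for c in keys)]
                if pool or not keys:
                    break
                keys.pop()                        # assumption: collapse = drop last covariate
            if pool:
                r[target] = rng.choice(pool)      # random draw, with replacement across patients
        imputed.append(r)
    return imputed

# Toy records (hypothetical values); None marks a missing GCS motor score.
records = [
    {"ais_head": 5, "ais_chest": 3, "gcs_motor": 2},
    {"ais_head": 5, "ais_chest": 0, "gcs_motor": 4},
    {"ais_head": 2, "ais_chest": 3, "gcs_motor": 6},
    {"ais_head": 5, "ais_chest": 3, "gcs_motor": None},  # exact match available
    {"ais_head": 2, "ais_chest": 1, "gcs_motor": None},  # needs one collapse
]
imputed = hot_deck_impute(records, "gcs_motor", ["ais_head", "ais_chest"])
print([r["gcs_motor"] for r in imputed])
```

In the toy data the fourth patient has exactly one matching donor, while the fifth has none until the chest score is dropped from the matching criteria.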
If there are no observed values in the corresponding subset of patients, the subset is collapsed by one level, and this process is repeated until an observed value can be found.

Predictive model-based imputation replaces a missing GCS score with an estimate from an ordinary least-squares regression. First, a predictive model is estimated from the observed data, i.e. from patients with no missing value for the GCS score of interest. Let Y be the GCS variable to be imputed, and let X be the same set of covariates used in the hot-deck imputation above. Let $Y_{obs}$ be the observed values in Y, $Y_{mis}$ the missing values in Y, $X_{obs}$ the covariates corresponding to $Y_{obs}$, and $X_{mis}$ the covariates corresponding to $Y_{mis}$. By regressing $Y_{obs}$ on $X_{obs}$, predictions for the missing values are obtained from the equation

$$\hat{Y}_{mis} = a + b X_{mis} \qquad (1)$$

where a is the constant in the model and b is the vector of regression coefficients. Using this estimated model, a random element is incorporated in the estimate of the missing values: parameter values for the regression model are drawn from their posterior distribution given the data, using non-informative priors [10] [11]. In this way the imputations reflect the extra uncertainty arising from the fact that the regression parameters can be estimated, but not determined, from the observed data.

Propensity score imputation [12] is based on the underlying assumption that the missingness of an imputation variable can be explained by a set of covariates using a logistic regression model. A binary indicator variable is created to represent whether the variable to be imputed is missing or observed for each individual. This indicator variable is the dependent variable in the logistic regression model, and the independent variables are a set of covariates thought to be related to the variable to be imputed.
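Returning to the predictive model-based method of equation (1), the posterior draw can be sketched for the simplest case of a single covariate. This is a hedged illustration, not the Solas algorithm: the function name `regression_impute` and the toy scores are hypothetical, the study used several AIS covariates rather than one, and adding a residual draw to each prediction is an assumption of this sketch. Under the standard non-informative prior, the residual variance is drawn as SSE divided by a chi-square variate with n - 2 degrees of freedom, and the coefficients are then drawn from normal distributions centred on their least-squares estimates.

```python
import math
import random

def regression_impute(x_obs, y_obs, x_mis, seed=0):
    """Impute y for x_mis from a simple linear regression with posterior parameter draws."""
    rng = random.Random(seed)
    n = len(x_obs)
    x_bar = sum(x_obs) / n
    y_bar = sum(y_obs) / n
    sxx = sum((x - x_bar) ** 2 for x in x_obs)
    b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_obs, y_obs)) / sxx
    sse = sum((y - y_bar - b_hat * (x - x_bar)) ** 2 for x, y in zip(x_obs, y_obs))
    df = n - 2
    # Residual variance draw: SSE / chi-square(df); chi-square(df) is Gamma(df/2, scale=2).
    sigma = math.sqrt(sse / rng.gammavariate(df / 2, 2.0))
    # With x centred on its mean, intercept and slope are independent given sigma.
    a_star = rng.gauss(y_bar, sigma / math.sqrt(n))       # intercept of the centred model
    b_star = rng.gauss(b_hat, sigma / math.sqrt(sxx))
    # Assumption: a residual draw is added to each prediction.
    return [a_star + b_star * (x - x_bar) + rng.gauss(0.0, sigma) for x in x_mis]

# Toy data (hypothetical): AIS head score as covariate, GCS motor score as outcome.
x_obs = [0, 1, 2, 3, 4, 5, 5, 4]
y_obs = [6, 6, 5, 4, 3, 1, 2, 4]
imputed = regression_impute(x_obs, y_obs, x_mis=[2, 5])
print(imputed)

# With perfectly linear data there is no residual variance, so the draw is exact.
exact = regression_impute([0, 1, 2, 3], [6, 5, 4, 3], x_mis=[4])
print(exact)
```

The imputed values are continuous. Whether and how they are rounded to the ordinal GCS scale is not described in the text, so no rounding is applied here.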
Using the regression coefficients from the logistic regression model, the propensity for a patient to have a missing value can be calculated. The propensity score for a patient is the conditional probability of missingness, given the observed covariates. Missing values of the imputation variable y are imputed by values randomly drawn from a subset of observed values of y, that is, its donor pool. In this study, five donor pool subgroups have been created. The patients in the dataset are sorted in ascending order of their propensity scores and then divided into five equal-sized subgroups. For each missing value, an observed value is selected for imputation, at random with replacement, from the corresponding donor pool.

2.3. Evaluation methods

The five-fold cross-validation design results in five training datasets and five corresponding validation datasets. Each of the three imputation methods described above is applied to each of these ten datasets, and the results are compared for the ANN and the LR models. The overall performance of a model under a particular imputation method is then the mean performance over the five validation datasets.

In many data mining efforts the evaluation criterion is the overall accuracy, i.e. the percentage of correct classifications made by an algorithm. In medical data mining, however, consideration must also be given to the percentages of false positives and false negatives. The evaluation criteria used to test the classification algorithms are sensitivity (sens), specificity (spec), positive predictive value (PPV) and negative predictive value (NPV). A cut-point of 0.5 is used in the logistic regression modelling to allow comparability between the three imputation methods. A receiver operating characteristic (ROC) curve analysis is carried out to compare the logistic regression results.

3. Results

The results of the k-fold cross-validation for each data mining method applied to each of the three sets of imputed data are presented in Table 2, along with the results when no imputation was performed (complete-case analysis), which are included for comparison. The mean accuracy measures over the five validation datasets are given along with the between-validation standard errors.

For the LR modelling, there is very little difference in performance between the three missing data imputation methods, and all three perform almost as well as the complete-case model. Although the specificity for all three LR results is high, the sensitivity measures are all fairly low, with just over half of those who die predicted correctly. However, the cut-point of 0.5 could be lowered to increase the sensitivity of the models, at the cost of decreased specificity.
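The four evaluation criteria, and the trade-off between sensitivity and specificity when the cut-point is lowered, can be illustrated with a short Python sketch. The function name `classification_metrics`, the toy outcomes and the predicted probabilities are all hypothetical and are not taken from the study data.

```python
def classification_metrics(y_true, p_death, cut_point=0.5):
    """Sensitivity, specificity, PPV and NPV when death is predicted if p >= cut_point."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_death):
        predicted = 1 if p >= cut_point else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else None         # undefined when the denominator is zero

    return {
        "sens": ratio(tp, tp + fn),   # deaths correctly predicted
        "spec": ratio(tn, tn + fp),   # survivors correctly predicted
        "ppv": ratio(tp, tp + fp),    # predicted deaths that were deaths
        "npv": ratio(tn, tn + fn),    # predicted survivors that survived
    }

# Toy data (hypothetical): 1 = died, 0 = lived, with model-predicted probabilities of death.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_death = [0.9, 0.6, 0.4, 0.3, 0.45, 0.2, 0.1, 0.35, 0.05, 0.55]

at_half = classification_metrics(y_true, p_death, cut_point=0.5)
at_low = classification_metrics(y_true, p_death, cut_point=0.3)
print(at_half)
print(at_low)
```

On these toy values, lowering the cut-point from 0.5 to 0.3 raises sensitivity from 0.5 to 1.0 while specificity falls from 5/6 to 0.5, which is the direction of the trade-off noted in the text.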
The ROC analysis gave areas under the curve (with between-validation standard errors) of 0.86 (0.012) for both the hot-deck and the model-based results, and 0.85 (0.013) for the propensity scoring method, whereas the area under the ROC curve for the complete-case analysis was 0.89.

Similarly, there is little difference between the three imputation methods when modelling the data with an ANN. However, all imputation methods slightly improve the positive predictive value of the ANN models compared with complete-case analysis.

Table 2. Evaluations of Methods (mean over the five validation datasets, with between-validation standard error in brackets)

Data mining/imputation method   Sens          Spec        PPV            NPV
ANN:
  hot-deck                      46% (1.8)     92% (0.7)   0.61 (0.017)   0.86 (0.003)
  model-based                   45% (2.2)     92% (0.5)   0.62 (0.014)   0.86 (0.004)
  propensity                    41% (5.4)     93% (0.9)   0.61 (0.026)   0.85 (0.011)
  complete-case                 58%           86%         [...]          [...]
LR:
  hot-deck                      51% (1.8)     93% (0.7)   0.66 (0.017)   0.88 (0.003)
  model-based                   [...]% (2.2)  93% (0.4)   0.67 (0.007)   0.88 (0.004)
  propensity                    50% (1.1)     94% (0.6)   0.69 (0.020)   0.88 (0.002)
  complete-case                 56%           94%         [...]          [...]

Table 3 lists the factors included in the training models. Many of the factors considered for inclusion in the models (Table 1) are correlated with each other, hence the models do not all identify the same subsets of factors as having high importance (ANNs) or statistical significance (LRs). A typical LR model shows increased odds of death for patients involved in a motor vehicle crash, with a blunt or penetrating injury, of older age, not referred from another hospital, and with a more severe injury according to several AIS scores and the three GCS scores. The three GCS scores were often found to be statistically significant in the training models, and all training models included at least two of the GCS scores. Ten factors included in a typical ANN training model are listed in order of importance (Table 3). Two GCS scores are important in this model.

Table 3. Factors included in the training models

LR models: age group; patient referred; mechanism of injury; blunt injury; penetrating injury; GCS eye; GCS motor; GCS verbal; AIS head; AIS abdomen; AIS external
ANN models (in order of importance): AIS cervical spine; AIS thoracic spine; AIS external; GCS eye; GCS motor; AIS head; AIS spine; AIS legs; AIS face; year of admission

4. Conclusions

There is little distinction between the three imputation methods in terms of the results observed, for both the LR and the ANN models. According to the sensitivity and specificity measures, the results from the imputations are almost as good as the complete-case results, for both the LR and ANN models. This is also confirmed by the ROC analysis, which shows that the model from the complete-case analysis (0.89) is slightly more accurate than those based on the imputed data (0.86, 0.86 and 0.85).

In this study, single imputation is used, i.e. each missing value is replaced with a single imputed value, and the data are then analysed as for a complete-case analysis. The authors did consider using multiple imputation techniques [9], where each missing value is replaced with M ≥ 2 imputed values, resulting in M completed datasets. The M complete-data inferences can be combined to form one inference that reflects the uncertainty due to missingness under that model. Although multiple imputation has not been used in this application, the same missing values are effectively estimated five times under the k-fold cross-validation design, since a patient is included in a validation dataset once and in a training dataset four times.
Since different imputations are created for a particular missing value in each of the different data subsets, an element of between-imputation variability has been incorporated into the results.

Although these results do not lead to more accurate classification of patient death or survival following trauma injury than the complete-case analysis, they do allow classification of patients whose Glasgow coma scores are missing. These patients would not have been included in either building or testing the models in the complete-case analysis. In other words, it would not have been possible to make a prediction for a patient with missing GCS values, whereas imputation allows a prediction to be made.

Further work to investigate how well the different imputation methods estimate the missing GCS scores would be useful. One approach would be to carry out a simulation study using the complete-case data only, in which a subset of GCS scores is deleted to mimic the pattern of missingness in the observed data. This would allow an assessment of how accurately the different imputation techniques estimate the deleted GCS scores. Similar techniques could then be applied to the whole trauma injury dataset, which includes patients with all levels of injury severity, not only those most severely injured with ISS > 15.

References

[1] The Trauma Audit and Research Network; /FirstDecade.pdf [23/01/06].
[2] Baker SP, O'Neill B, Haddon Jr W, Long WB. The injury severity score: a method for describing patients with multiple injuries and evaluating patient care. Journal of Trauma 1974; 14:
[3] Association for the Advancement of Automotive Medicine. The abbreviated injury scale, 1990 revision. Des Plaines, IL: Association for the Advancement of Automotive Medicine; 1990.

[4] Teasdale G, Jennett B. Assessment of coma and impaired consciousness. A practical scale. Lancet 1974; (ii):
[5] Oakley PA, Mackenzie G, Templeton J, Cook AL, Kirby RM. Longitudinal trends in trauma mortality and survival in Stoke-on-Trent Injury 2004; 35:
[6] Chesney T, Penny K, Oakley P, Davies S, Chesney D, Maffulli N, Templeton J. Data mining medical information: Should artificial neural networks be used to analyse trauma audit data? Int J of Healthcare Information Systems and Informatics 2006; 1(2):
[7] Watkins D. Clementine's Neural Networks Technical Overview; sortium/secure/neural_overview.doc [12/01/06].
[8] Cunningham P, Carney J, Jacob S. Stability problems with artificial neural networks and the ensemble solution. Artificial Intelligence in Medicine 2000; 20(3):
[9] Little RJA, Rubin DB. Statistical Analysis with Missing Data. New Jersey: John Wiley & Sons;
[10] Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley;
[11] Gelman A, Carlin J, Stern H, Rubin DB. Bayesian Data Analysis. New York: Chapman and Hall;
[12] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70:


More information

An exploratory neural network model for predicting disability severity from road traffic accidents in Thailand

An exploratory neural network model for predicting disability severity from road traffic accidents in Thailand An exploratory neural network model for predicting disability severity from road traffic accidents in Thailand Jaratsri Rungrattanaubol 1, Anamai Na-udom 2 and Antony Harfield 1* 1 Department of Computer

More information

England & Wales SEVERE INJURY IN CHILDREN

England & Wales SEVERE INJURY IN CHILDREN England & Wales SEVERE INJURY IN CHILDREN 2012 THE TRAUMA AUDIT AND RESEARCH NETWORK The TARNlet Committee Mr Ross Fisher Co-chairman of TARNlet Consultant in Paediatric Surgery Sheffi eld Children s NHS

More information

NTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling

NTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling 1 Forecasting Women s Apparel Sales Using Mathematical Modeling Celia Frank* 1, Balaji Vemulapalli 1, Les M. Sztandera 2, Amar Raheja 3 1 School of Textiles and Materials Technology 2 Computer Information

More information

Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random

Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random [Leeuw, Edith D. de, and Joop Hox. (2008). Missing Data. Encyclopedia of Survey Research Methods. Retrieved from http://sage-ereference.com/survey/article_n298.html] Missing Data An important indicator

More information

Lecture 6. Artificial Neural Networks

Lecture 6. Artificial Neural Networks Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm

More information

Predictive time series analysis of stock prices using neural network classifier

Predictive time series analysis of stock prices using neural network classifier Predictive time series analysis of stock prices using neural network classifier Abhinav Pathak, National Institute of Technology, Karnataka, Surathkal, India abhi.pat93@gmail.com Abstract The work pertains

More information

Joseph Twagilimana, University of Louisville, Louisville, KY

Joseph Twagilimana, University of Louisville, Louisville, KY ST14 Comparing Time series, Generalized Linear Models and Artificial Neural Network Models for Transactional Data analysis Joseph Twagilimana, University of Louisville, Louisville, KY ABSTRACT The aim

More information

Information processing for new generation of clinical decision support systems

Information processing for new generation of clinical decision support systems Information processing for new generation of clinical decision support systems Thomas Mazzocco tma@cs.stir.ac.uk COSIPRA lab - School of Natural Sciences University of Stirling, Scotland (UK) 2nd SPLab

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

A Hybrid Data Mining Model to Improve Customer Response Modeling in Direct Marketing

A Hybrid Data Mining Model to Improve Customer Response Modeling in Direct Marketing A Hybrid Data Mining Model to Improve Customer Response Modeling in Direct Marketing Maryam Daneshmandi mdaneshmandi82@yahoo.com School of Information Technology Shiraz Electronics University Shiraz, Iran

More information

Stock Portfolio Selection using Data Mining Approach

Stock Portfolio Selection using Data Mining Approach IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719 Vol. 3, Issue 11 (November. 2013), V1 PP 42-48 Stock Portfolio Selection using Data Mining Approach Carol Anne Hargreaves, Prateek

More information

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S. AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

More information

Data Mining Lab 5: Introduction to Neural Networks

Data Mining Lab 5: Introduction to Neural Networks Data Mining Lab 5: Introduction to Neural Networks 1 Introduction In this lab we are going to have a look at some very basic neural networks on a new data set which relates various covariates about cheese

More information

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Application of discriminant analysis to predict the class of degree for graduating students in a university system

Application of discriminant analysis to predict the class of degree for graduating students in a university system International Journal of Physical Sciences Vol. 4 (), pp. 06-0, January, 009 Available online at http://www.academicjournals.org/ijps ISSN 99-950 009 Academic Journals Full Length Research Paper Application

More information

IFT3395/6390. Machine Learning from linear regression to Neural Networks. Machine Learning. Training Set. t (3.5, -2,..., 127, 0,...

IFT3395/6390. Machine Learning from linear regression to Neural Networks. Machine Learning. Training Set. t (3.5, -2,..., 127, 0,... IFT3395/6390 Historical perspective: back to 1957 (Prof. Pascal Vincent) (Rosenblatt, Perceptron ) Machine Learning from linear regression to Neural Networks Computer Science Artificial Intelligence Symbolic

More information

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type. Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada

More information

Gerry Hobbs, Department of Statistics, West Virginia University

Gerry Hobbs, Department of Statistics, West Virginia University Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

More information

Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

More information

PREDICTING THE USED CAR SAFETY RATINGS CRASHWORTHINESS RATING FROM ANCAP SCORES

PREDICTING THE USED CAR SAFETY RATINGS CRASHWORTHINESS RATING FROM ANCAP SCORES PREDICTING THE USED CAR SAFETY RATINGS CRASHWORTHINESS RATING FROM ANCAP SCORES by Stuart Newstead and Jim Scully April 2012 Report No. 309 Project Sponsored By The Vehicle Safety Research Group ii MONASH

More information

Outcome Prediction after Moderate and Severe Head Injury Using an Artificial Neural Network

Outcome Prediction after Moderate and Severe Head Injury Using an Artificial Neural Network 241 Outcome Prediction after Moderate and Severe Head Injury Using an Artificial Neural Network Min-Huei Hsu a,b, Yu-Chuan Li c, Wen-Ta Chiu d, Ju-Chuan Yen e a Department of Neurosurgery, e Department

More information

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

More information

Weather forecast prediction: a Data Mining application

Weather forecast prediction: a Data Mining application Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,ashwini.mandale@gmail.com,8407974457 Abstract

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

More information

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

More information

Power Prediction Analysis using Artificial Neural Network in MS Excel

Power Prediction Analysis using Artificial Neural Network in MS Excel Power Prediction Analysis using Artificial Neural Network in MS Excel NURHASHINMAH MAHAMAD, MUHAMAD KAMAL B. MOHAMMED AMIN Electronic System Engineering Department Malaysia Japan International Institute

More information

A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values

A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values Methods Report A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values Hrishikesh Chakraborty and Hong Gu March 9 RTI Press About the Author Hrishikesh Chakraborty,

More information

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network

Impelling Heart Attack Prediction System using Data Mining and Artificial Neural Network General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services

A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services Anuj Sharma Information Systems Area Indian Institute of Management, Indore, India Dr. Prabin Kumar Panigrahi

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Longitudinal Studies, The Institute of Education, University of London. Square, London, EC1 OHB, U.K. Email: R.D.Wiggins@city.ac.

Longitudinal Studies, The Institute of Education, University of London. Square, London, EC1 OHB, U.K. Email: R.D.Wiggins@city.ac. A comparative evaluation of currently available software remedies to handle missing data in the context of longitudinal design and analysis. Wiggins, R.D 1., Ely, M 2. & Lynch, K. 3 1 Department of Sociology,

More information

Data Mining mit der JMSL Numerical Library for Java Applications

Data Mining mit der JMSL Numerical Library for Java Applications Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

Effective Analysis and Predictive Model of Stroke Disease using Classification Methods

Effective Analysis and Predictive Model of Stroke Disease using Classification Methods Effective Analysis and Predictive Model of Stroke Disease using Classification Methods A.Sudha Student, M.Tech (CSE) VIT University Vellore, India P.Gayathri Assistant Professor VIT University Vellore,

More information

Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring

Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring 714 Evaluation of Feature election Methods for Predictive Modeling Using Neural Networks in Credits coring Raghavendra B. K. Dr. M.G.R. Educational and Research Institute, Chennai-95 Email: raghavendra_bk@rediffmail.com

More information

Healthcare Data Mining: Prediction Inpatient Length of Stay

Healthcare Data Mining: Prediction Inpatient Length of Stay 3rd International IEEE Conference Intelligent Systems, September 2006 Healthcare Data Mining: Prediction Inpatient Length of Peng Liu, Lei Lei, Junjie Yin, Wei Zhang, Wu Naijun, Elia El-Darzi 1 Abstract

More information

APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

More information

Neural Networks and Support Vector Machines

Neural Networks and Support Vector Machines INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

More information

Analysis Issues II. Mary Foulkes, PhD Johns Hopkins University

Analysis Issues II. Mary Foulkes, PhD Johns Hopkins University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

CODES Statewide Application: Older Occupants of Motor Vehicles. Massachusetts

CODES Statewide Application: Older Occupants of Motor Vehicles. Massachusetts CODES Statewide Application: Older Occupants of Motor Vehicles Massachusetts Heather Rothenberg, Marta Benavente, and Michael A. Knodler, Jr. University of Massachusetts Traffic Safety Research Program

More information

A Property & Casualty Insurance Predictive Modeling Process in SAS

A Property & Casualty Insurance Predictive Modeling Process in SAS Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

More information

APPLICATION OF DATA MINING TECHNIQUES FOR DIRECT MARKETING. Anatoli Nachev

APPLICATION OF DATA MINING TECHNIQUES FOR DIRECT MARKETING. Anatoli Nachev 86 ITHEA APPLICATION OF DATA MINING TECHNIQUES FOR DIRECT MARKETING Anatoli Nachev Abstract: This paper presents a case study of data mining modeling techniques for direct marketing. It focuses to three

More information

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of ireless Intrusion Detection Systems

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Intelligent Modeling of Sugar-cane Maturation

Intelligent Modeling of Sugar-cane Maturation Intelligent Modeling of Sugar-cane Maturation State University of Pernambuco Recife (Brazil) Fernando Buarque de Lima Neto, PhD Salomão Madeiro Flávio Rosendo da Silva Oliveira Frederico Bruno Alexandre

More information

NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN)

NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN) NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia Produced by: National Cardiovascular Intelligence Network (NCVIN) Date: August 2015 About Public Health England Public Health England

More information

Serious Injury Reporting An Irish Perspective. Maggie Martin

Serious Injury Reporting An Irish Perspective. Maggie Martin Serious Injury Reporting An Irish Perspective Maggie Martin Background Investigate the feasibility of adopting the Maximum Abbreviated Injury Scale (MAIS) in Ireland assessed at level 3 or more. Having

More information

NEURAL NETWORKS A Comprehensive Foundation

NEURAL NETWORKS A Comprehensive Foundation NEURAL NETWORKS A Comprehensive Foundation Second Edition Simon Haykin McMaster University Hamilton, Ontario, Canada Prentice Hall Prentice Hall Upper Saddle River; New Jersey 07458 Preface xii Acknowledgments

More information

Dealing with Missing Data

Dealing with Missing Data Dealing with Missing Data Roch Giorgi email: roch.giorgi@univ-amu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January

More information

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

More information

Price Prediction of Share Market using Artificial Neural Network (ANN)

Price Prediction of Share Market using Artificial Neural Network (ANN) Prediction of Share Market using Artificial Neural Network (ANN) Zabir Haider Khan Department of CSE, SUST, Sylhet, Bangladesh Tasnim Sharmin Alin Department of CSE, SUST, Sylhet, Bangladesh Md. Akter

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

Speaker First Plenary Session THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD? William H. Crown, PhD

Speaker First Plenary Session THE USE OF BIG DATA - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD? William H. Crown, PhD Speaker First Plenary Session THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD? William H. Crown, PhD Optum Labs Cambridge, MA, USA Statistical Methods and Machine Learning ISPOR International

More information

Motorcycle Safety A Trauma Surgeon s Perspective Sean A. Nix, D.O.

Motorcycle Safety A Trauma Surgeon s Perspective Sean A. Nix, D.O. Motorcycle Safety A Trauma Surgeon s Perspective Sean A. Nix, D.O. Disclosure I have nothing to disclose No political or financial attachments I do take care of injured patients Motorcycle Safety Injury

More information

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

More information

Combining GLM and datamining techniques for modelling accident compensation data. Peter Mulquiney

Combining GLM and datamining techniques for modelling accident compensation data. Peter Mulquiney Combining GLM and datamining techniques for modelling accident compensation data Peter Mulquiney Introduction Accident compensation data exhibit features which complicate loss reserving and premium rate

More information

Addressing the Class Imbalance Problem in Medical Datasets

Addressing the Class Imbalance Problem in Medical Datasets Addressing the Class Imbalance Problem in Medical Datasets M. Mostafizur Rahman and D. N. Davis the size of the training set is significantly increased [5]. If the time taken to resample is not considered,

More information