How To Predict Diabetes In A Cost Bucket
|
|
- Toby Gordon
- 3 years ago
- Views:
Transcription
1 Paper PH An Analysis of Diabetes Risk Factors Using Data Mining Approach Akkarapol Sa-ngasoongsong and Jongsawas Chongwatpol Oklahoma State University, Stillwater, OK 74078, USA ABSTRACT Preventing the disease of diabetes is an ongoing area of interest to the healthcare community. Although many studies employ several data mining techniques to assess the leading causes of diabetes, only small sets of clinical risk factors are considered. Consequently, not only many potentially important variables such as pre-diabetes health conditions are neglected in their analysis, but the results produced by such techniques may not represent relevant risk factors and pattern recognition of diabetes appropriately. In this study, we categorize our analysis into three different focuses based on the patients healthcare costs. We then examine whether more complex analytical models sing several data mining techniques in SAS Enterprise Miner 7.1 can better predict and explain the causes of increasing diabetes in adult patients in each cost category. The preliminary analysis shows that high blood pressure, age, cholesterol, adult BMI, total income, sex, heart attack, marital status, dental checkup, and asthma diagnosis are among the key risk factors. 1. INTRODUCTION Preventing the disease of diabetes is an ongoing area of interest to the healthcare community. Based on the data from the 2011 National Diabetes Fact Sheet, diabetes affects an estimate of 25.8 million people in the US, which is about 8.3% of the population. Additionally, approximately 79 million people have been diagnosed with pre-diabetes [1]. Pre-diabetes refers to a group of people with higher blood glucose levels than normal but not high enough for a diagnosis of diabetes. Increased awareness and treatment of diabetes should begin with prevention. Much of the focus has been on the impact and importance of preventive measures on disease occurrence and especially cost savings resulted from such measures. Many studies regarding diabetes prediction have been conducted for several years. The main objectives are to predict what variables are the causes, at high risk, for diabetes and to provide a preventive action toward individual at increased risk for the disease. Several variables have been reported in literature as important indicators for diabetes prediction. Lindstrom and Tuomilehto (2003) develop the diabetes risk score model considering Age, BMI, waist circumference, history of antihypertensive drug treatment, high blood glucose, physical activity, and daily consumption of fruits, berries, or vegetables as categorical variables [2]. Park and Edington (2001) present a sequential neural network model for diabetes prediction. The authors indicate risk factors, in the final model, including blood pressure, cholesterol, back pain, fatty food, weight index or alcohol index [3]. Concaro et al, (2009) present the application of a data mining technique to a sample of diabetic patients. They consider the clinical variables such as BMI, blood pressure, glycaemia, cholesterol, or cardio-vascular risk in the model [4]. Although these studies employ several data mining techniques to assess the leading causes of diabetes, only small sets of clinical risk factors are considered. Consequently, not only many potentially important variables such as prediabetes health conditions are neglected in their analysis, but the results produced by such techniques may not represent relevant risk factors and pattern recognition of diabetes appropriately. This study seeks to fill this gap. Specially, the question arises What are the most important risk factors to be included in prognostic analysis to prevent prevalence of diabetes? To answer this research question, we examine whether more complex analytical models using several data mining techniques can better predict and explain the causes of increasing diabetes. In this study, we follow the CRISP-DM Model (Cross Industry Standard Process for Data Mining), which is used as a comprehensive data mining methodology and process model for conducting this data mining study. CRISP-DM breaks down this data mining project in to six phases: business understanding, data understanding, data preparation, modeling, evaluation, and development. 1
2 2. RESEARCH FRAMEWORK We provide the in-depth analysis on how data mining approach can be a great help. After understanding the domain of diabetes and developing the objectives of achieving prognostic analysis of diabetes though data mining approach, we begin our analysis by understanding the relevant data source, accessing data quality, and discovering first insights into the data. The next step is toward data preprocessing from the initial raw data to the final dataset, ready for the model development. This preprocessing step takes about 90% of time to clean, transform, construct, and format the relevant data. We then apply analytical data mining techniques to predict and explain factors that increase the prevalence of diabetes in the patient samples. However, we need to evaluate and assess the validity and the utility of our developed predictive models before deploying the data mining results into the domain as stated in the objectives of the study. Figure 1 presents the overall framework of our models to address the research question from the data understanding to model deployment. 2.1 Data Description The data source that is used to perform data mining analysis in this study is provided by SAS in the national 2010 SAS Data Mining Shootout competition. With 50,788 records, the dataset consists of 43 variables in which 35 variables are discrete variables and the other eight variables are continuous variables. This dataset is assumed to be representative of the population and used for analysis as a snapshot of the country and its health care costs at a point in time. Our first task in this study is to get a sense of the dataset for any inconsistencies, errors, or extreme values in the data. Frequency distribution, descriptive statistics, and cross-tab analysis are used in this section. Table 1 presents the summary of inapplicable data for nominal variables. Below are the key findings, which are very important for data preparation in the next section: Table 1: Summary of inapplicable data for nominal variables 2
3 Figure 1: Research Framework 3
4 Below are the key findings, which are very important for data preparation in the next section: - LAST_PSA, LAST_PAP_SMEAR, LAST_BREAST_EXAM, LAST MAMMOGRAM are only taken by one type of gender (Deleted). - More than 95% of the data in PERSON_BLIND and IS_DEAF are inapplicable % of patients with diabetes have no educational degree, compared to 17.81% of patients without diabetes % of patients with diabetes wear eyeglasses, compared to 44.58% of patients without diabetes % of patients with diabetes checked their cholesterol within last year, compared to % of patients without diabetes % of patients with diabetes checked their health within last year, compared to 38.92% of patients without diabetes % of patients with diabetes had flu-shot within last year, compared to 16.55% of patients without diabetes % of patients with diabetes had sigmoidoscopy or colonoscopy test, compared to 10.74% of patients without diabetes % of patients with diabetes have been diagnosed with high blood pressure, compared to % of patients without diabetes % of patients with diabetes have been diagnosed with coronary heart disease diagnosis, compared to 1.57% of patients without diabetes Based on these findings, we can observe that most diabetes patients have been diagnosed with other concurrent diseases such as high blood pressure, heart disease, and cholesterols, etc. 2.2 Data Preparation The next step in this analysis is to examine the quality of the data. We would like to figure out whether or not the data is complete or has missing values and what variables to be included in the model. We make an assumption that the model excludes variables with over 50% information missing; see similar methodology from Park and Edington (2001), [3]. Thus, the following seven variables are excluded from our analysis: (1) MORE_THAN_ONE_JOB, (2) PERSON_BIND, (3) IS_DEAF, (4) LAST_PSA, (5) LAST_PAP_SMEAR, (6) LAST_BREAST_EXAM, and (7) LAST_MAMMOGRAM. Below are two examples on how we analyze those excluded variables. Note that, besides looking at the descriptive statistics, we have actually tested whether or not these seven excluded variables are significant in the models (overall decision tree model or even the model by age and gender). Due to the high-missing-values issue, the results show that these seven variables are not included in any models tested. - For instance, a variable PERSON_BLIND has 48,544 missing observations (approximately 95% missing values) and out of its applicable data, there are only 37 observations indicating that those patients have been diagnosed with diabetes and blindness. Even though there has been claimed that roughly 40% of patients diagnosed with diabetes in the United States have some form of diabetic retinopathy [5], we still exclude that variable from our analysis due to the relatively-low relevant amount of data in this dataset. - Another critical variable to be excluded from our analysis is MORE_THAN_ONE_JOB. There have been several studies on how unhealthy eating or sleeping habits (sedentary life styles) for those people that work more than one job increases the risk factor for diabetes [6]. Only approximately less than 0.3% (68 out of 23,169 observations excluding all non-applicable data) indicates the group of patients who have been diagnosed with diabetes and worked more than one job (see Figure 2). Thus, we decide to exclude this variable from our analysis. Similar reasons for excluding such variable can be seen for the other five variables. Figure 2: Cross-Tab Analysis (Diabetes by PERSON_BIND and Diabetes by MORE_THAN_ONE_JOB) 4
5 Additionally, including such variables with high-missing values in the model or even applying missing value imputation method can lower the quality of our findings. In fact, we believe that the dataset is still quite large enough (over 30,000 observations) for the analysis after excluding these variables and any other missing values. 2.3 Missing Values, Possible Outliers, and Data Transformation After excluding suspected variables, we decide to impute missing values for better model prediction and to increase the total number of observations with diabetes. Although, imputation can bias the model prediction, omitting these missing values in the sample dataset may also produce an extremely biased prediction when applied to the new dataset. Another issue that should be addressed is the possible outliers in the data set. Some of the patients in the data set have income lower than zero. Also some of the most expensive patients have the total health care cost, Medicaid, or Medicare within very extreme percentile. In our model, we decide to filter the patients with these outliers out of the data set. Consequently, a total of 27,655 observations are ready for modeling analysis. For interval input variables, the missing values are replaced with the mean of the non-missing values. In contrast, the missing values for nominal variables are replaced with the most frequently occurring class variable value. Initially, we decide to transform the variables, which look right skewed, in order to improve the fit of a model to the data. We apply Maximize Normality method to the following six variables: AMOUNT_PAID_MEDICAID, AMOUNT_PAID_MEDICARE, NUMBER_VISITS, ADULT_BMI, TOTALEXP, and TOTAL_INCOME. However, it seems that only TOTAL_INCOME and NUMB_VISITS show significant improvement after the transformation, while the distribution of other variables remain closely the same. Thus, only TOTAL_INCOME and NUMB_VISITS variables are transformed for further analysis (see Figure 3). Figure 3: Variable Transformation Statistics 3. PREDICTION MODEL In this study, three popular data mining techniques including logistic regression, decision tree, and artificial neural network are applied and compared to each other based on their predictive accuracy on the hold-out sample. - Logistic regression is often used to predict an outcome variable that is binary or multi-class dependent variables. It allows the prediction of discrete variables (dependent variables) by a mix of continuous and discrete predictors as the relationship between dependent variables and independent variables is non-linear. It builds the model to predict the odds of its occurrence instead of point estimate event in the traditional linear regression model. - Decision tree is another data classification and prediction method commonly used due to its intuitive explainability characteristics. Decision tree divides the dataset into multiple groups by evaluating individual data record, which can be described by its attributes. It is also simple and easy to visualize the process of classification where the predicates return discrete values and can be explained by a series of nested if-then-else statements. - Artificial Neural Network (ANN) is a mathematical and computational model for pattern recognition and data classification through a learning process. It is a biologically inspired analytical technique, simulating biological systems, where learning algorithm indicates how learning takes place and involves adjustments to the synaptic connections between neurons. Data input can be discrete or real valued; meanwhile the output is in a form of vector of values and can be discrete or real valued as well. For a technical summary including both algorithm and its applications of each method in medical and health care areas see [7-11]. 5
6 3.1 Cost Bucketing and Model Classification In this study, we categorize our analysis into three different focuses based on the patients healthcare costs. We partition our patients into these three groups (see Table 2) in such a way that the sum of all patients healthcare costs in each group is approximately the same. The range of healthcare costs is varied significantly. According to the population s cumulative healthcare expenses, approximately 87% of the overall expense of the population originates from only 3% of the most expensive patients. - The cost bucket #1 can be interpreted as candidates for the low-risk group of patients diagnosed with diabetes. - The cost bucket #2 can be interpreted as the pre-diabetes group. - The cost bucket #3 can be interpreted as the high-risk group of patients diagnosed with diabetes. Table 2: Cost bucket information We decide to stratify our samples to construct a model set with approximately equal numbers of each target value (DIABETES_DIAG_BINARY) in each cost bucket. With a 50% adjusting for oversampling, we believe that the contrast between the two values is minimized, which make it easier and more reliable to recognize patterns in this dataset. The most important goals of this study are to predict the outcome probability of people with diabetes. In order to provide more accurate diabetes prediction, we apply and compare ANN, Logistic Regression, and Decision three models in each cost bucket as well. 3.2 Performance Measures The complicity of the model is controlled by fit statistics calculated on the testing dataset. We use three different criteria to select the best model on the testing dataset. These criteria include false negative, prediction accuracy, and misclassification rate. False negative (Target = 1 and Outcome = 0) represents the case of an error in the model prediction where model results indicate that diabetes occurrence is not present, when in reality, there is an incident. The false negative value should be as low as possible. The proportion of cases misclassified is very common in the predictive modeling. However, the observed misclassification rate should be also relatively low for model justification. Lastly, prediction accuracy is evaluated among the three models on the testing dataset. The higher the prediction accuracy rate, the better the model to be selected. 4. RESULTS AND DISCUSSIONS After excluding variables with outliers and high missing values, we first develop predictive models on the original sample dataset without categorizing our sample into three cost-bucketing groups as presented in Table 2. The results of the overall model are compared to those from each cost-bucketing group to determine whether important risk factors of diabetes are different. The dataset is allocated to the training (70%) and testing (30%) partitions. The binary variable of patients with diabetes (Target = 1 for patients with diabetes and Outcome = 0 for patients without diabetes) is the output variable of the prediction models. After recoding all categorical input variables, the selected variables are tested whether the association between the input variables and the logit of binary target variable satisfy the linearity assumption. The problematic variables are then transformed to satisfy such assumption. Different models are constructed and compared in order to predict patients with diabetes. Training dataset is only used to extract models by the data mining algorithms. Then, those models derived in the training data set are then applied on the testing dataset for the correct discovery of intrusions. In other words, this testing dataset is used to prune the models generated by the data mining process in the training dataset to avoid overfitting and instabilities in the classification accuracy. Statistical analyses are performed using SAS Enterprise Guide 4.3 for data preparation and SAS Enterprise Miner 7.1 for model development and comparison. 6
7 4.1 Overall Model Logistic regression produces the best results with overall misclassification rate of 22.89%. Although Artificial Neural Networks (ANN) has the lowest false negative rate of 20.55%, we still select the logistic regression model as our final model to predict patients with diabetes. After performing the stepwise logistic regression, which is a process of building a model where the choice of predictive variables is carried our based on the t-statistics of their estimated coefficients, the final model is presented in Figure 4. There are total 15 risk factors used to predict the prevalence of diabetes. Figure 4: Classification Table and Summary of Stepwise Selection (Overall Model) 4.2 Cost Bucket #1 (<$5,750) For the low-risk group (Cost Bucket #1), Decision tree produces the best results with overall misclassification rate of 24.56%, followed by the Logistic regression and artificial neural networks with misclassification rates of 25.43% and 25.68%, respectively. Figure 5 presents the final variable selection and classification table for Decision Tree model. Seven variables including age, cholesterol, adult BMI, high blood pressure, total income, sex, and asthma diagnosis are important factors to predict diabetes in this cost bucket group. Compared to the overall model, Only age, high blood pressure, and cholesterol are the common risk factors. 7
8 Figure 5: Classification Table and Variable Importance (Cost Bucket #1) 4.3 Cost Bucket #2 ($5,750 - $19,400) For the pre-diabetes group, Logistic Regression produces the best results with overall misclassification rate of 29.74%, followed by artificial neural networks and decision tree, respectively. Figure 6 presents the final variable selection based on the stepwise regression approach and classification table for Logistic regression model. Eight variables including cholesterol, adult BMI, high blood pressure, highest degree, last flu shot, heart attack, wears eyeglasses, and marital status are important factors to predict diabetes in this cost bucket group. Compared to the overall model, adult BMI, high blood pressure, last flu shot, heart attack, wears eyeglasses, and marital status are the common risk factors. On the other hand, only adult BMI and high blood pressure are the common risk factors between this group and the cost bucket # Cost Bucket #3 (>$19,400) For the high-risk group, Decision Tree produces the best results with overall misclassification rate of 41.80%, followed by artificial neural networks and Logistic Regression, respectively. Figure 7 presents the final variable selection and classification table for Decision Tree model. The final model indicates that adult BMI, age, wear eyeglasses, and marital status are among the key risk factors in this cost bucket group. Compared to the overall model, adult BMI, age, and, wears eyeglasses are the common risk factors. On the other hand, only adult BMI and age are the common risk factors between this group and the cost bucket #1; meanwhile adult BMI, wears eyeglasses, and marital status are the common risk factors compared to the cost bucket #2 8
9 Figure 6: Classification Table and Summary of Stepwise Selection (Cost Bucket #2) 5. CONCLUSION This study demonstrates that data mining based approaches can be used to assess predictor variables influencing the risk of diabetes in adult patients. As opposed to the traditional descriptive statistical analysis methods or the approaches adopting only expert-selected variables, the employment of logistic regression, decision trees, or neural network models provide an interesting list of risk factors that some have been included in the existing studies; meanwhile some others have been absent from the related literatures. We categorize our analysis into three different focuses based on the patients healthcare costs. Figure 8 presents the summary of the key findings of this study. Only adult BMI is the key risk factor among these four groups. Age is the most important factor for the overall model, cost buckets #1, and # 3; meanwhile, high blood pressure is the key indicator for the overall model, cost buckets #1 and #2. The results presented in Figure 8 confirm the previous literature to some extent. The following key risk factors: High blood pressure, adult BMI, cholesterol, and heart attack derived from our analysis are consistent with the studies from Lindstrom and Tuomilehto (2003), Park and Edington (2001), and Concaro et al. (2009). On the other hand, several interesting factors such as total income, asthma diagnosis, wears eyeglasses, and marital status reported to be critical in this study have not been included in the existing studies. A prospective study is needed to provide better understanding of the relationship between these undisclosed factors and the increased risk of diabetes. 9
10 Figure 7: Classification Table and Variable Importance (Cost Bucket #3) Figure 8: Summary of the Key Findings 10
11 REFERENCES [1] American Diabetes Association, "Diabetes Statistics," June 02, 2012, < [2] J. Lindstrom and J. Tuomilehto, "The Diabetes Risk Score: A practical tool to predict type 2 diabetes risk," Diabetes Care, 26:3 (2003), [3] J. Park and D. W. Edington, "A Sequential Neural Network Model for Diabetes Prediction," Artificial Intelligence in Medicine, 23 (2001), [4] S. Concaro, L. Sacchi, C. Cerra, and M. Stefanelli, "Temporal Data Mining for the Assessment of the Costs Related to Diabetes Mellitus Pharmacological Treatment," Proc. AMIA 2009 Symposium Proceedings, 2009, pp [5] American Optometric Association, "Diabetes is the Leading Cause of Blindness Among Most Adults," July 11, 2010, < [6] Eurekalert, "Insufficient sleep may be linked to increased diabetes risk," July 11, 2010, < [7] Jackson, J., Data mining: a conceptual overview. Communications of the Association for Information Systems, : p [8] Turban, E., R. Sharda, and D. Delen, Decision Support and Business Intelligence Systems. 2011, Pearson. [9] Delen, D., A. Oztekin, and Z.J. Kong, A machine learning-based approach to prognostic analysis of thoracic transplantations. Artificial Intelligence in Medicine, (1): p [10] Delen, D., A comparative analysis of machine learning techniques for student retention management. Decision Support Systems, (4): p [11] Delen, D., G. Walker, and A. Kadam, Predicting breast cancer survivability: a comparison of three data mining methods. Artificial Intelligence in Medicine, (2): p CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the authors at: Name: Akkarapol Sa-ngasoongsong Enterprise: Oklahoma State University Address: 245N. University Place#301 City, State ZIP: Stillwater, Oklahoma, 74075, USA akkarap@okstate.edu Name: Jongsawas Chongwatpol Enterprise: Oklahoma State University Address: Spears School of Business City, State ZIP: Stillwater, Oklahoma, 74078, USA jongsaw@ostat .okstate.edu Akkarapol Sa-ngasoongsong is a PhD student in School of Industrial Engineering and Management at Oklahoma State University. He has three years of professional experiences as R&D Engineer. He has earned two SAS certifications: SAS Certified Base Programmer for SAS 9 and Certified Predictive Modeler using SAS Enterprise Miner 6.1. He was 1 st place award winner of the M2010 conference s Data Mining Shootout, and recipient of SAS ambassador honorable mention award (SGF 2012) Jongsawas Chongwatpol has earned PhD degree in Management Science and Information Systems from Oklahoma State University. He was 1 st place award winner of the M2010 conference s Data Mining Shootout. He was also a recipient of the best teaching award from Oklahoma State University, and 2011 best cast study paper award from Decision Sciences Institute (DSI). SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 11
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
More informationReevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry
Paper 1808-2014 Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry Kittipong Trongsawad and Jongsawas Chongwatpol NIDA Business School, National Institute
More informationIn this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
More informationPharmaSUG2011 Paper HS03
PharmaSUG2011 Paper HS03 Using SAS Predictive Modeling to Investigate the Asthma s Patient Future Hospitalization Risk Yehia H. Khalil, University of Louisville, Louisville, KY, US ABSTRACT The focus of
More informationData Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
More informationA Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
More informationImproving the Thermal Efficiency of Coal-Fired Power Plants: A Data Mining Approach
Paper 1805-2014 Improving the Thermal Efficiency of Coal-Fired Power Plants: A Data Mining Approach Thanrawee Phurithititanapong and Jongsawas Chongwatpol NIDA Business School, National Institute of Development
More informationPredicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
More informationDisease Prevention to Reduce New Hampshire Healthcare Claims and Costs: A Data Mining Approach
Disease Prevention to Reduce New Hampshire Healthcare Claims and Costs: A Data Mining Approach Disease Prevention to Reduce New Hampshire Healthcare Claims and Costs: A Data Mining Approach Rakesh Karn,
More informationCustomer and Business Analytic
Customer and Business Analytic Applied Data Mining for Business Decision Making Using R Daniel S. Putler Robert E. Krider CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationData Mining On Diabetics
Data Mining On Diabetics Janani Sankari.M 1,Saravana priya.m 2 Assistant Professor 1,2 Department of Information Technology 1,Computer Engineering 2 Jeppiaar Engineering College,Chennai 1, D.Y.Patil College
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationDivide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics
for Healthcare Analytics Si-Chi Chin,KiyanaZolfaghar,SenjutiBasuRoy,AnkurTeredesai,andPaulAmoroso Institute of Technology, The University of Washington -Tacoma,900CommerceStreet,Tacoma,WA980-00,U.S.A.
More informationA Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND
Paper D02-2009 A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND ABSTRACT This paper applies a decision tree model and logistic regression
More informationJournal of Information
Journal of Information journal homepage: http://www.pakinsight.com/?ic=journal&journal=104 PREDICT SURVIVAL OF PATIENTS WITH LUNG CANCER USING AN ENSEMBLE FEATURE SELECTION ALGORITHM AND CLASSIFICATION
More informationImproving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
More informationData Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
More informationSTATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationLearning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
More informationAn Overview of Data Mining: Predictive Modeling for IR in the 21 st Century
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO
More informationDATA MINING AND REPORTING IN HEALTHCARE
DATA MINING AND REPORTING IN HEALTHCARE Divya Gandhi 1, Pooja Asher 2, Harshada Chaudhari 3 1,2,3 Department of Information Technology, Sardar Patel Institute of Technology, Mumbai,(India) ABSTRACT The
More informationLowering social cost of car accidents by predicting high-risk drivers
Lowering social cost of car accidents by predicting high-risk drivers Vannessa Peng Davin Tsai Shu-Min Yeh Why we do this? Traffic accident happened every day. In order to decrease the number of traffic
More informationGerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
More informationData Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
More informationFigure 1. The Research Framework on Hit Song Prediction
Paper 3368-2015 Determining the Key Success Factors for hit songs in the Billboard Music Charts Piboon Banpotsakun, MBA Student, NIDA Business School, Bangkok, Thailand Jongsawas Chongwatpol, Ph.D., NIDA
More informationCOMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS
COMPARING NEURAL NETWORK ALGORITHM PERFORMANCE USING SPSS AND NEUROSOLUTIONS AMJAD HARB and RASHID JAYOUSI Faculty of Computer Science, Al-Quds University, Jerusalem, Palestine Abstract This study exploits
More informationECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node
Enterprise Miner - Regression 1 ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node 1. Some background: Linear attempts to predict the value of a continuous
More informationUnderstanding Characteristics of Caravan Insurance Policy Buyer
Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended
More informationInternet Gambling Behavioral Markers: Using the Power of SAS Enterprise Miner 12.1 to Predict High-Risk Internet Gamblers
Paper 1863-2014 Internet Gambling Behavioral Markers: Using the Power of SAS Enterprise Miner 12.1 to Predict High-Risk Internet Gamblers Sai Vijay Kishore Movva, Vandana Reddy and Dr. Goutam Chakraborty;
More informationData Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
More informationA STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
More informationUniversity of Michigan Health Risk Assessment (HRA) and Trend Management System (TMS)
University of Michigan Health Risk Assessment (HRA) and Trend Management System (TMS) CIGNA has entered into a long-term agreement with the University of Michigan, which gives it access to intellectual
More informationUse Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study
Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Tongshan Chang The University of California Office of the President CAIR Conference in Pasadena 11/13/2008
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationEXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
More informationUniversity of Michigan Dearborn Graduate Psychology Assessment Program
University of Michigan Dearborn Graduate Psychology Assessment Program Graduate Clinical Health Psychology Program Goals 1 Psychotherapy Skills Acquisition: To train students in the skills and knowledge
More informationS03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY
S03-2008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT Predictive modeling includes regression, both logistic and linear,
More informationSURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH
330 SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH T. M. D.Saumya 1, T. Rupasinghe 2 and P. Abeysinghe 3 1 Department of Industrial Management, University of Kelaniya,
More informationHow Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK
How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information
More informationFacts about Diabetes in Massachusetts
Facts about Diabetes in Massachusetts Diabetes is a disease in which the body does not produce or properly use insulin (a hormone used to convert sugar, starches, and other food into the energy needed
More informationData Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved.
Data Mining with SAS Mathias Lanner mathias.lanner@swe.sas.com Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Data mining Introduction Data mining applications Data mining techniques SEMMA
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationAddressing Analytics Challenges in the Insurance Industry. Noe Tuason California State Automobile Association
Addressing Analytics Challenges in the Insurance Industry Noe Tuason California State Automobile Association Overview Two Challenges: 1. Identifying High/Medium Profit who are High/Low Risk of Flight Prospects
More informationBenchmarking of different classes of models used for credit scoring
Benchmarking of different classes of models used for credit scoring We use this competition as an opportunity to compare the performance of different classes of predictive models. In particular we want
More informationHow To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
More informationA Property and Casualty Insurance Predictive Modeling Process in SAS
Paper 11422-2016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly
More informationA Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc.
A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc. Introduction: The Basel Capital Accord, ready for implementation in force around 2006, sets out
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationData Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationTDWI Best Practice BI & DW Predictive Analytics & Data Mining
TDWI Best Practice BI & DW Predictive Analytics & Data Mining Course Length : 9am to 5pm, 2 consecutive days 2012 Dates : Sydney: July 30 & 31 Melbourne: August 2 & 3 Canberra: August 6 & 7 Venue & Cost
More informationIBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationTHE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
More informationPredicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS
Paper 114-27 Predicting Customer in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS Junxiang Lu, Ph.D. Sprint Communications Company Overland Park, Kansas ABSTRACT
More informationJoseph Twagilimana, University of Louisville, Louisville, KY
ST14 Comparing Time series, Generalized Linear Models and Artificial Neural Network Models for Transactional Data analysis Joseph Twagilimana, University of Louisville, Louisville, KY ABSTRACT The aim
More informationNHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia. Produced by: National Cardiovascular Intelligence Network (NCVIN)
NHS Diabetes Prevention Programme (NHS DPP) Non-diabetic hyperglycaemia Produced by: National Cardiovascular Intelligence Network (NCVIN) Date: August 2015 About Public Health England Public Health England
More informationIBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
More informationSurvey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses
Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses Salford Systems Data Mining 2006 March 27-31 2006 San Diego, CA By Dean Abbott Abbott Analytics
More informationUSING DATA SCIENCE TO DISCOVE INSIGHT OF MEDICAL PROVIDERS CHARGE FOR COMMON SERVICES
USING DATA SCIENCE TO DISCOVE INSIGHT OF MEDICAL PROVIDERS CHARGE FOR COMMON SERVICES Irron Williams Northwestern University IrronWilliams2015@u.northwestern.edu Abstract--Data science is evolving. In
More information!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
More informationData Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
More informationMethods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
More informationVariable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal Bank of Scotland, Bridgeport, CT
Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal ank of Scotland, ridgeport, CT ASTRACT The credit card industry is particular in its need for a wide variety
More informationData Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over
More informationSECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION. Friday 14 March 2008 9.00-9.
SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION Friday 14 March 2008 9.00-9.45 am Attempt all ten questions. For each question, choose the
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationImpelling Heart Attack Prediction System using Data Mining and Artificial Neural Network
General Article International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347-5161 2014 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Impelling
More informationEFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
More informationPaper 3508-2015. Downtime of a truck = Truck repair end date - Truck repair start date
Paper 3508-2015 Using Text from Repair Tickets of a Truck Manufacturing Company to Predict Factors that Contribute to Truck Downtime Ayush Priyadarshi and Dr. Goutam Chakraborty, Oklahoma State University
More informationLeveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
More informationAlex Vidras, David Tysinger. Merkle Inc.
Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT
More informationMortality Assessment Technology: A New Tool for Life Insurance Underwriting
Mortality Assessment Technology: A New Tool for Life Insurance Underwriting Guizhou Hu, MD, PhD BioSignia, Inc, Durham, North Carolina Abstract The ability to more accurately predict chronic disease morbidity
More informationThe SAS Analytics Shootout Annual Student Competition
The SAS Analytics Shootout Annual Student Competition Sponsored by SAS and The Institute for Health & Business Insight at Central Michigan University Joseph Pomerville, Research Analyst / Advanced Analytics
More informationLecture 6 - Data Mining Processes
Lecture 6 - Data Mining Processes Dr. Songsri Tangsripairoj Dr.Benjarath Pupacdi Faculty of ICT, Mahidol University 1 Cross-Industry Standard Process for Data Mining (CRISP-DM) Example Application: Telephone
More informationHealthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
More informationAnalyzing Research Data Using Excel
Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial
More informationA fast, powerful data mining workbench designed for small to midsize organizations
FACT SHEET SAS Desktop Data Mining for Midsize Business A fast, powerful data mining workbench designed for small to midsize organizations What does SAS Desktop Data Mining for Midsize Business do? Business
More informationBOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
More informationEasily Identify Your Best Customers
IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do
More informationData Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:
More informationApplied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
More informationImproving performance of Memory Based Reasoning model using Weight of Evidence coded categorical variables
Paper 10961-2016 Improving performance of Memory Based Reasoning model using Weight of Evidence coded categorical variables Vinoth Kumar Raja, Vignesh Dhanabal and Dr. Goutam Chakraborty, Oklahoma State
More informationArtificial Neural Network Approach for Classification of Heart Disease Dataset
Artificial Neural Network Approach for Classification of Heart Disease Dataset Manjusha B. Wadhonkar 1, Prof. P.A. Tijare 2 and Prof. S.N.Sawalkar 3 1 M.E Computer Engineering (Second Year)., Computer
More informationMaster of Science in Healthcare Informatics and Analytics Program Overview
Master of Science in Healthcare Informatics and Analytics Program Overview The program is a 60 credit, 100 week course of study that is designed to graduate students who: Understand and can apply the appropriate
More informationInsurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationPredictive Modeling of Titanic Survivors: a Learning Competition
SAS Analytics Day Predictive Modeling of Titanic Survivors: a Learning Competition Linda Schumacher Problem Introduction On April 15, 1912, the RMS Titanic sank resulting in the loss of 1502 out of 2224
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationPrediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
More informationThe Predictive Data Mining Revolution in Scorecards:
January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms
More informationTYPE 2 DIABETES IN THE AFRICAN AMERICAN COMMUNITY. Understanding the Complications That May Happen Without Proper Care
TYPE 2 DIABETES IN THE AFRICAN AMERICAN COMMUNITY Understanding the Complications That May Happen Without Proper Care STAYING HEALTHY THE IMPORTANCE OF PROPER MANAGEMENT OF TYPE 2 DIABETES Diabetes is
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationE-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce
More informationPrediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
More information2015 Workshops for Professors
SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market
More informationREVIEW ON PREDICTION SYSTEM FOR HEART DIAGNOSIS USING DATA MINING TECHNIQUES
International Journal of Latest Research in Engineering and Technology (IJLRET) ISSN: 2454-5031(Online) ǁ Volume 1 Issue 5ǁOctober 2015 ǁ PP 09-14 REVIEW ON PREDICTION SYSTEM FOR HEART DIAGNOSIS USING
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationWhite Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.
White Paper Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices. Contents Data Management: Why It s So Essential... 1 The Basics of Data Preparation... 1 1: Simplify Access
More information