MHI3000 Big Data Analytics for Health Care Final Project Report
|
|
|
- Alexandra Phelps
- 10 years ago
- Views:
Transcription
1 MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu ( ) 1. Data pre-processing The data given to use is quite huge and along with many problems I need to consider. 1) Many missing data, e.g. AVG_Income, we need to determine by which way to process those empty spots. 2) There might be duplicate data, should I remove the duplicate or not? 3) Some variable only contains one value. E.g. Test_F. 4) With large number of variables, shall we need to do dimensionality reduction for less computational cost? Thus, I ve done following pre-processing things before getting started: 1. Try different ways of replacing missing value and it turns out that replacing missing with 0 is the best way for my validation set AUC. However, I think for variable AVG_Income, Distance, we should use replace empty with mean or median instead of 0 is much more making sense. 2. I found there is no difference if removing duplicates or not because luckily we don t have duplicates in our dataset. Even if there are duplicates, I suppose real life patients obtain all those so it s applicable to use anyway. 3. The only variable I need to remove is Test_F. You can see my remove reason in 4.4 of my report. As for some variables that only contains very few options (missing value proportion > 0.5) of values or only few spot contains value, I found they are actually incredibly significant for the whole model. E.g. Nausia_Score. As for Test_F, it s reasonable to eliminate this column. 4. Essentially, PCA is a process of retaining most important information by project original data to a lower dimensionality. Also, PCA isn t a robust to outliers. Thus, PCA will probably lower the final AUC and I didn t use it. 2. Model Selection Among all those classifiers, I experimented with them all and sum up a table to illustrate why I choose two-class boosted decision tree over the rest choices.
2 Accuracy AUC Parameters Two-class Average Perception /10 Two-class Bayes Machine Two-class Decision forest /32/128/1 Two-class Decision Jungle /32/128/2048 Two-class Neural Network takes forever 100 hidden units & 100 iters Two-class Locally-Deep SVM /0.1/0.01/0.01/1 Two-class SVM /0.001 Two-class boosted decision tree /10/0.2/ /10/0.1/ /10/0.01/500~700 Multi-class Methods Unapplicable Two-class boosted decision tree is a kind of boosting ensemble methods that consist of weak learns(i.e. decision tree). Every classifier trains one after another and every classifier depends on the previous one. First train the base classifier on all the training data with equal importance weights on each case. Then re-weight the training data to emphasize the hard cases and train a second model. Finally, use a weighted union of all the models for the test data. Parameter setting: Maximum number of leaves per tree: 20 Minimum number of samples per leaf node: 10 Learning rate: 0.01 Number of trees constructed: 500 ~ 700
3 You can see from above that LR = 0.01 is the best one among all. For more accurate observation, I plotted this graph and it turns out that Iter = 500~800 are the peak before overfitting. That s what I m looking for the best parameters.
4 3. Effectiveness and Prediction Such model has been working well on our training dataset that accuracy achieved 97.7% and AUC eventually reached 98.7%, which is obviously higher than any other models. Thus, I used entire training data to train the model for prediction. I predicted the score and score probability for evaluation dataset in the attachment. 4. Problem Unresolved / Interesting Discovery 1) The meaning of some features Many columns like admission data is not clear about what this column to do. Thus, it s hard for me to decide whether should I keep this column or what kind of method should I use for missing data. 2) Recall vs. Precision In my experiment, my Recall and Precision are and respectively, which are relatively balanced. Since those data come from ED, the meaning of Recall is supposed to be how many emergency out there correctly come to the ED while the meaning of Precision is how many of my ED patients are truly need ED. In the meantime, these two elements are kind of tradeoff that we can t have them both. In this case, if the hospital has enough capacity for patients, I would be concern about Recall because this means we don t care receive more patients but only focus on true emergency. However, if the resource for this hospital is very intense, then I will suggest consider more Precision rather than Recall for we need to be accurate about the ED patients we selected. 3) Overfitting I tried several combinations of my parameters. In this case, since we have a lot of features therefore it s reasonable to find a big tree with many leaves. The larger maximum number of tree leaves, the more possibility that the tree might be divided into more detail pieces. As for Minimum number of samples per leaf node, this parameter cares about the generalization of a certain model. If this number is too small, there might be some noise in one node as a way of classification. Besides, an appropriate learning rate combining with proper iterations could gradually push the model to the best without overfitting. If the learning rate is too big, it s probable that the curve will never converge and bounce around instead. Thus, I prefer small learning rate with more iterations so that we can push to the
5 best fitting model gradually. Let s take LR=0.01 as an example, when I set iterations to (500,700), the AUC of the model will be but when the iteration comes to 1000, the AUC dropped to And for iterations equals to 10000, the AUC dropped to I believe the charts shown below could explain my basic idea of looking for the best model settings. Iter LR According to the chart, it s not hard to tell that LR=0.01 and iteration=500 is the best option. 4) Filter Based Features In Azure, we can use Filter Based Feature Selection to tell which columns are the useless ones and exclude some columns then. In order to make the overall and reasonable decision, I tried all the methods provided and made a chart like this: Fisher Score Chi-Square Hospital_Admisisons Test_E Gender Test_F Mutual Information Hospital_Admisisons Test_E Gender Test_F Spearman Correlation Kendall Correlation Pearson
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Car Insurance. Havránek, Pokorný, Tomášek
Car Insurance Havránek, Pokorný, Tomášek Outline Data overview Horizontal approach + Decision tree/forests Vertical (column) approach + Neural networks SVM Data overview Customers Viewed policies Bought
Microsoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql [email protected] http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Maschinelles Lernen mit MATLAB
Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical
Classification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
Model Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
Using multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
On the Effectiveness of Obfuscation Techniques in Online Social Networks
On the Effectiveness of Obfuscation Techniques in Online Social Networks Terence Chen 1,2, Roksana Boreli 1,2, Mohamed-Ali Kaafar 1,3, and Arik Friedman 1,2 1 NICTA, Australia 2 UNSW, Australia 3 INRIA,
Chapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski [email protected] Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
Predictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar
WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
Predicting Flight Delays
Predicting Flight Delays Dieterich Lawson [email protected] William Castillo [email protected] Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Predicting earning potential on Adult Dataset
MSc in Computing, Business Intelligence and Data Mining stream. Business Intelligence and Data Mining Applications Project Report. Predicting earning potential on Adult Dataset Submitted by: xxxxxxx Supervisor:
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Identifying SPAM with Predictive Models
Identifying SPAM with Predictive Models Dan Steinberg and Mikhaylo Golovnya Salford Systems 1 Introduction The ECML-PKDD 2006 Discovery Challenge posed a topical problem for predictive modelers: how to
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Making Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University [email protected] [email protected] I. Introduction III. Model The goal of our research
Predicting borrowers chance of defaulting on credit loans
Predicting borrowers chance of defaulting on credit loans Junjie Liang ([email protected]) Abstract Credit score prediction is of great interests to banks as the outcome of the prediction algorithm
INTRODUCING AZURE MACHINE LEARNING
David Chappell INTRODUCING AZURE MACHINE LEARNING A GUIDE FOR TECHNICAL PROFESSIONALS Sponsored by Microsoft Corporation Copyright 2015 Chappell & Associates Contents What is Machine Learning?... 3 The
How To Perform An Ensemble Analysis
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
Ensemble Learning Better Predictions Through Diversity. Todd Holloway ETech 2008
Ensemble Learning Better Predictions Through Diversity Todd Holloway ETech 2008 Outline Building a classifier (a tutorial example) Neighbor method Major ideas and challenges in classification Ensembles
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Distributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
Boosting. [email protected]
. Machine Learning Boosting Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg [email protected]
Alignment and Preprocessing for Data Analysis
Alignment and Preprocessing for Data Analysis Preprocessing tools for chromatography Basics of alignment GC FID (D) data and issues PCA F Ratios GC MS (D) data and issues PCA F Ratios PARAFAC Piecewise
Employer Health Insurance Premium Prediction Elliott Lui
Employer Health Insurance Premium Prediction Elliott Lui 1 Introduction The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than
Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected]
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected] Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Better credit models benefit us all
Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis
CS570 Data Mining Classification: Ensemble Methods
CS570 Data Mining Classification: Ensemble Methods Cengiz Günay Dept. Math & CS, Emory University Fall 2013 Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong Günay (Emory) Classification:
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Comparison of machine learning methods for intelligent tutoring systems
Comparison of machine learning methods for intelligent tutoring systems Wilhelmiina Hämäläinen 1 and Mikko Vinni 1 Department of Computer Science, University of Joensuu, P.O. Box 111, FI-80101 Joensuu
Interactive Data Mining and Visualization
Interactive Data Mining and Visualization Zhitao Qiu Abstract: Interactive analysis introduces dynamic changes in Visualization. On another hand, advanced visualization can provide different perspectives
Predicting Student Persistence Using Data Mining and Statistical Analysis Methods
Predicting Student Persistence Using Data Mining and Statistical Analysis Methods Koji Fujiwara Office of Institutional Research and Effectiveness Bemidji State University & Northwest Technical College
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Data Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
MTH 140 Statistics Videos
MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative
Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.
International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant
An Approach to Detect Spam Emails by Using Majority Voting
An Approach to Detect Spam Emails by Using Majority Voting Roohi Hussain Department of Computer Engineering, National University of Science and Technology, H-12 Islamabad, Pakistan Usman Qamar Faculty,
Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model
AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different
Statistics in Retail Finance. Chapter 7: Fraud Detection in Retail Credit
Statistics in Retail Finance Chapter 7: Fraud Detection in Retail Credit 1 Overview > Detection of fraud remains an important issue in retail credit. Methods similar to scorecard development may be employed,
Machine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
REVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation
Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation James K. Kimotho, Christoph Sondermann-Woelke, Tobias Meyer, and Walter Sextro Department
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Machine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-19-B &
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Ensemble Data Mining Methods
Ensemble Data Mining Methods Nikunj C. Oza, Ph.D., NASA Ames Research Center, USA INTRODUCTION Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods
Machine Learning & Predictive Analytics for IT Services
Dorothea Wiesmann, IBM Research Zurich Machine Learning & Predictive Analytics for IT Services 1 Broad Application of Analytics and Data Science Various projects in IBM Research Zurich applying statistical
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
A Logistic Regression Approach to Ad Click Prediction
A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi [email protected] Satakshi Rana [email protected] Aswin Rajkumar [email protected] Sai Kaushik Ponnekanti [email protected] Vinit Parakh
DATA ANALYSIS. QEM Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. Howard University
DATA ANALYSIS QEM Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. Howard University Quantitative Research What is Statistics? Statistics (as a subject) is the science
Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring
714 Evaluation of Feature election Methods for Predictive Modeling Using Neural Networks in Credits coring Raghavendra B. K. Dr. M.G.R. Educational and Research Institute, Chennai-95 Email: [email protected]
Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery
A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery Alex A. Freitas Postgraduate Program in Computer Science, Pontificia Universidade Catolica do Parana Rua Imaculada Conceicao,
Stock Market Forecasting Using Machine Learning Algorithms
Stock Market Forecasting Using Machine Learning Algorithms Shunrong Shen, Haomiao Jiang Department of Electrical Engineering Stanford University {conank,hjiang36}@stanford.edu Tongda Zhang Department of
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Random forest algorithm in big data environment
Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Supervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
Mining Life Insurance Data for Customer Attrition Analysis
Mining Life Insurance Data for Customer Attrition Analysis T. L. Oshini Goonetilleke Informatics Institute of Technology/Department of Computing, Colombo, Sri Lanka Email: [email protected] H. A. Caldera
ViviSight: A Sophisticated, Data-driven Business Intelligence Tool for Churn and Loan Default Prediction
ViviSight: A Sophisticated, Data-driven Business Intelligence Tool for Churn and Loan Default Prediction Barun Paudel 1, T.H. Gopaluwewa 1, M.R.De. Waas Gunawardena 1, W.C.H. Wijerathna 1, Rohan Samarasinghe
DATA MINING AND REPORTING IN HEALTHCARE
DATA MINING AND REPORTING IN HEALTHCARE Divya Gandhi 1, Pooja Asher 2, Harshada Chaudhari 3 1,2,3 Department of Information Technology, Sardar Patel Institute of Technology, Mumbai,(India) ABSTRACT The
Monday Morning Data Mining
Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
Course Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee
UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee 1. Introduction There are two main approaches for companies to promote their products / services: through mass
Pentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
Data Mining Using SAS Enterprise Miner Randall Matignon, Piedmont, CA
Data Mining Using SAS Enterprise Miner Randall Matignon, Piedmont, CA An Overview of SAS Enterprise Miner The following article is in regards to Enterprise Miner v.4.3 that is available in SAS v9.1.3.
Question 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
