Data mining techniques: decision trees
|
|
|
- Nora Blankenship
- 10 years ago
- Views:
Transcription
1 Data mining techniques: decision trees 1/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 2/39 1
2 Agenda Rule systems Building rule systems vs rule systems Quick reference 3/39 Set of conditions organized hierarchically in such a way that the final decision can be determined following the conditions that are fulfilled from the root of the tree to one of its leaves. They are easily understandable. They build a model (made up by rules) easy to understand for the user. They only work over a single table, and over a single attribute at a time. They are one of the most used data mining techniques. 4/39 2
3 The weather problem: 5/39 Decision tree that solves the problem: 6/39 3
4 The decision trees are based on the strategy "divide and conquer". There are two possible types of divisions or partitions: Nominal partitions: a nominal attribute may lead to a split with as many branches as values there are for the attribute. Numerical partitions: typically, they allow partitions like "X>" and "X <a". Partitions relating two different attributes are not permitted. What distinguishes the different algorithms from each others are the partitions they allow, and what criteria they use to select the partitions. 7/39 Expressive power: they can only make partitions parallel to the axis: 2D example 8/39 4
5 Agenda Rule systems Building rule systems vs rule systems Quick reference 9/39 The most common approach is to try to minimize the function like n is the number of partitions p j is the probability of falling into the node j p c j is the proportion of elements of the class c falling into the node j. f is a function of impurity, thus the formula calculates the average "impurity" of the nodes. 10/39 5
6 Some functions: Error GINI (CART) Entropy (C4.5) DKM The more entropy there is, the more information is needed to solve the problem. Note: log are taken in base 2. 11/39 The wheather problem: Which is the best decision? Outlook: 12/39 6
7 But how much increase in "purity" (information) would that choice produce? What was the "purity" at the root of the tree? Thus, we are gaining: 13/39 Thus, the info gain for every possible branch is: 14/39 7
8 Once the first decision attribute has been chosen, we repeat the operation with the other nodes: 15/39 Final result: With real examples, a certain level of "impurity" in each leaf node is usually tolerated. Trying to learn the training data perfectly, will likely lead to overfitting. 16/39 8
9 But 17/39 This attribute gives us a perfect (and useless) classification!!! Its info gain is bits, i.e., all the information needed to solve the problem : 18/39 9
10 The solution to this problem is to somehow penalize the attributes that lead to a very high number of branches. One option is to take into account the number and size of the children nodes, regardless of what classes they contain. For our ID attribute: This is bits. We shall call this value information value of the attribute. 19/39 Instead of using "gain" to determine which attribute to use for the next branch, we shall use "gain ratio" which we shall define as the division between gain and the information value of the attribute. For our ID attribute: gain ratio (ID)= 0.940/3.807= For the initial branch: 20/39 10
11 The calculation method presented here is the one used in WEKA for the algorithm C4.5. It is one of the best algorithms, both from a theoretical point of view and in practice. A commercial variant (C5) implemented in Clementine has a slightly better performance. 21/39 Often, too much adjusting to the training data is counterproductive (overtraining). One solution is to remove some branches of the tree ( "pruning"). Two pruning strategies: Pre-pruning: the process is done during the construction of the tree. There is some criteria to stop expanding the nodes (allowing a certain level of "impurity" in each node). Post-pruning: the process is done after the construction of the tree. Branches are removed from the bottom up to a certain limit. It uses similar criteria to pre-pruning. 22/39 11
12 The key to knowing when and how much to prune is to check the behavior of the tree with the test data. With the training data the behavior of the tree will only improve. Pruning is the key to dealing with noisy data. 23/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 24/39 12
13 Rule systems In rule systems the rules are not organized hierarchically, nor must they provide a classification for all the examples. There may be instances that are not covered by the rule set. Instead of building the rules by "divide and conquer" they use an approach based on coverage. They try to create rules that "cover" all instances of the training data set. 25/39 Rule systems Example: 26/39 13
14 Rule systems Example: 27/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 28/39 14
15 Building rule systems Another example: we have to determine the recommended type of contact lens. Some of the data: 29/39 Building rule systems All the data: Column 2, age of the patient: (1) young, (2) pre-presbyopic, (3) presbyopic. Column 3, spectacle prescription: (1) myope, (2) hypermetrope. Column 4, astigmatic: (1) no, (2) yes Column 5, tear production rate: (1) reduced, (2) normal. Column 6, contact lenses: (1) hard, (2) soft, (3) no contact lenses. 30/39 15
16 Building rule systems We start by looking for conditions that will enable us to determine whether the lenses should be hard: To do this, we calculate the number of instances for which, given the value of an attribute, we can adequately predict that the contact lenses should be hard. We then select the attribute that covers a higher percentage of instances that meet the desired criteria (hard contact lenses recommended). 31/39 Building rule systems Selection process: 32/39 16
17 Building rule systems We now try to properly classify the other 8 instances covered by the rule: 33/39 Building rule systems And we try to properly classify the other 2 instances covered by the rule: We now have a rule that classifies perfectly the three instances it covers. Now we start the process again, and so on until we have rules that cover all (or most) instances. 34/39 17
18 Agenda Rule systems Building rule systems vs rule systems Quick reference 35/39 vs rule systems The decision trees are often more efficient because each partition generates two rules. Pruning of trees is often simpler than the pruning of rules systems (although the same criteria are used in both cases). Rule systems permit non-exhaustive coverage, which is interesting if an attribute has many values, but not all are relevant. Rule systems are less voracious, so they tend to fall less in local minima. 36/39 18
19 Agenda Rule systems Building rule systems vs rule systems Quick reference 37/39 Quick reference They are used mainly for classification (supervised learning), although it is possible to use them for regression, clustering and estimation of probabilities. They are relatively efficient and capable of working with large volumes of data. They work with numeric and nominal attributes. They only work over a single table and a single attribute at a time. They do not allow relations between two attributes (for efficiency). They are tolerant to noise, not significant attributes and missing values. They build a model (rules) that is easy to understand and interpret by the end user. 38/39 19
20 Quick reference Visual comparison of their expressive power: Rule systems 39/39 20
Decision Trees. JERZY STEFANOWSKI Institute of Computing Science Poznań University of Technology. Doctoral School, Catania-Troina, April, 2008
Decision Trees JERZY STEFANOWSKI Institute of Computing Science Poznań University of Technology Doctoral School, Catania-Troina, April, 2008 Aims of this module The decision tree representation. The basic
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
Decision-Tree Learning
Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values
Data Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
Data Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
Machine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART I What s it all about 10/25/2000 2 Data vs. information Society produces huge amounts of data
Classification and Prediction
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
Lecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
Data Mining with R. Decision Trees and Random Forests. Hugh Murrell
Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge
Content-Based Recommendation
Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches
Performance Analysis of Decision Trees
Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Data Mining with Weka
Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Data Mining with Weka a practical course on how to
Decision Trees. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University. www.cs.cmu.edu/~awm [email protected].
Decision Trees Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm [email protected] 42-268-7599 Copyright Andrew W. Moore Slide Decision Trees Decision trees
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski [email protected] Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training
COC131 Data Mining - Clustering
COC131 Data Mining - Clustering Martin D. Sykora [email protected] Tutorial 05, Friday 20th March 2009 1. Fire up Weka (Waikako Environment for Knowledge Analysis) software, launch the explorer window
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
Université de Montpellier 2 Hugo Alatrista-Salas : [email protected]
Université de Montpellier 2 Hugo Alatrista-Salas : [email protected] WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.
Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups
Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups Achim Zeileis, Torsten Hothorn, Kurt Hornik http://eeecon.uibk.ac.at/~zeileis/ Overview Motivation: Trees, leaves, and
Data Mining: Foundation, Techniques and Applications
Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
Introduction to Learning & Decision Trees
Artificial Intelligence: Representation and Problem Solving 5-38 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning? - more than just memorizing
In this tutorial, we try to build a roc curve from a logistic regression.
Subject In this tutorial, we try to build a roc curve from a logistic regression. Regardless the software we used, even for commercial software, we have to prepare the following steps when we want build
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data
A Study of Detecting Credit Card Delinquencies with Data Mining using Decision Tree Model
A Study of Detecting Credit Card Delinquencies with Data Mining using Decision Tree Model ABSTRACT Mrs. Arpana Bharani* Mrs. Mohini Rao** Consumer credit is one of the necessary processes but lending bears
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
THE LAST THING A FISH NOTICES IS THE WATER IN WHICH IT SWIMS COMPETITIVE MARKET ANALYSIS: AN EXAMPLE FOR MOTOR INSURANCE PRICING RISK
THE LAST THING A FISH NOTICES IS THE WATER IN WHICH IT SWIMS COMPETITIVE MARKET ANALYSIS: AN EXAMPLE FOR MOTOR INSURANCE Topic: PRICING RISK Authors: Santoni, Alessandro Towers Perrin Via Boezio, 6 00193
Classification On The Clouds Using MapReduce
Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal [email protected] Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal [email protected]
Data Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
A Data Mining Tutorial
A Data Mining Tutorial Presented at the Second IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 98) 14 December 1998 Graham Williams, Markus Hegland and Stephen
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
Interactive Data Mining and Design of Experiments: the JMP Partition and Custom Design Platforms
: the JMP Partition and Custom Design Platforms Marie Gaudard, Ph. D., Philip Ramsey, Ph. D., Mia Stephens, MS North Haven Group March 2006 Table of Contents Abstract... 1 1. Data Mining... 1 1.1. What
Data Mining Tools. Jean- Gabriel Ganascia LIP6 University Pierre et Marie Curie 4, place Jussieu, 75252 Paris, Cedex 05 Jean- Gabriel.Ganascia@lip6.
Data Mining Tools Jean- Gabriel Ganascia LIP6 University Pierre et Marie Curie 4, place Jussieu, 75252 Paris, Cedex 05 Jean- [email protected] DATA BASES Data mining Extraction Data mining Interpretation/
Using Trace Clustering for Configurable Process Discovery Explained by Event Log Data
Master of Business Information Systems, Department of Mathematics and Computer Science Using Trace Clustering for Configurable Process Discovery Explained by Event Log Data Master Thesis Author: ing. Y.P.J.M.
EFFICIENT CLASSIFICATION OF BIG DATA USING VFDT (VERY FAST DECISION TREE)
EFFICIENT CLASSIFICATION OF BIG DATA USING VFDT (VERY FAST DECISION TREE) Sourav Roy 1, Brina Patel 2, Samruddhi Purandare 3, Minal Kucheria 4 1 Student, Computer Department, MIT College of Engineering,
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
Professor Anita Wasilewska. Classification Lecture Notes
Professor Anita Wasilewska Classification Lecture Notes Classification (Data Mining Book Chapters 5 and 7) PART ONE: Supervised learning and Classification Data format: training and test data Concept,
SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING
I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 233-237 SURVEY OF TEXT CLASSIFICATION ALGORITHMS FOR SPAM FILTERING K. SARULADHA 1 AND L. SASIREKA 2 1 Assistant Professor, Department of Computer Science and
BIRCH: An Efficient Data Clustering Method For Very Large Databases
BIRCH: An Efficient Data Clustering Method For Very Large Databases Tian Zhang, Raghu Ramakrishnan, Miron Livny CPSC 504 Presenter: Discussion Leader: Sophia (Xueyao) Liang HelenJr, Birches. Online Image.
Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA
Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
Data Mining on Streams
Data Mining on Streams Using Decision Trees CS 536: Machine Learning Instructor: Michael Littman TA: Yihua Wu Outline Introduction to data streams Overview of traditional DT learning ALG DT learning ALGs
CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Weather forecast prediction: a Data Mining application
Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,[email protected],8407974457 Abstract
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Clustering UE 141 Spring 2013
Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or
Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies
Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Spam
Implementation of Data Mining Techniques for Weather Report Guidance for Ships Using Global Positioning System
International Journal Of Computational Engineering Research (ijceronline.com) Vol. 3 Issue. 3 Implementation of Data Mining Techniques for Weather Report Guidance for Ships Using Global Positioning System
The KDD Process: Applying Data Mining
The KDD Process: Applying Nuno Cavalheiro Marques ([email protected]) Spring Semester 2010/2011 MSc in Computer Science Outline I 1 Knowledge Discovery in Data beyond the Computer 2 by Visualization Lift
Specific Usage of Visual Data Analysis Techniques
Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
Business Analytics and Credit Scoring
Study Unit 5 Business Analytics and Credit Scoring ANL 309 Business Analytics Applications Introduction Process of credit scoring The role of business analytics in credit scoring Methods of logistic regression
Day 2: Machine Learning & Data Mining Basics. Beibei Li Carnegie Mellon University
Day 2: Machine Learning & Data Mining Basics Beibei Li Carnegie Mellon University 1 Getting a Job? not what you know, but who you know What exactly matters? Size? Quality (strength of tie)? How strong/weak
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
2 Decision tree + Cross-validation with R (package rpart)
1 Subject Using cross-validation for the performance evaluation of decision trees with R, KNIME and RAPIDMINER. This paper takes one of our old study on the implementation of cross-validation for assessing
International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors
Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann
A Decision Tree for Weather Prediction
BULETINUL UniversităŃii Petrol Gaze din Ploieşti Vol. LXI No. 1/2009 77-82 Seria Matematică - Informatică - Fizică A Decision Tree for Weather Prediction Elia Georgiana Petre Universitatea Petrol-Gaze
from Larson Text By Susan Miertschin
Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.
D A T A M I N I N G C L A S S I F I C A T I O N
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
Optimization of C4.5 Decision Tree Algorithm for Data Mining Application
Optimization of C4.5 Decision Tree Algorithm for Data Mining Application Gaurav L. Agrawal 1, Prof. Hitesh Gupta 2 1 PG Student, Department of CSE, PCST, Bhopal, India 2 Head of Department CSE, PCST, Bhopal,
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
