# Classification and Prediction

## Transcription

### 1. Classification and Prediction

Slides for *Data Mining: Concepts and Techniques*, Chapter 7

Jiawei Han and Micheline Kamber, Intelligent Database Systems Research Lab, School of Computing Science, Simon Fraser University, Canada

Han: KDD --- Classification

### 2. Classification: A Two-Step Process

- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model
    - The known label of each test sample is compared with the classified result from the model
    - The accuracy rate is the percentage of test set samples that are correctly classified by the model
    - The test set must be independent of the training set; otherwise the accuracy estimate is overly optimistic and over-fitting goes undetected

### 3. Classification Process (1): Model Construction

Training data, fed to the classification algorithm:

| NAME | RANK | YEARS | TENURED |
|------|----------------|-------|---------|
| Mike | Assistant Prof | 3 | no |
| Mary | Assistant Prof | 7 | yes |
| Bill | Professor | 2 | yes |
| Jim | Associate Prof | 7 | yes |
| Dave | Assistant Prof | 6 | no |
| Anne | Associate Prof | 3 | no |

Classifier (model) produced by the algorithm:

IF rank = professor OR years > 6 THEN tenured = yes

### 4. Classification Process (2): Use the Model in Prediction

Testing data, passed to the classifier:

| NAME | RANK | YEARS | TENURED |
|---------|----------------|-------|---------|
| Tom | Assistant Prof | 2 | no |
| Merlisa | Associate Prof | 7 | no |
| George | Professor | 5 | yes |
| Joseph | Assistant Prof | 7 | yes |

Unseen data: (Jeff, Professor, 4). Tenured?
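The two steps can be sketched in Python: the rule learned in the model-construction slide is applied to the four test tuples, and accuracy is the fraction classified correctly. The names `predict_tenured` and `test_data` are illustrative assumptions, not part of the slides.

```python
# Rule from the slides: IF rank = professor OR years > 6 THEN tenured = yes
def predict_tenured(rank: str, years: int) -> str:
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, known label)
test_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Compare the known label of each test sample with the model's result
correct = sum(
    predict_tenured(rank, years) == label for _, rank, years, label in test_data
)
accuracy = correct / len(test_data)
print(f"accuracy = {accuracy:.2f}")  # 3 of 4 correct; Merlisa is misclassified

# Unseen tuple (Jeff, Professor, 4)
print(predict_tenured("Professor", 4))  # yes
```

On this tiny test set the rule reaches 75% accuracy, even though it fits all six training tuples.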

### 5. Supervised vs. Unsupervised Learning

- Supervised learning (e.g. classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

### 6. Evaluating Classification Methods

- Predictive accuracy
- Speed and scalability
  - time to construct the model
  - time to use the model
- Robustness
  - handling noise and missing values
- Scalability
  - efficiency in disk-resident databases
- Interpretability
  - understanding and insight provided by the model
- Goodness of rules
  - decision tree size
  - compactness of classification rules

### 7. Classification by Decision Tree Induction

- Decision tree
  - A flow-chart-like tree structure
  - An internal node denotes a test on an attribute
  - A branch represents an outcome of the test
  - Leaf nodes represent class labels or class distributions
- Decision tree generation consists of two phases
  - Tree construction
    - At the start, all the training examples are at the root
    - Partition the examples recursively based on selected attributes
  - Tree pruning
    - Identify and remove branches that reflect noise or outliers
- Use of a decision tree: classifying an unknown sample
  - Test the attribute values of the sample against the decision tree

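### 8. Training Dataset

This follows an example from Quinlan's ID3.

| age | income | student | credit_rating | buys_computer |
|--------|--------|---------|---------------|---------------|
| <=30 | high | no | fair | no |
| <=30 | high | no | excellent | no |
| 31..40 | high | no | fair | yes |
| >40 | medium | no | fair | yes |
| >40 | low | yes | fair | yes |
| >40 | low | yes | excellent | no |
| 31..40 | low | yes | excellent | yes |
| <=30 | medium | no | fair | no |
| <=30 | low | yes | fair | yes |
| >40 | medium | yes | fair | yes |
| <=30 | medium | yes | excellent | yes |
| 31..40 | medium | no | excellent | yes |
| 31..40 | high | yes | fair | yes |
| >40 | medium | no | excellent | no |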

### 9. Output: A Decision Tree for buys_computer

- age?
  - <=30: student?
    - no: no
    - yes: yes
  - 31..40: yes
  - >40: credit rating?
    - excellent: no
    - fair: yes
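Classifying a sample means testing its attribute values against the tree, one node at a time. A minimal sketch follows, assuming the tree is encoded as nested dictionaries. The encoding, the `classify` helper, and the `31..40` branch label are assumptions made for illustration.

```python
# Tree encoded as nested dicts: {attribute: {value: subtree_or_label}}.
# The "31..40" branch label is an assumption based on the age ranges in the data.
tree = {
    "age": {
        "<=30": {"student": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}},
    }
}

def classify(node, sample):
    """Follow the branch matching the sample's value until a leaf label is reached."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[sample[attribute]]
    return node

sample = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(classify(tree, sample))  # yes
```

Only the attributes on the path from the root to a leaf are inspected; `income` is never tested by this tree.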

### 10. Algorithm for Decision Tree Induction

- Basic algorithm (a greedy algorithm)
  - The tree is constructed in a top-down recursive divide-and-conquer manner
  - At the start, all the training examples are at the root
  - Attributes are categorical (if continuous-valued, they are discretized in advance)
  - Examples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning; in this case majority voting is employed for classifying the leaf
  - There are no samples left
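The greedy procedure can be sketched in Python on the buys_computer training data. This is an illustrative ID3-style sketch, not the authors' code: the helper names are hypothetical, and the `31..40` age value is an assumption for the rows whose age is missing in the transcription.

```python
import math
from collections import Counter

ATTRS = ["age", "income", "student", "credit_rating"]
# Training rows: four attribute values followed by the class label.
# "31..40" is an assumed value for the middle age range.
ROWS = [r.split() for r in """\
<=30 high no fair no
<=30 high no excellent no
31..40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31..40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31..40 medium no excellent yes
31..40 high yes fair yes
>40 medium no excellent no""".splitlines()]

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def info_gain(rows, i):
    """Entropy before the split minus the weighted entropy of the partitions."""
    sizes = Counter(r[i] for r in rows)
    remainder = sum(
        n / len(rows) * entropy([r for r in rows if r[i] == v])
        for v, n in sizes.items()
    )
    return entropy(rows) - remainder

def build(rows, attrs):
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: all samples in one class, or no attributes left (majority vote)
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda i: info_gain(rows, i))
    rest = [i for i in attrs if i != best]
    # Branches are created only for values present in rows, so the
    # "no samples left" case cannot arise in this sketch.
    return {ATTRS[best]: {
        v: build([r for r in rows if r[best] == v], rest)
        for v in sorted({r[best] for r in rows})
    }}

tree = build(ROWS, list(range(len(ATTRS))))
print(tree)
```

Under these assumptions the root test is `age`, with `student` and `credit_rating` tested one level below.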

### 11. Attribute Selection Measure

- Information gain (ID3/C4.5)
  - All attributes are assumed to be categorical
  - Can be modified for continuous-valued attributes
- Gini index (IBM IntelligentMiner)
  - All attributes are assumed to be continuous-valued
  - Assume there exist several possible split values for each attribute
  - May need other tools, such as clustering, to get the possible split values
  - Can be modified for categorical attributes
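The slide names information gain without defining it. For reference, the standard ID3 definition is given here, using $T$ for the data set, $p_j$ for the relative frequency of class $j$, and $T_v$ for the subset of $T$ in which attribute $A$ takes value $v$. This notation is an assumption chosen to match the gini slide.

$$I(T) = -\sum_{j=1}^{n} p_j \log_2 p_j, \qquad Gain(A) = I(T) - \sum_{v} \frac{|T_v|}{|T|}\, I(T_v)$$

As a worked example on the buys_computer training data (9 yes, 5 no), splitting on student gives 6 yes and 1 no for student = yes, and 3 yes and 4 no for student = no:

$$I(T) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940$$

$$Gain(student) \approx 0.940 - \left(\tfrac{7}{14}\cdot 0.592 + \tfrac{7}{14}\cdot 0.985\right) \approx 0.152$$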

### 12. Gini Index (IBM IntelligentMiner)

If a data set $T$ contains examples from $n$ classes, the gini index $gini(T)$ is defined as

$$gini(T) = 1 - \sum_{j=1}^{n} p_j^2$$

where $p_j$ is the relative frequency of class $j$ in $T$.

If $T$ is split into two subsets $T_1$ and $T_2$ with sizes $N_1$ and $N_2$ respectively, the gini index of the split data is defined as

$$gini_{split}(T) = \frac{N_1}{N}\, gini(T_1) + \frac{N_2}{N}\, gini(T_2)$$

where $N = N_1 + N_2$ is the size of $T$.

The attribute that provides the smallest $gini_{split}(T)$ is chosen to split the node (this requires enumerating all possible splitting points for each attribute).
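Both formulas are short enough to compute directly. The sketch below evaluates $gini_{split}$ for every candidate threshold on the YEARS attribute of the tenure training data. Taking midpoints between consecutive distinct values as the candidate split points is an assumption made here; the slides leave the choice of candidates open.

```python
from collections import Counter

def gini(labels):
    """gini(T) = 1 - sum_j p_j^2, with p_j the relative frequency of class j."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Try each midpoint between distinct values; return (threshold, gini_split)."""
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Size-weighted average of the two subsets' gini indices
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# YEARS and TENURED columns from the tenure training data
years = [3, 7, 2, 7, 6, 3]
tenured = ["no", "yes", "yes", "yes", "no", "no"]
threshold, score = best_binary_split(years, tenured)
print(threshold, round(score, 3))  # 6.5 0.25
```

The best threshold, 6.5, corresponds to the `years > 6` test in the tenure rule.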

### 13. Avoid Overfitting in Classification

- The generated tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - The result is poor accuracy for unseen samples
- Two approaches to avoid overfitting
  - Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
    - It is difficult to choose an appropriate threshold
  - Postpruning: remove branches from a fully grown tree to get a sequence of progressively pruned trees
    - Use a set of data different from the training data to decide which is the best pruned tree
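The last step of postpruning, choosing among progressively pruned trees with held-out data, can be sketched as follows. The three candidates are hand-written stand-ins for a pruning sequence (hypothetical, not produced by a pruning algorithm), and the held-out set reuses the four tenure test tuples from the earlier slide.

```python
# Hypothetical sequence of progressively pruned models, from full to root-only
candidates = [
    ("full", lambda rank, years: "yes" if rank == "Professor" or years > 6 else "no"),
    ("pruned once", lambda rank, years: "yes" if years > 6 else "no"),
    ("root only", lambda rank, years: "no"),
]

# Held-out data, disjoint from the training set: (rank, years, known label)
held_out = [
    ("Assistant Prof", 2, "no"),
    ("Associate Prof", 7, "no"),
    ("Professor", 5, "yes"),
    ("Assistant Prof", 7, "yes"),
]

def accuracy(predict, data):
    return sum(predict(rank, years) == label for rank, years, label in data) / len(data)

# Score every candidate on the held-out data and keep the most accurate one
scores = {name: accuracy(predict, held_out) for name, predict in candidates}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

With only four held-out tuples the comparison is very coarse; in practice the held-out set would need to be much larger for the choice to be reliable.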

### 14. Classification in Large Databases

- Classification: a classical problem extensively studied by statisticians and machine learning researchers
- Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
- Why decision tree induction in data mining?
  - relatively faster learning speed (than other classification methods)
  - convertible to simple and easy-to-understand classification rules
  - can use SQL queries for accessing databases
  - comparable classification accuracy with other methods

### 15. Scalable Decision Tree Induction Methods in Data Mining Studies

- SLIQ (EDBT'96, Mehta et al.)
  - builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB'96, J. Shafer et al.)
  - constructs an attribute list data structure
- PUBLIC (VLDB'98, Rastogi & Shim)
  - integrates tree splitting and tree pruning: stop growing the tree earlier
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti)
  - separates the scalability aspects from the criteria that determine the quality of the tree
  - builds an AVC-list (attribute, value, class label)
