# Classification Techniques (1)

## Transcription

Data Mining Lecture 3: Classification 1

### Overview

Today:

- Classification Problem
- Classification based on Regression
- Distance-based Classification (KNN)

Next lecture:

- Decision Trees
- Classification using Rules
- Quality of Classifiers

### Classification Problem

- Given a database $D = \{t_1, t_2, \ldots, t_n\}$ and a finite set of classes $C = \{C_1, \ldots, C_m\}$, the Classification Problem is to define a mapping $f: D \to C$ where each $t_i$ is assigned to one class.
- Actually, $f$ divides $D$ into equivalence classes.
- Prediction is a similar process, but may be viewed as having an infinite number of classes.

### Classification: Definition

- Given a collection of records (the training set), each record contains a set of attributes, and one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
- A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

### Illustrating Classification Task

Training set:

| Tid | Attrib1 | Attrib2 | Attrib3 | Class |
|-----|---------|---------|---------|-------|
| 1   | Yes     | Large   | 125K    | No    |
| 2   | No      | Medium  | 100K    | No    |
| 3   | No      | Small   | 70K     | No    |
| 4   | Yes     | Medium  | 120K    | No    |
| 5   | No      | Large   | 95K     | Yes   |
| 6   | No      | Medium  | 60K     | No    |
| 7   | Yes     | Large   | 220K    | No    |
| 8   | No      | Small   | 85K     | Yes   |
| 9   | No      | Medium  | 75K     | No    |
| 10  | No      | Small   | 90K     | Yes   |

Test set:

| Tid | Attrib1 | Attrib2 | Attrib3 | Class |
|-----|---------|---------|---------|-------|
| 11  | No      | Small   | 55K     | ?     |
| 12  | Yes     | Medium  | 80K     | ?     |
| 13  | Yes     | Large   | 110K    | ?     |
| 14  | No      | Small   | 95K     | ?     |
| 15  | No      | Large   | 67K     | ?     |

Figure: the training set is fed to a learning algorithm, which learns a model (induction); the model is then applied to the test set (deduction).

### Examples of Classification Tasks

- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.
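To make the training/test idea concrete, here is a minimal sketch that splits the ten labeled records from the table above into a training part and a held-out part and measures accuracy. The majority-class "model", the fixed split of the last three records, and all function names are assumptions chosen for illustration; they are not part of the lecture.

```python
from collections import Counter

# Labeled records from the training table:
# (Attrib1, Attrib2, Attrib3 in thousands, Class).
RECORDS = [
    ("Yes", "Large", 125, "No"),
    ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"),
    ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"),
    ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"),
    ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"),
    ("No", "Small", 90, "Yes"),
]


def holdout_split(records, n_test):
    """Hold out the last n_test records for testing (a fixed, non-random split)."""
    return records[:-n_test], records[-n_test:]


def learn_majority_class(training):
    """Toy model (an assumption): always predict the most frequent training class."""
    return Counter(record[-1] for record in training).most_common(1)[0][0]


def accuracy(predict, test):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(1 for record in test if predict(record[:-1]) == record[-1])
    return correct / len(test)


train, test = holdout_split(RECORDS, n_test=3)
majority = learn_majority_class(train)
acc = accuracy(lambda attributes: majority, test)
print(majority, acc)
```

The baseline ignores the attributes entirely, so its accuracy on the held-out records is only a floor that a real classifier should beat.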

### More Classification Examples

- Classify students' grades as VG, G, or U.
- Identify mushrooms as poisonous or edible.
- Classify stocks as buy, keep, or sell.
- Identify individuals with credit risks.
- Perform speech recognition.
- Perform pattern recognition.

### Classification Example: Grading Exams

- If $x \geq 28$ then grade = VG
- If $18 \leq x < 28$ then grade = G
- If $x < 18$ then grade = U

Figure: the same rules drawn as a decision tree. The root tests $x \geq 28$ (VG) against $x < 28$; the $x < 28$ branch then tests $x \geq 18$ (G) against $x < 18$ (U).

### Classification Example: Letter Recognition

View letters as constructed from 5 components.

Figure: Letters A, B, C, D, E, and F, each built from these components.

### Classification Techniques

- Approach:
  - Create a specific model by evaluating training data (or by using knowledge from domain experts).
  - Apply the model developed to new data.
- Classes must be predefined.
- Most common techniques use decision trees (DTs), neural networks (NNs), or are based on distances or statistical methods.

### Defining Classes

Figure: two plots of Class A, Class B, and Class C, one labeled "Partitioning Based" and one labeled "Distance Based".

### Issues in Classification

- Missing data
  - Ignore
  - Replace with assumed value
- Handling of outliers in training data
- Measuring performance
  - Classification accuracy on test data
  - Confusion matrix
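The three grading rules from the exam example above translate directly into a small function. This is only a sketch: the function name `grade` and the argument name `x` (the exam score) are assumptions for illustration.

```python
def grade(x):
    """Map an exam score x to a grade using the three rules from the slide."""
    if x >= 28:
        return "VG"
    if x >= 18:  # reached only when x < 28, so this covers 18 <= x < 28
        return "G"
    return "U"   # x < 18


for score in (30, 28, 27, 18, 17):
    print(score, grade(score))
```

Testing the thresholds from the top down mirrors the decision tree: each later test only sees scores that failed the earlier one.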

### Handling of Outliers in Training Data

Figure: two panels, "With Outliers" and "Without Outliers".

### Height Example Data

| Name      | Gender | Height (m) | Output1 | Output2 |
|-----------|--------|------------|---------|---------|
| Kristina  | F      | 1.60       | Short   | Medium  |
| Jim       | M      | 2.02       | Tall    | Medium  |
| Maggie    | F      | 1.90       | Medium  | Tall    |
| Martha    | F      | 1.88       | Medium  | Tall    |
| Stephanie | F      | 1.71       | Short   | Medium  |
| Bob       | M      | 1.85       | Medium  | Medium  |
| Kathy     | F      | 1.60       | Short   | Medium  |
| Dave      | M      | 1.72       | Short   | Medium  |
| Worth     | M      | 2.12       | Tall    | Tall    |
| Steven    | M      | 2.10       | Tall    | Tall    |
| Debbie    | F      | 1.78       | Medium  | Medium  |
| Todd      | M      | 1.95       | Medium  | Medium  |
| Kim       | F      | 1.89       | Medium  | Tall    |
| Amy       | F      | 1.81       | Medium  | Medium  |
| Wynette   | F      | 1.75       | Medium  | Medium  |

### Classification Performance

Figure: four panels for the class Tall, labeled True Positive, False Negative, False Positive, and True Negative.

### Confusion Matrix Example

Using the height data example, with Output1 as the correct class and Output2 as the actual assignment.

Table: rows give the actual membership (Short, Medium, Tall) and columns give the assignment (Short, Medium, Tall).

### Regression

- Assume the data fits a predefined function.
- Determine the best values for the regression coefficients $c_0, c_1, \ldots, c_n$.
- Assume an error: $y = c_0 + c_1 x_1 + \cdots + c_n x_n + \varepsilon$
- Estimate the error using the mean squared error for the training set.

### Linear Regression Poor Fit

Figure: example of a linear regression that fits the data poorly.
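Returning to the confusion matrix example above, the counts can be recomputed from the height data by treating Output1 as the correct class and Output2 as the assigned class, as the slide states. The sketch below does this in plain Python; the function and variable names are assumptions for illustration.

```python
from collections import Counter

# (Name, Output1 = correct class, Output2 = assigned class) from the height data.
HEIGHT_DATA = [
    ("Kristina", "Short", "Medium"),
    ("Jim", "Tall", "Medium"),
    ("Maggie", "Medium", "Tall"),
    ("Martha", "Medium", "Tall"),
    ("Stephanie", "Short", "Medium"),
    ("Bob", "Medium", "Medium"),
    ("Kathy", "Short", "Medium"),
    ("Dave", "Short", "Medium"),
    ("Worth", "Tall", "Tall"),
    ("Steven", "Tall", "Tall"),
    ("Debbie", "Medium", "Medium"),
    ("Todd", "Medium", "Medium"),
    ("Kim", "Medium", "Tall"),
    ("Amy", "Medium", "Medium"),
    ("Wynette", "Medium", "Medium"),
]
LABELS = ["Short", "Medium", "Tall"]


def confusion_matrix(pairs, labels):
    """Rows are the correct class, columns are the assigned class."""
    counts = Counter(pairs)
    return [[counts[(actual, assigned)] for assigned in labels] for actual in labels]


matrix = confusion_matrix([(o1, o2) for _, o1, o2 in HEIGHT_DATA], LABELS)
# Correctly classified records lie on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(LABELS))) / len(HEIGHT_DATA)

for label, row in zip(LABELS, matrix):
    print(f"{label:>6}: {row}")
print(f"accuracy = {accuracy:.3f}")
```

Off-diagonal entries show which classes are confused with each other; here every record whose correct class is Short ends up assigned to Medium.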

### Classification Using Regression

- Division: Use the regression function to divide the area into regions.
- Prediction: Use the regression function to predict a class membership function. The input includes the desired class.

Figures: one plot titled "Division" and one titled "Prediction".

### Instance Based Classifiers

Examples:

- Rote-learner: memorizes the entire training data and performs classification only if the attributes of a record match one of the training examples exactly.
- Nearest neighbor: uses the k closest points (nearest neighbors) for performing classification.

### Nearest Neighbor Classifiers

Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck.

Figure: for a test record, compute the distance to the training records and choose k of the nearest records.

### Classification Using Distance Measures

- Place items in the class to which they are closest.
- Must determine the distance between an item and a class.
- Classes are represented by:
  - Centroid: central value.
  - Medoid: representative point.
  - A set of individual points.
- Algorithm: K-Nearest Neighbors (KNN)
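As a small illustration of representing each class by its centroid, the sketch below assigns an item to the class whose centroid is closest in Euclidean distance. The two-dimensional points, the class labels, and the function names are hypothetical and chosen only for the example.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest_centroid(item, classes):
    """Return the label of the class whose centroid is closest to item."""
    centroids = {label: centroid(points) for label, points in classes.items()}
    return min(centroids, key=lambda label: math.dist(item, centroids[label]))


# Hypothetical training points for two classes.
classes = {
    "A": [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)],
    "B": [(6.0, 6.0), (7.0, 6.0), (6.0, 7.0)],
}

print(nearest_centroid((2.0, 2.0), classes))
print(nearest_centroid((6.0, 5.0), classes))
```

Using a medoid instead would mean picking an actual training point of each class as its representative rather than the computed mean.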

### Nearest-Neighbor Classifiers

Requires three things:

- The set of stored records
- A distance metric to compute the distance between records
- The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:

- Compute the distance to the other training records
- Identify the k nearest neighbors
- Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)

### Definition of Nearest Neighbor

Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x.

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

### Nearest Neighbor Voronoi Diagram

Figure: Voronoi diagram for the nearest neighbor classifier.

### Nearest Neighbor Classification

- Compute the distance between two points, e.g. the Euclidean distance: $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
- Determine the class from the nearest neighbor list:
  - Take the majority vote of class labels among the k nearest neighbors.
  - Weigh the vote according to distance, with weight factor $w = 1/d^2$.

### K Nearest Neighbors (KNN)

- The training set includes classes.
- Examine the K items nearest to the item to be classified.
- The new item is placed in the class with the largest number of close items.
- O(q) for each tuple to be classified. (Here q is the size of the training set.)

### KNN Algorithm

    Input:
      D   // training data
      K   // number of neighbors
      t   // tuple to classify
    Output:
      c   // class to which t gets classified

    N = ∅;
    for each d ∈ D do
      if |N| < K then
        N = N ∪ {d};
      else
        u = the item in N with max distance (dissimilarity) from t;
        if sim(t, u) < sim(t, d) then
          N = (N - {u}) ∪ {d};
    c = the class to which most n ∈ N are classified;
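The KNN procedure above can be sketched in Python as follows, using Euclidean distance and either a plain majority vote or the $1/d^2$ distance weighting. This is an illustrative sketch, not the lecture's implementation: it sorts the whole training set for simplicity instead of keeping a running neighbor set, and the training points, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def knn_classify(training, t, k, weighted=False):
    """Classify tuple t from training, a list of (point, label) pairs.

    With weighted=False the k nearest neighbors take a majority vote;
    with weighted=True each neighbor's vote counts 1 / d**2.
    """
    neighbors = sorted(training, key=lambda rec: math.dist(rec[0], t))[:k]
    if not weighted:
        return Counter(label for _, label in neighbors).most_common(1)[0][0]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = math.dist(point, t)
        # An exact match (d == 0) gets an infinite weight.
        votes[label] += 1.0 / (d * d) if d > 0 else float("inf")
    return max(votes, key=votes.get)


# Hypothetical training data: one close "A" point and two farther "B" points.
training = [((0.0, 0.0), "A"), ((4.0, 0.0), "B"), ((0.0, 4.0), "B")]
t = (1.0, 1.0)

print(knn_classify(training, t, k=1))                 # nearest single neighbor
print(knn_classify(training, t, k=3))                 # plain majority vote
print(knn_classify(training, t, k=3, weighted=True))  # distance-weighted vote
```

On this data the unweighted vote with k = 3 is decided by the two distant "B" points, while the weighted vote favors the single close "A" point, which is the effect the $1/d^2$ weight is meant to have.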

### KNN Algorithm

Figure: illustration of the KNN algorithm.

### Nearest Neighbor Classification: Issues

Choosing the value of k:

- If k is too small, the classifier is sensitive to noise points.
- If k is too large, the neighborhood may include points from other classes.

### Nearest Neighbor Classification: More Issues

Scaling issues:

- Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
- Example:
  - the height of a person may vary from 1.5m to 1.8m
  - the weight of a person may vary from 90lb to 300lb
  - the income of a person may vary from \$10K to \$1M

### Nearest Neighbor Classification: Wrap-up

- k-NN classifiers are lazy learners.
  - They do not build models explicitly.
  - This is unlike eager learners such as decision tree induction and rule-based systems.
- Classifying unknown records is relatively expensive.
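The scaling issue above can be demonstrated with a short sketch that compares nearest neighbors before and after min-max scaling each attribute to the range [0, 1]. Min-max scaling is one common choice and an assumption here, since the lecture does not name a specific method; the four people and the function names are hypothetical.

```python
import math


def min_max_scale(rows):
    """Rescale every column of rows to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]


def nearest_index(rows, i):
    """Index of the row closest to rows[i] in Euclidean distance."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))


# Hypothetical people: (height in m, weight in lb, income in dollars).
people = [
    (1.50, 90, 50_000),      # 0: query person
    (1.50, 92, 60_000),      # 1: almost the same build, income differs by 10K
    (1.80, 300, 50_100),     # 2: very different build, almost the same income
    (1.65, 150, 1_000_000),  # 3: sets the upper end of the income range
]

print(nearest_index(people, 0))                 # raw attributes
print(nearest_index(min_max_scale(people), 0))  # scaled attributes
```

On the raw attributes the income differences swamp height and weight, so person 2 looks closest to person 0; after scaling, person 1 is closest because all three attributes contribute on comparable scales.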


### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data

### Hadoop Operations Management for Big Data Clusters in Telecommunication Industry

Hadoop Operations Management for Big Data Clusters in Telecommunication Industry N. Kamalraj Asst. Prof., Department of Computer Technology Dr. SNS Rajalakshmi College of Arts and Science Coimbatore-49

### Classification: Naïve Bayes Classifier Evaluation. Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Classification: Naïve Bayes Classifier Evaluation Toon Calders ( t.calders@tue.nl ) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Last Lecture Classification

### Chapter 7. Cluster Analysis

Chapter 7. Cluster Analysis. What is Cluster Analysis?. A Categorization of Major Clustering Methods. Partitioning Methods. Hierarchical Methods 5. Density-Based Methods 6. Grid-Based Methods 7. Model-Based

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

### Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1

Chapter 4 Data Mining A Short Introduction 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining