# Introduction to Machine Learning Connectionist and Statistical Language Processing

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Introduction to Machine Learning Connectionist and Statistical Language Processing Frank Keller Computerlinguistik Universität des Saarlandes Introduction to Machine Learning p.1/22

2 Overview definition of learning sample data set terminology: concepts, instances, attributes learning rules learning decision trees types of learning evaluating learning performance learning bias Literature: Witten and Frank (2000: ch. 1 3), Mitchell (1997: ch. 1, 2). Introduction to Machine Learning p.2/22

3 Definition of Learning From Mitchell (1997: 2): A computer program is said to learn from experimence E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. From Witten and Frank (2000: 6): things learn when they change their behavior in a way that makes them perform better in the future. In practice this means: we have sets of examples from which we want to extract regularities. Introduction to Machine Learning p.3/22

4 A Sample Data Set Fictional data set that describes the weather conditions for playing some unspecified game. outlook temp. humidity windy play sunny hot high false no sunny hot high true no overcast hot high false yes rainy mild high false yes rainy cool normal false yes rainy cool normal true no overcast cool normal true yes outlook temp. humidity windy play sunny mild high false no sunny cool normal false yes rainy mild normal false yes sunny mild normal true yes overcast mild high true yes overcast hot normal false yes rainy mild high true no Introduction to Machine Learning p.4/22

5 Terminology Instance: single example in a data set. Example: each of the rows in the table on the preceding slide. Attribute: an aspect of an instance. Example: outlook, temperature, humidity, windy. Also called feature. Attributes can take categorical or numeric values. Value: category that an attribute can take. Example: sunny, overcast, rainy for the attribute outlook. The attribute temperature could also take numeric values. Concept: the thing to be learned. Example: a classification of the instances into play and no play. Introduction to Machine Learning p.5/22

6 Learning Rules Example for a set of rules learned from the example data set: if outlook = sunny and humidity = high if outlook = rainy and windy = true if outlook = overcast if humidity = normal if none of the above then play = no then play = no then play = yes then play = yes then play = yes This is called a decision list, and is use as follows: use the first rule first, if it doesn t apply, use the second one, etc. These are classification rules that assign an output class (play or not) to each instance. Introduction to Machine Learning p.6/22

7 Learning Rules Here is a different set of rules learned from the data set: if temperature = cool if humidity = normal and windy = false if outlook = sunny and play = no if windy = false and play = no then humidity = normal then play = yes then humidity = high then outlook = sunny and humidity = high These are association rules that describe associations between different attribute values. Introduction to Machine Learning p.7/22

8 Learning Decision Trees Example: XOR (familiar from connectionist networks). x = 1? x y class 0 0 b 0 1 a 1 0 a 1 1 b no yes y = 1? y = 1? no yes no yes b a a b Nodes represent decisions on attributes, leaves represent classifications. Introduction to Machine Learning p.8/22

9 Learning Decision Trees Any decision tree can be turned into a set of rules. Traverse the tree depth first and create a conjunction for each node that you hit. if x = 0 and y = 0 if x = 0 and y = 1 if x = 1 and y = 0 if x = 1 and y = 1 then class = b then class = a then class = a then class = b Introduction to Machine Learning p.9/22

10 Learning Decision Trees However, this methods creates rule set that can be highly redundant. A better rule set would be: if x = 0 and y = 1 if x = 1 and y = 0 Otherwise then class = a then class = a class = b Inverse problem: it is not straightforward to turn a rule set into a decision tree (redundancy: replicated subtree problem). Introduction to Machine Learning p.10/22

11 Types of Learning Machine learning is not only about classification. The following main classes of problems exist: Classification: learn to put instances into pre-defined classes. Association: learn relationships between attributes. Numeric prediction: learn to predict a numeric quantity instead of a class. Clustering: discover classes of instances that belong together. Introduction to Machine Learning p.11/22

12 Comparison with Connectionist Nets Connectionist nets are machine learning engines that can be used for these four tasks. Examples include: Classification: competitive network: selects one unit in the output layer (target class) Association: pattern associator: recalls input patterns based on similarity Numeric prediction: perceptron: outputs a real-valued function in the output layer Clustering: self-organizing map: reduces dimensions in the input space based on similarity Introduction to Machine Learning p.12/22

13 Evaluating Learning Performance What does it mean for a model to successfully learn a concept? descriptive: captures the training data; predictive: generalizes to unseen data; explanatory: provides a plausible description of the concept to be learned. Introduction to Machine Learning p.13/22

14 Descriptive Evaluation Measures of model fit commonly used in computational linguistics (originally proposed for information retrieval): Precision: how many data points the model gets right: Precision = data points modeled correctly data points modeled Recall: how many data points the model accounts for: Recall = data points modeled correctly total data points Introduction to Machine Learning p.14/22

15 Descriptive Evaluation Measure of model fit commonly used in connectionist nets: Mean Squared Error: MSE = 1 n n i=1 (H i H i ) 2 H i : quantity observed in the data set H i : quantity predicted by model n: number of instances in the data set Traditionally, precision/recall has been used for classification tasks, while MSE has been used for numeric prediction tasks. Introduction to Machine Learning p.15/22

16 Predictive Evaluation Is the model able to generalize? Can it deal with unseen data, or does it overfit the data? Test on held-out data: split data to be modeled in training and test set; train the model (determine its parameters) on training set; apply model to training set; compute model fit apply model to test set; compute model fit; difference between compare model fit on training and test data measured the model s ability to generalize. Introduction to Machine Learning p.16/22

17 Explanatory Evaluation The concept of explanatory adequacy is elusive. Does the model provide a plausible description of the concept to be learned? Classification: does it base its classification on plausible classification rules? Association: does it discover plausible relationships in the data? Clustering: Does it come up with plausible clusters? The meaning of plausible to be defined by a human expert. Introduction to Machine Learning p.17/22

18 Learning Bias To generalize successfully, a machine learning system uses a learning bias to guide it through the space of possible concepts. Also called inductive bias (Mitchell 1997). Language bias: the language in which the result is expressed determines which concepts can be learned. Example: a learner that can learn disjunctive rules will get different results than one that doesn t use disjunction. An important factor is domain knowledge, e.g., the knowledge that certain combinations of attributes can never occur. Introduction to Machine Learning p.18/22

19 Learning Bias Search bias: the way the space of possible concepts is searched determines the outcome of learning. Example: greedy search: try to find the best rule at each stage and add it to the rule set; beam search: pursue a number of alternative rule sets in parallel. Two common search biases are general-to-specific (start with a general concept description and refine) and specific-to-general (start with a specific example and generalize). Introduction to Machine Learning p.19/22

20 Learning Bias Overfitting-avoidance bias: avoid learning a concept that overfits, i.e., just enumerates the training data: this will give very bad results on test data, as it lacks the ability to generalized to unseen instances. Example: consider simple concepts first, then proceed to more complex one, e.g., avoid a complex rule set with one rule for each training instance. Introduction to Machine Learning p.20/22

21 Approaches Dealt with in this Course Decision trees Bayesian classifiers Linear models Clustering algorithms Introduction to Machine Learning p.21/22

22 References Mitchell, Tom. M Machine Learning. New York: McGraw-Hill. Witten, Ian H., and Eibe Frank Data Mining: Practical Machine Learing Tools and Techniques with Java Implementations. San Diego, CA: Morgan Kaufmann. Introduction to Machine Learning p.22/22

### Overview. Introduction to Machine Learning. Definition of Learning. A Sample Data Set. Connectionist and Statistical Language Processing

Overview Introduction to Machine Learning Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes definition of learning sample

### Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set

Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification

### Overview. Clustering. Clustering vs. Classification. Supervised vs. Unsupervised Learning. Connectionist and Statistical Language Processing

Overview Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes clustering vs. classification supervised vs. unsupervised

### Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

### Data Mining with Weka

Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Data Mining with Weka a practical course on how to

### Data Mining Practical Machine Learning Tools and Techniques

Some Core Learning Representations Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter of Data Mining by I. H. Witten and E. Frank Decision trees Learning Rules Association rules

### Data Mining Practical Machine Learning Tools and Techniques

Data ining Practical achine Learning Tools and Techniques Slides for Chapter 2 of Data ining by I. H. Witten and E. rank Outline Terminology What s a concept Classification, association, clustering, numeric

### Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Input: Concepts, instances, attributes Terminology What s a concept? Classification,

### Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

### Knowledge-based systems and the need for learning

Knowledge-based systems and the need for learning The implementation of a knowledge-based system can be quite difficult. Furthermore, the process of reasoning with that knowledge can be quite slow. This

### Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

### More Data Mining with Weka

More Data Mining with Weka Class 3 Lesson 1 Decision trees and rules Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Lesson 3.1: Decision trees and rules

### Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

### Concept Learning. Machine Learning 1

Concept Learning Inducing general functions from specific training examples is a main issue of machine learning. Concept Learning: Acquiring the definition of a general category from given sample positive

### Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART I What s it all about 10/25/2000 2 Data vs. information Society produces huge amounts of data

### Decision-Tree Learning

Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values

### 6. If there is no improvement of the categories after several steps, then choose new seeds using another criterion (e.g. the objects near the edge of

Clustering Clustering is an unsupervised learning method: there is no target value (class label) to be predicted, the goal is finding common patterns or grouping similar examples. Differences between models/algorithms

### Data Mining Practical Machine Learning Tools and Techniques

Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

### Data Mining for Knowledge Management. Classification

1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

### Decision Trees. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University. www.cs.cmu.edu/~awm awm@cs.cmu.

Decision Trees Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 42-268-7599 Copyright Andrew W. Moore Slide Decision Trees Decision trees

### COMP9417: Machine Learning and Data Mining

COMP9417: Machine Learning and Data Mining A note on the two-class confusion matrix, lift charts and ROC curves Session 1, 2003 Acknowledgement The material in this supplementary note is based in part

### Data Mining on Streams

Data Mining on Streams Using Decision Trees CS 536: Machine Learning Instructor: Michael Littman TA: Yihua Wu Outline Introduction to data streams Overview of traditional DT learning ALG DT learning ALGs

### Learning. Artificial Intelligence. Learning. Types of Learning. Inductive Learning Method. Inductive Learning. Learning.

Learning Learning is essential for unknown environments, i.e., when designer lacks omniscience Artificial Intelligence Learning Chapter 8 Learning is useful as a system construction method, i.e., expose

### 8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

### Classification and Prediction

Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

### Version Spaces. riedmiller@informatik.uni-freiburg.de

. Machine Learning Version Spaces Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de

### Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

### Big Data: The Science of Patterns. Dr. Lutz Hamel Dept. of Computer Science and Statistics hamel@cs.uri.edu

Big Data: The Science of Patterns Dr. Lutz Hamel Dept. of Computer Science and Statistics hamel@cs.uri.edu The Blessing and the Curse: Lots of Data Outlook Temp Humidity Wind Play Sunny Hot High Weak No

### Flat Clustering K-Means Algorithm

Flat Clustering K-Means Algorithm 1. Purpose. Clustering algorithms group a set of documents into subsets or clusters. The cluster algorithms goal is to create clusters that are coherent internally, but

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Monday Morning Data Mining

Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik

### Weather forecast prediction: a Data Mining application

Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,ashwini.mandale@gmail.com,8407974457 Abstract

### Lecture 10: Regression Trees

Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

### IS463 Introduction to Data Mining Semester 1, Academic year Tutorial # 2

IS463 Introduction to Data Mining Semester 1, Academic year 2012-2013 Tutorial # 2 Activity 1: Classify the following attributes as qualitative (nominal or ordinal) or quantitative (interval or ratio),

### INTRODUCTION TO NEURAL NETWORKS

INTRODUCTION TO NEURAL NETWORKS Pictures are taken from http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html http://research.microsoft.com/~cmbishop/prml/index.htm By Nobel Khandaker Neural Networks An

### More Data Mining with Weka

More Data Mining with Weka Class 5 Lesson 1 Simple neural networks Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Lesson 5.1: Simple neural networks Class

### Data Mining - Introduction

Data Mining - Introduction Peter Brezany Institut für Scientific Computing Universität Wien Tel. 4277 39425 Sprechstunde: Di, 13.00-14.00 Outline Business Intelligence and its components Knowledge discovery

### Decision tree algorithm short Weka tutorial

Decision tree algorithm short Weka tutorial Croce Danilo, Roberto Basili Machine leanring for Web Mining a.a. 2009-2010 Machine Learning: brief summary Example You need to write a program that: given a

### Decision Trees. JERZY STEFANOWSKI Institute of Computing Science Poznań University of Technology. Doctoral School, Catania-Troina, April, 2008

Decision Trees JERZY STEFANOWSKI Institute of Computing Science Poznań University of Technology Doctoral School, Catania-Troina, April, 2008 Aims of this module The decision tree representation. The basic

### Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

### Data Mining - Introduction. Lecturer: JERZY STEFANOWSKI Institute of Computing Science Poznań University of Technology Poznań, Poland

Data Mining - Introduction Lecturer: JERZY STEFANOWSKI Institute of Computing Science Poznań University of Technology Poznań, Poland Doctoral School, Catania-Troina, April, 2008 Who andwhere Poznań University

### Data Mining: Foundation, Techniques and Applications

Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing

### IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES

IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil brunorocha_33@hotmail.com 2 Network Engineering

### Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)

Machine Learning Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos) What Is Machine Learning? A computer program is said to learn from experience E with respect to some class of

### Data Mining of Web Access Logs

Data Mining of Web Access Logs A minor thesis submitted in partial fulfilment of the requirements for the degree of Master of Applied Science in Information Technology Anand S. Lalani School of Computer

### Big Data Analytics for SCADA

ENERGY Big Data Analytics for SCADA Machine Learning Models for Fault Detection and Turbine Performance Elizabeth Traiger, Ph.D., M.Sc. 14 April 2016 1 SAFER, SMARTER, GREENER Points to Convey Big Data

### Visualizing class probability estimators

Visualizing class probability estimators Eibe Frank and Mark Hall Department of Computer Science University of Waikato Hamilton, New Zealand {eibe, mhall}@cs.waikato.ac.nz Abstract. Inducing classifiers

### An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. XML data mining research based on multi-level technology

[Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 13 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(13), 2014 [7602-7609] XML data mining research based on multi-level technology

### Web Document Clustering

Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

### Concept Learning. Learning from examples General-to specific ordering of hypotheses Version spaces and candidate elimination algorithm Inductive bias

Concept Learning Learning from examples Generalto specific ordering of hypotheses Version spaces and candidate elimination algorithm Inductive bias What s Concept Learning? infer the general definition

### Classification with Decision Trees

Classification with Decision Trees Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong 1 / 24 Y Tao Classification with Decision Trees In this lecture, we will discuss

### Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

### A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

### Exploring Practical Data Mining Techniques at Undergraduate Level

Exploring Practical Data Mining Techniques at Undergraduate Level ERIC P. JIANG University of San Diego 5998 Alcala Park, San Diego, CA 92110 UNITED STATES OF AMERICA jiang@sandiego.edu Abstract: Data

### Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati

### Machine Learning and Data Mining. Fundamentals, robotics, recognition

Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,

### Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)

Machine Learning 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) 2 Outline Inductive learning Decision

### TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

### Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data

### New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

### Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: Jerzy.Stefanowski@cs.put.poznan.pl Data Mining a step in A KDD Process Data mining:

### Neural Networks. Introduction to Artificial Intelligence CSE 150 May 29, 2007

Neural Networks Introduction to Artificial Intelligence CSE 150 May 29, 2007 Administration Last programming assignment has been posted! Final Exam: Tuesday, June 12, 11:30-2:30 Last Lecture Naïve Bayes

### Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

### Course 395: Machine Learning

Course 395: Machine Learning Lecturers: Maja Pantic (maja@doc.ic.ac.uk) Stavros Petridis (sp104@doc.ic.ac.uk) Goal (Lectures): To present basic theoretical concepts and key algorithms that form the core

### An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients

An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients Celia C. Bojarczuk 1, Heitor S. Lopes 2 and Alex A. Freitas 3 1 Departamento

### Probability Theory. Elementary rules of probability Sum rule. Product rule. p. 23

Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction

### Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

### D A T A M I N I N G C L A S S I F I C A T I O N

D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.

### Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

### INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal

INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal Research Article ISSN 2277 9140 ABSTRACT Web page categorization based

### DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA

315 DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA C. K. Lowe-Ma, A. E. Chen, D. Scholl Physical & Environmental Sciences, Research and Advanced Engineering Ford Motor Company, Dearborn, Michigan, USA

### Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2. Tid Refund Marital Status

Data Mining Classification: Basic Concepts, Decision Trees, and Evaluation Lecture tes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Classification: Definition Given a collection of

### WEB PAGE CATEGORISATION BASED ON NEURONS

WEB PAGE CATEGORISATION BASED ON NEURONS Shikha Batra Abstract: Contemporary web is comprised of trillions of pages and everyday tremendous amount of requests are made to put more web pages on the WWW.

### The Optimality of Naive Bayes

The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most

### Applying Data Mining Classification Techniques for Employee s Performance Prediction

Applying Data Mining Classification Techniques for Employee s Performance Prediction Hamidah Jantan 1, Mazidah Puteh 1, Abdul Razak Hamdan 2 and Zulaiha Ali Othman 2 1 Faculty of Computer and Mathematical

### Classification and Regression Trees

Classification and Regression Trees Bob Stine Dept of Statistics, School University of Pennsylvania Trees Familiar metaphor Biology Decision tree Medical diagnosis Org chart Properties Recursive, partitioning

### Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

### Smart Grid Data Analytics for Decision Support

1 Smart Grid Data Analytics for Decision Support Prakash Ranganathan, Department of Electrical Engineering, University of North Dakota, Grand Forks, ND, USA Prakash.Ranganathan@engr.und.edu, 701-777-4431

### Comparative Analysis of Classification Algorithms on Different Datasets using WEKA

Volume 54 No13, September 2012 Comparative Analysis of Classification Algorithms on Different Datasets using WEKA Rohit Arora MTech CSE Deptt Hindu College of Engineering Sonepat, Haryana, India Suman

### Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu

Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Big Data Analytics CSCI 4030

High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

### Decision Tree Learning on Very Large Data Sets

Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa

### Evaluation and Credibility. How much should we believe in what was learned?

Evaluation and Credibility How much should we believe in what was learned? Outline Introduction Classification with Train, Test, and Validation sets Handling Unbalanced Data; Parameter Tuning Cross-validation

### Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent

### 6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

### Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

### T3: A Classification Algorithm for Data Mining

T3: A Classification Algorithm for Data Mining Christos Tjortjis and John Keane Department of Computation, UMIST, P.O. Box 88, Manchester, M60 1QD, UK {christos, jak}@co.umist.ac.uk Abstract. This paper

### Evaluating Data Mining Models: A Pattern Language

Evaluating Data Mining Models: A Pattern Language Jerffeson Souza Stan Matwin Nathalie Japkowicz School of Information Technology and Engineering University of Ottawa K1N 6N5, Canada {jsouza,stan,nat}@site.uottawa.ca

### Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

### Implementation of Data Mining Techniques for Weather Report Guidance for Ships Using Global Positioning System

International Journal Of Computational Engineering Research (ijceronline.com) Vol. 3 Issue. 3 Implementation of Data Mining Techniques for Weather Report Guidance for Ships Using Global Positioning System

### Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

### Predicting Flight Delays

Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail