1 Learning. CS 461 Artificial Intelligence, Pinar Duygulu, Bilkent University. Slides are mostly adapted from AIMA and MIT OpenCourseWare.
2 Learning What is learning?
3 Induction. David Hume, Bertrand Russell: "If asked why we believe the sun will rise tomorrow, we shall naturally answer, 'Because it has always risen every day.' We have a firm belief that it will rise in the future, because it has risen in the past. The real question is: Do any number of cases of a law being fulfilled in the past afford evidence that it will be fulfilled in the future? It has been argued that we have reason to know the future will resemble the past, because what was the future has constantly become the past, and has always been found to resemble the past, so that we really have experience of the future, namely of times which were formerly future, which we may call past futures. But such an argument really begs the very question at issue. We have experience of past futures, but not of future futures, and the question is: Will future futures resemble past futures?"
4 Kinds of Learning
5 Learning a function
6 Aspects of function learning
7 Example Problem
8 Memory
9 Averaging
10 Sensor noise
11 Generalization
12 The red and the black
13 What is the right hypothesis?
14 What is the right hypothesis?
15 What is the right hypothesis?
16 How about this?
17 Variety of learning methods
18 Nearest Neighbor
19 Decision trees
20 Neural Networks
21 Machine learning successes
22 Supervised learning
23 Best hypothesis
24 Learning Conjunctions
25 Algorithm
26 Algorithm Start with N equal to all the negative examples and h = true. Then loop, adding conjuncts that rule out negative examples, until N is empty. Inside the loop, consider only features that would not rule out any positive examples.
27 Simulation
28 Simulation Now, we consider all the features that would not exclude any positive examples. Those are features f3 and f4. f3 would exclude 1 negative example; f4 would exclude 2. So we pick f4.
29 Simulation Now we remove the examples from N that are ruled out by f4 and add f4 to h. Now, based on the new N, n3 = 1 and n4 = 0. So we pick f3.
30 Simulation Because f3 rules out the last remaining negative example, we're done!
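As a concrete illustration of slides 26-30, here is a minimal Python sketch of the conjunction learner (not code from the slides: the dictionary representation of examples, the function name learn_conjunction, and the error handling are assumptions made for illustration).

def learn_conjunction(positives, negatives, features):
    # Each example is assumed to be a dict mapping feature names to booleans.
    # The hypothesis h is the set of conjoined features; h = true is the empty set.
    N = list(negatives)              # negative examples not yet ruled out
    h = set()
    while N:
        # Only features true in every positive example are admissible,
        # since adding them cannot rule out any positive example.
        admissible = [f for f in features
                      if f not in h and all(p[f] for p in positives)]
        if not admissible:
            raise ValueError("no consistent conjunction of positive literals exists")
        # Greedy choice from the simulation: pick the admissible feature that
        # rules out the most remaining negatives (f4 excludes 2 vs. 1 for f3).
        best = max(admissible, key=lambda f: sum(not n[f] for n in N))
        h.add(best)
        N = [n for n in N if n[best]]    # drop the negatives ruled out by best
    return h

On the simulation's data this picks f4 first and then f3, at which point N is empty and the loop stops, matching slides 28-30.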
31 A harder Problem
32 Disjunctive Normal form
33 Learning DNF
34 Algorithm The idea is that each disjunct will cover or account for some subset of the positive examples. So in the outer loop, we make a conjunction that includes some positive examples and no negative examples, and add it to our hypothesis. We keep doing that until no more positive examples remain to be covered.
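A minimal Python sketch of this covering loop, under the same assumptions as the conjunction sketch above (examples as dicts of boolean features, positive literals only); the greedy inner heuristic is an illustrative stand-in for the feature-choice rule discussed on the next slide, not the slides' own pseudocode.

def covers(conjunction, example):
    # A conjunction (a set of feature names) covers an example if every
    # feature in the conjunction is true for that example.
    return all(example[f] for f in conjunction)

def learn_dnf(positives, negatives, features):
    uncovered = list(positives)
    hypothesis = []                       # list of disjuncts (conjunctions)
    while uncovered:
        # Build one conjunction that covers some positives and no negatives.
        conj, pos_in, neg_in = set(), list(uncovered), list(negatives)
        while neg_in:
            # Candidates must rule out at least one remaining negative and
            # keep at least one positive, so the inner loop makes progress.
            candidates = [f for f in features if f not in conj
                          and any(not n[f] for n in neg_in)
                          and any(p[f] for p in pos_in)]
            if not candidates:
                raise ValueError("greedy search could not separate the examples")
            best = max(candidates,
                       key=lambda f: (sum(p[f] for p in pos_in),
                                      sum(not n[f] for n in neg_in)))
            conj.add(best)
            pos_in = [p for p in pos_in if p[best]]
            neg_in = [n for n in neg_in if n[best]]
        hypothesis.append(conj)          # add the disjunct to the hypothesis
        uncovered = [p for p in uncovered if not covers(conj, p)]
    return hypothesis

Because each disjunct is guaranteed to cover at least one previously uncovered positive example, the outer loop terminates; the greedy inner choice, however, is not guaranteed to find a separating conjunction even when one exists, which is why the choice of feature matters.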
35 Choosing a feature
36 Simulation
37 How well does it work?
38 Cross validation
39 Learning curves
40 Learning curves
41 Simple Gifts
42 Noisy data
43 Pseudo code: Noisy DNF Learning
44 Epsilon is our data
45 Overfitting curve
46 Hypothesis complexity
47 Bias vs variance
54 Picking epsilon
55 Domains
56 Congressional Voting
57 Decision Trees
58 Hypothesis class
61 Tree Bias
62 Trees vs DNF
63 Trees vs DNF
64 Algorithm
65 Let's split
66 Entropy
67 Let's split
68 Let's split
69 Stopping
70 Simulation
71 Exclusive OR
72 Congressional voting
73 Naïve Bayes
74 Example
77 Prediction P
78 Learning Algorithm
79 Prediction Algorithm
80 Laplace Correction
81 Example with correction
82 Prediction with correction
83 Hypothesis space
84 Exclusive OR
85 Probabilistic Inference
86 Bayes' rule
87 Why is Bayes naïve?
88 Learning Algorithm
89 Prediction Algorithm
90 Feature Spaces
91 Predicting Bankruptcy
92 Nearest neighbor
93 What do we mean by nearest?
94 Scaling
95 Predicting Bankruptcy
96 Predicting Bankruptcy
97 Hypothesis
98 Time and space
99 Noise
100 Noise
101 K-nearest neighbor
102 Curse of dimensionality
103 Test domains
104 Decision trees
105 Numerical attributes
107 Considering splits
108 Considering splits
109 Bankruptcy example
110 Heart disease
111 More than 22 MPG?
112 Bankruptcy example
113 1-Nearest Neighbor hypothesis
114 Decision tree hypothesis
115 Linear hypothesis
116 Linearly separable
117 Not linearly separable
118 Linear hypothesis class
119 Hyperplane geometry
121 Perceptron algorithm
122 Bankruptcy example: 49 iterations
123 Gradient Ascent
124 Gradient ascent/descent
125 Perceptron training via gradient descent
126 Artificial Neural Networks (Feedforward Nets)
127 Single Perceptron Unit
128 Beyond linear separability
129 Multi-layer perceptron
130 Multilayer perceptron
131 Multilayer perceptron learning
132 Sigmoid unit
134 Gradient descent
135 Gradient descent single unit
136 Derivative of the sigmoid
137 Gradient of unit output
138 Gradient of error
139 Gradient of Unit Output
140 Generalized delta rule
141 Backpropagation
142 Backpropagation example
143 Training neural nets
144 Applications
145 Applications
146 The vertical face-finding part of Rowley, Baluja and Kanade's system. Figure from "Rotation invariant neural-network based face detection", H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE. Adapted from David Forsyth, UC Berkeley.
147 Architecture of the complete system: they use another neural net to estimate the orientation of the face, then rectify it. They search over scales to find bigger/smaller faces. Figure from "Rotation invariant neural-network based face detection", H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE. Adapted from David Forsyth, UC Berkeley.
148 Figure from "Rotation invariant neural-network based face detection", H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE. Adapted from David Forsyth, UC Berkeley.
149 Limitations