1 Learning. CS 461 Artificial Intelligence, Pinar Duygulu, Bilkent University. Slides are mostly adapted from AIMA and MIT OpenCourseWare.
2 Learning What is learning?
3 Induction. David Hume, Bertrand Russell: "If asked why we believe the sun will rise tomorrow, we shall naturally answer, 'Because it has always risen every day.' We have a firm belief that it will rise in the future, because it has risen in the past. The real question is: Do any number of cases of a law being fulfilled in the past afford evidence that it will be fulfilled in the future? It has been argued that we have reason to know the future will resemble the past, because what was the future has constantly become the past, and has always been found to resemble the past, so that we really have experience of the future, namely of times which were formerly future, which we may call past futures. But such an argument really begs the very question at issue. We have experience of past futures, but not of future futures, and the question is: Will future futures resemble past futures?"
4 Kinds of Learning
5 Learning a function
6 Aspects of function learning
7 Example Problem
8 Memory
9 Averaging
10 Sensor noise
11 Generalization
12 The red and the black
13 What is the right hypothesis?
14 What is the right hypothesis?
15 What is the right hypothesis?
16 How about this?
17 Variety of learning methods
18 Nearest Neighbor
19 Decision trees
20 Neural Networks
21 Machine learning successes
22 Supervised learning
23 Best hypothesis
24 Learning Conjunctions
25 Algorithm
26 Algorithm Start with N equal to all the negative examples and h = true. Then loop, adding conjuncts that rule out negative examples, until N is empty. Inside the loop, consider only features that would not rule out any positive examples.
27 Simulation
28 Simulation Now, we consider all the features that would not exclude any positive examples. Those are features f3 and f4. f3 would exclude 1 negative example; f4 would exclude 2. So we pick f4.
29 Simulation Now we remove the examples from N that are ruled out by f4 and add f4 to h. Now, based on the new N, n3 = 1 and n4 = 0. So we pick f3.
30 Simulation Because f3 rules out the last remaining negative example, we're done!
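As a concrete illustration of slides 26-30, here is a minimal Python sketch of the conjunction learner (not code from the slides: the dictionary representation of examples, the function name learn_conjunction, and the error handling are assumptions made for illustration).

def learn_conjunction(positives, negatives, features):
    # Each example is assumed to be a dict mapping feature names to booleans.
    # The hypothesis h is the set of conjoined features; h = true is the empty set.
    N = list(negatives)              # negative examples not yet ruled out
    h = set()
    while N:
        # Only features true in every positive example are admissible,
        # since adding them cannot rule out any positive example.
        admissible = [f for f in features
                      if f not in h and all(p[f] for p in positives)]
        if not admissible:
            raise ValueError("no consistent conjunction of positive literals exists")
        # Greedy choice from the simulation: pick the admissible feature that
        # rules out the most remaining negatives (f4 excludes 2 vs. 1 for f3).
        best = max(admissible, key=lambda f: sum(not n[f] for n in N))
        h.add(best)
        N = [n for n in N if n[best]]    # drop the negatives ruled out by best
    return h

On the simulation's data this picks f4 first and then f3, at which point N is empty and the loop stops, matching slides 28-30.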
31 A harder Problem
32 Disjunctive Normal form
33 Learning DNF
34 Algorithm The idea is that each disjunct will cover or account for some subset of the positive examples. So in the outer loop, we make a conjunction that includes some positive examples and no negative examples, and add it to our hypothesis. We keep doing that until no more positive examples remain to be covered.
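A minimal Python sketch of this covering loop, under the same assumptions as the conjunction sketch above (examples as dicts of boolean features, positive literals only); the greedy inner heuristic is an illustrative stand-in for the feature-choice rule discussed on the next slide, not the slides' own pseudocode.

def covers(conjunction, example):
    # A conjunction (a set of feature names) covers an example if every
    # feature in the conjunction is true for that example.
    return all(example[f] for f in conjunction)

def learn_dnf(positives, negatives, features):
    uncovered = list(positives)
    hypothesis = []                       # list of disjuncts (conjunctions)
    while uncovered:
        # Build one conjunction that covers some positives and no negatives.
        conj, pos_in, neg_in = set(), list(uncovered), list(negatives)
        while neg_in:
            # Candidates must rule out at least one remaining negative and
            # keep at least one positive, so the inner loop makes progress.
            candidates = [f for f in features if f not in conj
                          and any(not n[f] for n in neg_in)
                          and any(p[f] for p in pos_in)]
            if not candidates:
                raise ValueError("greedy search could not separate the examples")
            best = max(candidates,
                       key=lambda f: (sum(p[f] for p in pos_in),
                                      sum(not n[f] for n in neg_in)))
            conj.add(best)
            pos_in = [p for p in pos_in if p[best]]
            neg_in = [n for n in neg_in if n[best]]
        hypothesis.append(conj)          # add the disjunct to the hypothesis
        uncovered = [p for p in uncovered if not covers(conj, p)]
    return hypothesis

Because each disjunct is guaranteed to cover at least one previously uncovered positive example, the outer loop terminates; the greedy inner choice, however, is not guaranteed to find a separating conjunction even when one exists, which is why the choice of feature matters.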
35 Choosing a feature
36 Simulation
37 How well does it work?
38 Cross validation
39 Learning curves
40 Learning curves
41 Simple Gifts
42 Noisy data
43 Pseudo code: Noisy DNF Learning
44 Epsilon is our data
45 Overfitting curve
46 Hypothesis complexity
47 Bias vs variance
54 Picking epsilon
55 Domains
56 Congressional Voting
57 Decision Trees
58 Hypothesis class
61 Tree Bias
62 Trees vs DNF
63 Trees vs DNF
64 Algorithm
65 Let's split
66 Entropy
67 Let's split
68 Let's split
69 Stopping
70 Simulation
71 Exclusive OR
72 Congressional voting
73 Naïve Bayes
74 Example
77 Prediction P
78 Learning Algorithm
79 Prediction Algorithm
80 Laplace Correction
81 Example with correction
82 Prediction with correction
83 Hypothesis space
84 Exclusive OR
85 Probabilistic Inference
86 Bayes' rule
87 Why is Bayes naïve?
88 Learning Algorithm
89 Prediction Algorithm
90 Feature Spaces
91 Predicting Bankruptcy
92 Nearest neighbor
93 What do we mean by nearest?
94 Scaling
95 Predicting Bankruptcy
96 Predicting Bankruptcy
97 Hypothesis
98 Time and space
99 Noise
100 Noise
101 K-nearest neighbor
102 Curse of dimensionality
103 Test domains
104 Decision trees
105 Numerical attributes
107 Considering splits
108 Considering splits
109 Bankruptcy example
110 Heart disease
111 More than 22 MPG?
112 Bankruptcy example
113 1-Nearest Neighbor hypothesis
114 Decision tree hypothesis
115 Linear hypothesis
116 Linearly separable
117 Not linearly separable
118 Linear hypothesis class
119 Hyperplane geometry
121 Perceptron algorithm
122 Bankruptcy example: 49 iterations
123 Gradient Ascent
124 Gradient ascent/descent
125 Perceptron training via gradient descent
126 Artificial Neural Networks (Feedforward Nets)
127 Single Perceptron Unit
128 Beyond linear separability
129 Multi-layer perceptron
130 Multilayer perceptron
131 Multilayer perceptron learning
132 Sigmoid unit
134 Gradient descent
135 Gradient descent single unit
136 Derivative of the sigmoid
137 Gradient of unit output
138 Gradient of error
139 Gradient of Unit Output
140 Generalized delta rule
141 Backpropagation
142 Backpropagation example
143 Training neural nets
144 Applications
145 Applications
146 The vertical face-finding part of Rowley, Baluja and Kanade's system. Figure from "Rotation invariant neural-network based face detection", H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE. Adapted from David Forsyth, UC Berkeley.
147 Architecture of the complete system: they use another neural net to estimate the orientation of the face, then rectify it. They search over scales to find bigger/smaller faces. Figure from "Rotation invariant neural-network based face detection", H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE. Adapted from David Forsyth, UC Berkeley.
148 Figure from "Rotation invariant neural-network based face detection", H.A. Rowley, S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998, copyright 1998, IEEE. Adapted from David Forsyth, UC Berkeley.
149 Limitations