ECE 5984: Introduction to Machine Learning

1 ECE 5984: Introduction to Machine Learning
Topics: Neural Networks, Backprop
Readings: Murphy 16.5
Dhruv Batra, Virginia Tech

2 Administrativia
HW3: due in 2 weeks. You will implement primal and dual SVMs.
Kaggle competition: Higgs Boson signal vs. background classification.
(C) Dhruv Batra

3 Administrativia
Project mid-semester spotlight presentations: Friday, 5-7pm, 3-5pm, Whittemore
slides (recommended)
4-minute time limit (STRICT), min Q&A
Tell the class what you're working on. Any results yet? Problems faced?
Upload slides on Scholar.

4 Recap of Last Time

5 Not linearly separable data
Some datasets are not linearly separable!
AppletSVM.html

6 Addressing non-linearly separable data
Option 1: non-linear features. Choose non-linear features, e.g.,
Typical linear features: $w_0 + \sum_i w_i x_i$
Example of non-linear features: degree-2 polynomials, $w_0 + \sum_i w_i x_i + \sum_{ij} w_{ij} x_i x_j$
The classifier $h_w(x)$ is still linear in the parameters $w$, so it is as easy to learn.
Data that is not linearly separable in the original space is often linearly separable in a higher-dimensional feature space.
This can be expressed via kernels.
Slide Credit: Carlos Guestrin
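A minimal Python sketch of Option 1, assuming the degree-2 expansion above and using XOR as the example (XOR, the function names, and the hand-picked weights are illustrative choices, not from the slide). The classifier `h` stays linear in the weights, yet it separates XOR once the product feature $x_1 x_2$ is available.

```python
from itertools import combinations_with_replacement

def degree2_features(x):
    """Map x = (x_1, ..., x_D) to [x_1, ..., x_D] + [x_i * x_j for i <= j]."""
    linear = list(x)
    quadratic = [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]
    return linear + quadratic

def h(w0, w, x):
    """Classifier that is linear in (w0, w) but non-linear in x; returns +1 or -1."""
    score = w0 + sum(wi * fi for wi, fi in zip(w, degree2_features(x)))
    return 1 if score > 0 else -1

# XOR is not linearly separable in (x_1, x_2) ...
xor_data = [((0, 0), -1), ((1, 1), -1), ((0, 1), 1), ((1, 0), 1)]
# ... but it is in the feature space [x1, x2, x1^2, x1*x2, x2^2].
w0, w = -0.5, [1.0, 1.0, 0.0, -2.0, 0.0]
print([h(w0, w, x) for x, _ in xor_data])  # [-1, -1, 1, 1]
```

Learning `w` on these features is exactly as hard as learning a linear classifier; only the input representation changed.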

7 Addressing non-linearly separable data
Option 2: non-linear classifier. Choose a classifier $h_w(x)$ that is non-linear in the parameters $w$, e.g., decision trees or neural networks.
More general than linear classifiers.
But can often be harder to learn (non-convex optimization required).
Often very useful (often outperforms linear classifiers).
In a way, both ideas are related.
Slide Credit: Carlos Guestrin

8 Biological Neuron

9 Recall: The Neuron Metaphor
Neurons accept information from multiple inputs and transmit information to other neurons.
Multiply inputs by weights along edges.
Apply some function to the set of inputs at each node.
Slide Credit: HKUST

10 Types of Neurons
[Figure: a linear neuron, a logistic neuron, and a perceptron, each computing f(x, θ) from a constant input 1 (weight θ_0) and inputs x_1, ..., x_D (weights θ_1, ..., θ_D)]
Potentially more. These require a convex loss function for gradient descent training.
Slide Credit: HKUST
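A minimal sketch of the three neuron types, assuming the usual definitions: all three share the pre-activation $\theta_0 + \sum_{i=1}^{D} \theta_i x_i$; the linear neuron outputs it directly, the logistic neuron passes it through the sigmoid $1/(1 + e^{-z})$, and the perceptron thresholds it. The function names and the $\pm 1$ perceptron output convention are assumptions for illustration.

```python
import math

def pre_activation(x, theta):
    # theta[0] is the bias weight (it multiplies a constant input of 1);
    # theta[1:] are the weights on x_1, ..., x_D
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

def linear_neuron(x, theta):
    return pre_activation(x, theta)

def logistic_neuron(x, theta):
    z = pre_activation(x, theta)
    # numerically stable sigmoid: never calls exp on a large positive number
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def perceptron(x, theta):
    # threshold unit with outputs in {-1, +1} (the label convention is an assumption)
    return 1 if pre_activation(x, theta) > 0 else -1

theta = (0.5, 1.0, -1.0)
x = (1.0, 2.0)
print(linear_neuron(x, theta), perceptron(x, theta))  # -0.5 -1
```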

11 Limitation
A single neuron still has a linear decision boundary.
What to do? Idea: stack a bunch of them together!

12 Multilayer Networks
Cascade neurons together: the output from one layer is the input to the next.
Each layer has its own set of weights.
[Figure: inputs x_0, x_1, x_2, ..., x_P feed through successive layers with weights θ_{i,j} (layer i, unit j), producing the output f(x, θ)]
Slide Credit: HKUST
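To make the "stack them together" idea concrete, here is a hand-wired sketch assuming threshold (step) units: XOR, which no single linear neuron can represent, becomes representable with one hidden layer of two units. The weights are hand-picked for illustration; nothing here is learned.

```python
def step(z):
    # threshold unit: 1 if z > 0 else 0
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # hidden layer: two threshold neurons, each with its own weights
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are on
    # output layer takes the hidden outputs as its inputs: "OR but not AND" is XOR
    return step(h_or - h_and - 0.5)

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```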

13 Universal Function Approximators
Theorem: a 3-layer network with linear outputs can uniformly approximate any continuous function on a compact set to arbitrary accuracy, given enough hidden units [Funahashi 89].
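The theorem is an existence result. One common intuition (not Funahashi's proof) is that the difference of two steep sigmoids approximates an indicator "bump" on an interval, and a linear output unit can sum scaled bumps into a piecewise-constant approximation. The sketch below assumes $f(x) = x^2$ on $[0, 1]$, 50 bumps, and a steepness of 1000; these values and the helper names are illustrative choices.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bump_network(f, n_bumps=50, steepness=1000.0):
    """One hidden layer of sigmoids plus a linear output, approximating f on [0, 1]."""
    edges = [k / n_bumps for k in range(n_bumps + 1)]
    # output weight for bump k: f evaluated at the bump's midpoint
    heights = [f((edges[k] + edges[k + 1]) / 2) for k in range(n_bumps)]

    def g(x):
        # each bump is the difference of two steep sigmoid hidden units
        return sum(hk * (sigmoid(steepness * (x - edges[k]))
                         - sigmoid(steepness * (x - edges[k + 1])))
                   for k, hk in enumerate(heights))
    return g

g = bump_network(lambda x: x * x)
grid = [i / 100 for i in range(5, 96)]
print(max(abs(g(x) - x * x) for x in grid))  # a small maximum error on this grid
```

More bumps (more hidden units) shrink the error further, which is the "given enough hidden units" part of the statement.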

14 Plan for Today
Neural Networks
Parameter learning
Backpropagation

15 Forward Propagation (on board)

16-21 Feed-Forward Networks
Predictions are fed forward through the network to classify.
[Figure: inputs x_0, x_1, x_2, ..., x_P propagate layer by layer through weights θ_{i,j} to the output]
Slide Credit: HKUST
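Since forward propagation was done on the board, here is a minimal sketch of it, assuming sigmoid hidden units and a linear output layer. The `forward` function and the `(W, b)` layer format are hypothetical names chosen for illustration: each layer's activations become the next layer's inputs.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x, layers):
    """Feed x forward through `layers`, a list of (W, b) pairs.

    W has one row of weights per unit in the layer; b has one bias per unit.
    Hidden layers apply a sigmoid; the last layer is left linear.
    """
    a = list(x)
    for k, (W, b) in enumerate(layers):
        z = [bj + sum(wji * ai for wji, ai in zip(row, a)) for row, bj in zip(W, b)]
        is_output = k == len(layers) - 1
        a = z if is_output else [sigmoid(zj) for zj in z]
    return a

# 2 inputs -> 2 sigmoid hidden units -> 1 linear output
layers = [
    ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),  # both hidden units output sigmoid(0) = 0.5
    ([[1.0, 1.0]], [0.0]),
]
print(forward([3.0, -2.0], layers))  # [1.0]
```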

22 Gradient Computation
First let's try:
Single neuron for linear regression
Single neuron for logistic regression
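For a single linear neuron $f(x, w) = w_0 + \sum_i w_i x_i$ with squared loss $L = \frac{1}{2}(y - f(x, w))^2$, the chain rule gives $\partial L / \partial w_i = -(y - f(x, w))\, x_i$, with $x_0 = 1$. The sketch below is a minimal illustration assuming a single training example; it checks that formula against central finite differences, and the helper names are hypothetical.

```python
def predict(w, x):
    # linear neuron: w[0] is the bias, w[1:] weight the inputs
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def squared_loss(w, x, y):
    return 0.5 * (y - predict(w, x)) ** 2

def grad_squared_loss(w, x, y):
    # dL/dw_i = -(y - f(x, w)) * x_i, with x_0 = 1 for the bias
    err = y - predict(w, x)
    return [-err * xi for xi in [1.0] + list(x)]

def numerical_grad(loss, w, x, y, eps=1e-6):
    # central differences, one coordinate at a time
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grads.append((loss(w_plus, x, y) - loss(w_minus, x, y)) / (2 * eps))
    return grads

w, x, y = [0.1, -0.2, 0.3], (1.0, 2.0), 1.5
print(grad_squared_loss(w, x, y))
print(numerical_grad(squared_loss, w, x, y))  # should agree to about 1e-6
```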

23 Logistic regression
Learning rule, MLE:
Slide Credit: Carlos Guestrin
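The slide's MLE learning rule did not survive extraction. The standard form of this rule (stated here as an assumption about the slide) is gradient ascent on the conditional log-likelihood: $w_i \leftarrow w_i + \eta \sum_j x_i^j \left(y^j - P(Y^j = 1 \mid x^j, w)\right)$, with $x_0^j = 1$ for the bias. A minimal sketch on a tiny 1-D dataset, with the step size `eta`, the iteration count, and the data chosen for illustration:

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def p_y1(w, x):
    # P(Y = 1 | x, w) for logistic regression; w[0] is the bias
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def log_likelihood(w, data):
    return sum(math.log(p_y1(w, x)) if y == 1 else math.log(1.0 - p_y1(w, x))
               for x, y in data)

def gradient_ascent_step(w, data, eta):
    # w_i <- w_i + eta * sum_j x_i^j (y^j - P(Y = 1 | x^j, w)), with x_0^j = 1
    grad = [0.0] * len(w)
    for x, y in data:
        err = y - p_y1(w, x)
        for i, xi in enumerate([1.0] + list(x)):
            grad[i] += xi * err
    return [wi + eta * gi for wi, gi in zip(w, grad)]

data = [((-2.0,), 0), ((-1.0,), 0), ((1.0,), 1), ((2.0,), 1)]
w = [0.0, 0.0]
for _ in range(100):
    w = gradient_ascent_step(w, data, eta=0.1)
print(log_likelihood([0.0, 0.0], data), log_likelihood(w, data))  # the second is larger
```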

24 Gradient Computation
First let's try:
Single neuron for linear regression
Single neuron for logistic regression
Now let's try the general case: backpropagation! Really efficient.
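A minimal backpropagation sketch, assuming one sigmoid hidden layer, one linear output, and squared loss on a single example. The backward pass reuses the forward activations: the output error `delta_out` is pushed back through `w2` and the sigmoid derivative $h(1 - h)$ to give each hidden unit's error, so all gradients cost only a constant factor more than one forward pass. The parameter values and helper names are hypothetical; `finite_difference_W1` is included only to check the result.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x, W1, b1, w2, b2):
    # hidden layer: h_j = sigmoid(b1_j + sum_i W1[j][i] * x_i)
    h = [sigmoid(bj + sum(wji * xi for wji, xi in zip(row, x))) for row, bj in zip(W1, b1)]
    # linear output unit
    y_hat = b2 + sum(w2j * hj for w2j, hj in zip(w2, h))
    return h, y_hat

def loss(x, y, params):
    _, y_hat = forward(x, *params)
    return 0.5 * (y_hat - y) ** 2

def backprop(x, y, W1, b1, w2, b2):
    """Gradients of 0.5 * (y_hat - y)^2 w.r.t. every parameter, via the chain rule."""
    h, y_hat = forward(x, W1, b1, w2, b2)
    delta_out = y_hat - y                    # dL/dy_hat
    g_w2 = [delta_out * hj for hj in h]      # dL/dw2_j = delta_out * h_j
    g_b2 = delta_out
    # back through w2 and the sigmoid, using sigmoid'(z) = h * (1 - h)
    delta_hidden = [delta_out * w2j * hj * (1.0 - hj) for w2j, hj in zip(w2, h)]
    g_W1 = [[dj * xi for xi in x] for dj in delta_hidden]
    g_b1 = list(delta_hidden)
    return g_W1, g_b1, g_w2, g_b2

def finite_difference_W1(x, y, W1, b1, w2, b2, j, i, eps=1e-6):
    # numerical dL/dW1[j][i] by central differences, for checking backprop
    W_plus = [row[:] for row in W1]
    W_minus = [row[:] for row in W1]
    W_plus[j][i] += eps
    W_minus[j][i] -= eps
    return (loss(x, y, (W_plus, b1, w2, b2)) - loss(x, y, (W_minus, b1, w2, b2))) / (2 * eps)

W1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
b1 = [0.0, 0.1, -0.2]
w2 = [0.6, -0.5, 0.8]
b2 = 0.1
x, y = [0.5, -1.2], 0.7
g_W1, g_b1, g_w2, g_b2 = backprop(x, y, W1, b1, w2, b2)
print(g_W1[0][1], finite_difference_W1(x, y, W1, b1, w2, b2, 0, 1))  # should agree closely
```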

25 Neural Nets
Best performers on OCR.
NetTalk: text-to-speech system from
Rick Rashid speaks Mandarin

26 Neural Networks Demo: bpfunctionapprox.html

27 Historical Perspective

28 Convergence of backprop
Perceptron leads to convex optimization: gradient descent reaches a global minimum.
Multilayer neural nets are not convex:
gradient descent can get stuck in local minima;
the learning rate is hard to set;
selecting the number of hidden units and layers is a fuzzy process.
NNs had fallen out of fashion in the 90s and early 2000s.
Back with a new name and significantly improved performance: deep networks, dropout, and training on much larger corpora.
Slide Credit: Carlos Guestrin
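A minimal sketch of the local-minimum problem, assuming a made-up 1-D loss $f(w) = w^4 - 2w^2 + 0.5w$ with two minima. Plain gradient descent with the same learning rate ends in a different minimum depending only on where it starts; the function, learning rate, and step count are illustrative choices.

```python
def f(w):
    # a 1-D non-convex "loss" with two local minima
    return w ** 4 - 2 * w ** 2 + 0.5 * w

def grad_f(w):
    return 4 * w ** 3 - 4 * w + 0.5

def gradient_descent(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * grad_f(w)
    return w

w_right = gradient_descent(2.0)   # settles in the local minimum near w = 0.93
w_left = gradient_descent(-2.0)   # settles in the global minimum near w = -1.06
print(f(w_right), f(w_left))      # the right-hand minimum has the higher loss
```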

29 Overfitting
Many, many, many parameters.
Avoiding overfitting?
More training data
Regularization
Early stopping
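Early stopping can be sketched as follows, assuming hypothetical callbacks `train_one_epoch` and `validation_loss` supplied by the caller and a `patience` parameter (a common convention, not from the slide). Training halts once validation loss has not improved for `patience` consecutive epochs, and the best epoch is reported.

```python
def train_with_early_stopping(train_one_epoch, validation_loss, max_epochs, patience):
    """Run up to max_epochs; stop after `patience` epochs without a new best validation loss.

    Returns (best_epoch, best_loss). A real loop would also save the parameters at
    best_epoch and restore them at the end.
    """
    best_epoch, best_loss = -1, float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        loss = validation_loss()
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_loss

# Simulated validation curve: it falls, then rises as the network starts to overfit.
curve = iter([1.0, 0.8, 0.6, 0.55, 0.58, 0.6, 0.65, 0.7, 0.9])
epochs_run = []
result = train_with_early_stopping(lambda: epochs_run.append(1), lambda: next(curve),
                                   max_epochs=9, patience=3)
print(result, len(epochs_run))  # (3, 0.55) 7
```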

30 A quick note
Image Credit: LeCun et al.

31 Rectified Linear Units (ReLU)
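A minimal sketch of the ReLU, $\max(0, z)$, next to the sigmoid derivative for comparison. One commonly cited advantage shows up in the gradients: the ReLU derivative is 1 for every positive input, while the sigmoid derivative never exceeds 0.25 and shrinks towards 0 as $|z|$ grows. Using 0 as the derivative at $z = 0$ is a convention.

```python
import math

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # 1 for z > 0, 0 for z < 0; at z == 0 we pick 0 (a valid subgradient)
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-10.0, 0.0, 10.0):
    print(z, relu_grad(z), sigmoid_grad(z))  # the sigmoid gradient is tiny at |z| = 10
```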

32 Convolutional Nets
Basic idea: on board.
Assumptions: local receptive fields; weight sharing / translational invariance / stationarity.
Each layer is just a convolution!
[Figure: input image → convolutional layer → sub-sampling layer]
Image Credit: Chris Bishop
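A minimal sketch of the two layer types in the figure, assuming a single-channel image stored as a list of lists, stride 1, no padding, and 2x2 average sub-sampling (many modern nets use max pooling instead). Like most CNN libraries, it computes cross-correlation and calls it convolution. The same kernel is applied at every position (weight sharing), and each output sees only a kernel-sized patch (local receptive field); the function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a list-of-lists image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    # the same kernel weights are reused at every (i, j)
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def subsample_avg(feature_map, size=2):
    """Non-overlapping size x size average pooling; leftover rows/columns are dropped."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [[sum(feature_map[i * size + a][j * size + b]
                 for a in range(size) for b in range(size)) / size ** 2
             for j in range(cols)]
            for i in range(rows)]

image = [[0, 0, 1, 1, 1] for _ in range(5)]  # a vertical edge between columns 1 and 2
edge_kernel = [[-1, 1]]                       # responds to left-to-right increases
fmap = conv2d_valid(image, edge_kernel)       # 5 x 4, each row [0, 1, 0, 0]
print(subsample_avg(fmap))                    # [[0.5, 0.0], [0.5, 0.0]]
```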

33-41 Slide Credit: Marc'Aurelio Ranzato

42 Convolutional Nets
Example: INPUT 32x32 → C1: feature maps → S2: f. maps → C3: f. maps → S4: f. maps → C5: layer 120 → F6: layer 84 → OUTPUT 10
Connections, in order: convolutions, subsampling, convolutions, subsampling, full connection, full connection, Gaussian connections
Image Credit: Yann LeCun, Kevin Murphy
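The spatial size of each feature map follows from the 32x32 input by simple arithmetic. This sketch assumes the 5x5 convolution kernels and 2x2 sub-sampling used in LeNet-5, stride 1, and no padding; the helper names are hypothetical, and the number of feature maps per layer is not computed here.

```python
def conv_out(size, kernel):
    # spatial size after a 'valid' convolution with stride 1
    return size - kernel + 1

def subsample_out(size, window):
    # spatial size after non-overlapping window x window sub-sampling
    return size // window

def lenet5_spatial_sizes(input_size=32, kernel=5, window=2):
    c1 = conv_out(input_size, kernel)  # C1
    s2 = subsample_out(c1, window)     # S2
    c3 = conv_out(s2, kernel)          # C3
    s4 = subsample_out(c3, window)     # S4
    c5 = conv_out(s4, kernel)          # C5: 1x1 maps, so each of its 120 units sees all of S4
    return {"C1": c1, "S2": s2, "C3": c3, "S4": s4, "C5": c5}

print(lenet5_spatial_sizes())  # {'C1': 28, 'S2': 14, 'C3': 10, 'S4': 5, 'C5': 1}
```

The 1x1 result for C5 is why that convolutional layer behaves like a fully connected layer of 120 units.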

43-45 Slide Credit: Marc'Aurelio Ranzato

46-48 Visualizing Learned Filters
Figure Credit: [Zeiler & Fergus ECCV14]

49 Autoencoders
Goal: compression. The output tries to predict the input.
Image Credit:

50 Autoencoders
Goal: learn a low-dimensional basis for the data.
Image Credit: Andrew Ng
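A minimal linear autoencoder sketch, assuming 2-D inputs, a 1-D code $h = w^\top x$, a decoder $\hat{x} = h\,v$, squared reconstruction loss, and plain batch gradient descent. When the data lie on a line, the learned 1-D code captures that direction and the reconstruction error drops to nearly zero, which is the "low-dimensional basis" idea in its simplest, linear form. The learning rate, step count, initialization, data, and function names are illustrative choices.

```python
def reconstruction_loss(w, v, data):
    # mean of 0.5 * ||x_hat - x||^2 with code h = w . x and reconstruction x_hat = h * v
    total = 0.0
    for x in data:
        h = sum(wi * xi for wi, xi in zip(w, x))
        total += 0.5 * sum((h * vi - xi) ** 2 for vi, xi in zip(v, x))
    return total / len(data)

def train_linear_autoencoder(data, lr=0.05, steps=2000):
    dim = len(data[0])
    w = [0.1] * dim  # encoder: D inputs -> 1-D code
    v = [0.1] * dim  # decoder: 1-D code -> D outputs
    n = len(data)
    for _ in range(steps):
        grad_w = [0.0] * dim
        grad_v = [0.0] * dim
        for x in data:
            h = sum(wi * xi for wi, xi in zip(w, x))
            r = [h * vi - xi for vi, xi in zip(v, x)]  # residual x_hat - x
            v_dot_r = sum(vi * ri for vi, ri in zip(v, r))
            for i in range(dim):
                grad_v[i] += r[i] * h / n        # dL/dv_i
                grad_w[i] += v_dot_r * x[i] / n  # dL/dw_i
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        v = [vi - lr * gi for vi, gi in zip(v, grad_v)]
    return w, v

# 2-D points that all lie on the line x_2 = 2 * x_1, so a 1-D code suffices
data = [(t, 2 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
w, v = train_linear_autoencoder(data)
print(reconstruction_loss([0.1, 0.1], [0.1, 0.1], data), reconstruction_loss(w, v, data))
```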

51 Stacked Autoencoders
How about we compress the low-dim features more?
Image Credit:

52 Sparse DBNs [Lee et al. ICML 09]
Figure courtesy: Quoc Le

53 Stacked Autoencoders
Finally, perform classification with these low-dim features.
Image Credit:

54 What you need to know about neural networks
Perceptron: representation, derivation
Multilayer neural nets: representation, derivation of backprop, learning rule, expressive power