Gaussian Processes for Big Data Problems
1 Gaussian Processes for Big Data Problems
Marc Deisenroth, Department of Computing, Imperial College London
Machine Learning Summer School, Chalmers University, 14 April 2015
3-4 Problem Setting

[Figure: noisy observations of a latent function f(x)]

Objective: For a set of observations y_i = f(x_i) + ε, with ε ~ N(0, σ_ε²), find a distribution over functions p(f) that explains the data. This is a probabilistic regression problem.
5 (Some) Relevant Application Areas

- Global black-box optimization and experimental design
- Autonomous learning in robotics
- Probabilistic dimensionality reduction and data visualization
6-7 Table of Contents

- Introduction
  - Gaussian Distribution
- Gaussian Processes
  - Definition and Derivation
  - Inference
  - Covariance Functions
  - Model Selection
  - Limitations
- Scaling Gaussian Processes to Large Data Sets
  - Sparse Gaussian Processes
  - Distributed Gaussian Processes
8-10 Key Concepts in Probability Theory

Two fundamental rules:

- Sum rule / marginalization property: p(x) = ∫ p(x, y) dy
- Product rule: p(x, y) = p(y | x) p(x)

Bayes' theorem (probabilistic inverse), with x the hypothesis and y the measurement:

p(x | y) = p(y | x) p(x) / p(y)

Here p(x | y) is the posterior belief, p(x) the prior belief, p(y | x) the likelihood (measurement model), and p(y) the marginal likelihood (normalization constant).
11-13 The Gaussian Distribution

p(x | µ, Σ) = (2π)^(-D/2) |Σ|^(-1/2) exp(-½ (x - µ)ᵀ Σ⁻¹ (x - µ))

- Mean vector µ: average of the data
- Covariance matrix Σ: spread of the data

[Figures: a 1-D density p(x) and a 2-D data set, each with the mean and 95% confidence bounds]
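Numerically, this density is best evaluated on the log scale through a Cholesky factorization of Σ rather than an explicit inverse and determinant. A minimal NumPy sketch (the function name is illustrative; scipy.stats.multivariate_normal provides the same functionality):

```python
import numpy as np

def gaussian_logpdf(x, mu, Sigma):
    """Log-density of N(mu, Sigma) at x (1-D arrays x, mu), evaluated via
    a Cholesky factorization for numerical stability."""
    D = mu.shape[0]
    L = np.linalg.cholesky(Sigma)            # Sigma = L @ L.T, L lower triangular
    alpha = np.linalg.solve(L, x - mu)       # L^{-1} (x - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log|Sigma| = 2 sum log diag(L)
    # quadratic form (x-mu)^T Sigma^{-1} (x-mu) = ||L^{-1}(x-mu)||^2
    return -0.5 * (D * np.log(2 * np.pi) + logdet + alpha @ alpha)
```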
14-16 Sampling from a Multivariate Gaussian

Objective: Generate a random sample y ~ N(µ, Σ) from a D-dimensional joint Gaussian with covariance matrix Σ and mean vector µ. However, we only have access to a random number generator that can sample x from N(0, I)...

Exploit that affine transformations y = Ax + b of Gaussians remain Gaussian:

- Mean: E_x[Ax + b] = A E_x[x] + b
- Covariance: V_x[Ax + b] = A V_x[x] Aᵀ

1. Find conditions on A, b to match the mean of y
2. Find conditions on A, b to match the covariance of y
17-18 Sampling from a Multivariate Gaussian (2)

Objective: Generate a random sample y ~ N(µ, Σ) from a D-dimensional joint Gaussian with covariance matrix Σ and mean vector µ.

    x = randn(D,1);          % sample x ~ N(0, I)
    y = chol(Sigma)'*x + mu; % scale and shift x

Here chol(Sigma) is the Cholesky factor L, such that LᵀL = Σ (MATLAB's chol returns an upper-triangular factor, hence the transpose). Therefore, the mean and covariance of y are

E[y] = ȳ = E[Lᵀx + µ] = Lᵀ E[x] + µ = µ
Cov[y] = E[(y - ȳ)(y - ȳ)ᵀ] = E[Lᵀ x xᵀ L] = Lᵀ E[x xᵀ] L = LᵀL = Σ
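The same recipe as a NumPy sketch. Note that np.linalg.cholesky returns a lower-triangular L with Σ = LLᵀ, so, unlike MATLAB's upper-triangular chol, no transpose is needed:

```python
import numpy as np

def sample_gaussian(mu, Sigma, num_samples=1, rng=None):
    """Draw samples from N(mu, Sigma) given only a standard-normal
    generator, via the affine transformation y = L x + mu."""
    rng = np.random.default_rng() if rng is None else rng
    D = mu.shape[0]
    L = np.linalg.cholesky(Sigma)              # Sigma = L @ L.T
    x = rng.standard_normal((D, num_samples))  # x ~ N(0, I)
    return (L @ x + mu[:, None]).T             # each row ~ N(mu, Sigma)
```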
19-21 Conditional

Joint:

p(x, y) = N( [µ_x; µ_y], [Σ_xx, Σ_xy; Σ_yx, Σ_yy] )

Conditioning on an observation y gives

p(x | y) = N(µ_{x|y}, Σ_{x|y})
µ_{x|y} = µ_x + Σ_xy Σ_yy⁻¹ (y - µ_y)
Σ_{x|y} = Σ_xx - Σ_xy Σ_yy⁻¹ Σ_yx

The conditional p(x | y) is also Gaussian. Computationally convenient.

[Figure: contours of the joint p(x, y), an observation y, and the resulting conditional p(x | y)]
22 Marginal

p(x) = ∫ p(x, y) dy = N(µ_x, Σ_xx)

The marginal of a joint Gaussian distribution is Gaussian. Intuitively: ignore (integrate out) everything you are not interested in.

[Figure: contours of the joint p(x, y) and the marginal p(x)]
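Conditioning is a direct transcription of the formulas above, and marginalizing needs no computation at all (just read off µ_x and Σ_xx). A sketch, assuming the joint has already been partitioned into blocks; for large blocks one would solve against a Cholesky factor of Σ_yy instead of inverting it:

```python
import numpy as np

def condition_gaussian(mu_x, mu_y, Sxx, Sxy, Syy, y):
    """Mean and covariance of p(x | y) for a joint Gaussian over (x, y)."""
    K = Sxy @ np.linalg.inv(Syy)       # gain Sxy Syy^{-1}
    mu_cond = mu_x + K @ (y - mu_y)    # conditional mean
    S_cond = Sxx - K @ Sxy.T           # Sxx - Sxy Syy^{-1} Syx (Syx = Sxy^T)
    return mu_cond, S_cond
```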
23 Table of Contents

- Introduction
  - Gaussian Distribution
- Gaussian Processes
  - Definition and Derivation
  - Inference
  - Covariance Functions
  - Model Selection
  - Limitations
- Scaling Gaussian Processes to Large Data Sets
  - Sparse Gaussian Processes
  - Distributed Gaussian Processes
24-28 Gaussian Process

Definition: A Gaussian process (GP) is a collection of random variables x_1, x_2, ..., any finite number of which is Gaussian distributed.

Unexpected (?) example: the linear dynamical system

x_{t+1} = A x_t + w,  w ~ N(0, Q)

x_1, x_2, ... is a collection of random variables, any finite number of which is Gaussian distributed. This is a GP. But we will use the Gaussian process for a different purpose, not for time series.
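A quick numerical check of this claim for an assumed scalar system: the empirical covariance of the stacked states (x_1, x_2, x_3) should match the closed-form Gaussian covariance implied by the recursion P_{t+1} = A P_t Aᵀ + Q.

```python
import numpy as np

# Sanity check (assumed scalar parameters A, Q, P1): states of the
# linear dynamical system x_{t+1} = A x_t + w, w ~ N(0, Q), are jointly
# Gaussian, with variances P_{t+1} = A P_t A + Q and, e.g.,
# Cov[x1, x2] = A P1, Cov[x1, x3] = A^2 P1.
rng = np.random.default_rng(0)
A, Q, P1 = 0.9, 0.1, 1.0
n = 200_000                                    # independent rollouts
x1 = rng.normal(0.0, np.sqrt(P1), n)
x2 = A * x1 + rng.normal(0.0, np.sqrt(Q), n)
x3 = A * x2 + rng.normal(0.0, np.sqrt(Q), n)
P2 = A * P1 * A + Q
P3 = A * P2 * A + Q
print(np.cov(np.stack([x1, x2, x3])))          # empirical 3x3 covariance
print(P1, P2, P3, A * P1, A**2 * P1, A * P2)   # closed-form entries
```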
29-31 The Gaussian in the Limit

Consider the joint Gaussian distribution p(x, x̃), where x ∈ ℝ^D and x̃ ∈ ℝ^k, k → ∞. Then

p(x, x̃) = N( [µ_x; µ_x̃], [Σ_xx, Σ_xx̃; Σ_x̃x, Σ_x̃x̃] )

where Σ_x̃x̃ ∈ ℝ^{k×k} and Σ_xx̃ ∈ ℝ^{D×k}, k → ∞.

However, the marginal remains finite,

p(x) = ∫ p(x, x̃) dx̃ = N(µ_x, Σ_xx)

where we integrate out an infinite number of variables x̃_i.
32-36 Marginal and Conditional in the Limit

In practice, we consider finite training and test data x_train, x_test. Then x = {x_train, x_test, x_other}, where x_other plays the role of x̃ from the previous slide:

p(x) = N( [µ_train; µ_test; µ_other],
          [Σ_train,       Σ_train,test,  Σ_train,other;
           Σ_test,train,  Σ_test,        Σ_test,other;
           Σ_other,train, Σ_other,test,  Σ_other] )

Marginal:

p(x_train, x_test) = ∫ p(x_train, x_test, x_other) dx_other

Conditional:

p(x_test | x_train) = N(µ*, Σ*)
µ* = µ_test + Σ_test,train Σ_train⁻¹ (x_train - µ_train)
Σ* = Σ_test - Σ_test,train Σ_train⁻¹ Σ_train,test
37-39 Back to Regression: Distribution over Functions

We are not really interested in a distribution on x, but rather in a distribution over function values f(x). Let's replace x with function values f = f(x), i.e., treat a function as a long vector of function values:

p(f, f̃) = N( [µ_f; µ_f̃], [Σ_ff, Σ_ff̃; Σ_f̃f, Σ_f̃f̃] )

where Σ_f̃f̃ ∈ ℝ^{k×k} and Σ_ff̃ ∈ ℝ^{N×k}, k → ∞.

Again, the marginal remains finite:

p(f) = ∫ p(f, f̃) df̃ = N(µ_f, Σ_ff)
40-44 Marginal and Conditional over Functions

Define f* := f_test and f := f_train.

Marginal:

p(f, f*) = N( [µ_f; µ*], [Σ_ff, Σ_f*; Σ_*f, Σ_**] )

Conditional (predictive distribution):

p(f* | f) = N(m*, S*)
m* = µ* + Σ_*f Σ_ff⁻¹ (f - µ_f)
S* = Σ_** - Σ_*f Σ_ff⁻¹ Σ_f*

We need to compute (cross-)covariances between unknown function values. The kernel trick helps us out.
45-48 Kernelization

p(f, f*) = N( [µ_f; µ*], [Σ_ff, Σ_f*; Σ_*f, Σ_**] )
p(f* | f) = N(m*, S*),  m* = µ* + Σ_*f Σ_ff⁻¹ (f - µ_f),  S* = Σ_** - Σ_*f Σ_ff⁻¹ Σ_f*

A kernel function k is symmetric and positive definite. The kernel computes covariances between unknown function values f(x_i) and f(x_j) by just looking at the corresponding inputs x_i, x_j:

Σ_ff^(i,j) = Cov[f(x_i), f(x_j)] = k(x_i, x_j)

This yields the predictive distribution

p(f* | f, X, X*) = N(m*, S*)
m* = µ* + k(X*, X) k(X, X)⁻¹ (f - µ_f)
S* = K_** - k(X*, X) k(X, X)⁻¹ k(X, X*)
49-55 GP Regression as a Bayesian Inference Problem

Objective: For a set of observations y_i = f(x_i) + ε, with ε ~ N(0, σ_ε²), find a distribution over functions p(f) that explains the data.

Training data: X, y. Bayes' theorem yields

p(f | X, y) = p(y | f, X) p(f) / p(y | X)

- Prior: p(f) = GP(m, k). Specify a mean function m and a kernel k.
- Likelihood (noise model): p(y | f, X) = N(f(X), σ_ε² I)
- Marginal likelihood (evidence): p(y | X) = ∫ p(y | f, X) p(f) df
- Posterior: p(f | y, X) = GP(m_post, k_post) with

m_post(x_i) = m(x_i) + k(X, x_i)ᵀ (K + σ_ε² I)⁻¹ (y - m(X))
k_post(x_i, x_j) = k(x_i, x_j) - k(X, x_i)ᵀ (K + σ_ε² I)⁻¹ k(X, x_j)
56-58 GP Regression as a Bayesian Inference Problem (2)

Posterior Gaussian process:

m_post(x_i) = m(x_i) + k(X, x_i)ᵀ (K + σ_ε² I)⁻¹ (y - m(X))
k_post(x_i, x_j) = k(x_i, x_j) - k(X, x_i)ᵀ (K + σ_ε² I)⁻¹ k(X, x_j)

Predictive distribution p(f* | X, y, x*) at a test input x*:

p(f* | X, y, x*) = N(E[f*], V[f*])
E[f* | X, y, x*] = m_post(x*) = m(x*) + k(X, x*)ᵀ (K + σ_ε² I)⁻¹ (y - m(X))
V[f* | X, y, x*] = k_post(x*, x*) = k(x*, x*) - k(X, x*)ᵀ (K + σ_ε² I)⁻¹ k(X, x*)

From now on: set the prior mean function m ≡ 0.
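These two equations are all that is needed for prediction. A minimal NumPy sketch with zero prior mean, using a Cholesky factorization of K + σ_ε²I instead of an explicit inverse; the kernel(A, B) argument is an assumed helper returning the matrix of pairwise kernel values k(a_i, b_j):

```python
import numpy as np

def gp_predict(X, y, X_star, kernel, noise_var):
    """GP posterior predictive mean and variance at test inputs X_star
    (zero prior mean), implementing the equations above."""
    K = kernel(X, X) + noise_var * np.eye(len(X))        # K + sigma_eps^2 I
    L = np.linalg.cholesky(K)                            # stable factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + s^2 I)^{-1} y
    K_s = kernel(X, X_star)                              # k(X, x_*) for all test inputs
    mean = K_s.T @ alpha                                 # posterior mean
    v = np.linalg.solve(L, K_s)                          # L^{-1} k(X, x_*)
    var = np.diag(kernel(X_star, X_star)) - np.sum(v**2, axis=0)  # posterior variance
    return mean, var
```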
59-60 Illustration: Prior

[Figure: samples from the GP prior over f(x)]

Prior belief about the function. Predictive (marginal) mean and variance:

E[f(x*) | x*] = m(x*) = 0
V[f(x*) | x*] = σ²(x*) = k(x*, x*)

61-69 Illustration: Posterior

[Figures: posterior belief about the function as observations are added one at a time; the mean passes close to the data and the uncertainty shrinks near the observations]

Posterior belief about the function. Predictive (marginal) mean and variance:

E[f(x*) | x*, X, y] = m(x*) = k(X, x*)ᵀ (K + σ_ε² I)⁻¹ y
V[f(x*) | x*, X, y] = σ²(x*) = k(x*, x*) - k(X, x*)ᵀ (K + σ_ε² I)⁻¹ k(X, x*)
70 Covariance Function

A Gaussian process is fully specified by a mean function m and a kernel/covariance function k. The covariance function encodes high-level structural assumptions about the latent function f (e.g., smoothness, differentiability, periodicity).
71 Gaussian Covariance Function

k_Gauss(x_i, x_j) = θ₁² exp( -(x_i - x_j)ᵀ (x_i - x_j) / θ₂² )

- θ₁: amplitude of the latent function
- θ₂: length scale, i.e., how far we have to move in input space before the function value changes significantly. A smoothness parameter.
72 Matérn Covariance Function

k_Mat,3/2(x_i, x_j) = θ₁² (1 + √3 ‖x_i - x_j‖ / θ₂) exp( -√3 ‖x_i - x_j‖ / θ₂ )
73 Periodic Covariance Function

k_per(x_i, x_j) = θ₁² exp( -2 sin²( π (x_i - x_j) / θ₃ ) / θ₂² ) = k_Gauss(u(x_i), u(x_j)),  u(x) = (cos(x), sin(x))ᵀ

- θ₃: periodicity parameter
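For concreteness, here are sketches of the three covariance functions above for scalar inputs, vectorized over all input pairs (the default parameter values are placeholders). Any of them can serve as the kernel argument of the prediction sketch earlier, e.g., kernel=k_gauss.

```python
import numpy as np

def k_gauss(a, b, theta1=1.0, theta2=1.0):
    """Gaussian (squared exponential) kernel matrix for 1-D inputs a, b."""
    d = a[:, None] - b[None, :]                 # pairwise differences
    return theta1**2 * np.exp(-d**2 / theta2**2)

def k_matern32(a, b, theta1=1.0, theta2=1.0):
    """Matern 3/2 kernel matrix for 1-D inputs."""
    s = np.sqrt(3.0) * np.abs(a[:, None] - b[None, :]) / theta2
    return theta1**2 * (1.0 + s) * np.exp(-s)

def k_periodic(a, b, theta1=1.0, theta2=1.0, theta3=2.0 * np.pi):
    """Periodic kernel matrix with period theta3."""
    d = a[:, None] - b[None, :]
    return theta1**2 * np.exp(-2.0 * np.sin(np.pi * d / theta3)**2 / theta2**2)
```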
74 Model Selection

A GP is fully specified by a mean function m and a covariance function (kernel) k. Both functions possess parameters θ := {θ_m, θ_k}.

- How do we find good parameters θ?
- How do we choose m and k?

This is a model selection problem.
75-78 Model Selection I: Length-Scales

Length scales determine how wiggly the function is.

[Figures: the same data set fit with several different length scales]
79-82 Model Selection: Hyper-Parameters

GP training: find good GP hyper-parameters θ (kernel and mean function parameters).

Assign a prior p(θ) over hyper-parameters. Posterior over hyper-parameters:

p(θ | X, y) = p(θ) p(y | X, θ) / p(y | X),  p(y | X, θ) = ∫ p(y | f(X)) p(f | θ) df

Choose the MAP hyper-parameters θ*, such that

θ* ∈ arg max_θ  log p(θ) + log p(y | X, θ)

This amounts to maximizing the marginal likelihood if p(θ) = U (uniform prior).
83-85 Training via Marginal Likelihood Maximization

GP training: maximize the evidence/marginal likelihood (the probability of the data given the hyper-parameters),

θ* ∈ arg max_θ  log p(y | X, θ)
log p(y | X, θ) = -½ yᵀ K_θ⁻¹ y - ½ log |K_θ| + const

This gives an automatic trade-off between data fit and model complexity. Gradient-based optimization is possible:

∂ log p(y | X, θ) / ∂θ = ½ yᵀ K⁻¹ (∂K/∂θ) K⁻¹ y - ½ tr( K⁻¹ ∂K/∂θ )
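A sketch of this objective and the per-parameter gradient in NumPy. The kernel matrix K is assumed to already include the noise term σ_ε²I (the noise variance being treated as one more hyper-parameter), and dK_dtheta is the matrix of element-wise derivatives ∂K/∂θ for a single hyper-parameter:

```python
import numpy as np

def negative_log_marginal_likelihood(y, K):
    """-log p(y | X, theta) for a kernel matrix K that already includes
    the noise variance; the objective handed to the optimizer."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    data_fit = 0.5 * y @ alpha                            # 0.5 y^T K^{-1} y
    complexity = np.sum(np.log(np.diag(L)))               # 0.5 log |K|
    return data_fit + complexity + 0.5 * len(y) * np.log(2 * np.pi)

def nlml_gradient(y, K, dK_dtheta):
    """Gradient of the negative objective for one hyper-parameter,
    following the trace formula above (explicit inverse: fine for a sketch)."""
    Kinv = np.linalg.inv(K)
    alpha = Kinv @ y
    return -0.5 * alpha @ dK_dtheta @ alpha + 0.5 * np.trace(Kinv @ dK_dtheta)
```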
86 Example

[Figure: three fits to the same data, each annotated with its log-marginal-likelihood (LML) value]

Compromise between fitting the data and simplicity of the model. Not fitting the data is explained by a larger noise variance (treated as an additional hyper-parameter and learned jointly).
87-91 Model Selection II: Mean Function and Kernel

Assume we have a finite set of models M_i, each one specifying a mean function m_i and a kernel k_i. How do we find the best one?

Assign a prior p(M_i) to each model. Posterior model probability:

p(M_i | X, y) = p(M_i) p(y | X, M_i) / p(y | X)

Choose the MAP model M*, such that

M* ∈ arg max_M  log p(M) + log p(y | X, M)

This amounts to comparing marginal likelihoods if p(M_i) = 1/|M|. The marginal likelihood is central for model selection.
92-96 Example

[Figures: the same data set modeled with four different kernels (constant, linear, Matérn, Gaussian), each annotated with its LML value]

Four different kernels (mean function fixed to m ≡ 0), MAP hyper-parameters for each kernel, and log-marginal-likelihood values for each (optimized) model.
97 Quick Summary

- Gaussian processes are extremely flexible models
- Consistent confidence bounds!
- Based on simple manipulations of plain Gaussian distributions
- Few hyper-parameters; generally easy to train
- First choice for black-box regression
98 Application Areas

- Reinforcement learning and robotics: model value functions and/or dynamics with GPs
- Bayesian optimization (experimental design): model unknown utility functions with GPs
- Geostatistics: spatial modeling (e.g., landscapes, resources)
- Sensor networks
99 Limitations of Gaussian Processes

Computational and memory complexity:

- Training scales in O(N³)
- Prediction (variances) scales in O(N²)
- Memory requirement: O(ND + N²)

Practical limit: N ≈ 10,000.
100 Table of Contents

- Introduction
  - Gaussian Distribution
- Gaussian Processes
  - Definition and Derivation
  - Inference
  - Covariance Functions
  - Model Selection
  - Limitations
- Scaling Gaussian Processes to Large Data Sets
  - Sparse Gaussian Processes
  - Distributed Gaussian Processes
101-102 GP Factor Graph

[Figure: factor graph with training function values f(x_1), ..., f(x_N) (training data) and test function values f¹, ..., f^L (test data)]

Probabilistic graphical model (factor graph) of a GP: all function values are jointly Gaussian distributed (e.g., training and test function values). GP prior:

p(f, f*) = N( [0; 0], [K_ff, K_f*; K_*f, K_**] )
103-104 Inducing Variables

[Figure: the factor graph extended with inducing function values f(u_1), ..., f(u_M)]

Introduce inducing function values f_u: hypothetical function values. All function values are still jointly Gaussian distributed (e.g., training, test and inducing function values).
105-106 Central Approximation Scheme

Approximation: the training and test sets are conditionally independent given the inducing function values, f ⊥ f* | f_u. Then the effective GP prior is

q(f, f*) = ∫ p(f | f_u) p(f* | f_u) p(f_u) df_u = N( [0; 0], [K_ff, Q_f*; Q_*f, K_**] )

Q_ab := K_{a,f_u} K_{f_u,f_u}⁻¹ K_{f_u,b}
107-109 FI(T)C Sparse Approximation

Assume that the training (and test) sets are fully independent given the inducing variables (Snelson & Ghahramani, 2006). The effective GP prior with this approximation is

q(f, f*) = N( [0; 0], [Q_ff - diag(Q_ff - K_ff), Q_f*; Q_*f, K_**] )

FIC: Q - diag(Q - K) can be used instead of K. Training: O(NM²), prediction: O(M²).
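A sketch of the FI(T)C training covariance in NumPy: the rank-M Nyström part Q_ff computed via a Cholesky factor of K_uu (with an assumed jitter for stability), its diagonal replaced by the exact one, and the noise added. Materializing the full N×N matrix as done here is purely for illustration; an efficient implementation keeps the diagonal-plus-low-rank structure to reach the O(NM²) training cost quoted above.

```python
import numpy as np

def fitc_train_cov(X, U, kernel, noise_var):
    """FITC approximation to the noisy training covariance:
    Q_ff - diag(Q_ff - K_ff) + noise_var * I."""
    K_uu = kernel(U, U) + 1e-8 * np.eye(len(U))  # jitter for stability
    K_uf = kernel(U, X)                          # M x N cross-covariances
    L = np.linalg.cholesky(K_uu)
    V = np.linalg.solve(L, K_uf)                 # chosen so that Q_ff = V^T V
    Q_ff = V.T @ V                               # rank-M Nystrom approximation
    K_diag = np.diag(kernel(X, X))               # exact diagonal of K_ff
    # restore the exact diagonal and add the observation noise:
    return Q_ff + np.diag(K_diag - np.diag(Q_ff) + noise_var)
```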
110-114 Inducing Inputs

The FI(T)C sparse approximation relies on inducing function values f(u_i), where the u_i are the corresponding inputs. These inputs are unknown a priori, so we need to find optimal ones. Find them by maximizing the FI(T)C marginal likelihood with respect to the inducing inputs (and the standard hyper-parameters):

u*_{1:M} ∈ arg max_{u_{1:M}}  q_FITC(y | X, u_{1:M}, θ)

Intuitively: the marginal likelihood is not only parameterized by the hyper-parameters θ, but also by the inducing inputs u_{1:M}. We end up with a high-dimensional non-convex optimization problem with MD additional parameters.
115 FITC Example

[Figure from Ed Snelson: pink, the original data; red crosses, initialization of the inducing inputs; blue crosses, location of the inducing inputs after optimization]

Efficient compression of the original data set.
116-122 Summary: Sparse Gaussian Processes

- Sparse approximations typically approximate a GP with N data points by a model with M ≪ N data points
- Selection of these M data points can be tricky and may involve non-trivial computations (e.g., optimizing inducing inputs)
- Simple (random) subset selection is fast and generally robust (Chalupka et al., 2013)
- Computational complexity: O(M³) or O(NM²) for training
- Practical limit M ≈ 10⁴; often M ∈ O(10²) in the case of inducing variables
- If we set M = N/100, i.e., each inducing function value summarizes 100 real function values, our practical limit is N ∈ O(10⁶)

Let's try something different.
123-128 An Orthogonal Approximation: Distributed GPs

[Figure: the kernel matrix approximated by a block-diagonal matrix]

- Randomly split the full data set into M chunks
- Place M independent GP models (experts) on these small chunks
- Independent computations can be distributed
- Block-diagonal approximation of the kernel matrix K
- Combine the independent computations into an overall result
129-131 Training the Distributed GP

Split a data set of size N into M chunks of size P. Independence of the experts leads to a factorization of the marginal likelihood:

log p(y | X, θ) ≈ Σ_{k=1}^M log p_k(y^(k) | X^(k), θ)

Distributed optimization and training are straightforward.

- Computational complexity: O(MP³) [instead of O(N³)], and distributed over many machines
- Memory footprint: O(MP² + ND) [instead of O(N² + ND)]
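Structurally, the approximate objective is a sum of independent per-chunk terms, so it parallelizes with any map-reduce mechanism. A sketch of that structure, where chunks is an assumed list of (X_k, y_k) pairs and nlml_fn an assumed full-GP negative-log-marginal-likelihood routine for one chunk; in practice the workers could equally be separate machines:

```python
from concurrent.futures import ProcessPoolExecutor

def distributed_nlml(theta, chunks, nlml_fn):
    """Factorized objective for the distributed GP: the sum of the experts'
    negative log marginal likelihoods, all sharing the hyper-parameters theta."""
    with ProcessPoolExecutor() as pool:
        # one independent full-GP objective per chunk, evaluated in parallel
        parts = pool.map(nlml_fn,
                         [theta] * len(chunks),
                         [X for X, _ in chunks],
                         [y for _, y in chunks])
    return sum(parts)   # log p(y|X,theta) ~ sum_k log p_k(y^(k)|X^(k),theta)
```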
132-133 Empirical Training Time

[Figure: computation time of the NLML and its gradients (seconds), which is proportional to training time, as a function of training set size, for the full GP, FITC, and the distributed GP with varying numbers of experts]

Full GP (16K training points) ≈ sparse GP (50K training points) ≈ distributed GP (16M training points). This pushes the practical limit by order(s) of magnitude.
134-136 Practical Training Times

- Training* with N = 10⁶, D = 1 on a laptop: ≈ 30 min
- Training* with N = 5 × 10⁶, D = 8 on a workstation: ≈ 4 hours

*: Maximize the marginal likelihood, stop when converged**
**: Convergence is often reached after a limited number of line searches***
***: One line search requires ≈ 2-3 evaluations of the marginal likelihood and its gradient (usually O(N³))
137-138 Predictions with the Distributed GP

[Figure: a tree in which the expert predictions (µ_1, σ_1), (µ_2, σ_2), (µ_3, σ_3) are combined into an overall prediction (µ, σ)]

The prediction of each GP expert is Gaussian, N(µ_i, σ_i²). How do we combine them into an overall prediction N(µ, σ²)? Product-of-GP-experts models:

- PoE (product of experts) (Ng & Deisenroth, 2014)
- gPoE (generalized product of experts) (Cao & Fleet, 2014)
- BCM (Bayesian committee machine) (Tresp, 2000)
- rBCM (robust BCM) (Deisenroth & Ng, 2015)
139-142 Objectives

[Figure: two computational graphs, a single-level and a two-level tree of experts]

- Scale to large data sets
- Good approximation of the full GP ("ground truth")
- Predictions independent of the computational graph: runs on heterogeneous computing infrastructures (laptop, cluster, ...)
- Reasonable predictive variances
143 Running Example

Investigate various product-of-experts models: the same training procedure, but different mechanisms for predictions.
144-146 Product of GP Experts

Prediction model (independent predictors):

p(f* | x*, D) = Π_{k=1}^M p_k(f* | x*, D^(k)),  p_k(f* | x*, D^(k)) = N(f* | µ_k(x*), σ_k²(x*))

Predictive precision (inverse variance) and mean:

(σ*^poe)⁻² = Σ_k σ_k⁻²(x*)
µ*^poe = (σ*^poe)² Σ_k σ_k⁻²(x*) µ_k(x*)

Independent of the computational graph.
147-148 Product of GP Experts

Unreasonable variances for M > 1:

(σ*^poe)⁻² = Σ_k σ_k⁻²(x*)

The more experts, the more certain the prediction, even if every expert itself is very uncertain. The model cannot fall back to the prior.
149-150 Generalized Product of GP Experts

Weight the responsibility of each expert in the PoE with β_k. Prediction model (independent predictors):

p(f* | x*, D) = Π_{k=1}^M p_k^{β_k}(f* | x*, D^(k)),  p_k(f* | x*, D^(k)) = N(f* | µ_k(x*), σ_k²(x*))
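Combining the experts takes only a few lines once their means and variances at a test input are available. A sketch covering both variants: betas=None recovers the plain PoE (β_k = 1), while passing weights gives the generalized PoE (how to choose the β_k is a separate question; they are assumed given here):

```python
import numpy as np

def combine_poe(means, variances, betas=None):
    """Combine independent Gaussian expert predictions N(mu_k, s_k^2)
    at one test input into a single Gaussian N(mu, s^2)."""
    means, variances = np.asarray(means), np.asarray(variances)
    betas = np.ones_like(means) if betas is None else np.asarray(betas)
    precision = np.sum(betas / variances)           # (sigma*)^{-2}
    var = 1.0 / precision
    mean = var * np.sum(betas * means / variances)  # precision-weighted mean
    return mean, var
```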
More informationVariance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers
Variance Reduction The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output If this variance can be lowered without changing the expected value, fewer replications
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More information1 Teaching notes on GMM 1.
Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in
More informationPrinciples of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n
Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,
More informationLinear Regression. Guy Lebanon
Linear Regression Guy Lebanon Linear Regression Model and Least Squares Estimation Linear regression is probably the most popular model for predicting a RV Y R based on multiple RVs X 1,..., X d R. It
More informationSome probability and statistics
Appendix A Some probability and statistics A Probabilities, random variables and their distribution We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X,Y,
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationProbability for Estimation (review)
Probability for Estimation (review) In general, we want to develop an estimator for systems of the form: x = f x, u + η(t); y = h x + ω(t); ggggg y, ffff x We will primarily focus on discrete time linear
More informationBayesian Statistics in One Hour. Patrick Lam
Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical
More informationGraphical Modeling for Genomic Data
Graphical Modeling for Genomic Data Carel F.W. Peeters cf.peeters@vumc.nl Joint work with: Wessel N. van Wieringen Mark A. van de Wiel Molecular Biostatistics Unit Dept. of Epidemiology & Biostatistics
More informationGaussian Process Training with Input Noise
Gaussian Process Training with Input Noise Andrew McHutchon Department of Engineering Cambridge University Cambridge, CB PZ ajm57@cam.ac.uk Carl Edward Rasmussen Department of Engineering Cambridge University
More informationMaster s Theory Exam Spring 2006
Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem
More informationMachine Learning for Medical Image Analysis. A. Criminisi & the InnerEye team @ MSRC
Machine Learning for Medical Image Analysis A. Criminisi & the InnerEye team @ MSRC Medical image analysis the goal Automatic, semantic analysis and quantification of what observed in medical scans Brain
More informationInner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
More informationD-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationMultivariate Statistical Inference and Applications
Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim
More informationVariations of Statistical Models
38. Statistics 1 38. STATISTICS Revised September 2013 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in high-energy physics. In statistics, we are interested in using a
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationEconometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
More informationClassification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
More informationPenalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood
More informationMath 241, Exam 1 Information.
Math 241, Exam 1 Information. 9/24/12, LC 310, 11:15-12:05. Exam 1 will be based on: Sections 12.1-12.5, 14.1-14.3. The corresponding assigned homework problems (see http://www.math.sc.edu/ boylan/sccourses/241fa12/241.html)
More informationConstrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing
Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance
More informationThe Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem
More informationDirichlet Processes A gentle tutorial
Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
More informationMonte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)
Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February
More informationServer Load Prediction
Server Load Prediction Suthee Chaidaroon (unsuthee@stanford.edu) Joon Yeong Kim (kim64@stanford.edu) Jonghan Seo (jonghan@stanford.edu) Abstract Estimating server load average is one of the methods that
More informationTraffic Driven Analysis of Cellular Data Networks
Traffic Driven Analysis of Cellular Data Networks Samir R. Das Computer Science Department Stony Brook University Joint work with Utpal Paul, Luis Ortiz (Stony Brook U), Milind Buddhikot, Anand Prabhu
More informationA Simple Introduction to Support Vector Machines
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear
More informationA SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS
A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationDepartment of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
More informationCS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
More informationAcknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues
Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the
More informationProbabilistic Latent Semantic Analysis (plsa)
Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationMarkov Chain Monte Carlo Simulation Made Simple
Markov Chain Monte Carlo Simulation Made Simple Alastair Smith Department of Politics New York University April2,2003 1 Markov Chain Monte Carlo (MCMC) simualtion is a powerful technique to perform numerical
More informationProbability and statistics; Rehearsal for pattern recognition
Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationSection 5. Stan for Big Data. Bob Carpenter. Columbia University
Section 5. Stan for Big Data Bob Carpenter Columbia University Part I Overview Scaling and Evaluation data size (bytes) 1e18 1e15 1e12 1e9 1e6 Big Model and Big Data approach state of the art big model
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationProbabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur
Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:
More informationThe Kelly Betting System for Favorable Games.
The Kelly Betting System for Favorable Games. Thomas Ferguson, Statistics Department, UCLA A Simple Example. Suppose that each day you are offered a gamble with probability 2/3 of winning and probability
More information3. The Junction Tree Algorithms
A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )
More information