Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
1 Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure

Belyaev Mikhail (1,2,3), Burnaev Evgeny (1,2,3), Kapushev Yermek (1,2)
1 Institute for Information Transmission Problems, Moscow, Russia
2 DATADVANCE, llc, Moscow, Russia
3 PreMoLab, MIPT, Dolgoprudny, Russia

ICML 2014 workshop on New Learning Frameworks and Models for Big Data, Beijing

Burnaev Evgeny
2 Approximation task

Problem statement. Let $y = g(x)$ be some unknown function. The training sample is $D = \{(x_i, y_i)\}$, $g(x_i) = y_i$, $i = 1, \dots, N$. The task is to construct $\hat f(x)$ such that $\hat f(x) \approx g(x)$.
3 Factorial Design of Experiments

Factors: $s_k = \{x^k_{i_k} \in X_k,\ i_k = 1, \dots, n_k\}$, $X_k \subseteq \mathbb{R}^{d_k}$, $k = 1, \dots, K$, where $d_k$ is the dimensionality of the factor $s_k$.

Factorial Design of Experiments: $S = \{x_i\}_{i=1}^N = s_1 \times s_2 \times \dots \times s_K$.

Dimensionality of $x \in S$: $d = \sum_{k=1}^K d_k$. Sample size: $N = \prod_{k=1}^K n_k$.
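The construction above can be sketched in a few lines: a factorial design is the Cartesian product of the factor sets, with the coordinates of each combination concatenated. The factor values below are hypothetical, chosen only to make the sizes easy to check.

```python
import itertools

# Hypothetical factors: one 1-D factor with 3 levels, one 2-D factor with 2 levels
s1 = [(0.0,), (0.5,), (1.0,)]        # n_1 = 3, d_1 = 1
s2 = [(0.0, 0.0), (1.0, 1.0)]        # n_2 = 2, d_2 = 2

# Factorial design S = s1 x s2: Cartesian product, coordinates concatenated
S = [tuple(c for factor_value in combo for c in factor_value)
     for combo in itertools.product(s1, s2)]

assert len(S) == 3 * 2               # N = n_1 * n_2
assert len(S[0]) == 1 + 2            # d = d_1 + d_2
```

The sample size multiplies across factors while the dimensionality only adds, which is why factorial designs get large so quickly.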
4 Example of Factorial DoE

Figure: Factorial DoE with a multidimensional factor.
5 DoE in engineering problems

1. Independent groups of variables (factors).
2. Training-sample generation procedure: fix a value of the first factor and perform experiments varying the values of the other factors; then fix another value of the first factor and perform a new series of experiments.
3. Take into account knowledge from the subject domain.

Data properties: the data has a special structure, can have a large sample size, and factor sizes can differ significantly.
6 Example of Factorial DoE

Modeling of the pressure distribution over an aircraft wing:
- Angle of attack: $s_1 = \{0, 0.8, 1.6, 2.4, 3.2, 4.0\}$.
- Mach number: $s_2 = \{0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83\}$.
- Wing point coordinates: $s_3$, points in $\mathbb{R}^3$.

The training sample: $S = s_1 \times s_2 \times s_3$. Dimensionality $d = 1 + 1 + 3 = 5$. Sample size $N = 6 \cdot 7 \cdot n_3$.
7 Existing solutions

- Universal techniques. Disadvantages: do not take into account the sample structure; low approximation quality; high computational complexity.
- Multivariate Adaptive Regression Splines [Friedman, 1991]. Disadvantages: discontinuous derivatives, non-physical behaviour.
- Tensor products of splines [Stone et al., 1997, Xiao et al., 2013]. Disadvantages: only one-dimensional factors, no accuracy evaluation procedure.
- Gaussian Processes on a lattice [Dietrich and Newsam, 1997, Stroud et al., 2014]. Disadvantages: two-dimensional grid with equidistant points.
8 The aim is to develop a computationally efficient algorithm that takes into account the special features of factorial Designs of Experiments.
9 Gaussian Process Regression

Function model: $g(x) = f(x) + \varepsilon(x)$, where $f(x)$ is a Gaussian process (GP) and $\varepsilon(x)$ is Gaussian white noise. A GP is fully defined by its mean and covariance function.

The covariance function of $f(x)$:
$$K_f(x, x') = \sigma_f^2 \exp\Big(-\sum_{i=1}^d \theta_i^2 \big(x^{(i)} - x'^{(i)}\big)^2\Big),$$
where $x^{(i)}$ is the $i$-th component of the vector $x$ and $\theta = (\sigma_f^2, \theta_1, \dots, \theta_d)$ are the parameters of the covariance function.

The covariance function of $g(x)$:
$$K_g(x, x') = K_f(x, x') + \sigma_{noise}^2 \delta(x, x'),$$
where $\delta(x, x')$ is the Kronecker delta.
10 Choosing parameters

Maximum Likelihood Estimation. Loglikelihood:
$$\log p(y \mid X, \theta) = -\frac{1}{2} y^T K_g^{-1} y - \frac{1}{2} \log |K_g| - \frac{N}{2} \log 2\pi,$$
where $|K_g|$ is the determinant of the matrix $K_g = \big(K_g(x_i, x_j)\big)_{i,j=1}^N$, $x_i, x_j \in S$. The parameters $\theta$ are chosen as
$$\theta^* = \arg\max_\theta \log p(y \mid X, \theta).$$
11 Final model

Prediction of $g(x)$ at a point $x$:
$$\hat f(x) = k^T K_g^{-1} y, \quad k = \big(K_f(x_1, x), \dots, K_f(x_N, x)\big)^T.$$
Posterior variance:
$$\sigma^2(x) = K_f(x, x) - k^T K_g^{-1} k.$$
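The prediction and posterior-variance formulas above can be sketched directly with dense linear algebra (the $O(N^3)$ baseline that the rest of the talk speeds up). This is a minimal 1-D illustration in numpy; the training function, $\theta = 10$ and $\sigma_{noise} = 10^{-3}$ are arbitrary illustrative choices, not values from the slides.

```python
import numpy as np

def sq_exp(A, B, sigma_f=1.0, theta=10.0):
    # K_f(x, x') = sigma_f^2 * exp(-theta^2 (x - x')^2), 1-D inputs
    return sigma_f**2 * np.exp(-theta**2 * (A[:, None] - B[None, :]) ** 2)

X = np.linspace(0.0, 1.0, 20)          # training inputs
y = np.sin(2 * np.pi * X)              # toy responses

sigma_noise = 1e-3
Kg = sq_exp(X, X) + sigma_noise**2 * np.eye(len(X))   # K_g = K_f + sigma_noise^2 I

x_star = np.array([0.25])
k = sq_exp(X, x_star)[:, 0]            # k_i = K_f(x_i, x*)

f_hat = k @ np.linalg.solve(Kg, y)     # prediction: k^T K_g^{-1} y
var = sq_exp(x_star, x_star)[0, 0] - k @ np.linalg.solve(Kg, k)  # posterior variance
```

With a near-noiseless smooth target the prediction at $x_* = 0.25$ lands close to $\sin(\pi/2) = 1$ and the posterior variance between grid points is small but non-negative.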
12 Gaussian Processes for factorial DoE

Issues:
1. High computational complexity: $O(N^3)$. In case of factorial DoE the sample size $N$ can be very large.
2. Degeneracy as a result of significantly different factor sizes.
14 Estimation of covariance function parameters

Loglikelihood:
$$\log p(y \mid X, \theta) = -\frac{1}{2} y^T K_g^{-1} y - \frac{1}{2} \log |K_g| - \frac{N}{2} \log 2\pi.$$
Derivatives:
$$\frac{\partial}{\partial\theta} \log p(y \mid X, \sigma_f, \sigma_{noise}) = -\frac{1}{2} \operatorname{Tr}\big(K_g^{-1} K'\big) + \frac{1}{2}\, y^T K_g^{-1} K' K_g^{-1} y,$$
where $\theta$ is a parameter of the covariance function (one of $\theta_i$, $i = 1, \dots, d$, $\sigma_{noise}$ or $\sigma_f$) and $K' = \frac{\partial K_g}{\partial \theta}$.
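The derivative formula above can be checked against a finite difference on a small dense example. This is a sketch with a single length-scale parameter $\theta$ of the squared-exponential covariance; the data and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(size=8)                  # toy 1-D inputs
y = rng.normal(size=8)                   # toy outputs
sigma_f, sigma_noise = 1.0, 0.1

def log_likelihood(theta):
    d2 = (X[:, None] - X[None, :]) ** 2
    Kg = sigma_f**2 * np.exp(-theta**2 * d2) + sigma_noise**2 * np.eye(len(X))
    return (-0.5 * y @ np.linalg.solve(Kg, y)
            - 0.5 * np.linalg.slogdet(Kg)[1]
            - 0.5 * len(X) * np.log(2 * np.pi))

theta = 1.3
d2 = (X[:, None] - X[None, :]) ** 2
Kf = sigma_f**2 * np.exp(-theta**2 * d2)
Kg = Kf + sigma_noise**2 * np.eye(len(X))
Kp = Kf * (-2.0 * theta * d2)            # K' = dK_g / dtheta
a = np.linalg.solve(Kg, y)               # K_g^{-1} y

# Analytic gradient: -1/2 Tr(K_g^{-1} K') + 1/2 y^T K_g^{-1} K' K_g^{-1} y
grad = -0.5 * np.trace(np.linalg.solve(Kg, Kp)) + 0.5 * a @ Kp @ a

# Central finite difference for comparison
eps = 1e-6
grad_fd = (log_likelihood(theta + eps) - log_likelihood(theta - eps)) / (2 * eps)
```

The two values agree to finite-difference accuracy, which is the standard sanity check before plugging the gradient into an optimizer.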
15 Tensors and Kronecker product

Definition. A tensor $\mathcal{Y}$ is a $K$-dimensional matrix of size $n_1 \times n_2 \times \dots \times n_K$:
$$\mathcal{Y} = \big\{y_{i_1, i_2, \dots, i_K},\ i_k = 1, \dots, n_k,\ k = 1, \dots, K\big\}.$$

Definition. The Kronecker product of matrices $A$ and $B$ is the block matrix
$$A \otimes B = \begin{pmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{pmatrix}.$$
16 Related operations

The vec operation:
$$\operatorname{vec}(\mathcal{Y}) = [\mathcal{Y}_{1,1,\dots,1},\ \mathcal{Y}_{2,1,\dots,1},\ \dots,\ \mathcal{Y}_{n_1,1,\dots,1},\ \mathcal{Y}_{1,2,\dots,1},\ \dots,\ \mathcal{Y}_{n_1,n_2,\dots,n_K}]^T.$$
Multiplication of a tensor by a matrix along the $k$-th direction, $\mathcal{Z} = \mathcal{Y} \times_k B$:
$$\mathcal{Z}_{i_1,\dots,i_{k-1},j,i_{k+1},\dots,i_K} = \sum_{i_k} \mathcal{Y}_{i_1,\dots,i_k,\dots,i_K}\, B_{i_k j}.$$
Connection between tensors and the Kronecker product:
$$\operatorname{vec}(\mathcal{Y} \times_1 B_1 \cdots \times_K B_K) = (B_1 \otimes \cdots \otimes B_K)\operatorname{vec}(\mathcal{Y}). \tag{1}$$
Computing the left-hand side costs $O\big(N \sum_k n_k\big)$; computing the right-hand side costs $O(N^2)$.
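Identity (1) is easy to check numerically for $K = 2$, as long as the vec ordering and the Kronecker factor order are kept consistent: with numpy's row-major `flatten()`, `np.kron(B1, B2)` acts as $B_1$ along the first tensor direction and $B_2$ along the second, and the mode products reduce to ordinary matrix multiplications. This is a sketch of that consistency check (and of the $O(N(n_1 + n_2))$ vs $O(N^2)$ cost gap), not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 3, 4
Y = rng.normal(size=(n1, n2))
B1 = rng.normal(size=(n1, n1))
B2 = rng.normal(size=(n2, n2))

# Right-hand side of (1): form the full N x N Kronecker matrix, O(N^2) memory/time
rhs = np.kron(B1, B2) @ Y.flatten()

# Left-hand side of (1): multiply along each direction, O(N (n1 + n2))
# For K = 2 the two mode products are just B1 @ Y @ B2^T
lhs = (B1 @ Y @ B2.T).flatten()

assert np.allclose(lhs, rhs)
```

For a Kronecker-structured covariance matrix this is exactly why matrix-vector products with $K = \bigotimes_i K_i$ never require materializing $K$ itself.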
17 Covariance function

Form of the covariance function:
$$K_f(x, y) = \prod_{i=1}^K k_i(x^i, y^i), \quad x^i, y^i \in s_i,$$
where $k_i$ is an arbitrary covariance function for the $i$-th factor. Covariance matrix:
$$K = \bigotimes_{i=1}^K K_i,$$
where $K_i$ is the covariance matrix of the $i$-th factor.
18 Fast computation of the loglikelihood

Proposition. Let $K_i = U_i D_i U_i^T$ be a Singular Value Decomposition (SVD) of the matrix $K_i$, where $U_i$ is orthogonal and $D_i$ is diagonal. Then:
$$|K_g| = \prod_{i_1,\dots,i_K} \mathcal{D}_{i_1,\dots,i_K},$$
$$K_g^{-1} = \Big(\bigotimes_k U_k\Big)\Big(\bigotimes_k D_k + \sigma_{noise}^2 I\Big)^{-1}\Big(\bigotimes_k U_k^T\Big), \tag{2}$$
$$K_g^{-1} y = \operatorname{vec}\Big[\big((\mathcal{Y} \times_1 U_1 \cdots \times_K U_K) \oslash \mathcal{D}\big) \times_1 U_1^T \cdots \times_K U_K^T\Big],$$
where $\oslash$ denotes element-wise division and $\mathcal{D}$ is the tensor of diagonal elements of the matrix $\sigma_{noise}^2 I + \bigotimes_k D_k$.
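The proposition can be verified on a small $K = 2$ example: decompose each factor matrix once, combine the per-factor eigenvalues into the tensor $\mathcal{D}$, and compare against a dense $O(N^3)$ computation. The sketch below uses `np.linalg.eigh` for the per-factor decomposition (for symmetric positive semi-definite covariance matrices the eigendecomposition and the SVD coincide); the factor sizes and squared-exponential kernels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def sq_exp(x):
    return np.exp(-(x[:, None] - x[None, :]) ** 2)

K1, K2 = sq_exp(rng.uniform(size=5)), sq_exp(rng.uniform(size=7))
sigma2 = 0.1                              # noise variance sigma_noise^2
Y = rng.normal(size=(5, 7))               # outputs arranged as a tensor
y = Y.flatten()                           # row-major vec

# Per-factor decompositions K_i = U_i diag(d_i) U_i^T
d1, U1 = np.linalg.eigh(K1)
d2, U2 = np.linalg.eigh(K2)

D = np.outer(d1, d2) + sigma2             # tensor of eigenvalues of K_g

# log|K_g| is the sum of log-eigenvalues
logdet_fast = np.log(D).sum()

# K_g^{-1} y: rotate into the eigenbasis, divide element-wise by D, rotate back
A = U1.T @ Y @ U2
Kg_inv_y = (U1 @ (A / D) @ U2.T).flatten()

# Dense O(N^3) reference
Kg = np.kron(K1, K2) + sigma2 * np.eye(35)
assert np.isclose(logdet_fast, np.linalg.slogdet(Kg)[1])
assert np.allclose(Kg_inv_y, np.linalg.solve(Kg, y))
```

Only the small $n_i \times n_i$ factor matrices are ever decomposed; the $N \times N$ matrix $K_g$ is built here purely as a correctness reference.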
19 Computational complexity

Proposition. Calculation of the loglikelihood using (2) has computational complexity
$$O\Big(N \sum_{i=1}^K n_i + \sum_{i=1}^K n_i^3\Big).$$
Assuming $n_i \approx \sqrt[K]{N}$ (the number of factors is large and their sizes are close) we get
$$O\Big(N \sum_i n_i\Big) = O\big(N^{1 + \frac{1}{K}}\big).$$
20 Fast computation of derivatives

Proposition. For a parameter $\theta$ of the $i$-th factor the following statements hold:
$$\operatorname{Tr}\big(K_g^{-1} K'\big) = \Big\langle \operatorname{diag}\big(\hat{\mathcal{D}}^{-1}\big),\ \operatorname{diag}(D_1) \otimes \cdots \otimes \operatorname{diag}\Big(U_i^T \frac{\partial K_i}{\partial \theta} U_i\Big) \otimes \cdots \otimes \operatorname{diag}(D_K) \Big\rangle,$$
$$\frac{1}{2}\, y^T K_g^{-1} K' K_g^{-1} y = \frac{1}{2} \Big\langle \mathcal{A},\ \mathcal{A} \times_1 K_1^T \cdots \times_{i-1} K_{i-1}^T \times_i \Big(\frac{\partial K_i}{\partial \theta}\Big)^T \times_{i+1} K_{i+1}^T \cdots \times_K K_K^T \Big\rangle,$$
where $\hat{\mathcal{D}} = \sigma_{noise}^2 I + \bigotimes_k D_k$ and $\operatorname{vec}(\mathcal{A}) = K_g^{-1} y$. The computational complexity is
$$O\Big(N \sum_{i=1}^K n_i + \sum_{i=1}^K n_i^3\Big).$$
21 Gaussian Processes for factorial DoE

Issues:
1. High computational complexity: $O(N^3)$. In case of factorial DoE the sample size $N$ can be very large.
2. Degeneracy as a result of significantly different factor sizes.
23 Degeneracy

Example of degeneracy.
Figure: Original function (true function and training set).
Figure: Approximation obtained using GP from the GPML toolbox (GP regression and training set).
24 Regularization

Prior distribution:
$$\frac{\theta_k^{(i)} - a_k^{(i)}}{b_k^{(i)} - a_k^{(i)}} \sim \mathrm{Be}(\alpha, \beta), \quad i = 1, \dots, d_k,\ k = 1, \dots, K,$$
$$a_k^{(i)} = \frac{c_k}{\max_{x,y \in s_k} \big|x^{(i)} - y^{(i)}\big|}, \qquad b_k^{(i)} = \frac{C_k}{\min_{x,y \in s_k,\, x \neq y} \big|x^{(i)} - y^{(i)}\big|},$$
where $\mathrm{Be}(\alpha, \beta)$ is the Beta distribution, and $c_k$ and $C_k$ are parameters of the algorithm (we use $c_k = 0.01$ and $C_k = 2$).

Initialization:
$$\theta_k^{(i)} = \Big[\frac{1}{n_k}\Big(\max_{x \in s_k} x^{(i)} - \min_{x \in s_k} x^{(i)}\Big)\Big]^{-1}.$$
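The prior turns into an additive log-density penalty on each rescaled parameter $t = (\theta - a)/(b - a) \in (0, 1)$. A minimal sketch of that penalty term, using only the standard library ($\alpha = \beta = 2$ is an assumed illustrative choice; the slides leave $\alpha, \beta$ unspecified):

```python
import math

def log_beta_prior(theta, a, b, alpha=2.0, beta=2.0):
    # log-density of Beta(alpha, beta) evaluated at t = (theta - a) / (b - a);
    # returns -inf outside (a, b), which keeps theta inside its bounds
    t = (theta - a) / (b - a)
    if not 0.0 < t < 1.0:
        return -math.inf
    log_B = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1) * math.log(t) + (beta - 1) * math.log(1 - t) - log_B
```

Summing this term over all $\theta_k^{(i)}$ and adding it to the loglikelihood penalizes parameters drifting toward the degenerate extremes of their feasible range.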
25 Regularization

Loglikelihood with the penalty:
$$\log p(y \mid X, \theta, \sigma_f, \sigma_{noise}) = -\frac{1}{2} y^T K_g^{-1} y - \frac{1}{2} \log |K_g| - \frac{N}{2} \log 2\pi$$
$$+ \sum_{k,i} \Big[(\alpha - 1) \log \frac{\theta_k^{(i)} - a_k^{(i)}}{b_k^{(i)} - a_k^{(i)}} + (\beta - 1) \log\Big(1 - \frac{\theta_k^{(i)} - a_k^{(i)}}{b_k^{(i)} - a_k^{(i)}}\Big)\Big] - d \log B(\alpha, \beta).$$
26 Regularization

Figure: Penalty function (log pdf of the prior).
27 Regularization

Example of a regularized approximation.
Figure: Original function (true function and training set).
Figure: Approximation obtained using the developed algorithm with regularization (tensorGP-reg and training set).
28 Toy problems

A set of 34 functions (artificial test functions commonly used for testing global optimization algorithms); dimensionality from 2 to 6; sample sizes from 80 up to … with full factorial DoE.

Quality criteria: training time and approximation error,
$$\mathrm{MSE} = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} \big(\hat f(x_i) - g(x_i)\big)^2.$$

Tested algorithms:
- MARS: Multivariate Adaptive Regression Splines;
- SparseGP: sparse Gaussian Processes (GPML toolbox);
- tensorGP: the developed algorithm without regularization;
- tensorGP-reg: the developed algorithm with regularization.
29 Dolan-Moré curves

$T$ problems, $A$ algorithms. $e_{ta}$ is the approximation error (or training time) of the $a$-th algorithm on the $t$-th problem, and $\tilde e_t = \min_a e_{ta}$. Then
$$\rho_a(\tau) = \frac{\#\{t : e_{ta} \leq \tau \tilde e_t\}}{T}.$$
The higher the curve, the better the corresponding algorithm works; $\rho_a(1)$ is the fraction of problems on which the $a$-th algorithm worked best.
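Computing such a performance profile from an error matrix takes only a few lines. This is a sketch (not the authors' evaluation code); it counts ties with the best as "best", so $\rho_a(1)$ is the fraction of problems where algorithm $a$ is at least tied for the lowest error.

```python
import numpy as np

def dolan_more(errors, taus):
    # errors: (T problems x A algorithms) matrix of positive errors or times
    e_tilde = errors.min(axis=1)                 # best result per problem
    # rho_a(tau): fraction of problems where algorithm a is within factor tau of best
    return np.array([[np.mean(errors[:, a] <= tau * e_tilde)
                      for a in range(errors.shape[1])] for tau in taus])

# Hypothetical errors for 3 problems and 2 algorithms
errors = np.array([[1.0, 2.0],
                   [3.0, 1.5],
                   [2.0, 2.0]])
rho = dolan_more(errors, taus=[1.0, 2.0])
```

Plotting $\rho_a(\tau)$ against $\tau$ (one curve per algorithm) gives the Dolan-Moré plot: curves that rise faster correspond to algorithms that are closer to the per-problem best more often.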
30 Results of experiments: training time

Figure: Dolan-Moré curves; quality criterion: training time.
31 Results of experiments: approximation quality

Figure: Dolan-Moré curves; quality criterion: MSE.
32 Real problem: rotating disc of an impeller

Objective functions: $p_1$ — contact pressure; $S_{rmax}$ — maximum radial stress; $w$ — weight of the disc. The geometrical shape of the disc is parametrized by 6 input variables $x = (h_1, h_2, h_3, h_4, r_2, r_3)$ ($r_1$ and $r_4$ are fixed).

Training sample (generated from a computational physical model): factor sizes $\{1, 8, 8, 3, 15, 5\}$.

Figure: Rotating disc of an impeller.
33 Results of experiment

Table: Approximation errors of $p_1$ for MARS, SparseGP and tensorGP-reg, measured by MAE, MSE and RRMS, where
$$\mathrm{MAE} = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} \big|\hat f(x_i) - g(x_i)\big|, \qquad \mathrm{MSE} = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} \big(\hat f(x_i) - g(x_i)\big)^2,$$
$$\mathrm{RRMS} = \sqrt{\frac{\sum_{i=1}^{N_{test}} \big(\hat f(x_i) - g(x_i)\big)^2}{\sum_{i=1}^{N_{test}} \big(\bar y - g(x_i)\big)^2}}, \qquad \bar y = \frac{1}{N} \sum_{i=1}^N y_i.$$
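The three error metrics above are straightforward to implement; a minimal numpy sketch:

```python
import numpy as np

def mae(f, g):
    return np.mean(np.abs(f - g))

def mse(f, g):
    return np.mean((f - g) ** 2)

def rrms(f, g):
    # squared error normalized by the deviation of g from its mean:
    # RRMS < 1 means the model beats the constant mean predictor
    return np.sqrt(np.sum((f - g) ** 2) / np.sum((g.mean() - g) ** 2))
```

RRMS is the most interpretable of the three for model comparison, since it is scale-free: a value near 0 is an accurate surrogate, while a value near 1 is no better than predicting the average output.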
34 Results of approximation

2D slices of the approximations (other parameters are fixed).
Figure: GPML SparseGP applied.
Figure: Approximation obtained using the developed algorithm with regularization (tensorGP-reg, training and test sets).
35 Further research: missing data

Reasons for missing points:
- Data generation is in progress (each point is time-consuming to compute).
- The data generator failed at some points.
36 Incomplete factorial DoE

$S_{full}$ — full factorial DoE, $N_{full} = |S_{full}|$. $S$ — incomplete factorial DoE, $N = |S|$, $S \subset S_{full}$.

The covariance matrix is no longer a Kronecker product:
$$K_f \neq \bigotimes_{i=1}^K K_i.$$
37 Computation of the loglikelihood

Notation: $\{x_i\}_{i=1}^{N_{full}} = S_{full}$; $\tilde K_f$ — the covariance matrix for the full factorial DoE $S_{full}$; $W$ — the diagonal matrix with
$$W_{ii} = \begin{cases} 1, & x_i \in S, \\ 0, & x_i \notin S; \end{cases}$$
$\tilde y$ — the vector of outputs $y$ extended by arbitrary values at the missing points.

Proposition. Let $\tilde z$ be a solution of
$$\big(\tilde K_f W \tilde K_f + \sigma_{noise}^2 \tilde K_f\big) \tilde z = \tilde K_f W \tilde y. \tag{3}$$
Then the solution $z$ of $(K_f + \sigma_{noise}^2 I) z = y$, i.e. $z = (K_f + \sigma_{noise}^2 I)^{-1} y$, has the form $z = (\tilde z_{i_1}, \dots, \tilde z_{i_N})$, where $\{i_k\}_{k=1}^N = \{j : x_j \in S\}$.
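The proposition can be checked on a small dense example: build a Kronecker-structured covariance for a full grid, drop some points, solve the extended system (3), and compare its restriction to the observed indices against a direct solve on the observed sub-matrix. Grid sizes, kernels and the fill values for missing outputs are all hypothetical; the dense solves stand in for the structured solver the slides describe.

```python
import numpy as np

rng = np.random.default_rng(4)

def sq_exp(x):
    return np.exp(-(x[:, None] - x[None, :]) ** 2)

# Full factorial grid with 4 x 3 = 12 points; N = 9 of them are observed
K_full = np.kron(sq_exp(rng.uniform(size=4)), sq_exp(rng.uniform(size=3)))
N_full, sigma2 = 12, 0.01
obs = np.sort(rng.choice(N_full, size=9, replace=False))
W = np.zeros((N_full, N_full))
W[obs, obs] = 1.0                      # W_ii = 1 iff point i is observed

y_obs = rng.normal(size=9)
y_ext = rng.normal(size=N_full)        # arbitrary values at missing points
y_ext[obs] = y_obs

# Extended system (3): (K~ W K~ + sigma^2 K~) z~ = K~ W y~
z_ext = np.linalg.solve(K_full @ W @ K_full + sigma2 * K_full,
                        K_full @ W @ y_ext)

# Direct solve on the observed sub-matrix
K_obs = K_full[np.ix_(obs, obs)]
z_direct = np.linalg.solve(K_obs + sigma2 * np.eye(9), y_obs)

assert np.allclose(z_ext[obs], z_direct)
```

Because $W$ zeroes out the rows of the missing points, the arbitrary fill values in $\tilde y$ never influence the restricted solution, which is what makes the extended, Kronecker-friendly system usable.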
38 Computation of the loglikelihood

$R = N_{full} - N$ — the number of missing points; $\tilde U = \bigotimes_{i=1}^K U_i$, $\tilde D = \bigotimes_{i=1}^K D_i$, $\hat{\tilde D} = \tilde D + \sigma_{noise}^2 I$. The change of variables:
$$\alpha = \begin{cases} \big(\hat{\tilde D}\, \tilde D\big)^{1/2}\, \tilde U^T \tilde z, & R < N, \\ \sigma_{noise}^{-2}\, \tilde D^{1/2}\, \tilde U^T \tilde z, & R \geq N. \end{cases} \tag{4}$$

Proposition. The system of linear equations (3) can be solved using the change of variables (4) and the Conjugate Gradient method in at most $\min(R + 1, N + 1)$ iterations. The computational complexity of each iteration is $O\big(N_{full} \sum_k n_k\big)$.
39 Computation of the determinant

Write
$$\tilde K_f + \sigma_{noise}^2 I = \begin{pmatrix} K_g & A \\ A^T & B \end{pmatrix},$$
where $K_g$ is the block corresponding to the observed points. Then
$$\big|\tilde K_f + \sigma_{noise}^2 I\big| = |K_g| \cdot \big|B - A^T K_g^{-1} A\big|. \tag{5}$$

Proposition. $\big|\tilde K_f + \sigma_{noise}^2 I\big|$ is computed using the formulae for the full factorial case; $B - A^T K_g^{-1} A$ is computed numerically. The complexity of computing the determinant using (5) is $O\big(\min\{R + 1, N + 1\}\, R\, N_{full} \sum_k n_k\big)$.
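The block-determinant identity (5) rearranges to $\log|K_g| = \log|\tilde K_f + \sigma_{noise}^2 I| - \log|B - A^T K_g^{-1} A|$, which a small dense check confirms. The grid and kernels below are hypothetical, and for simplicity the observed points are taken to be the first block of the ordering.

```python
import numpy as np

rng = np.random.default_rng(5)

def sq_exp(x):
    return np.exp(-(x[:, None] - x[None, :]) ** 2)

K_full = np.kron(sq_exp(rng.uniform(size=4)), sq_exp(rng.uniform(size=3)))
N_full, sigma2 = 12, 0.01
obs = np.arange(9)                     # observed points first in the ordering
mis = np.arange(9, 12)                 # R = 3 missing points last

M = K_full + sigma2 * np.eye(N_full)   # full-grid matrix, Kronecker-friendly
Kg = M[np.ix_(obs, obs)]               # covariance block of the observed points
A = M[np.ix_(obs, mis)]
B = M[np.ix_(mis, mis)]

# |M| = |K_g| * |B - A^T K_g^{-1} A|  =>  log|K_g| = log|M| - log|Schur complement|
schur = B - A.T @ np.linalg.solve(Kg, A)
logdet_Kg = np.linalg.slogdet(M)[1] - np.linalg.slogdet(schur)[1]

assert np.isclose(logdet_Kg, np.linalg.slogdet(Kg)[1])
```

The point of the rearrangement is that $\log|M|$ has the cheap full-factorial formula from slide 18, while the Schur complement is only $R \times R$.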
40 Conclusion

The developed algorithm is computationally efficient; can handle large samples; takes into account the special features of the given data; and is shown to be efficient on a large set of toy problems as well as on real-world problems.
41 Thank you for your attention!

More details are given in: Belyaev, M., Burnaev, E., and Kapushev, Y. (2014). Exact inference for Gaussian process regression in case of big data with the Cartesian product structure. arXiv preprint.
42 References

Dietrich, C. R. and Newsam, G. N. (1997). Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix. SIAM J. Sci. Comput., 18(4).

Friedman, J. (1991). Multivariate adaptive regression splines. Annals of Statistics, 19(1).

Stone, C., Hansen, M., Kooperberg, C., and Truong, Y. (1997). Polynomial splines and their tensor products in extended linear modeling. Annals of Statistics, 25.

Stroud, J. R., Stein, M. L., and Lysen, S. (2014). Bayesian and maximum likelihood estimation for Gaussian processes on an incomplete lattice. arXiv preprint.

Xiao, L., Li, Y., and Ruppert, D. (2013). Fast bivariate P-splines: the sandwich smoother. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3).
Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationAn explicit link between Gaussian fields and Gaussian Markov random fields; the stochastic partial differential equation approach
Intro B, W, M, & R SPDE/GMRF Example End An explicit link between Gaussian fields and Gaussian Markov random fields; the stochastic partial differential equation approach Finn Lindgren 1 Håvard Rue 1 Johan
More informationLecture 6: Logistic Regression
Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,
More informationExtrinsic geometric flows
On joint work with Vladimir Rovenski from Haifa Paweł Walczak Uniwersytet Łódzki CRM, Bellaterra, July 16, 2010 Setting Throughout this talk: (M, F, g 0 ) is a (compact, complete, any) foliated, Riemannian
More informationTime Series Analysis
Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:
More informationDesigning a learning system
Lecture Designing a learning system Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x4-8845 http://.cs.pitt.edu/~milos/courses/cs750/ Design of a learning system (first vie) Application or Testing
More informationRegression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013
Lecture 6: Regression Analysis MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Regression Analysis 1 Outline Regression Analysis 1 Regression Analysis MIT 18.S096 Regression Analysis 2 Multiple Linear
More informationMath 241, Exam 1 Information.
Math 241, Exam 1 Information. 9/24/12, LC 310, 11:15-12:05. Exam 1 will be based on: Sections 12.1-12.5, 14.1-14.3. The corresponding assigned homework problems (see http://www.math.sc.edu/ boylan/sccourses/241fa12/241.html)
More informationTutorial on Markov Chain Monte Carlo
Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,
More informationDetection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
More information1 VECTOR SPACES AND SUBSPACES
1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationRecall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n.
ORTHOGONAL MATRICES Informally, an orthogonal n n matrix is the n-dimensional analogue of the rotation matrices R θ in R 2. When does a linear transformation of R 3 (or R n ) deserve to be called a rotation?
More informationMAT188H1S Lec0101 Burbulla
Winter 206 Linear Transformations A linear transformation T : R m R n is a function that takes vectors in R m to vectors in R n such that and T (u + v) T (u) + T (v) T (k v) k T (v), for all vectors u
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationSmoothing and Non-Parametric Regression
Smoothing and Non-Parametric Regression Germán Rodríguez grodri@princeton.edu Spring, 2001 Objective: to estimate the effects of covariates X on a response y nonparametrically, letting the data suggest
More informationA General Approach to Variance Estimation under Imputation for Missing Survey Data
A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey
More informationIntroduction to the Finite Element Method
Introduction to the Finite Element Method 09.06.2009 Outline Motivation Partial Differential Equations (PDEs) Finite Difference Method (FDM) Finite Element Method (FEM) References Motivation Figure: cross
More informationBayesian Image Super-Resolution
Bayesian Image Super-Resolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian
More informationAnalysis of Bayesian Dynamic Linear Models
Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main
More informationHowHow To Choose A Good Stock Broker
AN EMPIRICAL LIELIHOOD METHOD FOR DATA AIDED CHANNEL IDENTIFICATION IN UNNOWN NOISE FIELD Frédéric Pascal 1, Jean-Pierre Barbot 1, Hugo Harari-ermadec, Ricardo Suyama 3, and Pascal Larzabal 1 1 SATIE,
More informationClassification by Pairwise Coupling
Classification by Pairwise Coupling TREVOR HASTIE * Stanford University and ROBERT TIBSHIRANI t University of Toronto Abstract We discuss a strategy for polychotomous classification that involves estimating
More informationDerivative Free Optimization
Department of Mathematics Derivative Free Optimization M.J.D. Powell LiTH-MAT-R--2014/02--SE Department of Mathematics Linköping University S-581 83 Linköping, Sweden. Three lectures 1 on Derivative Free
More informationO(n 3 ) O(4 d nlog 2d n) O(n) O(4 d log 2d n)
O(n 3 )O(4 d nlog 2d n) O(n)O(4 d log 2d n) O(n 3 ) O(n) O(n 3 ) O(4 d nlog 2d n) O(n)O(4 d log 2d n) 1. Fast Gaussian Filtering Algorithm 2. Binary Classification Using The Fast Gaussian Filtering Algorithm
More informationComplex Numbers. w = f(z) z. Examples
omple Numbers Geometrical Transformations in the omple Plane For functions of a real variable such as f( sin, g( 2 +2 etc ou are used to illustrating these geometricall, usuall on a cartesian graph. If
More information