Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure


1 Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure. Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2. 1 Institute for Information Transmission Problems, Moscow, Russia; 2 DATADVANCE, llc, Moscow, Russia; 3 PreMoLab, MIPT, Dolgoprudny, Russia. ICML 2014 workshop on New Learning Frameworks and Models for Big Data, Beijing.

2 Approximation task. Problem statement: let $y = g(x)$ be some unknown function. The training sample is given, $D = \{x_i, y_i\}$, $g(x_i) = y_i$, $i = 1, \dots, N$. The task is to construct $\hat{f}(x)$ such that $\hat{f}(x) \approx g(x)$.

3 Factorial Design of Experiments. Factors: $s_k = \{x_{i_k}^k \in X_k,\; i_k = 1, \dots, n_k\}$, $X_k \subseteq \mathbb{R}^{d_k}$, $k = 1, \dots, K$; $d_k$ is the dimensionality of the factor $s_k$. Factorial Design of Experiments: $S = \{x_i\}_{i=1}^N = s_1 \times s_2 \times \cdots \times s_K$. Dimensionality of $x \in S$: $d = \sum_{k=1}^K d_k$. Sample size: $N = \prod_{k=1}^K n_k$.
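To make the Cartesian product structure concrete, here is a minimal sketch (in Python, not from the slides) that builds a factorial design by taking every combination of factor values; the factor values and sizes are illustrative.

```python
import itertools
import numpy as np

# Two one-dimensional factors and one two-dimensional factor (illustrative values).
s1 = np.linspace(0.0, 4.0, 6).reshape(-1, 1)     # n1 = 6, d1 = 1
s2 = np.linspace(0.77, 0.83, 7).reshape(-1, 1)   # n2 = 7, d2 = 1
s3 = np.random.rand(5, 2)                        # n3 = 5, d3 = 2

# Full factorial DoE: concatenate one row from each factor, for every combination.
S = np.array([np.concatenate(rows) for rows in itertools.product(s1, s2, s3)])

assert S.shape == (6 * 7 * 5, 1 + 1 + 2)         # N = prod(n_k), d = sum(d_k)
```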

4 Example of Factorial DoE. Figure: Factorial DoE with a multidimensional factor.

5 DoE in engineering problems. 1. Independent groups of variables are the factors. 2. Training sample generation procedure: fix a value of the first factor; perform experiments varying the values of the other factors; fix another value of the first factor; perform a new series of experiments. 3. Take into account knowledge from the subject domain. Data properties: the data has a special structure, it can have a large sample size, and factor sizes can differ significantly.

6 Example of Factorial DoE. Modeling of the pressure distribution over an aircraft wing. Angle of attack: $s_1 = \{0, 0.8, 1.6, 2.4, 3.2, 4.0\}$. Mach number: $s_2 = \{0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83\}$. Wing point coordinates: $s_3$, points in $\mathbb{R}^3$. The training sample: $S = s_1 \times s_2 \times s_3$. Dimensionality $d = 1 + 1 + 3 = 5$. Sample size $N = 6 \cdot 7 \cdot n_3$, where $n_3$ is the number of wing points.

7 Existing solutions. Universal techniques: they don't take the sample structure into account, which leads to low approximation quality and high computational complexity. Multivariate Adaptive Regression Splines [Friedman, 1991]: discontinuous derivatives, non-physical behaviour. Tensor products of splines [Stone et al., 1997, Xiao et al., 2013]: only one-dimensional factors, no accuracy evaluation procedure. Gaussian Processes on a lattice [Dietrich and Newsam, 1997, Stroud et al., 2014]: restricted to a two-dimensional grid with equidistant points.

8 The aim is to develop a computationally efficient algorithm that takes into account the special features of factorial Designs of Experiments.

9 Gaussian Process Regression. Function model: $g(x) = f(x) + \varepsilon(x)$, where $f(x)$ is a Gaussian process (GP) and $\varepsilon(x)$ is Gaussian white noise. A GP is fully defined by its mean and covariance function. The covariance function of $f(x)$: $K_f(x, x') = \sigma_f^2 \exp\left(-\sum_{i=1}^d \theta_i^2 \,(x^{(i)} - x'^{(i)})^2\right)$, where $x^{(i)}$ is the $i$-th component of the vector and $\theta = (\sigma_f^2, \theta_1, \dots, \theta_d)$ are the parameters of the covariance function. The covariance function of $g(x)$: $K_g(x, x') = K_f(x, x') + \sigma_{noise}^2 \,\delta(x, x')$, where $\delta(x, x')$ is the Kronecker delta.
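A direct transcription of this covariance into code (a sketch, assuming numpy; all parameter values are up to the user):

```python
import numpy as np

def cov_f(X, Y, sigma_f, theta):
    """Squared-exponential kernel: K_f(x,x') = sigma_f^2 exp(-sum_i theta_i^2 (x_i - x'_i)^2)."""
    diff = X[:, None, :] - Y[None, :, :]              # pairwise coordinate differences
    return sigma_f ** 2 * np.exp(-np.sum((theta * diff) ** 2, axis=-1))

def cov_g(X, sigma_f, theta, sigma_noise):
    """Covariance of the noisy process g: K_f plus white noise on the diagonal."""
    return cov_f(X, X, sigma_f, theta) + sigma_noise ** 2 * np.eye(X.shape[0])
```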

10 Choosing parameters: Maximum Likelihood Estimation. Loglikelihood: $\log p(y \mid X, \theta) = -\frac{1}{2} y^T K_g^{-1} y - \frac{1}{2} \log |K_g| - \frac{N}{2} \log 2\pi$, where $|K_g|$ is the determinant of the matrix $K_g = \left(K_g(x_i, x_j)\right)_{i,j=1}^N$, $x_i, x_j \in S$. The parameters $\theta$ are chosen such that $\theta = \arg\max_\theta \log p(y \mid X, \theta)$.
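For reference, the straightforward $O(N^3)$ evaluation of this loglikelihood via a Cholesky factorization looks as follows (a sketch, reusing cov_g from the snippet above):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_likelihood(y, X, sigma_f, theta, sigma_noise):
    Kg = cov_g(X, sigma_f, theta, sigma_noise)
    c, low = cho_factor(Kg, lower=True)            # O(N^3) factorization
    alpha = cho_solve((c, low), y)                 # K_g^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(c)))      # log |K_g| from the Cholesky factor
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * len(y) * np.log(2.0 * np.pi)
```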

11 Final model. Prediction of $g(x)$ at a point $x$: $\hat{f}(x) = k^T K_g^{-1} y$, where $k = (K_f(x_1, x), \dots, K_f(x_N, x))$. Posterior variance: $\sigma^2(x) = K_f(x, x) - k^T K_g^{-1} k$.
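In code, prediction reduces to one more solve against the same factorization (a sketch; cov_f, cov_g and the scipy imports as in the loglikelihood sketch above, and $K_f(x, x) = \sigma_f^2$ for this kernel):

```python
def predict(X_new, X, y, sigma_f, theta, sigma_noise):
    Kg = cov_g(X, sigma_f, theta, sigma_noise)
    k = cov_f(X, X_new, sigma_f, theta)            # N x M cross-covariance matrix
    c, low = cho_factor(Kg, lower=True)
    mean = k.T @ cho_solve((c, low), y)            # k^T K_g^{-1} y
    var = sigma_f ** 2 - np.sum(k * cho_solve((c, low), k), axis=0)
    return mean, var
```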

12 Gaussian Processes for factorial DoE. Issues: 1. High computational complexity, $O(N^3)$; in the case of a factorial DoE the sample size $N$ can be very large. 2. Degeneracy as a result of significantly different factor sizes.


14 Estimation of covariance function parameters. Loglikelihood: $\log p(y \mid X, \theta) = -\frac{1}{2} y^T K_g^{-1} y - \frac{1}{2} \log |K_g| - \frac{N}{2} \log 2\pi$. Derivatives: $\frac{\partial}{\partial \theta} \log p(y \mid X, \sigma_f, \sigma_{noise}) = -\frac{1}{2} \mathrm{Tr}\left(K_g^{-1} K'\right) + \frac{1}{2}\, y^T K_g^{-1} K' K_g^{-1} y$, where $\theta$ is a parameter of the covariance function (a component of $\theta_i$, $\sigma_{noise}$ or $\sigma_{f,i}$, $i = 1, \dots, d$) and $K' = \frac{\partial K_g}{\partial \theta}$.
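The direct $O(N^3)$ form of this gradient, useful as a check for the fast version developed later (a sketch; dK stands for the elementwise derivative matrix $\partial K_g / \partial \theta$):

```python
import numpy as np

def log_likelihood_grad(y, Kg, dK):
    Kg_inv = np.linalg.inv(Kg)                     # acceptable only for small N
    alpha = Kg_inv @ y                             # K_g^{-1} y
    return -0.5 * np.trace(Kg_inv @ dK) + 0.5 * alpha @ (dK @ alpha)
```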

15 Tensors and Kronecker product. Definition: a tensor $Y$ is a $K$-dimensional matrix of size $n_1 \times n_2 \times \cdots \times n_K$: $Y = \{y_{i_1, i_2, \dots, i_K},\ \{i_k = 1, \dots, n_k\}_{k=1}^K\}$. Definition: the Kronecker product of matrices $A$ and $B$ is the block matrix $A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix}$.

16 Related operations. The vec operation: $\mathrm{vec}(Y) = [Y_{1,1,\dots,1}, Y_{2,1,\dots,1}, \dots, Y_{n_1,1,\dots,1}, Y_{1,2,\dots,1}, \dots, Y_{n_1,n_2,\dots,n_K}]^T$. Multiplication of a tensor by a matrix along the $k$-th direction, $Z = Y \times_k B$: $Z_{i_1, \dots, i_{k-1}, j, i_{k+1}, \dots, i_K} = \sum_{i_k} Y_{i_1, \dots, i_k, \dots, i_K} B_{i_k j}$. Connection between tensors and the Kronecker product: $\mathrm{vec}(Y \times_1 B_1 \cdots \times_K B_K) = (B_1 \otimes \cdots \otimes B_K)\,\mathrm{vec}(Y)$ (1). Complexity of computing the left-hand side is $O(N \sum_k n_k)$, of the right-hand side $O(N^2)$.
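Identity (1) can be checked numerically; the sketch below does so for $K = 2$ with numpy and the standard mode-product convention (index orderings vary between texts, so numpy's column-major vec puts the Kronecker factors in reverse order):

```python
import numpy as np

n1, n2 = 3, 4
Y = np.random.rand(n1, n2)
B1 = np.random.rand(n1, n1)
B2 = np.random.rand(n2, n2)

vec = lambda T: T.reshape(-1, order='F')       # first index varies fastest

# Left-hand side: tensor-times-matrix along each direction, O(N sum n_k) flops.
Z = np.einsum('ij,ai,bj->ab', Y, B1, B2)       # Z = Y x_1 B1 x_2 B2
lhs = vec(Z)

# Right-hand side: explicit Kronecker product, O(N^2) flops.
rhs = np.kron(B2, B1) @ vec(Y)
assert np.allclose(lhs, rhs)
```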

17 Covariance function. Form of the covariance function: $K_f(x, y) = \prod_{i=1}^K k_i(x^i, y^i)$, $x^i, y^i \in s_i$, where $k_i$ is an arbitrary covariance function for the $i$-th factor. Covariance matrix: $K = \bigotimes_{i=1}^K K_i$, where $K_i$ is the covariance matrix of the $i$-th factor.

18 Fast computation of the loglikelihood. Proposition. Let $K_i = U_i D_i U_i^T$ be the Singular Value Decomposition (SVD) of the matrix $K_i$, where $U_i$ is an orthogonal matrix and $D_i$ is diagonal. Then $|K_g| = \prod_{i_1, \dots, i_K} \hat{D}_{i_1, \dots, i_K}$, $K_g^{-1} = \left(\bigotimes_k U_k\right)\left(\bigotimes_k D_k + \sigma_{noise}^2 I\right)^{-1}\left(\bigotimes_k U_k^T\right)$ (2), and $K_g^{-1} y = \mathrm{vec}\left[\left((Y \times_1 U_1^T \cdots \times_K U_K^T) \oslash \hat{D}\right) \times_1 U_1 \cdots \times_K U_K\right]$, where $\hat{D}$ is the tensor of the diagonal elements of the matrix $\sigma_{noise}^2 I + \bigotimes_k D_k$ and $\oslash$ denotes elementwise division.
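A compact sketch of the resulting fast loglikelihood for two factors (with covariance of $Y[i,j]$ and $Y[i',j']$ equal to $K_1[i,i']\,K_2[j,j']$ plus noise); it touches only the small per-factor matrices and the output tensor:

```python
import numpy as np

def fast_gp_loglik(Y, K1, K2, sigma_noise):
    d1, U1 = np.linalg.eigh(K1)                          # K_i = U_i D_i U_i^T
    d2, U2 = np.linalg.eigh(K2)
    D_hat = np.multiply.outer(d1, d2) + sigma_noise ** 2 # eigenvalues of K_g as a tensor
    T = U1.T @ Y @ U2                                    # Y x_1 U1^T x_2 U2^T
    quad = np.sum(T ** 2 / D_hat)                        # y^T K_g^{-1} y
    logdet = np.sum(np.log(D_hat))                       # log |K_g|
    return -0.5 * quad - 0.5 * logdet - 0.5 * Y.size * np.log(2.0 * np.pi)
```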

19 Computational complexity. Proposition. Calculation of the loglikelihood using (2) has the following computational complexity: $O\left(N \sum_{i=1}^K n_i + \sum_{i=1}^K n_i^3\right)$. Assuming $n_i \approx N^{1/K}$ (the number of factors is large and their sizes are close) we get $O\left(N \sum_i n_i\right) = O\left(N^{1 + 1/K}\right)$.

20 Fast computation of derivatives. Proposition. The following statements hold: $\mathrm{Tr}(K_g^{-1} K') = \left\langle \mathrm{diag}(\hat{D}^{-1}),\ \bigotimes_{i=1}^K \mathrm{diag}(U_i^T K_i' U_i) \right\rangle$ and $\frac{1}{2}\, y^T K_g^{-1} K' K_g^{-1} y = \frac{1}{2} \left\langle A,\ A \times_1 K_1^T \cdots \times_{i-1} K_{i-1}^T \times_i K_i'^T \times_{i+1} K_{i+1}^T \cdots \times_K K_K^T \right\rangle$, where $\hat{D} = \sigma_{noise}^2 I + \bigotimes_k D_k$ and $\mathrm{vec}(A) = K_g^{-1} y$. The computational complexity is $O\left(N \sum_{i=1}^K n_i + \sum_{i=1}^K n_i^3\right)$.

21 Gaussian Processes for factorial DoE. Issues: 1. High computational complexity, $O(N^3)$; in the case of a factorial DoE the sample size $N$ can be very large. 2. Degeneracy as a result of significantly different factor sizes.


23 Degeneracy. Example of degeneracy. Figure: Original function (true function and training set). Figure: Approximation obtained using GP from the GPML toolbox (GP regression and training set).

24 Regularization. Prior distribution: $\frac{\theta_k^{(i)} - a_k^{(i)}}{b_k^{(i)} - a_k^{(i)}} \sim Be(\alpha, \beta)$, $\{i = 1, \dots, d_k\}_{k=1}^K$, where $a_k^{(i)} = c_k \big/ \max_{x,y \in s_k}(x^{(i)} - y^{(i)})$, $b_k^{(i)} = C_k \big/ \min_{x,y \in s_k,\, x \neq y}(x^{(i)} - y^{(i)})$, $Be(\alpha, \beta)$ is the Beta distribution, and $c_k$ and $C_k$ are parameters of the algorithm (we use $c_k = 0.01$ and $C_k = 2$). Initialization: $\theta_k^{(i)} = \left[\frac{1}{n_k}\left(\max_{x \in s_k} x^{(i)} - \min_{x \in s_k} x^{(i)}\right)\right]^{-1}$.

25 Regularization. Loglikelihood: $\log p(y \mid X, \theta, \sigma_f, \sigma_{noise}) = -\frac{1}{2} y^T K_g^{-1} y - \frac{1}{2} \log |K_g| - \frac{N}{2} \log 2\pi + \sum_{k,i} \left[(\alpha - 1) \log\left(\frac{\theta_k^{(i)} - a_k^{(i)}}{b_k^{(i)} - a_k^{(i)}}\right) + (\beta - 1) \log\left(1 - \frac{\theta_k^{(i)} - a_k^{(i)}}{b_k^{(i)} - a_k^{(i)}}\right)\right] - d \log B(\alpha, \beta)$.

26 Regularization. Figure: Penalty function (log prior density).
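As a sketch, the extra penalty term can be computed separately and added to the loglikelihood during optimization (the function name and the use of scipy's betaln are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.special import betaln

def log_beta_prior(theta, a, b, alpha, beta):
    """Sum of Beta(alpha, beta) log-densities of the rescaled parameters."""
    t = (theta - a) / (b - a)                    # map each theta_k^(i) into (0, 1)
    return np.sum((alpha - 1.0) * np.log(t)
                  + (beta - 1.0) * np.log(1.0 - t)
                  - betaln(alpha, beta))

# Regularized objective: loglikelihood plus log_beta_prior, maximized over theta.
```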

27 Regularization. Example of a regularized approximation. Figure: Original function (true function and training set). Figure: Approximation obtained using the developed algorithm with regularization (tensorgp-reg and training set).

28 Toy problems. A set of 34 functions (standard artificial test functions used for testing global optimization algorithms). Dimensionality from 2 to 6. Sample sizes from 80 points up to very large full factorial DoE. Quality criteria: training time and approximation error, $MSE = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} (\hat{f}(x_i) - g(x_i))^2$. Tested algorithms: MARS (Multivariate Adaptive Regression Splines), SparseGP (sparse Gaussian Processes, GPML toolbox), tensorgp (the developed algorithm without regularization), tensorgp-reg (the developed algorithm with regularization).

29 Dolan-Moré curves. $T$ problems, $A$ algorithms. $e_{ta}$ is the approximation error (or training time) of the $a$-th algorithm on the $t$-th problem; $\tilde{e}_t = \min_a e_{ta}$. $\rho_a(\tau) = \frac{\#\{t : e_{ta} \leq \tau \tilde{e}_t\}}{T}$. The higher the curve, the better the corresponding algorithm works; $\rho_a(1)$ is the fraction of problems on which the $a$-th algorithm worked best.
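Computing the profiles from a problems-by-algorithms error matrix takes a few lines (a sketch with dummy data, not the authors' code):

```python
import numpy as np

def dolan_more(E, taus):
    """E[t, a] = error (or time) of algorithm a on problem t; returns rho_a(tau)."""
    best = E.min(axis=1, keepdims=True)          # e~_t = min_a e_ta
    ratios = E / best
    return np.stack([(ratios <= tau).mean(axis=0) for tau in taus])

taus = np.linspace(1.0, 10.0, 100)
E = np.abs(np.random.randn(34, 4)) + 0.1         # 34 problems, 4 algorithms (dummy data)
profiles = dolan_more(E, taus)                   # shape (100, 4); plot against taus
```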

30 Results of experiments: training time. Figure: Dolan-Moré curves; the quality criterion is training time.

31 Results of experiments: approximation quality. Figure: Dolan-Moré curves; the quality criterion is MSE.

32 Real problem: rotating disc of an impeller. Objective functions: $p_1$ contact pressure, $S_{rmax}$ maximum radial stress, $w$ weight of the disc. The geometrical shape of the disc is parametrized by 6 input variables, $x = (h_1, h_2, h_3, h_4, r_2, r_3)$ ($r_1$ and $r_4$ are fixed). Training sample (generated from a computational physical model): factor sizes $\{1, 8, 8, 3, 15, 5\}$, giving a full factorial sample size $N = 1 \cdot 8 \cdot 8 \cdot 3 \cdot 15 \cdot 5 = 14400$. Figure: Rotating disc of an impeller.

33 Results of the experiment. Table: Approximation errors of $p_1$ (MAE, MSE, RRMS) for MARS, SparseGP and tensorgp-reg. Error measures: $MAE = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} |\hat{f}(x_i) - g(x_i)|$, $MSE = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} (\hat{f}(x_i) - g(x_i))^2$, $RRMS = \sqrt{\frac{\sum_{i=1}^{N_{test}} (\hat{f}(x_i) - g(x_i))^2}{\sum_{i=1}^{N_{test}} (\bar{y} - g(x_i))^2}}$, where $\bar{y} = \frac{1}{N} \sum_{i=1}^N y_i$.
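The three criteria in code, as used to fill such a table (a straightforward sketch):

```python
import numpy as np

def mae(f_hat, g_true):
    return np.mean(np.abs(f_hat - g_true))

def mse(f_hat, g_true):
    return np.mean((f_hat - g_true) ** 2)

def rrms(f_hat, g_true, y_bar):
    """Root relative mean square error; y_bar is the mean of the training outputs."""
    return np.sqrt(np.sum((f_hat - g_true) ** 2) / np.sum((y_bar - g_true) ** 2))
```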

34 Results of approximation. 2D slices of the approximations (the other parameters are fixed). Figure: Approximation by GPML Sparse GP. Figure: Approximation obtained using the developed algorithm with regularization (tensorgp-reg, training set and test set).

35 Further research: missing data. Reasons for missing points: data generation is still in progress (each point is time-consuming to compute); the data generator failed at some points.

36 Incomplete factorial DoE. $S_{full}$ is a full factorial DoE, $N_{full} = |S_{full}|$. $S$ is an incomplete factorial DoE, $N = |S|$, $S \subset S_{full}$. The covariance matrix is no longer a Kronecker product: $K_f \neq \bigotimes_{i=1}^K K_i$.

37 Computation of the loglikelihood. Notation: $\{x_i\}_{i=1}^{N_{full}} = S_{full}$; $K_f$ is the covariance matrix for the full factorial DoE $S_{full}$; $W$ is the diagonal matrix with $W_{ii} = 1$ if $x_i \in S$ and $W_{ii} = 0$ if $x_i \notin S$; $\tilde{y}$ is the vector of outputs $y$ extended by arbitrary values at the missing points. Proposition. Let $\tilde{z}$ be a solution of $\left(K_f W K_f + \sigma_{noise}^2 K_f\right) \tilde{z} = K_f W \tilde{y}$ (3). Then the solution $z$ of $(K_f + \sigma_{noise}^2 I) z = y$, i.e. $z = (K_f + \sigma_{noise}^2 I)^{-1} y$, has the form $z = (\tilde{z}_{i_1}, \dots, \tilde{z}_{i_N})$, where $i_k \in \{j : x_j \in S\}$, $k = 1, \dots, N$.

38 Computation of the loglikelihood. $R = N_{full} - N$ is the number of missing points. $\tilde{U} = \bigotimes_{i=1}^K \tilde{U}_i$, $\tilde{D} = \bigotimes_{i=1}^K \tilde{D}_i$, $\hat{\tilde{D}} = \tilde{D} + \sigma_{noise}^2 I$. The change of variables: $\alpha = \big(\hat{\tilde{D}} \tilde{D}\big)^{1/2}\, \tilde{U}^T \tilde{z}$ if $R < N$, and $\alpha = \tilde{D}^{1/2} \sigma_{noise}^2\, \tilde{U}^T \tilde{z}$ if $R \geq N$ (4). Proposition. The system of linear equations (3) can be solved using the change of variables (4) and the Conjugate Gradient Method in at most $\min(R + 1, N + 1)$ iterations. The computational complexity of each iteration is $O(N_{full} \sum_k n_k)$.
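A sketch of the iterative solve for system (3) with two factors: the full matrix is never formed, only Kronecker matrix-vector products are used inside scipy's conjugate gradient (the preconditioning change of variables (4) is omitted for brevity):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_incomplete(K1, K2, w, y_tilde, sigma_noise):
    """Solve (K_f W K_f + sigma^2 K_f) z = K_f W y~ for K_f = K2 (x) K1.

    w is the 0/1 vector of observed grid points (the diagonal of W)."""
    n1, n2 = K1.shape[0], K2.shape[0]

    def Kf_mv(v):                                # K_f v via per-factor products
        V = v.reshape(n1, n2, order='F')
        return (K1 @ V @ K2.T).reshape(-1, order='F')

    def A_mv(v):                                 # (K_f W K_f + sigma^2 K_f) v
        return Kf_mv(w * Kf_mv(v)) + sigma_noise ** 2 * Kf_mv(v)

    A = LinearOperator((n1 * n2, n1 * n2), matvec=A_mv)
    z_tilde, info = cg(A, Kf_mv(w * y_tilde))    # CG needs only matrix-vector products
    return z_tilde                               # restrict to observed indices afterwards
```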

39 Computation of the determinant. Partition the full-grid matrix as $K_f + \sigma_{noise}^2 I = \begin{pmatrix} K_g & A \\ A^T & B \end{pmatrix}$. Determinant (Proposition): $|K_g| = \dfrac{|K_f + \sigma_{noise}^2 I|}{|B - A^T K_g^{-1} A|}$ (5). Here $|K_f + \sigma_{noise}^2 I|$ is computed using the formulae for the full factorial case, and $|B - A^T K_g^{-1} A|$ is computed numerically. The complexity of computing the determinant using (5) is $O\left(\min\{R + 1, N + 1\} \cdot R \cdot N_{full} \sum_k n_k\right)$.

40 Conclusion. The developed algorithm is computationally efficient; can handle large samples; takes into account the special features of the given data; and has proved efficient on a large set of toy problems as well as on real-world problems.

41 Thank you for your attention! More details are given in: Belyaev, M., Burnaev, E., and Kapushev, Y. (2014). Exact inference for Gaussian process regression in case of big data with the Cartesian product structure. arXiv preprint.

42 References.
Dietrich, C. R. and Newsam, G. N. (1997). Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix. SIAM Journal on Scientific Computing, 18(4).
Friedman, J. (1991). Multivariate adaptive regression splines. Annals of Statistics, 19(1).
Stone, C., Hansen, M., Kooperberg, C., and Truong, Y. (1997). Polynomial splines and their tensor products in extended linear modeling. Annals of Statistics, 25.
Stroud, J. R., Stein, M. L., and Lysen, S. (2014). Bayesian and maximum likelihood estimation for Gaussian processes on an incomplete lattice. arXiv preprint.
Xiao, L., Li, Y., and Ruppert, D. (2013). Fast bivariate P-splines: the sandwich smoother. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3).
