Statistique en grande dimension

Lecturer: Dalalyan A., Scribe: Thomas F.-X.

First lecture

1 Introduction

1.1 Classical statistics

Parametric statistics: $Z_1, \dots, Z_n$ i.i.d. with a common distribution $P_\theta$.
- We assume $\theta \in \Theta \subset \mathbb{R}^d$.
- Known: $Z_1, \dots, Z_n$ and $\Theta$. Unknown: $\theta$ or $P_\theta$.
- Important assumption: $d$ is fixed and $n \to +\infty$.

In this case the maximum-likelihood estimator is asymptotically the (most) efficient (consistent) one: $\hat\theta^{\mathrm{MV}}$ satisfies, as $n \to +\infty$,
$$\mathbb{E}_P\big[\|\hat\theta^{\mathrm{MV}} - \theta\|^2\big] = \frac{C}{n}\,\big(1 + o(1)\big).$$
We thus estimate $\theta$ at the rate $1/\sqrt{n}$ (the parametric rate).

Observation. If $d = d_n$ with $\lim_{n \to +\infty} d_n = +\infty$, then the whole parametric theory becomes unusable. Moreover, the maximum-likelihood estimator is no longer the best estimator!

1.2 Nonparametric statistics

We observe $Z_1, \dots, Z_n$ i.i.d. with an unknown distribution $P$ such that $P \in \{P_\theta,\ \theta \in \Theta\}$, but with $\Theta$ either infinite-dimensional, or of finite dimension $d = d_n$ that tends to $+\infty$ with the sample size.

Examples:
$$\Theta = \{f : [0,1] \to \mathbb{R},\ f \text{ Lipschitz with constant } L\} = \{f : [0,1] \to \mathbb{R},\ \forall x, y,\ |f(x) - f(y)| \le L|x - y|\} \tag{2}$$
$$\Theta = \Big\{\theta = (\theta_1, \theta_2, \dots) : \sum_{j=1}^{\infty} \theta_j^2 < +\infty\Big\} = \ell^2 \tag{3}$$

General approach: we approximate $\Theta$ by an increasing sequence $(\Theta_k)$ of subsets of $\Theta$ such that $\Theta_k$ has dimension $d_k$. Proceeding as if $\theta$ belonged to $\Theta_k$ (which is not necessarily the case), we use a parametric method to define an estimator $\theta_k$ of $\theta$. This gives a family of estimators $\{\theta_k\}$.

Main question. How should $k$ be chosen so as to minimize the risk of $\theta_k$?
- If $k$ is small, we face underfitting.
- Conversely, if $k$ is large, we face overfitting.

1.3 Principal models in non-parametric statistics

Density model. We have $X_1, \dots, X_n$ i.i.d. with a density $f$ defined on $\mathbb{R}^p$, and
$$P(X \in A) = \int_A f(x)\,dx.$$
The assumptions imposed on $f$ are very weak, as opposed to the parametric setting. For instance, a typical assumption in the parametric setting is that $f$ is the Gaussian density
$$f(x) = \frac{1}{(2\pi)^{p/2}\,(\det \Sigma)^{1/2}} \exp\Big[-\tfrac{1}{2}\,(x - \mu)^T \Sigma^{-1} (x - \mu)\Big],$$
whereas a common assumption on $f$ in the nonparametric framework is: $f$ is smooth, say, twice continuously differentiable with a bounded second derivative.

Regression model. We observe $Z_i = (X_i, Y_i)$, with input $X_i$, output $Y_i$ and error $\varepsilon_i$:
$$Y_i = f(X_i) + \varepsilon_i.$$
The function $f$ is called the regression function. Here, the goal is to estimate $f$ without assuming any parametric structure on it.

Practical example: marketing.
- Each $i$ represents a consumer.
- $X_i$ are the features of the consumer.
- A typical question is how to identify the different relevant groups of consumers. A typical answer is to use clustering algorithms: we assume that $X_1, \dots, X_n$ are i.i.d. with density $f$, we estimate $f$ in a nonparametric manner by $\hat f$, and the clusters are defined as the regions around the local maxima of $\hat f$ (see the sketch below).

1.4 Machine Learning

Essentially the same as non-parametric statistics. The main focus here is on the algorithms (rather than on the models), their statistical performance and their computational complexity.
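To make the clustering procedure of Section 1.3 concrete, here is a minimal sketch (assuming NumPy/SciPy): estimate $f$ with a Gaussian kernel density estimator and group the observations around the local maxima of $\hat f$. The one-dimensional two-component sample, the default bandwidth of `gaussian_kde` and the nearest-mode assignment are illustrative assumptions, not part of the lecture.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic consumer features: a two-component mixture (illustrative assumption).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 0.7, 300), rng.normal(2.0, 0.7, 300)])

f_hat = gaussian_kde(X)                       # nonparametric estimator of f
grid = np.linspace(X.min(), X.max(), 400)
values = f_hat(grid)

# Local maxima of the estimated density play the role of cluster centers.
is_mode = (values[1:-1] > values[:-2]) & (values[1:-1] > values[2:])
modes = grid[1:-1][is_mode]
print("estimated modes:", modes)

# Each observation is assigned to the region of the nearest mode.
labels = np.argmin(np.abs(X[:, None] - modes[None, :]), axis=1)
print("cluster sizes:", np.bincount(labels))
```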

2 Main concepts and notations

Observations: $Z_1, \dots, Z_n$ i.i.d. $\sim P$.
- Unsupervised learning: $Z_i = X_i$.
- Supervised learning: $Z_i = (X_i, Y_i)$, where $X_i$ is an example (or a feature) and $Y_i$ a label.

Aim. To learn the distribution $P$, or some properties of it.

Prediction. We assume that a new feature $X$ (drawn from the same probability distribution as $X_1, \dots, X_n$) is observed. The aim is to predict the label associated with $X$. To measure the quality of a prediction, we need a loss function $\ell(y, \tilde y)$ ($y$ is the true label, $\tilde y$ is the predicted label). In practice, both $y$ and $\tilde y$ are random variables; furthermore, $y$ and its distribution are unknown, so $\ell$ is hard to compute!

Risk function. This is the expectation of the loss.

Definition 1 Assume that $Z_i = (X_i, Y_i) \in \mathcal{X} \times \mathcal{Y}$ and $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ is a loss function. A predictor, or prediction algorithm, is any mapping
$$\hat g : (\mathcal{X} \times \mathcal{Y})^n \to \mathcal{Y}^{\mathcal{X}}.$$
The risk of a prediction function $g \in \mathcal{Y}^{\mathcal{X}}$ is
$$R_P[g] = \mathbb{E}_P\big[\ell(Y, g(X))\big].$$
The risk of a predictor $\hat g$ is $R_P[\hat g] = \int \ell(y, \hat g(x))\,dP(x, y)$, which is random since $\hat g$ depends on the data.

Examples ($\mathcal{X}$, $\mathcal{Y}$):
- Binary classification: $\mathcal{Y} = \{0, 1\}$, with any $\mathcal{X}$, and
$$\ell(y, \tilde y) = \begin{cases} 0, & \text{if } y = \tilde y,\\ 1, & \text{otherwise} \end{cases} \;=\; 1 - \mathbb{1}(y = \tilde y) \;=\; \mathbb{1}(y \neq \tilde y).$$
- Least-squares regression: $\mathcal{Y} \subset \mathbb{R}$, with any $\mathcal{X}$, and $\ell(y, \tilde y) = (y - \tilde y)^2$.

3 Excess risk and Bayes predictor

We have $Z_i = (X_i, Y_i)$ and
$$R_P[g] = \int_{\mathcal{X} \times \mathcal{Y}} \ell(y, g(x))\,P(dx, dy), \qquad P(dx, dy) = P_{Y|X}(dy \mid X = x)\,P_X(dx).$$

Definition 2 Given a loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$, the Bayes predictor, or oracle, is the prediction function minimizing the risk:
$$g^* \in \arg\min_{g \in \mathcal{Y}^{\mathcal{X}}} R_P[g].$$
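As a quick illustration of the risk $R_P[g] = \mathbb{E}_P[\ell(Y, g(X))]$ for the two example losses, the sketch below approximates it by Monte Carlo under an assumed toy distribution ($X$ uniform on $[0,1]$, $P(Y = 1 \mid X = x) = x$); the particular prediction functions are arbitrary choices made only for illustration.

```python
import numpy as np

# Monte Carlo approximation of R_P[g] = E_P[ l(Y, g(X)) ] for the two example losses.
rng = np.random.default_rng(1)
n = 200_000
X = rng.uniform(0.0, 1.0, n)
Y = rng.binomial(1, X)                      # binary label with P(Y=1|X=x) = x

g_class = (X > 0.4).astype(int)             # some prediction function for classification
risk_01 = np.mean(Y != g_class)             # 0-1 loss: l(y, y~) = 1(y != y~)

g_reg = np.full(n, 0.5)                     # constant prediction function for regression
risk_sq = np.mean((Y - g_reg) ** 2)         # squared loss: l(y, y~) = (y - y~)^2

print(f"R[g_class] (0-1 loss)  ~ {risk_01:.3f}")
print(f"R[g_reg]   (squared)   ~ {risk_sq:.3f}")
```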

Remark In practice, $g^*$ is unavailable, since it depends on $P$, which is unknown. The ultimate goal is to do almost as well as the oracle. A predictor $\hat g_n$ will be considered a good one if
$$\lim_{n \to +\infty} \underbrace{R_P[\hat g_n] - R_P[g^*]}_{\text{excess risk}} = 0.$$

Definition 3 We say that the predictor $\hat g_n$ is consistent (universally consistent) if, for every $P$, we have
$$\lim_{n \to +\infty} \mathbb{E}_P\big[R_P[\hat g_n]\big] - R_P[g^*] = 0.$$

Theorem 1
1. Suppose that, for every $x \in \mathcal{X}$, the infimum of $y \mapsto \mathbb{E}_P[\ell(Y, y) \mid X = x]$ is attained. Then the function $g^*$ defined by
$$g^*(x) \in \arg\min_{y \in \mathcal{Y}} \mathbb{E}_P[\ell(Y, y) \mid X = x]$$
is a Bayes predictor.
2. In the case of binary classification, $\mathcal{Y} = \{0, 1\}$ and $\ell(y, \tilde y) = \mathbb{1}(y \neq \tilde y)$,
$$g^*(x) = \mathbb{1}\big(\eta^*(x) > \tfrac{1}{2}\big), \qquad \text{where } \eta^*(x) = P[Y = 1 \mid X = x].$$
Furthermore, the excess risk can be computed by
$$R_P[g] - R_P[g^*] = \mathbb{E}_P\big[(g(X) - g^*(X))\,(1 - 2\eta^*(X))\big]. \tag{4}$$
3. In the case of least-squares regression,
$$g^*(x) = \eta^*(x), \qquad \text{where } \eta^*(x) = \mathbb{E}_P[Y \mid X = x].$$
Furthermore, for any $\eta : \mathcal{X} \to \mathcal{Y}$, we have
$$R_P[\eta] - R_P[\eta^*] = \mathbb{E}_P\big[(\eta(X) - \eta^*(X))^2\big].$$

Proof. 1. Let $g \in \mathcal{Y}^{\mathcal{X}}$ and let
$$g^*(x) \in \arg\min_{y \in \mathcal{Y}} \mathbb{E}_P[\ell(Y, y) \mid X = x].$$
We have
$$R_P[g] = \mathbb{E}_P[\ell(Y, g(X))] = \int \mathbb{E}_P[\ell(Y, g(x)) \mid X = x]\,P_X(dx) \ge \int \mathbb{E}_P[\ell(Y, g^*(x)) \mid X = x]\,P_X(dx) = R_P[g^*].$$

2. Using the first assertion,
$$g^*(x) \in \arg\min_{y \in \{0,1\}} \mathbb{E}_P[\mathbb{1}(Y \neq y) \mid X = x] = \arg\min_{y \in \{0,1\}} P(Y \neq y \mid X = x) = \arg\max_{y \in \{0,1\}} P(Y = y \mid X = x) = \arg\max_{y \in \{0,1\}} \big\{\eta^*(x)\,\mathbb{1}(y = 1) + (1 - \eta^*(x))\,\mathbb{1}(y = 0)\big\}.$$
Therefore,
$$g^*(x) = \begin{cases} 0, & \text{if } P(Y = 1 \mid X = x) \le \tfrac{1}{2},\\ 1, & \text{otherwise.} \end{cases}$$
To check (4), it suffices to remark that
$$R_P[g] = \mathbb{E}_P[(g(X) - Y)^2] = \mathbb{E}_P[g(X)^2] + \mathbb{E}_P[Y^2] - 2\,\mathbb{E}_P[Y g(X)] = \mathbb{E}_P[g(X)] + \mathbb{E}_P[Y] - 2\,\mathbb{E}_P\big[\mathbb{E}_P[Y g(X) \mid X]\big] = \mathbb{E}_P[g(X)] + \mathbb{E}_P[Y] - 2\,\mathbb{E}_P\big[g(X)\,\eta^*(X)\big] = \mathbb{E}_P\big[g(X)\,(1 - 2\eta^*(X))\big] + \mathbb{E}_P[Y].$$
Writing the same identity for $g^*$ and taking the difference of these two identities, we get the desired result.
3. In view of the first assertion of the theorem, we have
$$g^*(x) \in \arg\min_{y \in \mathbb{R}} \mathbb{E}_P\big[(Y - y)^2 \mid X = x\big] = \arg\min_{y \in \mathbb{R}} \varphi(y), \qquad \text{where } \varphi(y) = \mathbb{E}_P[Y^2 \mid X = x] - 2y\,\mathbb{E}_P[Y \mid X = x] + y^2$$
is a second-order polynomial. The minimization of such a polynomial is straightforward and leads to
$$\arg\min_{y \in \mathbb{R}} \varphi(y) = \mathbb{E}_P[Y \mid X = x].$$
This shows that the Bayes predictor is equal to the regression function $\eta^*(x)$. For any prediction function $\eta$, the risk is
$$R_P[\eta] = \mathbb{E}_P\big[(Y - \eta(X))^2\big] = \mathbb{E}_P\Big[\mathbb{E}_P\big[(Y - \eta^*(X))^2 \mid X\big] + 2\,\mathbb{E}_P\big[(Y - \eta^*(X))(\eta^* - \eta)(X) \mid X\big] + (\eta^* - \eta)^2(X)\Big] = R_P[\eta^*] + \mathbb{E}_P\big[(\eta^* - \eta)^2(X)\big],$$
where the cross-product term vanishes since
$$\mathbb{E}_P\big[(Y - \eta^*(X))(\eta^* - \eta)(X) \mid X\big] = (\eta^* - \eta)(X)\,\mathbb{E}_P\big[Y - \eta^*(X) \mid X\big] = 0.$$
This completes the proof of the theorem.
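A quick simulation can be used to sanity-check identity (4). The sketch below assumes a toy model ($X$ uniform on $[0,1]$, $\eta^*(x) = x$) and an arbitrary competitor $g$; both the model and the thresholds are illustrative choices, not taken from the lecture.

```python
import numpy as np

# Numerical check of the excess-risk identity (4) for binary classification.
rng = np.random.default_rng(2)
n = 1_000_000
X = rng.uniform(0.0, 1.0, n)
Y = rng.binomial(1, X)
eta_star = X                                    # eta*(x) = x in this toy model

g_star = (eta_star > 0.5).astype(int)           # Bayes classifier
g = (X > 0.7).astype(int)                       # an arbitrary competitor

lhs = np.mean(Y != g) - np.mean(Y != g_star)    # R_P[g] - R_P[g*]
rhs = np.mean((g - g_star) * (1 - 2 * eta_star))
print(f"excess risk  : {lhs:.4f}")
print(f"identity (4) : {rhs:.4f}")              # agrees up to Monte Carlo error
```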

3.1 Link between Binary Classification and Regression

Plug-in rule.
- We start by estimating $\eta^*(x)$ by $\hat\eta_n(x)$.
- We define $\hat g_n(x) = \mathbb{1}\big(\hat\eta_n(x) > \tfrac{1}{2}\big)$.

Question: how good is the plug-in rule $\hat g_n$?

Proposition 1 Let $\hat\eta$ be an estimator of the regression function $\eta^*$, and let $\hat g(x) = \mathbb{1}\big(\hat\eta(x) > \tfrac{1}{2}\big)$. Then we have
$$R_{\mathrm{class}}[\hat g] - R_{\mathrm{class}}[g^*] \le 2\,\sqrt{R_{\mathrm{reg}}[\hat\eta] - R_{\mathrm{reg}}[\eta^*]}.$$

Proof. Let $\eta : \mathcal{X} \to \mathbb{R}$ and $g(x) = \mathbb{1}\big(\eta(x) > \tfrac{1}{2}\big)$, and let us compute the excess risk of $g$. We have
$$R_{\mathrm{class}}[g] - R_{\mathrm{class}}[g^*] = \mathbb{E}_P\big[(g(X) - g^*(X))\,(1 - 2\eta^*(X))\big].$$
Since $g$ and $g^*$ are both indicator functions and, therefore, take only the values 0 and 1, their difference is nonzero if and only if one of them is equal to 1 and the other one is equal to 0. This leads to
$$R_{\mathrm{class}}[g] - R_{\mathrm{class}}[g^*] \le \mathbb{E}_P\big[\mathbb{1}\big(\eta(X) \le \tfrac{1}{2} < \eta^*(X)\big)\,|1 - 2\eta^*(X)|\big] + \mathbb{E}_P\big[\mathbb{1}\big(\eta^*(X) \le \tfrac{1}{2} < \eta(X)\big)\,|1 - 2\eta^*(X)|\big] = 2\,\mathbb{E}_P\big[\mathbb{1}\big(\tfrac{1}{2} \in [\eta^*(X), \eta(X)]\big)\,\big|\eta^*(X) - \tfrac{1}{2}\big|\big].$$
If $\eta(X) \le \tfrac{1}{2}$ and $\eta^*(X) > \tfrac{1}{2}$, then $|\eta^*(X) - \tfrac{1}{2}| \le |\eta^*(X) - \eta(X)|$ (and similarly in the symmetric case), and thus
$$R_{\mathrm{class}}[g] - R_{\mathrm{class}}[g^*] \le 2\,\mathbb{E}_P\big[\mathbb{1}\big(\tfrac{1}{2} \in [\eta^*(X), \eta(X)]\big)\,|\eta(X) - \eta^*(X)|\big] \le 2\,\mathbb{E}_P\big[|\eta(X) - \eta^*(X)|\big] \le 2\,\sqrt{\mathbb{E}_P\big[(\eta(X) - \eta^*(X))^2\big]} = 2\,\sqrt{R_{\mathrm{reg}}[\eta] - R_{\mathrm{reg}}[\eta^*]}.$$
Since this inequality is true for every deterministic $\eta$, we get the desired property.
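To see the proposition in action, the following sketch compares the two sides of the bound for a deliberately biased estimator $\hat\eta(x) = x + 0.1$ under the same toy model as before ($\eta^*(x) = x$); all of these choices are illustrative assumptions, not part of the notes.

```python
import numpy as np

# Classification excess risk of the plug-in rule vs. 2*sqrt(regression excess risk).
rng = np.random.default_rng(3)
n = 1_000_000
X = rng.uniform(0.0, 1.0, n)
Y = rng.binomial(1, X)

eta_star = X
eta_hat = np.clip(X + 0.1, 0.0, 1.0)            # a deliberately biased "estimator"

g_star = (eta_star > 0.5).astype(int)           # Bayes classifier
g_hat = (eta_hat > 0.5).astype(int)             # plug-in classifier

class_excess = np.mean(Y != g_hat) - np.mean(Y != g_star)
reg_excess = np.mean((eta_hat - eta_star) ** 2) # = R_reg[eta_hat] - R_reg[eta*]
print(f"classification excess risk : {class_excess:.4f}")
print(f"bound 2*sqrt(reg. excess)  : {2 * np.sqrt(reg_excess):.4f}")
```

The bound is not tight here: the classification excess risk only involves the region where $\hat\eta$ crosses the level $\tfrac12$ on the wrong side, while the regression excess risk penalizes the bias everywhere.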
