171:290 Model Selection Lecture II: The Akaike Information Criterion
1 171:290 Model Selection Lecture II: The Akaike Information Criterion
Department of Biostatistics
Department of Statistics and Actuarial Science
August 28, 2012
2 Introduction
AIC, the Akaike Information Criterion, is generally regarded as the first model selection criterion, and it continues to be the most widely known and used model selection tool among practitioners.
AIC was introduced by Hirotugu Akaike in his seminal 1973 paper "Information Theory and an Extension of the Maximum Likelihood Principle" (in: B. N. Petrov and F. Csáki, eds., 2nd International Symposium on Information Theory, Akadémiai Kiadó, Budapest).
3 Introduction
The traditional maximum likelihood paradigm, as applied to statistical modeling, provides a mechanism for estimating the unknown parameters of a model having a specified dimension and structure.
Akaike extended this paradigm by considering a framework in which the model dimension is also unknown, and must therefore be determined from the data. Thus, Akaike proposed a framework wherein both model estimation and selection could be simultaneously accomplished.
4 Introduction
For a parametric candidate model of interest, the likelihood function reflects the conformity of the model to the observed data. As the complexity of the model is increased, the model becomes more capable of adapting to the characteristics of the data.
Thus, selecting the fitted model that maximizes the empirical likelihood will invariably lead one to choose the most complex model in the candidate collection. Model selection based on the likelihood principle therefore requires an extension of the traditional likelihood paradigm.
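This phenomenon is easy to verify numerically. The sketch below (an illustration of my own, not part of the lecture) fits nested polynomial regression models of increasing degree by maximum likelihood and checks that the maximized Gaussian log-likelihood never decreases as parameters are added:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)   # data from a simple linear model

def max_loglik(degree):
    """Maximized Gaussian log-likelihood of a degree-`degree` polynomial fit."""
    X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, ..., x^degree
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    sigma2 = rss / n                                # ML estimate of the error variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

logliks = [max_loglik(d) for d in range(6)]
# The maximized log-likelihood is non-decreasing in the model dimension:
assert all(b >= a - 1e-8 for a, b in zip(logliks, logliks[1:]))
```

Maximizing the empirical likelihood alone would always pick the degree-5 model here, even though the generating model is linear.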
5 Introduction
Outline:
- Model Selection Framework
- Kullback-Leibler Information
- Derivation of AIC
- Properties and Limitations of AIC
- Use of AIC
- Application
6 Model Selection Framework
Suppose a collection of data y has been generated according to an unknown model or density g(y). We endeavor to find a fitted parametric model which provides a suitable approximation to g(y).
Let F(k) = { f(y | θ_k) : θ_k ∈ Θ(k) } denote a k-dimensional parametric class, i.e., a class of densities in which the parameter space Θ(k) consists of k-dimensional vectors whose components are functionally independent.
Let θ̂_k denote the vector of estimates obtained by maximizing the likelihood function f(y | θ_k) over Θ(k), and let f(y | θ̂_k) denote the corresponding fitted model.
7 Model Selection Framework
Suppose our goal is to search among a collection of classes F = {F(k_1), F(k_2), ..., F(k_L)} for the fitted model f(y | θ̂_k), k ∈ {k_1, k_2, ..., k_L}, which serves as the best approximation to g(y).
Note: AIC may also be used to delineate between different fitted models having the same dimension (for instance, fitted regression models based on design matrices having the same size yet different column spaces). For simplicity, our notation and framework assume that each candidate model class F(k) and the corresponding fitted model f(y | θ̂_k) are distinguished by the dimension k. Our model selection problem can therefore be viewed as a problem of dimension determination.
8 Model Selection Framework
True or generating model: g(y).
Candidate or approximating model: f(y | θ_k).
Fitted model: f(y | θ̂_k).
Candidate family: F.
9 Model Selection Framework
To determine which of the fitted models {f(y | θ̂_{k_1}), f(y | θ̂_{k_2}), ..., f(y | θ̂_{k_L})} best approximates g(y), we require a measure which provides a suitable reflection of the disparity between the true model g(y) and an approximating model f(y | θ_k). The Kullback-Leibler information is one such measure.
10 Kullback-Leibler Information
For two arbitrary densities g(y) and f(y) with the same support, the Kullback-Leibler information, or Kullback's directed divergence, between g(y) and f(y) with respect to g(y) is defined as

    I(g, f) = E{ ln [ g(y) / f(y) ] },

where E denotes the expectation under g(y).
- I(g, f) ≥ 0, with equality if and only if g and f are the same density.
- I(g, f) reflects the separation between g and f, but it is not a formal distance measure.
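These properties can be checked concretely. The sketch below (a hypothetical Gaussian example of my own, not from the slides) evaluates I(g, f) on a fine grid for two normal densities, compares it to the known closed form for Gaussians, and verifies non-negativity and asymmetry:

```python
import numpy as np

def kl_numeric(mu1, s1, mu2, s2):
    """I(g, f) = E_g{ ln[g(y)/f(y)] } for g = N(mu1, s1^2), f = N(mu2, s2^2),
    approximated by summation on a fine grid covering the mass of g."""
    y = np.linspace(mu1 - 10 * s1, mu1 + 10 * s1, 200_001)
    dy = y[1] - y[0]
    log_g = -0.5 * ((y - mu1) / s1) ** 2 - np.log(s1 * np.sqrt(2 * np.pi))
    log_f = -0.5 * ((y - mu2) / s2) ** 2 - np.log(s2 * np.sqrt(2 * np.pi))
    return np.sum(np.exp(log_g) * (log_g - log_f)) * dy

def kl_closed(mu1, s1, mu2, s2):
    """Closed-form KL information between two normal densities."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

assert abs(kl_numeric(0, 1, 1, 2) - kl_closed(0, 1, 1, 2)) < 1e-5
assert kl_numeric(0, 1, 0, 1) < 1e-10                       # I(g, g) = 0
assert abs(kl_numeric(0, 1, 0, 2) - kl_numeric(0, 2, 0, 1)) > 0.1  # asymmetric
```

The last assertion illustrates why I(g, f) is not a formal distance: swapping the roles of g and f changes its value.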
11 Kullback-Leibler Information
For our purposes, we will consider the Kullback-Leibler information between the true model g(y) and the approximating model f(y | θ_k) with respect to g(y), which we will denote as I(θ_k):

    I(θ_k) = E{ ln [ g(y) / f(y | θ_k) ] }.

In the preceding expression and henceforth, E denotes the expectation under the true density or model.
12 Kullback-Leibler Information
Let d(θ_k) = E{ -2 ln f(y | θ_k) }. d(θ_k) is often called the Kullback discrepancy. Note that we can write

    2 I(θ_k) = d(θ_k) - E{ -2 ln g(y) }.

Since E{ -2 ln g(y) } does not depend on θ_k, any ranking of a set of candidate models corresponding to values of I(θ_k) would be identical to a ranking corresponding to values of d(θ_k). Hence, for the purpose of discriminating among various candidate models, d(θ_k) serves as a valid substitute for I(θ_k).
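The ranking equivalence can be demonstrated numerically. In this sketch (a made-up example: a standard normal truth and a few normal candidates with standard deviation 1.5), I(θ_k) and d(θ_k) are both evaluated on a grid and shown to order the candidates identically:

```python
import numpy as np

# True model g: N(0, 1). Candidates f(. | theta): N(theta, 1.5^2), theta varying.
y = np.linspace(-10, 10, 200_001)
dy = y[1] - y[0]
log_g = -0.5 * y**2 - 0.5 * np.log(2 * np.pi)
g = np.exp(log_g)

def expect(h):
    """E_g{h(y)}, approximated by grid summation."""
    return np.sum(g * h) * dy

thetas = [-1.0, 0.2, 0.7, 2.0]
I_vals, d_vals = [], []
for th in thetas:
    log_f = -0.5 * ((y - th) / 1.5) ** 2 - np.log(1.5 * np.sqrt(2 * np.pi))
    I_vals.append(expect(log_g - log_f))     # Kullback-Leibler information I(theta)
    d_vals.append(expect(-2 * log_f))        # Kullback discrepancy d(theta)

# d(theta) = 2 I(theta) + E{-2 ln g(y)}, so the two rankings coincide:
assert np.argsort(I_vals).tolist() == np.argsort(d_vals).tolist()
```

Both criteria prefer the candidate mean closest to the true mean (θ = 0.2 here), even though d(θ_k) alone can be estimated without knowing g(y).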
13 Derivation of AIC
The measure

    d(θ̂_k) = E{ -2 ln f(y | θ_k) } |_{θ_k = θ̂_k}

reflects the separation between the generating model g(y) and a fitted model f(y | θ̂_k). Evaluating d(θ̂_k) is not possible, since doing so requires knowledge of g(y).
The work of Akaike (1973), however, suggests that -2 ln f(y | θ̂_k) serves as a biased estimator of d(θ̂_k), and that the bias adjustment

    E{ d(θ̂_k) } - E{ -2 ln f(y | θ̂_k) }

can often be asymptotically estimated by twice the dimension of θ_k.
14 Derivation of AIC
Since k denotes the dimension of θ_k, under appropriate conditions the expected value of

    AIC = -2 ln f(y | θ̂_k) + 2k

should asymptotically approach the expected value of d(θ̂_k), say Δ(k) = E{ d(θ̂_k) }. Specifically, one can establish that

    E{AIC} + o(1) = Δ(k).

AIC therefore provides an asymptotically unbiased estimator of Δ(k).
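As a small worked example of the AIC formula itself (a hypothetical Gaussian illustration, not from the slides), the sketch below fits two candidate normal models by maximum likelihood, one with a free mean (k = 2) and one with the mean fixed at zero (k = 1), and compares their AIC values:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.0, size=200)   # data generated with mean 2
n = y.size

def gaussian_aic(y, fix_mean_at_zero):
    """AIC = -2 ln f(y | theta_hat) + 2k for a Gaussian model fitted by ML."""
    mu = 0.0 if fix_mean_at_zero else y.mean()
    sigma2 = np.mean((y - mu) ** 2)            # ML estimate of the variance
    k = 1 if fix_mean_at_zero else 2           # dimension of theta_k
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * loglik + 2 * k

# The data have mean 2, so the model that estimates the mean is preferred
# despite its extra parameter:
assert gaussian_aic(y, fix_mean_at_zero=False) < gaussian_aic(y, fix_mean_at_zero=True)
```

The 2k term is what distinguishes AIC from a pure goodness-of-fit comparison: the richer model must reduce -2 ln f(y | θ̂_k) by more than 2 per added parameter to be favored.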
15 Derivation of AIC
Δ(k) is often called the expected Kullback discrepancy. Δ(k) reflects the average separation between the generating model g(y) and fitted models having the same structure as f(y | θ̂_k).
16 Derivation of AIC
The asymptotic unbiasedness property of AIC requires the assumption that g(y) ∈ F(k). This assumption implies that the true model or density is a member of the parametric class F(k), and can therefore be written as f(y | θ_o), where θ_o ∈ Θ(k).
From a practical perspective, the assumption that g(y) ∈ F(k) implies that the fitted model f(y | θ̂_k) is either correctly specified or overfitted. Is this a problematic assumption?
17 Derivation of AIC
To justify the asymptotic unbiasedness of AIC, consider writing Δ(k) as follows:

    Δ(k) = E{ d(θ̂_k) }
         = E{ -2 ln f(y | θ̂_k) }
           + [ E{ -2 ln f(y | θ_o) } - E{ -2 ln f(y | θ̂_k) } ]    (1)
           + [ E{ d(θ̂_k) } - E{ -2 ln f(y | θ_o) } ].             (2)

The following lemma asserts that (1) and (2) are both within o(1) of k.
18 Derivation of AIC
Lemma. Assume the regularity conditions required to ensure the consistency and asymptotic normality of the maximum likelihood vector θ̂_k. Then

    E{ -2 ln f(y | θ_o) } - E{ -2 ln f(y | θ̂_k) } = k + o(1),     (1)
    E{ d(θ̂_k) } - E{ -2 ln f(y | θ_o) } = k + o(1).               (2)
19 Proof of Lemma
Proof: Define

    I(θ_k) = E[ -∂² ln f(y | θ_k) / ∂θ_k ∂θ_k' ]   and
    I(θ_k, y) = -∂² ln f(y | θ_k) / ∂θ_k ∂θ_k'.

I(θ_k) is the expected Fisher information matrix; I(θ_k, y) is the observed Fisher information matrix.
20 Proof of Lemma
First, consider taking a second-order expansion of -2 ln f(y | θ_o) about θ̂_k, and evaluating the expectation of the result. We obtain

    E{ -2 ln f(y | θ_o) } = E{ -2 ln f(y | θ̂_k) }
                            + E{ (θ̂_k - θ_o)' I(θ̂_k, y) (θ̂_k - θ_o) } + o(1).

Thus,

    E{ -2 ln f(y | θ_o) } - E{ -2 ln f(y | θ̂_k) }
        = E{ (θ̂_k - θ_o)' I(θ̂_k, y) (θ̂_k - θ_o) } + o(1).        (3)
21 Proof of Lemma
Next, consider taking a second-order expansion of d(θ̂_k) about θ_o, again evaluating the expectation of the result. We obtain

    E{ d(θ̂_k) } = E{ -2 ln f(y | θ_o) }
                   + E{ (θ̂_k - θ_o)' I(θ_o) (θ̂_k - θ_o) } + o(1).

Thus,

    E{ d(θ̂_k) } - E{ -2 ln f(y | θ_o) }
        = E{ (θ̂_k - θ_o)' I(θ_o) (θ̂_k - θ_o) } + o(1).            (4)
22 Proof of Lemma
The quadratic forms (θ̂_k - θ_o)' I(θ̂_k, y) (θ̂_k - θ_o) and (θ̂_k - θ_o)' I(θ_o) (θ̂_k - θ_o) both converge in distribution to centrally distributed chi-square random variables with k degrees of freedom. (Recall again that we are assuming θ_o ∈ Θ(k).) Thus, the expectations of both quadratic forms are within o(1) of k. This fact, together with (3) and (4), establishes (1) and (2).
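The chi-square limit of the quadratic form can be checked by simulation in a tractable special case of my own choosing: a k-dimensional normal mean with identity covariance, where θ̂_k is the sample mean and the expected Fisher information is I(θ_o) = n I_k. The quadratic form is then n ||θ̂_k - θ_o||², which is exactly chi-square with k degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(2)
k, n, reps = 3, 100, 20_000
# y_i ~ N(theta_o, I_k) with theta_o = 0; the ML estimator is the sample mean.
theta_hat = rng.normal(size=(reps, k, n)).mean(axis=2)   # shape (reps, k)
# Quadratic form (theta_hat - theta_o)' I(theta_o) (theta_hat - theta_o)
# with I(theta_o) = n * I_k:
quad = n * np.sum(theta_hat**2, axis=1)
# Chi-square with k degrees of freedom, so its mean should be close to k:
assert abs(quad.mean() - k) < 0.1
```

The simulated mean of the quadratic form is near k, which is exactly the source of the 2k penalty once both terms (1) and (2) are combined.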
23 Bias Correction
AIC provides us with an approximately unbiased estimator of Δ(k) in settings where n is large and k is comparatively small. In settings where n is small and k is comparatively large (e.g., k ≈ n/2), 2k is often much smaller than the true bias adjustment, making AIC substantially negatively biased as an estimator of Δ(k).
In small-sample applications, better estimators of the bias adjustment than 2k are available. (These estimators will be discussed in future lectures.)
24 Bias Correction
If AIC severely underestimates Δ(k) for higher dimensional fitted models in the candidate set, the criterion may favor the higher dimensional models even when the expected discrepancy between these models and the generating model is rather large.
Examples illustrating this phenomenon appear in Linhart and Zucchini (1986, pages 86-88), who comment (page 78) that "in some cases the criterion simply continues to decrease as the number of parameters in the approximating model is increased."
25 Asymptotic Efficiency
AIC is asymptotically efficient in the sense of Shibata (1980, 1981), yet it is not consistent.
- Suppose that the generating model is of finite dimension, and that this model is represented in the candidate collection under consideration. A consistent criterion will asymptotically select the fitted candidate model having the correct structure with probability one.
- On the other hand, suppose that the generating model is of infinite dimension, and therefore lies outside the candidate collection under consideration. An asymptotically efficient criterion will asymptotically select the fitted candidate model which minimizes the mean squared error of prediction.
26 Use of AIC
AIC is applicable in a broad array of modeling frameworks, since its justification only requires conventional large-sample properties of maximum likelihood estimators. In practice, applying the criterion does not require the assumption that one of the candidate models is the true or correct model, although the derivation rests on that assumption.
27 Use of AIC
AIC can be used to compare non-nested models. Burnham and Anderson (2002, p. 88): "A substantial advantage in using information-theoretic criteria is that they are valid for nonnested models. Of course, traditional likelihood ratio tests are defined only for nested models, and this represents another substantial limitation in the use of hypothesis testing in model selection."
28 Use of AIC
AIC can be used to compare models based on different probability distributions. However, when the criterion values are computed, no constants should be discarded from the goodness-of-fit term -2 ln f(y | θ̂_k) (such as n ln 2π).
Keep in mind that certain statistical software packages routinely discard constants in the evaluation of likelihood-based selection criteria: e.g., in normal linear regression, n ln σ̂² is often used as the goodness-of-fit term as opposed to n ln σ̂² + n(ln 2π + 1).
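The normal linear regression case can be sketched directly (a hypothetical example of my own). The full goodness-of-fit term and the truncated one differ by exactly the constant n(ln 2π + 1), which is harmless when comparing models within the normal family but invalidates comparisons across distributions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.8, size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.mean((y - X @ beta) ** 2)           # ML estimate of the error variance

full = n * np.log(sigma2) + n * (np.log(2 * np.pi) + 1)   # -2 ln f(y | theta_hat)
truncated = n * np.log(sigma2)                             # constants discarded

# The discrepancy is the constant n(ln 2pi + 1), identical for every normal
# linear model on these data:
assert np.isclose(full - truncated, n * (np.log(2 * np.pi) + 1))
```

Because the dropped constant is the same for every model in the family, software that reports the truncated term still ranks normal linear models correctly; the danger arises only when mixing such output with criterion values from a different distributional family.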
29 Use of AIC
In a model selection application, the optimal fitted model is identified by the minimum value of AIC. However, the differences among the criterion values are also important: models with similar values should be regarded as receiving comparable support when assessing criterion preferences.
30 Use of AIC
Question: What constitutes a substantial difference in criterion values? For AIC, Burnham and Anderson (2002, p. 70) feature the following table.

    AIC_i - AIC_min | Level of Empirical Support for Model i
    0-2             | Substantial
    4-7             | Considerably Less
    > 10            | Essentially None
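A small helper can apply these guidelines mechanically. This is a sketch of my own: the published table leaves the ranges 2-4 and 7-10 as intermediate zones, which this function lumps with an adjacent category for simplicity.

```python
def aic_support(aic_values):
    """Classify each model by Delta_i = AIC_i - AIC_min, following the
    Burnham-Anderson guideline table (intermediate zones lumped with an
    adjacent category in this sketch)."""
    a_min = min(aic_values)
    labels = []
    for a in aic_values:
        delta = a - a_min
        if delta <= 2:
            labels.append("substantial")
        elif delta <= 7:
            labels.append("considerably less")
        else:
            labels.append("essentially none")
    return labels

# Deltas of 0, 1.5, 6, and 15 relative to the best model:
assert aic_support([100.0, 101.5, 106.0, 115.0]) == \
    ["substantial", "substantial", "considerably less", "essentially none"]
```

Note that only the differences Delta_i matter: adding any constant to all four AIC values leaves the classification unchanged.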
31 Application
The Iowa Fluoride Study is a longitudinal study based on several hundred children throughout the state of Iowa. Brofitt, Levy, Warren, and Cavanaugh (2007) analyzed data from the Iowa Fluoride Study to investigate the association between bottled water use and caries prevalence.
Since fluoride is typically not added to bottled water, children who drink primarily bottled water may be more prone to caries than those who drink primarily fluoridated tap water.
32 Application
Our modeling analyses were based on 413 children. The subjects were classified as bottled water users or non-bottled-water users based on data collected from questionnaires completed by the parents every six months.
33 Application
The response variable was defined as a count of the number of affected tooth surfaces among the permanent incisors and first molars at the time of the mixed dentition exam. The covariate of primary interest was the dichotomous bottled water usage variable (BW). Other covariates considered were the age of the child at the time of the examination and toothbrushing frequency (coded as a five-level score).
Generalized linear models (GLMs) were employed, with the log link. For the random component, two probability distributions were considered: Poisson and negative binomial.
34 Application
Model selections were made using AIC. We will consider 4 models:
- Poisson GLM with BW
- Poisson GLM without BW
- Negative binomial GLM with BW
- Negative binomial GLM without BW
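One of these comparisons can be sketched end to end with a numpy-only Poisson GLM fitted by Newton-Raphson. Everything below is hypothetical: the BW indicator, age values, and generating coefficients are invented for illustration and are not the study's data; the negative binomial fits are omitted for brevity.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(4)
n = 413                                          # sample size matching the study
bw = rng.integers(0, 2, size=n).astype(float)    # hypothetical BW indicator
age = rng.normal(9.0, 0.5, size=n)               # hypothetical exam age
# Invented generating model: log mean depends on BW and (weakly) on age.
mu_true = np.exp(-0.5 + 0.6 * bw + 0.05 * (age - 9.0))
y = rng.poisson(mu_true).astype(float)

def poisson_glm_aic(X, y):
    """Fit a log-link Poisson GLM by Newton-Raphson; return AIC with all constants."""
    beta = np.zeros(X.shape[1])
    for _ in range(50):
        m = np.exp(X @ beta)
        grad = X.T @ (y - m)                 # score vector
        hess = X.T @ (m[:, None] * X)        # information (canonical link)
        beta = beta + np.linalg.solve(hess, grad)
    m = np.exp(X @ beta)
    # log f(y | beta_hat), keeping the ln(y!) constants via lgamma:
    loglik = np.sum(y * np.log(m) - m) - sum(lgamma(v + 1) for v in y)
    return -2.0 * loglik + 2.0 * X.shape[1]

X_with = np.column_stack([np.ones(n), bw, age])
X_without = np.column_stack([np.ones(n), age])
aic_with = poisson_glm_aic(X_with, y)
aic_without = poisson_glm_aic(X_without, y)
# The generating model uses BW, so AIC prefers the model that includes it:
assert aic_with < aic_without
```

In practice, the four fits would typically be produced by GLM software (e.g., a package that reports likelihood-based AIC), with the caveat from slide 28 about discarded constants applying when the Poisson and negative binomial fits are compared.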
35 Application
AIC Results:

    Distribution   | With BW | Without BW
    Poisson        |         |
    Neg. Binomial  |         |
36 References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In: B. N. Petrov and F. Csáki, eds., 2nd International Symposium on Information Theory. Akadémiai Kiadó, Budapest.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19.
Shibata, R. (1981). An optimal selection of regression variables. Biometrika 68.
37 H. Akaike
Hirotugu Akaike
Math 120 Final Exam Practice Problems, Form: A
Math 120 Final Exam Practice Problems, Form: A Name: While every attempt was made to be complete in the types of problems given below, we make no guarantees about the completeness of the problems. Specifically,
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
Indices of Model Fit STRUCTURAL EQUATION MODELING 2013
Indices of Model Fit STRUCTURAL EQUATION MODELING 2013 Indices of Model Fit A recommended minimal set of fit indices that should be reported and interpreted when reporting the results of SEM analyses:
How To Understand The Theory Of Probability
Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL
Penalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood
Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
Chapter 6: Multivariate Cointegration Analysis
Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration
Random variables, probability distributions, binomial random variable
Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that
SAS Code to Select the Best Multiple Linear Regression Model for Multivariate Data Using Information Criteria
Paper SA01_05 SAS Code to Select the Best Multiple Linear Regression Model for Multivariate Data Using Information Criteria Dennis J. Beal, Science Applications International Corporation, Oak Ridge, TN
Panel Data Econometrics
Panel Data Econometrics Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans University of Orléans January 2010 De nition A longitudinal, or panel, data set is
Machine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
Review Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
MACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
Inference of Probability Distributions for Trust and Security applications
Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations
5.1 Identifying the Target Parameter
University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying
Time Series Analysis
Time Series Analysis [email protected] Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:
Mean squared error matrix comparison of least aquares and Stein-rule estimators for regression coefficients under non-normal disturbances
METRON - International Journal of Statistics 2008, vol. LXVI, n. 3, pp. 285-298 SHALABH HELGE TOUTENBURG CHRISTIAN HEUMANN Mean squared error matrix comparison of least aquares and Stein-rule estimators
These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
