Functional Principal Components Analysis with Survey Data


First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008

Functional Principal Components Analysis with Survey Data

Hervé CARDOT, Mohamed CHAOUCH, Camelia GOGA & Catherine LABRUÈRE
Institut de Mathématiques de Bourgogne, Université de Bourgogne, 9 Avenue Alain Savary, BP 47870, DIJON Cedex, FRANCE.
{herve.cardot, mohamed.chaouch, camelia.goga, catherine.labruere}@u-bourgogne.fr

Abstract

This work aims at performing Functional Principal Components Analysis (FPCA) with Horvitz-Thompson estimators when the curves are collected with survey sampling techniques. Linearization approaches based on the influence function allow us to derive estimators of the asymptotic variance of the eigenelements of the FPCA. The method is illustrated with simulations which confirm the good properties of the linearization technique.

1. Introduction

Functional Data Analysis, whose main purpose is to provide tools for describing and modeling sets of curves, is a topic of growing interest in the statistical community. The books by Ramsay and Silverman (2002, 2005) give an interesting description of the available procedures for dealing with functional observations. These functional approaches have proved useful in various domains such as chemometrics, economics, climatology, biology and remote sensing. In a first step, the statistician generally wants to represent a set of random curves as well as possible in a low-dimensional space, in order to obtain a description of the functional data that allows interpretation. Functional principal components analysis (FPCA) provides a low-dimensional space which captures the main modes of variability of the data (see Ramsay and Silverman, 2002, for more details).

The way the data are collected is seldom taken into account in the literature, and one generally supposes that the data are independent realizations of a common functional distribution. However, there are cases in which this assumption is not fulfilled, for example when the realizations result from a sampling scheme. For instance, Dessertaine (2006) considers the estimation, with time series procedures, of the global demand for electricity at fine time scales from the observation of individual electricity consumption curves. More generally, there are now data (data streams) produced automatically by large numbers of distributed sensors, which generate huge amounts of data that can be seen as functional. The use of sampling techniques to collect them, proposed for instance in Chiky and Hébrail (2007), seems to be a relevant approach in such a framework, allowing a trade-off between storage capacity and accuracy of the data.

We propose in this work estimators for functional principal components analysis when the curves are collected with survey sampling strategies. Let us note that Skinner et al. (1986) studied some properties of multivariate PCA in a survey framework. The functional framework is different since the eigenfunctions, which exhibit the main modes of variability of the data, are themselves functions and can be naturally interpreted as modes of variability varying along time. In this functional framework, we estimate the mean function and the covariance operator using the Horvitz-Thompson estimator. The eigenelements are estimated by diagonalization of the estimated covariance operator. In order to calculate and estimate the variance of the so-constructed estimators, we use the influence function linearization method introduced by Deville (1999).

This paper is organized as follows: Section 2 presents functional principal components analysis in the setting of finite populations and then defines the Horvitz-Thompson estimator in this functional framework. The generality of the influence function allows us to extend, in Section 3, the estimators proposed by Deville to our functional objects and to obtain asymptotic variances with the help of perturbation theory (Kato, 1966). Section 4 proposes a simulation study which shows the good behavior of our estimators for various sampling schemes as well as good approximations to their theoretical variances.

2. FPCA and sampling

2.1 FPCA in a finite population setting

Let us consider a finite population U = {1, ..., k, ..., N}, with size N not necessarily known, and a functional variable Y defined for each element k of the population U: Y_k = (Y_k(t))_{t ∈ [0,1]} belongs to the separable Hilbert space L²[0,1] of square integrable functions defined on the closed interval [0,1], equipped with the usual inner product ⟨·,·⟩ and norm ‖·‖. The mean function µ ∈ L²[0,1] is defined by

    µ(t) = (1/N) Σ_{k ∈ U} Y_k(t),   t ∈ [0,1],   (1)

and the covariance operator Γ by

    Γ = (1/N) Σ_{k ∈ U} (Y_k − µ) ⊗ (Y_k − µ),   (2)

where the tensor product of two elements a and b of L²[0,1] is the rank-one operator such that a ⊗ b(u) = ⟨a, u⟩ b for all u in L²[0,1].
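As a minimal numerical sketch (not part of the original paper), the population quantities (1) and (2) can be approximated on curves discretized over a common grid of p points, so that integrals reduce to Riemann sums. All names and the Brownian-motion population below are assumptions of the sketch, chosen to echo the simulation setting of Section 4.

    import numpy as np

    # Hypothetical finite population of N curves observed on p equispaced points of [0, 1]
    rng = np.random.default_rng(0)
    N, p = 10000, 100
    t = np.linspace(0, 1, p)                  # common observation grid
    # Example curves: Brownian motion paths (independent increments of variance 1/p)
    Y = np.cumsum(rng.normal(scale=np.sqrt(1.0 / p), size=(N, p)), axis=1)

    mu = Y.mean(axis=0)                       # mean function (1) evaluated on the grid
    Yc = Y - mu                               # centered curves
    Gamma = (Yc.T @ Yc) / N                   # kernel of the covariance operator (2), a p x p matrix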

The operator Γ is symmetric and non-negative (⟨Γu, u⟩ ≥ 0). Its eigenvalues, sorted in decreasing order, λ_1 ≥ λ_2 ≥ ... ≥ λ_N ≥ 0, satisfy

    Γ v_j(t) = λ_j v_j(t),   t ∈ [0,1],   (3)

where the eigenfunctions v_j form an orthonormal system in L²[0,1], i.e. ⟨v_j, v_j'⟩ = 1 if j = j' and zero otherwise. We can now write an expansion similar to the Karhunen-Loève expansion, or FPCA, which gives the best approximation of the curves of the population in a finite-dimensional space of dimension q:

    Y_k(t) ≈ µ(t) + Σ_{j=1}^{q} ⟨Y_k − µ, v_j⟩ v_j(t),   t ∈ [0,1].

The eigenfunctions v_j indicate the main modes of variation of the data along time t around the mean µ, and the variance explained by the projection onto each v_j is given by the eigenvalue

    λ_j = (1/N) Σ_{k ∈ U} ⟨Y_k − µ, v_j⟩².

We aim at estimating the mean function µ and the covariance operator Γ in order to deduce estimators of the eigenelements (λ_j, v_j) when the data are obtained with survey sampling procedures.

2.2 The Horvitz-Thompson estimator

We consider a sample s of n individuals, i.e. a subset s ⊂ U, selected according to a probabilistic procedure p(s), where p is a probability distribution on the set of the 2^N subsets of U. We denote by π_k = Pr(k ∈ s), for all k ∈ U, the first-order inclusion probabilities and by π_kl = Pr(k ∈ s and l ∈ s), for all k, l ∈ U with k ≠ l, the second-order inclusion probabilities. We suppose that π_k > 0 and π_kl > 0, and that π_k and π_kl do not depend on t ∈ [0,1]. We propose to estimate the mean function µ and the covariance operator Γ by replacing each total with the corresponding Horvitz-Thompson (HT) estimator (Horvitz and Thompson, 1952). We obtain

    µ̂ = (1/N̂) Σ_{k ∈ s} Y_k / π_k,   (4)

    Γ̂ = (1/N̂) Σ_{k ∈ s} (Y_k ⊗ Y_k) / π_k − µ̂ ⊗ µ̂,   (5)

where the size N of the population is estimated by N̂ = Σ_{k ∈ s} 1/π_k when it is not known. Then estimators of the eigenfunctions {v̂_j, j = 1, ..., q} and of the eigenvalues {λ̂_j, j = 1, ..., q} are obtained readily by diagonalisation (or spectral analysis) of the estimated covariance operator Γ̂. Let us note that the eigenelements of the covariance operator are not linear functionals of the data.
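Continuing the sketch above, the following hedged illustration draws a simple random sample without replacement, forms the Horvitz-Thompson estimators (4)-(5) and extracts their eigenelements by diagonalization. The inclusion probabilities, helper names and the Riemann-sum treatment of the L²[0,1] inner product are assumptions of the sketch, not prescriptions of the paper.

    # Simple random sampling without replacement, so pi_k = n/N for every sampled unit
    n = 100
    s = rng.choice(N, size=n, replace=False)
    pi = np.full(n, n / N)                            # first-order inclusion probabilities on the sample

    N_hat = np.sum(1.0 / pi)                          # estimated population size
    mu_hat = (Y[s] / pi[:, None]).sum(axis=0) / N_hat                              # estimator (4)
    Gamma_hat = (Y[s].T * (1.0 / pi)) @ Y[s] / N_hat - np.outer(mu_hat, mu_hat)    # estimator (5)

    # Diagonalization: the L2[0,1] inner product is approximated by a Riemann sum with step 1/p,
    # so the matrix of the discretized operator is Gamma_hat / p.
    vals, vecs = np.linalg.eigh(Gamma_hat / p)
    order = np.argsort(vals)[::-1]
    lam_hat = vals[order]                             # estimated eigenvalues, in decreasing order
    v_hat = (vecs[:, order] * np.sqrt(p)).T           # rows: eigenfunctions with unit L2[0,1] norm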

3. Linearization by influence function

We would like to calculate and estimate the variance of µ̂, v̂_j and λ̂_j. The nonlinearity of these estimators and the functional nature of Y make the variance estimation issue difficult. For this reason, we adapt the influence function linearization technique introduced by Deville (1999) to the functional framework. Let us consider the discrete measure M defined on L²[0,1] by

    M = Σ_{k ∈ U} δ_{Y_k},

where δ_{Y_k} is the Dirac measure taking value 1 if Y = Y_k and zero otherwise. Let us suppose that each parameter of interest can be written as a functional T of M. For example, N(M) = ∫ dM, µ(M) = ∫ Y dM / ∫ dM and Γ(M) = ∫ (Y − µ(M)) ⊗ (Y − µ(M)) dM / ∫ dM. The eigenelements given by (3) are implicit functionals T of M. The measure M is estimated by the random measure M̂ defined by

    M̂ = Σ_{k ∈ U} δ_{Y_k} I_k / π_k,

with I_k = 1_{{k ∈ s}}. Then the estimators given by (4) and (5) are obtained by substituting M̂ for M, namely they are written as functionals T of M̂.

3.1 Asymptotic properties

We give in this section the asymptotic properties of our estimators. In order to do that, one needs the population and sample sizes to tend to infinity. We use the asymptotic framework introduced by Isaki & Fuller (1982). Let us suppose the following assumptions:

(A1) sup_{k ∈ U} ‖Y_k‖ ≤ C < ∞,
(A2) lim_{N→∞} n/N = π ∈ (0, 1),
(A3) min_{k ∈ U} π_k ≥ λ > 0, min_{k ≠ l} π_kl ≥ λ* > 0 and lim_{N→∞} n max_{k ≠ l} |π_kl − π_k π_l| < ∞,

where λ and λ* are two positive constants. We also suppose that the functional T giving the parameter of interest is a homogeneous functional of degree α, namely T(rM) = r^α T(M), and that lim_{N→∞} N^{−α} T(M) < ∞. For example, µ and Γ are functionals of degree zero with respect to M, and the eigenelements of Γ are also functionals of degree zero with respect to M. Let us also introduce the Hilbert-Schmidt norm, denoted by ‖·‖_2, for operators mapping L²[0,1] to L²[0,1]. We show in the next proposition that our estimators are asymptotically design unbiased, lim_{N→∞} (E_p(T(M̂)) − T(M)) = 0, and consistent, namely for any fixed ε > 0 we have lim_{N→∞} P(‖T(M̂) − T(M)‖ > ε) = 0. Here, E_p(·) denotes the expectation with respect to the sampling design p(s).

Proposition 1. Under hypotheses (A1), (A2) and (A3),

    E_p ‖µ̂ − µ‖² = O(n⁻¹),   E_p ‖Γ̂ − Γ‖²_2 = O(n⁻¹).

If we suppose, in addition, that the non-null eigenvalues are distinct, we also have

    E_p ( sup_j |λ̂_j − λ_j| )² = O(n⁻¹),   E_p ‖v̂_j − v_j‖² = O(n⁻¹)   for each fixed j.
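The rates in Proposition 1 can be checked numerically on the sketch above. The small Monte Carlo experiment below is purely illustrative (hypothetical replication numbers, simple random sampling without replacement); it estimates the design mean squared error E_p ‖µ̂ − µ‖² for two sample sizes and should show roughly a tenfold decrease when n grows from 100 to 1000.

    def ht_mean(curves, sample, probs):
        # Horvitz-Thompson estimator (4) of the mean function over a given sample
        n_hat = np.sum(1.0 / probs)
        return (curves[sample] / probs[:, None]).sum(axis=0) / n_hat

    for m in (100, 1000):
        sq_errors = []
        for _ in range(200):                                  # 200 replications of the design
            idx = rng.choice(N, size=m, replace=False)        # SRSWOR of size m
            probs = np.full(m, m / N)
            diff = ht_mean(Y, idx, probs) - mu
            sq_errors.append(np.sum(diff ** 2) / p)           # squared L2[0,1] norm via a Riemann sum
        print(m, np.mean(sq_errors))                          # Monte Carlo estimate of E_p ||mu_hat - mu||^2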

3.2 Variance approximation and estimation

Let us define, when it exists, the influence function of a functional T at a point Y ∈ L²[0,1], denoted IT(M, Y), by

    IT(M, Y) = lim_{h→0} [T(M + h δ_Y) − T(M)] / h,

where δ_Y is the Dirac measure at Y.

Proposition 2. Under assumption (A1), the influence functions of µ and Γ exist and

    Iµ(M, Y_k) = (Y_k − µ)/N,   IΓ(M, Y_k) = (1/N) ((Y_k − µ) ⊗ (Y_k − µ) − Γ).

If the non-null eigenvalues of Γ are distinct, then

    Iλ_j(M, Y_k) = (1/N) (⟨Y_k − µ, v_j⟩² − λ_j),

    Iv_j(M, Y_k) = (1/N) Σ_{l ≠ j} [⟨Y_k − µ, v_j⟩ ⟨Y_k − µ, v_l⟩ / (λ_j − λ_l)] v_l.

In order to obtain the asymptotic variance of T(M̂) for T given by (1), (2) and (3), we write the first-order von Mises expansion of our functional at M̂/N around M/N and use the fact that T is of degree zero and that IT(M/N, Y_k) = N IT(M, Y_k):

    T(M̂) = T(M) + Σ_{k ∈ U} IT(M, Y_k) (I_k/π_k − 1) + R_T(M̂/N, M/N).

Proposition 3. Suppose the hypotheses (A1), (A2) and (A3) are fulfilled, consider the functionals T giving the parameters of interest defined in (1), (2) and (3), and suppose that the non-null eigenvalues are distinct. Then ‖R_T(M̂/N, M/N)‖ = o_p(n^{-1/2}) and the asymptotic variance of T(M̂) is equal to

    V_p [ Σ_{k ∈ s} IT(M, Y_k) I_k/π_k ] = Σ_{k ∈ U} Σ_{l ∈ U} (π_kl − π_k π_l) IT(M, Y_k) IT(M, Y_l) / (π_k π_l).

One can remark that the asymptotic variance given by the above result is not known. We propose to estimate it by the HT variance estimator, with IT(M, Y_k) replaced by its HT estimator. We obtain

    V̂_p(µ̂) = (1/N̂²) Σ_{k ∈ s} Σ_{l ∈ s} (Δ_kl/π_kl) (Y_k − µ̂) ⊗ (Y_l − µ̂) / (π_k π_l),

    V̂_p(λ̂_j) = (1/N̂²) Σ_{k ∈ s} Σ_{l ∈ s} (Δ_kl/π_kl) (⟨Y_k − µ̂, v̂_j⟩² − λ̂_j) (⟨Y_l − µ̂, v̂_j⟩² − λ̂_j) / (π_k π_l),

    V̂_p(v̂_j) = Σ_{k ∈ s} Σ_{l ∈ s} (Δ_kl/π_kl) Îv_j(M, Y_k) ⊗ Îv_j(M, Y_l) / (π_k π_l),

where Δ_kl = π_kl − π_k π_l and

    Îv_j(M, Y_k) = (1/N̂) Σ_{l ≠ j} [⟨Y_k − µ̂, v̂_j⟩ ⟨Y_k − µ̂, v̂_l⟩ / (λ̂_j − λ̂_l)] v̂_l.

Cardot et al. (2007) show that, under assumptions (A1)-(A3), these estimators are asymptotically design unbiased and consistent.
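As a hedged illustration of these variance estimators, and again continuing the earlier sketch (hypothetical names; simplifications valid for simple random sampling without replacement, where π_k = n/N and π_kl = n(n−1)/(N(N−1)) for k ≠ l), the following computes V̂_p(λ̂_1):

    # Plug-in influence values for the first eigenvalue: N_hat * I_hat_lambda_1(M, Y_k), k in s
    j = 0
    Yc_s = Y[s] - mu_hat
    scores = (Yc_s @ v_hat[j]) / p                    # <Y_k - mu_hat, v_hat_j> via a Riemann sum
    u = scores ** 2 - lam_hat[j]

    # First- and second-order inclusion probabilities under SRSWOR
    pi1 = n / N
    pi2 = n * (n - 1) / (N * (N - 1))
    Delta = np.full((n, n), pi2 - pi1 ** 2)           # Delta_kl = pi_kl - pi_k * pi_l
    np.fill_diagonal(Delta, pi1 - pi1 ** 2)
    Pi = np.full((n, n), pi2)                         # pi_kl, with pi_kk = pi_k on the diagonal
    np.fill_diagonal(Pi, pi1)

    w = u / pi1
    var_lam1_hat = (Delta / Pi * np.outer(w, w)).sum() / N_hat ** 2
    print(var_lam1_hat, np.sqrt(var_lam1_hat))        # estimated variance and standard error of lambda_hat_1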

4. A simulation study

In our simulations all functional variables are discretized at p = 100 equispaced points of the interval [0,1]. We consider a random variable Y distributed as a Brownian motion on [0,1]. We generate N realizations of Y and then construct two strata U_1 and U_2 with different variances and with sizes N_1 = 7000 and N_2. Our population U is the union of the two strata. We then estimate the eigenelements of the covariance operator for two different sampling designs (simple random sampling without replacement, SRSWOR, and stratified sampling) and two sample sizes, n = 100 and n = 1000. To evaluate our estimation procedures we perform 500 replications of this experiment. Estimation errors for the first eigenvalue and the first eigenvector are evaluated with the loss criteria |λ̂_1 − λ_1| / λ_1 and ‖v̂_1 − v_1‖ / ‖v_1‖, where ‖·‖ denotes the Euclidean norm. The linear approximation by the influence function gives a reasonable estimation of the variance for small sample sizes and accurate estimations once n gets large enough (n = 1000). We also note that the variance of the estimators obtained with stratified sampling turns out to be smaller than with SRSWOR.

References

Cardot, H., Chaouch, M., Goga, C. and Labruère, C. (2007). Functional Principal Components Analysis with Survey Data. Preprint.

Chiky, R. and Hébrail, G. (2007). Generic tool for summarizing distributed data streams. Preprint.

Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for the principal component analysis of a random vector function: some applications to statistical inference. J. Multivariate Anal., 12.

Dessertaine, A. (2006). Sondage et séries temporelles : une application pour la prévision de la consommation électrique. 38èmes Journées de Statistique, Clamart, June 2006.

Deville, J.C. (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25.

Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. J. Am. Statist. Ass., 47.

Isaki, C.T. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. J. Am. Statist. Ass., 77.

Kato, T. (1966). Perturbation Theory for Linear Operators. Springer-Verlag, Berlin.

Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis. 2nd ed., Springer-Verlag.

Skinner, C.J., Holmes, D.J. and Smith, T.M.F. (1986). The Effect of Sample Design on Principal Components Analysis. J. Am. Statist. Ass., 81.
