Functional Principal Components Analysis with Survey Data
First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008

Functional Principal Components Analysis with Survey Data

Hervé CARDOT, Mohamed CHAOUCH, Camelia GOGA & Catherine LABRUÈRE
Institut de Mathématiques de Bourgogne, Université de Bourgogne, 9 Avenue Alain Savary, BP 47870, DIJON Cedex, FRANCE.
{herve.cardot, mohamed.chaouch, camelia.goga, catherine.labruere}@u-bourgogne.fr

Abstract

This work aims at performing Functional Principal Components Analysis (FPCA) by means of Horvitz-Thompson estimators when the curves are collected with survey sampling techniques. Linearization approaches based on the influence function allow us to derive estimators of the asymptotic variance of the eigenelements of the FPCA. The method is illustrated with simulations, which confirm the good properties of the linearization technique.

1. Introduction

Functional Data Analysis, whose main purpose is to provide tools for describing and modeling sets of curves, is a topic of growing interest in the statistical community. The books by Ramsay and Silverman (2002, 2005) give an interesting overview of the available procedures for dealing with functional observations. These functional approaches have proved useful in various domains such as chemometrics, economics, climatology, biology and remote sensing. In a first step, the statistician generally wants to represent a set of random curves as well as possible in a space of small dimension, in order to obtain a description of the functional data that allows interpretation. Functional principal components analysis (FPCA) provides a small-dimension space which captures the main modes of variability of the data (see Ramsay and Silverman, 2002, for more details).
The way the data are collected is seldom taken into account in the literature, and one generally supposes the data are independent realizations of a common functional distribution. However, there are cases in which this assumption does not hold, for example when the realizations result from a sampling scheme. For instance, Dessertaine (2006) considers the estimation, with time series procedures, of a global demand for electricity at fine time scales from the observation of individual electricity consumption curves. More generally, there are now data (data streams) produced automatically by large numbers of distributed sensors, which generate huge amounts of data that can be seen as functional. Using sampling techniques to collect them, as proposed for instance in Chiky and Hébrail (2007), seems a relevant approach in such a framework, allowing a trade-off between storage capacity and accuracy of the data. We propose in this work estimators of functional principal components analysis when the curves are collected with survey sampling strategies. Let us note that Skinner et al. (1986) studied some properties of multivariate PCA in a survey framework. The functional framework is different since the eigenfunctions, which exhibit the main modes of variability of the data, are themselves functions and can be naturally interpreted as modes of variability varying along time. In this new functional framework, we estimate the mean function and the covariance operator using the Horvitz-Thompson estimator. The eigenelements are then estimated by diagonalization of the estimated covariance operator. In order to calculate and estimate the variance of the so-constructed estimators, we use the influence function linearization method introduced by Deville (1999).
This paper is organized as follows: Section 2 presents functional principal components analysis in the setting of finite populations and then defines the Horvitz-Thompson estimator in this new functional framework. The generality of the influence function allows us to extend, in Section 3, the estimators proposed by Deville to our functional objects and to obtain asymptotic variances with the help of perturbation theory (Kato, 1966). Section 4 presents a simulation study which shows the good behavior of our estimators for various sampling schemes, as well as good approximations to their theoretical variances.

2. FPCA and sampling

2.1 FPCA in a finite population setting

Let us consider a finite population $U = \{1, \ldots, k, \ldots, N\}$ with size $N$ not necessarily known, and a functional variable $Y$ defined for each element $k$ of the population $U$: $Y_k = (Y_k(t))_{t \in [0,1]}$ belongs to the separable Hilbert space $L^2[0,1]$ of square integrable functions defined on the closed interval $[0,1]$, equipped with the usual inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$. The mean function $\mu \in L^2[0,1]$ is defined by

$$\mu(t) = \frac{1}{N} \sum_{k \in U} Y_k(t), \quad t \in [0,1], \qquad (1)$$

and the covariance operator $\Gamma$ by

$$\Gamma = \frac{1}{N} \sum_{k \in U} (Y_k - \mu) \otimes (Y_k - \mu), \qquad (2)$$

where the tensor product of two elements $a$ and $b$ of $L^2[0,1]$ is the rank one operator such that $a \otimes b(u) = \langle a, u \rangle b$ for all $u$ in $L^2[0,1]$. The operator $\Gamma$ is symmetric and non-negative ($\langle \Gamma u, u \rangle \geq 0$). Its eigenvalues, sorted in decreasing order, $\lambda_1 \geq \lambda_2 \geq \cdots \geq 0$, satisfy

$$\Gamma v_j(t) = \lambda_j v_j(t), \quad t \in [0,1], \qquad (3)$$

where the eigenfunctions $v_j$ form an orthonormal system in $L^2[0,1]$, i.e. $\langle v_j, v_{j'} \rangle = 1$ if $j = j'$ and zero otherwise. We now obtain an expansion similar to the Karhunen-Loève expansion, or FPCA, which gives the best approximation of the curves of the population in a finite dimensional space of dimension $q$:

$$Y_k(t) \approx \mu(t) + \sum_{j=1}^{q} \langle Y_k - \mu, v_j \rangle \, v_j(t), \quad t \in [0,1].$$

The eigenfunctions $v_j$ indicate the main modes of variation of the data along time $t$ around the mean $\mu$, and the variance explained by the projection onto each $v_j$ is given by the eigenvalue $\lambda_j = \frac{1}{N} \sum_{k \in U} \langle Y_k - \mu, v_j \rangle^2$. We aim at estimating the mean function $\mu$ and the covariance operator $\Gamma$ in order to deduce estimators of the eigenelements $(\lambda_j, v_j)$ when the data are obtained with survey sampling procedures.

2.2 The Horvitz-Thompson estimator

We consider a sample $s$ of $n$ individuals, i.e. a subset $s \subset U$, selected according to a probabilistic procedure $p(s)$, where $p$ is a probability distribution on the set of the $2^N$ subsets of $U$. We denote by $\pi_k = \Pr(k \in s)$, for all $k \in U$, the first order inclusion probabilities, and by $\pi_{kl} = \Pr(k \in s \ \& \ l \in s)$, for all $k, l \in U$ with $k \neq l$, the second order inclusion probabilities. We suppose that $\pi_k > 0$ and $\pi_{kl} > 0$. We also suppose that $\pi_k$ and $\pi_{kl}$ do not depend on $t \in [0,1]$. We propose to estimate the mean function $\mu$ and the covariance operator $\Gamma$ by replacing each total with the corresponding Horvitz-Thompson (HT) estimator (Horvitz and Thompson, 1952). We obtain

$$\widehat{\mu} = \frac{1}{\widehat{N}} \sum_{k \in s} \frac{Y_k}{\pi_k}, \qquad (4)$$

$$\widehat{\Gamma} = \frac{1}{\widehat{N}} \sum_{k \in s} \frac{Y_k \otimes Y_k}{\pi_k} - \widehat{\mu} \otimes \widehat{\mu}, \qquad (5)$$

where the size of the population is estimated by $\widehat{N} = \sum_{k \in s} 1/\pi_k$ when it is not known. Then estimators of the eigenfunctions $\{\widehat{v}_j, j = 1, \ldots, q\}$ and eigenvalues $\{\widehat{\lambda}_j, j = 1, \ldots, q\}$
are obtained readily by diagonalization (or spectral analysis) of the estimated covariance operator $\widehat{\Gamma}$. Let us note that the eigenelements of the covariance operator are not linear functionals.

3. Linearization by influence function

We would like to calculate and estimate the variance of $\widehat{\mu}$, $\widehat{v}_j$ and $\widehat{\lambda}_j$. The nonlinearity of these estimators and the functional nature of $Y$ make the variance estimation issue difficult. For this reason, we adapt the influence function linearization technique introduced by Deville (1999) to the functional framework. Let us consider the discrete measure $M$ defined on $L^2[0,1]$ by $M = \sum_{k \in U} \delta_{Y_k}$, where $\delta_{Y_k}$ is the Dirac measure taking value 1 if $Y = Y_k$ and zero otherwise. Let us suppose that each parameter of interest can be written as a functional $T$ of $M$. For example, $N(M) = \int dM$, $\mu(M) = \int Y \, dM / \int dM$ and $\Gamma(M) = \int (Y - \mu(M)) \otimes (Y - \mu(M)) \, dM / \int dM$. The eigenelements given by (3) are implicit functionals $T$ of $M$. The measure $M$ is estimated by the random measure

$$\widehat{M} = \sum_{k \in U} \delta_{Y_k} \frac{I_k}{\pi_k}, \quad \text{with } I_k = 1_{\{k \in s\}}.$$

Then the estimators given by (4) and (5) are obtained by substituting $\widehat{M}$ for $M$, namely they are written as functionals $T$ of $\widehat{M}$.

3.1 Asymptotic Properties

We give in this section the asymptotic properties of our estimators. To do so, one needs the population and sample sizes to tend to infinity. We use the asymptotic framework introduced by Isaki & Fuller (1982). Let us suppose the following assumptions:

(A1) $\sup_{k \in U} \|Y_k\| \leq C < \infty$,
(A2) $\lim_{N \to \infty} n/N = \pi \in (0,1)$,
(A3) $\min_{k} \pi_k \geq \lambda > 0$, $\min_{k \neq l} \pi_{kl} \geq \lambda^* > 0$ and $\limsup_{N \to \infty} \, n \max_{k \neq l} |\pi_{kl} - \pi_k \pi_l| < \infty$,

where $\lambda$ and $\lambda^*$ are two positive constants. We also suppose that the functional $T$ giving the parameter of interest is a homogeneous functional of degree $\alpha$, namely $T(rM) = r^{\alpha} T(M)$, with $\lim_{N \to \infty} N^{-\alpha} T(M) < \infty$. For example, $\mu$ and $\Gamma$ are functionals of degree zero with respect to $M$. Let us note that the eigenelements of $\Gamma$ are also functionals of degree zero with respect to $M$.
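As a concrete illustration of the estimators (4)-(5) and their diagonalization, here is a minimal numerical sketch on curves discretized at $p$ grid points. This is our own code, not the authors'; the function name `fpca_ht` and the simple rectangle-rule quadrature are assumptions made for the example.

```python
import numpy as np

def fpca_ht(Y, pi, t_grid, q=2):
    """Horvitz-Thompson FPCA sketch (hypothetical helper).

    Y      : (n, p) array of sampled curves discretized on t_grid
    pi     : (n,) first-order inclusion probabilities of the sampled units
    t_grid : (p,) equispaced points in [0, 1]
    """
    w = 1.0 / pi                        # design weights 1/pi_k
    N_hat = w.sum()                     # estimated population size, eq. below (5)
    mu_hat = (w[:, None] * Y).sum(axis=0) / N_hat                  # eq. (4)
    # Estimated covariance operator, eq. (5), as a p x p kernel matrix
    Gamma_hat = (w[:, None, None] * Y[:, :, None] * Y[:, None, :]).sum(axis=0) / N_hat
    Gamma_hat -= np.outer(mu_hat, mu_hat)
    # Rectangle-rule quadrature so discrete eigenelements approximate the L2[0,1] ones
    h = t_grid[1] - t_grid[0]
    vals, vecs = np.linalg.eigh(Gamma_hat * h)     # ascending eigenvalues
    order = np.argsort(vals)[::-1]
    lam = vals[order][:q]                          # eigenvalue estimates
    v = vecs[:, order][:, :q] / np.sqrt(h)         # eigenfunctions with L2 norm 1
    return mu_hat, lam, v
```

With equal inclusion probabilities the weights cancel and $\widehat{\mu}$ reduces to the sample mean curve, which gives a quick sanity check of the implementation.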
Let us also introduce the Hilbert-Schmidt norm, denoted by $\|\cdot\|_2$, for operators mapping $L^2[0,1]$ to $L^2[0,1]$. We show in the next proposition that our estimators are asymptotically design unbiased, $\lim_{N \to \infty} \left( E_p(T(\widehat{M})) - T(M) \right) = 0$, and consistent, namely for any fixed $\varepsilon > 0$ we have $\lim_{N \to \infty} \Pr\left( \|T(\widehat{M}) - T(M)\| > \varepsilon \right) = 0$. Here, $E_p(\cdot)$ is the expectation with respect to $p(s)$.

Proposition 1. Under hypotheses (A1), (A2) and (A3),
$$E_p \|\widehat{\mu} - \mu\|^2 = O(n^{-1}), \qquad E_p \|\widehat{\Gamma} - \Gamma\|_2^2 = O(n^{-1}).$$
If we suppose that the non-null eigenvalues are distinct, we also have
$$E_p \left( \sup_j |\widehat{\lambda}_j - \lambda_j| \right)^2 = O(n^{-1}), \qquad E_p \|\widehat{v}_j - v_j\|^2 = O(n^{-1}) \ \text{for each fixed } j.$$

3.2 Variance approximation and estimation

Let us define, when it exists, the influence function of a functional $T$ at a point $Y \in L^2[0,1]$, say $IT(M, Y)$, as
$$IT(M, Y) = \lim_{h \to 0} \frac{T(M + h \delta_Y) - T(M)}{h},$$
where $\delta_Y$ is the Dirac measure at $Y$.

Proposition 2. Under assumption (A1), the influence functions of $\mu$ and $\Gamma$ exist and
$$I\mu(M, Y_k) = \frac{1}{N}(Y_k - \mu), \qquad I\Gamma(M, Y_k) = \frac{1}{N}\left( (Y_k - \mu) \otimes (Y_k - \mu) - \Gamma \right).$$
If the non-null eigenvalues of $\Gamma$ are distinct, then
$$I\lambda_j(M, Y_k) = \frac{1}{N}\left( \langle Y_k - \mu, v_j \rangle^2 - \lambda_j \right),$$
$$Iv_j(M, Y_k) = \frac{1}{N} \sum_{l \neq j} \frac{\langle Y_k - \mu, v_j \rangle \langle Y_k - \mu, v_l \rangle}{\lambda_j - \lambda_l} \, v_l.$$

In order to obtain the asymptotic variance of $T(\widehat{M})$ for $T$ given by (3), (4) and (5), we write the first-order von Mises expansion of our functional at $\widehat{M}/N$ near $M/N$ and use the fact that $T$ is of degree 0 and $IT(M/N, Y_k) = N \, IT(M, Y_k)$:
$$T(\widehat{M}) = T(M) + \sum_{k \in U} IT(M, Y_k) \left( \frac{I_k}{\pi_k} - 1 \right) + R_T\left( \frac{\widehat{M}}{N}, \frac{M}{N} \right).$$

Proposition 3. Suppose the hypotheses (A1), (A2) and (A3) are fulfilled, and consider the functionals $T$ giving the parameters of interest defined in (3), (4) and (5). We suppose that the non-null eigenvalues are distinct. Then $\left\| R_T\left( \widehat{M}/N, M/N \right) \right\| = o_p(n^{-1/2})$ and the asymptotic variance of $T(\widehat{M})$ is equal to
$$V_p\left[ \sum_{k \in s} \frac{IT(M, Y_k)}{\pi_k} \right] = \sum_{k \in U} \sum_{l \in U} (\pi_{kl} - \pi_k \pi_l) \, \frac{IT(M, Y_k)}{\pi_k} \, \frac{IT(M, Y_l)}{\pi_l}.$$

One can remark that the asymptotic variance given by the above result is not known. We propose to estimate it by the HT variance estimator, with $IT(M, Y_k)$ replaced by its HT estimate. We obtain
$$\widehat{V}_p(\widehat{\mu}) = \frac{1}{\widehat{N}^2} \sum_{k \in s} \sum_{l \in s} \frac{\Delta_{kl}}{\pi_{kl}} \, \frac{Y_k - \widehat{\mu}}{\pi_k} \otimes \frac{Y_l - \widehat{\mu}}{\pi_l},$$
$$\widehat{V}_p(\widehat{\lambda}_j) = \frac{1}{\widehat{N}^2} \sum_{k \in s} \sum_{l \in s} \frac{\Delta_{kl}}{\pi_{kl}} \, \frac{\langle Y_k - \widehat{\mu}, \widehat{v}_j \rangle^2 - \widehat{\lambda}_j}{\pi_k} \, \frac{\langle Y_l - \widehat{\mu}, \widehat{v}_j \rangle^2 - \widehat{\lambda}_j}{\pi_l},$$
$$\widehat{V}_p(\widehat{v}_j) = \sum_{k \in s} \sum_{l \in s} \frac{\Delta_{kl}}{\pi_{kl}} \, \frac{\widehat{I}v_j(M, Y_k)}{\pi_k} \otimes \frac{\widehat{I}v_j(M, Y_l)}{\pi_l},$$
where $\Delta_{kl} = \pi_{kl} - \pi_k \pi_l$ and $\widehat{I}v_j(M, Y_k) = \frac{1}{\widehat{N}} \sum_{l \neq j} \frac{\langle Y_k - \widehat{\mu}, \widehat{v}_j \rangle \langle Y_k - \widehat{\mu}, \widehat{v}_l \rangle}{\widehat{\lambda}_j - \widehat{\lambda}_l} \, \widehat{v}_l$. Cardot et al. (2007) show that under the assumptions (A1)-(A3), these estimators are asymptotically design unbiased and consistent.

4. A simulation study

In our simulations, all functional variables are discretized at $p = 100$ equispaced points in the interval $[0,1]$. We consider a random variable $Y$ distributed as Brownian motion on $[0,1]$. We make $N$ replications of $Y$ and then construct two strata $U_1$ and $U_2$ with different variances and with sizes $N_1 = 7000$ and $N_2 = N - N_1$. Our population $U$ is the union of the two strata. We then estimate the eigenelements of the covariance operator for two different sampling designs (simple random sampling without replacement, SRSWOR, and stratified sampling) and two different sample sizes, $n = 100$ and $n = 1000$. To evaluate our estimation procedures, we make 500 replications of the previous experiment. Estimation errors for the first eigenvalue and the first eigenvector are evaluated with the following loss criteria: $|\widehat{\lambda}_1 - \lambda_1| / \lambda_1$ and $\|\widehat{v}_1 - v_1\| / \|v_1\|$, where $\|\cdot\|$ is the Euclidean norm. The linear approximation by influence function gives reasonable estimates of the variance for small sample sizes and accurate estimates as soon as $n$ gets large enough ($n = 1000$). We also note that the variance of the estimators obtained with stratified sampling turns out to be smaller than with SRSWOR.

References

Cardot, H., Chaouch, M., Goga, C. and Labruère, C. (2007). Functional Principal Components Analysis with Survey Data. Preprint.

Chiky, R. and Hébrail, G. (2007). Generic tool for summarizing distributed data streams. Preprint.

Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for the principal component analysis of a random vector function: some applications to statistical inference. J. Multivariate Anal., 12.

Dessertaine, A. (2006). Sondage et séries temporelles : une application pour la prévision de la consommation électrique. 38èmes Journées de Statistique, Clamart, June 2006.

Deville, J.C. (1999). Variance estimation for complex statistics and estimators: linearization and residual techniques. Survey Methodology, 25.

Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. J. Am. Statist. Ass., 47.

Isaki, C.T. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. J. Am. Statist. Ass., 77.

Kato, T. (1966). Perturbation Theory for Linear Operators. Springer-Verlag, Berlin.

Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis. Springer-Verlag, 2nd ed.

Skinner, C.J., Holmes, D.J. and Smith, T.M.F. (1986). The Effect of Sample Design on Principal Components Analysis. J. Am. Statist. Ass., 81.
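To make the design of Section 4 concrete, the following is a compact, hypothetical sketch of the simulation (our code, not the authors'): a Brownian-motion population split into two strata with different variances, from which the first eigenvalue is estimated under SRSWOR and under stratified sampling with proportional allocation. Stratum sizes are reduced so the example runs quickly.

```python
import numpy as np

rng = np.random.default_rng(123)
p, N1, N2 = 100, 700, 300            # reduced sizes for a quick illustration
t = np.linspace(0, 1, p)
h = t[1] - t[0]

def brownian(n, scale):
    # discretized Brownian motion with a stratum-specific scale
    return scale * np.cumsum(rng.standard_normal((n, p)) * np.sqrt(h), axis=1)

U = np.vstack([brownian(N1, 1.0), brownian(N2, 2.0)])   # two strata, different variances
N = N1 + N2

def first_eigenvalue(Y, w):
    # HT-weighted mean and covariance, then leading eigenvalue of the
    # discretized covariance operator (rectangle-rule quadrature)
    Nh = w.sum()
    mu = (w[:, None] * Y).sum(0) / Nh
    Yc = Y - mu
    G = (w[:, None, None] * Yc[:, :, None] * Yc[:, None, :]).sum(0) / Nh
    return np.linalg.eigvalsh(G * h)[-1]

# population ("census") value of lambda_1
lam1 = first_eigenvalue(U, np.ones(N))

n = 100
# SRSWOR: pi_k = n/N for every unit, so the design weight is N/n
s = rng.choice(N, size=n, replace=False)
lam1_srs = first_eigenvalue(U[s], np.full(n, N / n))

# stratified design, proportional allocation: pi_k = n_h/N_h within stratum h
n1 = round(n * N1 / N); n2 = n - n1
s1 = rng.choice(N1, size=n1, replace=False)
s2 = N1 + rng.choice(N2, size=n2, replace=False)
w_st = np.concatenate([np.full(n1, N1 / n1), np.full(n2, N2 / n2)])
lam1_st = first_eigenvalue(U[np.concatenate([s1, s2])], w_st)

print(abs(lam1_srs - lam1) / lam1, abs(lam1_st - lam1) / lam1)
```

Repeating the last block many times and averaging the two relative errors reproduces, in spirit, the comparison of the paper, where stratified sampling yields the smaller variance.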
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Lecture 3: QR, least squares, linear regression Linear Algebra Methods for Data Mining, Spring 2007, University
More informationCS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace
More informationOn the existence of multiple principal eigenvalues for some indefinite linear eigenvalue problems
RACSAM Rev. R. Acad. Cien. Serie A. Mat. VOL. 97 (3), 2003, pp. 461 466 Matemática Aplicada / Applied Mathematics Comunicación Preliminar / Preliminary Communication On the existence of multiple principal
More informationLecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 10
Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T. Heath Chapter 10 Boundary Value Problems for Ordinary Differential Equations Copyright c 2001. Reproduction
More informationBetting with the Kelly Criterion
Betting with the Kelly Criterion Jane June 2, 2010 Contents 1 Introduction 2 2 Kelly Criterion 2 3 The Stock Market 3 4 Simulations 5 5 Conclusion 8 1 Page 2 of 9 1 Introduction Gambling in all forms,
More informationCorollary. (f є C n+1 [a,b]). Proof: This follows directly from the preceding theorem using the inequality
Corollary For equidistant knots, i.e., u i = a + i (b-a)/n, we obtain with (f є C n+1 [a,b]). Proof: This follows directly from the preceding theorem using the inequality 120202: ESM4A - Numerical Methods
More informationMonte Carlo testing with Big Data
Monte Carlo testing with Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:
More informationWhy Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationPRACTICAL DATA MINING IN A LARGE UTILITY COMPANY
QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,
More informationChapter 5. Banach Spaces
9 Chapter 5 Banach Spaces Many linear equations may be formulated in terms of a suitable linear operator acting on a Banach space. In this chapter, we study Banach spaces and linear operators acting on
More information1 Teaching notes on GMM 1.
Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in
More informationNew Methods Providing High Degree Polynomials with Small Mahler Measure
New Methods Providing High Degree Polynomials with Small Mahler Measure G. Rhin and J.-M. Sac-Épée CONTENTS 1. Introduction 2. A Statistical Method 3. A Minimization Method 4. Conclusion and Prospects
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationNOTES ON LINEAR TRANSFORMATIONS
NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all
More informationGraduate Certificate in Systems Engineering
Graduate Certificate in Systems Engineering Systems Engineering is a multi-disciplinary field that aims at integrating the engineering and management functions in the development and creation of a product,
More information1 2 3 1 1 2 x = + x 2 + x 4 1 0 1
(d) If the vector b is the sum of the four columns of A, write down the complete solution to Ax = b. 1 2 3 1 1 2 x = + x 2 + x 4 1 0 0 1 0 1 2. (11 points) This problem finds the curve y = C + D 2 t which
More informationClarifying Some Issues in the Regression Analysis of Survey Data
Survey Research Methods (2007) http://w4.ub.uni-konstanz.de/srm Vol. 1, No. 1, pp. 11-18 c European Survey Research Association Clarifying Some Issues in the Regression Analysis of Survey Data Phillip
More information[1] Diagonal factorization
8.03 LA.6: Diagonalization and Orthogonal Matrices [ Diagonal factorization [2 Solving systems of first order differential equations [3 Symmetric and Orthonormal Matrices [ Diagonal factorization Recall:
More informationAn Analysis of Rank Ordered Data
An Analysis of Rank Ordered Data Krishna P Paudel, Louisiana State University Biswo N Poudel, University of California, Berkeley Michael A. Dunn, Louisiana State University Mahesh Pandit, Louisiana State
More informationReview Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
More informationIntegrating Benders decomposition within Constraint Programming
Integrating Benders decomposition within Constraint Programming Hadrien Cambazard, Narendra Jussien email: {hcambaza,jussien}@emn.fr École des Mines de Nantes, LINA CNRS FRE 2729 4 rue Alfred Kastler BP
More informationMatrix Calculations: Applications of Eigenvalues and Eigenvectors; Inner Products
Matrix Calculations: Applications of Eigenvalues and Eigenvectors; Inner Products H. Geuvers Institute for Computing and Information Sciences Intelligent Systems Version: spring 2015 H. Geuvers Version:
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationVisualization of textual data: unfolding the Kohonen maps.
Visualization of textual data: unfolding the Kohonen maps. CNRS - GET - ENST 46 rue Barrault, 75013, Paris, France (e-mail: ludovic.lebart@enst.fr) Ludovic Lebart Abstract. The Kohonen self organizing
More informationA SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS
A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of
More information