# A methodology for estimating joint probability density functions


Mauricio Monsalve
Computer Science Department, Universidad de Chile

**Abstract.** We develop a theoretical methodology for estimating joint probability density functions from marginals by transforming the set of random variables into a set of independent random variables. To test this theory in practice, we developed a software application which implements a reduced form of the methodology, and which worked well with several datasets. Finally, we discuss how to extend the methodology.

**Key words:** joint probability density functions, function estimation

## 1. Introduction

Estimating the distribution of random variables is an essential concern of statistics and its related disciplines, such as machine learning and data mining. A classic approach to estimating joint probabilities for discrete data is the Chow-Liu tree. The Chow-Liu method specifies which variables are related using a tree structure, much like a tree-structured Bayesian network [1]. The best property of Chow-Liu trees is that, if they are built as maximum likelihood estimators, they can be constructed with a spanning tree algorithm such as Kruskal's.

Later, machine learning techniques appeared. Several modeling techniques deal with the problem of computing joint probabilities, such as Bayesian networks and classification trees, but these are discrete. Neural networks and support vector machines, however, already support continuous input, which makes them somewhat adequate for estimating joint pdfs. Fuzzy logic and fuzzy sets are somewhat adequate for estimating joint pdfs too: although fuzzy methods were designed to work with discrete data, their trick was supporting continuous data. Still, none of these methods models joint pdfs analytically. (Yet almost any discrete technique can be extended to the continuous case.)
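To make the Chow-Liu construction mentioned above concrete, here is a minimal sketch (our own illustration with illustrative names, not the full algorithm of [1]): weight each pair of discrete variables by their empirical mutual information and keep a maximum-weight spanning tree using Kruskal's algorithm.

```python
from collections import Counter
from math import log

def mutual_info(xs, ys):
    # Empirical mutual information of two discrete samples of equal length.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def chow_liu_edges(columns):
    """Return the edges (i, j) of a maximum mutual-information spanning tree."""
    m = len(columns)
    weighted = sorted(((mutual_info(columns[i], columns[j]), i, j)
                       for i in range(m) for j in range(i + 1, m)),
                      reverse=True)
    parent = list(range(m))          # union-find structure for Kruskal's algorithm
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    tree = []
    for _, i, j in weighted:         # greedily add heaviest edges that close no cycle
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Because the tree has the highest total mutual information, it is the maximum likelihood tree-structured approximation of the joint distribution.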
Regarding the estimation of joint pdfs or cdfs proper, there are several multivariate distributions, such as the multivariate normal distribution, the multivariate Student distribution, the Wishart distribution (which generalizes the chi-square to multiple dimensions), etc. This is the classic approach to joint distributions. A slightly different approach is shown in [2], where Pages-Zamora and Lagunas apply spectral estimation methods to estimate joint pdfs. Spectral estimation methods are applied to power spectral density functions, which are positive and real, like pdfs. Note that these approximation techniques also apply to joint cdfs.

However, there is an active research topic concerned with the estimation of cdfs: copulas. A copula is a function defined on [0,1]^n (n dimensions) which estimates joint cdfs from marginal cdfs [3], and can naturally be extended to estimate joint pdfs. Research on copulas started with Sklar's theorem in 1959, which states that there always exists a copula C : [0,1]^n → [0,1] reproducing any given joint cdf. Each argument of the copula is a marginal cdf evaluated at its parameter, i.e.

C(F_1(x_1), ..., F_n(x_n)) = F(x_1, ..., x_n),

Preprint submitted to Reportes Técnicos, DCC, U. de Chile. April 15, 2009.

where F_i, i = 1..n, are the marginal cdfs and F(x_1,...,x_n) is the joint cdf. To make use of copulas, it is necessary to choose a function whose form is almost completely specified. We claim to have solved this problem in part, with a different approach, concerned with joint pdfs instead of joint cdfs. In the following, we deduce a method for estimating joint pdfs from sample data by transforming a set of random variables into a set of independent ones, and by computing the marginal pdfs of the latter. Then we present a software application based on this methodology and test it using several datasets. Finally, we discuss how to improve the methodology.

## 2. Our method

Let us recall two particular cases of pdfs. First, if X is a vector of independent random variables, then its joint pdf is

f(x_1, x_2, ..., x_n) = f_1(x_1) f_2(x_2) ... f_n(x_n),

and if not, one can generally write the joint pdf as

f(x_1, x_2, ..., x_n) = f_1(x_1) f_2(x_2 | x_1) ... f_n(x_n | x_1, ..., x_{n-1}),

where

f(a | b) = ∂/∂a lim_{ε→0} [ Pr(A ≤ a, B ∈ B(b, ε)) / Pr(B ∈ B(b, ε)) ],

and B(b, ε) is the ball centered at b with radius ε. What if we could reduce a joint pdf of the second form to one of the first form? If we could do that, then the problem of computing the joint pdf would reduce to performing that reduction and computing the marginals.

### 2.1. Joint pdfs from marginals

Let us suppose we can define variables Y_k = φ_k(X_1, ..., X_k), k ≤ n, so that {Y_k} is a set of independent random variables and each φ_k is increasing in its last argument. Then the joint pdf of Y = (Y_1, ..., Y_n) is

g(y_1, ..., y_n) = g_1(y_1) ... g_n(y_n).

In particular, g_k(y_k) = g_k(y_k | y_1 ... y_{k-1}) because of independence.

Now, let us focus on f_k(x_k | x_1 ... x_{k-1}), which is the derivative with respect to x_k of the conditional cdf

Pr(X_k ≤ x_k | (∀i<k) X_i = x_i).

As φ_k is increasing in its last argument:

Pr(X_k ≤ x_k | (∀i<k) X_i = x_i) = Pr(φ_k(X_1,...,X_k) ≤ φ_k(X_1,...,X_{k-1},x_k) | (∀i<k) X_i = x_i).
Replacing φ_k(X_1,...,X_k) by Y_k:

Pr(X_k ≤ x_k | (∀i<k) X_i = x_i) = Pr(Y_k ≤ φ_k(X_1,...,X_{k-1},x_k) | (∀i<k) X_i = x_i).

By using (∀i<k) X_i = x_i, we can replace φ_k(X_1,...,X_{k-1},x_k) by φ_k(x_1,...,x_k):

Pr(X_k ≤ x_k | (∀i<k) X_i = x_i) = Pr(Y_k ≤ φ_k(x_1,...,x_k) | (∀i<k) X_i = x_i).

By recognizing that the sets {X_i} and {Y_i} carry the same information, we can replace the condition (∀i<k) X_i = x_i by (∀i<k) Y_i = y_i:

Pr(X_k ≤ x_k | (∀i<k) X_i = x_i) = Pr(Y_k ≤ φ_k(x_1,...,x_k) | (∀i<k) Y_i = y_i).

And recalling that {Y_i} is a set of independent random variables, we get:

Pr(X_k ≤ x_k | (∀i<k) X_i = x_i) = Pr(Y_k ≤ φ_k(x_1,...,x_k)).

Thus, going back to f_k, and differentiating the conditional cdf with respect to x_k:

f_k(x_k | x_1 ... x_{k-1}) = ∂/∂x_k Pr(X_k ≤ x_k | (∀i<k) X_i = x_i)
f_k(x_k | x_1 ... x_{k-1}) = ∂/∂x_k Pr(Y_k ≤ φ_k(x_1,...,x_k))
f_k(x_k | x_1 ... x_{k-1}) = g_k(φ_k(x_1,...,x_k)) · ∂φ_k/∂x_k.

Therefore, the joint pdf of X_1,...,X_n is:

f(x_1,...,x_n) = Π_k [ g_k(φ_k(x_1,...,x_k)) · ∂φ_k/∂x_k ].

Note that computing f has been reduced to computing each g_k and φ_k. In particular, computing g_k corresponds to the classic problem of estimating a univariate pdf from samples. So the difficulty of estimating f was really reduced to computing the φ_k.

Note that we cannot use this approach to compute joint cdfs. If it held that

F(x_1,...,x_n) = G(φ_1(x_1), ..., φ_n(x_1,...,x_n)),

then it would also hold that

∫_{Ω_F} g(y_1,...,y_n) dY = ∫_{Ω_G} g(y_1,...,y_n) dY,

where

Ω_F = {(φ_1(X_1), φ_2(X_1,X_2), ..., φ_n(X_1,...,X_n)) : X_1 ≤ x_1 ∧ ... ∧ X_n ≤ x_n}

and

Ω_G = {(Y_1, Y_2, ..., Y_n) : Y_1 ≤ φ_1(x_1) ∧ Y_2 ≤ φ_2(x_1,x_2) ∧ ... ∧ Y_n ≤ φ_n(x_1,...,x_n)}.

The above condition requires that Ω_F = Ω_G, a condition we cannot guarantee. For example, consider X_1 = Y_1 and X_2 = Y_2 − Y_1; the event X_1 ≤ 0 ∧ X_2 ≤ 0 is equivalent to Y_1 ≤ 0 ∧ Y_2 ≤ Y_1 (Ω_F), which is different from Y_1 ≤ 0 ∧ Y_2 ≤ 0 (Ω_G). To address the problem of computing joint cdfs, we suggest using Monte Carlo methods, which are not affected by the curse of dimensionality, as their convergence rate is independent of the number of dimensions (but they are quite slow), and are simple to implement.
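The counterexample above can be checked numerically. The following sketch (plain Monte Carlo with an illustrative sample size) estimates Pr(X_1 ≤ 0, X_2 ≤ 0) for X_1 = Y_1 and X_2 = Y_2 − Y_1, with Y_1, Y_2 independent standard normals; the naive product Pr(Y_1 ≤ 0) Pr(Y_2 ≤ 0) = 0.25, while the true probability is 1/8, because (X_1, X_2) is bivariate normal with correlation −1/√2.

```python
import random

random.seed(2)
N = 200_000
hits = 0
for _ in range(N):
    y1, y2 = random.gauss(0, 1), random.gauss(0, 1)   # independent Y1, Y2
    x1, x2 = y1, y2 - y1                              # the transformed pair
    hits += (x1 <= 0) and (x2 <= 0)
estimate = hits / N
# estimate is close to 1/8 = 0.125, not to the naive product 0.25
```

This is exactly the indicator-averaging style of Monte Carlo evaluation suggested for joint cdfs; only the region Ω_F differs from the product region Ω_G.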

### 2.2. Choosing φ_k

Which φ_k functions should we use? We propose reducing the problem of computing φ_k(x_1,...,x_k) to the problem of computing ψ_k(y_1,...,y_{k-1},x_k), a function which creates a new random variable by making x_k independent of the y_i, i < k. Note that φ_k can be reduced to ψ_k, and it holds that

φ_k(x_1,...,x_k) = ψ_k(ψ_1(x_1), ψ_2(ψ_1(x_1),x_2), ψ_3(ψ_1(x_1),ψ_2(ψ_1(x_1),x_2),x_3), ..., x_k),

with ∂φ_k/∂x_k = ∂ψ_k/∂x_k. Dealing with this complicated form of ψ_k is not hard: we just have to keep a record of our transformations, using the already computed variables in the next steps:

y_1 = ψ_1(x_1),
y_2 = ψ_2(y_1, x_2),
y_3 = ψ_3(y_1, y_2, x_3),
y_4 = ψ_4(y_1, y_2, y_3, x_4),
...
y_n = ψ_n(y_1, ..., y_{n-1}, x_n).

Now that the problem is reduced to using ψ_k, the next problem consists in estimating these functions. (We leave this problem open.)

## 3. Implementation

We implemented the simplest case of this methodology, limiting ourselves to the linear case. In particular, we assume that independence is achieved when covariance is zero. Naturally, this assumption will not hold in many cases, so we also decided to evaluate the accuracy of the estimation within our software. Note that, to build the independent random variables, we use functions

y_k = ψ_k(y_1,...,y_{k-1},x_k) = x_k − Σ_{i<k} α_i y_i,

so the y_k can be obtained using a Gram-Schmidt method. Also, ∂ψ_k/∂x_k = 1, which simplifies the implementation a bit. The application was written in Java and reads CSV files. Its appearance can be seen in figure 1.

### 3.1. Supported pdfs

We supported the following pdfs in our software to compute the marginals:

- Normal distribution. We had to include this distribution because it appears commonly.
- Powered normal distribution. It is also common to find skewed bell-shaped distributions, like lognormals. To address this problem in its generality, we include the Box-Cox method, which enables our program to handle these cases.

Figure 1: Joint PDF Estimator, evaluated with the cc42a dataset.

- Power law. Power laws are also common distributions; they often appear where inequality appears.
- Exponential distribution. This distribution is common in memoryless processes, like waiting times.
- Triangular distribution. We included this distribution to handle spiked distributions.
- Uniform distribution. Uniform distributions are also common, and they are very simple to implement.

### 3.2. Goodness of fit

To assess the quality of the estimation, we decided against Pearson-style goodness of fit tests. Our data has several dimensions, and we cannot contrast a joint pdf against the data directly; there is no joint pdf of the samples themselves, since discrete values are not differentiable. Besides, histogram-based tests are computationally expensive in high dimensions. So we decided to compare the estimated joint cdf to the sample joint cdf using observations. If the estimation is good, then both cdfs should be similar, i.e. F_e(x_1,...,x_n) = F_s(x_1,...,x_n) + ε, where ε is a small error.

To measure the similarity of F_e (our joint cdf) and F_s, we generate a large sample. For each observation, we compute both F_e and F_s (we need the original data to compute the latter) and then perform a linear regression between them. In particular, we expect:

- A high correlation. That is, r² ≈ 1, where

  r² = Cov²(F_e, F_s) / (Var(F_e) Var(F_s)).

- A null intercept. That is, in F_e = a F_s + b, b ≈ 0.
- A slope of one. That is, in F_e = a F_s + b, a ≈ 1.

The above expectations led us to design a score for assessing the goodness of fit:

q = r² / ((1 + |b|)(1 + |a − 1|)).

Note that we might have used a Cramér-von Mises test to assess goodness of fit, but F_e and F_s are not uniformly distributed; in particular, relations among the variables change the way their joint cdfs distribute. So, we decided to use a least squares approach which handles ill-distributed data: weighted least squares. We split the observations into bins, and the weight of each bin is the multiplicative inverse of its number of observations.

### 3.3. Computing joint cdfs

Monte Carlo methods are slow, since their convergence rate is sublinear. In particular, we use average weighting, which converges in order O(n^{-0.5}) and is computed as

∫_S f(x_1,...,x_n) ≈ [ Σ_k f(x_1,...,x_n) I_S(x_1,...,x_n) ] / [ Σ_k f(x_1,...,x_n) ],

where the sums run over the sampled points and I_S(x_1,...,x_n) = 1 when (x_1,...,x_n) ∈ S, and is zero otherwise [4]. Note that the sample size must be large to achieve precision. However, we can compute a large number of joint cdfs efficiently. To do this, we propose creating a large sample and using it to compute several cdfs. Nevertheless, this carries the risk of consistent bias; to avoid it, we renew the sample after several uses. In our particular case, we build samples of 2000 observations and renew them after 50 computations when evaluating large numbers of joint cdfs. By doing this, we saved much computational time; what used to take 10 minutes now takes about 5 to 15 seconds. We even save the time of evaluating pdfs, since we can do that at the same time we renew the sample. (When we generate each (x_1,...,x_n), we also compute its joint pdf and record it.)

## 4. Experiments

We tested our software using several datasets. First, we tested it using four datasets randomly sampled by us.
If A, B, C are independent random variables, then the columns of Ss1 are A + 0.5B, C − A, B; those of Ss2 are A − B^1.1, B − C^1.1, A + C + AC/1000; those of Ss3 are A + AB/1000, B − log C, 20 log(C + 5); and those of Ss4 are (A + B)², (C − B)², C². Ss2, Ss3 and Ss4 were designed to evaluate the tolerance of our estimations when slight nonlinear relations appear.

Second, we used course data. These datasets correspond to the grades of students of Universidad de Chile in three different courses. We expect the presence of linear relations in these datasets.

Third, we used the voting intentions in the Chilean 1988 plebiscite dataset [5]. It has 2700 observations and involves 5 variables whose relations are somewhat obscure. (Linear relations are found but do not fully explain the data.)

And fourth, we used several datasets from the Andrews & Herzberg archive at StatLib [6]. To use them, we removed their first columns and left only numerical sampled data. (We removed text data, row numbers, etc.) The sizes of these datasets range from 10 to

Table 1: Evaluation of Joint PDF Estimator with several datasets.

| Dataset | Size | Variables | Fitness | Comments |
|---------|------|-----------|---------|----------|
| Ss1 | | | | Fine, yet slightly biased around F_s = 0. |
| Ss2 | | | | Works fine, somewhat dispersed. |
| Ss3 | | | | Somewhat disperse and biased. (Fig. 2-a.) |
| Ss4 | | | | Small bias. Dispersion increases as F_s decreases. |
| cc10b | | | | Works fine. Last variable ignored. |
| cc42a | | | | Low dispersion, but slightly biased. (Fig. 1.) |
| in50a | | | | High dispersion. Seems unbiased. (Fig. 2-b.) |
| voting | | | | High dispersion. Slight bias around F_s = 0. (Fig. 2-c.) |
| T | | | | High dispersion. |
| T | | | | Small bias. |
| T | | | | Works badly. |
| T | | | | Works badly. |
| T | | | | Works badly. |
| T | | | | Works badly. Strong bias. |
| T30.1 | | | | Works well despite sample size. (Fig. 2-d.) |
| T | | | | Disperse but overall fine. |
| T | | | | Works fine. Slight bias around F_s = 1. |
| T | | | | Works badly. Only 10 variables considered. |
| T | | | | Works badly. |
| T | | | | High dispersion. Seems unbiased. |
| T | | | | Works well. A bit disperse. |

Table 1 shows the results of the tests. Note that Fitness corresponds to the measure q we already defined, and that its value is itself subject to randomness: F_e is randomly generated (using Monte Carlo integration) and F_s is estimated from the dataset by counting (another source of error). We cannot expect q to be very close to 1 (for example, 0.97) under these conditions.

As we can see, our application worked well with several datasets. The results for datasets Ss2, Ss3 (fig. 2-a) and Ss4 show that it is somewhat tolerant to nonlinear relations. The result for dataset T30.1 (fig. 2-d) shows that it can approximate joint pdfs even when samples are small. (By taking relations into account, the curse of dimensionality should be reduced. For example, if corr(X,Y) ≈ 1, then estimating f(x,y) needs only as many observations as estimating X alone. Dimensions matter insofar as they increase uncertainty.) Also, the results of the estimations lend validity to the measure q included in the program.

## 5. Conclusions

We developed a methodology for estimating joint pdfs, which can be summarized as follows: 1.
Compute functions φ_1, ..., φ_n such that the variables Y_k = φ_k(X_1,...,X_k) form a set of independent random variables. 2. Compute the marginal pdf g_k of each Y_k. 3. The joint pdf of X_1,...,X_n is then

f(x_1,...,x_n) = g_1(y_1) ... g_n(y_n) · (∂φ_1/∂x_1) ... (∂φ_n/∂x_n),

where y_k = φ_k(x_1,...,x_k) for each k.

To put this methodology into practice, we developed a software application which works with linear φ_k functions. We found that its estimations worked well (the estimated joint cdfs were similar to the sample joint cdfs) on several datasets, which supports our theory.
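The three summarized steps can be sketched for the implemented linear case. The following is a minimal sketch in Python (our own illustration; the paper's application was written in Java and supports several marginal families), assuming normal marginals for the transformed variables. Since each linear ψ_k has ∂ψ_k/∂x_k = 1, the derivative factors drop out and the joint pdf is just the product of the marginal pdfs of the y_k.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def fit_joint_pdf(columns):
    """Step 1: build y_k = x_k - sum_{i<k} alpha_i y_i (zero sample covariance,
    Gram-Schmidt style); Step 2: fit a normal marginal g_k to each y_k;
    Step 3: return f(x) = prod_k g_k(y_k), since each d(psi_k)/dx_k = 1."""
    ys, coeffs = [], []
    for x in columns:
        # Because earlier y's are mutually uncorrelated, each alpha can be
        # computed against the original column x.
        alphas = [cov(x, y) / cov(y, y) for y in ys]
        y = [xi - sum(a * yr[i] for a, yr in zip(alphas, ys))
             for i, xi in enumerate(x)]
        coeffs.append(alphas)
        ys.append(y)
    params = [(mean(y), math.sqrt(cov(y, y))) for y in ys]  # (mu_k, sigma_k)

    def pdf(point):
        out, yvals = 1.0, []
        for xk, alphas, (mu, sd) in zip(point, coeffs, params):
            yk = xk - sum(a * yv for a, yv in zip(alphas, yvals))
            yvals.append(yk)
            out *= math.exp(-0.5 * ((yk - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        return out
    return pdf
```

The Gram-Schmidt step makes the sample covariances exactly zero, which is the linear proxy for independence used in section 3; for truly independent inputs, the fitted density at the origin should approach the product of the marginal densities there.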

Figure 2: Scatter plots generated by Joint PDF Estimator: (a) Ss3, (b) in50a, (c) voting, (d) T30.1.

In spite of the simplicity of our methodology, finding suitable functions φ_k is not a simple problem. By improving these functions, we may be able to estimate joint pdfs better. However, we should design simple functions, as complicated pdfs are hard to use in theory. Naturally, this constraint also applies to ∂φ_k/∂x_k, as it is part of the built pdf.

Acknowledgments. We would like to thank project FONDECYT for its support.

## References

[1] C.K. Chow, C.N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3), 1968.

[2] A. Pages-Zamora, M.A. Lagunas. Joint probability density function estimation by spectral estimate methods. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 5, May 7-10, 1996. IEEE Computer Society, Washington, DC.

[3] T. Schmidt. Coping with copulas. In Copulas: From Theory to Application in Finance.

[4] C.P. Robert, G. Casella. Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag New York, Inc.

[5] Voting intentions of the Chilean 1988 plebiscite dataset. In John Fox, Applied Regression, Generalized Linear Models, and Related Methods, 2nd edition, Sage Publications. mcmaster.ca/jfox/books/applied-regression-2e/datasets/chile.txt

[6] Andrews and Herzberg archive. StatLib.


### Learning outcomes. Knowledge and understanding. Competence and skills

Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

### Introduction. Independent Component Analysis

Independent Component Analysis 1 Introduction Independent component analysis (ICA) is a method for finding underlying factors or components from multivariate (multidimensional) statistical data. What distinguishes

### Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

### A Simple Introduction to Support Vector Machines

A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### C19 Machine Learning

C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks

### An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

### Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

### D-optimal plans in observational studies

D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

### Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

### Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

### Mathematics (MAT) Faculty Mary Hayward, Chair Charles B. Carey Garnet Hauger, Affiliate Tim Wegner

Mathematics (MAT) Faculty Mary Hayward, Chair Charles B. Carey Garnet Hauger, Affiliate Tim Wegner About the discipline The number of applications of mathematics has grown enormously in the natural, physical

### Predictive Modeling and Big Data

Predictive Modeling and Presented by Eileen Burns, FSA, MAAA Milliman Agenda Current uses of predictive modeling in the life insurance industry Potential applications of 2 1 June 16, 2014 [Enter presentation

### Markov Chain Monte Carlo Simulation Made Simple

Markov Chain Monte Carlo Simulation Made Simple Alastair Smith Department of Politics New York University April2,2003 1 Markov Chain Monte Carlo (MCMC) simualtion is a powerful technique to perform numerical

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability

CS 7 Discrete Mathematics and Probability Theory Fall 29 Satish Rao, David Tse Note 8 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces

### (Quasi-)Newton methods

(Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

### Sections 2.11 and 5.8

Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

### T Cryptology Spring 2009

T-79.5501 Cryptology Spring 2009 Homework 2 Tutor : Joo Y. Cho joo.cho@tkk.fi 5th February 2009 Q1. Let us consider a cryptosystem where P = {a, b, c} and C = {1, 2, 3, 4}, K = {K 1, K 2, K 3 }, and the

### 11 TRANSFORMING DENSITY FUNCTIONS

11 TRANSFORMING DENSITY FUNCTIONS It can be expedient to use a transformation function to transform one probability density function into another. As an introduction to this topic, it is helpful to recapitulate

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Master s Theory Exam Spring 2006

Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

### Change of Continuous Random Variable

Change of Continuous Random Variable All you are responsible for from this lecture is how to implement the Engineer s Way (see page 4) to compute how the probability density function changes when we make

### Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

### 11. Time series and dynamic linear models

11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd

### These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

### Calculating Interval Forecasts

Calculating Chapter 7 (Chatfield) Monika Turyna & Thomas Hrdina Department of Economics, University of Vienna Summer Term 2009 Terminology An interval forecast consists of an upper and a lower limit between

### Introduction to Logistic Regression

OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction

### PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS

PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUM OF REFERENCE SYMBOLS Benjamin R. Wiederholt The MITRE Corporation Bedford, MA and Mario A. Blanco The MITRE

### MATH 425, PRACTICE FINAL EXAM SOLUTIONS.

MATH 45, PRACTICE FINAL EXAM SOLUTIONS. Exercise. a Is the operator L defined on smooth functions of x, y by L u := u xx + cosu linear? b Does the answer change if we replace the operator L by the operator

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

### 11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression

Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c11 2013/9/9 page 221 le-tex 221 11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial

### Control Functions and Simultaneous Equations Methods

Control Functions and Simultaneous Equations Methods By RICHARD BLUNDELL, DENNIS KRISTENSEN AND ROSA L. MATZKIN Economic models of agent s optimization problems or of interactions among agents often exhibit

### Regression Using Support Vector Machines: Basic Foundations

Regression Using Support Vector Machines: Basic Foundations Technical Report December 2004 Aly Farag and Refaat M Mohamed Computer Vision and Image Processing Laboratory Electrical and Computer Engineering