# PS 271B: Quantitative Methods II. Lecture Notes


Langche Zeng

## The Empirical Research Process; Fundamental Methodological Issues

- Theory; data; models and model selection; estimation; inference. (In what order?)
- Examples: presidential approval; international conflict and civil war.
- Identification: Can the quantities of interest be determined from the model and data, assuming a sufficiently large sample? (Identification is an asymptotic concept.) Parameters in structural equation models (SEMs), for example, are often of theoretical interest or directly encode causal assumptions. Can they be uniquely determined from the available measured variables?

- Endogenous vs. exogenous variables; exclusion restrictions (certain causal links are ruled out); the order condition, a necessary condition for identification: the number of exogenous variables excluded from an equation must be at least equal to the number of endogenous variables included on its right-hand side.
- A single-equation model can be considered part of an SEM (with some of the right-hand-side variables potentially endogenous).
- Standard models (parametric, or non-parametric matching) typically assume that a set of control variables is measured that makes identification of the causal parameter possible.
- What variables should be in the model? Is the same model good for both prediction and causal inference?
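The order condition is a simple counting rule, so it can be checked mechanically. The sketch below is a minimal illustration; the function name `order_condition` and the supply-demand variables (`income`, `weather`, `price`) are hypothetical. Passing the check is necessary but not sufficient: the rank condition must also hold.

```python
def order_condition(all_exogenous, included_exogenous, rhs_endogenous):
    """Order condition for one equation of a linear SEM (necessary only).

    The equation can be identified only if the number of exogenous
    variables excluded from it is at least the number of endogenous
    variables on its right-hand side.
    """
    excluded = set(all_exogenous) - set(included_exogenous)
    return len(excluded) >= len(set(rhs_endogenous))


# Hypothetical system with exogenous income and weather; price is endogenous.
exog = {"income", "weather"}
# Demand equation: quantity on price and income; weather is excluded.
demand_ok = order_condition(exog, {"income"}, {"price"})
# If demand also included weather, nothing would be excluded.
demand_bad = order_condition(exog, {"income", "weather"}, {"price"})
print(demand_ok, demand_bad)  # True False
```

Here weather serves as the excluded exogenous variable (the instrument) that lets the price coefficient in the demand equation be identified.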

- Standard practice: use the same (parametric) model for both prediction and causal inference, often to study the causal effect of each independent variable in the model in turn. E.g.: Pr(Voting) = f(education, income, party ID, race, gender, etc.)
- But different objectives may require very different $x$'s to enter the model.
  - Prediction: all direct causes of $y$.
  - Causal inference on $x_i$: all $x_j$'s that confound the relationship between $x_i$ and $y$.

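[Figure: a hypothetical causal graph over $x_1$, $x_2$, $x_3$, and $y$.]

In this hypothetical causal structure, the variables to include are:

- Prediction of $y$: all $x$'s.
- Causal effect of $x_1$ on $y$: $x_1$ and $x_2$.
- Causal effect of $x_2$ on $y$: $x_2$ alone (controlling for $x_1$, a consequence of $x_2$, biases the estimate of $x_2$'s total effect).
- Causal effect of $x_3$ on $y$: $x_3$ alone.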
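The graph itself did not survive extraction. One structure consistent with the control sets listed above, assumed here, is $x_2 \to x_1 \to y$, $x_2 \to y$, and $x_3 \to y$, with $x_3$ independent of the others. The simulation below uses hypothetical coefficients to show that regressing $y$ on $x_2$ alone recovers its total effect, while also "controlling" for the mediator $x_1$ recovers only the direct effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical structural coefficients for the assumed graph.
a, b1, b2, b3 = 0.8, 1.0, 0.5, -0.7

x2 = rng.normal(size=n)
x1 = a * x2 + rng.normal(size=n)   # x1 is a consequence of x2
x3 = rng.normal(size=n)            # x3 is independent of x1 and x2
y = b1 * x1 + b2 * x2 + b3 * x3 + rng.normal(size=n)


def ols(y, *cols):
    """OLS slope coefficients (an intercept is fitted and then dropped)."""
    X = np.column_stack([np.ones_like(y), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


total_x2 = b2 + a * b1                # true total effect: direct + via x1
est_x2_alone = ols(y, x2)[0]          # correct adjustment set for x2
est_x2_ctrl_x1 = ols(y, x2, x1)[0]    # conditions on the mediator x1
est_x1 = ols(y, x1, x2)[0]            # x1's effect, adjusting for confounder x2
print(total_x2, est_x2_alone, est_x2_ctrl_x1, est_x1)
```

With these values the total effect of $x_2$ is 1.3; the first regression lands close to it, while the second lands near the direct effect 0.5.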

## Finding the right set of control variables is hard

- In practice, the decision is often made informally, on a case-by-case basis, resting on folklore and intuition rather than on hard mathematics (Pearl 2009).
- Different studies of the same causal relationship often use different sets of control variables, guided by even slightly different substantive theories. This can lead not only to changes in magnitude but even to reversals of sign in the estimated effects (Simpson's paradox).
- Pearl (2009) and related work on causal graph theory (being introduced to political science) address:

  - the possibility of causal inference from observational data;
  - the discovery of underlying causal graphs from data;
  - graphical tools for selecting control variables based on the causal graph.
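Simpson's paradox can be seen with a few counts. The numbers below are invented for illustration: severity is a confounder that affects both who is treated and who recovers. Within each severity stratum the treated group does better, yet pooling the strata reverses the comparison.

```python
# Hypothetical counts: (recovered, total) by severity stratum and arm.
data = {
    "mild":   {"treated": (18, 20), "control": (64, 80)},
    "severe": {"treated": (32, 80), "control": (6, 20)},
}


def rate(recovered, total):
    return recovered / total


# Within each stratum, treated units recover more often.
for stratum, groups in data.items():
    t, c = rate(*groups["treated"]), rate(*groups["control"])
    print(f"{stratum:>6}: treated {t:.2f} vs control {c:.2f}")


def pooled(arm):
    """Recovery rate for one arm, ignoring the severity stratum."""
    recovered = sum(g[arm][0] for g in data.values())
    total = sum(g[arm][1] for g in data.values())
    return recovered / total


# Pooling reverses the sign: treated units are concentrated in the severe stratum.
print(f"pooled: treated {pooled('treated'):.2f} vs control {pooled('control'):.2f}")
```

The stratum rates are 0.90 vs. 0.80 (mild) and 0.40 vs. 0.30 (severe), but the pooled rates are 0.50 vs. 0.70. Whether to condition on severity is exactly the control-variable question above; here, because severity is a confounder, the stratified comparison is the relevant one.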

## Data source and measurement

- Experimental data
  - If done right, the gold standard. Random assignment makes treatment exogenous and makes the treatment and control groups comparable (for sufficiently large N).
  - Can be expensive or infeasible (e.g., a change of regime type?).
  - Issues such as noncompliance, external validity, and the Hawthorne effect (the effect of being observed).
- Observational data, such as from surveys
  - Issues of sampling design, e.g., stratification with different sampling rates (weighting necessary); clustering (correlation within clusters); selection bias; response-based sampling (e.g., rare events data).
  - Missing data; sensitive questions.
  - Cross-sectional, panel (small T), and time-series cross-section (TSCS) data.
- Measurement: e.g., party identification? Economic well-being? Ideal points? Power? Structural characteristics of the international system?
  - Some are easier to measure than others. Party ID can be obtained directly from survey data; others require more sophisticated methods, as in recovering ideal points from roll-call data (e.g., item response models).
  - Social network analysis is useful for measuring structural characteristics (such as polarization and globalization).
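Why weighting is necessary under stratification with different sampling rates: a minimal sketch with a hypothetical two-stratum population in which the small rural stratum is oversampled. Each respondent is weighted by the inverse of its stratum's sampling rate, $N_s / n_s$. The strata, sizes, and rates are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: two strata with different outcome rates.
pop_sizes = {"urban": 80_000, "rural": 20_000}
true_rates = {"urban": 0.30, "rural": 0.70}
# Hypothetical design: equal sample sizes, so rural units are oversampled.
sample_sizes = {"urban": 1_000, "rural": 1_000}

y_parts, w_parts = [], []
for s in pop_sizes:
    y_s = rng.binomial(1, true_rates[s], size=sample_sizes[s])
    # Weight = 1 / sampling rate = N_s / n_s (each respondent stands for w units).
    w_s = np.full(sample_sizes[s], pop_sizes[s] / sample_sizes[s])
    y_parts.append(y_s)
    w_parts.append(w_s)
y = np.concatenate(y_parts)
w = np.concatenate(w_parts)

pop_rate = sum(pop_sizes[s] * true_rates[s] for s in pop_sizes) / sum(pop_sizes.values())
unweighted = y.mean()                # implicitly treats the strata as equal-sized
weighted = np.average(y, weights=w)  # restores each stratum's population share
print(pop_rate, unweighted, weighted)
```

The population rate is 0.38; the unweighted sample mean drifts toward 0.5 because half the sample is rural, while the weighted mean stays near 0.38.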

## Modeling

- Abstraction: no model is ever perfect (if it were, it would not be a model). Reality itself is infinitely rich and complex.
- Seek to capture the essential features of the data generating process; a model is a collection of assumptions about that process.
- Systematic and stochastic components, e.g., linear regression:

$$Y = X\beta + \epsilon \tag{1}$$

(Why $\epsilon$? We can never measure all relevant variables; moreover, according to quantum physics, the universe is inherently probabilistic.)

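Here $Y$ is $N \times 1$, $X$ is $N \times k$, $\beta$ is $k \times 1$, and

$$\epsilon \sim N(0, \sigma^2 I).$$

Equivalently, $Y \sim N(X\beta, \sigma^2 I)$. For each individual $i = 1, 2, \ldots, N$: $Y_i \sim N(X_i\beta, \sigma^2)$.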

Also equivalent:

$$Y_i \sim f_N(y_i \mid \mu_i, \sigma^2), \qquad \mu_i = x_i\beta,$$

where $y_i$ is an observed value of the random variable $Y_i$. Read: the density of $Y_i$ at a particular location $y_i$ is given by the normal density with mean $\mu_i = x_i\beta$ and variance $\sigma^2$.

We'll be looking at a variety of forms of systematic and stochastic components (distribution functions) suited to different types of data $Y$ (binary, multinomial, ordinal, count, censored/truncated, duration, etc.).
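To make the two components concrete, the sketch below generates data from the model exactly as written (systematic part $\mu_i = x_i\beta$, stochastic part $Y_i \sim N(\mu_i, \sigma^2)$), then estimates $\beta$ by solving the OLS normal equations and evaluates the normal log-likelihood at the estimates. The true `beta` and `sigma` are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
N, k = 5_000, 3
beta = np.array([1.0, 2.0, -0.5])   # hypothetical true coefficients
sigma = 1.5                          # hypothetical true error s.d.

# Systematic component: mu_i = x_i beta (first column is an intercept).
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
mu = X @ beta
# Stochastic component: Y_i ~ N(mu_i, sigma^2), independently across i.
Y = rng.normal(loc=mu, scale=sigma)

# OLS: solve the normal equations X'X b = X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (N - k)   # unbiased estimate of sigma^2

# Normal log-likelihood: -N/2 log(2 pi s^2) - RSS / (2 s^2).
loglik = -0.5 * N * np.log(2 * np.pi * sigma2_hat) - 0.5 * (resid @ resid) / sigma2_hat
print(beta_hat, sigma2_hat, loglik)
```

The residuals are orthogonal to every column of $X$ by construction, which is what the normal equations state.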

## Parametric, semi-parametric, non-parametric

We've just seen an example of a parametric model: the data generating process is known up to a set of unknown parameters (in the regression model, $\{\beta, \sigma\}$). Estimation of these parameters (more below): OLS, least absolute deviation, MLE, Bayesian.

Semi-parametric models combine a parametric component with a non-parametric component. They are more flexible and robust than fully parametric models (but less efficient if the parametric forms can be correctly specified). The non-parametric part can be a partially specified functional form for the systematic component (as in the neural network model or the Cox proportional hazards model), or it can take the form of avoiding distributional assumptions for the stochastic term.

Method of moments (MM) and generalized method of moments (GMM) estimators are semi-parametric, more robust to distributional assumptions on the stochastic part. Moments: mean, variance, etc. The $n$th moment is

$$M_n = \int x^n f(x)\,dx.$$

Basic idea: exploit the fact that sample moments approximate population moments, regardless of the distribution. Find a set of equations known to hold in the population given a model. The equations involve population moments, which are functions of the unknown parameters. Obtain estimates by substituting sample moments for the population moments.

E.g., the OLS estimator is also a method of moments estimator. One of the key assumptions of the classical linear model is (for simplicity, assuming $x_i$ is scalar)

$$E[\epsilon_i x_i] = E[(y_i - x_i\beta)x_i] = 0.$$

Sample version:

$$\frac{1}{N}\sum_i (y_i - x_i\beta)x_i = 0.$$

This is the same as the OLS normal equation (the first-order condition, derivative set to 0):

$$\min_\beta \sum_i \epsilon_i^2 = \min_\beta \sum_i (y_i - x_i\beta)^2 \;\Longrightarrow\; -2\sum_i (y_i - x_i\beta)x_i = 0 \;\Longleftrightarrow\; \frac{1}{N}\sum_i (y_i - x_i\beta)x_i = 0$$

Non-parametric models avoid such functional form assumptions as well as distributional assumptions. The less assumed, the more robust, but the less efficient (in case the parametric assumptions are correct).

E.g. 1: kernel smoothing,

$$\hat m_h(x) = \frac{\sum_{i=1}^n K_h(x - x_i)\,y_i}{\sum_{i=1}^n K_h(x - x_i)},$$

where $K$ is some kernel function and $h$ is the bandwidth.
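The kernel smoother above (often called the Nadaraya-Watson estimator) is a kernel-weighted local average of the $y_i$. A minimal NumPy sketch, assuming a Gaussian kernel (the notes leave $K$ unspecified) and simulated data from a hypothetical $m(x) = \sin x$; the function name `kernel_smooth` is illustrative.

```python
import numpy as np


def kernel_smooth(x_grid, x, y, h):
    """Kernel-weighted local average m_h(x) at each point of x_grid.

    Gaussian kernel K(u) = exp(-u^2 / 2). The 1/h scaling in K_h and the
    kernel's normalizing constant cancel between numerator and
    denominator, so they are omitted.
    """
    u = (x_grid[:, None] - x[None, :]) / h      # shape (len(x_grid), len(x))
    weights = np.exp(-0.5 * u**2)
    return (weights @ y) / weights.sum(axis=1)


rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 1_000))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # hypothetical m(x) = sin(x)

grid = np.linspace(0.5, 2 * np.pi - 0.5, 50)
fit_small_h = kernel_smooth(grid, x, y, h=0.05)   # wiggly: low bias, high variance
fit_mid_h = kernel_smooth(grid, x, y, h=0.3)
fit_huge_h = kernel_smooth(grid, x, y, h=500.0)   # nearly flat: about the mean of y
```

The bandwidth $h$ governs the bias-variance trade-off: as $h$ grows, every point gets nearly equal weight and the fit collapses to the overall mean; as $h$ shrinks, the fit chases the noise.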

These are local methods.

E.g. 2: non-parametric matching; the propensity score approach; program evaluation (discussed in detail later).

The vast majority of standard models used in political science are parametric (logit, probit, ordered logit, tobit, heckit, Poisson regression, etc.).

- Pros: if the assumptions are (approximately) right, inference is more efficient. The precise functional relations available after estimation let us do a lot, such as computing marginal effects and predictions.
- Cons: the assumptions can be wrong.

## Examples of functional forms for the systematic part

[Figure: examples of functional forms; not recoverable from the transcription.]

## Functional complexity in social science data

Neural networks as universal learning machines.

[Figure 1: A one-hidden-layer feed-forward neural network. Input layer $x_1, x_2, x_3$; weights $\beta$; hidden layer $z_1, z_2$; weights $\gamma_1, \gamma_2$; output layer $y$.]

## Model selection

- Fitting vs. out-of-sample performance.
- Bayesian model averaging: in the Bayesian framework, no single model is taken to be true; each is valid with a certain probability. Average over the models with relatively high probability of being true.

## Estimation

Focusing on parametric models: how do we learn about the unknown parameters (i.e., the unknown part of the model) from data? What estimation criteria or principles should we use? How do we fit a line or curve to scatter-plot data? Visually?

[Figure: a scatter plot of $y$ against $x$ with two fitted curves, labeled Model 1 and Model 2.]

- Least squares: minimize the sum of squared errors (seen above).
- Least absolute deviation: more robust with respect to outliers, but mathematically more difficult to handle than OLS.
- Maximum likelihood: the parameter values that maximize the probability of the observed data given the model are the most plausible.

These are point estimates. Confidence intervals can be constructed based on the sampling distribution of the estimators.

- The Bayesian approach: start with a prior belief about the unknown, then update our knowledge according to Bayes' rule. As the posterior density is proportional to the likelihood times the prior, the data influence inference only through the likelihood function. When the data dominate the prior, the likelihood resembles the posterior. From the posterior distribution one can obtain a point estimate (e.g., the posterior mean or the most probable value) and an interval estimate (probability intervals based on the posterior distribution).
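A small worked example of Bayesian updating: a minimal sketch for an unknown success probability $\theta$ with hypothetical data (36 successes in 50 trials) and a hypothetical Beta(2, 2) prior. The posterior is computed on a grid by normalizing likelihood times prior, and it is checked against the exact conjugate result, Beta($a + y$, $b + n - y$).

```python
import numpy as np
from math import comb

# Hypothetical data: y successes in n Bernoulli trials with unknown rate theta.
n, y = 50, 36
a, b = 2.0, 2.0  # hypothetical Beta(a, b) prior

# Grid version of p(theta | y) = p(y | theta) p(theta) / integral of the same.
theta = np.linspace(0.0005, 0.9995, 1000)
dtheta = theta[1] - theta[0]
prior = theta ** (a - 1) * (1 - theta) ** (b - 1)          # unnormalized Beta density
likelihood = comb(n, y) * theta**y * (1 - theta) ** (n - y)
unnorm = likelihood * prior
posterior = unnorm / (unnorm.sum() * dtheta)               # divide by the integral

post_mean_grid = (theta * posterior).sum() * dtheta
post_mode_grid = theta[np.argmax(posterior)]
post_mean_exact = (a + y) / (a + b + n)   # mean of Beta(a + y, b + n - y)
mle = y / n                               # maximizer of the likelihood alone

# 95% probability interval read off the grid CDF.
cdf = np.cumsum(posterior) * dtheta
lo, hi = theta[np.searchsorted(cdf, 0.025)], theta[np.searchsorted(cdf, 0.975)]
print(post_mean_grid, post_mean_exact, post_mode_grid, mle, (lo, hi))
```

With 50 observations the data dominate this weak prior: the posterior mode ($37/52 \approx 0.712$) is already close to the MLE ($0.72$).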

$$P(\theta \mid y) = \frac{P(\theta, y)}{P(y)} = \frac{P(y \mid \theta)P(\theta)}{P(y)} = \frac{P(y \mid \theta)P(\theta)}{\int P(y \mid \theta)P(\theta)\,d\theta}$$

Computationally, the main distinction is optimization of a function vs. sampling from a distribution.

- Maximum likelihood estimation is obtained through optimization: find the parameter values that maximize the likelihood function. But one can also explore the likelihood function by sampling from the entire distribution (e.g., the Gill & King paper on non-invertible Hessians: when the mode doesn't work, explore the mean instead).
- MCMC (Markov chain Monte Carlo) uses computational algorithms for obtaining samples from a distribution and is heavily used in Bayesian inference. E.g., the Gibbs sampler (alternating conditional sampling), whose convergence has been proved. Software includes BUGS (Bayesian inference Using Gibbs Sampling; WinBUGS is the Windows version) and JAGS (Just Another Gibbs Sampler). Several R packages interface these with R or implement various specific models (e.g., MCMCpack).
- Note that MCMC is not the same as Bayesian inference. Where the posterior distribution is known or approximated through analytical methods, MCMC is unnecessary. When the posterior or likelihood is well behaved (such as being globally concave), optimization is more efficient and more reliable. For complex functions or distributions, MCMC returns some results when optimization is difficult to do. Of course, where optimization may fail, the quality of the posterior approximation through sampling could be low too: there is no magic.
- Special data features require special sampling and/or estimation strategies, e.g., rare events (logit estimates are biased) and endogenous dependence structures (the independence assumption doesn't hold).
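The Gibbs sampler's "alternating conditional sampling" can be seen in the simplest case where both full conditionals are known exactly: a standard bivariate normal with correlation $\rho$, for which $x_1 \mid x_2 \sim N(\rho x_2, 1 - \rho^2)$ and symmetrically for $x_2$. This is a minimal teaching sketch, not how BUGS or JAGS are implemented; the burn-in length and iteration count are arbitrary choices.

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_iter=20_000, burn_in=1_000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals:
        x1 | x2 ~ N(rho * x2, 1 - rho^2),  x2 | x1 ~ N(rho * x1, 1 - rho^2).
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1 - rho**2)
    x1, x2 = 0.0, 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x1 = rng.normal(rho * x2, sd)   # draw x1 given the current x2
        x2 = rng.normal(rho * x1, sd)   # draw x2 given the new x1
        draws[t] = x1, x2
    return draws[burn_in:]   # drop early draws taken before approximate convergence


samples = gibbs_bivariate_normal(rho=0.8)
print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])
```

Successive draws are autocorrelated (more so as $\rho$ approaches 1), so the chain carries less information than the same number of independent draws would.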

## Inference

- Quantities of interest can be computed from the model and the parameter estimates, e.g., the marginal effect of an $x$. Except in linear models with no higher-order terms, this is generally not the coefficient of $x$, but it is usually a function of the parameters.
- Uncertainty measures should be reported, based on the uncertainty measures for the parameters. For quantities pertaining to individual observations, also include the fundamental uncertainty in the error term, e.g., $E(Y_i \mid X_i)$ vs. $Y_i \mid X_i$.
- Model dependence: to what extent does inference depend on the assumption that the model is true?
- Data quality: What kind of questions can be reliably answered from available data? Or, when can history be our guide?
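For the marginal-effect point above, one common way (assumed here, not necessarily the one used in this course) to attach uncertainty to a nonlinear function of the parameters is simulation: draw coefficient vectors from their approximate normal sampling distribution, $N(\hat\beta, \hat V)$, and recompute the quantity for each draw. The logit estimates `beta_hat` and `vcov_hat` below are hypothetical. This captures estimation uncertainty only; predicting an individual $Y_i$ would also require drawing from the stochastic component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logit estimates: Pr(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))).
beta_hat = np.array([-1.0, 0.5])
vcov_hat = np.array([[0.04, -0.01],
                     [-0.01, 0.02]])   # hypothetical estimated covariance matrix


def marginal_effect(beta, x):
    """dPr(y=1)/dx in a logit model: b1 * p * (1 - p); not just b1."""
    p = 1.0 / (1.0 + np.exp(-(beta[..., 0] + beta[..., 1] * x)))
    return beta[..., 1] * p * (1 - p)


x0 = 2.0
point = marginal_effect(beta_hat, x0)

# Propagate parameter uncertainty into the quantity of interest.
draws = rng.multivariate_normal(beta_hat, vcov_hat, size=10_000)
sims = marginal_effect(draws, x0)
lo, hi = np.percentile(sims, [2.5, 97.5])
print(point, lo, hi)
```

At $x_0 = 2$ the linear predictor is 0, so $p = 0.5$ and the point estimate is $0.5 \times 0.25 = 0.125$, a quarter of the coefficient, which illustrates why the coefficient itself is not the marginal effect.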


Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

### Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors

Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors Arthur Lewbel, Yingying Dong, and Thomas Tao Yang Boston College, University of California Irvine, and Boston

### Lasso on Categorical Data

Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.

### Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

### Experiment #1, Analyze Data using Excel, Calculator and Graphs.

Physics 182 - Fall 2014 - Experiment #1 1 Experiment #1, Analyze Data using Excel, Calculator and Graphs. 1 Purpose (5 Points, Including Title. Points apply to your lab report.) Before we start measuring

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### BAYESIAN ECONOMETRICS

BAYESIAN ECONOMETRICS VICTOR CHERNOZHUKOV Bayesian econometrics employs Bayesian methods for inference about economic questions using economic data. In the following, we briefly review these methods and

### Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam. Software Profiling Seminar, Statistics 101

Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam Software Profiling Seminar, 2013 Statistics 101 Descriptive Statistics Population Object Object Object Sample numerical description Object

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### Latent Class (Finite Mixture) Segments How to find them and what to do with them

Latent Class (Finite Mixture) Segments How to find them and what to do with them Jay Magidson Statistical Innovations Inc. Belmont, MA USA www.statisticalinnovations.com Sensometrics 2010, Rotterdam Overview

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

### Statistical Foundations: Measures of Location and Central Tendency and Summation and Expectation

Statistical Foundations: and Central Tendency and and Lecture 4 September 5, 2006 Psychology 790 Lecture #4-9/05/2006 Slide 1 of 26 Today s Lecture Today s Lecture Where this Fits central tendency/location

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

### 1 Teaching notes on GMM 1.

Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in