PS 271B: Quantitative Methods II. Lecture Notes


Langche Zeng
The Empirical Research Process; Fundamental Methodological Issues

Theory; data; models/model selection; estimation; inference. (In what order?)
Examples: presidential approval; international conflict/civil war.

Identification: can the quantities of interest be determined from the model and data, assuming a sufficient sample size? (An asymptotic concept.) Parameters in structural equation models (SEMs), for example, are often of theoretical interest or directly encode causal assumptions. Can they be uniquely determined from the available measured variables?
Endogenous vs. exogenous variables; exclusion restrictions (certain causal links are ruled out); order condition (a necessary condition for identification: the number of excluded exogenous variables must at least equal the number of included endogenous variables).
A single-equation model can be considered part of an SEM (with some of the right-hand-side variables potentially endogenous).
Standard models (parametric, or nonparametric matching) typically assume that a set of control variables is measured that makes identification of the causal parameter possible.
What variables should be in the model? Is the same model good for both prediction and causal inference?
Standard practice: use the same (parametric) model for prediction and causal inference, often to study the causal effect of each independent variable in the model in turn, e.g.:
  Pr(Voting) = f(education, income, party ID, race, gender, etc.)
But different objectives may require very different $x$'s to enter the model.
  Prediction: all direct causes of $y$.
  Causal inference on $x_i$: all $x_j$'s that confound the relationship between $x_i$ and $y$.
[Figure: hypothetical causal graph over $x_1$, $x_2$, $x_3$, and $y$]

In this hypothetical causal structure:
  prediction of $y$: all $x$'s;
  causal effect of $x_1$ on $y$: $x_1$ and $x_2$;
  causal effect of $x_2$ on $y$: $x_2$ (controlling for $x_1$, its consequence, biases the total effect);
  causal effect of $x_3$ on $y$: $x_3$.
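The adjustment sets listed above can be checked in a small simulation. The sketch below is only illustrative: the arrows of the original figure are not recoverable here, so it assumes a graph consistent with the stated adjustment sets ($x_2 \to x_1$, $x_2 \to y$, $x_1 \to y$, $x_3 \to y$), linear effects, and made-up coefficient values. The `ols` helper is a hypothetical convenience function, not something from the notes.

```python
import random

def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]

# Assumed data generating process (all coefficients are illustrative).
rng = random.Random(0)
n = 20000
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * a + rng.gauss(0, 1) for a in x2]            # x2 -> x1
y = [1.0 * a + 0.5 * b + 0.7 * c + rng.gauss(0, 1)      # x1, x2, x3 -> y
     for a, b, c in zip(x1, x2, x3)]

b_x1_naive = ols([x1], y)[1]            # omits the confounder x2
b_x1_adjusted = ols([x1, x2], y)[1]     # controls for x2
b_x2_total = ols([x2], y)[1]            # total effect of x2
b_x2_overcontrolled = ols([x2, x1], y)[1]  # controls for a consequence of x2

print("x1 on y, no controls:     ", round(b_x1_naive, 3))
print("x1 on y, controlling x2:  ", round(b_x1_adjusted, 3))
print("x2 on y, no controls:     ", round(b_x2_total, 3))
print("x2 on y, controlling x1:  ", round(b_x2_overcontrolled, 3))
```

Under these assumptions the true effect of $x_1$ is 1.0 and the total effect of $x_2$ is $0.5 + 0.8 \times 1.0 = 1.3$. Omitting $x_2$ pushes the $x_1$ coefficient above 1.0, and controlling for $x_1$ leaves only the direct effect of $x_2$ (0.5).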
Finding the right set of control variables is hard

In practice, the decision is often made informally, on a case-by-case basis, resting on folklore and intuition rather than on hard mathematics. (Pearl 2009)
Different studies of the same causal relationship often use different sets of control variables, guided by even slightly different substantive theories. This can lead not only to changes in magnitude but even to reversals of sign in the estimated effects (Simpson's paradox).

Pearl (2009) and related work (now being introduced to political science): causal graph theory addresses
  the possibility of causal inference from observational data;
  the discovery of underlying causal graphs from data;
  graphical tools for control variable selection based on the causal graph.
Data source/measurement

Experimental data: if done right, the gold standard.
  Random assignment makes treatment exogenous and the treatment and control groups comparable (for sufficiently large N).
  Can be expensive or infeasible (regime type change?).
  Issues such as noncompliance, external validity, and the Hawthorne effect (the effect of being observed).

Observational data, such as from surveys:
  Issues of sampling design, e.g., stratification with different sampling rates (weighting necessary); clustering (correlations within clusters); selection bias; response-based sampling (e.g., rare events data).
  Missing data; sensitive questions.
  Cross-sectional, panel (small T), TSCS (time-series cross-section).

Measurement: e.g., party identification? Economic well-being? Ideal points? Power? Structural characteristics of the international system? Some are easier, some harder. E.g., party ID can be obtained directly from survey data; others require more sophisticated methods, as in recovering ideal points from roll call data (e.g., an item response model). Social network analysis is useful for measuring structural characteristics (such as polarization, globalization).
Modeling

Abstraction: no model is ever perfect (if it is, then it is not a "model"). Reality itself is infinitely rich and complex. Seek to capture the essential features of the data generating process. A model is a collection of assumptions about the process.

Systematic and stochastic components, e.g., linear regression:
  $Y = X\beta + \epsilon$   (1)
(Why $\epsilon$? We can never measure all relevant variables; plus the universe is inherently probabilistic, according to quantum physics.)
$Y$: $N \times 1$; $X$: $N \times k$; $\beta$: $k \times 1$;
  $\epsilon \sim N(0, \sigma^2 I)$
Equivalently,
  $Y \sim N(X\beta, \sigma^2 I)$
For each individual $i$, $i = 1, 2, \ldots, N$:
  $Y_i \sim N(X_i \beta, \sigma^2)$
Also equivalent:
  $Y_i \sim f_N(y_i \mid \mu_i, \sigma^2), \quad \mu_i = x_i \beta$
where $y_i$ is an observed value of the random variable $Y_i$. Read: the density of $Y_i$ at a particular location $y_i$ is given by the normal density with mean $\mu_i = x_i \beta$ and variance $\sigma^2$.

We'll be looking at a variety of forms of systematic and stochastic components (distribution functions) suitable for different types of data $Y$ (binary, multinomial, ordinal, count, censored/truncated, duration, etc.).
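To make the two components concrete, here is a minimal simulation sketch of the normal linear model with a single regressor plus an intercept. The parameter values, sample size, and variable names are assumptions chosen for illustration; the estimates use the textbook closed-form least-squares formulas.

```python
import math
import random

def normal_pdf(y, mu, sigma2):
    """Normal density f_N(y | mu, sigma^2)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Hypothetical "true" parameters of the data generating process.
beta0, beta1, sigma = 2.0, -1.5, 0.5

rng = random.Random(1)
n = 5000
x = [rng.uniform(0, 4) for _ in range(n)]
# Systematic part mu_i = beta0 + beta1 * x_i; stochastic part N(0, sigma^2).
y = [rng.gauss(beta0 + beta1 * xi, sigma) for xi in x]

# Closed-form least-squares estimates for one regressor plus intercept.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residual variance estimate (n - 2 degrees of freedom).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(r * r for r in residuals) / (n - 2)

# Density of Y_1 at its observed value, evaluated at the estimates.
density_first = normal_pdf(y[0], b0 + b1 * x[0], s2)

print("b0, b1, s2:", round(b0, 3), round(b1, 3), round(s2, 3))
print("f_N(y_1 | mu_1, s2):", round(density_first, 3))
```

With a sample this large, the estimates should land close to the assumed values of $\beta$ and $\sigma^2$.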
Parametric, semiparametric, nonparametric

We have just seen an example of a parametric model: the data generating process is known up to a set of unknown parameters (in the regression model, $\{\beta, \sigma\}$). Estimation of these parameters (more below): OLS, least absolute deviation, MLE, Bayesian, ...

Semiparametric models combine a parametric component with a nonparametric component. They are more flexible/robust than fully parametric models (but less efficient, if the parametric forms can be correctly specified). This can take the form of a partially specified functional form for the systematic part (as in the neural network model or the Cox proportional hazards model), or of avoiding distributional assumptions for the stochastic term.

Method of moments (MM) and generalized method of moments (GMM) estimators are semiparametric and more robust to distributional assumptions on the stochastic part.
Moments: mean, variance, etc. The $n$th moment:
  $M_n = \int x^n f(x)\,dx$
Basic idea: make use of the fact that sample moments approximate population moments, regardless of the distribution. Find a set of equations known to hold in the population given a model. The equations involve population moments, which are functions of the unknown parameters. Obtain estimates by substituting sample moments for the population moments.

E.g., the OLS estimator is also a method of moments estimator. One of the key assumptions of the classical linear model is
  $E[\epsilon_i x_i] = E[(y_i - x_i \beta) x_i] = 0$   (for simplicity, assuming $x_i$ is a scalar)
Sample version:
  $\frac{1}{N} \sum_i (y_i - x_i \beta) x_i = 0$
This is the same as the OLS normal equation (first-order derivative = 0):
  $\min_\beta \sum_i \epsilon_i^2 = \min_\beta \sum_i (y_i - x_i \beta)^2$
  $\Rightarrow\; -2 \sum_i (y_i - x_i \beta) x_i = 0$
  $\Rightarrow\; \frac{1}{N} \sum_i (y_i - x_i \beta) x_i = 0$

Nonparametric models avoid such functional form assumptions as well as distributional assumptions. The less assumed, the more robust, but the less efficient (in case the parametric assumptions are correct).

E.g. 1: kernel smoothing.
  $\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{i=1}^{n} K_h(x - x_i)}$
  ($K$: some kernel function; $h$: bandwidth)
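The kernel smoother above is a locally weighted average of the $y_i$, and a few lines of code make that explicit. This is a minimal sketch, assuming a Gaussian kernel, a hand-picked bandwidth, and simulated data from a sine curve. None of these choices comes from the notes.

```python
import math
import random

def gaussian_kernel(u, h):
    """Scaled Gaussian kernel K_h(u) = K(u / h) / h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))

def kernel_smooth(x0, xs, ys, h):
    """Kernel-weighted average of ys, with weights K_h(x0 - x_i)."""
    weights = [gaussian_kernel(x0 - xi, h) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)

# Simulated data: y = sin(x) + noise (an assumed data generating process).
rng = random.Random(2)
xs = [rng.uniform(0, 2 * math.pi) for _ in range(2000)]
ys = [math.sin(xi) + rng.gauss(0, 0.3) for xi in xs]

# No functional form is imposed; the fit at each point uses nearby data only.
fit_at_peak = kernel_smooth(math.pi / 2, xs, ys, h=0.2)
for x0 in (0.5, math.pi / 2, 3.0, 5.0):
    print(round(x0, 2), round(kernel_smooth(x0, xs, ys, h=0.2), 3),
          "true:", round(math.sin(x0), 3))
```

A smaller bandwidth $h$ tracks the data more closely at the cost of a noisier fit; a larger $h$ gives a smoother but more biased curve.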
Local methods.

E.g. 2: nonparametric matching; the propensity score approach; program evaluation (to be discussed in detail later).

The vast majority of standard models used in political science are parametric (logit, probit, ordered logit, tobit, heckit, Poisson regression, etc.).
  Pros: if the assumptions are (approximately) right, inference is more efficient. With precise functional relations in hand after estimation, one can do a lot, such as computing marginal effects and predictions.
  Cons: the assumptions can be wrong.
[Figure: examples of functional forms for the systematic part]
Functional complexity in social science data. Neural networks as universal learning machines.

[Figure 1: A one-hidden-layer feed-forward neural network. Input layer $x_1, x_2, x_3$; weights $\beta$; hidden layer $z_1, z_2$; weights $\gamma$; output layer $y$.]

Model selection:
Fitting vs. out-of-sample performance.
Bayesian model averaging: in the Bayesian framework, no single model is true. Each is valid with a certain probability. Average the ones with a relatively high probability of being true.

Estimation (focusing on parametric models)

How to learn about the unknown parameters (i.e., the unknown part of the model) from data.
Estimation criteria/principles: how to fit a line/curve to scatter plot data?
  Visual.
  [Figure: scatter plot of $y$ against $x$ with two fitted curves, Model 1 and Model 2]
  Least squares: minimize the sum of squared errors (as already seen).
  Least absolute deviation: more robust with respect to outliers; mathematically more difficult to handle than OLS.
  Maximum likelihood: the parameter values that maximize the probability of the observed data given the model are the most plausible.
These are point estimates. Confidence intervals can be constructed based on the sampling distribution of the estimators.

The Bayesian approach: start with a prior belief about the unknown. Update our knowledge according to Bayes' rule. As the posterior density is proportional to the likelihood times the prior, the data influence inference only through the likelihood function. When the data dominate the prior, the posterior resembles the likelihood. From the posterior distribution one can obtain a point estimate (e.g., the posterior mean or the most probable value) and an interval estimate (probability intervals based on the posterior distribution).
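A conjugate example makes the prior-to-posterior update easy to follow. The sketch below assumes a Beta prior on a proportion (say, an approval rate) and binomial survey data; the prior parameters and the survey counts are invented for illustration. It reports the posterior mean, the posterior mode, and a 95% probability interval obtained from posterior draws.

```python
import random

def update_beta(a, b, successes, trials):
    """Beta(a, b) prior + binomial data -> Beta posterior parameters."""
    return a + successes, b + (trials - successes)

def summarize_beta(a, b, n_draws=20000, seed=0):
    """Posterior mean, mode (requires a, b > 1), and 95% probability interval."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2)
    interval = (draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws) - 1])
    return mean, mode, interval

prior = (2, 2)  # hypothetical weak prior centered on 0.5

# Same sample proportion (0.6), two hypothetical sample sizes.
small = update_beta(*prior, successes=12, trials=20)
large = update_beta(*prior, successes=600, trials=1000)

for label, posterior in (("n = 20", small), ("n = 1000", large)):
    mean, mode, (low, high) = summarize_beta(*posterior)
    print(label, "posterior Beta", posterior,
          "mean", round(mean, 4), "mode", round(mode, 4),
          "95% interval", (round(low, 3), round(high, 3)))
```

With the larger sample the posterior mean moves very close to the sample proportion of 0.6, which is the sense in which the data come to dominate the prior.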
  $P(\theta \mid y) = \frac{P(\theta, y)}{P(y)} = \frac{P(y \mid \theta) P(\theta)}{P(y)} = \frac{P(y \mid \theta) P(\theta)}{\int P(y \mid \theta) P(\theta)\, d\theta}$

Computationally, the main distinction is optimization of a function vs. sampling from a distribution.
Maximum likelihood estimates are obtained through optimization: find the parameter values that maximize the likelihood function. But one can also explore the likelihood function by sampling from the entire distribution (e.g., the Gill & King paper on non-invertible Hessians: when the mode doesn't work, explore the mean instead).

MCMC uses computational algorithms to obtain samples from a distribution. It is heavily used in Bayesian inference. E.g., the Gibbs sampler (alternating conditional sampling). Convergence is proved. Software includes BUGS (Bayesian inference Using Gibbs Sampling; WinBUGS is the Windows version) and JAGS (Just Another Gibbs Sampler). Several R packages interface these with R or implement various specific models (e.g., MCMCpack).

Note that MCMC $\neq$ Bayesian inference. Where the posterior distribution is known or approximated through analytical methods, MCMC is unnecessary. When the posterior/likelihood is well behaved (such as being globally concave), optimization is more efficient and more reliable. For complex functions/distributions, MCMC returns some results when optimization is difficult to do. Of course, where optimization may fail, the quality of the posterior approximation through sampling could be low too. There is no magic.

How special data features require special sampling and/or estimation strategies: e.g., rare events (logit estimates biased); endogenous dependence structure (the independence assumption doesn't hold).
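The contrast between optimization and sampling can be seen on a toy problem. The sketch below assumes normal data with known variance and a flat prior on the mean, so the posterior is proportional to the likelihood. It finds the mode with a golden-section search and then explores the same function with a random-walk Metropolis sampler. Metropolis is used here only because it fits in a few lines; it is a different MCMC algorithm from the Gibbs sampler named above, and all numbers are made up.

```python
import math
import random
import statistics

rng = random.Random(3)
sigma = 2.0                                   # assumed known
data = [rng.gauss(1.0, sigma) for _ in range(50)]

def log_likelihood(mu):
    """Normal log-likelihood in mu, up to an additive constant."""
    return -sum((y - mu) ** 2 for y in data) / (2 * sigma ** 2)

def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) > f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2

def metropolis(log_target, start, step, n_draws, rng):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    current, current_lp = start, log_target(start)
    chain = []
    for _ in range(n_draws):
        proposal = current + rng.gauss(0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain

# Optimization: a single number, the mode.
mle = golden_section_max(log_likelihood, -10.0, 10.0)

# Sampling: a whole distribution, summarized afterwards.
chain = metropolis(log_likelihood, start=0.0, step=0.5, n_draws=20000, rng=rng)
kept = chain[2000:]                           # drop burn-in draws
post_mean = statistics.fmean(kept)
post_sd = statistics.stdev(kept)

print("mode (MLE):", round(mle, 3))
print("sampled mean and sd:", round(post_mean, 3), round(post_sd, 3))
print("analytical sd:", round(sigma / math.sqrt(len(data)), 3))
```

For a well-behaved target like this one, both routes agree and optimization is far cheaper; the sampler earns its cost only when the target is too irregular to summarize by its mode and curvature.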
Inference

Quantities of interest can be computed from the model and the parameter estimates, e.g., the marginal effect of an $x$. Except in linear models with no higher-order terms, this is generally not the coefficient on $x$. But such quantities are usually functions of the parameters.

Uncertainty measures should be reported, based on the uncertainty measures for the parameters (for quantities pertaining to individual observations, also the fundamental uncertainty in the error term, e.g., $E(Y_i \mid X_i)$ vs. $Y_i \mid X_i$).

Model dependence: to what extent inference depends on the assumption that the model is true.

Data quality: what kinds of questions can be reliably answered from the available data? Or, when can history be our guide?
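As an illustration of a quantity of interest that is a function of the parameters rather than a coefficient, the sketch below computes the marginal effect of $x$ in a logit model with one regressor, $\partial \Pr(y=1 \mid x)/\partial x = \beta_1 p(1-p)$, and propagates parameter uncertainty by drawing parameters from a normal approximation to their sampling distribution. The coefficient estimates and covariance matrix are hypothetical placeholders, not results from any data set.

```python
import math
import random

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def marginal_effect(b0, b1, x):
    """Derivative of Pr(y = 1 | x) with respect to x in a one-regressor logit."""
    p = logistic(b0 + b1 * x)
    return b1 * p * (1.0 - p)

# Hypothetical estimates and covariance matrix (for illustration only).
b0_hat, b1_hat = -1.0, 0.8
var_b0, cov_b0_b1, var_b1 = 0.04, -0.01, 0.01
x_value = 1.0

point = marginal_effect(b0_hat, b1_hat, x_value)

# Cholesky factor of the 2x2 covariance matrix, for correlated normal draws.
l11 = math.sqrt(var_b0)
l21 = cov_b0_b1 / l11
l22 = math.sqrt(var_b1 - l21 ** 2)

rng = random.Random(4)
draws = []
for _ in range(10000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    b0 = b0_hat + l11 * z1
    b1 = b1_hat + l21 * z1 + l22 * z2
    draws.append(marginal_effect(b0, b1, x_value))

draws.sort()
lower, upper = draws[250], draws[9749]        # central 95% of the draws

print("coefficient on x:", b1_hat)
print("marginal effect at x = 1:", round(point, 4))
print("95% interval:", (round(lower, 4), round(upper, 4)))
```

The marginal effect is well below the coefficient itself and changes with the value of $x$ at which it is evaluated, which is why it should be reported together with its own uncertainty interval.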
Parallelization Strategies for Multicore Data Analysis WeiChen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management
More informationA General Approach to Variance Estimation under Imputation for Missing Survey Data
A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey
More informationObjections to Bayesian statistics
Bayesian Analysis (2008) 3, Number 3, pp. 445 450 Objections to Bayesian statistics Andrew Gelman Abstract. Bayesian inference is one of the more controversial approaches to statistics. The fundamental
More informationLecture #2 Overview. Basic IRT Concepts, Models, and Assumptions. Lecture #2 ICPSR Item Response Theory Workshop
Basic IRT Concepts, Models, and Assumptions Lecture #2 ICPSR Item Response Theory Workshop Lecture #2: 1of 64 Lecture #2 Overview Background of IRT and how it differs from CFA Creating a scale An introduction
More informationAdequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
More informationComparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors
Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors Arthur Lewbel, Yingying Dong, and Thomas Tao Yang Boston College, University of California Irvine, and Boston
More informationLasso on Categorical Data
Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.
More informationMissing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
More informationExperiment #1, Analyze Data using Excel, Calculator and Graphs.
Physics 182  Fall 2014  Experiment #1 1 Experiment #1, Analyze Data using Excel, Calculator and Graphs. 1 Purpose (5 Points, Including Title. Points apply to your lab report.) Before we start measuring
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationBAYESIAN ECONOMETRICS
BAYESIAN ECONOMETRICS VICTOR CHERNOZHUKOV Bayesian econometrics employs Bayesian methods for inference about economic questions using economic data. In the following, we briefly review these methods and
More informationDr. Peter Tröger Hasso Plattner Institute, University of Potsdam. Software Profiling Seminar, Statistics 101
Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam Software Profiling Seminar, 2013 Statistics 101 Descriptive Statistics Population Object Object Object Sample numerical description Object
More informationElements of statistics (MATH04871)
Elements of statistics (MATH04871) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis 
More informationLatent Class (Finite Mixture) Segments How to find them and what to do with them
Latent Class (Finite Mixture) Segments How to find them and what to do with them Jay Magidson Statistical Innovations Inc. Belmont, MA USA www.statisticalinnovations.com Sensometrics 2010, Rotterdam Overview
More informationREGRESSION LINES IN STATA
REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression
More informationStatistical Foundations: Measures of Location and Central Tendency and Summation and Expectation
Statistical Foundations: and Central Tendency and and Lecture 4 September 5, 2006 Psychology 790 Lecture #49/05/2006 Slide 1 of 26 Today s Lecture Today s Lecture Where this Fits central tendency/location
More informationCONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,
More informationGamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
More information1 Teaching notes on GMM 1.
Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in
More informationCHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANACHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANACHAMPAIGN Linear Algebra Slide 1 of
More information