# Introduction to Fixed Effects Methods

Size: px
Start display at page:

Transcription

1 Introduction to Fixed Effects Methods The Promise of Fixed Effects for Nonexperimental Research The Paired-Comparisons t-test as a Fixed Effects Method Costs and Benefits of Fixed Effects Methods Why Are These Methods Called Fixed Effects? Fixed Effects Methods in SAS/STAT What You Need to Know Computing The Promise of Fixed Effects for Nonexperimental Research Every empirical researcher knows that randomized experiments have major advantages over observational studies in making causal inferences. Randomization of subjects to different treatment conditions ensures that the treatment groups, on average, are identical with respect to all possible characteristics of the subjects, regardless of whether those characteristics can be measured or not. If the subjects are people, for example, the treatment groups produced by randomization will be approximately equal with respect to such easily measured variables as race, sex, and age, and also approximately equal for more problematic variables like intelligence, aggressiveness, and creativity. In nonexperimental studies, researchers often try to approximate a randomized experiment by statistically controlling for other variables using methods such as linear regression, logistic regression, or propensity scores. While statistical control can certainly be a useful tactic, it has two major limitations. First, no matter how many variables you control for, someone can always criticize your study by suggesting that you left out some crucial variable. (Such critiques are more compelling when that crucial variable is named). As is well known, the omission of a key covariate can lead to severe bias in estimating the effects of the variables that are included. Second, to statistically control for a variable, you have to measure it and explicitly include it in some kind of model. The problem is that some variables are

2 2 Fixed Effects Regression Methods for Longitudinal Data Using SAS notoriously difficult to measure. If the measurement is imperfect (and it usually is), this can also lead to biased estimates. So in practice, causal inference via statistical adjustment usually runs a poor second to the randomized experiment. It turns out, however, that with certain kinds of nonexperimental data we can get much closer to the virtues of a randomized experiment. Specifically, by using the fixed effects methods discussed in this book, it is possible to control for all possible characteristics of the individuals in the study even without measuring them so long as those characteristics do not change over time. I realize that this is a powerful claim, and it is one that I will take pains to justify as we go along. What is also remarkable is that fixed effects methods have been lying under our noses for many years. If the dependent variable is quantitative, then fixed effects methods can be easily implemented using ordinary least squares linear regression. When the dependent variable is categorical, somewhat more sophisticated methods are necessary, but even then the fixed effects approach is a lot easier than many alternative methods. There are two key data requirements for the application of a fixed effects method. First, each individual in the sample must have two or more measurements on the same dependent variable. Second, for at least some of the individuals in the sample, the values of the independent variable(s) of interest must be different on at least two of the measurement occasions. 1.2 The Paired-Comparisons t-test as a Fixed Effects Method Perhaps the simplest design that meets these two requirements is a before-after study. Suppose, for example, that 100 people volunteer to participate in a weight loss program. They all get weighed when they enter the study, producing the variable W 1. All 100 people are then given a new medication believed to facilitate weight loss. After two months on this medication, they are weighed again, producing the variable W 2. So we have measurements of weight on two occasions for each participant. The participants are off the medication for the first measurement and are on the medication for the second. How should such data be analyzed? Before answering that question, let s first concede that this is not an ideal study design, most importantly because there is no control group of people who don t receive the medication at either time. Nevertheless, the application of fixed effects methods has all the virtues that I claimed above. The objective here is to test the null hypothesis that mean weight at time 1 is the same as mean weight at time 2, against the alternative that mean weight is lower at time 2. In this case, an easily applied fixed effects method is one that is taught in most introductory statistics courses under the name of pairedcomparisons t-test or paired-differences t-test. The steps are: 1. Form D = W 2 W Calculate D, the mean of D. 3. Test whether D is significantly less than 0. The third step is accomplished by dividing D by its estimated standard error s / n, where s is the sample standard deviation of D, and n is the sample size. The resulting test statistic has

3 Chapter 1 Introduction to Fixed Effects Methods 3 a t distribution with n 1 degrees of freedom under the null hypothesis (assuming that D is normally distributed). If D is significantly less than 0, what can we conclude? Well, we can t be sure that the medication caused the weight loss, because it s possible that something else happened to these people between time 1 and time 2. However, we can be sure that the difference in average weight between the two time points is not explainable by stable characteristics of the people in the study. In other words, we can be quite confident that the weight loss was not produced by changes in race, gender, parental wealth, or intelligence. While this conclusion may seem obvious, it s helpful to consider a mathematical formulation as a way of introducing some of the ideas that underlie the more complicated models considered later. Let W W i1 i2 = µ + α + ε i i1 = µ + δ + α + ε i i2 where W i1 is the weight of person i at time 1, and similarly W i2 is the weight at time 2. In this model, µ is the baseline average weight, and δ denotes the change in the average from time 1 to time 2. The disturbance terms ε i1 and ε i2 represent random variation that is specific to a particular individual at a particular point in time. As in other linear models, one might assume that ε i1 and ε i2 each have an expected value of 0. The term α i represents all the person-specific variation that is stable over time. Thus α i can be thought to include the effects of such variables as race, gender, parental wealth, and intelligence. When we form the difference score, we get Di = δ + ( ε ε ) i2 i1 which shows that both the baseline mean µ and the stable individual variation α i disappear when we compute difference scores. Therefore, the stable individual differences can have no effect on our conclusions, even if α i is correlated with ε i1 or ε i2. The essence of a fixed effects method is captured by saying that each individual serves as his or her own control. That is accomplished by making comparisons within individuals (hence the need for at least two measurements), and then averaging those differences across all the individuals in the sample. How this is accomplished depends greatly on the characteristics of the data and the design of the study. 1.3 Costs and Benefits of Fixed Effects Methods As already noted, the major attraction of fixed effects methods in nonexperimental research is the ability to control for all stable characteristics of the individuals in the study, thereby eliminating potentially large sources of bias. Within-subject comparisons have also been popular in certain kinds of designed experiments known as changeover or crossover designs (Senn 1993). In these designs, subjects receive different treatments at different times, and a response variable is measured for each treatment. Ideally, the order in which the treatments are received is randomized. The objective of the crossover design is not primarily to reduce bias, but to reduce sampling variability and hence produce more powerful tests of hypotheses.

4 4 Fixed Effects Regression Methods for Longitudinal Data Using SAS The rationale is that by differencing out the individual variability across subjects, one can eliminate much of the error variance that is present with conventional experimental designs in which each subject receives only one treatment. By contrast, when fixed effects methods are applied to nonexperimental data, there is often an increase in sampling variability relative to alternative methods of analysis. The reason is that in the typical observational study, the independent variables of interest vary both within and between subjects. Suppose, for example, that one of the independent variables is personal income, measured annually for five successive years. While there might be considerable within-person variation in income over time, the bulk of the variation is likely to be between persons. Fixed effects methods completely ignore the between-person variation and focus only on the within-person variation. Unfortunately, discarding the between-person variation can yield standard errors that are considerably higher than those produced by methods that utilize both within- and between-person variation. So why do it? The answer is that the between-person variation is very likely to be contaminated by unmeasured personal characteristics that are correlated with income. By restricting ourselves to the within-person variation, we eliminate that contamination and are much more likely to get unbiased estimates. So what we re dealing with is a trade-off between bias and sampling variability. For nonexperimental data, fixed effects methods tend to reduce bias at the expense of greater sampling variability. Given the many reasons for expecting bias in observational studies, I think this is usually an attractive bargain. Nevertheless, one crucial limitation to fixed effects methods arises when the ratio of within- to between-person variance declines to 0: fixed effects methods cannot estimate coefficients for variables that have no within-subject variation. Hence, a fixed effects method will not give you coefficients for race, sex, or region of birth. Among adults, it won t be very helpful in estimating effects of height or years of schooling (although there may be a little within-person variation on the latter). Keep in mind, however, that all these stable variables are controlled in a fixed effects regression, even if there are no measurements of them. In fact, the control is likely to be much more effective than in conventional regression. And as we ll see later, you can include interactions between stable variables such as sex and variables that vary over time. But for most observational studies, fixed effects methods are primarily useful for investigating the effects of variables that vary within subject. For experimental data, the situation with respect to bias and sampling variability is exactly reversed. Bias is eliminated by giving the same set of treatments to all subjects and by randomizing the order in which the treatments are presented. The result is approximately zero correlation between treatment and stable characteristics of the subjects, which means that there is no need for fixed effects to reduce bias. On the other hand, by design, all the variation on the independent variables (the treatments) is within subjects. So no information is lost by restricting attention to the within-subject variation. Indeed, standard errors can be greatly reduced by fixed effects methods because the error term has smaller variance.

7 Chapter 1 Introduction to Fixed Effects Methods 7 understand the basic model for binary logistic regression, how to estimate that model via maximum likelihood, and how to interpret the results in terms of odds ratios. Some familiarity with PROC LOGISTIC is helpful, but not essential. For chapter 4 on fixed effects Poisson regression, you should have a basic familiarity with the Poisson regression model, discussed in chapter 9 of Logistic Regression Using the SAS System: Theory and Application. I ll use PROC GENMOD to estimate the model, so previous experience with this procedure will be helpful. For chapter 5 on survival analysis, some basic knowledge of the Cox proportional hazards model and partial likelihood estimation is essential. These methods are described in my 1995 book Survival Analysis Using SAS: A Practical Guide, along with instruction on how to use the PHREG procedure. Finally, to read chapter 6 you should have some knowledge of linear structural equation models (SEMs) that include both observed and latent variables. A good introduction to this topic for SAS users is A Step-by-Step Approach to Using the SAS System for Factor Analysis and Structural Equation Modeling (Hatcher 1994). I have tried to keep the mathematics to a minimal level throughout the book. In particular, there is no calculus and little use of matrix notation. Nevertheless, to simplify the presentation of regression models, I frequently use the vector notation β x = β0 + β1x βk x k. While it would be helpful to have some knowledge of maximum likelihood estimation, it s hardly essential. With regard to SAS, the more experience you have with SAS/STAT and the SAS DATA step, the easier it will be to follow the presentation of SAS programs. On the other hand, most of the programs presented in this book are fairly simple and short, so don t be intimidated if you re just beginning to learn SAS. 1.7 Computing All the computer input and output displayed in this book was produced by and for SAS 9.1 for Windows. Occasionally I point out differences between the syntax of SAS 9.1 and earlier releases. I use the following convention for presenting SAS programs: All words that are part of the SAS language are shown in upper case. All user-specified variable names and data set names are in lower case. In the main text itself, both SAS keywords and userspecified variables are in upper case. The output displays were produced with the SAS Output Delivery System. To avoid unnecessary distraction, I do not include the ODS statements in the programs shown in the book. However, the reader who would like to duplicate those displays can do so with the following code before and after the procedure statements: OPTIONS LS=75 PS=3000 NODATE NOCENTER NONUMBER; ODS LISTING CLOSE; ODS RTF FILE='c:\book.rtf' STYLE=JOURNAL BODYTITLE; ---PROC STATEMENTS HERE--- ODS RTF CLOSE; ODS LISTING;

8 8 Fixed Effects Regression Methods for Longitudinal Data Using SAS All the examples were run on a Dell Optiplex GX270 desktop computer running Windows XP at 3 gigahertz with 1 gibabyte of physical memory.

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Research Methods & Experimental Design

Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

### Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

### individualdifferences

1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,

### 17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

### Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

### A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

### Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Analyzing Intervention Effects: Multilevel & Other Approaches Joop Hox Methodology & Statistics, Utrecht Simplest Intervention Design R X Y E Random assignment Experimental + Control group Analysis: t

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### Multiple Imputation for Missing Data: A Cautionary Tale

Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust

### Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Multiple Regression: What Is It?

Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

### Empirical Methods in Applied Economics

Empirical Methods in Applied Economics Jörn-Ste en Pischke LSE October 2005 1 Observational Studies and Regression 1.1 Conditional Randomization Again When we discussed experiments, we discussed already

### UNDERSTANDING THE DEPENDENT-SAMPLES t TEST

UNDERSTANDING THE DEPENDENT-SAMPLES t TEST A dependent-samples t test (a.k.a. matched or paired-samples, matched-pairs, samples, or subjects, simple repeated-measures or within-groups, or correlated groups)

### Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc.

Paper 264-26 Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc. Abstract: There are several procedures in the SAS System for statistical modeling. Most statisticians who use the SAS

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

### Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

### This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

### A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011

A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011 1 Abstract We report back on methods used for the analysis of longitudinal data covered by the course We draw lessons for

### Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

### Imputing Missing Data using SAS

ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

### 1 Theory: The General Linear Model

QMIN GLM Theory - 1.1 1 Theory: The General Linear Model 1.1 Introduction Before digital computers, statistics textbooks spoke of three procedures regression, the analysis of variance (ANOVA), and the

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD

REPUBLIC OF SOUTH AFRICA GOVERNMENT-WIDE MONITORING & IMPACT EVALUATION SEMINAR IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD SHAHID KHANDKER World Bank June 2006 ORGANIZED BY THE WORLD BANK AFRICA IMPACT

### Simple Tricks for Using SPSS for Windows

Simple Tricks for Using SPSS for Windows Chapter 14. Follow-up Tests for the Two-Way Factorial ANOVA The Interaction is Not Significant If you have performed a two-way ANOVA using the General Linear Model,

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

### SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY

SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT The purpose of this paper is to investigate several SAS procedures that are used in

### Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl

Dept of Information Science j.nerbonne@rug.nl October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### The correlation coefficient

The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

1 Hypothesis Testing Richard S. Balkin, Ph.D., LPC-S, NCC 2 Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric

### Logit Models for Binary Data

Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### Note 2 to Computer class: Standard mis-specification tests

Note 2 to Computer class: Standard mis-specification tests Ragnar Nymoen September 2, 2013 1 Why mis-specification testing of econometric models? As econometricians we must relate to the fact that the

### USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Logistic regression is an increasingly popular statistical technique

### Reflections on Probability vs Nonprobability Sampling

Official Statistics in Honour of Daniel Thorburn, pp. 29 35 Reflections on Probability vs Nonprobability Sampling Jan Wretman 1 A few fundamental things are briefly discussed. First: What is called probability

### UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

### DATA COLLECTION AND ANALYSIS

DATA COLLECTION AND ANALYSIS Quality Education for Minorities (QEM) Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. August 23, 2013 Objectives of the Discussion 2 Discuss

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Multivariate Normal Distribution

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

### Structural Equation Modelling (SEM)

(SEM) Aims and Objectives By the end of this seminar you should: Have a working knowledge of the principles behind causality. Understand the basic steps to building a Model of the phenomenon of interest.

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### 3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

### Notes on Applied Linear Regression

Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

### Statistics 2014 Scoring Guidelines

AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### Introduction to proc glm

Lab 7: Proc GLM and one-way ANOVA STT 422: Summer, 2004 Vince Melfi SAS has several procedures for analysis of variance models, including proc anova, proc glm, proc varcomp, and proc mixed. We mainly will

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

### Overview of Factor Analysis

Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

### An Introduction to Path Analysis. nach 3

An Introduction to Path Analysis Developed by Sewall Wright, path analysis is a method employed to determine whether or not a multivariate set of nonexperimental data fits well with a particular (a priori)

### Gender Effects in the Alaska Juvenile Justice System

Gender Effects in the Alaska Juvenile Justice System Report to the Justice and Statistics Research Association by André Rosay Justice Center University of Alaska Anchorage JC 0306.05 October 2003 Gender

### Solución del Examen Tipo: 1

Solución del Examen Tipo: 1 Universidad Carlos III de Madrid ECONOMETRICS Academic year 2009/10 FINAL EXAM May 17, 2010 DURATION: 2 HOURS 1. Assume that model (III) verifies the assumptions of the classical

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### MULTIPLE REGRESSION WITH CATEGORICAL DATA

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

### Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.

Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,

### The Null Hypothesis. Geoffrey R. Loftus University of Washington

The Null Hypothesis Geoffrey R. Loftus University of Washington Send correspondence to: Geoffrey R. Loftus Department of Psychology, Box 351525 University of Washington Seattle, WA 98195-1525 gloftus@u.washington.edu

### Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables We often consider relationships between observed outcomes

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Two-sample inference: Continuous data

Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As

### The Use of Event Studies in Finance and Economics. Fall 2001. Gerald P. Dwyer, Jr.

The Use of Event Studies in Finance and Economics University of Rome at Tor Vergata Fall 2001 Gerald P. Dwyer, Jr. Any views are the author s and not necessarily those of the Federal Reserve Bank of Atlanta

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Chapter 6 Experiment Process

Chapter 6 Process ation is not simple; we have to prepare, conduct and analyze experiments properly. One of the main advantages of an experiment is the control of, for example, subjects, objects and instrumentation.

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Introduction to Hypothesis Testing OPRE 6301

Introduction to Hypothesis Testing OPRE 6301 Motivation... The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -