MORE ON LOGISTIC REGRESSION
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS
Posc/Uapp 816

I. AGENDA:
A. Logistic regression
1. Multiple independent variables
2. Example: The Bell Curve
3. Evaluation of fit
4. Inference
B. Reading: Agresti and Finlay, Statistical Methods for the Social Sciences, 3rd edition, pages 576 to 585.

II. MULTIPLE VARIABLE LOGISTIC REGRESSION:
A. As noted last time, we can approach model building, that is, explaining variation in the log odds or odds, in the same way we did with multiple regression.
1. We can add variables of any of the types discussed under that topic.
2. Hence, we can add more continuous X's;
3. dummy indicators for categorical variables;
4. interaction terms.
B. Here once again are the results for the California congressional delegation.
1. The estimated parameters are partial regression coefficients that show the effects of a variable on the logit when the other variables in the model have been held constant or controlled.
2. Here are the results:

Logistic Regression Table (only the coefficients are reproduced here)

Predictor    Coef
Constant    -2.711
Rural       -0.01497
ADA          0.14762

Test that all slopes are zero: G statistic with DF = 2.

C. We'll consider the significance of the terms in a moment.
1. But you might anticipate from what has gone before that there really won't be an improvement.
i. For one thing, there is a relatively strong negative correlation between ADA and percent rural.
2. The estimated equation for the log odds is:
Posc/Uapp 816 Class 23 More on Logistic Regression

L̂ = -2.711 - .01497(Rural) + .14762(ADA)

3. To see what the numbers mean, just substitute some meaningful values for X1 and X2, such as 0 and 0.
i. Actually this combination would not make sense in American politics because it's unlikely that an extreme conservative (ADA = 0) would represent a totally urbanized district (rural = 0).
ii. Anyway, the estimated log odds would be -2.711 and the estimated odds would be e^(-2.711) = .0665 to 1.
iii. What would the log odds, odds, and predicted probability be for a representative from a district with a rural population of 30 percent and an ADA score of 70?
D. A more realistic example:¹
1. Let's consider Richard Herrnstein and Charles Murray's The Bell Curve, an important book that claims IQ (native intelligence) accounts for variation in achievement much more than social background does.
i. At one point the authors want to explain being below the poverty line. That is, they are not concerned with the rate or number of poor people; rather, they want to know: given that a person has such and such an IQ, is of such and such an age, and comes from such and such a family background, what is the probability of that person's living below the poverty line?
1) In our previous notation, they want a model for π, where π is the probability that Y = 1 (i.e., being in poverty).
2. Major arguments:
i. They claim, among other things, that IQ has an independent effect on poverty status, irrespective of other variables such as socioeconomic background.
ii. Hence, whatever the family background, the higher one's IQ, the lower one's chance of living in poverty (a negative correlation).
iii.
They use their analysis of poverty to make a much larger point:
1) "Our thesis is that the twentieth century has continued the transformation, so that the twenty-first will open on a world in which cognitive ability [which, they claim, is driven mostly by genes, not environment] is the decisive dividing force... social class remains the vehicle of life, but

¹ Richard J. Herrnstein and Charles Murray, The Bell Curve (New York: The Free Press, 1994).
intelligence now pulls the train."²
3. I have some comments on their analysis at the end of the notes, plus a couple of suggested readings. But let's simply use their results as an example of logistic regression.
E. The logit
1. As we saw in Class 22, social scientists prefer not to model probabilities directly (such models are called linear probability models). Instead, they use a transformation of the probability, π.
2. One of the commonest, and the one used by Herrnstein and Murray, is the logit, defined as:

L = logit(π) = log[π / (1 - π)]

where π is the probability of an event of interest occurring (e.g., being in a state of poverty). (This is the log of the odds.)
3. A linear multiple variable model for the log odds is:

L = β0 + β1X1 + β2X2 + ... + βKXK + ε

4. Recall some of the properties of log odds and models for them.
i. They can take on any value from minus to plus infinity.
ii. Hence, if we think of the logit as a dependent variable, we model it in much the same way as with regression analysis.
iii. The errors in log odds models (hopefully) meet the statistical assumptions so that we can obtain unbiased and efficient estimators of the parameters.
1) They have constant variance and are uncorrelated with each other and with the explanatory variables.
F. A logit is not a natural variable to many of us, so in order to understand the substantive significance of these models we can convert it to a probability:

π = e^L / (1 + e^L)

G. Probabilities of this sort constitute the subject of The Bell Curve's models.

² Herrnstein and Murray, The Bell Curve, page 25.
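The logit and its inverse in parts E and F can be written as a pair of small helper functions. This is a sketch in Python, not part of the original notes:

```python
import math

def logit(p):
    """Log odds of a probability p: log(p / (1 - p))."""
    return math.log(p / (1 - p))

def prob(l):
    """Inverse transformation: convert a logit back to a probability."""
    return math.exp(l) / (1 + math.exp(l))

# A probability of .25 corresponds to odds of 1 to 3, so its log odds are
# log(1/3); converting back recovers the original probability.
print(logit(0.25))        # about -1.099
print(prob(logit(0.25)))  # 0.25
```

Note that a probability of .5 corresponds to even odds, so its logit is 0; probabilities below .5 give negative logits, which is why the poverty logits worked out below are all negative.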
1. Much of their data come from the National Longitudinal Survey [of Labor Market Experience] of Youth.
i. It is a panel study in which a sample of people aged 14 to 22, first interviewed in 1979, are repeatedly re-interviewed.
ii. The original sample consisted of 12,686 respondents.
2. In one analysis their main explanatory variables are:
i. Armed Forces Qualification Test (AFQT) scores, which are used as a measure of cognitive ability and which, the authors claim, reflect native intelligence ("g" scores).
ii. Socioeconomic status (SES) of the respondent's family.
iii. Age.
3. All of the variables are standardized to have mean 0 and standard deviation 1.0.
i. Recall our discussion of standardized data and the hopes placed on them.
H. The estimated parameters for their various models appear in Appendix 4 of The Bell Curve.
1. For example, the following table (based on the one on page 594) gives these estimates of the regression parameters for logits pertaining to falling below the poverty level.

Variable             Estimate
Constant/intercept   -2.6486
IQ (AFQT)            -.83763
SES                  -.33017
Age                  -.02384

2. The equation version of these results, which shows their relation to predicted values of the logit, is:

L̂ = -2.6486 - .83763(AFQT) - .33017(SES) - .02384(Age)

3. Once we find predicted logits, we can use the formula given earlier to convert them to probabilities.
I. Examples:
1. Suppose we let the three independent variables have their mean values, which are 0.
i. As noted, all of the variables in this analysis have been standardized.
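The predicted-logit equation can be evaluated with a small function. This is a sketch, not from the notes themselves; the coefficients are the Appendix 4 estimates discussed above, and the function name is mine:

```python
import math

# Coefficients for the Bell Curve poverty logit (Appendix 4 estimates)
INTERCEPT, B_IQ, B_SES, B_AGE = -2.6486, -0.83763, -0.33017, -0.02384

def poverty_logit(afqt, ses, age):
    """Predicted log odds of being below the poverty line.

    Inputs are standardized scores, so 0 means "at the sample mean."
    """
    return INTERCEPT + B_IQ * afqt + B_SES * ses + B_AGE * age

# A person at the mean of all three variables: logit -2.6486,
# odds of about .07 to 1, probability of about .066
l = poverty_logit(0, 0, 0)
odds = math.exp(l)
print(l, odds, odds / (1 + odds))
```

The same function reproduces each of the worked examples that follow; only the standardized IQ score changes.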
And recall that the mean of a standardized variable is 0 and the standard deviation is 1.
ii. The use of standardized scores, moreover, means that the coefficients are standardized regression coefficients, whose magnitudes can presumably be directly compared.
1) We know that this assumption is problematic, however.
J. Based on a comparison of the standardized coefficients, Herrnstein and Murray claim that IQ is a more important explanation of this form of achievement than is social class background.
1. The standardized coefficient for IQ is twice as large as the one for social background, so doesn't this mean that intelligence is twice as important in explaining poverty status?
i. The drift of their argument is that people who are poor (or don't achieve) have only their biological endowment to blame. They haven't been disadvantaged by their environment.
K. Interpretation of coefficients:
1. For now we need to put aside any discussion of this claim and look at the numbers' meanings.
2. So if we let age = IQ = SES = 0, which is the same as looking at someone who is at the mean or average of these factors, we can predict this person's log odds of being below the poverty line:
i. Note carefully: in this context a score of 0 represents the mean. It does not mean, for instance, that age literally equals zero.

L̂ = -2.6486 - .8376(0) - .3301(0) - .0239(0) = -2.6486

ii. The odds and probability that correspond to this logit are:

estimated odds = e^(-2.6486) = .0707

and

π̂ = e^(-2.6486) / (1 + e^(-2.6486)) = .0660

iii. These numbers mean, first, that the odds of someone with average age, SES standing, and intelligence being below the poverty line are .07 to 1.
iv. The corresponding probability of such a person being in poverty is .066.
3. Now suppose a person is exceptionally bright. That is, although SES and age remain at the mean (0), the individual's IQ is one full standard deviation above the average (that is, IQ = 1).
i. Again, recall that the data are in standard deviation form.
ii. If IQ is normally distributed, this would mean that the person is above two thirds of the sample.
4. The estimated log odds are now:

L̂ = -2.6486 - .8376(1) = -2.6486 - .8376 = -3.4862

i. The log odds have decreased slightly.
ii. The standardized partial regression coefficient for IQ is -.8376. Since this is added to the constant, which is negative, we see that the logit (and the odds) will decrease.
5. The odds and probability that the person falls into poverty are:

estimated odds = e^(-3.4862) = .0306

and

π̂ = e^(-3.4862) / (1 + e^(-3.4862)) = .0297

i. Being in the upper third of the IQ distribution thus lowers (compared to being average) the odds and probability of being below the poverty level, after social class and age have been controlled.
6. Now let's see what the estimated log odds are for someone whose IQ falls 1 standard deviation below average:

L̂ = -2.6486 - .8376(-1) - .3301(0) - .0239(0) = -2.6486 + .8376 = -1.8110
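The hand calculations for different IQ values can be repeated in a loop. A sketch (coefficients as in the notes, with SES and age held at their means of 0):

```python
import math

# Bell Curve poverty model with SES and age held at their means (0);
# intercept and IQ coefficient as used in the worked examples
INTERCEPT, B_IQ = -2.6486, -0.8376

def poverty_prob(iq):
    """Probability of being below the poverty line for a standardized IQ score."""
    l = INTERCEPT + B_IQ * iq
    return math.exp(l) / (1 + math.exp(l))

for iq in (-2, -1, 0, 1):
    print(f"IQ = {iq:+d} SD: prob = {poverty_prob(iq):.4f}")
# The probabilities fall from about .274 (two SDs below the mean)
# through about .14 and .066 to about .030 (one SD above the mean).
```

Sweeping the score this way is exactly what the graphs promised in part L would display: probability on the vertical axis against one standardized variable, with the others held constant.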
i. The log odds have gone up a bit, and the estimated odds and probability are:

estimated odds = e^(-1.8110) = .1635

and

π̂ = e^(-1.8110) / (1 + e^(-1.8110)) = .1405

ii. The chances of someone near the bottom of the IQ ladder (the bottom one third) being below the poverty level have increased quite a bit.
7. We can substitute in other values in order to see what effect they have on the logits and (as shown below) the probabilities. For example, consider a Herrnstein-Murray "loser," someone two standard deviations below the mean. The log odds of being in poverty are -.9734, the odds are .378 to 1, and the probability is .274.
i. This perhaps discouraging result shows that a person of substantially below average intelligence has a more than one in four chance of being poor, even after controlling for age and social background.
L. I'll try to present some graphs, similar to the ones in Chapter 5, that show how the probability of being in poverty changes with changes in an independent variable, with the other variables held constant.
1. More comments later.

III. EVALUATING LOGISTIC REGRESSION MODELS:
A. The notes in this section simply repeat the ones for Class 22.
B. We can test the significance of the estimated parameters using the same ideas as we employed in regular regression. In particular, we can:
1. Compute statistics roughly comparable to R² as a measure of how well the data fit the model.
2. Conduct an overall or global test of the regression parameters.
3. Conduct tests for individual parameters.
4. Construct confidence intervals for estimated parameters.
5. Construct confidence intervals for predicted probabilities.
C. Note and warning:
1. The statistical results for logistic regression usually assume that the sample is relatively large.
2. If, for example, a statistic such as the estimator of the regression coefficient is said to be normally distributed with a standard deviation of σ_β, the
statement applies, strictly speaking, to estimators based on large N.
i. How big does N have to be? A rule of thumb: roughly 60 or more cases.
D. The R² analogue
1. There really isn't a completely satisfactory version of R² available to measure the explained variation in Y similar to common multiple R², so we will use a different measure, the correct classification rate (CCR).
2. MINITAB effectively constructs a cross-classification table of predicted and observed results that takes this form:

Observed    Ŷ = 0              Ŷ = 1
Y = 0       Number correct     Number incorrect
Y = 1       Number incorrect   Number correct

i. The table cross-classifies predicted Y's by observed Y's.
3. If the model does a good job, then presumably the total number of correct predictions, the frequencies in the main diagonal, should greatly outweigh the incorrect guesses.
i. For instance, suppose a model correctly predicted 76 cases (the main-diagonal frequencies) and incorrectly predicted 7.
ii. Since there are then a total of 83 observations in the table and 76 of them have been correctly predicted, the CCR is 76/83 × 100 = 91.6%.
4. Some software reports this number, or it can be easily calculated from reported data.
5. MINITAB, however, reports measures of association for the table.
i. These measures are bounded between -1.0 and 1.0 and attain maximum values (1.0) when there are no errors.
ii. So a measure equal to .9 indicates that most of the Y's have been correctly predicted and the model fits reasonably well.
6. The measures for the percent rural and the ADA models are:
Percent Rural
Somers' D               0.51
Goodman-Kruskal Gamma   0.57
Kendall's Tau-a         0.22

ADA
Somers' D               0.94
Goodman-Kruskal Gamma   0.95
Kendall's Tau-a         0.40

i. The measures of association for the independent variable percent rural are about .5, half way between 0 for no correlation and 1.0 for perfect correlation, so the data fit at best moderately well.
1) Note: I prefer using Somers' measure.
ii. For the ADA variable, however, the value of the measure is nearly 1, which suggests a quite good fit.
iii. Based on these considerations, one would conclude that ADA scores better explain and predict votes on assault weapons than percent rural does. Needless to say, this conclusion undercuts the original hypothesis.

IV. INFERENCE FOR LOGISTIC REGRESSION:
A. A global test of the hypothesis that β1 = β2 = β3 = ... = βK = 0 is usually done by comparing the likelihood, L_β, for the model to the likelihood (L_0) for a model for the data containing only a constant.
1. Sorry, we can't take time to explain likelihood, although the concept is not difficult.
2. Think of it as very, very roughly akin to the residual sum of squares.
B. A bit more formally, one obtains an observed statistic:

LLR = -2 log(L_0 / L_β) = -2(log L_0 - log L_β)

1. LLR, based on the log of the likelihood ratio, is a simple chi-square statistic with degrees of freedom equal to the number of variables in the model.
2. The sample size has to be reasonably large, say more than 60 cases.
C. More generally, one can test the significance of a set of parameters by comparing a model that includes them (call it the complete model) with one that does not have those parameters (call it the reduced model).
1. This strategy parallels in form the one used in multiple regression.
2. That is, suppose the complete model has K variables while the reduced
model contains K - q, where q < K.
3. Use a program to obtain the likelihood for the full model (L_complete) and the likelihood for the reduced model (L_reduced).
4. The test statistic:

LLR = -2 log(L_reduced / L_complete) = -2(log L_reduced - log L_complete)

is distributed as χ² with q degrees of freedom.
D. Maximum likelihood estimation provides (asymptotic, or large sample) standard errors of the coefficients. These can be used to test hypotheses about individual parameters and to construct (simultaneous) confidence intervals.
E. The test statistic for a (partial) regression parameter resembles the form of the statistic for regular regression parameters: it is the estimated coefficient divided by its standard error.
1. This statistic, called Wald's Z, is:

Z = (β̂ - β) / σ̂_β̂

2. It is distributed approximately as a standard normal variable, so one uses the z table to find a critical value and test the hypothesis, which usually is that β = 0.
i. This is a z statistic, not t.
ii. Report the attained level of significance when possible.
3. As noted above, 60 cases should be sufficiently large in most situations to obtain reasonably valid results.
i. I have also seen the rule of thumb: the ratio of the sample size to the number of variables in the model should be 20 to 1 or greater.³
F. I'll discuss examples in class.

V. A COUPLE OF ADDITIONAL REMARKS ON THE BELL CURVE:
A. The importance of explanatory factors: a critique
1. First, as noted several times, Herrnstein and Murray analyze variables with

³ Robert D. Rutherford and Minja Kim Choe, Statistical Models for Causal Analysis (Wiley, 1993), page 137.
different scales (e.g., AFQT has a different scale than parental socioeconomic background).
2. To overcome this problem, which is actually not necessarily a problem, the authors standardize each variable so that the means are 0 and standard deviations are 1.
i. Consequently, instead of talking about, say, age in years, they refer to it in standard deviation units. A person, for instance, isn't 22 years old, but has a score of .3, or 3/10 of a standard deviation. Another individual might have a score on age of, say, -.21.
3. Standardizing presumably permits one to compare the magnitudes of different β's because they are all based on the same scale.
4. The authors then interpret the numerical size of the coefficients as indicators of importance.
5. As noted countless times before, the β's based on standard scores are called standardized regression coefficients instead of just regression coefficients.
6. One can think of them as resulting from the following manipulation:

β*1 = β̂1 (σ̂_X1 / σ̂_Y)

i. Here β*1 is the standardized coefficient, the one Herrnstein and Murray report, β̂1 is the unstandardized regression coefficient, and the sigmas are the sample standard deviations of X1 and Y.
7. In view of this relationship, note the following: all else equal, if the variation in X doubled, the size of the standardized coefficient would also double.
8. The lesson is thus that the magnitude of a standardized coefficient is a function not only of the strength of the relationship but also of the amount of variation in the independent variable.
9. So if we were comparing two groups (with standardized coefficients), our conclusions about the importance of variables could be affected by the variation in each group.
B. Theoretical importance:
Figure 1: Importance of Variables?

1. Look at Figure 1 above.
i. Suppose X1 and X2 both affect Y. Can we say that X1 is more important than X2, even though the first's coefficient is larger?
ii. Suppose both are necessary for the occurrence or variation of Y.
iii. This seems to be a theoretical issue, not one of statistics.
C. A huge amount has been written about The Bell Curve:
1. For a statistical analysis see Arthur S. Goldberger and Charles F. Manski, "Review Article: The Bell Curve," Journal of Economic Literature, volume 33 (1995), beginning on page 762.
2. An excellent but mostly verbal collection of essays that critique the book is The Bell Curve Wars: Race, Intelligence, and the Future of America, edited by Steven Fraser (Basic Books, 1995).
3. For a more balanced assessment see Bernie Devlin and others, Intelligence, Genes, and Success (Springer-Verlag, 1997).
4. Also, Claude S. Fischer and others, Inequality by Design: Cracking the Bell Curve Myth (Princeton University Press, 1996).
5. By far the most popular of Herrnstein and Murray's critics is Stephen Jay Gould. As much as I respect and admire his work, I think it is fair to say many biologists, philosophers, and sociologists find great faults in his analysis of sociobiology. But most would agree that The Bell Curve is a flawed study of genes and intelligence.

VI. NEXT TIME:
A. Summary of data analysis
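The two inference tools in section IV, the likelihood-ratio test and Wald's Z, can be sketched numerically. This is an illustrative sketch only: the log-likelihoods, coefficient, and standard error below are invented numbers, and the chi-square tail probability uses the closed form that holds only when df = 2 (as in the two-variable model above).

```python
import math

def llr(loglik_reduced, loglik_complete):
    """Likelihood-ratio statistic: -2(log L_reduced - log L_complete)."""
    return -2 * (loglik_reduced - loglik_complete)

def chi2_upper_tail_df2(g):
    """Upper-tail chi-square probability; exact only for df = 2."""
    return math.exp(-g / 2)

def wald_z(beta_hat, se, beta0=0.0):
    """Wald's Z: (estimated coefficient - hypothesized value) / standard error."""
    return (beta_hat - beta0) / se

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Global test with two predictors (df = 2); log-likelihoods are invented
g = llr(-28.4, -17.1)
print(g, chi2_upper_tail_df2(g))  # a large G and a tiny p-value

# Wald test for a single coefficient; estimate and SE are invented
z = wald_z(0.148, 0.05)
print(z, two_sided_p(z))
```

For other degrees of freedom one would look the G statistic up in a chi-square table (or use a statistics library) rather than the df = 2 shortcut.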
Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
More informationElements of statistics (MATH0487-1)
Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -
More informationIntroduction to Linear Regression
14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression
More informationLOGISTIC REGRESSION ANALYSIS
LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic
More informationPremaster Statistics Tutorial 4 Full solutions
Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationChapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
More informationDeveloping Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@
Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,
More informationE(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F
Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,
More informationUsing An Ordered Logistic Regression Model with SAS Vartanian: SW 541
Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL
More informationAn analysis method for a quantitative outcome and two categorical explanatory variables.
Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that
More informationAPPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING
APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More information4. Multiple Regression in Practice
30 Multiple Regression in Practice 4. Multiple Regression in Practice The preceding chapters have helped define the broad principles on which regression analysis is based. What features one should look
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationRegression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationMODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING
Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects
More informationViolent crime total. Problem Set 1
Problem Set 1 Note: this problem set is primarily intended to get you used to manipulating and presenting data using a spreadsheet program. While subsequent problem sets will be useful indicators of the
More informationLogistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
More informationTesting for Lack of Fit
Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit
More informationCOMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.
277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationIn the past, the increase in the price of gasoline could be attributed to major national or global
Chapter 7 Testing Hypotheses Chapter Learning Objectives Understanding the assumptions of statistical hypothesis testing Defining and applying the components in hypothesis testing: the research and null
More informationCHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationHow To Run Statistical Tests in Excel
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
More informationThis chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
More informationCHAPTER 13. Experimental Design and Analysis of Variance
CHAPTER 13 Experimental Design and Analysis of Variance CONTENTS STATISTICS IN PRACTICE: BURKE MARKETING SERVICES, INC. 13.1 AN INTRODUCTION TO EXPERIMENTAL DESIGN AND ANALYSIS OF VARIANCE Data Collection
More informationSection Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini
NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building
More informationCOLLEGE ALGEBRA. Paul Dawkins
COLLEGE ALGEBRA Paul Dawkins Table of Contents Preface... iii Outline... iv Preliminaries... Introduction... Integer Exponents... Rational Exponents... 9 Real Exponents...5 Radicals...6 Polynomials...5
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More informationECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
More informationLOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
More informationChapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation
Chapter 9 Two-Sample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and Two-Sample Tests: Paired Versus
More informationLogistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests
Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy
More informationPredicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables
Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Introduction In the summer of 2002, a research study commissioned by the Center for Student
More informationCausal Forecasting Models
CTL.SC1x -Supply Chain & Logistics Fundamentals Causal Forecasting Models MIT Center for Transportation & Logistics Causal Models Used when demand is correlated with some known and measurable environmental
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationHLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
More informationPlease follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More informationMATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
More informationINTRODUCTION TO MULTIPLE CORRELATION
CHAPTER 13 INTRODUCTION TO MULTIPLE CORRELATION Chapter 12 introduced you to the concept of partialling and how partialling could assist you in better interpreting the relationship between two primary
More information