Goodness of fit assessment of item response theory models


 Britton Stanley
 2 years ago
 Views:
Transcription
1 Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014
2 Outline Introduction Overall goodness of fit testing Two examples Assessing goodness of approximation Fine tuning the model: identifying misfitting items Final remarks
3 3 Introduction Statistical modeling involves finding a model that may have generated the data What is (absolute) goodness of fit testing? o Testing whether the fitted model may have generated the observed data Why is this important? o Because inferences drawn on poorly fitting models are misleading How misleading? o It depends on a number of factors, among them the discrepancy between the true and fitted model
4 4 Why is fit important? 3 df 3 df 3 df 1 df 1 df 1 df 0 df
5 5 Outline Introduction Overall goodness of fit testing
6 6 How to assess goodness of fit in IRT models IRT are models for multivariate categorical data o The goodness of fit of IRT models is assessed using goodness of fit statistics for categorical data, such as Pearson s X statistic o There is nothing really special about testing IRT models
7 7 Pearson s X Notation: o N = number of respondents o n = number of items o K = number of response alternatives (number of categories) o C = K n = number of possible response patterns (c = 1,, C) o p c = observed proportion of a response pattern o c = probability of a response pattern o = vector of item parameters (not latent traits!) o q = number of item parameters where ˆ ˆ, ˆ o For the MLE, X ( p ˆ ) ˆ 1 X N N ˆ c c p ˆ D p ˆ, (1) c c p are the cell residuals, and D ˆ diag ˆ d Cq1
8 8 Two small examples A 1PL (a Rasch model where the latent trait is treated as a random effect) applied to a 15 item test measuring mathematical proficiency in Chilean adults o n = 15; N = 3,000; K = ; q = 16; C = 15 = 3,768; df = 3,751 Samejima s graded model applied to the 7 items of the short version of the PROMIS depression scale o n = 7; N = 767; K = 4; q = 35; C = 4 7 = 16,384; df = 16,348 Problem: o (asymptotic) p values for X are only accurate when the expected frequencies of every pattern are large ( Nc 5 is the usual rule of thumb) o In large models (large C) that s impossible to accomplish o p values for X are only accurate in models with a small number of possible response patterns (300?) In IRT we are interested in much larger models than C = 300, how do we go about testing them?
9 9 Limited information goodness of fit testing There have been a number of proposals, most of them involve limited information test statistics Key references o Reiser (1996, Psychometrika) o Cai, Maydeu Olivares, Coffman & Thissen (British J. of Math and Stat Psych, 006) o Maydeu Olivares & Joe (005, J. American Stat Assoc; 006, Psychometrika)
10 10 What is limited information goodness of fit testing? Multivariate categorical data admits (at least) two representations cells moments Y 1 Y 0 1 Y Y 0 1 (1)(1) 1 (1) (1)() 1 () (1) 1 There is a one to one relationship between them All formulae in categorical data analysis can be written using cells or moments 1 ˆ ˆ ˆ ˆ p p p D pˆ ˆ ˆ 1 p p X N N, () Pearson s X is a quadratic form in all moments up to order n
11 11 What is limited information goodness of fit testing? 1 ˆ ˆ ˆ ˆ p p p D pˆ ˆ ˆ 1 p p X N N A limited information test statistic is ˆ 1 p p ˆ , (3) L N (4) For chi square distributed statistics degrees of freedom = number of statistics number of item parameters Degrees of freedom for Samejima s model are positive when univariate and bivariate moments are used (r = ) o Limited information testing requires using at least univariate and bivariate moments
12 1 Moments or margins? cells moments Y 1 Y 0 1 Y Y 0 1 (1)(1) 1 (1) (1)() 1 () (1) 1 Univariate moments = univariate probabilities that do not involve category 0 A researcher interested in univariate testing can use (1) (1) () o All univariate moments: 1,, o All univariate margins:,,,, (0) 1 (1) 1 o The relationship is one to one It is convenient to use moments because they are mathematically independent o Fewer o Their covariance matrix is of full rank Moments up to order r is equivalent to use all r way margins Bivariate information = univariate and bivariate moments (0) (1) ()
13 13 Pros and cons of limited information testing Suppose bivariate information is used PROS: 1) Accurate (asymptotic) p values can be obtained with N = 100 o Asymptotic approximation to the limited information statistic requires that 4 way frequencies are large For r way test, r way frequencies must be large ) If misfit is located in the bivariate margins, higher power is obtained CON: If misfit is not located in the bivariate margins, no power
14 14 Overall limited information test statistics Consider the quadratic form in univariate and bivariate moments Q N p ˆ Wˆ p ˆ. (5) Q is asymptotically distributed as a mixture of independent chi square variates. When Ŵis chosen so that ΣWΣWΣ ΣWΣ (6) where N is the asymptotic covariance matrix of p ˆ, Q is asymptotically chi square Two ways to satisfy (6) o Choose Ŵ such that is a generalized inverse of W: M of M O and Joe (005,006) M N ˆ ˆ p C p ˆ, X X D ( D X D ) DX C, (7) o Choose Ŵ such that W is a generalized inverse of : R of Reiser (1996) R N p ˆ Σˆ p ˆ, (8)
15 15 Overall limited information test statistics Third alternative: 1 Use an easily computable weight matrix ˆ ˆ 1 W ˆ,diag, p values by adjusting Q by its asymptotic mean and variance 1 o With ˆ diagˆ I and compute W and binary data, this is Cai et al. (006) and Bartholomew and Leung (00) o Within the context of SEM, these are the Satorra Bentler corrections Degrees of freedom: o M : number of statistics number of parameters o R : estimated rank of ˆΣ (integer) o MV: df tr tr WΣ ˆˆ WΣ ˆˆ (a real number) MV corrected statistics can be transformed so that they have the same number of degrees of freedom as M
16 16 The covariance matrix of the bivariate residuals (moments) Acov( ˆ ) (9) Notation: o = (asymptotic) covariance matrix of univariate and bivariate proportions (that exclude category 0) o = derivatives of univariate and bivariate moments with respect to item parameters o Acov( ˆ) = (asymptotic) covariance matrix of item parameter estimates 1 For multinomial ML (marginal ML), Acov( ˆ ) ˆ
17 17 Estimating the covariance matrix of the item parameter estimates Expected information, E o First order derivatives for all possible patterns o It can only be computed for small models Cross products information, XP o First order derivatives for observed patterns Observed information, O o First and second order derivatives for observed patterns 1 1 Sandwich (aka robust) information, S O XP O Crossing the different estimators of the covariance matrix of the item parameter estimates with the different choices of bivariate statistics, there are many possible choices of test statistics!
18 18 Outline Introduction Overall goodness of fit testing Two examples
19 19 Two numerical examples A 1PL applied to Chilean mathematical proficiency data o n = 15; N = 3,000; K = ; q = 16; C = 15 = 3,768; df = 3,751 o Samejima s graded model applied to the short version of the PROMIS anxiety scale o n = 7; N = 767; K = 4; q = 35; C = 4 7 = 16,384; df = 16,348 PROMIS anxiety Chilean mathematical proficiency stat value df RMSEA value df p RMSEA M R Y L Y L
20 0 Type I errors, N = 300 K =, n = 10 K = 5, n = 10 stat =.01 =.05 =.10 =.01 =.05 =.10 M Y Y O XP L L O XP R O XP L N p p, ˆ diag ˆ 1 ˆ ˆ ˆ 1 ˆ Y N p p, (10)
21 Computational issues For large unidimensional models, computing the goodness of fit statistic can take longer than the estimation of the item parameters and their SEs The number of univariate and bivariate moments is s = nn ( 1) nk ( 1) ( K 1) o For n = 0 and K = 5, s = 3,10; o For n = 30 and K = 7, s = 15,840 (probably unfeasible) When data is nominal, there is no solution When data is ordinal one can use instead of a quadratic form in residual univariate and bivariate moments a quadratic form in residual (multinomial) means and cross products The number of means and cross products is s = o One can test much larger models nn ( 1) 1
22 Means and cross products for multinomial variables 0 Pr 0 ( 1) Pr 1 EY Y K Y K, (11) i i i i i i ij E YY i j 00Pr Yi 0, Yj 0 ( K 1) ( K 1) Pr Y K 1, Y K 1 i j i i j j, (1) with sample counterparts k i y i (the sample mean), and kij yy i j / (the sample cross product), respectively, where y i denotes the observed data for item i. N For our previous 3 example, the elements of are 1PrY 1 (1) 1Pr 1 Pr 1 (1) () EY EY Y Y EYY 11Pr Y1, Y 1 1Pr Y1, Y. (1)(1) (1)() (13)
23 3 Testing large models for ordinal data, M ord Using these statistics one can construct a goodness of fit statistic for ordinal data M N k C k, ord ˆ ˆ ˆ ord X X D ( DX D ) D X C, (14) ord ord ord ord r ord ord ord ord It is just like M except that M ord uses as summaries statistics that further concentrate the information o If the misfit is in the direction where the information has been concentrated, M ord will be more powerful than M It is good practice to check the rank of D or D ord in applications. If the rank of these matrices is unstable, the estimated statistic is not reliable o The model is empirically underidentified from the information used for testing M ord cannot be computed when K is large and n is small due to lack of df, use M M ord = M for binary data
24 4 Outline Introduction Overall goodness of fit testing Two examples Assessing goodness of approximation
25 5 Assessing the goodness of approximation Finding a well fitting model is more difficult as the number of variables increase o If I generate a 1 variable dataset and I ask you what statistical model I used to generate the data, it will be easier for you than if I give you a 10 variable dataset In our experience, finding a well fitting model is more difficult as the number of categories increase o Are our models for polytomous data worse than our models for binary data? o Are personality/attitudinal/patient reported data more difficult to model than educational data?
26 6 Assessing the goodness of approximation I would argue that finding a well fitting model in IRT is harder than in FA o In IRT we model all moments (univariate, bivariate, trivariate,, n variate) o In FA we only model univariate and bivariate moments If we reject the fitted model, how do we judge how close we are to the true and unknown model? o I.e., how do we judge the goodness of approximation of our model?
27 7 Bivariate RMSEA and full information RMSEA The Root Mean Squared Error of Approximation is the standard procedure for assessing goodness of approximation in SEM A RMSEA can be constructed based on Pearson s X (RMSEA n ), M (RMSEA ), or any other statistic with known asymptotic distribution For members of the RMSEA r family r T 0 0 T 0 r r Crr r df r r M r ˆ r (15) df r RMSEA will generally be larger than RMSEA n because M is generally more powerful than X and statistics with higher power have a higher M / df ratio. r r
28 8 Choice of cutoff values for RMSEA and RMSEA n in binary IRT models A cut off of RMSEA 0.05 separates unidimensional from twodimensional models A cut off RMSEA n 0.03 separates unidimensional from twodimensional models
29 9 RMSEA n, RMSEA, and RMSEA ord RMSEA n is based on X There is a limit in the size of models for which X can be computed o Its sampling distribution will only be well approximated in small models (just as the sampling distribution of X ) o It is not possible to offer an overall cut off as population values of RMSEA n depend on the number of variables and the number of categories
30 30 RMSEA n, RMSEA, and RMSEA ord RMSEA is based on M o It can be computed for larger models o Its sampling distribution can be well approximated in large models o One can obtain confidence intervals and a test of close fit o It is possible to offer an overall cut off RMSEA ord is based on M ord o It can be computed for even larger models o Its sampling distribution can be well approximated in large models o One can obtain confidence intervals and a test of close fit BUT o It is not possible to offer an overall cut off as population values of RMSEA ord depend on the number of variables and the number of categories
31 31 Effect of the number of variables and categories on RMSEA and RMSEA ord RMSEA adjusted by the number of categories is stable when misfit is small RMSEA ord decreases as n and K increase. o It is easier to obtain a low RMSEA ord for large n and K
32 3 Goodness of approximation when the model is large If the model is so large that RMSEA cannot be computed and we should not use RMSEA ord because we cannot offer an overall cutoff, what do we do? The RMSEAs are ill designed to assess pure goodness of approximation o They cannot be interpreted substantively They are a weighted sum of residuals divided by df The RMSEAs by construction mix goodness of approximation and model selection How can we measure pure goodness of approximation? We want to measure the magnitude of the misfit (effect size) o In substantively interpretable units o Such that we can offer an overall cut off valid for any n and K
33 The Standardized Root Mean Squared Residual (SRMSR) A more appropriate name is the Squared Root Mean Residual Correlation o It is only valid for binary or ordinal data (just as the RMSEA ord ) r ˆ ij ij SRMSR (16) nn ( 1)/ i j r ij is just the product moment (Pearson) correlation and ˆ ˆ ˆ ij i i ˆ ij. (17) ˆ ˆ ˆ ˆ ii i jj j where the means ( i and j ) and the cross product ij were given in (11) and (1), and ii is ii E Yi 0 Pr Yi 0 ( Ki 1) Pr Yi Ki 1. (18) 33
34 Relationship between RMSEA and SRMR 34
35 Relationship between RMSEA adjusted by the number of categories and SRMR 35
36 36 Proposal of cut off values for assessing close fit in IRT criterion RMSEA SRMR adequate fit close fit excellent fit 0.05 / (K 1) 0.07 / (K 1) Criteria for adequate fit and close fit are essentially identical to those proposed by Browne and Cudeck (1993) in the context of SEM Criteria for excellent and close fit are equal for binary data It is possible (but cumbersome) to compute CIs and tests for the SRMR o It is best used as a goodness of fit index When used as a goodness of fit index, the SRMR can be computed for models of any size o It only requires computing means and cross products under the IRT model
37 37 Goodness of fit statistics are only summary measures It is possible that the model fits well overall but that it fits very poorly some items o It is good practice to inspect item and item pairs statistics and to report the largest observed statistics If the model misfits we wish to locate the source of the misfit
38 38 Outline Introduction Overall goodness of fit testing Two examples Assessing goodness of approximation Fine tuning the model: identifying misfitting items
39 39 Identifying misfitting items in IRT One could compute Pearson s statistic for every item and pair of items. For a pair of items we can write where X N p D p (19) ˆ ˆ ˆ 1 ij ij ij ij ij ij p, ˆ are vectors of dimension K. ij ij o Unfortunately X does not follow an asymptotic chi square distribution when applied to a subtable As in the case of the overall goodness of fit statistics we can use M N ˆ ˆ ˆ R ˆ ˆ ˆ ij N ij ij ij ij ij p C p, 1 ij ij ij ij ij ij or a mean and variance corrected X ij Cˆ Dˆ Dˆ Δˆ Δˆ Dˆ Δˆ Δˆ D ˆ (0) ij ij ij ij ij ij ij ij ij p Σ p. (1)
40 40 Identifying misfitting items in IRT Alternatively, we could use z statistics for the residual cross products z ord k ˆ ˆ ij ij kij ij SE ˆ / N k ˆ ij ij ord. () For all statistics (but M ij ) the (asymptotic) covariance matrix of the residuals for the pair of items is needed. This is, for multinomial ML, and for the z statistics we need where v is the 1 K vector Σ D ππ Δ Δ. (3) 1 ij ij ij ij ij ( ij) ij ˆ ord ij ˆ v Σ v, (4) v 00,01, 0 ( K 1),,( K 1) 0,( K 1) 1,,( K 1) ( K 1). (5)
41 Identifying misfitting items in IRT: Type I errors for a PL 41
42 Identifying misfitting items: Power for Samejima s model 4
43 43 Assessing the source of misfit in the PROMIS anxiety data Item Average A 90% CI for RMSEA is (0.09; 0.040); SRMSR = o The model provides a close fit to the data z ord statistics are displayed above the diagonal, residual correlations below the diagonal. z ord statistics significant at the 5% level with a Bonferroni adjustment have been boldfaced.
44 44 Outline Introduction Overall goodness of fit testing Two examples Assessing goodness of approximation Fine tuning the model: identifying misfitting items Final remarks
45 45 Recommendations For overall goodness of fit use M o Mean and variance diagonally weighted statistic may be more powerful to detect the presence of mixtures and guessing It requires the estimation of the covariance matrix of the item parameter estimates using the observed information matrix For assessing the magnitude of the misfit use RMSR (only with binary or ordinal data) For assessing goodness of approximation (taking into account model parsimony) use the RMSEA o Cut off values for the RMSR and RMSEA are available
46 46 Recommendations Report largest statistics for item pairs o Residual correlations and z statistics for binary and ordinal data o Mean and variance adjusted X statistics for polytomous nominal data They require the estimation of the covariance matrix of the item parameter estimates
47 47 Concluding remarks Model data assessment in IRT can now be performed as well (if not better) than in SEM Most often, our models will be misspecified to some extent o Research is needed to investigate the practical implications of different levels of misspecification The theory presented here is completely general o Applicable to any model for multivariate discrete data (e.g., cognitive diagnostic models) Better IRT models (or other measurement models) are needed o Particularly for polytomous data
Goodness of fit assessment of item response theory models. Alberto MaydeuOlivares. University of Barcelona. Author note
GOF OF IRT MODELS 1 Goodness of fit assessment of item response theory models Alberto MaydeuOlivares University of Barcelona Author note This research was supported by an ICREAAcademia Award and grant
More informationIndices of Model Fit STRUCTURAL EQUATION MODELING 2013
Indices of Model Fit STRUCTURAL EQUATION MODELING 2013 Indices of Model Fit A recommended minimal set of fit indices that should be reported and interpreted when reporting the results of SEM analyses:
More informationA modelbased approach to goodnessoffit evaluation in item response theory
A MODELBASED APPROACH TO GOF EVALUATION IN IRT 1 A modelbased approach to goodnessoffit evaluation in item response theory DL Oberski JK Vermunt Dept of Methodology and Statistics, Tilburg University,
More informationComparing the Fit of Item Response Theory and Factor Analysis Models
Structural Equation Modeling, 8:333 356, 20 Copyright Taylor & Francis Group, LLC ISSN: 07055 print/5328007 online OI: 0.080/07055.20.58993 Comparing the Fit of Item Response Theory and Factor Analysis
More informationA Cautionary Note on Using G 2 (dif) to Assess Relative Model Fit in Categorical Data Analysis
MULTIVARIATE BEHAVIORAL RESEARCH, 4(), 55 64 Copyright 2006, Lawrence Erlbaum Associates, Inc. A Cautionary Note on Using G 2 (dif) to Assess Relative Model Fit in Categorical Data Analysis Albert MaydeuOlivares
More informationLecture #2 Overview. Basic IRT Concepts, Models, and Assumptions. Lecture #2 ICPSR Item Response Theory Workshop
Basic IRT Concepts, Models, and Assumptions Lecture #2 ICPSR Item Response Theory Workshop Lecture #2: 1of 64 Lecture #2 Overview Background of IRT and how it differs from CFA Creating a scale An introduction
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationREINVENTING GUTTMAN SCALING AS A STATISTICAL MODEL: ABSOLUTE SIMPLEX THEORY
REINVENTING GUTTMAN SCALING AS A STATISTICAL MODEL: ABSOLUTE SIMPLEX THEORY Peter M. Bentler* University of California, Los Angeles *With thanks to Prof. KeHai Yuan, University of Notre Dame 1 Topics
More informationNotes for STA 437/1005 Methods for Multivariate Data
Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.
More informationGoodnessofFit Testing
GoodnessofFit Testing A MaydeuOlivares and C GarcíaForero, University of Barcelona, Barcelona, Spain ã 2010 Elsevier Ltd. All rights reserved. Glossary Absolute goodness of fit The discrepancy between
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More informationMultivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #47/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
More informationDEPARTMENT OF ECONOMICS. Unit ECON 12122 Introduction to Econometrics. Notes 4 2. R and F tests
DEPARTMENT OF ECONOMICS Unit ECON 11 Introduction to Econometrics Notes 4 R and F tests These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also
More informationSimple Second Order ChiSquare Correction
Simple Second Order ChiSquare Correction Tihomir Asparouhov and Bengt Muthén May 3, 2010 1 1 Introduction In this note we describe the second order correction for the chisquare statistic implemented
More informationChi Square for Contingency Tables
2 x 2 Case Chi Square for Contingency Tables A test for p 1 = p 2 We have learned a confidence interval for p 1 p 2, the difference in the population proportions. We want a hypothesis testing procedure
More informationLecture  32 Regression Modelling Using SPSS
Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture  32 Regression Modelling Using SPSS (Refer
More informationChisquare test Testing for independeny The r x c contingency tables square test
Chisquare test Testing for independeny The r x c contingency tables square test 1 The chisquare distribution HUSRB/0901/1/088 Teaching Mathematics and Statistics in Sciences: Modeling and Computeraided
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationPsychology 405: Psychometric Theory Homework on Factor analysis and structural equation modeling
Psychology 405: Psychometric Theory Homework on Factor analysis and structural equation modeling William Revelle Department of Psychology Northwestern University Evanston, Illinois USA June, 2014 1 / 20
More informationVariance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212.
Variance of OLS Estimators and Hypothesis Testing Charlie Gibbons ARE 212 Spring 2011 Randomness in the model Considering the model what is random? Y = X β + ɛ, β is a parameter and not random, X may be
More informationMATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
More informationInferential Statistics
Inferential Statistics Sampling and the normal distribution Zscores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
More informationApplications of Structural Equation Modeling in Social Sciences Research
American International Journal of Contemporary Research Vol. 4 No. 1; January 2014 Applications of Structural Equation Modeling in Social Sciences Research Jackson de Carvalho, PhD Assistant Professor
More informationThe Latent Variable Growth Model In Practice. Individual Development Over Time
The Latent Variable Growth Model In Practice 37 Individual Development Over Time y i = 1 i = 2 i = 3 t = 1 t = 2 t = 3 t = 4 ε 1 ε 2 ε 3 ε 4 y 1 y 2 y 3 y 4 x η 0 η 1 (1) y ti = η 0i + η 1i x t + ε ti
More informationANSWERS TO EXERCISES AND REVIEW QUESTIONS
ANSWERS TO EXERCISES AND REVIEW QUESTIONS PART FIVE: STATISTICAL TECHNIQUES TO COMPARE GROUPS Before attempting these questions read through the introduction to Part Five and Chapters 1621 of the SPSS
More informationIt is important to bear in mind that one of the first three subscripts is redundant since k = i j +3.
IDENTIFICATION AND ESTIMATION OF AGE, PERIOD AND COHORT EFFECTS IN THE ANALYSIS OF DISCRETE ARCHIVAL DATA Stephen E. Fienberg, University of Minnesota William M. Mason, University of Michigan 1. INTRODUCTION
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationEvaluating the Fit of Structural Equation Models: Tests of Significance and Descriptive GoodnessofFit Measures
Methods of Psychological Research Online 003, Vol.8, No., pp. 374 Department of Psychology Internet: http://www.mpronline.de 003 University of KoblenzLandau Evaluating the Fit of Structural Equation
More informationChapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means Oneway ANOVA To test the null hypothesis that several population means are equal,
More informationIntroduction to Multivariate Models: Modeling Multivariate Outcomes with Mixed Model Repeated Measures Analyses
Introduction to Multivariate Models: Modeling Multivariate Outcomes with Mixed Model Repeated Measures Analyses Applied Multilevel Models for Cross Sectional Data Lecture 11 ICPSR Summer Workshop University
More informationSample Size in Factor Analysis: The Role of Model Error
Multivariate Behavioral Research, 36 (4), 611637 Copyright 2001, Lawrence Erlbaum Associates, Inc. Sample Size in Factor Analysis: The Role of Model Error Robert C. MacCallum Ohio State University Keith
More informationWriting about SEM/Best Practices
Writing about SEM/Best Practices Presenting results Best Practices» specification» data» analysis» interpretation 1 Writing about SEM Summary of recommendation from Hoyle & Panter, McDonald & Ho, Kline»
More informationOverview Classes. 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7)
Overview Classes 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7) 24 Loglinear models (8) 54 1517 hrs; 5B02 Building and
More informationLogit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response
More information11/20/2014. Correlational research is used to describe the relationship between two or more naturally occurring variables.
Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationMultivariate Analysis of Variance (MANOVA): I. Theory
Gregory Carey, 1998 MANOVA: I  1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the
More informationBasic Statistcs Formula Sheet
Basic Statistcs Formula Sheet Steven W. ydick May 5, 0 This document is only intended to review basic concepts/formulas from an introduction to statistics course. Only meanbased procedures are reviewed,
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 SigmaRestricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002Topics in StatisticsBiological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationWhy Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationSPSS and AMOS. Miss Brenda Lee 2:00p.m. 6:00p.m. 24 th July, 2015 The Open University of Hong Kong
Seminar on Quantitative Data Analysis: SPSS and AMOS Miss Brenda Lee 2:00p.m. 6:00p.m. 24 th July, 2015 The Open University of Hong Kong SBAS (Hong Kong) Ltd. All Rights Reserved. 1 Agenda MANOVA, Repeated
More informationA Introduction to Matrix Algebra and Principal Components Analysis
A Introduction to Matrix Algebra and Principal Components Analysis Multivariate Methods in Education ERSH 8350 Lecture #2 August 24, 2011 ERSH 8350: Lecture 2 Today s Class An introduction to matrix algebra
More informationMixture Models for Understanding Dependencies of Insurance and Credit Risks
Mixture Models for Understanding Dependencies of Insurance and Credit Risks Emiliano A. Valdez, PhD, FSA, FIAA University of New South Wales BrightonLeSands, Sydney 1112 May 2006 Emiliano A. Valdez
More informationThe effect of varying the number of response alternatives in rating scales: Experimental evidence from intraindividual effects
Behavior Research Methods 009, 4 (), 95308 doi:0.3758/brm.4..95 The effect of varying the number of response alternatives in rating scales: Experimental evidence from intraindividual effects Alberto
More informationAlgebra 1 Course Information
Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through
More informationStatistical Foundations: Measures of Location and Central Tendency and Summation and Expectation
Statistical Foundations: and Central Tendency and and Lecture 4 September 5, 2006 Psychology 790 Lecture #49/05/2006 Slide 1 of 26 Today s Lecture Today s Lecture Where this Fits central tendency/location
More informationChapter 11: Two Variable Regression Analysis
Department of Mathematics Izmir University of Economics Week 1415 20142015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions
More informationProbability and Statistics
CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b  0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute  Systems and Modeling GIGA  Bioinformatics ULg kristel.vansteen@ulg.ac.be
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 OneWay ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationAssociation Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
More informationMissing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
More informationDepartment of Economics
Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 14730278 On Testing for Diagonality of Large Dimensional
More informationSample Size Determination
Sample Size Determination Population A: 10,000 Population B: 5,000 Sample 10% Sample 15% Sample size 1000 Sample size 750 The process of obtaining information from a subset (sample) of a larger group (population)
More informationPackage EstCRM. July 13, 2015
Version 1.4 Date 2015711 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu
More informationConfidence Intervals for the Difference Between Two Means
Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means
More informationAnalysis of Microdata
Rainer Winkelmann Stefan Boes Analysis of Microdata With 38 Figures and 41 Tables 4y Springer Contents 1 Introduction 1 1.1 What Are Microdata? 1 1.2 Types of Microdata 4 1.2.1 Qualitative Data 4 1.2.2
More informationIntroduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra  1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
More informationMultinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
More informationUNDERSTANDING THE TWOWAY ANOVA
UNDERSTANDING THE e have seen how the oneway ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationHandling missing data in Stata a whirlwind tour
Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled
More informationLecture 7: Binomial Test, Chisquare
Lecture 7: Binomial Test, Chisquare Test, and ANOVA May, 01 GENOME 560, Spring 01 Goals ANOVA Binomial test Chi square test Fisher s exact test Su In Lee, CSE & GS suinlee@uw.edu 1 Whirlwind Tour of One/Two
More informationRegression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology
Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of
More informationLOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
More informationPrincipal Components Analysis (PCA)
Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association
More information, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (
Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we
More informationCHAPTER 11 CHISQUARE: NONPARAMETRIC COMPARISONS OF FREQUENCY
CHAPTER 11 CHISQUARE: NONPARAMETRIC COMPARISONS OF FREQUENCY The hypothesis testing statistics detailed thus far in this text have all been designed to allow comparison of the means of two or more samples
More informationLinda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents
Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén
More informationDescribe what is meant by a placebo Contrast the doubleblind procedure with the singleblind procedure Review the structure for organizing a memo
Readings: Ha and Ha Textbook  Chapters 1 8 Appendix D & E (online) Plous  Chapters 10, 11, 12 and 14 Chapter 10: The Representativeness Heuristic Chapter 11: The Availability Heuristic Chapter 12: Probability
More informationChapter 11: Linear Regression  Inference in Regression Analysis  Part 2
Chapter 11: Linear Regression  Inference in Regression Analysis  Part 2 Note: Whether we calculate confidence intervals or perform hypothesis tests we need the distribution of the statistic we will use.
More informationConfidence Intervals for the Area Under an ROC Curve
Chapter 261 Confidence Intervals for the Area Under an ROC Curve Introduction Receiver operating characteristic (ROC) curves are used to assess the accuracy of a diagnostic test. The technique is used
More information1. χ 2 minimization 2. Fits in case of of systematic errors
Data fitting Volker Blobel University of Hamburg March 2005 1. χ 2 minimization 2. Fits in case of of systematic errors Keys during display: enter = next page; = next page; = previous page; home = first
More informationProbability Calculator
Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical wor. This procedure provides you with a set of electronic statistical tables that
More informationAn example ANOVA situation. 1Way ANOVA. Some notation for ANOVA. Are these differences significant? Example (Treating Blisters)
An example ANOVA situation Example (Treating Blisters) 1Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Subjects: 25 patients with blisters Treatments: Treatment A, Treatment
More informationWhat Does the Correlation Coefficient Really Tell Us About the Individual?
What Does the Correlation Coefficient Really Tell Us About the Individual? R. C. Gardner and R. W. J. Neufeld Department of Psychology University of Western Ontario ABSTRACT The Pearson product moment
More informationThe Big 50 Revision Guidelines for S1
The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand
More informationLAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
More informationIntroduction to Confirmatory Factor Analysis and Structural Equation Modeling
Introduction to Confirmatory Factor Analysis and Structural Equation Modeling Lecture 12 August 7, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Class An Introduction to:
More informationMAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters
MAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters Inferences about a population parameter can be made using sample statistics for
More informationThe Exponential Family
The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationLecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
More informationCS395T Computational Statistics with Application to Bioinformatics
CS395T Computational Statistics with Application to Bioinformatics Prof. William H. Press Spring Term, 2010 The University of Texas at Austin Unit 6: Multivariate Normal Distributions and Chi Square The
More informationMultiple regression  Matrices
Multiple regression  Matrices This handout will present various matrices which are substantively interesting and/or provide useful means of summarizing the data for analytical purposes. As we will see,
More informationWhen to Use Which Statistical Test
When to Use Which Statistical Test Rachel Lovell, Ph.D., Senior Research Associate Begun Center for Violence Prevention Research and Education Jack, Joseph, and Morton Mandel School of Applied Social Sciences
More informationAssessing the Relative Fit of Alternative Item Response Theory Models to the Data
Research Paper Assessing the Relative Fit of Alternative Item Response Theory Models to the Data by John Richard Bergan, Ph.D. 6700 E. Speedway Boulevard Tucson, Arizona 85710 Phone: 520.323.9033 Fax:
More informationHow To Use A Monte Carlo Study To Decide On Sample Size and Determine Power
How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power Linda K. Muthén Muthén & Muthén 11965 Venice Blvd., Suite 407 Los Angeles, CA 90066 Telephone: (310) 3919971 Fax: (310) 3918971
More informationCrosstabulation & Chi Square
Crosstabulation & Chi Square Robert S Michael Chisquare as an Index of Association After examining the distribution of each of the variables, the researcher s next task is to look for relationships among
More informationLatent Class Regression Part II
This work is licensed under a Creative Commons AttributionNonCommercialShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More information4. Introduction to Statistics
Statistics for Engineers 41 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationModule 9: Nonparametric Tests. The Applied Research Center
Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } OneSample ChiSquare Test
More informationAnalysing Tables Part V Interpreting ChiSquare
Analysing Tables Part V Interpreting ChiSquare 8.0 Interpreting Chi Square Output 8.1 the Meaning of a Significant ChiSquare Sometimes the purpose of Crosstabulation is wrongly seen as being merely about
More informationOutline. Topic 4  Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4  Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test  Fall 2013 R 2 and the coefficient of correlation
More information