Some fallacies and remedies in secondary data analysis for survey data
1 Some fallacies and remedies in secondary data analysis for survey data
Giancarlo Manzi, Department of Economics, Management and Quantitative Methods, Università degli Studi di Milano, Italy
Sonia Stefanizzi, Department of Sociology and Social Research, University of Milan-Bicocca, Italy
Pier Alda Ferrari, Department of Economics, Management and Quantitative Methods, Università degli Studi di Milano, Italy
Conference of European Statistics Stakeholders, November 24-25, 2014, Rome, Sapienza
2 Fisher's famous quote revisited
"To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of" (Fisher, 1938).
We revisit this famous quote as follows: "To call in the statistician after the experiment is done may sometimes be convenient: he or she can revive it!"
Our particular focus is on Secondary Data Analysis (SDA) fallacies in surveys that emerge during the statistical analysis. Some suggestions for future surveys and remedies for current European surveys are also presented.
3 Overview of the talk
Introduction and motivation:
  Quality of data and data analysis
  Coherence and comparability
Issues in conducting SDA:
  Data validity, reflexivity and reliability
SDA statistical remedies:
  Combining survey results so that they borrow strength from each other
  Use of suitable analysis tools
  Building improved surveys from existing surveys
An example on European data showing how these fallacies arise when performing statistical analysis, and some points to consider
Conclusion and future steps
4 Introduction and motivation (1)
In a collaborative project between statisticians and sociologists, a series of problems arose when performing SDA on European data. Hence the need to study this topic further.
We started with general definitions of data and information quality. For example, Kenett & Shmueli (2014) define eight dimensions of information quality: data resolution; data structure; data integration; temporal relevance; generalizability; chronology of data and goal; construct operationalization; and communication.
5 Introduction and motivation (2)
Eurostat has established seven dimensions of quality since 2000 (quality dimension, followed by a remark):
1. Relevance of statistical concept. A statistical product is relevant if it meets user needs. Thus user needs have to be established at the outset.
2. Accuracy of estimates. Accuracy is the difference between the estimate and the true parameter value. Assessing the accuracy is not always possible, due to financial and methodological constraints.
3. Timeliness and punctuality in disseminating results. In our experience this is perhaps one of the most important user needs. Perhaps this is so because this dimension is so obviously linked to an efficient use of the results.
4. Accessibility and clarity of the information. Results are of high value when they are easily accessible and available in forms suitable to users. The data provider should also assist the users in interpreting the results.
5. Comparability. Reliable comparisons across space and time are often crucial. Recently, new demands on cross-national comparisons have become common. This in turn puts new demands on developing methods for adjusting for cultural differences.
6. Coherence. When originating from a single source, statistics are coherent, in that elementary concepts can be combined in more complex ways. When originating from different sources, and in particular from statistical studies of different periodicities, statistics are coherent insofar as they are based on common definitions, classifications, and methodological standards.
7. Completeness. Domains for which statistics are available should reflect the needs and priorities expressed by users as a collective.
Source: Eurostat (2000)
6 Introduction and motivation (3)
In this talk we focus on dimensions 5 (comparability) and 6 (coherence) above. Data quality can be addressed in connection with coherence and comparability by:
i. summarizing the most common fallacies linked to survey implementation and the analysis of results;
ii. recalling some statistical tools able to reduce or control bias when analyzing, comparing or combining survey data.
Special attention is given to problems arising in SDA, with an example from the social and economic field, the Eurobarometer (EB) survey, in which issues are detected and some remedies are sketched.
7 SDA: target switching from original data
SDA: the set of research activities through which data from different surveys, each with its own assumptions and conceptual framework, are used individually for purposes not necessarily coinciding with those that guided the data collection.
Examples:
Boudon, 1973: through SDA the social researcher can widen the validity of atomic results to the point that he/she is able to modify the original conceptual framework, formulating new interpretative hypotheses which can differ from those of the primary analysis.
Ferrari & Salini, 2011: data on European users' satisfaction with utilities can be used to reveal the multifaceted importance and quality of different aspects of public services.
8 Issues in SDA: data validity
DV: the correspondence between the characteristics to be detected and the indicators chosen to measure them.
Objective (i.e. observable) aspects must lead to latent states of societies and individuals.
[Diagram labels: construction of variables; appropriate conceptualization; appropriate measurement; meaning of the real relationship between variables revealed]
9 Issues in SDA: data reflexivity
Example: performing immigration surveys.
Immigration surveys also express the reflexive character of policies.
Immigration surveys may turn out to be limited and constrained.
Examples of constraints:
  Immigration policies defined only in terms of migrant categories and quotas.
  Official statistics almost exclusively focused on foreigners' legal matters, such as their nationality, residence, duration and purpose of stay, etc.
This sometimes leaves aside other important components of migration, such as the social contexts of migrants' origin (urban/rural), their social background, the way their migratory experience is articulated, etc.
10 Issues in SDA: data reliability DR: the degree to which data collection procedures are applied in a consistent and coherent way with respect to previously established criteria. The reliability issue occurs both at the level of data production and at the level of data collection, classification and dissemination.
11 Some examples of remedies for SDA fallacies on EU data (1)
Methods for blending results from different surveys to attenuate data flaws.
1. Small area estimation. Lohr & Brick (2012) explore methods for small domain estimation from two surveys when one survey is believed to be biased with respect to the other. The novelty of their work is that they use methods to adjust estimates before a new companion survey is implemented, i.e. at the stage of constructing a newly planned survey.
2. Meta-analytic approaches. Manzi et al. (2011) use a meta-analytic approach with a hierarchical Bayesian model for small domain statistics. Official survey estimates are integrated with estimates from smaller surveys covering smaller areas. Estimates are averaged with weights proportional to the strength of each survey, so that the bigger surveys dominate the others, but information also comes from smaller, more up-to-date surveys.
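The idea of averaging estimates "with weights proportional to the strength of each survey" can be illustrated with a plain inverse-variance weighted average. This is a deliberately simplified stand-in, not the hierarchical Bayesian model of Manzi et al., which also models survey bias. The function name `pool_estimates` and all numbers are hypothetical.

```python
from math import sqrt


def pool_estimates(estimates, std_errors):
    """Inverse-variance weighted average of independent survey estimates.

    Returns the pooled estimate, its standard error, and the share of the
    total weight carried by each survey.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    # A survey's "strength" is taken here to be its precision, 1 / SE^2.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = sqrt(1.0 / total)
    shares = [w / total for w in weights]
    return pooled, pooled_se, shares


# Hypothetical inputs: a large official survey (small SE) and a smaller,
# more recent local survey (large SE) estimating the same proportion.
pooled, pooled_se, shares = pool_estimates([0.20, 0.30], [0.01, 0.03])
print(round(pooled, 4), [round(s, 2) for s in shares])
```

The large survey dominates the pooled value, while the small one still shifts it slightly. Under the independence assumption the pooled standard error is smaller than either input standard error.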
12 Some examples of remedies for SDA fallacies on EU data (2)
Methods for the detection of latent variables which explain hidden structures in the data.
1. Ferrari et al. (2010) use Nonlinear Principal Component Analysis (NPCA) to detect latent constructs and then average them over countries.
2. In Ferrari & Salini (2011) NPCA is proposed together with the Rasch Model (RM) for the assessment of latent concepts such as satisfaction with public services. With this combined use of NPCA and RM, the level of satisfaction is determined for each individual via NPCA, and in addition the importance of the single satisfaction components (given by the component loadings in NPCA) and the quality of the components (given by the item parameters of the RM) are determined.
13 A motivational example: SDA fallacies arising in the EB survey
Analysis of European citizens' attachment, expectations and information with regard to the EU.
Data: EB survey.
Techniques used: NPCA and multilevel (ML) analysis.
Evident problems:
  Excessive number of "Don't Know" (DK) answers. Were the questions erroneously formulated? Sometimes a DK answer makes sense, sometimes not. Should a way to avoid or reduce their presence in the data set be established? Handling them requires a great deal of imputation.
  Sometimes the scales of variables of the same type (same questionnaire section) are not coherent, and recoding is needed.
14 A motivational example: Post-analysis incoherence detection (1)
Question understanding: what does «understand» really mean in this question? Respondents may be puzzled.
Consequence: when exploring for latent variables (using NPCA), this question is ambiguously classified.
15 A motivational example: Post-analysis incoherence detection (2)
Ambiguous questions/wording: is this a question about trust in the EU? Or rather about how well-informed citizens are about it (most probably, but not certainly)?
Consequence: again, problems when clustering variables.
16 A motivational example: Post-analysis incoherence detection (3)
Could the choice of question formulation (verb tense, for example) be decisive in assigning a question to one category rather than another?
Sometimes questions have intrinsic double meanings: are we sure that this question is really suited to checking how well-informed citizens are about the EU? Is it also trying to investigate their attachment?
17 A motivational example: SDA in action (1)
We wanted to evaluate EU citizens' feelings about the EU.
An initial set of 44 candidate variables was identified in a series of meetings among the authors.
The categories of some variables were recoded in reverse order for homogeneity with the other variables.
After performing an NPCA with three and four components on these variables, some variables were excluded because they were not clearly in line with any of the extracted components.
Some variables initially assigned to one dimension were moved to another dimension.
Final number of variables left: 37.
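The recoding step can be sketched as follows. It assumes an ordinal scale coded 1..n and a missing-value marker that must be left untouched. The function name `reverse_code` and the sample answers are hypothetical, not taken from the EB codebook.

```python
def reverse_code(value, n_categories, missing=None):
    """Map category k on a 1..n ordinal scale to n + 1 - k.

    Values equal to `missing` (e.g. a "Don't Know" marker) are returned
    unchanged so that they can be handled separately.
    """
    if value == missing:
        return value
    if not 1 <= value <= n_categories:
        raise ValueError(f"category {value} outside 1..{n_categories}")
    return n_categories + 1 - value


# Hypothetical 4-point item where 1 = "totally agree"; after recoding,
# higher values mean stronger agreement, like the other items.
answers = [1, 2, 4, None, 3]
recoded = [reverse_code(a, 4) for a in answers]
print(recoded)  # [4, 3, 1, None, 2]
```

Applying the function twice returns the original coding, which gives a quick check that no category was lost in the recoding.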
18 A motivational example: SDA in action (2)
22 variables for the EU attachment/expectation/confidence dimension.
9 variables for the EU evaluation dimension.
6 variables for the level of information about the EU dimension.
After some correlation and regression analysis on the sociodemographic variables in the EB data set, 4 individual variables were left.
After performing NPCA, individual NPCA scores were obtained separately for each of the three dimensions.
Country averages of these scores were obtained, intended to show the average country EU attachment/evaluation/information.
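A minimal sketch of the last step, averaging individual scores by country and ranking the countries, is given below. The records are invented for illustration, and survey weights are ignored, which is an assumption of this sketch rather than a statement about how the authors computed their averages.

```python
from collections import defaultdict


def country_means(records):
    """Average individual scores by country.

    `records` is an iterable of (country_code, score) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for country, score in records:
        sums[country] += score
        counts[country] += 1
    return {country: sums[country] / counts[country] for country in sums}


def rank_countries(means):
    """Country codes ordered from highest to lowest average score."""
    return sorted(means, key=means.get, reverse=True)


# Hypothetical individual scores on one dimension (e.g. attachment).
records = [("IT", 0.5), ("IT", -0.1), ("EL", 0.3), ("DE", -0.4), ("DE", 0.0)]
means = country_means(records)
ranking = rank_countries(means)
print(ranking)  # ['EL', 'IT', 'DE']
```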
19 A motivational example: SDA in action (3)
We also wanted to know whether the country ranking on citizens' attachment/evaluation/information was related to some contextual country variables. We therefore performed an ML analysis, inserting contextual independent variables to detect the determinants of attachment/evaluation/information.
The contextual variables were essentially economic and social measurements (GDP per capita, public debt, index of deprivation, inactivity rate, etc.).
After performing the ML analysis, the ranking was altered with respect to a one-level analysis, and citizens of the so-called PIIGS were not among the least satisfied with the EU.
20 NPCA logic
NPCA:
belongs to the nonlinear multivariate analysis family;
is the nonlinear counterpart of principal component analysis;
provides dimensionality reduction by means of a nonlinear transformation of variables, i.e. by assigning quantitative values to qualitative scales;
has a solution which is derived by minimizing a least-squares-type loss function, expressed in terms of optimally quantified variables and scores on objects.
21 NPCA: how it works in general (1)
The goal of NPCA is the construction of a p-dimensional Euclidean space in which objects (individuals) are represented.
Suppose J categorical variables are observed on N objects (survey respondents).
Let $X$ be an $N \times p$ matrix of object scores (to be determined).
Let $Y_j$ be the $k_j \times p$ matrix of "quantifications" of the $j$-th variable ($Y_j$ has to be determined, $j = 1, \dots, J$, and $k_j$ is the number of categories of the $j$-th variable).
Let $G_j$ be an $N \times k_j$ indicator matrix with entries $g^{(j)}_{it} = 1$ if object $i$ holds category $t$, and $g^{(j)}_{it} = 0$ otherwise.
22 NPCA: how it works in general (2)
The solution of NPCA is determined by minimizing the following loss function (written here up to a normalizing constant):
\[ \sigma(X; q_1, \dots, q_J; \beta_1, \dots, \beta_J) = \sum_{j=1}^{J} \mathrm{SSQ}\left(X - G_j q_j \beta_j^{\top}\right), \]
where $\mathrm{SSQ}(H)$ denotes the sum of squares of the elements of matrix $H$, $q_j$ is a $k_j$-column vector of single category optimal quantifications for the $j$-th variable and $\beta_j$ is a $p$-column vector of weights, so that $Y_j = q_j \beta_j^{\top}$.
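To make the loss function concrete, the sketch below evaluates a least-squares loss of the form "sum over variables j of SSQ(X - G_j q_j beta_j')". Here X holds the object scores, G_j is the indicator matrix of variable j, q_j its category quantifications and beta_j its weights. The notation and the omission of any normalizing constant are assumptions of this sketch. It only evaluates the loss for given inputs. The minimization itself (typically an alternating least squares procedure under normalization constraints) is not shown. All names and numbers are hypothetical.

```python
def indicator_matrix(categories, n_categories):
    """N x k indicator matrix: entry (i, t) is 1 if object i holds category t.

    Categories are coded 0..k-1 in this sketch.
    """
    return [[1.0 if c == t else 0.0 for t in range(n_categories)]
            for c in categories]


def ssq(matrix):
    """Sum of squares of all elements of a matrix given as a list of rows."""
    return sum(v * v for row in matrix for v in row)


def npca_loss(X, data, q, beta):
    """Sum over variables j of SSQ(X - G_j q_j beta_j^T).

    X    : N x p object scores
    data : J lists, each with the category held by every object
    q    : J lists of category quantifications (length k_j)
    beta : J lists of weights (length p)
    """
    loss = 0.0
    for j, categories in enumerate(data):
        G = indicator_matrix(categories, len(q[j]))
        # G_j q_j: the quantified variable, one number per object.
        quantified = [sum(g * qv for g, qv in zip(row, q[j])) for row in G]
        residual = [[X[i][s] - quantified[i] * beta[j][s]
                     for s in range(len(beta[j]))]
                    for i in range(len(X))]
        loss += ssq(residual)
    return loss


# One variable with two categories observed on three objects, p = 1.
data = [[0, 1, 1]]
q = [[-1.0, 0.5]]
beta = [[1.0]]
perfect_fit = npca_loss([[-1.0], [0.5], [0.5]], data, q, beta)
poor_fit = npca_loss([[0.0], [0.0], [0.0]], data, q, beta)
print(perfect_fit, poor_fit)  # 0.0 1.5
```

The loss is zero when the object scores coincide with the weighted quantified variable and grows as they move apart, which is the sense in which NPCA looks for scores and quantifications that agree as closely as possible.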
23 ML: how it works in general (1)
Consider the simple regression model
\[ y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij} \]
fitted in $J$ different groups, $j = 1, \dots, J$ (schools, regions, countries, etc.), with individual variable $x$.
At the second level a group variable $w_j$ expressing changes from group to group can be important to explain second-level variability, and therefore:
\[ \beta_{0j} = \gamma_{00} + \gamma_{01} w_j + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + \gamma_{11} w_j + u_{1j}. \]
Inserting these two equations in the regression model we get the full two-level multilevel linear model:
\[ y_{ij} = \left(\gamma_{00} + \gamma_{01} w_j + \gamma_{10} x_{ij} + \gamma_{11} w_j x_{ij}\right) + \left[u_{0j} + u_{1j} x_{ij} + \varepsilon_{ij}\right]. \]
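The substitution step can be checked numerically. The sketch below simulates a two-level data set with a random intercept and a random slope and computes each response twice: once from the group-specific intercept and slope, and once as a fixed part plus a random part. The parameter names (`gamma`, `sd_u0`, `sd_u1`, `sd_e`) and their values are hypothetical, and normal distributions are assumed for the group variable, the random effects and the errors.

```python
import random


def simulate_two_level(n_groups, n_per_group, gamma, sd_u0, sd_u1, sd_e, seed=0):
    """Simulate a two-level model with one individual and one group predictor.

    gamma = (g00, g01, g10, g11) are the fixed effects. Each returned row is
    (group, x, w, y_two_step, y_combined).
    """
    rng = random.Random(seed)
    g00, g01, g10, g11 = gamma
    rows = []
    for j in range(n_groups):
        w = rng.gauss(0.0, 1.0)      # group-level (contextual) variable
        u0 = rng.gauss(0.0, sd_u0)   # random intercept deviation
        u1 = rng.gauss(0.0, sd_u1)   # random slope deviation
        # Second-level equations for the group-specific coefficients.
        b0 = g00 + g01 * w + u0
        b1 = g10 + g11 * w + u1
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)  # individual-level variable
            e = rng.gauss(0.0, sd_e)  # first-level error
            y_two_step = b0 + b1 * x + e
            fixed_part = g00 + g01 * w + g10 * x + g11 * w * x
            random_part = u0 + u1 * x + e
            rows.append((j, x, w, y_two_step, fixed_part + random_part))
    return rows


rows = simulate_two_level(n_groups=5, n_per_group=4,
                          gamma=(1.0, 0.5, 2.0, -0.3),
                          sd_u0=0.7, sd_u1=0.2, sd_e=1.0)
max_gap = max(abs(r[3] - r[4]) for r in rows)
print(len(rows), max_gap < 1e-9)
```

Both ways of computing the response agree up to floating-point rounding, which is what the combined equation states. Fitting such a model to data would require a mixed-model routine and is outside this sketch.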
24 A motivational example: results (1)
ATTACHMENT
Model 0: only random intercept
  Table columns: Coefficient, SE, z, p-value, CI
  Rows: Intercept
  Random effects (RE): first-level variance (variability between citizens); second-level variance (variability between countries)
  Deviance
Model 1: only individual explanatory variables
  Variables in the model: Age education; Age: years; Age: 55 years or more; Job: medium status; Job: high status; Community: small or medium town; Community: big town
  Table columns: Coefficient, SE, z, p-value, CI
  Rows: Intercept; individual variables (Age education; Age: years; Age: 55 years or more; Job: medium status; Job: high status; Community: small-medium town; Community: big town)
  Random effects (RE): first-level variance (variability between citizens)
[The numeric entries of these tables are missing from the transcription.]
25 A motivational example: results (2)
Model 2 (full model): individual and contextual explanatory variables
  Variables in the model: Age education; Age: years; Age: 55 years or more; Job: medium status; Job: high status; Community: small or medium town; Community: big town; Public deficit (2013); Household deprivation (2013)
  Table columns: Coefficient, SE, z, p-value, CI
  Rows: Intercept; individual variables (Age education; Age: years; Age: 55 years or more; Job: medium status; Job: high status; Community: small-medium town; Community: big town); contextual variables (Public deficit (2013); Household deprivation (2013))
  Random effects (RE): first-level variance (variability between citizens); second-level variance (variability between countries)
  Deviance
[The numeric entries of this table are missing from the transcription.]
26 A motivational example: results (3)
Two-step analysis on EU attachment
First case: residuals of the null model (no explanatory variables)
Second case: residuals of the individual model (only individual explanatory variables)
Third case: residuals of the full model (individual and contextual explanatory variables)
27 Some focus points for discussion (1)
More integration between disciplines (Statistics and Sociology, for example) is needed. For example, the European Social Survey is pushing towards more integrated work for the improvement of survey results.
Statistics is useful for interpreting survey respondents' answers, for example to unveil citizens' attitudes towards the EU.
Some classic and consolidated questions in EU questionnaires about citizens' attachment to the EU may prove obsolete: statistical techniques help in detecting flaws for future surveys.
In our work, the dimensions traditionally used by EU policy makers to analyze the level of Europeanization (evaluation, information and attachment) have shown many problems.
28 Some focus points for discussion (2)
When doing SDA, researchers are focused only on their particular problems.
Comparability, harmonization and quality: in practice these problems are not sufficiently highlighted, or are addressed only superficially ("This new wave does not contain this question contained in the previous wave").
Statistical analysis helps in formulating new proposals for improving a survey in the course of its implementation.
Statistical techniques should not be used for the benefit of statistics only, but should be contextualized to give an answer to epistemological problems.
29 Some suggestions
Questions in questionnaires should be as objective as possible.
When planning new surveys, use the results that emerged from statistical analysis in other studies/surveys (meta-analytic approach).
Construct new surveys starting from the fallacies that emerged from statistical analysis (Lohr's example).
Analyses should be contextualized with reference to different areas.
A meta-data codebook with rules that also come from previous statistical analysis should accompany traditional meta-data (example from the ML results: are Greeks really angry with the EU?).
30 References
Boudon, R. (1973) Education, Opportunity, and Social Inequality. New York: Wiley.
Eurostat (2000) Assessment of the Quality in Statistics. Eurostat/A4/Quality/00/General Standard Report, April 4-5, Luxembourg.
Ferrari, P. A., Annoni, P., Manzi, G. (2010) Evaluation and Comparison of European Countries: Public Opinion on Services, Qual Quant, 44.
Ferrari, P. A., Salini, S. (2011) Complementary Use of Rasch Models and Nonlinear Principal Components Analysis in the Assessment of the Opinion of Europeans about Utilities, J Classif, 28.
Fisher, R. A. (1938) Indian statistical congress. CA: Sankhya.
Kenett, R. S., Shmueli, G. (2014) On information quality, J Roy Stat Soc A Sta, 177.
Lohr, S. L., Brick, J. M. (2012) Blending domain estimates from two victimization surveys with possible bias, Can J Stat, 40(4).
Manzi, G., Spiegelhalter, D. J., Turner, R. M., Flowers, J., Thompson, S. G. (2011) Modelling bias in combining small area prevalence estimates from multiple surveys, J Roy Stat Soc A Sta, 174.
More informationMATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix.
MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix. Nullspace Let A = (a ij ) be an m n matrix. Definition. The nullspace of the matrix A, denoted N(A), is the set of all n-dimensional column
More informationPrinciple Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
More informationProblem of Missing Data
VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;
More informationANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL
Kardi Teknomo ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL Revoledu.com Table of Contents Analytic Hierarchy Process (AHP) Tutorial... 1 Multi Criteria Decision Making... 1 Cross Tabulation... 2 Evaluation
More information10. Analysis of Longitudinal Studies Repeat-measures analysis
Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.
More informationAppendix B Data Quality Dimensions
Appendix B Data Quality Dimensions Purpose Dimensions of data quality are fundamental to understanding how to improve data. This appendix summarizes, in chronological order of publication, three foundational