Imputation Strategies and their Evaluation


1 Imputation Strategies and their Evaluation
Seppo Laaksonen, Statistics Finland and University of Tampere
Presentation for the Intermediate Chintex Workshop, 30 November 2001, Statistics Finland, Helsinki
- General Points on Surveys
- What is Imputation?
- Why Imputation?
- Use of Imputed Data
- Auxiliary Data Service
- Imputation Process and Imputation Methods (Pre-Imputation vs. Final Imputation)
- Imputation Software
- Future and Conclusions
All comments, criticism and proposals are welcome. Seppo.Laaksonen@Stat.Fi

3 Tasks for Providing Survey Data for Users
- Users' Needs
- Survey Design
- Sampling Design
- Data Collection
- Editing and Imputation
- Initial Weighting (Design Weights, Basic Weights)
- Re-Weighting: Post-stratification, Response Propensity Modelling, G-Weighting (Ratio, Regression), Outlier Weighting, Calibration (aggregate level)
- Output Data: Aggregated Macro Data, and Micro Data for Special Users (data are flagged if imputed)
- Dissemination

4 What is imputation?
Replacing a missing, incomplete or implausible (outlier etc.) value with a more or less artificial value.
- If there is only one replacement: Single Imputation
- If there are several replacements (either more units for one data set or several completed data sets): Multiple Imputation

5 Scheme of a Typical Statistical Micro File (Seppo Laaksonen, 2001)
[Figure: layout of a statistical micro file. Rows are the statistical units (1, ..., i, ..., r, ..., n). Row groups labelled in the figure: Frame Overcoverage, Sample Overcoverage, Item Nonresponse, Unit Nonresponse (with an optional sub-sample of unit nonrespondents given a short questionnaire), Excluded from the Sample Survey, Undercoverage. Column groups: Identifiers (cross-sectional, longitudinal, protected); X-Variables (for sample selection, other auxiliary variables); Y-Variables (outcome variables) of several types (based on various scalings; flag variables such as initial, imputed, ...; variously confidential); Sampling and Other Weights (labels in the figure: Basic, GRP Adj., Calibrated, Comparison Purposes).]
Symbols: r = number of respondents; N(D) and n(d) = numbers of overcoverage units in the frame and in the sample (the latter may need to be estimated); n = initial sample size excluding overcoverage; N's = population sizes (true = target population; real = frame population).

6 Why use Imputation?
Unfortunately, I have no definite answer. But in many situations this operation should be considered carefully, such as:
- when the item missingness rate is high or significant (key variables)
- when the number of units available for multivariate analysis would be reduced considerably without imputation; see e.g. the page showing a case that is not dramatic at all (but the analysis is in danger if the imputation is not done well)
- partially known values, e.g.
  - if an interval (or rounded value) is known within which the correct value lies
  - if the value should have been chosen from certain categories (some are excluded, as in the show Who Wants to Be a Millionaire?)

7
- helping editing procedures (pre-imputation)
- harmonisation purposes (Y* = f(x, Y))
- confidentiality purposes
- linking or matching cross-sectional or longitudinal files together: new holes may appear, and these may be best filled by imputing

8 Number of missing values for some variables in the Finnish ECHP 1996
[Table: the cell values did not survive the transcription; the last row is labelled "Any of those".]

9 Use of Imputed Data
Imputation may be done at macro level, but here we discuss micro-level imputation. It is important to note that partially imputed micro data may be used at several levels:
1. The requirements are naturally most demanding when the further use is at micro level. This requires that the real (normally unknown) values are reasonably well preserved at this level, or at least that the interrelationships between the variables used in multivariate analysis are reasonably well preserved.
2. Somewhat less demanding is the use of imputed data at distributional level. E.g. my old exercises (Laaksonen 1991) for Finnish income distributions gave very promising results, although a good reweighting procedure may work quite well too (note: in the margins of the distribution, imputation is very often superior to reweighting).
3. Imputed data for tabular use (incl. constructing good time series) is even less demanding. When imputation has been done at micro level, it is very convenient to use such data for any tabulations. Hence it has even been proposed that all missing data could be imputed (incl. unit missingness).

10 But not just any imputation method can be recommended. Imputation should be done optimally for each level of use. The user should know for which purpose the imputed data are best suited and where problems may arise.
We now look at the methods usable for good imputation.
The first basic point is how good the available auxiliary data are. If such data are poor, micro-level use may be best forgotten.

11 A Typology of Auxiliary Variables in Surveys, examples from business surveys
Each entry gives the type of auxiliary data, examples (with period) and the use.

1. Sampling design variables from population level
   Examples: Sizeband (t-1), Industry class (t-1), Region (t-1).
   Use: Designing; design weighting for sampled units.
2. Non-updated sampling design variables from population level
   Examples: The same as in type 1; new strata may be formed (post-strata); in ABI: AWEIGHTBAND.
   Use: Initial or post-stratified weights for respondents, excl. overcoverage, based on sample information.
3. Updated sampling design variables from population level
   Examples: The same as the previous, but from period t.
   Use: Better weights than in the previous; sample and population overcoverage, undercoverage, deaths, births, mergers, splits, re-constructions.
4. Other population level data from registers or recent surveys (estimated)
   Examples: Aggregated register turnover, employment (t-1, t); aggregated turnover from RSI (around t).
   Use: Macro editing; macro imputation; G-Weights (for each GWEIGHTBAND) based on ratio estimation or advanced (modelling) methods (Calibration).

12
5. Micro data at sample level (respondents, overcoverage, nonrespondents) from registers, independent surveys and other external sources
   Examples: Categorical: sizeband and industry (t, t-1). Continuous: register turnover (t, t-1), register employment (t, t-1), RSI turnover (around t). These are available early (at design time), but some others perhaps later (at estimation time).
   Use: Micro editing: error localization, selective editing. Imputation: modelling, a task for crucial variables with missingness. Re-weighting: GREG, response propensity modelling.
6. Micro data at respondents level from internal sources (same survey)
   Examples: In addition to group 5: any survey variables from t, e.g. survey turnover, survey employment, survey value added, total output, imputed y value.
   Use: Editing, incl. selective editing using a best guess (preliminary imputed value, previous value). Imputation: modelling using auxiliary variables either independently for each imputation task or sequentially (first imputing the missing values of one variable, then the next).
7. Micro data as a subsample of non-respondents or respondents
   Examples: In addition to standard variables: key variables of the survey concerned.
   Use: Quality checking; re-weighting; imputation.

13
8. Micro data from the previous waves of the same repeated survey (panel)
   Examples: Any categorical and continuous variables for the same unit from t-1, t-2, ... (if the unit has changed, this should be taken into account). Note: also changes in weights.
   Use: Micro editing; imputation; re-weighting if needed for longitudinal analysis (longitudinal weighting).
9. Super-auxiliary variables for specific small groups at micro level, if possible
   Examples: Big and other unique businesses are often so special that reasonable observations for modelling, or donors, cannot be found in the same survey. Hence multi-national data or other super data should be used.
   Use: Micro editing: plausibility checking; imputation; outlier weights.
10. Hypotheses on the behaviour of variables, based on previous experience from the same survey, international harmonisation purposes, etc.
   Examples: Distributions (normal, lognormal, binomial, Poisson), link functions, conditions (MCAR, MAR, NMAR), sensitivity, bounds, relevant time series.
   Use: Models for editing, imputation, weighting, outlier detection.

14 NEED for AUXILIARY DATA SERVICE
Although this need is recognised, THIS ACTION IS NOT USUALLY FOCUSED ON in NSIs:
- auxiliary data are used too much ad hoc, or following the traditions of the particular statistics
- easily available data are mostly exploited, but there are problems:
  - in using updated data (for period t, or close to it)
  - data from other surveys or registers are not used reasonably
  - data from previous periods of the same survey could be used better
  - changes in businesses or households could be taken into account better than is done
FECHP: register data have been exploited, but I am not sure whether in the best way.

15 Key Variables should be recognised and used extensively, both for pre-imputation and for final imputation

16 Imputation Process
Step 1. The data editing process precedes the imputation process, but it should be well integrated with the actual imputation process. In any case, the pre-editing process has identified the values that need to be imputed. It is possible that a new round of editing, post-editing, is needed later at the estimation stage. Note that pre-imputation, as described earlier, is an essential part of editing, especially if selective or significance editing is to be used.
Step 2. All auxiliary information potentially helpful for imputation must be collected and validated for each imputation task. This work continues during the following tasks if reasonable results have not been achieved with the available variables in their initial forms.

17 Step 3. The imputation model is extremely important in the whole process. Examples:
- good guess
- known function (logical imputation)
- linear regression model with constant term
- linear regression model with noise term
- linear regression model with constant and noise term
- linear regression model with slope (and noise)
- linear regression model with constant and slopes (and noise)
- logistic regression with different alternatives as above (categorical variables)
- multi-level modelling
- generalised linear models
- non-parametric regression models (including estimation of median and other quantiles)
- regression tree, classification tree (WAID software is available but not good for standard business surveys, because it requires categorical auxiliary variables)
- multi-dimensional non-parametric surface
- neural nets: self-organising maps (SOM), MLP, AURA (the Euredit project is working with these; results expected in 1-2 years)
- rules from editing

18 [Figure: WAID OLS Tree for DRINKS with 4 Explanatory Variables. The root node is labelled ALL 1498; the splits are on GENDER, ADULTS, SAUNA1 and MOBILE. The right tree (gender=2) is truncated.]

19 Step 4. Imputation itself
Two basic alternatives:
1. In model-donor imputation, the imputed values are derived directly from a (behavioural) model.
2. In real-donor imputation, the imputed values are derived directly from a set of observed values, from a real donor respondent, but are still derived indirectly from a more or less exactly defined model.

20 Group 1 (model-donor): the imputed value is a predicted value of the model, with a noise term added if necessary.
Group 2 (real-donor): how to choose a donor is the big issue:
- Generalising: the value always comes from the neighbourhood, even from the nearest unit, based on rules derived from the model
- Many names are used, such as random hot decking (random draw with or without replacement), sequential hot decking, nearest neighbour, near neighbour
In practice, the best solution may be to use both techniques, one for one part of the data and the other for the rest.
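
The two groups can be contrasted with a minimal sketch. It is not taken from the presentation: the one-regressor model, the function names and the toy numbers are all assumptions chosen for illustration. The model-donor function imputes the fitted prediction (optionally plus normal noise), while the real-donor function copies the observed value of the respondent whose prediction is nearest.

```python
import random

def fit_simple_ols(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def model_donor_impute(x_resp, y_resp, x_missing, noise_sd=0.0, rng=None):
    """Group 1: impute the model prediction, adding noise if noise_sd > 0."""
    rng = rng or random.Random(0)
    a, b = fit_simple_ols(x_resp, y_resp)
    return [a + b * x + (rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0)
            for x in x_missing]

def real_donor_impute(x_resp, y_resp, x_missing):
    """Group 2: copy the observed y of the respondent with the nearest prediction."""
    a, b = fit_simple_ols(x_resp, y_resp)
    pred_resp = [a + b * x for x in x_resp]
    imputed = []
    for x in x_missing:
        p = a + b * x
        donor = min(range(len(pred_resp)), key=lambda i: abs(pred_resp[i] - p))
        imputed.append(y_resp[donor])
    return imputed

# Hypothetical toy data: x could be register income, y survey income.
x_resp = [1.0, 2.0, 3.0, 4.0]
y_resp = [2.1, 3.9, 6.2, 7.8]
x_missing = [2.4, 10.0]

model_values = model_donor_impute(x_resp, y_resp, x_missing)
real_values = real_donor_impute(x_resp, y_resp, x_missing)
```

For the unit with x = 10.0, far outside the respondents' range, the model-donor value extrapolates along the fitted line, whereas the real-donor value can only be one of the observed values.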

21 Other classifications:
A. - Deterministic (model without random noise)
   - Stochastic
   These may be included within the previous classification.
B. - Single (1 imputed value)
   - Multiple (3-8 imputed values). This requires some type of stochastic procedure.

22 General example: Simple linear regression model
$y(t) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \gamma\, y(t-1) + \varepsilon$
(demographic changes, e.g., may be added as dummy variables)
($\varepsilon$ = random noise term, $y$ = survey income, $x_1$ = domain (e.g. social group), $x_2$ = register income, $t$ = survey period)
The estimates of the parameters are denoted $a$, $b_1$, $b_2$, $c$ and $e$.
If the model is reduced so that the estimated equation is $y(t) = a$, the method is called mean imputation or, if the variable is of ratio type, ratio imputation (the median is also possible). If the model is reduced to $y(t) = \varepsilon$, imputation may be done using observed residuals, or theoretical residuals assumed to follow a certain distribution such as the normal distribution; the imputation may thus be done with either a real-donor or a model-donor technique.
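
As a hedged illustration of the reduced model, the sketch below implements mean imputation and one common form of ratio imputation, in which a missing y is replaced by R times x with R the ratio of the respondents' totals. The slide does not spell out the ratio estimator, so this form, the function names and the toy data are assumptions. Both functions also return flags marking the imputed values.

```python
def mean_impute(y):
    """Replace each missing y (None) by the respondent mean a."""
    observed = [v for v in y if v is not None]
    a = sum(observed) / len(observed)
    values = [a if v is None else v for v in y]
    flags = [v is None for v in y]          # True where the value is imputed
    return values, flags

def ratio_impute(y, x):
    """Replace each missing y by R * x, with R = sum(y) / sum(x) over respondents."""
    pairs = [(yi, xi) for yi, xi in zip(y, x) if yi is not None]
    ratio = sum(yi for yi, _ in pairs) / sum(xi for _, xi in pairs)
    values = [ratio * xi if yi is None else yi for yi, xi in zip(y, x)]
    flags = [yi is None for yi in y]
    return values, flags

# Hypothetical toy data: survey income with two missing items, register income complete.
survey_income = [10.0, None, 30.0, None]
register_income = [5.0, 8.0, 15.0, 20.0]

mean_values, mean_flags = mean_impute(survey_income)
ratio_values, ratio_flags = ratio_impute(survey_income, register_income)
```

Mean imputation gives both nonrespondents the same value, while ratio imputation lets the auxiliary variable separate them.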

23 If only these theoretical values are used, the technique is model-donor, whereas with observed residuals it is real-donor. In the latter case, the observed values may be used only once (without replacement) or several times (with replacement). The latter is usually called random hot decking.
If the term $\beta_1 x_1$ is added to the model, methods such as
- cell/domain mean imputation, or
- cell hot decking
may be applied.
The predicted values of the estimated regression model may be used
(i) directly as imputed values, or
(ii) with (theoretical or observed) residuals added, or
(iii) the values in (i) and (ii) may be used to construct a near(est) neighbour technique, which is then used to find a donor for each missing value (regression-based nearest neighbour hot decking).
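
The hot-deck variants named above can be sketched as follows. This is an illustrative sketch only: the function names, the seed and the toy data are assumptions, and the donor pools are simply the observed respondent values.

```python
import random

def random_hot_deck(donor_values, n_missing, rng, replace=True):
    """Draw imputed values at random from the observed (donor) values."""
    if replace:
        # A donor value may be used several times.
        return [rng.choice(donor_values) for _ in range(n_missing)]
    # Without replacement each donor value is used at most once.
    return rng.sample(donor_values, n_missing)

def cell_hot_deck(cells_resp, y_resp, cells_missing, rng):
    """Random hot deck (with replacement) restricted to donors in the same cell."""
    donors = {}
    for cell, y in zip(cells_resp, y_resp):
        donors.setdefault(cell, []).append(y)
    # A cell without any respondent raises KeyError: there is no donor to copy.
    return [rng.choice(donors[cell]) for cell in cells_missing]

rng = random.Random(2001)                 # fixed seed, for reproducibility only
y_resp = [12.0, 15.0, 30.0, 41.0]         # hypothetical observed values
cells_resp = ["A", "A", "B", "B"]         # hypothetical domain of each respondent

with_repl = random_hot_deck(y_resp, 3, rng, replace=True)
without_repl = random_hot_deck(y_resp, 3, rng, replace=False)
by_cell = cell_hot_deck(cells_resp, y_resp, ["B", "A"], rng)
```

Every imputed value is an observed value; the cell version additionally guarantees that the donor shares the nonrespondent's domain.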

24 Some other features of Careful Imputation in the case of a regression model
- Special values (e.g. extreme cases) are useful to impute, but the imputed values need not necessarily be used in the final data set (except for non-key variables).
- Final imputation is often usefully done within homogeneous imputation cells.
- Sampling weights should be taken into account in final imputation.
- Sequential imputation is becoming more common: for example, the key variables are imputed first, and their imputed values are then used as explanatory variables when imputing non-key variables.
- Make the results consistent with each other and with the edit rules.
- Check the completed results against available benchmarking data (aggregate level).
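
Two of the points above, imputation cells and sampling weights, can be combined in a small sketch: a weighted cell mean imputation that also flags the imputed values. The record layout, the key names and the numbers are hypothetical, and a weighted mean is only one of several ways in which weights might be taken into account.

```python
def weighted_cell_mean_impute(records, cell_key, y_key, w_key):
    """Fill missing y with the weighted respondent mean of the same imputation cell."""
    sums = {}                                   # cell -> [sum of w*y, sum of w]
    for r in records:
        if r[y_key] is not None:
            s = sums.setdefault(r[cell_key], [0.0, 0.0])
            s[0] += r[w_key] * r[y_key]
            s[1] += r[w_key]
    completed = []
    for r in records:
        new = dict(r)                           # leave the input records untouched
        new["imputed"] = r[y_key] is None       # flag for users of the micro data
        if new["imputed"]:
            wy, w = sums[r[cell_key]]           # KeyError if the cell has no respondents
            new[y_key] = wy / w
        completed.append(new)
    return completed

# Hypothetical sample: two imputation cells, y missing for two units.
sample = [
    {"cell": "A", "y": 10.0, "w": 1.0},
    {"cell": "A", "y": 20.0, "w": 3.0},
    {"cell": "A", "y": None, "w": 2.0},
    {"cell": "B", "y": 100.0, "w": 2.0},
    {"cell": "B", "y": None, "w": 5.0},
]
completed = weighted_cell_mean_impute(sample, "cell", "y", "w")
```

In cell A the imputed value is (1*10 + 3*20) / 4 = 17.5 rather than the unweighted mean of 15.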

25 What Method is best, and How do we know it? Excellent question.
Experience helps a lot:
- A good imputation model, or its predictability over the whole distribution, helps whether the final imputation is based on real or model donors.
- If the model is not good, in my experience real-donor methods are preferable, but these can only produce observed values (like weighting methods); hence they do not work well in areas where there are no donors at all, or too few. This leads to model-donor techniques if values cannot be found.
- Multiple imputation has some advantages.

26 Step 5. When imputations have been made, the point estimates and the ordinary sampling variance estimates may be computed. Moreover, it is necessary to go on to estimate the additional variance due to imputation, called imputation variance.
- Analytical formulas may be developed for certain standard situations.
- Replication methods are often useful.
- Multiple imputation is another method for this purpose. It requires a proper technique.
Step 6. There are several outputs from imputation. Standard estimation results are enough for most users, but many wish to analyse the micro data file with imputed values further. Hence it is essential to tell these users exactly which values are imputed and which are not.
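
For the multiple imputation route, the usual way to obtain a variance that includes the imputation component is Rubin's combining rules (Rubin 1987). The slides do not give the formula, so the sketch below is an addition under that assumption; the function name and the numbers are hypothetical. With m completed data sets, the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
from statistics import mean, variance

def rubin_combine(estimates, within_variances):
    """Combine m point estimates and their sampling variances by Rubin's rules."""
    m = len(estimates)                 # number of completed data sets, m >= 2
    q_bar = mean(estimates)            # combined point estimate
    w = mean(within_variances)         # average ordinary sampling variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b        # total variance including imputation variance
    return q_bar, total, w, b

# Hypothetical results from m = 3 completed data sets.
estimates = [10.0, 12.0, 11.0]
within_variances = [4.0, 4.0, 4.0]
q_bar, total, w, b = rubin_combine(estimates, within_variances)
```

Here the ordinary sampling variance alone (4.0) would understate the total (about 5.33), which is the point of Step 5.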

27 Software for imputation
* No extremely good software exists
* SOLAS for missing values (Statistical Solutions):
  - Missing data patterns are easy to inspect
  - Some rather simple single imputation methods, plus two techniques for multiple imputation
* WAID (AutImp/CBS)
  - Tree-based methods; continuous explanatory variables are not available in the imputation model
* SAS: simple imputation methods are easy to implement in SAS, and more demanding ones can be implemented by a sophisticated user.
* Multiple imputation software is available, but it may not be very good for NSI data (biometricians have used it, I suppose)

28 In the future: Some types of EUREDIT developments too
EUREDIT: EU/FP5 Project for
Website:
Workplan: EUREDIT is investigating neural networks, support vector machines, model-based and donor-based editing and imputation methods. Of these, the first two represent technologies that have only recently become readily available, and offer promise for providing accurate imputation methods in situations where traditional methods run into difficulties. Model-based methods, particularly those based on multivariate models for the data generation process, will be investigated and further developed. Real-donor-based methods are already in place in a number of NSIs (National Statistical Institutes), and the project includes this methodology in order to provide a baseline for comparing the performance of the other, more novel methods.
DataClean 2002 Conference in Jyväskylä, Finland
Website:

29 A Technique: Self-Organising Maps (SOM) for editing and imputation
The Finnish WP: University of Jyväskylä and Statistics Finland
Tree-Structured Self-Organizing Maps, TS-SOM

30 CONCLUSIONS
* Some imputation methods are to be used in close connection with editing.
* Imputation is already used in all surveys, perhaps not much, although this is not always recognised; unfortunately the methods used are sometimes subjective (good guesses) and hence not documented. Thus there is a need for objective methods, although these are not used much.
* Imputation could be used much more than it is today, because missingness is growing in surveys and censuses.
* Good new techniques are under development, also in EU projects including EUREDIT, which is investigating neural nets, outlier detection and classical techniques. It does not seem to be a very simple job.

31
* A user should recognise that imputed data are not real data and lead to additional variance (a reduction in accuracy), which should be estimated. On the other hand, without good imputation a user will lose much more information.
* Imputation is more necessary in economic surveys (e.g. for business and income variables with nonignorable missingness), but it is also needed when dealing with social phenomena.
* Good auxiliary data are needed for successful imputation.

32 Some References
Heeringa, S.G., Little, R.J. and Raghunathan, T.E. (1997). Bayesian Estimation and Inference for Multivariate Coarsened Data on U.S. Household Income and Wealth. Invited Paper for the 51st Session of the ISI, Istanbul.
Kalton, G. and Kasprzyk, D. (1986). The Treatment of Missing Survey Data. Survey Methodology 12.
Laaksonen, S. (1991). Adjustments for Non-response in Two-year Panel Data. The Statistician. Great Britain 40.
Laaksonen, S. (1999). Weighting and Auxiliary Variables in Sample Surveys. In: G. Brossier and A-M. Dussaix (eds). "Enquêtes et Sondages. Méthodes, modèles, applications, nouvelles approches". Dunod. Paris.
Laaksonen, S. (2000). Regression-Based Nearest Neighbour Hot Decking. Computational Statistics 15, 1.
Lawrence, D. and McKenzie, R. (2000). The General Application of Significance Editing. Journal of Official Statistics 16, 3.
Little, R. (1988). Missing-Data Adjustments in Large Surveys. Journal of Business & Economic Statistics 6.
Little, R. and Rubin, D. (1987). Statistical Analysis with Missing Data. John Wiley & Sons.
Rao, J.N.K. and Shao, J. (1992). Jack-knife Variance Estimation With Survey Data Under Hot Deck Imputation. Biometrika 79.
Rubin, D. (1987). Multiple Imputation in Surveys. John Wiley & Sons.
Rubin, D., and the papers and the discussion by B. Fay, J. Rao, D. Binder, J. Eltinge and D. Judkins (1996). Multiple Imputation After 18+ Years. Journal of the American Statistical Association 91.
Särndal, C-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer.
Särndal, C-E. (1996). For a Better Understanding Imputation. In: S. Laaksonen (ed.). International Perspectives on Non-response. Statistics Finland Research Reports 219.
Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall.
Schulte Nordholt, E. (1998). Imputation: Methods, Simulation, Experiments and Practical Examples. International Statistical Review 66.
Solas (1999). Solas for Missing Data Analysis 2.0. Statistical Solutions, Ltd. Cork, Ireland.
West, S.A., Kratzke, D-T. and Robertson, K.W. (1996). Alternative Imputation Procedures for Item-Non-response from New Establishments in the Universe. ASA Proceedings of the Section in Survey Research Methods.


More information

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

More information

Descriptive Methods Ch. 6 and 7

Descriptive Methods Ch. 6 and 7 Descriptive Methods Ch. 6 and 7 Purpose of Descriptive Research Purely descriptive research describes the characteristics or behaviors of a given population in a systematic and accurate fashion. Correlational

More information

OECD SHORT-TERM ECONOMIC STATISTICS EXPERT GROUP (STESEG)

OECD SHORT-TERM ECONOMIC STATISTICS EXPERT GROUP (STESEG) OECD SHORT-TERM ECONOMIC STATISTICS EXPERT GROUP (STESEG) 10-11 September 2009 OECD Conference Centre, Paris Session II: Short-Term Economic Statistics and the Current Crisis A national statistics office

More information

THE JOINT HARMONISED EU PROGRAMME OF BUSINESS AND CONSUMER SURVEYS

THE JOINT HARMONISED EU PROGRAMME OF BUSINESS AND CONSUMER SURVEYS THE JOINT HARMONISED EU PROGRAMME OF BUSINESS AND CONSUMER SURVEYS List of best practice for the conduct of business and consumer surveys 21 March 2014 Economic and Financial Affairs This document is written

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

PATTERN MIXTURE MODELS FOR MISSING DATA. Mike Kenward. London School of Hygiene and Tropical Medicine. Talk at the University of Turku,

PATTERN MIXTURE MODELS FOR MISSING DATA. Mike Kenward. London School of Hygiene and Tropical Medicine. Talk at the University of Turku, PATTERN MIXTURE MODELS FOR MISSING DATA Mike Kenward London School of Hygiene and Tropical Medicine Talk at the University of Turku, April 10th 2012 1 / 90 CONTENTS 1 Examples 2 Modelling Incomplete Data

More information

Visualization of Complex Survey Data: Regression Diagnostics

Visualization of Complex Survey Data: Regression Diagnostics Visualization of Complex Survey Data: Regression Diagnostics Susan Hinkins 1, Edward Mulrow, Fritz Scheuren 3 1 NORC at the University of Chicago, 11 South 5th Ave, Bozeman MT 59715 NORC at the University

More information

Statistical matching: a model based approach for data integration

Statistical matching: a model based approach for data integration ISSN 1977-0375 Methodologies and Working papers Statistical matching: a model based approach for data integration 2013 edition Methodologies and Working papers Statistical matching: a model based approach

More information

Discussion of Presentations on Commercial Big Data and Official Economic Statistics

Discussion of Presentations on Commercial Big Data and Official Economic Statistics Discussion of Presentations on Commercial Big Data and Official Economic Statistics John L. Eltinge U.S. Bureau of Labor Statistics Presentation to the Federal Economic Statistics Advisory Committee June

More information

Application in Predictive Analytics. FirstName LastName. Northwestern University

Application in Predictive Analytics. FirstName LastName. Northwestern University Application in Predictive Analytics FirstName LastName Northwestern University Prepared for: Dr. Nethra Sambamoorthi, Ph.D. Author Note: Final Assignment PRED 402 Sec 55 Page 1 of 18 Contents Introduction...

More information

National Endowment for the Arts. A Technical Research Manual

National Endowment for the Arts. A Technical Research Manual 2012 SPPA PUBLIC-USE DATA FILE USER S GUIDE A Technical Research Manual Prepared by Timothy Triplett Statistical Methods Group Urban Institute September 2013 Table of Contents Introduction... 3 Section

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

More information

Final Report for 2006 AICPA Summer Internship: AICPA Practice Analysis Methodology for Sampling Design and Selected Topics

Final Report for 2006 AICPA Summer Internship: AICPA Practice Analysis Methodology for Sampling Design and Selected Topics Final Report for 2006 AICPA Summer Internship: AICPA Practice Analysis Methodology for Sampling Design and Selected Topics Technical Report September 2007 Number W0704 Elaine M. Rodeck University of Nebraska-Lincoln

More information

Data quality and metadata

Data quality and metadata Chapter IX. Data quality and metadata This draft is based on the text adopted by the UN Statistical Commission for purposes of international recommendations for industrial and distributive trade statistics.

More information

Introduction to Fixed Effects Methods

Introduction to Fixed Effects Methods Introduction to Fixed Effects Methods 1 1.1 The Promise of Fixed Effects for Nonexperimental Research... 1 1.2 The Paired-Comparisons t-test as a Fixed Effects Method... 2 1.3 Costs and Benefits of Fixed

More information

Chapter 11 Introduction to Survey Sampling and Analysis Procedures

Chapter 11 Introduction to Survey Sampling and Analysis Procedures Chapter 11 Introduction to Survey Sampling and Analysis Procedures Chapter Table of Contents OVERVIEW...149 SurveySampling...150 SurveyDataAnalysis...151 DESIGN INFORMATION FOR SURVEY PROCEDURES...152

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

Imputing Missing Data using SAS

Imputing Missing Data using SAS ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

More information

Missing Data in Quantitative Social Research

Missing Data in Quantitative Social Research PSC Discussion Papers Series Volume 15 Issue 14 Article 1 10-1-2001 Missing Data in Quantitative Social Research S. Obeng-Manu Gyimah University of Western Ontario Follow this and additional works at:

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Missing data and net survival analysis Bernard Rachet

Missing data and net survival analysis Bernard Rachet Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,

More information

TNS EX A MINE BehaviourForecast Predictive Analytics for CRM. TNS Infratest Applied Marketing Science

TNS EX A MINE BehaviourForecast Predictive Analytics for CRM. TNS Infratest Applied Marketing Science TNS EX A MINE BehaviourForecast Predictive Analytics for CRM 1 TNS BehaviourForecast Why is BehaviourForecast relevant for you? The concept of analytical Relationship Management (acrm) becomes more and

More information

5.1 Identifying the Target Parameter

5.1 Identifying the Target Parameter University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

More information

STATISTICAL DATA EDITING Quality Measures. Section 2.2

STATISTICAL DATA EDITING Quality Measures. Section 2.2 STATISTICAL DATA EDITING Quality Measures 95 Section. IMPACT OF THE EDIT AND IMPUTATION PROCESSES ON DATA QUALITY AND EXAMPLES OF EVALUATION STUDIES Foreword Natalie Shlomo, Central Bureau of Statistics,

More information

APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

More information

Quality Assessment of Administrative Data

Quality Assessment of Administrative Data Research and Development Methodology reports from Statistics Sweden 2011:2 Quality Assessment of Administrative Data Statistiska centralbyrån Statistics Sweden Thomas Laitila Anders Wallgren Britt Wallgren

More information

Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

Business Analytics using Data Mining Project Report. Optimizing Operation Room Utilization by Predicting Surgery Duration

Business Analytics using Data Mining Project Report. Optimizing Operation Room Utilization by Predicting Surgery Duration Business Analytics using Data Mining Project Report Optimizing Operation Room Utilization by Predicting Surgery Duration Project Team 4 102034606 WU, CHOU-CHUN 103078508 CHEN, LI-CHAN 102077503 LI, DAI-SIN

More information

Introduction to time series analysis

Introduction to time series analysis Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples

More information

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

More information

Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

More information

OFFICE OF MANAGEMENT AND BUDGET STANDARDS AND GUIDELINES FOR STATISTICAL SURVEYS September 2006. Table of Contents

OFFICE OF MANAGEMENT AND BUDGET STANDARDS AND GUIDELINES FOR STATISTICAL SURVEYS September 2006. Table of Contents OFFICE OF MANAGEMENT AND BUDGET STANDARDS AND GUIDELINES FOR STATISTICAL SURVEYS September 2006 Table of Contents LIST OF STANDARDS FOR STATISTICAL SURVEYS... i INTRODUCTION... 1 SECTION 1 DEVELOPMENT

More information

Stat 9100.3: Analysis of Complex Survey Data

Stat 9100.3: Analysis of Complex Survey Data Stat 9100.3: Analysis of Complex Survey Data 1 Logistics Instructor: Stas Kolenikov, kolenikovs@missouri.edu Class period: MWF 1-1:50pm Office hours: Middlebush 307A, Mon 1-2pm, Tue 1-2 pm, Thu 9-10am.

More information

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

More information

Identifying and Reducing Nonresponse Bias throughout the Survey Process

Identifying and Reducing Nonresponse Bias throughout the Survey Process Identifying and Reducing Nonresponse Bias throughout the Survey Process Thomas Krenzke, Wendy Van de Kerckhove, and Leyla Mohadjer Westat Keywords: Weighting, Data Collection, Assessments. Introduction

More information

Master s Program in Statistics as Platform for Co-operation between University and Official statistics

Master s Program in Statistics as Platform for Co-operation between University and Official statistics Master s Program in Statistics as Platform for Co-operation between University and Official statistics Erkki Pahkinen Viitaniementie 30 A FIN-40720 Jyväskylä, Finland Pahkinen@cc.jyu.fi Risto Lehtonen

More information

CROATIAN BUREAU OF STATISTICS REPUBLIC OF CROATIA MAIN (STATISTICAL) BUSINESS PROCESSES INSTRUCTIONS FOR FILLING OUT THE TEMPLATE

CROATIAN BUREAU OF STATISTICS REPUBLIC OF CROATIA MAIN (STATISTICAL) BUSINESS PROCESSES INSTRUCTIONS FOR FILLING OUT THE TEMPLATE CROATIAN BUREAU OF STATISTICS REPUBLIC OF CROATIA MAIN (STATISTICAL) BUSINESS PROCESSES INSTRUCTIONS FOR FILLING OUT THE TEMPLATE CONTENTS INTRODUCTION... 3 1. SPECIFY NEEDS... 4 1.1 Determine needs for

More information

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.

More information

The treatment of missing values and its effect in the classifier accuracy

The treatment of missing values and its effect in the classifier accuracy The treatment of missing values and its effect in the classifier accuracy Edgar Acuña 1 and Caroline Rodriguez 2 1 Department of Mathematics, University of Puerto Rico at Mayaguez, Mayaguez, PR 00680 edgar@cs.uprm.edu

More information

Dealing with Missing Data

Dealing with Missing Data Dealing with Missing Data Roch Giorgi email: roch.giorgi@univ-amu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January

More information

Nonresponse Adjustment Using Logistic Regression: To Weight or Not To Weight?

Nonresponse Adjustment Using Logistic Regression: To Weight or Not To Weight? Nonresponse Adjustment Using Logistic Regression: To eight or Not To eight? Eric Grau, Frank Potter, Steve illiams, and Nuria Diaz-Tena Mathematica Policy Research, Inc., Princeton, New Jersey -9 TNS,

More information

A Split Questionnaire Survey Design applied to German Media and Consumer Surveys

A Split Questionnaire Survey Design applied to German Media and Consumer Surveys A Split Questionnaire Survey Design applied to German Media and Consumer Surveys Susanne Rässler, Florian Koller, Christine Mäenpää Lehrstuhl für Statistik und Ökonometrie Universität Erlangen-Nürnberg

More information

Note on growth and growth accounting

Note on growth and growth accounting CHAPTER 0 Note on growth and growth accounting 1. Growth and the growth rate In this section aspects of the mathematical concept of the rate of growth used in growth models and in the empirical analysis

More information

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

More information

Sampling solutions to the problem of undercoverage in CATI household surveys due to the use of fixed telephone list

Sampling solutions to the problem of undercoverage in CATI household surveys due to the use of fixed telephone list Sampling solutions to the problem of undercoverage in CATI household surveys due to the use of fixed telephone list Claudia De Vitiis, Paolo Righi 1 Abstract: The undercoverage of the fixed line telephone

More information

Research and teaching cooperation between academia and NSI: Finland

Research and teaching cooperation between academia and NSI: Finland Research and teaching cooperation between academia and NSI: Finland Risto Lehtonen, Seppo Laaksonen and Markku Lanne Emails: Firstname.Lastname@Helsinki.fi Abstract The paper briefly presents experiences

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information

Mode and Patient-mix Adjustment of the CAHPS Hospital Survey (HCAHPS)

Mode and Patient-mix Adjustment of the CAHPS Hospital Survey (HCAHPS) Mode and Patient-mix Adjustment of the CAHPS Hospital Survey (HCAHPS) April 30, 2008 Abstract A randomized Mode Experiment of 27,229 discharges from 45 hospitals was used to develop adjustments for the

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

More information

Data mining and statistical models in marketing campaigns of BT Retail

Data mining and statistical models in marketing campaigns of BT Retail Data mining and statistical models in marketing campaigns of BT Retail Francesco Vivarelli and Martyn Johnson Database Exploitation, Segmentation and Targeting group BT Retail Pp501 Holborn centre 120

More information

Farm Business Survey - Statistical information

Farm Business Survey - Statistical information Farm Business Survey - Statistical information Sample representation and design The sample structure of the FBS was re-designed starting from the 2010/11 accounting year. The coverage of the survey is

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

2. Issues using administrative data for statistical purposes

2. Issues using administrative data for statistical purposes United Nations Statistical Institute for Asia and the Pacific Seventh Management Seminar for the Heads of National Statistical Offices in Asia and the Pacific, 13-15 October, Shanghai, China New Zealand

More information

Regression and Time Series Analysis of Petroleum Product Sales in Masters. Energy oil and Gas

Regression and Time Series Analysis of Petroleum Product Sales in Masters. Energy oil and Gas Regression and Time Series Analysis of Petroleum Product Sales in Masters Energy oil and Gas 1 Ezeliora Chukwuemeka Daniel 1 Department of Industrial and Production Engineering, Nnamdi Azikiwe University

More information

The Development of the Annual Business Inquiry

The Development of the Annual Business Inquiry The Development of the Annual Business Inquiry Gareth Jones Office for National Statistics E-mail: info@statistics.gov.uk National Statistics customer enquiry line: +44 (0)845 601 3034 Introduction This

More information

How To Understand The Data Collection Of An Electricity Supplier Survey In Ireland

How To Understand The Data Collection Of An Electricity Supplier Survey In Ireland COUNTRY PRACTICE IN ENERGY STATISTICS Topic/Statistics: Electricity Consumption Institution/Organization: Sustainable Energy Authority of Ireland (SEAI) Country: Ireland Date: October 2012 CONTENTS Abstract...

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information