Multilevel Regression and Poststratification in Stata

Similar documents
Traineeships Regulation in Italy after the Fornero Labour Market Reform

Delegated to CNR on December 23rd, New synchronous registration system from September 28 th, 2009

Demographic indicators

5,000,000,000 Covered Bond Programme

Fiscal federalism in Italy at a glance

Introduction. We will refer in particular to the list of issues : Part I - n 7 and n 12 Part III - n 1: (ii) points a, c, e.

PRESENTATION ITALIAN PRISON SYSTEM. Italia, LPPS November 2010

HOSPICE (AND PALLIATIVE CARE NETWORK) IN ITALY AN UNMPREDICTABLE GROWTH

MUNICIPAL SOLID WASTE MANAGEMENT IN ITALY

Telecom Italia Portfolio Beni Stabili Investor Day

Gruppo Intesa Network

Data Envelopment Analysis (DEA) assessment of composite indicators. of infrastructure endowment.

Mariarosa Silvestro Letizia Cinganotto

A Single Market for Lawyers Challenges and Solutions in Cross Border Insurance. Italian Experience

BUSINESS ARCHITECTURE (BA) BA redesign

February Monitor of Bankruptcies, Insolvency Proceedings and Business Closures FourthQuarter 2012

ASSOBIOMEDICA AND BIOMEDICAL START-UPS. Vera Codazzi, Ph.D.

Rifiuti tra crescita e decrescita

Zadar,, October 24th 2013

NATI PER LEGGERE. Nati per leggere: A national programme to enhance literacy and health in small children through reading aloud

Horizon2020 La partecipazione italiana allo Strumento PMI

Sustainability Abstract

RESULTS OF THE NATIONAL SURVEY ON RADON INDOORS IN ALL THE 21 ITALIAN REGIONS

Consumer Behavior in Tourism Symposium ESTIMATING THE CARBON FOOTPRINT OF TOURISM IN SOUTH TYROL Mattia Cai

Best regards President of Italian Dance Sport Federation Christian Zamblera

presentazionenew_eng:layout 1 16/04/ :09 Page 1 VISTA Parliamentary TV Agency Rome / Brussels

Regione Provincia Distretto Abruzzo Chieti DISTRETTO 009 Abruzzo Chieti DISTRETTO 010 Abruzzo Chieti DISTRETTO 011 Abruzzo Chieti DISTRETTO 015

Price Dispersion: The Case of Pasta

BEST ONE - BEST ONE VOX SUBMERSIBLE ELECTRIC PUMPS. 60 Hz

REGLEG PRESIDENCY 2011

Using Proxy Measures of the Survey Variables in Post-Survey Adjustments in a Transportation Survey

N S S T O R I E S s p r i n g s u m m e r

Maintenance and Densification of the Italian GNSS Network. DIPARTIMENTO DI GEOSCIENZE A. Caporali J. Zurutuza M. Bertocco R. Corso P.

Handling missing data in Stata a whirlwind tour

INDEX ITALIAN SUPPLIERS

Joint Heritage the dialogue of different cultures

The emergency planning for volcanic risk at Vesuvius and Campi Flegrei

Ongoing Italian Wetland Inventory through the MedWet WIS (Wetland Inventory System) level = PMWI

Discover Trinity College London

AN OVERVIEW OF THE UK OUTBOUND MARKET

Italian Youth Guarantee Implementation Plan

STATISTICAL LABORATORY, USING R FOR BASIC STATISTICAL ANALYSIS

Missing data in randomized controlled trials (RCTs) can

ITALIAN GUIDELINES FOR THE APPLICATION OF RISK ASSESSMENT AT CONTAMINATED SITES

Air Quality Monitoring in Italy

Experimental Regulatory Audit and National Seminar on standards and criteria for the inspection of blood establishments Ancona, 12 th 13 th May 2010

Where do we come from?

Multiple Imputation for Missing Data: A Cautionary Tale

The drugs tracking system in Italy. Roma, March 21st 22nd W. Bergamaschi Ministry of Health

How To Find Out If A Temporary Job Is Better Than A Permanent Job

INDEX ITALIAN SUPPLIERS

Investment Opportunities in Italy's Conference Tourism sector

A Procedure for Classifying New Respondents into Existing Segments Using Maximum Difference Scaling

How To Live In Italy

Farm Business Survey - Statistical information

Regional policies for endorsing Renewable Energy production in Apulia. Department Director: A. Antonicelli

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Table 1. Daily newspapers: national circulation (2012)

Quaderni di Dipartimento. Assessing Gender Inequality among Italian Regions: The Italian Gender Gap Index. Monica Bozzano (Università di Pavia)

It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3.

Youth Entrepreneurship in Italy. An Overview from Isfol

Over 200 companies part of ELITE community in Italy and UK

Mode and Patient-mix Adjustment of the CAHPS Hospital Survey (HCAHPS)

Tourism. Capacity and occupancy of tourist accommodation establishments

The drive-test campaigns in Italy

Angioplastica Primaria nelle SCA:

UNDERSTANDING THE TWO-WAY ANOVA

Member States experience with simplification: Italy. Cristina Colombo, ESF Managing Authority in Lombardy Region Dublin, 27th January 2014

2015 COMPANY PROFILE

GENDER DISPARITIES IN ITALY FROM A HUMAN DEVELOPMENT PERSPECTIVE

Access to medical technology: the role of financing

n Preface Innocenzo Cipolletta

Non-Marital Cohabitation in Italy. Christin Löffler

The influence of teacher support on national standardized student assessment.

A GUARANTEE THAT IS NOT THERE (YET)

Confidence Intervals for One Standard Deviation Using Standard Deviation

SPPH 501 Analysis of Longitudinal & Correlated Data September, 2012

Italy: toward a federal state? Recent constitutional developments in Italy

BORGHI SRL Iniziative Sviluppo Locale. Company Presentation

Seniors Choice of Online vs. Print Response in the 2011 Member Health Survey 1. Nancy P. Gordon, ScD Member Health Survey Director April 4, 2012

CONTENTS THE ITALIAN REVENUE AGENCY THE GOVERNANCE CENTRAL ORGANISATION CENTRAL DIRECTORATES FUNCTIONS STAFF OFFICES FUNCTIONS

European Hospital Survey: Benchmarking Deployment of e-health Services ( )

Executive Summary. Public Support for Marriage for Same-sex Couples by State by Andrew R. Flores and Scott Barclay April 2013

Summary. 1. Introduction Data format Survey datasets Information contained in the datasets Aggregate variables...

THE ITALIAN INNOVATION SYSTEM AND ITS POTENTIAL FOR HIGH-TECH START-UPS

!! " # $ #! % &' (!! )**&! %!!!!! + #! % " #

SIMULATION STUDIES IN STATISTICS WHAT IS A SIMULATION STUDY, AND WHY DO ONE? What is a (Monte Carlo) simulation study, and why do one?

CENTRAL EUROPE PROGRAMME Launch of the 3 rd Call for Proposals

E-learning in prison: a proposal

Gender and the Labour Market: An International Perspective and the Case of Italy

Struggles with Survey Weighting and Regression Modeling 1

Chapter 1 Introduction. 1.1 Introduction

Carige s project: history and results. The The Business Plan Plan. The adoption of IAS and 1H 2005 results. Carige share performance -1-

Bio-economy between Food and non Food: The Italian Way

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

MEASURES OF VARIATION

PROVISIONAL DATA ON OPERATION OF THE ITALIAN POWER SYSTEM

From the help desk: Bootstrapped standard errors

An Introduction to Basic Statistics and Probability

Embracing Disciplinary Diversity: Public Administration Education in Italy

Transcription:

Multilevel Regression and Poststratification in Stata Maurizio Pisati 1 Valeria Glorioso 1,2 maurizio.pisati@unimib.it v.glorioso@campus.unimib.it 1 Dept. of Sociology and Social Research University of Milano-Bicocca (Italy) 2 Dept. of Society, Human Development, and Health Harvard School of Public Health Stata Conference Chicago 2011 July 14-15 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 1/49

Outline Introduction 1 Introduction The problem The solution Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

Outline Introduction 1 Introduction The problem The solution 2 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

Outline Introduction 1 Introduction The problem The solution 2 3 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

Outline Introduction 1 Introduction The problem The solution 2 3 4 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

Outline Introduction 1 Introduction The problem The solution 2 3 4 5 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 2/49

The problem The solution Introduction Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 3/49

A common research objective The problem The solution Sometimes social scientists are interested in determining whether, and to what extent, the distribution of a given target variable Y varies across K groups defined by the values of one or more covariates of interest Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 4/49

A common research objective The problem The solution Sometimes social scientists are interested in determining whether, and to what extent, the distribution of a given target variable Y varies across K groups defined by the values of one or more covariates of interest Let G denote a discrete variable representing the K groups under comparison. Without loss of generality, G can represent either a single discrete covariate or the cross-classification of two or more discrete covariates Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 4/49

A common research objective The problem The solution In symbols: M G V g G = g=1 where Π is the Cartesian product operator; M G denotes the total number of covariates forming G; and V g denotes the g th covariate Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 5/49

A common research objective The problem The solution In symbols: M G V g G = g=1 where Π is the Cartesian product operator; M G denotes the total number of covariates forming G; and V g denotes the g th covariate We will refer to G as the group variable Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 5/49

A common research objective The problem The solution The (conditional) distribution of Y within each category k of G can be described as follows: Y k f(θ k, φ k ) for k = 1,..., K where f( ) denotes a generic probability distribution; θ k denotes the expected value(s) of the distribution; and φ k denotes one or more additional parameters of the distribution (e.g., its variance) Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 6/49

A common research objective The problem The solution For the sake of simplicity, let us focus on the expected value(s) of Y, so that our goal is to determine whether, and to what extent, the expected value(s) of Y varies/vary across the K categories of G Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 7/49

A common research objective The problem The solution For the sake of simplicity, let us focus on the expected value(s) of Y, so that our goal is to determine whether, and to what extent, the expected value(s) of Y varies/vary across the K categories of G In terms of regression analysis, this amounts to estimating the K possible values of the regression function E(Y G = k), i.e., E(Y G = 1) θ 1, E(Y G = 2) θ 2,..., E(Y G = K) θ K Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 7/49

A common research objective The problem The solution For the sake of simplicity, let us focus on the expected value(s) of Y, so that our goal is to determine whether, and to what extent, the expected value(s) of Y varies/vary across the K categories of G In terms of regression analysis, this amounts to estimating the K possible values of the regression function E(Y G = k), i.e., E(Y G = 1) θ 1, E(Y G = 2) θ 2,..., E(Y G = K) θ K Let us denote our estimand i.e., our quantity of interest by θ {θ k : k = 1,..., K} Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 7/49

Estimating θ Introduction The problem The solution How do we get accurate i.e., precise and unbiased estimates of θ? Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 8/49

Estimating θ Introduction The problem The solution How do we get accurate i.e., precise and unbiased estimates of θ? For the sake of simplicity, let us suppose that (a) observations are sampled from a given target population, and (b) the data of interest are collected without measurement error, so that the only source of random estimation error is the sampling variance, and the only (possible) source of systematic estimation error is the selection bias Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 8/49

Estimating θ Introduction The problem The solution How do we get accurate i.e., precise and unbiased estimates of θ? For the sake of simplicity, let us suppose that (a) observations are sampled from a given target population, and (b) the data of interest are collected without measurement error, so that the only source of random estimation error is the sampling variance, and the only (possible) source of systematic estimation error is the selection bias The expression selection bias is used here as a shorthand for the sum of coverage bias, nonresponse bias, and sampling bias (Groves 1989) Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 8/49

Estimating θ Introduction The problem The solution The standard (maximum likelihood) estimator of each element θ k of θ is: n k ˆθ k E(Y G = k) = i=1 n k where n k denotes the number of valid sample observations within category k of variable G Y i Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 9/49

Estimating θ Introduction The problem The solution When n k is small, ˆθ k tends to be very unprecise, i.e., to generate highly variable estimates of θ k Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 10/49

Estimating θ Introduction The problem The solution When n k is small, ˆθ k tends to be very unprecise, i.e., to generate highly variable estimates of θ k The accuracy of ˆθ k decreases further if the data object of analysis are affected by selection bias, i.e., if the valid observations are a nonrandom sample of the target population and the process of selection into the sample is associated with one or more variables that are also associated with variable Y Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 10/49

Here s Mr. P Introduction The problem The solution For all those cases where the number of valid observations within one or more categories of G is small and/or collected data are affected by selection bias, relatively accurate estimates of θ can be obtained by using a proper combination of multilevel regression modeling and poststratification (henceforth MrP) Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 11/49

Here s Mr. P Introduction The problem The solution For all those cases where the number of valid observations within one or more categories of G is small and/or collected data are affected by selection bias, relatively accurate estimates of θ can be obtained by using a proper combination of multilevel regression modeling and poststratification (henceforth MrP) This approach has been devised by Andrew Gelman and colleagues (Gelman and Little 1997; Park, Gelman and Bafumi 2004; Park, Gelman and Bafumi 2006; Gelman and Hill 2007) and recently elaborated on by Kastellec, Lax and Phillips (Lax and Phillips 2009a; Lax and Phillips 2009b; Kastellec, Lax and Phillips 2010) Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 11/49

The MrP estimator Introduction The problem The solution The MrP estimator of θ which we will denote by θ can be described as a four-step procedure as follows: Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 12/49

The MrP estimator Introduction The problem The solution First: Identify one or more covariates that might possibly be responsible for selection bias. Without loss of generality, let C denote a discrete variable representing the cross-classification of these covariates. In symbols: M C V c C = c=1 where Π is the Cartesian product operator; M C denotes the total number of covariates forming C; and V c denotes the c th covariate. We will refer to C as the composition variable Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 13/49

The MrP estimator Introduction The problem The solution Second: Define the new estimand γ {γ kl : k = 1,..., K; l = 1,..., L}, where γ kl E(Y G = k, C = l); k indexes the K categories of variable G as above; and l indexes the L categories of variable C Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 14/49

The MrP estimator Introduction The problem The solution Third: Use a properly specified multilevel regression model to estimate γ Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 15/49

The MrP estimator Introduction The problem The solution Fourth: Compute the estimate of each element θ k of θ as a weighted sum of the proper subset of ˆγ: θ k = L ˆγ kl w l k l=1 where w l k = N kl /N k ; N k denotes the number of members of the target population who belong in category k of variable G; and N kl denotes the number of members of the target population who belong in category k of variable G and in category l of variable C Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 16/49

The problem The solution The MrP estimator: Advantages The use of multilevel regression modeling (step 3 above) helps to increase precision Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 17/49

The problem The solution The MrP estimator: Advantages The use of multilevel regression modeling (step 3 above) helps to increase precision If the composition variable C is carefully defined, poststratification (step 4 above) helps to decrease bias Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 17/49

The problem The solution The MrP estimator: Advantages The use of multilevel regression modeling (step 3 above) helps to increase precision If the composition variable C is carefully defined, poststratification (step 4 above) helps to decrease bias In sum, we expect MrP to be a relatively accurate estimator of θ Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 17/49

The problem The solution The MrP estimator: Disadvantages We need to have population data or, at least, a sufficiently accurate estimate of it for the full G C cross-classification; this might limit the definition of C Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 18/49

The problem The solution The MrP estimator: Disadvantages We need to have population data or, at least, a sufficiently accurate estimate of it for the full G C cross-classification; this might limit the definition of C To get good estimates of γ, the multilevel regression model must be specified very carefully but this caveat applies to any kind of regression model Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 18/49

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 19/49

mrp a Stata implementation of MrP mrp is a novel user-written that implements the MrP estimator outlined above Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 20/49

mrp a Stata implementation of MrP mrp is a novel user-written that implements the MrP estimator outlined above Basically, mrp requests the user to specify (a) the target variable Y ; (b) the list of covariates forming the group variable G; (c) the list of covariates forming the composition variable C; (d) the multilevel regression command appropriate to the problem at hand (e.g., xtmixed); (e) the list of fixed effects ; (f) the list of random effects ; and (g) the name of a properly arranged dataset contaning the population totals N kl Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 20/49

mrp a Stata implementation of MrP mrp is a novel user-written that implements the MrP estimator outlined above Basically, mrp requests the user to specify (a) the target variable Y ; (b) the list of covariates forming the group variable G; (c) the list of covariates forming the composition variable C; (d) the multilevel regression command appropriate to the problem at hand (e.g., xtmixed); (e) the list of fixed effects ; (f) the list of random effects ; and (g) the name of a properly arranged dataset contaning the population totals N kl The basic output of mrp is an estimate of the K values of the regression function E(Y G = k), i.e., of the K elements of θ Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 20/49

Example (based on simulation) Our objective is to describe the extent to which the proportion of Italian adults who attend Catholic Mass regularly varies across Italian regions Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 21/49

Example (based on simulation) Our objective is to describe the extent to which the proportion of Italian adults who attend Catholic Mass regularly varies across Italian regions To this aim, a simple random sample of 2,000 units is drawn from the target population (Italian men and women aged 18+), and each sampled unit is contacted for interview Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 21/49

Example (based on simulation) Our objective is to describe the extent to which the proportion of Italian adults who attend Catholic Mass regularly varies across Italian regions To this aim, a simple random sample of 2,000 units is drawn from the target population (Italian men and women aged 18+), and each sampled unit is contacted for interview Only 984 subjects accept to participate in the survey. The response rate turns out to be higher among women and positively correlated with age and educational level Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 21/49

Example Introduction Since the number of valid observations within each region k is generally small (min(n k )=30, max(n k )=97), the standard estimator of θ will be very unprecise Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 22/49

Example Introduction Since the number of valid observations within each region k is generally small (min(n k )=30, max(n k )=97), the standard estimator of θ will be very unprecise Moreover, since sex, age, and educational level are associated with Catholic Mass attendance, the standard estimator of θ will likely be affected by selection bias Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 22/49

Example Introduction Since the number of valid observations within each region k is generally small (min(n k )=30, max(n k )=97), the standard estimator of θ will be very unprecise Moreover, since sex, age, and educational level are associated with Catholic Mass attendance, the standard estimator of θ will likely be affected by selection bias In an attempt to increase precision and decrease bias, we estimate θ using the new mrp Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 22/49

Example: mrp specification Target variable mrp church, g(region relmar region) c(sex age edu) /// regcommand(xtmixed) binomial /// fe(relmar) re(i.age i.edu i.sex i.region) /// popref("popref.dta") npop(n) /// percent Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 23/49

Example: mrp specification List of covariates forming group variable G mrp church, g(region relmar region) c(sex age edu) /// regcommand(xtmixed) binomial /// fe(relmar) re(i.age i.edu i.sex i.region) /// popref("popref.dta") npop(n) /// percent Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 24/49

Example: mrp specification List of covariates forming composition variable C mrp church, g(region relmar region) c(sex age edu) /// regcommand(xtmixed) binomial /// fe(relmar) re(i.age i.edu i.sex i.region) /// popref("popref.dta") npop(n) /// percent Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 25/49

Example: mrp specification Multilevel regression command mrp church, g(region relmar region) c(sex age edu) /// regcommand(xtmixed) binomial /// fe(relmar) re(i.age i.edu i.sex i.region) /// popref("popref.dta") npop(n) /// percent Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 26/49

Example: mrp specification List of fixed effects mrp church, g(region relmar region) c(sex age edu) /// regcommand(xtmixed) binomial /// fe(relmar) re(i.age i.edu i.sex i.region) /// popref("popref.dta") npop(n) /// percent Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 27/49

Example: mrp specification List of random effects mrp church, g(region relmar region) c(sex age edu) /// regcommand(xtmixed) binomial /// fe(relmar) re(i.age i.edu i.sex i.region) /// popref("popref.dta") npop(n) /// percent Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 28/49

Example: mrp specification Dataset and variable containing population totals N kl mrp church, g(region relmar region) c(sex age edu) /// regcommand(xtmixed) binomial /// fe(relmar) re(i.age i.edu i.sex i.region) /// popref("popref.dta") npop(n) /// percent Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 29/49

Example: mrp specification Scale option (converts proportions into percentages) mrp church, g(region relmar region) c(sex age edu) /// regcommand(xtmixed) binomial /// fe(relmar) re(i.age i.edu i.sex i.region) /// popref("popref.dta") npop(n) /// percent Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 30/49

Example: Results Dot = True population value, S = Standard estimate, M = MrP estimate Piemonte Lombardia Trentino-Alto Adige Veneto Friuli-Venezia Giulia Liguria Emilia-Romagna Toscana Umbria Marche Lazio Abruzzo Molise Campania Puglia Basilicata Calabria Sicilia Sardegna S M MS M S MS M M S M S MS M S S M M S M S MS MS M M M S 0 10 20 30 40 50 60 70 M E(Y G = k) M S S S S S Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 31/49

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 32/49

Quantity of interest We used Monte Carlo simulation to evaluate the performance of three estimators of θ in a research setting analogous to the one illustrated in the example above Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 33/49

Quantity of interest We used Monte Carlo simulation to evaluate the performance of three estimators of θ in a research setting analogous to the one illustrated in the example above Our underlying research objective is to describe the extent to which the proportion of Italian adults who attend Catholic Mass regularly varies across 19 of the 20 regions into which Italy is subdivided (the 20 th region, Valle d Aosta, is excluded from the analysis because of its peculiarities) Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 33/49

Quantity of interest We used Monte Carlo simulation to evaluate the performance of three estimators of θ in a research setting analogous to the one illustrated in the example above Our underlying research objective is to describe the extent to which the proportion of Italian adults who attend Catholic Mass regularly varies across 19 of the 20 regions into which Italy is subdivided (the 20 th region, Valle d Aosta, is excluded from the analysis because of its peculiarities) Thus, our quantity of interest θ corresponds to the K = 19 values of the regression function E(Y G = k), where the target variable Y is a binary indicator of regular Catholic Mass attendance, and the group variable G is the region of residence Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 33/49

Procedure Introduction For each estimator, we followed a three-step procedure: Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 34/49

Procedure Introduction For each estimator, we followed a three-step procedure: 1 First, we simulated 1,000 sample surveys, using as the sampling frame a large dataset (N = 251, 708) that mimics the socio-demographic structure of the full Italian adult population and contains complete information on the following individual characteristics: region of residence (region), sex (sex), age (age), educational level (edu), and Catholic Mass attendance (church) Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 34/49

Procedure Introduction For each estimator, we followed a three-step procedure: 1 First, we simulated 1,000 sample surveys, using as the sampling frame a large dataset (N = 251, 708) that mimics the socio-demographic structure of the full Italian adult population and contains complete information on the following individual characteristics: region of residence (region), sex (sex), age (age), educational level (edu), and Catholic Mass attendance (church) 2 Second, we used the data collected in each simulated survey to estimate the quantity of interest, thus getting a simulated sampling distribution of θ made of 1,000 estimates Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 34/49

Procedure Introduction For each estimator, we followed a three-step procedure: 1 First, we simulated 1,000 sample surveys, using as the sampling frame a large dataset (N = 251, 708) that mimics the socio-demographic structure of the full Italian adult population and contains complete information on the following individual characteristics: region of residence (region), sex (sex), age (age), educational level (edu), and Catholic Mass attendance (church) 2 Second, we used the data collected in each simulated survey to estimate the quantity of interest, thus getting a simulated sampling distribution of θ made of 1,000 estimates 3 Finally, we evaluated the estimator in question by computing its bias, empirical standard error, and root mean square error Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 34/49

Survey specifications Sampling method: Simple random sampling Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 35/49

Survey specifications Sampling method: Simple random sampling Initial sample size: n = 2, 000 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 35/49

Survey specifications Sampling method: Simple random sampling Initial sample size: n = 2, 000 Response rate: Each sampled unit is selected into the final sample with a probability determined by his/her sex, age, and educational level. Such probabilities range from a minimum of 20% (poorly-educated men aged 18-44) to a maximum of 100 % (highly-educated women aged 65+) Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 35/49

Survey specifications Sampling method: Simple random sampling Initial sample size: n = 2, 000 Response rate: Each sampled unit is selected into the final sample with a probability determined by his/her sex, age, and educational level. Such probabilities range from a minimum of 20% (poorly-educated men aged 18-44) to a maximum of 100 % (highly-educated women aged 65+) Final sample size: mean(n)=970, min(n)=897, max(n)=1,035 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 35/49

Estimator 1 Standard (std) Introduction The standard estimator of each element θ k of θ is defined as follows: n k church i i=1 ˆθ k = n k where church i takes value 1 when subject i attends Catholic Mass regularly, value 0 otherwise; and n k denotes the number of valid sample observations within region of residence k Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 36/49

Estimator 2 Multilevel Regression with Poststratification (mrp) The MrP estimator of each element θ k of θ is defined as follows: θ k = L ˆγ kl w l k l=1 where all symbols are defined as in slides 14-16 above Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 37/49

Estimator 2 Multilevel Regression with Poststratification (mrp) The MrP estimator of each element θ k of θ is defined as follows: θ k = L ˆγ kl w l k l=1 where all symbols are defined as in slides 14-16 above The estimation of parameters γ kl requires that the composition variable C be previously defined Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 37/49

Estimator 2 Multilevel Regression with Poststratification (mrp) The MrP estimator of each element θ k of θ is defined as follows: θ k = L ˆγ kl w l k l=1 where all symbols are defined as in slides 14-16 above The estimation of parameters γ kl requires that the composition variable C be previously defined In our case, we define C as the cross-classification of three categorical covariates: sex (2 levels), age (4 levels), and edu (3 levels). Therefore, L = 2 4 3 = 24 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 37/49

Estimator 2 Multilevel Regression with Poststratification (mrp) Given the definition of composition variable C, the parameters γ kl are estimated using the following multilevel regression model: γ kl = β 0 + α region k + αr[l] sex + αage s[l] + αedu t[l] Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 38/49

Estimator 2 Multilevel Regression with Poststratification (mrp) where α region k N(β relmar relmar, σregion) 2 for k = 1,..., 19 αr sex N(0, σsex) 2 for r = 1,..., 2 αs age N(0, σage) 2 for s = 1,..., 4 αt edu N(0, σedu) 2 for t = 1,..., 3 and relmar is a region-level variable that expresses the percentage of religious marriages in each region Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 39/49

Estimator 3 Standard Regression with Poststratification (srp) The SrP estimator of each element θ k of θ is defined as follows: θ k = L ˆγ kl w l k l=1 where all symbols are defined as above Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 40/49

Estimator 3 Standard Regression with Poststratification (srp) The SrP estimator has the same general form as the MrP estimator, but in the SrP estimator the parameters γ kl are estimated using a standard logistic regression model as follows: γ kl = invlogit(β 0 + β relmar relmar + β region k region k + + βr sex sex r[l] + βs age age s[l] + βt edu edu t[l] ) where β region 1 = β region 2 = β sex 1 = β age 1 = β edu 1 = 0 Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 41/49

Results: Bias Introduction Piemonte Lombardia Trentino-Alto Adige Veneto Friuli-Venezia Giulia Liguria Emilia-Romagna Toscana Umbria Marche Lazio Abruzzo Molise Campania Puglia Basilicata Calabria Sicilia Sardegna srp mrp mrp srp srpmrp mrp srp mrp srp srpmrp srp mrp srp mrp mrpsrp srp mrp srp mrp srp mrp srp mrp mrp srp srp mrp mrp srp srp mrp srp mrp mrp srp std std std std std std std std std std std std std std std std std std std -3-2 -1 0 1 2 3 4 5 6! k 33% 39% 50% 44% 29% 26% 24% 25% 30% 45% 31% 35% 40% 46% 43% 36% 40% 41% 31% Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 42/49

Results: Empirical standard error Piemonte Lombardia Trentino-Alto Adige Veneto Friuli-Venezia Giulia Liguria Emilia-Romagna Toscana Umbria Marche Lazio Abruzzo Molise Campania Puglia Basilicata Calabria Sicilia Sardegna mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp srpstd srpstd srp std srpstd srp std srp std srp std srp std srp srpstd srp std srp std srpstd srpstd srp std srpstd srp std std srp std 0 1 2 3 4 5 6 7 8 9 10 srp std! k 33% 39% 50% 44% 29% 26% 24% 25% 30% 45% 31% 35% 40% 46% 43% 36% 40% 41% 31% Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 43/49

Results: Root mean square error Piemonte Lombardia Trentino-Alto Adige Veneto Friuli-Venezia Giulia Liguria Emilia-Romagna Toscana Umbria Marche Lazio Abruzzo Molise Campania Puglia Basilicata Calabria Sicilia Sardegna mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp mrp srp srp srp srp srp std std srp srp std std srp srp std std srp srp srp std std srp std srp std std srp std srp std srp std srp std std srp std 0 1 2 3 4 5 6 7 8 9 10 11 std! k 33% 39% 50% 44% 29% 26% 24% 25% 30% 45% 31% 35% 40% 46% 43% std 36% 40% 41% 31% Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 44/49

Results: Summary Bias: In absolute terms, the MrP estimator exhibits little bias in most cases less than one percentage point, for an average true value of 36%. Comparatively, it exhibits significantly less bias than the standard estimator and slightly more bias than the SrP estimator Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 45/49

Results: Summary Bias: In absolute terms, the MrP estimator exhibits little bias in most cases less than one percentage point, for an average true value of 36%. Comparatively, it exhibits significantly less bias than the standard estimator and slightly more bias than the SrP estimator Precision: The MrP estimator is significantly more precise (i.e., less variable) than both the standard estimator and the SrP estimator Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 45/49

Results: Summary Bias: In absolute terms, the MrP estimator exhibits little bias in most cases less than one percentage point, for an average true value of 36%. Comparatively, it exhibits significantly less bias than the standard estimator and slightly more bias than the SrP estimator Precision: The MrP estimator is significantly more precise (i.e., less variable) than both the standard estimator and the SrP estimator Accuracy: Combining bias and precision, we can conclude that the MrP estimator is 1 to 4 times more accurate than the standard estimator and 1 to 3 times more accurate than the SrP estimator Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 45/49

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 46/49

Introduction mrp is still at alpha stage and it will take a few months before it reaches a publishable form Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 47/49

Introduction mrp is still at alpha stage and it will take a few months before it reaches a publishable form mrp is part of a larger project on the analysis of variation in Stata, and eventually it will be subsumed under a more general command for the analysis of association Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 47/49

Introduction mrp is still at alpha stage and it will take a few months before it reaches a publishable form mrp is part of a larger project on the analysis of variation in Stata, and eventually it will be subsumed under a more general command for the analysis of association Part of the work presented here was carried out while Maurizio Pisati was a visiting scholar at the Institute for Quantitative Social Science at Harvard University, and Valeria Glorioso was a visiting student researcher at the Department of Society, Human Development, and Health of the Harvard School of Public Health Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 47/49

Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 48/49

Introduction Gelman, A. and J. Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press. Gelman, A. and T.C. Little. 1997. Poststratification into many categories using hierarchical logistic regression. Survey Methodology 23: 127 135. Groves, R.M. 1989. Survey Errors and Survey Costs. New York: Wiley. Kastellec, J., Lax, J.R. and J.H. Phillips. 2010. Public opinion and Senate confirmation of Supreme Court nominees. Journal of Politics 72: 767 784. Lax, J.R. and J.H. Phillips. 2009a. How should we estimate public opinion in the States?. American Journal of Political Science 53: 107 121. Lax, J.R. and J.H. Phillips. 2009b. Gay rights in the States: Public opinion and policy responsiveness. American Political Science Review 103: 367 386. Park, D.K., Gelman, A. and J. Bafumi. 2004. Bayesian multilevel estimation with poststratification: State-level estimates from national polls. Political Analysis 12: 375 385. Park, D.K., Gelman, A. and J. Bafumi. 2006. State level opinions from national surveys: Poststratification using multilevel logistic regression. In Public Opinion in State Politics. Ed. J.E. Cohen. Stanford, CA: Stanford University Press, 209 228. Maurizio Pisati and Valeria Glorioso Multilevel Regression and Poststratification 49/49