Working Paper 2014-79

QCA, the Truth Table Analysis and Large-N Survey Data: The Benefits of Calibration and the Importance of Robustness Tests

Patrick Emmenegger, Dominik Schraff and André Walter
Department of Political Science, University of St. Gallen

Abstract: This paper argues that QCA is a suitable methodological choice for the analysis of a specific but widely used form of large-n data in the social sciences, namely survey data collected through computer-assisted telephone interviews or internet surveys. The reason is that the linguistic form of survey data often lends itself to a direct translation into fuzzy sets. Likert-scaled survey items let respondents make qualitative statements of agreement, disagreement and indifference. Fuzzy sets can capture these qualitative differences in a way that classical interval-scaled indicators cannot. Moreover, fuzzy algebra allows researchers to combine multiple sets in a simple and transparent manner, thereby giving QCA an important advantage over regression-based approaches. However, the analysis of large-n survey data removes one of the characteristic strengths of QCA: its case orientation. In the case of large-n survey data, the case orientation is typically weak and causal inference thus questionable. To remedy this shortcoming, QCA methodologists have suggested robustness tests to enhance confidence in the proposed relationships. This paper shows how these robustness tests can be used in a large-n setting and suggests a new robustness test that is particularly suited for large-n survey data.

Keywords: Large-N, Surveys, Robustness Tests, Calibration, Truth Table Analysis, fsQCA

COMPASSS Working Paper. Available at: EmmeneggerSchraffWalter2014.pdf
COMPASSS Working Papers Series Managing Editor: Claude Rubinson


Patrick Emmenegger (corresponding author)
Department of Political Science, University of St. Gallen
Rosenbergstrasse 51, CH-9000 St. Gallen

Dominik Schraff
Department of Political Science, University of St. Gallen
Rosenbergstrasse 51, CH-9000 St. Gallen

André Walter
Department of Political Science, University of St. Gallen
Rosenbergstrasse 51, CH-9000 St. Gallen

Acknowledgements: For helpful comments on previous versions of this paper, we would like to thank Keith Banting, Andrew Bennett, Barry Cooper, Bernhard Ebbinghaus, Judith Glaesser, Johannes Meuer, Benoît Rihoux, Claude Rubinson, Carsten Q. Schneider, Wim van Oorschot, Claudius Wagemann, Oliver Westerwinter and two anonymous reviewers. Special thanks go to Alrik Thiem and Adrian Dusa for providing support on their QCA R package. We are also grateful to all participants of the ESPAnet doctoral workshop "Comparing Welfare States: Applying Quantitative and Qualitative Comparative Analysis in Social Policy Research" in Mannheim (2013), the NordWel & REASSESS International Summer School in Reykjavik (2013) and the QCA Expert Workshop in Zürich (2013).

Introduction

Qualitative Comparative Analysis (QCA) is typically used for small- and medium-sized datasets. However, in recent years scholars have begun to explore the potential of QCA for the analysis of large-n datasets (Cooper 2005; Ragin and Fiss 2008; Glaesser and Cooper 2010; Cooper and Glaesser 2011; Greckhamer et al. 2013). QCA has unique advantages over regression-based approaches (Ragin 2008; Vis 2012; Schneider and Wagemann 2013) and thus promises new insights in the analysis of large-n datasets. However, the suitability of QCA for the analysis of a large number of cases is still an open question. While Ragin (1987, 2000) originally envisioned QCA as a method for medium-n data, other authors have emphasized that QCA is the appropriate choice of method if theoretical arguments are formulated in set-theoretic terms, independent of the number of observations (Schneider and Wagemann 2012). Nevertheless, most methodological work on QCA has so far focused on medium-n datasets.

In this paper, we discuss the extent to which QCA is a suitable methodological choice for the analysis of a specific but widely used form of large-n data in the social sciences, namely survey data collected through computer-assisted telephone interviews or internet surveys. We argue that such large-n datasets raise two methodological issues that have not yet received sufficient attention in methodological debates: the potential of calibration and the importance of robustness tests.

First, we argue that the linguistic form of survey data often lends itself to a direct translation into fuzzy sets. Likert-scaled survey items let respondents make qualitative statements of agreement, disagreement and indifference. Fuzzy sets can reflect these qualitative differences in a way that classical interval-scaled indicators cannot. Moreover, fuzzy algebra allows researchers to combine multiple sets in a simple and transparent manner.

As we demonstrate in the first part of this paper, researchers have not yet taken sufficient advantage of this untapped potential of QCA for the analysis of large-n survey data. On the contrary, especially in large-n settings researchers still seem prone to use averages or other inductive procedures to calibrate sets. We argue that this is bad practice and illustrate how researchers can use theoretical knowledge to create more adequate sets.

Second, the analysis of large-n survey data removes one of the characteristic strengths of QCA: its case orientation. The results of the Truth Table Analysis are but a midpoint in a proper QCA analysis. Typically, the results of the Truth Table Analysis are, among others, complemented by a qualitative discussion of the cases that are covered by a solution term to show that the observed configurations indeed represent causal relationships. However, in the case of large-n survey data, this case orientation is often weak and causal inference thus questionable. To remedy this shortcoming, QCA methodologists have suggested robustness tests to gauge causality. In the second part of this paper, we show how these robustness tests can be used and suggest a new test that we believe is particularly suited for large-n survey data.

To make these two methodological points, we use the extensive literature on opposition towards immigration as our empirical example. There is a large number of established theoretical arguments explaining opposition towards immigration, and many researchers have used the 2002/03 wave of the European Social Survey, thus allowing us to compare our findings with numerous studies using regression-based approaches (e.g. Dustmann and Preston 2004; Sides and Citrin 2007; Finseraas 2008; Rydgren 2008; Herreros and Criado 2009; Meuleman et al. 2009; Senik et al. 2008; Emmenegger and Klemmensen 2013a, b).

In particular, we analyse whether preferences for cultural homogeneity, ethnic competition over scarce resources such as jobs, heterogeneous social networks and education influence opposition towards immigration in Germany (Pettigrew 1998; Hainmueller and Hiscox 2007; Sides and Citrin 2007; van Oorschot and Uunk 2007; Rydgren 2008; Ceobanu and Escandell 2010). However, our main analytical goal is not to make a substantive contribution to the literature on preferences for immigration. Hence, we refrain from discussing the literature in any detail and refer readers to the cited sources for a discussion of the theoretical arguments.

The paper is structured as follows. In the next section, we discuss the untapped potential of survey data for the calibration of fuzzy sets. Subsequently, we demonstrate the importance of robustness tests in the case of large-n survey data. In this section we also suggest a novel robustness test that is particularly suitable for large-n survey data. A final section concludes.

Untapped potential: The calibration of survey data

In some respects, QCA is better suited to deal with survey data than regression-based approaches. Regression-based approaches typically rely on indicators created by means of inductive procedures and thus often sever the direct link between concepts and measures. In contrast, set-theoretic methods can translate survey items and Likert scales into (fuzzy) sets without any loss of information or conceptual clarity. In particular, the calibration of sets does not force researchers to turn differences-in-kind (opposition or no opposition) into differences-in-degree (more or less opposition). In the following section, we outline our argument in detail. In addition, we use the example of the "opposition towards immigration" literature to illustrate our argument and demonstrate the advantages of QCA for survey data.

In regression-based approaches, researchers typically rely on quantitative, inductive techniques such as simple averages or factor analysis to create indicators. In a similar vein, QCA practitioners often propose and use such inductive techniques to calibrate sets (cf. Berg-Schlosser and De Meur 2009; Crilly et al. 2012; Greckhamer et al. 2008; Grendstad 2007; Schneider et al. 2009; Thygeson et al. 2012). However, such inductive approaches are generally seen in a critical light in set-theoretical research because they typically lead to concept-measure inconsistencies (Goertz 2006). Put simply, qualitative differences still reflected in concepts get lost once these concepts are operationalized as indicators (or as sets using inductive approaches such as averages). In contrast, sets, properly calibrated, are able to reflect these qualitative differences. Using external standards such as the researchers' theoretical and substantive knowledge, sets thus increase concept-measure consistency. But as the number of cases increases, it often becomes difficult to have the necessary substantive knowledge for assigning cases to sets. As a result, QCA scholars often revert to quantitative techniques such as simple averages of indicators to calibrate sets.

However, not all large-n data sets suffer from this problem. In particular, survey data, although typically large-n, can often be easily translated into sets. The reason for this is the survey questions' linguistic form. Words are inherently set-theoretical. As emphasized by the cognitive linguist Lakoff (1987), experiences, events and things are not perceived as unique but rather in terms of patterns and categories. To these categories we give names that convey meaning (concepts). For instance, when using the term "democracy", we (and our social environment) recall a particular class of units that are all comparatively similar (summarized in the concept "democracy"). To put it in set-theoretic terms, there is a set of democracies, in which countries have different levels of (non-)membership.

Concepts such as "democracy" refer to the knowledge that we have about a category and therefore allow for linguistic action (Schedler 2012). They allow us to communicate what is (rather) part of the category and what is not. Survey data, unlike conventional (numeric) indicators, is based on words. For instance, in the case of Likert scales, respondents are asked whether they agree with a certain statement. In most research, respondents' answers are subsequently translated into ordinal-scale variables that reflect how strongly the respondents agreed with certain statements. However, such a translation automatically leads to a significant loss of information because the qualitative dimension of the survey item is no longer considered in the indicator. While we are still able to observe whether a person has a higher level of agreement than another one, we are no longer able to observe whether somebody in fact agrees or disagrees with the statement. These differences-in-kind, as expressed in the survey question, have disappeared from the data set once the data have been turned into indicators. This loss of information, however, is not necessary. Fuzzy sets are able to capture all the relevant information, i.e. both differences-in-degree and differences-in-kind.

In our empirical example, we analyse the determinants of opposition towards immigration using fsQCA. One of the conditions that we use in the analysis captures respondents' preferences for cultural unity. This set is based on the survey statement that it is better if almost everyone shares the same customs and traditions. Respondents were asked if they agree or disagree with the statement. The five answer options were "strongly agree", "agree somewhat", "neither agree nor disagree", "disagree somewhat" and "strongly disagree". We code respondents who strongly agree with the statement as fully in the set "preference for cultural unity", while respondents who somewhat agree are assigned the value 0.8, reflecting the ambiguity in the formulation "somewhat".

In contrast, all respondents who express disagreement with the statement are coded as being fully out of the set "preference for cultural unity". Finally, indifferent respondents have some partial membership in the set "preference for cultural unity", although they are more out of the set than in. By being indifferent, respondents declined to agree with the statement. Hence, it would be incorrect to put indifferent respondents on the maximum point of ambiguity, 0.5. Instead, we assign them the value 0.2, which reflects the fact that these respondents also deliberately decided not to disagree with the statement. Figure 1 shows the calibration in detail.

Figure 1: Calibration of the Likert scale "cultural unity"
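To make this mapping concrete, the recode takes only a few lines of R. This is a minimal sketch; the vector name unity_item and the data frame ess are our own placeholders, with the answer categories labelled as in the survey:

```r
# Fuzzy-set calibration of the Likert item "it is better if almost everyone
# shares the same customs and traditions": agreement is in the set,
# disagreement fully out, indifference at 0.2 (more out than in).
calibrate_unity <- function(unity_item) {
  scores <- c("strongly agree"             = 1.0,
              "agree somewhat"             = 0.8,
              "neither agree nor disagree" = 0.2,
              "disagree somewhat"          = 0.0,
              "strongly disagree"          = 0.0)
  unname(scores[as.character(unity_item)])
}

UNITY <- calibrate_unity(ess$unity_item)  # ess: hypothetical data frame
```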

As a result of our calibration, only respondents who agree with the statement are in the set "cultural unity". In contrast, regression-based approaches do not consider this qualitative dimension of the survey item. Rather, they treat the difference between "strongly disagreeing" and "somewhat disagreeing" exactly like the difference between "neither agreeing nor disagreeing" and "somewhat agreeing". The qualitative difference between agreement and disagreement has been lost in translation. As a consequence, set-theoretic approaches are particularly suited to maximize the information contained in Likert scales. Unlike QCA, regression-based techniques use survey items as quasi-metric indicators, thereby ignoring the substantive meaning of indifference and the qualitative differences between agreeing and disagreeing with certain statements.[1] However, following Goertz (2006), the graduation of a phenomenon makes sense only within and not across sets.

The consideration of the qualitative dimension contained in survey items also has important implications for causal statements. For an investigation of the determinants of opposition towards immigration, it hardly makes sense to examine the determinants of support for immigration (which regression-based approaches implicitly do). The relationship between education and opposition to immigration clarifies this point: While many studies argue that higher education leads to more tolerance towards immigrants, no study employs the reverse argument, i.e. that lower levels of education cause intolerance towards immigration (Ceobanu and Escandell 2010: 319).

Set-theoretic approaches also have an untapped potential with regard to the combination of multiple survey items.

[1] Of course, regression-based approaches are also able to capture such qualitative differences. However, we could not find a single paper in the "opposition towards immigration" literature that considered such qualitative differences within survey items. On the difference between the methodologically possible and usual practice, see also Goertz and Mahoney (2012).

When a concept cannot be measured by a single indicator, it is common procedure in studies using regression-based approaches (but oddly enough also in studies using set-theoretical methods) to simply use the average of different variables or to conduct a factor analysis of variables that are expected to be in a causal relationship with the latent concept (for QCA studies using such inductive procedures see Berg-Schlosser 2008; Cárdenas 2012; Cheng et al. 2013; Crowley 2013; Engeli 2012; Grendstad 2007; Vaisey 2007). However, using such an inductive approach to capture sets is problematic for at least two reasons.

First, while indicators are typically numeric, concepts are constructed in terms of necessary and sufficient conditions (Goertz 2006). For instance, Canada is not a member of the category "European democracies" because Canada, although democratic, is not a European country. Hence, Canada's set membership is zero and not 0.5, as the average of the variables "democratic" and "European" might imply. The calibration of sets by means of linear algebra is thus highly susceptible to misclassification, while conceptual thinking implies that variables are combined in a logical fashion using AND/OR operations. If the conceptual structure of necessary and sufficient conditions is not reflected in the measurement process, the result is concept-measure inconsistency. In our empirical example, scholars have typically relied on inductive approaches, thus leaving conceptualization underdeveloped and resulting in a large body of empirical work that is conceptually only loosely connected.

Second, averaging different variables to capture a concept is based on the assumption that all indicators are equally important for a concept. For instance, opposing immigration from poor Asian or African countries has the same weight as opposition towards immigration from rich, neighbouring European countries. However, this line of argumentation is hardly justifiable, for a number of reasons which we discuss below.
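In fuzzy algebra, the logical AND of two sets is the minimum of their membership scores and the logical OR is the maximum. A small sketch with invented membership scores illustrates why this matters for the "European democracies" example:

```r
# Membership in "European democracies" requires membership in BOTH component
# sets: the intersection is the minimum of the scores, not their mean.
democratic <- c(Canada = 1.0, Germany = 1.0, Belarus = 0.1)
european   <- c(Canada = 0.0, Germany = 1.0, Belarus = 1.0)

pmin(democratic, european)             # logical AND: Canada = 0.0
rowMeans(cbind(democratic, european))  # averaging: Canada = 0.5 (misclassified)
```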

To clarify both points, we turn to the calibration of the outcome "opposition towards immigration" from our empirical example. Previous studies typically used a set of six survey items from the first round of the European Social Survey. In the six questions, respondents were asked to what extent rich/poor immigrants from European/non-European countries and from the same/different race or ethnic group should be allowed to enter the country. The answer options were "allow many", "allow some", "allow few" and "allow none". The studies then often simply used the average of the respondents' answers to these six questions.

In contrast, we combine these six survey items in a theoretically informed way to capture opposition towards immigration (see Figure 2). To construct the set we distinguish between the geographical, ethnic and stratificatory dimensions of opposition towards immigration. Previous studies have shown that respondents assess immigration along these dimensions differently (Hainmueller and Hiscox 2007; Hainmueller and Hopkins 2013). In the first step, all respondents who oppose immigration (meaning answering "allow few" or "allow none") of poor, non-European immigrants are assigned the score 0.6. The reason we assign these respondents only marginally above the 0.5 anchor point is that opposition to immigration is most common when it focuses on culturally and geographically more distant groups. In addition, previous studies provide evidence that people prefer high-skilled to low-skilled immigration, independent of the educational and occupational background of the respective respondent (Hainmueller and Hiscox 2007).[2]

[2] Hainmueller and Hiscox (2007) show that the education level of immigrants is strongly associated with the GDP per capita of the home country. Therefore, they argue, questions about immigration from poor countries can be interpreted as questions about low-skilled immigration.

As a consequence, opposition towards immigration from poor, non-European countries is the most prevalent form of opposition towards immigration.

Figure 2: Calibration of the set "opposition towards immigration"

In a second step, we assign all respondents who additionally oppose immigration from different ethnic groups to 0.7. While the question about immigration of a different ethnic group suggests geographical and cultural distance, the stratificatory dimension is now missing.

Therefore, opposing immigration of a different ethnic group includes poor as well as rich individuals. Consequently, respondents have to oppose immigration in the current as well as in the previous question(s) to be assigned higher fuzzy set membership scores.[3] Using such AND-relationships between the questions, we can ensure that these respondents oppose immigration to a higher degree. The remaining set is constructed in the same manner. Respondents who oppose immigration from poor European countries (and poor non-European countries as well as from a different ethnic group) are assigned to 0.8. Opponents of immigration from rich non-European countries are assigned to 0.9, while respondents who oppose immigration even from rich European countries and from the same ethnic group are fully in the set (1.0).

We construct non-membership in the set in a similar fashion. We consider respondents to be out of the set if they answer "allow some" or "allow many" to the question, meaning they support immigration. To be assigned to the 0.4 point, respondents have to support immigration from the same ethnic group or rich European countries. Here, the intuition is also straightforward: Support for culturally and geographically proximate immigration, as well as for immigration from rich European countries, is more common compared to other forms of immigration. Respondents who also support immigration from rich, non-European countries are assigned to 0.3. Again, these respondents also have to support immigration from rich European countries as well as from the same ethnic group. Furthermore, respondents who support immigration from poor European countries are assigned to 0.2, while supporters of immigration of different ethnic groups are assigned to 0.1. Only respondents who even support immigration from poor non-European countries and a different ethnicity are fully out of the set of "opposition towards immigration".

[3] We have examined the response behaviour by constructing dummy variables of opposition towards immigration ("allow few" and "allow none") and support for immigration ("allow some" and "allow many") for all survey items and cross-tabulated them. Overall, the response behaviour is very consistent with our construction of the outcome.
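This cumulative AND-logic can be implemented by assigning scores from the least to the most demanding combination, letting each more demanding combination overwrite the previous score. A sketch under our own naming conventions, where each opp_* vector is TRUE if the respondent answers "allow few" or "allow none" to the respective item (and each sup_* vector is its negation):

```r
# Stepwise construction of the outcome set "opposition towards immigration".
# Respondents whose answers do not follow the cumulative pattern keep NA,
# flagging them for inspection (footnote 3 suggests such patterns are rare).
oppose_score <- function(opp_poor_noneu, opp_diff_eth, opp_poor_eu,
                         opp_rich_noneu, opp_rich_eu, opp_same_eth) {
  sup_poor_noneu <- !opp_poor_noneu; sup_diff_eth   <- !opp_diff_eth
  sup_poor_eu    <- !opp_poor_eu;    sup_rich_noneu <- !opp_rich_noneu
  sup_rich_eu    <- !opp_rich_eu;    sup_same_eth   <- !opp_same_eth

  out <- rep(NA_real_, length(opp_poor_noneu))

  # Non-membership side: each step adds a further supported group.
  out[sup_same_eth | sup_rich_eu] <- 0.4
  out[sup_same_eth & sup_rich_eu & sup_rich_noneu] <- 0.3
  out[sup_same_eth & sup_rich_eu & sup_rich_noneu & sup_poor_eu] <- 0.2
  out[sup_same_eth & sup_rich_eu & sup_rich_noneu & sup_poor_eu &
        sup_diff_eth] <- 0.1
  out[sup_same_eth & sup_rich_eu & sup_rich_noneu & sup_poor_eu &
        sup_diff_eth & sup_poor_noneu] <- 0.0

  # Membership side (overwrites): the AND-chain pushes scores towards 1.
  out[opp_poor_noneu] <- 0.6
  out[opp_poor_noneu & opp_diff_eth] <- 0.7
  out[opp_poor_noneu & opp_diff_eth & opp_poor_eu] <- 0.8
  out[opp_poor_noneu & opp_diff_eth & opp_poor_eu & opp_rich_noneu] <- 0.9
  out[opp_poor_noneu & opp_diff_eth & opp_poor_eu & opp_rich_noneu &
        opp_rich_eu & opp_same_eth] <- 1.0
  out
}
```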

Figure 3 shows the distribution of our outcome "opposition towards immigration". The distribution is almost perfectly U-shaped. A large part of the members of the set "opposition towards immigration" are fully in the set or assigned the score 0.9. Furthermore, respondents who are out of the set are even more homogeneous with regard to their attitudes. The vast majority receives the score 0.0.

Figure 3: Histogram of the outcome "opposition towards immigration"

To investigate the determinants of opposition towards immigration in the next section, we select five prominent arguments from the literature. First, the fuzzy set "cultural unity" accounts for identity-related arguments: Immigration threatens the national and cultural identity of natives (Sides and Citrin 2007: 479). "Cultural unity" is calibrated as an asymmetrical fuzzy set, as discussed above.

The second fuzzy set is "economic threat". According to self-interest theories, immigration leads to more competition on the labour market. As a result, natives develop hostile attitudes towards immigrants (van Oorschot and Uunk 2007).

To calibrate this condition we use different data sources. First, we capture economic risk by calculating occupation-specific unemployment rates as well as the migrant share in all occupations, using data from the ILO and OECD (based on current or previous occupations). We use the 25th, 50th and 75th percentiles for the 0, 0.5 and 1 anchor points to construct two separate fuzzy sets (for unemployment and migrant shares).[4] We then combine both sets by using the family resemblance strategy, meaning that we take the maximum score of both sets to assign respondents to our set "economic threat". While this set can account for competition for wages and promotions, the crucial part of exclusion from the labour market is still missing. Hence, we use the survey question concerning the employment status of respondents. If respondents are currently unemployed, they are recoded as fully in the set of people facing an economic threat (independently of the unemployment rates and migrant shares in their former occupational groups). As this example demonstrates, fuzzy sets allow for the creative combination of multiple data sources.

[4] Calibration with percentiles can be an adequate strategy for continuous variables. Our argument for a more sensible, set-theoretic calibration primarily applies to the ordinal-scaled survey items often used in social science research.
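A sketch of this combination, again in base R; unemp_risk and migrant_share stand for the two percentile-calibrated fuzzy sets and unemployed for the survey employment-status flag (all names and values are our own illustrations):

```r
unemp_risk    <- c(0.2, 0.7, 0.4)     # occupation-specific unemployment set
migrant_share <- c(0.9, 0.1, 0.3)     # occupation-specific migrant-share set
unemployed    <- c(FALSE, FALSE, TRUE)

# Family resemblance: logical OR of the component sets via the maximum.
THREAT <- pmax(unemp_risk, migrant_share)

# Unemployment as exclusion from the labour market: full membership.
THREAT[unemployed] <- 1.0
```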

The last three conditions are simply calibrated as crisp sets. According to the literature, education is one of the most important determinants for (or rather against) opposition towards immigration (cf. Ceobanu and Escandell 2010). We calibrate education by assigning all respondents who completed tertiary education to the set of highly educated respondents, while all other respondents are out of the set. The second condition derives from interaction theory: Respondents who have a large number of immigrants in their social network develop sympathetic attitudes towards immigration (Pettigrew 1998; Sides and Citrin 2007). Our set differentiates between respondents with no immigrant friends (full membership) and respondents with many immigrant friends (full non-membership). The third crisp set captures gender, differentiating between men (full membership) and women (full non-membership). Previous studies provide evidence that men hold more negative attitudes toward immigration (Quillian 1995).

Questionable relationships: The importance of robustness tests

Even though set-theoretic approaches like QCA seem to have clear advantages over statistical approaches in utilizing information from large-n surveys, QCA faces difficulties in inferring causality from the data analysis. From a purely technical point of view, the Truth Table Analysis should work with large-n survey data just as well as with small-n data. However, QCA is a case-oriented method, which has important consequences in a large-n setting. Going back to the cases is a crucial analytical step in QCA (Emmenegger et al. 2013). As Ragin (2000: 283) puts it, the Truth Table Analysis does not substitute for the study of cases, just as reading a detailed map is not a substitute for taking a hike in the mountains. Yet, this going back to the cases is often not possible with large-n survey data because we have only the limited information provided by the survey and no chance of contacting individual respondents afterwards. Even if we had the chance to contact individuals after a single survey, the mere number of cases would make it impossible for the researcher to manage the information (although researchers can of course take advantage of the additional information contained in a survey).

However, case orientation is crucial since QCA (including the Truth Table Analysis) does not rest on strongly formalized, automatic procedures. The calibration of the data, the setting of consistency and frequency thresholds and the selection of conditions require researchers to make qualitative decisions which influence the results. An investigation of single cases is important to validate that these decisions most adequately reflect the realities in the respective case universe. It has correctly been pointed out that the Truth Table Analysis is very sensitive to single cases and measurement error (Hug 2013; Seawright 2005). This finding, however, is not very surprising for a case-oriented method. QCA depends on the qualitative position of its cases within a set. Hence, we want our analysis to react to changes in our data. With QCA, causal inference does not rest first and foremost on the robustness of the algorithm to data manipulations. Rather, causal inferences are identified by going back to the cases (Thiem and Ragin 2013). Yet, in a large-n setting we are no longer able to provide this crucial validation step. This is a challenge for QCA that does not hold true for statistical methods. Put differently, robustness concerns are specifically relevant for large-n QCA applications. Hence, in such settings the causal interpretation of QCA results seems problematic.[5]

In recent years, the literature has suggested a number of strategies to deal with such robustness concerns if going back to the cases is not a viable option (Skaaning 2011; Maggetti and Levi-Faur 2013; Schneider and Wagemann 2013: 284ff).

[5] Of course, robustness does not imply causality. One could say that robustness is a necessary but not sufficient condition for causal inference. In practice, however, researchers typically use robustness tests to enhance their trust in the proposed causal relationship.

Of course, issues such as measurement error, sensitivity to calibration decisions and the choice of thresholds are not problems exclusive to large-n settings. Also in small- and medium-n settings, researchers have to explore the extent to which their results rest on particular decisions made in the analysis. Yet, we think that the issue of robustness is even more pronounced in large-n settings because of the loss of the case orientation. Moreover, large-n survey data is especially prone to one specific error. With survey data, measurement error is endemic (Hug 2013: 261), but we lack clear guidelines on how to deal with it. This problem is exacerbated by the fact that QCA papers are often rather weak with regard to the formulation of complex theoretical propositions in set-theoretical terms (Emmenegger et al. 2013; Hug 2013).

We now revisit some of the most prominent suggestions made to assess the robustness of QCA results and then use our large-n survey data example on opposition towards immigration to show how they can be implemented in a large-n setting. Following Skaaning (2011) and others, we examine the robustness of our findings by using different consistency and frequency thresholds and by changing the calibration. In addition, we propose a new robustness test that we believe is particularly suitable to deal with measurement error in the case of large-n samples: the random deletion of shares of cases.

Table 1 presents the truth table of our five conditions: low education [LOWEDU], preference for cultural unity [UNITY], no immigrant friends [NOFRIEND], facing an economic threat [THREAT], gender [MAN], and the outcome opposition to immigration. As can be seen from the table, the consistency values of the rows are generally rather low.[6] The truth table reveals a problem not new to survey research.

[6] Consistency levels are very similar for the negated outcome. Only the first five truth table rows have a consistency value of above

The abundance of information and the resulting heterogeneity in the response behaviour of survey participants reduce the consistency scores to a rather low level. A similar problem is often encountered in statistical research, where measures of explained variance (such as R²) usually become rather small when analysing individual-level data.

Table 1: Full truth table
Row | LOWEDU | UNITY | NOFRIEND | THREAT | MAN | Outcome | N | Consistency

For QCA, a consistency level of 0.75 is often used as a rough benchmark, even though it is emphasized that the threshold should depend on the specific research context. With a low number of cases, consistency thresholds should be higher, while a high number of cases allows for lower consistency values (Schneider and Wagemann 2013: 127f).

In small- and medium-n settings, low consistency values are often ascribed to inadequate calibration or an inadequate selection of conditions. Yet, we believe this interpretation can be disregarded in our case because we use data and conditions that have found strong support in numerous previous studies (Dustmann and Preston 2004; Sides and Citrin 2007; Finseraas 2008; Rydgren 2008; Senik et al. 2008; Herreros and Criado 2009; Meuleman et al. 2009; Emmenegger and Klemmensen 2013a, b). Hence, we are quite confident that the low consistency values are not due to a lack of association between the conditions and the outcome but rather due to data characteristics typical of large-n survey data. The two plots of Figure 4 show how strongly respondents cluster in the different corners of the property space given our two fuzzy conditions UNITY and THREAT as well as the outcome opposition to immigration. There are no clearly visible patterns. Hence, given this heterogeneity in the response behaviour, the low consistency values displayed in Table 1 are not surprising at all.

Figure 4: Kernel density plots of fuzzy sets with the outcome
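For reference, the consistency and coverage measures discussed here follow Ragin's (2008) standard formulas and can be computed directly from the membership scores; a minimal base-R sketch:

```r
# Sufficiency consistency: the share of membership in condition (set) x
# that is matched by membership in outcome y.
consistency <- function(x, y) sum(pmin(x, y)) / sum(x)

# Coverage: the share of membership in y accounted for by x.
coverage <- function(x, y) sum(pmin(x, y)) / sum(y)

# For a conjunction such as LOWEDU * UNITY * NOFRIEND, combine first:
# consistency(pmin(LOWEDU, UNITY, NOFRIEND), OPPOSE)
```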

For the following analysis, we use a consistency threshold of 0.66 for two reasons. First, there is a clear drop of consistency between the sixth and seventh row in Table 1. Second, even though this consistency threshold is rather low, a large-n setting gives us more leverage to decrease the consistency value. A consistency threshold of 0.66 could be regarded as a degree of sufficiency described by Ragin (2000: 110) as "usually sufficient".[7]

We now turn to the analysis of sufficiency by logically minimizing the truth table.[8] We set a frequency threshold of N=25. By disregarding rare combinations of conditions, we already apply a first strategy against measurement error. Table 2 presents the intermediate solution, which reports two prime implicants (PIs).[9]

[7] The 0.75 consistency threshold is a matter of convention rather than a fixed value. Schneider and Wagemann (2012: 128) highlight a number of contextual factors that should inform the choice of the consistency threshold. Most importantly, the authors emphasize the infeasibility of determining a single, universal consistency value. They therefore recommend varying the threshold to assess the sensitivity of the results.

[8] For reasons of space we do not display the results of the analysis of necessary conditions. We did not find a necessary condition in our analysis. The best result was received for LOW EDUCATION, with a consistency of 0.81. This is clearly below the recommended 0.90 consistency threshold. Also, the condition is likely to be trivial (Schneider and Wagemann's Trivialness of Necessity score is 0.45). Generally, our random deletion procedure should also work with necessity statements. Yet, we think that robustness might be less relevant there, since solutions are simple and researchers are more conservative with regard to the measures of fit. We thank Claude Rubinson for making us think about this question.

[9] For brevity, we focus on the intermediate solution, as it is the most frequently used solution type. It is also a good starting point to investigate the stability of the solutions, given that the intermediate solution is a midpoint between the parsimonious and the complex solutions. For our baseline result in Table 2, the complex and intermediate solutions are the same, while the parsimonious solution misses LOW EDUCATION in the second solution term. Generally, all the robustness checks we discuss can be applied to all three solutions.
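In the QCA package for R (Dusa and Thiem 2014), this step corresponds roughly to the sketch below. We write it against the current package interface (truthTable() and minimize(); the 2014 release used eqmcc() for the minimization), and the data frame d and the outcome name OPPOSE are our own placeholders:

```r
library(QCA)

# Truth table with the thresholds used here: consistency 0.66, frequency 25.
tt <- truthTable(d, outcome = "OPPOSE",
                 conditions = "LOWEDU, UNITY, NOFRIEND, THREAT, MAN",
                 incl.cut = 0.66, n.cut = 25)

# Intermediate solution: logical remainders are included subject to
# directional expectations (here: each condition contributes to the outcome).
sol <- minimize(tt, include = "?", dir.exp = c(1, 1, 1, 1, 1))
sol
```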

The solution consistency is rather good given the heterogeneity in the data. The two PIs from Table 2 both have good consistency values. Yet, the first PI has decisively more unique coverage. Overall, however, the solution coverage suffers from the abundance of information induced by the individual-level data. The heterogeneity in the answer patterns makes it hard to identify regularities covering a large number of cases. What we can derive from such an analysis are relatively consistent sufficiency statements for comparatively small subsets of the data sample.

Table 2: Analysis of sufficiency: Intermediate solution
Analysis of the Outcome | Consistency | Raw Coverage | Unique Coverage
1. PI: LOWEDU * UNITY * NOFRIEND
2. PI: LOWEDU * UNITY * THREAT * man
Solution
Note: Consistency threshold of 0.66; frequency threshold of N=25; cases coded as in/out: 610/1576.

By employing the conventional robustness tests, we now assess how robust these conclusions are. Table 3 displays the results. In a first step, we vary the frequency threshold. The first panel of Table 3 shows the intermediate solution with a frequency threshold of N=5 (rather than 25). This effectively leads to the inclusion of row 4 in the minimization. The second panel displays the intermediate solution with a frequency threshold of N=50, which effectively leads to the exclusion of row 5. The results show that changes to the frequency threshold only lead to minor changes in the analysis. With a frequency threshold of five, we receive a third PI that has virtually no unique coverage. Overall, the solutions remain rather stable.

Varying the frequency threshold provided us with some first useful insights about the stability of the results. From this exercise we have gained some confidence in the robustness of our solution. However, there are clear limits to the extent to which the frequency threshold can be varied (within reasonable boundaries). The strategy thus provides only limited potential to assess the robustness of our results.

Table 3: Conventional robustness tests
(columns: Consistency | Raw Coverage | Unique Coverage)

Frequency threshold = 5
1. PI: LOWEDU * UNITY * NOFRIEND
2. PI: LOWEDU * UNITY * THREAT * man
3. PI: UNITY * NOFRIEND * THREAT * MAN
Solution

Frequency threshold = 50
1. PI: LOWEDU * UNITY * NOFRIEND
2. PI: LOWEDU * UNITY * THREAT * man
Solution

Consistency threshold = 0.7
1. PI: LOWEDU * UNITY * NOFRIEND * man
2. PI: LOWEDU * UNITY * NOFRIEND * THREAT
Solution

Consistency threshold = 0.62
1. PI: LOWEDU * UNITY * man
2. PI: LOWEDU * UNITY * NOFRIEND
3. PI: LOWEDU * NOFRIEND * THREAT * MAN
Solution

New calibration of UNITY
1. PI: LOWEDU * UNITY * NOFRIEND
2. PI: LOWEDU * UNITY * THREAT * man
Solution

We now turn to the consistency threshold. In the case of large-n data, setting the consistency threshold might seem rather arbitrary. Yet, similar to the frequency threshold, the available options for reasonable thresholds are in fact rather limited. On the one hand, the consistency threshold should be clearly distinguishable from 0.5 and should deliver consistent results. On the other hand, it should also allow covering a substantial number of cases in the analysis.[10] That said, the results of the Truth Table Analysis should not be affected by slight variations of the consistency threshold. The third and fourth panels of Table 3 present the intermediate solutions with consistency thresholds of 0.7 and 0.62, respectively (instead of 0.66). With the higher consistency value we observe important changes. The first PI of the original solution is now joined by man, while the second PI now contains NOFRIEND instead of man. This would change the substantive conclusions we draw from the analysis. Also, unique coverage now assigns more importance to the second PI. The results also change substantively if we lower the consistency value to 0.62. Here, the second PI is identical to the first PI in the original solution. However, the other two PIs are different. Unique coverage is now low for all three paths. In addition, solution coverage increased while consistency slightly decreased. From varying the consistency threshold we conclude that the results are not entirely robust. Even though we identify parallels between the solutions, the changes are likely to challenge the substantive interpretations we draw from the solutions.

[10] This trade-off between consistency and coverage is not unique to large-n QCA but a general characteristic of set-theoretic methods (Ragin 2008: 55; Schneider and Wagemann 2013: 148ff).

As a third robustness test, we now change the calibration of our condition UNITY. Taking seriously our suggestions about theoretically informed calibration from the first part of this article, we want to highlight that the usefulness of this strategy is limited as well, because the set membership scores are determined by theoretical reasoning. Yet, for UNITY, for instance, the set membership values of 0.8 ("somewhat agree") and 0.2 ("neither agree nor disagree") are assigned rather arbitrarily. For the purpose of robustness we change these values to 0.7 and 0.3, respectively. The last panel in Table 3 shows that these calibration changes do not affect our findings. One interesting consequence of this exercise is that it shows that one should not put too much meaning into fuzzy set scores. As long as scores remain on the same side of the point of maximum ambiguity, the results of the Truth Table Analysis seem to be rather robust. This is important to note for scholars criticising the often unavoidable degree of arbitrariness in specific fuzzy set scores.

So far we have implemented the state-of-the-art strategies to assess the robustness of QCA solutions. The findings, however, are somewhat inconclusive. Solutions change in response to our tests, yet we could also clearly identify patterns of stability. What is more, our implementation of these robustness tests shows that the possibilities for reasonable readjustments to the specifications are in fact quite limited.[11] Especially with regard to the frequency threshold, we often have limited possibilities to investigate reasonable alternative specifications. Yet, the frequency threshold is thought to be especially useful to address case-sensitivity and measurement error in large-n QCA (Ragin 2008: 133).

[11] This is not bad news, as it implies that in QCA researchers should often converge on the same choices with regard to calibration and thresholds even in the absence of objective criteria.

This type of error can be referred to as random error and is typically considered to be pervasive in large-n survey datasets (Maggetti and Levi-Faur 2013: 202). Hence, we are particularly dissatisfied with our robustness tests regarding random error. Therefore, we use the remainder of this paper to propose a new strategy for assessing random measurement error in large-n QCA.

Our strategy takes advantage of the possibilities offered by large-n data with a random data-generating process. This setting allows us to randomly delete a proportion of cases from the data set and re-run the minimization. We think it is a useful endeavour to assess how sensitive a solution is to the random deletion of cases for two reasons. First, this strategy allows us to investigate how much a solution depends on single cases or small groups of cases. Second, by reducing the sample size through random deletion we can model the effect of randomly missing data. This is a crucial extension of our robustness tests because the survey data we are using is very likely to suffer from random error, and the previous strategies to account for it (e.g. the frequency threshold) provide only a limited test.

However, two caveats have to be mentioned prior to the implementation of this strategy. First, random deletion does not just model random missing data; it also changes limited diversity and the relative position of the consistency threshold.[12] These are all important parameters to vary for the purpose of robustness, which makes this strategy even more interesting. Yet we cannot clearly disentangle them with our method. Second, the random deletion of cases does not tackle a number of other error sources, such as systematic error or conditioning error (Maggetti and Levi-Faur 2013).

[12] We thank Barry Cooper and Judith Glaesser for this remark.

Using the QCA package for R (Dusa and Thiem 2014), we simulate a large number of subsamples from the original data and assess the stability of the minimization results over these subsamples. We randomly simulate subsamples containing only 90 per cent of the observations (N=2091). Effectively, we randomly delete 10 per cent of the observations from the sample. Step by step, the simulation resamples, re-runs the QCA minimization, saves the results and starts over again (in total 999 times). This strategy has some similarities with Hug's (2013) Monte Carlo simulations that drop single cases from the QCA analysis. Yet, that strategy was implemented with a small-n dataset. As Ragin and Thiem (2013) show, this is a questionable strategy for data that is not generated at random. Accordingly, it is not surprising that a case-oriented method using macro-comparative data is case-sensitive. However, in our large-n survey data framework, a random deletion of cases is useful because respondents were randomly sampled.

Our simulation procedure results in 999 QCA solutions. How can we now assess robustness using these results? Potentially, researchers might be interested in a number of the solution terms' characteristics, such as the type and number of prime implicants (PIs) or consistency and coverage scores. In a first step we propose an easy-to-implement strategy that allows researchers to identify potentially important configurations not reported by the original solution due to random error. Rather intuitively, a solution seems to be robust if it is composed of the same PIs over most of the simulated results. Hence, we simply count the number of times a PI appears in the 999 simulated solutions. Evidence for robustness is created if the most frequent PIs are the ones reported in the original solution. Ideally, we would like to see that the simulations do not report any PIs different from the ones we have in the original solution. Robustness suffers if we receive a great number of PIs which appear in the solutions at similar rates, or if we have new, different PIs as the most frequent ones.
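A sketch of this resampling procedure, building on the truth-table code above. The helper get_pis() is our own, and the slot that holds the solution terms differs across versions of the QCA package, so treat the extraction line as an assumption to be checked against the installed version:

```r
set.seed(2014)   # arbitrary seed for reproducibility
n_sims <- 999

# Hypothetical helper: pull the prime-implicant labels out of a minimize()
# result; adjust the slot name to your QCA package version.
get_pis <- function(sol) unique(unlist(sol$solution))

pi_list <- vector("list", n_sims)
for (i in seq_len(n_sims)) {
  sub <- d[sample(nrow(d), size = floor(0.9 * nrow(d))), ]   # delete 10%
  tt_i <- truthTable(sub, outcome = "OPPOSE",
                     conditions = "LOWEDU, UNITY, NOFRIEND, THREAT, MAN",
                     incl.cut = 0.66, n.cut = 25)
  sol_i <- tryCatch(minimize(tt_i, include = "?", dir.exp = c(1, 1, 1, 1, 1)),
                    error = function(e) NULL)   # a draw may yield no solution
  if (!is.null(sol_i)) pi_list[[i]] <- get_pis(sol_i)
}

# PI frequencies across the simulated solutions (the basis for Figures 5-6).
sort(table(unlist(pi_list)), decreasing = TRUE)
```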

Figure 5 presents the frequency of all PIs appearing in the simulated solutions using the specifications from our original result in Table 2. The figure shows that our result is very stable against the random deletion of cases. The first PI from Table 2 appears in 908 of the 999 simulated solutions; the second PI 705 times. The third and fourth PIs appear 91 and 89 times and are subsets of the original solutions. The remaining PIs are new, but appear very rarely.

Figure 5: Frequency of PIs over simulated solutions (consistency threshold = 0.66)

While our original solution seems to be rather robust against random missing data and measurement error, Figure 6 presents an example of a less robust solution. Here, we simulated the results for the intermediate solution with the consistency threshold of 0.62.

Reflecting this lack of robustness, Figure 6 displays a comparatively large number of PIs, with some frequently occurring PIs, such as LOWEDU*UNITY*man (appearing nearly 500 times in Figure 6), not being present in the original solution displayed in Table 3.

Figure 6: Frequency of PIs over simulated solutions (consistency threshold = 0.62)

Our simulation-based approach also allows for an analysis of the effect of random error on the distribution of the PIs' measures of fit. Table 4 presents means, medians and standard deviations of the consistency, raw coverage and unique coverage scores of the PI LOWEDU*UNITY*NOFRIEND, which was part of both the original solution (see Table 2) and the solution with the lowered consistency threshold (see Table 3). The first panel is based on the simulation results from Figure 5 (consistency threshold 0.66) and the second panel is based on the results from Figure 6 (consistency threshold 0.62). The measures of fit are all normally distributed and, except for one, have low standard deviations. From this we can infer that our consistency and coverage values from Table 2 are rather robust against random error (see first panel of Table 4). Yet, the decreased stability of the solution using the 0.62 consistency threshold


Constructing Kirq, software for set-theoretic social research: A software development travelogue Constructing Kirq, software for set-theoretic social research: A software development travelogue Claude Rubinson University of Houston Downtown rubinsonc@uhd.edu cjr@grundrisse.org http://grundrisse.org

More information

Price Dispersion. Ed Hopkins Economics University of Edinburgh Edinburgh EH8 9JY, UK. November, 2006. Abstract

Price Dispersion. Ed Hopkins Economics University of Edinburgh Edinburgh EH8 9JY, UK. November, 2006. Abstract Price Dispersion Ed Hopkins Economics University of Edinburgh Edinburgh EH8 9JY, UK November, 2006 Abstract A brief survey of the economics of price dispersion, written for the New Palgrave Dictionary

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

QCA: A Package for Qualitative Comparative Analysis by Alrik Thiem and Adrian Duşa

QCA: A Package for Qualitative Comparative Analysis by Alrik Thiem and Adrian Duşa CONTRIBUTED RESEARCH ARTICLES 87 QCA: A Package for Qualitative Comparative Analysis by Alrik Thiem and Adrian Duşa Abstract We present QCA, a package for performing Qualitative Comparative Analysis (QCA).

More information

Basic Concepts in Research and Data Analysis

Basic Concepts in Research and Data Analysis Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the

More information

Segmentation: Foundation of Marketing Strategy

Segmentation: Foundation of Marketing Strategy Gelb Consulting Group, Inc. 1011 Highway 6 South P + 281.759.3600 Suite 120 F + 281.759.3607 Houston, Texas 77077 www.gelbconsulting.com An Endeavor Management Company Overview One purpose of marketing

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Persuasion by Cheap Talk - Online Appendix

Persuasion by Cheap Talk - Online Appendix Persuasion by Cheap Talk - Online Appendix By ARCHISHMAN CHAKRABORTY AND RICK HARBAUGH Online appendix to Persuasion by Cheap Talk, American Economic Review Our results in the main text concern the case

More information

How To Teach Social Science To A Class

How To Teach Social Science To A Class Date submitted: 18/06/2010 Using Web-based Software to Promote Data Literacy in a Large Enrollment Undergraduate Course Harrison Dekker UC Berkeley Libraries Berkeley, California, USA Meeting: 86. Social

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Intercoder reliability for qualitative research

Intercoder reliability for qualitative research Intercoder reliability for qualitative research You win some, but do you lose some as well? TRAIL Research School, October 2012 Authors Niek Mouter, MSc and Diana Vonk Noordegraaf, MSc Faculty of Technology,

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

Masters in Public Policy and Social Change (MAPS), 2012-2013

Masters in Public Policy and Social Change (MAPS), 2012-2013 Applied Comparative Methods Masters in Public Policy and Social Change (MAPS), 2012-2013 Dr. Ciara O Dwyer (ciara.odwyer@carloalberto.org) Aims and Objectives This course provides an introduction to the

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

An Empirical Analysis of Insider Rates vs. Outsider Rates in Bank Lending

An Empirical Analysis of Insider Rates vs. Outsider Rates in Bank Lending An Empirical Analysis of Insider Rates vs. Outsider Rates in Bank Lending Lamont Black* Indiana University Federal Reserve Board of Governors November 2006 ABSTRACT: This paper analyzes empirically the

More information

Inequality, Mobility and Income Distribution Comparisons

Inequality, Mobility and Income Distribution Comparisons Fiscal Studies (1997) vol. 18, no. 3, pp. 93 30 Inequality, Mobility and Income Distribution Comparisons JOHN CREEDY * Abstract his paper examines the relationship between the cross-sectional and lifetime

More information

Problem of the Month Through the Grapevine

Problem of the Month Through the Grapevine The Problems of the Month (POM) are used in a variety of ways to promote problem solving and to foster the first standard of mathematical practice from the Common Core State Standards: Make sense of problems

More information

Economic impacts of immigration to the UK

Economic impacts of immigration to the UK Economics: MW 235 Summary The impact of immigration into the UK on GDP per head a key measure of prosperity - is essentially negligible. There is tentative evidence to show that immigration of non-eu workers

More information

Broad and Integrative Knowledge. Applied and Collaborative Learning. Civic and Global Learning

Broad and Integrative Knowledge. Applied and Collaborative Learning. Civic and Global Learning 1 2 3 4 5 Specialized Knowledge Broad and Integrative Knowledge Intellectual Skills Applied and Collaborative Learning Civic and Global Learning The Degree Qualifications Profile (DQP) provides a baseline

More information

Evaluation of the nation wide Integration Courses

Evaluation of the nation wide Integration Courses Rambøll Management Federal Ministry of the Interior Evaluation of the nation wide Integration Courses Executive Summary February 2007 Federal Ministry of the Interior Evaluation of the nation wide Integration

More information

Basel Committee on Banking Supervision. Working Paper No. 17

Basel Committee on Banking Supervision. Working Paper No. 17 Basel Committee on Banking Supervision Working Paper No. 17 Vendor models for credit risk measurement and management Observations from a review of selected models February 2010 The Working Papers of the

More information

Credit Card Market Study Interim Report: Annex 4 Switching Analysis

Credit Card Market Study Interim Report: Annex 4 Switching Analysis MS14/6.2: Annex 4 Market Study Interim Report: Annex 4 November 2015 This annex describes data analysis we carried out to improve our understanding of switching and shopping around behaviour in the UK

More information

Solution-Focused Rating (SFR): New Ways in Performance Appraisal

Solution-Focused Rating (SFR): New Ways in Performance Appraisal Prof. Dr. Günter Lueger Solution-Focused Rating (SFR): New Ways in Performance Appraisal Introduction Keywords: performance appraisal, appraisal interview, assessment, assessment centre, evaluation, solution-focused

More information

A Better Statistical Method for A/B Testing in Marketing Campaigns

A Better Statistical Method for A/B Testing in Marketing Campaigns A Better Statistical Method for A/B Testing in Marketing Campaigns Scott Burk Marketers are always looking for an advantage, a way to win customers, improve market share, profitability and demonstrate

More information

IAB Evaluation Study of Methods Used to Assess the Effectiveness of Advertising on the Internet

IAB Evaluation Study of Methods Used to Assess the Effectiveness of Advertising on the Internet IAB Evaluation Study of Methods Used to Assess the Effectiveness of Advertising on the Internet ARF Research Quality Council Paul J. Lavrakas, Ph.D. November 15, 2010 IAB Study of IAE The effectiveness

More information

How to Get More Value from Your Survey Data

How to Get More Value from Your Survey Data Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................2

More information

Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

More information

Predictive Coding Defensibility

Predictive Coding Defensibility Predictive Coding Defensibility Who should read this paper The Veritas ediscovery Platform facilitates a quality control workflow that incorporates statistically sound sampling practices developed in conjunction

More information

A survey of public attitudes towards conveyancing services, conducted on behalf of:

A survey of public attitudes towards conveyancing services, conducted on behalf of: A survey of public attitudes towards conveyancing services, conducted on behalf of: February 2009 CONTENTS Methodology 4 Executive summary 6 Part 1: your experience 8 Q1 Have you used a solicitor for conveyancing

More information

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Presented by Work done with Roland Bürgi and Roger Iles New Views on Extreme Events: Coupled Networks, Dragon

More information

Institute for Communication Management, Währinger Gürtel 97, 1180 Vienna, Austria

Institute for Communication Management, Währinger Gürtel 97, 1180 Vienna, Austria ON THE ROLE AND RELEVANCE OF AGENCIES IN INTEGRATED MARKETING COMMUNICATIONS: A quantitative study assessing which contributions to an integrated corporate image large Austrian enterprises require from

More information

INTERNATIONAL FRAMEWORK FOR ASSURANCE ENGAGEMENTS CONTENTS

INTERNATIONAL FRAMEWORK FOR ASSURANCE ENGAGEMENTS CONTENTS INTERNATIONAL FOR ASSURANCE ENGAGEMENTS (Effective for assurance reports issued on or after January 1, 2005) CONTENTS Paragraph Introduction... 1 6 Definition and Objective of an Assurance Engagement...

More information

Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs

Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs Andrew Gelman Guido Imbens 2 Aug 2014 Abstract It is common in regression discontinuity analysis to control for high order

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

A Procedure for Classifying New Respondents into Existing Segments Using Maximum Difference Scaling

A Procedure for Classifying New Respondents into Existing Segments Using Maximum Difference Scaling A Procedure for Classifying New Respondents into Existing Segments Using Maximum Difference Scaling Background Bryan Orme and Rich Johnson, Sawtooth Software March, 2009 Market segmentation is pervasive

More information

C. Wohlin, "Is Prior Knowledge of a Programming Language Important for Software Quality?", Proceedings 1st International Symposium on Empirical

C. Wohlin, Is Prior Knowledge of a Programming Language Important for Software Quality?, Proceedings 1st International Symposium on Empirical C. Wohlin, "Is Prior Knowledge of a Programming Language Important for Software Quality?", Proceedings 1st International Symposium on Empirical Software Engineering, pp. 27-36, Nara, Japan, October 2002.

More information

What Is a Case Study? series of related events) which the analyst believes exhibits (or exhibit) the operation of

What Is a Case Study? series of related events) which the analyst believes exhibits (or exhibit) the operation of What Is a Case Study? Mitchell (1983) defined a case study as a detailed examination of an event (or series of related events) which the analyst believes exhibits (or exhibit) the operation of some identified

More information

Experimental methods. Elisabeth Ahlsén Linguistic Methods Course

Experimental methods. Elisabeth Ahlsén Linguistic Methods Course Experimental methods Elisabeth Ahlsén Linguistic Methods Course Experiment Method for empirical investigation of question or hypothesis 2 types a) Lab experiment b) Naturalistic experiment Question ->

More information

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow

Predictive Coding Defensibility and the Transparent Predictive Coding Workflow Predictive Coding Defensibility and the Transparent Predictive Coding Workflow Who should read this paper Predictive coding is one of the most promising technologies to reduce the high cost of review by

More information

MA in Sociology. Assessment Plan*

MA in Sociology. Assessment Plan* MA in Sociology Assessment Plan* Submitted by The Graduate Assessment Committee: November, 2008 Sharon K. Araji, Chair Submitted to The Dean of the College of Liberal Arts and Sciences UC Denver * The

More information

Marketing Funnels integrated into your Customer Journey Maps: A Winning Combination

Marketing Funnels integrated into your Customer Journey Maps: A Winning Combination B2C Marketing Management Marketing Funnels integrated into your Customer Journey Maps: A Winning Combination On most websites the most common path is usually followed by less than five percent of visitors,

More information

INTERNATIONAL COMPARISONS OF PART-TIME WORK

INTERNATIONAL COMPARISONS OF PART-TIME WORK OECD Economic Studies No. 29, 1997/II INTERNATIONAL COMPARISONS OF PART-TIME WORK Georges Lemaitre, Pascal Marianna and Alois van Bastelaer TABLE OF CONTENTS Introduction... 140 International definitions

More information

FINDINGS OF THE CALIFORNIA SENATE BASELINE SURVEY

FINDINGS OF THE CALIFORNIA SENATE BASELINE SURVEY FINDINGS OF THE CALIFORNIA SENATE BASELINE SURVEY Jerald G. Schutte Professor, Department of Sociology Director, Center for Survey Research California State University, Northridge Faculty Fellows Program

More information

Dualization and crisis. David Rueda

Dualization and crisis. David Rueda Dualization and crisis David Rueda The economic crises of the 20 th Century (from the Great Depression to the recessions of the 1970s) were met with significant increases in compensation and protection

More information

Selecting Research Participants

Selecting Research Participants C H A P T E R 6 Selecting Research Participants OBJECTIVES After studying this chapter, students should be able to Define the term sampling frame Describe the difference between random sampling and random

More information

Successful International B2B Pricing for Manufacturers of Consumer Goods

Successful International B2B Pricing for Manufacturers of Consumer Goods Successful International B2B Pricing for Manufacturers of Consumer Goods Figure 1: Definition of action areas Cluster criteria: Geographical proximity Legal barriers/free trade agreements Similarity in

More information

An Improved Measure of Risk Aversion

An Improved Measure of Risk Aversion Sherman D. Hanna 1 and Suzanne Lindamood 2 An Improved Measure of Risk Aversion This study investigates financial risk aversion using an improved measure based on income gambles and rigorously related

More information

National assessment of foreign languages in Sweden

National assessment of foreign languages in Sweden National assessment of foreign languages in Sweden Gudrun Erickson University of Gothenburg, Sweden Gudrun.Erickson@ped.gu.se This text was originally published in 2004. Revisions and additions were made

More information

Chapter 3 Local Marketing in Practice

Chapter 3 Local Marketing in Practice Chapter 3 Local Marketing in Practice 3.1 Introduction In this chapter, we examine how local marketing is applied in Dutch supermarkets. We describe the research design in Section 3.1 and present the results

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

More information

Summary In the introduction of this dissertation, three main research questions were posed. The first question was: how do physical, economic, cultural and institutional distance act as barriers to international

More information

A HOW-TO GUIDE: LABOUR MARKET AND UNEMPLOYMENT STATISTICS

A HOW-TO GUIDE: LABOUR MARKET AND UNEMPLOYMENT STATISTICS A HOW-TO GUIDE: LABOUR MARKET AND UNEMPLOYMENT STATISTICS By Jim Stanford Canadian Centre for Policy Alternatives, 2008 Non-commercial use and reproduction, with appropriate citation, is authorized. This

More information

An Evaluation of Cinema Advertising Effectiveness

An Evaluation of Cinema Advertising Effectiveness An Evaluation of Cinema Advertising Effectiveness Jason Dunnett and Janet Hoek The purpose of this study was to explore the effectiveness of cinema advertising. Specifically: To quantify the proportion

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

What mathematical optimization can, and cannot, do for biologists. Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL

What mathematical optimization can, and cannot, do for biologists. Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL What mathematical optimization can, and cannot, do for biologists Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL Introduction There is no shortage of literature about the

More information

Veto Players and Electoral Reform in Belgium. West European Politics, 34(3), 626-643. [Impact Factor 1.422; Taylor and Francis 2011] S U M M A R Y

Veto Players and Electoral Reform in Belgium. West European Politics, 34(3), 626-643. [Impact Factor 1.422; Taylor and Francis 2011] S U M M A R Y Marc Hooghe & Kris Deschouwer 2011 Veto Players and Electoral Reform in Belgium. West European Politics, 34(3), 626-643. [Impact Factor 1.422; Taylor and Francis 2011] S U M M A R Y Abstract During the

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Supplement to Call Centers with Delay Information: Models and Insights

Supplement to Call Centers with Delay Information: Models and Insights Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290

More information

Level 1 Articulated Plan: The plan has established the mission, vision, goals, actions, and key

Level 1 Articulated Plan: The plan has established the mission, vision, goals, actions, and key S e s s i o n 2 S t r a t e g i c M a n a g e m e n t 1 Session 2 1.4 Levels of Strategic Planning After you ve decided that strategic management is the right tool for your organization, clarifying what

More information

Copyrighted material SUMMARY

Copyrighted material SUMMARY Source: E.C.A. Kaarsemaker (2006). Employee ownership and human resource management: a theoretical and empirical treatise with a digression on the Dutch context. Doctoral Dissertation, Radboud University

More information

Problems often have a certain amount of uncertainty, possibly due to: Incompleteness of information about the environment,

Problems often have a certain amount of uncertainty, possibly due to: Incompleteness of information about the environment, Uncertainty Problems often have a certain amount of uncertainty, possibly due to: Incompleteness of information about the environment, E.g., loss of sensory information such as vision Incorrectness in

More information

MEMORANDUM. RE: MPA Program Capstone Assessment Results - CY 2003 & 2004

MEMORANDUM. RE: MPA Program Capstone Assessment Results - CY 2003 & 2004 MEMORANDUM TO: CC: FROM: MPA Program Faculty Mark Rosentraub, Dean Dennis Keating, Associate Dean Vera Vogelsang-Coombs Associate Professor & Director DATE: May 29, 2005 RE: MPA Program Capstone Assessment

More information

Harvard College Program in General Education Faculty of Arts and Sciences Harvard University. A Guide to Writing in Ethical Reasoning 15

Harvard College Program in General Education Faculty of Arts and Sciences Harvard University. A Guide to Writing in Ethical Reasoning 15 Harvard College Program in General Education Faculty of Arts and Sciences Harvard University A Guide to Writing in Ethical Reasoning 15 A Guide to Writing in Ethical Reasoning 15 Professor Jay M. Harris

More information

Getting the Most from Demographics: Things to Consider for Powerful Market Analysis

Getting the Most from Demographics: Things to Consider for Powerful Market Analysis Getting the Most from Demographics: Things to Consider for Powerful Market Analysis Charles J. Schwartz Principal, Intelligent Analytical Services Demographic analysis has become a fact of life in market

More information

Holger Sommerfeld: Developing a new management approach by combining risk management and controlling as a change management process

Holger Sommerfeld: Developing a new management approach by combining risk management and controlling as a change management process Holger Sommerfeld: Developing a new management approach by combining risk management and controlling as a change management process 0. Reasoning Why? Following the period after Lehman s collapse a lot

More information

Consulting projects: What really matters

Consulting projects: What really matters Consulting projects: What really matters The factors that influence the success of management consulting projects Case 138: het 'Zwijsen future proof' project met de inzet van GEA Results PhD 2014, Bart

More information

Measurement Information Model

Measurement Information Model mcgarry02.qxd 9/7/01 1:27 PM Page 13 2 Information Model This chapter describes one of the fundamental measurement concepts of Practical Software, the Information Model. The Information Model provides

More information

Labor Economics, 14.661. Lecture 3: Education, Selection, and Signaling

Labor Economics, 14.661. Lecture 3: Education, Selection, and Signaling Labor Economics, 14.661. Lecture 3: Education, Selection, and Signaling Daron Acemoglu MIT November 3, 2011. Daron Acemoglu (MIT) Education, Selection, and Signaling November 3, 2011. 1 / 31 Introduction

More information

General remarks. Page 1 of 6

General remarks. Page 1 of 6 Frankfurt am Main, 14. April 2010 Sophie Ahlswede Deutsche Bank AG/DB Research P.O. Box 60262 Frankfurt, Germany e-mail: sophie.ahlswede@db.com Tel. +49 (0)69 910 31832 Deutsche Bank response to the public

More information