Birds of the Same Feather Tweet Together. Bayesian Ideal Point Estimation Using Twitter Data *

Pablo Barberá
Forthcoming in Political Analysis

Abstract

Politicians and citizens increasingly engage in political conversations on social media outlets such as Twitter. In this paper I show that the structure of the social networks in which they are embedded can be a source of information about their ideological positions. Under the assumption that social networks are homophilic, I develop a Bayesian Spatial Following model that considers ideology as a latent variable, whose value can be inferred by examining which political actors each user is following. This method allows us to estimate ideology for more actors than any existing alternative, at any point in time and across many polities. I apply this method to estimate ideal points for a large sample of both elite and mass public Twitter users in the US and five European countries. The estimated positions of legislators and political parties replicate conventional measures of ideology. The method is also able to successfully classify individuals who state their political preferences publicly and a sample of users matched with their party registration records. To illustrate the potential contribution of these estimates, I examine the extent to which online behavior during the 2012 US presidential election campaign is clustered along ideological lines.

* I would like to thank Jonathan Nagler, Joshua Tucker, Nick Beauchamp, Neal Beck, Ken Benoit, Richard Bonneau, Patrick Egan, Adam Harris, John Jost, Franziska Keller, Michael Laver, Alan Potter, Gaurav Sood, Chris Tausanovitch, Shana Warren, and two anonymous reviewers for helpful comments and discussions. The present work has been supported by the National Science Foundation (Award # ) and the La Caixa Fellowship Program.

1 Introduction

Measuring politicians' and voters' policy positions is a relevant, yet complex, scientific endeavor. Studies of electoral behavior, government formation, and party competition require systematic information on the placement of key political actors and voters on the relevant policy dimensions. The development of methods to estimate such positions, usually in a single latent dimension characterized as ideology (Poole and Rosenthal, 2007; Clinton, Jackman and Rivers, 2004; Shor, Berry and McCarty, 2010; Bonica, 2013b; Jessee, 2009), represents one of the most important methodological contributions to political science in the past two decades.

However, most studies estimate ideal points for legislators only. When the analysis also includes voters, it is done at the expense of strong bridging assumptions (Jessee, 2009), or only for self-selected population groups (Bonica, 2014). There is also little work on cross-national ideological estimation (Lo, Proksch and Gschwend, 2013). Most importantly, given the sparse nature of the data (roll-call votes or contributions) and its costly collection (survey data), current measurement methods generate ideal points that are essentially static in the short run.

In this paper I show that using Twitter networks as a source of information about policy positions has the potential to solve these difficulties. Twitter has become one of the most important communication arenas in daily politics. Initially conceived as a website to share personal status updates, it now has more than 200 million monthly active users worldwide,[1] including 18% of all online Americans.[2] One distinct characteristic of this online social network is the presence of not only ordinary citizens, but also political actors. Virtually every legislator, political party, and candidate in developed democracies has an active Twitter account.
Independent of their offline identities, they all interact within the same symbolic framework, using similar language in messages of identical length. Most importantly, they are embedded in a common social network. This opens the possibility of estimating ideological positions of all users on a common scale, which would allow for meaningful comparisons of voters' and legislators' ideal points.

[1] Source: Twitter's Official Twitter Account, December 18, [link]
[2] Source: The Pew Research Center's Internet & American Life Project, August [link]

The use of Twitter data presents three additional advantages over other sources of information about preferences. First, the large number of active users on this social networking site can be exploited to estimate highly precise ideal points for politicians, if we consider users as experts who are rating elites through their decisions of whom to follow. Second, the structure of this network is far from static, which can facilitate the estimation of highly granular dynamic ideal points in real time. Third, it is possible to link Twitter profiles to other data through name identification, which provides interesting ways to examine differences between private and public political behavior.

This series of advantages comes at the expense of one important limitation: Twitter users are not a representative sample of the voting age population. This can represent a difficulty in the context of studies about mass attitudes and behavior, but not for the method I present in this paper. Citizens who discuss politics on Twitter are more likely to be educated and politically interested, and that makes them a particularly useful source of information about elites' ideology.

This method relies on the characteristics of the social ties that Twitter users develop with each other and, in particular, with the political actors (politicians, think tanks, news outlets, and others) they decide to follow. I argue that valid policy positions for ordinary users and political actors can be inferred from the structure of the following links across these two sets of Twitter users. The decision to follow is considered a costly signal that provides information about Twitter users' perceptions of both their ideological location and that of political accounts.
Unlike other studies that estimate political ideology using social media data (Conover et al., 2010; King, Orlando and Sparks, 2011; Boutet et al., 2012), I am able to estimate ideal points, with standard errors, on a continuous scale, for all types of active Twitter users, across different countries.

To validate the method, I estimate the ideological positions of legislators, political parties, and a large sample of active users in the US and five European countries. Their estimated ideal points replicate conventional measures of ideology. This method represents an additional measurement tool that can be used to estimate ideology, an important quantity of interest in political science, for a larger set of political actors and individuals than any previous method. To illustrate a potential use of these estimates, I examine the extent to which online behavior during the 2012 US presidential election campaign is clustered

along ideological lines, finding support for the so-called echo-chamber theory and high levels of political polarization at the mass level.

2 Ideal Point Estimation Using Twitter Data

2.1 Previous Studies

There is a limited but increasing literature on the measurement of users' attributes in social media, particularly in the field of computer science. Despite ideology being one of the key predictors of political behavior, its measurement through social media data has only been examined in a handful of studies.[3] These studies have relied on three different sources of information to infer Twitter users' ideology.

First, Conover et al. (2010) focus on the structure of the conversation on Twitter: who replies to whom, and who retweets whose messages. Using a community detection algorithm, they find two segregated political communities in the US, which they identify as Democrats and Republicans. Second, Boutet et al. (2012) argue that the number of tweets referring to a British political party sent by each user before the 2010 elections is a good predictor of his or her party identification.

However, Pennacchiotti and Popescu (2011) and Al Zamal, Liu and Ruths (2012) have found that the inference accuracy of these two sources of information is outperformed by a machine learning algorithm based on a user's social network properties. In particular, their results show that the network of friends (whom each individual follows on Twitter) allows us to infer political orientation even in the absence of any information about the user. Similarly, the only political science study (to my knowledge) that aims at measuring ideology (King, Orlando and Sparks, 2011) uses this type of information. These authors apply a data-reduction technique to the complete network of followers of the U.S.
Congress, and find that their estimates of the ideology of its members are highly correlated

[3] Ideology is defined here as the main policy dimension that articulates political competition: "a line whose left end is understood to reflect an extremely liberal position and whose right end corresponds to extreme conservatism" (Bafumi et al., 2005, p.171). Each individual's ideal point or policy preference corresponds to their position on this scale. See also Poole and Rosenthal (1997, 2007).

with estimates based on roll-call votes.

From a theoretical perspective, the use of network properties to measure ideology has several advantages in comparison to the alternatives. Text-based measures need to solve the potentially severe problem of disambiguation caused by contractions designed to fit the 140-character limit, and are vulnerable to the phenomenon of content injection. As Conover et al. (2010) show, hashtags are often used incorrectly for political reasons: politically-motivated individuals often annotate content with hashtags whose primary audience would not likely choose to see such information ahead of time. This reduces the efficiency of this measure and results in bias if content injection is more frequent on one side of the political spectrum. Similarly, conversation analysis is sensitive to two common situations: the use of retweets for ironic purposes, and replies whose purpose is to criticize or debate with another user. As a result, it is hard to characterize the emerging communities, and whether they overlap with the ideological composition of the electorate, or even if they are stable over time.

In conclusion, a critical reading of the literature suggests the need to develop new, network-based measures of political orientation. It is also necessary to improve the existing statistical methods that have been applied. Pennacchiotti and Popescu (2011) and Al Zamal, Liu and Ruths (2012) focus only on classifying users, but most political science applications require a continuous measure of ideology. In order to draw correct inferences, it is also important to indicate the uncertainty of the estimates. Without such measures of uncertainty, it is not possible to make inferences about the rank-ordering of the estimates, for example. Most importantly, none of these studies explores the possibility of placing ordinary citizens and legislators on a common scale, or whether this method would generate valid ideology estimates outside of the US context.
These three limitations of the existing studies justify the need to develop a new method that can provide reliable and valid estimates (and standard errors) of Twitter users' ideology on a continuous scale. That is precisely the main contribution of this paper.

2.2 Assumptions

In this paper I demonstrate that valid ideal point estimates of individual Twitter users and political actors with a Twitter account can be derived from the structure of the following links between these two sets of users. In order to do so, I develop a Bayesian spatial model of Twitter users' following behavior. The key assumption of this model is that Twitter users prefer to follow politicians whose positions on the latent ideological dimension are similar to theirs. This assumption is equivalent to that of spatial voting models (see e.g. Enelow and Hinich, 1984).

I consider following decisions to be costly signals about users' perceptions of both their ideological location and that of political accounts. Such cost can take two forms. First, if the content of the messages users are exposed to as a result of their following decisions challenges their political views, it can create cognitive dissonance. Second, given the fast-paced nature of Twitter, following also creates opportunity costs, since it reduces the likelihood of being exposed to other messages, assuming the amount of time a user spends on Twitter is constant. In other words, these decisions provide information about how social media users decide to allocate a scarce resource: their attention. While following is obviously less costly than campaign contributions or votes in a legislature, the assumption behind this model is similar in nature to that justifying how donations and roll-call votes can be scaled onto a latent ideological dimension (Bonica, 2014; Poole and Rosenthal, 2007).

Two additional arguments support the notion that following decisions can be informative about ideology. First, the vast body of research about homophily in personal interactions can easily be extended to online social networks such as Twitter. As McPherson, Smith-Lovin and Cook (2001) theorize, individuals tend to be embedded in homogeneous networks with regard to many sociodemographic and behavioral traits.
Multiple studies have observed patterns of homophilic segregation consistent with these models in networks of interactions between Twitter users (Wu et al., 2011; Conover et al., 2012).

However, Twitter is not only an online social network; it is also a news medium (Kwak et al., 2010). From this perspective, we should also consider the existing literature on selective exposure theory (Lazarsfeld, Berelson and Gaudet, 1944; Bryant and Miron, 2004; Stroud, 2008; Iyengar and Hahn, 2009), which argues that individuals exhibit a preference for opinion-reinforcing political information and that they systematically avoid opinion challenges. Given the dynamic nature of social media, its large size, and individuals' finite ability to process incoming information (Oken Hodas and Lerman, 2012), we should expect Twitter users to maximize the value of their online experience by choosing to follow political actors who can provide information that is of higher value to them.

2.3 The Statistical Model

The statistical model I employ is similar in nature to latent space models applied to social networks (Hoff, Raftery and Handcock, 2002), item-response theory models (see e.g. Linden and Hambleton, 1997), and other methods that scale roll-call votes or campaign contributions into latent political dimensions (Poole and Rosenthal, 2007; Clinton, Jackman and Rivers, 2004; Bonica, 2014), but adapted to allow the estimation of ideal points for hundreds of thousands of individuals.

Suppose that each Twitter user $i \in \{1, \ldots, n\}$ is presented with a choice between following or not following another target user $j \in \{1, \ldots, m\}$, where $j$ is a political actor who has a Twitter account.[4] Let $y_{ij} = 1$ if user $i$ decides to follow user $j$, and $y_{ij} = 0$ otherwise. For the reasons explained above, I expect this decision to be a function of the squared Euclidean distance in the latent ideological dimension[5] between user $i$ and $j$: $-\gamma \lVert \theta_i - \phi_j \rVert^2$, where $\theta_i$ is the ideal point of Twitter user $i$, $\phi_j$ is the ideal point of Twitter user $j$, and $\gamma$ is a normalizing constant.

To this core model, I add two additional parameters, $\alpha_j$ and $\beta_i$. The former measures the popularity of user $j$.
This parameter accounts for the fact that some political accounts are more likely to be followed, due to the higher profile of the politicians behind them (for example, we would expect the probability of to be higher than the probability of following a random member of the US Congress) or for other reasons (politicians who tweet more often are more likely to be highly visible and therefore also to have more followers). The latter measures the level of political interest of each user $i$. Similarly, this parameter accounts for the differences in the number of political accounts each user $i$ decides to follow, which could be related to the overall number of Twitter users they follow, or their overall level of interest in politics.

The probability that user $i$ follows a political account $j$ is then formulated as a logit model:

$$P(y_{ij} = 1 \mid \alpha_j, \beta_i, \gamma, \theta_i, \phi_j) = \mathrm{logit}^{-1}\left(\alpha_j + \beta_i - \gamma \lVert \theta_i - \phi_j \rVert^2\right) \tag{1}$$

Given that none of these parameters is directly observed, the statistical problem here is inference of $\theta = (\theta_1, \ldots, \theta_n)$, $\phi = (\phi_1, \ldots, \phi_m)$, $\alpha = (\alpha_1, \ldots, \alpha_m)$, $\beta = (\beta_1, \ldots, \beta_n)$, and $\gamma$. Assuming local independence (individual decisions to follow are independent across the $n$ users and $m$ political accounts, conditional on the estimated parameters), the likelihood function of this model is as follows:

$$p(y \mid \theta, \phi, \alpha, \beta, \gamma) = \prod_{i=1}^{n} \prod_{j=1}^{m} \mathrm{logit}^{-1}(\pi_{ij})^{y_{ij}} \left(1 - \mathrm{logit}^{-1}(\pi_{ij})\right)^{1 - y_{ij}}, \tag{2}$$

where $\pi_{ij} = \alpha_j + \beta_i - \gamma \lVert \theta_i - \phi_j \rVert^2$.

Estimation and inference for this type of model is not trivial. Maximum-likelihood estimation methods are usually intractable given the large number of parameters involved. However, samples from the posterior density of each parameter in the model can be obtained using Markov-Chain Monte Carlo methods. To improve the efficiency of this procedure, I use a Hamiltonian Monte Carlo algorithm (Gelman et al., 2013) and employ a hierarchical setup that considers each of the four sets of parameters as draws from four common population distributions: $\alpha_j \sim N(\mu_\alpha, \sigma_\alpha)$, $\beta_i \sim N(\mu_\beta, \sigma_\beta)$, $\theta_i \sim N(\mu_\theta, \sigma_\theta)$, and $\phi_j \sim N(\mu_\phi, \sigma_\phi)$. The full joint posterior distribution is thus:

$$\begin{aligned}
p(\theta, \phi, \alpha, \beta, \gamma \mid y) \propto{} & p(\theta, \phi, \alpha, \beta, \gamma, \mu, \sigma) \\
& \times \prod_{i=1}^{n} \prod_{j=1}^{m} \mathrm{logit}^{-1}(\pi_{ij})^{y_{ij}} \left(1 - \mathrm{logit}^{-1}(\pi_{ij})\right)^{1 - y_{ij}} \\
& \times \prod_{j=1}^{m} N(\alpha_j \mid \mu_\alpha, \sigma_\alpha) \prod_{i=1}^{n} N(\beta_i \mid \mu_\beta, \sigma_\beta) \prod_{i=1}^{n} N(\theta_i \mid \mu_\theta, \sigma_\theta) \prod_{j=1}^{m} N(\phi_j \mid \mu_\phi, \sigma_\phi)
\end{aligned} \tag{3}$$

[4] If we considered not only politicians, but the entire Twitter network, then $n = m$. In that case, the model would still yield valid estimates, but the estimation would be computationally intractable and inefficient and, as I argue below, the resulting latent dimension might not be ideology. In this paper I show that it is possible to obtain valid ideal point estimates choosing a small $m$ whose characteristics make following decisions informative about the ideology of users $i$ and $j$.
[5] I assume that ideology is unidimensional, which is a fairly standard assumption in the literature (e.g., see Poole and Rosenthal, 1997, 2007). However, the model I estimate could be generalized to multiple dimensions.

2.4 Identification

The model described by equation 1 is unidentified: any constant can be added to all the parameters $\theta_i$ and $\phi_j$ without changing the predictions of the model; and similarly $\theta_i$ and $\phi_j$ can be multiplied by any non-zero constant, with $\gamma$ divided by its square, leaving the model predictions unchanged. These indeterminacies, which are common to item-response theory models, are sometimes called additive aliasing and scaling invariance (see e.g. Bafumi et al., 2005 or Londregan, 1999).

Existing studies on ideal point estimation employ two different strategies to identify the model. One is to apply two linearly independent restrictions on the ideal point parameters, $\theta$ or $\phi$ in this case. In particular, the usual procedure is to constrain the ideal points of two legislators (liberal and conservative) at arbitrary positions, such as −1 and +1 (see e.g. Londregan, 1999; Clinton, Jackman and Rivers, 2004). An alternative is to apply a unit variance restriction on the set of ideal points, which in the multilevel setting would be equivalent to giving the $\theta_i$'s or $\phi_j$'s an informative $N(0, 1)$ prior distribution (Gelman and Hill, 2007, p.318). However, note that this second approach does not solve the problem of reflection invariance: the resulting scale can be reversed (flipped left-to-right) without changing the predictions of the model. As Jackman (2001) shows, choosing starting values that are consistent with the expected direction of the scale (e.g. liberals with −1 and conservatives with +1) is sufficient to ensure global identification in most cases.

As I show in the Supplementary Materials, either of these strategies can be applied in this case to resolve the indeterminacies.
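The following probability, the log-likelihood, and the two indeterminacies discussed in this section can be checked numerically. What follows is a minimal sketch in Python with NumPy, not the code used in the paper (which relies on Stan and is described in the Supplementary Materials). The function names, the simulated parameter values, and the constants `c` and `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_predictor(theta, phi, alpha, beta, gamma):
    """pi_ij = alpha_j + beta_i - gamma * (theta_i - phi_j)^2, shape (n, m)."""
    dist2 = (theta[:, None] - phi[None, :]) ** 2
    return alpha[None, :] + beta[:, None] - gamma * dist2


def follow_prob(theta, phi, alpha, beta, gamma):
    """Inverse logit of pi_ij: probability that user i follows account j."""
    return 1.0 / (1.0 + np.exp(-linear_predictor(theta, phi, alpha, beta, gamma)))


def log_likelihood(y, theta, phi, alpha, beta, gamma):
    """Logarithm of the likelihood in equation 2, under local independence."""
    pi = linear_predictor(theta, phi, alpha, beta, gamma)
    # log(logit^-1(pi)) = -log(1 + exp(-pi)); log(1 - logit^-1(pi)) = -log(1 + exp(pi))
    return np.sum(-y * np.logaddexp(0.0, -pi) - (1 - y) * np.logaddexp(0.0, pi))


# Simulated (hypothetical) parameters for n users and m political accounts.
rng = np.random.default_rng(0)
n, m = 50, 8
theta = rng.normal(size=n)
phi = rng.normal(size=m)
alpha = rng.normal(size=m)
beta = rng.normal(size=n)
gamma = 0.7

p = follow_prob(theta, phi, alpha, beta, gamma)
y = rng.binomial(1, p)  # simulated following decisions
ll = log_likelihood(y, theta, phi, alpha, beta, gamma)

# Additive aliasing: shifting every ideal point by c leaves predictions unchanged.
c = 3.0
p_shift = follow_prob(theta + c, phi + c, alpha, beta, gamma)
ll_shift = log_likelihood(y, theta + c, phi + c, alpha, beta, gamma)

# Scaling invariance: multiply ideal points by k and divide gamma by k^2.
# A negative k also flips the scale, which illustrates reflection invariance.
k = -2.5
p_scale = follow_prob(k * theta, k * phi, alpha, beta, gamma / k**2)

print(np.allclose(p, p_shift), np.allclose(p, p_scale))
```

Because the data cannot distinguish between these transformed parameter sets, some restriction (fixed positions for two actors, or a unit-variance prior plus directed starting values) is needed before the scale can be interpreted.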

2.5 MCMC algorithm

To improve the efficiency of the estimation procedure, I divide it into two stages. First, I use a No-U-Turn sampler, a variant of Hamiltonian Monte Carlo sampling algorithms (Gelman et al., 2013), to estimate the parameters indexed by $j$, using a random sample of 10,000 $i$ users who follow at least 10 $j$ users. In the second stage, I use a random-walk Metropolis-Hastings algorithm (Metropolis et al., 1953) to estimate all parameters indexed by $i$. Note that each of these parameters can be estimated individually because I assume local independence, conditional on the $j$ parameters, and therefore multi-core processors can be used to run multiple samplers simultaneously and dramatically increase computation speed.[6]

The first stage is implemented using the Stan modeling language (Stan Development Team, 2012), while the second stage is implemented using R. I use flat priors on all parameters, with the exception of $\mu_\theta$, $\sigma_\theta$, and $\mu_\alpha$, which are fixed to 0, 1, and 0 respectively for identification purposes. The samplers in both stages are run using two chains with as many iterations as necessary to ensure that all ideology parameters have an effective number of simulation draws (Gelman and Rubin, 1992) of at least 200. Each chain is initiated with random draws from a multivariate normal distribution for $\theta$ and $\gamma$, the logarithm of the number of followers of user $j$ or number of friends of user $i$ for $\alpha$ and $\beta$ (to speed up convergence), and values of zero for $\phi$, with the exception of those who belong to a party: −1 for left-wing politicians and +1 for right-wing politicians.
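The second stage described above can be sketched as a random-walk Metropolis sampler for a single user, holding the account-level parameters fixed. This is an illustrative Python sketch under stated assumptions, not the R code used in the paper: the function names, step size, number of iterations, and the simulated account parameters are all hypothetical, and the sketch assumes a N(0, 1) prior on $\theta_i$ with a flat prior on $\beta_i$.

```python
import numpy as np


def log_post_user(theta_i, beta_i, y_i, phi, alpha, gamma):
    """Unnormalized log posterior of (theta_i, beta_i) given fixed j parameters."""
    pi = alpha + beta_i - gamma * (theta_i - phi) ** 2
    loglik = np.sum(-y_i * np.logaddexp(0.0, -pi) - (1 - y_i) * np.logaddexp(0.0, pi))
    # Assumed priors: theta_i ~ N(0, 1), flat prior on beta_i.
    return loglik - 0.5 * theta_i**2


def metropolis_user(y_i, phi, alpha, gamma, n_iter=2000, step=0.3, seed=0):
    """Random-walk Metropolis draws of (theta_i, beta_i) for one user."""
    rng = np.random.default_rng(seed)
    current = np.zeros(2)  # start at theta_i = 0, beta_i = 0
    current_lp = log_post_user(current[0], current[1], y_i, phi, alpha, gamma)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        proposal = current + rng.normal(scale=step, size=2)
        proposal_lp = log_post_user(proposal[0], proposal[1], y_i, phi, alpha, gamma)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[t] = current
    return draws


# Hypothetical first-stage output: 40 accounts spread over the latent scale.
phi = np.linspace(-2.0, 2.0, 40)
alpha = np.zeros(40)
gamma = 1.0

# Simulate one user's following decisions from assumed true values.
rng = np.random.default_rng(1)
true_theta, true_beta = 1.0, 0.5
pi_true = alpha + true_beta - gamma * (true_theta - phi) ** 2
y_i = rng.binomial(1, 1.0 / (1.0 + np.exp(-pi_true)))

draws = metropolis_user(y_i, phi, alpha, gamma)
theta_hat = draws[500:, 0].mean()  # posterior mean after discarding burn-in
print(theta_hat)
```

Since each user's posterior depends only on that user's row of the following matrix once the $j$ parameters are fixed, a function like `metropolis_user` could be called independently for every user, which is what makes the parallel execution mentioned above possible.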
This model fitting strategy appears to be quite robust and my results are largely insensitive to the choice of priors and initial values.[7]

2.6 Discussion

A key challenge in implementing this method is the choice of the $m$ target Twitter users who are political elites: the set of users with discriminatory predictive power such that the decision to

[6] Samples from the $i$ parameters in the second stage can be compared with those obtained for the random sample in the first stage to ensure that there were no errors in the estimation. In all the examples in this paper, the correlation between these two sets of estimates is ρ =
[7] See Section in the Supplementary Materials for the code to estimate the model in Stan, as well as results of a battery of tests that assess model fit.

follow them (or not) provides information about an individual's ideology. Following Conover et al. (2010), we could analyze the entire Twitter network and let the different clusters emerge naturally. However, homophilic networks can be based not only on political traits, but also on other personal characteristics. Instead, the approach I use is to select a limited number of target users that includes politicians, think tanks, and news outlets with a clear ideological profile that span the full range of the ideological spectrum. The set of users that are considered will determine the interpretation of the latent scale where ideal points are located and, for this reason, it is important to include identifiable figures with extreme ideological positions, beyond just partisans.[8]

3 Data

The estimation method I propose in this paper can be applied to any country where a high number of citizens are discussing politics on Twitter.[9] However, in order to test the validity of the estimated parameters, I will focus on six countries where high-quality ideology measures are available for a subset of all Twitter users: the US, the UK, Spain, Germany, Italy, and the Netherlands. Furthermore, the increasing complexity of the party system across these countries will show how the method performs as the number of parties increases.

For each of these countries, I identified a set of political actors with visible profiles on Twitter: 1) all political representatives in national-level institutions; 2) political parties with accounts on Twitter; and 3) media outlets and journalists who tweet about politics. I considered only political Twitter users with more than 5,000 (US) or 2,000 (UK, Spain, Italy, Germany, the Netherlands) followers. This represents a total of m = 318 target users in the US, m = 244 in the UK, m =

[8] Note that the model is agnostic regarding the interpretation of the latent dimension, which will depend on the set of m political actors that are considered.
As I show in Section 4.1, the results from multi-party systems clearly show that this dimension overlaps with the left-right scale. In the U.S., where partisanship and ideology are highly correlated, it is not as clear. However, the fact that Twitter-based ideal points are highly correlated with DW-NOMINATE scores (commonly thought to capture legislators' ideology), and that state-level estimates are better predictors of survey-based measures of ideology than of partisanship, also suggests that the estimated dimension is the liberal-conservative scale.
[9] Estimating ideal points using data from different countries simultaneously is more complex, given the high intracountry locality effect (Gonzalez et al., 2011), which limits the number of Twitter users who could serve as bridges across countries in the estimation.

in Spain, m = 214 in Italy, m = 273 in Germany, and m = 118 in the Netherlands.[10]

Next, using the Twitter REST API, I obtained the entire list of followers for all m users in each country, resulting in an entire universe of Twitter users following at least one politician of n = 32,919,418 in the US, n = 2,647,413 in the UK, n = 1,059,890 in Spain, n = 1,119,763 in Italy, n = 1,559,311 in Germany, and n = 856,201 in the Netherlands.[11] However, an extremely high proportion of these users are inactive, are spam bots, or reside in different countries. To overcome this problem, I extracted the available personal attributes from each user's profile, and discarded from the sample those who 1) have sent fewer than 100 tweets, 2) have not sent one tweet in the past six months, 3) have fewer than 25 followers, 4) are located outside the borders of the country of interest, or 5) follow fewer than three political Twitter accounts.[12] The final sample size is n = 301,537 users in the US, n = 135,015 in the UK, n = 123,846 in Spain, n = 150,143 in Italy, n = 49,142 in Germany, and n = 96,624 in the Netherlands.[13]

This is a highly self-selected sample because Twitter users are not a representative sample of the population.[14] In addition, the inferences I make based on my sample will not represent the full set of Twitter users, as I am only selecting those users who follow three or more political accounts. However, this should not affect the inference of politicians' ideal points, since these users can indeed be considered as authoritative when it comes to politics. Precisely because they are more likely to be knowledgeable about and interested in politics than the average citizen, examining their online behavior can be highly informative about policy positions. This procedure is roughly analogous to an expert survey with many respondents where each respondent provides a small amount of information that,

[10] See the Supplementary Materials for additional details on the data collection.
Full replication files are available as Barberá (2014).
[11] As of November 2012 in the US, Spain, the Netherlands, and the UK; February 2013 in Italy; August 2013 in Germany.
[12] In the U.S. sample, I further restricted the sample to accounts that tweeted at least three times mentioning Obama or Romney during the three months before the 2012 election, in order to include only users who tweet frequently about politics.
[13] Note that the sample selection process requires identifying the specific country from which each user tweets. This information was inferred from the time zone and location fields in the user profile, which was sufficient to identify the country of residence in 90% of the cases. This proportion is lower when we consider more specific geographical levels, such as state in the US (71%).
[14] Table 1 in the Supplementary Materials shows that Twitter users in the U.S. tend to be younger and to have a higher income level than the average citizen, and their educational background and racial composition are different from those of the entire population.

when aggregated, results in highly accurate policy estimates.

4 Results and Validation

In this section I provide a summary of the ideology estimates for the six countries included in my study. To validate the method, I will use different sources of external information to assess whether this procedure is able to correctly classify and scale Twitter users on the left or right side of the ideological dimension.

My analysis is divided into three parts, each of them providing a different type of evidence for the validation. The first part shows that Twitter-based ideal points replicate existing measures of ideology for elites (legislators and political parties) in six different countries. Then, I validate mass ideology at the aggregate level by examining groups of Twitter users by self-identified ideology and state of residence. Here I am also able to replicate previous findings in the literature about elite and mass polarization. Finally, I also validate mass ideology at the individual level using campaign contribution records and information about voters' party registration history.

4.1 Replication of Legislators' and Parties' Ideal Points

The first set of results I focus on are those from the United States. Figure 1 compares $\phi_j$, the ideal point estimates, of 231 members of the 112th U.S. Congress[15] based on their Twitter network of followers (y axis) with their DW-NOMINATE scores,[16] based on their roll-call voting records (Poole and Rosenthal, 2007), on the x axis. Each letter corresponds to a different member of Congress, where D stands for Democrats and R stands for Republicans, and the two panels split the sample according to the chamber of Congress to which they were elected. As we can see, the estimated ideal points are clustered in two different groups that align almost perfectly with party membership.
The correlation between Twitter- and roll-call-based ideal points

[15] As explained in Section 3, only Twitter accounts with more than 5,000 followers as of November 2012 are included in the sample.
[16] Source: voteview.com


Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015 Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015

More information

Media Channel Effectiveness and Trust

Media Channel Effectiveness and Trust Media Channel Effectiveness and Trust Edward Paul Johnson 1, Dan Williams 1 1 Western Wats, 701 E. Timpanogos Parkway, Orem, UT, 84097 Abstract The advent of social media creates an alternative channel

More information

Handling attrition and non-response in longitudinal data

Handling attrition and non-response in longitudinal data Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

Title: Split-Ticket Voting in Mixed-Member Electoral Systems: A Theoretical and Methodological Investigation.

Title: Split-Ticket Voting in Mixed-Member Electoral Systems: A Theoretical and Methodological Investigation. Thesis Summary Title: Split-Ticket Voting in Mixed-Member Electoral Systems: A Theoretical and Methodological Investigation. Author: Carolina Plescia (carolina.plescia@univie.ac.at) Supervisors: Prof.

More information

State Constitutional Reform and Related Issues

State Constitutional Reform and Related Issues California Opinion Index A digest summarizing California voter opinions about State Constitutional Reform and Related Issues October 2009 Findings in Brief By a 51% to 38% margin voters believe that fundamental

More information

Jon A. Krosnick and LinChiat Chang, Ohio State University. April, 2001. Introduction

Jon A. Krosnick and LinChiat Chang, Ohio State University. April, 2001. Introduction A Comparison of the Random Digit Dialing Telephone Survey Methodology with Internet Survey Methodology as Implemented by Knowledge Networks and Harris Interactive Jon A. Krosnick and LinChiat Chang, Ohio

More information

Parallelization Strategies for Multicore Data Analysis

Parallelization Strategies for Multicore Data Analysis Parallelization Strategies for Multicore Data Analysis Wei-Chen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Fixed-Effect Versus Random-Effects Models

Fixed-Effect Versus Random-Effects Models CHAPTER 13 Fixed-Effect Versus Random-Effects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval

More information

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll THE FIELD POLL THE INDEPENDENT AND NON-PARTISAN SURVEY OF PUBLIC OPINION ESTABLISHED IN 1947 AS THE CALIFORNIA POLL BY MERVIN FIELD Field Research Corporation 601 California Street, Suite 210 San Francisco,

More information

Democratic Process and Social Media: A Study of Us Presidential Election 2012

Democratic Process and Social Media: A Study of Us Presidential Election 2012 Democratic Process and Social Media: A Study of Us Presidential Election 2012 Abstract Susanta Kumar Parida M.Phil Scholar Dept. of Political Science Utkal University, Vanivihar, Bhubaneswar, Odisha India

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Indirect Presidential Influence, State-level Approval, and Voting in the U.S. Senate

Indirect Presidential Influence, State-level Approval, and Voting in the U.S. Senate Indirect Presidential Influence, State-level Approval, and Voting in the U.S. Senate Caitlin E. Dwyer dwyer077@umn.edu Department of Political Science University of Minnesota Sarah A. Treul 1 streul@unc.edu

More information

Equilibrium: Illustrations

Equilibrium: Illustrations Draft chapter from An introduction to game theory by Martin J. Osborne. Version: 2002/7/23. Martin.Osborne@utoronto.ca http://www.economics.utoronto.ca/osborne Copyright 1995 2002 by Martin J. Osborne.

More information

Gerry Hobbs, Department of Statistics, West Virginia University

Gerry Hobbs, Department of Statistics, West Virginia University Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

More information

5. Which normally describes the political party system in the United States? 1. A political party supports this during an election: A.

5. Which normally describes the political party system in the United States? 1. A political party supports this during an election: A. 1. A political party supports this during an election: A. Public Policy B. Platform C. Compromise D. Third Party 2. Third parties usually impact government by: A. Electing large numbers of politicians

More information

THE FIELD POLL. By Mark DiCamillo and Mervin Field

THE FIELD POLL. By Mark DiCamillo and Mervin Field THE FIELD POLL THE INDEPENDENT AND NON-PARTISAN SURVEY OF PUBLIC OPINION ESTABLISHED IN 1947 AS THE CALIFORNIA POLL BY MERVIN FIELD Field Research Corporation 601 California Street, Suite 210 San Francisco,

More information

RECOMMENDED CITATION: Pew Research Center, January, 2016, Republican Primary Voters: More Conservative than GOP General Election Voters

RECOMMENDED CITATION: Pew Research Center, January, 2016, Republican Primary Voters: More Conservative than GOP General Election Voters NUMBERS, FACTS AND TRENDS SHAPING THE WORLD FOR RELEASE JANUARY 28, 2016 FOR MEDIA OR OTHER INQUIRIES: Carroll Doherty, Director of Political Research Jocelyn Kiley, Associate Director, Research Bridget

More information

The Basics of Graphical Models

The Basics of Graphical Models The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

More information

AP UNITED STATES GOVERNMENT AND POLITICS 2010 SCORING GUIDELINES

AP UNITED STATES GOVERNMENT AND POLITICS 2010 SCORING GUIDELINES AP UNITED STATES GOVERNMENT AND POLITICS 2010 SCORING GUIDELINES Question 3 6 points Part (a): 1 point One point is earned for identifying one specific trend evident in the figure: Percentage of House

More information

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll THE FIELD POLL THE INDEPENDENT AND NON-PARTISAN SURVEY OF PUBLIC OPINION ESTABLISHED IN 1947 AS THE CALIFORNIA POLL BY MERVIN FIELD Field Research Corporation 601 California Street, Suite 210 San Francisco,

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Private Television in Poland & Slovakia

Private Television in Poland & Slovakia Private Television in Poland & Slovakia, March 2003 Matúš Minárik CONCLUSION AND RECOMMENDATIONS The present policy paper and recommendations result from the policy research done in the framework of the

More information

NATIONAL: AN ANGRY AMERICA

NATIONAL: AN ANGRY AMERICA Please attribute this information to: Monmouth University Poll West Long Branch, NJ 07764 www.monmouth.edu/polling Follow on Twitter: @MonmouthPoll Released: Monday, January 25, 2016 Contact: PATRICK MURRAY

More information

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll THE FIELD POLL THE INDEPENDENT AND NON-PARTISAN SURVEY OF PUBLIC OPINION ESTABLISHED IN 1947 AS THE CALIFORNIA POLL BY MERVIN FIELD Field Research Corporation 601 California Street, Suite 210 San Francisco,

More information

Composite performance measures in the public sector Rowena Jacobs, Maria Goddard and Peter C. Smith

Composite performance measures in the public sector Rowena Jacobs, Maria Goddard and Peter C. Smith Policy Discussion Briefing January 27 Composite performance measures in the public sector Rowena Jacobs, Maria Goddard and Peter C. Smith Introduction It is rare to open a newspaper or read a government

More information

Cluster Analysis for Evaluating Trading Strategies 1

Cluster Analysis for Evaluating Trading Strategies 1 CONTRIBUTORS Jeff Bacidore Managing Director, Head of Algorithmic Trading, ITG, Inc. Jeff.Bacidore@itg.com +1.212.588.4327 Kathryn Berkow Quantitative Analyst, Algorithmic Trading, ITG, Inc. Kathryn.Berkow@itg.com

More information

Supplement to Call Centers with Delay Information: Models and Insights

Supplement to Call Centers with Delay Information: Models and Insights Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290

More information

Drawing Inferences and Testing Theories with Big Data

Drawing Inferences and Testing Theories with Big Data Drawing Inferences and Testing Theories with Big Data Jonathan Nagler Richard Bonneau Josh Tucker John Jost Abstract We argue that having more data is an opportunity, not a constraint, on testing theories

More information

English Summary 1. cognitively-loaded test and a non-cognitive test, the latter often comprised of the five-factor model of

English Summary 1. cognitively-loaded test and a non-cognitive test, the latter often comprised of the five-factor model of English Summary 1 Both cognitive and non-cognitive predictors are important with regard to predicting performance. Testing to select students in higher education or personnel in organizations is often

More information

Types of Democracy. Types of Democracy

Types of Democracy. Types of Democracy Types of Democracy The democratic form of government is an institutional configuration that allows for popular participation through the electoral process. According to political scientist Robert Dahl,

More information

SOCIAL JOURNALISM STUDY 2012

SOCIAL JOURNALISM STUDY 2012 SOCIAL JOURNALISM STUDY 2012 2012 Social Journalism Study - United Kingdom Report by Cision & Canterbury Christ Church University (UK) www.cision.com 1. EXECUTIVE SUMMARY Key findings: 28.1% of UK journalists

More information

In this chapter, you will learn improvement curve concepts and their application to cost and price analysis.

In this chapter, you will learn improvement curve concepts and their application to cost and price analysis. 7.0 - Chapter Introduction In this chapter, you will learn improvement curve concepts and their application to cost and price analysis. Basic Improvement Curve Concept. You may have learned about improvement

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

CLUSTER ANALYSIS FOR SEGMENTATION

CLUSTER ANALYSIS FOR SEGMENTATION CLUSTER ANALYSIS FOR SEGMENTATION Introduction We all understand that consumers are not all alike. This provides a challenge for the development and marketing of profitable products and services. Not every

More information

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables

Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Introduction In the summer of 2002, a research study commissioned by the Center for Student

More information

AMS 5 CHANCE VARIABILITY

AMS 5 CHANCE VARIABILITY AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and

More information

Are Social Media more Social than Media? Measuring. Ideological Homophily and Segregation on Twitter

Are Social Media more Social than Media? Measuring. Ideological Homophily and Segregation on Twitter Are Social Media more Social than Media? Measuring Ideological Homophily and Segregation on Twitter Yosh Halberstam Brian Knight December 18, 2013 Abstract Social media represent a rapidly growing source

More information

Better decision making under uncertain conditions using Monte Carlo Simulation

Better decision making under uncertain conditions using Monte Carlo Simulation IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics

More information

Methodological Approach: Typologies of Think Tanks

Methodological Approach: Typologies of Think Tanks Methodological Approach: Typologies of Think Tanks Unlike Stone, Donald Abelson applies a typology of think tanks by focusing on four distinctive periods of think tanks development to recognise the major

More information

THE FUNDAMENTAL THEOREM OF ARBITRAGE PRICING

THE FUNDAMENTAL THEOREM OF ARBITRAGE PRICING THE FUNDAMENTAL THEOREM OF ARBITRAGE PRICING 1. Introduction The Black-Scholes theory, which is the main subject of this course and its sequel, is based on the Efficient Market Hypothesis, that arbitrages

More information

A Simple Model of Price Dispersion *

A Simple Model of Price Dispersion * Federal Reserve Bank of Dallas Globalization and Monetary Policy Institute Working Paper No. 112 http://www.dallasfed.org/assets/documents/institute/wpapers/2012/0112.pdf A Simple Model of Price Dispersion

More information

Barack Obama won the battle on social media too!

Barack Obama won the battle on social media too! Think... Special Edition Barack Obama won the battle on social media too! On the 4th of April 2011, Barack Obama announced his candidacy for the 2012 US Presidential election on Youtube* Yesterday evening

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

Behavioral Segmentation

Behavioral Segmentation Behavioral Segmentation TM Contents 1. The Importance of Segmentation in Contemporary Marketing... 2 2. Traditional Methods of Segmentation and their Limitations... 2 2.1 Lack of Homogeneity... 3 2.2 Determining

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Strategic Online Advertising: Modeling Internet User Behavior with

Strategic Online Advertising: Modeling Internet User Behavior with 2 Strategic Online Advertising: Modeling Internet User Behavior with Patrick Johnston, Nicholas Kristoff, Heather McGinness, Phuong Vu, Nathaniel Wong, Jason Wright with William T. Scherer and Matthew

More information

VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR

VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR Andrew Goldstein Yale University 68 High Street New Haven, CT 06511 andrew.goldstein@yale.edu Alexander Thornton Shawn Kerrigan Locus Energy 657 Mission St.

More information

Step 5: Conduct Analysis. The CCA Algorithm

Step 5: Conduct Analysis. The CCA Algorithm Model Parameterization: Step 5: Conduct Analysis P Dropped species with fewer than 5 occurrences P Log-transformed species abundances P Row-normalized species log abundances (chord distance) P Selected

More information

Customer Life Time Value

Customer Life Time Value Customer Life Time Value Tomer Kalimi, Jacob Zahavi and Ronen Meiri Contents Introduction... 2 So what is the LTV?... 2 LTV in the Gaming Industry... 3 The Modeling Process... 4 Data Modeling... 5 The

More information

Comparing Alternate Designs For A Multi-Domain Cluster Sample

Comparing Alternate Designs For A Multi-Domain Cluster Sample Comparing Alternate Designs For A Multi-Domain Cluster Sample Pedro J. Saavedra, Mareena McKinley Wright and Joseph P. Riley Mareena McKinley Wright, ORC Macro, 11785 Beltsville Dr., Calverton, MD 20705

More information

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll THE FIELD POLL THE INDEPENDENT AND NON-PARTISAN SURVEY OF PUBLIC OPINION ESTABLISHED IN 1947 AS THE CALIFORNIA POLL BY MERVIN FIELD Field Research Corporation 601 California Street, Suite 210 San Francisco,

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

Globally Optimal Crowdsourcing Quality Management

Globally Optimal Crowdsourcing Quality Management Globally Optimal Crowdsourcing Quality Management Akash Das Sarma Stanford University akashds@stanford.edu Aditya G. Parameswaran University of Illinois (UIUC) adityagp@illinois.edu Jennifer Widom Stanford

More information

It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3.

It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3. IDENTIFICATION AND ESTIMATION OF AGE, PERIOD AND COHORT EFFECTS IN THE ANALYSIS OF DISCRETE ARCHIVAL DATA Stephen E. Fienberg, University of Minnesota William M. Mason, University of Michigan 1. INTRODUCTION

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts

Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts Page 1 of 20 ISF 2008 Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts Andrey Davydenko, Professor Robert Fildes a.davydenko@lancaster.ac.uk Lancaster

More information

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance

More information

The Youth Vote in 2012 CIRCLE Staff May 10, 2013

The Youth Vote in 2012 CIRCLE Staff May 10, 2013 The Youth Vote in 2012 CIRCLE Staff May 10, 2013 In the 2012 elections, young voters (under age 30) chose Barack Obama over Mitt Romney by 60%- 37%, a 23-point margin, according to the National Exit Polls.

More information

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

More information

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand

More information

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

More information

Inequality, Mobility and Income Distribution Comparisons

Inequality, Mobility and Income Distribution Comparisons Fiscal Studies (1997) vol. 18, no. 3, pp. 93 30 Inequality, Mobility and Income Distribution Comparisons JOHN CREEDY * Abstract his paper examines the relationship between the cross-sectional and lifetime

More information

CRM Forum Resources http://www.crm-forum.com

CRM Forum Resources http://www.crm-forum.com CRM Forum Resources http://www.crm-forum.com BEHAVIOURAL SEGMENTATION SYSTEMS - A Perspective Author: Brian Birkhead Copyright Brian Birkhead January 1999 Copyright Brian Birkhead, 1999. Supplied by The

More information

VANDERBILT AVENUE ASSET MANAGEMENT

VANDERBILT AVENUE ASSET MANAGEMENT SUMMARY CURRENCY-HEDGED INTERNATIONAL FIXED INCOME INVESTMENT In recent years, the management of risk in internationally diversified bond portfolios held by U.S. investors has been guided by the following

More information

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Performance Level Descriptors Grade 6 Mathematics

Performance Level Descriptors Grade 6 Mathematics Performance Level Descriptors Grade 6 Mathematics Multiplying and Dividing with Fractions 6.NS.1-2 Grade 6 Math : Sub-Claim A The student solves problems involving the Major Content for grade/course with

More information

GEOENGINE MSc in Geomatics Engineering (Master Thesis) Anamelechi, Falasy Ebere

GEOENGINE MSc in Geomatics Engineering (Master Thesis) Anamelechi, Falasy Ebere Master s Thesis: ANAMELECHI, FALASY EBERE Analysis of a Raster DEM Creation for a Farm Management Information System based on GNSS and Total Station Coordinates Duration of the Thesis: 6 Months Completion

More information

Birds of the Same Feather Tweet Together. Bayesian Ideal Point Estimation Using Twitter Data Pablo Barbera, New York University

Birds of the Same Feather Tweet Together. Bayesian Ideal Point Estimation Using Twitter Data Pablo Barbera, New York University Connective Action in European Mass Protest Eva Anduiza, Autonomous University of Barcelona The paper analyzes the extent to which digitally networked action is making a difference in political involvement.

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Automated Statistical Modeling for Data Mining David Stephenson 1

Automated Statistical Modeling for Data Mining David Stephenson 1 Automated Statistical Modeling for Data Mining David Stephenson 1 Abstract. We seek to bridge the gap between basic statistical data mining tools and advanced statistical analysis software that requires

More information

Vasicek Single Factor Model

Vasicek Single Factor Model Alexandra Kochendörfer 7. Februar 2011 1 / 33 Problem Setting Consider portfolio with N different credits of equal size 1. Each obligor has an individual default probability. In case of default of the

More information

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR) 2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

More information

VIRGINIA: TRUMP, CLINTON LEAD PRIMARIES

VIRGINIA: TRUMP, CLINTON LEAD PRIMARIES Please attribute this information to: Monmouth University Poll West Long Branch, NJ 07764 www.monmouth.edu/polling Follow on Twitter: @MonmouthPoll Released: Thursday, 25, Contact: PATRICK MURRAY 732-979-6769

More information

PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II. Lecture Notes PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll

THE FIELD POLL. By Mark DiCamillo, Director, The Field Poll THE FIELD POLL THE INDEPENDENT AND NON-PARTISAN SURVEY OF PUBLIC OPINION ESTABLISHED IN 1947 AS THE CALIFORNIA POLL BY MERVIN FIELD Field Research Corporation 601 California Street, Suite 210 San Francisco,

More information
