Protocols for Randomized Experiments to Identify Network Contagion
A.C. Thomas, Michael Finegold

March 14, 2013

Abstract

Identifying the existence and magnitude of social contagion, or the spread of an individual trait along ties in a social network, is a challenging task due in part to the tendency of individuals with similar characteristics to connect, also known as homophily. While randomized experiments on individuals of a network would seem to be the ideal method for establishing contagion, there are still considerable methodological issues stemming from structural considerations, including the likelihood of inclusion in the sample group and the implicit dependence between units due to latent homophily. We construct a protocol that correctly adjusts for these factors in a number of experimental situations.

1 INTRODUCTION

The phenomenon of contagion on social networks is of considerable interest to researchers and practitioners in marketing, sociology, biology and technology, as evidenced by the very nature of viral marketing as a field of study and investment: it assumes that one need only manipulate a small number of nodes on a network before a message is able to spread across a much greater share of the population. While the promise of its exploitation is considerable, identifying whether or not a trait is viral (that is, whether a particular behavior can actually spread on a network) is difficult to do accurately.

The general term network autocorrelation encompasses many reasons why two members of a social network would have characteristics similar to one another. Two of the simplest explanations are contagion, in which one friend influences another's adoption choices, and homophily, in which similar individuals become friends on the basis of shared interests or opportunities (Manski, 1993).
While many observational studies have sought to distinguish the two, it is rarely simple; latent homophily, for example, is a phenomenon where a homophilous factor that causes a friendship to form may also cause a change in personal characteristics, which can be mistaken for contagion (Shalizi and Thomas, 2011). All these problems have emphasized the role that experiments can play in identifying whether or not contagion has actually taken place. By randomizing a treatment condition over a number of units, one can isolate the direct contribution from observable characteristics while removing the contribution of confounding factors.

One potential approach is to construct a social network from scratch, under the complete control of the experimenter, such as those commissioned by Centola (2010; 2011). In this case, a health website with social network components was created de novo, and new visitors were organized into small-scale social networks according to their characteristics and particular experimental conditions on network topology and enforced homophily. The adoption of a trait (the use and maintenance of a diet log) could only be achieved if a directly connected peer had themselves adopted it, so that the trait could spread only through the social network. While this approach has its merits in a smaller-scale system, it has little applicability to large social network experiments in vivo, in which the conditions that spawned and grew the network are largely unknown.

The typical social network experiment is what we will call the viral marketing design: distribute an incentive-style treatment to a random sample of the population (the manipulated group) and compare the rate at which their contacts adopt a certain behavior to the rate at which an unmanipulated group's contacts adopt the same behavior. In particular, if a real contagion effect is small enough, we may be justified in ignoring spillover effects, or extensions of the contagious process beyond the original units. Adoption of an online service is a common choice in information systems (Bapna and Umyarov, 2012; Aral and Walker, 2012), and the effect sizes in these studies tend to be small enough that our assumption is reasonable.
The principle at work in this experimental design is that the manipulation can be achieved very subtly: one can select a small subset of nodes in a network and indirectly manipulate many more, proportional to at least the number of individuals who are directly connected, which makes this design seem compelling. Since the manipulated units were chosen at random, they should have no systematic differences from the population.

The problem lies in the fact that we are measuring outcomes on the followers of the manipulated and unmanipulated individuals, not the manipulated and unmanipulated groups themselves. The groups of contacts may be systematically different from groups chosen at random from the population. Two effects have particular impact on this sort of study, even when the experimental set-up is taken into account:

Latent homophily. Individuals with similar characteristics tend to form more connections with each other than with those whose characteristics are different. If these factors also contribute to the observed behavior of an individual, this can be mistaken for contagion (Shalizi and Thomas, 2011); in general, it can be difficult to disentangle this even with time-dependent models. In a perfectly balanced experiment, each treated unit will have a matching control, and so the expected value of the difference will not be affected. Because the units within each group can be slightly dependent by design, however, the variance of the estimator can be considerably higher because of the covariance between units; without taking this into account, standard test statistics for differences, such as the two-sample t-test, will not have the coverage implied by the number of discrete units in each sample. Since we suspect dependence between units, and it is simple to construct a corrected statistical test whose null distribution is built from independent units, we show that using clusters of friends, based on the original seed nodes, corrects this discrepancy in coverage properties.

Inclusion by degree. Even though one can select a subset of the population uniformly at random, it is not at all clear that the actual distribution of experimental units, the social followers of the original subset, will be representative of the population. In particular, while the selection process for seed nodes can be uniformly random, it is well known that processes that crawl the social graph are biased in favor of well-connected individuals. If the probability of inclusion is dependent on the outcome being measured, then the estimator of the population mean will be biased. A standard correction for this is the estimator of Horvitz and Thompson (1952); this has since been used in some form in methods for sampling from a network that crawl the social graph, such as Respondent-Driven Sampling (Heckathorn, 1997; Gile and Handcock, 2010).

We demonstrate that by accounting for each of these corrective factors in our estimators, we can control their associated effects appropriately, in terms of coverage probabilities, for this class of network experiments.
We continue in Section 2 by illustrating the failure of standard statistical tests under this paradigm, particularly in Section 2.1 with simulated networks and Section 2.2 with a real-world network subset. We demonstrate how this applies in a quasi-experimental setting in Section 3 before discussing future extensions in Section 4.

2 DEMONSTRATION

In these experiments, we have four primary groups of individuals on the network under inspection:

- Two groups of nodes, the manipulated M and the unmanipulated U, corresponding to the original sample of individuals on the network (let L be the union of these groups). These two groups together are typically sampled uniformly at random on the network, and nodes are partitioned between these groups through straight or blocked randomization.
- Two further groups of nodes, the treated T and the control C, are those individuals who name members of M and U in their social networks (that is, the followers of M and U respectively). The differential exposure to a manipulated node is the treatment in question.

Nodes that are exposed to members of M and U simultaneously are excluded for balance purposes rather than simply included in T (Bapna and Umyarov, 2012); if the sample is small compared to the population, the impact on selection probability for the remaining nodes is minimal.

We first generate synthetic networks to demonstrate how the standard sampling scheme can lead to unexpected consequences.

2.1 Simulated Networks and Outcomes

For demonstration purposes, we consider a simple network model with two features: variability in the number of connections made by each individual (in terms of both inbound and outbound links), and a binary factor that drives a homophily-based mechanism. For each individual $i \in \{1, \ldots, N\}$ the binary factor, $X_i$, is drawn from the Bernoulli distribution, $X_i \sim \mathrm{Be}(p)$, and the propensities to form inbound, $\alpha_i$, and outbound, $\beta_i$, ties are drawn from the bivariate normal distribution,

$$\begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix} \sim N_2\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_\alpha^2 & \rho\sigma_\alpha\sigma_\beta \\ \rho\sigma_\alpha\sigma_\beta & \sigma_\beta^2 \end{bmatrix} \right),$$

so that there is variability in both follower and followee count, which may be related. Each potential directed edge, denoted $Z_{ij}$, is then drawn from a Bernoulli distribution,

$$Z_{ij} \sim \mathrm{Be}\left(\Phi\left(\mu + \alpha_i + \beta_j + \gamma I(X_i = X_j)\right)\right),$$

where $\gamma > 0$ ensures that ties are more likely to form between nodes with the same binary factor. This model draws elements from the $p_1$ model (Holland and Leinhardt, 1981) and the stochastic block model (Holland et al., 1983), but the characteristics of the networks it generates are known to be common to many real-world networks.
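The generative model above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function name `simulate_network`, its argument names, and the exclusion of self-loops are choices made here, not details given in the paper, and the quadratic loop is practical only for small networks.

```python
import math
import random
from statistics import NormalDist


def simulate_network(n, p, mu, gamma, sigma_a, sigma_b, rho, seed=0):
    """Draw one directed network from the probit edge model (hypothetical helper)."""
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF, written Phi in the text

    # Binary homophily factor: X_i ~ Bernoulli(p).
    x = [1 if rng.random() < p else 0 for _ in range(n)]

    # Correlated propensities (alpha_i, beta_i), built from two independent
    # standard normals so that Corr(alpha_i, beta_i) = rho.
    alpha, beta = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        alpha.append(sigma_a * z1)
        beta.append(sigma_b * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))

    # Edge indicator: Z_ij ~ Bernoulli(Phi(mu + alpha_i + beta_j + gamma * I(X_i = X_j))).
    # Self-loops are skipped; that is an assumption of this sketch.
    edges = set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            prob = phi(mu + alpha[i] + beta[j] + gamma * (x[i] == x[j]))
            if rng.random() < prob:
                edges.add((i, j))
    return x, edges
```

A call such as `simulate_network(n=200, p=0.5, mu=-2.0, gamma=0.5, sigma_a=0.3, sigma_b=0.3, rho=0.2)` returns the binary factors and the set of directed edges; these parameter values are placeholders, not the ones used in the paper.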
Define $Z_{i\cdot} = \sum_j Z_{ij}$ and $Z_{\cdot j} = \sum_i Z_{ij}$, the out-degree and in-degree respectively. Let $\bar{Z} = \sum_i Z_{i\cdot}/n$ be the grand mean degree.

Once the network has been established, we generate a pair of potential outcomes for each individual corresponding to treatment and control conditions. For the sake of exposition, we consider two parts to this effect: the average treatment effect, and a variability in effect due to the number of a person's connections, particularly the number of people whom they identify as friends (corresponding to out-degree). It is the average treatment effect that is typically of greatest interest. First, we generate the baseline (control) outcome according to the normal distribution,

$$Y_i(c) \sim N(\theta_1 X_i, 1),$$

where we choose $\theta_1 > 0$ so that those units with the positive binary factor tend to have higher outcomes. We then generate the outcome under exposure to the treatment according to the normal distribution,

$$Y_i(t) \sim N\left(Y_i(c) + \theta_2 (Z_{i\cdot} - \bar{Z}) + \tau, 1\right),$$

where $\tau$ is the average treatment effect, and $\theta_2 > 0$ means that units with higher out-degree tend to be more positively affected by the treatment.

It remains to choose the parameters $(\rho, \sigma_\alpha, \sigma_\beta, \mu, \gamma, \theta_1, \theta_2, \tau)$ to test the mechanism. For the sake of these trials, we choose parameters that lead to social networks with reasonable properties: a network with $n = 10{,}000$ nodes and a mean degree of 10, with the majority of individual degrees between 3 and 30, is adequate for our purposes. In particular, we choose $\theta_1 > 0$ and $\theta_2 > 0$ for clarity of explanation, though neither of these signs must be constrained in real examples.

Once we identify the status of our nodes as being from the treatment or the control, we construct the test statistic in the usual fashion,

$$\hat{\tau} = \frac{\sum_k Y_k(t) W_k I(k \in T)}{\sum_k W_k I(k \in T)} - \frac{\sum_k Y_k(c) W_k I(k \in C)}{\sum_k W_k I(k \in C)},$$

where the weight $W_k$ is uniform in the standard case, and $1/Z_{k\cdot}$ in the Horvitz-Thompson case.
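As a small illustration of the statistic just defined, the sketch below computes the weighted difference in means for either choice of weights. The function names and the dictionary-based inputs are assumptions made for this example; only the formula itself comes from the text.

```python
def weighted_difference(outcomes, weights, treated, control):
    """Weighted mean outcome over T minus weighted mean outcome over C."""
    def weighted_mean(group):
        total_weight = sum(weights[k] for k in group)
        return sum(outcomes[k] * weights[k] for k in group) / total_weight

    return weighted_mean(treated) - weighted_mean(control)


def uniform_weights(nodes):
    """Standard case: every unit gets weight 1."""
    return {k: 1.0 for k in nodes}


def inverse_out_degree_weights(out_degree):
    """Horvitz-Thompson case: weight 1 / Z_k. for each unit with positive out-degree."""
    return {k: 1.0 / d for k, d in out_degree.items() if d > 0}
```

Every unit in T or C follows at least one seed node, so its out-degree is positive and the inverse weight is always defined for the units that enter the statistic.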
Figure 1 demonstrates the distribution of p-values under the null hypothesis of $\tau = 0$ for comparing the original manipulated node sets M and U, as well as the experimental unit sets T and C. While the distribution is uniform as expected for M versus U, in a network with either latent homophily or a dependence on degree, the p-value distribution of the two-sided t-test for T versus C is heavily skewed towards zero.
[Figure 1. Panels: "Simple t test, tau=0: M vs U"; "Simple t test, tau=0: T vs C".]

Figure 1: Distributions of p-values under standard t-tests for various groups of nodes from simulated networks. Each group represents nodes under hypothetical treatment and control, under the null hypothesis, when latent homophily is present. Left: the comparison of M and U, who were chosen uniformly at random from the population; the distribution is uniform as expected. Right: the comparison of T and C, who are followers of M and U respectively, and are autocorrelated due to latent homophily; the distribution of p-values is shifted in the extreme towards zero.

Figure 2 shows how the situation improves with each change to the process. First, consider permutation tests for the null hypothesis. The simplest method is to permute the labels for membership in T and C directly, which yields incorrect coverage. The next correction is the use of the Horvitz-Thompson correction, so that only latent homophily still plays a role, increasing the effective variance. Continuing further, we permute instead the labels on M and U, and reassign the labels of T and C accordingly. This move from full node permutation to cluster-based permutation (permuting groups so that all followers of any particular seed node in M or U stay together) restores the uniform distribution of p-values for the null distribution.

Second, we explore the properties of bootstrap confidence intervals under the alternative hypothesis $\tau \neq 0$ under the same principles. As expected, Horvitz-Thompson estimation alone is insufficient to correct the distribution of p-values with respect to the true generative value of $\tau$. Simple block bootstrapping does not completely fix the problem; as in the Horvitz-Thompson estimator, the probability of sampling a cluster for the bootstrap must also be weighted by the inverse of the number of nodes to correct for oversampling, before the HT correction to the estimate is made once again.
[Figure 2. Panels: "Node Perm Test, tau=0"; "Node Bootstrap, HT"; "Node Perm Test, tau=0, HT"; "Block Bootstrap, HT"; "Block Perm Test, tau=0, HT"; "Weighted Block Bootstrap, HT".]
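The cluster-based permutation test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `assign_groups`, `weighted_difference` and `block_permutation_pvalue` are hypothetical, the input `follows` (a mapping from each follower to the set of seed nodes it follows) is an assumed data layout, and the add-one p-value is a common convention chosen here rather than something stated in the paper. The key step is that labels are shuffled among the seed nodes, and T and C are then rebuilt from those labels, so that followers of the same seed always move together.

```python
import random


def assign_groups(follows, seed_label):
    """T = followers of M seeds only; C = followers of U seeds only."""
    treated, control = [], []
    for node, seeds in follows.items():
        labels = {seed_label[s] for s in seeds}
        if labels == {"M"}:
            treated.append(node)
        elif labels == {"U"}:
            control.append(node)
        # Nodes exposed to both M and U (or to no seed) are left out.
    return treated, control


def weighted_difference(y, w, treated, control):
    def weighted_mean(group):
        return sum(y[k] * w[k] for k in group) / sum(w[k] for k in group)

    return weighted_mean(treated) - weighted_mean(control)


def block_permutation_pvalue(y, w, follows, seed_label, n_perm=2000, seed=0):
    """Two-sided p-value obtained by permuting the M/U labels of the seed nodes."""
    rng = random.Random(seed)
    treated, control = assign_groups(follows, seed_label)
    observed = weighted_difference(y, w, treated, control)

    seeds = sorted(seed_label)
    labels = [seed_label[s] for s in seeds]
    extreme, used = 0, 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        t, c = assign_groups(follows, dict(zip(seeds, labels)))
        if not t or not c:
            continue  # skip degenerate draws with an empty group
        used += 1
        if abs(weighted_difference(y, w, t, c)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (used + 1)
```

Passing uniform weights for `w` gives the plain block permutation test; passing inverse out-degree weights combines it with the Horvitz-Thompson correction.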
2.2 Real-World Networks with Autocorrelation

Our simulations demonstrate that the design and analysis previously described can lead to false discoveries of contagion if there is sufficient latent homophily present, let alone a differential effect size due to degree. For the sorts of networks that we have access to, and wish to conduct experiments on, it is prudent to investigate whether there is significant homophily on observable characteristics in these real-world networks, and whether this is a practical concern.

We consider a subset of the Twitter network representing Singapore followers of Korean Pop music (or K-Pop). Elicited behaviors on Twitter can include the spread of information, from simple hashtags, to the sharing of news articles, to a full viral marketing campaign that leads to the purchasing of real-world products. A typical treatment applied to the target group is a message with a link to a website and enrollment incentive, to see if it has any effect on the enrollment of their followers.

Our subset was derived from a complete, multi-stage snowball sample (Goodman, 1961) whose seed nodes originate in Singapore. We excluded private accounts for which we could not gather information, and expanded the set to include the neighbors of the remaining public users. Of these, we selected those users who follow one of 50 identified K-pop news sources. The final set contains 7,283 users.

To conduct our experiment, we choose L to be a uniform random sample of 500 nodes from the final set, randomly assigning 250 to M and the other 250 to U. We then identify our treatment and control groups from the entire population of Twitter users. We identify the set T as those users who are in neither M nor U, follow at least one user in M, and follow no users in U. Similarly, we identify the set C as those users in neither M nor U who follow at least one user in U, but follow no users in M.
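The sampling and group-construction steps in the preceding paragraph can be written down compactly. The sketch below assumes the follower graph is available as a list of `(follower, followee)` pairs; the function name `design_groups` and its arguments are hypothetical, and the paper's own data pipeline is not described at this level of detail.

```python
import random


def design_groups(population, follower_edges, n_seeds, seed=0):
    """Sample L, split it evenly into M and U, and derive T and C.

    population     -- node ids eligible to be seed nodes
    follower_edges -- iterable of (follower, followee) pairs; followers may
                      lie outside `population`
    """
    rng = random.Random(seed)
    # rng.sample returns the nodes in random order, so the first half
    # of the sample is itself a uniform random subset of L.
    sample = rng.sample(sorted(population), n_seeds)
    half = n_seeds // 2
    manipulated, unmanipulated = set(sample[:half]), set(sample[half:])

    follows_m, follows_u = set(), set()
    for follower, followee in follower_edges:
        if followee in manipulated:
            follows_m.add(follower)
        elif followee in unmanipulated:
            follows_u.add(follower)

    seeds = manipulated | unmanipulated
    treated = follows_m - follows_u - seeds  # follow M, not U, not a seed
    control = follows_u - follows_m - seeds  # follow U, not M, not a seed
    return manipulated, unmanipulated, treated, control
```

With the numbers from the text, this would be called with the 7,283-user set as `population` and `n_seeds=500`.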
At this point, to demonstrate the properties of this sampling mechanism when there is no treatment, we apply our phantom manipulation (such as Send No Message, or SNM) to users in M and do absolutely nothing (DAN) to users in U. We then measure the outcomes of interest for T and C and perform significance tests in the usual manner. In particular, we measure four response variables: time of last tweet, friend count, whether the user is in the Singapore time zone, and whether the user's language is English. For the first two we perform a standard t-test and for the last two we perform Fisher's exact test. We compare the responses for two pairs: between M and U and between T and C. In all, we perform a total of eight tests and collect eight p-values. We repeat the entire process (sample M and U, create T and C, and measure the outcomes) 1000 times. Since there is no actual difference between our mock treatment and control conditions, a proper significance test should assign p-values according to the Uniform(0, 1) distribution.
Figure 3 suggests that while the p-values for comparing M to U do indeed follow an approximately uniform distribution (since M and U are drawn uniformly from the population), there are serious deviations from the uniform distribution for every outcome when comparing T to C. In particular, p-values less than 0.05 occur more often than would be expected. This demonstrates the practical implications for studies that treat such p-values as indicators of statistical significance.

It is true that the network autocorrelation for both tweet time and friend count is plausibly caused by contagious mechanisms; a tweet by a user might cause several followers to retweet, or to follow the same users. It is far less plausible that time zone and language are contagious in this way. Friends could conceivably move closer to each other, or learn a new language, but structural homophily seems a more reasonable explanation for the observed autocorrelation; that is, those with a common language and in a common location are more likely to become friends (or followers, in this case) than those with different languages or in different locations. In either case, the difference between the T and C groups clearly cannot be explained by the different effects of our treatment and control conditions. A real-world test we might perform would tend to falsely conclude that a certain manipulation of one user has an effect on their followers' behavior.

It would be to an experimenter's benefit to pre-screen the data in this manner, by first performing simulated no-treatment experiments as just described; if tests comparing the outcome of interest (e.g., whether a subscription was purchased in the previous month) between the two groups yield near-uniform p-values, then one can conduct the desired experiment with different treatments. This is less than ideal, since there is no guarantee that network behavior is static over time, and in any case it is better to proceed according to the methods we have described.
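One way to summarize such a pre-screen is to check how far the collected placebo p-values are from Uniform(0, 1). The sketch below reports the share of p-values under 0.05 and the one-sample Kolmogorov-Smirnov distance from the uniform CDF. The choice of these two summaries, the function name `uniformity_summary`, and the use of the asymptotic 5% critical value 1.358/sqrt(n) are assumptions of this example; the paper itself relies on visual inspection of the p-value histograms.

```python
import math


def uniformity_summary(p_values, alpha=0.05):
    """Return (rejection rate, KS distance to Uniform(0,1), asymptotic 5% KS cutoff)."""
    n = len(p_values)
    ordered = sorted(p_values)

    # Share of placebo tests that would be declared significant at level alpha.
    rejection_rate = sum(p < alpha for p in ordered) / n

    # KS distance: largest gap between the empirical CDF and F(p) = p,
    # checked just before and just after each jump of the empirical CDF.
    ks = max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ordered))

    # Large-sample critical value for the KS statistic at the 5% level.
    ks_cutoff = 1.358 / math.sqrt(n)
    return rejection_rate, ks, ks_cutoff
```

A rejection rate well above `alpha`, or a KS distance above the cutoff, would suggest that the T versus C comparison is not behaving like a valid test on the network at hand.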
[Figure 3. Columns: "M v U"; "T v C". Rows: Tweet Time; Friend Count; Singapore Time; English.]

3 QUASI-EXPERIMENTAL APPLICATION

We can further verify the usefulness of the method by testing it in quasi-experimental situations. The propagation of information on Twitter, for example, may be treated as entirely endogenous, making a true experimental design quite difficult to achieve. However, we can still consider a quasi-experimental design to estimate network treatment effects.

Consider another extraction of the Singapore Twitter network; in this subset, we take those users with 100 or more total followers and construct a new sub-network. On August 9, 2012, the country celebrated Singapore's 47th National Day with a parade, and a unique hashtag for the event, #ndp2012, was expressed by many users before and during the day, peaking while the parade itself was in progress. The particular hashtag was publicly advertised in advance of the event, rather than arising spontaneously from one or more users of the microblogging service.

For a quasi-experimental design, we take the users who used the hashtag during a one-hour period before the beginning of the parade as the initial manipulated group M; the treated set T consists of those users' followers. We then select a control set U by matching on in-degree and out-degree respectively; the followers of U then become the control group C. The outcome of interest is the use of the hashtag in each of the three hours that followed.

As shown in Table 1, the original t-test to compare the use of the hashtag in T and C gives a p-value of 0.02 for the fraction of uses of the hashtag in the treated group compared to the control, which is statistically significant for α = 0.05; the effective use rate is roughly doubled, which suggests a reasonably strong effect size overall. However, latent homophily appears to play a strong role in this difference as well; using the block permutation test, the p-value now rises to 0.11, removing statistical significance from this result.

Property                              Value
Size of subnetwork                    4586
Total Uses in Hour 1                  33
Total Uses in Hours
Fraction of Uses (T)
Fraction of Uses (C)
Standard t-test p-value (1-side)      0.02
Block Permutation p-value (1-side)    0.11

Table 1: Properties of the propagation test of hashtag #ndp2012 on August 9, 2012 for the Singapore Twitter community. The unadjusted t-test p-value is less than 0.05; with the appropriate test, correcting for homophily and follower bias, the p-value rises dramatically.

4 DISCUSSION

Many attempts have been made to identify influence in social networks. The difficulty of distinguishing social influence from latent homophily in observational studies has been demonstrated, however, even for dynamic studies over long time periods.
One prescription for identifying influence is the design of randomized experiments. We demonstrate in this paper, however, that even in the case of randomized experiments, researchers can still mistake homophily for influence if adequate care is not given to the method of randomization.

We have shown how one can correct for latent homophily and degree bias when analyzing the results from a particular experimental design. Given these complications, it is worth asking how we can design experiments with greater power. The experimental protocol we describe implies that we can only manipulate the status of selected nodes, then observe the outcomes on the rest of the network, beginning with their neighbors. The very notion that homophily exists, however, suggests that we can get the most leverage by using a mechanism that can block-randomize on members of the same cluster. Manipulating the exposure of each individual in a group to the treatment would seem to be the most effective means of achieving that. That is, select a sample of users for inclusion in M. Then, for each user in M, randomly select a subset of their followers and remove their ability to see the results of the manipulation.

We are not always in a position to manipulate what an individual sees, however, and are often limited to the incentive-style framework described in this paper. We then seek to ask, given a fixed number of manipulations, if there is a better way to choose the set M. We may suspect, for example, that users who follow only a few other users are more likely to be influenced by a manipulation on a user they follow. The original sampling scheme will under-sample users with low out-degree. Even if corrected for in analysis, the amount of under-sampling may lead to very low power. One easy correction for this is to do a weighted sampling of the M group, with higher selection probability given to users whose followers have low out-degree. There is an inevitable trade-off, however, between the selection bias corrected for in the groups T and C and the selection bias introduced for the groups M and U. Improving design may depend heavily on modeling assumptions, prior assessment of parameter variability, and the specific domain.

Acknowledgements

This work was partly funded through the authors' involvement with the Living Analytics Research Center. Thanks to Xiao Hui Tai and Agus Kwee for preparing the Twitter networks for analysis.
References

Aral, S. and Walker, D. (2012). Identifying Influential and Susceptible Members of Social Networks. Science, 337.

Bapna, R. and Umyarov, A. (2012). Do Your Online Friends Make You Pay? A Randomized Field Experiment in an Online Music Social Network. NBER Working Paper Series.

Centola, D. (2010). The Spread of Behavior in an Online Social Network Experiment. Science.

Centola, D. (2011). An Experimental Study of Homophily in the Adoption of Health Behavior. Science, 334.

Gile, K. J. and Handcock, M. S. (2010). Respondent-Driven Sampling: An Assessment of Current Methodology. Sociological Methodology, 40.

Goodman, L. (1961). Snowball sampling. Annals of Mathematical Statistics.

Heckathorn, D. D. (1997). Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations. Social Problems.

Holland, P., Laskey, K. and Leinhardt, S. (1983). Stochastic Block Models: First Steps. Social Networks.

Holland, P. and Leinhardt, S. (1981). An Exponential Family of Probability Distributions for Directed Graphs. Journal of the American Statistical Association.

Horvitz, D. G. and Thompson, D. J. (1952). A Generalization of Sampling Without Replacement From a Finite Universe. Journal of the American Statistical Association.

Manski, C. F. (1993). Identification of Endogenous Social Effects: The Reflection Problem. Review of Economic Studies.

Shalizi, C. R. and Thomas, A. C. (2011). Homophily and Contagion Are Generically Confounded in Observational Social Network Studies. Sociological Methods and Research.
Exact Nonparametric Tests for Comparing Means - A Personal Summary
Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola
Non-Inferiority Tests for Two Means using Differences
Chapter 450 on-inferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for non-inferiority tests in two-sample designs in which the outcome is a continuous
The Null Hypothesis. Geoffrey R. Loftus University of Washington
The Null Hypothesis Geoffrey R. Loftus University of Washington Send correspondence to: Geoffrey R. Loftus Department of Psychology, Box 351525 University of Washington Seattle, WA 98195-1525 [email protected]
Introduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
22. HYPOTHESIS TESTING
22. HYPOTHESIS TESTING Often, we need to make decisions based on incomplete information. Do the data support some belief ( hypothesis ) about the value of a population parameter? Is OJ Simpson guilty?
Introduction to Hypothesis Testing OPRE 6301
Introduction to Hypothesis Testing OPRE 6301 Motivation... The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about
Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the p-value and a posterior
Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)
Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption
Non-Inferiority Tests for Two Proportions
Chapter 0 Non-Inferiority Tests for Two Proportions Introduction This module provides power analysis and sample size calculation for non-inferiority and superiority tests in twosample designs in which
NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem)
NONPARAMETRIC STATISTICS 1 PREVIOUSLY parametric statistics in estimation and hypothesis testing... construction of confidence intervals computing of p-values classical significance testing depend on assumptions
individualdifferences
1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,
Nonparametric statistics and model selection
Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Inference for two Population Means
Inference for two Population Means Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison October 27 November 1, 2011 Two Population Means 1 / 65 Case Study Case Study Example
Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.
Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative
Mind on Statistics. Chapter 12
Mind on Statistics Chapter 12 Sections 12.1 Questions 1 to 6: For each statement, determine if the statement is a typical null hypothesis (H 0 ) or alternative hypothesis (H a ). 1. There is no difference
Lecture Notes Module 1
Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific
Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test
Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Binomial lattice model for stock prices
Copyright c 2007 by Karl Sigman Binomial lattice model for stock prices Here we model the price of a stock in discrete time by a Markov chain of the recursive form S n+ S n Y n+, n 0, where the {Y i }
Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing
Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters
Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples
Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The
A Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420
BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 1. Which of the following will increase the value of the power in a statistical test
This chapter discusses some of the basic concepts in inferential statistics.
Research Skills for Psychology Majors: Everything You Need to Know to Get Started Inferential Statistics: Basic Concepts This chapter discusses some of the basic concepts in inferential statistics. Details
Hypothesis testing. c 2014, Jeffrey S. Simonoff 1
Hypothesis testing So far, we ve talked about inference from the point of estimation. We ve tried to answer questions like What is a good estimate for a typical value? or How much variability is there
Extracting Information from Social Networks
Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun,
Non-Inferiority Tests for One Mean
Chapter 45 Non-Inferiority ests for One Mean Introduction his module computes power and sample size for non-inferiority tests in one-sample designs in which the outcome is distributed as a normal random
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Tests for Two Proportions
Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics
The Variability of P-Values. Summary
The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 [email protected] August 15, 2009 NC State Statistics Departement Tech Report
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
Numerical Methods for Option Pricing
Chapter 9 Numerical Methods for Option Pricing Equation (8.26) provides a way to evaluate option prices. For some simple options, such as the European call and put options, one can integrate (8.26) directly
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
Interpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
The Wilcoxon Rank-Sum Test
1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
1 Sufficient statistics
1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:
Chapter 7 Notes - Inference for Single Samples You know already for a large sample, you can invoke the CLT so: X N(µ, ). Also for a large sample, you can replace an unknown σ by s. You know how to do a
Analysis of Variance ANOVA
Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.
INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS
INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem
1 Prior Probability and Posterior Probability
Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which
Statistiek I. Proportions aka Sign Tests. John Nerbonne. CLCG, Rijksuniversiteit Groningen. http://www.let.rug.nl/nerbonne/teach/statistiek-i/
Statistiek I Proportions aka Sign Tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://www.let.rug.nl/nerbonne/teach/statistiek-i/ John Nerbonne 1/34 Proportions aka Sign Test The relative frequency
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
Point Biserial Correlation Tests
Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable
Chapter 3. Sampling. Sampling Methods
Oxford University Press Chapter 3 40 Sampling Resources are always limited. It is usually not possible nor necessary for the researcher to study an entire target population of subjects. Most medical research
Chapter 2. Hypothesis testing in one population
Chapter 2. Hypothesis testing in one population Contents Introduction, the null and alternative hypotheses Hypothesis testing process Type I and Type II errors, power Test statistic, level of significance
Principle of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
Stat 5102 Notes: Nonparametric Tests and. confidence interval
Stat 510 Notes: Nonparametric Tests and Confidence Intervals Charles J. Geyer April 13, 003 This handout gives a brief introduction to nonparametrics, which is what you do when you don t believe the assumptions
Multiple Imputation for Missing Data: A Cautionary Tale
Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust
Recall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
A General Approach to Variance Estimation under Imputation for Missing Survey Data
A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey
The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].
Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real
Time needed. Before the lesson Assessment task:
Formative Assessment Lesson Materials Alpha Version Beads Under the Cloud Mathematical goals This lesson unit is intended to help you assess how well students are able to identify patterns (both linear
Reflections on Probability vs Nonprobability Sampling
Official Statistics in Honour of Daniel Thorburn, pp. 29 35 Reflections on Probability vs Nonprobability Sampling Jan Wretman 1 A few fundamental things are briefly discussed. First: What is called probability
Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013
Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives
Research Methods & Experimental Design
Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and
ONS Methodology Working Paper Series No 4. Non-probability Survey Sampling in Official Statistics
ONS Methodology Working Paper Series No 4 Non-probability Survey Sampling in Official Statistics Debbie Cooper and Matt Greenaway June 2015 1. Introduction Non-probability sampling is generally avoided
Variables Control Charts
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables
Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10 Introduction to Discrete Probability Probability theory has its origins in gambling analyzing card games, dice,
1 Error in Euler s Method
1 Error in Euler s Method Experience with Euler s 1 method raises some interesting questions about numerical approximations for the solutions of differential equations. 1. What determines the amount of
Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
Tests for One Proportion
Chapter 100 Tests for One Proportion Introduction The One-Sample Proportion Test is used to assess whether a population proportion (P1) is significantly different from a hypothesized value (P0). This is
