(a) Write a paragraph summarizing the key characteristics of this study. In particular, identify:

Transcription

1 Data Collection: (a) Write a paragraph summarizing the key characteristics of this study. In particular, identify: Observational units and sample size The explanatory variable and response variable (as set up in the research conjecture) and classify each as quantitative or categorical The type of study (observational or randomized experiment). Justify your answer. A brief description of the sample used in the study In particular, was this a random sample selected from a larger population? >> The observational units are 40 student volunteers. The sample size is 40: student volunteers at University College, London. This was not a random sample, the students volunteered for the study. The response variable is brain size, and the explanatory variable is number of Facebook friends; both are quantitative. This is an observational study; the explanatory variable was not imposed by the researchers. Descriptive Statistics: (b) In describing scatterplots, our goal is to describe the association between the two variables. To do so, we focus on three characteristics: direction, strength, and form. Is the direction of the association as expected by the researchers? State specifically what the researchers believed the relationship would be. Would you consider the association strong? (In other words, does knowing a person s number of Facebook friends help you to accurately predict their brain density in this region?) Explain. Does the pattern of the association seem reasonably well modeled by a line? (Or, is there curvature or some other prominent pattern in the plot?) 1

2 >> The scatterplot appears linear, with a fairly weak positive correlation. Yes, the direction of the association is in the direction expected by the researchers; they conjectured that Facebook use swells the size of brain regions devoted to tracking and maintaining relationships, and therefore that there is a positive correlation between the density of those areas and the number of Facebook friends an individual has. The dots are not very concentrated along a specific line and show a weak but present positive association; I wouldn t feel very comfortable predicting someone s brain density in this region because there appears to be quite a range of brain densities for the same number of Facebook friends. good Though the scatterplot isn t obviously concentrated along a specific line, the pattern of association would be best modeled by a line as opposed to a curve. Numerical Summary: (c) Report the value of this correlation coefficient. Does the correlation coefficient indicate a positive or negative linear association here? Does it seem strong (r close to 1 or -1) or weak (r close to 0)? >> r=.366. This correlation coefficient is closer to zero than it is to 1; its positive value indicates a positive linear association. ok Interpreting the Least Squares Regression Equation: Include screen capture of scatterplot with regression line superimposed. 2

3 (d) Identify the values of the intercept and the slope, and (following the example) provide a detailed (statistical) interpretation of both the slope and intercept coefficients of the least squares regression line (red) in the Facebook/Brain Density study context. Hint: Remember that the explanatory variable here is how many hundred Facebook friends the person reports. Slope value: >> Interpretation: >> This slope coefficient of.2009 is the predicted change in the value of the brain density (of this certain region) associated with a oneunit increase in the number of hundreds of Facebook friends an observational unit has. The predicted increase in brain density (of this brain area) associated with each additional one hundred friends is.2009 arbitrary units ; how much denser we expect a person's grey matter to be if the number of Facebook friends is one hundred friends greater. Intercept value: >> Interpretation: >> This intercept coefficient of (-.7404) is the predicted value of the brain density (of this certain region) when the number of how many hundreds of Facebook friends the observational unit (student) has is zero.this model predicts that a person with zero Facebook friends has brain density (in that region of the brain) of Which doesn t make sense for this value, but appears to model the majority of the data well. Statistical Inference - Simulation: (e) If our research conjecture is that more Facebook friends will tend to correspond to larger brain density values, what does this imply about the value of the population slope? State this, in symbols, as the alternative hypothesis. >> Ha: >0 (f) Did you obtain the same regression line as the researchers (in red)? Is the association still positive in the simulated scatterplot? Is the relationship stronger or weaker than the one observed by these researchers (how are you deciding)? >> No, the new line is not the same, nor positive as our original regression line; the blue shuffled regression line displays a negative correlation coefficient of The relationship is weaker than the one observed by these researchers because itclarify is closer to zero than our previous value of.366, reflecting a weaker association between the explanatory and response variables; the closer to 1 (or -1), the stronger the association. (g) Did you obtain the same (blue) regression line the second time? >> No; r=.040, noting an extremely weak but regardless, positive association, different from the first shuffled regression line. (h) Notice that the two sample slope values you generated have been added to the dotplot on the right. Suppose you repeat these shuffling 1000 times, where do you think this distribution will be centered? Why? >> I predict the distribution will be centered around zero because with the null distribution we are assuming the null hypothesis is true, and that there is ZERO association between brain density and number of Facebook friends. (in population) 3

4 (i) Press the Shuffle Y-values button 8 more times. Describe the pattern you are seeing in how the regression lines on the scatterplot vary from sample to sample (what pattern is there to the blue lines on the scatterplot) as if to someone who couldn't see the same picture. >> Sample to sample, the shuffled regression lines vary from having positive slopes to negative slopes, but always intersect at the same point on the scatterplot. The blue lines do not display very slanted slopes and are always relatively close together; they are not as extreme in direction as the original red regression line. Does the observed regression line for this study (in red) appear to be extreme (compare to the simulated blue lines)? >> The observed regression line does not lie within the overlay of the blue regression lines and is definitely more extreme than the blue regression lines, but not distinctly so. (j) Describe the shape, center (mean), and spread (standard deviation) of the dotplot of the resulting null distribution of sample slopes. Including: what are the observational units and variable of this distribution? Also, make sure you describe what the standard deviation is measuring (Hint: What is the variable?). >> The null distribution is normal and symmetric, centered around 0.00 (mean), with a standard deviation of.086 as shown by the small amount of variability and short spread. The standard deviation is measuring the variability in the calculated slopes by all 1000 shuffles, specifically how similar the slopes of the simulated null distribution are to one another, and in relation to the center of the distribution, all under the assumption there is no association between grey matter density and number of Facebook friends. The observational units are the 1000 shuffles of 40 cards with their corresponding values (the values indicate the intersection of the density value and the number of Facebook friends value on the scatterplot), and the variable is the slope of each resulting regression line. (k) Include a screen capture of your output (null distribution and p-value) and provide a one-sentence summary of this p-value in this context. Paste a copy of the null distribution of sample slopes with shading here. 4

5 p-value interpretation>> If I shuffle a set of 40 cards with values representing the intersections? of the density value and number of Facebook friends many, many times, I would expect to see a resulting slope of.2009 (for the regression line) or more extreme (greater) in only.7% of shuffles. Assuming the null hypothesis to be true - 1/2 (l) What conclusion will you draw from this p-value? >> This p-value of.007 is less than.05 and therefore statistically significant; I can reject the null hypothesis that states there is NO association between number of Facebook friends and grey matter density in regions of the brain responsible for social perception and associative memory. in favor of? (m) Using the standard deviation you found in (j), how many standard deviations is the observed sample slope (0.2) from the hypothesized slope (0.0)? Show your work. >> ( )/.086= The observed sample slope is standard deviations from the hypothesized slope. Statistical Inference Theoretical Model (n) Does the theoretical p-value provide a reasonable approximation to the simulated p-value? >> The theoretical p-value provides a reasonable approximation to the simulated p-value; both are statistically significant, less than.05. What two values are you comparing? -1/2 Include a copy of the overlaid t-distribution and your Regression Table here. 5

6 (o) How does the SE value corresponding to the friends row compare to the standard deviation of the slopes found in (j)? >> The SE value for friends is.0830, which is less than but very close to the standard deviation I found in J,.086. (p) How does this t-value for the friends row compare to what you found in (m)? >> This t-value for friends is 2.42, which is also very similar to what I found in M, which represents how many standard deviations the observed statistic is from the mean. (q) How does the (two-sided) p-value for the friends row of the Regression Table output compare to the one-sided p-value you estimated in (k)? >>The two sided p-value for the friends row of the regression table is.0204, which is greater than the one sided p-value I estimated in K, but still statistically significant (less than.05 and very small) regardless. why is it larger? (r) Using the Two Standard Deviation method, use the observed slope and the standard error reported from the Regression Table output (or the dotplot of simulated slopes) to approximate a 95% confidence interval for the population slope. Show your work and write a one-sentence interpretation of your interval. Calculation >>.2+2(.0830)= (.083)=.034 Confidence Interval: (.034,.366) 6

7 Interpretation >> I m 95% confident that in the population, each additional one hundred Facebook friends is associated with an increase in arbitrary units on average between.034 and.366 in brain matter density. Conclusions: (s) Write a few sentences summarizing your analysis for this research study, in particular: What did you learn about the relationship between Facebook friends and brain density measurements for this sample from the scatterplot and the correlation coefficient about the sample data? State the hypotheses that we were testing about the population in words (Hint: Back in part e). Write a sentence interpreting the one-sided p-value you found in the context of the Facebook study (What is it the probability of?). What are your final conclusions about the relationship between Facebook friends and brain density for this population and how do they follow from your p-value? What is your estimate of the population slope, make sure you include an interpretation of what we mean by 'slope' in this context? What population are you willing to generalize these results to? (Briefly justify.) Are you able to draw a cause-and-effect conclusion between Facebook friends and brain density in this population? (Briefly justify.) What concerns do you have about how these data were collected and how that might impact the conclusions you can draw form this study? (Your answer should discuss at least one potential non-sampling error.) >> According to the statistically significant p-value and t-statistics calculated in this study, I learned that there may be a positive relationship between the number of stated Facebook friends and the density of certain areas in the brain responsible for social perception and associative memory, or at least that in this group of students more distinction on sample vs. population/ First just talk about the population? I can reject the assumption that there is NO association between the two. The observed correlation coefficient of.366 does not show a strong association, but regardless shows a statistically significant association between the observed variables. In testing the population, we hypothesized (our null) that there is no association between number of Facebook friends and brain density; our alternative hypothesis stated that there IS an association, and that the association is positive; that with an increase in Facebook friends comes an increase in brain density of those specific regions. My repeated p-value interpretation: If I shuffle a set of 40 cards with values representing the intersections of the density value and number of Facebook friends many, many times, (null hypothesis) I would expect to see a resulting slope of.2009 or more extreme in only.7% of shuffles. Essentially, the probability of obtaining results as extreme as these is highly unlikely (only.7%) under the assumption the null hypothesis is true. I cannot draw causation because this sample is not random, nor is random assignment used. There is potential for confounding variables in this study, primarily the idea that those who volunteer may in general be more sociable, outgoing people, with naturally denser brain regions responsible for sociability. This isn t a confounding variable, it s a source of sampling bias As well, the students self-reported the number of friends, and may have stated numbers greater than accurate.!!! Mixed up your random sampling/random assignment and consequences a bit too much -1/2 Keep in Mind: There is a difference between the strength of a relationship and the strength of the evidence against the null hypothesis 7

8 Application: (a) Create the scatterplot and calculate the least squares line. Also include the Parameter Estimates output. Include a copy of scatterplot and regression line Include a copy of Regression Table output (b) How do does the regression line compare to the follow-up data for 40 subjects that you examined in this lab? Comparison >> The two lines are very similar in slope and intercept while just eye-balling them on the scatterplots. As well, the equations model each other very closely: Original study: x Latter study: x (c) From the Regression Table output, how do the standard error for the slope, t Ratio, and p-value for the slope compare to the original analysis on the follow-up data? Explain why these changes make sense based on what is different between the two data sets. 8

9 >> The SE in this latter study is smaller than in the original due to the greater sample size; the t-stat is greater, and the p-value is smaller, both indicating stronger evidence against the null hypothesis, also due to the greater sample size. a bit more on why the larger sample size does this? -1/2 Extra Credit: Why do you think this method of "look at a bunch of regions to see which are significant" is often criticized? (Hint: If I run lots of tests of significance using a 5% level of significance, how often will I reject the null hypothesis when I really shouldn't have?) >> When using this method of looking at all the regions individually to see which are statistically significant it is easier to reject the null hypothesis more often than not because a 5% level of significance is not that much when there is a great potential for outliers; the individual sets of data, if extreme, have a greater possibility to skew the p-value in a direction that favors the alternative hypothesis, which might not and probably doesn t represent the overall data. More than if you run 100 tests like this, 5 will reject the null hypothesis when it shouldn t +1/2 9