Experimental design and inferential statistics: An introduction. Dr. Alissa Melinger, School of Psychology, University of Dundee


Structure of Tutorial Block 1: Background and fundamental underpinnings to inferential statistics. Block 2: Tests for evaluating differences. Block 3: Tests for evaluating associations. Block 4: New analyses that I'm quite excited about. Plus, your data and questions.

Background and fundamental underpinnings to inferential statistics Block 1

The Goal We observe some behavior of individuals, the economy, our computer models, etc. We want to say something about this behavior. We'd like to say something that extends beyond just these observations, to future behaviors, past behaviors, unobserved behaviors.

Types of Analysis Descriptive statistics: summarizing and describing the important characteristics of the data. Inferential statistics: deciding whether a pattern, difference, or relation found with a sample is representative and true of the population.

Definitions Population: the entire set of entities that are classified together. Examples: population of native German speakers (large group); population of native Lakhota speakers (medium sized group); population of 5 year old Lakhota speakers (small group). Data from a population: exhaustively collected data from all relevant individuals. No need for inferential statistics. Sample: a subset of entities that make up a population. Example: random selection of 1000 native German speakers. Data from a sample: collected data from a subset of your population. Inferential statistics help determine if the sample reflects the population. Key point: everyone is unique.

Generalizing from a sample We want some measure of reassurance that our observations are representative of the population, and not just characteristic of the sample. Any DIFFERENCE we observe in descriptive statistics needs to be evaluated.

Is this difference REAL? Descriptive difference: Milk costs .69 at Plus and .89 at Lidl. Is this a real difference? Subjective difference: Is the difference important enough to me? Is it worth my while to travel farther to pay less? Statistical difference: Is Plus generally cheaper than Lidl? How representative of the prices is milk? Are all Plus stores cheaper than all Lidl stores? How representative of all Plus stores is my Plus?

Is a difference REAL? To answer this question we could go through the store and compare every item in each store and call every Plus and Lidl in the world. If we can get information on a whole population, we don't need inferential statistics. OR we can look at a Sample of products and stores and then use statistics to determine whether our observation is true of other unobserved products and shops. It is important to choose the sample well (not only from the chocolate aisle). Statistics help us determine how well our Sample represents the Population.

A simple model of the data Different statistical methods attempt to build a model of the data using hypothesized factors to account for the characteristics of the observed pattern. One simple model of the data is the MEAN. Mode and median are other simple models, as are distributions, counts, spreads, range, etc.

The mean
Subjects | A | B | C | D | E | F
# siblings | 1 | 3 | 2 | 0 | 4 | 1
Mean # siblings = 1.83. How well does the mean model the data? The difference between each observation and the mean is the ERROR.
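As a small illustration of the mean as a model, the sketch below recomputes the mean of the sibling counts on this slide and the error for each subject. The variable names are my own and are not part of the tutorial.

```python
from statistics import mean

# Sibling counts for subjects A-F, taken from the slide.
siblings = {"A": 1, "B": 3, "C": 2, "D": 0, "E": 4, "F": 1}

# The mean is the model: one number standing in for every observation.
model = mean(list(siblings.values()))

# Error = observation minus model, one value per subject.
errors = {subject: count - model for subject, count in siblings.items()}

print(f"mean # siblings = {model:.2f}")
for subject, error in errors.items():
    print(f"subject {subject}: error = {error:+.2f}")
```

No subject actually has 1.83 siblings, which is why the errors matter: they show how far the model is from each real observation.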

Error Error is crucial to inferential statistics. If you don't have Error in your data (e.g., if you are testing a model and 100 tests would give you the same result because the model is deterministic), you do not need to do inferential statistics.

Variance Variance is the average squared error between the mean and the observations. The sum of the errors, $\sum_i (x_i - \bar{x})$, is offset by positive and negative numbers, so take the square of each error value. Sum of squared errors: $SS = \sum_i (x_i - \bar{x})^2$. A large number suggests the mean is a bad estimate of # of siblings, but SS will also increase the more data you collect. Divide the sum of squared errors by N-1: Variance $s^2 = SS/(N-1)$.
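The steps on this slide can be traced with the sibling counts from the earlier slide. This is only a sketch with my own variable names; the final line cross-checks the hand computation against `statistics.variance` from the Python standard library, which also divides by N-1.

```python
from statistics import mean, variance

siblings = [1, 3, 2, 0, 4, 1]  # subjects A-F from the earlier slide
n = len(siblings)
x_bar = mean(siblings)

# Raw errors cancel out: positives and negatives sum to (almost exactly) zero.
sum_of_errors = sum(x - x_bar for x in siblings)

# Squaring removes the sign, giving the sum of squared errors (SS).
ss = sum((x - x_bar) ** 2 for x in siblings)

# Dividing by N-1 gives the sample variance.
s2 = ss / (n - 1)

print(f"sum of errors = {sum_of_errors:.10f}")
print(f"SS            = {ss:.3f}")
print(f"variance s^2  = {s2:.3f}")
print(f"library check = {variance(siblings):.3f}")
```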

Standard Deviation Variance gives us a measure in units squared, so it is not directly comparable to the units measured. Standard Deviation is the square root of the variance and is a measure of how well the mean represents the data. $s = \sqrt{SS/(N-1)}$

Sampling Sampling is a random selection of representative members of a population. If you had access to all members of a population then you would not need to conduct inferential statistics to see whether some observation generalizes to the whole population. Normally, we only have access to a (representative) subset. Most random samples tend to be fairly typical of the population, but there is always variation and the potential of selection bias.
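To make the point about sample-to-sample variation concrete, here is a small simulation sketch. The population is invented for illustration (10,000 scores drawn from a normal distribution with mean 100 and SD 15), and the sample size and number of samples are arbitrary choices of mine.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 scores, invented for illustration.
population = [random.gauss(100, 15) for _ in range(10_000)]
population_mean = mean(population)

# Draw 200 random samples of 50 individuals and record each sample mean.
sample_means = [mean(random.sample(population, 50)) for _ in range(200)]

print(f"population mean      : {population_mean:.2f}")
print(f"smallest sample mean : {min(sample_means):.2f}")
print(f"largest sample mean  : {max(sample_means):.2f}")
print(f"mean of sample means : {mean(sample_means):.2f}")
```

Most sample means land close to the population mean, but some fall noticeably above or below it, which is the variation inferential statistics has to account for.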

Standard Error Standard errors (SE) are similar to SD but they apply to sample means rather than individual scores. Standard errors give you a measure of how representative your sample is of the population. A large standard error means your sample is not very representative of the population. A small SE means it is representative. $SE = s/\sqrt{N}$
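As a worked example, the sketch below computes the standard deviation and the standard error of the mean for the sibling counts used earlier, assuming the usual formula of s divided by the square root of N. Variable names are my own.

```python
from math import sqrt
from statistics import stdev

siblings = [1, 3, 2, 0, 4, 1]  # subjects A-F from the earlier slide

s = stdev(siblings)            # sample standard deviation, sqrt(SS / (N - 1))
se = s / sqrt(len(siblings))   # standard error of the mean, s / sqrt(N)

print(f"s  = {s:.3f}")
print(f"SE = {se:.3f}")
```

Because N sits under the square root in the denominator, collecting more data shrinks the SE even when the spread of individual scores stays the same.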

Types of tests / types of data Which test to choose depends on the type of data you have and the question you are asking. Parametric tests have certain assumptions about the data. Non-parametric tests make far fewer assumptions; in particular, they do not assume a particular distribution.

Parametric Assumptions about your data: normally distributed; independent **; homogeneity of variance; at least interval scale **

Normality Your data should be from a normally distributed population. Normal distributions are symmetrical bell-shaped distributions with the majority of scores around the center.

Normal Curves Normal curves can be defined by their mean and variance. Z-distribution has mean = 0 and variance = 1

Normal Curves 95% of the time, a score sampled from the population will fall in the central (white) part of the distribution. Parametric tests assume your data conform to this pattern. About 95% of cases fall within 2 standard deviations of the mean.
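The "2 standard deviations" figure is a rounded value, and the sketch below checks it against the standard normal (Z) distribution using `statistics.NormalDist` from the Python standard library. Variable names are my own.

```python
from statistics import NormalDist

z_dist = NormalDist(mu=0, sigma=1)  # the Z-distribution: mean 0, variance 1

# Proportion of cases within 2 standard deviations of the mean.
within_2_sd = z_dist.cdf(2) - z_dist.cdf(-2)

# Cutoff that leaves exactly 2.5% in each tail, i.e. 95% in the middle.
z_95 = z_dist.inv_cdf(0.975)

print(f"proportion within +/- 2 SD  : {within_2_sd:.4f}")
print(f"cutoff for the central 95% : +/- {z_95:.2f} SD")
```

Two standard deviations cover slightly more than 95% of cases; the exact 95% boundary sits at roughly 1.96 standard deviations.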

Homogeneity of Variance The variance should not change systematically throughout the data. When you test different groups of subjects (monolinguals vs. bilinguals; test vs. control; trained vs. untrained), their variances should not differ. If you test two corpora, the variance should not differ.
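A rough first look at homogeneity of variance is to compute each group's variance and compare them. The sketch below does only that; the reaction times are invented for illustration, and comparing the larger variance to the smaller one is a simple descriptive check of my own choosing, not a formal test from the tutorial.

```python
from statistics import variance

# Hypothetical reaction times in ms for two groups, invented for illustration.
monolinguals = [520, 540, 560, 580, 600]
bilinguals = [500, 550, 570, 590, 640]

var_mono = variance(monolinguals)
var_bili = variance(bilinguals)

# Ratio of the larger to the smaller variance; values near 1 mean similar spread.
ratio = max(var_mono, var_bili) / min(var_mono, var_bili)

print(f"variance, monolinguals : {var_mono:.1f}")
print(f"variance, bilinguals   : {var_bili:.1f}")
print(f"larger / smaller       : {ratio:.2f}")
```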

Independence Data from different subjects (speakers, sentences) are independent. If trial n influences behavior on trial n+i, then trials are not independent. If two participants are related (friends, partners), behavior might not be independent. Binary (either X or Y) classifications are non-independent. If you measure distance between anaphor and antecedent and you have more than one anaphor per antecedent, the individual distances will not be independent.

Types of Data Nominal scale (qualitative): Numbers represent qualitative features, not quantitative. 1 is not bigger than 2, just different; e.g., 1 = masculine, 2 = feminine. Ordinal scale (qualitative): Rankings, 1<2<3<4, but differences between values are not important or constant (distance between 1&2 ≠ distance between 3&4); e.g., Likert scale data. Interval scale (quantitative): like ordinal, but distances are equal. Differences make sense, but ratios don't (30° - 20° = 20° - 10°, but 20° is not twice as hot as 10°); e.g., temperature, dates. Ratio scale (quantitative): interval, plus a meaningful 0 point; e.g., weight, length, reaction times, age.

Types of Measurement Scale
Scale | What does the scale indicate? | Is there a true Zero? | How might the scale be used in research?
Nominal | Categories | No | ID males vs. females
Ordinal | Relative quantity | No | Judge who is 1st, 2nd, 3rd
Interval | Quantity | No | Convey over & under estimates
Ratio | Quantity | Yes | Measure # of correct answers on test; measure time to complete task

Experimental designs What I mean by experiment is likely quite different from what you mean, but hopefully the two terms will overlap sufficiently. An experiment should allow for a systematic observation of a particular behavior under controlled circumstances. Observed patterns in the data should be traceable to our manipulation.

Two types of experimental variables You manipulate the situations under which the behavior is observed and measured. The variables you manipulate are your independent variables. You observe and measure a particular behavior. This measurement is your dependent variable.

Hypotheses: what the IV should do to the DV The experimental hypothesis is what you are testing and what you are hoping to find. The NULL Hypothesis states that the manipulation will have no impact. The goal of our statistical tests is not to prove our hypothesis but to reject the NULL Hypothesis.

P-value Each test provides a test statistic and a p-value. In parametric tests, the test statistic is a ratio of the variance that is attributable to your IV to the variance within the data that is not attributable to the IV: a signal-to-noise ratio. The p-value is the probability of obtaining a difference at least as large as the one observed if the NULL Hypothesis were true, i.e., if only chance were at work. It is not the probability that the observed difference is real, nor the probability that it would not replicate.
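To show how a test statistic turns into a p-value, here is a minimal sketch of a one-sample Z test. All numbers are hypothetical (a population with mean 100 and SD 15, and a sample of 25 with mean 106), and the two-sided p-value is read off the standard normal distribution with `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values, invented for illustration.
population_mean = 100.0
population_sd = 15.0
sample_mean = 106.0
n = 25

# Test statistic: how many standard errors the sample mean lies from the
# value expected under the null hypothesis.
standard_error = population_sd / sqrt(n)
z = (sample_mean - population_mean) / standard_error

# Two-sided p-value: probability of a |z| at least this large if the null is true.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}")
print(f"p = {p_two_sided:.4f}")
```

With these numbers the p-value falls just under .05, so the null hypothesis would be rejected at the conventional 5% level.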

The ratio $\text{test statistic} = \dfrac{\text{variance between conditions}}{\text{variance within conditions}}$ Variance between conditions is attributable to the experimental manipulation. Variance within conditions reflects random variance from multiple sources. A significant effect requires more between variance than within variance.
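The between/within ratio can be computed by hand for a small between-subjects data set. The sketch below uses three invented groups of three scores each and forms the variance ratio (F) from the between-conditions and within-conditions mean squares; group names and values are hypothetical.

```python
from statistics import mean

# Hypothetical scores for three conditions, invented for illustration.
groups = {
    "condition_1": [4, 5, 6],
    "condition_2": [6, 7, 8],
    "condition_3": [8, 9, 10],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between-conditions variability: how far each condition mean is from the grand mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
df_between = len(groups) - 1

# Within-conditions variability: how far each score is from its own condition mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
df_within = len(all_scores) - len(groups)

f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(f"MS between = {ss_between / df_between:.2f}")
print(f"MS within  = {ss_within / df_within:.2f}")
print(f"F          = {f_ratio:.2f}")
```

Here the condition means are far apart relative to the spread inside each condition, so the ratio is well above 1.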

Some common designs (numbers label the conditions)
1 IV, 2 levels (a level is a state or value of the IV):
Reading times | Main clause | Subordinate clause
Condition | 1 | 2
2 IVs, 2 levels each:
Reading times | Main clause | Subordinate clause
Transitive | 1 | 2
Intransitive | 3 | 4

Repeated Measures Designs (Within Sample) Repeated measures = more than one observation from each subject. To reduce subject variance, use the same subjects in all conditions; in a within-subjects design you take multiple measures from the same individual. For example, observe a single sentence in multiple contexts. But be sure to control for order effects and sequence effects.

Between sample design If each subject only experiences one condition (1 level of the IV), then you make comparisons between individuals. There is no way to factor out the inherent differences between the individuals. This design is necessary when comparing monolinguals to bilinguals, boys to girls, dyslexics to non-dyslexics, etc.
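The cost of not being able to factor out individual differences can be seen by analysing the same numbers twice. In the sketch below, the reaction times are invented; they are first treated as repeated measures (dependent-samples t on the per-subject differences) and then as if they came from two separate groups (independent-samples t with a pooled variance).

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical reaction times (ms), invented for illustration.
# Position i in both lists belongs to the same subject in the within-subjects reading.
cond_a = [500, 600, 700, 800]
cond_b = [520, 630, 720, 830]
n = len(cond_a)

# Within-subjects: work on each subject's difference, so stable individual
# differences (fast vs. slow subjects) drop out of the error term.
diffs = [b - a for a, b in zip(cond_a, cond_b)]
t_within = mean(diffs) / (stdev(diffs) / sqrt(n))

# Between-subjects: the same scores treated as two unrelated groups, so
# individual differences stay in the error term.
pooled_var = ((n - 1) * variance(cond_a) + (n - 1) * variance(cond_b)) / (2 * n - 2)
t_between = (mean(cond_b) - mean(cond_a)) / sqrt(pooled_var * (1 / n + 1 / n))

print(f"t (dependent samples)   = {t_within:.2f}")
print(f"t (independent samples) = {t_between:.2f}")
```

The mean difference is 25 ms in both analyses, but the within-subjects t is far larger because the large subject-to-subject spread no longer counts as noise.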

Type of Research Design by Type of Data
Design | Parametric | Non-parametric
One-sample | One-sample Z; one-sample t | One-sample proportion
Two-sample, related | Related t | Wilcoxon; Sign
Two-sample, independent | Independent Z; independent t; Variance Ratio (F) | Mann-Whitney; χ²
K-sample, related | Related Variance Ratio (F) | Page's L trend
K-sample, independent | Independent Variance Ratio (F) | Jonckheere trend
Correlation | Product-moment correlation coefficient (Pearson's r); linear regression | Spearman's rank correlation coefficient

Between analyses
# of conditions | Parametric scores | Nonparametric - ordinal | Nonparametric - nominal
Two | Independent samples t | Mann-Whitney | χ²
Three or more | Between-subjects ANOVA | Kruskal-Wallis | χ²

Within analyses
# of conditions | Parametric scores | Nonparametric - ordinal | Nonparametric - nominal
Two | Dependent samples t | Wilcoxon | Linear mixed effects
Three or more | Within-subjects ANOVA | Friedman | Linear mixed effects