Statistics for Sports Medicine
Suzanne Hecht, MD, University of Minnesota
Fellows' Research Conference, July 2012, Philadelphia

GOALS
- Try not to bore you to death!!
- Try to teach you something useful
- Introduce concepts
- Give you a stats reference guide
- Encourage sports med research

QUIZ
What is the appropriate stats test to apply? 50 soccer players wore headgear and 40 did not. Players were followed for a diagnosis of concussion over one season.
1. Paired two-tailed t-test
2. ANOVA
3. Chi-square analysis
4. McNemar test

MY TOP 10 STATS TIP LIST

OVERVIEW
- Introduction
- Variables
- Normal distribution
- Hypothesis testing
- Comparing means
- Measuring association
- Scatterplots & correlation
- Regression

PURPOSE
- Stats is just a tool to analyze the data you collect
- Learn the basics, then add to your foundation over time
- Lots of names of tests, just like sports medicine!! Pick the test that fits the question: you wouldn't talk about a Jobe's test (a shoulder test) during a knee exam

PURPOSE
- Infer something about a population based on information from a sample of that population
- Use probability concepts to describe how reliable the conclusions are
- i.e., you have all this data: is it useful in some way?

Variables
- Discrete
  - Examples: gender (m/f); fracture (y/n)
  - Nominal or ordinal
    - Nominal: a set of categories with no ordering, e.g., m/f
    - Ordinal: ordered, but differences between scores have no meaning, e.g., comparing 1st- and 2nd-place finishers by their ranking without using their actual times
- Continuous
  - Examples: weight, race time
  - Differences between values have meaning

USE FOR FUTURE REFERENCE
Variable     Summary statistics   Comparing 2 groups          Measuring association
Nominal      Mode                 Chi-square                  Contingency coefficient
Ordinal      Median               Chi-square; nonparametric   Kappa; Spearman r; Kendall's tau
Continuous   Mean & SD; median    t-test; nonparametric       Spearman r; Pearson r

SAMPLE SIZE & POWER
- Important to calculate, and do it before the study starts
- Avoids wasting money, time, and other resources on a study that is too small (or bigger than it needs to be)
- Calculations are available in stats software
- Lets you know that you have enough subjects to detect a meaningful change
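
A minimal sketch of one common sample-size formula, assuming a two-sided test comparing two group means with equal group sizes and equal SDs, using the normal approximation. The function name `n_per_group` and the example difference and SD are hypothetical; dedicated software that uses the t distribution may return one or two more subjects.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a true difference
    `delta` between two group means (two-sided test, equal SDs)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)                   # round up to whole subjects


# Hypothetical plan: detect a 5-point difference in a score whose SD is 10
print(n_per_group(delta=5, sd=10))              # 63
print(n_per_group(delta=5, sd=10, power=0.90))  # 85
```

Because n scales with (SD / delta) squared, halving the difference you want to detect roughly quadruples the subjects you need, which is why the "meaningful change" has to be chosen before the study.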

HYPOTHESIS TESTING
- Null hypothesis (H0): no difference between groups (the groups are the same)
- Alternative hypothesis (H1): there is a difference between groups
- Type I error: saying the groups are different when they aren't (a false positive)
- Type II error: saying the groups are the same when they are different (a false negative)

Normal Distribution
- Applies to continuous variables
- Mean = median = mode
- Many stats tests assume a normal distribution: t-test, ANOVA, regression
- There are ways to test whether data are normally distributed (e.g., histogram, normal Q-Q plot, Shapiro-Wilk test)
- If the data are not normally distributed, use nonparametric tests or transform the data (e.g., log transform)
- Methods that assume a normal distribution are robust to moderate departures from that assumption if n is large enough!

Normal Distribution
- Symmetrical about the mean
- Blue (in the figure) = 68.2% of values within 1 SD
- Blue + brown = 95.4% of values within 2 SD
- Blue + brown + green = 99.7% of values within 3 SD
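
The percentages above can be checked directly from the standard normal distribution. A short sketch using Python's built-in `statistics.NormalDist` (the helper name `share_within_k_sd` is hypothetical). The slide's 68.2% is the usual figure-label rounding (34.1% on each side of the mean); the exact value is about 68.27%.

```python
from statistics import NormalDist


def share_within_k_sd(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within_k_sd(k):.1%}")
```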

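P-Value
- The probability of obtaining results at least as extreme as those observed if the null hypothesis is true (i.e., if chance alone were at work)
- Conventional cutoff: p = 0.05 (a 5% chance)
- May not tell the whole story
  - Statistically significant is not the same as clinically significant
  - Small or large n's: a small n risks a Type II error; a very large n can make clinically trivial differences statistically significant
- Give both the p-value and the CI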

Comparing 2 groups or rxs
Continuous outcome:
- Normal distribution (parametric)
  - Paired: paired t-test
  - Unpaired: unpaired t-test
- Not normal (nonparametric)
  - Paired: sign test; Wilcoxon signed rank test
  - Unpaired: Wilcoxon rank sum test
Binary (y/n) outcome:
- Paired: McNemar's test
- Unpaired
  - Large sample size: chi-squared test
  - Small sample size: Fisher's exact test

Comparing 3 or > groups
Continuous outcome:
- Normal distribution (parametric): ANOVA
- Not normal (nonparametric): Kruskal-Wallis test
Binary (y/n) outcome:
- Frequency tables: chi-squared methods

Comparing 2 groups or rxs: continuous outcome
- Normal distribution (parametric): paired t-test (paired data); unpaired t-test (unpaired data)
- Not normal (nonparametric): sign test or signed rank test (paired data); Wilcoxon rank sum test (unpaired data)

Comparing Group Means: t-test & ANOVA
- Assumptions: data are continuous and normally distributed
- Methods
  - 2 independent samples: two-sample t-test
  - Paired data: paired t-test
  - > 2 independent samples: ANOVA
- Includes confidence intervals and hypothesis testing

t-tests: 3 types
- Two-sample t-test (also called Student's t-test or independent samples t-test)
- Paired samples t-test: paired data = 2 measurements on the same subject or test unit
- One-sample t-test: compare to a known (norm) value
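
A minimal sketch of the three t statistics, assuming the standard textbook formulas (pooled variance for the two-sample test). Function names and the jump-height data are hypothetical. Turning t into a p-value needs the t distribution, which Python's standard library does not provide; a t table or a package such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.ttest_rel`) does that step.

```python
import math
from statistics import mean, stdev


def one_sample_t(x, mu0):
    """One-sample t statistic: compare the sample mean with a norm value mu0."""
    se = stdev(x) / math.sqrt(len(x))
    return (mean(x) - mu0) / se, len(x) - 1          # (t, degrees of freedom)


def paired_t(before, after):
    """Paired t-test = one-sample t-test on the within-subject differences."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)


def two_sample_t(x, y):
    """Student's two-sample t statistic (pooled variance, equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2


# Hypothetical vertical-jump heights (cm) for the same 5 athletes before/after training
before = [40, 42, 38, 45, 41]
after = [43, 44, 39, 47, 44]
print(paired_t(before, after))
```

The paired version is written as a one-sample test on the differences because that is exactly what it is: each subject serves as their own control.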

t-tests
- One-tailed vs two-tailed: almost always use two-tailed
- Results could be higher or lower, not just one way

95% CI: Confidence Intervals
- "95% confident that the true value falls in the interval": if the study were repeated many times, about 95% of the intervals calculated this way would contain the true value
- A wide CI suggests uncertainty about the estimate
- Does the CI contain a value that implies no change or no effect?
  - Difference in means: 0
  - Odds ratio: 1
- Does the CI lie partly or entirely within a range of clinical indifference?

Example: Confidence Intervals
- Survey of 19 millionaires
- Mean donation = 15% of income; 95% CI = 15% +/- 2.4% (about 2 standard errors on either side of the mean)
- Interpretation: we are 95% confident that millionaires donate between 12.6% and 17.4% of their income
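
A sketch of where a half-width like +/- 2.4% comes from: CI = mean +/- t x SD / sqrt(n). The slide does not give the SD, so the 5 percentage points used here is a hypothetical value chosen because it reproduces the stated half-width; the critical value 2.101 (two-sided 95%, 18 df) is taken from a t table and passed in, since Python's standard library has no t distribution.

```python
import math


def ci_95(mean, sd, n, t_crit):
    """95% CI for a mean: mean +/- t_crit x standard error (SD / sqrt(n))."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# ASSUMPTION: SD = 5 percentage points is hypothetical (not on the slide).
# t_crit = 2.101 is the two-sided 95% t value for n - 1 = 18 degrees of freedom.
low, high = ci_95(mean=15.0, sd=5.0, n=19, t_crit=2.101)
print(f"95% CI: {low:.1f}% to {high:.1f}%")  # 95% CI: 12.6% to 17.4%
```

With a larger survey the standard error shrinks with sqrt(n), so the same SD would give a narrower interval.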

SIGN TEST
- Nonparametric test: use when the data are not normally distributed
- Alternative to the paired t-test
- Good for small sample sizes
- Tests the differences for matched pairs, e.g., before & after data
- Method: calculate the differences; throw out zero differences; count the number of + differences
- If H1 is true: the median difference does not equal 0
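
The method above can be sketched as an exact binomial test: once zero differences are dropped, under H0 each remaining difference is equally likely to be + or -. The function name and grip-strength data are hypothetical; only the Python standard library is used.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test for matched (before/after) pairs."""
    diffs = [a - b for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]      # throw out zero differences
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)        # number of + differences
    # Under H0 (median difference = 0) each sign is a coin flip: k ~ Binomial(n, 0.5)
    tail = min(k, n - k)
    p_one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return k, n, min(1.0, 2 * p_one_tail)


# Hypothetical grip strength (kg) in 8 athletes before and after a program
before = [30, 28, 35, 32, 29, 31, 33, 27]
after = [32, 30, 36, 32, 31, 33, 35, 29]
print(sign_test(before, after))  # (7, 7, 0.015625)
```

One athlete had no change, so only 7 pairs count; all 7 improved, giving p = 2 x (1/2)^7.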

WILCOXON SIGNED RANK TEST
- Same application as the sign test
- Uses both the ranks and the signs of the differences
- More powerful test than the sign test
- Method: calculate the differences within pairs; throw away zero differences; rank the differences from smallest to largest without regard to sign (+/-)
- Test statistic: sum of the ranks of the + differences
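
A minimal sketch of the W+ statistic described above, assuming tied absolute differences share the average of their ranks (the usual convention). Helper names and data are hypothetical. For the p-value, compare W+ with a signed rank table or use software such as `scipy.stats.wilcoxon`.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def signed_rank_w_plus(before, after):
    """Wilcoxon signed rank statistic W+ (sum of ranks of the + differences)."""
    diffs = [a - b for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]              # throw away zero differences
    ranks = average_ranks([abs(d) for d in diffs])    # rank |d|, ignoring the sign
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return w_plus, len(diffs)


# Hypothetical grip strength (kg) in 8 athletes before and after a program
before = [30, 28, 35, 32, 29, 31, 33, 27]
after = [32, 30, 36, 32, 31, 33, 35, 29]
print(signed_rank_w_plus(before, after))  # (28.0, 7)
```

Here every nonzero difference is positive, so W+ reaches its maximum of n(n + 1)/2 = 28.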

Wilcoxon Rank Sum Test
- Also known as the Mann-Whitney U test
- Compares 2 independent samples that are not normally distributed
- Good for detecting changes in medians
- Method: combine the data from the 2 groups; rank from smallest to largest; add the ranks in the group with the smaller sample size; add the ranks in the group with the larger n
- Test: compare the sum of ranks for the smaller group with that of the larger group

EXAMPLE: Rank-Sum Test
- Team Cheetah (TC), 5 team members: 3, 4, 7, 12, 13 (min)
- Team Impala (TI), 7 team members: 2, 5, 6, 8, 9, 10, 11 (min)
- Combine the data, then rank:
  Time (min):   2   3   4   5   6   7   8   9  10  11  12  13
  Team:        TI  TC  TC  TI  TI  TC  TI  TI  TI  TI  TC  TC
  Rank:         1   2   3   4   5   6   7   8   9  10  11  12
- Sum of ranks of the smaller group (TC) = 2 + 3 + 6 + 11 + 12 = 34
- Test whether the rank sum of the smaller group is the same as or different from the other group's
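
The team example can be reproduced in a few lines. This sketch ranks the combined times, sums the smaller group's ranks, and also shows the rank sum expected if the teams were the same, n1(n1 + n2 + 1)/2, plus the equivalent Mann-Whitney U. Variable names are hypothetical; the p-value itself would come from a rank-sum table or software such as `scipy.stats.mannwhitneyu`.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


cheetah = [3, 4, 7, 12, 13]          # Team Cheetah times (min), n1 = 5
impala = [2, 5, 6, 8, 9, 10, 11]     # Team Impala times (min), n2 = 7

n1, n2 = len(cheetah), len(impala)
ranks = average_ranks(cheetah + impala)   # rank the combined data
w_small = sum(ranks[:n1])                 # rank sum of the smaller group
expected = n1 * (n1 + n2 + 1) / 2         # average rank sum if the teams are the same
u = w_small - n1 * (n1 + 1) / 2           # the equivalent Mann-Whitney U statistic
print(w_small, expected, u)               # 34.0 32.5 19.0
```

The observed 34 sits close to the 32.5 expected under H0.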

ANOVA
- Analysis of variance: compares the means of > 2 groups
- Assumes: continuous data; normal distribution; the same variance within each group
- Benefits compared with multiple t-tests: efficiency; avoids the multiple testing problem
- Problem: a significant F test tells you that at least 2 groups are different, but not which ones!
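
A minimal sketch of the one-way ANOVA F statistic, assuming the standard between/within sums-of-squares decomposition. The function name and the rehab data are hypothetical. The p-value comes from the F distribution with (k - 1, n - k) degrees of freedom; `scipy.stats.f_oneway` runs the whole test.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic and its (between, within) degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variability: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: scatter of values around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, (k - 1, n - k)


# Hypothetical recovery times (days) under three rehab protocols
print(one_way_anova_f([12, 14, 13, 15], [16, 18, 17, 19], [13, 15, 14, 16]))
```

A large F says the group means spread out more than the within-group scatter would explain, but, as the slide notes, not which groups differ.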

ANOVA Problem: Multiple Comparisons
- Multiple comparison procedures are used to tell which groups differ
- They use stricter levels for rejecting the hypothesis that the means are the same
- 4 methods: Bonferroni, Tukey, Newman-Keuls, Scheffé
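
Of the four methods, Bonferroni is the simplest to show: with m comparisons, each one is tested at alpha / m. A sketch with hypothetical p-values (the function name is also hypothetical); Tukey, Newman-Keuls, and Scheffé need other distributions and are best left to stats software.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: each of m comparisons is tested at alpha / m.
    Equivalently, multiply each p-value by m (capped at 1) and compare with alpha."""
    m = len(p_values)
    return [(min(1.0, p * m), p < alpha / m) for p in p_values]


# Hypothetical p-values for the 3 pairwise comparisons among 3 groups (A-B, A-C, B-C)
for adjusted, significant in bonferroni([0.010, 0.030, 0.200]):
    print(f"adjusted p = {adjusted:.2f}, significant: {significant}")
```

Only the first comparison survives, even though 0.030 would have looked "significant" on its own.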

Kruskal-Wallis Test
- Nonparametric test for comparing 3 or > independent groups
- Think of it as a nonparametric ANOVA
- Good for detecting changes in medians
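
A sketch of the Kruskal-Wallis H statistic, H = 12 / (N(N + 1)) x sum(R_i^2 / n_i) - 3(N + 1), where R_i is the rank sum of group i after ranking all groups together. Names and pain-score data are hypothetical, and this version omits the correction for ties that packages such as `scipy.stats.kruskal` apply. H is usually compared with a chi-square distribution with k - 1 df.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no correction for ties)."""
    values = [v for g in groups for v in g]
    ranks = average_ranks(values)          # rank all groups together
    n_total = len(values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # this group's rank sum
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical pain scores (0-10) with three brace designs
print(kruskal_wallis_h([3, 4, 2, 5], [6, 7, 5, 8], [4, 3, 6, 2]))
```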

Comparing 2 groups or rxs: binary (y/n) outcome
- Paired: McNemar's test
- Unpaired, large sample size: chi-squared test
- Unpaired, small sample size: Fisher's exact test

Comparing Frequency Data
- Binary outcome (yes/no)
- Paired method: McNemar's test
- Unpaired methods: Pearson's chi-square; Fisher's exact test

Pearson's Chi-square
Assumes:
- Random samples from 2 groups
- Compares expected with observed frequencies
- All sample sizes are large enough: all expected frequencies should be at least 5
2x2 table:
               Standard helmet   New helmet
Concussion     18                6
No concussion  7                 13
TOTAL          n1 = 25           n2 = 19
- p1 = 18/25 = 0.72 (72%)
- p2 = 6/19 = 0.32 (32%)

Pearson's Chi-square
OBSERVED
               Standard helmet   New helmet   TOTAL
Concussion     18                6            24
No concussion  7                 13           20
TOTAL          n1 = 25           n2 = 19      44
X² = 7.1 (p = 0.0077)
EXPECTED (if not different)
                  Concussion            No concussion
Standard helmet   24/44 x 25 = 13.64    20/44 x 25 = 11.36
New helmet        24/44 x 19 = 10.36    20/44 x 19 = 8.64
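
A sketch that reproduces the helmet calculation: expected count = row total x column total / N, then X² = sum of (observed - expected)² / expected. Because a chi-square with 1 df is the square of a standard normal variable, the p-value can be computed with the standard library's `NormalDist`. The function name is hypothetical. This version has no continuity correction; some software applies Yates' correction to 2x2 tables by default (e.g. `scipy.stats.chi2_contingency`), which gives a smaller X².

```python
from statistics import NormalDist


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table:

                   Concussion   No concussion
    Group 1            a              b
    Group 2            c              d
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n   # row total x column total / N
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, X2 is the square of a standard normal variable, so the
    # two-sided normal tail of sqrt(X2) gives the chi-square p-value.
    p = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p


# Helmet data: standard helmet 18 concussions / 7 none; new helmet 6 / 13
chi2, p = chi_square_2x2(18, 7, 6, 13)
print(f"X2 = {chi2:.1f}, p = {p:.3f}")
```

The result matches the slide's X² = 7.1 (p = 0.0077) up to rounding.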

Fisher's Exact Test
- Use this test when 1 or more of the expected frequencies is < 5
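
Fisher's test needs no large-sample approximation: with all row and column totals held fixed, the probability of each possible 2x2 table is hypergeometric. A minimal two-sided sketch (adding up all tables no more likely than the observed one, a common convention) using only the standard library; the function name and the tiny example are hypothetical. In practice `scipy.stats.fisher_exact` is the usual tool.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every possible table that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties)
    return sum(
        table_prob(x)
        for x in range(lowest, highest + 1)
        if table_prob(x) <= p_observed * (1 + 1e-7)
    )


# Hypothetical small study: 3/3 injured in one group, 0/3 in the other
print(fisher_exact_two_sided(3, 0, 0, 3))  # 0.1
```

Even a 3-vs-0 split is not significant with so few subjects, which is the small-sample situation this test is for.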

McNemar's Test
- Use for paired binary data
  - Same subject before & after rx
  - Cross-over study
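
McNemar's test uses only the discordant pairs (subjects whose yes/no answer changed). A sketch of the uncorrected statistic (b - c)² / (b + c), compared with a chi-square with 1 df; the function name and counts are hypothetical, and some software applies a continuity correction, (|b - c| - 1)² / (b + c), instead.

```python
from statistics import NormalDist


def mcnemar(b, c):
    """McNemar's test for paired yes/no data (no continuity correction).

    b = pairs that changed yes -> no; c = pairs that changed no -> yes.
    Pairs with the same answer both times carry no information and are ignored.
    """
    chi2 = (b - c) ** 2 / (b + c)
    p = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))   # chi-square with 1 df
    return chi2, p


# Hypothetical cross-over data: of the athletes whose result differed between
# the two conditions, 10 changed one way and 2 changed the other way
chi2, p = mcnemar(10, 2)
print(f"X2 = {chi2:.2f}, p = {p:.3f}")
```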

RISK
- Risk difference: absolute difference in risk proportions; can be difficult to interpret
- Relative risk (RR), also known as the risk ratio: risk in 1 group / risk in the other group
- Odds ratio (OR): based on the odds of an event rather than its probability
  - OR = odds in the exposed group / odds in the control group
  - OR = 1 means no difference

RELATIVE RISK
- Relative risk (RR) is the risk of an event relative to exposure
- Risk of having a boy if mom took testosterone during pregnancy: 75/100 = 75%
- Risk (probability) of having a boy otherwise: 51/100 = 51%
- Risk ratio = 0.75/0.51 = 1.5
- Easier to understand
  - Risk ratio = 0.5: the risk is halved
  - Risk ratio = 2: the risk is doubled

CALCULATING ODDS
- Odds of an event = # of events / # of nonevents
- 51 boys are born for every 100 births
- Odds of any randomly chosen delivery being a boy = 51/(100 - 51) = 51/49 = 1.04
- Odds > 1: the event is more likely to happen than not
- Odds of a certain event = infinite (there are no nonevents)
- Odds < 1: the event is less likely to happen than not
- Odds of an impossible event = 0

ODDS RATIO
- Testosterone example: OR = [75/(100 - 75)] / [51/(100 - 51)] = 3/1.04 = 2.9
- The odds of having a boy are 2.9x higher in moms using testosterone vs moms not using testosterone
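
The testosterone arithmetic for both measures, side by side. A short sketch with hypothetical function names; it makes the key difference visible: risk uses events / total, odds uses events / nonevents, so the same data give RR of about 1.5 but OR of about 2.9.

```python
def risk_ratio(events_1, n_1, events_0, n_0):
    """Risk in the exposed group divided by risk in the comparison group."""
    return (events_1 / n_1) / (events_0 / n_0)


def odds_ratio(events_1, n_1, events_0, n_0):
    """Odds (events / nonevents) in the exposed group over odds in the comparison group."""
    odds_1 = events_1 / (n_1 - events_1)
    odds_0 = events_0 / (n_0 - events_0)
    return odds_1 / odds_0


# Testosterone example: 75 boys per 100 births vs 51 boys per 100 births
print(round(risk_ratio(75, 100, 51, 100), 2))  # 1.47
print(round(odds_ratio(75, 100, 51, 100), 2))  # 2.88
```

When the event is common, as here, the OR lands well away from the RR; with rare events the two converge.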

ODDS RATIO: Benefits
- No upper limit, whereas the possible range of the RR varies with the baseline prevalence
- When events are rare (rare disease), the OR is approximately equal to the RR
- The OR is OK to use with case-control studies; don't use the RR with case-control studies

Calculating OR: Cross Product
                        Group 1   Group 2
Factor (event)          a         b
No factor (no event)    c         d
OR = (a/c) / (b/d) = (a x d) / (b x c)
Helmet example:
                Standard helmet   New helmet
Concussion      18                6
No concussion   7                 13
OR = (18 x 13) / (6 x 7) = 234/42 = 5.6
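
Following the "give both p-value & CI" advice, the cross-product OR can be reported with a confidence interval. This sketch uses Woolf's log method, SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d), which is a standard choice but not one the slides specify; the function name is hypothetical. The cell layout matches the table above.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Cross-product OR = (a x d) / (b x c) with a Woolf (log-scale) CI.

                 Group 1   Group 2
    Event           a         b
    No event        c         d
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)        # 1.96 for a 95% CI
    low = math.exp(math.log(or_) - z * se_log)
    high = math.exp(math.log(or_) + z * se_log)
    return or_, low, high


# Helmet example: concussions 18 (standard) vs 6 (new); none 7 vs 13
or_, low, high = odds_ratio_ci(18, 6, 7, 13)
print(f"OR = {or_:.2f} (95% CI {low:.1f} to {high:.1f})")
```

The interval excludes 1 but is wide, a reminder that 44 players give only an imprecise estimate of the effect size.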

SCATTERPLOT
Can help answer the following:
- Are variables X & Y related?
- Are X & Y linearly related?
- Are X & Y non-linearly related?
- Does the variation in Y change depending on X?
- Are there outliers?
Example plot: 1. linear relationship; 2. small scatter (strong correlation); 3. + slope (+ correlation)

SCATTERPLOTS
Example plots: no relationship; and 1. linear, 2. small scatter (strong correlation), 3. - slope (negative correlation)

SCATTERPLOTS
Example plots: outlier; non-linear relationship

CORRELATION: PEARSON
- Measures the strength of (linear) association between 2 variables
- Ranges from -1 to 1
  - 1 = perfect + correlation
  - -1 = perfect - correlation
  - 0 = no correlation
- Examples
  - r = 0.8: strong + correlation
  - r = 0.3: weak + correlation
  - r = -0.7: moderate - correlation
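
A sketch of Pearson's r from its definition: the co-variation of X and Y divided by the product of their spreads, which is what bounds it between -1 and 1. The function name and the training-hours data are hypothetical; Python 3.10+ also offers `statistics.correlation` for the same calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical weekly training hours vs VO2max for 5 runners
hours = [2, 4, 5, 7, 9]
vo2max = [38, 41, 45, 47, 52]
print(round(pearson_r(hours, vo2max), 2))
```

A scatterplot should still come first: r only measures linear association, so a strong curved relationship or a single outlier can mislead it.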

REGRESSION
- A straight line that describes the dependence of one variable on another is called a regression line
- Y = response variable, e.g., finishing time
- X = explanatory variable, e.g., body fat percentage
- Is finishing time predicted by body fat percentage?

REGRESSION TYPES
- Linear: continuous outcome, normal distribution; simple or multiple
- Logistic: binary (y/n) outcome; simple or multiple
- Multiple regression models allow estimation of the independent effect of each X after controlling for the other variables in the model

Simple LINEAR REGRESSION
- Use to predict Y given X
- Determine the best-fitting equation
- Test whether there is a relationship between X & Y

Linear Regression
- R² value = % of the variance in Y explained by X
- If R² = 1, X predicts Y perfectly (every point lies on the line)
- F test for significance: if p > 0.05, there is no significant relationship between X & Y (we cannot reject that the slope of the line = 0)
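
A minimal sketch of simple linear regression by least squares, returning the slope, intercept, R², and the F statistic for the slope (F = explained variation / (residual variation / (n - 2)), with 1 and n - 2 df). Function name and body-fat data are hypothetical. Converting F to a p-value needs the F distribution (stats software); in simple regression F also equals the square of the slope's t statistic.

```python
def simple_linear_regression(x, y):
    """Least-squares fit y = intercept + slope * x, with R^2 and the F statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)              # total variation in Y
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2          # variation left over
                   for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total                         # share of variance explained
    f_stat = (ss_total - ss_resid) / (ss_resid / (n - 2))      # 1 and n - 2 df
    return slope, intercept, r_squared, f_stat


# Hypothetical data: body fat % vs triathlon finishing time (min)
body_fat = [10, 12, 15, 18, 22]
finish_min = [130, 134, 141, 145, 155]
slope, intercept, r2, f = simple_linear_regression(body_fat, finish_min)
print(f"time = {intercept:.1f} + {slope:.2f} x body fat; R2 = {r2:.2f}; F = {f:.1f}")
```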

Multiple Linear Regression
- Model that explains how a single dependent variable (Y) relates to several independent variables (X)
- Example: test whether age, gender, body fat %, prior triathlon competitions, & occupation predict finishing time

Multiple Linear Regression
- How many variables to use? Recommend 10-20 cases for each variable tested
- Testing lots of variables:
  - Increases the chance of a statistically significant result by random chance alone
  - Makes the model unstable

Multiple Linear Regression
- Example cont.: the model explains 90% of the variance in performance
- Now test which variable or combination of variables is most predictive (% of variance explained):
  - Body fat %: 15%
  - Age: 10%
  - Gender: 30%
  - Body fat & gender: 35%
  - Occupation: 0%
  - Prior triathlon: 40%

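QUIZ
What is the appropriate stats test to apply? 50 soccer players wore headgear and 40 did not. Players were followed for a diagnosis of concussion over one season.
1. Paired two-tailed t-test
2. ANOVA
3. Chi-square analysis
4. McNemar test
Answer: 3. Chi-square analysis (2 independent groups, binary outcome; use Fisher's exact test if any expected frequency is < 5)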

OTHER TIPS
- Stats support at universities
  - Usually charge per hour; an MS statistician is cheaper than a PhD
- Authorship (International Committee of Medical Journal Editors (ICMJE) guidelines), if the stats person is willing to:
  - Help design the study
  - Analyze the data
  - Format tables, graphs, etc.
  - Write a portion of the article
- You may be able to get a small grant to cover the $ of stats analysis
- Online support


