Intro to Statistics for Infection Preventionists Presented By: Kelley M. Boston, MPH, CIC

1 Intro to Statistics for Infection Preventionists Presented By: Kelley M. Boston, MPH, CIC Infection Prevention & Management Associates at Methodist Healthcare System

2 Role of Statistics in Hospital Epidemiology Aid in organizing and summarizing data Population characteristics Frequency distributions Calculation of infection rates Make inferences about data Suggest association Infer causality Communicate findings Prepare reports for committees Monitor the impact of interventions

3 Descriptive Epidemiology Descriptive Statistics: techniques concerned with the organization, presentation, and summarization of data. Measures of central tendency Measures of dispersion Use of proportions, rates, ratios Who, What, When, Where

4 Variables Anything that is measured or manipulated in a study Types of variables: Qualitative (Nominal, Ordinal) Quantitative (Interval, Ratio) Independent vs. Dependent Variables Continuous vs. Discrete Variables

5 Variables

6 Measures of Central Tendency

7 Measures of Central Tendency Mean: mathematical average of the values in a data set. Used for describing numeric data Numeric Data = Actual measurements of individuals (temperature, height, pulse) Interval Variables Ratio Variables Continuous Variables Discrete Variables

9 Measures of Central Tendency Median: the value falling in the middle of the data set. Used for ordinal data Ordinal Data = ordered categories ASA Scores, Wound Class Median also good for numeric data when the distribution is skewed

11 Measures of Central Tendency Mode: the most frequently occurring value in a data set. Used for nominal data Nominal Data = named category Gender, Smoking Status

13 Calculating Central Tendency Patient Lengths of Stay in Days: 12, 9, 3, 5, 7, 6, 13, 8, 4, 15, 6 Mean (x̄) = (The sum of each patient's length of stay) / (The number of patients) = 88 / 11 = 8 days

14 Calculating Central Tendency Patient Lengths of Stay in Days: 12, 9, 3, 5, 7, 6, 13, 8, 4, 15, 6 Mean (x̄) = (The sum of each patient's length of stay) / (The number of patients) = 88 / 11 = 8 days Median = middle value of the sorted data (3, 4, 5, 6, 6, 7, 8, 9, 12, 13, 15) = 7 days

15 Calculating Central Tendency Patient Lengths of Stay in Days: 12, 9, 3, 5, 7, 6, 13, 8, 4, 15, 6 Mean (x̄) = (The sum of each patient's length of stay) / (The number of patients) = 88 / 11 = 8 days Median = middle value of the sorted data (3, 4, 5, 6, 6, 7, 8, 9, 12, 13, 15) = 7 days Mode = most frequent value of the sorted data (3, 4, 5, 6, 6, 7, 8, 9, 12, 13, 15) = 6 days
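The three calculations on this slide can be checked with a few lines of Python. This is only a sketch using the standard-library `statistics` module; the variable names are illustrative and the data are the lengths of stay listed on the slide.

```python
import statistics

# Patient lengths of stay in days, copied from the slide.
los_days = [12, 9, 3, 5, 7, 6, 13, 8, 4, 15, 6]

mean_los = statistics.mean(los_days)      # sum of the values / number of values
median_los = statistics.median(los_days)  # middle value of the sorted data
mode_los = statistics.mode(los_days)      # most frequently occurring value

print(f"Mean   = {mean_los} days")
print(f"Median = {median_los} days")
print(f"Mode   = {mode_los} days")
```

With an even number of observations, `statistics.median` averages the two middle values, which matches the usual textbook convention.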

16 Excel Hacks: Central Tendency Spreadsheet layout: cell A1 = LOS (column header), data in cells A2:A12, formula in cell A13 Mean: =AVERAGE(A2:A12) Median: =MEDIAN(A2:A12) Mode: =MODE(A2:A12)

17 Measures of Dispersion

18 Measures of Dispersion Range: the difference between the smallest and largest values in a data set. Patient Lengths of Stay in Days: 12, 9, 3, 5, 7, 6, 13, 8, 4, 15, 6

19 Measures of Dispersion Range: the difference between the smallest and largest values in a data set. Patient Lengths of Stay in Days: 3, 4, 5, 6, 6, 7, 8, 9, 12, 13, 15 Range = 15 - 3 = 12

20 Measures of Dispersion Deviation: The difference between an individual data point and the mean value for the data set Positive, Negative or No Deviation Patient Lengths of Stay in Days: 3, 4, 5, 6, 6, 7, 8, 9, 12, 13, 15 Mean = x̄ = 8 Calculating Deviation: Deviation = Xi - x̄

21 Measures of Dispersion Standard Deviation: measure of dispersion that reflects the variability in values around the mean SD = √( Σ(Xi - x̄)² / (n - 1) ) Take all the deviations from the mean, square them, then divide their sum by the total number of observations minus one and take the square root of the resulting number Variance: a measure of variability that is equal to the square of the standard deviation. Patient Lengths of Stay in Days: 12, 9, 3, 5, 7, 6, 13, 8, 4, 15, 6
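The verbal recipe for the standard deviation can be followed step by step in Python. The sketch below uses the slide's length-of-stay data and the sample (n - 1) formula described above; the variable names are illustrative, and the final lines cross-check the manual result against the standard-library `statistics` functions.

```python
import math
import statistics

# Patient lengths of stay in days, copied from the slide.
los_days = [12, 9, 3, 5, 7, 6, 13, 8, 4, 15, 6]

n = len(los_days)
mean_los = sum(los_days) / n

# Step 1: deviation of each value from the mean.
deviations = [x - mean_los for x in los_days]

# Step 2: square the deviations and add them up.
sum_of_squares = sum(dev ** 2 for dev in deviations)

# Step 3: divide by n - 1 to get the sample variance.
variance = sum_of_squares / (n - 1)

# Step 4: take the square root to get the standard deviation.
sd = math.sqrt(variance)

# Range, for comparison with the earlier slide.
data_range = max(los_days) - min(los_days)

# Cross-check against the library implementations (sample versions).
assert math.isclose(variance, statistics.variance(los_days))
assert math.isclose(sd, statistics.stdev(los_days))

print(f"Range = {data_range}, Variance = {variance:.2f}, SD = {sd:.2f}")
```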

22 Excel Hacks: Dispersion Spreadsheet layout: cell A1 = LOS (column header), data in cells A2:A12 Mean: =AVERAGE(A2:A12) = 8 SD: =STDEV(A2:A12) = 3.87 Variance: =VAR(A2:A12) = 15

25 Normal Distribution

26 Normal Distribution Properties Continuous distribution Bell shaped curve Symmetric around the mean

27 Non-Normal Distribution: Skew Non-symmetric distribution Positive or Negative Refers to the direction of the long tail (NOT the bulk of the data) Positive skew has the long tail to the right, Negative to the left Mean near the long tail Mode near the short tail Median somewhere in the middle

28 Non-Normal Distribution: Bimodal Two peaks May have 2 populations each with its own central tendency Mean Mode

29 Other Non-Normal Distributions Multi-Modal No central tendency Refer to graph Scattered No central tendency

30 Use of Proportions, Rates & Ratios Proportions: A fraction in which the numerator is part of the denominator. Rates: A fraction in which the denominator involves a measure of time. Ratios: A fraction in which there is not necessarily a relationship between the numerator and the denominator.

31 Proportions

32 Proportions Prevalence: proportion of persons with a particular disease within a given population at a given time.

33 Proportions Prevalence: proportion of persons with a particular disease within a given population at a given time. [Figure: Proportion of S. aureus Nosocomial Infections Resistant to Oxacillin (MRSA) Among Intensive Care Unit Patients; x-axis: Year; y-axis: Percent Resistance. Source: NNIS System, data for 2003 are incomplete]

34 Rates Calculation of a Device-associated Infection Rate Step 1: Decide upon the time period for your analysis. Step 2: Select the patient population for analysis. Step 3: Select the infections to be used in the numerator. Step 4: Determine the number of device-days which is used as the denominator of the rate. Device days: total number of days of exposure to the device by all patients in the selected population during the time period. Step 5: Calculate the device-associated infection rate (per 1000 device-days) using the following formula: (Number of device-associated infections / Number of device-days) x 1000

35 Device-Associated Infection Rate Example: Foley-Associated UTIs in the ICU Step 1: Time period April 2010

36 Device-Associated Infection Rate Example: Foley-Associated UTIs in the ICU Step 1: Time period April 2010 Step 2: Patient population Patients in the ICU of Hospital X who have Foley catheters

37 Device-Associated Infection Rate Example: Foley-Associated UTIs in the ICU Step 1: Time period April 2010 Step 2: Patient population Patients in the ICU of Hospital X who have Foley catheters Step 3: Infections (numerator) April CAUTI infections in the ICU = 2

38 Device-Associated Infection Rate Example: Foley-Associated UTIs in the ICU Step 1: Time period April 2010 Step 2: Patient population Patients in the Medical / Surgical ICU of Hospital X who have Foley catheters Step 3: Infections (numerator) April CAUTI infections in the ICU = 2 Step 4: Device-days (denominator) Total number of days that patients in the ICU had Foley catheters in place = 920

39 Device-Associated Infection Rate Example: Foley-Associated UTIs in the ICU Step 5: Device-associated infection rate = (Number of device-associated infections / Number of device-days) x 1000 # of Infections: 2 Foley-days in ICU: 920 Rate = (2 / 920) x 1000 = 2.17 per 1000 Foley-days
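As a rough illustration of Step 5, the rate calculation can be wrapped in a small Python function. The function name and its `per` parameter are assumptions made for this sketch; the inputs are the April ICU figures from the example (2 CAUTIs, 920 Foley-days).

```python
def device_associated_infection_rate(infections, device_days, per=1000):
    """Return infections per `per` device-days.

    infections  : number of device-associated infections (numerator)
    device_days : total device-days in the same population and period
    per         : multiplier, 1000 by convention for device-associated rates
    """
    if device_days <= 0:
        raise ValueError("device_days must be greater than zero")
    return infections / device_days * per


# Worked example from the slide: 2 CAUTIs over 920 Foley-days.
cauti_rate = device_associated_infection_rate(infections=2, device_days=920)
print(f"CAUTI rate = {cauti_rate:.2f} per 1000 Foley-days")
```

Guarding against a zero denominator matters in practice, since a small unit may record no device-days in a given month.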

40 NHSN Comparison

41 Common Rates Attack Rate = (Number of people who develop a certain illness) / (Total number of people at risk) Usually measured over an entire period of exposure Often multiplied by 100

42 Common Rates Incidence Rate (Incidence Density) = (Number of new cases of disease occurring in the population during a specified period of time) / (Number of persons at risk of developing the disease during that period of time) Important features are TIME and NEW cases Often multiplied by 1000 in the hospital setting

43 Ratios Calculation of Device Utilization Ratio Step 1: Decide upon the time period for your analysis. Step 2: Select the patient population for analysis. Step 3: Determine the number of device-days. Step 4: Determine the number of patient-days. Patient-days are the total number of days that patients (with or without the device) are in the selected population during the time period.

44 Device Utilization Ratio Step 5: Calculate the device-utilization ratio using the following formula: Number of device-days / Number of patient-days Example: Foley Utilization Ratio in the ICU Foley-days in ICU: 920 Patient-days in ICU: 1176 Ratio = 920 / 1176 = 0.78
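The device utilization ratio is a single division, sketched below in Python with the ICU figures from the example (920 Foley-days, 1176 patient-days). The function name is an assumption for illustration only.

```python
def device_utilization_ratio(device_days, patient_days):
    """Return device-days divided by patient-days for one unit and period."""
    if patient_days <= 0:
        raise ValueError("patient_days must be greater than zero")
    return device_days / patient_days


# Worked example from the slide: Foley utilization in the ICU.
foley_dur = device_utilization_ratio(device_days=920, patient_days=1176)
print(f"Foley utilization ratio = {foley_dur:.2f}")
```

A ratio of about 0.78 means that a Foley catheter was in place on roughly 78% of the patient-days in the unit.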

45 NHSN Comparison

46 What does this tell you? When examined together, the device-associated infection rate and device utilization ratio can be used to appropriately target preventive measures. Consistently high rates and ratios may signify a problem and further investigation is suggested. Potential overuse/improper use of device Consistently low rates and ratios may suggest underreporting of infection or the infrequent use or short duration of use of devices.

47 Analytic Epidemiology Inferential Statistics: procedures used to make inferences about a population based on information from a sample of measurements from that population. Why, How

48 Observational Studies Descriptive Studies Case Series Analytic Studies Cross Sectional: exposure and outcome measured at same time Cohort: exposure defined at start of study, group followed to see if outcome arises Case Control: outcome defined at start of study, look at past exposure status

49 Experimental Studies Exposure status assigned by the researcher Clinical: investigator assigns intervention to individuals in the study population Community: interventions applied to groups

50 2x2: Exposures and Outcomes
                       Patients With Disease   Patients With No Disease
Patients Exposed                 a                        b
Patients Not Exposed             c                        d

51 Relative Risk Comparing the risk of disease in exposed individuals to individuals who were not exposed
                           Patients Exposed   Patients Not Exposed
Patients With Disease              a                   c
Patients With No Disease           b                   d
RR = Disease incidence in exposed / Disease incidence in non-exposed = [a / (a + b)] / [c / (c + d)]

52 Relative Risk RR = 1 Risk in exposed equal to risk in non-exposed No association RR > 1 Risk in exposed greater than risk in non-exposed Positive association, possibly causal RR < 1 Risk in exposed less than risk in non-exposed Negative association, possibly protective

53 What are the Odds? The ratio of the number of ways an event can occur to the number of ways the event cannot occur Odds are based on Probability Probability of me getting dessert = 60% = P Probability of me not getting dessert = 40% = 1 - P Odds = P / (1 - P) = 60% / 40% = 1.5 : 1 Probability of dessert = 60% Odds of dessert = 60% / 40% = 1.5
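The conversion between probability and odds can be written as two short Python functions. This is a minimal sketch; the function names are illustrative, and the 60% figure is the dessert example from the slide.

```python
def probability_to_odds(p):
    """Convert a probability P (0 <= P < 1) into odds, P / (1 - P)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_to_probability(odds):
    """Convert odds back into a probability, odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must not be negative")
    return odds / (1 + odds)


# Dessert example from the slide: P = 60%.
dessert_odds = probability_to_odds(0.60)
print(f"Odds of dessert = {dessert_odds:.1f} : 1")
print(f"Back to probability = {odds_to_probability(dessert_odds):.0%}")
```

Odds and probability are close to each other only when the event is rare, which is why an odds ratio approximates a relative risk for uncommon outcomes.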

54 Odds Ratio Comparing the odds that a disease will develop
                                      Patients With History of Exposure   Patients Without History of Exposure
Patients With Disease (Cases)                       a                                    c
Patients With No Disease (Controls)                 b                                    d
OR = Odds that a case was exposed / Odds that a control was exposed = (a / c) / (b / d) = ad / bc

55 Odds Ratio OR = 1 Exposure not related to the disease OR > 1 Exposure positively related to disease OR < 1 Exposure negatively related to the disease

56 Calculating Risk / Odds
                       Patients With Disease   Patients With No Disease
Patients Exposed              40 (a)                  250 (b)
Patients Not Exposed         110 (c)                  600 (d)
RR = [a / (a + b)] / [c / (c + d)] = [40 / (40 + 250)] / [110 / (110 + 600)] = 0.138 / 0.155 = 0.89
OR = ad / bc = (40 x 600) / (250 x 110) = 0.87
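Both measures can be computed from the four cells of the 2x2 table. The sketch below assumes the cell layout used throughout this section (a = exposed with disease, b = exposed without disease, c = not exposed with disease, d = not exposed without disease) and takes the counts 40, 250, 110 and 600 from the slide's odds ratio calculation, in the order implied by ad / bc. Function names are illustrative.

```python
def relative_risk(a, b, c, d):
    """Risk of disease in the exposed divided by risk in the non-exposed."""
    risk_exposed = a / (a + b)
    risk_not_exposed = c / (c + d)
    return risk_exposed / risk_not_exposed


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)


# Cell counts inferred from the slide's OR calculation (assumed mapping).
a, b, c, d = 40, 250, 110, 600

rr = relative_risk(a, b, c, d)
odds_r = odds_ratio(a, b, c, d)

print(f"RR = {rr:.2f}")
print(f"OR = {odds_r:.2f}")
```

Both values fall just below 1, so in this example the exposed group has a slightly lower risk and slightly lower odds of disease than the non-exposed group.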

57 2x2 Table: Test Validity
                    Patients With Disease   Patients With No Disease
Test Is Positive              a                        b
Test Is Negative              c                        d

58 Sensitivity / Specificity Sensitivity: the ability of a test to identify correctly those who have a disease = TP / (TP + FN) Specificity: the ability of a test to correctly rule out those who do not have a disease = TN / (TN + FP)
                 Patients With Disease   Patients With No Disease
Test Positive    True Positive (TP)      False Positive (FP)
Test Negative    False Negative (FN)     True Negative (TN)
True Positive (TP) = Have disease and have a positive test
False Positive (FP) = No disease, but have a positive test
False Negative (FN) = Have disease, but have a negative test
True Negative (TN) = No disease and have a negative test

59 Positive / Negative Predictive Value Positive Predictive Value (PPV): the proportion of patients who test positive that actually have the disease = TP / (TP + FP) Negative Predictive Value (NPV): the proportion of patients who test negative who do not have the disease = TN / (TN + FN)
                 Patients With Disease   Patients With No Disease
Test Positive    True Positive (TP)      False Positive (FP)
Test Negative    False Negative (FN)     True Negative (TN)
True Positive (TP) = Have disease and have a positive test
False Positive (FP) = No disease, but have a positive test
False Negative (FN) = Have disease, but have a negative test
True Negative (TN) = No disease and have a negative test

60 Calculating Test Validity
                 Patients With Disease   Patients With No Disease
Test Positive          80 (TP)                 100 (FP)
Test Negative          20 (FN)                 800 (TN)
Sensitivity = TP / (TP + FN) = 80 / (80 + 20) = 80 / 100 = 80%
Specificity = TN / (TN + FP) = 800 / (800 + 100) = 800 / 900 = 89%
PPV = TP / (TP + FP) = 80 / (80 + 100) = 80 / 180 = 44%
NPV = TN / (TN + FN) = 800 / (800 + 20) = 800 / 820 = 98%
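The four test-validity measures follow directly from the 2x2 table. In the Python sketch below, the counts TP = 80 and TN = 800 appear on the slide, while FN = 20 and FP = 100 are inferred from the slide's denominators (100, 900, 180, 820); the function name and the dictionary keys are assumptions for illustration.

```python
def validity_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly detected
        "specificity": tn / (tn + fp),  # non-diseased patients correctly ruled out
        "ppv": tp / (tp + fp),          # positive tests that are true positives
        "npv": tn / (tn + fn),          # negative tests that are true negatives
    }


# Counts from the slide; FN and FP are inferred from its denominators.
measures = validity_measures(tp=80, fp=100, fn=20, tn=800)

for name, value in measures.items():
    print(f"{name}: {value:.0%}")
```

Sensitivity and specificity describe the test itself, whereas PPV and NPV also depend on how common the disease is in the population being tested.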

61 Statistical Inference Inferences about populations based on your sample Sampling Allocation Error Systematic Random

62 Statistical Inference Standard Error of the Mean (SE or SEM) SE = SD / √(Sample Size)
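To make the standard error concrete, the sketch below applies the usual formula, SD divided by the square root of the sample size, to the length-of-stay data used earlier in the presentation. Reusing that data set here is a choice made for this example, not something the slide does.

```python
import math
import statistics

# Patient lengths of stay in days, from the earlier central tendency slides.
los_days = [12, 9, 3, 5, 7, 6, 13, 8, 4, 15, 6]

sd = statistics.stdev(los_days)   # sample standard deviation (n - 1)
n = len(los_days)
sem = sd / math.sqrt(n)           # standard error of the mean

print(f"SD = {sd:.2f}, n = {n}, SEM = {sem:.2f}")
```

Because the sample size sits under a square root, quadrupling the number of observations only halves the standard error.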

63 Hypothesis Testing Studies Null Hypothesis (H0): a hypothesis of no association between two variables. The hypothesis to be tested Alternate Hypothesis (Ha): a hypothesis of association between two variables.

64 Hypothesis Testing Four possibilities when testing whether treatments differ The treatments do not differ, and we correctly conclude that they do not differ The treatments do not differ, but we conclude that they do differ The treatments differ, but we conclude that they do not differ The treatments differ, and we correctly conclude that they do differ

65 Hypothesis Testing: Error Type I Error: Probability of rejecting the null hypothesis when the null hypothesis is true. α = probability of making a type I error Type II Error: Probability of accepting the null hypothesis when the alternate hypothesis is true. β = probability of making a type II error Power: Probability of correctly concluding that the outcomes differ 1 - β = power

66 Hypothesis Testing: Error
                                                       Reality: Treatments are not different   Reality: Treatments are different
Decision: Conclude that treatments are not different   Correct Decision                        Type II Error (Probability = β)
Decision: Conclude that treatments are different       Type I Error (Probability = α)          Correct Decision (Probability = 1 - β = Power)

67 Significance Testing Confidence Interval: a computed interval of values that, with a given probability, contains the true value of the population parameter. 95% CI: 95% of the time the true value falls within the interval given. RR 2.5 ( ) p-value ( ): probability that the findings observed could have occurred due to chance alone. p-value usually set at 0.05

68 Interpreting Confidence Intervals

69 Interpreting Confidence Intervals RR = 2: People who were exposed were twice as likely to have the outcome as people who were not exposed RR = 1: People who were exposed were no more or less likely to have the outcome than people who were not exposed RR = 0.5: People who were exposed were half as likely to have the outcome as people who were not exposed Table columns: Potential Risk Factor (A, B, C, D, E), RR, 95% CI

71 Interpreting p-values What the p-value does NOT mean Not the probability of the null hypothesis being true And 1 - (p-value) is not the probability of the alternative hypothesis being true Not the probability that your finding is due to random chance Not the probability of a repeat study finding the same result Does not indicate the importance of the observed effect

72 Interpreting p-values The probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true α = probability of making a type I error P-value threshold for significance is set when the study is designed If p > .05, then the results are considered not statistically significant. If .01 ≤ p < .05, then the results are significant. If .001 ≤ p < .01, then the results are highly significant. If p < .001, then the results are very highly significant.
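The cut-offs on this slide can be expressed as a small lookup function. This is only a sketch: the function name is illustrative, and because the slide does not say how to label a p-value of exactly .05, the code treats it as not statistically significant as an explicit assumption.

```python
def describe_p_value(p):
    """Map a p-value onto the verbal labels used on the slide."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        return "very highly significant"
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    # Assumption: p == 0.05 exactly is grouped with p > 0.05.
    return "not statistically significant"


for p in (0.0004, 0.002, 0.03, 0.09):
    print(f"p = {p}: {describe_p_value(p)}")
```

These labels describe statistical significance only; they say nothing about whether the size of the effect matters in practice.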

73 Interpreting p-values Study A found that the patient's average length of stay was associated with C. difficile colitis (p = 0.002). Highly Significant Study B found that men were more likely to develop a BSI than women (p = 0.09). Not Significant Study C found that redheads were 2% more likely to decline influenza vaccination than blondes (p = ). Statistically Significant, Probably Not Important

74 Inferential Statistics Parametric Tests Normal distribution of the sample population Usually continuous-interval variables z Test Student's t Test

75 z Tests Test the difference in means / proportions Calculates the ratio of the difference between means to the SE Use when: Sample size is greater than 30 Standard deviation (SD) is known Example: Comparing your mean infection rate to NHSN mean rates
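A one-sample z test of the kind described here divides the difference between a sample mean and a reference mean by the standard error. The Python sketch below uses the standard-library `NormalDist` class for the two-sided p-value. All numbers in it are hypothetical values invented for illustration (they are not NHSN data), and the function name is an assumption.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, reference_mean, reference_sd, n):
    """Return (z, two-sided p-value) for a one-sample z test.

    Assumes the reference standard deviation is known and n is large (> 30).
    """
    standard_error = reference_sd / math.sqrt(n)
    z = (sample_mean - reference_mean) / standard_error
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical inputs: a facility mean of 3.1 over 36 observations,
# compared with a reference mean of 2.5 and a known SD of 1.8.
z, p = one_sample_z_test(sample_mean=3.1, reference_mean=2.5,
                         reference_sd=1.8, n=36)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```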

76 t Tests

77 t Tests Test the difference in means (one or two tailed) Use when: Sample size is less than 30 Assumes Independence of populations & values Variance is equal for both sets of data No confounding variables Types of t Tests: Independent sample (experiment vs. control) Paired sample (before and after)

78 t Tests

79 t Tests

80 Inferential Statistics Non-Parametric Tests Do not assume normal distribution Used with more types of data: Nominal, Ordinal, Interval, Discrete Chi Square (χ²) Compares observed values against expected values Example: Comparing SSI rates for Dr. X and Dr. Y quare/chiexcel.htm

81 Chi square
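The observed-versus-expected comparison behind the chi square test can be sketched in plain Python for a 2x2 table. The counts below for Dr. X and Dr. Y are hypothetical numbers made up for this example, the function name is an assumption, and no continuity correction is applied. For a 2x2 table (1 degree of freedom) the p-value equals erfc(√(χ² / 2)), which avoids needing a statistics package.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi square for a 2x2 table [[a, b], [c, d]], no correction.

    Returns (chi2, p_value). Every row and column total must be non-zero.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if surgeon and outcome were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: Dr. X, 8 SSIs in 100 procedures; Dr. Y, 4 SSIs in 120.
chi2, p = chi_square_2x2(a=8, b=92, c=4, d=116)
print(f"chi square = {chi2:.2f}, p = {p:.3f}")
```

The approximation is usually considered unreliable when any expected cell count is below 5, in which case an exact test is the more common choice.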

82 Role of Statistics in Infection Prevention Aid in organizing and summarizing data Can tell you what is happening in your facility Make inferences about data Possible causes of disease or effects of IP interventions Communicate findings Monitor the impact of interventions Prepare reports to share the information Make changes to improve outcomes

83 Useful Resources: APIC EpiGraphics: Statistics and Surveillance Tools for IPs; APIC Manual, Chapter 5: Use of Statistics (2009); PDQ Statistics. GR Norman & DL Streiner (2003). BC Decker; Fundamentals of Biostatistics. B Rosner (2000). Brooks/Cole; Epidemiology for Public Health Practice. RH Friis & TA Sellers (2004). Jones and Bartlett Publishers, Inc.; Excel Hacks. DE Hawley (2007). O'Reilly; Free statistics calculators; Free online epi training from North Carolina Center of Public Health Preparedness

84 Useful Resources:

85 Questions?

86 Images From: National prevalence of methicillin-resistant Staphylococcus aureus in inpatients at US health care facilities, W Jarvis et al. AJIC December 2007 National Healthcare Safety Network (NHSN) report: Data summary for 2006 through 2008, issued December Edwards et al. AJIC December PDQ Statistics. GR Norman & DL Streiner (2003). BC Decker. Pertussis: A Disease Affecting All Ages. DS Gregory. American Family Physician. August Summarizing Your Data. Science Buddies. marizing_data.shtml