Describing Data and Descriptive Statistics

Peter Moffett MD
May 17, 2012

Introduction

Data can be categorized as nominal, ordinal, or continuous. These data must then be summarized using a measure of central tendency and a measure of variability. Once the data are presented in this fashion, relationships can be inferred or simply left for the reader to interpret. In the real world, a researcher never studies an entire population, but rather a specific sample, and then draws conclusions about the rest of the population. Describing data correctly allows the researcher to make the correct interpretations about the population.

Sample versus Population

Population
The universe about which the investigator wants to draw conclusions. [1]
Example: All males in the United States Army.

Sample
The subset of the population that is actually being observed or studied. [1]
For a sample to be accurate it should be random, with every member of the population having an equal chance of being selected. This helps to reduce bias.
Example: A sample of 100 males from each US Army base, randomly selected from the population.

Types of Data (Scales of Measurement)

Nominal Data
Data that are divided into categories or groups with no implied order or scale.
Examples: Male/female, aspirin vs. placebo, urban vs. suburban vs. rural.
Hint: If you can ask a yes or no question, the answer is nominal data. Is the person male? Did the patient take aspirin? Is the patient from an urban environment?
Proportions: This is another name for percentages. Be careful: percentages are actually nominal data. You are technically asking, what percentage of this sample is male? You get a number as an answer (a percentage), but it is actually just a way of quantifying your yes/no answer. So is the patient a male? Answer: Yes, 50% of the time.

Ordinal Data
Data that can be placed into some kind of meaningful order, but without any indication of the size of the interval.
Example: Runners come in at 1st, 2nd, and 3rd place in the race. You have no indication whether they were seconds or minutes apart.
Likert Scale: This is one of the most common ordinal scales in biomedical studies. These are the 5-point scales that ask someone if they like or dislike something.
Other medical examples: Glasgow Coma Scale (a 6 is different from a 3, but it is not 2 times a 3), Stages of Hypertension (you cannot tell if someone who is Stage I HTN is at the highest or the lowest allowable blood pressure for that stage).

Continuous (Interval/Ratio) Data
A statistician would be angry over this, but it is useful to lump this all together. Essentially these are all terms for data that have a meaningful scale. These will often just be referred to as continuous data. If you want the technical definitions:
o Interval data: Data that have meaningful intervals but no absolute zero. So while we can quantify a 30-degree difference between 30 C and 60 C, technically we cannot say that 60 C is twice as hot as 30 C, because 0 C is not a true absence of heat.
o Ratio data: Data that have meaningful intervals and an absolute zero. The Kelvin scale has an absolute zero, so we can say that 60 K is twice as hot as 30 K. Most of our data in medicine are ratio data (weight, time, heartbeat, etc.).
o Continuous data: Data that may include any value (including fractions and parts of a whole).
From now on we will ignore the distinctions and refer to all of this as continuous data.
Examples: Heart rate, blood pressure, time, weight.

Useful points about types of data
Some data can only be described using a nominal scale. Think of male versus female.
Many types of data can be described using all of the techniques above.
o Example: Hypertension in patients
  Nominal: Hypertension (Y/N)
  Ordinal: Pre-Hypertension, Stage I Hypertension, Stage II Hypertension
  Continuous: Actual blood pressure measurements
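The hypertension example can be sketched in a few lines of Python to show how the same measurement loses information as it moves from a continuous to an ordinal to a nominal scale. The function name and the systolic cut-offs (120, 140, 160 mm Hg) are assumptions chosen for illustration; they are not given in this handout.

```python
def describe_systolic(sbp_mm_hg):
    """Return one systolic reading on three scales of measurement.

    The cut-offs below are illustrative assumptions, not values from the handout.
    """
    # Ordinal scale: ordered categories, interval size unknown
    if sbp_mm_hg < 120:
        stage = "Normal"
    elif sbp_mm_hg < 140:
        stage = "Pre-Hypertension"
    elif sbp_mm_hg < 160:
        stage = "Stage I Hypertension"
    else:
        stage = "Stage II Hypertension"

    # Nominal scale: a yes/no answer to "is the patient hypertensive?"
    hypertensive = sbp_mm_hg >= 140

    # Continuous scale: the measurement itself
    return {"continuous": sbp_mm_hg, "ordinal": stage, "nominal": hypertensive}


low_end = describe_systolic(141)
high_end = describe_systolic(159)
# Both readings collapse to the same ordinal and nominal values,
# even though they differ by 18 mm Hg on the continuous scale.
print(low_end)
print(high_end)
```

Two patients at opposite ends of a stage are indistinguishable once the data are reduced to the ordinal or nominal scale, which is the information loss described above.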

Researchers pick a certain scale (when multiple choices exist) for a variety of reasons. Continuous data have higher information content and typically require smaller sample sizes. Nominal data may be easier to collect.
Proportions/Percentages: It bears repeating. A percentage seems like continuous data, but it is actually nominal data.

Measures of Central Tendency

All data sets can be described by a measure of central tendency. Different types of data are best summarized by different measures of central tendency.

Mean
Average of all of the data.
Can only describe continuous data. Nominal data do not have a mean, and describing ordinal data with a mean is misleading.
The mean is affected by extreme values. If a data set has a few extreme values, they will shift the mean enough to make it an unreliable measure of central tendency.

Median
The midpoint of the data. 50% of the data points are above the median and 50% are below it.
The median is not affected by extreme values because it responds only to the number of observations, not to their magnitude.
Ordinal data are best described using the median.
Continuous data with extreme values are best described using the median.

Mode
The value that appears most often in the data set.
Often used to describe nominal data.
Not influenced by extreme values.

Examples
Take the data set: (1, 2, 3, 4, 5, 6, 7, 8, 9)
o Mean: (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9)/9 = 5
o Median: 5
o Mode: 5 (strictly, no value repeats, so there is no single most frequent value; in this case it is typical to pick the median as the mode)
Take the data set with an extreme outlier: (1, 2, 3, 4, 5, 6, 7, 8, 500)
o Mean: (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 500)/9 = 59.6
o Median: 5
o Mode: 5
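The two data sets above can be checked with a short Python sketch using only the standard library. The helper name summarize is an assumption for illustration. Note that statistics.multimode returns every value when none repeats, which reflects the fact that these data sets have no single most frequent value.

```python
import statistics

baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 500]


def summarize(values):
    """Return the mean (rounded to 1 decimal), the median, and all modes."""
    return {
        "mean": round(statistics.mean(values), 1),
        "median": statistics.median(values),
        # multimode lists every most-frequent value; here all values tie
        "modes": statistics.multimode(values),
    }


for name, data in [("baseline", baseline), ("with outlier", with_outlier)]:
    print(name, summarize(data))
```

The single extreme value moves the mean from 5 to 59.6 while the median stays at 5, which is why the median is preferred for data with outliers.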

Data Distributions

For each type of data, you can take the frequency with which each value appears and plot it on a graph. This gives you a data distribution. You need a little background on data distributions to understand measures of variability and, in future lectures, statistical tests.

Normal/Parametric/Gaussian Distribution
All of these terms apply to the bell-shaped curve we are all familiar with.
Many biologic phenomena fall into a normal distribution.
The mean, median, and mode for a normal distribution are all the same.

[Figure: normal distribution; the mean, median, and mode coincide at the peak]

Skewed Distributions
These are also referred to as non-parametric distributions.
Extreme outlier data tend to pull the mean in a certain direction, away from the true midpoint of the data.
The mean is pulled toward the tail of the data, and this is how the distributions are named (not for the true midpoint or the hump of the distribution).

[Figure: two skewed distributions, labeled Positive Skew and Negative Skew]

Notice in the above examples how the median is the true midpoint of the data in skewed populations.
Non-parametric (skewed) data cannot be tested using parametric tests (more in future lectures).

Medical Example [2]
If we measure the average systolic blood pressure from a sample of 30 non-hypertensive men aged years we would find the following normal distribution. Each X is one data measurement, so you can see that the most frequently obtained value (the mode) is 120 mm Hg.

[Figure: frequency plot of systolic blood pressure in the non-hypertensive men. Mean = 120, Median = 120, Mode = 120]

If instead we took the blood pressure of 26 patients with renal hypertension we would get the following curve. Notice in this example that we expect the systolic blood pressure to be really high in these people. Yet some of them are outliers and actually have a lower blood pressure than expected. This pulls the mean down and produces a negatively skewed distribution.

[Figure: frequency plot of systolic blood pressure in the patients with renal hypertension, with single low values at 180 and 190. Median = 230, Mode = 240; the mean is pulled below both]

Measures of Variability

In order to fully describe data, you must report not only the central tendency of the data, but also the variability of that data. Look at the following frequency graph to see why. Notice that the means, medians, and modes of the data are exactly the same, yet the data are obviously different. [1]

Range
Reports the lowest and highest numbers.
Purely descriptive.
Affected by outliers.

Interquartile Range
Reports the range of values from the 25th percentile through the 75th percentile.
The median is always the 50th percentile (so 50% of values fall below the median).
The interquartile range contains 50% of the data points (those between the 25th and 75th percentiles).
Often reported with medians for ordinal data, or with a median to describe continuous data with outliers.

[Figure: data distribution with the median and percentiles marked]
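A brief Python sketch can contrast the range with the interquartile range when an outlier is present. The data sets are illustrative (a run of nine values, with and without one extreme value), and the helper names are assumptions. Several conventions exist for computing quartiles on small samples; statistics.quantiles with its default method is used here, and other software may give slightly different cut points.

```python
import statistics

baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 500]


def value_range(values):
    """Return the lowest and highest values."""
    return min(values), max(values)


def interquartile_range(values):
    """Return the 25th and 75th percentiles (default 'exclusive' method)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q1, q3


for name, data in [("baseline", baseline), ("with outlier", with_outlier)]:
    print(name, "range:", value_range(data), "IQR:", interquartile_range(data))
```

The range stretches to include the outlier, while the 25th and 75th percentiles do not move, which is why the interquartile range is usually reported alongside the median.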

Standard Deviation (SD)
A unit of measure that describes the variance around a mean for continuous data.
Can only be used with normally distributed data.
Approximately 68% of all data fall within 1 SD of the mean.
Approximately 95% of all data fall within 2 SD of the mean.

[Figure: normal curve with the horizontal axis marked from -4 SD to +4 SD. 34.13% of values lie between the mean and 1 SD on each side (68.26% within 1 SD), and 13.59% lie between 1 SD and 2 SD on each side (95.44% within 2 SD). Percentiles: -2 SD = 2.3rd, -1 SD = 15.9th, mean = 50th, +1 SD = 84.1st, +2 SD = 97.7th]

So if we find that the mean heart rate in a population is 80 with a SD of 10, then 68% of all people will have a heart rate between 70 and 90, and 95% of all people will have a heart rate between 60 and 100.
The standard deviation is often used to determine normal lab values.

Summary and Conclusions about central tendency and variability

Listed below are several tables to help summarize various data types, methods of describing central tendency, and measures of variability.

Type of Data   Example          Measure of Central Tendency   Measure of Variability
Nominal        Male v Female    Mode                          Range?
Ordinal        5 point Likert   Median                        Interquartile Range
Continuous     Heart rate       Mean                          Standard Deviation
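The heart rate example (mean 80, SD 10) can be checked with a short Python sketch, assuming the values really are normally distributed. The function name fraction_within is an assumption for illustration; statistics.NormalDist is part of the standard library.

```python
from statistics import NormalDist


def fraction_within(mean, sd, k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


mean_hr, sd_hr = 80, 10
for k in (1, 2):
    low, high = mean_hr - k * sd_hr, mean_hr + k * sd_hr
    share = fraction_within(mean_hr, sd_hr, k)
    print(f"{low} to {high} BPM: {share:.1%} of people")
```

This only holds when the data follow a normal distribution; for skewed data the mean plus or minus 1 or 2 SD does not mark out these percentages.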

Characteristic                Mean   Median   Mode
Useful with continuous data   Yes    Yes      Yes
Useful with ordinal data      No     Yes      Yes
Useful with nominal data      No     No       Yes
Affected by outliers          Yes    No       No

Confidence Intervals (CI)

Definition
The most basic way to think about confidence intervals is as mathematical predictions about where the real value of the variable lies.
We typically use 95% confidence intervals in clinical medicine.
A confidence interval simply takes the data we actually have in our sample and tells us how this applies to the population (real world).

Examples
o I measure the heart rate of 200 active duty soldiers and find that the mean heart rate is 50 with a 95% CI of 42 to 61. The correct interpretation is that my sample mean is 50 beats per minute, but that I am 95% certain that the mean heart rate for the entire population of active duty soldiers (whom I did not study) is between 42 and 61.
o I ask 40 parents if they like the experience their child had when receiving intranasal fentanyl as a sedative. We use the following Likert scale (1- hate, 2- dislike, 3- neutral, 4- like, 5- love). We find a median score of 4 and our 95% CI comes back at 3 to 5. The correct interpretation is that in our sample, half of the parents liked or loved the sedation. In the real world we are 95% certain that the median score for all parents falls somewhere between 3 (neutral) and 5 (love).
o I take 30 doctors I know and ask them if they know what a confidence interval is. I report that 40% do know what it is, with a 95% CI of 5-75%. The correct interpretation is that in my sample 40% of physicians knew what a confidence interval is, and that I'm 95% certain that between 5% and 75% of all physicians know what a confidence interval is.

Note: You will sometimes see a researcher describe their sample with a 95% CI. So you look at the first table and see they are reporting that they enrolled 65% males with a 95% CI of 55-75%. This is a little confusing if you do not understand confidence intervals.
Most people will look at that and think, "Wait, they can't count? Why aren't they 100% certain that they have a sample with 65% males?" In reality, they are saying that they have a sample with 65% males, and that they think that in the population they are studying there are somewhere between 55% and 75% males. The confidence interval is derived from the sample data but refers to the overall population.
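One common way to put a confidence interval around a percentage is the normal-approximation (Wald) interval, sketched below in Python. This is an illustration only: the handout does not say which method its examples used, the sample sizes are hypothetical, and the Wald interval is known to behave poorly for small samples or percentages near 0% or 100%.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to the valid range for a proportion
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical samples that both contain 65% males
for n in (100, 1000):
    low, high = proportion_ci(round(0.65 * n), n)
    print(f"n = {n}: 65% males, 95% CI {low:.1%} to {high:.1%}")
```

The sample percentage is known exactly in both cases; the interval describes the uncertainty about the population, and it narrows as the sample grows.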

Methods

There are mathematical formulas to derive confidence intervals for almost any type of data. You can read a very thorough review and explanation with formulas in Chapter 7 of Glantz's book. [3] Here are some key points.

Standard Error of the Mean (SEM): You may see this term used. Essentially this is a mathematical way of taking the standard deviation from a sample and determining how representative it is of the population. The SEM is then used to calculate a confidence interval. This is only useful for continuous data, but it is always used as the example for how to calculate confidence intervals.
o Note: The SEM should never be reported for a sample. It is a measure relating to the population, not the sample. Since the SEM is always smaller than the standard deviation, some authors mistakenly report the SEM instead of the standard deviation.

You may select various cutoffs of confidence for your confidence interval. We typically use 95% in medicine, but you can choose 90%, 99%, or whatever other number you would like. If you want to be 99% sure that your confidence interval includes the population value, then the confidence interval will be wider.

The sample size directly impacts the width of the confidence interval. If you have a large sample size, your confidence interval narrows.

Example: Below is a graph showing our data from the blood pressure measurements in normal men. It shows the relationship between SD, SEM, and the confidence interval. [2]

[Figure: blood pressure measurements with the mean, the SD (9.37), the SEM (1.71), and the 95% CI marked]
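The relationship between SD, SEM, and the confidence interval can be sketched in Python using the figures quoted in this handout (mean 120, SD 9.37, 30 men). The sketch assumes the usual formula SEM = SD / sqrt(n) and a normal approximation for the interval; the function names are assumptions for illustration. Formal calculations for small samples typically use the t distribution, which gives a slightly wider interval.

```python
import math
from statistics import NormalDist


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval around a sample mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * standard_error(sd, n)
    return mean - half_width, mean + half_width


print("SEM:", round(standard_error(9.37, 30), 2))

# Effect of the confidence level (same sample)
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(120, 9.37, 30, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f}")

# Effect of the sample size (hypothetical sizes, same mean and SD)
for n in (30, 100, 1000):
    low, high = mean_ci(120, 9.37, n)
    print(f"n = {n}: 95% CI {low:.1f} to {high:.1f}")
```

A higher confidence level widens the interval, and a larger sample narrows it, matching the two key points above.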

Example: The following table [2] shows how changing your confidence level or your sample size affects the reported confidence interval.

[Table 4. Effect of confidence level and sample size on confidence interval width. Two panels: "Calculation of CIs for Data Presented in Table 2" and "Effect of Sample Size on CI for Data With a Mean of 120 and a SD of". Columns in each panel: CI (%), SD, n, SEM, CI]

Using Confidence Intervals

Confidence intervals can be used in two different ways:

1. Confidence intervals can be purely descriptive. For example, according to an analysis of the Canadian CT Head Rule [4], the sensitivity of the rule for finding neurosurgical lesions was reported as 100% (95% CI 64.6-100%). The authors are saying their sample estimate is a sensitivity of 100%, and they are 95% certain the true test sensitivity is somewhere between 64.6% and 100%.

2. Confidence intervals can also be used to make inferences. We will discuss classical hypothesis testing in the future, but a brief summary here will make the idea clearer.
a. Classical hypothesis testing produces a P-value and tells the researcher/reader whether the observed difference is STATISTICALLY significant. It says nothing about the magnitude of that difference.
b. If instead a researcher reports the actual difference between two numbers and gives a 95% confidence interval, then the magnitude of the effect is obvious. In addition, if the confidence interval crosses the identity point, then the results are not STATISTICALLY significant.
i. The identity point (also called the null point, or no-effect point) is the number that means there is no effect. If you are subtracting two results, then the null point is zero. This is why it is commonly taught that if a confidence interval crosses zero it is not significant. Remember, however, that for a ratio the null point is actually 1. A ratio of 1 means there is no difference between the two groups.
c. The summary of this is that you find the difference between two numbers and report the confidence interval around it.
If that confidence interval contains a value that is statistically not significant (the null point) or clinically not significant (for example, a 2 BPM reduction in heart rate), then you can reject the results.
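The rule above can be sketched as a small Python check: does the confidence interval contain the null point, or a value you consider clinically unimportant? The function name and the intervals below are illustrative assumptions, not results from a real study.

```python
def interval_contains(ci_low, ci_high, value):
    """True if the value lies inside the confidence interval (inclusive)."""
    return ci_low <= value <= ci_high


# A difference between two means: the null point is 0
wide_difference = (-35, 5)     # hypothetical 95% CI, crosses zero
narrow_difference = (-11, -9)  # hypothetical 95% CI, excludes zero

# A ratio (for example, a relative risk): the null point is 1
ratio_interval = (0.8, 1.3)    # hypothetical 95% CI, includes 1

print(interval_contains(*wide_difference, 0))    # crosses the null point
print(interval_contains(*narrow_difference, 0))  # does not cross it
print(interval_contains(*ratio_interval, 1))     # crosses the null point

# A clinically trivial effect, here a 2 BPM reduction (-2)
print(interval_contains(*narrow_difference, -2))
```

An interval that excludes the null point is statistically significant; whether it also excludes clinically trivial effects is a separate judgment that depends on the threshold you choose.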

Confidence Intervals and Hypothesis Testing (Examples):

Let us pretend we have a new drug called RATE-A-BLATE that is supposed to lower the heart rate rapidly, and without any blood pressure side effects, in people with atrial fibrillation and rapid ventricular response. Using this drug, we can look at the various ways confidence intervals can be used and how they are impacted by certain factors.

1. In the first experiment, we take 10 patients and give them RATE-A-BLATE (RAB). We find that in the sample RAB drops the heart rate approximately 30 beats per minute. After the math, we report the results as -30 (95% CI -50 to 2).
So based on this sample we are 95% certain that the drug may, on average, lower the heart rate by as much as 50 BPM or raise it by as much as 2 BPM. Looking at this data you would think, "Well, I do not want to give a drug that may raise the heart rate a few beats, but the possible values for this drug (negative 50 through 2) are predominantly on the heart rate lowering side." If another study were performed with larger numbers, we might get a narrower range of possible values to examine.

2. In the second experiment we add 40 more patients in atrial fibrillation with rapid ventricular response, so that we are now studying a total of 50 patients. Our RAB study finds a mean change in heart rate of -30 (95% CI -40 to -20).
This time we can say that we are 95% certain our drug lowers heart rate by between 20 and 40 beats per minute. The increase in sample size has narrowed our confidence interval. Now we are likely to accept that the drug is effective at lowering heart rate.
If we had just reported the sample means in both cases, then we would all just think that RAB always lowers heart rate by 30 BPM. Do not trust a study that does that.

3. Now let us move on to study RAB versus diltiazem. Assume we run a well-designed, double-blind, randomized controlled trial and determine that once again RAB gives us a mean change in heart rate of -30 (95% CI -40 to -20). We also find that diltiazem causes a mean change in heart rate of -20 ( ).

[Figure: mean change in heart rate with 95% CI for RAB and for diltiazem]

Notice that initially we are simply describing the heart rate lowering properties of the two drugs. Looking at the data, there appears to be a lot of overlap in the possible mean heart rate lowering effects of the drugs. The difference may not be statistically significant.
So we take our mean heart rate changes from the two drugs and compare them using Student's t test (more in another lecture), and find that RAB lowers the heart rate by 10 BPM more than diltiazem, but with a P-value of >.05, so it is not statistically significant.
Another way (some argue a better way) is to just show the mean difference with a confidence interval. We get -10 (95% CI -35 to 5).

[Figure: mean difference in heart rate between diltiazem and RAB]

This is the same information simply presented in a different fashion. Instead of just saying there was a -10 BPM difference between the two drugs and it was not statistically significant, we can see how close it came to being statistically significant. In this graph we are plotting the difference between two numbers, so an interval that includes zero is not statistically significant. But we look at the interval and realize that it was almost statistically significant. We plan another study.

4. Once again we plan to increase our sample size to see if the mean difference between the two drugs is statistically significant. We enroll over 5000 patients into the same double-blind RCT comparing diltiazem to RAB. We end up this time with results indicating that the mean difference between RAB and diltiazem is -10 (95% CI -11 to -9).

[Figure: mean difference in heart rate between diltiazem and RAB]

This time we did it. The confidence interval is very narrow (only 1 BPM on either side of our sample estimate of -10 BPM). Plus it does not cross zero (and sure enough the P-value will be <.05). It looks like, with enough patients, we have finally shown that RAB has a statistically significant reduction in heart rate over diltiazem of about 10 BPM (to be exact, it could be anywhere from 9 to 11).

Example Wrap-Up:

We can make the following conclusions about RAB based on our studies:
RAB seems to lower heart rate in atrial fibrillation with rapid ventricular response by an average of about 30 BPM, but could lower it anywhere from 20 to 40 BPM, compared to doing nothing.
RAB seems to lower heart rate in atrial fibrillation with rapid ventricular response a little better than diltiazem. It lowers the heart rate about 10 BPM more, but the difference could be anywhere from 9 to 11 BPM.

But what does this tell us clinically? In the end these are only STATISTICALLY significant numbers. Remember that CLINICAL significance is not the same thing. Consider the following situations:
RAB is cheaper, and does not drop blood pressure like diltiazem. You think that the data support its use.
RAB turns out to be more expensive, and does not drop blood pressure like diltiazem. You decide to use RAB only in normotensive or hypotensive patients and save costs by using diltiazem when you have a hypertensive patient.
RAB turns out to be less expensive, and does not drop blood pressure like diltiazem; however, it is associated with causing a myocardial infarction about 10% of the time it is given. You decide that those extra 10 BPM over diltiazem (as well as the cost and BP-stable effects) are not worth the consequences.
RAB turns out to be less expensive, and does not drop blood pressure like diltiazem. You decide, however, that a 10 BPM difference over diltiazem is not really that clinically significant, and you are more used to diltiazem, so you continue to use it instead of RAB.

All of those conclusions are possible and individualized based on the data. When you look at a confidence interval, treat every value in the range as a possible real-world value. If the intervention would still be worth your time across that whole range, then use it. If not, drop it.

Resources:

I highly recommend the following two references: [2, 5]
For a very complete description of all things related to confidence intervals, refer to [3].
If you want to plan a study and derive the sample size based on a certain confidence interval you wish to obtain, go to:
If you have data and want to calculate a confidence interval around it, go to:

References:

1. Glaser AN. High-Yield Biostatistics. 3rd ed. Philadelphia: Lippincott Williams & Wilkins;
2. Gaddis GM, Gaddis ML. Introduction to biostatistics: Part 2, Descriptive statistics. Annals of emergency medicine. Mar 1990;19(3):
3. Glantz SA. Primer of Biostatistics. 6th ed. New York: McGraw-Hill;
4. Smits M, Dippel DW, de Haan GG, et al. External validation of the Canadian CT Head Rule and the New Orleans Criteria for CT scanning in patients with minor head injury. JAMA: the journal of the American Medical Association. Sep ;294(12):
5. Gaddis ML, Gaddis GM. Introduction to biostatistics: Part 1, Basic concepts. Annals of emergency medicine. Jan 1990;19(1):86-89.