Describing Data and Descriptive Statistics
Peter Moffett MD
May 17, 2012

Introduction
Data can be categorized as nominal, ordinal, or continuous. The data must then be summarized using a measure of central tendency and a measure of variability. Once the data are presented in this fashion, relationships can be inferred, or simply left for the reader to interpret. In the real world, a researcher never studies an entire population, but rather a specific sample, and then draws conclusions about the rest of the population. Describing data correctly allows the researcher to make correct interpretations about the population.

Sample versus Population
Population: The universe about which the investigator wants to draw conclusions [1].
Example: All males in the United States Army.
Sample: The subset of the population that is actually being observed or studied [1]. For a sample to be representative, it should be random: every member of the population should have an equal chance of being selected. This helps to reduce bias.
Example: 100 males randomly selected from each US Army base.

Types of Data (Scales of Measurement)

Nominal Data
Data that are divided into categories or groups with no implied order or scale.
Examples: Male/female, aspirin vs. placebo, urban vs. suburban vs. rural.
Hint: If you can ask a yes or no question, the answer is nominal data. Is the person male? Did the patient take aspirin? Is the patient from an urban environment?
Proportions: This is another name for percentages. Be careful: percentages are actually nominal data. You are technically asking, "What percentage of this sample is male?" You get a number as an answer (a percentage), but it is actually just a way of quantifying your yes/no answer. So, is the patient male? Answer: yes, 50% of the time.

Ordinal Data
Data that can be placed into some kind of meaningful order, but without any indication of the size of the interval between values.
Example: Runners finish a race in 1st, 2nd, and 3rd place. You have no indication whether they were seconds or minutes apart.
Likert scale: This is one of the most common ordinal scales in biomedical studies. These are the 5-point scales that ask someone how much they like or dislike something.
Other medical examples: the Glasgow Coma Scale (a 6 is different from a 3, but it is not 2 times a 3) and the stages of hypertension (you cannot tell whether someone with Stage I hypertension is at the highest or the lowest blood pressure allowed for that stage).

Continuous (Interval/Ratio) Data
A statistician would be angry over this, but it is useful to lump these together. Essentially, these are all terms for data that have a meaningful scale, and they will often just be referred to as continuous data. If you want the technical definitions:
- Interval data: Data that have meaningful intervals but no absolute zero. While we can quantify a 30 °C difference between 30 °C and 60 °C, we cannot technically say that 60 °C is twice as hot as 30 °C, because 0 °C is not a true absence of heat.
- Ratio data: Data that have meaningful intervals and an absolute zero. The Kelvin scale has an absolute zero, so we can say that 60 K is twice as hot as 30 K. Most of our data in medicine are ratio data (weight, time, heart rate, etc.).
- Continuous data: Data that may take any value (including fractions and parts of a whole).
From now on we will ignore these distinctions and refer to all of this as continuous data.
Examples: Heart rate, blood pressure, time, weight.

Useful points about types of data
- Some data can only be described using a nominal scale. Think of male versus female.
- Many types of data can be described using all of the scales above.
  - Example: Hypertension in patients
    - Nominal: Hypertension (Y/N)
    - Ordinal: Prehypertension, Stage I hypertension, Stage II hypertension
    - Continuous: Actual blood pressure measurements
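The hypertension example can be sketched in code: one blood pressure reading yields a continuous value, an ordinal stage, and a nominal yes/no. This is a minimal sketch that assumes the JNC 7 adult cut-offs (prehypertension 120-139/80-89, Stage I 140-159/90-99, Stage II 160+/100+ mm Hg), which the text does not specify; the readings and the names `BloodPressure`, `hypertension_stage`, and `has_hypertension` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BloodPressure:
    systolic: int   # mm Hg (continuous scale)
    diastolic: int  # mm Hg (continuous scale)


def hypertension_stage(bp: BloodPressure) -> str:
    """Ordinal scale. Assumes JNC 7 cut-offs; the higher category of the two readings wins."""
    if bp.systolic >= 160 or bp.diastolic >= 100:
        return "Stage II hypertension"
    if bp.systolic >= 140 or bp.diastolic >= 90:
        return "Stage I hypertension"
    if bp.systolic >= 120 or bp.diastolic >= 80:
        return "Prehypertension"
    return "Normal"


def has_hypertension(bp: BloodPressure) -> bool:
    """Nominal scale: a yes/no answer derived from the ordinal stage."""
    return hypertension_stage(bp).startswith("Stage")


# Hypothetical raw readings (the continuous data).
readings = [BloodPressure(118, 76), BloodPressure(134, 82),
            BloodPressure(152, 94), BloodPressure(171, 104)]

for bp in readings:
    print(f"{bp.systolic}/{bp.diastolic} mm Hg | {hypertension_stage(bp)} | "
          f"hypertension: {'Y' if has_hypertension(bp) else 'N'}")
```

Each step down (continuous to ordinal to nominal) throws information away, which is one way to see why continuous data carry more information per patient.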
Researchers pick a certain scale (when multiple choices exist) for a variety of reasons. Continuous data have higher information content and typically require smaller sample sizes. Nominal data may be easier to collect.

Proportions/Percentages: It bears repeating. A percentage looks like continuous data, but it is actually nominal data.

Measures of Central Tendency
All data sets can be described by a measure of central tendency. Different types of data are best summarized by different measures of central tendency.

Mean
- The average of all of the data.
- Can only describe continuous data. Nominal data do not have a mean, and describing ordinal data with a mean is misleading.
- The mean is affected by extreme values. If a data set has a few extreme values, they can shift the mean enough to make it an unreliable measure of central tendency.

Median
- The midpoint of the data: 50% of the data points are above the median and 50% are below it.
- The median is not affected by extreme values because it depends only on the order (rank) of the observations, not on their magnitude.
- Ordinal data are best described using the median.
- Continuous data with extreme values are best described using the median.

Mode
- The value that appears most often in the data set.
- Often used to describe nominal data.
- Not influenced by extreme values.

Examples
Take the data set (1, 2, 3, 4, 5, 6, 7, 8, 9):
- Mean: (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9)/9 = 45/9 = 5
- Median: 5
- Mode: none (every value appears exactly once, so no value is more frequent than any other)
Take the same data set with an extreme outlier, (1, 2, 3, 4, 5, 6, 7, 8, 500):
- Mean: (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 500)/9 = 536/9 ≈ 59.6
- Median: 5
- Mode: none (again, every value appears exactly once)
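The two example data sets can be checked with Python's standard `statistics` module. This sketch uses `multimode`, which returns every value tied for the highest count; the helper `summarize` is a hypothetical name, and the convention of reporting no mode when every distinct value ties is an assumption stated in its comment.

```python
from statistics import mean, median, multimode


def summarize(values):
    """Return (mean, median, modes); modes is None when no value is more frequent than the others."""
    modes = multimode(values)
    if len(modes) > 1 and len(modes) == len(set(values)):
        modes = None  # every distinct value ties, so there is no mode
    return mean(values), median(values), modes


no_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 500]

for label, values in (("no outlier", no_outlier), ("extreme outlier", with_outlier)):
    m, med, modes = summarize(values)
    print(f"{label}: mean = {m:.1f}, median = {med}, mode = {modes}")
```

The single outlier moves the mean from 5.0 to about 59.6 while the median stays at 5.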
Data Distributions
For each type of data, you can take the frequency with which each value appears and plot it on a graph. This gives you a data distribution. You need a little background on data distributions to understand measures of variability, as well as the statistical tests we will discuss in the future.

Normal/Parametric/Gaussian Distribution
All of these terms apply to the familiar bell-shaped curve. Many biologic phenomena follow a normal distribution. The mean, median, and mode of a normal distribution are all the same.
[Figure: normal distribution with the mean, median, and mode at the same central point]

Skewed Distributions
These are also referred to as non-parametric distributions. Extreme outlier data tend to pull the mean in a certain direction, away from the true midpoint of the data. The mean is pulled toward the tail of the distribution, and the skew is named for the direction of that tail (not for the true midpoint or hump of the distribution).
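The pull of the tail on the mean can be checked numerically. This is a sketch with hypothetical systolic pressures that are mostly high with a few low outliers; `skew_direction` is a hypothetical helper that applies the mean-versus-median rule of thumb from the text, which is a heuristic and can mislead for some distributions.

```python
from statistics import mean, median, multimode

# Hypothetical systolic pressures (mm Hg): mostly high values plus a few
# low outliers that form a left (negative) tail.
systolic = [150, 180, 190, 220, 230, 230, 240, 240, 240]


def skew_direction(values):
    """Rule of thumb: the mean is pulled toward the tail, away from the median."""
    m, med = mean(values), median(values)
    if m < med:
        return "negative (left) skew"
    if m > med:
        return "positive (right) skew"
    return "no skew detected by mean vs. median"


print(round(mean(systolic), 1), median(systolic), multimode(systolic))
print(skew_direction(systolic))
```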
[Figure: positively skewed and negatively skewed distributions]
Notice in the examples above that the median remains the true midpoint of the data in skewed populations. Non-parametric (skewed) data cannot be tested using parametric tests (more in future lectures).

Medical Example [2]
If we measure systolic blood pressure in a sample of 30 non-hypertensive men of a given age range, we find the following normal distribution. Each X is one measurement, so you can see that the most frequently obtained value (the mode) is 120 mm Hg.
[Figure: dot plot of systolic blood pressure in the 30 non-hypertensive men, centered on 120 mm Hg; Mean = 120, Median = 120, Mode = 120]

If instead we took the blood pressure of 26 patients with renal hypertension, we would get the following curve. In this example we expect the systolic blood pressure to be very high. Yet some of these patients are outliers and actually have a lower blood pressure than expected. This pulls the mean down, producing a negatively skewed distribution.
[Figure: dot plot of systolic blood pressure in 26 patients with renal hypertension, with a tail toward lower values; Median = 230, Mode = 240, Mean below the median]

Measures of Variability
To fully describe data, you must report not only the central tendency of the data but also its variability. The following frequency graph [1] shows why: the means, medians, and modes of the data sets are exactly the same, yet the data are obviously different.
[Figure: frequency distributions with identical mean, median, and mode but different spread]
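The point of the frequency graph can be reproduced with two small data sets. This sketch uses hypothetical values chosen so that the mean, median, and mode all equal 120 in both sets; only the range and the standard deviation reveal the difference.

```python
from statistics import mean, median, multimode, stdev

# Hypothetical data sets: identical center, very different spread.
narrow = [118, 119, 120, 120, 120, 121, 122]
wide = [100, 110, 120, 120, 120, 130, 140]

for label, values in (("narrow", narrow), ("wide", wide)):
    print(f"{label}: mean={mean(values)}, median={median(values)}, "
          f"mode={multimode(values)}, range={min(values)}-{max(values)}, "
          f"SD={stdev(values):.2f}")
```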
Range
- Reports the lowest and highest values.
- Purely descriptive.
- Affected by outliers.

Interquartile Range
- Reports the range of values from the 25th percentile through the 75th percentile.
- The median is always the 50th percentile (50% of values fall below the median).
- The interquartile range contains the middle 50% of the data points (between the 25th and 75th percentiles).
- Often reported with the median for ordinal data, or with the median to describe continuous data with outliers.
[Figure: distribution marked at the 25th, 50th (median), and 75th percentiles]
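Range and interquartile range can be compared on the two example data sets from the central tendency section. This sketch uses `statistics.quantiles(values, n=4)`, which by default uses the "exclusive" interpolation method; other software packages define quartiles slightly differently, so their values may differ a little. The helper name `range_and_iqr` is hypothetical.

```python
from statistics import quantiles


def range_and_iqr(values):
    """Return (min, max, Q1, Q3, IQR) using Python's default quartile method."""
    q1, _median, q3 = quantiles(values, n=4)
    return min(values), max(values), q1, q3, q3 - q1


for values in ([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 500]):
    lo, hi, q1, q3, iqr = range_and_iqr(values)
    print(f"range {lo}-{hi}, IQR {q1}-{q3} (width {iqr})")
```

The outlier stretches the range from 1-9 to 1-500, while the interquartile range stays at 2.5-7.5.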
Standard Deviation (SD)
- A measure of how widely continuous data are spread around the mean (the square root of the variance).
- Most meaningful for normally distributed data; the percentages below apply to a normal distribution.
- Approximately 68% of all data fall within 1 SD of the mean.
- Approximately 95% of all data fall within 2 SD of the mean.
[Figure: normal curve marked from -4 SD to +4 SD. 34.13% of values lie between the mean and 1 SD on each side (68.26% within ±1 SD), and 13.59% lie between 1 and 2 SD on each side (95.44% within ±2 SD). The points -2, -1, 0, +1, and +2 SD correspond to the 2.3rd, 15.9th, 50th, 84.1st, and 97.7th percentiles.]

So if the mean heart rate in a population is 80 with an SD of 10, and heart rate is normally distributed, then about 68% of people will have a heart rate between 70 and 90, and about 95% will have a heart rate between 60 and 100. The standard deviation is often used to determine normal lab values.

Summary and Conclusions about Central Tendency and Variability
The tables below summarize the data types, the measures of central tendency, and the measures of variability.

| Type of Data | Example | Measure of Central Tendency | Measure of Variability |
|---|---|---|---|
| Nominal | Male vs. female | Mode | Range? |
| Ordinal | 5-point Likert scale | Median | Interquartile range |
| Continuous | Heart rate | Mean | Standard deviation |
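The heart rate example in the Standard Deviation section (mean 80, SD 10) can be checked with `statistics.NormalDist`. This is a sketch that assumes heart rate is exactly normally distributed; it also recovers the percentile labels from the normal-curve figure and builds a mean ± 2 SD interval, one common way a "normal" reference range is set.

```python
from statistics import NormalDist

# Assumption: heart rate is normally distributed with mean 80 and SD 10.
heart_rate = NormalDist(mu=80, sigma=10)

within_1_sd = heart_rate.cdf(90) - heart_rate.cdf(70)
within_2_sd = heart_rate.cdf(100) - heart_rate.cdf(60)
print(f"70-90 bpm: {within_1_sd:.1%} of people")
print(f"60-100 bpm: {within_2_sd:.1%} of people")

# Percentile positions of -2, -1, +1 and +2 SD on any normal curve.
standard = NormalDist()  # mean 0, SD 1
for k in (-2, -1, 1, 2):
    print(f"{k:+d} SD -> {standard.cdf(k):.1%} of values fall below")

# A mean +/- 2 SD interval, a common basis for a reference ("normal") range.
reference_low = heart_rate.mean - 2 * heart_rate.stdev
reference_high = heart_rate.mean + 2 * heart_rate.stdev
print(f"reference range: {reference_low:.0f}-{reference_high:.0f} bpm")
```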
| Characteristic | Mean | Median | Mode |
|---|---|---|---|
| Useful with continuous data | Yes | Yes | Yes |
| Useful with ordinal data | No | Yes | Yes |
| Useful with nominal data | No | No | Yes |
| Affected by outliers | Yes | No | No |

Confidence Intervals (CI)
Definition
The most basic way to think about confidence intervals is as mathematical predictions of where the real value of a variable lies. We typically use 95% confidence intervals in clinical medicine. A confidence interval takes the data we actually have in our sample and tells us how they apply to the population (the real world).

Examples
- I measure the heart rate of 200 active duty soldiers and find that the mean heart rate is 50 with a 95% CI of 42-61. The correct interpretation is that my sample mean is 50 beats per minute, but I am 95% certain that the mean heart rate for the entire population of active duty soldiers (whom I did not study) is between 42 and 61.
- I ask 40 parents whether they liked the experience their child had when receiving intranasal fentanyl as a sedative, using the following Likert scale: 1 = hate, 2 = dislike, 3 = neutral, 4 = like, 5 = love. We find a median score of 4, and our 95% CI comes back at 3-5. The correct interpretation is that in our sample, at least half of the parents liked or loved the sedation. In the real world, we are 95% certain that the median score for all parents lies somewhere between 3 (neutral) and 5 (love).
- I take 30 doctors I know and ask them whether they know what a confidence interval is. I report that 40% know what it is, with a 95% CI of 5-75%. The correct interpretation is that in my sample 40% of physicians knew what a confidence interval is, and I am 95% certain that between 5% and 75% of all physicians know what a confidence interval is.

Note: You will sometimes see researchers describe their sample with a 95% CI. For example, you look at the first table and see that they report enrolling 65% males with a 95% CI of 55-75%. This is a little confusing if you do not understand confidence intervals.
Most people will look at that and think, "Wait, they can't count? Why aren't they 100% certain that their sample is 65% male?" In reality, they are saying that their sample is 65% male, and that they think the population they are studying is somewhere between 55% and 75% male. The confidence interval is derived from the sample data but refers to the overall population.
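As a sketch of how a confidence interval for a proportion (like the physician example) can be computed, here is the widely taught normal-approximation (Wald) interval. It is only one of several methods and is unreliable for small samples or proportions near 0% or 100%; the function name `wald_ci` is hypothetical. With 12 of 30 (40%), it gives roughly 22% to 58%.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) CI for a proportion, clipped to [0, 1]."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)


low, high = wald_ci(12, 30)
print(f"40% (95% CI {low:.0%} to {high:.0%})")
```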
Methods
There are mathematical formulas to derive confidence intervals for almost any type of data. You can find a very thorough review and explanation, with formulas, in Chapter 7 of Glantz's book [3]. Here are some key points.
- Standard Error of the Mean (SEM): You may see this term used. The SEM is the sample standard deviation divided by the square root of the sample size. It describes how precisely the sample mean estimates the population mean, that is, how much the mean would vary if the study were repeated with new samples. The SEM is then used to calculate a confidence interval. It applies only to continuous data, but it is the standard example of how confidence intervals are calculated.
  - Note: The SEM should never be used to describe the spread of a sample. It relates to the estimate of the population mean, not to the individual observations in the sample. Because the SEM is always smaller than the standard deviation (for any sample larger than one), some authors mistakenly report the SEM instead of the standard deviation.
- You may select various levels of confidence for your confidence interval. We typically use 95% in medicine, but you can choose 90%, 99%, or any other level you like. If you want to be 99% sure that your confidence interval includes the population value, the interval will be wider.
- Sample size directly affects the width of the confidence interval. With a larger sample, the confidence interval narrows (its width shrinks in proportion to 1/√n).

Example: The graph below shows our data from the blood pressure measurements in non-hypertensive men [2]. It shows the relationship between the SD, the SEM, and the confidence interval.
[Figure: blood pressure data around the mean, with bars for SD = 9.37, SEM = 1.71, and the 95% CI of the mean]
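The SEM and its confidence interval can be sketched for the blood pressure figure (SD = 9.37 mm Hg, n = 30). The mean of 120 mm Hg is taken from the earlier medical example and is an assumption here; the helper names `sem` and `mean_ci` are hypothetical, and the interval uses a z critical value, a large-sample simplification of the t-based interval found in texts such as Glantz.

```python
from math import sqrt
from statistics import NormalDist


def sem(sd: float, n: int) -> float:
    """Standard error of the mean: the SD divided by the square root of n."""
    return sd / sqrt(n)


def mean_ci(mean: float, sd: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Large-sample (z-based) confidence interval for a population mean.

    For small samples a t critical value is more accurate and gives a slightly
    wider interval (about 2.05 instead of 1.96 for n = 30 at 95%).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    half_width = z * sem(sd, n)
    return mean - half_width, mean + half_width


# Blood pressure example: SD = 9.37 mm Hg and n = 30; mean of 120 mm Hg assumed.
print(f"SEM = {sem(9.37, 30):.2f}")

# Higher confidence -> wider interval.
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(120, 9.37, 30, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} mm Hg")

# Larger sample -> narrower interval (width shrinks like 1/sqrt(n)).
for n in (10, 30, 100, 1000):
    low, high = mean_ci(120, 9.37, n)
    print(f"n = {n:4d}: 95% CI {low:.1f} to {high:.1f} mm Hg")
```

The two loops mirror the two halves of the confidence-level and sample-size table that follows: raising the confidence level widens the interval, and enlarging the sample narrows it.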
Example: The following table [2] shows how changing the confidence level or the sample size affects the confidence interval.
[Table 4 (from reference 2): Effect of confidence level and sample size on confidence interval width. Part A: Calculation of CIs for the data presented in Table 2 (columns: CI (%), SD, n, SEM, CI). Part B: Effect of sample size on the CI for data with a mean of 120 and a given SD (same columns).]

Using Confidence Intervals
Confidence intervals can be used in two different ways:
1. Confidence intervals can be purely descriptive. For example, in an analysis of the Canadian CT Head Rule [4], the sensitivity of the rule for finding neurosurgical lesions was reported as 100% (95% CI 64.6-100%). The authors are saying that their sample estimate is 100% sensitivity and that they are 95% certain the true sensitivity of the rule is somewhere between 64.6% and 100%.
2. Confidence intervals can also be used to make inferences. We will discuss classical hypothesis testing in the future, but a brief summary here will make the idea clearer.
   a. Classical hypothesis testing produces a P-value, which tells the researcher/reader whether the observed difference is STATISTICALLY significant. It says nothing about the magnitude of that difference.
   b. If instead a researcher reports the actual difference between two numbers with a 95% confidence interval, then the magnitude of the effect is obvious. In addition, if the confidence interval crosses the identity point, then the results are not STATISTICALLY significant.
      i. The identity point (also called the null point, or no-effect point) is the value that means there is no effect. If you are subtracting two results, the null point is zero. This is why it is commonly taught that a confidence interval that crosses zero is not significant. Remember, however, that for a ratio the null point is 1, because a ratio of 1 means there is no difference between the groups.
   c. In summary, you find the difference between two numbers and report the confidence interval around it.
      If that confidence interval contains a value that is not statistically significant (the null point) or not clinically significant (e.g., a 2 BPM reduction in heart rate), then the results do not demonstrate a meaningful effect.
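The decision rule above can be written as a small function. This is a sketch, not a standard library routine: `interpret_ci` and its `smallest_important` parameter (the smallest clinically meaningful effect, measured from the null point) are hypothetical, and the intervals (-3, -1) and (0.8, 1.3) are made-up illustrations alongside two intervals from the RATE-A-BLATE examples.

```python
def interpret_ci(lower, upper, null_value, smallest_important=None):
    """Classify a CI for an effect.

    null_value is 0 for a difference and 1 for a ratio. If smallest_important
    is given, the effect counts as clinically important only when even the end
    of the CI closest to the null is at least that far from it.
    """
    statistically_significant = not (lower <= null_value <= upper)
    result = {"statistically_significant": statistically_significant}
    if smallest_important is not None:
        closest = min(abs(lower - null_value), abs(upper - null_value))
        result["clinically_important"] = (statistically_significant
                                          and closest >= smallest_important)
    return result


print(interpret_ci(-50, 2, null_value=0))                         # difference crossing 0
print(interpret_ci(-40, -20, null_value=0, smallest_important=2))
print(interpret_ci(-3, -1, null_value=0, smallest_important=2))   # hypothetical
print(interpret_ci(0.8, 1.3, null_value=1))                       # hypothetical ratio
```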
Confidence Intervals and Hypothesis Testing (Examples)
Let us pretend we have a new drug called RATE-A-BLATE that is supposed to lower the heart rate rapidly, without any blood pressure side effects, in people with atrial fibrillation and rapid ventricular response. Using this drug, we can look at the various ways confidence intervals can be used and how they are affected by certain factors.
1. In the first experiment, we take 10 patients and give them RATE-A-BLATE (RAB). We find that in the sample RAB drops the heart rate by approximately 30 beats per minute. After the math, we report the result as a mean change of -30 BPM (95% CI -50 to +2). Based on this sample, we are 95% certain that the drug's true average effect lies somewhere between lowering the heart rate by 50 BPM and raising it by 2 BPM. Looking at this data you might think, "I do not want to give a drug that may raise the heart rate a few beats," but the possible values for this drug (-50 through +2) lie predominantly on the heart rate lowering side. If another study were performed with more patients, we might get a narrower range of possible values to examine.
2. In the second experiment, we add 40 more patients in atrial fibrillation with rapid ventricular response, so that we are now studying a total of 50 patients. Our RAB study finds a mean change in heart rate of -30 BPM (95% CI -40 to -20). This time we can say that we are 95% certain our drug lowers the heart rate by between 20 and 40 beats per minute. The increase in sample size has narrowed our confidence interval. Now we can reasonably accept that the drug is effective at lowering heart rate. If we had reported only the sample means in both cases, we would all just think that RAB always lowers heart rate by 30 BPM. Do not trust a study that does that.
3. Now let us move on to study RAB versus diltiazem. Assume we run a well-designed double-blind, randomized controlled trial and determine that once
again RAB gives us a mean change in heart rate of -30 BPM (95% CI -40 to -20). We also find that diltiazem causes a mean change in heart rate of -20 BPM, with a 95% CI that overlaps RAB's.
[Figure: 95% CIs for the mean change in heart rate with RAB and with diltiazem]
Notice that initially we are simply describing the heart rate lowering properties of the two drugs. Looking at the data, there appears to be a lot of overlap in the possible mean heart rate lowering effects of the drugs, so the difference may not be statistically significant. We take the mean heart rate changes for the two drugs and compare them using Student's t-test (more in another lecture), and find that RAB lowers the heart rate by 10 BPM more than diltiazem, but with a P-value > 0.05, so the difference is not statistically significant.
Another way (some argue a better way) is to show the mean difference with its confidence interval. We get a mean difference of -10 BPM (95% CI -35 to +5).
[Figure: mean difference in heart rate between diltiazem and RAB, with its 95% CI]
This is the same information presented in a different fashion. Instead of just saying that there was a -10 BPM difference between the two drugs and that it was not statistically significant, we can see how close it came to being statistically significant. In this graph we are plotting the difference between two numbers, so the null point is zero. We realize that the result was close to statistical significance, so we plan another study.
4. Once again we increase our sample size to see whether the mean difference between the two drugs is statistically significant. We enroll over 5000 patients into the same double-blind RCT comparing diltiazem with RAB. This time the mean difference between RAB and diltiazem is -10 BPM (95% CI -11 to -9).
[Figure: mean difference in heart rate between diltiazem and RAB, with its 95% CI]
This time we did it. The confidence interval is very narrow (only 1 BPM on either side of our sample estimate of -10 BPM). Plus, it does not cross zero (and, sure enough, the P-value will be < 0.05). With enough patients, we have finally shown that RAB produces a statistically significant reduction in heart rate of about 10 BPM more than diltiazem (to be exact, anywhere from 9 to 11 BPM).

Example Wrap-Up
We can draw the following conclusions about RAB based on our studies:
- RAB seems to lower heart rate in atrial fibrillation with rapid ventricular response by an average of about 30 BPM, but could lower it anywhere from 20 to 40 BPM compared with doing nothing.
- RAB seems to lower heart rate in atrial fibrillation with rapid ventricular response a little better than diltiazem. It lowers the heart rate about 10 BPM more, but the true difference could be anywhere from 9 to 11 BPM.
But what does this tell us clinically? In the end these are only STATISTICALLY significant numbers. Remember that CLINICAL significance is not the same thing. Consider the following situations:
- RAB is cheaper and does not drop blood pressure like diltiazem. You think that the data support its use.
- RAB turns out to be more expensive and does not drop blood pressure like diltiazem. You decide to use RAB only in normotensive or hypotensive patients, and to save costs by using diltiazem when you have a hypertensive patient.
- RAB turns out to be less expensive and does not drop blood pressure like diltiazem; however, it causes a myocardial infarction about 10% of the time it is given. You decide that the extra 10 BPM over diltiazem (as well as the cost and blood pressure advantages) is not worth the consequences.
- RAB turns out to be less expensive and does not drop blood pressure like diltiazem.
  You decide, however, that a 10 BPM difference over diltiazem is not really clinically significant, and you are more used to diltiazem, so you continue to use it instead of RAB.
All of these conclusions are possible, and each is individualized based on the data. When you look at a confidence interval, consider every value in the range as a possible real-world value. If the intervention is worth your time across that whole range, use it. If not, drop it.
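Why the larger RAB versus diltiazem trial produced a much narrower interval can be sketched with a large-sample confidence interval for a difference in means. The inputs are hypothetical: a true mean difference of -10 BPM and an SD of the heart rate change of 25 BPM in both arms (the text does not give SDs), and `diff_ci` is a hypothetical helper using an unpooled standard error and a z critical value.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(mean_diff, sd1, n1, sd2, n2, level=0.95):
    """Large-sample CI for a difference in means (unpooled standard error)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean_diff - z * se, mean_diff + z * se


# Hypothetical: RAB lowers heart rate 10 BPM more than diltiazem; SD 25 BPM per arm.
for n_per_arm in (25, 250, 2500):
    lo, hi = diff_ci(-10, 25, n_per_arm, 25, n_per_arm)
    print(f"n = {n_per_arm} per arm: 95% CI {lo:.1f} to {hi:.1f} BPM, "
          f"includes 0: {lo <= 0 <= hi}")
```

With these assumed inputs, the small trial's interval crosses zero, while roughly 2500 patients per arm (5000 in total) shrinks it to about 1.4 BPM on either side of -10.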
Resources
I highly recommend the following two references: [2] and [5].
For a very complete description of all things related to confidence intervals, refer to [3].
If you want to plan a study and derive the sample size based on a certain confidence interval you wish to obtain, go to:
If you have data and want to calculate a confidence interval around it, go to:

References
1. Glaser AN. High-Yield Biostatistics. 3rd ed. Philadelphia: Lippincott Williams & Wilkins.
2. Gaddis GM, Gaddis ML. Introduction to biostatistics: Part 2, Descriptive statistics. Annals of Emergency Medicine. Mar 1990;19(3).
3. Glantz SA. Primer of Biostatistics. 6th ed. New York: McGraw-Hill.
4. Smits M, Dippel DW, de Haan GG, et al. External validation of the Canadian CT Head Rule and the New Orleans Criteria for CT scanning in patients with minor head injury. JAMA: The Journal of the American Medical Association. Sep;294(12).
5. Gaddis ML, Gaddis GM. Introduction to biostatistics: Part 1, Basic concepts. Annals of Emergency Medicine. Jan 1990;19(1):86-89.
More informationSample Size and Power in Clinical Trials
Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance
More informationProbability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce
More informationDESCRIPTIVE STATISTICS & DATA PRESENTATION*
Level 1 Level 2 Level 3 Level 4 0 0 0 0 evel 1 evel 2 evel 3 Level 4 DESCRIPTIVE STATISTICS & DATA PRESENTATION* Created for Psychology 41, Research Methods by Barbara Sommer, PhD Psychology Department
More informationMidterm Review Problems
Midterm Review Problems October 19, 2013 1. Consider the following research title: Cooperation among nursery school children under two types of instruction. In this study, what is the independent variable?
More informationHypothesis Testing for Beginners
Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes
More informationSENSITIVITY ANALYSIS AND INFERENCE. Lecture 12
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationAnalysis and Interpretation of Clinical Trials. How to conclude?
www.eurordis.org Analysis and Interpretation of Clinical Trials How to conclude? Statistical Issues Dr Ferran Torres Unitat de Suport en Estadística i Metodología - USEM Statistics and Methodology Support
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationTwo-sample inference: Continuous data
Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As
More informationHYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...
HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationLecture 2: Types of Variables
2typesofvariables.pdf Michael Hallstone, Ph.D. hallston@hawaii.edu Lecture 2: Types of Variables Recap what we talked about last time Recall how we study social world using populations and samples. Recall
More informationCONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,
More informationCharacteristics of Binomial Distributions
Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation
More informationIndependent samples t-test. Dr. Tom Pierce Radford University
Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of
More informationDescribing, Exploring, and Comparing Data
24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter
More informationDef: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.
Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationLAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
More informationHYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...
HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men
More informationBasic research methods. Basic research methods. Question: BRM.2. Question: BRM.1
BRM.1 The proportion of individuals with a particular disease who die from that condition is called... BRM.2 This study design examines factors that may contribute to a condition by comparing subjects
More informationSTATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
More informationNorthumberland Knowledge
Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about
More informationStatistics Review PSY379
Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses
More informationc. Construct a boxplot for the data. Write a one sentence interpretation of your graph.
MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?
More informationChapter 4. Probability and Probability Distributions
Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the
More informationSession 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
More informationName: Date: Use the following to answer questions 3-4:
Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin
More informationQUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS
QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.
More informationMeasurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement
Measurement & Data Analysis Overview of Measurement. Variability & Measurement Error.. Descriptive vs. Inferential Statistics. Descriptive Statistics. Distributions. Standardized Scores. Graphing Data.
More informationModule 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
More informationBNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
More informationStatistics for Sports Medicine
Statistics for Sports Medicine Suzanne Hecht, MD University of Minnesota (suzanne.hecht@gmail.com) Fellow s Research Conference July 2012: Philadelphia GOALS Try not to bore you to death!! Try to teach
More informationThe Importance of Statistics Education
The Importance of Statistics Education Professor Jessica Utts Department of Statistics University of California, Irvine http://www.ics.uci.edu/~jutts jutts@uci.edu Outline of Talk What is Statistics? Four
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationDescriptive Statistics
Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web
More informationIntroduction to Statistics and Quantitative Research Methods
Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.
More informationAssociation Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationAP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
More informationHOW TO WRITE A LABORATORY REPORT
HOW TO WRITE A LABORATORY REPORT Pete Bibby Dept of Psychology 1 About Laboratory Reports The writing of laboratory reports is an essential part of the practical course One function of this course is to
More informationPrinciples of Hypothesis Testing for Public Health
Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions
More informationWhy Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationAP Statistics Solutions to Packet 2
AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that
More informationSTA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance
Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis
More informationLesson 17: Margin of Error When Estimating a Population Proportion
Margin of Error When Estimating a Population Proportion Classwork In this lesson, you will find and interpret the standard deviation of a simulated distribution for a sample proportion and use this information
More information5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.
The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution
More informationHypothesis testing. c 2014, Jeffrey S. Simonoff 1
Hypothesis testing So far, we ve talked about inference from the point of estimation. We ve tried to answer questions like What is a good estimate for a typical value? or How much variability is there
More informationwww.rmsolutions.net R&M Solutons
Ahmed Hassouna, MD Professor of cardiovascular surgery, Ain-Shams University, EGYPT. Diploma of medical statistics and clinical trial, Paris 6 university, Paris. 1A- Choose the best answer The duration
More informationLecture 1: Review and Exploratory Data Analysis (EDA)
Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course
More informationWISE Power Tutorial All Exercises
ame Date Class WISE Power Tutorial All Exercises Power: The B.E.A.. Mnemonic Four interrelated features of power can be summarized using BEA B Beta Error (Power = 1 Beta Error): Beta error (or Type II
More informationPermutation Tests for Comparing Two Populations
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
More informationCONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont
CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont To most people studying statistics a contingency table is a contingency table. We tend to forget, if we ever knew, that contingency
More informationStudy Guide for the Final Exam
Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make
More informationHow To Test For Significance On A Data Set
Non-Parametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A non-parametric equivalent of the 1 SAMPLE T-TEST. ASSUMPTIONS: Data is non-normally distributed, even after log transforming.
More informationInclusion and Exclusion Criteria
Inclusion and Exclusion Criteria Inclusion criteria = attributes of subjects that are essential for their selection to participate. Inclusion criteria function remove the influence of specific confounding
More informationTesting Research and Statistical Hypotheses
Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you
More informationChapter 2: Descriptive Statistics
Chapter 2: Descriptive Statistics **This chapter corresponds to chapters 2 ( Means to an End ) and 3 ( Vive la Difference ) of your book. What it is: Descriptive statistics are values that describe the
More informationIntroduction to Hypothesis Testing
I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true
More informationElementary Statistics
Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,
More information