Descriptive statistics; Correlation and regression


Descriptive statistics; Correlation and regression
Patrick Breheny, September 16
STA 580: Biostatistics I

Tables and figures
Human beings are not good at sifting through large streams of data; we understand data much better when it is summarized for us. We often display summary statistics in one of two ways: tables and figures. Tables of summary statistics are very common (we have already seen several in this course); nearly all published studies in medicine and public health contain a table of basic summary statistics describing their sample. However, figures are usually better than tables at distilling clear trends from large amounts of information.

Types of data
The best way to summarize and present data depends on the type of data. There are two main types of data:
Categorical data: data that takes on distinct values (i.e., it falls into categories), such as sex (male/female), alive/dead, blood type (A/B/AB/O), or stage of cancer.
Continuous data: data that takes on a spectrum of fractional values, such as time, age, temperature, or cholesterol level.
The distinction between categorical (also called discrete) and continuous data is fundamental, and we will return to it throughout the course.

Categorical data
Summarizing categorical data is pretty straightforward: you just count how many times each category occurs. Instead of counts, we are often interested in percents. A percent is a special type of rate: a rate per hundred. Counts (also called frequencies), percents, and rates are the three basic summary statistics for categorical data, and are often displayed in tables or bar charts, as we saw in lab.

Continuous data
For continuous data, instead of a finite number of categories, observations can take on a potentially infinite number of values. Summarizing continuous data is therefore much less straightforward. To introduce concepts for describing and summarizing continuous data, we will look at data on infant mortality rates for 111 nations on three continents: Africa, Asia, and Europe.

Histograms
One very useful way of looking at continuous data is with histograms. To make a histogram, we divide a continuous axis into equally spaced intervals, then count and plot the number of observations that fall into each interval. This allows us to see how our data points are distributed.
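The counting step can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_counts`, the bin width, and the rates below are made up, not the infant mortality data from the lecture.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals.

    Bin k covers [start + k*width, start + (k+1)*width); the last bin
    also includes its right endpoint.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if k == n_bins and v == start + n_bins * width:
            k = n_bins - 1  # place the right-most edge in the last bin
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Made-up rates (deaths per 1,000 births), for illustration only
rates = [4, 6, 7, 9, 11, 12, 12, 15, 21, 37]
counts = histogram_counts(rates, start=0, width=10, n_bins=4)

# Crude text histogram: one '#' per observation in each interval
for k, c in enumerate(counts):
    print(f"{10 * k:>2}-{10 * (k + 1):<2} | " + "#" * c)
```

Changing the bin width changes the picture, so it is usually worth trying a few widths before settling on one.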

Histogram of European infant mortality rates
[Figure: histograms with panels for Europe, Asia, and Africa; x-axis: deaths per 1,000 births; y-axis: count]

Summarizing continuous data
As we can see, continuous data comes in a variety of shapes. Nothing can replace seeing the picture, but if we had to summarize our data using just one or two numbers, how should we go about doing it? The aspect of the histogram we are usually most interested in is: where is its center? This is typically represented by the average.

The average and the histogram
The average represents the center of mass of the histogram:
[Figure: histograms for Europe, Asia, and Africa with the average marked; x-axis: deaths per 1,000 births; y-axis: count]

Spread
The second most important bit of information from the histogram to summarize is: how spread out are the observations around the center? This is most typically represented by the standard deviation. To understand how the standard deviation works, let's return to our small example with the numbers {4, 5, 1, 9}, whose mean is 4.75. Each of these numbers deviates from the mean by some amount:
$4 - 4.75 = -0.75$
$5 - 4.75 = 0.25$
$1 - 4.75 = -3.75$
$9 - 4.75 = 4.25$
How should we measure the overall size of these deviations?

Root-mean-square
Taking their mean isn't going to tell us anything (why not?). We could take the average of their absolute values:
$\frac{0.75 + 0.25 + 3.75 + 4.25}{4} = 2.25$
But it turns out that, for a variety of reasons, the root-mean-square works better as a measure of overall size:
$\sqrt{\frac{(-0.75)^2 + (0.25)^2 + (-3.75)^2 + (4.25)^2}{4}} = 2.86$

The standard deviation
The formula for the standard deviation is
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Wait a minute; why $n - 1$? The reason (which we will discuss further in a few weeks) is that dividing by $n$ turns out to underestimate the true standard deviation. Dividing by $n - 1$ instead of $n$ corrects some of that bias. The standard deviation of {4, 5, 1, 9} is 3.30 (recall that we got 2.86 when we divided by $n$).
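The arithmetic for the small example {4, 5, 1, 9} can be checked with a short Python sketch. The variable names are illustrative; only the four numbers come from the slides.

```python
import math

data = [4, 5, 1, 9]
n = len(data)
mean = sum(data) / n                          # 4.75
deviations = [x - mean for x in data]         # [-0.75, 0.25, -3.75, 4.25]

# The plain mean of the deviations is always 0: positives and negatives cancel
mean_dev = sum(deviations) / n

mean_abs = sum(abs(d) for d in deviations) / n    # average absolute deviation
sum_sq = sum(d * d for d in deviations)           # sum of squared deviations
rms = math.sqrt(sum_sq / n)                       # root-mean-square (divide by n)
sd = math.sqrt(sum_sq / (n - 1))                  # standard deviation (divide by n - 1)

print(mean_dev, mean_abs, round(rms, 2), round(sd, 2))
```

The zero in the first position answers the "why not?" question: deviations from the mean always cancel, so their plain average carries no information about spread.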

Meaning of the standard deviation
The standard deviation (SD) describes how far away numbers in a list are from their average. The SD is often used as a "plus or minus" number, as in "adult women tend to be about 5 feet 4 inches, plus or minus 3 inches". Most numbers (roughly 68%) will be within 1 SD of the average. Very few entries (roughly 5%) will be more than 2 SDs away from the average. This rule of thumb works very well for a wide variety of data; we'll discuss where these numbers come from in a few weeks.

Standard deviation and the histogram
Background areas within 1 SD of the mean are shaded:
[Figure: histograms for Europe, Asia, and Africa; x-axis: deaths per 1,000 births; y-axis: count]

The 68%/95% rule in action
[Table: % of observations within one SD and within two SDs of the mean, by continent (Europe, Asia, Africa); the values did not survive transcription]

Summaries can be misleading!
All of the following have the same mean and standard deviation:
[Figure: histograms; y-axis: frequency]

Percentiles
The average and standard deviation are not the only ways to summarize continuous data. Another type of summary is the percentile. A number is the 25th percentile of a list of numbers if it is bigger than 25% of the numbers in the list. The 50th percentile is given a special name: the median. The median, like the mean, can be used to answer the question, "Where is the center of the histogram?"
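A minimal Python sketch of these definitions follows. The helper `percent_below` is a hypothetical name introduced for illustration, and software packages differ slightly in how they interpolate percentiles for small samples.

```python
import statistics

def percent_below(values, x):
    """Percentage of the values in the list that are smaller than x."""
    return 100 * sum(v < x for v in values) / len(values)

values = [1, 4, 5, 9]

# With an even number of values, the median is the average of the middle two
median = statistics.median(values)

print(median, percent_below(values, median))
```

Here the median is bigger than exactly half of the list, which is what being the 50th percentile means.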

Median vs. mean
The dotted line is the median, the solid line is the mean:
[Figure: histograms for Europe, Asia, and Africa; x-axis: deaths per 1,000 births; y-axis: count]

Skew
Note that the histogram for Europe is not symmetric: the tail of the distribution extends further to the right than it does to the left. Such distributions are called skewed. The distribution of infant mortality rates in Europe is said to be right skewed, or skewed to the right. For asymmetric (skewed) data, the mean and the median will be different.

Hypothetical example
Azerbaijan had the highest infant mortality rate in Europe, at 37. What if, instead of 37, it was 200?
[Table: mean and median for the real and hypothetical data; the values did not survive transcription]
The mean is now higher than 72% of the countries. Note that the average is sensitive to extreme values, while the median is not; statisticians say that the median is robust to the presence of outlying observations.
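The same effect can be reproduced on a small made-up list. The numbers below are not the European data; only the 37-to-200 substitution mirrors the slide.

```python
import statistics

# Made-up rates for ten countries, with 37 as the largest value
real = [3, 4, 5, 6, 7, 8, 9, 10, 12, 37]
# Same list, but with the largest value replaced by 200
hypothetical = [3, 4, 5, 6, 7, 8, 9, 10, 12, 200]

mean_real, median_real = statistics.mean(real), statistics.median(real)
mean_hyp, median_hyp = statistics.mean(hypothetical), statistics.median(hypothetical)

print("real:        ", mean_real, median_real)
print("hypothetical:", mean_hyp, median_hyp)
```

One extreme value more than doubles the mean of this list, while the median does not move at all.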

Box plots
Quantiles are used in a type of graphical summary called a box plot. Box plots are constructed as follows: (1) Calculate the three quartiles (the 25th, 50th, and 75th percentiles). (2) Draw a box bounded by the first and third quartiles, with a line in the middle for the median. (3) Call any observation that is extremely far from the box an "outlier" and plot such observations using a special symbol (this is somewhat arbitrary, and different rules exist for defining outliers). (4) Draw a line from the top of the box to the highest observation that is not an outlier; likewise for the lowest non-outlier.
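The construction can be sketched numerically in Python. Two assumptions are made here that the slides do not specify: the outlier rule is the common "more than 1.5 times the interquartile range beyond the box" convention, and the quartiles use the default method of `statistics.quantiles`. The data are made up.

```python
import statistics

# Made-up observations, for illustration only
data = [2, 4, 5, 7, 8, 9, 11, 12, 40]

# Step 1: the three quartiles (25th, 50th, and 75th percentiles)
q1, q2, q3 = statistics.quantiles(data, n=4)

# Step 3: flag outliers using the 1.5 * IQR convention (one of several rules)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

# Step 4: whiskers extend to the most extreme observations that are not outliers
inside = [x for x in data if low_fence <= x <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("box:", q1, q2, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

A plotting library would draw the same five ingredients: the box edges, the median line, the two whisker ends, and the outlier symbols.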

Box plots of the infant mortality rate data
[Figure: box plots for Africa, Asia, and Europe]

Box plots are a way to examine the relationship between a continuous variable and a categorical variable. In lab, we saw bar charts as a way of comparing two (or more) categorical variables. Now, we will discuss how to summarize and illustrate the relationship between two continuous variables.

Pearson's height data
Statisticians in Victorian England were fascinated by the idea of quantifying hereditary influences. Two of the pioneers of modern statistics, the Victorian Englishmen Francis Galton and Karl Pearson, were quite passionate about this topic. In pursuit of this goal, they measured the heights of 1,078 fathers and their (fully grown) sons.

The scatter plot
As we've mentioned, it is important to plot continuous data; this is especially true when you have two continuous variables and you're interested in the relationship between them. The most common way to plot the relationship between two continuous variables is the two-way scatter plot. Scatter plots are created by setting up two continuous axes, then creating a dot for every pair of observations.

Scatter plot of Pearson's height data
[Figure: scatter plot; axes: father's height (inches) and son's height (inches)]

Observations about the scatter plot
Taller fathers tend to have taller sons. The scatter plot shows how strong this association is: there is a tendency, but there are plenty of exceptions.

Standardizing a variable
Before we summarize this relationship numerically, we must discuss the idea of standardizing a variable. In Pearson's height data, one of the sons measured 63.2 inches tall. Because the average height of the sons in the sample was 68.7 inches, another way of describing his height is to say that he was 5.5 inches below average. Furthermore, because the standard deviation of the sons' heights was 2.8 inches, yet another way of describing his height is to say that he was 1.9 standard deviations below the average.

The standardization formula
Putting this into a formula, we standardize an observation $x_i$ by subtracting the average and dividing by the standard deviation:
$z_i = \frac{x_i - \bar{x}}{SD_x}$
where $\bar{x}$ and $SD_x$ are the mean and standard deviation of the variable $x$. One virtue of standardizing a variable is interpretability: if someone tells you that the concentration of urea in your blood is 50 mg/dl, that likely means nothing to you. On the other hand, if you are told that the concentration of urea in your blood is 4 standard deviations above average, you can immediately recognize this as a very high value.

More benefits of standardization
If you standardize all of the observations in your sample, the resulting variable will be standardized in the sense of having mean 0 and standard deviation 1. Standardization therefore brings all variables onto a common scale: regardless of whether the heights were originally measured in inches, centimeters, or miles, the standardized heights will be identical. As we will see momentarily, this allows us to study the relationship between two continuous variables without worrying about the scale of measurement. The concept behind standardization (taking an observation, then subtracting the expected value and dividing by the variability) is fundamental to statistics, and we will see variations on this idea many times in this course.
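Both claims (mean 0 and SD 1 after standardizing, and no dependence on the units) can be checked with a brief Python sketch. The heights below are made up, and `standardize` is a hypothetical helper name.

```python
import statistics

def standardize(xs):
    """Subtract the mean and divide by the SD (n - 1 in the denominator)."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]

# Made-up heights in inches, and the same heights converted to centimeters
inches = [63.2, 66.0, 68.7, 70.1, 72.5]
centimeters = [2.54 * x for x in inches]

z_in = standardize(inches)
z_cm = standardize(centimeters)

print([round(z, 2) for z in z_in])
print([round(z, 2) for z in z_cm])   # same values, up to floating-point rounding
```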

The correlation coefficient
The summary statistic for describing the strength of association between two variables is the correlation coefficient, denoted by $r$ (and sometimes called Pearson's correlation coefficient). The correlation coefficient is always between 1 (perfect positive correlation) and -1 (perfect negative correlation), and can take on any value in between. A positive correlation means that as one variable increases, the other one tends to increase as well. A negative correlation means that as one variable increases, the other one tends to decrease.

Calculating the correlation coefficient
The correlation coefficient is simply the average of the products of the standardized variables. In mathematical notation,
$r = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n - 1}$
where $z_{x_i}$ and $z_{y_i}$ are the standardized values of $x$ and $y$. Note: the $n$ versus $n - 1$ issue has nothing to do with correlation itself; however, if $n - 1$ is used when standardizing, it must be used again here.
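The formula translates almost line for line into Python. This is a sketch on made-up data, with `standardize` and `correlation` as hypothetical helper names; it uses $n - 1$ both when standardizing and when averaging the products, as the note requires.

```python
import statistics

def standardize(xs):
    """Standardized values, using the n - 1 version of the SD."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]

def correlation(xs, ys):
    """Sum of products of standardized values, divided by n - 1."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Made-up paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = correlation(x, y)
print(round(r, 2))
```

Swapping the arguments gives the same value, and a variable is perfectly correlated with any increasing linear function of itself.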

Meaning behind the correlation coefficient formula
[Figure: scatter plot; axes: father's height (inches) and son's height (inches)]

The correlation coefficient and the scatter plot
[Figure: five scatter plots of y against x, labeled 0.88, 0.34, 0.02, 0.29, and 0.91]

More about the correlation coefficient
Because the correlation coefficient is based on standardized variables, it does not depend on the units of measurement. Thus, the correlation between fathers' and sons' heights would be 0.5 even if the fathers' heights were measured in inches and the sons' in centimeters. Furthermore, the correlation between x and y is the same as the correlation between y and x.

Interpreting the correlation coefficient
The correlation between heights of identical twins is around 0.95. The correlation between income and education in the United States is about 0.44. The correlation between a woman's education and the number of children she has is about -0.2. When concrete physical laws determine the relationship between two variables, their correlation can exceed 0.9. In the social sciences, this is rare; correlations of 0.3 to 0.7 are considered quite strong in these fields.

Numerical summaries can be misleading!
From Cook & Swayne's Interactive and Dynamic Graphics for Data Analysis:
"[...] is negative rather than positive. The plot at bottom right shows two variables with some positive linear dependence, but the obvious non-linear dependence is more interesting."
[Figure: four scatter plots of Y against X. Caption: Studying dependence between X and Y. All four pairs of variables have correlation approximately equal to 0.7, but they all have very different patterns. Only the top left plot shows two variables matching a dependence modeled by correlation.]

Ecological correlations
Epidemiologists often look at the correlation between two variables at the ecological level: say, the correlation between cigarette consumption and lung cancer deaths per capita. However, people smoke and get cancer, not countries. These correlations have the potential to be misleading. The reason is that by replacing individual measurements with averages, you eliminate a lot of the variability that is present at the individual level and obtain a higher correlation than there really is.

Fat in the diet and cancer
From an article by Carroll in Cancer Research (1975):
[Figure]

NHANES
Every few years, the CDC conducts a huge survey of randomly chosen Americans called the National Health and Nutrition Examination Survey (NHANES). Hundreds of variables are measured on these individuals: demographic variables like age, education, and income; physiological variables like height, weight, blood pressure, and cholesterol levels; dietary habits; disease status; and lots more, everything from cavities to sexual behavior.

Predicting weight from height
For the 2,649 adult women in the NHANES data set: average height = 5 feet, 3.5 inches; average weight = 166 pounds; SD(height) = 2.75 inches; SD(weight) = 44.5 pounds; correlation between height and weight = 0.3.
Suppose you were asked to predict a woman's weight from her height. First, an easy case: suppose the woman is 5 feet, 3.5 inches tall. Since she is of average height, we have no reason to guess anything other than the average weight, 166 pounds.

Predicting weight from height (cont'd)
How about a woman who is 5 feet 6 inches tall? She's a bit taller than average, so she probably weighs a bit more than average. But how much more? To put the question a different way: she is almost one standard deviation above the average height; how many standard deviations above the average weight should we expect her to be?

Using the correlation coefficient
The answer turns out to depend on the correlation coefficient. Since the correlation coefficient for this data is 0.3, we would expect the woman to be 0.3 standard deviations above the mean weight, or $166 + 0.3(44.5) \approx 179$ pounds.
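The prediction rule can be written as a small Python function using the NHANES summary figures quoted above. The function name `predict_weight` is hypothetical. Note that the slide rounds the woman's height to one SD above average; her exact standardized height is $2.5 / 2.75 \approx 0.91$, which gives a prediction closer to 178 pounds.

```python
# Summary statistics quoted in the lecture (adult women, NHANES)
MEAN_HEIGHT, SD_HEIGHT = 63.5, 2.75   # inches
MEAN_WEIGHT, SD_WEIGHT = 166, 44.5    # pounds
R = 0.3                               # correlation between height and weight

def predict_weight(height_inches):
    """Predict weight: r SDs of weight for each SD of height above average."""
    z_height = (height_inches - MEAN_HEIGHT) / SD_HEIGHT
    return MEAN_WEIGHT + R * z_height * SD_WEIGHT

print(predict_weight(63.5))              # average height: predict average weight
print(round(predict_weight(66), 1))      # 5 feet 6 inches, using the exact z-score
print(round(predict_weight(66.25), 1))   # exactly one SD above average height
```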

Graphical interpretation
[Figure: plot of weight (lbs) against height (inches)]

The regression line
This line is called the regression line. It tells you, for any height, the average weight for women of that height. Here, we were trying to predict one variable based on one other variable; if we were trying to predict weight based on height, dietary habits, and cholesterol levels, or trying to study the relationship between cholesterol and weight while controlling for height, this would be called multiple regression. Multiple regression is beyond the scope of this course, but is a major topic in Biostatistics II.

The equation of the regression line
Like all lines, the regression line may be represented by the equation $y = \alpha + \beta x$, where $\alpha$ is the intercept and $\beta$ is the slope. For the height/weight NHANES data, the intercept is -137 pounds and the slope is 4.8 pounds/inch.

β vs. r
Note the similarity and the difference between the slope of the regression line (β) and the correlation coefficient (r). The correlation coefficient says that if you go up in height by one standard deviation, you can expect to go up in weight by r = 0.3 standard deviations. The slope of the regression line tells you that if you go up in height by one inch, you can expect to go up in weight by β = 4.8 pounds. Essentially, they tell you the same thing, one in terms of standard units, the other in terms of actual units. Therefore, if you know one, you can always figure out the other simply by changing units (which here involves multiplying by the ratio of the standard deviations).

β vs. r (cont'd)
Suppose a woman's height is increased by one inch; what do we expect to happen to her weight? 1 inch = 1/2.75 SDs of height. An increase of 1/2.75 SDs in height means an increase of 0.3/2.75 SDs in weight. In pounds, 0.3/2.75 SDs of weight = $0.3(44.5/2.75) \approx 4.85$ pounds, which is the slope β (quoted above as 4.8).

β vs. r (cont'd)
Suppose a woman's height is increased by one SD; what do we expect to happen to her weight? 1 SD = 2.75 inches. An increase of 2.75 inches in height means an increase of 4.85(2.75) pounds in weight. In standard units, 4.85(2.75) pounds = $4.85(2.75)/44.5 = 0.3$ SDs.
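The two unit conversions above amount to multiplying or dividing by the ratio of the standard deviations, as the following Python sketch shows using the figures quoted in the lecture (variable names are illustrative).

```python
SD_HEIGHT = 2.75   # inches
SD_WEIGHT = 44.5   # pounds
r = 0.3            # SDs of weight per SD of height

# From standard units to actual units: pounds per inch
beta = r * SD_WEIGHT / SD_HEIGHT

# And back again: from pounds per inch to SDs per SD
r_recovered = beta * SD_HEIGHT / SD_WEIGHT

print(round(beta, 2), round(r_recovered, 2))
```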

There are two regression lines
We said that the correlation between weight and height is the same as the correlation between height and weight. This is not true for regression: the regression of weight on height will give a different answer than the regression of height on weight.
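One way to see that the two lines differ is to compute both slopes from the same summary statistics. The sketch below uses the NHANES figures quoted earlier; the comparison of the two slopes in common units is an added illustration, not something stated on the slides.

```python
SD_HEIGHT = 2.75   # inches
SD_WEIGHT = 44.5   # pounds
r = 0.3

# Regression of weight on height: pounds of weight per inch of height
slope_weight_on_height = r * SD_WEIGHT / SD_HEIGHT

# Regression of height on weight: inches of height per pound of weight
slope_height_on_weight = r * SD_HEIGHT / SD_WEIGHT

# If the two lines were the same line, this would equal slope_weight_on_height
implied_pounds_per_inch = 1 / slope_height_on_weight

print(round(slope_weight_on_height, 2))    # about 4.85 pounds per inch
print(round(implied_pounds_per_inch, 2))   # about 53.94 pounds per inch
```

The product of the two slopes equals $r^2$, so the lines coincide only when the correlation is perfect.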

The two regression lines
[Figure: plot of weight (lbs) against height (inches) showing both regression lines]

Regression and root-mean-square error
The amount by which the regression prediction is off is called the residual. One way of looking at the quality of our predictions is by measuring the size of the residuals. Out of all possible lines that you could draw, which one has the lowest possible root-mean-square of the residuals? The regression line. Because of this, the regression line is also called the least squares fit.
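The least squares property can be illustrated with a short Python sketch on made-up data: the fitted line is computed from the usual closed-form expressions, then compared against a handful of nearby lines. Checking a few neighbors does not prove optimality, but it shows the idea.

```python
import math

# Made-up (height, weight) pairs, for illustration only
xs = [60, 62, 64, 66, 68]
ys = [120, 155, 150, 190, 175]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope and intercept
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def rms_residual(a, b):
    """Root-mean-square of the residuals y - (a + b*x)."""
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)

best = rms_residual(intercept, slope)

# Nudge the intercept and/or the slope and recompute the RMS residual
others = [rms_residual(intercept + da, slope + db)
          for da in (-5, 0, 5) for db in (-0.5, 0, 0.5)
          if (da, db) != (0, 0)]

print(best < min(others))   # True: every nudged line fits worse
```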

Why only r standard deviations?
Only moving r standard deviations away from the average may be counterintuitive; if height goes up by one SD, shouldn't weight too? Here's an example that I hope will help clarify this concept. A student is taking her first course in statistics, and we want to predict whether she will do well in the course or not. Suppose we know that last semester, she got an A in math. Now suppose instead that we know that last semester, she got an A in pottery. These two pieces of information are not equally informative for predicting how well she will do in her statistics class. We need to balance our baseline guess (that she will receive an average grade) with this new piece of information, and the correlation coefficient tells us how much weight the new information should carry.

Fathers and sons again
[Figure: scatter plot; axes: father's height (inches) and son's height (inches)]

How regression got its name
Because the correlation coefficient is always less than 1, the regression line will always predict a smaller departure from average than the "x goes up by 1 SD, y goes up by 1 SD" rule. Galton called this phenomenon "regression to mediocrity", and this is where regression gets its name. People frequently read too much into the regression effect; this is called the regression fallacy.

The regression fallacy, example #1
A group of subjects is recruited into a study. Their initial blood pressure is taken, then they take an herbal supplement for a month, and their blood pressure is taken again. The mean blood pressure was the same both before and after. However, subjects with high blood pressure tended to have lower blood pressure one month later, and subjects with low blood pressure tended to have higher blood pressure later. Does this supplement act to stabilize blood pressure?

Why does regression to the mean happen?
Not really; the same effect would occur if they took a placebo. Why? Consider a person with a measured blood pressure 2 SDs above average. It's possible that the person has a true blood pressure 1 SD above average, but happened to have a high first measurement; it's also possible that the person has a true blood pressure 3 SDs above average, but happened to have a low first measurement. However, the first explanation is much more likely.
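A small simulation makes the placebo argument concrete. This is a sketch under assumptions not in the lecture: each person's true blood pressure and each measurement error are drawn from standard normal distributions (in arbitrary standardized units), the cutoffs of plus and minus 2 are arbitrary, and no treatment of any kind is applied between the two measurements.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is reproducible
n = 10_000

# Assumed model: measurement = true value + independent noise, no treatment effect
true_bp = [random.gauss(0, 1) for _ in range(n)]
first = [t + random.gauss(0, 1) for t in true_bp]
second = [t + random.gauss(0, 1) for t in true_bp]

# Subjects whose FIRST measurement was high, and those whose first was low
high = [i for i in range(n) if first[i] > 2]
low = [i for i in range(n) if first[i] < -2]

high_first = statistics.mean([first[i] for i in high])
high_second = statistics.mean([second[i] for i in high])
low_first = statistics.mean([first[i] for i in low])
low_second = statistics.mean([second[i] for i in low])

print("overall means:", round(statistics.mean(first), 2), round(statistics.mean(second), 2))
print("high group:   ", round(high_first, 2), "->", round(high_second, 2))
print("low group:    ", round(low_first, 2), "->", round(low_second, 2))
```

The overall mean stays put, while both extreme groups drift back toward it on the second measurement, exactly the pattern described in example #1.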

The regression fallacy, example #2
In professional sports, some first-year players have outstanding years and win Rookie of the Year awards. They often fail to live up to expectations in their second years. Writers call this the "sophomore slump", and come up with elaborate explanations for it.

The regression fallacy, example #3
An instructor standardizes her midterm and final so that the class average is 50 and the SD is 10 on both tests. She has taught this class many times, and the correlation between the tests is always around 0.5. This year, she decides to do something different: she takes the 10 students with the lowest scores on the midterm and gives them special tutoring. On the final, all ten students score above 50; can this be explained by the regression effect? No! The regression effect can only take these students closer to the average; the fact that they all score above average indicates that the tutoring really did work.
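The reasoning can be checked with the regression prediction itself. In this Python sketch, the function name and the sample midterm scores are made up; the mean, SD, and correlation are those given in the example.

```python
MEAN, SD, R = 50, 10, 0.5   # both tests have mean 50 and SD 10; correlation 0.5

def predicted_final(midterm):
    """Regression prediction: r SDs on the final per SD on the midterm."""
    z_midterm = (midterm - MEAN) / SD
    return MEAN + R * z_midterm * SD

# Made-up midterm scores for students near the bottom of the class
for midterm in (20, 30, 40):
    print(midterm, "->", predicted_final(midterm))
```

Each predicted final score is closer to 50 than the midterm score was, but still below 50, so the regression effect alone would not put all ten students above average.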

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches)

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches) PEARSON S FATHER-SON DATA The following scatter diagram shows the heights of 1,0 fathers and their full-grown sons, in England, circa 1900 There is one dot for each father-son pair Heights of fathers and

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple. Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

Lesson 4 Measures of Central Tendency

Lesson 4 Measures of Central Tendency Outline Measures of a distribution s shape -modality and skewness -the normal distribution Measures of central tendency -mean, median, and mode Skewness and Central Tendency Lesson 4 Measures of Central

More information

Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1. Relationships between two numerical variables Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical) variables: take numerical values for which arithmetic operations make sense (addition/averaging)


Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -


" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through


Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample


STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members


Two-sample inference: Continuous data

Two-sample inference: Continuous data Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As


Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2 Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables


Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 10 Key Ideas Correlation, Correlation Coefficient (r), Section 10-1: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables


Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl Pearson's coefficient of correlation c) Spearman's Rank correlation coefficient d)


DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books An easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE


2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according


Frequency Distributions

Frequency Distributions Descriptive Statistics Dr. Tom Pierce Department of Psychology Radford University Descriptive statistics comprise a collection of techniques for better understanding what the people in a group look like


Chapter 4: Average and standard deviation

Chapter 4: Average and standard deviation Chapter 4: Average and standard deviation Context; Average vs. median; Average


Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,


The Normal Distribution

The Normal Distribution Chapter 6 The Normal Distribution 6.1 The Normal Distribution 1 6.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Recognize the normal probability distribution


Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,


This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course


5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives. The Normal Distribution CHAPTER 6 The Normal Distribution Outline 6-1 Normal Distributions 6-2 Applications of the Normal Distribution 6-3 The Central Limit Theorem 6-4 The Normal Approximation to the Binomial Distribution


Describing, Exploring, and Comparing Data

Describing, Exploring, and Comparing Data 24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter


Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web


Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between


Means, standard deviations and standard errors

Means, standard deviations and standard errors CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard


Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variable is something that varies (eye color), a constant does


The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces Or: How I Learned to Stop Worrying and Love the Ball Comment [DP1]: Titles, headings, and figure/table captions


TOPIC 12 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these


Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity


6. Decide which method of data collection you would use to collect data for the study (observational study, experiment, simulation, or survey):

6. Decide which method of data collection you would use to collect data for the study (observational study, experiment, simulation, or survey): MATH 1040 REVIEW (EXAM I) Chapter 1 1. For the studies described, identify the population, sample, population parameters, and sample statistics: a) The Gallup Organization conducted a poll of 1003 Americans


Grade. 8 th Grade. 2011 SM C Curriculum

Grade. 8 th Grade. 2011 SM C Curriculum OREGON FOCUS ON MATH OAKS HOT TOPICS TEST PREPARATION WORKBOOK 200-204 8 th Grade TO BE USED AS A SUPPLEMENT FOR THE OREGON FOCUS ON MATH MIDDLE SCHOOL CURRICULUM FOR THE 200-204 SCHOOL YEARS WHEN THE


Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to


Describing Relationships between Two Variables

Describing Relationships between Two Variables Describing Relationships between Two Variables Up until now, we have dealt, for the most part, with just one variable at a time. This variable, when measured on many different subjects or objects, took


Correlation and Regression

Correlation and Regression Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look


Measurement with Ratios

Measurement with Ratios Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve real-world and mathematical


MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3 Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide an appropriate response. 1) The frequency distribution


HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS


MEASURES OF VARIATION

MEASURES OF VARIATION NORMAL DISTRIBUTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are


1.6 The Order of Operations

1.6 The Order of Operations 1.6 The Order of Operations Contents: Operations Grouping Symbols The Order of Operations Exponents and Negative Numbers Negative Square Roots Square Root of a Negative Number Order of Operations and Negative


2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles. Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only one BEST answer. Be sure to read all possible


Describing and presenting data

Describing and presenting data Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data


Name: Date: Use the following to answer questions 2-3:

Name: Date: Use the following to answer questions 2-3: Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student


6-3 The Standard Normal Distribution

6-3 The Standard Normal Distribution 290 Chapter 6 The Normal Distribution [Figure 6-5: Areas Under a Normal Distribution Curve, with about 68%, 95%, and 99.7% of the area within 1, 2, and 3 standard deviations of the mean] 6-3 The Standard Normal Distribution Since


The correlation coefficient

The correlation coefficient The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the strength of the relationship or association between two quantitative


Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative ways in which statistical data may be presented is through diagrams and graphs. There are several ways in


CORRELATIONAL ANALYSIS: PEARSON'S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

CORRELATIONAL ANALYSIS: PEARSON'S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there CORRELATIONAL ANALYSIS: PEARSON'S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the


Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we


A Picture Really Is Worth a Thousand Words

A Picture Really Is Worth a Thousand Words 4 A Picture Really Is Worth a Thousand Words Difficulty Scale (pretty easy, but not a cinch) What you'll learn about in this chapter Why a picture is really worth a thousand words How to create a histogram


SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION SOLUTIONS 1. a. To calculate the mean, we just add up all 7 values, and divide by 7. In Xi i= 1 fancy


Math 1. Month Essential Questions Concepts/Skills/Standards Content Assessment Areas of Interaction

Math 1. Month Essential Questions Concepts/Skills/Standards Content Assessment Areas of Interaction Binghamton High School Rev.9/21/05 Math 1 September What is the unknown? Model relationships by using Fundamental skills of 2005 variables as a shorthand way Algebra Why do we use variables? What is a


AP * Statistics Review. Descriptive Statistics

AP * Statistics Review. Descriptive Statistics AP * Statistics Review Descriptive Statistics Teacher Packet Advanced Placement and AP are registered trademarks of the College Entrance Examination Board. The College Board was not involved in the production


c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph. MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?


UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014 STAB22H3 Statistics I Duration: 1 hour and 45 minutes Last Name: First Name: Student number: Aids


BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential


Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.


Exploratory Data Analysis. Psychology 3256

Exploratory Data Analysis. Psychology 3256 Exploratory Data Analysis Psychology 3256 1 Introduction If you are going to find out anything about a data set you must first understand the data Basically getting a feel for your numbers Easier to find


Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage


The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175) Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,


Continuing, we get (note that unlike the text suggestion, I end the final interval with 95, not 85.

Continuing, we get (note that unlike the text suggestion, I end the final interval with 95, not 85. Chapter 3 -- Review Exercises Statistics 1040 -- Dr. McGahagan Problem 1. Histogram of male heights. Shaded area shows percentage of men between 66 and 72 inches in height; this translates as "66 inches


Introduction. Chapter 1. 1.1 Before you start. 1.1.1 Formulation

Introduction. Chapter 1. 1.1 Before you start. 1.1.1 Formulation Chapter 1 Introduction 1.1 Before you start Statistics starts with a problem, continues with the collection of data, proceeds with the data analysis and finishes with conclusions. It is a common mistake


Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)


AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that


Mind on Statistics. Chapter 2

Mind on Statistics. Chapter 2 Mind on Statistics Chapter 2 Sections 2.1 2.3 1. Tallies and cross-tabulations are used to summarize which of these variable types? A. Quantitative B. Mathematical C. Continuous D. Categorical 2. The table


Statistics E100 Fall 2013 Practice Midterm I - A Solutions

Statistics E100 Fall 2013 Practice Midterm I - A Solutions STATISTICS E100 FALL 2013 PRACTICE MIDTERM I - A SOLUTIONS PAGE 1 OF 5 Statistics E100 Fall 2013 Practice Midterm I - A Solutions 1. (16 points total) Below is the histogram for the number of medals won


Week 4: Standard Error and Confidence Intervals

Week 4: Standard Error and Confidence Intervals Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.


II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,


Chapter 9 Descriptive Statistics for Bivariate Data

Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to explore the distribution of the values of a single variable)


COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in


Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm


17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.


Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and


Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam


Welcome to Basic Math Skills!

Welcome to Basic Math Skills! Basic Math Skills Welcome to Basic Math Skills! Most students find the math sections to be the most difficult. Basic Math Skills was designed to give you a refresher on the basics of math. There are lots


The Dummy's Guide to Data Analysis Using SPSS

The Dummy's Guide to Data Analysis Using SPSS The Dummy's Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Reserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests


Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Global Leadership MBA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours


Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve


Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume


MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) All but one of these statements contain a mistake. Which could be true? A) There is a correlation


PRACTICE PROBLEMS FOR BIOSTATISTICS

PRACTICE PROBLEMS FOR BIOSTATISTICS PRACTICE PROBLEMS FOR BIOSTATISTICS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION 1. The duration of time from first exposure to HIV infection to AIDS diagnosis is called the incubation period.


Characteristics of Binomial Distributions

Characteristics of Binomial Distributions Lesson 2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation


X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.


Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Chapter 1 Review 1. As part of survey of college students a researcher is interested in the variable class standing. She records a 1 if the student is a freshman, a 2 if the student


Sta 309 (Statistics And Probability for Engineers)

Sta 309 (Statistics And Probability for Engineers) Instructor: Prof. Mike Nasab Sta 309 (Statistics And Probability for Engineers) Chapter 2 Organizing and Summarizing Data Raw Data: When data are collected in original form, they are called raw data. The


Unit 1 Number Sense. In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions.

Unit 1 Number Sense. In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions. Unit 1 Number Sense In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions. BLM Three Types of Percent Problems (p L-34) is a summary BLM for the material


Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents: 1 Definition of Key Terms; 2 Descriptive Statistics; 2.1 Frequency Tables; 2.2 Measures of Central Tendencies


Unit 9 Describing Relationships in Scatter Plots and Line Graphs

Unit 9 Describing Relationships in Scatter Plots and Line Graphs Unit 9 Describing Relationships in Scatter Plots and Line Graphs Objectives: To construct and interpret a scatter plot or line graph for two quantitative variables To recognize linear relationships, non-linear


Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds Isosceles Triangle Congruent Leg Side Expression Equation Polynomial Monomial Radical Square Root Check Times Itself Function Relation One Domain Range Area Volume Surface Space Length Width Quantitative
