# Module 2 Project Maths Development Team Draft (Version 2)

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 5 Week Modular Course in Statistics & Probability Strand 1 Module 2

2 Analysing Data Numerically Measures of Central Tendency Mean Median Mode Measures of Spread Range Standard Deviation Inter-Quartile Range Module 2.1

3 Mean & Standard Deviation using a Calculator Calculate the mean and standard deviation of the following 10 students heights by: (i) using the data unsorted (ii) creating a frequency table Unsorted Data Frequency Table Gender Height /cm 1. Boy Girl Boy Boy Girl Boy Girl Girl Boy Boy 179 Height /cm Frequency Mean = cm S.D. = 9.32cm Module 2.2

4 Mean & Standard Deviation using a Calculator Calculate the mean and standard deviation of the following 10 students heights by: (i) using the data unsorted (ii) creating a frequency table Unsorted Data Frequency Table Gender Height /cm 1. Boy Girl Boy Boy Girl Boy Girl Girl Boy Boy 179 Height /cm Frequency Mean = cm S.D. = 9.32cm Module 2.2

5 Central Tendency: The Mean Advantages: Mathematical centre of a distribution Does not ignore any information Disadvantages: Influenced by extreme scores and skewed distributions May not exist in the data Module 2.3

6 Central Tendency: The Mode Advantages: Good with nominal data Easy to work out and understand The score exists in the data set Disadvantages: Small samples may not have a mode More than one mode might exist Module 2.4

7 Central Tendency: The Median Advantages: Not influenced by extreme scores or skewed distribution Good with ordinal data Easier to calculate than the mean Considered as the typical observation Disadvantages: May not exist in the data Does not take actual data into account only its (ordered) position Difficult to handle theoretically Module 2.5

8 Summary: Relationship between the 3 M s Characteristics Mean Median Mode Consider all the data in calculation Yes No No Easily affected by extreme data Yes No No Can be obtained from a graph No Yes Yes Should be one of the data No No Yes Need to arrange the data in ascending order No Yes No Module 2.6

9 Summary: Relationship between the 3 M s Characteristics Mean Median Mode Consider all the data in calculation Yes No No Easily affected by extreme data Yes No No Can be obtained from a graph No Yes Yes Should be one of the data No No Yes Need to arrange the data in ascending order No Yes No Module 2.6

10 Using Appropriate Averages Example There are 10 house in Pennylane Close. On Monday, the numbers of letters delivered to the houses are: Calculate the mean, mode and the median of the number of letters. Comment on your results. Solution Mean = 10 = 5.1 Mode = 0 Median = 2 In this case the mean (5.1) has been distorted by the large number of letters delivered to one of the houses. It is, therefore, not a good measure of a 'typical' number of letters delivered to any house in Pennylane Close. The mode (0) is also not a good measure of a 'typical' number of letters delivered to a house, since 7 out of the 10 houses do acutally receive some letters. The median (2) is perhaps the best measure of the 'typical' number of letters delivered to each house, since half of the houses received 2 or more letters and the other half received 2 or fewer letters. Module 2.7

12 2.1 Ten students submitted their Design portfolios which were marked out of 40. The marks they obtained were (a) For these marks find (i) the mode (ii) the median (iii) the mean. (b) Comment on your results. (c) An external moderator reduced all the marks by 3. Find the mode, median and mean of the moderated results. Solution (a) (i) 34 (ii) 28.5 (iii) 26.4 (b) Mean is the lowest the 4 depresses its value compared to the mode and the median. (c) 31, 25.5, 23.4

13 2.1 Ten students submitted their Design portfolios which were marked out of 40. The marks they obtained were (a) For these marks find (i) the mode (ii) the median (iii) the mean. (b) Comment on your results. (c) An external moderator reduced all the marks by 3. Find the mode, median and mean of the moderated results. Solution (a) (i) 34 (ii) 28.5 (iii) 26.4 (b) Mean is the lowest the 4 depresses its value compared to the mode and the median. (c) 31, 25.5, 23.4

14 Quartiles Second 25% of Data Third 25% of Data First 25% of Data Last 25% of Data Q1 Q2 Q3 Quartiles When we arrange the data is ascending order of magnitude and divide them into four equal parts, the values which divide the data into four equal parts are called quartiles. They are usually denoted by the symbols Q 1 is the lowest quartile (or first quartile) where 25% of the data lie below it; Q 2 is the middle quartile ( or second quartile or median) where 50% of the data lie below it; Q 3 is the upper quartile (or third quartile) where 75% of the data lie below it. Module 2.8

15 Interquartile Range & Range Data in ascending order of magnitude First 25% of data Second 25% of data Third 25% of data Last 25% of data Q1 Lower Quartile Interquartile Range Q2 Median Q3 Upper Quartile Minimum Maximum Range Module 2.9

16 Standard Deviation Example Two machines A and B are used to measure the diameter of a washer. 50 measurements of a washer are taken by each machine. If the standard deviations of measurements taken by machine A and B are 0.4mm and 0.15mm respectively, which instrument gives more consistent measurements? Solution Standard deviation of A = 0.4 mm Standard deviation of B = 0.15 mm The smaller the standard deviation, the less widely dispersed the data is. This means that more measurements are closer to the mean. Therefore, the measurements taken by instrument B are more consistent. Module 2.10

17 Guide to Distributions Example 1 Seven teenagers at a youth club were asked their age. They gave the following ages: 16, 14, 19, 16, 13, 18, 16 The mean, mode and the median of their ages are as follows: Mean = 16 Mode = 16 Median = 16 If a line plot of their ages is drawn we get the following Mean, Mode and Median When the Mean and Median are the same value the plot is symmetrical (Bell Shaped). The Mode affects the height of the of the Bell Shaped curve. Module 2.11

18

19 Example 2: Seven people were asked how many text messages they send (on average) every week. The results were as follows: 3, 25, 30, 30, 30, 33, 40 The mean mode and the median of text messages sent are as follows: Mean = Mode = 30 Median = 30 If a line plot of the number of texts sent is drawn we get the following: Mode Median When the Mean is to the left of the Median the data is said to be skewed to the left or negatively skewed. The Mode affects the height of the curve. Mean Module 2.12

20 Example 3: Eight factory workers were asked to give their annual salary. The results are as follows: (figures in thousands of Euro) 20, 22, 25, 26, 27, 27, 70 The mean mode and the median of their annual salaries are as follows: Mean = 31 Mode = 27 Median = 26 If a line plot of their salaries is drawn we get the following Median Mode Mean When the Mean is to the right of the Median the data is said to be skewed to the right or positively skewed. The Mode affects the height of the curve. Module 2.13

21 (HL Sample Paper)

22 (HL Sample Paper)

23 Are Measures of Centre Enough? Dataset 1 Dataset 2 Dataset 1 Dataset 2 Median Mean Mode Maximum Minimum Range Standard Deviation

25 2.2 A clerk entering salary data into a company spreadsheet accidentally put an extra 0 in the boss s salary, listing it as 2,000,000 instead of 200,000. Explain how this error will affect these summary statistics for the company payroll: (a) measures of centre: median and mean. (b) Solution (b) (b) measures of spread: range, IQR, and standard deviation. As long as the boss s true salary of 200,000 is still above the median, the median will be correct. The mean will be too large, since the total of all the salaries will decrease by 2,000, ,000= 1,800,000, once the mistake is corrected. The range will likely be too large. The boss s salary is probably the maximum, and a lower maximum would lead to a smaller range. The IQR will likely be unaffected, since the new maximum has no effect on the quartiles. The standard deviation will be too large, because the 2,000,000 salary will have a large squared deviation from the mean.

26 2.2 A clerk entering salary data into a company spreadsheet accidentally put an extra 0 in the boss s salary, listing it as 2,000,000 instead of 200,000. Explain how this error will affect these summary statistics for the company payroll: (a) measures of centre: median and mean. (b) Solution (b) (b) measures of spread: range, IQR, and standard deviation. As long as the boss s true salary of 200,000 is still above the median, the median will be correct. The mean will be too large, since the total of all the salaries will decrease by 2,000, ,000= 1,800,000, once the mistake is corrected. The range will likely be too large. The boss s salary is probably the maximum, and a lower maximum would lead to a smaller range. The IQR will likely be unaffected, since the new maximum has no effect on the quartiles. The standard deviation will be too large, because the 2,000,000 salary will have a large squared deviation from the mean.

27 2.4 The histogram shows the lengths of hospital stays (in days) for all the female patients admitted to hospital in New York in 1993 with a primary diagnosis of acute myocardial infarction. (heart attack) (a) (b) Solution (a) (b) From the histogram, would you expect the mean or median to be larger? Explain. Write a few sentences describing this distribution. (shape, centre, spread, unusual features). The distribution of length of stays is skewed to the right, so the mean is larger than the median. The distribution of the length of hospital stays of female heart attack patients is skewed to the right, with stays ranging from 1 day to 36 days. The distribution is centred around 8 days, with the majority of the hospital stays lasting between 1 and 15 days. There are a relatively few hospitals stays longer than 27 days. Many patients have a stay of only one day, possibly because the patient died.

28 2.4 The histogram shows the lengths of hospital stays (in days) for all the female patients admitted to hospital in New York in 1993 with a primary diagnosis of acute myocardial infarction. (heart attack) (a) (b) Solution (a) (b) From the histogram, would you expect the mean or median to be larger? Explain. Write a few sentences describing this distribution. (shape, centre, spread, unusual features). The distribution of length of stays is skewed to the right, so the mean is larger than the median. The distribution of the length of hospital stays of female heart attack patients is skewed to the right, with stays ranging from 1 day to 36 days. The distribution is centred around 8 days, with the majority of the hospital stays lasting between 1 and 15 days. There are a relatively few hospitals stays longer than 27 days. Many patients have a stay of only one day, possibly because the patient died.

29 Bivariate Data 1. Involves 2 variables 2. Deals with causes or relationships 3. The major purpose of bivariate analysis is to determine whether relationships exist We will look at the following: Scatter plots Correlation Correlation coefficient Correlation & causality Line of Best Fit Correlation coefficient not equal to slope Sample question: Is there a relationship between the scores of students who study Physics and their scores in Mathematics? Module 2.14

30 Univariate Data versus Bivariate Data Univariate data: Bivariate Data: Only one item of data is collected e.g. height Data collected in pairs to see if there is a relationship between the variables e.g. height and arm span, mobile phone bill and age etc. Examples: Categorical paired data: Discrete paired: Continuous paired: Category and discrete paired: Colour of eyes and gender Number of bars eaten per week and number of tooth fillings Height and weight Type of dwelling and number of occupants etc. [Look at questionnaire] Module 2.15

31 Correlation Correlation: is about assessing the strength of the relationship between pairs of data. The first step in determining the relationship between 2 variables is to draw a Scatter Plot. After establishing if a Linear Relationship (Line of Best Fit) exists between 2 variables X and Y, the strength of the relationship can be measured and is known as the correlation coefficient (r). Correlation is a precise term describing the strength and direction of the linear relationship between quantitative variables. Module 2.16

32 Correlation Coefficient (r) 1 r 1 r = +1 Corresponds to a perfect positive linear correlation where the points lie exactly on a straight line [The line will have positive slope] r=0: Correponds to little or no correlation i.e. as x increases there is no definite tendency for the values of y to increase or decrease in a straight line r = 1 : Correponds to a perfect negative linear correlation where the points lie exactly on a straight line [The line will have negative slope] r close to + 1 : Indicates a strong positive linear correlation, i.e. y tends to increase as x increases r close to 1 : Indicates a strong negative linear correlation, i.e. y tends to decrease as x increases The correlation coefficient (r) is a numerical measure of the direction and strength of a linear association. Module 2.17

33 What s wrong? All have a Correlation Coefficient of The four y variables have the same mean (7.5), standard deviation (4.12), correlation (0.816) and regression line (y = x). However, as can be seen on the plots, the distribution of the variables is very different.

34 Scatter Plots Can show the relationship between 2 variables using ordered pairs plotted on a coordinate plane The data points are not joined The resulting pattern shows the type and strength of the relationship between the two variables Where a relationship exists, a line of best fit can be drawn (by eye) between the points Scatter plots can show positive or negative correlation, weak or strong correlation, outliers and spread of data An outlier is a data point that does not fit the pattern of the rest of the data. There can be several reasons for an outlier including mistakes made in the data entry or simply an unusual value. Module 2.18

35 Describing Correlation Form: Straight, Curved, No pattern Direction: Positive, Negative, Neither Strength: Weak, Moderate, Strong Unusual Features: Outliers, Subgroups Exam Results Golf Scores High I.Q. Study time Practice time Height Positive correlation As one quantity increases so does the other Negative correlation As one quantity increases the other decreases. No correlation Both quantities vary with no clear relationship Module 2.19

36 Line of Best Fit Roughly goes through the middle of the scatter of the points To describe it generally: it has about as many points on one side of the line as the other, and it doesn t have to go through any of the points It can go through some, all or none of the points Strong correlation is when the scatter points lie very close to the line It also depends on the size of the sample from which the data was chosen Strong positive correlation Moderate positive correlation No correlation no linear relationship Moderate negative correlation Strong negative correlation Module 2.20

37 Example A set of students sat their mock exam in English, they sat their final exam in English at a later date. The marks obtained by the students in both examinations were as follows: Students A B C D E F G H Mock Results Final Results (a) (b) Draw a Scatter Plot for this data and draw a line of best fit. Is there a correlation between mock results and final results? Solution (a) (b) There is a positive correlation between mock results and final results. Module 2.21

38 Example A set of students sat their mock exam in English, they sat their final exam in English at a later date. The marks obtained by the students in both examinations were as follows: Students A B C D E F G H Mock Results Final Results (a) (b) Draw a Scatter Plot for this data and draw a line of best fit. Is there a correlation between mock results and final results? Solution (a) (b) There is a positive correlation between mock results and final results. Module 2.21

39 Correlation Coefficient by Calculator r = a = b = y = x Before doing this on the calculator, the class should do a scatter plot using the data in the table. Discuss the relationship between the data (i.e. grams of fat v calories). Module 2.22

41 State the type of Correlation for the Scatter plots below and write a sentence describing the relationship in each case. 2.5 Solution 1. As maths results increases physics results tend to also increase. 2. In general the more time spend exercising, tends to lead to a decrease in body mass. 3. There is no linear relationship between Maths results and the height of students. 4. As the outside air temperature increases, heating bills tend to decrease. 5. As the daily hours of sunshine increases, the sale of sun cream tends to get higher. 6. In general the older the car the less its value.

42 State the type of Correlation for the Scatter plots below and write a sentence describing the relationship in each case. 2.5 Solution 1. As maths results increases physics results tend to also increase. 2. In general the more time spend exercising, tends to lead to a decrease in body mass. 3. There is no linear relationship between Maths results and the height of students. 4. As the outside air temperature increases, heating bills tend to decrease. 5. As the daily hours of sunshine increases, the sale of sun cream tends to get higher. 6. In general the older the car the less its value.

43 2.6 Roller coasters get all their speed by dropping down a steep initial incline, so it makes sense that the height of that drop might be related to the speed of the coaster. Here s a scatter plot of top Speed and largest Drop for 75 roller coasters around the world. Speed/mph Solution (a) (b) Drop/ ft It is appropriate to calculate correlation. Both height of the drop and speed are quantitative variables, the scatter plot shows an association that is straight enough, and there are no outliers. There is a strong, positive, linear association between drop and speed; the greater the height of the initial drop, the higher the top speed.

44 2.6 Roller coasters get all their speed by dropping down a steep initial incline, so it makes sense that the height of that drop might be related to the speed of the coaster. Here s a scatter plot of top Speed and largest Drop for 75 roller coasters around the world. Speed/mph Solution (a) (b) Drop/ ft It is appropriate to calculate correlation. Both height of the drop and speed are quantitative variables, the scatter plot shows an association that is straight enough, and there are no outliers. There is a strong, positive, linear association between drop and speed; the greater the height of the initial drop, the higher the top speed.

45 2.7 A candidate for office claims that there is a correlation between television watching and crime Criticize this statement in statistical terms. Solution The candidate might mean that there is an association between television watching and crime. The term correlation is reserved for describing linear associations between quantitative variables. We don t know what type of variables television watching and crime are, but they seem categorical. Even if the variables are quantitative (hours of tv watched per week, and number of crimes committed, for example), we aren t sure that the relationship is a linear. The politician also seems to be implying a cause-and-effect relationship between television watching and crime. Association of any kind does not imply causation.

46 2.7 A candidate for office claims that there is a correlation between television watching and crime Criticize this statement in statistical terms. Solution The candidate might mean that there is an association between television watching and crime. The term correlation is reserved for describing linear associations between quantitative variables. We don t know what type of variables television watching and crime are, but they seem categorical. Even if the variables are quantitative (hours of tv watched per week, and number of crimes committed, for example), we aren t sure that the relationship is a linear. The politician also seems to be implying a cause-and-effect relationship between television watching and crime. Association of any kind does not imply causation.

47 Correlation versus Causation Correlation is a mathematical relationship between 2 variables which are measured A Correlation of 0, means that there is no linear relationship between the 2 variables and knowing one does not allow prediction of the other Strong Correlation may be no more than a statistical association and does not imply causality Just because there is a strong correlation between 2 things does not mean that one causes the other. A consistently strong correlation may suggest causation but does not prove it. Look at these examples which show a strong correlation but do not prove causality: E.g. 1 With the decrease in the number of pirates we have seen an increase in global warming over the same time period. Does this mean global warming is caused by the decrease in pirates? E.g. 2 With the increase in the number of television sets sold an electrical shop has seen an increase in the number of calculators sold over the same time period. Does this mean that buying a television causes you to buy a calculator? Module 2.23

48 Criteria for Establishing Causation There has to be a strong consistent association found in repeated studies The cause has to be plausible and precede the effect in time Higher doses will result in stronger responses Module 2.24

49 Video Clip Association does not mean Causation Length: 00:02:43

51 2.8 Fast food is often considered unhealthy because much of it high in both fat and sodium. But are the two related? Here are the fat and sodium contents of several brands of burgers. Analyze the association between fat content and sodium. Fat (g) Sodium (mg) Solution There is no apparent association between the number of grams of fat and the number of milligrams of sodium in several brands of fast food burgers. The correlation is only r = 0.199, which is close to zero, an indication of no association. One burger had a much lower fat content than the other burgers, at 19 grams of fat, with 920 milligrams of sodium. Without this (comparatively) low fat burger, the correlation would have been even lower.

52 2.8 Fast food is often considered unhealthy because much of it high in both fat and sodium. But are the two related? Here are the fat and sodium contents of several brands of burgers. Analyze the association between fat content and sodium. Fat (g) Sodium (mg) Solution There is no apparent association between the number of grams of fat and the number of milligrams of sodium in several brands of fast food burgers. The correlation is only r = 0.199, which is close to zero, an indication of no association. One burger had a much lower fat content than the other burgers, at 19 grams of fat, with 920 milligrams of sodium. Without this (comparatively) low fat burger, the correlation would have been even lower.

53 Correlation & Slope undefined Several sets of (x, y) points, with the correlation coefficient of x and y for each set. Note that the correlation reflects the spread and direction of a linear relationship but not the gradient (slope) of that relationship, N.B.: the figure in the centre of the second line has a slope of 0 but in that case the correlation coefficient is undefined because the variance of Y is zero. The gradient (slope) of the line of best fit is not important when dealing with correlation, except that a vertical or horizontal line of best fit means that the variables are not connected. [The sign of the slope of the line of best fit will be the same as that of the correlation coefficient because both will be in the same direction.] Module 2.25

54 SAMPLE PAPER OL

55 SAMPLE PAPER HL

56 SAMPLE PAPER HL

57 Notes Module 2.26

### 5 Week Modular Course in Statistics & Probability Strand 1. Module 2

5 Week Modular Course in Statistics & Probability Strand 1 Module 2 Analysing Data Numerically Measures of Central Tendency Mean Median Mode Measures of Spread Range Standard Deviation Inter-Quartile Range

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

### F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

### Chapter 7 Scatterplots, Association, and Correlation

78 Part II Exploring Relationships Between Variables Chapter 7 Scatterplots, Association, and Correlation 1. Association. a) Either weight in grams or weight in ounces could be the explanatory or response

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Teaching & Learning Plans. The Correlation Coefficient. Leaving Certificate Syllabus

Teaching & Learning Plans The Correlation Coefficient Leaving Certificate Syllabus The Teaching & Learning Plans are structured as follows: Aims outline what the lesson, or series of lessons, hopes to

### STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

### Data Analysis: Describing Data - Descriptive Statistics

WHAT IT IS Return to Table of ontents Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data. Descriptive statistics are most

### College of the Canyons Math 140 Exam 1 Amy Morrow. Name:

Name: Answer the following questions NEATLY. Show all necessary work directly on the exam. Scratch paper will be discarded unread. One point each part unless otherwise marked. 1. Owners of an exercise

### Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

### CHINHOYI UNIVERSITY OF TECHNOLOGY

CHINHOYI UNIVERSITY OF TECHNOLOGY SCHOOL OF NATURAL SCIENCES AND MATHEMATICS DEPARTMENT OF MATHEMATICS MEASURES OF CENTRAL TENDENCY AND DISPERSION INTRODUCTION From the previous unit, the Graphical displays

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### What are Data? The Research Question (Randomised Controlled Trials (RCTs)) The Research Question (Non RCTs)

What are Data? Quantitative Data o Sets of measurements of objective descriptions of physical and behavioural events; susceptible to statistical analysis Qualitative data o Descriptive, views, actions

### Session 1.6 Measures of Central Tendency

Session 1.6 Measures of Central Tendency Measures of location (Indices of central tendency) These indices locate the center of the frequency distribution curve. The mode, median, and mean are three indices

### AP * Statistics Review. Descriptive Statistics

AP * Statistics Review Descriptive Statistics Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production

### Chapter 3: Data Description Numerical Methods

Chapter 3: Data Description Numerical Methods Learning Objectives Upon successful completion of Chapter 3, you will be able to: Summarize data using measures of central tendency, such as the mean, median,

### Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

### Research Variables. Measurement. Scales of Measurement. Chapter 4: Data & the Nature of Measurement

Chapter 4: Data & the Nature of Graziano, Raulin. Research Methods, a Process of Inquiry Presented by Dustin Adams Research Variables Variable Any characteristic that can take more than one form or value.

### 1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics)

1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics) As well as displaying data graphically we will often wish to summarise it numerically particularly if we wish to compare two or more data sets.

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

### 13.2 Measures of Central Tendency

13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

### Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### Name: Date: Use the following to answer questions 2-3:

Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

### 2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only on BEST answer. Be sure to read all possible

### Module 4: Data Exploration

Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive

### Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

### 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

### Means, standard deviations and. and standard errors

CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard

### Pearson s correlation

Pearson s correlation Introduction Often several quantitative variables are measured on each member of a sample. If we consider a pair of such variables, it is frequently of interest to establish if there

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

### Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

### Sample Exam #1 Elementary Statistics

Sample Exam #1 Elementary Statistics Instructions. No books, notes, or calculators are allowed. 1. Some variables that were recorded while studying diets of sharks are given below. Which of the variables

### 4. Describing Bivariate Data

4. Describing Bivariate Data A. Introduction to Bivariate Data B. Values of the Pearson Correlation C. Properties of Pearson's r D. Computing Pearson's r E. Variance Sum Law II F. Exercises A dataset with

### 4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Data Analysis Plan The appropriate methods of data analysis are determined by your data types and variables of interest, the actual distribution of the variables, and the number of cases. Different analyses

### Chapter 10 - Practice Problems 1

Chapter 10 - Practice Problems 1 1. A researcher is interested in determining if one could predict the score on a statistics exam from the amount of time spent studying for the exam. In this study, the

### Central Tendency. n Measures of Central Tendency: n Mean. n Median. n Mode

Central Tendency Central Tendency n A single summary score that best describes the central location of an entire distribution of scores. n Measures of Central Tendency: n Mean n The sum of all scores divided

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### MEASURES OF VARIATION

NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

### Chapter 3 Descriptive Statistics: Numerical Measures. Learning objectives

Chapter 3 Descriptive Statistics: Numerical Measures Slide 1 Learning objectives 1. Single variable Part I (Basic) 1.1. How to calculate and use the measures of location 1.. How to calculate and use the

### STAT 155 Introductory Statistics. Lecture 5: Density Curves and Normal Distributions (I)

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STAT 155 Introductory Statistics Lecture 5: Density Curves and Normal Distributions (I) 9/12/06 Lecture 5 1 A problem about Standard Deviation A variable

### Biostatistics: A QUICK GUIDE TO THE USE AND CHOICE OF GRAPHS AND CHARTS

Biostatistics: A QUICK GUIDE TO THE USE AND CHOICE OF GRAPHS AND CHARTS 1. Introduction, and choosing a graph or chart Graphs and charts provide a powerful way of summarising data and presenting them in

### The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### Mind on Statistics. Chapter 3

Mind on Statistics Chapter 3 Section 3.1 1. Which one of the following is not appropriate for studying the relationship between two quantitative variables? A. Scatterplot B. Bar chart C. Correlation D.

### Data Exploration Data Visualization

Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

### Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

### Box-and-Whisker Plots

Learning Standards HSS-ID.A. HSS-ID.A.3 3 9 23 62 3 COMMON CORE.2 Numbers of First Cousins 0 3 9 3 45 24 8 0 3 3 6 8 32 8 0 5 4 Box-and-Whisker Plots Essential Question How can you use a box-and-whisker

### Statistical Analysis Using Gnumeric

Statistical Analysis Using Gnumeric There are many software packages that will analyse data. For casual analysis, a spreadsheet may be an appropriate tool. Popular spreadsheets include Microsoft Excel,

### CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### The correlation coefficient

The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

### What Does the Normal Distribution Sound Like?

What Does the Normal Distribution Sound Like? Ananda Jayawardhana Pittsburg State University ananda@pittstate.edu Published: June 2013 Overview of Lesson In this activity, students conduct an investigation

### Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

### HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### The Cartesian Plane The Cartesian Plane. Performance Criteria 3. Pre-Test 5. Coordinates 7. Graphs of linear functions 9. The gradient of a line 13

6 The Cartesian Plane The Cartesian Plane Performance Criteria 3 Pre-Test 5 Coordinates 7 Graphs of linear functions 9 The gradient of a line 13 Linear equations 19 Empirical Data 24 Lines of best fit

### Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination

Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg - ) Chromium (mg kg - ) Copper (mg kg - ) Lead (mg kg - ) Mercury

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

### First Midterm Exam (MATH1070 Spring 2012)

First Midterm Exam (MATH1070 Spring 2012) Instructions: This is a one hour exam. You can use a notecard. Calculators are allowed, but other electronics are prohibited. 1. [40pts] Multiple Choice Problems

### . 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches)

PEARSON S FATHER-SON DATA The following scatter diagram shows the heights of 1,0 fathers and their full-grown sons, in England, circa 1900 There is one dot for each father-son pair Heights of fathers and

### Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam. Software Profiling Seminar, Statistics 101

Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam Software Profiling Seminar, 2013 Statistics 101 Descriptive Statistics Population Object Object Object Sample numerical description Object

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Exam Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) The government of a town needs to determine if the city's residents will support the

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### The aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree

PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and

### DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

### Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

### Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement

Measurement & Data Analysis Overview of Measurement. Variability & Measurement Error.. Descriptive vs. Inferential Statistics. Descriptive Statistics. Distributions. Standardized Scores. Graphing Data.

### Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center

### Lab 1: The metric system measurement of length and weight

Lab 1: The metric system measurement of length and weight Introduction The scientific community and the majority of nations throughout the world use the metric system to record quantities such as length,

### Histogram. Graphs, and measures of central tendency and spread. Alternative: density (or relative frequency ) plot /13/2004

Graphs, and measures of central tendency and spread 9.07 9/13/004 Histogram If discrete or categorical, bars don t touch. If continuous, can touch, should if there are lots of bins. Sum of bin heights

### Exam Style Questions. Revision for this topic. Name: Ensure you have: Pencil, pen, ruler, protractor, pair of compasses and eraser

Name: Exam Style Questions Ensure you have: Pencil, pen, ruler, protractor, pair of compasses and eraser You may use tracing paper if needed Guidance 1. Read each question carefully before you begin answering

### c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?

### Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)

### Mind on Statistics. Chapter 2

Mind on Statistics Chapter 2 Sections 2.1 2.3 1. Tallies and cross-tabulations are used to summarize which of these variable types? A. Quantitative B. Mathematical C. Continuous D. Categorical 2. The table

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Outline of Topics. Statistical Methods I. Types of Data. Descriptive Statistics

Statistical Methods I Tamekia L. Jones, Ph.D. (tjones@cog.ufl.edu) Research Assistant Professor Children s Oncology Group Statistics & Data Center Department of Biostatistics Colleges of Medicine and Public

### Week 1. Exploratory Data Analysis

Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

### Statistics E100 Fall 2013 Practice Midterm I - A Solutions

STATISTICS E100 FALL 2013 PRACTICE MIDTERM I - A SOLUTIONS PAGE 1 OF 5 Statistics E100 Fall 2013 Practice Midterm I - A Solutions 1. (16 points total) Below is the histogram for the number of medals won

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

### Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

### Stats Review Chapters 3-4

Stats Review Chapters 3-4 Created by Teri Johnson Math Coordinator, Mary Stangler Center for Academic Success Examples are taken from Statistics 4 E by Michael Sullivan, III And the corresponding Test

### Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

### AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with