# Statistics Summary (prepared by Xuan (Tappy) He)

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Statistics Summary (prepared by Xuan (Tappy) He) Statistics is the practice of collecting and analyzing data. The analysis of statistics is important for decision making in events where there are uncertainties. Example 1: Consider the following table which records the marks of students on a test out of 6. Name Mohammad Mary Mamadou Binta Alieu Lamin Mark Measures of Central Tendency: A good way to begin analyzing data is to summarize the data into a single representative value. The three most common measures of central tendency are mean, median and mode. In order to analyze a set of data using measures of central tendency, we must know: The number of entries in the data set and, The value of each entry. Mean: Definition: Another word for average. Sum of Value of Entries Mean = Formula: Number of Entries In Example 1, the mean is = = 4 Median: Definition: The median is the value that separates the higher half of a sample from the lower half. Formula: Arrange all of the values from lowest to highest. If there are an odd number of entries, the median is the middle value. If there are an even number of entries, the median is the mean of the two middle entries. In Example 1, the numbers can be reordered as 3, 3, 3, 4,, 6, and the median is = = 3. Mode: Definition: The mode is the most frequently occurring value in the data set. In a data set where each value occurs exactly once, there is no mode. In Example 1, since 3 occurred more frequently than any other value, the mode is 3. Measures of Dispersion: Dispersion gives information about how spread out the values are in the data set. Common measures of dispersion are range, standard deviation, mean deviation, and interquartile range. Range: Definition: The range in a data set measures the difference between the smallest entry value and the largest entry value. Formula: Range = (largest entry value - smallest entry value) This is the simplest measure of dispersion. In Example 1, the range is 6-3=3. Mean Deviation: Definition: The mean deviation calculates the average difference of each entry from the mean. It is calculated by taking the absolute value of the difference between each entry and the mean of the data set, adding them together, then divided by the number of entries in the data set. Sum of value of entry - mean of data set Formula: Mean Deviation =. (The vertical bars denote the absolute Number of Entries value).

2 In Example 1, we have that Mohammad s mark is -4 =1 from the mean, Mary s mark is 3-4 =1 from the mean, 2 for Mamadou, 1 for Binta, 0 for Alieu and 1 for Lamin. Thus the mean deviation is = =1 Standard Deviation: Definition: Standard deviation measures the variation or dispersion exists from the mean. A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data points are spread over a large range of values. Formula: Standard Deviation = deviation is Sum of (value of entry - mean of data set)2 Number of Entries In Example 1, the square of the difference between Mohammad s mark and the mean is (-4) 2 =1. For Mary, it is (3-4) 2 =(-1) 2 =1. For Mamadou, Binta, Alieu and Lamin, they are 4, 1, 0, and 1 respectively. The standard ( " 4) 2 + (3" 4) 2 + (6 " 4) 2 + (3 " 4) 2 + (4 " 4) 2 + (3" 4) 2 6. = 8 6 #1.1 Data presented in frequency tables Large quantities of data are often presented in frequency tables. To do this we must have A common criteria, The common criteria divided into classes, The number of entries in each class. For example, the heights of students in an elementary school class are presented in the following table. The common criteria is height, the classes are the different groups of heights, and the value of each class is the number of pupils in the group. Example 2: Height of Pupils Height 110cm-120cm 120cm-130cm 130cm-140cm 140cm-10cm Number of pupils Histograms and Frequency Polygons: Graphical representations of the frequency distribution of a data set. A histogram is a bar graph whose x-axis labels the classes for the common criteria and y-axis labels the number of entries that satisfy each criteria. A frequency polygon is a line graph whose axis labels are the same as a histogram. However, after we mark down the points corresponding to our given data on the graph we connect the two points next to each other on the graph by a straight line. This usually results in a mountain-looking polygon. Histogram for the above example: Frequency polygon for the above example: A combined look:

3 Cumulative Frequency: Definition: The number of entries in a given class in a frequency table, plus all entries in earlier classes. In order to plot a cumulative frequency chart, we add another row or column to the frequency table that lists the cumulative frequency. See Example 3. We may draw a Cumulative Frequency Curve to represent the cumulative frequency information in the table. Example 3: Height 110cm-120cm 120cm-130cm 130cm-140cm 140cm-10cm Number of pupils Height Under 120cm Under 130cm Under 140cm Under 10cm Cumulative number of pupils Quartiles, Percentiles, Interquartile Range Quartile: Definition: A quartile is each of the three points that divide a range of data into four equal groups. Lower Quartile (Q1) Definition: The value such that 2% of all data entries have values less than this value. 1 Formula: The entry at Q1 is calculated by (total number of entries +1). Q1=Value of this entry. 4 In Example 3, the height of the 7.7 th person is the cut off height of the lower quartile, i.e. 13cm (assuming the height of pupils are arranged in ascending order) Upper Quartile (Q3) Definition: The value such that 7% of all data entries are less than this value. 3 Formula: The entry at Q3 is calculated by (total number of entries +1). Q3=Value of this entry. 4 In Example 3, the 23.2 th person s height is the minimum height for the upper quartile, i.e. 14cm (again assuming the height of pupils are arranged in ascending order) Median (Q2) Definition: The middle point that divides the middle two quartiles. It is the value such that 0% of all entries are less than this value. Recall that the median is the value of the middle entry out of all entries. Formula: refer to above measure of central tendency. In Example 3, the median is the average height of the 1 th and 16 th person, i.e. 13cm. Interquartile Range (IQR) Definition: A measure of dispersion. It is the difference between the lower quartile and upper quartile. Formula: IQR = Q3" Q1 In Example 3, we have that Q3 is 14cm (When each class of our criteria is given a range of values, we typically take the middle value of that class. In this example, 14 is the middle value of the cm height class); Q1 is 13cm, hence IQR=14-13=10cm. Percentile Definition: The n th percentile of a data set is the value such that n% of all entries are less than this value. Formula: The n th percentile entry is calculated by: n%(total number of entries +1). The value of this entry is the n th percentile. The percentile of a given value is determined by the percentage of the values that are smaller than that value. In Example 3, the 80 th percentile is the height of the 24.8 th person, approximately the 2 th person when the heights are arranged in ascending order. That is, 14cm is the 80 th percentile. Note: We usually round up if the cut off for a certain quartile or percentile lands between two entries. (In the example above, we would round up the upper quartile, Q3, to the height of the 24 th person.)

4 Estimating the median, quartiles and percentile from a Cumulative Frequency Curve When we look at a cumulative frequency curve, the y-axis labels the number of entries for the set of data. The y-coordinate corresponding to the highest point on the curve is the total number of entries in the data set. Then if we take the middle of the y-axis between 0 and the value of the highest point, we have the median (Q2). Taking the value halfway between 0 and the median on the y-axis (i.e. the lower 2% of total number of entries) is the lower quartile (Q1). Taking the value halfway between the median and the value of the highest point on the y-axis is the upper quartile (Q3). To derive the n th percentile, we use the same idea and take the value on the y-axis such that this value divided by the y-value of our highest point on the curve equals n%. Examples 1). This table shows the ages of students in a class. Age (Years) Number of Students a). How many students are in the class? b). What is the probability that a randomly chosen student in the class is 17 years old? c). What is the mean, median and modal age, and what is the range of the ages? a). Let n be the total number of students in the class. Then n= There are 42 students in the class. b). There are 22 out of 42 total students who are aged 17. Hence, a randomly chosen student has the probability approximately 2.3% of being 17 years old = 11, or 21 c). (Mean) Remember that the mean is the sum of all values divided by the number of entries. We have 42 students as our number of entries (answer from part a).). Since there are 7 students of age 16, 22 students of age 17, 13 students of age 18, the sum of all values in our data set is 7x16+22x17+13x18= =720. Therefore, the average age (i.e. mean) is 720 "17.1yrs old. 42 (1+ 42) (Median) The median would be the age of the = 21.th student. (i.e. the average age between the 21 st and 22 nd 2 student). Since there are 7 students aged 16, and 22 students aged 17, if we line these students up in ascending order of ages, the 21 st student would be one of the students that is 17 years old, so would the 22 nd student. Therefore, the median is just 17 years old. (Mode) Out of the 3 possible ages, 17 is the age with the most number of people, therefore the mode is 17 years old. (Range) Remember that the range is the highest entry value minus the lowest entry value. The highest entry value is 18 years old; the lowest entry value is 16 years old. Therefore the range is 18-16=2.

5 2). Table shows the number of goals scored in a game of football shootout (each person gets kicks at the goal) by students in a class. Goal Frequency a). Calculate the mean and median. b). A pie chart (or a circle graph) is a circular chart divided into sectors. Each of its sectors represents a class of the common criteria. The size of a sector illustrates the proportion of the number of entries of that class with respect to all entries in the data set. If we represent the information in a pie chart, what would be the sectorial angle for 4 goals? 2 goals? a). (Mean) Let n be the number of entries, then n= =30. Remember that the mean is the sum of all values divided by total number of entries. Sum of all values is 1x0+4x1+9x2+8x3+x4+3x=81. Therefore, the mean is = 2.7. A student in the class scores on average 2.7 goals in the shootout. (Median) Since there are 30 students, the median is the goal shot by the (1+ 30) =1.th student; i.e. the average of the 1 th 2 and 16 th student. From the table, 1+4+9=1, the 1 th student is one of the 9 students that shot 2 goals, and the 16 th student is one of 8 students that shot 3 goals. Hence the median is = 2. goals. b). (Approach 1) There are students that scored 4 goals; there are 30 students in total. A full pie chart consists of all entries, and around a circle is 360 o. Letting x be the sector angle, then 30 = 1 6 = x 360, so 60o of a full circle is the sectorial angle for 4 goals. (Approach 2) We ll take another approach for 2 goals. Since there are 30 students in total, and the pie is 360 o. Then each student represents 360 =12 of the pie s sectorial angle. Since there are 9 students that 30 scored 2 goals, we have 12 o x9=108 o as the sectorial angle for 2 goals. 3). Sona, Karina, Omar, Mustafa and Amie earned scores of 6, 7, 3, 7, 2 on a standardized test respectively. Find the mean deviation and standard deviation of their scores Mean deviation: We must first find the mean of the data set. The mean is =. Then the mean deviation is 6 " + 7 " + 3" + 7 " + 2 " calculated by = = 2. The mean deviation is 2. Standard deviation: (6 " ) 2 + (7 " ) 2 + (3 " ) 2 + (7 " ) 2 + (2 " ) 2 = = 22 # 2.1 4). The following table contains the number of championships these clubs have won after the 2011 UEFA Champions League. Club Real Liverpool AC Milan Manchester Internazionale Bayern Barcelona Madrid United Munich # of championships 9 7 X 3 X+1 4 won If these clubs won championships over the years on average, how many championships did Bayern Munich win over the years?

6 There are 7 clubs listed here, so the number of entries is 7. Now the average is championships won per team over the years, then rearranging the formula for the mean yields: Average # of championships won " 7 = Total # of championships won by all 7 teams. So the Total number of championships won by all 7 teams together is x7=3. Adding up the winnings from the other 6 teams, we ve =28. So Bayern Munich and Man.United combined must ve won 3-28=7 championships. We now have the equation x + (x +1) = 7. Solving for x gives x=3. So Man.United won 3 UEFA Championship league titles, then Bayern Munich won x+1=4 tournaments over the years. (We can verify by checking that ). The table below shows the distribution of the ages, in year, of 120 members of a local football club. Age (Years) Number of people =.) a). Draw a histogram for the above distribution b). Calculate the mean, median and mode of the distribution. c). Find the standard deviation of the distribution, correct to 2 decimal places. a). Remember that when we draw a histogram, gaps are closed between the bars. Hence, to make up for the difference of 1 between each age group, we extend the bars by 0. on both sides. (Example: the bar representing the 21-2 age group would expand from 20. to 2., the next bar representing age group would extend from , etc.) b). (Mean) We note that when we are given the classes of our common criteria (our common criteria is age, the classes are the age groups) in a range of values, we generally take the middle value of each class. For example, for the class of 21-2 years old, we take 23 as the age of the people in this age class. Then we have 1 person 23 years old, 14 people 28 years old, 42 people 33 years old, etc. We can now calculate the mean as 1(23) + 14(28) + 42(33) + 30(38) + 20(43) + 10(48) + 3(3) = = (Median) There are 120 members in the club, so the median is the average age of the 60 th and 61 st member. Since =7<60 and =87>61, the 60 th and the 61 st person are all from the age group Hence the median age is 38. (Mode) The most commonly occurring age is the age group 31-3, whose members age we assumed to be the middle value of the age group: 33. The mode is 33. c). Standard deviation: The mean we calculated is 37. The standard deviation is calculated as 1(23 " 37) 2 +14(28 " 37) (33" 37) (38 " 37) (43 " 37) 2 +10(48 " 37) 2 + 3(3 " 37) = # # ). The frequency distribution shows the marks of 100 students in a Math test. Marks No. of Pupils a). Fill in another row of cumulative frequency in the table. b). Draw a cumulative frequency curve for the distribution. c). Use your curve to estimate:

7 i). The median; ii). Lower and upper quartile; iii). The 60 th percentile; Solutions: a). Marks No. of Pupils Cumulative Number of Pupils b, c). From the Cumulative Frequency Curve above, we see that the median (red line) lies approximately between 1 and 60. Hence we may conclude that (i) The median mark is approximately. The lower quartile (Q1, lower green line) lies approximately between 31 to 40, therefore (ii) The lower quartile mark is approximately 3 and the upper quartile, by observation is approximately. Lastly, we see that there are 100 students in this data set. So the 60 th percentile is the mark that is above the lowest 60 students marks, namely, the mark of the student with the 61 st lowest mark. (iii) The 60 th percentile mark, as illustrated by the gray line, is approximately. Exercises 1). Which of the following is not a measure of central tendency? Mean, Median, Range, Mode. 2). The following table illustrates marks obtained by students in a test. Marks Frequency X 2 a). If the mean score of the class is 3, find the value of x. b). What is the median score? What is the modal score? What is the range of the scores? 3). In a favorite animal survey of 120 students, each student is asked to identify his or her favorite animal. The data is displayed in the pie chart on the left. a). What percentage of students like rabbits as their favorite animal? b). How many students like cats as their favorite animal? c). How many more students like dogs as their favorite animal than rabbits as their favorite animal? 4). The average age of women in a group is 27 years. If two other women aged 40 and 28 years join the group, find, in years, the new average age of the group of women.

8 ). The following table shows the frequency distribution of the weights of 2 children in a community. Weight(kg) # of Children a). Find the mode of the distribution b). Calculate the mean weight of the children c). What is the probability that a child selected at random weighs more than 17kg? d). Add a new row of cumulative frequency of children s weights. e). Use information in the question and your answer in part d). to plot a cumulative frequency curve. f). Using the cumulative frequency curve, determine the upper quartile weight of the children. 6). 10 different teams played football over the summer. After the summer, the top goal scorers from each team scored the following number of goals: X If the mean number of goals scored is 9, what is: a). The value of X? b). The mode? c). The median? d). The range? e). The mean deviation? f). The standard deviation? g). The 0 th percentile? h). What is the percentile of the goal scorer with 11 goals scored? 7). The table below shows the marks scored by 30 students in a test. Mark Frequency a). Draw a frequency polygon to represent this information. b). A student is selected at random, what is the probability that he scored 7 or less points on this test? c). What is the median score? 8). In northern parts of the world, a 36-day year generally has temperature as follows: Temperature -39 to to to to 0 1 to 9 10 to to to 39 (in degrees Celsius) # of Days a). Draw a histogram to illustrate the above information. b). Add another row of cumulative frequency on the number of days of different temperature. c). Draw a cumulative frequency curve to illustrate the cumulative information. d). What percent of the year has temperature below 0 degrees? e). What percent of the year has temperature above 30 degrees? g). Using the cumulative frequency curve, determine the following: (i) The lower and upper quartile. Explain what they represent in context of the data. (ii) The median temperature (iii) The 30 th percentile and 70 th percentile. Explain what they represent in context of the data.

### A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

### Statistics Revision Sheet Question 6 of Paper 2

Statistics Revision Sheet Question 6 of Paper The Statistics question is concerned mainly with the following terms. The Mean and the Median and are two ways of measuring the average. sumof values no. of

### GCSE HIGHER Statistics Key Facts

GCSE HIGHER Statistics Key Facts Collecting Data When writing questions for questionnaires, always ensure that: 1. the question is worded so that it will allow the recipient to give you the information

### Glossary of numeracy terms

Glossary of numeracy terms These terms are used in numeracy. You can use them as part of your preparation for the numeracy professional skills test. You will not be assessed on definitions of terms during

### HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

### Sampling, frequency distribution, graphs, measures of central tendency, measures of dispersion

Statistics Basics Sampling, frequency distribution, graphs, measures of central tendency, measures of dispersion Part 1: Sampling, Frequency Distributions, and Graphs The method of collecting, organizing,

### Chapter 2: Frequency Distributions and Graphs

Chapter 2: Frequency Distributions and Graphs Learning Objectives Upon completion of Chapter 2, you will be able to: Organize the data into a table or chart (called a frequency distribution) Construct

### Chapter 3: Data Description Numerical Methods

Chapter 3: Data Description Numerical Methods Learning Objectives Upon successful completion of Chapter 3, you will be able to: Summarize data using measures of central tendency, such as the mean, median,

### Descriptive Statistics. Frequency Distributions and Their Graphs 2.1. Frequency Distributions. Chapter 2

Chapter Descriptive Statistics.1 Frequency Distributions and Their Graphs Frequency Distributions A frequency distribution is a table that shows classes or intervals of data with a count of the number

### We will use the following data sets to illustrate measures of center. DATA SET 1 The following are test scores from a class of 20 students:

MODE The mode of the sample is the value of the variable having the greatest frequency. Example: Obtain the mode for Data Set 1 77 For a grouped frequency distribution, the modal class is the class having

### 2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

### Statistics Review Solutions

Statistics Review Solutions 1. Katrina must take five exams in a math class. If her scores on the first four exams are 71, 69, 85, and 83, what score does she need on the fifth exam for her overall mean

### The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### Frequency distributions, central tendency & variability. Displaying data

Frequency distributions, central tendency & variability Displaying data Software SPSS Excel/Numbers/Google sheets Social Science Statistics website (socscistatistics.com) Creating and SPSS file Open the

### Introduction to Descriptive Statistics

Mathematics Learning Centre Introduction to Descriptive Statistics Jackie Nicholas c 1999 University of Sydney Acknowledgements Parts of this booklet were previously published in a booklet of the same

### Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

### STATISTICS FOR PSYCH MATH REVIEW GUIDE

STATISTICS FOR PSYCH MATH REVIEW GUIDE ORDER OF OPERATIONS Although remembering the order of operations as BEDMAS may seem simple, it is definitely worth reviewing in a new context such as statistics formulae.

### Table 2-1. Sucrose concentration (% fresh wt.) of 100 sugar beet roots. Beet No. % Sucrose. Beet No.

Chapter 2. DATA EXPLORATION AND SUMMARIZATION 2.1 Frequency Distributions Commonly, people refer to a population as the number of individuals in a city or county, for example, all the people in California.

### 13.2 Measures of Central Tendency

13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

### 6 3 The Standard Normal Distribution

290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### 2. A is a subset of the population. 3. Construct a frequency distribution for the data of the grades of 25 students taking Math 11 last

Math 111 Chapter 12 Practice Test 1. If I wanted to survey 50 Cabrini College students about where they prefer to eat on campus, which would be the most appropriate way to conduct my survey? a. Find 50

### Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

### ! x sum of the entries

3.1 Measures of Central Tendency (Page 1 of 16) 3.1 Measures of Central Tendency Mean, Median and Mode! x sum of the entries a. mean, x = = n number of entries Example 1 Find the mean of 26, 18, 12, 31,

### Chapter 3: Central Tendency

Chapter 3: Central Tendency Central Tendency In general terms, central tendency is a statistical measure that determines a single value that accurately describes the center of the distribution and represents

### Report of for Chapter 2 pretest

Report of for Chapter 2 pretest Exam: Chapter 2 pretest Category: Organizing and Graphing Data 1. "For our study of driving habits, we recorded the speed of every fifth vehicle on Drury Lane. Nearly every

### Areas of numeracy covered by the professional skills test

Areas of numeracy covered by the professional skills test December 2014 Contents Averages 4 Mean, median and mode 4 Mean 4 Median 4 Mode 5 Avoiding common errors 8 Bar charts 9 Worked examples 11 Box and

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### MODUL 8 MATEMATIK SPM ENRICHMENT TOPIC : STATISTICS TIME : 2 HOURS

MODUL 8 MATEMATIK SPM ENRICHMENT TOPIC : STATISTICS TIME : 2 HOURS 1. The data in Diagram 1 shows the body masses, in kg, of 40 children in a kindergarten. 16 24 34 26 30 40 35 30 26 33 18 20 29 31 30

### 5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

### Chapter 2 - Graphical Summaries of Data

Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense

### Content DESCRIPTIVE STATISTICS. Data & Statistic. Statistics. Example: DATA VS. STATISTIC VS. STATISTICS

Content DESCRIPTIVE STATISTICS Dr Najib Majdi bin Yaacob MD, MPH, DrPH (Epidemiology) USM Unit of Biostatistics & Research Methodology School of Medical Sciences Universiti Sains Malaysia. Introduction

### Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

### CC Investigation 5: Histograms and Box Plots

Content Standards 6.SP.4, 6.SP.5.c CC Investigation 5: Histograms and Box Plots At a Glance PACING 3 days Mathematical Goals DOMAIN: Statistics and Probability Display numerical data in histograms and

### Bar Graphs and Dot Plots

CONDENSED L E S S O N 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs

### Describing Data. We find the position of the central observation using the formula: position number =

HOSP 1207 (Business Stats) Learning Centre Describing Data This worksheet focuses on describing data through measuring its central tendency and variability. These measurements will give us an idea of what

### In this course, we will consider the various possible types of presentation of data and justification for their use in given situations.

PRESENTATION OF DATA 1.1 INTRODUCTION Once data has been collected, it has to be classified and organised in such a way that it becomes easily readable and interpretable, that is, converted to information.

### F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

### Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

### 909 responses responded via telephone survey in U.S. Results were shown by political affiliations (show graph on the board)

1 2-1 Overview Chapter 2: Learn the methods of organizing, summarizing, and graphing sets of data, ultimately, to understand the data characteristics: Center, Variation, Distribution, Outliers, Time. (Computer

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

### CHAPTER 86 MEAN, MEDIAN, MODE AND STANDARD DEVIATION

CHAPTER 86 MEAN, MEDIAN, MODE AND STANDARD DEVIATION EXERCISE 36 Page 919 1. Determine the mean, median and modal values for the set: {3, 8, 10, 7, 5, 14,, 9, 8} Mean = 3+ 8 + 10 + 7 + 5 + 14 + + 9 + 8

### Data Analysis: Describing Data - Descriptive Statistics

WHAT IT IS Return to Table of ontents Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data. Descriptive statistics are most

### sum of the values X = = number of values in the sample n sum of the values number of values in the population

Topic 3: Measures of Central Tendency With any set of numerical data, there is always a temptation to summarise the data to a single value that attempts to be a typical value for the data There are three

### 6. Methods 6.8. Methods related to outputs, Introduction

6. Methods 6.8. Methods related to outputs, Introduction In order to present the outcomes of statistical data collections to the users in a manner most users can easily understand, a variety of statistical

### Variables. Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

### The Normal Distribution

Chapter 6 The Normal Distribution 6.1 The Normal Distribution 1 6.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Recognize the normal probability distribution

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### 2 Describing, Exploring, and

2 Describing, Exploring, and Comparing Data This chapter introduces the graphical plotting and summary statistics capabilities of the TI- 83 Plus. First row keys like \ R (67\$73/276 are used to obtain

### Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

### MEI Statistics 1. Exploring data. Section 1: Introduction. Looking at data

MEI Statistics Exploring data Section : Introduction Notes and Examples These notes have sub-sections on: Looking at data Stem-and-leaf diagrams Types of data Measures of central tendency Comparison of

### 1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics)

1.5 NUMERICAL REPRESENTATION OF DATA (Sample Statistics) As well as displaying data graphically we will often wish to summarise it numerically particularly if we wish to compare two or more data sets.

### Using SPSS, Chapter 2: Descriptive Statistics

1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

### Graphical and Tabular. Summarization of Data OPRE 6301

Graphical and Tabular Summarization of Data OPRE 6301 Introduction and Re-cap... Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information

### Each exam covers lectures from since the previous exam and up to the exam date.

Sociology 301 Exam Review Liying Luo 03.22 Exam Review: Logistics Exams must be taken at the scheduled date and time unless 1. You provide verifiable documents of unforeseen illness or family emergency,

### 32 Measures of Central Tendency and Dispersion

32 Measures of Central Tendency and Dispersion In this section we discuss two important aspects of data which are its center and its spread. The mean, median, and the mode are measures of central tendency

### Chapter 5: The normal approximation for data

Chapter 5: The normal approximation for data Context................................................................... 2 Normal curve 3 Normal curve.............................................................

### Methods for Describing Data Sets

1 Methods for Describing Data Sets.1 Describing Data Graphically In this section, we will work on organizing data into a special table called a frequency table. First, we will classify the data into categories.

### Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

### Module 4: Data Exploration

Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive

### 2 Descriptive statistics with R

Biological data analysis, Tartu 2006/2007 1 2 Descriptive statistics with R Before starting with basic concepts of data analysis, one should be aware of different types of data and ways to organize data

### Lecture I. Definition 1. Statistics is the science of collecting, organizing, summarizing and analyzing the information in order to draw conclusions.

Lecture 1 1 Lecture I Definition 1. Statistics is the science of collecting, organizing, summarizing and analyzing the information in order to draw conclusions. It is a process consisting of 3 parts. Lecture

### Graphical methods for presenting data

Chapter 2 Graphical methods for presenting data 2.1 Introduction We have looked at ways of collecting data and then collating them into tables. Frequency tables are useful methods of presenting data; they

### GCSE Statistics Revision notes

GCSE Statistics Revision notes Collecting data Sample This is when data is collected from part of the population. There are different methods for sampling Random sampling, Stratified sampling, Systematic

### 1. 2. 3. 4. Find the mean and median. 5. 1, 2, 87 6. 3, 2, 1, 10. Bellwork 3-23-15 Simplify each expression.

Bellwork 3-23-15 Simplify each expression. 1. 2. 3. 4. Find the mean and median. 5. 1, 2, 87 6. 3, 2, 1, 10 1 Objectives Find measures of central tendency and measures of variation for statistical data.

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### THE BINOMIAL DISTRIBUTION & PROBABILITY

REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution

### Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

### CHAPTER 3 CENTRAL TENDENCY ANALYSES

CHAPTER 3 CENTRAL TENDENCY ANALYSES The next concept in the sequential statistical steps approach is calculating measures of central tendency. Measures of central tendency represent some of the most simple

### x Measures of Central Tendency for Ungrouped Data Chapter 3 Numerical Descriptive Measures Example 3-1 Example 3-1: Solution

Chapter 3 umerical Descriptive Measures 3.1 Measures of Central Tendency for Ungrouped Data 3. Measures of Dispersion for Ungrouped Data 3.3 Mean, Variance, and Standard Deviation for Grouped Data 3.4

### Describing, Exploring, and Comparing Data

24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter

### Home Runs, Statistics, and Probability

NATIONAL MATH + SCIENCE INITIATIVE Mathematics American League AL Central AL West AL East National League NL West NL East Level 7 th grade in a unit on graphical displays Connection to AP* Graphical Display

### Intro to Statistics 8 Curriculum

Intro to Statistics 8 Curriculum Unit 1 Bar, Line and Circle Graphs Estimated time frame for unit Big Ideas 8 Days... Essential Question Concepts Competencies Lesson Plans and Suggested Resources Bar graphs

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Statistical Analysis Using Gnumeric

Statistical Analysis Using Gnumeric There are many software packages that will analyse data. For casual analysis, a spreadsheet may be an appropriate tool. Popular spreadsheets include Microsoft Excel,

### Descriptive Statistics Lesson 3 Measuring Variation 33

Descriptive Statistics Lesson 3 Measuring Variation 33 3.1- What is Data Variation? Data variation is a numeric value which measures the spread of data from the mean. For example, the two sets of numbers

### Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

### San Jose State University Engineering 10 1

KY San Jose State University Engineering 10 1 Select Insert from the main menu Plotting in Excel Select All Chart Types San Jose State University Engineering 10 2 Definition: A chart that consists of multiple

### Topic 5 Review [81 marks]

Topic 5 Review [81 marks] A four-sided die has three blue faces and one red face. The die is rolled. Let B be the event a blue face lands down, and R be the event a red face lands down. 1a. Write down

### Statistics GCSE Higher Revision Sheet

Statistics GCSE Higher Revision Sheet This document attempts to sum up the contents of the Higher Tier Statistics GCSE. There is one exam, two hours long. A calculator is allowed. It is worth 75% of the

### CH.6 Random Sampling and Descriptive Statistics

CH.6 Random Sampling and Descriptive Statistics Population vs Sample Random sampling Numerical summaries : sample mean, sample variance, sample range Stem-and-Leaf Diagrams Median, quartiles, percentiles,

### Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.

### Statistical Concepts and Market Return

Statistical Concepts and Market Return 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 2 2. Some Fundamental Concepts... 2 3. Summarizing Data Using Frequency Distributions...

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### Descriptive Statistics Practice Problems (Total 6 marks) (Total 8 marks) (Total 8 marks) (Total 8 marks) (1)

Descriptive Statistics Practice Problems 1. The age in months at which a child first starts to walk is observed for a random group of children from a town in Brazil. The results are 14.3, 11.6, 12.2, 14.,

### Chapter 2 Summarizing and Graphing Data

Chapter 2 Summarizing and Graphing Data 2-1 Review and Preview 2-2 Frequency Distributions 2-3 Histograms 2-4 Graphs that Enlighten and Graphs that Deceive Preview Characteristics of Data 1. Center: A

### Box plots & t-tests. Example

Box plots & t-tests Box Plots Box plots are a graphical representation of your sample (easy to visualize descriptive statistics); they are also known as box-and-whisker diagrams. Any data that you can

### Sta 309 (Statistics And Probability for Engineers)

Instructor: Prof. Mike Nasab Sta 309 (Statistics And Probability for Engineers) Chapter 2 Organizing and Summarizing Data Raw Data: When data are collected in original form, they are called raw data. The

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand