1 Descriptive statistics: mode, mean and median
|
|
|
- Philip Barrett
- 9 years ago
- Views:
Transcription
1 1 Descriptive statistics: mode, mean and median Statistics and Linguistic Applications Hale February 5, 2008 It s hard to understand data if you have to look at it all. Descriptive statistics are things you can calculate from data which summarize, in a shorter more manageable way, some salient qualities. Typically we want to get an idea of the overall shape or trend of the measurements. Modal value What was the commonest score on the English proficiency test? > ep <- read.table(file = "english.proficiency", header = T) > table(ep$proficiency) Looks like the scores 149, 156, 165, 168, 173, 180, 181, 205 were all pretty common. There is more than one mode in this data. (arithmetic) mean Your old friend the average. n i=1 ȳ = (y 1 + y 2 + y y n ) /n = y i n The Greek uppercase letter sigma ( ) means add up each measurement y; often the boundaries i = 1 and n are omitted since there are no special restrictions just add up everything and divide by the number of observations. Could you write this averaging function in R? Sure, taking advantage of sum being vectorized. It knows how long the vector of measurements is. In the following function x would be called a formal parameter of the homemade arithmetic.mean function. > arithmetic.mean <- function(x) { + sum(x)/length(x) + } > arithmetic.mean(c(3, 4, 6, 7)) [1] 5 > arithmetic.mean(ep$proficiency) [1] Median Remember the empirical CDF? Dalgaard explains the empirical CDF by saying the fraction of data smaller than or equal to x. That is, if x is the k th smallest observation, then the proportion k/n of the data smaller than or equal to x 1
2 Even though we can use the empirical CDF to talk about any such proportion, the median was the name we gave to the proportion k/n = 1 2. > n <- length(ep$proficiency) > plot(sort(ep$proficiency), (1:n)/n, main = "Empirical CDF of English Proficiency data") > abline(v = 174) Empirical CDF of English Proficiency data (1:n)/n sort(ep$proficiency) Half the data is smaller than or equal to the median score of 174. It is the midpoint of the sorted list of data values. Could you write a median function in R? Sure, let the machine do the ranking for you. > length(ep$proficiency) [1] 93 > 93/2 [1] 46.5 > ceiling(46.5) [1] 47 > sort(ep$proficiency)[47] [1] 174 We have 93 observations, so the middle one would be the 47 th ; the smallest integer greater than a number is its ceiling This exemplifies the case where we have an odd number of measurements. When there is an even number of measurements, the median is defined to be the average of the two values on either side of the middle. 2
3 > ep.even <- ep$proficiency[-1] > length(ep.even) [1] 92 > (sort(ep.even)[46] + sort(ep.even)[47])/2 [1] Our homemade median function (cribbed from Michael Crawley s excellent Statistics: an introduction using R) can then use modular division (%%) to check which method to use 1. > med <- function(x) { + odd.even <- length(x)%%2 + if (odd.even == 0) + (sort(x)[length(x)/2] + sort(x)[1 + length(x)/2])/2 + else sort(x)[ceiling(length(x)/2)] + } Of course you don t have to write these functions yourself, there are built-in R functions named mean and median (the mode functions pertain instead to the runtime system of R, not the statistical notion). Robustness What if some really cocky hot-shot student came in and screwed it all up for everybody else by scoring 9000 on the English proficiency test? > median(append(ep$proficiency, 9000)) [1] > mean(append(ep$proficiency, 9000)) [1] The mean is radically thrown off by such an outlier, whereas the median is not. When a set of data is skewed these various descriptions of central tendency fail to coincide. Figure 1 (overleaf) is an idealized exemple of such a skewed data set. 2 Dispersion We care about how widely dispersed data are because of its impact on our degree of certainty about where they might be coming from. 2.1 Ranges Both of these histograms come from made-up datasets with 1000 observations with mean about Modular division of x by y yields the remainder of x. In the case where y = 2 the remainder will either be 0, which implies y x was even, or 1, which implies y was odd. In general if x%%y is zero, then x is a multiple of y. 3
4 mode median f (x) mean x Figure 1: Idealized data which is positively skewed widely dispersed scores narrowly centered scores Frequency Frequency clipped narrowclipped IQR=64-27=37 IQR=47-43=4 The narrowly-dispersed data (on right) has 50% of its observations clustered between 43 and 47 that s really tight! In the more widely dispersed data (left), such a 50% central interval lies between 27 and 64; much looser. The 50% central interval is the range that includes the median and half of the data points. We can define 4
5 it in terms of quartiles (from the empirical CDF). 25% quartile (Q 1 ) the median of the observations below the grand median 75% quartile (Q 3 ) the median of the observations above the grand median. The Interquartile range Q 3 Q 1 is the width of that central interval. The summary function confirms the visual intuition that the data histogrammed on the left are more widely dispersed than the ones on the right. > summary(narrowclipped) Min. 1st Qu. Median Mean 3rd Qu. Max > summary(clipped) Min. 1st Qu. Median Mean 3rd Qu. Max Variance Does some data set have more or less dispersion, scatter, variability, etc than another? Returning to our wordcount example from last class, > np <- read.table(file = "../jan15/brown-np-lengths", header = T) > vp <- read.table(file = "../jan15/brown-vp-lengths", header = T) > mean(np$wordcount) [1] > mean(vp$wordcount) [1] we know that that average VP is longer than the average NP. Lets relativize the discussion to take that into account, looking at the residuals, y np ȳ np, the differences between particular NP wordcounts and the average NP wordcount. > head(np$wordcount - mean(np$wordcount), n = 20) [1] [7] [13] [19] Difficulty: negative residuals A negative residual comes about when an NP is shorter than average. Summing up these residuals adds up to...nothing! > sum(np$wordcount - mean(np$wordcount)) [1] e-12 The problem is that positive and negative residuals are cancelling out. One solution is to square each residual, to ensure that account is taken of variability in both directions 2. 2 Remember that multiplying negative numbers by themselves yields a positive number, i.e. ( 3) 2 = ( 3) ( 3) = 9. 5
6 2.2.2 Difficulty: number of measurements affects it The last class meeting revealed the cold hard truth that we have observations of NPs but only observations of VPs. The relative dispersion could be the same even though there are radically more attestations in one than the other. Our concept of dispersion should be relativized to the number of observations we actually have. However, if you are going to put five numbers into a box, then you get to make up all five numbers. If instead I tell you what the average of those five must be, then you really only have the freedom to choose four of the numbers the last is completely predictable. This is like using the mean ȳ in calculating a measure of dispersion; data could be very scattered, but not scattered in way such that the residuals don t add up to zero. Thus, the variance has only n 1 degrees of freedom because that last residual must be compatible with the mean. variance = sum of squared residuals degrees of freedom s 2 = (y 1 ȳ) 2 + (y 2 ȳ) (y n ȳ) 2 n 1 Equation 1 gives the definition of variance. = 1 n 1 n (y i ȳ) 2 (1) i=1 2.3 Standard deviation and Z-scores The standard deviation (equation 2) is just the square root s of the variance s 2. s = 1 n (y i ȳ) n 1 2 (2) The R function is called sd. The standard deviation is like a ruler for judging whether a particular data point is really whacko for this sample (or not). > (10 - mean(np$wordcount))/sd(np$wordcount) [1] An NP that is 10 words long is longer than your average 3-word NP. It is 1.7 standard deviations above the mean for NPs. Being ten words long is no great shakes if you are a VP in the Brown corpus, there, 10 words long is is only a quarter of a standard deviation above the mean. > (10 - mean(vp$wordcount))/sd(vp$wordcount) [1] Analogous to being in school and being put in the gifted program if you are more than 2 SDs above the average on some test. It doesn t mean you re ready for college, just that you re probably bored with the work other children your age are doing. (This ensures that the school doesn t have to spend too many resources on special kids since by definition, most of the data will fall below that mark.) The Z-score facilates the comparison of different groups having different variances and means by relativizing observations by those quantities. i=1 Z y = y ȳ s the resulting set of scores always has a mean of zero and a standard deviation of 1. (3) 6
7 3 Now you try ˆ Implement your own variance function in R. ˆ An major midwestern institution assigns beginning students to Chinese classes on the basis of their language aptitude test scores. Some students will have taken Aptitude Test A, which has a mean of 120 and a standard deviation of 12; the remainder will have taken Test B, which has a mean of 100 and a standard deviation of 15. (The means and std devs for both tests were obtained from large US Government studies). student Test A Test B P 132 Q 124 R 122 S 81 T 75 U Calculate the standardized score for each of the students listed in the table (above) and rank the students according to their apparent ability, putting the best first. 2. The institution groups students into classes C,D,E or F according to their score on Test A as follows: those scoring at least 140 are assigned to class C; those with at least 120 but less than 140 are Class D; those with at least 105 but less than 120 are Class E. The remainder are assigned to Class F. In which classes would you place students R,T and U? 7
Descriptive Statistics
Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web
Lesson 4 Measures of Central Tendency
Outline Measures of a distribution s shape -modality and skewness -the normal distribution Measures of central tendency -mean, median, and mode Skewness and Central Tendency Lesson 4 Measures of Central
DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY
Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to
Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
CALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion
Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research
Means, standard deviations and. and standard errors
CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression
Measures of Central Tendency and Variability: Summarizing your Data for Others
Measures of Central Tendency and Variability: Summarizing your Data for Others 1 I. Measures of Central Tendency: -Allow us to summarize an entire data set with a single value (the midpoint). 1. Mode :
The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median
CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box
Module 4: Data Exploration
Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive
STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences
Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html
Descriptive Statistics and Measurement Scales
Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample
2. Filling Data Gaps, Data validation & Descriptive Statistics
2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)
consider the number of math classes taken by math 150 students. how can we represent the results in one number?
ch 3: numerically summarizing data - center, spread, shape 3.1 measure of central tendency or, give me one number that represents all the data consider the number of math classes taken by math 150 students.
Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs
Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)
Descriptive statistics parameters: Measures of centrality
Descriptive statistics parameters: Measures of centrality Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between
Descriptive Statistics
Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9
Normal distribution. ) 2 /2σ. 2π σ
Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a
Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.
Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of
Exploratory Data Analysis
Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
Geostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras [email protected]
Statistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)
Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center
Lecture 1: Review and Exploratory Data Analysis (EDA)
Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel [email protected] Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course
Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
Introduction; Descriptive & Univariate Statistics
Introduction; Descriptive & Univariate Statistics I. KEY COCEPTS A. Population. Definitions:. The entire set of members in a group. EXAMPLES: All U.S. citizens; all otre Dame Students. 2. All values of
Lecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
Simple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
Topic 9 ~ Measures of Spread
AP Statistics Topic 9 ~ Measures of Spread Activity 9 : Baseball Lineups The table to the right contains data on the ages of the two teams involved in game of the 200 National League Division Series. Is
CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction
CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous
CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13
COMMON DESCRIPTIVE STATISTICS / 13 CHAPTER THREE COMMON DESCRIPTIVE STATISTICS The analysis of data begins with descriptive statistics such as the mean, median, mode, range, standard deviation, variance,
Descriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
Frequency Distributions
Descriptive Statistics Dr. Tom Pierce Department of Psychology Radford University Descriptive statistics comprise a collection of techniques for better understanding what the people in a group look like
Z - Scores. Why is this Important?
Z - Scores Why is this Important? How do you compare apples and oranges? Are you as good a student of French as you are in Physics? How many people did better than you on a test? How many did worse? Are
4. Continuous Random Variables, the Pareto and Normal Distributions
4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random
Northumberland Knowledge
Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about
Bar Graphs and Dot Plots
CONDENSED L E S S O N 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs
So let us begin our quest to find the holy grail of real analysis.
1 Section 5.2 The Complete Ordered Field: Purpose of Section We present an axiomatic description of the real numbers as a complete ordered field. The axioms which describe the arithmetic of the real numbers
BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
MBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
2 Describing, Exploring, and
2 Describing, Exploring, and Comparing Data This chapter introduces the graphical plotting and summary statistics capabilities of the TI- 83 Plus. First row keys like \ R (67$73/276 are used to obtain
Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.
Chapter 3. The Normal Distribution
Chapter 3. The Normal Distribution Topics covered in this chapter: Z-scores Normal Probabilities Normal Percentiles Z-scores Example 3.6: The standard normal table The Problem: What proportion of observations
Measurement with Ratios
Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve real-world and mathematical
Chapter 2: Descriptive Statistics
Chapter 2: Descriptive Statistics **This chapter corresponds to chapters 2 ( Means to an End ) and 3 ( Vive la Difference ) of your book. What it is: Descriptive statistics are values that describe the
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce
3.2 Measures of Spread
3.2 Measures of Spread In some data sets the observations are close together, while in others they are more spread out. In addition to measures of the center, it's often important to measure the spread
6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
Skewed Data and Non-parametric Methods
0 2 4 6 8 10 12 14 Skewed Data and Non-parametric Methods Comparing two groups: t-test assumes data are: 1. Normally distributed, and 2. both samples have the same SD (i.e. one sample is simply shifted
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
AP Statistics Solutions to Packet 2
AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that
Final Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
Module 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
Univariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
Exploratory Data Analysis. Psychology 3256
Exploratory Data Analysis Psychology 3256 1 Introduction If you are going to find out anything about a data set you must first understand the data Basically getting a feel for you numbers Easier to find
Interpreting Data in Normal Distributions
Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,
8 Primes and Modular Arithmetic
8 Primes and Modular Arithmetic 8.1 Primes and Factors Over two millennia ago already, people all over the world were considering the properties of numbers. One of the simplest concepts is prime numbers.
Lecture Notes Module 1
Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific
This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.
Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course
Introduction to Hypothesis Testing
I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true
Week 4: Standard Error and Confidence Intervals
Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.
Foundation of Quantitative Data Analysis
Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1
Summarizing and Displaying Categorical Data
Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
How To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
DATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
Descriptive Statistics: Summary Statistics
Tutorial for the integration of the software R with introductory statistics Copyright Grethe Hystad Chapter 2 Descriptive Statistics: Summary Statistics In this chapter we will discuss the following topics:
Variables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
Lecture 5 : The Poisson Distribution
Lecture 5 : The Poisson Distribution Jonathan Marchini November 10, 2008 1 Introduction Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume,
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.
The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution
3: Summary Statistics
3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes
" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
WEEK #22: PDFs and CDFs, Measures of Center and Spread
WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!
STP 231 EXAM #1 (Example) Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.
Adding and Subtracting Positive and Negative Numbers
Adding and Subtracting Positive and Negative Numbers Absolute Value For any real number, the distance from zero on the number line is the absolute value of the number. The absolute value of any real number
Mean, Median, Standard Deviation Prof. McGahagan Stat 1040
Mean, Median, Standard Deviation Prof. McGahagan Stat 1040 Mean = arithmetic average, add all the values and divide by the number of values. Median = 50 th percentile; sort the data and choose the middle
Week 3&4: Z tables and the Sampling Distribution of X
Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence
99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm
Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the
2 Sample t-test (unequal sample sizes and unequal variances)
Variations of the t-test: Sample tail Sample t-test (unequal sample sizes and unequal variances) Like the last example, below we have ceramic sherd thickness measurements (in cm) of two samples representing
Zeros of a Polynomial Function
Zeros of a Polynomial Function An important consequence of the Factor Theorem is that finding the zeros of a polynomial is really the same thing as factoring it into linear factors. In this section we
1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers
1.3 Measuring Center & Spread, The Five Number Summary & Boxplots Describing Quantitative Data with Numbers 1.3 I can n Calculate and interpret measures of center (mean, median) in context. n Calculate
STAT355 - Probability & Statistics
STAT355 - Probability & Statistics Instructor: Kofi Placid Adragni Fall 2011 Chap 1 - Overview and Descriptive Statistics 1.1 Populations, Samples, and Processes 1.2 Pictorial and Tabular Methods in Descriptive
Chapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.
Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
AP * Statistics Review. Descriptive Statistics
AP * Statistics Review Descriptive Statistics Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production
Point and Interval Estimates
Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number
