Introduction to Statistics


1 Introduction to Statistics Class Overheads for HSS 2381 Section C Measurement and Data Analysis by D. Gordon E. Robertson, PhD, FCSB School of Human Kinetics University of Ottawa Copyright D.G.E. Robertson, PhD, FCSB September 2007

2 Introduction to Statistics
Parameter: measurable characteristic of a population
Population: all members of a definable group. For statistical purposes a population must have definable characteristics even if it is not possible to measure the variable or even count the number of members of the population.
Sample: subset or subgroup of a population, usually obtained by random sampling of a single population
Statistic: measurable characteristic of a sample
 E.g., height, weight, political affiliation, ethnicity, aerobic capacity, strength, power, ...
Data or data set: collection of numerical and/or nonnumerical values (data is the plural of datum)
Datum: single measured value (singular of data)

3 Statistics
Statistics: 1. plural of statistic, 2. science of conducting studies to collect, organize, summarize, analyze and draw conclusions from data
Descriptive statistics: collection, description, organization, presentation and analysis of data
Inferential statistics: generalizing from samples to populations, testing of hypotheses, determining relationships among variables and making decisions; uses probability theory to make decisions
Hypothesis: less than a thesis, a testable conjecture based on a theory
Thesis: a dissertation or learned argument which defends a particular proposition or theory
Qualitative measurements: typically non-numerical, subjectively measured, judgmentally determined, categorical
 E.g., religious affiliation, teacher/professor evaluations, emotional states, flavour, gender
Quantitative measurements: typically numerical, objectively measured; reliability (repeatability or precision) and validity (accuracy) can be evaluated against a criterion
 E.g., salary, course grade, foot size, IQ, age, girth

4 Quantitative Measures
Types of quantitative measures:
Constants: quantities with fixed characteristics
 Physical constants: G, c, h (Planck's constant)
 Mathematical constants: π, e, i
Variables: quantities whose characteristics vary
 Discrete variables: numerical variables that have finitely or countably many possible values (usually integers)
  Examples: value of $ bills or coins, card count
 Continuous variables: numerical variables that have infinitely many possible values, either within a range of values (numbers between -1 and +1) or unbounded (real numbers, numbers greater than 0)
  Examples: height, duration, angle (only a fixed number of significant figures are reported)
Significant Figures: When reporting numerical information, especially when obtained by a calculator, usually only 3 or 4 digits are required. The general rule, which is accurate to 0.5%, holds that 4 significant figures are needed if the first nonzero digit is a 1 and 3 when it is not.
 Examples: 1.234, 2.45, 8910
 Exceptions are frequencies and counts, for which all digits are reported, and financial numbers, which are reported to the nearest dollar or nearest cent depending on the amount.

5 Measurement Scales
Nominal: classifies data into mutually exclusive (nonoverlapping), exhaustive categories in which no ordering or ranking of the categories is implied
 E.g., colour, flavour, religion, gender, sex, nationality, county of residence, postal code
Ordinal: classifies data into categories that can be ordered or ranked (highest to lowest or vice versa); precise differences between categories do not exist
 E.g., teaching evaluations, letter grade (A-, A, A+...F), judges' scores (1-10), preferences (polls), skill rankings
Interval: numerical data with precise differences between categories but with no true zero (i.e., zero does not imply absence of the quantity)
 E.g., IQ (0 means could not be measured), temperature (degrees Celsius), z-scores (0 is average value), acidity (pH, 7 is neutral)
Ratio: interval data with a true zero; true ratios exist
 E.g., height, weight, temperature (in kelvins), strength, price, age, duration

6 Methods of Sampling
Random: subjects are randomly selected from a population; all subjects have equal probability of being selected; subjects may not be selected twice
Systematic: subjects are numbered sequentially and every nth subject is selected to obtain a sample of N/n subjects (N is number of people in population)
Stratified: population is divided into identifiable groups (strata) by some relevant variable (income, gender, age, education) and each stratum is sampled randomly in proportion to the stratum's relative size
Cluster: subjects are randomly sampled from representative clusters or regions of the population. Economical method if subjects are widely dispersed geographically.
Convenience: typically used in student projects and by journalists; uses subjects that can be conveniently polled or tested. Not suitable for pollsters or medical research.

7 Graphing
Types of graphs:
Pictogram: numeric data are represented by pictures; usually only nominal data are depicted in this way
Example: milk production doubles (increases to 200% of its original value)
[Figure: before and after pictograms of a cow]
Biassed way: height of cow is doubled, but two-dimensionally the cow is four times bigger and three-dimensionally it is eight times bigger
Unbiassed way: increase is correctly depicted as two times greater

8 Graphing cont'd
Pie chart: used with nominal or frequency data
Example: number of students by province and country
 - a segment can be emphasized by separating it
 - two-dimensional pie cannot create a biassed view
 - three-dimensional pies can bias a slice depending on its position
   - a slice in front appears larger
   - put a slice in the back to reduce its size
   - separating it creates emphasis

9 Graphing cont'd
Bar graph: used for nominal data
 - values, usually frequency counts, are depicted by bars proportional to their magnitudes
 - bars are separated
 - extreme length bars can be split
Histogram: used for ordinal data
 - bars are adjacent, no gaps
 - one axis is ordered; first or last bars may include extremes

10 Graphing cont'd
Line graph: used with interval and ratio data
 - scaling can create a bias
 - use large scales to hide changes
 - truncated axis reduces white space
 - scaling to minimum and maximum emphasizes changes

11 Graphing cont'd
Ogive or cumulative frequency: (pronounced oh-jive)
 - line starts at zero and accumulates to 100%
 - useful for determining percentages (by interpolation)

12 Rules for Constructing a Frequency Histogram
1. There should be between 5 and 20 classes.
   - this is strictly for aesthetic purposes
2. The class width should be an odd number.
   - this ensures that the midpoint has the same number of decimal places as the original data
3. The classes must be mutually exclusive.
   - each datum must fall into one class and one class only
4. The classes must be continuous.
   - there should be no gaps in the number line even if a class has no members
5. The classes must be exhaustive.
   - all possible data must fit into one of the classes
6. The classes must have equal width.
   - if not there will be a bias among the classes
   - you can have open-ended classes at the ends (e.g., for ages you may use 10 and under or 65 and over, etc.)

13 Types of Frequency Distributions
Categorical - for nominal types of data
Ungrouped - for numerical data with few scores
Grouped - for numerical data with many scores
Example: Distribution of the number of hours that boat batteries lasted.

Class    Class    Tally       Frequency   Cumulative    Cumulative
limits   bounds                           frequency     percentage
...      ...      ///         3           3             3/25*100 = 12%
...      ...      /           1           1+3 = 4       4/25*100 = 16%
...      ...      ////        5           5+4 = 9       9/25*100 = 36%
...      ...      //// ////   9           9+9 = 18      72%
...      ...      //// /      6           6+18 = 24     96%
...      ...      /           1           1+24 = 25     100%
Total                         25                        100%

Use the frequency column for the frequency polygon.
Use the cumulative percentages for constructing the cumulative frequency polygon, also called an ogive.

14 Frequency Polygon and Ogive
[Figures: frequency polygon; cumulative frequency polygon (ogive)]

15 Measures of Central Tendency
Mode: most frequent score
 - best average for nominal data
 - sometimes none or multiple modes in a sample
 - bimodal or multimodal distributions indicate several groups included in sample
 - easy to determine
Midrange: mean of highest and lowest scores
 - easy to compute, rough estimate, rarely used
Median: value that divides distribution in half
 - best average for ordinal data
 - most appropriate average for skewed ratio or interval data or data on salaries
 - difficult to compute because data must be sorted
 - unaffected by extreme data
Arithmetic mean: centre of balance of data
 - sum of numbers divided by n
 - best average for unskewed ratio or interval data
 - easy to compute
 - sample mean: $\bar{X} = \sum X / n$
 - population mean: $\mu = \sum X / N$
Other measures: harmonic mean, geometric mean and quadratic mean, also called root mean square (RMS)
 - $RMS = \sqrt{\sum X^2 / n}$

16 Skewed Data
 - direction of skew is the direction of the tail
 - positive direction of a number line is to the right; left is the negative direction
 - mean, mode and median (MD) are the same for symmetrical distributions
 - notice mean is closest to the tail (i.e., more influenced by extreme values)

17 Measures of Variation
Range: highest minus lowest values
 - used for ordinal data
 - $R = \text{highest} - \text{lowest}$
Interquartile range: 75th minus 25th percentile
 - used for determining outliers
 - $IQR = Q_3 - Q_1$
Variance: mean of squared differences between scores and the mean
 - used on ratio or interval data
 - used for advanced statistical analysis (ANOVAs)
Standard deviation: has same units as raw data
 - used on ratio or interval data
 - most commonly used measure of variation
Coefficient of variation: standard deviation expressed as a percentage of the mean
 - used to compare variability among data with different units of measure

18 Biased and Unbiased Estimators
 - sample mean is an unbiased estimate of the population mean
 - variances and standard deviations computed by dividing by n are biased estimators because the sample mean is used in their computation
 - Why? The last score can be determined from the mean and all other scores; therefore, it is not free to vary or add to variability.
 - To compensate, divide the sum of squares by n - 1 instead of n.
 - Instead of using the standard formula, a computing formula is used so that running totals of scores and scores squared may be used to compute variability.
Computing Formulae
 Variance: $s^2 = \frac{\sum X^2 - (\sum X)^2 / n}{n - 1}$ (sample variance)
 Standard deviation: $s = \sqrt{\frac{\sum X^2 - (\sum X)^2 / n}{n - 1}}$ (sample standard deviation)
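The computing formula can be sketched in Python as follows. This is a minimal illustration, not the course's own code: the function names and the `scores` list are hypothetical, and the result is compared with the standard library's `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

def sample_variance(scores):
    """Computing formula: running totals of X and X squared, divided by n - 1."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def sample_sd(scores):
    """Sample standard deviation is the square root of the sample variance."""
    return math.sqrt(sample_variance(scores))

scores = [2, 4, 6, 8]  # hypothetical data, for illustration only
variance = sample_variance(scores)
sd = sample_sd(scores)

# The computing formula agrees with the definitional formula used by the library.
assert math.isclose(variance, statistics.variance(scores))
```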

19 Measures of Position
Percentile: score which exceeds a specified percentage of the population
 - suitable for ordinal, ratio or interval data
 - median (MD or $Q_2$) is the 50th percentile
 - first and third quartiles ($Q_1$ and $Q_3$) are the 25th and 75th percentiles
 - easier for non-statisticians to understand than z-scores
 - scores are all positive numbers
Standard or z-scores: based on mean and standard deviation and the normal distribution
 - suitable for ratio and interval numbers
 - approximately 68% of scores are within 1 standard deviation of the mean, approximately 95% are within 2 standard deviations and approximately 99.7% are within 3 standard deviations
 - half the scores are negative numbers
 - mean score is zero
 - excellent way of comparing measures or scores which have different units (e.g., heights vs. weights, metric vs. Imperial units, psychological vs. physiological measures)

20 Measures of Position and Outliers
Other measures of position:
Deciles: 10th, 20th, ... percentiles ($D_1$, $D_2$, ... $D_{10}$)
 - often used in education or demographic studies
Quartiles: 25th, 50th and 75th percentiles ($Q_1$, $Q_2$, $Q_3$)
 - frequently used for exploratory statistics and to determine outliers ($Q_2$ is the same as the median)
Outliers: extreme values that adversely affect statistical measures of central tendency and variation
Method of determining outliers:
 - compute interquartile range (IQR)
 - multiply IQR by 1.5
 - lower bound is $Q_1$ minus 1.5 × IQR
 - upper bound is $Q_3$ plus 1.5 × IQR
 - values outside these bounds are outliers and may be removed from the data set
 - it is assumed that outliers are the result of errors in measurement or recording or were taken from an unrepresentative individual
Alternate method for normally distributed data: values beyond ±4 or 5 standard deviations
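The 1.5 × IQR method can be sketched as below. This is an illustrative sketch only: the function names and the `scores` data are hypothetical, and the quartiles come from Python's `statistics.quantiles`, whose default interpolation method may give slightly different quartiles than a textbook hand method.

```python
from statistics import quantiles

def iqr_bounds(data):
    """Return (lower, upper) outlier bounds: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _q2, q3 = quantiles(data, n=4)  # Q2 is the median and is not needed here
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """Values outside the bounds are flagged as outliers."""
    lower, upper = iqr_bounds(data)
    return [x for x in data if x < lower or x > upper]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical data with one extreme value
outliers = find_outliers(scores)
print(outliers)
```

Whether a flagged value should actually be removed remains a judgment about measurement or recording error, as the slide notes.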

21 Counting Techniques
Multiplication Rule 1: In a sequence of n events with each event having k possibilities, the total number of outcomes is:
 $k^n = k \cdot k \cdot k \cdots k$
Examples:
 How many 6-digit student ID numbers = $10^6$ = 1,000,000
 How many ways of throwing 5 dice (Yahtzee) = $6^5$ = 7776
 How many ways of selecting 3 letters = $26^3$ = 17,576
Multiplication Rule 2: In a sequence of n events in which there are $k_1$ possibilities for the first event, $k_2$ possibilities for the second event and $k_3$ for the third, etc., the total number of possible outcomes is:
 $k_1 \cdot k_2 \cdot k_3 \cdots k_n$
Examples:
 How many 3-digit phone exchanges before ... = 160 (minus 911, 411, 511)
 How many 3-digit phone exchanges after ... = 800 (minus 911, 411, 511)
 How many 7-character Ontario licence plates (4 letters, 3 numbers) = $26^4 \cdot 10^3$ = 456,976,000
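The two multiplication rules reduce to a power and a product, which a few lines of Python can confirm. The variable names are illustrative, and the counts assume 10 possible digits, 6 faces per die and 26 letters, with repetition allowed.

```python
import math

# Rule 1: n events, each with k possibilities -> k ** n outcomes
student_ids = 10 ** 6      # 6 digits, 10 possibilities each
dice_throws = 6 ** 5       # 5 dice, 6 faces each
three_letters = 26 ** 3    # 3 letters, 26 possibilities each

# Rule 2: multiply the number of possibilities for each event in turn
licence_plates = math.prod([26] * 4 + [10] * 3)  # 4 letters then 3 digits

print(student_ids, dice_throws, three_letters, licence_plates)
```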

22 Factorial Notation
Factorials: Factorial numbers are identified with an exclamation mark (n!). They are defined:
 $n! = n (n-1)(n-2) \cdots 1$
 0! is defined to be 1
Examples:
 1! = 1
 2! = 2 × 1 = 2
 5! = 5 × 4 × 3 × 2 × 1 = 120

23 Permutations and Combinations
Permutations:
Rule 1: How many ways are there of arranging ALL n unique items if replacement is NOT allowed?
 $n! = n (n-1)(n-2) \cdots 1$
Examples:
 How many ways of arranging 6 items in a display. n = 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
 How many ways of ordering three experiments. n = 3! = 6
Rule 2: How many ways are there of selecting r items from n unique possibilities if replacement is NOT allowed?
 $_nP_r = n! / (n-r)! = n (n-1)(n-2) \cdots (n-r+1)$
Examples:
 How many ways of selecting 2 items from group of 6. $_6P_2 = 6!/4! = 30$
 How many ways of selecting committee of four from a staff of 20 if order of selection is significant. $_{20}P_4 = 20!/16! = 116{,}280$

24 Permutations and Combinations cont'd
Rule 3: How many ways are there of arranging n items if replacement is NOT allowed but $k_1$, $k_2$, ... $k_n$ items are identical?
 $n! / (k_1! \, k_2! \cdots k_n!)$
Examples:
 How many words from the letters in Ottawa. n = 6! / (2! 2!) = 180
 How many words from the letters in Toronto. n = 7! / (2! 3!) = 420
Combinations: How many ways are there of selecting r items from n unique possibilities if replacement is NOT allowed and order is not important?
 $_nC_r = n! / [(n-r)! \, r!] = {}_nP_r / r!$
Examples:
 How many ways are there of selecting 2 items from a group of ...
 How many ways of selecting committee of four from a staff of 20 if order of selection is unimportant. $_{20}C_4 = 20!/(16! \, 4!) = 4845$
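The permutation and combination rules map directly onto `math.factorial`, `math.perm` and `math.comb` in Python's standard library. The sketch below reworks the slide examples; the helper `arrangements_with_repeats` is a hypothetical name introduced here for Rule 3.

```python
import math
from collections import Counter

def arrangements_with_repeats(word):
    """Rule 3: n! divided by k! for every group of identical letters."""
    total = math.factorial(len(word))
    for k in Counter(word.lower()).values():
        total //= math.factorial(k)
    return total

display = math.factorial(6)                   # Rule 1: arrange all 6 items
ordered_pair = math.perm(6, 2)                # Rule 2: 2 items from 6, order matters
ordered_committee = math.perm(20, 4)          # committee of 4 from 20, order matters
ottawa = arrangements_with_repeats("Ottawa")  # two t's and two a's
toronto = arrangements_with_repeats("Toronto")  # two t's and three o's
committee = math.comb(20, 4)                  # combinations: order unimportant

print(display, ordered_pair, ordered_committee, ottawa, toronto, committee)
```

Note that the combination count equals the permutation count divided by r!, here 116,280 / 24.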

25 Probability
Basic Concepts:
Probability experiment: process that leads to well-defined results, called outcomes
Outcome: result of a single trial of a probability experiment (a datum)
Sample space: all the possible outcomes of a probability experiment, i.e., the population of outcomes
Event: consists of one or more outcomes of a probability experiment, i.e., a sample of outcomes
I. Classical Probability: ratio of number of ways for an event to occur over total number of outcomes in the sample space, i.e.,
 $P(E) = n(E) / n(S)$
 where E is the event, S is the sample space, P(E) is the probability of the event E occurring, n(E) is the number of possible outcomes of event E, n(S) is the number of outcomes in the sample space
To obtain a probability using classical methods the event and sample space must be countable. Often the rules for determining permutations and combinations are required.

26 Classical Probability
Examples:
Probability of rolling snake eyes (1, 1) with two dice.
 [Figure: sample space for rolling two dice (one white, one black), with the sevens and elevens marked; n(S) = 36 possible outcomes]
 Notice, there are four different ways of reporting a probability (proper fraction, ratio, decimal and percentage).
Probability of 7 or 11 from rolling two dice.
Probability of doubles from two dice.
Probability of drawing a queen from a card deck.
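These classical probabilities can be checked by enumerating the sample space and counting favourable outcomes. The following is a minimal sketch; the `probability` helper and variable names are illustrative, and the card example assumes a standard 52-card deck with 4 queens.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: 36 equally likely ordered pairs
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """Classical probability: n(E) / n(S), kept as an exact fraction."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_snake_eyes = probability(lambda o: o == (1, 1))
p_7_or_11 = probability(lambda o: sum(o) in (7, 11))
p_doubles = probability(lambda o: o[0] == o[1])
p_queen = Fraction(4, 52)  # 4 queens in a 52-card deck

for name, p in [("snake eyes", p_snake_eyes), ("7 or 11", p_7_or_11),
                ("doubles", p_doubles), ("queen", p_queen)]:
    # Report as proper fraction, decimal and percentage
    print(f"{name}: {p} = {float(p):.4f} = {float(p):.2%}")
```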

27 Classical Probability cont'd
Examples:
Probability of drawing a spade.
Probability of drawing a red card.
Probability of flipping heads in a coin toss.
Probability of flipping heads after 10 coin tosses of heads in a row.
 - coin cannot remember its history of outcomes
Probability of red on a double zero roulette wheel.
 - wheel has numbers 1 to 36, half are red and half are black, plus green zero and double zero (n = 38)
Probability of not getting red on a roulette wheel.

28 Rules of Probability
Rule 1: all probabilities range from 0 to 1 inclusively
 $0 \le P(E) \le 1$
Rule 2: probability of an event that can never occur is zero
 $P(E) = 0$
Rule 3: probability of an event that will always occur is one
 $P(E) = 1$
Rule 4: if P(E) is the probability that an event will occur, the probability that the event will not occur (also called the complement of the event) is:
 $P(\text{not } E) = 1 - P(E)$
 [Figure: Venn diagrams]
Rule 5: the sum of probabilities of all outcomes in a sample space is one
 $P(S) = \sum P(E) = 1$

29 Empirical Probability
II. Empirical Probability: obtained empirically by sampling a population and creating a representative frequency distribution
 - for a given frequency distribution, probability is the ratio of the frequency of an event class to the total number of data in the frequency distribution, i.e.,
 $P(E) = f / n$
Examples:
Probability of a girl baby.
Assume that a population has a blood type distribution of: 2% AB, 5% B, 23% A and 70% O
Probability of a person having type B or AB blood.
 P(B or AB) = 2% + 5% = 7.00% = 0.0700
Probability of strongly left-handed person.
 P(strongly left-handed) = 0.0500 = 5.00%
Probability of natural blue eyes.
 P(blue-eyed) = 0.0650 = 6.50%

30 Addition Rules
Addition Rule 1: if two events are mutually exclusive (i.e., no outcomes in common) then the probability of A or B occurring is:
 $P(A \text{ or } B) = P(A) + P(B)$
 [Figure: Venn diagrams of events that are mutually exclusive]
If three events are mutually exclusive:
 $P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C)$
Examples:
Probability of selecting a spade or a red card.
Probability of drawing a face card (king or queen or jack).
Probability of 7, 11 or doubles with 2 dice.

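31 Addition Rules
Addition Rule 2: if two events are not mutually exclusive, the probability of A or B occurring is:
 $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
 [Figure: Venn diagrams of events that are NOT mutually exclusive]
When three events are not mutually exclusive there is a region that is common to all three events. This area gets added three times and subtracted three times and therefore must be added back once. That is,
 $P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) - P(A \text{ and } B) - P(A \text{ and } C) - P(B \text{ and } C) + P(A \text{ and } B \text{ and } C)$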

32 Addition Rules cont'd
Examples:
Probability of selecting a spade or a face card.
 - 13 spades
 - 12 face cards (3 per suit)
Probability of selecting a female student or a third-year student from 103 students.
 - 53 are female
 - 70 in third year
 - 45 females in third year
Probability of selecting a male or person with type O blood from 100 people.
 - half males
 - 70% O-type blood
Probability of selecting a left-handed person or a Liberal from 1000 people.
 - 32% Liberals, 5% left-handed
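For the first two examples the overlap count is given directly, so Addition Rule 2 can be applied as a count. The sketch below uses a hypothetical helper `p_a_or_b`; the last two examples are left out because the slide gives no overlap count, so they would need an extra assumption (such as independence).

```python
from fractions import Fraction

def p_a_or_b(n_a, n_b, n_both, n_total):
    """Addition Rule 2 on counts: P(A) + P(B) - P(A and B)."""
    return Fraction(n_a, n_total) + Fraction(n_b, n_total) - Fraction(n_both, n_total)

# 13 spades, 12 face cards, 3 face cards that are spades, 52 cards
spade_or_face = p_a_or_b(13, 12, 3, 52)

# 53 females, 70 third-year students, 45 third-year females, 103 students
female_or_third_year = p_a_or_b(53, 70, 45, 103)

print(spade_or_face, float(spade_or_face))
print(female_or_third_year, float(female_or_third_year))
```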

33 Independence
Definition: two events are independent if the occurrence of one event has no effect on the occurrence of the other event. In probability experiments where there is no memory from one event to the next, the events are called independent.
Examples of Independent Events:
Coin tosses. Even when 10 heads are flipped in a row the next coin toss still has a 50:50 chance of being a head.
Roulette wheel spins. Each spin of the wheel is theoretically independent. Each number on the wheel has equal probability of occurring at each spin.
Rolling dice repeatedly. The dice cannot remember what they rolled from one toss to another.
Drawing cards with replacement. With replacement means that after a card is drawn it is put back in the deck; thus all cards are equally likely to be drawn each time.
Examples of Dependent Events:
Drawing cards without replacement. Once a card is drawn it cannot be drawn a second time. This changes the characteristics of the remaining deck of cards.
Bingo numbers. Once a ball is drawn it is not replaced.
Lottery 6/49. All numbers (1 to 49) are equally likely to be chosen but can only be chosen once.

34 Multiplication Rules
Multiplication Rule 1: if two events are independent (i.e., have NO influence on each other's probability) then the probability of A and B occurring is:
 $P(A \text{ and } B) = P(A) \cdot P(B)$
 [Figure: Venn diagram]
Examples: Coin and dice tossing, lotteries, slot machines, roulette wheels, etc.; any game or experiment where knowledge of an outcome is not remembered by the next game or experiment.
Probability of tossing heads twice.
Probability of rolling seven twice with two dice.
Probability of having nine daughters in a row.

35 Multiplication Rules cont'd
Multiplication Rule 2: if two events are dependent then the probability of A and B occurring is:
 $P(A \text{ and } B) = P(A) \cdot P(B|A)$
 where P(B|A) means the probability of B occurring given that A occurs or occurred, also called the conditional probability
Examples: Card games where the cards are not replaced or selections where replacement is not allowed. Results from one experiment affect outcome of next.
Probability of drawing a two then a three.
Probability of drawing an ace then a face card.
Probability of drawing a pair.
Probability of drawing a pair of aces.

36 Probability Distributions
Definition: distribution of the values of a random variable and their probability of occurrence
Random variable: discrete or continuous variable whose values are determined by chance
Examples:
1. Probability distribution of a coin toss (approximately one half)
2. Probability distribution of a fair die toss (each 1/6th)
3. Probability distribution of polls (correct 19 times out of 20)

37 Mean, Variance and Expectation
Mean: of a probability distribution (weighted average)
 $\mu = \sum X_i \cdot P(X_i)$
 where $X_i$ is the ith outcome and $P(X_i)$ is its probability
Examples:
1. Mean number of heads for tossing two coins
2. Mean number of spots for tossing a single die
Notice that the answer does not have to be a possible outcome.
Variance and Standard Deviation:
 $\sigma^2 = \sum [X_i^2 \cdot P(X_i)] - \mu^2$ and $\sigma = \sqrt{\sigma^2}$
Expectation: the expectation or expected value of a probability distribution is equal to the mean
 - used for predicting the cost of playing games and lotteries
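The two examples can be worked with a small sketch that stores a probability distribution as a dictionary of outcome to probability. The function names `dist_mean` and `dist_variance` are illustrative, and the variance uses the usual shortcut of the mean of squares minus the squared mean.

```python
from fractions import Fraction

def dist_mean(dist):
    """Weighted average: sum of each outcome times its probability."""
    return sum(x * p for x, p in dist.items())

def dist_variance(dist):
    """Sum of X^2 * P(X) minus the squared mean."""
    mu = dist_mean(dist)
    return sum(x * x * p for x, p in dist.items()) - mu ** 2

# Number of heads in two fair coin tosses: 0, 1 or 2
two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
# Number of spots on a single fair die
die = {spots: Fraction(1, 6) for spots in range(1, 7)}

print(dist_mean(two_coins), dist_variance(two_coins))
print(dist_mean(die), dist_variance(die))
```

The die's mean of 7/2 (3.5 spots) illustrates that the mean need not be a value the variable can actually take.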

38 Expectation cont'd
Examples:
1. Compute the expectation of playing a lottery where 100 tickets are sold for $1 and the winning prize is worth $100.
 The expectation is $0; this is considered a fair game. If the prize was $50 the expectation would be -$0.50. Any negative value is a loser for the player; any positive value is a good game for the player.
2. Compute the profit or loss of playing a lottery where the cost of a ticket is $10, there are 1000 tickets sold and the prizes are: 1st place wins $1000, 2nd place wins $500 and five 3rd places win $100.
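Both lottery examples can be sketched with one function. It assumes every ticket is equally likely to win each prize and reads "five 3rd places win $100" as five prizes of $100 each; the name `expected_net` is hypothetical.

```python
def expected_net(ticket_price, tickets_sold, prizes):
    """Expected winnings per ticket minus the price paid for the ticket."""
    expected_winnings = sum(prizes) / tickets_sold
    return expected_winnings - ticket_price

fair_game = expected_net(1, 100, [100])    # prize equals total ticket sales
smaller_prize = expected_net(1, 100, [50])
big_lottery = expected_net(10, 1000, [1000, 500] + [100] * 5)

print(fair_game, smaller_prize, big_lottery)
```

Under these assumptions the second lottery returns $2 per ticket on average against a $10 price, so the expected loss is $8 per ticket.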

39 Binomial Distribution
Definition: probability distribution in which there are only two outcomes, or which can be reduced to only two by some rule ("an event occurs" and "the event does not occur")
Examples: heads and tails, true and false, success and failure, boy or girl, equal to a value and not equal, roll a 1 and not roll a 1 with a die
Rules:
 - only two outcomes per trial
 - fixed number of trials
 - independence from trial to trial
 - probability same from trial to trial
Notation:
 p = probability of success
 q = probability of failure
 n = number of trials
 x = number of successes, where $0 \le x \le n$
 $P(x) = {}_nC_x \, p^x q^{n-x}$
 Note, since p + q = 1, therefore q = 1 - p
Examples:
1. Probability of 4 sixes in 4 tosses of a die.
2. Probability of tossing five heads in seven tosses.
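The binomial formula translates almost symbol for symbol into Python. In this sketch `binomial_pmf` is a hypothetical name, and exact fractions are used so the two slide examples come out without rounding.

```python
import math
from fractions import Fraction

def binomial_pmf(x, n, p):
    """P(x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

four_sixes = binomial_pmf(4, 4, Fraction(1, 6))  # 4 sixes in 4 tosses of a die
five_heads = binomial_pmf(5, 7, Fraction(1, 2))  # 5 heads in 7 coin tosses

print(four_sixes, float(four_sixes))
print(five_heads, float(five_heads))
```

As a sanity check, summing `binomial_pmf(x, n, p)` over x from 0 to n should give 1 for any valid p.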

40 Binomial Distribution cont'd
Examples:
1. Tossing of a fair coin (1 trial and 4 trials)
2. Rolling a six with a fair die (rolling a die is multinomial)
3. Answering a four-choice multiple choice question correctly

41 Normal Distribution
Many biological and physical processes exhibit a distribution called the normal or Gaussian distribution. Values tend to cluster around a mean and extreme values are relatively rare. Each normal distribution has different units of measure but they can be normalized using the z-score transform ($z = (X - \mu)/\sigma$). This defines the standard normal distribution or z-distribution.
Definition:
 $y = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$
 $e^x$ is called the exponential function. e, called Euler's number, is a transcendental number equal to: $e = \sum_{n=0}^{\infty} 1/n!$
Properties of Standard Normal Distribution:
 - bell shaped, unimodal distribution
 - mean, median and mode are the same and equal to 0
 - standard deviation is equal to 1
 - symmetric about the mean
 - continuous function (infinitely differentiable, for every x there is a single y)
 - asymptotic to the x-axis at both ends (y values approach but never become zero)
 - area under curve equals 1

42 Applications of Normal Distribution
Uses:
 - computing areas and percentiles of scores that are normally distributed
 - testing hypotheses concerning means of different populations (are they the same or different?)
Examples:
1. Find the percentage of scores between ±1 standard deviations.
 area between 0 and +1 = 0.3413
 area between 0 and -1 = 0.3413
 area between -1 and +1 = 2 × 0.3413 = 0.6826 = 68.3%
2. Find the percentage of scores between ±2 and ±3 standard deviations.
 area between 0 and +2 = 0.4772
 area between -2 and +2 = 2 × 0.4772 = 0.9544 = 95.4%
 area between -3 and +3 = 2 × 0.4987 = 0.9974 = 99.7%
3. Find the z-score that defines 95% of scores around the mean.
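The areas in these examples can be reproduced without a z-table using `statistics.NormalDist` from the Python standard library (version 3.8 or later). This is a sketch; `area_within` is a hypothetical helper name.

```python
from statistics import NormalDist

z_dist = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return z_dist.cdf(k) - z_dist.cdf(-k)

within_1 = area_within(1)
within_2 = area_within(2)
within_3 = area_within(3)

# z-score that leaves 2.5% in each tail, i.e. encloses the central 95%
z_95 = z_dist.inv_cdf(0.975)

print(f"{within_1:.1%} {within_2:.1%} {within_3:.1%} z = {z_95:.2f}")
```

Small differences from table-based answers in the fourth decimal place come from the rounding of z-table entries.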

43 Applications of Normal Distribution cont'd
4. Find z-score that defines the lower 90 percent.
5. Find 25th and 75th percentile z-scores.
6. College wants top 15% of students who take a test which has a mean of 125 and standard deviation of ...
7. Determine the 5th and 95th percentile heights of a population that has a mean of ... ± 20.0 cm.

44 Central Limit Theorem
Sampling Distribution of Sample Means: distribution based on the means of random samples of a specified size (n = constant) taken from a population
Example: Test scores from a class of four students. Scores were 2, 4, 6, 8 (uniform distribution).
List all possible samples of size n = 2, allowing replacement.

Sample   Mean     Sample   Mean
2,2      2        6,2      4
2,4      3        6,4      5
2,6      4        6,6      6
2,8      5        6,8      7
4,2      3        8,2      5
4,4      4        8,4      6
4,6      5        8,6      7
4,8      6        8,8      8

Sampling distribution of means:
Mean        2   3   4   5   6   7   8
Frequency   1   2   3   4   3   2   1
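The sampling distribution in this example can be generated by listing every ordered sample of size 2 with replacement. The sketch below does that with `itertools.product`; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

scores = [2, 4, 6, 8]

# All 16 ordered samples of size 2, drawn with replacement
samples = list(product(scores, repeat=2))
sample_means = [sum(s) / len(s) for s in samples]

# Frequency of each sample mean: the sampling distribution of means
frequency = Counter(sample_means)
for mean_value in sorted(frequency):
    print(mean_value, frequency[mean_value])

# The mean of the sample means equals the population mean
grand_mean = sum(sample_means) / len(sample_means)
population_mean = sum(scores) / len(scores)
```

Although the original scores are uniformly distributed, the frequencies of the sample means already peak in the middle, which is the behaviour the central limit theorem describes.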

45 Central Limit Theorem cont'd
As sample size (n) increases the shape of the sampling distribution of sample means taken from a population with mean, $\mu$, and standard deviation, $\sigma$, will approach a normal distribution, with mean, $\mu_{\bar{X}} = \mu$, and standard deviation, $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
 - the standard deviation of the sampling distribution is called the standard error of the mean (notice, by definition, it is always less than the sample's standard deviation when n > 1)
Note, whenever the sample size (n) exceeds 5% of the population size (N) the standard error must be adjusted by the Finite Population Correction Factor: $\sqrt{\frac{N - n}{N - 1}}$
 That is, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
Example: What is the standard error of the mean for a sampling distribution given a sample of size 100 and s.d. of 5.00 taken from a population of size 1000?
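The example can be worked with a short function that applies the finite population correction only when the sample exceeds 5% of the population. This is a sketch using the standard form of the correction factor, the square root of (N - n)/(N - 1); the function name and parameters are hypothetical.

```python
import math

def standard_error(sd, n, population_size=None):
    """Standard error of the mean, with the finite population correction
    applied when n exceeds 5% of the population size."""
    se = sd / math.sqrt(n)
    if population_size is not None and n > 0.05 * population_size:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

uncorrected = standard_error(5.00, 100)      # 5.00 / sqrt(100)
corrected = standard_error(5.00, 100, 1000)  # n is 10% of N, so correct

print(f"{uncorrected:.3f} {corrected:.3f}")
```

Here the sample is 10% of the population, so the correction shrinks the standard error from 0.500 to roughly 0.475.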

46 Confidence Intervals
Point Estimate: a specific value that estimates a parameter
 - e.g., a sample mean ($\bar{X}$) is the best estimator of the population mean ($\mu$)
 - problem is that there is no way to determine how close a point estimate is to the parameter
Properties of a Good Estimator:
1. must be an unbiased estimator
   - expected value of estimator or mean obtained from samples of a given size must be equal to the parameter
2. must be consistent
   - as sample size increases estimator approaches value of the parameter
3. must be relatively efficient
   - estimator must have smallest variance of all estimators
Interval Estimate: range of values that estimate a parameter
 - e.g., mean ± standard deviation ($\bar{X} \pm s$), 25 ± 10 kg, 5th to 95th %ile
 - precise probabilities can be assigned to the validity of the interval

47 Confidence Intervals cont'd
Confidence Interval: interval estimate based on sample data and a given confidence level
Confidence Level: probability that a parameter will fall within an interval estimate
 - related to alpha ($\alpha$) level, that is, Confidence Level = $1 - \alpha$
 - E.g., C.L. = 95% means $\alpha$ = 0.05; C.L. = 99% means $\alpha$ = 0.01
Formula for Computing Confidence Intervals
 $\bar{X} - z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) < \mu < \bar{X} + z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$
 where $z_{\alpha/2}$ is the z-score that places the area $\alpha/2$ in the right tail of the normal distribution
For example, if $\alpha$ is 5% then $z_{\alpha/2}$ is the z-score that places 2.5% in the right tail and $-z_{\alpha/2}$ places 2.5% in the left tail. That is, $z_{\alpha/2} = 1.96$

48 Confidence Intervals cont'd
Maximum Error of Estimate (E):
 $E = z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$
Example:
Compute the 95% confidence interval from a sample of size 30 which has a standard deviation of 5.00 and a mean of 25.0.
 Since $\sigma$ is unknown use s.
 From z-table: $z_{\alpha/2} = 1.96$
 Therefore, $E = 1.96 \, (5.00 / \sqrt{30}) = 1.789$
 Confidence interval = C.I. = 25.0 ± 1.789 = 23.2 to 26.8

49 Confidence Intervals when $\sigma$ is Unknown and n is Small
When the population standard deviation is known, always use the z-distribution (normal distribution). If it is not known, which is often the case, use the sample standard deviation, s, as long as n > 29. If not, use the t-distribution with n - 1 degrees of freedom.
[Figure: Decision Tree for Selecting Statistical Method]
Formula for Computing Confidence Intervals for Small Sample Sizes
 $\bar{X} - t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) < \mu < \bar{X} + t_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right)$

50 Confidence Intervals when $\sigma$ is Unknown and n is Small
Example:
Compute the 95% confidence interval from a sample of size 10 which has a standard deviation of 5.00 and a mean of 25.0. (Similar to previous example.)
 From t-table with degrees of freedom = df = n - 1 = 9: $t_{\alpha/2} = 2.262$
 Therefore, $E = 2.262 \, (5.00 / \sqrt{10}) = 3.58$
 Confidence interval = C.I. = 25.0 ± 3.58 = 21.4 to 28.6
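The large-sample and small-sample confidence interval examples differ only in the critical value, which a short sketch makes explicit. Assumptions: the sample mean of 25.0 is taken from the reported intervals, the t critical value 2.262 (df = 9, two-tailed 95%) is hard-coded from a standard t-table because the Python standard library has no t-distribution, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, critical_value):
    """Return (lower, upper) as mean -/+ critical_value * sd / sqrt(n)."""
    max_error = critical_value * sd / math.sqrt(n)
    return mean - max_error, mean + max_error

# n = 30: use the z-distribution; 2.5% in each tail gives z of about 1.96
z_crit = NormalDist().inv_cdf(0.975)
large_sample = confidence_interval(25.0, 5.00, 30, z_crit)

# n = 10: use the t-distribution with 9 degrees of freedom (table value)
T_CRIT_DF9 = 2.262
small_sample = confidence_interval(25.0, 5.00, 10, T_CRIT_DF9)

print([round(v, 1) for v in large_sample])
print([round(v, 1) for v in small_sample])
```

The smaller sample gives a noticeably wider interval, both because the standard error is larger and because the t critical value exceeds 1.96.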

51 Sample Size
Minimum Sample Size for Interval Estimate of Population Mean:
 $n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2$
 where n is sample size, $\sigma$ is the population standard deviation and E is the maximum error of estimate. When $\sigma$ is unknown it may be estimated from the sample standard deviation, s.
Example: Calculate the sample size needed to estimate muscle strength from a population that has a standard deviation of 140 newtons if you want to be 95% confident and within 50.0 newtons.
 $n = (1.96 \times 140 / 50.0)^2 = 30.1$
 You will need a sample size of 31.
Note, always round up to the next highest integer when there is a fraction.
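A sketch of the sample-size calculation follows. The standard deviation of 140 newtons is an assumption: that value did not survive in the transcription and is back-calculated here from the reported result of 30.1 with z = 1.96 and E = 50.0. The function name is hypothetical.

```python
import math

def min_sample_size(z_crit, sigma, max_error):
    """n = (z * sigma / E) ** 2, always rounded up to the next integer."""
    return math.ceil((z_crit * sigma / max_error) ** 2)

# sigma = 140 N is an assumed value inferred from the slide's answer of 30.1
raw_n = (1.96 * 140.0 / 50.0) ** 2
n_required = min_sample_size(1.96, 140.0, 50.0)

print(round(raw_n, 1), n_required)
```

Rounding up rather than to the nearest integer guarantees the maximum error is not exceeded.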

52 Hypothesis Testing
Hypothesis: conjecture, proposition or statement based on published literature, data or a theory which may or may not be true
Statistical Hypothesis: conjecture about a population parameter
 - usually stated in mathematical terms
 - two types, null and alternate
Null Hypothesis ($H_0$): states that there is NO difference between a parameter and a specific value or among several different parameters
Alternate Hypothesis ($H_1$): states that there is a significant difference between a parameter and a specific value or among several different parameters
Examples:
 $H_0$: $\mu = 82$ kg        $H_1$: $\mu \ne 82$ kg*
 $H_0$: $\mu \le 150$ cm     $H_1$: $\mu > 150$ cm
 $H_0$: $\mu \ge 65.0$ s     $H_1$: $\mu < 65.0$ s
 $H_0$: $\mu_0 = \mu_1$      $H_1$: $\mu_0 \ne \mu_1$*
 $H_0$: $\mu_0 \ge \mu_1$    $H_1$: $\mu_0 < \mu_1$
Notice that the equality symbols are always with the null hypotheses.
* These are called two-tailed tests; others are all directional or one-tailed tests.

53 Two-tailed vs One-tailed Tests
Two-tailed:
 - also called a non-directional test
 - null hypothesis is rejected if sample mean falls in either tail
 - most appropriate test, especially with no previous experimentation
 - less powerful than one-tailed
One-tailed:
 - also called a directional test
 - researcher must have a reason that permits selecting in which tail the test will be done, i.e., will the experimental protocol increase or decrease the sample statistic
 - more powerful than two-tailed since it is easier to achieve a significant difference
 - fails to handle the situation when the sample mean falls in the wrong tail
[Figures: two-tailed; one-tailed, left; one-tailed, right]

54 Statistical Testing
To determine the veracity (truth) of a hypothesis a statistical test must be undertaken that yields a test value. This value is then evaluated to determine if it falls in the critical region of an appropriate probability distribution for a given significance or alpha ($\alpha$) level.
The critical region is the region of the probability distribution which rejects the null hypothesis. Its limit(s), called the critical value(s), are defined by the specified confidence level. The confidence level must be selected in advance of computing the test value. To do otherwise is statistical dishonesty. When in doubt one should always use a two-tailed test.
Instead of reporting significance levels ($\alpha$ = 0.05) or equivalent probabilities (P < 0.05) many researchers report the test values as probabilities or P-values (e.g., P = ..., P = 0.253, P < 0.001; not P = 0.000). Advanced statistical programs report P-values; if not, use P < 0.05 or P < 0.01.
Truth table:

                                Test rejects H0            Test does not reject H0
                                (accepts H1)               (accepts H0)
H0 is true and H1 is false      Error ($\alpha$)           Correct ($1 - \alpha$)
                                Type I error               (experiment failed)
H0 is false and H1 is true      Correct ($1 - \beta$)      Error ($\beta$)
                                (experiment succeeded)     Type II error

55 z-test and t-test
Test for a Single Mean:
- used to test a single sample mean ($\bar{X}$) when the population mean ($\mu$) is known
- Is the sample taken from the population or is it different (greater, lesser or either)?
z-test: when the population s.d. ($\sigma$) is known
Test value: $z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
If z is in the critical region defined by the critical value(s), then the sample mean is significantly different from the population mean.
If $\sigma$ is unknown, then use the sample s.d., s, as long as the sample size is greater than 29.
Test value: $z = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$
t-test: if $\sigma$ is unknown and n < 30, then use the t-test and the t-distribution with d.f. = n - 1
Test value: $t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$
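As a rough illustration of the single-mean test values, the sketch below computes z and t with Python's standard library only. The function names and all numbers are hypothetical, chosen only for illustration; the critical t value would still be looked up in a t table.

```python
import math
from statistics import NormalDist, mean, stdev


def z_test_value(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


def t_test_value(data, pop_mean):
    """t = (sample mean - mu) / (s / sqrt(n)); returns (t, d.f.) with d.f. = n - 1."""
    n = len(data)
    t = (mean(data) - pop_mean) / (stdev(data) / math.sqrt(n))
    return t, n - 1


# Hypothetical numbers: n = 36, sample mean 84 kg, H0: mu = 82 kg, sigma = 6 kg.
z = z_test_value(84.0, 82.0, 6.0, 36)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed, alpha = 0.05
reject_h0 = abs(z) > z_crit

# Small hypothetical sample tested against H0: mu = 2.
t, df = t_test_value([1.0, 2.0, 3.0, 4.0, 5.0], 2.0)
```

Here the z value is compared with a critical value computed from the standard normal distribution, whereas the t value and its degrees of freedom are meant to be checked against a printed t table.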

56 Flow Diagram for Choosing the Correct Statistical Test
Same as the flow diagram used for confidence intervals. Generally the sample's mean and standard deviation are used with the t-distribution. The t-distribution becomes nearly indistinguishable from the z-distribution (normal distribution) when n > 29.

57 Power of a Statistical Test
Power:
- ability of a statistical test to detect a real difference
- probability of rejecting the null hypothesis when it is false (i.e., there is a real difference)
- equal to $1 - \beta$ (1 - probability of Type II error)
Ways of increasing power:
- Increasing $\alpha$ will increase power but it also increases the chance of a Type I error.
- Increasing sample size (increases costs).
- Using ratio or interval data versus nominal or ordinal. Tests involving ratio/interval data are called parametric tests. Those involving nominal and ordinal data are called nonparametric tests.
- Using repeated measures tests, such as the repeated measures t-test or ANOVA. By using the same subjects repeatedly, variability is reduced.
- If variances are equal, use pooled estimates of variance (e.g., independent groups t-test).
- Using samples that represent extremes. Reduces generalizability of experiment results.
- Standardizing testing procedures reduces variability.
- Using one-tailed vs. two-tailed tests. Problem occurs if results are in the wrong tail. Not recommended.
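To make the effect of sample size and $\alpha$ on power concrete, here is a minimal sketch for one simple case only: a right-tailed z-test with known $\sigma$. The function name, the true difference `delta` and all numbers are assumptions made for illustration, not values from the course.

```python
import math
from statistics import NormalDist


def power_right_tailed_z(delta, sigma, n, alpha=0.05):
    """Probability of rejecting H0 when the true mean exceeds the hypothesized mean by delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = delta * math.sqrt(n) / sigma  # true difference in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)


# Hypothetical scenario: true difference of 2 units, sigma = 6.
power_n25 = power_right_tailed_z(2.0, 6.0, 25)
power_n100 = power_right_tailed_z(2.0, 6.0, 100)                # larger sample
power_alpha01 = power_right_tailed_z(2.0, 6.0, 25, alpha=0.01)  # stricter alpha
```

Under these assumptions the larger sample gives the highest power and the stricter $\alpha$ the lowest; when `delta` is 0 the function returns $\alpha$ itself, the Type I error rate.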

58 Testing Differences between Two Means
Large Independent Sample Means:
Used to test whether the data from two samples come from the same population or whether two populations are different.
Assumptions:
- samples must be independent, i.e., there can be no relationship between the two samples
- populations must be normally distributed and standard deviations known, or sample size > 29
- should not be used if more than two means are tested unless adjustments are made to significance levels (e.g., Bonferroni correction, $\alpha_{Bonferroni} = \alpha$ / number of tests)
z-test:
Test value: $z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$, where $\mu_1 - \mu_2 = 0$ under the null hypothesis
Critical value comes from the standard normal (z) distribution. Use a one- or two-tailed test. Conservatively, choose the two-tailed test. Values are also available at the bottom of the t-distribution table.

59 The Step-by-Step Approach
Step 1: State hypotheses
Two-tailed: $H_0: \mu_1 = \mu_2$; $H_1: \mu_1 \neq \mu_2$
One-tailed: $H_0: \mu_1 \le \mu_2$ or $H_0: \mu_1 \ge \mu_2$; $H_1: \mu_1 > \mu_2$ or $H_1: \mu_1 < \mu_2$
Step 2: Find critical value
Look up the z-score for the specified significance ($\alpha$) level and for a one- or two-tailed test (selected in advance). Usually use $\alpha$ = 0.05 and a two-tailed test, i.e., $z_{critical} = \pm 1.96$. For one-tailed use $z_{critical} = \pm 1.645$.
Step 3: Compute test value
Step 4: Make decision
Draw a diagram of the normal distribution and critical regions. If the test value is in the critical region, reject the null hypothesis; otherwise do not reject.
Step 5: Summarize results
Restate the hypothesis (null or alternate) accepted in step 4.
If reject null: There is enough evidence to reject the null hypothesis.
If not reject null: There is not enough evidence to reject the null hypothesis.
Optionally, reword the hypothesis in lay terms. E.g., There is/is not a difference between the two populations, or one population is greater/lesser than the other for the independent variable.
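The five steps can be sketched in Python for the large-sample z-test on two independent means. This is an illustrative outline, not the course's own procedure: the function name and the sample figures are hypothetical, and the one-tailed branch assumes a right-tailed test.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sd1, sd2, n1, n2, alpha=0.05, two_tailed=True):
    """Steps 2 to 4 for H0: mu1 = mu2 (two-tailed) or H0: mu1 <= mu2 (right-tailed)."""
    # Step 2: critical value from the standard normal distribution.
    tail_area = alpha / 2 if two_tailed else alpha
    z_crit = NormalDist().inv_cdf(1 - tail_area)
    # Step 3: test value; mu1 - mu2 = 0 under the null hypothesis.
    z = (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    # Step 4: is the test value in the critical region?
    reject_h0 = abs(z) > z_crit if two_tailed else z > z_crit
    return z, z_crit, reject_h0


# Hypothetical data: two groups of 50 with means 75 and 72, s.d. 6 and 8.
z, z_crit, reject_h0 = two_sample_z_test(75.0, 72.0, 6.0, 8.0, 50, 50)

# Step 5: summarize.
summary = ("There is enough evidence to reject the null hypothesis."
           if reject_h0 else
           "There is not enough evidence to reject the null hypothesis.")
```

Passing `two_tailed=False` switches the critical value from about 1.96 to about 1.645 at $\alpha$ = 0.05.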

60 Testing Differences between Two Means
Small Independent Sample Means:
When population standard deviations are unknown and sample size is < 30, use the t-distribution for critical values and the t-test for test values. Use the F-ratio to determine whether sample variances are equal or unequal. Then choose the correct t-test.
Assumptions:
- the two samples must be independent, i.e., different subjects; if not, use the dependent groups t-test
- data must be normally distributed
If sample variances are NOT equal:
Use test value: $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
For degrees of freedom (df) use the smaller of $n_1 - 1$ and $n_2 - 1$ (i.e., conservative choice, higher critical value).
If sample variances are equal:
Use test value: $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ and df = $n_1 + n_2 - 2$
Uses a pooled estimate of variance which, combined with the larger degrees of freedom, increases the test's power.
In both test values $\mu_1 - \mu_2 = 0$ under the null hypothesis.
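A minimal sketch of the two independent-groups t values follows, assuming the standard unequal-variance and pooled-variance forms with $\mu_1 - \mu_2 = 0$ under the null hypothesis. Function names and data are hypothetical; critical values would come from a t table.

```python
import math
from statistics import mean, variance


def t_unequal_variances(x1, x2):
    """t with separate variances; df is the smaller of n1 - 1 and n2 - 1."""
    n1, n2 = len(x1), len(x2)
    t = (mean(x1) - mean(x2)) / math.sqrt(variance(x1) / n1 + variance(x2) / n2)
    return t, min(n1 - 1, n2 - 1)


def t_pooled(x1, x2):
    """t with a pooled variance estimate; df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    t = (mean(x1) - mean(x2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical samples with equal variances and equal sizes.
group1 = [5.0, 6.0, 7.0, 8.0, 9.0]
group2 = [1.0, 2.0, 3.0, 4.0, 5.0]
t_sep, df_sep = t_unequal_variances(group1, group2)
t_pool, df_pool = t_pooled(group1, group2)
```

With equal sample sizes the two t values coincide; only the degrees of freedom differ (4 versus 8 here), which is why the pooled version has the lower critical value.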

61 Test for Equal Variances
Also called Homogeneity of Variance
- used primarily to determine which t-test to use
- uses the F-distribution and F-test (later used for ANOVA)
- assume variances are equal and test if unequal
- SPSS uses Levene's Test for Equality of Variances. If P (Sig.) < $\alpha$, variances are NOT equal.
Step 1: Always a two-tailed test. $H_0: \sigma_1^2 = \sigma_2^2$; $H_1: \sigma_1^2 \neq \sigma_2^2$
Step 2: Find critical value ($F_{CV}$) from the F-distribution. Use degrees of freedom of the larger variance ($df_N = n_{larger} - 1$) as numerator and degrees of freedom of the smaller variance as denominator ($df_D = n_{smaller} - 1$).
Step 3: Compute test value: $F_{TV} = \dfrac{s_{larger}^2}{s_{smaller}^2}$. Note, $F_{TV}$ will always be $\ge 1$.
Step 4 and 5: If $F_{TV} > F_{CV}$ then reject $H_0$ and conclude variances are unequal. If $F_{TV} \le F_{CV}$ then do NOT reject $H_0$ and conclude variances are equal, i.e., you have homogeneity of variances. You can now select the appropriate independent groups t-test.
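The F test value and its degrees of freedom could be computed as below. This is a sketch with a hypothetical function name and made-up data; it assumes the larger sample variance goes in the numerator, and the critical value still has to be read from an F table.

```python
from statistics import variance


def f_test_value(x1, x2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    v1, v2 = variance(x1), variance(x2)
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    return v2 / v1, len(x2) - 1, len(x1) - 1


# Hypothetical samples: variances 2.5 and 14.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
f_tv, df_n, df_d = f_test_value(sample_a, sample_b)
```

Because the larger variance is always the numerator, the result cannot fall below 1, and the degrees of freedom follow the sample that supplied each variance.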

62 Flow Diagram for Choosing the Correct Independent Samples t-test
Similar to the flow diagram used for single sample means, but requires a test for equality of variances (homogeneity of variance). Generally the sample's mean and standard deviation are used with the t-distribution. The t-distribution becomes nearly indistinguishable from the z-distribution (normal distribution) when n > 29. Samples must be independent.

63 Testing Differences between Two Means
Dependent Sample Means:
Used when two samples are not independent. More powerful than the independent groups t-test and easier to perform (no variance test required). Simplifies research protocol (i.e., fewer subjects) but dependence may limit generalizability.
Examples:
- repeated measures (test/retest, before/after)
- matched pairs t-test (subjects matched by a relevant variable: height, weight, shoe size, IQ score, age)
- twin studies (identical, heterozygotic, living apart)
Step 1:
Two-tailed: $H_0: \mu_D = 0$; $H_1: \mu_D \neq 0$
One-tailed: $H_0: \mu_D \le 0$ or $H_0: \mu_D \ge 0$; $H_1: \mu_D > 0$ or $H_1: \mu_D < 0$
Step 2: Critical value from the t-distribution with degrees of freedom equal to the number of data pairs minus one (df = n - 1).
Step 3: Compute differences between pairs (D), then the mean difference ($\bar{D}$) and the standard deviation of the differences ($s_D$).
Test value: $t = \dfrac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$, where $\mu_D = 0$ under the null hypothesis
Step 4 and 5: If the test value falls in the critical region (for a two-tailed test, |test value| > critical value), reject $H_0$; otherwise there is not enough evidence of a difference between the two trials/groups.
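For the dependent-groups case, a short sketch of Step 3 is given below, assuming the usual form t = mean difference / (s_D / sqrt(n)). The function name and the before/after scores are hypothetical; the critical value 2.776 is the standard two-tailed t table entry for $\alpha$ = 0.05 and df = 4.

```python
import math
from statistics import mean, stdev


def paired_t_value(before, after):
    """t for paired data; returns (t, df) with df = number of pairs - 1."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    t = mean(differences) / (stdev(differences) / math.sqrt(n))
    return t, n - 1


# Hypothetical before/after scores for five subjects.
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 14.0, 15.0, 18.0, 22.0]
t, df = paired_t_value(before, after)

t_crit = 2.776  # two-tailed, alpha = 0.05, df = 4 (from a t table)
reject_h0 = abs(t) > t_crit
```

Only the differences enter the calculation, which is why no test for equality of variances is needed.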

64 Correlation and Regression
Linear Correlation:
- Does one variable increase or decrease linearly with another?
- Is there a linear relationship between two or more variables?
Types of linear relationships:
[Figures: positive linear; negative linear; no relationship; none or weak]

65 Scattergrams
[Figures: weak linear; strong linear]
Other relationships:
[Figures: nonlinear or curvilinear; linear and exponential?]

66 Correlation
Pearson Product Moment Correlation Coefficient:
- simply called correlation coefficient, PPMC or r-value
- linear correlation between two variables
- Examples: Weight increases with height. IQ with brain size?!
- Used for calibration of instruments: force transducers, spring scales, electrogoniometers (measure joint angles).
Multiple Correlation:
- used when several independent variables influence a dependent variable
- R-value
- Defined as: $Y = A + B_1 X_1 + B_2 X_2 + B_3 X_3 + \dots + B_n X_n$
- Examples: Heart disease is affected by family history, obesity, smoking, diet, etc. Academic performance is affected by intelligence, economics, experience, memory, etc. Lean body mass is predicted by a combination of body mass, thigh, triceps and abdominal skinfold measures.

67 Significance of Correlation Coefficient
Method 1
Step 1: $H_0: \rho = 0$
Step 2: Look up $r_{crit}$ for n - 2 degrees of freedom (Table A-6 or I)
Step 3: Compute sample r (as above)
Step 4: Sample r is significant if its absolute value is greater than $r_{crit}$
Step 5: If significance occurs, data are linearly correlated; otherwise they are not.
If a table of significant correlation coefficients is not available or the significance level ($\alpha$) is not 0.05 or 0.01, use Method 2.
Method 2
Step 1: $H_0: \rho = 0$
Step 2: Look up $t_{crit}$ for n - 2 degrees of freedom
Step 3: Compute sample r, then $t = r \sqrt{\dfrac{n - 2}{1 - r^2}}$
Step 4: Sample t is significant if its absolute value is greater than $t_{crit}$
Step 5: If significance occurs, data are linearly correlated; otherwise they are not.
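Method 2 can be illustrated with a few lines of Python. The sketch assumes the standard definitions of Pearson's r and of t = r sqrt((n - 2) / (1 - r^2)); the function names and data are hypothetical, and 3.182 is the standard two-tailed t table value for $\alpha$ = 0.05 with 3 degrees of freedom.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """Convert r to a t value with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2


# Hypothetical paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
t, df = t_from_r(r, len(x))

t_crit = 3.182  # two-tailed, alpha = 0.05, df = 3 (from a t table)
significant = abs(t) > t_crit
```

In this made-up example r is about 0.77 yet not significant, a reminder that a fairly large r from very few pairs may still fail the test.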

68 Regression
Regression:
- Should only be done if a significant correlation exists.
- Equation of the line or curve which defines the relationship between variables.
- The line of best fit.
- Mathematical technique is called the least squares method. This technique computes the line that minimizes the sum of the squares of the deviations of the data from the line.

69 Coefficient of Determination and Standard Error of Estimate
Coefficient of Determination
- Measures the strength of the relationship between the two variables.
- Equal to the explained variation divided by the total variation = $r^2$
- Usually given as a percentage, i.e., coefficient of determination = $r^2 \times 100\%$
- For example, an r of 0.90 has 81% of the total variation explained but an r of 0.60 has only 36% of its variation explained. A correlation may be significant but explain very little.
Standard Error of Estimate
- Measure of the variability of the observed values about the regression line
- Can be used to compute a confidence interval for a predicted value
- standard error of estimate: $s_{est} = \sqrt{\dfrac{\sum (y - y')^2}{n - 2}}$, where $y'$ is the value predicted by the regression line
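The least squares line, the coefficient of determination and the standard error of estimate can be sketched together. The code assumes simple linear regression with one predictor and the standard formula s_est = sqrt(sum of squared residuals / (n - 2)); the function names and data are hypothetical.

```python
import math
from statistics import mean


def least_squares_line(x, y):
    """Intercept a and slope b of the line y' = a + b x that minimizes squared deviations."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def fit_summary(x, y):
    """Return (a, b, r squared, standard error of estimate)."""
    a, b = least_squares_line(x, y)
    my = mean(y)
    unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    total = sum((yi - my) ** 2 for yi in y)
    r_squared = 1 - unexplained / total  # explained variation / total variation
    s_est = math.sqrt(unexplained / (len(x) - 2))
    return a, b, r_squared, s_est


# Hypothetical paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b, r_squared, s_est = fit_summary(x, y)
```

For these data the line is y' = 2.2 + 0.6x and 60% of the total variation is explained.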

70 Possible Reasons for a Significant Correlation
1. There is a direct cause-and-effect relationship between the variables. That is, x causes y. For example, positive reinforcement improves learning, smoking causes lung cancer and heat causes ice to melt.
2. There is a reverse cause-and-effect relationship between the variables. That is, y causes x. For example, suppose a researcher believes excessive coffee consumption causes nervousness, but the researcher fails to consider that the reverse situation may occur. That is, it may be that nervous people crave coffee.
3. The relationship between the variables may be caused by a third variable. For example, if a statistician correlated the number of deaths due to drowning and the number of cans of soft drinks consumed during the summer, he or she would probably find a significant relationship. However, the soft drink is not necessarily responsible for the deaths, since both variables may be related to heat and humidity.
4. There may be a complexity of interrelationships among many variables. For example, a researcher may find a significant relationship between students' high school grades and college grades. But there probably are many other variables involved, such as IQ, hours of study, influence of parents, motivation, age and instructors.
5. The relationship may be coincidental. For example, a researcher may be able to find a significant relationship between the increase in the number of people who are exercising and the increase in the number of people who are committing crimes. But common sense dictates that any relationship between these two variables must be due to coincidence.

71 Comparing Frequencies using Chi-square
Chi-square or $\chi^2$: pronounced "ki squared"
- Used to test whether the frequencies of nominal data fit a certain pattern (goodness of fit) or whether two variables have a dependency (test for independence).
- Can be used to test whether data are normally distributed and for homogeneity of proportions.
- Frequency of each nominal category is computed and compared to an expected frequency.
Goodness of Fit:
- Need to know the expected pattern of frequencies. If not known, assume equal distribution among all categories.
Assumptions:
- data are from a random sample
- expected frequency for each category must be 5 or more
Examples:
- test for product / procedure preference (each is assumed equally likely to be selected)
- test for fairness of coin, die, roulette wheel (expect each outcome equally)
- test for expected frequency distribution (need theoretically expected pattern)

72 Goodness of Fit Test
Step 1: $H_0$: data fit the expected pattern; $H_1$: data do not fit the expected pattern
Step 2: Find critical value from the $\chi^2$ table. Test is always a one-tailed right test with n - 1 degrees of freedom, where n is the number of categories.
Step 3: Compute test value from: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where O = observed frequency and E = expected frequency
Step 4: Make decision. If $\chi^2$ > critical value, reject $H_0$.
Step 5: Summarize the results. E.g., There is (not) enough evidence to accept/reject the claim that there is a preference for ___. E.g., coin is fair / unfair; die is fair / loaded; wheel is fair / flawed.
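A goodness-of-fit calculation might look like the following sketch. The die-roll counts are invented for illustration and the function name is hypothetical; 11.070 is the standard $\chi^2$ table value for $\alpha$ = 0.05 with 5 degrees of freedom.

```python
def chi_square_value(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical die rolled 60 times; a fair die is expected to show each face 10 times.
observed = [8, 12, 9, 11, 6, 14]
expected = [sum(observed) / len(observed)] * len(observed)
chi2 = chi_square_value(observed, expected)
df = len(observed) - 1

chi2_crit = 11.070  # one-tailed right, alpha = 0.05, df = 5 (from a chi-square table)
reject_h0 = chi2 > chi2_crit
```

With these counts the test value is well below the critical value, so there is not enough evidence to call the die loaded.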

73 Test for Independence
Step 1: $H_0$: two variables are independent; $H_1$: two variables are dependent
Step 2: Find critical value from the $\chi^2$ table. Test is always one-tailed right with $(n_{row} - 1)(n_{col} - 1)$ degrees of freedom, where $n_{row}$ and $n_{col}$ are the number of categories of each variable. These correspond to the number of rows and columns in the contingency table.
Step 3: Create the contingency table to derive the expected values (see next page). Compute test value from: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where O = observed frequency and E = expected frequency
Step 4: Make decision. If $\chi^2$ > critical value, reject $H_0$.
Step 5: Summarize the results. E.g.,
- Getting a cold is dependent upon whether you took a cold vaccine.
- Smoking and lung disease are dependent.
- Is a cure dependent upon placebo vs. drug?

74 Contingency Table
First, enter observed (O) scores and compute row and column totals.
         Col.1   Col.2   Col.3   totals
Row 1    O11     O12     O13     40
Row 2    O21     O22     O23     35
Row 3    O31     O32     O33     50
totals   40      60      25      125*
* Notice that the sum of the row totals and the sum of the column totals must be equal.
Second, compute expected (E) values based on row and column totals.
         Col.1            Col.2            Col.3
Row 1    40x40/125=E11    40x60/125=E12    40x25/125=E13
Row 2    35x40/125=E21    35x60/125=E22    35x25/125=E23
Row 3    50x40/125=E31    50x60/125=E32    50x25/125=E33
Finally, compute the test value: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
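The expected-value step can be sketched in Python. The observed cell counts below are hypothetical: they were made up so that the row totals (40, 35, 50) and column totals (40, 60, 25) match those used in the expected-value formulas on this slide. The function names are also illustrative.

```python
def expected_frequencies(observed):
    """E for each cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(observed):
    """Return (chi-square test value, degrees of freedom)."""
    expected = expected_frequencies(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical observed counts with row totals 40, 35, 50 and column totals 40, 60, 25.
observed = [
    [15, 20, 5],
    [10, 15, 10],
    [15, 25, 10],
]
expected = expected_frequencies(observed)
chi2, df = chi_square_independence(observed)
```

For example, the first expected value is 40 x 40 / 125 = 12.8, and a 3 x 3 table gives (3 - 1)(3 - 1) = 4 degrees of freedom.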

75 Analysis of Variance (ANOVA)
One-way ANOVA:
- used to test for significant differences among sample means
- differs from the t-test since more than 2 groups are tested simultaneously
- one factor (independent variable) is analyzed, also called the grouping variable
- dependent variable should be interval or ratio but factor is nominal
Factorial Design:
- groups must be independent (i.e., subjects in each group are different and unrelated)
Assumptions:
- data must be normally distributed or nearly so
- variances must be equal (i.e., homogeneity of variance)
Examples:
- Does fitness level (VO2Max) depend on province of residence? Fitness level is a ratio variable, residence is a nominal variable.
- Does statistics grade depend on highest level of mathematics course taken?
- Does hand grip strength vary with gender? (Can be done with t-test. t-test can handle equal or unequal variances.)

76 One-way ANOVA cont'd
An ANOVA tests whether one or more sample means are significantly different from each other. To determine which or how many sample means are different requires post hoc testing.
[Figure: two samples whose means are significantly different.]
[Figure: these two sample means are NOT significantly different due to the smaller difference and high variability.]
[Figure: even with the same difference between means, if variances are reduced the means can be significantly different.]

77 One-way ANOVA cont'd
Step 1: $H_0$: all sample means are equal; $H_1$: at least one mean is different
Step 2: Find critical value from F table (Table A-5 or H). Tables are for a one-tailed test. ANOVA is always one-tailed.
Step 3: Compute test value from: $F = \dfrac{s_B^2}{s_W^2}$, where $s_B^2$ is the between-groups mean square and $s_W^2$ is the within-groups mean square
Step 4: Make decision. If F > critical value, reject $H_0$.
Step 5: Summarize the results with an ANOVA table. All means are the same, i.e., come from the same population, or at least one mean is significantly different.
Step 6: If a significant difference is found, perform post hoc testing to determine which mean(s) is/are different.

78 ANOVA Summary Table
Source | Sums of squares | d.f. | Mean square | F | P
Between (also called Main effect) | $SS_B$ | k - 1 | $SS_B/(k - 1) = s_B^2$ | $s_B^2 / s_W^2$ |
Within (also called Error term) | $SS_W$ | N - k | $SS_W/(N - k) = s_W^2$ | |
Total | $SS_B + SS_W$ | (k - 1) + (N - k) = N - 1 | | |
Examples:
One-way Factorial
Source | Sums of squares | d.f. | Mean square | F | P
Between | | | | | <0.01
Within | | | | |
Total | | | | |
Two-way Factorial
Source | Sums of squares | d.f. | Mean square | F | P
Factor A | | | | | NS
Factor B | | | | | <0.025
A x B | | | | | <0.005
Within | | | | |
Total | | | | |
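The entries of a one-way ANOVA summary table can be computed directly from the group data. The sketch below uses hypothetical data and a hypothetical function name; 5.14 is the standard F table value for $\alpha$ = 0.05 with 2 and 6 degrees of freedom.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SS_B, SS_W, df between, df within, F) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value


# Hypothetical data: three groups of three subjects each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ss_b, ss_w, df_b, df_w, f_value = one_way_anova(groups)

f_crit = 5.14  # alpha = 0.05, d.f. = 2 and 6 (from an F table)
reject_h0 = f_value > f_crit
```

Here the between-groups mean square is 26 / 2 = 13 and the within-groups mean square is 6 / 6 = 1, giving F = 13; a significant F would then call for post hoc testing.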

79 Post Hoc Testing
Post Hoc testing
- used to determine which mean or group of means is/are significantly different from the others
- many different choices depending upon research design and research question (Bonferroni, Duncan's, Scheffé's, Tukey's HSD, ...)
- only done when ANOVA yields a significant F
Scheffé test:
- when sample sizes are unequal
- when most conservative test is desired
- when all pairs of sample means are to be tested
Critical value: Use the critical value from the ANOVA and multiply by k - 1, where k = number of groups (means): $F'_{critical} = (k - 1) F_{critical}$
Test value: $F_s = \dfrac{(\bar{X}_i - \bar{X}_j)^2}{s_W^2 \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}$
Decision: If $F_s > F'_{critical}$, then the two means are significantly different.
Summary: Graph the sample means and summarize.

80 Post Hoc Testing cont'd
Bonferroni test:
- used when a less conservative test is desirable, i.e., more powerful
- when all pairs of sample means are to be tested
Critical value: Use Table A-3 or F; adjust the significance level by dividing $\alpha$ by the number of all possible pairings.
Test value: $t = \dfrac{\bar{X}_i - \bar{X}_j}{\sqrt{s_W^2 \left( \dfrac{1}{n_i} + \dfrac{1}{n_j} \right)}}$
Note, this is the same as taking the square root of the Scheffé test value.
Decision: If $t > t_{critical}$, then the means are significantly different.
Summary: Graph the results and summarize.
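The two post hoc test values can be compared in a short sketch. It assumes the standard Scheffé form F_s = (difference in means)^2 / (s_W^2 (1/n_i + 1/n_j)) and a Bonferroni t equal to its signed square root; all function names and numbers are hypothetical (group means 2 and 6, three subjects per group, within-groups mean square 1, k = 3 groups), and 5.14 is the F table value for $\alpha$ = 0.05 with 2 and 6 degrees of freedom.

```python
import math


def scheffe_value(mean_i, mean_j, n_i, n_j, ms_within):
    """Scheffe test value F_s for one pair of means."""
    return (mean_i - mean_j) ** 2 / (ms_within * (1 / n_i + 1 / n_j))


def scheffe_critical(f_critical, k):
    """F' critical = (k - 1) x ANOVA critical value."""
    return (k - 1) * f_critical


def bonferroni_t(mean_i, mean_j, n_i, n_j, ms_within):
    """Bonferroni t test value for one pair of means."""
    return (mean_i - mean_j) / math.sqrt(ms_within * (1 / n_i + 1 / n_j))


def bonferroni_alpha(alpha, k):
    """Alpha divided by the number of all possible pairings, k(k - 1)/2."""
    return alpha / (k * (k - 1) // 2)


f_s = scheffe_value(2.0, 6.0, 3, 3, 1.0)
f_prime = scheffe_critical(5.14, 3)  # 5.14 from an F table
t_b = bonferroni_t(2.0, 6.0, 3, 3, 1.0)
alpha_pair = bonferroni_alpha(0.05, 3)
```

Squaring the Bonferroni t reproduces the Scheffé value, and with three groups each pairwise comparison is tested at $\alpha$ / 3.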


More information

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

More information

Using Excel for inferential statistics

Using Excel for inferential statistics FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

More information

An Introduction to Statistics using Microsoft Excel. Dan Remenyi George Onofrei Joe English

An Introduction to Statistics using Microsoft Excel. Dan Remenyi George Onofrei Joe English An Introduction to Statistics using Microsoft Excel BY Dan Remenyi George Onofrei Joe English Published by Academic Publishing Limited Copyright 2009 Academic Publishing Limited All rights reserved. No

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

Appendix 2 Statistical Hypothesis Testing 1

Appendix 2 Statistical Hypothesis Testing 1 BIL 151 Data Analysis, Statistics, and Probability By Dana Krempels, Ph.D. and Steven Green, Ph.D. Most biological measurements vary among members of a study population. These variations may occur for

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

THE UNIVERSITY OF TEXAS AT TYLER COLLEGE OF NURSING COURSE SYLLABUS NURS 5317 STATISTICS FOR HEALTH PROVIDERS. Fall 2013

THE UNIVERSITY OF TEXAS AT TYLER COLLEGE OF NURSING COURSE SYLLABUS NURS 5317 STATISTICS FOR HEALTH PROVIDERS. Fall 2013 THE UNIVERSITY OF TEXAS AT TYLER COLLEGE OF NURSING 1 COURSE SYLLABUS NURS 5317 STATISTICS FOR HEALTH PROVIDERS Fall 2013 & Danice B. Greer, Ph.D., RN, BC dgreer@uttyler.edu Office BRB 1115 (903) 565-5766

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

How To Understand And Solve A Linear Programming Problem

How To Understand And Solve A Linear Programming Problem At the end of the lesson, you should be able to: Chapter 2: Systems of Linear Equations and Matrices: 2.1: Solutions of Linear Systems by the Echelon Method Define linear systems, unique solution, inconsistent,

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

Statistical Functions in Excel

Statistical Functions in Excel Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,

More information

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: What do the data look like? Data Analysis Plan The appropriate methods of data analysis are determined by your data types and variables of interest, the actual distribution of the variables, and the number of cases. Different analyses

More information

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment

More information

Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

More information

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

TABLE OF CONTENTS. About Chi Squares... 1. What is a CHI SQUARE?... 1. Chi Squares... 1. Hypothesis Testing with Chi Squares... 2

TABLE OF CONTENTS. About Chi Squares... 1. What is a CHI SQUARE?... 1. Chi Squares... 1. Hypothesis Testing with Chi Squares... 2 About Chi Squares TABLE OF CONTENTS About Chi Squares... 1 What is a CHI SQUARE?... 1 Chi Squares... 1 Goodness of fit test (One-way χ 2 )... 1 Test of Independence (Two-way χ 2 )... 2 Hypothesis Testing

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

More information

Basic Probability and Statistics Review. Six Sigma Black Belt Primer

Basic Probability and Statistics Review. Six Sigma Black Belt Primer Basic Probability and Statistics Review Six Sigma Black Belt Primer Pat Hammett, Ph.D. January 2003 Instructor Comments: This document contains a review of basic probability and statistics. It also includes

More information

Basic Concepts in Research and Data Analysis

Basic Concepts in Research and Data Analysis Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the

More information

Chi Square Tests. Chapter 10. 10.1 Introduction

Chi Square Tests. Chapter 10. 10.1 Introduction Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information

THE BINOMIAL DISTRIBUTION & PROBABILITY

THE BINOMIAL DISTRIBUTION & PROBABILITY REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution

More information

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. MATHEMATICS: THE LEVEL DESCRIPTIONS In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. Attainment target

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

IBM SPSS Statistics for Beginners for Windows

IBM SPSS Statistics for Beginners for Windows ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

Chapter 4. Probability and Probability Distributions

Chapter 4. Probability and Probability Distributions Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

More information

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

More information

Algebra 1 Course Information

Algebra 1 Course Information Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

More information

Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

More information

Opgaven Onderzoeksmethoden, Onderdeel Statistiek

Opgaven Onderzoeksmethoden, Onderdeel Statistiek Opgaven Onderzoeksmethoden, Onderdeel Statistiek 1. What is the measurement scale of the following variables? a Shoe size b Religion c Car brand d Score in a tennis game e Number of work hours per week

More information

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph. MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?

More information

Prentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009

Prentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009 Content Area: Mathematics Grade Level Expectations: High School Standard: Number Sense, Properties, and Operations Understand the structure and properties of our number system. At their most basic level

More information

Descriptive Analysis

Descriptive Analysis Research Methods William G. Zikmund Basic Data Analysis: Descriptive Statistics Descriptive Analysis The transformation of raw data into a form that will make them easy to understand and interpret; rearranging,

More information

Testing Hypotheses About Proportions

Testing Hypotheses About Proportions Chapter 11 Testing Hypotheses About Proportions Hypothesis testing method: uses data from a sample to judge whether or not a statement about a population may be true. Steps in Any Hypothesis Test 1. Determine

More information