Glossary
Brase: Understandable Statistics, 10e
A | B: This is the notation used to represent the conditional probability of A given B.
A and B: This represents the probability that both events A and B occur. It can be calculated using the multiplication rules of probability.
A or B: This represents the probability that one or both of events A or B occur. It can be calculated by taking the probability of A plus the probability of B minus the probability of both A and B.
Addition rules for probability: For mutually exclusive events, the probabilities can simply be added together. If the events can co-occur, subtract the probability that both occur, which is otherwise counted twice (once for each event).
alpha (type I error), α: The probability of making a type I error is denoted by the Greek letter alpha, α.
Alternate hypothesis, H1: The alternate hypothesis is the statement you will adopt in the situation in which the evidence (data) is so strong that you reject the null hypothesis. A statistical test is designed to assess the strength of the evidence (data) against the null hypothesis.
ANOVA: An analysis of variance (ANOVA) allows us to compare several sample means. ANOVA requires that the groups are independent, randomly selected, and come from normally distributed populations with approximately the same standard deviation. Typically, the null hypothesis states that the groups all come from the same population and therefore have the same mean.
Area under the standard normal curve: There are extensive tables that show the area under the standard normal curve for almost any interval along the z axis. The areas are important because each area is equal to the probability that the measurement of an item selected at random falls in this interval.
Areas under any normal curve: To find areas and probabilities for a random variable x that follows a normal distribution, convert x values to z values.
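The conversion from x values to z values described under "Areas under any normal curve" can be sketched in Python. This is a minimal illustration, not the textbook's procedure: it uses the standard library's NormalDist in place of a printed table, and the mean, standard deviation, and interval endpoints are hypothetical.

```python
from statistics import NormalDist

def to_z(x, mu, sigma):
    """Convert a raw x value to a standard z value."""
    return (x - mu) / sigma

# Hypothetical example: x is normal with mean 10 and standard deviation 2.
mu, sigma = 10.0, 2.0
z_low = to_z(8.0, mu, sigma)
z_high = to_z(12.0, mu, sigma)

# The area under the standard normal curve between the two z values
# is the probability that x falls between 8 and 12.
standard_normal = NormalDist()  # mean 0, standard deviation 1
area = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)
print(z_low, z_high, round(area, 4))
```

Here the interval runs from one standard deviation below the mean to one above it, so the area should agree with the roughly 68% figure quoted for normal distributions.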
Arithmetic mean: The arithmetic mean is often simply referred to as the mean.
Average: An average is a single number used to describe an entire sample or population.
Back-to-back stem plot: Back-to-back stem plots are used to compare two sets of data that share common stems. The stems are aligned vertically in a central column. The first set of leaves is displayed to the right, as in a regular stem-and-leaf display. The second set of leaves is displayed to the left, increasing outward.
Bar graph: In a bar graph, bars are of uniform width and uniformly spaced. The bars can be vertical or horizontal. The lengths of the bars represent values of the variable being displayed, the frequency of occurrence, or the percentage of occurrence. The same measurement scale is used for the length of each bar. Bar graphs should be well labeled.
Bayes's theorem: Bayes's theorem is an important relation for conditional probabilities that lets us calculate an unknown conditional probability based on other known probabilities.
Bell-shaped curve: The normal curve is also called a bell-shaped curve.
beta (type II error), β: The probability of making a type II error is denoted by the Greek letter beta, β.
Bimodal distribution: This term refers to a histogram in which the two classes with the largest frequencies are separated by at least one class. The top two frequencies of these classes may have slightly different values. This type of situation sometimes indicates that we are sampling from two different populations.
Binomial coefficient: The binomial coefficient represents the number of combinations of n distinct objects taken r at a time.
Binomial experiment: The central problem of a binomial experiment is to find the probability of r successes out of n trials. Each trial is independent and has one of two outcomes, success or failure.
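As a rough illustration of the Bayes's theorem entry, the sketch below computes P(A | B) from P(B | A), P(A), and P(B | not A). The function name and the numbers are hypothetical, and the two-event form used here is only the simplest case of the theorem.

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A | B) for an event A and its complement."""
    p_not_a = 1.0 - p_a
    # Total probability of B, split over A and not A.
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b

# Hypothetical values: P(B | A) = 0.9, P(A) = 0.01, P(B | not A) = 0.05.
posterior = bayes(p_b_given_a=0.9, p_a=0.01, p_b_given_not_a=0.05)
print(round(posterior, 4))
```

Even though B is very likely when A occurs, the posterior stays small because A itself is rare in this example.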

Binomial probability distribution: The binomial probability distribution can be used to compute the probability of r successes for any number of trials. To find the binomial probability, take the probability of getting one outcome with r successes and n - r failures and multiply it by the number of outcomes that have r successes and n - r failures.
Block: A block is a group of individuals sharing some common features that might affect the treatment.
Box-and-whisker plot: A box-and-whisker plot is a visual representation of a five-number summary. To create a box-and-whisker plot, draw a vertical scale to include the lowest and highest data values. To the right of the scale, draw a box from Q1 to Q3. Include a solid line through the box at the median level. Draw vertical lines, called whiskers, from Q1 to the lowest value and from Q3 to the highest value.
c confidence interval: The value c is the proportion of confidence intervals, based on random samples of size n, that actually contain the population mean.
Categorical variable: Sometimes qualitative variables are referred to as categorical variables.
Census: In a census, measurements or observations from the entire population are used.
Central limit theorem: The central limit theorem says that x can have any distribution whatsoever, but as the sample size gets larger and larger, the distribution of x-bar will approach a normal distribution.
Chebyshev's theorem: The data spread about the mean can be expressed generally for all distributions by Chebyshev's theorem. For any set of data (either population or sample) and for any constant k greater than 1, the proportion of the data that must lie within k standard deviations on either side of the mean is at least 1 minus the reciprocal of k squared.
Chi-square distribution: The chi-square distribution is non-symmetrical and varies depending on the degrees of freedom.
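The binomial probability and Chebyshev's theorem entries both describe short calculations, which can be sketched as follows. The function names and the example values (3 trials, p = 0.5, k = 2) are hypothetical choices for illustration.

```python
from math import comb

def binomial_probability(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1.0 - p  # probability of failure on a single trial
    # Number of outcomes with r successes, times the probability of one such outcome.
    return comb(n, r) * p**r * q**(n - r)

def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1.0 - 1.0 / k**2

prob_two_of_three = binomial_probability(3, 2, 0.5)
bound_two_sd = chebyshev_lower_bound(2)
print(prob_two_of_three, bound_two_sd)
```

For k = 2 the bound says at least three quarters of any data set lies within two standard deviations of the mean, whatever the shape of the distribution.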
Chi-square U, chi-square L: Chi-square U is the value for the upper area of the curve (the right tail), while chi-square L is the value for the lower area of the curve (the left tail).
Circle graph: A circle graph is another name for a pie chart.
Class boundaries: There is a space between the upper limit of one class and the lower limit of the next class. The halfway points of these intervals are called class boundaries.
Class frequency: Examine each data value. Determine which class contains the data value and make a tally mark beside that class. The class frequency for a class is the number of tally marks corresponding to that class.
Class lower limit: The lower class limit is the lowest data value that can fit in a class.
Class mark: The class mark is another name for the class midpoint.
Class midpoint: The center of each class is called the midpoint (or class mark). The midpoint is often used as a representative value of the entire class. The midpoint is found by adding the lower and upper class limits of one class and dividing by 2.
Class upper limit: The upper class limit is the highest data value that can fit in a class.
Class width: The class width is the difference between the lower class limit of one class and the lower class limit of the next class. To find the class width, subtract the smallest data value from the largest data value. Divide the result by the desired number of classes, and increase the computed value to the next highest whole number.
Cluster sample: Divide the entire population into pre-existing segments or clusters. The clusters are often geographic. Make a random selection of clusters. Include every member of each selected cluster in the sample.
Coefficient of determination, r²: The coefficient of determination r² is the square of the sample correlation coefficient r. It allows us to determine how good the least-squares line is as an instrument of regression. It is calculated as the ratio of explained variation over total variation.
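The class width and class midpoint calculations can be sketched as below. The data values are hypothetical, and rounding up with a ceiling is an assumption: it leaves a quotient that is already a whole number unchanged, which may differ from the convention used in the textbook.

```python
from math import ceil

def class_width(data, number_of_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    raw = (max(data) - min(data)) / number_of_classes
    # Assumption: an exactly whole quotient is kept as is.
    return ceil(raw)

def class_midpoint(lower_limit, upper_limit):
    """Midpoint (class mark) of a class from its lower and upper limits."""
    return (lower_limit + upper_limit) / 2

# Hypothetical data with smallest value 1 and largest value 47.
data = [1, 12, 19, 23, 30, 38, 47]
width = class_width(data, 6)
midpoint = class_midpoint(1, 8)
print(width, midpoint)
```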

Coefficient of multiple determination: The coefficient of multiple determination allows us to determine how good a fit the least-squares regression is for a given set of data. The coefficient of multiple determination is a direct generalization of the concept of coefficient of determination between two variables.
Coefficient of variation: The coefficient of variation is used to express the standard deviation as a percentage of the sample or population mean. Calculate the coefficient of variation by dividing the standard deviation by the mean and multiplying the result by 100.
Column total: In a contingency table, the column total gives us the total number of data points that correspond with one of the variables. The column totals should sum to the number of total data points.
Combinations rule: The number of combinations of n objects taken r at a time is the number of permutations of n objects taken r at a time divided by r factorial, where n and r are whole numbers and n is greater than or equal to r. Another commonly used notation for combinations is nCr.
Complement of event A: The complement of event A is the event that A does not occur. The probability of an event and the probability of its complement sum to 1.
Completely randomized design: For one-way ANOVA, we have one factor. Different levels for the factor form the treatment groups under study. In a completely randomized design, independent random samples of experimental subjects or objects are selected for each treatment group.
Completely randomized experiment: A completely randomized experiment is one in which a random process is used to assign each individual to one of the treatments.
Conditional probability: Conditional probability is the probability that a dependent event will occur given that another event has occurred.
Confidence interval for mean difference (standard deviations known): To find the confidence interval for a difference of population means with known standard deviations, first obtain independent random samples from the two populations. If you can assume that both population distributions are normal, any sample sizes will work. If you cannot assume this, then use sample sizes greater than or equal to 30 for both populations. The confidence interval for the difference of population means is the difference in sample means, plus or minus the margin of error. To calculate the margin of error, take the first population variance divided by the first sample size, and add it to the second population variance divided by the second sample size. Take the square root of the calculated value and multiply by the critical value zc for the desired confidence level c.
Confidence interval for mean difference (standard deviations unknown): When the population standard deviations are unknown, we turn to a Student's t distribution to estimate the difference in population means. The confidence interval for the difference of population means is the difference in sample means, plus or minus the margin of error. To calculate the margin of error, take the first sample variance divided by the first sample size, and add it to the second sample variance divided by the second sample size. Take the square root of the calculated value and multiply by the critical value tc for the desired confidence level c. When determining tc, use the degrees of freedom given by the smaller sample size minus 1.
Confidence interval for p: A c confidence interval for p is the interval between p-hat minus the margin of error and p-hat plus the margin of error, computed so that c is the probability of generating an interval that contains p.
Confidence interval for p1 - p2: The confidence interval for the difference of the success probabilities of two binomial distributions is the difference in p-hat values, plus or minus the margin of error. To calculate the margin of error, first multiply together the point estimates for success and failure and divide by the number of trials. Do this for both distributions and add the results together to obtain the variance. Take the square root and multiply by the critical value to obtain the margin of error.
Confidence interval for the population mean: A c confidence interval for the population mean is an interval computed from sample data in such a way that c is the probability of generating an interval containing the actual value of the population mean.
Confidence interval for the variance: There are situations where we are interested in estimating the variability of a distribution rather than the expected value.
Confidence level c: The reliability of an estimate is measured by the confidence level. Suppose we want a confidence level of c. Theoretically, you can choose c to be any value between 0 and 1, but usually c is equal to a number such as 0.90 or 0.95.
Confounding variable: Two variables are confounded when the effects of one cannot be distinguished from the effects of the other. Confounding variables may be part of the study, or they may be outside lurking variables.
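The margin-of-error recipe for a mean difference with known standard deviations can be sketched as follows. This is an illustration under stated assumptions: the critical value zc is obtained from the standard library's NormalDist rather than a table, and all sample statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(c):
    """Critical value zc: the area between -zc and zc under the standard normal curve is c."""
    return NormalDist().inv_cdf((1 + c) / 2)

def mean_difference_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, c):
    """Confidence interval for mu1 - mu2 when both population standard deviations are known."""
    # Variance of each sample mean, added together, then square-rooted.
    margin = z_critical(c) * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    difference = xbar1 - xbar2
    return difference - margin, difference + margin

# Hypothetical samples: means 50 and 45, sigmas 4 and 3, sizes 40 and 36.
low, high = mean_difference_interval(50.0, 45.0, 4.0, 3.0, 40, 36, 0.95)
print(round(low, 2), round(high, 2))
```

Because the whole interval lies above zero in this example, the data would be consistent with the first population mean being larger at the 95% confidence level.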

Contingency table with cells: A contingency table is used to record the expected frequencies when comparing two factors. Each cell in the table corresponds to a specific combination of the two factors we are interested in measuring. Based on the null hypothesis, we should be able to pre-calculate the expected values for each cell in the table by making assumptions about the probabilities for each factor.
Continuity correction: Adjusting the values of discrete random variables to obtain a corresponding range for a continuous random variable is called making a continuity correction. If the discrete variable is a left point of an interval, subtract 0.5 to obtain the corresponding normal variable. If the discrete variable is a right point of an interval, add 0.5 to obtain the corresponding normal variable.
Continuity correction for a p-hat distribution: For a number of successes r and a total number of trials n, continuity correction can be used to convert a discrete p-hat distribution to a continuous x distribution. If r/n is the right endpoint of a p-hat interval, add 0.5/n to get the corresponding right endpoint of the x interval. If r/n is the left endpoint of a p-hat interval, subtract 0.5/n to get the corresponding left endpoint of the x interval.
Continuous random variable: A continuous random variable can take on any of the countless number of values in a line interval.
Control chart: If we are examining data over a period of equally spaced time intervals or in some sequential order, then control charts are especially useful. Control charts combine graphic and numerical descriptions of data with probability distributions. A control chart for a variable x is a plot of the observed x values in time sequence order.
Control group: In general, a control group is used to account for the influence of other known or unknown variables that might be an underlying cause of a change in response in the experimental group.
Convenience sample: Create a sample by using data from population members that are readily available.
Correlation and causation: The correlation coefficient is a mathematical tool for measuring the strength of a linear relationship between two variables. As such, it makes no implication about cause or effect.
Correlation between averages: The correlation between two variables consisting of averages is usually higher than the correlation between two variables representing corresponding raw data. One reason is that the use of averages reduces the variation that exists between individual measurements.
Criteria for using normal approximation to binomial, np > 5 and nq > 5: For a distribution with a sufficiently large number of trials, the normal distribution can be used to approximate the binomial distribution. The number of trials multiplied by the probability of failure should be greater than 5. The number of trials multiplied by the probability of success should also be greater than 5, and this value can be used as the mean. Multiply together the number of trials, the probability of success, and the probability of failure to obtain the variance. Take the square root of the variance to get the standard deviation.
Critical region: The values of a distribution for which we reject the null hypothesis are called the critical region of the distribution. Depending on the alternate hypothesis, the critical region is located on the left side, the right side, or both sides of the distribution.
Critical value: Critical values are the boundaries of the critical region. Critical values are designated as z0 for the standard normal distribution.
Critical values tc: Critical values tc for a c confidence level indicate the values such that an area equal to c under the t distribution for a given number of degrees of freedom falls between -tc and tc.
Critical values zc: For a confidence level c, the critical value zc is the number such that the area under the standard normal curve between -zc and zc equals c.
Cumulative frequency: The cumulative frequency for a class is the sum of the frequencies for that class and all previous classes.
Degrees of freedom: Each Student's t distribution corresponds to a number of degrees of freedom, abbreviated d.f. For the methods used in this section, the number of degrees of freedom is given by the formula d.f. = n - 1, where n is the sample size. Each choice for d.f. gives a different t distribution.
Degrees of freedom d.f. for denominator for F distribution: The degrees of freedom for the denominator of an F distribution is typically the total sample size across all groups minus the number of groups.
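The entries on the criteria for using the normal approximation to the binomial and on continuity correction fit together in one short calculation, sketched below. The function name and the example (50 trials, p = 0.5, at least 30 successes) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def normal_approximation(n, p):
    """Normal distribution approximating a binomial, if np > 5 and nq > 5."""
    q = 1.0 - p
    if not (n * p > 5 and n * q > 5):
        raise ValueError("criteria np > 5 and nq > 5 are not met")
    # Mean is np; standard deviation is the square root of npq.
    return NormalDist(mu=n * p, sigma=sqrt(n * p * q))

approx = normal_approximation(50, 0.5)

# P(r >= 30): 30 is the left point of the interval, so subtract 0.5
# to get the corresponding value of the continuous normal variable.
prob_at_least_30 = 1.0 - approx.cdf(30 - 0.5)
print(approx.mean, round(approx.stdev, 4), round(prob_at_least_30, 4))
```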

Degrees of freedom d.f. for numerator for F distribution: The degrees of freedom for the numerator of an F distribution is typically the number of sample groups minus 1.
Degrees of freedom d.f. for the chi-square distribution and tests of independence: The degrees of freedom for a chi-square test of independence can be found by taking the number of rows minus 1 and multiplying by the number of columns minus 1.
Degrees of freedom d.f. for the chi-square distribution and goodness-of-fit test: For a goodness-of-fit test, the number of degrees of freedom is the number of categories minus 1.
Degrees of freedom for testing population mean (population standard deviation unknown): If the standard deviation is unknown, you can still estimate the population mean by using a t distribution. If you can assume that your random variable is normally distributed, any sample size will work. Otherwise, be sure to choose a sample size greater than or equal to 30. Use the sample size minus 1 to obtain the degrees of freedom and select a t distribution.
Degrees of freedom for testing the difference in population means when the population standard deviations are unknown: If the standard deviations for the two distributions are unknown, you can still estimate the population mean difference by using a t distribution. If you can assume that your random variables are both normally distributed, any sample sizes will work. Otherwise, be sure to choose sample sizes greater than or equal to 30. Use the smaller of the two sample sizes minus 1 to obtain the degrees of freedom and select a t distribution.
Dependent events: If events are dependent, the probability of one event depends upon the occurrence of the other event.
Dependent samples: Two samples are dependent if there is a relationship between corresponding data values in the two samples. Paired data are an example of dependent samples.
Descriptive statistics: Descriptive statistics involves methods of organizing, picturing, and summarizing information from samples or populations.
Discrete random variable: A discrete random variable can take on only a finite number of values or a countable number of values.
Dotplot: A dotplot is somewhat similar to a histogram. In a dotplot, the data values are displayed along the horizontal axis. A dot is then plotted over each data value in the data set.
Double-blind experiment: This means that neither the individuals in the study nor the observers know which subjects are receiving the treatment. Double-blind experiments help control for subtle biases that an observer might pass on to a subject.
EDA: EDA stands for Exploratory Data Analysis.
Empirical rule: For a distribution that is symmetrical and bell-shaped (in particular, for a normal distribution): Approximately 68% of the data values will lie within one standard deviation on each side of the mean. Approximately 95% of the data values will lie within two standard deviations on each side of the mean. Approximately 99.7% (or almost all) of the data values will lie within three standard deviations on each side of the mean.
Equally likely outcomes: When outcomes are equally likely, the probability of an event is simply the number of favorable outcomes divided by the total number of outcomes.
Error variation in ANOVA: The error variation corresponds to the within-group variation of one-way ANOVA.
Expected frequency of a cell, E: In a contingency table, we might propose the null hypothesis that two factors are independent. In such a case, we can calculate the expected frequency of a cell by multiplying the sample size by the probabilities of the two factors, which are assumed to be independent. Computationally, this is equivalent to multiplying the row total for one factor by the column total for the other factor, and dividing the result by the sample size.
Expected value: The mean of a probability distribution is often called the expected value of the distribution. The expected value is an average value and need not be a point of the sample space.
Experiment: In an experiment, a treatment is deliberately imposed on the individuals in order to observe a possible change in the response or variable being measured.
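The expected frequency calculation for a contingency table (row total times column total, divided by the sample size) can be sketched as below, together with the degrees of freedom for a test of independence. The observed counts are hypothetical.

```python
def expected_frequencies(observed):
    """Expected cell frequencies under the null hypothesis of independence."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    sample_size = sum(row_totals)
    # E = (row total)(column total) / (sample size) for each cell.
    return [[r * c / sample_size for c in column_totals] for r in row_totals]

def independence_degrees_of_freedom(observed):
    """(number of rows - 1) times (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

# Hypothetical 2 x 2 table of observed counts.
observed = [[10, 20], [30, 40]]
expected = expected_frequencies(observed)
degrees_of_freedom = independence_degrees_of_freedom(observed)
print(expected, degrees_of_freedom)
```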

Explained variation: Explained variation is defined as y-hat minus y-bar. It represents the difference between a base value y-bar and the least-squares line value y-hat.
Explanatory variable: In a scatter diagram, we call x the explanatory variable.
Exploratory data analysis: Exploratory data analysis techniques are particularly useful for detecting patterns and extreme data values. They are designed to help us explore a data set, to ask questions we had not thought of before, or to pursue leads in many directions.
Extrapolation: Predicting y values for x values that are beyond the observed x values in the data set is called extrapolation. Extrapolation may produce unrealistic forecasts.
F distribution: The F distribution can be used to test two population variances. The F distribution is skewed to the right, and its values are always greater than zero. It depends on two separate degrees of freedom, one for each of the populations being tested.
F ratio: The F ratio is the sample test statistic for the F distribution. It can be calculated as the ratio of sample variances. If the two population variances are hypothesized to be the same, then the F ratio should be approximately 1.
Factor in two-way ANOVA: In a two-way ANOVA model, the two variables are called factors.
Factorial: For a counting number n, its factorial is the product of n with each of the positive counting numbers less than n. By special definition, the factorial of zero is 1.
Faulty recall: Respondents may not accurately remember when or whether an event took place.
Five-number summary: The quartiles together with the low and high data values give us a very useful five-number summary of the data and their spread.
Frequency: Frequency is the number of times that a value appears within a set of data.
Frequency distribution: A frequency distribution reflects the way that values occur with varying frequency within a set of data.
Frequency table: A frequency table partitions data into classes or intervals and shows how many data values are in each class. The classes or intervals are constructed so that each data value falls into exactly one class.
Gaussian distribution: The normal distribution is sometimes called Gaussian after a mathematician who studied it, Carl Friedrich Gauss.
Geometric mean: When data consist of percentages, ratios, growth rates, or other rates of change, the geometric mean is a useful measure of central tendency. For n data values, multiply them together and take the nth root to calculate the geometric mean. This assumes all data values are positive.
Geometric probability distribution: A geometric probability distribution is used to calculate the probability that our first success comes on the nth trial. The probability for the nth trial is given by the probability of success multiplied by the probability of failure raised to the n - 1 power.
Goodness-of-fit test: The goodness-of-fit test allows us to determine whether a population follows a specified distribution. In other words, we are testing the null hypothesis that a population fits a given distribution. For goodness-of-fit tests, we use a right-tailed test on the chi-square distribution. This is because we are testing to see if the chi-square measure of the difference between the observed and expected frequencies is too large to be due to chance alone.
Harmonic mean: When data consist of rates of change, such as speeds, the harmonic mean is an appropriate measure of central tendency. Sum together the reciprocals of each data value. Take the total number of values and divide by the computed sum to obtain the harmonic mean. This assumes no data value is 0.
Hidden bias: The question may be worded in such a way as to elicit a specific response. The order of questions might lead to biased responses. Also, the number of responses on a Likert scale may force responses that do not reflect the respondent's feelings or experience.
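The geometric mean, harmonic mean, and geometric probability entries each describe a short computation. The sketch below follows those descriptions directly; the function names and sample values are hypothetical.

```python
from math import prod

def geometric_mean(values):
    """nth root of the product of n positive data values."""
    return prod(values) ** (1 / len(values))

def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals (no value may be 0)."""
    return len(values) / sum(1 / v for v in values)

def geometric_probability(n, p):
    """Probability that the first success comes on trial n."""
    return p * (1 - p) ** (n - 1)

# Hypothetical examples.
growth_factor_mean = geometric_mean([2, 8])
average_speed = harmonic_mean([40, 60])
first_success_on_third = geometric_probability(3, 0.5)
print(growth_factor_mean, average_speed, first_success_on_third)
```

The harmonic mean of 40 and 60 comes out below their arithmetic mean of 50, which matches the intuition that more time is spent at the slower speed over equal distances.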

Histogram: In histograms, we use bars to visually represent each class. The width of the bar is the class width, and the height of the bar is the class frequency.
Homogeneity test: A test of homogeneity tests the claim that different populations share the same proportions of specified characteristics. This enables us to determine whether several populations share the same proportions of distinct categories. The computational processes for conducting tests of independence and tests of homogeneity are the same. The two main differences are the sampling method and the hypotheses.
Hypergeometric probability distribution: The hypergeometric distribution is a probability distribution of a random variable that has two outcomes when sampling is done without replacement. This is the distribution that is appropriate when the population is so small relative to the sample that sampling without replacement results in trials that are not even approximately independent.
Hypotheses: Hypotheses are assertions that you assume to be true for the purposes of investigation.
Hypothesis testing: Hypothesis testing is used to examine the validity of a hypothesis, such as the value of a parameter estimate. The central question in hypothesis testing is whether or not you think the value of the sample test statistic is too far away from the value of the population parameter proposed in the null hypothesis to occur by chance alone.
Hypothesis tests about the variance: There are situations where we are interested in testing variability of a distribution rather than the expected value, perhaps to find out whether variability increases or decreases given certain conditions. Tests of variance can be left-tailed, right-tailed, or two-tailed.
Hypothesis tests about two variances: It is sometimes useful to test the variances of two independent, normally distributed populations. The F distribution can be used to test the null hypothesis that the populations share the same variance given a desired level of significance.
Independence test: An independence test is used to determine whether or not two factors are related to each other. This is often determined using a chi-square test.
Independent events: Two events are independent if the occurrence or nonoccurrence of one does not change the probability that the other will occur.
Independent samples: Two samples are independent if sample data drawn from one population are completely unrelated to the selection of sample data from the other population.
Independent trials: Trials are independent if the result of one trial has no effect on the results of other trials.
Individuals: Individuals are the people or objects included in the study.
Inferential statistics: Inferential statistics involves methods of using information from a sample to draw conclusions regarding the population.
Inflection points: The exact places on the normal curve where the transition between the upward and downward cupping occurs are above the points one standard deviation away from the mean. In the terminology of calculus, transition points such as these are called inflection points.
Interaction in two-way ANOVA: In a two-way ANOVA model, be sure to test for interaction between the two factors. If you reject the null hypothesis of no interaction, then you should not test for a difference of means in the levels of the row factors or a difference of means in the levels of the column factors because the interaction of the factors makes interpretation of the results of the main effects more complicated.
Interpolation: Predicting y values for x values that are between observed x values in the data set is called interpolation.
Interquartile range: The interquartile range is the difference between the third and first quartiles.
Interval level: The interval level of measurement applies to data that can be arranged in order. In addition, differences between data values are meaningful.

Interviewer influence: Factors such as tone of voice, body language, dress, gender, authority, and ethnicity of the interviewer might influence responses.
Law of large numbers: In the long run, as the sample size increases, the relative frequencies of outcomes get closer and closer to the theoretical (or actual) probability value.
Leaf: In a stem-and-leaf display, the rightmost part is called the leaf.
Least-squares criterion: One way to find a linear equation to represent a set of points in a scatter diagram is to use the least-squares criterion. This states that the sum of the squares of the vertical distances from the data points (x, y) to the line must be made as small as possible.
Least-squares line, y-hat = a + bx: We use the notation y-hat = a + bx for the least-squares line. Algebra tells us that b is the slope and a is the intercept of the line. In this context, y-hat represents the value of the response variable y estimated using the least-squares line and a given value of the explanatory variable x.
Left-tailed test: A statistical test is left-tailed if the alternate hypothesis states that the parameter is less than the value claimed in the null hypothesis.
Level in two-way ANOVA: In a two-way ANOVA model, the levels of a factor are the different values the factor can assume.
Level of significance, α: The probability with which we are willing to risk a type I error is called the level of significance of a test. It is denoted by the Greek letter alpha, α.
Levels of measurement: These levels indicate the type of arithmetic that is appropriate for the data, such as ordering, taking differences, or taking ratios.
Likert scale: Sometimes survey respondents choose a number on a scale that represents their feelings from, say, strongly disagree to strongly agree. Such a scale is called a Likert scale.
Linear combination of two independent random variables: Let x1 and x2 be independent random variables, and let a and b be any constants. Then the new random variable W = ax1 + bx2 is called a linear combination of x1 and x2.
Linear function of a random variable: Let a and b be any constants, and let x be a random variable. Then the new random variable L = a + bx is called a linear function of x.
Lurking variable: A lurking variable is one for which no data have been collected but that nevertheless has influence on other variables in the study. The fact that two variables tend to increase or decrease together does not mean a change in one is causing a change in the other. A strong correlation between x and y is sometimes due to lurking variables.
Main effects in two-way ANOVA: In a two-way ANOVA model, the hypothesis regarding each separate factor is called a main effect.
Margin of error: The margin of error is the magnitude (i.e., the absolute value) of the difference between the sample point estimate and the true population parameter value.
Margin of error for polls: Some polls clarify the meaning of the margin of error further by saying that it is an error due to sampling. In most polls, the margin of error is given for a 95% confidence interval.
Maximal margin of error: The margin of error is the magnitude of the difference between the sample mean and the population mean. In most practical problems, the population mean is unknown, so the margin of error is also unknown. However, we can compute an error tolerance E that serves as a bound on the margin of error. Using a c level of confidence, we can say that the point estimate differs from the population mean by at most the maximal margin of error E. The maximal margin of error is zc multiplied by the population standard deviation and divided by the square root of the sample size.
Mean: An average that uses the exact value of each entry is the mean (sometimes called the arithmetic mean). To compute the mean, we add the values of all the entries and then divide by the number of entries.
Mean for the binomial distribution: The mean for a binomial distribution is the number of trials multiplied by the probability of success on a single trial.

Mean of a probability distribution The mean represents a central point or cluster point for the entire distribution.
Mean of grouped data When data are grouped, such as in a frequency table or histogram, we can estimate the mean. For each class, multiply the class midpoint by the number of entries in that class. Take the sum of the computed values and divide by the total number of entries to obtain the estimated mean.
Mean of the p-hat distribution The mean of the p-hat distribution is simply the probability of a successful outcome for a single trial.
Mean of the x-bar distribution The mean of the x-bar distribution is the same as the mean of the x distribution, denoted with the Greek letter mu, μ.
Meaning of slope In the equation y-hat = a + bx, the slope b tells us how many units y-hat changes for each unit change in x.
Median The median is the central value of an ordered distribution. To find the median, order the data from smallest to largest. For an odd number of data values, the median is the middle data value. For an even number of data values, take the sum of the two middle values and divide by two to obtain the median.
Mode The mode of a data set is the value that occurs most frequently.
Monotone relationship In a monotone relationship between variables x and y, y must always increase or always decrease as x increases.
Mound-shaped symmetric distribution This term refers to a histogram in which both sides are (more or less) the same when the graph is folded vertically down the middle.
MSBET, MSW The mean squares are the variance estimates needed for an ANOVA test. MSBET measures variance between groups, and MSW measures the variance within groups. The F-ratio test statistic can be obtained by dividing MSBET by MSW.
Multinomial experiments A multinomial experiment is similar to a binomial experiment, except that it accounts for more than two outcomes.
To use a multinomial distribution, all trials must be independent, and each outcome must fall into exactly one category, with the same category probabilities on every trial. The test of independence and the goodness-of-fit test are both important in multinomial experiments.
Multiple regression We have statistical methods for predicting one variable in terms of another single variable. However, we can improve the reliability of our predictions if we include more relevant data and corresponding random variables in the computation of our predictions. This is done using methods of multiple regression.
Multiplication rule of counting The total number of possible outcomes for a sequence of events is the product of the number of possibilities for each event in the sequence.
Multiplication rules of probability (for independent and dependent events) For independent events, the probabilities can simply be multiplied. For dependent events, the probability of one event is multiplied by the conditional probability of the other event given that the first has occurred.
Multistage sample Use a variety of sampling methods to create successively smaller groups at each stage. The final sample consists of clusters.
Mutually exclusive events Two events are mutually exclusive or disjoint if they cannot occur together. In particular, events A and B are mutually exclusive if P(A and B) = 0.
Negative binomial distribution Given a number of successes k, where the kth success occurs on trial n, we can describe the probability distribution of n using the negative binomial distribution. When k is 1, this is the geometric probability distribution.
Negative correlation If low values of x are associated with high values of y and high values of x are associated with low values of y, the variables are said to be negatively correlated.
No linear correlation If the points of a scatter diagram are located so that no line is realistically a good fit, we then say that the points possess no linear correlation.

Nominal level The nominal level of measurement applies to data that consist of names, labels, or categories. There are no implied criteria by which the data can be ordered from smallest to largest.
Nonparametric statistics Nonparametric methods require no assumptions about the population distributions from which samples are drawn. The obvious advantages of these tests are that they are quite general and not difficult to apply. The disadvantages are that they tend to waste information and tend to result in acceptance of the null hypothesis more often than they should. As such, nonparametric tests are sometimes less sensitive than other tests.
Non-parametric test Non-parametric tests are useful when you cannot make assumptions about the shape or size of a population distribution. The disadvantage of non-parametric tests is that they are less sensitive in that they tend to accept the null hypothesis more often than they should.
Nonresponse Individuals either cannot be contacted or refuse to participate. Nonresponse can result in significant undercoverage of a population.
Nonsampling error A nonsampling error is the result of poor sample design, sloppy data collection, faulty measuring instruments, bias in questionnaires, and so on.
Normal approximation to the binomial distribution For a distribution with a sufficiently large number of trials, the normal distribution can be used to approximate the binomial distribution. The number of trials multiplied by the probability of failure should be greater than 5. The number of trials multiplied by the probability of success should also be greater than 5, and this value can be used as the mean. Multiply together the number of trials, the probability of success, and the probability of failure to obtain the variance. Take the square root of the variance to get the standard deviation.
Normal curves The graph of a normal distribution is called a normal curve.
Normal distributions One of the most important examples of a continuous probability distribution is the normal distribution.
Normality indicators There are several indicators that can be used to determine if data have a normal distribution. A histogram of the distribution should be roughly bell-shaped. There should be no more than one outlier, where an outlier is a value that lies above the third quartile or below the first quartile by more than 1.5 times the interquartile range. Normal distributions are symmetric and should have a Pearson's index value between -1 and 1. In addition, a normal quantile plot of the data should have points close to a straight line.
Null hypothesis H0 The null hypothesis is the statement that is under investigation or being tested. Usually the null hypothesis represents a statement of no effect, no difference, or, put another way, things haven't changed.
Observational study In an observational study, observations and measurements of individuals are conducted in a way that doesn't change the response or the variable being measured.
Observed frequency of a cell, O In a contingency table, the observed frequency is simply the number of actual observed data points that share the two factors being compared.
Odds The odds in favor of an event are the ratio of the probability of the event to the probability of its complement.
Ogive An ogive (pronounced "oh-jive") is a graph that displays cumulative frequencies.
One-way ANOVA A single-factor analysis of variance is called one-way ANOVA.
Ordinal level The ordinal level of measurement applies to data that can be arranged in order. However, differences between data values either cannot be determined or are meaningless.
Outlier Some data sets include values so high or so low that they seem to stand apart from the rest of the data. These data are called outliers. Outliers may represent data collection errors, data entry errors, or simply valid but unusual data values.
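Two of the normality indicators above, the outlier count and Pearson's index, are simple to compute. The sketch below assumes Pearson's index is 3(mean - median)/s, where s is the sample standard deviation, and flags values more than 1.5 interquartile ranges beyond the quartiles. Quartiles come from Python's statistics.quantiles with its default method, which may differ slightly from a textbook's hand method. The function names and the data are hypothetical.

```python
from statistics import mean, median, quantiles, stdev


def pearson_index(data):
    """Pearson's index of skewness: 3 * (mean - median) / s."""
    return 3 * (mean(data) - median(data)) / stdev(data)


def find_outliers(data):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(data, n=4)  # default "exclusive" quartile method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical sample with one unusually large value.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 40]
print(find_outliers(sample))
print(round(pearson_index(sample), 2))
```

These two checks complement, rather than replace, looking at a histogram and a normal quantile plot.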
Out-of-control signals A random variable x is said to be out of control if successive time measurements of x indicate that it is no longer following the target probability distribution. This can be used as a warning signal that a process is out of control.
Paired data Paired data can be used when there is a natural matching of characteristics. For example, data pairs occur very naturally in before-and-after situations, where the same object or item is measured both before and after a treatment. Using matched or paired data often can reduce the danger of introducing extraneous or uncontrollable factors into our sample measurements because the matched or paired data have essentially the same characteristics except for the one characteristic that is being measured.
Paired data values (x, y) Studies of correlation and regression of two variables usually begin with a graph of paired data values (x, y).
Parameter A parameter is a numerical measure that describes an aspect of a population.
Parametric test A parametric test is a statistical test that requires certain assumptions such as a normal distribution or a large sample size.
Pareto chart A Pareto chart is a bar graph in which the bar height represents frequency of an event. In addition, the bars are arranged from left to right according to decreasing height.
P-Chart A P-Chart is a control chart for proportions r/n, where r is the number of successes out of a number of trials n.
Pearson correlation coefficient The Pearson correlation coefficient, denoted by the letter r, is a mathematical measurement that describes the strength of the linear association between two variables.
Percentile There are 99 percentiles, and in an ideal situation, the 99 percentiles divide the data set into 100 equal parts. However, if the number of data elements is not exactly divisible by 100, the percentiles will not divide the data into equal parts.
Perfect linear correlation If all the points in a scatter diagram lie on a line, then we have perfect linear correlation. In statistical applications, perfect linear correlation almost never occurs.
Permutations rule The number of ways to arrange in order n distinct objects, taking them r at a time, is n factorial divided by (n - r) factorial, where n and r are whole numbers and n is greater than or equal to r. A commonly used notation for permutations is nPr.
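The permutations rule above can be checked numerically. This short Python sketch computes n factorial divided by (n - r) factorial directly and compares the result with the standard library's math.perm (available in Python 3.8 and later). The function name and the example values are hypothetical.

```python
from math import factorial, perm


def permutations_count(n, r):
    """Number of ordered arrangements of n distinct objects taken r at a time."""
    if not 0 <= r <= n:
        raise ValueError("need whole numbers with 0 <= r <= n")
    # Integer division is exact here because (n - r)! divides n!.
    return factorial(n) // factorial(n - r)


# Hypothetical example: ordered arrangements of 3 objects chosen from 8.
print(permutations_count(8, 3))  # 336
print(permutations_count(8, 3) == perm(8, 3))  # True
```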
Pie chart In a circle graph or pie chart, wedges of a circle visually display proportional parts of the total population that share a common characteristic.
Placebo effect The placebo effect occurs when a subject receives no treatment but (incorrectly) believes he or she is in fact receiving treatment and responds favorably.
Point estimate for p, p-hat For a binomial distribution, p-hat is the number of successes divided by the number of trials. This can be used as a point estimate for p, the population proportion of successes.
Point estimate for the population mean A point estimate of a population parameter is an estimate of the parameter using a single number. A sample mean is a point estimate of the population mean.
Poisson approximation to the binomial For most practical purposes, the Poisson distribution will be a very good approximation to the binomial distribution provided the number of trials n is larger than or equal to 100, and the number of trials n multiplied by the probability of success p is less than 10. As n gets larger and p gets smaller, the approximation becomes better and better.
Poisson probability distribution If we examine the binomial distribution as the number of trials n gets larger and larger while the probability of success p gets smaller and smaller, we obtain the Poisson distribution. The Poisson distribution applies to accident rates, arrival times, defect rates, the occurrence of bacteria in the air, and many other areas of everyday life.
Pooled estimates of proportion, p-bar If two distributions are assumed to have the same proportion of successes, you can use a pooled best estimate, which is the sum of the observed number of successes in both samples divided by the total combined number of trials.
Pooled standard deviation When there is reason to believe that two distributions have the same standard deviation, it is best to use a t distribution with a pooled standard deviation.
The corresponding Student's t distribution has degrees of freedom equal to the sum of both sample sizes minus 2.
Population correlation coefficient rho, ρ From a population of (x, y) pairs, we may be able to compute the population correlation coefficient if certain conditions are met. Specifically, the (x, y) pairs are assumed to be representative of all possible (x, y) pairs, and both x and y values should be normally distributed for their paired y and x values. We denote the population correlation coefficient using the Greek letter rho, ρ.
Population data In population data, the data are from every individual of interest.
Population mean, μ The population mean is taken over the entire population. We use the lowercase Greek letter mu, μ, to represent the population mean.
Population parameters Population parameters are taken over an entire population instead of just a sample. When we see Greek letters used, we know the information given is from the entire population rather than just a sample.
Population size The population size N is the number of all possible data values in the entire population. It is used to calculate the population mean, the population variance, and the population standard deviation.
Population slope beta, β The population slope gives us the rate at which y changes per unit change in x. It is denoted by the Greek letter beta, β, and is part of the population least-squares equation y = α + βx. It is estimated by b in the equation y-hat = a + bx.
Population standard deviation If we have data for the entire population, we can compute the population standard deviation over all data values. Calculate the standard deviation by taking the square root of the population variance.
Population variance If we have data for the entire population, we can compute the population variance over all data values. To find the population variance, divide the sum of squares by the total number of elements in the population. Because we are using the entire population, we don't subtract 1 from the number of elements.
Positive correlation The variables x and y are said to have positive correlation if low values of x are associated with low values of y and high values of x are associated with high values of y.
Power of a test (1 - beta) The quantity 1 - β is called the power of the test and represents the probability of rejecting the null hypothesis when it is, in fact, false.
Probability distribution A probability distribution is an assignment of probabilities to each distinct value of a discrete random variable or to each interval of values of a continuous random variable.
Probability of an event A, P(A) Probability is a numerical measure between 0 and 1 that describes the likelihood that an event will occur. Probabilities closer to 1 indicate that the event is more likely to occur. Probabilities closer to 0 indicate that the event is less likely to occur. P(A), read "P of A," denotes the probability of event A.
Probability of chance The P-value is sometimes called the probability of chance.
Probability of failure In a binomial experiment, trials must result in a success or a failure. The probability of failure is simply 1 minus the probability of success.
Probability of success In a binomial experiment, the probability of success during each individual trial is the same.
P-value Assuming the null hypothesis is true, the probability that the test statistic will take on values as extreme as or more extreme than the observed test statistic (computed from sample data) is called the P-value of the test. The smaller the P-value computed from sample data, the stronger the evidence against the null hypothesis.
Qualitative variable A qualitative variable describes an individual by placing the individual into a category or group, such as male or female.
Quantitative variable A quantitative variable has a value or numerical measurement for which operations such as addition or averaging make sense.
Quartile Quartiles are those percentiles that divide the data into fourths. The first quartile Q1 is the 25th percentile, the second quartile Q2 is the median, and the third quartile Q3 is the 75th percentile.
Quota problem A quota problem uses a binomial distribution to find the number of trials n that provide for a specified number of successes at a given probability.
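One way to work a quota problem like the one described above is to increase n until the binomial probability of at least the required number of successes reaches the desired level. The Python sketch below assumes independent trials with a fixed success probability p. The function names, the search strategy, and the example values are hypothetical illustrations, not a prescribed method.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(r >= k) for a binomial distribution with n trials and success probability p."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k, n + 1))


def smallest_n_for_quota(k, p, target):
    """Smallest n with P(at least k successes in n trials) >= target."""
    if not (0 < p <= 1 and 0 < target < 1 and k >= 1):
        raise ValueError("need 0 < p <= 1, 0 < target < 1, and k >= 1")
    n = k  # at least k trials are needed to get k successes
    while prob_at_least(k, n, p) < target:
        n += 1
    return n


# Hypothetical quota: at least 1 success with probability 0.9 when p = 0.5.
print(smallest_n_for_quota(1, 0.5, 0.9))  # 4
```

With p = 0.5, three trials give a probability of 0.875 of at least one success and four trials give 0.9375, so four is the first n that meets the 0.9 target.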

Random sample Use a simple random sample from the entire population.
Random variable A quantitative variable x is a random variable if the value that x takes on in a given experiment or observation is a chance or random outcome.
Randomization Randomization is used to assign individuals to the two treatment groups. This helps prevent bias in selecting members for each group.
Randomized block design In a randomized block experiment, individuals are first sorted into blocks, and then a random process is used to assign each individual in the block to one of the treatments. When we block experimental subjects or objects together based on a similar characteristic that might affect responses to treatments, we have a block design. The use of blocks can account for some of the most important sources of variability among the experimental subjects or objects. In this way, differences among the treatment groups are more likely to be caused by the treatments themselves rather than by other sources of variability.
Random-number table A random-number table contains pre-generated random numbers. From a random starting point in the table, simply read off a selection of random numbers.
Range The range is the difference between the largest and smallest values of a data distribution.
Rank-sum test The rank-sum test (also called the Mann-Whitney test) is a nonparametric method for testing the difference between the sample means of two independent random samples. To use the rank-sum test, first arrange the combined data points from both samples in increasing order and assign each a rank. The sum of the ranks of the data points in the smaller sample can be used to calculate a sample test statistic to determine if the two sample distributions are the same.
Ratio level The ratio level of measurement applies to data that can be arranged in order. In addition, both differences between data values and ratios of data values are meaningful. Data at the ratio level have a true zero.
Raw score The raw score is the value of a random variable in a non-standard normal distribution. The raw score can be converted into a z score.
Relative frequency The relative frequency of an event is its frequency divided by the number of total observations.
Relative frequency of a class The relative frequency of a class is the proportion of all data values that fall into that class. To find the relative frequency of a particular class, divide the class frequency f by the total of all frequencies n (sample size).
Relative-frequency histogram In relative-frequency histograms, we use bars to visually represent each class. The width of the bar is the class width, and the height of the bar is the relative frequency of that class.
Relative-frequency table First make a frequency table. Then, for each class, compute the relative frequency f/n, where f is the class frequency and n is the total sample size.
Replication Replication of the experiment on many subjects reduces the possibility that the differences between the two groups occurred by chance alone.
Residual In a scatter diagram, the residual is another name for the unexplained deviation between the y value in a specified data pair (x, y) and the value predicted by the least-squares line for the same x.
Residual plot One way to assess how well a least-squares line serves as a model for the data is a residual plot. To make a residual plot, we put the x values in order on the horizontal axis and plot the corresponding residuals y - y-hat in the vertical direction. If the least-squares line provides a reasonable model for the data, the pattern of points in the plot will seem random and unstructured about the horizontal line at 0.
Resistant measure A resistant measure is one that is not influenced by extremely high or low data values.
Response variable In a scatter diagram, we call y the response variable.
Right-tailed test A statistical test is right-tailed if the alternate hypothesis states that the parameter is greater than the value claimed in the null hypothesis.
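As an illustration of a right-tailed test, the Python sketch below computes a P-value for a test of a population mean. It assumes a z test with a known population standard deviation, which is only one of several possible settings. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_p_value(x_bar, mu_0, sigma, n):
    """P-value for H0: mu = mu_0 versus H1: mu > mu_0, with sigma known."""
    # Standardize the sample mean under the null hypothesis.
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    # Right-tailed: area under the standard normal curve to the right of z.
    return 1 - NormalDist().cdf(z)


# Hypothetical data: sample mean 52, claimed mean 50, sigma 6, n = 36 (z = 2.0).
p_value = right_tailed_p_value(52.0, 50.0, 6.0, 36)
print(round(p_value, 4))  # 0.0228
```

For a left-tailed test, the P-value would instead be the area to the left of z.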

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing! MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Pre-algebra Algebra Pre-calculus Calculus Statistics

More information







