DATA INTERPRETATION AND STATISTICS

PholC60, September 2001 (c60notestat.doc)

Books

An easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14.

DESCRIPTIVE STATISTICS

Data

Data are obtained by making observations of the world about us, for example from experiments or from studying patients. Data contain information about the system or individuals under study, but in order to make judgements it is usually necessary to process the data to extract the relevant information.

Types of data:

Non-parametric data:
   nominal or categorical data (e.g. names, colours etc. without any preferences)
   ordinal data (rankings, 1st, 2nd, 3rd etc.)

Parametric data:
   numerical/quantitative measurements, which may be on an interval scale (e.g. height, weight) or may be discrete values on a discontinuous scale (e.g. number of offspring)

Data, especially biological data, tend to be scattered. This variability may be an inherent property of the quantity measured or it may be due to the limited accuracy of measurement. It is more difficult to draw conclusions from data that are very scattered.

Samples and populations

To assess the properties of populations it is frequently necessary to make measurements on subsets called samples. It is often impossible or unreasonable to measure the entire population, either because it is too large (e.g. the height of all Africans) or because it is infinite (e.g. a subject's height measured several times: you will not get exactly the same result each time, and since an infinite number of measurements is impossible you settle for a finite number). Samples should be representative and not biased; they are usually selected at random. From the properties of sample data, we infer the properties of the population.

Describing data numerically

Data may be described by calculating quantities that measure:

1. central tendency: mean, median or mode

   Mean: $\bar{X} = \frac{\sum X}{n}$
   Median: the $\frac{n+1}{2}$th value when the data are arranged in order
   Mode: the most frequent value

2. spread or scatter or dispersion: range, variance, standard deviation, coefficient of variation

The range, $X_{max} - X_{min}$, is a poor measure of dispersion because it depends entirely on the extreme values and provides no information about the intermediate ones.

The mean of the differences between each X and the mean is $\frac{\sum (X - \bar{X})}{n}$, but this is not a useful measure because the positive and negative differences cancel, so it is always zero.

If we square the differences before dividing by n we get the variance:

$\text{variance} = \frac{\sum (X - \bar{X})^2}{n}$

If X, for example, is in cm then the variance is in cm$^2$. Consequently we use the standard deviation (SD), which is the square root of the variance and has the same units as X:

$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
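The measures of central tendency and spread defined above can be sketched in a few lines of Python. This is only an illustration: the data values are hypothetical (they do not come from these notes) and the variable names are arbitrary. The variance and SD here divide by n, as in the formulas above.

```python
import math
import statistics

# Hypothetical measurements (e.g. heights in cm); not data from the notes.
data = [150, 160, 160, 170, 185]
n = len(data)

# Central tendency
mean = sum(data) / n
median = statistics.median(data)   # middle value of the ordered data
mode = statistics.mode(data)       # most frequent value

# Spread
data_range = max(data) - min(data)
deviations = [x - mean for x in data]
mean_deviation = sum(deviations) / n            # deviations cancel out
variance = sum(d ** 2 for d in deviations) / n  # dividing by n
sd = math.sqrt(variance)                        # same units as the data

print(mean, median, mode, data_range, mean_deviation, variance, round(sd, 2))
```

Note that the mean deviation comes out as zero, which is why the squared differences are used instead.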

This formula calculates the SD of a population of n observations X. We know that a sample mean is an estimator of the population mean, but a sample SD calculated from the above formula would give a biased estimate of the population SD. The reasons are rather complicated. Briefly, a single sample is not likely to contain extreme values, so the SD calculated over n tends to underestimate the population SD. To remove this bias in the estimate of the population SD we divide by (n - 1), rather than n, when calculating the sample SD. Thus for a sample:

$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

The quantity n - 1 is called the number of degrees of freedom (d.f.). Statisticians will tell you that each time you calculate a statistic from sample data the number of degrees of freedom is reduced by 1. Thus we use n in calculating the mean value $\bar{X}$, but since we use the mean in calculating the SD, the d.f. becomes n - 1 and we use this instead of n. This is the way you will nearly always calculate SD.

CHECK YOUR CALCULATOR! Use this simple example. The SD of the sample 1, 2, 3 is 1. If the population is 1, 2, 3 then the SD is 0.82.

Presenting data graphically

Graphic methods allow the visual assessment of data. For nominal data this can take the form of a bar chart or a pie diagram.

[Figure: bar chart (vertical axis: number of people) and pie diagram for the categories blue and brown]

Histograms

Parametric data, i.e. numerical data, may be plotted as a histogram. The quantity measured is divided into intervals or classes of appropriate size and the number of observations within each class is plotted. We are, therefore, classifying the data. The total area under the histogram is proportional to the total number of observations.

Rather than plotting the number of observations in each class on the vertical axis, it is common to plot the frequency. This is the number of observations in each class divided by the total number of observations. The sum of all the frequencies will be 1.

Instead of plotting each class as a block, a frequency polygon outlining the profile can be drawn. Such a graph is called the frequency distribution.

[Figure: Using histograms to classify data. Panels: scatter plot of body mass (kg); histograms of number against body mass (kg) for two class intervals; frequency polygon of frequency against body mass (kg)]

To summarise:

1. Measurements are performed upon a sample taken from a population.
2. We may construct a histogram or frequency distribution of the sample data.
3. We may calculate from our sample data quantities called statistics that are estimators of population properties. These include measures of central tendency, e.g. mean, median and mode. The scatter or spread in the data is best described by statistics such as the SD or the coefficient of variation (SD/mean as a percentage).
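The calculator check and the idea of class frequencies can be tried out with Python's standard `statistics` module, where `stdev` divides by n - 1 and `pstdev` divides by n. The body masses and the class width of 10 kg below are hypothetical values chosen only for illustration.

```python
import statistics
from collections import Counter

# Calculator check from the notes: the values 1, 2, 3.
values = [1, 2, 3]
sd_sample = statistics.stdev(values)       # divides by n - 1
sd_population = statistics.pstdev(values)  # divides by n
cv_percent = 100 * sd_sample / statistics.mean(values)  # coefficient of variation

# Classifying data for a histogram (hypothetical body masses in kg).
masses = [52, 58, 61, 63, 67, 68, 71, 74, 79, 85]
class_width = 10  # assumed class interval
counts = Counter((m // class_width) * class_width for m in masses)
frequencies = {lower: count / len(masses) for lower, count in sorted(counts.items())}

print(sd_sample, round(sd_population, 2), cv_percent)
print(frequencies)  # relative frequency per class, keyed by lower class limit
```

The relative frequencies sum to 1, as stated above for a frequency histogram.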

Standard error of the mean

If many samples are selected from a population, each has its own mean value, $\bar{X}$. The distribution of these means is called a sampling distribution and it is centred around the population mean µ. The width of the sampling distribution depends on the number of items in each sample: larger samples give narrower sampling distributions. This means that if you take a sample of 20 items, the mean value is likely to be closer to µ than if you take a sample of, say, 5 items. The SD of the sampling distribution is called the standard error of the mean (SE or SEM). The smaller it is, the closer a sample mean is likely to be to µ.

Estimating population statistics

1. The sample mean $\bar{X}$ provides an estimate of the population mean µ.
2. The sample SD s provides an estimate of the population SD σ.

Just how close these estimates are to the actual values depends on the number of measurements or items in the sample. The standard error of the mean is a measure of the closeness of the sample mean to the population mean. It is given by

$SE = \frac{s}{\sqrt{n}}$

(Remember, it's always n here, never n - 1.)

[Figure: Distribution of sample means ($\bar{X}$) for samples of n = 5, n = 10 and n = 20, centred on µ. Standard error of the mean: $SEM = s/\sqrt{n}$]

Illustrating the spread of data graphically

It is usual to show the SD, or more commonly the SE, on data plots as error bars.

Box and whisker plot

This can be used instead of a dot or scatter plot to indicate the central tendency and the spread of data. It may be drawn horizontally or vertically. The ends of the whiskers indicate the limits of the data (range), while the box encloses the values within SD's either side of the mean. The central line is the mean value. Alternatively, another common convention is that the central line is the median and the box encloses the upper and lower quartiles.

The Normal Distribution

The Normal distribution is one of the most common frequency distributions that occurs. It is bell-shaped and symmetrical about the central value. Its shape is completely defined by the mathematical equation that describes it:

$y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

[Figure: The Normal Distribution. Frequency against measured quantity, showing the central value µ and the positions -2 SD, -1 SD, +1 SD and +2 SD; the height at ±1 SD is $1/\sqrt{e}$ of the maximum]

It's not necessary for you to manipulate this rather forbidding equation, but if you are mathematical you may notice that

when x = µ, y is at its maximum. Also, when x - µ = ±σ, y is $1/\sqrt{e}$ or 0.61 of its maximum value. Thus, the Normal curve for a large population is the frequency polygon, centred at the population mean µ and with a half-width of σ (the SD) at 61% of the maximum height. Any Normal curve is completely defined in terms of shape by the parameters µ and σ, which determine its centre and width. Its area is equal to the number of items/observations.

A simplified form is provided by the Standard Normal Distribution (SND), where µ is set to zero and the units of measure on the horizontal axis are SD's (i.e. x has become z = (x - µ)/σ). The area under the whole curve is 1. The area may be divided into parts by drawing vertical lines; for example, 95% of the area lies between z = -1.96 and z = +1.96.

We can use this property of a Normal curve to provide an additional way of describing the spread of data in a sample. It applies to large (n > 60) samples taken from a Normally distributed population and it is called the 95% confidence interval:

$95\%\ \text{c.i.} = \bar{X} \pm (1.96 \times SE)$

This doesn't apply to small samples (n < 60) since, although the population may be Normally distributed, the samples tend to be distributed according to the so-called t distribution (a little broader than a Normal curve). An additional complication is that, unlike the Normal distribution, the shape of the t distribution depends on the number of degrees of freedom. Thus

$95\%\ \text{c.i.} = \bar{X} \pm (t \times SE)$

where the value of t is given by the t tables at d.f. = n - 1 and p = .05.

STATISTICAL INFERENCE

Tests of significance

Significance tests are used to determine the likelihood that two (or more) samples come from the same population. For example, does a particular form of treatment make the patient better or could it have happened by chance? The general procedure is as follows.

1. Formulate a null hypothesis (called $H_0$ for short). This takes the pessimistic view that differences between sample means are due entirely to chance, i.e. both samples are derived from the same population.
2. Calculate the significance level of $H_0$. This is the probability of obtaining a difference at least as large as the one observed if the null hypothesis were true.
3. If the significance level is (by convention) below 5% (p < .05) we reject $H_0$.

Decisions about significance depend on the area under the appropriate distribution. Tests can be two-tailed or they can be single-tailed (for differences in one direction only). More on this below.

Paired data, small samples

For two samples of paired data, i.e. data that are matched or that correspond in a one-to-one relation (e.g. measurements on the same individual "before" and "after" treatment), and where n < 60 and the data are from a Normal distribution, we use a paired t test. (t is the number of SE's between the means.) This test is best performed by calculating the differences between the measurements on each individual and then determining if the mean difference is significantly different from zero; $H_0$ states that it is not.

$t = \frac{\text{mean difference}}{s / \sqrt{n}}$,  d.f. = n - 1

where s is the SD of the differences and n is the number of pairs.
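The standard error, the small-sample 95% confidence interval and the paired t statistic described above can be sketched as follows. The before/after values are hypothetical (they are not the sleep data from these notes), and the critical value 2.365 is the usual two-tailed t table entry for d.f. = 7 and p = .05, hard-coded here as an assumption rather than computed.

```python
import math
import statistics

# Hypothetical paired measurements on 8 individuals (not data from the notes).
before = [5.0, 6.0, 4.5, 7.0, 5.5, 6.5, 5.0, 6.0]
after = [6.5, 6.0, 6.0, 8.5, 6.0, 8.0, 7.0, 6.5]

differences = [a - b for a, b in zip(after, before)]
n = len(differences)

mean_diff = statistics.mean(differences)
s = statistics.stdev(differences)   # sample SD, divides by n - 1
se = s / math.sqrt(n)               # standard error: always n here
t = mean_diff / se                  # number of SE's between mean difference and 0
df = n - 1

# 95% confidence interval for the mean difference (small sample, so use t).
T_CRIT_DF7 = 2.365                  # from a t table: d.f. = 7, two-tailed p = .05
ci_low = mean_diff - T_CRIT_DF7 * se
ci_high = mean_diff + T_CRIT_DF7 * se

print(f"mean diff = {mean_diff:.3f}, SE = {se:.3f}, t = {t:.2f}, d.f. = {df}")
print(f"95% c.i. = {ci_low:.2f} to {ci_high:.2f}")
```

If the calculated t exceeds the tabulated value, the confidence interval excludes zero and the null hypothesis is rejected at that significance level.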

Example: Hours of sleep in patients after taking sleeping drug.

[Table: hours of sleep for each patient without drug and after drug, with the difference for each patient and the mean, SD and SE of each column. t = 3.18, d.f. = 9]

$H_0$: The means 5.8 h and 7.8 h are not significantly different. Alternatively, the mean difference 1.78 h is not significantly different from zero.

Looking in the t table we find:

   at d.f. = 9 and p < .05, t = 2.26
   at d.f. = 9 and p < .02, t = 2.82
   at d.f. = 9 and p < .01, t = 3.25

Our value of t = 3.18 lies between the last two entries. We can therefore reject the null hypothesis at p < .02 and conclude that the drug is effective at changing the number of hours of sleep. Another way of putting it is that, if the drug had no effect, a difference in the amounts of sleep at least this large would arise purely by chance less than 2% of the time.

NOTE: This was a two-sided or two-tailed comparison. It told us that the number of sleep hours would be different but not specifically more. If there was no chance that a particular treatment could reduce sleep hours, then we could use the data in a single-tailed (one-sided) test and conclude that for t = 2.82 and d.f. = 9 the significance level is p < .01 (i.e. half of 2%). The t tables give values for either case and you have to make the choice. You will nearly always use two-tailed comparisons.

Paired data, large samples

When n > 60 the t distribution and the Normal distribution are very similar, so we calculate not t but z, the Standard Normal Deviate (see above). Remember z = (difference in means)/SE; it does not depend on d.f.

Unpaired data

An unpaired, or two sample, t test is used to compare samples that have no correspondence, for example a set of patients and a set of healthy controls. The number in each sample does not have to be the same. If the SD's of the two samples are similar then it is necessary to calculate a pooled SD, $s_p$. (If the SD's are rather different then other methods may be used.)

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

This is then used to compute t as

$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$,  d.f. = $n_1 + n_2 - 2$
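The unpaired t calculation can be sketched as a small function. The two groups below are hypothetical values used only to exercise the formulas for the pooled SD, t and the degrees of freedom; the function name `pooled_t` is an illustrative choice.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Two-sample t statistic with a pooled SD; returns (t, d.f.)."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # s1 squared, divides by n1 - 1
    var2 = statistics.variance(sample2)  # s2 squared, divides by n2 - 1
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    difference = statistics.mean(sample1) - statistics.mean(sample2)
    t = difference / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, df

# Hypothetical measurements for two unrelated groups (unequal sizes are allowed).
group1 = [3.6, 3.9, 3.4, 3.8, 3.3]
group2 = [3.1, 3.4, 2.9, 3.3]

t, df = pooled_t(group1, group2)
print(f"t = {t:.2f} with d.f. = {df}")
```

The resulting t is then compared with the t table entry at the stated d.f., exactly as for the paired test.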

For example, birth weights (kg) of children born to smokers and non-smokers:

[Table: mean ($\bar{X}$), SD, SE and n of birth weight for non-smokers and heavy smokers. d.f. = 27; t ≈ 2.4]

In the t table, for d.f. = 27, t = 2.05 at p < .05 and t = 2.47 at p < .02. The calculated t exceeds the first of these and so we reject the null hypothesis at p < .05.

Note: For large samples use the Normal table (SND) and compute z from

$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Non-parametric tests of significance

When data are not normally distributed we can often still use the parametric tests described above if we can transform the data in a way that makes them normal. This can be achieved in a variety of ways, sometimes simply by taking the logarithm. If this cannot be done, or if the data are ordinal rather than parametric, then we must resort to a non-parametric test. For these tests the data are converted from an interval scale into ranked data. The subsequent tests then only consider the relative magnitudes of the data, not the actual values, so some information is lost.

There are many different non-parametric tests, all with specific applications. However, there is a correspondence between the parametric and non-parametric methods. These tests are not difficult to use and an appropriate textbook can be consulted for the methods when necessary. As with many of the less common statistical tests, it is advisable to seek the assistance of a statistician before embarking on extensive usage.

To illustrate a non-parametric method, the Wilcoxon signed rank test will be used on the data used above for the paired t test.

Hours of sleep in patients after taking sleeping drug.

[Table: hours of sleep for each patient before and after drug, the difference for each patient and the rank of each difference, with tied ranks marked]

Procedure:

1. Rank the differences, excluding any that = 0 (ignore the signs).
2. Sum the ranks with positive and with negative differences: T+ = 50.5, T- = 4.5.
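The ranking procedure can be sketched in Python as below. The differences are hypothetical (not the sleep data), the function name is an illustrative choice, and tied differences are given the average of the ranks they occupy, which is the usual convention and is assumed here.

```python
def signed_rank_sums(differences):
    """Return (T+, T-, N) for the Wilcoxon signed rank test."""
    nonzero = [d for d in differences if d != 0]  # step 1: exclude zeros
    ordered = sorted(nonzero, key=abs)            # rank by size, ignoring sign
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of tied absolute differences starting at position i.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i..j-1 hold ranks i+1..j; ties share the average rank.
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    # Step 2: sum the ranks of the positive and of the negative differences.
    t_plus = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    t_minus = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    return t_plus, t_minus, len(nonzero)

# Hypothetical differences (after minus before), including a zero and a tie.
differences = [1.0, -0.5, 2.0, 0.0, 1.5, -0.5, 3.0]
t_plus, t_minus, n_ranked = signed_rank_sums(differences)
T = min(t_plus, t_minus)  # the value looked up in the Wilcoxon table
print(t_plus, t_minus, n_ranked, T)
```

A useful check is that T+ and T- together always add up to N(N + 1)/2, where N is the number of non-zero differences.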

$H_0$: Drug and placebo give the same results. Thus we expect T+ to be similar to T-. If they are not, then compare the smaller with that due to chance alone. Let T = the smaller of T+ and T-. Thus T = 4.5. Look up T in the Wilcoxon signed rank table at a sample size of N, where N = number of ranked differences excluding zeros. Thus N = 10 and we find that p < .02 and reject $H_0$.

Comparing more than two samples

Suppose you were asked to compare blood pressure readings from English, Welsh and Scottish people and were asked if they were different from one another. The t test is not appropriate for such a study. The equivalent of a t test for more than two samples is called analysis of variance (anova for short). This procedure, which can only be applied to normally distributed data, enables you to determine if the variation between sample means can be accounted for by the variation that occurs within the data as a whole (this is the null hypothesis), or whether the variation between the means is due to significant differences between them.

For one factor of analysis (such as nationality) one way anova is performed, for two factors of analysis (for instance nationality and sex) two way anova is used, and so on.

Variances are calculated from "sums of squares" (i.e. $\sum (X - \bar{X})^2$; let us call it SS for short). These may be partitioned in the following way:

SS total = SS between groups + SS within groups

The procedure is as follows.

1. Calculate the total SS, i.e. over all the data.
2. Calculate the SS between the means of each group or sample.
3. Calculate the residual SS, which is the SS within the groups.

Now calculate the ratio F of the between-group variance to the within-group variance and deduce the p value from the F table. (Note that for only two groups the result is identical to the t test.)

Comparing observed and expected data. The $\chi^2$ test.

A way of comparing data that can be grouped into categories is to place the results in a contingency table that contains both the observed and expected data. One of the ways of testing whether the differences between observed and expected values are significant is the $\chi^2$ test. (Note: χ or chi, pronounced as in sky, is a Greek letter. It is not always available to typists and printers, which is why it is sometimes written as chi.)

The restrictions on the use of this test are:

1. n > 20
2. There must be at least 5 items in any "expected" box
3. The boxes must contain actual data, not proportions

On the other hand, $\chi^2$ tests are not restricted to Normally distributed data.

The $\chi^2$ test can be used to detect an association between two (or more) variables measured for each individual. These variables need not be continuous. They can be discrete or nominal (see above). For two variables we use a 2 x 2 contingency table. For example: Does influenza vaccination reduce the chance of contracting the disease?

[Table: OBSERVED DATA. Numbers with and without 'flu (Yes/No) in the Vaccinated and Placebo groups, with row and column totals]

Expected values are calculated assuming the null hypothesis; e.g. in the first box multiply 240 by the overall proportion catching 'flu: 240 x 100/460 = 52.2, etc.

[Table: EXPECTED DATA. Expected numbers with and without 'flu (Yes/No) in the Vaccinated and Placebo groups]

$\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$
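The expected values and the $\chi^2$ statistic for a contingency table can be computed as in the sketch below. The observed counts are illustrative assumptions, not the table from these notes; they were chosen only so that 100 of 460 people catch 'flu, matching the overall proportion quoted above. The function name is likewise an illustrative choice.

```python
def chi_squared(observed):
    """Return (chi-squared, d.f., expected table) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Under the null hypothesis: expected = row total x column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(observed, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

# Assumed counts. Rows: 'flu yes, 'flu no. Columns: vaccinated, placebo.
observed = [[20, 80],
            [220, 140]]

chi2, df, expected = chi_squared(observed)

# Restriction from the notes: at least 5 items in every "expected" box.
assert all(value >= 5 for row in expected for value in row)

print(f"chi-squared = {chi2:.1f} with d.f. = {df}")
```

The function takes counts, not proportions, in line with the third restriction listed above.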

$\chi^2 = \frac{(20 - 52.2)^2}{52.2} + \dots$

with one such term for each box of the table. The number of degrees of freedom is (no. of rows - 1)(no. of columns - 1) = 1. From the $\chi^2$ table, $\chi^2$ = 10.83 for p = 0.001. The calculated value greatly exceeds this, so we may reject $H_0$ and conclude that the vaccine is effective.

Errors in significance testing

Rejection of $H_0$ is sometimes termed a "positive" finding while acceptance is "negative". For example, when a patient is tested for a particular disease and the result is significantly different from controls, the individual is termed positive for that test. If the test was faulty it might give false positive or false negative results. These are classified as:

   Type I errors or false positives: incorrect rejection of $H_0$
   Type II errors or false negatives: incorrect acceptance of $H_0$

Statistical power

By definition, the probability of a Type I error is equal to the chosen significance level (usually 5%). We can reduce the probability of a Type I error by setting a lower significance level, say 1%.

The probability of a Type II error is a little more complicated. If $H_0$ is false then the distribution of sample means will be centred around a population mean that is different from µ. Let us call it µ'. We reject $H_0$ when our sample mean lies in the tails of the sampling distribution centred on µ. However, there is a chance that our sample, although drawn from the distribution centred on µ', could have a mean in the region where the two distributions overlap, i.e. there is a β% chance that we would incorrectly accept the null hypothesis. The power of a statistical test is given by the probability of not doing this, i.e. (100 - β)%.

Decreasing the significance level will reduce the power. Increasing sample size will increase the power.
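The way power depends on significance level and sample size can be illustrated numerically. The sketch below assumes a large-sample, two-tailed z test with a known SD, which is a simplification of the situation described above; the function name and all the numbers (means of 100 and 105, SD of 15) are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_tailed_z(mu_null, mu_true, sigma, n, alpha=0.05):
    """Power of a two-tailed z test, assuming a known SD sigma."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = .05
    # H0 is accepted when the sample mean falls between these limits.
    lower = mu_null - z_crit * se
    upper = mu_null + z_crit * se
    # Sampling distribution of the mean when H0 is false (centred on mu_true).
    true_sampling = NormalDist(mu_true, se)
    beta = true_sampling.cdf(upper) - true_sampling.cdf(lower)  # Type II error
    return 1 - beta

# Hypothetical values for illustration only.
print(power_two_tailed_z(100, 105, 15, n=10))
print(power_two_tailed_z(100, 105, 15, n=40))              # larger sample
print(power_two_tailed_z(100, 105, 15, n=40, alpha=0.01))  # stricter level
```

Comparing the three printed values shows both effects: the larger sample gives more power, and the lower significance level gives less.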

CORRELATION AND LINEAR REGRESSION

If we want to measure the degree of association between two variables that we suspect may be dependent on one another, we can calculate the correlation coefficient or perform linear regression. These methods test only for a linear association, i.e. that the data are related by an expression of the type y = a + bx. (Recall that this is the equation of a straight line with a slope b and an intercept on the y axis at y = a.)

[Figure: straight line y = a + bx plotted as y against x, with slope b]

An alternative approach, and an important preliminary test, is to draw a scatter plot of the data. For example, compare IQ and height for a sample of individuals. In another example, compare probability of heart disease with daily fat intake.

[Figure: scatter plots of intelligence score against height (r = 0) and of risk of heart attack against dietary fat intake (0 < r < 1)]

There doesn't seem to be much correlation between height and intelligence, but there appears to be an increased likelihood of heart disease when more fat is consumed.

The horizontal axis (sometimes called the abscissa) is usually the independent variable, the one whose values you select or are determined already. The vertical axis (or ordinate) is usually reserved for the dependent variable, the one that is determined by nature.

Correlation coefficient

This is given by

$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$

[Figure: example scatter plots of Y against X with r = 1 and r = -1]

It would be inappropriate to calculate the correlation coefficient of data that are non-linear (i.e. do not follow a straight line relationship).

[Figure: example of a non-linear relationship between Y and X]

Notice:

1. r has no units.
2. The closer r is to ±1, the better the correlation.
3. Correlation doesn't necessarily indicate direct causality.

Remember: The data must also be Normally distributed (otherwise use a non-parametric test such as Spearman's rank correlation test).
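The correlation coefficient formula translates directly into code. In this sketch the function name `pearson_r` and the paired values are illustrative assumptions; the data are hypothetical and not taken from these notes.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired measurements.
xs = [60, 65, 70, 75, 80]
ys = [2.6, 2.9, 2.8, 3.2, 3.3]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")

# Points lying exactly on a straight line give r = +1 or r = -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

Because the units of X and Y cancel between numerator and denominator, r has no units, as noted in point 1 above.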

Example: Is there a correlation between body weights of 8 healthy men and their corresponding blood plasma volumes?

[Table: weight (kg) and plasma volume (l) for each of the 8 subjects]

We find r = 0.76, which is a rather weak correlation. Clearly other factors must affect plasma volume. How much of the observed variation is determined by body weight? This is given by $r^2$, which is called the coefficient of determination. In our example $r^2$ = 0.58, so 58% of the variation in plasma volume is accounted for by its correlation with body weight.

Linear regression

This is an alternative way of assessing dependence, but it also provides the equation of the straight line that best fits the data, by specifying its slope and intercept. This line is called the regression line. It is obtained by minimising the sum of the squared distances between the data points and the fitted line.

Usually x is the independent variable (i.e. determined by the investigation) and the vertical (y) distances are minimised. (For example, we wish to know how plasma volume is determined by body weight, not the converse.) The line we obtain is then termed the regression of y upon x. Its equation is given by

$Y = a + bX$

where

$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$  and  $a = \bar{Y} - b\bar{X}$

[Figure: plasma volume (l) against body mass (kg), showing the data points and the fitted regression line]

In our example, substituting the calculated values of b and a into Y = a + bX gives the regression line, which we can construct by calculating y values for chosen x values.

The derived equation can be used to calculate values of y for a given x. Alternatively, y values may be read directly from the straight line graph. Both of these operations should be restricted to the region encompassed by the original data. This is called interpolation. The estimation of y values beyond the data region is called extrapolation. Often there is no reason to assume that the regression line will apply beyond the data limits, so extrapolation can be misleading.
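The slope and intercept formulas, together with the restriction to interpolation, can be sketched as follows. The function names `fit_line` and `interpolate` and the data values are hypothetical; refusing to extrapolate by raising an error is a design choice made for this illustration, not something prescribed by the notes.

```python
def fit_line(xs, ys):
    """Regression of y upon x: returns intercept a and slope b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx
    a = mean_y - b * mean_x
    return a, b

def interpolate(x, a, b, xs):
    """Predict y at x, but only inside the range of the original data."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the data range: this would be extrapolation")
    return a + b * x

# Hypothetical data: body mass (kg) and plasma volume (l).
mass = [60, 65, 70, 75, 80]
volume = [2.6, 2.9, 2.8, 3.2, 3.3]

a, b = fit_line(mass, volume)
print(f"Y = {a:.3f} + {b:.4f} X")
print(f"predicted volume at 72 kg: {interpolate(72, a, b, mass):.2f} l")
```

Calling `interpolate(90, a, b, mass)` raises an error rather than returning a value, because 90 kg lies beyond the heaviest subject in this hypothetical sample.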

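Statistical tables

[Table: Areas in tail of the standard normal distribution. Proportion of area above z, tabulated by z and by the second decimal place of z]

[Table: Critical values of t. One-tailed and two-tailed p values, tabulated by d.f. up to infinity]

[Table: $\chi^2$ distribution. Critical values tabulated by p value and d.f.]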


Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Harvey Motulsky hmotulsky@graphpad.com This is the first case in what I expect will be a series of case studies. While I mention

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

More information

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ 1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data to get a general overview of the results. Remember, this is the goal

More information

Outline of Topics. Statistical Methods I. Types of Data. Descriptive Statistics

Outline of Topics. Statistical Methods I. Types of Data. Descriptive Statistics Statistical Methods I Tamekia L. Jones, Ph.D. (tjones@cog.ufl.edu) Research Assistant Professor Children s Oncology Group Statistics & Data Center Department of Biostatistics Colleges of Medicine and Public

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Frequency distributions, central tendency & variability. Displaying data

Frequency distributions, central tendency & variability. Displaying data Frequency distributions, central tendency & variability Displaying data Software SPSS Excel/Numbers/Google sheets Social Science Statistics website (socscistatistics.com) Creating and SPSS file Open the

More information

Regression. In this class we will:

Regression. In this class we will: AMS 5 REGRESSION Regression The idea behind the calculation of the coefficient of correlation is that the scatter plot of the data corresponds to a cloud that follows a straight line. This idea can be

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Analysis of numerical data S4

Analysis of numerical data S4 Basic medical statistics for clinical and experimental research Analysis of numerical data S4 Katarzyna Jóźwiak k.jozwiak@nki.nl 3rd November 2015 1/42 Hypothesis tests: numerical and ordinal data 1 group:

More information

San Jose State University Engineering 10 1

San Jose State University Engineering 10 1 KY San Jose State University Engineering 10 1 Select Insert from the main menu Plotting in Excel Select All Chart Types San Jose State University Engineering 10 2 Definition: A chart that consists of multiple

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Readings: Ha and Ha Textbook - Chapters 1 8 Appendix D & E (online) Plous - Chapters 10, 11, 12 and 14 Chapter 10: The Representativeness Heuristic Chapter 11: The Availability Heuristic Chapter 12: Probability

More information

GCSE Statistics Revision notes

GCSE Statistics Revision notes GCSE Statistics Revision notes Collecting data Sample This is when data is collected from part of the population. There are different methods for sampling Random sampling, Stratified sampling, Systematic

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table 2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

More information

Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

Introduction to Descriptive Statistics

Introduction to Descriptive Statistics Mathematics Learning Centre Introduction to Descriptive Statistics Jackie Nicholas c 1999 University of Sydney Acknowledgements Parts of this booklet were previously published in a booklet of the same

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

Chapter 11: Two Variable Regression Analysis

Chapter 11: Two Variable Regression Analysis Department of Mathematics Izmir University of Economics Week 14-15 2014-2015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

We will use the following data sets to illustrate measures of center. DATA SET 1 The following are test scores from a class of 20 students:

We will use the following data sets to illustrate measures of center. DATA SET 1 The following are test scores from a class of 20 students: MODE The mode of the sample is the value of the variable having the greatest frequency. Example: Obtain the mode for Data Set 1 77 For a grouped frequency distribution, the modal class is the class having

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Statistiek II. John Nerbonne. March 24, 2010. Information Science, Groningen Slides improved a lot by Harmut Fitz, Groningen!

Statistiek II. John Nerbonne. March 24, 2010. Information Science, Groningen Slides improved a lot by Harmut Fitz, Groningen! Information Science, Groningen j.nerbonne@rug.nl Slides improved a lot by Harmut Fitz, Groningen! March 24, 2010 Correlation and regression We often wish to compare two different variables Examples: compare

More information

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

More information

III. GRAPHICAL METHODS

III. GRAPHICAL METHODS Pie Charts and Bar Charts: III. GRAPHICAL METHODS Pie charts and bar charts are used for depicting frequencies or relative frequencies. We compare examples of each using the same data. Sources: AT&T (1961)

More information

Mathematics. Probability and Statistics Curriculum Guide. Revised 2010

Mathematics. Probability and Statistics Curriculum Guide. Revised 2010 Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

Lesson 4 Part 1. Relationships between. two numerical variables. Correlation Coefficient. Relationship between two

Lesson 4 Part 1. Relationships between. two numerical variables. Correlation Coefficient. Relationship between two Lesson Part Relationships between two numerical variables Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear between two numerical variables Relationship

More information

Introductory Statistics Notes

Introductory Statistics Notes Introductory Statistics Notes Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August

More information

Statistics: revision

Statistics: revision NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 3 / 4 May 2005 Department of Experimental Psychology University of Cambridge Slides at pobox.com/~rudolf/psychology

More information

Quantitative Data Analysis: Choosing a statistical test Prepared by the Office of Planning, Assessment, Research and Quality

Quantitative Data Analysis: Choosing a statistical test Prepared by the Office of Planning, Assessment, Research and Quality Quantitative Data Analysis: Choosing a statistical test Prepared by the Office of Planning, Assessment, Research and Quality 1 To help choose which type of quantitative data analysis to use either before

More information

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2 Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

More information

GCSE HIGHER Statistics Key Facts

GCSE HIGHER Statistics Key Facts GCSE HIGHER Statistics Key Facts Collecting Data When writing questions for questionnaires, always ensure that: 1. the question is worded so that it will allow the recipient to give you the information

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Appendix E: Graphing Data

Appendix E: Graphing Data You will often make scatter diagrams and line graphs to illustrate the data that you collect. Scatter diagrams are often used to show the relationship between two variables. For example, in an absorbance

More information

AP Statistics: Syllabus 3

AP Statistics: Syllabus 3 AP Statistics: Syllabus 3 Scoring Components SC1 The course provides instruction in exploring data. 4 SC2 The course provides instruction in sampling. 5 SC3 The course provides instruction in experimentation.

More information

Table 2-1. Sucrose concentration (% fresh wt.) of 100 sugar beet roots. Beet No. % Sucrose. Beet No.

Table 2-1. Sucrose concentration (% fresh wt.) of 100 sugar beet roots. Beet No. % Sucrose. Beet No. Chapter 2. DATA EXPLORATION AND SUMMARIZATION 2.1 Frequency Distributions Commonly, people refer to a population as the number of individuals in a city or county, for example, all the people in California.

More information

Statistics and research

Statistics and research Statistics and research Usaneya Perngparn Chitlada Areesantichai Drug Dependence Research Center (WHOCC for Research and Training in Drug Dependence) College of Public Health Sciences Chulolongkorn University,

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

E205 Final: Version B

E205 Final: Version B Name: Class: Date: E205 Final: Version B Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The owner of a local nightclub has recently surveyed a random

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

2. Describing Data. We consider 1. Graphical methods 2. Numerical methods 1 / 56

2. Describing Data. We consider 1. Graphical methods 2. Numerical methods 1 / 56 2. Describing Data We consider 1. Graphical methods 2. Numerical methods 1 / 56 General Use of Graphical and Numerical Methods Graphical methods can be used to visually and qualitatively present data and

More information

Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1. Relationships between two numerical variables Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

More information

Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test.

Variables and Data A variable contains data about anything we measure. For example; age or gender of the participants or their score on a test. The Analysis of Research Data The design of any project will determine what sort of statistical tests you should perform on your data and how successful the data analysis will be. For example if you decide

More information

Sheffield Hallam University. Faculty of Health and Wellbeing Professional Development 1 Quantitative Analysis. Glossary

Sheffield Hallam University. Faculty of Health and Wellbeing Professional Development 1 Quantitative Analysis. Glossary Sheffield Hallam University Faculty of Health and Wellbeing Professional Development 1 Quantitative Analysis Glossary 2 Using the Glossary This does not set out to tell you everything about the topics

More information

Statistics revision. Dr. Inna Namestnikova. Statistics revision p. 1/8

Statistics revision. Dr. Inna Namestnikova. Statistics revision p. 1/8 Statistics revision Dr. Inna Namestnikova inna.namestnikova@brunel.ac.uk Statistics revision p. 1/8 Introduction Statistics is the science of collecting, analyzing and drawing conclusions from data. Statistics

More information

How to choose a statistical test. Francisco J. Candido dos Reis DGO-FMRP University of São Paulo

How to choose a statistical test. Francisco J. Candido dos Reis DGO-FMRP University of São Paulo How to choose a statistical test Francisco J. Candido dos Reis DGO-FMRP University of São Paulo Choosing the right test One of the most common queries in stats support is Which analysis should I use There

More information

Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination

Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used

More information