Basic Statistics for SGPE Students Part I: Descriptive Statistics

1 Basic Statistics for SGPE Students, Part I: Descriptive Statistics
Achim Ahrens, Anna Babloyan, Erkal Ersoy
Heriot-Watt University, Edinburgh
September 2015

2 Outline
1. Descriptive statistics: sample statistics (mean, variance, percentiles); graphs (box plot, histogram); data transformations (log transformation, unit of measure); correlation vs. causation
2. Probability theory: conditional probabilities and independence; Bayes' theorem
3. Probability distributions: discrete and continuous probability functions; probability density function and cumulative distribution function; binomial, Poisson and normal distributions; E[X] and V[X]
4. Statistical inference: population vs. sample; law of large numbers; central limit theorem; confidence intervals
Hypothesis testing and p-values

3 Descriptive statistics
In recent years, more and better-quality data have been recorded than at any other time in history. The increasing size of readily available data sets has enabled us to adopt new and more robust statistical tools. Rising data availability has, unfortunately, also led empirical researchers to sometimes overlook preliminary steps such as summarizing and visually examining their data sets. Ignoring these steps can lead to serious problems and invalidate seemingly significant results. As we will see in this and the following lectures, there are several ways to summarize a data set numerically. Before we discuss those approaches, let's take a quick look at how we can visualize a data set graphically.

4 Descriptive statistics: Histograms
Histograms give a good graphical representation of the distribution of data. They consist of adjacent rectangles over non-overlapping intervals (bins), whose areas are proportional to the frequency of observations in each interval. Histograms are often normalized to show the proportion (density) of observations in each bin. In that case, the total area of the bins equals 1.
Remark: In a normalized histogram, the area of each bin is the proportion of observations that fall into it, and the height is that proportion divided by the bin width (the density). Proportions are easily read as percentages.
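The normalization described above can be sketched in a few lines of Python. This is only an illustration: the function name `density_histogram`, the bin edges and the life-expectancy values are made up for the example and are not the lecture data.

```python
def density_histogram(values, edges):
    """Return bin heights (densities) chosen so that the bar areas sum to 1.

    Assumes every value lies between the first and the last edge.
    """
    n = len(values)
    densities = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        last = i == len(edges) - 2
        # Bins are [lo, hi), except the last one, which also includes hi.
        count = sum(1 for v in values if lo <= v < hi or (last and v == hi))
        # Height = proportion of observations divided by the bin width.
        densities.append(count / (n * (hi - lo)))
    return densities


# Hypothetical life-expectancy values, not taken from the slides.
ages = [48, 52, 55, 61, 63, 66, 70, 71, 74, 79]
edges = [40, 50, 60, 70, 80]

dens = density_histogram(ages, edges)
areas = [d * (hi - lo) for d, lo, hi in zip(dens, edges[:-1], edges[1:])]
print(dens)        # bar heights
print(sum(areas))  # total area under the bars
```

Each area is the share of observations in that bin, so multiplying an area by 100 gives the percentage directly.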

5 Descriptive statistics: Histograms
[Figure: histogram of life expectancy (in years), 1960; vertical axis: density]
Approximately, what is the average life expectancy in 1960? Roughly what percentage of countries had life expectancy above 65? What proportion of countries had a life expectancy of less than 55 years?

6 Descriptive statistics: Histograms
[Figure: histogram of life expectancy (in years), 1960; vertical axis: density]

7 Descriptive statistics: Histograms
[Figure: histogram of life expectancy (in years), 1990; vertical axis: density]

8 Descriptive statistics: Histograms
[Figure: histogram of life expectancy (in years), 2011; vertical axis: density]

9 Descriptive statistics: The Mean and Standard Deviation
A histogram can help summarize large amounts of data, but we often want an even shorter (and sometimes easier to interpret) summary. This is usually provided by the mean and the standard deviation. The mean (like the median) is used to locate the center, whereas the standard deviation measures the spread.
Definition: The (arithmetic) mean of a list of numbers is their sum divided by how many there are.
For example, the mean of 9, 1, 2, 2, 0 is $\frac{9 + 1 + 2 + 2 + 0}{5} = 2.8$. More generally,
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad i = 1, \dots, n.$$

10 Descriptive statistics: The Mean and Standard Deviation
The standard deviation (SD) tells us how far numbers on a list deviate from their average. Usually, most numbers are within one SD of the mean. More specifically, for normally distributed variables, about 68% of entries are within one SD of the mean and about 95% of entries are within two SDs.
[Figure: two normal curves; the first marks the 68% of entries between mean - one SD and mean + one SD, the second the 95% between mean - two SDs and mean + two SDs]
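The 68% and 95% figures can be checked numerically. The sketch below uses the standard identity P(|Z| < k) = erf(k / sqrt(2)) for a normal variable; the helper name `share_within` is an illustrative choice.

```python
import math


def share_within(k):
    """Probability that a normally distributed value lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))


print(round(share_within(1), 4))  # share within one SD
print(round(share_within(2), 4))  # share within two SDs
```

These shares hold for the normal curve only; for skewed data they can be quite different.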

11 Descriptive statistics: Computing the Standard Deviation
Definition: $\text{SD} = \sqrt{\text{mean of (deviations from the mean)}^2}$, where deviation from the mean = entry - mean.
In formal notation,
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}, \qquad \text{where } \mu = \frac{1}{N}(x_1 + \dots + x_N).$$
Example: Find the SD of 20, 10, 15, 15.
Answer: The mean is $\bar{x} = \frac{20 + 10 + 15 + 15}{4} = 15$. The deviations are then 5, -5, 0, 0, respectively. So,
$$\text{SD} = \sqrt{\frac{5^2 + (-5)^2 + 0^2 + 0^2}{4}} = \sqrt{\frac{50}{4}} \approx 3.54.$$
Remark: The SD comes out in the same units as the data. For example, if the data are a set of individuals' heights in inches, the SD is in inches too.
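The definition above translates directly into code. This is a minimal sketch of the 1/N version of the SD used on this slide; the function names are illustrative.

```python
import math


def mean(xs):
    """Arithmetic mean: sum of the entries divided by how many there are."""
    return sum(xs) / len(xs)


def sd(xs):
    """SD as defined above: root of the mean squared deviation from the mean."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


data = [20, 10, 15, 15]
print(mean(data))  # 15.0
print(sd(data))    # square root of 50/4
```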

12 Descriptive statistics: The Root-Mean-Square
Consider the following list of numbers: 0, 5, -8, 7, -3.
Question: How big are these numbers? What is their mean?
The mean is 0.2, but this does not tell us much about the size of the numbers; it only implies that the positive numbers slightly outweigh the negative ones. To get a better sense of their size, we could use the mean of their absolute values. Statisticians tend to use another measure, though: the root-mean-square.
Definition: $\text{Root mean square (rms)} = \sqrt{\text{average of (entries)}^2}$

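13 Descriptive statistics: The Root-Mean-Square and Standard Deviation
There is an alternative way of calculating the SD using the root-mean-square:
Remark: $\text{SD} = \sqrt{\text{mean of (entries)}^2 - (\text{mean of entries})^2}$
Recall the four numbers we used earlier to calculate the SD: 20, 10, 15, 15.
$$\text{mean of (entries)}^2 = \frac{20^2 + 10^2 + 15^2 + 15^2}{4} = \frac{950}{4} = 237.5$$
$$(\text{mean of entries})^2 = \left(\frac{20 + 10 + 15 + 15}{4}\right)^2 = \left(\frac{60}{4}\right)^2 = 225$$
Therefore, $\text{SD} = \sqrt{237.5 - 225} = \sqrt{12.5} \approx 3.54$, which agrees with what we found earlier.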
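As a quick check of the remark above, the following sketch computes the root-mean-square and the SD through the shortcut formula for the same four numbers. The function names are illustrative; the `max(..., 0.0)` guard is an added precaution against tiny negative values from floating-point rounding.

```python
import math


def rms(xs):
    """Root-mean-square: square root of the average of the squared entries."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def sd_shortcut(xs):
    """SD via sqrt(mean of squares - square of mean)."""
    mean_of_squares = sum(x * x for x in xs) / len(xs)
    square_of_mean = (sum(xs) / len(xs)) ** 2
    # Guard against a slightly negative difference caused by rounding.
    return math.sqrt(max(mean_of_squares - square_of_mean, 0.0))


data = [20, 10, 15, 15]
print(rms(data))          # size of the entries
print(sd_shortcut(data))  # spread around the mean
```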

14 Descriptive statistics: Variance
In probability theory and statistics, variance gets mentioned nearly as often as the mean and standard deviation. It is very closely related to the SD and is a measure of how far a set of numbers lies from its mean. Variance is the second central moment of a distribution (the mean being the first moment) and therefore tells us about the properties of the distribution (more on these later).
Definition: $\text{Variance} = (\text{Std. Dev.})^2 = \sigma^2$

15 Descriptive statistics: Normal Approximation for Data and Percentiles
[Figure: histogram of S&P 500 trading volume (thousands), January [...] December [...], with the mean and multiples of the s.d. (up to +4 s.d.) marked; vertical axis: frequency. Source: Yahoo! Finance and Commodity Systems, Inc.]
Is the normal approximation satisfactory here?

16 Descriptive statistics: Normal Approximation for Data and Percentiles
[Figure: histogram of life expectancy (in years) with a normal curve overlaid; -2 s.d., -1 s.d., mean, +1 s.d. and +2 s.d. marked; vertical axis: density]
How about here?

17 Descriptive statistics: Normal Approximation for Data and Percentiles
Remark: The mean and SD can be used to effectively summarize data that follow the normal curve, but these summary statistics can be much less satisfactory for data that do not follow the normal curve. In such cases, statisticians often opt for using percentiles to summarize distributions.
Table. Selected percentiles for life expectancy in 2011. Columns: Percentile, Value.

18 Descriptive statistics: Calculating percentiles
1. Order all the values in your data set in ascending order (i.e. smallest to largest).
2. Select a percentile, P, that you would like to calculate, write it as a proportion (e.g. 0.25 for the 25th percentile) and multiply it by the total number of entries in your data set, n. The value you obtain here is called the index.
3. If the index is not a whole number, round it up to the next integer.
4. Count the entries in your list of numbers, starting from the smallest one, until you get to the number indicated by your index.
5. This entry is the Pth percentile in your data set.

19 Descriptive statistics: Calculating percentiles
Example: Consider the following list of 5 numbers: 10, 15, 20, 25, 30. Which entry corresponds to the 25th percentile? What is the median?
To obtain the 25th percentile, compute the index: $0.25 \times 5 = 1.25$. Rounded up to the next integer, this becomes 2, so the 25th percentile in this case is the second entry, 15.
We were also asked to obtain the median (the 50th percentile). To do this, calculate $0.5 \times 5 = 2.5$. Rounding this up to the next integer gives 3, so the median in this case is the third entry, 20.
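The counting procedure can be sketched as a short function. It applies the round-up rule from step 3; the slides do not say what to do when the index is already a whole number, so this sketch simply takes that entry, which is an assumption. The function name `percentile` is illustrative, and other software may use different percentile conventions.

```python
import math


def percentile(values, p):
    """Return the p-th percentile (p in percent, 0 < p <= 100) by counting entries."""
    ordered = sorted(values)            # step 1: ascending order
    index = p * len(ordered) / 100      # step 2: proportion times n
    rank = math.ceil(index)             # step 3: round a fractional index up
    return ordered[rank - 1]            # steps 4-5: ranks start at 1, lists at 0


numbers = [10, 15, 20, 25, 30]
print(percentile(numbers, 25))
print(percentile(numbers, 50))  # the median
```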

20 Descriptive statistics: Percentiles
The 1st percentile of the distribution is approximately 48, meaning that life expectancy in 1% of countries in 2011 was 48 or less, and 99% of countries had a life expectancy higher than that. Similarly, the fact that the 25th percentile is 63 implies that 25% of countries had a life expectancy of 63 or less, whereas 75% had a longer expected lifespan.
Definition: The interquartile range is defined as 75th percentile - 25th percentile. It is sometimes used as a measure of spread, particularly when the SD would pay too much (or too little) attention to a small percentage of cases in the tails of the distribution.
From the table above, the interquartile range equals 13.9 (and the SD was 10.14).

21 Descriptive statistics: Box plots
The structure of a box plot, from top to bottom:
Whisker, ending at the adjacent line (upper adjacent value): the largest value within 75th percentile + 1.5 × interquartile range
75th percentile / 3rd quartile (upper hinge)
Box, with the 50th percentile (median) marked inside
25th percentile / 1st quartile (lower hinge)
Whisker, ending at the adjacent line (lower adjacent value): the smallest value within 25th percentile - 1.5 × interquartile range
Entries less than the lower adjacent value

22 Descriptive statistics: Box plots
[Figure: box plots of life expectancy (in years) by region in 2011; regions: EAS, ECS, LCN, MEA, NAC, SAS, SSF]
Are there any clear patterns emerging from summarizing the data this way?
Legend:
EAS: East Asia & Pacific
ECS: Europe & Central Asia
LCN: Latin America & Caribbean
MEA: Middle East & North Africa
NAC: North America
SAS: South Asia
SSF: Sub-Saharan Africa

23 Descriptive statistics: Box plots
We might be able to spot some patterns that developed over time if we look at different years:
[Figure: box plots of life expectancy (in years) by region, 1960; regions: EAS, ECS, LCN, MEA, NAC, SAS, SSF]

24 Descriptive statistics: Box plots
[Figure: box plots of life expectancy (in years) by region, 1990; regions: EAS, ECS, LCN, MEA, NAC, SAS, SSF]

25 Descriptive statistics: Box plots
[Figure: box plots of life expectancy (in years) by region, 2011; regions: EAS, ECS, LCN, MEA, NAC, SAS, SSF]

26 Data Transformations: The effects of changing the unit of measure
Now that we know how to summarize a dataset, let us investigate how changing the unit of measure of a variable affects the mean and standard deviation. Such changes could be made for practical reasons or be based on theory, but regardless of the reason, a statistician should know what to expect. To study this, let's consider a dataset on 200 individuals' weights and heights. The entries are originally reported in kg and cm, respectively, and below are some summary statistics:
Table. Summary statistics. Columns: Variable, Mean, Standard Deviation. Rows: Weight (kg), Height (cm).

27 Data Transformations: The effects of changing the unit of measure
And here are some diagrams that summarize the distribution of the two variables.
[Figure: two histograms with -2 s.d., -1 s.d., mean, +1 s.d. and +2 s.d. marked; left panel: weight measured in kg, right panel: height measured in cm; vertical axes: density]
Does the normal approximation look satisfactory?

28 Data Transformations: The effects of changing the unit of measure
[Figure: box plots by sex (F, M); left panel: weight (kg), right panel: height (cm)]

29 Data Transformations: The effects of changing the unit of measure
[Figure: two histograms with -2 s.d., -1 s.d., mean, +1 s.d. and +2 s.d. marked; left panel: weight measured in kg, right panel: weight measured in lb; vertical axes: density]
Do you think the mean matches the original one (in correct units)? How about the standard deviation?

30 Data Transformations: The effects of changing the unit of measure
[Figure: two histograms with -2 s.d., -1 s.d., mean, +1 s.d. and +2 s.d. marked; left panel: height measured in cm, right panel: height measured in inches; vertical axes: density]
Do you think the mean matches the original one (in correct units)? How about the standard deviation?

31 Data Transformations: The effects of changing the unit of measure
Here are the box plots with the transformed data:
[Figure: box plots by sex (F, M); left panel: weight (lb), right panel: height (in)]

33 Data Transformations: The effects of changing the unit of measure
Observations made using the figures are, of course, based on what statisticians and econometricians often call "eye-balling" the data. These observations are certainly not formal, but they are a crucial part of effectively analyzing any dataset. In fact, you should make plotting, investigating and eye-balling your data a habit, so that you do not dive into complicated models and overlook important features of your dataset. Now that we have made our informal observations, let's look at the actual numbers.
Table. Summary statistics. Columns: Variable, Mean, SD, Mean (converted), SD (converted). Rows: Weight (kg), Height (cm), Weight (lb), Height (in).

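35 Data Transformations: The effects of changing the unit of measure
We have seen that the mean and the standard deviation remain the same when we change the unit of measure, once they are converted back to the original unit. But how does the variance behave?
Table. Summary statistics. Columns: Variable, Mean, Variance, Mean (converted), Variance (converted). Rows: Weight (kg), Height (cm), Weight (lb), Height (in).
Note that 1 inch = 2.54 cm and, similarly, 1 cm = $\frac{1}{2.54}$ in = 0.3937 in. Then, $\text{Var}(\text{height in in}) = (0.3937)^2 \, \text{Var}(\text{height in cm})$. The opposite is true as well: $\text{Var}(\text{height in cm}) = (2.54)^2 \, \text{Var}(\text{height in in})$. We can apply the same reasoning to the weights in kg and lb. And in general...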

36 Data Transformations: Properties of Variance
...variance is scaled by the square of the constant by which all the values are scaled. While we are at it, here are some basic properties of variance:
Variance is non-negative: $\text{Var}(X) \geq 0$
Variance of a constant random variable is zero: $P(X = a) = 1 \Rightarrow \text{Var}(X) = 0$
$\text{Var}(aX) = a^2 \, \text{Var}(X)$
However, $\text{Var}(X + a) = \text{Var}(X)$
For two random variables X and Y, $\text{Var}(aX + bY) = a^2 \, \text{Var}(X) + b^2 \, \text{Var}(Y) + 2ab \, \text{Cov}(X, Y)$
...but $\text{Var}(X - Y) = \text{Var}(X) + \text{Var}(Y) - 2 \, \text{Cov}(X, Y)$
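These properties can be verified numerically on a small example. The sketch below uses `statistics.pvariance` (the 1/N variance) from the Python standard library; the data, the constant `a` and the helper `pcov` are illustrative and not taken from the lecture dataset.

```python
import math
from statistics import pvariance


def pcov(u, v):
    """1/N covariance, matching the 1/N variance used above."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)


x = [20, 10, 15, 15]   # hypothetical values
y = [3, 1, 4, 1]       # hypothetical values
a = 2.54               # e.g. a unit-conversion factor

scaled = pvariance([a * v for v in x])    # Var(aX)
shifted = pvariance([v + a for v in x])   # Var(X + a)
diff = pvariance([p - q for p, q in zip(x, y)])  # Var(X - Y)

print(math.isclose(scaled, a ** 2 * pvariance(x)))
print(math.isclose(shifted, pvariance(x)))
print(math.isclose(diff, pvariance(x) + pvariance(y) - 2 * pcov(x, y)))
```

All three comparisons should print `True`: scaling multiplies the variance by the squared constant, while shifting leaves it unchanged.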

37 Data Transformations: Log Transformation
So far, we have only worked with transformations in which we multiply each value by a constant. However, more complicated transformations are quite common in statistics and econometrics. One of the most common and useful transformations uses the natural logarithm.
Definition: Data transformation refers to applying a specific operation to each point in a dataset, so that each data point is replaced with the transformed one. That is, each $x_i$ is replaced by $y_i = f(x_i)$.
In our previous example with heights, the function was simply $f(x) = 2.54x$ (converting inches to centimetres). Now, let us study a different function: the natural logarithm.

38 Data Transformations: Log Transformation
Log transformation in action:
[Figure: UK real GDP by year; left panel: output-side real GDP at current PPPs (in mil. 2005US$), right panel: natural log of output-side real GDP at current PPPs (in mil. 2005US$)]

40 Data Transformations: Log Transformation
[Figure: two scatter plots; left panel "Life Expectancy vs. Real GDP": life expectancy (in years) against output-side real GDP at current PPPs (in mil. 2005US$); right panel "Log Life expectancy vs. Log Real GDP": life expectancy (in years) against the natural log of output-side real GDP at current PPPs]
Important note: The log transformation can only be used for variables that have positive values (why?). If the variable has zeros, the transformation can be applied only after these values are replaced (usually by one-half of the smallest positive value in the data set).
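A minimal sketch of the note above: take natural logs and, if zeros are present, first replace them by half of the smallest positive value. The function name `log_transform` and the sample values are illustrative assumptions.

```python
import math


def log_transform(values):
    """Natural log of each value; zeros are replaced by half the smallest positive value."""
    if any(v < 0 for v in values):
        raise ValueError("the log transformation needs non-negative values")
    positives = [v for v in values if v > 0]
    if not positives:
        raise ValueError("at least one positive value is required")
    replacement = min(positives) / 2
    return [math.log(v if v > 0 else replacement) for v in values]


# Hypothetical data containing a zero.
print(log_transform([0, 2, 8]))
```

Negative values raise an error because the logarithm is not defined for them, which is the answer to the "why?" in the note.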

41 Data Transformations: Log Transformation
[Figure: bubble chart of life expectancy (in years, linear scale) against real GDP per capita (at constant 2005 national prices, linear scale); bubble size: population (in million); colour: region (EAS, ECS, LCN, MEA, NAC, SAS, SSF); labelled countries: CHN, IDN, IND, ZAF, RUS, JPN, GBR, USA]

42 Data Transformations: Log Transformation
[Figure: the same bubble chart for 1960, with real GDP per capita on a log scale; labelled countries: CHN, IDN, IND, JPN, ZAF, GBR, USA]

43 Data Transformations: Log Transformation
[Figure: the same bubble chart for 1990, with real GDP per capita on a log scale; labelled countries: CHN, IND, IDN, ZAF, RUS, JPN, GBR, USA]

44 Data Transformations: Log Transformation
[Figure: the same bubble chart, with real GDP per capita on a log scale; labelled countries: IND, IDN, CHN, ZAF, RUS, JPN, GBR, USA]

45 Data Transformations: Log Transformation and growth
A useful feature of the log transformation is that its first difference can be interpreted as a percentage change (for small changes). This is because $\ln(1 + x) \approx x$ for small $x$.
Strictly speaking, the percentage change in $Y$ from period $t-1$ to period $t$ is defined as $\frac{Y_t - Y_{t-1}}{Y_{t-1}}$, which is approximately equal to $\ln(Y_t) - \ln(Y_{t-1})$. The approximation is almost exact if the percentage change is small. To see this, consider the percentage change in US GDP from 2010 to 2011:
Table. US Real GDP (in mil US$). Columns: Year, GDP, Percentage change, $\ln(Y_t)$, $\ln(Y_{2011}) - \ln(Y_{2010})$.
The difference between the percentage change and the log difference is a discrepancy that we might be willing to live with.
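To see how the quality of the approximation depends on the size of the change, the sketch below compares the exact relative change with the log difference for a few growth rates. The starting level of 100 and the growth rates are hypothetical; they are not the US GDP figures from the table.

```python
import math


def pct_change(prev, curr):
    """Relative change (Y_t - Y_{t-1}) / Y_{t-1}."""
    return (curr - prev) / prev


def log_diff(prev, curr):
    """First difference of the natural log, ln(Y_t) - ln(Y_{t-1})."""
    return math.log(curr) - math.log(prev)


for growth in (0.01, 0.02, 0.10, 0.50):
    prev = 100.0
    curr = prev * (1 + growth)
    exact = pct_change(prev, curr)
    approx = log_diff(prev, curr)
    print(f"{exact:.4f}  {approx:.4f}  gap = {exact - approx:.4f}")
```

The gap is tiny for changes of one or two percent and grows quickly for larger changes, which is why the interpretation is reserved for small changes.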

46 Examining Relationships: Covariance and Correlation
Our daily lives (and not just within economics) are filled with statements about the relationship between two variables. For example, we might read about a study that found that men spend more money online than women. The relationship between gender and online spending may not be this simple, of course; income might play a role in the observed pattern. Ideally, we would like to set up an experiment in which we control the behavior of one variable (keeping everything else the same) and observe its effect on another. This is often not feasible in economics (a lot more on this later!). For the time being, let's focus on simple correlation.

47 Examining Relationships: Covariance and Correlation
Scatter plots are very useful in identifying the sign and strength of the relationship between two variables. Therefore, it's always useful to plot your data and investigate what the relationship between your two variables is:
[Figure: scatter plot "Life Expectancy (in years) vs. Internet usage"; horizontal axis: Internet users per 100 people; vertical axis: life expectancy]

48 Examining Relationships: Covariance and Correlation
But these plots can also mislead the eye, simply through a change in the scale of the axes:
[Figure: the scatter plot "Life Expectancy (in years) vs. Internet usage" drawn twice with different axis scales; horizontal axes: Internet users per 100 people; vertical axes: life expectancy]

49 Examining Relationships: Covariance and Correlation
Therefore, it's best to obtain a numerical measure of the relationship. Correlation is the measure statisticians and econometricians tend to use.
Definition: Correlation measures the strength and direction of a linear relationship between two variables and is usually denoted as r:
$$r_{x,y} = r_{y,x} = \frac{s_{x,y}}{s_x s_y}$$
where $s_{x,y}$ is the sample covariance, and $s_x$ and $s_y$ are the sample standard deviations of x and y, respectively. The former (i.e. the sample covariance) is calculated as:
$$s_{x,y} = s_{y,x} = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}).$$
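The two formulas above can be written out as follows. This is a sketch with made-up numbers: the variables `schooling` and `log_gdp` only mimic the kind of data used in the lecture, and the function names are illustrative.

```python
import math


def sample_cov(x, y):
    """Sample covariance with the 1/(N - 1) factor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Sample correlation: covariance divided by both sample SDs."""
    s_x = math.sqrt(sample_cov(x, x))  # the covariance of x with itself is its variance
    s_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)


# Hypothetical data: years of schooling and log GDP per capita.
schooling = [2.0, 4.0, 6.0, 8.0, 10.0]
log_gdp = [6.5, 7.0, 8.5, 9.0, 10.0]

print(sample_cov(schooling, log_gdp))
print(sample_corr(schooling, log_gdp))
```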

50 Examining Relationships: Understanding covariance
To see how a scatter diagram can be read in terms of the covariance between the two variables, consider the USA:
[Figure: scatter plot "Education and GDP per capita (2010)"; horizontal axis: average years of total schooling; vertical axis: log of real GDP per capita (at constant 2005 national prices); labelled points: COD, KWT, USA; the distances $x_{USA} - \bar{x}$ and $y_{USA} - \bar{y}$ are marked]
Because $x_{USA} > \bar{x}$ and $y_{USA} > \bar{y}$, the term $(x_{USA} - \bar{x})(y_{USA} - \bar{y})$ is positive. Also, $(x_{COD} - \bar{x})(y_{COD} - \bar{y}) > 0$, but $(x_{KWT} - \bar{x})(y_{KWT} - \bar{y}) < 0$. Thus, countries located in the top-right and bottom-left quadrants contribute positively to $s_{x,y}$, whereas countries in the top-left and bottom-right quadrants contribute negatively to $s_{x,y}$.
Question: Should we use covariance or correlation as a more "robust" measure of the relationship? Why?

51 Examining Relationships: Understanding covariance
To answer this question, let's look more closely at how covariance behaves:
A positive (negative) covariance indicates that x tends to be above its mean value whenever y is above (below) its mean value.
A sample covariance of zero suggests that x and y are not linearly related.
In our example, $s_{x,y} = 2.69$. This suggests that there is a positive relationship between x and y. But what does the value of 2.69 tell us about the strength of the relationship? Nothing.
Why not? Suppose we wanted to measure schooling in decades instead of years. That is, we generate a new variable which equals schooling measured in years divided by 10. The new covariance is $s_{x,y} = 0.269$, which is much closer to zero even though the relationship itself has not changed.
Technically speaking, covariance is not invariant to linear transformations of the variables.
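The effect of rescaling can be illustrated with a short sketch. The data are hypothetical (not the schooling and GDP data behind the 2.69 figure), and the helper names `cov` and `corr` are illustrative; the point is only that dividing x by 10 divides the covariance by 10 and leaves the correlation untouched.

```python
import math


def cov(x, y):
    """Sample covariance with the 1/(N - 1) factor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def corr(x, y):
    """Sample correlation coefficient."""
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


years = [2.0, 4.0, 6.0, 8.0, 10.0]     # hypothetical schooling in years
log_gdp = [6.5, 7.0, 8.5, 9.0, 10.0]   # hypothetical log GDP per capita
decades = [v / 10 for v in years]      # the same schooling, in decades

print(cov(years, log_gdp), cov(decades, log_gdp))    # covariance shrinks tenfold
print(corr(years, log_gdp), corr(decades, log_gdp))  # correlation is unchanged
```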

52 Examining Relationships: Covariance versus Correlation
The sample correlation coefficient addresses this problem. While $s_{x,y}$ may take any value between $-\infty$ and $+\infty$, the correlation coefficient is standardised such that $r \in [-1, 1]$. Recall that
$$r_{x,y} = r_{y,x} = \frac{s_{x,y}}{s_x s_y}$$
where $s_{x,y}$ is the covariance of x and y, and $s_x$ and $s_y$ are the sample standard deviations of x and y, respectively. Note that because $s_x > 0$ and $s_y > 0$, the sign of the sample covariance is the same as the sign of the correlation coefficient.
Correlation coefficient:
$r_{x,y} > 0$ indicates positive correlation.
$r_{x,y} < 0$ indicates negative correlation.
$r_{x,y} = 0$ indicates that x and y are not linearly related.
$r_{x,y} = \pm 1$ indicates perfect positive (negative) correlation. That is, there exists an exact linear relationship between x and y of the form $y = a + bx$.

53 Examining Relationships: Correlation
In our example, the value of $r_{x,y}$ indicates positive correlation (because $r_{x,y} > 0$) and a reasonably strong relationship (because $r_{x,y}$ is not too far away from 1).
To get a better feeling for what is "strong" and "weak", we generate 100 observations of x and y with varying degrees of correlation and plot them on a scatter diagram.
[Figure: three scatter plots of y against x; the first two panels are labelled r(x,y) = .9 and r(x,y) = -.9]

54 Examining Relationships: Correlation
[Figure: three further scatter plots of y against x, each labelled with its value of r(x,y)]
What's unusual about the right-most diagram here?
In the right-most diagram, the correlation coefficient indicates that x and y are unrelated, but the graph implies otherwise. In fact, there is a strong quadratic relationship between x and y in this case. Correlation only picks up linear relationships.
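A small sketch makes the same point without a plot: if y is an exact quadratic function of x and the x values are symmetric around zero, the correlation coefficient is zero even though y is completely determined by x. The data and the helper `corr` are illustrative and are not the simulated observations from the slide.

```python
import math


def corr(x, y):
    """Sample correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


x = [float(v) for v in range(-5, 6)]  # symmetric around zero
y = [v * v for v in x]                # exact quadratic relationship

print(corr(x, y))
```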

55 Examining Relationships: Summary
Correlation, r, measures the strength and direction of a linear relationship between two variables.
The sign of r indicates the direction of the relationship: r > 0 for a positive association and r < 0 for a negative one.
r always lies within $[-1, 1]$ and indicates the strength of a relationship by how close it is to 1 or -1.

56 Examining Relationships: Correlation vs Causation
You may have already encountered the statement that "correlation does not imply causation". This is an important concept to grasp, because even a strong correlation between two variables is not enough to draw conclusions about causation. For instance, consider the following examples:
1. Do televisions increase life expectancy? There is a high positive correlation between the number of television sets per person in a country and life expectancy in that country. That is, nations with more TV sets per person have higher life expectancies. Does this imply that we could extend people's lives in a country just by shipping TVs to them? No, of course not. The correlation between these two variables stems from the nation's income: richer nations have more TVs per person than poorer ones. These nations also have access to better nutrition and health care.
2. Are big hospitals bad for you? A study has found a positive correlation between the size of a hospital (measured by its number of beds) and the median number of days that patients remain in the hospital. Does this mean that you can shorten a hospital stay by choosing a small hospital?
3. Do firefighters make fires worse? A magazine has observed that "there's a strong positive correlation between the number of firefighters at a fire and the damage the fire does. So sending lots of firefighters just causes more damage." Is this reasoning flawed?

60 Examining Relationships: Reverse Causality
In addition to correlation feeding through a third (sometimes unobserved) variable, in economics we often run into reverse causality problems. Earlier, we showed that real GDP per capita and education (measured by average years of schooling) are positively correlated. This could be because:
1. Rich countries can afford more (and better) education. That is, an increase in GDP per capita causes an increase in schooling.
2. More (and better) education promotes innovation and productivity. That is, an increase in schooling causes an increase in GDP per capita.
The relationship between GDP per capita and education suffers from reverse causality. To reiterate, although we can make the statement that x and y are correlated, we do not know whether y is caused by x or vice versa. This is one of the central problems in empirical research in economics. In the course of the MSc, you will learn methods that allow you to identify the causal mechanisms in the relationship between y and x.


More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

The correlation coefficient

The correlation coefficient The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

More information

Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1. Relationships between two numerical variables Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

More information

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

More information

MTH 140 Statistics Videos

MTH 140 Statistics Videos MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

Two-sample inference: Continuous data

Two-sample inference: Continuous data Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As

More information

MEASURES OF VARIATION

MEASURES OF VARIATION NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

More information

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2 Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

CHAPTER 5 Round-off errors

CHAPTER 5 Round-off errors CHAPTER 5 Round-off errors In the two previous chapters we have seen how numbers can be represented in the binary numeral system and how this is the basis for representing numbers in computers. Since any

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

More information

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015. Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

More information

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Week 4: Standard Error and Confidence Intervals

Week 4: Standard Error and Confidence Intervals Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

More information

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Notes on Continuous Random Variables

Notes on Continuous Random Variables Notes on Continuous Random Variables Continuous random variables are random quantities that are measured on a continuous scale. They can usually take on any value over some interval, which distinguishes

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing! MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Pre-algebra Algebra Pre-calculus Calculus Statistics

More information

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

Describing, Exploring, and Comparing Data

Describing, Exploring, and Comparing Data 24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter

More information

3: Summary Statistics

3: Summary Statistics 3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes

More information

Describing and presenting data

Describing and presenting data Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data

More information

6 3 The Standard Normal Distribution

6 3 The Standard Normal Distribution 290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

More information

Chapter 7 - Roots, Radicals, and Complex Numbers

Chapter 7 - Roots, Radicals, and Complex Numbers Math 233 - Spring 2009 Chapter 7 - Roots, Radicals, and Complex Numbers 7.1 Roots and Radicals 7.1.1 Notation and Terminology In the expression x the is called the radical sign. The expression under the

More information

CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

More information

6.4 Logarithmic Equations and Inequalities

6.4 Logarithmic Equations and Inequalities 6.4 Logarithmic Equations and Inequalities 459 6.4 Logarithmic Equations and Inequalities In Section 6.3 we solved equations and inequalities involving exponential functions using one of two basic strategies.

More information

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS. SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Solving Quadratic Equations

Solving Quadratic Equations 9.3 Solving Quadratic Equations by Using the Quadratic Formula 9.3 OBJECTIVES 1. Solve a quadratic equation by using the quadratic formula 2. Determine the nature of the solutions of a quadratic equation

More information

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches)

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches) PEARSON S FATHER-SON DATA The following scatter diagram shows the heights of 1,0 fathers and their full-grown sons, in England, circa 1900 There is one dot for each father-son pair Heights of fathers and

More information

Unit 26 Estimation with Confidence Intervals

Unit 26 Estimation with Confidence Intervals Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

Mind on Statistics. Chapter 2

Mind on Statistics. Chapter 2 Mind on Statistics Chapter 2 Sections 2.1 2.3 1. Tallies and cross-tabulations are used to summarize which of these variable types? A. Quantitative B. Mathematical C. Continuous D. Categorical 2. The table

More information

AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

On Correlating Performance Metrics

On Correlating Performance Metrics On Correlating Performance Metrics Yiping Ding and Chris Thornley BMC Software, Inc. Kenneth Newman BMC Software, Inc. University of Massachusetts, Boston Performance metrics and their measurements are

More information

Statistics E100 Fall 2013 Practice Midterm I - A Solutions

Statistics E100 Fall 2013 Practice Midterm I - A Solutions STATISTICS E100 FALL 2013 PRACTICE MIDTERM I - A SOLUTIONS PAGE 1 OF 5 Statistics E100 Fall 2013 Practice Midterm I - A Solutions 1. (16 points total) Below is the histogram for the number of medals won

More information

Unit 7: Normal Curves

Unit 7: Normal Curves Unit 7: Normal Curves Summary of Video Histograms of completely unrelated data often exhibit similar shapes. To focus on the overall shape of a distribution and to avoid being distracted by the irregularities

More information

2. Discrete random variables

2. Discrete random variables 2. Discrete random variables Statistics and probability: 2-1 If the chance outcome of the experiment is a number, it is called a random variable. Discrete random variable: the possible outcomes can be

More information

WEEK #22: PDFs and CDFs, Measures of Center and Spread

WEEK #22: PDFs and CDFs, Measures of Center and Spread WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook

More information

Random variables, probability distributions, binomial random variable

Random variables, probability distributions, binomial random variable Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that

More information

Chapter 5. Random variables

Chapter 5. Random variables Random variables random variable numerical variable whose value is the outcome of some probabilistic experiment; we use uppercase letters, like X, to denote such a variable and lowercase letters, like

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Elasticity. I. What is Elasticity?

Elasticity. I. What is Elasticity? Elasticity I. What is Elasticity? The purpose of this section is to develop some general rules about elasticity, which may them be applied to the four different specific types of elasticity discussed in

More information

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

More information

The Normal Distribution

The Normal Distribution Chapter 6 The Normal Distribution 6.1 The Normal Distribution 1 6.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Recognize the normal probability distribution

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

2. Filling Data Gaps, Data validation & Descriptive Statistics

2. Filling Data Gaps, Data validation & Descriptive Statistics 2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)

More information

Exploratory Data Analysis. Psychology 3256

Exploratory Data Analysis. Psychology 3256 Exploratory Data Analysis Psychology 3256 1 Introduction If you are going to find out anything about a data set you must first understand the data Basically getting a feel for you numbers Easier to find

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

More information

Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds Isosceles Triangle Congruent Leg Side Expression Equation Polynomial Monomial Radical Square Root Check Times Itself Function Relation One Domain Range Area Volume Surface Space Length Width Quantitative

More information

STAT 360 Probability and Statistics. Fall 2012

STAT 360 Probability and Statistics. Fall 2012 STAT 360 Probability and Statistics Fall 2012 1) General information: Crosslisted course offered as STAT 360, MATH 360 Semester: Fall 2012, Aug 20--Dec 07 Course name: Probability and Statistics Number

More information