EMPIRICAL FREQUENCY DISTRIBUTION



INTRODUCTION TO MEDICAL STATISTICS
Mirjana Kujundžić Tiljak

EMPIRICAL FREQUENCY DISTRIBUTION - observed data
DISTRIBUTION - described by mathematical models

When an empirical distribution approximates a particular probability distribution, theoretical knowledge of that distribution can be used to answer questions about the data. This requires the evaluation of probabilities.

PROBABILITY (P)
measures uncertainty
measures the chance of a given event occurring
0 ≤ P ≤ 1
P = 0: the event cannot occur
P = 1: the event must occur
Q = 1 - P: probability of the complementary event (the event not occurring)

PROBABILITY (P)
Various approaches in probability calculations:
Subjective: personal degree of belief that the event will occur (e.g. that the world will come to an end in the year 2050)
Frequentist: the proportion of times the event would occur if the experiment were repeated a large number of times (e.g. the number of times we would get a "head" in repeated coin tosses)
A priori: requires knowledge of the theoretical model, the probability distribution, which describes the probabilities of all possible outcomes of the experiment (e.g. genetic theory allows us to describe the probability distribution for eye color in a baby born to a blue-eyed woman and a brown-eyed man by initially specifying all possible genotypes of eye color in the baby and their probabilities)

PROBABILITY (P)
The addition rule: if two events (A and B) are mutually exclusive, the probability that either one or the other occurs (A or B) is equal to the sum of their probabilities
Prob (A or B) = Prob (A) + Prob (B)
The multiplication rule: if two events (A and B) are independent, the probability that both events occur (A and B) is equal to the product of their probabilities
Prob (A and B) = Prob (A) × Prob (B)
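As a small illustration of the addition rule, the multiplication rule and the complementary event, the sketch below uses a fair six-sided die. The die is a hypothetical example that is not part of the slides, and the variable names are chosen only for this illustration.

```python
from fractions import Fraction

# Hypothetical experiment (not from the slides): rolling a fair six-sided die.
p_one = Fraction(1, 6)   # P(roll shows 1)
p_two = Fraction(1, 6)   # P(roll shows 2)

# Addition rule: "1" and "2" are mutually exclusive on a single roll.
p_one_or_two = p_one + p_two

# Multiplication rule: two separate rolls are independent events.
p_one_then_two = p_one * p_two

# Complementary event: Q = 1 - P.
q_not_one = 1 - p_one

print(p_one_or_two, p_one_then_two, q_not_one)  # 1/3 1/36 5/6
```

Exact fractions are used here so that the sums and products can be read off without rounding.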

RANDOM VARIABLES
random variable: a quantity that can take any one of a set of mutually exclusive values with a given probability
discrete (discontinuous) random variable: numerical values are integers
E.g. number of children in a family: 0, 1, 2, 3, ..., k
continuous random variable: numerical values are real numbers
E.g. body weight 72.35 kg, blood glucose level 7.2 mmol/l

PROBABILITY DISTRIBUTION
shows the probabilities of all possible values of the random variable
a theoretical distribution that is expressed mathematically
has a mean and variance that are analogous to those of an empirical distribution
parameters: summary measures (e.g. mean, variance) characterizing the distribution; they are estimated in the sample by the relevant statistics
depending on whether the random variable is discrete or continuous, the probability distribution can be either discrete or continuous

PROBABILITY - DISCRETE (Binomial, Poisson)
a probability can be derived for every possible value of the random variable
the sum of all such probabilities is 1

PROBABILITY - CONTINUOUS (Normal, Chi-squared, t, F)
the probability of the random variable, x, taking values in certain ranges can be derived
if the horizontal axis represents the values of x, the curve given by the equation of the distribution can be drawn (= probability density function)
total area under the curve = 1; it represents the probability of all possible events
the probability that x lies between two limits is equal to the area under the curve between these values
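The idea that a probability is an area under the density curve can be checked numerically. The sketch below is only an illustration: it assumes a Normal density with mean 0 and standard deviation 1 as the example curve, and the helper name area_under_curve is hypothetical. It uses a simple trapezoid rule and compares the result with the cumulative distribution function from Python's statistics module.

```python
from statistics import NormalDist

def area_under_curve(pdf, lower, upper, steps=10_000):
    """Approximate the area under pdf between lower and upper (trapezoid rule)."""
    width = (upper - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(upper))
    for i in range(1, steps):
        total += pdf(lower + i * width)
    return total * width

# Assumed example density: Normal with mean 0 and standard deviation 1.
dist = NormalDist(mu=0.0, sigma=1.0)

p_between = area_under_curve(dist.pdf, -1.0, 2.0)   # P(-1 < x < 2) as an area
p_exact = dist.cdf(2.0) - dist.cdf(-1.0)            # same probability from the CDF
p_total = area_under_curve(dist.pdf, -8.0, 8.0)     # practically the whole curve

print(round(p_between, 4), round(p_exact, 4), round(p_total, 4))
```

The two estimates of the probability agree closely, and the area over a very wide range is practically 1.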

PROBABILITY
[Two figures: probability that x lies between two limits]

THE NORMAL (GAUSSIAN) DISTRIBUTION
one of the most important distributions in statistics
named after the German mathematician C.F. Gauss
many biological measurements approximately follow the Normal distribution
it is used in many analytical models

THE NORMAL (GAUSSIAN) DISTRIBUTION
Probability density function:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{a}$, where $a = -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}$
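The density formula can be written directly as a function. This is a minimal sketch; the parameter values (mean 70, standard deviation 10) are hypothetical and chosen only for illustration. The result is cross-checked against NormalDist from Python's standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(a), with a = -1/2 * ((x - mu) / sigma)**2."""
    a = -0.5 * ((x - mu) / sigma) ** 2
    return math.exp(a) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 70.0, 10.0

for x in (50.0, 70.0, 90.0):
    # Cross-check the hand-written formula against the standard library.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The density takes the same value at 50 and at 90, which are equally far from the mean, and is largest at x = µ.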

THE NORMAL (GAUSSIAN) DISTRIBUTION
completely described by two parameters:
- mean (µ)
- variance (σ²)
X ~ N(µ, σ²)

THE NORMAL (GAUSSIAN) DISTRIBUTION
normal distribution curve:
area under the curve = 1
bell-shaped (unimodal)
symmetrical about its mean
absolute maximum at x = µ
shifted to the right if the mean is increased and to the left if the mean is decreased (assuming constant variance)
flattened as the variance is increased, but more peaked as the variance is decreased (for a fixed mean)

THE NORMAL (GAUSSIAN) DISTRIBUTION
the mean, median and mode of a Normal distribution are equal
the probability (P) that a normally distributed random variable, x, with mean µ and standard deviation σ lies between:
(µ - σ) and (µ + σ) = 0.68
(µ - 1.96σ) and (µ + 1.96σ) = 0.95
(µ - 2.58σ) and (µ + 2.58σ) = 0.99
these intervals may be used to define reference intervals
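The three interval probabilities can be reproduced from the Normal cumulative distribution function. The sketch below uses NormalDist from Python's statistics module; the function name prob_within and the measurement with mean 5.0 and standard deviation 0.5 are hypothetical, introduced only to show how a 95% reference interval would be formed.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < x < mu + k*sigma) for a Normal random variable."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1.0, 1.96, 2.58):
    print(k, round(prob_within(k), 2))
# 1.0 0.68
# 1.96 0.95
# 2.58 0.99

# Hypothetical measurement with mean 5.0 and SD 0.5 (arbitrary units):
mu, sigma = 5.0, 0.5
reference_interval = (mu - 1.96 * sigma, mu + 1.96 * sigma)
print(reference_interval)
```

The probabilities do not depend on the particular values of µ and σ, only on the multiplier k.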

THE NORMAL (GAUSSIAN) DISTRIBUTION
[Figures: changing µ, constant σ]

THE NORMAL (GAUSSIAN) DISTRIBUTION
[Figures: changing σ, constant µ]

THE STANDARD NORMAL DISTRIBUTION
transformation of an original value (x_i) to a Standardized Normal Deviate (SND), z_i:
population: z_i = (x_i - µ)/σ
sample: z_i = (x_i - x̄)/s
z = random variable that has a Standard Normal distribution
mean (µ) = 0; variance (σ²) = 1; N(0, 1)
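The sample version of the transformation can be sketched in a few lines. The body weights below are made-up values used only for illustration, and the function name standardize is hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z_i = (x_i - sample mean) / sample SD for each x_i."""
    x_bar = mean(values)
    s = stdev(values)          # sample standard deviation (n - 1 in the denominator)
    return [(x - x_bar) / s for x in values]

# Hypothetical body weights in kg, made up for illustration.
weights = [61.0, 68.5, 72.35, 75.0, 80.2, 90.1]
z = standardize(weights)

print([round(v, 2) for v in z])
print(round(mean(z), 10), round(stdev(z), 10))   # mean 0 and SD 1, up to rounding
```

Whatever the original mean and standard deviation, the standardized values have mean 0 and standard deviation 1.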

THE STANDARD NORMAL DISTRIBUTION
each value X_1, X_2, X_3, ..., X_n is transformed to Z_1, Z_2, Z_3, ..., Z_n
the original values have mean x̄ and standard deviation s; what are the mean and standard deviation (s_z) of the z values?
after the transformation: mean of z = 0, s_z = 1
Z ~ N(0, 1)
[Figures: the Standard Normal distribution]

THE STUDENT'S t-DISTRIBUTION
W.S. Gosset (pseudonym "Student")
the parameter that characterizes the t-distribution = the degrees of freedom
similar in shape to the Normal distribution, but more spread out, with longer tails
as the degrees of freedom increase, its shape approaches Normality
useful for calculating confidence intervals and for testing hypotheses about one or two means
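The longer tails and the approach to Normality can be seen by evaluating the density for increasing degrees of freedom. The slides do not give the t density; the sketch below assumes the standard textbook formula and computes it with log-gamma functions to avoid overflow. The function name t_pdf and the evaluation points are chosen only for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

standard_normal = NormalDist()

# Compare the peak (x = 0) and the tail (x = 3) for increasing degrees of freedom.
for df in (2, 10, 100, 1000):
    print(df, round(t_pdf(0.0, df), 4), round(t_pdf(3.0, df), 4))
print("Normal", round(standard_normal.pdf(0.0), 4), round(standard_normal.pdf(3.0), 4))
```

With few degrees of freedom the peak is lower and the tail density is higher than for the Standard Normal curve; both differences shrink as the degrees of freedom increase.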

THE CHI-SQUARE (χ²) DISTRIBUTION
a right-skewed distribution taking positive values
characterized by its degrees of freedom
its shape depends on the degrees of freedom; it becomes more symmetrical and approaches Normality as they increase
useful for analysing categorical data

THE F-DISTRIBUTION
skewed to the right
defined by a ratio: the distribution of a ratio of two estimated variances calculated from Normal data approximates the F-distribution
characterized by the degrees of freedom of the numerator and the denominator of the ratio
useful for comparing two variances, and more than two means using the analysis of variance

THE LOGNORMAL DISTRIBUTION
the probability distribution of a random variable whose log (to base 10 or e) follows the Normal distribution
highly skewed to the right
if the logs of raw data that are skewed to the right give an empirical distribution that is nearly Normal, the data approximate the Lognormal distribution
geometric mean = a summary measure of location
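For right-skewed data treated as Lognormal, the geometric mean is obtained by averaging the logs and transforming back. The sketch below uses made-up measurements purely for illustration and compares the back-transformed mean of the logs with geometric_mean from Python's statistics module (available from Python 3.8).

```python
import math
from statistics import geometric_mean, mean

# Hypothetical right-skewed measurements, made up for illustration.
raw = [1.2, 1.9, 2.4, 3.1, 4.0, 5.5, 9.8, 21.0]

logs = [math.log(x) for x in raw]        # natural logs; base 10 would work equally well
mean_of_logs = mean(logs)

# Back-transforming the mean of the logs gives the geometric mean.
gm_from_logs = math.exp(mean_of_logs)
gm_direct = geometric_mean(raw)

print(round(mean(raw), 3), round(gm_from_logs, 3), round(gm_direct, 3))
```

The geometric mean is smaller than the arithmetic mean here because it is less affected by the few large values in the right tail.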

THE BINOMIAL DISTRIBUTION
theoretical distribution for a discrete random variable
definition: Jacob Bernoulli, 1700
two outcomes: success and failure
n events
E.g. n = 100 unrelated women undergoing IVF; outcome = success (pregnancy) or failure

THE BINOMIAL DISTRIBUTION
two parameters describe the Binomial distribution:
n = number of individuals in the sample (or repetitions of a trial)
π = the true probability of success for each individual (or in each trial)
X ~ B(n, π)

THE BINOMIAL DISTRIBUTION
Mean = nπ (the value of the random variable that we expect if we look at n individuals, or repeat the trial n times)
Variance = nπ(1 - π)
for small n, the distribution is skewed to the right if π < 0.5 and skewed to the left if π > 0.5
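The mean and variance formulas can be checked by summing over the individual Binomial probabilities. This is a minimal sketch: the probability formula is the standard Binomial one (not written out on the slides), and n = 10 with π = 0.2 are hypothetical values chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, pi):
    """P(X = k) for X ~ B(n, pi): C(n, k) * pi**k * (1 - pi)**(n - k)."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

# Hypothetical values: n = 10 individuals, success probability 0.2.
n, pi = 10, 0.2
probs = [binomial_pmf(k, n, pi) for k in range(n + 1)]

mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

print(round(sum(probs), 6))                      # probabilities sum to 1
print(round(mean, 6), n * pi)                    # agrees with n * pi
print(round(variance, 6), n * pi * (1 - pi))     # agrees with n * pi * (1 - pi)
```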

THE BINOMIAL DISTRIBUTION
the distribution becomes more symmetrical as the sample size increases and approximates the Normal distribution if both nπ and n(1 - π) are greater than 5
the properties of the Binomial distribution can be used when making inferences about proportions
the Normal approximation to the Binomial distribution is often used when analysing proportions

THE BINOMIAL DISTRIBUTION
Example: gene recombination
chromosomal locus with 2 alleles: A and a
p = probability of A
q = 1 - p = probability of a
P(A) = p, P(a) = q, (p + q = 1)

THE BINOMIAL DISTRIBUTION
conception outcome space: {AA, Aa, aa}
P(AA) = P(A) * P(A) = p²
P(aa) = P(a) * P(a) = q²
P(Aa) = P(A) * P(a) = pq
P(aA) = P(a) * P(A) = qp
P(Aa) + P(aA) = 2pq
p² + 2pq + q² = (p + q)² = 1² = 1

THE BINOMIAL DISTRIBUTION
Example: probability of genotypes
frequency of gene A = 0.33
frequency of gene a = 0.67
(p + q)² = (0.33 + 0.67)² = 0.33² + 2 * 0.33 * 0.67 + 0.67²
P(AA) = 0.33² = 0.1089
P(Aa) = 0.33 * 0.67 = 0.2211
P(aA) = 0.67 * 0.33 = 0.2211
P(aa) = 0.67² = 0.4489

THE BINOMIAL DISTRIBUTION
[Figure: graphical presentation of the probabilities (P) of the different genotypes AA, Aa, aa]
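The genotype probabilities in this example can be recomputed from the allele frequency. The sketch below combines the two heterozygous orders (Aa and aA) into a single 2pq term, as in the expansion of (p + q)²; the function name genotype_probabilities is hypothetical.

```python
def genotype_probabilities(p):
    """Genotype probabilities for allele frequencies P(A) = p and P(a) = q = 1 - p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}

probs = genotype_probabilities(0.33)   # frequency of gene A from the example
for genotype, prob in probs.items():
    print(genotype, round(prob, 4))
# AA 0.1089
# Aa 0.4422
# aa 0.4489
print(round(sum(probs.values()), 4))   # 1.0
```

The heterozygote probability 0.4422 is the sum of the two separate terms 0.2211 + 0.2211.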

THE BINOMIAL DISTRIBUTION
Example: death outcome as a binomial distribution
lethality of a disease = 0.30 (30/100)
survival probability = 0.70
n = 5
Binomial: (0.30 + 0.70)⁵

Number of deaths    Binomial term    Probability
5 (everybody)       p⁵               0.00243
4                   5p⁴q             0.02835
3                   10p³q²           0.13230
2                   10p²q³           0.30870
1                   5pq⁴             0.36015
0 (nobody)          q⁵               0.16807
Total                                1.00000

THE POISSON DISTRIBUTION
Poisson (beginning of the 19th century)
the Poisson random variable = the count of events that occur independently and randomly in time or space at some average rate, µ (it takes the value 0 and all positive integers)
example: the number of hospital admissions per day typically follows the Poisson distribution
the Poisson distribution can be used to calculate the probability of a certain number of admissions on any particular day
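Returning to the lethality example (n = 5, probability of death 0.30), the probabilities of 5, 4, ..., 0 deaths can be recomputed from the binomial terms. This is a sketch for checking the arithmetic; the variable names are chosen only for illustration.

```python
from math import comb

p_death, n = 0.30, 5          # values from the lethality example
q_survive = 1.0 - p_death

table = {}
for deaths in range(n, -1, -1):
    coefficient = comb(n, deaths)                       # 1, 5, 10, 10, 5, 1
    table[deaths] = coefficient * p_death ** deaths * q_survive ** (n - deaths)

for deaths, prob in table.items():
    print(deaths, f"{prob:.5f}")
print("Total", f"{sum(table.values()):.5f}")
```

The most likely outcome is a single death (probability 0.36015), and the six probabilities add up to 1.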

THE POISSON DISTRIBUTION
mean (average rate, µ) = the parameter that describes the Poisson distribution
the mean equals the variance in the Poisson distribution
unimodal and right-skewed if the mean is small, but more symmetrical as the mean increases, when it approximates a Normal distribution
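The Poisson probabilities, and the equality of mean and variance, can be illustrated with a short calculation. The slides do not give the Poisson formula; the sketch below assumes the standard one, and the average of 3 admissions per day is a hypothetical value.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson random variable with mean mu: exp(-mu) * mu**k / k!."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

mu = 3.0   # hypothetical average of 3 admissions per day

p_exactly_5 = poisson_pmf(5, mu)
p_at_most_2 = sum(poisson_pmf(k, mu) for k in range(3))

# Mean and variance, summing far enough into the tail to capture almost all probability.
ks = range(60)
mean = sum(k * poisson_pmf(k, mu) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)

print(round(p_exactly_5, 4), round(p_at_most_2, 4))
print(round(mean, 6), round(variance, 6))   # both agree with mu
```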