Introduction to R. I. Using R for Statistical Tables and Plotting Distributions



Similar documents
Review of Random Variables

Probability Calculator

Chapter 3 RANDOM VARIATE GENERATION

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March Due:-March 25, 2015.

Non-Inferiority Tests for Two Means using Differences

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

4. Continuous Random Variables, the Pareto and Normal Distributions

Lecture 8. Confidence intervals and the central limit theorem

Software for Distributions in R

Confidence Intervals for the Difference Between Two Means

Gamma Distribution Fitting

Confidence Intervals for Exponential Reliability

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,

2WB05 Simulation Lecture 8: Generating random variables

MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables

Lecture 5 : The Poisson Distribution

1 Sufficient statistics

CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.

Maximum Likelihood Estimation

Point Biserial Correlation Tests

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

Permutation Tests for Comparing Two Populations

Bootstrap Example and Sample Code

PROBABILITY AND SAMPLING DISTRIBUTIONS

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Normality Testing in Excel

Notes on Continuous Random Variables

Non-Inferiority Tests for One Mean

Generalized Linear Models

Normal distribution. ) 2 /2σ. 2π σ

Confidence Intervals for Cp

Confidence Intervals for One Standard Deviation Using Standard Deviation

Notes on the Negative Binomial Distribution

Dongfeng Li. Autumn 2010

1.1 Introduction, and Review of Probability Theory Random Variable, Range, Types of Random Variables CDF, PDF, Quantiles...

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

MBA 611 STATISTICS AND QUANTITATIVE METHODS

10.2 Series and Convergence

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

How To Test For Significance On A Data Set

Descriptive Statistics

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Introduction to Quantitative Methods

SAS Software to Fit the Generalized Linear Model

5. Continuous Random Variables

6.4 Normal Distribution

Package SHELF. February 5, 2016

The normal approximation to the binomial

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, cm

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

NCSS Statistical Software

Binomial Distribution n = 20, p = 0.3

Normal Distribution. Definition A continuous random variable has a normal distribution if its probability density. f ( y ) = 1.

Tutorial 5: Hypothesis Testing

Exact Confidence Intervals

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

1 Nonparametric Statistics

INSURANCE RISK THEORY (Problems)

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Sampling Distributions

Probability Distributions

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Exploratory Data Analysis

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Homework 4 - KEY. Jeff Brenion. June 16, Note: Many problems can be solved in more than one way; we present only a single solution here.

Data Analysis Tools. Tools for Summarizing Data

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

6 3 The Standard Normal Distribution

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

Univariate Regression

1 The Brownian bridge construction

Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

MATH4427 Notebook 2 Spring MATH4427 Notebook Definitions and Examples Performance Measures for Estimators...

Basic Statistical and Modeling Procedures Using SAS

Practice Problems for Homework #6. Normal distribution and Central Limit Theorem.

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

The normal approximation to the binomial

The Standard Normal distribution

START Selected Topics in Assurance

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

Questions and Answers

Math 425 (Fall 08) Solutions Midterm 2 November 6, 2008

Multiple Linear Regression

Nonparametric Statistics

Stats on the TI 83 and TI 84 Calculator

Principle of Data Reduction

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

Poisson Models for Count Data

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Section 5.1 Continuous Random Variables: Introduction

Math 461 Fall 2006 Test 2 Solutions

Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:

Transcription:

Introduction to R I. Using R for Statistical Tables and Plotting Distributions The R suite of programs provides a simple way for statistical tables of just about any probability distribution of interest and also allows for easy plotting of the form of these distributions. In what follows below, R commands are set in bold courier. Note that R commands are CASE-SENSITIVE, so be careful when typing. General Syntax for Distribution Functions There are four basic R commands that apply to the various distributions defined in R. Letting DIST denote the particular distribution and parameters the parameters to specify that distribution (see Table one), the basic syntax of the four basic commands are: ddist(x, parameters) probability density of DIST evaluated at x. qdist(p, parameters) returns x satisfying Pr(DIST(parameters) x) =p pdist(x, parameters) returns Pr(DIST(parameters) x) rdist(n, parameters) generates n random variables from DIST(parameters) Table 1. Common continuous probability distributions in R. If certain parameters are not specified, the default is assumed. For example, a unit normal unless the mean and variance are specified, and a central distribution unless a noncentrality parameter is given. Other continuous distributions are given in Table 2 (below). Distribution Normal Student s t χ 2 F Syntax norm() unit normal (the default) norm(µ, σ 2 ) normal with mean µ and variance σ 2. t(df) central t with df degrees of freedom (default) t(df,ncp) noncentral t with noncentrality parameter ncp chisq(df) central χ 2 with df degrees of freedom (default) chisq(df,ncp) noncentral χ 2 with noncenturality parameter ncp f(df1,df2) central F with df 1 and df 2 degrees of freedom (default) f(df1,df2,ncp) noncentral F with noncenturality parameter ncp Plotting Probability Distributions One very powerful feature of R is its ability to quickly plot any number of functions for the desired distribution. The command used for plotting is the curve function: curve(function,xlow,xhigh) plots function between xlow and xhigh

2 I: R as a Statical Calculator curve(function,xlow,xhigh,n=500) plots function using 500 equallyspaced points Example 1: Suppose we wish to plot a (central) χ 2 distribution with 5 degrees of freedom, say between 0 and 30. To do this in R type the following: curve(dchisq(x, 5), 0, 30) typing curve(dchisq(x, 5), 0, 30, n = 500) uses 500 (equally spaced) points to draw the curve Suppose that we now wish to examine the effect of moving from a central χ 2 distribution to an increasingly noncentral χ 2. The distribution function for the noncentral χ 2 distribution in R is given by dchisq(x, df,ncp), where df are the degrees of freedom and ncp the noncentrality parameter. To add an additional curve to the one plotted, we use the add=true command in the curve function. For example, to add a curve for e χ 2 with noncentrality parameter 10, curve(dchisq(x, 5,10), 0, 30,add=TRUE) R also lets you specify the color, by using the col = "COLOR" command, for example curve(dchisq(x, 5,10), 0, 30,add=TRUE,col="pink") adds this curve in pink (R has a wide range of colors, so have fun). Example 2: example, R can also be used to plot cumulative probability densities. For curve(dt(x, 5), 0, 30) plots the cumulative probabilities for a student s t with 5 degrees of freedom for x ranging from -3 to +3. curve(pt(x,5,1), -3, 3, add=true, col="green") overlays a green curve for cumulative probability for a noncentral t with noncentrality parameter 1.

INTRODUCTION TO R 3 We can also point the required x values for a given p value. For example, for a (central) F distribution with 2 and 20 degrees of freedom, since now x (the probability) ranges from zero to one, curve( qf(x,2,20), 0, 1) Obtaining p and Critical Values The pdist(x, parameters) returns the p value associated with a particular x value drawn from that distribution, i.e., returns Pr(DIST(parameters) x). Conversely, the qdist(p, parameters) returns the x value to give a particular p value. Example 3:. What is the x value from a student s t distribution with 12 degrees of freedom so that there is a 99 % probability that a random value is below x? Typing qt(0.99,12) in R returns 2.680998. Likewise, if we have a noncentral t (with noncentrality parameter 1.75) and observe a value of 3.5, the probability of being this value or less is given by pt(3.5,12,1.75) or 0.91566. The p value is one minus this (as Pr(DIST >x) = 1-Pr(DIST x), or 1- pt(3.5,12,1.75) which equals 0.08433496. Example 4: Two-side confidence α-level confidence intervals are given by x min,x max, where Pr(DIST (x min )=(1 α/2) and Pr(DIST (x max )= 1 (1 α/2). For example, an 99.9% confidence interval for a unit normal has α =0.999, and hence (1 α)/2 =0.0005. We can obtain upper and low values in R by typing qnorm( 0.0005 ) and qnorm( 0.9995 )

4 I: R as a Statical Calculator As an example of using other features of R, we can also compute these by first specifying to R the α value we wish to use by typing alpha <- 0.999 This tells R that we have assigned alpha the value of 0.999. The lower and upper (respectively) values follow from qnorm((1-alpha)/2) and qnorm( 1-1-alpha)/2 R again returns values of -3.290527, 3.290527. Another useful R is that we can input a vector of values and for many of the distribution command R will output a vector of the designed values. We can define the vector of the upper and lower probabilities (for which we desire associated critical values) by xvals <- c((1-alpha)/2, 1-(1-alpha)/2) Here the command c() means create a vector from the (in this case two ) items in the list, and assign the variable xvals the value of this vector. Typing in qnorm( xvals ) R again returns -3.290527 3.290527. Example 4: While confidence intervals for normal and student t variables are symmetric, those for χ 2 and F distributions are not. For example, the 95% confidence interval for an F drawn from a distribution with 2 and 20 degrees of freedom is (lower value) alpha <- 0.95 qf((1-alpha)/2,2,20) or 0.02534988 and an upper value of qf(1-(1-alpha)/2,2,20) for 4.461255. Other Continuous Probability Distributions Table 2. Some other continuous probability distributions in R. If certain parameters are not specified, the default is assumed. For exact details of how the parameters are defined, see the R help file. Note that this is not a complete set, so check the help file if a distribution is not listed here, as R likely has it.

INTRODUCTION TO R 5 Distribution Exponential Uniform Syntax exp() Exponential with rate 1 (the default) exp(rate) Exponential with rate set by user unif() Uniform on (0,1) (the default) unif(min, max) Uniform over (min, max) Log Normal lnorm() log normal with meanlog=0, SD of log = 1 (the default) lnorm(meanlog,sdlog) log normal with meanlog, SDlog Beta beta(shape1, shape2) beta with shape parameters shape1, shape 2 Gamma Weibull gamma(shape) gamma with shape user specified, scale=1. (the default) gamma(shape,scale) scale parameter set by user. weibull(shape) Weibull with shape user specified, scale=1. (the default) weibull(shape,scale) scale parameter set by user. Wilcox wilcox(m,n) Distribution for Wilcoxon rank sum statistic with sample sizes m and n. Discrete Probability Distributions Table 3 lists several of the discrete probability distributions avilable in R. Table 3. Some of the discrete probability distributions in R. Distribution Binomial Geometric Poisson Negative Binomial Syntax binom(n,p) Binomial with sample size n and success probability p geom(p) Geometric with success probability p poism(λ) Poisson with mean λ nbinom(tar,p,mu) Negative Binomial with target tar, success probability p and mean µ Example 5. In order to plot a discrete function, we must first generate a sequence of integers. R can be used to generate a sequence as follows: x <- seq(0,25,1) this generates a seqeunce from 0 to 25 in steps of one and then stores the resulting sequence in the variable x. More generaly to generate a sequence from a to b in steps of c, we would use seq(a,b,c).

6 I: R as a Statical Calculator To plot the first 26 values (for 0 to 25) for a Poisson distribution with parameter λ =3,inRusing the above sequence, now type: plot(dpois(x,3)) Adding the type ="" command can change the points into a step latter graph (type = "s") or a histogram (type ="h"), so that plots a step latter function while plots a histogram. plot(dpois(x,3),type="s") plot(dpois(x,3),type="h") You will notice that the plot values are essentially zero from values greater than 10. If we wish more discrimation, we can plot the log of the probabilities. This is done in the ddist command by using the option log=true, which returns the log of the probabilities (log=false is the default which is used unless otherwise specificed by the user). Hence, to plot the log of probabilities (as a histogram) for the 51 terms in a Binominal with n =50and success probability p =0.8, type x <- seq(0,50,1) then while plot(dbinom(x,50,0.8,log=true),type="h") plot(dbinom(x,50,0.8),type="h") plots the probabilities on a linear scale.