Cancer - Interpreting Interrogation of Permutation and Biostatic Analysis


Using Permutation Tests and Bootstrap Confidence Limits to Analyze Repeated Events Data from Clinical Trials

Laurence Freedman, MA
Biometry Branch, National Cancer Institute, Bethesda, Maryland

Richard Sylvester, ScD
EORTC Data Center, Bruxelles, Belgium

David P. Byar, MD
Chief, Biometry Branch, National Cancer Institute, Bethesda, Maryland

Controlled Clinical Trials 10 (1989). Received April 25, 1988; revised August 26. Address reprint requests to: Laurence Freedman, Biometry Branch, National Cancer Institute, Executive Plaza North, Room 344, 9000 Rockville Pike, Bethesda, MD.

ABSTRACT: In clinical trials comparing treatments for superficial bladder cancer, patients are at risk of repeated recurrences of their disease. Statistical methods of analyzing such data are required. This article presents a nonparametric approach. A statistical test to compare the recurrence or tumor rates in two treatment groups, using the randomization distribution, is described. Confidence intervals for the rate ratio are determined from the bootstrap distribution. The implementation of both requires Monte Carlo methods. Computer simulations support the use of these nonparametric methods when there are more than 60 recurrences in each treatment group. An example illustrating their use is given. The strategy adopted for analysis of these data could be applied to other clinical trials where standard methodology is inappropriate.

KEY WORDS: repeated events, permutation test, randomization test, bootstrap confidence limits

INTRODUCTION

The analysis of data from clinical trials has received much attention from biostatisticians. Methods of analysis are well developed for many commonly encountered circumstances, including response data that are univariate, continuous, or discrete, and possibly subject to censorship. However, there are still some types of data for which methods are not well developed. One of these is where the response to treatment consists of a series of observations made over time. These observations may or may not be regularly spaced in time and may be continuous or categorical. This article discusses one such example arising from trials of the treatment of superficial bladder cancer.

The problems confronting the biostatisticians in these less-charted areas are (1) the definition of useful summary measures of response (data reduction) and (2) statistical estimation and testing of the defined measure or measures. In our example the definition of a summary measure arises fairly naturally and the article concentrates on the second aspect. Nonparametric methods are advocated because of doubts about the appropriateness of simple probability models. These methods may be of use in a wider context, as outlined in the Discussion section.

Background

In most cancer clinical trials, classical outcomes such as those based on the response rate, time to progression, and duration of survival are appropriate for the assessment and comparison of effects of treatment. In trials measuring the time to an event, it is the first occurrence of the event that is generally of interest, with the patient no longer being followed for subsequent occurrences of the same event. In trials comparing treatments for superficial bladder cancer, the situation is different, however, because patients are followed for multiple occurrences of the same event [1,2]. Thus special techniques are required for efficient analysis of the data.

We shall assume that on entry to the trial patients in two or more treatment groups with histologically confirmed Ta and T1 superficial bladder cancer undergo a transurethral resection (TUR) to remove all visible tumor in the bladder. Patients are then followed at regular intervals by cystoscopy for some minimum period of time, for example, every 3 months for a period of at least 1 year.
If during a cystoscopy a tumor recurrence is noted, a TUR is again performed to remove all visible lesions and the patient continues on his assigned treatment. Typical data are presented in Figure 1, where the line lengths represent the duration of follow-up for individual patients and the open circles represent cystoscopies at which recurrences were detected.

FIGURE 1. Diagram showing follow-up of patients treated for superficial bladder cancer. Each line represents the follow-up of one patient. The open circles indicate examinations at which a recurrent tumor was found. (Horizontal axis: completed months of follow-up.)

Sylvester [1] and Byar et al. [2] have noted that common outcomes such as the percent of patients with recurrence, the percent with recurrence at a given time, and the disease-free interval or time to first recurrence are inefficient, essentially because they make no use of data collected after the first recurrence. Other classical outcomes such as time to progression (increase in T category or appearance of distant metastases) and duration of survival are likewise inappropriate since these events can be expected to occur in a maximum of only 10%-15% of the patients under study. This article presents summary measures of treatment effectiveness that are appropriate for such trials and that may serve as a basis for treatment comparisons.

STATISTICAL METHOD

Notation

Assume that there are $K$ treatment groups with $n_k$ patients in the $k$th group ($k = 1, \ldots, K$). The $i$th patient in the $k$th group is followed for a period of length $t_{ik}$, and during this period there are $r_{ik}$ recurrences (i.e., examinations where tumor is found) and $s_{ik}$ tumors observed (since more than one tumor may be observed at an examination). We denote the total observation period in the $k$th group by $T_k = \sum_{i=1}^{n_k} t_{ik}$, the total number of recurrences by $R_k = \sum_{i=1}^{n_k} r_{ik}$, and the total number of tumors by $S_k = \sum_{i=1}^{n_k} s_{ik}$.

Measuring Treatment Effect

If the intervals between each recurrence of tumor are independently and identically distributed across patients and time with the exponential probability distribution function, then the recurrences may be considered to form a Poisson renewal process [3].

For the $k$th treatment group the maximum likelihood estimate of the recurrence rate $\lambda_k$ for that group is given by $\hat\lambda_k = R_k/T_k$. This estimate has the appeal of simplicity and may be regarded as an average of the individual rates $r_{ik}/t_{ik}$ weighted by $t_{ik}/T_k$, that is, the contribution of the $i$th patient in treatment group $k$ to the total follow-up time for that group.

It may be objected that this estimate is of little use, since the Poisson renewal process assumption is unlikely to hold true in practice. Departures from the Poisson model are indeed readily seen from clinical trial data. The Poisson model predicts, for example, that the recurrence rate will be constant throughout the follow-up period, whereas data from trials show a recurrence rate that decreases sharply after the first 3 months following the initial treatment [1,4]. However, even when the recurrence rate varies with time, the estimate $\hat\lambda_k$ will still represent a measure of the average recurrence rate over the period of observation.

The recurrence rate, $\lambda_k$, can be generalized to define a tumor rate, $\theta_k$, which describes the rate at which individual tumors recur in the bladder. Again, under a Poisson renewal process the maximum likelihood estimate of $\theta_k$ is $\hat\theta_k = S_k/T_k$. Thus instead of simply using the presence or absence of tumor as a criterion, this measure incorporates the number of tumors found at each cystoscopy.

Comparison of Treatment Effects

Parametric approach

Under the assumption that the recurrences form a Poisson renewal process, the total number of recurrences $R_k$ has a Poisson distribution, with mean dependent on the recurrence rate $\lambda_k$. Specifically the density function is

$$f(r_k) = \frac{e^{-\lambda_k T_k} (\lambda_k T_k)^{r_k}}{r_k!},$$

where $r_k$ denotes an observed value of $R_k$. Potthoff and Whittinghill [5] provide three tests that may be used to test whether the recurrence rates $\lambda_k$ in each group are equal.
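The rate estimates defined under Measuring Treatment Effect can be illustrated with a short Python sketch. The `Patient` container, the function names, and the three example records are hypothetical and chosen only for illustration; the article itself refers to a FORTRAN program whose details are not given here. The sketch also checks numerically that $R_k/T_k$ equals the average of the individual rates $r_{ik}/t_{ik}$ weighted by $t_{ik}/T_k$.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Patient:
    """One patient's follow-up record (hypothetical container for illustration)."""
    followup: float   # t_ik: length of follow-up, e.g. in years
    recurrences: int  # r_ik: examinations at which tumor was found
    tumors: int       # s_ik: total number of tumors observed


def recurrence_rate(group: Sequence[Patient]) -> float:
    """Recurrence rate estimate R_k / T_k for one treatment group."""
    total_time = sum(p.followup for p in group)
    return sum(p.recurrences for p in group) / total_time


def tumor_rate(group: Sequence[Patient]) -> float:
    """Tumor rate estimate S_k / T_k for one treatment group."""
    total_time = sum(p.followup for p in group)
    return sum(p.tumors for p in group) / total_time


def weighted_mean_of_individual_rates(group: Sequence[Patient]) -> float:
    """Average of r_ik / t_ik weighted by t_ik / T_k."""
    total_time = sum(p.followup for p in group)
    return sum(
        (p.recurrences / p.followup) * (p.followup / total_time) for p in group
    )


# Made-up data: three patients followed for 2, 1, and 1 years.
example_group = [Patient(2.0, 1, 2), Patient(1.0, 0, 0), Patient(1.0, 3, 4)]

print(recurrence_rate(example_group))                    # 4 recurrences / 4 years
print(tumor_rate(example_group))                         # 6 tumors / 4 years
print(weighted_mean_of_individual_rates(example_group))  # same as the recurrence rate
```

Because the weights $t_{ik}/T_k$ cancel the individual denominators, patients with long follow-up contribute proportionally more to the group estimate than patients with short follow-up.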
In the special case when there are just two treatment groups ($K = 2$) an estimate $\hat\lambda_1/\hat\lambda_2$ of the ratio of the two recurrence rates may be calculated. One may then compare the treatments by testing the null hypothesis that $\lambda_1/\lambda_2 = 1$ against a one- or two-sided alternative by noting that $\hat\lambda_1/\hat\lambda_2$ has (under the null hypothesis) a central F distribution. Confidence limits for the ratio of the recurrence rates may also be determined [6-8].

As noted in the section on Measuring Treatment Effect, the Poisson assumption is, in practice, not always justified. The validity of the statistical test and confidence limits based on the F distribution is therefore questionable.

The next two sections develop distribution-free procedures for significance testing and interval estimation.

Nonparametric approach: statistical testing

When there are two treatment groups a randomization test may be used to test the null hypothesis that $\lambda_1 = \lambda_2$. This test makes no assumptions about the distribution of times to recurrence, either for individual patients or for groups. The validity of the test requires only that the two groups of patients be followed in a comparable manner. A longer interval of follow-up or an increased frequency of examination in one treatment group can bias its comparison with another group.

The idea behind the randomization test is as follows. If the total number of patients is denoted by $N$, where $N = n_1 + n_2$, then the number of possible ways of allocating the $N$ patients in the study with $n_1$ in treatment group 1 and $n_2$ in group 2 equals $\binom{N}{n_1}$. For each allocation one could compute $\hat\lambda_1$, $\hat\lambda_2$, and their ratio $\hat\lambda_1/\hat\lambda_2$. Assuming that the observed ratio from the actual experiment is greater than 1, then the one-tailed p value is simply the number of permutations giving ratios greater than the observed ratio, divided by the total number of possible permutations, that is, $\binom{N}{n_1}$. The two-tailed p value would be the number of permutations giving ratios greater than the observed ratio or less than its reciprocal, divided by the total number of possible permutations.

In practice the number $\binom{N}{n_1}$ is too large, except for very small values of $n_1$ and $n_2$, to allow calculation of the recurrence rate ratio for every possible permutation. One therefore adopts a Monte Carlo approach and selects randomly, a fixed number of times, from the $\binom{N}{n_1}$ permutations. With a selection of 2000 samples, the Monte Carlo estimate of a true p value $p$ has a 95% confidence interval of about $\pm 1.96\sqrt{p(1-p)/2000}$.

The same approach may be used to compare the tumor rates $\theta_1$ and $\theta_2$ in two treatment groups.
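The Monte Carlo randomization test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' FORTRAN program: each patient is represented by a hypothetical `(recurrences, follow-up time)` tuple, all function names are invented for this sketch, a 0/0 ratio is arbitrarily treated as 1, and permutations at least as extreme as the observed ratio (ties included) are counted, which is slightly more conservative than the strict inequality in the text.

```python
import math
import random
from typing import List, Sequence, Tuple

Record = Tuple[int, float]  # (recurrences r_i, follow-up time t_i) for one patient


def rate(records: Sequence[Record]) -> float:
    """Total events divided by total follow-up time (R_k / T_k)."""
    return sum(r for r, _ in records) / sum(t for _, t in records)


def rate_ratio(group1: Sequence[Record], group2: Sequence[Record]) -> float:
    """Ratio of the two group rates; 0/0 is treated as 1 (an assumption of this sketch)."""
    r1, r2 = rate(group1), rate(group2)
    if r2 == 0.0:
        return math.inf if r1 > 0.0 else 1.0
    return r1 / r2


def randomization_p_value(
    group1: Sequence[Record],
    group2: Sequence[Record],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """Two-tailed Monte Carlo p value from randomly re-allocated patients."""
    rng = random.Random(seed)
    observed = rate_ratio(group1, group2)
    reciprocal = math.inf if observed == 0.0 else 1.0 / observed
    upper, lower = max(observed, reciprocal), min(observed, reciprocal)

    pooled: List[Record] = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one random allocation of the N patients
        ratio = rate_ratio(pooled[:n1], pooled[n1:])
        if ratio >= upper or ratio <= lower:
            extreme += 1
    return extreme / n_perm


def monte_carlo_half_width(p: float, n_perm: int) -> float:
    """Approximate 95% half-width of the Monte Carlo estimate of a true p value."""
    return 1.96 * math.sqrt(p * (1.0 - p) / n_perm)


# Made-up data: 10 patients per group, one year of follow-up each.
treated = [(1, 1.0)] * 10
control = [(3, 1.0)] * 10
print(randomization_p_value(control, treated))
print(monte_carlo_half_width(0.05, 2000))
```

Shuffling the pooled records and splitting them at $n_1$ draws one allocation uniformly from the $\binom{N}{n_1}$ possibilities, so the proportion of extreme ratios estimates the exact randomization p value. The same function could be applied to tumor counts by storing $s_i$ in place of $r_i$.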
Nonparametric approach: interval estimation

The randomization test just described provides only a p value for testing the null hypothesis $\lambda_1 = \lambda_2$. Confidence limits for the ratio of the recurrence rates $\lambda_1/\lambda_2$ usefully complement p values when interpreting trial results. Confidence limits show us what range of estimates for $\lambda_1/\lambda_2$ are consistent with the data, a particularly important consideration when we wish to decide whether statistically significant results are clinically relevant or whether nonsignificant results simply reflect an inadequate sample size. Proper interpretation of clinical trial results, therefore, requires a knowledge of the confidence limits for the summary measure of the treatment effect [9,10].

The bootstrap is a general nonparametric method for estimation that may also be used to construct confidence limits [11]. The basic idea is quite simple. For the $k$th treatment group, the empirical distribution function of the number of recurrences and total observation time for one individual is given by the discrete distribution with probability of $1/n_k$ on each joint observation $(r_{ik}, t_{ik})$. Since the true distribution is not known, we use the empirical distribution function in its place. The sampling distribution of $\hat\lambda_k$ may be estimated by considering all possible samples of size $n_k$ made by drawing with replacement from the original $n_k$ observations. For each such sample, of which there are $\binom{2n_k - 1}{n_k}$ distinct ones, a value of $\hat\lambda_k$ may be calculated. Thus, in principle, the bootstrap sampling distribution of $\hat\lambda_k$ may be determined. As with the randomization test, $n_k$ is usually too large to enumerate the bootstrap distribution, so Monte Carlo methods are used.

For the purposes of this article we will concentrate on confidence limits for the ratio $\lambda_1/\lambda_2$ rather than for the rates $\lambda_1$ or $\lambda_2$ themselves. Conditioning on the observed sample sizes, we draw two samples with replacement, one of size $n_1$ from group 1 and one of size $n_2$ from group 2, and we calculate $\hat\lambda_1/\hat\lambda_2$ for each pair of samples. This is repeated a large number of times, so that a sampling distribution of $\hat\lambda_1/\hat\lambda_2$ is constructed. The confidence limits are then found by simple reference to the appropriate percentiles of the sampling distribution. This procedure is called the "percentile method." For distributions that are median-biased, that is, $P(\hat\lambda_1/\hat\lambda_2 \le \lambda_1/\lambda_2) \ne 0.5$, Efron [11,12] has proposed a "bias-corrected (BC) percentile method." However, this procedure is not needed in the following simulations since our observed ratio $\hat\lambda_1/\hat\lambda_2$ is distributed as $(\lambda_1/\lambda_2)F(2r_1, 2r_2)$ and $P(F < 1) = 0.5$.

The same approach may be used to obtain confidence limits for the ratio of tumor rates, $\theta_1/\theta_2$. In the next section we compare, by computer simulation, the results using the F distribution with the randomization test and bootstrap methods.
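A minimal Python sketch of the percentile method for the rate ratio is given below. It is an illustration under assumptions rather than the authors' program: patients are hypothetical `(recurrences, follow-up time)` tuples, the function names are invented here, a 0/0 ratio is treated as 1, and the percentiles are read off the sorted bootstrap ratios with a simple index rule (other percentile conventions would give slightly different limits).

```python
import math
import random
from typing import List, Sequence, Tuple

Record = Tuple[int, float]  # (recurrences r_i, follow-up time t_i) for one patient


def rate(records: Sequence[Record]) -> float:
    """Total events divided by total follow-up time (R_k / T_k)."""
    return sum(r for r, _ in records) / sum(t for _, t in records)


def rate_ratio(group1: Sequence[Record], group2: Sequence[Record]) -> float:
    """Ratio of the two group rates; 0/0 is treated as 1 (an assumption of this sketch)."""
    r1, r2 = rate(group1), rate(group2)
    if r2 == 0.0:
        return math.inf if r1 > 0.0 else 1.0
    return r1 / r2


def bootstrap_ratio_limits(
    group1: Sequence[Record],
    group2: Sequence[Record],
    n_boot: int = 1000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-method confidence limits for the rate ratio."""
    rng = random.Random(seed)
    g1, g2 = list(group1), list(group2)
    ratios: List[float] = []
    for _ in range(n_boot):
        # Resample patients with replacement within each group,
        # keeping the observed sample sizes n1 and n2 fixed.
        sample1 = rng.choices(g1, k=len(g1))
        sample2 = rng.choices(g2, k=len(g2))
        ratios.append(rate_ratio(sample1, sample2))
    ratios.sort()
    alpha = (1.0 - level) / 2.0
    lower_index = math.floor(alpha * (n_boot - 1))
    upper_index = math.ceil((1.0 - alpha) * (n_boot - 1))
    return ratios[lower_index], ratios[upper_index]


# Made-up data: 20 patients per group with varying counts and follow-up.
group_a = [(2, 1.0), (0, 0.5), (3, 2.0), (1, 1.5)] * 5
group_b = [(1, 1.0), (0, 1.0), (1, 2.0), (0, 1.5)] * 5
print(rate_ratio(group_a, group_b))
print(bootstrap_ratio_limits(group_a, group_b))
```

Resampling whole patients keeps each pair $(r_{ik}, t_{ik})$ together, which is what placing probability $1/n_k$ on each joint observation requires.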
A FORTRAN computer program (available on request) has been written to perform the randomization tests and calculate the confidence intervals for the ratios of both recurrence and tumor rates. For trials involving more than two treatment groups, the groups are compared pairwise.

SIMULATION RESULTS FOR RANDOMIZATION TEST AND BOOTSTRAP CONFIDENCE LIMITS FOR EXPONENTIALLY DISTRIBUTED DATA

To test the performance of the nonparametric methods, data were simulated by computer to represent those arising from a Poisson renewal process. Specifically, times to recurrence were generated to follow an exponential distribution. Data for two treatment groups were generated and the ratios of the recurrence rates were set at 1.0, 1.5, 2.0, and 3.0. Equal numbers of patients in each group were included, ranging through 5, 10, 20, 60, and 100 per group. Each patient had exactly one recurrence. In addition, two sets of simulations allowed exactly three recurrences per patient with 10 and 30 patients per group, respectively. In both situations, F test results are exact [13]. We sampled 500 and 1000 times from both the randomization and bootstrap distributions.

In a practical situation one is unlikely to know the exact distribution of the rate ratio under the null hypothesis. However, when, as in this simulation, the exact distribution is known, one would desire that the randomization test would lead to closely similar results. For selected rate ratios, comparison of the p values in Table 1 permits us to judge how closely the randomization test results reproduce those of the F test. In this table 500 experiments were performed for each row and the fractions of these significant by the F test at the four selected p values were compared to the fractions whose one-tailed p values for the randomization test were less than or equal to the same four selected p values. This approach effectively compares the average performances of the two tests.

Table 1. Comparison of F Distribution and Randomization Tests. Columns: recurrence rate ratio; number of patients per group; fraction significant by F test at each selected p value; fraction of one-tailed randomization test p values less than or equal to each selected p value.

By considering the case where $n_1 = n_2 = 1$ it becomes clear that the randomization distribution does not approximate the F distribution well for very small samples. The results in Table 1 show close agreement for the two tests even when the number of recurrences in each group is as small as 5. Even though the percentages of rejection when the recurrence rate ratio equals 1.0 exceed the nominal levels somewhat, it is the agreement of the two tests that is important. These randomization test results were based on sampling 500 times from the randomization distribution. Results for sampling 1000 times showed no systematic improvement and are therefore not shown.

The proportions of significant F tests in Table 1 for a recurrence rate ratio of 1.0 are somewhat higher than their nominal values. We repeated the simulation with another 500 experiments and this time obtained values close to those expected. We conclude that the discrepancy noted in Table 1 is a chance occurrence.

The results comparing the bootstrap and the F distribution confidence limits are shown in Table 2. The bootstrap confidence limits are generally narrower than the correct limits calculated according to the F distribution for simulations where the number of recurrences is 60 or fewer. For simulations with 120 or more recurrences the bootstrap confidence limits agree quite well with the F distribution and there is no sign of bias towards intervals that are too narrow or conversely too wide.

Table 2. Comparison of Confidence Limits from F Distribution and Bootstrap. Columns: recurrence rate ratio; number of patients per group; number of recurrences per patient; total number of recurrences; estimate of rate ratio; confidence limits from the F distribution and from the bootstrap with B = 500 and B = 1000, where B is the number of bootstrap samples.

EXAMPLE

We refer to the data set published in Byar et al. [2], in which 61 patients in 3 treatment groups had a total of 82 recurrences and 116 tumors. Table 3 contains a summary of the data with the estimated recurrence rates and tumor rates. Table 4 shows the results of the pairwise comparisons, including significance tests using the randomization distribution, recurrence rate and tumor rate ratio estimates, and their 95% bootstrap confidence limits. There is good agreement between the significance tests and the confidence limits: where a two-tailed test has a p value less than 0.05, the bootstrap 95% confidence limits exclude the value unity, and the converse is also true.

Table 3. Summary of Data Set in Byar et al. [2]. Rows, by treatment group and in total: number of patients; number of patients with recurrence; number of recurrences; number of tumors; total years of follow-up; recurrence rate (per year); tumor rate (per year).

Table 4. Pairwise Comparisons of Treatment Groups for Data Set in Byar et al. [2]

                                        Groups 1 vs 2   Groups 1 vs 3   Groups 2 vs 3
Recurrence rates
  Randomization test p value (a)
  Estimate of rate ratio
  95% bootstrap confidence limits (b)   (0.16, 1.19)    (0.11, 0.74)    (0.31, 1.33)
Tumor rates
  Randomization test p value (a)
  Estimate of rate ratio
  95% bootstrap confidence limits (b)   (0.18, 1.48)    (0.11, 0.92)    (0.37, 1.69)

(a) Two-tailed test: 500 random permutations.
(b) 1000 bootstrap samples.

DISCUSSION

Schenker [14] has expressed qualms about the general usefulness of bootstrap methods for setting confidence limits and illustrates his concern with the problem of placing 90% confidence limits on the variance estimate for a normal distribution when the sample size is too small. He found that the limits were too narrow for sample sizes under 100. The same point is made in a different way by Efron and Tibshirani [15]. These results agree with the intuitive notion that one cannot expect repeated bootstrap sampling to provide an accurate representation of the tails of a distribution if the sample is too small. This point can be illustrated by considering very small samples, say three to five observations, drawn from normal distributions, because then the bootstrap distribution can be enumerated exhaustively and the expected values of the most extreme points of these distributions can be determined using the expected values of the order statistics for the normal distribution. For example, it may be shown that for samples of three from a normal distribution with zero mean and unknown variance, the upper 92.6% confidence limit has an expected value of 2.02, whereas the expected value of the corresponding bootstrap upper limit is smaller.

Recently Efron [16] has proposed improvements on bias-corrected bootstrap confidence limits that are expected to work much better in small samples, although he comments that "small sample non-parametric confidence intervals are far from well understood... and should be interpreted with some caution." Readers interested in more details will find a number of other articles in the same issue as this recent contribution by Efron, as well as commentary on Efron's article from a number of noted statisticians.

The methods we have proposed for testing hypotheses and setting confidence limits for clinical trials having multiple outcomes have the appeal of great generality and absence of parametric assumptions. We have demonstrated that the methods behave satisfactorily for exponentially distributed data when compared to hypothesis testing or confidence limits based on the F distribution.
Although these results are extremely encouraging, they do not prove that these methods will work well in all situations for samples of moderate size, but it is certain, from asymptotic theory, that they would behave well for large samples.

Extensions of these techniques to allow for more sophisticated analyses are possible. For example, in analyzing data from a clinical trial one might choose to divide the follow-up period into several intervals and compare treatments separately within them to check for time-varying treatment effects. Another possible extension might involve adjustment for covariates. This could be accomplished by repeatedly applying a parametric or semiparametric adjustment procedure for each sample drawn from the permutation or bootstrap distributions. For instance, in our example one may use a log-linear model to adjust for imbalance in covariates such as initial tumor size. We have found it possible to program this repeated procedure within the statistical computer package for generalized linear modeling known as GLIM [17]. Another possible extension is to develop permutation tests for simultaneous comparison of three or more treatment groups, rather than relying solely on pairwise comparisons.

Although we have concentrated on the analysis of recurrence or tumor rates in superficial bladder cancer, the approaches we have suggested are quite general and could be applied to a variety of problems for which standard methods are not available. Possible applications might include cancer prevention trials where the object is to cause regression or disappearance of precancerous lesions that may reappear on more than one future occasion.

Such lesions might include colonic polyps, leukoplakia in the mouth, dysplasia in the esophagus, or skin tumors. Another possible area of application might include serial measurement of quality of life parameters or measures of toxicity following treatment of chronic diseases.

Following the lines of this article, the plan would be first to define useful summary measures of the data and then to use the nonparametric methods described above to make statistical comparisons and provide the appropriate confidence limits for such measures. Although in our example the choice of summary measures (recurrence rate ratio and tumor rate ratio) was a fairly natural one, in other circumstances the choice will require considerable thought. For example, summarizing quality of life data is a difficult problem that has been discussed by several authors [18,19]. Nevertheless, once the summary measures have been chosen, the nonparametric methods illustrated in this article provide a simple and effective solution to the questions of statistical inference and interval estimation.

REFERENCES

1. Sylvester R: The analysis of results in prophylactic superficial bladder cancer studies. In: EORTC Genitourinary Group Monograph 2, Part B: Superficial Bladder Tumors, Schroeder FH, Richards B, Eds. New York: Alan R. Liss, 1985.
2. Byar D, Kaihara S, Sylvester R, Freedman L, Hannigan J, Koiso K, Oohashi Y, Tsugawa R: Statistical analysis techniques and sample size determination for clinical trials of treatments for bladder cancer. In: Developments in Bladder Cancer, Denis L, Niijima T, Prout G, Schröder F, Eds. New York: Alan R. Liss, 1986.
3. Cox DR, Miller HD: The Theory of Stochastic Processes. London: Methuen, 1965.
4. MRC Working Party on Urological Cancer: The effect of intravesical thiotepa on the recurrence rate of newly diagnosed superficial bladder cancer. Br J Urol 57.
5. Potthoff RF, Whittinghill M: Testing for homogeneity. II. The Poisson distribution. Biometrika 53.
6. Gehan EA: Statistical methods for survival time studies. In: Cancer Therapy: Prognostic Factors and Criteria of Response, Staquet MJ, Ed. New York: Raven Press, 1975.
7. Lee ET: Statistical Methods for Survival Data Analysis. Belmont, CA: Lifetime Learning Publications, 1980.
8. Gross AJ, Clark VA: Survival Distributions: Reliability Applications in the Biomedical Sciences. New York: Wiley, 1975.
9. Altman DG, Gore SM, Gardner MJ, Pocock SJ: Statistical guidelines for contributors to medical journals. Br Med J 286.
10. Pocock SJ: Current issues in the design and interpretation of clinical trials. Br Med J 290:39-42.
11. Efron B: Nonparametric standard errors and confidence intervals. Can J Stat 9.
12. Efron B: The jackknife, the bootstrap, and other resampling plans. In: National Science Foundation-Conference Board of the Mathematical Sciences Monograph 38. Philadelphia: Society for Industrial and Applied Mathematics.
13. Cox DR: Some simple approximate tests for Poisson variates. Biometrika 40, 1953.
14. Schenker N: Qualms about bootstrap confidence intervals. J Am Stat Assoc 80.
15. Efron B, Tibshirani R: The bootstrap method for assessing statistical accuracy. Behaviormetrika 17:1-35 (section 5).
16. Efron B: Better bootstrap confidence intervals. J Am Stat Assoc 82.
17. McCullagh P, Nelder JA: Generalized Linear Models. London: Chapman and Hall, 1983, Appendix E.
18. Nou E, Aberg T: Quality of survival in patients with surgically treated bronchial carcinoma. Thorax 35.
19. Fayers PM, Jones DR: Measuring and analyzing quality of life in cancer clinical trials: A review. Stat Med 2, 1983.


More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH QATAR UNIVERSITY COLLEGE OF ARTS & SCIENCES Department of Mathematics, Statistics, & Physics UNDERGRADUATE DEGREE DETAILS : Program Requirements and Descriptions BACHELOR OF SCIENCE WITH A MAJOR IN STATISTICS

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

Personalized Predictive Medicine and Genomic Clinical Trials

Personalized Predictive Medicine and Genomic Clinical Trials Personalized Predictive Medicine and Genomic Clinical Trials Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov brb.nci.nih.gov Powerpoint presentations

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Treatment and Surveillance of Non- Muscle Invasive Bladder Cancer

Treatment and Surveillance of Non- Muscle Invasive Bladder Cancer Treatment and Surveillance of Non- Muscle Invasive Bladder Cancer David Josephson, MD FACS Fellowship Director, Urologic Oncology and Robotic Surgery Program Staging Most important in risk assessment and

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

200609 - ATV - Lifetime Data Analysis

200609 - ATV - Lifetime Data Analysis Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 715 - EIO - Department of Statistics and Operations Research 1004 - UB - (ENG)Universitat

More information

MODIFIED PARAMETRIC BOOTSTRAP: A ROBUST ALTERNATIVE TO CLASSICAL TEST

MODIFIED PARAMETRIC BOOTSTRAP: A ROBUST ALTERNATIVE TO CLASSICAL TEST MODIFIED PARAMETRIC BOOTSTRAP: A ROBUST ALTERNATIVE TO CLASSICAL TEST Zahayu Md Yusof, Nurul Hanis Harun, Sharipah Sooad Syed Yahaya & Suhaida Abdullah School of Quantitative Sciences College of Arts and

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

More information

Comparison of resampling method applied to censored data

Comparison of resampling method applied to censored data International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison

More information

Appendix 1: Time series analysis of peak-rate years and synchrony testing.

Appendix 1: Time series analysis of peak-rate years and synchrony testing. Appendix 1: Time series analysis of peak-rate years and synchrony testing. Overview The raw data are accessible at Figshare ( Time series of global resources, DOI 10.6084/m9.figshare.929619), sources are

More information

Statistics in Medicine Research Lecture Series CSMC Fall 2014

Statistics in Medicine Research Lecture Series CSMC Fall 2014 Catherine Bresee, MS Senior Biostatistician Biostatistics & Bioinformatics Research Institute Statistics in Medicine Research Lecture Series CSMC Fall 2014 Overview Review concept of statistical power

More information

Chi Square Tests. Chapter 10. 10.1 Introduction

Chi Square Tests. Chapter 10. 10.1 Introduction Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn

Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn Gordon K. Smyth & Belinda Phipson Walter and Eliza Hall Institute of Medical Research Melbourne,

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Fixed-Effect Versus Random-Effects Models

Fixed-Effect Versus Random-Effects Models CHAPTER 13 Fixed-Effect Versus Random-Effects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval

More information

ANALYZING NETWORK TRAFFIC FOR MALICIOUS ACTIVITY

ANALYZING NETWORK TRAFFIC FOR MALICIOUS ACTIVITY CANADIAN APPLIED MATHEMATICS QUARTERLY Volume 12, Number 4, Winter 2004 ANALYZING NETWORK TRAFFIC FOR MALICIOUS ACTIVITY SURREY KIM, 1 SONG LI, 2 HONGWEI LONG 3 AND RANDALL PYKE Based on work carried out

More information

COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences. 2015-2016 Academic Year Qualification.

COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences. 2015-2016 Academic Year Qualification. COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences 2015-2016 Academic Year Qualification. Master's Degree 1. Description of the subject Subject name: Biomedical Data

More information

What are confidence intervals and p-values?

What are confidence intervals and p-values? What is...? series Second edition Statistics Supported by sanofi-aventis What are confidence intervals and p-values? Huw TO Davies PhD Professor of Health Care Policy and Management, University of St Andrews

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

CONFIDENCE INTERVALS FOR COST EFFECTIVENESS RATIOS: A COMPARISON OF FOUR METHODS

CONFIDENCE INTERVALS FOR COST EFFECTIVENESS RATIOS: A COMPARISON OF FOUR METHODS HEALTH ECONOMICS, VOL. 6: 243 252 (1997) ECONOMIC EVALUATION CONFIDENCE INTERVALS FOR COST EFFECTIVENESS RATIOS: A COMPARISON OF FOUR METHODS DANIEL POLSKY 1, HENRY A. GLICK 1 *, RICHARD WILLKE 2 AND KEVIN

More information

Aachen Summer Simulation Seminar 2014

Aachen Summer Simulation Seminar 2014 Aachen Summer Simulation Seminar 2014 Lecture 07 Input Modelling + Experimentation + Output Analysis Peer-Olaf Siebers pos@cs.nott.ac.uk Motivation 1. Input modelling Improve the understanding about how

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA

EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA E. J. Dadey SSNIT, Research Department, Accra, Ghana S. Ankrah, PhD Student PGIA, University of Peradeniya, Sri Lanka Abstract This study investigates

More information

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012] Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Guide to Biostatistics

Guide to Biostatistics MedPage Tools Guide to Biostatistics Study Designs Here is a compilation of important epidemiologic and common biostatistical terms used in medical research. You can use it as a reference guide when reading

More information

Chi-square test Fisher s Exact test

Chi-square test Fisher s Exact test Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

More information

A study on the bi-aspect procedure with location and scale parameters

A study on the bi-aspect procedure with location and scale parameters 통계연구(2012), 제17권 제1호, 19-26 A study on the bi-aspect procedure with location and scale parameters (Short Title: Bi-aspect procedure) Hyo-Il Park 1) Ju Sung Kim 2) Abstract In this research we propose a

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Statistics 3202 Introduction to Statistical Inference for Data Analytics 4-semester-hour course

Statistics 3202 Introduction to Statistical Inference for Data Analytics 4-semester-hour course Statistics 3202 Introduction to Statistical Inference for Data Analytics 4-semester-hour course Prerequisite: Stat 3201 (Introduction to Probability for Data Analytics) Exclusions: Class distribution:

More information

An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc.

An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc. An Application of the G-formula to Asbestos and Lung Cancer Stephen R. Cole Epidemiology, UNC Chapel Hill Slides: www.unc.edu/~colesr/ 1 Acknowledgements Collaboration with David B. Richardson, Haitao

More information

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

More information

Correlational Research

Correlational Research Correlational Research Chapter Fifteen Correlational Research Chapter Fifteen Bring folder of readings The Nature of Correlational Research Correlational Research is also known as Associational Research.

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Organizing Your Approach to a Data Analysis

Organizing Your Approach to a Data Analysis Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship)

Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship) 1 Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship) I. Authors should report effect sizes in the manuscript and tables when reporting

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

How To Understand The Theory Of Probability

How To Understand The Theory Of Probability Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

More information

Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

More information

Study Design and Statistical Analysis

Study Design and Statistical Analysis Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing

More information

Statistical Functions in Excel

Statistical Functions in Excel Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

More information

Session 54 PD, Credibility and Pooling for Group Life and Disability Insurance Moderator: Paul Luis Correia, FSA, CERA, MAAA

Session 54 PD, Credibility and Pooling for Group Life and Disability Insurance Moderator: Paul Luis Correia, FSA, CERA, MAAA Session 54 PD, Credibility and Pooling for Group Life and Disability Insurance Moderator: Paul Luis Correia, FSA, CERA, MAAA Presenters: Paul Luis Correia, FSA, CERA, MAAA Brian N. Dunham, FSA, MAAA Credibility

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information

Statistical Rules of Thumb

Statistical Rules of Thumb Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN

More information

ONLINE APPENDIX FOR PUBLIC HEALTH INSURANCE, LABOR SUPPLY,

ONLINE APPENDIX FOR PUBLIC HEALTH INSURANCE, LABOR SUPPLY, ONLINE APPENDIX FOR PUBLIC HEALTH INSURANCE, LABOR SUPPLY, AND EMPLOYMENT LOCK Craig Garthwaite Tal Gross Matthew J. Notowidigdo December 2013 A1. Monte Carlo Simulations This section describes a set of

More information

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month

More information

Tests for Two Proportions

Tests for Two Proportions Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics

More information

SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION. Friday 14 March 2008 9.00-9.

SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION. Friday 14 March 2008 9.00-9. SECOND M.B. AND SECOND VETERINARY M.B. EXAMINATIONS INTRODUCTION TO THE SCIENTIFIC BASIS OF MEDICINE EXAMINATION Friday 14 March 2008 9.00-9.45 am Attempt all ten questions. For each question, choose the

More information

Online 12 - Sections 9.1 and 9.2-Doug Ensley

Online 12 - Sections 9.1 and 9.2-Doug Ensley Student: Date: Instructor: Doug Ensley Course: MAT117 01 Applied Statistics - Ensley Assignment: Online 12 - Sections 9.1 and 9.2 1. Does a P-value of 0.001 give strong evidence or not especially strong

More information

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Tests for Two Survival Curves Using Cox s Proportional Hazards Model Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

Chapter G08 Nonparametric Statistics

Chapter G08 Nonparametric Statistics G08 Nonparametric Statistics Chapter G08 Nonparametric Statistics Contents 1 Scope of the Chapter 2 2 Background to the Problems 2 2.1 Parametric and Nonparametric Hypothesis Testing......................

More information

False Discovery Rates

False Discovery Rates False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving

More information

Basic Concepts in Research and Data Analysis

Basic Concepts in Research and Data Analysis Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the

More information

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions A Significance Test for Time Series Analysis Author(s): W. Allen Wallis and Geoffrey H. Moore Reviewed work(s): Source: Journal of the American Statistical Association, Vol. 36, No. 215 (Sep., 1941), pp.

More information

Computational Statistics and Data Analysis

Computational Statistics and Data Analysis Computational Statistics and Data Analysis 53 (2008) 17 26 Contents lists available at ScienceDirect Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda Coverage probability

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

The Friedman Test with MS Excel. In 3 Simple Steps. Kilem L. Gwet, Ph.D.

The Friedman Test with MS Excel. In 3 Simple Steps. Kilem L. Gwet, Ph.D. The Friedman Test with MS Excel In 3 Simple Steps Kilem L. Gwet, Ph.D. Copyright c 2011 by Kilem Li Gwet, Ph.D. All rights reserved. Published by Advanced Analytics, LLC A single copy of this document

More information

More details on the inputs, functionality, and output can be found below.

More details on the inputs, functionality, and output can be found below. Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Handling attrition and non-response in longitudinal data

Handling attrition and non-response in longitudinal data Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

More information