A gentle introduction to quantile regression for ecologists

Save this PDF as:

Size: px
Start display at page:

Transcription

1 412 REVIEWS REVIEWS REVIEWS A gentle introduction to quantile regression for ecologists Brian S Cade 1,2 and Barry R Noon 3 Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, all the factors that affect ecological processes are not measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (eg least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models. Front Ecol Environ 2003; 1(8): Regression is a common statistical method employed by scientists to investigate relationships between variables. Quantile regression (Koenker and Bassett 1978) is a method for estimating functional relations between variables for all portions of a probability distribution. Although it has begun to be used in ecology and biology (Table 1), many ecologists remain unaware of it, as it was developed relatively recently and is rarely taught in statistics courses at many universities. We present this introduction both to encourage additional applications in ecology and to educate those who are already contemplating using the method. In a nutshell: Quantile regression is a statistical method that could be used effectively by more ecologists Statistical distributions of ecological data often have unequal variation due to complex interactions between the factors affecting organisms that cannot all be measured and accounted for in statistical models Unequal variation implies that there is more than a single slope (rate of change) describing the relationship between a response variable and predictor variables measured on a subset of these factors Quantile regression estimates multiple rates of change (slopes) from the minimum to maximum response, providing a more complete picture of the relationships between variables missed by other regression methods The ecological concept of limiting factors as constraints on organisms often focuses on rates of change in quantiles near the maximum response, when only a subset of limiting factors are measured 1 Fort Collins Science Center, US Geological Survey, Fort Collins, CO 2 Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO; 3 Department of Fishery and Wildlife Biology, Colorado State University, Fort Collins, CO Typically, a response variable y is some function of predictor variables X, so that y = f(x). Most regression applications in the ecological sciences focus on estimating rates of change in the mean of the response variable distribution as some function of a set of predictor variables; in other words, the function is defined for the expected value of y conditional on X, E(y X). Mosteller and Tukey (1977) noted that it was possible to fit regression curves to other parts of the distribution of the response variable, but that this was rarely done, and therefore most regression analyses gave an incomplete picture of the relationships between variables. This is especially problematic for regression models with heterogeneous variances, which are common in ecology. A regression model with heterogeneous variances implies that there is not a single rate of change that characterizes changes in the probability distributions. Focusing exclusively on changes in the means may underestimate, overestimate, or fail to distinguish real nonzero changes in heterogeneous distributions (Terrell et al. 1996; Cade et al. 1999). The Dunham et al. (2002) analyses relating the abundance of Lahontan cutthroat trout (Oncorhynchus clarki henshawi) to the ratio of stream width to depth illustrates the value of the additional information provided by quantile regression (Figure 1). The ratio was used as a predictor variable because it was an easily obtained measure of channel morphology that was thought to be related to the integrity of habitat in small streams like those typically inhabited by cutthroat trout. Quantile regression estimates indicated a nonlinear, negative relationship with the upper 30% ( 70th percentiles, P 0.10 for H 0 : 1 = 0) of cutthroat densities across 13 streams and 7 years. A weighted least squares regression estimated zero change (90% confidence intervals of to 0.012, P = for H 0 : 1 = 0) in mean densities with stream width to depth. If the authors

2 BS Cade and BR Noon Quantile regression for ecologists had used mean regression estimates, they would have mistakenly concluded that there was no relation between trout densities and the ratio of stream width to depth. Many ecological applications have used quantile regression as a method of estimating rates of change for functions along or near the upper boundary of the conditional distribution of responses because of issues raised by Kaiser et al. (1994), Terrell et al. (1996), Thomson et al. (1996), Cade et al. (1999), and Huston (2002). These authors suggested that if ecological limiting factors act as constraints on organisms, then the estimated effects for the measured factors were not well represented by changes in the means of response variable distributions, when there were many other unmeasured factors that were potentially limiting. The response of the organism cannot change by more than some upper limit set by the measured factors, but may change by less when other unmeasured factors are limiting (Figure 2). This analytical problem is closely related to the more general statistical issue of hidden bias in observational studies due to confounding with unmeasured variables (Rosenbaum 1995). Multiplicative interactions among measured and unmeasured ecological factors that contribute to this pattern are explored in more detail relative to regression quantile estimates and inferences in Cade (2003). Self-thinning of annual plants in the Chihuahuan desert of the southwestern US (Cade and Guo 2000) is an example of a limiting factor where conventional mean regression was inappropriate for estimating the reduction in densities of mature plants with increasing germination densities of seedlings (Figure 3). The effects of this density-dependent process were best revealed at the higher plant densities associated with upper quantiles, where competition for resources was greatest and effects of factors other than intraspecific competition were minimal. Changes in the upper quantiles of densities of mature plants were essentially a 1:1 mapping of germination density when it was low Table 1. Applications of quantile regression in ecology and biology Running speed and body mass of terrestrial mammals Koenker et al Global temperature change over the last century Koenker and Schorfheide 1994 Animal habitat relationships Terrell et al. 1996; Haire et al. 2000; Eastwood et al. 2001; Dunham et al Prey and predator size relationships Scharf et al Plant self-thinning Cade and Guo 2000 Vegetation changes associated with agricultural Allen et al conservation practices Mediterranean fruit fly survival Koenker and Geling 2001 Body size of deep-sea gastropods and dissolved McClain and Rex 2001 oxygen concentration Variation in nuclear DNA of plants across Knight and Ackerly 2002 environmental gradients Plant species diversity and invasibility Brown and Peet 2003 Trout per m of stream Intercept (b 0 ) (< 100/0.25-m 2 ), whereas there was a strong decrease in density of mature plants at higher germination densities consistent with density-dependent competition. Stream width:depth Quantile Slope (b 1 ) Quantile Figure 1. Quantile regression was used to estimate changes in Lahontan cutthroat trout density (y) as a function of the ratio of stream width to depth (X) for 7 years and 13 streams in the eastern Lahontan basin of the western US. Photo is of a typical adult Lahontan cutthroat from these small streams. (top left) A scatterplot of n = 71 observations of stream width:depth and trout densities with 0.95, 0.75, 0.50, 0.25, and 0.05 quantile (solid lines) and least squares regression (dashed line) estimates for the model ln y = X +. Weighted estimates used here were based on weights = 1/( X) but did not differ substantially from unweighted estimates used by Dunham et al. (2002). Sample estimates, b 0 ( ) (bottom left) and b 1 ( ) (bottom right), are shown as a red step function. Red dashed lines connect endpoints of 90% confidence intervals. Photo courtesy of JB Dunham 413

3 Quantile regression for ecologists BS Cade and BR Noon 414 Organism response Measured factor Ideal: only measured factor limiting 1 unmeasured factor limiting at some sites 2 unmeasured factors limiting at some sites Unmeasured factors limiting at many sites Figure 2. The top graph represents the ideal statistical situation where an organism response is driven primarily by the measured factor(s) included in the linear regression model; ie all other potential limiting factors are at permissive levels. As we proceed from top to bottom, an increasing number of factors that were not measured become limiting at some sample locations and times, increasing the heterogeneity of organism response with respect to the measured factor(s) included in the regression model. Quantile regression was developed in the 1970s by econometricians (Koenker and Bassett 1978) as an extension of the linear model for estimating rates of change in all parts of the distribution of a response variable. The estimates are semiparametric in the sense that no parametric distributional form (eg normal, Poisson, negative binomial, etc) is assumed for the random error part of the model, although a parametric form is assumed for the deterministic portion of the model (eg, 0 X X 1 ). The conditional quantiles denoted by Q y ( X) are the inverse of the conditional cumulative distribution function of the response variable, F y -1 ( X), where [0, 1] denotes the quantiles (Cade et al. 1999; Koenker and Machado 1999). For example, for = 0.90, Q y (0.90 X) is the 90th percentile of the distribution of y conditional on the values of X; in other words, 90% of the values of y are less than or equal to the specified function of X. Note that for symmetric distributions, the 0.50 quantile (or median) is equal to the mean. Here we consider functions of X that are linear in the parameters; eg Q y ( X) = 0 ( )X ( )X ( )X 2 +,..., + p ( )X p, where the ( ) notation indicates that the parameters are for a specified quantile. The parameters vary with due to effects of the th quantile of the unknown error distribution. Parameter estimates in linear quantile regression models have the same interpretation as those in any other linear model. They are rates of change conditional on adjusting for the effects of the other variables in the model, but now are defined for some specified quantile. In the 1-sample setting with no predictor variables, quantiles are usually estimated by a process of ordering the sample data. The beauty of the extension to the regression model was the recognition that quantiles could be estimated by an optimization function minimizing a sum of weighted absolute deviations, where the weights are asymmetric functions of (Koenker and Bassett 1978; Koenker and d Orey 1987). Currently, the statistical theory and computational routines for estimating and making inferences on regression quantiles are best developed for the linear model (Gutenbrunner et al. 1993; Koenker 1994; Koenker and Machado 1999; Cade 2003), but are also available for parametric nonlinear (Welsh et al. 1994; Koenker and Park 1996) and nonparametric, nonlinear smoothers (Koenker et al. 1994; Yu and Jones 1998). Quantile regression models present many new possibilities for the statistical analysis and interpretation of ecological data. With those new possibilities come new challenges related to estimation, inference, and interpretation. Here we provide an overview of several of the issues ecologists are likely to encounter when conducting and interpreting quantile regression analyses. More technical discussion is provided in papers cited in References. Quantiles and ordering in the linear model Regression quantile estimates are an ascending sequence of planes that are above an increasing proportion of sample observations with increasing values of the quantiles (Figure 4). It is this operational characteristic that extends the concepts of quantiles, order statistics, and rankings to the linear model (Gutenbrunner et al. 1993; Koenker and Machado 1999). The proportion of observations less than or equal to a given regression quantile estimate for example, the 90th percentile given by Q y (0.90 X) in Figure 4 will not in general be exactly equal to. The simplex linear programming solution minimizing the sum of weighted absolute deviations ensures that any regression quantile estimate will fit through at least p + 1 of the n sample observations for a model with p + 1 predictor

4 Quantile regression for ecologists BS Cade and BR Noon 416 Figure 4. (top) A sample (n = 90) from a homogenous error (lognormal with median = 0 and = 0.75) model, y = X 1 +, 0 = 6.0 and 1 = 0.05 with 0.90, 0.75, 0.50, 0.25, and 0.10 regression quantile estimates (solid lines) and least squares estimate of mean function (dashed line). Sample estimates b 0 ( ) (middle) and b 1 ( ) (bottom), are shown as red step functions. Dashed red lines connect endpoints of 90% confidence intervals. Parameters 0 ( ) (middle) and 1 ( ) (bottom), are blue lines. and bottom). In this situation, ordinary least squares regression is commonly modified by incorporating weights (which usually have to be estimated) in inverse proportion to the variance function (Neter et al. 1996). Typically, the use of weighted least squares is done to improve estimates of the sampling variation for the estimated mean function, and not done specifically to estimate the different rates of change in the quantiles of the distributions of y conditional on X. However, Hubert et al. (1996) and Gerow and Bilen (1999) described applications of least squares regression where this might be done. Estimating prediction intervals based on weighted least squares estimates implicitly recognize these unequal rates of change in the quantiles of y (Cunia 1987). Generalized linear models offer alternative ways to link changes in the variances ( 2 ) of y with changes in the mean ( ) based on assuming some specific distributional form in the exponential family for example, Poisson, negative binomial, or gamma (McCullagh and Nelder 1989). But again, the purpose is usually to provide better estimates of rates of change in the mean ( ) of y rather than estimates in the changes in the quantiles of y that must occur when variances are heterogeneous. Estimating prediction intervals for generalized linear models would implicitly recognize that rates of change in the quantiles of y cannot be the same for all quantiles, and these interval estimates would be linked to and sensitive to violations of the assumed error distribution. An advantage of using quantile regression to model heterogeneous variation in response distributions is that no specification of how variance changes are linked to the mean is required, nor is there any restriction to the exponential family of distributions. Furthermore, we can also detect changes in the shape of the distributions of y across the predictor variables (Koenker and Machado 1999). Complicated changes in central tendency, variance, and shape of distributions are common in statistical models applied to observational data because of model misspecification. This can occur because the appropriate functional forms are not used (for example, using linear instead of nonlinear), and because all relevant variables are not included in the model (Cade et al. 1999; Cade 2003). Failure to include all relevant variables occurs because of insufficient knowledge or ability to measure all relevant processes. This should be considered the norm for observational studies in ecology as it is in many other scientific disciplines. An example of a response distribution pattern that may involve changes in central tendency, variance, and shape is shown in Figure 6. These data from Cook and Irwin (1985) were collected to estimate how pronghorn (Antilocapra americana) densities changed with features of their habitat on winter ranges. Here, shrub canopy cover was the habitat feature used as an indirect measure of the amount of winter forage available. Rates of change in pronghorn densities due to shrub canopy cover (b 1 ) were fairly constant for the lower 1/3 of the quantiles (0.25 per 1% change in cover), increased moderately for the central 1/3 of the quantiles ( ), and doubled ( ) in the upper 1/3 of the quantiles (Figure 6, bottom right). Changes in b 1 ( ) do not appear to mirror those for b 0 ( ), indicating that there is more than just a change in central

5 BS Cade and BR Noon Quantile regression for ecologists Figure 5. (top) A sample (n = 90) from a heterogeneous error (normal with = 0 and = X 1 ) model, y = X 1 +, 0 = 6.0 and 1 = 0.10 with 0.90, 0.75, 0.50, 0.25, and 0.10 regression quantile estimates (solid lines) and least squares estimate of mean function (dashed line). Sample estimates b 0 ( ) (middle) and b 1 ( ) (bottom) are shown as red step function. Red dashed lines connect endpoints of 90% confidence intervals. Parameters 0 ( ) (middle) and 1 ( ) (bottom) are blue lines. 417 tendency and variance of pronghorn densities associated with changes in shrub canopy cover. Clearly, too strong a conclusion is not justified with the small sample (n = 28) and large sampling variation for upper quantiles. But either an ordinary least squares regression estimate (b 1 = 0.483, 90% CI = ) or more appropriate weighted least squares regression estimate would fail to recognize that pronghorn densities changed at both lower and higher rates as a function of shrub canopy cover at lower and upper quantiles of the density distribution, respectively. Here, regression quantile estimates provide a more complete characterization of an interval of changes in pronghorn densities ( ) that were associated with changes in winter food availability as measured by shrub canopy cover. These intervals are fairly large because pronghorn densities on winter ranges are almost certainly affected by more than just food availability. Estimates are for intervals of quantiles Regression quantile estimates break the interval [0, 1] into a finite number of smaller, unequal length intervals. Thus, while we may refer to and graph the estimated function for a selected regression quantile such as the 0.90, the estimated function actually applies to some small interval of quantiles; for example, [0.894, 0.905] for the 0.90 regression quantile in Figure 4. Unlike the 1-sample quantile estimates, the [0, 1] interval of regression quantile estimates may be broken into more than n intervals that aren t necessarily of equal length 1/n. The number and length of these intervals are dependent on the sample size, number or parameters, and distribution of the response variable (Portnoy 1991). Estimates plotted as step functions in Figure 4, middle and bottom, are for 101 intervals of quantiles on the interval [0, 1] for which each has an estimate b 0 ( ) and b 1 ( ), corresponding to the intercept and slope. Because the estimates actually apply to a small interval of quantiles, it is appropriate to graph the estimates as a step function. Sampling variation differs across quantiles It is not surprising that sampling variation differs among quantiles. Generally, sampling variation will increase as the value of approaches 0 or 1, but the specifics are dependent on the data distribution, model, sample size n, and number of parameters p. Estimates further from the center of the distribution the median or 50th percentile given by Qy(0.50 X) usually cannot be estimated as precisely. To display the sampling variation with the estimates (Figure 4, middle and bottom), a confidence band across the quantiles [0, 1] was constructed by estimating the pointwise confidence interval for 19 selected quantiles {0.05, 0.10,..., 0.95}. These intervals were based on inverting a quantile rankscore test (Koenker 1994; Cade et al. 1999; Koenker and Machado 1999; Cade 2003). It is possible to compute confidence intervals for all unique intervals of quantiles, but this compu-

8 Quantile regression for ecologists BS Cade and BR Noon 420 Eastwood PD, Meaden GJ, and Grioche A Modeling spatial variations in spawning habitat suitability for the sole Solea solea using regression quantiles and GIS procedures. Mar Ecol Prog Ser 224: Gerow K and Bilen C Confidence intervals for percentiles: an application to estimation of potential maximum biomass of trout in Wyoming streams. North Am J Fish Mana 19: Gutenbrunner C, Jurecková J, Koenker R, and Portnoy S Tests of linear hypotheses based on regression rank scores. J Nonparametr Stat 2: Haire SL, Bock CE, Cade BS, and Bennett BC The role of landscape and habitat characteristics in limiting abundance of grassland nesting songbirds in an urban open space. Landscape Urban Plan 48: Hubert WA, Marwitz TD, Gerow KG, et al Estimation of potential maximum biomass of trout in Wyoming streams to assist management decisions. North Am J Fish Mana 16: Huston MA Introductory essay: critical issues for improving predictions. In: Scott JM, et al. (Eds). Predicting species occurrences: issues of accuracy and scale. Covelo, CA: Island Press. p Kaiser MS, Speckman PL, and Jones JR Statistical models for limiting nutrient relations in inland waters. J Am Stat Assoc 89: Knight CA and Ackerly DD Variation in nuclear DNA content across environmental gradients: a quantile regression analysis. Ecol Lett 5: Koenker R Confidence intervals for regression quantiles. In: Mandl P and Hus ková M (Eds). Asymptotic statistics: proceedings of the 5th Prague Symposium. Physica Verlag: Heidleberg. p Koenker R and Bassett G Regression quantiles. Econometrica 46: Koenker R and d Orey V Computing regression quantiles. Appl Stat 36: Koenker R and Geling O Reappraising medfly longevity: a quantile regression survival analysis. J Am Stat Assoc 96: Koenker R and Machado JAF Goodness of fit and related inference processes for quantile regression. J Am Stat Assoc 94: Koenker R and Park BJ An interior point algorithm for nonlinear quantile regression. J Econometrics 71: Koenker R and Schorfheide F Quantile spline models for global temperature change. Climatic Change 28: Koenker R, Ng P, and Portnoy S Quantile smoothing splines. Biometrika 81: McClain CR and Rex MA The relationship between dissolved oxygen concentration and maximum size in deep-sea turrid gastropods: an application of quantile regression. Mar Biol 139: McCullagh P and Nelder JA Generalized linear models. New York: Chapman and Hall. Mosteller F and Tukey JW Data analysis and regression. New York: Addison-Wesley. Neter JM, Kutner H, Nachtsheim CJ, and Wasserman W Applied linear statistical models. Chicago, IL: Irwin. Portnoy S Asymptotic behavior of the number of regression quantile breakpoints. SIAM J Sci Stat Comp 12: Rosenbaum PR Quantiles in nonrandom samples and observational studies. J Am Stat Assoc 90: Scharf FS, Juanes F, and Sutherland M Inferring ecological relationships from the edges of scatter diagrams: comparison of regression techniques. Ecology 79: Terrell JW, Cade BS, Carpenter J, and Thompson JM Modeling stream fish habitat limitations from wedged-shaped patterns of variation in standing stock. Trans Am Fish Soc 125: Thomson JD, Weiblen G, Thomson BA, et al Untangling multiple factors in spatial distributions: lilies, gophers and rocks. Ecology 77: Vardeman SB What about the other intervals? Am Stat 46: Welsh AH, Carroll RJ, and Rupert D Fitting heteroscedastic regression models. J Am Stat Assoc 89: Yu K and Jones MC Local linear quantile regression. J Am Stat Assoc 93: Zhou KQ and Portnoy SL Direct use of regression quantiles to construct confidence sets in linear models. Ann Stat 24: Zhou KQ and Portnoy SL Statistical inference on heteroscedastic models based on regression quantiles. J Nonparametr Stat 9:

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

Design and Analysis of Ecological Data Conceptual Foundations: Ec o lo g ic al Data

Design and Analysis of Ecological Data Conceptual Foundations: Ec o lo g ic al Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

ECONOMIC INJURY LEVEL (EIL) AND ECONOMIC THRESHOLD (ET) CONCEPTS IN PEST MANAGEMENT. David G. Riley University of Georgia Tifton, Georgia, USA

ECONOMIC INJURY LEVEL (EIL) AND ECONOMIC THRESHOLD (ET) CONCEPTS IN PEST MANAGEMENT David G. Riley University of Georgia Tifton, Georgia, USA One of the fundamental concepts of integrated pest management

ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA

ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA Michael R. Middleton, McLaren School of Business, University of San Francisco 0 Fulton Street, San Francisco, CA -00 -- middleton@usfca.edu

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

Statistical Rules of Thumb

Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

Population Principles Malthus and natural populations (TSC 220)

Population Principles Malthus and natural populations (TSC 220) Definitions Population individuals of the same species in a defined area Examples: a wildlife ecologist might study the moose and wolves

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Pre-algebra Algebra Pre-calculus Calculus Statistics

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

Organizing Your Approach to a Data Analysis

Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

Lecture 18 Linear Regression

Lecture 18 Statistics Unit Andrew Nunekpeku / Charles Jackson Fall 2011 Outline 1 1 Situation - used to model quantitative dependent variable using linear function of quantitative predictor(s). Situation

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Understand when to use multiple Understand the multiple equation and what the coefficients represent Understand different methods

Regression analysis in the Assistant fits a model with one continuous predictor and one continuous response and can fit two types of models:

This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. The simple regression procedure in the

Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

CNHP s mission is to preserve the natural diversity of life by contributing the essential scientific foundation that leads to lasting conservation of Colorado's biological wealth. Colorado Natural Heritage

7. Tests of association and Linear Regression

7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.

ECOL 411/ MARI451 Reading Ecology and Special Topic Marine Science - (20 pts) Course coordinator: Professor Stephen Wing Venue: Department of Marine

ECOL 411/ MARI451 Reading Ecology and Special Topic Marine Science - (20 pts) Course coordinator: Professor Stephen Wing Venue: Department of Marine Science Seminar Room (140) Time: Mondays between 12-3

The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand

Senior Secondary Australian Curriculum

Senior Secondary Australian Curriculum Mathematical Methods Glossary Unit 1 Functions and graphs Asymptote A line is an asymptote to a curve if the distance between the line and the curve approaches zero

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

Treatment and analysis of data Applied statistics Lecture 3: Sampling and descriptive statistics

Treatment and analysis of data Applied statistics Lecture 3: Sampling and descriptive statistics Topics covered: Parameters and statistics Sample mean and sample standard deviation Order statistics and

Teaching Business Statistics through Problem Solving

Teaching Business Statistics through Problem Solving David M. Levine, Baruch College, CUNY with David F. Stephan, Two Bridges Instructional Technology CONTACT: davidlevine@davidlevinestatistics.com Typical

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

Influence of Forest Management on Headwater Stream Amphibians at Multiple Spatial Scales

Influence of Forest Management on Headwater Stream Amphibians at Multiple Spatial Scales Background Amphibians are important components of headwater streams in forest ecosystems of the Pacific Northwest

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

, then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

DRAFT: Statistical methodology to test for evidence of seasonal variation in rates of earthquakes in the Groningen field

DRAFT: Statistical methodology to test for evidence of seasonal variation in rates of earthquakes in the Groningen field Restricted PRELIMINARY DRAFT: SR.15.xxxxx April 2015 Restricted PRELIMINARY DRAFT:

The aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree

PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and

EBM Cheat Sheet- Measurements Card

EBM Cheat Sheet- Measurements Card Basic terms: Prevalence = Number of existing cases of disease at a point in time / Total population. Notes: Numerator includes old and new cases Prevalence is cross-sectional

LEARNING OBJECTIVES SCALES OF MEASUREMENT: A REVIEW SCALES OF MEASUREMENT: A REVIEW DESCRIBING RESULTS DESCRIBING RESULTS 8/14/2016

UNDERSTANDING RESEARCH RESULTS: DESCRIPTION AND CORRELATION LEARNING OBJECTIVES Contrast three ways of describing results: Comparing group percentages Correlating scores Comparing group means Describe

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

Manual of Fisheries Survey Methods II: with periodic updates. Chapter 6: Sample Size for Biological Studies

January 000 Manual of Fisheries Survey Methods II: with periodic updates : Sample Size for Biological Studies Roger N. Lockwood and Daniel B. Hayes Suggested citation: Lockwood, Roger N. and D. B. Hayes.

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

Spearman s correlation

Spearman s correlation Introduction Before learning about Spearman s correllation it is important to understand Pearson s correlation which is a statistical measure of the strength of a linear relationship

Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

We will use the following data sets to illustrate measures of center. DATA SET 1 The following are test scores from a class of 20 students:

MODE The mode of the sample is the value of the variable having the greatest frequency. Example: Obtain the mode for Data Set 1 77 For a grouped frequency distribution, the modal class is the class having

Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination

Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg - ) Chromium (mg kg - ) Copper (mg kg - ) Lead (mg kg - ) Mercury

Quantitative Methods for Finance

Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

Biodiversity and Ecosystem Services: Arguments for our Future Environment

Biodiversity and Ecosystem Services: Arguments for our Future Environment How have we advanced our understanding of the links between biodiversity, ecosystem functions and ecosystem services? The issue

9.1 (a) The standard deviation of the four sample differences is given as.68. The standard error is SE (ȳ1 - ȳ 2 ) = SE d - = s d n d

CHAPTER 9 Comparison of Paired Samples 9.1 (a) The standard deviation of the four sample differences is given as.68. The standard error is SE (ȳ1 - ȳ 2 ) = SE d - = s d n d =.68 4 =.34. (b) H 0 : The mean

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

Moderation. Moderation

Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

Lecture 4: Transformations Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture The Ladder of Roots and Powers Changing the shape of distributions Transforming

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

Schools Value-added Information System Technical Manual

Schools Value-added Information System Technical Manual Quality Assurance & School-based Support Division Education Bureau 2015 Contents Unit 1 Overview... 1 Unit 2 The Concept of VA... 2 Unit 3 Control

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

Jitter Measurements in Serial Data Signals

Jitter Measurements in Serial Data Signals Michael Schnecker, Product Manager LeCroy Corporation Introduction The increasing speed of serial data transmission systems places greater importance on measuring

Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

Kernel density estimation with adaptive varying window size

Pattern Recognition Letters 23 (2002) 64 648 www.elsevier.com/locate/patrec Kernel density estimation with adaptive varying window size Vladimir Katkovnik a, Ilya Shmulevich b, * a Kwangju Institute of

MATH. ALGEBRA I HONORS 9 th Grade 12003200 ALGEBRA I HONORS

* Students who scored a Level 3 or above on the Florida Assessment Test Math Florida Standards (FSA-MAFS) are strongly encouraged to make Advanced Placement and/or dual enrollment courses their first choices

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand

Regression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology

Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of

Continuous Random Variables and Probability Distributions. Stat 4570/5570 Material from Devore s book (Ed 8) Chapter 4 - and Cengage

4 Continuous Random Variables and Probability Distributions Stat 4570/5570 Material from Devore s book (Ed 8) Chapter 4 - and Cengage Continuous r.v. A random variable X is continuous if possible values

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

GRADES 7, 8, AND 9 BIG IDEAS

Table 1: Strand A: BIG IDEAS: MATH: NUMBER Introduce perfect squares, square roots, and all applications Introduce rational numbers (positive and negative) Introduce the meaning of negative exponents for

Data Exploration Data Visualization

Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

the number of organisms in the squares of a haemocytometer? the number of goals scored by a football team in a match?

Poisson Random Variables (Rees: 6.8 6.14) Examples: What is the distribution of: the number of organisms in the squares of a haemocytometer? the number of hits on a web site in one hour? the number of

Statistical Functions in Excel

Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

"49 39' 49 38.7' E 49 39.0' E 37 46.7' S 37 47.1' S.

Appendix Template for Submission of Scientific Information to Describe Ecologically or Biologically Significant Marine Areas Note: Please DO NOT embed tables, graphs, figures, photos, or other artwork

Estimation and attribution of changes in extreme weather and climate events

IPCC workshop on extreme weather and climate events, 11-13 June 2002, Beijing. Estimation and attribution of changes in extreme weather and climate events Dr. David B. Stephenson Department of Meteorology

Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

Developing a New Freshman Mathematics Course for Biology and Environmental Sciences Majors. Department of Mathematics December 2015

Developing a New Freshman Mathematics Course for Biology and Environmental Sciences Majors. Department of Mathematics December 2015 Programs changing to allow Math 1044 as an alternate to Math 1042: GSTC

Parametric versus Semi/nonparametric Regression Models

Parametric versus Semi/nonparametric Regression Models Hamdy F. F. Mahmoud Virginia Polytechnic Institute and State University Department of Statistics LISA short course series- July 23, 2014 Hamdy Mahmoud

You have data! What s next?

You have data! What s next? Data Analysis, Your Research Questions, and Proposal Writing Zoo 511 Spring 2014 Part 1:! Research Questions Part 1:! Research Questions Write down > 2 things you thought were

Quantitative Inventory Uncertainty

Quantitative Inventory Uncertainty It is a requirement in the Product Standard and a recommendation in the Value Chain (Scope 3) Standard that companies perform and report qualitative uncertainty. This

What Does the Correlation Coefficient Really Tell Us About the Individual?

What Does the Correlation Coefficient Really Tell Us About the Individual? R. C. Gardner and R. W. J. Neufeld Department of Psychology University of Western Ontario ABSTRACT The Pearson product moment

The Potential Use of Remote Sensing to Produce Field Crop Statistics at Statistics Canada

Proceedings of Statistics Canada Symposium 2014 Beyond traditional survey taking: adapting to a changing world The Potential Use of Remote Sensing to Produce Field Crop Statistics at Statistics Canada

Step 5: Conduct Analysis. The CCA Algorithm

Model Parameterization: Step 5: Conduct Analysis P Dropped species with fewer than 5 occurrences P Log-transformed species abundances P Row-normalized species log abundances (chord distance) P Selected