Nonlinear Regression:


Slide 1: Nonlinear Regression: A Powerful Tool With Considerable Complexity

Half-Day 2: Improved Inference and Visualisation

Andreas Ruckstuhl, Institut für Datenanalyse und Prozessdesign, Zürcher Hochschule für Angewandte Wissenschaften
Zurich University of Applied Sciences, School of Engineering, IDP Institute of Data Analysis and Process Design

Slide 2: Outline

Half-Day 1: Estimation and Standard Inference
- The Nonlinear Regression Model
- Iterative Estimation - Model Fitting
- Inference Based on Linear Approximations

Half-Day 2: Improved Inference and Visualisation
- Likelihood Based Inference
- Profile t Plot and Profile Traces
- Parameter Transformations

Half-Day 3: Bootstrap, Prediction and Calibration
- Bootstrap
- Prediction
- Calibration
- Outlook

Slide 3: 2.1 Likelihood Based Inference

F-test for the whole parameter vector $\theta$:

    T = \frac{n-p}{p} \cdot \frac{S(\theta) - S(\hat{\theta})}{S(\hat{\theta})} \;\overset{a}{\sim}\; F_{p,\,n-p} .

This is as in linear regression, where, however, the result holds exactly. The resulting confidence region is

    \left\{ \theta : \; S(\theta) \le S(\hat{\theta}) \left( 1 + \frac{p}{n-p}\, q^{F_{p,n-p}}_{1-\alpha} \right) \right\} .

In linear regression this confidence region is identical to the one based on the multivariate normal distribution of $\hat{\beta}$. In nonlinear regression it is more accurate than the one based on the asymptotic multivariate normal distribution of $\hat{\theta}$. Cf. the discussion of the deviance test and the t-test in GLMs.
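As a concrete illustration, here is a minimal R sketch of this F-test; it assumes the built-in Puromycin data (treated cells) with a Michaelis-Menten fit as a running example, and theta0 is an arbitrary hypothesized value chosen only for illustration:

dat <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ Vm * conc / (K + conc), data = dat,
           start = c(Vm = 200, K = 0.1))

S <- function(Vm, K)                               # residual sum of squares S(theta)
  sum((dat$rate - Vm * dat$conc / (K + dat$conc))^2)

n <- nrow(dat); p <- 2
S.hat  <- sum(resid(fit)^2)                        # S(theta.hat)
theta0 <- c(Vm = 200, K = 0.08)                    # hypothesized theta (illustration only)
T.stat <- (n - p) / p * (S(theta0["Vm"], theta0["K"]) - S.hat) / S.hat
c(T = unname(T.stat), crit = qf(0.95, p, n - p))   # reject H0: theta = theta0 if T > crit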

Slide 4: However, it is very difficult to calculate this more accurate confidence region!

p = 2: We can determine the more accurate confidence region by standard contouring methods, that is, by evaluating $S(\theta)$ on a grid of $\theta$ values and approximating the contours by straight line segments in the grid (example: see next slide).

$p \ge 3$: There are no contour plots.
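A sketch of this contouring recipe for p = 2, reusing dat, fit, S(), S.hat, n and p from the previous sketch; the grid ranges are ad hoc choices:

Vm.grid <- seq(190, 240, length.out = 101)
K.grid  <- seq(0.04, 0.10, length.out = 101)
S.grid  <- outer(Vm.grid, K.grid, Vectorize(S))      # S(theta) evaluated on the grid

levels <- S.hat * (1 + p / (n - p) * qf(c(0.80, 0.95), p, n - p))
contour(Vm.grid, K.grid, S.grid, levels = levels, drawlabels = FALSE,
        xlab = "Vm (theta1)", ylab = "K (theta2)")   # 80% and 95% likelihood contours
points(coef(fit)["Vm"], coef(fit)["K"], pch = 3)     # least-squares estimate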

Slide 5: Likelihood Contour Lines

Nominal 80% and 95% likelihood contour lines and confidence ellipsoids based on the Wald-type asymptotic approximation; + indicates the least-squares estimate. The two solutions agree satisfactorily in the Puromycin example (left) but clearly disagree in the Biochemical Oxygen Demand example (right). (Axes: $\theta_1$ and $\theta_2$.)

Slide 6: F-Test for a Single Parameter: $\theta_k = \theta_k^*$

- Such a null hypothesis makes no statement about the other parameters.
- The other parameters are fitted to the data by least squares.
- The resulting minimum is called $\widetilde{S}_k$; it depends on $\theta_k^*$, hence $\widetilde{S}_k = \widetilde{S}_k(\theta_k^*)$.

The F-test statistic for the test $\theta_k = \theta_k^*$ is

    \widetilde{T}_k = (n-p)\, \frac{\widetilde{S}_k(\theta_k^*) - S(\hat{\theta})}{S(\hat{\theta})} .

It is approximately $F_{1,\,n-p}$ distributed. In linear regression this F-test is equivalent to the t-test, since the F-test statistic is proportional to the square of the t-test statistic. In nonlinear regression this F-test is not equivalent to the asymptotic Wald-type t-test.

Slide 7: A More Accurate t-Test

Based on the previous result we can construct a t-type test which is more accurate than the one introduced initially: take the square root of the F-test statistic and attach the sign of $\hat{\theta}_k - \theta_k^*$,

    T_k(\theta_k^*) := \mathrm{sign}\left( \hat{\theta}_k - \theta_k^* \right) \frac{\sqrt{\widetilde{S}_k(\theta_k^*) - S(\hat{\theta})}}{\hat{\sigma}} , \qquad \hat{\sigma}^2 = \frac{S(\hat{\theta})}{n-p} .

This test statistic is approximately $t_{n-p}$ distributed. (In linear regression this test statistic is equivalent to the usual t-test statistic.)
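The following sketch computes both the single-parameter F statistic and the signed-root t statistic for the running Puromycin example; it assumes dat and fit from the earlier sketch, and K0 is an arbitrary null value. Fixing the parameter is done by keeping K0 out of the start vector, so that only Vm is re-estimated:

K0   <- 0.08                                           # hypothesized value theta_k* (illustration only)
fit0 <- nls(rate ~ Vm * conc / (K0 + conc), data = dat,
            start = c(Vm = 200))                       # K fixed at K0, Vm refitted by least squares
S.hat     <- sum(resid(fit)^2)                         # S(theta.hat)
S.tilde   <- sum(resid(fit0)^2)                        # S~_k(K0)
n <- nrow(dat); p <- 2
sigma.hat <- sqrt(S.hat / (n - p))

T.F <- (n - p) * (S.tilde - S.hat) / S.hat                                    # approx. F(1, n-p)
T.t <- sign(unname(coef(fit)["K"]) - K0) * sqrt(S.tilde - S.hat) / sigma.hat  # approx. t(n-p)
c(F = T.F, t = T.t, p.value = 2 * pt(-abs(T.t), n - p))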

Slide 8: 2.2 Profile t Plot and Profile Traces

Based on the test statistic just introduced, a graphical tool called the profile t plot can be designed for assessing the quality of the linear approximation: we plot the test statistic $T_k(\theta_k)$ as a function of $\theta_k$; this function is called the profile t function. In linear regression the profile t function is a straight line; in nonlinear regression it can be any monotone increasing function.

Profile t plot: plot $T_k(\theta_k)$ versus

    \delta_k(\theta_k) := \frac{\theta_k - \hat{\theta}_k}{\mathrm{se}(\hat{\theta}_k)} .

The more curved the profile t function, the stronger the nonlinearity in a neighbourhood of $\hat{\theta}_k$. Hence the profile t plot shows how accurate the linear approximation underlying the standard test and the standard confidence interval is. The neighbourhood that matters for statistical inference is given by $|\delta_k(\theta_k)| \le 2.5$. Why?
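In R, profile() and its plot method produce such plots directly. A sketch, again for the Puromycin fit fit from above; note that plot.profile.nls puts $\theta_k$ itself, not $\delta_k$, on the x-axis, and absVal = FALSE requests the signed profile t function as on the slides:

pr <- profile(fit)           # profiles T_k(theta_k) for each parameter
plot(pr, absVal = FALSE)     # profile t plots, signed as on the slides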

Slide 9: Example: Profile t Plots

Profile t plot for $\theta_1$ for the Puromycin data (left) and the Biochemical Oxygen Demand data (right). (Axes: $T_1(\theta_1)$, with the corresponding confidence level on the right axis, versus $\delta(\theta_1)$.)

Slide 10: Example: Cellulose Membrane (5) - Profile t Plots

Profile t plots for the four parameters $\theta_1, \ldots, \theta_4$. (Axes in each panel: $T_k(\theta_k)$, with the corresponding confidence level on the right axis, versus $\delta(\theta_k)$.)

Slide 11: Example: Cellulose Membrane (6)

Wald-type CI versus profile-type CI.

R output (Wald-type): the summary table with Value, Std. Error and t value for $\theta_1, \ldots, \theta_4$ and the residual standard error on 35 degrees of freedom.

R output (profile-type):
> confint(mem.fit)
Waiting for profiling to be done...
returning the 2.5% and 97.5% limits for $\theta_1, \ldots, \theta_4$.

The approximate 95% Wald-type confidence intervals $\hat{\theta}_k \pm \mathrm{se}(\hat{\theta}_k)\, q^{t_{n-p}}_{0.975}$ are listed next to the profile-based intervals for $\theta_1, \ldots, \theta_4$.
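A sketch of the same Wald-versus-profile comparison for the running Puromycin fit fit (the membrane fit mem.fit itself is not reproduced here); the Wald intervals are built from the summary table, the profile intervals come from confint():

est <- summary(fit)$coefficients                   # Value, Std. Error, t value
q   <- qt(0.975, df.residual(fit))
wald <- cbind(lower = est[, "Estimate"] - q * est[, "Std. Error"],
              upper = est[, "Estimate"] + q * est[, "Std. Error"])
wald                                               # Wald-type (linear approximation) intervals
confint(fit)                                       # profile-type intervals ("Waiting for profiling...")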

Slide 12: Likelihood Profile Traces

Likelihood profile traces are another useful tool. The parameter $\theta_j$ ($j \ne k$), estimated with $\theta_k$ held fixed at $\theta_k^*$, is considered as a function of $\theta_k^*$; hence the notation $\hat{\theta}_j^{(k)}(\theta_k^*)$.

Remember:

    \min_{\{\theta_h,\, h \ne k\}} S(\theta_1, \ldots, \theta_k^*, \ldots, \theta_p) = S\big(\hat{\theta}_1^{(k)}, \ldots, \hat{\theta}_{k-1}^{(k)}, \theta_k^*, \hat{\theta}_{k+1}^{(k)}, \ldots, \hat{\theta}_p^{(k)}\big) =: \widetilde{S}_k(\theta_k^*) .

Plot the profile trace $\hat{\theta}_j^{(k)}$ versus $\theta_k$, overlaid by the profile trace $\hat{\theta}_k^{(j)}$ versus $\theta_j$ but reflected at the 45° line; that is (y-coordinate vs x-coordinate): $\hat{\theta}_j^{(k)}$ vs $\theta_k$, overlaid by $\theta_j$ vs $\hat{\theta}_k^{(j)}$.
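For a two-parameter model the traces can be read off the profile object directly. A sketch in the (K, Vm) plane, assuming fit and pr from the earlier Puromycin sketches:

tr.K  <- pr$K$par.vals        # K fixed on a grid, Vm re-estimated: trace Vm-hat^(K)
tr.Vm <- pr$Vm$par.vals       # Vm fixed on a grid, K re-estimated: trace K-hat^(Vm)
plot(tr.K[, "K"], tr.K[, "Vm"], type = "l", col = "red",
     xlim = range(tr.K[, "K"], tr.Vm[, "K"]),
     ylim = range(tr.K[, "Vm"], tr.Vm[, "Vm"]),
     xlab = "K", ylab = "Vm")
lines(tr.Vm[, "K"], tr.Vm[, "Vm"], col = "darkgreen")   # second trace, reflected view
points(coef(fit)["K"], coef(fit)["Vm"], pch = 3)        # least-squares estimate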

Slide 13: Examples of Likelihood Profile Traces

Likelihood profile traces for the Puromycin example (left) and the Biochemical Oxygen Demand example (right), complemented by the 80% and 95% confidence regions (gray curves). (Axes: $\theta_1$ and $\theta_2$.)

Slide 14: Properties of Likelihood Profile Traces

In linear regression:
- The profile traces are two straight lines.
- The angle between these two lines represents the correlation between the corresponding estimated parameters: if the correlation is 0, the lines are orthogonal to each other; if the correlation is +1 or -1, the lines coincide.

In nonlinear regression:
- Both traces may be curved. The more strongly the traces deviate from straight lines, the poorer the linear approximation and the inference based on it.
- The angle between the two traces at their intersection still represents the correlation between the two estimated parameters $\hat{\theta}_j$ and $\hat{\theta}_k$.

Slide 15: Example Cellulose Membrane (7): Profile t Plot and Profile Traces

Profile t plots and profile traces for $\theta_1, \ldots, \theta_4$ in a pairs-style display. Traces in the bottom-left panel ($\theta_1$ versus $\theta_4$): red: $\hat{\theta}_4^{(1)}$ vs $\theta_1$; green: $\theta_4$ vs $\hat{\theta}_1^{(4)}$.

Slide 16: 2.3 Parameter Transformations

In this section we study the effects of transforming the parameters. This topic rests on the fact that the mean regression function can usually be written in mathematically equivalent forms. For example, the two expressions of the Michaelis-Menten function

    \frac{\theta_1 x}{\theta_2 + x} = \frac{x}{\vartheta_1 + \vartheta_2 x}

are equivalent, with $\vartheta_1 = \theta_2 / \theta_1$ and $\vartheta_2 = 1 / \theta_1$. Or we have the two equivalent expressions

    \theta_1 e^{\theta_2 x} = \vartheta_1 \vartheta_2^{\,x} ,

hence $\vartheta_1 = \theta_1$ and $\vartheta_2 = e^{\theta_2}$.
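A quick numerical check of the first equivalence on the Puromycin fit (assuming dat and fit from above); the starting values of the second parametrization are derived from the first via the relations $\vartheta_1 = \theta_2/\theta_1$ and $\vartheta_2 = 1/\theta_1$:

v.start <- c(v1 = unname(coef(fit)["K"] / coef(fit)["Vm"]),   # vartheta1 = theta2/theta1
             v2 = unname(1 / coef(fit)["Vm"]))                # vartheta2 = 1/theta1
fit.v <- nls(rate ~ conc / (v1 + v2 * conc), data = dat, start = v.start)
c(deviance(fit), deviance(fit.v))    # same minimal S(theta): the parametrizations are equivalent
coef(fit.v)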

Slide 17: Motivation

The parameters of the regression function are transformed to
- get rid of collinearities,
- improve the convergence of the algorithm,
- improve the linear approximation (e.g., the Wald-type asymptotics), which results in nicer profile traces and hence in a better quality of the Wald-type confidence intervals.

Parameter transformations change neither the deterministic nor the stochastic part of the regression model, in contrast to variable transformations.

Slide 18: Constraints on the Parameter Domain

Subject-matter theory: the parameter domain is subject to constraints, e.g., $\theta_1 > 0$, $a < \theta_2 \le b$.

What to do? Ignore the constraints and observe whether the algorithm converges, and to what values. If it fails: most such constraints can be imposed by a suitable transformation of the parameter concerned.

Slide 19: Examples of Constraints

$\theta > 0$: transformation $\theta \to \phi = \log(\theta)$, so $\theta = \exp(\phi) > 0$ for all $\phi$; $h(x; \theta) \to h(x; e^{\phi})$.

$a < \theta < b$: transformation $\theta \to \phi = \log\big(\frac{b - \theta}{\theta - a}\big)$, so $\theta = a + \frac{b - a}{1 + \exp(\phi)}$.

Let $h(x; \theta) = \theta_1 e^{-\theta_2 x} + \theta_3 e^{-\theta_4 x}$ with $\theta_2, \theta_4 > 0$. The two pairs of parameters $(\theta_1, \theta_2)$ and $(\theta_3, \theta_4)$ are exchangeable and may thus cause convergence problems. Workaround: impose the constraint $\theta_2 < \theta_4$, via the transformation $\theta \to \phi$ with $\theta_1 = \phi_1$, $\theta_2 = e^{\phi_2}$, $\theta_3 = \phi_3$ and $\theta_4 = e^{\phi_2}(1 + e^{\phi_4})$:

    h\big(x; (\phi_1, \phi_2, \phi_3, \phi_4)^T\big) = \phi_1 \exp\big(-e^{\phi_2} x\big) + \phi_3 \exp\big(-e^{\phi_2}(1 + e^{\phi_4})\, x\big) .
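A sketch of the first transformation for the Puromycin example (assuming dat from above): the constraint K > 0 is enforced by estimating lK = log(K) instead of K:

fit.tr <- nls(rate ~ Vm * conc / (exp(lK) + conc), data = dat,
              start = c(Vm = 200, lK = log(0.1)))   # lK is unconstrained, K = exp(lK) > 0 always
coef(fit.tr)
exp(coef(fit.tr)["lK"])                             # back-transformed estimate of K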

Slide 20: Collinearity

Example to show the problem: let $h(x; \theta) = \theta_1 e^{\theta_2 x}$ (*). The partial derivatives (columns of the matrix A) are

    \frac{\partial h(x;\theta)}{\partial \theta_1} = e^{\theta_2 x}, hence a_1^T := (e^{\theta_2 x_1}, \ldots, e^{\theta_2 x_n}),
    \frac{\partial h(x;\theta)}{\partial \theta_2} = \theta_1 x\, e^{\theta_2 x}, hence a_2^T := (\theta_1 x_1 e^{\theta_2 x_1}, \ldots, \theta_1 x_n e^{\theta_2 x_n}).

The vectors $a_1$ and $a_2$ tend towards collinearity if all $x_i > 0$.

Reformulate (*): $h(x; \theta) = \theta_1 \exp\big(\theta_2 ((x - x_0) + x_0)\big)$. Applying the reparametrization $\phi_1 := \theta_1 e^{\theta_2 x_0}$ and $\phi_2 := \theta_2$, we obtain

    h(x; \phi) = \phi_1 \exp\big(\phi_2 (x - x_0)\big) .

This function results in an (approximately) optimal matrix A if $x_0 = \bar{x}$ is chosen.
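A sketch with simulated data (all numbers here are made up for illustration) showing how centring at x0 = mean(x) reduces the correlation between the estimates:

set.seed(1)
x <- seq(1, 5, length.out = 30)
d <- data.frame(x = x, y = 2 * exp(0.4 * x) + rnorm(30, sd = 0.5))

f1 <- nls(y ~ t1 * exp(t2 * x), data = d, start = c(t1 = 1, t2 = 0.5))
x0 <- mean(d$x)
f2 <- nls(y ~ p1 * exp(p2 * (x - x0)), data = d,
          start = c(p1 = unname(coef(f1)["t1"] * exp(coef(f1)["t2"] * x0)),  # phi1 = theta1*exp(theta2*x0)
                    p2 = unname(coef(f1)["t2"])))                            # phi2 = theta2

summary(f1, correlation = TRUE)$correlation   # theta1, theta2: strongly correlated
summary(f2, correlation = TRUE)$correlation   # phi1, phi2: much less correlated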

Slide 21: Example Cellulose Membrane (7): Profile t Plot and Profile Traces

(Slide from Half-Day 2.) $\theta_3$ and $\theta_4$ are highly correlated; the profile traces involving $\theta_2$, $\theta_3$ and $\theta_4$ are clearly twisted.

Slide 22: Example Cellulose Membrane (8)

Regression function

    h(x_i; \theta) = \frac{\theta_1 + \theta_2\, 10^{\theta_3 + \theta_4 ((x_i - \bar{x}) + \bar{x})}}{1 + 10^{\theta_3 + \theta_4 ((x_i - \bar{x}) + \bar{x})}} .

Remove the collinearity by introducing $\widetilde{\theta}_3 := \theta_3 + \theta_4 \bar{x}$, where $\bar{x} = \mathrm{median}(x_i)$:

    h(x_i; \theta) = \frac{\theta_1 + \theta_2\, 10^{\widetilde{\theta}_3 + \theta_4 (x_i - \bar{x})}}{1 + 10^{\widetilde{\theta}_3 + \theta_4 (x_i - \bar{x})}} .

Improve the linear approximation in two further steps: Step 1 introduces a transformed parameter $\widetilde{\theta}_4$; Step 2 replaces $\theta_1$ and $\theta_2$ by $\widetilde{\theta}_1 := \frac{\theta_1 + \theta_2\, 10^{\widetilde{\theta}_3}}{1 + 10^{\widetilde{\theta}_3}}$ (the value of $h$ at $x = \bar{x}$) and a $\log_{10}$-transformed parameter $\widetilde{\theta}_2$.

Slide 23: Example Cellulose Membrane (9)

Profile t functions and profile traces for the transformed parameters after the reparametrization.

Slide 24: Example Cellulose Membrane (10)

Original parametrization: R summary with Value, Std. Error and t value for $\theta_1, \ldots, \theta_4$, the residual standard error on 35 degrees of freedom, and the correlation matrix of the parameter estimates.

Reparametrized: the same summary for the transformed parameters $\widetilde{\theta}_1, \ldots, \widetilde{\theta}_4$, with its residual standard error on 35 degrees of freedom and the correlation matrix of the parameter estimates.

Slide 25: Successful Reparametrization

A successful reparametrization depends both on the regression function and on the data set. There are no general guidelines, which results in a tedious search for successful reparametrizations.

Another example:

    h(x; \theta) = \frac{\theta_1 \theta_3 (x^{(2)} - x^{(3)})}{1 + \theta_2 x^{(1)} + \theta_3 x^{(2)} + \theta_4 x^{(3)}}   (*)

               = \frac{x^{(2)} - x^{(3)}}{\frac{1}{\theta_1 \theta_3} + \frac{\theta_2}{\theta_1 \theta_3} x^{(1)} + \frac{\theta_3}{\theta_1 \theta_3} x^{(2)} + \frac{\theta_4}{\theta_1 \theta_3} x^{(3)}}

               = \frac{x^{(2)} - x^{(3)}}{\phi_1 + \phi_2 x^{(1)} + \phi_3 x^{(2)} + \phi_4 x^{(3)}}   (**)

The parametrization (**) is preferred to (*) in most cases (cf. exercises).

Slide 26: Interpretation?

In most cases the original parameters have a physical interpretation, so the parameters must be back-transformed.

Standard approach for back-transformation. Example: the parameter transformation used was $\theta \to \phi = \ln(\theta)$. Let $\hat{\phi}$ and $\hat{\sigma}_{\hat{\phi}}$ be the estimate and its standard error. Estimate $\theta$ by $\hat{\theta} = \exp(\hat{\phi})$. Its standard error is commonly obtained by Gauss' law of error propagation (cf. Stahel, Sec. 6.1):

    \hat{\sigma}^2_{\hat{\theta}} \approx \left( \left. \frac{\partial \exp(\phi)}{\partial \phi} \right|_{\phi = \hat{\phi}} \right)^2 \hat{\sigma}^2_{\hat{\phi}} = \big( \exp(\hat{\phi}) \big)^2\, \hat{\sigma}^2_{\hat{\phi}} .

Hence an approximate 95% confidence interval for $\theta$ is

    \exp(\hat{\phi}) \pm \hat{\sigma}_{\hat{\theta}}\, q^{t_{n-p}}_{0.975} = \exp(\hat{\phi}) \left( 1 \pm \hat{\sigma}_{\hat{\phi}}\, q^{t_{n-p}}_{0.975} \right) .   (*)

But this approach is not recommended because... (see next slide).

Slide 27: Why Parameter Transformation?

- So that the parameter falls within a predefined domain: confidence intervals according to (*) may violate this requirement!
- Because of the insufficient quality of the confidence interval: Gauss' law of error propagation nullifies the achievements of the reparametrization, since it uses the same linear approximation as the Wald-type asymptotics!

Alternatives to the standard approach: back-transformation of the complete confidence interval. Example:

    \left\{ \theta : \ln(\theta) \in \big[\, \hat{\phi} \pm \hat{\sigma}_{\hat{\phi}}\, q^{t_{df}}_{0.975} \,\big] \right\}

forms a better, but still approximate, 95% confidence interval for $\theta$. It is identical to

    \left[ \exp\big( \hat{\phi} - \hat{\sigma}_{\hat{\phi}}\, q^{t_{df}}_{0.975} \big),\; \exp\big( \hat{\phi} + \hat{\sigma}_{\hat{\phi}}\, q^{t_{df}}_{0.975} \big) \right] ,

since ln/exp is strictly increasing. In the second case, the most convenient approach is to form the confidence interval based on the profile t function.
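A sketch comparing the three intervals for K in the log-parametrized Puromycin fit fit.tr from the constraints example above (phi corresponds to lK = log(K)):

phi <- unname(coef(fit.tr)["lK"])
se  <- summary(fit.tr)$coefficients["lK", "Std. Error"]
q   <- qt(0.975, df.residual(fit.tr))

exp(phi) * (1 + c(-1, 1) * q * se)   # error propagation, interval (*): not recommended
exp(phi + c(-1, 1) * q * se)         # back-transformed Wald interval for K
exp(confint(fit.tr)["lK", ])         # back-transformed profile interval (most accurate)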

Slide 28: Take-Home Message, Half-Day 2

- The commonly used confidence intervals are based on a (crude) linear approximation.
- Use graphical tools like profile t plots and profile traces to assess the quality of the approximate confidence intervals (and hence of the linear approximation).
- If it is insufficient: more accurate confidence intervals for single parameters $\theta_k$ can be calculated using profile t functions (as implemented in confint() anyway).
- The convergence properties of the estimation algorithm and the quality of the Wald-type confidence intervals can be improved by applying suitable reparametrizations (parameter transformations).
- If the interpretation of the original parameters is crucial, then the confidence interval should also be back-transformed and not be determined by Gauss' law of error propagation.
