Nonlinear Regression:
1 Zurich University of Applied Sciences, School of Engineering, IDP Institute of Data Analysis and Process Design. Nonlinear Regression: A Powerful Tool With Considerable Complexity. Half-Day 2: Improved Inference and Visualisation. Andreas Ruckstuhl, Institut für Datenanalyse und Prozessdesign, Zürcher Hochschule für Angewandte Wissenschaften
2 Nonlinear Regression: Half-Day 2 2 / 28 Outline
Half-Day 1: Estimation and Standard Inference (The Nonlinear Regression Model; Iterative Estimation - Model Fitting; Inference Based on Linear Approximations)
Half-Day 2: Improved Inference and Visualisation (Likelihood Based Inference; Profile t Plot and Profile Traces; Parameter Transformations)
Half-Day 3: Bootstrap, Prediction and Calibration (Bootstrap; Prediction; Calibration; Outlook)
3 Nonlinear Regression: Half-Day 2 3 / 28
2.1 Likelihood Based Inference
F-test for the whole parameter vector θ:
T = ((n − p)/p) · (S(θ) − S(θ̂)) / S(θ̂) ≈ F_{p,n−p}.
This is as in linear regression, where, however, the result holds exactly. The resulting confidence region is
{ θ : S(θ) ≤ S(θ̂) · (1 + (p/(n−p)) · q) }, where q is the (1 − α) quantile of the F_{p,n−p} distribution.
In linear regression, this confidence region is identical to the confidence region based on the multivariate normal distribution of β̂. In nonlinear regression, this confidence region is more accurate than the one based on the asymptotic multivariate normal distribution of the estimator. Cf. the discussion of the deviance test and the t-test in GLMs.
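The test statistic and the confidence region above are easy to sketch in code. The slides work in R; the following is a hedged Python sketch with freely chosen illustrative numbers (not from the slides' examples), and the F quantile q is assumed to be supplied externally, since Python's standard library has no F distribution:

```python
# Sketch of the F-test for the whole parameter vector and the corresponding
# likelihood-based confidence region. S_theta, S_hat, n, p, q are assumptions
# (illustrative numbers), not values from the slides' examples.

def f_statistic(S_theta, S_hat, n, p):
    """T = ((n - p) / p) * (S(theta) - S(theta_hat)) / S(theta_hat)."""
    return (n - p) / p * (S_theta - S_hat) / S_hat

def in_confidence_region(S_theta, S_hat, n, p, f_quantile):
    """theta lies in the region iff S(theta) <= S(theta_hat) * (1 + p/(n-p)*q)."""
    return S_theta <= S_hat * (1.0 + p / (n - p) * f_quantile)

# The two formulations are equivalent: T <= q  <=>  S(theta) <= bound.
S_hat, S_theta, n, p, q = 4.0, 6.5, 20, 2, 3.55
print(f_statistic(S_theta, S_hat, n, p))
print(in_confidence_region(S_theta, S_hat, n, p, q))
```

Rearranging T ≤ q gives exactly the region's inequality, which is why the two functions always agree.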
4 Nonlinear Regression: Half-Day 2 4 / 28
However, it is very difficult to compute this more accurate confidence region!
p = 2: We can determine the more accurate confidence region by standard contouring methods, that is, by evaluating S(θ) on a grid of θ values and approximating the contours by straight line segments in the grid (example: see next slide).
p ≥ 3: There are no contour plots.
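For p = 2 the grid evaluation is straightforward to sketch. The following Python sketch uses a made-up exponential model and data, not the slides' examples, and the F quantile q95 is an assumption supplied from outside:

```python
import math

# made-up model and data: h(x; theta) = theta1 * exp(-theta2 * x)
x = [0.5 * i for i in range(1, 13)]                      # n = 12 design points
y = [2.0 * math.exp(-0.8 * xi) + 0.05 * (-1) ** i for i, xi in enumerate(x)]

def S(t1, t2):
    """Sum of squares S(theta) evaluated at one grid point."""
    return sum((yi - t1 * math.exp(-t2 * xi)) ** 2 for xi, yi in zip(x, y))

grid1 = [1.5 + 0.01 * i for i in range(101)]             # theta1 in [1.5, 2.5]
grid2 = [0.5 + 0.01 * i for i in range(101)]             # theta2 in [0.5, 1.5]

# crude grid search for the least-squares estimate
t1_hat, t2_hat = min(((a, b) for a in grid1 for b in grid2), key=lambda ab: S(*ab))
S_hat, n, p = S(t1_hat, t2_hat), len(x), 2
q95 = 4.10   # assumed 95% quantile of F(2, 10); must be supplied externally

# grid points inside the likelihood-based confidence region; contouring
# software would trace the boundary through them with straight line segments
inside = [(a, b) for a in grid1 for b in grid2
          if S(a, b) <= S_hat * (1 + p / (n - p) * q95)]
print(t1_hat, t2_hat, len(inside))
```

A plotting routine would then draw the contour S(θ) = S(θ̂)(1 + p/(n−p) q95) through this grid.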
5 Nonlinear Regression: Half-Day 2 5 / 28
Likelihood Contour Lines
[Figure] Nominal 80% and 95% likelihood contour lines and confidence ellipsoids based on Wald-type asymptotic approximations; + indicates the least-squares estimate. The two solutions agree satisfactorily in the example Puromycin (left), but disagree clearly in the example Biochemical Oxygen Demand (right).
6 Nonlinear Regression: Half-Day 2 6 / 28
F-test for a single parameter: θ_k = θ*_k
- Such a null hypothesis ignores the other parameters.
- The other parameters are fitted to the data by least squares.
- The resulting minimum is called S̃_k. It depends on θ*_k, hence S̃_k = S̃_k(θ*_k).
The F-test statistic for the test θ_k = θ*_k is
T̃_k = (n − p) · (S̃_k(θ*_k) − S(θ̂)) / S(θ̂).
It is approximately F_{1,n−p} distributed.
In linear regression, this F-test is equivalent to the t-test, since the test statistic of the F-test is proportional to the square of the test statistic of the t-test. In nonlinear regression, this F-test is not equivalent to the t-test of the asymptotic Wald-type approach.
7 Nonlinear Regression: Half-Day 2 7 / 28
A more accurate t-test
Based on the previous result, we can construct a t-type test which is more accurate than the one introduced initially: take the square root of the F-test statistic and multiply it by the sign of θ*_k − θ̂_k:
T_k(θ*_k) := sign(θ*_k − θ̂_k) · sqrt(S̃_k(θ*_k) − S(θ̂)) / σ̂.
This test statistic is approximately t_{n−p} distributed. (In linear regression, this test statistic is equivalent to the usual t-test.)
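As an illustration, this statistic can be computed directly from its definition. The following Python sketch uses a hypothetical two-parameter model h(x; θ) = θ1·exp(−θ2·x) with made-up data (not the slides' examples); for fixed θ2, the conditional least-squares value of θ1 has a closed form, which gives S̃_2(θ2) cheaply:

```python
import math

x = [0.5 * i for i in range(1, 13)]
y = [2.0 * math.exp(-0.8 * xi) + 0.05 * (-1) ** i for i, xi in enumerate(x)]
n, p = len(x), 2

def S_profile(t2):
    """S~_2(t2): minimum of S over theta1 with theta2 = t2 fixed,
    using the closed form theta1 = sum(y*e) / sum(e*e), e_i = exp(-t2*x_i)."""
    e = [math.exp(-t2 * xi) for xi in x]
    t1 = sum(yi * ei for yi, ei in zip(y, e)) / sum(ei * ei for ei in e)
    return sum((yi - t1 * ei) ** 2 for yi, ei in zip(y, e))

# least-squares estimate of theta2 by a crude grid search over the profile
grid = [0.5 + 0.001 * i for i in range(1001)]
t2_hat = min(grid, key=S_profile)
S_hat = S_profile(t2_hat)
sigma = math.sqrt(S_hat / (n - p))

def profile_t(t2):
    """T_2(t2) = sign(t2 - theta2_hat) * sqrt(S~_2(t2) - S(theta_hat)) / sigma."""
    return math.copysign(math.sqrt(max(S_profile(t2) - S_hat, 0.0)), t2 - t2_hat) / sigma

print(t2_hat, profile_t(t2_hat - 0.1), profile_t(t2_hat + 0.1))
```

By construction the statistic is zero at θ̂2, negative below it, and positive above it.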
8 Nonlinear Regression: Half-Day 2 8 / 28
2.2 Profile t Plot and Profile Traces
Based on the test statistic just introduced, a graphical tool called the profile t plot can be designed for assessing the quality of the linear approximation: we plot the test statistic T_k(θ_k), the profile t function, as a function of θ_k.
In linear regression, the profile t function is a straight line. In nonlinear regression, the profile t function can be any monotone increasing function.
Profile t plot: plot T_k(θ_k) versus δ_k(θ_k) := (θ_k − θ̂_k) / se(θ̂_k).
The more curved the profile t function is, the stronger the nonlinearity in a neighbourhood of θ̂_k! Hence, the profile t plot shows how accurate the linear approximation underlying the standard test and standard confidence interval is. The neighbourhood important for statistics is given by |δ_k(θ_k)| ≤ 2.5. Why?
9 Nonlinear Regression: Half-Day 2 9 / 28
Example: Profile t Plots
[Figure] Profile t plot for θ1 for the Puromycin data (left) and the Biochemical Oxygen Demand data (right).
10 Nonlinear Regression: Half-Day 2 10 / 28
Example: Cellulose membrane (5) - Profile t plots
[Figure] Profile t plots for θ1, θ2, θ3 and θ4.
11 Nonlinear Regression: Half-Day 2 11 / 28
Example: Cellulose membrane (6)
Wald-type CI vs profile-type CI
R Output: Parameters: Value Std. Error t value (θ1, θ2, θ3, θ4); Residual standard error: .93 on 35 df
R Output: > confint(mem.fit) Waiting for profiling to be done... 2.5% 97.5% (θ1, θ2, θ3, θ4)
Approximate 95% confidence intervals:
Wald-type (θ̂k ± se(θ̂k) · q^t_{0.975}): θ1: [163.5, ], θ2: [159.6, 16.11], θ3: [1.9, 3.5], θ4: [-.65, -.37]
Profile-type: θ1: [163.7, ], θ2: [159.36, 16.1], θ3: [1.93, 3.6], θ4: [-.69, -.38]
12 Nonlinear Regression: Half-Day 2 12 / 28
Likelihood Profile Traces
Likelihood profile traces are another useful tool. The parameter θ_j (j ≠ k), estimated with θ_k held fixed, is evaluated as a function of θ_k; hence the notation θ̃_j^(k)(θ_k).
Remember: the minimum over {θ_h, h ≠ k} of S(θ_1, ..., θ_k, ..., θ_p) equals S(θ̃_1^(k), ..., θ̃_{k−1}^(k), θ_k, θ̃_{k+1}^(k), ..., θ̃_p^(k)), in short S̃_k(θ_k).
Plot the profile trace θ̃_j^(k) versus θ_k, overlaid by the profile trace θ̃_k^(j) versus θ_j, but reflected at the 45° line; that is, y-coordinate θ̃_j^(k) versus x-coordinate θ_k, overlaid by θ_j versus θ̃_k^(j).
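For a two-parameter model the two traces can be sketched the same way as the profile t function. This Python sketch again uses a hypothetical exponential model with made-up data (not the slides' examples): θ̃_1^(2)(θ2) has a closed form, while θ̃_2^(1)(θ1) is found by a crude grid search:

```python
import math

x = [0.5 * i for i in range(1, 13)]
y = [2.0 * math.exp(-0.8 * xi) + 0.05 * (-1) ** i for i, xi in enumerate(x)]

def t1_given_t2(t2):
    """Profile trace theta1~(2)(t2): conditional LS estimate of theta1 (closed form)."""
    e = [math.exp(-t2 * xi) for xi in x]
    return sum(yi * ei for yi, ei in zip(y, e)) / sum(ei * ei for ei in e)

def t2_given_t1(t1):
    """Profile trace theta2~(1)(t1): conditional LS estimate of theta2 (grid search)."""
    def S(t2):
        return sum((yi - t1 * math.exp(-t2 * xi)) ** 2 for xi, yi in zip(x, y))
    return min((0.4 + 0.001 * i for i in range(1201)), key=S)

# a few points of each trace; drawn in the same coordinate system (one of them
# reflected at the 45 degree line) they give the overlay described on the slide
trace_k2 = [(t2, t1_given_t2(t2)) for t2 in (0.6, 0.7, 0.8, 0.9, 1.0)]
trace_k1 = [(t2_given_t1(t1), t1) for t1 in (1.8, 1.9, 2.0, 2.1, 2.2)]
print(trace_k2)
print(trace_k1)
```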
13 Nonlinear Regression: Half-Day 2 13 / 28
Examples of Likelihood Profile Traces
[Figure] Likelihood profile traces for the example Puromycin (left) and the example Biochemical Oxygen Demand (right), complemented by the 80%- and 95%-confidence regions (gray curves).
14 Nonlinear Regression: Half-Day 2 14 / 28
Properties of Likelihood Profile Traces
With linear regression:
- The profile traces are two straight lines.
- The angle between these two lines represents the correlation between the estimated parameters corresponding to the lines: if the correlation between the parameters is 0, the lines are orthogonal to each other; if the correlation is either 1 or -1, the lines overlay.
With nonlinear regression:
- Both traces may be curved. The more heavily the traces deviate from a straight line, the more insufficient is the linear approximation and the inference based on it.
- The angle between the two traces at their intersection still represents the correlation between the two estimated parameters θ̂_j and θ̂_k.
15 Nonlinear Regression: Half-Day 2 15 / 28
Example Cellulose Membrane (7)
[Figure] Profile t plots and profile traces for θ1, θ2, θ3, θ4. Traces for the bottom left corner: red: θ̃_4^(1) versus θ1; green: θ4 versus θ̃_1^(4).
16 Nonlinear Regression: Half-Day 2 16 / 28
2.3 Parameter Transformations
In this section we study the effects of transforming the parameters. This topic is based on the fact that the mean regression function can usually be written in mathematically equivalent expressions. For example, the two expressions for the Michaelis-Menten function
θ1 x / (θ2 + x) = x / (ϑ1 + ϑ2 x)
are equivalent, with ϑ1 = θ2/θ1 and ϑ2 = 1/θ1. Or, we have the two equivalent expressions
θ1 e^{θ2 x} = ϑ1 ϑ2^x, hence ϑ1 = θ1 and ϑ2 = e^{θ2}.
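Both equivalences above are quick to verify numerically. A small Python check with freely chosen illustrative parameter values (not the slides' fits):

```python
import math

def mm_theta(x, t1, t2):
    """Michaelis-Menten in the (theta1, theta2) parametrization."""
    return t1 * x / (t2 + x)

def mm_vartheta(x, v1, v2):
    """The same function in the (vartheta1, vartheta2) parametrization."""
    return x / (v1 + v2 * x)

t1, t2 = 212.7, 0.064                    # freely chosen illustrative values
v1, v2 = t2 / t1, 1.0 / t1
assert all(abs(mm_theta(xi, t1, t2) - mm_vartheta(xi, v1, v2)) < 1e-9
           for xi in (0.02, 0.06, 0.22, 1.10))

# second pair: theta1 * e^(theta2 * x) == vartheta1 * vartheta2^x
t1e, t2e = 2.0, -0.8
v1e, v2e = t1e, math.exp(t2e)
assert all(abs(t1e * math.exp(t2e * xi) - v1e * v2e ** xi) < 1e-9
           for xi in (0.5, 1.0, 3.0))
print("both parametrizations agree")
```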
17 Nonlinear Regression: Half-Day 2 17 / 28
Motivation
The parameters of the regression function are transformed to
- get rid of collinearities,
- improve the convergence of the algorithm,
- improve the linear approximation (e.g., the Wald-type asymptotics), which results in nicer profile traces and hence in a better quality of the Wald-type confidence intervals.
Parameter transformations change neither the deterministic nor the stochastic part of the regression model, in contrast to variable transformations.
18 Nonlinear Regression: Half-Day 2 18 / 28
Constraints of the Parameter Domain
Subject-matter theory: the parameter domain is subject to constraints, e.g., θ1 > 0, a < θ2 < b.
What to do? Ignore the constraints and observe whether the algorithm converges, and where to. If it fails: most constraints are such that they can be imposed by a suitable transformation of the parameter concerned.
19 Nonlinear Regression: Half-Day 2 19 / 28
Examples of Constraints
θ > 0: Transformation θ → φ = log(θ), θ = exp(φ) > 0 for all φ; h(x; θ) → h(x; e^φ).
a < θ < b: Transformation θ → φ = log((b − θ)/(θ − a)), θ = a + (b − a)/(1 + exp(φ)).
Let h(x; θ) = θ1 e^{−θ2 x} + θ3 e^{−θ4 x} with θ2, θ4 > 0. The two pairs of parameters (θ1, θ2) and (θ3, θ4) are exchangeable and may thus cause convergence problems. Workaround: impose the constraint θ2 < θ4!
Transformation θ → φ with θ1 = φ1, θ2 = e^{φ2}, θ3 = φ3, and θ4 = e^{φ2}(1 + e^{φ4}):
h(x; (θ1, φ2, θ3, φ4)^T) = θ1 exp(−e^{φ2} x) + θ3 exp(−e^{φ2}(1 + e^{φ4}) x)
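The first two transformations are easy to check numerically. A small Python sketch (values chosen freely) confirming that φ is unconstrained, that θ stays in its domain, and that the maps round-trip:

```python
import math

def to_phi_pos(theta):
    """theta > 0  ->  unconstrained phi = log(theta)."""
    return math.log(theta)

def to_theta_pos(phi):
    """Inverse map: exp(phi) > 0 for every real phi."""
    return math.exp(phi)

def to_phi_interval(theta, a, b):
    """a < theta < b  ->  unconstrained phi = log((b - theta) / (theta - a))."""
    return math.log((b - theta) / (theta - a))

def to_theta_interval(phi, a, b):
    """Inverse map: always lies strictly between a and b."""
    return a + (b - a) / (1.0 + math.exp(phi))

for th in (0.3, 1.0, 7.5):                       # round trip on (0, infinity)
    assert abs(to_theta_pos(to_phi_pos(th)) - th) < 1e-12
a, b = 1.0, 3.0
for th in (1.1, 2.0, 2.9):                       # round trip on (a, b)
    assert abs(to_theta_interval(to_phi_interval(th, a, b), a, b) - th) < 1e-12
assert a < to_theta_interval(5.0, a, b) < b      # stays inside even for large phi
print("round trips ok")
```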
20 Nonlinear Regression: Half-Day 2 20 / 28
Collinearity
Example to show the problem: let h(x; θ) = θ1 e^{−θ2 x}  (*).
The partial derivatives (→ matrix A) are
∂h/∂θ1 (x; θ) = e^{−θ2 x}, hence a1 := (e^{−θ2 x1}, ..., e^{−θ2 xn})^T,
∂h/∂θ2 (x; θ) = −θ1 x e^{−θ2 x}, hence a2 := (−θ1 x1 e^{−θ2 x1}, ..., −θ1 xn e^{−θ2 xn})^T.
The vectors a1 and a2 incline to collinearity if all xi > 0. Reformulate (*):
h(x; θ) = θ1 exp(−θ2 ((x − x0) + x0)).
Applying the reparametrization φ1 := θ1 e^{−θ2 x0} and φ2 := θ2, we obtain
h(x; φ) = φ1 exp(−φ2 (x − x0)).
This function results in an (approximately) optimal matrix A if x0 = x̄ is chosen.
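The near-collinearity and its reduction by centering can be illustrated numerically. A Python sketch with hypothetical values (θ1 = 2, θ2 = 0.1, x = 1, ..., 10, none from the slides), using the cosine of the angle between the two gradient columns as the collinearity measure:

```python
import math

x = list(range(1, 11))          # hypothetical design points, all x_i > 0
t1, t2 = 2.0, 0.1               # hypothetical parameter values
x0 = sum(x) / len(x)            # centering point x0 = mean(x)

def cosine(u, v):
    """Cosine of the angle between two vectors (1 in absolute value = collinear)."""
    return (sum(ui * vi for ui, vi in zip(u, v))
            / math.sqrt(sum(ui * ui for ui in u) * sum(vi * vi for vi in v)))

# original parametrization: a1 = dh/dtheta1, a2 = dh/dtheta2
a1 = [math.exp(-t2 * xi) for xi in x]
a2 = [-t1 * xi * math.exp(-t2 * xi) for xi in x]

# reparametrized h(x; phi) = phi1 * exp(-phi2 * (x - x0)):
# b1 = dh/dphi1, b2 = dh/dphi2, with phi1 = theta1 * exp(-theta2 * x0)
p1 = t1 * math.exp(-t2 * x0)
b1 = [math.exp(-t2 * (xi - x0)) for xi in x]
b2 = [-p1 * (xi - x0) * math.exp(-t2 * (xi - x0)) for xi in x]

print(abs(cosine(a1, a2)), abs(cosine(b1, b2)))   # the second is clearly smaller
```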
21 Nonlinear Regression: Half-Day 2 21 / 28
Example Cellulose Membrane (7)
[Figure] Profile t plots and profile traces (slide from Half-Day 2).
θ3 and θ4 are highly correlated; the profile traces of θ2 and θ3, as well as of θ2 and θ4, are clearly twisted.
22 Nonlinear Regression: Half-Day 2 22 / 28
Example Cellulose Membrane (8)
Regression function:
h(x, θ) = (θ1 + θ2 · 10^{θ3 + θ4 ((xi − x̄) + x̄)}) / (1 + 10^{θ3 + θ4 ((xi − x̄) + x̄)})
Remove the collinearity by introducing θ̃3 := θ3 + θ4 x̄, where x̄ = median(xi):
h(x, θ) = (θ1 + θ2 · 10^{θ̃3 + θ4 (xi − x̄)}) / (1 + 10^{θ̃3 + θ4 (xi − x̄)})
Improve the linear approximation:
Step 1: introduce θ̃4 := 1/θ4:
h(x, θ) = (θ1 + θ2 · 10^{θ̃3 + (xi − x̄)/θ̃4}) / (1 + 10^{θ̃3 + (xi − x̄)/θ̃4})
Step 2: introduce θ̃1 := (θ1 + θ2 · 10^{θ̃3}) / (10^{θ̃3} + 1) and θ̃2 := log10((θ1 − θ̃1)/(θ̃1 − θ2)), giving
h(x, θ̃) = θ2 + (θ̃1 − θ2) · (1 + 10^{θ̃2}) / (1 + 10^{θ̃2 + (xi − x̄)/θ̃4})
23 Nonlinear Regression: Half-Day 2 23 / 28
Example Cellulose Membrane (9)
[Figure] Profile t functions and profile traces after the reparametrization.
24 Nonlinear Regression: Half-Day 2 24 / 28
Example Cellulose Membrane (10)
Original parametrization:
Parameters: Value Std. Error t value (θ1, θ2, θ3, θ4)
Residual standard error: on 35 df
Correlation of Parameter Estimates: θ1 θ2 θ3 θ4 -.56
Reparametrized:
Parameters: Value Std. Error t value (θ̃1, θ̃2, θ̃3, θ̃4)
Residual standard error: .931 on 35 df
Correlation of Parameter Estimates: θ̃1 θ̃2 θ̃3 θ̃4
25 Nonlinear Regression: Half-Day 2 25 / 28
Successful Reparametrization
A successful reparametrization depends both on the regression function and on the data set. There are no general guidelines, which results in a tedious search for successful reparametrizations.
Another example:
h(x, θ) = θ1 θ3 (x^(2) − x^(3)) / (1 + θ2 x^(1) + θ3 x^(2) + θ4 x^(3))   (*)
= (x^(2) − x^(3)) / (1/(θ1 θ3) + (θ2/(θ1 θ3)) x^(1) + (θ3/(θ1 θ3)) x^(2) + (θ4/(θ1 θ3)) x^(3))
= (x^(2) − x^(3)) / (φ1 + φ2 x^(1) + φ3 x^(2) + φ4 x^(3))   (**)
The parametrization (**) is preferred to (*) in most cases (cf. exercises).
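A quick numerical check, with freely chosen parameter values, that (*) and (**) describe the same function:

```python
# Check that dividing numerator and denominator of (*) by theta1*theta3 yields
# (**) with phi = (1, theta2, theta3, theta4) / (theta1*theta3).
# Parameter and covariate values below are arbitrary illustrations.
def h_star(x1, x2, x3, t1, t2, t3, t4):
    return t1 * t3 * (x2 - x3) / (1 + t2 * x1 + t3 * x2 + t4 * x3)

def h_2star(x1, x2, x3, p1, p2, p3, p4):
    return (x2 - x3) / (p1 + p2 * x1 + p3 * x2 + p4 * x3)

t1, t2, t3, t4 = 1.7, 0.5, 2.3, 0.9
c = t1 * t3
phis = (1 / c, t2 / c, t3 / c, t4 / c)
for xs in ((0.1, 0.4, 0.2), (1.0, 0.3, 0.8), (2.0, 1.5, 0.5)):
    assert abs(h_star(*xs, t1, t2, t3, t4) - h_2star(*xs, *phis)) < 1e-12
print("(*) and (**) agree")
```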
26 Nonlinear Regression: Half-Day 2 26 / 28
Interpretation?
In most cases the original parameters have a physical interpretation, so the parameters must be back-transformed.
Standard approach for back-transformation. Example: used parameter transformation θ → φ = ln(θ).
Let φ̂ and σ̂_φ̂ be the estimated parameter and its standard error. Estimate θ by θ̂ = exp(φ̂). Its standard error is commonly obtained by the Gauss law of error propagation (cf. Stahel, Sec 6.1):
σ̂²_θ̂ ≈ (∂ exp(φ)/∂φ |_{φ=φ̂})² · σ̂²_φ̂ = (exp(φ̂))² · σ̂²_φ̂, hence σ̂_θ̂ ≈ exp(φ̂) · σ̂_φ̂.
Hence, an approximate 95% confidence interval for θ is
g(φ̂) ± σ̂_θ̂ · q^{t_{n−p}}_{0.975} = exp(φ̂) · (1 ± σ̂_φ̂ · q^{t_{n−p}}_{0.975}).
But this approach is not recommended, because... see next slide.
27 Nonlinear Regression: Half-Day 2 27 / 28
Why Parameter Transformation?
- So that the parameter falls within a predefined domain: confidence intervals obtained by the Gauss law of error propagation (previous slide) may violate this requirement!
- Due to the insufficient quality of the confidence interval: the Gauss law of error propagation nullifies the achievements of the reparametrization, since it uses the same linear approximation as the Wald-type asymptotics!
Alternatives to the standard approach:
Back-transformation of the complete confidence interval. Example: {θ : ln(θ) ∈ φ̂ ± σ̂_φ̂ · q^{t_df}_{0.975}} forms a better, but still approximate, 95% confidence interval for θ. It is identical to
[exp(φ̂ − σ̂_φ̂ · q^{t_df}_{0.975}), exp(φ̂ + σ̂_φ̂ · q^{t_df}_{0.975})],
since ln/exp is strictly increasing.
In the second case, the most convenient approach is to form the confidence interval based on the profile t function.
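The difference between the two routes can be illustrated with made-up numbers; φ̂, its standard error, and the t quantile below are assumptions, not values from the slides:

```python
import math

# assumed inputs (illustrative only): estimate and standard error on the
# phi = ln(theta) scale, and a t quantile supplied externally
phi_hat, se_phi, q = -1.2, 0.4, 2.03

theta_hat = math.exp(phi_hat)
se_theta = theta_hat * se_phi                      # Gauss error propagation
ci_prop = (theta_hat - q * se_theta, theta_hat + q * se_theta)

# recommended route: back-transform the interval for phi endpoint by endpoint
ci_back = (math.exp(phi_hat - q * se_phi), math.exp(phi_hat + q * se_phi))

print(ci_prop)   # symmetric around theta_hat (lower bound may go negative for larger se)
print(ci_back)   # asymmetric, but always respects theta > 0
```

The back-transformed interval is not symmetric around θ̂ but is always contained in the admissible domain θ > 0.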
28 Nonlinear Regression: Half-Day 2 28 / 28
Take Home Message Half-Day 2
- The commonly used confidence intervals are based on a (crude) linear approximation.
- Use graphical tools like profile t plots and profile traces to assess the quality of the approximate confidence intervals (and hence of the linear approximation).
- If it is insufficient: more accurate confidence intervals can be calculated for single parameters θ_k by using profile t functions (as implemented in confint() anyway).
- The convergence properties of the estimating algorithm and the quality of the Wald-type confidence intervals can be improved by applying suitable reparametrizations (parameter transformations).
- If the interpretation of the original parameters is crucial, then the confidence interval should also be back-transformed, and not be determined by the Gauss law of error propagation.
More informationWeek 5: Multiple Linear Regression
BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationMultinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
More informationChapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
More informationHow Far is too Far? Statistical Outlier Detection
How Far is too Far? Statistical Outlier Detection Steven Walfish President, Statistical Outsourcing Services steven@statisticaloutsourcingservices.com 30-325-329 Outline What is an Outlier, and Why are
More informationPearson s Correlation
Pearson s Correlation Correlation the degree to which two variables are associated (co-vary). Covariance may be either positive or negative. Its magnitude depends on the units of measurement. Assumes the
More informationNon-Parametric Tests (I)
Lecture 5: Non-Parametric Tests (I) KimHuat LIM lim@stats.ox.ac.uk http://www.stats.ox.ac.uk/~lim/teaching.html Slide 1 5.1 Outline (i) Overview of Distribution-Free Tests (ii) Median Test for Two Independent
More informationProjects Involving Statistics (& SPSS)
Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,
More informationVISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA
VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density
More informationChapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,
More informationStatistics courses often teach the two-sample t-test, linear regression, and analysis of variance
2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
More informationMultiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.
More information3. Regression & Exponential Smoothing
3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a
More informationIntroduction to Logistic Regression
OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationGenerating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010
Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte
More informationSTAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
More informationIndependent t- Test (Comparing Two Means)
Independent t- Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent t-test when to use the independent t-test the use of SPSS to complete an independent
More informationOutline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test
The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation
More informationPlease follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
More informationReflection and Refraction
Equipment Reflection and Refraction Acrylic block set, plane-concave-convex universal mirror, cork board, cork board stand, pins, flashlight, protractor, ruler, mirror worksheet, rectangular block worksheet,
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationLecture Notes Module 1
Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific
More informationLecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods
Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More information