There is a discrepancy in R output from the functions step, AIC, and BIC over how to compute the AIC. The discrepancy is not very important, because it involves a difference of a constant factor that cancels when using AIC or BIC to compare two models. But it might be helpful to understand the differences so that you can compare output from these two functions.

AIC and BIC are based on the maximum likelihood estimates of the model parameters. In maximum likelihood, the idea is to estimate parameters so that, under the model, the probability of the observed data would be as large as possible. The likelihood is this probability, and will always be between 0 and 1. It is common to consider likelihoods on a log scale. Logarithms of numbers between 0 and 1 are negative, so log-likelihoods are negative numbers. It is also common to multiply log-likelihoods by −2, for reasons we will not explore.

In a regression setting, the estimates of the β_i based on least squares and the maximum likelihood estimates are identical. The difference comes from estimating the common variance σ² of the normal distribution for the errors around the true means. We have been using the best unbiased estimator of σ², namely σ̂² = RSS/(n − p), where there are p parameters for the means (p different β_i parameters) and RSS is the residual sum of squares. This estimate does not tend to be too large or too small on average. The maximum likelihood estimate, on the other hand, is RSS/n. This estimate has a slight negative bias, but also has a smaller variance. Putting all of this together, we can write −2 times the log-likelihood in a regression setting as

    n + n log(2π) + n log(RSS/n)

Now, AIC is defined to be −2 times the log-likelihood plus 2 times the number of parameters. If there are p different β_i parameters, there are a total of p + 1 parameters if we also count σ². The correct formula for the AIC for a model with parameters β_0, …, β_{p−1} and σ² is

    AIC = n + n log(2π) + n log(RSS/n) + 2(p + 1)

and the correct formula for BIC is

    BIC = n + n log(2π) + n log(RSS/n) + (log n)(p + 1)

This is what the functions AIC and BIC calculate in R. The AIC and BIC formulas in your textbook ignore the leading two terms n + n log(2π) and use p instead of p + 1. When comparing AIC or BIC between two models, however, it makes no difference which formula you use, because the differences will be the same regardless of which choice you make.
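As a quick check of this formula on any regression fit, you can compare −2 times the value returned by logLik against n + n log(2π) + n log(RSS/n) computed by hand. The following sketch uses the built-in stackloss data purely for illustration; the last two lines should print the same number.

> fit <- lm(stack.loss ~ Air.Flow + Water.Temp, data = stackloss)
> n <- nrow(stackloss)
> rss <- sum(residuals(fit)^2)
> -2 * as.numeric(logLik(fit))          # -2 times the log-likelihood from R
> n + n * log(2 * pi) + n * log(rss/n)  # the formula above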
Example Computation in R

AIC is part of the base package. You can find the BIC using the AIC function with the option k = log(n), or you can load the nonlinear mixed effects library (nlme) and call the BIC function directly. Here is an example that demonstrates the above ideas.

> case1201 = read.table("sleuth/case1201.csv", header = T, sep = ",")
> attach(case1201)
> keep <- STATE != "Alaska"
> x <- data.frame(sat = SAT[keep], ltakers = log(TAKERS[keep]),
+     income = INCOME[keep], years = YEARS[keep], public = PUBLIC[keep],
+     expend = EXPEND[keep], rank = RANK[keep])
> detach(case1201)
> attach(x)
> library(nlme)
Loading required package: nls
Loading required package: lattice
Loading required package: grid

> n <- nrow(x)
> fit0 <- lm(sat ~ 1)
> summary(fit0)

Call:
lm(formula = sat ~ 1)

Residuals:
   Min     1Q Median     3Q    Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 71.5 on 48 degrees of freedom

> rss0 <- sum(residuals(fit0)^2)
> n + n * log(2 * pi) + n * log(rss0/n) + 2 * 2
[1]
> AIC(fit0)
[1]
> n + n * log(2 * pi) + n * log(rss0/n) + log(n) * 2
[1]
> AIC(fit0, k = log(n))
[1]
> BIC(fit0)
[1]
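A related point worth knowing: the step function computes its criterion with extractAIC, which for a linear model reports n log(RSS/n) + 2p, dropping the constant n + n log(2π) and not counting σ² as a parameter. The values printed by step therefore differ from AIC() by a constant that depends only on n. A small sketch, using fit0 and n from above; the last two lines should agree:

> extractAIC(fit0)                  # edf and AIC without the constant terms
> AIC(fit0) - extractAIC(fit0)[2]   # the constant separating the two scales
> n + n * log(2 * pi) + 2           # the same constant computed directly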
> fit1 <- lm(sat ~ ltakers)
> summary(fit1)

Call:
lm(formula = sat ~ ltakers)

Residuals:
   Min     1Q Median     3Q    Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
ltakers                                   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 47 degrees of freedom
Multiple R-Squared: 0.811, Adjusted R-squared:
F-statistic: on 1 and 47 DF, p-value: < 2.2e-16

> rss1 <- sum(residuals(fit1)^2)
> n + n * log(2 * pi) + n * log(rss1/n) + 2 * 3
[1]
> AIC(fit1)
[1]
> n + n * log(2 * pi) + n * log(rss1/n) + log(n) * 3
[1]
> AIC(fit1, k = log(n))
[1]
> BIC(fit1)
[1]
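Because the two conventions differ only by that constant, differences between models are identical under either one. A brief sketch with fit0 and fit1 from above; both lines should print the same number:

> AIC(fit1) - AIC(fit0)
> extractAIC(fit1)[2] - extractAIC(fit0)[2]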
The criteria AIC and BIC will not always lead to the same model. Compare the results from these forward selections.

Forward Selection using AIC

> step(lm(sat ~ 1), SAT ~ ltakers + income + years + public + expend +
+     rank, direction = "forward")

Start: AIC=
sat ~ 1

+ ltakers
+ rank
+ income
+ years
<none>
+ public
+ expend

Step: AIC=
sat ~ ltakers

+ expend
+ years
<none>
+ rank
+ income
+ public

Step: AIC=
sat ~ ltakers + expend

+ years
<none>
+ rank
+ income
+ public

Step: AIC=
sat ~ ltakers + expend + years

+ rank
<none>
+ income
+ public

Step: AIC=
sat ~ ltakers + expend + years + rank

<none>
+ public
+ income

Call:
lm(formula = sat ~ ltakers + expend + years + rank)
Coefficients:
(Intercept)      ltakers       expend        years         rank

Forward Selection using BIC

> n <- nrow(x)
> step(lm(sat ~ 1), SAT ~ ltakers + income + years + public + expend +
+     rank, direction = "forward", k = log(n))

Start: AIC=
sat ~ 1

+ ltakers
+ rank
+ income
+ years
<none>
+ public
+ expend

Step: AIC=
sat ~ ltakers

+ expend
+ years
<none>
+ rank
+ income
+ public

Step: AIC=
sat ~ ltakers + expend

<none>
+ years
+ rank
+ income
+ public

Call:
lm(formula = sat ~ ltakers + expend)

Coefficients:
(Intercept)      ltakers       expend
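Before walking through where the two selections diverge, it can help to score the two final models directly. A short sketch; the names fitA and fitB are introduced here only for illustration:

> fitA <- lm(sat ~ ltakers + expend + years + rank)  # the forward-AIC choice
> fitB <- lm(sat ~ ltakers + expend)                 # the forward-BIC choice
> AIC(fitA) - AIC(fitB)   # negative: AIC prefers the larger model
> BIC(fitA) - BIC(fitB)   # a positive value means BIC prefers the smaller model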
Comparison

Notice that both approaches begin by first adding ltakers and then adding expend. But at this point, the AIC and BIC criteria lead to different decisions. The best new variable to add by either criterion is years. The change in the log-likelihood term is this.

> fit2 <- lm(sat ~ ltakers + expend)
> rss2 <- sum(residuals(fit2)^2)
> fit3 <- lm(sat ~ ltakers + expend + years)
> rss3 <- sum(residuals(fit3)^2)
> n * log(rss3/n) - n * log(rss2/n)
[1]

The size of this difference is larger than 2, but smaller than log(n) = 3.89, so BIC is not willing to pay the penalty, but AIC is.

Discussion

Here is a summary of some key ideas.

1. The models selected by forward selection, backward elimination, and stepwise regression might not be the same, even using the same model selection criterion.

2. In a forward selection or a backward elimination procedure, BIC may result in fewer parameters in the model than AIC.

3. The forward selection, backward elimination, and stepwise regression procedures are not guaranteed to find the best model according to the AIC or BIC criterion.

4. P-values in the resulting models should be treated with more than usual caution, because they do not reflect the model selection process.

5. Generally, there may be several models that are highly similar in the quality of the fit.

A rich model with all of the variables can be made richer by considering interactions. You can try this example yourself and learn that the backward elimination procedure beginning with all main effects and two-way interactions produces a model with a very different set of variables than the analysis that assumes no interactions.

> step(lm(sat ~ (ltakers + income + years + public + expend + rank)^2),
+     direction = "backward")
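Finally, to spell out the decision rule behind the comparison above: a variable is added when the drop in n log(RSS/n) exceeds the per-parameter penalty, 2 for AIC and log n for BIC. A sketch reusing n, rss2, and rss3 from the comparison section:

> delta <- n * log(rss2/n) - n * log(rss3/n)  # improvement from adding years
> delta > 2       # TRUE: large enough for AIC to pay the penalty
> delta > log(n)  # FALSE: not large enough for BIC, with log(49) = 3.89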