Data Analysis II (Adatelemzés II.) [SST35]


1 Data Analysis II [SST35]. Time Series 0. Lőw András, October 26.
(Slide footer: Lőw A., Data Analysis II [SST35], October, 24 slides in total.)

2 Outline
1. How does summary know what it is supposed to do?
2. lubridate: dates and times made easy
3. Object orientation in R
4. A small example in S4

3 Built-in data: Orange
This is how the trunk circumference of the five orange trees developed:
    > summary(Orange)
     Tree       age          circumference
     3:7   Min.   :…        Min.   :…
     1:7   1st Qu.:…        1st Qu.:…
     5:7   Median :…        Median :…
     2:7   Mean   :…        Mean   :…
     4:7   3rd Qu.:…        3rd Qu.:161.5
           Max.   :…        Max.   :214.0
    > str(Orange)
    Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 35 obs. of 3 variables:
     $ Tree         : Ord.factor w/ 5 levels "3"<"1"<"5"<"2"<..: …
     $ age          : num …
     $ circumference: num …
     - attr(*, "formula")=Class 'formula' length 3 circumference ~ age | Tree
      .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
     - attr(*, "labels")=List of 2
      ..$ x: chr "Time since December 31, 1968"
      ..$ y: chr "Trunk circumference"
     - attr(*, "units")=List of 2
      ..$ x: chr "(days)"
      ..$ y: chr "(mm)"
Source: Draper, N. R. and Smith, H. (1998), Applied Regression Analysis (3rd ed.), Wiley (exercise 24.N). Pinheiro, J. C. and Bates, D. M. (2000), Mixed-effects Models in S and S-PLUS, Springer.

4 This is how the orange trees grew:
[Figure: Orange dataset by tree; x-axis: age [days], y-axis: circumference [mm]]

5 The circumference of the 5 trees, measured on 7 occasions:
[Figure: Orange dataset; panels over age, circumference and Tree, with count axes]

6 Built-in data: Loblolly
And this one is about the heights of some pine trees:
    > summary(Loblolly)
         height            age            Seed
     Min.   : 3.46   Min.   :…       329    : 6
     1st Qu.:…       1st Qu.:…       327    : 6
     Median :34.00   Median :…       325    : 6
     Mean   :32.36   Mean   :…       …      : 6
     3rd Qu.:…       3rd Qu.:…       …      : 6
     Max.   :64.10   Max.   :…       …      : 6
                                     (Other):48
    > str(Loblolly)
    Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 84 obs. of 3 variables:
     $ height: num …
     $ age   : num …
     $ Seed  : Ord.factor w/ 14 levels "329"<"327"<"325"<..: …
     - attr(*, "formula")=Class 'formula' length 3 height ~ age | Seed
      .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
     - attr(*, "labels")=List of 2
      ..$ x: chr "Age of tree"
      ..$ y: chr "Height of tree"
     - attr(*, "units")=List of 2
      ..$ x: chr "(yr)"
      ..$ y: chr "(ft)"
Source: Kung, F. H. (1986), Fitting logistic growth curve with predetermined carrying capacity, in Proceedings of the Statistical Computing Section, American Statistical Association. Pinheiro, J. C. and Bates, D. M. (2000), Mixed-effects Models in S and S-PLUS, Springer.

7 This is how the pine trees grew:
[Figure: Loblolly dataset by Seed; x-axis: age [years], y-axis: height [ft]]

8 More summary: linear model (1)
    > summary(swiss)
       Fertility      Agriculture     Examination      Education        Catholic
     Min.   :35.00   Min.   : 1.20   Min.   : 3.00   Min.   : 1.00   Min.   :…
     1st Qu.:…       1st Qu.:…       1st Qu.:…       1st Qu.:…       1st Qu.:…
     Median :70.40   Median :54.10   Median :16.00   Median : 8.00   Median :…
     Mean   :70.14   Mean   :50.66   Mean   :16.49   Mean   :10.98   Mean   :…
     3rd Qu.:…       3rd Qu.:…       3rd Qu.:…       3rd Qu.:…       3rd Qu.:…
     Max.   :92.50   Max.   :89.70   Max.   :37.00   Max.   :53.00   Max.   :…
     Infant.Mortality
     Min.   :…
     1st Qu.:18.15
     Median :20.00
     Mean   :…
     3rd Qu.:21.70
     Max.   :26.60
    > summary(lm(Fertility ~ ., data = swiss))
    Call:
    lm(formula = Fertility ~ ., data = swiss)
    Residuals:
     Min  1Q  Median  3Q  Max
     …    …   …       …   …
    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
    (Intercept)      …        …          …       …e-07 ***
    Agriculture      …        …          …       …     *
    Examination      …        …          …       …
    Education        …        …          …       …e-05 ***
    Catholic         …        …          …       …     **
    Infant.Mortality …        …          …       …     **
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

9 More summary: generalized linear model (2)
      Treat       Prewt           Postwt
     CBT :29   Min.   :70.00   Min.   :…
     Cont:26   1st Qu.:…       1st Qu.:…
     FT  :17   Median :82.30   Median :…
               Mean   :82.41   Mean   :…
               3rd Qu.:…       3rd Qu.:…
               Max.   :94.90   Max.   :…
    > summary(glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia))
    Call:
    glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia)
    Deviance Residuals:
     Min  1Q  Median  3Q  Max
     …    …   …       …   …
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) …        …          …       …     ***
    Prewt       …        …          …       …     ***
    TreatCont   …        …          …       …     *
    TreatFT     …        …          …       …     *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Dispersion parameter for gaussian family taken to be …)
    Null deviance: … on 71 degrees of freedom
    Residual deviance: … on 68 degrees of freedom
    AIC: …
    Number of Fisher Scoring iterations: 2

10 More summary: table (3)
    > apply(UCBAdmissions, c(1, 2), sum)
              Gender
    Admit      Male Female
      Admitted …    …
      Rejected …    …
    > UCBAdmissions
    , , Dept = A
              Gender
    Admit      Male Female
      Admitted …    …
      Rejected …    …
    , , Dept = B
              Gender
    Admit      Male Female
      Admitted …    …
      Rejected 207  8
    , , Dept = C
              Gender
    Admit      Male Female
      Admitted …    …
      Rejected …    …
    , , Dept = D
              Gender
    Admit      Male Female
      Admitted …    …
      Rejected …    …
    > summary(UCBAdmissions)
    Number of cases in table: 4526
    Number of factors: 3
    Test for independence of all factors:
     Chisq = …, df = 16, p-value = 0
    > apply(UCBAdmissions, 3, function(u) summary(as.table(u)))
    $A
    Number of cases in table: 933
    Number of factors: 2
    Test for independence of all factors:
     Chisq = …, df = 1, p-value = 3.28e-05
    $B
    Number of cases in table: 585
    Number of factors: 2
    Test for independence of all factors:
     Chisq = …, df = 1, p-value = …
    $C
    Number of cases in table: 918
    Number of factors: 2
    Test for independence of all factors:
     Chisq = …, df = 1, p-value = …
    $D
    Number of cases in table: 792
    Number of factors: 2
    Test for independence of all factors:
     Chisq = …, df = 1, p-value = …
    $E
    Number of cases in table: 584
    Number of factors: 2
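The slides above call summary on a grouped data frame, a fitted lm, a fitted glm and a table, and each call prints something different. The following is a minimal sketch of the mechanism behind that, S3 dispatch on the class attribute. The `diary` class and the `summary.diary` method are hypothetical names invented for this illustration; they do not appear in the slides.

```r
# summary() is an S3 generic: it calls UseMethod("summary"), which selects
# a method according to the class attribute of its first argument.
class(Orange)          # a class vector whose last entry is "data.frame"
class(UCBAdmissions)   # "table"
fit <- lm(Fertility ~ ., data = swiss)
class(fit)             # "lm"

# A toy class (hypothetical) with its own summary method.
summary.diary <- function(object, ...) {
  cat("diary with", length(object$weights), "entries\n")
  invisible(length(object$weights))
}

d <- structure(list(weights = c(170, 169, 171)), class = "diary")
n <- summary(d)        # dispatches to summary.diary
```

Calling `methods(summary)` in an interactive session lists the methods that are currently visible, which is a quick way to see which classes summary already knows about.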

11 Outline, part 2: lubridate, dates and times made easy

12 What is the lubridate package good for?
Base R:
    > (r.date <- as.POSIXct('…', format = '%d-%m-%Y', tz = 'UTC'))
    [1] "… UTC"
    > as.numeric(format(r.date, '%m'))
    [1] 1
    > r.date <- as.POSIXct(format(r.date, '%Y-2-%d'), tz = 'UTC')
    > r.date
    [1] "… UTC"
    > seq(r.date, length = 2, by = '-1 day')[2]
    [1] "… UTC"
    > as.POSIXct(format(as.POSIXct(r.date), tz = 'UTC'), tz = 'GMT')
    [1] "… GMT"
The lubridate package:
    > (l.date <- dmy('…'))
    [1] "… UTC"
    > month(l.date)
    [1] 1
    > month(l.date) <- 2
    > l.date
    [1] "… UTC"
    > l.date - days(1)
    [1] "… UTC"
    > with_tz(l.date, 'GMT')
    [1] "… GMT"
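For readers without lubridate installed, the base R side of the comparison can be tried with the sketch below. The date "26-10-2011" is an assumption made for this example, since the date used on the slide did not survive. The sketch also edits the month through POSIXlt instead of re-parsing a formatted string, which is a different technique from the one on the slide.

```r
# Hypothetical date; the value used on the slide is not known.
d <- as.POSIXct("26-10-2011", format = "%d-%m-%Y", tz = "UTC")

# Month as a number.
m <- as.numeric(format(d, "%m"))   # 10

# Move to February of the same year by editing the broken-down time.
lt <- as.POSIXlt(d)
lt$mon <- 1                        # months are 0-based in POSIXlt
feb <- as.POSIXct(lt)

# One day earlier: POSIXct counts seconds, so subtract 86400.
prev <- feb - 86400

# Same instant, shown in another time zone (needs the system tz database).
format(prev, tz = "Europe/Budapest", usetz = TRUE)
```

Each step takes more ceremony than `month(x) <- 2`, `x - days(1)` or `with_tz(x, tz)`, which is the point the slide makes.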

13 Outline, part 3: object orientation in R

14 A short history
The first solution appeared around 1990 in version 3 of S, hence the name S3. https://github.com/hadley/devtools/wiki/s3
The next version introduced methods that dispatch on multiple arguments, improved inheritance, and became more abstract: S4. https://github.com/hadley/devtools/wiki/s4
Last year R moved beyond this as well: R5. https://github.com/hadley/devtools/wiki/r5

15 Outline, part 4: a small example in S4

16 Weight-loss diary (1)
First, let us build a home-made time series. It needs a start time, an end time, and data (of course).
    > setClass('TimeSeries',
        representation(data = 'numeric', start = 'POSIXct', end = 'POSIXct'))
    [1] "TimeSeries"
    > my.timeseries <- new('TimeSeries',
        data = c(1, 2, 3, 4, 5, 6),
        start = as.POSIXct('10/01/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
        end = as.POSIXct('10/01/2011 0:05:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'))
    > my.timeseries
    An object of class "TimeSeries"
    Slot "data":
    [1] 1 2 3 4 5 6
    Slot "start":
    [1] "… UTC"
    Slot "end":

17 Weight-loss diary (2)
When is a time series valid? It has a start time, it has an end time, and the start is earlier than the end (of course).
    > setValidity('TimeSeries', function(object) {
        object@start <= object@end &&
        length(object@start) == 1 &&
        length(object@end) == 1
      })
    Class "TimeSeries" [in ".GlobalEnv"]
    Slots:
    Name:     data   start     end
    Class: numeric POSIXct POSIXct
    > validObject(my.timeseries)
    [1] TRUE

18 Weight-loss diary (3)
From now on the validity check runs automatically whenever a new object is created. We cannot be grateful enough for that!
    > good.timeseries <- new('TimeSeries',
        data = c(7, 8, 9, 10, 11, 12),
        start = as.POSIXct('10/01/2011 0:06:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
        end = as.POSIXct('10/01/2011 0:11:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'))
    > bad.timeseries <- new('TimeSeries',
        data = c(7, 8, 9, 10, 11, 12),
        start = as.POSIXct('10/01/2011 0:06:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
        end = as.POSIXct('10/01/2008 0:11:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'))
    Error in validObject(.Object) : invalid class "TimeSeries" object: FALSE
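The error message ends in "FALSE" because the validity function returns a bare logical value. A validity function may instead return a character vector describing what is wrong, and that text then appears in the error. The sketch below shows this on a hypothetical `Interval` class with numeric slots; the class name and the messages are assumptions made for illustration and are not part of the slides.

```r
library(methods)

# 'Interval' is a hypothetical class used only for this sketch.
setClass("Interval", representation(start = "numeric", end = "numeric"))

# Return TRUE when valid, otherwise a character vector of problems.
setValidity("Interval", function(object) {
  msgs <- character(0)
  if (length(object@start) != 1) msgs <- c(msgs, "start must have length 1")
  if (length(object@end) != 1)   msgs <- c(msgs, "end must have length 1")
  if (length(msgs) == 0 && object@start > object@end)
    msgs <- c(msgs, "start must not be later than end")
  if (length(msgs) == 0) TRUE else msgs
})

ok <- new("Interval", start = 1, end = 5)

# Capture the error text instead of stopping the script.
err <- tryCatch(new("Interval", start = 5, end = 1),
                error = function(e) conditionMessage(e))
```

Here `err` holds the full message, including the descriptive text, so a failed construction explains itself.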

19 Weight-loss diary (4)
How long a period does our time series span? What are the data? Our first functions (methods) for the new class.
    > period.timeseries <- function(object) {
        if (length(object@data) > 1) {
          (object@end - object@start) / (length(object@data) - 1)
        } else {
          Inf
        }
      }
    > series <- function(object) object@data
    > setGeneric('series')
    [1] "series"
    > series(my.timeseries)
    [1] 1 2 3 4 5 6
    > showMethods('series')
    Function: series (package .GlobalEnv)
    object="ANY"
    object="TimeSeries"
        (inherited from: object="ANY")

20 Weight-loss diary (5)
Let us bind period.timeseries to the TimeSeries class!
    > period <- function(object) object@period
    > setGeneric('period')
    [1] "period"
    > setMethod(period, signature = 'TimeSeries', definition = period.timeseries)
    [1] "period"
    attr(,"package")
    [1] ".GlobalEnv"
    > showMethods('period')
    Function: period (package .GlobalEnv)
    object="ANY"
    object="TimeSeries"
    > period(my.timeseries)
    [1] …:01:00
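On the slide, setGeneric turns an existing function into the generic, and that function stays on as the default method for object="ANY". A common alternative is to declare the generic explicitly with standardGeneric, so that there is no default and calls on unsupported classes fail. The sketch below illustrates that variant; the class `Sampled` and the generic `spacing` are hypothetical names, and numeric start and end values stand in for POSIXct to keep the arithmetic easy to check.

```r
library(methods)

# Hypothetical class, named so that it does not clash with the slides.
setClass("Sampled",
         representation(data = "numeric", start = "numeric", end = "numeric"))

# Explicit generic: no default method, dispatch happens in standardGeneric().
setGeneric("spacing", function(object) standardGeneric("spacing"))

# Same formula as period.timeseries: (end - start) / (n - 1), or Inf for n <= 1.
setMethod("spacing", signature = "Sampled", definition = function(object) {
  n <- length(object@data)
  if (n > 1) (object@end - object@start) / (n - 1) else Inf
})

s <- new("Sampled", data = c(1, 2, 3, 4, 5, 6), start = 0, end = 300)
spacing(s)                           # 300 / 5 = 60
existsMethod("spacing", "Sampled")   # TRUE
```

With this form, `showMethods('spacing')` lists only the `Sampled` method, and `spacing(1)` stops with an error about a missing inherited method rather than falling back to a default.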

21 Weight-loss diary (6)
summary and subsetting with [ cannot be left out!
    > setMethod('summary', signature = 'TimeSeries', definition = function(object) {
        print(paste(object@start, ' to ', object@end, sep = '', collapse = ''))
        print(paste(object@data, sep = '', collapse = ', '))
      })
    [1] "summary"
    > summary(my.timeseries)
    [1] "… to … …:05:00"
    [1] "1, 2, 3, 4, 5, 6"
    > #
    > setMethod('[', signature = 'TimeSeries', definition = function(x, i, j, ..., drop) {
        x@data[i]
      })
    [1] "["
    > my.timeseries[3]
    [1] 3

22 Weight-loss diary (7)
So far we have not even mentioned the diary!
    > setClass('WeightHistory',
        representation(height = 'numeric', name = 'character'),
        contains = 'TimeSeries')
    [1] "WeightHistory"
    > john.doe <- new('WeightHistory',
        data = c(170, 169, 171, 168, 170, 169),
        start = as.POSIXct('08/14/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
        end = as.POSIXct('09/28/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
        height = 72, name = 'John Doe')
    > john.doe
    An object of class "WeightHistory"
    Slot "height":
    [1] 72
    Slot "name":
    [1] "John Doe"
    Slot "data":
    [1] 170 169 171 168 170 169
    Slot "start":
    [1] "… UTC"
    Slot "end":
    [1] "… UTC"

23 Weight-loss diary (8)
Further inheritance:
    > setClass('Person', representation(height = 'numeric', name = 'character'))
    [1] "Person"
    > #
    > setClass('AltWeightHistory', contains = c('TimeSeries', 'Person'))
    [1] "AltWeightHistory"
    > #
    > setClass('Cat', representation(breed = 'character', name = 'character'))
    [1] "Cat"
    > #
    > setClassUnion('NamedThing', c('Person', 'Cat'))
    [1] "NamedThing"

24 Weight-loss diary (9)
Everything together:
    > jane.doe <- new('AltWeightHistory',
        data = c(130, 129, 131, 128, 130, 129),
        start = as.POSIXct('08/14/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
        end = as.POSIXct('09/28/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
        height = 67, name = 'Jane Doe')
    > jane.doe
    An object of class "AltWeightHistory"
    Slot "data":
    [1] 130 129 131 128 130 129
    Slot "start":
    [1] "… UTC"
    Slot "end":
    [1] "… UTC"
    Slot "height":
    [1] 67
    Slot "name":
    [1] "Jane Doe"
    > is(jane.doe, 'NamedThing')
    [1] TRUE
    > is(john.doe, 'TimeSeries')
    [1] TRUE
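A class union such as NamedThing becomes useful once a method is written against it, because every member class then inherits that method. The sketch below re-creates Person, Cat and NamedThing as on the previous slide and adds a `label` generic; `label`, the cat's breed and the cat's name are assumptions invented for this illustration.

```r
library(methods)

# Classes as on the slide; re-created here so the sketch is self-contained.
setClass("Person", representation(height = "numeric", name = "character"))
setClass("Cat", representation(breed = "character", name = "character"))
setClassUnion("NamedThing", c("Person", "Cat"))

# Hypothetical generic with a single method defined on the union.
setGeneric("label", function(object) standardGeneric("label"))
setMethod("label", "NamedThing", function(object) paste("Name:", object@name))

p <- new("Person", height = 67, name = "Jane Doe")
k <- new("Cat", breed = "Siamese", name = "Tom")

label(p)                      # "Name: Jane Doe"
label(k)                      # "Name: Tom"
is(p, "NamedThing")           # TRUE
isVirtualClass("NamedThing")  # TRUE: a union cannot be instantiated
```

The method relies on both member classes having a `name` slot; the union itself does not enforce that, so it is a convention the class author has to keep.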

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Lab 13: Logistic Regression

Lab 13: Logistic Regression Lab 13: Logistic Regression Spam Emails Today we will be working with a corpus of emails received by a single gmail account over the first three months of 2012. Just like any other email address this account

More information

MIXED MODEL ANALYSIS USING R

MIXED MODEL ANALYSIS USING R Research Methods Group MIXED MODEL ANALYSIS USING R Using Case Study 4 from the BIOMETRICS & RESEARCH METHODS TEACHING RESOURCE BY Stephen Mbunzi & Sonal Nagda www.ilri.org/rmg www.worldagroforestrycentre.org/rmg

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

Hands on S4 Classes. Yohan Chalabi. R/Rmetrics Workshop Meielisalp June 2009. ITP ETH, Zurich Rmetrics Association, Zurich Finance Online, Zurich

Hands on S4 Classes. Yohan Chalabi. R/Rmetrics Workshop Meielisalp June 2009. ITP ETH, Zurich Rmetrics Association, Zurich Finance Online, Zurich Hands on S4 Classes Yohan Chalabi ITP ETH, Zurich Rmetrics Association, Zurich Finance Online, Zurich R/Rmetrics Workshop Meielisalp June 2009 Outline 1 Introduction 2 S3 Classes/Methods 3 S4 Classes/Methods

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

We extended the additive model in two variables to the interaction model by adding a third term to the equation. Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic

More information

Poisson Regression. Bret Larget. Departments of Botany and of Statistics University of Wisconsin Madison. May 1, 2007

Poisson Regression. Bret Larget. Departments of Botany and of Statistics University of Wisconsin Madison. May 1, 2007 Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison May 1, 2007 Statistics 572 (Spring 2007) May 1, 2007 1 / 16 Poisson regression is a form of a generalized linear model

More information

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day

More information

Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

More information

All Models are wrong, but some are useful.

All Models are wrong, but some are useful. 1 Goodness of Fit in Logistic Regression As in linear regression, goodness of fit in logistic regression attempts to get at how well a model fits the data. It is usually applied after a final model has

More information

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech. MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of

More information

Régression logistique : introduction

Régression logistique : introduction Chapitre 16 Introduction à la statistique avec R Régression logistique : introduction Une variable à expliquer binaire Expliquer un risque suicidaire élevé en prison par La durée de la peine L existence

More information

Psychology 205: Research Methods in Psychology

Psychology 205: Research Methods in Psychology Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready

More information

Comparing Nested Models

Comparing Nested Models Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller

More information

Logistic Models in R

Logistic Models in R Logistic Models in R Jim Bentley 1 Sample Data The following code reads the titanic data that we will use in our examples. > titanic = read.csv( + "http://bulldog2.redlands.edu/facultyfolder/jim_bentley/downloads/math111/titanic.csv

More information

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,

More information

Outline. Dispersion Bush lupine survival Quasi-Binomial family

Outline. Dispersion Bush lupine survival Quasi-Binomial family Outline 1 Three-way interactions 2 Overdispersion in logistic regression Dispersion Bush lupine survival Quasi-Binomial family 3 Simulation for inference Why simulations Testing model fit: simulating the

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

More information

Lucky vs. Unlucky Teams in Sports

Lucky vs. Unlucky Teams in Sports Lucky vs. Unlucky Teams in Sports Introduction Assuming gambling odds give true probabilities, one can classify a team as having been lucky or unlucky so far. Do results of matches between lucky and unlucky

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

More information

A survey analysis example

A survey analysis example A survey analysis example Thomas Lumley November 20, 2013 This document provides a simple example analysis of a survey data set, a subsample from the California Academic Performance Index, an annual set

More information

n + n log(2π) + n log(rss/n)

n + n log(2π) + n log(rss/n) There is a discrepancy in R output from the functions step, AIC, and BIC over how to compute the AIC. The discrepancy is not very important, because it involves a difference of a constant factor that cancels

More information

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p. Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under

More information

Logistic regression (with R)

Logistic regression (with R) Logistic regression (with R) Christopher Manning 4 November 2007 1 Theory We can transform the output of a linear regression to be suitable for probabilities by using a logit link function on the lhs as

More information

Exchange Rate Regime Analysis for the Chinese Yuan

Exchange Rate Regime Analysis for the Chinese Yuan Exchange Rate Regime Analysis for the Chinese Yuan Achim Zeileis Ajay Shah Ila Patnaik Abstract We investigate the Chinese exchange rate regime after China gave up on a fixed exchange rate to the US dollar

More information

BIOSTAT640 R-Solution for HW7 Logistic Minming Li & Steele H. Valenzuela Mar.10, 2016

BIOSTAT640 R-Solution for HW7 Logistic Minming Li & Steele H. Valenzuela Mar.10, 2016 BIOSTAT640 R-Solution for HW7 Logistic Minming Li & Steele H. Valenzuela Mar.10, 2016 Required Libraries If there is an error in loading the libraries, you must first install the packaged library (install.packages("package

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form. One-Degree-of-Freedom Tests Test for group occasion interactions has (number of groups 1) number of occasions 1) degrees of freedom. This can dilute the significance of a departure from the null hypothesis.

More information

Lecture 8: Gamma regression

Lecture 8: Gamma regression Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing

More information

Regression in ANOVA. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Regression in ANOVA. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Regression in ANOVA James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Regression in ANOVA 1 Introduction 2 Basic Linear

More information

How to calculate an ANOVA table

How to calculate an ANOVA table How to calculate an ANOVA table Calculations by Hand We look at the following example: Let us say we measure the height of some plants under the effect of different fertilizers. Treatment Measures Mean

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

Paired Differences and Regression

Paired Differences and Regression Paired Differences and Regression Students sometimes have difficulty distinguishing between paired data and independent samples when comparing two means. One can return to this topic after covering simple

More information

Stat 411/511 ANOVA & REGRESSION. Charlotte Wickham. stat511.cwick.co.nz. Nov 31st 2015

Stat 411/511 ANOVA & REGRESSION. Charlotte Wickham. stat511.cwick.co.nz. Nov 31st 2015 Stat 411/511 ANOVA & REGRESSION Nov 31st 2015 Charlotte Wickham stat511.cwick.co.nz This week Today: Lack of fit F-test Weds: Review email me topics, otherwise I ll go over some of last year s final exam

More information

Lecture 6: Poisson regression

Lecture 6: Poisson regression Lecture 6: Poisson regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction EDA for Poisson regression Estimation and testing in Poisson regression

More information

Statistics Chapter 2

Statistics Chapter 2 Statistics 9055 Chapter 2 Example: Children and Malaria A random sample of 100 children aged 3 15 years was taken from a village in Ghana. The children were followed for a period of eight months. At the

More information

Simple example of collinearity in logistic regression

Simple example of collinearity in logistic regression 1 Confounding and Collinearity in Multivariate Logistic Regression We have already seen confounding and collinearity in the context of linear regression, and all definitions and issues remain essentially

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

How Do We Test Multiple Regression Coefficients?

How Do We Test Multiple Regression Coefficients? How Do We Test Multiple Regression Coefficients? Suppose you have constructed a multiple linear regression model and you have a specific hypothesis to test which involves more than one regression coefficient.

More information

Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance

Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance 14 November 2007 1 Confidence intervals and hypothesis testing for linear regression Just as there was

More information

Simple Linear Regression One Binary Categorical Independent Variable

Simple Linear Regression One Binary Categorical Independent Variable Simple Linear Regression Does sex influence mean GCSE score? In order to answer the question posed above, we want to run a linear regression of sgcseptsnew against sgender, which is a binary categorical

More information

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

More information

And sample sizes > tapply(count, spray, length) A B C D E F And a boxplot: > boxplot(count ~ spray) How does the data look?

And sample sizes > tapply(count, spray, length) A B C D E F And a boxplot: > boxplot(count ~ spray) How does the data look? ANOVA in R 1-Way ANOVA We re going to use a data set called InsectSprays. 6 different insect sprays (1 Independent Variable with 6 levels) were tested to see if there was a difference in the number of

More information

Introduction to Hierarchical Linear Modeling with R

Introduction to Hierarchical Linear Modeling with R Introduction to Hierarchical Linear Modeling with R 5 10 15 20 25 5 10 15 20 25 13 14 15 16 40 30 20 10 0 40 30 20 10 9 10 11 12-10 SCIENCE 0-10 5 6 7 8 40 30 20 10 0-10 40 1 2 3 4 30 20 10 0-10 5 10 15

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Module 5: Introduction to Multilevel Modelling. R Practical. Introduction to the Scottish Youth Cohort Trends Dataset. Contents

Module 5: Introduction to Multilevel Modelling. R Practical. Introduction to the Scottish Youth Cohort Trends Dataset. Contents Introduction Module 5: Introduction to Multilevel Modelling R Practical Camille Szmaragd and George Leckie 1 Centre for Multilevel Modelling Some of the sections within this module have online quizzes

More information

Lecture 5 Hypothesis Testing in Multiple Linear Regression

Lecture 5 Hypothesis Testing in Multiple Linear Regression Lecture 5 Hypothesis Testing in Multiple Linear Regression BIOST 515 January 20, 2004 Types of tests 1 Overall test Test for addition of a single variable Test for addition of a group of variables Overall

More information

Bivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2

Bivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2 Bivariate Analysis Variable 2 LEVELS >2 LEVELS COTIUOUS Correlation Used when you measure two continuous variables. Variable 2 2 LEVELS X 2 >2 LEVELS X 2 COTIUOUS t-test X 2 X 2 AOVA (F-test) t-test AOVA

More information

Testing for Lack of Fit

Testing for Lack of Fit Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit
