Data Analysis II. [SST35]
Transcription
1 Data Analysis II. [SST35]: Time Series 0.
Lőw András, October 26.
Lőw A (low.andras@gmail.com) Data Analysis II. [SST35] October / 24
2 Outline
1 How does summary know what it is supposed to do?
2 lubridate: dates and times made easy
3 Object orientation in R
4 A small example in S4
3 Bundled data: Orange
This is how the circumference of the five orange trees developed:
> summary(Orange)
 Tree   age          circumference
 3:7    Min.   :…    Min.   :…
 …:7    1st Qu.:…    1st Qu.:…
 …:7    Median :…    Median :…
 …:7    Mean   :…    Mean   :…
 …:7    3rd Qu.:…    3rd Qu.:161.5
        Max.   :…    Max.   :214.0
> str(Orange)
Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 35 obs. of 3 variables:
 $ Tree         : Ord.factor w/ 5 levels "3"<"1"<"5"<"2"<..: …
 $ age          : num …
 $ circumference: num …
 - attr(*, "formula")=Class 'formula' length 3 circumference ~ age | Tree
  .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
 - attr(*, "labels")=List of 2
  ..$ x: chr "Time since December 31, 1968"
  ..$ y: chr "Trunk circumference"
 - attr(*, "units")=List of 2
  ..$ x: chr "(days)"
  ..$ y: chr "(mm)"
Source: Draper, N. R. and Smith, H. (1998), Applied Regression Analysis (3rd ed), Wiley (exercise 24.N). Pinheiro, J. C. and Bates, D. M. (2000), Mixed-effects Models in S and S-PLUS, Springer.
4 This is how the orange trees grew:
[Figure: Orange dataset by Trees; axes: age [days], circumference [mm]]
5 The circumference of the 5 trees, measured on 7 occasions:
[Figure: Orange dataset; panels over age, circumference and Tree, with count axes]
6 Bundled data: Loblolly
And this one is about the height of some pine trees:
> summary(Loblolly)
 height           age          Seed
 Min.   : 3.46    Min.   :…    …      : 6
 1st Qu.:…        1st Qu.:…    …      : 6
 Median :34.00    Median :…    …      : 6
 Mean   :32.36    Mean   :…    …      : 6
 3rd Qu.:…        3rd Qu.:…    …      : 6
 Max.   :64.10    Max.   :…    …      : 6
                               (Other):48
> str(Loblolly)
Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 84 obs. of 3 variables:
 $ height: num …
 $ age   : num …
 $ Seed  : Ord.factor w/ 14 levels "329"<"327"<"325"<..: …
 - attr(*, "formula")=Class 'formula' length 3 height ~ age | Seed
  .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
 - attr(*, "labels")=List of 2
  ..$ x: chr "Age of tree"
  ..$ y: chr "Height of tree"
 - attr(*, "units")=List of 2
  ..$ x: chr "(yr)"
  ..$ y: chr "(ft)"
Source: Kung, F. H. (1986), Fitting logistic growth curve with predetermined carrying capacity, in Proceedings of the Statistical Computing Section, American Statistical Association. Pinheiro, J. C. and Bates, D. M. (2000), Mixed-effects Models in S and S-PLUS, Springer.
7 This is how the pine trees grew:
[Figure: Loblolly dataset, one curve per Seed; axes: age [years], height [ft]]
8 More summary: Linear Model (1)
> summary(swiss)
   Fertility       Agriculture      Examination      Education        Catholic
 Min.   :35.00    Min.   : 1.20    Min.   : 3.00    Min.   : 1.00    Min.   :…
 1st Qu.:…        1st Qu.:…        1st Qu.:…        1st Qu.:…        1st Qu.:…
 Median :70.40    Median :54.10    Median :16.00    Median : 8.00    Median :…
 Mean   :70.14    Mean   :50.66    Mean   :16.49    Mean   :10.98    Mean   :…
 3rd Qu.:…        3rd Qu.:…        3rd Qu.:…        3rd Qu.:…        3rd Qu.:…
 Max.   :92.50    Max.   :89.70    Max.   :37.00    Max.   :53.00    Max.   :…
 Infant.Mortality
 Min.   :…
 1st Qu.:18.15
 Median :20.00
 Mean   :…
 3rd Qu.:21.70
 Max.   :26.60
> summary(lm(Fertility ~ ., data = swiss))
Call:
lm(formula = Fertility ~ ., data = swiss)
Residuals:
 Min 1Q Median 3Q Max
 …
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       …                           …e-07 ***
Agriculture       …                                 *
Examination       …
Education         …                           …e-05 ***
Catholic          …                                 **
Infant.Mortality  …                                 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
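The printed report above is only one view of what summary returns for a fitted model. The sketch below, which uses nothing beyond base R and the built-in swiss data, shows that the fit carries the class lm, that summary returns an object with a class of its own, and that the coefficient table can be pulled out as a matrix. The variable names fit, s and coef_table are illustrative.

```r
# Fit the model from the slide on the built-in swiss data set.
fit <- lm(Fertility ~ ., data = swiss)

# The class attribute is what summary() dispatches on.
class(fit)

# summary() returns an object of its own class; printing it yields the report.
s <- summary(fit)
class(s)

# The coefficient table (estimate, standard error, t value, p value)
# is stored as a numeric matrix inside the summary object.
coef_table <- coef(s)
colnames(coef_table)
rownames(coef_table)
```

Working with coef_table directly avoids parsing printed output when the numbers are needed in later computations.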
9 More summary: Generalized Linear Model (2)
> summary(anorexia)
  Treat      Prewt            Postwt
 CBT :29    Min.   :70.00    Min.   :…
 Cont:26    1st Qu.:…        1st Qu.:…
 FT  :17    Median :82.30    Median :…
            Mean   :82.41    Mean   :…
            3rd Qu.:…        3rd Qu.:…
            Max.   :94.90    Max.   :…
> summary(glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia))
Call:
glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia)
Deviance Residuals:
 Min 1Q Median 3Q Max
 …
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  …                            ***
Prewt        …                            ***
TreatCont    …                            *
TreatFT      …                            *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be …)
Null deviance: … on 71 degrees of freedom
Residual deviance: … on 68 degrees of freedom
AIC: …
Number of Fisher Scoring iterations: 2
10 More summary: table (3)
> apply(UCBAdmissions, c(1, 2), sum)
          Gender
Admit      Male Female
  Admitted …    …
  Rejected …    …
> UCBAdmissions
, , Dept = A
          Gender
Admit      Male Female
  Admitted …    …
  Rejected …    …
, , Dept = B
          Gender
Admit      Male Female
  Admitted …    …
  Rejected 207  8
, , Dept = C
          Gender
Admit      Male Female
  Admitted …    …
  Rejected …    …
, , Dept = D
          Gender
Admit      Male Female
  Admitted …    …
  Rejected …    …
> summary(UCBAdmissions)
Number of cases in table: 4526
Number of factors: 3
Test for independence of all factors:
 Chisq = …, df = 16, p-value = 0
> apply(UCBAdmissions, 3, function(u) summary(as.table(u)))
$A
Number of cases in table: 933
Number of factors: 2
Test for independence of all factors:
 Chisq = …, df = 1, p-value = 3.28e-05
$B
Number of cases in table: 585
Number of factors: 2
Test for independence of all factors:
 Chisq = …, df = 1, p-value = …
$C
Number of cases in table: 918
Number of factors: 2
Test for independence of all factors:
 Chisq = …, df = 1, p-value = …
$D
Number of cases in table: 792
Number of factors: 2
Test for independence of all factors:
 Chisq = …, df = 1, p-value = …
$E
Number of cases in table: 584
Number of factors: …
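The three slides above call the same function, summary, on a data frame, a fitted model and a table, and get three different reports. A minimal sketch of the mechanism behind this (S3 dispatch on the class attribute) follows. The class name reading and its field values are hypothetical, invented only for this illustration.

```r
# A toy S3 object; the class name "reading" and its field are hypothetical.
reading <- structure(list(values = c(2, 4, 6)), class = "reading")

# summary() is an S3 generic: it calls UseMethod("summary"), which looks for
# a function named summary.<class> for each class of its first argument.
summary.reading <- function(object, ...) {
  cat("reading with", length(object$values), "values, mean",
      mean(object$values), "\n")
  invisible(mean(object$values))
}

# Dispatches to summary.reading because class(reading) is "reading".
result <- summary(reading)

# The same lookup selects the method for the objects on the slides.
class(Orange)            # several classes, ending in "data.frame"
class(UCBAdmissions)     # "table"
exists("summary.table")  # the method that summary(UCBAdmissions) ends up in
```

When an object has several classes, as Orange does, the classes are tried in order until a matching method is found, and summary.default is the fallback.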
11 Outline (repeated). Next: 2 lubridate: dates and times made easy
12 What is the lubridate package good for?
Base R:
> ( r.date <- as.POSIXct('…', format = '%d-%m-%Y', tz = 'UTC') )
[1] "… UTC"
> as.numeric(format(r.date, '%m'))
[1] 1
> r.date <- as.POSIXct(format(r.date, '%Y-2-%d'), tz = 'UTC')
> r.date
[1] "… UTC"
> seq(r.date, length = 2, by = '-1 day')[2]
[1] "… UTC"
> as.POSIXct(format(as.POSIXct(r.date), tz = 'UTC'), tz = 'GMT')
[1] "… GMT"
The lubridate package:
> ( l.date <- dmy('…') )
[1] "… UTC"
> month(l.date)
[1] 1
> month(l.date) <- 2
> l.date
[1] "… UTC"
> l.date - days(1)
[1] "… UTC"
> with_tz(l.date, 'GMT')
[1] "… GMT"
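The date string used on the slide did not survive the transcription, so the base R half of the comparison cannot be rerun as printed. The sketch below repeats the same three steps (read the month, set the month to February, step back one day) with an assumed date, 15 January 2010, which is not taken from the slides. It uses base R only; the lubridate calls on the slide (dmy, month, days, with_tz) express the same operations more directly.

```r
# Assumed date for illustration; the slide's own date string was lost.
d <- as.POSIXct("15-01-2010", format = "%d-%m-%Y", tz = "UTC")

# Extract the month as a number.
month_before <- as.numeric(format(d, "%m"))

# Set the month to February by rebuilding the date from its formatted parts.
d_feb <- as.POSIXct(format(d, "%Y-02-%d"), format = "%Y-%m-%d", tz = "UTC")

# Step back one day: take the second element of a two-element sequence.
d_prev <- seq(d_feb, length.out = 2, by = "-1 day")[2]

format(d_feb, "%Y-%m-%d")
format(d_prev, "%Y-%m-%d")
```

Rebuilding a date through format() works here because day 15 exists in every month; with a day such as 31 the same trick would produce NA for February, which is one reason a dedicated package is convenient.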
13 Outline (repeated). Next: 3 Object orientation in R
14 A short history
The first solution appeared around 1990 in version 3 of S, hence S3.
The next version introduced dispatch on multiple arguments, improved inheritance, and became more abstract: S4.
Last year R moved beyond this as well: R5 (reference classes).
15 Outline (repeated). Next: 4 A small example in S4
16 Diet diary (1)
First, let's build a homemade time series! It needs:
a start time,
an end time,
data (of course).
> setClass('TimeSeries', representation(data = 'numeric', start = 'POSIXct', end = 'POSIXct'))
[1] "TimeSeries"
> my.timeseries <- new('TimeSeries', data = c(1, 2, 3, 4, 5, 6),
+   start = as.POSIXct('10/01/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
+   end = as.POSIXct('10/01/2011 0:05:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'))
> my.timeseries
An object of class "TimeSeries"
Slot "data":
[1] 1 2 3 4 5 6
Slot "start":
[1] "… UTC"
Slot "end":
…
17 Diet diary (2)
When is a time series valid?
It has a start time,
it has an end time,
the start is earlier than the end (of course).
> setValidity('TimeSeries', function(object) {
+   object@start <= object@end &&
+   length(object@start) == 1 &&
+   length(object@end) == 1
+ })
Class "TimeSeries" [in ".GlobalEnv"]
Slots:
Name:     data   start     end
Class: numeric POSIXct POSIXct
> validObject(my.timeseries)
[1] TRUE
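The validity function above returns a bare TRUE or FALSE, so a failed check can only report "FALSE". As a sketch of an alternative, a validity function may instead return a character vector describing what is wrong, which then appears in the error message. The class name TimeSeriesV is hypothetical, chosen so that it does not clash with the TimeSeries class of the slides, and the message texts are invented for this example.

```r
library(methods)

# Same slots as on the slides, under a hypothetical class name.
setClass("TimeSeriesV",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"),
  validity = function(object) {
    problems <- character(0)
    if (length(object@start) != 1)
      problems <- c(problems, "start must have length 1")
    if (length(object@end) != 1)
      problems <- c(problems, "end must have length 1")
    # Compare the two times only when both have the expected length.
    if (length(problems) == 0 && object@start > object@end)
      problems <- c(problems, "start must not be later than end")
    # TRUE means valid; a character vector lists the problems found.
    if (length(problems) == 0) TRUE else problems
  }
)

t0 <- as.POSIXct("2011-10-01 00:00:00", tz = "UTC")
ok <- new("TimeSeriesV", data = as.numeric(1:6), start = t0, end = t0 + 300)
validObject(ok)
```

Checking the lengths before the comparison also avoids evaluating `<=` on vectors of unexpected length, which the slide's version does first.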
18 Diet diary (3)
From now on the validity check runs automatically whenever a new object is created. We cannot be grateful enough for that!
> good.timeseries <- new('TimeSeries', data = c(7, 8, 9, 10, 11, 12),
+   start = as.POSIXct('10/01/2011 0:06:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
+   end = as.POSIXct('10/01/2011 0:11:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'))
> bad.timeseries <- new('TimeSeries', data = c(7, 8, 9, 10, 11, 12),
+   start = as.POSIXct('10/01/2011 0:06:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
+   end = as.POSIXct('10/01/2008 0:11:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'))
Error in validObject(.Object) : invalid class "TimeSeries" object: FALSE
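The automatic check covers object creation with new. A point worth illustrating is that assigning to a slot afterwards with @ checks only the slot's class and does not rerun the validity function, so validObject has to be called explicitly. The sketch below uses a hypothetical class name, TimeSeriesE, and an invented error message.

```r
library(methods)

# Hypothetical class name; slots as on the slides.
setClass("TimeSeriesE",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))
setValidity("TimeSeriesE", function(object) {
  if (object@start <= object@end) TRUE else "start is later than end"
})

t0 <- as.POSIXct("2011-10-01 00:06:00", tz = "UTC")

# new() runs the validity check, so an invalid object is rejected.
created <- tryCatch(
  new("TimeSeriesE", data = c(7, 8, 9), start = t0, end = t0 - 60),
  error = function(e) conditionMessage(e)
)

# Direct slot assignment does not run the validity function ...
ts_ok <- new("TimeSeriesE", data = c(7, 8, 9), start = t0, end = t0 + 60)
ts_ok@end <- t0 - 60

# ... the object is only rejected once validObject() is called explicitly.
rechecked <- tryCatch(validObject(ts_ok),
                      error = function(e) conditionMessage(e))
created
rechecked
```

Wrapping the call in tryCatch keeps a script running after the expected failure and makes the message available for inspection.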
19 Diet diary (4)
How long a period does our time series span? What are the data? Our first functions [methods] for the new class.
> period.timeseries <- function(object) {
+   if (length(object@data) > 1) {
+     (object@end - object@start) / (length(object@data) - 1)
+   } else {
+     Inf
+   }
+ }
> series <- function(object) {object@data}
> setGeneric('series')
[1] "series"
> series(my.timeseries)
[1] 1 2 3 4 5 6
> showMethods('series')
Function: series (package .GlobalEnv)
object="ANY"
object="TimeSeries"
    (inherited from: object="ANY")
20 Diet diary (5)
Let's bind period.timeseries to the TimeSeries class!
> period <- function(object) {object@period}
> setGeneric('period')
[1] "period"
> setMethod(period, signature = 'TimeSeries', definition = period.timeseries)
[1] "period"
attr(,"package")
[1] ".GlobalEnv"
> showMethods('period')
Function: period (package .GlobalEnv)
object="ANY"
object="TimeSeries"
> period(my.timeseries)
[1] …:01:00
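On the slide the generic is created from an ordinary function, which then becomes the default method for any object. A common alternative, sketched here under hypothetical names (class TimeSeriesG, generic samplingPeriod), is to declare the generic explicitly with standardGeneric, so that there is no default and calling it on an unsupported class fails with a dispatch error.

```r
library(methods)

# Hypothetical class name; slots as on the slides.
setClass("TimeSeriesG",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

# An explicit generic: standardGeneric() hands control to method dispatch.
setGeneric("samplingPeriod",
           function(object) standardGeneric("samplingPeriod"))

# The method body follows period.timeseries from the slides.
setMethod("samplingPeriod", signature = "TimeSeriesG",
  definition = function(object) {
    n <- length(object@data)
    if (n > 1) (object@end - object@start) / (n - 1) else Inf
  })

t0 <- as.POSIXct("2011-10-01 00:00:00", tz = "UTC")
x <- new("TimeSeriesG", data = c(1, 2, 3, 4, 5, 6),
         start = t0, end = t0 + 300)

# Six observations over five minutes: one interval per minute.
p <- samplingPeriod(x)
as.numeric(p, units = "secs")
```

The result is a difftime object, so converting it with an explicit unit avoids depending on the unit R happens to choose for printing.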
21 Diet diary (6)
summary and subsetting with [ must not be left out!
> setMethod('summary', signature = 'TimeSeries', definition = function(object) {
+   print(paste(object@start, ' to ', object@end, sep = '', collapse = ''))
+   print(paste(object@data, sep = '', collapse = ', '))
+ })
[1] "summary"
> summary(my.timeseries)
[1] "… to … …:05:00"
[1] "1, 2, 3, 4, 5, 6"
> #
> setMethod('[', signature = 'TimeSeries', definition = function(x, i, j, ..., drop) {
+   x@data[i]
+ })
[1] "["
> my.timeseries[3]
[1] 3
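As a small extension of the same idea, other built-in generics can be given methods in the same way. The sketch below, using the hypothetical class name TimeSeriesS, defines `[` as on the slide and adds a length method, so that the object answers both questions through its data slot.

```r
library(methods)

# Hypothetical class name; slots as on the slides.
setClass("TimeSeriesS",
  representation(data = "numeric", start = "POSIXct", end = "POSIXct"))

# Subsetting returns the selected data values, as on the slide.
setMethod("[", signature = "TimeSeriesS",
  definition = function(x, i, j, ..., drop = TRUE) {
    x@data[i]
  })

# length() reports the number of observations.
setMethod("length", signature = "TimeSeriesS",
  definition = function(x) length(x@data))

t0 <- as.POSIXct("2011-10-01 00:00:00", tz = "UTC")
x <- new("TimeSeriesS", data = c(1, 2, 3, 4, 5, 6),
         start = t0, end = t0 + 300)

x[2:3]     # index vectors work because the method forwards i unchanged
x[-1]      # negative indices drop elements, as for ordinary vectors
length(x)
```

Note that this `[` returns a plain numeric vector rather than a new time series; returning an object of the class would also require deciding what the start and end of the subset should be.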
22 Diet diary (7)
So far we have not even mentioned the diary!
> setClass('WeightHistory', representation(height = 'numeric', name = 'character'), contains = 'TimeSeries')
[1] "WeightHistory"
> john.doe <- new('WeightHistory', data = c(170, 169, 171, 168, 170, 169),
+   start = as.POSIXct('08/14/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
+   end = as.POSIXct('09/28/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
+   height = 72, name = 'John Doe')
> john.doe
An object of class "WeightHistory"
Slot "height":
[1] 72
Slot "name":
[1] "John Doe"
Slot "data":
[1] 170 169 171 168 170 169
Slot "start":
[1] "… UTC"
Slot "end":
[1] "… UTC"
23 Diet diary (8)
More inheritance:
> setClass('Person', representation(height = 'numeric', name = 'character'))
[1] "Person"
> #
> setClass('AltWeightHistory', contains = c('TimeSeries', 'Person'))
[1] "AltWeightHistory"
> #
> setClass('Cat', representation(breed = 'character', name = 'character'))
[1] "Cat"
> #
> setClassUnion('NamedThing', c('Person', 'Cat'))
[1] "NamedThing"
24 Diet diary (9)
Everything together:
> jane.doe <- new('AltWeightHistory', data = c(130, 129, 131, 128, 130, 129),
+   start = as.POSIXct('08/14/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
+   end = as.POSIXct('09/28/2011 0:00:00', tz = 'CEST', format = '%m/%d/%Y %H:%M:%S'),
+   height = 67, name = 'Jane Doe')
> jane.doe
An object of class "AltWeightHistory"
Slot "data":
[1] 130 129 131 128 130 129
Slot "start":
[1] "… UTC"
Slot "end":
[1] "… UTC"
Slot "height":
[1] 67
Slot "name":
[1] "Jane Doe"
> is(jane.doe, 'NamedThing')
[1] TRUE
> is(john.doe, 'TimeSeries')
[1] TRUE
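The class union is useful mainly because a method can be written once for the union and then applies to every member class. A minimal sketch follows, with hypothetical names throughout: the classes PersonU and CatU, the union NamedThingU, the generic displayName and the example values are all invented for this illustration.

```r
library(methods)

# Two unrelated classes that both happen to have a name slot.
setClass("PersonU", representation(height = "numeric", name = "character"))
setClass("CatU", representation(breed = "character", name = "character"))

# The union makes both classes count as a NamedThingU.
setClassUnion("NamedThingU", c("PersonU", "CatU"))

# One method on the union serves every member class.
setGeneric("displayName", function(object) standardGeneric("displayName"))
setMethod("displayName", signature = "NamedThingU",
  definition = function(object) object@name)

jane <- new("PersonU", height = 67, name = "Jane Doe")
tom <- new("CatU", breed = "Siamese", name = "Tom")

displayName(jane)
displayName(tom)
is(tom, "NamedThingU")   # membership in the union
is(tom, "PersonU")       # the member classes stay unrelated to each other
```

A union adds no slots of its own, so the method can rely on the name slot only because every member class was defined with one.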
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More informationThe imprecision of volatility indexes
The imprecision of volatility indexes Rohini Grover Ajay Shah IGIDR Finance Research Group May 17, 2014 Volatility indexes The volatility index (VIX) is an implied volatility estimate that measures the
More informationInteraction between quantitative predictors
Interaction between quantitative predictors In a first-order model like the ones we have discussed, the association between E(y) and a predictor x j does not depend on the value of the other predictors
More informationThe Chi-Square Test. STAT E-50 Introduction to Statistics
STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationStatistical issues in the analysis of microarray data
Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data
More informationFamily economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from 1990 1995.
Lecture 18 1. Random intercepts and slopes 2. Notation for mixed effects models 3. Comparing nested models 4. Multilevel/Hierarchical models 5. SAS versions of R models in Gelman and Hill, chapter 12 1
More informationGeneral Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.
General Method: Difference of Means 1. Calculate x 1, x 2, SE 1, SE 2. 2. Combined SE = SE1 2 + SE2 2. ASSUMES INDEPENDENT SAMPLES. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n
More informationS4 Classes in 15 pages, more or less
S4 Classes in 15 pages, more or less February 12, 2003 Overview The preferred mechanism for object oriented programming in R is described in Chambers (1998). The actual implementation is slightly different
More informationLecture 14: GLM Estimation and Logistic Regression
Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
More informationIntroduction to Analysis of Variance (ANOVA) Limitations of the t-test
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationOutline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
More informationUsing Stata for Categorical Data Analysis
Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,
More informationCOMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.
277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies
More informationExercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang
Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang School of Computing Sciences, UEA University of East Anglia Brief Introduction to R R is a free open source statistics and mathematical
More informationPackage dsmodellingclient
Package dsmodellingclient Maintainer Author Version 4.1.0 License GPL-3 August 20, 2015 Title DataSHIELD client site functions for statistical modelling DataSHIELD
More informationBIOL 933 Lab 6 Fall 2015. Data Transformation
BIOL 933 Lab 6 Fall 2015 Data Transformation Transformations in R General overview Log transformation Power transformation The pitfalls of interpreting interactions in transformed data Transformations
More informationName: Date: Use the following to answer questions 2-3:
Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student
More informationSession 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm!
Session 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm! Moderator: David L. Snell, ASA, MAAA Presenters: Brian D. Holland, FSA, MAAA
More information