Statistical Models in R
Statistical Models in R: Some Examples
Steven Buechler, Department of Mathematics, 276B Hurley Hall; Fall, 2007
Outline: Statistical Models; Structure of models in R; Model Assessment (Part IA); Anova in R
Statistical Models: First Principles
In a couple of lectures the basic notion of a statistical model is described. Examples of anova and linear regression are given, including variable selection to find a simple but explanatory model. Emphasis is placed on R's framework for statistical modeling.
General Problem Addressed by Modelling
Given: a collection of variables, each variable being a vector of readings of a specific trait on the samples in an experiment.
Problem: In what way does a variable $Y$ depend on other variables $X_1, \ldots, X_n$ in the study?
Explanation: A statistical model defines a mathematical relationship between the $X_i$'s and $Y$. The model is a representation of the real $Y$ that aims to replace it as far as possible. At the least, the model should capture the dependence of $Y$ on the $X_i$'s.
The Types of Variables in a Statistical Model
The response variable is the one whose content we are trying to model with other variables, called the explanatory variables. In any given model there is one response variable ($Y$ above) and there may be many explanatory variables (like $X_1, \ldots, X_n$).
Identify and Characterize Variables: the first step in modelling
- Which variable is the response variable?
- Which variables are the explanatory variables?
- Are the explanatory variables continuous, categorical, or a mixture of both?
- What is the nature of the response variable: is it a continuous measurement, a count, a proportion, a category, or a time-at-death?
Types of Variables Determine Type of Model
The explanatory variables:
- All explanatory variables continuous: Regression
- All explanatory variables categorical: Analysis of variance (Anova)
- Explanatory variables both continuous and categorical: Analysis of covariance (Ancova)
Types of Variables Determine Type of Model
The response variable: what kind of data is it?
- Continuous: Normal regression, anova, ancova
- Proportion: Logistic regression
- Count: Log linear models
- Binary: Binary logistic analysis
- Time-at-death: Survival analysis
Model Formulas: Which variables are involved?
A fundamental aspect of models is the use of model formulas to specify the variables involved in the model and the possible interactions between explanatory variables included in the model. A model formula is input into a function that performs a linear regression or anova, for example. While a model formula bears some resemblance to a mathematical formula, the symbols in the formula mean different things than in algebra.
Common Features of Model Formulas
Model formulas have a format like
> Y ~ X1 + X2 + Z * W
where Y is the response variable, ~ means "is modeled as a function of", and the right-hand side is an expression in the explanatory variables.
First Examples of Model Formulas
Given continuous variables x and y, the relationship of a linear regression of y on x is described as
> y ~ x
The actual linear regression is executed by
> fit <- lm(y ~ x)
First Examples of Model Formulas
If y is continuous and z is categorical we use the same model formula
> y ~ z
to express that we'll model y as a function of z; however, in this case the model will be an anova, executed as
> fit <- aov(y ~ z)
Multiple Explanatory Variables
Frequently, there are multiple explanatory variables involved in a model. The + symbol denotes inclusion of additional explanatory variables. The formula
> y ~ x1 + x2 + x3
denotes that y is modeled as a function of x1, x2, x3. If all of these are continuous,
> fit <- lm(y ~ x1 + x2 + x3)
executes a multiple linear regression of y on x1, x2, x3.
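As a minimal sketch of the multiple regression call above, the following fits the formula to simulated data. The sample size, the random values, and the true coefficients are assumptions made up for illustration; only the formula and the lm call come from the slides.

```r
# Simulated data: the variable names mirror the formula in the text,
# but the values and true coefficients are invented for illustration.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)   # x3 has no true effect here

# Multiple linear regression of y on x1, x2, x3.
fit <- lm(y ~ x1 + x2 + x3)

# One intercept plus one slope per explanatory variable.
print(coef(fit))
```

The fitted object can then be passed to summary(fit) for coefficient tests.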
Other Operators in Model Formulas
In complicated relationships we may need to include interaction terms as variables in the model. This is common when a model involves multiple categorical explanatory variables. A factorial anova may involve calculating means for the levels of variable A restricted to a level of B. The formula
> y ~ A * B
describes this form of model.
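To see what the * operator stands for, one can ask R to expand a formula into its terms. This small sketch needs no data, since terms() works on the formula alone; the names y, A and B are just the placeholders used above.

```r
# A * B is shorthand for A + B + A:B (both main effects plus their interaction).
f <- y ~ A * B
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# The explicitly expanded formula yields the same set of terms.
same_terms <- identical(
  term_labels,
  attr(terms(y ~ A + B + A:B), "term.labels")
)
print(same_terms)
```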
Just the Basics
Here, just the basic structure of modeling in R is given, using anova and linear regression as examples. See the Crawley book listed in the syllabus for a careful introduction to models of varying forms. Besides giving examples of models of these simple forms, tools for assessing the quality of the models and for comparing models with different variables will be illustrated.
Approximate Y
The goal of a model is to approximate a vector $Y$ with values calculated from the explanatory variables. Suppose the $Y$ values are $(y_1, \ldots, y_n)$. The values calculated in the model are called the fitted values and denoted $(\hat{y}_1, \ldots, \hat{y}_n)$. (In general, a hat on a quantity means one approximated in a model or through sampling.) The goodness of fit is measured with the residuals, $(r_1, \ldots, r_n)$, where $r_i = y_i - \hat{y}_i$.
Measure of Residuals
To obtain a number that measures the overall size of the residuals we use the residual sum of squares, defined as
$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
As RSS decreases, $\hat{y}$ becomes a better approximation to $y$.
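The definitions of fitted values, residuals and RSS can be checked directly on a fitted model. This is a sketch on simulated data (the data and coefficients are assumptions for illustration); fitted(), residuals() and deviance() are standard extractors, and for an lm fit deviance() returns the residual sum of squares.

```r
set.seed(2)
x <- runif(50, 0, 10)
y <- 3 + 0.8 * x + rnorm(50)     # invented linear relationship plus noise

fit   <- lm(y ~ x)
y_hat <- fitted(fit)             # fitted values
r     <- y - y_hat               # residuals r_i = y_i - y_hat_i
rss   <- sum(r^2)                # residual sum of squares

# The hand computation agrees with R's own extractors.
print(all.equal(unname(r), unname(residuals(fit))))
print(all.equal(rss, deviance(fit)))
```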
Residuals: Only Half of the Story
A good model should have predictive value in other data sets and contain only as many explanatory variables as needed for a reasonable fit.
To minimize RSS we can set $\hat{y}_i = y_i$ for $1 \le i \le n$. However, this model may not generalize at all to another data set. It is fitted entirely to the idiosyncrasies of this sample (in the usual terminology, low bias but high variance).
We could instead set $\hat{y}_i = \bar{y} = (y_1 + \cdots + y_n)/n$, the sample mean, for all $i$. This is stable in that other samples will yield about the same mean (low variance). However, it may fit poorly, that is, have a large RSS (high bias).
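The two extreme choices described above can be tried on a pair of simulated samples. This is only a sketch: the data-generating process, the helper names make_sample and rss, and the sample size are all assumptions introduced for illustration.

```r
set.seed(3)

# Hypothetical data generator: a linear trend plus noise.
make_sample <- function(n = 30) {
  x <- seq(0, 1, length.out = n)
  data.frame(x = x, y = 2 * x + rnorm(n, sd = 0.5))
}
train <- make_sample()
test  <- make_sample()   # a second sample from the same process

rss <- function(y, y_hat) sum((y - y_hat)^2)

# Extreme 1: reproduce the training values exactly.
rss_exact_train <- rss(train$y, train$y)   # zero by construction
rss_exact_test  <- rss(test$y,  train$y)   # reuse the memorised values on new data

# Extreme 2: use the training mean for every observation.
rss_mean_train <- rss(train$y, mean(train$y))
rss_mean_test  <- rss(test$y,  mean(train$y))

# A simple linear model sits between the two extremes.
fit <- lm(y ~ x, data = train)
rss_lm_train <- rss(train$y, fitted(fit))
rss_lm_test  <- rss(test$y, predict(fit, newdata = test))

print(c(exact = rss_exact_train, mean = rss_mean_train, lm = rss_lm_train))
print(c(exact = rss_exact_test,  mean = rss_mean_test,  lm = rss_lm_test))
```

Comparing the two printed rows shows how a fit that is perfect on one sample behaves on another.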
Bias-Variance Trade-off
Selecting an optimal model, both in the form of the model and the parameters, is a complicated compromise between minimizing bias and variance. This is a deep and evolving subject, although it is certainly settled in linear regression and other simple models.
For these lectures, take as the goal minimizing RSS while keeping the model as simple as possible. Just as in hypothesis testing, there is a statistic calculable from the data and model that we use to measure part of this trade-off.
Statistics Measure the Fit
Comparing two models fit to the same data can be set up as a hypothesis testing problem. Let $M_0$ and $M_1$ denote the models. Consider as the null hypothesis "$M_1$ is not a significant improvement on $M_0$", and as the alternative its negation. This hypothesis can often be formulated so that a statistic can be generated from the two models.
Model Comparison Statistics
Normally, the models are nested in that the variables in $M_0$ are a subset of those in $M_1$. The statistic often involves the RSS values for both models, adjusted by the number of parameters used. In linear regression this becomes an anova test (comparing variances).
More robust is a likelihood ratio test for nested models. When models are sufficiently specific to define a probability distribution for $y$, the model will report the log-likelihood, $\hat{L}$. Under some mild assumptions, and under the null hypothesis, $-2(\hat{L}_0 - \hat{L}_1)$ follows a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters in the two models.
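Both comparisons described above can be sketched for two nested linear models. The data are simulated and the model names m0 and m1 are placeholders; anova(), logLik() and pchisq() are standard functions in base R's stats package.

```r
set.seed(4)
n  <- 80
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x2 + rnorm(n)   # invented data for illustration

m0 <- lm(y ~ x1)        # smaller model
m1 <- lm(y ~ x1 + x2)   # larger model; m0 is nested in m1

# F test based on the two RSS values, adjusted for the number of parameters.
print(anova(m0, m1))

# Likelihood ratio test: -2 (L0 - L1) compared with a chi-squared distribution.
ll0 <- logLik(m0)
ll1 <- logLik(m1)
lr_stat <- -2 * (as.numeric(ll0) - as.numeric(ll1))
df_diff <- attr(ll1, "df") - attr(ll0, "df")   # difference in parameter counts
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(c(statistic = lr_stat, df = df_diff, p = p_value))
```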
Comparison with Null Model
The utility of a single model $M_1$ is often assessed by comparing it with the null model, which reflects no dependence of y on the explanatory variables. The model formula for the null model is
> y ~ 1
signifying that we use a constant to approximate y. The natural constant is the mean of y. Functions in R that generate models report the statistics that measure their comparison with the null model.
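A short sketch, on simulated data, of the two claims above: the null model's single coefficient is the mean of y, and the F statistic that summary() reports for a linear model is the one obtained by comparing it with the null model. The data and object names are assumptions for illustration.

```r
set.seed(5)
x <- rnorm(60)
y <- 2 + 1.5 * x + rnorm(60)   # invented data

null_fit <- lm(y ~ 1)   # null model: a constant only
fit1     <- lm(y ~ x)

# The fitted constant is the sample mean of y.
print(c(coef = unname(coef(null_fit)), mean = mean(y)))

# The overall F statistic in summary(fit1) is the comparison with the null model.
f_summary <- unname(summary(fit1)$fstatistic["value"])
f_anova   <- anova(null_fit, fit1)$F[2]
print(c(summary = f_summary, anova = f_anova))
```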
Continuous ~ Factors
Analysis of variance is the modeling technique used when the response variable is continuous and all of the explanatory variables are categorical; i.e., factors.
Setup: A continuous variable $Y$ is modeled against a categorical variable $A$.
Anova Model Structure: On each level approximate the $Y$ values by the mean for that level.
Anova Model Structure
Suppose $Y$ is the vector $(y_1, \ldots, y_m)$. In an anova model the fit value for $y_i$ in level $A_j$ is the mean of the y values in level $A_j$. So, the fit vector is a vector of level means. The null model for an anova uses mean(y) to approximate every $y_i$.
An anova model with two levels is basically a t test. The t test assesses whether it is statistically meaningful to consider the group means as different, or to approximate both by the global mean.
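Both statements can be checked on a small simulated example with a two-level factor. The data are invented; the t test is run with var.equal = TRUE because the equivalence with a two-level anova holds for the pooled-variance t test, in which case F equals the square of t.

```r
set.seed(6)
A <- factor(rep(c("a1", "a2"), each = 15))
y <- rnorm(30, mean = ifelse(A == "a1", 10, 11))   # invented level means

fit <- aov(y ~ A)

# The fitted value for each observation is the mean of its level.
level_means <- ave(y, A)
print(all.equal(unname(fitted(fit)), level_means))

# With two levels, the anova F statistic is the square of the pooled t statistic.
f_value <- anova(fit)[["F value"]][1]
t_value <- unname(t.test(y ~ A, var.equal = TRUE)$statistic)
print(c(F = f_value, t_squared = t_value^2))
```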
Setup and Assumptions for a Simple Anova
Consider the continuous variable $Y$, partitioned as $y_{ij}$, the $j$th observation in factor level $i$. Suppose there are $K$ levels.
Assumptions:
- The anova is balanced, meaning every level has the same number $n$ of elements. Let $Y_i = \{ y_{ij} : 1 \le j \le n \}$.
- Each $Y_i$ is normally distributed.
- The variance is the same for every $Y_i$.
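Sums of Squares Measure the Impact of Level Means
Several sums of squares help measure the overall deviation in the Y values, the deviation in the model, and the RSS. Let $\bar{y}_i$ be the mean of $Y_i$ and $\bar{y}$ the mean of $Y$ (all values).
$$SSY = \sum_{i=1}^{K} \sum_{j=1}^{n} (y_{ij} - \bar{y})^2$$
$$SSA = n \sum_{i=1}^{K} (\bar{y}_i - \bar{y})^2$$
$$RSS = SSE = \sum_{i=1}^{K} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$$
The factor $n$ in SSA counts each level mean once per observation in that level, so that $SSY = SSA + SSE$.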
The Statistic to Assess Fit is F
The statistic used to assess the model is calculated from SSA and SSE by adjusting for degrees of freedom. Let $MSA = SSA/(K-1)$ and $MSE = SSE/(K(n-1))$. Define:
$$F = \frac{MSA}{MSE}.$$
Under all of the assumptions on the data, and under the null hypothesis that all of the level means are the same, $F$ follows an F distribution with $K-1$ and $K(n-1)$ degrees of freedom.
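The sums of squares and the F statistic can be computed by hand and compared with R's anova table. This is a sketch on simulated balanced data with K = 3 levels of n = 20 observations; the level means used to generate the data are assumptions. SSA is computed with the factor n, which is what makes SSY = SSA + SSE hold.

```r
set.seed(7)
K <- 3
n <- 20
LVS <- factor(rep(c("A", "B", "C"), each = n))
Y   <- rnorm(K * n, mean = c(10, 11, 12)[as.integer(LVS)])   # invented level means

grand     <- mean(Y)                 # overall mean
lvl_means <- tapply(Y, LVS, mean)    # one mean per level

SSY <- sum((Y - grand)^2)
SSA <- n * sum((lvl_means - grand)^2)
SSE <- sum((Y - ave(Y, LVS))^2)      # ave() gives each observation its level mean

MSA    <- SSA / (K - 1)
MSE    <- SSE / (K * (n - 1))
F_stat <- MSA / MSE
p_val  <- pf(F_stat, K - 1, K * (n - 1), lower.tail = FALSE)

# Compare with the table produced by R.
tab <- anova(aov(Y ~ LVS))
print(tab)
print(c(F_by_hand = F_stat, p_by_hand = p_val))
```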
Anova in R
An anova in R is executed with the aov function. The sample data have a vector Y of values and an associated factor LVS of three levels, each containing 20 samples. As usual, it should be verified that the values on each level are normally distributed and there is a constant variance across the levels. First visualize the relationship between Y and LVS with a boxplot.
> plot(LVS, Y, main = "Boxplot Anova Sample Data")
> abline(h = mean(Y), col = "red")
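One possible way to carry out the two checks mentioned above is sketched here. The lecture's sample data are not available, so Y and LVS are simulated stand-ins with the same shape (three levels, 20 values each). The choice of shapiro.test for normality and bartlett.test for equal variances is an assumption; the slides do not say which checks were used.

```r
set.seed(8)
# Simulated stand-in for the lecture's sample data.
LVS <- factor(rep(c("A", "B", "C"), each = 20))
Y   <- rnorm(60, mean = c(10, 11, 12)[as.integer(LVS)], sd = 1)

# Normality within each level: one Shapiro-Wilk p-value per level.
shapiro_p <- tapply(Y, LVS, function(v) shapiro.test(v)$p.value)
print(shapiro_p)

# Constant variance across levels: Bartlett's test.
bartlett_p <- bartlett.test(Y ~ LVS)$p.value
print(bartlett_p)
```

Small p-values in either check would suggest that the corresponding assumption is doubtful for the data at hand.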
Plot of Sample Data
[Figure: boxplot titled "Boxplot Anova Sample Data", showing Y for levels A, B, C]
Execute Anova and Summarize
> aovfit1 <- aov(Y ~ LVS)
> summary(aovfit1)
            Df Sum Sq Mean Sq F value Pr(>F)
LVS         ... e-06 ***
Residuals   ...
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclude: there is a significant difference in level means.
Plots Assessing the Anova Model
> plot(aovfit1)
[Figure: "Residuals vs Fitted" plot for aov(Y ~ LVS); x-axis: Fitted values, y-axis: Residuals]
Variables May be Components
Frequently, the variables on which a statistical model is generated are components in a data frame. If the data frame dat has components HT and FLD, an anova could be executed as
> aovfld <- aov(HT ~ FLD, data = dat)
All statistical modeling functions accept an optional data argument.
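A minimal sketch of the data argument in use. The data frame dat is built here from simulated values; the component names HT and FLD come from the text, while the values and the level names f1, f2, f3 are invented.

```r
set.seed(9)
# Hypothetical data frame with a continuous component HT and a factor FLD.
dat <- data.frame(
  HT  = rnorm(30, mean = rep(c(150, 155, 160), each = 10), sd = 5),
  FLD = factor(rep(c("f1", "f2", "f3"), each = 10))
)

# HT and FLD are looked up inside dat, not in the workspace.
aovfld <- aov(HT ~ FLD, data = dat)
print(summary(aovfld))

# The same argument works for lm and other modeling functions.
lmfld <- lm(HT ~ FLD, data = dat)
```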
Underlying Linear Model
Actually, this analysis of variance is a form of multiple linear regression, with an intercept for the first level of the factor and an indicator variable for each remaining level. Many features of modeling are the same and R reflects this.
Underlying Linear Model
> summary.lm(aovfit1)
Call: aov(formula = Y ~ LVS)
Residuals: Min 1Q Median 3Q Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) ... ***
LVSB        ... e-05
LVSC        ... e-06
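The regression behind the anova can be made explicit by fitting the same formula with lm and inspecting the design matrix. The data below are a simulated stand-in for the lecture's Y and LVS. With R's default treatment contrasts, the intercept is the mean of the first level and each remaining coefficient is a difference from that mean.

```r
set.seed(10)
# Simulated stand-in for the lecture's sample data.
LVS <- factor(rep(c("A", "B", "C"), each = 20))
Y   <- rnorm(60, mean = c(10, 11, 12)[as.integer(LVS)])

aovfit1 <- aov(Y ~ LVS)
lmfit   <- lm(Y ~ LVS)

# The design matrix has an intercept column and indicators for levels B and C.
X <- model.matrix(lmfit)
print(colnames(X))

# Coefficients expressed through the level means.
lvl_means <- tapply(Y, LVS, mean)
expected  <- c(lvl_means[["A"]],
               lvl_means[["B"]] - lvl_means[["A"]],
               lvl_means[["C"]] - lvl_means[["A"]])
print(rbind(coef = unname(coef(lmfit)), from_means = expected))
```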
Getting (Almost) Real
The assumptions of a classical anova aren't realistic. Various refined methods handle unbalanced data (levels of different sizes), non-normally distributed data, multiple nested factors, readings within a factor collected over time (longitudinal data), and pseudo-replication.
The types of models to reference are: split-plot, random effects, nested design, mixed models.
These methods can require significant care in defining the model formula and interpreting the result.
Kruskal-Wallis Test
Just as an anova is a multi-level t test, the Kruskal-Wallis test is a multi-level version of the Mann-Whitney test. This is a non-parametric test that does not assume normality of the errors. It sums ranks as in Mann-Whitney. For example, the Kruskal-Wallis test applied to the earlier data is executed by
> ktest <- kruskal.test(Y ~ LVS)
ktest will be an htest object, like that generated by t.test.
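A brief sketch of the call and of the pieces of the resulting htest object. The lecture's data are not available, so Y and LVS are simulated here, deliberately from a skewed (exponential) distribution to mimic a case where normality is doubtful.

```r
set.seed(11)
# Simulated, non-normal stand-in data: shifted exponential values per level.
LVS <- factor(rep(c("A", "B", "C"), each = 20))
Y   <- rexp(60, rate = 1) + c(0, 0.5, 1)[as.integer(LVS)]

ktest <- kruskal.test(Y ~ LVS)
print(class(ktest))      # an object of class "htest"

# Components shared with other htest objects such as those from t.test.
print(ktest$statistic)   # Kruskal-Wallis chi-squared
print(ktest$parameter)   # degrees of freedom: number of levels minus one
print(ktest$p.value)
```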
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate
Multiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F
Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,
N-Way Analysis of Variance
N-Way Analysis of Variance 1 Introduction A good example when to use a n-way ANOVA is for a factorial design. A factorial design is an efficient way to conduct an experiment. Each observation has data
ANOVA. February 12, 2015
ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R
Recall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups
One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The
We extended the additive model in two variables to the interaction model by adding a third term to the equation.
Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic
Comparing Nested Models
Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
Generalized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
Multivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Penalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood
Univariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
Two-way ANOVA and ANCOVA
Two-way ANOVA and ANCOVA In this tutorial we discuss fitting two-way analysis of variance (ANOVA), as well as, analysis of covariance (ANCOVA) models in R. As we fit these models using regression methods
11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
Chapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
An analysis method for a quantitative outcome and two categorical explanatory variables.
Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that
Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model
Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity
Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
CS 147: Computer Systems Performance Analysis
CS 147: Computer Systems Performance Analysis One-Factor Experiments CS 147: Computer Systems Performance Analysis One-Factor Experiments 1 / 42 Overview Introduction Overview Overview Introduction Finding
Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science [email protected]
Dept of Information Science [email protected] October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic
Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
1 Theory: The General Linear Model
QMIN GLM Theory - 1.1 1 Theory: The General Linear Model 1.1 Introduction Before digital computers, statistics textbooks spoke of three procedures regression, the analysis of variance (ANOVA), and the
Regression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
Factors affecting online sales
Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4
Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
10. Analysis of Longitudinal Studies Repeat-measures analysis
Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.
ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall
Module 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
Mathematics within the Psychology Curriculum
Mathematics within the Psychology Curriculum Statistical Theory and Data Handling Statistical theory and data handling as studied on the GCSE Mathematics syllabus You may have learnt about statistics and
Elementary Statistics Sample Exam #3
Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to
Correlation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
Example G Cost of construction of nuclear power plants
1 Example G Cost of construction of nuclear power plants Description of data Table G.1 gives data, reproduced by permission of the Rand Corporation, from a report (Mooz, 1978) on 32 light water reactor
Descriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
One-Way Analysis of Variance (ANOVA) Example Problem
One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
Elements of statistics (MATH0487-1)
Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -
CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES
Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical
Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance
Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance 14 November 2007 1 Confidence intervals and hypothesis testing for linear regression Just as there was
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr. arr.). Organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
HLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
How To Understand Multivariate Models
Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures 1 Introduction 1.1 Overview 1.2 Multivariate Models
Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini
NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building
From the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
TOPIC 12 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these
A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women's Role in Society, and Colonic Polyps
Electronic Thesis and Dissertations UCLA
Electronic Thesis and Dissertations UCLA Peer Reviewed Title: A Multilevel Longitudinal Analysis of Teaching Effectiveness Across Five Years Author: Wang, Kairong Acceptance Date: 2013 Series: UCLA Electronic
Statistics in Retail Finance. Chapter 2: Statistical models of default
Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision
Introduction to Analysis of Variance (ANOVA) Limitations of the t-test
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only
Introduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)
Simple Second Order Chi-Square Correction
Simple Second Order Chi-Square Correction Tihomir Asparouhov and Bengt Muthén May 3, 2010 1 1 Introduction In this note we describe the second order correction for the chi-square statistic implemented
Statistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
International Statistical Institute, 56th Session, 2007: Phil Everson
Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: [email protected] 1. Introduction
Section 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
Part 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
Chapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,
GLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
T-test & factor analysis
Parametric tests T-test & factor analysis Better than non parametric tests Stringent assumptions More strings attached Assumes population distribution of sample is normal Major problem Alternatives Continue
Multiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
Chapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 13 Student Lecture Notes 13-1 Chapter 13 Introduction to Linear Regression and Correlation Analysis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaying
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Normality Testing in Excel
Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. [email protected]
Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]
Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance
Final Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
Multivariate Analysis of Variance (MANOVA): I. Theory
Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the
Developing Risk Adjustment Techniques Using the SAS® System for Assessing Health Care Quality in the IMSystem®
Developing Risk Adjustment Techniques Using the SAS® System for Assessing Health Care Quality in the IMSystem® Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS
Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical
Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance
2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there's no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample
Randomized Block Analysis of Variance
Chapter 565 Randomized Block Analysis of Variance Introduction This module analyzes a randomized block analysis of variance with up to two treatment factors and their interaction. It provides tables of
a) perfect linear correlation (r = 1) b) no correlation (r = 0) c) positive correlation (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
Poisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
Projects Involving Statistics (& SPSS)
Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,
CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGRESSION SIMPLE LINEAR REGRESSION Simple Regression Linear Regression Simple Regression Definition A regression model is a mathematical equation that describes the
EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION
EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
DATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books An easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
Comparison of Estimation Methods for Complex Survey Data Analysis
Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA 90066.
