Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang
|
|
|
- Garey Glenn
- 9 years ago
- Views:
Transcription
1 Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang School of Computing Sciences, UEA University of East Anglia Brief Introduction to R R is a free open source statistics and mathematical computing environment. The R project website is A brief introduction to R can be found from its web site: Installing R for Windows You can download and install the current Windows version of R (the latest versions is 2.9.1) from above web site (or clink the link below). You can either run the program, or save the program to your computer and then run it to install R. When installing, you can probably just accept the default settings. Current Windows Version of R Further Details You can see the following link with more details about the latest version of R. The link called r-patched release allows you to install a version that contains minor patches since the current official release (but note that the patched version is not the official version). The link called r- devel release is a version that is under development and likely contains bugs. CRAN Page with latest R version and associated files and instructions Start R Once you have installed it, you will get a window with a prompt R 2.9.0, clink it and R windows will be up running. You can then enter help.start()at the prompt to open a html help page. Help is available in R for most functions using the help command, e.g. for help on setwd type help(setwd). If you type a function name without the (), the R code for the function is displayed, if you type an object name the contents of the object are displayed. Enter getwd() at the prompt. It tells you the default directory R will load files from and save them to. Create a directory in your C drive, call it rfiles. Now enter setwd("c:/rfiles") at the R prompt, or in RGui, click File and choose Change dir to change the working directory. Enter getwd() to check. Note that in R you use forward slash / as separator, not backslash \ as in Windows. Quit R Under the R window, you can stop the R by either (1) clicking File and then Exit, (2) or issuing a command: >q() You will be prompted for saving your work environment, save it if you wish. 1
2 1. Basic Statistics and Analysis of Variance For this example, we will use one of R's built in datasets, GrowthData. > help(plantgrowth) The help page will tell you that this data shows yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions. Let's look at the data: > PlantGrowth weight group ctrl ctrl trt trt trt trt2 > summary(plantgrowth) weight group Min. :3.590 ctrl:10 1st Qu.:4.550 trt1:10 Median :5.155 trt2:10 Mean : rd Qu.:5.530 Max. :6.310 As you can see, there are only 30 data points, 10 in each of the three categories. This is a bit small, but anyway, we use it as a data set to learn basic functions of R. Note: you can use command help(datasets) or library(help= datasets ) to see all the builtin data sets in R Datasets Package. Standard Deviations What are the standard deviations of the plant weights - overall and for the three groups? Using the sd() function to compute the standard deviation of a given attribute, Using the tapply(x, group, FUN) function to apply the provided function FUN to each of the groups, e.g. tapply(plantgrowth$weight, PlantGrowth$group, sd) to apply the function sd to calculate the standard deviation for the weight values of each group. > sd(plantgrowth$weight) [1] > tapply(plantgrowth$weight, PlantGrowth$group, sd) ctrl trt1 trt Can we assume a constant spread across the groups? 2
3 Simple Plots Using the boxplot90 function to plot the box and whiskers plot for the data: > boxplot(weight ~ group, data=plantgrowth, col="blue") Notice that the largest weight in Treatment One has been shown as an outlier in the box plot (above). Lets draw a conditional histogram: > library(lattice) > histogram(~ weight group, data=plantgrowth, col="blue", type="count") 3
4 Can we assume we have normal distributions here? There are some skews, and Treatment One might actually be bimodal. Hypothesis Testing The experimental hypothesis would be that the treatment affects the plant growth. Looking at the plots this looks reasonable - Treatment One (trt1) seems to have reduced growth relative to the control (ctrl) while Treatment Two (trt2) has improved it. But can you show this with a statistical analysis? Our null hypothesis is that the three groups have the same mean. Let's investigate this with an ANOVA/F-test. This first example uses the aov() function to build a model and analyse it. We write weight ~ group to say the response variable weight depends on the predictor variable group. Type help("~") and help(formula) for more details. > analysis <- aov(weight ~ group, data=plantgrowth) > summary(analysis) Df Sum Sq Mean Sq F value Pr(>F) group * Residuals Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 In the next example, we first build a linear model (using the function lm() command) and then use the anova() command to analyse that. > fit <- lm(weight ~ group, data=plantgrowth) > anova(fit) Analysis of Variance Table 4
5 Response: weight Df Sum Sq Mean Sq F value Pr(>F) group * Residuals Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Reassuringly both methods give the same answer. The p-value for the F-statistic is (i.e. 1.6%), which R has marked with a single asterix because it is in the range 0.01 to As a rule of thumb this would be interpreted as only "weak" evidence to reject the null hypothesis (that the three groups have equal means). i.e. There appears to be only weak statistical support for the experimental hypothesis that the treatment affects the plant growth. Maybe they should grow some more plants (use a bigger dataset). However, is there any reason to doubt the validity conclusion? Does the data satisfy the model requirements? We have to assume the treatment groups are independent. What about normality? Looking at the graphs maybe not, but its hard to say from just ten data points. Looking at the standard deviations, can we assume a constant spread (homogeneity of variance)? Treatment One (0.79) is almost double Treatment Two (0.44) Scatter Plots and Linear Regressions Given two vectors x and y describing a set of points, plot(x,y) is all you need to draw an x-y scatter plot. To demonstrate this, first of all I'm going to create some fake data. For x I will simply have the integers 1 to 100, while for y I am going to use x plus some random noise. To do that, I generate 100 random numbers using a normal distribution with mean zero and standard deviation ten: > x <- c(1:100) > y <- x + rnorm(100, mean=0, sd=10) > plot(x,y) If you already had a graph on display, this will replace it; otherwise a second window will appear: 5
6 There are several ways of calculating a best fit line (linear regression), for example using the least squares fit lsfit() function: > fit <- lsfit(x,y) > fit$coefficients Intercept X These values are very close to what you would expect for this artificial data (intercept zero and gradient one). Once you have the coefficients, use the abline() function to actually draw it: > abline(fit, col="blue") This will update the existing graph (assuming you didn't close that window!): 6
7 An equivalent result is obtained using the lm(), or linear model, function: > fit <- lm(y~x) > fit$coefficients (Intercept) x This uses R's model definition syntax, where we write y ~ x to say y depends on x. Type help("~") and help(formula) for more details. 3. R Scripts You can put your commands in the order you wish to run them in the command lines in R Console into a simple text file and save it as an R script file, and run it later any time. e.g. Create or Edit R Script: In R console, RGui, click File, then choose New Script, then R-Editor window appears (you may use any other text editor), you can write commands there and save it in your R work directory, e.g. you may put some the above commands into an R script and save it as ScatterPlotLRegr.R # any things after # are the comments # An R script for Scater plot and linear regression # script file: ScatterPlotLRegr.R x <- c(1:100) y <- x + rnorm(100, mean=0, sd=10) plot(x,y) fit <- lsfit(x,y) abline(fit, col="blue ) Run a R script: You can then run it in R console by using the function >source("scatterplotlregr.r") 7
8 4. Perform statistical tests on a real data set In this exercise, you will use R to perform some simple statistical analysis and hypothesis testing on a real data set, i.e. twins data. The data were generated from a study 1 that was carried out by Jensen in 1974 on the topic that if a person s intelligence is primarily influenced by genetics or the environment, a long standing debatable issue. The study was set up to investigate the IQ scores of twins who were raised either with their biological parents or in foster families. Copy the data set twins.txt file (it will be provided) to your C:/rfiles folder Load the data set Now load the twins data by entering the following: twins.data<-read.table("twins.txt",header=true,row.names=1) (Note that the file name is enclosed in "". You can read files from any directory by using the full path name, for this exercise we will only use the default directory.) The read.table command loads the file as a data.frame object called twins.data. This is like a table, and we can perform functions on the data in it. We have used the options header=true to tell R that the first row contains the header, and row.names = 1 to tell R that the first column is not data, but the names of the rows (in this particular dataset they are numbers). Enter twins.data to see its contents. You can also look at part of a data.frame, e.g. enter twins.data[1:5,] just to see the first 5 rows. (In R we refer to the rows and columns of data.frames using [row,column], in this example the column section is empty, which means all columns are included. Series of rows and columns are referred to using a colon, here 1:5 means rows 1 to 5. The order can be reversed, e.g. 5:1. If several are required use c( ), e.g. twins.data[c(1,2,7,5),] will display rows 1, 2, 7 and 5. Any order can be used.) The columns in a data.frame can also be named. We did this when we loaded twins.txt. We can refer to the columns directly by their names, by entering the object name joined with $name (try entering twins.data$home and twins.data$foster). Note that all object and function names in R are case-sensitive. 4.2 Plotting data. R is very good for plotting graphs. A simple scatter plot can be produced by entering plot(twins.data), and a boxplot with boxplot(twins.data). A plot can be copied directly into programs such as Word as follows: right-click the plot, select Copy as metafile, open a Word document, then use the Word paste command. 1 Jensen, A.R. Behavioral Genetics, 4 (1), pp1-28,
9 (To use R plots in LaTeX right-click the plot, select Save as postscript type a name for the file and save it. Then you can include the file in a LaTeX document in the usual way.) 4.3. Statistical tests: t-test. Most statistical functions are included in the basic R installation. In addition many additional packages providing additional functions have been published and are available via the R project website. Now we will perform a t-test. The command in R for the t-test is t.test(). Perform the t-test by entering t.test(...). (You will need to replace... by the parameters for your data etc. between the (). Use the R help command, as described previously, to open the help page for the t-test, to find out how to use it.) If you simply enter the command t.test(...) the results will be printed at the prompt. It is also possible to store the results of an R function as an object. Enter: twins.ttest<-t.test(...) to store the results of the function t.test in an object which is to be called twins.ttest. Now enter twins.ttest to print the results. The t-test in R returns an object of class htest. This type of object is produced by several of the statistical tests in R and can contain several pieces of information. The parts of an R object can be given names (as we did with the columns in twins.data). To see the names for twins.ttest enter names(twins.ttest) It is possible to get just part of an object by using the name. To do this enter the object name joined with $name, e.g. enter twins.ttest$statistic This allows you to extract specific parts of an object, and is most useful if you write your own methods. For example, if you need to batch process multiple data sets, but you only need part of the outputs of the functions you are using. 4.4 Write R Scripts Write an R script to perform above tasks, i.e. load the data, plot the data and do t-test, and save it with a name twinsdataanalsis.r, into your rfiles folder. Try to run it to see if it works. 9
Multiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate
Comparing Nested Models
Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller
Psychology 205: Research Methods in Psychology
Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
Descriptive Statistics: Summary Statistics
Tutorial for the integration of the software R with introductory statistics Copyright Grethe Hystad Chapter 2 Descriptive Statistics: Summary Statistics In this chapter we will discuss the following topics:
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce
Final Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
Using R for Windows and Macintosh
2010 Using R for Windows and Macintosh R is the most commonly used statistical package among researchers in Statistics. It is freely distributed open source software. For detailed information about downloading
5 Correlation and Data Exploration
5 Correlation and Data Exploration Correlation In Unit 3, we did some correlation analyses of data from studies related to the acquisition order and acquisition difficulty of English morphemes by both
Least Squares Regression. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University [email protected].
Least Squares Regression Alan T. Arnholt Department of Mathematical Sciences Appalachian State University [email protected] Spring 2006 R Notes 1 Copyright c 2006 Alan T. Arnholt 2 Least Squares
OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE
OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-110012 1. INTRODUCTION R is a free software environment for statistical computing
KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
MIXED MODEL ANALYSIS USING R
Research Methods Group MIXED MODEL ANALYSIS USING R Using Case Study 4 from the BIOMETRICS & RESEARCH METHODS TEACHING RESOURCE BY Stephen Mbunzi & Sonal Nagda www.ilri.org/rmg www.worldagroforestrycentre.org/rmg
Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.
1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis
N-Way Analysis of Variance
N-Way Analysis of Variance 1 Introduction A good example when to use a n-way ANOVA is for a factorial design. A factorial design is an efficient way to conduct an experiment. Each observation has data
E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F
Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,
Getting Started with R and RStudio 1
Getting Started with R and RStudio 1 1 What is R? R is a system for statistical computation and graphics. It is the statistical system that is used in Mathematics 241, Engineering Statistics, for the following
1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection
Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:
Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:
5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
Using R for Linear Regression
Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional
1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
We extended the additive model in two variables to the interaction model by adding a third term to the equation.
Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.
SPSS Guide: Regression Analysis
SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar
Simple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
Projects Involving Statistics (& SPSS)
Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,
Correlation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected]
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected] Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
Lets suppose we rolled a six-sided die 150 times and recorded the number of times each outcome (1-6) occured. The data is
In this lab we will look at how R can eliminate most of the annoying calculations involved in (a) using Chi-Squared tests to check for homogeneity in two-way tables of catagorical data and (b) computing
How Does My TI-84 Do That
How Does My TI-84 Do That A guide to using the TI-84 for statistics Austin Peay State University Clarksville, Tennessee How Does My TI-84 Do That A guide to using the TI-84 for statistics Table of Contents
Data Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
Using Excel for inferential statistics
FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied
This chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
TI-Inspire manual 1. Instructions. Ti-Inspire for statistics. General Introduction
TI-Inspire manual 1 General Introduction Instructions Ti-Inspire for statistics TI-Inspire manual 2 TI-Inspire manual 3 Press the On, Off button to go to Home page TI-Inspire manual 4 Use the to navigate
Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller
Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller Most of you want to use R to analyze data. However, while R does have a data editor, other programs such as excel are often better
Factors affecting online sales
Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4
Univariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
The Dummy s Guide to Data Analysis Using SPSS
The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.
Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged
Mixed 2 x 3 ANOVA. Notes
Mixed 2 x 3 ANOVA This section explains how to perform an ANOVA when one of the variables takes the form of repeated measures and the other variable is between-subjects that is, independent groups of participants
4 Other useful features on the course web page. 5 Accessing SAS
1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu
SPSS Explore procedure
SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
Part 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance
Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance 14 November 2007 1 Confidence intervals and hypothesis testing for linear regression Just as there was
ABSORBENCY OF PAPER TOWELS
ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?
Time Series Analysis of Aviation Data
Time Series Analysis of Aviation Data Dr. Richard Xie February, 2012 What is a Time Series A time series is a sequence of observations in chorological order, such as Daily closing price of stock MSFT in
Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
Time Series Analysis with R - Part I. Walter Zucchini, Oleg Nenadić
Time Series Analysis with R - Part I Walter Zucchini, Oleg Nenadić Contents 1 Getting started 2 1.1 Downloading and Installing R.................... 2 1.2 Data Preparation and Import in R.................
CALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
Section 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
Using SPSS, Chapter 2: Descriptive Statistics
1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,
2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
Module 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
R Commander Tutorial
R Commander Tutorial Introduction R is a powerful, freely available software package that allows analyzing and graphing data. However, for somebody who does not frequently use statistical software packages,
There are six different windows that can be opened when using SPSS. The following will give a description of each of them.
SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet
Chapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
Simple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means
Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
Regression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
Generalized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
Section 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression
Pearson's Correlation Tests
Chapter 800 Pearson's Correlation Tests Introduction The correlation coefficient, ρ (rho), is a popular statistic for describing the strength of the relationship between two variables. The correlation
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
Introduction to RStudio
Introduction to RStudio (v 1.3) Oscar Torres-Reyna [email protected] August 2013 http://dss.princeton.edu/training/ Introduction RStudio allows the user to run R in a more user-friendly environment.
Module 5: Statistical Analysis
Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the
Beginner s Matlab Tutorial
Christopher Lum [email protected] Introduction Beginner s Matlab Tutorial This document is designed to act as a tutorial for an individual who has had no prior experience with Matlab. For any questions
BIOL 933 Lab 6 Fall 2015. Data Transformation
BIOL 933 Lab 6 Fall 2015 Data Transformation Transformations in R General overview Log transformation Power transformation The pitfalls of interpreting interactions in transformed data Transformations
EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION
EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day
An introduction to using Microsoft Excel for quantitative data analysis
Contents An introduction to using Microsoft Excel for quantitative data analysis 1 Introduction... 1 2 Why use Excel?... 2 3 Quantitative data analysis tools in Excel... 3 4 Entering your data... 6 5 Preparing
Multivariate Analysis of Variance (MANOVA)
Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various
R: A self-learn tutorial
R: A self-learn tutorial 1 Introduction R is a software language for carrying out complicated (and simple) statistical analyses. It includes routines for data summary and exploration, graphical presentation
The F distribution and the basic principle behind ANOVAs. Situating ANOVAs in the world of statistical tests
Tutorial The F distribution and the basic principle behind ANOVAs Bodo Winter 1 Updates: September 21, 2011; January 23, 2014; April 24, 2014; March 2, 2015 This tutorial focuses on understanding rather
MULTIPLE REGRESSION EXAMPLE
MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if
Analysis of Variance. MINITAB User s Guide 2 3-1
3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced
II. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
Chapter 7 Section 1 Homework Set A
Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the
12: Analysis of Variance. Introduction
1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider
Inference for two Population Means
Inference for two Population Means Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison October 27 November 1, 2011 Two Population Means 1 / 65 Case Study Case Study Example
Simple Linear Regression, Scatterplots, and Bivariate Correlation
1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
Using Excel in Research. Hui Bian Office for Faculty Excellence
Using Excel in Research Hui Bian Office for Faculty Excellence Data entry in Excel Directly type information into the cells Enter data using Form Command: File > Options 2 Data entry in Excel Tool bar:
Two-Sample T-Tests Assuming Equal Variance (Enter Means)
Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of
Minitab Tutorials for Design and Analysis of Experiments. Table of Contents
Table of Contents Introduction to Minitab...2 Example 1 One-Way ANOVA...3 Determining Sample Size in One-way ANOVA...8 Example 2 Two-factor Factorial Design...9 Example 3: Randomized Complete Block Design...14
Testing for differences I exercises with SPSS
Testing for differences I exercises with SPSS Introduction The exercises presented here are all about the t-test and its non-parametric equivalents in their various forms. In SPSS, all these tests can
NCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
