Getting Started in R~Stata Notes on Exploring Data
|
|
|
- Francine Dorsey
- 10 years ago
- Views:
Transcription
1 Getting Started in R~ Notes on Exploring Data (v. 1.0) Oscar Torres-Reyna Fall
2 What is R/? What is R? R is a language and environment for statistical computing and graphics * R is offered as open source (i.e. free) What is? It is a multi-purpose statistical package to help you explore, summarize and analyze datasets. A dataset is a collection of several pieces of information called variables (usually arranged by columns). A variable can have one or several values (information for one or several cases). Other statistical packages are SPSS and SAS. Features SPSS SAS R Learning curve Steep/gradual Gradual/flat Pretty steep Pretty steep User interface Programming/point-and-click Mostly point-and-click Programming Programming Data manipulation Very strong Moderate Very strong Very strong Data analysis Powerful Powerful Powerful/versatile Powerful/versatile Graphics Very good Very good Good Excellent Cost Affordable (perpetual licenses, renew only when upgrade) Expensive (but not need to renew until upgrade, long term licenses) Expensive (yearly renewal) Open source NOTE: The R content presented in this document is mostly based on an early version of Fox, J. and Weisberg, S. (2011) An R Companion to Applied Regression, Second Edition, Sage; and from class notes from the ICPSR s workshop Introduction to the R Statistical Computing Environment taught by John Fox during the summer of *
3 This is the R screen in Multiple-Document Interface (MDI)
4 This is the R screen in Single-Document Interface (SDI)
5 12/13+ screen Variables in dataset here Output here History of commands, this window????? Files will be saved here Write commands here Property of each variable here PU/DSS/OTR
6 getwd() # Shows the working directory (wd) setwd("c:/myfolder/data") # Changes the wd setwd("h:\\myfolder\\data") # Changes the wd R install.packages("abc") # This will install the package -ABC--. A window will pop-up, select a mirror site to download from (the closest to where you are) and click ok. library(abc) # Load the package -ABC- to your workspace in R?plot # Get help for an object, in this case for the -plot function. You can also type: help(plot)??regression # Search the help pages for anything that has the word "regression". You can also type: help.search("regression") apropos("age") # Search the word "age" in the objects available in the current R session. help(package=car) # View documentation in package car. You can also type: library(help="car ) Working directory Installing packages/user-written programs Getting help pwd /*Shows the working directory*/ cd c:\myfolder\data /*Changes the wd*/ cd c:\myfolder\stata data /*Notice the spaces*/ ssc install abc /*Will install the user-defined program abc. It will be ready to run. findit abc /*Will do an online search for program abc or programs that include abc. It also searcher your computer. help tab search regression word regression */ /* Get help on the command tab */ /* Search the keywords for the hsearch regression /* Search the help files for the work regression. It provides more options than search */ help(dataabc) # Access codebook for a dataset called DataABC in the package ABC 6
7 # Select the table from the excel file, copy, go to the R Console and type: mydata <- read.table("clipboard", header=true, sep="\t") R # Reading the data directly Data from *.csv (copy-and-paste) /* Select the table from the excel file, copy, go to, in the command line type: edit Data from *.csv /*The data editor will pop-up and paste the data (Ctrl-V). Select the link for to include variable names /* In the command line type */ mydata <- read.csv("c:\mydata\mydatafile.csv", header=true) # The will open a window to search for the *.csv file. mydata <- read.csv(file.choose(), header = TRUE) insheet using "c:\mydata\mydatafile.csv" /* Using the menu */ Go to File->Import-> ASCII data created by spreadsheet. Click on Browse to find the file and then OK. Data from/to *.txt (space, tab, comma-separated) # In the example above, variables have spaces and missing data is coded as -9 mydata <- read.table(("c:/myfolder/abc.txt", header=true, sep="\t", na.strings = "-9") # Export the data write.table(mydata, file = "test.txt", sep = "\t") /* See insheet above */ infile var1 var2 str7 var3 using abc.raw /* Variables with embedded spaces must be enclosed in quotes */ # Export data 7 outsheet using "c:\mydata\abc.csv"
8 R install.packages("foreign") # Need to install package -foreign - first (you do this only once). Data from/to SPSS /* Need to install the program usespss (you do this only once) */ library(foreign) # Load package -foreign-- mydata.spss <- read.spss(" a.sav", to.data.frame = TRUE, use.value.labels=true, use.missings = to.data.frame) # Where: # # to.data.frame return a data frame. # # use.value.labels Convert variables with value labels into R factors with those levels. # # use.missings logical: should information on user-defined missing values be used to set the corresponding values to NA. Source: type?read.spss write.foreign(mydata, codefile="test2.sps", datafile="test2.raw", package= SPSS") # Provides a syntax file (*.sps) to read the *.raw data file ssc install usespss /* To read the *.sav type (in one line): usespss using /* For additional information type */ help usespss Note: This does not work with SPSS portable files (*.por) /* does not convert files to SPSS. You need to save the data file as a file version 9 that can be read by SPSS v15.0 or later*/ /* From type: */ saveold mydata.dta /* Saves data to v.9 for SPSS 8
9 R # To read SAS XPORT format (*.xpt) library(foreign) # Load package -foreign-- mydata.sas <- read.xport(" ta.xpt") # Does not work for files online mydata.sas <- read.xport("c:/myfolder/mydata.xpt") # Using package -Hmisc library(hmisc) mydata <- sasxport.get( ata.xpt) # It works write.foreign(mydata, codefile="test2.sas", datafile="test2.raw", package= SAS") # Provide a syntax file (*.sas) to read the *.raw data Data from/to SAS /*If you have a file in SAS XPORT format (*.xpt) you can use fdause (or go to File->Import). */ fdause "c:/myfolder/mydata.xpt /* Type help fdause for more details */ /* If you have SAS installed in your computer you can use the program usesas, which you can install by typing: */ ssc install usesas /* To read the *.sas7bcat type (in one line): */ usesas using "c:\mydata.sas7bdat /* You can export a dataset as SAS XPORT by menu (go to File->Export) or by typing */ fdasave "c:/myfolder/mydata1.xpt /* Type help fdasave for more details */ NOTE: As an alternative, you can use SAS Universal Viewer (freeware from SAS) to read SAS files and save them as *.csv. Saving the file as *.csv removes variable/value labels, make sure you have the codebook available. 9
10 library(foreign) # Load package -foreign-- R mydata <- read.dta(" ts.dta") mydata.dta <- read.dta(" convert.factors=true, convert.dates=true, convert.underscore=true, warn.missing.labels=true) Data from/to /* To open a file go to File -> Open, or type: */ use "c:\myfolder\mydata.dta" Or use " /* If you need to load a subset of a data file type */ use var1 var2 using "c:\myfolder\mydata.dta" # Where (source: type?read.dta) # convert.dates. Convert dates to Date class # convert.factors. Use value labels to create factors? (version 6.0 or later). # convert.underscore. Convert "_" in variable names to "." in R names? # warn.missing.labels. Warn if a variable is specified with value labels and those value labels are not present in the file write.dta(mydata, file = "test.dta") # Direct export to write.foreign(mydata, codefile="test1.do", datafile="test1.raw", package="") # Provide a do-file to read the *.raw data use id city state gender using "H:\Work\Marshall\Fall10\Session1-2\\Students.dta", clear /* To save a dataset as file got File -> Save As, or type: */ save mydata, replace save, replace /*If the fist time*/ /*If already saved as file*/ 10
11 load("mydata.rdata") load("mydata.rda") R Data from/to R /* can t read R data files */ /* Add path to data if necessary */ save.image("mywork.rdata") # Saving all objects to file *.RData save(object1, object2, file= mywork.rda") # Saving selected objects 11
12 R mydata.dat <- read.fwf(file=" ydata.dat", width=c(7, -16, 2, 2, -4, 2, -10, 2, -110, 3, -6, 2), col.names=c("w","y","x1","x2","x3", "age", "sex"), n=1090) Data from ACII Record form /* Using infix */ infix var1 1-7 var str2 var var str2 var var var using " /* Using infile */ # Reading ASCII record form, numbers represent the width of variables, negative sign excludes variables not wanted (you must include these). # To get the width of the variables you must have a codebook for the data set available (see an example below). # To get the widths for unwanted spaces use the formula: dictionary using c:\data\mydata.dat { _column(1) var1 %7.2f " Label for var1 " _column(24) var2 %2f " Label for var2 " _column(26) str2 var3 %2s " Label for var3 " _column(32) var4 %2f " Label for var4 " _column(44) str2 var5 %2s " Label for var5 " _column(156) str3 var5 %3s " Label for var6 " _column(165) str2 var5 %2s " Label for var7 " } /*Do not forget to close the brackets and press enter after the last bracket*/ Save it as mydata.dct Start of var(t+1) End of var(t) - 1 With infile we run the dictionary by typing: infile using c:\data\mydata For other options check *Thank you to Scott Kostyshak for useful advice/code. Var Rec Start End Format Data locations usually available in codebooks var F7.2 var F2.0 var A2 var F2.0 var A2 var A3 var A2 12
13 R str(mydata) # Provides the structure of the dataset summary(mydata) # Provides basic descriptive statistics and frequencies names(mydata) # Lists variables in the dataset head(mydata) # First 6 rows of dataset head(mydata, n=10)# First 10 rows of dataset head(mydata, n= -10) # All rows but the last 10 tail(mydata) # Last 6 rows tail(mydata, n=10) # Last 10 rows tail(mydata, n= -10) # All rows but the first 10 mydata[1:10, ] # First 10 rows of the mydata[1:10,1:3] # First 10 rows of data of the first 3 variables edit(mydata) # Open data editor sum(is.na(mydata))# Number of missing in dataset rowsums(is.na(data))# Number of missing per variable rowmeans(is.na(data))*length(data)# No. of missing per row mydata[mydata$age=="& ","age"] <- NA # NOTE: Notice hidden spaces. mydata[mydata$age==999,"age"] <- NA The function complete.cases() returns a logical vector indicating which cases are complete. # list rows of data that have missing values mydata[!complete.cases(mydata),] The function na.omit() returns the object with listwise deletion of missing values. # create new dataset without missing data newdata <- na.omit(mydata) Exploring data Missing data describe /* Provides the structure of the dataset*/ summarize /* Provides basic descriptive statistics for numeric data*/ ds /* Lists variables in the dataset */ list in 1/6 /* First 6 rows */ edit /* Open data editor (double-click to edit*/ browse /* Browse data */ mydata <- edit(data.frame()) tabmiss /* # of missing. Need to install, type scc install tabmiss. Also try findit tabmiss and follow instructions */ /* For missing values per observation see the function rowmiss and the egen command*/ 13
14 #Using base commands R Renaming variables edit /* Open data editor (double-click to edit) fix(mydata) # Rename interactively. names(mydata)[3] <- "First" # Using library -reshape-- library(reshape) mydata <- rename(mydata, c(last.name="last")) mydata <- rename(mydata, c(first.name="first")) mydata <- rename(mydata, c(student.status="status")) mydata <- rename(mydata, c(average.score..grade.="score")) mydata <- rename(mydata, c(height..in.="height")) mydata <- rename(mydata, c(newspaper.readership..times.wk.="read")) Variable labels rename oldname newname rename lastname last rename firstname first rename studentstatus status rename averagescoregrade score rename heightin height rename newspaperreadershiptimeswk read Use variable names as variable labels /* Adding labels to variables */ label variable w "Weight" label variable y "Output" label variable x1 "Predictor 1" label variable x2 "Predictor 2" label variable x3 "Predictor 3" label variable age "Age" label variable sex "Gender" 14
15 R # Use factor() for nominal data Value labels /* Step 1 defining labels */ mydata$sex <- factor(mydata$sex, levels = c(1,2), labels = c("male", "female")) # Use ordered() for ordinal data mydata$var2 <- ordered(mydata$var2, levels = c(1,2,3,4), labels = c("strongly agree", "Somewhat agree", "Somewhat disagree", "Strongly disagree")) mydata$var8 <- ordered(mydata$var2, levels = c(1,2,3,4), labels = c("strongly agree", "Somewhat agree", "Somewhat disagree", "Strongly disagree")) # Making a copy of the same variable label define approve 1 "Approve strongly" 2 "Approve somewhat" 3 "Disapprove somewhat" 4 "Disapprove strongly" 5 "Not sure" 6 "Refused" label define well 1 "Very well" 2 "Fairly well" 3 "Fairly badly" 4 "Very badly" 5 "Not sure" 6 "Refused" label define partyid 1 "Party A" 2 "Party B" 3 "Equally party A/B" 4 "Third party candidates" 5 "Not sure" 6 "Refused" label define gender 1 "Male" 2 "Female /* Step 2 applying labels */ label values y approve label values x1 approve tab x1 d x1 destring x1, replace label values x1 approve label values x2 well label values x3 partyid tab x3 destring x3, replace ignore(&) label values x3 partyid tab x3 label values sex gender tab1 y x1 x2 x3 age sex 15
16 # Creating a variable with a sequence of numbers or to index R Creating ids/sequence of numbers /* Creating a variable with a sequence of numbers or to index */ # Creating a variable with a sequence of numbers from 1 to n (where n is the total number of observations) mydata$id <- seq(dim(mydata)[1]) # Creating a variable with the total number of observations mydata$total <- dim(mydata)[1] /* Creating a variable with a sequence of numbers from 1 to n per category (where n is the total number of observations in each category)(1) */ mydata <- mydata[order(mydata$group),] idgroup <- tapply(mydata$group, mydata$group, function(x) seq(1,length(x),1)) mydata$idgroup <- unlist(idgroup) /* Creating a variable with a sequence of numbers from 1 to n (where n is the total number of observations) */ gen id = _n /* Creating a variable with the total number of observations */ gen total = _N /* Creating a variable with a sequence of numbers from 1 to n per category (where n is the total number of observations in each category) */ bysort group: gen id = _n For more info see: (1) Thanks to Alex Acs for the code 16
17 R library(car) mydata$age.rec <- recode(mydata$age, "18:19='18to19'; 20:29='20to29'; 30:39='30to39'") Recoding variables recode age (18 19 = 1 "18 to 19") /// (20/28 = 2 "20 to 29") /// (30/39 = 3 "30 to 39") (else=.), generate(agegroups) label(agegroups) mydata$age.rec <- as.factor(mydata$age.rec) mydata$age.rec <- NULL mydata$var1 <- mydata$var2 <- NULL # Save the commands used during the session savehistory(file="mylog.rhistory") # Load the commands used in a previous session loadhistory(file="mylog.rhistory") # Display the last 25 commands history() Dropping variables drop var1 drop var1-var10 Keeping track of your work /* A log file helps you save commands and output to a text file (*.log) or to a read-only file (*.smcl). The best way is to save it as a text file (*.log)*/ log using mylog.log /*Start the log*/ log close /*Close the log*/ log using mylog.log, append /*Add to an existing log*/ log using mylog.log, replace /*Replace an existing log*/ # You can read mylog.rhistory with any word processor. Notice that the file has to have the extension *.Rhistory /*You can read mylog.log using any word processor*/ 17
18 R table(mydata$gender) table(mydata$read) readgender <- table(mydata$read,mydata$gender) prop.table(readgender,1) # Row proportions prop.table(readgender,2) # Col proportions prop.table(readgender) # Tot proportions chisq.test(readgender) # Do chisq test Ho: no relathionship fisher.test(readgender) # Do fisher'exact test Ho: no relationship round(prop.table(readgender,2), 2) # Round col prop to 2 digits round(prop.table(readgender,2), 2) # Round col prop to 2 digits round(100* prop.table(readgender,2), 2) # Round col % to 2 digits round(100* prop.table(readgender,2)) # Round col % to whole numbers addmargins(readgender) # Adding row/col margins #install.packages("vcd") library(vcd) assocstats(majorgender) # NOTE: Chi-sqr = sum (obs-exp)^2/exp Degrees of freedom for Chi-sqr are (r-1)*(c-1) # NOTE: Chi-sqr contribution = (obs-exp)^2/exp # Cramer's V = sqrt(chi-sqr/n*min) Where N is sample size and min is a the minimun of (r-1) or (c-1) Categorical data: Frequencies/Crosstab s tab gender /*Frequencies*/ tab read tab read gender, col row tab read gender, col row chi2 V /*Crosstabs*/ /*Crosstabs where chi2 (The null hypothesis (Ho) is that there is no relationship) and V (measure of association goes from 0 to 1)*/ bysort studentstatus: tab read gender, colum row 18
19 R install.packages("gmodels") library(gmodels) mydata$ones <- 1 # Create a new variable of ones CrossTable(mydata$Major,digits=2) CrossTable(mydata$Major,mydata$ones, digits=2) CrossTable(mydata$Gender,mydata$ones, digits=2) CrossTable(mydata$Major,mydata$Gender,digits=2, expected=true,dnn=c("major","gender")) CrossTable(mydata$Major,mydata$Gender,digits=2, dnn=c("major","gender")) chisq.test(mydata$major,mydata$gender) # Null hipothesis: no association # 3-way crosstabs test <- xtabs(~read+major+gender, data=mydata) #install.packages("vcd") library(vcd) assocstats(majorgender) Categorical data: Frequencies/Crosstab s tab gender /*Frequencies*/ tab major tab major gender, col row tab major gender, col row chi2 V /*Crosstabs*/ /*Crosstabs which chi2 (The null hypothesis (Ho) is that there is no relationship) and V (measure of association goes from 0 to 1)*/ bysort studentstatus: tab gender major, colum row 19
20 R install.packages("pastecs") Descriptive Statistics summarize /*N, mean, sd, min, max*/ library(pastecs) stat.desc(mydata) stat.desc(mydata[,c("age","sat","score","height", "Read")]) stat.desc(mydata[,c("age","sat","score")], basic=true, desc=true, norm=true, p=0.95) stat.desc(mydata[10:14], basic=true, desc=true, norm=true, p=0.95) # Selecting the first 30 observations and first 14 variables summarize, detail /*N, mean, sd, min, max, variance, skewness, kurtosis, percentiles*/ summarize age, detail summarize sat, detail tabstat age sat score heightin read /*Gives the mean only*/ tabstat age sat score heightin read, statistics(n, mean, median, sd, var, min, max) mydata2 <- mydata2[1:30,1:14] tabstat age sat score heightin read, by(gender) # Selection using the --subset mydata3 <- subset(mydata2, Age >= 20 & Age <= 30) mydata4 <- subset(mydata2, Age >= 20 & Age <= 30, select=c(id, First, Last, Age)) mydata5 <- subset(mydata2, Gender=="Female" & Status=="Graduate" & Age >= 30) mydata6 <- subset(mydata2, Gender=="Female" & Status=="Graduate" & Age == 30) tabstat age sat score heightin read, statistics(mean, median) by(gender) /*Type help tabstat for a list of all statistics*/ table gender, contents(freq mean age mean score) tab gender major, sum(sat) /*Categorical and continuous*/ bysort studentstatus: tab gender major, sum(sat) 20
21 R mean(mydata) # Mean of all numeric variables, same using --sapply--('s' for simplify) mean(mydata$sat) with(mydata, mean(sat)) median(mydata$sat) table(mydata$country) # Mode by frequencies -> max(table(mydata$country)) / names(sort(- table(mydata$country)))[1] var(mydata$sat) # Variance sd(mydata$sat) # Standard deviation max(mydata$sat) # Max value min(mydata$sat) # Min value range(mydata$sat) # Range quantile(mydata$sat) quantile(mydata$sat, c(.3,.6,.9)) fivenum(mydata$sat) # Boxplot elements. From help: "Returns Tukey's five number summary (minimum, lower-hinge, median, upper-hinge, maximum) for the input data ~ boxplot" length(mydata$sat) # Num of observations when a length(mydata) variable is specify # Number of variables when a dataset is specify which.max(mydata$sat) # From help: "Determines the location, i.e., index of the (first) minimum or maximum of a numeric vector" which.min(mydata$sat) # From help: "Determines the location, i.e., index of the (first) minimum or maximum of a numeric vector stderr <- function(x) sqrt(var(x)/length(x)) incster <- tapply(incomes, statef, stderr) Descriptive Statistics summarize /*N, mean, sd, min, max*/ summarize, detail /*N, mean, sd, min, max, variance, skewness, kurtosis, percentiles*/ summarize age, detail summarize sat, detail tabstat age sat score heightin read /*Gives the mean only*/ tabstat age sat score heightin read, statistics(n, mean, median, sd, var, min, max) /*Type help tabstat for a list of all statistics*/ tabstat tabstat age sat score heightin read, by(gender) age sat score heightin read, statistics(mean, median) by(gender) table gender, contents(freq mean age mean score) tab gender major, sum(sat) continuous*/ /*Categorical and bysort studentstatus: tab gender major, sum(sat) 21
22 R # Descriptive statiscs by groups using --tapply-- mean <- tapply(mydata$sat,mydata$gender, mean) # Add na.rm=true to remove missing values in the estimation sd <- tapply(mydata$sat,mydata$gender, sd) median <- tapply(mydata$sat,mydata$gender, median) max <- tapply(mydata$sat,mydata$gender, max) cbind(mean, median, sd, max) round(cbind(mean, median, sd, max),digits=1) t1 <- round(cbind(mean, median, sd, max),digits=1) t1 Descriptive Statistics summarize /*N, mean, sd, min, max*/ summarize, detail /*N, mean, sd, min, max, variance, skewness, kurtosis, percentiles*/ summarize age, detail summarize sat, detail tabstat age sat score heightin read /*Gives the mean only*/ tabstat age sat score heightin read, statistics(n, mean, median, sd, var, min, max) /*Type help tabstat for a list of all statistics*/ # Descriptive statistics by groups using -- aggregate aggregate(mydata[c("age","sat")],by=list(sex=mydat a$gender), mean, na.rm=true) aggregate(mydata[c("age","sat")],mydata["gender"], mean, na.rm=true) aggregate(mydata,by=list(sex=mydata$gender), mean, na.rm=true) aggregate(mydata,by=list(sex=mydata$gender, major=mydata$major, status=mydata$status), mean, na.rm=true) aggregate(mydata$sat,by=list(sex=mydata$gender, major=mydata$major, status=mydata$status), mean, na.rm=true) aggregate(mydata[c("sat")],by=list(sex=mydata$gend er, major=mydata$major, status=mydata$status), mean, na.rm=true) tabstat tabstat age sat score heightin read, by(gender) age sat score heightin read, statistics(mean, median) by(gender) table gender, contents(freq mean age mean score) tab gender major, sum(sat) continuous*/ /*Categorical and bysort studentstatus: tab gender major, sum(sat) 22
23 R library(car) head(prestige) hist(prestige$income) hist(prestige$income, col="green") with(prestige, hist(income)) # Histogram of income with a nicer title. with(prestige, hist(income, breaks="fd", col="green")) # Applying Freedman/Diaconis rule p.120 ("Algorithm that chooses bin widths and locations automatically, based on the sample size and the spread of the data" g6n.html) box() hist(prestige$income, breaks="fd") # Conditional histograms par(mfrow=c(1, 2)) hist(mydata$sat[mydata$gender=="female"], breaks="fd", main="female", xlab="sat",col="green") hist(mydata$sat[mydata$gender=="male"], breaks="fd", main="male", xlab="sat", col="green") # Braces indicate a compound command allowing several commands with 'with' command par(mfrow=c(1, 1)) with(prestige, { hist(income, breaks="fd", freq=false, col="green") lines(density(income), lwd=2) lines(density(income, adjust=0.5),lwd=1) rug(income) }) Histograms hist sat hist sat, normal hist sat, by(gender) 23
24 # Histograms overlaid R hist(mydata$sat, breaks="fd", col="green") hist(mydata$sat[mydata$gender=="male"], breaks="fd", col="gray", add=true) legend("topright", c("female","male"), fill=c("green","gray")) Histograms hist sat hist sat, normal hist sat, by(gender) 24
25 R # Scatterplots. Useful to 1) study the mean and variance functions in the regression of y on x p.128; 2)to identify outliers and leverage points. Scatterplots twoway scatter y x twoway scatter sat age, title("figure 1. SAT/Age") # plot(x,y) plot(mydata$sat) # Index plot plot(mydata$age, mydata$sat) plot(mydata$age, mydata$sat, main= Age/SAT", xlab= Age", ylab= SAT", col="red") abline(lm(mydata$sat~mydata$age), col="blue") # regression line (y~x) lines(lowess(mydata$age, mydata$sat), col="green") # lowess line (x,y) identify(mydata$age, mydata$sat, row.names(mydata)) # On row.names to identify. "All data frames have a row names attribute, a character vector of length the number of rows with no duplicates nor missing values." (source link below). # "Use attr(x, "row.names") if you need an integer value.)" mydata$names <- paste(mydata$last, mydata$first) row.names(mydata) <- mydata$names plot(mydata$sat, mydata$age) identify(mydata$sat, mydata$age, row.names(mydata)) twoway scatter sat age, mlabel(last) twoway scatter sat age, mlabel(last) lfit sat age twoway scatter sat age, mlabel(last) lfit sat age lowess sat age /* locally weighted scatterplot smoothing */ twoway scatter sat age, mlabel(last) lfit sat age, yline(1800) xline(30) twoway scatter sat age, mlabel(last) by(major, total) twoway scatter sat age, mlabel(last) by(major, total) lfit sat age /* Adding confidence intervals */ twoway (lfitci sat age) (scatter sat age) /*Reverse order shaded area cover dots*/ twoway (lfitci sat age) (scatter sat age, mlabel(last)) twoway (lfitci sat age) (scatter sat age, mlabel(last)), title("sat scores by age") ytitle("sat") twoway scatter sat age, mlabel(last) by(gender, total) 25
26 # Rule on span for lowess, big sample smaller (~0.3), small sample bigger (~0.7) library(car) R scatterplot(sat~age, data=mydata) scatterplot(sat~age, id.method="identify", data=mydata) scatterplot(sat~age, id.method="identify", boxplots= FALSE, data=mydata) scatterplot(prestige~income, span=0.6, lwd=3, id.n=4, data=prestige) Scatterplots twoway scatter y x twoway scatter sat age, title("figure 1. SAT/Age") twoway scatter sat age, mlabel(last) twoway scatter sat age, mlabel(last) lfit sat age twoway scatter sat age, mlabel(last) lfit sat age lowess sat age /* locally weighted scatterplot smoothing */ twoway scatter sat age, mlabel(last) lfit sat age, yline(1800) xline(30) twoway scatter sat age, mlabel(last) by(major, total) # By groups scatterplot(sat~age Gender, data=mydata) scatterplot(sat~age Gender, id.method="identify", data=mydata) scatterplot(prestige~income type, boxplots=false, span=0.75, data=prestige) scatterplot(prestige~income type, boxplots=false, span=0.75, col=gray(c(0,0.5,0.7)), data=prestige) twoway scatter sat age, mlabel(last) by(major, total) lfit sat age /* Adding confidence intervals */ twoway (lfitci sat age) (scatter sat age) /*Reverse order shaded area cover dots*/ twoway (lfitci sat age) (scatter sat age, mlabel(last)) twoway (lfitci sat age) (scatter sat age, mlabel(last)), title("sat scores by age") ytitle("sat") twoway scatter sat age, mlabel(last) by(gender, total) 26
27 R scatterplotmatrix(~ prestige + income + education + women, span=0.7, id.n=0, data=prestige) pairs(prestige) # Pariwise plots. Scatterplots of all variables in the dataset pairs(prestige, gap=0, cex.labels=0.9) # gap controls the space between subplot and cex.labels the font size (Dalgaard:186) Scatterplots (multiple) graph matrix sat age score heightin read graph matrix sat age score heightin read, half library(car) 3D Scatterplots scatter3d(prestige ~ income + education, id.n=3, data=duncan) 27
28 R plot(vocabulary ~ education, data=vocab) plot(jitter(vocabulary) ~ jitter(education), data=vocab) plot(jitter(vocabulary, factor=2) ~ jitter(education, factor=2), data=vocab) # cex makes the point half the size, p. 134 plot(jitter(vocabulary, factor=2) ~ jitter(education, factor=2), col="gray", cex=0.5, data=vocab) with(vocab, { abline(lm(vocabulary ~ education), lwd=3, lty="dashed") lines(lowess(education, vocabulary, f=0.2), lwd=3) }) Scatterplots (for categorical data) /*Categorical data using mydata.dat and the jitter option*/ /*"scatter will add spherical random noise to your data before plotting if you specify jitter(#), where # represents the size of the noise as a percentage of the graphical area. This can be useful for creating graphs of categorical data when the data not jittered, many of the points would be on top of each other, making it impossible to tell whether the plotted point represented one or 1,000 observations. Source: s help page, type: help scatter*/ /*Use mydata.dat*/ graph matrix y x1 x2 x3 scatter y x1, jitter(7) title(xyz) scatter y x1, jitter(7) msize(0.5) scatter y x1, jitter(13) msize(0.5) twoway scatter y x1, jitter(13) msize(0.5) lfit y x1 graph matrix x1 x1 x2 x3, jitter(5) graph matrix y x1 x2 x3, jitter(13) msize(0.5) graph matrix y x1 x2 x3, jitter(13) msize(0.5) half 28
29 References/Useful links DSS Online Training Section Princeton DSS Libguides John Fox s site Quick-R UCLA Resources to learn and use R UCLA Resources to learn and use DSS - DSS - R 29
30 References/Recommended books An R Companion to Applied Regression, Second Edition / John Fox, Sanford Weisberg, Sage Publications, 2011 Data Manipulation with R / Phil Spector, Springer, 2008 Applied Econometrics with R / Christian Kleiber, Achim Zeileis, Springer, 2008 Introductory Statistics with R / Peter Dalgaard, Springer, 2008 Complex Surveys. A guide to Analysis Using R / Thomas Lumley, Wiley, 2010 Applied Regression Analysis and Generalized Linear Models / John Fox, Sage, 2008 R for Users / Robert A. Muenchen, Joseph Hilbe, Springer, 2010 Introduction to econometrics / James H. Stock, Mark W. Watson. 2nd ed., Boston: Pearson Addison Wesley, Data analysis using regression and multilevel/hierarchical models / Andrew Gelman, Jennifer Hill. Cambridge ; New York : Cambridge University Press, Econometric analysis / William H. Greene. 6th ed., Upper Saddle River, N.J. : Prentice Hall, Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert O. Keohane, Sidney Verba, Princeton University Press, Unifying Political Methodology: The Likelihood Theory of Statistical Inference / Gary King, Cambridge University Press, 1989 Statistical Analysis: an interdisciplinary introduction to univariate & multivariate methods / Sam Kachigan, New York : Radius Press, c1986 Statistics with (updated for version 9) / Lawrence Hamilton, Thomson Books/Cole,
Exploring Data and Descriptive Statistics (using R)
Data Analysis 101 Workshops Exploring Data and Descriptive Statistics (using R) Oscar Torres-Reyna Data Consultant [email protected] http://dss.princeton.edu/training/ Agenda What is R Transferring
Merge/Append using R (draft)
Merge/Append using R (draft) Oscar Torres-Reyna Data Consultant [email protected] January, 2011 http://dss.princeton.edu/training/ Intro Merge adds variables to a dataset. This document will use merge
Getting Started in Data Analysis using Stata
Getting Started in Data Analysis using Stata (v. 6.0) Oscar Torres-Reyna [email protected] December 2007 http://dss.princeton.edu/training/ Stata Tutorial Topics What is Stata? Stata screen and general
Data Preparation & Descriptive Statistics (ver. 2.7)
Data Preparation & Descriptive Statistics (ver. 2.7) Oscar Torres-Reyna Data Consultant [email protected] http://dss.princeton.edu/training/ Basic definitions For statistical analysis we think of data
A Short Guide to R with RStudio
Short Guides to Microeconometrics Fall 2013 Prof. Dr. Kurt Schmidheiny Universität Basel A Short Guide to R with RStudio 1 Introduction 2 2 Installing R and RStudio 2 3 The RStudio Environment 2 4 Additions
Introduction to RStudio
Introduction to RStudio (v 1.3) Oscar Torres-Reyna [email protected] August 2013 http://dss.princeton.edu/training/ Introduction RStudio allows the user to run R in a more user-friendly environment.
An introduction to using Microsoft Excel for quantitative data analysis
Contents An introduction to using Microsoft Excel for quantitative data analysis 1 Introduction... 1 2 Why use Excel?... 2 3 Quantitative data analysis tools in Excel... 3 4 Entering your data... 6 5 Preparing
There are six different windows that can be opened when using SPSS. The following will give a description of each of them.
SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet
Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering
Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques
IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
Using SPSS, Chapter 2: Descriptive Statistics
1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,
Introduction Course in SPSS - Evening 1
ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
Data Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
How to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
Introduction to IBM SPSS Statistics
CONTENTS Arizona State University College of Health Solutions College of Nursing and Health Innovation Introduction to IBM SPSS Statistics Edward A. Greenberg, PhD Director, Data Lab PAGE About This Document
Introduction to STATA 11 for Windows
1/27/2012 Introduction to STATA 11 for Windows Stata Sizes...3 Documentation...3 Availability...3 STATA User Interface...4 Stata Language Syntax...5 Entering and Editing Stata Commands...6 Stata Online
GETTING YOUR DATA INTO SPSS
GETTING YOUR DATA INTO SPSS UNIVERSITY OF GUELPH LUCIA COSTANZO [email protected] REVISED SEPTEMBER 2011 CONTENTS Getting your Data into SPSS... 0 SPSS availability... 3 Data for SPSS Sessions... 4
The Dummy s Guide to Data Analysis Using SPSS
The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests
Installing R and the psych package
Installing R and the psych package William Revelle Department of Psychology Northwestern University August 17, 2014 Contents 1 Overview of this and related documents 2 2 Install R and relevant packages
January 26, 2009 The Faculty Center for Teaching and Learning
THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i
Data exploration with Microsoft Excel: analysing more than one variable
Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical
Using R for Windows and Macintosh
2010 Using R for Windows and Macintosh R is the most commonly used statistical package among researchers in Statistics. It is freely distributed open source software. For detailed information about downloading
Psychology 205: Research Methods in Psychology
Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready
SPSS Manual for Introductory Applied Statistics: A Variable Approach
SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All
Data analysis process
Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis
5 Correlation and Data Exploration
5 Correlation and Data Exploration Correlation In Unit 3, we did some correlation analyses of data from studies related to the acquisition order and acquisition difficulty of English morphemes by both
SPSS Introduction. Yi Li
SPSS Introduction Yi Li Note: The report is based on the websites below http://glimo.vub.ac.be/downloads/eng_spss_basic.pdf http://academic.udayton.edu/gregelvers/psy216/spss http://www.nursing.ucdenver.edu/pdf/factoranalysishowto.pdf
Scatter Plots with Error Bars
Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each
Data exploration with Microsoft Excel: univariate analysis
Data exploration with Microsoft Excel: univariate analysis Contents 1 Introduction... 1 2 Exploring a variable s frequency distribution... 2 3 Calculating measures of central tendency... 16 4 Calculating
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
Getting Started with R and RStudio 1
Getting Started with R and RStudio 1 1 What is R? R is a system for statistical computation and graphics. It is the statistical system that is used in Mathematics 241, Engineering Statistics, for the following
Appendix III: SPSS Preliminary
Appendix III: SPSS Preliminary SPSS is a statistical software package that provides a number of tools needed for the analytical process planning, data collection, data access and management, analysis,
Importing Data into R
1 R is an open source programming language focused on statistical computing. R supports many types of files as input and the following tutorial will cover some of the most popular. Importing from text
Introduction to PASW Statistics 34152-001
Introduction to PASW Statistics 34152-001 V18 02/2010 nm/jdr/mr For more information about SPSS Inc., an IBM Company software products, please visit our Web site at http://www.spss.com or contact: SPSS
SPSS: Getting Started. For Windows
For Windows Updated: August 2012 Table of Contents Section 1: Overview... 3 1.1 Introduction to SPSS Tutorials... 3 1.2 Introduction to SPSS... 3 1.3 Overview of SPSS for Windows... 3 Section 2: Entering
How to Use a Data Spreadsheet: Excel
How to Use a Data Spreadsheet: Excel One does not necessarily have special statistical software to perform statistical analyses. Microsoft Office Excel can be used to run statistical procedures. Although
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
IBM SPSS Statistics 20 Part 1: Descriptive Statistics
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 1: Descriptive Statistics Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
SPSS 12 Data Analysis Basics Linda E. Lucek, Ed.D. [email protected] 815-753-9516
SPSS 12 Data Analysis Basics Linda E. Lucek, Ed.D. [email protected] 815-753-9516 Technical Advisory Group Customer Support Services Northern Illinois University 120 Swen Parson Hall DeKalb, IL 60115 SPSS
R with Rcmdr: BASIC INSTRUCTIONS
R with Rcmdr: BASIC INSTRUCTIONS Contents 1 RUNNING & INSTALLATION R UNDER WINDOWS 2 1.1 Running R and Rcmdr from CD........................................ 2 1.2 Installing from CD...............................................
4 Other useful features on the course web page. 5 Accessing SAS
1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu
When to use Excel. When NOT to use Excel 9/24/2014
Analyzing Quantitative Assessment Data with Excel October 2, 2014 Jeremy Penn, Ph.D. Director When to use Excel You want to quickly summarize or analyze your assessment data You want to create basic visual
An introduction to IBM SPSS Statistics
An introduction to IBM SPSS Statistics Contents 1 Introduction... 1 2 Entering your data... 2 3 Preparing your data for analysis... 10 4 Exploring your data: univariate analysis... 14 5 Generating descriptive
Data Analysis in SPSS. February 21, 2004. If you wish to cite the contents of this document, the APA reference for them would be
Data Analysis in SPSS Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Heather Claypool Department of Psychology Miami University
An Introduction to SPSS. Workshop Session conducted by: Dr. Cyndi Garvan Grace-Anne Jackman
An Introduction to SPSS Workshop Session conducted by: Dr. Cyndi Garvan Grace-Anne Jackman Topics to be Covered Starting and Entering SPSS Main Features of SPSS Entering and Saving Data in SPSS Importing
OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE
OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-110012 1. INTRODUCTION R is a free software environment for statistical computing
Describing, Exploring, and Comparing Data
24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter
Trade Flows and Trade Policy Analysis. October 2013 Dhaka, Bangladesh
Trade Flows and Trade Policy Analysis October 2013 Dhaka, Bangladesh Witada Anukoonwattaka (ESCAP) Cosimo Beverelli (WTO) 1 Introduction to STATA 2 Content a. Datasets used in Introduction to Stata b.
Module 2 Basic Data Management, Graphs, and Log-Files
AGRODEP Stata Training April 2013 Module 2 Basic Data Management, Graphs, and Log-Files Manuel Barron 1 and Pia Basurto 2 1 University of California, Berkeley, Department of Agricultural and Resource Economics
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
IBM SPSS Direct Marketing 19
IBM SPSS Direct Marketing 19 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This document contains proprietary information of SPSS
Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs
Using Excel Jeffrey L. Rummel Emory University Goizueta Business School BBA Seminar Jeffrey L. Rummel BBA Seminar 1 / 54 Excel Calculations of Descriptive Statistics Single Variable Graphs Relationships
IBM SPSS Statistics for Beginners for Windows
ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning
4. Descriptive Statistics: Measures of Variability and Central Tendency
4. Descriptive Statistics: Measures of Variability and Central Tendency Objectives Calculate descriptive for continuous and categorical data Edit output tables Although measures of central tendency and
This chapter reviews the general issues involving data analysis and introduces
Research Skills for Psychology Majors: Everything You Need to Know to Get Started Data Preparation With SPSS This chapter reviews the general issues involving data analysis and introduces SPSS, the Statistical
R Graphics Cookbook. Chang O'REILLY. Winston. Tokyo. Beijing Cambridge. Farnham Koln Sebastopol
R Graphics Cookbook Winston Chang Beijing Cambridge Farnham Koln Sebastopol O'REILLY Tokyo Table of Contents Preface ix 1. R Basics 1 1.1. Installing a Package 1 1.2. Loading a Package 2 1.3. Loading a
Learning SPSS: Data and EDA
Chapter 5 Learning SPSS: Data and EDA An introduction to SPSS with emphasis on EDA. SPSS (now called PASW Statistics, but still referred to in this document as SPSS) is a perfectly adequate tool for entering
SAS Analyst for Windows Tutorial
Updated: August 2012 Table of Contents Section 1: Introduction... 3 1.1 About this Document... 3 1.2 Introduction to Version 8 of SAS... 3 Section 2: An Overview of SAS V.8 for Windows... 3 2.1 Navigating
MBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
SPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout
Analyzing Data SPSS Resources 1. See website (readings) for SPSS tutorial & Stats handout Don t have your own copy of SPSS? 1. Use the libraries to analyze your data 2. Download a trial version of SPSS
Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures
Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the
Directions for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
Introduction to StatsDirect, 11/05/2012 1
INTRODUCTION TO STATSDIRECT PART 1... 2 INTRODUCTION... 2 Why Use StatsDirect... 2 ACCESSING STATSDIRECT FOR WINDOWS XP... 4 DATA ENTRY... 5 Missing Data... 6 Opening an Excel Workbook... 6 Moving around
IBM SPSS Forecasting 22
IBM SPSS Forecasting 22 Note Before using this information and the product it supports, read the information in Notices on page 33. Product Information This edition applies to version 22, release 0, modification
Data analysis and regression in Stata
Data analysis and regression in Stata This handout shows how the weekly beer sales series might be analyzed with Stata (the software package now used for teaching stats at Kellogg), for purposes of comparing
IBM SPSS Direct Marketing 20
IBM SPSS Direct Marketing 20 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This edition applies to IBM SPSS Statistics 20 and to
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
R Commander Tutorial
R Commander Tutorial Introduction R is a powerful, freely available software package that allows analyzing and graphing data. However, for somebody who does not frequently use statistical software packages,
Getting started with qplot
Chapter 2 Getting started with qplot 2.1 Introduction In this chapter, you will learn to make a wide variety of plots with your first ggplot2 function, qplot(), short for quick plot. qplot makes it easy
S P S S Statistical Package for the Social Sciences
S P S S Statistical Package for the Social Sciences Data Entry Data Management Basic Descriptive Statistics Jamie Lynn Marincic Leanne Hicks Survey, Statistics, and Psychometrics Core Facility (SSP) July
EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.
EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. Excel is powerful tool and can make your life easier if you are proficient in using it. You will need to use Excel to complete most of your
UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST
UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly
Using Excel for Data Manipulation and Statistical Analysis: How-to s and Cautions
2010 Using Excel for Data Manipulation and Statistical Analysis: How-to s and Cautions This document describes how to perform some basic statistical procedures in Microsoft Excel. Microsoft Excel is spreadsheet
Introduction to SPSS 16.0
Introduction to SPSS 16.0 Edited by Emily Blumenthal Center for Social Science Computation and Research 110 Savery Hall University of Washington Seattle, WA 98195 USA (206) 543-8110 November 2010 http://julius.csscr.washington.edu/pdf/spss.pdf
R FOR SAS AND SPSS USERS. Bob Muenchen
R FOR SAS AND SPSS USERS Bob Muenchen I thank the many R developers for providing such wonderful tools for free and all the r help participants who have kindly answered so many questions. I'm especially
SPSS and AM statistical software example.
A detailed example of statistical analysis using the NELS:88 data file and ECB, to perform a longitudinal analysis of 1988 8 th graders in the year 2000: SPSS and AM statistical software example. Overall
Introduction to SPSS (version 16) for Windows
Introduction to SPSS (version 16) for Windows Practical workbook Aims and Learning Objectives By the end of this course you will be able to: get data ready for SPSS create and run SPSS programs to do simple
Working with SPSS. A Step-by-Step Guide For Prof PJ s ComS 171 students
Working with SPSS A Step-by-Step Guide For Prof PJ s ComS 171 students Contents Prep the Excel file for SPSS... 2 Prep the Excel file for the online survey:... 2 Make a master file... 2 Clean the data
KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
GeoGebra Statistics and Probability
GeoGebra Statistics and Probability Project Maths Development Team 2013 www.projectmaths.ie Page 1 of 24 Index Activity Topic Page 1 Introduction GeoGebra Statistics 3 2 To calculate the Sum, Mean, Count,
430 Statistics and Financial Mathematics for Business
Prescription: 430 Statistics and Financial Mathematics for Business Elective prescription Level 4 Credit 20 Version 2 Aim Students will be able to summarise, analyse, interpret and present data, make predictions
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
Table of Contents. Preface
Table of Contents Preface Chapter 1: Introduction 1-1 Opening an SPSS Data File... 2 1-2 Viewing the SPSS Screens... 3 o Data View o Variable View o Output View 1-3 Reading Non-SPSS Files... 6 o Convert
ECONOMICS 351* -- Stata 10 Tutorial 2. Stata 10 Tutorial 2
Stata 10 Tutorial 2 TOPIC: Introduction to Selected Stata Commands DATA: auto1.dta (the Stata-format data file you created in Stata Tutorial 1) or auto1.raw (the original text-format data file) TASKS:
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application
Survey Research Data Analysis
Survey Research Data Analysis Overview Once survey data are collected from respondents, the next step is to input the data on the computer, do appropriate statistical analyses, interpret the data, and
Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.
1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis
Prof. Nicolai Meinshausen Regression FS 2014. R Exercises
Prof. Nicolai Meinshausen Regression FS 2014 R Exercises 1. The goal of this exercise is to get acquainted with different abilities of the R statistical software. It is recommended to use the distributed
SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES
SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR
Using Excel for Statistical Analysis
2010 Using Excel for Statistical Analysis Microsoft Excel is spreadsheet software that is used to store information in columns and rows, which can then be organized and/or processed. Excel is a powerful
Variables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce
EXST SAS Lab Lab #4: Data input and dataset modifications
EXST SAS Lab Lab #4: Data input and dataset modifications Objectives 1. Import an EXCEL dataset. 2. Infile an external dataset (CSV file) 3. Concatenate two datasets into one 4. The PLOT statement will
SOCIOLOGY 7702 FALL, 2014 INTRODUCTION TO STATISTICS AND DATA ANALYSIS
SOCIOLOGY 7702 FALL, 2014 INTRODUCTION TO STATISTICS AND DATA ANALYSIS Professor Michael A. Malec Mailbox is in McGuinn 426 Office: McGuinn 427 Phone: 617-552-4131 Office Hours: TBA E-mail: [email protected]
