Basics of using the R software
|
|
|
- Linette Webster
- 9 years ago
- Views:
Transcription
1 Basics of using the R software Experimental and Statistical Methods in Biological Sciences I Juulia Suvilehto NBE
2 Demo sessions Demo sessions (Thu ) x 5 Demos, example code, and exercises How to pass this part of the course Either attend the session OR Do the demo exercises and submit before that session (exercises available in Mycourses on Monday before the session) i.e. DL on Thursday at 14.00
3 Demo sessions Demo sessions (Thu ) x 5 Demos, example code, and exercises How to pass this part of the course Either attend the session OR Do the demo exercises and submit before that session (exercises available in Mycourses on Monday before the session) i.e. DL on Thursday at 14.00
4 Homework Homework assignments x 5 Online after each demo session Based on real data, publications attached where possible How to pass this part of the course Submit 1) commented.r script, and 2) answers in a pdf document, to mycourses Assignments before the next lecture i.e. DL always on following Wednesday at 9am
5 Help! Reception hours Mondays 3-4pm room F324, Kvarkki building, entrance from Rakentajanaukio 2C In case you need help with R / homework assignments Remember to bring your.r script! Contact us: General questions about the course (long absences etc): Head TA: [email protected] Practical questions (how do I do X in R?) your own assistants Always include course code Becs in subject
6 Course demo content 1. Basics of using the R software 2. Preparing data for statistical analysis 3. Descriptive statistics 4. Plotting data 5. Comparing means with t-tests 6. Analysis of variance 7. Correlation and linear regression 8. Testing categorical associations
7 Course demo content 1. Basics of using the R software 2. Preparing data for statistical analysis 3. Descriptive statistics 4. Plotting data 5. Comparing means with t-tests 6. Analysis of variance 7. Correlation and linear regression 8. Testing categorical associations
8 Structure of demo sessions Short slide show introducing the basic contents Example.r code in mycourses to serve as reference for exercises Exercise sheet Answers demonstrated by TAs.r code available in mycourses after the demo session If you cannot/don't want to attend the demo session, you can make up for you absence by submitting answers to the exercise sheet before the session
9 Outline for today Part I. Basics of using the R software Slides + demonstration Exercises Break Part II. Preparing data for statistical analyses Demonstration Exercises
10 Outline for today: Part I. 0. What is R? 1. Getting started 2. Data types 3. Data structures 4. Subscripting 5. Operations on your data
11 Outline for today: Part I. 0. What is R? 1. Getting started 2. Data types 3. Data structures 4. Subscripting 5. Operations on your data
12 0. What is R? Free open source software environment, based on the S language developed by Bell Labs With R you can write functions, do calculations, apply most available statistical techniques, or create simple or complicated graphs User-contributed packages expand the functionality of R A large user group supports it The use of R requires programming
13 0. What is R? Free open source software environment, based on the S language developed by Bell Labs With R you can write functions, do calculations, apply most available statistical techniques, or create simple or complicated graphs User-contributed packages expand the functionality of R A large user group supports it The use of R requires programming If you have previous programming experience, it should not take too long to learn the R syntax. If you have not programmed before, the learning curve will be a bit steeper in the beginning but once you get the hang of it, learning other languages will get a lot easier.
14 0. What is R? - Readings and info on R Free online tutorials: Books: Crawley, M. J. (2005) Statistics: An introduction using R. Crawley, M. J. (2007) The R book. Spector, P. (2008) Data manipulation with R. Zuur, A. F. et al. (2009) A beginner s guide to R.
15 0. What is R? Readings and info on R Potentially useful sidekicks: R-Commander GUI: Tinn-R: R studio: Online R environment: Online resources: R search engine: Blog aggregator: Reference cards: R graph gallery:
16 0. What is R? How to install and run R on your own computer? Go to Downloads: CRAN Set your mirror (e.g., in Sweden) Go to section Precompiled binary distributions Select your operation system Select base Select the installation and follow installation instructions
17 0. What is R? How to run R in Aalto network Create a network folder for your R demo materials (Z:) Windows: R is installed Starting R opens also Tinn-R; if you don t want to use it, simply close the window Linux: Start by typing R on the command line Or open RStudio
18 0. What is R? R interface Console window Executes functions and commands Displays outputs related to those operations Graphics window Displays visual information (plots) Editor window Resembles a basic word processor Is called when a text file is opened in R Allows to save code, document it, and rerun it at a later stage
19 0. What is R? R interface
20 Outline for today: Part I. 0. What is R? 1. Getting started Command-line syntax Variable assignment Functions and packages 2. Data types 3. Data structures 4. Subscripting 5. Operations on your data
21 1.Getting started: Command line syntax > # This is a comment > # The > is the prompt > 3*(2+2) # This is an example [1] 12 > # The [1] is a counter (not part of the data)
22 1. Getting started: Variables and assignments Variables are names that refer to temporarily stored values Create variables by assigning a value to a name Assignment operators: <- or = Names can contain letters, numbers, _ and., but cannot begin with a number remember also case-sensitivity
23 1. Getting started: Variables and assignments - example x <- 3 # x becomes 3, i.e. assigns value 3 to x x # prints its own current value [1] 3 x <- x+1 y <- x+1 # x becomes x+1, its previous value is lost # y becomes x+1, x is unchanged
24 1. Getting started: Functions, arguments, and returned values Data structures store your data Functions process it Think of functions as operators: like + and but more specialized return value <- function(argument) A function is written once but may be called many times, each time with different arguments A function returns a value, the result of the operation
25 1. Getting started: Functions, arguments, and returned values You call a function by typing its name followed by brackets containing the arguments You assign the returned value to a variable for further processing > x <- 9 # prepare function input > y <- sqrt(x) # call function sqrt() with argument, returned value is assigned to y
26 1. Getting started: Finding functions online No need to write your own functions, others probably have done that already Some useful web sites: R web site: Downloads: Function finder: Task view: Help archive:
27 1. Getting started: Finding functions in R Most functions have a sensible name mean(), sd(), var(), t.test(), Advanced search for function names: > help.search( read ) # search all > apropos( ^read ) # names starting with read > apropos(\\.test$) # names ending with.test
28 1. Getting started: Function help > help.start() # manuals and reference (every function has a help page) > help(t.test) # help page for function t.test >?t.test # same as above usage - a synopsis of the function s arguments arguments - describes the values you can pass as arguments value - describes the value returned by the function examples learn by running examples
29 1. Getting started: Packages and libraries A package is a collection of previously programmed functions, often including functions for specific tasks. While some packages are included in the base installation, most have to downloaded and installed install: add package to base version of R > install.packages(arm) # download and install package arm load: make the functions in the package ready to use for your R session > library( foreign ) # load package named foreign
30 1. Getting started: Useful package related commands > sessioninfo() # packages loaded into memory > library() # list packages installed on your computer > library(help= foreign ) # list the functions in foreign > update.packages() # update one or all packages
31 1. Getting started: Starting and organizing your session > getwd() # get working directory > setwd( /u/jtsuvile/documents/r_course ) # set path to working directory > dir() # list files in working directory > ls() # list objects in workspace > rm(list(=ls()) # remove all objects in workspace, i.e., clear workspace > graphics.off() # close all figures
32 Outline for today: Part I. 0. What is R? 1. Getting started 2. Data types Numeric Character Logical 3. Data structures 4. Subscripting 5. Operations on your data
33 2. Data types Numeric [1] Character [1] apple orange apple orange orange Logical [1] TRUE FALSE TRUE TRUE FALSE
34 2. Data types Conditional operators x == y # x equal to y? x!= y # x not equal to y? x < y # x less than y? x <= y # x less than or equal to y? x > y # x greater than y? x >= y # x greater than or equal to y?
35 2. Data types Logical values are returned by conditional expressions: > 3>2 # is 3 greater than 2? [1] TRUE > apple > orange # characters are compared alphabetically [1] FALSE > Apple > apple # and are case-sensitive [1] TRUE
36 Outline for today: Part I. 0. What is R? 1. Getting started 2. Data types 3. Data structures Vector Array/matrix Data frame List 4. Subscripting 5. Operations on your data
37 3. Data structures: Vector Vector is an ordered row or column of cells Each cell contains a single value Cell values must all be the same data type E.g. [1]
38 3. Data structures: Vector Functions that make vectors: > c(1,3,4.5) # combine > c( apple, apple, orange ) > 1:5 # : > 5.5:-5.5 > seq(0, 2*pi, by=0.1) # sequence of numbers > seq(0, 2*pi, length=64) > rep(1:3, times=3) # repeat a vector > rep(1:3, times=c(3,5,2)) > rep(1:3, each=3)
39 3. Data structures: Arrays/Matrices While vectors are one-dimensional, arrays are a multidimensional extension of vectors The most commonly used array in R is the matrix, a 2-dimensional array (according to R) A matrix is a 2-dimensional layout of cells The cell values must all be the same data type
40 3. Data structures: Matrices Functions that make matrices: > x <- 1:12 # create an example vector > matrix(x, nrow=3, ncol=4) # matrix arranges by columns by default > matrix(x, nrow=3, ncol=4, byrow=true) # arranging by rows > cbind(x,x) # bind vectors or matrices by columns > rbind(x,x) # or by rows
41 3. Data structures: Data frames While we could store our data in a matrix, with rows representing observations and columns representing variables a data frame is like matrix, just better (easy access to elements) and more flexible (can store different data types) In R, data frames are the preferred method for working with observations and variables type datasets This is what we will mostly use
42 3. Data structures: Data frames Make a data frame from column vectors > data.frame() Read text data from a file into a data frame > read.table( ageweight.txt, header=t, sep=, ) Read text data from an online source > read.table( header=t, sep=, ) Read.csv file > read.csv() Export data as a table > write.table() Save and load R objects: > save() > load()
43 3. Data structures: Lists A list is a collection of objects The components of a list can be different types of objects and can be of different lengths Lists are used to pass structures arguments to functions, and to return multi-valued objects from functions > mylist = list(fruit=c( apple, orange ), price=c(1.2, 1.4, 0.8))
44 3. Data structures: Basic diagnostic functions > class(x) # get the class (data type) of x > str(x) # display structure of object x > dim(x) # retrieve dimension (rows, columns) of x > nrow(x) # number of rows present in x > ncol(x) # number of columns present in x > names(x) # get/set column names of x > head(x) # returns the first part of x > tail(x) # returns the last part of x
45 Outline for today: Part I. 0. What is R? 1. Getting started 2. Data types 3. Data structures 4. Subscripting Numeric Character Conditional (logical) 5. Operations on your data
46 4. Subscripting For objects that contain more than one element, subscripting is used to access some or all of those elements Subscripting operations are very fast and efficient, and are often the most powerful tool for accessing and manipulating data in R Types of subscripts in R: numeric, character, logical
47 4. Subscripting: Numeric subscripting of vectors Cells in the vector are ordered The first cell/element is addressed by subscript/index number 1 (not 0) Syntax uses square brackets x[i] x is the vector we want to subscript Subscript i can be a single number or a vector of subscripts Cells are returned in the order given in the subscript
48 4. Subscripting: Numeric subscripting of vectors > x <- 1:12 # create sequence of number from 1 to 12 > x[1] # first cell > x[length(x)] # last cell > x[1:6] # first 6 cells > x[6:1] # first 6 cells (reverse order) > x[c(3,5,1)] # cells 3, 5, 1 in that order
49 4. Subscripting: Numeric subscripting of data frames Matrix and data frame cells have pairs of subscripts: x[i,j] i is the row subscript, j is the column subscript Rule of thumb: rows first, then columns.
50 4. Subscripting: Numeric subscripting of data frames Matrix and data frame cells have pairs of subscripts: x[i,j] i is the row subscript, j is the column subscript Rule of thumb: rows first, then columns. An empty index (e.g., x[i,]) is shorter form of complete index A single subscript (e.g, x[j]) returns data frame column by number Double-brackets (e.g., x[[j]]) return data frame columns in their native data type (i.e. not as data.frame) Negative subscripts return all cells NOT subscripted (e.g., x[-i,]), often useful when removing data!
51 4. Subscripting: Numeric subscripting of data frames > data <- read.table( ageweight.txt, header=t, sep=, ) > data[2:3, 3:2] # rows 2:3 and columns 3:2 > data[2:3,] # rows 2:3 and all columns > data[-(2:3),] # remove rows 2:3 > data[,2:3] # columns 2:3 and all rows > data[2:3] # columns 2:3 > data[[2]] # single column as a vector > data[2:3,2] <- -9 # set specific cells to -9
52 4. Subscripting: Character subscripts Columns of a data frame, and components of a list, can be addressed by name. The syntax uses a $ sign. > names(data) # get the column names > data$age # get the column named AGE A character vector addresses columns by name: > data[, c( AGE, WEIGHT )] # same as data[,2:3]
53 4. Subscripting: Conditional subscripting Uses logical subscript vectors to get or set a subset of data based on a condition Logical index vectors specify a pattern of cells to be addressed: cells corresponding to cells of the logical vector that are TRUE > data$sex == 1 # logical vector (conditional) > data[data$sex==1,] # get rows where sex==1 > data[data$sex==1, AGE ] # get AGE where sex==1 Logical vectors may be combined using operators & (AND) and (OR) to form composite conditions: > data[data$sex==1 & data$age>50,] # get rows where sex==1 AND age > 50
54 Outline for today: Part I. 0. What is R? 1. Getting started 2. Data types 3. Data structures 4. Subscripting 5. Operations on your data Vector arithmetics Vector functions Descriptive functions
55 5. Operations on your data: Vector arithmetics The arithmetic operators are: + - * / ^ %% Unary operators (-x and x^2) apply element-wise Binary operators apply pair-wise If the vectors are different lengths the shorter is recycled to match the length of the longer, concatenating it with itself until it is at least as long, and trimming excess off the end Scalar values are single-element vectors, recycled as necessary
56 5. Operations: Vectorized functions Arithmetic functions take vector arguments and return vector values The arithmetic operation is applied element-wise to the vector argument Example functions: round round to given number of decimal places ceiling round values up floor round values down abs absolute (unsigned) value sqrt square root exp exponential log, log10, log2 log to base e, 10, and 2 sin, cos, tan trigonometric functions asin, acos, atan inverse (arc) trigonometric functions scale scaling and centering
57 5. Operations: Vector arithmetics > x <- c(0,2,4,6,8,10) > y <- x(1,3,5,7,9,11) > -y > y^2 > y + x > y^2 # scalar 2 is recycled to match length(y)
58 5. Operations: Vectorized functions > x <- seq(0,1,length=10) > round(x,2) # round to 2 decimal places > sqrt(x) > log(x+1) > scale(x) # standardize as z-scores
59 5. Operations: Descriptive functions Descriptive functions return a single valued summary of a vector Example functions: length number of elements in a vector sum sum of the values in a vector min, max, range minimum, maximum, and range mean mean of the values in a vector median median of the values in a vector sd, var standard deviation and variance cov, cor covariance and Pearson correlation
60 5. Operations: Descriptive functions > x <- seq(0,1,length=10) > length(x) # length (number of elements) > mean(x) # mean > sd(x) # standard deviation
61 5. Operations: Descriptive functions Descriptive functions cov and cor return either a single value or a matrix, depending upon the arguments If the arguments are two vectors, a single value is returned If the argument is a matrix, a matrix is returned The i,j th element of the matrix is the covariance (using cov) or correlation (using cor) between the i th and j th column vectors
62 5. Operations: Descriptive functions > x <- rnorm(100) #?rnorm > y <- rnorm(100) > cov(x,y) # scalar covariance > cor(x,y) # Pearson correlation coefficient > cov(cbind(x,y)) # covariance matrix > cor(cbind(x,y)) # correlation matrix
63 Time for exercises and a break! Need more to read?
64 Preparing data for statistical analysis Experimental and Statistical Methods in Biological Sciences I Juulia Suvilehto NBE
65 Part II: Preparing data for statistical analysis A few things need to be taken care of before starting the statistical analysis Check for correct data format Check for outliers and missing data Today we focus on how to modify your data in general, later we go into details in preparing your data to some more specific analyses E.g., testing and correcting for normality and other assumptions
66 Outline for part II 1. Example data 2. Recap 3. Factors 4. Outliers and missing data
67 Example data: AgeWeight dataset # load in data: > AgeWeight <- read.csv( ight.csv, header=true) # examine a bit: > head(ageweight)
68 Recap 1/2 We have already encountered several types of data: # numeric > num.v <- c(1.2, 2.2, 3.0) # character > chr.v <- c( cat, dog, mouse ) We have seen simple operations on that data: > num.v/2
69 Recap 2/2 We have seen data collected in data frames: > my.df <- data.frame(x=rnorm(10), y=1:10) And we have learned about subscripting and functions: > my.df[,2] > my.df[2,] > mean(my.df$x)
70 Factors But what if our data represents categories? If the categories are numerically related, use a number. If they are completely unrelated (arbitrary), use a character. BUT if the categories are related but not numerically, use a factor. Examples of factors: male and female populations; easy, moderate, and difficult stimuli; yes, maybe, no etc. answers
71 Factors Examine our data: > head(ageweight) Here, sex (coded as male=1 in variable MALE) is an independent variable with a categorical response. You can create a set of suitable labels: > AgeWeight$SEX <- m > AgeWeight[which(AgeWeight$MALE==0),'SEX'] <- 'f' But AgeWeight$SEX is still just a set of characters. You need to tell R that it s a factor: > AgeWeight$SEX <- factor(ageweight$sex)
72 Factors Factors can also be represented by numbers. Compare the two variables below: > a.v. <- rep(c(1:5), 2) > a.v > b.v. <- factor(rep(c(1:5), 2) > b.v Note that factor are just labels of whatever type.
73 Factors: reading.csv files.csv files only include characters and numbers R has to guess which are factors Rule: numbers are numbers, characters are factors. You have to tell R if numbers are factors (or if characters are strings).
74 summary() So far, we ve looked at variables by typing their names. The summary() function tells you uselful things about the variables. > x <- c(1:10) > summary(x) > y <- c( cat, dog, mouse, horse ) > summary(y) > z <- factor(c( cat, dog, dog, horse )) > summary(z)
75 Objects Everything in R is actually an object of a given class. summary() and other functions take the class into account. > class(x) > class(y) > class(z) > class(class)
76 Objects When you type the name of an object, you are implicitly using the function print() print() won t necessarily tell you everything that s in the object, it will just tell you useful information If you really want to see what s in the object, use str() > z > print(z) > str(z)
77 Objects When you use a function, such as mean(), it returns an object: > mean(x) > print(mean(x)) # equivalents > class(mean(x)) > m <- mean(x) # this stores the result > class(m) Complicated functions return complicated objects! > o.lm <- lm(x ~ y, my.df) # stores your model for linear regression > summary(o.lm) # explore results for linear regression The R way: create objects, then explore them: > cooks.distance(o.lm) # more results > plot(o.lm, which=1:2) # plot these results
78 Outliers in the data While getting familiar with your data, you should make sure that the data makes sense summary(), describe() plotting will help you to identify outliers and missing values One cause for anomalities in the data are outliers, possibly caused by Typing errors Funny coding of missing values (e.g., 999)
79 Missing values Often, we don t have a complete record of the data R uses the special code NA to represent a missing value not available > my.df[7,2] <- NA > my.df
80 Missing values: Effects of NA Simple R functions are picky about data with missing values This is a good thing! You have to explicitely tell R to ignore missing data > mean(my.df$x) # get an error > mean(my.df$x, na.rm=true) Some functions are a bit more complex (see?cor); some ignore NAs by default
R Language Fundamentals
R Language Fundamentals Data Types and Basic Maniuplation Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Where did R come from? Overview Atomic Vectors Subsetting
A Short Guide to R with RStudio
Short Guides to Microeconometrics Fall 2013 Prof. Dr. Kurt Schmidheiny Universität Basel A Short Guide to R with RStudio 1 Introduction 2 2 Installing R and RStudio 2 3 The RStudio Environment 2 4 Additions
R: A self-learn tutorial
R: A self-learn tutorial 1 Introduction R is a software language for carrying out complicated (and simple) statistical analyses. It includes routines for data summary and exploration, graphical presentation
Using R for Windows and Macintosh
2010 Using R for Windows and Macintosh R is the most commonly used statistical package among researchers in Statistics. It is freely distributed open source software. For detailed information about downloading
Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang
Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang School of Computing Sciences, UEA University of East Anglia Brief Introduction to R R is a free open source statistics and mathematical
Psychology 205: Research Methods in Psychology
Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready
Introduction to R and UNIX Working with microarray data in a multi-user environment
Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Introduction to R and UNIX Working with microarray data in a multi-user environment Carsten Friis Media glna tnra GlnA TnrA C2 glnr C3 C5
Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller
Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller Most of you want to use R to analyze data. However, while R does have a data editor, other programs such as excel are often better
There are six different windows that can be opened when using SPSS. The following will give a description of each of them.
SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet
Beginner s Matlab Tutorial
Christopher Lum [email protected] Introduction Beginner s Matlab Tutorial This document is designed to act as a tutorial for an individual who has had no prior experience with Matlab. For any questions
Introduction to R June 2006
Introduction to R Introduction...3 What is R?...3 Availability & Installation...3 Documentation and Learning Resources...3 The R Environment...4 The R Console...4 Understanding R Basics...5 Managing your
Getting Started with R and RStudio 1
Getting Started with R and RStudio 1 1 What is R? R is a system for statistical computation and graphics. It is the statistical system that is used in Mathematics 241, Engineering Statistics, for the following
Installing R and the psych package
Installing R and the psych package William Revelle Department of Psychology Northwestern University August 17, 2014 Contents 1 Overview of this and related documents 2 2 Install R and relevant packages
OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE
OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-110012 1. INTRODUCTION R is a free software environment for statistical computing
CSC 120: Computer Science for the Sciences (R section)
CSC 120: Computer Science for the Sciences (R section) Radford M. Neal, University of Toronto, 2015 http://www.cs.utoronto.ca/ radford/csc120/ Week 2 Typing Stuff into R Can be Good... You can learn a
Importing Data into R
1 R is an open source programming language focused on statistical computing. R supports many types of files as input and the following tutorial will cover some of the most popular. Importing from text
Prof. Nicolai Meinshausen Regression FS 2014. R Exercises
Prof. Nicolai Meinshausen Regression FS 2014 R Exercises 1. The goal of this exercise is to get acquainted with different abilities of the R statistical software. It is recommended to use the distributed
ECONOMICS 351* -- Stata 10 Tutorial 2. Stata 10 Tutorial 2
Stata 10 Tutorial 2 TOPIC: Introduction to Selected Stata Commands DATA: auto1.dta (the Stata-format data file you created in Stata Tutorial 1) or auto1.raw (the original text-format data file) TASKS:
Introduction Course in SPSS - Evening 1
ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/
CALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
Data and Projects in R- Studio
Data and Projects in R- Studio Patrick Thompson Material prepared in part by Etienne Low-Decarie & Zofia Taranu Learning Objectives Create an R project Look at Data in R Create data that is appropriate
Introduction to RStudio
Introduction to RStudio (v 1.3) Oscar Torres-Reyna [email protected] August 2013 http://dss.princeton.edu/training/ Introduction RStudio allows the user to run R in a more user-friendly environment.
EXCEL SOLVER TUTORIAL
ENGR62/MS&E111 Autumn 2003 2004 Prof. Ben Van Roy October 1, 2003 EXCEL SOLVER TUTORIAL This tutorial will introduce you to some essential features of Excel and its plug-in, Solver, that we will be using
Introduction to Matlab
Introduction to Matlab Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. [email protected] Course Objective This course provides
R Language Fundamentals
R Language Fundamentals Data Frames Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Tabular Data Frequently, experimental data are held in tables, an Excel worksheet, for
Exploratory Data Analysis and Plotting
Exploratory Data Analysis and Plotting The purpose of this handout is to introduce you to working with and manipulating data in R, as well as how you can begin to create figures from the ground up. 1 Importing
rm(list=ls()) library(sqldf) system.time({large = read.csv.sql("large.csv")}) #172.97 seconds, 4.23GB of memory used by R
Big Data in R Importing data into R: 1.75GB file Table 1: Comparison of importing data into R Time Taken Packages Functions (second) Remark/Note base read.csv > 2,394 My machine (8GB of memory) ran out
BIO503 - Lecture 1 Introduction to the R language
BIO503 - Lecture 1 Introduction to the R language Bio503 January 2008, Aedin Culhane. I R, S and S-plus What is R? ˆ R is an environment for data analysis and visualization ˆ R is an open source implementation
Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering
Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques
Summary of R software commands used to generate bootstrap and permutation test output and figures in Chapter 16
Summary of R software commands used to generate bootstrap and permutation test output and figures in Chapter 16 Since R is command line driven and the primary software of Chapter 16, this document details
Financial Econometrics MFE MATLAB Introduction. Kevin Sheppard University of Oxford
Financial Econometrics MFE MATLAB Introduction Kevin Sheppard University of Oxford October 21, 2013 2007-2013 Kevin Sheppard 2 Contents Introduction i 1 Getting Started 1 2 Basic Input and Operators 5
arrays C Programming Language - Arrays
arrays So far, we have been using only scalar variables scalar meaning a variable with a single value But many things require a set of related values coordinates or vectors require 3 (or 2, or 4, or more)
Package dsstatsclient
Maintainer Author Version 4.1.0 License GPL-3 Package dsstatsclient Title DataSHIELD client site stattistical functions August 20, 2015 DataSHIELD client site
Dataframes. Lecture 8. Nicholas Christian BIOST 2094 Spring 2011
Dataframes Lecture 8 Nicholas Christian BIOST 2094 Spring 2011 Outline 1. Importing and exporting data 2. Tools for preparing and cleaning datasets Sorting Duplicates First entry Merging Reshaping Missing
AMATH 352 Lecture 3 MATLAB Tutorial Starting MATLAB Entering Variables
AMATH 352 Lecture 3 MATLAB Tutorial MATLAB (short for MATrix LABoratory) is a very useful piece of software for numerical analysis. It provides an environment for computation and the visualization. Learning
5 Correlation and Data Exploration
5 Correlation and Data Exploration Correlation In Unit 3, we did some correlation analyses of data from studies related to the acquisition order and acquisition difficulty of English morphemes by both
Microsoft Excel 2010 Part 3: Advanced Excel
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES Microsoft Excel 2010 Part 3: Advanced Excel Winter 2015, Version 1.0 Table of Contents Introduction...2 Sorting Data...2 Sorting
SPSS: Getting Started. For Windows
For Windows Updated: August 2012 Table of Contents Section 1: Overview... 3 1.1 Introduction to SPSS Tutorials... 3 1.2 Introduction to SPSS... 3 1.3 Overview of SPSS for Windows... 3 Section 2: Entering
Lecture 2 Mathcad Basics
Operators Lecture 2 Mathcad Basics + Addition, - Subtraction, * Multiplication, / Division, ^ Power ( ) Specify evaluation order Order of Operations ( ) ^ highest level, first priority * / next priority
Step 2: Save the file as an Excel file for future editing, adding more data, changing data, to preserve any formulas you were using, etc.
R is a free statistical software environment that can run a wide variety of tests based on different downloadable packages and can produce a wide variety of simple to more complex graphs based on the data
Quick Tour of Mathcad and Examples
Fall 6 Quick Tour of Mathcad and Examples Mathcad provides a unique and powerful way to work with equations, numbers, tests and graphs. Features Arithmetic Functions Plot functions Define you own variables
Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.
Excel Tutorial Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information. Working with Data Entering and Formatting Data Before entering data
Data Mining with R: learning by case studies
Data Mining with R: learning by case studies Luis Torgo LIACC-FEP, University of Porto R. Campo Alegre, 823-4150 Porto, Portugal email: [email protected] http://www.liacc.up.pt/ ltorgo May 22, 2003 Preface
Data Mining Lab 5: Introduction to Neural Networks
Data Mining Lab 5: Introduction to Neural Networks 1 Introduction In this lab we are going to have a look at some very basic neural networks on a new data set which relates various covariates about cheese
Formulas & Functions in Microsoft Excel
Formulas & Functions in Microsoft Excel Theresa A Scott, MS Biostatistician II Department of Biostatistics Vanderbilt University [email protected] Table of Contents 1 Introduction 1 1.1 Using
InfiniteInsight 6.5 sp4
End User Documentation Document Version: 1.0 2013-11-19 CUSTOMER InfiniteInsight 6.5 sp4 Toolkit User Guide Table of Contents Table of Contents About this Document 3 Common Steps 4 Selecting a Data Set...
Introduction to SPSS 16.0
Introduction to SPSS 16.0 Edited by Emily Blumenthal Center for Social Science Computation and Research 110 Savery Hall University of Washington Seattle, WA 98195 USA (206) 543-8110 November 2010 http://julius.csscr.washington.edu/pdf/spss.pdf
G563 Quantitative Paleontology. SQL databases. An introduction. Department of Geological Sciences Indiana University. (c) 2012, P.
SQL databases An introduction AMP: Apache, mysql, PHP This installations installs the Apache webserver, the PHP scripting language, and the mysql database on your computer: Apache: runs in the background
Time Series Analysis of Aviation Data
Time Series Analysis of Aviation Data Dr. Richard Xie February, 2012 What is a Time Series A time series is a sequence of observations in chorological order, such as Daily closing price of stock MSFT in
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
MATLAB Basics MATLAB numbers and numeric formats
MATLAB Basics MATLAB numbers and numeric formats All numerical variables are stored in MATLAB in double precision floating-point form. (In fact it is possible to force some variables to be of other types
SPSS 12 Data Analysis Basics Linda E. Lucek, Ed.D. [email protected] 815-753-9516
SPSS 12 Data Analysis Basics Linda E. Lucek, Ed.D. [email protected] 815-753-9516 Technical Advisory Group Customer Support Services Northern Illinois University 120 Swen Parson Hall DeKalb, IL 60115 SPSS
4 Other useful features on the course web page. 5 Accessing SAS
1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu
An introduction to using Microsoft Excel for quantitative data analysis
Contents An introduction to using Microsoft Excel for quantitative data analysis 1 Introduction... 1 2 Why use Excel?... 2 3 Quantitative data analysis tools in Excel... 3 4 Entering your data... 6 5 Preparing
Directions for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
Reading and writing files
Reading and writing files Importing data in R Data contained in external text files can be imported in R using one of the following functions: scan() read.table() read.csv() read.csv2() read.delim() read.delim2()
Simple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
Chapter 4. Spreadsheets
Chapter 4. Spreadsheets We ve discussed rather briefly the use of computer algebra in 3.5. The approach of relying on www.wolframalpha.com is a poor subsititute for a fullfeatured computer algebra program
ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
containing Kendall correlations; and the OUTH = option will create a data set containing Hoeffding statistics.
Getting Correlations Using PROC CORR Correlation analysis provides a method to measure the strength of a linear relationship between two numeric variables. PROC CORR can be used to compute Pearson product-moment
SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES
SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR
The Dummy s Guide to Data Analysis Using SPSS
The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests
Figure 1. An embedded chart on a worksheet.
8. Excel Charts and Analysis ToolPak Charts, also known as graphs, have been an integral part of spreadsheets since the early days of Lotus 1-2-3. Charting features have improved significantly over the
Data exploration with Microsoft Excel: analysing more than one variable
Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical
SAS Analyst for Windows Tutorial
Updated: August 2012 Table of Contents Section 1: Introduction... 3 1.1 About this Document... 3 1.2 Introduction to Version 8 of SAS... 3 Section 2: An Overview of SAS V.8 for Windows... 3 2.1 Navigating
FIRST STEPS WITH SCILAB
powered by FIRST STEPS WITH SCILAB The purpose of this tutorial is to get started using Scilab, by discovering the environment, the main features and some useful commands. Level This work is licensed under
Programming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.
Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to
R with Rcmdr: BASIC INSTRUCTIONS
R with Rcmdr: BASIC INSTRUCTIONS Contents 1 RUNNING & INSTALLATION R UNDER WINDOWS 2 1.1 Running R and Rcmdr from CD........................................ 2 1.2 Installing from CD...............................................
The F distribution and the basic principle behind ANOVAs. Situating ANOVAs in the world of statistical tests
Tutorial The F distribution and the basic principle behind ANOVAs Bodo Winter 1 Updates: September 21, 2011; January 23, 2014; April 24, 2014; March 2, 2015 This tutorial focuses on understanding rather
Review Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
MATLAB Workshop 3 - Vectors in MATLAB
MATLAB: Workshop - Vectors in MATLAB page 1 MATLAB Workshop - Vectors in MATLAB Objectives: Learn about vector properties in MATLAB, methods to create row and column vectors, mathematical functions with
Linear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples
January 26, 2009 The Faculty Center for Teaching and Learning
THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i
SOME EXCEL FORMULAS AND FUNCTIONS
SOME EXCEL FORMULAS AND FUNCTIONS About calculation operators Operators specify the type of calculation that you want to perform on the elements of a formula. Microsoft Excel includes four different types
Getting your data into R
1 Getting your data into R How to load and export data Dr Simon R. White MRC Biostatistics Unit, Cambridge 2012 2 Part I Case study: Reading data into R and getting output from R 3 Data formats Not just
Oracle SQL. Course Summary. Duration. Objectives
Oracle SQL Course Summary Identify the major structural components of the Oracle Database 11g Create reports of aggregated data Write SELECT statements that include queries Retrieve row and column data
MATLAB Programming. Problem 1: Sequential
Division of Engineering Fundamentals, Copyright 1999 by J.C. Malzahn Kampe 1 / 21 MATLAB Programming When we use the phrase computer solution, it should be understood that a computer will only follow directions;
Descriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.
Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.
The KaleidaGraph Guide to Curve Fitting
The KaleidaGraph Guide to Curve Fitting Contents Chapter 1 Curve Fitting Overview 1.1 Purpose of Curve Fitting... 5 1.2 Types of Curve Fits... 5 Least Squares Curve Fits... 5 Nonlinear Curve Fits... 6
A beginner s guide to data visualisation and statistical analysis and programming in R
Introductory R A beginner s guide to data visualisation and statistical analysis and programming in R Rob Knell School of Biological and Chemical Sciences Queen Mary University of London Published by Robert
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
Introduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
Cluster software and Java TreeView
Cluster software and Java TreeView To download the software: http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/treeview.html Cluster 3.0
TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:
Table of contents: Access Data for Analysis Data file types Format assumptions Data from Excel Information links Add multiple data tables Create & Interpret Visualizations Table Pie Chart Cross Table Treemap
Descriptive Statistics
Descriptive Statistics Descriptive statistics consist of methods for organizing and summarizing data. It includes the construction of graphs, charts and tables, as well various descriptive measures such
Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.
SEQUENCES ARITHMETIC SEQUENCES. Examples
SEQUENCES ARITHMETIC SEQUENCES An ordered list of numbers such as: 4, 9, 6, 25, 36 is a sequence. Each number in the sequence is a term. Usually variables with subscripts are used to label terms. For example,
Getting started with the Stata
Getting started with the Stata 1. Begin by going to a Columbia Computer Labs. 2. Getting started Your first Stata session. Begin by starting Stata on your computer. Using a PC: 1. Click on start menu 2.
A Primer on Using PROC IML
Primer on Using PROC IML SS Interactive Matrix Language (IML) is a SS procedure. IML is a matrix language and has built-in operators and functions for most standard matrix operations. To invoke the procedure,
One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate
1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,
Practical Differential Gene Expression. Introduction
Practical Differential Gene Expression Introduction In this tutorial you will learn how to use R packages for analysis of differential expression. The dataset we use are the gene-summarized count data
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
Introduction to the data.table package in R
Introduction to the data.table package in R Revised: September 18, 2015 (A later revision may be available on the homepage) Introduction This vignette is aimed at those who are already familiar with creating
Homework 11. Part 1. Name: Score: / null
Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is
Exercise 4 Learning Python language fundamentals
Exercise 4 Learning Python language fundamentals Work with numbers Python can be used as a powerful calculator. Practicing math calculations in Python will help you not only perform these tasks, but also
