Controlling and Managing Your R Workspace



Similar documents
Introduction to the R Language

INTRODUCTION TO DESKTOP PUBLISHING

Cover Letter Workshop. Career Development Centre

HOW TO SUCCEED WITH NEWSPAPER ADVERTISING

Family Law - Divorce FAQ

Vieta s Formulas and the Identity Theorem

edgebooks Quick Start Guide 4

Odyssey of the Mind Technology Fair. Simple Electronics

Linear functions Increasing Linear Functions. Decreasing Linear Functions

Buoyancy. What floats your boat?

2.2 Derivative as a Function

Access Tutorial 2: Tables

Gettysburg College. Center for Global Education. Pre-Departure Handbook for Domestic Programs

HOW TO ORGANIZE PICTURES

Student Worksheet 1 TI-15 Explorer : Finding Patterns

Regions in a circle. 7 points 57 regions

Writing Thesis Defense Papers

Session 7 Fractions and Decimals

Data Flow Diagrams. Outline. Some Rules for External Entities 1/25/2010. Mechanics

Setting up a basic database in Access 2003

A Short Introduction to Eviews

Hypercosm. Studio.

Explain how Employee Performance is Measured and Managed

8.7 Exponential Growth and Decay

Managing Your Class. Managing Users

Action settings and interactivity

Getting the best from your 360 degree feedback

Express Technology. Maintenance Solutions Express Technology Inc.

Forces between charges

Chapter 28: Expanding Web Studio

Packaging Software: Making Software Install Silently

KPN SMS mail. Send SMS as fast as !

How to Make the Most of Excel Spreadsheets

Quick Start Guide Getting Started with Stocks

McKinsey Problem Solving Test Top Tips

ICS Technology. PADS Viewer Manual. ICS Technology Inc PO Box 4063 Middletown, NJ

send and receive MMS

HOW DO I CREATE A PARTNER PROGRAM? PRESENTED BY

If this PDF has opened in Full Screen mode, you can quit by pressing Alt and F4, or press escape to view in normal mode. Click here to start.

This explains why the mixed number equivalent to 7/3 is 2 + 1/3, also written 2

Understanding BEx Query Designer: Part-2 Structures, Selections and Formulas

Writing a Scholarship Essay. Making the essay work for you!

Client Marketing: Sets

Mailing lists process, creation and approval. Mailing lists process, creation and approval

3 Adwords Profit Secrets That Newbies Don t Know

This guide is to help you get started with Live Chat Support on your Wix website. If you have any additional questions after reading this guide,

MOST FREQUENTLY ASKED INTERVIEW QUESTIONS. 1. Why don t you tell me about yourself? 2. Why should I hire you?

TABLE OF CONTENTS. race result 11 Introduction

The Mind Map Tutor Handbook

Participants Manual Video Seven The OSCAR Coaching Model

Curve Fitting, Loglog Plots, and Semilog Plots 1

User Guide For ipodder on Windows systems

You can probably work with decimal. binary numbers needed by the. Working with binary numbers is time- consuming & error-prone.

The «include» and «extend» Relationships in Use Case Models

Tutorial 4 - Attribute data in ArcGIS

20-30 minutes, can be used within a longer activity

A PRACTICAL GUIDE TO db CALCULATIONS

Session 6 Number Theory

Circuit diagrams and symbols (1)

Sample- for evaluation purposes only. Advanced Crystal Reports. TeachUcomp, Inc.

Unit 1 Number Sense. In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions.

HTML Code Generator V 1.0 For Simatic IT Modules CP IT, IT, IT

5050 Getting Started

Grade 7/8 Math Circles Sequences and Series

The Physics and Math of Ping-pong and How It Affects Game Play. By: Connor Thompson & Andrew Johnson

2. A tutorial on notebooks

The Windows Command Prompt: Simpler and More Useful Than You Think

Introduction to Java and Eclipse

Implementation of Best Practices in Environmental Cleaning using LEAN Methodology. Tom Clancey and Amanda Bjorn

The Easy Way To Flipping Domain Names

APPENDIX. Interest Concepts of Future and Present Value. Concept of Interest TIME VALUE OF MONEY BASIC INTEREST CONCEPTS

Getting Started with WebSite Tonight

OUTSOURCE IT OR KEEP IT IN-HOUSE?

First Affirmative Speaker Template 1

Nashville Number System

Name Partners Date. Energy Diagrams I

After Effects CS4. Getting Started. Getting Started. Essential Training Introduction

Registry Tuner. Software Manual

Keyboard Basics. By Starling Jones, Jr.

COMMONWEALTH OF PA OFFICE OF ADMINISTRATION. Human Resource Development Division. SAP LSO-AE Desk Guide 15 T H J A N U A R Y,

Colorado State University s Systems Engineering degree programs.

Finding the last cell in an Excel range Using built in Excel functions to locate the last cell containing data in a range of cells.

Your logbook. Choosing a topic

Student Communication Package

A step-by-step guide to making a complaint about health and social care

NEW YORK UNIVERSITY Department of Chemistry Summer 2013

GMAT SYLLABI. Types of Assignments - 1 -

Preparing and Revising for your GCSE Exams

Computational Geometry: Line segment intersection

The Logistic Function

Gradus ad Parnassum: Writing the Master s Thesis Michigan State University, College of Music September Carol A. Hess

Unix Shell Scripts. Contents. 1 Introduction. Norman Matloff. July 30, Introduction 1. 2 Invoking Shell Scripts 2

Page 18. Using Software To Make More Money With Surveys. Visit us on the web at:

Manual English KOI Desktop App 2.0.x

SCHEME AND SERVICE INFORMATION

The Promised Land. You will need: Other items depending on the activities chosen

Maintaining employees focus and motivation is essential if they are to make a full contribution to your business.

Transcription:

1 Controlling and Managing Your R Workspace This guide provides an overview of the R Workspace and R Environment including its basic logic and structure. The goal is to develop an understanding of how best to control and manage your Workspace so that you can avoid or quickly diagnose several common errors. Basic Overview The R Environment consists of all the files necessary for running the R Program as well as data sets and other objects that you have created or loaded into your Workspace. These files can be broken down into three basic types: 1. The base packages that run all the standard analyses that we use in this course. These files are installed automatically when you first download and install the R program. 2. Additional packages you can install on your own and which allow for more advanced statistical analysis or additional commands. 3. The data sets that you download and other objects (data sets and variables) that you create. We will not discuss the first two types in great detail. But, it is useful to take a brief look. Using the search command as follows, we can view our entire R Environment. > search() [1] ".GlobalEnv" "package:stats" "package:graphics" [4] "package:grdevices" "package:utils" "package:datasets" [7] "package:methods" "Autoloads" "package:base" When you first install the R program, these are the 9 files you will find in your Environment. The first,.globalenv, is your Global Environment or R Workspace. It is where all your objects and data sets are stored. More on your Global Environment below. The remaining 8 files are required to run all the basic analyses they will (or should) always be there. You can install additional packages to do special types of analyses, not a part of the initial installation. For example, one common package is called MASS. To install this package, simply type the following: >library(mass) Typing in: >search() We can see how our R Environment has changed: [1] ".GlobalEnv" "package:mass" "package:stats" [4] "package:graphics" "package:grdevices" "package:utils" 1

2 [7] "package:datasets" "package:methods" "Autoloads" [10] "package:base" Notice that everything is the same as above, except there is now a new file called package:mass. This is the file that you just installed with the library(mass) command. Also, notice that it is in the 2 nd position, just after the Global Environment, which always stays in the first position. Search Path: The search () command not only shows the entire R Environment; it also shows the order in which R accesses files that is, it shows the search path. For example,.globalenv (The Global Environment or R Workspace) is always first in the search path, while package:base is always last. When you execute commands, R will always look first in the Global Environment for commands, data sets, and variables. If it does not find them there, it goes to the next file in the search path in this case, package:mass. If it does not find them there, it will continue to the 3 rd, 4 th all the way to the package:base. If it does not find the object there, then guess what, you will get an error saying the object is not found. The Global Environment/R Workspace: The global environment is the most important of all the files in the search path. This is where all your objects (data sets and variables) are stored your R Workspace. To see your R Workspace, you need to use ls(), which is short for list. >ls() Since this lists all objects in your R Workspace, the output will vary from student to student. But, let s assume that you currently have 4 objects in your R Workspace: two variables called name and age and two data sets called POL201 and HDEV. Then the output will appear as follows: [1] "age" "HDEV" "name" "POL201" If a variable on its own (that is, not inside of a data set) is in our R Workspace, we can immediately run appropriate analyses on this variable. For example, we can calculate the mean for age simply as follows: >mean(age) [1] 22 ERROR ALERT However, what happens when we try to run analyses on a variable inside of a data set? For example, there is a variable called height in the POL201 data set. If we try to take the mean of this variable (with the na.rm=t argument to remove missing values), we get the following: 2

3 >mean(height, na.rm=t) Error in mean(height, na.rm = T) : object "height" not found We get the error for the reason that is stated object height not found. The reason that the object is not found is that R can currently only see the data set, POL201, not its contents or variables. Accessing Variables in Data Sets: In order to access variables in our data sets, we have two options: 1. Use a $ sign to separate the name of the data set and the variable of interest. For example, >mean(pol201$height, na.rm=t) [1] 68.92958 This tells R to first look at the data set, and then find the appropriate named variable inside of that data set. The one disadvantage of this approach is that involves more typing and longer commands. Thus, we can also opt to first: 2. Attach the data set with the attach() command: >attach(pol201) >mean(height, na.rm=t) [1] 68.92958 * ERROR ALERT. Note: You can only attach data sets. You cannot attach the variables within the data set. For example, the following commands will simply give you error messages: > attach(height) Error in attach(height) : object "height" not found > attach(pol201$height) Error in attach(pol201$height) : 'attach' only works for lists, data frames and environments Attaching Data Sets and Changes to your R Environment: When you attach a data set, R actually creates a copy of the data set and inserts it as a file in your R Environment. You can see this by re-examining your search path: >search() [1] ".GlobalEnv" "POL201" "package:mass" [4] "package:stats" "package:graphics" "package:grdevices" [7] "package:utils" "package:datasets" "package:methods" [10] "Autoloads" "package:base" 3

4 You see that POL201 is now a file in your search path; more specifically, it is now the file in 2 nd position in your search path and all other files have been shifted or pushed down one position. You should now understand that attaching an object from your R Workspace, puts it in your R Environment, and then allows the contents of that data set (the individual variables) to be accessed for analysis. You can remove a data set from your search path by detaching it: >detach(pol201) [1] ".GlobalEnv" "package:mass" "package:stats" [4] "package:graphics" "package:grdevices" "package:utils" [7] "package:datasets" "package:methods" "Autoloads" [10] "package:base" Notice, POL201 is now gone from the search path. Attaching Data Sets, Masking, and the Search Path: It seems reasonable to ask at this point, why not simply attach all data sets and leave them attached? Often, this would work just fine. However, this will not work when: 1. You have an object in your Global Environment that has an identically name (and capitalization) to one in your data set of interest. 2. You have identically named variables in different data sets. Both of these problems produce what are called Masking Errors. It is not technically an error in this sense that it won t allow you to run an analysis. R is simply telling you that one of your objects (variables) is being masked or covered up by another variable of the same name. This means you cannot actually access one of the same named variables for analysis. Thus, the key question is which variable will it access for analysis? To answer this question, you must understand how R accesses data and files. You have already learned this: it accesses files in the order of the search path. The most important thing here is that R always accesses objects in.globalenv (the Global Environment) first. If you have a variable in your Global Environment that is identical to that in an attached data set, you will never be able to access and analyze that variable in the data set. It will always find the one in your Global Environment first. Let s create a masking error of this nature. 4

5 Starting with no data sets attached, let s create a new variable in your environment called height. >height = c(65, 70, 75) Taking a look at our R Workspace, we now see 5 objects: the original 2 data sets and 2 variables as well as our newly created height variable. > ls() [1] "age" "HDEV" "height" "name" "POL201" Now, let s attach the POL201 data set and immediately you will see the masking problem. > attach(pol201) The following object(s) are masked _by_.globalenv : height Notice, it tells you directly that the object height is masked_by_.globalenv. If we try to now take the mean of height, we will be taking the mean of the height variable from the Global Environment, not the one in the POL201 data set. >mean(height, na.rm=t) [1] 70 ERROR ALERT Similarly, if we try to run a scatterplot of the relationship between height and gpa, we will get an error message of a different sort. > plot(height,gpa) Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ This is because again the height variable invoked here is not the one in the POL201 data set. The error occurs because the height variable in the global environment is not the same length as the gpa variable inside the POL201 data set. Other Masking Problems: Being masked by the Global Environment is the most common type of masking problem; however, there are a few others. 1. You attach the same data set twice. You can attach the same data set twice. It simply creates a new copy of the data set, places it at position #2, and pushes the old copy to position #3. 5

6 >attach(pol201) The following object(s) are masked from POL201 ( position 3 ) : boycott career1 career2 colour country country1 country2 dist do.study gender gpa haircut height idlgy internet job lang occupy party petition protest shld.study smoke sport strike talk.pol transport tv vote wkrhrs Notice that it tells you that all the objects from POL201 in (position 3) are now masked, again meaning that they are unavailable for analysis. It should be clear that this is not really a big deal, because those variables are being masked by an exact copy of the other variables. You can see the basic logic of double attaching by examining your search path. >search() [1] ".GlobalEnv" "POL201" "POL201" [4] "package:mass" "package:stats" "package:graphics" [7] "package:grdevices" "package:utils" "package:datasets" [10] "package:methods" "Autoloads" "package:base" You see POL201 is now in two unique positions: Position #2 and Position #3. 2. The second type of masking problem occurs when you attach two different data sets which happen to have an identically named variable. This might happen once or twice throughout the semester. The solution is to simply detach the data set that is not of interest. Detaching Data Sets: To detach a data set you simply use the detach() command. For example, >detach(pol201) 6