Introduction to STATA 11 for Windows
|
|
|
- Wendy Mills
- 9 years ago
- Views:
Transcription
1 1/27/2012 Introduction to STATA 11 for Windows Stata Sizes...3 Documentation...3 Availability...3 STATA User Interface...4 Stata Language Syntax...5 Entering and Editing Stata Commands...6 Stata Online Help...7 Getting Data into Stata...9 Using the Data Editor...9 Reading Data from Delimited Text Files insheet Command...11 Reading Data from Excel Files...12 Reading Data from Delimited Text Files infile Command...13 Saving a Stata Data File...15 Using a Stata Data File...16 Examine the Data...16 browse/edit...16 list...17 describe...18 summarize...20 Missing Values...21 Frequencies...22 Value Labels...23 Tabulations...25 Tabulate Options Percents & Chisquare...26 Summary Statistics...27 summarize...27 tabstat...28 Analysis of Subgoups...28 by() option...29 by: prefix...29 Additional Analyses...30 Confidence Intervals...30 Two sample t-test...31 Paired t-test...32 Selecting Cases
2 if...33 in...34 Creating New Variables...34 generate, replace, drop...34 recode...35 Graphs...36 Descriptive Statistics Bars...37 Histograms...38 Frequency Bars...39 Scatterplots...41 Overlay Graphs, Symbols & Legends...41 Naming Graphs...43 String Variables...44 Dates...46 Efficiency Tricks...47 Using the Review and Variables Windows...47 Using the Do-File Editor:...48 Saving the Review Window as a Do-File:...49 Using Log to Print and Save Output:
3 Stata Sizes Stata is available in several sizes. All have the same features, but differ in the size of data that they can handle: Max. number of variables Max. number of observations Small Stata Intercooled Stata Stata SE or MP ,766 About 1000 Depends on computer s RAM Depends on computer s RAM Documentation All Stata documentation is included with the software as pdf files. Physical manuals can be ordered from the Stata website: Availability Stata is available to University of Massachusetts students, faculty and staff for purchase at a discount. Small Stata is only available to students, and only with a 1-year license. Other sizes are available to all, with perpetual licenses. For list of currently available choices and prices, see To purchase, call Stata at , or [email protected]. Identify yourself as a UMass student, faculty or staff, and make payment directly to Stata. They will fax the University Store to release the software to you. You can also use Stata in any of OIT's computer classrooms. To find classroom locations and hours, go to 3
4 STATA User Interface When you start Stata, you will see the main Stata interface with four sections: Command Window: this is where you enter Stata commands Results Window: displays the results of Stata commands Review Window: shows the history of Stata commands for this session Variables Window: displays all the variables in the active data file The Stata toolbar, located directly underneath the Stata menu, has shortcut icons to the most frequently used commands. To determine what each icon does, hold the mouse pointer over an icon (do not click). A box will appear with a description of the icon function. For example, holding the mouse over the fifth icon from the right, we see this icon activates the data editor. The data editor is a spreadsheet-like interface used for entering, browsing or editing Stata data. 4
5 Stata Language Syntax Stata is a command driven statistical software package. See "Language Syntax Overview" in the Stata User's Guide for more detail than is provided here. Many basic Stata commands are available as Menu choices or tool bar icons. When you use the menus, the command is generated for you, and appears in the Results and Review windows, just as if you had typed it. You can use this feature to learn commands, as well as to rerun commands without going through the menus or re-typing. While using menus is a convenient learning tool, to take full advantage of Stata's capabilities, you will need to type Stata commands in the Stata command window. Therefore, this document will focus on Stata commands, more than on the menu system. The general form of a Stata command is: [by varlist:] command [varlist] [if exp] [in range] [weight=exp] [,options] The square brackets denote optional qualifiers. The brackets are not typed when entering a command. Stata is case-sensititve! All Stata commands are lower case. Since everything except the command is enclosed in square brackets, the simplest form of the language would be to issue just a command. For example, the command: summarize gives the mean, standard deviation and range of all numeric variables in the active Stata dataset. The meaning of the various parts of the command are: by: prefix requests that the command be executed repeatedly for each distinct value or combination of values of the variable(s) in the list that follows by. For example, if variable gender has values M and F, then adding the prefix by gender: requests that the command be executed twice: once for gender M, and again for gender F. See Analysis of Subgroups section for more detail. varlist denotes a list of variable names. Stata variable names can be up to 32 characters long, must start with a letter, and can contain letters, numbers and the underscore ( _ ). Variable names are CASE-SENSITIVE! Variable x and variable X are not the same. The variable list following by specifies the by variables (see above); the list following the command specifies the analysis variables for the command. 5
6 command denotes a Stata command, and is the only part of the general form that is always required. Specific commands may require additional parameters. if exp is an algebraic expression which is used to select the observations to be used by the command. See section Selecting Cases if and in for details. in range is a range of sequential numbers of the observation to be used by the command. See section Selecting Cases if and in for details. weight=exp defines a weighted analysis. Stata supports four kinds of weights, though not all of them are applicable to all commands. See User's Guide section Language Syntax Overview for information about weighting. options is a list of options to be applied to the command. Note that a comma is required to mark the start of the options list. The specific options available vary depending on the command. Entering and Editing Stata Commands You type your commands in the Command window. If a command is long, it will wrap onto as many lines as it takes to complete the command. Do NOT press Enter until the very end of the command. When you press Enter, Stata immediately executes the command. After you press Enter, the command appears in the Review window, followed by the output (if any), or an error message, in the Results window. If you made a typing mistake, click on the command you wish to correct in the Review window. It re-appears in the Command window, ready for you to edit it. Press Enter to submit the corrected command. If the output from a command does not fit on the Review window, Stata displays as much as fits on the screen, and pauses with the message: --more-- Press the spacebar to display the next screenful of output; press Enter to display just one more line. The Variables window displays the names of the variables in the file you are using. When you click on the name of a variable in the Variables window, that name is entered in the Command window. Use this feature to help you enter variable lists faster and with fewer errors in the commands you are typing. Filenames and string values in commands must be enclosed in quotes ("), not apostrophes ('). 6
7 Stata Online Help Stata has extensive online help. From the Help menu you have the choice of Contents, Search or Stata Command for online help. Contents contains a table of contents for all the command help files arranged by subject. Search allows you to do a keyword search for help files. For example, if you wanted to find out how to input data you could go to Help Search, and enter input data in the search dialog. You would then get a Help screen similar to the following: Blue text is a link you can click on for further information. The letter in square brackets on the left indicates the manual that contains information about that command: [U] for Users' Guide, [D] for Data Management, [R] for Reference, [G] for Graphics. Manuals often provide more detail and examples than the online help. For this example, to get more information about the infiling command, we can look in the User's Guide, or click on infiling. The latter gives the following help screen: 7
8 If you know the Stata command that you wish to learn about select Stata command from the Help menu and type the command in the dialog box. For example to learn about the summarize command, type summarize in the Stata command dialog to get the following: 8
9 Getting Data into Stata Stata data consists of observations and variables, corresponding to rows and columns respectively in a spreadsheet-like arrangement. You can only work on one Stata dataset at a time. The data you are working on is stored in memory. Using the Data Editor The data editor can be used to enter new data, or to view or edit existing data. To open the Data Editor, select Windows Data Editor, or click the Data Editor icon (fifth from the right) on the Tool Bar If you have a dataset in memory, it is displayed in the editor. Otherwise you get a blank editor screen, like this one:. As you type data, it is displayed next to var1[1]=. Var1[1] stands for variable 1, observation 1, and corresponds to the highlighted cell. When you press Enter, the data you typed is entered into the highlighted cell. The display changes to var1[2], and the corresponding cell is highlighted. Enter the following three observations and three variables:
10 The data editor will now look like this: Stata assigned the default variable names var1, var2, var3. To change the variable names, double-click the column heading var1. The Stata Variable Properties dialog opens, with var1 in the Name field. Change var1 to your preferred variable name. Notice you can also give the variable an extended label, select how it is stored internally (Type), how it is displayed (Format), and assign value labels. More about this later. 10
11 Here we have changed var1, var2, var3 to a,b,c, respectively. Remember that variable names are case sensitive. Reading Data from Delimited Text Files insheet Command The insheet command is used for text files that: are comma-delimited or tab-delimited, but not space-delimited contain 1 observation per line may or may not have column headings (If there are no column headings, Stata assigns variable names v1, v2, v3, etc. The rename command or the Data Editor can later be used to change the names.) For example, a file arranged like this could be entered into Stata using the insheet command: a,b,c 10,20,30 40,50,60 70,80,90 The command to read this file would have the form: insheet using "c:\statawork\test.txt",clear The clear option specifies that any data in memory can be cleared. If you have not saved the data in memory, it is discarded. Because Stata can only have one dataset at a time in memory you need to clear the memory before you can enter new data. clear can be used either as a stand alone command or as an option on a command that reads new data. 11
12 You can generate the insheet command from the menus. Select File Import ASCII data created by a spreadsheet Fill out the insheet dialog with the file to be imported; don't forget to check "Replace data in memory". Click OK. Use the list command, or the Data Editor to see the data. Reading Data from Excel Files Excel files, and other spreadsheets, can be read into STATA in one of two ways. 1) Convert the Excel file to.txt (In Excel choose File Save As and select.txt from the Save as Type drop down list). Read the text file into STATA with or without column headings using the insheet command described above. 2) Copy and paste into Stata. First, in Excel, select and copy your data (with or without column headings). In Stata o Type clear to clear all data in Stata memory. o Open a blank Data Editor window. o With the first cell selected, press ctrl-v to paste the data into the Data Editor. 12
13 Reading Data from Delimited Text Files infile Command The infile command is used for text files that: are comma-, tab-, or space-delimited, or any combination of the three contain 1 observation per line, more than 1 observation per line, or 1 observation across multiple lines do not contain column headings For the remaining examples we will use data from Appendix A of Minitab Handbook, Second Edition, Ryan, Joiner and Ryan, PWS-KENT Publishing Company, 1985 p For instructional purposes a date variable was added, a few missing values introduced, and gender was coded with letters (M or F) rather than numeric codes. These data are copyrighted and must be acknowledged and used accordingly. Each subject measured his/her pulse rate. The subjects were then randomly divided into two groups. One group ran in place for two minutes; the other did nothing. At the end of the two minutes, everyone measured his/her pulse again. The data can be downloaded from: The description of the data is at: Download the data file and save it in a convenient location. In this document we will assume that all files are saved in C:\statawork. The data is arranged like this: Variable Columns Description BDATE 1-10 Date of Birth (mm/dd/yyyy) PULSE First Pulse (0 = Missing Value ) PULSE Second Pulse (0 = Missing Value ) GROUP 20 Group (1=Ran in place, 2 = Did not run in place ) SMOKE 22 Smokes (1= Yes, 2 = No; 9 = Missing Value ) GENDER 24 Gender (M or F) HEIGHT Height in inches WEIGHT Weight in pounds (0 = Missing Value ) ACTIVITY 34 Usual level of physical activity (1-slight, 2=moderate, 3=very; 9=Missing Value) Here is a listing of the first few cases: 10/15/ F /21/ M /21/ M /05/ M /20/ M
14 We can read this file into a Stata worksheet using the infile command. Type in the Command window: infile str10 bdate pulse1 pulse2 group smoke str1 gender height weight activity using "C:\statawork\minidat_date.dat", clear This is all one command. Since it is long, it will wrap onto two lines as you type. Do not press Enter until the end of the command. The data file does not have variable names at the top of the file. You provide the variable names (bdate, pulse1, pulse2, ) on the infile command. Remember that variable names are case-sensitive. All variables are assumed to be numeric unless you say otherwise,. The "str10" preceding variable bdate tells Stata to interpret the values of bdate as a 10-character string, e.g. "10/15/1985". Similarly str1 preceding gender means that the values of gender are 1-character strings ("M" or "F"). If Stata finds non-numeric values in a variable that it expects to be numeric it reports an error similar to this: '03/21/1985' cannot be read as a number for bdate 'M' cannot be read as a number for gender The Results window echoes your command, and reports the result of the command execution. If the data is read successfully you will see:. infile str10 bdate pulse1 pulse2 group smoke str1 gender height weight activity using "C:\statawork\minidat_date.dat", clear (91 observations read) 14
15 Using the menu system, File Import Unformatted ASCII data generates the infile command. However, you will still need to specify the variable names and string formats for non-numeric variables, so there is little to be gained by using the menu for this command: Saving a Stata Data File Stata stores your working data in memory. To save data for future use, you need to save it as a Stata datafile. You can do this using the menus, or the save command: Using the menu, Select File Save As. In the Save As dialog, browse to the directory where you want to save your file, and enter a filename in the Filename box. Note that Stata data files must have the extension.dta. Use the save command, specifying the location and name for the file to save, in quotes ("), not apostrophes ('). If you do not specify an extension, Stata will add.dta to your filename. For example: save "c:\statawork\minidat2" If a file with that name already exists, Stata will not save the file unless you add the replace option: save "c:\statawork\minidat2", replace 15
16 Using a Stata Data File To use a Stata data file that was saved in a previous session, you use either the menus, or the use command: Using the menu, Select File Open. Browse to the directory where your file is saved, select the file, and click Open. Since all Stata data files have the extension.dta, only those files are available for you to select. Type the use command, specifying the location and name for the file to open, in quotes, not apostrophes. If you do not specify an extension, Stata will add.dta to your filename. For example: use "c:\statawork\minidat2" If there is already some data in memory, Stata will not open the file unless you add the clear option. The clear option tells Stata to discard anything in memory: use "c:\statawork\minidat2", clear Examine the Data After creating a new data file, you typically need to examine the data to identify possible problems. Two commands for looking at the data directly are list and edit. Two useful summary commands for getting brief information about the data are describe and summarize. browse/edit The Data Editor lets you view and edit your data. To open the data Editor, click the Data Editor icon on the Stata toolbar the icon looks like a spreadsheet with a pencil. The Data Browser displays your data just like the Data Editor, but does not let you make changes. Its icon is to the right of the Data Editor icon, and looks like a spreadsheet with a magnifying glass. Use it to avoid unintentionally changing your data. 16
17 You can also open the Data Editor or Browser with the commands: edit browse When you close the Data Editor, any changes you made are preserved in the copy of the data held in memory, and will be used in all subsequent analyses in this Stata session. If you wish to make the changes permanent for future Stata sessions, you must save the latest data to a file. See Saving a Stata Data File. list The list command displays your data in the Results window. Like Browse, list does not permit editing the data. (From here on, Menu choices will be listed in square brackets [ ] following the command itroduced. The command can simply be typed into the Command window, rather than using the menu system. ) list [Data Describe data List Data ] --more-- indicates that there is more output to display; Stata paused because the Results window was full. To see the next screen of output, press the space bar. Pressing Return scrolls the output forward by only one line. Press ctrl-c or the Break icon (Red circle with X) on the toolbar to abort the listing without scrolling to the end. 17
18 describe The describe command gives you some very basic information about your data the number of observations (cases), number of variables, and the name and type of each variable. describe [Data Describe Describe data in memory ] The Result window displays: obs: 91 vars: 9 size: 3,913 (99.3% of memory free) storage display value variable name type format label variable label bdate str10 %10s pulse1 float %9.0g pulse2 float %9.0g group float %9.0g smoke float %9.0g gender str1 %9s height float %9.0g weight float %9.0g activity float %9.0g Sorted by: "Storage type" tells you whether a variable is numeric ("float") or character ("str"). In this case, we see that bdate is a 10-character string, and gender is a 1- character string. Everything else is numeric. "Display format" tells you how Stata will display the values of a variable. %9.0g means the variable will be displayed with up to 9 digits, and Stata will decide whether and how many decimal places to display, depending on context. (The "g" stands for "general".) This works well in most cases, but at times you may need to force Stata to display decimal places. You do this with the format command. For example, to force Stata to display weight using 9 digits with 1 decimal place, use the command: format %9.1f weight Formats always begin with the percent sign (%) and end with a letter. Here, "f" stands for "fixed", meaning the number of decimal places to display cannot vary. We have not assigned any labels, so the label columns are empty. 18
19 You can use the Variables Manager (3rd icon from right in Tool bar) to change formats, assign labels, and even change the how a variable is stored (within limits). For example, this dialog shows that we changed variable group's type from float to int, and its format to %3.0f. Notice in the Results and History windows, this generated the recast and format commands. 19
20 summarize The summarize command displays basic summary statistics for all numeric variables: summarize [Statistics Summaries, Tables, and Tests Summary and Descriptive Statistics Summary Statistics ] Here are the Results: Notice that bdate and gender have no information listed, not even the number of observations. Character data shows up as missing in many situation. See String Variables section for how to deal with this problem. Also notice that pulse1, pulse2 and weight have 0 as the minimum value, and that smoke and activity have 9 for the maximum value. Looking at the codebook for the data, we see that these are missing value indicators. However, they are included in calculating the means and standard deviations. In order to get the correct calculations, we must tell Stata to exclude these values. This is discussed in the Missing Values section. 20
21 Missing Values We noted earlier that missing values coded as numbers are included in all calculations. This is a problem for 0s in variables pulse1, pulse2 and weight, and 9s in smoke and activity. Use the mvdecode command to tell Stata to ignore certain values in all calculations. The following command tells Stata to ignore zero in pulse1, pulse2 and weight: mvdecode pulse1 pulse2 weight, mv(0) The Results window displays: pulse1: 1 missing value generated pulse2: 1 missing value generated weight: 1 missing value generated Here is the command to tell Stata to ignore 9s in smoke and activity: mvdecode smoke activity, mv(9) The Results window displays: smoke: 1 missing value generated activity: 1 missing value generated To check, let's repeat the summarize command. We see that pulse1, pulse2 and weight now have reasonable minimum values, their number of observations is 90 rather than 91, and the mean and standard deviation are different than they were before. Similarly, smoke and activity no longer include 9. summarize Variable Obs Mean Std. Dev. Min Max bdate 0 pulse pulse group smoke gender 0 height weight activity NOTE: Missing values are stored internally as a very large number. Although they are excluded from all statistical analyses, if you use a selection expression of the form variable>=#, observations where that variable is missing will be selected. For example, the selection if pulse2>100 will include the case with pulse2 missing. 21
22 Frequencies Variables group, smoke and activity are categorical, so the above summary with means and standard deviations is not an appropriate way to describe them. In addition, gender is also categorical, with non-numeric values. The tab1 command can be used to get one-way frequencies of many variables. Type tab1 in the Command window, then click each variable to be selected in the Variables window to avoid typing their names. tab1 group smoke gender activity [Statistics Summaries, Tables, and tests Tables Multiple one-way tables ] This is part of the output, showing the results for smoke and gender: -> tabulation of smoke smoke Freq. Percent Cum Total > tabulation of gender gender Freq. Percent Cum F M Total
23 Value Labels In the tables above, it would be nice to have the values of smoke labeled with their meaning (Yes, No), rather than displayed as 1 and 2. Similarly, the values of group and activity should be labeled for improved readability. Labeling values has two components create a label that associates text with the codes, and assign the label to one or more variables. Using the Variables Manager, click Manage next to Value Labels to open the Manage Value Labels dialog, where you can create value labels. Then select a variable in Variables Manager and use the Value Label drop down to assign value labels to that variable. The corresponding commands are label define and label values. The following label define creates three labels, names them yn, group and activitylevel, and assigns appropriate text to the numeric codes: label define yn 1 "Yes" 2 "No" label define group 1 "Run" 2 "No Run" label define activitylevel 1 "little" 2 "some" 3 "very active" Note that a label may have the same name as a variable, e.g. group. The above labels are not yet associated with any variables. Use label values to assign the labels to variables' values: label values group group label values smoke yn label values activity activitylevel You can assign the same label definition to more than one variable. If we have several variables with yes/no responses, all coded 1 and 2 respectively, we could assign the yn label to all of them. Once labels have been assigned, Stata displays these variables using their labels, rather than the numeric codes. For example: tab1 smoke smoke Freq. Percent Cum Yes No Total
24 Describe now shows which variables are labeled, and the names of their labels: obs: 91 vars: 9 28 Aug :06 size: 3,913 (99.6% of memory free) storage display value variable name type format label variable label bdate str10 %10s pulse1 float %9.0g pulse2 float %9.0g group float %9.0g group smoke float %9.0g yn gender str1 %9s height float %9.0g weight float %9.0g activity float %11.0g activitylevel If you forget what the codes underlying the labels are, use label list, or the Manage Value Labels dialog: label list activitylevel: 1 little 2 some 3 very active group: 1 Run 2 No Run yn: 1 Yes 2 No If you ever need output with the actual data values, rather than the labels, use the nolabel option: tab1 smoke, nolabel smoke Freq. Percent Cum Total
25 Tabulations We ve already used the tab1 command to get frequency counts of the categorical variables group, gender, smoke and activity. To get crosstabluation, use the tab2 or tabulate commands. Here we crosstabulate gender by smoke: [Statistics Summaries, Tables, and tests Tables All possible two-way tabulations ] tab2 gender smoke -> tabulation of gender by smoke smoke gender Yes No Total F M Total
26 Tabulate Options Percents & Chisquare Most commands have options to modify how the command operates, or to request more output. Here we use the row and chi2 options on tab2 to add row percents and a chi-square test of independence to the above table. tab2 gender smoke, row chi2 -> tabulation of gender by smoke Key frequency row percentage smoke gender Yes No Total F M Total Pearson chi2(1) = Pr =
27 Summary Statistics summarize We ve already used the summarize command to get basic summary statistics for each variable. The detail option adds considerable additional information. Compare the results of summarize without and with the detail option: summarize weight Variable Obs Mean Std. Dev. Min Max weight summarize weight, detail weight Percentiles Smallest 1% % % Obs 90 25% Sum of Wgt % 145 Mean Largest Std. Dev % % Variance % Skewness % Kurtosis
28 tabstat The tabstat command provides more flexibility in the choice of summary statistics, and may be more compact than summarize with detail. You get only the statistics you request: tabstat height weight, stats(n mean sd semean med) [Statistics Summaries, Tables and Tests Tables Table of Summary Statistics (tabstat)] stats height weight N mean sd se(mean) p Analysis of Subgoups Most Stata commands can be used with the by: prefix to repeat the command for each unique value of one or more categorical variables. Some commands also have a by() option, which accomplishes the same thing a little more easily, and usually provides more compact output. The by: prefix requires that the data be sorted on the by variable(s); the by() option has no such requirement. Let s repeat the above summary statistics for height and weight, but this time we d like to see the results separately for males and females. For brevity, we ll only ask for the mean. 28
29 by() option The tabstat command has a by() option, so we ll try that first: tabstat height weight, stats(mean) by(gender) Summary statistics: mean by categories of: gender gender height weight F M Total by: prefix Now, let s try it with the by: prefix: by gender:tabstat height weight, stats(mean) not sorted r(5); We get an error, because the data is not sorted by gender. We try again, this time sorting the data first: sort gender by gender:tabstat height weight, stats(mean) > gender = F stats height weight mean > gender = M stats height weight mean The by prefix permits a sort option, resulting in the more compact, but equivalent by gender, sort:tabstat height weight, stats(mean) 29
30 Additional Analyses Confidence Intervals To get 95% confidence intervals for normally distributed variables, use the ci command: ci height weight [Statistics Summaries, Tables and Tests Summary and Descriptive Statistics Confidence Intervals] Variable Obs Mean Std. Err. [95% Conf. Interval] height weight Add the level option to get different confidence intervals: ci height weight, level(90) Variable Obs Mean Std. Err. [90% Conf. Interval] height weight Notice that using the Confidence Interval dialog, you get a pre-selected choice of confidence levels. Actually, the ci command is not limited to those values. You can recall the command and change the value of the level option. Or you can type some other value in the confidence level dialog than the ones offered. 30
31 Two sample t-test We use a two sample t-test to see whether runners and non-runners differ in pulse rate after running: ttest pulse2, by(group) [Statistics Summaries, Tables and Tests Classical Tests of Hypotheses Two-group mean-comparison test] Two-sample t test with equal variances Group Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] Run No Run combined diff diff = mean(run) - mean(no Run) t = Ho: diff = 0 degrees of freedom = 88 Ha: diff < 0 Ha: diff!= 0 Ha: diff > 0 Pr(T < t) = Pr( T > t ) = Pr(T > t) =
32 Paired t-test Now let s compare pulse before and after running. To do this, we need to use a paired t-test. Further, we ll use the by prefix to get separate analyses for the runners and non-runners. To use the by prefix, the data must be sorted by group: -> group = Run by group, sort:ttest pulse1=pulse2 [Statistics Summaries, Tables and Tests Classical Tests of Hypotheses Mean Comparison Test, paired data] Paired t test Variable Obs Mean Std. Err. Std. Dev. [95% Conf. Interval] pulse pulse diff mean(diff) = mean(pulse1 - pulse2) t = Ho: mean(diff) = 0 degrees of freedom = 33 Ha: mean(diff) < 0 Ha: mean(diff)!= 0 Ha: mean(diff) > 0 Pr(T < t) = Pr( T > t ) = Pr(T > t) = The above is the output for the first group. This is followed by the output for the second group, which we do not show here. 32
33 Selecting Cases The if and the in parameters can be added to any Stata command to select a subset of cases to be used. Notice the if/in tab on most dialogs. if if selects cases that satisfy some logical criterion. You can enter the logical condition in the If: (expression) portion of the if/in tab of a dialog. For example, we can restrict the above paired t-test to the first group: ttest pulse1=pulse2 if group==1 Notice the double equal sign in the if portion of the command. Stata uses the double equal sign in all logical conditions. group==1 is a logical condition because we are asking Stata to check whether the value of variable group is 1, and only include it in the analysis if it is. The single equal sign is reserved for assigning values to variables see Creating New Variables. Note that although group is labeled, and displayed as Run and No Run, we must use its actual numerical value in the if condition. You can combine several conditions on the if criterion. Here, we request the analysis to include only male runners: ttest pulse1=pulse2 if group==1 & gender=="m" The symbols to use for building logical expressions are: Logical (numeric and string) ~ not > greater than! not < less than or >= greater than or equal & and <= less than or equal == equal ~= not equal!= not equal Be careful using > and >= conditions with variables that have missing values! To use these properly, exclude the missing values by adding & <. For example, the following selects lists subjects that weigh over 150 pounds, but not those whose weight is missing. list if weight>150 & weight <. 33
34 in in selects cases based on their position in the datafile. Using the if/in tab of a dialog, select "Use a range of observations" and enter the range of cases to be used. Using commands, to list the first 3 observations: list in 1/3 To list observations from the end of the file, use negative numbers. This lists the last 3 observations: list in -3/-1 Creating New Variables You can create new variables using formulae and existing variables. Two useful commands for doing this are generate and recode. [Data Create or change data Create new variable] [Data Create or change data Change contents of variable] generate, replace, drop generate (abbreviated gen) is used to calculate new variables using arithmetic, algebraic and/or logical constructs. In order to examine the change in pulse, let s compute the difference in pulse rates between the second and first measurements for each student. We ll call the new variable pdiff. We use generate to calculate the new variable: gen pdiff=pulse2-pulse1 Notice that the Result window says: (2 missing values generated) Variables pulse1 and pulse2 each had one missing value. The difference cannot be computed when either of the operands is missing. Therefore, the newly computed difference variable has 2 missing values. 34
35 Basic arithmetic expressions are formed using the operators: + addition (numeric) or concatenation (string) - subtraction * multiplication / division ^ power If you make a mistake in the generate command, and try to re-do it with a correction, you get an error: gen pdiff=pulse2 pulse1 pdiff already defined In order to recalculate an existing variable, use replace instead of generate: replace pdiff=pulse2 pulse1 Alternatively, you can drop a variable, then use generate to recreate it. drop pdiff gen pdiff=pulse2 - pulse1 recode recode is used to make or redefine categories. The following command divides variable height into 5 categories. We use the gen option on recode to store the result in a new variable called ht_cat. Without this, the original values of height would be lost. recode height (min/65=1)(66/68=2)(69/72=3)(73/max=4), gen(ht_cat) To see the results, we run tab1 on the new variable: tab1 ht_cat RECODE of height Freq. Percent Cum Total It would be good to label the newly created categories. We could do this by defining value labels and associating them with variable ht_cat, as shown under Value Labels. Alternatively, recode allows you to create labels as you recode. (Note: the command below will wrap to the next line; do not press Enter until the end of the command.) 35
36 recode height (min/65=1 "65 or less") (66/68=2 "66-68")(69/72=3 "69-72") (73/max=4 "73 or more"), gen(ht_cat) label(ht_cat_label) The tabulation now displays the label for each category. tab1 ht_cat -> tabulation of ht_cat RECODE of height Freq. Percent Cum or less or more Total Graphs Stata has extensive graphical capabilities, with many options. We will introduce only a few simple types of graphs and options. For (lots) more information, consult the Stata Graphics manual. All graph commands begin with: graph graphtype Where graphtype specifies the kind of graph. The most common graphtypes are bar, box, matrix, pie, and twoway. Twoway graphs require additional specification of the kind of twoway graph: area, bar, connected, histogram, line, and scatter are some of the choices within the twoway family. These lists are by no means complete. Many options are available for adding titles, labeling axes, and controlling other features. The options available will vary depending on the type of graph. 36
37 Descriptive Statistics Bars graph bar is used to make bar charts displaying descriptive statistics (such as mean, median, percentiles, sum and others). Typically, such a graph is used to visually compare groups. For example, to compare average pulse2 for the two groups (runners and non-runners): graph bar (mean) pulse2, over(group) To get a title, add: graph bar (mean) pulse2, over(group) title("average Pulse by Group") 37
38 Histograms Histograms show the distribution of continuous variables. Stata chooses appropriate cutpoints for the bars: graph twoway histogram pulse2 We would expect the distribution of pulse2 to be quite different for runners and non-runner. We add the by() option to get separate histograms for the two groups. In addition, we d like to see the histogram scaled in percent, rather than density. graph twoway histogram pulse2, by(group) percent 38
39 Frequency Bars Histogram graphs can also be used to display frequency distributions for a categorical variable. To do this, just add the discrete option to the histogram. Here we also also use the freq option to display the number of subjects (rather than density) in each of the three activity levels: graph twoway histogram activity, discrete freq This graph would look nicer if the bars were separated by some blank space. Also, the scale for activity should only show the integers, 1, 2, and 3, and these should be labeled with their assigned labels, rather than with numbers. To put space between the bars, we add the gap(#) option, where # is the percent of the bars width to leave blank between bars. The xlabel(1(1)3 valuelabel) option specifies that the x-axis (horizontal) should be labeled starting at 1, with increments of 1, ending at 3, and displayed with value labels. graph twoway histogram activity, discrete freq gap(40) xlabel(1(1)3, valuelabel) To get this command from the menus, use Graphics Histogram. On the Main dialog, select the variable to chart, click "data are discrete". Click Bar Properties, 39
40 set the gap value. Under the X axis tab, Major tick/label properties, Rule tab, choose Range, and fill in the Minimum, Maximum and Delta values (1,3,1). On the Labels tab, select Use Value Labels. Click OK. 40
41 Scatterplots Here is a simple scatterplot of pulse2 versus pulse1: graph twoway scatter pulse2 pulse1 Overlay Graphs, Symbols & Legends The above scatterplot does not distinguish the runners from the non-runners. We want distinct markers to distinguish the groups. For this you need to request two graphs on one set of axes. Use parentheses to specify graphs to be overlaid. The first graph is a scatterplot of pulse2 versus pulse1 for group 1 (runners), the second of pulse2 versus pulse1 for group 2 (non-runners). The msymbol option specifies small diamonds for the first group, and small circles for the second. (This is one long command, which wraps onto 2 lines.) 41
42 graph twoway (scatter pulse2 pulse1 if group==1, msymbol(d)) (scatter pulse2 pulse1 if group==2, msymbol(o)) There's one problem. The legend labels the symbols with the name of the y- variable, rather than with the name of the group that the symbol represents. We add the legend option to label the symbols with the group names: graph twoway (scatter pulse2 pulse1 if group==1,msymbol(d)) (scatter pulse2 pulse1 if group==2, msymbol(o)), legend(order(1 "Run" 2 "No Run")) Notice the arrangement of parentheses and options each scatter command is enclosed in parentheses, and has its own option (msymbol) following a comma. These options apply only to the preceding scatter, inside the parenthesis. The legend option follows the comma after the closing parenthesis of the last scatter. This option applies to the entire graph. 42
43 Naming Graphs Each graph you create replaces the previous graph on screen. If you want to keep more than one graph on screen, you need to give them distinct names. Add the name option to any graph command to give a graph a name. (The default name is Graph.) The following two commands will generate the same graphs as before, but the second will not replace the first. In addition to naming each graph, we also used the replace option on each command, so that we can re-run that graph, should we decide to modify it further. graph twoway histogram activity, discrete freq gap(40) xlabel(1(1)3, valuelabel) name(activity, replace) graph twoway (scatter pulse2 pulse1 if group==1,msymbol(d)) (scatter pulse2 pulse1 if group==2, msymbol(o)), legend(order(1 "Run" 2 "No Run")) name(p2vp1, replace) 43
44 String Variables Recall that variables gender and bdate are string: describe gender bdate storage display value variable name type format label variable label gender str1 %9s bdate str10 %10s gender has values M and F. Stata severely limits what you can do with string (non-numeric) variables. You can use them in tabulate, or as by variables, but for most other purposes you need numeric data. For example, in the previous section we made a bar chart of the frequencies of each value of activity, using the command: graph twoway histogram activity, discrete freq We should be able to make a bar chart of the frequencies of each value of gender, using the similar command: graph twoway histogram gender, discrete freq Instead of the expected chart we get an error: varlist: gender: string variable not allowed r(109); String variables must be converted to numeric form in order to use them in most commands, such as the frequency bar chart. The encode command is a convenient way to create a new variable with numeric codes corresponding to the distinct values of a string variable: encode gender, gen(sex) This generates a new numeric variable, sex, and associated labels based on the codes found in gender. The values of sex are integers, assigned alphabetically. Thus sex is assigned value 1 with label F, and 2 with label M. describe shows that gender is string, while sex is numeric, with assigned label sex. 44
45 describe gender sex storage display value variable name type format label variable label gender str1 %9s sex long %8.0g sex To see how the labels were assigned, use label list. Here is the relevant snippet of output: sex: 1 F 2 M You can confirm that sex and gender have the same values with a crosstabulation: tab2 gender sex Now you can use the numeric variable sex to get the bar chart of frequencies. We also add a gap between the bars, and label the integer values, using the assigned labels: graph twoway histogram sex, discrete freq gap(60) xlabel(1 2, valuelabel) Other commands useful for changing from string to numeric values are destring and real. Both of these are used for converting string variables that contain values that look like numbers, to actual numbers. 45
46 Dates The variable bdate was read as a string. If you sort the data by bdate, it will not be sorted properly for dates. For example, 01/11/1987 is before 01/19/1986. sort bdate list bdate in 1/ bdate /06/ /09/ /11/ /19/ /20/ /21/1985 Further, we cannot do any calculations, such as computing age, using bdate. In order to do anything reasonable with bdate, we need to tell Stata to interpret it as a date. When we tell Stata to interpret a string as a date, it stores the information as the number of days since January 1, Here we create a new variable, birthdate, which is a Stata date, and sort the data according to birthdate: generate birthdate=date(bdate, "MDY") format %d day sort birthdate list birthdate birthdate dec feb feb apr apr jun1984 generate uses the date function to creates the new variable, birthdate. The date function is used to interpret strings that look like dates into Stata dates. The MDY in the date function specifies the order in which month day and year 46
47 appear in the string. (If year has only 2 digits you ll need an additional parameter to interpret the string correctly see Help Search date function.) The new variable, birthdate, is should be assigned format %d, in order to display it in human readable form as ddmmmyyyy. Without the %d format it would be displayed as number of days since Jan. 1, We can now drop the original string variable, which serves no further useful purpose: drop bdate Date variables can be used in calculations and in graphs. For example, here we compute each person s age as of January 1, Since Stata dates are measured in days, we divide by to get years, then drop the decimal portion: generate day=date("1/1/2006","mdy") format %d day generate age=trunc((day-birthdate)/365.25) Efficiency Tricks Using the Review and Variables Windows The Review window displays your past commands. If you click on a command in the Review window it is copied to the Command window, so you can edit and run it again. If you double-click on a command in Review, it is immediately executed. The Variables window lists the variables in the Stata dataset. As you type commands, you can click on a variable in the Variable window to copy it to the Command window at the cursor position. Both commands and variable names can be abbreviated to the shortest string that uniquely identifies them. For example, earlier we ran the command: summarize weight This could have been abbreviated (though for the sake of clarity, such extreme abbreviations are not advisable): sum w For variables, it is much better to use the Variable window to quickly generate the full name. 47
48 Using the Do-File Editor: The do-file editor enables you to type a series of Stata commands and submit them all at once. You can save the commands in your do-file, so you can later reproduce, edit or add to your work without having to re-type commands. To open the do-file editor, click on the Do-file Editor icon (it looks like paper and pencil) or select Window Do-file Editor New do-file editor. The do-file editor has basic features of any text editor: cut, copy, paste, undo, open, save and print. Here is the Do-file editor window with commands to o open the minidat2 Stata data file o run summary statistics on pulse1, pulse2, height and weight o tabulate gender by smoke with row percents o compare pulse2 between the run and no-run groups using a t-test o make a scatterplot of pulse1 against pulse2 with the run and no-run groups indicated by different markers. Note the three slashes (///) in the graph command. This is a continuation symbol you can use (in the Do-editor only, not on the Stata command line) if you need to break up a long command to more than one line. To execute all the commands, select Tools Execute (do) If you want to execute only some of the commands in the Do-file editor, select the commands you want, then select Tools Execute (do) You can of course save the contents of the Do-file editor, and Open it to use again in a future session. Note that the Save and Open file menu selections in the Dofile editor window can only be used to save and open do files; the Save and Open file menu selections in the Stata main window only save and open Stata data files. 48
49 Saving the Review Window as a Do-File: Every command you type in the Command window goes to the Review window. In addition to retrieving commands from the Review window for immediate reexecution or editing, you can save the contents of the Review window as a do-file. Select the commands you wish to save from the Review window. Right-click anywhere in the Review window (not on its title bar), and select what to do: Copy - you can then paste the commands into an open Do-file editor window. Send to Do-file Editor - opens a fresh Do-file and puts the commands in it. Save Selected - saves a new do file. Using Log to Print and Save Output: To print the contents of the Results window, select File Print Results Window. If you've selected (a contiguous) section of the Results window, you can choose to print the selection only. If you want to save your Results, you must capture the output in a log file. To start a log file select File Log Begin or click on the Begin Log icon (if looks like a spiral notebook). When the dialog box appears, choose the location, fill in a name and select the type of log file (*.smcl or *.log). o *.scml is a Stata formatted log; it can be printed from Stata, in whole or part, retaining bolds, underlines, and italics as you see them in the Results window. *.scml logs cannot be edited. o *.log is a plain text file; it can be opened for editing or printing in any text editor or word processor. When you begin a log file, all subsequent text output goes to both the Results window to the log file. Graphs do not go to the log, but can be saved using the graph save or graph export commands. To stop collecting output in the log, select 49
50 File Log Suspend (or Close). To print output from the log, open the log in Stata s Viewer. Select File Log View and browse to your log file. Select the portion you want to print. From Stata s main window menu, select File Print View "..." If a portion of the log is highlighted, you can print just that part of the log by choosing "Selection" on the print dialog. You can type a title into the "Header" field, which will print at the top of each page. Uncheck the "Print Logo" box. If you open a file that you are logging to in the Viewer, the Viewer displays a snapshot taken at the time the log is opened for viewing. To update the log in the Viewer, click on the Refresh button at the top of the Viewer window. 50
How to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
4 Other useful features on the course web page. 5 Accessing SAS
1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu
Introduction to SAS on Windows
https://udrive.oit.umass.edu/statdata/sas1.zip Introduction to SAS on Windows for SAS Versions 8 or 9 October 2009 I. Introduction...2 Availability and Cost...2 Hardware and Software Requirements...2 Documentation...2
SPSS Manual for Introductory Applied Statistics: A Variable Approach
SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All
SECTION 2-1: OVERVIEW SECTION 2-2: FREQUENCY DISTRIBUTIONS
SECTION 2-1: OVERVIEW Chapter 2 Describing, Exploring and Comparing Data 19 In this chapter, we will use the capabilities of Excel to help us look more carefully at sets of data. We can do this by re-organizing
WHO STEPS Surveillance Support Materials. STEPS Epi Info Training Guide
STEPS Epi Info Training Guide Department of Chronic Diseases and Health Promotion World Health Organization 20 Avenue Appia, 1211 Geneva 27, Switzerland For further information: www.who.int/chp/steps WHO
Appendix 2.1 Tabular and Graphical Methods Using Excel
Appendix 2.1 Tabular and Graphical Methods Using Excel 1 Appendix 2.1 Tabular and Graphical Methods Using Excel The instructions in this section begin by describing the entry of data into an Excel spreadsheet.
Data exploration with Microsoft Excel: analysing more than one variable
Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical
Appendix III: SPSS Preliminary
Appendix III: SPSS Preliminary SPSS is a statistical software package that provides a number of tools needed for the analytical process planning, data collection, data access and management, analysis,
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce
GETTING YOUR DATA INTO SPSS
GETTING YOUR DATA INTO SPSS UNIVERSITY OF GUELPH LUCIA COSTANZO [email protected] REVISED SEPTEMBER 2011 CONTENTS Getting your Data into SPSS... 0 SPSS availability... 3 Data for SPSS Sessions... 4
SPSS Step-by-Step Tutorial: Part 1
SPSS Step-by-Step Tutorial: Part 1 For SPSS Version 11.5 DataStep Development 2004 Table of Contents 1 SPSS Step-by-Step 5 Introduction 5 Installing the Data 6 Installing files from the Internet 6 Installing
SHORT COURSE ON Stata SESSION ONE Getting Your Feet Wet with Stata
SHORT COURSE ON Stata SESSION ONE Getting Your Feet Wet with Stata Instructor: Cathy Zimmer 962-0516, [email protected] 1) INTRODUCTION a) Who am I? Who are you? b) Overview of Course i) Working with
Introduction to SPSS (version 16) for Windows
Introduction to SPSS (version 16) for Windows Practical workbook Aims and Learning Objectives By the end of this course you will be able to: get data ready for SPSS create and run SPSS programs to do simple
Using SPSS, Chapter 2: Descriptive Statistics
1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,
ECONOMICS 351* -- Stata 10 Tutorial 2. Stata 10 Tutorial 2
Stata 10 Tutorial 2 TOPIC: Introduction to Selected Stata Commands DATA: auto1.dta (the Stata-format data file you created in Stata Tutorial 1) or auto1.raw (the original text-format data file) TASKS:
Data Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
Access Queries (Office 2003)
Access Queries (Office 2003) Technical Support Services Office of Information Technology, West Virginia University OIT Help Desk 293-4444 x 1 oit.wvu.edu/support/training/classmat/db/ Instructor: Kathy
SPSS: Getting Started. For Windows
For Windows Updated: August 2012 Table of Contents Section 1: Overview... 3 1.1 Introduction to SPSS Tutorials... 3 1.2 Introduction to SPSS... 3 1.3 Overview of SPSS for Windows... 3 Section 2: Entering
January 26, 2009 The Faculty Center for Teaching and Learning
THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i
Microsoft Excel 2010 Part 3: Advanced Excel
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES Microsoft Excel 2010 Part 3: Advanced Excel Winter 2015, Version 1.0 Table of Contents Introduction...2 Sorting Data...2 Sorting
Data exploration with Microsoft Excel: univariate analysis
Data exploration with Microsoft Excel: univariate analysis Contents 1 Introduction... 1 2 Exploring a variable s frequency distribution... 2 3 Calculating measures of central tendency... 16 4 Calculating
Introduction Course in SPSS - Evening 1
ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/
Directions for Frequency Tables, Histograms, and Frequency Bar Charts
Directions for Frequency Tables, Histograms, and Frequency Bar Charts Frequency Distribution Quantitative Ungrouped Data Dataset: Frequency_Distributions_Graphs-Quantitative.sav 1. Open the dataset containing
Getting started with the Stata
Getting started with the Stata 1. Begin by going to a Columbia Computer Labs. 2. Getting started Your first Stata session. Begin by starting Stata on your computer. Using a PC: 1. Click on start menu 2.
Using Microsoft Excel to Manage and Analyze Data: Some Tips
Using Microsoft Excel to Manage and Analyze Data: Some Tips Larger, complex data management may require specialized and/or customized database software, and larger or more complex analyses may require
IBM SPSS Statistics for Beginners for Windows
ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning
Scatter Plots with Error Bars
Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each
An introduction to using Microsoft Excel for quantitative data analysis
Contents An introduction to using Microsoft Excel for quantitative data analysis 1 Introduction... 1 2 Why use Excel?... 2 3 Quantitative data analysis tools in Excel... 3 4 Entering your data... 6 5 Preparing
Intro to Excel spreadsheets
Intro to Excel spreadsheets What are the objectives of this document? The objectives of document are: 1. Familiarize you with what a spreadsheet is, how it works, and what its capabilities are; 2. Using
TIPS FOR DOING STATISTICS IN EXCEL
TIPS FOR DOING STATISTICS IN EXCEL Before you begin, make sure that you have the DATA ANALYSIS pack running on your machine. It comes with Excel. Here s how to check if you have it, and what to do if you
Introduction to SPSS 16.0
Introduction to SPSS 16.0 Edited by Emily Blumenthal Center for Social Science Computation and Research 110 Savery Hall University of Washington Seattle, WA 98195 USA (206) 543-8110 November 2010 http://julius.csscr.washington.edu/pdf/spss.pdf
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
Excel 2007 Basic knowledge
Ribbon menu The Ribbon menu system with tabs for various Excel commands. This Ribbon system replaces the traditional menus used with Excel 2003. Above the Ribbon in the upper-left corner is the Microsoft
How To Use Excel With A Calculator
Functions & Data Analysis Tools Academic Computing Services www.ku.edu/acs Abstract: This workshop focuses on the functions and data analysis tools of Microsoft Excel. Topics included are the function
0 Introduction to Data Analysis Using an Excel Spreadsheet
Experiment 0 Introduction to Data Analysis Using an Excel Spreadsheet I. Purpose The purpose of this introductory lab is to teach you a few basic things about how to use an EXCEL 2010 spreadsheet to do
STATGRAPHICS Online. Statistical Analysis and Data Visualization System. Revised 6/21/2012. Copyright 2012 by StatPoint Technologies, Inc.
STATGRAPHICS Online Statistical Analysis and Data Visualization System Revised 6/21/2012 Copyright 2012 by StatPoint Technologies, Inc. All rights reserved. Table of Contents Introduction... 1 Chapter
Data Analysis Stata (version 13)
Data Analysis Stata (version 13) There are many options for learning Stata (www.stata.com). Stata s help facility (accessed by a pull-down menu or by command or by clicking on?) consists of help files
Using Excel for Analyzing Survey Questionnaires Jennifer Leahy
University of Wisconsin-Extension Cooperative Extension Madison, Wisconsin PD &E Program Development & Evaluation Using Excel for Analyzing Survey Questionnaires Jennifer Leahy G3658-14 Introduction You
IBM SPSS Statistics 20 Part 1: Descriptive Statistics
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 1: Descriptive Statistics Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
SPSS (Statistical Package for the Social Sciences)
SPSS (Statistical Package for the Social Sciences) What is SPSS? SPSS stands for Statistical Package for the Social Sciences The SPSS home-page is: www.spss.com 2 What can you do with SPSS? Run Frequencies
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
Using Excel for descriptive statistics
FACT SHEET Using Excel for descriptive statistics Introduction Biologists no longer routinely plot graphs by hand or rely on calculators to carry out difficult and tedious statistical calculations. These
Spreadsheets and Laboratory Data Analysis: Excel 2003 Version (Excel 2007 is only slightly different)
Spreadsheets and Laboratory Data Analysis: Excel 2003 Version (Excel 2007 is only slightly different) Spreadsheets are computer programs that allow the user to enter and manipulate numbers. They are capable
Ohio University Computer Services Center August, 2002 Crystal Reports Introduction Quick Reference Guide
Open Crystal Reports From the Windows Start menu choose Programs and then Crystal Reports. Creating a Blank Report Ohio University Computer Services Center August, 2002 Crystal Reports Introduction Quick
SPSS 12 Data Analysis Basics Linda E. Lucek, Ed.D. [email protected] 815-753-9516
SPSS 12 Data Analysis Basics Linda E. Lucek, Ed.D. [email protected] 815-753-9516 Technical Advisory Group Customer Support Services Northern Illinois University 120 Swen Parson Hall DeKalb, IL 60115 SPSS
Introduction to PASW Statistics 34152-001
Introduction to PASW Statistics 34152-001 V18 02/2010 nm/jdr/mr For more information about SPSS Inc., an IBM Company software products, please visit our Web site at http://www.spss.com or contact: SPSS
Basic Excel Handbook
2 5 2 7 1 1 0 4 3 9 8 1 Basic Excel Handbook Version 3.6 May 6, 2008 Contents Contents... 1 Part I: Background Information...3 About This Handbook... 4 Excel Terminology... 5 Excel Terminology (cont.)...
SAS Analyst for Windows Tutorial
Updated: August 2012 Table of Contents Section 1: Introduction... 3 1.1 About this Document... 3 1.2 Introduction to Version 8 of SAS... 3 Section 2: An Overview of SAS V.8 for Windows... 3 2.1 Navigating
Advanced Excel for Institutional Researchers
Advanced Excel for Institutional Researchers Presented by: Sandra Archer Helen Fu University Analysis and Planning Support University of Central Florida September 22-25, 2012 Agenda Sunday, September 23,
Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering
Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques
Charting LibQUAL+(TM) Data. Jeff Stark Training & Development Services Texas A&M University Libraries Texas A&M University
Charting LibQUAL+(TM) Data Jeff Stark Training & Development Services Texas A&M University Libraries Texas A&M University Revised March 2004 The directions in this handout are written to be used with SPSS
EXST SAS Lab Lab #4: Data input and dataset modifications
EXST SAS Lab Lab #4: Data input and dataset modifications Objectives 1. Import an EXCEL dataset. 2. Infile an external dataset (CSV file) 3. Concatenate two datasets into one 4. The PLOT statement will
Learning SPSS: Data and EDA
Chapter 5 Learning SPSS: Data and EDA An introduction to SPSS with emphasis on EDA. SPSS (now called PASW Statistics, but still referred to in this document as SPSS) is a perfectly adequate tool for entering
The Center for Teaching, Learning, & Technology
The Center for Teaching, Learning, & Technology Instructional Technology Workshops Microsoft Excel 2010 Formulas and Charts Albert Robinson / Delwar Sayeed Faculty and Staff Development Programs Colston
How to Use a Data Spreadsheet: Excel
How to Use a Data Spreadsheet: Excel One does not necessarily have special statistical software to perform statistical analyses. Microsoft Office Excel can be used to run statistical procedures. Although
An Introduction to SPSS. Workshop Session conducted by: Dr. Cyndi Garvan Grace-Anne Jackman
An Introduction to SPSS Workshop Session conducted by: Dr. Cyndi Garvan Grace-Anne Jackman Topics to be Covered Starting and Entering SPSS Main Features of SPSS Entering and Saving Data in SPSS Importing
There are six different windows that can be opened when using SPSS. The following will give a description of each of them.
SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet
Plotting: Customizing the Graph
Plotting: Customizing the Graph Data Plots: General Tips Making a Data Plot Active Within a graph layer, only one data plot can be active. A data plot must be set active before you can use the Data Selector
EXCEL PIVOT TABLE David Geffen School of Medicine, UCLA Dean s Office Oct 2002
EXCEL PIVOT TABLE David Geffen School of Medicine, UCLA Dean s Office Oct 2002 Table of Contents Part I Creating a Pivot Table Excel Database......3 What is a Pivot Table...... 3 Creating Pivot Tables
Can SAS Enterprise Guide do all of that, with no programming required? Yes, it can.
SAS Enterprise Guide for Educational Researchers: Data Import to Publication without Programming AnnMaria De Mars, University of Southern California, Los Angeles, CA ABSTRACT In this workshop, participants
KaleidaGraph Quick Start Guide
KaleidaGraph Quick Start Guide This document is a hands-on guide that walks you through the use of KaleidaGraph. You will probably want to print this guide and then start your exploration of the product.
Microsoft Excel Tips & Tricks
Microsoft Excel Tips & Tricks Collaborative Programs Research & Evaluation TABLE OF CONTENTS Introduction page 2 Useful Functions page 2 Getting Started with Formulas page 2 Nested Formulas page 3 Copying
Describing, Exploring, and Comparing Data
24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter
Databases in Microsoft Access David M. Marcovitz, Ph.D.
Databases in Microsoft Access David M. Marcovitz, Ph.D. Introduction Schools have been using integrated programs, such as Microsoft Works and Claris/AppleWorks, for many years to fulfill word processing,
PivotTable and PivotChart Reports, & Macros in Microsoft Excel
PivotTable and PivotChart Reports, & Macros in Microsoft Excel Theresa A Scott, MS Biostatistician III Department of Biostatistics Vanderbilt University [email protected] Table of Contents 1
Getting Started With SPSS
Getting Started With SPSS To investigate the research questions posed in each section of this site, we ll be using SPSS, an IBM computer software package specifically designed for use in the social sciences.
Using Excel as a Management Reporting Tool with your Minotaur Data. Exercise 1 Customer Item Profitability Reporting Tool for Management
Using Excel as a Management Reporting Tool with your Minotaur Data with Judith Kirkness These instruction sheets will help you learn: 1. How to export reports from Minotaur to Excel (these instructions
Table of Contents TASK 1: DATA ANALYSIS TOOLPAK... 2 TASK 2: HISTOGRAMS... 5 TASK 3: ENTER MIDPOINT FORMULAS... 11
Table of Contents TASK 1: DATA ANALYSIS TOOLPAK... 2 TASK 2: HISTOGRAMS... 5 TASK 3: ENTER MIDPOINT FORMULAS... 11 TASK 4: ADD TOTAL LABEL AND FORMULA FOR FREQUENCY... 12 TASK 5: MODIFICATIONS TO THE HISTOGRAM...
Getting Started with Excel 2008. Table of Contents
Table of Contents Elements of An Excel Document... 2 Resizing and Hiding Columns and Rows... 3 Using Panes to Create Spreadsheet Headers... 3 Using the AutoFill Command... 4 Using AutoFill for Sequences...
IBM SPSS Statistics Basics
October, 2011 IBM SPSS Statistics Basics I. Availability... 2 II. SPSS Statistics Versions... 2 III. Documentation... 2 IV. Creating SPSS Statistics data files... 3 Using the SPSS Statistics Data Editor...
Microsoft Excel Tutorial
Microsoft Excel Tutorial Microsoft Excel spreadsheets are a powerful and easy to use tool to record, plot and analyze experimental data. Excel is commonly used by engineers to tackle sophisticated computations
Importing and Exporting With SPSS for Windows 17 TUT 117
Information Systems Services Importing and Exporting With TUT 117 Version 2.0 (Nov 2009) Contents 1. Introduction... 3 1.1 Aim of this Document... 3 2. Importing Data from Other Sources... 3 2.1 Reading
4. Descriptive Statistics: Measures of Variability and Central Tendency
4. Descriptive Statistics: Measures of Variability and Central Tendency Objectives Calculate descriptive for continuous and categorical data Edit output tables Although measures of central tendency and
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; [email protected], and Dr. J.A. Dobelman
Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; [email protected], and Dr. J.A. Dobelman Statistics lab will be mainly focused on applying what you have learned in class with
Excel Charts & Graphs
MAX 201 Spring 2008 Assignment #6: Charts & Graphs; Modifying Data Due at the beginning of class on March 18 th Introduction This assignment introduces the charting and graphing capabilities of SPSS and
Tutorial 2: Using Excel in Data Analysis
Tutorial 2: Using Excel in Data Analysis This tutorial guide addresses several issues particularly relevant in the context of the level 1 Physics lab sessions at Durham: organising your work sheet neatly,
When to use Excel. When NOT to use Excel 9/24/2014
Analyzing Quantitative Assessment Data with Excel October 2, 2014 Jeremy Penn, Ph.D. Director When to use Excel You want to quickly summarize or analyze your assessment data You want to create basic visual
Excel 2003 A Beginners Guide
Excel 2003 A Beginners Guide Beginner Introduction The aim of this document is to introduce some basic techniques for using Excel to enter data, perform calculations and produce simple charts based on
IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
SPSS for Simple Analysis
STC: SPSS for Simple Analysis1 SPSS for Simple Analysis STC: SPSS for Simple Analysis2 Background Information IBM SPSS Statistics is a software package used for statistical analysis, data management, and
Chapter 2 Introduction to SPSS
Chapter 2 Introduction to SPSS Abstract This chapter introduces several basic SPSS procedures that are used in the analysis of a data set. The chapter explains the structure of SPSS data files, how to
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application
KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
Directions for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
Quick Stata Guide by Liz Foster
by Liz Foster Table of Contents Part 1: 1 describe 1 generate 1 regress 3 scatter 4 sort 5 summarize 5 table 6 tabulate 8 test 10 ttest 11 Part 2: Prefixes and Notes 14 by var: 14 capture 14 use of the
S P S S Statistical Package for the Social Sciences
S P S S Statistical Package for the Social Sciences Data Entry Data Management Basic Descriptive Statistics Jamie Lynn Marincic Leanne Hicks Survey, Statistics, and Psychometrics Core Facility (SSP) July
Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs
Using Excel Jeffrey L. Rummel Emory University Goizueta Business School BBA Seminar Jeffrey L. Rummel BBA Seminar 1 / 54 Excel Calculations of Descriptive Statistics Single Variable Graphs Relationships
Excel 2003: Ringtones Task
Excel 2003: Ringtones Task 1. Open up a blank spreadsheet 2. Save the spreadsheet to your area and call it Ringtones.xls 3. Add the data as shown here, making sure you keep to the cells as shown Make sure
Integrated Company Analysis
Using Integrated Company Analysis Version 2.0 Zacks Investment Research, Inc. 2000 Manual Last Updated: 8/11/00 Contents Overview 3 Introduction...3 Guided Tour 4 Getting Started in ICA...4 Parts of ICA
Formulas & Functions in Microsoft Excel
Formulas & Functions in Microsoft Excel Theresa A Scott, MS Biostatistician III Department of Biostatistics Vanderbilt University [email protected] Table of Contents 1 Introduction 1 1.1 Using
Introduction to IBM SPSS Statistics
CONTENTS Arizona State University College of Health Solutions College of Nursing and Health Innovation Introduction to IBM SPSS Statistics Edward A. Greenberg, PhD Director, Data Lab PAGE About This Document
Licensed to: CengageBrain User
This is an electronic version of the print textbook. Due to electronic rights restrictions, some third party content may be suppressed. Editorial review has deemed that any suppressed content does not
INTRODUCTION TO EXCEL
INTRODUCTION TO EXCEL 1 INTRODUCTION Anyone who has used a computer for more than just playing games will be aware of spreadsheets A spreadsheet is a versatile computer program (package) that enables you
TI-Inspire manual 1. Instructions. Ti-Inspire for statistics. General Introduction
TI-Inspire manual 1 General Introduction Instructions Ti-Inspire for statistics TI-Inspire manual 2 TI-Inspire manual 3 Press the On, Off button to go to Home page TI-Inspire manual 4 Use the to navigate
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
Using MS Excel to Analyze Data: A Tutorial
Using MS Excel to Analyze Data: A Tutorial Various data analysis tools are available and some of them are free. Because using data to improve assessment and instruction primarily involves descriptive and
INTRODUCTION TO SPSS FOR WINDOWS Version 19.0
INTRODUCTION TO SPSS FOR WINDOWS Version 19.0 Winter 2012 Contents Purpose of handout & Compatibility between different versions of SPSS.. 1 SPSS window & menus 1 Getting data into SPSS & Editing data..
