Guide t Stata Ecn B003 Applied Ecnmics T cnfigure Stata in yur accunt: Lgin t WTS (use yur Cluster WTS passwrd) Duble-click in the flder Applicatins and Resurces Duble-click in the flder Unix Applicatins Duble-click in Setup Exceed (type any key) Duble-click in the Stata icn (use yur main UCL passwrd, i.e. e-mail passwrd) NOTE: Stata in the WTS envirnment directly sets up yur persnal path and there is n need t specify it fr any f the files that will be pened r saved during this sessin (yu can see the current wrking path that Stata is using in the bttm-left crner f the Windw) 1. OPENING DATA SETS, LOG-FILES, DO-FILES T pen a lg-file lg using [path fr the file]/class1.lg, replace This cmmand pens a file with extensin.lg, where it keeps recrds f what yu have dne and the results f the cmmands that yu have executed. These files can be pened later n with any editr (MS Wrd, fr example) Befre pening any lg-file yu will need t clse any ther lg-file that yu may have pened. lg ff lg n lg clse lg using [path fr the file]/class1.lg, append T pen a Stata data set file:.dta use [path fr the file]/incme.dta T save a Stata data set file nce it has been mdified save [path fr the file]/incme.dta, replace This saves the changes that have been made in the riginal data set. If yu want t keep the riginal data set, yu must type a different name fr the data set (e.g. incme2.dta). The extensin.dta is assumed by Stata. T clear the memry 1
clear If the data set has nt been saved befre ding clear, the changes will be lst and nly the changes dne befre saving the data set will be saved. D-files Smetimes yu will find it useful t write all the cmmands in a text prcessr and then ask Stata t execute all the cmmands that yu have written in this file. Yu can d this using d-files. An advantage f these files is that yu can use the same file as many times as yu want and yu d nt need t write the cmmands in the Cmmand windw every time yu want t execute them. In additin, yu can add cmment lines t the file t describe what yu are ding in each step. In rder t insert a cmment, put /* at the beginning f the cmment and */ after the cmment. Stata will ignre what it is written between these tw symbls. Example: (in a text prcessr -i.e. MSWrd r Ntepad- we write) /*d-file fr the first tutrial*/ lg using class1.dta, replace use incme.dta describe summarize drp if inc==. /*we drp thse bservatins whse incme is missing*/ save incme2.dta /*saving with different name*/ clear lg clse After saving this file with the extensin.d (e.g. class1.d) we write the fllwing cmmand in Stata: d path fr the file\class1.d 2. GENERAL SYNTAX OF STATA COMMANDS [by varlist :] cmmand [varlist] [=exp] [if exp] [in range] [, ptins] The expressin by, if and in are ptinal f all the Stata cmmands. by varlist: This ptin tells Stata that it has t execute the cmmand separately fr each set f bservatins that share the same value fr the variables specified in varlist. Nte: in rder t use the by ptin yu first need t srt the bservatins f the data set with respect t that variable in varlist. This can be dne in Stata by typing: srt [varname] in range 2
This ptin tells Stata the range f the bservatins ver which we want t apply the cmmand. list [variable name] in 5 lists the 5th bservatin f the variable list [variable name] in 5/10 list [variable name] in 3 list [variable name] in 5/l lists frm 5th t 10th bservatin lists the third-frm-the-last lists frm 5th t last bservatin if expressin After if a lgic expressin shuld be added in rder t specify the cnditins that we want the bservatins t satisfy in rder t execute the Stata cmmand The lgic peratrs that can be used t write the lgic cnditins are the fllwing: == equal (nte, 2 symbls =) ~= different!= different & and r >= (<=) greater (less) r equal than > (<) greater (less) than! nt Nte: In Stata missing values are represented with a dt (.), in case we want t execute sme cmmands nly fr the bservatins which have missing values fr a specific variable. Examples: list if age==30 list if age==30&inc~=. list if age=<30 age>40 3. DATA MANAGEMENT, GENERATION OF NEW VARIABLES T describe the variables in the current data set describe This cmmand will let yu find ut what variables are in the data set and hw they are named, with their descriptin, the labels f the variables, the number f variables and bservatins. 3
It can be als used t describe nly specific variables. Fr example: describe inc T label variables label variable [variable name] cmment This cmmand allws yu t dcument yur data set s that yu can make cmments n the variables and give a shrt descriptin (at mst 31 characters lng) f the variable. T label the values f the variables Fr example, in the data set f ur example, the variable eth takes nly tw pssible values 0 and 1, but this variable is nt really a numerical variable. It is useful t give a variable value 0 whenever the bservatin in Nn-white and 1 whenever it is White. Stata can remember which categry crrespnds t each number by executing the fllwing cmmands: label define ethlabel 0 Nn-white 1 White label values eth ethlabel T list the values f the variables label list This cmmand will tell yu which values have been assigned t each f the categries f a categrical variable. T summarize the variables summarize This cmmand gives a series f summary statistics f all the data in memry. In particular, it gives yu the number f bservatins, means, variances and ranges f all the laded variables. In rder t get mre detailed statistics, such as the percentiles f each variable and higher mments (skewness and kurtsis) the fllwing cmmand shuld be used. summarize, detail Yu can als btain the summary statistics fr nly a list f variables summarize inc eth sum inc eth T list the bservatins list [name f variable] [if expressin] [in range] 4
Examples: list (this cmmand lists all the bservatins f all the variables in the data set) list inc (this cmmand lists all the bservatins f the variable incme) T drp and keep bservatins/variables keep [name f variable] [if expressin] [in range] drp [name f variable] [if expressin] [in range] T generate new variables and replace existing nes generate [new name f variable]=expressin if exp in exp This cmmand allws ne t generate a new variable frm the variables that already exist and yu can als specify the cnditins with if and the range with in. replace [name f variable]=expressin if exp in exp This cmmand allws ne t mdify a variable that already exists in memry. Examples: generate lninc=lg(inc) generate d=1 if inc>1000 replace d=0 if d==. The same result culd be btained if we use the fllwing cmmand in Stata: generate d=(inc>1000) (this cmmand generates a variable d that takes value 1 if the expressin inside the brackets is true and 0 if it is false) 4. HISTOGRAMS, PLOTS AND GRAPHS T draw a histgram graph [varname], histgram bin(#) # is the number if intervals we want t specify (#=5 is the default) Nte that we wuld have btained the same by typing graph [varname], bin(#) 5
because when we nly have ne variable Stata graphs a histgram as a default. If yu include nrmal at the end f the cmmand graph [varname], histgram bin(#) nrmal a line fr the nrmal distributin appears s that yu can cmpare whether yur distributin lks like a nrmal r it is very different. Examples: graph inc, histgram bin(25) srt eth graph inc, by(eth) histgram bin(25) (this graphs tw histgrams. One fr each pssible values f eth) Scatter plt f tw variables plt var1 var2, where var1 is the y-axis variable and var2 is the x-axis variable. Otherwise, yu can use the fllwing cmmand will prduce a better quality graph: graph var1 var2 5. TABLES OF FREQUENCIES AND CORRELATION T tabulate variables: ne-way frequency tables This cmmand allws ne t create tables shwing the frequency f each pssible value f the variable that has been specified in the cmmand. tabulate [varname] if[] in[] T tabulate variables: tw-way frequency tables This cmmand allws ne t create tables shwing the frequency fr all the pssible values f tw variables simultaneusly. tabulate var1 var2 if[] in[] Example: tabulate d eth tabulate d eth, clumn tabulate d eth, rw tabulate d eth if inc~=., clumn 6
Crrelatin and cvariances between tw variables crrelate (this cmmand cmputes the crrelatin cefficient between all the pssible pairs f variables in memry) crrelate var1 var2 (this cmmand cmputes the crrelatin cefficient between the tw variables specified) crrelate var1 var2, cvariance (cmputes the cvariance between the tw variables instead f the crrelatin cefficient) 6. REGRESSION ANALYSIS Yu can d the regressin analysis by regress dep_var x1 x2 x3 This cmmand executes the regressin f the dependent variable (dep_var) n the independent variables r regressrs (x1,x2,x3..). Thus, the first variable in the list f variables is the dependent variable, and then we write the regressrs. Stata autmatically adds a cnstant. The ptin,ncnstant has t be added t the cmmand as fllws if we d nt want t include a cnstant in the regressin: regress dep_var x1 x2 x3, ncnstant Stata autmatically shws many statistics relating t the regressin (number f bservatins, R 2, t-statistic fr each cefficient f the regressin) Statistical Tests fr the values f the cefficients in the regressin: After regressing a particular mdel with the cmmand regress Individual tests: test name f the variable=value (this cmmand tests the null hypthesis that the cefficient f the variable specified is statistically equal t the value) Jint tests: test var1=value1 test var2=value2, accumulate (this cmmand tests the jint null hypthesis that the cefficient f the variable var1 is equal t the value1 and that the cefficient f var2 is equal t the value2) 7
8