Stata Tutorial
Econometrics 676, Spring 2008

1 Introduction

1.1 Stata Windows and toolbar

Review:     past commands appear here
Variables:  variables list
Command:    where you type your commands
Results:    results are displayed here

1.1.1 Toolbar

log:             start / stop a log file
viewer:          open viewer window
results:         open results window
graph:           open graph window
do-file editor:  open do-file editor
data editor
data browser
more:            continue when paused in long output
break:           stop the current task
1.2 Stata File management

1.2.1 FILE EXTENSIONS

Data file   filename.dta
Do file     filename.do     (program file)
Log file    filename.smcl   (only readable in Stata)
            filename.log    (text file)

1.2.2 WORKING DIRECTORY

The working directory is displayed at the bottom left-hand corner of the window. To change your working directory, use the cd command:

    cd c:\dir

To open a file:

    use filename, clear
    use varlist using filename, clear    for a subset of the data file

To save a data file:

    save, replace             overwrites current file
    save filename, replace    saves file as filename; replace is optional, but necessary if a file of that name already exists

    save "c:\example1.dta", replace

1.2.3 MEMORY

STATA opens with a default memory of 1m. In some cases you may get the message

    no room to add more observations

This is because not enough memory has been assigned to STATA. To change the memory assigned to STATA:

    set mem #m

    set mem 16m

1.2.4 LOG FILES

All output appearing in the Results window can be captured in a log file. The log file can be saved in STATA format (SMCL) or as a text file. By default, logs are written in SMCL (Stata Markup and Control Language).
To translate a log file created in SMCL to text, go to File > Log > Translate.

To start a log:

    log using filename             starts an smcl log
    log using filename, replace    overwrites filename.smcl
    log using filename.log         starts a text log

To pause and resume a log:

    log off    temporarily suspends log file
    log on     resumes log file

These commands can be useful to create a log that contains only results and not intermediate programming.

To close a log:

    log close    closes current log file

You can add comments to your log as you work by entering any comments in the command line (or in your do-file) preceded by a *

    *unemployment rate

1.2.5 CONTROLLING OUTPUT

--more-- may appear in your Results window when you are trying to output a long listing.

To see the next line: press Enter
To see the next screen: press any key or click on --more--

To interrupt a STATA command at any time, use the Break button.

2 Manipulating Data

2.1 Descriptive Commands

2.1.1 Describe

There are various ways of examining a dataset in Stata, including describe, list, and summarize.

describe    produces a summary of the contents of a dataset

    d                     describes dataset in current memory
    d using filename      describes a stored STATA format dataset
    d varlist             describes a subset of a dataset
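The logging and describe commands above can be tried together in one short session. The following is only a sketch under stated assumptions: it uses the auto example dataset that ships with Stata (loaded with sysuse), and the file names session1.log and auto_copy are hypothetical names chosen for illustration, not files from this tutorial.

```stata
* Close any log left open earlier (capture suppresses the error if none is open)
capture log close

* Start a text log; session1.log is a hypothetical file name
log using session1.log, replace

* Load the auto example dataset shipped with Stata (an assumption, not tutorial data)
sysuse auto, clear

*describe the dataset currently in memory
describe

* Describe only a subset of the variables
describe price mpg weight

* Save a copy in the working directory, then describe the stored file
save auto_copy, replace
describe using auto_copy

* Close the log
log close
```

Running capture log close first is a common precaution, since log using stops with an error when a log is already open.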
2.1.2 Summarise

summarize    calculates and displays a variety of univariate summary statistics

    su               summarise whole dataset
    su varlist       summarise subset varlist
    su varlist, d    summarise with the detail option

2.1.3 List

list    most detailed of the commonly used descriptive commands

    l    displays the values of variables

2.1.4 Graph

twoway plot_type var1 var2    draws scatterplots, line plots, etc.

Plot type: scatter, line, connected

    twoway scatter lincome lincomea, t1("graph 1")

2.1.5 SORT and BY Commands

sort varlist     arranges the observations of the current data into ascending order of the values of the variables in varlist
by varlist:      causes the command that follows to be repeated for each unique set of values of the variables in varlist

    sort region
    by region: su income

(Note) Data must be sorted by varlist before you use the by command.

2.1.6 Cross Tabulation

tabulate    produces one- and two-way tables of frequency counts

    tab var1 var2 [weight] [if exp]

2.2 Creating New Variables

generate can create a new variable that is an algebraic expression of other variables.

    gen newvar = expression
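As a small illustration of generate, the sketch below builds two new variables from existing ones and then checks them with the descriptive commands from section 2.1. It assumes the auto example dataset that ships with Stata; the names lprice and weight_tons are made up for this example.

```stata
* Load the auto example dataset shipped with Stata (an assumption, not tutorial data)
sysuse auto, clear

* New variable as an algebraic expression of an existing variable
gen lprice = ln(price)

* weight is recorded in pounds in this dataset; convert to short tons
gen weight_tons = weight / 2000

* Check the new variables against the originals
su price lprice weight weight_tons

* Repeat the summary for each value of foreign (data must be sorted first)
sort foreign
by foreign: su lprice
```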
To change the contents of an existing variable you must use the replace command.

    replace oldvar = expression

    gen agerange = . if age<16
    replace agerange = 1 if 16<=age & age<25
    replace agerange = 2 if 25<=age & age<45

    gen age16 = 0
    replace age16 = 1 if age==16

Values for a string variable are denoted by inverted commas. (The new string variable needs a name that is not already in use, so it is called agegroup here rather than age.)

    gen agegroup = "young" if agerange==1
    replace agegroup = "" if agerange~=1

The default code for a missing value in STATA is a single period (.), or a blank in the case of a string.

    replace var = . if var == 99
    replace string = "" if string == "not answered"

3 Linear Regression

regress y X [, noconstant robust]    estimates a linear regression $y = c + X\beta + \varepsilon$
predict [type] newvar [, predict_type]    calculates predictions, residuals and statistics after estimation

predict_type
    xb      linear prediction (default)
    res     residual
    stdp    standard error of prediction

    reg lwage exper age kids
    reg lwage exper age kids, robust     (Robust Estimation)
    reg lwage exper age kids, nocons     (No constant term)
    predict u, res                       (save u=residual)
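The regress and predict steps above can be tried end to end on data that every Stata installation has. This is a minimal sketch, assuming the auto example dataset rather than the wage data used in the examples above; the variable names price_hat and u are chosen for illustration.

```stata
* Load the auto example dataset shipped with Stata (an assumption, not tutorial data)
sysuse auto, clear

* Linear regression of price on weight and mpg
reg price weight mpg

* Same model with robust standard errors; the coefficients are unchanged
reg price weight mpg, robust

* Linear prediction (the default) and residuals from the last estimation
predict price_hat, xb
predict u, res

* With a constant in the model, the residuals should average roughly zero
su u
```

predict always works from the most recent estimation, so it should be run before fitting another model.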
3.1 Ivreg

Instrumental variables (two-stage least-squares) regression

Syntax

    ivreg depvar [varlist1] (varlist2 = varlist_iv) [if] [in] [weight] [, options]

    ivreg y1 x1 x2 (y2 y3 = z1 z2 z3)
    ivreg y1 x1 x2 (y2 = z1 z2 z3) x3
    test x3=5
    test y2=x3-x2
    predict y1hat

3.2 Hypothesis Testing

    ttest var       t tests
    test varlist    F tests

    ttest income=180, level(99)
    ttest group1_income=group2_income
    ttest income, by(male)
    test exper age kids

4 Stata Resources

Stata Textbooks
Econometric Analysis of Cross Section and Panel Data by Jeffrey M. Wooldridge
http://www.ats.ucla.edu/stat/stata/examples/eacspd/

UCLA Stata Starter Kit
http://www.ats.ucla.edu/stat/stata/