Using Big Datasets in Stata




Josh Klugman and Min Gong
Department of Sociology, Indiana University

On the sociology LAN, Stata is configured to use 12 megabytes (MB) of memory, which means that you can load only datasets that are no more than 12 MB in size.[1] There are two ways to get around this limit if you are working with a large dataset: increase Stata's memory allocation, or use DBMS/Copy to make a smaller dataset. Which method you use depends on the size of the dataset; the larger the dataset, the slower Stata will run. For Stata SE, we recommend setting Stata's memory (with set mem) to a value larger than the size of your dataset, so that you won't have problems opening the dataset or running analyses. If your dataset is very large, you may want to consider making it smaller with DBMS/Copy instead.

Increasing Stata's Memory Usage

The easiest way to use a large dataset is to increase Stata's memory allocation. This is done with the set mem command:

    set mem #

In this command, # is the number of kilobytes (KB) of memory that you want Stata to use. Choose a number somewhat larger than the size of the dataset you want to use, because you may need to add variables during your analyses, which increases the size of the dataset. For example, if you want to use a 30 MB dataset, tell Stata to use 35 MB:

    set mem 35000

You can also give the amount in megabytes: divide the number of kilobytes by 1,000 and type a lower-case m right after the result:

    set mem 35m

Using DBMS/Copy

DBMS/Copy is a useful program that lets you copy datasets and translate them from one statistical package's format to another. To create a smaller dataset that Stata can use, you can have DBMS/Copy copy the dataset while deleting some of its variables or cases. Here is how DBMS/Copy works:

Step 1: Start DBMS/Copy. You can access the program through the Start menu, under Programs/LAN Statistics. Choose DBMS/Copy 8.
As far as we know, it doesn't matter whether you choose DBMS/COPY V8 (with DBMSAnalyst) or just DBMS/COPY V8.[2] You should then see the program's menu choices. Click the Interactive button. A window should pop up in which you choose the input database.

[1] You can tell how large a file is by clicking on it in a Windows folder window; Windows will show its size in kilobytes (KB) or megabytes (MB). A megabyte is approximately 1,000 KB (e.g., 12,000 KB = 12 MB).
[2] The screenshots for this guide were taken with DBMS/Copy V7; we now have DBMS/Copy V8.

Since you're working with Stata datasets, make sure the program is looking for Stata SE datasets. If you want to convert SPSS or SAS datasets, look for SPSS 12.0 for Windows or SAS 9.1 for Windows. (We use the Eurobarometer 47.1 dataset as an example.) Another window then pops up; ignore it and press the Done button.

Then the next window pops up.

Step 2: Next we show you how to delete cases or variables. (1) To select cases, press Record Filter & Equations; (2) to delete variables, press Variable Information.

(1) Selecting cases: In our example, we tell DBMS/Copy to select cases where the respondent is from Belgium, West Germany, East Germany, or Austria. This means that all cases that are not selected are deleted. You can either double-click select in the Goodies box or type select variable=value in the last box. The program's syntax is simple, and you can also drop cases instead of selecting them. For more help, see DBMS/Copy's help file on equations (Y:\DBMSCOPY 8\program files\dataflux\ dfpower Tools\8.0.hlp). When you are done typing your equation, click OK.

(2) Keeping or deleting variables: Pressing Variable Information opens a window in which you specify which variables to keep and which to delete. The Name column gives the variable name; in the Rename column you can type a new name for the variable in the output dataset; in the Keep column you indicate which variables are to be retained in the output dataset; and the Label column gives the variable labels. Whether you choose Keep Checked or Drop Checked depends on whether you want to work with a small fraction (Keep Checked) or a large fraction (Drop Checked) of the variables in the dataset. Both have the same effect: they trim the output dataset. In our example, we tell DBMS/Copy to keep the variables q4901-q4917, a series of statements about immigrants in Western European societies; you can tell this because those variables are checked.

Alternatively, you can drop variables from the output dataset. In that case, choose Drop Checked with the third button from the left at the top (which currently says Keep Checked). The Keep column then becomes a Drop column, and any variables you check are eliminated from the output dataset.
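The two trims in Step 2 amount to a row filter and a column filter. As a rough illustration only (this is not DBMS/Copy's own syntax), here is a minimal Python sketch on made-up toy rows; the `country` variable, its string values, the short q4901-q4903 battery standing in for q4901-q4917, and all helper names are hypothetical. It shows that Keep Checked and Drop Checked give the same output when the two checked lists are complements, and it reads a range such as q4901-q4917 the way a Stata varlist does: as consecutive variables in dataset order.

```python
"""Sketch of DBMS/Copy-style trimming: select cases, keep or drop variables.

Illustration only; not DBMS/Copy's syntax. Variable names, values and
helper names are hypothetical.
"""
from typing import Dict, Iterable, List

Row = Dict[str, object]


def var_range(columns: List[str], first: str, last: str) -> List[str]:
    """Variables from `first` to `last` in dataset order, as Stata reads q4901-q4917."""
    start, end = columns.index(first), columns.index(last)
    return columns[start : end + 1]


def select_cases(rows: List[Row], var: str, values: Iterable[object]) -> List[Row]:
    """Keep rows whose `var` is one of `values`; every other row is deleted."""
    wanted = set(values)
    return [row for row in rows if row[var] in wanted]


def drop_cases(rows: List[Row], var: str, values: Iterable[object]) -> List[Row]:
    """The complementary filter: delete rows whose `var` is one of `values`."""
    unwanted = set(values)
    return [row for row in rows if row[var] not in unwanted]


def keep_vars(rows: List[Row], keep: List[str]) -> List[Row]:
    """'Keep Checked': the output contains only the checked variables."""
    return [{name: row[name] for name in keep} for row in rows]


def drop_vars(rows: List[Row], drop: List[str]) -> List[Row]:
    """'Drop Checked': the output contains every variable except the checked ones."""
    excluded = set(drop)
    return [{name: val for name, val in row.items() if name not in excluded} for row in rows]


# Made-up toy data standing in for a survey file.
columns = ["id", "country", "q4901", "q4902", "q4903", "age"]
rows: List[Row] = [
    {"id": 1, "country": "Belgium", "q4901": 2, "q4902": 3, "q4903": 1, "age": 34},
    {"id": 2, "country": "France", "q4901": 1, "q4902": 1, "q4903": 4, "age": 51},
    {"id": 3, "country": "Austria", "q4901": 4, "q4902": 2, "q4903": 2, "age": 27},
]

# Step 2 (1): select the four countries; all other cases are deleted.
subset = select_cases(rows, "country", ["Belgium", "West Germany", "East Germany", "Austria"])

# Step 2 (2): keep the id plus the item battery, or drop everything else.
wanted = ["id"] + var_range(columns, "q4901", "q4903")
kept = keep_vars(subset, wanted)
dropped = drop_vars(subset, [name for name in columns if name not in wanted])

assert kept == dropped  # Keep Checked and Drop Checked trim to the same output
print(kept)
```

Choosing Keep Checked when you want only a few variables simply keeps the checked list short; the trimmed output is the same either way.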

After you have specified which variables should be kept or dropped, click the OK button.

Step 3: After selecting the variables, you are back at the Power Panel shown earlier. Click the OK button. In the next window you specify what your output dataset is going to be. Make sure that you specify Stata SE dataset (you can also select SPSS 12.0 for Windows or SAS 9.1 for Windows if you prefer to use one of those packages). When you have finished, click the Save button.

You will then be taken to a window that shows DBMS/Copy's syntax for all of your specifications. You can save this program by pushing the Save Program button; doing so is a good way to keep track of your data manipulations. For now, click the Do-It! button, and DBMS/Copy will process your commands. When it's done, you're ready to open the Stata, SPSS, or SAS file that contains the variables you want. Congratulations!
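Whichever route you take, the decision comes back to the size check at the start of this guide: if the (possibly trimmed) file is still larger than the 12 MB default, raise Stata's memory with set mem before opening it. Below is a minimal Python sketch of that arithmetic. It uses the guide's approximation of 1 MB as roughly 1,000 KB (Windows actually counts 1 MB as 1,024 KB); the 15% headroom default is an assumption chosen so that a 30 MB file reproduces the guide's 35 MB example, and the function names are hypothetical.

```python
"""Sketch of the memory arithmetic behind `set mem` (hypothetical helpers).

Follows the guide's approximation of 1 MB ~ 1,000 KB; the headroom
percentage is an assumption, not a Stata rule.
"""

DEFAULT_MEM_KB = 12_000  # 12 MB, the sociology LAN default described in the guide


def kb_to_mb(size_kb: int) -> float:
    """Convert a file size shown by Windows in KB to MB (1 MB ~ 1,000 KB)."""
    return size_kb / 1000


def fits_in_memory(file_kb: int, mem_kb: int = DEFAULT_MEM_KB) -> bool:
    """True if a dataset of `file_kb` KB is no larger than the memory setting."""
    return file_kb <= mem_kb


def set_mem_command(file_kb: int, headroom_pct: int = 15, in_mb: bool = False) -> str:
    """Build a `set mem` command with room for variables added later.

    The file size is inflated by `headroom_pct` percent and rounded up
    to a whole number of megabytes, using integer arithmetic only.
    """
    scaled = file_kb * (100 + headroom_pct)  # size in hundredths of a KB
    whole_mb = -(-scaled // 100_000)  # ceiling division: 100,000 hundredths of a KB = 1 MB
    if in_mb:
        return f"set mem {whole_mb}m"  # megabyte form, e.g. set mem 35m
    return f"set mem {whole_mb * 1000}"  # a plain number means kilobytes


# A 30 MB (30,000 KB) dataset does not fit in the 12 MB default ...
example_kb = 30_000
print(kb_to_mb(example_kb), fits_in_memory(example_kb))  # prints: 30.0 False
# ... so raise Stata's memory, leaving some headroom.
print(set_mem_command(example_kb))  # prints: set mem 35000
print(set_mem_command(example_kb, in_mb=True))  # prints: set mem 35m
```

Rounding up to whole megabytes keeps the command tidy; a larger margin is reasonable if you expect to generate many new variables during your analyses.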