Accessing IPUMS Decennial, ACS, and March CPS Microdata Using PDQ-Explore

afa Census Data Boot Camp October 28, 2009

To install the PDQ-Explore software and related files on a Windows-based PC, copy the PDQ folder on the USB Flash Drive to C:\Program Files. The PC should then have a C:\Program Files\PDQ folder that contains all of the files and folders that are in the PDQ folder on the Flash Drive. Create a desktop shortcut to the explore.exe file in the C:\Program Files\PDQ folder.

The explore.ini file in the C:\Program Files\PDQ folder has two lines that reference the Flash Drive as Drive E: for use on the laboratory machines. These lines need to be edited to reference the C:\Program Files\PDQ folder rather than E:\PDQ. Use Word, Notepad, edit, or another text editor to remove the comment marker (#) ahead of the two c:\program Files\pdq lines in the file and insert a comment marker ahead of each of the two e:\pdq lines. Then save the file back to its original location in C:\Program Files\PDQ. After editing, the affected section of the file should look like this:

    ...
    [pdqexplore]
    server=pdq.psc.isr.umich.edu
    # startupdirectory=e:\pdq
    startupdirectory=c:\program Files\pdq
    # initialworkspacedirectory=e:\pdq
    initialworkspacedirectory=c:\program Files\pdq
    showsetuponresult=1

To start PDQ-Explore, double-click on the PDQ-Explore icon on the desktop to open the interface. The server should point to:

    pdq.psc.isr.umich.edu

No Subscriber Name or Password needs to be entered to access the PDQ server.

The PDQ-Explore client runs on Windows-based PCs. It requires an active connection to the Internet. Note that some highly secure firewalls may block access to our servers. Queries are constructed using the GUI and then executed by sending the query to one of our Ann Arbor servers. The server for the Census Data Boot Camp is:

    pdq.psc.isr.umich.edu
Continue with the PDQ-Explore startup sequence and then select the default Open an Existing Workspace option in the Welcome to PDQ-Explore window. Then select BootCamp from the list of workspaces presented. The BootCamp workspace contains the saved PDQ-Explore examples from the various seeded scenarios that have been presented.

PDQ-Explore offers easy, intuitive, and fast interactive access to a variety of large microdata files. These include recent Census Public Use Microdata Samples (PUMS) and concatenated versions of the University of Minnesota Population Center's Integrated Public Use Microdata Series (IPUMS) decennial census, American Community Survey (ACS), and March Current Population Survey (CPS) files. These latter files allow tabulations and summary statistics to be run interactively on census microdata covering decennial censuses from 1850 through 2000 along with the 2001-2007 ACS microdata and 1962-2009 March CPS data. The larger of these data sets includes more than 145 million housing and person records spanning 1850 through 2007. Late in 2009, the 2008 ACS data will be added to the concatenated file of 1850 to 2007 data.

To access the latest IPUMS files, contact afa@pdq.com and request the updated codebooks for those files. These can be emailed to you as attachments that should be copied to the same folder/directory where PDQ-Explore is installed (typically C:\Program Files\PDQ). They can then be added to an existing or new workspace as described in the help notes.

The Quick Start and a number of documentation files accessible from within the GUI will help you become familiar with PDQ-Explore. Queries are set up by entering the names of data items, variables, and expressions in the Row, Column, Foreach, and 4 dimensions of the desired tabulation or summary statistics or quantiles. The attached pages on Filling in the Blanks describe the structure and syntax for the entries.
PDQ-Explore offers convenient access to basic information about the data items, variables, and their codes for each data set. Complete documentation for the IPUMS data sets is available at http://usa.ipums.org/usa/ and http://cps.ipums.org/cps.

PDQ-Explore has powerful recoding capabilities. These permit complex operations, such as matching husbands to wives or children to parents, to be executed efficiently. Custom item files for the IPUMS and PUMS data are included in the PDQ folders (ipacs08.pdqcustomitems and ipcps09.pdqcustomitems, for example). The items in those files can serve as examples and as starting points for creating additional recodes and data transformations.

Please contact Albert Anderson (afa@pdq.com) with any questions or problems related to the use of PDQ-Explore and the data.
The client interface and access to the PDQ servers are free for not-for-profit use. For commercial use, please contact Albert Anderson at Public Data Queries, Inc.

Sources and some of the data sets available follow. The attached pages introduce the use of expressions with PDQ-Explore.

Sources:

    Census Bureau:               www.census.gov
    IPUMS:                       www.ipums.org
                                 usa.ipums.org/usa
                                 cps.ipums.org/cps
    Public Data Queries, Inc.:   www.pdq.com
                                 www.pdq.com/products/download

Selected Data:

    acs_2006-2007                (acs_2006, for example)
    acs20057                     (2005-2007 Three-Year ACS Data Set)
    pums_1980-2000_1 and 5 pct   (pums_2000_5pct, for example)
    ipacs08                      (Concatenated 1850-2007 Decennial and ACS Data)
    ipcps09                      (Concatenated 1962-2009 March CPS Data)

See the ipums.org website for documentation on the variables in the ipacs08 and ipcps09 data sets. The PDQ-Explore names for the IPUMS variables and items conform in general to the IPUMS names for the variables. The PDQ-Explore interface is not case-sensitive. Item, variable, and custom item names may be entered in upper, lower, or mixed case.

Please cite the IPUMS project at the University of Minnesota Population Center and PDQ-Explore at the University of Michigan Population Studies Center as the sources for analyses and reports that use these data.
Filling in the Blanks: Notes on Expressions

PDQ-Explore 10/28/09

Entries may be typed directly into the expression boxes for the specification of universe or selection criteria, row, column, and for dimensions, weights, and entries for summary statistics and quantiles. Items may be dragged from the workspace window to the expression boxes and dropped, or hot keys may be used to enter highlighted entries in the workspace window for row (1 or r), column (2 or c), and for (3 or f). Items may also be selected from the drop-down lists at the right of the expression entry boxes.

The entries in the PDQ-Explore setups are often simply the names of individual items such as sex, race, or age. However, items are just the simplest form of the general expressions that can be entered. An expression typically is made up of one or more item names linked by arithmetic and/or logical operations: plus (+), minus (-), divide (/), equal (=), and (&), or (|), etc. The full list of PDQ-Explore operations is given in the table below along with the level of precedence for each. The illustrations in this document are based on the Public Use Microdata Samples (PUMS) from the 2000 Census.

Expressions may be simple or complex. A simple expression may be used to collapse age to a more manageable number of categories, for example:

    age/10

or wage and salary income to $1,000 intervals:

    incws/1000

Complex expressions may be used to allow a characteristic of a married person to be related to a characteristic of the spouse or a characteristic of a child to be related to a characteristic of a father or mother. See the recodes under the heading "PDQ Custom Items" for the 2000 5% PUMS in the PDQ-Explore Workspace window to see a variety of examples of expressions along with the assignment of identifying names to the new categories generated by the expressions.
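The effect of the simple collapsing expressions above (age/10 and incws/1000) can be mimicked outside PDQ-Explore. The following Python lines are only a sketch and are not PDQ-Explore syntax. The helper name collapse is hypothetical, and the sketch assumes that the quotient is truncated to an integer when it is used as a tabulation category, which is how these notes describe the handling of floating-point index values.

```python
def collapse(value, width):
    """Collapse a numeric code into intervals of the given width.

    Assumption: the quotient is truncated toward zero when it becomes a
    tabulation category, so ages 20 through 29 all land in category 2.
    """
    return int(value / width)


# Ages collapsed to ten-year groups, as with the expression age/10.
ages = [7, 19, 20, 29, 64, 65]
print([collapse(a, 10) for a in ages])      # [0, 1, 2, 2, 6, 6]

# Wage and salary income collapsed to $1,000 intervals, as with incws/1000.
wages = [0, 999, 1000, 25500]
print([collapse(w, 1000) for w in wages])   # [0, 0, 1, 25]
```

The sample ages and wage values are made up for illustration only.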
Operations with higher levels of precedence are executed before lower levels unless parentheses are used to control the order of execution. When parentheses are used, execution occurs within the innermost parentheses first. Consider the following example where the logical AND would be executed before the logical OR were it not for the parentheses:

    state=26 & (age<18 | age>=65)

The OR within the parentheses is executed first to select persons less than 18 years of age or 65 and older. The result (TRUE or FALSE) is then combined with the result of state=26 (Michigan) through the AND. The result will be TRUE if the person is from Michigan and age is either less than 18 or 65 or greater. Otherwise, the result will be FALSE. If the parentheses were not present, all persons 65 and older would be included as TRUE in the result along with persons under age 18 who resided in Michigan, which is probably not the intended result.

As illustrated above, expressions are often used to define the universe of interest for a specific query. The universe might consist of persons in the labor force, children, women of child-bearing age, retirees, the physically disabled, persons with income greater than $50,000, married couples where the wife earns more than the husband, married couple households with children where the husband works at home and the wife works away from the home, or any similar part of the population that is of interest. Note that expressions can include recodes and transformations as well as items.

PDQ-Explore Arithmetic and Logical Operators

    Precedence
    Level        Operator    Name                     Example/Comment
    9            X:a..b      range                    age:15..44
    8            unary +     plus                     sex=+1 (never needed)
    8            unary -     minus                    incse<=-1000
    7            *           multiply                 0.87*inctot
    7            /           divide                   hinc/persons
    7            %           modulo                   subsample%10
    6            +           add                      incws+incse
    6            -           subtract                 hinc-inctot
    5            <           less than                age<65
    5            >           greater than             age>64
    5            <=          less than or equal       age<=65
    5            >=          greater than or equal    age>=65
    4            = or ==     equal                    age=23
    4            != or <>    not equal                incse!=0
    3            & or &&     and                      race1=2 & lookwrk=1
    2            ^           exclusive or             bit-wise; use with care
    1            | or ||     or                       age<18 | age>=65

Note that the or (|) and not equal (!) operators are different. Logical TRUE evaluates as a numeric 1; logical FALSE evaluates as 0 in numeric expressions. All non-zero numeric values are TRUE in logical expressions.

Use parentheses freely to control the order of execution of operations, especially if the effect of precedence is not obvious. A common error is to omit a required left or right parenthesis.

The range operator specifies the range of values that are to be included in a query and displayed in the results. For example, the tabulation of age:15..44 as the row axis will give counts for each age from 15 through 44. The range occasionally needs to be specified in this manner for recodes and more complex expressions where the PDQ-Explore software cannot reliably determine the range of results. This is especially true for the intrinsic functions described below.
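The effect of the parentheses in the Michigan example can be checked with a few lines of Python. This is a sketch, not PDQ-Explore syntax: Python's and/or keywords stand in for the PDQ-Explore AND and OR operators because they share the same relative precedence (AND before OR). The function names and the sample persons are hypothetical, and state code 6 is used only as an arbitrary non-Michigan code.

```python
def with_parens(state, age):
    # Mirrors: state=26 & (age<18 OR age>=65)
    return state == 26 and (age < 18 or age >= 65)


def without_parens(state, age):
    # Mirrors the same expression with the parentheses dropped.
    # AND has higher precedence than OR, so it groups as shown here.
    return (state == 26 and age < 18) or age >= 65


# (state, age) pairs; 26 is Michigan, 6 is any other state.
people = [(26, 10), (26, 40), (26, 70), (6, 10), (6, 70)]

for state, age in people:
    # int() shows logical TRUE as 1 and FALSE as 0.
    print(state, age, int(with_parens(state, age)), int(without_parens(state, age)))
```

Only the last person, a 70-year-old outside Michigan, differs between the two versions: excluded with the parentheses, included without them.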
Four intrinsic functions, $sum, $min, $max, and $pick, are available that loop through the records in a hierarchy and return the sum, minimum, maximum, or selected values, respectively, of the argument, which is typically an arithmetic or logical expression. In the simplest case, sum returns the count of the number of persons in the housing unit with a given trait. For example:

    sum(age<18) returns the number of persons under age 18 in the housing unit;
    min(age) returns the age of the youngest person in the housing unit; and
    max(educ) returns the level of education for the most highly educated person in the housing unit.

The pick function has the structure pick(expression,item). The function returns the value of the item for the first record within a lower hierarchy for which the expression is true. For example, for the 2000 PUMS, pick(relate=2,age) will return the age of the spouse of the head, if present. A value one less than the lowest coded value for the item is returned if the expression is false: -1, or not picked, in the case of age of spouse. In a more complex example, pick used in combination with such items as relationship, marital status, sex, and subfamily membership can allow characteristics of husbands for married-spouse-present women to be identified.

PDQ-Explore custom item expressions can also be defined to include parameters that are to be assigned when the expression is used. See for example the PDQ-Explore custom items husband and wife.

The arithmetic functions listed below perform numeric transformations on item values or expressions.

PDQ-Explore Functions

    Function            Result
    $abs(exp)           Absolute value
    $ceil(exp)          Round up
    $floor(exp)         Round down
    $log(exp)           Log base e
    $log10(exp)         Log base 10
    $pow(exp1,exp2)     Exp1 raised to exp2 power
    $rint(exp)          Round to nearest integer
    $sqrt(exp)          Square root
    $sin(exp)           Sine
    $cos(exp)           Cosine
    $tan(exp)           Tangent
    $asin(exp)          Arcsine
    $acos(exp)          Arccosine
    $atan(exp)          Arctangent

Selections should be used to eliminate sources of computational errors when results are not defined for specific values, such as 0 in the case of division or 0 and negative numbers in the case of logarithms.

In addition to the above functions, a case statement is available to map a sequence of expressions to results:

    case(exp1,rslt1,exp2,rslt2,exp3,rslt3,...,default result)

For example,

    case(race1=1,1,race1=2,2,(race1>=3 & race1<=7),3,4)

will recode the item race1 to four categories: 1 to 1, 2 to 2, 3-7 to 3, and any other values to 4.

Examples of expressions:

    select: sex=2 & age>=15 & age<=49
    select: age>=15 & age<=65 & (occcen5=210 | occcen5=306)
    row: age:15..25
    row: age>=15 & age<=25
    describe: log(inctot)

Note that the two row examples do not yield the same results: age:15..25 will display one row for each single year of age in the 15-25 range; age>=15 & age<=25 will display False and True rows for those cases outside and those within the 15-25 year range, respectively.
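The behavior described above for the intrinsic functions and the case statement can be sketched in Python. This is an illustration of the described semantics only, not PDQ-Explore code or its implementation. The helper names (hh_sum, hh_min, hh_max, hh_pick, case), the three-person household, and its code values are hypothetical; the only code taken from these notes is relate=2 for the spouse of the head.

```python
def hh_sum(persons, expr):
    # Sum of the expression over the persons in the housing unit;
    # a logical expression counts as 1 (TRUE) or 0 (FALSE).
    return sum(int(expr(p)) for p in persons)


def hh_min(persons, expr):
    return min(expr(p) for p in persons)


def hh_max(persons, expr):
    return max(expr(p) for p in persons)


def hh_pick(persons, condition, item, lowest_code=0):
    # Value of the item for the first person meeting the condition;
    # one less than the lowest coded value if nobody qualifies.
    for p in persons:
        if condition(p):
            return p[item]
    return lowest_code - 1


def case(*args):
    # case(exp1, rslt1, exp2, rslt2, ..., default): first true exp wins.
    *pairs, default = args
    for cond, result in zip(pairs[0::2], pairs[1::2]):
        if cond:
            return result
    return default


# Hypothetical housing unit: head, spouse (relate=2), and one child.
household = [
    {"relate": 1, "age": 41, "educ": 10, "race1": 1},
    {"relate": 2, "age": 39, "educ": 13, "race1": 5},
    {"relate": 3, "age": 12, "educ": 5, "race1": 9},
]

print(hh_sum(household, lambda p: p["age"] < 18))            # 1
print(hh_min(household, lambda p: p["age"]))                 # 12
print(hh_max(household, lambda p: p["educ"]))                # 13
print(hh_pick(household, lambda p: p["relate"] == 2, "age")) # 39
print(hh_pick(household[:1], lambda p: p["relate"] == 2, "age"))  # -1

# Recode race1 to four categories, as in the case example above.
recoded = [
    case(p["race1"] == 1, 1, p["race1"] == 2, 2, 3 <= p["race1"] <= 7, 3, 4)
    for p in household
]
print(recoded)                                               # [1, 3, 4]
```

Passing the lowest coded value as a parameter keeps the "not picked" result explicit; for age, whose lowest code is 0, the result is -1 as described above.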
PDQ-Explore numeric functions generally return floating-point results. Once a floating-point value is encountered in an expression, further calculations are carried out in floating-point. Floating-point calculations can also be coerced by using decimal values in expressions; multiplying by 1.0, for example. Floating-point values are truncated to integer values at the point where they are used as indexes in tabulations.

Note also that the range of tabulations is automatically extended to include two additional categories: all those below the default or defined range (<<<<<<) and all those above the default or defined range (>>>>>>). These extended categories may be displayed and included in totals and percentages, or neither displayed nor included in totals and percentages, subject to the setting of Suppress Above/Below under the Options tab on the Results window.

PDQ-Explore works very well for generating simple and multi-way tabulations, summary statistics (means and standard deviations), and quantiles of any order. Correlations may also be calculated and data extracts generated. These features are not well implemented in the current version of the PDQ-Explore GUI. The IPUMS extraction routines work very well for generating extracts from their data sets.

Please feel free to contact Albert Anderson (afa@pdq.com) for help with PDQ-Explore.

Manipulating and Displaying Results

The Options tab in the Results window offers options for displaying the results of tabulations and summary statistics in a variety of ways. These are generally self-explanatory. Experiment freely with these. Only the Suppress Zeros option is likely to cause difficulties if suppression is removed on a result that has hundreds, thousands, or millions of zero rows. By default, the suppression of the display of rows, columns, or the other axes that contain only zero entries is set on.

The results of tabulations may be sorted by column or row by clicking on the corresponding column or row heading. The sorts cycle from descending (high to low) through ascending and back to the original order. Do not sort on a row or column after sorting on the other axis; the results can be unpredictable or difficult to interpret. Return to the original order before changing the sort axis. Means may be sorted using the Sort option under the Options tab when using Summary Statistics.

The result axes may be pivoted by using the options available to the right of the Dimension display that is above the tabular results. Experiment with restructuring the table. The FOREACH option along with resizing of the display can simplify the browsing of results. Also, duplicating a setup can facilitate comparisons.

PDQ-Explore can generate results with far more rows and columns than can be reasonably displayed. Use care when defining potentially huge results. The Total number of cells in result in the setup window gives some indication of the size of the resulting array to be returned. Green, pink, and red shades reflect the magnitude of the results. Tabulating statefip by occ1990, for example, has the potential to generate a table with more than 100,000 cells, but this is a reasonable tabulation as indicated by the pink shade. Adding sex as a third dimension pushes the number of cells over 400,000 and the cell count turns red. This query can still be handled fairly easily, although the results may be displayed slowly on some machines. If the return and display of results takes more than a minute, the query is probably too large and should be aborted.
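The growth in result size described above follows from simple multiplication: the number of cells is the product of the number of categories on each axis. The Python lines below are a sketch of that arithmetic only. The function name total_cells and the category counts are hypothetical and are not taken from PDQ-Explore or from any of the data sets; how PDQ-Explore itself counts the categories on each axis is not shown here.

```python
from math import prod


def total_cells(axis_sizes):
    """Number of cells in a tabulation: the product of the axis sizes."""
    return prod(axis_sizes.values())


# Hypothetical category counts for a two-way table.
two_way = {"row": 50, "column": 300}
print(total_cells(two_way))        # 15000

# Adding a third dimension multiplies the size of the result again.
three_way = {"row": 50, "column": 300, "foreach": 4}
print(total_cells(three_way))      # 60000
```

Each added dimension multiplies the cell count, which is why a tabulation that is reasonable in two dimensions can become very large in three or four.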
Citation and Use of IPUMS-USA

All persons are granted a limited license to use this data and the accompanying documentation, subject to the following conditions:

    No fee may be charged for its use.
    Publications and research reports based on the database must cite it appropriately. The citation is as follows:

    Steven Ruggles, Matthew Sobek, Trent Alexander, Catherine A. Fitch, Ronald Goeken, Patricia Kelly Hall, Miriam King, and Chad Ronnander. Integrated Public Use Microdata Series: Version 4.0 [Machine-readable database]. Minneapolis, MN: Minnesota Population Center [producer and distributor], 2009.

If possible, citations should also include the URL for the IPUMS site: http://usa.ipums.org/usa/

In addition, we request that users report any publications, research reports, or educational material using the data or documentation. Citations should be added to the IPUMS bibliography.

Citation and Use of the IPUMS-CPS

All persons are granted a limited license to use this data and the accompanying documentation, subject to the following conditions:

    No fee may be charged for its use.
    Publications and research reports based on the IPUMS-CPS database must cite it appropriately. The citation is as follows:

    Miriam King, Steven Ruggles, Trent Alexander, Donna Leicach, and Matthew Sobek. Integrated Public Use Microdata Series, Current Population Survey: Version 2.0. [Machine-readable database]. Minneapolis, MN: Minnesota Population Center [producer and distributor], 2009.

If possible, citations should also include the URL for the IPUMS-CPS site: http://cps.ipums.org

We request that IPUMS-CPS users report any publications, research reports, or educational material using the data or documentation. Citations should be added to the IPUMS bibliography or sent to ipums@pop.umn.edu.