Technical Report Using Microsoft Access Front End to NIMSP MySQL Database by Earl F Glynn 10 Oct. 2013
Contents Purpose... 3 Introduction... 3 National Institute on Money in State Politics... 3 How to use these notes... 3 Getting Started... 4 ODBC Connection to MySQL in Windows... 4 MySQL Connector/ODBC Installation... 4 Windows ODBC Data Source... 5 Create Empty Access Database... 8 Access ODBC Connection to MySQL table... 9 MySQL Table in Access Design View... 10 Access Examples of Extracting NIMSP Data from MySQL... 12 Counts of NIMSP campaign finance records by state by election cycle... 12 State Subset File: Missouri... 18 Finding Common Political Contributors in Two States: North Dakota and Wisconsin... 22 Other Approaches to Working with NIMSP MySQL Database... 26 2
Purpose The purpose of these notes is to provide step-by-step instructions for working with state campaign finance data from the National Institute on Money in State Politics (FollowTheMoney.org) using Microsoft Access as a front end to a back end MySQL database. The intended audience for these notes includes data journalists, data analysts, and others interested in analyzing state campaign finance data in one or all of the 50 states. Introduction Recent WatchdogLabs.org articles show how to download and install MySQL software and build a 50-state database of NIMSP state campaign finance data: These notes show how to use the Microsoft Access database tool to manipulate the data in the NIMSP MySQL database using three specific examples. National Institute on Money in State Politics If you decide to use the NIMSP state campaign finance data described in this series of technical articles, you should be aware of the Influence Explorer Bulk Downloads License statement: State campaign finance data must be attributed to FollowTheMoney.org. How to use these notes Review the Getting Started section to be sure you are ready to work with the NIMSP MySQL database in Access using Open Database Connectivity (ODBC). With the MySQL Connector/ODBC installed, you will be ready to use Microsoft Access (2007 or later) to manipulate the NIMSP MySQL database. Step-by-step Access examples will show how to: Create an inventory of NIMSP campaign finance records by state by election cycle. This example shows how to create an Access crosstab query that can be consulted when working with any state to understand the limitations of available data by election cycle. Create a state NIMSP subset. Often ad hoc studies only need the data for a particular state. Often analysis will go much more quickly if a single state s data are first extracted from the master file of 29 million records. This example will show how to create a Missouri subset of the NIMSP database. Find the common contributors giving to recipients in two states. This example shows how to use MySQL Views as Access tables to import and compare contributor names from North Dakota and Wisconsin. These states have the fewest and the most records in the NIMSP database, respectively. 3
Getting Started Two steps may be necessary before you will be ready to use Access to work with NIMSP data. If necessary, consult the instructions in this recent WatchdogLabs.org article to see how to install a MySQL server and MySQL Workstation in a Windows 8 environment: 1. Installing and Configuring MySQL on Windows 8 In particular, note the instructions on page 11 to install the MySQL Connector/ODBC software: If for some reason Connector/ODBC software was not installed, see the notes in the next section. ODBC is the software that facilitates communication between Access and MySQL. If necessary, load the 29 million NIMSP state campaign finance records into MySQL: 2. 50-State Campaign Finance Database: Building NIMSP MySQL Database The bulk NIMSP data can be downloaded from the Influence Explorer site. ODBC Connection to MySQL in Windows This section walks you through installing MySQL Connector/ODCBC software (if necessary), and setting up an ODBC link from Access to the NIMSP MySQL database. MySQL Connector/ODBC Installation If you do not remember whether MySQL Connector/ODBC software was installed, follow these steps to find out: + X Control Panel Programs and Features Look for an entry like this: Skip to the next section if MySQL Connector/ODBC software is already installed. If not, consult these links to install MySQL Connector/ODBC: 4
http://dev.mysql.com/doc/refman/5.6/en/connector-odbc-installation.html http://dev.mysql.com/doc/refman/5.6/en/connector-odbc-installation-binary-windows.html Figure 1 shows the relationship between the MySQL server back end and the Microsoft Access client front end database. Figure 1. MySQL server back end and Microsoft Access front end databases Windows ODBC Data Source Let s use Microsoft Windows to establish an ODBC connection to the MySQL data source. In Windows 8: + X Control Panel 5
Right-click, run as administrator Select Add. Select MySQL ODBC driver as a new data source. A dialog box from MySQL Connector/ODBC now appears: 6
Assign the Data Source Name to be nimsp and spell out what that means in the Description field. Use the root MySQL login user and password, or another MySQL user, like earl as shown above. [See MySQL User without password for use in scripts in Installing and Configuring MySQL on Windows 8.] Select the nimsp MySQL database. Check the box Allow big result sets or ODBC will limit the number of records it returns. After specifying the connection parameters, select: Test. 7
OK OK OK The Connection successful message means we can use ODBC to connect to the nimsp MySQL database using Access and other Windows tools. Also see: Using Connector/ODBC with Microsoft Applications Connector/ODBC Connection Parameters Create Empty Access Database Let s setup a new, empty Access database and create a linked-table using MySQL as the source. Start Access Select Blank database. 8
Specify a folder and a filename for the new file. Let s now add an Access linked table to the MySQL NIMSP table. Access ODBC Connection to MySQL table In the new, empty Access database, select the External Data ribbon and ODBC Database: Select a link to the data source instead of copying data. Select the Machine Data Source tabsheet and the nimsp data source: We can link to any of the tables in the nimsp MySQL database. For now, select only the nimsp table: 9
OK Access shows the nimsp linked table, which is a 29-million record table. The nimsp linked table is much like any other Access table and can be used to construct Access queries. When liking to a large MySQL database, we must be careful not to use a Make Table Access query that will create a new table and a file greater than 2 GB. Access breaks when a file exceeds 2 GB in size. MySQL Table in Access Design View Access Design Mode provides a way to view the contents of a linked-table that resides in MySQL. Changes cannot be made to MySQL data definitions. Click the nimsp table and select Design view: Yes 10
Close Design view. 11
Access Examples of Extracting NIMSP Data from MySQL With the ODBC connection to MySQL in place, we are ready to run Access queries with Access as the front end to the back end MySQL database. This section shows three examples of working with the NIMSP database in this way. Counts of NIMSP campaign finance records by state by election cycle Let's use the Access Query Wizard to create a cross tabulation of the number of records by state by election cycle. The resulting table is an inventory of NIMSP data by state and election cycle. Start with the Create ribbon and select the Query Wizard in the Queries section: Select: Query Wizard, Crosstab Query Wizard, and OK. Select the nimsp table and Next. Note here the nimsp linked-table to MySQL works like any other Access table. 12
Select recipient_state from available fields and press the > transfer button. This selection specifies that recipient_state information will be the rows of the crosstab. Next Select cycle for the column headings. This second selection specifies that election cycle will make up the columns of the cross tab. Next 13
We now specify the contents of each cell in the state-by-cycle crosstab. Let's count the number of id s for each cycle for each recipient state. Next. Let's name the query: nimsp_crosstab_state_cycle. Finish. View the query in design mode: 14
Microsoft Access shows a Query by Example in the format shown above. We can also view the Access SQL associated with the query. Select: Change from Design View to SQL View to see the Access SQL statement: Unfortunately, this simple Access SQL statement does not work with MySQL. Such crosstabs are easier to create in Access than MySQL Workbench. [This StackOverflow article describes the needed SQL code for MySQL Workbench.] Close Design View and double click on the query to run it. Access sends the query to MySQL and the results arereturned to Access. The query took about 21 minutes on my notebook PC with 12 GB of memory. Your results will vary depending on memory and disk speed. 15
Let s save the Access crosstab query results to an Excel file. Select the External Data ribbon. Select Excel. Save to file: NIMSP-Counts-by-State-2013-08.xlsx 16
For some reason, Access re-runs the long query again before saving the crosstab to Excel. Let s make a few changes to the Excel file: Delete row with recipient state re with cycle cycl Add TOTAL row Add Headers and Footers for printing This table is an inventory of the NIMSP data. The table can be used to determine if data are available for a specific state for a specific election cycle. For example, there are no data from Arkansas, Delaware, Nebraska, or South Dakota before 2000. The current NIMSP file only has 431 records from 2013 and those records are all from Wisconsin. Comparisons can be made to a similar file from April 2013 to see what data were added in July 2013. 17
State Subset File: Missouri Data from only a single state is often needed for projects, and working with such a subset is often much faster than queries against the full 29-million record database. In this example, let s create a new empty Microsoft Access database (Access 2007 or later) named with the state as the filename. Let s use Missouri in this example. Open empty file Missouri.accdb. In the External Data ribbon, select ODBC Database: Select the Machine Data Source tabsheet Select the nimsp Data Source. Select the nimsp table and OK. On the Create ribbon use the Query Wizard to compose a new query: 18
Select all data fields: >> Next Next Name the query: Make-MO 19
Modify the query design Finish the wizard. Add selection criteria MO under [recipient_state]: Select Make Table Query Type Let s name the new table MO : OK Close and save the query 20
Yes Double-click on make-mo make table query to run it: Yes With MySQL as the back end database the front end feedback on the query progress bar in Access is not very accurate: This dialog appears when the query has completed: Yes Access creates a new MO table: The file is a bit large (409 MB) but under the Access 2 GB file limit. 21
The new Missouri.accdb file with the MO table can be used completely independently of MySQL now. Access queries can now be run against the MO table without any need of MySQL or ODBC. Obviously, the same process could be used with other states. At present state subset files will fit in a 2 GB Access file at least one at a time. Finding Common Political Contributors in Two States: North Dakota and Wisconsin This example analyzes the common contributors to state political causes in two states, North Dakota and Wisconsin by looking at exact name matches. These states were picked since they have the smallest and largest number of NIMSP records in the MySQL database. One approach to this problem would be to use the steps shown in the last section to create ND and WI subsets in a single Access file. Working with a two-state subset in Access may often be faster when searching all records than working with the huge 29-million record master database. Instead of the Make Table approach from above, this section shows how to use MySQL Views of North Dakota and Wisconsin data as tables in Access. Review Creating MySQL Views in the document 50-State Campaign Finance Database: Building NIMSP MySQL Database to setup the NorthDakota and Wisconsin views in MySQL. Open a new empty Access file ND-WI.accdb. Select the Extern In the External Data ribbon, select ODBC Database: Select the Machine Data Source tabsheet 22
Select the nimsp Data Source: Select MySQL northdakota and wisconsin views that Access treats as tables. For unclear reasons, ODBC prompts for unique record identifiers when linking to a MySQL view. This is an annoyance but it is easy to specify fields that are unique: The MySQL views now appear as linked tables in Access: Let make a table of unique contributor names from ND and WI and see if there are any exact matches between these two states. 23
The make-nd-unique-query creates a list of unique contributors from North Dakota. In addition to the list of unique names, let s count the number of contributions and add up the total amount: There are 11,756 unique ND contributor_names. Let s create a similar unique-wi table for Wisconsin from a make-wi-unique-query: There are 641,390 unique WI contributors by name. How many of the 11,756 ND contributors match exactly by name with the 641,390 WI contributors? To filter the matches a bit to a more manageable number, let s also restrict the matches to names that have made 5 or more contributions in both states. We use a simple join by contributor_name to find the matches. The results are sorted in descending order by the total dollars of the Wisconsin contributors: 24
131 records matched these search criteria: With more standardization of names, perhaps more entries would collapse together. The use of fuzzy matches is more difficult (to avoid false matches) but would also result in more collapsed records. Additional research is needed with names of people to determine if a name represents multiple people with the same name, multiple contributions by the same person, or some combination of the two. For example, did the same Patrick Murphy make 7 contributions for $3375 in ND and 148 contributions for $28,290 in Wisconsin, or are there multiple people with that name making contributions in both states? 25
Other Approaches to Working with NIMSP MySQL Database Using Access as a front-end to a back-end MySQL database is a convenient way to document and store all the information needed for a particular study for future reference in a single file. Other approaches are also useful for studying the NIMSP MySQL Database. An earlier article showed how to use SQL in MySQL Workstation to find the Top 500 lists of contributors and recipients in NIMSP database. Look for WatchdogLabs articles on how to analyze data in the NIMSP MySQL database with other tools, including Excel and R: Working with NIMSP MySQL database with Excel Analyzing NIMSP MySQL data using R 26