Allegations of Fraud, Mistake, and Tampering


Anderson Economic Group LLC
www.andersoneconomicgroup.com
1555 Watertower Place, Suite 100
East Lansing, MI 48823
Tel: (517) 333-6984
East Lansing, Chicago, New York City, Istanbul

December 15, 2016

The Honorable Ruth Johnson
Michigan Department of State
430 West Allegan Street
Lansing, Michigan 48918

Dear Secretary Johnson,

I am writing you in your capacity as the elected official charged with supervising elections in the State of Michigan. The occasion for this letter is the recent and unprecedented set of statements by prominent individuals that allege fraud or mistake by both state and local officials in Michigan, tampering with the election through organized hacking sponsored by foreign powers, and vulnerability of election equipment to such hacking, along with alleged evidence that vote totals could have been manipulated. The effect of these assertions, particularly given partisan tensions and the polarization of the electorate during the recent election campaigns, is to cast doubt on the integrity of the voting process in our state. In this letter, we summarize the results of our elections forensics analysis that addresses some of these doubts.

Allegations of Fraud, Mistake, and Tampering

We identify three sets of assertions that motivate this analysis:

- The recount petition of Jill Stein alleging fraud or mistake by county and state boards of canvassers, as well as other election officials.
- The public statements (including blog posts and articles) of a group of computer scientists, and articles based on these that include assertions that there is evidence that vote totals could have been manipulated.
- The statement by the White House press secretary that the President of the United States has ordered an intelligence agency review of an alleged pattern of malicious cyber activity related to our presidential election cycle.

Any claim that election results were affected by fraud, mistake, or tampering is a serious matter that, if supported by credible evidence, would cause great concern. However, in none of the three sets of statements we reviewed were we able to find specific election result anomalies that were caused, or suspected of being caused, by the alleged activities. Regardless of this lack of specific assertions of data irregularities, we believe the seriousness of the matter justifies an independent evaluation using the tools available to us.

The Use of Election Forensics

Shortly after the certification of the election results by the Board of State Canvassers in late November, we began to analyze the results of county elections using methods that have been developed within the relatively new field of Election Forensics. The purpose of Election Forensics is to detect anomalies that may indicate fraud. The statistical methods of Election Forensics cannot, by themselves, prove or disprove fraud. Such statistical methods, if applied properly and within a reasonable time after an election, can help citizens and election officials evaluate the seriousness of issues involving the election process, both by detecting evidence of actual fraud and by demonstrating a lack of evidence of fraud. The use of these methods can also deter persons who may contemplate future actions to tamper with election results, by highlighting the ability to detect evidence of tampering and by increasing the difficulty and complexity of any such efforts.

The methods we use here are intended to detect evidence of episodic fraud or tampering, meaning fraud or tampering that occurred in only some elections and some counties. This is the category of activity that was explicitly alleged in the Jill Stein petition, and the likelihood of such activity was stated in the articles based on the work of the computer scientists.

No Substitute for Election or Supervision of Elections

Of course, the statistical methods used in Election Forensics are not a substitute for elections themselves, and the results of such analyses do not substitute for the proper supervision of elections and canvassing of votes under law. In the analysis we summarize for you below, we are careful to recognize that voters, not statisticians, are the persons to whom the law should give the highest deference.

Two Sets of Statistical Tasks: Exploratory Data Analysis and Hypothesis Tests

In our forensic analysis, we conducted two sets of statistical tasks designed to detect evidence of episodic tampering or fraud, or the lack of such evidence:

1. Exploratory Data Analysis, and Visualization of Results in Successive Elections

We use exploratory data analysis (EDA) techniques, and simple statistical analyses that involve comparing results from the same counties in successive elections for the same office. These steps allow us to inspect the pattern of votes across counties, and either detect unusual deviations from the general pattern, or note that no such unusual deviations exist. In particular, we conduct two such tests:

EDA Test I: Shape of Distribution of Vote Shares by County, by Candidate
We array the vote shares for each candidate across all counties, and visualize them in the form of a histogram.

EDA Test II: Pattern of Shifting
We look at the vote shares for major party candidates for an array of counties, and then compare them with the vote shares for the same party in a successive election for the same office. We fit a line representing the predominant pattern of such changes across time, and visualize the scatter plot in a manner that illustrates this pattern.

2. Hypothesis Tests

We conducted two hypothesis tests, meaning tests of a hypothesis against the empirical data. We make use of the EDA described above in these tests.
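As an illustration only, the two EDA steps described above (binning county vote shares into a histogram, and fitting a line through same-county shares in successive elections) might be sketched in Python as follows. The county shares shown, the 0.05 bin width, the function names, and the use of ordinary least squares for the fitted line are all assumptions made for this sketch; the letter does not specify the fitting method or software actually used.

```python
from collections import Counter


def histogram_counts(shares, bin_width=0.05):
    """Count counties per vote-share bin; keys are the lower edge of each bin."""
    counts = Counter()
    for share in shares:
        # A tiny epsilon keeps values such as 0.55 from falling into the
        # bin below because of floating-point rounding.
        index = int((share + 1e-9) / bin_width)
        counts[round(index * bin_width, 2)] += 1
    return dict(sorted(counts.items()))


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (assumed method)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical same-party vote shares for six counties (not actual results).
shares_prior = [0.30, 0.45, 0.50, 0.55, 0.60, 0.62]
shares_current = [0.33, 0.50, 0.56, 0.61, 0.66, 0.69]

print(histogram_counts(shares_current))          # EDA Test I: counties per bin
print(fit_line(shares_prior, shares_current))    # EDA Test II: slope, intercept
```

A slope near 1 with a nonzero intercept would correspond to a roughly uniform shift in vote share across counties between the two elections; counties far from the fitted line are the ones that would merit a closer look.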

Test I: Detection of Unexplained Outlier Counties

First, we used a two-step test to detect outlier counties. In the first step, these counties are identified by statistical techniques (and by inspection of graphics generated using exploratory data analysis). In the second step, we examine whether past voting patterns, or other known factors, provide an unambiguous rationale for the unusual voting patterns in those counties. If counties are identified as outliers in the first step, and analysis of past voting does not provide a rationale for that behavior in this election, we consider this to be evidence of a data anomaly that could have been caused by fraud or tampering, and which deserves further investigation. If no such unexplained outliers remain, then we consider this a lack of such evidence.

Test II: Detection of Anomalies in the Cumulative Vote Pattern

The second set of tests involves examining the statistical pattern of vote shares among counties, and testing it to see if it deviates from an expected pattern. In particular, we calculate an empirical cumulative distribution function from the vote shares (share of vote recorded for candidates) from individual counties. In a series of contested elections for the same office, we expect that some counties will vote more strongly for one party's candidate than others, and that this pattern of relative voting preference among counties is fairly stable over time. Such a pattern creates a signature of voting that is distinctive, and deviations from this pattern can be observed and statistically tested.

Using this procedure, we test whether the cumulative distribution of vote shares (the signature of county voting patterns) is significantly different from an idealized distribution. Such an idealized distribution can be constructed from recent voting behavior. If either visual inspection of the distributions, or a statistical test, demonstrates that these patterns are sharply different in a manner that is not clearly explained, we consider this to be evidence that deserves further investigation. If no such significant difference is established either by inspection or by statistical test, we consider this to be a lack of any such evidence.

Assumptions We Did Not Make

It is important to note that we do not make a number of assumptions that are sometimes made in studies of voter behavior, and that are clearly violated in practice. In particular, we do not assume any of the following:

- randomly distributed voters;
- evenly sized voting districts;
- the existence of a one-dimensional axis of voting preferences;
- a symmetric distribution of vote shares among districts;
- accurate statements by voters responding to polls (including exit polls);
- lack of sampling error and question bias in polling, including in exit polling;
- lack of migration among districts;
- perfectly informed voters;
- perfectly supervised elections;
- complete turnout;
- universal voting behavior by adults; and
- lack of protest votes.

We also do not rely upon any rational voter model, nor any polling data. The primary, and indeed nearly the only, data we rely upon are actual recorded votes in specific voting districts and specific elections.

Results of Exploratory Data Analysis

The results of our EDA tasks are as follows:

- The histograms of vote shares across counties, for both 2012 and 2016, indicate the expected pattern across the large majority of counties. In 2016, we can identify 3 outlier counties. We discuss these further below, where we conclude that the results in these counties are readily explainable by at least two methods.

For the histograms of vote shares across counties, see Attachment A (including figures 1, 2, 6, and 7).

- The scatter plots showing the county results in 2012 and 2016 for the same party display either a close relationship, or a very close relationship, between the same-county vote shares in 2012 and 2016 for the same party's candidate for President. The relationship also holds if the comparison is done on total votes. This strongly indicates that voter preferences shifted in a consistent manner across counties, as would be expected. No evidence appears here that suggests any irregularities. See Attachment A, figures 3, 4, 8, and 9; both vote shares and vote totals are displayed.
- The empirical cumulative distribution function graphs show an expected S-curve shape, in both election years, and for the candidates of both parties. These distributions have shifted in the expected direction given the shift in voter preference across successive elections, but retained a similar shape. No obvious indication appears that suggests any irregularities. See Attachment A, figures 5 and 10.

Results of Hypothesis Test I

The results of Hypothesis Test I are as follows:

- The distribution of vote shares for both Trump and Clinton can be represented by a histogram that displays the expected shape of a curve, skewed to one side, with a clear central tendency. (The shape of the curve is similar to the classic normal curve, but is not symmetric and cuts off sharply. As discussed further below, the distribution appears to follow a pattern similar to that presented by known statistical distributions, although this is not critical for this test.)
- The main body of the distribution, for both Trump and Clinton vote shares, is contained within the range represented by the mean plus or minus two standard deviations.
- There are 3 outlier counties in Michigan in 2016: Wayne, Washtenaw, and Ingham. All of these are more than two standard deviations from the mean, and are also easily identifiable as the only counties out of 83 that have a vote share that low (for Trump) and that high (for Clinton). See the histograms of vote shares in Attachment A, including figures 1, 2, 6, and 7.

This initial identification of an outlier only demonstrates the need to look for a clear reason for the unusual result. We examine two possible explanations:

1. Strong local voter preferences that are different from those in other counties in the state. As we are quite familiar with these counties, examining this possibility is relatively easy, and confirms a rationale for this unusual voting behavior. That rationale is further demonstrated below.

2. Past voting patterns consistent with the results from this election. A straightforward comparison (scatter plot) showing the 2012 versus the 2016 vote shares by county confirms that the relative vote shares for nearly all counties are quite stable across two presidential elections. In particular, the relatively high Democratic party vote share in the presidential election in 2012 for these counties predicts very closely the high share in the same election in 2016.
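The two-step logic of Test I (flag counties more than two standard deviations from the mean, then check whether the prior election accounts for them) could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the county names and shares are invented, the sample standard deviation is used, and the tolerance and line coefficients in the second step are hypothetical choices, none of which are taken from the letter.

```python
from statistics import mean, stdev


def flag_outliers(shares, k=2.0):
    """Step 1: names of counties more than k sample standard deviations from the mean."""
    values = list(shares.values())
    center = mean(values)
    spread = stdev(values)
    return sorted(name for name, s in shares.items() if abs(s - center) > k * spread)


def explained_by_history(name, prior, current, slope, intercept, tol):
    """Step 2: is the current share within tol of the share predicted from the prior election?"""
    predicted = intercept + slope * prior[name]
    return abs(current[name] - predicted) <= tol


# Hypothetical vote shares for ten counties (not actual results).
current = {"A": 0.55, "B": 0.56, "C": 0.57, "D": 0.58, "E": 0.59,
           "F": 0.60, "G": 0.61, "H": 0.62, "I": 0.63, "J": 0.25}
prior = {"A": 0.52, "B": 0.54, "C": 0.55, "D": 0.55, "E": 0.57,
         "F": 0.58, "G": 0.58, "H": 0.60, "I": 0.61, "J": 0.27}

outliers = flag_outliers(current)
# Assumed relationship between elections: slope 1, intercept 0 (hypothetical).
unexplained = [c for c in outliers
               if not explained_by_history(c, prior, current, 1.0, 0.0, tol=0.05)]
print(outliers, unexplained)
```

In this sketch a county counts as an unexplained outlier only if it fails both steps, which mirrors the letter's point that an unusual share by itself is a prompt to look for a reason, not evidence of tampering.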

For the comparison of 2012 and 2016 vote shares by county, see the scatter plots in Attachment A, figures 3 and 8.

Conclusion of Hypothesis Test I

Given the clear rationale for the voting behavior in all three outlier counties, and the high correlation of same-county, same-party voting across two successive presidential elections, we find no statistical evidence of episodic vote tampering from this test among Michigan counties in the 2016 general election.

Results of Hypothesis Test II

The results of Test II are as follows:

- The empirical cumulative distribution function of vote shares across counties demonstrates the S-curve shape we expect given the distribution of vote shares across counties described above, and the historic pattern of votes across counties.
- A side-by-side inspection of the CDF of vote shares in 2012 and 2016, for both the Republican and Democratic party candidates, suggests that a very similar pattern occurred across counties for the same-party candidates in successive elections. As can be expected, the pattern shifts as voters favor the same-party candidates by different levels in these different elections. However, the relative pattern among counties remains very similar. See the cumulative distribution function graphs in Attachment A, figures 5 and 10.
- We observe that the shape of the CDF of votes, particularly for one party's candidate in both 2012 and 2016, appears similar to a Weibull, or perhaps a log-normal or exponential, distribution. The Weibull distribution has been demonstrated in past research in different fields to represent the diffusion of ideas and innovations, and therefore has a theoretical and practical motivation. This provides an opportunity for a statistical test that, while not conclusive, is nonetheless worth undertaking. For this, we compare this empirical distribution with an idealized Weibull, normal, and log-normal cumulative distribution. We use statistical tests of the null hypothesis that the empirical distribution, which reflects some random and other factors, is a sample from the idealized one.
- For the Republican candidate in 2012 and 2016, the data do not reject the null hypothesis. For the Democratic party candidate, the pattern is also consistent across two elections, though not matching the same distribution as closely. The consistent patterns provide no indication of any irregularities. While this test is not dispositive, it does corroborate the other findings.
- Visual inspection of the distributions was consistent with these test results. In particular, the smooth S-curve of a Weibull distribution fits the 2012 and 2016 county vote share patterns for the Republican party candidates very closely in both election years.

Conclusion of Test II

Not rejecting the null hypothesis of a smooth S-curve of empirical vote shares means the data provide no statistical evidence of episodic vote tampering from this test among Michigan counties in the 2016 general election.
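To make the comparison in Test II concrete, the sketch below builds an empirical cumulative distribution function from county vote shares and measures its largest gap from an idealized Weibull CDF. The letter does not name the statistical test it used; the Kolmogorov-Smirnov distance shown here is one common choice and is an assumption, as are the Weibull shape and scale values, the sample shares, the equal weighting of counties, and the function names.

```python
import math


def ecdf(values):
    """Empirical CDF: sorted values paired with the fraction of counties at or below each."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def weibull_cdf(x, shape, scale):
    """CDF of a two-parameter Weibull distribution: 1 - exp(-(x / scale) ** shape)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def ks_statistic(values, cdf):
    """Largest vertical gap between the empirical CDF and a reference CDF."""
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Check the gap just after and just before the step at x.
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


# Hypothetical county vote shares and Weibull parameters (not actual results).
shares = [0.31, 0.42, 0.48, 0.52, 0.55, 0.57, 0.60, 0.63, 0.66, 0.71]
d = ks_statistic(shares, lambda x: weibull_cdf(x, shape=5.0, scale=0.6))

# Approximate large-sample 5% critical value for a fully specified reference
# distribution; it is not exact when parameters are fitted to the same data.
critical = 1.36 / math.sqrt(len(shares))
print(d, critical, d > critical)
```

A distance below the critical value means the data do not reject the idealized distribution, which, as the letter notes, is corroborating rather than dispositive.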

Conclusion from EDA Tasks and Both Sets of Hypothesis Tests

An election forensics analysis of the 2016 general election results in Michigan is strongly motivated by the three sets of assertions of fraud, mistake, and tampering cited above. We examined the certified election results from the Board of State Canvassers with exploratory data analysis tasks and two sets of hypothesis tests. These methods were employed to detect evidence of fraud, or to indicate the lack of such evidence, consistent with the purpose and the limitations of election forensics methods.

The exploratory data analysis tasks included creating and examining histograms of vote shares by county in two elections, scatter charts showing same-county vote shares across two elections, and empirical cumulative distribution functions for the same set of candidates and elections. The results are all consistent in showing a pattern of votes, and shifting of votes across elections, that is not unusual. There is no evidence from these results that suggests irregularities or episodic fraud or tampering with the Michigan 2016 general election results for the office of President.

The hypothesis tests involved testing whether the observed results were significantly different from the pattern of results that would be expected given the results from the past two elections. The results from both sets of hypothesis tests indicate no evidence of irregularities or episodic fraud or tampering with the Michigan 2016 general election results for the office of President.

Indeed, what we observed was a relatively smooth distribution of votes across counties; a lack of unexplained outliers; and a reasonably predictable shift in recorded votes in the same counties for major parties across two successive elections. Given this, it would be very difficult to conclude there was any large scale tampering, hacking, fraud, or mistake in the 2016 general election in the State of Michigan for the office of President of the United States.

Sincerely,

Patrick L. Anderson

Attachments:
A. Selected Graphic Exhibits

Figure 1: Number of Counties by Vote Share, Trump 2016. [Histogram. Horizontal axis: Share of County Vote; vertical axis: Number of Counties.]

Figure 2: Number of Counties by Vote Share, Romney 2012. [Histogram. Horizontal axis: Share of County Vote; vertical axis: Number of Counties.]

Figure 3: Share of County Vote, 2012 vs. 2016. [Scatter plot. Horizontal axis: Share of County Vote, Romney 2012; vertical axis: Share of County Vote, Trump 2016. Legend: County; Linear Best Fit, Relationship between 2012 and 2016 Results; Memo: Equal Vote Share for Romney and Trump, for reference.]

Figure 4: Total Votes by County, 2012 vs. 2016. [Scatter plot. Horizontal axis: Total Votes, Romney 2012; vertical axis: Total Votes, Trump 2016. Legend: County; Linear Best Fit, Relationship between 2012 and 2016 Results; Memo: Equal Vote Total for Romney and Trump, for reference.]

Figure 5: Cumulative Share of Candidate Vote. [Empirical cumulative distribution. Horizontal axis: Candidate Share of County Vote; vertical axis: Empirical Votes Received by Candidate. Series: Romney (2012) Cumulative Votes; Trump (2016) Cumulative Votes.]

Figure 6: Number of Counties by Vote Share, Clinton 2016. [Histogram. Horizontal axis: Share of County Vote; vertical axis: Number of Counties.]

Figure 7: Number of Counties by Vote Share, Obama 2012. [Histogram. Horizontal axis: Obama Share of County Vote; vertical axis: Number of Counties.]

Figure 8: Share of County Vote, 2012 vs. 2016. [Scatter plot. Horizontal axis: Share of County Vote, Obama 2012; vertical axis: Share of County Vote, Clinton 2016. Legend: County; Linear Best Fit, Relationship between 2012 and 2016 Results; Memo: Equal Vote Share for Clinton and Obama, for reference.]

Figure 9: Total Votes by County, 2012 vs. 2016. [Scatter plot. Horizontal axis: Total Votes, Obama 2012; vertical axis: Total Votes, Clinton 2016. Legend: County; Linear Best Fit, Relationship between 2012 and 2016 Results; Memo: Equal Vote Total for Clinton and Obama, for reference.]

Figure 10: Cumulative Share of Candidate Vote. [Empirical cumulative distribution. Horizontal axis: Candidate Share of County Vote; vertical axis: Empirical Votes Received by Candidate. Series: Obama (2012) Cumulative Votes; Clinton (2016) Cumulative Votes.]