Live Event Count Issue




Appendix 3 Live Event Document Version 1.0

Table of Contents

1 Introduction and High Level Summary ... 3
2 Details of the Issue ... 4
3 Timeline of Technical Activities ... 6
4 Investigation on Count Day and Remedial Action Taken ... 8
  4.1 SQL and Manual Report Creation Process ... 8
    4.1.1 Data Structures and SQL ... 8
  4.2 Manually Created Result Example ... 16
5 Investigation following Count Day ... 17
  5.1 Data Analysis ... 17
  5.2 System Diagnosis and Root Cause Analysis ... 17
    5.2.1 Stored Procedure: Dashboard_GetBallots ... 18
    5.2.2 Investigation into why data order is modified ... 18
  5.3 Previous Testing Coverage and comparison ... 20
  5.4 Analysis of Recent System Changes ... 20
Document Change Summary ... 173

Page 2 of 21

1 Introduction and High Level Summary

This document describes the issue that occurred on the day of the count, which delayed the announcement of the results of the Mayor contest. It also provides details of the remedial action taken on the night of the count, the follow-up investigations after the count, and the findings of those investigations.

In summary, minor discrepancies were identified in the first set of Constituency Level Mayor Contest Final reports. Manual extraction of data and re-creation of the reports allowed the accurate count data to be announced on the night.

Analysis of previous test data shows that the issue was not present in the UAT reports nor in any of the Kit Readiness reports. Analysis of system changes (application code, database schema [including indices, views, and stored procedures], server patch levels, etc.) shows that no modifications had been made since UAT.

The root cause of the issue has been established as the incorrect construction of the vote matrix, caused by data extracted from the database arriving in an unexpected order. The reason has been identified as a pre-existing code defect, also present in 2012, which, combined with a subsequent change to the database server configuration in 2015 (specifically the SQL Server Max Degree Of Parallelism setting), meant that the ordering of the data during the calculation of the mayoral figures differed in 2016 from 2012. The issue was not evident in the 2012 election because this combination of conditions was not present: the underlying code in this area has not changed since 2012, and the server settings used in 2015 and 2016 followed industry best practice guidelines, no such guideline having existed in 2012.

Analysis of previous test data shows that the issue did not occur in UAT or Kit Readiness. The reason is the Cost Threshold parameter, which controls when SQL Server uses parallelism. The smaller datasets in UAT and Kit Readiness produced a lower estimated execution cost, which fell below the threshold required to invoke parallelism; without parallel execution, the data was returned in the expected order.
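The interaction between the two settings described above can be sketched abstractly. This is a toy model, not SQL Server's actual costing logic; the default Cost Threshold For Parallelism value of 5 is real, but the cost figures below are illustrative.

```python
COST_THRESHOLD_FOR_PARALLELISM = 5  # SQL Server's default, in cost units
MAX_DEGREE_OF_PARALLELISM = 2       # the 2016 e-Counting setting

def plan_is_parallel(estimated_cost, maxdop=MAX_DEGREE_OF_PARALLELISM,
                     threshold=COST_THRESHOLD_FOR_PARALLELISM):
    """Toy model: a parallel plan is eligible only when parallelism is
    enabled (maxdop != 1) and the estimated serial cost of the query
    exceeds the cost threshold."""
    return maxdop != 1 and estimated_cost > threshold

# Small UAT / Kit Readiness datasets: cheap queries stay serial...
assert not plan_is_parallel(estimated_cost=2.0)
# ...while the full live dataset crosses the threshold and goes parallel.
assert plan_is_parallel(estimated_cost=40.0)
# Setting MDOP=1 disables parallelism regardless of the estimated cost.
assert not plan_is_parallel(estimated_cost=40.0, maxdop=1)
```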

2 Details of the Issue

Discrepancies were found in the Final Results report for the Constituency level London Mayor contest. The first discrepancy was that the Total Number Of Ballot Papers Counted did not equal the Total Number Of Good Votes plus the Total Number Of Ballots Rejected. The second was that the numbers of First Preference and Second Preference votes for candidates differed each time the Final Results report was run. The Total Number Of Ballot Papers Counted and the Total Number Of Ballots Rejected figures were constant, but the Total Number Of Good First and Second Preference Votes was variable. The following abbreviated screenshots illustrate the issue.

Bexley and Bromley Final Report taken at 14:53:51

In the example above, the number of good votes for 1st preference (189,027), when added to the number of ballots rejected (2,890), gives a total of 191,917, which is not equal to 191,514.

Bexley and Bromley Final Report taken at 15:13:00

In the example above, the number of good votes for 1st preference (189,066), when added to the number of ballots rejected (2,890), gives a total of 191,956, which is not equal to 191,514.

In both examples of the report:
1. The Total Number of Ballot Papers Counted is constant at 191,514.
2. The Total Number Of Ballots Rejected on 1st preference is constant at 2,890.
3. Not shown, but the Total Number Of Ballots Rejected on 2nd preference is constant at 25,106.
4. The number of good votes plus the number rejected, for both 1st and 2nd preference, does not match the total number of ballots.
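The arithmetic behind the two report snapshots can be checked directly; this is a minimal sketch using only the figures quoted above.

```python
# Figures from the two Bexley and Bromley Final Report snapshots.
total_ballots_counted = 191_514   # constant across both reports
rejected_1st_pref     = 2_890     # constant across both reports

good_1st_pref_1453 = 189_027      # 14:53:51 report
good_1st_pref_1513 = 189_066      # 15:13:00 report

# Good votes plus rejected ballots should equal total ballots counted,
# but in both snapshots the sum overshoots the total.
sum_1453 = good_1st_pref_1453 + rejected_1st_pref  # 191,917
sum_1513 = good_1st_pref_1513 + rejected_1st_pref  # 191,956

print(sum_1453 - total_ballots_counted)  # 403 votes too many
print(sum_1513 - total_ballots_counted)  # 442 votes too many

# The figure the good-vote count should have been:
print(total_ballots_counted - rejected_1st_pref)  # 188,624
```

The expected good-vote figure of 188,624 is the record count later confirmed by the independent SQL in section 4.1.1.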

3 Timeline of Technical Activities

The following is a description of the events that occurred from identification of the issue to publication of the results to the GLA. The times provided are in some cases estimates.

16:00 - The second set of constituency mayor results, from Lambeth and Southwark, triggered the start of the investigation.

16:00-18:45 - Consulted with ERS on the report discrepancy. Checked the data for Bexley and Bromley as well, and confirmed that the sum of the 1st preference votes plus bad 1st preference votes was not equal to the total number of ballots. Investigated possible causes of the data changing; initial suspicion at this stage fell on either a central consolidation error, a local count-site consolidation error, or an error in the dataset sent to the central site. Connected to the Bexley and Bromley database and extracted the frozen vote matrix. Transposed all 1st preference votes onto paper and totalled them. Concluded that the votes sent to the central site (which should have matched the votes in the frozen vote matrix and the votes on the Final Bexley and Bromley report) did not, and were in fact short by the exact difference seen on the central report. Additional information was received regarding suspected errors output to the log files: totals were different each time the report was run, although it was not clear at this stage exactly how those totals were derived. Further information stated that the provisional reports had been run 3 times, with different data each time. It was confirmed that no processing was in progress at the count site, i.e. that the data should by then have been static. Investigated why the data was changing; suggested possibilities at this time included an MSMQ backlog, incomplete blocked transactions, and the system having been hacked. A database trace was applied to identify any instances of data updates; this confirmed that no data was being updated. Inspected MSMQ: nothing was on the queues. Concluded that it must be a consolidation issue, and started code inspection in this area, targeting code suggested by the log file output. This area of the code was not familiar to the technical team and there was no immediately obvious cause.

18:50 - Following the 18:50 executive meeting with the GLA, a decision was made to manually query the database to produce the data required.

18:50-20:25 - Generation of SQL to produce manual data identifying total votes per constituency, per candidate, and by preference (1st or 2nd). The data produced was checked by sanity checking against the SQL in the core product used to generate the vote matrix.

20:25-20:35 - 1st preference data produced from SQL.

20:41-20:45 - 2nd preference data produced from SQL.

20:40-21:25 - Generation of SQL to produce manual 2nd preference data for Goldsmith and Khan, per constituency, per candidate.

21:29-21:34 - 2nd preference data for Goldsmith and Khan produced from SQL.

21:40-22:00 - Generation of SQL to produce manual data for votes where 1st preference = 2nd preference.

22:16-22:42 - Data for counts of votes where 1st preference = 2nd preference produced from SQL.

4 Investigation on Count Day and Remedial Action Taken

The technical team were informed of the issue and immediately started to investigate possible causes. This took on two streams of activity: one to trace back the code for the generation of the final report, and one to manually query the database to check the numbers. It was discovered that:

1. The underlying data stored in the database was correct. This was confirmed by comparing the underlying data extracted using independent SQL queries with the data extracted by the stored procedure, as explained below.
2. There was a problem in the way in which the data for the 1st and 2nd preference votes for candidates was being extracted and aggregated by the code module. The exact nature and root cause of this problem was not identified on the night.

4.1 SQL and Manual Report Creation Process

SQL was generated to extract the data for the 1st and 2nd preference votes for candidates from the raw table data. This SQL was created independently of the e-Counting system and reviewed by peers; the output was checked against raw data extracted from the live stored procedure (i.e. prior to aggregation via code), then separately checked by ERS. The results were accurate. The SQL generated was stand-alone and did not change the e-Counting system or the data within it in any way. The reports were constructed manually using Excel, cross-checked, converted to PDF, and delivered to the GLA.

4.1.1 Data Structures and SQL

The core data for determining the results is contained in four tables: Contest, Ballot, Vote, and Candidate. The following diagram depicts the relationships between Contest, Ballot, and Vote. The Candidate is referenced via ContestID and CandidateNumber.
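The independent cross-check approach can be sketched against a toy version of this schema. This is a minimal illustration only: the Ballot and Vote table and column names follow the document, but the column types, the sample data, and the use of SQLite rather than SQL Server are all assumptions made for the sketch; the live schema and SQL are not reproduced here.

```python
import sqlite3

# Toy schema modelled on the Ballot/Vote tables described above;
# details beyond the names given in the document are assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Ballot (BallotId INTEGER PRIMARY KEY, ContestId INTEGER);
CREATE TABLE Vote   (BallotId INTEGER, CandidateNumber INTEGER, Choice INTEGER);
""")

# Three ballots, each with a 1st- and a 2nd-preference vote row.
con.executemany("INSERT INTO Ballot VALUES (?, ?)",
                [(1, 9), (2, 9), (3, 9)])
con.executemany("INSERT INTO Vote VALUES (?, ?, ?)",
                [(1, 101, 1), (1, 102, 2),
                 (2, 102, 1), (2, 101, 2),
                 (3, 101, 1), (3, 103, 2)])

# Independent aggregate: 1st-preference votes per candidate, bypassing
# any application-side aggregation code entirely.
rows = con.execute("""
    SELECT CandidateNumber, COUNT(*) AS FirstPrefVotes
    FROM Vote
    WHERE Choice = 1
    GROUP BY CandidateNumber
    ORDER BY CandidateNumber
""").fetchall()
print(rows)  # [(101, 2), (102, 1)]

# Sanity check: per-candidate counts must sum to the ballot total.
total_ballots = con.execute("SELECT COUNT(*) FROM Ballot").fetchone()[0]
assert sum(n for _, n in rows) == total_ballots
```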

The example outputs shown in this document are for Bexley and Bromley. Although the value for Total Number of Ballot Papers Counted was consistent across the reports, we cross-checked it using the following SQL:

The output from this is as follows for Bexley and Bromley:

The SQL to generate the 1st Preference votes per candidate is as follows:

The output from this query, inserted into Excel and then summed, shows the values:

The total values for each of these match the total 1st preference votes presented in the corresponding Constituency Level Mayor Contest Final reports generated from the e-Counting system. The individual candidate values did not match; the ones presented in the system-generated reports were variable. At this point we cross-checked the figures from our independently generated SQL with a very slightly modified version of the stored procedure Dashboard_GetBallots SQL:

This SQL produced the following output from the Results and Messages tabs, the results showing that the number of records for 1st preference is 188,624, which matches the total number from our independent SQL output:

This provided confidence that the underlying data from the stored procedure was accurate. We then extracted the dataset into Excel and removed the BallotId column, leaving the CandidateNumber and Choice (=1) entries. This was then pivoted by CandidateNumber to sum the Choice values, presenting the data shown below:

This matches exactly the data we produced independently for 1st choice preferences. At this stage we could potentially have made the connection with the sort order being incorrect, but everyone was so intensely involved in generating the data that checking for the root cause had been set aside.

The SQL to generate the 2nd Preference votes per candidate is as follows:

The output from this query, inserted into Excel and then summed, shows the values:

We repeated the process of checking these against the slightly modified stored procedure SQL for 2nd preference votes: checking the output, exporting it to Excel, and pivoting by candidate number:

This matches exactly the data we produced independently for 2nd choice preferences.

The SQL to generate the raw data for the matrix of counts of 2nd preference votes per candidate, based on the 1st preference candidate vote, is as follows:

An extract from the output of this SQL is as follows:

This SQL was executed for each constituency, and the data was copied into a text file and delivered to ERS, who then checked the data for consistency. The output was also pasted into an Excel spreadsheet and totals were calculated per row, producing the output seen in the extract below.

The highlighted cells above show the count of 2nd preference votes for candidate 1 (Sian Rebecca BERRY), which totals 28,960. The columns were summed to provide a count of the total number of good 2nd preference votes for the constituency:

The overall number of 2nd preference votes for Bexley and Bromley is 163,518. It is also noted that the grand total of 2nd preference votes across all constituencies is 2,212,718.
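The spreadsheet steps used in this section (pivoting by CandidateNumber, building the matrix, then summing rows and columns) can be sketched in a few lines. This is a toy illustration only: the data is invented, and the explicit join on BallotId used to form the matrix is an assumption about the logical operation being performed, not the live SQL.

```python
from collections import Counter

# Toy stand-in for the extracted (BallotId, CandidateNumber, Choice) rows.
rows = [(1, 101, 1), (1, 102, 2),
        (2, 102, 1), (2, 101, 2),
        (3, 101, 1), (3, 102, 2),
        (4, 101, 1), (4, 103, 2)]

# Pivot: count 1st-preference votes per candidate (the Excel pivot step).
first_prefs = Counter(cand for _b, cand, choice in rows if choice == 1)

# Matrix: 2nd-preference counts by 1st-preference candidate. Joining each
# ballot's two rows on BallotId makes this independent of row order.
first_by_ballot  = {b: cand for b, cand, choice in rows if choice == 1}
second_by_ballot = {b: cand for b, cand, choice in rows if choice == 2}
matrix = Counter((first_by_ballot[b], second_by_ballot[b])
                 for b in first_by_ballot)

# Row totals (2nd-preference votes received per candidate) and grand total.
row_totals = Counter()
for (_first, second), n in matrix.items():
    row_totals[second] += n
grand_total = sum(matrix.values())

print(dict(first_prefs))  # {101: 3, 102: 1}
print(dict(row_totals))   # {102: 2, 101: 1, 103: 1}
print(grand_total)        # 4 -- one 2nd preference per ballot
```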

4.2 Manually Created Result Example

The abbreviated screenshot below shows the correct data extracted.

5 Investigation following Count Day

5.1 Data Analysis

Further analysis of the data established that, for each candidate, the sum of the votes for 1st and 2nd preference was always constant:

Brent & Harrow Final Report taken at 16:56:22

Brent & Harrow Final Report taken at 17:11:44

It can be seen that for the first candidate the total votes for 1st + 2nd preference always add up to 31,765. Similarly, for the second candidate the total is always 3,321.

5.2 System Diagnosis and Root Cause Analysis

Backups of the databases were restored to test systems and the code was debugged to identify the root cause of the problem. It was established that the construction of the vote matrix relies on the dataset returned from a stored procedure being in a specific order: ordered by BallotID (a unique sequential identifier for individual ballot papers, which is a key into the table to identify other attributes of the Ballot, e.g. barcode, batch, etc., and to enable linking to other tables such as Contest and Vote). However, the stored procedure Dashboard_GetBallots, which extracts the data for the construction of the vote matrix, does not specify an ORDER BY clause, and this can cause the code method to calculate the 1st and 2nd preference values incorrectly. The source code repository shows that the specific code module, stored procedure, underlying tables, and indexes have not been modified since 2012. SQL Server does not guarantee the order of dataset output unless there is a specific ORDER BY clause (https://msdn.microsoft.com/en-us/library/ms188385.aspx). Extract:
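The order dependence described in this section can be illustrated with a small simulation. This is a sketch only: the real aggregation code is not reproduced here, and the assumption that it pairs each ballot's 1st- and 2nd-preference rows by their position in the dataset is illustrative of how an implicit ordering assumption can fail; the "parallel scan" is likewise simulated by concatenating two halves of the data in a different order.

```python
from collections import Counter

def build_matrix(rows):
    """Toy vote-matrix builder: counts 2nd-preference votes per
    1st-preference candidate. It implicitly assumes rows arrive
    ordered by BallotId, so that each ballot's Choice=1 row is
    immediately followed by its Choice=2 row."""
    matrix = Counter()
    for (_b1, c1, _ch1), (_b2, c2, _ch2) in zip(rows[0::2], rows[1::2]):
        matrix[(c1, c2)] += 1  # (1st-pref candidate, 2nd-pref candidate)
    return matrix

# Three ballots as (BallotId, CandidateNumber, Choice), ordered by BallotId.
ordered = [(1, 101, 1), (1, 102, 2),
           (2, 101, 1), (2, 103, 2),
           (3, 102, 1), (3, 101, 2)]
expected = build_matrix(ordered)

# Simulate a parallel scan: two workers each scan half the table and the
# partial results are concatenated in whichever order the workers finish.
chunk_a, chunk_b = ordered[:3], ordered[3:]
reordered = chunk_b + chunk_a
wrong = build_matrix(reordered)

# The grand total is unchanged, but individual cells are misattributed --
# matching the symptom in the reports (constant totals, varying splits).
assert sum(wrong.values()) == sum(expected.values())
assert wrong != expected

# Re-imposing the expected order (the effect of adding ORDER BY BallotId,
# here with the 1st-preference row first within each ballot) restores it.
assert build_matrix(sorted(reordered, key=lambda r: (r[0], r[2]))) == expected
```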

5.2.1 Stored Procedure: Dashboard_GetBallots

It can be seen from the above that the query does not specify an ORDER BY clause. The stored procedure was modified in the test environment to include an ORDER BY clause as follows:

ORDER BY BallotId

The final reports were then re-run several times, and every time the data matched the output produced manually on the night of the count. This demonstrates that the root cause of the issue is that the data was not sorted as expected when retrieved from the database.

5.2.2 Investigation into why data order is modified

Further investigation showed that there had been no significant changes to the code or database in these areas since 2012, so we looked into what could cause the data order to change. The investigation identified several scenarios, reported on SQL Server forums, in which this can occur:

1. An unordered index scan might be carried out in either allocation order or key order, dependent on the isolation level in effect.
2. The "merry-go-round scanning" feature allows scans to be shared between concurrent queries.

3. Parallel plans are often non-deterministic, and the order of results might depend on the degree of parallelism selected at runtime and the concurrent workload on the server.
4. If the plan has nested loops with unordered pre-fetch, the inner side of the join can proceed using data from whichever I/Os happened to complete first.

The most relevant point above is #3. SQL Server settings for parallelism can affect the execution plan; if parallelism is used, the load of the query is spread across multiple processors, and when the partial results are aggregated together the chance that the order is uncertain increases. There are two main settings related to parallelism in this context:

1. Max Degree Of Parallelism (MDOP): https://support.microsoft.com/en-gb/kb/2806535
This limits the number of processors used in parallel plan execution. There are three sets of values that cause differences in operation:
0: utilises as many processors as are available, and therefore may use parallelism.
1: utilises only 1 CPU, effectively disabling parallelism.
2-n: utilises as many processors as specified (if available), and therefore may use parallelism.

2. Cost Threshold For Parallelism: https://technet.microsoft.com/en-us/library/ms188603(v=sql.105).aspx
This specifies a value that is compared to the estimated cost of the query to determine whether or not to use a parallel plan. It is used in conjunction with MDOP.

The effect of changing these settings can be seen in the execution plans for the core SQL of the Dashboard_GetBallots stored procedure.

No Parallelism (MDOP=1):

Parallelism (MDOP=2):

The 2016 setting for Max Degree Of Parallelism was 2. The SQL Servers used in the e-Counting system were all virtual servers, each with 4 vCPUs. This setting is in line with Microsoft's recommended guidelines. Using the setting MDOP=1 (disabling parallelism) and running the reports with the original version of the stored procedure produced the correct results.

To summarise the sort order investigation: there is a missing ORDER BY clause in the stored procedure, and had it been in place it would have guaranteed the correct report data under any of these circumstances. The SQL Server MDOP settings were different in 2016 from 2012, and this is believed to have caused the data order change and the problems found in the reports.

5.3 Previous Testing Coverage and comparison

The results from UAT testing were analysed in detail to check whether this specific issue occurred during UAT; it did not, and all reports were accurate. The small datasets used in UAT and Kit Readiness did not trigger parallelism, because their estimated execution cost did not exceed the cost threshold; the query execution plan was therefore simpler and did not modify the order of the data.

Load testing was also undertaken. Its aim was to confirm that the system would support the required transaction throughput levels and to ensure that there were no application, server, or network performance bottlenecks. Random images were processed by the test robots, so at no point was there a set of known results to analyse. However, similar investigations on that dataset, with MDOP settings enabling parallelism, show that the issue would have been present.

5.4 Analysis of Recent System Changes

Other than the QA Scanning modifications, no software changes have been made since 14-Oct-15 (version 5.1.10.8), prior to UAT, and the gold build was produced from this version on 13-Jan-16. Security patches were applied to servers up to 23-Feb-16. These are listed here: https://drsmk.sharepoint.com/docs/_layouts/15/docidredir.aspx?id=drsdoc-106-20785

All desktops and scanners retained the system levels as of UAT. The SQL Server version was 10.50.4042 for UAT and was upgraded to 10.50.6220 prior to Kit Readiness. The change to the SQL Server MDOP configuration setting resulted from setting up SQL Server in 2015 and 2016 according to industry best-practice guidelines. We cannot determine the guidelines followed in 2012, as we have not retained the originally installed systems from that time and this setting is not preserved in backups.