Election data quash Marcos' cheating pattern claim

The most useful and solid conclusions can be drawn from election data that are transparent and capable of being corroborated

Gemma Bagayaua Mendoza and Wayne Manuel

8:00:00am May 14, 2016

10:21:14am May 14, 2016

MANILA, Philippines – Now that the hash mismatch issue has been explained, is there any truth to the claim that votes for vice-presidential candidate Ferdinand Marcos eroded in a "distinct pattern" after the error in displaying the character "Ñ" was corrected in the transparency server's results files?

This question triggered a graph-making frenzy online among supporters of Marcos and those of his rival, Camarines Sur Representative Leni Robredo. 

Our own analysis of the unofficial, partial results data from the Commission on Elections (Comelec) mirror server shows that this argument is erroneous. We embedded the spreadsheet with our summaries of various cross sections of the data to help you analyze this issue on your own.  

The Marcos camp said that the alleged “cheating” script was introduced to the transparency server at around 7:30 pm on May 9. After that, they said, Robredo started gaining over Marcos. 

The data, however, show that Marcos' total lead over Robredo kept increasing for an hour after the correction to the "Ñ" in the results files was introduced.

The correction was made so that entries that used "Ñ" would read correctly instead of having "?" in its place. Thus, entries like "SE?ERES" and "OSME?A" would read correctly as "SEÑERES" and "OSMEÑA".

This is why the Comelec earlier described the fix as a "cosmetic change" that did not alter the results. Former Comelec chair Sixto Brillantes Jr, who will represent Marcos in the canvass, also agreed.
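Why can a purely cosmetic fix still change a file's hash? The Python sketch below shows both effects on a made-up two-row excerpt. Forcing text that contains "Ñ" into ASCII with the `replace` error handler turns it into "?". Any single-character change alters a cryptographic hash, while the vote counts stay identical. The excerpt layout, the vote counts, and the use of SHA-256 are assumptions for illustration only. This is not how the Comelec system actually generated or hashed its files.

```python
import hashlib


def ascii_with_placeholders(text: str) -> str:
    # Any character outside ASCII, such as "Ñ", is replaced with "?".
    return text.encode("ascii", errors="replace").decode("ascii")


# Hypothetical results excerpt: names are from the story, vote counts are made up.
fixed_file = "candidate,votes\nSEÑERES,100\nOSMEÑA,200\n"
garbled_file = ascii_with_placeholders(fixed_file)


def sha256_hex(text: str) -> str:
    # Hash the UTF-8 bytes of the file contents (SHA-256 is an assumption here).
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def vote_counts(text: str) -> list[int]:
    # Skip the header row and keep only the numeric vote column.
    return [int(line.split(",")[1]) for line in text.strip().splitlines()[1:]]


print(garbled_file.splitlines()[1:])  # ['SE?ERES,100', 'OSME?A,200']
print(sha256_hex(garbled_file) == sha256_hex(fixed_file))  # False: the hash changed
print(vote_counts(garbled_file) == vote_counts(fixed_file))  # True: tallies unchanged
```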

What we saw in the data

1. Were the new votes Marcos was receiving dropping? Yes. But this can be explained by the overall transmission pattern: very soon after polls closed, the bulk of precincts transmitted their results. This is illustrated in the graph below, which compares transmission times for the past 3 Philippine elections that used the automated election system.

The data show that after polls closed, the initial surge of transmitted precinct results between 6:30 pm and 6:40 pm pushed Marcos' total votes from 1.79 million to 4.78 million. However, his margin over Robredo in new votes shrank in the succeeding 10-minute intervals.

Ten minutes after the big boost, Marcos got an additional one million-plus votes vs Robredo's 984,494. Ten minutes after that, he got only 697,515 additional votes vs Robredo's 613,285.
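A shrinking number of new votes is not the same as a shrinking lead. As long as Marcos received more new votes than Robredo in an interval, his lead still grew. Here is a quick Python check using the figures quoted above for the second interval after the surge:

```python
# Figures from the story: new votes in the interval 20 minutes after the big boost.
marcos_new = 697_515
robredo_new = 613_285

# Marcos' lead changes by his new votes minus Robredo's new votes.
lead_change = marcos_new - robredo_new
print(lead_change)  # 84230
```

A positive result means his lead kept widening in that interval, even though he received fewer new votes than in the interval before.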

Below is a snapshot of our summaries of votes for vice president per 10-minute interval. (You can see the full data summary under Tab A of the spreadsheet embedded below this story.)

VOTES BREAKDOWN. Table shows how many votes Marcos and Robredo received per 10-minute interval


Note that Marcos' lead over Robredo peaked at 8:30 pm at 938,622 votes, a full hour after the big boost, and also about an hour after the hash in the results file changed. This means that the change in the hash had no adverse bearing on his votes: he was still getting a bigger share of the new votes being transmitted.
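Locating a peak like this is mechanical. The running lead is the cumulative sum of each interval's net new votes, so it keeps rising as long as Marcos wins each interval, even while his raw new votes shrink. The Python sketch below uses made-up per-interval figures, not the Comelec numbers, to show where such a peak lands.

```python
from itertools import accumulate

# Hypothetical per-interval new votes; these are NOT the actual Comelec figures.
intervals = ["20:00", "20:10", "20:20", "20:30", "20:40", "20:50"]
marcos_new = [50_000, 40_000, 30_000, 20_000, 10_000, 8_000]
robredo_new = [45_000, 35_000, 28_000, 19_000, 15_000, 12_000]

# Net change in Marcos' lead during each interval.
net = [m - r for m, r in zip(marcos_new, robredo_new)]
# Change in his lead since the start of this window: running sum of the net changes.
lead = list(accumulate(net))
# Position of the largest running lead.
peak_index = max(range(len(lead)), key=lead.__getitem__)

print(net)   # [5000, 5000, 2000, 1000, -5000, -4000]
print(lead)  # [5000, 10000, 12000, 13000, 8000, 4000]
print(intervals[peak_index], lead[peak_index])  # 20:30 13000
```

Marcos' new votes fall in every interval of this toy series, yet his lead still climbs until 20:30, the last interval in which he out-gains Robredo.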

The results data also show that long after Robredo overtook Marcos in the unofficial count at 3 am on May 10, there were periods in the transmission when Marcos received more votes than Robredo, further belying the claim of a "distinct pattern of erosion" in his votes.

2. If the transmission pattern is the reason why Marcos' votes were going down, and even the new votes Robredo was receiving were going down, how was Robredo able to overtake Marcos?

The explanation lies in bailiwicks and transmission rates. Below is a snapshot of the map. Areas colored red favored Marcos, while those colored yellow favored Robredo.

SOURCES OF VOTES. The map shows which vice-presidential candidate got the most votes in each province and key city


If you graph the percentage of precincts in each region that had already transmitted at each point in time, you get the graph below, which shows precinct results accumulating over time. It clearly shows that transmission was slower in the regions that favored Robredo.

As of 10 pm on election day, May 9, the National Capital Region (NCR) was almost done transmitting, with 94.65% of precincts reporting. Transmission in the Ilocos region was at 85.88%, while that in Region II (Cagayan Valley) was at 83.58%.

Altogether, these regions, known bailiwicks of Marcos, had already delivered 4,540,787 votes for him by 10 pm.

Note that the lines break at some points in the graph. A break indicates that no precincts from that region reported during that hour. You can also double-check this graph against the data summaries in the spreadsheet embedded below this story.
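A transmission percentage such as NCR's 94.65% at 10 pm can be derived by counting, for each region, the precincts whose time stamp falls at or before a cutoff, then dividing by that region's total precincts. The Python sketch below assumes exactly that definition and uses made-up regions, totals, and time stamps. Evaluating it at a series of cutoffs traces one curve per region.

```python
from datetime import datetime

# Hypothetical inputs: total precincts per region and each transmitted precinct's time stamp.
total_precincts = {"Region A": 4, "Region B": 5}
transmissions = [
    ("Region A", datetime(2016, 5, 9, 18, 5)),
    ("Region A", datetime(2016, 5, 9, 18, 40)),
    ("Region A", datetime(2016, 5, 9, 21, 15)),
    ("Region B", datetime(2016, 5, 9, 19, 30)),
    ("Region B", datetime(2016, 5, 10, 2, 0)),
]


def percent_transmitted(cutoff: datetime) -> dict[str, float]:
    """Share of each region's precincts whose time stamp is at or before the cutoff."""
    counts = dict.fromkeys(total_precincts, 0)
    for region, stamp in transmissions:
        if stamp <= cutoff:
            counts[region] += 1
    return {region: 100 * counts[region] / total_precincts[region] for region in counts}


print(percent_transmitted(datetime(2016, 5, 9, 22, 0)))
# {'Region A': 75.0, 'Region B': 20.0}
```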

As of 11 am on May 12, Robredo’s lead over Marcos in the Bicol region (Region V) amounted to 1.3 million votes. In Central Visayas (Region VII), she was leading by 700,000 votes. In Western Visayas (Region VI), the gap between her and Marcos was almost 1 million.

Bicol, Robredo's home region, and Western Visayas, home region of administration standard-bearer Manuel "Mar" Roxas II, were both rather slow in transmitting results. Western Visayas had transmitted only 77.99% of results at that time, and has, so far, given Robredo a lead of almost 1 million over Marcos.  

Even as of 3 am on May 10, when Robredo finally overtook Marcos in the count, these regions still had a significant number of untransmitted votes. Only 88% of precincts in Bicol had transmitted at the time, while Western Visayas' transmission rate was only 87.28%.

Compare the graph above with the graph below, which shows when Robredo started overtaking Marcos.

Robredo first took the lead in the VP race at 3:29 am on Tuesday, May 10, the day after election day. At first, her lead was minuscule, a mere 575 votes.

By 3:48 am, with 87.60% of precincts reporting, Robredo had widened her lead: she had 12,899,569 votes while Marcos had 12,890,683, a difference of 8,886.
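The moment of the overtake can be found by scanning cumulative totals in time order for the first snapshot in which Robredo's total exceeds Marcos'. In the Python sketch below, the snapshots are made up for illustration. Only the final line uses the 3:48 am totals quoted in the story, to check the stated margin.

```python
from datetime import datetime

# Hypothetical cumulative snapshots: (time, Marcos total, Robredo total). Made-up numbers.
snapshots = [
    (datetime(2016, 5, 10, 3, 0), 12_000_000, 11_990_000),
    (datetime(2016, 5, 10, 3, 15), 12_100_000, 12_095_000),
    (datetime(2016, 5, 10, 3, 30), 12_200_000, 12_200_600),
    (datetime(2016, 5, 10, 3, 45), 12_300_000, 12_308_000),
]


def first_overtake(rows):
    """Return (time, Robredo's lead) at the first snapshot where she leads, else None."""
    for stamp, marcos, robredo in rows:
        if robredo > marcos:
            return stamp, robredo - marcos
    return None


print(first_overtake(snapshots))  # (datetime.datetime(2016, 5, 10, 3, 30), 600)

# Check of the 3:48 am totals quoted in the story.
print(12_899_569 - 12_890_683)  # 8886
```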

You can examine the votes per region further in Tab B of the spreadsheet. There, the colors indicate whether the net difference between the two candidates' votes in a region favors Marcos (red) or Robredo (yellow). Again, you can view the full data summaries and draw your own conclusions by examining the spreadsheet embedded below.

MARCOS VS ROBREDO. The color of each square on this graph indicates which candidate the region favored. The number is that candidate's lead in the region


At the 9 pm mark on May 9, the number of precincts transmitting dipped to 2,879 nationwide. The number of precincts reporting picked up again at 10:30 pm, as shown in the snapshot below.

Again, you can examine and audit the data by navigating through the spreadsheet embedded below.  

PRECINCTS REPORTING. This table summarizes the number of precinct results received per hour.


There are different reasons for varying transmission rates. Some precincts were unable to transmit immediately because (1) voters were still voting; (2) they encountered signal problems, which are common in far-flung areas; or (3) there was a failure of elections, which means elections in the area will have to be held on another day.

Another factor in transmission delays was the requirement that election officers accommodate voters who were within 30 meters of a precinct at closing time.
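Hourly counts like those in the PRECINCTS REPORTING table need nothing more than each precinct's time stamp. Below is a minimal Python sketch that uses made-up time stamps and assumes each hour is labeled by its start time. The story does not say which convention its table uses.

```python
from collections import Counter
from datetime import datetime

# Hypothetical precinct time stamps (one per precinct row); not real data.
stamps = [
    datetime(2016, 5, 9, 20, 5),
    datetime(2016, 5, 9, 20, 45),
    datetime(2016, 5, 9, 21, 10),
    datetime(2016, 5, 9, 22, 30),
    datetime(2016, 5, 9, 22, 50),
    datetime(2016, 5, 9, 22, 55),
]

# Truncate each time stamp to the start of its hour, then count precincts per hour.
per_hour = Counter(s.replace(minute=0, second=0, microsecond=0) for s in stamps)

for hour in sorted(per_hour):
    print(hour.strftime("%Y-%m-%d %H:00"), per_hour[hour])
```

In this toy series, the 21:00 hour shows the kind of dip described above, with a single precinct, before counts rise again. An hour with no transmissions at all would get no entry.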

Analyze the data and make your own conclusions

We embedded our summaries so you can independently draw your own conclusions. We made the following cross sections of the data:

Tab A summarizes votes for VP at the end of each 10-minute interval. This sheet shows the following: the total votes Marcos and Robredo had at each interval; the new votes each received during the interval; and the difference between Marcos' and Robredo's new votes for that interval. Colors in the spreadsheet indicate who got more votes in that interval.

Tab B summarizes votes for VP every hour for each region. Each number is one candidate's vote increment for that hour minus the other candidate's increment. Colors show who got more votes in that hour (yellow for Robredo, red for Marcos). Darker colors depict a high density of votes for that region at that moment in the unofficial canvassing timeline. Scroll right to see all regions, and scroll down to see later transmissions.

Tab C summarizes total votes received by all candidates per region. The darker the color, the more votes received in that hour.

Remaining tabs show transmission status for each region at each point in time so you can compare the numbers with the votes. These are provided in percentage and in actual number of precincts reporting. Again, scroll to the right to see all regions. Scroll down to see later transmissions.
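The tab summaries described above are essentially group-by operations over precinct rows. The Python sketch below rebuilds a Tab A-style table (10-minute nationwide intervals) and a Tab B-style table (hourly net margin per region) from made-up rows. The row layout, the region names, and labeling each interval by its start time are all assumptions, since the story does not describe the raw file format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical precinct rows: (region, time stamp, Marcos votes, Robredo votes).
# Region names and vote counts are made up for illustration.
ROWS = [
    ("Region A", datetime(2016, 5, 9, 19, 2), 120, 80),
    ("Region A", datetime(2016, 5, 9, 19, 48), 90, 110),
    ("Region B", datetime(2016, 5, 9, 19, 14), 150, 60),
    ("Region B", datetime(2016, 5, 9, 20, 27), 70, 95),
]


def bucket_start(stamp: datetime, minutes: int) -> datetime:
    """Round a time stamp down to the start of its interval (minutes must divide 60)."""
    floored = stamp.minute - stamp.minute % minutes
    return stamp.replace(minute=floored, second=0, microsecond=0)


def color(margin: int) -> str:
    """Sheet colors: red when Marcos got more votes, yellow when Robredo did."""
    if margin > 0:
        return "red (Marcos)"
    if margin < 0:
        return "yellow (Robredo)"
    return "tie"


def tab_a(rows, minutes=10):
    """Nationwide new votes per interval, running totals, and the interval's color."""
    new = defaultdict(lambda: [0, 0])
    for _region, stamp, marcos, robredo in rows:
        bucket = new[bucket_start(stamp, minutes)]
        bucket[0] += marcos
        bucket[1] += robredo
    table, marcos_total, robredo_total = [], 0, 0
    for start in sorted(new):
        m_new, r_new = new[start]
        marcos_total += m_new
        robredo_total += r_new
        margin = m_new - r_new
        table.append((start.strftime("%H:%M"), m_new, r_new, margin,
                      marcos_total, robredo_total, color(margin)))
    return table


def tab_b(rows):
    """Per region and hour: the leader's net margin and the matching color."""
    net = defaultdict(int)
    for region, stamp, marcos, robredo in rows:
        net[(region, bucket_start(stamp, 60))] += marcos - robredo
    return [(region, hour.strftime("%H:00"), abs(margin), color(margin))
            for (region, hour), margin in sorted(net.items())]


for row in tab_a(ROWS):
    print(row)
for row in tab_b(ROWS):
    print(row)
```

Intervals with no transmissions, such as 19:20 in this toy data, produce no row at all. This mirrors the breaks in the regional transmission graph.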

Note that our analysis is based on the time stamps of each precinct, not on the frequency of updates coming from the Comelec transparency and mirror servers.

The system works this way: new precinct results are simply appended as additional rows in the results file. This means that even if you analyze only the latest version of the file, you should see all the precincts that have transmitted so far, including their time stamps.
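Because rows are only appended, every earlier version of the file is a prefix of the later one. The timeline must still be rebuilt from each precinct's own time stamp rather than from row order. The Python sketch below illustrates both points with two made-up snapshots. The column names and time-stamp format are assumptions, not the actual Comelec file layout.

```python
import csv
import io
from datetime import datetime

# Hypothetical snapshots of an append-only results file; columns are assumptions.
snapshot_early = "precinct,timestamp,marcos,robredo\nP1,2016-05-09 18:40,120,80\n"
snapshot_late = snapshot_early + "P2,2016-05-09 18:35,90,110\nP3,2016-05-09 21:05,60,150\n"


def load(text: str) -> list[dict[str, str]]:
    # Parse CSV text into one dict per precinct row.
    return list(csv.DictReader(io.StringIO(text)))


early, late = load(snapshot_early), load(snapshot_late)

# Append-only: the earlier snapshot's rows are a prefix of the later snapshot's rows.
assert late[: len(early)] == early

# Rebuild the timeline from the latest file alone, ordering by each precinct's
# own time stamp rather than by the order in which rows were appended.
timeline = sorted(late, key=lambda row: datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M"))
print([row["precinct"] for row in timeline])  # ['P2', 'P1', 'P3']
```

Here P2 was appended after P1 but carries an earlier time stamp. This is why an analysis keyed on time stamps can differ from one keyed on how often the servers published updates.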

Note that it is easy to produce graphs, but drawing conclusions based on incomplete and flawed data is both dangerous and irresponsible. As data scientists say, the most useful and solid conclusions can be drawn from data that are transparent and capable of being corroborated. – with Russell Shepherd, Dominic Gabriel Go, and Joben Ilagan/Rappler.com