Enough Data, Not Enough Model

rogertulee
May 13, 2022

I predicted a VP Leni victory because I followed the data. While we had plenty of social media data, we also knew of many off-model factors that we did not and could not capture. The unanswerable question was whether the model was adequate to predict the electoral outcome. The answer became obvious only after the election, but I took the chance and published my prediction anyway. Hindsight is 20/20.

In the cold introspection of the monumentally wrong prognosticator, I alone am responsible for being wrong. We are also examining where we went wrong so that we may do better in the future and improve our model's predictive power. Scientific knowledge is gathered by those who make mistakes and learn from them. For that reason, we are neither ashamed that we got it wrong nor apologetic for the mistakes. Learning from mistakes is part and parcel of modeling behavior.

The central question is this: how did we get it so spectacularly wrong when we had the largest amount of social media data in the market? There are three likely scenarios. The first is that class DE voters are not on social media, the second is the "Shy-BBM" voter, and the third is the campaign's use of "Dark Social".

The first objection is that social intelligence is not useful for predicting elections in the Philippines because class DE voters do not consume social media the way class ABC voters do, or the way voters do in the US or France, which we cited for the Biden and Macron victories. We considered this scenario but did not take it seriously, because Twitter behavior predicted Duterte's victory in 2016 and his support skewed heavily toward class DE. We reasoned that if Twitter was predictive in 2016, it surely would be in 2022 as well, since the number of internet users in the Philippines had only increased in those six years.

The second scenario is the shy-BBM voter, analogous to the shy-Trump voter. This is someone who supports BBM but does not publicly react on social media for fear of ostracism. These voters wanted to avoid the strong cancel culture pervading the online scene in the lead-up to the elections.

We had not thought of this scenario before making the prediction. While we would like to use it to justify our failure, the sheer number of voters involved is too large for this behavior to be plausible. We are nonetheless examining the possibility and digging deeper into the model to monitor people who consume campaign materials without reacting to them.

The third scenario is dark social: content shared through private channels, such as messaging apps, that public social media monitoring cannot see. We noticed its use in the 2019 campaign season, but we have limited ability to track it. It has greater potential to improve the model's predictions if there is data to model, but so far the evidence has only been anecdotal. However, the non-reactive behavior data described above complements the dark social behavior well, and perhaps a proxy index can be derived from the two.

We will continue the postmortem until we have enough hypotheses to make new mistakes in the next campaign season. The fruit of knowledge is sweet, but only the farmer knows the pain and toil. Until we can harvest a clear and present victory, we will continue the work and share our discoveries with you as we find them.

"Lego People" by Scoobay is licensed under CC BY-NC-SA 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/2.0/?ref=openverse.