FiveThirtyEight just published an impressive, sophisticated model of the 2020 Democratic primaries. If you’re at all interested in the primaries, take a look — there’s a lot of cool stuff there (they also published a pretty detailed methodology which I also recommend reading).
Conveniently, if you scroll down to the bottom of their forecast and click on “Download forecast” you’ll be able to find the numbers their forecast would have produced dating all the way back to November 2019. This means that I can look at what their forecast would have said on December 28th, the day I published my 2020 predictions. Since FiveThirtyEight’s forecast wasn’t available at the time I made my predictions, my predictions were independent of their numbers (given in the table below).1
| Event | My Probability | PredictIt Probability | FiveThirtyEight Probability |
|---|---|---|---|
| Biden wins Iowa | 0.25 | 0.16 | 0.3 |
| Sanders wins Iowa | 0.21 | 0.36 | 0.24 |
| Warren wins Iowa | 0.1 | 0.11 | 0.15 |
| Buttigieg wins Iowa | 0.27 | 0.25 | 0.26 |
| Klobuchar wins Iowa | 0.13 | 0.07 | 0.03 |
| Biden wins New Hampshire | 0.21 | 0.15 | 0.22 |
| Sanders wins New Hampshire | 0.26 | 0.47 | 0.27 |
| Warren wins New Hampshire | 0.2 | 0.1 | 0.17 |
| Buttigieg wins New Hampshire | 0.21 | 0.19 | 0.29 |
| Klobuchar wins New Hampshire | 0.06 | 0.05 | 0.02 |
| Biden wins Nevada | 0.37 | 0.39 | 0.32 |
| Sanders wins Nevada | 0.33 | 0.37 | 0.29 |
| Warren wins Nevada | 0.09 | 0.08 | 0.18 |
| Buttigieg wins Nevada | 0.1 | 0.07 | 0.16 |
| Klobuchar wins Nevada | 0.05 | 0.04 | 0.02 |
| Biden wins South Carolina | 0.67 | 0.75 | 0.51 |
| Sanders wins South Carolina | 0.15 | 0.1 | 0.19 |
| Warren wins South Carolina | 0.07 | 0.03 | 0.12 |
| Buttigieg wins South Carolina | 0.05 | 0.05 | 0.13 |
| Klobuchar wins South Carolina | 0.02 | 0.02 | 0.01 |
| Biden wins nomination | 0.38 | 0.32 | 0.4 |
| Sanders wins nomination | 0.16 | 0.23 | 0.24 |
| Warren wins nomination | 0.17 | 0.1 | 0.16 |
| Buttigieg wins nomination | 0.19 | 0.11 | 0.16 |
| Klobuchar wins nomination | 0.06 | 0.04 | 0.02 |
A natural question to ask: suppose we treat FiveThirtyEight as the ground truth (i.e. the probability of a contested convention really is 14%, and so on). Who is closer to that ground truth — me or PredictIt? Here’s a simple calculation: we can take the average distance, in percent, from my numbers to FiveThirtyEight’s, and compare it to the average distance from PredictIt’s numbers to FiveThirtyEight’s.
The average distance from PredictIt’s numbers to FiveThirtyEight’s numbers is 8.2%. On the other hand, the average distance from my numbers to FiveThirtyEight’s numbers is 5.3%. So if we treat FiveThirtyEight as the ground truth, by this metric my numbers are 1.55 times more accurate than PredictIt’s.
However there’s a reasonable argument that PredictIt is closer to the ground truth, if you sufficiently believe in the wisdom of crowds. We can compare the average distance from FiveThirtyEight’s numbers to PredictIt’s (8.2%, as previously stated) to the average distance from my numbers to PredictIt’s — this is 5.2% as it turns out. So if we treat PredictIt as the ground truth, by this metric my numbers are 1.57 times more accurate than FiveThirtyEight’s.
(Do you think that the absolute difference between probabilities is a bad measure of distance? If so, I see your point; check out this footnote.2)
When making my predictions in December, after looking at PredictIt’s numbers I made some updated guesses (the numbers in the table above are my original numbers). You might wonder whether, treating FiveThirtyEight as the ground truth, my updated numbers were better or worse than my original numbers. The answer is that after updating, by the absolute difference metric, my new numbers were slightly (1.14 times) worse.3 In other words, updating on PredictIt’s numbers made my numbers (slightly) farther away from FiveThirtyEight’s numbers. In some sense this is to be expected, given that my original numbers were much closer to FiveThirtyEight’s than PredictIt’s numbers were. But on the other hand, maybe you’d expect some sort of average (or perhaps some sort of “smart” average where I update more for things I trust PredictIt more on) to be closer to FiveThirtyEight than either my numbers or PredictIt’s numbers; it turns out that this is not the case.
So what does this all mean? One tentative conclusion is that my numbers are pretty good! In particular, it’s pretty hard to conceive of a world where my numbers are worse than both PredictIt’s and FiveThirtyEight’s. That would mean that the ground truth is somehow closer to both PredictIt and FiveThirtyEight than to something that’s sort of like the average of the two.
I think it’s also possible that my numbers are better than both PredictIt’s and FiveThirtyEight’s. That’s because each has their disadvantages that I can correct for. PredictIt is subject to weird biases, such as overestimating the probabilities of unlikely events and being optimistic about the chances of particular candidates (e.g. Andrew Yang). On the other hand, FiveThirtyEight’s goal is to produce a general quantitative model that works well from election to election. Not overfitting is therefore crucial, which means that FiveThirtyEight needs to construct simple models of variables such as candidate experience and establishment support. Meanwhile, I can tune my probabilities to various factors outside the scope of FiveThirtyEight’s model (e.g. the fact that Barack Obama said he would speak up if it looked like Sanders might get the nomination). The fact that I have these advantages over PredictIt and FiveThirtyEight means that I can at least aspire to outperform both.
Now, certainly I can’t say with any confidence that my predictions are better than both PredictIt’s and FiveThirtyEight’s. I find it quite plausible that FiveThirtyEight’s model is really good and that I should defer to it except in some isolated cases. I also find it plausible that, modulo adjusting for the aforementioned biases, I should defer to PredictIt’s probabilities. But at minimum, I think the above analysis is evidence that my numbers are reasonable, and that it’s at least plausible that they add something to the picture.
1. A few notes. First, I’m defining winning a state to be winning the popular vote in the state, not getting the most delegates. Second, the numbers listed for “X wins nomination” in FiveThirtyEight’s column are actually for “X wins a plurality of delegates”; I’m treating these as equivalent because in most cases they are. Third, FiveThirtyEight’s number for “contested convention” is actually their probability that no one wins a majority of delegates, which I could imagine being different in a few edge cases. (EDIT: See this FiveThirtyEight article.)↩
2. Another reasonable way of measuring the distance between two probabilities is the odds ratio. The motivation for the odds ratio is that the difference between a 1% and a 2% chance feels much larger than the difference between a 51% and a 52% chance; the odds ratio captures this feeling by considering probabilities as odds. A 1% probability is 1:99 odds; a 2% probability is 1:49 odds. So the odds ratio between 1% and 2% is (1/49)/(1/99) = 99/49, which is roughly 2. On the other hand, the odds ratio between 51% and 52% is (52/48)/(51/49), which is roughly 1.04 (close to 1 means the numbers are similar).
If you measure the distance between two probabilities by the absolute value of the log of the odds ratio4, rather than the absolute difference as before, you find that the average distance between my numbers and FiveThirtyEight’s is 0.22; the average distance between my numbers and PredictIt’s is 0.17; and the average distance between FiveThirtyEight’s and PredictIt’s numbers is 0.30. So if we treat FiveThirtyEight as the ground truth, by this metric my numbers are 1.32 times more accurate than PredictIt’s, and if we treat PredictIt as the ground truth, by this metric my numbers are 1.77 times more accurate than FiveThirtyEight’s.↩
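A minimal sketch of this metric, using the footnote’s own 1%-vs-2% and 51%-vs-52% examples:

```python
import math

def log_odds_distance(p, q):
    """Absolute log of the odds ratio between probabilities p and q (symmetric)."""
    odds = lambda x: x / (1 - x)
    return abs(math.log(odds(p) / odds(q)))

# The "large" gap: 1% vs 2% is log(99/49), about 0.70.
print(log_odds_distance(0.01, 0.02))
# The "small" gap: 51% vs 52% is about log(1.04), about 0.04.
print(log_odds_distance(0.51, 0.52))
```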
3. By the log odds ratio metric, my updated predictions were essentially equally close to FiveThirtyEight’s numbers as my original predictions were — about 1.04 times further away.↩
4. Why the log? In some sense, odds ratios are in “product space,” i.e. they are ratios of probabilities and so it’s natural to multiply them, not to add them. If you want to take the average of a bunch of odds ratios, it makes sense to first take their logs.↩
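A tiny illustration of this point, with hypothetical odds ratios: averaging the logs and then exponentiating gives the geometric mean, which respects the multiplicative structure, whereas the plain arithmetic mean does not.

```python
import math

# Hypothetical odds ratios (e.g. from comparing several pairs of forecasts).
ratios = [2.0, 1.04, 1.5]

# Naive arithmetic mean ignores that ratios combine by multiplication.
arith = sum(ratios) / len(ratios)

# Geometric mean: average the logs, then exponentiate back.
geom = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```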