Grading my 2021 predictions

In December 2020, I made 100 probabilistic predictions for 2021. As promised, I’ve come back to evaluate them on two criteria: calibration and personal optimism/pessimism. I also challenged readers to compete with me. More on this later, but first, here are my predictions, color-coded black if they happened and red if they didn’t.

I. US Politics

  1. Jon Ossoff wins his election: 45%
  2. Raphael Warnock wins his election: 60%
  3. Ossoff and Warnock both win their elections: 42%
  4. Democrats hold the Virginia State House: 61%
  5. Andrew Yang is elected mayor of New York: 24%
  6. The average Democratic overperformance in margin in congressional and state legislative elections, as calculated by FiveThirtyEight (see e.g. here), is at least 5%: 21%
  7. …at least 0%: 38% (this resolved to -0.3%, so very close)
  8. …at least -5%: 66%
  9. Major* legislation not directly related to COVID (excluding international agreements) passes: 45%
  10. Major* infrastructure legislation passes: 18%
  11. Joe Biden signs an executive order authorizing a major cancellation of student debt: 59%
  12. Biden is the president of the United States at the end of 2021: 94%
  13. Donald Trump receives a presidential pardon (possibly from himself): 35%
  14. Hunter Biden is charged with a crime: 15%
  15. Donald Trump is charged with a crime: 28%
  16. At least one member of the Senate stops caucusing with the party they are currently caucusing with: 23%
  17. Donald Trump has a TV show or network at some time in 2021: 21%

* For legislation to be considered major, a substantial amount of effort/political capital needs to be spent on it. Major legislation passes on average once every 2-3 years. Examples include the 2009 stimulus bill, Obamacare, and Trump’s 2017 tax law.

II. COVID

  1. I receive my first dose of a COVID vaccine by the end of March: 12%
  2. …the end of April: 34%
  3. …the end of May: 60%
  4. …the end of June: 74%
  5. …the end of July: 82%
  6. …the end of August: 87%
  7. …the end of 2021: 97%
  8. At least 50% of people living in the U.S. receive at least one COVID vaccine dose by the end of 2021: 75%
  9. At least 60%: 58%
  10. At least 70%: 43%
  11. At least 80%: 20%
  12. At least 90%: 4%
  13. Per official statistics, at least 100 thousand Americans die of COVID in 2021: 82%
  14. …at least 200 thousand: 64%
  15. …at least 500 thousand: 25% (this one was very close to being true)
  16. …at least 1 million: 8%
  17. I or one of the seven people I share 25% of my genes with tests positive for COVID: 30%
  18. I test positive for COVID: 5%
  19. I go to my office at Columbia at least once by the end of May: 32%
  20. EC is held at least partially in Budapest: 36%
  21. Canada/USA Mathcamp is held in person (I have no inside information on this): 25%
  22. SPARC is held in person (I have no inside information on this): 38%

III. Miscellaneous

  1. China is involved in an international (counting Taiwan and Hong Kong) conflict that has 1,000 casualties: 9%
  2. A normalization of relationships between Israel and at least one majority-Muslim country is initiated in 2021 during the Biden administration: 48%
  3. Putin is the president of Russia at the end of 2021: 88%
  4. Benjamin Netanyahu is the Prime Minister of Israel at the end of 2021: 57%
  5. Scott Alexander starts publishing again: 85%
  6. Taylor Swift releases her tenth studio album: 65% (45b, won’t be graded — Taylor Swift releases her eleventh studio album: 13%) (ambiguous, since she released re-recordings with new songs, but my designated ambiguous prediction resolver resolved this one positively)
  7. “Folklore” wins a Grammy for Album of the Year: 61%
  8. The third book in the Kingkiller Chronicle has a publication date set by the end of 2021 (the date doesn’t have to be in 2021): 13%
  9. Roger Federer wins a grand slam tournament in 2021: 26%
  10. Someone besides Djokovic, Nadal, and Federer wins a men’s singles grand slam tournament in 2021: 59%
  11. Serena Williams wins a grand slam tournament in 2021: 32%
  12. All women’s singles grand slam tournaments in 2021 are won by different people: 68%
  13. P vs. NP is widely considered resolved by the end of 2021: 1%
  14. A (non-trivial) update on GPT-3 is released: 62%

IV. Personal

A. Academic

  1. I summarize for my blog, or review for a journal, at least 20 papers: 75% (oops)
  2. …at least 30 papers: 65%
  3. …at least 40 papers: 48%
  4. …at least 50 papers: 20%
  5. I attend EC (counts if I go to at least five talks): 74%
  6. The paper I’m writing on aggregating predictions is accepted to a conference or journal: 73%
  7. I write and submit a paper on prediction aggregation and online learning (this is a different one from the one in #59, the previous item): 77% (this happened in February 2022, which was too late)
  8. …and that paper is accepted: 47%
  9. I resolve the “preventing arbitrage from collusion” scoring rules problem: 40%
  10. My scoring rules paper from a while ago finally gets accepted somewhere: 55%
  11. I publish, or begin writing with the intention to publish, a paper following up directly on “No-Regret and Incentive Compatible Online Learning”: 45%
  12. I publish a computer science paper in a conference held in 2021 or a journal edition issued in 2021: 85%
  13. I publish a paper or note on Zipf’s law: 28%

B. Blog

  1. I write 10 or more blog posts in 2021: 92%
  2. I write 20 or more blog posts in 2021: 78%
  3. I write 30 or more blog posts in 2021: 50%
  4. I write 50 or more blog posts in 2021: 9%
  5. The total number of views of my blog in 2021 is at least 5,000: 95%
  6. The total number of views of my blog in 2021 is at least 10,000: 83%
  7. The total number of views of my blog in 2021 is at least 20,000: 64%
  8. The total number of views of my blog in 2021 is at least 50,000: 27%
  9. The total number of views of my blog in 2021 is at least 100,000: 11%
  10. (Intentionally vague to avoid spoilers) I write a blog post about big aliens: 42%
  11. I publish a blog post on setting the right price: 36%
  12. I publish a blog post about Pi: 33%
  13. I publish a blog post about slowly converging series: 40%
  14. I publish a blog post on Zipf’s law: 80%
  15. I publish a blog post on Bayesian injustice: 25%

C. Other

  1. I vote in the Democratic primary of the New York mayoral election: 93%
  2. I rank Andrew Yang first in the Democratic primary of the New York mayoral election: 51% (I ranked Kathryn Garcia first)
  3. I stick to my virtue points system, or some variation, through the end of 2021: 70%
  4. I’m a SPARC staff member in 2021: 31%
  5. I’m a Mathcamp mentor in 2021: 55%
  6. I (co-)run some OBNYC (NYC rationalist) meetup in 2021: 47%
  7. I consider myself a vegetarian at the end of 2021: 29%
  8. I consider myself a vegan at the end of 2021: 4%
  9. I make a donation of at least $500 to a third world poverty charity in 2021: 66%
  10. I make a donation of at least $500 to existential risk/long-term in 2021: 79% (I donated in January 2022, so technically this didn’t happen)
  11. I make a donation of at least $500 to animal welfare in 2021: 23%
  12. I have a tentative plan to take a gap year (or I take a gap year): 24% (an exciting development that I’ll hopefully be writing about!)
  13. I play squash on at least 10 days in 2021: 65%
  14. I play squash on at least 20 days in 2021: 44%
  15. I visit a country that is not Hungary: 23% (Iceland)
  16. I publish a non-academic piece of writing in some publication in 2021: 33%
  17. I read a book in 2021: 65% (The Scout Mindset)
  18. I read at least two books in 2021: 44%
  19. I read at least three books in 2021: 30%

Calibration

The orange line represents perfect calibration. The blue points represent how I did: for example, among predictions in the 60% to 70% bucket, 74% actually happened. The error bars represent a 95% interval around perfection given my sample size: that is, if I’m perfectly calibrated, each blue dot should be within the corresponding error bar 95% of the time.

So, this looks pretty good! Maybe this is slight evidence for underconfidence? But these results are definitely consistent with perfect calibration.
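
For readers who want to build this kind of chart themselves, here is a minimal Python sketch. It is an assumption about the method, not the code behind the figure above: it groups predictions into 10-point buckets, computes the fraction that happened, and finds a central 95% range for that fraction under perfect calibration, treating predictions as independent. The function names and the sample data are hypothetical.

```python
def count_distribution(probs):
    """Exact distribution of how many events happen, assuming independence."""
    dist = [1.0]  # dist[k] = probability that exactly k events have happened so far
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # this event does not happen
            nxt[k + 1] += mass * p     # this event happens
        dist = nxt
    return dist


def central_interval(dist, level=0.95):
    """Lowest and highest counts bounding the central `level` probability mass."""
    tail = (1 - level) / 2
    cumulative = 0.0
    low = high = None
    for k, mass in enumerate(dist):
        cumulative += mass
        if low is None and cumulative > tail:
            low = k
        if high is None and cumulative >= 1 - tail:
            high = k
    if high is None:  # guard against floating-point shortfall in the sum
        high = len(dist) - 1
    return low, high


def calibration_table(predictions):
    """predictions: list of (percent, happened). One row per 10-point bucket."""
    buckets = {}
    for percent, happened in predictions:
        index = min(percent // 10, 9)  # integer percents avoid float bucketing errors
        buckets.setdefault(index, []).append((percent / 100, happened))
    rows = []
    for index in sorted(buckets):
        items = buckets[index]
        n = len(items)
        observed = sum(happened for _, happened in items) / n
        low, high = central_interval(count_distribution([p for p, _ in items]))
        rows.append((index * 10, n, observed, low / n, high / n))
    return rows


# Hypothetical sample, not the real resolutions from this post.
sample = [(60, True), (65, True), (61, False), (68, True), (24, False), (21, False)]
for start, n, observed, low, high in calibration_table(sample):
    print(f"{start}-{start + 10}%: n={n}, observed={observed:.2f}, "
          f"95% range if calibrated=[{low:.2f}, {high:.2f}]")
```

With only a handful of predictions per bucket the ranges are wide, which is why a chart like this can only be weak evidence for or against calibration.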

Personal optimism

For each of my predictions related to personal achievement, I assigned a (private) “importance” score. For example, “my prediction aggregation paper gets published” (#59, item 6 under Academic above) had an importance of 5, whereas “I attend EC” (#58, item 5 under Academic above) had an importance of 1. My total score for the year would then be the sum of the importance scores of all predictions that came true. If my predictions were calibrated in terms of optimism (neither optimistic nor pessimistic about what I’d accomplish), the expected value of my score would be 49.6.
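
The expected score is just the probability-weighted sum of the importance scores. A minimal Python sketch, using only the two importance scores mentioned above (the rest are private, so the full 49.6 cannot be reproduced here); the function name is hypothetical:

```python
def expected_score(predictions):
    """predictions: list of (probability, importance) pairs."""
    return sum(probability * importance for probability, importance in predictions)


# The two examples given in the text: "I attend EC" (74%, importance 1)
# and "my prediction aggregation paper gets accepted" (73%, importance 5).
known = [(0.74, 1), (0.73, 5)]
print(round(expected_score(known), 2))  # 0.74 * 1 + 0.73 * 5 = 4.39
```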

My actual score ended up being 35. This is primarily attributable to not blogging as much as I expected: I only published 12 posts (I predicted 30), and in particular I didn’t post any academic paper summaries (I predicted that I would summarize about 40 papers).

Was I incorrectly calibrated (overoptimistic) or did I get “unlucky”? Or to put it another way: if I am correctly calibrated, how surprised should I be about having only accrued 35 points? This is hard to model, because many of my predictions were correlated. But I made some assumptions, did a simulation, and found that if I were perfectly calibrated, I would get a score of 35 or less in 6% of worlds.
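
The post does not spell out the assumptions behind that simulation, so the following Python sketch is only one way it could be done, not the actual model. It assumes every personal prediction shares a single latent "how productive was the year" factor (a Gaussian copula with one pairwise correlation parameter), which keeps each prediction's marginal probability intact while making outcomes move together. The `correlation` value, the function names, and the sample data are all hypothetical.

```python
import random
from statistics import NormalDist


def simulate_scores(predictions, correlation=0.3, trials=20_000, seed=0):
    """Simulate total importance scores for a perfectly calibrated forecaster.

    predictions: list of (probability, importance), with 0 < probability < 1.
    correlation: assumed pairwise correlation of the latent normal variables.
    """
    rng = random.Random(seed)
    # An event happens when its latent normal falls below this threshold,
    # so each event still happens with exactly its stated probability.
    thresholds = [NormalDist().inv_cdf(p) for p, _ in predictions]
    shared_weight = correlation ** 0.5
    own_weight = (1 - correlation) ** 0.5
    scores = []
    for _ in range(trials):
        shared = rng.gauss(0, 1)  # one draw per simulated year
        score = 0
        for threshold, (_, importance) in zip(thresholds, predictions):
            latent = shared_weight * shared + own_weight * rng.gauss(0, 1)
            if latent < threshold:
                score += importance
        scores.append(score)
    return scores


def fraction_at_most(scores, cutoff):
    """Share of simulated years with a score of `cutoff` or less."""
    return sum(score <= cutoff for score in scores) / len(scores)


# Hypothetical probabilities and importance scores, for illustration only.
sample = [(0.74, 1), (0.73, 5), (0.50, 3), (0.92, 2), (0.78, 4)]
scores = simulate_scores(sample)
print(fraction_at_most(scores, 6))
```

Raising `correlation` fattens both tails of the score distribution, so a low score becomes less surprising; with independent predictions the same shortfall would look like stronger evidence of overoptimism.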

Modeled distribution of scores if I were perfectly calibrated in terms of optimism, compared with my actual score (35).

It’s possible that I got unlucky, but my best guess is that I was in fact improperly calibrated. In retrospect, I really should have predicted that I’d end up blogging less than I forecast.

Mini-contest

In my predictions post, I wrote:

You can email me with your predictions for a subset of my predictions with your own prediction. I’ll judge your predictions against mine using the logarithmic scoring rule. I have the advantage of having chosen the questions, but you have the advantage of having seen my predictions. In theory this gives you the upper hand: for example, you could put down my probabilities for every event except a few where you think I clearly messed up. Consider this an exercise in second-order knowledge: figuring out how much weight to put on my probabilities despite not knowing my reasoning behind them.

Stephen Malina emailed me with five predictions that differed from mine. He outperformed me; our biggest disagreement was on the probability that I would summarize 20 academic papers on the blog. He said 60% compared to my 75%, I reviewed zero papers, and so his prediction was better. I’d say it was legitimately better, as opposed to him getting lucky. Congrats to Stephen!
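
To make the comparison concrete, here is a minimal Python sketch of the logarithmic scoring rule applied to that one disagreement. It assumes the natural logarithm (the base only rescales the scores) and the function names are hypothetical. A forecast earns the log of the probability it assigned to what actually happened, so scores are negative and closer to zero is better.

```python
from math import log


def log_score(probability, happened):
    """Log of the probability assigned to the outcome that occurred."""
    return log(probability if happened else 1 - probability)


def total_log_score(forecasts):
    """forecasts: list of (probability, happened); higher totals are better."""
    return sum(log_score(probability, happened) for probability, happened in forecasts)


# "I summarize at least 20 papers" did not happen.
mine = log_score(0.75, False)      # log(0.25), about -1.386
stephens = log_score(0.60, False)  # log(0.40), about -0.916
print(round(stephens - mine, 3))   # log(1.6), about 0.47 in Stephen's favor
```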

Mike Winer got back to me with thirteen predictions. Most of our disagreement came from Mike being more pessimistic than me about Covid, both in terms of when I would get vaccinated and how many Americans would die. In fact I got vaccinated pretty early and fewer than 500,000 Americans died of Covid, so I ended up doing better. Better luck next time!

Where are my 2022 predictions?

I think there isn’t a lot of value for me in continuing to check my calibration; at this point I’m pretty confident that in general when I say 80%, I really do mean 80%. Other potential biases such as optimism are more interesting, and at some point I may make some personal predictions to further examine whether I have this bias. But I think most of the value of forecasting comes from having informative forecasts. To paraphrase George Washington, “Calibration is easy, young man. Classification is harder.”

And to gauge informativeness, I’d need to compete against other forecasters. I think Metaculus is pretty good at this; I’ve now made some forecasts there, though not many. I’m a bit more excited about prediction markets, where you can gauge how good you are based on how much money you make. I’m particularly excited about Manifold Markets, which I hope to use a lot over the coming years. Hopefully I’ll have a post about that soon!
