[Content warning: mild blasphemy.]
Thanks to my friend Mike Winston for the discussion that inspired this post.
Look around the room. Let’s say, for argument’s sake, that you don’t see anyone else. What is the expected value of the number of people in the room, given everything you know?
“Well one, obviously,” you say. “There’s just me. No one else is here.”
Have you considered the possibility that someone’s hiding under the bed? What’s the probability of that? One in a million?
“Sure, fine,” you say. “If you want to be pedantic, it’s more like 1.000001.”
Are you sure? Have you considered the possibility that several of your friends snuck into your closet with the intention of throwing a surprise party?
“Oh, come on. I know there are a bunch of possibilities I haven’t considered, but there’s no way that if I add everything up I’ll get anything larger than 1.1. Unless you think there’s some minuscule chance that there are somehow infinitely many-”
Nah, that’s boring. If you want I’ll guarantee you that there is a positive, finite number of people in the room. You think the answer will come out to less than 1.1?
“Yes, obviously. You’d be crazy to say otherwise.”
Everyone learns in school that there are infinitely many natural numbers, that there is no limit to how large a number can be. Few people appreciate this fact.
A googol is $10^{100}$ — one followed by a hundred zeros. This is a big number: it is larger than the number of atoms in the universe. A googolplex is one followed by a googol zeros. This is a bigger number: it is more than the number of ways that you could rearrange all the atoms in the universe. But even a googolplex does not come close in size to Graham’s number.
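The first of these sizes is at least machine-checkable. A quick Python sketch, just to make “one followed by a hundred zeros” concrete:

```python
# A googol really is "1" followed by a hundred zeros.
googol = 10 ** 100
assert len(str(googol)) == 101
assert str(googol) == "1" + "0" * 100

# A googolplex is 10**googol; even storing its decimal digits is hopeless,
# since that would take a googol digits -- far more than atoms in the universe.
```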
As my friend Mike put it: “Writing Graham’s number down in the tiniest font would flood the universe many times over. Writing down the number of universes it would flood would flood the universe many times over. And so on. In fact, writing down how many layers of universes get flooded would flood the universe many times over.” The human mind is simply not equipped to understand the sheer enormity of Graham’s number.
What is the probability that the number of people in your room is equal to Graham’s number?
“Zero. There is no chance, absolutely none, that there are Graham’s number people in this room.”
Why do you say that?
“What do you mean? There are only several billion people. And even if there were Graham’s number people, how could they possibly fit in the universe, let alone this room?”
Well, how certain are you that you don’t have some fundamental misconceptions that falsely lead you to the conclusion that there are not Graham’s number people in your room? Yes, the universe would need to be way larger than you think it is. And yes, your room would need to be way larger than you think it is, or maybe some people would need to be extremely tiny. But these are possibilities that, while extremely unlikely, are not literally impossible.1
For instance, what probabilities do you assign to the following three propositions? (1) I’m God; (2) conditional on me being God, I have the ability to put arbitrarily many people into your room; (3) conditional on me being God and having that ability, I did in fact pack Graham’s number people under your bed just to make a point about expected values. The probability that you have Graham’s number people in your room is at least the product of these three probabilities.
What probabilities do you assign to these three events? It seems unlikely that I’m God; maybe the probability of that is $10^{-11}$. (Maybe you think there’s a one-in-ten chance that there is a God whose physical form is a human on Earth, and a $10^{-10}$ chance (one in ten billion) that I’m that human.) Maybe you think (2) has a one-in-ten probability (after all, there’s a decent chance that God, if they exist, is nearly omnipotent) and (3) has a one-in-a-million probability. Maybe these are overestimates, but the point is that you should assign some positive probability to each assertion, and that the product of these probabilities is tiny (perhaps $10^{-18}$, perhaps even smaller), but certainly much, much larger than one in Graham’s number.
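To make the arithmetic explicit, here is the product of three illustrative guesses for those probabilities, sketched in Python (the specific numbers are assumptions for illustration, nothing more):

```python
from fractions import Fraction

# Illustrative guesses for the three propositions -- assumptions, not measurements.
p_god     = Fraction(1, 10**11)  # (1) the author is God
p_ability = Fraction(1, 10)      # (2) given (1), God can fill the room arbitrarily
p_did_it  = Fraction(1, 10**6)   # (3) given (1) and (2), God actually did it

lower_bound = p_god * p_ability * p_did_it
assert lower_bound == Fraction(1, 10**18)
# Tiny -- but still unimaginably larger than one over Graham's number.
```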
“Okay, hold on,” you say, “your argument seems to make sense, but something doesn’t seem right. Let $p_n$ be the probability that there are $n$ people in this room. Then $\sum_{n=1}^{\infty} p_n = 1$. You can’t just go around assigning nontrivial probabilities to large numbers willy-nilly if you want all the probabilities to add up to one. In fact you’re only allowed to assign a probability of $\epsilon$ or more to at most $1/\epsilon$ numbers. What makes Graham’s number so special?”
This is a great question, and the answer is that Graham’s number is really unique among numbers of its size: for how large it is, Graham’s number is really simple, in the sense that it is really easy to formally describe. Compare for instance the two numbers

84,932,841,705,629,381,047,562,190,384,756 and 10,000,000,000,000,000,000,000,000,000,000.

They are about equally large, but it turns out that the second number is much simpler: it is simply $10^{31}$ (whereas the first number was the result of me randomly mashing my keyboard).
Graham’s number is astoundingly simple for its size. We’re talking about a number whose digits, if written down, would flood the universe many times over (a number of times that would itself flood the universe many times over), and yet the number is precisely characterized by a short Wikipedia article.
(How do we formally define the simplicity of a number? The most commonly used metric is Kolmogorov complexity, which is the length of the shortest computer program that outputs the number. The smaller the Kolmogorov complexity of a number, the simpler it is.2)
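Kolmogorov complexity is uncomputable in general, but the idea is easy to taste by using the length of a Python expression as a crude stand-in for “length of the shortest program” (an illustration, not the formal definition):

```python
# Two numbers of the same size, with wildly different description lengths.
# (Python expression length is a toy proxy for Kolmogorov complexity.)
simple = 10 ** 64   # described by the 6 characters "10**64"
mashed = 83942861057293846012874651930572846193057284619305728461930572846
# ^ 65 arbitrary digits: no description much shorter than the digits themselves

assert len(str(simple)) == len(str(mashed)) == 65   # equally large numbers
assert len("10**64") < len(str(mashed))             # but one is far simpler to describe
```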
Now suppose we randomly pick some integer $N$ of the same order of magnitude as Graham’s number. The probability that the number of people in your room is equal to $N$ really is tiny: probably less than one divided by Graham’s number. Graham’s number is special in having such a short description; $N$, on the other hand, has a shortest description which would not fit in the universe. Maybe, as God, I would decide to pick a random number on the same scale as Graham’s number and put that many people in your room. But there are so many numbers on the same scale as Graham’s number that the probability that I’ll pick $N$ specifically is extremely tiny: much tinier than $10^{-18}$. Hypothesizing that I’ll put $N$ people in your room is an example of the “privileging the hypothesis” fallacy; hypothesizing that I’ll put Graham’s number people in your room is not.
“Okay, well that’s just weird. If the probability that there are Graham’s number people in this room is $10^{-18}$, that makes the expected number of people in the room at least Graham’s number divided by $10^{18}$, which is ridiculously large. Maybe people’s intuition for expected value is just terrible.”
Indeed, the issue here is that our intuitions are poorly equipped for super low-probability events. We tend to ignore events whose probabilities are unfathomably small (like $10^{-18}$), and on the whole this is pretty reasonable: devoting brainspace to such unlikely events almost always ends up being a waste. The possibility that there are Graham’s number people in your room is borne out in a ridiculously tiny fraction of universes.
“So, it seems we have a lower bound of Graham’s number divided by $10^{18}$ for the expected number of people in my room. But what do you think the expected value actually is?”
The expected value of a finite random variable (a random variable is a quantity whose value you’re unsure of) can be infinite. This might at first be counterintuitive. But suppose that we have a random variable $X$ that equals $2^n$ with probability $2^{-n}$ for each positive integer $n$ (yes, those probabilities add up to one). Then the expectation of $X$ is

$$\mathbb{E}[X] = \sum_{n=1}^{\infty} 2^n \cdot 2^{-n} = \sum_{n=1}^{\infty} 1 = \infty.$$
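Here is a small numerical sketch of that example (the variable that equals $2^n$ with probability $2^{-n}$): truncating the sum at $n$ terms gives an expectation of exactly $n$, which grows without bound.

```python
from fractions import Fraction

def truncated_expectation(max_n):
    """Expectation of X restricted to the outcomes 2^1, ..., 2^max_n,
    where Pr[X = 2^n] = 2^-n. Each term 2^n * 2^-n contributes exactly 1."""
    return sum(Fraction(2) ** n * Fraction(1, 2) ** n for n in range(1, max_n + 1))

assert truncated_expectation(10) == 10
assert truncated_expectation(1000) == 1000  # keeps growing linearly, forever

# Meanwhile the probabilities themselves still sum to (at most) one:
assert sum(Fraction(1, 2) ** n for n in range(1, 60)) < 1
```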
So we shouldn’t rule out the possibility that the expected number of people in your room is infinite. But is it?
I’m going to use this question as an excuse to talk about something called Solomonoff induction. If you have spare time, I strongly recommend reading this explanation of Solomonoff induction by Alex Altair and Luke Muehlhauser. I’ll explain the concept in brief below, but their explanation is far better and more thorough.
You’ve probably heard of Occam’s razor: it is the idea that given some set of explanations for what you observe in the world around you, the simplest of the explanations is likely the right one. This makes a lot of sense: why add unnecessary assumptions about how the world works? After all, if you can explain something with one assumption, that’s a more likely explanation than if you need ten different assumptions to all be true. For instance, we observe that some celestial objects revolve around the Sun and also that apples fall toward the center of the Earth. We could postulate that there are two different laws of physics: that things tend to fall toward the center of the Earth, and that the Sun has some sort of aura that causes things that are close to it to rotate around it. Or we could note that both observations can be explained by one simple law (the universal law of gravitation). Fewer assumptions is better because, a priori, the probability that you should assign to a bunch of assumptions all being true is less than the probability that you should assign to one assumption being true.
Solomonoff induction is just a formalized version of Occam’s razor. Like Occam’s razor, the purpose of Solomonoff induction is to separate out the most plausible explanations for a set of observations. But unlike Occam’s razor, Solomonoff induction actually puts a number on how plausible each explanation is. Solomonoff induction posits that the universe is computable, meaning that one can encode the universe into a string of bits and then run a computer program that would simulate it perfectly.3 The way the induction works is: take all encodings of all conceivable universes. Now, throw out all universes that are inconsistent with your observations about the universe you live in. You’re left with a tiny fraction of the universes you started with; these are the universes that are plausibly ours. Now, which universe do you actually live in? Well, you don’t know — they’re all plausible, based on your observations! However, the ones with shorter encodings are more likely: each bit of the encoding of a hypothetical universe adds additional complexity; it is like an additional assumption about the structure of the universe. More concretely, the probability of living in a particular universe whose encoding has length $\ell$ is proportional to $2^{-\ell}$ (up to a constant factor).4
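The update rule can be sketched on a toy scale. Below, each “universe” is just a named hypothesis with an invented encoding length; we discard the inconsistent ones and weight the survivors by $2^{-\text{length}}$. This is a cartoon of the procedure, not the real (uncomputable) Solomonoff induction, and all names and lengths here are made up:

```python
from fractions import Fraction

# Toy hypothesis space: each "universe" has an invented encoding length
# and a flag saying whether it is consistent with our observations.
universes = [
    {"name": "U1", "length": 3, "consistent": True},
    {"name": "U2", "length": 5, "consistent": True},
    {"name": "U3", "length": 2, "consistent": False},  # ruled out by observation
]

# Keep only consistent universes; weight each by 2^-length and normalize.
weights = {u["name"]: Fraction(1, 2) ** u["length"]
           for u in universes if u["consistent"]}
total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}

# The shorter surviving encoding (U1, length 3) is 2^(5-3) = 4x as likely as U2.
assert posterior["U1"] == 4 * posterior["U2"]
assert sum(posterior.values()) == 1
```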
While Solomonoff induction is completely useless for actually trying to figure out what universe we are in (how can you simulate every single possible universe?), it can still be used to draw some interesting conclusions.
In particular, let’s return to the question of the expected number of people in your room. We can make some really interesting observations. First, consider a number $N$ that is enormously complex (like the aforementioned random integer of the same order of magnitude as Graham’s number). The probability that you have exactly $N$ people in your room is extremely tiny. In particular, it is upper bounded by some constant times $2^{-K(N)}$, where $K(N)$ is the Kolmogorov complexity of $N$, defined earlier. Indeed, if this were not the case — if the probability were much larger — then there would be a comparatively simple universe that resulted in your room having $N$ people. But then there would be a pretty simple representation of $N$. In particular, the following computer program would output $N$: “Run [universe]. Output the number of people in [reader]’s room at [time].” But this is impossible, because by assumption there is no short program that outputs $N$! Essentially, this correspondence shows that for any integer $N$, any universe that results in there being $N$ people in your room has a complexity of at least $K(N)$ minus some (large) constant. Now, via Solomonoff induction, this means that the probability that there are $N$ people in your room is at most $2^{-K(N)}$ times some (large) constant.
But I think that there is also a similar lower bound on the probability that there are $N$ people in your room! In particular, pick some universe $U$ and then modify $U$ into a universe $U'$ by taking the physical laws of $U$ and hard-coding a clause that says “Oh by the way, at [time], [reader]’s brain gets hacked so they don’t notice anything, the laws of the universe change so as to allow arbitrary numbers of people to be in a room, and $N$ people appear in the room.” This modified universe $U'$ is conceivable (unless it is definitionally — not just physically — impossible for $N$ people to be in your room), and in fact is consistent with your observations. For any universe $U$ consistent with your observations, this derived universe $U'$ has complexity equal to at most the complexity of $U$ plus $K(N)$ plus some (large) constant. That means that the probability that there are $N$ people in your room has a lower bound equal to $2^{-K(N)}$ times some (small positive) constant.
In other words, there exist constants $c > 0$ and $C$ such that for all $N$, the probability that there are $N$ people in your room is between $c \cdot 2^{-K(N)}$ and $C \cdot 2^{-K(N)}$.
But in fact there is nothing special about the quantity “number of people in your room.” The reasoning we’ve gone through holds for basically any quantity.5 Formally, then, for all random variables $X$ there exist positive numbers $c$ and $C$ such that the probability that $X = N$ is between $c \cdot 2^{-K(N)}$ and $C \cdot 2^{-K(N)}$ for every $N$. And that, in my view, is a rather remarkable conclusion to arrive at.
(A small note to avoid any confusion: note that $c$ is really small and $C$ is really large. This means that for small values of $N$, like 1, 2, or even $10^9$, these bounds don’t tell you very much — they tell you that the probability that $X = N$ is somewhere between a ridiculously small number and one.6 This is why these bounds aren’t really noticeable for reasonably-sized values of $N$, i.e. why it’s not more likely that there is one person in the world than that there are exactly 7.53 billion people in the world. However, we do notice this effect if we take a more macro view: Kolmogorov complexity is one way of looking at why a number like 1,000,000 is a more common answer to real-world numerical questions than a number like 1,392,847 is.)
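One computable (and very rough) proxy for this simplicity effect is compressibility: the digit string of a round number compresses far better than an arbitrary one. A sketch using zlib purely to illustrate the idea, not as actual Kolmogorov complexity:

```python
import random
import zlib

random.seed(0)

round_digits = str(10 ** 1000)  # "1" followed by a thousand zeros
arbitrary_digits = "".join(random.choice("0123456789") for _ in range(1001))

compressed_round = len(zlib.compress(round_digits.encode()))
compressed_arbitrary = len(zlib.compress(arbitrary_digits.encode()))

# Equally long digit strings, wildly different "description lengths":
assert len(round_digits) == len(arbitrary_digits) == 1001
assert compressed_round < compressed_arbitrary
```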
Now we can finally answer the question: what is the expected number of people in your room (conditioned, as previously mentioned, on this number being positive but finite)? Well, let $X$ be the random variable equal to the number of people in your room. Then for every $N$ we have

$$\Pr[X = N] \ge c \cdot 2^{-K(N)}.$$

This means that

$$\mathbb{E}[X] = \sum_{N=1}^{\infty} N \cdot \Pr[X = N] \ge c \cdot \sum_{N=1}^{\infty} N \cdot 2^{-K(N)}.$$
It looks like we have a series convergence problem on our hands! If the series converges, then the expectation is some finite, unknowable number; if it diverges, the expectation is infinity.
It turns out that the summation diverges. This is a consequence of a property of Kolmogorov complexity that we’ve already witnessed in the case of Graham’s number: namely, that the Kolmogorov complexity of really huge numbers can be quite small. It turns out that if for every $n$ you define $BB(n)$ to be the largest integer whose Kolmogorov complexity is at most $n$, then the sequence $BB(1), BB(2), BB(3), \dots$ (known as the busy beaver sequence) grows extremely fast — faster than any computable sequence.7
In particular, this means that for some $n_0$, for all $n \ge n_0$ we have $BB(n) \ge 4^n$, and so

$$\sum_{N=1}^{\infty} N \cdot 2^{-K(N)} \ge \sum_{n=n_0}^{\infty} BB(n) \cdot 2^{-K(BB(n))} \ge \sum_{n=n_0}^{\infty} 4^n \cdot 2^{-n} = \sum_{n=n_0}^{\infty} 2^n,$$

which clearly diverges.
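The flavor of this growth is visible even in a heavily restricted, computable toy: take the largest value expressible with at most $n$ characters in a tiny language of digits and exponentiation. (The real busy beaver sequence ranges over all programs and is uncomputable; this sketch only hints at the phenomenon.)

```python
from itertools import product

ALPHABET = "123456789*"  # digits plus "*", so "**" (exponentiation) is available

def largest_value(max_len):
    """Largest integer produced by any valid expression of at most max_len
    characters over ALPHABET. A computable toy, NOT the busy beaver function."""
    best = 0
    for length in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=length):
            try:
                value = eval("".join(chars), {"__builtins__": {}})
            except Exception:  # most character strings aren't valid expressions
                continue
            if isinstance(value, int) and value > best:
                best = value
    return best

assert largest_value(3) == 999        # best 3-character expression: "999"
assert largest_value(4) == 9 ** 9     # "9**9" = 387,420,489 beats "9999"
assert largest_value(5) == 9 ** 99    # one more character, roughly 10^94
```

Each extra character lets the biggest describable number explode; with unrestricted programs, the explosion outpaces every computable function.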
You probably figured that this would be the conclusion we’d come to as soon as we established the lower bound of Graham’s number divided by $10^{18}$. After all, we could have instead used Graham’s number squared, or Graham’s number cubed, and so forth, each time setting a larger and larger lower bound on the expectation. So why did we bother with the formalities of Solomonoff induction and Kolmogorov complexity? Well, first, it gave us a (halfway) rigorous justification for the conclusion. Second, we ended up proving (or at least justifying) a much more remarkable fact: that for essentially any real-world quantity, the probability that the quantity equals any particular value $N$ lies in a range defined by the Kolmogorov complexity of $N$. And third, I hope you found this discussion interesting, even if you aren’t convinced — and if you have any questions, be sure to leave a comment!
Thanks for reading! Next in the series: what does this all mean for disciplines that apply expected values to the real world, such as decision theory and ethics?
1. One reasonable objection is that I haven’t defined “person” and “room” — and that moreover, if the universe is totally different from how we think of it, the concept of a person or room might be meaningless. For the purposes of this argument, though, you can take any definitions you want — so long as it isn’t mathematically impossible for there to be Graham’s number people in your room.↩
2. What programming language do we have in mind? It turns out that the choice of programming language doesn’t matter, in the sense that for any two programming languages there is a constant $c$ such that every number’s complexity in the two programming languages differs by at most $c$. But formally, the Kolmogorov complexity of a number is the length of the shortest encoding of a Turing machine that outputs the number.↩
3. Such an encoding would perhaps consist of the laws of physics and the initial conditions of the universe. It would also need to somehow include all quantum measurements (or at least the ones known to you, the observer), since random quantum “coin flips” influence the universe you observe. One way to do this is just to include your quantum state as part of the encoding.↩
4. Obviously I skipped over many details and justifications in this explanation. If you want more details, see this page. If you still have questions, comment below and I’ll try to answer them as best I can! Note also that there is a discussion to be had about the epistemic status of Solomonoff induction: where does it lie on the spectrum from “crackpot theory no one believes” to “mathematical fact”? I would say that it is a form of reasoning with strong theoretical and empirical justifications, and so I believe that Solomonoff induction, or something similar, is accurate, although I’m not confident in this.↩
5. It probably doesn’t hold for quantities describable in purely mathematical terms, or for quantities whose options are directly limited by your observations, but this discussion probably opens a whole new can of worms.↩
6. I think it starts saying interesting things for values of $N$ around $2^{10^{120}}$, because such values of $N$ take about $10^{120}$ bits to represent, which is also about how many bits it would take to describe our universe.↩
7. A sequence $a_1, a_2, a_3, \dots$ is called computable if there is a Turing machine that, on input $n$, outputs $a_n$.↩