# Lofty expectations

[Content warning: mild blasphemy.]

Thanks to my friend Mike Winston for the discussion that inspired this post.

***

Look around the room. Let’s say, for argument’s sake, that you don’t see anyone else. What is the expected value of the number of people in the room, given everything you know?

“Well one, obviously,” you say. “There’s just me. No one else is here.”

Have you considered the possibility that someone’s hiding under the bed? What’s the probability of that? One in a million?

“Sure, fine,” you say. “If you want to be pedantic, it’s more like 1.000001.”

Are you sure? Have you considered the possibility that several of your friends snuck into your closet with the intention of throwing a surprise party?

“Oh, come on. I know there are a bunch of possibilities I haven’t considered, but there’s no way that if I add everything up I’ll get anything larger than 1.1. Unless you think there’s some miniscule change that there are somehow infinitely many-”

Nah, that’s boring. If you want I’ll guarantee you that there is a positive, finite number of people in the room. You think the answer will come out to less than 1.1?

“Yes, obviously. You’d be crazy to say otherwise.”

***

Everyone learns in school that there are infinitely many natural numbers, that there is no limit to how large a number can be. Few people appreciate this fact.

A googol is $10^{100}$ — one followed by a hundred zeros. This is a big number: it is larger than the number of atoms in the universe. A googolplex is one followed by a googol zeros. This is a bigger number: it is more than the number of ways that you could rearrange all the atoms in the universe. But even a googolplex does not come close in size to Graham’s number.

As my friend Mike put it: “Writing Graham’s number down in the tiniest font would flood the universe many times over. Writing down the number of universes it would flood would flood the universe many times over. And so on. In fact, writing down how many layers of universes get flooded would flood the universe many times over.” The human mind is simply not equipped to understand the sheer enormity of Graham’s number.

***

What is the probability that the number of people in your room is equal to Graham’s number?

“Zero. There is no chance, absolutely none, that there are Graham’s number people in this room.”

Why do you say that?

“What do you mean? There are only several billion people. And even if there were Graham’s number people, how could they possibly fit in the universe, let alone this room?”

Well, how certain are you that you don’t have some fundamental misconceptions that falsely lead you to the conclusion that there are not Graham’s number people in your room? Yes, the universe would need to be way larger than you think it is. And yes, your room needs to be way larger than you think it is, or maybe some people need to be extremely tiny. But these are possibilities that, while extremely unlikely, are not literally impossible.1

For instance, what probabilities do you assign to the following three propositions? (1) I’m God; (2) conditional on me being God, I have the ability to put arbitrarily many people into your room; (3) conditional on me being God and having that ability, I did in fact pack Graham’s number people under your bed just to make a point about expected values. The probability that you have Graham’s number people in your room is at least the product of these three probabilities.

What probabilities do you assign to these three events? It seems unlikely that I’m God; maybe the probability of that is $10^{-30}$. (Maybe you think there’s a $10^{-20}$ chance that there is a God whose physical form is a human on Earth, and a $10^{-10}$ chance (one in ten billion) that I’m that human.) Maybe you think (2) has a one-in-ten probability (after all, there’s a decent change that God, if they exist, is nearly omnipotent) and (3) has a one-in-a-million probability. Maybe these are overestimates, but the point is that you should assign some positive probability to each assertion, and that the product of these probabilities is tiny (perhaps $10^{-100}$, perhaps $10^{-1000}$), but certainly much, much larger than one in Graham’s number.

“Okay, hold on,” you say, “your argument seems to make sense, but something doesn’t seem right. Let $p(n)$ be the probability that there are $n$ people in this room. Then $\sum_{n = 1}^\infty p(n) = 1$. You can’t just go around assigning nontrivial probabilities to large numbers willy-nilly if you want all the probabilities that add up to one. In fact you’re only allowed to assign a probability of $10^{-1000}$ or more to at most $10^{1000}$ numbers. What makes Graham’s number so special?”

***

This is a great question, and the answer is that Graham’s number is really unique among numbers of its size: for how large it is, Graham’s number is really simple, in the sense that it is really easy to formally describe. Compare for instance the two numbers

$38674689436780789074389608748999708743896089474829$

and

$41745579179292917813953351511015323088870709282081$.

They are about equally large, but it turns out that the second number is much simpler: it is simply $3^{104}$ (whereas the first number was the result of me randomly mashing my keyboard).

Graham’s number is astoundingly simple for its size. We’re talking about a number writing down whose digits would flood the universe many times over (a number of times that would flood the universe many times over), and yet the number is precisely characterized by a short Wikipedia article.

(How do we formally define the simplicity of a number? The most commonly used metric is Kolmogorov complexity, which is the length of the shortest computer program that outputs the number. The smaller the Kolmogorov complexity of a number, the simpler it is.2)

Now suppose we randomly pick some integer $N$ of the same order of magnitude as Graham’s number. The probability that the number of people in your room is equal to $N$ really is tiny: probably less than one divided by Graham’s number. Graham’s number is special to have such a short description; $N$, on the other hand, has a shortest description which would not fit in the universe. Maybe, as God, I would decide to pick a random number on the same scale as Graham’s number and put that many people in your room. But there are so many numbers on the same scale as Graham’s number that the probability that I’ll pick $N$ specifically is extremely tiny: much tinier than $10^{-1000}$. Hypothesizing that I’ll put $N$ people in your room is an example of the “privileging the hypothesis” fallacy; hypothesizing that I’ll put Graham’s number people in your room is not.

***

“Okay, well that’s just weird. If the probability that there are Graham’s number people in this room is $10^{-1000}$, that makes the expected number of people in the room at least Graham’s number divided by $10^{1000}$, which is ridiculously large. Maybe people’s intuition for expected value is just terrible.”

Indeed, the issue here is that our intuitions are poorly equipped for super low-probability events. We tend to ignore events whose probabilities are unfathomably small (like $10^{-100}$), and on the whole this is pretty reasonable: devoting brainspace to such unlikely events almost always ends up being a waste. The possibility that there are Graham’s number people in your room is borne out in a ridiculously tiny fraction of universes.

“So, it seems we have a lower bound of Graham’s number divided by $10^{1000}$ for the expected number of people in my room. But what do you think the expected value actually is?”

***

The expected value of a finite random variable (a random variable is a quantity whose value you’re unsure of) can be infinite. This might at first be counterintuitive. But suppose that we have a random variable $X$ that equals $n$ with probability $\frac{6}{\pi^2} \cdot \frac{1}{n^2}$ (yes, those probabilities add up to one). Then the expectation of $X$ is

$\mathbb{E}[X] = \sum_{n = 1}^\infty n \cdot \frac{6}{\pi^2} \cdot \frac{1}{n^2} = \frac{6}{\pi^2} \sum_{n = 1}^\infty \frac{1}{n} = \infty$.

So we shouldn’t rule out the possibility that the expected number of people in your room is infinite. But is it?

I’m going to use this question as an excuse to talk about something called Solomonoff induction. If you have spare time, I strongly recommend reading this explanation of Solomonoff induction by Alex Altair and Luke Muehlhauser. I’ll explain the concept in brief below, but their explanation is far better and more thorough.

You’ve probably heard of Occam’s razor: it is the idea that given some set of explanations for what you observe in the world around you, the simplest of the explanations is likely the right one. This makes a lot of sense: why add unnecessary assumptions about how the world works? After all, if you can explain something with one assumption, that’s a more likely explanation than if you need ten different assumptions to all be true. For instance, we observe that some celestial objects revolve around the Sun and also that apples fall toward the center of the Earth. We could postulate that there are two different laws of physics: that things tend to fall toward the center of the Earth, and that the Sun has some sort of aura that causes things that are close to it to rotate around it. Or we could note that both observations can be explained by one simple law (the universal law of gravitation). Fewer assumptions is better because, a priori, the probability that you should assign to a bunch of assumptions all being true is less than the probability that you should assign to one assumption being true.

Solomonoff induction is just a formalized version of Occam’s razor. Like Occam’s razor, the purpose of Solomonoff induction is to separate out the most plausible explanations for a set of observations. But unlike Occam’s razor, Solomonoff induction actually puts a number on how plausible each explanation is. Solomonoff induction posits that the universe is computable, meaning that one can encode the universe into a string of bits and then run a computer program that would simulate it perfectly.3 The way the induction works is: take all encodings of all conceivable universes. Now, throw out all universes that are inconsistent with your observations about the universe you live in. You’re left with a tiny fraction of the universes you started with; these are the universes that are plausibly ours. Now, which universe do you actually live in? Well, you don’t know — they’re all plausible, based on your observations! However, the ones with shorter encodings are more likely: each bit of the encoding of a hypothetical universe adds additional complexity; it is like an additional assumption about the structure of the universe. More concretely, the probability of living in a particular universe whose encoding has length $\ell$ is proportional to $2^{-\ell}$ (up to a constant factor).4

While Solomonoff induction is completely useless for actually trying to figure out what universe we are in (how can you simulate every single possible universe?), it can still be used to draw some interesting conclusions.

In particular, let’s return to the question of the expected number of people in your room. We can make some really interesting observations. First, consider a number $N$ that is enormously complex (like the aforementioned random integer of the same order of magnitude as Graham’s number). The probability that you have exactly $N$ people in your room is extremely tiny. In particular, it is upper bounded by some constant times $2^{-\kappa(N)}$, where $\kappa(N)$ is the Kolmogorov complexity of $N$, defined earlier. Indeed, if this were not the case — if the probability were much larger — then there would be a comparatively simple universe that resulted in your room having $N$ people. But then there would be a pretty simple representation of $N$. In particular, the following computer program would output $N$: “Run [universe]. Output the number of people in [reader]’s room at [time].” But this is impossible, because by assumption there is no short program that outputs $N$! Essentially, this correspondence shows that for any integer $n$, any universe that results in there being $n$ people in your room has a complexity of at least $n$ minus some (large) constant. Now, via Solomonoff induction, this means that the probability that there are $n$ people in your room is at most $2^{-\kappa(n)}$ times some (large) constant.

But I think that there is also a similar lower bound on the probability that there are $n$ people in your room! In particular, pick some universe $U$ and then modify $U$ into a universe $U'$ by taking the physical laws of $U$ and hard-coding a clause that says “Oh by the way, at [time], [reader]’s brain gets hacked so they don’t notice anything, the laws of the universe change so as to allow arbitrary numbers of people to be in a room, and $n$ people appear in the room.” This modified universe $U'$ is conceivable (unless it is definitionally — not just physically — impossible for $n$ people to be in your room), and in fact is consistent with your observations. For any universe $U$ consistent with your observations, this derived universe $U'$ has complexity equal to at most the complexity of $U$ plus $\kappa(n)$ plus some (large) constant. That means that the probability that there are $n$ people in your room has a lower bound equal to $2^{-\kappa(n)}$ times some (small positive) constant.

In other words, there exist constants $c_1$ and $c_2$ such that for all $n$, the probability that there are $n$ people in your room is between $c_1 \cdot 2^{-\kappa(n)}$ and $c_2 \cdot 2^{-\kappa(n)}$.

But in fact there is nothing special about the quantity “number of people in your room.” The reasoning we’ve gone through holds for basically any quantity.5 Formally, then, for all random variables $X$ there exist positive numbers $c_1(X)$ and $c_2(X)$ such that the probability that $X = n$ is between $c_1(X) \cdot 2^{-\kappa(n)}$ and $c_2(X) \cdot 2^{-\kappa(n)}$ for every $n$. And that, in my view, is a rather remarkable conclusion to arrive at.

(A small note to avoid any confusion: note that $c_1$ is really small and $c_2$ is really large. This means that for small values of $n$, like $1$, $100$, or even $10^{100}$, these bounds don’t tell you very much — they tell you that the probability that the $X = n$ is somewhere between a ridiculously small number and $1$.6 This is why these bounds aren’t really noticeable for reasonably-sized values of $n$, i.e. why it’s not more likely that there is one person in the world than that there are exactly 7.53 billion people in the world. However, we do notice this effect if we take a more macro view: Kolmogorov complexity is one way of looking at why $3$ is a more common answer to real-world numerical questions than $5,204,602,658,352$ is.)

***

Now we can finally answer the question: what is the expected number of people in your room (conditioned, as previously mentioned, on this number being positive but finite)? Well, let $X$ be the random variable equal to the number of people in your room. Then for every $n$ we have

$c_1(X) \cdot 2^{-\kappa(n)} \le \text{Pr}[X = n] \le c_2(X) \cdot 2^{-\kappa(n)}$.

This means that

$c_1(X) \cdot S \le \mathbb{E}[X] \le c_2(X)\cdot S$

where

$S = \sum_{n = 1}^\infty n \cdot 2^{-\kappa(n)}$.

It looks like we have a series convergence problem on our hands! If the series converges, then the expectation is some finite, unknowable number; if it diverges, the expectation is infinity.

It turns out that the summation diverges. This is a consequence of a property of Kolmogorov complexity that we’ve already witnessed in the case of Graham’s number: namely, that the Kolmogorov complexity of really huge numbers can be quite small. It turns out that if for every $k$ you define $n_k$ to be the largest integer whose Kolmogorov complexity is $k$, then the sequence $\{n_k\}$ (known as the busy beaver sequence) grows extremely fast — faster than any computable sequence.7

In particular, this means that for some $M$, for all $k \ge M$ we have $n_k > 2^{2k}$, and so

$S \ge \sum_{k = M}^\infty 2^{2k} \cdot 2^{-k} = \sum_{k = M}^\infty 2^{k}$,

which clearly diverges.

***

You probably figured that this would be the conclusion we’d come to as soon as we established the lower bound of Graham’s number divided by $10^{1000}$. After all, we could have instead used Graham’s number squared, or Graham’s number cubed, and so forth, each time setting a larger and larger lower bound on the expectation. So why did we bother with the formalities of Solomonoff induction and Kolmogorov complexity? Well, first, it gave us a (halfway) rigorous justification for the conclusion. Second, we ended up proving (or at least justifying) a much more remarkable fact: that for essentially any real-world quantity, the probability that the quantity equals any particular $n$ lies in a range defined by the Kolmogorov complexity of $n$. And third, I hope you found this discussion interesting, even if you aren’t convinced — and if you have any questions, be sure to leave a comment!

Thanks for reading! Next in the series: what does this all mean for disciplines that apply expected values to the real world, such as decision theory and ethics?

1. One reasonable objection is that I haven’t defined “person” and “room” — and that moreover, if the universe is totally different from how we think of it, the concept of a person or room might be meaningless. For the purposes of this argument, though, you can take any definitions you want — so long as it isn’t mathematically impossible for there to be Graham’s number people in your room.

2. What programming language do we have in mind? It turns out that the choice of programming language doesn’t matter, in the sense that for any two programming languages there is a constant $c$ such that every number’s complexity in the two programming languages differs by at most $c$. But formally, the Kolmogorov complexity of a number is the length of the shortest encoding of a Turing machine that outputs the number.

3. Such an encoding would perhaps consist of the laws of physics and the initial conditions of the universe. It would also need to somehow include all quantum measurements (or at least the ones known to you, the observer), since random quantum “coin flips” influence the universe you observe. One way to do this is just to include your quantum state as part of the encoding.

4. Obviously I skipped over many details and justifications in this explanation. If you want more details, see this page. If you still have questions, comment below and I’ll try to answer them as best I can! Note also that there is a discussion to be had about the epistemic status of Solomonoff induction: where does it lie on the spectrum from “crackpot theory no one believes” to “mathematical fact”? I would say that it is a form of reasoning with strong theoretical and empirical justifications, and so I believe that Solomonoff induction, or something similar, is accurate, although I’m not confident in this.

5. It probably doesn’t hold for quantities describable in purely mathematical terms, or for quantities whose options are directly limited by your observations, but this discussion probably opens a whole new can of worms.

6. I think it starts saying interesting things for values of $n$ around $10^{10^{100}}$, because such values of $n$ take about $10^{100}$ bits to represent, which is also about how many bits it would take to describe our universe.

7. A sequence $\{a_i\}$ is called computable if there is an Turing machine that, on input $i$, outputs $a_i$.

## 21 thoughts on “Lofty expectations”

1. Jordan Schneider says:

Let X be the finite amount of money a beggar claims to be able to give you tomorrow if you give him your wallet right now. AFAICT this argument immediately makes you vulnerable to Pascal’s Mugging. Is there some technical detail of Solomonoff induction that prevents this?

Like

1. Eric Neyman says:

(For those unfamiliar, Pascal’s mugging is a thought experiment in which someone tells you “Give me $5 or I’ll torture you Graham’s number years.” Jordan is pointing out that I have argued that the probability that the person follows through on their threat is tiny but still much larger than one in Graham’s number. If a causal decision theorist accepts this, the corollary is that you should give them$5.)

Honestly I don’t know how to deal with Pascal’s mugging. This isn’t very well-thought-out, but here’s a possible argument against giving them $5: in a world where you give such people$5, people are much more likely to Pascal’s-mug you. Since you want to avoid being Pascal’s-mugged, you should precommit to not being blackmailable.

(This feels like something called functional decision theory, whose basic premise is that you should modify yourself into the kind of person who has a high utility. Maybe someone with a better understanding of FDT could comment on whether what I said above makes sense.)

Like

1. Morgan Redding says:

There was a compelling solution to Pascal’s Wager I read somewhere years ago (which is probably relevant to this post as well). The argument goes:

Consider some random variable X, and p(X=t) (your prior). Since the integral of p(t) must be 1, p(t) must fall faster than O(1/t). Hence at some point p(t) must start (and continue) decreasing faster than t increases.

Hence, if we hold the mugger’s “convincingness” (in some sense) equal, at *some* point increasing the cost of the threat starts to *decrease* the expected cost of the threat, since the credibility of the threat falls faster than the cost increases.

We can take it one step further and enforce that our prior has a defined expected value, which means it must fall faster than O(1/t^2), which makes the tradeoff for the mugger more severe, as now every time they double the cost, the prior probability they could follow through on it is cut by more than a factor of 4 (again, asymptotically).

The point at which the asymptotic behavior starts to dominate presumably depends heavily on the particulars of the situation.

Of course, just assuming priors are monotonic “eventually” isn’t really an answer to your particular problem, which is that Graham’s number is (presumably) far more likely than (say) Graham’s number minus 239483713904082.

But I think that it is true that, with the right family of priors (i.e. one that doesn’t view Graham’s number as a radically more likely possibility than the numbers around it), an agent shouldn’t be exploitable by these kinds of Pascal Muggings-type situations.

Like

1. Eric Neyman says:

Hmm so I’m not sure I agree that “we can take it one step further and enforce that our prior has a defined expected value, which means it must fall faster than $O(1/t^2)$.” Are you saying that your prior *should* fall faster than $O(1/t^2)$? I agree that this would be convenient for resolving Pascal’s mugging, but unfortunately I think it’s not the case that the prior falls this quickly (and instead falls as something like $\frac{1}{polylog(t)} 2^{-\kappa(t)}$, which is just a tad faster than $\frac{1}{t}$ — something like the “slowest possible converging series” if that makes sense). (I didn’t write about the polylog term in the post — it comes from something I haven’t yet written about but gestured at in my reply to dionisos’ comment below where I talked about a Turing machine that decides on the length of the Turing machine that simulates the universe.)

Like

2. dionisos says:

I believe the probability that there is a error in this, is much greater than the probability that there is a expected value of a infinity of people in my room.

It is really interesting, but I think you should consider it like a paradox (and not like a fun conclusion), to put you in the correct mindset to find a probable error in it.

Like

1. Eric Neyman says:

Fair enough! Do you have thoughts on where the error is?

Like

1. dionisos says:

Still not, but I am trying to find something.

“The probability that you have exactly N people in your room is extremely tiny. In particular, it is upper bounded by some constant times 2^{-\kappa(N)}, where \kappa(N) is the Kolmogorov complexity of N”

I am really unsure but I think this is false : If the probability that the program encoding the universe is of size L, is equal to 2^-L, then the probability for a particular program of size L to be the one encoding the universe should be a lot less, because there are a lot of programs of size L.

So what I mean is that is should be : P(k(N)=L) = 2^(-k(N)) and not P(N) = 2^(-k(N))
I think P(N) = 2^(-2*k(N))

Liked by 1 person

1. Eric Neyman says:

So first, congratulations, when I posted this to Facebook I wrote that there’s a mistake in this post that ultimately doesn’t interfere with the main point and encouraged people to look for it, and I think you’ve found it. I implicitly assume that the probability of $\kappa(N) = L$ is 1 for all L, but this isn’t the case because the probabilities need to sum to 1. The probability can’t even be uniform over $L$; but maybe it goes as $\frac{1}{L^2}$ of something. You suggest it goes as $2^{-L}$, which could be but I’m not sure why it would be exponentially decreasing; do you mind explaining?

Actually there’s a post I never wrote but might come back to write, which is: what’s the probability of the Turing machine that simulates the universe having complexity L? Well, maybe there’s a random Turing machine that decides this length, so the probability is proportional to $2^{-\kappa(L)}$. This unfortunately (just barely!) doesn’t converge, but you can ask: what about the random Turing machine deciding on the length of the actual Turing machine? Maybe its length is decided by another Turing machine! If you take this to its natural conclusion, you get Turing machines all the way down, and in fact a Turing machine for every (countable?) ordinal, and this gets absolutely crazy and at this point I think I’m fully untethered from reality but I think this is really cool to think about.

(By the way, the reason I say that the mistake doesn’t interfere with the main point is that the “infinite expectation” thing still holds if you replace $2^{-L}$ with e.g. $4^{-L}$ as you suggest.)

Liked by 1 person

1. dionisos says:

Thank you for your reply and for you congratulation, it is very pleasing 🙂

I agree so far none of my objection contradict the main conclusion, and I still didn’t find why it is false.
But it have to be false because, hey, this is my room, I am pretty sure I didn’t invite a arbitrary large expected number of people ! Now everyone go out of my room I will call the police ! aaaaaah !
(ok, feel free to remove that x) )

“You suggest it goes as $2^{-L}$, which could be but I’m not sure why it would be exponentially decreasing; do you mind explaining?”

I think that for Pr(|K(X)| = L), any series whose sum [absolutely?] converge toward 1 is “ok”.
Why choose 2^(-n), and more importantly why choose something exponentially decreasing ?

I am really unsure, maybe we can defend this by considering any added bit should divide the probability of a hypothesis by the same amount, whatever the current size of the theory.
In fact, I was believing you (and others) was choosing this particular series. (“More concretely, the probability of living in a particular universe whose encoding has length \ell is proportional to $2^{-\ell}$ (up to a constant factor).”)
But now I am really unsure.

After that I think it is natural to consider than any hypothesis X for which K(X) = L have the same probability.
This give a probability Pr(X=N) = Pr(K(X) = L) * 1/ |{z | ∀z, K(z) = L}| >= 2^(-K(X)) * 1/ 2^K(X) = 2^(-2*K(X))

So we have 2^(-2*K(N)) <= Pr(X=N) <= 2^(-K(N))
(I don’t know how to go further than this inequality for the apriori probability)

Like

2. dionisos says:

“maybe there’s a random Turing machine that decides this length, so the probability is proportional to 2^-K(L). This unfortunately (just barely!) doesn’t converge”

Can you say a little more about why this doesn’t converge ?

Like

3. Eric Neyman says:

So, I agree that all hypotheses of the same complexity are equally likely, and it follows from this that $Pr[X = x] = \frac{1}{2^{\kappa(x)}} \cdot Pr[\kappa(X) = \kappa(x)]$, where X is the random variable ranging over hypotheses. And the question is, for each k, what’s the probability that $\kappa(X) = k$? I don’t know have a good answer.

“Can you say a little more about why this doesn’t converge?” Sure! (Though this merits a post.)

If you think there’s a random Turing machine that decides on the size of the Turing machine that writes the universe, then it stands to reason that the probability that the size of the Turing machine that writes the universe is m is proportional to $\kappa(m)$.

For almost all binary strings of length n (i.e. integers of size roughly $2^n$), the Kolmogorov complexity of the string is roughly n. So if $Pr[\kappa(x) = k] \propto \frac{1}{2^\kappa(k)}$ then $Pr[\kappa(x) = k] \approx \frac{1}{k}$. Then

$\sum_x Pr[X = x] \approx \sum_k \sum_{x: \kappa(x) = k} \frac{1}{2^k} \cdot \cdot \frac{1}{k} = \sum_k \frac{1}{k}$

which diverges. But you might point out that maybe there’s a random Turing machine that decides on the length of the random Turing machine that decides on the length of the random Turing machine that writes the universe. If so, the sum you get is

$\sum_k \frac{1}{k \log k}$

which still (just barely) diverges. You can keep iterating:

$\sum_k \frac{1}{k \log k \log \log k}$

and so on. And these all diverge. So then you might ask: what if you look at the $\omega$-th iteration (i.e. the first infinite ordinal; see here: https://en.wikipedia.org/wiki/Ordinal_number). And the answer is

$\sum_k \frac{1}{k \log k \log \log k \dots \log^*(k)}$

where log* is the iterated log function (here the $\dots$ means “go until you can’t take logs anymore without going negative”). And this still (even more barely) diverges!

But then you can ask: what if you keep going? Will it eventually converge? I have no idea, and I don’t even know if that question makes sense. Ordinals are kind of crazy.

But um, I don’t expect everything I wrote to make sense, and at this point I’m mostly abandoning the pretense that maybe I’m actually describing how the universe works. But maybe I’ll get around to writing this out in a post.

Liked by 1 person

2. dionisos says:

“then there would be a comparatively simple universe that resulted in your room having N people. But then there would be a pretty simple representation of N”

I think this is false because the complexity could come from the reference to your room in the universe.

Take the library of Babel, it is a pretty simple universe. anyways take a random book in it and this book will have a big complexity.
In the couple (library of Babel, a reference to a particular book in this library), almost all the complexity come from the reference.

Liked by 1 person

1. Eric Neyman says:

Interesting! I’m inclined to agree with this objection, at least insofar as a counterargument isn’t immediately apparent to me. I’ll have to think more about this. (But note that this is an objection to the upper bound on the probability of N, whereas I think you intuitively find fault with the *lower bound* (i.e. think that the probability that N = Graham’s number is much smaller than I say it is).)

Liked by 1 person

3. dionisos says:

Is there any way I can edit my posts ? (I tend to make a lot of mistakes)

Like

1. Eric Neyman says:

Working on it. I wasn’t immediately able to figure out how to do this but will look into it more.

(By the way, a way to make your LaTeX render is to write $latex \frac{1}{2}$ or whatever, but without the space between space between the $and latex.) EDIT: It looks like I’d have to pay$300/year to install a plug-in that would allow users to edit their own comments. I might end up doing that at some point, but for now I’ll ask you to just reply to your own comment and say what edits you want to make. Thanks!

Like

1. dionisos says:

300$/year for this functionality seem really costly to me. (If you want I can try to see if I find something free given some information about what you are using.) Anyways I don’t want to bother you with that, and I don’t want to trigger notifications and give you modifications to do just to correct some mistakes. (I also want to let you know that it will not bother me if you modify or not modify my messages in anyway you like to make them cleaner) Like 1. Eric Neyman says: I’m using WordPress. The only solution I found online is a plug-in “simple comment editing”, which I think is what Scott Alexander uses. Unfortunately to install plug-ins you need to upgrade to a “business plan”, which costs$300/year.

(If I were to upgrade to that, it wouldn’t be just for this functionality. There are several things, such as footnotes, that would be much easier with plug-ins.)

Like

4. Nice thought! I’m always trying to convince other philosophers that infinity is more important than they think it is. But I’m not convinced by this argument. I’m already skeptical of Solomonoff induction (in particular because of just how supremely language-dependent it is – although any sequence with a well-defined limiting Kolmogorov complexity has the same limit under computable translations, for any particular number, the complexity can be changed by an arbitrarily large amount by a particular computable translation) so I’m inclined to treat this as one more reductio. I’m definitely fine with some quantities having infinite expectations (like in a St. Petersburg game) but if *all* quantities either have infinite (if they are bounded below) or undefined (if they are unbounded in both directions) expectations, then that’s a bit much. I’m about to comment on the follow-up post as well.

Like

1. Eric Neyman says:

That’s definitely reasonable. Maybe this is just wishful thinking, and I don’t have anything to back this up besides just a vague intuition, but I feel like there’s some sort of “Platonic encoding” out there that’s objectively simplest in some way, and that probabilities are derived from the notion of complexity that comes from *that* encoding.

Liked by 1 person

1. dionisos says:

Yes, I think there is a way in which a logical gate is more fundamental than the operation finding the inverse of a matrix.

Or at least, even if there isn’t a purely platonic way to decide which set of operations is simpler that the other, we reduced the philosophical difficulty of comparing the complexity of two arbitrary things, to the difficulty of comparing the complexity of set of operations of Turing machine.

Like