(Related: Lofty expectations)
In August of 1939, just as World War II was beginning, physicists Albert Einstein and Leó Szilárd wrote a letter to President Roosevelt, letting him know about the theoretical possibility of nuclear weapons and suggesting that the United States start a nuclear program. The letter resulted in the Manhattan Project, which led to the creation of the first nuclear bomb.
In sending the letter, did Einstein and Szilárd make the morally correct choice?
It was presumably clear to Einstein and Szilárd that the consequences of their letter would be enormous. All of these were possibilities at the time:
- The letter would be missed entirely or ignored.
- The letter would fall into the wrong hands, and the information would reach the United States’ enemies.
- Enormous resources would be poured into an ultimately unsuccessful Manhattan Project, costing the Allies the war.
- The Manhattan Project would be a success; faced with the prospect of their countries being utterly devastated, the Axis powers would surrender.
- The Manhattan Project would be a scientific success, but a ruthless American general would decide to use the nuclear weapons to bomb Germany without restraint, costing tens or hundreds of millions of lives.
- The Manhattan Project would start an arms race between two world powers, ultimately leading to a nuclear war.
- The Manhattan Project would start an arms race between two world powers, ultimately leading to a fraught but stable relative peace, held together by mutually assured destruction.
Some of these outcomes would be quite good, others extremely bad; this is also true of the counterfactual situation where the letter would not be sent. For this reason, it should not have been clear to Einstein and Szilárd whether sending the letter was the morally correct course of action. But faced with such a difficult moral dilemma, Einstein and Szilárd could not just give up and do whatever felt right — after all, literally billions of future lives rested on their ultimate decision. No, they had a moral imperative to think carefully, long and hard, about this choice, using their genius to weigh the costs and benefits. They needed to figure out how to weigh mundane but likely consequences (a portion of the defense budget being diverted to building nuclear weapons) against remote but extreme consequences (a total nuclear war that destroys human civilization).
Not being in Einstein and Szilárd’s situation, I have the privilege of not having to answer this dilemma one way or the other. Instead I’d like to speak about such dilemmas in more general terms and about their consequences for decision theory and utilitarianism.
In decision theory, an agent (i.e. a person who has some decision to make) has a utility, which is a number describing how happy the agent is. The agent can then take an action (or has a set of actions to choose from), which modifies the agent’s utility in some (possibly random) way. The agent’s goal is to maximize the expected value of their utility.
For example, suppose the agent is offered the following deal: “I will lower your utility by 3 and then immediately roll a fair six-sided die and increase your utility by the amount that is rolled.” The agent would take this deal, as their utility is expected to increase by $-3 + \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 0.5$.
(If the concept of utility doesn’t make sense, think of this deal as being in dollars instead of units of utility (utils). However, there is a fundamental difference between utils and dollars, which is that on the macro scale it doesn’t make sense to maximize your expected money: you probably wouldn’t want to take a coinflip that either takes all your money away or doubles your money and gives you an extra dollar. On the other hand, utils are defined to scale in such a way that you want to maximize your expected utility. It’s not obvious that a function that scales in such a way exists, but in fact it necessarily exists, under some pretty intuitive assumptions.)
When given a set of possible actions (above, there are two actions: taking the deal and not taking the deal), decision theory posits a clear choice: take the action that maximizes your expected utility.1
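As a minimal sketch of this rule, the die deal can be written out as a lottery and compared against declining. The function name `expected_utility` and the encoding of each action as a list of (probability, utility change) pairs are assumptions made for this illustration.

```python
from fractions import Fraction


def expected_utility(lottery):
    """Expected utility change of a lottery given as (probability, change) pairs."""
    return sum(p * change for p, change in lottery)


# Hypothetical encoding of the die deal: lose 3 utils, then gain the number rolled.
take_deal = [(Fraction(1, 6), roll - 3) for roll in range(1, 7)]
# Declining leaves utility unchanged with certainty.
decline = [(Fraction(1), 0)]

actions = {"take the deal": take_deal, "decline": decline}
best_action = max(actions, key=lambda name: expected_utility(actions[name]))

print(expected_utility(take_deal))  # 1/2
print(best_action)                  # take the deal
```

Exact fractions are used so that the half-util gain is not blurred by floating-point rounding.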
Closely related to decision theory is the moral framework known as utilitarianism. Morality is different from decision theory in its goal: while both are used to compare different actions, decision theory asks which action benefits you the most, whereas morality asks which action is most moral. Utilitarianism can be thought of as an impartial restatement of decision theory: take the action that maximizes global utility (i.e. the sum of everyone’s utilities).2 I am partial to utilitarianism: I think some version of it is the right way to make decisions. (Why? See this article by Scott Alexander.) So if I were in the position of Einstein and Szilárd, I would figure out which world would have the highest aggregate utility in expectation: the one where I send the letter to FDR, or the one where I don’t.
Earlier I argued that the expected value of almost any positive real-world quantity is infinity. If the quantity is allowed to be negative, then the expected value is undefined (when computing the expected value, you essentially end up adding positive infinity to negative infinity).
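To spell out that parenthetical, here is the standard way the expectation of a signed quantity is defined. The symbols $U$, $U^+$ and $U^-$ are notation introduced just for this sketch.

$$
\mathbb{E}[U] = \mathbb{E}[U^+] - \mathbb{E}[U^-], \qquad U^+ = \max(U, 0), \quad U^- = \max(-U, 0).
$$

If both $\mathbb{E}[U^+]$ and $\mathbb{E}[U^-]$ are infinite, the right-hand side reads $\infty - \infty$, which has no assigned value.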
This presents a problem to decision theory and utilitarianism, both of which seek to maximize expected utility. How do you choose between two actions, both of which have an infinite expected utility (for you in the case of decision theory, and for the world in the case of utilitarianism)? How do you choose between inaction (i.e. an action that doesn’t change anyone’s utility) and an action whose expected utility is undefined?
We can see this theoretical consideration play out in practice if we put ourselves in the shoes of Einstein and Szilárd. On the one hand, maybe sending the letter would lead to a balance of power that would usher in an era of peace that would ultimately result in humanity spreading throughout the universe, resulting in quadrillions of happy lives that would have otherwise never existed. (Given what we know now, this might actually be the eventual outcome!) Or maybe the letter would result in a civilization-ending nuclear war, causing quadrillions of happy lives to never exist.
But really, when I say “quadrillions,” that’s a pathetic underestimate compared to what’s possible. Philosopher Nick Bostrom calculated a lower bound on the number of human life years that we could realistically realize. Maybe (with some smaller probability) the number will be far larger than that bound. If we’re going to reason about expected numbers of lives, I think we’re dealing with infinities. Maybe there’s a 1% chance of some enormous number of lives, another 0.1% chance of at least ten times as many, and so on, with each tenfold drop in probability matched by at least a tenfold rise in lives; then the expected number of lives is infinity. And even if you don’t buy that, the numbers we’re dealing with are clearly very, very big, and comparing these numbers becomes really hard.
What this boils down to is: considering low-probability extreme events gets really hairy. Maybe the problem I’m working on in grad school (which is on the theory of auctions) will lead to an economic breakthrough that will make the difference between humans dying out and humans spreading throughout the galaxy. Or maybe the economic breakthrough will lead to the development of misaligned AIs that will cause humans to go extinct. Both of these are extremely unlikely, each with some minuscule probability, but in each case the consequences are enormous.
I argued (uncontroversially, I think) that Einstein and Szilárd had a moral obligation to think long and hard about the potential consequences of sending and not sending the letter, considering each one with extreme care. I am not suggesting that I have that same moral obligation every time I turn to a new problem in grad school: after all, if we were constantly consumed by carefully evaluating the morality of mundane decisions, nothing would ever get done and the world would be much worse off. But suppose I did want to reason — perhaps as an intellectual exercise — about the morality of working on the problem that I’m working on. How can I do so in a rigorous way without throwing up my hands and saying “the positive consequences of my actions have expected utility positive infinity and the negative consequences have expected utility negative infinity, so the expectation isn’t well-defined”?
In practice what we do is ignore low-probability events. “Yes, working on my problem might play out in a way that leads to an apocalypse, but the probability is so small that I can just round it to zero,” the reasoning goes. This will lead you to the right answer almost all the time — but your goal isn’t to be right almost all the time; it’s to be right in expectation. Perhaps there is a compelling justification for rounding small probabilities to zero besides the practical concern that you’ll go crazy if you don’t. But that raises the question of what cutoff to set: do you round all probabilities under 0.1% to zero? Under 0.0001%? Any cutoff here seems arbitrary.3
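To make the arbitrariness concrete, here is a small sketch in which the sign of the answer depends on where the cutoff sits. Every probability and utility below is made up for illustration, and `truncated_expected_utility` is a hypothetical helper, not an established method.

```python
def truncated_expected_utility(outcomes, cutoff):
    """Expected utility after rounding every probability below `cutoff` to zero."""
    return sum(p * u for p, u in outcomes if p >= cutoff)


# Hypothetical (probability, utility) pairs for a single action; all numbers invented.
outcomes = [
    (0.99, 1.0),         # mundane, likely benefit
    (0.009999, -200.0),  # unlikely but serious harm
    (1e-6, 1e9),         # remote, enormous windfall
]

# The verdict flips from positive to negative and back as the cutoff shrinks.
for cutoff in (1e-1, 1e-3, 1e-7):
    value = truncated_expected_utility(outcomes, cutoff)
    print(cutoff, "positive" if value > 0 else "negative")
```

With these invented numbers, a cutoff of 0.1 keeps only the mundane benefit, a cutoff of 0.001 lets in the harm and turns the total negative, and a cutoff of 0.0000001 lets in the windfall and turns it positive again.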
One other way to justify ignoring low-probability events is through a “remote worlds discounting” approach. Economics and other disciplines have a notion of time discounting: under a time-discounting model, one cares about future utility less than present utility. This is a somewhat counterintuitive notion, but it could perhaps be justified by saying that future-you is a different person from current you, and so from a decision-theoretic perspective, you should care about that person less than about yourself. I’m agnostic on whether you should value your future self less than your current self, though I think that if you should then it’s only by a very small amount. However, the fundamental notion here isn’t that you should care about future-you less than about current-you; rather, it’s that you should care about a version of you that’s very different from current-you less than about current-you. From this perspective, perhaps world-altering events are reasonable to discount because you will be a very different person after such an event. For instance, if some day all humans merge with AIs into super-beings, arguably the super-being that replaces you won’t be you anymore – it would be half-you, or something – and you should care about that person less than about yourself.
Similarly, when making moral decisions, maybe you should care about remote worlds (worlds very different from our own) less than about worlds similar to the one we live in. If so, it’s reasonable to ignore possibilities that fundamentally alter the world we live in, so long as they are sufficiently unlikely – that is, it’s the combination of weighing worlds in proportion to likelihood and how similar they are to ours that lets us round remote worlds to zero. In contrast, Einstein and Szilárd did not have that luxury: they had to weigh civilization-ending consequences that had a not-so-small chance of actually happening.
Ultimately, though, I’m not satisfied by these two suggestions for dealing with infinities in utility calculations. Ignoring low-probability events seems totally arbitrary and unjustified. Remote worlds discounting doesn’t feel too unreasonable, but is ultimately just a hack for getting a reasonable answer: a theory that I would have dismissed a priori, but that I like a little bit just because it maybe deals with this issue.
So I don’t have a general solution. But I do have a way of dealing with infinite and/or undefined expectations some of the time, in a way that’s actually satisfying.
To motivate this method, suppose you were offered the following two options: (1) I pick a positive integer $n$ at random with probability proportional to $\frac{1}{n^2}$ and give you $n$ utility; or (2) I pick a positive integer $n$ at random with probability proportional to $\frac{1}{n^2}$ and give you $n + 1$ utility.
Both of these options have infinite expected utility, so naïve decision theory won’t tell you which is better: they’re both infinitely good. On the other hand, in some sense it’s clear that the second option is better — in particular, it’s better by one util. This is because you can “pair up” the outcomes of option 1 with the outcomes of option 2 in such a way that the outcome in option 2 is always better by one util. This notion of pairing up outcomes is known to probability theorists as coupling and it allows us to compare some events whose expectations are infinite or undefined.
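The pairing argument can be checked numerically. The sketch below assumes, for illustration, that the integer $n$ is drawn with weight proportional to $1/n^2$ and that the two options pay $n$ and $n + 1$ utils; the function names are hypothetical, and the weights are left unnormalized since only the growth of the sums matters here.

```python
def option_1_payout(n):
    """Assumed payout of option 1 when the integer n is drawn."""
    return n


def option_2_payout(n):
    """Assumed payout of option 2 when the integer n is drawn."""
    return n + 1


def partial_expectation(payout, terms):
    """Sum of payout(n) / n^2 for n = 1..terms (unnormalized weights 1/n^2)."""
    return sum(payout(n) / n**2 for n in range(1, terms + 1))


# Each option's partial sums keep growing as more terms are included ...
for terms in (10, 100, 1000):
    print(terms, partial_expectation(option_1_payout, terms))

# ... but pairing draw n with draw n leaves a gap of exactly one util every time.
gaps = {option_2_payout(n) - option_1_payout(n) for n in range(1, 1001)}
print(gaps)  # {1}
```

The partial sums for option 1 are just the harmonic numbers, which grow without bound, while the paired gap never moves from 1.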
To phrase this more precisely: let’s say you have to choose between two options. The utility of the first option is a random variable $X$ and the utility of the second option is a random variable $Y$. If there is a coupling $(X', Y')$ of $X$ and $Y$ such that $\mathbb{E}[X' - Y']$ is well-defined (but perhaps infinite), then you should choose the first option if this quantity is positive and the second option if the quantity is negative. That is, if you can find random variables $X'$ and $Y'$ that have the same distributions as $X$ and $Y$, respectively, but are correlated in such a way that $\mathbb{E}[X' - Y']$ is well-defined, then you can judge which of the two options is better.
(So in the concrete example above, with the integer $n$ drawn at random as described, you could let $X' = n$ and $Y' = n + 1$.)
Now, if you’re mathematically-minded, you might be wondering if this comparison method is even well-defined: perhaps there could be two different couplings, one of which will tell you to pick option 1 and the other of which will tell you to pick option 2. It turns out that the method is well-defined: in particular, for any coupling $(X', Y')$ for which $\mathbb{E}[X' - Y']$ is well-defined, this expectation is always the same! If you’re up for a challenge (and this is a difficult probability theory question), you could try proving this. If you’d like to see a proof, here’s one on Math StackExchange.
Does this have any practical consequences for decision-making? I think it might. In particular, if you’re making a decision between two choices, both of which have infinite, or undefined, or very large, or very negative, expected utilities, it may be daunting or impossible to compute each expectation and then take the bigger number. But sometimes you can compare the choices anyway: you can consider how the median outcome of option 1 compares to the median outcome of option 2, and how the 90th percentile outcome of option 1 compares to the 90th percentile outcome of option 2, and so on. If you find that option 1 is generally better than option 2 in these comparisons (more formally, if the expectation of option 1’s utility minus option 2’s utility, over a uniform choice of percentile, is positive) then it makes sense to pick option 1.
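Here is a rough sketch of that percentile-by-percentile comparison. The choice of Cauchy-distributed utilities is purely an assumption for illustration (a Cauchy distribution has no defined mean), and `mean_percentile_gap` is a hypothetical helper that averages the gap over an evenly spaced grid of percentiles.

```python
import math


def cauchy_quantile(u, location):
    """Inverse CDF of a Cauchy distribution with the given location and scale 1."""
    return location + math.tan(math.pi * (u - 0.5))


def option_1_quantile(u):
    """Hypothetical option 1: Cauchy utility centered at 1."""
    return cauchy_quantile(u, location=1.0)


def option_2_quantile(u):
    """Hypothetical option 2: Cauchy utility centered at 0."""
    return cauchy_quantile(u, location=0.0)


def mean_percentile_gap(quantile_1, quantile_2, grid_size=1000):
    """Average of quantile_1(u) - quantile_2(u) over evenly spaced u in (0, 1)."""
    percentiles = [(i + 0.5) / grid_size for i in range(grid_size)]
    return sum(quantile_1(u) - quantile_2(u) for u in percentiles) / grid_size


gap = mean_percentile_gap(option_1_quantile, option_2_quantile)
print(round(gap, 6))  # 1.0
```

Neither option has a well-defined expected utility on its own, yet at every percentile option 1 is ahead by one util, so the comparison comes out in its favor.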
And perhaps that’s what Einstein and Szilárd did. They asked, “in each case, what’s the worst thing that can happen?” If they sent the letter, there might be a nuclear war destroying human civilization; but if they didn’t send the letter, Germany might destroy human life through nuclear weapons anyway. “What’s the 10th percentile outcome?” Perhaps a non-civilization-ending nuclear war if they sent the letter, compared with Germany winning the war using nuclear weapons and having control over the world for decades if they didn’t. And so on. If in each comparison, sending the letter led to a better outcome, then sending the letter was the right choice.
I like this idea because it’s intuitive but not obvious. The obvious way to decide whether to send the letter is to consider various unknowns about the world (e.g. whether Germany has the capability to build nuclear weapons) and, for each setting of the unknowns, decide which option is better. The problem with this method is that it isn’t powerful: for example, maybe sending the letter would have very positive expected utility in the world where Germany had the capability to build nuclear weapons, but very negative expected utility in the world where it did not. And then you’re back to square one: computing just how positive the utility is in one case and just how negative it is in the other so you can decide which action has a higher utility in expectation over all possible worlds. In contrast, the method I’m proposing is more versatile, and thus more powerful: it allows you to pair up worlds however you want in an attempt to make the comparison as easy as possible. And while using this method might still be very difficult, it at least gives you a foothold in attempting to make the most difficult decisions in the world.
1. Decision theory gets much more interesting when there are multiple interacting agents and agents’ utilities depend on actions taken by other agents. This leads to some pretty interesting situations where people’s intuitions strongly disagree about the correct course of action.
2. Or the average, or something else, depending on the exact version of utilitarianism.
3. Plus, now it matters how you’re grouping possible outcomes together. If you set a cutoff and there are three different ways that your decision could lead to the apocalypse, each with a probability below the cutoff but together exceeding it, you might reach a completely different decision if you group these possibilities together than if you don’t.