Jelle’s marbles have skill

Do you miss the pre-COVID excitement of watching sports? Are you bored right now, looking for some way to entertain yourself? Welcome to the world of Jelle’s Marble Runs.

Round 1 of the 2020 Marbula One (intro ends at 1:45)
Round 1 of the 2020 Marble League (intro ends at 2:50)

In the Marble League, 16 teams of marbles — the Raspberry Racers, Minty Maniacs, and Crazy Cat’s Eyes to name a few — compete in a variety of events. Who can make it down a dirt road the fastest? Who can stay in a funnel the longest? Who can travel the farthest down a narrow beam?

I discovered Jelle’s Marble Runs through a Last Week Tonight episode, and at first I assumed that even though I like watching sports, I wouldn’t like marble racing. After all, there are no people working hard to get better and no skill being measured; I might as well be watching a random number generator!

But I decided to give it a shot. I watched some events from previous years’ Marble Leagues, and I rather liked the sport! I found myself developing feelings about the various teams. The Hazers were awesome; the Raspberry Racers were pretty good too… and who in their right mind would be a Hornets fan? And then I noticed that I wanted the Hazers to win because they’d done well in the previous few events I’d watched. Even though I knew that each event I was watching was totally random, I was desperately hoping that this wasn’t so. I wanted there to be a story to the randomness (maybe the Hazers are actually a good team and not just getting lucky!), and so I suspended disbelief, rooting for the teams whose victories would fit into a clean narrative, a convenient way to explain away the randomness.

But then… I got suspicious. It was subtle, but I felt that some teams really were doing better than you could explain by chance. So I decided to test this theory: that some teams really are more skilled than others, and are more likely to do well. Fair warning: my analysis will contain spoilers for the 2020 Marble League (watch here) and Marbula One (watch here)! If you’d like to watch some of these events spoiler-free, do so before reading on!

Here is the raw data for the 2020 Marble League. There were 16 teams and 16 events. (What were the events? See footnote 1.) In the table below I’ve listed how each team performed in each event: I gave 15 points for 1st place, 14 points for 2nd, and so on down to 0 points for 16th place (see footnote 2).

Figure 1: Team placements in the 2020 Marble League (15 = 1st place, 14 = 2nd place, etc.)

The most natural way I can think of to measure how much skill is involved in the Marble League is the variance of the numbers in the “Sum” column. Think of it this way: if all the team placements in each event are completely random, the teams will likely end up with pretty similar scores at the end. But if some team always came in first, another team always second, and so on, there would be a huge amount of variance in the teams’ total scores.

So, to check whether there’s skill involved in the Marble League, I computed the variance of the “Sum” column in Figure 1 and checked whether this number is considerably higher than the variance you’d expect if team placements were decided completely at random. The variance of the sums turns out to be roughly 645. To see how this compares to what you would expect under the null hypothesis (no skill, everything is random), I ran simulations that awarded team placements in each event randomly. Here’s what I found:

Figure 2: Probability distribution over the variance of Marble League team scores, if team placements were uniformly random and independent for each event

The 2020 Marble League had much more variance than you would expect if there were no skill involved: there was more variance in the 2020 Marble League than in 98.5% of all simulations I ran! Put another way, if the Marble League were entirely random, it would be quite unlikely (only a 1.5% chance) that teams would end up as separated as they actually ended up. This alone is fairly strong evidence that the Marble League is, in fact, not random — that some teams are more skilled than others.
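For readers who want to try this at home, here is a minimal sketch of that kind of simulation. It is not the author’s actual code; it assumes placements in each event are a uniformly random shuffle of 15, 14, …, 0 points, independent across events, and that “variance” means population variance (dividing by 16 teams), which is consistent with the 5440 maximum quoted further down. As a sanity check on the output: a single event score uniform on 0..15 has variance (16² − 1)/12 = 21.25, so a 16-event total has variance 340, and because the average total is always exactly 120, 340 is also the expected variance of the “Sum” column under the null hypothesis.

```python
import random
import statistics

N_TEAMS = 16
N_EVENTS = 16
OBSERVED_VARIANCE = 645  # variance of the "Sum" column reported in the text


def simulate_total_variance(rng: random.Random) -> float:
    """One simulated season: every event hands out 15, 14, ..., 0 points
    in a uniformly random order, independently of the other events."""
    totals = [0] * N_TEAMS
    for _ in range(N_EVENTS):
        points = list(range(N_TEAMS))  # 0..15 points, one value per team
        rng.shuffle(points)
        for team, p in enumerate(points):
            totals[team] += p
    # Population variance (divide by the 16 teams), an assumption about the author's choice
    return statistics.pvariance(totals)


def null_distribution(n_sims: int = 10_000, seed: int = 0) -> list[float]:
    """Variance of team totals across many simulated no-skill seasons."""
    rng = random.Random(seed)
    return [simulate_total_variance(rng) for _ in range(n_sims)]


if __name__ == "__main__":
    sims = null_distribution()
    share_below = sum(v < OBSERVED_VARIANCE for v in sims) / len(sims)
    print(f"mean simulated variance: {statistics.fmean(sims):.1f}")  # should land near 340
    print(f"share of simulations below {OBSERVED_VARIANCE}: {share_below:.3f}")
```

The printed share plays the role of the 98.5% figure above; exact values will vary with the seed and number of simulations.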

Teams, of course, are made up of individual marbles. So it’s natural to ask: can we somehow check how skilled individual marbles are? This task is difficult with the Marble League, because there are many team events and an individual’s performance on a team isn’t easy to isolate. But in Marbula One — a different marble race that’s modeled after Formula One (as opposed to the Marble League, which is modeled after the Olympics) — it’s pretty easy to measure the performance of individual marbles.

The Marbula One — also a competition between 16 teams3 — consists of eight races. Each of the competing teams sends two representatives, each of which competes in four of the eight races against representatives of the other fifteen teams. For example, the Savage Speeders sent Speedy and Rapidly. Speedy competed in events 1, 3, 6, and 8, and Rapidly in events 2, 4, 5, and 7.

Just as before, I assigned each team 15 points for coming in first place in an event, 14 for second place, and so on. This time, though, I paid attention to which team member got how many points (out of a maximum possible 60).

Figure 3: Total points scored by each marble in the 2020 Marbula One (1st place = 15 points, 2nd place = 14 points, etc.)

So now we can ask the same question, but for individual marbles instead of teams: is the variance in the points scored by the 32 competing marbles substantially larger than we would expect if each race were completely random? The variance in the actual 2020 Marbula One (i.e. the variance of the numbers in the third and fifth columns) is 134; how does that compare to what you’d expect if everything were random?

Figure 4: Probability distribution over variance of Marbula One scores of individual marbles, if placements were uniformly random and independent for each race

The answer is even starker than in the previous chart: the 2020 Marbula One had much more variance than you would expect if everything were random. In fact, there was more variance in the 2020 Marbula One than in 99.1% of all simulations I ran! If you still believed that marble racing was entirely random, this should have come as a big surprise: if marble racing is random, there’s a less than 1% chance of seeing so much spread in the performance of individual marbles.
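The same idea carries over to individual marbles. The sketch below is again an illustration rather than the author’s code. It assumes each race randomly shuffles 15, 14, …, 0 points among one representative per team, and it uses a hypothetical schedule in which each team’s first marble runs the even-numbered races and its second marble the odd ones (in reality the split varied by team, e.g. Speedy ran races 1, 3, 6, and 8). Under the no-skill model the exact schedule doesn’t change any single marble’s distribution, since every marble still runs four independent races. Each four-race total has variance 4 × 21.25 = 85, and the average total is always exactly 30, so 85 is the expected variance of the 32 totals if everything is random.

```python
import random
import statistics

N_TEAMS = 16
N_RACES = 8
OBSERVED_VARIANCE = 134  # variance of the 32 individual totals reported in the text


def simulate_marble_variance(rng: random.Random) -> float:
    """One simulated Marbula One season with no skill involved."""
    # totals[team][k] holds the points of that team's k-th marble (k = 0 or 1)
    totals = [[0, 0] for _ in range(N_TEAMS)]
    for race in range(N_RACES):
        k = race % 2  # hypothetical schedule: marble 0 runs even races, marble 1 odd races
        points = list(range(N_TEAMS))  # 0..15 points, one value per team's representative
        rng.shuffle(points)
        for team, p in enumerate(points):
            totals[team][k] += p
    all_totals = [score for pair in totals for score in pair]  # 32 marbles
    return statistics.pvariance(all_totals)


def null_distribution(n_sims: int = 10_000, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    return [simulate_marble_variance(rng) for _ in range(n_sims)]


if __name__ == "__main__":
    sims = null_distribution()
    share_below = sum(v < OBSERVED_VARIANCE for v in sims) / len(sims)
    print(f"mean simulated variance: {statistics.fmean(sims):.1f}")  # should land near 85
    print(f"share of simulations below {OBSERVED_VARIANCE}: {share_below:.3f}")
```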

The other thing that really stands out to me in Figure 3 is Mary. Poor Mary — she (it?) came in last in three of her four races, and second-to-last in the fourth race! What are the odds of having some marble do as poorly as Mary if everything were completely random? I checked for this as well, and the answer is: just 0.15%. Mary clearly has anti-skill.
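To get a feel for how extreme Mary’s results are, it helps to look at a single marble chosen in advance. Assuming “as poorly as Mary” means a four-race total of at most 1 point (her 0 + 0 + 0 + 1), and that each race gives a marble 0 to 15 points uniformly at random, the exact chance can be found by brute-force enumeration of all 16⁴ outcomes. Note that this is a per-marble number; the 0.15% in the text asks whether *any* of the 32 marbles does this badly, which is a larger probability, and the author’s exact method for that check isn’t given here.

```python
from fractions import Fraction
from itertools import product


def prob_total_at_most(max_points: int, n_races: int = 4, n_teams: int = 16) -> Fraction:
    """Exact chance that one given marble scores at most max_points over n_races,
    if each race awards it 0..n_teams-1 points uniformly at random."""
    outcomes = product(range(n_teams), repeat=n_races)
    hits = sum(1 for o in outcomes if sum(o) <= max_points)
    return Fraction(hits, n_teams ** n_races)


mary = prob_total_at_most(1)  # Mary's 0 + 0 + 0 + 1 = 1 point
print(mary, float(mary))  # 5/65536 7.62939453125e-05
```

Only five of the 65,536 equally likely outcomes qualify: all zeros, or a single 1 in one of the four races.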

To me, all of these results taken together are pretty conclusive evidence that some marbles are more skilled than other marbles. Now, there’s not that much skill involved. For example, if the same team always came in first, the same team always came in second, and so on, the variance in the 2020 Marble League would have been 5440, much larger than 645. But it’s not a trivial amount of skill either!
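The 5440 figure is easy to check: in a perfectly predictable league the 16 team totals would be 16 × 15, 16 × 14, …, 16 × 0. The short sketch below computes their population variance and, for comparison, the variance you would expect with no skill at all (16 events, each contributing the variance of a uniform 0..15 score, (16² − 1)/12). The no-skill baseline is an added reasoning step, not a number quoted in the post.

```python
import statistics

# Perfectly consistent league: one team always finishes 1st (15 pts x 16 events),
# another always 2nd (14 x 16), ..., the last always 16th (0 x 16).
perfect_sums = [16 * points for points in range(16)]
max_variance = statistics.pvariance(perfect_sums)  # equals 5440

# No-skill baseline: 16 independent events, each a uniform 0..15 score
null_expected = 16 * (16**2 - 1) / 12
print(max_variance, null_expected)
```

So the observed 645 sits well above the no-skill baseline of 340, yet far below the 5440 of a completely deterministic league.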

So, Jelle’s marbles have skill. But why?

I could come up with two possible explanations. The first is that the marbles are somehow different from each other. Maybe some are a bit heavier than others, or maybe have their weight distributed differently, or are smoother, or closer to being perfect spheres. The second is that Jelle does multiple takes, and selects the best one for advancing a narrative. I favor the first explanation; here’s why.

Figure 5: Scatter plot of scores, for each team, of the two teammates who participated in the 2020 Marbula One

There’s no correlation between how one teammate performed in the 2020 Marbula One and how the other teammate performed. This is what you would expect if individual marbles had skill, but particular teams didn’t have skill beyond just getting lucky in terms of which marbles are on the team. And that’s exactly what you would expect under the first explanation: the company that manufactured the marbles wasn’t perfect, and some marbles ended up different from others. But the particular way that the marbles were painted after they were manufactured (the O’rangers are orange, and so on) doesn’t affect their skill.

On the other hand, if Jelle selectively chose which take to upload to YouTube in order to advance a narrative, I would have expected there to be a positive correlation in Figure 5. If Jelle wanted to create an illusion that particular marbles were better than others, he would likely choose for whole teams to be skilled, as opposed to some arbitrary collection of individual marbles.
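The individual scores behind Figure 5 aren’t reproduced here, so the sketch below just takes two lists: each team’s first-marble total and second-marble total, in the same team order. It computes the Pearson correlation coefficient and a one-sided permutation test, a standard way to ask how often randomly re-pairing the marbles would produce a positive correlation at least as large as the real one. This is an assumed approach for illustration; the post doesn’t say how the correlation was assessed. (Python 3.10+ also ships `statistics.correlation`; the manual version below just makes the formula visible.)

```python
import random
import statistics


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(xs: list[float], ys: list[float],
                        n_perms: int = 10_000, seed: int = 0) -> float:
    """One-sided test for a positive teammate correlation: the share of random
    re-pairings whose correlation is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = pearson(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # break the team pairing at random
        if pearson(xs, shuffled) >= observed:
            hits += 1
    return hits / n_perms
```

With real data you would call `permutation_p_value(first_marble_scores, second_marble_scores)`, where both list names are hypothetical. A large p-value is consistent with the “no team effect” reading of Figure 5; a small one would point toward the take-selection explanation.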

So in conclusion, it is absolutely the case that some of Jelle’s marbles are more skilled than others. And if I were to guess, I’d say that this skill originates in the marble manufacturing process.

1. E1: balancing; E2: halfpipe; E3: funnel endurance; E4: Newton’s cradle; E5: long jump; E6: 5m hurdles; E7: block pushing; E8: triathlon; E9: sand mogul race; E10: 5m sprint; E11: black hole funnel; E12: relay run; E13: high jump; E14: team aquathlon; E15: collision; E16: marble marathon.

2. The Marble League assigns 15 points for third place, 20 for second, and 25 for first, but I decided that my system was a more natural way to compare marbles on skill.

3. These aren’t the same 16 teams as the ones who competed in the 2020 Marble League. There are 28 teams total in Jelle’s universe, and there are qualifying tournaments to decide who competes in each league.


4 thoughts on “Jelle’s marbles have skill”

  1. Another reason why it probably isn’t staged is basically the sheer difficulty of staging some of the events. The domino-magnet-ball-bearing launchers in Event 15, the collision round (where the O’Rangers famously got dead last in an event they had a history of being good in), had to be reset *32 times.* Of course, since the O’Rangers were 20 points ahead of 2nd place, many people suspected that the O’Rangers blowing it was rigged so that the final event wouldn’t be predictable. But the Collision Event is almost pure entropy at work, with no real way to rig it (granted, it’s interesting how the Speeders got 16th last year and 1st this year while the O’Rangers had the opposite), and there literally would not be enough time to brute-force retakes until the ideal result was achieved.


  2. Great analysis! I had assumed that manufacturing likely caused slight differences in smoothness/weight/roundness, so this makes a lot of sense. #MidnightWisps

