Chapter 5: Orbiting Jupiter

AI Safety

Robert Miles's talk on AI safety gives a great overview of the dangers of AI. I highly recommend watching it in full before continuing; however, I will do my best to summarize the parts I am particularly interested in.

Miles talks about creating an AI for a boat racing game with a problematic win condition. Racing is different from Go because there is no natural midpoint between winning and not winning. Go has a scoring system, and whoever has the higher score wins, so we can train an AI to maximize its score without really worrying about what the opponent is doing. Naturally, the researchers working on this boat race also chose to train their AI on score, assuming that high scores correlate with winning the race. What the researchers got, however, was an AI that crashed the boat into everything it could before picking up infinitely respawning repair tokens that gave it a few points each. The AI then drove the boat in circles indefinitely, racking up an unbounded score but never completing the race. A target that seemed close to the desired objective produced an AI whose behaviour had nothing to do with that objective. Miles makes it clear that this is not an isolated incident; similarly bizarre behaviour has been observed in many independent studies.
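The gap between the score and the race can be made concrete with a toy model. Everything in the sketch below is hypothetical, including the two policy names, the point values, and the horizons; it is not the setup from the study Miles describes. It only illustrates that an optimizer ranking behaviour purely by score will abandon finishing as soon as looping earns more points.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    score: int      # the proxy the optimizer can see
    finished: bool  # what the designers actually wanted


def finish_race(steps: int) -> Outcome:
    # Hypothetical: a few tokens along the way plus a one-off finishing bonus.
    # The score does not depend on how long the episode is allowed to run.
    return Outcome(score=50 + 100, finished=True)


def loop_on_tokens(steps: int) -> Outcome:
    # Hypothetical: a respawning token worth 3 points every 10 steps, forever.
    return Outcome(score=3 * (steps // 10), finished=False)


def best_by_score(policies, steps):
    # Rank policies by score alone; `finished` never enters the comparison.
    return max(policies, key=lambda policy: policy(steps).score)


POLICIES = [finish_race, loop_on_tokens]

for horizon in (100, 1_000, 10_000):
    chosen = best_by_score(POLICIES, horizon)
    outcome = chosen(horizon)
    print(horizon, chosen.__name__, outcome.score, outcome.finished)
```

With these made-up numbers the optimizer finishes the race only when episodes are short; at 1,000 steps looping already scores 300 against 150, and from then on it never finishes.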

Miles then explains that the problem is actually much worse than that. He uses the example of a robot that makes tea. Such a robot only values what it has been explicitly told to value, so it would likely destroy the priceless Ming vase sitting between it and the teapot, because going around the vase would slow it down. If we created a target that valued the vase, the robot would likely destroy the painting beside the vase instead:

When you are making decisions in the real world you are always making trade-offs, you are always taking various things that you value and deciding how much of one you are willing to trade for how much of another… An agent like this which only cares about a limited subset of the variables in the system will be willing to trade off arbitrarily large amounts of any of the variables that aren’t a part of its goal for arbitrarily tiny increases in any of the things which are in its goal. …. If you manage to come up with a really good objective function that covers the top twenty things that humans value, the twenty-first thing that humans value is probably gone forever.

Robert Miles
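Miles's point about trade-offs can be sketched with a toy planner. The three routes, their timings in seconds, and the penalty weights below are all hypothetical, as is the `choose_plan` helper; the planner minimizes time plus a penalty only for the side effects it has been told about. Naming the vase simply moves the damage to the painting, and only naming both protects them, while anything left unnamed remains free to destroy.

```python
# Hypothetical routes for the tea-making robot: time taken plus side effects.
PLANS = {
    "through the vase": {"seconds": 30, "vase": 1, "painting": 0},
    "through the painting": {"seconds": 31, "vase": 0, "painting": 1},
    "around both": {"seconds": 45, "vase": 0, "painting": 0},
}


def cost(plan, penalties):
    # Time taken, plus a penalty only for the side effects we thought to name.
    return plan["seconds"] + sum(weight * plan[item] for item, weight in penalties.items())


def choose_plan(penalties):
    # Pick the route with the lowest cost under the given objective.
    return min(PLANS, key=lambda name: cost(PLANS[name], penalties))


print(choose_plan({}))                                # through the vase
print(choose_plan({"vase": 1000}))                    # through the painting
print(choose_plan({"vase": 1000, "painting": 1000}))  # around both
```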

The primary difference between a game like Go and a racing simulation is that Go is a small game in terms of rules. It is possible to accurately model the entire game, along with its goals, in a limited amount of code. At no time can the politics of India have any impact on a single game of Go. Making tea is not like that. Even something as simple as the weather can blow on the tea and cool it. Anything that could interact with the tea needs to be included in the robot's model, yet we cannot make an AI that accurately models the entire universe; we will only ever get an image. A painting is not the same as the thing being painted, and no matter how masterful the craftsmanship, there will always be a separation between an object and its image. Likewise, no matter how powerful a machine learning algorithm is, it can never learn to value that separation, because its theoretical groundwork is blind to it, and, as Miles points out, anything it is blind to is just fuel to be traded for the things it does care about. But Turing wasn’t wrong in one key area. Many human things are perception first and aren’t grounded in physical reality. Yet that only makes the job of mimicking them even harder, because we are now modelling an image of an image, and the separation only grows.

The desire to reduce human complexity to physical and tangible objects precedes Turing. The idea that an image can replace an object is a religious ideal that has no beginning. In fact, we don’t need to dig into modern computer science to find out what the consequences of machine learning systems are. We already have target values that we use to optimize our societies, and we already have difficulty remembering that these targets are not themselves the things they are supposed to represent. The learning has been going on since before economics itself, and the metric we use today already has a name:

Money

Chapter four of Adam Smith’s book ‘The Wealth of Nations’ acts as most people’s starting point for understanding money. Smith describes money as a practical necessity. Before money was invented, societies bartered goods back and forth. A butcher has more meat than he needs, and he would like some bread from the baker; naturally, they would exchange his meat for the baker’s bread. But suppose the baker would like the meat while the butcher doesn’t need bread. How can they exchange goods? In Smith’s telling, the answer is currency. With a separate means of trade, which may begin with things like cattle but inevitably ends with gold in all “rich and commercial”1 nations, a baker can sell bread to someone else and use the money to buy meat from a butcher. Thus, money becomes a practical necessity to facilitate trade; a catalyst that makes other economic activity possible. However, this metaphor is misleading in two very important ways.

Firstly, these pre-money barter economies never really existed, or so Frederick Kaufman argues in his book ‘The Money Plot’. For as far back as archaeological evidence allows us to see, humans have been trading beads, or other trinkets like eggshells, as currency. Instead, Kaufman provides a different creation narrative for money. In his telling, money has always been a form of insurance, an object that protects the bearer from future harm2. Under a modern understanding of money, this includes the classical interpretation, since we could use that money to buy an object we do not yet know we will need; however, it can also take religious or cultural forms that are hard for modern eyes to recognize. Money can be used to buy favour from a god, protect against charms and hexes, purchase comfort for a loved one in the afterlife, or guarantee the ongoing existence of a family line through marriage: all according to the values and beliefs of the society doing the exchange. Kaufman argues that societies have always interacted with their cultural values through the exchange of currency, even if modern economists might not recognize these objects as money3.

Secondly, this tale denies an essential part of money’s nature. If I sell eggs for currency, then the currency I receive is just a signifier of an incomplete transaction. It is only after I use that money to buy something else that the transaction is complete. In this world, money is just an object whose presence is necessary for other, more important transactions to occur. This metaphor vastly underestimates the compounding aspect of money and makes it seem like managing money is a simple act of balancing inputs and outputs. As long as the money I receive from my trades is greater than the money I spend, I will have the cash to grow my material possessions and become wealthy. While not necessarily wrong, this view is naive.

Imagine a game with two boxes. One box is made of a clear material and obviously contains one million dollars. The second box is made of an opaque material, and the caretaker has informed us it has an equal chance of containing ten million dollars or nothing. I am allowed to open one box and keep whatever is inside. Which should I open?

The correct answer changes depending on my life situation. For most of us, one million dollars is a life-changing amount of money. It is more than enough to pay off debts, buy a nice house, upgrade skills, or spend several years looking for well-paid and satisfying work. Ten million dollars is also life-changing, but it leaves an additional nine million dollars unspent. So from the perspective of barter, the answer is obvious: why would someone turn down a life-changing amount of money for a mere chance at a life-changing amount of money? Any sane person would go for the sure thing and take the million.

Of course, the above answer is wrong if we think through the problem mathematically. The probability of getting ten million dollars is fifty-fifty. If I play the game twice, winning once and losing once, I will have gained ten million dollars, eight million more than the two million I would have gained had I gone for the sure thing both times. Sure, each loss costs us the million we could have had, but each win brings in nine million more than the sure thing would have: enough to cover nine losses. Any sane person would go for the better odds.

The difference between these situations is easier to see if we change the rules of the game slightly. Instead of a million dollars, the clear box now contains nothing, and the opaque box contains either nine million dollars or one million dollars of debt. Which do you open? The mathematics of the situation hasn’t changed, since every payoff has simply dropped by one million dollars; it is still better to take the gamble. However, the consequences of losing that gamble are far more dire for people who don’t have a life-changing amount of money to throw away. A million dollars of debt is a crippling amount, enough to ruin, or drive to suicide, most people who incur it. Yet, if you can handle the loss, the gains still vastly outweigh the risks. Even a single win is more than enough to pay for multiple losses. On average, the person capable of taking such a risk will gain money faster than those who cannot.

There is a myth that those with money are better at handling finances than those without; this is false. The truth is, the more money you have, the fewer variables you need to consider while managing it. To someone living in poverty, every financial decision is a matter of life and death. Sure, they could invest ten dollars and gamble on making a hundred later on, or they could buy food and not starve to death. A person with money can afford to leave their job and search for a better one, a luxury a person without money cannot afford. A person with money can risk eviction and negotiate with their landlord for better rent; a person without money cannot. A person with money can risk exploring a new business opportunity; a person without money cannot. Money allows a person to take bigger and more frequent risks. So long as the rewards outweigh the costs, higher-risk gambles increase one’s ability to make more money.
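The asymmetry can be sketched with a small Monte Carlo simulation of the modified game. The assumptions are hypothetical: each round pays +$9M or -$1M with equal probability, a player stops after 20 rounds, and falling below zero counts as ruin. The `ruin_probability` helper is an illustration, not a standard function. Every player faces the same positive expected value; only the starting cushion differs.

```python
import random

M = 1_000_000


def ruin_probability(bankroll, rounds=20, trials=10_000, seed=1):
    """Estimate the share of players who go broke while repeatedly taking the
    modified gamble: +$9M or -$1M with equal probability (expected value +$4M)."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        cash = bankroll
        for _ in range(rounds):
            cash += 9 * M if rng.random() < 0.5 else -1 * M
            if cash < 0:  # hypothetical rule: any debt counts as ruin and ends play
                ruined += 1
                break
    return ruined / trials


for start in (0, 1 * M, 5 * M):
    print(f"starting with ${start:,}: ruined in about {ruin_probability(start):.1%} of runs")
```

Under these assumptions the player who starts with nothing is ruined about half the time, since a single opening loss is enough, while a five-million-dollar cushion makes ruin rare.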

There are two mathematical rules that govern risk at this level. The first is the ‘expected value’ formula. It is calculated by multiplying each possible return by its probability and adding the results together4. In the earlier example, the expected value of the clear box is zero, while the expected value of the opaque box is four million dollars5. The second concept is the ‘law of large numbers’, which states that the more times we repeat a random trial, the closer the average result will tend to be to its expected value. If I play the game only once, I may lose a million dollars; if I play it three more times, I could very well lose every single one of those as well. However, a winning streak is exactly as likely as a losing streak of the same length, so if I play enough times, the proportions of wins and losses even out, and I will earn, on average, the expected value every time the game cycles. Unlike the poor person who is constantly worried about material concerns, the only thing a rich person needs to understand is the expected value of their risks. To the wealthy, money is nothing more than a game of optimizing expected value, and to win they only need accurate information and control over the associated payouts and risks; it is a game of knowledge and power, not inputs and outputs.
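Both rules can be checked numerically on the modified game. In the sketch below, `expected_value` weights each payoff by its probability, and `average_payout` is a hypothetical helper that opens the opaque box many times using Python’s `random` module; as the number of plays grows, the average per play tends toward the four million dollars the formula predicts.

```python
import random
from fractions import Fraction

M = 1_000_000

# The opaque box from the modified game as (probability, payoff) pairs.
OPAQUE_BOX = [(Fraction(1, 2), 9 * M), (Fraction(1, 2), -1 * M)]


def expected_value(outcomes):
    # Multiply each payoff by its probability and add the results.
    return sum(p * payoff for p, payoff in outcomes)


def average_payout(plays, seed=0):
    # Open the opaque box `plays` times and return the mean payout per play.
    rng = random.Random(seed)
    total = sum(9 * M if rng.random() < 0.5 else -1 * M for _ in range(plays))
    return total / plays


print(expected_value(OPAQUE_BOX))  # 4000000
for plays in (1, 10, 1_000, 100_000):
    print(plays, round(average_payout(plays)))
```

A single play can only return nine million or minus one million; it is the average over many plays, not any individual result, that settles near the expected value.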

This is why I believe capitalism works so well inside liberal systems. To a capitalist, the only thing worth calculating is risk. Stable liberal systems, especially globalized ones, make it much easier to calculate long-term risk. If two countries operate in similar ways with similar values, then a profitable gamble in one will also be a profitable gamble in the other. Someone else can be paid to manage regional laws, workers and their relationships with the companies, and even the governments trying to regulate these transactions. Everything, from the business to the product, is just a variable, and as long as these variables remain stable for long enough, risk can be calculated and expected value can be known. So long as the expected value is positive, the money will keep growing for as long as we keep gambling.

Yet, once the suits at the top of the business start outsourcing everything to other people, and contribute only by pressuring them to generate profitable gambles, the entire structure of our economy begins to look like a machine learning algorithm. The suit is both the data scientist who defines the target and the top-level algorithm that pushes the layers below it into line. So of course, if the top-level target is only an image of what the company values, then it is only a matter of time before the company as a whole optimizes away something it may have cared about.

The Death of Jupiter Station

So who is at fault when a space station dies? The station would be primarily inhabited by three different groups of people, each with their own reasons why they would want to see it either destroyed or preserved.

I mentioned in the introduction that a religious organization shut the meta-verse down, but this is a red herring. Religion is just another variable that can be optimized, and while these people may have carried out the act, they wouldn’t have been able to do so without powerful help.

The first group is those who had no choice but to be there. These people may be slaves, indentured servants, or victims of chance who can no longer leave. They may have plenty of reasons to hate the place, but beyond an act of desperate suicide, none should want to see it destroyed, as doing so would end their lives as well. However, as these people are on average less educated, it would be easier for someone more powerful to convince them that destruction is not destruction, and their rage could easily be converted into a lever for someone else to pull. They may have been involved, but they are not to blame for the station’s downfall.

The second group is those who chose to be there. These people have the means to save up and go wherever they want, and for one reason or another they choose to orbit Jupiter. Unlike the first group, these people have no reason whatsoever to want the station destroyed, as it is already what they want it to be. The station is their life, its destruction would be their destruction as well, and they know this. Their higher levels of education also mean that their desires would be harder, but not impossible, to turn into a tool against the station itself. These people are also not to blame for the station’s destruction, and many likely did whatever they could to stop it.

The third and final group is those who could be anywhere. They have the means and the power to transform wherever they are into whatever they want it to be. Because of this, they have no attachment to any place in particular. They might like the lifestyle of the station, but they could easily reproduce it on any other station or colony in the solar system. The station’s loss would hurt, but they can afford that loss because they have the means to build their lives somewhere, anywhere, else. These are the people most at fault, because they are the only people who could plausibly be tempted by a box with a million dollars of debt. However, even these people aren’t completely at fault.

The true cause lies in the station’s purpose. It is a resource station; it wasn’t built because humans chose to be there, but because it provided access to the water on Europa, the geothermal energy on Io, or even the gases surrounding Jupiter itself. It exists to provide more variables that can be transformed into money, or capital, or political power, or whatever else the powerful value. The reason it was destroyed is that the Devil whispered a lie into the ear of its most powerful inhabitant: “If you take this gamble, the expected value for you will be positive.” The suit agreed to the game, and the Devil immediately used the power of the suit’s cooperation to pay preachers, influence governments, discredit education, tell lies, and push its plan to fruition. Did the suit know the station would be destroyed? Maybe, maybe not; the question is irrelevant, as the suit’s only job is to pressure those below them to make money. Did the suit lose a million or gain ten million from the station’s destruction? That is also a meaningless question, as money is a metaphor for value, and we can never know whether the suit or the Devil valued the station. Is the suit even at fault? That question is meaningless too. If the suit weren’t the type of person to gamble the station’s existence away, they wouldn’t have been the suit. The Devil would have pressured them to leave and replaced them with someone who would make the gamble. All we can know is that the station itself was just another variable that the Devil set to an extreme value in order to increase another number slightly.

As silly as it sounds, the station was destroyed by a super-intelligent AI: Adam Smith’s “invisible hand” made real. However, instead of running on computers, the internet, or whatever “cyberspace” is, this AI runs on economics, and it is just as dangerous as Robert Miles warns us it could be. Its goal is to make money, and everything that is not explicitly money is just a resource to be converted into money, including space stations. The station’s destruction is just a natural consequence of that imperative.

However, there is one final secret I wish to reveal. If the political system that provides the stability necessary for growth becomes a variable to be modified, then the information and assumptions necessary to calculate risk degrade. The result is model collapse6, a documented phenomenon that occurs when the output of an AI system is used to train the next generation of AI. These inputs only reinforce the AI’s hard-coded assumption that what it believes is real actually is real, and its connection to reality degrades. It is what happens when the interrogator in the imitation game is itself replaced with a computer, and we end up trying to convince a computer that a computer is not a computer. Without humans to create images of, we end up creating an image of an image of a human, which is nothing more than a hallucination, and the output of such a model becomes garbage with no connection to reality.
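A common toy picture of model collapse, in the spirit of the single-Gaussian case discussed in this line of research, fits a normal distribution to some data, samples fresh “data” from that fit, and repeats. This is a sketch rather than the experiment from the cited paper, and the sample size, number of generations, and seed are arbitrary assumptions. Because each generation sees only the previous generation’s output, estimation errors compound and the spread tends to shrink toward zero, so the rare values in the tails are the first to disappear.

```python
import random
import statistics


def next_generation(samples, n, rng):
    # Fit a normal distribution to the previous generation's output...
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    # ...then build the next generation's "training data" purely from that fit.
    return [rng.gauss(mu, sigma) for _ in range(n)]


def run(generations=200, n=20, seed=0):
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # generation 0: the real data
    spreads = [statistics.pstdev(data)]
    for _ in range(generations):
        data = next_generation(data, n, rng)
        spreads.append(statistics.pstdev(data))
    return spreads


spreads = run()
for gen in (0, 10, 50, 100, 200):
    print(f"generation {gen:3d}: spread {spreads[gen]:.6f}")
```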

So just as the law of large numbers guarantees that a positive expected value will create unbounded wealth for anyone who can keep playing, a negative expected value guarantees bankruptcy. When the Devil errs in telling the suit that a gamble is profitable, Smith’s invisible hand will give its final present to itself: its own destruction. I lied when I said the people with means managed to escape Jupiter Station; Jupiter Station is just a metaphor. Nobody did. Nobody does. There is nowhere else to go.

  1. Smith, Adam. “The Wealth of Nations”, p. 23.
  2. Kaufman, Frederick. “The Money Plot”, p. 9.
  3. Mostly because they aren’t made of gold.
  4. Also known as the ‘average’.
  5. \(\frac{-1 + 9}{2} = 4\) million dollars.
  6. Shumailov, Ilia et al. “The Curse of Recursion: Training on Generated Data Makes Models Forget”.

This is the personal blog of Ryan Chartier. I post all of my long form content here.
