Playing to Win: TicTacToe

While attending university, I spent a couple of summers working as a counsellor at various overnight children’s camps. One year, we played a game where each senior counsellor would set up an activity. Each cabin, led by an activity leader, would compete to see who could complete the most activities in a set period of time. The activity I put together required one volunteer from each cabin to challenge me to a game, anything they could come up with. If they beat me, they would get my point. Most of the challenges were things I was doomed to fail at from the start: a cartwheel contest, staring contest, a race to see who could count to ten the fastest. However, the one that stuck with me the most is the one that I shouldn’t have lost at all. A child foolishly challenged me to a game of TicTacToe. After trading ties for a few rounds, something strange happened: I lost.

TicTacToe is the frictionless surface of the game theory world; it’s less a game, and more a theoretical demonstration of what it means for a game to be unwinnable. If two players enter the game, and play perfectly, it is impossible for either player to win; this fact is common knowledge. I knew it, and the child across from me knew it too. Yet, she still won, I still signed her cabin’s activity sheet proving that she won, and I went home knowing that I lost the last game of TicTacToe I would ever play for real stakes.

Discussions about TicTacToe are common in the world of computer programmers. It is the perfect first project for anyone trying to teach themselves game theory and computer AI. A computer sees a game like TicTacToe as little more than an optimization problem: given any particular board state, return the optimal next move. Thus, all discussions around the game are discussions about what constitutes the most optimal move. The most important position, and thus the most discussed, is the opening position. The most common argument goes as follows.

  1. The best move is one that is most likely to result in a win.
  2. Assuming we are playing against a player who plays randomly, the best move is the one that creates the most losing opportunities for the opposing player.
  3. <math>… 7 > 4 …</math>
  4. Therefore, playing in the corner is best.

This discussion might include some scientifically collected statistics of game results pitting various levels of computer AI against each other[efn_note]https://blog.ostermiller.org/tic-tac-toe-strategy/[/efn_note], or an admission that the analysis applies only to perfect play and more analysis is necessary to account for imperfect play[efn_note]https://funpaperandpencilgames.blogspot.com/2019/02/tic-tac-toe-strategy-tutorial.html[/efn_note]. However, the results are always the same. The reader learns some interesting facts about what move is best in certain situations, but is still just as likely to lose to a child when challenged to real stakes: as I did.

The problem is that such discussions don’t really talk about winning at TicTacToe, they are about beating dumber computer programs, which highlights a fundamental difference between how humans and computer approach games and decision making in general: randomness. Human intelligence is utterly incapable of randomness, while artificial intelligence depends almost completely on it[efn_note]Alright, for the geek reading this. It’s not really random numbers; computers can’t do that either. Instead they use pseudo random numbers. An algorithm that deterministically generates numbers that are random enough for their purpose. The difference though is meaningless unless you are a hacker, cryptographer, or speedrunner.[/efn_note]. In TicTacToe, randomness represents the baseline player; it is the dumbest possible computer program we are capable of generating[efn_note]With the exception of one that is actively trying to lose of course.[/efn_note]. However, it does not represent the dumbest possible human strategy. In fact it is not a possible human strategy at all, nor is it a reasonable approximation of one.

To demonstrate, take a look at the following position.

Now, try as hard as you can to place yourself in the body of a ten year old who knows nothing about the game. It’s your turn to play. Where do you move?

A random player is just as likely to play in any open position, but did you pick randomly? If not, why did you place the piece where you did? Did if feel strongest? Do you know it’s strongest? Did it fulfill some sort of pattern in your brain? If you did pick randomly, how did you choose which one was your random choice? Did you simply pick the one that feels the most random?

How humans feel out what position to play in is not a fundamentally part of game theory, but it is of absolute importance when talking about winning at real games. Whatever your answer, hold onto it; we’ll get back to it soon.

Some AI Basics

First let’s go over a simple but important concept quickly: minimax tree search.

In the above position, it’s black’s turn to play. Who wins?

Black wins. Playing in the top left hand corner ends the game with a win. They could, if they want, play somewhere else, but that would allow red to win or force a draw and is a mistake. For the time being we will assume players don’t make those. A game is said to be ‘winning for black’ if it’s black’s turn and they have at least one move that wins.

We have now travelled one turn back in time. It’s red’s turn. Who wins?

Black still wins. Black has two winning moves available to them and red can only block one of them on their turn. Thus this position is still ‘winning for black’ because all of red’s moves transition the game to a position where black has at least one winning move.

Through computer analysis, we can analyze every state of the game starting from winning positions and working backwards to decide who wins in every stage. If the current player has any winning move available to them then the position as a whole is winning for that player. If all of the moves available are losing for them then the position as a whole is losing as well. Anything else is a tie. The reason TicTacToe as a whole is considered an unwinnable game is because none of the moves available at the start of the game are winning.

The following widget will allow you to play TicTacToe against yourself or a computer player (by clicking on the AI button). It includes an option to “show hints”. If selected, each empty positions will gain a highlight: green indicates that the move is winning for the current player, red means losing, and yellow is a tie. Before moving on, I recommend playing around with the widget until you understand this concept thoroughly[efn_note]To learn more about how I built this I suggest this blog post by Pascal Pons: https://blog.gamesolver.org/. He is solving connect4, but much of the theory is applicable to both games.[/efn_note].

The Opening Positions

Let’s consider the available opening moves.

The corner is the simplest opener. Either the opponent plays in the centre, or they lose.

The centre opening is a simple 50-50. For computers, half of the available squares are losses. For humans things simplify around corners and edges. All of the corners lead to one outcome, while all of the edges lead to another. The probability of winning is then just the probability that the player you are up against is the type of person who doesn’t play in corners or edges.

The edge opening is the hardest to understand completely. Like centre, the computer player is looking at a solid 50-50. However, there are no easy rules of thumb a human player can use to memorize all safe positions. Instead of just deciding between edges and corners the opponent now has to also consider distance; It’s less obvious which corners and which edges are safe.

If we considering only winning and losing positions, the corner opener certainly seems like the best option. There is only one response, and this simple fact is commonly why it’s considered the best; all other openers simply offer less losing moves. The edge opener feels like the worst, as it is mathematically inferior to the corner opener. Not only is the center square safe, but others are as well. However, corner opener has one profound weakness. The singular correct response for red is also the one square simply begging to have a token placed on it. Earlier, did your inner ten year old play in the centre? I won’t say they did, but I will confidently bet that much more than one in eight of you did. I say this for one main reason; it is the most symmetrical. It has the most triples running through it and therefore it just feels more powerful to anyone aware of the goal of the game.

Consciously or unconsciously humans always have a reason for everything we do, even if that reason is simply to create an subjectively aesthetic pattern. Positions don’t exist in a vacuum, they are always in relationship with each other. In any situation where we don’t know the correct answer, the action we take will still fulfill some internal criteria: maybe we placed our token next to the black tile, maybe across from it[efn_note]It’s a similar effect to the one that causes 7 to feel like the most random number from one to ten. Some (nonscientific but fun nonetheless) data can be found here.[/efn_note], or maybe we took the answer from some unrelated part of our environment (decision anchoring). Either way, simply counting how many safe squares are available to the opponent is not a good indicator of how good that opener is. We must also take into account how likely it is for a human to know the correct answer, and also how much the wrong answers feel like right answers.

Corner play loses to both of these. Our internal pattern matching system tells us that there are three, not eight, possible replies to opening corner: corners, edges and the centre[efn_note]Yes there is some technical differences between opposite and adjacent corners and edges; however, these structural differences are irrelevant to both the games outcome and the winning players strategy in terms of patterns. Neither the player learning to enact or defend against this opener needs to notice these differences.[/efn_note]. Two of these end the game immediately, meaning a new player, learning the game, will only have to play a maximum three games before completely exhausting the outcomes of each of these replies, assuming they don’t play center immediately just because it feels right. Likewise, it’s easy to remember once learned: centre is strong. Remember that and you will never lose to a corner opener again.

Now this doesn’t mean probabilistic modelling is useless. While we can’t model any particular individual as a random number generator, we can model communities as a whole. Imagine a million children getting asked to play as red after we play corner. Someone is going to play in every tile, but some tiles will get played more than others. What comes out is a probability distribution; the probability that any particular player will be the type of player who plays in certain squares. This is refereed to as a games ‘meta’, a generalization of what a community believes is powerful at any given point in time. Metas are not static, they change and grow as the community changes and grows. They are a weird mix a human intuition and learned behaviour that can change radically depending on the geography, size, common experiences, unwritten rules, and theoretical knowledge of the community as a whole.

Understanding a local meta is vital when picking openers. If we can assume that a community thinks that playing corner is powerful, then they are more likely to know that centre is the right response. In such situations playing centre could be better. Yes there is a higher probability in random play that they will guess right, but that is still better than them just knowing what the right response is, or worse feeling what the right response is.

Complexity

TicTacToe is a game of mistakes, and if we are to find a winning strategy it needs to be one that allows our opposition as many opportunist to make a mistake as possible. Even if someone knows the correct response to an opener, that doesn’t mean they know the correct response for later positions. It’s much hard to memorize the correct responses to a sequence of moves, then the response to a single hard position.

One way to get an idea of how complex an opener is, is to visualize how complicated the resulting game tree becomes. There are thousands of possible games of TicTacToe, so in order to come up with a useful set of positions to visualize, we need to simplify the game by making a few assumptions on what constitutes a reasonable game.

Firstly, let’s assume mostly perfect play.

  1. If a winning move is presented to a player they will always take it.
  2. Players always block a simple win (see below graphic).
  3. Either player will sometimes, at very low probability, pick a losing move. We are primarily interested in states where this can happen.
  4. Once a win is no longer possible, given the above three points, ignore all further variations.
  5. We ignore positions that are just rotations or mirrors images of positions we have already considered[efn_note]Once we determine that a position is losing we don’t need a reminder that the same position rotated ninety degrees is also losing.[/efn_note].
No reasonable red player will ever play anything but top edge here.

In the below visualization each node is a key/value pairing. Each key represents an action a player could make; the key “x_1_1” means that the X player places a token at the row one column one square (counting starts at zero). The value associated with the key represents the result of that decision. An integer value means that the game is over: -1 means X has lost, 1 means X has won, and 0 represents a tie. If the game continues a button appears allowing us to reveal the key/value pairs for the next stage of the game.

The centre opener is the least complicated and produces only one real line of play. After the O player responds in any corner, X has only two moves that don’t immediately end the game in a tie. Playing next to their opening only succeeds at creating a route to victory for O and should be avoided. Playing in the opposite corner challenges the O player to play in one more corner before the game ends in a tie. Memorizing this sequence is trivial, even for a small child.

Starting in the corner is a bit better.

After the O player responds in the centre, the X player is given a choice between two lines; both can lead to a win. Playing in the opposite corner mixes things up by forcing the O player to play on an edge before ending the game in a tie. Playing on an opposite edge is a bit more interesting. If the O player is aware of this unusual position they can play in the opposite corner and force the X player to dodge a single bad tile before ending the game in a tie. Everywhere else is a minefield that the O player needs to wade through, at least one position creating a second such minefield. Either way, ties are a lot harder to stumble into than the centre start.

Now the last one; the edge start.

I can’t summarize this position in one paragraph. The most important line, where the O player responds in the centre, alone is about as complicated as the other two openers combined. In fact, it actually contains some of the corner openers more complicated lines. As well, a lot more variations here end at six moves, meaning that the O player is frequently required to play at least three times before ending the game. A relatively deep understanding of the game is necessary to navigate this opener safely. If you know more about the game than your opponent, opening edge is a really good way to offer your opponent plenty of opportunities to screw up.

Conclusion

If we focus only on the perspective of game theory or AI, we get this warped perspective of what it means to play a game. AIs are optimizers, programs who find the optimal action given some rules and contexts. However, few games actually have truly optimal actions or strategies. Much more common are games that seem like they have best strategies, but acting on those strategies can result in a worse performance: the prisoners’ dilemma for example. To play such games well we need more than just theoretical insight into the game itself; we also need a model of our opponent. Focusing on what’s optimal, or by relying on strategies that are optimal given a specific meta, makes us predictable and easy to manipulate. It’s like finding a bug in computer software; once found it can be exploited indefinitely until the code is changed. What is far more powerful is first understanding what moves a player is likely to make and then searching the game for positions that punish that action. If someone is known to play in corners, then any position where a corner is a losing move is optimal. Even better would be to coax them into playing corners more often by priming them to think of corners as being good by opening with a relatively safe move: like the centre.

This is the fate of the corner opener. The fact that it is viewed so favourably means that any somewhat competent opponent is likely to already have studied it, and know the proper responses. Edge play, conversely, can punish people with some knowledge of the game, as such a player is more likely to ignore their gut instinct and ‘shake things up’ by playing in a square they may erroneously think is safe. Likewise, edge play is the only opener that forces a player to consider not just the differences between edges and corners, but also the differences between different corners. However, playing corner isn’t a bad move; it’s a test. When my opponent plays in the corner they are testing my knowledge of the game. They do so knowing that they are not likely to win. However, maybe winning right away isn’t their plan. Maybe I’m being set up for something bigger, an attempt to put my brain into autopilot, an attempt to convince me that TicTacToe is a simple game. Because, after I start believing that, my brain shuts off, they play edge, put me in a position I was unprepared for, and win.

Corner opener might be the best way for a new player to beat another new player, but opening edge is the best opener at higher levels as it is the only opener that forces both players to prove that their knowledge of the game transcends the Dunning-Kruger effect[efn_note]https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect[/efn_note]. Whoever was lying will eventually lose. If neither was lying then, and only then, do we reach perfect play and the game becomes unwinnable.