AlphaGo vs. Lee Sedol, 2016: The Game That Humbled Humanity

“Go is the oldest board game still widely played. The number of legal Go positions is estimated at 2.08×10^170 — more than the number of atoms in the observable universe. Those decades do not come. The decades are compressed into months. AlphaGo is here, in this room, now, and the match is about to begin.”

— The decades that did not come

Seoul, South Korea. March 9, 2016. The Four Seasons Hotel on the banks of the Han River. A room set up for a historic occasion: a five-game Go match between Lee Sedol — nine-dan professional, the most decorated Go player of his generation, widely described as one of the two or three best players in the history of the game — and AlphaGo, a computer program developed by DeepMind, a British AI company owned by Google.

Go is the oldest board game still widely played. It was invented in China more than 2,500 years ago. It is played on a 19×19 grid; players alternate placing black and white stones on the intersections. The player with more territory at the end of the game wins. The rules can be stated in a paragraph. The depth of the game is inexhaustible.

The number of legal Go positions is estimated at 2.08×10^170 — more than the number of atoms in the observable universe. The game has been described as the most complex two-player board game in existence. Chess, which computers had mastered by 1997, was considered vastly simpler. When Deep Blue defeated Kasparov, the chess world had been humbled but most people believed Go was safe: surely a machine would need decades more before it could challenge the best human players.

Those decades do not come. The decades are compressed into months. AlphaGo is here, in this room, now, and the match is about to begin.

AlphaGo vs. Lee Sedol match begins

Date:: March 9, 2016
Location:: Four Seasons Hotel, Seoul, South Korea
Significance:: A five-game Go match between Lee Sedol (9-dan professional, one of the greatest players in history) and AlphaGo (DeepMind’s AI system). When Deep Blue defeated Kasparov in 1997, most observers believed Go was decades away from machine mastery. The match compressed those decades into months.
Outcome:: AlphaGo won the match 4-1. Lee Sedol won Game 4 with “the divine move.” Lee Sedol retired from professional Go in 2019, citing AlphaGo’s existence as the reason.

Note

The number of legal Go positions is estimated at 2.08×10^170 — more than the number of atoms in the observable universe. Chess, which Deep Blue mastered in 1997, was considered vastly simpler. When Deep Blue defeated Kasparov, most observers believed Go was decades away from machine mastery. Those decades did not come. AlphaGo compressed them into months.

Why Go Was Considered the Last Bastion

The claim that Go was too complex for computers to master was not idle boast. It was based on specific arguments about what computer game-playing required and what Go required that distinguished it from chess.

Chess programs like Deep Blue had won by combining brute force search — examining enormous numbers of possible move sequences — with a reasonably accurate evaluation function that could assess the quality of a position. The evaluation function was essential: it provided the numbers that the search tree needed to make decisions. For chess, evaluation functions could be designed that captured most of what mattered: piece values, king safety, centre control, pawn structure. These functions were imperfect, but they were good enough.

For Go, the evaluation problem was dramatically harder. At any given point in a Go game, it was very difficult to assess which player was ahead. A position that looked good for one player could reverse into a position that was good for the opponent through a sequence of moves that only a very experienced player could see. The intuitions that strong human players used to evaluate Go positions — the sense of which groups were alive and which were in danger, which territories were secure and which were vulnerable, which moves created the most aji (latent potential) — were not easily formalised into an evaluation function.

Definition

Evaluation function — In a game-playing program, the function that assigns a numerical score to a board position, used to compare leaf nodes of the search tree. For chess, evaluation functions could capture most of what mattered (piece values, king safety, centre control, pawn structure). For Go, evaluation was dramatically harder — the intuitions strong players used to judge positions (which groups were alive, which territories were secure, where aji lay) were not easily formalised. AlphaGo’s breakthrough was to learn the evaluation function from data rather than hand-craft it.

The search problem was also qualitatively different. In chess, the branching factor — the average number of legal moves at each position — was approximately 35. In Go, it was approximately 250. The search tree for Go grew far faster than for chess, making pure brute-force search hopeless to a depth that would produce good play.

Previous attempts to write strong Go programs had consistently failed. The best computer Go programs in 2014 were playing at the level of a strong amateur — significantly below the level of even a low-ranked professional. The leap from amateur-level to world-champion-level, which had taken approximately thirty years for chess programs after Shannon’s foundational paper, seemed likely to take at least as long for Go.

DeepMind compressed those thirty years into approximately three years.

Important

Before 2016, Go was widely considered safe from computer mastery. Two specific arguments supported this belief:

The evaluation problem was dramatically harder than for chess — the intuitions strong players used were not easily formalised
The search problem was qualitatively different — the branching factor in Go (~250) was an order of magnitude larger than in chess (~35), making brute-force search hopeless to useful depths

DeepMind compressed what looked like thirty years of progress into approximately three years.

DeepMind and the AlphaGo Project

DeepMind was founded in London in 2010 by Demis Hassabis, Shane Legg, and Mustafa Suleyman. Hassabis was a former chess prodigy, game designer, and neuroscience researcher who had developed a specific vision of AI research that combined deep learning, reinforcement learning, and neuroscience-inspired approaches. The company was acquired by Google in 2014 for approximately $500 million — one of the largest acquisitions of an early-stage AI company to that point.

Demis Hassabis

Born:: July 27, 1976, London, England
Died:: Living
Nationality:: British
Role:: AI researcher, entrepreneur, neuroscientist
Known for:: Co-founder and CEO of DeepMind (2010, acquired by Google in 2014); former chess prodigy and game designer; led DeepMind through AlphaGo, AlphaFold, and the 2024 Nobel Prize in Chemistry for AlphaFold. One of the most influential figures in modern AI.

DeepMind’s reputation in the AI community was established primarily by the “Atari paper” — a 2013 paper in which DeepMind researchers demonstrated that a single deep reinforcement learning system could learn to play forty-nine Atari video games at or above human level, learning directly from the raw pixel inputs of the screen without any game-specific programming. The result was impressive not because the specific games were hard but because the same general-purpose system could learn to play all of them without any domain-specific knowledge.

Definition

Deep reinforcement learning — A machine learning paradigm that combines deep neural networks with reinforcement learning. The agent learns by taking actions in an environment and receiving rewards or punishments, with a deep network approximating the value of states or the optimal policy. DeepMind’s 2013 Atari paper demonstrated that a single deep reinforcement learning system could learn to play 49 different Atari games at or above human level, directly from raw pixels, without any game-specific programming. The same paradigm would produce AlphaGo three years later.

The Go project — AlphaGo — began in 2014 as an attempt to apply the deep learning and reinforcement learning approaches that had produced the Atari results to the harder and more structured problem of Go.

The AlphaGo team, led by David Silver, developed an architecture that combined three learned components: a policy network that predicted good moves given a position, a value network that evaluated the overall quality of a position, and Monte Carlo tree search that combined the policy and value networks to produce actual move recommendations.

David Silver

Born:: 1976, London, England
Died:: Living
Nationality:: British
Role:: Computer scientist, AI researcher
Known for:: Lead researcher on the AlphaGo project at DeepMind; led the subsequent AlphaGo Zero, AlphaZero, and MuZero projects; professor at University College London before joining DeepMind. The principal architect of DeepMind’s deep reinforcement learning programme.

The policy network was trained first on a dataset of approximately thirty million positions from professional Go games — learning to predict which moves professional players would make in given positions. This gave the network a reasonable approximation of “what strong human players do,” which was used to initialise a reinforcement learning phase.

The reinforcement learning phase trained the policy network further by having it play against previous versions of itself, improving through experience. Crucially, the training was self-play — the network learned entirely by playing games against itself, receiving wins and losses as the only feedback. This self-play training produced a policy that was substantially stronger than the supervised learning policy alone.

The value network was trained to evaluate positions — to predict, from a given board position, who was more likely to win the game. This network provided the evaluation function that had been missing from previous Go programs: not a hand-crafted function based on human understanding of what made positions good or bad, but a learned function derived from the statistics of millions of self-play games.

The three components — policy network, value network, and Monte Carlo tree search — were combined into AlphaGo’s decision process. At each move, AlphaGo used the policy network to identify promising candidate moves, evaluated them using a combination of Monte Carlo rollouts and the value network, and selected the move with the highest estimated probability of winning.

Definition

Monte Carlo Tree Search (MCTS) — A search algorithm that uses random sampling (Monte Carlo rollouts) to evaluate positions in a game tree. Rather than exhaustively searching to a fixed depth, MCTS plays out many random games from candidate positions and uses the statistics of those games to estimate the value of each position. AlphaGo combined MCTS with learned policy and value networks: the policy network guided which moves to consider, the value network estimated position quality, and MCTS provided the search framework that integrated them.

Fan Hui and the First Warning

The public announcement of the AlphaGo match against Lee Sedol in March 2016 was preceded by a quieter, more significant event: a match against Fan Hui, a European Go champion and two-time European Go Champion, in October 2015.

Fan Hui was a professional Go player — the European champion — but was significantly below Lee Sedol’s world-class level. In Go terms, the gap between Fan Hui and Lee Sedol was approximately the same as the gap between a regional chess champion and a world championship contender.

AlphaGo defeated Fan Hui 5-0. The result was not close.

AlphaGo defeats Fan Hui

Date:: October 2015
Location:: London (DeepMind headquarters)
Significance:: A closed-door five-game match between AlphaGo and Fan Hui (reigning European Go champion). AlphaGo won 5-0. The match was kept secret while DeepMind prepared the public announcement of the Lee Sedol match and the Nature paper.
Outcome:: The first time a computer program defeated a professional Go player in an even game. A warning that DeepMind kept quiet for tactical reasons — they wanted to skip the “AI can beat amateurs but not professionals” conversation and go directly to the professional match.

The match was conducted secretly — Fan Hui was sworn to confidentiality about the result — while DeepMind prepared to announce the match against Lee Sedol and the publication of the AlphaGo paper in Nature. The secrecy was partly practical (DeepMind wanted to control the announcement) and partly tactical (the result against Fan Hui would have been dismissed as proof that AlphaGo could beat amateurs but not professionals, and DeepMind wanted to skip this conversation and go directly to the professional match).

Fan Hui’s experience of the matches was, by his subsequent accounts, disorienting. He had expected to win. He was a professional Go player; the computer programs he had encountered were amateurs. He lost the first game, then the second, then the third. The moves AlphaGo was making were not the moves a human player would make, but they worked. The positional understanding the machine demonstrated was different from human positional understanding, but it was not inferior.

After the series, Fan Hui reported that AlphaGo had changed how he thought about Go. Some of its moves, which had seemed strange to him during the games, he now recognised as deeply correct — moves that revealed aspects of the game’s theory that human players had not noticed. The machine was not just beating him; it was teaching him.

Lee Sedol: The Human Champion

Lee Sedol was born in 1983 in South Korea and began studying Go at the age of seven. By eighteen, he had become the youngest player in history to reach the top professional rank in South Korea. Over the following decade and a half, he accumulated an extraordinary record of achievement: eighteen international title wins, a record of decisive victories against the world’s best players, and a reputation for creative, aggressive play that was widely admired.

Lee Sedol

Born:: March 2, 1983, Sinan County, South Korea
Died:: Living
Nationality:: South Korean
Role:: Professional Go player (9-dan)
Known for:: One of the greatest Go players of his generation; 18 international titles; subject of the March 2016 AlphaGo match; famous for his “divine move” (move 78) in Game 4 — the only game he won against AlphaGo; retired from professional Go in November 2019, citing AlphaGo’s existence as the reason

Lee Sedol was not just technically excellent. He was creative in a way that distinguished him from players who were merely technically superior. He was known for surprising moves that violated conventional wisdom but proved to be deeply correct — moves that changed how the Go world thought about certain types of positions. He was aggressive in ways that created complex, messy positions where his creativity gave him an advantage. He was, by general consensus, one of the most interesting human beings who had ever played Go.

He agreed to the match against AlphaGo with significant confidence. His pre-match statements were not dismissive of the computer — he acknowledged that he might lose a game — but were optimistic about the overall outcome. He expected to win the match 5-0 or, at worst, 4-1. He believed that the complexity of Go, and the specific kinds of creative, intuitive moves that had defined his playing style, would be beyond a computer.

This confidence was not arrogance. It was a reasonable assessment given the state of computer Go as it had been known to the public. The gap between professional-level and amateur-level Go play was enormous, and the strongest known computer programs were playing at approximately the level of a strong amateur. Lee Sedol had no reason to believe that the gap had been closed.

The DeepMind team knew the gap had been closed. They had watched AlphaGo play tens of millions of games against itself. They had seen it develop strategies that human professionals had not anticipated. They were confident about the outcome. But they could not be certain — Go was complex, and human creativity might find moves that AlphaGo’s training had not prepared it for.

Warning

The DeepMind team knew the gap between AlphaGo and Lee Sedol had been closed — they had watched AlphaGo play tens of millions of games against itself and had seen it develop strategies that human professionals had not anticipated. They were confident about the outcome but could not be certain. Lee Sedol, by contrast, had no reason to believe the gap had been closed — the strongest publicly known computer programs were still playing at approximately the level of a strong amateur. The asymmetry of information shaped the pre-match dynamics.

Game 1: The First Shock

The first game of the match began at 1:00 PM on March 9, 2016, and was watched by approximately 60 million people on Korean television and hundreds of millions more via live streaming.

AlphaGo played white. Lee Sedol played black, giving him the advantage of the first move. The game developed normally through the first twenty moves — both sides playing in the expected manner for the opening positions each chose. The Go world watching the game knew what to look for and could follow the unfolding patterns.

Then the games started to diverge from what the observers expected. AlphaGo began playing moves that were unusual — not wrong, but not the moves that the professional community would have predicted. Moves that seemed too cautious in some positions, too aggressive in others, moves that established influence in unconventional ways.

Lee Sedol, playing with characteristic aggression, pressed his positional advantages in the lower part of the board. In a human game, this kind of pressure often led to complex fighting where the more experienced player could outread the opponent. Against AlphaGo, the fighting did not produce the results that Lee Sedol was seeking. The computer’s position was stable in ways that defied expectation.

At move 186, Lee Sedol resigned. The game had lasted approximately three and a half hours. AlphaGo had won the first game.

The reaction was significant. Lee Sedol had been expected to win comfortably; instead, he had lost a well-played game to a machine. The Go world was shaken, but one game was not a match.

AlphaGo vs. Lee Sedol, Game 1

Date:: March 9, 2016
Location:: Four Seasons Hotel, Seoul
Significance:: The first game of the match. Watched by approximately 60 million people on Korean television and hundreds of millions more via live streaming. AlphaGo played moves that diverged from what the professional community expected — moves that were unusual but not wrong.
Outcome:: Lee Sedol resigned at move 186, after approximately three and a half hours. AlphaGo had won the first game. The Go world was shaken, but one game was not a match.

Game 2 and Move 37: The Move That Changed Everything

Game 2, played on March 10, was where the match became historic.

The game was forty-five minutes old. Lee Sedol had played black; AlphaGo had played white. The position was complex and contested — both sides had been playing at a high level, and the evaluation of who was ahead would have been disputed among the professional commentators watching the game.

AlphaGo played move 37: the white stone was placed on the fifth line in a position that no professional human player would have played. The commentators’ reaction was immediate: a sharp intake of breath, then silence, then discussion. The move violated one of the foundational principles of classical Go theory — stones on the fifth line were considered too high to be efficient territory-building tools.

Move 37 — AlphaGo’s fifth-line stone

Date:: March 10, 2016 (Game 2, move 37)
Location:: Four Seasons Hotel, Seoul
Significance:: AlphaGo placed a white stone on the fifth line in a position no professional human player would have played — violating a foundational principle of classical Go theory. The move shocked the commentators and initially appeared to be a mistake.
Outcome:: Over the next forty moves, move 37’s influence became apparent. The stone was not a mistake. It was a strategic conception of unusual depth that no human had played before. AlphaGo had discovered an exception to a rule that human players had not discovered. The move is now widely regarded as one of the most important in the history of Go.

But AlphaGo had calculated that the move was deeply correct. The stone on the fifth line created a positional structure that would prove devastating forty moves later, when it influenced a part of the board that nobody — not Lee Sedol, not the commentators, not the Go world — had yet understood would be relevant.

Lee Sedol left the playing room after seeing the move. He took fifteen minutes, which is a long time in a match at this level, before returning to the board and continuing. The professional commentators, who were among the strongest players in the world, spent those fifteen minutes debating what move 37 meant. Several of them concluded it was a mistake — that the computer had made an error, that Lee Sedol would be able to exploit it.

They were wrong. Over the next forty moves, move 37’s influence became apparent. The stone on the fifth line was not a mistake. It was a strategic conception of unusual depth — a move that looked wrong from the perspective of classical Go theory but was correct from the perspective of the specific position that was unfolding. AlphaGo had found a move that human theory had not developed, and the move was not just clever; it was correct.

Lee Sedol lost Game 2. He lost it because he could not find a response to a move that no human had played before and that changed his understanding of what Go was.

In a post-game press conference, Lee Sedol said: “I’ve never thought of Go as something I was doing alone. There was always the opponent. But after today, I’m not sure. The moves it played seemed like they were from somewhere else entirely.”

Quote

“I’ve never thought of Go as something I was doing alone. There was always the opponent. But after today, I’m not sure. The moves it played seemed like they were from somewhere else entirely.” — Lee Sedol, post-game press conference, Game 2, March 10, 2016

Important

Move 37 in Game 2 was the move that changed the match. AlphaGo played a stone on the fifth line in a position no professional human player would have played — violating a foundational principle of classical Go theory. The move looked like a mistake. It was not. It was a strategic conception of unusual depth that proved correct forty moves later. AlphaGo had discovered an exception to a rule that human theory had not discovered. The professional Go world had to confront the fact that the machine had found moves they did not know existed.

Games 3 and 4: Despair and Triumph

Game 3 followed two days later, on March 12. Lee Sedol was visibly troubled — his pre-game preparations were longer than usual, his demeanour more subdued. He had lost the match already, by the terms of the best-of-five format; AlphaGo’s 2-0 lead meant it needed only one more win to claim the match.

AlphaGo won Game 3 as well, giving it a 3-0 lead and the match. Lee Sedol was 0-3 in the match. The entire Go community was in shock.

Then came Game 4. On March 13, with the match already decided, Lee Sedol played the most important move in the series — a move that has since been called “the divine move” by the Korean Go world.

Lee Sedol’s “divine move” — Game 4, move 78

Date:: March 13, 2016 (Game 4, move 78)
Location:: Four Seasons Hotel, Seoul
Significance:: With the match already lost (AlphaGo leading 3-0), Lee Sedol played a “wedge” move on move 78 that cut into AlphaGo’s position in a way the program was apparently not prepared for. AlphaGo’s response was unusual — the program appeared to be in difficulty for the first time in the match.
Outcome:: AlphaGo made an error on move 79 that professional commentators recognised immediately. AlphaGo continued to deteriorate and resigned on move 180. Lee Sedol’s only win in the match — and the only game AlphaGo lost. The Korean Go world called move 78 “the divine move.”

On move 78, playing white, Lee Sedol placed a stone in a specific point on the board — a “wedge” move that cut into AlphaGo’s position in a way that the program was apparently not prepared for. AlphaGo’s response to the wedge was unusual — the program appeared to be in difficulty for the first time in the match. Its moves became less certain, its play less fluid. It made an error on move 79 that professional commentators recognised immediately as a mistake.

Lee Sedol exploited the mistake. AlphaGo continued to deteriorate, making further errors that compounded the problem created by the wedge. On move 180, AlphaGo resigned.

The reaction in the hall, where dozens of professional Go players and thousands more watching online had been following the game, was extraordinary. Lee Sedol’s move 78 had done something that nobody had thought possible after the first three games: it had found a move that AlphaGo was not prepared for. A human player had found a gap in the machine’s preparation.

The victory was celebrated not as a sign that humans could consistently defeat AlphaGo — it was clear from the match that they could not — but as proof that human creativity and intuition could still, in specific moments and specific positions, find moves that the machine had not anticipated. It was a beautiful, poignant, temporary victory.

Note

Lee Sedol’s Game 4 victory was beautiful, poignant, and temporary. It was not a sign that humans could consistently defeat AlphaGo — it was clear from the match that they could not. It was proof that human creativity and intuition could still, in specific moments and specific positions, find moves that the machine had not anticipated. The window in which this was possible was narrow and closing.

Game 5: The Final Statement

Game 5, on March 15, was the final game. Lee Sedol played with less apparent burden than in the previous games — he had already lost the match, had already won a game, and was now playing with a freedom that the earlier pressure had not allowed.

The game was complex and long. Both sides played at a high level. AlphaGo made an unusual move that commentators debated — one of the professionals present said it looked like the kind of move a human would make to “test” the opponent, to see how they responded. Whether AlphaGo was doing anything analogous to testing, or whether the move was simply the result of its policy network’s calculations, was impossible to determine.

Lee Sedol lost Game 5. The final score of the match was AlphaGo 4, Lee Sedol 1.

In his final press conference, Lee Sedol said: “This made me realise that Go is not something that humans can master. I had never thought that. I know I’m still a strong player, but after this series I’m not so confident in the way I play. The problem is AlphaGo doesn’t have any human bad habits.”

He paused. “I heard from the DeepMind team that AlphaGo improved itself for this match by watching millions of games. I’m not sure whether I should be angry about that or impressed.”

Quote

“This made me realise that Go is not something that humans can master. I had never thought that. I know I’m still a strong player, but after this series I’m not so confident in the way I play. The problem is AlphaGo doesn’t have any human bad habits.”

“I heard from the DeepMind team that AlphaGo improved itself for this match by watching millions of games. I’m not sure whether I should be angry about that or impressed.” — Lee Sedol, final press conference, March 15, 2016

What AlphaGo Actually Was: Inside the Machine

Understanding the significance of AlphaGo requires understanding what it was actually doing — what kind of computation was producing moves that shocked the Go world.

AlphaGo was not doing what Deep Blue did to defeat Kasparov. Deep Blue searched explicitly through game trees, examining hundreds of millions of positions per second. Its strength came primarily from the sheer quantity of positions it could evaluate, combined with a well-crafted evaluation function.

AlphaGo was doing something more interesting and more alien. Its policy network — trained through supervised learning on human games and reinforced through self-play — had learned a kind of Go intuition. Given a board position, the policy network would assign probabilities to different moves, reflecting what moves it had learned led to wins. This was not calculation in the traditional sense; it was pattern recognition applied to an extraordinarily complex domain.

The value network — trained to evaluate positions rather than predict moves — had learned something similar: a representation of board position that captured, in a non-explicit form, the kind of territorial and structural assessment that strong human players spent years developing.

The combination of these learned components, integrated through Monte Carlo tree search, produced a system that could evaluate millions of continuations, guided by learned intuitions, to produce moves that reflected an understanding of the game that was genuine but non-human.

Definition

Policy network — A neural network that takes a board position as input and outputs a probability distribution over possible moves. In AlphaGo, the policy network was first trained on approximately 30 million positions from professional Go games (supervised learning), then improved through self-play (reinforcement learning). The policy network gave AlphaGo a kind of “Go intuition” — pattern recognition applied to an extraordinarily complex domain — that guided which moves to consider in its search.

Definition

Value network — A neural network that takes a board position as input and outputs a single number estimating who is more likely to win from that position. In AlphaGo, the value network was trained on millions of self-play games. It provided the evaluation function that had been missing from previous Go programs: not a hand-crafted function based on human understanding of what made positions good or bad, but a learned function derived from the statistics of millions of self-play games.

The “non-human” quality was visible in specific features of AlphaGo’s play. It was comfortable with long-term strategic sacrifices that human players found psychologically difficult to make — it would give up territory or stones in exchange for positional advantages that would only materialise dozens of moves later, without the uncertainty about whether the sacrifice would pay off that troubled human players. It did not respond to pressure in the way human players did — it did not become defensive when attacked, did not become aggressive when losing. It played, always, what its calculations suggested was the best move.

The move 37 in Game 2 was the most visible example of AlphaGo’s non-human quality. The move was correct, but it was correct in a way that violated the explicit principles of classical Go theory. Human players learn to avoid stones on the fifth line in certain positions because such stones are typically too high to build territory efficiently. AlphaGo had learned, through millions of self-play games, that in this specific position, the fifth-line stone created a structural advantage that overrode the general principle. It had discovered an exception to a rule that human players had not discovered.

This was not superhuman intelligence in the general sense. AlphaGo could not play chess, could not converse in natural language, could not plan a dinner party. But within the specific domain of Go, it had developed insights that exceeded what human theory had developed — it had found moves and strategies that the human Go world had not seen.

Important

AlphaGo’s moves were correct, but they were correct in ways that violated the explicit principles of classical Go theory. It was comfortable with long-term strategic sacrifices that human players found psychologically difficult. It did not respond to pressure the way humans did. It played, always, what its calculations suggested was the best move. Move 37 was the most visible example — a stone on the fifth line that human theory said was too high to be efficient. AlphaGo had learned, through millions of self-play games, that in this specific position the fifth-line stone created a structural advantage that overrode the general principle. It had discovered an exception to a rule that human players had not discovered.

Lee Sedol’s Game 4 Move: Why It Mattered

Lee Sedol’s move 78 in Game 4 — the wedge that forced AlphaGo’s only loss in the series — deserves its own analysis, because it reveals something important about the relationship between human and machine intelligence in adversarial games.

AlphaGo’s training had involved approximately thirty million positions from human games, followed by millions of self-play games. Its preparation was extraordinary by any standard. But move 78 was a move that AlphaGo’s training had apparently not prepared it for — a move that fell into a part of the distribution of possible positions where AlphaGo’s policy and value networks were less reliable.

This was not a random event. Lee Sedol, with his deep understanding of the game and his creativity, had deliberately sought the kind of complex, unexpected position where human creativity might find moves that AlphaGo had not anticipated. The entire context of his Game 4 play — including the setup that made move 78 possible — was oriented toward creating exactly this kind of situation.

The success of this strategy was partial and temporary. AlphaGo’s subsequent AlphaGo Zero version, trained entirely through self-play, would develop such comprehensive preparation that even deeply creative human players could not find the kinds of gaps that Lee Sedol had found in Game 4. The specific window in which human creativity could still find moves that the machine had not anticipated was narrow and closing.

But the success was also meaningful. It demonstrated that human creativity — the ability to find genuinely novel approaches to complex problems — was not simply inferior to machine calculation. It was different, and in specific circumstances, it could find things that the machine had missed. The difference would eventually be overcome by scale of training. But it was a difference, not just a deficit.

The Cultural Impact: What the Match Meant for Asia

The cultural impact of the AlphaGo match in East Asia — particularly in South Korea and China, where Go is a significant cultural institution — was significantly greater than in the West.

In South Korea, Go is not just a game. It is a cultural practice with philosophical dimensions — a game that is associated with strategic thinking, long-term planning, and a specific kind of intellectual cultivation. Strong Go players are celebrities in South Korea in a way that chess players are not in most Western countries. The national audience for the match was comparable to the audience for major sporting events.

The loss of three consecutive games to a machine was experienced as a cultural shock of a specific kind — not just the defeat of an individual player, but a challenge to the specific cultural value placed on Go mastery. If a machine could master Go, what was the status of the human mastery that Go had represented?

The reaction in China was different but equally intense. China had produced some of the best Go players in the world, and the Go world was watching closely. The implications of AlphaGo’s success for China’s AI ambitions were not lost on Chinese technologists and policymakers — within months of the match, China had announced major AI development initiatives that cited AlphaGo as evidence of the strategic importance of AI.

The match was, for East Asian audiences, a demonstration that AI had arrived — that the technology had reached a level of capability in a culturally significant domain that made its implications for human activity undeniable. The significance of this demonstration was not lost on the region’s governments and companies, and the accelerated AI investment that followed was partly a product of the cultural resonance of the AlphaGo victory.

Info

In South Korea, Go is not just a game. It is a cultural practice with philosophical dimensions — associated with strategic thinking, long-term planning, and a specific kind of intellectual cultivation. Strong Go players are celebrities in a way that chess players are not in most Western countries. The loss of three consecutive games to a machine was experienced as a cultural shock — not just the defeat of an individual player, but a challenge to the specific cultural value placed on Go mastery. Within months of the match, China had announced major AI development initiatives that cited AlphaGo as evidence of the strategic importance of AI.

The Match’s Influence on DeepMind’s Research

The AlphaGo match was not just a demonstration. It was also a scientific experiment — one that advanced DeepMind’s understanding of what its approach could and could not do.

The specific features of AlphaGo’s play that had surprised professional Go players — the unconventional moves, the long-term sacrifices, the non-human strategic conception — were themselves scientifically interesting. They revealed that the combination of deep learning and reinforcement learning could discover strategic insights in complex domains that human experts had not developed. This was a specific and important capability: not just matching human performance but developing understanding that extended beyond what humans had discovered.

The match also revealed specific limitations of AlphaGo’s approach. The failure mode in Game 4 — the inability to handle Lee Sedol’s unexpected wedge — suggested that AlphaGo’s preparation, while extraordinarily comprehensive, had gaps in certain unusual positions. The training distribution had not covered everything.

These observations directly informed the development of AlphaGo Zero, published in 2017. AlphaGo Zero was trained entirely through self-play, starting from random play and receiving only win/loss feedback, without any training on human games. The result was a system that was dramatically stronger than the AlphaGo that had defeated Lee Sedol — and that had developed a different, more comprehensive understanding of the game, because it was not constrained by the patterns of human play.

AlphaGo Zero published

Date:: October 2017 (Nature)
Location:: DeepMind, London
Significance:: AlphaGo Zero — trained entirely through self-play, starting from random play and receiving only win/loss feedback, with no access to human games — was dramatically stronger than the AlphaGo that had defeated Lee Sedol. After approximately 40 days of training, it had become the strongest Go player in history.
Outcome:: Demonstrated the power of the pure self-play paradigm: AI systems could develop expertise in complex domains by playing against themselves, without any need for human guidance beyond the rules of the domain. The progression from AlphaGo (which used human games) to AlphaGo Zero (which did not) was itself a research finding.

AlphaGo Zero’s playing style was even more alien than AlphaGo’s. It discovered openings that human theory had not developed, strategic patterns that human players found unfamiliar, and a consistency that reflected its lack of any human-derived biases about how the game should be played.

The progression from AlphaGo to AlphaGo Zero was a demonstration of the power of the self-play paradigm: that AI systems could develop expertise in complex domains by playing against themselves, without any need for human guidance beyond the rules of the domain.

Important

AlphaGo Zero (2017) was trained entirely through self-play, with no access to human games. After approximately 40 days of training, it had become the strongest Go player in history — substantially stronger than the AlphaGo that had defeated Lee Sedol. The progression from AlphaGo (which used human games) to AlphaGo Zero (which did not) demonstrated the power of the pure self-play paradigm: AI systems could develop expertise in complex domains by playing against themselves, without any need for human guidance beyond the rules of the domain. The same paradigm would later produce AlphaZero (chess, shogi) and contribute to the reinforcement-learning-from-human-feedback techniques behind ChatGPT.

Lee Sedol’s Retirement: The Deepest Response

In November 2019, three years after the match, Lee Sedol announced his retirement from professional Go. He was thirty-six years old — young by the standards of a sport that many professionals play well into their forties. His stated reason was AlphaGo.

In an interview with Yonhap News Agency, he said: “With the debut of AI in Go games, I’ve realised that I’m not at the top even if I become the number one through frantic efforts. There is an entity that cannot be defeated.”

Quote

“With the debut of AI in Go games, I’ve realised that I’m not at the top even if I become the number one through frantic efforts. There is an entity that cannot be defeated.” — Lee Sedol, Yonhap News Agency interview, November 2019

Lee Sedol’s decision to retire because of AlphaGo’s existence was, in the opinions of many observers, a more profound response to the match than anything that had happened at the board. It was the response of a person who had devoted his life to mastering something, who had reached the summit of that mastery, and who had been shown that his summit was not the summit.

The response was not despair, exactly. Lee Sedol had won a game against AlphaGo — had played the divine move, had found a gap in the machine’s preparation. He knew he was extraordinary. The retirement was something more reflective: the recognition that the context in which his extraordinary had meaning had changed.

Professional Go — at the very top level — was partly about the pursuit of mastery for its own sake, and partly about competition against the best. If the best was no longer human, if the ceiling of human achievement had been unambiguously established as below the machine’s capabilities, the competitive dimension of that pursuit was changed in a specific and irrecoverable way.

Whether Go itself was changed was a more complex question. The game remained what it had always been — a game of extraordinary depth that could be played and enjoyed at every level. Amateur and amateur-to-professional competition continued to be meaningful. The human game remained rich and challenging. But the summit of human achievement had been located, and the location was lower than the mountain was tall.

Important

Lee Sedol’s retirement in November 2019 was, in the opinions of many observers, a more profound response to the match than anything that had happened at the board. It was the response of a person who had devoted his life to mastering something, who had reached the summit of that mastery, and who had been shown that his summit was not the summit. The competitive dimension of top-level Go had been changed in a specific and irrecoverable way.

The Philosophical Questions: Human Intuition and Machine Reasoning

The AlphaGo match raised philosophical questions that have been discussed since 2016 with no settled resolution.

What is intuition? Strong Go players typically describe their play partly in terms of intuition — a felt sense of which moves are good that operates faster than explicit reasoning and that experienced players have developed over years of practice. AlphaGo’s policy network was, in a specific sense, doing something analogous: producing move recommendations based on pattern recognition in high-dimensional space. Was this machine intuition? Or was it something categorically different — calculation that looked like intuition because the policy network’s processing was not transparent?

What makes a move creative? Move 37 in Game 2 and Lee Sedol’s move 78 in Game 4 were both, in different ways, creative. AlphaGo’s move was creative in the sense that it was unexpected, novel, and correct in a way that violated received wisdom. Lee Sedol’s move was creative in the sense that it found a specific gap in the machine’s preparation, a position that the training had not covered. Are these the same kind of creativity, or different kinds?

What is the value of human mastery when machines master better? Lee Sedol’s retirement articulated a specific concern: that the existence of an entity that “cannot be defeated” changed the value of his own mastery. This is a question that will be faced in many domains as AI systems develop capabilities that exceed human capabilities. The question is not just personal — about what individual practitioners should do — but cultural: what is the value of human excellence when machine excellence exists?

Info

The AlphaGo match raised three philosophical questions that are still being discussed:

What is intuition? Strong Go players describe their play partly in terms of a felt sense that operates faster than explicit reasoning. AlphaGo’s policy network was doing something analogous — producing move recommendations based on pattern recognition. Was this machine intuition, or calculation that looked like intuition because the processing was not transparent?
What makes a move creative? AlphaGo’s move 37 was creative in the sense of being unexpected, novel, and correct in a way that violated received wisdom. Lee Sedol’s move 78 was creative in the sense of finding a specific gap in the machine’s preparation. Are these the same kind of creativity?
What is the value of human mastery when machines master better? Lee Sedol’s retirement articulated a concern that will be faced in many domains as AI develops capabilities that exceed human capabilities.

These questions do not have simple answers, and the philosophical traditions that address them — virtue ethics, competitive aesthetics, theories of meaning and achievement — were not designed for the specific situation that AlphaGo created. The questions will need to be worked out in the specific contexts in which they arise, and the Go world has been one of the first to confront them concretely.

The Match’s Place in History: More Than a Game

The AlphaGo-Lee Sedol match was, like Deep Blue-Kasparov in 1997, a moment of cultural significance that exceeded what happened at the board. It demonstrated, to an audience that extended far beyond the AI research community, that machines had arrived at a capability that had seemed uniquely human.

The specific significance of the Go match exceeded the chess match in a specific way. Deep Blue had won at chess through a combination of brute force search and human-crafted evaluation — impressive engineering but not a fundamentally new kind of intelligence. AlphaGo won at Go by learning — by discovering, through self-play, a kind of strategic understanding that exceeded what human study had produced. The machine was not just calculating faster. It was learning in a way that produced genuine insight.

Important

AlphaGo’s significance exceeded Deep Blue’s in a specific way. Deep Blue had won at chess through a combination of brute force search and human-crafted evaluation — impressive engineering but not a fundamentally new kind of intelligence. AlphaGo won at Go by learning — by discovering, through self-play, a kind of strategic understanding that exceeded what human study had produced. The machine was not just calculating faster. It was learning in a way that produced genuine insight.

This difference mattered for how the match was interpreted. Deep Blue’s victory raised questions about computation and intelligence but could be partially deflected by arguing that chess was a special domain amenable to brute force calculation. AlphaGo’s victory was harder to deflect. Go was the game that had been chosen specifically as the test of whether machines could match human strategic thinking. When AlphaGo passed the test, the argument that machines could match human strategic thinking in this domain was established.

The implications were drawn quickly. If machines could learn Go strategy through self-play, what else could they learn through analogous approaches? The answer, which the subsequent years would confirm, was: very much. The reinforcement learning from human feedback that produced ChatGPT, the protein structure prediction that produced AlphaFold, the game-playing agents that achieved superhuman performance in a wide range of video games — all drew on the self-play and reinforcement learning paradigm that AlphaGo had demonstrated.

The Second Match: AlphaGo Masters vs. All Comers

After the Lee Sedol match, DeepMind continued to improve AlphaGo and played a subsequent series of online matches against professional players under the pseudonym “Master” in late 2016 and early 2017. The Master account played sixty games against professional players and won all sixty.

The most significant aspect of the Master games was not the score but the reaction of the professional players who played against it. Several of the world’s strongest Go players — Ke Jie, Park Junghwan, Gu Li — reported that the Master games had changed their understanding of Go theory. They identified specific moves and strategies that Master used that they had not previously considered and that, upon reflection, they recognised as genuinely correct.

This was a specific and remarkable claim: that an AI system had not just beaten the best human players but had taught them something about the game they had devoted their lives to. The machine was not just a measuring stick for human achievement; it was a teacher.

This teaching role is one of the most interesting and most hopeful aspects of the AlphaGo story. The capability that was revealed — the ability of AI systems trained through self-play to discover insights in complex domains that human practitioners had not developed — has applications that extend far beyond games. Scientific discovery, engineering design, drug development, material science — any domain with a clear objective function and a complex search space could in principle be approached with similar methods.

Note

After the Lee Sedol match, the “Master” account played sixty games against professional players in late 2016 and early 2017 — and won all sixty. Several of the world’s strongest Go players (Ke Jie, Park Junghwan, Gu Li) reported that the Master games had changed their understanding of Go theory. The machine was not just a measuring stick for human achievement; it was a teacher. This teaching role is one of the most hopeful aspects of the AlphaGo story — and the capability extends far beyond games to any domain with a clear objective function and a complex search space.

AlphaGo vs. Lee Sedol, 2016: The Game That Humbled Humanity

Why Go Was Considered the Last Bastion

DeepMind and the AlphaGo Project

Fan Hui and the First Warning

Lee Sedol: The Human Champion

Game 1: The First Shock

Game 2 and Move 37: The Move That Changed Everything

Games 3 and 4: Despair and Triumph

Game 5: The Final Statement

What AlphaGo Actually Was: Inside the Machine

Lee Sedol’s Game 4 Move: Why It Mattered

The Cultural Impact: What the Match Meant for Asia

The Match’s Influence on DeepMind’s Research

Lee Sedol’s Retirement: The Deepest Response

The Philosophical Questions: Human Intuition and Machine Reasoning

The Match’s Place in History: More Than a Game

The Second Match: AlphaGo Masters vs. All Comers

Further Reading

Comments

Why Go Was Considered the Last Bastion

DeepMind and the AlphaGo Project

Fan Hui and the First Warning

Lee Sedol: The Human Champion

Game 1: The First Shock

Game 2 and Move 37: The Move That Changed Everything

Games 3 and 4: Despair and Triumph

Game 5: The Final Statement

What AlphaGo Actually Was: Inside the Machine

Lee Sedol’s Game 4 Move: Why It Mattered

The Cultural Impact: What the Match Meant for Asia

The Match’s Influence on DeepMind’s Research

Lee Sedol’s Retirement: The Deepest Response

The Philosophical Questions: Human Intuition and Machine Reasoning

The Match’s Place in History: More Than a Game

The Second Match: AlphaGo Masters vs. All Comers

Further Reading

Comments

Subscribe