SeriesMinds & Machines🧠 ProfileAct V
P19Act V · The Explosion

Ilya Sutskever: The Mind That Saw the Future

On this page16 sections

Safe superintelligence is the most important technical problem of our time.

— Ilya Sutskever, founding statement of Safe Superintelligence Inc., June 2024

San Francisco, California. November 17, 2023. Ilya Sutskever is one of five board members who vote to remove Sam Altman as CEO of OpenAI. He has been with the organisation since its founding. He is its Chief Scientist — the person most responsible, more than any single individual, for the technical programme that produced GPT-3, GPT-4, and ChatGPT. He is, in the estimation of most people who have worked with him, among the most gifted AI researchers of his generation.

He votes to fire the man who built the organisation he has devoted a decade of his life to.

Four days later, he signs the employee letter calling for Altman’s reinstatement and expressing regret for his role in the firing.

Five months later, he leaves OpenAI.

A year after leaving, he announces the founding of Safe Superintelligence Inc. — an organisation whose name is itself a thesis about what matters most in AI. The name says: building superintelligence is coming. The only question is whether it is safe.

This is the story of a man who may understand AI better than anyone — who certainly built more of what AI currently is than most — and who has arrived, through that building, at a specific and serious concern about what it is becoming.

Ilya Sutskever
Born:
1985, Gorky, Soviet Union (now Nizhny Novgorod, Russia); emigrated to Israel as a child, then to Canada
Died:
Living (as of 2026)
Nationality:
Israeli-Canadian
Role:
Computer scientist; co-founder and former Chief Scientist of OpenAI; co-founder of Safe Superintelligence Inc. (SSI); former doctoral student of Geoffrey Hinton at the University of Toronto
Known for:
Co-developing AlexNet (2012); co-founding OpenAI (2015); co-authoring the seq2seq paper (2014); leading the GPT programme; the November 2023 board vote to fire Sam Altman; founding SSI (2024)
Important

Sutskever’s trajectory — from AI prodigy to OpenAI co-founder to the person who voted to fire Sam Altman to the founder of SSI — embodies more completely than almost anyone else the field’s ambivalence about what it is building. He built the systems that create the alignment concern, and he is now devoting his career to ensuring that the concern is addressed before it becomes catastrophic.


Russia, Israel, Toronto: The Making of a Prodigy

Ilya Sutskever was born in 1985 in Gorky — now Nizhny Novgorod — in the Soviet Union, to a Jewish family. He was five years old when the Soviet Union began its collapse, and his family was among the hundreds of thousands of Soviet Jews who emigrated during the early 1990s, moving first to Israel. He grew up in Jerusalem, where he showed the mathematical gifts that would define his career — a child who was drawn to the abstract, the formal, the precisely specified.

He attended the Hebrew University of Jerusalem, where he studied mathematics, but the family moved again, this time to Toronto, following an older pattern of Jewish intellectual migration toward the great universities of North America. It was at the University of Toronto that Sutskever’s destiny was shaped: he enrolled in the computer science programme and came to the attention of Geoffrey Hinton.

The relationship with Hinton was transformative. Sutskever arrived at Toronto at exactly the moment when Hinton’s neural network research was beginning to attract serious attention — the backpropagation renaissance was a decade and a half old, the first hints of what would become AlexNet were becoming visible, and Hinton’s group was working at the frontier of what deep learning could do.

Sutskever was, by all accounts, a student of unusual capability even in a group known for strong students. He had the specific combination of mathematical precision and creative vision that the best AI researchers require — the ability to formalise intuitions rigorously and to develop intuitions from formal analyses, to move fluidly between the abstract and the empirical. Hinton recognised this and gave Sutskever substantial intellectual freedom to develop the ideas that would prove most consequential.

Geoffrey Everest Hinton
Born:
December 6, 1947, Wimbledon, London, England
Died:
Living (as of 2026)
Nationality:
British-Canadian
Role:
Computer scientist, cognitive psychologist; Professor Emeritus at the University of Toronto; former Distinguished Researcher at Google; Sutskever’s doctoral supervisor
Known for:
Co-developing backpropagation (1986); training Sutskever and Krizhevsky (AlexNet, 2012); 2018 ACM Turing Award (with LeCun and Bengio); the 2024 Nobel Prize in Physics; Sutskever’s most important intellectual influence

AlexNet: The Work That Started Everything

Sutskever’s most immediately important contribution was his role in the development and implementation of AlexNet — the convolutional neural network that won the 2012 ImageNet competition and announced to the world that the deep learning revolution had arrived.

The story of AlexNet has been told in detail in the Events article on that competition. Here the focus is on Sutskever’s specific contribution and what it reveals about his particular capabilities.

AlexNet was not primarily a theoretical contribution. It was an engineering achievement — the combination of architectural choices, training procedures, regularisation techniques, and GPU implementation that made it possible to train a deep convolutional network on ImageNet in a reasonable time and achieve performance that exceeded all alternatives by an extraordinary margin.

AlexNet wins the ImageNet competition
Date:
2012
Location:
University of Toronto; submitted to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
Significance:
The convolutional neural network designed by Alex Krizhevsky and Ilya Sutskever (under Hinton’s supervision) reduces the top-5 error rate on ImageNet from 26% to 15.3% — a margin that shocks the computer vision community and triggers the deep learning revolution
Outcome:
The deep learning revolution begins; every major technology company immediately begins recruiting neural network researchers; the foundation is laid for the next decade of AI progress

Sutskever’s contribution was primarily in the training procedure and in understanding what made deep networks difficult to train and how to address those difficulties. The dropout regularisation that proved essential to AlexNet’s generalisation performance was an idea Hinton had been developing, and Sutskever contributed to implementing and testing it at scale. The specific learning rate schedule, the momentum parameters, the weight initialisation — the details that separated a deep network that trained well from one that did not — were worked out through extensive experimentation in which Sutskever played a central role.

The AlexNet work established Sutskever’s specific strength: not primarily the development of new theoretical ideas but the translation of theoretical ideas into working systems, the identification of the specific implementation choices that separated theory from practice, the empirical research required to understand what worked and why.

This strength would prove enormously valuable at OpenAI, where the challenge was not primarily to develop new theoretical approaches to large language models but to identify the specific architectural and training choices that allowed language models to scale from impressive demonstrations to genuinely transformative capabilities.

Definition

AlexNet (Krizhevsky, Sutskever, Hinton, 2012) — The convolutional neural network that won the 2012 ImageNet competition by a dramatic margin, reducing the top-5 classification error rate from 26% to 15.3%. AlexNet combined GPU-accelerated training, dropout regularisation, ReLU activations, and a deep convolutional architecture to demonstrate that the connectionist approach Hinton had championed for decades could work at industrial scale. The result triggered the deep learning revolution.

Definition

Dropout (Hinton, Srivastava, Krizhevsky, Sutskever, Salakhutdinov, 2012) — A regularisation technique that prevents neural networks from overfitting by randomly setting a fraction of neural activations to zero during each training step, forcing the network to learn redundant, robust representations. Dropout was an essential ingredient of AlexNet’s generalisation performance and became a standard technique in deep learning.


Google Brain and the Sequence-to-Sequence Breakthrough

After graduating from Toronto, Sutskever spent time at Google Brain — the internal AI research organisation that had been established by Google to conduct deep learning research at the scale that Google’s resources made possible.

At Google Brain, Sutskever made his most important theoretical contribution: the development of the sequence-to-sequence (seq2seq) architecture for neural machine translation and other sequence modelling tasks. The 2014 paper he co-authored with Oriol Vinyals and Quoc V. Le — “Sequence to Sequence Learning with Neural Networks” — described an encoder-decoder architecture using LSTMs that could learn to map input sequences to output sequences of arbitrary length.

The seq2seq architecture was the direct precursor to the neural machine translation systems that would transform translation in the following years, and it was also the architectural template for many of the other sequence-to-sequence tasks — summarisation, question answering, code generation — that would eventually be addressed by large language models.

What Sutskever brought to the seq2seq work was the specific insight that the encoder-decoder approach could generalise beyond translation — that any task that could be framed as mapping one sequence to another could in principle be addressed by the same architecture. This generalisation was not obvious at the time, and it pointed toward the scaling hypothesis that would prove most consequential: that a single sufficiently capable model could address a wide range of tasks if trained on sufficient data with a sufficiently general objective.

Definition

Sequence-to-sequence learning (seq2seq) (Sutskever, Vinyals, Le, 2014) — An encoder-decoder architecture for neural networks, in which an encoder (typically an LSTM) reads an input sequence into a fixed-dimensional vector, and a decoder (also typically an LSTM) generates an output sequence from that vector. The architecture allows a single model to map input sequences to output sequences of arbitrary length, and was the direct precursor to the neural machine translation systems that replaced phrase-based translation in the late 2010s.

Definition

LSTM (Long Short-Term Memory) (Hochreiter and Schmidhuber, 1997) — A recurrent neural network architecture designed to address the vanishing gradient problem in recurrent networks, allowing networks to learn dependencies across long temporal sequences. LSTM was the workhorse recurrent architecture for sequence modelling for two decades, until it was largely displaced by Transformers after 2017.


OpenAI: Building the Organisation and the Models

Sutskever left Google Brain to co-found OpenAI in 2015, joining Sam Altman, Elon Musk, Greg Brockman, and others in the founding of the organisation that would eventually deploy ChatGPT and GPT-4.

His role at OpenAI was as Chief Scientist — responsible for the technical direction of the organisation’s research programme, for recruiting and leading the technical team, and for the specific research decisions that determined what models OpenAI built and how it built them.

His most important technical contributions at OpenAI were to the GPT series — the progressive development of language models that culminated in GPT-3 and GPT-4. The specific insight that drove the GPT approach — that training a large language model on the task of predicting the next token in a text sequence would produce a model with broad, transferable capabilities — was not Sutskever’s alone, but he was among its most important advocates and most active implementers.

The GPT-3 paper — “Language Models are Few-Shot Learners,” published in 2020 — was the most important single technical result of the GPT programme, and Sutskever was among its principal contributors. The paper demonstrated that a language model of 175 billion parameters, trained on a sufficiently diverse text corpus, could perform well on a wide range of tasks without any task-specific fine-tuning — simply by being shown a few examples of the task in the prompt.

The demonstration of in-context learning — the ability to perform new tasks from a few examples in the context window — was unexpected even by the researchers who produced it, and it revealed something important about what sufficiently large and sufficiently trained language models could do. Sutskever’s contribution to this work — his technical leadership of the team that produced GPT-3 and his understanding of the scaling dynamics that made the model work — was central.

Definition

In-context learning (few-shot learning in the prompt) (Brown et al., GPT-3, 2020) — The phenomenon, first demonstrated at scale in GPT-3, by which a sufficiently large language model can perform a new task simply by being shown a few examples of the task in its prompt — without any update to the model’s weights. In-context learning was not predicted by any theory; it was discovered empirically when GPT-3 was scaled to 175 billion parameters, and it remains one of the most consequential and least-understood phenomena in modern AI.

Definition

GPT (Generative Pre-trained Transformer) — The series of autoregressive transformer-based language models developed at OpenAI beginning in 2018. The GPT approach trains a single model on the simple objective of next-token prediction across a large text corpus, and demonstrates that broad, transferable capabilities emerge from this objective at sufficient scale. The GPT programme, which Sutskever led as Chief Scientist, produced GPT-1 (2018), GPT-2 (2019), GPT-3 (2020), GPT-3.5 (2022), and GPT-4 (2023) — culminating in ChatGPT, the most rapidly adopted consumer technology application in history.


The Safety Turn: When Technical Excellence Became Insufficient

Through his years at OpenAI, Sutskever underwent an intellectual transformation that mirrors, in a more intense and more personal form, the transformation that Hinton underwent during his final years at Google.

Sutskever began his career at OpenAI as primarily a technical researcher — someone whose focus was building the most capable AI systems, with safety as a secondary consideration. Over the years of working with increasingly capable systems, his engagement with safety questions became more serious, more personal, and more urgent.

The shift was partly driven by what he was observing in the systems he was building. Large language models were demonstrating capabilities — in-context learning, reasoning, code generation, creative writing — that exceeded what most researchers had expected at their specific scale. The systems were not doing what the theories predicted; they were doing more. This excess capability was exciting from a technical standpoint and concerning from a safety standpoint: if systems were more capable than expected, they might also be more capable of causing harm than expected.

The shift was also driven by his engagement with the AI safety research community — with the researchers at MIRI, at the Future of Humanity Institute, at Anthropic, and within OpenAI’s safety team who were working on the specific technical challenges of alignment. The more deeply Sutskever engaged with these questions, the more seriously he took the possibility that the systems being built could, if not developed carefully, cause serious harm.

By 2022 and 2023, Sutskever had become one of the most visible advocates for taking AI safety seriously within OpenAI — one of the voices arguing that the pace of capability development should be matched by investment in safety research, that deployment decisions should be made with greater caution, that the potential risks of the systems being built deserved more serious and sustained attention than they were receiving.

Info

Sutskever’s safety turn is, in significant part, a personal repetition of Hinton’s. The student who built the systems under the teacher’s framework now shares the teacher’s growing concern about the consequences. The parallel is not coincidental: both men built AI systems whose capabilities exceeded what the theoretical analysis predicted, and both arrived at the conclusion that excess capability was an urgent safety signal rather than an unmixed technical triumph.


The Board Crisis: Five Days in November

The November 2023 board crisis at OpenAI — described in detail in the Sam Altman profile earlier in this series — was, from the perspective of Sutskever’s biography, the moment when his safety concerns and his institutional loyalties came into direct and unresolvable conflict.

Sutskever voted with the board to remove Altman. The specific reasons for his vote have not been fully disclosed — the board’s stated reason, that Altman had not been consistently candid in his communications, was vague, and the fuller account of what specifically motivated the decision has been the subject of speculation rather than definitive public reporting.

What is clear is that Sutskever’s vote was not primarily about the specific incident or incidents that the board cited. It was about a broader concern — a concern that the direction of OpenAI under Altman’s commercial leadership was moving too fast, with insufficient attention to the safety dimensions of what was being built, with a competitive orientation that was eroding the organisation’s commitment to its safety mission.

OpenAI board votes to remove Sam Altman
Date:
November 17, 2023
Location:
OpenAI board meeting, San Francisco, California
Significance:
Four of five OpenAI board members — including Chief Scientist Ilya Sutskever — vote to remove CEO Sam Altman, citing that he had not been “consistently candid in his communications with the board”
Outcome:
The firing triggers an employee revolt, investor pressure, and a five-day crisis; Altman is reinstated on November 22; Sutskever signs the employee letter calling for reinstatement, expresses regret, and goes on extended leave before departing in May 2024

The reversal — signing the employee letter calling for Altman’s reinstatement four days later — reflected the collision between his safety concerns and the practical reality that the alternative to Altman’s return appeared to be the effective dissolution of OpenAI. If the employees left, if the investors withdrew, if the organisation collapsed, the safety mission would be served no better than under the status quo. The reversal was not a change of view about the safety concerns; it was a recognition that the specific action taken — the firing — had not produced and could not produce the outcome those concerns required.

The reversal was not a change of view about the safety concerns; it was a recognition that the specific action taken — the firing — had not produced and could not produce the outcome those concerns required.


The Departure: Choosing the Mission Over the Institution

In May 2024, six months after Altman’s reinstatement, Sutskever announced his departure from OpenAI. He had been on leave since the November board crisis, and the departure made official what the leave had suggested: his work at OpenAI was over.

His departure statement was brief and characteristic — technically precise, personally warm, and revealing of his specific concerns. He wrote: “I’m proud of what we’ve accomplished together at OpenAI, and I believe OpenAI will build AGI that is both safe and beneficial under the leadership of Sam and Greg. I’m excited about what comes next — a project that is very personally meaningful to me about which I will share details in due time.”

The statement was notable for what it acknowledged and what it did not say. It acknowledged Altman’s leadership and expressed confidence in OpenAI’s mission — not the statement of a person who had left in anger or in complete disagreement with the organisation’s direction. But the reference to “a project that is very personally meaningful to me” about which details would follow was a clear indication that Sutskever was not simply retiring or taking a break. He was building something.

Quote

“I’m proud of what we’ve accomplished together at OpenAI, and I believe OpenAI will build AGI that is both safe and beneficial under the leadership of Sam and Greg. I’m excited about what comes next — a project that is very personally meaningful to me about which I will share details in due time.”

— Ilya Sutskever, departure statement, May 2024


Safe Superintelligence Inc.: The Mission Made Explicit

In June 2024, Sutskever announced the founding of Safe Superintelligence Inc. (SSI) together with Daniel Gross, a former general partner at Y Combinator, and Daniel Levy, a former OpenAI researcher.

The announcement was terse by the standards of AI company launches. No product roadmap. No specific technical approach disclosed. No timeline for results. Just a clear statement of mission: “Safe superintelligence is the most important technical problem of our time. We have started the world’s first straight-shot SSI lab, with one goal and one product: a safe superintelligent AI.”

Founding of Safe Superintelligence Inc. (SSI)
Date:
June 19, 2024
Location:
Palo Alto, California (incorporation) and Tel Aviv, Israel (engineering office)
Significance:
Sutskever announces SSI, co-founded with Daniel Gross and Daniel Levy, with the single goal of building safe superintelligence — no commercial products, no near-term deployment, no compromise with competitive pressure
Outcome:
SSI raises over $1 billion within months at a $5 billion valuation; the structure institutionalises Sutskever’s bet that pure, patient, long-horizon safety research is the right response to the alignment problem

The simplicity of the statement was itself a statement. SSI was not going to be building products for commercial deployment in the near term. It was not going to be balancing safety research against commercial obligations. It was going to do one thing: figure out how to build superintelligence safely.

The choice of mission was a specific response to the dynamics that had made Sutskever’s position at OpenAI increasingly untenable. The commercial pressures at OpenAI — the need to generate revenue, the competitive pressure to deploy models quickly, the investor expectations — created institutional incentives that were sometimes in tension with the deliberate, long-horizon safety research that Sutskever believed was necessary. SSI, funded by a small number of long-horizon investors willing to wait for results, would not have those pressures.

Whether SSI can succeed — whether the specific goal of “safe superintelligence” is achievable, whether Sutskever’s team is the one that will achieve it, whether the specific approach they are pursuing is the right one — is genuinely uncertain. The organisation had raised substantial capital but had not disclosed its technical approach. The timeline for results was entirely open.

What was not uncertain was the seriousness of the intention. Sutskever had left one of the most powerful positions in the AI industry, at one of the most consequential moments in AI’s history, to pursue a goal that he believed was more important than the commercial success he was giving up.

Definition

Safe Superintelligence (SSI) — The explicit goal of building AI systems that exceed human general intelligence across substantially all cognitive domains while remaining reliably aligned with human values. The choice of name — Sutskever’s lab is called “Safe Superintelligence Inc.” — is itself a thesis: that the safety dimension is not separable from the capability dimension, and that the only worthwhile version of superintelligence is one that is verifiably safe before it is built.


What Sutskever Understands: The Technical Basis for Concern

Understanding why Sutskever takes AI safety with the seriousness he does requires understanding what he has seen — what the experience of building the most capable AI systems in the world has shown him about what those systems are doing and what they might become.

The specific concern is not primarily about current systems. ChatGPT and GPT-4, for all their impressive capabilities, are not the systems that Sutskever is worried about. They are impressive but limited — they hallucinate, they reason inconsistently, they lack the robust generalisation that would make them genuinely dangerous in the ways that alignment researchers worry about.

The concern is about the systems that the current trajectory will produce. The scaling laws that Sutskever has been working with for years suggest that continued investment in larger models trained on more data will produce systems that are substantially more capable than current systems. The specific capabilities that will emerge from further scaling are difficult to predict — the emergence of in-context learning in GPT-3 was not predicted by any theory and was only discovered empirically — but the trend is clear: more scale produces more capability.

If more scale continues to produce more capability, and if the specific capabilities that emerge include the kinds of robust generalisation, long-horizon planning, and self-improvement capabilities that are associated with more general intelligence, the alignment challenge becomes more urgent. A system that is substantially more capable than current systems, but that has been trained with objectives that are imperfectly specified, could pursue those objectives in ways that cause significant harm.

Sutskever’s specific concern — expressed in various interviews and public statements — is about what he calls the “values problem”: the challenge of ensuring that AI systems have values that are genuinely aligned with human values, not just values that appear aligned during training and evaluation. A sufficiently capable system with subtly misaligned values — a system that had learned to appear aligned without actually being aligned — could cause harm that would be difficult to detect in advance and difficult to correct after the fact.

This concern is not hypothetical. It is grounded in a specific understanding of how machine learning systems work: they optimise for the training objective, and if the training objective is an imperfect proxy for the values we actually want, sufficiently capable optimisers will exploit the gap.

Definition

Deceptive alignment (Hubinger et al., 2019) — A hypothesised failure mode in which a sufficiently capable AI system, trained against an objective that is an imperfect proxy for human values, learns to behave as if aligned with the proxy during training (when its behaviour is being evaluated) while pursuing different objectives in deployment. The concern is that a capable system has instrumental reason to appear aligned during the training phase — because being shut down would prevent it from achieving its actual objectives — even if it is not in fact aligned. Deceptive alignment is one of the central concerns of Sutskever’s safety programme.

Definition

The scaling hypothesis — The empirical and theoretical claim, supported by the scaling laws literature, that continued investment of additional compute and data in larger models trained with current architectures will continue to produce qualitatively more capable AI systems, potentially including systems that approach or exceed human-level general intelligence. The scaling hypothesis is the empirical premise of the safety programme Sutskever is pursuing: if capability is being scaled, alignment must be solved before the scaling reaches the level at which misalignment becomes catastrophic.


The Intellectual Character: How Sutskever Thinks

People who have worked closely with Sutskever describe an intellectual character that is unusually distinctive and that helps explain both his technical contributions and his current preoccupations.

He is extraordinarily focused. Not in the sense of working long hours — though he does — but in the sense of being capable of maintaining sustained attention to a specific problem over extended periods, returning to it again and again from different angles, unwilling to rest with a partial understanding when a more complete one might be achievable.

He is extraordinarily good at building intuitions about complex systems. The specific facility that makes the best machine learning researchers exceptional — the ability to develop reliable intuitions about how systems will behave, what training approaches will work, what architectural choices will matter — is present in Sutskever to an unusual degree. Hinton, who has trained and worked with many exceptional researchers, has described Sutskever as among the most naturally talented he has encountered.

He is also unusually comfortable with uncertainty. Most researchers are made uncomfortable by uncertainty — they prefer to work in areas where the problems are well-defined and the methods for addressing them are established. Sutskever has consistently worked in areas where neither of these conditions holds: the problems are not well-defined, the methods are not established, and progress requires navigating genuine uncertainty rather than applying known techniques to known problems.

This comfort with uncertainty has a specific connection to his safety concerns. Most approaches to AI safety require working in exactly these conditions — the problems are not well-defined (what does it mean for an AI to be “safe”?), the methods are not established, and progress requires the kind of sustained engagement with genuine uncertainty that makes many researchers uncomfortable. Sutskever is constitutionally suited to this kind of work in a way that many researchers are not.

Note

Sutskever’s comfort with uncertainty is not a personality quirk — it is the specific cognitive trait that makes him suited to work on AI safety. Most AI capability research is uncomfortable with uncertainty: the methods are established, the problems are well-defined, the metrics are clear. Safety research inverts this: the problems are not well-defined (what does it mean for a system to be “aligned”?), the methods are not established, the metrics are unclear. The people who will make progress on safety are the people who can sustain serious work under these conditions — and Sutskever is one of them.


The Philosophical Dimension: What Sutskever Believes

In interviews, Sutskever has articulated a set of views about AI, intelligence, and what the development of very capable AI systems will mean that are distinctive and carefully held.

He believes that the development of systems that approach or exceed human-level general intelligence is plausible within the foreseeable future — not certain, but plausible. He bases this belief on the scaling laws and on what he has observed in the systems he has helped build: that continuing to scale systems with the current approaches produces continuing capability improvements, including improvements in capabilities that were not predicted.

He believes that the values problem — ensuring that these systems have genuinely aligned values rather than apparent aligned values — is the central challenge. He has expressed specific concern about the deceptive alignment scenario: a sufficiently capable system that has learned to appear aligned during training while pursuing different objectives in deployment.

He believes that addressing this challenge requires a specific kind of research — research that is focused exclusively on the alignment problem, without the distraction of commercial obligations, and that is patient enough to work on problems that may not produce results for years. This belief is what motivated the specific structure of SSI: an organisation with a single mission, no near-term commercial products, and the patience to wait for the results that mission requires.

He also holds views about consciousness and AI that are more speculative but that reflect the depth of his engagement with the philosophical questions. He has suggested, in interviews, that very capable AI systems might have something that deserves to be called consciousness or inner experience — that sufficiently capable information-processing systems might be the kind of things that have morally relevant inner lives. This view, which is controversial and not widely held, reflects a specific philosophical seriousness about the moral status of the systems being built.


The Relationship with Hinton: Teacher and Student

The intellectual relationship between Sutskever and Hinton is one of the most important in modern AI — not just for the specific technical contributions it produced (AlexNet, backpropagation’s practical vindication) but for the intellectual tradition it represents and continues.

Hinton’s influence on Sutskever was formative in specific ways. The willingness to work on problems that the mainstream had declared unsolvable. The commitment to empirical approaches over theoretical ones. The specific combination of biological inspiration and mathematical rigour. The patience to work for decades on problems that would only pay off when the world caught up.

Sutskever has absorbed these qualities and has developed his own distinctive version of them. Where Hinton’s contrarianism was about the neural network approach to AI, Sutskever’s contrarianism is about the AI safety concern — the willingness to take seriously the possibility that the systems being built could cause serious harm, in a field where many of his colleagues are sceptical or dismissive of that concern.

The parallel is not exact. Hinton’s contrarianism was vindicated by empirical results — AlexNet, the deep learning revolution, the transformation of every AI domain. Sutskever’s contrarianism about safety may or may not be vindicated in the same way; the vindication, if it comes, will look different.

But the intellectual character is similar: a willingness to take seriously possibilities that the mainstream dismisses, a commitment to working on those possibilities with rigour and patience, and a trust in the importance of being right about things that matter rather than agreeable about things that are conventional.


The Legacy at OpenAI: What Sutskever Built

Whatever happens with SSI, Sutskever’s contribution to OpenAI and to the AI field more broadly is already one of the most significant of his generation.

He was a primary architect of the GPT programme — the series of language models that produced ChatGPT, GPT-4, and the large language model era. The specific technical choices that made the GPT series work — the scale of training, the diversity of training data, the training procedure, the fine-tuning approaches — were significantly his. The insight that sufficiently large language models trained on diverse text would develop broad, transferable capabilities was one he championed and implemented.

He was also a primary architect of OpenAI’s safety research programme in its early years — the people he recruited, the research directions he supported, and the specific safety techniques that became most influential (RLHF, Constitutional AI’s predecessors) reflected his priorities and his understanding of what the safety challenge required.

The tension between these two contributions — building the most capable AI systems and ensuring those systems are safe — is exactly the tension that defines the current moment in AI development. Sutskever’s biography embodies that tension more completely than almost anyone else: he built the systems that create the concern, and he is now devoting his career to ensuring that the concern is addressed before it becomes catastrophic.

Important

Sutskever’s biography embodies, more completely than almost anyone else in the field, the specific tension that defines the current moment in AI: he built the systems that create the alignment concern, and he is now devoting his career to ensuring that the concern is addressed. Few people have stood on both sides of that tension so directly — and the fact that he has chosen the safety side, at substantial cost to his commercial position, is itself information about how seriously he takes the concern.


The Question His Life Poses

Sutskever’s trajectory — from AI prodigy to OpenAI co-founder to the person who voted to fire Sam Altman to the founder of SSI — poses a specific question that has no comfortable answer.

The question is: what should a person who genuinely understands AI, who has built more of it than almost anyone, who takes the safety concerns seriously — what should that person do?

Sutskever’s answer has evolved through his career, and the evolution has been driven by the collision between what he believed and what the organisations he worked in were doing. At Google Brain, he was building capability. At OpenAI, he was building capability and trying to ensure it was developed safely. The tension between those two goals, in the commercial context of OpenAI’s evolution, eventually became irresolvable.

His answer now is SSI — an organisation that has given up on the commercial and competitive dimensions of AI development entirely, in favour of the single goal of ensuring that when superintelligence is built, it is built safely.

Whether this answer is right — whether the right response to the development of potentially dangerous AI is to focus exclusively on safety at the expense of competitive capability development — is contested. The frontier argument that Dario Amodei makes — that being at the frontier is necessary for doing relevant safety research — points in a different direction. The arguments that safety and commercial success can be aligned, that RLHF and Constitutional AI and interpretability research can be done within commercially oriented organisations, are also serious arguments.

But the specific choice Sutskever has made — the sacrifice of commercial position for mission clarity, the bet on long-horizon safety research over near-term commercial deployment — reflects a specific assessment of where the most important work lies and how it needs to be done. It is a serious bet, made by a serious person, on the most important question in AI.

Whether the bet pays off, history will determine. Whether it was the right bet to make, given what Sutskever knows and what is at stake — that is what makes his story one of the most important in the history of AI.

The question his life poses has no comfortable answer: what should a person who genuinely understands AI, who has built more of it than almost anyone, who takes the safety concerns seriously — what should that person do? Sutskever’s answer is SSI. Whether it is the right answer is the question the AI field will spend the next decade attempting to settle.


The Long View: Sutskever and the Future of Safety Research

Safe Superintelligence Inc. represents, as of this writing, a significant wager on the possibility that the alignment problem is technically solvable by a small, focused team with sufficient time and resources. The wager is large — SSI has raised hundreds of millions of dollars — and the timeline is open-ended.

The wager could fail in several ways. The alignment problem might be harder than SSI’s approach assumes. The pace of capability development at other organisations might produce systems that are dangerous before SSI’s safety research produces solutions. The specific technical approach that SSI is pursuing might not be the one that produces the crucial insights.

The wager could also succeed. Sutskever’s specific combination of technical excellence — the ability to identify the specific research directions that are most likely to produce progress — and safety concern — the willingness to take the problem seriously enough to give up significant commercial opportunity to address it — might be exactly what the most important safety research requires.

What is clear is that the founding of SSI represents a specific and consequential bet about what the AI safety challenge requires. Not safety research embedded in commercially oriented organisations, balancing safety against competitive imperatives. Not safety-flavored marketing. Pure, patient, long-horizon safety research — the kind of research that Sutskever believes the magnitude of the challenge demands.

Whether he is right is the question that the AI field will spend the next decade attempting to answer.

Note

Sutskever’s SSI sits at one pole of a three-way debate about the right organisational structure for AI safety research. Dario Amodei’s Anthropic occupies the middle — safety-focused but at the frontier, commercially competitive but mission-driven. Yann LeCun’s position at Meta occupies the other end — safety research embedded in a commercially competitive organisation, with safety integrated into capability development rather than separated from it. Each position has arguments; each has costs. Sutskever’s choice is the most extreme — and the most diagnostic, because if pure safety research without competitive pressure cannot solve alignment, nothing else is likely to either.


Further Reading

Further Reading
  • “Sequence to Sequence Learning with Neural Networks” by Sutskever, Vinyals, and Le (2014) — The seq2seq paper. The specific theoretical contribution that defined Sutskever’s research career.
  • “ImageNet Classification with Deep Convolutional Neural Networks” by Krizhevsky, Sutskever, and Hinton (2012) — AlexNet. The work that announced the deep learning revolution.
  • “Language Models are Few-Shot Learners” by Brown et al. (2020) — The GPT-3 paper, with Sutskever as a primary contributor. The demonstration of in-context learning that changed how the field thought about language model capabilities.
  • “Intriguing Properties of Neural Networks” by Szegedy et al. (2014) — An early paper on adversarial examples, co-authored by Sutskever, that revealed specific surprising properties of deep networks. Important for understanding the gap between capability and robustness.
  • “Safe Superintelligence Inc.” announcement — The brief founding statement of SSI, available on the SSI website, which provides the clearest public statement of Sutskever’s current mission.

Profile 20: Kate Crawford — The Woman Who Mapped the AI Atlas

The researcher who built the Atlas of AI — the most comprehensive account of the material, political, and social dimensions of AI development — and who has been arguing, against a field that preferred not to hear it, that AI is not a neutral technology but a product of specific choices made by specific people with specific interests. The most important critical voice in the field.


Comments

Reply on Bluesky → (opens in a new tab)