The Scientific AI: When Machines Became Research Partners

Cambridge, England. October 2021. A biologist named Kathryn Lilley is working late in her laboratory at the University of Cambridge. She has spent three years trying to determine the structure of a specific protein — a protein that plays a role in a disease mechanism she has been studying for most of her career. The experimental methods available to her — X-ray crystallography, cryo-electron microscopy — have not cooperated. The protein resists crystallisation. The samples are impure. The data is noisy.

She opens her laptop and navigates to a database that has been online for three months: the AlphaFold Protein Structure Database, which contains predicted structures for essentially every protein in the human proteome, freely available to any researcher in the world.

She searches for her protein. It is there. The predicted structure appears on her screen — the twisted ribbons and coiled strands of the three-dimensional architecture that she has been trying to determine for three years.

She looks at the structure for a long time. Then she writes in her notebook: “This changes everything.”

This is the scientific AI moment — not a single event but a recognition, spreading through one field of science after another, that the tools of artificial intelligence are not just analytical aids but genuine research partners, capable of accelerating discovery in ways that are changing what it means to do science.

Important

The scientific AI moment is not a single event but a recognition, spreading through one field of science after another, that the tools of artificial intelligence are not just analytical aids but genuine research partners — capable of accelerating discovery in ways that are changing what it means to do science. The Cambridge biologist who, after three years of failed crystallography, opens her laptop and finds her protein’s structure already predicted and freely available in the AlphaFold database, captures the moment in three words written in her notebook: “This changes everything.”

The AlphaFold Inflection: When AI Solved a Grand Challenge

The AlphaFold story — covered in depth in E17 — was the first and most dramatic demonstration that AI could solve long-standing scientific grand challenges. But its significance extends beyond the specific protein folding problem: it established the paradigm of AI as a scientific problem-solver at a level of capability that changed what the broader research community considered possible.

Before AlphaFold, the conventional wisdom was that AI could be a useful tool in science — for analysing large datasets, for identifying patterns in existing data, for automating specific well-defined tasks — but that the hard creative work of scientific discovery remained the domain of human researchers. AI could help scientists do what they already did; it could not independently advance the frontiers of knowledge.

AlphaFold challenged this conventional wisdom directly. The protein structure prediction problem was not a data analysis problem or a pattern recognition problem in the ordinary sense. It was a fundamental scientific problem — predicting how a sequence of amino acids would fold into a three-dimensional structure — that required understanding physics, chemistry, and the evolutionary constraints that shape protein structure. The scientific community had been working on this problem for fifty years and had made progress, but the problem had remained unsolved.

AlphaFold solved it.

The solution was not achieved by understanding the physics and chemistry of protein folding at a fundamental level — AlphaFold did not develop a new physical theory of protein folding. It was achieved by learning, from the database of experimentally determined protein structures, what structural patterns were associated with what sequence patterns, and using that learned knowledge to make predictions for sequences with unknown structures.

This achievement established something important: that AI systems could develop sufficient understanding of a scientific domain — through learning from existing scientific data — to make predictions that exceeded what expert human scientists could make. The understanding was not the same as human scientific understanding; it was encoded in the weights of a neural network rather than in explicit theories and models. But the capability it produced was genuine scientific capability — the ability to predict, with accuracy comparable to expensive experimental determination, the structure of proteins that had never been directly observed.

Definition

Protein folding problem — The 50-year-old grand challenge of predicting, from a protein’s amino acid sequence, the three-dimensional structure into which it will fold. The sequence determines the structure (Anfinsen’s dogma, 1973), but the mapping is extraordinarily complex: a typical protein has hundreds of amino acids, each of which can adopt many conformations, and the biologically active native structure depends on the interactions of all of them. Before AlphaFold, experimental determination (X-ray crystallography, cryo-electron microscopy) was the only reliable method. It could take years per protein and sometimes failed entirely, for example on proteins that resist crystallisation. AlphaFold’s solution, presented at CASP14 in November 2020, achieved a median GDT_TS score of ~92.4 out of 100 (previous best ~75; “experimental quality” threshold ~90), a level that John Moult, the CASP chair, summarised as “in some sense the problem is solved.”
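To make the GDT figure concrete, the following is a minimal Python sketch of a GDT_TS-style score. It assumes the predicted and experimental C-alpha coordinates are already superimposed; the metric used at CASP also searches over many superpositions, which this sketch omits. The function name `gdt_ts` and the toy coordinates are illustrative assumptions, not part of any CASP tooling.

```python
import numpy as np


def gdt_ts(predicted: np.ndarray, reference: np.ndarray) -> float:
    """Simplified GDT_TS for two pre-superimposed coordinate arrays of shape (N, 3).

    For each distance cutoff (1, 2, 4 and 8 angstroms) compute the fraction of
    residues whose predicted position lies within that cutoff of the reference
    position, then average the four fractions and scale to 0-100.
    """
    if predicted.shape != reference.shape:
        raise ValueError("predicted and reference must have the same shape")
    distances = np.linalg.norm(predicted - reference, axis=1)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [np.mean(distances <= cutoff) for cutoff in cutoffs]
    return 100.0 * float(np.mean(fractions))


if __name__ == "__main__":
    # Toy example: four residues displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
    reference = np.zeros((4, 3))
    predicted = reference.copy()
    predicted[:, 0] = [0.5, 1.5, 3.0, 9.0]
    print(gdt_ts(predicted, reference))
```

A score near 100 means almost every residue sits within an angstrom of its experimentally determined position, which is why a median above 90 was read as competitive with experiment.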

Drug Discovery: The Acceleration of a Trillion-Dollar Industry

The pharmaceutical industry has been one of the most consequential early domains for AI-driven scientific acceleration. Drug discovery — the process of finding molecules that can treat diseases safely and effectively — is extraordinarily expensive, time-consuming, and prone to failure. It typically takes ten to fifteen years and more than a billion dollars to bring a new drug from initial discovery to approval, and the majority of drug candidates fail at various stages of the process.

AI has been applied to multiple stages of the drug discovery pipeline, with significant demonstrated impact.

Target identification. Before a drug can be designed, researchers need to identify which proteins or biological pathways to target — which molecules, if their function were altered, would produce the desired therapeutic effect without unacceptable side effects. AI systems trained on genomic, proteomic, and clinical data can identify potential drug targets from patterns in biological data that human analysis would miss.

Structure-based drug design. Once a target protein has been identified, drug design involves finding small molecules that bind to the target in specific ways — inhibiting its function, activating it, or altering its interactions with other molecules. AlphaFold structures have transformed this process: for the first time, drug designers can access structural information about their target proteins without waiting years for experimental structure determination.

Molecular generation. AI generative models can design novel molecules — generating structures that have desired properties, including binding affinity to specific targets, selectivity, and drug-like properties — without the need for exhaustive screening of existing molecular libraries. Generative molecular design produces candidate molecules that are genuinely novel, exploring areas of molecular space that human medicinal chemists might not have reached.

Clinical trial design and patient selection. AI analysis of clinical and genomic data can identify which patients are most likely to respond to a specific drug, enabling more precise clinical trials with better-selected patient populations. This reduces the size and cost of trials and increases the probability of success.

The practical impact has been documented in several high-profile cases. Insilico Medicine developed a drug for idiopathic pulmonary fibrosis that reached Phase II clinical trials in 2023 — the first AI-designed drug molecule to reach Phase II — in a process that took approximately two years from initial target identification to clinical candidate, compared to the four to five years that would typically be expected.

Insilico Medicine’s INS018_055 reaches Phase II clinical trials

Date:: 2023 (Phase II initiation)
Location:: Insilico Medicine (Hong Kong / New York)
Significance:: Insilico Medicine’s INS018_055 — a small-molecule inhibitor for idiopathic pulmonary fibrosis (IPF) — became the first AI-designed drug molecule to reach Phase II clinical trials. The process took approximately two years from initial AI-driven target identification to clinical candidate, compared to the four to five years typically expected.
Outcome:: The Insilico milestone was the first concrete demonstration that AI-driven drug discovery could compress the timeline from target identification to clinical candidate by roughly half. Whether the drug ultimately proves safe and effective in Phase III trials — and whether the timeline compression generalises to other drug classes — remains to be validated over the next several years.

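Isomorphic Labs, the Alphabet drug discovery company spun out of DeepMind, announced partnerships with Eli Lilly and Novartis in January 2024 to apply AI-driven drug discovery to a range of disease targets. The upfront payments were in the tens of millions of dollars, with a combined potential value, including milestone payments, of close to three billion dollars. The commercial scale of these partnerships reflects genuine confidence that the approach will produce results.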

The broader promise — that AI could significantly compress the timeline and reduce the cost of drug discovery, bringing treatments for diseases that have resisted decades of human research effort within reach — is still being validated. The clinical trial data that will confirm or refute this promise will accumulate over the next several years.

Mathematics: When AI Writes Proofs

One of the most unexpected applications of AI to scientific discovery has been in mathematics — a domain that might seem to require the kind of precise, deductive reasoning that AI systems have historically struggled with.

AlphaGeometry, published by DeepMind in January 2024, demonstrated that AI could solve complex geometry problems of the type that appear in the International Mathematical Olympiad at a level competitive with human olympiad participants. The system combined a language model for high-level reasoning with a symbolic geometry engine for precise formal calculations, reaching a level on olympiad geometry problems that no previous AI system had achieved.

DeepMind publishes AlphaGeometry

Date:: January 17, 2024
Location:: Google DeepMind, London
Significance:: DeepMind published AlphaGeometry — a system that combined a language model for high-level reasoning with a symbolic geometry engine for precise formal calculations. The system solved 25 of 30 problems from the International Mathematical Olympiad (IMO) geometry corpus — a level competitive with human IMO gold medalists and not achieved by any previous AI system.
Outcome:: AlphaGeometry was significant for what it demonstrated about AI’s relationship to mathematical reasoning: that with the right combination of neural and symbolic approaches, AI could perform the kind of structured, precise reasoning that mathematical proof requires. The broader implication — whether AI can assist professional mathematicians in proving theorems, not just verifying human-constructed proofs — became an active research direction.

The AlphaGeometry result was significant for what it demonstrated about AI’s relationship to mathematical reasoning: that with the right combination of neural and symbolic approaches, AI could perform the kind of structured, precise reasoning that mathematical proof requires.

The implications for mathematics as a discipline go beyond olympiad problems. The broader question is whether AI can assist professional mathematicians in the hard work of proving theorems — not just verifying proofs that humans have constructed, but actively contributing to the proof discovery process.

Lean, Coq, and other formal proof assistants have existed for decades, allowing mathematicians to construct proofs that are verified by computer. But the use of these systems has required translating mathematical intuition into the formal language of the proof assistant — a translation that is time-consuming and requires specific expertise.

Large language models trained on mathematical text have begun to bridge this gap. A system like Lean Copilot can suggest proof steps to a mathematician working in the Lean proof assistant, reducing the friction of formal verification. AI systems that can suggest lemmas, identify similar previously proved results, and translate informal mathematical reasoning into formal proof steps are beginning to change how mathematicians work.

Definition

Formal proof assistant — A software system (such as Lean, Coq, or Isabelle) that allows mathematicians to construct mathematical proofs in a formal, machine-checkable language. The proof assistant verifies each step of the proof against the rules of logic and the axioms of the chosen mathematical foundation, guaranteeing that any accepted proof is mathematically valid. Formal proof assistants have existed since the late 1960s (Automath, followed by Coq, Isabelle, and Lean), but their use has been limited by the laborious translation of mathematical intuition into their formal input languages. Large language models trained on mathematical text (Lean Copilot, 2023) are beginning to bridge this translation by suggesting proof steps, identifying similar previously proved results, and translating informal reasoning into formal proof steps.
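As a small illustration of what "machine-checkable" means, here is a minimal Lean 4 sketch. The theorem name `add_comm_example` is made up for this example; `Nat.add_comm` is a lemma from Lean's core library. The statement is deliberately trivial, since the point is only to show the shape of a proof that the assistant accepts or rejects.

```lean
-- Statement: addition of natural numbers is commutative.
-- The proof term cites `Nat.add_comm`, a lemma in Lean 4's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- The same fact proved with a tactic script:
-- `rw` rewrites the goal using the lemma and then closes it by reflexivity.
example (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]
```

Knowing which lemma to cite, or which tactic to try next, is the part of the work that AI proof-step suggestion aims to reduce.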

The Fields Institute in Toronto has established a programme specifically for exploring AI-assisted mathematical research. The programme brings together mathematicians and AI researchers to work on specific open problems in mathematics, exploring what AI tools can contribute to the discovery of new mathematical results.

Climate Science: AI and the Planet

Climate science is one of the domains where AI’s potential for scientific acceleration is most consequential — and where the stakes of getting the science right are highest.

The complex Earth system models that simulate climate dynamics and project future climate scenarios are among the most computationally intensive scientific tools in existence. Running a full Earth system model simulation at high resolution requires supercomputer time that limits the number and variety of simulations that climate scientists can conduct. AI offers the potential to dramatically accelerate climate modelling by learning statistical emulators of specific model components — replacing computationally expensive physical simulations with learned approximations that run orders of magnitude faster.
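The emulator idea can be sketched in a few lines of Python. Everything here is a toy assumption: `expensive_simulation` is a hypothetical stand-in for a costly model component (it is not real climate physics), and piecewise-linear interpolation stands in for the neural networks that real emulators use. The pattern is the same, though: run the slow model a limited number of times, fit a cheap approximation, and query the approximation afterwards.

```python
import numpy as np


def expensive_simulation(forcing: float, steps: int = 20_000, dt: float = 0.001) -> float:
    """Toy stand-in for a costly model component (hypothetical, not real physics).

    Integrates dT/dt = forcing - 0.5*T - 0.05*T**3 with small Euler steps
    until the temperature-like variable T has essentially equilibrated.
    """
    temperature = 0.0
    for _ in range(steps):
        temperature += dt * (forcing - 0.5 * temperature - 0.05 * temperature**3)
    return temperature


def build_emulator(training_forcings: np.ndarray):
    """Run the slow model once per training point and return a fast approximation."""
    training_outputs = np.array([expensive_simulation(f) for f in training_forcings])

    def emulator(forcing):
        # Piecewise-linear interpolation between the stored simulation results.
        return np.interp(forcing, training_forcings, training_outputs)

    return emulator


if __name__ == "__main__":
    emulator = build_emulator(np.linspace(0.0, 4.0, 17))
    for forcing in (0.3, 1.1, 2.6):
        print(forcing, expensive_simulation(forcing), float(emulator(forcing)))
```

Each call to the emulator is a table lookup rather than a 20,000-step integration. The cost is that the emulator is only trustworthy inside the range of conditions it was trained on.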

Nvidia’s FourCastNet and Google DeepMind’s GraphCast are AI weather prediction models that can generate global weather forecasts in seconds rather than the hours required by traditional numerical weather prediction. These models, trained on decades of weather observation and simulation data, can match the accuracy of traditional forecasting methods on standard benchmarks while requiring a tiny fraction of the computational resources.

DeepMind publishes GraphCast

Date:: November 14, 2023
Location:: Google DeepMind, London
Significance:: DeepMind published GraphCast — an AI weather prediction model that could generate global weather forecasts in seconds rather than the hours required by traditional numerical weather prediction. Trained on decades of historical weather data (the ECMWF’s ERA5 reanalysis), GraphCast matched the accuracy of traditional forecasting methods on standard benchmarks while requiring a tiny fraction of the computational resources.
Outcome:: GraphCast demonstrated that AI weather prediction was not a future possibility but a present reality. The European Centre for Medium-Range Weather Forecasts began running GraphCast and similar machine-learning models experimentally alongside its operational system, and went on to develop its own AI forecasting model, AIFS, a sign that major forecasting centres were taking the approach seriously.

For climate science, the acceleration provided by AI emulators enables new types of analysis that were not previously feasible:

Large ensemble modelling. Running thousands of simulations with slightly different initial conditions to understand the range of possible climate futures — important for understanding climate uncertainty — becomes feasible when each simulation takes seconds rather than hours.

High-resolution regional projections. Providing detailed, high-resolution projections of climate impacts at local scales — the information most relevant for adaptation planning — requires computationally intensive downscaling from global models. AI-based downscaling methods can provide this detail much more efficiently.

Attribution studies. Determining the contribution of human-caused climate change to specific extreme weather events requires comparing many simulations of a world with and without human forcing. AI emulators make it feasible to conduct the large number of simulations required for robust attribution.
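The ensemble idea behind these analyses can be illustrated with a toy chaotic system. The sketch below uses the logistic map as a hypothetical stand-in for a fast emulator; it is not a climate model, and the function name `run_ensemble` and its parameters are assumptions made for illustration. It shows how tiny differences in initial conditions grow into a wide spread of outcomes, which is why many members are needed to characterise uncertainty.

```python
import numpy as np


def run_ensemble(n_members: int = 1000, n_steps: int = 50, x0: float = 0.4,
                 perturbation: float = 1e-6, seed: int = 0) -> np.ndarray:
    """Evolve many copies of a chaotic toy model from slightly different starts.

    Each member starts at x0 plus a small random perturbation and is advanced
    with the logistic map x -> 4 * x * (1 - x), which is chaotic.
    Returns the final state of every ensemble member.
    """
    rng = np.random.default_rng(seed)
    state = x0 + perturbation * rng.standard_normal(n_members)
    for _ in range(n_steps):
        state = 4.0 * state * (1.0 - state)
    return state


if __name__ == "__main__":
    final = run_ensemble()
    low, median, high = np.percentile(final, [5, 50, 95])
    print(f"5th percentile {low:.3f}, median {median:.3f}, 95th percentile {high:.3f}")
```

The whole ensemble advances in one vectorised array operation per step. That is only practical when a single member is cheap to run, which is the property emulators are meant to provide.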

Beyond climate modelling, AI is being applied to the specific scientific challenges of climate adaptation and mitigation:

Renewable energy optimisation. AI systems optimising the placement and operation of renewable energy infrastructure — predicting solar and wind output, optimising grid management, managing energy storage — can increase the efficiency of renewable energy systems in ways that accelerate decarbonisation.

Carbon measurement and monitoring. AI analysis of satellite imagery is being used to measure forest carbon stocks, to detect deforestation and illegal logging, and to verify the carbon credits that fund forest conservation. These approaches are far more scalable and cost-effective than ground-based measurement alone, although field measurements are still needed to calibrate and validate them.

Climate tipping point prediction. AI analysis of climate data is being used to identify early warning signals of tipping points — the thresholds beyond which specific climate systems might undergo abrupt, potentially irreversible changes. The identification of these early warning signals requires pattern recognition in high-dimensional, noisy data that AI systems are well-suited to.

Physics: AI as a Theoretical Partner

Physics has seen some of the most intellectually ambitious applications of AI to scientific discovery, with researchers exploring whether AI can make genuine contributions to theoretical physics: not just analysing data but developing and discovering physical theories.

Condensed matter physics. The simulation of quantum materials — understanding how electrons interact in complex materials to produce phenomena like superconductivity, magnetism, and topological states — is one of the most computationally demanding problems in physics. Neural quantum states — neural network representations of quantum many-body wavefunctions — have demonstrated the ability to represent and compute properties of quantum systems that would be intractable with conventional methods.

Particle physics. The analysis of particle collision data from accelerators like the LHC at CERN involves processing enormous quantities of data to identify the signatures of rare physics events. AI systems trained for particle physics event classification have substantially improved the sensitivity of searches for new physics, including supersymmetric particles and other beyond-Standard-Model phenomena.

Gravitational wave astronomy. The analysis of gravitational wave data from detectors like LIGO requires filtering extremely weak signals from a noise background that is many orders of magnitude larger. AI signal processing has improved the sensitivity of gravitational wave detection and enabled the identification of signals that would have been missed by traditional matched-filter methods.

Symbolic regression. One of the most intellectually interesting AI approaches to physics is symbolic regression: the use of AI systems to discover mathematical equations that describe observed data. Tools like AI Feynman, developed by MIT researchers, can take experimental data and output the mathematical formula that best describes it, in some cases recovering known fundamental physics equations.

Definition

Symbolic regression — The use of AI systems to discover mathematical equations that describe observed data, not by fitting parameters to a pre-specified model form (as in standard regression) but by searching the space of possible mathematical expressions for the one that best describes the data while remaining simple. Tools like AI Feynman (developed by MIT’s Max Tegmark and colleagues) can take experimental data and output the mathematical formula that best describes it, in some cases recovering known physics equations, such as those collected in the Feynman Lectures on Physics, directly from data. Earlier symbolic regression work by Schmidt and Lipson (2009) recovered the Hamiltonian of a double pendulum from motion-tracking data. Symbolic regression is intellectually interesting because it represents a form of scientific discovery that goes beyond pattern recognition and approaches genuine theoretical contribution: the AI is generating mathematical laws, not just fitting them.
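The contrast with ordinary regression can be shown with a deliberately tiny Python sketch. It is not AI Feynman's method: it simply tries every expression in a small, hand-written candidate library, fits one scale coefficient to each, and keeps the expression with the best trade-off between error and complexity. The library, the complexity values, and the penalty weight are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical candidate library: expression name -> (function, complexity score).
CANDIDATES = {
    "x": (lambda x: x, 1),
    "x**2": (lambda x: x**2, 2),
    "x**3": (lambda x: x**3, 2),
    "sqrt(x)": (np.sqrt, 2),
    "sin(x)": (np.sin, 2),
    "exp(x)": (np.exp, 2),
}


def symbolic_regression(x: np.ndarray, y: np.ndarray, complexity_penalty: float = 1e-3):
    """Return (expression_name, coefficient) such that y is close to coefficient * expression(x).

    The search is over the *form* of the equation, not just its parameters:
    each candidate form gets its best least-squares coefficient, and forms are
    ranked by mean squared error plus a penalty on complexity.
    """
    best_name, best_coeff, best_score = None, 0.0, np.inf
    for name, (func, complexity) in CANDIDATES.items():
        basis = func(x)
        coeff = float(basis @ y / (basis @ basis))  # closed-form least squares
        mse = float(np.mean((coeff * basis - y) ** 2))
        score = mse + complexity_penalty * complexity
        if score < best_score:
            best_name, best_coeff, best_score = name, coeff, score
    return best_name, best_coeff


if __name__ == "__main__":
    # Synthetic free-fall data: distance = 0.5 * g * t**2 with g = 9.81.
    t = np.linspace(0.1, 3.0, 50)
    distance = 0.5 * 9.81 * t**2
    print(symbolic_regression(t, distance))
```

On the synthetic free-fall data the search selects the quadratic form and a coefficient close to half of 9.81. Real systems search vastly larger expression spaces, which is where the learned guidance comes in.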

The implications of symbolic regression for theoretical physics are profound. If AI can discover mathematical laws from data — not just fit parameters to existing models but discover the mathematical structure of the laws themselves — it represents a form of scientific discovery that goes beyond pattern recognition and approaches genuine theoretical contribution.

Materials Science: Discovering What We Need

Materials science has been transformed by AI in ways that are already producing practical applications in energy, electronics, and manufacturing.

The materials discovery challenge is one of the classic needle-in-a-haystack problems in science: the space of possible materials — combinations of elements in different proportions and structures — is astronomical, but only a tiny fraction of possible materials have useful properties. Traditional materials discovery relied on intuition-guided experimentation, testing promising candidates one by one. AI enables a different approach: screening enormous numbers of candidate materials computationally, identifying the most promising candidates for experimental synthesis.

DeepMind’s GNoME system, published in November 2023, used graph neural networks to predict the stability of inorganic crystal structures. In a single study, the system predicted the structures of 2.2 million new stable crystals — more than had been discovered by experimental methods across all of human history. Among the predicted structures were materials with properties relevant to batteries, superconductors, and other technologies with significant practical importance.

DeepMind publishes GNoME

Date:: November 29, 2023
Location:: Google DeepMind, London
Significance:: DeepMind published GNoME (Graph Networks for Materials Exploration) — a graph neural network system that predicts the stability of inorganic crystal structures. In a single study, GNoME predicted the structures of 2.2 million new stable crystals — more than had been discovered by experimental methods across all of human history. Among the predicted structures were materials with properties relevant to batteries, superconductors, and other technologies with significant practical importance.
Outcome:: GNoME demonstrated that AI-accelerated materials discovery was not a future possibility but a present reality. The 381,000 most stable of the predicted structures were released to the Materials Project database for experimental verification, providing a roadmap for materials scientists that could accelerate the energy transition (better batteries, solar cells, fuel cells) in ways not achievable through traditional experimental approaches.

The Microsoft Azure Quantum team, working with Pacific Northwest National Laboratory, reported research in January 2024 demonstrating AI-accelerated discovery of solid-state electrolyte materials for next-generation batteries. The AI screening process identified 18 candidate materials from a starting pool of 32 million potential compounds in a process that took days rather than the years that exhaustive experimental screening would have required. Two of the identified materials showed promise in initial experimental testing.

The practical implications of AI-accelerated materials discovery are significant for the energy transition. The batteries, solar cells, and fuel cells that a decarbonised energy system will require all depend on materials with specific properties. AI-enabled discovery of better materials for these applications could accelerate the energy transition in ways that are not achievable through the traditional experimental materials science approach.

The Nature of AI-Assisted Science: What Is Being Changed

The specific character of AI-assisted science — the ways in which the research process is changing when AI systems are genuine partners — deserves careful analysis. Not all science is being changed in the same way, and understanding what AI changes and what it does not is essential for evaluating the transformation.

Data analysis and pattern recognition. The most straightforward role of AI in science is in data analysis — finding patterns in large, complex datasets that human analysis could not extract efficiently or reliably. This role has been important in genomics, in astronomy, in clinical research, and in other data-rich scientific domains for many years. The specific advances of the current AI era have expanded this role significantly: the patterns that AI can now find are more complex, more subtle, and more informative than those accessible to previous analytical methods.

Hypothesis generation. More ambitiously, AI systems are being used to generate scientific hypotheses — to propose explanations for observed phenomena, to suggest experiments that would distinguish between competing hypotheses, to identify potential mechanisms for disease processes. This role is more complex than data analysis because it involves not just finding patterns but interpreting them and generating explanatory accounts.

Experimental design and optimisation. AI systems are being used to design and optimise experiments — to determine which experiments would provide the most information about the questions being investigated, to identify the optimal conditions for experimental protocols, and to guide the sequential design of experiments in adaptive research processes.

Literature synthesis. AI systems can process and synthesise the vast scientific literature in ways that human researchers cannot. A researcher in a new sub-field who needs to understand the existing knowledge in that sub-field can use AI to survey thousands of relevant papers and identify the key findings, the open questions, and the most important methodological approaches. The AI-assisted literature synthesis is not perfect — it is susceptible to hallucination and to missing important nuance — but it can provide a starting point that would have taken months of manual literature review to develop.

Scientific communication. AI systems are being used to assist with the writing of scientific papers, grant applications, and scientific communications more broadly. The assistance ranges from proofreading and editing to generating first drafts of standard sections to helping translate scientific results into language accessible to non-specialist audiences.

What AI is not doing — what remains the domain of human scientists — is the kind of conceptual innovation that represents the highest level of scientific creativity: the development of fundamentally new theoretical frameworks, the identification of the deep questions that define a research agenda, the creative leap from anomalous data to transformative insight. These capabilities may eventually be within the reach of more capable AI systems, but they are not demonstrated capabilities of current systems.

Info

AI plays five roles in modern science, in roughly increasing order of ambition:

Data analysis and pattern recognition — finding patterns in large complex datasets (genomics, astronomy, clinical research; the longest-standing role)
Hypothesis generation — proposing explanations, suggesting discriminating experiments, identifying mechanisms
Experimental design and optimisation — determining which experiments are most informative, optimal conditions, adaptive sequential design
Literature synthesis — surveying thousands of relevant papers to identify key findings, open questions, methodological approaches (imperfect — susceptible to hallucination — but a months-of-manual-review starting point)
Scientific communication — proofreading, editing, drafting standard sections, translating for non-specialists

What AI is not doing — what remains the domain of human scientists — is the highest level of scientific creativity: developing fundamentally new theoretical frameworks, identifying the deep questions that define a research agenda, making the creative leap from anomalous data to transformative insight. These may eventually be within reach of more capable AI systems; they are not demonstrated capabilities of current systems.

The Epistemological Question: How Science Works with AI

The integration of AI into scientific practice raises fundamental questions about epistemology — about how scientific knowledge is justified and what it means to understand something.

The classic ideal of scientific knowledge is that it is based on mechanistic explanation: we understand a phenomenon when we can explain why it happens in terms of more fundamental principles. We understand the motion of the planets because we understand Newton’s laws of gravitation. We understand the properties of water because we understand the chemical bonding of hydrogen and oxygen. Understanding, in this view, means knowing the mechanism.

AI-generated scientific knowledge sometimes fits this model and sometimes does not. AlphaFold’s protein structure predictions are accurate — they describe what the structures are — but they do not, on their own, explain why proteins fold the way they do in terms of fundamental physical principles. The knowledge encoded in AlphaFold is genuine and useful, but it is not mechanistic knowledge in the classical sense. It is a black-box model that can predict outcomes without specifying the mechanism that produces them.

This black-box character of AI-generated scientific knowledge is a genuine epistemological challenge for science. Science has historically aimed at mechanistic understanding, not just accurate prediction. The ability to predict outcomes accurately is valuable, and AI’s ability to make accurate predictions in domains where mechanistic understanding has been elusive is a genuine scientific contribution. But prediction without understanding is not the same as the understanding that science has traditionally aimed at.

Several scientific communities are grappling with this challenge in specific ways.

In biology, the AlphaFold predictions have been extraordinarily useful for generating hypotheses that can be tested experimentally. The structures predict what regions of a protein might be important for its function, what interactions with other proteins might be possible, what mutations might affect stability or function. These predictions guide experimental research that can then provide the mechanistic understanding that AlphaFold alone does not provide.

In physics, the mathematical laws that AI discovers through symbolic regression are closer to traditional scientific understanding: they are equations that can be interpreted, related to existing theoretical frameworks, and used to generate testable predictions. When an AI system recovers the Lagrangian of a physical system from experimental data, the result contributes to genuine understanding in the way that human theoretical physics does.
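To make the idea of symbolic regression concrete, here is a deliberately tiny sketch in Python. It is not how any production system works: the candidate library, the synthetic free-fall data, and the function names are all assumptions made for illustration. It shows only the core move, which is to search over interpretable expressions and keep the one that best fits the measurements.

```python
import math

# Synthetic "experiment": distance fallen under constant acceleration g = 9.8 m/s^2.
# The data-generating law d = 0.5 * g * t**2 is an assumption of this toy example.
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
distances = [0.5 * 9.8 * t ** 2 for t in times]

# Hypothetical library of candidate forms d = c * f(t). Real symbolic
# regression systems search a far larger space of expressions.
CANDIDATES = {
    "c * t": lambda t: t,
    "c * t**2": lambda t: t ** 2,
    "c * t**3": lambda t: t ** 3,
    "c * sqrt(t)": math.sqrt,
    "c * exp(t)": math.exp,
}


def fit_constant(f, xs, ys):
    """Least-squares estimate of c in y = c * f(x)."""
    fx = [f(x) for x in xs]
    return sum(a * y for a, y in zip(fx, ys)) / sum(a * a for a in fx)


def squared_error(f, c, xs, ys):
    """Sum of squared residuals for the fitted form y = c * f(x)."""
    return sum((c * f(x) - y) ** 2 for x, y in zip(xs, ys))


def discover_law(xs, ys):
    """Return (expression, constant, error) for the best-fitting candidate."""
    results = []
    for name, f in CANDIDATES.items():
        c = fit_constant(f, xs, ys)
        results.append((name, c, squared_error(f, c, xs, ys)))
    return min(results, key=lambda r: r[2])


best_name, best_c, best_err = discover_law(times, distances)
print(f"best form: d = {best_name}, c = {best_c:.3f}")
```

On this noise-free data the quadratic form wins, with a fitted constant close to 4.9, which a physicist can read as half of g. That readability is the point: the output is an equation that can be interpreted and tested, not an opaque set of weights.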

In drug discovery, AI prediction of drug candidates is valuable primarily as a screening tool — it narrows the space of candidates to be tested experimentally. The mechanistic understanding of why a specific drug works comes from the experimental and clinical research that follows the AI-enabled initial discovery.

The classic ideal of scientific knowledge is mechanistic explanation: we understand a phenomenon when we can explain why it happens in terms of more fundamental principles. AI-generated scientific knowledge sometimes fits this model and sometimes does not. AlphaFold’s protein structure predictions are accurate — they describe what the structures are — but they do not, on their own, explain why proteins fold the way they do in terms of fundamental physical principles. The knowledge is genuine and useful, but it is black-box knowledge: a model that can predict outcomes without specifying the mechanism that produces them.

Science has historically aimed at mechanistic understanding, not just accurate prediction. AI’s ability to make accurate predictions in domains where mechanistic understanding has been elusive is a genuine scientific contribution — but prediction without understanding is not the same as the understanding science has traditionally aimed at. The scientific community is responding by treating AI predictions as hypothesis generators: the prediction guides the experiment that provides the mechanistic understanding the AI alone cannot.

The Reproducibility Question: AI and Scientific Reliability

Science depends on reproducibility — the ability of independent researchers to repeat experiments and obtain the same results, confirming that the findings are reliable rather than the product of error, bias, or chance. The integration of AI into scientific research raises new reproducibility challenges that the scientific community is beginning to grapple with.

Model variability. An AI model used in scientific research may produce different results depending on its version, its training procedure, and the random seed used in training. Reproducing AI-assisted findings therefore requires reporting the exact model and its parameters, which adds complexity to scientific reporting.

Data leakage. AI models trained on large datasets may have “seen” specific scientific findings as part of their training data, creating the appearance of discovery when the system is actually recalling what it was trained on. The risk is greatest in domains where the training data and the test data overlap significantly, including language model applications to literature synthesis and scientific question answering.

Interpretability of results. When AI systems identify patterns or generate predictions that researchers build on, the interpretability of those results matters for the reproducibility of subsequent research. If the AI’s contribution cannot be understood by other researchers, they cannot independently verify it or build on it reliably.
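The model-variability point can be illustrated with a short Python sketch. The "training run" here is a stand-in (it only averages seeded random noise), and the manifest fields and names such as `toy-mean-estimator` are hypothetical rather than drawn from any published reporting standard. The aim is simply to show why the seed and version need to travel with the result.

```python
import hashlib
import json
import random


def train_toy_model(seed, n_samples=200):
    """Stand-in for a training run: estimates a mean from seeded random noise.

    Hypothetical example; a real pipeline would train an actual model here.
    """
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return sum(samples) / n_samples


def run_manifest(model_name, model_version, seed, hyperparameters, result):
    """Bundle the settings another lab would need to repeat the run."""
    manifest = {
        "model_name": model_name,
        "model_version": model_version,
        "seed": seed,
        "hyperparameters": hyperparameters,
        "result": result,
    }
    # Fingerprint the settings (not the result) so configurations can be compared.
    settings = {k: v for k, v in manifest.items() if k != "result"}
    payload = json.dumps(settings, sort_keys=True).encode("utf-8")
    manifest["config_sha256"] = hashlib.sha256(payload).hexdigest()
    return manifest


result_a = train_toy_model(seed=1)
result_b = train_toy_model(seed=2)        # same code, different seed
result_a_again = train_toy_model(seed=1)  # same code, same seed

manifest = run_manifest("toy-mean-estimator", "0.1.0", 1, {"n_samples": 200}, result_a)
print(json.dumps(manifest, indent=2))
```

Two runs with the same seed agree exactly, while a different seed gives a different number from identical code. A report that omits the seed and version leaves other researchers unable to tell a genuine discrepancy from ordinary run-to-run variation.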

The scientific community is developing standards for AI-assisted research that address these challenges. Several journals have published guidance on reporting AI-assisted research. The clinical research community has developed reporting guidelines for trials of AI interventions, such as the CONSORT-AI and SPIRIT-AI extensions. The broader scientific reproducibility movement has extended its focus to include the specific challenges of AI-assisted research.
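In simple cases, the data-leakage risk described above can be checked mechanically. The following Python sketch is a minimal illustration under stated assumptions: the two miniature corpora are invented, the function names are hypothetical, and exact matching after normalisation is the weakest useful test. Paraphrased or partially overlapping items would slip past it, so a real audit would need fuzzier matching.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation and whitespace so trivial edits still match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_leaked_items(training_texts, test_texts):
    """Return test items whose normalized form also appears in the training set."""
    seen = {normalize(t) for t in training_texts}
    return [t for t in test_texts if normalize(t) in seen]


# Hypothetical miniature corpora, invented for illustration only.
training_corpus = [
    "Protein X binds ligand Y at the N-terminal domain.",
    "Compound Z inhibits enzyme Q in vitro.",
]
test_questions = [
    "protein x binds ligand y at the n-terminal domain",
    "Does compound W cross the blood-brain barrier?",
]

leaked = find_leaked_items(training_corpus, test_questions)
leak_rate = len(leaked) / len(test_questions)
print(f"{len(leaked)} of {len(test_questions)} test items overlap (rate {leak_rate:.0%})")
```

Here the first test item is flagged because it differs from a training sentence only in case and punctuation. A model that answers it correctly may be recalling rather than discovering, which is why an overlap figure of this kind is worth reporting alongside the results.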

Warning

AI integration into scientific research raises three reproducibility challenges that the scientific community is still developing standards to address:

Model variability — the same model architecture can produce different results depending on version, training procedure, and random seed. Reproducibility requires reporting the specific model and parameters — adding complexity to scientific reporting.
Data leakage — AI models trained on large datasets may have “seen” specific scientific findings as part of their training data, creating the appearance of discovery when the system is actually recalling training data. This is a specific risk in domains where training and test data overlap.
Interpretability of results — when AI systems identify patterns that researchers build on, the interpretability of those results matters for reproducibility of subsequent research. If the AI’s contribution cannot be understood by other researchers, they cannot independently verify it or build on it reliably.

The scientific community is responding with journal reporting standards, clinical-research-specific AI standards, and extensions to the broader reproducibility movement — but the standards are still catching up with the practice.

The International Dimension: Who Benefits from Scientific AI

The benefits of AI-assisted scientific discovery are not uniformly distributed across the global scientific community. The AI tools that are most capable are developed primarily in the United States and China, and access to those tools — and the ability to build on them — is not equally available to researchers in all countries.

This creates a specific concern about the international distribution of AI-assisted scientific progress. If the scientific advances enabled by AI primarily benefit researchers in high-income countries with access to frontier AI tools and computing resources, the global inequality in scientific capacity may worsen rather than improve.

Several specific dimensions of this concern deserve attention.

Language. The large language models that underpin AI-assisted scientific tools are primarily trained on English-language text. Their performance on scientific tasks in other languages is generally lower, disadvantaging researchers who work primarily in non-English languages and who have contributed to a scientific tradition that AI systems have not adequately incorporated.

Training data. AI systems can only learn from the scientific datasets that are available to them, and those come primarily from research published in high-impact journals, which in turn reflects work conducted at well-resourced institutions. The scientific knowledge that AI systems incorporate therefore reflects the research priorities and methodological traditions of the institutions that have historically dominated scientific publishing.

Computational access. Running frontier AI models for scientific research requires significant computational resources — GPU clusters that most research institutions in low-income countries do not have access to. The ability to use and develop AI tools for scientific research is therefore concentrated in institutions with substantial computing resources.

Addressing these disparities is not just a matter of equity; it is a matter of scientific quality and completeness. The scientific questions that matter most for human welfare include questions about diseases, environmental challenges, and technological needs that are most acute in the global south. AI-assisted science that is conducted primarily by researchers in high-income countries will tend to address the questions that matter most to those countries.

Info

The benefits of AI-assisted science are not uniformly distributed across the global scientific community. Three dimensions of inequality deserve attention:

Language — large language models are primarily trained on English-language text; performance on scientific tasks in other languages is generally lower, disadvantaging researchers in non-English-language scientific traditions
Training data — AI systems trained on scientific datasets are trained primarily on research from high-impact journals, which reflects the priorities and methodological traditions of well-resourced institutions (typically in high-income countries)
Computational access — running frontier AI models requires GPU clusters that most research institutions in low-income countries do not have access to; the ability to use and develop AI tools is concentrated in well-resourced institutions

Addressing these disparities is not just a matter of equity — it is a matter of scientific quality and completeness. The scientific questions that matter most for human welfare include questions about diseases, environmental challenges, and technological needs most acute in the global south. AI-assisted science conducted primarily by researchers in high-income countries will address primarily the questions that matter most to those countries.

The Scientific Revolution in Progress

The transformation of science by AI is not a completed event but a revolution in progress — a transformation that is accelerating, that is spreading from domain to domain, and that is still in its early stages in most scientific fields.

The fields that have been most transformed — protein biology, drug discovery, weather prediction — have specific features that made them particularly amenable to early AI transformation: large datasets of labelled examples, clear objective functions, and well-defined problem structures. The fields that have been less transformed to date — some branches of physics, many branches of the social sciences, historical and qualitative research — have features that make AI tools less directly applicable or that require the development of new AI approaches.

The direction of travel is consistent across fields: more AI assistance in more phases of the research process, with AI contributing most directly to the data-intensive and pattern-recognition-intensive phases while human researchers continue to provide the conceptual innovation and interpretive judgment.

The specific trajectory of the scientific AI revolution will be shaped by several factors:

Technical advances in AI. The capabilities that would most expand the scope of AI-assisted science (better reasoning under uncertainty, better generation and evaluation of novel hypotheses, and better connection of ideas across scientific domains) are active research directions. Progress on them will determine how far scientific AI can advance beyond its current capabilities.

Scientific community adoption. The integration of AI tools into scientific practice requires changes in how scientists are trained, how research is conducted and reported, and how scientific institutions evaluate and reward research contributions. The pace of these changes will determine the pace of AI adoption across scientific fields.

Governance and data access. The availability of high-quality scientific data — and the governance frameworks that determine who can access it and under what conditions — will shape which scientific communities can benefit from AI tools and which questions AI can be applied to.

The Partnership Metaphor and Its Limits

The metaphor that describes the relationship between AI and science as a “partnership” is useful but has limits that are worth acknowledging.

A partnership implies mutual contribution — each partner bringing something that the other lacks. AI brings computational power, pattern recognition at scale, and the ability to process information that exceeds human cognitive capacity. Human researchers bring creativity, domain expertise, interpretive judgment, and the ability to identify the questions that are worth asking.

But a partnership also implies some degree of equality — of shared goals and shared investment in the outcome. AI systems do not have goals in the way that human partners do; they do not care whether the science succeeds. They will find the patterns that are consistent with their training objectives, regardless of whether those patterns correspond to genuine scientific understanding.

The partnership metaphor also risks obscuring important questions about who benefits from AI-assisted science and whose interests shape the research agenda. AI systems reflect the priorities of the organisations that build them and the datasets on which they are trained. If those priorities are primarily commercial — finding drugs that are profitable rather than drugs that address the most pressing health needs — the “partnership” between AI and science may serve those priorities rather than the broader human interest in scientific understanding.

With these caveats noted, the partnership metaphor captures something real and important: the integration of AI into scientific practice is not the replacement of human scientists but the augmentation of human scientific capability — making possible research that would not have been possible without AI assistance, accelerating the pace of discovery, and extending the reach of science into domains where human analysis alone would be insufficient.

This is the scientific AI moment: not the end of human science but the beginning of a new kind of science, in which the tools of artificial intelligence are genuine partners in the oldest and most consequential human project — the effort to understand the world.

Important

The “partnership” metaphor for AI in science is useful but has limits worth acknowledging. A partnership implies mutual contribution: AI brings computational power, pattern recognition at scale, and processing capacity that exceeds human cognition; humans bring creativity, domain expertise, interpretive judgment, and the ability to identify questions worth asking.

But a partnership also implies some degree of equality, of shared goals and shared investment in the outcome. AI systems do not have goals in the way that human partners do; they do not care whether the science succeeds. And the metaphor risks obscuring questions about whose interests shape the research agenda: AI systems reflect the priorities of the organisations that build them and the datasets on which they are trained. If those priorities are primarily commercial (finding profitable drugs rather than drugs that address the most pressing health needs), the “partnership” may serve those priorities rather than the broader human interest in scientific understanding.

With these caveats: the partnership metaphor captures something real. The integration of AI into scientific practice is not the replacement of human scientists but the augmentation of human scientific capability — making possible research that would not have been possible without AI assistance, accelerating the pace of discovery, and extending the reach of science into domains where human analysis alone would be insufficient.
