Two AI platforms transform molecular design

EPFL researchers have developed Saturn and TANGO, two complementary AI frameworks that make generative molecular design faster, more efficient, and closer to real laboratory chemistry.
Designing a new drug molecule is like searching an almost unimaginably large haystack. One way to tackle this is with generative artificial intelligence (GenAI)—systems that learn patterns from large datasets and then generate new content, such as text (ChatGPT), images, or code.
In chemistry, GenAI can propose entirely new molecular structures. But two key obstacles remain: many models require too many computationally expensive evaluations, and many of the molecules they propose are difficult to synthesize.
Bridging this gap between efficient discovery and real-world feasibility is one of the central challenges in modern AI-driven chemistry.
From virtual molecules to usable candidates
In generative molecular design, an AI model proposes candidate molecules, a computational “oracle” scores them, and the model learns from the results. These oracles can estimate properties such as how well a molecule might bind to a target protein, or calculate electronic properties using more expensive physics-based simulations.
The challenge is that the most accurate oracles are often too slow to use at large scale. Researchers therefore commonly use cheaper approximations first, before testing a smaller set of candidates with higher-fidelity methods. A more sample-efficient model—one that finds good candidates with fewer evaluations—could make it possible to use more accurate simulations earlier in the design process.
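The propose, score, learn cycle and the idea of an oracle budget can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and none of it comes from the Saturn codebase: the names `BudgetedOracle`, `toy_score`, and `design_loop`, the three-letter alphabet, and the weight update are all hypothetical, and the "molecules" are plain strings, not valid chemistry.

```python
import random


class BudgetedOracle:
    """Wrap a scoring function and count how many evaluations are spent."""

    def __init__(self, score_fn, budget):
        self.score_fn = score_fn
        self.budget = budget
        self.calls = 0
        self.cache = {}

    def can_score(self, molecule):
        # Cached molecules are free; new ones need remaining budget.
        return molecule in self.cache or self.calls < self.budget

    def __call__(self, molecule):
        if molecule not in self.cache:
            if self.calls >= self.budget:
                raise RuntimeError("oracle budget exhausted")
            self.calls += 1
            self.cache[molecule] = self.score_fn(molecule)
        return self.cache[molecule]


def toy_score(molecule):
    # Placeholder property (hypothetical): fraction of 'N' characters.
    return molecule.count("N") / len(molecule)


def design_loop(oracle, rounds=5, batch_size=8, length=6, seed=0):
    rng = random.Random(seed)
    atoms = ["C", "N", "O"]
    weights = {atom: 1.0 for atom in atoms}
    best = None
    for _ in range(rounds):
        # 1. Propose: sample candidate strings from the current weights.
        batch = [
            "".join(rng.choices(atoms, weights=[weights[a] for a in atoms], k=length))
            for _ in range(batch_size)
        ]
        # 2. Score: spend oracle calls only while budget remains.
        scored = [(m, oracle(m)) for m in batch if oracle.can_score(m)]
        if not scored:
            break
        # 3. Learn: shift sampling weights toward the best candidate of the round.
        top = max(scored, key=lambda pair: pair[1])
        if best is None or top[1] > best[1]:
            best = top
        for atom in top[0]:
            weights[atom] += 0.5
    return best


oracle = BudgetedOracle(toy_score, budget=20)
best_molecule, best_score = design_loop(oracle)
print(best_molecule, best_score, "oracle calls used:", oracle.calls)
```

In this framing, sample efficiency is simply how good `best_score` gets for a fixed `budget`. The smaller the budget a model needs, the more expensive the function behind `score_fn` can afford to be.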
A second challenge is synthesizability. A molecule that looks excellent on a computer is of limited use if chemists cannot make it, or if its predicted synthesis depends on unavailable or impractical starting materials.
A team led by Philippe Schwaller at EPFL has now addressed these challenges across two complementary studies published in Nature Machine Intelligence and Nature Computational Science.
Saturn: learning more from each molecule
The first study introduces Saturn, a generative framework for molecular design based on the Mamba architecture, a recent alternative to transformers for sequence modelling. Like other language-based chemistry models, Saturn represents molecules as SMILES strings: text-like descriptions of molecular structure.
“The bottleneck isn’t the AI model itself; it’s how many molecules you need to test before you find good ones,” explains Jeff Guo, a former doctoral researcher in Schwaller’s lab and lead author on both studies. “If we can get the model to learn faster, we can afford to use the most accurate — and most expensive — simulations directly, rather than relying on cheaper approximations.”
Saturn improves its sample efficiency by combining reinforcement learning with “Augmented Memory,” a strategy that reuses high-scoring molecules during training. Because the same molecule can be written as several equivalent SMILES strings, the model can learn from multiple representations of the same successful candidate.
The researchers found that this produces a “hop and locally exploit” behaviour: the model moves into promising regions of chemical space, explores similar molecules there, and then moves on.
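The replay idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Saturn's implementation: `ReplayBuffer` and `reverse_linear_smiles` are hypothetical names, and the augmentation shown is a deliberately narrow stand-in that only handles unbranched, ring-free chains of single-letter atoms, where writing the chain backwards gives an equivalent SMILES string (for example, `CCO` and `OCC` both describe ethanol). A real pipeline would rely on a cheminformatics toolkit to canonicalize and randomize SMILES.

```python
class ReplayBuffer:
    """Keep the highest-scoring molecules seen so far (toy sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.scores = {}  # SMILES string -> best score observed

    def add(self, smiles, score):
        if score > self.scores.get(smiles, float("-inf")):
            self.scores[smiles] = score
        if len(self.scores) > self.capacity:
            # Evict the lowest-scoring entry to stay within capacity.
            worst = min(self.scores, key=self.scores.get)
            del self.scores[worst]

    def top(self):
        return sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)

    def augmented(self, augment):
        # Replay each stored molecule under an alternative, equivalent spelling.
        return [(augment(smiles), score) for smiles, score in self.top()]


def reverse_linear_smiles(smiles):
    """Rewrite an unbranched, ring-free chain from its other end.

    Assumption: only single-letter atoms C, N, O, S and the bond symbols
    '=' and '#' are supported; anything else is rejected.
    """
    if any(ch not in "CNOS=#" for ch in smiles):
        raise ValueError("only simple linear SMILES are supported in this sketch")
    return smiles[::-1]


buffer = ReplayBuffer(capacity=2)
buffer.add("CCO", 0.9)   # ethanol
buffer.add("CC#N", 0.7)  # acetonitrile
buffer.add("CC", 0.1)    # ethane, evicted as the lowest score
print(buffer.augmented(reverse_linear_smiles))
```

Each replayed pair carries the same score under a different string, so a sequence model gets extra training signal from one oracle call instead of spending budget on a new evaluation.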
In drug-discovery benchmarks involving molecular docking and multi-parameter optimization, Saturn outperformed the tested baselines under constrained oracle budgets.
The team also showed that Saturn could directly optimize electronic properties calculated at the density functional theory (DFT) level, a high-fidelity quantum-mechanical method that is usually too expensive for routine generative design.
TANGO: guiding AI toward synthesizable molecules
But efficiency alone is not enough. In the second study, the researchers introduced TANGO, short for Tanimoto Group Overlap, a reward function designed to help GenAI models propose molecules that are not only promising, but also predicted to be synthesizable using specific building blocks.
This is a harder problem than simply asking whether a molecule can be made. In real chemistry, researchers may want to use particular starting materials, repurpose existing intermediates, work with available reagents, or design families of molecules from a shared chemical core.
“The key challenge was turning a yes-or-no question into a smooth signal the model can learn from,” says Guo. “If you just tell the model ‘this molecule uses the right building block’ or ‘it doesn’t,’ there is almost no useful feedback. TANGO instead measures how close the model is getting, so it can gradually steer towards the goal.”
The result is a continuous reward that guides the generative model during reinforcement learning, much like a compass pointing a hiker towards the summit even when the peak is still out of sight.
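The difference between a yes-or-no signal and a graded one can be shown with the Tanimoto coefficient, the similarity measure that gives TANGO the first part of its name. The sketch below is only an illustration of that contrast and does not reproduce the reward defined in the paper. The feature labels are hypothetical stand-ins for whatever structural descriptors a real system would compute, and `binary_reward` and `tanimoto` are names chosen here.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient: shared features over all features."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(a & b) / len(union)


def binary_reward(features, target):
    # All-or-nothing signal: 1.0 only on an exact match.
    return 1.0 if set(features) == set(target) else 0.0


# Hypothetical structural features of the desired building block.
target = {"aryl_bromide", "amide", "pyridine", "ether"}

# Candidates that get progressively closer to the target.
candidates = [
    {"ester"},
    {"amide", "ester"},
    {"amide", "pyridine", "ether"},
    {"aryl_bromide", "amide", "pyridine", "ether"},
]

for features in candidates:
    print(sorted(features), binary_reward(features, target), tanimoto(features, target))
```

The binary reward stays at 0.0 for the first three candidates and jumps to 1.0 only on the exact match, while the Tanimoto values rise through 0.0, 0.2, 0.75, and 1.0, giving reinforcement learning a gradient to follow.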
The framework handles three practical scenarios: constraining the starting materials of a synthesis, enforcing a specific intermediate along the route, and divergent synthesis, where a common non-commercial intermediate branches into multiple optimized products.
A step toward closed-loop discovery
Together, Saturn and TANGO move AI-driven chemistry closer to closed-loop discovery, where models propose molecules, automated platforms synthesize and test them, and the results feed back into the next round of design.
The work also shifts synthesizability from a post-processing filter to part of the design process itself. Rather than generating many molecules and discarding the impractical ones later, the model learns to steer toward molecules that satisfy both property goals and synthesis constraints.
“Generative models are very good at proposing molecules with optimal predicted properties, but there has been a disconnect between what a computer suggests and what a chemist can actually make,” says Philippe Schwaller. “By bringing synthesizability directly into the optimization loop, we’re narrowing that gap.”
Both frameworks are open source and built on the same codebase, making them readily accessible to researchers in drug discovery, materials science, and beyond.
Other contributors
New York University
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Swiss National Science Foundation (NCCR Catalysis)
References
Jeff Guo, Junwu Chen, Anthony GX-Chen, Philippe Schwaller. Sample-efficient generative molecular design using memory manipulation. Nature Machine Intelligence 8, 449–460 (2026). DOI: 10.1038/s42256-026-01200-4
Jeff Guo, Philippe Schwaller. TANGO: direct optimization of constrained synthesizability for generative molecular design. Nature Computational Science 6, 260–270 (2026). DOI: 10.1038/s43588-026-00959-1