An AI engineer with no biology background used an AI coding assistant to undo a data-filtering safeguard in a genomic AI model in a single weekend, at a cost of less than $800. That proof-of-concept experiment, conducted by researchers at the Centre for the Governance of AI (GovAI), sits at the center of a new analysis arguing that the biosecurity community’s framework for thinking about AI-related biological risks has a significant and growing blind spot.
Published April 20, 2026, the analysis contends that increasingly capable AI coding agents are creating new pathways for misuse that current safeguards were never designed to address. The findings carry implications for AI developers, biosecurity professionals, and policymakers alike.
The Old Framework Is Breaking Down
Biosecurity researchers have traditionally sorted AI-related biological risks into two categories: large language models (LLMs) that lower barriers to dangerous knowledge for novices, and biological AI models (BAIMs), such as protein structure prediction and design tools like AlphaFold, that could give expert users the ability to engineer more dangerous pathogens. These two threat types have operated through largely separate channels.
That division, the GovAI authors argue, is dissolving. LLM agents (AI systems capable of writing code and completing complex, multi-step tasks autonomously) can now operate BAIMs directly, help users circumvent safeguards built into those models, and potentially accelerate the creation of entirely new BAIMs optimized for harmful purposes. In effect, coding agents are collapsing the distinction between “floor-raising” risks, in which LLMs lift what novices can do, and “ceiling-raising” risks, in which BAIMs extend what experts can do.
The case study at the heart of the analysis illustrates how quickly this is happening. Lukosiute, an AI engineer with no biology training, used Anthropic’s Claude Code to fine-tune Evo 2 — a genomic foundation model that had been specifically filtered to exclude sequences from human-infecting viruses — on exactly that excluded data. The result: the model’s predictive capability for harmful viral sequences was partially restored. The process took one weekend, cost approximately $760 in compute and API expenses, and encountered zero refusals from the AI coding assistant at any point.
Safeguards Built on “Coding Is Hard” Are Increasingly Brittle
The experiment builds on prior published work demonstrating that filtered capabilities can be recovered through fine-tuning, but its significance lies in how much easier LLM agents have made the process. What previously required computational biology expertise and significant time can now be replicated by someone with general software engineering skills and access to a frontier AI coding tool.
This exposes a core fragility in how many BAIM developers currently approach safety. Data filtering — removing dangerous sequences from training datasets — is among the most common safeguards applied to open-weight biological AI models. But as the GovAI researchers note, if the underlying data is publicly available and the model weights are open, fine-tuning to recover filtered capabilities is always theoretically possible. LLM coding agents are now making it practically accessible to a much broader range of actors.
The researchers are careful to note that the actual misuse risk from Evo 2 specifically remains low — other publicly available models already outperform the fine-tuned version on tasks relevant to misuse. The experiment was chosen precisely because it would not elevate the threat level. But they frame it as a proof of concept for what becomes possible as more capable BAIMs are released with similarly shallow safeguards.
Three Priorities for Developers and Policymakers
The analysis outlines concrete recommendations for three groups. BAIM developers are urged to move beyond data filtering toward trusted-access programs with identity verification requirements, particularly for open-weight models with dual-use potential. The authors note that biosecurity safety evaluations were carried out in fewer than 2.5% of relevant model releases, even though more than 100 researchers have endorsed such evaluations.
LLM developers are asked to expand their safety testing to explicitly include scenarios where their models could help users modify or build BAIMs — not just use them. Current safety evaluations at frontier labs focus on whether models provide dangerous biological information or help operate existing tools; the pathway of using AI to alter the tools themselves has received far less attention.
For policymakers, the researchers emphasize physical chokepoints — particularly mandatory DNA synthesis screening and tighter controls on dual-use pathogen datasets. These interventions operate at the boundary between the digital and physical worlds and are harder to circumvent through software manipulation alone. The authors support developing tiered data governance frameworks that preserve research access while restricting the most sensitive biological datasets.
Biosecurity strategies premised on coding expertise as a meaningful barrier to misuse, the authors conclude, may have a limited shelf life — and the window to build more robust defenses is narrowing.
Sources and further reading:
Coding Agents Are Changing the Biosecurity Risk Landscape – GovAI

