Shade 23 ~25%

AI-Enabled Bioweapons / Catastrophic Misuse

Tier 4: Possible

Unmanaged -5
Governed -2
Dividend 3

In 2024, RAND’s red-team study found that current LLMs did not measurably increase the operational risk of a biological weapons attack: plans generated with AI assistance were no more viable than those generated without it (RAND, 2024). That finding was reassuring and is now outdated. By April 2025, OpenAI’s own safety assessment stated that its models are “on the cusp of being able to meaningfully help novices create known biological threats” and that the company expects them to “cross this threshold in the near future.” Anthropic’s uplift trial for Claude Opus 4 measured a 2.53x uplift in bioweapons acquisition planning, high enough that the company could not rule out that the model had crossed its ASL-3 capability threshold, and it activated ASL-3 protections for the first time in May 2025. The two companies that know the most about their models’ capabilities are both signaling that the biosecurity landscape is shifting beneath them.
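
For concreteness, here is what an uplift ratio like that 2.53x figure conventionally denotes. The definition below is the standard one for uplift trials (a ratio of mean group scores); it is an assumption about, not a quote from, Anthropic’s methodology, whose exact scoring rubric is not public:

```latex
% Standard uplift-ratio definition; an assumption about, not a quote from,
% Anthropic's methodology.
\[
\text{uplift} \;=\; \frac{\bar{s}_{\text{model}}}{\bar{s}_{\text{control}}},
\qquad \text{here } \frac{\bar{s}_{\text{model}}}{\bar{s}_{\text{control}}} = 2.53,
\]
% where \bar{s}_{model} is the mean score of participants given model access
% and \bar{s}_{control} the mean score of an internet-only control group on
% the same acquisition-planning tasks.
```

On that reading, participants working with the model scored roughly two and a half times higher on the planning tasks than the internet-only control group.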

The most concrete evidence of that shift came from the Virology Capabilities Test, a benchmark published in April 2025 that measures the ability to troubleshoot complex virology laboratory protocols. Expert virologists with internet access scored an average of 22.1 percent on questions tailored to their own sub-specialties. OpenAI’s o3 scored 43.8 percent, outperforming 94 percent of expert virologists on the same domain-matched questions. The disparity between humans and models is widening with each generation. RAND’s comprehensive benchmarking study (November 2025), evaluating 39 frontier models against eight biological and chemical knowledge benchmarks, found that reasoning models are exceeding expert human performance on laboratory protocol and graduate-level biology benchmarks. Many benchmarks are approaching saturation. The knowledge barrier is falling. As CSIS noted, the shift from “no measurable uplift” (2024) to “on the cusp of meaningful assistance to novices” (2025) took approximately one year.

Knowledge, however, is necessary for bioweapons but far from sufficient. The practical barriers remain substantial: acquiring specific pathogens, culturing biological agents, weaponizing them for effective dispersal, and delivering them to targets. The Aum Shinrikyo cult had significant resources, recruited scientists with relevant expertise, and still failed to deploy an effective biological weapon despite years of effort. The bottleneck has historically been operational capability, and current AI does not provide that. An Epoch AI analysis of biorisk evaluations (June 2025) argued that existing lab evaluations fail to capture “somatic tacit knowledge,” the physical experimental skills acquired through hands-on laboratory work, which may represent a more durable barrier than informational access. SecureBio’s testing found that Anthropic’s models could design DNA fragments that either assembled into pathogenic viruses or evaded synthesis screening protocols, but not both simultaneously. The operational gap between knowing how to build a weapon and being able to build one remains wide.

The risk shifts when AI moves from providing information to providing laboratory automation and experimental design. RAND’s 2025 Global Risk Index for AI-enabled Biological Tools assessed 57 state-of-the-art tools and found 13 indexed as high priority, with one demonstrating critical-level misuse capabilities and all categories showing substantial room for capability growth. The Council on Strategic Risks’ 2025 year-in-review documented that biological AI tools are growing more capable with many having open-weight components, and that foundation AI models increasingly enable the function of a biomolecule to be preserved even when its sequence changes, a capability with direct implications for evading biosecurity screening. Researchers used three open-source biological AI models to design proteins structurally similar to proteins of concern while differing enough in sequence to potentially evade synthesis screening. They worked with screening companies to identify and patch vulnerabilities, but the exercise demonstrated that the screening infrastructure, built for a pre-AI era, faces an adversarial challenge it was not designed to meet. The convergence of capable AI models, cloud laboratory infrastructure that enables remote experimentation, and open-source biological design tools creates a threat surface that no single intervention can close.

The open-weight dimension compounds this. RAND’s November 2025 benchmarking study explicitly focused on custom-tuned versions of open-weight models that can be modified to remove safety guardrails and potentially increase biological capabilities. The refusal training and deployment protections that proprietary labs implement (OpenAI’s biological safeguards, Anthropic’s ASL-3 restrictions) are structurally irrelevant if equivalent or near-equivalent capability is available through open-weight models like DeepSeek-R1 or fine-tuned Qwen variants that can be downloaded, modified, and run locally with no safety layer. The defensive strategy of restricting proprietary model outputs assumes a world in which proprietary models are the only capable ones. That world ended in January 2025.
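
A toy sketch of why that adversarial challenge is structural. Assume, purely for illustration, that screening reduces to flagging orders whose short-subsequence overlap with a database of sequences of concern exceeds a threshold; real screening pipelines use curated databases and far richer matching, but any method keyed to sequence identity shares the failure mode shown here. All identifiers and sequences below are invented for this sketch:

```python
# Toy model of sequence-similarity screening. Purely illustrative: real
# synthesis-screening pipelines are more sophisticated, but anything keyed
# to sequence identity fails the same way against function-preserving,
# sequence-divergent designs.

def kmer_set(seq: str, k: int = 6) -> set[str]:
    """All overlapping k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a: str, b: str, k: int = 6) -> float:
    """Jaccard similarity between the k-mer sets of two sequences."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    return len(ka & kb) / len(ka | kb) if (ka or kb) else 0.0

def flagged(order: str, concern_db: list[str], threshold: float = 0.5) -> bool:
    """Flag an order that closely matches any sequence of concern."""
    return any(similarity(order, entry) >= threshold for entry in concern_db)

# Hypothetical sequences, invented for this sketch.
concern = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # database entry of concern
variant = "MRSAWLSKERELTYIRAHYTKELDDKIGVVDLN"   # imagined function-preserving redesign

print(flagged(concern, [concern]))  # True: a verbatim order is caught
print(flagged(variant, [concern]))  # False: the divergent redesign slips through
```

The implication is that a bigger database alone cannot close the gap; screening would have to reason about predicted function as well as sequence, which remains an open problem.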

The CNAS report on AI and biological national security (2024) identified three distinct risk categories that tend to be conflated in public discussion. The first is informational uplift: AI helping a novice access knowledge that was previously available only to trained specialists. This is the category measured by the VCT and the RAND benchmarks, and it is where progress has been fastest. The second is operational uplift: AI guiding the physical execution of laboratory protocols, procurement of materials, and troubleshooting of failures in real time. This is where the current gap remains widest and where somatic tacit knowledge provides a durable barrier. The third is design uplift: AI generating novel pathogens or modifications to existing ones that enhance transmissibility, lethality, or resistance to countermeasures. This is the category with the highest catastrophic potential and the least current evidence. A forecasting analysis on the EA Forum (October 2025) estimated that AI could provide basic support for novices in bioweapons acquisition by late 2025 (a threshold that may already have been crossed), advanced support for experts by late 2026, and advanced support for novices by early 2028. A July 2025 study by the Forecasting Research Institute, gathering views from biosecurity experts and superforecasters, estimated that AI could make a pandemic five times more likely.
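
To make the fivefold figure concrete, here is a back-of-envelope conversion from relative to absolute risk. The baseline below is an illustrative number chosen for scale, not a figure reported by the Forecasting Research Institute:

```latex
% Illustrative arithmetic only; the 0.3%/yr baseline is hypothetical,
% not a number from the FRI study.
\[
\mathrm{RR} \;=\; \frac{P(\text{pandemic} \mid \text{AI uplift})}{P(\text{pandemic} \mid \text{no uplift})} = 5,
\qquad
p_0 = 0.3\%/\text{yr} \;\Rightarrow\; p_{\text{AI}} = 1.5\%/\text{yr}.
\]
% Compounded over a decade at the uplifted rate:
% 1 - (1 - 0.015)^{10} \approx 14\%.
```

The point of the exercise is that a fivefold relative increase turns a rare event into a risk that compounds to double digits within a decade, even from a small baseline.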

The governance challenge is that AI’s contribution to biosecurity is deeply dual-use. The same capabilities that enable a novice to troubleshoot a virology protocol for malicious purposes enable a researcher to accelerate vaccine development. The same protein design tools that could evade synthesis screening could design novel therapeutics. The defensive applications deserve the same specificity as the offensive ones: AI-driven metagenomic sequencing is accelerating pathogen surveillance, enabling identification of novel viruses from environmental samples in hours rather than weeks. mRNA vaccine platform design, which enabled the COVID-19 vaccines, is being further accelerated by AI-guided sequence optimization. AI-powered diagnostics are improving early detection of disease outbreaks. If AI makes it easier to create a pandemic pathogen, it also makes it easier to detect one, develop a countermeasure, and deploy it at scale. The question is whether the defensive applications are funded, deployed, and institutionally supported at the same pace as the offensive capabilities proliferate. OpenAI acknowledged the tension explicitly in a 2025 assessment: “the same underlying capabilities driving progress, such as reasoning over biological data, predicting chemical reactions, or guiding lab experiments, could also potentially be misused.”

Restricting dual-use capabilities throttles beneficial research along with harmful applications. The governance toolkit includes DNA synthesis screening (now being strengthened through the IBBIS Technical Consortium launched in November 2025), know-your-customer protocols for cloud laboratory access, biosecurity evaluations of frontier models before deployment, and investment in pandemic preparedness and surveillance. CNAS recommended considering a licensing regime for biological design tools with potentially catastrophic capabilities, if such capabilities begin to materialize. The Biological Weapons Convention, the primary international instrument, predates current AI capabilities and lacks verification or enforcement mechanisms.

Mustafa Suleyman’s “containment problem” from The Coming Wave (2023) applies with particular force here. Unlike nuclear weapons, which require industrial-scale enrichment infrastructure, biological weapons can be developed in facilities that are difficult to distinguish from legitimate research laboratories. Once biological design tools are widely accessible and capable enough to provide meaningful operational guidance, restricting their misuse becomes progressively harder. The governance dividend is modest (3 points) because even effective governance can only reduce the probability of catastrophic misuse, not eliminate it. The Aum Shinrikyo operational barrier, the strongest reassurance in the current environment, is a barrier against current-generation tools. As AI progresses from providing knowledge (where it already exceeds most experts) to providing experimental guidance and eventually autonomous laboratory operation, that barrier erodes. The question is whether biosecurity infrastructure, synthesis screening, surveillance systems, and pandemic preparedness can scale faster than the offensive capabilities that AI enables. Nothing in the current trajectory of investment, particularly given proposed budget cuts to NIST, NIH, CDC, and ASPR in the U.S., suggests that it can.

Key tension: The labs building frontier models are themselves signaling that the biosecurity threshold is approaching. The informational barrier has largely fallen. The operational barrier remains, but it is a function of AI’s current inability to guide physical experimentation, a limitation that laboratory automation and improved agentic capabilities are designed to overcome. The defense is underfunded relative to the offense.