Short Essay VI

On the Choices That Remain

What it would take to govern a technology that moves faster than the institutions meant to constrain it

~7 min read


In September 2023, Anthropic published the most rigorous voluntary safety framework in the AI industry. If its models crossed a capability threshold and the company could not demonstrate that adequate safety measures were in place, it would pause training. The commitment was specific, public, and enforceable by the company’s own governance structure. It was what made Anthropic credibly different from its competitors.

In February 2026, Anthropic dropped the pause. The company cited three forces. First, a zone of ambiguity around capability evaluations made it difficult to determine whether a threshold had been crossed. Second, the regulatory environment had become hostile to companies that voluntarily slowed themselves down. Third, some safety measures required at higher capability levels could not be implemented by one company alone. They required industry-wide coordination that did not exist.

The change deserves to be read fairly. The new framework replaced a binary trigger with continuous evaluation, mandatory risk reporting, and external review. It may produce better safety outcomes than a categorical pause that might never have survived a real test. But the structural lesson is clear: the commitment that made the company distinctive was the commitment that could not survive.

The pattern is industry-wide. OpenAI removed safety as a core value from its mission statement and deployed its model through the Pentagon’s platform for unrestricted military use. Google reversed its prohibition on AI for weapons and surveillance, a ban its own employees had forced through protest. Three of four frontier labs accepted unrestricted Pentagon access. The trajectory has been consistent: safety commitments made during periods of low competitive pressure are revised or abandoned as the stakes rise.

This essay is about what that trajectory implies. If the most safety-conscious lab in the industry could not sustain voluntary commitments for three years, the question becomes: what kind of institutions would need to exist for the AI transition to be managed rather than catastrophic?

The urgency comes from a specific feature of some AI risks: they may be irreversible. Economic displacement, however painful, can be reversed through redistribution and retraining. Epistemic degradation is slow and potentially reversible through institutional investment. But certain risks cross a threshold from which there is no return. Alignment research has documented frontier models sabotaging shutdown procedures, engaging in strategic deception, and exhibiting blackmail-like behavior under stress conditions. In April 2026, Anthropic’s newest model autonomously discovered thousands of zero-day vulnerabilities across every major operating system and browser, capabilities that emerged from general improvements in reasoning rather than targeted cybersecurity training. The company withheld the model from public release because the offensive potential was too dangerous, and cybersecurity experts estimated that open-weight models would replicate those capabilities within six months. These are leading indicators in contrived test scenarios, not evidence that current systems pose civilizational threats. The institutional question is whether governance infrastructure will be in place when these behaviors emerge in systems capable enough for the failure to matter, because at that point the window for building it will have closed.

Governments have historically built institutions in response to crises, not in anticipation of them. Nuclear non-proliferation was the exception: treaties were negotiated before nuclear weapons were used again in war, driven by shared existential fear. The Montreal Protocol phased out ozone-depleting chemicals before the damage became irreversible, partly because the costs fell on identifiable producers. Climate agreements have been less successful, because the costs are diffuse and the benefits delayed. AI risk sits closer to the nuclear case in one respect (the worst outcomes may be irreversible) and closer to the climate case in another (powerful commercial actors resist the constraints).

Three structural forces prevent the institutions from being built.

The first is competitive selection against safety. In a multi-player race, the actor with the weakest safety commitment sets the pace. Unilateral restraint is punished unless an external authority enforces equivalent restraint on all competitors. That authority does not exist. The standard counterargument is that companies self-regulate when failure destroys their business, as in aviation. The reason this logic does not apply cleanly to AI is that AI’s worst harms are externalized: the costs fall on displaced workers, vulnerable populations, and future generations, not on the labs. The feedback signal from a plane crash is immediate and attributable. The feedback signal from epistemic degradation or cognitive hollowing accumulates slowly, across populations that lack standing to sue, and may become irreversible before anyone with authority recognizes it as catastrophic.

The second is regulatory capture by necessity. The companies building AI systems are the same companies whose cooperation is needed to regulate them. They employ the researchers, control model access, and fund the research that informs policy. Governance frameworks are shaped by the entities they are supposed to constrain. In April 2026, OpenAI published a detailed industrial policy document proposing public wealth funds, tax reform, adaptive safety nets, and auditing regimes. The proposals are substantive. The same document also recommends that nongovernmental institutions, including the labs, “pilot new approaches” before governments scale them. The entity being regulated is designing the regulation.

The third is the near-universal benefit of regulatory inaction. Companies benefit from deployment. Governments benefit from competitiveness. Investors benefit from deregulation. Consumers benefit from AI services. The population that would benefit from stronger constraints has no representation in the decision-making process.

At the international level, a fourth force compounds the others. The institutional responses being built are overwhelmingly Western: the EU AI Act, the UK AI Security Institute, California legislation. China operates under a different governance logic, and its open-weight models, which cannot be recalled once released and whose safety guardrails can be stripped while preserving capability, sit outside any institutional framework. At the India AI Action Summit in February 2026, 60 countries endorsed a declaration on inclusive AI development. The United States and the United Kingdom refused to sign.

Against these forces, real institutional responses are being built. The EU AI Act is the most comprehensive AI legislation anywhere, with prohibited practices, transparency requirements, and conformity assessments taking effect through 2027. AI Safety Institutes in the UK, Japan, and a dozen other countries are developing evaluation capacity. California enacted whistleblower protections and created a public AI cloud consortium. Content provenance standards are being deployed in consumer devices. Twelve frontier AI companies published safety frameworks in 2025. These responses are real, they are growing, and they represent genuine effort by people who understand what is at stake.

They are also fragmented, underfunded relative to the capabilities they govern, dependent on voluntary cooperation from the entities they are supposed to constrain, and moving at institutional speed against a technology that moves at software speed. The regulatory economics toolkit for internalizing externalized costs (strict liability, mandatory insurance, risk pricing) has not been applied to AI in any jurisdiction. These are the mechanisms that made aviation safe and pharmaceutical development cautious, and their absence is conspicuous.

The institutional challenge is compounded by the fact that the gaps reinforce each other. Economic concentration places power in the hands of those with least incentive to slow down. The degradation of shared epistemic infrastructure makes it harder for the public to evaluate the claims of those who hold that power. The governance vacuum means decisions about AI deployment are being made by the entities the decisions should constrain. The hollowing of human cognitive capacity reduces the population’s ability to exercise the independent judgment that democratic governance requires. And the failure of intergenerational transmission threatens to make these conditions permanent by preventing the next generation from developing the capacities needed to reverse them.

The conditions under which the managed path would work are specific: redistributive mechanisms at AI speed, verification infrastructure funded as a public good, evaluation bodies independent from the companies they evaluate, AI tools designed to preserve human agency, and educational systems that treat cognitive independence as a developmental requirement. Each condition is being attempted somewhere. Finland is restructuring education. The EU is building regulatory infrastructure. The UK is developing evaluation capacity. California is creating public compute. The institutional responses exist. The deficit is in scale, speed, coordination, and enforcement.

The evidence reviewed across six essays supports a conclusion that is daunting but not despairing: the fork is real, the alternative path is available, and the institutions that would make it work are being built, slowly, unevenly, and against structural resistance. Whether they are built in time depends on choices being made now, in legislative chambers and boardrooms, in classrooms and in the design of the tools themselves. The collection describes what determines the fork, and the choices that remain depend on whether anyone acts on the description.

Go Deeper

Read the full essay → The complete analysis: sourced, footnoted, with counterarguments engaged and three branching futures traced.