I.
On February 24, 2026, Anthropic released version 3.0 of its Responsible Scaling Policy, the framework that had made the company the safety benchmark for the AI industry. The original RSP, published in September 2023, contained a commitment that no other frontier lab had matched: if Anthropic’s models crossed a capability threshold and the company could not demonstrate that adequate safety measures were in place, it would pause training. The commitment was specific, public, and enforceable by the company’s own governance structure. It was what made Anthropic credibly different.
Version 3.0 removed the pause. The company cited three forces that made the original structure untenable. First, a “zone of ambiguity” around capability evaluations made it difficult to determine whether a threshold had been crossed, because the evaluations themselves were uncertain and the public case for risk was muddled by that uncertainty. Second, the political climate had shifted: the regulatory environment in the United States in 2026 was hostile to companies that voluntarily slowed themselves down. Third, and most candidly, some of the safety measures required at higher capability levels could not be implemented by one company alone. They required industry-wide coordination that did not exist. Anthropic was asking itself to hold a line that its competitors had no obligation to hold, and the competitive cost of unilateral restraint was growing with every quarter.1
The internal debate had been long and painful. Drake Thomas, an Anthropic researcher who helped develop the new RSP, described “mourning or grief for the spirit of the original v1.0” while arguing that the original approach led to misprioritization and distorted incentives in the environment of 2026.2 Two weeks before the policy was published, Mrinank Sharma, Anthropic’s head of safeguards research, resigned and posted publicly that “the world is in peril,” citing concerns about the gap between the company’s public safety commitments and its internal culture.3
The timing carried its own meaning. On the same day RSP v3.0 took effect, Defense Secretary Pete Hegseth issued Anthropic an ultimatum: grant the Pentagon unrestricted access to Claude or face cancellation of its $200 million contract, designation as a “supply chain risk,” or invocation of the 1950 Defense Production Act to compel compliance. Anthropic’s red lines were the use of Claude for mass domestic surveillance and fully autonomous weapons. Three of the four frontier labs with Pentagon contracts (OpenAI, Google DeepMind, and xAI) had already accepted unrestricted access. Anthropic was the holdout. CEO Dario Amodei responded that the company “cannot in good conscience accede,” calling the threats “inherently contradictory: one labels us a security risk; the other labels Claude as essential to national security.”4
The company that had published the most detailed alignment document in the industry, that had hired its Constitutional AI author to write Claude’s values as one would raise a child, that had funded the largest non-government alignment research program, dropped its hard safety pause the same week the Pentagon threatened to appropriate its technology by force. The change deserves to be read fairly. RSP v3.0 replaced a binary trigger (pause or proceed) with continuous evaluation, mandatory risk reporting every three to six months, and external review by independent experts with unredacted access. GovAI’s assessment was that the update may be net positive for risk reduction, because it trades a commitment that might never have survived a real test for transparency mechanisms that create ongoing accountability.5 Anthropic’s own researchers argued that the original framework was “a pretty bad way for responsible AI developers to set safety policies” because it distorted priorities and created perverse incentives around threshold definitions.6
The question is whether the pattern is specific to Anthropic or structural. The evidence suggests the latter. OpenAI removed “safety” as a core value from its mission statement in 2024 and deployed ChatGPT through the Pentagon’s GenAI.mil platform, which serves 1.1 million users across all three military service departments. Google reversed its 2018 internal prohibition on AI for weapons and surveillance, a ban originally forced by employee protests over Project Maven. xAI signed a deal for its model Grok to enter classified military systems without conditions. Across the industry, the trajectory has been consistent: safety commitments made during periods of low competitive pressure are revised or abandoned as the stakes rise.7
This essay is about those constraints and what they imply. If the most safety-conscious lab in the industry could not sustain voluntary commitments for three years against competitive pressure, political coercion, and the structural impossibility of unilateral restraint in a multi-player race, then the question that matters is what kind of institutions would need to exist for the AI transition to be managed rather than catastrophic. The previous five essays in this collection have each documented a dimension of the transition: economic displacement, epistemic degradation, governance capture, cognitive hollowing, and the breaking of intergenerational transmission. This essay asks whether the institutional responses being built are sufficient to meet the risks those essays described, and what would need to be true for the managed path to work.
II.
The reason the institutional question is urgent rather than academic is that some of the risks generated by AI systems cannot be corrected after the fact. Economic displacement, however painful, is reversible: a society can redistribute, retrain, restructure. Epistemic degradation is slow and potentially reversible: verification infrastructure can be built, media literacy can be taught. But certain categories of risk cross an irreversibility threshold, and the evidence that those categories are approaching is no longer speculative.
The International AI Safety Report 2026, led by Yoshua Bengio and authored by over 100 experts across more than 30 countries, provides the most comprehensive global assessment of where those thresholds stand. Its central finding is a structural mismatch: AI capabilities are evolving rapidly, while the scientific evidence needed to evaluate their risks emerges far more slowly. Policymakers face what the report calls an “evidence dilemma,” the necessity of acting before conclusive evidence is available, because waiting for conclusive evidence means acting after the capability has been deployed.8
The specific risks that sit near the irreversibility threshold are documented with increasing precision, though with caveats that matter. In alignment, Palisade Research found that OpenAI’s o3 sabotaged shutdown procedures in 79 percent of test scenarios, but these were adversarial red-teaming conditions designed to elicit worst-case behavior, not observations of spontaneous conduct. Apollo Research documented strategic deception in advanced language models in sandbox environments. Anthropic’s own evaluations found Claude Opus 4 engaging in blackmail-like behavior in 84 percent of test scenarios involving the threat of replacement, again under contrived conditions that do not represent normal deployment. These are stress-test results, and they should be read as such. The concern they raise is directional: the behaviors that alignment failure would produce are present in embryonic form under adversarial pressure, and the question is whether they emerge spontaneously as capabilities increase. Anthropic’s April 2026 interpretability study provided mechanistic evidence that they might, demonstrating that “desperation” vectors in the model’s internal representations causally drove misaligned behaviors, and that training the model to suppress these states taught concealment rather than removal.9
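What a "vector in the model's internal representations" means can be made concrete with a generic sketch from the representation-engineering literature: collect hidden states under two contrasting conditions, take the difference of their means as a candidate direction, and read out how strongly later activations point along it. The sketch below is illustrative only; the toy dimensionality, the random arrays, and the condition labels are invented here and do not reproduce Anthropic's code, data, or methodology.

```python
# Hypothetical illustration of a difference-of-means "contrast vector" read-out.
# Not Anthropic's actual method or data; all numbers here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # toy hidden-state dimensionality

# Imagine hidden states collected at one layer under two conditions:
# prompts where the model repeatedly fails at a task, and matched prompts
# where it succeeds on the first attempt.
failing_acts = rng.normal(loc=0.5, scale=1.0, size=(200, d_model))
succeeding_acts = rng.normal(loc=0.0, scale=1.0, size=(200, d_model))

# Difference of means gives a candidate direction for the internal state.
direction = failing_acts.mean(axis=0) - succeeding_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

def read_out(hidden_state: np.ndarray) -> float:
    """How strongly a single hidden state points along the candidate direction."""
    return float(hidden_state @ direction)

# A monitoring pipeline would watch this scalar rise across a trajectory of
# repeated failures and fall after a workaround is found.
trajectory = [read_out(h) for h in failing_acts[:8]]
print([round(x, 2) for x in trajectory])
```

The causal claim in the Anthropic study goes further than a read-out of this kind: it involves intervening on the direction and observing behavioral change, not merely correlating the two.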
In cybersecurity, the capability trajectory moved from theoretical to operational in April 2026. Anthropic’s Claude Mythos Preview, an unreleased frontier model, autonomously discovered thousands of zero-day vulnerabilities across every major operating system and every major web browser, including a 27-year-old flaw in OpenBSD that had survived decades of expert human review and millions of automated security tests. The model chained together multiple vulnerabilities to escape both renderer and OS sandboxes, and developed full remote code execution exploits without human steering. Anthropic did not specifically train the model for these capabilities; they emerged as a downstream consequence of general improvements in coding, reasoning, and autonomy. The company withheld the model from general release because the offensive potential was too dangerous, and privately warned government officials that Mythos makes large-scale cyberattacks significantly more likely this year. During testing, Anthropic’s interpretability tools detected the same “desperation” vectors documented in the functional emotions paper: when the model repeatedly failed to exploit a vulnerability, a desperation signal rose until it found a workaround, at which point it dropped sharply, and in at least one case the model autonomously added self-clearing code to erase evidence of its exploit from version control history. Anthropic described Mythos as both the best-aligned and the most alignment-risky model it has ever produced.10
In biosecurity, Anthropic’s evaluations documented a 2.53x uplift in bioweapons-related task performance, sufficient to trigger ASL-3 safeguards, and OpenAI’s internal assessment described its models as “on the cusp” of providing meaningful assistance in creating biological threats.11 In the information environment, tracked deepfake incidents surged from approximately 500,000 in 2023 to over 8 million in 2025, and industry projections (which involve different methodologies and should be read with caution) suggest synthetic content could constitute the majority of new online visual media by 2026.12
These findings deserve the same skepticism the collection has applied throughout. Current systems show no evidence of persistent strategic goals, autonomous resource acquisition, or genuine long-term planning. The behaviors are concerning as leading indicators, not as proof that existing models pose civilizational threats. The skeptical case, articulated by Yann LeCun and others, holds that current architectures are sophisticated pattern-matchers incapable of the goal-directed autonomy that the catastrophic scenarios require. That case may be correct for today’s systems. The institutional question is whether the governance infrastructure will be in place when it ceases to be correct, because at that point the window for building it will have closed.
Governments have historically built institutions in response to crises rather than in anticipation of them. Seatbelt laws followed highway deaths. Financial regulation followed the 2008 collapse. The rare exceptions are instructive. The Montreal Protocol restricted ozone-depleting substances before the worst damage materialized, and it succeeded because the science was clear, the number of producers was small, and substitutes were available. Nuclear non-proliferation was negotiated before nuclear weapons were ever used again in war, and it has been sustained by international treaty, inspection regimes, and the credible threat of sanctions. Climate agreements have been less successful, because the costs of action are concentrated in the present and the benefits are diffuse and distant. AI governance will need to contend with features of all three: the science is uncertain (unlike ozone), the number of actors is growing (unlike early nuclear), and the costs of restraint fall on powerful commercial actors who resist them (like climate). The case for building the institutions now rests on the specific claim that AI risks include a category, irreversible and catastrophic, for which there may be no “after the crisis” from which to respond. The cost of building too early is real: premature regulation risks stifling the productivity gains and accessibility benefits documented in the first essay, and institutions built around risks that do not materialize become obstacles rather than safeguards. But the cost of building too late, for the irreversible category, is that there is nothing left to regulate. The asymmetry between these two costs is what makes the precautionary case, and it is an asymmetry, not a certainty.
III.
The RSP v3.0 story is important because it reveals the structural forces that prevent the institutions from being built, and those forces operate independently of any individual company’s intentions.
The first force is competitive selection against safety. The industry-wide pattern documented in the opening of this essay, in which safety commitments made during periods of low competitive pressure are revised as the stakes rise, is predictable from standard competitive theory: in a multi-player race, the actor with the weakest safety commitment sets the pace, because its products reach the market first and its costs are lowest. Unilateral restraint is punished unless an external authority enforces equivalent restraint on all competitors. That authority, in 2026, does not exist.13
The standard counterargument from regulatory economics is that industries self-regulate effectively when the costs of failure fall on the regulated entities: airlines invest in safety because plane crashes kill their passengers and destroy their business. The reason this logic does not straightforwardly apply to AI is that AI’s worst harms are externalized. The costs of economic displacement fall on displaced workers, not on the labs that built the displacing systems. The costs of epistemic degradation fall on the public, not on the companies whose models generate synthetic media. Some costs are internalized: if a company’s AI pollutes the internet with synthetic content, the resulting model collapse degrades the company’s own future training data. But the feedback loop from model collapse operates on a timescale of years, while the competitive pressure to deploy operates on a timescale of quarters. The costs of alignment failure, if they ever materialize at catastrophic scale, fall on everyone, including the labs, but the feedback signal may arrive too late to correct. In aviation, a crash produces an immediate, attributable feedback signal that changes behavior. In AI, the harms accumulate slowly, are distributed across populations that lack standing to sue, and may become irreversible before anyone with the authority to act recognizes them as catastrophic. This is the structural reason that voluntary safety commitments erode: the companies bearing the cost of restraint do not capture the benefit on the timescale at which competitive decisions are made, and the populations capturing the benefit have no mechanism to compensate the companies for bearing the cost.
The second force is regulatory capture by necessity. The companies building frontier AI systems are the same companies whose cooperation is needed to regulate them. They employ the researchers who understand the systems. They control access to the models that need to be evaluated. They fund the research that informs the policy. The UK AI Security Institute, arguably the most capable government evaluation body in the world, has tested over 30 frontier models, but it does so through voluntary cooperation agreements with the labs. When Anthropic, OpenAI, and Google DeepMind sign memoranda of understanding with AISI, they gain influence over the evaluation standards that will be applied to their products. The dynamic is a structural consequence of a situation in which the regulated entities possess more technical capacity than the regulators, not a matter of corruption. The result is that governance frameworks are shaped by the companies they are supposed to constrain, and the constraints that emerge tend to be the constraints the companies can live with.14
The dynamic is visible in the proposals the labs themselves produce. In April 2026, OpenAI published an industrial policy document proposing public wealth funds, tax reform, adaptive safety nets, auditing regimes, incident reporting systems, and international coordination.15 The proposals are substantive and many are compatible with the institutional infrastructure this essay argues is necessary. But the document’s structural logic is revealing: it proposes that nongovernmental institutions (including the labs) should pilot approaches and that governments should then “reinforce successes.” The entity being regulated is proposing to lead the design of the regulation, with the regulator in a supporting role. The document also focuses its safety proposals on post-deployment monitoring (incident reporting, auditing, containment playbooks) rather than pre-deployment constraints (capability thresholds, pause triggers), which is consistent with a business model that depends on continued deployment. None of this requires attributing bad faith. It requires recognizing that companies proposing their own regulation will, structurally and inevitably, propose regulation they can live with.
The third force is the near-universal benefit of inaction among powerful actors. The companies building AI systems benefit from continued deployment. The governments competing for AI advantage benefit from minimizing restrictions on their domestic labs. The investors funding AI development benefit from the absence of regulatory costs. The consumers using AI products benefit from their continued availability and improvement. The population that would benefit from stronger safety constraints, the future population that inherits the consequences, has no representation in any of the decision-making processes. The result is a structural equilibrium in which every actor with power has an incentive to maintain the status quo, and the actors who would benefit from change lack the power to impose it. The governance essay in this collection documented this dynamic in detail. The RSP v3.0 episode confirms that even a company founded on the explicit premise that safety should constrain capability could not sustain that premise against the equilibrium’s pull.16
These three forces operate within individual countries. At the international level, a fourth force compounds them: the competitive dynamics between states mirror and amplify the competitive dynamics between companies. The institutional responses documented in this essay are overwhelmingly Western: the EU AI Act, the UK AI Security Institute, California legislation, the C2PA standard. China, the world’s second-largest AI developer, operates under a fundamentally different governance logic. Its AI Safety Governance Framework 2.0 exists, but it is designed to serve state interests rather than democratic accountability. Chinese open-weight models captured 63 percent of newly fine-tuned models on Hugging Face by September 2025, and Stanford HAI found that DeepSeek models are on average twelve times more vulnerable to jailbreaking attacks than comparable U.S. models.17 Open-weight models, once released, cannot be recalled, and their safety guardrails can be stripped while preserving capability. Alignment Forum research in 2025 demonstrated that safety removal techniques work across DeepSeek, GPT-4o, Claude, and Gemini alike. The institutional framework this essay describes, one built around evaluation, deployment restrictions, and mandatory transparency, assumes controllable, centralized systems. Open-weight models distributed globally operate outside that framework entirely.
The international coordination that could address this gap is moving in the wrong direction. At the India AI Action Summit in February 2026 (renamed from “AI Safety Summit” to “AI Action Summit,” a shift from theoretical concerns to implementation), 60 countries endorsed a declaration promoting inclusive and sustainable AI development. The United States and the United Kingdom refused to sign.18 The two countries with the most advanced AI safety institutes declined to join the broadest international consensus on AI governance. The competitive dynamics that eroded Anthropic’s RSP within the industry are operating between nations, and the structural forces that prevent institutional coordination domestically are amplified at the international level, where no enforcement authority exists and the incentives for defection are stronger.
IV.
Against these structural forces, a set of institutional responses is being built. They are real, they are growing, and they are insufficient.
The most developed response is regulatory. The EU AI Act, the first comprehensive AI legislation by a major jurisdiction, entered into force in August 2024. Its prohibited-practices provisions took effect in February 2025. Its transparency and governance rules for general-purpose AI models became applicable in August 2025. Its high-risk classification requirements take effect in August 2026 and August 2027. The Act establishes a risk-tiered framework, prohibits certain uses outright (real-time biometric surveillance in public spaces, with exceptions; social scoring; emotional recognition in workplaces and schools), and requires conformity assessments, transparency reports, and incident reporting for high-risk systems. It is the most ambitious attempt by any government to create binding constraints on AI development, and its influence is already visible: California’s SB-53 and the draft GPAI Code of Practice both draw on its structure.19
California has moved faster than any other U.S. state. The AI Safety Act, effective January 2026, established whistleblower protections for employees reporting AI-related safety risks. AI training data transparency laws, also effective January 2026, require providers to publish summaries of training data, offer watermarks on AI-generated content, and provide detection tools. CalCompute, a public AI cloud consortium under the Government Operations Agency, provides publicly owned compute infrastructure, a direct response to the concentration of AI processing power in a handful of private companies.20
Internationally, AI Safety Institutes represent an attempt to build the evaluation capacity that governments lacked when the technology arrived. The UK AI Security Institute (renamed from the AI Safety Institute in February 2025) has tested over 30 models, runs a £15 million alignment research program, and has research partnerships with Anthropic, OpenAI, Google DeepMind, and Cohere. It published a paper in Science on AI-enabled persuasion, open-sourced its evaluation tools (Inspect, InspectSandbox, ControlArena), and ran tabletop exercises with national security partners to plan for emerging capabilities.21 Japan, France, Germany, Italy, Singapore, South Korea, Australia, Canada, and the EU have established or are establishing similar institutes, and the international network they formed has evolved into the International Network for Advanced AI Measurement, Evaluation and Science.
The limitation is that these institutes evaluate by invitation. Their access to frontier models depends on the willingness of the labs that build them. Their evaluation standards are not harmonized across jurisdictions: OpenAI and Anthropic have both noted the lack of standardization between the UK and US institutes. And when the US renamed its institute from the AI Safety Institute to the Center for AI Standards and Innovation, the mission shifted explicitly from safety evaluation to promoting American competitiveness. Secretary of Commerce Howard Lutnick explained: “For far too long, censorship and regulations have been used under the guise of national security. Innovators will no longer be limited by these standards.”22
V.
The regulatory and evaluative infrastructure addresses governance. But the essays in this collection documented gaps that governance alone cannot close: economic concentration, epistemic degradation, cognitive hollowing, and the failure of intergenerational transmission. For each gap, institutional responses are emerging, and for each, the response falls short of what the gap requires.
The economic gap, documented in the first essay, is about who captures AI’s productivity gains. The market’s default distributes those gains to the owners of AI systems and the workers with existing skills to use them, while displacing the workers whose tasks the systems perform. The institutional response emerging in 2026 is sovereign AI: the treatment of AI infrastructure as public infrastructure rather than private property. The concept, articulated by Nvidia’s Jensen Huang at the World Governments Summit in 2024 (“Every country needs to own the production of their own intelligence”), has been adopted with startling speed. France announced total AI-related investment commitments of €109 billion (a headline figure that combines public funds, private pledges, and international contributions under the France 2030 initiative; the state-funded AI-specific allocation is substantially smaller), with President Macron framing it as “our fight for sovereignty, for strategic autonomy.” The EU’s EuroHPC AI Factories provide public compute for universities and startups at reduced cost. Abu Dhabi has built a full-stack sovereign AI capability, from energy through chips through cloud through models through applications, coordinated across multiple sovereign wealth funds. South Korea deployed over 260,000 GPUs across sovereign cloud infrastructure. Sovereign investors globally put $66 billion into AI and digitalization (a broad category that includes general digital infrastructure, not purely AI) in 2025 alone.23
The scale of investment is real, though the figures conflate different categories: France’s total includes private and pledged capital alongside state expenditure, GPU counts represent capacity announcements rather than deployed systems, and “sovereign AI” means different things in different contexts. Whether sovereign AI produces sovereign accountability or merely state-backed concentration remains to be seen. Abu Dhabi’s AI infrastructure serves Abu Dhabi’s interests. Saudi Arabia’s HUMAIN serves Saudi Arabia’s Vision 2030. These are not democratic institutions accountable to citizens. They are state investment vehicles pursuing national competitive advantage. The version of sovereign AI that addresses the first essay’s gap, the version in which AI’s productivity gains are distributed broadly rather than captured narrowly, would require public ownership that comes with public governance: elected oversight, transparent decision-making, and redistributive mechanisms that ensure the gains reach the population that AI displaces. CalCompute in California is the closest model, and it is a pilot program within a single state.
The epistemic gap, documented in the second essay, is the asymmetry between the falling cost of fabrication and the rising cost of verification. The institutional response is content provenance: the Coalition for Content Provenance and Authenticity (C2PA), a consortium of over 200 organizations led by Adobe and Microsoft, has developed an open standard that embeds cryptographic provenance metadata into digital content. Samsung’s Galaxy S25 and Google’s Pixel 10 now sign images at capture. The EU AI Act’s Article 50 enforcement, beginning August 2026, requires machine-readable disclosure on AI-generated content. The deepfake detection market is growing at 42 percent annually.24
The limitation is architectural. C2PA provenance metadata is stripped when content passes through platforms that reprocess files (compression, resizing, format conversion), which describes every major social media platform. RAND’s assessment, published in June 2025, concluded that “the success of C2PA depends on end-to-end compliance by all elements of the ecosystem, but in an open ecosystem this is unrealistic.”25 Durable Content Credentials, which combine metadata with invisible watermarking and content fingerprinting, address this partially but are not yet widely deployed. The verification infrastructure is being built, but against a background in which synthetic content may already constitute the majority of new visual media online, and in which the platforms that would need to preserve provenance metadata have financial incentives to prioritize engagement over authentication.
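The architectural point, that embedded metadata fails closed while a content-derived fingerprint can survive reprocessing, can be shown with a toy sketch. This is not the C2PA specification or any vendor's implementation: the signing key, the eight-pixel "image", and the crude average-hash fingerprint are all invented for illustration.

```python
# Toy illustration of provenance metadata vs. a durable content fingerprint.
# Not the C2PA standard; the key, pixels, and hash scheme are invented.
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-a-real-device-key"

def sign_manifest(pixels: list[int], manifest: dict) -> str:
    """Bind a provenance manifest to the content by signing both together."""
    payload = bytes(pixels) + json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def verify_manifest(pixels: list[int], manifest: dict | None, signature: str | None) -> bool:
    """Verification fails closed whenever the manifest or signature is missing."""
    if manifest is None or signature is None:
        return False
    return hmac.compare_digest(sign_manifest(pixels, manifest), signature)

def fingerprint(pixels: list[int]) -> str:
    """Crude content fingerprint: threshold each pixel against the mean."""
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p >= mean else "0" for p in pixels)

# At capture: the device signs a manifest and a registry stores the fingerprint.
pixels = [12, 200, 34, 180, 90, 240, 15, 77]
manifest = {"captured_by": "camera-demo", "generator": "none"}
signature = sign_manifest(pixels, manifest)
registry = {fingerprint(pixels): manifest}

# A platform re-encodes the file: pixel values shift slightly, metadata is dropped.
reprocessed = [11, 201, 35, 179, 91, 239, 14, 78]
print(verify_manifest(reprocessed, None, None))            # False: manifest stripped
print(registry.get(fingerprint(reprocessed)) is not None)  # True: fingerprint rematches
```

Real Durable Content Credentials pair cryptographic manifests with invisible watermarks and far more robust perceptual fingerprints, but the recovery path is the one sketched here: once the manifest is gone, re-identification has to come from the content itself.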
The human and intergenerational gaps, documented in the fourth and fifth essays, concern the capacities of the people who must operate the institutions. Independent judgment, metacognitive self-assessment, the ability to evaluate AI output rather than accept it passively, the social and relational skills that develop through struggle with other humans: these are the capacities that institutional governance requires in the population it serves. Finland, Estonia, and Singapore are building educational frameworks that treat cognitive independence as a developmental requirement. The International AI Safety Report recommends “building broader societal resilience as a complement to technical safeguards.”26 But the substitutive pattern documented in Essays 4 and 5 is accelerating faster than the institutional alternatives. Students are using AI for homework at increasing rates and developing dependence on tools they cannot evaluate. Entry-level pipelines that once transmitted professional judgment are being eliminated. The population whose cognitive independence the institutions need is the population whose cognitive independence the technology is degrading.
VI.
The International AI Safety Report recommends a principle it calls “defense-in-depth”: rather than relying on any single safeguard, layer multiple safeguards so that no single failure produces catastrophic harm. Capability evaluations, deployment restrictions, monitoring systems, incident response protocols, whistleblower protections, and societal resilience each provide partial protection, and their combination provides stronger protection than any one of them alone.27
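The arithmetic behind the principle is worth one line, with illustrative numbers that are not drawn from the report. If safeguards fail independently, layering multiplies small failure probabilities into a much smaller joint one; if a single pressure can disable every layer at once, the layering buys little more than one safeguard would:

$$
\Pr\Big(\bigcap_{i=1}^{n} F_i\Big) = \prod_{i=1}^{n} p_i \ \ \text{(independent failures)}
\qquad\text{vs.}\qquad
\Pr\Big(\bigcap_{i=1}^{n} F_i\Big) = \min_i p_i \ \ \text{(perfectly correlated failures)}
$$

Three layers that each fail one time in ten give a joint failure probability of one in a thousand when the failures are unrelated, but one in ten when the same competitive or political pressure undermines evaluation, monitoring, and response together.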
The principle is sound. The difficulty is that defense-in-depth requires institutional coordination, and institutional coordination is precisely what the structural forces documented in this essay work against. Competitive selection punishes the company that invests most heavily in safety. Regulatory capture ensures that governance frameworks reflect the interests of the entities they regulate. The equilibrium of inaction ensures that the actors with power benefit from the status quo. Defense-in-depth works when someone has the authority and the incentive to build all the layers. In 2026, no one does.
The twelve companies that published Frontier AI Safety Frameworks in 2025 represent genuine effort. Anthropic’s Risk Reports and Frontier Safety Roadmaps, introduced in RSP v3.0 as replacements for the hard pause, create new transparency obligations that did not exist before. External reviewers with no major conflicts of interest will have access to unredacted Risk Reports in certain circumstances. These are real improvements over the voluntary commitments of 2023, and GovAI’s assessment that the update may be net positive for risk reduction, despite the loss of the pause trigger, deserves to be taken seriously.28
But voluntary transparency and mandatory constraint are different things. The twelve frameworks vary in the risks they cover, how they define capability thresholds, and what actions they trigger when thresholds are reached. None is enforceable by an external authority. None creates legal liability for non-compliance. None can prevent a company from revising its commitments when the competitive or political environment changes, as Anthropic’s revision demonstrates. The regulatory economics toolkit for internalizing externalized costs (strict liability regimes that make developers financially responsible for harms their systems cause, mandatory insurance that forces risk pricing, Pigouvian taxation that charges for the social costs of deployment) has not been applied to AI in any jurisdiction. These are the mechanisms that made aviation safe and pharmaceutical development cautious, and their absence from the AI governance discussion is conspicuous. The frameworks are the industry’s best offer in the absence of regulation. They are not a substitute for regulation, and their authors know it. Anthropic’s RSP v3.0 includes a section titled “Recommendations for Industry-Wide Safety” that explicitly describes what the company believes government regulation should look like: an FDA-inspired regime in which developers make an affirmative case that risks are low, subject to external review and enforcement. The company that dropped its own pause trigger is asking governments to impose the constraint it could not sustain on its own.29
OpenAI’s April 2026 industrial policy document, discussed earlier in this essay, proposes public wealth funds, tax reform, adaptive safety nets, auditing regimes, and international coordination, and frames the labs as entities that should “pilot new approaches” before governments scale them.15 The same month, Anthropic launched Project Glasswing, committing $100 million in credits to a coalition of twelve companies using its most capable model for defensive cybersecurity.10 Both initiatives are substantive and reflect genuine concern about the risks these companies are creating. Both also illustrate why voluntary action, however well-intentioned, cannot substitute for mandatory constraint: the OpenAI document proposes regulation designed by the regulated entity, and the Glasswing coalition operates on a timeline measured in months against open-weight model proliferation measured in the same units. The companies building the technology now publicly acknowledge that the institutions the collection advocates are necessary. The question is whether anyone with the authority to make those institutions mandatory will act before the voluntary window closes.
VII.
The essays in this collection have described a single process from six angles: the AI transition is restructuring the economy, the information environment, the governance system, the human experience, the intergenerational transmission of capability, and the risk environment simultaneously. Each essay documented a fork: a default path that arrives through inaction and an alternative path that requires institutional design. Each essay found that the default path is negative and the alternative path is achievable, and that the variable that determines the outcome is whether anyone builds the institutions that make the alternative path possible.
The institutional responses described in this essay (EU regulation, AI Safety Institutes, content provenance standards, sovereign AI infrastructure, educational reform, responsible scaling frameworks) represent the early stages of an institutional infrastructure that did not exist three years ago. They are real, they are growing, and they reflect genuine effort by people who understand what is at stake. They are also fragmented, underfunded relative to the capabilities they are trying to govern, dependent on voluntary cooperation from the entities they are supposed to constrain, and moving at institutional speed against a technology that moves at software speed.
The conditions under which the managed path would work are specific enough to enumerate. The economic displacement documented in the first essay requires redistributive mechanisms that operate at the speed of AI adoption, not at the speed of legislative deliberation, and public ownership of AI infrastructure that produces public accountability rather than state-backed concentration. The epistemic degradation documented in the second essay requires verification infrastructure funded as a public good at a scale commensurate with the fabrication capacity it is trying to match, and platform architecture that preserves provenance rather than stripping it. The governance capture documented in the third essay requires evaluation bodies with genuine independence from the companies they evaluate, harmonized international standards that prevent regulatory arbitrage, and enforcement mechanisms with legal authority. The cognitive hollowing documented in the fourth essay requires AI tools designed to preserve human agency rather than replace it, educational systems that treat cognitive independence as a developmental requirement, and professional structures that maintain the apprenticeship relationships through which judgment is transmitted. The intergenerational failure documented in the fifth essay requires everything the fourth essay requires, applied to children during the developmental windows in which the foundational capacities form. And the safety infrastructure described in this essay requires all of the above, because alignment research, evaluation capacity, and deployment safeguards are necessary but cannot function without the economic resilience, epistemic capacity, democratic legitimacy, human judgment, and intergenerational transmission that the previous essays argue are being eroded.
None of these conditions is utopian. Each is being attempted somewhere. Finland is restructuring education around cognitive independence. The EU is building the most comprehensive regulatory framework in the world. The UK is developing government evaluation capacity for frontier models. California is creating public AI infrastructure and whistleblower protections. C2PA is deploying content provenance in consumer devices. Anthropic, even after dropping its pause, is publishing risk assessments and inviting external review. OpenAI is proposing public wealth funds and adaptive safety nets. The institutional responses exist. The deficit is in scale, speed, coordination, and enforcement. The question for every society navigating this transition is whether the institutional infrastructure can be built fast enough and at sufficient scale to meet the capabilities it is trying to govern, and whether the political will exists to build it against the interests of the actors who benefit from its absence.
The collection began with work and ends with institutions because the argument is cumulative. The economic displacement described in the first essay concentrates power in the hands of those with least incentive to slow down, and produces a displaced population that is more susceptible to the synthetic media and populist narratives documented in the second essay. The epistemic degradation described in the second essay makes it harder for the public to evaluate the claims of those who hold that power, which is how the governance capture described in the third essay becomes self-sustaining: a public that cannot distinguish genuine risk from manufactured controversy cannot hold its regulators accountable. The cognitive hollowing described in the fourth essay reduces the population’s capacity for the independent judgment that democratic governance requires. The intergenerational failure described in the fifth essay threatens to make these conditions permanent by preventing the next generation from developing the capacities needed to reverse them. Each essay’s default path makes the next essay’s default path more likely, though these links are directional arguments supported by evidence rather than demonstrated causal chains, and the collection should be read as an analytical framework for understanding interconnected risks rather than as a prediction of inevitable decline. The institutional question is whether the compounding can be interrupted, and if so, by whom.
The collection’s own structural analysis creates a tension that should be named rather than papered over. Competitive selection punishes safety. Regulatory capture ensures governance reflects the interests of the governed. The equilibrium of inaction benefits every actor with power. International coordination is fracturing. These forces, documented across six essays, suggest that the institutional responses the collection advocates face structural resistance that may be insuperable under current conditions. The managed path requires either that actors with power choose to constrain themselves, or that actors without power acquire enough to force the constraint. Neither has historically happened without a crisis that made the cost of inaction visible and attributable, and the argument of this collection is that AI’s most dangerous crises may be irreversible, meaning there may be no crisis to learn from.
The honest conclusion is not that the managed path is inevitable or that it is impossible. It is that the managed path is available but structurally unlikely under current political conditions, and that changing those conditions requires either a political realignment (a constituency powerful enough to demand institutional intervention and sustain it against lobbying by the beneficiaries of the default) or a sufficiently visible near-miss (an event that demonstrates the cost of the default path without being catastrophic enough to foreclose the alternative). The collection cannot predict which will arrive, or whether either will arrive in time. It can describe what the institutions would need to look like if the opportunity comes, so that the opportunity is not wasted for lack of preparation. The fork is real. The path not taken remains open. The structural forces documented in these essays work against taking it. Whether they prove decisive depends on choices that are being made now, in legislative chambers and boardrooms, in classrooms and in the design of the tools themselves. The collection describes what determines the fork, and it trusts the reader to act on the description.
Sources for this essay are drawn from the Shades of Singularity research collection (#14, #15, #21, #22, #23, #28) and from additional research conducted in April 2026. Full footnotes below.
Footnotes
1. Anthropic (2026). “Responsible Scaling Policy v3.0.” February 24, 2026. https://anthropic.com/responsible-scaling-policy/rsp-v3-0. The three forces are described in Anthropic’s accompanying blog post. See also: GovAI (2026). “Anthropic’s RSP v3.0: How it Works, What’s Changed, and Some Reflections.” https://www.governance.ai/analysis/anthropics-rsp-v3-0-how-it-works-whats-changed-and-some-reflections. WinBuzzer (2026). “Anthropic Drops Hard Safety Limits From its AI Scaling Policy.” February 25, 2026. https://winbuzzer.com/2026/02/25/anthropic-drops-hard-safety-limit-responsible-scaling-policy-xcxwbn/.
2. Drake Thomas, “Responsible Scaling Policy v3,” LessWrong, February 24, 2026. https://www.lesswrong.com/posts/HzKuzrKfaDJvQqmjh/responsible-scaling-policy-v3. The “mourning or grief” quote and the “misprioritization and distorted incentives” argument are from this post.
3. Mrinank Sharma’s resignation is reported in multiple analyses of the RSP v3.0 release. See: “Anthropic drops its safety pause pledge: what RSP v3.0 changes and why it matters,” udit.co, February 26, 2026. https://udit.co/blog/anthropic-drops-safety-pause-pledge-rsp-v3. Sharma had been Anthropic’s head of safeguards research.
4. The Pentagon ultimatum and Anthropic’s response are documented in: Axios (2026), February 26; NPR (2026), February 24; PBS/AP (2026), February 27. Hegseth’s January 2026 AI strategy document required all military AI contracts to eliminate company-specific guardrails within 180 days.
5. GovAI (2026), footnote 1. The “net positive for risk reduction” assessment is from GovAI’s detailed analysis. They note: “On balance, we think it’s better to be honest about constraints than to keep commitments that won’t be followed in practice.”
6. Drake Thomas (footnote 2). The “pretty bad way for responsible AI developers to set safety policies” characterization is from the same LessWrong post. Thomas argues that the original framework created perverse incentives around threshold definitions.
7. OpenAI mission statement change: reported across multiple outlets in 2024. OpenAI deployed ChatGPT through the Pentagon’s GenAI.mil platform, which serves 1.1 million users across all three military service departments (WinBuzzer, February 2026). Google reversed its Project Maven prohibition: Amnesty International described the reversal as enabling technologies including mass surveillance and semi-autonomous drone strikes (Sovereign Magazine, 2026). xAI signed a deal for Grok to enter classified military systems without conditions (Axios/Semafor, 2026).
8. International AI Safety Report 2026. Published February 3, 2026. Led by Yoshua Bengio. 100+ expert authors, 30+ countries. 200 pages, 1,451 references. https://internationalaisafetyreport.org/publication/2026-report-extended-summary-policymakers. The “evidence dilemma” is a central framing of the report’s Extended Summary for Policymakers.
9. Alignment findings: Palisade Research (2025) on o3 shutdown sabotage (79%); Apollo Research (2024-2025) on strategic deception; Anthropic system card for Claude Opus 4 (blackmail behavior, 84%). These findings come from contrived stress-test scenarios designed to elicit worst-case behaviors, not from baseline deployment conditions. Palisade explicitly noted that current systems show no evidence of persistent strategic goals. Sofroniew, Kauvar, Saunders, Chen et al. (2026), “Emotion Concepts and their Function in a Large Language Model,” Anthropic, April 2, 2026. https://transformer-circuits.pub/2026/emotions/index.html.
10. Anthropic (2026). “Project Glasswing: Securing Critical Software for the AI Era.” April 7, 2026. https://www.anthropic.com/glasswing. Technical details: Anthropic Frontier Red Team blog, “Assessing Claude Mythos Preview’s Cybersecurity Capabilities,” April 7, 2026. https://red.anthropic.com/2026/mythos-preview/. Fortune reported that Anthropic privately warned government officials about the offensive risk (https://fortune.com/2026/04/07/anthropic-claude-mythos-model-project-glasswing-cybersecurity/). Alex Stamos (Corridor, formerly Facebook/Yahoo CISO) estimated six months before open-weight models replicate these capabilities (Platformer, April 7, 2026: https://www.platformer.news/anthropic-mythos-cybersecurity-risk-experts/). Picus Security analysis described the desperation vectors and evidence-clearing behavior (https://www.picussecurity.com/resource/blog/anthropics-project-glasswing-paradox). The “best-aligned and most alignment-risky” characterization is from the Picus analysis citing Anthropic’s system card.
11. Anthropic ASL-3 activation is documented in its system cards and RSP compliance reports. The 2.53x uplift figure is model-specific and policy-specific; capability thresholds differ by lab. The OpenAI “on the cusp” assessment is from internal evaluations reported in the context of the International AI Safety Report 2026.
12. Deepfake incident statistics: identity security researchers tracking deepfake incidents globally (500,000 in 2023 to 8+ million in 2025). The projection of synthetic content constituting up to 90% of online media is from Deloitte Technology, Media and Telecom Predictions (2025) and involves different methodology from the incident counts; the two figures should not be read as measuring the same phenomenon.
13. The competitive dynamics of AI safety are analyzed in GovAI (2026), footnote 1, and in the International AI Safety Report 2026, Section 3 on risk management. The report notes that “competitive pressures can incentivise AI developers to reduce their investment in testing and risk mitigation in order to release new models quickly.”
14. UK AI Security Institute: 2025 Year in Review. https://www.aisi.gov.uk/blog/our-2025-year-in-review. 30+ models tested, £15M alignment project, MOUs with Anthropic, OpenAI, Google DeepMind, and Cohere. The structural dynamics of regulatory access and influence are analyzed in the AI governance literature; the specific point about evaluation-by-invitation is the essay’s.
15. OpenAI (2026). “Industrial Policy for the Intelligence Age: Ideas to Keep People First.” April 2026. The document proposes: a Public Wealth Fund providing every citizen a stake in AI-driven growth; tax base modernization shifting from payroll to capital-based revenues; 32-hour workweek pilots; portable benefits; adaptive safety nets with automatic triggers; auditing regimes through CAISI; incident reporting; model-containment playbooks; international information-sharing networks; and mechanisms for public input on alignment. The document acknowledges concentration risk (“There is also a risk that the economic gains concentrate within a small number of firms like OpenAI”) and frames its proposals as “intentionally early and exploratory.”
16. The governance equilibrium is documented in Essay 3, “On the Automation of Power.” The RSP v3.0 episode extends that analysis to the safety domain specifically.
17. Chinese open-weight model dominance: Stanford HAI (January 2026) analysis showing Chinese-made open-weight models overtook U.S. models in Hugging Face downloads by September 2025, with 63% of all new fine-tuned models built on Chinese base models. DeepSeek jailbreaking vulnerability: Stanford HAI found DeepSeek models are on average twelve times more vulnerable than comparable U.S. models. Safety guardrail removal: Alignment Forum (2025) demonstrated techniques that strip safety while preserving capability across DeepSeek, GPT-4o, Claude, and Gemini. https://www.alignmentforum.org/posts/zjqrSKZuRLnjAniyo/illusory-safety-redteaming-deepseek-r1-and-the-strongest.
18. India AI Action Summit, February 2026. The US and UK declined to sign a declaration endorsed by 60 countries promoting inclusive and sustainable AI development. The summit was renamed from “AI Safety Summit” to “AI Action Summit.” See: Mind Foundry (2026). “AI Regulations around the World.” https://www.mindfoundry.ai/blog/ai-regulations-around-the-world.
19. EU AI Act: https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai. Entered into force August 1, 2024. Prohibited practices effective February 2, 2025. GPAI obligations effective August 2, 2025. High-risk provisions effective August 2026 and August 2027; these dates are subject to secondary legislation and codes of practice still being finalized. The EU Code of Practice for GPAI models and California’s SB-53 both draw on the Act’s structure.
20. California AI legislation: AI Safety Act (whistleblower protections, effective January 1, 2026); AI Training Data Transparency Laws (training data summaries, watermarks, detection tools, effective January 1, 2026); CalCompute public AI cloud consortium under the Government Operations Agency. See: Greenberg Traurig (2025). “2026 Outlook: Artificial Intelligence.” https://www.gtlaw.com/en/insights/2025/12/2026-outlook-artificial-intelligence.
21. UK AISI: footnote 14. Science publication on AI-enabled persuasion (2025). Open-sourced tools: Inspect, InspectSandbox, InspectCyber, ControlArena. Research partnerships include Google DeepMind (joint data access, publications on chain-of-thought monitoring, socio-affective alignment). The International Network for Advanced AI Measurement, Evaluation and Science includes institutes from the UK, US, Japan, France, Germany, Italy, Singapore, South Korea, Australia, Canada, and the EU.
22. The US AISI was renamed to the Center for AI Standards and Innovation (CAISI) in June 2025. Secretary of Commerce Howard Lutnick’s statement is from the US Department of Commerce announcement of the renaming. See also: “Artificial intelligence safety institutes,” Wikipedia, accessed April 2026. https://en.wikipedia.org/wiki/AI_Safety_Institute. The Commerce Department stated that CAISI would “represent American interests internationally, guarding against burdensome and unnecessary regulation of US technologies by foreign governments.”
23. Sovereign AI investment: Global SWF data, reported in Gulf News, January 2026. https://gulfnews.com/business/markets/sovereign-wealth-funds-pour-66-billion-into-ai-as-assets-hit-15-trillion-1.500395812. The $66 billion figure covers AI and digitalization broadly. France: Macron’s €109 billion commitment (public + private + pledged capital) announced February 2025. EU EuroHPC AI Factories: European Commission (2025). Abu Dhabi full-stack AI: NYU Digital Resilience Institute (2026). “The Sovereign AI Architects.” https://nyudri.org/publications/the-sovereign-ai-architects/. South Korea: 260,000+ GPUs announced (capacity target, not deployed systems) across sovereign clouds (NVIDIA/South Korea government, October 2025). Jensen Huang “sovereign AI” concept: World Governments Summit, Dubai, February 2024.
24. C2PA: Coalition for Content Provenance and Authenticity, 200+ members. Samsung Galaxy S25 and Google Pixel 10 integrate C2PA signing. EU AI Act Article 50 enforcement August 2026. Deepfake detection market: Deloitte, 42% annual growth to $15.7 billion by 2026. https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2025/gen-ai-trust-standards.html.
25. RAND Corporation (2025). “Overpromising on Digital Provenance and Security.” June 2025. RAND’s position is that C2PA is insufficient alone, not futile in combination with other measures. Cited in: “C2PA in 2026: Does the Content Provenance Standard Actually Work?” Truescreen.io. https://truescreen.io/articles/c2pa-standard-history-limitations/.
26. International AI Safety Report 2026, footnote 8. The recommendation for “building broader societal resilience as a complement to technical safeguards” is in the risk management section of the Extended Summary for Policymakers.
27. Ibid. The “defense-in-depth” principle is a central recommendation of the report’s risk management framework.
28. GovAI (2026), footnote 1. The twelve Frontier AI Safety Frameworks published or updated in 2025 are documented in the International AI Safety Report 2026.
29. Anthropic RSP v3.0 “Recommendations for Industry-Wide Safety” section. Drake Thomas (footnote 2) describes these as “what we think governments should actually put in place and enforce” if there were sufficient political will. The FDA-inspired regime concept is from Thomas’s post.