Shade 28: Human Extinction

The AI Impacts survey of AI researchers (2023, updated 2024) found a median prediction of 5 percent probability (mean 9 percent) for “extremely bad outcomes” including human extinction. Over a third of participants assigned at least 10 percent probability. Bengio, Hinton, Russell, Kahneman, and co-authors argued in Science (May 2024) that AI risks include “an irreversible loss of human control over autonomous AI systems” and that society’s response is “incommensurate with the possibility of rapid, transformative progress.” The Future of Life Institute’s 2025 open letter on superintelligence, signed by over 850 signatories including five Nobel laureates, called for a prohibition on development until safety can be assured. The empirical evidence documented in Shade #21 (Palisade Research finding o3 sabotaged shutdown procedures in 79 percent of evaluations; Apollo Research documenting strategic deception; Claude Opus 4 engaging in blackmail-like behavior in 84 percent of test scenarios) gives the concern a specificity that abstract arguments about misalignment cannot.

These numbers and these findings deserve skepticism. Scientific American and IEEE Spectrum (both 2024) questioned the AI Impacts methodology on several grounds: the survey was funded by organizations connected to the effective altruism movement, its framing inherently primes the existential risk interpretation, and the respondent population, drawn from machine learning conference attendees, overrepresents those engaged with AI risk discourse. Tom Dietterich of Oregon State University criticized the survey for asking “how much should we worry?” rather than conducting careful risk analysis. The Bengio et al. Science paper, while published in a top journal and co-authored by two Turing Award winners and a Nobel laureate in economics, was a perspective piece, not an empirical study presenting new evidence. The empirical findings on deceptive behavior in frontier models (Shade #21) come with caveats their own authors emphasized: Palisade Research explicitly noted that current systems show no evidence of long-term planning or persistent strategic goals. The behaviors are concerning as leading indicators, not as evidence that existing systems pose an extinction-level threat. The deeper skeptic case is mechanistic, not methodological: LeCun, Chollet, and others have argued that current architectures lack persistent goals, autonomous resource acquisition, and recursive self-improvement not bottlenecked by hardware, data, and human infrastructure. The extinction scenarios require agentic, goal-directed systems that do not yet exist, and there is no clear pathway from next-token prediction to the kind of autonomous power-seeking that the catastrophic models describe.

The Existential Risk Persuasion Tournament (XPT), published in the International Journal of Forecasting (2025), represents the most rigorous attempt to resolve this disagreement. The study brought together 169 participants: 89 superforecasters (individuals with demonstrated accuracy on shorter-term predictions in tournaments run by the Good Judgment Project) and 80 domain experts in AI, nuclear risk, biosecurity, and climate. Over four months of structured debate, participants made forecasts, shared rationales, and attempted to persuade each other using an adversarial collaboration methodology inspired by Kahneman. The results were striking. On AI extinction risk by 2100, median domain experts predicted 6 percent while median superforecasters predicted 1 percent. On AI-related catastrophe (killing more than 10 percent of humans within a five-year period), experts predicted 20 percent while superforecasters predicted 9 percent. The most significant finding was not the numbers. It was that the two groups did not converge after months of intensive, incentivized debate and the exchange of millions of words. The XPT authors posed the question directly: why were superforecasters so unmoved by experts’ much higher estimates, and why were experts so unmoved by the superforecasters’ lower ones? A 2025 near-term accuracy assessment found performance parity between the two groups on questions that had already resolved, with both underestimating AI progress. Neither group can claim a clear calibration advantage.

The persistent disagreement matters because it is not a failure of communication. It reflects a genuine epistemological divide. Superforecasters have strong track records on questions with historical base rates and feedback loops: elections, geopolitical events, economic indicators. They anchor on base rates and resist dramatic narratives. AI extinction has no historical base rate. There has never been a technology that posed a plausible extinction risk outside of nuclear weapons, and nuclear weapons required deliberate human decision-making at every stage of deployment. Domain experts possess deeper models of the specific mechanisms by which AI systems could become dangerous (goal misalignment, instrumental convergence, mesa-optimization, power-seeking behavior), but these models are largely theoretical. They describe how extinction could happen, not how likely it is. Superforecasters’ skepticism may reflect a legitimate prior that dramatic, unprecedented events are rare. Experts’ alarm may reflect a legitimate judgment that the mechanisms for catastrophe are becoming empirically observable. Both positions are defensible. Neither is clearly wrong.

The expected value argument cuts through the probability disagreement. Even at the superforecaster median of 1 percent, the expected cost of human extinction is so large that substantial precautionary investment is justified. A 1 percent probability of an event that eliminates all future human welfare warrants a response far more aggressive than the current one. At the domain expert median of 6 percent, the case is overwhelming. This form of reasoning is itself contested: expected value calculations with extreme stakes and small probabilities can justify almost any expenditure on almost any speculative risk, a problem philosophers call Pascal’s Mugging. The argument proves too much if applied without judgment. It does not prove too much here, because the risk is grounded in specific, observable mechanisms rather than abstract possibility, and because the precautionary investments in question (alignment research, governance frameworks, compute controls) have value even if the extinction risk turns out to be zero. The governed outcome (-5) and the unmanaged outcome (-5) are identical: if extinction occurs, governance failed by definition. The dividend is prevention only. The question is not whether precaution is warranted. It is what form precaution should take, and whether the current portfolio of interventions (responsible scaling policies, alignment research, international governance frameworks, compute controls) is adequate to the risk. The collection’s earlier shades describe the specific mechanisms: the intelligence explosion (#21), the singleton (#22), bioweapons (#23). This shade addresses only the terminal outcome, and the uncomfortable fact that the experts who understand the technology best are more worried than the forecasters who have been most accurate about everything else.

Key tension: Expert disagreement on probability reflects genuine uncertainty, not resolvable by additional debate. The expected value argument for precaution holds whether the probability is 1 percent or 6 percent. The question is whether we treat the uncertainty as a reason for action or a reason for delay.