marginalisation or extinction of humanity.195 Although this concept defines the 'global worst-case scenario' that would permanently halt human development, there is a lack of consensus among researchers about how such risks arise and how to manage them.196 Understanding these risks hinges on addressing core concerns such as the inability to fully comprehend and influence decision-making processes in deep learning models, the rapid pace of AI development, and the growing human-like capabilities of AI systems.197 Some identify two primary theoretical pathways to AI x-catastrophes: the decisive AI x-risk hypothesis, which posits that catastrophe arises from abrupt, large-scale events caused by advanced AI systems (e.g. uncontrollable superintelligence); and the accumulative AI x-risk hypothesis, which suggests that a gradual build-up of smaller, interconnected AI-induced disruptions erodes societal resilience until irreversible collapse occurs.198 Despite these theoretical frameworks, expert opinion on the likelihood and imminence of severe outcomes, such as loss-of-control scenarios, varies greatly, with some considering them implausible and others viewing them as a global priority comparable to pandemics and nuclear war.
Challenges of Risk Mitigation Frameworks: Developing robust mitigation frameworks to address these long-term risks and enhance societal resilience presents significant empirical challenges. The rapid and often unpredictable advancements in general-purpose AI capabilities create an 'evidence dilemma' for policymakers, making it difficult to fully assess and prepare for these emerging threats.199 Dangerous capabilities can appear spontaneously without explicit programming, which makes them hard to predict.200 Current empirical evaluations, often limited to 'spot-checks' and demonstrations, frequently fail to reliably rule out dangerous capabilities or to predict how advanced AI systems might behave in different settings.201 This can result in alignment being falsely demonstrated under testing conditions. Effective mitigation therefore requires a proactive, adaptive governance approach that mandates rigorous evaluations, including 'safety cases' in which developers must demonstrate that risk levels are acceptable.202
6.6 Verification as a path to reduce risks from AI
Verification in AI: Verification refers to the ability of one party to confirm or validate the actions
or claims of another. In the context of frontier AI, this involves confirming a range of important
aspects related to the development and deployment of AI systems, such as details of training runs,
the implementation of safety tests and mitigations, evaluation outcomes on system capabilities
195 Martínez, E., & Winter, C. (2022, December 15). Ordinary meaning of existential risk. LPP Working Paper No. 7-2022.
196 Stauffer, M., Seifert, K., Aristizábal, A., Chaudhry, H. T., Kohler, K., Hussein, S. N., Salinas Leyva, C., Gebert, A.,
Arbeid, J., Estier, M., Matinyi, S., Hausenloy, J., Kaur, J., Rath, S., & Wu, Y.-H. (2023, March 13). Existential risk
and rapid technological change – a thematic study for UNDRR. Simon Institute for Longterm Governance.
197 Abungu, C., Malonza, M., & Adan, S. N. (2023, December 7). Can apparent bystanders distinctively shape an
outcome? Global south countries and global catastrophic risk-focused governance of artificial intelligence.
arXiv.org.
198 Kasirzadeh, A. (2024, January 15). Two types of AI existential risk: decisive and accumulative. arXiv.org.
199 Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel,
B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika,
M., Michael, J., Zeng, Y. (2025, January 29). International AI Safety Report. arXiv.org.
200 Bengio, Y., Hinton, G., Yao, A., Song, D., Abbeel, P., Darrell, T., Harari, Y. N., Zhang, Y., Xue, L., Shalev-Shwartz,
S., Hadfield, G., Clune, J., Maharaj, T., Hutter, F., Baydin, A. G., McIlraith, S., Gao, Q., Acharya, A., Krueger,
D., Mindermann, S. (2024). Managing extreme AI risks amid rapid progress. Science, 384(6698), 842–845.
201 Fn. 170
202 Fn. 170