marginalisation or extinction of humanity.195 Although this concept defines the 'global worst-case scenario' that would permanently halt human development, there is a lack of consensus among researchers about how such risks arise and how to manage them.196 Understanding these risks hinges on addressing core concerns such as the inability to fully comprehend and influence decision-making processes in deep learning models, the rapid pace of AI development, and the growing human-like capabilities of AI systems.197 Some identify two primary theoretical pathways to AI x-catastrophes: the decisive AI x-risk hypothesis, which posits that abrupt, large-scale events are caused by advanced AI systems (e.g. an uncontrollable superintelligence); and the accumulative AI x-risk hypothesis, which suggests that a gradual build-up of smaller, interconnected AI-induced disruptions erodes societal resilience until irreversible collapse occurs.198 Despite these theoretical frameworks, expert opinion on the likelihood and imminence of severe outcomes, such as loss-of-control scenarios, varies greatly, with some considering them implausible and others viewing them as a global priority comparable to pandemics and nuclear war.

Challenges of Risk Mitigation Frameworks: Developing robust mitigation frameworks to address these long-term risks and enhance societal resilience presents significant empirical challenges. The rapid and often unpredictable advancements in general-purpose AI capabilities create an 'evidence dilemma' for policymakers, making it difficult to fully assess and prepare for these emerging threats.199 Dangerous capabilities can appear spontaneously, without explicit programming, which makes them hard to predict.200 Current empirical evaluations, which are often limited to 'spot-checks' and demonstrations, frequently fail to reliably rule out dangerous capabilities or to predict how advanced AI systems might behave in different settings.201 This can result in alignment being falsely demonstrated under testing conditions. Effective mitigation therefore requires a proactive, adaptive governance approach that mandates rigorous evaluations, including 'safety cases' in which developers must demonstrate that risk levels are acceptable.202


6.6  Verification as a path to reduce risks from AI

Verification in AI: Verification refers to the ability of one party to confirm or validate the actions or claims of another. In the context of frontier AI, this involves confirming a range of important aspects of the development and deployment of AI systems, such as details of training runs, the implementation of safety tests and mitigations, evaluation outcomes on system capabilities






195   Martínez, E., & Winter, C. (2022, December 15). Ordinary Meaning of Existential Risk. LPP Working Paper No. 7-2022.
196   Stauffer, M., Seifert, K., Aristizábal, A., Chaudhry, H. T., Kohler, K., Hussein, S. N., Salinas Leyva, C., Gebert, A., Arbeid, J., Estier, M., Matinyi, S., Hausenloy, J., Kaur, J., Rath, S., & Wu, Y.-H. (2023, March 13). Existential risk and rapid technological change – a thematic study for UNDRR. Simon Institute for Longterm Governance.
197   Abungu, C., Malonza, M., & Adan, S. N. (2023, December 7). Can apparent bystanders distinctively shape an outcome? Global South countries and global catastrophic risk-focused governance of artificial intelligence. arXiv.org.
198   Kasirzadeh, A. (2024, January 15). Two types of AI existential risk: decisive and accumulative. arXiv.org.
199   Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel, B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika, M., Michael, J., Zeng, Y. (2025, January 29). International AI Safety Report. arXiv.org.
200   Bengio, Y., Hinton, G., Yao, A., Song, D., Abbeel, P., Darrell, T., Harari, Y. N., Zhang, Y., Xue, L., Shalev-Shwartz, S., Hadfield, G., Clune, J., Maharaj, T., Hutter, F., Baydin, A. G., McIlraith, S., Gao, Q., Acharya, A., Krueger, D., Mindermann, S. (2024). Managing extreme AI risks amid rapid progress. Science, 384(6698), 842–845.
201   Fn. 170
202   Fn. 170


