Page 36 - The Annual AI Governance Report 2025 Steering the Future of AI
P. 36

The Annual AI Governance Report 2025: Steering the Future of AI



                       Theme 6: AI Safety and Risk Management


                   6.1  Risks of AI and System Safety Assessment


                   AI Risk Classification and Dual Use Nature: Risks stemming from AI can be categorised into       Theme 6: AI Safety
                   three main areas: Malicious use risks, where systems are deliberately repurposed for harmful
                   activities such as cyberattacks, disinformation campaigns or even the development of biological
                   weapons; risks from malfunctions, which arise from unforeseen technical failures, inherent biases
                   in training data or a lack of understanding of the system's true capabilities; and systemic risks,
                   which encompass broader societal impacts such as market concentration, large-scale labour
                   market disruption and the exacerbation of global inequalities.  A significant source of these
                                                                           139
                   risks is the dual-use nature of AI technology: its powerful capabilities can be harnessed for
                   tremendous benefit or severe harm. The dominant discourse surrounding AI safety remains
                   Western-centric, often failing to adequately incorporate the diverse linguistic traditions, cultural
                   values and lived experiences of communities within Global Majority nations and marginalised
                   groups.  This systemic exclusion risks entrenching global inequities and underscores the
                          140
                   need for more inclusive approaches and culturally informed evaluation frameworks in AI safety
                   development.

                   Best Practices in System Safety Assessment: Safety institutes and AI labs routinely conduct
                   structured model evaluations to assess the capabilities and risks of advanced AI systems,
                   particularly general-purpose AI and frontier AI models.  This includes pre-deployment risk
                                                                     141
                   assessments, dangerous capabilities evaluations, and benchmarking against standardised
                   tasks.  Red-teaming—where experts attempt to “break” model safeguards—is increasingly
                        142
                   used to identify vulnerabilities before deployment.  Safety cases, meaning structured safety
                                                                 143
                   arguments supported by evidence, are gaining traction as a scalable method for demonstrating
                   that an AI system is safe within its deployment context.  Other best practices to assess model
                                                                    144
                   safety include ongoing monitoring, incident response planning, and regular third-party audits
                   to ensure that safety measures remain effective over time and across evolving deployment
                   scenarios.
                   Robustness and Reliability Testing: AI system robustness refers to its ability to maintain
                   consistent performance, even when faced with unexpected conditions or altered input data.
                                                                                                     145
                   This is particularly important for applications such as autonomous driving, where malfunctions





                   139   Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel,
                      B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika,
                      M., Michael, J., Zeng, Y. (2025, January 29). International AI Safety Report. arXiv.org.
                   140   Okolo, C. T. (2025, February 12). A new writing series: Re-envisioning AI safety through global majority
                      perspectives. Brookings.
                   141   Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel,
                      B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika,
                      M., Michael, J., Zeng, Y. (2025, January 29). International AI Safety Report. arXiv.org.
                   142   Friedland, A. (2025, May 28). AI Safety Evaluations: An Explainer. Center for Security and Emerging
                      Technology.
                   143   Schuett, J., Dreksler, N., Anderljung, M., McCaffary, D., Heim, L., Bluemke, E., & Garfinkel, B. (2023, May 11).
                      Towards best practices in AGI safety and governance: A survey of expert opinion. arXiv.org.
                   144   Buhl, M. D., Sett, G., Koessler, L., Schuett, J., & Anderljung, M. (2024, October 28). Safety cases for frontier AI.
                      arXiv.org & Hilton, B., Davidsen Buhl, M., Korbak, T., & Irving, G. (2025). Safety Cases: A Scalable Approach
                      to Frontier AI Safety. arXiv.org.
                   145   Wozniak, A., Duong, N. Q. K., Benderitter, I., Leroy, S., Segura, S., & Mazo, R. (2023). Robustness testing of
                      an industrial road object detection system. IEEE, 82–89.



                                                            27
   31   32   33   34   35   36   37   38   39   40   41