Page 36 - The Annual AI Governance Report 2025 Steering the Future of AI

P. 36

The Annual AI Governance Report 2025: Steering the Future of AI

Theme 6: AI Safety and Risk Management

6.1 Risks of AI and System Safety Assessment

AI Risk Classification and Dual Use Nature: Risks stemming from AI can be categorised into Theme 6: AI Safety
three main areas: Malicious use risks, where systems are deliberately repurposed for harmful
activities such as cyberattacks, disinformation campaigns or even the development of biological
weapons; risks from malfunctions, which arise from unforeseen technical failures, inherent biases
in training data or a lack of understanding of the system's true capabilities; and systemic risks,
which encompass broader societal impacts such as market concentration, large-scale labour
market disruption and the exacerbation of global inequalities. A significant source of these
139
risks is the dual-use nature of AI technology: its powerful capabilities can be harnessed for
tremendous benefit or severe harm. The dominant discourse surrounding AI safety remains
Western-centric, often failing to adequately incorporate the diverse linguistic traditions, cultural
values and lived experiences of communities within Global Majority nations and marginalised
groups. This systemic exclusion risks entrenching global inequities and underscores the
140
need for more inclusive approaches and culturally informed evaluation frameworks in AI safety
development.

Best Practices in System Safety Assessment: Safety institutes and AI labs routinely conduct
structured model evaluations to assess the capabilities and risks of advanced AI systems,
particularly general-purpose AI and frontier AI models. This includes pre-deployment risk
141
assessments, dangerous capabilities evaluations, and benchmarking against standardised
tasks. Red-teaming—where experts attempt to “break” model safeguards—is increasingly
142
used to identify vulnerabilities before deployment. Safety cases, meaning structured safety
143
arguments supported by evidence, are gaining traction as a scalable method for demonstrating
that an AI system is safe within its deployment context. Other best practices to assess model
144
safety include ongoing monitoring, incident response planning, and regular third-party audits
to ensure that safety measures remain effective over time and across evolving deployment
scenarios.
Robustness and Reliability Testing: AI system robustness refers to its ability to maintain
consistent performance, even when faced with unexpected conditions or altered input data.
145
This is particularly important for applications such as autonomous driving, where malfunctions

139 Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel,
B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika,
M., Michael, J., Zeng, Y. (2025, January 29). International AI Safety Report. arXiv.org.
140 Okolo, C. T. (2025, February 12). A new writing series: Re-envisioning AI safety through global majority
perspectives. Brookings.
141 Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel,
B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika,
M., Michael, J., Zeng, Y. (2025, January 29). International AI Safety Report. arXiv.org.
142 Friedland, A. (2025, May 28). AI Safety Evaluations: An Explainer. Center for Security and Emerging
Technology.
143 Schuett, J., Dreksler, N., Anderljung, M., McCaffary, D., Heim, L., Bluemke, E., & Garfinkel, B. (2023, May 11).
Towards best practices in AGI safety and governance: A survey of expert opinion. arXiv.org.
144 Buhl, M. D., Sett, G., Koessler, L., Schuett, J., & Anderljung, M. (2024, October 28). Safety cases for frontier AI.
arXiv.org & Hilton, B., Davidsen Buhl, M., Korbak, T., & Irving, G. (2025). Safety Cases: A Scalable Approach
to Frontier AI Safety. arXiv.org.
145 Wozniak, A., Duong, N. Q. K., Benderitter, I., Leroy, S., Segura, S., & Mazo, R. (2023). Robustness testing of
an industrial road object detection system. IEEE, 82–89.

31 32 33 34 35 36 37 38 39 40 41