could result in substantial property damage and even fatalities.146 Robustness testing aims to ensure that an AI system behaves correctly when it encounters unexpected conditions, whether accidental technical faults or targeted attacks. Reliability, which is often assessed through robustness, is essential for critical AI systems to prevent potentially fatal failures during real-time operation. Mathematically, robustness can be treated as a quantifiable measure of trustworthiness, indicating how closely a model adheres to its expected behaviour despite minor input variations.147 Ultimately, achieving a high level of robustness is essential for building trust and confidence in AI systems used in areas where safety and security are critical, as unpredictable or opaque behaviours are unacceptable in such settings.
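One common way to make this notion precise (an illustrative formalisation, not a definition taken from the sources cited above) is local robustness: a model f is considered robust at an input x within a perturbation budget ε if

\[
f(x + \delta) = f(x) \quad \text{for all } \delta \text{ with } \|\delta\| \le \epsilon ,
\]

that is, no admissible small change to the input alters the model's output. Robustness testing then searches for a perturbation δ that violates this condition, and the largest ε for which none can be found gives a quantitative indication of how far the model's behaviour can be trusted around that input.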
Limitations of Risk Assessments: Risk assessments for AI systems are significantly limited, primarily due to the ‘black-box’ nature of advanced models, which makes it difficult to fully understand their internal functioning and all potential failure modes. For general-purpose AI, it is challenging to exhaustively evaluate all possible downstream use cases and to predict how risks might manifest through complex real-world interactions rather than within the system itself.148 Furthermore, reliable quantitative risk estimation is severely hindered by the limited historical data on AI incidents and by the difficulty of assessing low-probability, high-impact events, or ‘unknown unknowns’.149 The dual-use nature of many AI capabilities further complicates these assessments, as the same features can be used for beneficial or malicious purposes, blurring the lines of potential harm.150 Consequently, existing methodologies often cannot provide strong assurances or definitive guarantees against all associated harms.
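To see why such estimation is fragile, consider the standard expected-harm formulation (offered here as an illustration, not a method prescribed by the sources cited above):

\[
\mathbb{E}[\text{harm}] = \sum_{i} p_i \, s_i ,
\]

where p_i is the probability of incident scenario i and s_i its severity. For low-probability, high-impact scenarios the product p_i s_i can dominate the total even though p_i must be estimated from very few observed incidents, or none at all, so small errors in p_i translate into large errors in the overall estimate; scenarios that are never enumerated, the ‘unknown unknowns’, are missing from the sum entirely.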
6.2 Approaches to Mitigating AI Risks
Voluntary Commitments: Voluntary AI safety standards and frameworks are increasingly being adopted to help organisations manage risks, promote the responsible use of AI, and prepare for future regulation. Industry leaders, including Google DeepMind, OpenAI and Anthropic, have developed their own commitments, such as Responsible Scaling Policies and Preparedness Frameworks, which set out risk thresholds and mitigation strategies, particularly for foundation models with dual-use potential.151 While such voluntary measures are seen as a bridge enabling a smoother transition towards future regulatory requirements and fostering continuous improvement, sources indicate that self-regulation alone is unlikely to be sufficient to adequately manage the severe risks posed by highly capable AI models in the long term.152 Government intervention will therefore likely be necessary to ensure compliance with safety standards.153
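The if-then commitments cited above can be pictured as a simple mapping from capability-evaluation thresholds to pre-agreed mitigations. The sketch below is purely illustrative: the capability names, threshold values and mitigation actions are hypothetical and are not drawn from any company's published policy.

from dataclasses import dataclass

@dataclass
class IfThenCommitment:
    capability: str          # evaluated capability (hypothetical name)
    threshold: float         # evaluation score that triggers the commitment
    mitigations: list[str]   # actions required once the threshold is crossed

# Illustrative commitments; real policies define their own capabilities and thresholds.
COMMITMENTS = [
    IfThenCommitment(
        capability="autonomous_cyber_offense",
        threshold=0.5,
        mitigations=["pause further scaling", "apply enhanced security controls"],
    ),
    IfThenCommitment(
        capability="bio_uplift",
        threshold=0.3,
        mitigations=["restrict deployment", "notify internal safety board"],
    ),
]

def required_mitigations(eval_scores: dict[str, float]) -> list[str]:
    """Return the mitigations whose capability thresholds have been crossed."""
    actions: list[str] = []
    for commitment in COMMITMENTS:
        if eval_scores.get(commitment.capability, 0.0) >= commitment.threshold:
            actions.extend(commitment.mitigations)
    return actions

# Example: an evaluation result that exceeds the hypothetical cyber-offense threshold.
print(required_mitigations({"autonomous_cyber_offense": 0.62}))

Structuring commitments this way makes the trigger conditions explicit and auditable, which is one reason such frameworks are viewed as a bridge towards formal regulatory requirements.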
146 Berghoff, C., Bielik, P., Neu, M., Tsankov, P., & Von Twickel, A. (2021). Robustness testing of AI systems: A case study for traffic sign recognition. In IFIP Advances in Information and Communication Technology (pp. 256–267).
147 Braiek, H. B., & Khomh, F. (2024, April 1). Machine learning robustness: A primer. arXiv.org.
148 Mukobi, G. (2024, August 5). Reasons to doubt the impact of AI risk evaluations. arXiv.org.
149 Koessler, L., Schuett, J., & Anderljung, M. (2024, June 20). Risk thresholds for frontier AI. arXiv.org.
150 Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O’Keefe, C., Whittlestone, J., Avin, S., Brundage, M., Bullock, J., Cass-Beggs, D., Chang, B., Collins, T., Fist, T., Hadfield, G., Hayes, A., Ho, L., Hooker, S., Horvitz, E., Kolt, N., Wolf, K. (2023, July 6). Frontier AI regulation: Managing emerging risks to public safety. arXiv.org.
151 Karnofsky, H. (2024). If-then commitments for AI risk reduction. Carnegie Endowment for International Peace.
152 Longpre, S., Klyman, K., Appel, R. E., Kapoor, S., Bommasani, R., Sahar, M., McGregor, S., Ghosh, A., Blili-Hamelin, B., Butters, N., Nelson, A., Elazari, A., Sellars, A., Ellis, C. J., Sherrets, D., Song, D., Geiger, H., Cohen, I., McIlvenny, L., Narayanan, A. (2025, March 21). In-house evaluation is not enough: Towards robust third-party flaw disclosure for general-purpose AI. arXiv.org.
153 Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O’Keefe, C., Whittlestone, J., Avin, S., Brundage, M., Bullock, J., Cass-Beggs, D., Chang, B., Collins, T., Fist, T., Hadfield, G., Hayes, A., Ho, L., Hooker, S., Horvitz, E., Kolt, N., Wolf, K. (2023, July 6). Frontier AI regulation: Managing emerging risks to public safety. arXiv.org.