could result in substantial property damage and even fatalities. Robustness testing aims to ensure that an AI system behaves correctly in the face of unexpected occurrences, whether they arise from technical faults or targeted attacks.146 Reliability, which is often assessed through robustness, is essential for critical AI systems to prevent potentially fatal failures during real-time operation. Mathematically, robustness can be viewed as a quantifiable measure of trustworthiness, indicating how well a model maintains its expected behaviour despite minor input variations.147 Ultimately, achieving high levels of robustness is essential for building trust and confidence in AI systems used in areas where safety and security are critical, as unpredictable or opaque behaviour is unacceptable.
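
To make this quantitative notion concrete, the short sketch below probes local robustness empirically by sampling small perturbations around an input and checking whether the model's prediction changes. It is an illustrative sketch only: the function name, the epsilon bound and the toy threshold model are assumptions made for the example, not anything specified in this report.

import numpy as np

def is_locally_robust(model, x, epsilon=0.03, n_samples=200, rng=None):
    """Sampling-based probe of local robustness: does `model` keep its
    prediction for every sampled input within an L-infinity ball of radius
    `epsilon` around `x`? Absence of a counterexample is evidence, not proof."""
    rng = rng or np.random.default_rng(0)
    baseline = model(x)
    for _ in range(n_samples):
        perturbation = rng.uniform(-epsilon, epsilon, size=x.shape)
        if model(np.clip(x + perturbation, 0.0, 1.0)) != baseline:
            return False  # a small perturbation changed the prediction
    return True  # no prediction flip found among the sampled perturbations

# Toy usage: a hypothetical one-feature classifier thresholding the mean input.
toy_model = lambda x: int(x.mean() > 0.5)
print(is_locally_robust(toy_model, np.array([0.7, 0.8]), epsilon=0.05))

A formal robustness guarantee would instead require the property to hold for all perturbations in the ball, which sampling alone cannot establish.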

Limitations of Risk Assessments: Risk assessments for AI systems are significantly limited, primarily due to the 'black-box' nature of advanced models, which makes their internal functioning and all potential failure modes difficult to understand fully. For general-purpose AI, it is challenging to exhaustively evaluate all possible downstream use cases and to predict how risks might manifest through complex real-world interactions rather than within the system itself.148 Furthermore, reliable quantitative risk estimation is severely hindered by the limited historical data on AI incidents and the difficulty of assessing low-probability, high-impact events, or 'unknown unknowns'.149 The dual-use nature of many AI capabilities also complicates these assessments, as the same features can be used for beneficial or malicious purposes, blurring the lines of potential harm.150 Consequently, existing methodologies often cannot provide strong assurances or definitive guarantees against all associated harms.
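
As a simple illustration of why sparse incident data is so limiting (a generic statistical example, not a figure from this report), the classical 'rule of three' gives the approximate 95% upper confidence bound on the probability of an event that has never been observed in n independent trials:

# Illustrative only: the 'rule of three' shows how weakly an incident-free
# operating history constrains an estimate of per-trial incident probability.
def rule_of_three_upper_bound(n_trials: int) -> float:
    """Approximate 95% upper confidence bound on the per-trial probability of
    an event that has never been observed in n_trials independent trials."""
    return 3.0 / n_trials

for n in (10, 100, 1000):
    print(f"{n} incident-free trials -> per-trial risk <= ~{rule_of_three_upper_bound(n):.3f} (95% UCB)")

Even a thousand incident-free deployments would therefore bound the per-deployment risk only at roughly 0.3%, which is far too loose a guarantee for low-probability, high-impact harms.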

6.2  Approaches to Mitigating AI Risks


Voluntary Commitments: Voluntary AI safety standards and frameworks are increasingly being adopted to help organisations manage risks, promote the responsible use of AI, and prepare for future regulation. Industry leaders, including Google DeepMind, OpenAI and Anthropic, have developed their own commitments, such as Responsible Scaling Policies and Preparedness Frameworks, which set out risk thresholds and mitigation strategies, particularly for foundation models with dual-use potential.151 While such voluntary measures are seen as a bridge enabling smoother transitions towards future regulatory requirements and fostering continuous improvement, sources indicate that self-regulation alone is unlikely to be sufficient to adequately manage the severe risks posed by highly capable AI models in the long term.152 Government intervention will therefore likely be necessary to ensure compliance with safety standards.153
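
To illustrate the general shape of such commitments (the capabilities, thresholds and mitigations below are hypothetical placeholders, not drawn from any published Responsible Scaling Policy or Preparedness Framework), an 'if-then' commitment pairs a pre-defined capability or risk threshold with a pre-agreed mitigation:

from dataclasses import dataclass

@dataclass
class IfThenCommitment:
    """A single 'if-then' commitment: if an evaluation crosses the stated
    threshold, the pre-agreed mitigation must be applied before further
    scaling or deployment. All values used below are illustrative."""
    capability: str   # what is being evaluated
    threshold: str    # the trigger condition
    mitigation: str   # the pre-committed response

commitments = [
    IfThenCommitment(
        capability="cyber-offence uplift",
        threshold="evaluation score exceeds the agreed critical level",
        mitigation="pause deployment and apply enhanced safeguards",
    ),
    IfThenCommitment(
        capability="autonomous replication",
        threshold="any successful end-to-end demonstration in testing",
        mitigation="halt further scaling pending third-party review",
    ),
]

for c in commitments:
    print(f"IF {c.capability}: {c.threshold} THEN {c.mitigation}")

The value of this structure is that the mitigation is agreed before the threshold is reached, rather than negotiated under commercial pressure once a dangerous capability has already appeared.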


146  Berghoff, C., Bielik, P., Neu, M., Tsankov, P., & Von Twickel, A. (2021). Robustness testing of AI systems: A case study for traffic sign recognition. In IFIP Advances in Information and Communication Technology (pp. 256–267).
147  Braiek, H. B., & Khomh, F. (2024, April 1). Machine learning robustness: A primer. arXiv.org.
148  Mukobi, G. (2024, August 5). Reasons to doubt the impact of AI risk evaluations. arXiv.org.
149  Koessler, L., Schuett, J., & Anderljung, M. (2024, June 20). Risk thresholds for frontier AI. arXiv.org.
150  Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O’Keefe, C., Whittlestone, J., Avin, S., Brundage, M., Bullock, J., Cass-Beggs, D., Chang, B., Collins, T., Fist, T., Hadfield, G., Hayes, A., Ho, L., Hooker, S., Horvitz, E., Kolt, N., Wolf, K. (2023, July 6). Frontier AI regulation: Managing emerging risks to public safety. arXiv.org.
151  Karnofsky, H. (2024). If-then commitments for AI risk reduction. Carnegie Endowment for International Peace.
152  Longpre, S., Klyman, K., Appel, R. E., Kapoor, S., Bommasani, R., Sahar, M., McGregor, S., Ghosh, A., Blili-Hamelin, B., Butters, N., Nelson, A., Elazari, A., Sellars, A., Ellis, C. J., Sherrets, D., Song, D., Geiger, H., Cohen, I., McIlvenny, L., Narayanan, A. (2025, March 21). In-house evaluation is not enough: Towards robust third-party flaw disclosure for general-purpose AI. arXiv.org.
153  Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O’Keefe, C., Whittlestone, J., Avin, S., Brundage, M., Bullock, J., Cass-Beggs, D., Chang, B., Collins, T., Fist, T., Hadfield, G., Hayes, A., Ho, L., Hooker, S., Horvitz, E., Kolt, N., Wolf, K. (2023b, July 6). Frontier AI regulation: Managing emerging risks to public safety. arXiv.org.


