Page 38 - The Annual AI Governance Report 2025 Steering the Future of AI
P. 38

The Annual AI Governance Report 2025: Steering the Future of AI



                   AI Safety Institutes:  AI Safety/Security Institutes (AISIs) and their equivalents have been
                   established around the world , including in the US, the UK, the EU, Japan, Singapore,
                                               154
                   Canada, France, Kenya and Australia. These institutes form an international network that aims
                   to accelerate AI safety science and foster a common understanding of best practices.  These      Theme 6: AI Safety
                                                                                               155
                   institutes coordinate research, develop model evaluation tools, and promote interoperability
                   of safety standards, aiming to support rigorous oversight and scientific consensus on AI risks.
                                                                                                     156
                   AI Incident Reporting and Response Systems: Pre-deployment risk management alone is often
                   insufficient, given that very dangerous models may be deployed, or deployed models may
                   become dangerous after release.  An AI incident is defined as an event or series of events
                                                 157
                   involving the development, use or malfunction of one or more AI systems that directly or indirectly
                   leads to harm such as injury, disruption to critical infrastructure, violations of human rights, or
                   damage to property, communities or the environment.  To address this gap, governments and
                                                                   158
                   standard-setting organizations are exploring mechanisms for post-deployment monitoring and
                   response, including incident reporting. The OECD’s Global AI Incident Reporting Framework,
                   released in early 2025, is a step towards standardised, interoperable AI incident reporting
                   worldwide.  The framework is designed to identify high-risk systems, inform real-time risk
                             159
                   management and support mandatory and voluntary reporting via the AI Incidents Monitor
                   (AIM).
                         160

                   6.3  Corporate Risk Mitigation Practices and its Limitations

                   Corporate Technical Safety Research: AI companies such as Anthropic, Google DeepMind,
                   and OpenAI primarily direct their technical safety research towards pre-deployment areas,
                   focusing on model alignment, testing, and evaluation to ensure AI systems behave as intended
                   and to minimise large-scale misuse or accident risks.  Key approaches in this research include
                                                                  161
                   reinforcement learning from human feedback, adversarial testing, red-teaming, and robustness
                   analysis, all aimed at preventing unintended harmful behaviours as AI models become
                   more capable and autonomous.  However, there are significant research gaps in high-risk
                                                 162
                   deployment areas such as healthcare, finance, misinformation, and the handling of persuasive
                   or addictive features, which are often less prioritised due to commercial imperatives.
                                                                                                     163
                   Moreover, the concentration of safety research within a limited number of major corporations
                   can exacerbate these oversights and restrict broader public and academic scrutiny, particularly



                   154   Araujo, R. (2025, April 10). Understanding the first wave of AI safety Institutes: characteristics, functions, and
                      challenges — Institute for AI Policy and Strategy. Institute for AI Policy and Strategy.
                   155   Araujo, R. (2025, April 10). Understanding the first wave of AI safety Institutes: characteristics, functions, and
                      challenges — Institute for AI Policy and Strategy. Institute for AI Policy and Strategy.
                   156   Allen, G. C.,  &  Adamson,  G. (2024).  The AI  Safety Institute  International Network: Next steps and
                      recommendations. CSIS.
                   157   O’Brien, J., Ee, S., & Williams, Z. (2023, September 30). Deployment Corrections: An incident response
                      framework for frontier AI models. arXiv.org.
                   158   OECD (2025). Towards a common reporting framework for AI incidents. OECD Artificial Intelligence Papers,
                      No. 34, OECD Publishing, Paris.
                   159   OECD (2025). Towards a common reporting framework for AI incidents. OECD Artificial Intelligence Papers,
                      No. 34, OECD Publishing, Paris.
                   160   OECD AI Policy Observatory Portal. (2014, January 1).
                   161   Buhl, M. D., Bucknall, B., & Masterson, T. (2025, February 5). Emerging practices in frontier AI safety
                      frameworks. arXiv.org.
                   162   Delaney, O. (2025, April 10). Mapping technical safety research at AI Companies — Institute for AI Policy and
                      Strategy. Institute for AI Policy and Strategy.
                   163   Strauss, I., Moure, I., O’Reilly, T., & Rosenblat, S. (2025). The State of AI Governance Research: AI Safety and
                      Reliability in Real World Commercial deployment. Social Science Research Council.



                                                            29
   33   34   35   36   37   38   39   40   41   42   43