Part 2: Thematic AI Standards Workshops
6 Trustworthy AI testing and validation
The main objectives of the Trustworthy AI testing and validation workshop were:
a) Discuss the research around AI system testing and verification methods
b) Provide an overview of the different methodologies that are used to test and verify AI
systems and their strengths/limitations
c) Identify any gaps in current methodologies for AI system testing and verification
d) Explore examples of some of the methodologies and their applications in AI system testing,
such as Agentic AI testing and LLM security testing
e) Discuss opportunities for international collaboration on AI testing and verification through
an international collaborative platform
Collaboration will be key in developing a shared understanding of what constitutes trustworthy
AI and in sharing lessons learnt about best practices and appropriate technical tools and
standards for AI validation and verification. The workshop's main aims were to provide
information about research trends in AI system testing and verification, covering key methods,
their strengths and limitations, and the opportunities for international collaboration on AI testing.
6.1 AI system testing
The first session discussed the challenges of AI testing and the research currently underway in
the field of trustworthy AI testing.
Princeton University shared their work on testing autonomous driving. Trust can be placed in
AI just as it is in humans. AI becomes trustworthy when models deliver consistent, error-free
responses across different environments and make reliable decisions. When users see an AI
system behaving predictably and dependably, they begin to trust it – just as they would a
reliable person. The first and most critical step towards building trust is ensuring that an AI
performs reliably even when faced with unfamiliar data. It is important that AI not only functions
in controlled lab settings but also delivers consistent results when applied to real-world data. All
too often, we see AI models failing to meet expectations when exposed to real-life conditions,
and that undermines trust.
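As a purely illustrative sketch of this point, the snippet below trains a model on "lab" data and compares its accuracy with its accuracy on shifted "real-world" data; the synthetic data, the model choice, and the 5% drop threshold are assumptions made for illustration and do not come from the workshop.

```python
# Minimal sketch (hypothetical): comparing in-distribution accuracy with
# accuracy under distribution shift, the gap that typically undermines trust.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# "Lab" data: the controlled distribution the model was developed on.
X_lab = rng.normal(0.0, 1.0, size=(2000, 8))
y_lab = (X_lab[:, 0] + X_lab[:, 1] > 0).astype(int)

# "Real-world" data: the same task, but with a shifted, noisier distribution.
X_real = rng.normal(0.5, 1.5, size=(500, 8))
y_real = (X_real[:, 0] + X_real[:, 1] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_lab[:1500], y_lab[:1500])

acc_lab = accuracy_score(y_lab[1500:], model.predict(X_lab[1500:]))
acc_real = accuracy_score(y_real, model.predict(X_real))

print(f"in-distribution accuracy: {acc_lab:.3f}")
print(f"shifted-data accuracy:    {acc_real:.3f}")

# Flag a reliability gap; the 5% threshold is an arbitrary illustration.
if acc_lab - acc_real > 0.05:
    print("warning: performance degrades on unfamiliar data")
```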
To assess the trustworthiness of an AI system, it needs to be examined from the perspectives
of different stakeholders and within its context, or socio-technical ecosystem. This socio-technical
"systems view" can help in understanding the expected behaviour of the AI system across various
input scenarios.
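As one hedged illustration of how such a systems view might be made testable, the sketch below encodes the expected behaviour for a handful of driving scenarios and checks a stand-in policy against them; the scenarios, the braking heuristic, and the pass/fail logic are hypothetical examples, not material presented at the workshop.

```python
# Illustrative sketch only: scenario-based checks of expected behaviour.
# The scenario definitions and the braking rule below are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str                 # operating context, e.g. weather or traffic
    obstacle_distance_m: float
    speed_kmh: float
    expect_brake: bool        # expected behaviour agreed with stakeholders

def should_brake(obstacle_distance_m: float, speed_kmh: float) -> bool:
    """Stand-in for the driving policy under test (hypothetical rule)."""
    stopping_distance = (speed_kmh / 10) ** 2 / 2  # rough heuristic
    return obstacle_distance_m < stopping_distance + 5.0

SCENARIOS = [
    Scenario("clear road, slow speed", 80.0, 30.0, expect_brake=False),
    Scenario("pedestrian close ahead", 10.0, 50.0, expect_brake=True),
    Scenario("highway, distant obstacle", 200.0, 110.0, expect_brake=False),
    Scenario("fog, obstacle at medium range", 40.0, 90.0, expect_brake=True),
]

def run_scenarios(policy: Callable[[float, float], bool]) -> None:
    failures = [
        s.name for s in SCENARIOS
        if policy(s.obstacle_distance_m, s.speed_kmh) != s.expect_brake
    ]
    print(f"passed {len(SCENARIOS) - len(failures)}/{len(SCENARIOS)} scenarios")
    for name in failures:
        print(f"  unexpected behaviour in scenario: {name}")

run_scenarios(should_brake)
```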
For example, in the case of autonomous driving, how should adequate test metrics be defined
for AI system testing, and what are the various contexts to be taken into consideration for AI
safety? This is a difficult and multi-faceted question, requiring conscious intervention at every