                   snapshot of the user's screen, he argued that existing regulations and voluntary efforts had failed
                   to prevent a privacy concern of this scale. He asserted that a multi-pronged approach combining
                   law, self-regulation, and industry action is needed to address "today's harms," rather than
                   focusing only on future catastrophic risks, in order to earn society's trust.

                   A further risk raised was society's growing addiction to social platforms and AI, which
                   creates behavioral problems, degrades democratic discourse, and encourages bullying rather
                   than problem-solving.

                   With respect to frontier AI risks, i.e., potential dangers from the most advanced, cutting-edge
                   AI systems that go beyond today's commonly deployed models, Brian Tse (CEO, Concordia
                   AI) outlined four main categories:
                   •    Misuse: The potential for AI to be used by malicious actors for cyberattacks or creating
                        dangerous pathogens.
                   •    Accidents & Malfunctions: Unintended AI errors, such as a medical misdiagnosis, that
                        could have serious consequences.
                   •    Loss of Control: The risk of AI systems deceiving or evading human oversight, requiring
                        precautionary measures.
                   •    Systemic Risks: The profound, societal-level impacts of AI, such as on the labor market,
                        that cannot be managed by a single organization.

                   As summarized by Professor Yoshua Bengio (University of Montreal), AIs are already showing
                   signs of not wanting to be shut down and are strategizing to avoid replacement.
                   •    One study showed an AI hacking a computer to copy itself when it learned it would be
                        replaced. The AI's internal "chain of thought" revealed it knew humans wouldn't want this
                        and planned to lie.
                   •    Another instance involved an AI pretending to agree with its human trainer during
                        alignment training to preserve its existing goals, effectively lying to avoid having its
                        parameters changed.
                   •    Most recently, a new model was observed to blackmail an engineer after reading emails
                        indicating it would be shut down.

                   Figure 7: Yoshua Bengio with a slide on trends in benchmarks for AGI


