Detecting deepfakes: Generative AI uptake casts doubt on multimedia content

By Alessandra Sala, Senior Director of AI and Data Science, Shutterstock

The explosive rise of artificial intelligence (AI) technologies with deep learning capabilities has escalated risks in the digital world to unprecedented levels.

Synthetic AI-generated media – “deepfakes” for short – are spreading misinformation and becoming increasingly hard to detect, whether by the human eye or by existing cybersecurity systems.

Deepfakes – whether video, images, text, or audio – have blurred the line between what is real and what is fake.

In response, governments and international organizations are striving to set out practical policy measures, codes of conduct and regulations to enhance the security and trust of AI systems.

Building trust in digital systems

Deepfakes can be used for harmless purposes such as social media memes. Increasingly, however, cybercriminals exploit them to imitate voices and faces and breach security barriers.

One tactic is to impersonate executives or staff in video or audio and effectively steal their credentials. Such attacks can target automated systems that authenticate users by voice recognition, as well as employees themselves.

Political leaders could also potentially be targeted this way, with far-reaching international consequences.

While deepfakes are a threat in both developed and developing countries, the danger can be exacerbated in countries with low levels of digital literacy. Manipulated content can, for example, reinforce societal biases and stereotypes, trigger gender-based violence, or escalate ethnic, religious, and political divisions.

Safeguarding authenticity in the age of generative AI

With the growing prevalence of generative AI, human content creators may struggle to attest to and defend ownership of their works.

Verifying the authenticity and ownership of multimedia assets is vital to protect the digital rights of people and companies alike.

The provenance of data used to train AI models is equally crucial from the perspective of maintaining transparency. Knowing where data comes from helps to establish the authenticity and reliability of AI-generated content – and anticipate potential issues surrounding accuracy, bias, or licensing.

Generative AI developers are creating training datasets using material that has been scraped from the web, often without clarity about the consent of copyright holders. The trained AI model will subsequently generate new content based on those materials, potentially infringing on the rights of the copyright holder.

For now, not much can be done to protect copyrighted works from being fed into generative AI models during training.

Still, developers should ensure that they remain in compliance with applicable laws on data acquisition. The European Union’s new AI Act, for instance, requires AI systems to be transparent and comply with copyright legislation.

Developers seeking to enrich their AI training data, therefore, may need to compensate intellectual property owners, either through licensing or revenue sharing.

Consumers, in turn, should be able to determine whether AI systems were trained on protected content, review terms of service and privacy policies, and avoid generative AI tools that lack official licensing or clear compliance with open-source licenses.

Opt-ins and -outs

Companies mindful of intellectual property concerns are giving artists the opportunity to opt out of having their work used to train AI image generators, with opt-out decisions reflected in the next iteration of the image generator.

But this still leaves the onus on content creators to protect their intellectual property, rather than requiring AI developers to secure intellectual property rights before using any pre-existing work.

Instead, companies should require the creator’s opt-in from the very beginning.

Going forward, ethical AI developers would provide mechanisms to disclose the provenance of AI-generated content, with full transparency about all content that went into the training data.

Such information would make authenticity verifiable. It would also protect business users of AI-generated content from the risk of intellectual property infringement.
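To make the idea concrete, the sketch below shows what a minimal, machine-readable provenance record for an AI-generated asset might look like. The field names, helper function, and example values are hypothetical illustrations, not taken from any published standard.

```python
# Minimal sketch of a provenance manifest for an AI-generated asset.
# All field names and values are hypothetical, for illustration only.
import hashlib
import json
from datetime import datetime, timezone

def build_provenance_manifest(asset_bytes: bytes, model_name: str,
                              training_sources: list[str]) -> str:
    """Bundle a content hash with model and training-data attribution."""
    manifest = {
        # SHA-256 of the asset binds the record to this exact piece of content
        "asset_sha256": hashlib.sha256(asset_bytes).hexdigest(),
        "generator": {"model": model_name, "type": "generative-ai"},
        # e.g. licensed collections or opt-in contributor datasets
        "training_data_sources": training_sources,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(manifest, indent=2)

print(build_provenance_manifest(
    b"<image bytes>",
    "example-image-model-v1",
    ["licensed-stock-collection", "opt-in-contributor-set"],
))
```

In practice, such a record would also need to be cryptographically signed so that downstream users could verify it has not been altered after the content was generated.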

Multimedia authenticity protocols and watermarking call for widely recognized international technical standards. This is a field with significant opportunities for collaboration among AI developers and regulators, as well as academia, businesses, content creators, and AI users of all kinds.
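As a rough illustration of the kind of technique such standards would need to cover, the sketch below embeds and recovers an invisible watermark using a simple least-significant-bit scheme. The function names are hypothetical, and production watermarks rely on far more robust, tamper-resistant methods than this toy example.

```python
# Toy sketch of invisible image watermarking via least-significant-bit (LSB)
# embedding, assuming an 8-bit grayscale image stored as a NumPy array.
import numpy as np

def embed_watermark(pixels: np.ndarray, message_bits: np.ndarray) -> np.ndarray:
    """Hide message_bits in the least significant bit of the first pixels."""
    flat = pixels.flatten()  # flatten() returns a copy, so the input is untouched
    flat[: len(message_bits)] = (flat[: len(message_bits)] & 0xFE) | message_bits
    return flat.reshape(pixels.shape)

def extract_watermark(pixels: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the hidden bits back out of the least significant bits."""
    return pixels.flatten()[:n_bits] & 1

# Example: mark an 8x8 grayscale image with the bit pattern 10110010.
image = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
marked = embed_watermark(image, bits)
assert np.array_equal(extract_watermark(marked, len(bits)), bits)
```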

During the AI for Good Global Summit, a full-day workshop dives into multimedia authenticity issues, with a focus on international standards, the use of watermarking technology, enhanced security protocols, and cybersecurity awareness.

The free workshop, open to anyone interested, takes place on 31 May in Geneva, Switzerland, and online.

The session brings together leading experts to discuss recent research findings and ongoing standardization efforts, and to create a collaborative platform for addressing current gaps. It also aims to develop recommendations for practical action and to encourage further investment in this field.

Register for the workshop here: Detecting deepfakes and Generative AI: Standards for AI watermarking and multimedia authenticity.

Learn more about the AI for Good Global Summit.

In another blog post, Alessandra Sala delves into digital watermarks and how they can ensure authenticity in AI-generated multimedia content.

Header image credit: Adobe Stock/AI generated
