Work item:
|
X.sg-sd
|
Subject/title:
|
Security guidelines for synthetic data in the context of AI systems
|
Status:
|
Under study
|
Approval process:
|
TAP
|
Type of work item:
|
Recommendation
|
Version:
|
New
|
Equivalent number:
|
-
|
Timing:
|
2027-Q3 (Medium priority)
|
Liaison:
|
-
|
Supporting members:
|
China Unicom, Alibaba China Co, China Telecom
|
Summary:
|
With the rapid development of AI technology, the demand for data, especially high-quality data, for training and developing large models is growing steadily. However, real-world data suitable for training large models is becoming increasingly scarce, and its use faces many problems.
Synthetic data refers to simulated data generated by computer algorithms. It simulates the distribution and characteristics of real-world data, and new datasets are constructed through mathematical models and generation techniques rather than taken directly from real-world observations or records. Synthetic data provides additional data resources for data analysis, model development, and algorithm testing. In recent years, synthetic data has been widely applied in various AI fields, such as image recognition, robotics, etc. It is an essential tool for solving data bottlenecks. However, the emergence of synthetic data not only brings benefits for AI systems and multiple industries, but also poses a range of security threats and challenges.
Therefore, it is essential for organizations to have effective security guidelines in place to address related security threats and challenges for synthetic data in the context of AI systems.
Although existing ITU-T standards offer solid guidelines for general data security, synthetic data differs from general data in terms of source, generation method, advantages and disadvantages, and application scenarios. Additionally, because synthetic data has been widely applied in multiple industries, there is an urgent need for standards that address it specifically.
This document will provide various application scenarios of synthetic data in the context of AI systems and analyse the security threats and challenges brought by synthetic data. Its focus is to propose corresponding security guidelines for the different application scenarios and security threats. Furthermore, because synthetic data and AI development are strongly correlated, this standard will help ensure the secure use of synthetic data in AI development, expand AI applications, and promote the development of large AI models and related industries.
|
Comment:
|
-
|
Reference(s):
|
|
|
Historic references:
|
Contact(s):
|
|
ITU-T A.5 justification(s): |
|
|
|
First registration in the WP:
2025-04-17 14:34:51
|
Last update:
2025-08-11 17:38:27
|
|