Page 44 - Crowdsourcing AI and Machine Learning solutions for SDGs - ITU AI/ML Challenges 2024 Report

Annex 1: Data

1 Types of data

Three different types of datasets will be offered: real data, open data, and synthetic data. In
some instances, no data will be required to address relevant problem statements.

Real data: This is anonymized network data from operators. The problem statements derived
from this data can span all three tracks but are more likely to play a role in the Network
and Verticals tracks. Network data is sensitive: it cannot be shared on an open platform and
requires a high level of security. However, this type of dataset is important for ML-based
inference in 5G networks. Different security levels for access to training and testing data will
be offered to accommodate privacy concerns: tracks that run with real data (the “secure track”)
will ensure that isolated, segregated sandboxes (see ITU-T Y.3172) and best practices for secure
data handling are in place. Access to this data may be restricted on a role and need basis.
Secure data-handling techniques (see ITU-T Y.3174) will be put in place for the secure track.
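
The role-and-need restriction described above can be illustrated with a minimal sketch. The role names, dataset labels, and the `can_access` helper are hypothetical and chosen only for illustration; they are not taken from ITU-T Y.3172, ITU-T Y.3174, or any Challenge platform.

```python
from dataclasses import dataclass, field

# Hypothetical role names, used only for illustration.
ROLES_ALLOWED_ON_SECURE_TRACK = {"participant", "organizer"}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    # Datasets this user needs for the problem statement they registered for.
    needed_datasets: frozenset = field(default_factory=frozenset)


def can_access(user: User, dataset: str) -> bool:
    """Grant access only if both the role check and the need check pass."""
    has_role = user.role in ROLES_ALLOWED_ON_SECURE_TRACK
    has_need = dataset in user.needed_datasets
    return has_role and has_need


# Hypothetical users and dataset labels.
alice = User("alice", "participant", frozenset({"operator_kpi_train"}))
bob = User("bob", "observer", frozenset({"operator_kpi_train"}))

print(can_access(alice, "operator_kpi_train"))  # role and need both match
print(can_access(alice, "operator_kpi_test"))   # no registered need
print(can_access(bob, "operator_kpi_train"))    # role not permitted
```

Requiring both conditions means that a permitted role alone does not open every dataset, which is one plausible reading of "role and need basis".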

Open data: This is data that is open and freely available on the Internet related to network
operations. This type of data can span across multiple tracks.

Synthetic data: This data comes from simulations. It will be used to solve problems from
different tracks, depending on the application.

No data: In some instances, no data will be required to address relevant problem
statements. An example is a build-a-thon, in which the development of toolsets to support or
enable an end-to-end implementation of AI/ML in 5G networks does not require any data.

2 Data sets

Real data sets: ITU AI/ML Challenge partners provide datasets from real networks in
accordance with relevant privacy policies.

Open data sets: A compiled list of open datasets is made available on the Challenge website.

Synthetic data sets: Simulation platforms with associated data will be provided by ITU AI/ML
Challenge partners.

3 Data privacy policy

Data will be handled in accordance with policies and regulations relevant to the entities and
data concerned. Data may be pre-processed and provided through pre-published APIs, and access
may be secured with a login or token. Data handling APIs (according to ITU-T Y.3174) will be
provided based on the use case and filtered based on the policies of the involved
organization(s). Data anonymization may be applied according to relevant policies and
regulations. A non-disclosure agreement (NDA) may be included in the terms of participation. In
cases where the Challenge involves local user data, the results may be presented in the form of
a competition paper that does not include local user data. API access to data shall be monitored
and licensed based on the agreement. Some test data sets may be private and will not be
disclosed.
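
As a rough illustration of policy-based filtering and anonymization before data is shared, the sketch below drops fields that a policy does not allow and replaces identifiers with a keyed hash. The field names, the allowed-field policy, and both helper functions are assumptions made for this example, not part of ITU-T Y.3174 or of any partner's actual pipeline. Keyed hashing is strictly a pseudonymization technique and would be only one ingredient of a full anonymization process.

```python
import hashlib
import hmac

# Hypothetical field names and policy; a real deployment would take these
# from the data owner's policy, not from this sketch.
IDENTIFIER_FIELDS = {"subscriber_id"}
ALLOWED_FIELDS = {"subscriber_id", "cell_id", "throughput_mbps"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_record(record: dict, secret_key: bytes) -> dict:
    """Drop fields the policy does not allow, then pseudonymize identifiers."""
    out = {}
    for name, value in record.items():
        if name not in ALLOWED_FIELDS:
            continue  # filtered out by policy
        if name in IDENTIFIER_FIELDS:
            value = pseudonymize(str(value), secret_key)
        out[name] = value
    return out


# Hypothetical key and record, for illustration only.
key = b"example-key-kept-by-the-data-owner"
raw = {"subscriber_id": "001010123456789", "cell_id": 42,
       "throughput_mbps": 87.5, "home_address": "not shared"}
shared = prepare_record(raw, key)
print(sorted(shared))
```

Because the digest is keyed, the same identifier maps to the same pseudonym across records, so records stay linkable for training, while anyone without the key cannot recompute the mapping from known identifiers.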
