Page 48 - Crowdsourcing AI and Machine Learning solutions for SDGs - ITU AI/ML Challenges 2024 Report
Table 3: Data Classification Categories (continued)
Secret: Also known as “personal” or “confidential”, this category comprises
highly sensitive information whose disclosure may cause serious distress,
increase risk to an individual’s safety, violate an individual’s privacy,
or affect organizations’ compliance with privacy regulations. This
includes personal data that could identify an individual
(either on its own or if combined with other data sets), and
protection incident management information.
NOTE – Sharing this kind of data should be avoided.
In order to determine the sensitivity level of a dataset/information type, it is recommended that
the data owner perform a classification of data and risk assessment on the potential impact of
the disclosure of each dataset/information type.
For the ITU AI/ML Challenge, we are interested in data that is classified as open or restricted.
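The acceptance rule above can be sketched as a simple filter. This is a minimal illustration, assuming each dataset carries a single classification label assigned by the data owner. The labels "open", "restricted" and "secret" come from this section; the `Dataset` type and the `is_accepted_for_challenge` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Labels named in this section; Table 3 may define further categories.
ACCEPTED_CLASSIFICATIONS = {"open", "restricted"}


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record for a dataset offered to the challenge."""
    name: str
    classification: str  # label assigned by the data owner


def is_accepted_for_challenge(dataset: Dataset) -> bool:
    """Return True only for datasets classified as open or restricted."""
    label = dataset.classification.strip().lower()
    return label in ACCEPTED_CLASSIFICATIONS
```

Any label outside the accepted set, including "secret", is rejected by default, so an unrecognised or misspelled classification is never treated as shareable.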
2 Options for hosting “restricted data” for AI/ML in the 5G Challenge
Data providers who would like to share data under the “restricted data” category can choose
from the following options:
Option-1: Self-hosted
• Data providers host the ML sandbox, including toolsets (e.g. for training) and data handling.
These will be on the data providers’ premises.
NOTE – The ML sandbox is defined in [ITU-T Y.3172].
• According to step-7 of the “data sharing guideline”, user agreements are drafted for
access to this ML sandbox. For example, downloading data may be prohibited.
• Based on discussions with participants, ITU compiles a list of participants interested in
the problem statement (specific to the data provider) and discusses it with the data
provider.
• The data provider shortlists the candidates who can access the restricted data.
• The user agreement is signed, which makes the participants eligible to compete in the
challenge using the restricted data.
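The access steps listed above can be sketched as an eligibility check. This is a minimal illustration under stated assumptions, not a description of any ITU system: the `Participant` record, its field names and the `may_use_restricted_data` helper are hypothetical and only mirror the three gates in the list (interest recorded by ITU, shortlisting by the data provider, signed user agreement).

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical record tracking one participant's progress."""
    name: str
    expressed_interest: bool = False  # listed by ITU for the problem statement
    shortlisted: bool = False         # selected by the data provider
    agreement_signed: bool = False    # user agreement for ML sandbox access


def may_use_restricted_data(participant: Participant) -> bool:
    """All three steps must be complete before the restricted data is used."""
    return (
        participant.expressed_interest
        and participant.shortlisted
        and participant.agreement_signed
    )
```

Requiring all three flags means a shortlisted participant who has not yet signed the agreement is still refused, which matches the order of the steps.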
Option-2: ITU hosted
• Data providers instantiate the ML sandbox, including toolsets (e.g. for training) and data
handling. These will be on ITU premises (Geneva).
• All other steps remain the same as in Option-1.
NOTE – In this option, ML sandbox maintenance is taken care of by ITU.
NOTE – The ITU-hosted ML sandbox may be reused in future editions of such challenges.
NOTE – ITU may facilitate sharing of data between data providers and eligible participants. This may eliminate
the need for each participant or team to negotiate with the data provider individually.
3 Risk assessment
Risk assessment must be carried out at an institutional level because data sensitivity is:
• Contextual: What may not constitute sensitive data and information in one context may
be sensitive in another.