Page 51 - Crowdsourcing AI and Machine Learning solutions for SDGs - ITU AI/ML Challenges 2024 Report
Data pipelines enable the flow of data from an application to a data warehouse, from a data
lake to an analytics database, or into an ML pipeline system, for example.
Step-6: Label/Tag the data. (optional step) Data labeling is the process of detecting and
tagging data samples. The process can be manual but is usually performed or assisted by
software. Labeled data is a group of samples that have been tagged with one or more labels.
In machine learning, if you have labeled data, that means your data is marked up, or annotated,
to show the target, which is the answer you want your machine learning model to predict. In
general, data labeling can refer to tasks that include data tagging, annotation, classification,
moderation, transcription, or processing. Labeled data highlights data features - or properties,
characteristics, or classifications - that can be analyzed for patterns that help predict the target.
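As an illustration of software-assisted labeling, the following is a minimal Python sketch, not a method prescribed by this report. The `Sample` class, the keyword rules in `RULES`, and the example texts are all hypothetical. Samples that no rule matches stay unlabeled, so they can be passed on for manual labeling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    text: str                    # the raw data sample
    label: Optional[str] = None  # the target; None means not yet labeled


# Hypothetical keyword rules standing in for labeling software.
RULES = {"outage": "fault", "latency": "performance"}


def auto_label(sample: Sample) -> Sample:
    """Return a labeled copy if a rule matches, else the sample unchanged."""
    lowered = sample.text.lower()
    for keyword, label in RULES.items():
        if keyword in lowered:
            return Sample(sample.text, label)
    return sample


samples = [
    Sample("Cell outage reported in sector 3"),
    Sample("High latency on uplink"),
    Sample("Routine status message"),
]

labeled = [auto_label(s) for s in samples]

# Samples the software could not label are left for manual annotation.
needs_review = [s for s in labeled if s.label is None]
```

Here the label plays the role of the target described above, and the text is the feature a model would analyze to predict it.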
Step-7: Draft user agreements. A user agreement is an agreement made between the owner,
administrator, or provider of a service (data owner) and the user of such a service (challenge
participants), that defines the rights and responsibilities of both parties. Privacy policies and
terms and conditions are examples of user agreements.
Step-8: Secure hosting of data. In this step, the data owner or ITU provides a platform for the
challenge to store sensitive data (private or secure data) in a manner compliant with the
entity’s data-sharing policy. Challenge participants can access the secure data hosted on
the platform after signing non-disclosure agreements or user agreements. Access to this data
is controlled by passwords or tokens.
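The access rule in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual implementation: the `SecureDataStore` class, its method names, and the participant identifiers are hypothetical. A token is issued only after an agreement has been recorded, and only a hash of the token is stored on the platform side.

```python
import hashlib
import hmac
import secrets


class SecureDataStore:
    """Hypothetical gatekeeper for data hosted for a challenge."""

    def __init__(self) -> None:
        self._signed = set()     # participants who signed an agreement
        self._token_hashes = {}  # participant id -> SHA-256 hash of token

    def record_signed_agreement(self, participant_id: str) -> None:
        self._signed.add(participant_id)

    def issue_token(self, participant_id: str) -> str:
        # Assumption: tokens are issued only after the agreement is signed.
        if participant_id not in self._signed:
            raise PermissionError("agreement not signed")
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode()).hexdigest()
        self._token_hashes[participant_id] = digest
        return token

    def can_access(self, participant_id: str, token: str) -> bool:
        expected = self._token_hashes.get(participant_id)
        if expected is None:
            return False
        presented = hashlib.sha256(token.encode()).hexdigest()
        # Constant-time comparison avoids leaking information via timing.
        return hmac.compare_digest(expected, presented)
```

A participant who has signed would call `issue_token` once and present the returned token on each request; a real deployment would also need token expiry, revocation, and transport security, which this sketch omits.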