Page 758 - AI for Good Innovate for Impact


(continued)

Item Details
Meta Data Audio, Text (raw, processed)

Model Training and Fine-tuning
This project develops a speech-to-text language model that transcribes
audio of Kurdish speakers into text. The approach involves several
sub-activities: domain-specific data collection, data validation and review,
pre-processing, alignment of audio and transcripts, review of the selected
technology, language model development and fine-tuning, and
language model testing and evaluation.
Recent speech-to-text systems often use AI/ML technologies such as
Transformer models and Natural Language Processing (NLP) techniques.
Transformer models, like those used in OpenAI’s Whisper and Meta’s
Wav2Vec2.0, are popular due to their ability to handle long-range
dependencies in data. NLP techniques are crucial for understanding and
processing the linguistic aspects of speech.
To address the challenges of low-resource languages, the project leverages
multilingual models pre-trained on a wide range of languages, which can
improve performance on low-resource languages. Data augmentation
techniques, such as adding noise or changing pitch or speed, are used to
create more training data.
Including publicly available data, such as Mozilla’s Common Voice, can
provide a diverse set of voice samples that grow the dataset for training
and fine-tuning the language model. Fine-tuning involves using
frameworks like Hugging Face’s Transformers to adapt pre-trained
models to the specific dataset. We will ensure data privacy is maintained
by running training locally.
By integrating these technologies and approaches, the project aims
to develop an efficient and effective speech-to-text model for Kurdish
and its dialects, enhancing data processing efficiency and contributing
to industry innovation and infrastructure development.

Testbeds of Pilot Deployment
• A fine-tuned version of openai/whisper-small [3].
• A fine-tuned version of facebook/w2v-bert-2.0 on the
common_voice_16_0 dataset [4].

Code Repositories See above.
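
The data augmentation step described under Model Training and Fine-tuning (adding noise, changing speed) can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the 20 dB signal-to-noise ratio, and the 0.9 and 1.1 speed rates are assumptions chosen for illustration, and the code uses only the Python standard library on a synthetic tone. Pitch shifting is omitted, and a real pipeline would more likely rely on a dedicated audio library.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Return a copy of `samples` with white Gaussian noise at the given SNR (dB)."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def change_speed(samples, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip.

    Naive resampling like this also shifts pitch. Changing speed while
    keeping pitch fixed needs a time-stretching algorithm instead.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not samples:
        return []
    n_out = max(1, int(round(len(samples) / rate)))
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = int(pos)
        if lo >= len(samples) - 1:
            # Past the last interpolation interval: hold the final sample.
            out.append(samples[-1])
            continue
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


def augment(samples, rng, snr_db=20.0, rates=(0.9, 1.1)):
    """Return one noisy copy plus one resampled copy per speed rate."""
    variants = [add_noise(samples, snr_db, rng)]
    variants.extend(change_speed(samples, r) for r in rates)
    return variants


# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
variants = augment(tone, random.Random(0))
print([len(v) for v in variants])
```

Each original clip yields three additional training examples here. The transcript stays unchanged for all of them, which is what makes this kind of augmentation cheap for a low-resource language.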

2 Use Case Description

2.1 Description

In the Kurdistan Region of Iraq and neighboring countries such as Syria, Türkiye, and Iran, as
well as among Kurdish immigrants in Europe, various dialects of the Kurdish language are
spoken. The UNHCR collects information from these communities through unstructured text
and interview audio recordings in Kurdish. The manual processing and assessment of this data
are time-consuming and often result in incomplete analysis.

The primary aim of this project is to develop a Kurdish speech-to-text language model integrated
into KoboToolbox. This will enable UNHCR colleagues to focus on data analysis rather than
