Page 757 - AI for Good Innovate for Impact

Use Case 6: Fine-Tuning ASR Language Models for Underrepresented Low-Resource Languages

4.9: Accessibility

Organization: United Nations High Commissioner for Refugees (UNHCR)

Country: Iraq

Contact Person(s):

Roshna Abdulrahman, abdulrar@unhcr.org
Sofia Kyriazi, kyriazis@unhcr.org
Rebeca Moreno Jimenez, morenoji@unhcr.org

1 Use Case Summary Table

Item: Details

Category: Accessibility

Problem Addressed: In the Kurdistan Region, UNHCR receives a large volume of information from communities as unstructured text and audio recordings in Kurdish. These have to be processed and assessed manually, which is time-consuming, so the data is often not fully analyzed.

Key Aspects of Solution: The project will deliver a language model for Kurdish speech-to-text (STT) that is integrated into KoboToolbox, a survey tool used by humanitarian organizations globally. With this tool, colleagues can spend more time analyzing the data instead of transcribing it.

Technology Keywords: LLMs, GenAI, GPTs, Data Collection, Data Analysis, Low-resource languages, ASR, STT, NLP

Data Availability:
Public:
• A multi-language, open-source voice dataset for training speech-enabled applications [2]
• An experimental dataset of Sorani Kurdish that could be used for speech recognition with CMUSphinx [1]
Private data collection:
• Synthetic storytelling recordings
• Interviews with Kurdish speakers
