Page 757 - AI for Good Innovate for Impact
Use Case 6: Fine-Tuning ASR Language Models for Underrepresented Low-Resource Languages
4.9: Accessibility
Organization: United Nations High Commissioner for Refugees (UNHCR)
Country: Iraq
Contact Person(s):
Roshna Abdulrahman, abdulrar@unhcr.org
Sofia Kyriazi, kyriazis@unhcr.org
Rebeca Moreno Jimenez, morenoji@unhcr.org
1 Use Case Summary Table
Item Details
Category Accessibility
Problem Addressed In the Kurdistan Region, UNHCR receives a large volume of information
from communities as unstructured text and audio recordings in Kurdish.
These must be processed and assessed manually, which is time-consuming,
so the data is often not fully analyzed.
Key Aspects of Solution The project will deliver a Kurdish speech-to-text (STT) language
model integrated into KoboToolbox, a survey tool used by humanitarian
organizations globally. With transcription handled by the tool, colleagues
can spend their time analyzing the data rather than transcribing it.
Technology Keyword LLMs, GenAI, GPTs, Data Collection, Data Analysis, Low-resource
languages, ASR, STT, NLP
Data Availability Public:
• A multi-language, open-source voice dataset for training
speech-enabled applications [2]
• An experimental Sorani Kurdish dataset that could be used for
speech recognition with CMUSphinx [1]
Private data collection:
• Synthetic storytelling recordings
• Interviews with Kurdish speakers

