Page 760 - AI for Good Innovate for Impact
Reduced Inequality
By creating tools that can process and interpret audio data in Kurdish, the project helps ensure
that feedback and interviews from Kurdish-speaking populations are accurately captured
and understood. Transcribing voice into text allows for easier translation, classification,
summarization, and integration into data systems such as dashboards and reports. This
amplifies the voices of marginalized communities and enables more inclusive decision-making
processes, ultimately supporting equity and representation.
Partnerships and Collaboration
The project is grounded in collaboration with key partners to maximize impact. One partnership
is with a local university that specializes in Kurdish computational linguistics, providing deep
academic insight and cultural relevance. Another is with Kobo, the organization behind
KoboToolbox, a widely used platform for data collection and visualization. By working with
these partners, the project ensures the practical deployment of the language model within
humanitarian data workflows and promotes adoption at scale through trusted platforms.
2.3 Future Work
Direct next steps:
• Conduct Data Collection: We will commission 10 hours of synthetic data and collect
humanitarian domain-specific audio data using a KoboToolbox survey.
• Develop the Kurdish Speech-to-Text Language Model: We will set up data processing
pipelines, develop and fine-tune the speech-to-text model, and evaluate it with test
data and in real-life testing.
• KoboToolbox Integration: We will create an API and establish a server for hosting
language models, and modify KoboToolbox software to include the custom language
model.
• Test the Integrated Model: We will test the integrated model in real-life settings and
capture feedback.
• Communication and Dissemination: We will prepare a summary report and host a
dissemination conference.
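As an illustration of the evaluation step listed above, speech-to-text output is commonly scored by word error rate (WER): the number of word-level substitutions, insertions, and deletions needed to turn the model's transcript into the reference transcript, divided by the number of reference words. The sketch below is an assumption about how such a check could look, not the project's stated evaluation method; the function name `word_error_rate` and the sample sentences are hypothetical placeholders.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder transcripts (hypothetical, not project data).
reference = "the clinic opens at nine"
hypothesis = "the clinic open at nine today"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

A lower value is better, and a value of 0 means the transcript matches the reference word for word. For a test set with many recordings, the edit counts and reference lengths would typically be summed across recordings before dividing, so that long recordings are not under-weighted.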
Further future work:
• Expanding the model to support additional Kurdish dialects beyond Sorani and Kurmanji,
ensuring broader coverage and usability across different regions.
• Exploring the development of a Kurdish Large Language Model (LLM) capable of
performing basic tasks such as summarization, translation, and Q&A with the transcribed
text. This would enhance the utility of the transcriptions and provide more comprehensive
language support.
• Collaborating with other operations to adapt the developed template for other low-
resource languages. This would involve customizing the model and processes to fit the
specific linguistic and operational needs of different regions and languages.
Our project has the potential for several collaborations and expansions:
• Partnering with other international organizations working in similar fields to share
knowledge, resources, and best practices. This can help in scaling the project to other
regions and similar languages.

