Page 758 - AI for Good Innovate for Impact



                      (continued)

                       Item                  Details
                       Metadata               Audio, Text (raw, processed)
                                              This project develops a speech-to-text model that converts audio of
                                              Kurdish speakers into text. The approach involves several
                                              sub-activities: domain-specific data collection, data validation and
                                              review, pre-processing, alignment of audio and transcripts, review of
                                              candidate technologies, model development and fine-tuning, and model
                                              testing and evaluation.
                                              Recent speech-to-text systems typically combine Transformer models
                                              with Natural Language Processing (NLP) techniques. Transformer models,
                                              such as those used in OpenAI’s Whisper and Meta’s wav2vec 2.0, are
                                              popular because self-attention lets them capture long-range
                                              dependencies in the data. NLP techniques handle the linguistic side of
                                              the task, such as normalizing transcripts and modeling likely word
                                              sequences.

                       Model Training and     To address the challenges of low-resource languages, the project
                       Fine-tuning            leverages multilingual models pre-trained on a wide range of
                                              languages, which can improve performance on low-resource languages.
                                              Data augmentation techniques, such as adding noise or changing pitch
                                              or speed, are used to create additional training data.
                                              Including publicly available data, such as Mozilla’s Common Voice,
                                              provides a diverse set of voice samples that enlarges the dataset for
                                              training and fine-tuning the model. Fine-tuning uses frameworks such
                                              as Hugging Face’s Transformers to adapt pre-trained models to the
                                              project’s dataset. We will ensure data privacy by running training
                                              locally.
                                              By integrating these technologies and approaches, the project aims
                                              to develop an efficient and effective speech-to-text model for Kurdish
                                              and its dialects, improving data processing efficiency and contributing
                                              to industry innovation and infrastructure development.

                       Testbeds or Pilot      •  A fine-tuned version of openai/whisper-small [3].
                       Deployment             •  A fine-tuned version of facebook/w2v-bert-2.0, trained on the
                                                 common_voice_16_0 dataset [4].

                       Code Repositories     See above.
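
The augmentation step in the table above names noise, pitch, and speed changes but not the tools used. The following minimal sketch, in plain Python with no audio libraries, illustrates two of these operations on a list of float samples. The helper names `add_noise` and `change_speed`, the white Gaussian noise model, the 16 kHz sample rate, and the linear-interpolation resampling are illustrative assumptions, not the project's implementation.

```python
import math
import random
from typing import List, Sequence

# Hypothetical helpers: the project does not describe its augmentation tooling.


def add_noise(samples: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB."""
    if not samples:
        return []
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def change_speed(samples: Sequence[float], factor: float) -> List[float]:
    """Resample by linear interpolation; factor > 1 gives a shorter, faster clip.

    Like classic speed perturbation, this shifts pitch and tempo together.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out: List[float] = []
    for i in range(n_out):
        pos = i * factor                        # fractional index into the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    sr = 16000  # assumed sample rate (16 kHz is common for ASR models)
    tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]  # 1 s test tone
    noisy = add_noise(tone, snr_db=20)
    faster = change_speed(tone, 1.1)
    print(len(noisy), len(faster))  # 16000 14545
```

Because `change_speed` simply resamples, it changes pitch and tempo together, which is how speed perturbation is commonly applied to speech corpora. Changing pitch while keeping the duration fixed needs a time-scale method such as a phase vocoder; a library routine like `librosa.effects.pitch_shift` is one option if that dependency is acceptable.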

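Fine-tuning a model such as facebook/w2v-bert-2.0 on Common Voice transcripts, as listed in the table, usually relies on a CTC head over a character vocabulary, following the Hugging Face wav2vec2 and w2v-BERT fine-tuning tutorials. The sketch below assumes that setup. The punctuation set, the helper names `normalize` and `build_ctc_vocab`, and the choice of `|` as word delimiter with `[UNK]` and `[PAD]` special tokens are assumptions, not details taken from the project.

```python
import json
import re
import unicodedata
from typing import Dict, Iterable

# Assumed punctuation set: ASCII marks, typographic and angle quotes, the pipe
# (reserved below as the word delimiter), and the Arabic-script comma,
# semicolon, and question mark (U+060C, U+061B, U+061F) used in Sorani text.
PUNCTUATION = re.compile(r'[",.?!;:“”«»()|\u060c\u061b\u061f-]')


def normalize(text: str) -> str:
    """Unicode-normalize (NFC), lowercase, strip punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    text = PUNCTUATION.sub("", text)
    return " ".join(text.split())


def build_ctc_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Assign an integer id to every character in the normalized transcripts."""
    chars = sorted({ch for t in transcripts for ch in normalize(t)})
    vocab = {ch: i for i, ch in enumerate(chars)}
    # Replace the space with a visible word delimiter, keeping its id.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    vocab["[UNK]"] = len(vocab)  # characters unseen during training
    vocab["[PAD]"] = len(vocab)  # padding; also serves as the CTC blank
    return vocab


if __name__ == "__main__":
    sample = ["Silav, heval!", "Tu çawa yî?"]  # illustrative Kurmanji text
    print(json.dumps(build_ctc_vocab(sample), ensure_ascii=False))
```

The resulting dictionary can be saved as `vocab.json` and passed to `Wav2Vec2CTCTokenizer` with `unk_token="[UNK]"`, `pad_token="[PAD]"`, and `word_delimiter_token="|"`. Since Sorani is mostly written in Arabic script and Kurmanji mostly in Latin script, whether to build one shared vocabulary or one per dialect is a design choice the project would need to make.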

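The table lists model testing and evaluation as a sub-activity but does not name a metric. Word error rate (WER), defined as WER = (S + D + I) / N for S substitutions, D deletions, and I insertions against N reference words, is the usual choice for speech-to-text, and character error rate (CER) is often reported alongside it for morphologically rich languages. A minimal, dependency-free sketch follows; the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # At the start of iteration i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, with N the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER: the same ratio computed over characters instead of words."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a x c d"))  # 0.25
    print(character_error_rate("abc", "abd"))
```

Libraries such as `jiwer` provide the same metrics; a hand-rolled version is mainly useful for cross-checking results or for applying custom text normalization (for example, stripping punctuation consistently across dialects) before scoring.
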
                      2      Use Case Description


                      2.1     Description

                      In the Kurdistan Region of Iraq and neighboring countries such as Syria, Türkiye, and Iran, as
                      well as among Kurdish immigrants in Europe, various dialects of the Kurdish language are
                      spoken. The UNHCR collects information from these communities through unstructured text
                      and interview audio recordings in Kurdish. The manual processing and assessment of this data
                      are time-consuming and often result in incomplete analysis.

                      The primary aim of this project is to develop a Kurdish speech-to-text language model integrated
                      into KoboToolbox. This will enable UNHCR colleagues to focus on data analysis rather than