How AI is helping revitalise indigenous languages
By ITU News
Thirty-five years ago, New Zealand adopted a law declaring the official status of Te Reo Māori, the language spoken by the country’s indigenous Māori people.
Decades of repression put the language, also called simply te reo, under serious threat: only one in four Māori spoke it by 1960, with a very low percentage of speakers among children.
Since then, the language has started regaining lost ground, enjoying formally equal status with English and being taught widely to New Zealand schoolchildren.
Still, reviving it as a living language takes time and persistence.
Lately, the nascent te reo renaissance is gaining added momentum with the help of artificial intelligence (AI).
The non-profit media organisation Te Hiku was founded in 1990, three years after the passage of the ground-breaking law, to disseminate indigenous languages and culture by leveraging New Zealand’s radio stations and broadcasting services.
Based on knowledge and trust built among local communities over the last three decades, Te Hiku is now using AI-trained language models to help revitalise te reo.
Today, over 50,000 people are thought to speak te reo “well or very well”, with similar numbers speaking it “fairly well”. However, the majority of active indigenous language users are aged 35 or older.
“It’s not only about teaching the computer to speak our language or about machine learning,” said Te Hiku chief executive Peter-Lucas Jones at a recent AI for Good webinar. “It’s about the survival and the future of our language and our future generations.”
Stewarding indigenous knowledge
A network of 21 radio stations and television channels helped Te Hiku build an extensive audio-visual archive of Māori words, phrases, and idioms. “We spent many years interviewing our elderly about every river, plant, beach,” Jones said. “Anything they would want to talk about.”
Older populations, and especially women, have a key role to play in the transmission of ancestral knowledge, as this year’s International Day of the World’s Indigenous Peoples highlights.
In recent years, Te Hiku has also livestreamed traditional dance competitions, funeral processions, and other community events, aiming to connect with Māori who have migrated to urban areas or abroad, far from traditional territory.
A digital platform set up in 2014 disseminates Te Hiku’s content using new media channels, providing crucial outreach to different generations, including younger people.
Jones and chief technology officer Keoni Mahelona soon realised they would need the help of AI to take things a step further, bringing together years of stored knowledge.
Thus began Korero Māori, an open-source app for collecting oral recordings in indigenous languages. In addition to Te Reo Māori, the project now also includes recordings in Hawai’ian, Cook Islands Māori, and New Zealand English, which will be used to train models through machine learning.
In the first ten days of using the app, Te Hiku’s data science team collected over 300 hours of recordings. This was in addition to 3,500 hours of audio from native speakers with semi-labelled words, phrases and expressions, which allowed the data team to develop its first speech-to-text systems.
Among the project’s latest achievements is its Rongo application for practising Te Reo Māori pronunciation. It aims to “restore the native sound of the language,” and to avoid the assimilation of English as far as possible, explained Jones.
Today, Te Hiku Media is collaborating with local and international data scientists to perfect tools that will allow New Zealanders “to engage with technology in the language they use or aspire to use every day,” for example through voice assistants.
Te Hiku’s work is part of a global movement calling for digital sovereignty for indigenous cultures, Mahelona says.
To succeed, every product or service the organisation puts out must have a positive impact for Māori people. All partnerships with external agencies are protected under the Kaitiakitanga license, which ensures that data sovereignty remains within local communities and prohibits the use of data in applications that surveil, discriminate, or violate human rights.
“We know what it means losing sovereignty,” Jones said. “Data is the new land. Having had our land taken off us, and the experience of language loss in our family, we take data sovereignty very seriously.”
“If we give corporates access to our data, they will definitely profit from it,” Mahelona adds, giving people the “privilege” of using paid products and services created from local knowledge that was passed down for centuries.
Shortly after the first version of the Korero Māori app was developed, for example, a US-based translation company asked Te Reo Māori speakers and academics for voice logs to develop real-time translation services. Te Hiku’s team saw this as a case of “indigenous data theft” and a tactic to profit from indigenous culture and knowledge.
“If we want to use AI for good, rather than big tech gobbling up our data and selling it back to us, we should empower communities to lead their own platforms and solutions to help move their people forward,” Mahelona claims.
For over two decades, the International Telecommunication Union (ITU) has worked to empower indigenous communities through information and communication technologies (ICTs), as reflected in resolutions adopted by ITU members.
ITU Development work promotes the digital inclusion of indigenous peoples through a dedicated programme to foster their socio-economic development and the preservation of their cultural legacy and traditions.
Image credit: coreeducation via Flickr