Tuk Gwet: Fixing the Digital Future for Societies Excluded by Language
Kamusi Project International
Session 278
Addressing linguistic inequity through ICT: approaches to collecting and sharing the data necessary to enable non-lucrative languages as equal players in the ICT universe
Speakers of most of the world’s 7000 languages are excluded from participation in global engines of knowledge and prosperity because their languages do not have a meaningful digital presence. Language inequity is not accidental; colonial languages that were imposed on Africa at the Berlin Conference of 1884 remain the foreign languages in which most African students must traverse their secondary studies, and children of disempowered languages worldwide share similar stories of violent linguistic suppression across a similarly long time span. This inequity has now been embedded in the tools and technologies available to languages in the digital age. The languages with the most political and economic power have benefited from decades of research and investment. Speakers of languages that are not favored by those who control the purse strings are effectively blocked from digital pathways toward most Sustainable Development Goals. This exclusion affects not only endangered languages that contain a wealth of vanishing human patrimony, but also demographically strong languages with tens of millions of speakers and growing.
Digital technologies have changed lives in ways too numerous to count. Whether consulting a doctor through e-health, applying for a business license through e-governance, or studying STEM online, speakers of lucrative languages experience a world highly mediated by language technologies. The path to success is well trodden: assemble copious linguistic data, such as words and language models, in forms that are compatible with existing natural language processing techniques, then put that data to work in well-developed tools. The missing elements for most languages to join this path are the money and attention to acquire the data, the belief that digital linguistic equity is a realistic prospect, and, critically, the commitment to linguistic equity as a development priority.
This panel will examine language equity through the lens of a concept described by a term from the Basaa language of Cameroon: tuk gwét. Tuk gwét refers to waging war against an injustice in order to repair, not to punish, with the aim of preventing it from happening in the future. Without concentrating on the history that has created the current linguistic digital divide, the panel will focus on methods that can be deployed going forward for non-privileged languages to leapfrog into advanced language technologies. This will include both the technical environment, such as the Platform for African Language Empowerment that the African Academy of Languages proposes for the African Union with technology and data from Kamusi.org and other technology and linguistic partners, and the financial and ideological transformations that must occur in order to move from rhetoric to reality, for any and all languages that have been suppressed or ignored by those guiding research and investment from centuries past into the current day.

Martin Benjamin is the founder and director of the Kamusi Project, an international non-profit dedicated to producing dictionary and data resources for languages worldwide that was recognized as a Big Data "lighthouse project" by the White House Office of Science and Technology Policy. He is also a technology and language expert for ACALAN, the African Union's African Academy of Languages. His PhD in Anthropology from Yale (2000) examined aid in rural Tanzania. He hosts a YouTube channel as the "Pirate Professor" that explores aspects of language technology and linguistic equity, and runs Kamusi Labs, a virtual lab with a special focus on training a cadre of young African technologists to build advanced language technology tools for languages outside the mainstream of research and investment.

Parameswari Krishnamurthy is a computational linguist who currently works as an Assistant Professor at the Language Technologies Research Centre at the International Institute of Information Technology (IIIT-Hyderabad). Her research focuses on building language technology tools and applications for Indian languages. She has worked on machine translation systems, shallow parsing tools, syntactic parsing, and semantic parsing.


Emmanuel NGUE UM is a linguist and the current Head of the Department of Cameroonian Languages and Cultures at the University of Bertoua. His research interest lies at the intersection of Languages, Cultures and Digital Technologies. He is an Associate Editor of the International Journal of Humanities and Arts Computing (IJHAC). He is also a language activist and member of the Governance committee of the Endangered Languages Project.

Claudia Pozo is Languages Coordinator of Whose Knowledge?. As a Bolivian brown feminist and human rights technologist, Claudia is a multifaceted activist, social scientist and strategist, whose work is grounded in situated struggles across Latin America. She is a tech person who has worked as a web developer and content producer across diverse formats and in multiple languages for over 15 years. Claudia is passionate about Free/Libre and Open Source Software, community radio and digital art, and is a strong advocate for privacy and security online. She holds an MPhil in Development Studies and a BA in Communications. More recently, Claudia has become a council member for the Global Fund for Women’s Artist Changemaker Program. When not trekking in the mountains, you can find her drinking black tea and datamoshing late at night.
- C2. Information and communication infrastructure
- C3. Access to information and knowledge
- C6. Enabling environment
- C7. ICT applications: benefits in all aspects of life (E-learning)
- C8. Cultural diversity and identity, linguistic diversity and local content
- C11. International and regional cooperation
Language technology is fundamental infrastructure for any action that requires citizen participation. If people cannot understand, they cannot engage, and most people do not understand technologically endowed languages at a level that enables their participation. Attention to language therefore addresses Action Lines C2 and C6 by directly enabling the ICT infrastructure for previously excluded peoples. Information and knowledge (C3) can only be transmitted through a language that a person knows comfortably, including but not limited to e-learning and all the other e-’s of Action Line C7. Linguistic diversity (C8) is at the heart of these goals, including preservation of endangered languages and incorporating large but resource-bereft languages into the digital future. The proposed ACALAN Platform for African Language Empowerment is an example of cooperation toward agreed language goals among the African Union’s member states (C11), although equality for non-European languages is to date mostly unsupported by government and private funders in the global North who espouse “cooperation” in their public rhetoric and aggressively fund digitalization in their own national languages. “Tuk gwét” looks toward solutions through which those who steer the agenda and those who value non-lucrative languages share responsibility for enabling an equitable world going forward, so that people can participate in all the Action Lines from which they are currently blocked because their mother tongues have been passively or intentionally left outside the historical scope of research and investment in language technology.
- Goal 4: Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all
- Goal 8: Promote inclusive and sustainable economic growth, employment and decent work for all
- Goal 9: Build resilient infrastructure, promote sustainable industrialization and foster innovation
- Goal 10: Reduce inequality within and among countries
- Goal 16: Promote just, peaceful and inclusive societies
The link between language and sustainable development has rarely been articulated on a global stage. Development agencies often find means to address particular language needs on an ad hoc basis, such as local translators or the occasional localized interface for a digital app. However, the production of a viable linguistic infrastructure is outside the scope of the mandates of most organizations. At the same time, funding for research and development for most languages on an academic or industrial level is paltry or non-existent. This problem is being greatly exacerbated by the current fixation on AI, in which only languages that already have large amounts of digitized data to undergird “large language models” can participate. Moreover, the gender dimension receives almost no consideration among planners and policymakers, even though in many places education in the lucrative languages that lead to political and economic power is often denied to girls. The need to prioritize language within the development process is manifest, but the actuality of doing so is mostly absent.
African Academy of Languages: https://au.int/en/african-academy-languages-acalan
Platform for African Language Empowerment: http://kamu.si/pale-acalan
Kamusi Project: http://kamusi.org
Whose Knowledge?: https://whoseknowledge.org/
A short video regarding STEM education in national languages in Africa and Europe: http://kamu.si/mali-stem