Tuk Gwet: Fixing the Digital Future for Societies Excluded by Language


Kamusi Project International

Session 278

Monday, 3 April 2023 13:00–14:00 (UTC+02:00) Thematic Workshop

Addressing linguistic inequity through ICT: approaches to collecting and sharing the data necessary to enable non-lucrative languages as equal players in the ICT universe

Speakers of most of the world’s 7000 languages are excluded from participation in global engines of knowledge and prosperity because their languages do not have a meaningful digital presence. Language inequity is not accidental; colonial languages that were imposed on Africa at the Berlin Conference of 1884 remain the foreign languages in which most African students must traverse their secondary studies, and children who speak disempowered languages worldwide share similar stories of violent linguistic suppression across a similarly long time span. This inequity has now been embedded in the tools and technologies available to languages in the digital age. The languages with the most political and economic power have benefited from decades of research and investment. Speakers of languages that are not favored by those who control the purse strings are effectively blocked from digital pathways toward most Sustainable Development Goals. This affects not only endangered languages that contain a wealth of vanishing human patrimony, but also demographically strong languages with tens of millions of speakers and growing.


Digital technologies have changed lives in ways too numerous to count. Whether consulting a doctor through e-health, applying for a business license through e-governance, or studying STEM online, speakers of lucrative languages experience a world highly mediated by language technologies. The path to success is well trodden: assemble copious linguistic data, such as words and language models, in forms that are compatible with existing natural language processing techniques, then put that data to work in well-developed tools. The missing elements for most languages to join this path are the money and attention to acquire the data, the belief that digital linguistic equity is a realistic prospect, and, critically, the commitment to linguistic equity as a development priority.


This panel will examine language equity through the lens of a concept described by a term from the Basaa language of Cameroon: tuk gwét. Tuk gwét refers to waging war against an injustice in order to repair, not to punish, with the aim of preventing it from happening in the future. Without concentrating on the history that has created the current linguistic digital divide, the panel will focus on methods that can be deployed going forward for non-privileged languages to leapfrog into advanced language technologies. This will include the technical environment, such as the Platform for African Language Empowerment that the African Academy of Languages proposes for the African Union with technology and data from Kamusi.org and other technology and linguistic partners. It will also include the financial and ideological transformations that must occur in order to move from rhetoric to reality, for any and all languages that have been suppressed or ignored by those guiding research and investment from centuries past to the present day.

 


Martin Benjamin
Executive Director, Kamusi Project International (Switzerland), Moderator

Martin Benjamin is the founder and director of the Kamusi Project, an international non-profit dedicated to producing dictionary and data resources for languages worldwide, which was recognized as a Big Data "lighthouse project" by the White House Office of Science and Technology Policy. He is also a technology and language expert for ACALAN, the African Union's African Academy of Languages. His PhD in Anthropology from Yale (2000) examined aid in rural Tanzania. He hosts a YouTube channel as the "Pirate Professor" that explores aspects of language technology and linguistic equity, and runs Kamusi Labs, a virtual lab with a special focus on training a cadre of young African technologists to build advanced language technology tools for languages outside the mainstream of research and investment.


Parameswari Krishnamurthy
Assistant Professor, Language Technologies Research Centre, International Institute of Information Technology, Hyderabad (India)

Parameswari Krishnamurthy is a computational linguist currently working as an Assistant Professor at the Language Technologies Research Centre at the International Institute of Information Technology, Hyderabad (IIIT Hyderabad). Her research focuses on building language technology tools and applications for Indian languages. She has worked on machine translation systems, shallow parsers, and syntactic and semantic parsing.


Onyothi Nekoto
Namibia

Emmanuel Ngué Um
Associate Professor of African Languages and Linguistics, University of Yaoundé 1 & University of Bertoua (Cameroon)

Emmanuel Ngué Um is a linguist and the current Head of the Department of Cameroonian Languages and Cultures at the University of Bertoua. His research lies at the intersection of languages, cultures and digital technologies. He is an Associate Editor of the International Journal of Humanities and Arts Computing (IJHAC). He is also a language activist and a member of the Governance Committee of the Endangered Languages Project.


Claudia Pozo
Language Justice Coordinator, Whose Knowledge? (Bolivia)

Claudia Pozo, Languages Coordinator of Whose Knowledge?, is a Bolivian brown feminist and human rights technologist. She is a multifaceted activist, social scientist and strategist whose work is grounded in situated struggles across Latin America. She is a tech person who has worked as a web developer and content producer across diverse formats and in multiple languages for over 15 years. Claudia is passionate about Free/Libre and Open Source Software, community radio and digital art, and is a strong advocate for privacy and security online. She holds an MPhil in Development Studies and a BA in Communications. More recently, Claudia has become a council member for the Global Fund for Women’s Artist Changemaker Program. When not trekking in the mountains, you can find her drinking black tea and datamoshing late at night.


Topics
Big Data, Cultural Diversity, Digital Divide, Digital Inclusion, Education, Infrastructure
WSIS Action Lines
  • C2. Information and communication infrastructure
  • C3. Access to information and knowledge
  • C6. Enabling environment
  • C7. ICT applications: benefits in all aspects of life (E-learning)
  • C8. Cultural diversity and identity, linguistic diversity and local content
  • C11. International and regional cooperation

Language technology is fundamental infrastructure for any action that requires citizen participation. If people cannot understand, they cannot engage, and most people do not understand technologically endowed languages at a level that enables their participation. Attention to language therefore addresses Action Lines C2 and C6 by directly enabling the ICT infrastructure for previously excluded peoples. Information and knowledge (C3) can only be transmitted through a language that a person knows comfortably, including but not limited to e-learning and all the other e-’s of Action Line C7. Linguistic diversity (C8) is at the heart of these goals, including preservation of endangered languages and incorporating large but resource-bereft languages into the digital future. The proposed ACALAN Platform for African Language Empowerment is an example of cooperation toward agreed language goals among the African Union’s member states (C11). To date, however, equality for non-European languages is mostly unsupported by government and private funders in the global North, who espouse “cooperation” in their public rhetoric while aggressively funding digitalization in their own national languages. “Tuk gwét” looks toward solutions through which those who steer the agenda and those who value non-lucrative languages share responsibility for enabling an equitable world going forward. The aim is for people to participate in all the Action Lines from which they are currently blocked because their mother tongues have been passively or intentionally left outside the historical scope of research and investment in language technology.

Sustainable Development Goals
  • Goal 4: Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all
  • Goal 8: Promote inclusive and sustainable economic growth, employment and decent work for all
  • Goal 9: Build resilient infrastructure, promote sustainable industrialization and foster innovation
  • Goal 10: Reduce inequality within and among countries
  • Goal 16: Promote just, peaceful and inclusive societies

The link between language and sustainable development has rarely been articulated on a global stage. Development agencies often find means to address particular language needs on an ad hoc basis, such as local translators or the occasional localized interface for a digital app. However, the production of a viable linguistic infrastructure is outside the scope of the mandates of most organizations. At the same time, funding for research and development for most languages on an academic or industrial level is paltry or non-existent. This problem is being greatly exacerbated by the current fixation on AI, in which only languages that already have large amounts of digitized data to undergird “large language models” can participate. Moreover, there is a gender dimension in many places, where education in the lucrative languages that lead to political and economic power is often denied to girls, and it garners almost no consideration among planners and policymakers. The need to prioritize language within the development process is manifest, but the actuality of doing so is mostly absent.

 

Links

African Academy of Languages: https://au.int/en/african-academy-languages-acalan
Platform for African Language Empowerment: http://kamu.si/pale-acalan
Kamusi Project: http://kamusi.org
Whose Knowledge?: https://whoseknowledge.org/
A short video regarding STEM education in national languages in Africa and Europe: http://kamu.si/mali-stem