Named Entity Recognition (NER) in Low Resource Languages of Ho

Satya Ranjan Dash, Bikram Biruli, Yasobanta Das, Prosper Abel Mgimwa, Muhammed Abdur Rahmaan Kamaldeen, Aloka Fernando
Copyright: © 2024 | Pages: 26
DOI: 10.4018/979-8-3693-0728-1.ch008

Abstract

The Ho tribe is an indigenous community that primarily inhabits the Indian states of Odisha, Jharkhand, West Bengal, Assam, and Chhattisgarh. The Ho language, which belongs to the Munda branch of the Austroasiatic language family, is their primary means of communication. Warang Chiti is the script used to write the Ho language. With the aim of creating user-friendly tools, applications, and resources that support Ho language users in tasks such as typing, spell-checking, dictionary lookup, text conversion between Unicode and 8-bit encodings, and speech-to-text and text-to-speech conversion, this chapter discusses data augmentation techniques, transfer learning methods, domain adaptation strategies, and the importance of resource creation. It also emphasizes the need for collaborative efforts and community-driven initiatives to advance NER research in low-resource language settings.
Chapter Preview

1. Introduction

The Ho tribe is an indigenous community that primarily inhabits the Indian states of Odisha, Jharkhand, West Bengal, Assam, and Chhattisgarh. Their population is 5 million. The Ho language, which belongs to the Munda branch of the Austroasiatic language family, is their primary means of communication (Kumar & Kumar, 2019). Warang Chiti is the script used to write the Ho language; Pandit Lako Bodra created it in 1954. The tribes of Jharkhand are found primarily in the eastern and western parts of Singhbhum, Ranchi, Dumka, Hazaribagh, Palamu, and Giridih. According to the 2011 census, the total Scheduled Tribe (ST) population of the state is 8,645,042, of whom 7,868,150 live in rural areas and 776,892 in urban areas. Scheduled Tribes accounted for 26.3% of the population in 2001.

The Ho community is a notable Munda tribe in India, predominantly centred in the Chotanagpur Plateau area. They have a rich cultural legacy of traditions, festivals, and ceremonies. They are noted for their art, music, and dance, which reflect their affinity for nature and agrarian life. Social cohesion is a fundamental part of their communal life. Historically, the Ho people have practised agriculture, growing crops such as rice, millet, and pulses, and have made handicrafts such as jewellery and textiles.

However, owing to the lack of documentation and language resources, the Ho language currently faces the threat of extinction. There is therefore growing interest in applying NLP and NER techniques to Ho in order to preserve and document this endangered language. This chapter presents a survey of existing work in this area, along with future research directions. Named Entity Recognition (NER) in low-resource languages presents unique challenges because annotated data and linguistic resources are limited. The chapter investigates the specific difficulties of NER in low-resource languages and explores approaches to addressing them. It discusses data augmentation techniques, transfer learning methods, domain adaptation strategies, and the importance of resource creation (Das & Mandal, 2020). It also emphasizes the need for collaborative efforts and community-driven initiatives to advance NER research in low-resource language settings.

Odisha's population includes many different tribes, of which the Kolha, Ho, and Munda have the largest populations. The Kolha tribe forms a sizable community in the state and is renowned for its distinctive cultural legacy. The Ho tribe, with its own language and customs, is also well represented (Mohanty, 2008), and the Munda tribe, known for its extensive cultural traditions, makes up a further portion of Odisha's tribal population. Other indigenous tribes also live in Odisha, among them the Santal, Oraon, Kondh, Gond, Juang, and several more (Das, 2015). Each tribe has its own language, culture, and traditions, which add to the state's rich cultural diversity. The tribes are spread across different areas of Odisha: some are concentrated in a few districts, while others are dispersed across several. This variety makes Odisha a key centre of indigenous communities in India and adds to the state's cultural richness and legacy (Behera, 2017).

The foundation of human culture and identity is language, which embodies millennia of knowledge, tradition, and history (Panda, 2018). Unfortunately, many indigenous languages are slowly dying out in the face of globalisation and dominant languages, putting the priceless information they contain in peril (Das, 2021). One such endangered language is Ho, a tribal tongue spoken mostly by the Ho tribe in the Indian state of Jharkhand and nearby areas (Tribal Research Institute).

The Ho language is a valuable storehouse of indigenous knowledge, with a distinctive linguistic structure and a vibrant oral tradition (Tribal Language Development Authority). However, efforts to preserve and record the Ho language have encountered significant difficulties as a result of numerous socioeconomic issues and limited technical resources.
