Machine Learning Approach for Kashmiri Word Sense Disambiguation

Aadil Ahmad Lawaye, Tawseef Ahmad Mir, Mahmood Hussain Mir, Ghayas Ahmed
Copyright: © 2024 | Pages: 24
DOI: 10.4018/979-8-3693-0728-1.ch006

Abstract

Studying the senses of words in a given text is crucial for analysing and understanding natural languages. The meaning of an ambiguous word varies with its context of use, and identifying its correct meaning in a given context is a well-known problem in natural language processing (NLP) called word sense disambiguation (WSD). In this chapter, the authors discuss important WSD research carried out for different languages using different techniques. They also explore a supervised approach based on the hidden Markov model (HMM) to address the WSD problem in the Kashmiri language, which has received little NLP research. The performance of the proposed approach is examined in detail, along with directions for future improvement. On average, the proposed system achieves accuracy = 72.29%, precision = 0.70, recall = 0.70, and F1-measure = 0.70.
Chapter Preview

Introduction

Natural Language Processing (NLP), an important branch of Artificial Intelligence (AI), enables machines to understand and generate natural languages as humans do (Chowdhary and Chowdhary 2020; Eisenstein 2019; Fanni et al. 2023). To interpret or generate natural language, it is necessary to identify the intended meaning of the words in the given data. However, many words in every natural language are ambiguous and take on different meanings depending on the context in which they are used. These ambiguous words make it harder to interpret the meaning of a natural language text. For example, look at the following two sentences using the ambiguous word “passage”:

This passage is difficult for me to understand. (1)

Don’t bother, he will change with the passage of time. (2)

In sentence (1), “passage” has the sense of “a section in a book”, whereas in sentence (2) it means “the act of passing from one state or place to the next”. Similarly, consider the four Kashmiri sentences below:

نازیٖزو اَپنو یوٚہوٗدٮ۪ن خٲطرٕ سخت رٔویہٕ (3)

Nazeezo apnove yahoodeyen khater sakht ravaye (Transliteration)

سخت تاپَن زٲلۍ ٲس (4)

Sakht tapen zeal aes (Transliteration)

سیٖتاس چھِ پونٛسَن ہٕںٛز سخت ضروٗرت (5)

Sita’s che poonsen hanz sakht zaruret (Transliteration)

سخت مُشکِل حالتَن مَنٛز تہِ چھےٚ نہٕ ڈاکہٕ گٲڑۍ یِوان رَد کَرنہٕ (6)

Sakht mushkil halaten manz te che ne daek gaed yewaan raed karne (Transliteration)

The four Kashmiri sentences (3), (4), (5) and (6) above use the word سخت in four different contexts. In sentence (3) it translates to “strict”, in sentence (4) to “severe”, in sentence (5) to “substantially”, and in sentence (6) to “hard”.

The task of predicting the correct sense of an ambiguous word in given natural language data is called Word Sense Disambiguation (WSD). WSD directly influences NLP applications such as machine translation, question answering, text classification, sentiment analysis, and information extraction and retrieval. It is considered a difficult problem because ambiguity may arise at different levels. Homonymy exists when words share the same spelling and pronunciation but have unrelated senses, for example the “ugly woman” and “flexible container used for carrying personal items” senses of the word “bag”. Polysemy, on the other hand, exists when the different senses of a word are connected. For example, the word “mouse” may refer to an “animal” or a “peripheral connected to a computer”, and these senses are related by a resemblance in shape. The overall WSD process has two steps: first, a list of the possible senses of the target word is collected from a sense inventory; second, the most suitable sense is assigned to the word.
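The two-step process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sense inventory, the gloss words, and the function names are hypothetical, and the second step uses a simple gloss-overlap heuristic (in the spirit of the simplified Lesk method), not the HMM-based approach explored in the chapter.

```python
# Hypothetical toy sense inventory: word -> {sense label: gloss words}.
# The entries are illustrative only and are not taken from the chapter.
SENSE_INVENTORY = {
    "passage": {
        "section of a book": {"book", "text", "read", "understand", "paragraph"},
        "act of passing": {"time", "movement", "journey", "change", "state"},
    }
}


def candidate_senses(word):
    """Step 1: collect the possible senses of the word from the inventory."""
    return SENSE_INVENTORY.get(word.lower(), {})


def assign_sense(word, sentence):
    """Step 2: pick the sense whose gloss shares the most words with the context.

    Returns None when the word is not in the inventory. Ties are resolved
    in favour of the sense listed first.
    """
    senses = candidate_senses(word)
    if not senses:
        return None
    # Lowercase the tokens and strip surrounding punctuation.
    context = {tok.strip(".,!?").lower() for tok in sentence.split()}
    return max(senses, key=lambda sense: len(senses[sense] & context))


if __name__ == "__main__":
    print(assign_sense("passage", "This passage is difficult for me to understand."))
    print(assign_sense("passage", "Don’t bother he will change with the passage of time."))
```

A real system would draw the candidate senses from a resource such as a WordNet and replace the overlap count with a trained model.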

Key Terms in this Chapter

Natural Language Processing: Natural Language Processing is a branch of Artificial Intelligence that facilitates interaction between humans and machines. Its main objective is to let machines understand, generate, and interpret natural languages as humans do.

Ambiguity: Ambiguity is a concept in NLP that describes circumstances where a lexical term, a phrase, or a sentence might have distinct interpretations. Ambiguity may arise at different levels, such as the lexical, syntactic, semantic, or pragmatic level.

Machine Learning: Machine learning is a part of Artificial Intelligence concerned with developing models that let computers learn and make judgments without being explicitly programmed. Machine learning models are designed to improve their performance through experience or exposure to data.

Context-Window: The context window lists the words that surround a particular word within a specified range.

Word Sense Disambiguation: The task of deciding the most relevant sense of an ambiguous term that has several potential meanings or senses is called word sense disambiguation. The relevant sense of the term is decided from the surrounding words.

Cross-Validation: Cross-validation is a technique that gives a reliable estimate of the performance of a machine learning model. It helps in spotting overfitting and in choosing suitable parameters and the best model for the task at hand.

Sense-Inventory: A lexical resource that contains a structured set of senses for words. WordNet is considered the de facto standard sense inventory for English, and similar resources have been developed for other languages.
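Two of the key terms above, Context-Window and Cross-Validation, are easy to illustrate in code. The following is a minimal sketch, not the chapter's implementation: the function names, the default window size of 2, and the use of contiguous (unshuffled) folds are all assumptions made for this example.

```python
def context_window(tokens, index, size=2):
    """Return the words within `size` positions of tokens[index].

    The target word itself is excluded. The window is truncated at the
    start and end of the sentence.
    """
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists for k contiguous folds.

    Every item appears in exactly one test fold. When n_items is not
    divisible by k, the first folds are one item larger.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for fold_size in fold_sizes:
        test = list(range(start, start + fold_size))
        train = list(range(0, start)) + list(range(start + fold_size, n_items))
        yield train, test
        start += fold_size


if __name__ == "__main__":
    # Transliterated tokens from sentence (6); the target word is at index 0.
    tokens = "Sakht mushkil halaten manz te che ne".split()
    print(context_window(tokens, 0))
    for train, test in k_fold_indices(10, k=5):
        print(train, test)
```

In practice the data would usually be shuffled, or stratified by sense label, before being split into folds, so that each fold contains a representative mix of senses.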
