An Efficient Methodology for Resolving Uncertain Spatial References in Text Documents

Raja K., Kanagavalli V. R., Nizar Banu P. K., Kannan K.
DOI: 10.4018/IJSSMET.2020070101

Abstract

In recent decades, documents maintained by industry have increasingly been converted into soft copies, either as structured documents or as e-copies. In text document processing, there are a number of ways to extract the raw data. Because accuracy in finding spatial data is crucial, this domain invites research solutions that provide high accuracy. In this article, the Fuzzy Extraction, Resolving, and Clustering (FERC) architecture is proposed, which uses fuzzy logic techniques to identify and cluster uncertain textual spatial references. When the text corpus is queried with a spatial keyword, FERC returns a set of relevant documents sorted by their fuzzy pertinence score. Any two documents may be compared in light of the spatial references they contain, and their fuzzy similarity score is presented. This makes it possible to find the degree to which the two documents speak about a specified location. Unlike a Boolean search, where a document is rated either relevant or irrelevant, the proposed architecture provides a better result set to the user.

1. Introduction

Text documents are used in multiple domains to present information. Although structured models of information exist, unstructured text documents are more user-friendly. Text documents are generally grouped based on the spatial or thematic terms present in them. Spatial information conveyed through text documents often involves ambiguous fuzzy descriptions and fuzzy spatial adjectives (Mehta et al., 2011a, 2011b). Previous studies show that queries on text documents based on spatial references constitute more than 80% of the total queries; this is attributed to the spatial references in the content of documents.

The difficulty in comprehending these text documents is the uncertainty, vagueness, or ambiguity present in them. Although various forms of uncertainty and vagueness are defined in the natural language processing literature, this paper focuses on uncertain spatial references alone. Here, uncertainty means that it is not certain whether a term is spatial in nature. Documents may be classified based on thematic concepts or on spatial references, but very few classifications are based on spatial references.

Fuzzy logic, an extension of crisp Boolean logic, accommodates the fuzziness of an element belonging to a set (Kanagavalli & Raja, 2010; Song & Croft, 1999). This work applies fuzzy logic techniques to resolve the uncertainty of the spatial component present in the text and to find the degree of spatial similarity between documents. The basic idea is to use the words adjoining an uncertain spatial reference to ascertain the degree of confidence that the term is a spatial term. The confidence is expressed through the fuzzy values of the uncertain spatial references.
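The idea of using adjoining words to score an uncertain term can be illustrated with a minimal sketch. It is not the FERC implementation: the cue words, their weights, the window size, and the function name `spatial_confidence` are all hypothetical, and the fuzzy OR (maximum) used to combine cues is only one of several possible choices.

```python
# Hypothetical cue words and weights, chosen for illustration only;
# they are not taken from the article.
SPATIAL_CUES = {"near": 0.9, "in": 0.6, "at": 0.5, "from": 0.5, "towards": 0.8}


def spatial_confidence(tokens, index, window=2):
    """Return a fuzzy value in [0, 1] that tokens[index] is a spatial term.

    The value is the strongest cue weight found among the words within
    `window` positions of the term (a fuzzy OR over the neighbouring cues).
    """
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    neighbours = [
        tok.lower()
        for pos, tok in enumerate(tokens[lo:hi], start=lo)
        if pos != index
    ]
    weights = [SPATIAL_CUES.get(tok, 0.0) for tok in neighbours]
    return max(weights, default=0.0)


# "Washington" may name a place or a person; the neighbouring words decide.
place = "She moved to a town near Washington last year".split()
person = "Washington signed the bill".split()
print(spatial_confidence(place, place.index("Washington")))
print(spatial_confidence(person, person.index("Washington")))
```

Under these assumed weights, the first sentence yields a high confidence because of the cue "near", while the second yields zero because no spatial cue adjoins the term.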

This research presents the Fuzzy Extraction (Rahpeyma & Zarei, 2018), Resolving and Clustering (FERC) architecture for handling uncertain spatial information that is queried explicitly; it clusters the documents based on the spatial keyword present in them. The system also compares two given documents based on the crisp and fuzzy spatial references found in them, and a fuzzy similarity score is presented. This enables the user to find documents that describe similar locations. The fuzzy similarity values are displayed along with the document identification numbers.
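The preview does not state how the fuzzy similarity score is computed. As an illustration only, the sketch below uses the fuzzy Jaccard index, a common measure for comparing fuzzy sets, and assumes that each document is represented as a mapping from location names to fuzzy values in [0, 1] (1.0 for a crisp reference). The representation, the function name `fuzzy_similarity`, and the sample documents are hypothetical.

```python
def fuzzy_similarity(doc_a, doc_b):
    """Fuzzy Jaccard similarity between two documents.

    Each document is a dict mapping a location name to its fuzzy value.
    The score is sum(min) / sum(max) over all locations in either document.
    """
    locations = set(doc_a) | set(doc_b)
    intersection = sum(
        min(doc_a.get(loc, 0.0), doc_b.get(loc, 0.0)) for loc in locations
    )
    union = sum(
        max(doc_a.get(loc, 0.0), doc_b.get(loc, 0.0)) for loc in locations
    )
    return intersection / union if union else 0.0


# Hypothetical documents: a crisp reference has value 1.0,
# an uncertain reference has a value below 1.0.
doc_1 = {"Chennai": 1.0, "Salem": 0.4}
doc_2 = {"Chennai": 0.5, "Madurai": 0.5}
print(fuzzy_similarity(doc_1, doc_2))
```

A score of 1.0 means the two documents refer to the same locations with the same fuzzy values, and 0.0 means they share no location, so the measure grades document pairs rather than labelling them similar or dissimilar.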

1.1. Information Retrieval and Information Extraction

Information retrieval differs from information extraction in that information extraction concentrates on converting unstructured information into structured information and can store the data in predefined templates. Information retrieval, on the other hand, concentrates on retrieving or identifying the documents in the repository that match the user's requirements. It may involve other activities such as finding a suitable index for effective querying, comparing the information present in the corpus, and using feedback from the user to fine-tune the query results. Information retrieval may be used for text summarization, document clustering, question answering, etc. The response time and the quality of the results obtained are used to evaluate the efficiency of an information retrieval system. The response time depends on factors such as the size of the corpus, the indexing mechanism used, and the type of query posed to the system. The quality of the results is measured by the performance metrics recall and precision. Recall is defined as the ratio of relevant documents retrieved to the total number of relevant documents.
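The two quality metrics can be made concrete with a short sketch using their standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The function name `precision_recall` and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the documents returned by the system.
    relevant:  identifiers of the documents that are actually relevant.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, three relevant in the corpus.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(precision, recall)
```

In this example two of the four retrieved documents are relevant, and two of the three relevant documents were found.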
