1. Introduction
Text documents are used in many domains to present information. Although structured models of information exist, unstructured text documents are more user-friendly. Text documents are generally grouped by the spatial or thematic terms present in them. Spatial information conveyed through text documents often involves ambiguous, fuzzy descriptions and fuzzy spatial adjectives (Mehta et al., 2011a, 2011b). Previous studies show that queries on text documents based on spatial references constitute more than 80% of the total queries; this is attributed to the spatial references in the content of documents.
The difficulty in comprehending these text documents lies in the uncertainty, vagueness or ambiguity present in them. Although the natural language processing literature defines various forms of uncertainty and vagueness, this paper focuses on uncertain spatial references alone. Here, uncertainty means that it is not certain whether a term is spatial in nature. Documents may be classified by thematic concept or by spatial references, but very few classification approaches are based on spatial references.
Fuzzy logic, an extension of crisp Boolean logic, accommodates the fuzziness of an element belonging to a set (Kanagavalli & Raja, 2010; Song & Croft, 1999). This work applies fuzzy logic techniques to resolve the uncertainty of the spatial component present in the text and to find the degree of spatial similarity between documents. The basic idea is to use the words adjacent to an uncertain spatial reference to determine the degree of confidence that the term is a spatial term. This confidence is expressed by the fuzzy values of the uncertain spatial references.
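The idea of deriving a confidence value from neighboring words can be illustrated with a minimal Python sketch. It is not the method of this paper: the cue words, their weights, the window size, and the function name `spatial_confidence` are all hypothetical, and the fuzzy OR (maximum) is only one of several possible ways to combine evidence.

```python
# Hypothetical cue words and weights, chosen only for illustration;
# the source text does not list the actual cues or their fuzzy values.
SPATIAL_CUES = {"near": 0.9, "north": 0.8, "in": 0.6, "at": 0.5, "from": 0.5}


def spatial_confidence(tokens, index, window=2):
    """Return a fuzzy membership in [0, 1] that tokens[index] is spatial.

    The value is the fuzzy OR (maximum) of the cue weights found among
    the words within `window` positions of the candidate term.
    """
    start = max(0, index - window)
    neighbors = tokens[start:index] + tokens[index + 1:index + 1 + window]
    weights = (SPATIAL_CUES.get(word.lower(), 0.0) for word in neighbors)
    return max(weights, default=0.0)


# "Washington" is ambiguous: it may name a place or a person.
place = "The meeting was held near Washington last week".split()
person = "Washington signed the bill".split()

place_score = spatial_confidence(place, place.index("Washington"))
person_score = spatial_confidence(person, person.index("Washington"))
print(place_score, person_score)
```

Under these assumed weights, the first occurrence receives a high membership because "near" is adjacent to it, while the second receives zero because none of its neighbors is a cue word.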
This research presents the Fuzzy Extraction (Rahpeyma & Zarei, 2018), Resolving and Clustering (FERC) design, which handles uncertain spatial information that is queried explicitly and clusters the documents based on the spatial keywords present in them. The system also compares two given documents based on the crisp and fuzzy spatial references found in them and presents a fuzzy similarity score. This enables the user to find documents that describe similar locations. The fuzzy similarity values are displayed along with the document identification numbers.
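A fuzzy similarity score between two documents could be computed along the following lines. This is a sketch under stated assumptions, not the FERC measure itself, which is not specified in this section: each document is assumed to be represented as a fuzzy set mapping a place name to a membership value (1.0 for a crisp reference, less than 1.0 for an uncertain one), and the score is a fuzzy Jaccard ratio. The function name `fuzzy_similarity` and the sample documents are hypothetical.

```python
def fuzzy_similarity(doc_a, doc_b):
    """Fuzzy Jaccard similarity between two fuzzy sets of spatial terms.

    Each argument maps a spatial term to its membership value in [0, 1].
    Intersection uses the minimum and union uses the maximum, which are
    the standard fuzzy set operators.
    """
    terms = set(doc_a) | set(doc_b)
    intersection = sum(min(doc_a.get(t, 0.0), doc_b.get(t, 0.0)) for t in terms)
    union = sum(max(doc_a.get(t, 0.0), doc_b.get(t, 0.0)) for t in terms)
    return intersection / union if union else 0.0


# Hypothetical documents: "washington" is an uncertain reference in doc_1.
doc_1 = {"chennai": 1.0, "washington": 0.5}
doc_2 = {"chennai": 1.0, "madurai": 1.0}

score = fuzzy_similarity(doc_1, doc_2)
print(f"doc_1 vs doc_2: {score:.2f}")
```

With this representation, an uncertain reference contributes less to the union than a crisp one, so two documents that differ only in low-confidence terms score as more similar than two that differ in crisp terms.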
1.1. Information Retrieval
Information retrieval differs from information extraction in that information extraction concentrates on converting unstructured information into structured information, which can then be stored in predefined templates. Information retrieval, on the other hand, concentrates on retrieving or identifying the documents in the repository that match the user requirements. It may involve other activities such as finding a suitable index for effective querying of information, comparing the information present in the corpus, and using feedback from the user to fine-tune the results of the query. The information retrieval task may be used for summarization of text, clustering of documents, question answering, etc. The response time and the quality of the results obtained are used in evaluating the efficiency of an information retrieval system. The response time depends on factors such as the size of the corpus, the indexing mechanism used, and the type of query posed to the system. The quality of the results is measured by the performance metrics recall and precision. Recall is defined as the ratio of the number of relevant documents retrieved to the total number of relevant documents.
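The recall definition above, together with the standard definition of precision (the ratio of relevant documents retrieved to the total number of documents retrieved), can be written as a short Python sketch. The function name `precision_recall` and the document identifiers are hypothetical and serve only as an illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: four documents returned, three are relevant
# in the corpus, and two of the relevant ones were actually returned.
retrieved_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d1", "d3", "d5"]

precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved.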