Automatic Detection of Semantic Clusters in Glossaries

Marcela Ridao, Jorge Horacio Doorn
DOI: 10.4018/978-1-7998-3473-1.ch052

Abstract

Defining the services that a software system will provide, in terms of business process needs, is an important task. As a first step, the requirements engineer must deeply understand the peculiarities of the context in which the future software system will run. Several approaches have been proposed to elicit and model such knowledge. The research whose main results are presented in this chapter was carried out within a process that models all acquired information in natural language. The advantages of natural language are somewhat overshadowed by its ambiguity and by the way the models are arranged. This chapter deals with techniques developed to visualize hidden information in this sort of model, mainly by means of directed graphs and the detection of clusters in them. It extends a previous chapter by improving the visualization technique and adding an automatic cluster detection technique.

Introduction

The most important principle of Requirements Engineering is to perform all of its activities within the framework of the clients' and users' culture. This is the major difference between Requirements Engineering and systems analysis, since the latter models the context in which the future software system will be inserted using computer science techniques, models and resources (Pressman, 2005; Sommerville, 2011). Planning the functionalities that a future software system will provide, in terms of the peculiarities of the business process, is not an easy task for the requirements engineer. As a first step, he or she must deeply understand the peculiarities of the context in which the future software system will run. Several approaches have been proposed to elicit and model such knowledge (Castro, Kolp, & Mylopoulos, 2002; Van Lamsweerde, 2001). Creating a glossary (Weidenhaupt, Pohl, Jarke, & Haumer, 1998) of the vocabulary used in the Universe of Discourse (UofD) at the beginning of the project is one of them. The UofD is the context in which the software is developed. It includes the sources of information and the people involved with the software: users, software engineers, domain experts, etc.

The research whose main results are presented in this chapter was carried out within a process that begins precisely with the development of the Language Extended Lexicon (LEL), which is itself a sort of enriched glossary (Leite & Franco, 1993). The fact that language carries knowledge and cultural information, and reflects the substantial and particular ways in which people think (Crozect & Lidedicont, 2000; Nettle & Romaine, 2000; Salim, 2017), has made it possible, across many previous studies, to accelerate and systematize the comprehension of the context for which the system will be developed (Leite, Hadad, Doorn & Kaplan, 2000). Besides the LEL, all the remaining models needed to establish the software system requirements are also created using natural language; this is done to improve communication among all stakeholders.

Recently, a deep analysis of the LEL has shown that the way in which it was being used did not take full advantage of its potential: it holds hidden information that is not easily perceived. The chapter “Displaying Hidden Information in Glossaries”, included in the fourth edition of the Encyclopedia of Information Science and Technology, presented a strategy to visualize the grouping of terms in the LEL (Ridao & Doorn, 2018). Experiments performed on real-world cases have shown that the clusters obtained using the syntactic resources of the LEL model coincide with information nuclei of the application domain. That strategy was based on graphs constructed using the hypertext links embedded in the LEL model. Unfortunately, the graphs of a significant number of LELs do not present clearly distinguishable clusters, even when groups of related terms exist and can be found by human observers after careful scrutiny. This is the weakness of relying only on the visual perception of clusters in the graph to detect groups of terms. Coping with this weakness was the motivation for the research project whose results are reported here.
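As a rough illustration of the idea of building a graph from the hypertext links of a LEL and looking for groups of terms in it, the following Python sketch may help. It is not the detection technique developed in the chapter, which this preview does not describe: the toy entries in `lel_links` are hypothetical, and grouping by weakly connected components is a deliberately simple stand-in that only finds groups with no links between them.

```python
from collections import defaultdict

# Hypothetical toy LEL (an assumption for illustration only): each entry
# maps to the entries that its definition links to.
lel_links = {
    "client": ["order", "invoice"],
    "order": ["invoice"],
    "invoice": ["client"],
    "warehouse": ["stock"],
    "stock": ["warehouse", "shelf"],
    "shelf": [],
}


def build_lel_graph(links):
    """Return a directed adjacency map: entry -> set of linked entries."""
    graph = defaultdict(set)
    for source, targets in links.items():
        graph[source]  # touch the key so entries without links still appear
        for target in targets:
            graph[source].add(target)
            graph[target]  # make sure link targets are vertices too
    return dict(graph)


def weakly_connected_groups(graph):
    """Group entries that are connected when link direction is ignored."""
    undirected = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            undirected[source].add(target)
            undirected[target].add(source)

    seen = set()
    groups = []
    for start in sorted(undirected):
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(undirected[node] - seen)
        groups.append(sorted(group))
    return groups


print(weakly_connected_groups(build_lel_graph(lel_links)))
# prints [['client', 'invoice', 'order'], ['shelf', 'stock', 'warehouse']]
```

In a real LEL most entries tend to be linked into one large component, which is consistent with the weakness noted above: groups of related terms exist but are not cleanly separated, so a finer-grained detection method than this sketch would be needed.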

Key Terms in this Chapter

Force-Directed Algorithm: A method for calculating a graph layout. It calculates the layout using only information contained within the structure of the graph itself, rather than relying on domain-specific knowledge.

LEL Graph: A graph where the vertices are the LEL entries and the edges represent the hypertext links among entries.

Graph Drawing Methods: An area of mathematics and computer science that combines methods from geometric graph theory and information visualization to derive two-dimensional graph depictions.

Semantic Cluster: A group of terms in the Language Extended Lexicon, representing an area of interest in the business process.

Language Extended Lexicon: A glossary composed of a set of symbols, words or phrases that are peculiar to and frequently used in a given application domain. The symbols included in a LEL have a meaning that differs from their regular use. It contains hypertext links that interconnect symbols.

Requirements Engineering: An area of Software Engineering that is responsible for acquiring and defining the needs of the software system.
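To make the Force-Directed Algorithm and LEL Graph entries above more concrete, the sketch below computes a two-dimensional layout using only the structure of the graph: every pair of vertices repels, and linked vertices attract. It follows the general Fruchterman-Reingold scheme as an assumption; the preview does not say which force-directed variant the chapter uses, and the function name, parameters and constants here are illustrative choices.

```python
import math
import random


def force_directed_layout(edges, iterations=200, seed=0):
    """Fruchterman-Reingold-style sketch. Returns {vertex: (x, y)}."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # ideal edge length in a unit square
    temperature = 0.1  # maximum displacement per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of vertices.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force

        # Attraction along edges (link direction is ignored for layout).
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force

        # Move each vertex, capped by the current temperature, then cool.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.98

    return {n: (x, y) for n, (x, y) in pos.items()}


# Hypothetical LEL links given as (source entry, target entry) pairs.
layout = force_directed_layout([("a", "b"), ("b", "c"), ("d", "e")])
for vertex, (x, y) in layout.items():
    print(f"{vertex}: ({x:.3f}, {y:.3f})")
```

The resulting coordinates can be handed to any plotting tool. Since no domain knowledge enters the computation, any grouping visible in the drawing comes only from the link structure.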
