A Method for Generating Comparison Tables From the Semantic Web

Arnaud Giacometti, Béatrice Markhoff, Arnaud Soulet
Copyright: © 2022 | Pages: 20
DOI: 10.4018/IJDWM.298008

Abstract

This paper presents Versus, the first automatic method for generating comparison tables from knowledge bases of the Semantic Web. For this purpose, it introduces the contextual reference level, a measure that evaluates whether a feature is relevant for comparing a set of entities. This measure relies on contexts, which are sets of entities similar to the compared entities. Its principle is to favor the features whose values for the compared entities are reference values (i.e., frequent) in these contexts. The proposal efficiently evaluates the contextual reference level from a public SPARQL endpoint limited by a fair-use policy. Using a new benchmark based on Wikidata, the experiments show the interest of the contextual reference level for identifying, with high precision and recall, the features deemed relevant by users. In addition, the proposed optimizations significantly reduce the number of queries required for properties as well as for inverse relations. Interestingly, this experimental study also shows that inverse relations bring out a large number of numerical comparison features.

1. Introduction

A comparison table (see Table 1) is a double-entry table with the entities to compare in columns and the comparison features in rows. It is a particularly useful tool for decision making because it isolates the common points and major differences between the compared entities. This analytical technique is therefore popular in science to compare works, in culture to compare artworks, or in commerce to compare products or services. For instance, the SocialCompare website uses crowdsourcing to build a varied spectrum of comparison tables: participants build a list of features for each entity and then construct tables by manually selecting the compared entities and the comparison features. The need to compare entities goes far beyond that. DBpedia, one of the largest hubs of the Semantic Web, was established around producing a queryable knowledge graph, derived from Wikipedia content, that is able to answer questions like "What have Innsbruck and Leipzig in common?".
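As an illustration (a minimal sketch of ours, not a query from the paper), such a question can be asked directly of the public DBpedia SPARQL endpoint (https://dbpedia.org/sparql): the query below lists the property-value pairs that the two cities share.

    # Minimal sketch: property-value pairs shared by Innsbruck and
    # Leipzig in DBpedia, i.e., an answer to "What have Innsbruck
    # and Leipzig in common?".
    PREFIX dbr: <http://dbpedia.org/resource/>

    SELECT DISTINCT ?property ?value WHERE {
      dbr:Innsbruck ?property ?value .   # a fact about Innsbruck
      dbr:Leipzig   ?property ?value .   # the same fact about Leipzig
    }
    LIMIT 100

Each result row is a candidate common point between the two entities; deciding which of these candidates are actually interesting for a comparison is the problem addressed in this paper.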

Despite the intensive use of comparison tables in real life, to the best of our knowledge there is no method for automating the choice of the set of features for a given set of entities to compare. Automating the construction of comparison tables has several advantages. On the one hand, it makes it possible to create objective comparison tables based on publicly available data. On the other hand, it makes it possible to build comparison tables for fields where this type of analysis is not carried out for lack of expertise.

In this paper, the aim is to automate the generation of a comparison table for a set of entities by querying a knowledge base (KB). For instance, starting from Ada Lovelace and Alan Turing, an end user wants to obtain a comparison table like the one presented in Table 1, built from Wikidata (the last column gives the value of crl, a measure explained later). Beyond persons, the goal is to compare any type of entities, such as places (countries, cities), objects (tapestries, statues), institutions (universities, political parties), events (tournaments, festivals) and so on. Unfortunately, there is no theoretical framework for the design of comparison tables that determines whether a feature is interesting for comparing entities. This task is non-trivial: according to the experiments carried out, in 17% of the cases a human evaluator does not know whether a feature is interesting or not for comparing the entities presented to them (see Section 7 for details). In Table 1, it seems natural to use gender to compare two persons. By contrast, specifying that Turing was a member of the Royal Society is interesting only because the two compared entities are English scientists. Thus, the main challenge is to formalize the notion of an interesting comparison feature. In addition, it is important to benefit from the huge knowledge bases available on the Semantic Web, such as DBpedia (Auer et al., 2007), YAGO (Suchanek et al., 2007) or Wikidata (Vrandečić & Krötzsch, 2014), but this raises problems of robustness and efficiency. Indeed, these knowledge bases are relatively reliable but they suffer from incompleteness (Razniewski et al., 2016; Zaveri et al., 2016). For this reason, it is desirable that a feature considered interesting at a given moment remains so despite the subsequent addition of facts. For instance, in Table 1, completing Ada Lovelace's religion should not affect the fact that "religion" is an interesting comparison feature. Furthermore, rather than downloading and centralizing data, it is more relevant to query public SPARQL endpoints directly to build the comparison tables, which guarantees the freshness of the values (a query sketch along these lines is given after Table 1). Nevertheless, the fair-use policy of these public endpoints, which cuts off queries that are too expensive, raises optimization needs (Soulet & Suchanek, 2019).

Table 1.
A comparison table of Ada Lovelace and Alan Turing as running example

Features        | Ada Lovelace           | Alan Turing                                                       | crl
----------------|------------------------|-------------------------------------------------------------------|------
sex or gender   | female                 | male                                                              | 0.908
spoken language | English                | English                                                           | 0.472
member of       |                        | Royal Society                                                     | 0.205
field of work   | mathematics, computing | mathematics, logic, cryptanalysis, cryptography, computer science | 0.110
manner of death | natural causes         | suicide                                                           | 0.100
religion        | ?                      | atheism                                                           | 0.015
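
To make the querying scenario concrete, the following is a hedged sketch of ours (not the Versus implementation described in this paper) that collects candidate comparison features for the two running-example entities from the public Wikidata endpoint (https://query.wikidata.org/sparql). It assumes the usual Wikidata identifiers wd:Q7259 for Ada Lovelace and wd:Q7251 for Alan Turing.

    # Hedged sketch (not the paper's method): candidate features, i.e.,
    # direct Wikidata properties for which both compared entities have
    # a value. Human-readable labels can be added with the endpoint's
    # wikibase:label service.
    PREFIX wd: <http://www.wikidata.org/entity/>

    SELECT DISTINCT ?property ?lovelaceValue ?turingValue WHERE {
      wd:Q7259 ?property ?lovelaceValue .   # a fact about Ada Lovelace
      wd:Q7251 ?property ?turingValue .     # a fact about Alan Turing
      # keep only direct ("truthy") properties, e.g. wdt:P21 (sex or gender)
      FILTER(STRSTARTS(STR(?property), "http://www.wikidata.org/prop/direct/"))
    }
    LIMIT 200

Features held by only one of the two entities (like "member of" in Table 1) would additionally require OPTIONAL patterns or one query per entity, and inverse relations correspond to patterns where the compared entity appears in object position (e.g., ?x ?property wd:Q7259). Ranking these candidate features by relevance is precisely the role of the crl measure introduced later.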
