Detecting Restriction Class Correspondences in Linked Data: The Bayes-ReCCE Bayesian Model Approach

Brian Walshe, Rob Brennan, Declan O'Sullivan
DOI: 10.4018/978-1-5225-5042-6.ch008

Abstract

Linked Data consists of many structured data knowledge bases that have been interlinked, often using equivalence statements. These equivalences usually take the form of owl:sameAs statements linking individuals; links between classes are far less common. Often, class links are missing because the relationships between the classes cannot be described as one-to-one equivalences. Instead, complex correspondences referencing logical combinations of multiple entities are often needed to describe how the classes in one ontology are related to classes in a second ontology. This chapter introduces a novel Bayesian Restriction Class Correspondence Estimation (Bayes-ReCCE) algorithm, an extensional approach to detecting complex correspondences between classes. Bayes-ReCCE operates by analysing features of matched individuals in the knowledge bases, and uses Bayesian inference to search for complex correspondences between the classes these individuals belong to. Bayes-ReCCE is designed to provide meaningful results even when only small numbers of matched instances are available.
Chapter Preview

Introduction

Linked Open Data provides access to a wealth of information in a standardised and navigable form, and it is designed to be combined easily. Bizer (2009) notes, however, that … most Linked Data applications display data from different sources alongside each other but do little to integrate it further. To do so does require mapping of terms from different vocabularies to the applications target schema. Links usually take the form of owl:sameAs statements linking individuals, but links between classes are far less common (Schmachtenberg, Bizer & Paulheim, 2014). Heterogeneity issues, such as differences in class scope or hierarchy granularity, mean that simple one-to-one correspondences between atomic classes are not always enough to describe the mappings between schemas or, more generally, ontologies. The YAGO2 knowledge base (Suchanek, Kasneci & Weikum, 2008), for example, contains a rich class hierarchy based on WordNet (Miller, 1995) and includes many professions described as classes. An instance of a person in YAGO2 who is a film director belongs to the class yago:FilmDirector. In contrast, version 3.9 of DBpedia (Bizer, Lehmann, Kobilarov, Auer, Becker, Cyganiak & Hellman, 2009) has a shallower class hierarchy, with professions described as attribute values, not classes. In this version of the DBpedia ontology there is no named class for film directors. If one-to-one mappings between named classes are the only mechanism available, then we could say that yago:FilmDirector maps to dbpedia:Person with a subsumption relationship, but this does not describe which members of the class Person are film directors. If, instead, complex correspondences between non-atomic classes were used, then it could be asserted that yago:FilmDirector corresponds to the set of instances of Person in DBpedia with the attribute dbpedia-owl:occupation set to dbpedia:Film_director.
More formally, correspondences where at least one of the entities described in the correspondence is non-atomic are known as complex correspondences (Ritze, Meilicke, Svab-Zamazal & Stuckenschmidt, 2009).
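
The film director example can be made concrete with a small sketch. It assumes a toy in-memory list of triples; the two individuals (ex:person_a, ex:person_b), the second occupation value, and the helper name restriction_class are hypothetical and chosen only for illustration. Only the class and property names come from the text above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Toy data: both individuals are hypothetical placeholders.
TRIPLES = [
    Triple("ex:person_a", "rdf:type", "dbpedia:Person"),
    Triple("ex:person_a", "dbpedia-owl:occupation", "dbpedia:Film_director"),
    Triple("ex:person_b", "rdf:type", "dbpedia:Person"),
    Triple("ex:person_b", "dbpedia-owl:occupation", "ex:Some_other_occupation"),
]


def restriction_class(triples, named_class, prop, value):
    """Return the subjects typed as `named_class` whose `prop` equals `value`."""
    typed = {t.subject for t in triples
             if t.predicate == "rdf:type" and t.obj == named_class}
    restricted = {t.subject for t in triples
                  if t.predicate == prop and t.obj == value}
    return typed & restricted


# The non-atomic class that yago:FilmDirector would correspond to.
film_directors = restriction_class(
    TRIPLES, "dbpedia:Person", "dbpedia-owl:occupation", "dbpedia:Film_director"
)
print(film_directors)
```

The named class dbpedia:Person alone would return both individuals; adding the attribute-value restriction narrows the set to the one that is a film director.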

Research has shown that complex correspondences can be classified into commonly recurring Correspondence Patterns (Scharffe, 2009). Extensional methods, which compare the instance sets of classes using a metric such as the Jaccard index, have been shown to be capable of detecting complex correspondences between ontologies used in Linked Open Data (Parundekar, Knoblock & Ambite, 2010; Parundekar, Knoblock & Ambite, 2012). However, extensional approaches have several issues. When only small amounts of instance data are available, they can give high scores to spurious matches, and when the amount of data is large, the search space of potential correspondences can grow very quickly. A more subtle problem is that directly comparing the instance sets of two classes to test similarity is not consistent with the Open World Assumption. Furthermore, existing extensional approaches assume a priori that all forms of complex correspondence are equally probable, and they provide no systematic way to specify prior beliefs that certain patterns of correspondence may be more probable than others.
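
The small-sample issue with the Jaccard index can be seen in a minimal sketch. The instance sets below are made-up integers standing in for matched individuals, and the function name jaccard is an assumption for illustration, not code from the cited work.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Two classes that happen to share their only two known instances.
tiny_a, tiny_b = {1, 2}, {1, 2}

# Two classes with 50 known instances between them, 40 of which are shared.
large_a, large_b = set(range(0, 45)), set(range(5, 50))

print(jaccard(tiny_a, tiny_b))    # 1.0
print(jaccard(large_a, large_b))  # 0.8
```

The two-instance pair receives a perfect score even though it rests on far less evidence than the fifty-instance pair, which is the kind of spurious high score described above.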

In this chapter we describe Bayes-ReCCE, a scalable complex correspondence detection algorithm which uses Bayesian statistics to estimate the true Jaccard index of the classes being compared, and which provides a method to specify prior beliefs about certain patterns of correspondence being more or less probable than others. Bayes-ReCCE presents the most probable correspondences to a user, combined with a summary of the evidence for each of these correspondences. Using the probability measure for the correspondence and examining the evidence allows a user to make a more informed decision on whether to accept or reject the correspondence.
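
To give a feel for how a Bayesian estimate differs from a raw ratio, the following is a minimal sketch using a standard Beta-Binomial conjugate update. It is not the model defined in the chapter, which this preview does not show. It assumes, purely for illustration, that each matched instance in the union of two classes is an independent trial that either falls in the intersection or does not, and that prior belief about the Jaccard index is expressed as a Beta(alpha, beta) distribution. The function name and parameter values are hypothetical.

```python
def posterior_mean_jaccard(shared: int, union: int,
                           alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of a Beta(alpha, beta) prior after observing
    `shared` instances in the intersection out of `union` in total.

    Illustrative assumption only: instances are treated as independent
    Bernoulli trials, which gives the posterior Beta(alpha + shared,
    beta + union - shared).
    """
    if not 0 <= shared <= union:
        raise ValueError("shared must lie between 0 and union")
    return (alpha + shared) / (alpha + beta + union)


# Uniform prior: two shared instances out of two no longer score 1.0.
print(posterior_mean_jaccard(2, 2))          # 0.75
# Larger sample: the estimate stays close to the observed ratio of 0.8.
print(posterior_mean_jaccard(40, 50))
# A more sceptical prior, Beta(1, 3), pulls the small-sample estimate lower.
print(posterior_mean_jaccard(2, 2, 1.0, 3.0))  # 0.5
```

Under these assumptions the estimate for a tiny sample is pulled toward the prior, while a larger sample dominates it, and changing alpha and beta is one simple way to encode a belief that a given pattern of correspondence is more or less plausible.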

The objectives of this chapter are to demonstrate:
