1. Introduction
A comparison table (see Table 1) is a double-entry table with the entities to compare in columns and the comparison features in rows. It is a particularly useful tool for decision making because it isolates the common points and the major differences between the compared entities. This analytical technique is therefore popular in science to compare works, in culture to compare artworks, and in commerce to compare products or services. For instance, the SocialCompare website¹ uses crowdsourcing to build a varied spectrum of comparison tables. Participants build a list of features for each entity and then construct tables by manually selecting the compared entities and the comparison features. The need to compare entities goes far beyond that. DBpedia, one of the largest hubs of the Semantic Web, was established to produce a queryable knowledge graph derived from Wikipedia content that is able to answer questions like “What have Innsbruck and Leipzig in common?”².
Despite the intensive use of comparison tables in real life, to the best of our knowledge, there is no method to automate the choice of the set of features for a given set of entities to compare. Automating the construction of comparison tables has two main advantages. First, it makes it possible to create objective comparison tables based on publicly available data. Second, it makes it possible to build comparison tables for fields where this type of analysis is not carried out due to a lack of expertise.
In this paper, the aim is to automate the process of generating a comparison table for a set of entities by querying a knowledge base (KB). For instance, starting from Ada Lovelace and Alan Turing, an end user wants to obtain a comparison table like the one presented in Table 1, built from Wikidata (the last column is the value of crl, a measure explained later). Beyond persons, the goal is to compare any type of entity, such as places (countries, cities), objects (tapestries, statues), institutions (universities, political parties), events (tournaments, festivals) and so on. Unfortunately, there is no theoretical framework for the design of comparison tables that determines whether a feature is interesting for comparing entities. This task is non-trivial: according to the experiments carried out, in 17% of the cases a human evaluator does not know whether a feature is interesting or not for comparing the entities presented to them (see Section 7 for details). In Table 1, it seems natural to use gender to compare two persons. By contrast, specifying that Turing was a member of the Royal Society is interesting only because the two compared entities are both English scientists. Thus, the main challenge is to formalize the notion of an interesting comparison feature. In addition, it is important to benefit from the huge knowledge bases available on the Semantic Web such as DBpedia (Auer et al., 2007), YAGO (Suchanek et al., 2007) or Wikidata (Vrandečić & Krötzsch, 2014), but this raises a problem of robustness and efficiency. Indeed, these knowledge bases are relatively reliable but they suffer from incompleteness (Razniewski et al., 2016; Zaveri et al., 2016). For this reason, it would be desirable that a feature considered interesting at a given moment remains so despite the subsequent addition of facts. For instance, in Table 1, completing Ada Lovelace’s religion should not affect the fact that “religion” is an interesting comparison feature.
Furthermore, rather than downloading and centralizing data, it is more relevant to query public SPARQL endpoints directly to build the comparison tables. This has the advantage of guaranteeing that the values are as fresh as possible. Nevertheless, the fair-use policy of these public endpoints, which cut off queries that are too expensive, makes query optimization necessary (Soulet & Suchanek, 2019).
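As a rough illustration of what querying a public endpoint directly can look like, the sketch below only builds a bounded SPARQL query and the corresponding request URL; it does not send anything over the network. It is not the method of this paper. The function names, the `limit` parameter, and the endpoint URL are assumptions made for the example, and the identifiers Q7259 and Q7251 are assumed to be the Wikidata items for Ada Lovelace and Alan Turing.

```python
from urllib.parse import urlencode

# Assumption: address of the public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_feature_query(entity_ids, limit=500):
    """Return a SPARQL query listing (entity, property, value) facts.

    entity_ids: Wikidata item identifiers (hypothetical input, e.g. "Q7251").
    limit: hard cap on the number of rows, to keep the query cheap
           under the endpoint's fair-use policy.
    """
    values = " ".join(f"wd:{qid}" for qid in entity_ids)
    return (
        "SELECT ?entity ?prop ?value WHERE {\n"
        f"  VALUES ?entity {{ {values} }}\n"
        "  ?entity ?p ?value .\n"
        "  ?prop wikibase:directClaim ?p .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET URL that asks for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


# Assumed identifiers for Ada Lovelace (Q7259) and Alan Turing (Q7251).
query = build_feature_query(["Q7259", "Q7251"], limit=200)
url = build_request_url(query)
print(query)
```

Restricting the entities with a `VALUES` clause and capping the result with `LIMIT` are two simple ways of keeping such a query inexpensive; a real system would need more careful optimization than this.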
Table 1. A comparison table of Ada Lovelace and Alan Turing as a running example
Features / Entities | Ada Lovelace | Alan Turing | crl |
sex or gender | female | male | 0.908 |
spoken language | English | English | 0.472 |
member of | | Royal Society | 0.205 |
field of work | mathematics, computing | mathematics, logic, cryptanalysis, cryptography, computer science | 0.110 |
manner of death | natural causes | suicide | 0.100 |
religion | ? | atheism | 0.015 |
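To make the double-entry structure of Table 1 concrete, the following minimal sketch pivots a list of (entity, feature, value) facts into one row per feature and one cell per entity. It is an illustration only, not the approach of this paper: it does not select or rank features and does not compute crl. The function name `build_comparison_table` and the choice of marking every missing fact with "?" are assumptions of the sketch; the sample facts are copied from Table 1.

```python
from collections import defaultdict


def build_comparison_table(facts, entities):
    """Pivot (entity, feature, value) facts into {feature: [cell per entity]}.

    Several values for the same entity and feature are joined with ", ".
    A missing fact is shown as "?" (an assumption of this sketch).
    """
    cells = defaultdict(lambda: defaultdict(list))
    for entity, feature, value in facts:
        cells[feature][entity].append(value)

    table = {}
    for feature, by_entity in cells.items():
        table[feature] = [
            ", ".join(by_entity[e]) if e in by_entity else "?"
            for e in entities
        ]
    return table


# Sample facts copied from Table 1.
facts = [
    ("Ada Lovelace", "sex or gender", "female"),
    ("Alan Turing", "sex or gender", "male"),
    ("Ada Lovelace", "spoken language", "English"),
    ("Alan Turing", "spoken language", "English"),
    ("Ada Lovelace", "manner of death", "natural causes"),
    ("Alan Turing", "manner of death", "suicide"),
    ("Alan Turing", "religion", "atheism"),
]
entities = ["Ada Lovelace", "Alan Turing"]
table = build_comparison_table(facts, entities)

# Print the rows in the same pipe-separated layout as Table 1.
for feature, row in table.items():
    print(" | ".join([feature] + row) + " |")
```

Because the knowledge base is incomplete, a "?" cell here only means that no fact was found, not that no value exists.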