Introduction
Reasoner performance prediction for OWL 2 ontologies has been studied along several dimensions. A key aspect of these studies is predicting how much time a particular reasoning task will consume for a given ontology. Several approaches have adopted machine-learning techniques to predict the time consumption of different reasoning tasks from features of the input ontologies. However, these studies have mainly focused on the complexity of the TBox, while paying little attention to ABox details. ABox information is particularly important in real-world scenarios, where data volumes are much larger than the schema information that describes them.
The language OWL 2 DL (Cuenca-Grau et al. (2008)), the most expressive profile of OWL 2, has a worst-case complexity that is 2NEXPTIME-complete (Kazakov (2008)), which constitutes a bottleneck for performance-critical applications. Empirical studies show that even reasoning in the EL profile, which is less expressive and PTIME-complete, can become too time-consuming (Dentler et al. (2011), Kang et al. (2012b)).
There have been several studies on predicting reasoning performance for ontologies. Kang et al. (2012a) investigated the hardness category (categories defined by reasoning time) of reasoner-ontology pairs and used machine-learning techniques to predict it. Using the reasoners FaCT++ (Tsarkov & Horrocks (2006)), HermiT (Glimm et al. (2014)), Pellet (Sirin et al. (2007)), and TrOWL (Pan et al. (2016, 2012), Ren et al. (2010), Thomas et al. (2010)), their predictions were highly accurate in terms of hardness category, but not in terms of reasoning time. In a subsequent study, Kang et al. (2014) investigated regression techniques to predict reasoning time. They conducted experiments, based on their syntactic metrics, using the reasoners FaCT++, HermiT, JFact, MORe (Armas-Romero et al. (2012)), Pellet, and TrOWL. These metrics are generally effective when there is a balance between TBox axioms and ABox axioms. However, our preliminary experiments in Guclu, Bobed, Pan, Kollingbaum & Li (2016) showed that the accuracy of these metrics decreases as the size of the ABox relative to the TBox increases.
We regard this observation as important because there are many real-world scenarios where the amount of data far exceeds the size of the schema associated with it (e.g., Linked Data repositories (Bizer et al. (2009))). Besides, as observed in Yus & Pappachan (2015), there is an increasing interest in using semantic technologies on mobile devices (Bobed et al. (2015)). The ABox constitutes the data of an ontology (Fokoue et al. (2012), Hogan et al. (2011), Ren et al. (2012)), whereas the TBox constitutes the schema. On mobile devices, with their restricted resources, TBox axioms are expected to be rather static, whereas the ABox axioms (data) tend to change more frequently. Thus, due to the volume and dynamism of the data, accurate overall predictions require an approach that captures the influence of the ABox on reasoning performance more precisely. Plenty of applications can benefit from this prediction mechanism, both in resource-limited scenarios and in non-limited ones. For example, on the one hand, an accurate processing time prediction can be combined with battery consumption prediction (Guclu, Li, Pan & Kollingbaum (2016)) to devise new adaptive methods for reasoning on mobile devices. On the other hand, semantic applications dealing with highly volatile data can also use these predictions to decide whether or not to update the materialization of their knowledge (Bobed et al. (2014)).