1. Introduction
To accomplish the goals of the semantic web, data must be represented in a rich, well-structured and connected form. An ontology provides such a representation: it is an abstract model that describes a domain of interest through a set of concepts and rich relationships (IS-A and other relations) among those concepts. With the development of the semantic web, the number of available ontologies has grown remarkably. Systems that process these ontologies need a basic understanding of the underlying information of each ontology in order to support improved retrieval, management and exploitation.
Assessing the similarity between ontological concepts is a basic step towards understanding the underlying information of an ontology (Sánchez et al., 2012), and it is carried out with the various similarity measures proposed in the literature. In general, a similarity measure takes as input two concepts, drawn from the same ontology or from different ontologies, and returns a real value, usually in the range 0 to 1.
The similarity measures are mainly used in the following application areas: 1) ontology and document clustering to discover similar concepts (Do & Rahm, 2007; Hamdi et al., 2010; Hu, Qu & Cheng, 2008; Sridevi & Nagaveni, 2011); 2) ontology-matching systems for semantic interoperability (Euzenat & Shvaiko, 2007); and 3) ontology mapping for semantic integration, among others. In these areas, the similarity measures are used to eliminate heterogeneity among different ontologies of the same domain, thus enabling semantic interoperability and integration among the ontologies.
They are also used in a variety of other applications, such as classification (Im et al., 2018), query expansion (Singh & Kumar, 2017), similar concept discovery in the biomedical field (Pesquita et al., 2009; Zhang et al., 2008), recommendation (Likavec, Osborne & Cena, 2015), e-learning (Deborah, Baskaran & Kannan, 2012), web service discovery (Fellah, Malki & Elci, 2016), and assorted natural language processing tasks such as spelling error detection and correction (Budanitsky & Hirst, 2001), information retrieval (Hliaoutakis, 2006; Hwang & Kim, 2009), cross-lingual processing (Huang & Kuo, 2010), detection of synonyms (Lin, 1998) and word sense disambiguation (Patwardhan, Banerjee & Pedersen, 2003).
As a general rule, each similarity measure exploits a different view of the information in the ontology, such as its linguistic information, its semantic information or its instances. Linguistic similarity measures consider the information about a concept in terms of its name, label, comment, annotation, synonyms, etc. Structural similarity measures use the structural information of a concept, such as its depth, Information Content (IC), neighbourhood and synset (synonym set), as well as the path length between two concepts, to compute the similarity. Extensional similarity measures, such as the Jaccard similarity and the Hamming distance, use the instances of the concepts to compute similarity values.
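The extensional measures named above can be illustrated with a minimal Python sketch. It assumes that the instances of a concept are available as a set of identifiers (for the Jaccard similarity) or as equal-length membership vectors (for the Hamming distance). The function names, the toy data and the convention of returning 0.0 for two empty instance sets are assumptions made for illustration, not definitions taken from the cited literature.

```python
def jaccard_similarity(instances_a, instances_b):
    """Jaccard similarity: |A intersect B| / |A union B|, a value in [0, 1]."""
    a, b = set(instances_a), set(instances_b)
    union = a | b
    if not union:
        # Assumed convention: two concepts without instances share nothing.
        return 0.0
    return len(a & b) / len(union)


def normalized_hamming_distance(vector_a, vector_b):
    """Fraction of positions at which two equal-length vectors differ."""
    if len(vector_a) != len(vector_b):
        raise ValueError("vectors must have the same length")
    if not vector_a:
        return 0.0
    differing = sum(x != y for x, y in zip(vector_a, vector_b))
    return differing / len(vector_a)


# Hypothetical instance sets of two concepts from different ontologies.
car_instances = {"i1", "i2", "i3"}
automobile_instances = {"i2", "i3", "i4"}
print(jaccard_similarity(car_instances, automobile_instances))  # 0.5

# Hypothetical membership vectors over the same four instances.
print(normalized_hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```

Note that the Hamming distance is a dissimilarity: one common way to turn the normalized value into a similarity in the 0 to 1 range is to subtract it from 1.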
In this paper, we concentrate on measures which exploit semantic information. Semantic similarity measures in the literature are classified into categories based on the type of information used: path-based, depth-based, information-content (IC) based, hybrid, and feature-based ones (Sathiya et al.)
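As a rough illustration of the path-based category, the sketch below scores two concepts by the inverse of one plus the length of the shortest path between them in the IS-A hierarchy. This is one simple, commonly used form chosen here for illustration and is not necessarily a measure studied in this paper; the toy taxonomy, the function names and the choice of returning 0.0 for unconnected concepts are all assumptions.

```python
from collections import deque


def build_undirected_graph(is_a_edges):
    """Turn (child, parent) IS-A pairs into an undirected adjacency map."""
    graph = {}
    for child, parent in is_a_edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the edge count or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def path_similarity(graph, concept_a, concept_b):
    """Illustrative path-based score: 1 / (1 + shortest path length)."""
    length = shortest_path_length(graph, concept_a, concept_b)
    if length is None:
        return 0.0  # Assumed convention for unconnected concepts.
    return 1.0 / (1.0 + length)


# Hypothetical toy taxonomy given as (child, parent) IS-A pairs.
taxonomy = build_undirected_graph([
    ("car", "vehicle"),
    ("bicycle", "vehicle"),
    ("vehicle", "entity"),
    ("dog", "entity"),
])
print(path_similarity(taxonomy, "car", "car"))  # 1.0
print(path_similarity(taxonomy, "car", "dog"))  # 0.25
```

With this form, identical concepts score 1 and the score decreases towards 0 as the concepts move further apart in the hierarchy, which matches the usual 0 to 1 range of similarity values.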