Features of Semantic Similarity Assessment: Content- and Model-Based Perspectives

Vijayarani J., Geetha T. V.
DOI: 10.4018/978-1-7998-9594-7.ch005

Abstract

Semantic similarity is a fundamental concept in computational linguistics, and the models used to represent text play a major role in how it is computed. Text with multilingual and multimodal components requires similarity to be computed over different characteristics of the text. This chapter studies several aspects of semantic similarity: linguistic units, cross-level similarity, semantic models, and similarity measures. One of its main motivations is to analyze semantic similarity models such as geometric models, feature-based models, graph-based models, vector space models, and formal concept analysis models. In addition, a composite summary score based on words and hashtags is applied to the tweet summarization task and proves effective when compared with other measures.

Introduction

Computing semantic similarity is a core step in natural language processing and information retrieval. Beyond similarity of meaning, semantic similarity has other aspects, such as proximity, closure and continuity. Semantic relations and attributes are intrinsically connected, with relations being more global than attributes (Goldstone et al., 1991). Semantic relatedness is more general than similarity: it covers all possible semantic relations (Zhu & Iglesias, 2017) and has a broad range of applications. Semantic relations such as hypernym or hyponym (‘is-a-kind-of’), meronym (‘is-a-part-of’, ‘is-an-example-of’) and antonym (‘is-opposite-of’) show the diversity of relatedness (Mohammad & Hirst, 2005). For example, ‘teacher’ and ‘professor’ are semantically similar, whereas ‘teacher’ and ‘student’ are semantically related but not similar.

Attributional, relational and functional similarities help determine the most relevant meaning. Attributional similarity is the correspondence between the attributes of two objects (Bollegala et al., 2011). ‘Hot’ and ‘cold’ share attributes such as ‘temperature’ and are considered related even though they are antonyms. Similarly, ‘winter’ and ‘summer’ share seasonal attributes. ‘Dog’ and ‘cat’ share relational attributes, and the pairs (dog, bark) and (cat, meow) exhibit relational similarity. The pairs (wheel, rotate) and (bird, fly) exhibit functional similarity. Words that are hierarchically related, such as (furniture, table) and (flower, rose), have taxonomical similarity.
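
As a rough illustration of attributional similarity, the sketch below scores two words by the overlap of their attribute sets. The attribute sets and the choice of the Jaccard coefficient are assumptions made for this example only; they are not the measure used by Bollegala et al. (2011) or by this chapter.

```python
# Hypothetical attribute sets, invented for illustration only.
ATTRIBUTES = {
    "winter": {"season", "temperature", "snow"},
    "summer": {"season", "temperature", "sun"},
    "wheel": {"round", "rotates", "vehicle part"},
}

def attribute_overlap(word_a, word_b, attributes=ATTRIBUTES):
    """Jaccard coefficient of two attribute sets: |A & B| / |A | B|."""
    a = attributes[word_a]
    b = attributes[word_b]
    union = a | b
    if not union:
        return 0.0  # two empty attribute sets share nothing measurable
    return len(a & b) / len(union)

if __name__ == "__main__":
    print(attribute_overlap("winter", "summer"))
    print(attribute_overlap("winter", "wheel"))
```

Under these toy attribute sets, ‘winter’ and ‘summer’ score higher than ‘winter’ and ‘wheel’ because they share the seasonal attributes, which mirrors the intuition described above.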

Research has also been carried out on the similarity among words, sentences and documents (Ahsaee et al., 2014; Deguchi & Ishii, 2021; Lopez-Gazpio et al., 2019; Sultan et al., 2015; Vigneshvaran et al., 2013). Word similarity is analyzed with features such as co-occurrence, context and sense. The similarity may vary over time due to inter- or intralinguistic factors that cause semantic variations such as broadening, narrowing, metaphor and metonymy (Tang et al., 2016). Concept similarity is derived from the information content of concepts based on a knowledge graph (Zhu & Iglesias, 2017). Ontology-based similarity measures rely on the ‘is-a’ hierarchy or taxonomical features (Sanchez et al., 2012). Word and context similarities provide concept mapping between ontologies (Zhen et al., 2008).
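
To make the idea of an ‘is-a’ hierarchy concrete, the following minimal sketch computes a path-based score over a tiny hand-built taxonomy. The taxonomy, the function names, and the scoring formula 1 / (1 + path length) are assumptions chosen for illustration; the measures cited above are more elaborate and are not reproduced here.

```python
# Hypothetical toy 'is-a' taxonomy (child -> parent), invented for illustration.
IS_A = {
    "professor": "educator",
    "teacher": "educator",
    "educator": "person",
    "student": "person",
    "person": "entity",
    "table": "furniture",
    "furniture": "entity",
}

def ancestors(concept):
    """Return the chain from the concept up to the root, concept included."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain

def path_length(a, b):
    """Count is-a edges between a and b through their lowest common ancestor."""
    up_a = ancestors(a)
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, concept in enumerate(up_a):
        if concept in steps_from_b:
            return steps_from_a + steps_from_b[concept]
    raise ValueError("concepts share no common ancestor")

def path_similarity(a, b):
    """Shorter paths in the hierarchy give scores closer to 1."""
    return 1.0 / (1.0 + path_length(a, b))

if __name__ == "__main__":
    for pair in [("teacher", "professor"), ("teacher", "student"), ("teacher", "table")]:
        print(pair, path_similarity(*pair))
```

In this toy hierarchy, ‘teacher’ and ‘professor’ are siblings under the same parent and therefore score higher than ‘teacher’ and ‘student’, which meet only at a more general concept.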

Sultan et al. (2015) used word alignment and semantic vector composition to analyze the similarity of sentences. Document similarity is examined over parts of the document, such as words, phrases, and sentences, and the partial similarities are then aggregated into a single score. Cross-level similarity adds further complexity, because the computation must be extended to handle the multilingual (Khakimova et al., 2020) and multimodal (Diao et al., 2020) content of the text. Semantic models define how text is projected onto a semantic space and provide a mapping between the text and that space. They are categorized as geometric (Demaine et al., 2021), feature-based, graph-based, vector space, formal concept analysis (FCA) (Belohlavek & Mikula, 2020) and hybrid models. Similarity measures are categorized as corpus- and knowledge-based (Gomaa & Fahmy, 2013; Gupta et al., 2017) or as path-, ‘information content’- and feature-based (Abdelrahman & Kayed, 2015; Elavarasi et al., 2014).
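
As one simple instance of a vector space model, the sketch below projects each text onto a term-frequency vector and compares the vectors with cosine similarity. Whitespace tokenization and raw term counts are simplifying assumptions made for this example; they are not the representation used in the works cited above.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Map a text to a sparse term-frequency vector (naive whitespace tokens)."""
    return Counter(text.lower().split())

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction in the space
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    a = bag_of_words("the cat sat")
    b = bag_of_words("the cat ran")
    print(cosine_similarity(a, b))
```

A purely lexical representation like this captures word overlap rather than meaning, so ‘teacher’ and ‘professor’ would score zero against each other; this is one motivation for the knowledge-based and hybrid models discussed in the chapter.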
