Introduction
An entity synonymous relation is a semantic relationship between a pair of terms that represent the same real-world entity with the same or similar meaning (Abu-Salih, 2021; Qu et al., 2017; Shen et al., 2019). For example, (“United States”, “USA”) is an entity synonymous relation pair, since “United States” and “USA” both represent the same country: the “United States of America.” In practical applications, entity synonymous relations play an important role in many entity-based tasks, such as taxonomy construction (Abu-Salih et al., 2018; Huang et al., 2019; Huang et al., 2020; Wang et al., 2019), document retrieval (Kong et al., 2019; Liu et al., 2016; Wongthongtham et al., 2018; Yin et al., 2016), and topic detection (Padmanabhan et al., 2017; Xie et al., 2015). Therefore, automatically extracting entity synonymous relations is crucial for many downstream applications.
Previous approaches to entity synonymous relation extraction rely mainly on lexical patterns or on distributional corpus-level statistics:
- Lexical Pattern-Based Approaches: Such approaches employ lexical patterns to mine entity synonymous relations from texts (Nguyen et al., 2017; Simanovsky et al., 2011; Wang et al., 2010). For example, given the lexical pattern “X is referred to as Y” and the sentence “The acetylsalicylic acid is often referred to as the aspirin,” the pattern can be used to infer that “acetylsalicylic acid” and “aspirin” are synonymous.
- Distribution-Based Approaches: Such approaches exploit distributional corpus-level statistics to mine entity synonymous relations from texts (Chakrabarti et al., 2012; Qu et al., 2017; Turney, 2001). Following the distributional hypothesis (Harris, 1954), they assume that terms that often appear in the same or similar contexts are likely to be synonymous (Qu et al., 2017).
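The lexical pattern example above can be illustrated with a minimal sketch. The regular expression and the helper name `extract_synonym_pair` are assumptions made here for illustration; they are not taken from the cited approaches, which use richer pattern sets.

```python
import re

# Hypothetical encoding of the pattern "X is referred to as Y".
# An optional adverb (e.g. "often") and optional leading articles are allowed.
PATTERN = re.compile(
    r"^(?:the\s+)?(?P<x>.+?)\s+is\s+(?:\w+\s+)?referred\s+to\s+as\s+"
    r"(?:the\s+)?(?P<y>.+?)[.,]?$",
    re.IGNORECASE,
)


def extract_synonym_pair(sentence):
    """Return (x, y) if the sentence matches the pattern, otherwise None."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("x"), match.group("y")


matched = extract_synonym_pair(
    "The acetylsalicylic acid is often referred to as the aspirin."
)
# A paraphrase that expresses the same relation but does not fit the pattern.
missed = extract_synonym_pair(
    "Aspirin, also known as acetylsalicylic acid, relieves pain."
)
```

Here `matched` holds the pair from the example sentence, while `missed` is `None`: a sentence that states the same relation in different words falls outside the fixed pattern.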
However, the above approaches have the following limitations:
- Low Coverage and Weak Ability in Processing Complex Texts: Lexical pattern-based approaches can only find entity synonymous relations that are expressed in a form matching one of their patterns, which results in low coverage. In complex text, synonymous relations are often expressed in ways that the lexical patterns cannot capture effectively.
- Low Precision and Wrong Entity Synonymous Relation Labels: Distribution-based approaches may introduce noise, because nonsynonymous entities can also appear in the same or similar contexts. For example, “UK” and “USA” often appear in similar contexts, so the pair could be wrongly labeled as an entity synonymous relation.
- Little Attention Paid to Context Semantics: Both lexical pattern-based and distribution-based approaches pay little attention to context semantics, which makes it difficult to balance precision and recall.
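The low-precision limitation can be made concrete with a small sketch of a distribution-based score. The toy corpus, the window size, and the helper names `context_vector` and `cosine` are assumptions chosen for illustration; real systems use large corpora and more refined statistics.

```python
import math
from collections import Counter

# Toy corpus (hypothetical), one sentence per string.
CORPUS = [
    "the usa signed a trade deal",
    "the uk signed a trade deal",
    "aspirin relieves mild pain",
]


def context_vector(term, sentences, window=2):
    """Count the tokens that occur within `window` positions of `term`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != term:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


uk_usa = cosine(context_vector("uk", CORPUS), context_vector("usa", CORPUS))
usa_aspirin = cosine(
    context_vector("usa", CORPUS), context_vector("aspirin", CORPUS)
)
```

In this toy setting, “uk” and “usa” share identical contexts and therefore receive a near-maximal similarity score even though they are not synonyms, whereas “usa” and “aspirin” share no context words and score zero. A purely distributional criterion would thus accept the wrong pair.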
To address these limitations, this paper proposes an entity synonymous relation extraction approach based on context-aware permutation invariance. Specifically, a triplet network is employed to learn the permutation invariance (Huang et al., 2020; Shen et al., 2019) between entities, and entity relational contexts are employed to enhance the synonymous training signals in the triplet network. The main contributions of this paper are as follows: