Research on Measuring Semantic Correlation Based on the Wikipedia Hyperlink Network

Feiyue Ye, Feng Zhang
Copyright: © 2013 | Pages: 11
DOI: 10.4018/ijsi.2013070101


As a free online encyclopedia with broad knowledge coverage, rich semantic information, and rapid updates, Wikipedia opens up new approaches to measuring semantic correlation. In this paper, the authors present a new method for measuring the semantic correlation between words by mining the rich semantic information in Wikipedia. Unlike previous methods, which calculate semantic relatedness from either the page network or the category network alone, the authors' method combines the semantic information of both networks, which improves the accuracy of the results. The authors also analyze and evaluate the algorithm by comparing its results, on the same test set, with those of a well-known knowledge base (e.g., HowNet) and of traditional Wikipedia-based methods, and report that it performs better.
Article Preview

Download and Structured Processing for the Wikipedia Corpus

In Wikipedia, each entry is created and organized according to defined rules. The main structural elements include explanatory pages (topic pages and category pages), special pages (e.g., redirect pages and disambiguation pages), templates, and information boxes (Li Yun, 2009). Explanatory pages are the most important part of Wikipedia and can be viewed as the semantic context of concepts. A topic page corresponds to a topic concept and is edited by Wikipedia's contributors. Category pages mainly reflect the superordinate and subordinate relationships between categories, as well as all the pages that a subcategory contains. Through category pages, Wikipedia organizes its large number of pages in a consistent way. Redirect pages and disambiguation pages are important resources for mining semantic information from Wikipedia: they can be used to build a synonym thesaurus and a word sense disambiguation thesaurus. Information boxes are highly structured and are therefore an important structured source for semantic information mining.
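The page types described above can be told apart mechanically in a Wikipedia XML export. The following is a minimal sketch, not the authors' implementation. It assumes a simplified export in which namespace 0 holds topic pages, namespace 14 holds category pages, and redirects carry a `<redirect>` element, as in MediaWiki dumps. The `SAMPLE` data, the `classify` and `build_index` helpers, and the template-based disambiguation check are illustrative assumptions (real dumps also declare an XML namespace, omitted here).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily simplified export; real dumps are far larger
# and wrap every element in an XML namespace.
SAMPLE = """<mediawiki>
  <page><title>Computer</title><ns>0</ns>
    <revision><text>A computer is a machine. [[Category:Computing]]</text></revision></page>
  <page><title>Computers</title><ns>0</ns><redirect title="Computer" />
    <revision><text>#REDIRECT [[Computer]]</text></revision></page>
  <page><title>Category:Computing</title><ns>14</ns>
    <revision><text>[[Category:Technology]]</text></revision></page>
  <page><title>Mercury</title><ns>0</ns>
    <revision><text>{{disambiguation}} Mercury may refer to ...</text></revision></page>
</mediawiki>"""


def classify(page):
    """Return a coarse page type for one <page> element."""
    ns = page.findtext("ns")
    text = page.findtext("revision/text") or ""
    if page.find("redirect") is not None:
        return "redirect"
    if ns == "14":
        return "category"
    # Assumption: disambiguation pages are marked by a template in the text.
    if ns == "0" and "{{disambiguation" in text.lower():
        return "disambiguation"
    if ns == "0":
        return "topic"
    return "other"


def build_index(xml_text):
    """Map each title to its page type and collect redirect-based synonyms."""
    kinds = {}
    synonyms = defaultdict(set)
    for page in ET.fromstring(xml_text).iter("page"):
        title = page.findtext("title")
        kind = classify(page)
        kinds[title] = kind
        if kind == "redirect":
            # The redirect source is treated as a synonym of its target.
            synonyms[page.find("redirect").get("title")].add(title)
    return kinds, dict(synonyms)


kinds, synonyms = build_index(SAMPLE)
```

Here `synonyms` groups redirect titles under their target, which is one simple way to seed the synonym thesaurus mentioned above; a disambiguation thesaurus would need the links listed on each disambiguation page, which this sketch does not extract.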

To make the information coverage of the knowledge base as wide as possible, the "topic pages" and "category pages" of the Chinese-language Wikipedia (in both simplified and traditional Chinese), as of November 5, 2012, were downloaded from Wikipedia's official open-source site. After conversion between simplified and traditional Chinese through Microsoft's simplified/traditional conversion API, we collected 2,500,000 Chinese entries corresponding to topic pages and 270,000 category entries.
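The effect of the simplified/traditional conversion step can be illustrated with a small sketch. It does not call the Microsoft API used in the paper; instead it assumes a tiny hand-written character map (`T2S`) and hypothetical helpers (`to_simplified`, `merge_variants`) purely to show how titles that differ only in script might be merged into one entry.

```python
# Toy traditional-to-simplified map covering only the characters used below.
# A real conversion needs a full mapping table or a conversion service.
T2S = str.maketrans({
    "網": "网", "絡": "络",
    "語": "语", "義": "义",
    "計": "计", "機": "机",
})


def to_simplified(title):
    """Normalize a title to simplified Chinese using the toy map."""
    return title.translate(T2S)


def merge_variants(titles):
    """Group titles by their simplified form, keeping every original variant."""
    merged = {}
    for title in titles:
        merged.setdefault(to_simplified(title), []).append(title)
    return merged


titles = ["網絡", "网络", "語義", "计算机", "計算機"]
merged = merge_variants(titles)
```

With this toy input, five raw titles collapse into three entries, which is the kind of deduplication that script normalization makes possible before counting entries.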
