Venue-Influence Language Models for Expert Finding in Bibliometric Networks


Abdullah Al-Barakati (King Abdulaziz University, Jeddah, Saudi Arabia) and Ali Daud (King Abdulaziz University, Jeddah, Saudi Arabia & International Islamic University, Islamabad, Pakistan)
Copyright: © 2018 | Pages: 18
DOI: 10.4018/IJSWIS.2018070109

Abstract

This article investigates a fundamental problem of the traditional language models used for expert finding in bibliometric networks. It introduces novel entropy-based Venue-Influence Language Modeling methods, which can accommodate citation-link-based weights indirectly, without using link information. Intuitively, an author publishing in topic-specific venues, whether journals or conferences, is more likely to be an expert on a topic than an author publishing in multi-topic venues. The proposed methods are evaluated on real-world data from the Digital Bibliography and Library Project (DBLP) dataset. Experimental results show that the proposed venue-influence language model (ViLM) based methods outperform traditional (non-venue-based) language models (LMs).

1. Introduction

Bibliometric networks have been studied intensively in recent years because of their valuable applications in academic recommendation. They usually emerge from authors writing together and papers citing other papers. Many computer science bibliometric networks exist today, such as Citeseer and DBLP, and they give rise to several knowledge discovery problems. For example, with the evolution of Information Retrieval (IR) technologies from the traditional document level to the modern object level (Nie et al., 2007), the expert finding task has attracted many researchers. Typically, the demand for such a system arises when someone needs to learn about a subject and is looking for an expert to guide them. Formally, expert finding addresses the problem of finding the right expert for a specific domain. A typical question is ‘Who are the experts on the topic Data Mining?’, and the researchers found to have topic-relevant expertise can be used to automate diverse recommendation tasks, such as hiring faculty members for a specific research area, finding appropriate members for project evaluation committees, and selecting reviewers of research papers for conferences and journals.

The Text Retrieval Conference (TREC) provides a well-organized platform for the expert finding task (Craswell and Vries, 2005). The methods proposed for this task can be classified into three main types: (1) methods that adapt traditional vector space model (Salton et al., 1975) based document retrieval to propose language models for finding experts related to a query (Balog et al., 2006; Zhu et al., 2009); (2) association or PageRank based graph link exploitation methods, which calculate the importance of authors from co-author relationships and citation links in co-author networks (Fu et al., 2007; Wei et al., 2010); and (3) topic modeling based methods (Daud et al., 2010; Tang et al., 2008; Zhang et al., 2008), which exploit text-based semantics to overcome the exact term matching problem of language model based methods. However, none of these methods exploits the venue’s influence via entropy.

Our intuition is that publication venues that are stringent about accepting only papers related to their topics of interest are higher-level venues and have low entropy, whereas venues that are less stringent about topical relevance have high entropy. Table 1 shows supporting evidence for including the venue’s influence in language models. The well-known Symposium on Geometry Processing (SGP), Special Interest Group conference on Information Retrieval (SIGIR), SIG conference on Management of Data (SIGMOD), and the other venues in the high-level column have lower entropy (disorder) than the venues in the low-level column. To support the claim that low-entropy venues are high-level, we also considered the citations received by the papers published in each venue. The average number of citations per paper in the low-entropy venues, called high-level venues here, is 27.68, whereas the average in the high-entropy venues, called low-level venues, is only 2.61. This indicates a negative relationship between entropy and the average number of citations.
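The preview does not give the exact entropy formula, so the following is only a minimal sketch of the intuition, assuming a venue's entropy is the Shannon entropy of the topic distribution of its papers. The function name `venue_entropy`, the topic labels, and the two toy venues are hypothetical and do not come from the article.

```python
import math
from collections import Counter


def venue_entropy(topic_labels, base=2.0):
    """Shannon entropy of the topic distribution of one venue.

    topic_labels: one topic label per paper published in the venue
    (an assumed input format, not the article's data model).
    """
    counts = Counter(topic_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a venue needs at least one paper")
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p, base)
    return entropy


# Hypothetical venues: one topically focused, one spread over many topics.
focused_venue = ["ir"] * 8 + ["db"] + ["ml"]
broad_venue = ["ir", "db", "ml", "bio", "hw"] * 2

if __name__ == "__main__":
    print("focused:", round(venue_entropy(focused_venue), 3))
    print("broad:  ", round(venue_entropy(broad_venue), 3))
```

Under this assumption, a venue whose papers concentrate on one topic scores lower than a venue whose papers are spread evenly over several topics, which matches the low-entropy versus high-entropy distinction drawn above.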

Table 1. Venues with entropies

High-Level Venues   Entropy   Citations per Paper   Low-Level Venues   Entropy   Citations per Paper
SGP                 1.58      27.83                 BIBE               1.99      2.27
SIGIR               1.74      26.06                 CBMS               1.99      2.71
SIGMOD              1.71      31.05                 CIARP              1.93      1.05
SODA                1.67      21.39                 PAKDD              1.89      5.04
SPAA                1.72      18.00                 ADMA               1.95      1.14
STOC                1.67      41.79                 IOLTS              1.96      3.48
Average             1.68      27.68                 Average            1.95      2.61
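As a quick check of the negative relationship described above, the sketch below recomputes the column averages from the twelve rows of Table 1 and measures a Pearson correlation between entropy and citations per paper. The numbers are copied from the table. The choice of Pearson correlation and the helper names are assumptions made for illustration, not the authors' analysis.

```python
import math
from statistics import mean

# (venue, entropy, citations per paper), copied from Table 1.
HIGH_LEVEL = [
    ("SGP", 1.58, 27.83), ("SIGIR", 1.74, 26.06), ("SIGMOD", 1.71, 31.05),
    ("SODA", 1.67, 21.39), ("SPAA", 1.72, 18.00), ("STOC", 1.67, 41.79),
]
LOW_LEVEL = [
    ("BIBE", 1.99, 2.27), ("CBMS", 1.99, 2.71), ("CIARP", 1.93, 1.05),
    ("PAKDD", 1.89, 5.04), ("ADMA", 1.95, 1.14), ("IOLTS", 1.96, 3.48),
]


def column_means(rows):
    """Return (mean entropy, mean citations per paper) for a group of venues."""
    return mean(r[1] for r in rows), mean(r[2] for r in rows)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


all_rows = HIGH_LEVEL + LOW_LEVEL
r = pearson([e for _, e, _ in all_rows], [c for _, _, c in all_rows])

if __name__ == "__main__":
    print("high-level means:", column_means(HIGH_LEVEL))
    print("low-level means: ", column_means(LOW_LEVEL))
    print("Pearson r (entropy vs. citations):", r)
```

The recomputed means agree with the Average row to within rounding, and the correlation comes out negative. With only twelve venues, this is an illustration of the trend rather than a statistical test.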
