1. Introduction
Bibliometric networks have attracted considerable attention recently because of their valuable applications in academic recommendation. They typically emerge from authors writing papers together and from papers citing other papers. Several computer science bibliometric networks exist today, such as CiteSeer and DBLP, which give rise to a variety of knowledge discovery problems. For example, with the evolution of Information Retrieval (IR) technologies from traditional document-level to modern object-level retrieval (Nie et al., 2007), the expert finding task has attracted many researchers. Typically, the demand for such a system arises when someone needs to learn about a subject and is looking for an expert for guidance. Formally, expert finding addresses the problem of finding the right expert for a specific domain. A query may take the form 'Who are the experts on the topic Data Mining?', and the retrieved researchers with topic-relevant expertise can then be used to accomplish diverse recommendation tasks automatically, such as hiring faculty members in a specific research area, selecting members for project evaluation committees, and assigning reviewers for conference and journal submissions.
The Text Retrieval Conference (TREC) provides a well-organized platform for the expert finding task (Craswell and Vries, 2005). The methods proposed for tackling this task fall mainly into three types: (1) language models for finding experts related to a query, adapting document retrieval methods based on the traditional vector space model (Salton et al., 1975) (Balog et al., 2006; Zhu et al., 2009); (2) association- or PageRank-based graph link exploitation methods, which calculate the importance of authors from co-authorship relationships and citation links in co-author networks (Fu et al., 2007; Wei et al., 2010); and (3) topic modeling based methods (Daud et al., 2010; Tang et al., 2008; Zhang et al., 2008), which exploit text-based semantics to overcome the exact term matching problem of language model based methods. Unfortunately, none of the above-mentioned methods exploits the venue's influence via entropy.
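To make the first family of methods concrete, the following sketch scores each candidate author with a Jelinek-Mercer smoothed unigram language model built from the author's publications, in the spirit of the language-model-based approaches cited above. It is an illustrative approximation, not the method of any cited paper; the function name and the smoothing weight `lam` are our own choices.

```python
from collections import Counter

def score_author(query_terms, author_docs, collection, lam=0.5):
    """Score an author for a query with a smoothed unigram language model.

    The author is represented by the concatenation of their papers
    (each paper is a list of tokens). Term probabilities are
    interpolated (Jelinek-Mercer) with the collection model so that
    unseen query terms do not zero out the score.
    """
    author_terms = [t for doc in author_docs for t in doc]
    author_tf = Counter(author_terms)
    coll_tf = Counter(t for doc in collection for t in doc)
    coll_len = sum(coll_tf.values())

    score = 1.0
    for term in query_terms:
        p_author = author_tf[term] / len(author_terms) if author_terms else 0.0
        p_coll = coll_tf[term] / coll_len
        score *= lam * p_author + (1 - lam) * p_coll
    return score
```

Candidates are then ranked by this score for a query such as `["data", "mining"]`; an author whose papers actually contain the query terms receives a higher likelihood than one whose score comes only from the collection background model.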
Our intuition is based on the observation that publication venues which are stringent in accepting papers related to their topics of interest are high-level and have low entropy, whereas publication venues that are not very stringent in accepting papers related to their topics of interest have high entropy. Table 1 provides supporting evidence for our intuition of including the venue's influence in language models. One can see that the well-known Symposium on Geometry Processing (SGP), Special Interest Group conference on Information Retrieval (SIGIR), SIG conference on Management of Data (SIGMOD), and the other venues in the high-level column have low entropy (disorder) compared to the venues in the low-level column, which have higher entropy. In addition, to confirm that venues with low entropy are high-level, we considered the citations received by the papers published in each venue. The average number of citations per paper published in the low-entropy venues, called here high-level venues, is 27.68, while the average for the high-entropy venues, called low-level venues, is only 2.61. This shows a negative relationship between entropy and the average number of citations.
Table 1.

| High-Level Venues | Entropy | Citations per Paper | Low-Level Venues | Entropy | Citations per Paper |
|---|---|---|---|---|---|
| SGP | 1.58 | 27.83 | BIBE | 1.99 | 2.27 |
| SIGIR | 1.74 | 26.06 | CBMS | 1.99 | 2.71 |
| SIGMOD | 1.71 | 31.05 | CIARP | 1.93 | 1.05 |
| SODA | 1.67 | 21.39 | PAKDD | 1.89 | 5.04 |
| SPAA | 1.72 | 18.00 | ADMA | 1.95 | 1.14 |
| STOC | 1.67 | 41.79 | IOLTS | 1.96 | 3.48 |
| **Average** | 1.68 | 27.68 | **Average** | 1.95 | 2.61 |
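The entropy values above can be read as Shannon entropy over a venue's topic distribution. The following is a minimal sketch, assuming each paper carries a single topic label; the function name and the toy venues are hypothetical, not taken from the dataset behind Table 1.

```python
import math
from collections import Counter

def venue_entropy(paper_topics):
    """Shannon entropy (base 2) of a venue's topic distribution.

    paper_topics: one topic label per paper published at the venue.
    A venue concentrated on a few topics has low entropy; a venue
    accepting papers across many topics has high entropy.
    """
    counts = Counter(paper_topics)
    n = len(paper_topics)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A focused (hypothetical) venue vs. a topically broad one:
focused = ["IR"] * 8 + ["DB"] * 2
broad = ["IR", "DB", "ML", "HCI", "SE", "IR", "DB", "ML", "HCI", "SE"]
```

Here `venue_entropy(focused)` is well below `venue_entropy(broad)`, mirroring the contrast between the high-level and low-level columns of Table 1: stricter topical focus yields lower entropy.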