1. Introduction
Information retrieval is presently a prominent area of research owing to the exponential growth of the World Wide Web in the form of text documents, images, videos and much more. It is applied to general applications such as web search, media search and information filtering, as well as to domain-specific applications such as vertical search and geographic information retrieval. The search engine is the most significant tool designed for retrieving information from the abundant sources available on the web. These sources may be authoritative or unreliable, and may have restricted or open access. There are several reasons for the rapid change and expansion in the size and structure of the World Wide Web. Web search algorithms therefore need to be dynamic and robust in order to tackle inevitable challenges such as spam filtering, ambiguous and lengthy queries, complex search, personalized search and session search. A single search engine, on its own, is not sufficient to overcome all of these obstacles and deliver effective and reliable results. Using multiple search engines is therefore essential for the retrieval of web data. The problems and drawbacks that a single search engine suffers from are documented in the literature.
A solo search engine is insufficient to encompass the enormous resources present on the web (Beg and Ahmad, 2003). Moreover, indexed copyrighted documents, such as those in online digital libraries, are not included in the overall web coverage. Also, owing to the fast growth in the size of web data, the search engine algorithm is hardly able to optimize the trade-off between web page update frequency and coverage (Renda and Straccia, 2003). In addition, the persistent problem of spam ranking means that bias in the rankings cannot be eliminated, which leads to misleading and inaccurate ranking results. To overcome these shortcomings, the technique of rank aggregation (RA) is integrated with multiple search algorithms. The resulting search algorithm is known as a meta-search (MS) algorithm.
An MS algorithm incorporates the different methods and techniques of the contributing search engines by fusing all of their result rankings together. Involving several search algorithms yields diverse coverage of web documents, so that an MS engine reaches almost the entire World Wide Web. Spam is reduced by a consistency check on the rankings of the different search engines: an irregular ranking generated by one search algorithm is filtered out by the RA algorithm (Aslam and Montague, 2001). The MS engine also frees the user from advanced query formulation through automatic word associations (Dwork et al., 2001). As a result, a simple query that contains only keywords is sufficient for web search. Owing to these advantages, the MS engine is preferred over the solo search engine. It is designed by applying an RA algorithm to the resultant rankings of the contributing search algorithms. Therefore, the effectiveness of the MS engine depends on the merit of the underlying RA algorithm.
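To make the fusion step concrete, the following is a minimal sketch of a positional RA method, the Borda count, applied to the result lists of three hypothetical search engines. The function name `borda_aggregate`, the document identifiers and the convention that a document missing from an engine's list receives zero points from that engine are all assumptions made for illustration; they are not taken from the works cited above.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Fuse several ranked lists into one list using a simple Borda count.

    Assumption: a document that an engine does not return receives
    zero points from that engine.
    """
    # All documents returned by at least one engine.
    universe = set()
    for ranking in rankings:
        universe.update(ranking)
    n = len(universe)

    # The document at position 0 earns n points, position 1 earns n - 1, etc.
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            scores[doc] += n - position

    # Highest total first; ties are broken by identifier for determinism.
    return sorted(universe, key=lambda d: (-scores[d], d))


# Hypothetical result lists from three contributing search engines.
engine_a = ["d1", "d2", "d3", "d4"]
engine_b = ["d2", "d1", "d4"]
engine_c = ["d2", "d3", "d1", "spam"]

fused = borda_aggregate([engine_a, engine_b, engine_c])
print(fused)
```

In this toy input the page labelled `spam` is returned by only one engine, so it collects few points and ends up at the bottom of the fused list, which illustrates the consistency check described above.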
The RA algorithms found in the literature fall into several types: (i) positional methods; (ii) score-based methods; (iii) learning techniques; (iv) probabilistic methods; (v) Markov chain techniques; and (vi) fuzzy logic techniques (Dwork et al., 2001; Aslam and Montague, 2001; Renda and Straccia, 2003; Beg and Ahmad, 2003; Liu et al., 2007; Ailon, 2008; Akritidis et al., 2010; Qin et al., 2010; Yasutake et al., 2013; Desarkar et al., 2016). The RA technique based on nuclear norm minimization by Gleich and Lim (2011) combines pairwise aggregation with the matrix completion problem to obtain the rankings of objects. Recognizing the analogy between pairwise preference judgments and the tournament problem led to the derivation of relevance scores for documents, from which the rankings are generated (Bashir et al., 2013). The RA technique is also applied in crowdsourcing to obtain judgments over objects from their ratings and preferences (Nui et al., 2015).
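The analogy between pairwise preference judgments and a tournament can be sketched as follows. This is a generic Copeland-style win count, not the specific scoring scheme of Bashir et al. (2013): each pair of documents plays one "match", decided by the majority of input rankings, and a document's score is its number of match wins. The function name `tournament_aggregate`, the sample rankings and the rule that an unlisted document is treated as ranked below every listed one are assumptions made for illustration.

```python
from itertools import combinations


def tournament_aggregate(rankings):
    """Rank documents by pairwise majority wins (Copeland-style sketch).

    Assumption: a document missing from a ranking is treated as if it
    were placed just below every document that ranking does list.
    """
    docs = sorted({d for ranking in rankings for d in ranking})
    # Position lookup for each input ranking.
    positions = [{d: i for i, d in enumerate(r)} for r in rankings]

    wins = {d: 0.0 for d in docs}
    for a, b in combinations(docs, 2):
        a_votes = b_votes = 0
        for pos in positions:
            pa = pos.get(a, len(pos))
            pb = pos.get(b, len(pos))
            if pa < pb:
                a_votes += 1
            elif pb < pa:
                b_votes += 1
        # The majority decides the match; a drawn match gives half a win each.
        if a_votes > b_votes:
            wins[a] += 1
        elif b_votes > a_votes:
            wins[b] += 1
        else:
            wins[a] += 0.5
            wins[b] += 0.5

    # Most wins first; ties are broken by identifier for determinism.
    return sorted(docs, key=lambda d: (-wins[d], d))


# Hypothetical input rankings.
sample_rankings = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1", "spam"],
]
tournament_order = tournament_aggregate(sample_rankings)
print(tournament_order)
```

A win count is only one of several ways to turn pairwise outcomes into relevance scores; it is used here because it keeps the tournament analogy easy to follow.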