Article Preview
TopIntroduction
Ranking is the problem observed in many information retrieval systems. Learning to rank is the application of using machine learning methods to construct a model for automatically predicting the ranking for a list of documents. Learning to rank has been applied in document retrieval, multimedia retrieval, and collaborating filtering. A large number of algorithms have been proposed for learning to rank (Liu, 2011) for document retrieval applications. In document ranking, training data contain a list of queries and a list of documents related to each query where relative judgments are provided for each query-document pair. Each instance is a query-document pair that is represented by a feature vector. Training data apply a learning algorithm to learn a ranking model.
Learning to rank algorithms contain three major approaches: pointwise, pairwise and listwise which are different in input and output space, hypothesis and loss function. In the first approach the input space contains a single document-query pair, a hypothesis contains the scoring functions that predict the relevant degree of a document and a loss function evaluates the accuracy of every document. The output space depends on the algorithm that is used. In regression based algorithms, the output is a relevancy score which is a real value. Some studies consider ranking as a classification problem including binary and multiclass classification. Output space in these works contains binary relevant judgment (0 or 1) and multiclass label. In some studies which consider ranking as an ordinal regression problem, output are ordered categories. In the pairwise approach input space includes pairs of documents represented by feature vectors, and a hypothesis which mostly contains scoring functions. The loss function measures the inconsistency between ground truth and model output and output space contains a preference between pairs of documents. Many algorithms presented for this approach contain neural network model (Burges & Shaked, 2005), boosting methods (Freund & Iyer, 2003) and SVM models (Joachims, 2006). In the listwise approach input space includes a list of documents associated with a query, hypothesis include scoring function, two kinds of loss function proposed; first directly based on evaluation measure and the second type loss of function is not measured specifically (Cao & Qin, 2007). The first type performs in three ways: approximate a measure, optimize a bound for a measure and directly optimize evaluation measures (Xu & Li, 2007). Output space contains ranked list of documents.
Learning to rank algorithms apply standard ranking models including BM25, language model, and PageRank as features. The number of features in datasets is considerable making computations time consuming leading to a complicated training process. Moreover, the existence of irrelevant, noisy and redundant features can lead to reducing the accuracy of the ranking algorithm. Irrelevant features are those that do not have much effect on the output and a redundant feature plays the same role. Feature selection methods can effectively provide solutions to the aforementioned problems.
In recent years, several works have applied feature selection for learning to rank. Feature selection methods used for classification are categorized into three principal groups (Guyon & Elisseeff, 2003). All of the three categories have been applied in learning to rank, which include preprocessing filter methods, wrapper and embedded methods. Filter methods are preprocessing algorithms which are independent from learning phase contain (Geng & Liu, 2007; Naini & Altingovde, 2014; Gupta & Rosso, 2012; Shirzad & Keyvanpour, 2015; Gigli & Lucchese, 2016; Yu & Oh, 2009; Hua & Zhang, 2010), the wrapper methods constitute an other preprocessing strategy, conducted based on ranking algorithm (Pan & Converse, 2009; Dang and Croft, 2010; Pahikkala & Airola, 2010; Sousa & Canuto, 2016; Yu & Oh, 2009; Hua & Zhang, 2010) where embedded methods run feature selection during learning ranker (Sun & Qin, 2009; Lai & Pan, 2012; Lai & Pan, 2013; Lai & Pan, 2013; 2014; Chang & Zheng, 2009; Krasotkina & Mottl, 2015). Feature selection for learning to rank algorithms can investigate two aspects, ranking model and feature selection models. Learning to rank algorithms has been intensively studied (Liu, 2011) and, consequently, we focus on feature selection strategies. Due to the importance of feature selection in learning to rank algorithms and with regard to the studies that have shown the effectiveness and efficiency of applying it in ranking, in this paper we review and classify most studies.