A Survey on Data Mining Techniques in Research Paper Recommender Systems

Benard Magara Maake (Tshwane University of Technology, South Africa), Sunday O. Ojo (Tshwane University of Technology, South Africa) and Tranos Zuva (Vaal University of Technology, South Africa)
Copyright: © 2019 | Pages: 25
DOI: 10.4018/978-1-5225-8437-7.ch006

Abstract

In this chapter, the authors give an overview of the main data mining techniques that are utilized in the context of research paper recommender systems. These techniques refer to mathematical models and tools that are utilized in discovering patterns in data. Data mining is a term used to describe a collection of techniques that infer recommendation rules and build models from research paper datasets. The authors briefly describe how research paper recommender systems' data is processed, analyzed, and then, finally, interpreted using these techniques. They review different distance measures, sampling techniques, and dimensionality reduction methods employed in computing research paper recommendations. They also review the various clustering, classification, and association rule-mining methods employed to mine for hidden information. Finally, they highlight the major data mining issues that are affecting research paper recommender systems.

1. Introduction

Recommender systems are playing an increasingly significant role in information filtering and search. In the field of research paper recommender systems, various data mining techniques have been used to perform a range of tasks. This chapter highlights the data mining techniques and associated methods that have been used in research paper recommendation. We partly adopt the data mining steps and methods for recommender systems described by Amatriain, Jaimes, Oliver, and Pujol (2011) in the Recommender Systems Handbook (Ricci, Rokach, & Shapira, 2011) to present the data mining methods and technologies employed at the various stages of computing research paper recommendations. Data mining in this context consists of three main steps: the data preprocessing stage, the data analysis stage, and the result interpretation stage. Some of the methods and algorithms cannot be crisply separated and categorized, since most of them overlap.

This review chapter is organized as follows. Section 1 presents the chapter introduction and overview. Section 2 summarizes the data preprocessing methods and measures utilized in research paper recommender systems. Section 3 highlights the classification algorithms utilized by research paper recommender systems. Section 4 presents clustering algorithms, while Section 5 presents other approaches to classification. Section 6 presents the main data mining issues facing research paper recommendation, and Section 7 concludes the chapter.

Figure 1. Data Mining in RPRS

Figure 1 highlights the data mining features, approaches, and processes utilized in research paper recommender systems (RPRS). It represents the three main data mining steps, which are applied consecutively during the processing of data: the data preprocessing step, the data analysis step, and finally the result interpretation step. This chapter, however, dwells mostly on the first two steps, data preprocessing and data analysis, since they actively utilize various data mining techniques.

2. Data Preprocessing in RPRS

Data preprocessing is an important step in machine learning and information retrieval because it screens data for problems that could otherwise lead to misleading results after processing. Real-world datasets in the field of RPRS have generally been incomplete (Gupta & Varma, 2017), noisy (Bogers & Van den Bosch, 2008; Bollen & Van de Sompel, 2006; Dong, Tokarchuk, & Ma, 2009; J. He, Nie, Lu, & Zhao, 2012; Y. Liang, Li, & Qian, 2011; McNee et al., 2002; Torres, McNee, Abel, Konstan, & Riedl, 2004; Tran, Huynh, & Hoang, 2015; Wu, Hua, Li, & Pei, 2012; Xue, Guo, Lan, & Cao, 2014), and inconsistent (Capocci & Caldarelli, 2008), and have thus required tasks that transform them (Nascimento, Laender, da Silva, & Gonçalves, 2011). These preprocessing tasks include data cleaning (Ferrara, Pudota, & Tasso, 2011), data integration (Hwang, Hsiung, & Yang, 2003; Mönnich & Spiering, 2008; Wu et al., 2012; Zarrinkalam & Kahani, 2012), data transformation (Joran Beel & Gipp, 2009), data reduction, and data discretization. Data cleaning ensures that missing values are filled, noisy data is smoothed, outliers are removed (T.-P. Liang, Yang, Chen, & Ku, 2008), and all inconsistencies are resolved. Data integration combines all the necessary files or databases (Zarrinkalam & Kahani, 2012). Data transformation normalises and aggregates the data that will be used for analysis. Data discretization replaces numerical attribute values with nominal ones when the need arises.
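To make three of the preprocessing tasks above more concrete, the following is a minimal Python sketch of data cleaning, data transformation, and data discretization. It is an illustration under stated assumptions, not a method taken from any of the cited systems: the toy citation counts, the choice of mean imputation, min-max normalisation, and the "low/medium/high" cut points are all hypothetical.

```python
from statistics import mean

# Hypothetical toy records: (paper_id, citation_count); None marks a missing value.
records = [("p1", 12), ("p2", None), ("p3", 30), ("p4", 0), ("p5", 18)]


def fill_missing(values):
    """Data cleaning: replace missing values with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalise(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretise(values, low_cut=1 / 3, high_cut=2 / 3):
    """Data discretization: map normalised numbers to nominal labels.

    The cut points are assumed for illustration only.
    """
    labels = []
    for v in values:
        if v < low_cut:
            labels.append("low")
        elif v < high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


counts = [count for _, count in records]
cleaned = fill_missing(counts)
normalised = min_max_normalise(cleaned)
labels = discretise(normalised)

for (paper_id, _), label in zip(records, labels):
    print(paper_id, label)
```

In practice the imputation strategy and the cut points would depend on the dataset; the sketch only shows the order in which the steps are typically chained.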

Key Terms in this Chapter

Data Mining: The practice of examining large pre-existing databases in order to generate new information.

Classification: The action or process of categorizing or grouping something.

Similarity Measure: A measure of how alike two data objects are. In a data mining context, it is usually expressed as a distance whose dimensions represent the features of the objects.

Algorithms: A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.

Recommender System: A subclass of information filtering system that seeks to predict the rating or preference a user would give to an item.
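
As an illustration of the Similarity Measure term defined above, the following Python sketch computes two common measures, Euclidean distance and cosine similarity, over feature vectors. The three term-count vectors are hypothetical, and the choice of these two measures is an assumption made for illustration rather than a summary of the chapter.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of features")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical term counts for three papers over the same three-term vocabulary.
paper_a = [3, 0, 1]
paper_b = [6, 0, 2]  # same term proportions as paper_a, but a longer document
paper_c = [0, 4, 0]  # shares no terms with paper_a

print(euclidean_distance(paper_a, paper_b))
print(cosine_similarity(paper_a, paper_b))
print(cosine_similarity(paper_a, paper_c))
```

Here paper_a and paper_b point in the same direction, so their cosine similarity is close to 1 even though their Euclidean distance is non-zero, which is one reason cosine similarity is often preferred when documents differ in length.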
