Introduction
Recommender systems have played an important role in people’s search for articles of interest, making searching an easy and enjoyable exercise. More importantly, giving personalized recommendations is the beauty of any recommender system, and its effectiveness in this regard depends essentially on the quality of the recommendations it produces. This, no doubt, explains why many authors have placed a high degree of importance on determining the quality of the recommendations obtained, which can be judged using several metrics and criteria. In addition to metrics and criteria, effectiveness can also be measured from the user and system perspectives (Cremonesi et al., 2013). It is reasonable that, for an effective evaluation of recommender systems to take place, it must be done from both perspectives. Evaluation of a recommender system is incomplete without evaluating users’ experiences with it, no matter how effective the system might seem from the system’s perspective; evaluation from the system’s perspective, in turn, involves considering evaluation metrics. There have been attempts to produce an evaluation framework that combines system-centric and user-centric evaluation methods (Kavu et al., 2017), which seems a way of not leaving out any part in forming a clear picture of the effectiveness or quality of a recommender system. From these perspectives, harmonized metrics and criteria for carrying out such an evaluation are desirable, especially if they can be categorized. It has been affirmed that performing an evaluation before deploying a recommender system is essential (Fazeli et al., 2017); this requires a set of metrics and criteria that will provide an agreeable result among relevant stakeholders. While our work focuses on healthcare, there is a general lack of uniformity in the metrics for evaluating recommender systems because many of them are in use (Del Olmo & Gaudioso, 2008).
A number of these metrics have been used by individual researchers based on their perceived appropriateness to the work at hand. Harmonizing and categorizing them will therefore provide a uniform platform for evaluating recommender systems regardless of the objectives of an individual researcher or stakeholder. Valdez et al. (2016) listed evaluation as one of the important steps in building a recommender system. It can then be deduced that thinking about developing a recommender system should go along with thinking about making it meet evaluation criteria, which is obviously best done as part of the requirements and design of such a system.
In justifying their choice of metrics and proving their appropriateness, many authors have chosen to evaluate their work using two or three metrics. Accuracy has been the most popular metric for evaluating recommender systems, sometimes depending on the algorithm used (Del Olmo & Gaudioso, 2008; Sokolova & Lapalme, 2009; Rebouças Filho et al., 2017; Rodrigues et al., 2018). Closely related are precision, recall, and sort priority, which have also been identified as common metrics for evaluating recommender systems (Zhong & Li, 2016; Moreira et al., 2018). However, some have argued that there is a need to look beyond accuracy (Vargas & Castells, 2011; Wu et al., 2012; He et al., 2016). It has also been reported that the quality of recommendations is an important metric that deserves attention, rather than just the predictive accuracy of algorithms (Ge et al., 2008). While accuracy has been given wider publicity as a metric for determining the effectiveness of a recommender system, some researchers have opined that it is not a good measure of the quality perceived by users (Cremonesi et al., 2011), and that other metrics such as serendipity and coverage (Ge et al., 2008), along with confidence (Duan et al., 2011), play a greater role in satisfying users’ recommendation needs. While it may be desirable to deliver surprise recommendations that fit the needs of a particular user (serendipity), delivering health recommendations as a surprise should be done with a great deal of caution. In an ongoing project, Recommendations Sharing Community for Aged and Chronically Ill People (ReSCAP), the concern is more about addressing the specific needs of individuals within this community in order to reduce the time spent searching, given the chronic nature of the ailments of the individuals involved. In this type of project, timeliness is an important factor to consider and can enhance accuracy (Zhang et al., 2017).
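To make the accuracy-oriented metrics mentioned above concrete, the following is a minimal sketch (the function and variable names are our own, not from any of the cited works) of how precision and recall are typically computed for a single user's top-N recommendation list:

```python
def precision_recall_at_n(recommended, relevant, n):
    """Compute precision@N and recall@N for one user.

    recommended: ranked list of recommended item IDs
    relevant: set of item IDs the user actually found relevant
    n: cutoff for the top-N recommendation list
    """
    top_n = recommended[:n]
    # Hits are recommended items the user actually found relevant.
    hits = sum(1 for item in top_n if item in relevant)
    precision = hits / n                                  # fraction of recommendations that were relevant
    recall = hits / len(relevant) if relevant else 0.0    # fraction of relevant items that were recommended
    return precision, recall

# Hypothetical example: 3 of the top-5 recommendations ("a", "c", "e")
# are among the user's 4 relevant items.
p, r = precision_recall_at_n(["a", "b", "c", "d", "e"],
                             {"a", "c", "e", "g"}, n=5)
# p = 3/5 = 0.6, r = 3/4 = 0.75
```

In practice these per-user values are averaged over all users, which is one reason different studies report the metric under slightly different definitions.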