1. Introduction
Rating scales are visual widgets, characterized by specific features (e.g., granularity, numbering, presence of a neutral position), which allow users to provide quantitative input to a system. Each system uses its own rating scale, with different features such as granularity and visual presentation. Examples of rating scales are stars in Amazon (Amazon), Anobii (Anobi) and Barnes & Noble (Barnes & Noble), thumbs in Facebook (Facebook) and YouTube (Youtube), circles in Tripadvisor (Tripadvisor), squares in LateRooms (LateRooms), and bare numbers in Criticker (Criticker). In recommender systems (Adomavicius and Tuzhilin, 2005), users rate items to receive personalized suggestions about other items (similar to the previous ones or liked by similar users).
Understanding how users perceive rating scales, and why they might prefer one to another, is important for interface designers aiming to create more effective and pleasant web sites. This problem can be framed in terms of preferential choices (Jameson, 2012), i.e., situations in which two or more options are available, none of which can be defined as “incorrect”, but one of which can be preferred for some reason (e.g., tasks, user skills, usage context, habits, etc. (Jameson et al., 2011)).
Rating scales are widely studied in the literature, especially in survey design (Garland, 1991; Colman et al., 1997; Amoo & Friedman, 2001; Dawes, 2008) and Human-Computer Interaction (HCI) (Cosey et al., 2003; van Barneveld & van Setten, 2004; Nobarany et al., 2012; Herlocker et al., 2004), but not in terms of preferential choices, i.e., not focusing on users’ decision-making process. In the survey design field, scales are compared according to their psychometric properties, i.e., their ability to detect “real” user opinions. In the HCI field, scales are mainly studied from a usability point of view. In a sense, the question that all previous works aimed to answer was: “What is the scale that measures best?”. We instead aim to answer a different question: “What is the scale that users would choose?”, assuming that there may be other criteria besides precision and usability. With this paper, we investigate how users choose rating scales when they are offered this opportunity. In particular, we study whether users prefer different scales for evaluating different objects, and whether users’ choices change after they have rated a certain object repeatedly, gaining a higher level of experience with the evaluated object.
To answer these questions, we first analysed existing rating scales in order to define an abstract model, which allowed us to identify three generic “classes” of rating scales. Then, we carried out a user study to investigate user choices with respect to three scales chosen as representatives of each class. According to our findings, user choices are influenced by the evaluated objects, and overall preferences for rating scales can change after their repeated use. Based on our results, we formulate some guidelines for system designers.
The main contributions of this paper are:
- A general model of rating scales;
- The results of a user study on preferential choices about rating scales in a website;
- New insights which can help system designers to include the most appropriate rating scales.
The paper is structured as follows. Section 2 presents a preliminary study to devise a model for describing rating scales. Section 3 clarifies how we chose the scales, the objects to evaluate and the use case for our user study, while Sections 4 and 5 describe the study and its results. Section 6 provides the theoretical background for our research and analyzes related works, offering a systematic comparison with the results obtained by the most relevant ones. Section 7 concludes the paper with some guidelines derived from our results and with some possible directions for future work.