As near-infinite amount of data are becoming accessible on the Web, it becomes more important to support intelligent personalized retrieval mechanisms, to help users identify the results of a manageable size satisfying user-specific needs. Example case studies include major search engines, such as Google and Yahoo, recently released personalized search, which adapts the ranking to the user-specific search context. Similarly, e-commerce sites, such as Amazon, are providing personalized product recommendation based on the purchase history and user browsing behaviors. To achieve this goal, it is important to model user preference and mine user preferences from user behaviors (e.g., click history) for personalization. In this article, we discuss recent efforts to extend mining research to preference and identify goals for the future works.
Traditional modeling for user preference can be categorized into (1) quantitative and (2) qualitative approaches. In the quantitative approach, given a set of data objects D, a utility function F assigns a numerical score F(o) for an object o in D. This utility score may aggregate scores on one (i.e., uni-attribute model) or more (i.e., multi-attribute model) attributes F(a1, . . ., an), when o = (a1, . . ., an). For instance, the utility of a house with price = 100k and size = 100 square foot can be quantified by a user-specific utility function, e.g., F = size/price, into a score, such that houses maximizing the scores, e.g., with largest size per unit price, can be retrieved.
Alternatively, in the qualitative approach, the preference on each object is stated in comparison to other objects. That is, given two objects x and y, instead of quantifying preferences into numerical scores, users simply state which one is more preferred, denoted as x > y or y > x. Alternatively, users may state indifference x ¡ y. Compared to quantitative modeling requiring users to quantify numerical scores of all objects, qualitative modeling is considered to be more intuitive to formulate (Payne, Bettman, & Johnson, 1993), while less efficient for evaluating the absolute utility of the specific object, as such evaluation requires relative comparisons to all other objects. Meanwhile, qualitative approach may also aggregate multiple orderings. Optimal results from such aggregation is formally defined as pareto-optimality as stated below.
Definition 1 (Pareto-optimality). A tuple x dominates another tuple y if and only if as x > y or x ¡ y in all the given orderings.Top
Major components of enabling personalized retrieval can be identified as (1) preference modeling, (2) preference mining, and (3) personalization, each of which will be discussed in the following three subsections.
As discussed in the prior section, preferences are modeled typically as (1) quantitative utility function (Fishburn, 1970; Keeney & Raiffa, 1976) or (2) qualitative utility orderings (Payne et al., 1993). Personalization is guided by preferences represented in these two models, to retrieve ideal data results that maximize the utility. To maximize quantitative utility, ranking query (Guentzer, Balke, & Kiessling, 2000; Fagin, Lotem, & Naor, 2003) of returning few highly preferred results has been studied, while to maximize qualitative utility, skyline query (Börzsönyi, Kossmann, & Stocker, 2001; Godfrey, Shipley, & Gryz, 2007) of returning pareto-optimal objects not less preferred to (or “dominated” by) any other object based on the given qualitative orderings, as we will discuss in detail in the personalization section.