Introduction
One of the goals of recommender systems is to help users navigate large amounts of data. Existing recommender systems are usually categorized into content-based methods and collaborative filtering methods (Adomavicius et al., 2005). Content-based methods recommend items similar to those that interested the user in the past, whereas collaborative filtering methods recommend items that interested similar users.
Applying recommendation technology to databases, especially for recommending queries, is an emerging and promising topic (Khoussainova et al., 2009; Chatzopoulou et al., 2009; Stefanidis et al., 2009). It is of particular relevance to the domain of multidimensional databases, where OLAP analysis is inherently tedious: the user has to navigate large datacubes to find valuable information, often with no idea of what her forthcoming queries should be. This is often the case in discovery-driven analysis (Sarawagi et al., 1998), where the user investigates a particularly surprising drop or increase in the data.
In our earlier work (Giacometti et al., 2008, 2009a) we proposed adapting techniques stemming from collaborative filtering to recommend OLAP queries to the user. The basic idea is to compute a similarity between the current user’s sequence of queries (a session) and the former sequences of queries logged by the server. In that work, similarity between sessions is based only on the query text, irrespective of the query results. In the present article, to take into consideration what the users were looking for, we leverage query results to compute recommendations. Our approach is inspired by what is done in web search and e-commerce applications (Parikh et al., 2008), where inferred properties of former sessions are used to support the current session.
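To make the idea of text-based session similarity concrete, the following minimal sketch compares sessions as sets of query strings. It is only an illustration under stated assumptions: the Jaccard measure and the function names `jaccard`, `most_similar_session` and `recommend_next` are hypothetical choices made here, not the similarity measure or the algorithm of the cited earlier work.

```python
def jaccard(session_a, session_b):
    """Jaccard similarity between two sessions, each a sequence of query texts.

    Assumption: queries are compared by exact text equality only.
    """
    a, b = set(session_a), set(session_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar_session(current, log):
    """Return the logged session closest to the current one."""
    return max(log, key=lambda former: jaccard(current, former))


def recommend_next(current, log):
    """Suggest the queries of the closest former session not yet issued."""
    best = most_similar_session(current, log)
    seen = set(current)
    return [query for query in best if query not in seen]
```

Such a measure sees only the query text: two sessions exploring the same anomaly through differently written queries would be judged dissimilar, which is the limitation that the use of query results is meant to address.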
The present work improves on Giacometti et al. (2009b), where we proposed a framework tailored for recommending queries in the context of discovery-driven analysis of OLAP cubes. The basic idea is to infer, for every former session on the OLAP system, what the user was investigating. As is the case in discovery-driven analysis, this takes the form of a pair of cells showing a significant unexpected difference in the data. We proposed a framework for detecting such pairs in the log of an OLAP server, arranging them into a specialisation relation, and recording, per session, the queries at various levels of detail that contain the detected pairs. During subsequent analyses, if a difference is found that was investigated in a former session, then the discoveries of this former session are suggested to the current user.
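The detect, record and suggest cycle described above can be sketched as follows. This is a simplified illustration under several assumptions, not the framework itself: a query result is modelled as a dictionary from cell coordinates to a measure value, a "significant difference" is approximated by a relative-difference threshold, and the specialisation relation between pairs is left out entirely. All names (`difference_pairs`, `build_index`, `recommend`, `threshold`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def difference_pairs(result, threshold=0.5):
    """Return pairs of cell coordinates whose measures differ markedly.

    Assumption: 'markedly' means a relative difference above `threshold`;
    the actual significance test is not modelled here.
    """
    pairs = set()
    for (c1, v1), (c2, v2) in combinations(sorted(result.items()), 2):
        base = max(abs(v1), abs(v2))
        if base > 0 and abs(v1 - v2) / base > threshold:
            pairs.add((c1, c2))
    return pairs


def build_index(log):
    """Map each detected pair to the (session, query) entries containing it.

    `log` maps a session id to a list of (query_text, result) tuples.
    """
    index = defaultdict(list)
    for session_id, queries in log.items():
        for query_text, result in queries:
            for pair in difference_pairs(result):
                index[pair].append((session_id, query_text))
    return index


def recommend(current_result, index):
    """Suggest logged queries that contain a pair found in the current result."""
    suggestions = []
    for pair in sorted(difference_pairs(current_result)):
        for entry in index.get(pair, []):
            if entry not in suggestions:
                suggestions.append(entry)
    return suggestions
```

In this sketch the index is keyed by the pair rather than by the session, so a pair investigated in several former sessions yields suggestions from all of them.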
The goal of the present paper is to demonstrate the validity of this approach for recommending queries in the particular context of discovery-driven analysis of OLAP cubes. To this end, we extend the work of Giacometti et al. (2009b) in the following ways. First, the framework has been slightly changed to better take into account sessions investigating the same difference pair. This means that discoveries are no longer recorded only for a particular session but can span several sessions. Second, the framework has been implemented, and we conducted a few experiments to assess the effectiveness and the efficiency of our approach. Finally, we propose a dedicated architecture for implementing the approach beyond a prototypical setting.
This paper is organized as follows. The second section illustrates our approach with a simple yet realistic example. The third section reviews related work. Preliminary definitions of the OLAP data model and query model are recalled in the fourth section. The framework of our recommender system is formally presented in the fifth section, and the algorithms are presented in the sixth section. In these sections, the example given in the second section is used as a running example to illustrate the framework. The seventh section introduces our prototypical implementation of the framework, and the eighth section presents some preliminary experiments. Finally, before concluding, we briefly discuss the feasibility of our approach in a real context and propose an architecture for it.