On Estimating the Maximum Domination Value and the Skyline Cardinality of Multi-Dimensional Data Sets

On Estimating the Maximum Domination Value and the Skyline Cardinality of Multi-Dimensional Data Sets

Eleftherios Tiakas (Department of Informatics, Aristotle University, Thessaloniki, Greece), Apostolos N. Papadopoulos (Department of Informatics, Aristotle University, Thessaloniki, Greece) and Yannis Manolopoulos (Department of Informatics, Aristotle University, Thessaloniki, Greece)
Copyright: © 2013 |Pages: 23
DOI: 10.4018/ijkbo.2013100104
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

The last years there is an increasing interest for query processing techniques that take into consideration the dominance relationship between items to select the most promising ones, based on user preferences. Skyline and top-k dominating queries are examples of such techniques. A skyline query computes the items that are not dominated, whereas a top-k dominating query returns the k items with the highest domination score. To enable query optimization, it is important to estimate the expected number of skyline items as well as the maximum domination value of an item. In this article, the authors provide an estimation for the maximum domination value under the dinstinct values and attribute independence assumptions. The authors provide three different methodologies for estimating and calculating the maximum domination value and the authors test their performance and accuracy. Among the proposed estimation methods, their method Estimation with Roots outperforms all others and returns the most accurate results. They also introduce the eliminating dimension, i.e., the dimension beyond which all domination values become zero, and the authors provide an efficient estimation of that dimension. Moreover, the authors provide an accurate estimation of the skyline cardinality of a data set.
Article Preview

Introduction

Top-k and skyline queries are two alternatives to pose preferences in query processing. In a top-k query a ranking function is required to associate a score to each item. The answer to the query is the set of k items with the best scores. A skyline query does not require a ranking function, and the result is based on preferences (minimization or maximization) posed in each attribute. The result is composed of all items that are not dominated. For the rest of the work we deal with multidimensional items, where each dimension corresponds to an attribute. Formally, a multidimensional item dominates another item when:

where d is the number of dimensions. A top-k dominating query may be seen as a combination of a top-k and a skyline query. More specifically, a top-k dominating query returns the k items with the highest domination scores. The domination value of an item p, denoted as dom(p), equals the number of items that p dominates (Yiu & Mamoulis, 2007; Yiu & Mamoulis, 2009).

The maximum domination value is the number of items dominated by the top-1 (best) item. More formally, let us assign to each item t of the data set D a score, m(t), which equals the number of items that t dominates: Then, if p is the item with the maximum domination value we have:

An example is illustrated in Figure 1. A tourist wants to select the best hotel according to the attributes distance to the beach and price per night. The domination values of all hotels A, B, C, D, E, F, G, H, I, J are 0, 1, 0, 0, 2, 0, 4, 6, 2, 3 respectively, thus the hotel with the maximum domination value is H. This hotel is the best possible selection, whereas the next two best choices are hotels G and J.

Figure 1.

The hotel data set

In this work, we focus on estimating the maximum domination value and the skyline cardinality in multi-dimensional data under the dinstinct values and attribute independence assumptions. Estimating the maximum domination value and the skyline cardinality contributes in:

  • Optimizing top-k dominating and skyline algorithms,

  • Estimating the cost of top-k dominating and skyline queries,

  • Developing pruning strategies for these queries and algorithms.

Moreover, we show that the maximum domination value is closely related to the cardinality of the skyline set.

A preliminary version of this study appears in (Tiakas et al., 2010) where we have presented the estimation methods for the maximum domination value only. The current version is more complete with the following new material:

  • The proposed estimation method is further extended to provide efficient estimations for the skyline cardinality also,

  • Additional experimental results are presented to study the accuracy of all estimation methods (including the proposed) for the skyline cardinality,

  • A study for the eliminating dimension, i.e., the dimension after which there are no dominations between the items is presented.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 8: 4 Issues (2018): 1 Released, 3 Forthcoming
Volume 7: 4 Issues (2017)
Volume 6: 4 Issues (2016)
Volume 5: 4 Issues (2015)
Volume 4: 4 Issues (2014)
Volume 3: 4 Issues (2013)
Volume 2: 4 Issues (2012)
Volume 1: 4 Issues (2011)
View Complete Journal Contents Listing