A New Progressive Method for Computing Skyline Queries

A New Progressive Method for Computing Skyline Queries

Zekri Lougmiri (Department of Computer Science, University of Oran1 Ahmed Ben Bella, Oran, Algeria)
Copyright: © 2017 |Pages: 21
DOI: 10.4018/JITR.2017070101
OnDemand PDF Download:
$37.50

Abstract

Skyline queries are important in many fields, especially for decision making. In this context, objects or tuples of databases are defined according to some numerical and non numerical attributes. The skyline operator acts on the numerical ones. The algorithms that implements this skyline operator are genrally of progressive or non progressive. The progressive ones return the skyline operator during its execution while non preogressive alogrithms return the result at the end of its execution. This paper presents a new progressive algorithm for computing the skyline points. This algorithm is based on sorting as a preprocessing of the input. The authors present new theorems for deducing promptly the first skyline points and reducing the candidate space. A new version of Divide-and-Conquer algorithm is used for computing the final skyline. Intensive experimentations on both real and synthetic datasets show that our algorithm presents best performance comparatively to other methods.
Article Preview

1. Introduction

In databases, every object is described by a set of attributes. This set induces that is impossible to make a total order on objects. In this context, skyline queries are used in order to locate the set of objects which are not dominated by any other objects. This type of queries is useful for multi-criteria decision-making. The skyline has been known as the Pareto front. It was defined in the context of mathematics from the works of the economist Vilfredo Pareto(Pareto, 1896). In 2001, it was adapted in the context of databases (Börzsonyi et al.,2001) and the skyline operator was defined. The dominance relation is defined as follows: Given a set S of points in a vector space of dimension d (d criteria or attributes). Let D = {d1, d2,…, dd} the set of all dimensions and consider two points or objects p and q in S. The dominance () according to D is defined as follows:

  • For 1 ≤ i;j≤ d, p q, if and only if {∀di∈D, p(i)≤ q(i)} and {∃dj∈D / p(j)< q(j)}.

When (p q) and (q p) we said that p and q are not dominated or are competitors.

From the set S, the skyline operator returns all points which are not dominated by any other object according to all dimensions D: SkyD(S) = {p∈S/p}.

The datasets, on which this operator applies, are either real or synthetic. The real datasets arise from real areas such as sports, banks, real estate agencies,…, while the synthetic data are products of programs which generate data following laws of statistics (Börzsonyi et al., 2001). Synthetic data are of three types. The first one defines the correlated data. This type represents an environment in which the points that are good in one dimension are also good in other dimensions. The second type contains the anti-correlated data where an environment in which the points that are good in one dimension are bad in one or all of the other dimensions. Independent data gives up the third kind where all attribute values are generated independently using a uniform distribution.

The algorithms that process the skyline should work progressively (Kossmann et al., 2002); that is, they must return the results to users as they detect them. This requirement is essential because users wish to receive quickly answers to their queries.

The contributions of this paper are presented below:

  • 1.

    In this paper, we give, in more details, our method for reducing the data space and the calculus of the key points Maxsys and Minsys (Zekri & Belaicha (a), 2015). We explain it on a three dimensions example;

  • 2.

    We give new results about our pruning method. These results include statistics about the first skyline points early detected in our method and about discarded points with respect to the set of overall points. This paper contains also new results, which we could not present in (Zekri & Belaicha (b), 2015). This publication was a short paper,(just six pages) with a special format, we gave only theroritical aspects of the elimination process with no pruning method results . The results in this paper concern real and synthetic datasets;

  • 3.

    We present a new method for computing the final skyline. This algorithm is of type Divide-and-Conquer. It can avoid duplicated tests from which DC (Börzsonyi et al.,2001) suffers. This new algorithm is called SRDS for Sorted and Reduced Data Space. We apply our algorithm on the reduced space obtained by the filtering method (Zekri & Belaicha(a), 2015);

  • 4.

    As BNL is defined as the best algorithm in the best cases, we compare it to SRDS in terms of time response according to cardinality and dimension variations;

  • 5.

    Experimentations on real and synthetic databases are also given.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 10: 4 Issues (2017)
Volume 9: 4 Issues (2016)
Volume 8: 4 Issues (2015)
Volume 7: 4 Issues (2014)
Volume 6: 4 Issues (2013)
Volume 5: 4 Issues (2012)
Volume 4: 4 Issues (2011)
Volume 3: 4 Issues (2010)
Volume 2: 4 Issues (2009)
Volume 1: 4 Issues (2008)
View Complete Journal Contents Listing