A Tour of Lattice-Based Skyline Algorithms

A Tour of Lattice-Based Skyline Algorithms

Markus Endres (University of Augsburg, Germany) and Lena Rudenko (University of Augsburg, Germany)
DOI: 10.4018/978-1-5225-5396-0.ch006
OnDemand PDF Download:
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

A skyline query retrieves all objects in a dataset that are not dominated by other objects according to some given criteria. There exist many skyline algorithms which can be classified into generic, index-based, and lattice-based algorithms. This chapter takes a tour through lattice-based skyline algorithms. It summarizes the basic concepts and properties, presents high-performance parallel approaches, shows how one overcomes the low-cardinality restriction of lattice structures, and finally presents an application on data streams for real-time skyline computation. Experimental results on synthetic and real datasets show that lattice-based algorithms outperform state-of-the-art skyline techniques, and additionally have a linear runtime complexity.
Chapter Preview
Top

Introduction

The Skyline operator (Börzsönyi, Kossmann, & Stocker, 2001) has emerged as an important and very popular summarization technique for multi-dimensional datasets. A Skyline query, also known as Pareto preference query, selects those objects from a dataset D that are not dominated by any others. An object p having d attributes (dimensions) dominates an object q, if p is better than q in at least one dimension and not worse than q in all other dimensions, for a defined comparison function. This dominance criterion defines a partial order and therefore transitivity holds. The Skyline is the set of points which are not dominated by any other point of D. Without loss of generality, the Skyline with the min function for all attributes is used in this chapter.

The most cited example on Skyline queries is the search for a hotel that is cheap and close to the beach (Börzsönyi, Kossmann, & Stocker, 2001). Unfortunately, these two goals are complementary, as the hotels near the beach tend to be more expensive. In Figure 1 each hotel is represented as a point in the two-dimensional space of price and distance to the beach. Interesting are all hotels that are not worse than any other hotel in both dimensions.

The hotels p6, p7, p8 are dominated by hotel p3. The hotel p9 is dominated by p4, while the hotels p1, p2, p3, p4, p5 are not dominated by any other hotels and build the Skyline. From the Skyline, one can now make the final decision, thereby weighing the personal preferences for price and distance to the beach.

Figure 1.

Skyline example

978-1-5225-5396-0.ch006.f01

Algorithms of the block-nested-loop class (BNL) (Börzsönyi, Kossmann, & Stocker, 2001) are the most prominent algorithms for computing Skylines. In fact, the basic operation of collecting maxima during a single scan of the input data can be found at the core of several Skyline algorithms, cp. (Godfrey, Shipley, & Gryz, 2007; Chomicki, Ciaccia, & Meneghetti, 2013). There are also algorithms utilizing an index structure to compute the Skyline, e.g., (Tan, Eng, & Ooi, 2001; Papadias, Tao, Fu, & Seeger, 2003; Lee, Zheng, Li, & Lee, 2007; Endres & Weichmann, 2017). However, index-based algorithms are not capable of processing arbitrary data without any preparations.

Complete Chapter List

Search this Book:
Reset