Article Preview
TopIntroduction
Skyline (Borzsony, Kossmann, & Stocker, 2001) algorithm has been widely concerned since it was proposed, and it has been widely used in the field of recommendation strategy. Due to the limitation that skyline algorithm can't set query points individually, its variant dynamic skyline query emerges as the times require. However, the main function of skyline and dynamic skyline(Zhou, Qin, Zhang, & Jian-Qiu, 2019) is to find the target point according to the query point. For example, when a diner wants to find the most favorite restaurant to eat, he can find the most qualified restaurant by using his demands as the query point. But what if the query is not from a diner but a restaurant? How do restaurant owners find potential diners? Reverse skyline algorithm is proposed to solve this problem. Reverse skyline algorithm takes the information attribute of the businesses as the query point and the potential user as the output point. This can help businesses to find potential customers, so as to target advertising and attract target customers.
With the rapid growth of data in recent years, the traditional reverse skyline query algorithm based on single machine(Rajba, 2021) processing mode is no longer practical. Therefore, it is vital to improve and apply the Reverse Skyline query algorithm in a distributed environment (Gaj et al., 2013). The MapReduce(Dean & Ghemawat, 2008) design pattern effectively implements parallel processing of big data under distributed environment by using the idea of partition(Ning, Sun, Zhao, Xing, & Li, 2020). However, MapReduce was designed to store the data without moving(Ramanathan & Latha, 2019), as well as the subsequent data manipulation. Each query will inevitably result in the overall traversal of the data. Quantities of IO transmission in the shuffle process will consequently ruin the query efficiency. Therefore, how to take advantage of MapReduce and avoid traversal is a matter of high importance to design and implement an efficient, stable, universal, and well-suited reverse skyline query algorithm for distributed environments in the contemporary big data environment.
Most of the previous skyline query algorithms based on the pruning strategy(Krishnamoorthy, 2015) are difficult to deal with the situation of a large number of data(Islam, Rahayu, Liu, Anwar, & Stantic, 2017). This situation is worse in reverse skyline query, where the amount of calculation is generally greater than skyline query. This paper defines the filter set to rule out a large portion of the data points and cut the cost of subsequent operations. The window query(Jing, Xin, & Liu, 2005) is a necessary and sufficient condition to judge whether a data point is the reverse skyline point of the query point. This concept is widely used in the reverse skyline query algorithm. Create a window for data points and query points. If there are no other data points in the window, the point becomes the reverse skyline point of the query point. The decision set is defined to reduce the cost of window queries on the remaining points from the filter set. In turn it cuts the consumption of resources to traverse on the remaining points.
Based on the existing research in the industry(Kim & Oh, 2015), the authors make an optimization on the Reverse Skyline query algorithm aiming at efficiency, real-time, reusability and combination with index structure. The main contribution is as follows: