Recognition of Air Passengers' Willingness to Pay for Seat Selection for Imbalanced Data Based on Improved XGBoost

Baiyu Hong, Xiaolong Ma, Weining Tang, Zhangguo Shen
DOI: 10.4018/IJCINI.312249

Abstract

Paid seat selection is an important source of ancillary revenue for airlines, and machine learning-based identification of willingness to pay is of great practical value in helping airlines accurately target passengers who are likely to pay. However, affected by periodic statistical errors, air passenger order data often suffers from problems such as high noise, high dimensionality, and class imbalance. In view of this, this paper proposes a method for identifying air passengers' willingness to pay for seat selection based on improved XGBoost, with improvements integrated across three stages: data, feature, and algorithm. The feasibility of the proposed multi-stage method is verified on a real airline passenger dataset, and the experimental results show that it achieves better classification performance than six classical imbalanced classification models, which provides a basis for precision marketing of airline paid seat selection programs.

1. Introduction

Affected by the global economic environment and market competition, airlines' core ticket revenue has gradually decreased as ticket prices have continued to fall, and many airlines have begun to generate revenue by increasing the added value of their products and developing ancillary services to ease financial pressure on their operations. In addition, given the huge impact of COVID-19 on global air passenger demand, stronger ancillary revenue can help airlines survive the epidemic crisis to some extent. The “paid seat selection service”, one of the newer domestic add-on services, has brought airlines considerable profits because its marginal cost is almost zero. Since ancillary products and services are optional, it is very important for airlines to understand passengers' willingness to pay and to encourage more passengers to choose this service.

Over the years, a great deal of work has explored the identification of customers' willingness to pay for various ancillary services, and machine learning-based identification methods have been found to be more advantageous than traditional statistical methods (Jing et al., 2021; Maliah & Shani, 2021). By analyzing the travel purpose of passengers who have purchased tickets, mining the behavioral characteristics of passengers known to have paid for seat selection, and constructing behavioral models of them (Borisyak et al., 2020; J. Pang, Chen, Li, Xu, & Lin, 2021), it is possible to identify, from the whole passenger population, those who may have similar willingness and ability to pay, and thus to increase revenue through precision marketing at lower cost.

The current research on the travel behavior of civil aviation passengers follows two main directions: passenger behavior segmentation (Pan & Truong, 2021) and passenger value calculation (Nakahara & Yada, 2011). Most airlines segment passengers according to the fare of the ticket purchased or the accumulated mileage. The RFM model proposed by marketing expert Bob Stone can be used to quantify customer value, and many scholars (Wu et al., 2020; Wu et al., 2021; Zong & Xing, 2021) use RFM with improved k-means clustering to divide passengers into groups. However, this kind of segmentation only reveals the value of passengers and does not identify the behavioral characteristics of passenger groups. Because the three attributes of the traditional RFM model (recency, frequency, and monetary value) do not fully reflect passenger behavior preferences, the LRFMC model and the TCSDG model were subsequently developed from it. These improved models, combined with classical machine learning classification algorithms such as SVM, KNN, GBDT, and NN, are widely used in passenger behavior prediction (S. Q. Pang & Liu, 2011), personalized recommendation (Tao, 2020), flight delay prediction (Jiang, Liu, Liu, & Song, 2020), and other areas of the aviation business. However, experimental studies on identifying passengers' willingness to pay for seat selection are still limited.
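To make the RFM quantities concrete, the following is a minimal sketch of how recency, frequency, and monetary value might be derived per passenger from booking records. The record layout, the passenger IDs, the fares, and the `rfm_table` helper are all hypothetical and are not taken from the paper or the cited studies; a clustering step such as k-means would then operate on the resulting tuples.

```python
from datetime import date

# Hypothetical booking records: (passenger_id, booking_date, fare_paid).
bookings = [
    ("P1", date(2021, 1, 10), 820.0),
    ("P1", date(2021, 6, 2), 640.0),
    ("P2", date(2020, 11, 20), 1500.0),
    ("P3", date(2021, 5, 15), 300.0),
    ("P3", date(2021, 6, 20), 280.0),
    ("P3", date(2021, 6, 28), 310.0),
]


def rfm_table(records, today):
    """Return {passenger_id: (recency_days, frequency, monetary)}.

    recency   = days since the passenger's most recent booking
    frequency = number of bookings
    monetary  = total fare paid
    """
    table = {}
    for pid, day, fare in records:
        recency, frequency, monetary = table.get(pid, (None, 0, 0.0))
        days_ago = (today - day).days
        # Keep the smallest gap, i.e. the most recent booking.
        recency = days_ago if recency is None else min(recency, days_ago)
        table[pid] = (recency, frequency + 1, monetary + fare)
    return table


rfm = rfm_table(bookings, today=date(2021, 7, 1))
for pid, (r, f, m) in sorted(rfm.items()):
    print(pid, r, f, m)
```

In this toy data, a passenger such as P3 scores well on recency and frequency but low on monetary value, which is the kind of distinction a value-only segmentation captures without saying anything about seat-selection behavior.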

The biggest challenge in identifying air passengers' willingness to pay for seat selection is that the collected data are noisy, high-dimensional, and unevenly distributed across classes, owing to environmental and recording influences during data collection. Therefore, this paper proposes an improved XGBoost-based method for identifying airline passengers' willingness to pay for seat selection from high-dimensional imbalanced data, which performs integrated classification prediction across three stages: data, features, and algorithm.

The main contributions made in this paper are shown below:

  1. Data preprocessing: The majority class is randomly undersampled, and new samples are generated with a conditional GAN (CGAN) to supplement the minority class, so as to address the severe imbalance of passenger order data.

  2. Feature selection: The whale optimization algorithm is improved by combining chaos theory, a nonlinear convergence factor, adaptive weights, and an opposition-based learning strategy, and is then used to find the optimal feature subset and reduce the dimensionality of the passenger dataset.

  3. Algorithm improvement: A gradient harmonizing mechanism (GHM) is introduced to improve the loss function of XGBoost, and a genetic algorithm is used to optimize the parameters of XGBoost to obtain the final willingness recognition model.
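Two of the ingredients listed above, random undersampling and GHM-style reweighting of the loss, can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names, the `ratio` and `bins` parameters, and the binned density estimate are assumptions, and the CGAN oversampling, the improved whale optimization, and the genetic parameter search are not shown. The weights follow the general GHM idea of dividing by the gradient density, so that samples whose gradient norms fall in crowded regions contribute less.

```python
import math
import random


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every minority row (label 1) and a random subset of majority rows.

    ratio is the desired majority/minority count after sampling
    (a hypothetical knob, not a value from the paper).
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), int(round(ratio * len(minority))))
    chosen = sorted(minority + rng.sample(majority, keep))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def ghm_weights(probs, labels, bins=10):
    """GHM-style weight per sample: N / (samples_in_same_bin * bins)."""
    n = len(probs)
    # For cross-entropy, the gradient norm w.r.t. the logit is |p - y|.
    norms = [abs(p - y) for p, y in zip(probs, labels)]
    # Assign each norm in [0, 1] to one of `bins` equal-width bins.
    idx = [min(int(g * bins), bins - 1) for g in norms]
    counts = [0] * bins
    for b in idx:
        counts[b] += 1
    return [n / (counts[b] * bins) for b in idx]


def ghm_grad_hess(logits, labels, bins=10):
    """Weighted logistic-loss gradient and Hessian, one pair per sample.

    The weights are treated as constants when differentiating.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    w = ghm_weights(probs, labels, bins)
    grad = [wi * (p - y) for wi, p, y in zip(w, probs, labels)]
    hess = [wi * p * (1.0 - p) for wi, p in zip(w, probs)]
    return grad, hess


# Toy data: 8 non-paying passengers (0) and 2 paying passengers (1).
rows = list(range(10))
labels = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(rows, labels, ratio=1.0)

# Nine easy samples share one gradient bin; one hard sample sits alone.
probs = [0.05] * 8 + [0.95, 0.5]
weights = ghm_weights(probs, labels)
```

Per-sample gradient and Hessian lists of this shape are what a custom objective passed to XGBoost is expected to return, so `ghm_grad_hess` indicates where such a reweighting would plug in; how the paper actually combines GHM with XGBoost's loss may differ.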
