1. Introduction
Under pressure from the global economic environment and market competition, airlines' core ticket revenue has gradually decreased as ticket prices continue to fall, and many airlines have begun to generate revenue by increasing the added value of their products and developing ancillary services to ease the financial pressure on their operations. Coupled with the severe impact of the novel coronavirus (COVID-19) pandemic on global air passenger demand, stronger ancillary revenue can help airlines survive the crisis to some extent. The “paid seat selection service”, one of the newer domestic add-on services, has brought airlines considerable profit because its marginal cost is almost zero. Because ancillary products and services are optional, it is very important for airlines to understand passengers' willingness to pay and to encourage more passengers to choose this service.
Over the years, a great deal of work has explored how to identify customers' willingness to pay for various ancillary services, and machine learning-based identification methods have been found to be more advantageous than traditional statistical methods (Jing et al., 2021; Maliah & Shani, 2021). By analyzing the travel purpose of passengers who have purchased tickets, mining the behavioral characteristics of passengers known to have paid for seat selection, and constructing behavioral models of them (Borisyak et al., 2020; J. Pang, Chen, Li, Xu, & Lin, 2021), it is possible to identify, among all passengers, those who may have a similar willingness and ability to pay, and thus to increase revenue through precise marketing at a lower cost.
Current research on the travel behavior of civil aviation passengers follows two main directions: passenger behavior segmentation (Pan & Truong, 2021) and passenger value calculation (Nakahara & Yada, 2011). Most airlines segment passengers according to the fare of the ticket they purchased or their accumulated mileage. The RFM model proposed by marketing expert Bob Stone can be used to quantify customer value, and many scholars (Wu et al., 2020; Wu et al., 2021; Zong & Xing, 2021) use RFM together with improved k-means clustering to divide passengers into groups. However, this kind of segmentation only reveals the value of passengers and does not describe the behavioral characteristics of passenger groups. Because the three attributes of the traditional RFM model do not fully reflect passenger behavior preferences, the LRFMC model and the TCSDG model were subsequently developed on this basis. Combined with classical machine learning classification algorithms such as SVM, KNN, GBDT and NN, these improved models are widely used in passenger behavior prediction (S. Q. Pang & Liu, 2011), personalized recommendation (Tao, 2020), flight delay prediction (Jiang, Liu, Liu, & Song, 2020) and other aviation business applications. However, experimental studies on identifying passengers' willingness to pay for seat selection are still limited.
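As an illustration of the RFM idea mentioned above, the following minimal sketch computes recency, frequency and monetary values per passenger from a list of ticket orders. The function name `rfm_features`, the record layout `(passenger_id, purchase_date, amount)` and the toy data are assumptions made for this example and are not taken from the cited studies.

```python
from collections import defaultdict
from datetime import date


def rfm_features(orders, as_of):
    """Return {passenger_id: (recency_days, frequency, monetary)}.

    orders: iterable of (passenger_id, purchase_date, amount) tuples
            (a hypothetical record layout used only for this sketch).
    as_of:  reference date from which recency is measured.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for pid, day, amount in orders:
        # Recency depends on the most recent purchase date per passenger.
        if pid not in last_purchase or day > last_purchase[pid]:
            last_purchase[pid] = day
        frequency[pid] += 1        # number of purchases
        monetary[pid] += amount    # total amount spent
    return {
        pid: ((as_of - last_purchase[pid]).days, frequency[pid], monetary[pid])
        for pid in last_purchase
    }


# Toy orders, invented for illustration only.
orders = [
    ("P1", date(2021, 1, 5), 800.0),
    ("P1", date(2021, 3, 1), 650.0),
    ("P2", date(2020, 11, 20), 1200.0),
]
features = rfm_features(orders, as_of=date(2021, 3, 31))
```

The resulting three-value vectors could then be standardized and passed to a clustering method such as k-means; extended models such as LRFMC add further attributes in the same per-passenger fashion.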
The biggest challenge in identifying air passengers' willingness to pay for seat selection is that the collected data are noisy, high-dimensional, and unevenly distributed across classes, owing to environmental and recording influences during data collection. Therefore, this paper proposes an improved XGBoost-based method for identifying airline passengers' willingness to pay for seat selection from high-dimensional imbalanced data, which performs integrated classification prediction in three stages: data, features, and algorithm.
The main contributions of this paper are as follows:
1. Data preprocessing: The majority class is randomly undersampled, and new samples are generated with CGAN to supplement the minority class, so as to address the serious imbalance of the passenger order data.
2. Feature selection: The whale optimization algorithm is improved by combining chaos theory, a nonlinear convergence factor, adaptive weights, and an opposition-based learning strategy, and is then used to find the optimal feature subset and reduce the dimensionality of the passenger data set.
3. Algorithm improvement: A new gradient harmonizing mechanism (GHM) is introduced to improve the loss function of XGBoost, and a genetic algorithm is used to optimize the parameters of XGBoost to obtain the final willingness identification model.
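The random undersampling step of the data preprocessing stage can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `random_undersample`, the `ratio` parameter and the toy labels are hypothetical, and the CGAN-based generation of minority samples that the paper pairs with undersampling is not shown.

```python
import random


def random_undersample(samples, labels, majority_label, ratio=1.0, seed=0):
    """Keep every minority sample and a random subset of majority samples.

    ratio is the desired majority-to-minority count after sampling
    (a hypothetical parameter chosen for this sketch).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    # Never request more majority samples than exist.
    keep = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(rng.sample(majority, keep) + minority)
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: label 0 = did not pay for a seat (majority), 1 = paid (minority).
labels = [0] * 95 + [1] * 5
samples = list(range(100))
X_bal, y_bal = random_undersample(samples, labels, majority_label=0, ratio=2.0)
```

Undersampling alone discards information from the majority class, which is one reason to combine it with synthetic minority samples, as the first contribution does.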