Forecasting Post-Epidemic Air Passenger Flow Among Hub Cities in China Based on PLS and GA-SVR Model

Accurate estimates of air passenger flow are necessary for civil aviation stakeholders and companies to plan and make sound decisions. This study predicts air passenger flows among the top hub cities in mainland China's air travel network with a variable weight combination model that combines PLS and GA-SVR, and its conclusions are intended to support such decisions. The test results show that the model improves prediction accuracy. They also show that extracting information on social and economic development from both its linear development pattern and its nonlinear fluctuations yields more accurate predictions and a better fit. Over the next five years, more than 300 million passengers are predicted to fly between the top hub cities in mainland China, with an average annual growth rate of 6.51%. Growth varies significantly across routes. The Beijing-Shanghai and Shanghai-Shenzhen routes are projected to carry the most passengers, while the Beijing-Chengdu route is projected to grow fastest. The findings offer useful guidance to civil aviation businesses and their decision-makers and can help foster the sector's growth in the post-pandemic period.


Research Background and Significance
Global civil aviation has expanded substantially in recent years, driven by economic globalization and improvements in air transportation infrastructure. Since China began its economic reform and opening up in 1978, its civil air transportation has grown by 16.3% per year. According to L. Zhang et al. (2021), this rate is much higher than that of other modes of transportation. However, the unexpected COVID-19 epidemic that emerged in late 2019 had a severe social and economic impact worldwide. Because COVID-19 is spread by inhaling the virus, travel restrictions were imposed around the world for an exceptionally long time. These restrictions set off chain reactions in many areas, including the global supply chain, and led to a severe economic downturn. The global economy contracted by 3.4% in 2020, and the aviation industry in China and worldwide suffered as a result.
The COVID-19 crisis has hit the aviation sector hard, especially the civil aviation passenger market. It has created two major obstacles to the industry's growth in the post-epidemic era. First, civil aviation firms find it harder to manage a complex and uncertain business environment (Pereira et al., 2021). Second, sudden external shocks have subjected the industry to strict regulation. Shahul Hameed et al. (2022) argue that this growing complexity will require more flexible decision-making from companies. L. Zhang et al. (2021) found sizable post-pandemic changes in the civil aviation passenger market. In China, the scale and popularity of passenger air travel have shifted, producing greater disparities across periods and regions and much greater unpredictability.
Airlines and airports must be better prepared if they are to navigate the complex passenger market and the post-epidemic business climate. Wierzbinski et al. (2023) state that precise collection and analysis of market data are crucial to managing enterprise knowledge. This helps companies become more competitive and succeed in post-pandemic market competition. To make adaptable strategic decisions and protect themselves from increasing business complexity, airlines and airports need to understand their entire target market and the capacity of each segment. Other participants in the civil aviation sector, such as government representatives, regulators, and financiers, also need a clear picture of the industry's projected growth rate and size. With this information they can value their assets, draw up investment plans, obtain funding, assess their financial capacity, and decide whether to apply for government assistance. The implementation of the "New Ten Principles" and other policies by the end of 2022 is expected to move Chinese civil aviation into a phase of recovery and high-quality development in 2023. A strong air transportation network connecting key cities is essential to support and develop China's air passenger business. Accurate forecasts of the volume and growth of air travel are likewise essential for Chinese aviation companies and stakeholders to support the industry's recovery and development.
The ambiguity and instability of post-epidemic air travel make civil aviation passenger demand difficult to predict. Kumbure et al. (2022) and Wu and Xiong (2021) find that earlier forecasting techniques cannot accurately reflect the current situation of the civil aviation industry, because they rely on traditional statistical models or a single model. Forecasting under these conditions therefore calls for models better suited to complex and rapidly changing scenarios.

Literature Review
Previous studies have forecast the air transportation market with time series models, regression models, and machine learning techniques. However, these studies have focused mainly on the national and airport levels, and few have examined the route level. Moreover, studies of the relationship between economic expansion and air travel volume tend to treat only one aspect of the economy as the driver of changes in air travel volume.
Several researchers have examined the overall trajectory of national demand in the passenger and cargo transportation markets. For instance, Chen and Li (2022) combined an extreme learning machine with the particle swarm optimization algorithm to forecast passenger traffic in China's civil aviation sector. Y. Zhang et al. (2021) integrated several forecasting techniques to predict demand in China's air transportation market, focusing on aircraft demand. These techniques were the gray time series model, gray relational system model, exponential smoothing model, autoregressive moving average model, multiple linear regression model, and partial least squares model. Shen et al. (2019) used system dynamics to forecast aggregate demand in the Chinese air transportation market, considering both internal and external factors. Other scholars have analyzed passenger and cargo throughput at individual airports or airport groups. Liang et al. (2017), for example, applied the fuzzy C-means algorithm to forecast airport passenger throughput after seasonal adjustment and noise decomposition of the data. Existing route-level studies of air flows focus mainly on the factors that influence these flows. For instance, Li et al. (2020) used a gravity model to forecast market demand for air passenger transportation.
All of the studies above treat a single economic scale as the background for changes in air passenger volume and report only a single aggregate. However, the COVID-19 pandemic affected more than economic growth. It also set off significant chain reactions in the supply chain, social consumption, population, and other fields. Hanson et al. (2022) and Truong (2021) show that COVID-19's impact on air transportation depends on socioeconomic factors such as income and population. Economic and social trends both shape air passenger flow. Forecasts of post-epidemic air passenger flow therefore need to consider the overall development of the social economy and to describe both its trends and its fluctuations.
Prior literature concentrated on the national or airport economic scale. This study instead reevaluates the relationship between air passenger numbers and overall economic and social development. By describing the socioeconomic dynamics behind changing air passenger counts more thoroughly, it aims to produce more accurate forecasts. The study offers a straightforward method for predicting air passenger flow on a route from the economic and social development of the cities at both ends. Applying it to different routes in mainland China yields insight into how the air transportation market among China's hub cities is expected to evolve.

Research Methods
Overall socioeconomic growth follows a linear trend with nonlinear deviations around it. When the data are studied as a whole, however, it is difficult to separate the development patterns that these two kinds of information convey. A combination forecasting model is therefore needed to extract, fit, and predict the two components separately. Bates and Granger first proposed the combined prediction model in 1969. Such models are used increasingly often because they extract a wider range of information from the original series and improve prediction accuracy. By combining individual models linearly or nonlinearly, a combined model makes efficient use of the independent information contained in each. A variable weight combined model uses a posteriori knowledge to adjust its weights adaptively, so it can predict more accurately than a fixed weight combined model. It also keeps optimizing as the original series lengthens, and it has become more widely used as a result, with notable studies by Zhang (2018) and Zhang et al. (2022). Some researchers in China have applied fixed weight combined models to forecast air passenger flows, notably Y. Zhang et al. (2021) and Liang et al. (2017). According to Zhang (2018), air passenger flows show both a linear trend and volatility, which makes a variable weight combined model more appropriate for predicting such time series. This study therefore uses a variable weight combination prediction model to capture the link between overall social and economic development and air travel.
A linear regression model is needed to measure the association between air passenger flow and the linear trend of economic and social growth. Measuring overall socioeconomic development requires several linked variables, such as economic scale, population size, and social consumption, and most of them follow similar development trends. The resulting multicollinearity could reduce model accuracy. To avoid this, the study uses partial least squares regression (hereinafter PLS) to extract the main features of economic and social development, including the development of air transportation. This allows it to measure the relationship between air passenger flow and economic and social development more accurately.
A machine learning model better suited to complex scenarios is needed to extract the association between air passenger flow and socioeconomic volatility. Because this study forecasts annual air passenger flow, the available data set is small. Support vector regression (hereinafter SVR) suits small data sets and offers strong generalization ability, stability, and high prediction accuracy. It is therefore used here to describe how fluctuations in social and economic development affect air passenger flow. A genetic algorithm (hereinafter GA) is used to optimize the SVR parameters so that the model fits the available data as well as possible.
This study builds a variable weight combination forecasting model composed of PLS and GA-SVR. It then forms multiple city pairs involving the four major hub cities in mainland China's multi-airport systems. The model is used to predict the size of the air passenger market between the two cities of each pair, and the prediction results are analyzed and discussed at the end of the paper. The hub cities involved include Chengdu, Beijing, Shanghai, Guangzhou, Shenzhen, and Chongqing.
This study uses a simulation approach to develop an efficient air passenger flow model. The model is intended to inform the construction and operational strategies of governments, airports, and aviation companies.

Model Building Method
This section describes the specific steps for building the model. The flow chart of the model building method is shown in Fig. 1. The combination prediction model is implemented as follows: (1) The data must be preprocessed, divided, and standardized.

Variable Weight Combined Model and Weight Calculation
The variable weight combined model can be written as follows. Let $n$ be the number of individual models and $m$ the number of time points. Then

$$\hat{y}_t = \sum_{i=1}^{n} w_{it}\,\hat{y}_{it}, \qquad \sum_{i=1}^{n} w_{it} = 1, \qquad t = 1, 2, \dots, m \tag{1}$$

Here $\hat{y}_t$ is the predicted value of the combined model at time $t$, $w_{it}$ is the weight assigned to individual model $i$ at time $t$, and $\hat{y}_{it}$ is the predicted value of individual model $i$ at time $t$. At each time point, the weights of the individual models sum to 1.
This study calculates the weights with the reciprocal variance method, in which each model's weight depends on its squared prediction error. That error is the squared deviation between the model's fitted value and the actual value. It measures a model's accuracy: the larger it is, the less accurate the model. The weight of each individual model at each time point is calculated as

$$w_{it} = \frac{Q_{it}^{-1}}{\sum_{j=1}^{n} Q_{jt}^{-1}}, \qquad Q_{it} = \left(y_t - \hat{y}_{it}\right)^{2} \tag{2}$$

where $y_t$ is the true value at time $t$ and $Q_{it}$ is the squared error between the prediction of individual model $i$ and the true value at time $t$.
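The reciprocal variance weighting in formula (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function name `reciprocal_variance_weights` is hypothetical. When a model's error is exactly zero at some time point, formula (2) is undefined; the tie rule used for that case here is an added assumption.

```python
def reciprocal_variance_weights(actual, predictions):
    """Per-period weights w_it = Q_it^-1 / sum_j Q_jt^-1, with Q_it = (y_t - yhat_it)^2.

    actual: true values y_t, length m.
    predictions: n lists, predictions[i][t] is the fitted value of model i at time t.
    Returns weights[i][t]; the weights at each t sum to 1.
    """
    n, m = len(predictions), len(actual)
    weights = [[0.0] * m for _ in range(n)]
    for t in range(m):
        sq_err = [(actual[t] - predictions[i][t]) ** 2 for i in range(n)]
        exact = [q == 0.0 for q in sq_err]
        if any(exact):
            # Hypothetical tie rule: formula (2) is undefined for a zero error,
            # so split the weight evenly among the models that are exact at t.
            k = sum(exact)
            for i in range(n):
                weights[i][t] = 1.0 / k if exact[i] else 0.0
            continue
        inv = [1.0 / q for q in sq_err]  # reciprocal squared errors
        total = sum(inv)
        for i in range(n):
            weights[i][t] = inv[i] / total
    return weights
```

For two models fitted to values 10 and 20, the more accurate model at each time point receives the larger weight. At the first point the squared errors are 1 and 4, which gives weights 0.8 and 0.2.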
Future actual values are unknown, so the deviation between predicted and actual values cannot be computed for the forecast period. Future weights must instead be projected from the weights computed for the historical periods. If the historical weights show no discernible time-series pattern, the future weights can be obtained as a rolling mean of the existing weights (Deng et al., 2022). That is, historical weights and previously projected weights are combined to project the next weight:

$$w_{i,m+k} = \frac{1}{m+k-1}\sum_{s=1}^{m+k-1} w_{is}, \qquad i = 1, 2, \dots, n, \quad k = 1, 2, \dots \tag{3}$$
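A minimal Python sketch of the rolling rule in formula (3) follows. It assumes the historical weights have already been computed with formula (2), and the helper names `extend_weights_by_rolling_mean` and `combine` are hypothetical. The weights of all models sum to 1 at every historical time point, so their running means also sum to 1, and the projected weights remain valid combination weights.

```python
def extend_weights_by_rolling_mean(weights, horizon):
    """weights[i] is the historical weight series of model i (length m).

    For k = 1..horizon, append w_{i,m+k} = mean(w_{i,1}, ..., w_{i,m+k-1}),
    so each projection uses both history and earlier projections.
    """
    extended = [list(series) for series in weights]  # copy; inputs stay unchanged
    for _ in range(horizon):
        for series in extended:
            series.append(sum(series) / len(series))
    return extended


def combine(weights_t, predictions_t):
    """Formula (1) at one time point: weighted sum of the individual forecasts."""
    return sum(w * p for w, p in zip(weights_t, predictions_t))
```

Under this rule, every projected weight works out to the mean of the historical weights, because appending a series' current mean leaves that mean unchanged.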

PLS Model
The PLS model extends the principal component regression model. Principal component regression addresses multicollinearity among the independent variables in multiple regression. It does so by using dimension reduction to extract the principal components of the independent variables. PLS goes further by also taking into account the correlation between the extracted components and the dependent variables. This lets PLS handle multiple regression effectively when the sample is small, the variables are numerous, and all variables are strongly collinear. In this study the sample is small because annual observations are used for a long-term forecast. A wide range of explanatory variables is selected, mostly macroeconomic ones with similar development patterns, so multicollinearity among them is to be expected. The PLS model is therefore used to describe the trend in air passenger transport demand. It is implemented as follows:
(1) Standardize the dependent variables $y_i$ ($i = 1, 2, \dots, k$) and the independent variables $x_j$ ($j = 1, 2, \dots, l$). Then construct the dependent variable matrix $Y$ and the independent variable matrix $X$ from the standardized data.
(2) Let $p_1$ and $q_1$ be the first axis vectors extracted from $X$ and $Y$, and let the corresponding first components be $u_1 = Xp_1$ and $v_1 = Yq_1$. The aim is for $u_1$ and $v_1$ to carry as much of the variation in the original data as possible while being as strongly correlated with each other as possible. To achieve this, $p_1$ and $q_1$ are chosen to maximize the covariance of $u_1$ and $v_1$:

$$\max_{p_1,\, q_1} \ \langle Xp_1,\, Yq_1 \rangle = p_1^{\top}X^{\top}Yq_1 \qquad \text{s.t.} \quad p_1^{\top}p_1 = 1, \quad q_1^{\top}q_1 = 1$$

(3) The Lagrange multiplier method finds the extrema of a multivariate function whose variables are subject to one or more constraints. It introduces a new scalar unknown, the Lagrange multiplier, for each constraint; the multiplier is that constraint gradient's coefficient in a linear combination. Constructing the Lagrange function for the problem above shows that $p_1$ is the eigenvector associated with the largest eigenvalue of $X^{\top}YY^{\top}X$. Likewise, $q_1$ is the eigenvector associated with the largest eigenvalue of $Y^{\top}XX^{\top}Y$. The first component pair then follows from $u_1 = Xp_1$ and $v_1 = Yq_1$.
(4) Regress $X$ and $Y$ on $u_1$:

$$X = u_1c_1^{\top} + X_1, \qquad Y = u_1d_1^{\top} + Y_1$$

The least squares solutions are

$$c_1 = \frac{X^{\top}u_1}{\lVert u_1 \rVert^{2}}, \qquad d_1 = \frac{Y^{\top}u_1}{\lVert u_1 \rVert^{2}}$$

Because $u_1 = Xp_1$, the vectors $p_1$ and $c_1$ satisfy $p_1^{\top}c_1 = 1$.
(5) Treat the residual matrices $X_1$ and $Y_1$, the parts of $X$ and $Y$ not explained by $u_1$, as the new $X$ and $Y$, and repeat steps (2) through (4). If $r$ components are extracted in total, the original matrices can be written as

$$X = \sum_{h=1}^{r} u_hc_h^{\top} + X_r, \qquad Y = \sum_{h=1}^{r} u_hd_h^{\top} + Y_r$$

The weight vectors $p_i$ and loading vectors $c_j$ satisfy

$$p_i^{\top}c_j = \begin{cases} 1, & i = j \\ 0, & i < j \end{cases}$$

The residual matrices are obtained from $X$ by linear deflation, so each component is a linear combination of the columns of the original $X$: $u_h = Xp_h^{*}$. Here $p_h^{*}$ depends only on $p_1, \dots, p_h$ and $c_1, \dots, c_{h-1}$. Substituting into the decomposition of $Y$ gives

$$Y = X\sum_{h=1}^{r} p_h^{*}d_h^{\top} + Y_r$$

which expresses the relationship between $X$ and $Y$ directly.
(6) Further calculation reverses the standardization of step (1) and yields the restored regression equation for each dependent variable:

$$\hat{y}_i = a_{i0} + \sum_{j=1}^{l} a_{ij}x_j, \qquad i = 1, 2, \dots, k$$

Here $a_{ij}$ is the regression coefficient of independent variable $x_j$ in the final restored equation, and $a_{i0}$ is the constant.
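The steps above can be sketched with NumPy for the case used in this study, a single dependent variable (passenger flow). This is an illustrative implementation under stated assumptions, not the authors' code. The names `pls1_fit` and `restore_coefficients` are hypothetical, and the data are assumed to be standardized as in step (1). The weight vectors $p_h$ and loading vectors $c_h$ are stacked as the columns of $P$ and $C$. The coefficient formula $b = P(C^{\top}P)^{-1}d$ is the standard way to express the extracted components in terms of the original $X$.

```python
import numpy as np


def pls1_fit(X, y, r):
    """PLS regression with one standardized response, extracting r components.

    X: (N, l) standardized predictors; y: (N,) standardized response.
    Returns b such that y is approximated by X @ b on the standardized scale.
    """
    X_h = np.array(X, dtype=float)
    y_h = np.array(y, dtype=float)
    P, C, D = [], [], []
    for _ in range(r):
        # Steps (2)-(3): with one response, X'yy'X has rank one, so its
        # leading eigenvector is simply X'y normalized.
        p = X_h.T @ y_h
        p /= np.linalg.norm(p)
        u = X_h @ p                     # component scores u_h
        uu = u @ u
        c = X_h.T @ u / uu              # step (4): least-squares loading of X on u_h
        d = (y_h @ u) / uu              # step (4): least-squares loading of y on u_h
        X_h = X_h - np.outer(u, c)      # step (5): keep what u_h leaves unexplained
        y_h = y_h - d * u
        P.append(p)
        C.append(c)
        D.append(d)
    P, C, D = np.array(P).T, np.array(C).T, np.array(D)
    # Scores in terms of the original X: U = X P (C'P)^-1, hence b = P (C'P)^-1 d.
    return P @ np.linalg.solve(C.T @ P, D)


def restore_coefficients(b, x_mean, x_std, y_mean, y_std):
    """Step (6): undo the Z-score standardization to get a_0 and a_j in original units."""
    a = b * y_std / x_std
    a0 = y_mean - a @ x_mean
    return a0, a
```

With as many components as predictors, PLS reproduces ordinary least squares. A noise-free linear data set is therefore a convenient check: the restored coefficients should match the generating ones. In practice $r$ is smaller; the study reports three components.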

GA-SVR Model
SVR is an algorithm developed from the principles of the support vector machine (SVM). It seeks an optimal model by simultaneously minimizing the total fitting loss and maximizing the margin, that is, keeping the regression function as flat as possible. SVR generalizes well, is stable, and predicts accurately on small data sets, so it is used in this study to characterize the volatility of air passenger flows. Given $N$ training samples $(x_i, y_i)$, the SVR model fits $f(x) = w^{\top}x + b$ by solving

$$\min_{w,\, b,\, \xi,\, \xi^{*}} \ \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right)$$

$$\text{s.t.} \quad y_i - w^{\top}x_i - b \le \varepsilon + \xi_i, \qquad w^{\top}x_i + b - y_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, N$$

Here $C$ is the penalty parameter, $\varepsilon$ is the tolerance parameter (the half-width of the insensitive band), and $\xi_i$ and $\xi_i^{*}$ are slack variables. Solving the dual problem with Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$ gives

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)x_i^{\top}x + b \tag{13}$$

In equation (13), $n$ is the number of support vectors. In the solution of equation (13), only samples on or outside the boundary of the $\varepsilon$-insensitive band have a non-zero coefficient $(\alpha_i - \alpha_i^{*})$. These samples are the support vectors, and their number is $n$.
Building an SVR model often runs into the problem of linear inseparability: the data cannot be separated or fitted by a linear function (a straight line or plane). This is common in practice. The usual remedy is to map the samples from the original low-dimensional space into a higher-dimensional feature space, where a hyperplane can be used. To simplify the inner product computation in that space, a kernel function $K(x_i, x)$ replaces the inner product $x_i^{\top}x$. This study uses the radial basis function (RBF) kernel, so the final decision function is

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x) + b$$

where $\sigma$ is the kernel parameter of the RBF kernel.
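For illustration, the kernel decision function can be written out in plain Python. The text does not state the exact form of the radial basis kernel. This sketch therefore assumes the common Gaussian form $K(x_i, x) = \exp(-\lVert x - x_i \rVert^{2} / (2\sigma^{2}))$. Some libraries, for example scikit-learn's `SVR`, instead write it as $\exp(-\gamma \lVert x - x_i \rVert^{2})$, which corresponds to $\gamma = 1/(2\sigma^{2})$. The support vectors, the dual coefficients $(\alpha_i - \alpha_i^{*})$, and the bias would come from a trained model. The functions `rbf_kernel` and `svr_predict` are hypothetical.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma^2)); this parameterization is assumed."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Decision function f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b over the support vectors."""
    return sum(
        coef * rbf_kernel(sv, x, sigma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias
```

Only the support vectors enter the sum, which is why a trained SVR stays compact even when the kernel maps the data into a much higher-dimensional space.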
The SVR model's performance can be improved by tuning its parameters to fit the given data as closely as possible. The adjustable parameters are the penalty parameter $C$, the tolerance parameter $\varepsilon$, and the kernel parameter $\sigma$. This study uses a GA to optimize them. A GA searches for the optimal solution of a problem by simulating natural evolution through selection, crossover, and mutation. It is computationally efficient and accurate, and it combines easily with other algorithms. Because the sample is small, cross-validation is used within the optimization to make it more robust and to reduce the risk of overfitting and underfitting. The GA settings are listed in Table 1.
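A compact real-coded GA of the kind described above might look as follows. It is a sketch under assumptions. Tournament selection, arithmetic crossover, uniform-reset mutation, and elitism are choices made here, and the default settings are placeholders rather than the values in Table 1. The `fitness` callable is hypothetical; in this study it would be the cross-validated score of an SVR trained with a candidate $(C, \varepsilon, \sigma)$.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=50,
                   crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximize `fitness` over a box with a real-coded genetic algorithm.

    bounds: one (low, high) pair per parameter, e.g. for (C, epsilon, sigma).
    Returns the best parameter vector found.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = list(max(population, key=fitness))
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]

        def select():
            # Selection: binary tournament, the fitter of two random individuals wins.
            a, b = rng.sample(range(pop_size), 2)
            return population[a] if scores[a] >= scores[b] else population[b]

        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                # Crossover: arithmetic blend of the parents, which stays inside the bounds.
                w = rng.random()
                c1 = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
                c2 = [(1 - w) * x + w * y for x, y in zip(p1, p2)]
            else:
                c1, c2 = list(p1), list(p2)
            for child in (c1, c2):
                # Mutation: reset a gene to a random value within its bounds.
                for j, (lo, hi) in enumerate(bounds):
                    if rng.random() < mutation_rate:
                        child[j] = rng.uniform(lo, hi)
                children.append(child)
        population = children[:pop_size]
        population[0] = list(best)  # elitism: carry the best-so-far forward
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = list(candidate)
    return best
```

With scikit-learn installed, one possible fitness would be `cross_val_score(SVR(C=C, epsilon=eps, gamma=gamma), X, y, cv=5, scoring="neg_mean_squared_error").mean()`, which the GA then maximizes. The mapping from $\sigma$ to `gamma` depends on the kernel parameterization assumed earlier.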

Influential Factor Selection and Variable Framework Development
This analysis synthesizes the findings of Daldoul et al. (2016), Gundelfinger-Casar and Coto-Millán (2017), Yan and Chai (2017), Huang et al. (2013), Hanson et al. (2022), and Tirtha et al. (2023). Together, these studies identify fourteen primary factors that influence the air passenger transportation market. Table 2 lists the variables used in this study.

Sample Selection and Data Processing
In 2019, the Beijing-Shanghai route carried the highest volume of air passenger traffic in China. This study considers the effects of the COVID-19 epidemic on the civil aviation industry from 2019 to the present. For empirical testing and comparison, it uses the air passenger flow between Beijing and Shanghai from 1999 to 2019. The dependent variable data come from the China Civil Aviation Statistical Yearbook, 2000 to 2020 editions. This long series provides a sufficiently large set of actual operational data covering all the routes needed to estimate passenger flows. The yearbook is compiled by the Development Planning Department of the China Civil Aviation Administration, which gives some assurance of data quality. The economic and demographic data come from the Beijing Statistical Yearbook and the Shanghai Statistical Yearbook, also 2000 to 2020 editions. City statistical yearbooks are compiled by local government statistics offices, and their data are reliable.
The raw dataset is preprocessed according to the following rules.
(1) Randomly missing values are imputed by interpolation.
(2) The data are divided into a training set and a test set by random sampling. The test set contains 20% of the data, and the random seed is fixed at 15.
(3) The data are standardized with the Z-score method, a common technique that subtracts the mean from each value and divides the result by the standard deviation. This transforms variables of different magnitudes into comparable Z-scores.
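The three preprocessing rules can be sketched with the Python standard library. The function names are hypothetical, and linear interpolation is assumed to be the interpolation technique. The shuffle produced by `random.Random(15)` will not reproduce the exact split of whatever tool the study used, and rounding the test size up is also an assumption. A common choice is to compute the Z-score mean and standard deviation on the training set and reuse them for the test set; the optional arguments of `zscore` support this.

```python
import math
import random
import statistics


def interpolate_missing(values):
    """Rule (1): linearly interpolate interior gaps marked None (endpoints must be present)."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            out[i] = out[left] + frac * (out[right] - out[left])
    return out


def split_indices(n, test_fraction=0.2, seed=15):
    """Rule (2): random train/test split of n samples; test size rounded up."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_fraction * n)
    return sorted(idx[n_test:]), sorted(idx[:n_test])  # (train, test)


def zscore(values, mean=None, std=None):
    """Rule (3): (value - mean) / std, using the sample standard deviation by default."""
    mean = statistics.mean(values) if mean is None else mean
    std = statistics.stdev(values) if std is None else std
    return [(v - mean) / std for v in values]
```

For the 21 annual observations from 1999 to 2019, rounding up would place 5 years in the test set and 16 in the training set.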

Prediction Results of PLS Model
Following the PLS modeling procedure, the principal components are first extracted from the training set, yielding three principal components. The dependent variable is then regressed on these components, and the regression is restored to the original variables. Table 3 gives the coefficients and constant of the restored PLS regression equation. Table 4 gives the fitting and test results together with their weights.

Prediction Results of GA-SVR Model
To build the GA-SVR model, the training data are first imported and the GA is used to optimize the SVR parameters. After many iterations, the parameters with the highest fitness under cross-validation are chosen. Extensive experiments show that fitness is highest at a penalty parameter of $C = 5.61952194$, a tolerance parameter of $\varepsilon = 0.23896516$, and a kernel parameter of $\sigma = 0.00817128$. Fig. 2 shows the fitness curve. The SVR model was then built with these parameters for fitting and testing. The results and the corresponding weights are given in Table 4.

Prediction Results of Variable Weight Combined Model
Following the procedure of the variable weight combined model, the weight of each individual model at each time point is first calculated. This uses the weight calculation formula together with the results of each single model. The combined model's fitting and test results are then computed with these weights and presented in Table 4. Fig. 3 compares the predicted and actual values of the PLS, GA-SVR, and variable weight combined models. The blue line is the diagonal on which the horizontal and vertical coordinates are equal. Each star plots a predicted value against the corresponding actual value, so the closer a star lies to the blue line, the smaller the prediction error. The plot shows that the variable weight combined model predicts best.

Model Comparison
To assess agreement between predicted and actual values, this study compares each model's coefficient of determination ($R^2$) and mean absolute percentage error (MAPE). A higher $R^2$ indicates closer agreement between predicted and observed values, and a lower MAPE indicates smaller errors. Table 5 reports $R^2$ and MAPE for each model in the training and testing stages. The variable weight combined model has a higher $R^2$ than the alternative models, which indicates stronger agreement between its predicted and actual values. It also has a lower MAPE than the other models, which indicates smaller prediction errors.
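Both metrics are straightforward to compute. The text does not give its exact formulas, so the sketch below assumes the usual definitions, $R^2 = 1 - \sum_t (y_t - \hat{y}_t)^2 / \sum_t (y_t - \bar{y})^2$ and $\mathrm{MAPE} = \frac{100\%}{m}\sum_t \lvert (y_t - \hat{y}_t) / y_t \rvert$.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Passenger volumes are large and strictly positive, so the division in MAPE is safe for this data.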

Predicting Future Values of Influential Factors
Forecasting future air passenger flow first requires forecasting the influencing factors. For each factor, an autoregressive integrated moving average (ARIMA) model is fitted, with its orders chosen to minimize the information criterion. The resulting models are listed in Table 6.
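Order selection by an information criterion can be illustrated with a simplified stand-in. The sketch below is not a full ARIMA fit. It differences the series $d$ times, fits autoregressive AR($p$) models with an intercept by ordinary least squares, and keeps the $p$ with the smallest Gaussian AIC. It has no moving-average terms, and all candidates are fitted on the same observations so that their AIC values are comparable. The function name is hypothetical. A full ARIMA search would typically use a library such as statsmodels, whose fitted `ARIMA` results expose an `aic` attribute.

```python
import numpy as np


def select_ar_order(series, d=1, max_p=3):
    """Choose p for an AR(p) model with intercept on the d-th differences by minimum AIC.

    Simplified stand-in for ARIMA order selection: no MA terms, OLS estimation,
    Gaussian AIC up to an additive constant. Every candidate uses the same
    observations (t >= max_p), so the AIC values are comparable.
    """
    z = np.diff(np.asarray(series, dtype=float), n=d)
    target = z[max_p:]
    n_obs = len(target)
    best_p, best_aic = None, np.inf
    for p in range(max_p + 1):
        # Lag-k regressor aligned with the target: z[t - k] for t = max_p .. len(z) - 1.
        lags = [z[max_p - k:len(z) - k] for k in range(1, p + 1)]
        X = np.column_stack([np.ones(n_obs)] + lags)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ coef) ** 2))
        aic = n_obs * np.log(rss / n_obs) + 2 * (p + 1)
        if aic < best_aic:
            best_p, best_aic = p, float(aic)
    return best_p, best_aic
```

Fitting every candidate on a common sample avoids a subtle bias: higher-order models would otherwise be scored on fewer observations than lower-order ones.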

Predicting the Air Passenger Flows
The predicted values of the influencing factors are fed into the two individual models to obtain each model's predictions. The combined model's forecast is then obtained by weighting these predictions with the weights from formulas (2) and (3). The COVID-19 pandemic severely disrupted the growth and operation of civil aviation from 2020 to 2022, and business data declined sharply in those years. However, the "New Ten Principles" were introduced at the end of 2022, and China's international passenger flights fully resumed on 8 January 2023. The industry is therefore expected to recover gradually from 2023 onward and return to its pre-pandemic (2019) level of development. For this reason, the study treats 2023 as the year immediately following 2019 when generating the projections. The results are presented in Table 7.
Table 6. Models used to predict future values of factors

Management Decision Analysis and Discussion
To make construction and operation more efficient and to minimize resource mismatch and waste, the capacity inputs of aviation departments and government planning must be matched to differing market demands. Among the predicted results, the Shanghai-Beijing route has the highest air passenger volume, exceeding forty million passengers. The projected volumes for the Shanghai-Shenzhen, Beijing-Chengdu, and Shanghai-Guangzhou routes all exceed thirty-five million passengers. Those for the Beijing-Guangzhou, Beijing-Shenzhen, Shanghai-Chengdu, and Shanghai-Chongqing routes all exceed twenty million, and those for the remaining five city pairs all exceed ten million. In managing airspace resources and transport capacity, the government may consider giving priority to the city pairs with large predicted air passenger flows. This approach would make the most of demand, establish high-value, high-quality routes, and raise the overall standard of the national route network.
Comparing the air passenger flow of 2019 with the projected flow of 2027, and computing the average annual growth rate between them, shows how strong each route's growth momentum is. According to product life cycle theory, strong growth potential indicates that a route is in the growth phase of market expansion. Its air passenger traffic is therefore expected to keep growing for an extended period. Different growth expectations call for different allocations of long-term investment by government and enterprises. Such investments include government development planning, infrastructure construction, airline brand development, and the establishment of airline bases. The projections show that the Beijing-Chengdu route has the highest average annual growth rate (10.28%) and the largest growth volume (3,033,687). The Shanghai-Chongqing and Chongqing-Guangzhou routes both have average annual growth rates above 8.5%. These routes therefore have greater growth prospects, and long-term investment in them is likely to do more to improve the quality and sustainable development capacity of the national route network.
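The average annual growth rate can be computed as a compound rate. The following is a minimal sketch that assumes the geometric definition $(V_{\text{end}} / V_{\text{start}})^{1/T} - 1$. Once 2023 is treated as the year after 2019, the text does not say how many periods $T$ separate 2019 and 2027, so $T$ is left as a parameter.

```python
def average_annual_growth_rate(start_value, end_value, periods):
    """Compound average annual growth rate over `periods` years (as a fraction)."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0
```

For example, growth from 100 to 121 over two periods corresponds to 10% per year. An arithmetic average of the yearly rates would give a slightly different figure, which is why the definition matters when comparing routes.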

CONCLUSION
This study draws on the pattern of air passenger flow development between cities to build a variable weight combined model that integrates the PLS model and the GA-SVR model. The model captures both the linear development trend and the nonlinear fluctuation pattern of the economy and society in order to predict future air passenger traffic. The results show that the model outperforms the fixed weight combined forecasting model and traditional single models.
First, the variable weight combined model is tested empirically on Beijing-Shanghai air passenger flow data. Compared with the individual models and the fixed weight combined model, it achieves a higher $R^2$ and a lower MAPE. The model has therefore been successfully optimized, yielding an improved mathematical model for forecasting air passenger flows. This shows that extracting information on overall social and economic development from both the linear development trend and the nonlinear fluctuation pattern improves the fit of the final model and yields more accurate predictions, which offers a new approach to forecasting demand in the airline passenger transport market.
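The two evaluation indices named here can be computed as follows. This is a minimal sketch using the usual textbook definitions of MAPE (reported in percent) and $R^2$; that the paper uses exactly these variants is an assumption, and the sample values are invented for illustration.

```python
from typing import Sequence

# Sketch only: textbook metric definitions; example values are invented.


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(f"MAPE = {mape(actual, predicted):.2f}%, R^2 = {r_squared(actual, predicted):.4f}")
```

A better model has a lower MAPE and an $R^2$ closer to 1, which is the sense in which both indices "improve".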
Second, this study applies the variable weight combined forecasting model to thirteen routes among the major hub cities in China, obtaining the market demand forecast and expected growth rate of each route and, from these, the overall expected development of the air transportation market among the hub cities in mainland China. The results show that passenger flow in this market grows at an average annual rate of 6.51% from 2023 to 2027, with a total passenger flow of more than 300 million. Our analysis attributes this to the fact that mainland China's economy kept growing during the COVID-19 epidemic, albeit at a lower rate, and did not contract, while its aviation market continued to recover; the recovery of Chinese passenger volume is therefore not surprising.
Third, this study analyzes route-based market segmentation, which reveals more about the internal structure of the air passenger transport market among China's hub cities. Using the variable weight combined model, it forecasts the air passenger flows among six major hub cities in mainland China and obtains the predicted values and expected growth rates. From 2023 to 2027, the Shanghai-Beijing, Shanghai-Shenzhen, Beijing-Chengdu, and Shanghai-Guangzhou routes are expected to carry the highest volumes of air passenger traffic, followed by the Beijing-Guangzhou, Beijing-Shenzhen, Shanghai-Chengdu, and Shanghai-Chongqing routes with moderate volumes. In terms of average annual growth rate, the Beijing-Chengdu route is projected to grow fastest, followed by Shanghai-Chongqing and Chongqing-Guangzhou. Given the strong market demand and growth potential of these routes, the government, airlines, and airports should devote more resources to studying market demand in these cities. This will support the development of a high-quality, sustainable route network.
Finally, it would be useful to explore whether, and to what extent, our findings apply to other countries. For example, the US aviation market was also affected by COVID-19 and is likewise shaped by the country's overall economic and social environment, so the model built in this study could potentially be used to forecast passenger flows on US routes.

LIMITATIONS AND FUTURE RESEARCH
This study has several limitations that provide opportunities for further research. First, forecasting passenger flows in the air transportation markets of other countries may confirm and extend our results. Second, the linear development trend and the fluctuation pattern of the economy and society may affect air passenger flow to different degrees; future research can examine how strongly each affects air passenger flow in different situations. Third, although our samples come from several routes, they are all located in one country. Our samples may therefore be limited, and larger-scale studies covering routes across regions, market sizes, and hub city levels could further deepen and verify our findings.
Several issues related to our research have not yet been explored. First, future work could use the air passenger flow forecasts to examine the demand for different types of aircraft in the future aviation market. Such studies could help airlines and other stakeholders understand how passenger flows translate into companies' investment and financing capacity and operating costs. Second, more detailed studies could measure the importance of the various economic and social factors in the forecast. When economic fluctuations occur in the future, this would help airlines, airports, and stakeholders judge the development trend of air passenger flow quickly and accurately by observing changes in the decisive factors of the business environment, which is especially important for flexible strategic decisions and value assessment in the highly uncertain post-epidemic era.

ACKNOWLEDGMENT
(2) Feed the training data into the PLS model, extract principal components for regression, and obtain the fitted model, the fitting results, and the test results.
(3) Feed the training data into the SVR model and use cross-validation and a GA to find the optimal penalty, tolerance, and kernel parameters.
(4) Train the SVR model on the training data with these parameters to obtain its fitting and test results.
(5) Calculate the weight of each single model at each time point, combine the single-model results with these weights in the variable weight combined model to obtain its fitting and test results, and then transform the results back to the original scale by reversing the initial standardization.
(6) Compute the evaluation index values of each model's predictive performance and compare the models.
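Step (3) can be illustrated with a small real-coded genetic algorithm. The sketch below is an assumption about how such a search might look, not the authors' implementation: the name `genetic_search`, tournament selection, blend crossover, Gaussian mutation, elitism, and all default settings are our own choices. In the study the fitness would be the cross-validated error of an SVR trained with a candidate (penalty, tolerance, kernel parameter); here a hypothetical toy function stands in for it so that the code runs with the standard library alone.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Sketch only: a generic GA; the toy fitness below is a hypothetical stand-in
# for cross-validated SVR error.


def genetic_search(
    fitness: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 30,
    generations: int = 40,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Minimise `fitness` over a box; return best genes, best score, best-so-far history."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best: List[float] = []
    best_score = float("inf")
    history: List[float] = []

    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population), key=lambda pair: pair[0])
        if scored[0][0] < best_score:
            best_score, best = scored[0][0], list(scored[0][1])
        history.append(best_score)

        def tournament() -> List[float]:
            # Binary tournament: the lower (better) of two random scores wins.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] <= b[0] else b[1]

        children = [list(scored[0][1])]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                w = rng.random()  # blend crossover keeps genes inside the box
                child = [w * g1 + (1 - w) * g2 for g1, g2 in zip(p1, p2)]
            else:
                child = list(p1)
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped back into [lo, hi].
                    child[k] = min(hi, max(lo, child[k] + rng.gauss(0.0, 0.1 * (hi - lo))))
            children.append(child)
        population = children

    return best, best_score, history


# Hypothetical genes: log10(C), log10(epsilon), log10(gamma); optimum placed at (1, -2, -1).
def toy_fitness(genes: List[float]) -> float:
    return sum((g - t) ** 2 for g, t in zip(genes, [1.0, -2.0, -1.0]))


bounds = [(-2.0, 3.0), (-4.0, 0.0), (-4.0, 1.0)]
best, score, history = genetic_search(toy_fitness, bounds)
print("best log10 parameters:", [round(g, 3) for g in best], "score:", round(score, 5))
```

Searching in log10 space is a common choice because the penalty, tolerance, and kernel width span several orders of magnitude; the `history` list is the kind of data plotted in a GA fitness curve.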

Figure 1. Model construction flow chart

represent the regression coefficient matrices associated with the primary components $u_1$ and $v_1$, respectively. Given the correlation between $u_1$ and $v_1$, the association between $X$ and $Y$ can be established by examining the association between $u_1$ and $Y$. This can be expressed as follows: $\max \operatorname{Cov}(u_1, v_1)$.
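For a single dependent variable (such as the passenger flow of one route), the first PLS component has a closed form: with standardized data, the unit-norm weight vector $w$ that maximizes the covariance between $u_1 = Xw$ and $Y$ is $X^\top Y$ scaled to unit length. The sketch below assumes this single-response (PLS1) case and z-score standardization; the helper names and toy data are hypothetical, and the study's full procedure (extracting further components and choosing how many to keep) is not reproduced.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Sketch only: single-response PLS (PLS1), first component; names and data are hypothetical.


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score a series using the population standard deviation; series must not be constant."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def first_pls_component(
    x_columns: Sequence[Sequence[float]], y: Sequence[float]
) -> Tuple[List[float], List[float], float]:
    """Return unit weights w, component scores u1 = Xw, and slope r1 of standardized y on u1."""
    zx = [standardize(col) for col in x_columns]
    zy = standardize(y)
    n = len(zy)
    # X^T y: each entry is n times the covariance of one predictor with y.
    xty = [sum(col[i] * zy[i] for i in range(n)) for col in zx]
    norm = math.sqrt(sum(c * c for c in xty))
    w = [c / norm for c in xty]  # unit vector maximising Cov(Xw, y)
    u1 = [sum(w[j] * zx[j][i] for j in range(len(zx))) for i in range(n)]
    # Least-squares slope of y on u1; both have mean zero, so no intercept is needed.
    r1 = sum(u * v for u, v in zip(u1, zy)) / sum(u * u for u in u1)
    return w, u1, r1


# Hypothetical socio-economic predictors and passenger flow for one route.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
flow = [1.1, 1.9, 3.2, 3.9, 5.1]
w, u1, r1 = first_pls_component([x1, x2], flow)
```

Because $u_1$ combines all predictors, its covariance with $Y$ is at least as large as that of any single standardized predictor, which is what makes it a useful first regressor when the predictors are collinear.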

Figure 3.
Figure 2. Fitness curve

This work was supported by the Major Project of Humanities and Social Sciences of Tianjin Education Commission (Grant No. 2018JWZD52) and the Tianjin Research Innovation Project for Postgraduate Students "Theoretical Mechanism and Empirical Study on the Influence of Air Transport Network on Regional Coordinated Development" (Grant No. 2022SKY168).
$\varepsilon$, and the relaxation variables are denoted as $\xi_i$ and $\xi_i^*$. The Lagrange multiplier method is used to solve this problem. To this end, non-negative Lagrange multipliers $\alpha_i$, $\alpha_i^*$, $\beta_i$, and $\beta_i^*$ are introduced. The Lagrange function is then formulated, and its partial derivatives with respect to $w$, $b$, $\xi_i$, and $\xi_i^*$ are set to zero. This calculation yields the dual problem and the final decision function for SVR.
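For reference, the standard ε-SVR problem that this derivation appears to follow is sketched below, with the relaxation variables written $\xi_i$, $\xi_i^*$ and the multipliers $\alpha_i$, $\alpha_i^*$, $\beta_i$, $\beta_i^*$. The training pairs $(x_i, y_i)$, sample size $n$, penalty $C$, feature map $\varphi$, and kernel $K$ are our own labels, so the paper's exact notation may differ.

$$
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) \\
\text{s.t.}\quad & y_i - w^\top\varphi(x_i) - b \le \varepsilon + \xi_i, \\
& w^\top\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0, \qquad i = 1,\dots,n.
\end{aligned}
$$

$$
\begin{aligned}
L ={}& \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) - \sum_{i=1}^{n}(\beta_i\xi_i + \beta_i^*\xi_i^*) \\
&- \sum_{i=1}^{n}\alpha_i\bigl(\varepsilon + \xi_i - y_i + w^\top\varphi(x_i) + b\bigr)
 - \sum_{i=1}^{n}\alpha_i^*\bigl(\varepsilon + \xi_i^* + y_i - w^\top\varphi(x_i) - b\bigr).
\end{aligned}
$$

Setting the partial derivatives to zero gives

$$
w = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\varphi(x_i), \qquad
\sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0, \qquad
\alpha_i + \beta_i = C, \qquad
\alpha_i^* + \beta_i^* = C,
$$

so the dual problem and decision function are

$$
\begin{aligned}
\max_{\alpha,\,\alpha^*}\quad & -\tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{n}y_i(\alpha_i - \alpha_i^*) \\
\text{s.t.}\quad & \sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C, \\
f(x) ={}& \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)K(x_i, x) + b.
\end{aligned}
$$

If an RBF kernel $K(x_i, x_j) = \exp(-\gamma\lVert x_i - x_j\rVert^2)$ is used (an assumption), then $C$, $\varepsilon$, and $\gamma$ would be the penalty, tolerance, and kernel parameters tuned by the GA and cross-validation.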

Table 2. Variable names and calculations
Note: The variable superscripts 1 and 2 are used to denote two distinct cities within a given city pair.

Table 5. Evaluation index results of models (columns: Models; Index; Evaluation of fitting results; Evaluation of test results)
The variable weight combined model has a lower MAPE than the other models, indicating more accurate prediction of the target variable. It also generalizes better than the fixed weight combined model. Hence, this study has achieved model optimization and developed an improved mathematical model for forecasting air passenger flows. In addition, the results show that extracting information on overall social and economic development from both the linear development trend and the nonlinear fluctuation pattern improves the fit of the final model and yields more accurate predictions.
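How the time-varying weights are computed is not spelled out at this point. One common scheme, assumed in the sketch below and not necessarily the authors', weights each single model at time t in inverse proportion to its squared error at t, whereas a fixed weight combination uses one inverse-SSE weight per model for the whole period. Function names and numbers are hypothetical. Weights derived from observed errors exist only for fitted periods; producing weights for forecast periods needs an extra rule (for example, carrying recent weights forward), which is also an assumption.

```python
from typing import List, Sequence

# Sketch only: one common inverse-squared-error weighting scheme; names and data are hypothetical.


def variable_weights(errors: Sequence[Sequence[float]], eps: float = 1e-12) -> List[List[float]]:
    """Time-varying weights w[i][t] proportional to 1 / e[i][t]^2; they sum to 1 at each t."""
    n_models, n_times = len(errors), len(errors[0])
    weights = [[0.0] * n_times for _ in range(n_models)]
    for t in range(n_times):
        inverse = [1.0 / (errors[i][t] ** 2 + eps) for i in range(n_models)]
        total = sum(inverse)
        for i in range(n_models):
            weights[i][t] = inverse[i] / total
    return weights


def fixed_weights(errors: Sequence[Sequence[float]]) -> List[float]:
    """One weight per model, proportional to 1 / (sum of squared errors)."""
    inverse = [1.0 / sum(e * e for e in series) for series in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(predictions: Sequence[Sequence[float]], weights: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of the single-model predictions at each time point."""
    return [sum(p[t] * w[t] for p, w in zip(predictions, weights)) for t in range(len(predictions[0]))]


# Invented fitted values for two single models (for example, PLS and GA-SVR).
actual = [100.0, 110.0, 120.0]
preds = [[98.0, 112.0, 119.0], [101.0, 109.0, 125.0]]
errs = [[p - a for p, a in zip(series, actual)] for series in preds]

vw = variable_weights(errs)
fw = fixed_weights(errs)
variable_fit = combine(preds, vw)
fixed_fit = combine(preds, [[w] * len(actual) for w in fw])
```

In this toy case the fixed scheme gives the first model a weight of 0.75 at every point, while the variable scheme shifts weight toward whichever model is more accurate at each time point, which is the intuition behind a variable weight combination fitting more closely.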