Prediction of Ultimate Bearing Capacity of Oil and Gas Wellbore Based on Multi-Modal Data Analysis in the Context of Machine Learning

As an important research topic in drilling engineering, the prediction of oil and gas shaft lining conditions is shifting from traditional methods based on mechanism models to intelligent methods that combine mechanism models with data models. This paper therefore establishes a Stacking integrated model for predicting the uniaxial compressive strength (UCS) of rock from four basic parameters that reflect the characteristics of the rock mass. In addition, the expectation-maximization (EM) algorithm is used to optimize the hidden Markov model (HMM), and a fuzzy random model of the ultimate bearing capacity of oil and gas shaft lining is established. The uncertain distribution of the main rock mass parameters is analyzed, and the corresponding fuzzy random distribution law is obtained. The experimental results show that the Stacking integration algorithm improves the prediction of rock mass compressive strength. After two fuzzy random processes, the EM-HMM model shows small error, high efficiency, and fast convergence. The algorithm helps to analyze the stress state and parameter response mechanism of the shaft lining, dynamically generate optimized parameters, and provide technical support for reducing the incidence of complex drilling accidents, shortening the well construction period, and lowering drilling cost.


INTRODUCTION
In recent years, with the development of big data and artificial intelligence technology, drilling decision-making has gradually evolved from being experience driven and logic driven to being data driven. As a result, marine development has made great progress in drilling, which has improved the overall efficiency of exploration and development. At present, however, wellbore instability remains a prominent problem; especially in some complex blocks, sticking and falling blocks still occur frequently during drilling. Fixing this problem requires reaming, circulation, and other operations, and the reaming operation is difficult, which increases nonproductive time and raises production costs (Heydari et al., 2022). Pre-drilling prediction and real-time evaluation of wellbore stability during drilling, together with countermeasures involving drilling fluid performance, drilling engineering operation, and other aspects, help reduce drilling complexity and accident risk. These measures help companies meet the important demand for further cost reduction and increased efficiency in oil and gas development.
In the process of oil and gas drilling, energy companies face a prominent problem of wellbore instability, which seriously affects drilling timeliness and restricts improvements in economic benefits. Technology development at home and abroad is trending toward technical innovation and digitization. Building a digital technology system that uses massive data to serve cost reduction and efficiency improvement in oil fields is currently an extremely important development direction (Yang et al., 2022; Pei et al., 2022). If big data analysis (for mechanical parameters) and artificial intelligence technology (neural networks, genetic algorithms, etc.) can be used to fully mine and analyze the aforementioned data, then it becomes possible to provide new solutions to complex problems such as wellbore instability during drilling and even to make major breakthroughs in some fields (Wang et al., 2022; Yilmaz et al., 2020). Gu et al. (2022) proposed a method of predicting the stability of directional wells while drilling according to the principles of rock mechanics and seismic inversion. Through real-time analysis of leakage logging data by layered neural network modeling, the borehole wall stability ahead of the drill bit is predicted while drilling by using seismic inversion wave impedance data.
Jing et al. (2017) used elastic wave theory to analyze the influence of density, stress, strain, and other parameters on the velocity of longitudinal and transverse waves; they proposed that lithology, saturation state, and stress state were key factors. Domestic research on wellbore stability is based on conventional logging, mud logging, and seismic data. Establishing the quantitative relationship between the various obtained data and wellbore stability parameters enables various algorithms, mathematical models, and physical models to be used to predict wellbore stability. Among these models, the prediction model of wellbore stability is mainly based on the rock mechanics model, although there are a few methods that predict wellbore stability using intelligent algorithms. Predicting rock mechanical properties before drilling makes it possible to substitute parameters, such as rock mechanical properties and in-situ stress, into the mechanism model to calculate the formation collapse pressure and the formation fracture pressure profile. With these parameters, the machine learning algorithm is optimized and an intelligent pre-drilling prediction model of well collapse and lost circulation, driven by both data and mechanism, is established, which yields a better evaluation of wellbore stability (Jin et al., 2022).
In summary, predicting the state of oil and gas well walls with machine learning has clear advantages. Machine learning transforms oil and gas drilling prediction from the traditional method based on the mechanism model to an intelligent prediction method that integrates the mechanism model and the data model. My main contributions in this paper are the following three processes:
• Establishing a Stacking integrated model for predicting rock uniaxial compressive strength (UCS) based on four basic parameters that reflect rock characteristics: porosity, Schmidt rebound number, longitudinal wave velocity, and point load strength.
• Using the expectation-maximization (EM) algorithm to optimize the HMM and construct the fuzzy random model of the ultimate bearing capacity of the oil and gas well wall.
• Analyzing the uncertain distribution of the main parameters of the rock mass to obtain the corresponding fuzzy random distribution law, which helps to analyze the stress state of the borehole wall and the parameter response mechanism, dynamically generate optimized parameters, and recommend them to field and onshore personnel to support comprehensive decision-making.

Prediction of Rock Mass Strength
UCS is widely used in rock mass engineering stability analysis (Khan et al., 2022). Wong et al. (2017) studied the relationship between UCS and the point load index, longitudinal wave velocity, and Schmidt hardness rebound number; they obtained empirical regression equations by linear fitting or power function fitting. The empirical formula method is simple and practical for predicting the UCS, but it does not cover the factors affecting the UCS comprehensively, so its predictions are one-sided and uncertain. Moreover, the basis and rationality of the selected indexes have not been systematically demonstrated, so the formulas are difficult to popularize and apply in engineering practice. Artificial intelligence technology provides an alternative method for the prediction of UCS and has achieved good results in recent years.
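The two kinds of empirical fit mentioned above can be illustrated with a minimal sketch. The function names `fit_linear` and `fit_power`, the choice of point load strength as the single predictor, and the data values are all assumptions made for illustration; the data are synthetic and noise-free, not measurements from the cited studies.

```python
import math

def fit_linear(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_power(x, y):
    """Fit y = a * x**b by a linear least-squares fit in log-log space."""
    b, log_a = fit_linear([math.log(v) for v in x], [math.log(v) for v in y])
    return math.exp(log_a), b

# Synthetic illustration data (hypothetical, NOT from the paper):
# point load strength in MPa and a UCS generated from UCS = 20 * Is50**1.1.
is50 = [1.0, 2.0, 3.0, 4.0, 5.0]
ucs = [20.0 * v ** 1.1 for v in is50]

a, b = fit_power(is50, ucs)              # power function fit
slope, intercept = fit_linear(is50, ucs)  # linear fit on the same data
```

Because each formula uses a single predictor, it cannot capture the combined effect of several rock properties, which is the limitation that motivates the multi-parameter models discussed next.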
Techniques such as simple regression, multiple regression, artificial neural networks, fuzzy inference, and adaptive neuro-fuzzy inference systems have been successfully applied to the prediction of UCS (Kamgue et al., 2019; Mohamadian et al., 2021). For example, Ceryan et al. (2013) used an artificial neural network and multiple linear regression to predict the UCS of carbonate rocks. Okkan et al. (2020) established prediction models of the UCS and tensile strength of shale by multiple linear regression and least squares support vector machine, respectively. However, the generalization ability of a single model is often weak, the prediction risk is high, and the prediction effect is uneven.

Intelligent Monitoring of Shaft Lining
At present, researchers mainly use data mining methods to predict wellbore stability, including the error back propagation (BP) neural network, support vector machine (SVM), functional network (FN), adaptive neuro-fuzzy inference system, and other methods. These methods can achieve good accuracy, but they are not universal. Jahanbakhshi et al. (2012) put forward a prediction method of wellbore stability based on an artificial neural network that takes 23 effective parameters, such as in-situ stress, drill string performance, drilling operation level, geological conditions, and drilling fluid properties, as input and wellbore stability or instability as output. Carpenter et al. (2014) used a random method to study typical fracture and collapse models; they considered the uncertainty of the input data of the model (in-situ stress, rock strength data, and pore pressure) and put forward an analysis method of wellbore stability based on uncertainty. Tariq (2017)

Prediction Model of Rock Mass Strength Based on Stacking
Stacking is a typical ensemble learning (model integration) strategy. The outputs of several basic learning models are used as inputs to train the next layer of models, so that the models are cascaded, and the outputs of the last layer are taken as the final results. The basic learning model is the most important part of the Stacking integrated learning framework. In this paper I adopt seven basic learning models based on different regression strategies, namely, polynomial regression, Ridge regression, Lasso regression, decision tree, gradient boosting, adaptive boosting (AdaBoost), and XGBoost. These regression models are good at simulating the nonlinear relationship between input and output; they also show good prediction performance in practical applications.
Stacking is a hierarchical model integration framework. The integration framework of this study consists of two layers. The first layer consists of the seven basic learning models. The second layer adopts a polynomial regression model to prevent overfitting. Each basic learning model is trained separately with 5-fold cross-validation. The operation flow of the Stacking integrated model is shown in Figure 1.
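The two-layer mechanism can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not the configuration used in this study: the seven base learners are replaced by two hypothetical stand-ins (one single-feature least-squares line per input feature), the polynomial second layer is replaced by a plain linear meta-learner, and the data are synthetic. What the sketch does keep is the key idea: every training sample is predicted by base models that never saw it (out-of-fold prediction with five folds), and the second layer is trained on those predictions.

```python
def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k interleaved folds (P1..Pk)."""
    return [list(range(i, n, k)) for i in range(k)]

def fit_line(x, y):
    """Stand-in base learner: least-squares line y = w*x + c on one feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    w = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return w, my - w * mx

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def fit_meta(Z, y):
    """Linear second-layer model with intercept, via the normal equations."""
    rows = [[1.0] + list(z) for z in Z]
    p = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(AtA, Aty)

def predict_meta(coef, z):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], z))

def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

# Synthetic data (hypothetical): two features, target with a small perturbation.
X = [[float(i), float((3 * i) % 7)] for i in range(20)]
y = [2.0 * a + 3.0 * b + (1.0 if i % 2 else -1.0) for i, (a, b) in enumerate(X)]

# Layer 1: out-of-fold predictions of each base learner.
folds = k_fold_indices(len(X), k=5)
n_base = 2
oof = [[0.0] * n_base for _ in X]
for held_out in folds:
    train = [i for i in range(len(X)) if i not in held_out]
    for j in range(n_base):
        w, c = fit_line([X[i][j] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i][j] = w * X[i][j] + c

# Layer 2: meta-learner trained on the out-of-fold predictions.
meta = fit_meta(oof, y)
stacked = [predict_meta(meta, z) for z in oof]
base_rmse = [rmse(y, [z[j] for z in oof]) for j in range(n_base)]
stacked_rmse = rmse(y, stacked)
```

On the training data, the fitted second layer cannot do worse than either base learner's out-of-fold predictions, because "copy one base prediction" is one of the linear combinations it can choose. For predictions on new samples, the base learners would normally be refitted on the full training set before their outputs are passed to the second layer.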
To set up the integration framework shown in Figure 1, complete these steps: Step 1: Divide the training set into five subsets, which are named P1-P5. Step

Stress Distribution
The stress distribution of the rock mass around the borehole wall is simplified to a thick-walled cylinder model, as shown in Figure 2. The inner diameter and outer diameter of the thick-walled cylinder are $a$ and $b$, respectively; the internal pressure is $P_i$, the external pressure is $P_e$, and the vertical force is $F$. According to the theory of elasticity, the stress state of the cylinder in the polar coordinate system at any radial distance $r$ is given by equation (1). In this equation, $r$ is the polar radius, $\sigma_{rr}$ is the radial stress, $\sigma_{\theta\theta}$ is the circumferential stress, and $\tau_{\theta r}$ is the shear stress. The vertical stress $\sigma_z$ is calculated according to equation (2).
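For orientation, the textbook Lamé solution for a thick-walled cylinder under internal and external pressure can be evaluated as sketched below. This is the standard elasticity result, not necessarily the exact form or sign convention of equations (1) and (2). The sketch assumes that $a$ and $b$ act as the inner and outer radii, that tension is positive, that the loading is axisymmetric (so the shear stress is zero), and that the vertical force is spread uniformly over the annular cross-section. The function names and numerical values are hypothetical.

```python
import math

def lame_stresses(r, a, b, p_i, p_e):
    """Radial and circumferential stress at radius r (tension positive)."""
    if not a <= r <= b:
        raise ValueError("r must lie between the inner and outer radius")
    k = b ** 2 - a ** 2
    uniform = (p_i * a ** 2 - p_e * b ** 2) / k
    varying = (p_i - p_e) * a ** 2 * b ** 2 / (k * r ** 2)
    sigma_rr = uniform - varying
    sigma_tt = uniform + varying
    return sigma_rr, sigma_tt

def axial_stress(force, a, b):
    """Mean vertical stress from an axial force acting on the annulus."""
    return force / (math.pi * (b ** 2 - a ** 2))

# Hypothetical values: radii in m, pressures in MPa.
a, b, p_i, p_e = 1.0, 2.0, 10.0, 30.0
sr_inner, st_inner = lame_stresses(a, a, b, p_i, p_e)
sr_outer, st_outer = lame_stresses(b, a, b, p_i, p_e)
```

A quick consistency check is that the radial stress equals the negative of the applied pressure on each surface, and that the sum of the radial and circumferential stresses is the same at every radius.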

HMM Model
HMM is a dual stochastic process whose hidden state can only be inferred from the observation vector (Ju, 2018; Zhou et al., 2023). For the model state, let the set of states be $S = \{s_1, s_2, \ldots, s_N\}$, and the state at time $t$ is $q_t \in S$. The states can be transferred from one another.
For the state transition matrix, the state matrix $A = (a_{ij})_{N \times N}$ describes the transition between states, and $a_{ij}$ is the probability of transition from state $s_i$ to state $s_j$.
For the observed values of the model, let the set of observations be $V = \{v_1, v_2, \ldots, v_M\}$. The model produces an observable output $y_t \in V$ when the state transition at time $t$ is complete. For the probability distribution matrix of output, the probability distribution function matrix $B = (b_{ij})_{N \times M}$ represents the probability that the output is $v_j$ when the state is $s_i$ at time $t$.
For the initial state distribution, let $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$ be the initial state distribution of the model, where $\pi_i = P(q_1 = s_i)$. Therefore, the HMM can be represented by $\lambda = (\pi, A, B)$.
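To make the roles of the initial distribution, the transition matrix, and the output probability matrix concrete, the following minimal sketch evaluates the probability of an observation sequence with the standard forward algorithm. The two-state model and all of its numbers are hypothetical and chosen only for illustration; the names `pi`, `A`, and `B` follow the usual HMM convention.

```python
from itertools import product

def forward_likelihood(pi, A, B, observations):
    """P(observations | model) for an HMM (pi, A, B) via the forward pass."""
    n_states = len(pi)
    # Initialization: start in state i and emit the first observation.
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Induction: move to state j, then emit the next observation.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)

# Hypothetical two-state, two-symbol model.
pi = [0.6, 0.4]                 # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j]: transition probability i -> j
B = [[0.9, 0.1], [0.2, 0.8]]    # B[i][k]: probability of symbol k in state i

p_single = forward_likelihood(pi, A, B, [0])
# The probabilities of all length-3 sequences must add up to 1.
total = sum(forward_likelihood(pi, A, B, list(seq))
            for seq in product(range(2), repeat=3))
```

The forward pass costs a number of operations proportional to the sequence length times the square of the number of states, which is what makes likelihood evaluation inside an iterative training loop practical.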

EM Optimization Process
EM is adopted to improve the traditional mining method. The whole EM algorithm includes the following steps:
Step 1: According to the initial values of the parameters or the model parameters of the last iteration, $\theta^{(n)}$, calculate the maximum likelihood estimate as shown in equation (3).
Step 2: With $Q(z)$ fixed, calculate the expected estimation of the parameters at which the likelihood of the data is maximum, as shown in equation (4).
Step 3: Repeat the above steps until the value of $\lVert \theta^{(i+1)} - \theta^{(i)} \rVert$ is small enough, and then stop the iteration.
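The iteration pattern of the steps above can be illustrated on a deliberately simple problem. The sketch below applies EM to a one-dimensional mixture of two Gaussians rather than to the HMM of this paper, so it shows the generic alternation and stopping rule, not the EM-HMM itself. The function name `em_gmm_1d`, the data, the starting values, the variance floor, and the use of the largest absolute parameter change as the norm are all assumptions.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, mu, var, weight, tol=1e-8, max_iter=200):
    """EM for a 1-D Gaussian mixture; returns (mu, var, weight, log-likelihoods)."""
    theta = list(mu) + list(var) + list(weight)
    k, n, history = len(mu), len(data), []
    for _ in range(max_iter):
        mu, var, weight = theta[:k], theta[k:2 * k], theta[2 * k:]
        # Expectation: posterior probability of each component for each point.
        resp, log_lik = [], 0.0
        for x in data:
            joint = [weight[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([p / total for p in joint])
        history.append(log_lik)
        # Maximization: re-estimate the parameters from the weighted data.
        new_mu, new_var, new_weight = [], [], []
        for j in range(k):
            nj = sum(r[j] for r in resp)
            m = sum(r[j] * x for r, x in zip(resp, data)) / nj
            v = sum(r[j] * (x - m) ** 2 for r, x in zip(resp, data)) / nj
            new_mu.append(m)
            new_var.append(max(v, 1e-6))  # floor keeps the variance positive
            new_weight.append(nj / n)
        new_theta = new_mu + new_var + new_weight
        # Stop once the largest parameter change falls below the tolerance.
        shift = max(abs(p - q) for p, q in zip(new_theta, theta))
        theta = new_theta
        if shift < tol:
            break
    return theta[:k], theta[k:2 * k], theta[2 * k:], history

# Synthetic data (hypothetical): two well-separated clusters.
data = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.0, 2.1, 1.9, 2.2, 1.8]
mu, var, weight, history = em_gmm_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
```

The recorded log-likelihood does not decrease from one iteration to the next, which is the property that makes the parameter-change stopping rule in Step 3 meaningful.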
The minimum and maximum values of the average value calculated by the empirical formula of the ultimate bearing capacity of the shaft wall can then be expressed as shown in equation (5). In equation (5), inf(⋅) and sup(⋅) are the minimum and maximum values, respectively, of the truncated set interval at level α. The standard deviation of the empirical formula of ultimate bearing capacity is expanded into a Taylor series, as shown in equation (6). Substituting equation (6) into equation (7) enables the prediction model of shaft lining ultimate bearing capacity to be obtained. Combining the fuzzy random distribution of material properties, geometric parameters, and calculation modes in a big data environment enables a prediction model of the ultimate bearing capacity of shaft lining to be established. The algorithm flow is shown in Figure 3.
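The meaning of the lower and upper bounds of a level-α truncated set can be shown with a small example. The sketch assumes a triangular membership function, which is a common choice but is not stated in this paper; the function name and the strength values are hypothetical.

```python
def alpha_cut_triangular(low, mode, high, alpha):
    """Return (inf, sup) of the alpha-level set of a triangular fuzzy number."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    lower = low + alpha * (mode - low)     # left edge moves toward the mode
    upper = high - alpha * (high - mode)   # right edge moves toward the mode
    return lower, upper

# Hypothetical fuzzy mean compressive strength in MPa: (40, 50, 65).
cut_half = alpha_cut_triangular(40.0, 50.0, 65.0, 0.5)
cut_full = alpha_cut_triangular(40.0, 50.0, 65.0, 1.0)
cut_zero = alpha_cut_triangular(40.0, 50.0, 65.0, 0.0)
```

Raising α narrows the interval toward the most plausible value, while α = 0 returns the full support. Evaluating a model at both ends of the interval is what turns a single predicted bearing capacity into an interval value.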

Experimental Settings
The data set includes porosity ($n$), Schmidt rebound number ($R_n$), P-wave velocity ($V_p$), point load strength ($I_{s(50)}$), and UCS ($R_C$). The data set is divided into a training set and a testing set at a 4:1 ratio. To test the prediction performance of the integrated model, I adopted four classical regression evaluation indexes: mean absolute error (MAE), root mean square error (RMSE), coefficient of determination ($R^2$), and variance accounted for (VAF).
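The four indexes can be computed as in the following minimal sketch, which uses their commonly cited definitions; in particular, VAF is taken as one minus the ratio of the error variance to the variance of the measured values, expressed as a percentage. The function name and the sample values are hypothetical and are not data from this study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R2, and VAF (in percent) for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sse / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - sse / ss_tot
    mean_e = sum(errors) / n
    var_e = sum((e - mean_e) ** 2 for e in errors) / n
    vaf = (1.0 - var_e / (ss_tot / n)) * 100.0
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "VAF": vaf}

# Hypothetical measured and predicted UCS values in MPa.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
```

MAE and RMSE are in the units of the UCS and should be as small as possible, whereas $R^2$ approaches 1 and VAF approaches 100 for a good model. VAF differs from $R^2$ only when the errors have a nonzero mean, because it measures the spread of the errors rather than their size.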

Results and Discussion
A scatter diagram of predicted value and actual value of rock mass strength of the Stacking model is shown in Figure 4, and other performance evaluation indexes are shown in Figure 5.
The points in the scatter diagrams of the seven base learning models are widely scattered, and the evaluation index ranges are RMSE 23-28, MAE 18-23, and VAF 73-90. The points in the scatter plot of the integrated model cluster close to the straight line with a slope of 1, with an RMSE of 18.49, MAE of 14.81, $R^2$ of 0.88, and VAF of 88.24. Judged by these indexes, the prediction performance of the base learning models is mediocre, whereas the Stacking integrated model has lower RMSE and MAE than every base learning model. This analysis indicates that the Stacking integrated model has stronger learning ability than a single model; it also has better prediction performance and generalization ability for UCS. Therefore, the Stacking integration algorithm is an effective way to improve on the accuracy of a single model, which helps to provide more accurate structural parameters for the prediction model of the ultimate bearing capacity of shaft lining.

Experimental Settings
Carrying out a prototype destructive test of high-strength reinforced concrete shaft lining is difficult, so a reduced-size shaft lining structure model, designed according to similarity theory and the equations of elastic mechanics, is often used for the corresponding experimental research. To ensure similarity, a precision mold should be used to cast the borehole wall model before the test. After casting, the model is cured and then polished on a grinder to ensure a smooth surface. Several strain gauges are pasted at the same level of the model to record the strain of the reinforced concrete, and a rubber ring seal is set on the upper and lower end faces of the loading pedestal to ensure that the model can slide freely in the radial direction. The parameters of the shaft wall model are determined according to the structural characteristics of the shaft wall and similarity theory. The outer diameter of the specimen is 925.0 mm, and the height is 562.5 mm. The test value of the ultimate bearing capacity of the shaft lining is obtained through the model test.
A high-strength hydraulic loading device uses a high-pressure oil pump to apply horizontal oil pressure that simulates horizontal uniform ground pressure, and vertical bolts and cover plates tightly constrain the model to ensure that it is always in a plane strain state. After three preloading cycles, the load is applied in stages, increasing by 0.5 MPa every 30 s and then holding the pressure for 1-2 min before loading continues. The strain of the reinforced concrete under each level of load is recorded with 2 MPa as the first level, and the load is monitored by sensors in real time until the shaft wall breaks to ensure that the test results and errors are within the specified range.

Results and Discussion
The above results show that the compressive strength, the ratio of wall thickness to size, and the reinforcement ratio have different effects on the ultimate bearing capacity of the shaft wall. With two of the parameters held constant, the model test was continued. The relationship among the parameters is shown in Figure 6.
The curves show that the compressive strength of the concrete is crucial for increasing the bearing capacity of the shaft lining: the ultimate bearing capacity increases by about 1 MPa when the strength of the shaft wall concrete increases by 0.4%. In contrast, the reinforcement ratio has the least effect on bearing capacity: the reinforcement ratio must be increased by 15% for the ultimate bearing capacity of the shaft wall to increase by about 1 MPa. Although the parameters show a general influence law in the test, the construction process in deep alluvium has fuzzy random characteristics. To design the shaft lining structure economically and reasonably and to calculate an ultimate bearing capacity that can guide engineering practice, a fuzzy random analysis of the various parameters must be conducted first.

Verification of EM-HMM
The algorithm efficiency is simulated in MATLAB R2016a on a Linux host running Red Hat 9.0. Based on the test data of the ultimate bearing capacity of the aforementioned shaft lining model, the traditional HMM and the EM-HMM are used for calculation and simulation, respectively. The comparison curve of algorithm efficiency is shown in Figure 7.
As the problem scale increases, the EM-optimized algorithm shows progressively smaller error, higher operating efficiency, and faster convergence than the traditional HMM algorithm. The results show that the fuzzy random model of big data mining synthesizes various engineering fuzzy random factors, and these factors are based on the mining of a large number of engineering test data, so the ultimate bearing capacity of the shaft lining is a generalized interval value. Although the error between the overall values and the test values is not large, the representation form is more reliable and reasonable, and it has more practical value in engineering. In addition, because the fuzzy randomness of working conditions is considered, the model analysis value is smaller than the experimental value as a whole, as shown in Figure 7, and the result is more in line with engineering practice. Based on the fuzzy random distribution of material properties, geometric parameters, and calculation modes in a big data environment, the fuzzy random model of the shaft wall ultimate bearing capacity in big data mining is established. The example shows that the model is more reliable and reasonable and has more practical engineering value.

CONCLUSION
In this study, the UCS of rock is predicted based on the Stacking algorithm, and the HMM is optimized by the EM algorithm. The improved model is subjected to two fuzzy random processes, both of which reflect the uncertain characteristics of the project better than the original algorithm. The simulation results show that, compared with the seven basic learner models, the Stacking integrated model has the best generalization ability and the lowest RMSE and MAE. These results show that Stacking improves on the prediction performance of a single model for UCS, and thus helps to provide more accurate structural parameters for the prediction model of the ultimate bearing capacity of the shaft wall. Comparison with the experimental values shows that the EM-HMM has higher operating efficiency and that its training results are more in line with engineering practice.

ACKNOWLEDGMENT
I would like to thank the anonymous reviewers who have provided valuable comments on this paper.

CONFLICTS OF INTEREST
I declare there is no conflict of interest.

FUNDING AGENCY
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
used three artificial intelligence tools (adaptive network-based fuzzy inference system [ANFIS], support vector machine [SVM], and functional network [FN]) and selected volume density, neutron porosity, longitudinal wave velocity, and shear wave velocity to predict the failure parameters of carbonate rocks. Kamgue et al. (2019) established the strain hardening Mogi-Coulomb model based on the Mogi-Coulomb criterion, which was used to analyze the stability of the borehole wall. Tian Ye et al. (2018) established a calculation model of formation pressure based on logging data and rock mechanics parameters and indirectly obtained rock mechanics parameters from logging data, thus realizing the prediction of wellbore stability.

Figure 2. Stress distribution around shaft lining

Figure 4. Prediction results of stacking model

Figure 5. Model evaluation results under different evaluation indices

Figure 6. Relationship curve of borehole wall structure parameters

Figure 7. Training error of EM-HMM

The main components of the HMM model are model state, state transition matrix, observed values of the model, probability distribution matrix of output, and initial state distribution.