Neural Network-Based Prediction Model for Sites’ Overhead in Commercial Projects

Construction companies need to improve the accuracy of their project budgets to achieve their targeted profit. Site overheads are expenses that relate to a project but are not allocated to a specific work package. The main objective of this research is to develop a neural network model that predicts site overhead costs for commercial projects as a percentage of the direct cost. The research focuses on the main factors affecting site overhead costs for commercial projects in Egypt. These factors and their weights were identified by experts through structured data collection. Cost data were collected for 55 projects executed in the past seven years, covering a range of company ranks, direct costs, project durations, project locations, contract types, and types of company ownership. The best model developed consists of six input neurons, two hidden layers with six and five neurons respectively, and one output neuron representing the percentage of project site overhead. The model was tested on six projects with an accuracy of 84%.


INTRODUCTION
The construction industry is one of the major sectors in the Egyptian economy, especially real estate and commercial buildings (Idrees, ElSeddawy, & Zeidan, 2019). The construction sector has a great impact on the Gross Domestic Product (GDP); it represents about 16.5% of the total Egyptian GDP. However, after the Egyptian revolution in 2011 and the floating of the Egyptian pound's exchange rate in November 2016, which was followed by a high inflation rate, many sectors suffered from the unstable economic situation and the accompanying political risks in Egypt (Idrees, El Seddawy, & EL Moaaz, 2019; Khedr, Idrees, & El Seddawy, 2016). Taking all these factors into consideration, building construction companies in Egypt need an accurate estimate of project cost, and controlling project costs has become more important and has a greater impact than before (Idrees, Alsheref, & ElSeddawy, 2019). Cost is considered one of the three main challenges facing building construction companies, since the success of any project is measured by its completion within the allocated budget, within the planned baseline, and with the desired quality. An inaccurate estimate can therefore easily lead to a project cost overrun, which is reflected in the company's profit. Accordingly, an accurate cost estimate in the early stages is critical, as it allows contractors to evaluate the feasibility of the project (Hastak, 2015).
The absence of structured and accurate methods for assessing site overhead costs for commercial projects in Egypt puts construction companies at risk of an inaccurate bid package estimate that may affect the company's profit margin (Khedr, El Seddawy, & Idrees, 2014). Most building construction companies find no difficulty in estimating the direct cost of a project. The inaccuracy appears in estimating the overhead costs, which produces a variance between the estimated budget and the actual cost, either as a cost overrun or as a cost saving (Hassouna, Khedr, Idrees, & ElSeddawy, 2020).
The objective of this research is to develop an artificial intelligence model based on an Artificial Neural Network (ANN) that can enhance the contractor's ability to estimate site overhead cost as a percentage of the project direct cost in the commercial construction market in Egypt. This would improve companies' performance in predicting overheads for upcoming projects and increase their competitive advantage by improving bid accuracy. It would also help to control the factors affecting site expenditures, create an information system and historical data for projects to improve the predictability of site overheads in future projects, and decrease the time and effort spent on site overhead estimation.
The research methodology followed a set of main steps that are discussed in detail in the following sections. These steps are: reviewing the literature on construction cost categories and identifying the main factors influencing site overhead cost in the construction industry; conducting a survey with industry experts to identify the factors influencing site overhead in the commercial construction industry in Egypt; collecting actual project data according to the concluded factors; performing a sensitivity analysis of the collected project data to study the relationship between each factor and the site overhead percentage; designing and developing an artificial neural network model; testing and validating the model; and finally developing a graphical user interface so that the developed model can be used easily. The remainder of the paper elaborates on each step of the research methodology. The study outline can be stated as follows:
− Problem: The absence of structured and accurate methods for assessing site overhead costs for commercial projects in Egypt puts construction companies at risk of an inaccurate bid package estimate that may affect the company's profit margin.
− Aim: Enhance the contractor's ability to estimate site overhead cost.
− Solution: Develop an artificial intelligence model based on an Artificial Neural Network (ANN) that can enhance the contractor's ability to estimate site overhead cost as a percentage of the project direct cost in the commercial construction market in Egypt.
− Methodology: Neural network model.
− Benefit: More accurate estimation of the site overhead cost of commercial projects.
− Outcome: Improved performance.

PREVIOUS WORK
The behavior of a project life cycle and its trends can be forecast before the project actually begins by using simulation and modeling. Artificial intelligence techniques such as expert systems and neural networks have proven helpful in solving prediction problems (Cheng, Tsai, & Sudjono, 2010). Artificial Neural Networks (ANNs) are data modeling methods that attempt to solve complicated problems by formulating relationships in the data, even where no simple equation can map the relationships between data variables. ANNs, whose operation is intended to resemble that of the human brain, are beneficial where conditions are difficult to identify, and they have been commonly employed in construction applications such as estimating productivity, predicting corporate bankruptcy, and measuring final project performance. Indeed, several previous studies have sought to use neural networks to address construction estimation with varying degrees of success.
A study conducted in 2011 discussed the application of neural networks to parametric cost estimation of construction overhead costs in Egypt. The study used 52 projects executed between 2002 and 2009 and considered 10 factors affecting overhead costs. The developed network consisted of 10 input neurons and a single hidden layer with 13 hidden neurons and a sigmoid transfer function. This model had an accuracy of 80% (ElSawy, Hosny, & AbdElRazik, 2011). Also in Egypt, Georgy and Barsoum developed an ANN model for the parametric cost estimate of school construction projects. They used statistical and neural network models for estimating costs, and found that a neural network with a single hidden layer whose number of neurons equals two-thirds of the number of neurons in the input layer would be sufficient (Georgy M., 2005). In the Philippines, Lyne and Maximino developed an ANN to estimate the total structural cost of building construction projects based on six parameters: number of floors, number of basements, area of floors, concrete volume, area of formwork, and weight of reinforcement steel. The data used for the learning process came from 30 different building construction projects; this data set was divided into 60%, 20%, and 20% for training, validation, and testing respectively. The resulting model architecture was a network of six input neurons (one for each input parameter), one hidden layer with seven neurons, and one output neuron representing the structural cost. The model was developed using MATLAB with a feed-forward backpropagation technique (Roxas, Lyne, & Maximino, 2014). Other studies have been conducted (see summary in table 1).
More recent research in (Mostofi, Toğan, Ayözen, & Tokdemir, 2022) compared a set of machine learning algorithms for predicting the impact of construction rework cost, with the aim of proposing the highest-performing algorithm. That research focused on construction field attributes related to design and construction. Another study with the same target and field was presented in (Car-Pusic, Petruseva, Pancovska, & Zafirovski, 2020), using neural networks for construction cost prediction. It confirmed the time savings and higher accuracy obtained by applying neural networks. Neural networks have also been applied to identify the factors affecting project performance in (Elhegazy, Badra, Aboul Haggag, & Abdel Rashid, 2022). That research identified the estimating team and the schedule as two of the most influential parameters. Moreover, neural networks proved effective in production cost prediction, with high performance, in (Wei, 2022). These studies confirm the positive contribution of neural networks to cost prediction in different fields; however, there is a lack of research on commercial projects, which have their own specific attributes. Therefore, the current study covers the estimation of site overheads for commercial projects, and specifically in Egypt, with a focus on the attributes that affect them. As mentioned in (Chen, Yu, Yang, & Shao, 2022), the affecting attributes vary strongly with the field of application as well as with the targeted community. Previous studies on overhead estimation used training data from various project types and considered many factors, which may reduce accuracy and dilute the focus on a particular project type. The accuracy of a neural network depends on the accuracy and amount of the training data. This research therefore focuses on commercial projects only, which should increase its accuracy for this type of project, as the collected project data are for commercial projects only.

DATA COLLECTION AND ANALYSIS
Essential project data were collected for commercial projects executed by the first four categories of contractors in Egypt in the past 7 years. These data are required for training the neural network model that estimates the site overhead percentage. The results in table 2, drawn from previous studies in this specific area, reflect the main factors influencing site overhead costs for building construction in general. These findings support the survey that identifies the factors relevant to commercial construction projects in Egypt (see table 2).
From the analysis of the collected data, it was concluded that the factors studied in the literature review differ from the factors that affect the site overhead cost of commercial projects in Egypt. Limiting the study to commercial projects narrows the set of factors that may affect the site overhead percentage of a project; moreover, some factors are not taken into consideration in the Egyptian construction industry, especially in commercial construction projects. After analyzing the data collected from experts and comparing it with the factors previously studied in the literature review, a final list was concluded that represents the factors relevant to commercial construction projects in Egypt (see figure 1).
The most frequently stated factor was the ownership type of the company, that is, whether the contracting company is publicly or privately owned.
The ownership type of the company should be considered in the Egyptian construction industry because some construction firms are owned by the public sector, and the management style of these companies differs from that of privately owned companies (see table 2).

CASE STUDY DATA
Data for 55 projects were collected according to the factors listed in the previous section; these data are for commercial projects that were executed in the past 7 years in Egypt. The collected project data were examined to study the influence of each factor on the total site overhead percentage. This study provides a comparative analysis of each factor and how it may influence the site overhead percentage. It also helps in identifying the critical factor.

The Influence of Class of Contracting Company
The analysis of the collected project data according to the class of the contracting company classified the results into four categories (see figure 2). These are the first four categories of contracting companies as enrolled in "The Egyptian Federation for Construction and Building Contractors". Table 3 presents the statistical analysis of the collected project data according to the rank of the contracting company.

The Influence of Direct Cost
The analysis of the collected project data according to the direct cost of the project classified the results into five categories (see figure 3).

The Influence of Project Duration
The analysis of the collected project data according to the duration of the project classified the results into four categories (see figure 4). The 55 collected projects were categorized into 10 projects with a duration of less than 18 months, 21 projects with a duration between 18 and 36 months, 15 projects with a duration between 36 and 60 months, and 9 projects with a duration of more than 60 months.
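The duration categories above can be illustrated with a short Python sketch. The category edges come from the text, but the treatment of projects lasting exactly 18, 36, or 60 months is an assumption, since the text does not say which category receives them. The sample durations are hypothetical; only the reported counts (10, 21, 15, 9) come from the collected data.

```python
from collections import Counter

# Counts per duration category as reported in the text.
REPORTED_COUNTS = {
    "< 18 months": 10,
    "18-36 months": 21,
    "36-60 months": 15,
    "> 60 months": 9,
}


def duration_category(months: float) -> str:
    """Map a project duration in months to one of the four categories.

    Boundary handling (18, 36, 60) is an assumption for illustration.
    """
    if months < 18:
        return "< 18 months"
    if months < 36:
        return "18-36 months"
    if months <= 60:
        return "36-60 months"
    return "> 60 months"


def count_by_category(durations):
    """Count how many durations fall into each category."""
    return Counter(duration_category(m) for m in durations)


# Hypothetical durations, used only to show the function in use.
sample_durations = [12, 24, 30, 48, 72]
counts = count_by_category(sample_durations)

# The reported category counts add up to the 55 collected projects.
total_projects = sum(REPORTED_COUNTS.values())
print(counts, total_projects)
```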

The Influence of Project Location
The analysis of the collected project data according to the location of the project classified the results into two categories: inside the capital city and outside the capital city (see figure 5).

The Influence of Contract Type
The analysis of the collected project data according to the contract type of the project classified the results into three categories: cost-plus contracts, unit rate contracts, and lump-sum contracts (see figure 6).

The Influence of Ownership Type of the Contracting Company
The analysis of the collected project data according to the ownership type of the contracting company classified the results into two categories: private sector and public sector (see figure 7).

MAJOR FACTORS ANALYSIS
At this phase of the data collection and analysis, it is important to analyze the major factors identified in the previous sections. This process supports the study of the factors affecting the site overhead percentage for commercial projects in Egypt. The sensitivity of each factor is studied to show the effect of changing any variable or factor on the resulting site overhead percentage, and so to conclude how sensitive the site overhead percentage is to a change in any factor influencing it. After analyzing the data collected from experts and comparing it with the percentages previously concluded in the first survey, final weights for the governing factors influencing the site overhead percentage were concluded, as shown in Figure 8.
From the survey results in Figure 8, it is clear that the contract type is the most significant factor influencing the site overhead percentage. This is due to the critical role that the type of contract plays in determining the budget, the tendering method, and the pricing, including the risk allocated to each party under each type of contract. On the other hand, the ownership type of the contracting company has the lowest influence on the site overhead percentage.

DESIGN OF THE NEURAL NETWORK
The main objective of this research is to develop an artificial neural network to predict the site overhead cost as a percentage of the total direct cost for commercial projects in Egypt. This can enhance the accuracy of estimators and decision-makers when preparing an estimate for a bid. The steps of designing the network follow an iterative trial-and-error method, as illustrated in figure 9, to achieve the most accurate and valid network for estimating the site overhead percentage for commercial projects in Egypt.
The simple user-friendly interface of the "Neural Designer" guides the steps needed to develop the neural network model. The "Neural Designer" automatically divides the data set into three types of instances. For the 55 project data sets, this classification gives:
• 33 training instances
• 11 selection (validation) instances
• 11 testing instances
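The division of instances can be sketched in a few lines of Python. This is only an illustration and not the procedure used by the "Neural Designer": the 60%/20% fractions are inferred from the reported counts (33 and 11 out of 55), and the random shuffling, the seed, and the function name `split_instances` are assumptions.

```python
import random


def split_instances(n_projects, train_frac=0.6, selection_frac=0.2, seed=0):
    """Split project indices into training, selection, and testing subsets.

    The fractions are inferred from the reported 33/11/11 split; the
    shuffling strategy is an assumption for illustration.
    """
    indices = list(range(n_projects))
    random.Random(seed).shuffle(indices)

    n_train = round(n_projects * train_frac)
    n_selection = round(n_projects * selection_frac)

    training = indices[:n_train]
    selection = indices[n_train:n_train + n_selection]
    testing = indices[n_train + n_selection:]  # whatever remains
    return training, selection, testing


training, selection, testing = split_instances(55)
print(len(training), len(selection), len(testing))  # 33 11 11
```

Each project index lands in exactly one subset, so the selection and testing instances stay unseen during training.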

Training Strategy
In this research, the gradient descent training strategy is chosen. Back-propagation gradually reduces the error between the model output and the target output. It fits the mapping from inputs to output by minimizing the root mean squared error (RMSE), and the training process finishes when the RMSE becomes constant. The RMSE is a good overall measure of whether a training run was successful. In developing the neural network in this research, the loss index is set to the RMSE and the training algorithm is set to gradient descent.
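The RMSE is computed as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - E_i\right)^2}$$
• Where n is the number of projects to be evaluated in the training phase
• X_i is the model output related to sample i
• E_i is the target output for sample i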
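As a minimal sketch of the loss index described above, the following Python functions compute the root mean squared error from model outputs and target outputs, and check whether the RMSE has become roughly constant. The stopping helper `has_plateaued`, its `tolerance` and `patience` parameters, and the sample values are assumptions for illustration; the text only states that training finishes when the RMSE becomes constant.

```python
import math


def rmse(model_outputs, target_outputs):
    """Root mean squared error between model outputs and target outputs."""
    if not model_outputs or len(model_outputs) != len(target_outputs):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(x - e) ** 2 for x, e in zip(model_outputs, target_outputs)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def has_plateaued(rmse_history, tolerance=1e-6, patience=5):
    """Return True when the RMSE varied by less than `tolerance`
    over the last `patience` epochs (hypothetical stopping rule)."""
    if len(rmse_history) <= patience:
        return False
    recent = rmse_history[-(patience + 1):]
    return max(recent) - min(recent) < tolerance


# Hypothetical overhead percentages, for illustration only.
predicted = [10.0, 12.0, 9.0]
actual = [11.0, 12.0, 7.0]
loss = rmse(predicted, actual)  # sqrt((1 + 0 + 4) / 3)
```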

First Package - Models with One Hidden Layer and Hyperbolic Tangent Transfer Function
The first run package of models resulted in 10 models with one hidden layer and a hyperbolic tangent transfer function (see figures 10 and 11). The models vary in the number of neurons in the hidden layer, from 3 neurons to 12 neurons. Table 4 shows the results of the first run package of models.

Second Package - Models with One Hidden Layer and Logistic Transfer Function
The second run package of models resulted in 10 models with one hidden layer and a logistic transfer function (see figure 12). The models vary in the number of neurons in the hidden layer, from 3 neurons to 12 neurons. Table 5 shows the results of the second run package of models.
From table 5, model 15 has the lowest RMSE of 0.463, corresponding to a mean percentage relative error of 7.04%, while model 17 has the highest RMSE of 0.657, corresponding to a mean percentage relative error of 10.38%. The results of the second run package of models show that the RMSE changes non-linearly as the number of neurons increases, as shown in Figure 12.

Third Package - Models with Two Hidden Layers and Hyperbolic Tangent Transfer Function
The third run package of models resulted in 36 models with two hidden layers and a hyperbolic tangent transfer function (see figure 13). The models vary in the number of neurons in each hidden layer, from 3 neurons to 10 neurons. Table 6 shows the results of the third run package of models.
From table 6, model 54 has the lowest RMSE of 0.415, corresponding to a mean percentage relative error of 6.88%, while model 42 has the highest RMSE of 1.068, corresponding to a mean percentage relative error of 14.84%. The results of the third run package of models show that the RMSE changes non-linearly as the number of neurons increases, as shown in Figure 13.

Fourth Package - Models with Two Hidden Layers and Logistic Transfer Function
The fourth run package of models resulted in 36 models with two hidden layers and a logistic transfer function; the models vary in the number of neurons in each hidden layer, from 3 neurons to 10 neurons. Table 7 shows the results of the fourth run package of models.
From table 7, model 65 has the lowest RMSE of 0.188, corresponding to a mean percentage relative error of 3.50%, while model 78 has the highest RMSE of 0.647, corresponding to a mean percentage relative error of 9.21%. The results of the fourth run package of models show that the RMSE changes non-linearly as the number of neurons increases, as shown in Figure 14.

Selecting the Best Model
The trial-and-error process discussed above resulted in 92 trial models, categorized into four packages that vary in the number of hidden layers, the number of neurons in each layer, and the transfer function. Figure 15 illustrates the linear regression for the scaled output OHP. The predicted values are plotted against the actual ones as squares. The solid line indicates the best linear fit, and the grey line indicates a perfect fit. Reviewing the 92 models leads to the conclusion that model 65 is the optimum choice, with the design considerations stated in table 8 and the network architecture shown in Figure 16.
As shown above, Model 65 is designed with the following considerations:
• 6 input neurons (one neuron for each factor choice)
• 2 hidden layers
• 6 neurons in the first hidden layer
• 5 neurons in the second hidden layer
• Logistic transfer function for both layers
• 1 output neuron representing the site overhead percentage (OHP)
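To make the architecture listed above concrete, the following Python sketch builds a network with the same layer sizes (6, 6, 5, 1) and logistic hidden layers and runs one forward pass. It is not the trained Model 65: the weights are random placeholders, the linear output neuron is an assumption (the text specifies the transfer function only for the hidden layers), and the input values are hypothetical scaled factor codes.

```python
import math
import random

# Layer sizes of the selected architecture: inputs, two hidden layers, output.
LAYER_SIZES = [6, 6, 5, 1]


def logistic(z):
    """Logistic (sigmoid) transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def init_parameters(layer_sizes, seed=0):
    """Create placeholder (weights, biases) pairs for each layer.

    These random values stand in for trained parameters, which are not
    given in the text.
    """
    rng = random.Random(seed)
    parameters = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        parameters.append((weights, biases))
    return parameters


def layer_output(inputs, weights, biases, activation):
    """Apply one fully connected layer followed by its activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict_scaled_ohp(inputs, parameters):
    """Forward pass: logistic hidden layers, linear output (assumed)."""
    values = list(inputs)
    for weights, biases in parameters[:-1]:
        values = layer_output(values, weights, biases, logistic)
    weights, biases = parameters[-1]
    return layer_output(values, weights, biases, lambda z: z)[0]


def count_parameters(parameters):
    """Total number of weights and biases in the network."""
    return sum(len(row) for weights, _ in parameters for row in weights) + sum(
        len(biases) for _, biases in parameters
    )


parameters = init_parameters(LAYER_SIZES)
example_inputs = [0.2, 0.5, 0.1, 1.0, 0.0, 1.0]  # hypothetical scaled factors
scaled_ohp = predict_scaled_ohp(example_inputs, parameters)
n_parameters = count_parameters(parameters)  # 42 + 35 + 6 = 83
```

A network of this size has 83 adjustable parameters, which gives a sense of how small the model is relative to the 33 training instances.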

VALIDATION AND TESTING OF THE MODEL
To evaluate the validity and reliability of the developed model, the model has to be tested by predicting the site overhead percentage of new projects not previously introduced during training. The prediction accuracy is evaluated by comparing the predicted value with the actual value and calculating the difference between the two. The relative error is then calculated and evaluated against the acceptable relative error of the model that resulted from the development process.

Validation of the Model
For the validation process, the model was tested on the base cases previously introduced in the learning process of the model, and the relative percentage error was calculated and compared to the acceptable relative percentage error of the model. A result is accepted if its relative percentage error is within the acceptable range of the model; otherwise, it is considered not accepted.
Through the analysis of the validation process, it was concluded that the model predicted 33 base cases within the acceptable range of the model, and 5 base cases out of the acceptable range. These results represent 90% accuracy for the model selected.

Testing of the Model
Six projects, listed in table 9, were selected randomly in addition to the 55 projects used for the learning process of the model; these 6 projects were not introduced to the model in the training phase. If the value of the relative percentage error lies outside the acceptable range of the model, the prediction is considered wrong. If the value of the relative percentage error lies within the acceptable range, the prediction is considered right.
The developed model has an acceptable relative percentage error of ±3.05%. Considering this error margin, and after running the program on the 6 test projects, the model predicted 5 out of 6 projects within the acceptable range. That represents an accuracy of approximately 84%. This accuracy is considered very high for predicting the site overhead percentage of commercial projects in Egypt (see figure 18).
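The acceptance check described above can be sketched in Python as follows. The ±3.05% margin comes from the text, while the definition of the relative percentage error as (predicted − actual) / actual × 100 is an assumption, since the text does not state the formula. The six value pairs are hypothetical and are not the test projects of table 9.

```python
ACCEPTABLE_ERROR_PCT = 3.05  # acceptable relative percentage error from the text


def relative_percentage_error(predicted, actual):
    """Signed relative error of a prediction, in percent (assumed definition)."""
    return (predicted - actual) / actual * 100.0


def share_within_tolerance(predicted, actual, tolerance_pct=ACCEPTABLE_ERROR_PCT):
    """Fraction of predictions whose absolute relative error is within tolerance."""
    hits = sum(
        abs(relative_percentage_error(p, a)) <= tolerance_pct
        for p, a in zip(predicted, actual)
    )
    return hits / len(actual)


# Hypothetical overhead percentages for six test projects.
actual_ohp = [10.0, 12.0, 8.0, 15.0, 9.0, 11.0]
predicted_ohp = [10.2, 11.8, 8.1, 15.3, 9.6, 11.1]

accuracy = share_within_tolerance(predicted_ohp, actual_ohp)
print(f"{accuracy:.1%} of predictions within ±{ACCEPTABLE_ERROR_PCT}%")
```

In this hypothetical data, five of the six predictions fall within the margin and one (9.6 against 9.0) falls outside it, mirroring the five-out-of-six outcome reported for the test projects.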

CONCLUSION
The main objective of this research was to identify the factors affecting site overhead costs for commercial projects in Egypt and to apply neural networks to accurately estimate the site overhead percentage. Various studies on neural networks and their application in construction cost estimates were reviewed to build knowledge on the subject. These studies offered neural network applications for ... basis for a survey to reflect them on the Egyptian commercial construction industry. Participants in this survey were chosen to be experts in the construction industry, especially in commercial projects, with more than ten years of experience in the commercial market. The weights from respondents were calculated on a weighted average basis according to each scale weight, and the results of the survey showed five significant factors in addition to one more factor that was not listed and was added by the respondents. The final list of six factors was project location, project direct cost, project duration, contract type, class of company, and company ownership type. Data from fifty-five commercial projects were collected according to the factors concluded from this research; these projects were executed in Egypt in the past seven years. A comparative analysis was then conducted to study the influence of each factor on the site overhead percentage. This analysis was useful in understanding how each factor choice may influence the overhead percentage of a project. The data were also used as training data for the neural network during the training phase. The collected data and their analysis set the basis for developing the neural network for estimating the site overhead percentage for commercial projects in Egypt.
The "Neural Designer" software was used to develop the neural network model, and the process of designing the neural networks went through several steps: coding the data set and importing it into the "Neural Designer" for the training process, determining the training algorithm and strategy, determining the network architecture, validating and testing the model, and developing a GUI for the model. Ninety-two models were developed to reach the optimum network structure; the selected model was developed using the traditional trial-and-error process with the back-propagation training algorithm. The development of the model was based on the loss index of the Root Mean Squared Error (RMSE), and the model with the lowest RMSE was selected. Trial model number 65 was the best choice, with a network architecture of 2 hidden layers with 6 neurons in the first layer and 5 neurons in the second layer, a logistic transfer function for each layer, and an RMSE of 0.188 corresponding to a relative percentage error of 3.05%. The model was then tested on 6 projects that had not been introduced to the network before, to determine the accuracy of the developed model. The test predicted 5 out of 6 projects within the acceptable range of the model and only 1 outside it, which gives an accuracy of 84% and is considered a very accurate prediction. To provide a user-friendly interface, the model function was exported to a Python script and used to develop a GUI, so that the model can be used by anyone who is not familiar with the "Neural Designer" software.
It was concluded that establishing a database of finished and ongoing projects is recommended, so that it can be used in developing any decision-making model and the full benefit can be obtained from the lessons learned. Future researchers may also focus on a specific aspect of the construction industry and gather all the possible factors affecting it to determine the governing relation between them. This focus will enhance the accuracy of the models, as the data would be specific and directly related to the target output.
In summary, the study showed that site overhead costs for commercial construction projects in Egypt are affected by six major factors: the class of the contracting company, project location, project direct cost, project duration, contract type, and ownership type of the contracting company. The study also demonstrates the benefits of using artificial intelligence and neural networks in estimating site overhead costs; the main advantages of the proposed model for predicting the site overhead percentage for commercial construction projects in Egypt are its simplicity, calculation speed, and accuracy. Testing showed the proposed model to be very accurate, with an accuracy of 84%. The accuracy of the model depends on the amount of learning data and on how accurate these data are; the data should cover many cases with different situations so that the model can build a relation between the different inputs and their effect on the targeted output. Determining a suitable learning algorithm and activation function also plays an important role in developing an accurate model. Testing and validation are essential to ensure the prediction accuracy of the model; this process should be done using data with known inputs and outputs that were not previously introduced during training.
The model is accurate and simple to use, and it shortens the time needed to estimate such items. That will enhance contractors' ability to accurately predict their costs, and so maximize their profit and their competitive advantage. It is recommended that construction companies establish a database of finished and ongoing projects to be used as historical data in developing any decision-making model and to get the full benefit from the lessons learned. A graphical user interface should be developed for any further neural network application so that the estimator can use it without any previous knowledge of neural networks or how they work.

Figure 1. Factors affecting site overhead percentage

Figure 3. Project data analysis according to project direct cost

Figure 5. Project data analysis according to project location

Figure 7. Project data analysis according to ownership type

Figure 9. Steps for designing the neural network

Figure 10. Results of first run package of models

Figure 12. Results of second run package of models

Figure 13. Results of third run package of models

Figure 14. Results of fourth run package of models

Table 1. Site overhead factors concluded from literature review
Columns: Reference; Factor (Project Location, Project Size, Project Duration, Project Complexity, Payment Schedule, Contract Type, Tendering Method, Class of Company)
(S, A, & M, Project overhead costs in Saudi Arabia, 1999) 1999