Threat Attribution and Reasoning for Industrial Control System Asset

Due to the widespread use of the industrial internet of things, the industrial control system has steadily transformed into an intelligent and informational one. To increase the industrial control system’s security, based on industrial control system assets, this paper provides a method of threat modeling, attributing, and reasoning. First, this method characterizes the asset threat of an industrial control system by constructing an asset security ontology based on the asset structure. Second, this approach makes use of machine learning to identify assets and attribute the attacker’s attack path. Subsequently, inference rules are devised to replicate the attacker’s attack path, thereby reducing the response time of security personnel to threats and strengthening the semantic relationship between asset security within industrial control systems. Finally, the process is used in the simulation environment and real case scenario based on the power grid, where the assets and attacks are mapped. The actual attack path is deduced, and it demonstrates the approach’s effectiveness.


INTRODUCTION
With the popularization of the industrial Internet of Things and the development of industrial network intelligence (Tsuchiya et al., 2018), the operation and production modes of traditional industries, such as key manufacturing (Chen, 2020), the chemical industry, and electric power (Alaba et al., 2017), are gradually becoming more intelligent and information-driven (Sasaki et al., 2022). An Industrial Control System (ICS) is an asset control system used in industrial manufacturing that integrates computer equipment and industrial process control components. The ICS breaks down the isolation between traditional industry and external access (Kumar et al., 2022). Traditional industry did not initially treat security, especially system security, as a main design criterion (Mi et al., 2021). As ICS networking and information technology develop (Cruz et al., 2016), many systems that once relied on network isolation for protection are increasingly being connected to the network, which risks exposing ICS security vulnerabilities to hackers (Babu et al., 2017) and may cause severe economic losses and negative social impact. Threats to asset security in ICS increase along with the level of asset complexity. ICS is involved in almost all aspects of industrial production (AlMedires et al., 2021), and any asset issue could affect the ability of manufacturing and production businesses to continue operations (Zhang et al., 2021), thus creating risks that are out of control. Therefore, how to deal with hacker behavior and how to attribute the source of attacks are open difficulties in current research. Because of the natural asymmetry between attack and defense (Su et al., 2022), we must understand the asset types and their functions in ICS and take all potential threats and attacks into account, so as to judge the impact of an attack on the ICS, infer the attacker's attack path, and ultimately anticipate and respond to attacks proactively.
Researchers mainly use three ways to address ICS security: intrusion detection, security assessment, and system configuration. Intrusion detection aims at prevention by detecting network attacks before they succeed. Bhamare et al. (2020) investigate the applicability of machine learning for anomaly and intrusion detection in ICS but do not take into account the impact on the entire ICS when it is attacked. Security assessment focuses on evaluating and prioritizing system vulnerabilities in order to secure the system. Qassim et al. (2019) examine the entire network system and identify a vulnerability assessment methodology for ICS, which ensures system security only in terms of vulnerabilities. System configuration focuses on configuring the system for security. AlgoSec (2018) focuses on evaluating cybersecurity policies related to cloud access and implementing them where necessary; this approach concentrates on local security policies. None of these three approaches considers the impact of a cyberattack on the ICS as a whole or the diversity of system impacts after an attack.
In the ICS, the ever-changing ecological environment (Zhang et al., 2019) lets attackers operate with ease. For example, manufacturers often update their software systems for user convenience and better human-computer interaction, but these updates may introduce new vulnerabilities (Knapp et al., 2014), especially in systems whose initial design lacked security considerations (Kriaa et al., 2015). Moreover, attackers' methods and routes are constantly updated, while defenders cannot keep abreast of the latest attack technology and vulnerability information. Therefore, simple intrusion detection, attack attribution, and attack prediction cannot fully analyze attack behavior. We need a new method that detects and analyzes the complex ecological environment of the ICS in time and improves our knowledge of threats and attacks.
Considering the above issues, this paper proposes an ICS threat attribution and reasoning method. Because of the importance of assets in the ICS (Li et al., 2017), this method uses the Purdue model, MITRE ATT&CK, and related frameworks to describe the assets, and divides them into several asset types according to the actual situation of the ICS, thus constructing the ICS asset types. Then, we use machine learning to analyze the power system attack data set (Koay et al., 2022), which attributes the source of the related attack threats and achieves good results. In this way, it can be learned which part of the ICS has been attacked. Finally, this paper simulates a scenario from the power system attack data set and the real scenario of the attacks on the Ukrainian power grid (Sullivan et al., 2017). Through reasoning rules, the method automatically adds the impact on the ICS, or the impact that the attack will cause, and finally maps the result to show the attacker's attack path.
This paper mainly makes the following contributions:
1. According to the actual situation of ICS, a threat model is constructed that takes detection, threat, asset, and reality into consideration. It describes the ICS from the perspective of assets, combines the Purdue model with the actual situation of ICS, puts forward a new concept of asset architecture, constructs an ontology model applicable to security, and designs six kinds of common inference rules so that the system state can be reasoned about automatically.
2. The problem of uneven data distribution in the power system attack data set is solved, and attribution analysis of the power system attack data using machine learning detects the anomalies in the power system after an attack. In comparative experiments, better results are obtained, and the first component affected by the attack is identified.
3. Reasoning rules are used to analyze the scenarios of the power system attack data set and the case of the Ukrainian power grid hacking. By reasoning about the attacker's behavior, new relationships are added automatically to obtain the attacker's attack path, which is finally mapped so that a comprehensive picture of the ICS is presented to the relevant personnel.
The rest of this paper is organized as follows. Section 1 introduces the pertinent context and key contributions; Section 2 introduces related work; Section 3 proposes a new ICS asset threat model; Section 4 attributes the source of ICS attacks; Section 5 develops the asset security ontology for industrial control systems and provides the pertinent reasoning rules; Section 6 analyzes the attribution results, verifies the effectiveness of the method by combining a scenario from the power system attack data set with a real case, and maps the result to the knowledge map. Finally, we summarize the entire paper and propose some future work. Figure 1 shows the overall process of this article.

RELATED WORK
At present, the field of ICS has drawn the attention of researchers, and preliminary work on ICS security has been done. Kumar et al. (2022) used the attack tree as a common language and modelled three prominent APT attacks on ICS. The attack tree modeling language was used to organize and systematically characterize each attack, and the method was then verified on an attack scenario for an industrial oil pipeline. Mou et al. (2020) used the knowledge map to construct a map of process manufacturing in the ICS to ensure the safety of the assets. This method considered only the internal relationships of the assets and did not consider how external factors affect those relationships. Samanis et al. (2022) evaluated 28 indicators in 18 free ICS asset scanning programs, considering the proprietary protocols in ICS. In summary, security experts have conducted exploratory research on a number of ICS-related topics, but no specialized threat modeling technique exists for ICS security. The stability of the entire environment is determined by the ICS assets, yet the relationship between assets and security in ICS was not taken into account in the above research. Therefore, this research proposes a novel ICS threat modeling scheme from the viewpoint of ICS assets. This framework can give security workers broad security awareness and can effectively express the relationships among assets, threats, and realities.
There are two ways to handle network attacks: one is simply to detect the attack, and the other is to attribute the source of the attack. Network attack detection can only identify the status of the current environment, such as normal status or attack status, and cannot obtain more information. Mokhtari et al. (2021) framed intrusion detection in industrial control systems as the detection of abnormal activities. Based on SCADA measurement data, they proposed the MIDS intrusion detection system and built a hardware-in-the-loop test platform to detect abnormal activities. Hurley et al. (2012) proposed an anomaly detection technique based on state recognition, which identified the normal and critical states of the system through a data-driven clustering method and detected attacks in ICS. Network attack attribution is mainly divided into four levels: host attribution, control host attribution, attacker attribution, and attack organization attribution. These four levels can be summarized into two types: component attribution and organization attribution. Host attribution and control host attribution are both component attribution, which targets specific components in the attacked environment. Organization attribution mainly refers to attributing information about the attacker, such as the attacker's identity, organization, and region. Huang et al. (2021) extracted relevant information from the clue data and threat intelligence of network attack events, built an attribution relationship graph according to a network attack event attribution ontology, learned the relevant paths through a graph embedding algorithm, and finally realized organization attribution through a classifier. Li et al. (2021) trained a SMOTE-RF multi-classification model, which copes better with multi-class problems on imbalanced data. This method obtained the behavior characteristics of APTs from devices in the Internet of Things and used real dynamic data to trace the organization behind an APT attack. Jahromi et al. (2021) proposed an attack detection and traceback framework for ICS, which used a decision tree and a representation learning model to detect attacks in ICS and a deep ensemble neural network to trace them. In the ICS, due to the diversity of components in the environment and the complexity of their interconnection (Ooi et al., 2023), component attribution of network attacks is difficult. At the same time, the development of the Internet of Things (IoT), smart devices, and other technologies gives attackers more attack targets and more springboards (Zhang et al., 2013), which makes component attribution even harder. This paper classifies the power system attack data set with machine learning methods to attribute attacks to components in the environment, find the attack source, and thereby realize component attribution.
By describing information items, an ontology is used to exchange domain knowledge and enhance the semantic links between information objects. Ontology modeling has recently made some progress in the security field. Kotzanikolaou et al. (2022) created a network security ontology with two layers of data, covering the threats and the physical environment, that can be combined with risk assessment. Rastogi et al. (2020) created a malware ontology called MALONT and used it to build a malware knowledge map and extract hidden information from it. Merah et al. (2021) proposed a security ontology for risk monitoring using Cyber Threat Intelligence (CTI), highlighting the interdependence of risk concepts that could expand the use of Structured Threat Information Expression (STIX). Zhang et al. (2021) proposed RIoTSCO, an Internet of Things security ontology integrating multi-source heterogeneous data, and built a million-level heterogeneous database by combining intelligence sources. However, none of the aforementioned studies considered the specific circumstances of ICS security. To give security personnel a comprehensive understanding of the internal workings of ICS assets, this paper proposes a security ontology built on ICS assets that uses inference rules to strengthen the relationships between assets and assets, assets and threats, and threats and consequences.

Basic Definition
From the standpoint of assets, this paper defines the following concepts for ICS:
Definition 1 - Asset: Assets are the components in ICS that have a direct or indirect impact on industrial production.
Definition 2 - Component: Components include not only the physical components in ICS but also the protocols and systems used in ICS; that is, all computer equipment and industrial control process components in ICS are components.
Definition 3 - Effect: In this article, effect has two meanings: the relationship between one asset and another, and the change in an asset after an attack.
Definition 4 - Asset type: A set of assets with similar characteristics or functions belongs to the same asset type.
Definition 5 - Attack source: In this article, the attack source is the first component affected by the attacker. That is, if the attacker uses Component A as a springboard to attack other components in ICS, Component A is the attack source of this attack.
Definition 6 - Attribution: In this article, attribution is the determination of the attack source after an attack.
Let the asset security domain in ICS be $A_{ICS\_ASSET}$, the components in the ICS be $M_{ICS\_ASSET}$, the parts be $PA_{ICS\_ASSET}$, the protocols be $PO_{ICS\_ASSET}$, the systems be $S_{ICS\_ASSET}$, and the asset category be $AC_{ICS\_ASSET}$. Then:
$PA_{ICS\_ASSET}, PO_{ICS\_ASSET}, S_{ICS\_ASSET} \in M_{ICS\_ASSET}$ (1)
The above shows the relationship between some terms in this article. Assets include all components, and an asset category is a subset of the assets with similar functions. Both the visible physical components and the virtual network protocols or systems are components.

ICS Asset Structure
The Purdue model divides the ICS into five layers according to the interdependencies among the components of the industrial control system: the actual production process layer, the production process perception layer, the production process supervision layer, the manufacturing operation management layer, and the business plan and logistics layer. Specifically, the actual production process layer describes the actual manufacturing or production process; the production process perception layer describes the operation and perception of the actual production process; the production process supervision layer is responsible for monitoring and managing the production process; the manufacturing operation management layer covers the business process of producing target products; and the business plan and logistics layer represents the relevant activities of a manufacturing organization. The first three layers of the Purdue model thus address the actual production of the ICS and focus on the production process, while the latter two layers describe the production plan of the ICS, where human factors dominate. We study the assets in the production process of the ICS, which correspond to the first three layers of the Purdue model.
The biggest difference between an ICS and an ordinary IT network lies in the protocols. Because ICS protocols were designed for physically isolated environments, security issues were not considered at design time. As intelligent ICS become increasingly connected to the Internet, physical isolation no longer protects them from hackers. Fang et al. (2022) thoroughly studied the security of ICS protocols, concentrating on protocol vulnerabilities. They explained the significance of the ICS protocol from three perspectives: the protocol's vulnerability, the attacker's attack strategy, and the attack's consequences. In the incident where the Ukrainian power grid was compromised, the attacker used phishing mail to plant an SSH back door, intruded into the servers of the Ukrainian power grid, and caused a significant power outage. In this event, the attacker obtained authority over the Ukrainian power grid by exploiting the SSH protocol and then carried out the attack. The Purdue model cannot adequately describe the important role of the protocol in this event. Therefore, this paper builds the ICS asset structure by treating the protocol as part of the ICS assets.
Our research shows that attackers often enter the ICS through the monitoring and management system and then affect the underlying equipment of the ICS for their own purposes. Stuxnet (Masood et al., 2021), for example, combined a malicious code attack with zero-day vulnerabilities and destroyed centrifuges at the field equipment level to halt Iran's progress toward nuclear weapons. In this event, the attacker obtained some permissions on the computer through malicious code and then carried out the main attack on the field device layer. To sum up, we divide the assets of ICS into four parts: production type assets $PRD_{ICS}$, perception type assets $AD_{ICS}$, supervision type assets $SD_{ICS}$, and protocol type assets $PTD_{ICS}$. Figure 2 shows the difference between the Purdue model and the asset division in this paper. Production type assets represent the actual production process of the ICS; perception type assets describe the perception of the actual production process; supervision type assets cover the monitoring and management of the production process in the industrial control system; and protocol type assets represent the protocols in ICS. Among the four types, the asset grade from high to low is: supervision type assets, perception type assets, and production type assets. The grade of protocol type assets cannot be determined because they exist within the other three types. This division gives a good description of ICS assets.

Figure 2. Comparison between the Purdue model and the asset structure of this paper
The ICS asset, $A_{ICS}$, is expressed as follows: $\{PRD_{ICS}, AD_{ICS}, SD_{ICS}, PTD_{ICS}\} \in A_{ICS}$ (4)
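
The four asset types and their grade ordering can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `AssetType`, `GRADE`, and `higher_grade`, and the numeric grade values, are hypothetical and chosen only to mirror the ordering described above.

```python
from enum import Enum
from typing import Optional


class AssetType(Enum):
    """The four asset types of the ICS asset structure (names are illustrative)."""
    PRODUCTION = "PRD_ICS"
    PERCEPTION = "AD_ICS"
    SUPERVISION = "SD_ICS"
    PROTOCOL = "PTD_ICS"


# Assumed numeric grades encoding: supervision > perception > production.
# Protocol assets live inside the other three types, so they get no grade.
GRADE = {
    AssetType.SUPERVISION: 3,
    AssetType.PERCEPTION: 2,
    AssetType.PRODUCTION: 1,
}


def higher_grade(a: AssetType, b: AssetType) -> Optional[AssetType]:
    """Return the higher-grade type, or None when a grade is undetermined."""
    if a not in GRADE or b not in GRADE:
        return None  # protocol type assets cannot be ranked
    return a if GRADE[a] >= GRADE[b] else b
```

Returning `None` for protocol type assets keeps the "cannot be determined" case explicit instead of forcing an arbitrary rank.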

ICS Asset Threat Model
Based on the division of the ICS asset structure, we propose an ICS asset threat model, shown in Figure 3, which presents the primary framework for threats to ICS assets. In this paper, threats and consequences are added to the four parts of ICS assets. Protocol type assets, supervision type assets, perception type assets, and production type assets are collectively referred to as assets. Threat describes the threat to the ICS, including the attacker's organization, the tools, and the time or means of the attack. Consequence indicates the possible results for the ICS assets after an attack. Attackers need to research the information in the assets, plan one or more attack strategies that can lead to specific effects, and finally act on the relevant assets. In the same way, the consequences for the affected assets also react on the assets. Through this analysis of ICS asset security, we can gain a more comprehensive understanding of the internal ecological environment of ICS, as well as the security circumstances surrounding all ICS assets.

Power System Attack Data Set
The analysis and model construction in the previous section allow a detailed description of asset attacks in ICS. However, given the unique structure and assets of the ICS environment, attributing the source of attacks, that is, detecting where an attack originates, is also a very important step. In ICS, assets are often interrelated. Once one asset is threatened, the entire asset environment is affected and may even collapse. Therefore, in ICS, we need not only to detect attacks but also to attribute their source. In this way, we can track slight changes to components in the ICS assets and maintain situational awareness of the ICS in real time, from individual points to the whole.
In response to the above problems, this paper uses the power system attack data set collected by Mississippi State University and Oak Ridge National Laboratory (2014) to verify the proposed method. The data set is divided into three modules: the dual metadata set, the triple metadata set, and the multiple metadata set. The dual metadata set distinguishes attack events from natural events; the triple metadata set covers three event types: attack events, natural events, and non-events; and the multiple metadata set labels 37 scenarios that cover all aspects of the threat to the power system. Threats are classified into natural threats and attack threats. For example, short-circuit faults and line maintenance are natural threats, that is, they involve no malicious human factor. Scenarios such as remote trip command injection and relay setting change are caused by deliberate human operation and are attack threats. Table 1 gives a statistical comparison of these data sets.
In the data sets shown in Table 1, especially the multiple metadata set, each event is labeled, and each event is further subdivided by the component on which it occurs, which yields the 37 scenarios. For example, if the remote trip command injection attack is used against component A and component B separately, and the relay setting change is used against component C, these count as three scenarios. We learn and analyze these scenarios via machine learning to sense an upcoming attack, attribute it, and find the attack source. We then use the reasoning method described in the next section to obtain a global view of ICS assets; that is, when a component is attacked, its impact on the entire ICS asset environment can be understood in detail.
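
The scenario labeling described above, where each distinct pair of event type and affected component is one scenario, can be sketched as follows. This is only an illustration of the counting rule; the function name `label_scenarios` and the event and component names are hypothetical and do not come from the data set.

```python
from itertools import count


def label_scenarios(events):
    """Assign one scenario id to each distinct (event type, component) pair.

    `events` is an iterable of (event_type, component) tuples.
    Returns a dict mapping each distinct pair to an integer id starting at 1.
    """
    ids = {}
    next_id = count(1)
    for event_type, component in events:
        key = (event_type, component)
        if key not in ids:
            ids[key] = next(next_id)
    return ids


# Illustrative events: two attack types spread over three components.
events = [
    ("remote_trip_command_injection", "A"),
    ("remote_trip_command_injection", "B"),
    ("relay_setting_change", "C"),
    ("remote_trip_command_injection", "A"),  # repeat of the first scenario
]
scenarios = label_scenarios(events)
```

Here the repeated event does not create a new scenario, so `scenarios` holds three entries, matching the example in the text.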
The environment of the power system attack data set collected by Mississippi State University and Oak Ridge National Laboratory is shown in Figure 4, where AC represents the power supply, BR represents a circuit breaker, and R represents the intelligent device that controls a BR. Each R contains a relay, and Rn controls BRn. The four intelligent devices are controlled by the substation. The substation data can be handed to the switch by the PDC (Phasor Data Concentrator) or exchanged with the switch directly. The switch is managed by the control panel. The openPDC obtains data from the HMI and feeds it back to the control panel. The control panel generates system logs. The entire environment is monitored by the Snort system, which can also be regarded as a defense device in the environment. The relays are controlled through the Modbus/TCP protocol, and the PDC mainly uses the IEEE C37.118 (2013) protocol to transmit data. This environment imitates a simple power system. The data set is 128-dimensional: it is built by measuring the current, voltage, voltage phase angle, radio wave, and other data of the above parts, simulating attacks by destroying a component, and recording the damaged component together with the measurement data. With this data set, we can use machine learning to classify the data, infer the impact of an attack on a component from the fluctuation of the measured values, and thereby complete the attribution.
Figure 5 shows the raw data of the power system attack data set. For example, R1-PM7:V represents the voltage phase magnitude of PM7 measured by R1; R1-PA8:VH represents the voltage phase angle of PA8 measured by R1; R1-PA10:IH represents the current phase angle of PA10 measured by R1; and R1-PM10:I represents the current phase magnitude of PM10 measured by R1. The data set contains a total of 24,000 data samples.
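
The column naming scheme illustrated by these examples can be decoded mechanically. The sketch below is an assumption-laden illustration: `parse_column`, `Measurement`, and `QUANTITY` are hypothetical names, the pattern covers only the four column forms quoted above (the data set may contain other column types), and reading the `:I` suffix as current phase magnitude is inferred from the suffix naming rather than taken from the data set documentation.

```python
import re
from typing import NamedTuple

# Assumed suffix meanings, inferred from the four example column names.
QUANTITY = {
    "V": "voltage phase magnitude",
    "VH": "voltage phase angle",
    "I": "current phase magnitude",
    "IH": "current phase angle",
}

# Relay id, channel (PM = phase magnitude, PA = phase angle), quantity suffix.
COLUMN_RE = re.compile(r"^(R\d+)-(P[MA]\d+):(VH|IH|V|I)$")


class Measurement(NamedTuple):
    relay: str
    channel: str
    quantity: str


def parse_column(name: str) -> Measurement:
    """Split a column name such as 'R1-PM7:V' into relay, channel, quantity."""
    match = COLUMN_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised column name: {name!r}")
    relay, channel, suffix = match.groups()
    return Measurement(relay, channel, QUANTITY[suffix])
```

For instance, `parse_column("R1-PA10:IH")` yields relay `R1`, channel `PA10`, and the quantity "current phase angle". Raising on unknown names makes it obvious when a column falls outside the assumed pattern.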

Related Models
This paper mainly compares five machine learning models: SVM (Jakkula, 2006), decision tree (Song & Ying, 2015), random forest (Cutler et al., 2012), XGBoost (Chen et al., 2015), and KNN (Guo et al., 2003) to classify the above data sets so as to verify the accuracy and effectiveness of this experiment.
SVM is a supervised learning classifier for binary or multi-class classification of data. Its goal is to find the hyperplane that maximizes the margin between the decision boundary and the training samples, which is formulated as a convex quadratic programming problem. SVM is well suited to small-sample classification problems, especially when there are no more than 10,000 samples. Through the kernel function method, SVM can also handle the curse of dimensionality and nonlinearly separable classification problems. Because the solution is the maximum-margin hyperplane, the computational complexity of SVM depends on the number of support vectors rather than on the dimension of the training samples.
Decision tree is a supervised learning model that reaches a decision by testing a sequence of conditions. Each internal node represents a feature, and each edge represents the branch taken for a value of that feature. Following these feature tests leads to a leaf node, which is the classification result. Apart from not supporting missing values, the decision tree requires no data preprocessing and can handle both numerical and categorical variables. However, the decision tree is not a stable model: if the sample data is not balanced before training, it will produce a biased tree.
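
For reference, the convex quadratic program mentioned for SVM can be written in its standard hard-margin form. This is the textbook formulation, given here as an illustration; the paper does not state which SVM variant (for example soft-margin or kernelized) was used in the experiments. The symbols $w$, $b$, $x_i$, $y_i$, and $n$ are introduced here and are not the paper's notation.

```latex
\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

Here $x_i$ is a training sample, $y_i \in \{-1, +1\}$ is its label, and the margin equals $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.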
Random forest is an ensemble learning algorithm that combines several decision trees. The trees vote on the classification, and the output is the category with the most votes. Through random feature selection and random sample selection, random forest addresses the weak generalization ability of a single decision tree. Training can be highly parallelized and is faster on large sample data. Because features are selected randomly, random forest is not sensitive to some missing features. However, on noisy data sets, random forest can easily over-fit.
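
The voting step of random forest can be sketched in a few lines. This shows only how per-tree predictions are combined, not how the trees are trained; `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    if not tree_predictions:
        raise ValueError("at least one prediction is required")
    label, _ = Counter(tree_predictions).most_common(1)[0]
    return label


# Illustrative predictions from five trees for one sample.
votes = ["attack", "natural", "attack", "attack", "natural"]
winner = majority_vote(votes)
```

With three of five trees voting "attack", `winner` is "attack".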
XGBoost is an algorithm toolkit based on the Boosting framework: an optimized distributed gradient boosting library that builds a strong classifier by integrating many weak classifiers. The algorithm repeatedly uses feature splitting to grow additional trees. Each additional tree learns a new function that fits the residual of the previous prediction, and the scores of all trained trees are summed to obtain the final prediction. XGBoost uses a Taylor expansion of the loss function to speed up the optimization, so the calculation of leaf splits can be optimized using only quantities computed from the input data. In essence, separating the choice of loss function from the optimization of the model increases the applicability of XGBoost.
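
The Taylor expansion mentioned above can be made concrete. The following is the second-order approximation of the objective at boosting round $t$ as it is usually presented for XGBoost; the notation ($l$, $f_t$, $g_i$, $h_i$, $\Omega$) is introduced here for illustration and is not taken from this paper.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
  + g_i\, f_t(x_i) + \frac{1}{2}\, h_i\, f_t^{2}(x_i) \right] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^{2} l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}}
```

Because the new tree $f_t$ enters only through $g_i$ and $h_i$, any twice-differentiable loss $l$ can be plugged in without changing the tree-growing procedure, which is the separation of loss function and optimization described in the text.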
KNN calculates the distance from the test sample to each object in the training data, sorts the training data by distance, and selects the K training samples closest to the test sample as its neighbors. Finally, it determines the category of the test sample by counting the category frequencies among the K neighbors.
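
The KNN procedure just described (compute distances, sort, take the K nearest, count labels) can be sketched with the standard library alone. This is a minimal illustration assuming Euclidean distance; `knn_predict` and the toy data are hypothetical, and the experiments in the paper may use a different distance or implementation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority label among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; Euclidean distance
    is assumed.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-dimensional data with two well-separated classes.
train = [
    ((0.0, 0.0), "normal"),
    ((0.0, 1.0), "normal"),
    ((5.0, 5.0), "attack"),
    ((6.0, 5.0), "attack"),
    ((5.0, 6.0), "attack"),
]
```

Calling `knn_predict(train, (5.5, 5.2), k=3)` returns "attack", since all three nearest neighbours carry that label.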

ICS ASSET THREAT ONTOLOGY AND REASONING
Following the attack attribution in the previous section, we can attribute the source of an attack on ICS assets once the attack has occurred. Although Section 3 established the links among the threats to ICS assets and supplied a threat model for security, handling attacks on real ICS assets remains a challenge. Through the formal description of specific domain knowledge, this section proposes the Asset Safety Ontology for Industrial Control Systems (ASOFICS), which addresses the semantic heterogeneity among ICS asset, threat, and consequence (Xu et al., 2017; Lee et al., 2017) and plays an important role in constructing the three-in-one ICS ecosystem of asset, threat, and consequence.
The ontology for ICS asset threat in this section is inspired by multiple ontologies (Zhang et al., 2021; Li et al., 2021; Rastogi et al., 2020; Merah et al., 2021). After combining some of their concepts and adjusting a number of details, the ontology here is better suited to describing assets and their environment in ICS. The ontology model discussed in this section covers the concepts necessary for ICS asset, threat, and consequence; these interrelated sources of knowledge are not covered by the referenced ontologies. This section uses the OWL language to build a unified formal description, in which concepts correspond to classes and relationships correspond to properties. In the second half of this section, the Semantic Web Rule Language (SWRL) is used to design inference rules and to reveal the implicit information in the ontology.

ICS Asset Ontology's Class and Property
The ASOFICS ontology proposed in this section has six top-level classes: production type assets, perception type assets, supervision type assets, protocol type assets, threat, and consequence. Figure 6 illustrates the connections between the six top-level classes.
The attack target of the attacker, the interplay of various assets, threats, and consequences, among other things, are described in this article through properties, as illustrated in Table 2.
As indicated in Table 2, this research presents 13 relationship properties in three groups. The first group concerns the ICS asset structure and specifies the properties between assets in that structure. ActOn and FeedbackTo express 'acts on' and 'feeds back to', respectively. For example, the HMI acts on production type assets, and perception type assets often feed their perception information back to the HMI. This is how we define the relationships between ICS asset components. BelongTo states that one component is contained in another. For instance, relays in ICS assets often reside in intelligent devices, so a relay and its intelligent device are linked by the BelongTo property. Control refers to the control and management function of supervision type assets over other assets, such as the HMI's control of production type assets. Measure expresses measurement: the perception type assets contain sensors that measure a range of values in the ICS manufacturing process, such as thermometers and flowmeters in chemical production or the current value read by an ammeter in a power plant. HaveThreat indicates that an asset is subject to a threat.
The second group describes the properties of attackers, including Attack, AttackAffect, Infiltrate, MayControl, and UseAttack. Specifically, Attack describes an attacker's attack on an asset. AttackAffect describes the impact of an attack on other associated components. Infiltrate expresses intrusion: rather than attacking a device directly, the attacker intrudes into it, which represents an indirect relationship and an implicit attack that bypasses the alarm devices (defense measures). MayControl means that the attacker has the potential to take control of both high-level and low-level resources. UseAttack refers to the use of a known attack, so it records the tools or attack techniques the attacker is using. All attacks can be described by Attack, whereas UseAttack relates only to known attack techniques. Both of these properties define the attack directly. In contrast, AttackAffect, Infiltrate, and MayControl describe the indirect impact caused by the attack.
The last group contains the properties of the consequence layer, MayCause and AssetAffect. MayCause describes the possible consequences caused by attackers, and AssetAffect refers to the consequences caused by certain assets.
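The three property groups can be captured in a small data structure, which makes it easy to check that a statement about assets only uses one of the 13 properties. The following is a minimal sketch, not the OWL model itself: the group keys, the `Triple` type, the helper names, and the individual names in the examples are assumptions introduced here for illustration.

```python
from typing import NamedTuple

# The 13 relationship properties named in the text, keyed by group.
# The group keys are labels chosen for this sketch.
PROPERTY_GROUPS = {
    "asset_structure": ("ActOn", "FeedbackTo", "BelongTo", "Control", "Measure", "HaveThreat"),
    "attacker": ("Attack", "AttackAffect", "Infiltrate", "MayControl", "UseAttack"),
    "consequence": ("MayCause", "AssetAffect"),
}


class Triple(NamedTuple):
    """A (subject, property, object) statement about ICS assets."""
    subject: str
    prop: str
    obj: str


def group_of(prop: str) -> str:
    """Return the group a property belongs to, or raise for unknown names."""
    for group, props in PROPERTY_GROUPS.items():
        if prop in props:
            return group
    raise ValueError(f"unknown property: {prop}")


def make_triple(subject: str, prop: str, obj: str) -> Triple:
    """Build a triple, rejecting properties outside the 13 listed above."""
    group_of(prop)  # validation only; raises ValueError for unknown names
    return Triple(subject, prop, obj)


# Examples taken from the text; the individual names are illustrative.
EXAMPLES = [
    make_triple("HMI", "ActOn", "ProductionAsset"),
    make_triple("PerceptionAsset", "FeedbackTo", "HMI"),
    make_triple("Relay", "BelongTo", "IntelligentDevice"),
]
```

Keeping the properties in one table means a misspelled property name fails early instead of silently producing a statement that no rule can match.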

Reasoning Rule Design
The inference rule connection strategy stated below serves the description of both the ontology model and the actual assets in ICS. Once the specified connection strategy is met, a new relationship can be formed automatically. Here $R_{ICS\text{-}asset}$ is the inference rule set in the ICS asset security domain, and $KI_{ICS\text{-}asset}$ is the knowledge base in the ICS security domain. R represents a specified inference rule. Only if the condition of the inference rule (the left half) is true can the conclusion on the right be deduced, thereby adding a new relationship.
The relationship between conditions (C) and new relationships (N) is described using SWRL. In the logical rules, 'and' and 'or' are denoted by '∩' and '|', respectively. Three notations apply to C and N: C(a) denotes that a is a member of class C; P(x, y) denotes that property P holds between x and y; and x and y each stand for an instance or a numerical value.
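As an illustration, and assuming the usual SWRL antecedent-consequent shape, a rule with conditions $C_1, \ldots, C_k$ and new relationship $N$ can be written as in the first line below. The index $k$ and the variable names $?x$, $?a$, $?b$ are introduced here for illustration only. The second line instantiates the form with the Attack, ActOn, and AttackAffect properties under one plausible reading, namely that an attack on a component propagates to the components it acts on.

$$C_1 \cap C_2 \cap \cdots \cap C_k \rightarrow N$$

$$\mathrm{Attack}(?x, ?a) \cap \mathrm{ActOn}(?a, ?b) \rightarrow \mathrm{AttackAffect}(?x, ?b)$$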
To facilitate the following rule design, we denote the set of classes in the area of ICS asset security as $Class_{ICS\text{-}asset}$, the set of properties in the area of ICS asset security as $Property_{ICS\text{-}asset}$, and the set of instances in the area of ICS asset security as $Individual_{ICS\text{-}asset}$:

$$Class_{ICS\text{-}asset} = \{Class_1, Class_2, \ldots, Class_n\}$$

$$Property_{ICS\text{-}asset} = \{Property_1, Property_2, \ldots, Property_n\}$$

$Class_n$ mainly indicates that there are n classes in $Class_{ICS\text{-}asset}$.
In the area of ICS asset security, we have developed six different types of reasoning rules based on the aforementioned explanation, where '⋀' means 'and', as shown in Table 3.
Rule 1: If component a acts on component b and the two are connected, we can assume that component b will be impacted by an attack on component a. Thus, if the attacker attacks component a or the attack has an influence on component a, then component b is affected by the attack. For instance, if a server is attacked and the server is connected to the communication system, the attack on the server will have an impact on the communication system.

Rule 2: If an attack has an impact on an asset and that asset gives rise to consequences, we can deduce that the attacker may cause those consequences. For instance, a server is attacked by an attacker; the server and the communication system are connected, so the attacker can affect the communication system. Meanwhile, the communication system assets can lead to "loss of connection" and other real events, thus the attack that affects the switch could result in "loss of connection" events.

Rule 3: Component a may influence component b, and information from component a is accessible to component b. If component a belongs to the regulator type assets (or protocol type assets), the attacker may infiltrate component b through component a during an attack on component a. Since the majority of production type assets and perception type assets are manually operated and lack intelligence, attackers will typically avoid operating on the components of these two types directly in favor of attacking regulator type assets and protocol type assets, which will have an indirect impact on production type assets and perception type assets. For instance, the two-way communication between the scheduling workstation and SSH enables the scheduling workstation to influence SSH and SSH to feed information back to the scheduling workstation. As a result, when an attacker attacks the scheduling workstation, the likelihood that the attacker will break into SSH through it rises.

Rule 4: Both component a and component c have control over component b. If the attack affects both component a and component c, it can be assumed that the attacker may want to seize control of component b. For instance, the attacker wants to obtain the management authority of the server through the control center and then bypasses snort to intrude into the server. Therefore, the purpose of the attacker is to obtain the corresponding authority to control the server, so as to facilitate the subsequent attack.

Rule 5: When an attack spreads from component a to component b, and component b belongs to and affects component c, we can assume that component c is also a target of the attack through b. For instance, SSH is used for remote calls; therefore, if an attack manages to bypass the backdoor, the attack will probably have an effect on the communication server as well.

Rule 6: An attacker uses a tool of the threat type. If the tool attacks (or affects) component a, and the assets of component a result in a consequence event, we can assume that the attacker may cause that consequence. For instance, we are unable to identify the attack trace if a hacker uses killdisk to hide the trace (log file).
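The way such rules add new relationships can be sketched as a small forward-chaining loop over (subject, property, object) triples. This is a minimal illustration in plain Python rather than SWRL, covering only Rule 1 and Rule 2; the exact rule bodies are one reading of the prose above, and the function name `infer` and the individual names are assumptions made for this example.

```python
def infer(facts):
    """Forward-chain Rule 1 and Rule 2 until no new triple is derived.

    facts: iterable of (subject, property, object) triples.
    Returns the set of original and derived triples.
    """
    known = set(facts)
    while True:
        derived = set()
        for attacker, prop, a in known:
            if prop in ("Attack", "AttackAffect"):
                # Rule 1 (as read here): the attack reaches a, and ActOn(a, b)
                # holds, so AttackAffect(attacker, b) is added.
                for s, q, b in known:
                    if q == "ActOn" and s == a:
                        derived.add((attacker, "AttackAffect", b))
            if prop == "AttackAffect":
                # Rule 2 (as read here): the attack affects a, and
                # AssetAffect(a, c) holds, so MayCause(attacker, c) is added.
                for s, q, c in known:
                    if q == "AssetAffect" and s == a:
                        derived.add((attacker, "MayCause", c))
        if derived <= known:  # fixed point: nothing new was derived
            return known
        known |= derived


# The server example from Rule 1 and Rule 2; names are illustrative.
FACTS = {
    ("attacker", "Attack", "server"),
    ("server", "ActOn", "communication_system"),
    ("communication_system", "AssetAffect", "loss_of_connection"),
}
RESULT = infer(FACTS)
```

Running the loop to a fixed point matters because Rule 2 can only fire on a relationship that Rule 1 produced in an earlier pass, which mirrors how chained rules replay an attack path step by step.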

EXAMPLE AND EVALUATION
This section is mainly composed of three parts. The first part attributes the source of threat attacks on ICS, using five machine learning classification models to obtain the attack source and to verify the effectiveness and accuracy of the experiment. In the second part, the power system attack dataset is used as the background to simulate two kinds of network attacks, and the inference rules are used to conduct security inference on ICS assets. The third part simulates the case of the Ukrainian power grid intrusion, which enables us to predict the threat attack and comprehensively grasp asset security in the ICS environment.

Comparison of Results
We first need to analyze the data distribution of the dataset. Figure 7 shows the data distribution of the binary, ternary, and multivariate datasets. Because this dataset is in ARFF format, it is necessary to clean it and convert it to the more familiar CSV format. Through analysis we find that the first 127 dimensions of the data all have an impact on the final traceability results. In this dataset, there are 160,000 records for attack events and only 80,000 records for no events; the data is clearly unbalanced, so the BorderlineSMOTE function is used to deal with this imbalance. After this processing, all classes have the same proportion of samples. Finally, 20% of the data is held out as the test set for evaluating the models, which gives the following results.
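The balancing and hold-out steps can be illustrated with a small pure-Python sketch. It is not BorderlineSMOTE: it uses plain SMOTE-style interpolation between minority samples and omits the borderline sample selection step, and it assumes every class has at least two samples. The function names, the toy feature values, and the 8-to-4 class ratio (mirroring the 160,000 to 80,000 ratio above) are all assumptions made for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating a randomly chosen
    minority row toward one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining rows by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((u - v) ** 2 for u, v in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([u + t * (v - u) for u, v in zip(base, neighbour)])
    return synthetic


def balance(X, y, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = smote_like_oversample(rows, target - len(rows), seed=seed)
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def hold_out_split(X, y, test_ratio=0.2, seed=0):
    """Shuffle indices and hold out test_ratio of the rows for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = ([X[i] for i in train_idx], [y[i] for i in train_idx])
    test = ([X[i] for i in test_idx], [y[i] for i in test_idx])
    return train, test


# Toy data with a 2:1 imbalance between the two classes.
X = [[float(i), float(i % 3)] for i in range(8)] + [[10.0 + i, 5.0] for i in range(4)]
y = ["attack"] * 8 + ["no_event"] * 4
X_bal, y_bal = balance(X, y)
(train_X, train_y), (test_X, test_y) = hold_out_split(X_bal, y_bal)
```

In this sketch the whole dataset is balanced before the split, following the order described in the text; a common alternative is to oversample only the training portion so that no synthetic rows end up in the test set.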
In this section, five commonly used machine learning classification methods (SVM, random forest, decision tree, XGBoost, and KNN) are compared in attributing the attack source on the power system attack dataset from Mississippi State University and Oak Ridge National Laboratory. This paper classifies three groups of power system attack datasets. The classification performance is shown in Figure 8.
As shown in Figure 8, A stands for accuracy, P stands for precision, R stands for recall rate, and F stands for F1 value.Additionally, B stands for binary dataset, T stands for ternary dataset, and M stands for multivariate dataset.
Accuracy represents the proportion of samples in the prediction results that are correct. It can be seen from Figure 8 that the accuracy of the SVC-RBF model is relatively low, especially on multi-class problems. In contrast, the random forest algorithm outperforms the other four methods on the binary, ternary, and multivariate datasets. The specific accuracy is shown in Table 4.

Precision is commonly denoted P. This indicator shows how many of the samples predicted to be positive are truly positive. On this indicator, SVM performs better than it does on accuracy, indicating that SVM can correctly predict the true positive samples in the classification problem. However, its performance is still not as good as that of the other four models, which may be related to the fact that SVM is better suited to small-sample data. Among the other algorithms, random forest still shows the highest performance on the three datasets, and KNN is second only to random forest. The specific precision is shown in Table 5.
On recall, SVM does not perform well; that is, it fails to identify many of the actual positive samples. Random forest still shows the best performance on recall, and KNN is again second only to random forest. The specific recall is shown in Table 6.
SVM does not perform well in F1-value because of the large difference in precision and recall.The other four models are stable, as shown in Table 7.
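The four indicators discussed above can be computed directly from the confusion counts. The sketch below covers the binary case only; the function name and the toy labels are assumptions for illustration, and the ternary and multivariate datasets would additionally need an averaging scheme across classes, which is not specified here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 1 = attack event, 0 = natural event.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0]
Y_PRED = [1, 1, 0, 0, 0, 0, 0, 1]
METRICS = binary_metrics(Y_TRUE, Y_PRED)
```

Because F1 is a harmonic mean, a large gap between precision and recall pulls it toward the lower of the two, which is consistent with the weak F1 value reported for SVM.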
As shown in Table 8, Wang et al. (2019) also use the power system attack dataset; they add 16 new features by using PMU to measure the physical significance of the features, combine the features by weight voting, and achieve better results on SVM. In this paper, the unbalanced dataset is balanced through data equalization, with the comparison given in Table 8.
In the experiments shown in Table 8, SVM mainly adopts SVC-RBF classification. Because it handles large-sample data poorly, the performance of SVM in Figure 8 is unstable and its training time is too long. The other four models all perform well; however, XGBoost trains more slowly than the other three models because it is an ensemble learning method. Compared with the other four models, random forest performs well and is stable. This is because random forest is composed of a group of decision trees, which outperforms a single decision tree. Moreover, random forest can address the weak generalization ability of the decision tree through random feature selection and random sample selection.
The above analysis allows us to attribute threat attacks quantitatively. When the ICS is attacked, its traffic data can be obtained, classified, and attributed through machine learning, and the source component of the ICS attack can then be output. To sum up, we recommend using random forest, KNN, and XGBoost, in that order, to classify the source of attacks.

Scenario Simulation
Here, the above dataset scenario is imitated to verify the effectiveness of the threat attribution reasoning method for ICS assets proposed in this paper. Two attack scenarios are simulated: remote trip command injection and relay setting change, both of which are described and analyzed in detail below.

Remote Trip Command Injection
Based on the above description, we have a general idea of the dataset framework. From the scenario description of the dataset, the entities are identified and extracted, the asset structure of the power grid system is constructed, and the ICS asset threat model is used to categorize its assets. Through the attribution classification described in the previous section, we can capture the first component of the power grid control system to be attacked at the time the attack occurs. For the remote command injection attack, a single relay command injection attack on intelligent device R1 is simulated, and the outcomes are then presented. As shown in Figure 9, the PDC receives the synchronization vector data and feeds it back to openPDC through the switch. After the system is attacked, the source of the attack can be obtained in time through the above attribution method. To carry out the remote trip command injection, the attacker first needs to evade the snort defense system. After bypassing the defense alarm system, the attacker can intrude into the master control of ICS, affect the switch, and then control the substation. The attacker then injects the trip command into the relay in R1 through remote command injection, which opens the BR1 circuit breaker corresponding to R1, resulting in tripping and power failure.
The remote trip command injection attack is a compound attack that utilizes both the exploitation of remote services and command-line interface techniques. MITRE ATT&CK (2021) describes exploitation of remote services as the abuse of remote services by exploiting errors in the system itself, thus causing intrusion. The command-line interface is used by attackers to execute commands interactively with the system, thereby carrying out attacks. Pan et al. (2015) attributed the remote trip to the receiving side having received the trip command. Therefore, to complete the remote trip command injection attack, an attacker needs to evade the alarm device in the system, invade the system server to obtain the corresponding permissions, and finally achieve the attack through the trip command injection. By comparison, the reasoning rules in this paper reproduce the above logical order.

Relay Setting Change
This paper also imitates the relay setting change attack in the power system attack dataset scenario, in which the relay has two states: enabled and disabled. The relay setting change attack is an attack that changes the state of the relay.
As shown in Figure 10, to prevent the attack from being detected, the attacker needs to bypass the snort alarm device, then attack the control center to gain access, and subsequently intrude into the switch, which can also control the substation. Since the substation controls four intelligent devices, R1 to R4, a change of the substation state will affect the intelligent device R1 and eventually change the state of the relay R1r contained in R1. The relay setting change attack uses the change operating mode technique in MITRE ATT&CK. This technique changes the operation mode of the controller to obtain additional access rights. In this circumstance, the attacker first needs to evade the snort alarm system, then obtains the control authority, subsequently affects the intelligent equipment in the substation through the switch, and finally completes the setting change of the relay in the intelligent equipment. Our analysis indicates that the logic used in this work is consistent with how an actual attack would be conducted.
Figure 10 clearly demonstrates each step of the attacker's penetration into the power grid system, allowing us to foresee the attacker's potential outcomes and prepare to defend any assets that may be impacted.
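The penetration sequence of the relay setting change attack can also be replayed as a path search over a small directed graph. The sketch below uses plain breadth-first search over a hand-built graph rather than the SWRL reasoning of this paper; the node names, the edge list, and the function names are assumptions drawn from the description of Figure 10.

```python
from collections import deque

# Directed "can reach / controls / contains" edges taken from the scenario.
EDGES = {
    "attacker": ["control_center"],        # after evading the snort alarm
    "control_center": ["switch"],
    "switch": ["substation"],
    "substation": ["R1", "R2", "R3", "R4"],
    "R1": ["R1r"],                         # relay R1r is contained in R1
}


def attack_path(edges, source, target):
    """Return the shortest path from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def assets_at_risk(edges, source):
    """Return every node reachable from source, excluding source itself."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {source}


PATH = attack_path(EDGES, "attacker", "R1r")
AT_RISK = assets_at_risk(EDGES, "switch")
```

Once the attribution step has identified the first compromised component, a reachability query such as `assets_at_risk` lists the downstream assets that may need to be defended.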
Our research shows that this method is effective. It also uncovers the information that was concealed inside the power grid system's assets following an attack. This study associates and maps asset-threat-consequence, which demonstrates that the modeling technique in this paper is more thorough when combined with the inference rules.

Real Case
In order to verify the effectiveness of this method, this section uses the Ukrainian power grid intrusion case as an example to support the proposed threat attribution and reasoning technique for ICS assets. Based on our study and the research report by ANTIY (2016), this paper compiles statistics on the internal assets of the Ukrainian power grid, as shown in Table 9.

Millions of people's daily lives were impacted by the Ukrainian power grid intrusion. In this incident, the switching actions at seven substations were mostly responsible for the 80,000 customers who lost power for 3 to 6 hours. Due to management negligence, a malicious email sent by the attacker was opened, and the infected email immediately downloaded malicious software, which disconnected the Ukrainian electric power company's primary control from the substations and wiped out all evidence of the attack. The attacker also interfered with the communication system so that residential customers who had lost power could not contact the outside world, which allowed the attacker to eventually carry out the coordinated attack, as seen in Figure 11.
The general mechanism of the Ukrainian power grid penetration is already clear based on the example described above.Next, we will take this case as an example to conduct knowledge reasoning and obtain the subsequent intrusion procedure.
Step 1: When an attacker is seen attempting to attack the dispatch center, which manages the main HMI and monitoring system, using blackenergy, it can be inferred that the attack will affect these components and that the attacker may control them.
Step 2: Attackers can breach SSH horizontally because the dispatch center uses SSH for communication.
Step 3: SSH is a method of getting access to the communication server.It also has a connection to the master HMI.The master HMI, which is controlled by the dispatching center, directly manages the transformer's switch operation, giving the attacker the potential to start a power outage.The first three steps are shown in Figure 12.
Step 4: Because the attacker controls the dispatch center's monitoring system, the monitoring system may be turned off, leading to blind monitoring.
Step 5: Once the attacker launches killdisk through blackenergy, it deletes the attack traces, making it even more difficult for the staff to identify the attack's origin in time.
Step 6: After the attacker attacks the phone system with DDoS, residents are unable to report the outage to customer service through the malfunctioning phone system, which further prolongs the power outage. The latter three steps are shown in Figure 13.
Through the above discussion, we are aware of the basic strategy used in the Ukrainian power grid breach event. Linking the above steps to the knowledge map generates the following results. The red node indicates a threat, the red line indicates possible damage the attacker and its tools could cause, the yellow line indicates potential impact on assets, and the blue line indicates the attacker's penetration attack.
The above illustration makes each step taken by the attacker to breach the Ukrainian power infrastructure easy to follow. Based on the attacker's potential effects, we can make decisions and protect the assets that might be impacted in advance. The analytical results of this method are compared with the Ukrainian power grid intrusion case report, and the results are shown in Table 10.
The comparison shows that the steps of the Ukrainian power grid intrusion scenario recreated by this methodology are consistent with the actual intrusion process, and the reasoning conclusions in this study are more in-depth, which suggests that this method is effective.

In the future, we will not only include mitigation measures but also consider probabilistic factors, which will lead to a more comprehensive ICS threat model. At the same time, we will study more cases to turn the above semi-automated operations into automated steps.

Figure 1. Overall framework of this article

Figure 5. Raw data display of the power system attack dataset

Figure 6. The interrelationship among the six top classes

Figure 7(a) represents the data distribution of the binary dataset, Figure 7(b) represents the data distribution of the ternary dataset, and Figure 7(c) represents the distribution of the multivariate dataset, with the numbers in the figure representing the scenario labels. In Figure 7(a), the sample counts of scenario 1 and scenario 2 differ by a factor of two; in Figure 7(b), those of scenario 2 and scenario 3 differ by a factor of 10; and in Figure 7(c), the most numerous scenario has 40 times as many samples as the least numerous one.

Figure 7. Data distribution within the power system attack dataset

Figure 9. Mapping of remote tripping command injection attack

Figure 10. Mapping of relay setting change attack

Figure 11. Invasion case of the Ukrainian power grid

Figure 14. The three aspects of asset-threat-consequence are mapped to the knowledge map

Table 1. Dataset scenario comparison

| Data Type | Dataset Contains Scenarios |
| --- | --- |
| Binary dataset | Attack events, natural events (such as sudden tripping and other natural events, not man-made events) |
| Ternary dataset | Attack event, natural event, no event (event under normal operation of power system) |
| Multivariate dataset | Natural event (including short circuit or line maintenance of different components), no event, attack event (including remote trip command attack of different components, data injection attack, relay setting change, etc.) |

Table 4. Specific accuracy value of each algorithm

With data equalization, the results improve significantly on Random Forest, Decision Tree, and KNN. For KNN in particular, the result improves by 10 percentage points and is much better than the feature combination approach of the original paper. The importance of data equalization for model training can be seen in Table 8.

Table 9. Asset statistics in Ukrainian power grid

| Asset Type | Related Asset |
| --- | --- |
| Production type assets | Generator, transformer, circuit breaker, disconnector, transmission line, vacuum arc extinguishing chamber, switch cabinet, ring-network cabinet, lightning arrester, etc. |
| Perception type assets | Monitoring and measuring instruments, voltage transformers, current transformers, relay protection devices, etc. |
| Supervision type assets | DCS, SCADA, monitoring device, communication server, telephone system, MES, PLC, HMI, ERP |
| Protocol type assets | IEC 60870-5, SSH, CIP, EtherNet/IP |

Table 10. Report on Ukraine power grid intrusion cases and results comparison

| Analysis Report | Results of This Paper |
| --- | --- |
| Attackers use blackenergy to penetrate horizontally | Use blackenergy to attack the dispatch center and intrude on the server |
| Capture monitoring/device area host | Control the primary monitoring and control system, causing a power outage and blindness |
| Clear system log, erase attack trace | To erase attack traces, use killdisk |
| Prevent users through the DDoS from contacting customer service | DDoS attacks can be used to block users from calling customer support |