Applications of Reinforcement Learning and Bayesian Networks Algorithms to the Load-Frequency Control Problem

Fatemeh Daneshfar (University of Kurdistan, Iran)
DOI: 10.4018/978-1-4666-4450-2.ch023


Load-Frequency Control (LFC) is an essential ancillary service for keeping electrical system reliability at a suitable level. In addition to regulating area frequency, the LFC system should maintain the net interchange power with neighboring areas at scheduled values. A desirable LFC performance is therefore achieved by effectively adjusting generation to minimize frequency deviation and regulate tie-line power flows. Such an LFC design is becoming much more complicated and significant owing to the growing complexity of interconnected power systems. However, most LFC designs are based on conventional Proportional-Integral (PI) controllers that are tuned online by trial-and-error approaches. These conventional designs are usually suitable only for specific operating points and are less efficient for modern, distributed power systems. These problems motivate the design of intelligent LFC schemes that are more adaptive and flexible than conventional ones. The present chapter addresses frequency regulation using Reinforcement Learning (RL) and Bayesian Networks (BNs) approaches for interconnected power systems. RL and BNs are computational learning-based solutions that can adapt to environmental conditions. They are Machine Learning (ML) techniques with many applications in power system engineering. The main advantages of these intelligent solutions for the LFC design are simplicity, intuitive model building that closely follows the physical power system topology, easy incorporation of uncertainty, and reduced dependence on the frequency response model and on power system parameter values.
Chapter Preview


The system frequency is an important parameter of an electrical power system. It can vary over a small range due to generation-load mismatches. Frequency control in an isolated power system is therefore one of the important power system control problems, and it plays a key role in enabling power exchanges and providing better conditions for electricity trading (Bevrani, 2009).

However, existing Load-Frequency Control (LFC) solutions that use classical or trial-and-error approaches to tune the PI controller parameters are difficult and time-consuming to design. They are usually suitable only for specific operating points and are less efficient for modern power systems, considering their increasing size, changing structure, and new uncertainties. Because these controllers are designed for a specific disturbance, they may not perform as expected if the nature of the disturbance varies. Most of the applied linear modern/robust control techniques also suggest complex control structures with high-order dynamic controllers; the difficulty of selecting their weighting functions and the pole-zero cancellation phenomenon associated with them reduce their applicability (Daneshfar & Bevrani, 2012). It is therefore expected that intelligent controllers in modern, distributed environments will be more adaptive and flexible than conventional ones. This chapter addresses two different learning-based algorithms, Reinforcement Learning (RL) and Bayesian Networks (BNs), to satisfy LFC objectives in distributed environments. These approaches are intelligent, systematic, learning-based methods that can learn and update their decision-making capability (Ernst et al., 2004). They also have many applications in power system frequency control (Bevrani et al., 2012; Daneshfar & Bevrani, 2010; Daneshfar et al., 2011).

RL is an adaptive, nonlinear algorithm that is independent of environmental conditions (Sutton & Barto, 1998). It is a learning method suited to unknown environments with nonlinearities and many operating conditions. It allows a machine or software agent to learn behavior from feedback received from the environment. This behavior can be learned once and for all, or it can keep adapting over time. Reinforcement learning differs from other kinds of learning algorithms in several ways. The most important difference is that there are no input/output pairs. Instead, after choosing an action, the agent receives the immediate reward and the subsequent state, but is not told which action would have been the best choice. To act optimally, the agent must actively gather useful experience about the possible system states, actions, transitions, and rewards (Sutton, 1996).

This automated learning scheme implies that the RL algorithm works well under nonlinear conditions and scales easily to large engineering systems. There is also little need for a human expert who knows the application domain, and much less time is spent designing a solution: no hand-crafted sets of rules are required, as with expert systems, and all that is needed is someone familiar with reinforcement learning (Dung et al., 2008).

Key Terms in this Chapter

Area Control Error (ACE): According to Bevrani (2009), in a multi-area power system, in addition to regulating area frequency, the supplementary control should maintain the net interchange power with neighboring areas at scheduled values. This is generally accomplished by adding a tie-line flow deviation to the frequency deviation in the supplementary feedback loop. A suitable linear combination of frequency and tie-line power changes for area i is known as the area control error. In practice, ACE is the difference between scheduled and actual electrical generation within a control area on the power grid, taking frequency bias into account.
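The linear combination above is commonly written as ACE_i = ΔP_tie,i + B_i·Δf_i. A minimal sketch of this calculation, with illustrative numbers not taken from the chapter:

```python
# Hypothetical sketch of the area control error for one control area,
# assuming the standard linear combination ACE_i = dP_tie_i + B_i * df_i.

def area_control_error(delta_p_tie, delta_f, bias):
    """Return ACE for one control area.

    delta_p_tie : tie-line power deviation from schedule (pu MW)
    delta_f     : frequency deviation from nominal (Hz)
    bias        : frequency bias factor B_i (pu MW / Hz)
    """
    return delta_p_tie + bias * delta_f

# Example: 0.05 pu tie-line surplus during a -0.02 Hz dip, bias 0.8
ace = area_control_error(delta_p_tie=0.05, delta_f=-0.02, bias=0.8)
```

A supplementary controller would drive this quantity toward zero, so both the tie-line schedule and the frequency are restored.
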

Q-learning: A model-free reinforcement learning technique that works by learning an action-value function giving the expected utility of taking a given action in a given state and following a fixed policy thereafter. Q-learning uses temporal differences to estimate the optimal action-value function Q*(s, a). The agent maintains a table of values Q(s, a) for each state s in the state set S and each action a in the action set A; Q(s, a) represents its current estimate of Q*(s, a). The learned action-value function Q directly approximates the optimal action-value function Q*, independent of the policy being followed. This dramatically simplifies the analysis of the algorithm and enabled early convergence proofs.
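The temporal-difference update behind this definition is Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]. A minimal tabular sketch (the states, actions, and gains are illustrative, not from the chapter):

```python
# Minimal tabular Q-learning update: one temporal-difference step on a
# dictionary-based Q table. State and action names are hypothetical.

def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the table."""
    best_next = max(q[s_next].values())          # max_a' Q(s', a')
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a])
    return q

# Two states, two actions; all estimates start at zero.
q = {s: {a: 0.0 for a in ("inc", "dec")} for s in (0, 1)}
q = q_update(q, s=0, a="inc", reward=1.0, s_next=1)
# q[0]["inc"] is now 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
```

Repeating such updates over sampled transitions, with sufficient exploration and a decaying learning rate, makes Q converge to Q*.
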

Proportional Integral (PI) Controller: A proportional-integral-derivative (PID) controller is a generic control-loop feedback mechanism widely used in industrial control systems. A PID controller calculates an "error" value as the difference between a measured process variable and a desired set point; the controller attempts to minimize the error by adjusting the process control inputs. A PI controller is the special case with no derivative term.
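A discrete-time sketch of the PI law u = Kp·e + Ki·∫e dt follows; the gains and sample time are assumptions for illustration, not tuned LFC values:

```python
# Illustrative discrete-time PI controller. Gains kp, ki and step dt are
# hypothetical; a real LFC loop would tune them for the plant.

class PIController:
    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0  # accumulated error (rectangular integration)

    def step(self, setpoint, measurement):
        """Return the control output for one sample."""
        error = setpoint - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

pi = PIController(kp=2.0, ki=0.5, dt=0.1)
u = pi.step(setpoint=1.0, measurement=0.8)
# error 0.2 -> u = 2.0 * 0.2 + 0.5 * 0.02 = 0.41
```

The integral term is what removes steady-state error; it is also why a PI controller tuned for one operating point may behave poorly at another, as discussed above.
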

Markov Decision Process (MDP): A reinforcement learning task that satisfies the Markov property is called a Markov decision process, or MDP. It is a discrete-time stochastic control process: at each time step the process is in some state s, and the decision maker may choose any action available in state s. The process responds at the next time step by randomly moving into a new state s' and giving the decision maker a corresponding reward. If the state and action spaces are finite, it is called a finite Markov decision process (finite MDP).
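A toy finite MDP can be written as a table of transition probabilities P(s' | s, a) and rewards R(s, a, s'). The state and action names below are illustrative only, loosely echoing a frequency-regulation setting:

```python
import random

# Toy finite MDP: P(s' | s, a) stored as lists of (next_state, probability),
# rewards as a sparse dict. All names and numbers are hypothetical.

transitions = {
    ("low", "regulate"):     [("nominal", 0.9), ("low", 0.1)],
    ("low", "wait"):         [("low", 1.0)],
    ("nominal", "regulate"): [("nominal", 1.0)],
    ("nominal", "wait"):     [("nominal", 1.0)],
}
rewards = {("low", "regulate", "nominal"): 1.0}

def mdp_step(state, action, rng=random):
    """Sample the next state and reward for (state, action)."""
    outcomes = transitions[(state, action)]
    r = rng.random()
    cumulative = 0.0
    for s_next, p in outcomes:
        cumulative += p
        if r <= cumulative:
            return s_next, rewards.get((state, action, s_next), 0.0)
    return outcomes[-1][0], 0.0  # guard against floating-point round-off
```

Because the next state depends only on the current state and action (the Markov property), an RL agent such as Q-learning can learn from these sampled steps without knowing the tables.
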

Bayesian Inference: According to Aster et al. (2012), Bayesian inference is a method of inference in which Bayes' rule is used to update the probability estimate for a hypothesis as additional evidence is learned.
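For a single hypothesis H and evidence E, the update is P(H|E) = P(E|H)·P(H) / [P(E|H)·P(H) + P(E|¬H)·P(¬H)]. A minimal sketch with illustrative numbers:

```python
# Bayes' rule for one hypothesis H given evidence E. The prior and
# likelihoods below are arbitrary illustration values.

def bayes_update(prior, like_h, like_not_h):
    """Return P(H | E) from P(H), P(E | H), and P(E | not H)."""
    evidence = like_h * prior + like_not_h * (1.0 - prior)
    return like_h * prior / evidence

# Prior 0.3 for H; the evidence is 4x as likely under H as under not-H.
posterior = bayes_update(prior=0.3, like_h=0.8, like_not_h=0.2)
# posterior = 0.24 / (0.24 + 0.14) = 0.24 / 0.38, roughly 0.632
```

A Bayesian network generalizes this single update to many variables by chaining such conditional probabilities along the network's edges.
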

Causal Relationships: Causality is the relationship between cause and effect. A causal relationship is thus a relationship between one phenomenon or event (the cause) and another (the effect) in which the cause precedes and produces the effect.

Bayesian Probability: One of several interpretations of the concept of probability, belonging to the category of evidential probabilities. The Bayesian interpretation of probability can be seen as an extension of logic that enables reasoning with propositions whose truth or falsity is uncertain.

Multi-Agent Systems (MAS): A multi-agent system can be defined as a group of autonomous, interacting entities (agents) sharing a common environment, which they perceive with sensors and act upon with actuators; a multi-agent system thus consists of agents and their environment. According to Wooldridge (2002), the agents in a multi-agent system have several important characteristics: autonomy, the agents are at least partially autonomous; local views, no agent has a full global view of the system; and decentralization, there is no designated controlling agent (otherwise the system is effectively reduced to a monolithic system).

Agent: An autonomous entity which observes through sensors, acts upon an environment using actuators, and directs its activity towards achieving goals. Russell and Norvig (2003) group agents into five classes based on their capability. Simple reflex agents act only on the basis of the current percept, ignoring the rest of the percept history. Model-based reflex agents can handle a partially observable environment; the current state is stored inside the agent as a structure describing the part of the world that cannot be seen. Goal-based agents expand on the capabilities of model-based agents by using "goal" information. Utility-based agents define a measure of how desirable a particular state is; this measure can be obtained through a utility function which maps a state to a measure of its utility. Learning agents can initially operate in unknown environments and become more competent than their initial knowledge alone might allow.

Frequency: The number of occurrences of a repeating event per unit time. The unit of frequency is the hertz (Hz); 1 Hz means that an event repeats once per second.
