Active Learning in Discrete-Time Stochastic Systems

Tadeusz Banek (Lublin University of Technology, Poland) and Edward Kozlowski (Lublin University of Technology, Poland)
DOI: 10.4018/978-1-61692-811-7.ch016

A general approach to self-learning based on the ideas of adaptive (dual) control is presented. The leading example is the control problem for a stochastic system with uncertainty. Some of the system’s parameters are unknown and are modeled as random variables with a known a priori distribution function. To optimize an objective function, the controller has to learn the values of these parameters. The main difficulty is that the controller has to learn and optimize the objective function in parallel, i.e., at the same time. Moreover, these two goals, considered separately, do not necessarily coincide, and the main problem in adaptive control is to find the trade-off between them. From the self-learning perspective, two directions are visible. The first is to extract the learning procedure from an optimal adaptive control law and to formulate it as a Cybernetic Principle of self-learning. The second is to consider a control problem with a special objective function that measures our knowledge about the unknown parameters. It can be the Fisher information (Banek & Kulikowski, 2003), the joint entropy (for example Saridis, 1988; Banek & Kozlowski, 2006), or something else. Such an objective function forces the controller to steer the system along trajectories that are rich in information about the unknown quantities. In this chapter the authors follow both directions. First they obtain conditions of optimality for a general adaptive control problem and the resulting algorithm for computing extremal controls. The results are then applied to a simple example of the Linear Quadratic Gaussian (LQG) problem. Using analytical results and numerical simulations, the authors show how control actions depend on the a priori knowledge about the system.
The first conclusion is that a natural, methodological candidate for the optimal self-learning strategy, the “certainty equivalence principle”, fails to satisfy the optimality conditions. The optimal control obtained in the case of perfect knowledge of the system is not directly usable in the partial information case. The need for active learning is an essential factor. The differences between these controls are visible at the level of computations and should be interpreted at a higher level of cybernetic thinking in order to give a satisfactory explanation, perhaps in the form of another principle. In the absence of perfect knowledge of the parameter values, the control actions are restricted by a measurability requirement, and the authors compute the Lagrange multiplier associated with this “information constraint”. The multiplier is called a “dual” or “shadow” price and is interpreted in the literature as an incremental value of information. The authors analyze its evolution to see how its value changes over time. As a second conclusion, the authors obtain a characterization of self-learning from the information theory point of view. In the last section the authors follow the second direction. In order to estimate the speed of self-learning, they choose the conditional entropy as the objective function. They state the optimal control problem of minimizing the conditional entropy of the system under consideration. Using the general results obtained at the beginning, they derive the conditions of optimality and the resulting algorithm for computing the extremal controls. The optimal evolution of the conditional entropy tells much about the intensity of self-learning and its distribution over time.
Chapter Preview


Learning is widely recognized as an important issue in modern, knowledge-based societies. There is extensive literature on this subject in the areas of Management Sciences, System Sciences, and Cybernetics, describing the need to investigate learning processes. It is conjectured that the quality, speed and universality of these processes are crucial factors for the comparison of modern and past societies.

This is the right moment for reflection. If self-learning processes are important and worth understanding for modern, knowledge-based societies, but too difficult to study directly, why not try to understand them by relying on the examples solved in Adaptive Control Theory? The following questions arise immediately: can these processes be described or investigated quantitatively? What does “passive” or “active” learning mean? Is there any hope of applying mathematical techniques that help to understand the essence of learning?

The aim of this chapter is to convince the reader that the answer could be positive and to propose an approach based on the ideas of adaptive control theory. To make the problem of active learning more specific, we state a stochastic control problem with unknown parameters. This means we consider controlled systems having parameters which are unknown to the controller. They are modeled as random variables with known (a priori) distribution functions. The control law, which has to optimize some objective function, must take into account all available information, including the (a posteriori) information about the parameters. The more precise the information about the parameters, the better the results of the control actions as measured by the objective function. By observing the system's trajectory, the controller improves his knowledge about the parameters. By selecting trajectories, he can choose the best one. But how can one learn the values of the unknown parameters on line (!), and how can it be done in the most efficient way? These are the fundamental questions in adaptive control theory. We believe in the importance and universality of these questions at the general level, independently of any connections with optimal control (adaptive or not). Moreover, we hope that understanding this problem can, and must, help in much more complex and advanced problems of learning in knowledge-based societies.
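To make the distinction between passive and active learning concrete, the following is a minimal sketch under assumptions that are not taken from the chapter: a hypothetical scalar system x_{t+1} = x_t + ξ u_t + σ w_t with standard Gaussian noise w_t and a Gaussian a priori distribution for the unknown parameter ξ. The function names, the model, and the numerical values are illustrative only. Under these assumptions the a posteriori distribution of ξ stays Gaussian, and its variance shrinks only when the control u_t is nonzero.

```python
import random


def update_posterior(mean, var, u, increment, noise_std):
    """Conjugate Gaussian update for xi, given increment = xi * u + noise_std * w.

    Hypothetical helper: assumes a Gaussian prior N(mean, var) on xi and
    standard Gaussian noise w, which is an assumption of this sketch.
    """
    precision = 1.0 / var + (u * u) / noise_std ** 2
    new_var = 1.0 / precision
    new_mean = new_var * (mean / var + u * increment / noise_std ** 2)
    return new_mean, new_var


def simulate(controls, xi_true=0.8, prior_mean=0.0, prior_var=1.0,
             noise_std=0.5, seed=0):
    """Run the assumed system and return the (mean, variance) of xi after each step."""
    rng = random.Random(seed)
    x = 0.0
    mean, var = prior_mean, prior_var
    history = []
    for u in controls:
        x_next = x + xi_true * u + noise_std * rng.gauss(0.0, 1.0)
        mean, var = update_posterior(mean, var, u, x_next - x, noise_std)
        history.append((mean, var))
        x = x_next
    return history


# Zero control: the trajectory carries no information about xi.
passive = simulate([0.0] * 5)
# Nonzero control: every step excites the system and sharpens the posterior.
active = simulate([1.0] * 5)

print("posterior (mean, variance) with u = 0:", passive[-1])
print("posterior (mean, variance) with u = 1:", active[-1])
```

In this toy model the control has two roles at once: it drives the state and it determines how much each observation reveals about ξ, which is the trade-off the adaptive control problem has to resolve.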

The chapter is organized as follows. In section 2 we state the adaptive control problem for nonlinear systems which are affine with respect to controls and disturbances. Applying weak variations, a technique from the Calculus of Variations, we obtain a necessary condition of optimality. Following a seminal paper by Rishel (1986), we turn this condition into an algorithm for computing extremal controls in section 3. In sections 4 and 5 the concept of the incremental value of information is introduced. Roughly speaking, it is the approximate amount of money one has to pay for exact knowledge of the parameter values. This value can be used for several purposes, for instance, to compare the net profit with the extra cost of purchasing this information. In the next section we apply our general results to a simple one-dimensional LQG problem. This shows several surprising effects of imperfect information and consequences of learning. For instance, the certainty equivalence principle, which was a widely recognized methodological candidate for finding an optimal adaptive control, is not valid in this case. Numerical simulations based on Rishel's algorithm suggest an alternative candidate that is explained in the Conclusions. Finally, in the last section we introduce so-called self-learning. This is done by considering control problems in which the conditional entropy enters explicitly in the performance criteria. In this manner self-learning, which is an auxiliary objective associated with the main objective in the tasks considered in classical automatics, becomes here an objective unto itself, the fundamental objective. The resulting trajectories say a lot about ξ, but, in contrast to the case analyzed in our previous paper Banek & Kozłowski (2005), where the joint entropy minimization problem was considered, now can be arbitrarily large. For non-technical systems (economic, social, etc.) such a formulation of the self-learning problem is natural.
We show that this problem and its generalization can be treated as an optimal adaptive control problem and solved by using Rishel's methodology (see e.g. Rishel, 1986; Harris & Rishel, 1986). Next, we present some results about modeling with conditional entropy and determining the optimal control for a learning process without costs.
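As a rough illustration of how a conditional entropy criterion depends on the controls, consider again a hypothetical scalar model that is not taken from the chapter: x_{t+1} = x_t + ξ u_t + σ w_t with standard Gaussian noise and a Gaussian a priori distribution of variance p_0 for ξ. Under these assumptions the a posteriori variance after controls u_0, ..., u_{T-1} is 1 / (1/p_0 + Σ u_t² / σ²), and the entropy of a Gaussian with variance p is 0.5 ln(2πe p). The sketch below only evaluates these two formulas; the function names are illustrative.

```python
import math


def posterior_variance(prior_var, controls, noise_std):
    """Posterior variance of xi in the assumed linear Gaussian model."""
    control_energy = sum(u * u for u in controls)
    precision = 1.0 / prior_var + control_energy / noise_std ** 2
    return 1.0 / precision


def gaussian_entropy(var):
    """Differential entropy (in nats) of a Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def conditional_entropy(prior_var, controls, noise_std):
    """Entropy of xi given the observed trajectory, for the assumed model."""
    return gaussian_entropy(posterior_variance(prior_var, controls, noise_std))


# Compare three hypothetical control sequences of increasing energy.
for controls in ([0.0, 0.0], [0.5, 0.5], [1.0, 1.0]):
    h = conditional_entropy(prior_var=1.0, controls=controls, noise_std=0.5)
    print(controls, "-> conditional entropy:", round(h, 4))
```

In this toy setting the conditional entropy depends on the controls only through their total energy Σ u_t² and decreases as that energy grows, so an entropy criterion with no control cost rewards stronger excitation. Whether the same holds for the systems treated in the chapter is not something this sketch can establish.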
