Active Learning in Discrete-Time Stochastic Systems

Tadeusz Banek, Edward Kozlowski
ISBN13: 9781616928117 | ISBN10: 1616928115 | EISBN13: 9781616928131
DOI: 10.4018/978-1-61692-811-7.ch016
Cite Chapter

MLA

Banek, Tadeusz, and Edward Kozlowski. "Active Learning in Discrete-Time Stochastic Systems." Knowledge-Based Intelligent System Advancements: Systemic and Cybernetic Approaches, edited by Jerzy Jozefczyk and Donat Orski, IGI Global, 2011, pp. 350-371. https://doi.org/10.4018/978-1-61692-811-7.ch016

APA

Banek, T., & Kozlowski, E. (2011). Active Learning in Discrete-Time Stochastic Systems. In J. Jozefczyk & D. Orski (Eds.), Knowledge-Based Intelligent System Advancements: Systemic and Cybernetic Approaches (pp. 350-371). IGI Global. https://doi.org/10.4018/978-1-61692-811-7.ch016

Chicago

Banek, Tadeusz, and Edward Kozlowski. "Active Learning in Discrete-Time Stochastic Systems." In Knowledge-Based Intelligent System Advancements: Systemic and Cybernetic Approaches, edited by Jerzy Jozefczyk and Donat Orski, 350-371. Hershey, PA: IGI Global, 2011. https://doi.org/10.4018/978-1-61692-811-7.ch016


Abstract

A general approach to self-learning based on the ideas of adaptive (dual) control is presented, with the control problem for a stochastic system under uncertainty as the leading example. Some of the system's parameters are unknown and are modeled as random variables with a known a priori distribution function. To optimize an objective function, the controller has to learn the values of these parameters while optimizing the objective function in parallel, i.e., at the same time. The difficulty is that these two goals, considered separately, do not necessarily coincide, and the main problem in adaptive control is to find the trade-off between them. From the self-learning perspective, two directions are visible. The first is to extract the learning procedure from an optimal adaptive control law and to formulate it as a Cybernetic Principle of self-learning. The second is to consider a control problem with a special objective function that measures our knowledge about the unknown parameters; it can be the Fisher information (Banek & Kulikowski, 2003), the joint entropy (e.g., Saridis, 1988; Banek & Kozlowski, 2006), or something else. Such an objective function forces the controller to steer the system along trajectories that are rich in information about the unknown quantities.

In this chapter the authors follow both directions. First, they obtain optimality conditions for a general adaptive control problem and a resulting algorithm for computing extremal controls. The results are then applied to a simple example, the Linear Quadratic Gaussian (LQG) problem. Using analytical results and numerical simulations, the authors show how control actions depend on the a priori knowledge about the system. The first conclusion is that the natural methodological candidate for an optimal self-learning strategy, the "certainty equivalence principle," fails to satisfy the optimality conditions: the optimal control obtained under perfect knowledge of the system is not directly usable in the partial-information case, so active learning is an essential factor. The differences between these controls are visible at the level of computations and should be interpreted at a higher level of cybernetic thinking in order to give a satisfactory explanation, perhaps in the form of another principle. In the absence of perfect knowledge of the parameter values, the control actions are restricted by a measurability requirement, and the authors compute the Lagrange multiplier associated with this "information constraint." The multiplier is called a "dual" or "shadow" price and is interpreted in the literature as the incremental value of information. The authors compute this multiplier and analyze how its value evolves over time. As a second kind of conclusion, they obtain a characterization of self-learning from the information-theoretic point of view.

In the last section the authors follow the second direction. To estimate the speed of self-learning, they choose the conditional entropy as the objective function and state the optimal control problem of minimizing the conditional entropy of the system under consideration. Using the general results obtained at the beginning, they derive the conditions of optimality and the resulting algorithm for computing the extremal controls. The optimal evolution of the conditional entropy reveals much about the intensity of self-learning and its distribution over time.
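
To make the trade-off described in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm: it assumes a hypothetical scalar system x_{t+1} = x_t + b*u_t + w_t with Gaussian noise and an unknown gain b carrying a Gaussian prior, and all numerical values and the excitation heuristic are illustrative assumptions. A conjugate Bayesian update tracks the posterior of b, and the per-step drop in conditional entropy, 0.5*ln(1 + u^2*s^2/q), shows how larger control actions buy more information about the unknown parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar system: x_{t+1} = x_t + b*u_t + w_t, w_t ~ N(0, q).
# The gain b is unknown, modeled a priori as b ~ N(mu, s2); all values illustrative.
b_true, q, r = 1.5, 0.05, 0.1
mu, s2 = 0.0, 1.0          # prior mean and variance of b
x = 1.0

def entropy(var):
    """Conditional (differential) entropy of the Gaussian posterior of b."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

for t in range(10):
    # Certainty-equivalence control: treat b as equal to its posterior mean mu
    # and minimize the one-step cost E[x_{t+1}^2 + r*u^2]. As the abstract
    # notes, this strategy fails the optimality conditions under uncertainty.
    u_ce = -mu * x / (mu**2 + r)

    # Crude "active learning" modification (an assumption, not the chapter's
    # rule): inject excitation proportional to the remaining uncertainty, so
    # the trajectory stays informative about b.
    u = u_ce + 0.5 * np.sqrt(s2) * rng.choice([-1.0, 1.0])

    # System step; the observed increment is linear in the unknown b.
    w = rng.normal(0.0, np.sqrt(q))
    y = b_true * u + w          # y = x_{t+1} - x_t
    x = x + y

    # Conjugate Gaussian (Bayes/Kalman-type) update of the belief about b.
    h_before = entropy(s2)
    prec = 1.0 / s2 + u**2 / q
    mu = (mu / s2 + u * y / q) / prec
    s2 = 1.0 / prec

    # Information gained this step equals 0.5*ln(1 + u^2*s2_old/q):
    # larger |u| learns faster but costs more control energy.
    print(f"t={t}: u={u:+.3f}, mu={mu:+.3f}, "
          f"entropy drop={h_before - entropy(s2):.4f}")
```

The printed entropy drop is a toy analogue of the chapter's second direction: the conditional entropy of the unknown parameter shrinks at a rate set by the control actions, so the controller effectively pays control cost for information, which is the trade-off the "shadow price" of the information constraint quantifies.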
