A Free Service of IGI Global Publishing House
Below please find a list of definitions for the term that
you selected from multiple scholarly research resources.

What is a Markov Decision Process (MDP)?

Perspectives and Considerations on the Evolution of Smart Systems
It is a discrete-time, discrete-state-space stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
Published in Chapter:
The Explainable Model to Multi-Objective Reinforcement Learning Toward an Autonomous Smart System
Tomohiro Yamaguchi (Nara College, National Institute of Technology, Japan)
Copyright: © 2023 | Pages: 17
DOI: 10.4018/978-1-6684-7684-0.ch002
Abstract
The mission of this chapter is to add an explainable model to multi-goal reinforcement learning toward an autonomous smart system, in order to design both complex behaviors and complex decision making that are friendly to a human user. At the beginning of the introduction section, the relation between reinforcement learning with an explainable model and a smart system is described. To realize the explainable model, this chapter formalizes the efficient and systematic exploration of various behaviors toward sub-goal states in order to collect complex behaviors from a start state to the main goal state. However, previous learning methods, such as behavior cloning, incur significant learning costs. Therefore, this chapter proposes a novel multi-goal reinforcement learning method based on the iterative loop-action selection strategy. As a result, the complex behavior sequence is learned with a given sub-goal sequence as a sequence of macro actions. This chapter reports preliminary work carried out in the OpenAI Gym learning environment with the CartPoleSwingUp task.
More Results
Applications of Reinforcement Learning and Bayesian Networks Algorithms to the Load-Frequency Control Problem
A reinforcement learning task that satisfies the Markov property is called a Markov decision process, or MDP. It is a discrete-time stochastic control process: at each time step, the process is in some state s, and the decision maker may choose any action that is available in state s. The process responds at the next time step by randomly moving into a new state s' and giving the decision maker a corresponding reward. If the state and action spaces are finite, it is called a finite Markov decision process (finite MDP).
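The state, action, next-state, and reward cycle in this definition can be illustrated with a small sketch. The two-state example below is hypothetical: the state names, action names, probabilities, and rewards are assumptions made up for illustration and do not come from the cited chapter. It shows one way a finite MDP might be written down and sampled.

```python
import random

# Hypothetical finite MDP (all names and numbers are illustrative assumptions).
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(0.9, "high", -1.0), (0.1, "low", -1.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.7, "high", 2.0), (0.3, "low", 2.0)],
    },
}


def available_actions(state):
    """Return the actions the decision maker may choose in this state."""
    return sorted(TRANSITIONS[state])


def step(state, action, rng):
    """Randomly move to a next state and return (next_state, reward).

    The outcome depends only on the current state and action,
    which is the Markov property.
    """
    outcomes = TRANSITIONS[state][action]
    weights = [probability for probability, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def rollout(start_state, policy, num_steps, seed=0):
    """Follow a policy for num_steps and return (final_state, total_reward)."""
    rng = random.Random(seed)
    state, total_reward = start_state, 0.0
    for _ in range(num_steps):
        action = policy(state)
        state, reward = step(state, action, rng)
        total_reward += reward
    return state, total_reward


if __name__ == "__main__":
    # A trivial policy that always waits, regardless of state.
    print(rollout("high", lambda state: "wait", num_steps=5))
```

Because both the state set and the action sets here are finite, this sketch corresponds to the finite MDP case. A dictionary of explicit transition lists is only one possible representation; larger problems would more likely use arrays or a simulator.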
Formalizing Model-Based Multi-Objective Reinforcement Learning With a Reward Occurrence Probability Vector
It is a discrete-time, discrete-state-space stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
Model-Based Multi-Objective Reinforcement Learning by a Reward Occurrence Probability Vector
It is a discrete-time, discrete-state-space stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.