Below is a list of definitions for this term, drawn from multiple scholarly research resources.

What is a Markov Decision Process?

Encyclopedia of Artificial Intelligence
The most common formalism for environments used in reinforcement learning, where the problem is described in terms of a finite set of states, a finite set of actions, transition probabilities between states, a reward signal, and a discount factor.
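In standard notation (a rendering added here for clarity, not quoted from the chapter), this definition corresponds to the tuple

\[ M = \langle S, A, P, R, \gamma \rangle, \qquad P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a), \]

where the agent's objective is to maximize the expected discounted return \( \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right] \) with discount factor \( \gamma \in [0, 1) \).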
Published in Chapter:
Hierarchical Reinforcement Learning
Carlos Diuk (Rutgers University, USA) and Michael Littman (Rutgers University, USA)
Copyright: © 2009 | Pages: 6
DOI: 10.4018/978-1-59904-849-9.ch122
Abstract
Reinforcement learning (RL) deals with the problem of an agent that must learn, through its interactions with an environment, how to behave so as to maximize its utility (Sutton & Barto, 1998; Kaelbling, Littman & Moore, 1996). Reinforcement learning problems are usually formalized as Markov Decision Processes (MDPs), which consist of a finite set of states and a finite set of possible actions that the agent can perform. At any given point in time, the agent is in a certain state and picks an action. It can then observe the new state this action leads to, and receives a reward signal. The goal of the agent is to maximize its long-term reward. In this standard formalization, no particular structure or relationship between states is assumed. However, learning in environments with extremely large state spaces is infeasible without some form of generalization. Exploiting the underlying structure of a problem can enable generalization and has long been recognized as an important aspect of representing sequential decision tasks (Boutilier et al., 1999). Hierarchical Reinforcement Learning is the subfield of RL that deals with the discovery and/or exploitation of this underlying structure.

Two main ideas come into play in hierarchical RL. The first is to break a task into a hierarchy of smaller subtasks, each of which can be learned faster and more easily than the whole problem. Subtasks can also be performed multiple times in the course of achieving the larger task, reusing accumulated knowledge and skills. The second idea is to use state abstraction within subtasks: not every subtask needs to be concerned with every aspect of the state space, so some states can be abstracted away and treated as the same for the purpose of the given subtask.
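The interaction loop the abstract describes (observe the state, pick an action, observe the new state, receive a reward) can be sketched concretely. Below is a minimal tabular Q-learning example in Python on a made-up five-state chain; the environment, reward scheme, and all parameter values are illustrative assumptions, not taken from the chapter.

import random

# Illustrative MDP (an assumption for this sketch): states 0..4 in a chain;
# action 0 moves left, action 1 moves right; reaching state 4 yields reward 1
# and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

gamma, alpha, epsilon = 0.95, 0.1, 0.1  # discount, learning rate, exploration rate
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Update Q toward the reward plus the discounted best next-state value.
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

print("Greedy policy:", [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])

Running this prints the greedy policy per state, which should converge to always moving right, i.e. toward the rewarding terminal state.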
More Results
Decision Support for Smart Manufacturing
Describes the environment in which an optimization problem is solved by reinforcement learning or dynamic programming, and provides a mathematical framework for modeling decision making in situations where the state of the environment is fully or only partially observable.
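As a sketch of the dynamic-programming route this definition mentions: when the transition probabilities and rewards are known, value iteration computes an optimal policy by repeated Bellman backups. The tiny two-state model below is an illustrative assumption, not from the chapter.

# Value iteration on a tiny MDP whose model is fully known (an assumed example).
# P[s][a] lists (probability, next_state, reward) outcomes.
P = {
    0: {0: [(1.0, 0, 0.0)],                  # stay in state 0, no reward
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},  # usually reach state 1, reward 1
    1: {0: [(1.0, 0, 0.0)],                  # move back to state 0
        1: [(1.0, 1, 2.0)]},                 # stay in state 1, reward 2
}
gamma, theta = 0.9, 1e-8  # discount factor, convergence threshold

V = {s: 0.0 for s in P}
while True:
    delta = 0.0
    for s in P:
        # Bellman optimality backup: best expected one-step return plus
        # the discounted value of the successor state.
        best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break

policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
          for s in P}
print("Optimal values:", V)
print("Optimal policy:", policy)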
Bayesian Agent Adaptation in Complex Dynamic Systems
In a Markov Decision Process, changes in the environment in response to an agent’s actions are determined only by the current state and action, and not by any historical information.
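Formally, this is the Markov property; in standard notation (added here for illustration, not quoted from the chapter):

\[ \Pr(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = \Pr(s_{t+1} \mid s_t, a_t), \]

that is, conditioning on the full history adds nothing beyond the current state and action.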
Robust Adversarial Deep Reinforcement Learning
An MDP ensures that the agent's decisions are based only on the current state, with the aim of maximizing long-term rewards.
Explainable Deep Reinforcement Learning for Knowledge Graph Reasoning
A decision-making process in which the system moves between states in discrete time steps. Actions are selected and the system transitions according to transition probabilities, and the whole procedure is reward driven.
Reinforcement Learning for Combinatorial Optimization
A discrete-time stochastic control process that models decision making in environments with randomness.
The Threat of Intelligent Attackers Using Deep Learning: The Backoff Attack Case
Mathematical framework used to model dynamical systems and obtain their optimal control policies.
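The optimal control policies this definition refers to are typically characterized by the Bellman optimality equation; a standard statement (added here for illustration, not quoted from the chapter) is

\[ V^{*}(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \right], \]

with an optimal policy obtained by acting greedily with respect to \( V^{*} \).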