Carlos Diuk (Rutgers University, USA) and Michael Littman (Rutgers University, USA)

DOI: 10.4018/978-1-59904-849-9.ch122

Chapter Preview

In this section, we introduce the MDP formalism, in which most of the research in standard RL has been done. We then mention the two main approaches used for learning MDPs: model-based and model-free RL. Finally, we introduce two formalisms that extend MDPs and are widely used in the Hierarchical RL field: semi-Markov Decision Processes (SMDPs) and Factored MDPs.

A Markov Decision Process consists of:

• a set of states *S*

• a set of actions *A*

• a transition probability function *Pr(s’ | s, a)*, representing the probability of the environment transitioning to state *s’* when the agent performs action *a* from state *s*. It is sometimes notated *T(s, a, s’)*.

• a reward function *E[r | s, a]*, representing the expected immediate reward obtained by taking action *a* from state *s*.

• a discount factor γ ∈ (0, 1], that downweights future rewards and whose precise role will be clearer in the following equations.
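As an illustration, the five components listed above can be collected into a small data structure. The following is a minimal sketch that assumes finite state and action sets. The class name `MDP`, its field names, the `validate` helper, and the two-state toy problem are hypothetical choices made for this example and do not come from the chapter.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str


@dataclass
class MDP:
    """Illustrative container for a finite MDP (all names are hypothetical)."""
    states: List[State]
    actions: List[Action]
    # transition[(s, a)] maps each next state s' to Pr(s' | s, a)
    transition: Dict[Tuple[State, Action], Dict[State, float]]
    # reward[(s, a)] holds the expected immediate reward E[r | s, a]
    reward: Dict[Tuple[State, Action], float]
    # discount factor, required to lie in (0, 1]
    gamma: float

    def validate(self) -> None:
        """Check the discount range and that each Pr(. | s, a) sums to 1."""
        if not (0.0 < self.gamma <= 1.0):
            raise ValueError("gamma must lie in (0, 1]")
        for s in self.states:
            for a in self.actions:
                total = sum(self.transition[(s, a)].values())
                if abs(total - 1.0) > 1e-9:
                    raise ValueError(f"Pr(. | {s}, {a}) sums to {total}")


# A made-up two-state problem, used only to exercise the structure.
toy = MDP(
    states=["s0", "s1"],
    actions=["stay", "go"],
    transition={
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "go"): {"s0": 0.2, "s1": 0.8},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "go"): {"s0": 0.8, "s1": 0.2},
    },
    reward={
        ("s0", "stay"): 0.0,
        ("s0", "go"): -1.0,
        ("s1", "stay"): 1.0,
        ("s1", "go"): -1.0,
    },
    gamma=0.9,
)
toy.validate()  # raises ValueError if the definition is inconsistent
```

Storing the transition function as a dictionary keyed by (state, action) mirrors the *Pr(s’ | s, a)* notation directly. Other layouts, such as one matrix per action, are equally reasonable.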

A *deterministic policy* *π: S → A* is a function that determines, for each state, what action to take. For any given policy π, we can define a *value function* *V^{π}*, representing the
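To make the notions of a deterministic policy and its value function concrete, here is a minimal sketch of iterative policy evaluation. It assumes the standard textbook definition, in which V^π(s) is the expected discounted sum of rewards obtained by following π from state s, and it assumes γ < 1 so that the iteration converges. The function name `evaluate_policy`, its parameters, and the toy numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def evaluate_policy(
    states: List[str],
    policy: Dict[str, str],
    transition: Dict[Tuple[str, str], Dict[str, float]],
    reward: Dict[Tuple[str, str], float],
    gamma: float,
    tol: float = 1e-10,
    max_iters: int = 10_000,
) -> Dict[str, float]:
    """Repeatedly apply V(s) <- E[r | s, pi(s)] + gamma * sum_s' Pr(s' | s, pi(s)) V(s')."""
    v = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_v = {}
        delta = 0.0
        for s in states:
            a = policy[s]  # a deterministic policy picks exactly one action per state
            expected_next = sum(p * v[s2] for s2, p in transition[(s, a)].items())
            new_v[s] = reward[(s, a)] + gamma * expected_next
            delta = max(delta, abs(new_v[s] - v[s]))
        v = new_v
        if delta < tol:  # stop once a full sweep changes no value by more than tol
            break
    return v


# Made-up two-state example: move from s0 toward s1, then stay in s1.
states = ["s0", "s1"]
policy = {"s0": "go", "s1": "stay"}
transition = {
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
}
reward = {("s0", "go"): -1.0, ("s1", "stay"): 1.0}

v = evaluate_policy(states, policy, transition, reward, gamma=0.9)
```

In this toy problem, staying in s1 yields reward 1 at every step, so its value should approach 1 / (1 - 0.9) = 10. This gives a quick sanity check on the result.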

Markov Decision Process: The most common formalism for environments used in reinforcement learning, where the problem is described in terms of a finite set of states, a finite set of actions, transition probabilities between states, a reward signal, and a discount factor.

Semi-Markov Decision Process: An extension to the MDP formalism that deals with temporally extended actions and/or continuous time.

Factored-State Markov Decision Process: An extension to the MDP formalism used in Hierarchical RL where the transition probability is defined in terms of factors, allowing the representation to ignore certain state variables in certain contexts.

Hierarchical Task Decomposition: A decomposition of a task into a hierarchy of smaller subtasks.

Hierarchical Reinforcement Learning: A subfield of reinforcement learning concerned with the discovery and use of task decomposition, hierarchical control, and temporal and state abstraction (Barto & Mahadevan, 2003).

State-Space Generalization: The technique of grouping together states in the underlying MDP and treating them as equivalent for certain purposes.

Reinforcement Learning: The problem faced by an agent that learns behavior that maximizes a utility measure from its interaction with the environment.

