A Modified Value Iteration Algorithm for Discounted Markov Decision Processes


Sanaa Chafik (Laboratory of Information Processing and Decision Support, University Sultan Moulay Slimane, Beni Mellal, Morocco) and Cherki Daoui (Laboratory of Information Processing and Decision Support, University Sultan Moulay Slimane, Beni Mellal, Morocco)
Copyright: © 2015 | Pages: 11
DOI: 10.4018/JECO.2015070104

Abstract

Since many real applications involve a very large number of states, classical methods are intractable for solving large Markov Decision Processes. The decomposition technique, based on the topology of each state in the associated graph, and the parallelization technique are useful methods for coping with this problem. In this paper, the authors propose a Modified Value Iteration algorithm that incorporates parallelism. They test their implementation on artificial data using OpenMP, which offers a significant speed-up.

2. Markov Decision Processes

2.1. Definition

We consider a stochastic dynamic system which is observed at discrete time points t = 1, 2, ... . An MDP (Puterman, 2009) is defined by a tuple (S, A, T, R), where:

  • S = {1, 2, ..., M} is the set of states of the environment (the state space); let X_t be a random variable representing the state of the system at time t, taking its values in the state space S;

  • A is the set of actions (the action space);

  • T is the transition function (the probability of transitioning to state s′ when action a is executed in state s), where T: S × A × S → [0, 1] with T(s, a, s′) = P(X_{t+1} = s′ | X_t = s, a_t = a);

  • R is the reward function (the immediate utility of executing action a in state s), where R: S × A → ℝ.

A strategy π is defined by a sequence (δ_1, δ_2, ...), where δ_t is a decision rule, that is, a function δ_t: H_t → P(A), with H_t the set of all histories up to time t and P(A) the set of probability distributions over A. A Markov strategy is a strategy π in which δ_t depends only on the current state at time t; a stationary strategy is a Markov strategy with identical decision rules; and a deterministic, or pure, strategy is a stationary strategy whose single decision rule is nonrandomized. In the following, we assume the process is stationary. For an optimal policy to exist, there must be an optimality criterion that places some kind of ordering on different policies.
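To make the definitions above concrete, the following is a minimal sketch of classical value iteration on a toy discounted MDP. The three states, two actions, transition probabilities, rewards, and discount factor are illustrative assumptions, not data from the paper; the Bellman optimality update it iterates is the standard one that the authors' Modified Value Iteration builds upon.

```python
# Minimal value iteration sketch on a toy discounted MDP.
# All numbers below (states, actions, T, R, gamma) are illustrative
# assumptions, not taken from the paper.

GAMMA = 0.9                      # discount factor
STATES = range(3)
ACTIONS = range(2)

# T[(s, a)] = list of (s_next, probability); each list sums to 1
T = {
    (0, 0): [(0, 0.8), (1, 0.2)],
    (1, 0): [(0, 0.1), (1, 0.8), (2, 0.1)],
    (2, 0): [(1, 0.2), (2, 0.8)],
    (0, 1): [(0, 0.5), (1, 0.5)],
    (1, 1): [(1, 0.5), (2, 0.5)],
    (2, 1): [(2, 1.0)],
}
# R[(s, a)] = immediate reward for executing action a in state s
R = {(0, 0): 0.0, (1, 0): 0.0, (2, 0): 1.0,
     (0, 1): 0.5, (1, 1): 0.5, (2, 1): 0.0}

def q_value(V, s, a):
    # Q(s, a) = R(s, a) + gamma * sum_{s'} T(s, a, s') * V(s')
    return R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in T[(s, a)])

V = [0.0] * len(STATES)
for _ in range(2000):
    # Bellman optimality update: V(s) = max_a Q(s, a)
    V_new = [max(q_value(V, s, a) for a in ACTIONS) for s in STATES]
    if max(abs(v1 - v0) for v1, v0 in zip(V_new, V)) < 1e-10:
        break                    # converged to the fixed point
    V = V_new

# Greedy (deterministic, stationary) policy extracted from V
policy = [max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES]
```

Because the update is a contraction with factor gamma, the iterates converge geometrically to the unique fixed point; the independent `max` over actions in each state is also what makes the update amenable to the parallelization the authors describe.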
