A Modified Value Iteration Algorithm for Discounted Markov Decision Processes

A Modified Value Iteration Algorithm for Discounted Markov Decision Processes

Sanaa Chafik (Laboratory of Information Processing and Decision Support, University Sultan Moulay Slimane, Beni Mellal, Morocco) and Cherki Daoui (Laboratory of Information Processing and Decision Support, University Sultan Moulay Slimane, Beni Mellal, Morocco)
Copyright: © 2015 |Pages: 11
DOI: 10.4018/JECO.2015070104
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

As many real applications need a large amount of states, the classical methods are intractable for solving large Markov Decision Processes. The decomposition technique basing on the topology of each state in the associated graph and the parallelization technique are very useful methods to cope with this problem. In this paper, the authors propose a Modified Value Iteration algorithm, adding the parallelism technique. They test their implementation on artificial data using an Open MP that offers a significant speed-up.
Article Preview

2. Markov Decision Processes

2.1. Definition

We consider a stochastic dynamic system which is observed at discrete time points t = 1, 2,... . An MDP (Puterman, 2009) is defined by , where:

  • S={1, 2, ..., M} is the set of states of the environment (the state space), let is a random variable which represent the state of the system at time t whose values are in the state space S;

  • A is the set of actions (the action space);

  • T is the transition function (the probability of transitioning to state s’ when action a is executed in state s), where:

  • is the reward function (the immediate utility of executing the action a in state s), where: R: S × A → ℜ

A strategy is defined by a sequence where is a decision rule, that is a function where the set of all histories up to time t, and the set of probability distributions over . A Markov strategy is a strategy in which depends only on the current state at time t, a stationary strategy is a Markov strategy with identical decision rules, and a deterministic or pure strategy is a stationary strategy whose single decision rule is nonrandomized. In the following, we will assume the process stationary. For there to exist an optimal policy, there must exist an optimality criterion that places some kind of ordering on different policies.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 15: 4 Issues (2017)
Volume 14: 4 Issues (2016)
Volume 13: 4 Issues (2015)
Volume 12: 4 Issues (2014)
Volume 11: 4 Issues (2013)
Volume 10: 4 Issues (2012)
Volume 9: 4 Issues (2011)
Volume 8: 4 Issues (2010)
Volume 7: 4 Issues (2009)
Volume 6: 4 Issues (2008)
Volume 5: 4 Issues (2007)
Volume 4: 4 Issues (2006)
Volume 3: 4 Issues (2005)
Volume 2: 4 Issues (2004)
Volume 1: 4 Issues (2003)
View Complete Journal Contents Listing