An Introduction to Fully and Partially Observable Markov Decision Processes

An Introduction to Fully and Partially Observable Markov Decision Processes

Pascal Poupart (University of Waterloo, Canada)
DOI: 10.4018/978-1-60960-165-2.ch003


The goal of this chapter is to provide an introduction to Markov decision processes as a framework for sequential decision making under uncertainty. The aim of this introduction is to provide practitioners with a basic understanding of the common modeling and solution techniques. Hence, we will not delve into the details of the most recent algorithms, but rather focus on the main concepts and the issues that impact deployment in practice. More precisely, we will review fully and partially observable Markov decision processes, describe basic algorithms to find good policies and discuss modeling/computational issues that arise in practice.
Chapter Preview


A central goal of artificial intelligence is the design of automated systems that can robustly accomplish a task despite uncertainty. Such systems can be viewed abstractly as taking inputs from the environment and producing outputs toward the realization of some goals. An important problem is the design of good control policies that produce suitable outputs based on the inputs received. For instance, a thermostat is an automated system that regulates the temperature of a room by controlling a heating device based on information provided by heat sensors. For such a simple system, a reactive control policy can maintain the temperature more or less constant by turning on and off the heating device when the temperature is below or above some target. For more complicated systems, effective control policies are often much harder to design. Consider a system designed to assist elderly persons suffering from memory deficiencies. Memory loss can severely hamper the ability of a person to accomplish simple activities of daily living such as dressing, toileting, eating, taking medication, etc. An automated system could help a person regain some autonomy by guiding a person with some audio prompts that remind the person of the next steps in the course of an activity. Suppose the system is equipped with sensors (e.g., video-cameras and microphones) to monitor the user, and actuators (e.g., speakers) to communicate with the user. The design of a suitable prompting strategy is far from obvious. In particular, the information provided by the sensors tends to be inaccurate due to the noisy nature of image and sound processing. Furthermore, that information may be incomplete due to the limited scope of the sensors. For example, although cameras and microphones allow the system to observe movements and utterances made by a user, they do not reveal the intentions nor the state of mind of people. Ideally, if the system could read minds, the design of effective prompting strategies could be eased significantly. Instead, the system must infer the state of the user based on the limited and noisy information provided by sensors. The effects of actuators may also be quite uncertain. For example, users may not always follow the prompts depending on their mood, their physical or mental weariness, etc. The system should then have the ability to take into account this uncertainty in its strategy.

In summary, the design of an effective prompting strategy is complicated by uncertainty in the action effects and the sensor measurements, as well as the interdependencies induced by the sequential nature of the task. Many other tasks such as mobile robot navigation, spoken dialog management, resource allocation, maintenance scheduling, planning and design of experiments also include sequential decision making problems under uncertainty. Hence, the goal of this chapter is to introduce Markov decision processes as a principled and flexible framework to tackle sequential decision making problems under uncertainty.

Markov decision processes (MDPs) were initially formalized in Operations Research (Bellman, 1957) to optimize various tasks with a sequential nature and some uncertainty. The idea is to specify a task formally by modeling explicitly each component including the uncertainty. Once a model of the task is specified, a computer optimizes a policy to perform the task. In general, it is often easier to specify a model and let a computer optimize a policy instead of trying to manually specify a policy. Hence this chapter describes the components of an MDP and some algorithms that can be used to optimize a policy.

MDPs can be divided in two groups: fully observable MDPs (when there is uncertainty about the action effects only) and partially observable MDPs (when there is uncertainty about the action effects and sensor measurements). Following common practice in the literature, we will use the MDP acronym to refer to fully observable problems and POMDP (partially observable Markov decision process) to refer to partially observable problems. Naturally, POMDPs are more expressive since uncertainty in the sensor measurements allow us to model problems where the underlying state cannot be fully recognized, but as a result the algorithms to optimize the policy are more complicated.

This chapter is organized as follows. In Section 2, we describe the components of an MDP and the three main classes of algorithms that can be used to optimize a policy. Section 3 considers the more general POMDPs and describes the model components as well as the solution algorithms. Section 4 gives an overview of some of the issues to deploy MDP applications in practice. Finally we conclude in Section 5.

Complete Chapter List

Search this Book: