An Advance Q Learning (AQL) Approach for Path Planning and Obstacle Avoidance of a Mobile Robot

Arpita Chakraborty, Jyoti Sekhar Banerjee
Copyright: © 2013 |Pages: 21
DOI: 10.4018/ijimr.2013010105


The goal of this paper is to improve the performance of the well-known Q-learning algorithm, a robust machine-learning technique, to facilitate path planning in an environment. Until now, Q-learning algorithms such as the Classical Q-learning (CQL) algorithm and the Improved Q-learning (IQL) algorithm have dealt with obstacle-free environments, whereas in a real environment an agent faces obstacles very frequently. Hence this paper considers an environment with a number of obstacles and coins a new parameter, the 'immediate penalty', incurred due to collision with an obstacle. Further, the proposed technique replaces the scalar 'immediate reward' function with an 'effective immediate reward' function, which consists of two fuzzy parameters, 'immediate reward' and 'immediate penalty'. The fuzzification of these two important parameters not only improves the learning technique but also strikes a balance between exploration and exploitation, the most challenging problem in reinforcement learning. The proposed algorithm stores the Q-value for the best possible action at a state; it also saves significant path-planning time by suggesting the best action to adopt at each state to move to the next state. Eventually, the agent becomes more intelligent, as it can smartly plan a collision-free path, avoiding obstacles from a distance. The algorithm is validated through computer simulation in a maze-like environment and in real time on the Khepera II platform. The analysis reveals that the Q-table obtained by the proposed Advanced Q-learning (AQL) algorithm, when used for the path-planning application of mobile robots, outperforms classical and improved Q-learning.
Preliminaries Of Q Learning

Q-learning is basically a model-free reinforcement learning technique (Busoniu et al., 2010; Masoumzadeh et al., 2009), defined over a set of states S, a set of actions A, and a reward function R(S, A). In each state s ∈ S, the agent (Hsu et al., 2008; Zhou et al., 2007) takes an action a ∈ A. Upon taking the action, the agent receives a reward R(s, a) and reaches a new state s′. Q-learning (Cho et al., 2007; Pandey et al., 2010), which has been developed in several stages (Chen et al., 2009), is explained briefly in the following section.
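The interaction just described can be sketched in code. The following is a minimal toy illustration, not the paper's setup: a one-dimensional corridor of five states with a goal at one end; the state count, action set, and reward values are all assumptions chosen for the example.

```python
import random

# Toy environment: states 0..4 on a corridor, goal at state 4.
# States, actions, and rewards here are illustrative assumptions.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                # the action set A: step left or right

def step(s, a):
    """Apply action a in state s; return (next state s', reward R(s, a))."""
    s_next = min(max(s + a, 0), N_STATES - 1)   # clip to the corridor
    return s_next, (1.0 if s_next == GOAL else 0.0)

s = 2                             # current state s in S
a = random.choice(ACTIONS)        # the agent takes an action a in A
s_next, reward = step(s, a)       # it receives R(s, a) and reaches s'
```

One such (state, action, reward, next-state) step is the basic unit of experience that every Q-learning variant discussed below learns from.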

Classical Q-Learning (CQL)

In classical Q-learning, every possible state of an agent and its possible actions in a given state are deterministically known. In other words, for a given agent A, let s_0, s_1, s_2, ..., s_n be the n possible states, and let each state have m possible actions a_1, a_2, a_3, ..., a_m. At a particular state-action pair (s_i, a_j), the specific reward that the agent achieves is known as the immediate reward r(s_i, a_j) (shown in Figure 1). The agent selects its next state from its current state using a policy that attempts to maximize the cumulative reward that the agent could obtain in subsequent state transitions from its next state (Dean et al., 1993; Bellman, 1957; Watkins et al., 1992). For example, let the agent be in state s_i, expecting to select the next best state. Then the Q-value at state s_i due to action a_j is given in (1).

Q(s_i, a_j) = r(s_i, a_j) + γ · max_{a′} Q(δ(s_i, a_j), a′)  (1)

Figure 1. State-action pair with reward

Here δ(s_i, a_j) denotes the next state due to the selection of action a_j at state s_i. Let the next state selected be s_k, so δ(s_i, a_j) = s_k. Consequently, selecting the action a_j that maximizes Q(s_i, a_j) is an interesting problem. One main drawback of the above Q-learning is that the Q-value at a state s_i must be known for all possible actions a_j. As a result, the agent accesses the memory each time to get the Q-values of all possible actions at a particular state in order to determine the most appropriate next state, and so it consumes more time to select the next state. Since only the action a_j for which Q(s_i, a_j) is maximum needs to be evaluated, we can remodel the Q-learning equation by identifying the action a_j that drives the agent closer to the goal.
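A minimal tabular sketch of the classical update makes the drawback concrete. This is an illustrative toy, not the paper's environment or algorithm: states 0..4 on a corridor with an absorbing goal at state 4, a discount factor of 0.8, and a reward of 1 for reaching the goal are all assumptions of the example.

```python
# Toy classical Q-learning value sweep (illustrative assumptions only).
GAMMA = 0.8                       # discount factor γ
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]

def delta(s, a):
    """Next-state function δ(s, a): move one step, clipped to the corridor."""
    return min(max(s + a, 0), N_STATES - 1)

def r(s, a):
    """Immediate reward r(s, a): 1 on reaching the goal, else 0."""
    return 1.0 if s != GOAL and delta(s, a) == GOAL else 0.0

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

# Sweep the update Q(s,a) = r(s,a) + γ·max_a' Q(δ(s,a), a') until it settles.
# The inner max scans every action at the next state -- the repeated table
# lookup identified above as the main cost of classical Q-learning.
for _ in range(50):
    for s in range(N_STATES):
        if s == GOAL:             # terminal state: Q stays 0
            continue
        for a in ACTIONS:
            s_next = delta(s, a)
            Q[(s, a)] = r(s, a) + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)

# Caching only the best action per state avoids rescanning all actions
# at selection time -- the kind of saving the remodeled equation targets.
best_action = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

After the sweeps converge, every state's cached best action points toward the goal, and next-state selection becomes a single lookup instead of a scan over all m actions.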
