In complex multiagent systems, agents may hold different partial information about the system's state and about the information held by other agents. In distributed urban traffic control, where each junction has an independent controller, learning agents can benefit from exchanging information, but such an exchange is not always useful. In this chapter the authors analyze how agents can benefit from sharing information in an urban traffic control scenario and the consequences of this cooperation for the performance of the traffic system.
Urban traffic control (UTC) is an important and challenging real-world problem. It has several notable characteristics, related to its dynamics (changes in the environment are not only consequences of the agents' actions; some changes are beyond the agents' control), to non-determinism (each action may have more than one possible effect), and to partial observability (each agent perceives only a limited fraction of the current environment state).
Multiagent learning can be seen as a suitable tool for coping with the dynamicity of this scenario. Formalizing the control problem is an important part of the solution, and the theory of Markov Decision Processes (MDPs) has proven particularly powerful in that context. Defining the traffic control problem as a single MDP, i.e. in a centralized way, would lead to an intractable problem due to the large number of possible states. For instance, consider a scenario with six traffic lights, each with five possible states according to the congestion of its incoming links (streets): either all links have the same number of stopped vehicles, or the North, South, East, or West link has more waiting vehicles than the others. In this case, the number of possible states is 15,625 (5⁶) and, given that each traffic light has three possible actions, the number of possible joint actions is 729 (3⁶). The number of Q-values is therefore 11,390,625 (729 × 15,625). In a decentralized solution, traffic-controlling agents may have different partial information about the system state and about the information held by other agents in the system.
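The combinatorial explosion described above can be reproduced with a few lines of arithmetic; the sketch below simply recomputes the figures for the six-intersection example (the variable names are illustrative, not from the chapter):

```python
# Sketch: size of a centralized MDP formulation for the
# six-intersection example described above.
n_lights = 6
n_link_states = 5   # equal congestion, or N/S/E/W link most congested
n_actions = 3       # possible actions per traffic light

n_states = n_link_states ** n_lights        # joint states: 5^6
n_joint_actions = n_actions ** n_lights     # joint actions: 3^6
n_q_values = n_states * n_joint_actions     # entries in a joint Q-table

print(n_states, n_joint_actions, n_q_values)  # → 15625 729 11390625
```

Each additional intersection multiplies the joint state space by 5 and the joint action space by 3, which is why a centralized formulation quickly becomes intractable while per-junction agents keep a table of only 5 × 3 entries each.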
The distributed urban traffic control (DUTC) problem has some important characteristics to be considered: a large number of possible traffic pattern configurations, limited communication, limited observation, limited action frequency, and delayed reward information. The reward is delayed because the traffic flow takes some time to respond to an agent's actions; this time is at least the duration of one signal cycle.
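A minimal sketch of how these characteristics shape a per-junction learner is given below. It is a plain tabular Q-learning agent, not the chapter's own formulation: the state labels, action set, and reward signal are all illustrative assumptions, and the `update` call is meant to fire only once per signal cycle, reflecting the delayed reward described above.

```python
import random

class JunctionAgent:
    """Illustrative tabular Q-learning controller for one junction.

    Each agent sees only its local (partial) observation of incoming
    links and is rewarded after a full signal cycle, e.g. with the
    negative number of vehicles still queued at its approaches.
    """

    def __init__(self, states, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {(s, a): 0.0 for s in states for a in actions}
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # Epsilon-greedy selection over the local observation only;
        # action frequency is limited to once per cycle.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Called once per cycle: the reward is observed only after the
        # cycle has elapsed, i.e. at least one cycle after acting.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] += self.alpha * (
            reward + self.gamma * best_next - self.q[(state, action)])
```

In a full simulation, one such agent would run at every junction, and the information-sharing question studied in the chapter amounts to whether and when agents should exchange parts of their local observations or learned values.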
In this chapter we explore questions about the information shared among learning agents in a traffic scenario. We also discuss multiagent reinforcement learning as a solution to the traffic control problem, its current limitations, and further developments.