This description is based on the identification of a set of generic components that can be found in any learning by imitation architecture. It highlights the main contribution of the proposed architecture: the use of an inner human model to help perceive, recognize and learn human gestures. This allows different robots to share the same perceptual and knowledge modules. Experimental results show that the proposed architecture is able to meet the requirements of learning by imitation scenarios. It can also be integrated into complete software architectures for social robots, which involve complex attention mechanisms and decision layers.
Robots have been used extensively in industrial environments for the last fifty years. Industrial robots are designed to perform repetitive, predictable tasks, but are not able to easily adapt or learn new behaviours (Craig, 1986). In order to execute their programmed tasks, they have to sense only a constrained set of environmental parameters; thus, the perceptual systems mounted on industrial robots are simple, practical and task-oriented. On the other hand, they are designed to work in environments in which human presence, if allowed at all, is limited and controlled. Thus, while their usefulness is evident, industrial robots are strongly limited. In order to remove these limitations, a new generation of robots began to appear more than thirty-five years ago (Inoue, Tachi, Nakamura, Hirai, Ohyu, Hirai, Tanie, Yokoi & Hirukawa, 2001). These robots were designed to cooperate with people in everyday activities, to adapt to uncontrolled environments and new tasks, and to become engaging companions for people to interact with. They usually benefit from sharing human perceptual and motor capabilities, and thus the term humanoid robot was used to name these agents. In the last decade, however, the difficulties of creating robots that resemble human beings have favoured the use of the more generic term social robot. Thus, today it is accepted that, although humanoid robots are certainly designed to be social, social robots do not need to be humanoid.
According to an early definition of social robot (Dautenhahn & Billard, 1999), social robots are agents designed to be part of a heterogeneous group. They should be able to recognize, explicitly communicate with and learn from other individuals in this group. They also possess history (i.e. they sense and interpret their environment in terms of their own experience). While this is a generic definition, in practice social robots are designed to work in human societies. Thus, later definitions of social robots present them as agents that have to interact with people (Breazeal, Brooks, Gray, Hancher, McBean, Stiehl & Strickon, 2003). In this chapter the same ideas are followed, and social robots are understood as “robots that work in real social environments, and that are able to perceive, interact with and learn from other individuals, being these individuals people or other social agents” (Bandera, 2010, p. 9).
Social robots have different options to achieve learning. Individual learning mechanisms (e.g. trial-and-error, imprinting, classical conditioning, etc.) are one of these options. However, their application to a social robot may lead it to learn incorrect, disturbing or even dangerous behaviours. Thus, they should be restricted to specific scenarios and tasks (e.g. games based on controlled stigmergy) (Breazeal et al., 2003; Bandera, 2010). Social learning mechanisms are a different option, which allow a human teacher to supervise the learning process, avoiding most of the issues of individual learning. Among the different social learning strategies, learning by imitation appears as one of the most intuitive and powerful.
This chapter describes a Robot Learning by Imitation (RLbI) architecture that provides a social robot with the ability to learn and imitate upper-body social gestures. This architecture, which is the main topic of the first author's Thesis (Bandera, 2010), uses an interface based on a pair of stereo cameras, and a model-based perception component to capture human movements from input image data. Perceived human motion is segmented into discrete gestures and represented using features. These features are subsequently employed to recognize and learn gestures. One of the main differences of this proposal with respect to previous approaches is that all these processes are executed in the human motion space, not in the robot motion space. This strategy avoids constraining the perceptual capabilities of the robot due to its physical limitations. It also eases sharing knowledge among different robots. Only when the social robot needs to perform physical imitation is a translation module used, which combines different strategies to produce valid robot motion from learned human gestures.
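The pipeline described above can be sketched in code. The following is a minimal, illustrative Python sketch, not the architecture's actual implementation: all function names (`segment`, `extract_features`, `recognize`, `translate_to_robot`), the pause-based segmentation rule, the mean-pose features and the nearest-neighbour matcher are assumptions chosen for brevity. It only shows the structural point made in the text: segmentation, representation and recognition operate in the human motion space, and translation into robot motion happens only when physical imitation is required.

```python
from dataclasses import dataclass

@dataclass
class Gesture:
    """A learned gesture: a label plus a feature vector in HUMAN motion space."""
    label: str
    features: list

def segment(motion_stream):
    """Split a continuous stream of human poses into discrete gestures.
    Toy rule (an assumption): a pause, encoded as None, marks a boundary."""
    current = []
    for pose in motion_stream:
        if pose is None:
            if current:
                yield current
                current = []
        else:
            current.append(pose)
    if current:
        yield current

def extract_features(poses):
    """Represent a gesture by simple features (here, the mean of each coordinate)."""
    n = len(poses)
    return [sum(p[i] for p in poses) / n for i in range(len(poses[0]))]

def recognize(features, knowledge):
    """Nearest-neighbour match against learned gestures, still in human space."""
    best, best_d = None, float("inf")
    for g in knowledge:
        d = sum((a - b) ** 2 for a, b in zip(features, g.features))
        if d < best_d:
            best, best_d = g, d
    return best

def translate_to_robot(features, joint_limits):
    """Map human-space values onto the robot's joints, clamping to its limits.
    Invoked only when physical imitation is actually required."""
    return [max(lo, min(hi, f)) for f, (lo, hi) in zip(features, joint_limits)]

# Usage: a stream of 2-D "poses" with a pause separating two gestures.
knowledge = [Gesture("wave", [1.0, 0.0]), Gesture("point", [0.0, 1.0])]
stream = [(0.9, 0.1), (1.1, -0.1), None, (0.1, 0.9), (-0.1, 1.1)]
for poses in segment(stream):
    print(recognize(extract_features(poses), knowledge).label)
```

Because recognition happens before any robot-specific mapping, the `knowledge` list could be shared verbatim between robots with different bodies; only `translate_to_robot` would differ per platform, which is the knowledge-sharing benefit the text highlights.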