1. Introduction
How do we know what to do? How can we be sure that we are accessing the right knowledge or performing the most appropriate behavior to be successful? These are central questions in human psychology, as well as in attempts to apply an understanding of human psychology in cognitive informatics, cognitive computing, and artificial systems using natural intelligence (Ferrucci et al., 2010; Georgeff et al., 1999; Lenat et al., 1990; Tian, Wang, Gavrilova, & Ruhe, 2011; Wang, 2007, 2009; Wang & Ruhe, 2007). Humans and artificial systems can possess large amounts of knowledge and have the capability to execute a multitude of behaviors. But they need ways to filter and select this knowledge and skill in a manner that optimizes successful decisions and behaviors. They also need ways to evolve and refine this knowledge and skill in a manner that adapts to changing environments and stimuli.
Attempts to answer these questions go back to the very start of psychological science. The initial answers came from research in behavioral psychology. Thorndike (1913) formulated the law of effect: behaviors that lead to successful outcomes are strengthened, and behaviors that lead to unsuccessful outcomes are weakened. The law of effect was further formalized by subsequent research on reinforcement in classical (Pavlov, 1927) and operant conditioning theory (Ferster & Skinner, 1957). Essentially, reinforcement theory says that future behavior is best predicted by those behaviors that have successfully led to reward or goal attainment in the past; that is, by reinforcement history.
Reinforcement processes have been incorporated into models of human cognition through expectancy theories (Wigfield, Tonks, & Eccles, 2004; Bandura, 1997). Reinforcement and reward processes also have been identified in neural learning (Kandel et al., 2012) and brain decision making (Deco, Rolls, Albantakis, & Romo, 2013). In computational psychology and cognitive informatics, reinforcement is the basis for the Rescorla-Wagner rule (Rescorla & Wagner, 1972) which is the foundation for the widely implemented delta rule for back propagation learning in neural networks (Jacobs, 1988). Reinforcement learning mechanisms also are widely implemented in machine learning (Mohri, Rostamizadeh, & Talwalkar, 2012) and in agent reasoning and decision making (Busoniu, Babuska, & De Schutter, 2008). Reinforcement mechanisms play a prominent role in decision theory models. Common choice mechanisms in decision making, including both utility theories and Bayesian loss minimization theories, rely on reward or reinforcement values and learning of expected reinforcers for various options (see Wang & Ruhe, 2007).
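The Rescorla-Wagner rule mentioned above updates the associative strength of a stimulus in proportion to the prediction error, ΔV = αβ(λ − V), which is the same error-driven form as the delta rule used in neural network learning. The sketch below is a minimal illustration of that update over repeated conditioning trials; the function name and the default parameter values (α = 0.3, β = 1.0, asymptote λ = 1.0) are illustrative assumptions, not values from the source.

```python
def rescorla_wagner_trials(n_trials, lam=1.0, alpha=0.3, beta=1.0):
    """Simulate acquisition under the Rescorla-Wagner rule for one stimulus.

    Illustrative sketch: alpha/beta are salience/learning-rate parameters,
    lam is the asymptote of learning set by the reinforcer.
    """
    V = 0.0          # associative strength starts at zero
    history = []
    for _ in range(n_trials):
        V += alpha * beta * (lam - V)  # delta rule: error-driven update
        history.append(V)
    return history

# Associative strength rises quickly at first, then levels off near lam,
# reproducing the classic negatively accelerated learning curve.
curve = rescorla_wagner_trials(20)
```

Because each update is proportional to the remaining prediction error (λ − V), learning is fast when the outcome is surprising and slows as the outcome becomes fully predicted, which is the property that made the rule influential in both conditioning research and machine learning.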
Despite this impressive history and success, even early psychological research suggested that reinforcement alone could not fully explain learning and behavioral processes. Along with the law of effect, Thorndike (1913) also observed that, irrespective of success or failure, organisms seemed to learn and behave based on cumulative repetition; what he called the law of exercise. The more often an organism performed a behavior, the more likely it was to do it again; the less often it performed a behavior, the less likely it was to do it again. The law of exercise forms the basis for Hebb’s rule of neural plasticity: associations between neurons that fire together are strengthened, and associations between neurons that don’t fire together are weakened (Hebb, 1949). Reinforcement is tied to repetition and Hebbian neural plasticity because it is one mechanism through which repetitions occur. But the law of exercise and Hebbian learning occur from repetition even when there is no reinforcement or reward.
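The contrast with the error-driven Rescorla-Wagner update can be made concrete: a Hebbian update contains no reward or prediction-error term at all, only co-activity. The sketch below is a minimal, assumed implementation of the strengthening half of Hebb's rule (the weakening of non-co-firing associations is omitted for brevity); the function name and learning rate are illustrative.

```python
import numpy as np

def hebbian_train(patterns, eta=0.1):
    """Strengthen connections between co-active units (Hebb, 1949).

    Illustrative sketch: each repetition of a pattern increments the
    weight between every pair of units active in it. Note there is no
    reward signal anywhere in the update; repetition alone drives it.
    """
    n = len(patterns[0])
    w = np.zeros((n, n))
    for x in patterns:
        x = np.asarray(x, dtype=float)
        w += eta * np.outer(x, x)   # "fire together, wire together"
        np.fill_diagonal(w, 0.0)    # no self-connections
    return w

# Five repetitions of a pattern in which units 0 and 1 co-fire:
# their connection grows with each repetition, while connections
# to the silent unit 2 stay at zero.
w = hebbian_train([[1, 1, 0]] * 5)
```

The key point the example illustrates is the one the paragraph makes: the weight between units 0 and 1 grows purely as a function of how often they fire together, exactly the cumulative-repetition mechanism of the law of exercise.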