The optimization of a cost function which has a number of local minima is a relevant subject in many important fields. For instance, the determination of the weights of learning machines depends in general on the solution of global optimization tasks (Haykin, 1999). A feature shared by almost all of the most common deterministic and stochastic algorithms for continuous non – linear optimization is that their performance is strongly affected by their starting conditions. Depending on the algorithm, the correct selection of an initial point or set of points have direct consequences on the efficiency, or even on the possibility to find the global minima. Of course, adequate selection of seeds implies prior knowledge on the structure of the optimization task. In the absence of prior information, a natural choice is to draw seeds from a uniform density defined over the search space. Knowledge on the problem can be gained through the exploration of this space. In this contribution is presented a method to estimate probability densities that describe the asymptotic behavior of general stochastic search processes over continuously differentiable cost functions. The relevance of such densities is that they give a description of the residence times over the different regions of the search space, after an infinitely long exploration. The preferred regions are those which minimize the cost globally, which is reflected in the asymptotic densities. In first instance, the resulting densities can be used to draw populations of points that are consistent with the global properties of the associated optimization tasks.
Stochastic strategies for optimization are essential to most of the heuristic techniques used to deal with complex, unstructured global optimization problems (Pardalos, 2004). The roots of such methods can be traced back to the Metropolis algorithm (Metropolis, Rosenbluth, Rosenbluth, Teller & Teller, 1953), introduced in the early days of scientific computing to simulate the evolution of a physical system to thermal equilibrium. This process is the base of the simulated annealing technique (Kirkpatrick, Gellat & Vecchi, 1983), which makes use of the convergence to a global minimum in configurational energy observed in physical systems at thermal equilibrium as the temperature goes to zero.
The method presented in this contribution is rooted in similar physical principles as those on which simulated annealing type algorithms are based. However, in contrast with other approaches (Suykens, Verrelst & Vandewalle, 1998) (Gidas, 1995) (Parpas, Rustem & Pistikopoulos, 2006), the proposed method considers a density of points instead of Markov transitions of individual points. The technique is based in the interplay between Langevin and Fokker – Planck frameworks for stochastic processes, which is well known in the study of out of equilibrium physical systems (Risken, 1984) (Van Kampen, 1992). Fokker - Planck equation has been already proposed for its application in search algorithms, in several contexts. For instance, it has been used to directly study the convergence of populations of points to global minima (Suykens, Verrelst & Vandewalle, 1998), as a tool to demonstrate the convergence of simulated annealing type algorithms (Parpas, Rustem & Pistikopoulos, 2006) (Geman & Hwang, 1986), or as a theoretical framework for Boltzmann type learning machines (Movellan & McClelland, 1993) (Kosmatopoulos & Christodoulou, 1994). In the context of global optimization by populations of points, it has been proposed that the populations evolve under the time – dependent version of the Fokker – Planck equation, following a schedule for the reduction of the diffusion constant D (Suykens, Verrelst & Vandewalle, 1998).
Key Terms in this Chapter
Gibbs Sampler: A procedure to sample the marginal densities from a high dimensional distribution using one dimensional conditional probabilities.
Random Deviate Generation Process: A process which generates random numbers according to a specific probability distribution.
Configurational Energy: Refers to the potential energy associated with the various forces within the elements of a physical system.
Search Space: This is the set of all the feasible solutions for an optimization problem.
Learning Machine: This term refers to the development of techniques for automatic extraction of patterns from massive data sets and to the construction of deductive rules. In the context of this article, this concept deals with the automatic learning of densities in global optimization problems
Stochastic Search: Is an optimization algorithm which incorporate randomness in its exploration of the search space.
Thermal Equilibrium: State in which a physical system is described by probability measures that are time independent.
Diffusion Process: Random displacement of particles in a physical system due to the action of a temperature.
Heuristic: Is any algorithm that finds a good quality solution to a problem in a reasonable run time.
Diffusion Constant: Measures the degree of randomness in a diffusion process. The diffusion constant is proportional to the mean square distance moved by particles under diffusion in a given time interval.