Paradox-Free Data Analysis in Numerical Comparisons of Optimization Algorithms: A Review and Case Study
Numerical comparison is essential for evaluating the performance of optimization algorithms (Mersmann et al., 2015; Miettinen et al., 2003; Schittkowski et al., 1994; Simon et al., 2011), especially for evolutionary algorithms, heuristic algorithms, and swarm-based optimization algorithms, partly because theoretical analyses of such algorithms are often unavailable (Dolan & Moré, 2002; Halim et al., 2021; Hamacher, 2014; Memari & Ahmad, 2017; Osaba et al., 2021; Carrasco et al., 2020). A numerical comparison typically consists of three steps (Halim et al., 2021; Osaba et al., 2021; Schittkowski et al., 1994):
1. Choosing the algorithms to be compared and selecting proper test problems.
2. Conducting numerical experiments and gathering the needed data.
3. Analyzing the data and interpreting the results.
Over the past 20 years, significant progress has been made in designing benchmark problems (Gaviano et al., 2003; Hansen et al., 2010; Škvorc et al., 2019a) and establishing data analysis methods (Bartz-Beielstein, 2005; Dolan & Moré, 2002; Rosner et al., 2006). These developments have made algorithm competitions possible and popular, for instance the algorithm competition of the IEEE Congress on Evolutionary Computation (CEC) (García-Martínez et al., 2017; Škvorc et al., 2019a) and the BBOB competition of the Genetic and Evolutionary Computation Conference (GECCO) (Finck et al., 2010; Škvorc et al., 2019b).
Unfortunately, recent research has demonstrated that two paradoxes may occur in numerical comparisons of optimization algorithms (Liu et al., 2020). Furthermore, the key reason for these paradoxes lies in the comparison strategy behind the data analysis method (Liu et al., 2020). Specifically, for data analysis methods adopting the “C2” strategy (pairwise comparisons), the ranking result may form a loop without producing a winner, which is called the cycle ranking paradox. On the other hand, for data analysis methods adopting the “C2+” strategy (comparing all algorithms simultaneously), the winner may differ from that of the C2 strategy; this phenomenon is called the survival of the nonfittest paradox. According to Liu et al.’s (2020) calculation, the occurrence probability of the cycle ranking paradox is about 9% when only three algorithms are compared, and it grows as the number of algorithms increases; when nine algorithms are compared, it exceeds 45%. These results indicate that the impact of paradoxes must be taken seriously (Liu et al., 2020).
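To see how the cycle ranking paradox can arise under the C2 strategy, consider the following minimal sketch. The data are hypothetical (nine test problems and three algorithms A, B, and C, with smaller objective values taken as better) and are not taken from Liu et al. (2020); the sketch only illustrates that pairwise win counts can form a loop with no overall winner.

```python
# Minimal sketch (hypothetical data) of the cycle ranking paradox under a
# C2-style (pairwise) comparison. Smaller objective value = better result.
# Each list holds one algorithm's results on nine test problems.
results = {
    "A": [1, 1, 1, 3, 3, 3, 2, 2, 2],
    "B": [2, 2, 2, 1, 1, 1, 3, 3, 3],
    "C": [3, 3, 3, 2, 2, 2, 1, 1, 1],
}

def pairwise_wins(x, y):
    """Count the problems on which algorithm x outperforms algorithm y."""
    return sum(1 for a, b in zip(results[x], results[y]) if a < b)

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"{x} beats {y} on {pairwise_wins(x, y)} of {len(results[x])} problems")

# Output:
#   A beats B on 6 of 9 problems
#   B beats C on 6 of 9 problems
#   C beats A on 6 of 9 problems
# Each algorithm wins a majority of its pairwise comparisons against the next
# one, so the pairwise ranking forms the loop A > B > C > A and the C2
# strategy cannot declare a winner.
```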