Introduction
Speaker recognition is the process of automatically recognizing a person from his or her voice, using speaker-specific information contained in speech waves (Huang, Acero, Hon, & Reddy, 2001; He & Deng, 2008). This technology is widely used in real-world applications such as access control, telephone applications, PC logins, and door control systems. Speaker recognition has been studied extensively in recent decades, but none of the proposed approaches matches the human mind in speed and accuracy.
Neuro-fuzzy modeling is an alternative method for speaker recognition. It combines two intelligent approaches, neural networks and fuzzy logic, to pair the discriminative power of the former with the reasoning and deduction ability of the latter. The model can be trained like a neural network while its variables keep a linguistic interpretation through fuzzy logic (Bojadziev & Bojadziev, 1995). Both approaches encode information in a parallel, distributed architecture within a numerical framework. Several architectures have been proposed, depending on the type of rules they include, Mamdani's or Sugeno's (Abraham, 2001; Kosko, 1991). The Adaptive Network-based Fuzzy Inference System (ANFIS), proposed by Jyh-Shing Roger Jang (Jang, 1993), is one of the most influential fuzzy models and is widely used (Fatemeh & Zahra, 2018; Gunasekaran, Varatharajan, & Priyan, 2018; Razavi Termeh, Kornejady, Pourghasemi, & Keesstra, 2018). The rule base of this model contains fuzzy if-then rules of the Takagi-Sugeno type, in which the conclusion parts are linear functions of the inputs rather than fuzzy sets, thus reducing the number of required fuzzy rules.
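To make the Takagi-Sugeno rule form concrete, the following minimal sketch evaluates a first-order Sugeno rule base. It is an illustration under stated assumptions, not the implementation used in this paper: Gaussian membership functions, the product t-norm for rule firing strengths, and the names gaussian_mf and sugeno_predict are all choices made here for the example.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree (assumed membership shape for this sketch)."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_predict(x, centers, sigmas, consequents):
    """Evaluate a first-order Takagi-Sugeno rule base on one input vector.

    x           : (n_inputs,) input vector
    centers     : (n_rules, n_inputs) premise membership centers
    sigmas      : (n_rules, n_inputs) premise membership widths
    consequents : (n_rules, n_inputs + 1) linear coefficients, bias in last column
    """
    x = np.asarray(x, dtype=float)
    # Premise part: membership degree of each input in each rule's fuzzy set.
    memberships = gaussian_mf(x, centers, sigmas)        # (n_rules, n_inputs)
    # Firing strength of each rule: product of its membership degrees.
    firing = memberships.prod(axis=1)                    # (n_rules,)
    weights = firing / firing.sum()                      # normalized strengths
    # Conclusion part: each rule outputs a linear function of the inputs.
    rule_outputs = consequents[:, :-1] @ x + consequents[:, -1]
    # Overall output: firing-strength-weighted average of the rule outputs.
    return float(weights @ rule_outputs)

# Two hypothetical rules on one input:
#   if x is near 0 then y = x ; if x is near 1 then y = 2
centers = np.array([[0.0], [1.0]])
sigmas = np.array([[1.0], [1.0]])
consequents = np.array([[1.0, 0.0], [0.0, 2.0]])
y_mid = sugeno_predict([0.5], centers, sigmas, consequents)
```

At x = 0.5 both rules fire equally, so the output is the plain average of the two rule outputs (0.5 and 2). Because each conclusion is already a linear function, a single rule can cover a region that would otherwise need several rules with fuzzy-set conclusions.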
Fuzzy model building involves two important phases: structure identification, which determines the number of fuzzy if-then rules and the membership functions of the premise fuzzy sets, and optimization of the resulting parameters (Jang, Sun, & Mizutani, 1997). Parameter optimization is one of the main issues in ANFIS training. Classical learning is based on gradient descent, which tends to fall into poor local minima because it searches only a reduced region in the neighborhood of a random initial solution that is not always suitable.
In this work, we propose an alternative training approach, particle swarm optimization (PSO), to optimize the ANFIS parameters more efficiently than the gradient method. The PSO algorithm looks for solutions in a larger search space that depends on the number of randomly generated initial solutions, called particles, which together form the swarm. PSO is usually time-consuming to execute: the time required for good convergence depends on the number of particles and iterations, which in turn depend on the number of parameters to be optimized. To reduce the cost of training, we propose to apply PSO only to the premise part of the rules and to use least squares estimation (LSE) on the conclusion part. The proposed learning algorithm is tested on the CHAINS data set for speaker recognition. The obtained results are compared with those of the ANFIS trained by the gradient approach.
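The hybrid scheme described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the Gaussian membership functions, the inertia and acceleration constants, the swarm size, the toy regression target, and every function name (firing_weights, fit_consequents, fitness, pso_lse_train) are hypothetical choices for the example. PSO searches only the premise parameters (centers and widths), and for each candidate the conclusion parameters are obtained in closed form by least squares.

```python
import numpy as np

def firing_weights(X, centers, sigmas):
    """Normalized rule firing strengths for all samples, shape (n_samples, n_rules)."""
    diff = (X[:, None, :] - centers[None, :, :]) / sigmas[None, :, :]
    firing = np.exp(-0.5 * (diff ** 2).sum(axis=2))   # product of Gaussian memberships
    return firing / (firing.sum(axis=1, keepdims=True) + 1e-12)

def fit_consequents(X, y, weights):
    """Least squares estimate of the linear conclusion parameters."""
    n_samples = X.shape[0]
    X1 = np.hstack([X, np.ones((n_samples, 1))])      # inputs plus a bias column
    # Each rule contributes its weight times [x, 1] to the design matrix.
    A = (weights[:, :, None] * X1[:, None, :]).reshape(n_samples, -1)
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta, A

def unpack(position, n_rules, n_inputs):
    """Split a particle position into premise centers and positive widths."""
    half = n_rules * n_inputs
    centers = position[:half].reshape(n_rules, n_inputs)
    sigmas = np.abs(position[half:]).reshape(n_rules, n_inputs) + 1e-3
    return centers, sigmas

def fitness(position, X, y, n_rules):
    """Mean squared error after solving the conclusion part by least squares."""
    centers, sigmas = unpack(position, n_rules, X.shape[1])
    theta, A = fit_consequents(X, y, firing_weights(X, centers, sigmas))
    return float(np.mean((A @ theta - y) ** 2))

def pso_lse_train(X, y, n_rules=3, n_particles=20, n_iters=50,
                  inertia=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = 2 * n_rules * X.shape[1]                    # centers + widths only
    pos = rng.uniform(X.min(), X.max(), size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                             # personal bests
    best_cost = np.array([fitness(p, X, y, n_rules) for p in pos])
    g = best_cost.argmin()
    gbest_pos, gbest_cost = best_pos[g].copy(), best_cost[g]
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard velocity update: inertia + cognitive pull + social pull.
        vel = inertia * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = pos + vel
        cost = np.array([fitness(p, X, y, n_rules) for p in pos])
        improved = cost < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = cost[improved]
        g = best_cost.argmin()
        if best_cost[g] < gbest_cost:
            gbest_pos, gbest_cost = best_pos[g].copy(), best_cost[g]
    return gbest_pos, float(gbest_cost)

# Toy regression target, used only to exercise the sketch (not speaker data).
X_demo = np.linspace(-2.0, 2.0, 40)[:, None]
y_demo = np.sin(X_demo[:, 0])
demo_pos, demo_cost = pso_lse_train(X_demo, y_demo, n_particles=10, n_iters=20)
```

Restricting the swarm to the premise parameters keeps the search dimension at 2 x rules x inputs instead of also including the (inputs + 1) x rules conclusion coefficients, which is the cost reduction the hybrid scheme aims for.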
The rest of the paper is organized as follows. Section 2 reviews the related literature on speaker recognition. Section 3 describes the ANFIS model and the PSO algorithm, then explains the development of the PSO-ANFIS model for speaker recognition. Section 4 details the experimental results and their discussion. Finally, Section 5 concludes the paper.