Study of Sensitive Parameters of PSO Application to Clustering of Texts

Reda Mohamed Hamou (Department of Mathematics and Computer Science, Dr. Moulay Tahar University of Saïda, Saïda, Algeria), Abdelmalek Amine (Department of Mathematics and Computer Science, Dr. Moulay Tahar University of Saïda, Saïda, Algeria) and Ahmed Chaouki Lokbani (Department of Mathematics and Computer Science, Dr. Moulay Tahar University of Saïda, Saïda, Algeria)
Copyright: © 2013 | Pages: 15
DOI: 10.4018/jaec.2013040104

Abstract

In this paper, the authors study the parameter sensitivity of particle swarm optimization (PSO) applied to the clustering of data, in particular text. They tested each PSO parameter by varying it within a search range and recorded the best clustering result according to three evaluation measures: one internal measure, the Davies-Bouldin index, and two external measures based on recall and precision, namely the F-measure and entropy. Each time the experiments on one parameter were finished, that parameter was fixed at its optimal value for the experiments on the remaining parameters. The results showed that the clustering result is highly sensitive to some of the parameters.
Article Preview

Representation Of Textual Documents

Texts in natural language cannot be directly interpreted by a classifier or by classification algorithms, hence the need for a mathematical representation of the text on which analytical processing can be performed while preserving as much of the semantics as possible. The representation generally used takes a vector space as the target representation space. The main feature of the vector representation is that each linguistic unit is associated with a specific dimension of the vector space. Two texts using the same textual segments are therefore projected onto identical vectors.
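The vector-space idea can be illustrated with a minimal sketch. It assumes whitespace-separated lowercase words as the linguistic units and raw term counts as the coordinates; the helper names `build_vocabulary` and `to_vector` are hypothetical and are not taken from the article.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct lowercase word to one dimension (an index)."""
    words = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(words)}


def to_vector(text, vocab):
    """Project a text onto the vocabulary as a list of term counts."""
    counts = Counter(text.lower().split())
    # The dict preserves insertion order, so position i matches vocab[tok] == i.
    return [counts.get(tok, 0) for tok in vocab]


# Illustrative texts (assumed data, not from the article).
texts = ["the cat chased the dog", "the dog chased the cat", "a bird sings"]
vocab = build_vocabulary(texts)
vectors = [to_vector(t, vocab) for t in texts]
```

In this sketch the first two texts contain the same words in a different order, so they receive the same vector, which is the behaviour the paragraph describes: word order is lost once the text is projected into the vector space.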

Several approaches to the representation of texts exist in the literature, among them the bag-of-words representation, which is the simplest and most widely used approach; the "bag of phrases" representation; the representation by lexical roots; and the representation by n-grams, which is independent of the natural language (Shannon, 1948).
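As a rough illustration of the n-gram representation, the sketch below counts overlapping character n-grams. It assumes character-level n-grams with n = 3, lowercasing, and collapsed whitespace; these choices and the function name `char_ngrams` are illustrative assumptions, not the settings used in the article.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after simple normalization."""
    normalized = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )


profile = char_ngrams("Text mining", n=3)
```

Because the function slides a window over raw characters, it needs no tokenizer, stop-word list, or stemmer for a given language, which is one way to read the claim that n-grams are independent of the natural language.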
