An Optimal Configuration of Sensitive Parameters of PSO Applied to Textual Clustering

Reda Mohamed Hamou, Abdelmalek Amine, Mohamed Amine Boudia, Ahmed Chaouki Lokbani

Source Title: Exploring Critical Approaches of Evolutionary Computation

DOI: 10.4018/978-1-5225-5832-3.ch010

Abstract

Clustering aims to minimize the intra-class distance within each cluster and to maximize the inter-class distance between clusters. Text clustering is a very hard task that is generally solved with metaheuristics. The current literature offers two major metaheuristic approaches: neighborhood metaheuristics and population metaheuristics. In this chapter, the authors seek the optimal configuration of the sensitive parameters of the PSO algorithm applied to textual clustering. The study proceeds in several steps, namely the representation and indexing of textual documents, clustering by a biomimetic approach, optimization by PSO, the study of the parameter sensitivity of the optimization technique, and the improvement of clustering. The authors test several parameter values and keep the configurations that return the best clustering results. They use widely used evaluation measures: one internal measure, the Davies-Bouldin index, and two external measures, the F-measure and entropy, which are based on recall and precision.

Chapter Preview

Introduction

Currently, because the amount of electronic textual information is growing exponentially, a major problem for computer scientists is accessing the content of that information. This requires more specific tools to access and sift through the content of texts faster and more effectively.

Text mining aims to develop new and effective algorithms for processing, searching, and extracting knowledge from unstructured textual documents. One widely used technique is clustering.

Nature is a source of inspiration for researchers in various fields. These inspirations offer a natural framework for solving problems in a flexible and adaptive way. Swarm intelligence is a relatively recent field of interdisciplinary research.

We are interested in studying algorithms that rely on the specific movements of a swarm of agents to solve a problem. We chose the particle swarm optimization (PSO) algorithm, which uses a set of particles characterized by their position and velocity to optimize one or more fitness functions in a search space. This algorithm was initially proposed as a metaheuristic for solving optimization problems.
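
The position and velocity mechanics described above can be illustrated with a minimal sketch. It assumes the standard global-best PSO with an inertia weight; the preview does not show which PSO variant the chapter uses, so the function name `pso`, the parameter names (`w`, `c1`, `c2`), their default values, and the `sphere` test function are all illustrative assumptions rather than the authors' configuration.

```python
import random


def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` with a basic global-best PSO (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Each particle has a position and a velocity in the search space.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests and the swarm-wide (global) best found so far.
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and keep it inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy fitness function (assumption): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=3)
```

In this sketch, `w`, `c1`, and `c2` are the kind of sensitive parameters whose values a sensitivity study would vary while watching the quality of the result.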

In this chapter, we perform textual clustering by applying the PSO algorithm to a multi-objective optimization problem (minimizing the intra-class distance and maximizing the inter-class distance) and study the sensitivity of the PSO parameters in order to improve the quality of the textual clustering.
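
As a rough illustration of the two objectives, the sketch below scores a set of candidate centroids by the ratio of mean intra-cluster distance to mean inter-centroid distance, so that a lower score is better. The preview does not say how the chapter combines the two objectives; the ratio, the Euclidean distance, and every function name here are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(docs, centroids):
    """Label each document with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: euclidean(d, centroids[k]))
            for d in docs]


def intra_distance(docs, centroids, labels):
    """Mean distance from each document to its own centroid (to be minimized)."""
    return sum(euclidean(d, centroids[l]) for d, l in zip(docs, labels)) / len(docs)


def inter_distance(centroids):
    """Mean pairwise distance between centroids (to be maximized); needs k >= 2."""
    pairs = [(i, j) for i in range(len(centroids))
             for j in range(i + 1, len(centroids))]
    return sum(euclidean(centroids[i], centroids[j]) for i, j in pairs) / len(pairs)


def clustering_fitness(docs, centroids):
    """Scalarized objective (assumption): intra / inter, lower is better."""
    labels = assign(docs, centroids)
    return intra_distance(docs, centroids, labels) / (inter_distance(centroids) + 1e-12)


# Toy document vectors forming two obvious groups.
docs = [[0, 0], [0, 1], [10, 10], [10, 11]]
good = [[0, 0.5], [10, 10.5]]   # centroids placed inside each group
bad = [[5, 5], [5, 6]]          # centroids placed between the groups

good_score = clustering_fitness(docs, good)
bad_score = clustering_fitness(docs, bad)
```

With these toy vectors, `good_score` is smaller than `bad_score`, which is the ordering an optimizer minimizing this function would exploit.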

The study proceeds through the following steps:

1. The representation and indexing of textual documents
2. Clustering by a biomimetic approach
3. Optimization by PSO
4. Study of parameter sensitivity

Representation of Textual Documents

Machine learning algorithms cannot directly process unstructured data such as images, video, and, of course, texts written in natural language. We are therefore obliged to go through an indexing step.

The indexing step represents each text as a vector in which each entry corresponds to a different word and the value at that entry corresponds to how many times that word occurs in the document (or some function of that count). This step is both delicate and important: a poor representation will certainly lead to poor results.

In this way, we obtain a vector that represents the text and that machine learning algorithms can exploit. The main characteristic of the vector representation is that every term of the language is associated with a particular dimension of the vector space. Two texts that use the same textual segments are projected onto identical vectors.
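
The vector representation described above can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization, lowercasing, and raw counts; the helper names `build_vocabulary` and `count_vector` are hypothetical, and the chapter may apply a different weighting function to the counts.

```python
from collections import Counter


def build_vocabulary(texts):
    """Collect every distinct word; each word becomes one vector dimension."""
    return sorted({word for text in texts for word in text.lower().split()})


def count_vector(text, vocabulary):
    """Map a text to its vector of word counts over the shared vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


texts = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(texts)
vectors = [count_vector(t, vocab) for t in texts]
# vocab      -> ['cat', 'dog', 'mat', 'on', 'sat', 'the']
# vectors[0] -> [1, 0, 1, 1, 1, 2]
# vectors[1] -> [0, 1, 0, 0, 1, 1]
```

Two texts containing the same words with the same frequencies would map to the same vector, which is the property noted above.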

Several approaches to representing texts exist in the literature, among them the bag-of-words representation, which is the simplest and most widely used; the bag-of-sentences representation; the n-gram representation, which is independent of the natural language; and the conceptual representation.

Choice of Term

In our study, we use the n-gram method. Character n-grams take spaces into account, because ignoring spaces introduces noise. Many works have shown the efficiency of n-grams as a method for representing texts.

We compared the n-gram method with other methods of text representation and found the following strengths:

1. N-grams capture the stems of words automatically, without a separate search for lexical roots.
2. N-grams are language independent.
3. The n-gram method tolerates spelling mistakes and noise, such as that introduced by OCR (Optical Character Recognition).

The key limitation of n-gram feature extraction is that as the length of the n-gram increases, the dimensionality of the feature set increases as well.
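
The points above can be made concrete with a small sketch of character n-gram extraction. It keeps spaces inside the n-grams, as described earlier, and uses Jaccard overlap between n-gram sets to show the tolerance to a spelling mistake. The function names and the choice of Jaccard overlap are assumptions for illustration, not the chapter's actual similarity measure.

```python
from collections import Counter


def char_ngrams(text, n):
    """Slide a window of n characters over the text, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_profile(text, n):
    """Count how often each character n-gram occurs in the text."""
    return Counter(char_ngrams(text, n))


def overlap(a, b, n):
    """Jaccard overlap between the n-gram sets of two strings (assumption)."""
    sa, sb = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    return len(sa & sb) / len(sa | sb)


trigrams = char_ngrams("a cat", 3)
# trigrams -> ['a c', ' ca', 'cat']

# A misspelled word still shares half of its trigrams with the correct one,
# whereas an exact word match would treat the two as unrelated.
typo_overlap = overlap("clustering", "clustring", 3)
# typo_overlap -> 0.5
```

On the limitation: over an alphabet of 27 symbols (26 letters plus the space), there are at most 27**n distinct n-grams, so the number of possible dimensions grows exponentially with n.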
