Text Clustering using Distances Combination by Social Bees: Towards 3D Visualisation Aspect

Hadj Ahmed Bouarara, Reda Mohamed Hamou, Abdelmalek Amine

Source Title: International Journal of Information Retrieval Research (IJIRR) 4(3)

DOI: 10.4018/IJIRR.2014070103

Abstract

Researchers have recently reported that 90% of the information on the web is in unstructured (free-text) format. Automatic text classification (clustering) has therefore become a crucial challenge for the computer science community, since most classical techniques suffer from problems of execution time, the diversity of data (marketing, biology, economics), and the initialization of the number of clusters. The bio-inspired paradigm has meanwhile achieved genuine success in several sectors, particularly in data mining. This work presents a novel approach to text clustering called distances combination by social bees (DC-SB), composed of four steps: (1) pre-processing, which builds the document vectors using different text representation methods (bag of words and character n-grams) and TF-IDF weighting; (2) bees' artificial life, in which the authors imitate the functioning of social bees using three artificial worker bees (cleaner, guardian, and forager), each characterized by a different distance measure computed from the artificial queen (centroid) of the cluster (hive); (3) clustering based on filtering, where each filter is controlled by one artificial worker and a document must pass three different obstacles to be added to the cluster; and (4) visualization, which provides 3D navigation of the results through a global and a detailed view of the hive and the apiary, with zooming and rotation.
For the experiments the authors use the Reuters-21578 benchmark and a variety of validation measures (execution time, f-measure, and entropy) while varying the parameters (threshold, combination of distance measures, and text representation). They compare their results with those of other methods from the literature (2D Cellular Automata, Artificial Immune System (AIS), and Artificial Social Spiders (ASS)) and conclude that the approach can solve the text clustering problem.
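
As an illustration of the filtering idea described in the abstract, the following Python sketch admits a document to a hive only if it passes three distance checks against the queen (centroid). The abstract does not name the three distance measures or their thresholds, so the choice of Euclidean, Manhattan, and cosine distance, their assignment to the cleaner, guardian, and forager, the threshold values, and the function names are all assumptions made for this sketch, not the authors' implementation.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Assumed pairing of workers and distance measures (not given in the source).
WORKERS = [
    ("cleaner", euclidean),
    ("guardian", manhattan),
    ("forager", cosine_distance),
]


def admitted(doc, queen, thresholds):
    """Return True only if the document passes every worker's filter,
    i.e. each distance to the queen is within that worker's threshold."""
    return all(dist(doc, queen) <= thresholds[name] for name, dist in WORKERS)


if __name__ == "__main__":
    queen = [1.0, 0.0, 1.0]                      # centroid of the hive
    thresholds = {"cleaner": 0.5, "guardian": 0.5, "forager": 0.2}
    print(admitted([0.9, 0.1, 1.0], queen, thresholds))  # close to the queen
    print(admitted([0.0, 1.0, 0.0], queen, thresholds))  # far from the queen
```

Combining measures this way means a document that looks close under one distance (for example, the same direction as the queen under cosine distance) can still be rejected by another worker if it is far away in magnitude.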

Introduction And Problematic

The information universe is undergoing a major revolution that affects all sectors globally, across diverse fields. Today the web is the largest accumulation of data. The evolution of communication means (the Internet and intranets) has given birth to new concepts such as big data, which reflects the large amount of unstructured data (textual documents) available online and offline. The digital society is enriched every day with new content, which makes it difficult to manage. For this reason we need to develop tools that help us find the desired information within a reasonable time, perform certain tasks on our behalf, and make our lives easier.

Over the past 20 years, with the development of computers, visualization tools, and instruments for automatic information processing, data mining has been applied to extract valuable information from large volumes of data. In our work we consider the following scenario: we have a huge quantity of textual documents and ask a person to classify them by domain without any external help. This person, however, has no background knowledge about the documents and must therefore read all of them in order to discover the links between them. This costly task is the human analogue of what our work performs by machine, namely clustering, whose aim is to process a set of textual documents and arrange them in homogeneous classes, where documents of the same class must be similar and those of different classes as dissimilar as possible.

Much work has been done in this area and several systems have emerged, based on classical techniques that face multiple obstacles:

• The choice of the distance measure
• The selection of the text representation method
• The initialisation of the number of clusters
• The execution time caused by the number of documents

The scientific world has been considerably enriched by the appearance of novel concepts and prototypes. For each problem we encounter, we should look to nature, which may have already faced the same problem and found solutions long ago. Biomimicry consists of copying living systems in order to take advantage of the solutions and innovations made by nature. In this paper we imitate the lifestyle of social bees in order to introduce a new artificial model called Distances Combination by Social Bees (DC-SB) to solve the problem of text clustering, which is a topical challenge in the scientific community. Our problem lies at the intersection of several subjects, as shown in Figure 1.

Figure 1. Problematic position

State Of The Art

Automatic classification (clustering) has attracted considerable attention from research and industry: many papers have been published on the subject, and many commercial systems and software packages are being developed. These systems play a very important role in modern life, and all text clustering systems follow the same process: (i) text representation, (ii) construction of the distance matrix, (iii) modelling, and (iv) evaluation of results (Buhmann, 2003).
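
The first two steps of this process, text representation and construction of the distance matrix, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses character 3-grams, the common TF-IDF variant tf × log(N/df), and cosine distance, none of which is specified in the text above, and all function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lowercased text into overlapping character n-grams."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Step (i): represent each document as a TF-IDF vector over n-grams.
    Assumed weighting: (term count / document length) * log(N / df)."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    vocab = sorted(set().union(*counts))
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against texts shorter than n
        vectors.append(
            [(c[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Step (ii): pairwise distances between all document vectors."""
    return [[cosine_distance(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    docs = ["oil prices rise", "oil prices fall", "wheat harvest"]
    for row in distance_matrix(tfidf_vectors(docs)):
        print([round(d, 3) for d in row])
```

In this toy input the two oil-related texts share several 3-grams and so end up closer to each other than either is to the wheat text. Steps (iii) and (iv) would then operate on this matrix or on the vectors directly.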
