InfoScipedia
A Free Service of IGI Global Publishing House
Below please find a list of definitions for the term that
you selected from multiple scholarly research resources.

What is Reuters-21578?

Handbook of Research on Text and Web Mining Technologies
(corpus) A set of English-language financial news dispatches issued by the Reuters agency during 1987 and freely available on the Web. The corpus is a revision, carried out in 1996, of the earlier Reuters-22173 corpus. Its texts are written in a journalistic style. A distinguishing characteristic of Reuters-21578 is that a document may be labeled with several classes. The corpus is often used as a benchmark for comparing document classification tools.
Published in Chapter:
SOM-Based Clustering of Textual Documents Using WordNet
Abdelmalek Amine (Djillali Liabes University, Algeria & Taher Moulay University Center, Algeria), Zakaria Elberrichi (Djillali Liabes University, Algeria), Michel Simonet (Joseph Fourier University, France), Ladjel Bellatreche (University of Poitiers, France), and Mimoun Malki (Djillali Liabes University, Algeria)
Copyright: © 2009 | Pages: 12
DOI: 10.4018/978-1-59904-990-8.ch012
Abstract
The classification of textual documents has been the subject of many studies. Technologies such as the Web and digital libraries have driven exponential growth in the documentation available. Classifying textual documents is important because it lets users skim large corpora quickly and effectively and better understand their contents. Most classification approaches rely on supervised learning, which is better suited to small corpora and to settings where human experts are available to define the best classes of data for the training phase, which is not always feasible. Unsupervised classification, or “clustering”, methods bring out latent (hidden) classes automatically with minimal human intervention. There are many such methods; Kohonen's Self-Organizing Map (SOM) is one of them, grouping similar objects together without a priori knowledge. This chapter introduces the concept of unsupervised classification of textual documents and proposes an experiment that combines a conceptual approach to text representation with Kohonen's method for clustering.
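As an illustration only, and not the chapter's implementation, the idea of grouping similar documents on a Self-Organizing Map can be sketched in a few lines of plain Python. Everything here is an assumption made for the example: the toy term-count vectors, the 1 x 2 map size, the linear decay of the learning rate and neighbourhood radius, and the function names `train_som` and `best_matching_unit`.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinates of the unit whose weights are closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(vectors, rows=1, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map unit, initialised with random values in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink as training proceeds.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.1)
        # Present the inputs in a fresh random order each epoch.
        for x in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                # Gaussian neighbourhood measured on the map grid, not in input space.
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical term counts for four short documents over a four-word vocabulary.
docs = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
som = train_som(docs)
for doc in docs:
    print(doc, "->", best_matching_unit(som, doc))
```

Each document is assigned to the map unit closest to its vector, and documents that share a unit form a cluster. No class labels are used at any point, which is what distinguishes this from the supervised approaches mentioned in the abstract.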