A Free Service of IGI Global Publishing House

What is Document Representation

Handbook of Research on Text and Web Mining Technologies
Document representation is concerned with how textual documents should be represented for tasks such as text processing, retrieval, and knowledge discovery and mining. The prevailing approach is the vector space model: a document $d_i$ is represented as a vector of term weights $d_i = (w_{i1}, w_{i2}, \ldots, w_{i|T|})$, where $T$ is the collection of terms that occur at least once in the document collection $D$.
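The vector space model described above can be illustrated with a minimal Python sketch. It assumes whitespace tokenization, lowercasing, and raw term counts as the weights; the function names `build_vocabulary` and `to_vector` and the two sample documents are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_vocabulary(collection):
    """Return the sorted terms that occur at least once in the collection."""
    return sorted({term for doc in collection for term in doc.lower().split()})


def to_vector(doc, vocabulary):
    """Represent one document as a vector of raw term counts (assumed weights)."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical two-document collection.
collection = [
    "text mining finds patterns in text",
    "web mining studies web data",
]

vocabulary = build_vocabulary(collection)
vectors = [to_vector(doc, vocabulary) for doc in collection]

for doc, vec in zip(collection, vectors):
    print(doc, "->", vec)
```

Every document maps to a vector of the same length, one position per term in the collection, so documents can be compared or passed to a classifier regardless of their original length.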
Published in Chapter:
On Document Representation and Term Weights in Text Classification
Ying Liu (The Hong Kong Polytechnic University, Hong Kong SAR, China)
Copyright: © 2009 | Pages: 22
DOI: 10.4018/978-1-59904-990-8.ch001
Abstract
In automated text classification, a bag-of-words representation followed by tfidf weighting is the most popular approach for converting textual documents into numeric vectors for the induction of classifiers. In this chapter, we explore the potential of enriching the document representation with semantic information systematically discovered at the sentence level of each document. The salient semantic information is identified using a frequent word sequence method. In contrast to the classic tfidf weighting scheme, we propose a probability-based term weighting scheme that directly reflects a term's strength in representing a specific category. An experimental study based on the semantically enriched document representation and the newly proposed probability-based term weighting scheme shows a significant improvement in F-score over the classic approach, i.e., bag-of-words plus tfidf. This result encourages us to further investigate applying the semantically enriched document representation to a wide range of text-based mining tasks.
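As a point of reference for the classic baseline mentioned in the abstract, the following is a minimal sketch of bag-of-words plus tfidf weighting. It assumes one common variant, raw term frequency multiplied by log(N / df), which may differ from the exact formula used in the chapter; the function name `tfidf_vectors` and the sample documents are hypothetical. The chapter's probability-based scheme is not shown because its definition is not given here.

```python
import math
from collections import Counter


def tfidf_vectors(collection):
    """Return (vocabulary, vectors) using tf * log(N / df), one common tfidf variant."""
    tokenized = [doc.lower().split() for doc in collection]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocabulary}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append([tf[term] * idf[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical two-document collection.
docs = [
    "text mining finds patterns in text",
    "web mining studies web data",
]
vocab, weights = tfidf_vectors(docs)

for term, weight in zip(vocab, weights[0]):
    print(f"{term}: {weight:.3f}")
```

Under this variant, a term that appears in every document (here "mining") receives a weight of zero, while a term concentrated in one document (here "text") is weighted up. The weighting uses no category labels, which is the property the probability-based scheme in the chapter is meant to address.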