Deriving Taxonomy from Documents at Sentence Level
Ying Liu (Hong Kong Polytechnic University, Hong Kong SAR, China), Han Tong Loh (National University of Singapore, Singapore) and Wen Feng Lu (National University of Singapore, Singapore)
Copyright: © 2008
This chapter introduces an approach to deriving a taxonomy from documents using a novel document profile model, which represents documents with semantic information generated systematically at the sentence level. A frequent word sequence method is proposed to search for salient semantic information, and this method has been integrated into the document profile model. An experimental study of taxonomy generation using hierarchical agglomerative clustering shows a significant improvement in F-score when the document profile model is used. A close examination reveals that the integrated semantic information contributes clearly to this improvement compared with the classic bag-of-words approach. This study encourages us to further investigate applying the document profile model to a wide range of text-based mining tasks.
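The abstract does not say exactly how frequent word sequences are mined or how they enter the profile. The sketch below illustrates the general idea under explicit assumptions: sequences are contiguous words, each sentence that contains a sequence counts once toward its support, sentences are split with a naive regex, and candidates grow level by level with Apriori-style pruning (a sequence can only be frequent if its prefix and suffix are). The functions `frequent_word_sequences`, `maximal_sequences` and `document_profile` are hypothetical names, and the profile (bag-of-words counts plus sentence-level sequence counts) is an assumed simplification, not the authors' model.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive sentence splitter (assumption): break on ., ! or ? and drop empty pieces.
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def frequent_word_sequences(documents, min_support=2, max_len=4):
    """Hypothetical miner: contiguous word sequences found in >= min_support sentences."""
    sentences = [tuple(tokenize(s)) for doc in documents for s in split_sentences(doc)]
    frequent = {}
    for n in range(1, max_len + 1):
        support = Counter()
        for sent in sentences:
            seen = set()  # count each sequence at most once per sentence
            for i in range(len(sent) - n + 1):
                seq = sent[i:i + n]
                # Apriori pruning: prefix and suffix of length n-1 must already be frequent.
                if n > 1 and (seq[:-1] not in frequent or seq[1:] not in frequent):
                    continue
                seen.add(seq)
            support.update(seen)
        level = {seq: c for seq, c in support.items() if c >= min_support}
        if not level:
            break  # no frequent sequence of length n, so none longer can exist
        frequent.update(level)
    return frequent


def maximal_sequences(frequent):
    """Keep only sequences not contained in a longer frequent sequence."""
    def contained(a, b):
        return len(a) < len(b) and any(
            b[i:i + len(a)] == a for i in range(len(b) - len(a) + 1)
        )
    return {s: c for s, c in frequent.items() if not any(contained(s, t) for t in frequent)}


def document_profile(doc, sequences):
    """Hypothetical profile: bag-of-words counts plus sentence-level sequence counts."""
    profile = Counter()
    for s in split_sentences(doc):
        toks = tuple(tokenize(s))
        profile.update((t,) for t in toks)  # classic bag-of-words part
        for seq in sequences:
            n = len(seq)
            # Multi-word sequences add one count per sentence that contains them.
            if n > 1 and any(toks[i:i + n] == seq for i in range(len(toks) - n + 1)):
                profile[seq] += 1
    return profile
```

For two toy documents such as "Text mining finds patterns. Document clustering groups documents." and "Text mining is useful. Document clustering builds a taxonomy.", a minimum support of 2 keeps the maximal sequences ("text", "mining") and ("document", "clustering"), which then appear as extra features alongside the single words in each profile. Profiles like these could be fed to any hierarchical agglomerative clustering routine; the weighting and similarity measure are left open here because the abstract does not specify them.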