Clustering Web Pages into Hierarchical Categories

Zhongmei Yao (Louisiana Tech University, USA) and Ben Choi (Louisiana Tech University, USA)
Copyright: © 2007 | Pages: 19
DOI: 10.4018/jiit.2007040102

Abstract

Clustering is well suited for Web mining because it automatically organizes Web pages into categories, each of which contains Web pages with similar content. However, one problem in clustering is the lack of general methods for automatically determining the number of categories or clusters. For the Web domain in particular, no method suitable for Web page clustering has been available until now. To address this problem, we discovered a constant factor that characterizes the Web domain, and based on it we propose a new method for automatically determining the number of clusters in Web page datasets. We also propose a new Bidirectional Hierarchical Clustering algorithm, which arranges individual Web pages into clusters, then arranges those clusters into larger clusters, and so on until the average inter-cluster similarity approaches the constant factor. Combining the new constant factor with the new algorithm, we have developed a clustering system suitable for mining the Web.
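
The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the Bidirectional Hierarchical Clustering algorithm itself, whose details are not given here; it is a plain bottom-up (agglomerative) merge loop that assumes pages are already represented as numeric term vectors, uses cosine similarity between cluster centroids as a stand-in similarity measure, and takes the constant factor as a hypothetical `threshold` parameter, since the abstract does not state its value. All function names are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def avg_inter_cluster_similarity(clusters):
    """Mean cosine similarity over all pairs of cluster centroids."""
    cents = [centroid(c) for c in clusters]
    pairs = list(combinations(cents, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def cluster_until_threshold(vectors, threshold):
    """Merge the two most similar clusters until the average inter-cluster
    similarity is no longer above `threshold` (a hypothetical stand-in for
    the constant factor mentioned in the abstract)."""
    clusters = [[v] for v in vectors]  # start with one cluster per page
    while len(clusters) > 1 and avg_inter_cluster_similarity(clusters) > threshold:
        # Pick the pair of clusters whose centroids are most similar.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cosine(centroid(clusters[p[0]]), centroid(clusters[p[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters
```

With this rule the number of clusters is not supplied in advance; it falls out of where the merge loop stops, which is the role the abstract assigns to the constant factor.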
