Service Class Driven Dynamic Data Source Discovery with DynaBot

Daniel Rocco (University of West Georgia, USA), James Caverlee (Georgia Institute of Technology, USA), Ling Liu (Georgia Institute of Technology, USA) and Terence Critchlow (Lawrence Livermore National Laboratory, USA)
Copyright: © 2007 | Pages: 23
DOI: 10.4018/jwsr.2007070102

Dynamic Web data sources on the Deep Web provide intuitive access to real-time information and large data repositories anywhere that Web access is available. Although recent studies suggest that the dynamic Web is larger and growing faster than the static Web, dynamic content is often ignored by existing search engine indexers owing to technical challenges inherent in searching dynamic sources. To address these challenges, we present DynaBot, a service-centric crawler for discovering and clustering Deep Web sources. DynaBot has three unique characteristics. First, DynaBot utilizes a service class model implemented through the construction of service class descriptions (SCDs). Second, DynaBot employs a modular architecture for focused crawling of the Deep Web. Third, DynaBot incorporates algorithms for efficiently probing, discovering, and clustering Deep Web sources through SCD-based service analysis. Experimental results demonstrate DynaBot’s effectiveness and suggest techniques for efficiently managing service discovery given the immense scale of the Deep Web.
