Web Search Results Discovery by Multi-granular Graphs

Gloria Bordogna (CNR-IDPA, Dalmine (Bg), Italy), Alessandro Campi (Politecnico di Milano, DEI, Milano, Italy), Giuseppe Psaila (Università di Bergamo, Facoltà di Ingegneria, Dalmine (BG), Italy) and Stefania Ronchi (Politecnico di Milano, DEI, Milano, Italy)
DOI: 10.4018/978-1-60960-881-1.ch006

Abstract

In this chapter, the authors propose a novel multi-granular framework for visualizing and exploring the results of a complex search process, in which a user submits several queries to possibly distinct search engines. The primary aim of the approach is to supply users with summaries of the results of a search process at distinct levels of detail. It applies dynamic clustering to the results in each ordered list retrieved by a search engine evaluating a user’s query. The single retrieved items, the clusters so identified, and the single retrieved lists are considered as dealing with topics at distinct levels of granularity, from the finest level to the coarsest one, respectively. Implicit topics are revealed by associating labels with the retrieved items, the clusters, and the retrieved lists. Then, manipulation operators defined in this chapter are applied to each pair of retrieved lists, clusters, and single items to reveal their implicit relationships. These relationships have a semantic nature, since they are labeled to approximately represent the documents and the sub-topics shared by each pair of combined elements. Finally, both the topics retrieved by the distinct searches and their relationships are represented through multi-granular graphs, which represent the retrieved topics at three distinct levels of granularity. The results can be explored by expanding the graph nodes to see their contents, and by expanding the edges to see their shared contents and common sub-topics.

Introduction

Graph-based visualization of Web search results has been recognized as an effective way to provide a broad and concise representation of the results; the user can quickly analyse graphs to understand whether and how they are related to his/her information needs. This paradigm enables the simultaneous display of a large number of Web pages. It has been adopted for several different purposes:

  • For providing a compressed view of the contents of a Web site, allowing users to deal with the results at a coarser grain, namely, the site level rather than the page level (McCrickard et al., 2007);

  • For representing the inner structure of retrieved documents, mainly in the case of multimedia documents composed of distinct media sections (Worring, de Rooij and van Rijn, 2007);

  • For representing the relationships between documents defined by both their in-links and out-links and their shared index terms (Angelaccio, Buttarazzi and Patrignanelli, 2007; Thiel et al., 2007; Belew, 1989);

  • For representing the inner structure of homogeneous sets of documents identified by flat and hierarchical clustering (de Graaf, Kok and Kosters, 2007);

  • Last but not least, for visualizing and refining Web searches, as in the popular Wonder Wheel feature recently introduced by Google.

In this chapter, we propose a novel framework for the exploration of the results obtained by querying possibly distinct search engines during a complex search process; we also propose a multi-granular graph-based visualization of results.

Graphs are used to represent and visualize both the main retrieved topics and their approximate semantic relationships. Specifically, we define a multi-granular graph that consists of several graphs organized on distinct layers, which represent the topics dealt with in the retrieved documents and their relationships at distinct levels of detail. This multi-granular representation is effective in providing at-a-glance overviews of the retrieved topics at distinct levels.
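To make the layered structure more concrete, the following is a minimal Python sketch of one way such a multi-granular graph could be held in memory. It is not the chapter's data model: the class names (`Node`, `Edge`, `MultiGranularGraph`), the `children` field, and the rule that an edge joins two nodes of the same layer are all assumptions made for illustration. Only the three levels (items, clusters, lists) and the idea of labeled nodes and labeled edges come from the text.

```python
from dataclasses import dataclass, field

# Granularity levels named in the text, from finest to coarsest.
LEVELS = ("item", "cluster", "list")


@dataclass
class Node:
    node_id: str
    level: str   # one of LEVELS
    label: str   # topic label shown to the user
    # Ids of the finer-grained nodes this node contains (assumed field).
    children: list = field(default_factory=list)


@dataclass
class Edge:
    source: str
    target: str
    label: str   # approximate description of the shared sub-topic


@dataclass
class MultiGranularGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        if node.level not in LEVELS:
            raise ValueError(f"unknown level: {node.level}")
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        # Assumption: each layer is its own graph, so an edge may only
        # connect two nodes that sit on the same layer.
        if self.nodes[edge.source].level != self.nodes[edge.target].level:
            raise ValueError("edge must connect nodes of the same level")
        self.edges.append(edge)

    def layer(self, level):
        """Return the ids of all nodes on one layer, in insertion order."""
        return [n.node_id for n in self.nodes.values() if n.level == level]

    def expand(self, node_id):
        """Return the finer-grained nodes contained in the given node."""
        return [self.nodes[c] for c in self.nodes[node_id].children]
```

In this sketch, expanding a cluster node yields its item nodes, which mirrors the exploration by node expansion mentioned in the abstract.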

Further, the retrieved contents are represented by taking into account the user needs expressed in the query that retrieved each web page. Since a web page can be retrieved by distinct queries, we derive multiple representations that focus on its contents from distinct points of view.

The same goal is pursued by applying hierarchical categorization and hierarchical clustering to the results of web searches. Nevertheless, our approach is different: we generate the topics and their approximate semantic relationships by employing clustering techniques in combination with cluster manipulation operations. These operations aid the identification of "hidden" relevant topics that are not necessarily top-ranked or highlighted in any of the retrieved result lists.

More precisely, our approach consists of the following two steps.

  • First, the results in the lists, retrieved independently by possibly distinct search services (i.e., search engines) evaluating possibly distinct queries, are clustered; each list is clustered independently of the others. The clustering of a result list identifies a group of clusters, which are considered the most general topics retrieved by a single query.

  • Second, other hidden topics are revealed by combining the ranked lists (i.e., the groups of clusters), the clusters, and the single retrieved items (represented by the web pages’ titles and snippets) by means of manipulation operators defined in this chapter. These operations define approximate semantic relationships between the combined elements that highlight their shared sub-topics.
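The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the greedy Jaccard-based clustering is a toy stand-in for the dynamic clustering used in the chapter, and `intersect` is a hypothetical operator, since the actual manipulation operators are defined later in the chapter and not in this preview. All function names, the stop-word list, and the threshold value are invented for the example.

```python
# Small illustrative stop-word list (an assumption, not from the chapter).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokens(text):
    """Lower-cased word set of a title or snippet, without stop words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_list(items, threshold=0.2):
    """Step 1 (toy stand-in): cluster one result list on its own.

    items is a list of (url, text) pairs. Each item joins the first
    cluster whose accumulated vocabulary is similar enough; otherwise
    it starts a new cluster.
    """
    clusters = []  # each cluster: {"urls": set, "terms": set}
    for url, text in items:
        t = tokens(text)
        for c in clusters:
            if jaccard(t, c["terms"]) >= threshold:
                c["urls"].add(url)
                c["terms"] |= t
                break
        else:
            # No existing cluster was similar enough.
            clusters.append({"urls": {url}, "terms": set(t)})
    return clusters


def intersect(c1, c2):
    """Step 2 (hypothetical operator): shared documents and shared terms."""
    return {"urls": c1["urls"] & c2["urls"],
            "terms": c1["terms"] & c2["terms"]}


def relate(clusters_a, clusters_b):
    """Combine every pair of clusters from two lists.

    Returns (index_a, index_b, shared) triples for the pairs that share
    at least one document; the shared terms could label the graph edge.
    """
    relations = []
    for i, ca in enumerate(clusters_a):
        for j, cb in enumerate(clusters_b):
            shared = intersect(ca, cb)
            if shared["urls"]:
                relations.append((i, j, shared))
    return relations
```

Calling `cluster_list` once per retrieved list and then `relate` on each pair of cluster groups yields candidate edges between cluster nodes. The same pairwise idea could be applied at the list level and at the item level, although the chapter's own operators may differ from a plain intersection.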
