Paving the Way to an Effective and Efficient Retrieval of Data over Semantic Overlay Networks

Paving the Way to an Effective and Efficient Retrieval of Data over Semantic Overlay Networks

Federica Mandreoli (University of Modena and Reggio Emilia, Italy), Riccardo Martoglia (University of Modena and Reggio Emilia, Italy), Wilma Penzo (University of Bologna, Italy), Simona Sassatelli (University of Modena and Reggio Emilia, Italy) and Giorgio Villani (University of Modena and Reggio Emilia, Italy)
Copyright: © 2009 |Pages: 25
DOI: 10.4018/978-1-60566-028-8.ch007
OnDemand PDF Download:


In a Peer-to-Peer (P2P) system, a Semantic Overlay Network (SON) models a network of peers whose connections are influenced by the peers’ content, so that semantically related peers connect with each other. This is very common in P2P communities, where peers share common interests, and a peer can belong to more than one SON, depending on its own interests. Querying such a kind of systems is not an easy task: The retrieval of relevant data can not rely on flooding approaches which forward a query to the overall network. A way of selecting which peers are more likely to provide relevant answers is necessary to support more efficient and effective query processing strategies. This chapter presents a semantic infrastructure for routing queries effectively in a network of SONs. Peers are semantically rich, in that peers’ content is modelled with a schema on their local data, and peers are related each other through semantic mappings defined between their own schemas. A query is routed through the network by means of a sequence of reformulations, according to the semantic mappings encountered in the routing path. As reformulations may lead to semantic approximations, we define a fully distributed indexing mechanism which summarizes the semantics underlying whole subnetworks, in order to be able to locate the semantically best directions to forward a query to. In support of our proposal, we demonstrate through a rich set of experiments that our routing mechanism overtakes algorithms which are usually limited to the only knowledge of the peers directly connected to the querying peer, and that our approach is particularly successful in a SONs scenario.
Chapter Preview

Overview And Motivation

In recent years, Peer-to-Peer (P2P) systems have known an enormous success among Internet users. In these systems, each user, called peer, connects to other users (peers) for data sharing purposes. Notable examples are (Napster), (Kazaa), and (Gnutella), just to mention a few. The large diffusion of this phenomenon emphasized the need of efficient algorithms for the retrieval of relevant information. In fact, a P2P system usually involves a high number of peers, and approaches which flood the network with a huge amount of messages are not adequate as to efficiency as well as to effectiveness purposes. Thus, a key challenge when querying a large set of peers is query routing (Crespo and Garcia-Molina, 2002), i.e., the capability of selecting a small subset of relevant peers to forward a query to.

On the other hand, as envisioned by the Semantic Web (Berners-Lee, Hendler, and Lassila, 2001), the need of complementing the Web with more semantics has spurred much efforts towards a rich representation of data. Thus, being peers given more semantics, new potentialities are available as to query formulation, and, consequently, new challenges arise for query processing.

In this view, Peer Data Management Systems (PDMSs) have been introduced as a solution to the problem of large-scale sharing of semantically rich data (Halevy et al., 2004). Peers are semantically rich, in that peers' content is modelled with a schema on their local data, and peers are related each other through semantic mappings defined between their own schemas. In such a scenario, queries can be semantically richer than keyword-based search, rather they can be complex queries on ontologies, XML schemas, RDF schemas, aso. However, the key problem is that peers do not know where to find information, and in some sense a P2P network resembles a social network to a large extent, in that a question is asked to the person who one assumes that she best answers the question (Tempich, Staab, and Wranik, 2004).

Similar motivations underlie the concept of Semantic Overlay Networks (SONs) (Crespo and Garcia-Molina, 2004), where peers connects to other peers on the basis of the similarity of their semantic content. SONs cut down the inefficiencies due to random connections among peers, where queries are blindly forwarded from node to node. Then, despite of several P2P systems that require a rigid placement of content in specific nodes for efficient query processing purposes (e.g. Stoica, Morris, Karger, Kaashoek, and Balakrishnan, 2001), SONs allow peers to be autonomous as to the content they store locally, and to the connections they establish with other peers.

In such a semantically rich scenario, a query on the network is routed through semantic paths, i.e. sequences of semantic mappings between pairs of peers. In this process, a query undergoes a multi-step reformulation which may involve a chain of semantic approximations, and, consequently, the returned data may not exactly fit with the query conditions.

Our proposal is to exploit such approximations for selecting the directions which are more likely to provide the best results to a given query. To this purpose, semantic mappings can be conveniently extended with a score to reflect the relevance of peer's data to a query. Then, an important aspect to be considered is that a PDMS underlies a potentially very large network able to handle huge amounts of data. In this context, any relevant peer may add new answers to a given query and different paths to the same peer may yield different answers (Tatarinov and Halevy, 2004). Also for this reason, query routing is a fundamental issue for querying distributed resources. In particular, in a Semantic Web perspective, a query posed over a given peer should be forwarded to the most relevant peers that offer semantically related results among its immediate neighbors first, then among their immediate neighbors, and so on.

Complete Chapter List

Search this Book: