Distributed Hash Tables (DHTs) have been used in Peer-to-Peer networks to provide query lookups in typically O(log n) messages whilst requiring maintenance of only small amounts of routing state. We propose ROME, a layer that runs on top of the Chord DHT and controls network size by monitoring node workload, using dedicated processes to control the addition or removal of nodes from the network. We show that this technique can further reduce hop counts in networks where available node capacity exceeds workload, without the need to modify any processes of the underlying Chord protocol.
Media reports of increased piracy (Digital Spy, 2005), including one recently shut-down file-sharing server that indexed 170 million pirated files (MPA, 2006), of free-for-all access to illegal materials (BBC News, 2005), and of promises that such networks will help find answers to previously incomputable problems (Anderson, 2002; Bohannon, 2005) have made Peer-to-Peer networks a hot topic of interest not only to the scientific community but to a wider audience too. Of course, not all coverage has been accurate, but P2P networks have nonetheless attracted much attention.
As an emerging technology, Peer-to-Peer networking offers many opportunities for research that will provide real benefits for its future. In this chapter we look at the area of resource discovery. Since large collections of resources are distributed throughout such networks, searching for them and discovering their whereabouts becomes a vital function.
Traditionally, networks were based on client/server architectures. Client workstations would be connected to a central server which would handle all requests. In this environment, resources (for example files or databases) are held on the server, so client-to-client communication is rare or non-existent.
Client/server architectures allow requests for resources to be resolved efficiently. Because each client is directly connected to the server, which is the central resource repository, a single hop over the network is all that is required to send a request. On the other hand, having a central server leads to a single point-of-failure. Whilst communication is still possible if one of the clients fails, if the server is removed from the network then none of the machines can access resources.
Largely due to the success of the World Wide Web, we often view the Internet as a client/server environment. Clients connect via Web browsers to Web servers that return content in response to requests. However, the original ARPANET was conceived to share computing resources throughout the United States. Machines were connected as equal peers. As the Internet grew it took on a client/server-like architecture. Recently though, the trend has begun to reverse as a new breed of peer-to-peer network has emerged (Minar, 2001).
Pure peer-to-peer (P2P) architectures tend toward the opposite of client/server architectures. There are no central servers. Instead, machines (often referred to as nodes) are directly connected to several others and have the potential to act as both clients and servers. Resources are typically hosted on nodes at the edge of the network, rather than on dedicated servers.
A pure P2P architecture is more robust than a client/server architecture because the need for a central server, and thus the single point-of-failure, is removed, although this comes at a cost. Resources are spread throughout the network so it is no longer guaranteed that a requestor will be directly connected to a provider, meaning it may take several hops over the network to find the required resource.
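The trade-off described above can be illustrated with a small sketch. The two toy topologies below (a star of four clients around one server, and a ring of six equal peers) and all names in the code are assumptions made for illustration; they are not taken from any particular network. Hop counts are computed with a plain breadth-first search.

```python
from collections import deque

def hop_count(adjacency, source, target):
    """Fewest hops from source to target (breadth-first search), or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None

def without_node(adjacency, failed):
    """Copy of the topology with one node (and every link to it) removed."""
    return {n: [m for m in nbrs if m != failed]
            for n, nbrs in adjacency.items() if n != failed}

# Client/server (hypothetical): every client is linked only to the server.
clients = ["c1", "c2", "c3", "c4"]
star = {"server": list(clients)}
for c in clients:
    star[c] = ["server"]

# Pure P2P (hypothetical): six equal peers, each linked to two neighbours.
peers = [f"p{i}" for i in range(6)]
ring = {p: [peers[(i - 1) % 6], peers[(i + 1) % 6]] for i, p in enumerate(peers)}

print(hop_count(star, "c1", "server"))                        # 1
print(hop_count(ring, "p0", "p3"))                            # 3
print(hop_count(without_node(star, "server"), "c1", "c2"))    # None
print(hop_count(without_node(ring, "p1"), "p0", "p2"))        # 4
```

In the star, every request reaches the server in one hop, but removing the server disconnects all clients. In the ring, a request may need several hops, yet after one peer fails the remaining peers can still reach each other by a longer route.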
In practice, it is rare for a network to have a pure P2P architecture. Many are hybrids (Figure 1) positioned between pure P2P and client/server. Unlike in pure P2P, not all nodes in a hybrid P2P architecture have equal standing within the network. Some nodes take on additional responsibilities, becoming group managers or routers, which leads to more organisation within the network. Typically they hold more information about the network than a standard node, allowing messages to be routed in fewer hops. Our discussion of P2P networks will treat them as high-level networks running at the application layer. These networks are overlaid on top of the lower-level networking layers, hence the term network overlay.
Figure 1. Hybrid Peer-to-Peer Architecture
Key Terms in this Chapter
P2P: A distributed network architecture may be called a Peer-to-Peer network if the participants share a part of their own hardware resources (processing power, storage capacity, network link capacity, printers). These shared resources are accessible by other peers directly, without passing through intermediary entities. The participants of such a network are thus resource providers as well as resource requestors (Servent-concept).
Distributed Hash Table: A method for storing hash tables in geographically distributed locations in order to provide a failsafe lookup mechanism for distributed computing.
Network Overlay: An overlay network is a computer network which is built on top of another network. Nodes in the overlay can be thought of as being connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network.
Network Traffic: Data transmitted over the network at a given moment.
Fault Tolerance: The ability of a system to respond gracefully to an unexpected hardware or software failure.
Query Hops: The number of times the query has been forwarded.
Hash Table: A lookup table that is designed to efficiently store non-contiguous keys that may have wide gaps in their alphabetic and numeric sequences.
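To make the Distributed Hash Table and Query Hops terms more concrete, the following is a minimal sketch of a Chord-style lookup on a static ring. It is a simplification under stated assumptions, not the Chord protocol itself and not ROME: there are no joins, failures, or stabilisation, and finger tables are computed from a global view of the ring. The 16-bit identifier space `M`, the class name `ChordRing`, and the node and key names are hypothetical choices for illustration.

```python
import hashlib
from bisect import bisect_left

M = 16                 # identifier bits (assumed small for illustration)
RING_SIZE = 2 ** M

def chord_id(name):
    """Map a name onto the identifier ring by truncating its SHA-1 hash."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the clockwise ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b      # interval wraps past zero (or spans the whole ring)

class ChordRing:
    """Static ring; each node keeps M fingers: successor(n + 2^i) for i in 0..M-1."""

    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))
        self.fingers = {
            n: [self.successor((n + 2 ** i) % RING_SIZE) for i in range(M)]
            for n in self.nodes
        }

    def successor(self, ident):
        """First node at or clockwise after ident (global view, used to build fingers)."""
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def closest_preceding(self, node, key):
        """Farthest finger of node that still lies strictly before key."""
        for finger in reversed(self.fingers[node]):
            if in_interval(finger, node, key):
                return finger
        return node

    def lookup(self, start, key):
        """Return (responsible node, hops), where a hop is one forwarding of the query."""
        node, hops = start, 0
        while True:
            succ = self.fingers[node][0]
            if in_interval(key, node, succ, inclusive_right=True):
                # The immediate successor is responsible for the key.
                return succ, hops if succ == node else hops + 1
            nxt = self.closest_preceding(node, key)
            if nxt == node:    # defensive: no closer finger known
                return succ, hops + 1
            node, hops = nxt, hops + 1

chord = ChordRing(chord_id(f"node-{i}") for i in range(64))
key = chord_id("example-resource")
owner, hops = chord.lookup(chord.nodes[0], key)
print(owner == chord.successor(key), hops)
```

Each node stores only `M` fingers rather than the full membership list, and each forwarding step jumps to the farthest known node that does not overshoot the key, which is what keeps hop counts small relative to the number of nodes in this sketch.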