Communication in computer networks can be organized in two different ways: according to the client/server model or according to the peer-to-peer model (Spinellis & Androutsellis-Theotokis, 2004). In the client/server model, the network is centralized. There is one host on the network, the server, which provides services to its clients. Its network address is usually well-known. In the peer-to-peer model, on the other hand, there is no central point in the network. Participating hosts are sometimes called “servents” (Gnutella, 2006), as they act both as servers and as clients at the same time: they provide services to other servents while also using the services of others. Nodes in unstructured peer-to-peer networks usually communicate via message flooding. For example, a search request for a given file in the Gnutella network is sent to all neighboring servents, which forward it in turn. However, this solution does not scale, because it generates a lot of unnecessary network traffic.
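The cost of flooding can be illustrated with a small simulation. The following is a minimal sketch, not the Gnutella protocol itself: the overlay graph, the function name `flood_search`, and the hop limit `ttl` are assumptions made for illustration. Each servent forwards the query to every neighbor except the one it received it from, and duplicate copies are dropped on arrival but still count as traffic.

```python
from collections import deque


def flood_search(neighbors, start, holders, ttl):
    """Flood a query from `start` and return (hits, messages_sent).

    neighbors: dict mapping each node to a list of its neighbors (hypothetical overlay)
    holders:   set of nodes that store the requested file
    ttl:       maximum number of hops the query may travel (assumed hop limit)
    """
    seen = {start}                      # nodes that have already handled the query
    queue = deque([(start, None, ttl)])
    hits = set()
    messages = 0
    while queue:
        node, sender, remaining = queue.popleft()
        if node in holders:
            hits.add(node)
        if remaining == 0:
            continue                    # hop limit reached, do not forward
        for peer in neighbors[node]:
            if peer == sender:
                continue                # never send the query straight back
            messages += 1               # every forwarded copy costs one message
            if peer not in seen:        # duplicates are dropped by the receiver
                seen.add(peer)
                queue.append((peer, node, remaining - 1))
    return hits, messages


# Hypothetical five-node overlay; only "E" stores the file.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits, messages = flood_search(overlay, "A", {"E"}, ttl=3)
print(hits, messages)
```

In this example the query reaches the four other servents but costs eight messages, half of which are duplicates that the receivers discard. With a hop limit of 2 the file is not found at all, although six messages are still sent.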
Distributed Hash Tables
In a distributed hash table, a hash function is used to derive a number from a key representing some data (the value). The key-value pair is then stored on one of the participating peers, chosen according to that number. The application level networks mentioned above all use hash tables to store information, but the exact management of storage differs in the following three aspects (see Table 1):
Table 1.
|Kademlia||XOR operation||binary tree|
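The mapping from a key to a peer can be sketched as follows for the Kademlia case. This is a simplified illustration under stated assumptions, not a full Kademlia implementation: the peer names and the helper names `identifier`, `xor_distance`, and `responsible_peer` are hypothetical, SHA-1 is used to place both peers and keys in the same 160-bit identifier space, and every peer is assumed to know all other peers, which a real overlay avoids through routing.

```python
import hashlib


def identifier(name: str) -> int:
    """Hash a peer name or a key into a 160-bit integer using SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: the bitwise XOR of two identifiers."""
    return a ^ b


def responsible_peer(key: str, peers: list) -> str:
    """Return the peer whose identifier is closest to the key's identifier."""
    key_id = identifier(key)
    return min(peers, key=lambda peer: xor_distance(identifier(peer), key_id))


# Hypothetical peers and key, for illustration only.
peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
print(responsible_peer("song.mp3", peers))
```

Because every participant applies the same hash function and the same distance, any peer can compute where a given key is stored without asking a central server.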
Key Terms in this Chapter
Hash Function: A mathematical formula used to turn some data into a representative number that can serve as a digital fingerprint. Such formulas can usually be applied to any data and produce a seemingly random but reproducible identifier. Example algorithms are MD5 and SHA-1.
Firewall: A host or router that acts as a strict gateway between a subnetwork and the Internet, checking traffic and possibly dropping some network packets.
Datagram: A short, separate message between computers, similar to a conventional letter. A typical protocol to send and receive datagrams over the Internet is UDP, the User Datagram Protocol. See “Session.”
Distributed Hash Table: A hash table stores key-value pairs and enables a fast lookup for every value (piece of information), given its key. DHTs are application level networks, functioning as hash tables which span across many computers. A hash function is used to map each key-value pair to a specific computer in the network so that other participants will know from where to retrieve it.
Session: An established channel for messages between two computers, similar to a conventional phone call. On the Internet, TCP (the Transmission Control Protocol) is most commonly used to implement session-based communication. See “Datagram.”
K-bucket: A list of a node’s neighbors in the Kademlia overlay.
Client/Server Model: A communication model in which one host, the server, has more functionality than the other and provides services to it. It differs from the P2P model.
Overlay Network: The virtual network formed by the applications that create an ALN. These applications work together and usually follow the P2P model.
Network Address Translation: A technique by which two or more network hosts share the same Internet address. One of the hosts serves as a gateway and forwards network packets between the Internet and the local network.
Peer-to-Peer (P2P) Model: A communication model in which each node has the same authority and communication capability. The nodes create a virtual network overlaid on the Internet, and its members organize themselves into a topology for data transmission.
Key-value Pair: The fundamental unit of information stored in a hash table. Every piece of information content (value) is assigned an identifier (key), which is used for reference: each key maps to its value. An everyday example is the name of a file and its content.
Application Level Network (ALN): The applications running on the hosts can create a virtual network from their logical connections. This is also called an overlay network. The operation of such software entities cannot be understood without knowing their logical relations. ALN software entities usually use the P2P model, not the client/server model, for communication.
Replication: Storing data at different physical locations to enhance availability and dependability.
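To make the K-bucket entry above more concrete, the following is a minimal sketch of how a Kademlia-style node might sort its neighbors into buckets. It rests on assumptions not stated in this chapter: identifiers are plain integers, a neighbor is placed in the bucket numbered by the position of the highest differing bit of the XOR distance, and each bucket holds at most `k` entries ordered from least to most recently seen. The names `bucket_index` and `RoutingTable` are hypothetical, and the liveness check that a real implementation performs when a bucket is full is omitted.

```python
def bucket_index(own_id: int, other_id: int) -> int:
    """Return i such that 2**i <= (own_id XOR other_id) < 2**(i + 1)."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not store itself in a k-bucket")
    return distance.bit_length() - 1


class RoutingTable:
    """Simplified set of k-buckets for one node (illustrative only)."""

    def __init__(self, own_id: int, k: int = 20):
        self.own_id = own_id
        self.k = k          # assumed bucket capacity
        self.buckets = {}   # bucket index -> list of ids, oldest first

    def add(self, other_id: int) -> bool:
        """Record a neighbor; return False if its bucket is already full."""
        index = bucket_index(self.own_id, other_id)
        bucket = self.buckets.setdefault(index, [])
        if other_id in bucket:
            bucket.remove(other_id)   # known neighbor: move to the recent end
            bucket.append(other_id)
            return True
        if len(bucket) < self.k:
            bucket.append(other_id)
            return True
        return False                  # full bucket: the newcomer is not stored here


# Hypothetical 4-bit identifiers and a tiny capacity, for illustration only.
table = RoutingTable(own_id=0b0000, k=2)
for neighbor in (0b1000, 0b1001, 0b1010, 0b0001):
    table.add(neighbor)
print(table.buckets)
```

Neighbors that are far away in XOR terms share a bucket with many other identifiers, so a node keeps detailed knowledge of its close neighborhood and only a few contacts for distant regions of the overlay.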