Audio communication over IP-based networks represents one of the most interesting research areas in the field of distributed multimedia systems. Today, routing voice over the Internet enables cheaper communication services than those deployed over traditional circuit-switched networks. BoAT (Roccetti, Ghini, Pau, Salomoni, & Bonfigli, 2001a), Ekiga, FreePhone (Bolot & Vega Garcia, 1996), iCall, Kiax, NeVot (Schulzrinne, 1992), rat (Hardman, Sasse, & Kouvelas, 1998), Skype, Tapioca, vat (Jacobson & McCanne, n.d.), WengoPhone, and YATE are just a few examples of free VoIP software available to Internet users. Without doubt, new (wired and wireless) high-speed broadband networks facilitate the transmission of voice over the Internet and have determined the success of these applications. However, the best-effort service offered by the Internet architecture does not provide any guarantee on the delivery of (voice) data packets. Thus, to preserve the timing consistency of the transmitted audio stream, these voice communication systems must be equipped with schemes able to deal with unpredictable network latency, delay jitter, and possible packet loss.
Several proposals to cope with the effects of network delay, delay jitter, and packet loss on continuous media stream playout have been presented in the literature. For instance, protocol suites (e.g., RSVP, DiffServ) and networking technologies (e.g., ATM) have been devised that provide users with quality of service (QoS) guarantees (Zhang, Deering, Estrin, Shenker, & Zappala, 1993). Yet, these approaches have not been widely adopted as the usual means of providing QoS guarantees to Internet users.
An interesting alternative that is now widely exploited in most existing Internet audio communication tools amounts to the use of adaptive playout control mechanisms. Basically, these schemes cope with the unpredictability of IP networks by compensating for the variable network delay and jitter experienced during the transmission of audio packets. In particular, delay jitter is smoothed away by employing a playout buffer at the receiver side and by dynamically enqueuing audio packets in it. Output of received and buffered packets is thus artificially delayed for some time so as to obtain a constant audio packet playout rate, hence absorbing the negative effects introduced by delay jitter. Such a buffering policy must be adaptive, since delay jitter on the Internet may vary significantly with time. This way, dynamic playout buffers hide packet delay jitter at the cost of additional delay at the receiver (see Figure 1).
Figure 1. Smoothing out jitter delay at the receiver
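The adaptive buffering policy described above can be illustrated with a small sketch. The text does not prescribe a particular estimator, so the code below swaps in a simple exponentially weighted moving average of the network delay and of its variation, which is one common way to let the playout delay follow changing network conditions. The class name `PlayoutEstimator`, the smoothing weight `alpha`, and the safety factor `safety` are assumptions made for illustration and are not taken from any of the tools named in this chapter. The sketch also assumes that send and receive timestamps are expressed on comparable clocks.

```python
from dataclasses import dataclass


@dataclass
class PlayoutEstimator:
    """Illustrative EWMA estimator of network delay and its variation."""

    alpha: float = 0.9      # smoothing weight (assumed value)
    safety: float = 4.0     # margin, in multiples of the variation (assumed)
    delay: float = 0.0      # smoothed network delay estimate, in seconds
    variation: float = 0.0  # smoothed absolute deviation, in seconds
    primed: bool = False    # becomes True after the first observation

    def observe(self, sent_at: float, received_at: float) -> None:
        """Update the estimates with the delay of one received packet."""
        sample = received_at - sent_at
        if not self.primed:
            # First packet: take its delay as the initial estimate.
            self.delay = sample
            self.primed = True
            return
        self.delay = self.alpha * self.delay + (1 - self.alpha) * sample
        self.variation = (self.alpha * self.variation
                          + (1 - self.alpha) * abs(sample - self.delay))

    def playout_delay(self) -> float:
        """Delay to apply to packets: estimated delay plus a jitter margin."""
        return self.delay + self.safety * self.variation
```

With a steady network the margin shrinks toward zero and the buffer adds little delay; when packet delays fluctuate, the variation term grows and the receiver holds packets longer, which is the trade-off between hidden jitter and additional delay noted above.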
Summing up, each audio packet transmitted on the network has an associated scheduled playout delay, defined as the total amount of time that elapses between the instant the packet is generated at the source and the instant it is played out at the destination. Such a playout delay consists of: (1) the time needed for the transmitter to collect an audio sample and prepare it for transmission, (2) the network delay, and (3) the buffering time, that is, the amount of time that a packet spends queued in the destination buffer before it is played out. Based on this notion of playout delay, a received audio packet is defined to be late when it arrives at the destination after its scheduled playout time has expired.
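The three components of the playout delay and the definition of a late packet can be made concrete with a short sketch. The class `PacketTiming` and its field names are hypothetical and chosen only for illustration; all times are in seconds and are assumed to be measured against a common clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketTiming:
    """Timing of one audio packet (hypothetical names, times in seconds)."""

    generated_at: float   # instant the audio is generated at the source
    preparation: float    # (1) time to collect and packetize the sample
    network_delay: float  # (2) time spent in transit
    playout_delay: float  # scheduled playout delay chosen by the receiver

    @property
    def arrival_time(self) -> float:
        """Instant the packet reaches the destination."""
        return self.generated_at + self.preparation + self.network_delay

    @property
    def playout_time(self) -> float:
        """Instant the packet is scheduled to be played out."""
        return self.generated_at + self.playout_delay

    @property
    def buffering_time(self) -> float:
        """(3) Time queued in the destination buffer; negative if late."""
        return self.playout_time - self.arrival_time

    @property
    def is_late(self) -> bool:
        """True when the packet arrives after its scheduled playout time."""
        return self.arrival_time > self.playout_time
```

For example, a packet with 20 ms of preparation, 80 ms of network delay, and a 150 ms scheduled playout delay waits about 50 ms in the buffer; if the network delay rises to 200 ms under the same schedule, the packet arrives after its playout time and is late.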
Key Terms in this Chapter
Talkspurt: In audio communication, a short burst of energy during which the audio activity is carried out. An audio segment may be considered as being constituted of several talkspurts separated by silence periods.
Delay Jitter: Variance of the network delay computed over two subsequent audio packets.
Network Delay: Time needed for the transmission of a data packet from the source to the destination.