This chapter introduces the potential contribution of the emerging MPEG-4 audio-visual representation standard to future multimedia systems. It does so through a case study of one such system, ‘LipTelephone’, a special videoconferencing system being developed in the framework of MPEG-4 (Sarris et al., 2000b). The objective of ‘LipTelephone’ is to serve as a videophone that enables lip readers to communicate over a standard telephone connection. This is to be accomplished by combining model-based and traditional video coding techniques in order to exploit the information redundancy in a scene of known content, while achieving a high-fidelity representation of the specific area of interest, the speaker’s mouth. Through this description, it is shown that the standard provides a wide framework for incorporating methods that, until recently, were the object of pure research. Various such methods are referenced from the literature, and for every part of the system being studied one method is proposed and described in detail. The main objectives of the chapter are to introduce students to these methods for processing multimedia material, to provide researchers with a reference to the state of the art in this area, and to urge engineers to use current research methodologies in future consumer applications.