Expressive Audiovisual Message Presenter for Mobile Devices


Alex Garcia Gonçalves (Department of Computer Engineering and Industrial Automation, School of Electrical and Computer Engineering, University of Campinas, Campinas, Brazil) and José Mario De Martino (Department of Computer Engineering and Industrial Automation, School of Electrical and Computer Engineering, University of Campinas, Campinas, Brazil)
Copyright: © 2013 | Pages: 14
DOI: 10.4018/jhcr.2013010105


This paper presents the development of an extension of the SMS service for Android smartphones, which adds audio and visual characteristics to this widely used communication service. With this solution, when an SMS message is received, it is read by a 3D avatar, which performs the lip articulatory movements and the emotional facial expressions synchronized with the audio corresponding to the content of the received message.


As smartphones with advanced hardware capacity become more popular around the world, mobile users are growing accustomed to applications and services with rich visual characteristics and sophisticated graphical interfaces, containing images, animations, and audio, which make the services easier to use and more appealing.

Despite the advances in this area, one of the most used mobile communication services is still based on plain text: the SMS message. This service is widely used by people around the world to send daily messages to family members and friends, set up appointments, communicate something that has happened, express feelings, etc.

As an example of how widely this service is used, Figure 1 shows how the number of SMS users has been growing in recent years in the USA. The same trend is expected in other parts of the world.

Figure 1. SMS growth in the USA (adapted from Cellsigns, “Mobile Messaging still the killer APP for your cell phone”)


However, the fact that the SMS service is a text-based communication mechanism presents some drawbacks, for instance:

  • It is difficult for the receiver to clearly interpret the emotions conveyed by the sender just by seeing the emoticons inserted in the text;

  • Reading text may be difficult for people who have poor vision (especially elderly people), or for people who are illiterate or semiliterate;

  • As regular cell phones are replaced by powerful smartphones, the applications used on these devices tend to be graphically rich; a text-based communication mechanism does not follow this trend and therefore lacks an engaging, state-of-the-art appeal.

As a solution to these problems, we present an extension to the SMS mechanism that adds audio and visual characteristics to this communication method without changing the existing SMS transmission protocol.

With this extension, when a new SMS message is received, a 3D avatar is displayed on the cell phone screen and “reads” the message to the user, performing visual articulatory movements synchronized with the speech, as well as facial expressions driven by emoticons that may be present in the message, such as :-) (smile), :-D (open smile), :-( (sad), ;-) (wink), and :~) (surprised), among others.
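The preview does not detail how emoticons in the message text are mapped to the avatar’s facial expressions. A minimal sketch of such a lookup might look as follows; the class and method names here are illustrative assumptions, not taken from the paper:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: scan a received message for emoticons and
// report the facial expression the avatar should perform.
public class EmoticonMapper {
    // Insertion-ordered map: longer emoticons can be listed first so
    // they are matched before any shorter overlapping patterns.
    private static final Map<String, String> EXPRESSIONS = new LinkedHashMap<>();
    static {
        EXPRESSIONS.put(":-D", "open smile");
        EXPRESSIONS.put(":-)", "smile");
        EXPRESSIONS.put(":-(", "sad");
        EXPRESSIONS.put(";-)", "wink");
        EXPRESSIONS.put(":~)", "surprised");
    }

    // Returns the expression for the first emoticon found in the
    // message, or "neutral" if no emoticon is present.
    public static String expressionFor(String message) {
        for (Map.Entry<String, String> e : EXPRESSIONS.entrySet()) {
            if (message.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return "neutral";
    }
}
```

In a full implementation, the selected expression would parameterize the avatar’s facial animation while the remaining text drives the speech synthesis.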

This solution eliminates the need for the user to read the message, eases the understanding of the content and the interpretation of the emotions it carries, and its graphical presentation makes the service more appealing to smartphone users.


There are many published works related to facial animation with visual speech synthesis; however, only a limited number of them target mobile devices, which require software optimizations to achieve satisfactory performance.

Costa (2009) presents a 2D image-based face model approach for mobile devices in which the speech animation is driven by the context-dependent visemes concept proposed by De Martino et al. (2006). This approach uses an image database containing 34 pre-defined images representing the main visemes of the Brazilian Portuguese language, and applies an image morphing technique to produce the speech animation. The visual transitions are driven by a timed phonetic transcription of the sentence and animated using a non-linear spatial transition curve applied to five key points of the face, located around the mouth and chin.
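The non-linear transition between viseme key positions can be sketched as an eased interpolation of each facial key point. The smoothstep easing below is an assumed stand-in; the actual transition curve used by Costa (2009) is not specified in this preview and may differ:

```java
// Illustrative sketch: non-linear interpolation of one facial key
// point between its positions in two successive visemes.
public class VisemeBlend {
    // Smoothstep easing: slow at the start and end of the transition,
    // faster in the middle (an assumption, not Costa's exact curve).
    static double ease(double t) {
        return t * t * (3.0 - 2.0 * t);
    }

    // Position of a key point at normalized time t in [0, 1], moving
    // from its location in viseme A ("from") to viseme B ("to").
    static double keyPointAt(double from, double to, double t) {
        double clamped = Math.max(0.0, Math.min(1.0, t));
        double w = ease(clamped);
        return from + (to - from) * w;
    }
}
```

In the approach described above, the timed phonetic transcription would supply the start and end times of each transition, and the eased positions of the five key points would drive the image morphing between viseme images.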
