1. Introduction
An increasing number of human-computer interaction requirements have emerged with the advance of web and mobile computing technologies. As a result, well-known products such as Talking Tom (Outfit7 Limited, 2017) have become increasingly popular, especially among young users, and the corresponding research and development efforts have intensified in both academia and industry. Despite this remarkable enthusiasm and effort, many aspects still deserve further study and improvement. For example, a Talking Avatar could offer pronunciation and intonation customized to the characteristics of its user group, or provide a web-based conversion of real photographs into 3D cartoon images, with actions added according to the user's personality or emotions.
Existing studies (Lin et al., 2013; Nunes et al., 2011; Bitouk & Nayar, 2008; Lee et al., 2010; Danihelka et al., 2011; Migliardi et al., 2012; Ezzat et al., 2004; Xie & Liu, 2007; Wang et al., 2011; Cosatto et al., 2013; Xie et al., 2015) have shown that it is not easy to design and implement a Talking Avatar product with rich functionality and a good user experience (UE). In a nutshell, the difficulties have three main aspects. Firstly, the software should deliver a good UE: how to create vivid Avatars with a variety of decorations, and how to drive the Avatars' actions according to the characteristics and preferences of the user, are both very important. Secondly, the hardware capabilities of intelligent terminals and the influence of the network environment must be considered, because a resource-constrained mobile terminal in a web environment often cannot support real-time, complex runtime requirements. Thirdly, because the Android platform is an open, fragmented ecosystem, the compatibility and integration between a Talking Avatar and the Android platform are very complex.
With the evolution of web technologies and service computing, cloud computing and cloud services (Tsaftaris, 2014) open a new door for solving the above issues. Specific functions such as semantic understanding, face recognition, and video sharing are offered by cloud service platforms (IFLYTEK Limited, 2017; Urakawa et al., 2016; Zhangtao Network Technology Limited, 2017). Building on these third-party cloud services, developers can focus on application functions and quickly release satisfactory products for end users. However, many practical problems arise when the authors try to utilize cloud services, e.g. resource allocation and network bandwidth. Specific to Talking Avatar applications, existing works do not offer a complete solution with various Avatar images and decorations, facial expressions, gestures, and social-sharing functions. This paper presents a Talking Avatar software architecture in which three cloud services are integrated under the Software-as-a-Service mode (Chang, 2011).
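As a minimal illustration of this integration style, the sketch below shows how a client application could compose three independent cloud functions (speech synthesis, face recognition, and video sharing) behind a single facade, so that each provider can be replaced without touching application code. All interface and class names here are hypothetical and do not correspond to the paper's implementation or to any real provider SDK:

```java
// Hypothetical sketch of Software-as-a-Service composition.
// Each interface stands for one third-party cloud service; none of
// these names are taken from a real provider's API.
interface SpeechService  { String synthesize(String text); }    // e.g. cloud TTS
interface FaceService    { boolean detectFace(byte[] image); }  // e.g. face recognition
interface SharingService { String upload(String videoPath); }   // e.g. video sharing

// Facade: the application depends only on this class, so any one
// provider can be swapped out behind its interface.
class TalkingAvatarClient {
    private final SpeechService speech;
    private final FaceService face;
    private final SharingService sharing;

    TalkingAvatarClient(SpeechService s, FaceService f, SharingService sh) {
        this.speech = s;
        this.face = f;
        this.sharing = sh;
    }

    // Delegate each function to its cloud service.
    String speak(String text)      { return speech.synthesize(text); }
    boolean hasFace(byte[] image)  { return face.detectFace(image); }
    String share(String videoPath) { return sharing.upload(videoPath); }
}
```

Because each interface has a single abstract method, a concrete provider binding (or a stub for offline testing) can be supplied as a lambda, which keeps the local code independent of any one vendor's SDK.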
The contribution of the paper has three aspects. Firstly, the proposed solution integrates cloud services from different service providers to implement a Talking Avatar product with more functions and a better UE. Secondly, the authors present several local algorithms that make full use of the cloud services, e.g. similarity comparison and sub-string matching; owing to the page limitation, however, the discussion focuses on the architecture. Thirdly, the proposed architecture offloads much of the heavy processing to the cloud service side, and thereby achieves richer functionality. The rest of this paper is organized as follows. Section 2 discusses related research works. Section 3 presents the proposed Talking Avatar architecture. Section 4 shows and analyzes experimental results. Finally, Section 5 draws the conclusion.