Science of Emoticons: Research Framework and State of the Art in Analysis of kaomoji-type Emoticons

Michal Ptaszynski (Kitami Institute of Technology, Japan), Jacek Maciejewski (Independent Researcher, Poland), Pawel Dybala (Kotoken Language Laboratory, Poland), Rafal Rzepka (Hokkaido University, Japan), Kenji Araki (Hokkaido University, Japan) and Yoshio Momouchi (Hokkai-Gakuen University, Japan)
DOI: 10.4018/978-1-4666-0954-9.ch012


Emoticons are strings of symbols representing body language in text-based communication. For a long time they have been considered unnatural language entities. This chapter argues that, in the over 40-year-long history of text-based communication, emoticons have gained the status of an indispensable means of support for text-based messages. This makes them fully a part of Natural Language Processing. The fact that emoticons have been considered unnatural language expressions has two causes. Firstly, emoticons represent body language, which by definition is nonverbal. Secondly, there has been a lack of sufficient methods for the analysis of emoticons. Emoticons represent a multimodal (bimodal in particular) type of information. Although they are embedded in lexical form, they convey non-linguistic information. To prove this argument the authors propose that the analysis of emoticons be based on a theory designed for the analysis of body language. In particular, the authors apply the theory of kinesics to develop a state-of-the-art system for extraction and analysis of kaomoji, Japanese emoticons. The system performance is verified in comparison with other emoticon analysis systems. Experiments showed that the presented approach provides nearly ideal results in different aspects of emoticon analysis, thus proving that emoticons possess features of multimodal expressions.
Chapter Preview


One of the primary functions of the Internet is to connect people online. The first developed online communication media, such as e-mail or BBS forums, were based on text messages. Although later improvement and popularization of Internet connections allowed for phone calls or video conferences, the text-based message did not lose its popularity. However, its sensory limitations in communication modalities (no view or sound of the interlocutors) prompted users to develop communication strategies compensating for these limitations. One such strategy is the use of emoticons, strings of symbols imitating body language (faces or gestures). Today, the use of emoticons in online conversation contributes to the facilitation of the online communication process in e-mails, BBS, instant messaging applications, or blogs (Suzuki & Tsuda, 2006b; Derks et al., 2007; Chiu, 2007). Therefore, obtaining a sufficient level of computation for this kind of communication is likely to improve machine understanding of language used online, and contribute to the creation of more natural human-machine interfaces. Thus analysis of emoticons is of great importance in such fields as Human-Computer Interaction (HCI), Computational Linguistics (CL), or Artificial Intelligence (AI). However, for a long time emoticons have been considered unnatural language entities and included in a subfield of NLP called Unnatural Language Processing (UNLP). The term “Unnatural Language Processing” (UNLP), as roughly defined for the needs of the Baidu UNLP Contest in 2010, refers to a subfield of NLP dealing with language phenomena which cannot be captured by conventional language processing methods (Hagiwara, 2011). UNLP defined this way includes such problems as informal expressions, typos, emoticons, onomatopoeia, or unknown words. This chapter focuses on emoticons, and in particular on their Japanese type called kaomoji.
We claim that emoticons are far from being unnatural entities in language and point to some empirical evidence for this claim. We further note that there are two reasons why emoticons have been included in UNLP. Firstly, since emoticons are said to represent body language, the information they convey is by definition nonverbal. Secondly, there is a lack of sufficient methodology for emoticon analysis. We propose such a methodology in the form of a framework for the research on emoticons. Moreover, we present our research in developing a system for analysis of emoticons, based on an idea that takes advantage of the fact that emoticons incorporate both linguistic and nonlinguistic information.

The outline of this chapter is as follows. After providing definitions and explanations of the nomenclature used in this chapter, we present our approach to the analysis of emoticons and explain the general idea the research is based upon. Next, we present a review of other research dealing with emoticons. We describe two general fields that take emoticons as research objects, namely, social sciences and NLP. Next, we propose a general framework for the research on emoticons. Following that, we explain the particular procedures applied during automatic generation of the emoticon database used in our research. We also describe the structure and statistics of the database. Then we describe CAO, a system for emotiCon Analysis and decOding of affective information, built on the database. We describe the evaluation settings for the system and present the results of the evaluation. Finally, the chapter concludes with closing remarks, future directions, and planned applications for the system.
