Joint Model-Based Attention for Spoken Language Understanding Task

Xin Liu, RuiHua Qi, Lin Shao
Copyright: © 2020 |Pages: 12
DOI: 10.4018/IJDCF.2020100103


Intent determination (ID) and slot filling (SF) are two critical steps in the spoken language understanding (SLU) task. Conventionally, most previous work has handled each subtask separately. To exploit the dependencies between the intent label and the slot sequence, and to deal with both tasks simultaneously, this paper proposes a joint model (ABLCJ) trained with a united loss function. To make efficient use of both past and future input features, a Bi-LSTM with contextual information learns the representation of each step, and these representations are shared by the two tasks. The model also uses sentence-level tag information learned from a CRF layer to predict the tag of each slot, while a submodule-based attention captures global features of a sentence for intent classification. Experimental results demonstrate that ABLCJ achieves competitive performance on Shared Task 4 of NLPCC 2018.
Introduction And Background

According to Singh et al. (2000), “Systems in which human users speak to a computer in order to achieve a goal are called spoken dialogue systems (SDS)”. In recent years, task-oriented spoken dialogue systems, which help users finish tasks more efficiently via spoken interaction, have been deployed on a variety of devices (Lison & Kennington, 2016). Many well-known technology companies provide such systems, for example Apple's Siri, Microsoft's Cortana, and Baidu's Duer (Hoy, 2018). As a critical component of an SDS, spoken language understanding (SLU) aims to parse users’ queries and convert them into structured representations that machines can handle. The result of SLU is passed to the SDS to update the dialogue state and take the next proper action. Therefore, the performance of SLU is critical to the SDS as a whole (Tur & De Mori, 2011).

As SLU has become a focus of research communities, Shared Task 4 of NLPCC 2018, “Spoken Language Understanding in Task-Oriented Dialogue Systems”, provides a platform for evaluation. It requires parsing users’ multiple rounds of queries in a session and converting them into a structured form that machines can handle. To understand users’ queries expressed in spoken language, the task comprises two subtasks, namely intent determination (ID) and slot filling (SF), which automatically recognize the intent of a query and extract the associated arguments, or slots, toward achieving a goal.

There is a large body of literature on ID and SF, and much of it processes the subtasks in a pipeline framework: first the intent is classified, and then the semantic slots are extracted. ID is usually framed as a semantic utterance classification (SUC) problem, and many popular classifiers such as support vector machines (SVMs) (Fan et al., 2008), maximum entropy (Chelba et al., 2003), and RNN models (Ravuri & Stolcke, 2015) have been employed. Similarly, SF can be treated as a sequence tagging problem, which is customarily solved by traditional approaches such as Hidden Markov Models (HMMs) (Pieraccini et al., 1992), Conditional Random Fields (CRFs) (Raymond & Riccardi, 2007), and various RNN models (Mesnil et al., 2015; Vu et al., 2016; Huang et al., 2015). However, pipeline systems not only take more time to process tasks, but also cannot model the interaction between the subtasks.
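To make the two formulations concrete, consider a toy annotated query (the query, intent name, and slot labels below are illustrative inventions, not drawn from the shared-task data): ID assigns one label to the whole utterance, while SF assigns one BIO-style tag to every token.

```python
# Hypothetical annotated query illustrating the two subtask formulations.
query = ["show", "flights", "from", "boston", "to", "denver"]

# ID: semantic utterance classification -> a single label for the utterance.
intent = "find_flight"

# SF: sequence tagging -> one BIO tag per token, aligned with the query.
slots = ["O", "O", "O", "B-from_city", "O", "B-to_city"]

# The alignment constraint that makes SF a tagging problem:
assert len(query) == len(slots)
```

A pipeline system would run the intent classifier first and the slot tagger second, so an error in the first stage cannot be corrected by evidence from the second.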

In order to simplify the SLU system and use the information shared between ID and SF to improve both subtasks, a growing number of joint models for multiple tasks have been proposed in recent years (Liu & Lane, 2016; Zhang & Wang, 2016; Wen et al., 2017). Their ability to model the correlations between subtasks helps them achieve competitive performance on the ATIS dataset.

Motivated by the inherent ability of bidirectional RNNs to capture the past and future features of a sequence, this paper proposes a joint model that uses a Bi-LSTM to learn the representation of each word in a Chinese query and shares these representations between the ID task and the SF task. With a joint loss function, the two tasks can interact with and promote each other through the shared representations. Experimental results demonstrate that the joint model outperforms separate models for each task.
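The joint (united) loss typically sums a cross-entropy term for the intent with a per-token cross-entropy term for the slots, so gradients from both tasks flow into the shared Bi-LSTM. The sketch below is a minimal pure-Python illustration of that idea; the function names and the weighting factor `alpha` are assumptions for exposition, not the paper's exact formulation.

```python
import math

def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold])

def joint_loss(intent_probs, intent_gold, slot_probs_seq, slot_gold_seq, alpha=1.0):
    """United loss: intent loss plus the summed per-token slot loss.

    alpha is an assumed task-weighting hyperparameter (1.0 = equal weight).
    """
    l_intent = cross_entropy(intent_probs, intent_gold)
    l_slots = sum(cross_entropy(p, g)
                  for p, g in zip(slot_probs_seq, slot_gold_seq))
    return l_intent + alpha * l_slots

# Toy usage: an uncertain intent (p=0.5 on the gold class) and one
# perfectly predicted slot token contribute -log(0.5) + 0 = log(2).
loss = joint_loss([0.5, 0.5], 0, [[1.0]], [0])
```

Because both terms share the encoder's parameters, minimizing this single objective lets each task act as a regularizer for the other.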

The main contributions of this paper are:

  • Adaptation of a joint model based attention mechanism, Bi-LSTM and CRF for intent determination and slot filling;

  • An analysis of how intent determination and slot filling can benefit from the contextual information of the Chinese queries within a session.


Backbone Algorithm And Model

This paper proposes an Attention-based Bi-LSTM-CRF Joint Model (ABLCJ) to deal with both tasks simultaneously. The backbone algorithm used in this paper and the structure of the ABLCJ model are shown in Figure 1. As the figure shows, the model is composed of three sub-modules, drawn with gray backgrounds. The Bi-LSTM module at the bottom is responsible for feature extraction, the module at the upper left detects intents, and the module at the upper right performs slot filling.
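The data flow between the three sub-modules can be sketched as follows. This is a structural toy, not the authors' implementation: the encoder, the uniform "attention", the threshold rule, and the tagger below are all stand-ins chosen only to show that both task heads consume the same shared representations.

```python
def bi_lstm_encode(tokens):
    # Stand-in for the Bi-LSTM feature extractor: one vector per token
    # (here a toy 1-d "hidden state" derived from token length).
    return [[float(len(t))] for t in tokens]

def intent_module(hidden):
    # Upper-left sub-module: attention pools all steps into a global
    # feature, which feeds the intent classifier (uniform weights here).
    weights = [1.0 / len(hidden)] * len(hidden)
    global_feat = sum(w * h[0] for w, h in zip(weights, hidden))
    return "intent_a" if global_feat > 3 else "intent_b"

def slot_module(hidden):
    # Upper-right sub-module: stand-in for the CRF layer, emitting one
    # slot tag per time step.
    return ["O"] * len(hidden)

tokens = ["play", "jay", "chou"]
hidden = bi_lstm_encode(tokens)   # shared representations (bottom module)
intent = intent_module(hidden)    # sentence-level prediction
slots = slot_module(hidden)       # token-level predictions
```

The key structural point is that `hidden` is computed once and read by both heads, which is what allows a single joint loss to train all three sub-modules together.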
