Efficient Open Domain Question Answering With Delayed Attention in Transformer-Based Models

Wissam Siblini, Mohamed Challal, Charlotte Pasqual
Copyright: © 2022 | Pages: 16
DOI: 10.4018/IJDWM.298005


Open Domain Question Answering (ODQA) on a large-scale corpus of documents (e.g. Wikipedia) is a key challenge in computer science. Although Transformer-based language models such as Bert have been shown to outperform humans at extracting answers from small pre-selected passages of text, their high computational complexity becomes a problem when the search space is much larger. The most common way to deal with this problem is to add a preliminary information retrieval step that strongly filters the corpus and keeps only the relevant passages. In this article, the authors consider a more direct and complementary solution, which consists of restricting the attention mechanism in Transformer-based models so that computations can be managed more efficiently. The resulting variants are competitive with the original models on the extractive task and, in the ODQA setting, allow a significant acceleration of predictions and sometimes even an improvement in answer quality.


The last few years have given rise to many disruptive innovations in the field of Natural Language Processing (NLP), leading to significant improvements in evaluation metrics on public benchmarks (Wolf et al., 2019). In particular, the proposal of a novel architecture called the Transformer (Vaswani et al., 2017) and its adaptation into the versatile, easy-to-use language model Bert (Devlin, Chang, Lee, & Toutanova, 2019) have led to a series of publications generating continued enthusiasm. Recent Transformer-based models such as RoBerta (Liu et al., 2019), XLNet (Z. Yang et al., 2019), and Albert (Lan et al., 2019) have managed to outperform humans on difficult benchmarks for general language comprehension. This achievement has led to their widespread adoption in many applications.

Here we focus on automatic question answering, where we search for the answer to a user's question in a large set of text documents (e.g. the entire English Wikipedia, with millions of articles). Language models have proven effective on a sub-task called extractive Question Answering (eQA), sometimes also referred to as Reading Comprehension (RC), on the reference dataset SQuAD (Rajpurkar, Zhang, Lopyrev, & Liang, 2016): given a question-document pair, the goal is to find the answer within the document. On our target task, referred to as Open Domain Question Answering (ODQA), the problem is more complex because the search space for each question is much larger. Since Transformer-based readers already require a non-negligible time to process a single question-paragraph pair, they cannot process millions of such pairs in real time. The most common solution is to combine eQA with Information Retrieval (IR) (Manning, Schütze, & Raghavan, 2008): a retriever first selects a small number of relevant documents, and the costly reading comprehension model is applied only to them. This combination proved successful in BertSerini (W. Yang, Xie, et al., 2019), where the widely known Lucene with BM25 (Białecki, Muir, Ingersoll, & Imagination, 2012; Robertson et al., 1995) was used for the IR part and combined with Bert for the eQA part.
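The retrieve-then-read combination described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the code of BertSerini or Lucene: BM25 is implemented directly in pure Python with a naive whitespace tokenizer, the parameter values k1=0.9 and b=0.4 are assumed, and `toy_reader` is a hypothetical placeholder for the costly Bert reader, which in practice would extract an answer span and score it.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (assumption; Lucene uses real analyzers).
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 ranker over an in-memory list of documents."""

    def __init__(self, docs: List[str], k1: float = 0.9, b: float = 0.4):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
        self.tfs = [Counter(d) for d in self.docs]

    def score(self, query: str, idx: int) -> float:
        tf, dl = self.tfs[idx], len(self.docs[idx])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int) -> List[int]:
        # Indices of the k highest-scoring documents (stable order on ties).
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: self.score(query, i), reverse=True)
        return ranked[:k]


def toy_reader(question: str, passage: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a Bert/DilBert reader: returns the whole passage
    # as the "answer", scored by how many passage tokens occur in the question.
    q = set(tokenize(question))
    return passage, float(sum(1 for t in tokenize(passage) if t in q))


def answer(question: str, docs: List[str], retriever: BM25,
           reader: Callable[[str, str], Tuple[str, float]], k: int = 3) -> Tuple[str, float]:
    # The expensive reader only runs on the k passages kept by the retriever.
    candidates = [reader(question, docs[i]) for i in retriever.top_k(question, k)]
    return max(candidates, key=lambda c: c[1])
```

For example, with the three passages "the eiffel tower is in paris", "bert is a transformer language model" and "lucene is a search library", the question "what is bert" retrieves the Bert passage first. The point of the design is that the reader runs k times per question instead of once per document in the corpus.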

In this paper, we propose to tackle the time issue from a more direct and complementary angle, which consists of using partial attention in the eQA model so that many computations can be skipped or performed only once as a preprocessing step. More precisely, our contributions are the following: (1) We use a Delaying Interaction Layers (DIL) mechanism on Transformer-based models, which applies attention between subparts (segments) of the input sequence only in the last blocks of the architecture. We implement this mechanism for both Bert and Albert and refer to the variants as DilBert and DilAlbert. (2) We study their behavior in the standard eQA setting and show that both are competitive with the base models. (3) We analyze the impact of delayed interaction on the models' complexity and empirically confirm that, in the ODQA setting, it speeds up computations by an order of magnitude on either GPU or CPU. (4) Finally, we evaluate the models on the reference ODQA dataset OpenSQuAD (Chen, Fisch, Weston, & Bordes, 2017) by combining them with Answerini, as done by W. Yang, Xie, et al. (2019). Although DilBert (resp. DilAlbert) performs slightly worse than Bert (resp. Albert) when given a single relevant passage (eQA), it can outperform its base model in the ODQA setting, where the right answer must be selected among several paragraphs.
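To make the delayed-interaction idea concrete, the sketch below models DIL as a per-layer attention mask: in the first layers a token may only attend to tokens of its own segment (question or passage), and only the last `num_interaction_layers` layers allow cross-segment attention. The mask view and the names `dil_attention_masks` and `attention_pairs` are our own simplification for illustration; DilBert and DilAlbert may instead encode the segments in separate forward passes, and details such as position embeddings are ignored here. The cost function only counts query-key score computations and leaves out the feed-forward sublayers.

```python
from typing import List


def dil_attention_masks(segment_ids: List[int], num_layers: int,
                        num_interaction_layers: int) -> List[List[List[bool]]]:
    """Return one boolean mask per layer; mask[i][j] is True if token i may attend to token j.

    First (num_layers - num_interaction_layers) layers: attention only within a
    segment (block-diagonal mask). Last num_interaction_layers layers: full attention.
    The same mask objects are reused across layers, so treat them as read-only.
    """
    if not 0 <= num_interaction_layers <= num_layers:
        raise ValueError("num_interaction_layers must be in [0, num_layers]")
    n = len(segment_ids)
    within = [[segment_ids[i] == segment_ids[j] for j in range(n)] for i in range(n)]
    full = [[True] * n for _ in range(n)]
    num_separate = num_layers - num_interaction_layers
    return [within if layer < num_separate else full for layer in range(num_layers)]


def attention_pairs(q_len: int, p_len: int, num_layers: int,
                    num_interaction_layers: int, passage_cached: bool = False) -> int:
    """Count query-key score computations summed over all layers (attention only).

    Under this model, passage tokens in the separate layers never see the question,
    so their states can be computed offline once; passage_cached=True drops that
    part from the per-question cost. Setting num_interaction_layers == num_layers
    gives the standard full-attention baseline num_layers * (q_len + p_len) ** 2.
    """
    separate_layers = num_layers - num_interaction_layers
    passage_part = 0 if passage_cached else separate_layers * p_len ** 2
    return (separate_layers * q_len ** 2 + passage_part
            + num_interaction_layers * (q_len + p_len) ** 2)
```

For a 2-token question and a 3-token passage in a 4-layer model with one interaction layer, full attention computes 100 score pairs, delayed interaction 64, and only 37 at question time once the passage's lower-layer states have been cached. With realistic passages that are much longer than questions, this caching is where most of the savings would come from under these assumptions.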

Our code is also released with the paper¹ in order to (i) allow the reproduction of all the results in the paper and (ii) encourage new proposals by offering an ODQA pipeline similar to BertSerini, along with scripts to test it interactively on Wikipedia or evaluate it on OpenSQuAD.
