Introduction
The sheer volume of biomedical literature makes it difficult for information seekers, even within their own field of interest, to find the information they need (McDermid, Kristjanson, & Spry, 2010). The most widely used tools for accessing biomedical information are information retrieval (IR) systems such as PubMed, which gives access to the MEDLINE biomedical bibliographic database (Hristovski, Dinevski, Kastrin, & Rindflesch, 2015). However, finding short, precise, and sufficient answers is a challenging task for classical IR systems (Wren, 2011). In addition, users of classical IR systems often have to bear the burden of studying and filtering the citations returned for their queries in order to find the precise information they were looking for. Consequently, there is growing interest in biomedical question answering systems, which aim to minimize searching and browsing time while maximizing the usefulness of the retrieved knowledge (Bauer & Berleant, 2012). Question answering (QA) is regarded as a sophisticated form of IR characterized by information needs that are expressed as natural language statements or questions (Wren, 2011). It aims to provide inquirers with specific pieces of information as an answer by automatically analyzing thousands of articles, ideally in less than a few seconds. Typically, an automated QA system consists of three main processing phases, which can be studied and developed independently (Athenikos & Han, 2010; Cao et al., 2010; Neves & Leser, 2015): (1) question processing, (2) document processing, and (3) answer processing. Figure 1 illustrates the generic architecture of a biomedical QA system.
An input biomedical question is first handed over to the question processing phase, which consists of the following components: (a) question analysis, which extracts useful information such as biomedical entity names and semantic relationships; (b) question classification, which identifies the answer format and the topic (Cao et al., 2010; Patrick & Li, 2012; Roberts et al., 2014; Lopes et al., 2014; Sarrouti et al., 2015); and (c) query formulation, which constructs an IR-style query by transforming the question into a canonical form. The output of this phase is an appropriate query, which serves as input to the second phase, document processing. In this phase, an IR system is normally used to retrieve the relevant documents (Sarrouti & Alaoui, 2016). Passages are then extracted from these documents; they serve as candidate answers and as input to the last phase, answer processing. In this phase, the system uses an appropriate answer extraction algorithm to estimate the quality of the candidate answers. Finally, the top-ranked candidate answers and the raw texts from which they were extracted are shown to the user (Sarrouti & Ouatik, 2017).
Figure 1. A typical biomedical question answering system architecture
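To make the three-phase flow concrete, the following is a minimal sketch of how such a pipeline could be wired together. It is not the method of any cited system: the function names, the stopword list, the rule-based question classifier, and the term-overlap scoring are all illustrative assumptions standing in for the real question analysis, IR engine, and answer extraction components.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list, used only for this illustration.
STOPWORDS = {"what", "which", "who", "is", "are", "does", "do", "the",
             "a", "an", "of", "in", "for", "to", "by", "with"}


@dataclass
class Passage:
    doc_id: str
    text: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def process_question(question):
    """Phase 1: classify the question and formulate a keyword query."""
    tokens = tokenize(question)
    first = tokens[0] if tokens else ""
    # Toy rule-based classifier (an assumption, not a published scheme).
    if first in {"is", "are", "does", "do", "can"}:
        answer_type = "yes/no"
    elif first in {"what", "which", "who", "where", "when"}:
        answer_type = "factoid"
    else:
        answer_type = "summary"
    query = [t for t in tokens if t not in STOPWORDS]
    return answer_type, query


def overlap(query, text):
    """Number of distinct query terms that occur in the text."""
    return len(set(query) & set(tokenize(text)))


def process_documents(query, corpus, top_k=2):
    """Phase 2: retrieve the top_k documents and split them into passages."""
    ranked_docs = sorted(corpus.items(), key=lambda kv: -overlap(query, kv[1]))
    passages = []
    for doc_id, text in ranked_docs[:top_k]:
        # Treat each sentence as one candidate passage.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if sentence:
                passages.append(Passage(doc_id, sentence))
    return passages


def process_answers(query, passages, top_n=1):
    """Phase 3: rank candidate passages and keep the best top_n."""
    return sorted(passages, key=lambda p: -overlap(query, p.text))[:top_n]


def answer(question, corpus):
    """Run the three phases in sequence on a dict of {doc_id: text}."""
    answer_type, query = process_question(question)
    passages = process_documents(query, corpus)
    return answer_type, process_answers(query, passages)


if __name__ == "__main__":
    toy_corpus = {
        "d1": "Aspirin inhibits cyclooxygenase. It is used to reduce fever.",
        "d2": "Metformin lowers blood glucose. It is used for type 2 diabetes.",
    }
    kind, best = answer("Which enzyme does aspirin inhibit?", toy_corpus)
    print(kind, best[0].doc_id, best[0].text)
```

Because each phase is a separate function with a narrow interface, any one of them could be replaced (for example, swapping the overlap score for a proper ranking model) without touching the others, which mirrors the observation that the phases can be studied and developed independently.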