1. Introduction
Recent technological developments and advances in the field of biomedicine have brought an increasing knowledge of molecular and cellular physiology, genomics, proteomics and pharmacology. This has led to the generation of large amounts of experimental and computational biomedical data along with new discoveries, which are generally first described in biomedical research publications. Considering the bibliographic database MEDLINE alone, the number of published research articles grows by between 10,000 and 20,000 per week (National Library of Medicine, 2014). Reviewing all the literature related to a biomedical or pharmacological subject is therefore very time-consuming. Natural Language Processing (NLP) techniques can reduce the time that healthcare professionals and scientific researchers spend reviewing biomedical literature, and they also offer a promising approach for new knowledge discovery (Mack & Hehenberger, 2002).
Recently, one of the areas that has attracted a great deal of attention from the NLP research community is pharmacovigilance. Pharmacovigilance is formally defined by the World Health Organization (WHO) as the science and activities relating to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problems (WHO, 2002). Drug-drug interactions (DDIs) are a common and important type of adverse drug reaction (ADR), with a significant impact on patient safety and healthcare costs (Aronson, 2007; Jankel, McMillan, & Martin, 1994). A DDI occurs when one drug affects the levels or effects of another drug in the body. Although there are many drug databases and semi-structured resources - such as DrugBank (Law et al., 2014), Stockley (Baxter, 2013) and Drug Interaction Facts (Tatro, 2010), among others - to assist healthcare professionals in the prevention of DDIs, the quality of these databases is very uneven and the consistency of their content is limited, so it is very difficult to assign a real clinical significance to each interaction (Paczynski, Alexander, Chinchilli, & Kruszewski, 2012; Rodríguez-Terol et al., 2009). Moreover, despite the availability of these databases, a large proportion of the most current and valuable information on DDIs is unstructured, written in natural language and hidden in published articles. A simple search for the term “drug-drug interactions” in the web search engine Google Scholar® returns 63,500 results, and in the online library MEDLINE, 136,985 articles are indexed with the MeSH term “drug interaction”.
In recent years, DDIs have become even more relevant, since the increasingly frequent use of several drugs to treat one or more diseases (polytherapy) in a large population leads to an increased risk of drug combinations that have not been studied in pre-authorization clinical trials (Back & Else, 2013). Moreover, it has been shown that genetic factors can lead to differences in a drug’s effect in some individuals (Martiny & Miteva, 2013). Therefore, the consequences of a DDI can differ from one patient to another. This deeper knowledge about DDIs and their related factors has brought a deluge of publications describing new aspects of known DDIs as well as the discovery of new DDIs. Thus, the development of automatic methods for collecting, maintaining and interpreting information about drugs is crucial to achieving a real improvement in the early detection of DDIs.
Several recent NLP systems have shown promising results in extracting DDIs from biomedical literature (Chowdhury & Lavelli, 2013b; Segura-Bedmar, Martínez, & de Pablo-Sánchez, 2011; Segura-Bedmar, Martínez, & Herrero-Zazo, 2013; Segura-Bedmar, 2010; Thomas, Neves, Rocktäschel, & Leser, 2013). The major bottleneck to progress in this area, however, is that these systems rely on specific resources that provide the domain knowledge (databases, terminological vocabularies, corpora, ontologies, etc.) necessary to address Information Extraction (IE) tasks.