Multiword Expressions in NLP: General Survey and a Special Case of Verb-Noun Constructions

Alexander Gelbukh (National Polytechnic Institute, Mexico) and Olga Kolesnikova (National Polytechnic Institute, Mexico)
DOI: 10.4018/978-1-4666-6042-7.ch010
This chapter presents a survey of contemporary NLP research on Multiword Expressions (MWEs). MWEs pose a major problem for precise language processing because of their idiosyncratic nature and the diversity of their semantic, lexical, and syntactic properties. The chapter begins by considering definitions of MWEs, then describes some MWE classes, indicates the problems MWEs cause in language applications along with possible solutions, and presents methods for encoding MWEs in dictionaries and detecting them automatically in corpora. The chapter then goes into more detail on a particular MWE class called Verb-Noun Constructions (VNCs). Because of their frequency in corpora and their unique characteristics, VNCs present a research problem in their own right. Having outlined several approaches to VNC representation in lexicons, the chapter explains the formalism of Lexical Function as a possible VNC representation. Such a representation may serve as a tool for the automatic detection of VNCs in a corpus. This is illustrated on Spanish material using some supervised learning methods commonly applied to NLP tasks.
2. Definitions

Although MWEs are understood quite easily by intuition and their acquisition presents no difficulty to native speakers (though this is usually not the case for second language learners), it is hard to identify what features distinguish MWEs from free word combinations. Concerning this issue, the following MWE properties are mentioned in the literature: reduced syntactic and semantic transparency; reduced or absent compositionality; a more or less frozen or fixed status; possible violation of some otherwise general syntactic patterns or rules; a high degree of lexicalization (depending on pragmatic factors); and a high degree of conventionality (Calzolari, Fillmore, Grishman, Ide, Lenci, MacLeod, & Zampolli, 2002).

No consensus exists so far on the definition of MWEs, but almost all formulations found in research papers emphasize the idiosyncratic nature of this linguistic phenomenon. Here are some of the definitions most frequently referred to in papers; we have marked in boldface those concepts and properties that we think serve as criteria for distinguishing MWEs from compositional phrases:

