Single-Sentence Compression using XGBoost

Deepak Sahoo (IIIT-Bhubaneswar, Odisha, India) and Rakesh Chandra Balabantaray (IIIT-Bhubaneswar, Odisha, India)
Copyright: © 2019 | Pages: 11
DOI: 10.4018/IJIRR.2019070101


Sentence compression presents a sentence in fewer words than the original without changing its meaning. Recent work on sentence compression formulates the task as an integer linear programming (ILP) problem and solves it with an external ILP solver, which suffers from slow running time. In this article, the sentence compression task is formulated as a two-class classification problem and solved with a gradient boosting technique. Different models are created using two different datasets, and the best model is taken for evaluation. The quality of compression is measured using two important quality measures, informativeness and compression rate. The proposed approach achieves 70.2 percent informativeness and a 38.62 percent compression rate.
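The two-class formulation can be illustrated with a minimal sketch in which each token receives a keep (1) or delete (0) label and the compressed sentence is the sequence of kept tokens. The helper names `apply_labels` and `compression_rate`, the example sentence, and the hand-written labels are all hypothetical; in the article's setting the labels would come from a trained gradient boosting classifier, which is not reproduced here. The compression rate is assumed to be the ratio of compressed length to original length in tokens, since this preview does not define it.

```python
from typing import List


def apply_labels(tokens: List[str], keep: List[int]) -> List[str]:
    """Return the tokens labelled 1 (keep), dropping those labelled 0 (delete)."""
    if len(tokens) != len(keep):
        raise ValueError("one label per token is required")
    return [tok for tok, label in zip(tokens, keep) if label == 1]


def compression_rate(original: List[str], compressed: List[str]) -> float:
    """Assumed definition: compressed length divided by original length, in tokens."""
    if not original:
        raise ValueError("original sentence is empty")
    return len(compressed) / len(original)


# Hypothetical example: the labels are written by hand and stand in for
# the per-token output of a two-class classifier.
tokens = "The committee , which met on Tuesday , approved the new budget".split()
labels = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

compressed = apply_labels(tokens, labels)
print(" ".join(compressed))
print(f"compression rate: {compression_rate(tokens, compressed):.2%}")
```

Under this assumed definition a lower rate means a shorter output. Informativeness is not computed in this sketch, because it requires comparison against a reference compression.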
Article Preview

Dozens of systems have been introduced for sentence compression. Most of them are deletion-based and rely heavily on syntactic information to minimize grammatical errors in the output, which leads to very complex systems. A sentence compression system is generally used as a module in a text summarization system, so it should be simple; otherwise, the overall complexity of the summarization system becomes very high.

Different sentence compression systems have been proposed by different researchers since the very first approaches by Grefenstette (1998), Knight & Marcu (2000, 2002), and Jing & McKeown (2000). Parsers play an important role in NLP tasks. Some sentence compression methods depend heavily on parsers to identify syntactic information for the compression task (Clarke & Lapata, 2006; McDonald, 2006; Toutanova, Brockett, Gamon, Jagarlamundi, Suzuki & Vanderwende, 2007; Nomoto, 2009), but systems based on syntactic information are not robust.

To produce grammatically correct compressed sentences, some methods operate on the syntactic structure of the parse tree, using techniques that modify or rectify the syntactic trees (Galley & McKeown, 2007; Cohn & Lapata, 2009; Filippova & Strube, 2008a; Galanis & Androutsopoulos, 2010; Wang, Raghavan, Castelli, Florian & Cardie, 2013).
