Introduction
A lengthy news article contains a large amount of information. Rather than reading the full article, a reader can read only the headline and grasp the key concept. The headline is an important part of a news item: it conveys vital information in little time. It consists of a single sentence that gives an overall idea of the piece of writing, and it reduces the reader's cognitive burden by presenting only the significant information.
Reading a lengthy news article is a time-consuming and tedious process. A headline is therefore needed to save the reader's time and to give a quick understanding of the article's vital content. Headline construction generally involves analysing the input text, identifying the important concepts, and then forming the headline. To frame an effective headline, key terms retrieved by keyword extraction techniques can be used (Habibi & Popescu-Belis, 2015). Headline generation is one of the vital techniques for reducing information overload and retrieving key concepts from text (Kaikhah, 2004).
The literature offers many approaches to automatic headline generation, commonly contrasted as statistical versus linguistic and abstractive versus extractive. These techniques aim to improve the quality of the headline (Soricut & Marcu, 2007; Banko et al., 2000). In the extractive approach, the most relevant text is extracted and compressed to the proper size (Malhotra & Dixit, 2013). In the abstractive approach, the input text is analyzed, and significant words are selected and then glued together to form the headline (Banko, Mittal, & Witbrock, 2000). The statistical and linguistic approaches use the HMM Hedge model, the Hedge Trimmer model (Dorr & Zajic, 2003), and knowledge of language structure for headline generation.
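The extractive idea described above (select the most relevant text, then compress it to size) can be illustrated with a minimal sketch. This is not the method of any of the cited works: the frequency-based sentence score, the tiny stopword list, the regex-based sentence splitter, and the function names are all assumptions chosen for brevity, and compression is reduced to simple truncation.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the article).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "are",
             "was", "for", "by", "with", "that", "its", "it", "as", "at"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def extractive_headline(text, max_words=10):
    """Pick the sentence with the highest average term frequency, then truncate it."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # Term frequencies over the whole article act as a crude relevance signal.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    best = max(sentences, key=score)
    # "Compression" here is plain truncation to max_words words.
    words = best.rstrip(".!?").split()
    return " ".join(words[:max_words])


article = ("The city council approved the new budget on Monday. "
           "The budget raises funding for schools. "
           "Critics said the vote was rushed.")
print(extractive_headline(article))
```

An abstractive system would instead select significant words and assemble a new sentence from them, which requires a language model or grammar knowledge that this sketch deliberately leaves out.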
The research presented in this paper is an extended version of the work presented in (Shrawankar & Wankhede, 2016). The aim of this work is to construct a headline from a detailed, lengthy news article by applying keyword extraction and selected NLP techniques. The work is restricted to English news articles. The dataset consists of BBC news articles of various types, including sports, politics, technology, business, and education. In addition, any online news article can be given as input to the system. The input article undergoes several pre-processing steps, such as sentence segmentation and tokenization. Key terms are retrieved by applying available keyword extraction techniques (Habibi & Popescu-Belis, 2015), which help to construct a proper headline. Keyphrases are picked out using the Keyphrase Extraction Algorithm (KEA) (Bohne et al., 2011; Witten et al., 1999; Kumar & Srinaathan, 2008; Li, He, & Yangnan, 2014), which helps to improve the quality of the headline. Because a structured news article carries its most informative sentences in the leading paragraph, some of these leading sentences are selected, and their parse trees are generated using an NLP parsing technique (Li, He, & Yangnan, 2014). Natural language processing is based on the idea of designing and building a computer system that can recognise, analyse, understand, and generate sentences in natural, human-oriented language. The applications of NLP are mainly divided into two parts: