Modified LexRank for Tweet Summarization


Avinash Samuel (Department of CEA, GLA University, Mathura, India) and Dilip Kumar Sharma (Department of CEA, GLA University, Mathura, India)
Copyright: © 2016 | Pages: 12
DOI: 10.4018/IJRSDA.2016100106


Summary generation is important when a user needs the key features of a document without reading the whole document. Summarization is broadly of two types: (1) single-document summarization and (2) multi-document summarization. This paper considers the microblogging environment, which restricts the number of characters in a post; single-document summarizers are therefore not applicable. A microblog post can be summarized along many features, for example the post's topic, its posting time, and the occurrence of the event. This paper proposes a method that uses the temporal features of microblog posts to build an extractive summary of an event from the posts, with the aim of improving summary quality by including all the key features.


A microblogging platform such as Twitter can receive about 400 million tweets each day, and it has emerged as an important source of news, blogs, personal thoughts, etc. Tweets in their raw form, while informative, can also be overwhelming. For example, searching for a hot topic on Twitter may yield a huge number of tweets spanning weeks. Even if filtering is allowed, wading through so many tweets for important content would be a nightmare, not to mention the enormous amount of noise and redundancy one may encounter. To make matters worse, new tweets satisfying the filtering criteria may arrive continuously, at an unpredictable rate. Users turn to microblogging sites not only to express their thoughts but also to find answers to queries. A system that creates a summary from a set of tweets is in some ways similar to a search engine and must have the three important components explained in D. K. Sharma and A. K. Sharma (2013):

  1. Extractor: fetches the set of tweets for the given keyword(s),

  2. Analyser: an indexer that processes the words in each tweet and stores the resulting index in a database, and

  3. Interface Generator: a query handler that understands the needs and preferences of the user.
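The three components listed above can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' implementation: the function names `extract`, `build_index`, and `query` are hypothetical, an in-memory dictionary stands in for the database, and plain substring and word matching stand in for whatever matching a real system would use.

```python
import re
from collections import defaultdict


def extract(tweets, keywords):
    """Extractor (hypothetical): keep tweets that mention any keyword."""
    lowered = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t.lower() for k in lowered)]


def build_index(tweets):
    """Analyser (hypothetical): map each word to the positions of the tweets containing it.

    An in-memory dict is used here in place of a database.
    """
    index = defaultdict(set)
    for position, text in enumerate(tweets):
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(position)
    return index


def query(index, tweets, terms):
    """Interface generator (hypothetical): return tweets containing every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    hits = set.intersection(*postings)
    return [tweets[i] for i in sorted(hits)]


sample = [
    "Nepal earthquake relief underway",
    "Lunch was great",
    "Earthquake aftershocks felt in Nepal",
]
fetched = extract(sample, ["earthquake"])
index = build_index(fetched)
relief_tweets = query(index, fetched, ["nepal", "relief"])
```

Here `fetched` holds the two earthquake tweets, and `relief_tweets` holds only the first of them, because it is the only one containing both query terms.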

Woodsend and Lapata (2010) used a single document to create highlights in their experiment. Their method took the sentences and phrases of the document, calculated their importance, and created the highlights. Tweet summarization, in contrast, requires functionality that differs substantially from conventional summarization. In particular, tweet summarization has to take into account the temporal feature of the arriving tweets. Consider a scenario in which a user wants to search the microblogging environment for the term 'Nepal Earthquake'. The system responsible for summarizing the tweets will continuously monitor the stream of incoming tweets for those related to the searched keywords, while producing a timeline of the stream in real time.
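The monitoring scenario above can be illustrated with a minimal sketch that filters a stream of timestamped tweets by keyword and groups the matches into fixed time buckets. The names `matches` and `build_timeline`, the `(timestamp, text)` tuple format, the one-hour bucket size, and the sample tweets are all assumptions made for illustration; the source does not specify how the timeline is built.

```python
from datetime import datetime


def matches(text, keywords):
    """Return True when the tweet text contains every keyword (case-insensitive)."""
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)


def build_timeline(stream, keywords, bucket_minutes=60):
    """Group matching tweets into time buckets, returned in chronological order.

    `stream` is any iterable of (datetime, text) pairs, so a live feed
    could be passed in place of a list. Buckets are aligned to midnight.
    """
    buckets = {}
    for timestamp, text in stream:
        if not matches(text, keywords):
            continue
        minutes = timestamp.hour * 60 + timestamp.minute
        floored = minutes - minutes % bucket_minutes
        start = timestamp.replace(
            hour=floored // 60, minute=floored % 60, second=0, microsecond=0
        )
        buckets.setdefault(start, []).append(text)
    return sorted(buckets.items())


# Made-up sample stream for illustration only.
stream = [
    (datetime(2015, 4, 25, 6, 15), "Nepal earthquake hits Kathmandu"),
    (datetime(2015, 4, 25, 6, 40), "Strong Nepal earthquake reported"),
    (datetime(2015, 4, 25, 6, 50), "Morning coffee"),
    (datetime(2015, 4, 25, 7, 5), "Nepal earthquake: rescue teams deployed"),
]
timeline = build_timeline(stream, ["Nepal", "Earthquake"])
```

Each entry of `timeline` pairs a bucket start time with the tweets that arrived in it, which gives a summarizer one candidate set per time window.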

Summarizing data is a technique that many systems use to deal with the problem of data smog. Data smog occurs when the amount of input to a system exceeds its processing capacity. Data analysts have fairly limited cognitive processing capacity; thus, when data smog occurs, a reduction in decision quality is likely. Many commonly used software products provide data summarization and so save users from data smog, for example Google's Picasa picture viewer, which can summarize a user's photo album.

Many sites, such as Yelp and Zomato, provide reviews of places and products across multiple domains. Suppose a user is looking for information about a restaurant in Toronto or New Delhi: to get the overall picture, he or she would have to skim through all the reviews posted about that restaurant. Yelp summarizes the reviews so that the user can get a general idea about the place without reading every post. Summarization is used in multiple domains, such as social networking sites (SNS), web search results, document sets, and newspaper articles. It helps the user obtain a quick overview of a much larger dataset.

Summarization techniques generate three basic types of summaries, as follows:
