Cite Article

MLA

Yu, Shanshan, et al. "Towards High Performance Text Mining: A TextRank-based Method for Automatic Text Summarization." International Journal of Grid and High Performance Computing (IJGHPC), vol. 8, no. 2, 2016, pp. 58-75. https://doi.org/10.4018/IJGHPC.2016040104

APA

Yu, S., Su, J., Li, P., & Wang, H. (2016). Towards High Performance Text Mining: A TextRank-based Method for Automatic Text Summarization. International Journal of Grid and High Performance Computing (IJGHPC), 8(2), 58-75. https://doi.org/10.4018/IJGHPC.2016040104

Chicago

Yu, Shanshan, et al. "Towards High Performance Text Mining: A TextRank-based Method for Automatic Text Summarization." International Journal of Grid and High Performance Computing (IJGHPC) 8, no. 2 (2016): 58-75. https://doi.org/10.4018/IJGHPC.2016040104


Towards High Performance Text Mining: A TextRank-based Method for Automatic Text Summarization

Shanshan Yu (College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China), Jindian Su (College of Computer Science and Engineering, South China University of Technology, Guangzhou, China), Pengfei Li (College of Computer Science and Engineering, South China University of Technology, Guangzhou, China), and Hao Wang (Norwegian University of Science and Technology in Aalesund, Aalesund, Norway)

Source Title: International Journal of Grid and High Performance Computing (IJGHPC) 8(2)

DOI: 10.4018/IJGHPC.2016040104

Abstract

As a typical unsupervised learning method, the TextRank algorithm performs well for large-scale text mining, especially for automatic summarization and keyword extraction. However, in automatic summarization TextRank considers only the similarities between sentences and neglects information about text structure and context. To overcome these shortcomings, the authors propose an improved, highly scalable method called iTextRank. When building the TextRank graph in the new method, the authors compute sentence similarities and adjust the node weights using statistical and linguistic features, such as similarity to titles, paragraph structure, special sentences, and sentence position and length. Their analysis shows that the time complexity of iTextRank is comparable to that of TextRank. More importantly, two experiments show that iTextRank has a higher accuracy and lower recall rate than TextRank, and that it is as effective as several popular online automatic summarization systems.
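The pipeline described in the abstract (build a sentence graph, rank it, then adjust node weights with extra features) can be sketched roughly as follows. This is a minimal illustration only, not the authors' iTextRank: the similarity function is the word-overlap measure from the original TextRank paper, while `adjust_scores`, `title_weight`, and `lead_bonus` are hypothetical names and values chosen here, since this preview does not give the paper's actual feature formulas.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a, b):
    """Word-overlap similarity between two token lists (original TextRank)."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) = 0 would make the denominator vanish
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each new score is computed from the previous iteration's scores.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def adjust_scores(sentences, scores, title, title_weight=0.5, lead_bonus=0.2):
    """Hypothetical adjustment: boost title-like and leading sentences.

    The multipliers are assumptions for illustration, not the paper's values.
    """
    title_tokens = set(tokenize(title))
    adjusted = []
    for index, (sentence, score) in enumerate(zip(sentences, scores)):
        words = set(tokenize(sentence))
        title_sim = len(words & title_tokens) / len(title_tokens) if title_tokens else 0.0
        position = lead_bonus if index == 0 else 0.0
        adjusted.append(score * (1 + title_weight * title_sim + position))
    return adjusted


def summarize(sentences, title, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = adjust_scores(sentences, textrank(sentences), title)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, title, k=2)` on a list of sentence strings returns the two top-ranked sentences in document order. The adjustment here costs one pass over the sentences, so the pairwise graph construction still dominates the running time, which is in line with the abstract's statement that iTextRank's time complexity is comparable to TextRank's.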


1. Introduction

It is commonly agreed that we are in the era of big data (Wang et al., 2015). Among the various types of data, text is the most common and pervasive across the network. Although many effective technologies for distributed or parallel computing have been proposed, e.g., MapReduce (Slagter et al., 2013; Slagter et al., 2015), the information overload problem is getting worse as the quantity of data keeps increasing rapidly. Automatic text summarization has emerged as an effective technology for producing a concise and fluent summary that conveys the key information in the original text document (Nenkova & McKeown, 2012). High performance automatic summarization has become a very important topic in machine learning and data mining, and it is widely used in many industrial sectors, especially in search engines such as Google, Baidu and Yahoo, and in news portals such as BBC, CNN and NBC News. Researchers have developed various word-based, sentence-based and graph-based summarization methods. Among them, graph-based methods have attracted considerable attention. For example, Ferreira et al. (2013) proposed a four-dimensional graph model (covering similarity, semantic similarity, co-reference and discourse information) that takes into account co-reference resolution and the role of pronouns in connecting sentences. See Gupta & Lehal (2010) and Joshi & Sonawane (2015) for more detailed surveys of extractive summarization techniques and graph-based methods.
