Extracting and Summarizing the Commonly Faced Security Issues from Community Question Answering Site

Abhishek Kumar Singh (NIT Raipur, Raipur, India), Naresh Kumar Nagwani (National Institute of Technology Raipur, Raipur, India) and Sudhakar Pandey (National Institute of Technology Raipur, Raipur, India)
Copyright: © 2019 |Pages: 12
DOI: 10.4018/IJISP.201907010103

Community question-answering (CQA) sites are popular information-seeking platforms where users communicate with their peers. With the rapid development of information technology, security-related posts are gaining popularity on these sites. CQA sites contain a wide range of posts, from classic cryptography to recently popular mobile security. Investigating such posts can be useful for researchers, teachers and developers. In this article, spectral clustering and frequent term-based summarization techniques are proposed for security-related posts. The proposed method is developed in three stages. In the first stage, security-related folksonomies are created and a security post profile matrix is built with the help of tag frequency-inverse security post frequency. In the second stage, security-related posts are grouped with the help of spectral clustering algorithms. Finally, in the third stage, frequent terms are extracted from each cluster for security-related post summarization with the help of frequent words and semantic similarity.
1. Introduction

Community-based Question-Answering (CQA) sites provide an online platform where community users can ask and answer questions. CQA sites help community users share and produce knowledge. Some examples of CQA sites are Yahoo Answers, Stack Overflow, StackExchange and Quora. In recent times, a huge number of technical workers have been using CQA sites (Zhou et al. 2016, Jiang et al. 2015, Wang et al. 2016, Davis et al. 2017), as these sites provide a good platform for knowledge sharing between communities and peers. CQA sites contain millions of posts in the form of questions and answers, which cover a wide range of areas and topics, including security-related questions. With the rapid growth of information technology questions, security-related questions have also attracted considerable attention from community users. Technical workers care about the security of the software they develop. The security of any software is critical, as a failure can cause financial loss, information loss and confidentiality leakage (Yang, 2016). There are many security-related topics, and they change over time. Therefore, it is important to carry out a comprehensive study that investigates and analyses security-related questions. Text summarization is one technique for analysing high volumes of data. It is the process of creating a shorter version of an original text document while preserving most of its useful information (Nagwani and Verma 2016).

In this paper, a new model is proposed to summarize security-related posts based on folksonomy, spectral clustering, frequent terms and semantic similarity. The proposed summarization method is developed in three stages. In the first stage, the security-related posts of the CQA sites are extracted and a security-related folksonomy is formulated for the CQA sites (Godoy et al., 2014, Singh et al., 2017). A security-related folksonomy can be defined as a tuple $(U, T, P_S, Y_S)$, where $U$ is a set of users, $T$ is a set of tags associated with security-related posts, $P_S$ is a set of security-related posts and $Y_S$ is a ternary relation $Y_S \subseteq U \times T \times P_S$. In this stage, the security post profile matrix is also created with the help of the folksonomy and tag frequency-inverse security post frequency (TF-ISPF) (Amer et al., 2016).
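The first stage can be illustrated with a minimal sketch. The preview does not give the TF-ISPF formula, so the weighting below is an assumption made by analogy with TF-IDF: a tag's share of a post's tag assignments multiplied by the logarithm of the number of posts divided by the number of posts carrying that tag. The toy triples, the function name `tf_ispf_matrix` and all user, tag and post names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy folksonomy: each triple is one (user, tag, post) assignment.
assignments = [
    ("alice", "encryption", "p1"),
    ("alice", "aes", "p1"),
    ("bob", "encryption", "p2"),
    ("bob", "rsa", "p2"),
    ("carol", "android", "p3"),
    ("carol", "permissions", "p3"),
]

def tf_ispf_matrix(assignments):
    """Build a post-by-tag weight matrix from (user, tag, post) triples.

    Assumed weighting (TF-IDF analogy, not taken from the paper):
        weight(tag, post) = tf(tag, post) * log(N / pf(tag))
    tf is the tag's share of the post's tag assignments, N is the number
    of posts, and pf(tag) is the number of posts that carry the tag.
    """
    posts = sorted({post for _, _, post in assignments})
    tags = sorted({tag for _, tag, _ in assignments})

    # Count how often each tag was assigned to each post.
    counts = {post: Counter() for post in posts}
    for _, tag, post in assignments:
        counts[post][tag] += 1

    # Number of distinct posts in which each tag occurs.
    post_freq = Counter(tag for post in posts for tag in counts[post])

    n_posts = len(posts)
    matrix = []
    for post in posts:
        total = sum(counts[post].values())
        row = [
            (counts[post][tag] / total) * math.log(n_posts / post_freq[tag])
            for tag in tags
        ]
        matrix.append(row)
    return posts, tags, matrix

posts, tags, matrix = tf_ispf_matrix(assignments)
for post, row in zip(posts, matrix):
    print(post, [round(weight, 3) for weight in row])
```

Under this assumed weighting, a tag shared by many posts (here `encryption`) receives a lower weight than a tag specific to one post (here `aes`), which is the usual motivation for an inverse-frequency term.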

In the second stage, spectral clustering is applied to the security post profile matrix to group similar security-related posts. Spectral clustering is one of the best-known graph-based clustering algorithms because it achieves impressive results in various applications (Langone et al., 2017). Finally, in the third stage, text summarization is performed by generating frequent terms. In this step, frequent terms are identified with the help of word frequency and semantic similarity. Semantic similarity is a concept in which a metric is defined over a set of documents such that the distance between them is based on the likeness of their meaning.
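The second and third stages can be sketched in a few lines, under several stated assumptions. This is not the authors' implementation: it covers only the two-cluster case, uses cosine similarity as the affinity, uses the unnormalized graph Laplacian, and splits posts by the sign of an approximate Fiedler vector obtained with power iteration. The frequent-term step counts raw words only and omits the semantic similarity part. The profile rows, post texts and function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def affinity(rows):
    """Pairwise cosine affinities with a zero diagonal."""
    n = len(rows)
    return [[0.0 if i == j else cosine(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]

def fiedler_vector(W, iters=500):
    """Approximate the eigenvector of L = D - W with the second-smallest
    eigenvalue, using power iteration on (c*I - L) while projecting out
    the constant vector. Assumes at least two posts."""
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg) + 1.0          # exceeds the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        # Apply (c*I - L) = c*I - D + W to v.
        v = [c * v[i] - deg[i] * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(v) / n
        v = [x - mean for x in v]     # remove the all-ones component
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

def spectral_bipartition(rows):
    """Split posts into two groups by the sign of the Fiedler vector."""
    return [0 if x >= 0 else 1 for x in fiedler_vector(affinity(rows))]

def frequent_terms(texts, labels, top_k=2):
    """Most frequent words per cluster (no stop-word or semantic filtering)."""
    counts = {}
    for text, label in zip(texts, labels):
        counts.setdefault(label, Counter()).update(text.lower().split())
    return {label: [term for term, _ in counter.most_common(top_k)]
            for label, counter in counts.items()}

# Hypothetical post profile rows (one row per post) and post texts.
profile = [
    [1.0, 0.8, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.8, 1.0],
]
texts = [
    "aes encryption key size",
    "rsa encryption key exchange",
    "android permission model",
    "android app permission leak",
]

labels = spectral_bipartition(profile)
summary = frequent_terms(texts, labels, top_k=2)
print(labels, summary)
```

The cluster numbering is arbitrary because the sign of an eigenvector is not determined, so only the grouping is meaningful. For more than two clusters, a common approach is to embed posts with several Laplacian eigenvectors and then run k-means on the embedding; a numerical library would normally replace the hand-written power iteration.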
