1. Introduction
Community-based Question-Answering (CQA) sites provide an online platform where community users can ask and answer questions, and thereby help those users share and produce knowledge. Examples of CQA sites include Yahoo Answers, Stack Overflow, StackExchange and Quora. In recent times, a large number of technical workers have been using CQA sites (Zhou et al. 2016, Jiang et al. 2015, Wang et al. 2016, Davis et al. 2017), as they provide a good platform for knowledge sharing between communities and peers. CQA sites contain millions of posts in the form of questions and answers, which cover a wide range of areas and topics, including security-related questions. With the rapid growth of information technology questions, security-related questions have also attracted considerable attention from community users. Technical workers care about the security of the software they develop. Software security is critical because security failures can cause financial loss, information loss and confidentiality leakage (Yang, 2016). Security-related topics are numerous and change over time. It is therefore important to study and analyse security-related questions comprehensively. Text summarization is one technique for analysing high volumes of data. It is the process of creating a shorter version of an original text document while preserving most of its useful information (Nagwani and Verma 2016).
In this paper, a new model is proposed to summarize security-related posts based on folksonomy, spectral clustering, frequent terms and semantic similarity. The proposed summarization method is developed in three stages. In the first stage, security-related posts are extracted from the CQA sites and a security-related folksonomy is formulated for them (Godoy et al., 2014, Singh et al., 2017). A security-related folksonomy can be defined as a tuple $(U, T, P_S, Y_S)$, where $U$ is a set of users, $T$ is a set of tags associated with security-related posts, $P_S$ is a set of security-related posts and $Y_S$ is a ternary relation $Y_S \subseteq U \times T \times P_S$. In this stage, a security post profile matrix is also created with the help of the folksonomy and tag-frequency inverse security post frequency (TF-ISPF) (Amer et al., 2016).
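To make the first stage more concrete, the following is a minimal sketch of how a security post profile matrix might be built from folksonomy triples. The exact TF-ISPF weighting is not given in this section, so the formula used here is an assumption made by analogy with TF-IDF: the relative frequency of a tag within a post, multiplied by the logarithm of the number of posts divided by the number of posts carrying that tag. The toy triples, the name `Y_S` for the list and the function `tf_ispf_matrix` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy folksonomy: each (user, tag, post) triple is one
# element of the ternary relation between users, tags and posts.
Y_S = [
    ("u1", "encryption", "p1"),
    ("u2", "encryption", "p1"),
    ("u1", "aes", "p1"),
    ("u2", "xss", "p2"),
    ("u3", "xss", "p2"),
    ("u3", "javascript", "p2"),
    ("u1", "encryption", "p3"),
    ("u3", "passwords", "p3"),
]


def tf_ispf_matrix(assignments):
    """Return (posts, tags, matrix); matrix[i][j] weights tag j in post i.

    Assumed weighting (TF-IDF analogue, not taken from the paper):
    weight = (tag count in post / all tag assignments in post)
             * log(number of posts / number of posts carrying the tag)
    """
    tag_counts = defaultdict(Counter)  # post -> Counter of its tags
    for _user, tag, post in assignments:
        tag_counts[post][tag] += 1

    posts = sorted(tag_counts)
    tags = sorted({tag for _user, tag, _post in assignments})
    n_posts = len(posts)

    # Security post frequency: in how many posts each tag occurs.
    spf = Counter(tag for post in posts for tag in tag_counts[post])

    matrix = []
    for post in posts:
        total = sum(tag_counts[post].values())
        row = []
        for tag in tags:
            tf = tag_counts[post][tag] / total
            ispf = math.log(n_posts / spf[tag])
            row.append(tf * ispf)
        matrix.append(row)
    return posts, tags, matrix


posts, tags, matrix = tf_ispf_matrix(Y_S)
for post, row in zip(posts, matrix):
    print(post, [round(w, 3) for w in row])
```

Each row of the resulting matrix is the profile of one post over the tag vocabulary. Under this assumed weighting, a tag that occurs in every post receives a weight of zero, since it does not help to distinguish one post from another.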
In the second stage, spectral clustering is applied to the security post profile matrix to group similar security-related posts. Spectral clustering is one of the best-known graph-based clustering algorithms, as it achieves impressive results in various applications (Langone et al., 2017). Finally, in the third stage, text summarization is performed by generating frequent terms. In this step, frequent terms are identified with the help of word frequency and semantic similarity. Semantic similarity is a concept in which a metric is defined over a set of documents such that the distance between them reflects the likeness of their meaning.
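The clustering stage can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it assumes cosine similarity between post profiles as the graph affinity, splits the posts into only two groups, and finds the second eigenvector of the normalized affinity matrix by power iteration. The profile values and the names `cosine` and `spectral_bipartition` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def spectral_bipartition(rows, iterations=500):
    """Split post profiles into two groups (labels 0 and 1).

    Uses the sign of the second eigenvector of the normalized affinity
    matrix S = D^(-1/2) W D^(-1/2). Assumes non-negative profile weights.
    """
    n = len(rows)
    if n < 2:
        return [0] * n

    # Affinity graph: cosine similarity between posts, no self-loops.
    W = [[0.0 if i == j else cosine(rows[i], rows[j]) for j in range(n)]
         for i in range(n)]
    degree = [sum(row) for row in W]
    if min(degree) <= 0.0:
        raise ValueError("every post must be similar to at least one other post")

    d = [math.sqrt(x) for x in degree]
    S = [[W[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]

    # The leading eigenvector of S is proportional to sqrt(degree).
    d_norm = math.sqrt(sum(x * x for x in d))
    v1 = [x / d_norm for x in d]

    def deflate(x):
        """Remove the component of x along the leading eigenvector."""
        proj = sum(a * b for a, b in zip(x, v1))
        return [a - proj * b for a, b in zip(x, v1)]

    # Power iteration on (S + I) / 2, whose eigenvalues lie in [0, 1],
    # restricted to the subspace orthogonal to v1.
    x = deflate([float(i + 1) for i in range(n)])
    for _ in range(iterations):
        y = [0.5 * (x[i] + sum(S[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        y = deflate(y)
        norm = math.sqrt(sum(a * a for a in y))
        if norm < 1e-12:
            break
        x = [a / norm for a in y]

    return [1 if a > 0 else 0 for a in x]


# Hypothetical profiles over the tags
# [encryption, aes, xss, javascript, security].
profiles = [
    [0.9, 0.4, 0.0, 0.0, 0.1],
    [0.8, 0.5, 0.0, 0.0, 0.1],
    [0.7, 0.6, 0.1, 0.0, 0.1],
    [0.0, 0.0, 0.9, 0.5, 0.1],
    [0.0, 0.1, 0.8, 0.6, 0.1],
    [0.1, 0.0, 0.7, 0.7, 0.1],
]
labels = spectral_bipartition(profiles)
print(labels)
```

The first three posts and the last three posts should fall into different groups. Which group receives label 0 or 1 is arbitrary, because an eigenvector is defined only up to sign. For more than two clusters, a common approach is to run k-means on several leading eigenvectors rather than thresholding a single one.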