In this section, we briefly introduce authorship attribution, the background problem of this investigation, together with its fundamental concepts.
1.1. Authorship Attribution
Authorship attribution (AA), also called author identification, has concerned researchers for centuries because of its important role in authentication. The problem persists today and has become an essential tool for addressing mainly Internet-related problems. These include plagiarism and fraud detection, identifying the source of documents (Li et al., 2013), identifying new authors appearing in streaming data sources (Seker et al., 2013), resolving disputed authorship (Eder et al., 2013; Khonji et al., 2015; Napoli et al., 2015; Segarra et al., 2015; Varela et al., 2016), detecting the authors of anonymous letters and harassing e-mails or messages, identifying the authors of conversational texts, and supporting social media forensics (Inches et al., 2013; Okuno et al., 2014; Spitters et al., 2015; Rocha et al., 2017).
AA relies on “stylometry”, the analysis of an author's writing style, in order to attribute an anonymous digital or handwritten text to its author. To do so, suitable features are first extracted from the text and then combined with an appropriate clustering technique to retrieve the right author. For this identification task, spelling mistakes and stop-words must be kept, because they play an important role in characterizing the author.
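The general pipeline (extract stylistic features without discarding stop-words, then assign the text to the closest author) can be illustrated with a minimal Python sketch. It is not the method used in this investigation. The function-word list, the sample texts, and the names `profile` and `attribute` are hypothetical, and a simple nearest-profile assignment by cosine similarity stands in for the clustering step.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words (stop-words).
# Stylometric methods keep such words because their usage rates tend to be
# topic-independent and characteristic of a writer.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but", "upon", "on"]


def tokenize(text):
    # Lowercase and split into word tokens. No stop-word removal and no
    # spelling correction, so stop-words and misspellings survive as tokens.
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    # Feature vector: relative frequency of each function word in the text.
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    # Cosine similarity between two feature vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown_text, known_texts):
    # known_texts maps each candidate author to a sample of their writing.
    # The unknown text is assigned to the author whose profile is most similar.
    target = profile(unknown_text)
    scores = {author: cosine(target, profile(sample)) for author, sample in known_texts.items()}
    return max(scores, key=scores.get), scores


# Hypothetical example data.
known = {
    "author_A": "It is upon the hill that the house stood, and upon it the rain fell.",
    "author_B": "The cat sat on a mat and a dog ran to the park in the rain.",
}
best, scores = attribute("Upon the road, and upon the bridge, it was cold.", known)
print(best)  # → author_A
```

Here the unknown sentence is attributed to `author_A` mainly because of its repeated "upon" and "it"; this stop-word signal would be lost if stop-words were removed during preprocessing.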