Stylometry-Based Authorship Identification: An Approach from the Internet of Behaviors Perspective through Contrastive Linguistic Analysis
Duc Huu Pham (International University-Vietnam National University Ho Chi Minh City, Vietnam)
Copyright: © 2023
Pages: 24
DOI: 10.4018/978-1-6684-9039-6.ch016
Abstract
Through its constant expansion and evolution, the Internet has made it possible to collect and exchange information and data across increasingly interconnected devices. These advances provide invaluable information about people and issues related to their lives, including behaviors, interests, and preferences, and have given rise to the Internet of Behaviors (IoB), which attempts to understand the data collected from users' online activities. From a behavioral psychology perspective, the IoB addresses the question of how to understand these data and how to apply that understanding to create things that benefit humans. The IoB draws on many fields of research, including technology, data analytics, and behavioral science, and it intersects with stylometry. Within the IoB framework, stylometry can be applied to analyze the writing style of social media posts, online reviews, journal articles, or literary works at tertiary educational organizations, which could provide insights into the personality or motivations of the author. Thus, stylometry could potentially be used to identify authorship.
Background
The background defines stylometry, including authorship identification and attribution, and reviews the historical use of stylometry for authorship identification and the Internet of Behaviors. It then examines the relationship between stylometry and the Internet of Behaviors by investigating the role of linguistics in authorship identification and the importance of contrastive linguistic analysis in stylometry.
Key Terms in this Chapter
Vocabulary Richness: The variety and complexity of the words used in writing or speech, measured by how many different words are used and how often they appear.
Average Word Length: The average number of letters per word in a given text, calculated by dividing the total number of letters by the total number of words in the text.
Stylometry: The application of the study of linguistic style (for example, across genres), usually to written language, to attribute authorship to anonymous or disputed documents through statistical analysis.
Word-Length Distribution: A measure used in the statistical analysis of texts that deals with the frequency of words of different lengths. It shows how many words of each length (i.e., the number of characters in a word) appear in a text.
Word Frequency: The number of times a particular word appears in a text or a corpus (a large and structured set of texts).
Internet of Behaviors (IoB): The networked connectivity of devices and information for the analysis, understanding, and prediction of human behaviors. It emerges from the Internet of Things (IoT), connecting devices and linking the data from those devices to human behavior.
Contrastive Linguistics: A subfield of linguistics that compares and contrasts two or more languages to identify their similarities and differences in phonology, syntax, morphology, semantics, and pragmatics.
Authorship Identification: The process of determining who wrote a particular document when its authorship is uncertain or disputed. Statistical methods such as stylometry are used to analyze various aspects of the text and identify unique stylistic patterns that can be associated with a particular author.
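The four text measures defined among the key terms (word frequency, vocabulary richness, average word length, and word-length distribution) can be illustrated with a minimal Python sketch. It is not the chapter's own implementation. The function names `tokenize` and `stylometric_features` are hypothetical, the regular-expression tokenizer is a simplifying assumption suited to English text, and vocabulary richness is approximated here by the type-token ratio (distinct words divided by total words), which is one common choice rather than a measure the chapter prescribes. Word length is counted in letters, so an apostrophe does not add to the length.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase the text and keep runs of ASCII letters,
    # allowing internal apostrophes (e.g. "don't"). Real corpora may need more care.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def stylometric_features(text):
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")

    # Word frequency: how many times each word occurs in the text.
    word_frequency = Counter(words)

    # Length of each word, counted in letters only.
    lengths = [sum(ch.isalpha() for ch in word) for word in words]

    return {
        "word_frequency": word_frequency,
        # Vocabulary richness, approximated by the type-token ratio (an assumption).
        "type_token_ratio": len(word_frequency) / len(words),
        # Average word length: total letters divided by total words.
        "average_word_length": sum(lengths) / len(words),
        # Word-length distribution: how many words there are of each length.
        "word_length_distribution": Counter(lengths),
    }


if __name__ == "__main__":
    sample = "The cat saw the dog"
    for name, value in stylometric_features(sample).items():
        print(name, value)
```

Feature values like these, computed for texts of known and disputed authorship, are the kind of input on which a statistical comparison of writing styles could be built. Note that the type-token ratio depends on text length, so it is best compared across samples of similar size.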