Predicting the Writer's Gender Based on Electronic Discourse

Szde Yu

Source Title: International Journal of Cyber Research and Education (IJCRE) 2(1)

DOI: 10.4018/IJCRE.2020010102

Abstract

The present study compared three methods for predicting a writer's gender from writing features manifested in electronic discourse: qualitative content analysis, statistical analysis, and machine learning. These methods were further combined to create a mixed methods model. The findings showed that the machine learning model combined with qualitative content analysis produced the best prediction accuracy. Including qualitative content analysis improved accuracy rates even when the training set for machine learning was relatively small. Thus, this study presented a concise model that can predict gender from electronic discourse fairly reliably and with high accuracy rates, and this accuracy was found consistently when the model was tested on two separate samples.

Introduction

As digital evidence is increasingly involved in all types of crime, digital investigation is no longer a topic related only to cybercrime. Digital investigation is warranted when the subject’s presence and behavior in a digital environment may reveal crucial clues. Such presence and behavior are normally referred to as digital footprints. Digital footprints can include a wide variety of digital files, such as photos, computer logs, and videos. Nonetheless, text remains the most commonly encountered form of digital footprint, as it appears in many forms of electronic discourse, including emails, text messages, social media comments, online discussions, blogs, and online advertisements (Doane et al., 2016; Hill et al., 2014; Koivunen et al., 2014). As such, many research endeavors have been devoted to the analysis of electronic discourse (Miner et al., 2012; Dunne et al., 2012). The field of text analytics is growing, and its importance in digital investigation is increasingly recognized (Anwar and Abulaish, 2014; Al-Zaidy et al., 2012; Yu, 2015). When criminal investigators need to analyze textual evidence without knowing who the writer is, the task is essentially text-based profiling: investigators try to predict the characteristics of a person based solely on his or her electronic discourse. This concept is not new; many people already use what they see on the Internet to make assumptions or inferences about a person they have never met. For instance, some people have tried to predict mental illness based on a person’s online text, such as tweets (Preotiuc-Pietro, 2015). While Internet users are not usually held responsible for inaccurate predictions, in criminal investigation the accuracy of prediction matters and often bears critical consequences.

Ideally, it would be highly helpful if we could predict an unknown writer’s identity by analyzing his or her writing alone, but this is not yet a reliable technique. However, some research has found that it is possible to predict a person’s general characteristics, such as gender, age, and education, based on nothing but digital footprints (Steel, 2014; Yu, 2013). Nevertheless, whether electronic discourse alone, without other types of digital footprints, is sufficient for this purpose has not yet been confirmed (Nguyen et al., 2014; Merler et al., 2015). The current trend focuses mainly on big data analysis, whereas in criminal investigations the text files available for analysis are generally limited in both quantity and content. A technique that criminal investigators can apply when dealing with text-based evidence is severely lacking.

Accordingly, this study aimed to test the ability to predict gender using nothing but the subject’s writing on an electronic platform (i.e., electronic discourse). Gender was chosen as the prediction target in this exploratory attempt because it is a personal trait that is relatively easy to verify. In criminal investigations, the ability to predict gender correctly is a huge step toward narrowing down suspects. The goal of this study was to test different methods for this purpose. Two research questions were asked. First, can we predict the writer’s gender based solely on writing features in electronic discourse? Second, which method produces better accuracy? Building on the findings for these questions, an additional inquiry was conducted to see whether a mixed methods approach could further improve accuracy. The methods compared here were qualitative content analysis, a logistic regression model, and machine learning. It is important to stress that this study was not intended to create a new artificial intelligence technique that handles big data. Rather, the focus is on identifying the technique most applicable to criminal investigations, where investigators do not normally need to handle big data.
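
To make the logistic regression approach mentioned above more concrete, the following is a minimal sketch of how a binary classifier could be fitted to writing features from a small sample. It is not the study's model. The three features (mean word length, exclamation marks per word, rate of first-person pronouns), the function names, and the four sample messages are all hypothetical, and the labels 0 and 1 are arbitrary placeholder classes, so the sketch carries no empirical claim about gender and writing.

```python
import math

# Hypothetical pronoun list used only for this illustration.
FIRST_PERSON = {"i", "me", "my"}


def extract_features(text):
    """Map a message to three illustrative surface features (assumed, not from the study)."""
    words = text.split()
    n = max(len(words), 1)
    cleaned = [w.strip(".,!?").lower() for w in words]
    mean_word_len = sum(len(w) for w in cleaned) / n
    exclaim_rate = text.count("!") / n
    first_person_rate = sum(w in FIRST_PERSON for w in cleaned) / n
    return [mean_word_len, exclaim_rate, first_person_rate]


def predict_proba(weights, bias, x):
    """Logistic model: probability that x belongs to class 1."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, X, y):
    """Mean negative log-likelihood of the labels under the model."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(weights, bias, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def fit(X, y, lr=0.05, epochs=500):
    """Fit weights and bias with plain batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Synthetic messages with arbitrary placeholder labels (not real data).
samples = [
    ("I love my new phone!", 1),
    ("Quarterly infrastructure assessment completed yesterday.", 0),
    ("My day was great! I am happy!", 1),
    ("Statistical procedures require considerable documentation.", 0),
]
X = [extract_features(text) for text, _ in samples]
y = [label for _, label in samples]
weights, bias = fit(X, y)
```

Calling `predict_proba(weights, bias, extract_features(message))` on an unseen message returns a value between 0 and 1 that can be thresholded at 0.5. With a sample this small the fitted weights are only illustrative, which echoes the concern raised above that investigators typically work with limited text.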
