Utilizing Artificial Intelligence for Text Classification in Communication Sciences: Reliability of ChatGPT Models in Turkish Texts

Copyright: © 2024 | Pages: 18
DOI: 10.4018/979-8-3693-1830-0.ch013

Abstract

This study evaluates ChatGPT's effectiveness in sentiment detection and text classification tasks on Turkish texts, a domain that remains relatively underexplored in a literature predominantly focused on English. Leveraging datasets of manually labeled YouTube comments and news tweets categorized into sentiment classes and thematic topics, the authors rigorously assess the performance of ChatGPT-3.5 and ChatGPT-4 using accuracy and F1 performance metrics. The findings offer insights into ChatGPT's proficiency in classifying Turkish textual content and its alignment with human-labeled classifications. This research not only expands the scope of AI research beyond the English language but also underscores the significance of language diversity in evaluating and refining AI models' performance for broader applicability in the research practices of the social and communication sciences.
Chapter Preview

Introduction

Advances in the field of artificial intelligence reached a significant peak with the models introduced in 2022. These models belong to the category of large language models and are used in applications such as AI chatbots. Such technologies can not only engage in meaningful conversation but also summarize and translate large blocks of text, among other capabilities. ChatGPT, a generative artificial intelligence language model developed by OpenAI, can interpret user queries and generate natural, human-like text responses (OpenAI et al., 2023). GPT-3.5 and GPT-4, in particular, are trained with deep learning techniques on large amounts of text data; with 175 billion parameters, these large-scale models substantially enhance language understanding and generation abilities (He et al., 2023). ChatGPT-3.5 and ChatGPT-4 demonstrate impressive performance across a variety of natural language processing tasks. By understanding user queries or commands, taking context into account, and generating responses across a broad spectrum of language, they provide a natural and fluent conversational environment. Trained on a vast corpus of text data, they also offer access to information across diverse topics and handle complex sentence structures well. However, they can still exhibit limitations such as producing illogical or inconsistent responses, reflecting societal biases, or lacking evidence-based reasoning (OpenAI et al., 2023; Brin et al., 2023).

Despite these areas of ongoing development, ChatGPT-3.5 and ChatGPT-4, together with the application programming interfaces (APIs) that OpenAI provides for ChatGPT, power a wide range of services in commercial sectors. Beyond commercial applications, ChatGPT also holds significant potential as a new tool and research area in academic studies. By delegating laborious tasks such as traditional content analysis and sentiment analysis to models like ChatGPT, researchers can achieve substantial savings in time and resources (Gilardi, Alizadeh & Kubli, 2023; Törnberg, 2023; Rathje et al., 2023). However, most studies in the literature evaluate the sentiment or text classification performance of ChatGPT on English texts, leaving a gap in research on its performance in other languages, particularly Turkish.
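As a rough illustration of how such delegation might work in practice, the sketch below sends a single Turkish comment to OpenAI's Chat Completions API and asks for a sentiment label. The model name, prompt wording, and label set are illustrative assumptions, not the exact configuration used in this chapter.

```python
# Minimal sketch: classifying the sentiment of a Turkish comment via the OpenAI API.
# The prompt wording, label set, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def classify_sentiment(text: str) -> str:
    """Ask the model to assign one of three sentiment labels to a Turkish text."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic output makes labels easier to compare with human coders
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the following Turkish text. "
                        "Answer with exactly one word: positive, negative, or neutral."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_sentiment("Bu video gerçekten çok faydalıydı, teşekkürler!"))
```

Because each text is classified independently, the same function can simply be looped over a full dataset of comments or tweets.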

This study investigates the sentiment detection and text classification capabilities of ChatGPT on Turkish texts. By measuring its classification performance on Turkish texts, the research assesses how reliable ChatGPT is and to what extent it can be used in the research practices of the social and communication sciences. To compare the models' text classification performance on Turkish text, we used YouTube comments (n = 500) and news tweets from news organizations (n = 500), which two coders manually classified into three sentiment categories and nine themes until inter-coder agreement values were deemed sufficient. In the analysis phase, we computed accuracy and F1 performance metrics to evaluate the sentiment and theme classification ability of the ChatGPT-3.5 and ChatGPT-4 generative AI models. The results provide important insights into ChatGPT's performance on Turkish texts and its ability to classify textual content similarly to human coders, thereby contributing to the understanding of generative AI technologies in research practice.
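A minimal sketch of this evaluation step, assuming the human-coded labels and the model-assigned labels are available as two parallel lists; the scikit-learn functions and the macro-averaged F1 shown here are one reasonable way to compute such metrics, not necessarily the exact pipeline used in the study.

```python
# Sketch: comparing model-assigned labels against human-coded labels.
# Assumes two parallel lists of labels; macro averaging treats all classes equally.
from sklearn.metrics import accuracy_score, f1_score

human_labels = ["positive", "negative", "neutral", "negative", "positive"]   # hypothetical
model_labels = ["positive", "negative", "negative", "negative", "positive"]  # hypothetical

accuracy = accuracy_score(human_labels, model_labels)
macro_f1 = f1_score(human_labels, model_labels, average="macro")

print(f"Accuracy: {accuracy:.2f}, Macro F1: {macro_f1:.2f}")
```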

Key Terms in this Chapter

Machine Learning: Machine learning is a way for computers to learn from data without being explicitly programmed. It's like teaching a computer to recognize patterns and make decisions based on examples it's given. Instead of following fixed rules, machine learning algorithms adjust and improve their performance over time as they're exposed to more data. This helps them make predictions or decisions without being explicitly programmed for every possible scenario.

Recall: Recall, also known as sensitivity or true positive rate, measures the proportion of correctly predicted positive cases among all actual positive cases. It answers the question: “Of all the actual positive cases, how many did I correctly predict?”
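A quick worked example with hypothetical counts for a single class:

```python
# Hypothetical counts for the "positive" class
true_positives = 40   # positive texts the model correctly labeled positive
false_negatives = 10  # positive texts the model missed

recall = true_positives / (true_positives + false_negatives)
print(recall)  # 0.8
```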

Sentiment Analysis: Sentiment analysis is a technique used to determine the emotional tone or attitude expressed in a piece of text. It involves analyzing text to determine whether it expresses positive, negative, or neutral sentiment. This can be useful in understanding opinions, attitudes, or feelings expressed in customer reviews, social media posts, or any other form of textual data. Sentiment analysis algorithms typically use machine learning to classify text based on the emotions conveyed within it.

Precision: Precision measures the proportion of correctly predicted positive cases among all instances predicted as positive. In simpler terms, it answers the question: “Of all the items I predicted as positive, how many are actually positive?”
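A quick worked example with hypothetical counts:

```python
# Hypothetical counts for the "positive" class
true_positives = 40   # texts labeled positive that really are positive
false_positives = 20  # texts labeled positive that are actually negative or neutral

precision = true_positives / (true_positives + false_positives)
print(round(precision, 2))  # 0.67
```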

Accuracy: This metric measures the proportion of correctly classified instances among all instances. It's a simple measure of overall correctness, calculated as the number of correct predictions divided by the total number of predictions.
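For example, with hypothetical numbers:

```python
# Hypothetical results over a labeled test set
correct_predictions = 420
total_predictions = 500

accuracy = correct_predictions / total_predictions
print(accuracy)  # 0.84
```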

Content Analysis: Content analysis is a research method used to analyze the content of text, audio, video, or any other form of communication. It involves systematically categorizing and interpreting the content to identify patterns, themes, or trends. Researchers use content analysis to gain insights into the characteristics, meanings, and implications of the communication. It can be applied to various types of content, such as articles, speeches, advertisements, social media posts, and more.

Text Classification: Text classification is a process where a computer program automatically categorizes pieces of text into predefined categories or classes. For example, it could classify emails as spam or not spam, news articles by topic, or customer reviews as positive or negative. The computer uses machine learning techniques to analyze the text's content and context, identifying patterns that help it assign the correct category to each piece of text.
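For illustration only, the sketch below trains a tiny supervised text classifier with scikit-learn on a few made-up Turkish examples; the data, labels, and model choice are assumptions and are unrelated to the datasets analyzed in this chapter.

```python
# Illustrative supervised text classification with scikit-learn (toy, made-up data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Harika bir video, çok beğendim",              # positive
    "Bu haber gerçekten üzücü",                    # negative
    "Kesinlikle tavsiye ederim, mükemmel",         # positive
    "Berbat bir deneyimdi, hiç memnun kalmadım",   # negative
]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF features plus logistic regression; with so little data the output
# is only a demonstration of the workflow, not a meaningful model.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

print(classifier.predict(["Çok güzel bir paylaşım olmuş"]))
```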

F1 Score: The F1 score is the harmonic mean of precision and recall. It provides a single score that balances both precision and recall. It's particularly useful when the classes are imbalanced, meaning one class has significantly more instances than the other. The F1 score ranges from 0 to 1, where a higher score indicates better performance.
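Continuing the hypothetical precision and recall values used above:

```python
# F1 as the harmonic mean of precision and recall (hypothetical values)
precision = 0.67
recall = 0.80

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.73
```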

Application Programming Interface (API): API is like a bridge that allows different software applications to communicate and interact with each other. It defines the rules and protocols that enable one piece of software to access the functionalities or data of another. APIs are commonly used to integrate different systems, enable interactions between web services, or allow developers to build on top of existing platforms. They specify how software components should interact, making it easier to develop new applications or extend existing ones without having to build everything from scratch.
