Beyond Surface Linguistics: Assessing the Cognitive Limitations of GPT Through the Long Memory Test

Matej Šprogar
Copyright: © 2024 | Pages: 21
DOI: 10.4018/979-8-3693-0230-9.ch005

Abstract

Contemporary artificial intelligence has advanced markedly toward mimicking human intelligence. Despite their linguistic proficiency, machines remain bereft of genuine text comprehension, which lends them an air of intelligence that is merely superficial. This chapter introduces a straightforward test that highlights the non-human-like cognition of machines such as ChatGPT. Eschewing the prevalent approach of progressively complex testing, the long memory test exposes ChatGPT's inability to function at a human level. Central to this assessment, the test mandates reliable information retention, a feat the transformer architecture underlying GPT fails to achieve.
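As a rough illustration of what such a retention requirement implies in practice, the sketch below plants a simple fact early in a conversation, buries it under unrelated exchanges, and then asks about it again; reliable retention means the correct answer always comes back. This is only a minimal sketch of the general idea, not the chapter's actual long memory test, and the ask_model function is a hypothetical stand-in for any chat-model interface.

```python
# Minimal sketch of a retention probe (illustrative only, not the
# chapter's long memory test). `ask_model` is a hypothetical stand-in
# for any chat-model interface that maps a message history to a reply.
from typing import Callable, Dict, List

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}

def retention_probe(ask_model: Callable[[List[Message]], str],
                    filler_turns: int = 50) -> bool:
    fact = "The red box contains 17 marbles."
    history: List[Message] = [
        {"role": "user", "content": f"Please remember this: {fact}"},
        {"role": "assistant", "content": "Noted."},
    ]
    # Bury the fact under many unrelated turns so it sits far back
    # in the conversation before it is needed again.
    for i in range(filler_turns):
        history.append({"role": "user",
                        "content": f"Unrelated question {i}: name any colour."})
        history.append({"role": "assistant", "content": "Blue."})
    history.append({"role": "user",
                    "content": "How many marbles are in the red box?"})
    # Reliable retention means the correct number is always returned.
    return "17" in ask_model(history)
```

Because the planted fact is trivial, any failure points to unreliable retention rather than to the difficulty of the task, which is precisely the distinction the abstract draws between this test and progressively complex benchmarks.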
Chapter Preview

1. Introduction

Today, machines accomplish feats once deemed impossible. Not long ago, we believed that only an intelligent machine could surpass a human at playing games. First, we were forced to raise the bar from checkers to chess. When Deep Blue finally bested Garry Kasparov, our benchmark for intelligence shifted again, this time to the game of Go, our most challenging abstract strategy board game. In between, Jeopardy fell prey to IBM's Watson. Following AlphaGo's victory over Lee Sedol, we recalibrated our standards once more, to what many regarded as the pinnacle of complexity: human language. Presently, ChatGPT demonstrates prowess as a writer. As a result, mere language generation is no longer seen as definitive evidence of intelligence; instead, ChatGPT's proficiency is taken to suggest that human language may not be as intricate as we once believed. Every time machines surpass a milestone, we redefine our criteria for intelligence (Blum, 2023). This ambiguity stems from our inclination to preserve what we deem uniquely human, from our lack of a concrete definition of intelligence, and from the ongoing debate over what constitutes consciousness. While machines may one day exhibit behaviours we associate with conscious beings, 'consciousness' itself remains a contested philosophical and scientific notion. Yet by underestimating the capabilities of machines, we risk overlooking the moment they first exhibit such characteristics, a moment too crucial to ignore.

ChatGPT, the latest innovation in natural language processing, is powered by the Generative Pre-trained Transformer (GPT) architecture, a leading player in the large language model (LLM) arena. Representing a significant advance in the field, ChatGPT can craft coherent, contextually relevant, and easily comprehensible responses to human prompts. Though new technologies often exhibit minor imperfections, ChatGPT's capabilities are notably impressive. An important challenge, however, is GPT's propensity to “hallucinate”, that is, to weave accurate statements together with inaccurate or fabricated information. These missteps, often subtle, underscore that GPT lacks true comprehension of the content it generates. Hallucinations aside, GPT has other limitations. Given the significant advancements and the buzz around this technology, discerning ChatGPT's genuine cognitive competencies is paramount.

Numerous publications have delved deeply into the merits and drawbacks of the GPT language model, yielding varied findings. While Marcus and Davis (2020) argue that GPT-3 does not understand the world, Sobieszek and Price (2022) identified a semantic dimension in its outputs. Moreover, Kosinski (2023) postulated a potential emergence of a theory of mind. Peregrin (2021), however, questioned the fundamental terminology of syntax and semantics employed for decades in discussions of thinking machines. Similarly, Montemayor (2021) advised against applying psychological and philosophical vocabulary when examining GPT-like systems. Many scholars agree that AGI models should be evaluated differently from humans, prompting the introduction of specialized tests (Chollet, 2019; Moskvichev, 2023).

In Spring 2023, OpenAI introduced the latest iteration of its signature GPT series, GPT-4. This model surpasses its predecessor in knowledge depth, communication finesse, and reasoning prowess, and even features image analysis capabilities. This progression naturally stirs speculation about the innovations GPT-5 might harbour and, more broadly, whether a future version, termed GPT-X here, could attain human-equivalent intelligence or “Artificial General Intelligence” (AGI). There are two approaches to this profound question. The first is to subject GPT-X to the ultimate intelligence benchmark, which currently eludes us. The second is to craft a criterion that GPT-X inevitably fails to meet; this latter approach is the primary topic of our discussion here.

The main contributions of this work are as follows:

  • It emphasizes the value of simpler cognitive tests over more complex ones.

  • It challenges the notion that success rates in similar tasks correlate directly with intelligence.

  • It proposes a succinct test whose outcome undermines the view of GPT and its successors as reasoning entities.

  • It challenges the in-context learning capabilities of GPT, as illustrated by the sketch after this list.
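To make the last point concrete, the sketch below shows one simple way an in-context learning probe could look: an invented rule is defined only inside the prompt, so a correct answer can only come from applying that in-context definition. Again, this is an illustrative sketch rather than the chapter's own experiment, and ask_model is the same hypothetical stand-in for a chat-model interface.

```python
# Illustrative in-context learning probe (not the chapter's experiment).
# An invented operation is defined only in the prompt, so the model can
# answer correctly only by applying the in-context definition.
from typing import Callable, Dict, List

Message = Dict[str, str]

def in_context_rule_probe(ask_model: Callable[[List[Message]], str]) -> bool:
    prompt = (
        "We define a new operation '@' as follows: a @ b = a + b + 1.\n"
        "Examples: 2 @ 3 = 6 and 10 @ 4 = 15.\n"
        "Using only this definition, what is 7 @ 5? Reply with the number."
    )
    reply = ask_model([{"role": "user", "content": prompt}])
    return "13" in reply  # 7 @ 5 = 7 + 5 + 1 = 13 under the stated rule
```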

Key Terms in this Chapter

Knowledge: Organized information that has been processed in a way that allows for its practical application or use. It facilitates understanding and underpins the process of learning from data.

The Symbol Grounding Problem: This addresses the challenge of linking abstract symbols or representations to their real-world meanings or referents. While a solution to the symbol grounding problem might draw insights from how the biological brain operates, it is not definitively tied to it.

ANN Parameters: The weights, biases, and similar values that encode the learned information within an artificial neural network, guiding its behavior and responses.

Benchmark: A standard or reference point against which performance or achievements can be assessed. It can refer to a specific score, a set of tests, or a singular test used for evaluation.

Artificial Neural Network (ANN): A computational model inspired by the structure of biological neural networks. In ANNs, artificial neurons serve as simplified representations of their biological counterparts, offering unique processing capabilities compared to traditional mathematical models.

ANN Learning, or ANN Training: A process aimed at minimizing the error made by the artificial neural network when processing the training data.

In-Context Learning: This refers to a model's ability to acquire, deduce, and utilize knowledge derived directly from its immediate context.

Environment: This refers to the external conditions or settings that influence and determine the behavior and development of a system or entity.

Artificial General Intelligence (AGI): This refers to machine intelligence that can perform any intellectual task that a human being can, making its behavior indistinguishable from that of a human in terms of cognitive abilities.

Memory: The ability of a system or entity to retain, recall, and use information over time.

Artificial Intelligence (AI): A broad field of science that initially aimed to develop what is now AGI. Over time, it has diversified to encompass a wide range of technologies, with a primary focus now on crafting usable machine learning solutions tailored for specific applications.

Intelligence Metrics: Quantitative measures designed to assess or detect the level of intelligence exhibited by an entity. However, their accuracy and validity are subjects of debate.

Test: A structured procedure or assessment designed to ascertain specific information or truths. It can also refer to a collection of such procedures.

Model: A representation or implementation of a concept, idea, or system, either in software or hardware, designed to exhibit specific properties or behaviors.

The Turing Test: A proposed measure of machine intelligence, where a machine's ability to exhibit human-like intelligence is evaluated based on whether its behavior can be distinguished from that of a human.

Learning: A fundamental cognitive ability that transforms sensory data into new knowledge, essential for adapting to one's environment.

Cognitive Skills: The mental capabilities and strategies that are necessary for processing and understanding information, enabling an entity to exhibit intelligent behavior.

Context: This refers to the set of circumstances or facts that surround a particular event, situation, or model's operation, providing clarity or information about its function.
