
The Effect of Stemming on Arabic Text Classification: An Empirical Study

Abdullah Wahbeh, Mohammed Al-Kabi, Qasem Al-Radaideh, Emad Al-Shawakfa, Izzat Alsmadi

Source Title: Information Retrieval Methods for Multidisciplinary Applications

DOI: 10.4018/978-1-4666-3898-3.ch013


Abstract

The information world is rich in documents in different formats or applications, such as databases, digital libraries, and the Web. Text classification is used to aid the search functionality offered by search engines and information retrieval systems in dealing with the large number of documents on the Web. Many research papers in the field of text classification have addressed English, Dutch, Chinese, and other languages, whereas fewer have addressed Arabic. This paper addresses the automatic classification of Arabic text documents. It applies text classification to Arabic text documents using stemming as part of the preprocessing steps. Results show that, without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy under the two test modes, with 87.79% and 88.54%. Stemming, on the other hand, negatively affected accuracy: SVM accuracy under the two test modes dropped to 84.49% and 86.35%.

Chapter Preview

1. Introduction

The tremendous growth of Arabic text documents available on the Web and in databases has posed a major challenge for researchers: finding better ways to deal with this huge amount of information so that search engines and information retrieval systems can provide relevant information accurately, a task that has become crucial to satisfying the needs of different end users.

Text classification and its techniques have become a major tool for dealing with the large amount of data available on the Web and in databases. Text classification is the task of automatically assigning text documents to one or more predefined categories based on content and linguistic features (Gharib et al., 2009; Mesleh et al., 2007; Rahman et al., 2003; Zubi, 2009; Al-Harbi et al., 2008). Several studies have applied text classification and its techniques to English and other European languages. On the other hand, few researchers have addressed the issue of Arabic text classification.
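To make the task definition concrete, the following is a minimal sketch of assigning a tokenized document to one of several predefined categories. It uses a small multinomial naive Bayes scorer with add-one smoothing as a simple stand-in; it is not the SVM setup evaluated in this chapter. The function names `train` and `classify`, the category labels, and the placeholder English tokens are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Count term and document frequencies per category (hypothetical helper)."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    for tokens, label in labeled_docs:
        term_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = {term for counts in term_counts.values() for term in counts}
    return term_counts, doc_counts, vocab

def classify(tokens, model):
    """Return the category with the highest smoothed log-probability."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_terms = sum(term_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # category prior
        for term in tokens:
            if term in vocab:  # terms never seen in training are ignored
                score += math.log(
                    (term_counts[label][term] + 1) / (total_terms + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data: placeholder tokens and labels, not the chapter's corpus.
training = [
    (["match", "goal", "team"], "sports"),
    (["team", "player", "goal"], "sports"),
    (["market", "bank", "price"], "economy"),
    (["price", "market", "trade"], "economy"),
]
model = train(training)
predicted = classify(["goal", "player", "market"], model)
print(predicted)
```

In a real pipeline the token lists would come from the preprocessing stage, which is where stop word removal and stemming would be applied before counting.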

Text preprocessing and preparation, especially for Arabic, is a crucial task in several applications, including information retrieval, text mining, and natural language processing, where processing includes stages such as stop word removal and stemming. Stemming tries to reduce a word to its stem (Al-Shammari et al., 2008); the stemming process uses morphological analysis of the word to obtain its stem (Sembok et al., 2011).

Stemming is a very important technique that is commonly used in information retrieval and data mining, as well as in many other NLP applications. Stemming is important for some natural languages and unimportant for others. As reported by Sembok et al. (2011) and Al-Shammari (2008), stemming has the following benefits:

• Stemming helps in reducing the size of the index terms.
• Stemming is used in information retrieval systems to reduce variant word forms to common roots in order to improve retrieval effectiveness.
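The two benefits listed above can be illustrated with a toy light stemmer that strips a handful of common Arabic prefixes and suffixes. This is only a sketch under stated assumptions: the affix lists, the minimum stem length, and the names `light_stem` and `index_terms` are hypothetical, and this is not the stemmer used in the chapter's experiments.

```python
# Illustrative affix lists (assumptions); longer prefixes are tried first.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه", "ي")

def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word

def index_terms(tokens, stem=False):
    """Return the set of distinct index terms, optionally after stemming."""
    return {light_stem(token) if stem else token for token in tokens}

# Variant forms of "book" and "teacher" (toy input, not the chapter's data).
tokens = ["الكتاب", "كتاب", "والكتاب", "كتابات", "المعلمون", "معلمين", "معلم"]
raw = index_terms(tokens)
stemmed = index_terms(tokens, stem=True)
print(len(raw), len(stemmed))
```

With this toy input, the seven surface forms collapse into two index terms, so a query for one form can match documents that contain the others. A stemmer this aggressive can also conflate unrelated words, which is one way stemming may hurt rather than help a classifier.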

Arabic is a language used by millions of people around the world in more than 25 countries. The Arabic alphabet consists of 28 letters: three are vowels and the remaining letters are consonants. Each letter takes a different shape depending on its position in the word (Duwari, 2007; Kadri et al., 2006). Arabic is a highly inflectional and derivational language, which makes morphological analysis a very complex task. Moreover, Arabic does not use capitalization to differentiate nouns from other words in documents (El-Halees, 2010). Arabic words have two distinct genders, feminine and masculine; three numbers, singular, dual, and plural; and three grammatical cases, nominative, accusative, and genitive (Omer et al., 2010). Finally, the Arabic language consists of three types of words: nouns, verbs, and particles, where nouns and verbs are derived from a limited set of about 10,000 roots (Said et al., 2009). All these characteristics make Arabic text classification a difficult task compared with text classification tasks that deal with English and other languages.
