Detecting Ambiguities in Requirement Documents Written in Arabic Using Machine Learning Algorithms

Ahmad Althunibat, Bayan Alsawareah, Siti Sarah Maidin, Belal Hawashin, Iqbal Jebril, Belal Zaqaibeh, Haneen A. Al-khawaja
Copyright: © 2024 | Pages: 19
DOI: 10.4018/IJCAC.339563

Abstract

The identification of ambiguities in Arabic requirement documents plays a crucial role in requirements engineering, because the quality of requirements directly impacts the overall success of software development projects. Traditionally, engineers have evaluated requirement quality manually, a time-consuming and subjective process that is prone to errors. This study explores the use of machine learning algorithms to automate the assessment of requirements expressed in natural language. It compares several machine learning algorithms, including decision tree and random forest, on their ability to classify requirements written in Arabic. The findings reveal that random forest outperformed the other algorithms under every stemming configuration, achieving an accuracy of 0.95 without a stemmer, 0.99 with the ISRI stemmer, and 0.97 with the Arabic light stemmer. These results highlight the robustness and practicality of the random forest algorithm.
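The setup described in the abstract can be sketched as a text-classification pipeline: TF-IDF features feeding a random forest, with a pluggable stemming step where NLTK's ISRI stemmer or an Arabic light stemmer would slot in. This is a minimal illustration on toy English placeholder sentences, not the paper's dataset or exact configuration:

```python
# Sketch of a requirements classifier: TF-IDF + random forest, with a
# stemming hook (None reproduces the "no stemmer" baseline; an object with
# a .stem() method, such as NLTK's ISRIStemmer, could be plugged in).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

def stem_tokens(text, stemmer=None):
    """Apply a word-level stemmer to each token; None means no stemming."""
    tokens = text.split()
    if stemmer is not None:
        tokens = [stemmer.stem(t) for t in tokens]
    return " ".join(tokens)

# Toy requirement sentences standing in for the Arabic corpus.
docs = ["the system shall log every login attempt",
        "the interface must load within two seconds",
        "users shall reset passwords via email",
        "response time should stay below one second"]
labels = ["functional", "nonfunctional", "functional", "nonfunctional"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=stem_tokens)),  # no-stemmer baseline
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipeline.fit(docs, labels)
pred = pipeline.predict(["the system shall email a report"])
```

Swapping the `stemmer` argument lets the same pipeline be rerun per stemming configuration, mirroring the paper's three-way comparison.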
Article Preview

In the realm of software development, understanding stakeholder needs is crucial for designing complex software systems (Althunibat et al., 2022). Stakeholders, often end users, contribute natural-language requirements to large-scale projects. Ko et al. (2007) proposed an approach in which initial data needs are automatically categorized into topics, reflecting political analyst perspectives. Experiments on datasets in both Korean and English validate the efficacy of this strategy, highlighting the potential for an internet-based requirements-analysis support system to efficiently gather and evaluate dispersed end-user requirements over the network.

Moving forward, support vector machine (SVM) algorithms have garnered attention for their strong theoretical properties and high performance (Al Qaisi et al., 2021). Yang et al. (2010) analyzed support vector characteristics and presented a novel learning process incorporating SVM classification algorithms. The algorithm, rooted in the equivalence of classification between support vector sets, employs incremental learning to accumulate data. Experimental results indicate its potential to expedite training, reduce storage costs, and maintain classification accuracy (Quba et al., 2021).
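The incremental idea above can be approximated with off-the-shelf tools: a linear SVM trained by stochastic gradient descent accepts data in batches. This sketch uses scikit-learn's `SGDClassifier` with hinge loss on synthetic data; it is an illustration of batch-wise SVM-style training, not the specific algorithm of Yang et al.:

```python
# Incremental SVM-style training: SGDClassifier with hinge loss (a linear
# SVM objective) learns from data arriving in batches via partial_fit.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable toy labels

clf = SGDClassifier(loss="hinge", random_state=0)  # hinge loss ~ linear SVM
classes = np.array([0, 1])                          # must be declared up front
for start in range(0, 200, 50):                     # feed four batches of 50
    batch = slice(start, start + 50)
    clf.partial_fit(X[batch], y[batch], classes=classes)

acc = clf.score(X, y)
```

Because each `partial_fit` call updates the model in place, earlier batches never need to be stored, which is the storage saving the incremental approach targets.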

Artificial intelligence (AI) and deep learning (DL) come to the forefront in the work of Navarro-Almanza et al. (2017). They recommend using a convolutional neural network (CNN) model to categorize software requirements, showcasing promising results on the PROMISE corpus dataset. This dataset, with pre-grouped and labeled criteria for both functional requirements (FR) and non-functional requirements (NFR), serves as a valuable resource for evaluating the suggested model (Gill et al., 2014).

Lu and Liang (2017) further contributed to understanding user requirements by breaking them down into FRs and NFRs, including usability, portability, performance, and reliability. Their research involved diverse methods such as bag of words (BoW), CHI2, TF-IDF, and AUR-BoW, as well as ML algorithms like J48, naive Bayes, and bagging. Comparative analysis reveals that the bagging ML algorithm provides the best categorization outcome for NFRs, as validated by feedback from actual customers.
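The comparison described by Lu and Liang can be sketched in a few lines: bag-of-words features feeding both a naive Bayes classifier and a bagging ensemble. The sentences and labels below are toy stand-ins, not their dataset, and the feature set omits their TF-IDF, CHI2, and AUR-BoW variants:

```python
# Bag-of-words features with two of the compared learners: multinomial
# naive Bayes and a bagging ensemble (decision trees by default).
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["app must be easy to use", "system shall export data",
        "pages should render quickly", "users shall upload files",
        "software must run on linux", "system shall send alerts"]
labels = ["NFR", "FR", "NFR", "FR", "NFR", "FR"]

X = CountVectorizer().fit_transform(docs)          # bag-of-words matrix
nb = MultinomialNB().fit(X, labels)
bag = BaggingClassifier(n_estimators=20, random_state=0).fit(X, labels)

nb_acc = nb.score(X, labels)
bag_acc = bag.score(X, labels)
```

In practice the two models would be compared on held-out data (e.g. cross-validation) rather than training accuracy, which is where Lu and Liang found bagging strongest for NFRs.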

In the domain of ML techniques for classifying FR phrases, AlZu'bi and Jararweh (2020) introduced a novel approach that integrates information from various ML models. This method, implemented and trained using a single dataset, aims to enhance the accuracy and quality of FR classification.

To address imbalanced classes and improve classifier performance, Kurtanović and Maalej (2017) propose a strategy applying cross-validation to classifiers. Their focus is on the automatic identification of NFRs, particularly in the categories of security, usability, operations, and performance. This involves preprocessing steps such as stopword and punctuation removal, coupled with feature selection using BoW, bigrams, and trigrams. Notably, the inclusion of part-of-speech tags emerges as a highly informative feature in their experiments using the SVM classifier algorithm.
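The evaluation pattern above — n-gram features plus cross-validation of an SVM — can be sketched as follows. The toy sentences and the `security`/`usability` labels are placeholders for their corpus, and the POS-tag features they found most informative are omitted for brevity:

```python
# Unigram-to-trigram features cross-validated with a linear SVM, echoing
# the BoW/bigram/trigram feature selection and k-fold evaluation strategy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["login must be secure", "ui shall list records",
        "system must encrypt data", "users shall edit entries",
        "access must be audited", "admin shall add users"] * 3
labels = ["security", "usability"] * 9

pipe = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),  # unigrams, bigrams, trigrams
    LinearSVC(),
)
# Stratified 3-fold cross-validation keeps class proportions per fold,
# which matters when classes are imbalanced.
scores = cross_val_score(pipe, docs, labels, cv=3)
```

Putting the vectorizer inside the pipeline ensures the vocabulary is refit on each training fold, so no information leaks from the held-out fold into the features.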

The landscape of software requirement classification is further enriched by exploring various methodologies (Alsawareah et al., 2023; Al-Kasabera et al., 2020). These studies aimed to establish correlations between software architecture and NFRs, emphasizing the significance of considering software architecture in addressing NFRs within the software development life cycle.
