1. Introduction
The primary causes of code complexity are tight time frames, mismanagement, unclean shortcuts taken during the software development process, lack of testing, documentation issues, lack of understanding, communication issues, lack of teamwork, monitoring issues, workload, and late refactoring. These problems often stem from a lack of cooperation and coordination. Poorly written code can even harm the whole project when the project is handed over. A code smell is a symptom in a program's source code that points to a deeper issue. Code smells tend to persist because they may not affect the program's results, yet they still harm the source code's performance. Violating the basic principles of software development decreases code quality and increases technical debt, which motivates identifying code smells automatically. WekaNose is a tool that detects code smells in source code using the Weka software. Other code smell detection tools include PMD, iPlasma, JDeodorant, Decoder, and Checkstyle.
Figure 1 illustrates dirty code containing a data clump, a form of code complexity in which groups of variables are combined to form objects at the class level. Allocating data values to these variables increases the execution time of the program. In Figure 1, data members such as ccno, expmonth, expyear, and amt are assigned arbitrary data values, which adds further to the code complexity. This can be avoided by deleting the assigned values.
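As a minimal sketch of what a data clump looks like, the Python snippet below reuses the member names ccno, expmonth, expyear, and amt from Figure 1. It is not a reproduction of the figure: the charge function, the CardPayment class, and the sample values are hypothetical, and the remedy shown is the commonly described extract class refactoring rather than the value deletion mentioned above.

```python
from dataclasses import dataclass


# Before: the same four values travel together through every signature,
# which is the pattern usually called a data clump.
def charge(ccno: str, expmonth: int, expyear: int, amt: float) -> str:
    return f"charged {amt:.2f} to card ending {ccno[-4:]} (exp {expmonth:02d}/{expyear})"


# After: the clump is extracted into one small class (hypothetical name),
# so callers pass a single object instead of four loose variables.
@dataclass(frozen=True)
class CardPayment:
    ccno: str
    expmonth: int
    expyear: int
    amt: float

    def charge(self) -> str:
        return (
            f"charged {self.amt:.2f} to card ending {self.ccno[-4:]} "
            f"(exp {self.expmonth:02d}/{self.expyear})"
        )


# Illustrative values only; they are not taken from Figure 1.
payment = CardPayment(ccno="4111111111111111", expmonth=9, expyear=2030, amt=25.0)
print(payment.charge())
```

Both versions produce the same message; the difference is that the grouped fields now have a single owner, which is what the extract class refactoring aims for.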
Feature selection is the automatic or manual selection of relevant features from the massive amount of data used to construct the model. It improves the accuracy of a model by reducing its complexity. It is a process of selecting the best features, in the form of a subset, before applying any generalized algorithm. Measures used in feature selection include correlation, entropy, and mutual information. Feature selection methods include Recursive Feature Elimination, the Chi-squared test, feature evaluation, etc.
Machine learning enables a machine to learn from data and make predictions without being explicitly programmed. We have used different supervised, unsupervised, and anomaly detection algorithms to identify the smelly data in real-time datasets. In our research, the prime focus is on code smell detection through the identification of outliers. We have used different unsupervised anomaly detection methods, namely PCA, GMM, autoencoder, K-means clustering, and Bayesian network, to identify outliers in the dirty code. We have also evaluated the performance of the system by comparing the accuracy of these methods.
Software quality is defined as the robustness or fitness of a software product. It is assessed using the following parameters: reusability, correctness, portability, and maintainability. Software quality assurance helps produce high-quality software while saving time and cost. Code smells affect the source code by violating the principles of good program design, which has a negative impact on software quality. The primary solution to this problem is to refactor the code. Refactoring changes the internal structure of code without altering its external functionality. Refactoring techniques include replacing parameters, inline method, extract class, etc.
In this work, feature selection methods are used to reduce complexity and increase the efficiency of the proposed model. Recursive Feature Elimination (RFE) selects the relevant features by repeatedly removing the weakest features of the dataset.
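The elimination loop behind RFE can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in this work: the metric names (loc, num_params, noise_a, noise_b), the synthetic data, and the small logistic-regression ranker are hypothetical stand-ins. Each round refits the model on the surviving features and drops the one with the smallest absolute weight.

```python
import math
import random


def fit_logistic(X, y, epochs=300, lr=0.5):
    """Fit logistic regression by batch gradient descent and return the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def recursive_feature_elimination(X, y, names, n_keep):
    """Refit on the surviving features and drop the weakest one until n_keep remain."""
    active = list(range(len(names)))
    eliminated = []
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        weights = fit_logistic(X_active, y)
        weakest = min(range(len(active)), key=lambda k: abs(weights[k]))
        eliminated.append(names[active.pop(weakest)])
    return [names[j] for j in active], eliminated


# Synthetic, hypothetical data: the label depends only on the first two columns.
rng = random.Random(0)
names = ["loc", "num_params", "noise_a", "noise_b"]
X = [[rng.gauss(0, 1) for _ in names] for _ in range(200)]
y = [1 if row[0] + row[1] > 0 else 0 for row in X]

selected, eliminated = recursive_feature_elimination(X, y, names, n_keep=2)
print("kept:", selected, "dropped:", eliminated)
```

Refitting after every removal is what distinguishes this procedure from a one-shot ranking: the weights of the remaining features are re-estimated once a weak feature is gone.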
This work illustrates an anomaly detection technique that identifies outliers by comparing dirty code with clean code. We have used five different algorithms to identify the extreme code points that slightly deviate from the original data samples. Cluster-based anomaly detection methods gave the best results for code smell detection.
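One way to picture cluster-based outlier detection is to cluster the samples with K-means and flag the points that lie unusually far from their nearest centroid. The sketch below rests on several assumptions that are not taken from this work: each sample is described by two hypothetical numeric code metrics, the data are synthetic, the centroids are initialized from quantiles of the sorted points purely for reproducibility, and the cutoff is the mean distance plus three standard deviations.

```python
import math
import random
import statistics


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic, quantile-based initialization."""
    ordered = sorted(points)
    centroids = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(statistics.fmean(dim) for dim in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids


def find_outliers(points, k=2, n_sigma=3.0):
    """Return indices of points whose nearest-centroid distance exceeds the cutoff."""
    centroids = kmeans(points, k)
    distances = [min(math.dist(p, c) for c in centroids) for p in points]
    cutoff = statistics.fmean(distances) + n_sigma * statistics.pstdev(distances)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Synthetic, hypothetical samples: two groups of "clean" points plus two
# "smelly" points placed far away from both groups.
rng = random.Random(42)
clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
clean += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
smelly = [(-20.0, 5.0), (5.0, 30.0)]
samples = clean + smelly

outlier_indices = find_outliers(samples)
print(outlier_indices)
```

The distance cutoff is a design choice: a stricter or looser n_sigma trades missed outliers against false alarms, so in practice it would need tuning on labeled clean and dirty samples.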