Software Architecture Recovery Using Integrated Dependencies Based on Structural, Semantic, and Directory Information


Shiva Prasad Reddy Puchala, Jitender Kumar Chhabra, Amit Rathee
Copyright: © 2022 | Pages: 20
DOI: 10.4018/IJISMD.297060

Abstract

Architecture recovery techniques study dependencies in source code and reconstruct the architecture from them. Most techniques use either structural or semantic dependencies, and it has been observed that directory information also helps to improve architecture recovery. Research to date has used semantic information in a very limited manner, and directory information in a trivial manner that ignores the directory hierarchy. Further, although all three kinds of information (structural, semantic, and directory structure) are reported to be very useful in architecture recovery, they have not been used in combination. This paper therefore proposes a new scheme for architecture recovery that uses a weighted combination of all three dependencies. A new approach is designed to mine semantic dependencies effectively and to extract directory dependencies. Finally, different dependency schemes are evaluated with four clustering algorithms on three open-source projects. The results show that the proposed scheme performs better than existing approaches in architecture recovery.

Introduction

Software architecture is defined as the organization of a system, embodying its components and their relationships. As software systems grow in size and complexity, it becomes hard for developers to keep the architecture well documented, and the architecture consequently shifts away from its initial design. Most open-source projects lack architectural documentation, and for these projects the code is the only available documentation. Software architecture recovery is therefore crucial for several reasons: to adapt a software system to changing requirements, to enable the reuse of components, and to estimate the cost and risks involved in a change.

For this reason, extensive research has been carried out on recovering the architecture of a software system. Architecture recovery is defined as a reverse engineering approach that aims at reconstructing the architecture from the implementation view of the software. Many techniques have already been proposed, and they work on different types of input information. Depending on the input information used, these techniques are categorized as structure-based, semantic-based, or knowledge-based (Kong et al., 2018). Structure-based techniques rely on the structure of the source code to extract relations, and group software elements according to their structural dependencies using different clustering techniques. Semantic-based techniques rely on the textual information present in source code and documentation; they try to form topics and group software elements into these topics. Knowledge-based techniques use various types of input information from software repositories, such as framework-related information, directory information, patterns, commits, and issues in version control systems.
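To make the idea of a semantic dependency concrete, the following is a minimal sketch, assuming a plain bag-of-words model with cosine similarity over identifier and comment tokens. This is an illustrative stand-in and not the mining approach proposed in this paper; the function names `tokenize` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split identifiers and comments into lowercase word tokens."""
    # Insert a space at camelCase boundaries, then split on non-letters
    # (which also separates snake_case parts).
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return [t.lower() for t in re.split(r"[^A-Za-z]+", spaced) if t]


def cosine_similarity(text_a, text_b):
    """Cosine similarity of the term-frequency vectors of two texts."""
    freq_a = Counter(tokenize(text_a))
    freq_b = Counter(tokenize(text_b))
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no vocabulary with anything
    return dot / (norm_a * norm_b)
```

Two source files whose identifiers and comments share much of their vocabulary receive a score close to 1, and files with no shared terms receive 0. Such pairwise scores can then serve as the semantic dependency between software elements.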

In the literature, the majority of architecture recovery techniques are either structure-based (Mancoridis et al., 1999; Maqbool & Babri, 2004; Andritsos & Tzerpos, 2005; Wang et al., 2010; Zhang et al., 2010; Cho et al., 2019) or semantic-based (Kuhn et al., 2007; Garcia et al., 2011; Sajnani, 2012; Link et al., 2019). Only a few techniques (Li et al., 2017; Shahbazian et al., 2018; Kong et al., 2018; Guimaraes & Cai, 2020) exploit the knowledge available in software repositories for architecture recovery. One readily available source of knowledge in any software system is its directory information, yet very few techniques (Kong et al., 2018) use it in architecture recovery. Most techniques use one or two types of input information in the recovery process, and none of them utilizes structural, semantic, and directory information at the same time. Further, there is no proper study of how to extract the available directory knowledge and integrate it with structural and semantic information for architecture recovery.

This paper aims to mine all needed semantic information, compute hierarchy-based directory dependencies, and integrate both with structural dependencies to recover the software architecture. Semantic information, including comments, identifiers, variables, and class/method names as well as their usage, is mined effectively, and a new approach for extracting directory dependencies based on the directory hierarchy is proposed. Various coupling schemes are formulated to evaluate the effect of using multiple dependencies in architecture recovery. These coupling schemes are also evaluated with different sets of weights on three subject systems to identify the best combination of weights for integrating the dependencies. The main contributions of this paper include:

  1. Designing a new approach for computing directory dependencies from the directory hierarchy by using distance and depth-based measures.

  2. Effectively mining all types of semantic information and empirically evaluating the effect of using semantic and directory dependencies in architecture recovery by formulating six different dependency coupling schemes.

  3. Integrating all three dependencies using the best combination of weights, determined through experimentation.

  4. Studying the effect of integrated dependencies in architecture recovery by using Complete linkage clustering.
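The contributions above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the depth-based formula in `directory_similarity`, the default weights in `combine`, and all function names are hypothetical and are not the measures or weights defined in this paper. Only the general ideas (a hierarchy-aware directory score, a weighted sum of three dependencies, and complete linkage clustering) come from the text.

```python
from itertools import combinations


def directory_similarity(path_a, path_b):
    """Hypothetical depth-based score in [0, 1].

    Computes 2 * depth(deepest common directory) / (depth(a) + depth(b)),
    so files that share a deeper common directory score higher.
    """
    dirs_a = path_a.split("/")[:-1]  # drop the file name
    dirs_b = path_b.split("/")[:-1]
    common = 0
    for x, y in zip(dirs_a, dirs_b):
        if x != y:
            break
        common += 1
    total = len(dirs_a) + len(dirs_b)
    return 2 * common / total if total else 1.0  # both files in the root


def combine(structural, semantic, directory, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of three similarities; the weights are placeholders."""
    w_str, w_sem, w_dir = weights
    return w_str * structural + w_sem * semantic + w_dir * directory


def complete_linkage(items, similarity, k):
    """Agglomerative clustering down to k clusters.

    Under complete linkage, two clusters are only as similar as their
    least similar pair of members.
    """
    clusters = [[item] for item in items]
    while len(clusters) > k:
        def link(pair):
            a, b = pair
            return min(similarity(x, y)
                       for x in clusters[a] for y in clusters[b])
        # Merge the two clusters with the highest complete-linkage score.
        a, b = max(combinations(range(len(clusters)), 2), key=link)
        clusters[a] += clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


files = ["src/net/http.py", "src/net/tcp.py",
         "src/ui/window.py", "src/ui/button.py"]
# Directory information alone already separates the two packages here.
modules = complete_linkage(files, directory_similarity, k=2)
```

In practice the `similarity` callable passed to `complete_linkage` would return the combined score from `combine`, with the structural and semantic inputs supplied by a dependency extractor and a text-similarity measure respectively.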
