Data Reengineering of Legacy Systems

Richard C. Millham (Catholic University of Ghana, Ghana)
DOI: 10.4018/978-1-60566-242-8.ch005

Abstract

Legacy systems, from a data-centric view, can be defined as old, business-critical, stand-alone systems built around legacy databases, such as IMS or CODASYL, or legacy database management systems, such as ISAM (Brodie & Stonebraker, 1995). Because of the huge scope of legacy systems in the business world (an estimated 100 billion lines of COBOL code exist in legacy business systems alone; Bianchi, 2000), data reengineering of legacy systems, along with the related step of program reengineering, constitutes a significant part of the software reengineering market. Data reengineering of legacy systems proceeds in two steps. The first step recovers the data structures and their semantics; the second converts the data to the new system. This second step usually involves substantial changes not only to the data structures but also to the values of the legacy data themselves (Aebi & Largo, 1994).

Introduction

Legacy systems, from a data-centric view, can be defined as old, business-critical, stand-alone systems built around legacy databases, such as IMS or CODASYL, or legacy database management systems, such as ISAM (Brodie & Stonebraker, 1995). Because of the huge scope of legacy systems in the business world (an estimated 100 billion lines of COBOL code exist in legacy business systems alone; Bianchi, 2000), data reengineering of legacy systems, along with the related step of program reengineering, constitutes a significant part of the software reengineering market.

Data reengineering of legacy systems proceeds in two steps. The first step recovers the data structures and their semantics; the second converts the data to the new system. This second step usually involves substantial changes not only to the data structures but also to the values of the legacy data themselves (Aebi & Largo, 1994).
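The two steps can be illustrated with a minimal Python sketch. It assumes a hypothetical fixed-width customer file whose layout (`LAYOUT`) stands in for the result of the first, recognition step; the second step then loads the records into SQLite, changing the structure (a repeated phone group becomes a child table) and the values (a two-digit-year date becomes an ISO date, using an assumed century pivot of 50). All names and the record layout are illustrative and are not taken from the cited work.

```python
import sqlite3
from datetime import date

# Step 1 (recognition): a HYPOTHETICAL record layout, as if recovered from a
# COBOL copybook: (field name, start offset, length). Not from the source.
LAYOUT = [
    ("cust_id", 0, 5),
    ("name", 5, 10),
    ("opened", 15, 6),   # a date stored as YYMMDD
    ("phone1", 21, 7),   # a repeated group, e.g. PHONE OCCURS 2 TIMES
    ("phone2", 28, 7),
]


def parse(line: str) -> dict:
    """Slice one fixed-width legacy record into its named raw fields."""
    return {name: line[start:start + length] for name, start, length in LAYOUT}


def to_iso(yymmdd: str) -> str:
    """Value change: expand a two-digit year (assumed pivot: 50) to an ISO date."""
    yy, mm, dd = int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    year = 1900 + yy if yy >= 50 else 2000 + yy
    return date(year, mm, dd).isoformat()


def convert(lines, conn: sqlite3.Connection) -> None:
    """Step 2 (conversion): change both the structure and the values."""
    conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, opened TEXT)")
    conn.execute("CREATE TABLE customer_phone (cust_id INTEGER REFERENCES customer, phone TEXT)")
    for line in lines:
        rec = parse(line)
        cust_id = int(rec["cust_id"])  # "00042" -> 42
        conn.execute(
            "INSERT INTO customer VALUES (?, ?, ?)",
            (cust_id, rec["name"].strip(), to_iso(rec["opened"])),
        )
        # Structure change: the repeated group becomes rows of a child table,
        # and unused (blank) slots are dropped instead of migrated.
        for slot in ("phone1", "phone2"):
            phone = rec[slot].strip()
            if phone:
                conn.execute("INSERT INTO customer_phone VALUES (?, ?)", (cust_id, phone))
```

For the record `"00042" + "SMITH".ljust(10) + "990315" + "5551234" + " " * 7`, the sketch produces one `customer` row `(42, 'SMITH', '1999-03-15')` and one `customer_phone` row `(42, '5551234')`; the empty second phone slot is not migrated.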

Borstlap (2006), among others, has identified potential problems in retargeting legacy ISAM data files to a relational database. In addition to data transformation logic (converting sequential-file data entities into their relational database equivalents), Aebi (1997) examines data quality problems, such as duplicate and incorrect data, that are often found in legacy data.
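Checks for these two kinds of data quality problems can be sketched as two simple passes over the legacy rows: one groups records whose normalized names collide (candidate duplicates), and the other flags values outside an allowed domain (incorrect data). The row shape, the status domain `{"Y", "N", "D"}`, and name normalization as the matching rule are assumptions made for illustration; this is not Aebi's (1997) method, and practical duplicate detection usually needs fuzzier matching.

```python
from collections import defaultdict

# Hypothetical legacy rows: (cust_id, name, status code).
# The allowed status domain is an assumption for illustration.
ALLOWED_STATUS = {"Y", "N", "D"}


def normalize_name(name: str) -> str:
    """Collapse case and internal whitespace so near-identical names compare equal."""
    return " ".join(name.upper().split())


def find_duplicates(rows) -> dict:
    """Group ids whose normalized names collide; only groups of size > 1 are returned."""
    groups = defaultdict(list)
    for cust_id, name, _status in rows:
        groups[normalize_name(name)].append(cust_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def find_invalid(rows) -> list:
    """Return ids of rows whose status code falls outside the allowed domain."""
    return [cust_id for cust_id, _name, status in rows if status not in ALLOWED_STATUS]
```

With rows `[(1, "John  Smith", "Y"), (2, "JOHN SMITH", "N"), (3, "Ann Lee", "X")]`, `find_duplicates` returns `{"JOHN SMITH": [1, 2]}` and `find_invalid` returns `[3]`. Flagged rows would then be reviewed or cleaned before conversion rather than silently migrated.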

Because the database and the programs that manipulate its data are so closely coupled, any data reengineering must also address the modifications to the programs' data access logic that database reengineering entails (Hainaut, Chandelon, Tonneau, & Joris, 1993).

In this article, we discuss recent research into data reengineering, in particular the transformation of legacy data, usually stored in a sequential file system, into a relational database. We outline the methods used to transform both the structure and the data values of such a legacy database into a relational database structure. In addition, we outline methods for transforming the program logic that accesses this database so that it accesses the data relationally, using WSL (wide spectrum language, a formal notation for software) as the program's intermediate representation.

In this section, we briefly describe the approaches that researchers have proposed and undertaken for reengineering legacy data. Tilley and Smith (1995) discuss the reverse engineering of legacy systems from several perspectives: software, system, managerial, evolution, and maintenance.

Because data reengineering must also address the resulting modifications to a program's data access logic, Hainaut et al. (1993) have proposed a method for transforming this logic, in the form of COBOL READ statements, into the equivalent SQL statements for a relational database.
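The kind of correspondence involved can be sketched for two common COBOL access patterns on an indexed file: a random READ by key and a sequential READ NEXT loop. Each COBOL fragment appears as a comment above a relational replacement written with Python's standard `sqlite3` module. The table and column names are hypothetical, and this hand-written mapping only illustrates the idea; it is not the transformation procedure of Hainaut et al. (1993).

```python
import sqlite3


def read_by_key(conn: sqlite3.Connection, cust_id: int):
    # COBOL (random access on an indexed file):
    #     MOVE WS-ID TO CUST-ID
    #     READ CUSTOMER-FILE KEY IS CUST-ID
    #         INVALID KEY MOVE 'Y' TO WS-NOT-FOUND
    #     END-READ
    # Relational equivalent: a keyed SELECT; None plays the INVALID KEY role.
    return conn.execute(
        "SELECT cust_id, name FROM customer WHERE cust_id = ?", (cust_id,)
    ).fetchone()


def read_sequentially(conn: sqlite3.Connection):
    # COBOL (sequential access):
    #     PERFORM UNTIL WS-EOF = 'Y'
    #         READ CUSTOMER-FILE NEXT RECORD
    #             AT END MOVE 'Y' TO WS-EOF
    #             NOT AT END PERFORM PROCESS-RECORD
    #         END-READ
    #     END-PERFORM
    # Relational equivalent: iterate a cursor. An indexed file delivers records
    # in key order, so ORDER BY preserves that behavior; end of the cursor
    # corresponds to the AT END condition.
    yield from conn.execute("SELECT cust_id, name FROM customer ORDER BY cust_id")
```

Returning `None` for a missing key and adding `ORDER BY` to the sequential read are design choices that keep the rewritten program's control flow close to the original INVALID KEY and key-ordered READ NEXT behavior; without `ORDER BY`, SQL makes no ordering guarantee.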

Key Terms in this Chapter

Legacy Data: Historical data used by a legacy system, which can be defined as a long-term, mission-critical system that performs important business functions and contains comprehensive business knowledge.

Multivalued Attribute: An attribute, or field, of a table or file that may hold multiple values. For example, a record in a COBOL sequential file may have a field, A, with several allowable values (Y, N, D). Such an attribute is difficult to translate directly into the relational model; hence, lists or linked tables containing the possible values of the attribute are used to represent it.

Chicken Little Approach: An approach that allows the legacy and target databases to coexist during data reengineering through a gateway that translates the legacy system's data access requests for the target database system and then translates the result(s) from the target database back for use by the legacy system.

Conceptual Conversion Strategy: A strategy that first recovers the precise semantic meaning of the data in the source database and then develops the target database, using standard database development techniques, from the conceptual schema derived from that recovered meaning.

Domain Analysis: A technique that identifies commonalities and differences across programs and data. Domain analysis is used to identify design patterns in software and data.

Butterfly Approach: An iterative data reengineering approach in which the legacy data are frozen for read-only access until the transformation of the data to the target database is complete. This approach assumes that the legacy data are the most important part of the reengineering process and, during migration, focuses on the structure of the legacy data rather than on their values.

Physical Conversion Strategy: A strategy that does not consider the semantic meaning of the data but simply converts the existing legacy constructs of the source database to the closest corresponding construct of the target database.
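A physical conversion can be sketched as a purely syntactic mapping from COBOL PICTURE clauses to the nearest SQL column types. The sketch handles only a few common forms (`X(n)`, `A(n)`, `9(n)`, `S9(n)V9(m)`), and its type choices and helper names are assumptions for illustration. Because it ignores semantics, a `PIC 9(6)` field that actually holds a YYMMDD date becomes `DECIMAL(6,0)` rather than a `DATE`; recovering that meaning is what the conceptual conversion strategy adds.

```python
import re


def pic_to_sql(pic: str) -> str:
    """Map a COBOL PICTURE string to the closest SQL column type (simplified)."""
    pic = pic.upper().replace(" ", "")
    # Expand repetition counts: X(3) -> XXX, 9(2) -> 99.
    expanded = re.sub(r"([XA9])\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic)
    # Alphanumeric or alphabetic fields become fixed-length character columns.
    if expanded and set(expanded) <= {"X", "A"}:
        return f"CHAR({len(expanded)})"
    # Optional sign S, integer digits, optional implied decimal point V.
    m = re.fullmatch(r"S?(9+)(?:V(9+))?", expanded)
    if m:
        digits, decimals = m.group(1), m.group(2) or ""
        return f"DECIMAL({len(digits) + len(decimals)},{len(decimals)})"
    raise ValueError(f"unsupported PICTURE clause: {pic}")


def physical_ddl(table: str, fields: list) -> str:
    """Emit CREATE TABLE text from (COBOL name, PICTURE) pairs.

    Only the names are adapted (hyphens to underscores); no semantics are recovered.
    """
    cols = ", ".join(
        f"{name.replace('-', '_').lower()} {pic_to_sql(pic)}" for name, pic in fields
    )
    return f"CREATE TABLE {table} ({cols})"
```

For example, `physical_ddl("customer", [("CUST-ID", "9(5)"), ("OPEN-DATE", "9(6)")])` returns `"CREATE TABLE customer (cust_id DECIMAL(5,0), open_date DECIMAL(6,0))"`, which keeps the date as a plain number, exactly as the physical strategy implies.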
