The term metadata is frequently used in many different sciences. Statistical metadata is generally used to denote “every piece of information required by a data user to properly understand and use statistical data.” Modern statistical information systems (SIS) hold metadata in relational or complex object-oriented metadata models, making extensive and active use of metadata. The early phases of many software development projects emphasize the design of a conceptual data/metadata model. Such a design can be detailed into a logical data/metadata model, which in later stages may be translated into a physical data/metadata model. Organisational aspects, user requirements and constraints created by existing data warehouse architectures lead to a conceptual architecture for metadata management, based on a common, semantically rich, object-oriented data/metadata model that integrates the main steps of data processing and covers all aspects of data warehousing (Pool et al., 2002). In this paper we examine data/metadata modeling according to the techniques and paradigms used for developing metadata schemas. However, the integration of a model into a SIS is not by itself sufficient for the automatic manipulation of related datasets and for quality assurance, unless it is accompanied by certain operators/transformations. Two types of transformations can be considered: (i) those used to alleviate breaks in time series, and (ii) a set of model-integrated operators for automating data/metadata management and minimizing human errors. The latter category is discussed extensively. Finally, we illustrate the applicability of our scientific framework in the area of biomedical statistics.
Metadata and metainformation are two terms widely used interchangeably in various sciences and contexts. Until recently, metainformation was usually recorded as table footnotes, mainly because data producers and/or consumers had underestimated the importance of this kind of information.
When the need to capture metadata in a pre-arranged format became evident, the use of metadata templates was proposed. This was the first attempt to capture metadata in a structured way, and it was soon adopted since it reduced the chance of ambiguous metadata: each field of a template was well documented. However, templates still had limited semantic power, as they could not express the semantic links between the various pieces of metainformation.
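The contrast can be illustrated with a minimal sketch of such a template. The field names below are purely illustrative (they do not come from any particular statistical standard): each field is individually documented, which limits ambiguity, yet nothing in the flat structure expresses semantic links between fields, such as the fact that the unit qualifies the variable or that the footnote describes a break in the series.

```python
# A flat metadata template: well-documented fields, but no way to express
# semantic links between the pieces of metainformation.
# Field names are hypothetical, chosen for illustration only.
template = {
    "title": "Unemployment rate",         # name of the statistical table
    "variable": "unemployment_rate",      # measured variable
    "unit": "percent",                    # unit of measurement
    "reference_period": "2003-Q4",        # period the data refer to
    "source": "Labour Force Survey",      # producing survey/organisation
    "footnote": "Break in series due to a methodology change",
}

REQUIRED_FIELDS = {"title", "variable", "unit", "reference_period", "source"}

def validate(md):
    """Reject templates that omit any of the required, documented fields."""
    missing = REQUIRED_FIELDS - md.keys()
    if missing:
        raise ValueError(f"missing metadata fields: {sorted(missing)}")
    return md

validate(template)
```

Validation of this kind catches incomplete metadata, but the template still cannot say *how* its fields relate to one another, which is exactly the semantic limitation noted above.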
To further increase the benefits of using metadata, attempts have been made to automate the processing of statistical data. The main idea behind this task is to translate the meaning of data into a computer-understandable form. One way of achieving this goal is to use large, semantically rich statistical data/metadata models like the ones developed in Papageorgiou et al (2001, 2002). However, in order to minimize compatibility problems between dispersed systems, an integrated metadata model is needed to manage data at all stages of information processing. The quantifiable benefits already demonstrated by integrating data mining with current information systems can be greatly increased if such an integrated model is implemented. This is reinforced by the fact that both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses; nevertheless, brute-force navigation of data is not enough. Such an integrated model was developed in Vardaki & Papageorgiou (2004), where it was demonstrated that an abstract, generally applicable model, keeping information about the storage and location of information as well as the data processing steps, is essential for data mining requirements. Other related work focuses either mainly on OLAP databases (Pourabbas and Shoshani, 2006) or on semantically rich data models used mainly for data capturing purposes; in these cases, the authors focus their attention on data manipulations and on maximizing the performance of data aggregations.
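The core idea of an integrated model (keeping the data, their metadata, and the processing steps together) can be sketched in a few lines. This is only a toy illustration, not the model of Vardaki & Papageorgiou (2004): the class and operator names are assumptions introduced here. The point is that every transformation applied to the data is recorded as a processing step, so that later stages of the information pipeline can reconstruct how a dataset was derived.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """A toy integrated data/metadata unit: values, descriptive metadata,
    and a history of the processing steps applied to the data."""
    values: list
    metadata: dict
    history: list = field(default_factory=list)

    def apply(self, name, func, **params):
        """Apply an operator to the data and log it as a processing step."""
        self.values = func(self.values, **params)
        self.history.append({"operator": name, "params": params})
        return self

# Example: rescale a series and update its metadata in step with the data.
ds = Dataset(values=[2.0, 4.0, 6.0], metadata={"unit": "thousand persons"})
ds.apply("scale", lambda xs, factor: [x * factor for x in xs], factor=1000)
ds.metadata["unit"] = "persons"
```

Because the `history` list travels with the dataset, a downstream system can inspect which operators produced the current values, which is the kind of automated, error-minimizing bookkeeping that model-integrated operators aim to provide.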