Before the advent of computers, editing was performed by large groups of people carrying out very simple checks and detecting only a small fraction of the errors. The evolution of computers allowed survey designers and managers to review all records by consistently applying even sophisticated checks, detecting most of the errors in the data that could not be found manually. The focus of both methodologies and applications was on enhancing the checks and on applying automated imputation rules to rationalize the process.
Statistical organizations periodically perform an SDE process. It begins with data collection. An interviewer can quickly examine a respondent's answers and highlight gross errors. Whenever data collection is performed using a computer, more complex edits can be stored in it in advance and applied to the data just before their transmission to a central database. In such cases, the core of the editing activity is performed after data collection is complete. Nowadays, any modern editing process is based on the a priori specification of a set of edits, i.e., logical conditions or restrictions on data values. A given set of edits is not necessarily correct: important edits may be omitted, and conceptually wrong, overly restrictive, or logically inconsistent edits may be included. The extent of these problems is reduced when the edits are specified by subject-matter experts. They are not eliminated, however, because many surveys involve large questionnaires and require the complex specification of hundreds of edits. As a check, a proposed set of edits is applied to test data with known errors before being applied to real data. Missing or logically inconsistent edits, however, may not be detected at this stage. Problems in the edits, if discovered during the actual editing or even after it, force the editing to start anew once they are corrected, leading to delays and higher costs than expected. Any method or procedure that assists in the efficient specification of edits would therefore be welcome.
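As a minimal illustration of edits as logical conditions on data values, the sketch below (not from the source; the field names and the specific edits are hypothetical examples) expresses each edit as a named predicate over a record and reports which edits a record violates:

```python
# Hypothetical edits: each maps a name to a predicate that a clean
# record must satisfy. Field names are illustrative only.
EDITS = {
    "age_range": lambda r: 0 <= r["age"] <= 120,
    "minor_not_married": lambda r: not (r["age"] < 16
                                        and r["marital_status"] == "married"),
    "income_nonneg": lambda r: r["income"] >= 0,
}

def failed_edits(record):
    """Return the names of all edits the record violates."""
    return [name for name, check in EDITS.items() if not check(record)]

record = {"age": 14, "marital_status": "married", "income": -5}
print(failed_edits(record))  # ['minor_not_married', 'income_nonneg']
```

Applying such a set of edits to test records with known errors, as described above, would reveal edits that fail to fire (possibly missing) or that fire on every record (possibly too restrictive or inconsistent).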
The final result of an SDE process is the production of clean data and an indication of the underlying causes of errors in the data. Usually, editing software is able to produce reports indicating frequent errors in the data. The analysis of such reports makes it possible to investigate the causes of error generation and to improve the quality of the data in future surveys. Eliminating sources of error in a survey allows a data collection agency to save money.
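A report of frequent errors can be sketched as a tally of edit failures over a batch of records. The edits and field names below are hypothetical examples, not taken from the source:

```python
from collections import Counter

# Hypothetical edits over illustrative fields.
EDITS = {
    "age_range": lambda r: 0 <= r["age"] <= 120,
    "income_nonneg": lambda r: r["income"] >= 0,
}

def edit_failure_report(records, edits):
    """Return (edit_name, failure_count) pairs, most frequent first."""
    counts = Counter(
        name
        for r in records
        for name, check in edits.items()
        if not check(r)
    )
    return counts.most_common()

records = [
    {"age": 200, "income": 100},
    {"age": 35, "income": -1},
    {"age": 150, "income": 50},
]
print(edit_failure_report(records, EDITS))
# [('age_range', 2), ('income_nonneg', 1)]
```

An edit that dominates such a report points to a recurring source of error, e.g., a badly worded question or a systematic data-entry problem.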
SDE concerns two aspects of data quality: (1) Data Validation, the localization of logical errors in the data; (2) Data Imputation, the imputation of correct values once the errors in the data have been localized. Whenever missing values appear in the data, missing-data treatment is part of the data imputation process performed within the SDE framework.
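The two steps above can be sketched for the simplest case, where the localized errors are missing values and imputation falls back to a precomputed per-field value (e.g., a median from clean records). All field names and fallback values here are illustrative assumptions:

```python
def localize_missing(record):
    """Validation step (simplified): localize fields with missing values."""
    return [field for field, value in record.items() if value is None]

def impute(record, defaults):
    """Imputation step: fill each localized field from a fallback value.

    `defaults` is a hypothetical per-field fallback, e.g. medians
    computed from records that passed all edits.
    """
    fixed = dict(record)
    for field in localize_missing(record):
        fixed[field] = defaults[field]
    return fixed

record = {"age": None, "income": 30000}
defaults = {"age": 40, "income": 25000}
print(impute(record, defaults))  # {'age': 40, 'income': 30000}
```

In a full SDE process, localization would also cover values that violate the edits, and the imputed record would be re-checked against the whole edit set.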