The problem of missing data in databases has recently been addressed through the use of computational intelligence. The hybrid of auto-associative neural networks and genetic algorithms has proven to be a successful approach to missing data imputation. Along the same lines, two auto-associative neural networks are developed for use in conjunction with a genetic algorithm to estimate missing data, and these approaches are compared to a Bayesian auto-associative neural network and genetic algorithm approach. One technique combines three neural networks to form a hybrid auto-associative network, while the other merges principal component analysis and neural networks. The hybrid of the neural network and genetic algorithm approach proves to be the most accurate when estimating one missing value, while the hybrid of principal component analysis and neural networks is more consistent and captures patterns in the data more efficiently.
Traditional methods of data imputation, such as mean substitution, regression-based methods and resemblance-based or ‘hot deck’ imputation, may produce biased results (Gold & Bentler, 2000). Regression-based methods predict missing values, while resemblance-based methods impute new values based on similar cases (Yuan & Bentler, 2000). Other imputation methods include multiple imputation and Expectation Maximisation (Little & Rubin, 1987; Yuan & Bentler, 2000); these are discussed by Wayman (2003).
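To make two of the traditional methods concrete, the following is a minimal sketch of mean substitution and a simple resemblance-based (hot deck) imputation. It is an illustration only and not the procedure used in the cited works: the function names `mean_substitution` and `hot_deck`, the use of `None` to mark a missing entry, the squared Euclidean distance used to pick a donor, and the toy data are all assumptions made for this example.

```python
from statistics import mean


def mean_substitution(rows):
    """Replace each missing entry (None) with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def hot_deck(rows):
    """Fill each incomplete row from the most similar complete row (the donor).

    Similarity is measured with squared Euclidean distance over the fields
    that are observed in the incomplete row (an assumption of this sketch).
    Assumes at least one complete row exists.
    """
    donors = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue

        def distance(donor, row=r):
            return sum((donor[j] - v) ** 2 for j, v in enumerate(row) if v is not None)

        best = min(donors, key=distance)
        filled.append([best[j] if v is None else v for j, v in enumerate(r)])
    return filled


# Hypothetical toy data: the last record is missing its second field.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [4.0, 8.0, 12.0],
    [2.1, None, 6.2],
]

mean_filled = mean_substitution(data)
hot_filled = hot_deck(data)
print(mean_filled[3])
print(hot_filled[3])
```

On this toy data, mean substitution fills the gap with the column average regardless of the other fields in the record, whereas the hot deck variant borrows the value from the closest complete record. This difference is one way to see why mean substitution can distort relationships between variables.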