Comparison of Normalization Techniques on Data Sets With Outliers

Nazanin Vafaei, Rita A. Ribeiro, Luis M. Camarinha-Matos
Copyright: © 2022 | Pages: 17
DOI: 10.4018/IJDSST.286184

Abstract

With the fast growth of data-rich systems, complex decision problems with skewed input data sets and their outliers are unavoidable. Here, data skewness refers to a non-uniform distribution in a dataset (i.e., a dataset that contains asymmetries and/or outliers). Normalization is the first step of most multi-criteria decision-making (MCDM) problems: it converts heterogeneous input data into dimensionless values, which enables aggregation of criteria and thereby ranking of alternatives. Therefore, when criteria datasets contain outliers, finding a suitable normalization technique is of utmost importance. In this work, the authors compare seven normalization techniques (max, max-min, vector, sum, logarithmic, target-based, and fuzzification) on criteria datasets that contain outliers, and analyse their results for MCDM problems. A numerical example illustrates the behaviour of the chosen normalization techniques, and an (ongoing) evaluation assessment framework is used to recommend the best normalization technique for this type of criteria.

1. Introduction

Human beings use multi-criteria decision-making methods (sometimes also called multiple attribute decision-making, MADM) in many daily activities to solve decision problems and find an optimum decision in the face of several criteria and alternatives (Zavadskas and Turskis 2010). A Multi-Criteria Decision-Making (MCDM) problem can be defined by a decision matrix composed of a finite set of alternatives Ai (i=1, …, m), a set of criteria Cj (j=1, …, n), the relative importance of the criteria (or weights) Wj, and the matrix cell elements rij, representing the rating of alternative i with respect to criterion j (Jahan, Edwards, and Bahraminasab 2016; Triantaphyllou 2000). In most MCDM problems, the criteria can be qualitative or quantitative and are usually expressed in different scales, which is an obstacle for the aggregation/ranking process (Zavadskas and Turskis 2010). Hence, normalization is needed to obtain dimensionless and comparable criteria values (Jahan et al. 2016; Triantaphyllou 2000; Zavadskas and Turskis 2010). Using different normalization techniques may change the ranking of alternatives in a decision problem; therefore, it is of paramount importance to ensure that a proper normalization technique is selected (one objective of this paper).
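The decision-matrix formulation above can be illustrated with a minimal sketch. It assumes benefit criteria only (higher is better), uses the commonly cited textbook forms of the max, max-min, sum, and vector normalizations, and aggregates with a simple weighted sum. The ratings, the weights, and all function names are hypothetical and are not taken from the article.

```python
from math import sqrt

# Hypothetical decision matrix: rows are alternatives A1..A3, columns are
# benefit criteria C1..C2 on different scales. Values are invented.
ratings = [
    [250.0, 7.0],
    [200.0, 9.0],
    [300.0, 5.0],
]
weights = [0.6, 0.4]  # hypothetical weights Wj, summing to 1


def normalize_max(column):
    """Max normalization for a benefit criterion: r / max(r)."""
    top = max(column)
    return [r / top for r in column]


def normalize_max_min(column):
    """Max-min normalization: (r - min) / (max - min); needs max != min."""
    lo, hi = min(column), max(column)
    return [(r - lo) / (hi - lo) for r in column]


def normalize_sum(column):
    """Sum normalization: r / sum(r)."""
    total = sum(column)
    return [r / total for r in column]


def normalize_vector(column):
    """Vector normalization: r / sqrt(sum(r^2))."""
    norm = sqrt(sum(r * r for r in column))
    return [r / norm for r in column]


def weighted_sum_scores(matrix, criterion_weights, normalize):
    """Normalize each criterion column, then aggregate per alternative."""
    columns = [normalize(list(col)) for col in zip(*matrix)]
    return [
        sum(w * col[i] for w, col in zip(criterion_weights, columns))
        for i in range(len(matrix))
    ]


def ranking(scores):
    """Indices of alternatives ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


scores_max = weighted_sum_scores(ratings, weights, normalize_max)
scores_max_min = weighted_sum_scores(ratings, weights, normalize_max_min)
print("max:    ", ranking(scores_max))
print("max-min:", ranking(scores_max_min))
```

Comparing the two printed rankings for a given matrix is one simple way to see whether the choice of normalization affects the outcome; cost criteria would need the corresponding inverted formulas, which this sketch omits.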

In recent years, with the advent of data science and data analysis (Chen, Chiang, and Storey 2012), many datasets with outliers have emerged (i.e., skewed criterion values), which may greatly influence the aggregation/ranking process. Barnett and Lewis (1974) defined an outlier as “an observation (or subset of observations) which appears to be inconsistent with the remainder of the data set”. Kennedy et al. (1992) stated that “an outlier is not an ‘incorrect’ observation but is a realization from a distribution that is in general highly skewed…. One reason for these extreme observations is that some popular variables, such as size, have skewed distributions.”
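To make the influence of an outlier on normalized values concrete, the following sketch applies two techniques to a single hypothetical benefit criterion whose last value is far from the rest. The data are invented, and the logarithmic formula used, ln(r) divided by the logarithm of the product of all values, is the commonly cited form and is assumed here rather than taken from the article. The sketch does not indicate which technique is preferable.

```python
from math import log

# Hypothetical criterion values for five alternatives; the last is an outlier.
criterion = [10.0, 12.0, 11.0, 13.0, 100.0]


def normalize_max_min(column):
    """Max-min normalization: (r - min) / (max - min)."""
    lo, hi = min(column), max(column)
    return [(r - lo) / (hi - lo) for r in column]


def normalize_logarithmic(column):
    """Logarithmic normalization: ln(r) / ln(product of all r).

    ln(product) equals the sum of the logs; all values must be positive
    and the sum of logs must be non-zero.
    """
    total = sum(log(r) for r in column)
    return [log(r) / total for r in column]


max_min = normalize_max_min(criterion)
logarithmic = normalize_logarithmic(criterion)

for raw, a, b in zip(criterion, max_min, logarithmic):
    print(f"raw={raw:6.1f}  max-min={a:.4f}  logarithmic={b:.4f}")
```

With max-min, the outlier maps to 1 and the four ordinary values are squeezed close to 0, so differences among them contribute little to any aggregated score. The logarithmic form shrinks the ratio between the outlier and the remaining values instead.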

So far, there has been very little research on the effect of skewed datasets (i.e., criterion values) on decision problems, especially from the normalization point of view. To the best of our knowledge, there is a large gap in the MCDM literature on selecting suitable normalization techniques, especially when there are outliers in the input data.

Therefore, in this study, we discuss the effect of outliers in criteria values and recommend the most suitable normalization technique for MCDM problems that contain skewed criteria values. To this end, we compare seven normalization techniques using a numerical example that contains outliers in the criteria datasets, and use an (ongoing) evaluation framework to recommend the best normalization technique.

The major contributions of this study are: (i) addressing the gap in the literature about the existence of outliers in input data sets in MCDM methods; (ii) comparing the effects of different normalization techniques in MCDM methods with outliers in input data sets; (iii) continuing to develop an evaluation assessment framework by adding more metrics to the previously designed framework (Vafaei et al. 2019; Vafaei, Ribeiro, and Camarinha-Matos 2018); and (iv) illustrating the proposed framework with a small example.

This article first presents a brief overview of normalization techniques and assessment frameworks (section 2). Then, it addresses the suitability of the proposed framework for dealing with outliers and choosing the best normalization technique in MCDM problems, using an illustrative example (section 3). Finally, the conclusions and future work on the topic are presented (section 4).