Article Preview
TopIntroduction
Machine learning methods have been widely used in the field of data mining in recent years. Machine learning algorithms based on the theoretical structure of statistics and computer science can provide high performance in data extraction, estimation and classification. With the development of technology in the field of health, there has been a rapid increase in data. Traditional statistical methods have been insufficient in terms of performance regarding data extraction and classification. Machine learning methods have been a powerful alternative to traditional methods because they both save time and provide high performance. Machine learning methods form the basis of many artificial intelligence applications. These methods are used in many medical fields such as diagnosis, early diagnosis and pattern recognition.
Although machine learning methods are widely used and can be easily learned from data, there are cases where they cannot provide high classification performance in all conditions. In some cases, although there is a high performance learning from the training data set, a poor performance is given regarding test data sets. There are different reasons for this issue. Although the model provides high accuracy performance with the training data set, the main reason for the low accuracy performance with the test data set is the problem of overfitting. The problem of overfitting is an error caused by the model's memorizing the data instead of learning the pattern in the training data set. The model that memorizes the data during the training phase provides a high accuracy performance but gives a low accuracy performance with different data due to failure to learn the pattern in the testing phase. This error, which causes the problem of overfitting, is described as variance in machine learning. Variance is not the only error in machine learning. In the training phase, the model does not provide high accuracy performance with the training data set. A model that shows low accuracy performance during the training phase will also demonstrate low accuracy performance during the testing phase. The failure of the model to achieve the desired accuracy performance in the training data set is described as the problem of underfitting. The problem of underfitting occurs when the bias error is high. Bias is an error caused by the inability of the model to learn the pattern in the training data set. There are different reasons for the occurrence of bias. One of the reasons is that the correct model has not been selected for the training data set. Some models are not sufficient to classify the data set and learn the pattern. These models are weak classifiers. Therefore, strong classifiers are used to reduce bias. However, while strong classifiers provide high performance with training data sets, they might show a lower performance with test data sets. This issue induces the problem of overfitting. Another method to reduce bias is to increase model complexity. Increasing model complexity is important for reducing bias in the training data set. However, since increasing model complexity also increases variance, it results in a poor performance with the test data set.