Receiver Operating Characteristic (ROC) curves have been used for years in decision making from signals, such as radar or radiology. Basically, they plot the hit rate against the false alarm rate. They were introduced more recently in data mining and machine learning to take different misclassification costs into account, or to deal with skewed class distributions. In particular, they help to adapt the extracted model when the characteristics of the training set differ from those of the evaluation data. Overall, they provide a convenient way to compare classifiers, but also an unexpected way to build better classifiers.
ROC analysis mainly deals with binary classifiers, that is, models assigning each entry to either the positive or the negative class. The performance of a given classifier is collected in a confusion matrix (also known as a contingency table), which counts the number of training examples falling in each of the four cells according to their actual class and their predicted class (see Table 1).
Table 1.
| | Real Positive | Real Negative |
|---|---|---|
| Predicted Positive | True Positive (TP) | False Positive (FP), aka False Alarm, aka Type I Error |
| Predicted Negative | False Negative (FN), aka Type II Error | True Negative (TN), aka Correct Rejection |
| | Total Number of Positive (P) | Total Number of Negative (N) |
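Filling the four cells of the confusion matrix amounts to comparing, for each example, its actual class with its predicted class. The following is a minimal Python sketch of that counting. The function name `confusion_matrix`, its `positive` parameter (the label treated as the positive class, with every other label assumed negative) and the toy label lists are hypothetical illustrations, not a library API (in particular, this is not scikit-learn's function of the same name).

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive=1):
    """Count TP, FP, FN and TN for a binary classifier.

    Hypothetical helper: `positive` is the label treated as the positive
    class; any other label is assumed to be negative.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        real_pos = a == positive
        pred_pos = p == positive
        if pred_pos and real_pos:
            counts["TP"] += 1  # hit
        elif pred_pos:
            counts["FP"] += 1  # false alarm, Type I error
        elif real_pos:
            counts["FN"] += 1  # miss, Type II error
        else:
            counts["TN"] += 1  # correct rejection
    return {k: counts[k] for k in ("TP", "FP", "FN", "TN")}

# Toy labels, invented for illustration only.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
cm = confusion_matrix(actual, predicted)
print(cm)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}

# Column totals of Table 1: P = TP + FN, N = FP + TN.
P = cm["TP"] + cm["FN"]
N = cm["FP"] + cm["TN"]
print(P, N)  # 4 4
```

The column totals P and N depend only on the actual classes, so they stay fixed whatever the classifier predicts; only the split of each column between its two cells changes.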
The True Positive Rate (TPR) is the fraction of positive examples correctly classified, TP/P, and the False Positive Rate (FPR) is the fraction of negative examples incorrectly classified, FP/N.
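Since the TPR is the hit rate and the FPR is the false alarm rate, each confusion matrix yields one point (FPR, TPR) in ROC space. A minimal Python sketch of that computation follows. The function name `roc_point` and the example counts are hypothetical, and the sketch assumes the evaluation set contains at least one positive and one negative example, so that neither denominator is zero.

```python
def roc_point(tp, fp, fn, tn):
    """Return (FPR, TPR) for one confusion matrix, i.e. one point in ROC space.

    Hypothetical helper; assumes P > 0 and N > 0.
    """
    p = tp + fn  # total number of real positives (P)
    n = fp + tn  # total number of real negatives (N)
    if p == 0 or n == 0:
        raise ValueError("need at least one positive and one negative example")
    tpr = tp / p  # True Positive Rate, TP/P (hit rate)
    fpr = fp / n  # False Positive Rate, FP/N (false alarm rate)
    return fpr, tpr

# Example counts: TP = 3, FP = 1, FN = 1, TN = 3.
fpr, tpr = roc_point(tp=3, fp=1, fn=1, tn=3)
print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")  # FPR = 0.25, TPR = 0.75
```

The FPR is returned first because ROC plots conventionally place the false alarm rate on the horizontal axis and the hit rate on the vertical axis.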