Ensemble of SVM Classifiers for Spam Filtering

Ángela Blanco, Manuel Martín-Merino
Copyright: © 2009 | Pages: 6
ISBN13: 9781599048499 | ISBN10: 1599048493 | EISBN13: 9781599048505
DOI: 10.4018/978-1-59904-849-9.ch086

MLA

Blanco, Ángela, and Manuel Martín-Merino. "Ensemble of SVM Classifiers for Spam Filtering." Encyclopedia of Artificial Intelligence, edited by Juan Ramón Rabuñal Dopico, et al., IGI Global, 2009, pp. 561-566. https://doi.org/10.4018/978-1-59904-849-9.ch086

APA

Blanco, Á. & Martín-Merino, M. (2009). Ensemble of SVM Classifiers for Spam Filtering. In J. Rabuñal Dopico, J. Dorado, & A. Pazos (Eds.), Encyclopedia of Artificial Intelligence (pp. 561-566). IGI Global. https://doi.org/10.4018/978-1-59904-849-9.ch086

Chicago

Blanco, Ángela, and Manuel Martín-Merino. "Ensemble of SVM Classifiers for Spam Filtering." In Encyclopedia of Artificial Intelligence, edited by Juan Ramón Rabuñal Dopico, Julian Dorado, and Alejandro Pazos, 561-566. Hershey, PA: IGI Global, 2009. https://doi.org/10.4018/978-1-59904-849-9.ch086

Abstract

Unsolicited commercial email, also known as spam, is becoming a serious problem for Internet users and providers (Fawcett, 2003). Several researchers have applied machine learning techniques to improve the detection of spam messages. Naive Bayes models are the most popular (Androutsopoulos, 2000), but other authors have applied Support Vector Machines (SVM) (Drucker, 1999), boosting, and decision trees (Carreras, 2001) with remarkable results. SVM has proved particularly attractive in this application because it is robust against noise and can handle a large number of features (Vapnik, 1998). Errors in anti-spam email filtering are strongly asymmetric: false positive errors, that is, valid messages that are blocked, are prohibitively expensive. Several authors have proposed new versions of the original SVM algorithm that help to reduce the false positive errors (Kolz, 2001; Valentini, 2004; Kittler, 1998). In particular, it has been suggested that combining non-optimal classifiers can help to reduce the variance of the predictor (Valentini, 2004; Kittler, 1998) and consequently the misclassification errors. To achieve this goal, different versions of the classifier are usually built by sampling the patterns or the features (Breiman, 1996). However, in our application it is expected that the aggregation of strong classifiers will reduce the false positive errors further (Provost, 2001; Hershop, 2005). In this paper, we address the problem of reducing the false positive errors by combining classifiers based on multiple dissimilarities. To this end, a diverse set of classifiers is built by considering dissimilarities that reflect different features of the data. The dissimilarities are first embedded into a Euclidean space, where an SVM is fitted for each measure. Next, the classifiers are aggregated using a voting strategy (Kittler, 1998).
The proposed method has been applied to the Spam UCI machine learning database (Hastie, 2001) with remarkable results.
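
The pipeline described in the abstract (one classifier per dissimilarity, then a vote) can be illustrated with a minimal sketch. Everything concrete here is an assumption and not taken from the chapter: the three dissimilarities (Euclidean, Manhattan, cosine), the embedding (each message is represented by its dissimilarities to two hypothetical prototype messages), the toy 2-D feature vectors, and the classifier, which is a simple linear SVM trained by stochastic subgradient descent on the hinge loss. The abstract does not specify the authors' embedding, solver, or voting rule in this detail, so plain majority voting is used.

```python
import math
import random

# Three illustrative dissimilarities (assumed; the chapter's measures may differ).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_dissimilarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)

DISSIMILARITIES = [euclidean, manhattan, cosine_dissimilarity]

def embed(points, prototypes, dissimilarity):
    # Assumed embedding: each point becomes the vector of its
    # dissimilarities to a fixed set of prototype points.
    return [[dissimilarity(p, q) for q in prototypes] for p in points]

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=200, seed=0):
    # Linear SVM via stochastic subgradient descent on the L2-regularized
    # hinge loss. Labels must be +1 (spam) or -1 (legitimate).
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            margin = y[i] * score
            w = [(1.0 - lr * lam) * wj for wj in w]  # regularization shrinkage
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def train_ensemble(points, labels, prototypes):
    # One SVM per dissimilarity, each trained in its own embedded space.
    models = []
    for d in DISSIMILARITIES:
        X = embed(points, prototypes, d)
        models.append((d, train_linear_svm(X, labels)))
    return models

def ensemble_predict(models, prototypes, point):
    # Majority vote over the per-dissimilarity classifiers.
    votes = 0
    for d, (w, b) in models:
        x = [d(point, q) for q in prototypes]
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        votes += 1 if score >= 0.0 else -1
    return 1 if votes > 0 else -1

# Hypothetical toy data: 2-D feature vectors for spam (+1) and legitimate (-1) mail.
spam = [(5.0, 0.5), (4.5, 1.0), (5.5, 0.8), (4.8, 0.3)]
ham = [(0.5, 5.0), (1.0, 4.5), (0.8, 5.5), (0.3, 4.8)]
points = spam + ham
labels = [1] * len(spam) + [-1] * len(ham)
prototypes = [spam[0], ham[0]]

models = train_ensemble(points, labels, prototypes)
```

Calling `ensemble_predict(models, prototypes, (6.0, 1.0))` classifies a new spam-like vector. Because false positives are the costly error, one possible variation is to label a message as spam only when all classifiers agree; this is a design option for the sketch, not a rule stated in the abstract.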
