Support Vector Machines
This chapter is based on the famous book by Nello Cristianini and John Shawe-Taylor (Cristianini & Shawe-Taylor, 2000). It was one of the first introductory books on Support Vector Machines (SVMs) – a new generation of learning systems based on advances in statistical learning theory. Today, the SVM is one of the standard tools for machine learning and data mining, and it has found many applications in bioinformatics as well.
To date, quite a few books have been written on this topic. I include the following non-exhaustive list for you: (Vapnik, The nature of statistical learning theory, 1995), (Vapnik, Statistical learning theory, 1998), (Schölkopf, Burges, & Smola, Advances in kernel methods: support vector machine learning, 1999), (Smola, Bartlett, Schölkopf, & Schuurmans, 2000), (Cristianini & Shawe-Taylor, 2000), (Kecman, 2001), (Schölkopf & Smola, 2002), (Herbrich, 2002), (Suykens, van Gestel, De Brabanter, De Moor, & Vandewalle, 2002), (Shawe-Taylor & Cristianini, 2004), (Schölkopf, Tsuda, & Vert, Kernel methods in computational biology, 2004), (Abe, 2005), (Neuhaus & Bunke, 2007), (Steinwart & Christmann, 2008), (Hamel, 2009).
Chapters on SVMs are also included in many textbooks, e.g., (Duda, Hart, & Stork, 2000), (Bishop, 2006), (Hastie, Tibshirani, & Friedman, 2003), (Izenman, 2008), (Wu & Kumar, 2009).
Gene expression-based cancer classification using SVMs can be found, e.g., in (Furey, Cristianini, Duffy, Bednarski, Schummer, & Haussler, 2000), (Zhang & Ke, 2000), (Guyon, Weston, Barnhill, & Vapnik, 2002), (Wang, Makedon, Ford, & Pearlman, 2005), (Tan & Yang, 2008).
This chapter is the longest one in the book, so be ready for a long journey as I guide you, dear readers, through jungles of mathematics threatening to seize you with boredom. I have carefully read the book by Cristianini and Shawe-Taylor (2000) and decoded its difficult-to-understand passages. Your task, then, is to trust me and be good followers. If Cristianini and Shawe-Taylor cut the path through the “SVM jungle”, I am going to clear away all the branches and lianas along this path so that you can walk it comfortably, as if you were on the main avenue of a big city (let us assume here that there are no big crowds).
Notations
Let X and Y denote the input space and output domain, respectively. Usually, X ⊆ ℝⁿ, while for binary classification Y = {−1, +1}, and for m-class classification Y = {1, 2, …, m}. A training set is denoted by

S = ((x₁, y₁), (x₂, y₂), …, (xₗ, yₗ)),

where l is the number of instances, and xᵢ is an n-dimensional column vector (to produce a row vector, one can take the transpose xᵢᵀ).
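To make this notation concrete, here is a minimal illustrative sketch (in Python with NumPy; the variable names and values are my own, not from the book) of a binary-classification training set with l = 4 instances and n = 2 features per instance:

```python
import numpy as np

# Training set S = ((x_1, y_1), ..., (x_l, y_l)) with l = 4 instances,
# each x_i being an n-dimensional vector with n = 2 features.
X = np.array([[ 1.0,  2.0],   # x_1
              [ 2.0,  1.5],   # x_2
              [-1.0, -0.5],   # x_3
              [-2.0, -1.0]])  # x_4
y = np.array([+1, +1, -1, -1])  # binary labels, Y = {-1, +1}

l, n = X.shape               # l = 4 instances, n = 2 features
x_1 = X[0].reshape(-1, 1)    # x_1 as an n x 1 column vector
x_1_T = x_1.T                # its transpose, a 1 x n row vector
```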
I will often talk of vectors as points in some space of features. For example, any point on a 2D plot is characterized by two features – its coordinates.
In this chapter and in several other chapters, we will frequently use the following mathematical operator: 〈a, b〉, where a and b are vectors of features of the same length. This operator denotes the inner product, which takes two vectors of equal length as input and produces a scalar as output. In Euclidean spaces, the inner product is called the dot or scalar product. How the inner product is calculated is shown in the next section.
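As a preview of what the next section shows in detail, in Euclidean space the inner product of a = (a₁, …, aₙ) and b = (b₁, …, bₙ) is the sum of the products of corresponding components, 〈a, b〉 = a₁b₁ + … + aₙbₙ. A minimal sketch (Python with NumPy; purely illustrative, not from the book):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Inner (dot) product: sum of element-wise products of two equal-length vectors.
manual  = sum(a_i * b_i for a_i, b_i in zip(a, b))  # 1*4 + 2*5 + 3*6 = 32.0
builtin = np.dot(a, b)                              # same result

assert manual == builtin
```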