Abstract
Several unsupervised learning topics have been studied extensively for decades, with wide applications, in the literature of statistics, signal processing, and machine learning. These topics are mutually related, and certain connections among them have been partly discussed, but a systematic overview is still needed. This article provides a unified perspective via a general framework of independent subspaces, in which different topics are distinguished by how three ingredients are chosen and combined. Moreover, an overview is given along three streams of studies. The first consists of studies on the widely used principal component analysis (PCA) and factor analysis (FA), featured by second-order independence. The second consists of studies featured by higher-order independence, including independent component analysis (ICA), binary FA, and non-Gaussian FA. The third is mixture-based learning, which combines individual jobs to fulfill a complicated task. The extensive literature makes a complete review impossible. Instead, we aim to sketch a roadmap for each stream, with attention to topics missing from existing surveys and textbooks, and limited to the authors’ knowledge.
Key Terms in this Chapter
Rival Penalized Competitive Learning: A development of competitive learning that relies on an appropriate balance between participating and leaving mechanisms, such that an appropriate number of agents or learners is allocated to learn the multiple structures underlying the observations. See http://www.scholarpedia.org/article/Rival_Penalized_Competitive_Learning
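The balance between the participating and leaving mechanisms can be illustrated by a single update step. The sketch below assumes the commonly described form of the rule, in which the winning center is pulled toward the sample while its closest rival is pushed away at a much smaller rate. The function name `rpcl_step`, the two learning rates, and the omission of any winning-frequency weighting are simplifying assumptions made for this illustration, not details taken from this chapter.

```python
import numpy as np

def rpcl_step(centers, x, lr_win=0.05, lr_rival=0.002):
    """One illustrative rival-penalized update (hypothetical helper).

    centers : (k, d) float array of current centers
    x       : (d,) sample
    """
    d2 = np.sum((centers - x) ** 2, axis=1)    # squared distance to each center
    winner, rival = np.argsort(d2)[:2]         # closest and second closest
    updated = centers.copy()
    # Participating mechanism: the winner learns the sample.
    updated[winner] += lr_win * (x - centers[winner])
    # Leaving mechanism: the rival is de-learned (pushed away) slightly.
    updated[rival] -= lr_rival * (x - centers[rival])
    return updated, winner, rival

centers = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
x = np.array([0.2, 0.1])
new_centers, winner, rival = rpcl_step(centers, x)
```

Repeating such steps over many samples is what, under these assumptions, tends to drive redundant centers away from the data, leaving roughly one center per underlying structure.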
Principal Component (PC): For samples $\{x_t\}_{t=1}^{N}$ with zero mean, the PC is a unit vector $w$ starting at the origin and pointing in the direction along which the average squared orthogonal projection of the samples is maximized, i.e., $w$ maximizes $J(w) = \frac{1}{N}\sum_{t=1}^{N} (w^T x_t)^2$ subject to $w^T w = 1$. The solution is the eigenvector of the sample covariance matrix corresponding to the largest eigenvalue. More generally, the m PCs are the m orthonormal vectors forming the columns of $W$ that maximize $J(W) = \frac{1}{N}\sum_{t=1}^{N} \|W^T x_t\|^2$ subject to $W^T W = I$.
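As a minimal numerical sketch of this definition, the PCs can be obtained from an eigendecomposition of the sample covariance matrix. The helper name `principal_components` and the synthetic data are assumptions made only for illustration.

```python
import numpy as np

def principal_components(X, m=1):
    """Return the top-m principal directions (as columns) and their eigenvalues.

    X is an (N, d) array of samples; it is centered first so that the
    zero-mean assumption of the definition holds.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / Xc.shape[0]               # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]     # indices of the m largest
    return eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
# Synthetic data whose variance is largest along the first coordinate axis.
X = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 0.3])

W, lam = principal_components(X, m=2)
w = W[:, 0]                                   # the first PC
Xc = X - X.mean(axis=0)
proj_var = np.mean((Xc @ w) ** 2)             # average squared projection onto w
```

Here `proj_var` coincides with the largest eigenvalue `lam[0]`, which is the maximized value of the average squared projection over all unit vectors.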
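Minor Component (MC): Complementary to the PC. For zero-mean samples $\{x_t\}_{t=1}^{N}$, the unit vector $w$ that minimizes $J(w) = \frac{1}{N}\sum_{t=1}^{N} (w^T x_t)^2$ subject to $w^T w = 1$ is the MC, while the m MCs are the columns of $W$ that minimize $J(W) = \frac{1}{N}\sum_{t=1}^{N} \|W^T x_t\|^2$ subject to $W^T W = I$.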
Least Mean Square Error Reconstruction (LMSER): For the orthogonal projection of a sample $x_t$ onto the subspace spanned by the column vectors of a matrix $W$, maximizing $J(W) = \frac{1}{N}\sum_{t=1}^{N} \|W^T x_t\|^2$ subject to $W^T W = I$ is equivalent to minimizing the mean square error $\frac{1}{N}\sum_{t=1}^{N} \|x_t - W W^T x_t\|^2$ incurred by using the projection $W W^T x_t$ as the reconstruction of $x_t$. The minimum is reached when $W$ spans the same subspace as the PCs.
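The equivalence can be checked numerically: for any W with orthonormal columns, the average projected energy and the average reconstruction error sum to the total average energy of the samples, so maximizing one minimizes the other. The following sketch uses synthetic data and hypothetical helper names (`projected_energy`, `reconstruction_error`) chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # correlated samples
X -= X.mean(axis=0)                                       # zero mean

def projected_energy(W, X):
    """Average squared norm of the projection coefficients W^T x_t."""
    return np.mean(np.sum((X @ W) ** 2, axis=1))

def reconstruction_error(W, X):
    """Average squared error when W W^T x_t is used to reconstruct x_t."""
    X_hat = X @ W @ W.T
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

total_energy = np.mean(np.sum(X ** 2, axis=1))

# An arbitrary W with two orthonormal columns (via reduced QR).
Q, _ = np.linalg.qr(rng.normal(size=(4, 2)))

# W whose columns are the two leading PCs of the sample covariance.
S = X.T @ X / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(S)       # ascending order
W_pc = eigvecs[:, ::-1][:, :2]             # two largest eigenvalues first
```

Under these assumptions, `projected_energy(Q, X) + reconstruction_error(Q, X)` equals `total_energy` for the arbitrary `Q`, and `W_pc` gives a reconstruction error no larger than that of `Q`.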
Factor Analysis: A set of samples is described by a linear model x = Ay + µ + e, where µ is a constant, y and e are both Gaussian and mutually uncorrelated, and the components of y, called factors, are mutually uncorrelated. Typically, the model is estimated by the maximum likelihood principle.
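To make the model concrete, the sketch below draws samples from x = Ay + µ + e and compares the sample covariance with the covariance implied by the model. It assumes the usual FA convention that y has identity covariance and e has a diagonal covariance; these choices, the dimensions, and all variable names are assumptions for illustration. Maximum likelihood estimation itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, N = 5, 2, 200_000                  # observed dim, number of factors, samples

A = rng.normal(size=(d, m))              # loading matrix (arbitrary for the demo)
mu = rng.normal(size=d)                  # constant mean vector
psi = rng.uniform(0.1, 0.5, size=d)      # noise variances (assumed diagonal)

y = rng.normal(size=(N, m))              # factors: uncorrelated, unit variance
e = rng.normal(size=(N, d)) * np.sqrt(psi)   # Gaussian noise, independent of y
X = y @ A.T + mu + e                     # samples from x = A y + mu + e

sample_cov = np.cov(X, rowvar=False)
model_cov = A @ A.T + np.diag(psi)       # covariance implied by the model
max_abs_gap = np.max(np.abs(sample_cov - model_cov))
```

With this many samples, `max_abs_gap` is small relative to the entries of `model_cov`, reflecting that under these assumptions the model explains the second-order statistics of x through A A^T plus a diagonal noise term.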
BYY Harmony Learning: A statistical learning theory for an intelligent system featured by two pathways, based on two complementary Bayesian representations of the joint distribution of the external observation and its inner representation. Both parameter learning and model selection are determined by the principle that the two Bayesian representations reach the best harmony. See http://www.scholarpedia.org/article/Bayesian_Ying_Yang_Learning
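Total Least Square (TLS) Fitting: Given samples $\{z_t\}_{t=1}^{N}$, where each $z_t$ stacks an input vector $x_t$ and an output $y_t$, instead of finding a vector $w$ that minimizes the error $\frac{1}{N}\sum_{t=1}^{N} (y_t - w^T x_t)^2$, TLS fitting finds an augmented vector $\tilde{w}$ such that the error $\frac{1}{N}\sum_{t=1}^{N} (\tilde{w}^T z_t)^2$ is minimized subject to $\tilde{w}^T \tilde{w} = 1$. The solution is the MC of the samples $\{z_t\}$.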
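A small sketch of TLS fitting through the MC follows, on synthetic one-dimensional data with noise on both the input and the output. The data, the noise level, and the variable names are assumptions for illustration. The MC of the centered, stacked samples gives the unit normal of the fitted line, from which the slope is read off.

```python
import numpy as np

rng = np.random.default_rng(3)
N, true_slope, noise = 5000, 2.0, 0.3

x_clean = rng.normal(size=N)
x = x_clean + noise * rng.normal(size=N)               # noisy input
y = true_slope * x_clean + noise * rng.normal(size=N)  # noisy output

Z = np.column_stack([x, y])                            # stacked samples z_t = [x_t, y_t]
Z -= Z.mean(axis=0)                                    # zero mean

# Ordinary least squares: minimizes the error in y only.
w_ols = (Z[:, 0] @ Z[:, 1]) / (Z[:, 0] @ Z[:, 0])

# TLS: the unit vector a minimizing the mean of (a^T z_t)^2 is the MC,
# i.e. the eigenvector of the sample covariance with the smallest eigenvalue.
S = Z.T @ Z / N
eigvals, eigvecs = np.linalg.eigh(S)                   # ascending eigenvalues
a = eigvecs[:, 0]
w_tls = -a[0] / a[1]                                   # a[0]*x + a[1]*y = 0  =>  y = w_tls * x
```

In this setting, with equal noise variance on both coordinates, `w_tls` lands close to the true slope, whereas `w_ols` is biased toward zero because the noise on x is ignored.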
Independence Subspaces: A family of models, each of which consists of one or several subspaces. Each subspace is spanned by linearly independent basis vectors, and the corresponding coordinates are mutually independent.