Independent Subspaces

Lei Xu (Chinese University of Hong Kong, Hong Kong, & Peking University, Beijing, China)
Copyright: © 2009 |Pages: 10
DOI: 10.4018/978-1-59904-849-9.ch132

Abstract

Several unsupervised learning topics have been studied extensively for decades, with wide applications, in the literature of statistics, signal processing, and machine learning. These topics are mutually related, and certain connections have been partly discussed, but a systematic overview is still needed. This article provides a unified perspective via a general framework of independent subspaces, with different topics distinguished by how three ingredients are chosen and combined. Moreover, an overview is given via three streams of studies. The first consists of studies on the widely studied principal component analysis (PCA) and factor analysis (FA), characterized by second-order independence. The second consists of studies on independent component analysis (ICA), binary FA, and non-Gaussian FA, characterized by higher-order independence. The third, called mixture-based learning, combines individual jobs to fulfill a complicated task. The extensive literature makes a complete review impossible. Instead, we aim to sketch a roadmap for each stream, with attention to topics missing from existing surveys and textbooks, and limited to the authors’ knowledge.
Chapter Preview

A General Framework Of Independent Subspaces

Many unsupervised learning topics are characterized by how they handle one fundamental task. As shown in Fig.1(b), every sample $x$ is projected to a point $\hat{x}$ on a manifold, and the error $e = x - \hat{x}$ of using $\hat{x}$ to represent $x$ is minimized collectively over a set of samples. One widely studied situation is that the manifold is a subspace represented by linear coordinates, e.g., spanned by three linearly independent basis vectors $a_1, a_2, a_3$ as shown in Fig.1(a). So $\hat{x}$ can be represented by its projection $y^{(j)}$ on each basis vector, i.e.,

$\hat{x} = \sum_{j=1}^{3} y^{(j)} a_j$

or

$x = Ay + e, \quad A = [a_1, a_2, a_3], \quad y = [y^{(1)}, y^{(2)}, y^{(3)}]^T$. (1)

Figure 1.

Typically, the error is measured by the squared norm $\|e\|^2$, which is minimized when $e$ is orthogonal to the projection $\hat{x}$. Collectively, the minimization of the average error over a set of samples, or of its expectation, is characterized by the properties listed at the bottom of Fig.1(a).
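As a minimal sketch of the projection in Eq. (1), the NumPy snippet below projects one sample onto the subspace spanned by the columns of a basis matrix and checks that the residual is orthogonal to the projection. The matrix `A`, the sample `x`, the dimensions, and the use of `numpy.linalg.lstsq` are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # hypothetical basis: three linearly independent vectors in R^5
x = rng.normal(size=5)        # hypothetical sample

# Least-squares coordinates y minimize the squared norm of x - A y
y, *_ = np.linalg.lstsq(A, x, rcond=None)
x_hat = A @ y                 # projection of x onto the subspace spanned by A
e = x - x_hat                 # reconstruction error

# The error is orthogonal to every basis vector, hence to the projection
print(np.allclose(A.T @ e, 0.0))
print(np.isclose(e @ x_hat, 0.0))
```

Any other choice of coordinates gives a residual with a component inside the subspace, which can only increase the squared error.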

Key Terms in this Chapter

Rival Penalized Competitive Learning: A development of competitive learning that uses an appropriate balance between participating and leaving mechanisms, such that an appropriate number of agents or learners is allocated to learn the multiple structures underlying the observations. See http://www.scholarpedia.org/article/Rival_Penalized_Competitive_Learning

Principal Component (PC): For samples $\{x_t\}_{t=1}^{N}$ with zero mean, the PC is a unit vector $w$ originating at zero, with a direction along which the average squared orthogonal projection of the samples is maximized, i.e., $\max_{\|w\|=1} w^T \Sigma w$ with $\Sigma = \frac{1}{N}\sum_{t} x_t x_t^T$. The solution is the eigenvector of the sample covariance matrix $\Sigma$ corresponding to the largest eigenvalue. Generally, the m-PCs refer to the $m$ orthonormal vectors, taken as the columns of $W$, that maximize $\mathrm{Tr}[W^T \Sigma W]$ subject to $W^T W = I$.
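The PC definition above can be sketched numerically as follows. The synthetic data, its scales, and the variable names are assumptions made for illustration. `numpy.linalg.eigh` returns eigenvalues in ascending order, so the last columns of the eigenvector matrix are the PCs.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 500 samples in R^3 with different scales per axis
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.3])
X = X - X.mean(axis=0)                 # enforce zero mean

S = X.T @ X / X.shape[0]               # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order

w = eigvecs[:, -1]                     # PC: eigenvector of the largest eigenvalue
m = 2
W = eigvecs[:, -m:]                    # m-PCs as orthonormal columns

proj_var = np.mean((X @ w) ** 2)       # average squared projection onto w
print(proj_var, eigvals[-1])           # the two values agree
```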

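Minor Component (MC): As the orthogonal complement of the PC, the solution of $\min_{\|w\|=1} w^T \Sigma w$, with $\Sigma$ the sample covariance matrix, is the MC, i.e., the eigenvector corresponding to the smallest eigenvalue. The m-MCs refer to the columns of $W$ that minimize $\mathrm{Tr}[W^T \Sigma W]$ subject to $W^T W = I$.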

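Least Mean Square Error Reconstruction (LMSER): For the orthogonal projection $\hat{x}_t = W W^T x_t$ of a sample $x_t$ onto a subspace spanned by the column vectors of a matrix $W$, maximizing $\mathrm{Tr}[W^T \Sigma W]$ subject to $W^T W = I$, with $\Sigma$ the sample covariance matrix, is equivalent to minimizing the mean square error $\frac{1}{N}\sum_{t} \|x_t - W W^T x_t\|^2$ of using the projection as the reconstruction of $x_t$. The minimum is reached when $W$ spans the same subspace as the PCs.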
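The equivalence stated in the LMSER entry follows from the fact that, for orthonormal columns, the projected variance and the reconstruction error add up to the total variance. The sketch below checks this numerically; the data, the helper names `trace_objective` and `mse`, and the comparison matrix `W_rand` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical correlated zero-mean data in R^4
X = rng.normal(size=(400, 4)) @ rng.normal(size=(4, 4))
X = X - X.mean(axis=0)
S = X.T @ X / len(X)                   # sample covariance matrix

def trace_objective(W):
    """Projected variance Tr[W^T S W]."""
    return np.trace(W.T @ S @ W)

def mse(W):
    """Mean square error of reconstructing each sample by W W^T x."""
    X_hat = X @ W @ W.T
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
W_pc = eigvecs[:, -2:]                 # the two PCs
W_rand, _ = np.linalg.qr(rng.normal(size=(4, 2)))  # arbitrary orthonormal columns

for W in (W_pc, W_rand):
    # Both sums equal the total variance Tr[S]
    print(trace_objective(W) + mse(W), np.trace(S))
```

Because the sum is constant, maximizing the trace term and minimizing the reconstruction error select the same subspace.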

Factor Analysis: A set of samples is described by a linear model x = Ay + µ + e, where µ is a constant, y and e are both Gaussian and mutually uncorrelated, and the components of y, called factors, are mutually uncorrelated. Typically, the model is estimated by the maximum likelihood principle.
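To make the generative model x = Ay + µ + e concrete, the sketch below draws samples from it and compares the sample covariance with the covariance implied by the model. Unit-variance factors and a diagonal noise covariance are common conventions assumed here, not stated in the entry above; the loading matrix `A`, the mean `mu`, and the noise variances `psi` are hypothetical values. Maximum likelihood estimation itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0],
              [0.3, -0.7]])               # hypothetical loading matrix (4 observed, 2 factors)
mu = np.array([1.0, -2.0, 0.5, 0.0])      # hypothetical constant mean
psi = np.array([0.5, 0.2, 0.3, 0.4])      # assumed diagonal noise variances

y = rng.normal(size=(n, 2))               # uncorrelated Gaussian factors, unit variance (assumed)
e = rng.normal(size=(n, 4)) * np.sqrt(psi)  # Gaussian noise, uncorrelated with y
X = y @ A.T + mu + e                      # samples from x = A y + mu + e

S = np.cov(X, rowvar=False)               # sample covariance of x
model_cov = A @ A.T + np.diag(psi)        # covariance implied by the model
max_gap = np.abs(S - model_cov).max()
print(max_gap)                            # small for large n
```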

BYY Harmony Learning: A statistical learning theory for an intelligent system with two pathways, based on two complementary Bayesian representations of the joint distribution of the external observation and its inner representation. Both parameter learning and model selection are determined by the principle that the two Bayesian representations should be in best harmony. See http://www.scholarpedia.org/article/Bayesian_Ying_Yang_Learning

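Total Least Square (TLS) Fitting: Given samples $\{(x_t, y_t)\}_{t=1}^{N}$, instead of finding a vector $w$ to minimize the error $\frac{1}{N}\sum_{t} (y_t - w^T x_t)^2$, TLS fitting finds an augmented vector $\tilde{w}$ such that the error $\frac{1}{N}\sum_{t} (\tilde{w}^T z_t)^2$, with $z_t = [x_t^T, y_t]^T$, is minimized subject to $\tilde{w}^T \tilde{w} = 1$. The solution is the MC of $\frac{1}{N}\sum_{t} z_t z_t^T$.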
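A minimal sketch of TLS fitting via the minor component, for a scalar input and output. The underlying slope of 2, the equal noise levels on both variables, and all variable names are assumptions for illustration. The ordinary least squares slope is computed only for comparison.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x_true = rng.normal(size=n)
y_true = 2.0 * x_true                     # hypothetical noise-free relation y = 2x
x = x_true + 0.1 * rng.normal(size=n)     # noise on the input
y = y_true + 0.1 * rng.normal(size=n)     # noise on the output

Z = np.column_stack([x, y])               # augmented samples z_t = [x_t, y_t]
Z = Z - Z.mean(axis=0)                    # enforce zero mean
S = Z.T @ Z / n

eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
w_aug = eigvecs[:, 0]                     # minor component: unit-norm augmented vector

# w_aug[0] * x + w_aug[1] * y ≈ 0 describes the fitted line
slope_tls = -w_aug[0] / w_aug[1]
slope_ols = (Z[:, 0] @ Z[:, 1]) / (Z[:, 0] @ Z[:, 0])  # ordinary least squares slope
print(slope_tls, slope_ols)
```

The sign ambiguity of the eigenvector does not matter, since the slope is a ratio of its two components.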

Independent Subspaces: A family of models, each of which consists of one or several subspaces. Each subspace is spanned by linearly independent basis vectors, and the corresponding coordinates are mutually independent.
