Enhancing Web Data Mining: The Study of Factor Analysis

Abhishek Taneja (S. A. Jain College, India)
Copyright: © 2017 |Pages: 21
DOI: 10.4018/978-1-5225-0613-3.ch005


The enormous growth of databases in almost every area of human endeavor, particularly through the web, has created great demand for new, powerful tools for turning data into useful, task-oriented knowledge. The aim of this study is to examine the predictive ability of factor analysis, a web mining technique, so as to avoid resorting to voting, averaging, stacked generalization, and meta-learning, and thus save much of the time spent choosing the right technique for a given kind of underlying dataset. This chapter compares three factor-based techniques, namely principal component regression (PCR), generalized least squares (GLS) regression, and maximum likelihood regression (MLR), and explores their predictive ability on both theoretical and experimental grounds. The three techniques are compared using the necessary conditions for forecasting: R-square, adjusted R-square, the F-test, and the Jarque-Bera (JB) test of normality. The study can be extended using sufficient conditions for forecasting, such as Theil's inequality coefficient (TIC) and the Janus quotient (JQ).
Chapter Preview


Factor analysis is a collection of techniques used to explore the underlying latent variables, or factors, that influence the outcomes on a number of measured variables. All of these techniques use common factors in their underlying model, which is shown in Figure 1.

Figure 1. Factor model


Figure 1 shows that in a factor-based model every observed measure, from measure 1 to measure 5, is influenced by the underlying latent variables, or common factors. These common factors, A1 to A5, are also described as latent variables, and the model shows the correlation among the different factors (Kim & Mueller, 1978).

Factor-based techniques are essentially one-sample techniques (Rencher, 2002). For example, consider a sample X1, X2, ..., Xn from a single population with mean vector μ and covariance matrix Σ. The factor-based model represents each variable as a linear combination of underlying common factors f1, f2, ..., fm, with an accompanying residual term to account for the part of the variable that is unique. For X1, X2, ..., Xp in any observation vector X, the model is as follows:

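\[
\begin{aligned}
X_1 - \mu_1 &= \lambda_{11} f_1 + \lambda_{12} f_2 + \cdots + \lambda_{1m} f_m + \varepsilon_1 \\
X_2 - \mu_2 &= \lambda_{21} f_1 + \lambda_{22} f_2 + \cdots + \lambda_{2m} f_m + \varepsilon_2 \\
&\;\;\vdots \\
X_p - \mu_p &= \lambda_{p1} f_1 + \lambda_{p2} f_2 + \cdots + \lambda_{pm} f_m + \varepsilon_p
\end{aligned}
\]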
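For reference, the same system is often written compactly in matrix notation. The bold symbols are a notational assumption that the chapter text itself does not use: X, μ, and ε are p × 1 vectors, f is the m × 1 vector of common factors, and Λ is the p × m matrix whose entries are the loadings λij.

```latex
\[
\mathbf{X} - \boldsymbol{\mu}
  = \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\boldsymbol{\Lambda} = (\lambda_{ij})_{p \times m}.
\]
```

Row i of this matrix equation reproduces the ith scalar equation, Xi − μi = λi1 f1 + ··· + λim fm + εi.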

If possible, m should be considerably smaller than p; otherwise, no parsimonious description of the variables as functions of a few underlying factors has been achieved (Kim & Mueller, 1978). In the equations above, the f's are random variables that generate the X's. The coefficients λij are the loadings, which serve as weights: they show how each Xi individually depends on the f's. The loading λij indicates the importance of the jth factor fj to the ith variable Xi and can also be used to interpret fj. For example, f2 is interpreted by examining its coefficients λ12, λ22, ..., λp2: the larger loadings associate f2 with the corresponding X's, and from these X's a meaning or description of f2 is inferred. Once the λij's have been estimated, it is expected that they will partition the variables into groups corresponding to factors. At first glance MLR and factor analysis appear to be similar techniques, but they are fundamentally different: the f's in the equations above are unobserved, and the equations represent a single observation vector, whereas MLR represents all n observations.
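The role of the loadings can be illustrated with a small simulation. The following is a minimal sketch, not the chapter's own procedure: the loading matrix, mean vector, and sample size are hypothetical values chosen for illustration, and it assumes the usual textbook conditions (common factors with unit variance, uncorrelated with each other and with the residuals), which the text above does not state explicitly. Under those assumptions the covariance of X equals ΛΛᵀ plus a diagonal matrix of unique variances, and each variable can be associated with the factor on which it loads most heavily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings lambda_ij for p = 6 variables and m = 2 factors.
loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.1],
    [0.7, 0.0],
    [0.0, 0.9],
    [0.1, 0.8],
    [0.0, 0.7],
])
p, m = loadings.shape

mu = np.arange(1.0, p + 1.0)  # hypothetical mean vector (mu_1, ..., mu_p)

# Unique variances chosen so that every X_i has total variance 1 (an assumption).
psi = 1.0 - np.sum(loadings ** 2, axis=1)

n = 20_000
f = rng.standard_normal((n, m))                   # unobserved common factors
eps = rng.standard_normal((n, p)) * np.sqrt(psi)  # unique residual terms

# Each row is one observation vector: X - mu = Lambda f + eps.
X = mu + f @ loadings.T + eps

# Covariance implied by the model under the stated assumptions.
implied_cov = loadings @ loadings.T + np.diag(psi)
sample_cov = np.cov(X, rowvar=False)
max_cov_error = np.max(np.abs(sample_cov - implied_cov))

# Interpretation step: link each variable to the factor with the largest |loading|.
dominant_factor = np.argmax(np.abs(loadings), axis=1)

print("largest |sample - implied| covariance entry:", max_cov_error)
print("dominant factor per variable:", dominant_factor)
```

In this sketch the first three variables are tied to the first factor and the last three to the second, which is the kind of partition of variables into groups that the estimated loadings are expected to reveal. In practice the loadings are unknown and must be estimated from the data, because only X is observed.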
