 # An Introduction to Structural Equation Modeling (SEM) and the Partial Least Squares (PLS) Methodology

Nicholas J. Ashill (American University of Sharjah, UAE)
DOI: 10.4018/978-1-4666-1601-1.ch032

## Abstract

Over the past 15 years, Partial Least Squares (PLS) has become increasingly popular in academic research across many social sciences, including Information Systems, marketing, and organizational behavior. PLS can be considered an alternative to covariance-based SEM and offers greater flexibility in handling a range of modeling problems, particularly in situations where the strict assumptions of more traditional multivariate statistics are difficult to meet. This chapter introduces PLS for beginners. The topics covered include foundational concepts in SEM, the statistical assumptions of PLS, a comparison of LISREL and PLS, and reflective and formative measurement.

## Main Focus

### What is Structural Equation Modeling?

Structural equation modeling (SEM), also referred to as ‘causal modeling’, has become a popular tool in the methodological arsenal of social science researchers (Bagozzi & Baumgartner, 1994; Chau, 1997). SEM is a method for representing, estimating, and testing a theoretical network of (mostly) linear relations between variables, where those variables may be either observable or not directly observable (Hair, Black, Babin, Anderson, & Tatham, 2006). The technique combines aspects of multiple regression (examining dependence relationships) and factor analysis (representing unmeasured concepts, or factors, with multiple variables) to estimate a series of interrelated dependence relationships simultaneously. Simultaneity is especially important because measures often derive their meaning from the conceptual network within which they are embedded. SEM is grounded in three main premises. First, from psychology comes the belief that a valid construct cannot be measured with a single measure. Second, from economics comes the conviction that strong theoretical specification is necessary for the estimation of parameters. Third, from sociology comes the notion of ordering theoretical variables and decomposing types of effects. Taken together, these ideas have coalesced into what is called latent variable structural equation modeling (Falk & Miller, 1992).
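
To show how the regression and factor-analytic parts fit together, the conventional LISREL notation (an addition here; the chapter itself does not introduce these symbols) writes a structural equation model as one structural system and two measurement systems:

$$
\begin{aligned}
\boldsymbol{\eta} &= \mathbf{B}\boldsymbol{\eta} + \boldsymbol{\Gamma}\boldsymbol{\xi} + \boldsymbol{\zeta} && \text{(structural model)}\\
\mathbf{y} &= \boldsymbol{\Lambda}_y \boldsymbol{\eta} + \boldsymbol{\varepsilon} && \text{(measurement model, endogenous constructs)}\\
\mathbf{x} &= \boldsymbol{\Lambda}_x \boldsymbol{\xi} + \boldsymbol{\delta} && \text{(measurement model, exogenous constructs)}
\end{aligned}
$$

Here $\boldsymbol{\eta}$ and $\boldsymbol{\xi}$ are the endogenous and exogenous latent variables, $\mathbf{B}$ and $\boldsymbol{\Gamma}$ hold the path coefficients among latent variables (the regression-like part), $\boldsymbol{\Lambda}_y$ and $\boldsymbol{\Lambda}_x$ hold the indicator loadings (the factor-analytic part), and $\boldsymbol{\zeta}$, $\boldsymbol{\varepsilon}$, and $\boldsymbol{\delta}$ are the structural disturbances and measurement errors. All three systems are estimated together, which is the simultaneity described above.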

The two common approaches to SEM are the covariance-based approach used in LISREL (Linear Structural Relations), AMOS (Analysis of Moment Structures), and EQS (Anderson & Gerbing, 1988; Bentler, 1995; Bollen, 1989; Bollen & Long, 1993; Byrne, 1994; Jöreskog & Sörbom, 1982, 1988), and the variance-based approach used in PLS-PC, PLS-Graph, SmartPLS, and XLSTAT-PLS (Chin, 1995, 1998; Esposito Vinzi, Chin, Henseler, & Wang, 2010; Fornell & Cha, 1994; Hansmann & Ringle, 2004; Wold, 1985). Both approaches belong to the family of techniques that Fornell (1987) and Lohmöller (1989) call “the second generation of multivariate data analysis techniques”. Unlike first-generation techniques such as multiple regression, principal components analysis, cluster analysis, canonical analysis, discriminant analysis, and others, second-generation models bring together psychometric and econometric analysis in a way that exploits the best features of both (Fornell & Larcker, 1981; Fornell, 1987). SEM can therefore be viewed as an extension of several first-generation multivariate techniques (Hair et al., 2006), because it incorporates the psychometrician's notion of unobserved latent variables (constructs) and measurement error in the same estimation procedure. In social science research, theoretical constructs are typically difficult to operationalize with a single measure, and measurement error is often unavoidable. Therefore, given an appropriate statistical testing method, structural equation models are recognized as indispensable for theory evaluation in this type of research.
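
The label “covariance-based” refers to how such programs estimate a model: the parameters $\boldsymbol{\theta}$ are chosen so that the model-implied covariance matrix $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ reproduces the sample covariance matrix $\mathbf{S}$ of the $p$ observed variables as closely as possible. Under maximum likelihood, the standard discrepancy function minimized is the following (stated here as general background, not taken from this chapter):

$$
F_{\mathrm{ML}}(\boldsymbol{\theta}) = \ln\lvert\boldsymbol{\Sigma}(\boldsymbol{\theta})\rvert + \operatorname{tr}\!\left(\mathbf{S}\,\boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1}\right) - \ln\lvert\mathbf{S}\rvert - p
$$

The function equals zero when $\boldsymbol{\Sigma}(\boldsymbol{\theta}) = \mathbf{S}$ exactly, since the trace term then equals $p$. The variance-based PLS approach does not minimize this function; it forms construct scores as weighted composites of their indicators and estimates the relations by least squares, aiming to maximize the explained variance of the endogenous constructs.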

Traditional first-generation techniques have a number of limitations. First, the statistical tests of the regression coefficients (and procedures such as stepwise regression) make assumptions about the data that may not hold, such as sufficient sample size and multivariate normality. Second, the two-step process of aggregating variables to form variate scores and then testing the relationships among these variates presumes that the relative importance of items in each composite is portable across theoretical contexts, an assumption that may not be valid (Fornell, 1982). In traditional multiple regression and path analysis, scales for the latent variables are created by averaging or summing the observed variables, or from some kind of factor analysis of them; the resulting scores are then imported into a regression (or path) model. Fornell (1987) argued that the assumption that such scores are portable is not tenable. This two-stage analysis can result in invalid estimates, since it assumes that the relationship among the measures of a construct is independent of the theoretical context within which the measures are embedded (Fornell, 1982; Fornell & Yi, 1992; Hirschheim, 1985). Third, all measurement is made with error, and although error may be estimated using methods such as factor analysis, these error estimates do not explicitly figure in regression analysis, nor are they estimated within the context of the theory being tested (Fornell, 1982). Fourth, each first-generation technique can examine only a single relationship at a time, i.e., a single relationship between a dependent variable and an independent variable (Hair et al., 2006). In contrast, SEM can estimate many equations at once, and these equations can be interrelated, meaning that the dependent variable in one equation can be an independent variable in other equations.
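
The consequence of ignoring measurement error in the two-stage approach can be illustrated with a small simulation. The sketch below is a minimal illustration under stated assumptions, not the author's procedure: the data-generating process (one latent predictor with a true standardized path of 0.6, three indicators that each carry independent unit-variance error) and the function name `simulate_two_stage` are hypothetical. Stage one averages the items into a composite score; stage two regresses the outcome on that composite by ordinary least squares.

```python
import random


def simulate_two_stage(n=20000, beta=0.6, n_items=3, item_error_sd=1.0, seed=1):
    """Average noisy items into a composite, then regress the outcome on it.

    Hypothetical data-generating process (an assumption for illustration):
    latent xi ~ N(0, 1); outcome y = beta * xi + disturbance with variance
    1 - beta**2; each indicator is xi plus independent N(0, item_error_sd**2).
    """
    rng = random.Random(seed)
    dist_sd = (1 - beta ** 2) ** 0.5
    composites, ys = [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        items = [xi + rng.gauss(0.0, item_error_sd) for _ in range(n_items)]
        composites.append(sum(items) / n_items)  # stage 1: averaged scale score
        ys.append(beta * xi + rng.gauss(0.0, dist_sd))
    # Stage 2: ordinary least squares slope of y on the composite score
    mx = sum(composites) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(composites, ys))
    sxx = sum((x - mx) ** 2 for x in composites)
    return sxy / sxx


slope = simulate_two_stage()
# Composite reliability is 1 / (1 + item_error_sd**2 / n_items) = 0.75, so the
# expected two-stage slope is about 0.6 * 0.75 = 0.45 rather than the true 0.6.
print(round(slope, 3))
```

Because the averaged composite still contains error variance, the estimated slope should come out near 0.45 instead of the true 0.6; the classical attenuation factor equals the composite's reliability. A covariance-based SEM that treats the items as reflective indicators of a latent variable estimates this error variance as part of the same model.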

Both covariance-based and variance-based approaches, such as LISREL and PLS, allow for causal interpretations of the relations between latent variables and their indicators, as well as of the relations among the latent variables. Both techniques also allow constructs to be measured with multiple indicators, thus minimizing biases imposed by measurement error (Herting, 1985; Kenny, 1979). SEM has the added benefit of being able to model both direct and indirect relationships among constructs (or latent variables) to determine the relative importance of antecedent constructs, making it possible to test complex theoretical models. This is an advantage over traditional path analysis, where indirect effects must be calculated by hand (Barclay, Higgins, & Thompson, 1995). Three types of effects may be distinguished with SEM: direct, indirect, and total effects. A direct effect is the influence of one variable on another that is not mediated by any other variable in the path model. An indirect effect is mediated by at least one intervening variable. One variable’s total effect on another is the sum of its direct effect and its indirect effects (Bollen, 1989).
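
The decomposition of effects can be computed mechanically from a matrix of direct path coefficients, which is the bookkeeping that SEM software performs in place of hand calculation. The sketch below assumes a recursive (acyclic) path model and uses a hypothetical three-variable mediation model X → M → Y with an additional direct path X → Y; the coefficients (0.5, 0.4, 0.3) and the helper names `matmul` and `effect_decomposition` are illustrative assumptions, not estimates from any study.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def effect_decomposition(direct):
    """Return (total, indirect) effect matrices for a recursive path model.

    direct[i][j] is the direct path coefficient from variable j to variable i.
    In a recursive model the matrix is nilpotent, so the series
    B + B^2 + ... terminates; it equals (I - B)^-1 - I.
    """
    n = len(direct)
    total = [row[:] for row in direct]
    power = [row[:] for row in direct]
    for _ in range(n - 1):
        power = matmul(power, direct)  # effects through paths one step longer
        total = [[t + p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    indirect = [[t - d for t, d in zip(t_row, d_row)]
                for t_row, d_row in zip(total, direct)]
    return total, indirect


# Hypothetical mediation model; variable order: X = 0, M = 1, Y = 2
B = [
    [0.0, 0.0, 0.0],  # X is exogenous
    [0.5, 0.0, 0.0],  # X -> M
    [0.3, 0.4, 0.0],  # X -> Y (direct) and M -> Y
]
total, indirect = effect_decomposition(B)
print("X on Y: direct", B[2][0], "indirect", round(indirect[2][0], 3),
      "total", round(total[2][0], 3))
```

For this model the indirect effect of X on Y is 0.5 × 0.4 = 0.2 and the total effect is 0.3 + 0.2 = 0.5, while the effects of X on M and of M on Y are purely direct.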

SEM also provides a means of resolving the thorny problem of multicollinearity (Rigdon, 1998). When multiple questionnaire items are used, they are modeled as measures of the same common factor, and only the factor is used as a (single) structural variable. The principal component, i.e., the factor explaining the most variance, is used as the most reliable and valid observable indicator of each unobservable research construct (latent variable). Thus, all of the multiple measures are included in the model, but only one variable enters the prediction equation. High correlations among the multiple items actually improve the stability of the factor-analytic measurement model.
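
The idea of collapsing several highly correlated items into one component score can be sketched with power iteration on an item correlation matrix. This is a simplified illustration under stated assumptions: the correlation matrix and the respondent's item scores are hypothetical, the helper names are made up for this example, and PLS itself estimates indicator weights through its own iterative algorithm rather than a stand-alone principal components analysis. The leading eigenvector supplies the item weights, and its eigenvalue divided by the number of items gives the share of item variance the component explains.

```python
def first_principal_component(corr, iters=200):
    """Largest eigenvalue and unit eigenvector of a correlation matrix (power iteration)."""
    n = len(corr)
    v = [1.0] * n  # starting vector; must not be orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v' R v gives the eigenvalue for the unit-length vector v
    lam = sum(v[i] * corr[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v


def component_score(z_items, weights):
    """Single score that enters the prediction equation in place of the items."""
    return sum(w * z for w, z in zip(weights, z_items))


# Hypothetical correlations among three questionnaire items for one construct
items_corr = [
    [1.00, 0.80, 0.70],
    [0.80, 1.00, 0.75],
    [0.70, 0.75, 1.00],
]
lam, weights = first_principal_component(items_corr)
share = lam / len(items_corr)  # trace of a correlation matrix = number of items
score = component_score([1.2, 0.9, 1.1], weights)  # hypothetical standardized responses
print(round(lam, 3), round(share, 3), [round(w, 3) for w in weights], round(score, 3))
```

With these hypothetical correlations the first component explains roughly five-sixths of the item variance, and all three weights are positive and nearly equal, so the single score behaves much like a weighted average of the items and replaces three collinear predictors in the prediction equation.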
