Simpson's paradox is a phenomenon arising from multivariate statistical analyses that often leads to paradoxical conclusions in the field of e-collaboration as well as many other fields where multivariate methods are employed. This work derives a general inequality for the occurrence of Simpson's paradox in path models with or without latent variables. The inequality is then used to estimate the probability that Simpson's paradox would occur at random in path models with two predictors and one criterion variable. This probability is found to be approximately 12.8 percent, slightly higher than 1 occurrence per 8 path models. This estimate suggests that Simpson's paradox is likely to occur in empirical studies, in the field of e-collaboration and other fields, frequently enough to be a source of concern.

## 2. A Path Model Illustration Of Simpson’s Paradox

Let us assume that we collected data from 300 firms about two variables: degree of collaborative management (*X*) and firm success (*Z*). The variable degree of collaborative management (*X*) measures the degree to which managers and employees collaborate to continuously improve their firms’ productivity and the quality of their firms’ products. The variable firm success (*Z*) measures the profitability of each firm.

Figure 1 shows a simple path model relating these two variables. Because this path model contains only two variables, the path coefficient equals the correlation: *p*_{ZX} = *r*_{ZX} = 0.5, where *p*_{ZX} and *r*_{ZX} denote the path coefficient and the correlation between the two variables, respectively.

Figure 2 shows a slightly more complex path model with an additional variable pointing at *Z*: degree of e-collaboration technology use (*Y*). This new variable measures the degree to which an e-collaboration technology is used. The technology, which facilitates collaborative management, is available in all firms studied. Because of this, firms where the degree of collaborative management (*X*) is high also tend to use the e-collaboration technology intensely, and thus present high degrees of e-collaboration technology use (*Y*); hence the link *X*→*Y* in the model.

*Figure 2.* Three-variable path model

In this example, the addition of the new variable led the path coefficient *p*_{ZX} for the link between the variables degree of collaborative management (*X*) and firm success (*Z*) to assume a negative value (-0.2), in contrast with the positive correlation *r*_{ZX} (0.5) between the same variables. This sign reversal characterizes what is known as Simpson’s paradox in path models.
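The sign reversal can be reproduced numerically. The sketch below is an illustration, not the author's procedure: it uses the standard least-squares solution for standardized variables to compute the path coefficients of a model with two predictors from the three pairwise correlations. Only *r*_{ZX} = 0.5 comes from the example above. The values `r_xy = 0.8` and `r_zy = 0.715` are hypothetical, chosen so that *p*_{ZX} comes out at -0.2, and the function names are likewise assumptions made for this illustration.

```python
import numpy as np


def path_coefficients(r_zx, r_zy, r_xy):
    """Standardized path coefficients (p_ZX, p_ZY) for Z regressed on X and Y."""
    # Correlation matrix among the two predictors X and Y.
    predictor_corr = np.array([[1.0, r_xy],
                               [r_xy, 1.0]])
    # Correlations of each predictor with the criterion variable Z.
    criterion_corr = np.array([r_zx, r_zy])
    # Normal equations for standardized variables: predictor_corr @ p = criterion_corr.
    p_zx, p_zy = np.linalg.solve(predictor_corr, criterion_corr)
    return float(p_zx), float(p_zy)


def is_simpsons_paradox(correlation, path_coefficient):
    """True when the path coefficient and the correlation have opposite signs."""
    return correlation * path_coefficient < 0


r_zx = 0.5    # correlation between X and Z, taken from the example
r_xy = 0.8    # HYPOTHETICAL correlation between X and Y (not given in the text)
r_zy = 0.715  # HYPOTHETICAL correlation between Y and Z (not given in the text)

p_zx, p_zy = path_coefficients(r_zx, r_zy, r_xy)
print(f"p_ZX = {p_zx:.3f}, p_ZY = {p_zy:.3f}")
print("Simpson's paradox on X->Z:", is_simpsons_paradox(r_zx, p_zx))
```

With these assumed correlations, *X* and *Z* are positively correlated, yet the path coefficient for *X*→*Z* is negative once *Y* is included. Other combinations of *r*_{XY} and *r*_{ZY} can give the same path coefficient, so the values above are only one possible configuration.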

## 3. The Likelihood Of Simpson’s Paradox In Contingency Tables

Simpson’s paradox is generally perceived as a problematic phenomenon, since it leads to paradoxical conclusions based on empirical research (Pearl, 2009; Wagner, 1982). In the three-variable path model illustration above, the results would lead many researchers to believe that the association between degree of collaborative management (*X*) and firm success (*Z*) is negative.