The term “internationalization” refers to the international expansion of firms through mechanisms such as exports, strategic alliances and foreign direct investment. Internationalization has recently received increasing attention, mainly because it lies at the very heart of the globalization phenomenon. Through internationalization, firms strive to improve their profitability, encountering new opportunities but also facing new risks. Research in this field focuses mainly on the determinants of a firm's performance, in order to identify the best entry mode for a foreign market, the most promising locations and the international factors that explain an international firm's performance. In this way, scholars try to identify the combination of firm resources and location that maximizes profit and controls risk (for a review of the studies on the impact of internationalization on performance see Contractor et al., 2003). The availability of large databases on firms' international expansion raises the question of which data mining tools can be applied to define the best possible internationalization strategies. The aim of this paper is to discuss the most important statistical techniques that have been used to show the relationship between firm performance and its determinants. These methods belong to the family of multivariate statistical methods and can be grouped into Regression Models and Causal Models. The former are more common and easier to interpret, but they can only describe direct relationships among variables; the latter have been used less frequently, but their complexity allows us to identify important causal structures that would otherwise remain hidden.
We now describe the most basic approaches used to study internationalization. Our aim is to give an overview of the statistical models that are most frequently applied in International Finance papers because they are easy to implement and their results are straightforward to interpret. In this section we also introduce the notation that will be used in the following sections.
In the class of Regression Models, the most common technique used to study internationalization is Multiple Linear Regression. It is used to model the relationship between a continuous response and two or more predictors.
Suppose we have a database of $N$ observations, representing the enterprises, each recording the response variable (firm performance) and the covariates (firm characteristics). If we name the dependent variable $Y_i$, with $i = 1, \ldots, N$, this is a linear function of $H$ predictors $x_1, \ldots, x_H$, taking values $x_{i1}, \ldots, x_{iH}$ for the $i$-th unit. Therefore, we can express our model in the following way:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_H x_{iH} + \varepsilon_i, \tag{1}$$
where $\beta_0$ is the intercept, $\beta_1, \ldots, \beta_H$ are the regression coefficients and $\varepsilon_i$ is the error term, which we assume to be normally distributed with mean zero and variance $\sigma^2$ (Kutner et al., 2004).
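As an illustration of model (1), the following minimal sketch estimates the coefficients by ordinary least squares on synthetic data. It is not taken from any of the cited studies: the function names `solve_linear_system` and `fit_ols`, the simulated covariates and the "true" coefficient values are all assumptions made for the example, and the model is assumed to include an intercept. Only the Python standard library is used; in applied work a statistical package would normally be preferred.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(x_rows, y):
    """Return least-squares estimates [beta_0, beta_1, ..., beta_H]."""
    # A leading 1 in each row carries the (assumed) intercept beta_0.
    design = [[1.0] + list(row) for row in x_rows]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[j] * r[l] for r in design) for l in range(k)] for j in range(k)]
    xty = [sum(r[j] * y_i for r, y_i in zip(design, y)) for j in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic example: N = 500 hypothetical firms, H = 2 covariates.
rng = random.Random(0)
true_beta = [1.0, 2.0, -0.5]  # illustrative values, not from the literature
x_rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
y = [
    true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2 + rng.gauss(0, 0.3)
    for x1, x2 in x_rows
]

beta_hat = fit_ols(x_rows, y)
print("estimated coefficients:", [round(b, 3) for b in beta_hat])
```

With normally distributed errors, as assumed in (1), these least-squares estimates coincide with the maximum likelihood estimates of the coefficients.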
Hitt, Hoskisson and Kim (1997), for example, applied the technique shown above to understand the influence of international diversification on firm performance, allowing not only for linear, but also for curvilinear and interaction effects of the covariates.
When our dependent variable is discrete, the most appropriate regression model is the Logistic Regression (Giudici, 2003).
The simplest case of Logistic Regression is when the response variable $Y_i$ is binary, so that it takes only the two values 1 and 0, with probabilities $p_i$ and $1 - p_i$, respectively.
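We can describe the Logistic Regression model for the $i$-th unit of interest through the logit of the probability $p_i$, expressed as a linear function of the predictors:
$$\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i} = \mathbf{x}_i^{\top}\boldsymbol{\beta}, \tag{2}$$
where $\mathbf{x}_i$ is the vector of covariates and $\boldsymbol{\beta}$ is the vector of regression coefficients.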
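The logit model (2) can be sketched in the same spirit. The example below fits a binary Logistic Regression by plain gradient ascent on the mean log-likelihood, using synthetic data. Several things here are assumptions made for illustration only: the names `sigmoid` and `fit_logistic`, the learning rate and iteration count, the simulated coefficients, the convention that $p_i$ denotes the probability that $Y_i = 1$, and the inclusion of an intercept as a leading 1 in the covariate vector. Statistical packages typically use Newton-type algorithms instead of gradient ascent.

```python
import math
import random


def sigmoid(z):
    """Inverse of the logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x_rows, y, learning_rate=1.0, n_iter=300):
    """Estimate beta in logit(p_i) = x_i' beta by gradient ascent."""
    # A leading 1 in each row carries the (assumed) intercept.
    design = [[1.0] + list(row) for row in x_rows]
    n, k = len(design), len(design[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        # Gradient of the log-likelihood: sum_i (y_i - p_i) * x_i.
        grad = [0.0] * k
        for row, y_i in zip(design, y):
            p_i = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(k):
                grad[j] += (y_i - p_i) * row[j]
        # Step along the average gradient.
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


# Synthetic example: 1000 hypothetical firms, two covariates, binary outcome.
rng = random.Random(1)
true_beta = [-0.5, 1.0, -1.5]  # illustrative values, not from the literature
x_rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
y = []
for x1, x2 in x_rows:
    p = sigmoid(true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2)
    y.append(1 if rng.random() < p else 0)

beta_hat = fit_logistic(x_rows, y)
print("estimated coefficients:", [round(b, 3) for b in beta_hat])
```

Each estimated coefficient is read on the log-odds scale: a one-unit increase in a covariate changes the logit of $p_i$ by the corresponding element of $\boldsymbol{\beta}$, holding the other covariates fixed.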