Panel data analysis is a type of regression analysis that combines time-series data and cross-sectional (spatial) data. The behavior of groups, for example, enterprises or communities, is thus analyzed over time. Panel data allows researchers to control for variables that cannot be observed or measured, as well as variables that evolve over time but not across groups or communities. In this chapter, two techniques used in panel data analysis are explored: fixed effects (FE) and random effects (RE). First, theoretical concepts of panel data are presented. Then, a case study illustrates the use of this type of regression. The panel data analysis is performed in the R language, following a step-by-step approach.
Introduction
In statistical data analysis, panel data regression methods are commonly used when a dataset contains variables observed over time. Panel data, also known as longitudinal or cross-sectional time-series data, follows a set of entities (i=1,…,N), ranging from companies or countries to individuals, over several time periods (t=1,…,T). From the perspective of data structure, spatial panel data models are the combination of conventional cross-sectional and time series data models, as represented in Figure 1 (Zhou & Yamaguchi, 2018):
Figure 1. Structure of panel data models
Panel data research provides a means to control for underlying variables that are not observed or measured. It therefore accounts for individual characteristics. For example, when studying the evolution of several communities or groups in social media over time, it can capture differences in behavior across communities as well as variables that change over time but not across communities (e.g., global rules, agreements between communities, or social media platform rules).
This chapter focuses on two techniques used to analyze panel data:
- Fixed effects
- Random effects
The chapter begins by introducing the reader to the background and state of the art of panel data. The primary focus of the chapter is then the case study data and the results obtained. The results start with the calibration of the linear regression model. The authors then present and explain the case study results for four specifications (one-way or two-way, fixed or random effects) and compare the final results.
Background
Historically, econometric and statistical models have been developed using cross-sectional or time-series data (Washington, Karlaftis, & Mannering, 2003). In several cases, however, the available data consists of cross-sections of individuals (or other observational units such as firms, geographic entities, and so on) observed over time. Data that combines cross-sectional and time-series characteristics is called panel data, pooled data, or longitudinal data (Dougherty, 2006).
Panel data can be used to predict the evolution of a dependent variable from other variables that are measured across distinct entities (cross-sectional) and time intervals (time series). It thus allows researchers to construct and test realistic behavioral models that cannot be identified using only cross-sectional or only time-series data. Formally, panel data has the following form (Kunst, 2011):
$$ X_{it}, \quad i = 1, \dots, N, \quad t = 1, \dots, T. $$
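As an illustration of this indexing, the following minimal sketch builds a small balanced panel in base R. The sizes (N = 3, T = 4), the column names, and the simulated values are assumptions made only for this example; they do not come from the case study.

```r
# Hypothetical balanced panel: N = 3 entities observed over T = 4 periods.
N <- 3
T_periods <- 4

# Long format: one row per (i, t) pair, ordered by entity and then by period.
panel <- expand.grid(t = 1:T_periods, i = 1:N)
panel <- panel[, c("i", "t")]

set.seed(1)
panel$x <- rnorm(nrow(panel))  # simulated scalar X_it

# Rectangular view: rows are entities i, columns are periods t.
wide <- matrix(panel$x, nrow = N, ncol = T_periods, byrow = TRUE,
               dimnames = list(paste0("i", 1:N), paste0("t", 1:T_periods)))

print(head(panel))
print(dim(wide))
```

The long format (one row per entity and period) is the layout that regression functions in R usually expect, while the wide matrix mirrors the rectangular view of the same data.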
Panel data can be represented in a rectangular form, like a board. Dimension i is called the “individual dimension,” and t is the time dimension. X can be a scalar (real) variable or a vector-valued variable. A general panel data regression model is written as (Hauser, 2013)

$$ y_{it} = \beta_0 + x_{it}'\beta + \epsilon_{it}, $$

where:
$y_{it}$ is the dependent variable,
$x_{it}$ is a K-dimensional vector of explanatory variables, without a constant term,
$\beta_0$, the intercept, is independent of i and t,
$\beta$, a (K×1) vector of slopes, is independent of i and t,
$\epsilon_{it}$, the error, varies over i and t.
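To make the roles of the intercept, the slopes, and the error concrete, the sketch below simulates a panel in which the intercept and slopes are the same for every i and t, and then estimates them with a pooled ordinary least squares fit using base R's `lm()`. The parameter values, sample sizes, and variable names are hypothetical and chosen only for illustration; this is not the specification used in the case study.

```r
set.seed(42)
N <- 50          # hypothetical number of entities
T_periods <- 6   # hypothetical number of periods

sim <- data.frame(i = rep(1:N, each = T_periods),
                  t = rep(1:T_periods, times = N))

# K = 2 explanatory variables that vary over both i and t.
sim$x1 <- rnorm(nrow(sim))
sim$x2 <- rnorm(nrow(sim))

# Intercept and slopes are common to all i and t (assumed values).
beta0 <- 1
beta <- c(2, -0.5)

# The error term varies over i and t.
sim$y <- beta0 + beta[1] * sim$x1 + beta[2] * sim$x2 +
  rnorm(nrow(sim), sd = 0.5)

# Pooled OLS: ignores the panel structure and stacks all N * T observations.
pooled <- lm(y ~ x1 + x2, data = sim)
print(coef(pooled))
```

Because the simulated data contains no entity-specific effects, the pooled estimates should land close to the assumed values. Fixed and random effects models address the case where that assumption does not hold.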
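Individual characteristics that do not vary over time, $z_i$, may also be included. In this case, the panel data regression model is written as

$$ y_{it} = \beta_0 + x_{it}'\beta + z_i'\gamma + \epsilon_{it}, $$

where $z_i$ is a K-dimensional vector of individual (time-invariant) characteristics and $\gamma$ is the corresponding vector of coefficients.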
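A time-invariant characteristic takes one value per entity, repeated across all of that entity's periods. The following minimal sketch, using simulated data with assumed names and coefficient values, builds such a variable, checks that it has no variation within any entity, and includes it in a pooled `lm()` fit.

```r
set.seed(7)
N <- 40          # hypothetical number of entities
T_periods <- 5   # hypothetical number of periods

d <- data.frame(i = rep(1:N, each = T_periods),
                t = rep(1:T_periods, times = N))

z_by_entity <- rnorm(N)    # one value per entity
d$z <- z_by_entity[d$i]    # repeated over t, so z does not vary over time
d$x <- rnorm(nrow(d))      # varies over both i and t

# Assumed coefficients, for illustration only.
d$y <- 1 + 2 * d$x + 1.5 * d$z + rnorm(nrow(d), sd = 0.5)

# Within each entity, the range of z is zero: it is time-invariant.
within_range <- tapply(d$z, d$i, function(v) diff(range(v)))
print(all(within_range == 0))

fit <- lm(y ~ x + z, data = d)
print(coef(fit))
```

The zero within-entity variation matters later: any estimator that relies only on changes over time within an entity has no information with which to estimate the coefficient of such a variable.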