Multiple Datasets
When a study requires multiple datasets, the preprocessing takes considerable effort, as will be shown here. This is particularly true when multiple years of data are combined and compared, and when multiple types of data are spread across multiple datasets, as is typical of data from the Medical Expenditure Panel Survey. However, most studies do not indicate how the data merges were performed (Harman, Edlund, & Fortney, 2009; Kamble & Bharmal, 2009). For example, a recent study claimed to be able to compare a patient with diabetes to an identical patient without diabetes. However, co-morbidities are more likely with diabetes, so there is a conditional probability factor that should also be taken into consideration. Again, no real information is provided to show that the preprocessing was performed correctly (Balu, 2007). In this particular study, the total expenditure is the sum of the expenditures contained within each of the different datasets, so the results depend directly on the quality of the preprocessing.
There are different types of data merges, and you must be careful to use the correct one. There are two basic types. One-to-one merging combines observations from two data sets into a single observation in a new data set, pairing them by their position in each data set. Match merging also combines observations from two data sets into a single observation in a new data set, but pairs them according to the values of a variable that you specify. A match merge can be complete, or it can be what is called an inner or outer join. When using this type of join, great care must be exercised, or the merged dataset can end up with too many observations (duplicates).
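The difference between these merge types can be sketched in a few lines. The example below uses Python with pandas rather than any tool named in the text, and the tables and column names (`person_id`, `age`, `expenditure`) are hypothetical illustrations, not actual survey variables. It is a minimal sketch of the idea, not a recommended preprocessing pipeline.

```python
import pandas as pd

# Hypothetical person-level and event-level tables (illustrative names only).
persons = pd.DataFrame({"person_id": [1, 2, 3], "age": [54, 61, 47]})
visits = pd.DataFrame({
    "person_id": [1, 1, 2, 4],
    "expenditure": [120.0, 80.0, 300.0, 50.0],
})

# One-to-one merge: rows are paired by position, with no key variable.
ids = pd.DataFrame({"person_id": [1, 2, 3]})
ages = pd.DataFrame({"age": [54, 61, 47]})
one_to_one = pd.concat([ids, ages], axis=1)

# Match merge as an inner join: keep only keys present in both tables.
inner = persons.merge(visits, on="person_id", how="inner")

# Match merge as an outer join: keep every key from either table.
# indicator=True adds a _merge column showing where each row came from.
outer = persons.merge(visits, on="person_id", how="outer", indicator=True)

print(len(inner))  # 3: person 1 appears twice because of two visits
print(len(outer))  # 5: adds person 3 (no visits) and person 4 (no person record)
```

Note that the inner join repeats the person-level row for person 1 once per visit. Any person-level variable summed over the merged table would therefore be counted twice for that person.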
One-to-one merging is the safest. However, it is not always practical given the nature of the data, so we will discuss match merging in detail. According to a recent study, many-to-many merging is still not well understood or well managed (Asiala & Gober, 2005). Great care must be taken when merging, or the results will lead to erroneous conclusions (Paiba et al., 2007).
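To illustrate how a many-to-many merge can distort a result, the following sketch again uses Python with pandas and hypothetical tables (`office`, `rx`) and column names. It assumes two event-level files that each hold several records per person, and compares a naive key merge with one that first reduces each file to one row per person. The `validate` argument of `DataFrame.merge` raises an error when the keys are not unique in the way the caller expects.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical event-level files: several records per person in each.
office = pd.DataFrame({"person_id": [1, 1, 2], "office_exp": [100.0, 50.0, 75.0]})
rx = pd.DataFrame({"person_id": [1, 1, 2], "rx_exp": [20.0, 30.0, 10.0]})

# Naive many-to-many merge: person 1 yields 2 x 2 = 4 rows, inflating totals.
naive = office.merge(rx, on="person_id")
naive_total = naive["office_exp"].sum() + naive["rx_exp"].sum()

# Asking pandas to check key uniqueness exposes the problem.
try:
    office.merge(rx, on="person_id", validate="one_to_one")
    caught = False
except MergeError:
    caught = True

# Safer: aggregate each file to one row per person, then merge one-to-one.
office_pp = office.groupby("person_id", as_index=False)["office_exp"].sum()
rx_pp = rx.groupby("person_id", as_index=False)["rx_exp"].sum()
safe = office_pp.merge(rx_pp, on="person_id", how="outer", validate="one_to_one")
safe["total_exp"] = safe[["office_exp", "rx_exp"]].fillna(0).sum(axis=1)

print(naive_total)              # 485.0, overstated by the duplicated rows
print(safe["total_exp"].sum())  # 285.0, the true sum of both files
```

In this toy case the naive merge overstates total expenditure by 200, purely as an artifact of the join.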