Introduction
As a subfield of data mining (Fayyad & Uthurusamy, 1996; Rajagopalan & Krovi, 2002), spatio-temporal data mining studies the discovery of interesting, implicit relationships and characteristics from spatio-temporal data (Koperski, Han, & Adhikary, 1998; Yao, 2003). This field has attracted significant research interest in recent years, driven by the increasing availability of large datasets containing important spatial and temporal elements across a wide spectrum of application domains. Examples of such domains include public health (disease case reports), public safety (crime case reports), financial fraud detection (financial transaction tracking data), transportation (data from Global Positioning Systems (GPS)), and product lifecycle management (data generated by Radio Frequency Identification (RFID) devices) (P. Yan & Zeng, 2008a, 2008b; Zeng, Ma, Chen, & Chang, 2009). Actionable knowledge discovered from data with spatial and temporal dimensions can provide decision makers with valuable insights and support in their decision-making processes.
Current practices of spatio-temporal data analysis center largely on the identification of “hotspots”, areas that exhibit exceptionally high or low measures on some characteristic, and on the timely discovery of significant changes in geographic areas (Chang, Zeng, & Chen, 2005; Kulldorff, 2001). While such analyses focus on a single type of event, in this paper we study spatio-temporal relationships among multiple event types. For evaluation purposes, we focus on two case studies in the domains of infectious disease informatics (Lu, Zeng, & Chen, 2010) and crime analysis (Chen, et al., 2003; Zhao, et al., 2006). Our methods, however, are general and can be used to solve business problems, such as transport demand modeling (D. Wang & Cheng, 2001) and financial crime analysis (Masciandaro, 2004), where understanding the spatio-temporal relationships among multiple event types may bear important managerial implications.
Assessing and analyzing spatio-temporal cross-correlations among multiple data streams can unveil the relationships among the underlying event types. Correlation analysis has been applied mainly in such fields as forestry (Stoyan & Penttinen, 2000), acoustics (Tichy, 1973; Veit, 1976), entomology (Cappaert, Drummond, & Logan, 1991), and animal science (Lean, et al., 1992; Procknor, Dachir, Owens, Little, & Harms, 1986), where the analyses have focused on either time series or spatial data. However, in applications such as infectious disease informatics and crime analysis where both spatial and temporal dimensions are essential, considering only one of the dimensions at a time can be problematic. Important correlations may be missed due to aggregate effects on the overlooked dimension. Spurious, misleading correlations may be signaled if the directionality of time is ignored.
One of the widely adopted measures of correlation is Ripley’s $K(r)$ function, which mainly focuses on spatial data (B. D. Ripley, 1976; B. D. Ripley, 1981). The parameter $r$ characterizes the spatial distance scale under consideration. In order to analyze datasets with both spatial and temporal dimensions, we have extended Ripley’s $K(r)$ with an additional temporal parameter $t$. The classical $K(r)$ then becomes a special case of this new measure $K(r, t)$, other than a scaling difference, when the temporal parameter $t$ equals the entire time span under investigation and a two-tail time window is employed.
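The special-case relationship described above can be illustrated with a minimal sketch. It is not the estimator developed in this paper. It assumes a naive single-type Ripley-style estimate without edge correction, normalized as area times pair count divided by n squared, and it treats the temporal extension simply as an extra time-window condition on each pair. The names `pair_count`, `k_estimate`, `r` (spatial distance scale), `t` (temporal window), and `two_tailed` are hypothetical, as are the sample points and the study area.

```python
import math

def pair_count(points, r, t=None, two_tailed=True):
    """Count ordered pairs (i, j), i != j, that lie within spatial distance r
    and, when t is given, within the temporal window t.

    points: list of (x, y, time) tuples (hypothetical layout).
    two_tailed=True  accepts |time_j - time_i| <= t.
    two_tailed=False accepts only 0 <= time_j - time_i <= t (j not before i).
    """
    count = 0
    for i, (xi, yi, ti) in enumerate(points):
        for j, (xj, yj, tj) in enumerate(points):
            if i == j:
                continue
            if math.hypot(xi - xj, yi - yj) > r:
                continue
            if t is not None:
                dt = tj - ti
                in_window = abs(dt) <= t if two_tailed else 0 <= dt <= t
                if not in_window:
                    continue
            count += 1
    return count

def k_estimate(points, area, r, t=None, two_tailed=True):
    """Naive estimate: area * pair_count / n^2 (assumed normalization,
    no edge correction)."""
    n = len(points)
    return area * pair_count(points, r, t, two_tailed) / (n * n)

# Hypothetical events: (x, y, time). Times span 0 to 5.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 5.0), (5.0, 5.0, 2.0)]
area = 36.0  # assumed 6 x 6 study window

spatial_only = k_estimate(points, area, r=1.5)
full_span_two_tailed = k_estimate(points, area, r=1.5, t=5.0)
short_window = k_estimate(points, area, r=1.5, t=2.0)
one_tailed = k_estimate(points, area, r=1.5, t=2.0, two_tailed=False)
```

In this sketch, setting the temporal window to the full time span with a two-tailed window admits every spatially close pair, so the estimate coincides with the purely spatial one. A shorter or one-tailed window excludes pairs that are close in space but distant or wrongly ordered in time. Because the same normalization is assumed in both cases, the scaling difference noted above does not appear here.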