Introduction
Social media is becoming an increasingly prevalent part of everyday life, driven by advances in technology and virtualization. With Internet access, cameras, and real-time message boards at our fingertips, many events are now accompanied by live, parallel reporting and witness testimony. These reports can be useful to responders and can help raise awareness among the populace, especially in emergency situations (Meier, 2015; Watson, Finn, and Wadhwa, 2017). Despite the potential benefits, major response groups and organizations under-utilize these sources of information because of the many administrative and technical challenges involved (Meier, 2013). These challenges include reliability issues associated with public, unstructured data, as well as information overload, since millions of messages are posted during a crisis (Bullock, Haddow, and Coppola, 2012).
Many recent studies propose using machine learning techniques to analyze social media data automatically and thereby reduce information overload (Imran et al., 2015; Beigi et al., 2016). Machine learning techniques can help transform raw data into usable information by labeling, prioritizing, and structuring it, making it beneficial to responders and to the populace in times of need (Qadir et al., 2016). However, supervised learning algorithms rely on labeled training data to build predictive models. Accurately labeling data for an emerging crisis is both time-consuming and expensive; hence, it is not reasonable to assume that labeled data for a current crisis will be promptly available for analysis. This lack of labeled data for emerging crisis events prevents the use of supervised learning techniques.
To address this challenge, several works have proposed using labeled data from prior “source” crises to learn supervised classifiers for a “target” crisis (Verma et al., 2011; Imran et al., 2013; Imran, Mitra, and Srivastava, 2016). However, because crises differ in location, nature, season, etc. (Palen and Anderson, 2016), the source crisis might not accurately represent the characteristics of the target crisis (Qadir et al., 2016; Imran et al., 2015). Domain adaptation techniques (Pan and Yang, 2010; Jiang, 2008) are designed to circumvent the lack of labeled target data by using unlabeled target data as guideposts for the readily available labeled source data. Studies in the emergency space have shown that domain adaptation techniques, which use unlabeled target data and labeled source data together, significantly improve classification results compared to supervised techniques that use labeled source data alone (Li et al., 2015, 2017). Unlabeled data from the target crisis becomes more abundant as the event unfolds, which enables the use of domain adaptation techniques during emerging or ongoing crisis events.
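To make the idea of combining labeled source data with unlabeled target data concrete, the following is a minimal sketch of self-training, one simple form of domain adaptation. It is an illustration under stated assumptions, not necessarily the method used in the studies cited above: the Naive Bayes classifier, the function names (`train_nb`, `predict_proba`, `self_train`), the confidence threshold, and the toy messages are all hypothetical choices made for this example.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    class_count, word_counts, vocab = Counter(), {}, set()
    for tokens, label in zip(docs, labels):
        class_count[label] += 1
        counts = word_counts.setdefault(label, Counter())
        counts.update(tokens)
        vocab.update(tokens)
    return class_count, word_counts, vocab

def predict_proba(model, tokens):
    """Return class probabilities; tokens outside the vocabulary are ignored."""
    class_count, word_counts, vocab = model
    total = sum(class_count.values())
    log_scores = {}
    for label, n in class_count.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # Laplace smoothing
        score = math.log(n / total)
        for t in tokens:
            if t in vocab:
                score += math.log((counts[t] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}

def self_train(source_docs, source_labels, target_docs,
               threshold=0.75, max_rounds=5):
    """Add confidently pseudo-labeled target documents to the training set."""
    docs, labels = list(source_docs), list(source_labels)
    remaining = list(target_docs)
    model = train_nb(docs, labels)
    for _ in range(max_rounds):
        confident, uncertain = [], []
        for tokens in remaining:
            probs = predict_proba(model, tokens)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                confident.append((tokens, best))
            else:
                uncertain.append(tokens)
        if not confident:
            break  # nothing new to learn from
        for tokens, label in confident:
            docs.append(tokens)
            labels.append(label)
        remaining = uncertain
        model = train_nb(docs, labels)
    return model

# Toy data (hypothetical): labeled flood messages, unlabeled earthquake messages.
source_docs = [["flood", "damage", "help"], ["water", "rescue", "help"],
               ["concert", "tickets", "fun"], ["lunch", "fun", "weekend"]]
source_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
target_docs = [["earthquake", "damage", "help"],
               ["earthquake", "rescue", "collapsed"],
               ["weekend", "fun", "movie"]]

source_only = train_nb(source_docs, source_labels)
adapted = self_train(source_docs, source_labels, target_docs)
query = ["earthquake", "collapsed"]  # contains only target-specific words
print(predict_proba(source_only, query))
print(predict_proba(adapted, query))
```

In this toy setting, the source-only model has never seen the words in `query` and can fall back only on its class priors, whereas the self-trained model has absorbed target-specific vocabulary from the pseudo-labeled earthquake messages. Real systems would need more careful handling of pseudo-label noise than a fixed threshold provides.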