Introduction
Tropical cyclones (TCs) are severe thunderstorm systems that rotate around a closed surface-level low-pressure center and vary in strength and potential destructive power according to their maximum sustained wind speeds (Nakamura, Lall, Kushnir, & Rajagopalan, 2015; NHC, n.d.a; Simpson & Saffir, 1974). Every year, TCs threaten to make landfall on the world’s coastlines. The more intense the TC, the greater its potential destructive power, which can lead to extensive fatalities and property damage (McAdie & Lawrence, 2000; Rappaport et al., 2009; Sheets, 1990; Zhao, Lin, Lee, Sun, & Zhang, 2016). On average, a landfalling TC causes $1 billion in damages (National Centers for Environmental Information, 2016). To mitigate these fatalities and this property damage, TC forecasters aim to provide early and relevant warnings on potentially landfalling TCs (Comes et al., 2015; Gall, Franklin, Marks, Rappaport, & Toepfer, 2013; Wang et al., 2015). Early and relevant warnings give people time to prepare and evacuate.
A primary goal of the Hurricane Forecast Improvement Project (HFIP) is to improve forecast accuracy by 50% by 2019 in order to provide earlier and more relevant warnings (Gall et al., 2013). There are multiple ways to improve forecast accuracy, and Gall et al. focused primarily on dynamical and ensemble forecasting models. They did not mention novel methods such as predictive data analytics. Therefore, this quantitative study focused on improving forecast accuracy by using the C4.5 decision tree, a predictive data analytics algorithm, applied to the National Hurricane Center’s (NHC) tropical discussions from 2001 to 2015.
The NHC’s tropical discussions contain the explicitly recorded logic and knowledge behind each TC forecaster’s forecasts (Cangialosi, 2016; Rappaport et al., 2009; Williamson et al., 2014). Since 2001, the NHC has been creating five-day TC forecasts (Cangialosi, 2016). From 2001 to 2015, the NHC accrued 5,131 forecasts in the form of tropical discussions, which contain over 1.35 million words (NHC, n.d.a). The size of this text dataset helped categorize this study as a study in big text analytics.
The application of big text analytics to meteorological data deepens the body of knowledge in big data analytics while furthering the field of meteorology by introducing new techniques and procedures (Corrales et al., 2015). This study evaluated the results from both a meteorological and a data analytics perspective to verify their importance and accuracy (Garcia, Ferraz, & Vivacqua, 2009).
Thus, this study posed the following research question: Using the C4.5 algorithm on the five-day tropical discussions from 2001 to 2015, which weather pattern components can improve the Atlantic TC forecast accuracy? For this study, the null hypothesis (H0) is non-directional, whereas the alternative hypothesis (H1) is directional:
H0: There are no significant differences in the C4.5 algorithm-derived weather pattern components that can distinguish a successful TC forecast from an unsuccessful one.
H1: There are significant differences in the C4.5 algorithm-derived weather pattern components that can distinguish a successful TC forecast from an unsuccessful one.
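To make the research question concrete, the following is a minimal Python sketch of the gain ratio, the criterion C4.5 uses to choose which attribute to split on. It is an illustration under stated assumptions, not the study's actual pipeline: each discussion is assumed to be reduced to a set of terms, each forecast is assumed to carry a "successful" or "unsuccessful" label, and the toy terms and labels below are invented for demonstration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(docs, labels, term):
    """Gain ratio of the binary attribute 'term appears in the document'."""
    present = [lab for doc, lab in zip(docs, labels) if term in doc]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc]
    if not present or not absent:
        return 0.0  # the term does not split the data at all
    total = len(labels)
    w_present, w_absent = len(present) / total, len(absent) / total
    # Information gain: entropy before the split minus weighted entropy after.
    info_gain = entropy(labels) - (
        w_present * entropy(present) + w_absent * entropy(absent)
    )
    # Split information penalizes attributes that fragment the data.
    split_info = -(w_present * math.log2(w_present)
                   + w_absent * math.log2(w_absent))
    return info_gain / split_info


# Hypothetical toy data (NOT results from the study): each discussion is a
# set of terms, paired with an assumed forecast outcome label.
docs = [
    {"shear", "trough"},
    {"shear", "ridge"},
    {"ridge", "warm"},
    {"warm", "trough"},
]
labels = ["unsuccessful", "unsuccessful", "successful", "successful"]

vocabulary = sorted({term for doc in docs for term in doc})
scores = {term: gain_ratio(docs, labels, term) for term in vocabulary}
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:.3f}")
```

In this sketch, terms with a high gain ratio are the ones a C4.5 tree would tend to place near its root, which is the sense in which tree-derived components could separate successful from unsuccessful forecasts. A full C4.5 implementation also handles continuous attributes, missing values, and pruning, none of which is shown here.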