A Multifaceted Machine Learning Approach to Understand Road Accident Dynamics Using Twitter Data

DOI: 10.4018/978-1-6684-7693-2.ch013

Road accidents, which cause 1.35 million deaths and 50 million injuries annually, are a significant global issue that demands timely detection and prevention. This study reviews existing research on road accident detection using data mining techniques and presents a method for classifying road accident-related tweets using Twitter mining. The authors collected a dataset of road accident-related tweets, then pre-processed and cleaned it using natural language processing. Several machine learning models (SVM, logistic regression, ANN, LSTM with TF-IDF, and LSTM with BERT) were applied to classify tweets into real-time, traffic, and informative categories. The LSTM model with BERT achieved the highest precision and recall, at 0.88 and 0.87, respectively. The findings highlight the potential of Twitter mining for real-time road accident detection. Despite limitations in model accuracy and robustness, this research is a promising starting point for leveraging social media data to enhance road safety.
Chapter Preview


Road transportation is the primary mode of transit for most people, whether on foot, by bicycle, or in a vehicle. Consequently, they may encounter or witness numerous road accidents. A road accident is defined as an incident involving at least one vehicle on a public road that results in injury or death. With 1.35 million fatalities and 50 million injuries annually, road accidents are a significant public health concern (Alomari, Mehmood, & Katib, 2019; Aqib, Mehmood, Alzahrani, & Katib, 2020; WHO, 2021). Road accidents are the leading cause of death for individuals aged 15-30 (S. Wang et al., 2017). Detecting and analysing road accidents and predicting their location and type are crucial for reducing fatalities and other types of damage.

Social media platforms such as YouTube, Facebook, Twitter, LinkedIn, Snapchat, WhatsApp, Pinterest, Nextdoor, TikTok, and Reddit enable users to share and interact with content (Auxier & Anderson, 2021). These platforms generate enormous volumes of user-generated data on events occurring in real time (Ali et al., 2021; Alomari et al., 2019). Among these platforms, Twitter is particularly well suited for data analysis because of its openly accessible data and large user base. Users post short text messages called “tweets,” which can contain up to 280 characters and are often organised by keywords and hashtags (Samuel, Ali, Rahman, Esawi, & Samuel, 2020). However, data obtained from tweets can be subjective, context-specific, and potentially misleading (Essien, Petrounias, Sampaio, & Sampaio, 2021).

Twitter data mining techniques have been widely used to identify trends, patterns, and events in various fields (Jain & Katkar, 2015). The platform offers API access to developers, albeit with some limitations to protect user privacy (Johnson, 2018). Text mining, a data mining technique that converts unstructured data into structured data, can be applied to Twitter data to reveal hidden relationships and patterns (Li, Xie, Jiang, Zhou, & Huang, 2019; Maghrebi, Abbasi, Rashidi, & Waller, 2015). This process can provide valuable insights for organisations, such as those concerned with road accidents (Giummarra, Beck, & Gabbe, 2021).

Raising awareness of traffic incidents and their potential consequences can improve public safety and reduce deaths and injuries. Identifying the locations where accidents are most likely to happen can help reduce risks and damage. Because individuals share their experiences of and responses to these incidents, Twitter is a rich real-time source of information on traffic accidents (Ali et al., 2021). The large number of tweets about traffic accidents posted daily makes it well suited to this kind of research.

By using Twitter's public API and developer accounts, researchers can access data about traffic accidents, including the location, keywords, owner, and mentions of published tweets. Although this data is unstructured and subjective, text-mining techniques can turn it into a structured dataset with observable patterns and relationships. This method can potentially increase traffic safety by recognising trends and predicting accident locations and types. There is therefore a pressing need for more research on the public health implications of traffic accidents. Structured datasets produced in this way can reveal patterns and links connected to road accidents. The information gathered can be used to raise public awareness, guide public safety initiatives, and eventually lower the number of fatalities and injuries from traffic accidents.
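
As an illustration of how unstructured tweet text might be turned into a structured record, the following is a minimal sketch using only the Python standard library. The cleaning rules, the function names `clean_tweet` and `to_record`, and the sample tweet are assumptions made for illustration; the chapter preview does not specify the authors' actual pre-processing steps.

```python
import re

# Hypothetical cleaning patterns; the chapter does not list its own rules.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> list[str]:
    """Lowercase a tweet, strip URLs, mentions and punctuation, return tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the '#'
    text = NON_ALNUM_RE.sub(" ", text)  # drop remaining punctuation and symbols
    return text.split()


def to_record(tweet_id: int, text: str) -> dict:
    """Build one structured row from one raw tweet."""
    tokens = clean_tweet(text)
    return {"id": tweet_id, "tokens": tokens, "n_tokens": len(tokens)}


# Made-up sample tweet, for illustration only.
record = to_record(1, "Major CRASH on M25 near J10! @TrafficUK https://t.co/abc #accident")
print(record["tokens"])
```

Each record produced this way has a fixed set of fields, which is what allows later steps such as feature weighting and classification to treat the tweets as a table rather than as free text.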

Key Terms in this Chapter

Twitter Mining: Twitter mining is the collection and analysis of large amounts of data from the Twitter platform. This data can include tweets, user profiles, and other information related to Twitter activity. Twitter mining typically aims to extract useful insights and information from this data, such as identifying trends, understanding user behaviour, and detecting patterns or anomalies.

Deep Learning: Deep learning is a subfield of machine learning inspired by the structure and function of the human brain, specifically its neural networks. It refers to artificial neural networks (ANN) with multiple layers, that is, architectures with more than one layer between input and output.

Term Frequency-Inverse Document Frequency (TF-IDF): TF-IDF is a numerical statistic that is used to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining.
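
To make the weighting concrete, here is a minimal sketch of one common TF-IDF variant: term frequency is the term's count divided by the document length, and inverse document frequency is the natural logarithm of the number of documents divided by the number of documents containing the term. This particular formula, the function name `tf_idf`, and the toy documents are assumptions for illustration; the chapter preview does not state which variant the authors used.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up tokenised tweets, for illustration only.
docs = [
    ["accident", "on", "highway"],
    ["accident", "traffic", "jam"],
    ["accident", "reported"],
]
weights = tf_idf(docs)
print(weights[2])
```

In this toy corpus "accident" occurs in every document, so its weight is zero, while "reported" occurs in only one document and receives a positive weight. This is the sense in which TF-IDF reflects how important a word is to a particular document.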

Long Short-Term Memory (LSTM): LSTM is a Recurrent Neural Network (RNN) architecture specifically designed to handle sequential data with long-term dependencies. RNNs are neural networks that process sequential data, such as time series or natural language, by passing information from one sequence step to the next through hidden states.
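
The idea of passing information between sequence steps through a hidden state can be sketched with a plain recurrent update. Note that this is a simple RNN with a single scalar state, not an LSTM: it omits the gates and cell state that LSTMs add to handle long-term dependencies. The function name `rnn_forward` and the weight values are hypothetical and chosen only for illustration.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit plain RNN over a sequence and return every hidden state.

    Update rule: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0.
    The weights are made-up constants, not learned parameters.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # new state depends on the old state
        states.append(h)
    return states


states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

Although only the first input is non-zero, the later hidden states remain non-zero because each step receives the previous state. They also shrink at every step, which hints at why plain RNNs struggle to retain information over long sequences.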

Twitter: Twitter is a social media platform where users can post short messages called “tweets,” limited to 280 characters. Users can also interact with other users' tweets by “retweeting” or replying.

Support Vector Machine (SVM): SVM is a supervised learning algorithm that can be used for classification or regression problems. An SVM aims to find the best boundary (or “hyperplane”) that separates the data into different classes. Once the boundary is found, new data can be classified based on which side of the boundary it falls on.
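
The classification step described above can be sketched for the linear case, where the boundary is the set of points satisfying w · x + b = 0 and a new point is labelled by the sign of w · x + b. The weights and bias below are made-up values rather than parameters learned by an SVM, and the function names are hypothetical; the sketch shows only the decision rule, not the training procedure.

```python
def decision_value(w, b, x):
    """Signed score of point x relative to the hyperplane w . x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 for points on or above the hyperplane, -1 for points below it."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical boundary x1 - x2 = 0; a trained SVM would learn w and b from data.
w, b = [1.0, -1.0], 0.0

print(classify(w, b, [2.0, 1.0]))
print(classify(w, b, [1.0, 3.0]))
```

Training an SVM amounts to choosing w and b so that the boundary separates the classes with the widest possible margin. Once those values are fixed, prediction reduces to the sign check shown here.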

Bidirectional Encoder Representations from Transformers (BERT): BERT is a pre-trained language model developed by Google. It is designed to understand the context of a given text by looking at the words before and after it. BERT can be fine-tuned for various natural language processing tasks such as question answering, sentiment analysis, and named entity recognition.
