Multivariate Time Series Forecasting of Rainfall Using Machine Learning

Shilpa Hudnurkar, Vidur Sood, Vedansh Mishra, Manobhav Mehta, Akash Upadhyay, Shilpa Gite, Neela Rayavarapu
DOI: 10.4018/978-1-6684-3981-4.ch007

Abstract

Predicting rainfall is essential for assessing the impact of climatic and hydrological changes over a specific region, anticipating natural disasters, and planning day-to-day life. It is one of the most prominent, complex, and essential tasks in weather forecasting and meteorology. In this chapter, long short-term memory network (LSTM), artificial neural network (ANN), and 1-dimensional convolutional neural network LSTM (1D CNN-LSTM) models are explored for predicting rainfall at multiple lead times. Daily weather parameter data covering more than 15 years were collected for a station in Maharashtra. Rainfall data are classified into three classes: no-rain, light rain, and moderate-to-heavy rain. Principal component analysis (PCA) is used to reduce the input feature dimension. The performance of all the networks is compared in terms of accuracy and F1 score. It is observed that LSTM predicts rainfall with a consistent accuracy of 82% for lead times of 1 to 6 days, while the performance of 1D CNN-LSTM and ANN is comparable to that of LSTM.

Introduction

Time series forecasting has become a significant area of interest as it has gained economic importance. Vast amounts of data are being generated in various domains, and most of it is in the form of time series. The analysis of time series data is important because of the need for forecasting in fields such as weather and stock markets. Traditional linear methods are very difficult to adapt to multivariate, multi-input forecasting problems, which makes them unfit for this kind of time series forecasting (Liu et al., 2019). Rainfall prediction is crucial because the lives and livelihoods of people depend on it. In the agricultural sector, farming activities largely depend on rainfall and weather conditions. Prediction of rainfall is challenging and complex due to its dynamic nature (Srinivas et al., 2013). Large variability in rainfall during the rainy season is also observed (Hrudya et al., 2021). This necessitates rainfall prediction over a small geographical area to increase crop yield and to protect farmers from losses caused by incorrect forecasts (Singh et al., 2021).

In India, the India Meteorological Department (IMD) issues weather forecasts. The prediction models used by IMD for daily rainfall prediction are dynamic. Sikka (2009) described how numerical weather prediction (NWP) models (dynamic models) were developed and how the National Centre for Medium-Range Weather Forecasting (NCMRWF), India, evolved over the years in medium-range prediction (3 to 10 days lead time). NWP models are based on thermodynamic equations that model the current state of the atmosphere (Laing & Evans, 2011). These models require supercomputers to solve the thermodynamic equations at various spatial resolutions and are very complex because they process a huge amount of data.

With advances in technology and with satellites dedicated to weather data and images, a large amount of data on weather variables, at various temporal scales, is being generated, analyzed, and recorded. Machine learning (ML) algorithms have been found suitable for processing high-volume data, detecting patterns in the data, and learning from examples. Supervised ML algorithms require a large amount of data to train the network. With the availability of such data, various ML algorithms such as long short-term memory networks (LSTM), artificial neural networks (ANN), decision trees, and support vector machines (SVM) have been studied over the previous decade to find the most suitable algorithm for rainfall prediction (Brereton & Lloyd, 2010; Rathnayake et al., 2011; Saha et al., 2016; Shrivastava et al., 2012). Neural network models such as LSTM and ANN can efficiently solve problems with multiple input variables (Hudnurkar & Rayavarapu, 2021; Wahyono et al., 2020). For rainfall prediction with ML or deep learning methods, weather variable data must be fed to the model as training data. The data must be preprocessed so that the models train well. The trained network must then be tested on unseen data.

This chapter discusses the development of an intelligent model for rainfall forecasting with multiple lead times and with better accuracy. The significant contributions of this study are as follows:

  1. Development of a robust model that can predict rainfall using multivariate time series data.

  2. A comparative study of three artificial intelligence-based prediction models is carried out.

  3. A model for predicting rainfall accurately for multiple lead times from 1 to 6 days has been developed, trained, and tested.
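To make the idea of forecasting at multiple lead times concrete, the following is a minimal sketch of how daily multivariate data might be arranged into supervised training samples. It is an illustration only and not the chapter's actual preprocessing: the function name `make_windows`, the `window` length, and the toy data are all assumptions introduced here.

```python
def make_windows(features, labels, window, lead):
    """Pair each window of past days with the rainfall class `lead` days ahead.

    features: list of per-day feature lists (one row per day)
    labels:   list of per-day rainfall classes, same length as features
    window:   number of consecutive past days used as input (assumed value)
    lead:     forecast lead time in days (1 = the day right after the window)
    """
    if window < 1 or lead < 1:
        raise ValueError("window and lead must be positive integers")
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")

    X, y = [], []
    # A window covers days [start, start + window); the target is `lead` days
    # after the last day of that window.
    for start in range(len(features) - window - lead + 1):
        end = start + window
        X.append(features[start:end])
        y.append(labels[end + lead - 1])
    return X, y


# Toy data: 10 days, 2 hypothetical weather variables per day.
toy_features = [[day, 10 * day] for day in range(10)]
toy_labels = list(range(10))  # stand-in for per-day rainfall classes

# One dataset per lead time from 1 to 6 days.
datasets = {
    lead: make_windows(toy_features, toy_labels, window=3, lead=lead)
    for lead in range(1, 7)
}
```

Longer lead times yield fewer samples from the same record, because the last `lead` days have no observed target to pair with.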

Key Terms in this Chapter

1-Dimensional Convolutional Neural Network Long Short-Term Memory Network: A long short-term memory network that uses the properties of a convolutional neural network to extract features is called a 1D CNN-LSTM network.

Artificial Neural Network: A network that applies mathematical functions to its inputs and tries to mimic the functioning of the human brain is called an artificial neural network.

Confusion Matrix: A tabular way of presenting the output of a classifier is called a confusion matrix. It enables the user to understand how the classifier performed in terms of various evaluation parameters such as accuracy, precision, and F1 score.
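As an illustration of how accuracy and F1 score can be read off a confusion matrix, here is a small sketch for a three-class problem. The helper names and the example labels are hypothetical, and macro averaging of the per-class F1 scores is an assumption, since the averaging scheme is not stated above.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total if total else 0.0


def macro_f1(cm):
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical labels: 0 = no-rain, 1 = light rain, 2 = moderate-to-heavy rain.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)            # [[3, 1, 0], [0, 1, 1], [0, 0, 2]]
print(accuracy(cm))  # 0.75
print(macro_f1(cm))
```

With imbalanced classes, which is common when most days are dry, accuracy alone can look high even if the rarer classes are predicted poorly, so the F1 score is a useful companion metric.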

Long Short-Term Memory Network: A neural network that can retain past information by using gate-like structures built from functions is called a long short-term memory network.

Principal Component Analysis: When the dataset contains many features, a technique that transforms them into a smaller set of new features (principal components) without much loss of the information contained in the dataset is called principal component analysis.
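A minimal sketch of principal component analysis, assuming NumPy is available, is shown below. It centers the data and uses a singular value decomposition to obtain the components. The function name `pca_reduce` and the synthetic data are assumptions for illustration, and this is not necessarily how PCA was applied in the chapter.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (samples x features) onto its first principal components.

    Returns the projected data, the component directions, and the fraction
    of total variance explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature at zero

    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]

    total = np.sum(S ** 2)
    ratio = S ** 2 / total if total > 0 else np.zeros_like(S)
    return Xc @ components.T, components, ratio[:n_components]


# Synthetic example: the third feature is the sum of the first two, so two
# components are enough to capture essentially all of the variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1]])

Z, components, ratio = pca_reduce(X, n_components=2)
print(Z.shape)      # (50, 2)
print(ratio.sum())
```

When features are measured in different units, as weather variables usually are, they are commonly standardized before PCA so that no single variable dominates the components.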

Epochs: In supervised machine learning, the network is trained by presenting it with examples. To reduce the error between the predicted and observed values, the training data is presented to the network repeatedly. Each complete pass through the training data is called an epoch.

Multiclass Classification: In artificial intelligence, dividing the response variable into more than two classes and using AI techniques to determine which class each example belongs to is called multiclass classification.
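For rainfall, the response variable can be turned into classes by binning the daily amount. The sketch below maps a daily total in millimetres to the three classes named in the abstract. The threshold `light_max_mm=7.5` is a placeholder assumption chosen only for illustration; the chapter's actual class boundaries are not given in this preview.

```python
NO_RAIN, LIGHT_RAIN, MODERATE_TO_HEAVY = 0, 1, 2


def rainfall_class(mm, light_max_mm=7.5):
    """Map a daily rainfall amount (mm) to one of three classes.

    light_max_mm is a hypothetical upper bound for the light-rain class.
    """
    if mm < 0:
        raise ValueError("rainfall amount cannot be negative")
    if mm == 0:
        return NO_RAIN
    if mm <= light_max_mm:
        return LIGHT_RAIN
    return MODERATE_TO_HEAVY


# Hypothetical daily totals in mm.
daily_mm = [0.0, 0.4, 7.5, 7.6, 40.0]
classes = [rainfall_class(v) for v in daily_mm]
print(classes)  # [0, 1, 1, 2, 2]
```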
