Traffic Flows Forecasting Based on Machine Learning

Vladimir Deart, Vladimir Mankov, Irina Krasnova
DOI: 10.4018/IJERTCS.289198

Abstract

The article develops a model that forecasts the characteristics of traffic flows in real time, based on classifying applications with machine learning methods, in order to ensure quality of service. The model is shown to forecast the mean rate and the packet arrival frequency of the entire flow, separately for each class. The prediction uses information about previous flows of the same class and the first 15 packets of the active flow. With this approach, the Random Forest Regression method reduces the prediction error by a factor of approximately 1.5 compared with the standard mean estimate computed from transmitted packets at the switch interface.

1. Introduction

Telecommunications network traffic grows rapidly every year, along with the number of users and services. New types of applications keep appearing, each of which requires an appropriate level of quality of service. There are several main categories of services, such as voice, video, and data, with many extensions (audio and video conferencing, IPTV, etc.), for which appropriate traffic management policies are provided. However, the network operator does not always know the nature and category of incoming traffic, which complicates managing network resources and dynamically allocating bandwidth. Traffic flows that the network cannot identify or assign to a category are processed according to the Best Effort principle.

The first approaches to defining traffic classes relied on a list of well-known TCP and UDP ports, but once applications began to use dynamically changing ports, this method could no longer be applied.

The well-known DPI (Deep Packet Inspection) technology performs a “deep” analysis of packet headers at the upper levels of the OSI model. Nevertheless, a DPI system cannot always identify the nature of a data flow, for example when the traffic is encrypted or tunneled. In addition, DPI technology cannot predict the future characteristics of traffic flows, such as the rate or frequency of packet arrival.

Flow characteristics are often evaluated as packets arrive at the network interface. The Flow Table, for example, makes it possible to estimate the current mean rate of each identified flow. However, the current rate of a flow does not always correspond to the mean rate of the entire flow, so forecasts based on the current mean rate remain quite inaccurate.
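
The gap between a running estimate and the whole-flow mean can be illustrated with a minimal sketch. It assumes a flow is available as a list of hypothetical (timestamp in milliseconds, length in bytes) pairs; the function name and the synthetic flow are illustrative and do not come from the article or from any switch API.

```python
def mean_rate_bps(packets):
    """Mean rate in bit/s over (timestamp_ms, length_bytes) pairs.

    Assumed convention: all bytes seen so far divided by the time between
    the first and the last observed packet.
    """
    if len(packets) < 2:
        return 0.0
    duration_ms = packets[-1][0] - packets[0][0]
    if duration_ms <= 0:
        return 0.0
    total_bits = 8 * sum(length for _, length in packets)
    return total_bits * 1000 / duration_ms


# Synthetic flow (made-up numbers): a fast initial burst, then a slow tail.
burst = [(10 * i, 1000) for i in range(10)]        # one packet every 10 ms
tail = [(190 + 100 * i, 1000) for i in range(10)]  # one packet every 100 ms
flow = burst + tail

early_estimate = mean_rate_bps(flow[:10])  # what the interface sees early on
whole_flow = mean_rate_bps(flow)           # known only after the flow ends

print(f"early estimate: {early_estimate:.0f} bit/s")
print(f"whole flow:     {whole_flow:.0f} bit/s")
```

In this synthetic case the early estimate overshoots the whole-flow mean several times over, which is the kind of mismatch that motivates forecasting the characteristics of the entire flow.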

Some approaches predict the bandwidth used on the basis of time series, for example with the ARIMA (AutoRegressive Integrated Moving Average) model. In this case, however, the forecast is short-term and is built from all flows together, so it cannot be used to evaluate the characteristics of each individual flow.

Recently, data mining methods have been used more and more effectively in telecommunications, especially Machine Learning (ML) approaches, to solve a wide range of tasks, including traffic classification and determination of its characteristics.

Among machine learning methods, Supervised Learning and Unsupervised Learning stand out. Supervised Learning methods require a database of samples, each characterized by its own set of features and a corresponding class. This database is divided into a training sequence and a test sequence. The training sequence is used to build a classifier or regressor model, and the test sequence is used to evaluate it: during testing, the algorithm's performance is checked by comparing the predicted values with the true classes. Supervised Learning methods are fast and accurate but can only predict classes that the model already knows. In Unsupervised Learning methods, class values are not defined, which complicates the task and reduces prediction accuracy but makes it possible to detect new classes.
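
The training and test procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: a 1-nearest-neighbour rule stands in for whatever classifier is actually used, and the class names, feature ranges, and helper names are hypothetical rather than taken from the article.

```python
import random


def make_samples(rng, label, length_range, iat_range, count):
    """Generate (features, label) pairs; features = (mean length, mean gap)."""
    return [
        ((rng.uniform(*length_range), rng.uniform(*iat_range)), label)
        for _ in range(count)
    ]


def split_train_test(samples, test_fraction=0.3, seed=0):
    """Shuffle the database and divide it into training and test sequences."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, features):
    """Return the class of the closest training sample (squared distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda s: sq_dist(s[0], features))[1]


def accuracy(train, test):
    """Share of test samples whose predicted class equals the true class."""
    hits = sum(predict_1nn(train, f) == label for f, label in test)
    return hits / len(test)


# Made-up, well-separated classes: packet length in bytes, gap in seconds.
rng = random.Random(42)
samples = (
    make_samples(rng, "voice", (140, 180), (0.015, 0.025), 20)
    + make_samples(rng, "video", (1000, 1400), (0.001, 0.010), 20)
)
train, test = split_train_test(samples)
acc = accuracy(train, test)
print(f"train={len(train)} test={len(test)} accuracy={acc:.2f}")
```

The sketch also shows the limitation noted above: a flow from an application that is absent from the training sequence would still be assigned to one of the known classes.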

The authors propose using Supervised Learning methods to speed up the classification of traffic flows based on their characteristics. The parameters of the first fifteen packets of a flow are analyzed, such as the length of each packet and the inter-arrival time between two consecutive incoming packets, together with parameters calculated from them. The model uses these parameters to determine the class of the traffic flow, where a class refers to a specific application. Using Unsupervised Learning methods, the model is extended: new classes are added and existing classes are refined (Deart, Mankov, & Krasnova, 2021; Deart, Mankov, & Krasnova, 2020; Mankov, & Krasnova, 2019; Mankov, Deart, & Krasnova, 2021; Mankov, & Krasnova, 2017).
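
A minimal sketch of how such per-flow parameters might be collected from the first fifteen packets is given below. The packet representation as (timestamp in milliseconds, length in bytes) pairs, the function name, and the choice of mean and standard deviation as derived parameters are all assumptions made for illustration; the article does not specify them here.

```python
from statistics import mean, pstdev

N_PACKETS = 15  # number of leading packets analyzed, as stated in the text


def flow_features(packets, n=N_PACKETS):
    """Build a feature dictionary from the first n packets of one flow.

    packets: list of (timestamp_ms, length_bytes) pairs in arrival order.
    """
    head = packets[:n]
    if len(head) < 2:
        raise ValueError("need at least two packets to compute a gap")
    lengths = [length for _, length in head]
    # Inter-arrival time: gap between each pair of consecutive packets.
    iats = [t2 - t1 for (t1, _), (t2, _) in zip(head, head[1:])]
    return {
        "lengths": lengths,
        "iats": iats,
        # Derived parameters (illustrative choice, not the article's list).
        "len_mean": mean(lengths),
        "len_std": pstdev(lengths),
        "iat_mean": mean(iats),
        "iat_std": pstdev(iats),
    }


# Hypothetical constant-rate flow: a 160-byte packet every 20 ms.
example_flow = [(20 * i, 160) for i in range(30)]
features = flow_features(example_flow)
print(features["len_mean"], features["iat_mean"])
```

Fifteen packets yield fourteen inter-arrival times, so the resulting feature vector has a fixed size regardless of how long the flow eventually becomes.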
