Detection of Virtual Private Network Traffic Using Machine Learning

Shane Miller (Ulster University, Derry, UK), Kevin Curran (Ulster University, Derry, UK) and Tom Lunney (Ulster University, Derry, UK)
DOI: 10.4018/IJWNBT.2020070104


The detection of unauthorised users can be problematic for currently available techniques if the nefarious actors are using identity-hiding tools such as anonymising proxies or virtual private networks (VPNs). This work presents computational models to address the current limitations in detecting VPN traffic. A model to detect VPN usage was developed using a multi-layer perceptron neural network trained on flow statistics derived from the transmission control protocol (TCP) headers of captured network packets. Validation testing showed that the presented models can classify network traffic in a binary manner, as either direct (originating directly from a user's own device) or indirect (making use of the identity- and location-hiding features of VPNs), with high accuracy. In the experiments conducted to classify OpenVPN usage, the neural network correctly identified VPN traffic with an overall accuracy of 93.71%. In further work to classify Stunnel OpenVPN usage, the neural network correctly identified VPN traffic with an overall accuracy of 97.82% when using 10-fold cross-validation. This final experiment also compared three different validation techniques and the accuracy obtained with each. These results demonstrate a significant advance in the detection of unauthorised user access and indicate scope for further research in this field, particularly in business security, where detecting VPN usage is important to an organisation.
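
The abstract reports accuracy under 10-fold cross-validation. As a minimal sketch of what that validation technique involves (not the authors' code), the Python below shuffles the sample indices, splits them into k folds of near-equal size, trains on k-1 folds, tests on the held-out fold and averages the per-fold accuracy. The `train_threshold` learner is a hypothetical stand-in for the multi-layer perceptron, and the one-feature data set is synthetic; neither comes from the paper. This sketch averages per-fold accuracies; pooling correct predictions across all folds is an equally common convention, and the preview does not say which one the authors used.

```python
import random
from typing import Callable, List, Sequence

Features = Sequence[float]
Predictor = Callable[[Features], int]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k disjoint folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        folds.append(idx[start:start + size])
        start += size
    return folds


def cross_validate(X, y, train_fn, k: int = 10, seed: int = 0) -> float:
    """Mean held-out accuracy over k folds; train_fn(X_train, y_train) returns a predictor."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [j for j in range(len(X)) if j not in held_out]
        predict = train_fn([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(X[j]) == y[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def train_threshold(X_train, y_train) -> Predictor:
    """Hypothetical stand-in for the MLP: a threshold halfway between the class means
    of the first feature (assumes label 1 = indirect/VPN has the larger mean)."""
    direct = [x[0] for x, t in zip(X_train, y_train) if t == 0]
    indirect = [x[0] for x, t in zip(X_train, y_train) if t == 1]
    threshold = (sum(direct) / len(direct) + sum(indirect) / len(indirect)) / 2
    return lambda x: 1 if x[0] > threshold else 0


# Synthetic, well-separated one-feature data: label 0 = direct, 1 = indirect (VPN).
X = [[float(v)] for v in range(50)] + [[float(v)] for v in range(100, 150)]
y = [0] * 50 + [1] * 50
print(cross_validate(X, y, train_threshold, k=10))  # prints 1.0
```

Because `cross_validate` only expects a function that returns a predictor, a real neural-network trainer could be passed in place of `train_threshold` without changing the fold logic.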

1. Introduction

Virtual private networks (VPNs) are becoming a popular method for criminals and other bad actors to hide their online activities (Miller et al., 2018; Miller et al., 2018). This trend is helped by the growing ease of use of VPNs, which are no longer just a tool for remotely accessing enterprise resources when travelling for work or working from home. Remote access itself can be a criminal use case: an attacker who wishes to access an enterprise network remotely in order to steal company and trade secrets can use one or more VPNs to hide their own location or to make it appear as if someone else were infiltrating the network (Geetha & Phamila, 2016). There have been a few notable cases of this in recent years, such as the 2014 Sony Pictures incident, in which confidential data including personal information about employees was stolen (Peterson, 2014). It is likely that the attackers used a VPN to hide their location and identity since, to date, no one has been officially charged with the crime and brought before a court (Pagliery, 2014). Other notable attacks include the many data breaches of recent years, such as the 2012 LinkedIn breach, which was only discovered in 2016 (Hunt, 2016) and in which approximately 167 million account details, including email addresses and passwords, were stolen. It is not known whether the attacker(s) used a VPN service to hide their location.

There are various types of anonymity technologies available, most of them based on so-called “mix” networks. Mix networks route packets in a way that makes it extremely difficult to establish a link between the source of a request and its destination. They achieve this by passing traffic through intermediaries and ‘mixing’ packets from multiple participants, which makes it very difficult for eavesdroppers to trace end-to-end communications (Faris et al., 2019). The resulting anonymous communication systems can be categorised into one of two groups: message-based/high-latency applications or flow-based/low-latency applications (Yang, 2015). High-latency applications include email and e-voting systems. Low-latency systems include the popular anonymous communication system Tor as well as various kinds of HTTP/SOCKS proxy services and VPNs (Varvello et al., 2019). Systems such as Tor fall under the category of multi-hop anonymous communication models, while HTTP/SOCKS proxies and VPNs generally fall under the category of single-hop anonymous communication models.

A proxy server is a server that acts as an intermediary for requests from clients for resources located on other servers on a network or the Internet. A basic type of proxy is a gateway, which can be found on most consumer wireless routers. Another type is the reverse proxy, a server on an internal company network that acts as an intermediary for other servers on that network. Reverse proxies typically serve as Internet-facing servers that handle several different tasks, such as load balancing, in which requests are distributed between several web servers, and caching of static content such as pictures and other graphical content. Proxy servers used to provide anonymisation are based on another type, known as an “open” proxy, which is available to any user on the Internet. Open proxies are mostly used to set up anonymous proxy websites and are categorised as a single-hop anonymous communication model.

There are several different implementations of VPNs for providing anonymous communications (Zorn, 1999; Rawat et al., 2001; Lawas et al., 2016). VPNs were originally intended to allow an organisation’s workers to securely access internal network resources from outside the internal network, i.e., remote access. This is achieved by setting up a connection, called a tunnel, between the user’s PC and the organisation’s servers. However, VPNs can also be used as an anonymous communication system in a manner equivalent to an anonymous proxy server. The main difference between the two methods lies in the VPN’s tunnelled connection: the connection between the user and the VPN server is encrypted.
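
The ‘mixing’ performed by the mix networks described earlier in this section can be illustrated with a threshold mix, one classic mix design: a node buffers incoming messages until a batch of a chosen size has arrived, then forwards the whole batch in random order with sender identities removed, so arrival order no longer reveals which input became which output. The sketch below is a hypothetical, minimal model of that batching-and-reordering step only. Real mix networks also use layered encryption, with each mix removing one layer so that a message’s bits change at every hop; that cryptographic part is omitted here, and the class and field names are illustrative.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    sender: str      # visible to the mix on input
    recipient: str   # needed to forward the message
    payload: bytes   # would be a layered ciphertext in a real mix (omitted here)


class ThresholdMix:
    """Toy threshold mix: buffer messages, then flush a shuffled batch."""

    def __init__(self, threshold: int, rng: Optional[random.Random] = None):
        if threshold < 2:
            raise ValueError("a batch of one provides no mixing")
        self.threshold = threshold
        self.pool: List[Message] = []
        self.rng = rng or random.Random()

    def receive(self, msg: Message) -> List[Tuple[str, bytes]]:
        """Buffer msg; once the pool reaches the threshold, return the flushed batch."""
        self.pool.append(msg)
        if len(self.pool) >= self.threshold:
            return self._flush()
        return []

    def _flush(self) -> List[Tuple[str, bytes]]:
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)  # output order is independent of arrival order
        # Only (recipient, payload) leaves the mix; the sender field is dropped.
        return [(m.recipient, m.payload) for m in batch]


mix = ThresholdMix(threshold=3, rng=random.Random(7))
senders = [("alice", "a.example"), ("bob", "b.example"), ("carol", "c.example")]
for i, (sender, recipient) in enumerate(senders):
    out = mix.receive(Message(sender, recipient, f"m{i}".encode()))
print(out)  # the full batch of three, shuffled, with no sender field
```

A larger threshold gives an eavesdropper more candidate inputs per output but also adds latency, which is why this style of mixing suits the message-based/high-latency systems mentioned above rather than low-latency services such as VPNs.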
