Protocol Identification of Encrypted Network Streams

Matthew Gebski (National ICT Australia and University of New South Wales, Australia), Alex Penev (National ICT Australia and University of New South Wales, Australia) and Raymond K. Wong (National ICT Australia and University of New South Wales, Australia)
DOI: 10.4018/978-1-60566-748-5.ch015


Traffic analysis is an important issue for network monitoring and security. The authors focus on identifying protocols for network traffic by analysing the size, timing and direction of network packets. By using these network stream characteristics, they propose a technique for modelling the behaviour of various TCP protocols. This model can be used for recognising protocols even when they run inside encrypted tunnels. The technique is complemented with an experimental evaluation on real-world network data.
Chapter Preview


Computer security and intrusion detection are important problems in computer science that have no optimal solutions, and improvements on existing techniques are frequently published. One interesting area is that of misuse detection, wherein we attempt to identify inappropriate behavior and use of a system by its users (whether legitimate or not). As the web has grown in importance over the past 10 years, so too has the number of ways in which system resources can be abused. For instance, users may abuse their privileges by tunneling a P2P file-sharing application through a proxy, or over HTTP or SSH, so that it appears to be a different activity.

This chapter looks at the problem of identifying the protocol of a network stream for which very little information is available. Unlike most previous approaches, the approach outlined here restricts itself to using only the timing, size and direction of packets and assumes that their content is scrambled. There are numerous scenarios where the information available for identification is limited to these attributes. For example, a proxy tunnel can be used to run an instant messenger chat application, yet the actual packets that identify the traffic as a chat protocol are enveloped inside the SSH stream and encrypted. Tunnels and proxies are sometimes used to covertly carry out potentially inappropriate activities in the workplace, such as file-sharing, IRC and instant messaging, and to circumvent a company's firewall in accessing blocked websites (e.g. webmail).

When the underlying protocol is scrambled and sent through a tunnel or proxy protocol, the only information the local routers can observe is the timing, size and direction of the packets. If the inner packet headers were available then identifying the protocol would be a trivial task. However, we restrict ourselves only to surface-level information.

Our aim is to develop a model based on the traffic structure that is visible externally and use it to train protocol profiles. A new (and unknown) stream can then be matched against these profiles. Suspicious connections that are identified as an inappropriate protocol can then simply be flagged for an administrator to investigate further. The presented approach constructs a bipartite graph to model the incoming and outgoing packets, since this two-way exchange is present in virtually all network traffic. In this graph, edges between nodes are weighted by the likelihood of encountering a certain packet (size, timing, direction) after another. We can then classify a stream as a particular protocol with a confidence score by finding particular subsequences of packets that are indicative of a known protocol.
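
The profile-and-match idea described above can be illustrated with a minimal Python sketch. It is not the chapter's actual model: it uses a plain table of packet-to-packet transition counts rather than the bipartite graph, it scores whole streams rather than searching for indicative subsequences, and the bucket boundaries, the add-one smoothing and all function names are assumptions chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical bucket boundaries (assumptions, not taken from the chapter).
SIZE_BOUNDS = (64, 256, 1024)   # packet size in bytes
GAP_BOUNDS = (0.01, 0.1, 1.0)   # seconds since the previous packet

# Number of distinct symbols: 2 directions x size buckets x gap buckets.
N_SYMBOLS = 2 * (len(SIZE_BOUNDS) + 1) * (len(GAP_BOUNDS) + 1)


def bucket(value, bounds):
    """Return the index of the first bound that value does not exceed."""
    for i, bound in enumerate(bounds):
        if value <= bound:
            return i
    return len(bounds)


def to_symbols(stream):
    """Map (timestamp, size, direction) packets to discrete symbols.

    Only externally visible attributes are used; direction is 'in' or 'out'.
    """
    symbols = []
    prev_time = None
    for timestamp, size, direction in stream:
        gap = 0.0 if prev_time is None else timestamp - prev_time
        symbols.append(
            (direction, bucket(size, SIZE_BOUNDS), bucket(gap, GAP_BOUNDS))
        )
        prev_time = timestamp
    return symbols


def train_profile(streams):
    """Count symbol-to-symbol transitions over a protocol's training streams."""
    counts = defaultdict(lambda: defaultdict(int))
    for stream in streams:
        symbols = to_symbols(stream)
        for a, b in zip(symbols, symbols[1:]):
            counts[a][b] += 1
    return counts


def score(profile, stream):
    """Average log-likelihood of the stream's transitions under a profile."""
    symbols = to_symbols(stream)
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        return float("-inf")
    total = 0.0
    for a, b in pairs:
        row = profile.get(a, {})
        # Add-one smoothing so unseen transitions get a small probability.
        total += math.log((row.get(b, 0) + 1) / (sum(row.values()) + N_SYMBOLS))
    return total / len(pairs)


def classify(profiles, stream):
    """Return the best-matching protocol name and all per-protocol scores."""
    scores = {name: score(profile, stream) for name, profile in profiles.items()}
    return max(scores, key=scores.get), scores


# Toy training data: slow, small, alternating packets vs. a fast bulk transfer.
chat = [(0.0, 60, "out"), (2.0, 80, "in"), (4.5, 50, "out"), (7.0, 90, "in")]
bulk = [(0.0, 60, "out"), (0.001, 1400, "in"), (0.002, 1400, "in"),
        (0.003, 1400, "in"), (0.004, 52, "out")]
profiles = {"chat": train_profile([chat]), "bulk": train_profile([bulk])}

unknown = [(10.0, 55, "out"), (12.0, 100, "in"), (15.0, 40, "out"), (18.0, 120, "in")]
label, scores = classify(profiles, unknown)
print(label, scores)
```

The gap between the best and second-best score could serve as a rough confidence value, and a stream whose best match is a disallowed protocol would be flagged for an administrator rather than blocked automatically.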

The contributions of this chapter are:

  • An approach for discrimination of network protocols that is suitable for encrypted traffic.

  • An improved model which provides higher accuracy and facilitates analysis of multiple protocol streams in one session.

  • Analysis of accuracy and running time on real-world network data.

  • A data set comprising 50,000 unique connections for several common protocols.



While there are many commercial and academic tools available for monitoring computer systems, many of these are rule-based and require a large amount of human effort to precisely specify what constitutes acceptable behavior and use of a system. As such, current attention is on developing techniques and tools that facilitate a more-automated approach.

In this context, we are not actively trying to prevent unauthorized access, and as such our problem differs from the related problem of intrusion detection. However, intrusion detection systems use only rudimentary protocol identification techniques. The main IDSs such as Snort (Roesch, 1999) define the protocol of a stream based on the connection ports (McAfee, 2008; Enterasys Networks Incorporated, 2008; Paxson, 1999).

Instead, we concentrate on determining if actions performed by the user, whether authorized or not, appear acceptable. This can still relate to unauthorized access because a malicious party that gains access to a system may use the compromised machine for inappropriate activity. Such activity should be flagged, even for a legitimate user.
