1. Introduction
Deeper knowledge of network protocols is of great value in many network security applications, such as deep packet inspection (Bossert, Guihéry, & Hiet, 2014), botnet analysis (Chang, Mohaisen, Wang, & Chen, 2015), vulnerability discovery (Rafique, Caballero, Huygens, & Joosen, 2014) and signature generation (Wang, Xiang, Zhou, & Yu, 2012). Most work on protocol reverse analysis focuses on inferring the specification of an unknown protocol, such as its message format and fields (Caballero & Song, 2013; Meijian Li, 2013), but pays little attention to the protocol's behavior. Analyzing protocol behavior is even more important, because a protocol's behavior, and especially its hidden behavior, affects the security of the protocol's execution, which bears directly on the foundations of network security and grid security (Hailong Sun, 2011; Hoang, 2012; Papavassiliou, 2008; Sabri Pllana, 2009).
Protocol reverse analysis may be the best way to study the hidden behavior of an unknown protocol. The hidden behaviors we face are varied and can sometimes be calamitous. In some protocols, the malicious functions, special modules or key code segments are encrypted, obfuscated, or protected by control-flow obfuscation. In others, the malicious behaviors are embedded in normal behaviors and are triggered only under specific conditions. Traditional network security technologies are less effective against hidden behavior and may interfere with normal communication, because a concealed malicious behavior does not replicate or spread and may show no significant malicious characteristics. The growing stealthiness, robustness and survivability of hidden behaviors make traditional analysis, tracking and recognition increasingly difficult. Finding a general analysis method that can mine and explore the hidden behavior of an unknown protocol quickly and accurately is therefore becoming a new challenge for network security, and mining a protocol's hidden behavior is a key problem that protocol reverse analysis cannot avoid. The proposed method opens up a new avenue for protocol behavior research in network and cloud security (Chen, 2015; Ficco, 2013; Mansour, 2015; Pereira, 2015).
Recently, binary analysis has played a vital role in combating the rapidly growing number of unknown protocols (Caballero & Song, 2013; Fanzhi Meng, 2014; Li Xiang-Dong, 2011). Most existing analysis algorithms operate on either static message features (Wondracek, 2008) or dynamic behavior features (Fanzhi Meng, 2014; Juan Caballero, 2012). However, each of these two approaches has its own strengths and weaknesses in handling different types of protocol hidden behavior (Wendzel & Keller, 2014). Although the idea of combining static and dynamic analysis has been mentioned in both industry and academia (Ying WANGa, 2013), very few studies have addressed their systematic integration. In this paper, we propose a novel method that integrates dynamic analysis and instruction clustering analysis to mine an unknown protocol's hidden behavior, regardless of the type of behavior and the hiding techniques the protocol uses. The method relies only on the protocol's raw binary data (both its binary code and its messages) to quickly cluster the protocol's behaviors and ultimately expose its hidden behavior.
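The introduction does not yet spell out the clustering algorithm, so the following is only a rough illustration of what clustering protocol behaviors by their instructions can look like, not the authors' method. It assumes each behavior is captured as a hypothetical execution trace (a list of instruction mnemonics), represents each trace by its set of instruction n-grams, compares traces with Jaccard similarity, and merges traces whose similarity reaches a threshold (single-linkage grouping via union-find). The trace names, the n-gram length, the threshold, and the helper names `ngrams`, `jaccard` and `cluster_traces` are all hypothetical choices made for this sketch.

```python
from itertools import combinations


def ngrams(trace, n=3):
    """Return the set of length-n instruction-mnemonic windows in one trace (hypothetical representation)."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two n-gram sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_traces(traces, n=3, threshold=0.5):
    """Single-linkage threshold clustering of named traces using union-find.

    Two traces end up in the same cluster if a chain of pairs with
    Jaccard similarity >= threshold connects them.
    """
    grams = {name: ngrams(seq, n) for name, seq in traces.items()}
    parent = {name: name for name in traces}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(traces, 2):
        if jaccard(grams[a], grams[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in traces:
        clusters.setdefault(find(name), []).append(name)
    # Sort for a deterministic, readable result.
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical traces: two variants of a normal handler and one unusual path.
TRACES = {
    "login_ok": ["push", "mov", "call", "cmp", "jne", "mov", "ret"],
    "login_fail": ["push", "mov", "call", "cmp", "jne", "xor", "ret"],
    "rare_cmd": ["xor", "rol", "xor", "rol", "call", "jmp"],
}

if __name__ == "__main__":
    print(cluster_traces(TRACES, n=3, threshold=0.4))
    # [['login_fail', 'login_ok'], ['rare_cmd']]
```

The two login traces share 3 of their 7 distinct trigrams (similarity about 0.43), so they merge at a threshold of 0.4, while `rare_cmd` shares none and stays alone. In this toy setting, a trace that forms its own cluster is one possible cue for a rarely exercised code path worth closer inspection; on real traces, the n-gram length and threshold would need tuning.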