Exploration of Web Page Structural Patterns Based on Request Dependency Graph Decomposition

Cheng Fang (Beijing University of Posts and Telecommunications, Beijing, China) and Bo Ya Liu (Beijing University of Posts and Telecommunications, Beijing, China)
Copyright: © 2016 | Pages: 13
DOI: 10.4018/IJDCF.2016100101

Abstract

This article first proposes a Bipartite Request Dependency Graph (BRDG) that describes the object-level interrelationships between user click requests and embedded web object requests. These two kinds of requests are separated from HTTP data by an identification algorithm. The interrelationships between user click requests and embedded web object requests reflect the structure of web pages, which contains latent web information. Exploring structural patterns is crucial for tasks such as web security analysis and web information visualization. Accordingly, the article also applies a novel graph decomposition method, based on orthogonal nonnegative matrix tri-factorization (tNMF), to the BRDG. In contrast to traditional web graph analysis, which focuses on the statistical and structural properties of the whole graph, the proposed method is dedicated to mining latent web structural patterns. Decomposition results demonstrate that several interesting structures exist in the BRDG. The article aims to classify these subgraphs into several structural patterns and to shed light on the causes of these patterns.
Article Preview

Introduction

With the rapid development of next generation networks (NGN), as proposed by ITU-T (2006), heterogeneity and complexity in networks have become increasingly prominent. Meanwhile, information visualization has become imperative for administrators who must manage and secure these networks. Under these circumstances, many researchers have devoted themselves to web usage mining and have achieved numerous results. This paper focuses on exploring latent interrelationships between web objects on the World Wide Web (WWW).

One of the most widely used models for web structural analysis is the Web graph, where the nodes are web pages and the edges are the hyperlinks between them. Broder (2000) conducted the first large-scale study of the Web graph and presented a bow-tie picture, consisting of three distinct components of almost equal size, to describe the macroscopic structure of the web. On this basis, Donato (2005) offered a better understanding of the inner structures of the Web graph. In addition, Huang & Lai (2003) proposed a new approach to clustering the Web graph and applied it to web visualization. Nevertheless, Web graphs represent only the hyperlink relationships, without considering the context. Sethu & Yates (2010) achieved hyperlink classification using text mining analysis and built a multi-relational web graph. The mobile web is structurally different from the fixed web. Jindal (2008) found that the connectivity of the mobile web was sparser than that of the fixed web and that its node degree distributions fell off much more rapidly. Liu & Ansari (2014a) successfully identified website communities in the mobile Internet based on affinity measurement.

This paper introduces the notion of the Bipartite Request Dependency Graph (BRDG), a graph derived from the dependency graph model in (Liu, 2014b), which used a two-step algorithm to identify user clicks among a large number of HTTP requests. In other words, the BRDG is an application of the dependency graph model. The authors' study finds that the BRDG is very large, sparse and seemingly complex. To explore latent web structural patterns, the authors apply a tNMF-based graph decomposition method to the BRDG and extract a number of interesting structural properties. The tNMF is a co-clustering algorithm that has been shown to be useful in many applications, such as identifying suspicious activities through DNS failure graphs (Jiang, 2010) and understanding the intensive and continuous data usage patterns of mobile users (Jin, 2012).
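
To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It builds a toy bipartite adjacency matrix from (user click request, embedded object request) pairs and factorizes it as A ≈ R H Cᵀ using a standard multiplicative-update scheme for orthogonal nonnegative matrix tri-factorization. The URLs, the function name `tnmf`, the group counts `k` and `l`, and the iteration count are all assumptions chosen for illustration; this preview does not state which update rules or parameters the article uses.

```python
import numpy as np

# Hypothetical (user click request, embedded object request) pairs.
# These URLs are illustrative only and are not taken from the article's data.
edges = [
    ("news.example/home", "cdn.example/site.css"),
    ("news.example/home", "cdn.example/app.js"),
    ("news.example/story", "cdn.example/site.css"),
    ("news.example/story", "cdn.example/app.js"),
    ("shop.example/cart", "img.example/logo.png"),
    ("shop.example/cart", "ads.example/track.gif"),
    ("shop.example/item", "img.example/logo.png"),
    ("shop.example/item", "ads.example/track.gif"),
]

# One side of the bipartite graph is the click requests (rows),
# the other side is the embedded object requests (columns).
clicks = sorted({c for c, _ in edges})
objects = sorted({o for _, o in edges})
row = {c: i for i, c in enumerate(clicks)}
col = {o: j for j, o in enumerate(objects)}

A = np.zeros((len(clicks), len(objects)))
for c, o in edges:
    A[row[c], col[o]] = 1.0


def tnmf(A, k, l, n_iter=200, eps=1e-9, seed=0):
    """Approximate A (m x n) as R @ H @ C.T with nonnegative factors.

    R (m x k) groups rows, C (n x l) groups columns, and H (k x l)
    describes how strongly row groups connect to column groups.
    The multiplicative updates below encourage R and C to have
    near-orthogonal columns; this is an assumed variant.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    R = rng.random((m, k)) + eps
    C = rng.random((n, l)) + eps
    H = rng.random((k, l)) + eps
    for _ in range(n_iter):
        num = A @ C @ H.T                      # m x k
        R *= np.sqrt(num / (R @ (R.T @ num) + eps))
        num = A.T @ R @ H                      # n x l
        C *= np.sqrt(num / (C @ (C.T @ num) + eps))
        num = R.T @ A @ C                      # k x l
        H *= np.sqrt(num / ((R.T @ R) @ H @ (C.T @ C) + eps))
    return R, H, C


R, H, C = tnmf(A, k=2, l=2)

# Assign each click request to the row group with the largest membership weight.
click_groups = R.argmax(axis=1)
for name, group in zip(clicks, click_groups):
    print(f"{name} -> group {group}")
```

In this sketch, each row of R gives the membership weights of one click request over the row groups, each row of C does the same for an embedded object request, and large entries of H mark pairs of groups that are densely connected. Such dense blocks are one plausible way to read off candidate subgraphs from a decomposition; results depend on the random initialization and on the chosen k and l.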

The major contributions of this paper are threefold: (i) the authors propose the Bipartite Request Dependency Graph (BRDG) to describe the interrelationships between user click requests and embedded web object requests in the mobile Internet; (ii) the authors implement the tNMF-based decomposition method on the BRDG; (iii) based on the decomposition results, the authors classify the resulting subgraphs into five structural patterns and reveal the causes of these patterns. The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 introduces the definition and overall characteristics of the BRDGs. Section 4 presents the graph decomposition and classification methodology. Section 5 summarizes the decomposition results as five structural patterns and interprets them. Section 6 briefly concludes the paper and proposes future work.
