Modeling for Time Generating Network: An Advanced Bayesian Model

Yirui Hu
Copyright: © 2017 | Pages: 10
DOI: 10.4018/978-1-5225-1750-4.ch003

Abstract

Modeling co-occurrence data generated by more than one process in a network is a fundamental problem in anomaly detection. Co-occurrence data are joint occurrences of pairs of elementary observations from two sets: traffic data in one set are associated with the generating entities (time) in the other set. Clustering algorithms are valuable because they can extract insights from the varying distributions associated with the generating entities. This chapter leverages co-occurrence data that combine traffic data with time, and compares a Gaussian probabilistic latent semantic analysis (GPLSA) model with a Gaussian mixture model (GMM) on temporal network data. Experimental results indicate that GPLSA holds better promise for early detection and a low false alarm rate, with low implementation complexity, in a fully automatic, data-driven solution.
Chapter Preview

Background

As discussed in the previous chapter, choosing the best model usually requires a good understanding of the domain. Cluster-based analytical models can identify underlying similar patterns in temporal traffic data.

A number of studies have used GMM for detection problems, as described in Tax and Duin (1998) and Desforges, Jacob and Cooper (1998). One can develop an overall model that considers all the data collected over all time slots without explicit regard to the time variable. GMM is a probabilistic clustering algorithm that assumes all data points are generated from a mixture of a finite number of Gaussian distributions, whose parameters are estimated using the EM algorithm. The mathematical form of GMM is derived in the previous chapter.
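
As a rough illustration of a time-agnostic GMM fitted with EM, the following minimal sketch uses only the Python standard library. It is not the chapter's implementation: the one-dimensional synthetic traffic values, the helper names (`fit_gmm_1d`, `mixture_pdf`, `log_lik`, `is_anomaly`), and the 1% threshold are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, pstdev


def mixture_pdf(x, weights, means, sigmas):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(w * NormalDist(m, s).pdf(x) for w, m, s in zip(weights, means, sigmas))


def fit_gmm_1d(xs, init_means, n_iter=100):
    """Estimate mixture weights, means and standard deviations with EM."""
    k = len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    sigmas = [pstdev(xs)] * k  # broad starting width avoids numerical underflow
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component per point.
        resp = []
        for x in xs:
            joint = [w * NormalDist(m, s).pdf(x) for w, m, s in zip(weights, means, sigmas)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-6))  # floor keeps sigma positive
    return weights, means, sigmas


random.seed(0)
# Illustrative synthetic traffic volumes, pooled over all time slots (assumed data).
data = [random.gauss(10, 2) for _ in range(500)] + [random.gauss(50, 5) for _ in range(500)]
weights, means, sigmas = fit_gmm_1d(data, init_means=[min(data), max(data)])


def log_lik(x):
    """Log-likelihood of x under the fitted mixture (guarded against log(0))."""
    return math.log(max(mixture_pdf(x, weights, means, sigmas), 1e-300))


# Hypothetical rule: flag points below the 1% quantile of training log-likelihoods.
train_ll = sorted(log_lik(x) for x in data)
threshold = train_ll[len(train_ll) // 100]


def is_anomaly(x):
    return log_lik(x) < threshold
```

Note that the score depends only on the traffic value: the time at which a point was observed plays no role in this model.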

However, in most applications involving co-occurrence data, where observations are generated by time or node ID, the usual distribution often varies with the generating entity, which poses a challenge for effective anomaly detection. GMM has an advantage when clusters have different sizes and internal correlations, but ignoring the generating entities may result in missed detections in co-occurrence data. For example, if we evaluate the likelihood of a test data point, the clustering model may rate it as 'likely', whereas in reality it is unlikely for the underlying time.
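
The missed-detection example can be made concrete with a small numerical sketch. All values here are hypothetical (two traffic regimes, hand-picked mixing weights for 03:00 and 15:00); they are not taken from the chapter's experiments and only illustrate how a pooled density can differ from a time-specific one.

```python
from statistics import NormalDist

# Hypothetical traffic regimes shared by all time slots (assumed values).
low = NormalDist(mu=10, sigma=2)   # quiet traffic
high = NormalDist(mu=50, sigma=5)  # busy traffic

# Pooled mixture: a single set of mixing weights, time ignored.
pooled_weights = (0.5, 0.5)
# Time-aware mixture: mixing weights depend on the hour of day (assumed values).
weights_by_hour = {3: (0.99, 0.01), 15: (0.01, 0.99)}


def density(x, weights):
    """Mixture density at x for the given (low, high) mixing weights."""
    return weights[0] * low.pdf(x) + weights[1] * high.pdf(x)


x = 50.0  # a busy-traffic reading
pooled = density(x, pooled_weights)
at_3am = density(x, weights_by_hour[3])
at_3pm = density(x, weights_by_hour[15])

print(f"pooled: {pooled:.5f}  at 03:00: {at_3am:.5f}  at 15:00: {at_3pm:.5f}")
```

Under these assumed weights, the same reading is about fifty times less probable at 03:00 than the pooled model suggests, which is the kind of discrepancy a time-agnostic model cannot see.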

Our work thus extends GMM to a Bayesian probabilistic model called GPLSA. We implement GMM as a reference model and compare it with GPLSA.
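
The excerpt does not give the GPLSA equations, so the sketch below assumes a common formulation in which the Gaussian components are shared across time slots while the mixing weights depend on the time slot: p(x | t) = Σ_k P(k | t) N(x; μ_k, σ_k). The function name `fit_gplsa_1d`, the one-dimensional synthetic data, and the two time slots are hypothetical choices for illustration, not the chapter's method.

```python
import math
import random
from collections import defaultdict
from statistics import NormalDist, pstdev


def fit_gplsa_1d(pairs, init_means, n_iter=100):
    """EM for p(x | t) = sum_k P(k | t) * N(x; mu_k, sigma_k) on (t, x) pairs."""
    k = len(init_means)
    xs = [x for _, x in pairs]
    times = sorted({t for t, _ in pairs})
    means = list(init_means)
    sigmas = [pstdev(xs)] * k  # broad starting width avoids numerical underflow
    mix = {t: [1.0 / k] * k for t in times}  # P(k | t), uniform start
    for _ in range(n_iter):
        # E-step: responsibilities use the mixing weights of each point's own time slot.
        resp = []
        for t, x in pairs:
            joint = [mix[t][j] * NormalDist(means[j], sigmas[j]).pdf(x) for j in range(k)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M-step (a): per-time mixing weights.
        sums = defaultdict(lambda: [0.0] * k)
        counts = defaultdict(int)
        for (t, _), r in zip(pairs, resp):
            counts[t] += 1
            for j in range(k):
                sums[t][j] += r[j]
        mix = {t: [s / counts[t] for s in sums[t]] for t in times}
        # M-step (b): Gaussian parameters shared across all time slots.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-6))  # floor keeps sigma positive
    return mix, means, sigmas


random.seed(1)


def sample(p_high):
    """Draw one synthetic traffic value; p_high is the chance of the busy regime."""
    return random.gauss(50, 5) if random.random() < p_high else random.gauss(10, 2)


# Assumed data: nights are mostly quiet, days are mostly busy.
pairs = [("night", sample(0.05)) for _ in range(400)]
pairs += [("day", sample(0.95)) for _ in range(400)]
values = [x for _, x in pairs]
mix, means, sigmas = fit_gplsa_1d(pairs, init_means=[min(values), max(values)])


def density(x, t):
    """Time-conditional density p(x | t) under the fitted model."""
    return sum(w * NormalDist(m, s).pdf(x) for w, m, s in zip(mix[t], means, sigmas))
```

With this structure, `density(50.0, "night")` is much smaller than `density(50.0, "day")`, so a busy-traffic reading at night can be flagged even though the same value is ordinary during the day.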
