1. Introduction
The problem of underwater video object detection has received considerable attention during the last few decades, and appreciable progress has been made in this direction (Emberton, Chittka, & Cavallaro, 2018; Hossain, Alam, Ali, & Amin, 2016; Mohapatra, Mahapatra, Mahapatra, & Swain, 2015; Walther, Edgington, & Koch, 2004). Underwater moving objects suffer from a limited range of visibility, low contrast, non-uniform lighting, blurring, bright artefacts, colour diminution and noise (Ancuti, Ancuti, Haber, & Bekaert, 2012; Emberton et al., 2018; Zhang et al., 2017). An automated system for the detection and tracking of underwater moving objects, which is of interest to oceanographic researchers, has been developed (Mohapatra et al., 2015). Variable lighting conditions and the presence of noise from high-contrast debris pose challenges for object detection and tracking; Walther et al. (2004) have proposed a novel method to overcome these issues. Negrea et al. (Negrea, Thompson, Juhnke, Fryer, & Loge, 2014) have presented an adaptive background subtraction algorithm for detection and motion prediction, which is used for tracking. This fully automated design discards frames without any activity and hence reduces the cost of fish monitoring.
The problem of underwater object detection can be of two types: in the first case, the object moves while the camera is static; in the second, both the object and the camera are in motion. In real-world scenarios, the second case is more prevalent and more challenging than the first, since neither the camera model parameters nor the object is known a priori. This motivated us to address the issue in this research work.
In this paper, we attempt to detect underwater video objects under varying illumination conditions. The problem is formulated as an incomplete-data problem, and the Expectation–Maximization (EM) approach is adopted to solve it. Our main contributions are: (i) three new spatio-temporal MRF models for classification of pixel labels in the E-step, (ii) model parameter estimation based on new features using a pipelining approach in the M-step, (iii) a continuous underwater video object detection scheme using the EM framework, and (iv) the EM algorithm in a multiresolution framework. In the proposed framework, no a priori knowledge of the camera model parameters is necessary. In the E-step of the EM algorithm, the video object is segmented based on the video frame model. The problem of frame label estimation is formulated as a Maximum a posteriori (MAP) estimation problem, and the MAP estimates are obtained by an algorithm that combines Simulated Annealing (SA) and the Iterated Conditional Modes (ICM) algorithm. Subsequently, in the M-step, the estimated frame labels are used to estimate the intrinsic and extrinsic parameters of the camera model. The proposed features are extracted from the labelled frames and weighted appropriately before being fed to the pipeline. These weighted corner features are used to estimate the camera intrinsic and extrinsic parameters using the 2D optimization method (Zhou, Cui, Peng, & Wang, 2012). The E-step and M-step are repeated to continuously detect the video objects as the camera moves. The camera calibration errors are computed, and the estimated parameters are chosen based on the minimum calibration error. The segmentation accuracy is validated by four quantitative measures. The advantage of the proposed multiresolution framework is that the execution time is substantially reduced compared with processing the fine-scale images alone.
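The alternation between the two steps can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the E-step below uses ICM alone (the full method precedes it with a simulated annealing schedule) over a toy Potts-style MRF with a single smoothness weight `beta`, and the M-step re-estimates class means from the current labelling rather than camera intrinsic and extrinsic parameters. All function names and the two-class intensity model are illustrative assumptions.

```python
import random

def e_step_icm(frame, labels, means, beta=1.0, n_iter=5):
    # E-step (sketch): ICM refinement of pixel labels, approximating the
    # MAP labelling under a squared-error data term plus a Potts prior.
    h, w = len(frame), len(frame[0])
    for _ in range(n_iter):
        for i in range(h):
            for j in range(w):
                best, best_cost = labels[i][j], float("inf")
                for k in range(len(means)):
                    data = (frame[i][j] - means[k]) ** 2
                    # count 4-neighbours disagreeing with candidate label k
                    smooth = sum(1 for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                                 if 0 <= i + di < h and 0 <= j + dj < w
                                 and labels[i + di][j + dj] != k)
                    cost = data + beta * smooth
                    if cost < best_cost:
                        best, best_cost = k, cost
                labels[i][j] = best
    return labels

def m_step(frame, labels, n_classes):
    # M-step (sketch): update per-class means from the current labelling.
    # In the paper this step instead estimates camera parameters from
    # weighted corner features extracted from the labelled frame.
    means = []
    for k in range(n_classes):
        vals = [frame[i][j] for i in range(len(frame))
                for j in range(len(frame[0])) if labels[i][j] == k]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return means

def em_detect(frame, n_classes=2, n_em=10):
    # Alternate E- and M-steps until the labelling stabilises.
    h, w = len(frame), len(frame[0])
    random.seed(0)
    labels = [[random.randrange(n_classes) for _ in range(w)] for _ in range(h)]
    means = [0.0, 255.0]  # crude initial guesses for dark/bright classes
    for _ in range(n_em):
        labels = e_step_icm(frame, labels, means)   # E-step: labelling
        means = m_step(frame, labels, n_classes)    # M-step: parameters
    return labels, means
```

On a synthetic frame with a bright object on a dark background, the loop separates the two regions and the estimated means converge to the region intensities; the same alternation, with the MRF models and camera-parameter estimation described above, is what the proposed framework repeats per frame.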
The performance of the proposed algorithm has been compared with Stolkin's E-MRF algorithm (Liu, Dai, Wang, Zheng, & Zheng, 2016; Prabowo, Hudayani, Purwiyanti, Sulistiyanti, & Setyawan, 2017; Stolkin, Greig, Hodgetts, & Gilby, 2008) and found to be superior.