A Novel Multi-Scale Feature Fusion Method for Region Proposal Network in Fast Object Detection

Gang Liu, Chuyi Wang
Copyright: © 2020 | Pages: 14
DOI: 10.4018/IJDWM.2020070107

Abstract

Neural network models have been widely used in the field of object detection. Region proposal methods are widely used in current object detection networks and have achieved good performance. Common region proposal methods search for objects by generating thousands of candidate boxes. Compared to other region proposal methods, the region proposal network (RPN) improves accuracy and detection speed with only several hundred candidate boxes. However, since its feature maps contain insufficient information, the ability of the RPN to detect and locate small objects is poor. A novel multi-scale feature fusion method for the region proposal network is proposed in this article to solve the above problems. The proposed method is called the multi-scale region proposal network (MS-RPN), which generates suitable feature maps for the region proposal network. In MS-RPN, the selected feature maps at multiple scales are fine-tuned respectively and compressed into a uniform space. The generated fusion feature maps are called refined fusion features (RFFs). RFFs incorporate abundant detail information and context information, and they are sent to the RPN to generate better region proposals. The proposed approach is evaluated on the PASCAL VOC 2007 and MS COCO benchmarks. MS-RPN obtains significant improvements over comparable state-of-the-art detection models.
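As a rough illustration of the fusion idea described above (adjusting feature maps taken from several scales, compressing them into a uniform space, and refining the fused result before the RPN), the PyTorch sketch below is a minimal stand-in. The layer choices, channel widths, lateral 1x1 convolutions, and bilinear resizing are assumptions made for illustration; the abstract does not specify the exact MS-RPN configuration.

```python
# Minimal sketch of multi-scale feature fusion feeding an RPN head (PyTorch).
# Channel widths, the 1x1 lateral convolutions, and bilinear upsampling are
# illustrative assumptions, NOT the exact MS-RPN configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionSketch(nn.Module):
    def __init__(self, in_channels=(256, 512, 512), out_channels=256):
        super().__init__()
        # Compress each selected feature map into a uniform channel space.
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        # Refine the fused map before it is handed to the RPN head.
        self.refine = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: list of feature maps from shallow (fine) to deep (coarse).
        target_size = feats[0].shape[-2:]
        fused = 0
        for conv, f in zip(self.lateral, feats):
            f = conv(f)
            # Resize coarser maps so detail and context can be combined.
            if f.shape[-2:] != target_size:
                f = F.interpolate(f, size=target_size, mode="bilinear",
                                  align_corners=False)
            fused = fused + f
        return self.refine(fused)  # "refined fusion features" fed to the RPN

# Example with fake multi-scale maps (strides 8/16/32 of a 512-pixel image).
if __name__ == "__main__":
    feats = [torch.randn(1, 256, 64, 64),
             torch.randn(1, 512, 32, 32),
             torch.randn(1, 512, 16, 16)]
    rff = FusionSketch()(feats)
    print(rff.shape)  # torch.Size([1, 256, 64, 64])
```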

Introduction

Object detection aims to recognize and localize each object instance with a bounding box. As a classical problem in the field of computer vision, it is widely used in autonomous vehicles (Kim et al., 2017) and assistive robots (Martinez-Martin & Del Pobil, 2017). Traditional object detection methods are generally based on the scale-invariant feature transform (SIFT) (Lowe, 2004) and histograms of oriented gradients (HOG) (Dalal & Triggs, 2005). These methods extract object features and sweep through the image to find regions with a class-specific maximum response. However, they perform well only on constrained object categories and are sensitive to noise. These problems limit the applicability of traditional object detection methods.
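As a rough sketch of that sweep-and-score strategy, the snippet below slides a fixed window over an image and scores each position with a stand-in linear model on HOG features. The 64x64 window, 16-pixel stride, and random weights are hypothetical; a real detector of this kind would use a trained classifier such as a linear SVM.

```python
# Sketch of classical sliding-window detection over HOG features.
# Window size, stride, and the random linear scorer are placeholders only.
import numpy as np
from skimage.feature import hog

def sliding_window_scores(image, weights, window=64, stride=16):
    scores = []
    for y in range(0, image.shape[0] - window + 1, stride):
        for x in range(0, image.shape[1] - window + 1, stride):
            patch = image[y:y + window, x:x + window]
            feat = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2))
            scores.append(((x, y), float(feat @ weights)))
    # Keep the regions with the highest class-specific response.
    return sorted(scores, key=lambda s: s[1], reverse=True)

if __name__ == "__main__":
    img = np.random.rand(128, 128)  # stand-in grayscale image
    # Feature length for a 64x64 window with the same HOG parameters.
    dim = hog(np.zeros((64, 64)), orientations=9, pixels_per_cell=(8, 8),
              cells_per_block=(2, 2)).size
    print(sliding_window_scores(img, np.random.rand(dim))[:3])
```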

Recently, deep learning (DL) (Schmidhuber, 2015) has been widely used in object detection. Object detection methods based on deep learning can be grouped into region-free methods (Redmon et al., 2016; Liu et al., 2016) and region-based methods (Girshick et al., 2014; Ren et al., 2017). Region-free methods frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. These methods improve detection speed, but they still struggle with accuracy. Region-based methods select candidate bounding boxes based on region proposals, and then a region-wise subnetwork classifies and refines these candidate boxes.
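To make the regression framing concrete, the sketch below shows a minimal one-stage detection head that maps each cell of a feature map directly to box coordinates and class probabilities. The channel counts and single-box-per-cell parameterization are illustrative assumptions in the spirit of region-free detectors, not the exact architecture of any cited method.

```python
# Sketch of a region-free ("one-stage") detection head: each spatial cell of
# the feature map regresses a box (4 values) and class probabilities directly.
# Channel counts and one box per cell are illustrative assumptions only.
import torch
import torch.nn as nn

class OneStageHead(nn.Module):
    def __init__(self, in_channels=256, num_classes=20):
        super().__init__()
        self.box_reg = nn.Conv2d(in_channels, 4, kernel_size=1)      # box offsets
        self.cls_logits = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feature_map):
        boxes = self.box_reg(feature_map)                            # (N, 4, H, W)
        class_probs = self.cls_logits(feature_map).softmax(dim=1)    # (N, C, H, W)
        return boxes, class_probs

feat = torch.randn(1, 256, 13, 13)   # a coarse backbone feature map
boxes, probs = OneStageHead()(feat)
print(boxes.shape, probs.shape)      # [1, 4, 13, 13] and [1, 20, 13, 13]
```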

Regions with CNN features (R-CNN) (Girshick et al., 2014) is a pioneer in introducing deep learning into object detection. R-CNN uses selective search to generate many region proposals, and convolutional neural networks (CNNs) (Krizhevsky et al., 2012) are used to classify the objects in these region proposals. It made significant improvements in detecting more general object categories. However, selective search takes a long time to compute proposals, and feature computation in R-CNN is time-consuming, as it repeatedly applies a deep convolutional network to thousands of warped region proposals per image (He et al., 2014). Hence, its detection speed is slow and its detection efficiency is low. Fast R-CNN (Girshick, 2015) improves the training and testing speed and the detection accuracy of R-CNN by enabling end-to-end detector training on shared convolutional features. However, some issues remain in this method: (1) although Fast R-CNN reduces the running time of the detection network, region proposal computation becomes the bottleneck; (2) the feature map output by the last layer of VGG16 (Chen, Krishna, Emer, & Sze, 2017) is too coarse to classify some small instances in Fast R-CNN; (3) neighboring region proposals may overlap each other heavily.
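The shared-feature design that Fast R-CNN introduced can be sketched as follows: the backbone runs once per image, and each candidate box is pooled from the shared feature map into a fixed-size region for the region-wise subnetwork. The VGG16 backbone, example boxes, and 7x7 output size below are assumptions for illustration, and torchvision's RoI pooling operator stands in for the original RoI pooling layer.

```python
# Sketch of region-wise classification on shared convolutional features,
# in the spirit of Fast R-CNN: the backbone runs once per image and each
# candidate box is pooled from the shared map. Backbone choice, the example
# boxes, and the 7x7 output size are illustrative assumptions.
import torch
import torchvision
from torchvision.ops import roi_pool

backbone = torchvision.models.vgg16(weights=None).features  # conv layers only
image = torch.randn(1, 3, 512, 512)
feature_map = backbone(image)                 # (1, 512, 16, 16), stride 32

# Candidate boxes in image coordinates: (batch_index, x1, y1, x2, y2).
proposals = torch.tensor([[0., 32., 32., 160., 160.],
                          [0., 200., 240., 360., 420.]])

# Pool each proposal to a fixed 7x7 grid; spatial_scale maps image
# coordinates onto the downsampled feature map.
regions = roi_pool(feature_map, proposals, output_size=(7, 7),
                   spatial_scale=1.0 / 32)
print(regions.shape)                          # torch.Size([2, 512, 7, 7])
```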
