The main goal of object detection is to recognize and locate objects of interest in a static image or video sequence. It is one of the key tasks in computer vision. However, targets vary widely in brightness, shape, color, and occlusion, and they are further disturbed by complex environmental factors, so research on object detection algorithms presents both opportunities and challenges. In this paper, we study the two main frameworks for object detection based on convolutional neural networks: those based on region proposals and those based on regression. We then present a joint-mechanism algorithm for object detection that balances detection efficiency and accuracy to better meet practical needs. The algorithm is adjusted and optimized internally so that the two detectors can each make their own judgment according to the characteristics of the image and decide whether to detect the object, that is, to classify and locate it, which improves both efficiency and accuracy.
Object detection is a fundamental research direction in computer vision. It provides basic image information for other, higher-level computer vision processing and analysis tasks. With the continuing breakthroughs in deep learning, convolutional neural network (CNN) models in particular have shown a strong ability to extract image features in digital image processing. We compress the number of parameters of the CNN model by replacing the standard convolution layers used in the traditional model with depthwise separable convolution layers. A depthwise separable convolution layer (DSCL) factorizes a standard convolution layer into a depthwise convolution layer and a pointwise convolution layer, extracting and merging image features in two steps to reduce the number of parameters. With this replacement, the number of parameters in the model's convolution layers is reduced by 78.1%. We use a feature pyramid network to fuse the image features extracted from each layer of the CNN, so that the detection model can use fused features matched to the size and shape of each target to be detected. The average detection precision on the PASCAL VOC dataset increased to 77.5%.
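The parameter saving from the depthwise/pointwise factorization can be illustrated with a small calculation. The sketch below is not the authors' model: the kernel size and channel counts are hypothetical, biases are ignored, and the function names are introduced here only for illustration. It counts the weights of one standard convolution layer and of its depthwise separable replacement, and does not attempt to reproduce the 78.1% figure, which depends on the full architecture.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction(k: int, c_in: int, c_out: int) -> float:
    """Fraction of weights removed by the factorization."""
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    return 1.0 - sep / std


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 256 input and 256 output channels.
    k, c_in, c_out = 3, 256, 256
    print(standard_conv_params(k, c_in, c_out))
    print(separable_conv_params(k, c_in, c_out))
    print(f"{reduction(k, c_in, c_out):.1%}")
```

For a single layer the ratio of separable to standard weights simplifies to 1/c_out + 1/k², so the saving grows with the kernel size and the number of output channels. A whole-model figure is typically lower than the per-layer figure when some layers are left as standard convolutions.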