Qingzeng Song, Maorui Hou, Yongjiang Xue, Jing Yu
Journal of Electronic Imaging, Vol. 33, Issue 01, 013006, (January 2024) https://doi.org/10.1117/1.JEI.33.1.013006
TOPICS: Object detection, Detection and tracking algorithms, Remote sensing, Data modeling, Feature extraction, Target detection, Performance modeling, Feature fusion, Convolution, Small targets
In recent years, deep learning-based object detection algorithms have demonstrated exceptional performance in natural environments. These algorithms have been extensively used in remote sensing applications, including the detection of structures and roads as well as flood and earthquake disasters. In these applications, remote sensing images may be captured by satellites, drones, and other equipment. Compared with conventional images, remote sensing images often feature substantial occlusion, intricate backgrounds, and numerous small targets, and their high resolution and large data volume make these targets difficult to detect. Existing algorithms tend to focus on either detection accuracy or speed and often fail to balance the two. To address this problem, we propose MA-YOLO, a single-stage object detection algorithm based on YOLOv4. First, we design a backbone network that enhances feature extraction while maintaining inference speed. Second, we introduce a parallel attention mechanism to improve the detection of small targets. Finally, we apply an attention mechanism to the path aggregation network to enhance the fusion of multi-scale features for detecting multi-scale targets. To validate the approach, we evaluate MA-YOLO on three datasets: DIOR, RSOD, and NWPU VHR-10. The experimental results show that the proposed network achieves detection accuracies of 68.87%, 94.13%, and 93.77% on these datasets, respectively, while maintaining an inference speed of 28.4 frames per second, achieving an effective balance between detection accuracy and speed.
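The abstract does not specify how the parallel attention mechanism is built, so the following is only a generic, dependency-free sketch of what "parallel" attention commonly means: a channel branch and a spatial branch are both computed from the same input feature map, and their reweighted outputs are summed. The function name `parallel_attention`, the pooling choices, and the sigmoid gating are all assumptions made for illustration; this is not the MA-YOLO module itself, which would use learned convolutional weights.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def parallel_attention(features):
    """Reweight a feature map given as nested lists of shape (C, H, W).

    Hypothetical sketch: both branches read the same input (hence
    "parallel") and their outputs are added. No learned parameters.
    """
    c, h, w = len(features), len(features[0]), len(features[0][0])

    # Channel branch: global average pool each channel, then gate it.
    channel_w = [
        sigmoid(sum(sum(row) for row in channel) / (h * w))
        for channel in features
    ]

    # Spatial branch: average over channels at each location, then gate it.
    spatial_w = [
        [sigmoid(sum(features[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]

    # Sum of the two reweighted maps: x * channel_w + x * spatial_w.
    return [
        [
            [features[k][i][j] * (channel_w[k] + spatial_w[i][j]) for j in range(w)]
            for i in range(h)
        ]
        for k in range(c)
    ]


# Usage: a toy 2-channel, 3x3 feature map.
toy = [[[float(k + i + j) for j in range(3)] for i in range(3)] for k in range(2)]
out = parallel_attention(toy)
```

The output has the same shape as the input, which is what allows such a block to be dropped between existing layers of a detector without changing the surrounding architecture.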