26 March 2024
Assisting RGB and depth salient object detection with nonconvolutional encoder: an improvement approach
Shuo Zhang, Mengke Song, Luming Li
Abstract

RGB-D salient object detection is a challenging task in computer vision, and deep architectures have been widely adopted in previous studies. However, current convolutional neural network (CNN)-based models struggle to capture global long-distance features efficiently, whereas transformer-based methods are computationally intensive. To address these limitations, we propose a nonconvolutional feature encoder. This encoder captures long-distance dependencies while reducing computation costs, making it a potential alternative to CNNs and transformers. Additionally, we introduce a spatial info enhancing mechanism to compensate for the local information that is weakened when long-range dependencies are captured. This mechanism balances local and global information at different expansion rates by exploring multiscale feature fusion in the feature maps. Furthermore, we introduce a spatial info sensing module to enhance the compatibility of multimodal features in long-range dependencies and to extract informative cues from depth features. Through comprehensive experiments on four widely used datasets, we demonstrate that our proposed involution encoder significantly outperforms previous state-of-the-art CNN-based RGB-D salient object detection methods on four key metrics. Compared with transformer-based methods, our approach offers a favorable balance of speed and efficiency.
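
For readers unfamiliar with the operator the abstract names, the following is a minimal sketch of a generic involution step, not the authors' encoder. It assumes the standard formulation in which each spatial position generates its own kernel from its channel vector and that kernel is then shared across all channels. The function name `involution2d`, the weight matrix `w` (a hypothetical linear kernel generator), and the single-group, zero-padded setup are illustrative assumptions.

```python
def involution2d(x, w, k=3):
    """Generic single-group involution (illustrative, not the paper's encoder).

    x: input feature map as nested lists with shape C x H x W.
    w: hypothetical kernel-generator weights with shape (k*k) x C.
    Returns an output feature map with the same shape as x.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    pad = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            # Position-specific kernel: generated from the channel vector at (i, j).
            pixel = [x[c][i][j] for c in range(C)]
            kernel = [sum(w[n][c] * pixel[c] for c in range(C))
                      for n in range(k * k)]
            # Channel-agnostic: the same kernel is applied to every channel.
            for c in range(C):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:  # zero padding at borders
                            acc += kernel[di * k + dj] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


# Toy input: 2 channels, 3 x 3, all ones; generator weights all 0.5.
x = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
w = [[0.5, 0.5] for _ in range(9)]
y = involution2d(x, w, k=3)
```

In this toy case every generated kernel entry equals 1.0, so each output value simply counts the in-bounds neighbors (9 at the center, 4 at a corner). Unlike a convolution, the kernel here varies with position and is reused across channels, which is the property that keeps the parameter count low.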

© 2024 SPIE and IS&T
Shuo Zhang, Mengke Song, and Luming Li "Assisting RGB and depth salient object detection with nonconvolutional encoder: an improvement approach," Journal of Electronic Imaging 33(2), 023036 (26 March 2024). https://doi.org/10.1117/1.JEI.33.2.023036
Received: 15 November 2023; Accepted: 12 February 2024; Published: 26 March 2024
KEYWORDS
RGB color model

Object detection

Feature fusion

Lithium

Convolution

Feature extraction

Performance modeling
