Real-time semantic segmentation with dual interaction fusion network
22 April 2024
Shenming Qu, Jiale Duan, Yongyong Lu, Can Cui, Yuan Xie
Abstract

Real-time semantic segmentation is critical in fields such as autonomous driving and robotics, which require both accuracy and speed. However, existing real-time segmentation algorithms often sacrifice low-level details to improve inference speed, which lowers segmentation accuracy. To alleviate this problem, we propose a new real-time semantic segmentation model, the dual interaction fusion network (DIFNet). First, we propose a lightweight dual decoding fusion structure that increases the focus on low-level feature information and extracts richer edge details, while reducing computational overhead by decreasing the number of feature-map channels during fusion. In addition, we construct a cross attention module that fuses high-level and low-level features by cross-weighting them through an attention mechanism, which increases the interaction between features and effectively extracts features at different levels. Finally, we design a comprehensive perception module that introduces dilated convolution to expand the model’s receptive field, enabling it to better capture global features. We validated the network on the Cityscapes and CamVid datasets. Specifically, on a single NVIDIA GeForce RTX 2080 Ti, DIFNet achieves 77.6% mIoU at 83.9 frames per second (FPS) for 1536×768 inputs on the Cityscapes test set and 77.0% mIoU at 135.8 FPS for 960×720 inputs on CamVid.
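
The abstract credits dilated convolution with expanding the receptive field. As a rough illustration of why that works, the sketch below computes the receptive field of a stack of convolution layers using the standard relation k_eff = k + (k - 1)(d - 1) for a kernel of size k and dilation d. The layer configurations (three 3×3 layers, dilation rates 1, 2, 4) and the function names are assumptions chosen for illustration; they are not taken from DIFNet's comprehensive perception module.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field (in input pixels, along one axis) of stacked conv layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples,
    ordered from the input toward the output.
    """
    rf = 1    # a single output pixel initially sees one input pixel
    jump = 1  # distance in input pixels between adjacent features
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical configurations, not the paper's: same depth and kernel size,
# with and without dilation.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of 3×3 kernels, and therefore the same parameter count, the dilated stack covers roughly twice the input extent per axis, which is the kind of trade-off a real-time model can exploit to capture more context without extra computation.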

© 2024 SPIE and IS&T
Shenming Qu, Jiale Duan, Yongyong Lu, Can Cui, and Yuan Xie "Real-time semantic segmentation with dual interaction fusion network," Journal of Electronic Imaging 33(2), 023055 (22 April 2024). https://doi.org/10.1117/1.JEI.33.2.023055
Received: 31 December 2023; Accepted: 9 April 2024; Published: 22 April 2024
KEYWORDS
Image segmentation

Semantics

Feature fusion

Content addressable memory

Convolution

Image fusion

Feature extraction
