Depth estimation and semantic segmentation are crucial for visual perception and scene understanding. Multi-task learning, which captures features shared across multiple tasks within a scene, is often applied to depth estimation and semantic segmentation to jointly improve their accuracy. In this paper, a deformable attention-guided network for multi-task learning is proposed to enhance the accuracy of both depth estimation and semantic segmentation. The network consists of a shared encoder, initial prediction modules, deformable attention modules and decoders. RGB images are first input into the shared encoder to extract generic representations for the different tasks. These shared feature maps are then decoupled into depth, semantic, edge and surface normal features in the initial prediction modules. At each stage, attention is applied to the depth and semantic features under the guidance of fusion features in the deformable attention module. The decoders upsample each attention-enhanced feature map and output the final predictions. The proposed model achieves an mIoU of 44.25% and an RMSE of 0.5183, outperforming the single-task baseline, the multi-task baseline and a state-of-the-art multi-task learning model.
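The sampling step at the heart of a deformable attention module can be sketched as follows. This is a toy, pure-Python illustration, not the authors' implementation: the function names, the fixed offsets and the attention weights are assumptions for the example. The idea shown is the standard one, that features are sampled at offset locations around a reference point via bilinear interpolation and combined with learned attention weights.

```python
def bilinear_sample(grid, y, x):
    """Bilinearly sample a 2D feature grid at a fractional (y, x) location."""
    h, w = len(grid), len(grid[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (grid[y0][x0] * (1 - dy) * (1 - dx)
            + grid[y0][x1] * (1 - dy) * dx
            + grid[y1][x0] * dy * (1 - dx)
            + grid[y1][x1] * dy * dx)

def deformable_attention(grid, ref, offsets, weights):
    """Weighted sum of features sampled at (ref + offset) locations.

    In a real module the offsets and weights would be predicted per query
    by small linear layers; here they are passed in directly.
    """
    ry, rx = ref
    return sum(w * bilinear_sample(grid, ry + oy, rx + ox)
               for (oy, ox), w in zip(offsets, weights))

# Example: attend to two sampling points on a tiny 2x2 feature map.
features = [[0.0, 1.0],
            [2.0, 3.0]]
out = deformable_attention(features, ref=(0, 0),
                           offsets=[(0, 0), (1, 1)],
                           weights=[0.5, 0.5])
```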
The multi-frequency hierarchical method is a well-established temporal phase unwrapping technique known for its high precision. However, its requirement for numerous images significantly reduces the speed of 3D reconstruction. To address this limitation, we propose a novel coarse-fine combined phase unwrapping method that reallocates the actual phase order into sequentially arranged coarse and fine orders. The coarse order brackets the range of the actual phase, while the fine orders locate it precisely within that range. The proposed method requires only six images to recover the high-frequency absolute phase. Experimental results demonstrate that, compared with the popular three-frequency, three-step phase-shifting method, the proposed method reduces the number of projection patterns by three while maintaining comparable accuracy. This method therefore holds significant potential for high-speed 3D reconstruction applications.
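The order-reallocation idea above can be sketched numerically. This is a minimal illustration under assumed conventions, not the paper's exact coding scheme: the split factor F and the helper names are invented for the example. The actual fringe order k is decomposed into a coarse order (k // F) that brackets the phase range and a fine order (k % F) that locates the phase within it; recombining them recovers the absolute phase Φ = φ + 2πk from the wrapped phase φ.

```python
import math

def split_order(k, F):
    """Reallocate the actual fringe order k into (coarse, fine) orders.

    F is an assumed split factor: the number of fine orders per coarse order.
    """
    return k // F, k % F

def absolute_phase(wrapped_phase, coarse, fine, F):
    """Recombine the orders and unwrap: Phi = phi + 2*pi*k."""
    k = coarse * F + fine
    return wrapped_phase + 2 * math.pi * k

# Example: actual order 7 with F = 3 splits into coarse 2, fine 1,
# and the two orders recombine to the same absolute phase.
coarse, fine = split_order(7, 3)
phi_abs = absolute_phase(0.5, coarse, fine, 3)
```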
Fringe projection profilometry (FPP) is a high-precision, non-contact measurement technique whose 3D reconstruction quality largely depends on the projection quality and on the number of phase shifts and frequencies. Traditional MEMS projection methods project fringes only during the forward scan, leaving the reverse scan unused and limiting the projection light intensity. To address this, a high-quality fringe projection system is developed using an FPGA and a uniaxial MEMS scanning mirror. The system projects 8-bit fringe patterns based on angle-interval signals and uses both the forward and reverse scans: by reversing the forward pattern during the backward scan, the same pattern is projected in both directions, effectively doubling the projection light intensity compared with unidirectional methods. The improved Signal-to-Noise Ratio (SNR) of the captured images in turn enhances the reconstruction accuracy of the MEMS-based system.
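The pattern-reversal step of the bidirectional scan can be sketched as below. This is an illustrative model, not the authors' FPGA implementation: the function names are assumptions. Because the backward sweep of the mirror traverses the pixels right-to-left, streaming the forward pattern in reversed order during that sweep draws the identical pattern in space, so each spatial pixel receives light from both passes.

```python
def scan_sequence(pattern_row):
    """Pixel streaming order for one forward + one backward sweep of a row."""
    forward = list(pattern_row)   # streamed left-to-right on the forward sweep
    backward = forward[::-1]      # streamed reversed on the backward sweep
    return forward, backward

def spatial_image(forward, backward):
    """Intensity landing on each spatial pixel over the two sweeps.

    The backward stream hits pixels right-to-left, so it is re-reversed
    here to index it spatially; both passes then align, doubling intensity.
    """
    return [f + b for f, b in zip(forward, backward[::-1])]

# Example: a 3-pixel pattern row receives double intensity everywhere.
fwd, bwd = scan_sequence([1, 2, 3])
image = spatial_image(fwd, bwd)
```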