With the increasing popularity of applications such as autonomous driving, environment perception has become more and more important, and its most common expression is semantic reconstruction. As a result, more and more researchers are trying to fuse information from multiple sensors to achieve better semantic reconstruction. However, most current estimation methods suffer from one or more of the following problems: (a) they are too bulky to run in real time; (b) they fail to effectively use the information from a variety of different sensors; (c) they fail to generate sufficient environment perception information, such as semantic information and depth information, under limited computing power. This paper therefore proposes a multi-modal joint estimation network for semantic reconstruction that addresses these problems. Our method takes an RGB image and a sparse depth map as input. By incorporating multi-scale information into the neural network, it outputs semantic segmentation and depth recovery results simultaneously while remaining lightweight and real-time, and then fuses both results in a point cloud to achieve better environment perception. Extensive experiments show that our method outperforms other methods in the same application scenario.
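To make the described pipeline concrete, below is a minimal PyTorch sketch of a joint estimation network of this kind. It is an illustration under assumptions, not the paper's implementation: the class JointEstimationNet, the layer widths, the shared encoder with two task heads, the default of 19 classes, and the helper fuse_to_point_cloud (which back-projects the predicted dense depth through hypothetical pinhole intrinsics and attaches the argmax semantic label to each 3D point) are all invented for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEstimationNet(nn.Module):
    # Hypothetical sketch: a shared encoder over the concatenated
    # RGB + sparse-depth input, multi-scale feature fusion, and two
    # light task heads so one forward pass yields both outputs.
    def __init__(self, num_classes=19):
        super().__init__()
        self.enc1 = self._block(4, 32)    # full resolution (RGB=3 + sparse depth=1)
        self.enc2 = self._block(32, 64)   # 1/2 resolution
        self.enc3 = self._block(64, 128)  # 1/4 resolution
        self.dec = self._block(32 + 64 + 128, 64)
        self.seg_head = nn.Conv2d(64, num_classes, 1)
        self.depth_head = nn.Conv2d(64, 1, 1)

    @staticmethod
    def _block(cin, cout):
        return nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True))

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)
        f1 = self.enc1(x)
        f2 = self.enc2(F.max_pool2d(f1, 2))
        f3 = self.enc3(F.max_pool2d(f2, 2))
        size = f1.shape[-2:]
        # Multi-scale information: coarse features are upsampled and
        # fused with fine ones before the shared decoder.
        fused = torch.cat([
            f1,
            F.interpolate(f2, size=size, mode='bilinear', align_corners=False),
            F.interpolate(f3, size=size, mode='bilinear', align_corners=False)], dim=1)
        shared = self.dec(fused)
        return self.seg_head(shared), self.depth_head(shared)

def fuse_to_point_cloud(depth, seg_logits, fx, fy, cx, cy):
    # Back-project predicted dense depth via a pinhole camera model and
    # attach the argmax class label to each point ("fusion in point clouds").
    labels = seg_logits.argmax(dim=1)                      # (B, H, W)
    B, _, H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    z = depth[:, 0]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = torch.stack([x, y, z], dim=-1)                # (B, H, W, 3)
    return points.reshape(B, -1, 3), labels.reshape(B, -1)

# Example: one forward pass produces both predictions at once.
rgb = torch.rand(1, 3, 128, 256)
sparse = torch.rand(1, 1, 128, 256)
seg_logits, dense_depth = JointEstimationNet()(rgb, sparse)
pts, lbls = fuse_to_point_cloud(dense_depth, seg_logits, 200.0, 200.0, 128.0, 64.0)

The single shared encoder is what keeps such a design lightweight: both tasks reuse the same multi-scale features, so the per-task cost is only a 1x1 convolution head.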