1 August 2022 Cross-modal representation learning based on contrast learning
WeiYang Mao, Jshardrom Xia
Proceedings Volume 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022); 1225710 (2022) https://doi.org/10.1117/12.2640128
Event: 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 2022, Guangzhou, China
Abstract
Cross-modal retrieval is the task of retrieving data in one modality using a query from another. Its core problem is measuring the similarity between data of different modalities across the semantic gap. This paper proposes a novel cross-modal retrieval model, CLCMR, which makes the recognition of similar samples more inclusive and the recognition of dissimilar samples more discriminative. Specifically, contrastive learning is introduced into the model: the pretraining dataset is processed with data augmentation, and the image encoder is pre-trained so that image features extracted from the same sample lie closer together while the features of merely similar samples remain distinguishable, which improves recognition accuracy. In the downstream task, to preserve the effect of the encoder, three loss functions are used to keep each modality's data faithful to its original representation when it is mapped into the common space, that is, to reduce the modality information lost in the space transformation. Finally, to semantically align images and text, Fast-RCNN is added to the CLCMR network framework for cross-modal retrieval. Experiments on the Pascal Sentence dataset and the XmediaNet dataset show that the proposed CLCMR framework achieves a higher mAP than several commonly used methods, which supports the feasibility and effectiveness of the method.
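The abstract does not state which contrastive objective is used for pre-training the image encoder. As an illustration only, the following is a minimal sketch of an InfoNCE-style loss, a common choice in contrastive learning, written in plain Python. The function name `info_nce_loss`, the `temperature` value, and the one-directional form (view A querying view B) are assumptions made for this example and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; assumes the vector is non-zero.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss (hypothetical helper, not the paper's code).

    view_a[i] and view_b[i] are embeddings of two augmented versions of
    the same image i. Every other row of view_b acts as a negative.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i in range(len(a)):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [dot(a[i], b[j]) / temperature for j in range(len(b))]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denom - logits[i]
    return total / len(a)

if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]   # augmentations of the same samples
    swapped = [[0.1, 0.9], [0.9, 0.1]]   # augmentations paired with the wrong sample
    print(info_nce_loss(anchors, matched))
    print(info_nce_loss(anchors, swapped))
```

Minimizing this quantity pulls the two views of the same sample together and pushes views of different samples apart, so the matched pairing above yields a smaller loss than the swapped one.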
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
WeiYang Mao and Jshardrom Xia "Cross-modal representation learning based on contrast learning", Proc. SPIE 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 1225710 (1 August 2022); https://doi.org/10.1117/12.2640128
KEYWORDS
Data modeling
Feature extraction
Statistical modeling
Performance modeling
Image enhancement
Image processing
Visualization