Current research on multi-label image classification mainly focuses on exploiting the correlations between labels to improve classification accuracy. However, existing methods compute label correlation from the statistical information of the training data; such correlation is global and dataset-dependent, and is not suitable for every sample. In addition, the feature information of small objects is easily lost during image feature extraction, resulting in low classification accuracy for small objects. For this reason, this paper proposes a multi-label image classification model based on multi-scale semantic attention and a graph attention network. The model fuses multi-scale features to enhance the feature information of small objects, uses the self-attention mechanism in the graph attention module to adaptively mine the correlations between categories in the image, and introduces an attention regularization loss. The model reaches an mAP of 95.5% on VOC 2007 and 83.4% on MS-COCO 2014, outperforming existing state-of-the-art methods on most metrics.
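The per-image, adaptive label correlation described above can be sketched as scaled dot-product self-attention over per-category feature vectors. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name, the learned projections `Wq`/`Wk`, and all shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_label_correlation(label_feats, Wq, Wk):
    """Compute a per-image label-correlation matrix via scaled
    dot-product self-attention over category feature vectors.

    label_feats: (C, d) one feature vector per category for this image
                 (hypothetical input layout).
    Returns a (C, C) matrix whose row i gives attention weights from
    category i to every category, summing to 1.
    """
    q = label_feats @ Wq                      # (C, d_k) query projection
    k = label_feats @ Wk                      # (C, d_k) key projection
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot products
    return softmax(scores, axis=-1)

# Toy usage: 5 categories, 8-dim features, random projections.
rng = np.random.default_rng(0)
C, d, dk = 5, 8, 8
feats = rng.standard_normal((C, d))
A = self_attention_label_correlation(
    feats, rng.standard_normal((d, dk)), rng.standard_normal((d, dk)))
```

Because the attention matrix is computed from the features of the current image rather than from dataset-wide co-occurrence statistics, the resulting correlation is sample-specific, which is the property the abstract emphasizes.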
Person re-identification (Re-ID) is an object recognition task based on visual appearance information. Its main challenges are changes in person posture, shooting angle, viewpoint (front or back), and illumination, as well as noise caused by camera shake or blur. Currently, single-frame person Re-ID is still the mainstream line of research. Given the limited information in single-frame images, this paper adopts temporal attention sequence modeling to study person Re-ID on video sequences, exploiting not only the content of individual frames but also the motion information between frames.
In this paper, a temporal attention quality aware network (TA-QAN) is proposed. By extracting temporal information between frames through temporal convolution, the network effectively aggregates the complementary information across all frames of a sequence and significantly reduces the influence of low-quality image regions. Comparison experiments with other feature extraction methods show that the proposed method achieves the best performance on the PRID 2011 and iLIDS-VID datasets.
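The aggregation step can be sketched as temporal-attention pooling: a 1-D temporal convolution scores each frame from its local context, and the scores weight a per-frame feature average so that low-quality frames contribute less. This is a NumPy sketch under stated assumptions, not TA-QAN itself; the function name, the scoring kernel `w`, and the shapes are hypothetical.

```python
import numpy as np

def temporal_attention_pool(frame_feats, w):
    """Aggregate per-frame features into a single sequence descriptor.

    frame_feats: (T, d) one feature vector per frame.
    w: (k, d) kernel of a 1-D temporal convolution that produces one
       quality score per frame from its k-frame neighborhood
       (hypothetical scoring head).
    Returns a (d,) attention-weighted average of the frame features.
    """
    T, _ = frame_feats.shape
    k = w.shape[0]
    pad = k // 2
    padded = np.pad(frame_feats, ((pad, pad), (0, 0)))
    # Temporal convolution: score frame t from frames [t-pad, t+pad].
    scores = np.array([(padded[t:t + k] * w).sum() for t in range(T)])
    # Softmax over time: low-scoring (low-quality) frames get small weight.
    att = np.exp(scores - scores.max())
    att /= att.sum()
    return att @ frame_feats

# Toy usage: a 6-frame sequence of 4-dim features, 3-frame kernel.
rng = np.random.default_rng(1)
seq = rng.standard_normal((6, 4))
kernel = rng.standard_normal((3, 4))
v = temporal_attention_pool(seq, kernel)
```

Scoring each frame from its temporal neighborhood, rather than in isolation, lets the weights reflect motion cues between frames as well as single-frame content, matching the abstract's motivation for using temporal convolution.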