Few-shot classification aims to classify samples from a limited quantity of labeled training data, and it can be widely applied in practical scenarios such as wastewater treatment plants and healthcare. Compared with traditional methods, existing deep metric-based algorithms excel at few-shot classification tasks, but some issues remain to be investigated. While current standard convolutional networks can extract expressive deep features, they do not fully exploit the relationships among input sample attributes. Two problems arise here: (1) how to extract more expressive features and transform them into attributes, and (2) how to obtain the optimal combination of sample class attributes. This paper proposes a few-shot classification method based on manifold metric learning (MML), with the feature space embedded in the manifold of symmetric positive definite (SPD) matrices, to overcome the above limitations. First, significant features are extracted using the proposed joint dynamic convolution module. Second, the definition and properties of geodesic strict convexity on Riemannian manifolds are used to minimize the proposed MML loss function and obtain the optimal attribute correlation matrix A. We theoretically prove that the MML loss is geodesically strictly convex on the SPD manifold and obtain the global optimal solution in closed form. Extensive experimental results on popular datasets show that our proposed approach outperforms other state-of-the-art methods.
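The abstract does not spell out the metric used on the SPD manifold. As a minimal illustrative sketch (not the paper's actual MML loss), the snippet below computes the log-Euclidean distance between two SPD matrices, a standard way to measure distances in the geometry the abstract refers to; the function names are ours, and the covariance matrices are toy stand-ins for feature matrices.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T  # V diag(log w) V^T

def log_euclidean_distance(A, B):
    """Log-Euclidean distance between two SPD matrices: ||log(A) - log(B)||_F."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")

# Toy SPD matrices: regularized covariances of random feature sets
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((50, 4)), rng.standard_normal((50, 4))
A = X.T @ X / 50 + 1e-3 * np.eye(4)
B = Y.T @ Y / 50 + 1e-3 * np.eye(4)
print(log_euclidean_distance(A, B))
```

Because the matrix logarithm flattens the manifold into a vector space, distances computed this way are symmetric and vanish only when the two matrices coincide, which is the behavior a metric-learning loss on SPD matrices relies on.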
Online multiobject tracking (MOT), which integrates object detection and tracking into a single network, has made breakthroughs in recent years. However, most online trackers make rather monotonous predictions of the tracking offset between two consecutive frames, which may not be reliable in extreme situations such as occlusion and object deformation. Once the tracking offset of an object is biased, its corresponding tracklet no longer maintains temporal consistency, which seriously degrades tracking performance. In this paper, we propose a new online multiple-object tracker with a feature enhancement mechanism, named En-Tracker. In En-Tracker, a multibranch kinematic analysis network (MKANet) is designed to address the above problems. MKANet estimates the pixel offset and instance offset of the object in parallel, imitating the way humans jointly reason over position and appearance representations. Note that these two types of offsets compensate for and facilitate each other to deal effectively with some extreme scenarios. In addition, we propose a kinematic-assisted feature synthesis enhancement (KFSE) module, which provides a more comprehensive enhancement mechanism. Specifically, KFSE propagates previous tracking information to the current frame based on kinematic trend analysis while enhancing the characterization of detection features and appearance embeddings, which not only assists object detection but also ensures the uniqueness of the appearance embeddings. Extensive experiments on MOT16 and MOT17 verify the effectiveness and advantages of our model.
Visual simultaneous localization and mapping (VSLAM) is one of the principal technologies enabling intelligent robots to perceive their environment. Many research works have focused on proposing comprehensive and integrated systems based on the static-environment assumption. However, elements whose motion status changes frequently, namely short-term dynamic elements, can significantly affect system performance. Therefore, it is crucial to cope with short-term dynamic elements to make VSLAM systems more adaptable to dynamic scenes. This paper proposes a coarse-to-fine elimination strategy for short-term dynamic elements based on a motion status check (MSC) and feature points update (FPU). First, an object detection module is designed to obtain semantic information and screen out potential short-term dynamic elements. Then, an MSC module is proposed to judge the true status of these elements and thus ultimately determine whether to eliminate them. In addition, an FPU module is introduced to update the extracted feature points by calculating a dynamic region factor, improving the robustness of the VSLAM system. Quantitative and qualitative experiments on two challenging public datasets are performed. The results demonstrate that our method effectively eliminates the influence of short-term dynamic elements and outperforms other state-of-the-art methods.
The goal of pedestrian trajectory prediction is to predict a pedestrian's future trajectory from the historical one. Multimodal information in the historical trajectory, especially visual information and position coordinates, is conducive to perception and positioning. However, most current algorithms ignore the significance of multimodal information in the historical trajectory. We formulate pedestrian trajectory prediction as a multimodal problem, in which the historical trajectory is divided into image and coordinate information. Specifically, we apply a fully connected long short-term memory (FC-LSTM) network and a convolutional LSTM (ConvLSTM) to receive and process location coordinates and visual information, respectively, and then fuse the two streams with a multimodal fusion module. An attention pyramid social interaction module is then built on the fused information to adaptively reason about the complex spatial and social relations between a target and its neighbors. The proposed approach is validated on several experimental tasks, on which it achieves better accuracy than competing methods.
Visual target tracking is an important function in real-time video monitoring applications, and its performance determines the feasibility of many advanced tasks. At present, Siamese-network trackers based on template matching show great potential. They strike a balance between accuracy and speed, owing to a pre-trained convolutional network that extracts deep features for target representation and offline tracking of each frame. During tracking, however, existing algorithms obtain the target template feature only from the first frame of the video. The tracking performance then depends entirely on the template-matching framework, making frames independent of one another and ignoring the inter-frame connections of the video sequence. Therefore, existing algorithms do not perform well in the face of large deformation and severe occlusion. We propose an LSTM-improved Siamese network (LSiam) model, which combines the time-domain regression capability of long short-term memory (LSTM) with the balanced accuracy and speed of a Siamese network. It focuses on the temporal and spatial correlation between video frames, improving traditional Siamese-network trackers with an LSTM prediction module. In addition, an improved template-updating module is constructed to combine the original template with the changed appearance. The proposed model is verified in two types of difficult scenarios: deformation and occlusion. Experimental results show that our proposed approach achieves better tracking accuracy.
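The abstract says the template-updating module combines the original template with the changed appearance but does not give the rule. A common scheme for this, shown here purely as an assumed sketch (the function name and blending rule are ours, not the paper's), is linear interpolation between the fixed first-frame template and the current-frame appearance feature:

```python
import numpy as np

def update_template(initial, current, alpha=0.1):
    """Blend the fixed first-frame template with the latest appearance.

    alpha controls how quickly the template adapts: alpha=0 keeps the
    original template unchanged, alpha=1 replaces it entirely.
    """
    return (1.0 - alpha) * initial + alpha * current

template = np.ones((6, 6))          # stand-in for a first-frame feature map
appearance = np.full((6, 6), 3.0)   # stand-in for the current-frame feature
template = update_template(template, appearance, alpha=0.25)
print(template[0, 0])  # 0.75*1 + 0.25*3 = 1.5
```

Keeping alpha small retains the reliable first-frame anchor while still tracking gradual appearance change, which is exactly the trade-off a deformation-robust updater has to make.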
Most current semantic segmentation approaches achieve state-of-the-art performance by relying on fully convolutional networks. However, consecutive operations such as pooling or convolution striding lead to spatially disjointed object boundaries. We present a dense boundary regression architecture (DBRS2) that uses boundary cues to aid the high-level semantic segmentation task. Specifically, we first propose a multilevel guided low-level boundary (MG-LB) learning method, in which multilevel convolutional features guide low-level boundary detection. The predicted MG-LB boundaries enable consistent spatial grouping and precise adherence to segment boundaries. Then, we present a global energy model based on a boundary penalty and an appearance penalty, defined respectively on the predicted boundaries and on the coarse segmentations obtained by the DeepLabv3 network. Finally, the refined segmentations are obtained by minimizing the global energy model. Extensive experiments on the PASCAL VOC 2012, ADE20K, CamVid, and BSD500 datasets demonstrate that the proposed approach obtains state-of-the-art performance on both semantic segmentation and boundary detection tasks.
Convolutional neural network (CNN)-based approaches have achieved state-of-the-art results in scene classification. Features from the output of fully connected (FC) layers express one-dimensional semantic information but lose the detailed information of objects and the spatial information of scene categories. By contrast, deep convolutional features have proven more suitable for describing an object itself and the spatial relations among objects in an image. In addition, the feature map from each layer is max-pooled within local neighborhoods, which weakens global consistency and is unfavorable for scenes with highly complicated variation. To cope with these issues, an orderless multichannel mid-level image representation built on pre-trained CNN features is proposed to improve classification performance. The mid-level image representations of two channels, from the FC layer and the deep convolutional layer, are integrated at multiple scales. A sum-pooling approach is also employed to aggregate the multi-scale mid-level image representation, highlighting the descriptors beneficial for scene classification. Extensive experiments on the SUN397 and MIT Indoor 67 datasets demonstrate that the proposed method achieves promising classification performance.
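To make the "orderless sum pooling" step concrete, the sketch below (our own illustration, not the paper's implementation) treats each spatial position of a convolutional feature map as a local descriptor and sum-pools the descriptors across positions and scales, yielding a single vector that discards spatial layout:

```python
import numpy as np

def sum_pool_descriptors(feature_maps):
    """Aggregate multi-scale conv feature maps into one orderless descriptor.

    Each map has shape (C, H, W); its spatial positions are treated as a set
    of C-dimensional local descriptors and summed, so the result ignores
    where in the image each descriptor came from.
    """
    pooled = [fm.reshape(fm.shape[0], -1).sum(axis=1) for fm in feature_maps]
    return np.sum(pooled, axis=0)

# Toy feature maps at three scales, all with C = 8 channels
rng = np.random.default_rng(1)
scales = [rng.standard_normal((8, s, s)) for s in (4, 8, 16)]
descriptor = sum_pool_descriptors(scales)
print(descriptor.shape)  # (8,)
```

Because summation is order-invariant, the pooled vector is unchanged if objects move around the scene, which is why this kind of aggregation helps with the "highly complicated variation" the abstract mentions.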
KEYWORDS: Principal component analysis, Signal-to-noise ratio, Telescopes, Databases, Space telescopes, Electronics engineering, Pattern recognition, Astronomical telescopes, Absorption, Spectral data processing
The problem of identifying spectra collected by large sky survey telescopes is urgent to study, as it helps astronomers discover new celestial bodies. Because spectral data are high-dimensional and voluminous, principal component analysis (PCA) is commonly used to extract features and reduce computation. Like many other matrix factorization methods, however, PCA lacks intuitive meaning because its components can be negative. In this paper, non-negative matrix factorization (NMF), distinguished from PCA by its use of non-negativity constraints, is applied to stellar spectral type classification. First, NMF is used to extract features and compress the data. Then, an efficient classifier based on a distance metric is designed to identify stellar types from the compressed data. Experimental results on more than 70,000 real stellar spectra from the Sloan Digital Sky Survey (SDSS) show that the proposed method performs well, and the method is promising for large sky survey telescope projects.
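The two-stage pipeline described here (NMF compression, then distance-based classification) can be sketched end to end. The snippet below is an illustrative toy version under our own assumptions: a basic Lee-Seung multiplicative-update NMF, a nearest-centroid classifier standing in for the paper's unspecified distance-metric classifier, and synthetic nonnegative "spectra" instead of SDSS data.

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Basic NMF via Lee-Seung multiplicative updates: V (n x d) ~ W (n x k) @ H (k x d)."""
    rng = np.random.default_rng(seed)
    n, d = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, d)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update compressed features
    return W, H

def classify(W_train, labels, W_test):
    """Nearest-centroid classification in the compressed NMF feature space."""
    classes = np.unique(labels)
    centroids = np.stack([W_train[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(W_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Toy "spectra": two classes built from distinct nonnegative templates plus noise
rng = np.random.default_rng(42)
t0, t1 = rng.random(30), rng.random(30)
V = np.vstack([t0 + 0.01 * rng.random(30) for _ in range(10)] +
              [t1 + 0.01 * rng.random(30) for _ in range(10)])
labels = np.array([0] * 10 + [1] * 10)

W, H = nmf(V, k=2)                 # compress 30-dim spectra to 2 NMF features
preds = classify(W, labels, W)
print((preds == labels).mean())    # fraction of correct predictions
```

Because every entry of W and H stays nonnegative, each spectrum is expressed as an additive mixture of nonnegative basis spectra, which is the interpretability advantage over PCA that the abstract highlights.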