KEYWORDS: Convolution, Data modeling, Video, RGB color model, Bone, Feature extraction, Motion models, Data hiding, Visual process modeling, Neural networks
Human action recognition has gradually become one of the most popular research topics in computer vision. Within this task, action recognition based on human skeleton data is especially attractive. Skeleton data contains rich correlation information and hidden information, so models for this task can effectively extract discriminative features, human movement trajectories, and similar cues, which play a key role in improving accuracy. At the same time, skeleton-based action recognition algorithms built on CNN, RNN, GCN, LSTM and other basic models have improved performance in terms of accuracy, computational complexity, and other aspects. From this perspective, this paper reviews the deep learning models and their variants for skeleton-based action recognition, and also summarizes the skeleton datasets used for such tasks.
The attention mechanism is one of the most basic and central techniques in computer vision. Its essence is to locate the information in the region of interest and suppress useless information. The results are usually presented as a probability map or a probability feature vector. The attention mechanism has become an important concept in convolutional neural networks; it has been widely studied in different application fields and has strong practical value. This paper introduces the classification of attention mechanisms and their application in fine-grained image recognition. The mechanisms are mainly divided into channel attention, spatial attention, and mixed channel-spatial attention. Finally, future research directions for attention mechanisms in fine-grained image recognition are discussed.
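The three categories named above can be illustrated with a minimal NumPy sketch. It is not the method of any surveyed paper: the bottleneck weights `w1` and `w2` are hypothetical random matrices standing in for learned parameters, and the spatial branch is a parameter-free simplification (published modules usually learn a convolution here).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Returns one weight in (0, 1) per channel, shape (C,)."""
    squeezed = feat.mean(axis=(1, 2))        # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck (hypothetical weights)
    return sigmoid(w2 @ hidden)

def spatial_attention(feat):
    """feat: (C, H, W). Returns a probability map over positions, shape (H, W)."""
    pooled = feat.mean(axis=0)               # average across channels -> (H, W)
    return sigmoid(pooled - pooled.mean())   # assumption: no learned convolution

def mixed_attention(feat, w1, w2):
    """Apply channel attention first, then spatial attention."""
    refined = feat * channel_attention(feat, w1, w2)[:, None, None]
    return refined * spatial_attention(refined)[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))        # toy feature map: 8 channels, 4x4
w1 = rng.standard_normal((2, 8)) * 0.1       # reduce 8 channels to 2
w2 = rng.standard_normal((8, 2)) * 0.1       # expand back to 8
out = mixed_attention(feat, w1, w2)
print(out.shape)
```

Because every attention weight lies strictly between 0 and 1, the output keeps the shape of the input feature map while scaling down channels and positions that receive low weights.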
KEYWORDS: Video, Convolution, Video surveillance, 3D modeling, Optical flow, Neural networks, Feature extraction, Motion models, Data modeling, Video processing
Human action recognition is a basic problem in video analysis. Deep learning methods for it have profound theoretical significance and strong application value in intelligent surveillance, autonomous driving, medical care, and other areas. The paper first summarizes video preprocessing and improvements to network structure, and reviews the latest and most widely discussed deep learning methods in human action recognition. It then lists and introduces in detail two kinds of datasets related to human action recognition.
Object detection is one of the most basic and central tasks in computer vision. Its goal is to find all objects of interest in an image and to determine their categories and locations. Object detection is widely used and has strong practical value and research prospects. Applications include face detection, pedestrian detection, and vehicle detection. In recent years, with the development of convolutional neural networks, significant breakthroughs have been made in object detection. This paper describes in detail the classification of object detection algorithms based on deep learning. The algorithms are mainly divided into one-stage and two-stage object detection algorithms. The paper also covers the general datasets and performance indicators of object detection.
To avoid the influence of external factors on recognition from RGB video and to improve the accuracy of human action recognition, an algorithm based on a Two-Stream Ind Recurrent Neural Network is proposed. For feature extraction, the temporal network extracts the 3D coordinates of the different joints at each time step and classifies them with a softmax layer. The spatial network converts the spatial positional relationships of the joints at each moment into a skeleton sequence and feeds it into a softmax layer for classification. Finally, the classification results of the temporal network and the spatial network are combined by a weighted sum to obtain the final classification result. Experiments verify the validity of the model on NTU RGB+D, the largest 3D skeleton action recognition dataset, and on the SBU interactive dataset.
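The final fusion step described above, a weighted sum of the two streams' softmax outputs, might look like the following sketch. The function name `fuse_streams`, the parameter `temporal_weight`, and the example logits are illustrative assumptions; the abstract does not state the weights the authors use.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_streams(temporal_logits, spatial_logits, temporal_weight=0.5):
    """Weighted sum of per-stream class probabilities.

    temporal_weight is a hypothetical hyperparameter in [0, 1];
    the spatial stream receives the remaining weight.
    """
    p_temporal = softmax(np.asarray(temporal_logits, dtype=float))
    p_spatial = softmax(np.asarray(spatial_logits, dtype=float))
    fused = temporal_weight * p_temporal + (1.0 - temporal_weight) * p_spatial
    return fused, int(np.argmax(fused))

# Made-up logits for three action classes.
fused, label = fuse_streams([2.0, 1.0, 0.1], [0.5, 2.5, 0.3])
print(fused, label)
```

Fusing probabilities rather than raw logits keeps the two streams on a common scale, so a single weight controls how much each stream contributes.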
To address the slow running speed of the Deformable Part Model algorithm in pedestrian detection, this paper incorporates a Cascade Detection algorithm and a Branch-and-Bound algorithm into a fast pedestrian detection algorithm based on the Deformable Part Model. During detection, a sequence model evaluates individual parts sequentially to quickly prune most of the less likely object hypotheses. This accelerates object localization and optimizes the global classification results over all possible image regions. Meanwhile, bounds on the maximum score are used to prune the search over windows. To improve detection speed without compromising detection accuracy, this paper increases the number of part models involved. According to experimental results on the INRIA dataset, the proposed algorithm improves both the accuracy and the speed of detection.
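The sequential part evaluation with early pruning can be sketched as follows. This covers only the cascade idea, not the Branch-and-Bound search, and it is not the authors' implementation: the function `cascade_score`, the part scores, and the per-stage thresholds are all hypothetical values chosen for illustration.

```python
def cascade_score(part_scores, thresholds):
    """Evaluate parts one at a time and stop as soon as a hypothesis falls behind.

    part_scores: score contributed by each part, in evaluation order.
    thresholds:  minimum cumulative score required after each part (hypothetical).
    Returns (accepted, cumulative_score, parts_evaluated).
    """
    total = 0.0
    evaluated = 0
    for score, threshold in zip(part_scores, thresholds):
        total += score
        evaluated += 1
        if total < threshold:
            # Prune: the remaining parts are never evaluated for this window.
            return False, total, evaluated
    return True, total, evaluated

thresholds = [0.5, 1.0, 2.0]
windows = {
    "likely pedestrian": [1.0, 0.8, 0.9],
    "background": [0.2, 0.9, 0.9],
}
for name, scores in windows.items():
    print(name, cascade_score(scores, thresholds))
```

The speed-up comes from the pruned windows: most background windows fail an early threshold, so only a small fraction of hypotheses pay the cost of evaluating every part.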