This paper presents a new method for action recognition using an extremely low-resolution infrared imaging sensor. Thermopile arrays preserve user privacy, but at the cost of the limited information they capture, and the question of which methods suit this sensor remains open. In this work, we adopt a two-stream deep learning architecture that accepts both spatial and temporal sequences, processes them separately with a CNN and stacked GRU layers, and finally fuses the features for action classification. To the best of our knowledge, this is the first optical-flow-based method applied to extremely low-resolution thermal image sequences. We use a dataset of 16 × 16 pixel image sequences introduced in related work so that results can be compared directly, and demonstrate the superiority of our method. Experiments show a gain of nearly 6% (96.98% vs. 91.07%) in recognition accuracy in a 5-class classification setup.
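As a rough illustration of the architecture described above, here is a minimal PyTorch sketch of a two-stream network of this kind; the layer sizes, GRU depth, and sequence length are illustrative assumptions, not the configuration reported in the abstract.

```python
# Minimal sketch of a two-stream CNN + stacked-GRU network for 16x16
# thermal sequences (all sizes are illustrative assumptions).
import torch
import torch.nn as nn

class Stream(nn.Module):
    """One stream: a small CNN applied per frame, then stacked GRUs."""
    def __init__(self, in_ch, feat=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 8x8 -> 4x4
            nn.Flatten(), nn.Linear(32 * 4 * 4, feat),
        )
        self.gru = nn.GRU(feat, hidden, num_layers=2, batch_first=True)

    def forward(self, x):                 # x: (B, T, C, 16, 16)
        b, t = x.shape[:2]
        f = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(f)
        return h[-1]                      # last-layer hidden state

class TwoStream(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.spatial = Stream(in_ch=1)    # raw thermal frames
        self.temporal = Stream(in_ch=2)   # optical flow (dx, dy)
        self.head = nn.Linear(2 * 128, n_classes)

    def forward(self, frames, flow):
        fused = torch.cat([self.spatial(frames), self.temporal(flow)], dim=1)
        return self.head(fused)

model = TwoStream()
logits = model(torch.randn(4, 20, 1, 16, 16), torch.randn(4, 20, 2, 16, 16))
print(logits.shape)  # torch.Size([4, 5])
```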
The goal of this research is to apply state-of-the-art deep learning to human fall-down event detection based on Motion History Images (MHI) from multiple color video sequences captured at different viewing angles. An MHI is derived by detecting and combining temporal 2D human contours from surveillance cameras, so a human action can be represented by several consecutive MHI images. We then use a deep learning approach (CNN + LSTM architectures) to recognize fall-down behavior from MHI sequences. Our method recognizes not only walking, standing, and falling down, but also rising after a fall, which avoids excessive false alarms. Classification accuracy over these four short-term actions reaches 97.66%. We also compare the performance of deep learning architectures that use a simple CNN or CNN+LSTM, one- or two-stage training, and a single camera or two cameras. Our contributions are twofold: (1) improving human action recognition performance based on MHIs and a combined CNN+LSTM architecture, and (2) preventing false alarms for falling-down events that actually need no help.
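As a rough illustration of the MHI construction step described above, here is a minimal NumPy sketch; the decay duration tau and the motion threshold are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of Motion History Image (MHI) construction: pixels that
# just moved are set to tau, and older motion decays linearly toward zero.
import numpy as np

def motion_history(frames, tau=30, thresh=25):
    """frames: (T, H, W) uint8 grayscale sequence. Returns the final MHI."""
    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for prev, cur in zip(frames[:-1], frames[1:]):
        moving = np.abs(cur.astype(np.int16) - prev.astype(np.int16)) > thresh
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))  # set or decay
    return (mhi / tau * 255).astype(np.uint8)   # normalize for the CNN input

seq = np.random.randint(0, 256, (40, 120, 160), dtype=np.uint8)
print(motion_history(seq).shape)  # (120, 160)
```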
Rate control for video encoders can be partitioned into rate-allocation and bit-achievement procedures at several levels. For non-real-time applications, we propose a two-pass video encoding mechanism in which frame-level rate allocation in the second pass uses content-aware models constructed from information collected in the first pass. In the proposed scheme, a video sequence is divided into non-overlapping windows, to each of which the two-pass procedure is applied sequentially. The goal of the first pass is to find the encoding modes and motion vectors (MVs) for each MB and to build the proposed R-λ (rate vs. Lagrangian multiplier) and D-λ (distortion vs. Lagrangian multiplier) models that represent the content characteristics of each frame. In the second pass, the encoding modes and MVs found for the MBs in the first pass are retained, but the QPs are re-assigned MB by MB according to bit-rate and distortion predictions based on the models constructed above. The performance of the proposed two-pass encoder was compared with the H.264 reference model JM8.0. The proposed method outperforms JM8.0 in PSNR by up to 1.38 dB on the test sequences used, keeps the bit rate far more precisely controlled (< 0.017% error, two orders of magnitude better than JM8.0), and yields slightly smoother video quality.
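As a rough illustration of how the second-pass allocation could use such models, here is a minimal sketch that fits a power-law R-λ model to first-pass samples, inverts it for a frame's bit budget, and maps the resulting λ to a QP via a common H.264 heuristic; the power-law form, the QP-λ relation, and all numbers are assumptions, not the paper's exact formulation.

```python
# Minimal sketch of second-pass rate allocation with a fitted R-lambda model.
import numpy as np

def fit_power_law(lam, r):
    """Fit r = a * lam**b by linear regression in log-log space."""
    b, log_a = np.polyfit(np.log(lam), np.log(r), 1)
    return np.exp(log_a), b

def lambda_for_bits(a, b, r_target):
    """Invert R(lambda) = a * lambda**b for the target bit count."""
    return (r_target / a) ** (1.0 / b)

def qp_from_lambda(lam):
    """Common H.264 heuristic: lambda = 0.85 * 2**((QP - 12) / 3)."""
    return int(round(12 + 3 * np.log2(lam / 0.85)))

# First-pass (lambda, bits) samples for one frame (illustrative numbers).
lams = np.array([10.0, 20.0, 40.0, 80.0])
bits = np.array([9000., 6200., 4300., 3000.])
a, b = fit_power_law(lams, bits)
lam = lambda_for_bits(a, b, r_target=5000)
print(qp_from_lambda(lam))
```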
This paper describes a watermarking scheme that embeds both an image-dependent mark and a fixed-part mark for dual protection (content authentication and copyright claim) of JPEG images. To achieve efficiency, imperceptibility, and robustness, we develop a compressed-domain informed embedding algorithm that adopts a Lagrangian-multiplier optimization approach followed by an iterative refinement procedure. To robustly detect the fixed-part watermark, a two-stage extraction procedure is devised. In the first stage, the semi-fragile watermark in each channel is extracted for content authentication. In the second stage, a weighted soft-decision decoder, in which the signal detected in each channel is weighted according to the estimated channel condition, raises the recovery rate of the fixed-part watermark for copyright protection. Experimental results demonstrate that the proposed scheme not only achieves content authentication (semi-fragile watermarks that resist mild image alterations and detect maliciously tampered regions) and copyright protection (robust watermarks that claim ownership), but also maintains higher visual quality (at least 4 dB above the prior method) at a specified watermark robustness.
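As a rough illustration of the weighted soft-decision idea, here is a minimal sketch in which per-channel correlator outputs are combined with weights proportional to an estimated channel quality before a hard decision; the weighting rule and the numbers are illustrative assumptions.

```python
# Minimal sketch of weighted soft-decision decoding of the fixed-part mark.
import numpy as np

def weighted_soft_decode(soft_bits, channel_quality):
    """
    soft_bits: (n_channels, n_bits) correlator outputs, sign encodes the bit.
    channel_quality: (n_channels,) estimated condition, larger = cleaner.
    """
    w = channel_quality / channel_quality.sum()          # normalize weights
    combined = (w[:, None] * soft_bits).sum(axis=0)      # weighted average
    return (combined > 0).astype(int)                    # hard decision

soft = np.array([[+0.9, -0.2, +0.4],    # relatively clean channel
                 [-0.3, -0.8, +0.1]])   # degraded channel
quality = np.array([3.0, 1.0])
print(weighted_soft_decode(soft, quality))  # [1 0 1]
```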
In this paper, we propose a multi-mode linear prediction (MM_LP) scheme for the compression of multi-spectral satellite images. Extending our prior work on block-based single-mode linear prediction, the scheme discards the prediction residuals and transforms the traditional residual-encoding problem into a mode-map encoding problem. The extra storage required for the additional predictor coefficients is nearly negligible, and the mode map can be expected to compress more efficiently than the residuals. We also propose an alternative scheme that hides the mode information in the LSB (least significant bit) of the residual data, which is then encoded to give nearly lossless compression with PSNR above 51 dB (error variance σ² = 0.5 per pixel). Comprehensive experiments justify the performance of our MM_LP schemes and suggest that MM_LP (k ≥ 2) is suitable for PSNR below 41.5 dB, single-mode LP (k = 1) for PSNR between 41.5 dB and 50 dB, and the 2-mode mode-embedding approach for PSNR above 50 dB.
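As a rough illustration of the mode-map idea, here is a minimal sketch that selects, per block, the candidate predictor with the smallest residual energy and records only its index; the two candidate predictors are illustrative stand-ins, not the coefficients used in MM_LP.

```python
# Minimal sketch of per-block mode selection for multi-mode linear
# prediction: only the winning mode index is kept, not the residuals.
import numpy as np

PREDICTORS = [
    lambda blk: np.roll(blk, 1, axis=1),   # mode 0: predict from left pixel
    lambda blk: np.roll(blk, 1, axis=0),   # mode 1: predict from top pixel
]   # np.roll wraps at block edges; a simplification for illustration only

def encode_mode_map(image, bs=8):
    """image: 2D array with sides divisible by bs. Returns the mode map."""
    h, w = image.shape
    modes = np.zeros((h // bs, w // bs), dtype=np.uint8)
    for i in range(0, h, bs):
        for j in range(0, w, bs):
            blk = image[i:i + bs, j:j + bs].astype(np.int32)
            errs = [np.sum((blk - p(blk)) ** 2) for p in PREDICTORS]
            modes[i // bs, j // bs] = int(np.argmin(errs))
    return modes   # entropy-coded instead of the residuals

img = np.random.randint(0, 256, (64, 64))
print(encode_mode_map(img))
```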
A mechanism is presented to achieve adaptive scene contrast enhancement, a common problem in TV and IR imager applications, by controlling camera gain and pedestal automatically. The goal of adaptivity, in a precise sense, is "content windowing": image signals of interest are selectively extracted and contrast enhanced, possibly with both dynamic-range compression and expansion. We adopt an image-analysis strategy, distinct from classical electronic methods (e.g., automatic gain control circuitry), in which the overall behavior of the frame pixels (e.g., the image histogram) is optimized to provide feedback control of camera gain and pedestal in a live video process. The video formation process is modeled linearly, so that an automatic control law can be derived to meet the proposed image-quality criterion, which is simple and flexible enough for practical use in a variety of applications. Experiments show that our method adapts well in dynamic environments and can easily be implemented in hardware.
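As a rough illustration of the feedback principle, here is a minimal sketch that models the camera output linearly as out = gain * scene + pedestal and solves for the gain and pedestal that map the observed content window onto a target display window; the percentile choices and the target window are assumptions, not the paper's criterion.

```python
# Minimal sketch of histogram-driven gain/pedestal feedback control
# under a linear video-formation model.
import numpy as np

def update_gain_pedestal(frame, gain, pedestal, lo_t=10, hi_t=245):
    """frame: current output frame (uint8). Returns new (gain, pedestal)."""
    lo, hi = np.percentile(frame, [1, 99])        # observed content window
    # Invert the linear model to estimate the underlying scene window.
    scene_lo = (lo - pedestal) / gain
    scene_hi = (hi - pedestal) / gain
    # Choose gain' and pedestal' so the scene window maps to [lo_t, hi_t].
    new_gain = (hi_t - lo_t) / max(scene_hi - scene_lo, 1e-6)
    new_pedestal = lo_t - new_gain * scene_lo
    return new_gain, new_pedestal

frame = np.random.randint(80, 120, (240, 320)).astype(np.uint8)
print(update_gain_pedestal(frame, gain=1.0, pedestal=0.0))
```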