This paper presents an architecture and algorithms for model-based video object segmentation and its application to vision-augmented interactive games. We are especially interested in real-time, low-cost, vision-based applications that can be implemented in software on a PC. We use different models for the background and the player object. The object segmentation algorithm operates at two levels: the pixel level and the object level. At the pixel level, segmentation is formulated as a maximum a posteriori probability (MAP) estimation problem; the statistical likelihood of each pixel is calculated and used in the MAP decision. Object-level segmentation improves segmentation quality by exploiting information about the spatial and temporal extent of the object. The concept of an active region, defined from the motion histogram and trajectory prediction, is introduced to indicate how likely a region is to contain the video object, for both background and foreground modeling; it also reduces overall computational complexity. In contrast with other systems, the proposed video object segmentation system can create background and foreground models on the fly, even without introductory background frames. Furthermore, we apply different rates of self-tuning to the scene model so that the system can adapt when the scene changes. We applied the proposed segmentation algorithms to several prototype vision-augmented interactive games, in which a player can immerse himself or herself in a game and virtually interact with animated characters in real time without being constrained by helmets, gloves, special sensing devices, or the background environment. Potential applications of the proposed algorithms include human-computer gesture interfaces and object-based video coding such as MPEG-4.
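The pixel-level MAP decision described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual models: it assumes a Gaussian background model and a uniform foreground likelihood, with made-up parameter values, and assigns each pixel to the class with the larger posterior.

```python
import numpy as np

def map_classify(pixels, bg_mean, bg_var, fg_prob=1.0 / 256.0, prior_bg=0.7):
    """Return a boolean mask that is True where foreground wins the MAP test.

    Assumes a Gaussian background model and a uniform foreground
    likelihood; all parameters here are illustrative placeholders.
    """
    # Gaussian likelihood of each pixel under the background model.
    bg_like = np.exp(-(pixels - bg_mean) ** 2 / (2.0 * bg_var))
    bg_like /= np.sqrt(2.0 * np.pi * bg_var)
    # Uniform likelihood for the (unknown) foreground object.
    fg_like = np.full_like(pixels, fg_prob, dtype=float)
    # MAP decision: pick the class with the larger posterior
    # (likelihood times prior).
    return fg_like * (1.0 - prior_bg) > bg_like * prior_bg

frame = np.array([100.0, 102.0, 99.0, 200.0])  # one bright outlier pixel
mask = map_classify(frame, bg_mean=100.0, bg_var=4.0)
```

In a full system, per-pixel masks produced this way would then be refined by the object-level stage, which the abstract describes as using the spatial and temporal extent of the object.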
High-quality, low-delay MPEG-2 video coding can be achieved by avoiding the use of intra (I) and bidirectionally predicted (B) pictures. Such coding requires intra macroblock refreshing techniques for resilience against channel error propagation and for compliance with the accuracy requirements of the MPEG-2 standard. This paper describes several of these techniques and presents software simulation results on their performance in terms of image quality and their robustness to transmission channel errors.
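One common intra refreshing scheme (offered here only as an illustrative stand-in, since the abstract does not specify which techniques the paper covers) is cyclic refresh: each frame, a fixed-size slice of macroblocks is forced to intra mode, sweeping the whole picture over several frames so that channel errors cannot propagate indefinitely. The frame size and refresh rate below are arbitrary.

```python
def intra_refresh_schedule(num_macroblocks, per_frame, frame_index):
    """Indices of macroblocks forced to intra mode in the given frame.

    Cyclic refresh: a sliding window of `per_frame` macroblocks wraps
    around the picture, so every macroblock is intra-coded at least
    once every num_macroblocks / per_frame frames.
    """
    start = (frame_index * per_frame) % num_macroblocks
    return [(start + i) % num_macroblocks for i in range(per_frame)]

# With 6 macroblocks and 2 refreshed per frame, the whole picture is
# re-intra-coded within 3 frames.
covered = set()
for f in range(3):
    covered.update(intra_refresh_schedule(6, 2, f))
```

The refresh rate trades off error resilience against bit rate, since intra macroblocks cost more bits than predicted ones.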
In this paper we continue our study of fast motion estimation techniques for H.263 video coding. We experiment with a hybrid method based on two techniques: the three-step search method and our block-based gradient descent search method. Our method is a two-phase procedure. In the first phase, borrowing from the three-step method, we search on a sparse grid. In the second phase, the block-based gradient descent search is applied to the candidate(s) yielded by the first phase. The paper reports the various parameters we selected to test this new method, the running times of the algorithms under these parameters, and quality measurements for the resulting compressed video.
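The two-phase procedure can be sketched as follows, under simplifying assumptions: a single sparse-grid probe (borrowed from the three-step search) followed by a greedy 4-neighbour descent from the best grid point. The cost function, search range, and grid step are placeholders; a real encoder would use the sum of absolute differences between blocks.

```python
def hybrid_search(cost, search_range=7, grid_step=4):
    """Return the motion vector (dx, dy) minimizing `cost`.

    Phase 1 probes a sparse grid; phase 2 runs block-based gradient
    descent (move to the best 4-connected neighbour until no
    neighbour improves) from the phase-1 winner.
    """
    # Phase 1: coarse probe on a sparse grid, as in the three-step search.
    candidates = [(dx, dy)
                  for dx in range(-search_range, search_range + 1, grid_step)
                  for dy in range(-search_range, search_range + 1, grid_step)]
    best = min(candidates, key=cost)
    # Phase 2: greedy descent from the phase-1 candidate.
    while True:
        x, y = best
        neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1), best]
        nxt = min(neighbours, key=cost)
        if nxt == best:
            return best
        best = nxt

# Toy unimodal cost surface with its minimum at (3, -2).
mv = hybrid_search(lambda v: (v[0] - 3) ** 2 + (v[1] + 2) ** 2)
```

The sparse first phase keeps the descent from starting far from the true minimum, which is the failure mode of pure gradient descent search on block-matching error surfaces with multiple local minima.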
While block motion compensation has been the preferred method for reducing inter-frame dependencies in most video coding standards (H.261, MPEG), a new proposal for very low bit rate video coding (H.263) includes overlapped block motion compensation (OBMC) as an optional mode of operation. In this paper, we present fast algorithms for motion estimation when compensating with OBMC. Standard block matching motion vectors are not optimal for OBMC. Our algorithms estimate which block motion vectors yield the greatest improvement when optimized, order the blocks accordingly, and optimize the motion vectors based on this ordering. The estimation is based on readily available information from block matching, viz., the prediction errors over blocks. As the simulation results demonstrate, the algorithms achieve near-optimal performance at low computational cost. An additional advantage is that they may be terminated after only a few motion vectors have been optimized and still yield large performance gains. This is valuable in situations where the available computational power at the encoder varies (as in a videophony setting where the frame rate adapts to scene activity or available bandwidth) and it becomes desirable that the motion vectors chosen for optimization yield the highest possible gains.
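The ordering idea can be sketched as below. This is an illustrative assumption about the heuristic, not the paper's exact estimator: blocks whose block-matching prediction error is largest are taken to benefit most from OBMC motion-vector refinement, so they are optimized first, and the list can be truncated early when encoder cycles run out.

```python
def optimization_order(block_errors):
    """Block indices sorted by descending block-matching prediction error.

    Uses only information already available from block matching, so the
    ordering itself adds negligible cost.
    """
    return sorted(range(len(block_errors)),
                  key=lambda i: block_errors[i], reverse=True)

errors = [12.0, 45.0, 3.0, 27.0]     # placeholder per-block SAD values
order = optimization_order(errors)    # blocks to refine, most promising first
top_two = order[:2]                   # early termination after two blocks
```

Because the list is sorted by expected gain, terminating after the first few entries still captures most of the achievable improvement, matching the graceful-degradation property the abstract highlights.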
In this paper, we present an unsupervised orthogonalization neural network which, based on principal component (PC) analysis, acts as an orthonormal feature detector and decorrelation network. As in PC analysis, the network extracts the most information-laden features contained in the set of input training patterns. The network self-organizes its weight vectors so that they converge to a set of orthonormal vectors spanning the eigenspace of the correlation matrix of the input patterns. The network is therefore applicable to practical image transmission problems, exploiting the natural redundancy that exists in most images while preserving the quality of the compressed-decompressed image. We have applied the proposed neural model to image compression for visual communications. Simulation results show that the model provides a high compression ratio, excellent perceptual quality of the reconstructed images, and a small mean square error. Generalization performance and convergence speed are also investigated.
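A sketch of orthonormal principal-component learning in the spirit of this network is given below, using Sanger's generalized Hebbian rule as a stand-in (the paper's exact update is not reproduced here, and the data, learning rate, and epoch count are illustrative). The weight rows self-organize toward orthonormal eigenvectors of the input correlation matrix.

```python
import numpy as np

def gha_train(patterns, num_units, lr=0.01, epochs=200, seed=0):
    """Train weight rows toward orthonormal principal eigenvectors.

    Sanger's generalized Hebbian rule, used here as an assumed stand-in
    for the paper's learning rule.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_units, patterns.shape[1])) * 0.1
    for _ in range(epochs):
        for x in patterns:
            y = W @ x
            # Hebbian term minus projections onto already-learned
            # components; the lower-triangular mask enforces the
            # ordering that yields orthonormal rows.
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

# 2-D data with principal direction (1, 1)/sqrt(2) (variance 4)
# and minor direction (1, -1)/sqrt(2) (variance 0.25).
rng = np.random.default_rng(1)
a = rng.standard_normal(500) * 2.0
b = rng.standard_normal(500) * 0.5
data = (np.outer(a, [1.0, 1.0]) + np.outer(b, [1.0, -1.0])) / np.sqrt(2)
W = gha_train(data, num_units=2)
```

After training, `W @ W.T` is close to the identity, which is the orthonormality property that makes the learned transform usable as an analysis/synthesis pair for compression.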
A number of training algorithms for neural networks are based on the competitive learning method, which is regarded as an adaptive process for tuning neural networks to specific features of the input; the responses of the network then tend to become localized. A shortcoming of this model, however, is that some neural units can remain inactive: since a unit never learns unless it wins, some units may always be outperformed by others and therefore never learn. This paper presents a new unsupervised learning algorithm, less-interclass-disturbance (LID) learning, which addresses this limitation of the simple competitive neural network. The main idea is to reinforce the competing neurons in such a way as to prevent the weights from 'fooling around.' A new compound similarity metric is introduced to reduce interclass disturbance during training. The behavior of the algorithm was investigated through computer simulations, which show that LID learning is quite effective.
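The dead-unit problem that LID addresses can be demonstrated with a minimal sketch of simple competitive (winner-take-all) learning; the LID update itself is not reproduced here, and the data and initial weights are contrived so that one unit starts far from all inputs.

```python
import numpy as np

def competitive_train(data, weights, lr=0.1, epochs=50):
    """Simple winner-take-all competitive learning.

    Returns the trained weights and, per unit, how many times it won.
    Units that never win never move: the dead-unit problem.
    """
    weights = weights.copy()
    wins = np.zeros(len(weights), dtype=int)
    for _ in range(epochs):
        for x in data:
            # Winner: the unit whose weight vector is nearest to x.
            j = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            wins[j] += 1
            # Only the winner moves toward the input.
            weights[j] += lr * (x - weights[j])
    return weights, wins

data = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]])
init = np.array([[0.2, 0.1], [0.9, 1.1], [5.0, 5.0]])  # third unit is remote
w, wins = competitive_train(data, init)
```

The remotely initialized third unit is always outperformed by the other two, never wins, and so never updates, which is precisely the limitation the abstract attributes to the simple competitive model.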