Our interest is in data registration, object recognition, and object tracking using 3D point clouds. Our feature matching system has three steps: detection, description, and matching; our focus is on the feature description step. We describe new rotation-invariant 3D feature descriptors that adapt techniques from the successful 2D SIFT descriptor. We experiment with a variety of synthetic and real data to show how well our newly developed descriptors perform relative to a commonly used 3D descriptor, spin images. Our results show that our descriptors are more distinctive than spin images while remaining rotation and translation invariant. The improvement in performance in comparison to spin images is most evident when an object has features that are mirror images of each other due to symmetry.
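For reference, the baseline this abstract compares against is the spin image of Johnson and Hebert, which histograms each neighbor of an oriented point by its radial distance from the normal axis (alpha) and its signed height along the normal (beta). Below is a minimal sketch; the bin size, image dimensions, and support radius are illustrative values, not parameters from the paper.

```python
import numpy as np

def spin_image(points, p, n, bin_size=0.05, num_bins=16, support=0.8):
    """Minimal spin-image sketch: accumulate neighbors of an oriented point
    (p, n) into a 2D (alpha, beta) histogram. All parameters are illustrative."""
    d = points - p                         # offsets from the basis point
    beta = d @ n                           # signed distance along the normal
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta**2, 0.0))  # radial distance
    # keep only neighbors inside the cylindrical support region
    mask = (alpha < support) & (np.abs(beta) < support)
    i = np.minimum((alpha[mask] / bin_size).astype(int), num_bins - 1)
    j = np.minimum(((beta[mask] + support) / bin_size).astype(int), 2 * num_bins - 1)
    img = np.zeros((2 * num_bins, num_bins))
    np.add.at(img, (j, i), 1.0)            # bilinear spreading omitted for brevity
    return img
```

Because the azimuthal angle around the normal is discarded, mirror-image surface patches yield identical spin images, which is precisely the symmetry ambiguity the abstract's descriptors aim to resolve.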
In this paper, we focus on the problem of automated surveillance in a parking lot scenario. We call our research system
VANESSA, for Video Analysis for Nighttime Surveillance and Situational Awareness. VANESSA is capable of: 1)
detecting moving objects via background modeling and false motion suppression, 2) tracking and classifying pedestrians
and vehicles, and 3) detecting events such as a person entering or exiting a vehicle. Moving object detection utilizes a
multi-stage cascading approach to identify pixels that belong to the true objects and reject spurious motion (e.g.,
due to vehicle headlights or moving foliage). Pedestrians and vehicles are tracked using a multiple hypothesis tracker
coupled with a particle filter for state estimation and prediction. The space-time trajectory of each tracked object is
stored in an SQL database along with sample imagery to support video forensics applications. The detection of pedestrians
entering/exiting vehicles is accomplished by first estimating the three-dimensional pose and the corresponding entry
and exit points of each tracked vehicle in the scene. A pedestrian activity model is then used to probabilistically assign
pedestrian tracks that appear or disappear in the vicinity of these entry/exit points. We evaluate the performance of
tracking and pedestrian-vehicle association on an extensive data set collected in a challenging real-world scenario.
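As one concrete illustration of the state estimation component, the sketch below shows a single predict/update/resample cycle of a particle filter under an assumed constant-velocity motion model with a Gaussian position likelihood; the paper's actual dynamics, measurement model, and coupling to the multiple hypothesis tracker are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, z, dt=1.0, q=1.0, r=4.0):
    """One predict/update/resample cycle of a constant-velocity particle filter,
    a hedged sketch of the state estimator paired with the MHT.
    particles: (N, 4) array of [x, y, vx, vy]; z: observed (x, y) position."""
    # predict: propagate each particle under constant velocity plus process noise
    particles[:, 0:2] += particles[:, 2:4] * dt
    particles += rng.normal(0.0, q, particles.shape)
    # update: reweight by Gaussian likelihood of the position measurement
    d2 = np.sum((particles[:, 0:2] - z) ** 2, axis=1)
    weights *= np.exp(-0.5 * d2 / r**2)
    weights /= weights.sum() + 1e-300      # guard against total underflow
    # resample when the effective sample size collapses
    if 1.0 / np.sum(weights**2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles, weights = particles[idx], np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```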
Moving cameras are needed for a wide range of applications in robotics, vehicle systems, and surveillance. However, many foreground object segmentation methods reported in the literature are unsuitable for such settings: they assume a fixed camera and a slowly changing background, and are inadequate when there is significant motion of the camera or background. To address this shortcoming, a new method for segmenting foreground objects is proposed that utilizes binocular video. The method is demonstrated in the application of tracking and segmenting people in video who are approximately facing the binocular camera rig. Given a stereo image pair, the system first tries to find faces. Starting at each face, the region containing the person is grown by merging regions from an over-segmented color image, with the disparity map guiding the merging process. The system has been implemented on a consumer-grade PC and tested on video sequences of people indoors, obtained from a moving camera rig. As expected, the proposed method works well in situations where other foreground-background segmentation methods typically fail. We believe this superior performance is due partly to the use of object detection to guide region merging in disparity/color foreground segmentation, and partly to the use of disparity information available with a binocular rig, in contrast with most previous methods, which assume monocular sequences.
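The following sketch illustrates the kind of disparity-guided merging described above, under the assumption that the over-segmentation is given as a label image with a precomputed region adjacency graph; the threshold `disp_tol` is a hypothetical stand-in for the paper's merging criterion.

```python
import numpy as np
from collections import deque

def grow_person_region(labels, disparity, adjacency, seed, disp_tol=2.0):
    """Hedged sketch of disparity-guided region growing: starting from the
    over-segmented region containing a detected face (`seed`), absorb adjacent
    regions whose mean disparity is close to the seed region's.
    `adjacency` maps each region id to the ids of its neighboring regions."""
    mean_disp = {r: disparity[labels == r].mean() for r in adjacency}
    person, frontier = {seed}, deque([seed])
    while frontier:
        r = frontier.popleft()
        for nb in adjacency[r]:
            if nb not in person and abs(mean_disp[nb] - mean_disp[seed]) < disp_tol:
                person.add(nb)
                frontier.append(nb)
    return person  # set of region ids covering the person
```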
We present a framework for estimating 3D relative structure (shape) and motion of objects undergoing non-rigid deformation, as observed from a fixed camera under perspective projection. Deforming surfaces are approximated as piecewise planar and piecewise rigid. Robust registration methods allow tracking of corresponding image patches from view to view and recovery of 3D shape despite occlusions, discontinuities, and varying illumination conditions. Many relatively small planar/rigid image patch trackers are scattered throughout the image; the resulting estimates of structure and motion at each patch are combined over local neighborhoods via an oriented particle systems formulation. Preliminary experiments have been conducted on real image sequences of deforming objects and on synthetic sequences where ground truth is known.
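As a rough illustration of combining per-patch estimates over local neighborhoods, the sketch below Gaussian-blends the plane normals of neighboring patches; the actual oriented particle systems formulation also involves energy terms (e.g., co-planarity and co-normality), which are omitted here.

```python
import numpy as np

def smooth_patch_normals(centers, normals, sigma=0.1):
    """Hedged sketch of neighborhood combination: replace each patch's plane
    normal with a Gaussian-weighted average of its neighbors' normals.
    centers: (N, 3) patch centers; normals: (N, 3) unit plane normals."""
    d2 = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    w = np.exp(-0.5 * d2 / sigma**2)       # neighborhood weights
    blended = w @ normals                  # weighted sum of neighbor normals
    return blended / np.linalg.norm(blended, axis=1, keepdims=True)
```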
A computer vision method is presented for recognizing the non-rigid motion of objects moving in a 3D environment. This method is embedded in a more complete mechanism that integrates low-level (image processing), mid-level (recursive 3D trajectory estimation), and high-level (action recognition) processes. Multiple moving objects are observed via a single, uncalibrated video camera. A Kalman filter formulation is used to estimate the relative 3D motion trajectories. The recursive estimation process provides a prediction and error measure that is exploited in higher-level stages. In this paper we concentrate on the action recognition stage. The 3D trajectory, occlusion, and segmentation information are used to extract stabilized views of the moving object. Trajectory-guided recognition (TGR) is then proposed as an efficient method for adaptive classification of action. The TGR approach is demonstrated using 'motion history images' that are then recognized via a mixture-of-Gaussians classifier. The system was tested in recognizing various dynamic human outdoor activities, e.g., running, walking, roller blading, and cycling.
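Motion history images themselves follow a simple closed-form update (Bobick and Davis): pixels flagged as moving are stamped with a value tau, and all others decay by one per frame. A minimal sketch follows, with frame differencing standing in for the system's actual motion segmentation.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=30):
    """One motion history image update: moving pixels are set to tau,
    all other pixels decay by one per frame (floored at zero)."""
    return np.where(motion_mask, float(tau), np.maximum(mhi - 1.0, 0.0))

# Usage on a stabilized view sequence; random frames stand in for real video.
frames = [np.random.randint(0, 256, (120, 160)).astype(float) for _ in range(10)]
mhi = np.zeros_like(frames[0])
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, np.abs(cur - prev) > 25)   # simple frame differencing
```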
A framework for object recognition via combinations of nonrigid deformable appearance models is described. An object category is represented as a combination of deformed prototypical images. An object in an image can be represented in terms of its geometry (shape) and its texture (visual appearance). We employ finite-element-based methods to represent the shape deformations reliably, and automatically register the object images by warping them onto the underlying finite element mesh for each prototype shape. Vectors of objects from the same class (such as faces) can be thought of as defining an object subspace. Assuming we have enough prototype images to encompass the major variations within the class, we can span the complete object subspace. Thereafter, by virtue of our subspace assumption, we can express any novel object from the same class as a combination of the prototype vectors. We present experimental results to evaluate this strategy and, finally, explore the usefulness of the combination parameters for analysis, recognition, and low-dimensional object encoding.
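Once the images are registered to the mesh, the subspace assumption reduces to ordinary least squares. A minimal sketch, treating the prototypes and the novel object as flattened shape/texture vectors (registration and warping are assumed already done):

```python
import numpy as np

def combination_coefficients(prototypes, novel):
    """Express a novel registered object vector as a linear combination of
    prototype vectors via least squares.
    prototypes: (k, d) matrix of prototype vectors; novel: (d,) vector."""
    coeffs, *_ = np.linalg.lstsq(prototypes.T, novel, rcond=None)
    reconstruction = prototypes.T @ coeffs   # the subspace approximation
    return coeffs, reconstruction
```

The coefficient vector `coeffs` is exactly the low-dimensional encoding whose usefulness for analysis and recognition the abstract's final sentence refers to.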
We describe the Photobook system, which is a set of interactive tools for browsing and searching images and image sequences. These tools differ from those used in standard image databases in that they make direct use of the image content rather than relying on annotations. Direct search on image content is made possible by use of semantics-preserving image compression, which reduces images to a small set of perceptually significant coefficients. We describe three Photobook tools in particular: one that allows search based on gray-level appearance, one that uses 2-D shape, and a third that allows search based on textural properties.
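Once images are reduced to small coefficient vectors, content-based search amounts to nearest-neighbor ranking in coefficient space. A minimal sketch, with Euclidean distance as an assumed similarity measure standing in for Photobook's model-specific distances:

```python
import numpy as np

def search(database_coeffs, query_coeffs, k=5):
    """Rank database images by distance between their compressed coefficient
    vectors and the query's, returning the indices of the k nearest images.
    database_coeffs: (N, m) coefficient matrix; query_coeffs: (m,) vector."""
    d = np.linalg.norm(database_coeffs - query_coeffs, axis=1)
    return np.argsort(d)[:k]
```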
Previously, we described modal analysis, an efficient, physically based solution for recovering, tracking, and recognizing solid models from 2-D and 3-D sensor data. The underlying representation consists of two levels: modal deformations, which describe the overall shape of a solid, and displacement maps, which provide local, fine surface detail. In this paper, we give details about the mathematics behind the implicit function and displacement map calculations. In addition, we describe an extension that incorporates measurement uncertainty into the recovered modal deformation parameters. The result is an energy-based implicit function; as a consequence, collision detection, path planning, dynamic simulation, and model comparisons can frequently be performed in closed form, even for complex shapes.
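As a toy illustration of the closed-form tests an implicit function enables, the sketch below checks point inclusion for an undeformed ellipsoid; the energy-based implicit function described above generalizes such tests to modally deformed solids carrying displacement maps.

```python
import numpy as np

def inside(points, radii=(1.0, 1.0, 1.5)):
    """Point-inclusion test against the implicit function
    f(x) = (x/a)^2 + (y/b)^2 + (z/c)^2 - 1 of an ellipsoid with semi-axes
    `radii`; points with f(x) <= 0 are inside. A collision is detected when
    any sampled point of one model lies inside the other."""
    return np.sum((np.asarray(points) / np.asarray(radii)) ** 2, axis=-1) <= 1.0
```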