This PDF file contains the front matter associated with SPIE-IS&T Proceedings Volume 7252, including the Title Page, Copyright information, Table of Contents, and Conference Committee listing.
General object recognition involves recognizing an object in a scene in the presence of several distortions and when its
location is not known. Since the location of the test object in the scene is unknown, a classifier needs to be applied for
different locations of the object over the test input. In this scenario, distortion-invariant filters (DIFs) are attractive, since
they can be applied (efficiently and fast) for different shifts using the fast Fourier transform (FFT). A single DIF handles
different object distortions (e.g. all aspect views and some range of scale and depression angle). In this paper, we show a
new approach that combines DIFs and the kernel technique (to form "kernel DIFs"), addresses the need for fast on-line
filter shifts, and improves performance. We consider polynomial and Gaussian kernels (polynomial results are
emphasized here). We consider kernel versions of the synthetic discriminant function (SDF) filter and DIFs that
minimize an energy function such as the minimum average correlation energy (MACE) filter. We provide insight into
and compare several different formulations of kernel DIFs. We emphasize proper formulations of kernel DIFs and
provide data in many cases to show that they perform better. We recall that kernel SDF filters are the most
computationally efficient ones and thus emphasize them. We use the performance of the minimum noise and correlation
energy (MINACE) filter as the baseline to which we compare kernel SDF filter results. We consider the classification of
two true-class objects and the rejection of unseen clutter and unseen confuser-class objects with full 360° aspect view
distortions and with a range of scale distortions present (shifts of all test images are addressed for the first time, for
kernel DIFs); we use CAD (computer-aided design) infrared (IR) data to synthesize objects with the necessary
distortions and we use only problematic (blob) real IR clutter data.
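The shift property that makes DIFs attractive can be illustrated with a minimal sketch: a single FFT-based circular cross-correlation evaluates a filter at every shift of the test input. The template and scene below are random stand-ins, not data from the paper.

```python
import numpy as np

def correlate_fft(scene, dif):
    """Evaluate a distortion-invariant filter (DIF) at every shift of the
    scene with one FFT-based circular cross-correlation."""
    S = np.fft.fft2(scene)
    H = np.fft.fft2(dif, s=scene.shape)       # zero-pad the filter to scene size
    return np.fft.ifft2(S * np.conj(H)).real  # correlation plane over all shifts

# The peak of the correlation plane gives the most likely object location.
scene = np.random.rand(128, 128)              # stand-in test input
template = np.random.rand(16, 16)             # stand-in trained DIF template
peak = np.unravel_index(np.argmax(correlate_fft(scene, template)), scene.shape)
```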
History shows that problems which cause human confusion often lead to inventions that solve them; the inventions are then exploited, creating a confusion-invention-exploitation cycle. Robotics, which started as a new type of universal machine implemented with a computer-controlled mechanism in the 1960s, has progressed through an Age of Over-expectation, a Time of Nightmare, and an Age of Realism, and is now entering the Age of Exploitation.
The purpose of this paper is to propose an architecture for the modern intelligent robot in which sensors that permit adaptation to changes in the environment are combined with a "creative controller" that permits adaptive critic, neural network learning, and a dynamic database that permits task selection and criteria adjustment.
This ideal model may be compared to various controllers that have been implemented using Ethernet, CAN
Bus and JAUS architectures and to modern, embedded, mobile computing architectures. Several
prototypes and simulations are considered in view of peta-computing. The significance of this comparison
is that it provides some insights that may be useful in designing future robots for various manufacturing,
medical, and defense applications.
This paper presents a micromanipulation platform for micro- and nanoscale applications. The micromanipulation
platform is a device platform that can be used for different applications that require actuation and sensing at nanometer
resolution. Presently, nanoactuation devices on the market are very expensive and often limited in their applications. Our approach is to build the platform from off-the-shelf components and thus keep the cost of the instrument reasonable.
In this paper we present a generalized modular architecture for both the device hardware and the control software on a
PC. The modular architecture enables swift changing of actuators, sensors and tools with minimal effort, thus being an
ideal frame for various applications. As a test case we present an adhesion measurement by pushing a small particle on a
coated surface and show how the architecture is used in this context. The test case shows several problems that occur in
nanoscale devices and how the device platform overcomes them. The results of the test case are analyzed and presented, and they show that the architecture is suitable for its purpose.
The Intelligent Ground Vehicle Competition (IGVC) is one of three unmanned-systems student competitions founded by the Association for Unmanned Vehicle Systems International (AUVSI) in the 1990s. The IGVC is a multidisciplinary exercise in product realization that challenges college engineering student teams to integrate advanced control theory, machine vision, vehicular electronics and mobile platform fundamentals to design and build an unmanned system. Teams from around the world focus on developing a suite of dual-use technologies to equip ground vehicles of the future with intelligent driving capabilities. Over the past 16 years, the competition has challenged undergraduate, graduate and Ph.D. students with real-world applications in intelligent transportation systems, the military and manufacturing automation. To date, teams from nearly 70 universities and colleges have participated. This paper describes some of the applications of the technologies required by this competition and discusses the educational benefits. The primary goal of the IGVC is to advance engineering education in intelligent vehicles and related technologies. The employment and professional networking opportunities created for students and industrial sponsors through a series of technical events over the four-day competition are highlighted. Finally, an assessment of the competition based on participation is presented.
To capture pictures of people with good quality, automatic focus, exposure, and white balance on face areas are very important. This paper presents a novel method to detect multi-view faces quickly and accurately. It combines an accurate 3-level all-chain structure algorithm with a fast skin-color algorithm. The 3-level all-chain structure algorithm has three levels, all linked from top to bottom. Level 1 rejects non-face samples for all views using an improved real-boosting method. Level 2 uses a specially designed cascade structure with two sub-levels to estimate and verify the view class of a face sample from coarse to fine. Level 3 is an independent view verifier for each view. Between neighboring levels (or sub-levels), the sample classification confidence of the previous level is passed to the next level. Within each level (or sub-level), the classification confidence of the previous stage becomes the first weak classifier of the next stage, because the previous classification result contains useful information for the current stage. The fast skin-color algorithm removes non-skin areas with little computation, which makes the system run much faster. Experimental results show that this method is very efficient: it correctly detects multi-view human faces in real time and estimates the face view class at the same time.
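A minimal sketch of the confidence-passing idea, in which the accumulated confidence of the previous level (or stage) acts as the first weak classifier of the next one; the stage functions and rejection thresholds here are hypothetical placeholders, not the trained classifiers of the paper.

```python
def cascade_score(sample, stages, reject_thresholds):
    """Illustrative confidence-passing cascade.  `stages` is a list of
    functions mapping a sample to a real-valued confidence; the confidence
    accumulated so far acts as the first weak classifier of the next stage.
    Returns (is_face, final_confidence)."""
    confidence = 0.0
    for stage, threshold in zip(stages, reject_thresholds):
        # The previous confidence is carried in as the first term of this stage.
        confidence = confidence + stage(sample)
        if confidence < threshold:        # early rejection of non-face samples
            return False, confidence
    return True, confidence
```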
This paper presents a method for the detection of faces (via skin regions) in images where faces may be low-resolution and no
assumptions are made about fine facial features being visible. This type of data is challenging because changes in appearance
of skin regions occur due to changes in both lighting and resolution. We present a non-parametric classification scheme based
on a histogram similarity measure. By comparing performance of commonly-used colour-spaces we find that the YIQ colour
space with 16 histogram bins (in both 1 and 2 dimensions) gives the most accurate performance over a wide range of imaging
conditions for non-parametric skin classification. We demonstrate better performance of the non-parametric approach vs.
colour thresholding and a Gaussian classifier. Face detection is subsequently achieved via a simple aspect-ratio test, and we show results from indoor and outdoor scenes.
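A minimal sketch of non-parametric skin classification with 16-bin chrominance histograms; the skin and non-skin histograms would be trained from labelled pixels, and the bin ranges and likelihood-ratio threshold below are illustrative assumptions.

```python
import numpy as np

BINS = 16  # the paper reports best results with 16 histogram bins

def build_histogram(chroma_pixels):
    """2-D chrominance histogram (e.g. the I,Q components of YIQ),
    normalised so it can be used as a likelihood."""
    hist, _, _ = np.histogram2d(chroma_pixels[:, 0], chroma_pixels[:, 1],
                                bins=BINS, range=[[-0.6, 0.6], [-0.6, 0.6]])
    return hist / max(hist.sum(), 1.0)

def classify_skin(chroma_pixels, skin_hist, nonskin_hist, ratio=1.0):
    """Non-parametric classification: a pixel is skin when its skin
    likelihood exceeds its non-skin likelihood by a chosen ratio."""
    idx = np.clip(((chroma_pixels + 0.6) / 1.2 * BINS).astype(int), 0, BINS - 1)
    p_skin = skin_hist[idx[:, 0], idx[:, 1]]
    p_nonskin = nonskin_hist[idx[:, 0], idx[:, 1]]
    return p_skin > ratio * p_nonskin
```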
By using synthetic aperture methods for Ground Penetrating Radar (GPR), subsurface structural images are
reconstructed from spatial and temporal two-dimensional images that are known as B-Scope images. The spatial and
temporal coordinates in B-Scope images correspond to the horizontal position on the surface and the propagation time of
the reflected waveforms from the buried object. The synthetic aperture methods visualize buried objects by deconvolving
the B-Scope image with the transfer function of the reflected waveforms. Based on the characteristic that the transfer
function continuously changes with depth, the authors proposed an algorithm for suppressing the ill effect of the change
of the transfer function to enhance the reconstructed images. When applying the deconvolution with the transfer function, the B-Scope images are divided into several sectors in the depth direction based on the amount of change of the transfer function, and a transfer function is defined for each sector. Experimental results demonstrated the
effectiveness of the proposed algorithm.
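A minimal sketch of the sector-wise idea: the B-Scope image is split into depth sectors and each sector is deconvolved with the transfer function defined for it (here a simple 1-D Wiener deconvolution along the time axis; the noise parameter and sector bounds are assumptions, not values from the paper).

```python
import numpy as np

def deconvolve_sector(bscope_sector, transfer_fn, noise_power=1e-2):
    """Wiener deconvolution of one depth sector of a B-Scope image.
    `bscope_sector` is (time_samples, scan_positions); `transfer_fn` is the
    reflected-waveform transfer function valid for this sector."""
    H = np.fft.fft(transfer_fn, n=bscope_sector.shape[0])[:, None]
    B = np.fft.fft(bscope_sector, axis=0)
    wiener = np.conj(H) / (np.abs(H) ** 2 + noise_power)
    return np.fft.ifft(wiener * B, axis=0).real

def deconvolve_bscope(bscope, sector_bounds, transfer_fns):
    """Apply a depth-dependent transfer function sector by sector."""
    out = np.zeros_like(bscope, dtype=float)
    for (lo, hi), h in zip(sector_bounds, transfer_fns):
        out[lo:hi] = deconvolve_sector(bscope[lo:hi], h)
    return out
```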
We propose a method for efficiently determining qualitative depth maps for multiple monoscopic videos of the same scene
without explicitly solving for stereo or calibrating any of the cameras involved. By tracking a small number of feature points
and determining trajectory correspondence, it is possible to determine correct temporal alignment as well as establish a
similarity metric for fundamental matrices relating each trajectory. Modeling of matrix relations with a weighted digraph
and performing Markov clustering results in a determination of emergent depth layers for feature points. Finally, pixels
are segmented into depth layers based upon motion similarity to feature point trajectories. Initial experimental results are
demonstrated on stereo benchmark and consumer data.
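A minimal sketch of standard Markov clustering applied to a weighted adjacency matrix such as the one built from the fundamental-matrix similarity metric; the expansion and inflation parameters are conventional defaults, not values from the paper.

```python
import numpy as np

def markov_cluster(adj, expansion=2, inflation=2.0, iters=50, tol=1e-6):
    """Standard Markov Clustering (MCL) on a weighted digraph given by its
    adjacency matrix; clusters emerge as the attractors of the limit matrix."""
    M = adj + np.eye(adj.shape[0])           # self-loops for numerical stability
    M = M / M.sum(axis=0, keepdims=True)     # column-normalise (stochastic matrix)
    for _ in range(iters):
        expanded = np.linalg.matrix_power(M, expansion)   # expansion step
        inflated = expanded ** inflation                  # inflation step
        inflated /= inflated.sum(axis=0, keepdims=True)
        if np.abs(inflated - M).max() < tol:
            M = inflated
            break
        M = inflated
    # Each node (column) is assigned to the attractor (row) with largest weight.
    return np.argmax(M, axis=0)
```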
We present a novel approach for geometric alignment of 3D sensor data. The Iterative
Closest Point (ICP) algorithm is widely used for geometric alignment of 3D models as a
point-to-point matching method when an initial estimate of the relative pose is known.
However, accurate point-to-point correspondence is difficult to obtain when the points are sparsely distributed. In addition, the search cost is high because the ICP algorithm requires a nearest-neighbor search at every iteration of the minimization. In this paper, we describe a plane-to-plane registration method. We define the distance between two planes and estimate the translation parameter by minimizing the distance between corresponding planes. The plane-to-plane method can register sparsely distributed, low-density scattered points at low cost. We tested this method on a large scattered-point dataset of a manufacturing plant and show the effectiveness of our proposed method.
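A minimal sketch of the translation step under the assumption that each plane is represented by a unit normal n and offset d (n·x = d): translating by t changes the offset by n·t, so t follows from a small least-squares system over the plane correspondences.

```python
import numpy as np

def estimate_translation(planes_src, planes_dst):
    """Estimate the translation t aligning corresponding planes.
    Each plane is (n, d) with unit normal n and offset d, i.e. n.x = d.
    After translating a source plane by t its offset becomes d + n.t, so the
    residual against the destination plane is n.t - (d_dst - d_src)."""
    normals = np.array([n for n, _ in planes_src])                      # (k, 3)
    rhs = np.array([d2 - d1 for (_, d1), (_, d2) in zip(planes_src, planes_dst)])
    t, *_ = np.linalg.lstsq(normals, rhs, rcond=None)
    return t
```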
Many objects in the real world, especially man-made objects, have a polyhedral shape. Shape from shading (SFS) is a well-known and robust technique in computer vision. SFS is a first-order, nonlinear, ill-posed problem. The main idea for solving ill-posed problems is to restrict the class of admissible solutions by introducing suitable a priori knowledge. To overcome the ill-posedness of SFS techniques, Bayesian estimation with geometrical constraints is used. The Lambertian reflectance model is used in this method due to its wide applicability in SFS techniques. The priors, or constraints, are represented in the form of probability distribution functions so that the Bayesian approach can be applied. The Monte Carlo method is applied to generate sample fields from the distribution so that the model can represent our prior knowledge and constraints. The optimal estimators are also computed using the Monte Carlo method. The geometric constraints for lines and planes are used in a probabilistic manner to eliminate the rank deficiency and obtain a unique solution. In the case of incorrect line drawings, it is not always possible to reconstruct the object shape uniquely. To deal with this problem, we process each planar face separately. Hence, the proposed method is applicable even when there are slight errors in the computation of vertex positions in images of polyhedral objects. The proposed method is applied to various synthetic and real images and satisfactory results are obtained.
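For reference, the Lambertian reflectance model used in the method, written in the usual gradient-space (p, q) parameterisation with light direction s and albedo ρ:

```latex
% Lambertian image irradiance equation used in shape from shading:
% intensity depends only on albedo and the angle between surface normal and light.
I(x, y) = \rho \, \mathbf{n}(x, y) \cdot \mathbf{s}
        = \rho \, \frac{-p\,s_x - q\,s_y + s_z}{\sqrt{p^2 + q^2 + 1}},
\qquad p = \frac{\partial z}{\partial x}, \quad q = \frac{\partial z}{\partial y}
```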
The detection of pedestrians in real-world scenes is a daunting task, especially in crowded situations. Our experience over recent years has shown that active shape models (ASM) can contribute significantly to a robust pedestrian detection system.
The paper starts with an overview of shape model approaches; it then explains our approach, which builds on Eigenshape models trained using real-world data. These models are placed over candidate regions and matched to image gradients using a scoring function that integrates (i) the point distribution, (ii) local gradient orientations, and (iii) local image gradient strengths. A matching and shape model update process is applied iteratively in order to fit the flexible models to the local image content.
The weights of the scoring function have a significant impact on ASM performance. We analyze different settings of the scoring weights for gradient magnitude, relative orientation difference, and distance between model and gradient in an experiment that uses real-world data. Although computation time is low for a single pedestrian model in an image, the number of processing cycles needed to track many people in crowded scenes can become the bottleneck in a real-time application. We describe the measures taken to improve the speed of the ASM implementation and make it real-time capable.
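A minimal sketch of a weighted scoring function of the kind analysed above, rewarding strong, well-aligned gradients found close to each model point along its normal; the search range and weight values are illustrative assumptions.

```python
import numpy as np

def point_score(pt, normal_angle, grad_mag, grad_dir,
                search=5, w_mag=1.0, w_orient=1.0, w_dist=0.2):
    """Score one shape-model point: search along the model normal for image
    gradients, rewarding strong, well-aligned gradients and penalising their
    distance from the model point."""
    best = -np.inf
    dx, dy = np.cos(normal_angle), np.sin(normal_angle)
    for step in range(-search, search + 1):
        x, y = int(round(pt[0] + step * dx)), int(round(pt[1] + step * dy))
        if not (0 <= y < grad_mag.shape[0] and 0 <= x < grad_mag.shape[1]):
            continue
        orient = abs(np.cos(grad_dir[y, x] - normal_angle))   # alignment term
        cand = w_mag * grad_mag[y, x] + w_orient * orient - w_dist * abs(step)
        best = max(best, cand)
    return best

def shape_score(model_pts, model_normals, grad_mag, grad_dir, **weights):
    """Average the per-point scores over the whole shape model."""
    return np.mean([point_score(p, a, grad_mag, grad_dir, **weights)
                    for p, a in zip(model_pts, model_normals)])
```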
Multiple video cameras are cheaply installed overlooking an area of interest. While computerized single-camera
tracking is well-developed, multiple-camera tracking is a relatively new problem. The main multi-camera
problem is to give the same tracking label to all projections of a real-world target. This is called the
consistent labelling problem.
Khan and Shah (2003) introduced a method that uses field-of-view lines to perform multiple-camera tracking. The method creates inter-camera meta-target associations when objects enter at the scene edges. They also suggested that a plane-induced homography could be used for tracking, but this method was not well described. Their homography-based system would not work if targets enter the scene from only one side of a camera's view.
This paper overcomes this limitation and fully describes a practical homography-based tracker.
A new method to find the feet feature is introduced. The method works especially well if the camera is
tilted, when using the bottom centre of the target's bounding-box would produce inaccurate results. The new
method is more accurate than the bounding-box method even when the camera is not tilted. Next, a method
is presented that uses a series of corresponding point pairs "dropped" by oblivious, live human targets to find
a plane-induced homography. The point pairs are created by tracking the feet locations of moving targets that
were associated using the field of view line method. Finally, a homography-based multiple-camera tracking
algorithm is introduced. Rules governing when to create the homography are specified. The algorithm ensures
that homography-based tracking only starts after a non-degenerate homography is found. The method works
when not all four field of view lines are discoverable; only one line needs to be found to use the algorithm. To
initialize the system, the operator must specify pairs of overlapping cameras. Aside from that, the algorithm
is fully automatic and uses the natural movement of live targets for training. No calibration is required.
Testing shows that the algorithm performs very well in real-world sequences. The consistent labelling
problem is solved, even for targets that appear via in-scene entrances. Full occlusions are handled. Although
implemented in Matlab, the multiple-camera tracking system runs at eight frames per second. A faster implementation
would be suitable for real-world use at typical video frame rates.
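A minimal sketch of estimating the plane-induced homography from the "dropped" feet-location point pairs using the standard direct linear transform; at least four non-degenerate pairs are assumed.

```python
import numpy as np

def homography_dlt(pts_a, pts_b):
    """Estimate the plane-induced homography H (pts_b ~ H @ pts_a) from
    matched ground-plane points using the direct linear transform.
    `pts_a`, `pts_b` are (n, 2) arrays of matched feet locations, n >= 4."""
    rows = []
    for (x, y), (u, v) in zip(pts_a, pts_b):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def transfer(H, pt):
    """Map a feet location from one camera's image plane to the other's."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])
```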
The mean shift algorithm has gained special attention in recent years due to its simplicity, which enables real-time tracking. However, the traditional mean shift tracking algorithm can fail to track the target under occlusion. In this paper we propose a novel technique that alleviates this limitation of mean shift tracking. Our algorithm employs the Kalman filter to estimate the target's dynamics. Moreover, the proposed algorithm performs a background check to calculate a similarity value that expresses how similar the background is to the target. We then find the target position by combining the Kalman filter motion estimate and the color-based mean shift estimate according to this similarity value. Therefore, the proposed algorithm can robustly track targets under several types of occlusion where the plain mean shift and mean shift-Kalman filter algorithms fail.
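A minimal sketch of the combination step under the assumption of a simple linear blend: the more the background resembles the target (high similarity, e.g. during occlusion), the more weight is given to the Kalman prediction over the mean shift estimate.

```python
import numpy as np

def fuse_position(kalman_pred, meanshift_pos, similarity):
    """Illustrative fusion: `similarity` in [0, 1] measures how similar the
    background is to the target; high values favour the motion prediction,
    low values favour the colour-based mean shift estimate.  The linear
    blend itself is an assumption, not the paper's exact rule."""
    alpha = np.clip(similarity, 0.0, 1.0)
    return alpha * np.asarray(kalman_pred) + (1.0 - alpha) * np.asarray(meanshift_pos)
```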
Q is an unmanned ground vehicle designed to compete in the Autonomous and Navigation Challenges of the AUVSI
Intelligent Ground Vehicle Competition (IGVC). Built on a base platform of a modified PerMobil Trax off-road wheelchair frame, and running off a Dell Inspiron D820 laptop with an Intel T7400 Core 2 Duo processor, Q gathers information from a SICK laser range finder (LRF), video cameras, differential GPS, and a digital compass to localize itself and map out its navigational path. This behavior is handled by intelligent closed-loop speed control and robust
sensor data processing algorithms. In the Autonomous challenge, data taken from two IEEE 1394 cameras and the LRF
are integrated and plotted on a custom-defined occupancy grid and converted into a histogram which is analyzed for
openings between obstacles. The image processing algorithm consists of a series of steps involving plane extraction,
normalizing of the image histogram for an effective dynamic thresholding, texture and morphological analysis and
particle filtering to allow optimum operation at varying ambient conditions. In the Navigation Challenge, a modified
Vector Field Histogram (VFH) algorithm is combined with an auto-regressive path planning model for obstacle
avoidance and better localization. Q also features Joint Architecture for Unmanned Systems (JAUS) Level 3 compliance. All algorithms are developed and implemented using National Instruments (NI) hardware and LabVIEW
software. The paper will focus on explaining the various algorithms that make up Q's intelligence and the different ways
and modes of their implementation.
The development of a vision system for an autonomous ground vehicle designed and constructed for the Intelligent
Ground Vehicle Competition (IGVC) is discussed. The requirements for the vision system of the autonomous vehicle
are explored via functional analysis considering the flows (materials, energies and signals) into the vehicle and the
changes required of each flow within the vehicle system. Functional analysis leads to a vision system based on a laser
range finder (LIDAR) and a camera. Input from the vision system is processed via a ray-casting algorithm whereby the
camera data and the LIDAR data are analyzed as a single array of points representing obstacle locations, which, for the IGVC, consist of white lines on the horizontal plane and construction markers on the vertical plane. Functional analysis
also leads to a multithreaded application where the ray-casting algorithm is a single thread of the vehicle's software,
which consists of multiple threads controlling motion, providing feedback, and processing the data from the camera and
LIDAR. LIDAR data is collected as distances and angles from the front of the vehicle to obstacles. Camera data is
processed using an adaptive threshold algorithm to identify color changes within the collected image; the image is also
corrected for camera angle distortion, adjusted to the global coordinate system, and processed using a least-squares method to identify white boundary lines. Our IGVC robot, MAX, is used as the running example for all methods discussed in the paper, and all testing and results provided are based on MAX as well.
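A minimal sketch of the least-squares step for the white boundary lines, assuming the thresholded white pixels have already been mapped to global ground-plane coordinates.

```python
import numpy as np

def fit_boundary_line(white_pixels):
    """Least-squares fit of a white boundary line to thresholded camera
    pixels already mapped into the global coordinate system.
    `white_pixels` is an (n, 2) array of (x, y) points; returns the slope
    and intercept of y = m*x + b."""
    x, y = white_pixels[:, 0], white_pixels[:, 1]
    m, b = np.polyfit(x, y, deg=1)          # first-degree least-squares fit
    return m, b
```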
In this paper we present Kratos, an autonomous ground robot capable of static obstacle field navigation and
lane following. A sole color stereo camera provides all environmental data. We detect obstacles by generating a
3D point cloud and then searching for nearby points of differing heights, and represent the results as a cost map
of the environment. For lane detection we merge the output of a custom set of filters and iterate the RANSAC
algorithm to fit parabolas to lane markings. Kratos' state estimation is built on a square root central difference
Kalman filter, incorporating input from wheel odometry, a digital compass, and a GPS receiver. A 2D A* search
plans the straightest optimal path between Kratos' position and a target waypoint, taking vehicle geometry into
account. A novel C++ wrapper for Carnegie Mellon's IPC framework provides flexible communication between
all services. Testing showed that obstacle detection and path planning were highly effective at generating safe
paths through complicated obstacle fields, but that Kratos tended to brush obstacles due to the proportional law
control algorithm cutting turns. In addition, the lane detection algorithm made significant errors when only a
short stretch of a lane line was visible or when lighting conditions changed. Kratos ultimately earned first place
in the Design category of the Intelligent Ground Vehicle Competition, and third place overall.
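A minimal sketch of iterating RANSAC to fit a parabola to lane-marking points; the iteration count and inlier tolerance are illustrative assumptions, not Kratos' actual parameters.

```python
import numpy as np

def ransac_parabola(points, iters=200, inlier_tol=0.1, rng=None):
    """Fit y = a*x^2 + b*x + c to lane-marking points with RANSAC: repeatedly
    fit a parabola to 3 random points and keep the model with most inliers."""
    rng = np.random.default_rng(rng)
    best_coeffs, best_inliers = None, 0
    x, y = points[:, 0], points[:, 1]
    for _ in range(iters):
        sample = rng.choice(len(points), size=3, replace=False)
        coeffs = np.polyfit(x[sample], y[sample], deg=2)   # exact fit to 3 points
        residuals = np.abs(np.polyval(coeffs, x) - y)
        inliers = int((residuals < inlier_tol).sum())
        if inliers > best_inliers:
            best_coeffs, best_inliers = coeffs, inliers
    return best_coeffs
```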
The Construction Equipment Robot Kit (CERK) Warfighter Experiment was conducted March 17-28, 2008 at Fort Leonard Wood, Missouri to assess whether robotic construction equipment systems are an operationally effective and suitable means to support hasty route clearance and route remediation operations. This paper will present the findings of Soldier testing of two different commercial-off-the-shelf (COTS) robotic kits installed on two different pieces of field construction equipment through a variety of operational vignettes. These vignettes were performed during both day and night operations, with and without all-weather gear, by Soldiers who are experienced operators of construction equipment. This paper will go into a detailed analysis of the communications systems used between the robots and their operator control units (OCU) and the need to improve the communications of both robotic kits. The results will show that these operations can be accomplished while removing the Soldier from harm's way.
The operational ability to project and sustain forces in distant, anti-access and area denial environments poses new challenges for combatant commanders. One of the new challenges is the ability to conduct sustainment operations at operationally feasible times and places on the battlefield. Combatant commanders require a sustainment system that is agile, versatile, and survivable throughout the range of military operations and across the spectrum of conflict. A key component of conducting responsive, operationally feasible sustainment operations is the ability to conduct sustainment convoys. Sustainment convoys are critical to providing combatant commanders the right support, at the right time and place, and in the right quantities, across the full range of military operations. The ability to conduct sustainment convoys in a variety of hostile environments requires force protection measures that address the enemy threat and protect the Soldier. One cost-effective, technically feasible method of increasing the force protection for sustainment convoys is the use of robotic follower technology and autonomous navigation. The Convoy Active Safety Technologies (CAST) system is a driver-assist, convoy autopilot technology aimed at addressing these issues. The CAST Warfighter Experiment II, being held at the Nevada Automotive Test Center in the fall of 2008, will continue analysis of the utility of this vehicle-following technology, not only in measures of system integrity and performance versus manual driving, but also the physiological effects on the operators themselves. This paper will detail the experiment's methodology and analysis. Results will be presented at the SPIE Electronic Imaging 2009 symposium.
A mobile robot moving in an environment in which there are other moving objects and active agents, some of which
may represent threats and some of which may represent collaborators, needs to be able to reason about the potential
future behaviors of those objects and agents. In this paper we present an approach to tracking targets with complex
behavior, leveraging a 3D simulation engine to generate predicted imagery and comparing that against real imagery. We
introduce an approach to compare real and simulated imagery and present results using this approach to locate and track
objects with complex behaviors.
In this approach, the salient points in the real and simulated images are identified and an affine image transformation that maps
the real scene to the synthetic scene is generated. An image difference operation is developed that ensures that the
matched points in both images produce a zero difference. In this way, synchronization differences are reduced and
content differences enhanced. A number of image pairs are processed and presented to illustrate the approach.
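A minimal sketch of the alignment step: a least-squares affine transform is fitted to the matched salient points so that, after warping, the matched points in the real and synthetic images coincide and their difference is (near) zero.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares affine transform mapping matched salient points in the
    real image (`src_pts`) onto the synthetic image (`dst_pts`); both (n, 2)."""
    n = len(src_pts)
    A = np.hstack([src_pts, np.ones((n, 1))])        # (n, 3) rows: [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst_pts, rcond=None)
    return params                                     # (3, 2) affine parameters

def apply_affine(params, pts):
    """Warp points with the fitted affine transform before differencing."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ params
```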
Scene Analysis, Localization, and Computer Vision I
In this paper, we propose a scene categorization method based on multi-scale category-specific visual words. The
proposed method quantizes visual words in a multi-scale manner which combines the global-feature-based and local-feature-
based scene categorization approaches into a uniform framework. Unlike traditional visual word creation methods, which quantize visual words from the whole set of training images without considering their categories, we form visual words from the training images grouped into different categories and then collate the visual words from the different categories to form the final codebook. This category-specific strategy provides us with more discriminative visual words
for scene categorization. Based on the codebook, we compile a feature vector that encodes the presence of different
visual words to represent a given image. An SVM classifier with a linear kernel is then employed to select the features and
classify the images. The proposed method is evaluated over two scene classification datasets of 6,447 images altogether
using 10-fold cross-validation. The results show that the classification accuracy is improved significantly compared with methods using traditional visual words. The proposed method is also comparable to the best results published in the literature in terms of classification accuracy, while having the advantage of simplicity.
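A minimal sketch of the category-specific codebook idea: visual words are quantised separately per category (here with a tiny k-means) and collated into one codebook, and each image is encoded as a normalised visual-word histogram for the linear-kernel SVM; descriptor extraction and the SVM itself are omitted.

```python
import numpy as np

def kmeans(descs, k, iters=20, rng=0):
    """Minimal k-means used to quantise local descriptors into visual words."""
    rng = np.random.default_rng(rng)
    centers = descs[rng.choice(len(descs), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(descs[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descs[labels == j].mean(axis=0)
    return centers

def build_category_codebook(descs_by_category, words_per_category):
    """Quantise visual words separately inside each category, then collate
    the per-category words into one final codebook."""
    return np.vstack([kmeans(d, words_per_category) for d in descs_by_category])

def encode_image(descs, codebook):
    """Normalised histogram of visual-word occurrences used as the image
    feature vector (fed to a linear-kernel SVM in the paper)."""
    d = np.linalg.norm(descs[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)
```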
In this work we propose a method to build digital still cameras that can take pictures of a given scene with the knowledge of photographic experts, i.e., professional photographers. Photographic experts' knowledge here means the experts' camera controls, i.e. shutter speed, aperture size, and ISO value, for taking pictures of a given scene. To implement photographic experts' knowledge, we redefine the Scene Mode of currently available commercial digital cameras. For example, instead of a single Night Scene Mode in conventional digital cameras, we break it into 76 scene modes using the Night Scene Representative Image Set. The night scene representative image set is an image set that can cover all the cases of night scenes with respect to camera controls. To support appropriate picture taking in all the complex night-scene cases, each member of the scene representative image set comes with the corresponding photographic experts' camera controls, such as shutter speed, aperture size, and ISO value. Our method first pairs a given scene with one of the redefined scene modes automatically, which realizes the photographic experts' knowledge. Using the scene representative set, we apply a likelihood analysis to the given scene to detect whether it lies within the boundary of the representative set. If the given scene is classified as within the representative set, we calculate similarities by computing the correlation coefficient between the given scene and each of the representative images. Finally, the camera controls of the most similar image in the representative set are used for taking the picture of the given scene, with finer tuning according to the degree of similarity.
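A minimal sketch of the final matching step under the assumption that the scene and the representative images are compared as equal-sized greyscale arrays: the representative image with the highest correlation coefficient supplies the camera controls.

```python
import numpy as np

def most_similar_mode(scene, representative_images):
    """Pick the representative image whose correlation coefficient with the
    given scene is highest; its stored camera controls (shutter speed,
    aperture, ISO) would then be applied.  The thumbnail-style comparison of
    equal-sized greyscale arrays is an illustrative assumption."""
    s = scene.ravel().astype(float)
    scores = [np.corrcoef(s, rep.ravel().astype(float))[0, 1]
              for rep in representative_images]
    best = int(np.argmax(scores))
    return best, scores[best]
```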
This paper presents a new fusion scheme for enhancing the result quality based on the combination of multiple different
detectors. We present a study showing the fusion of multiple video analysis detectors like "detecting unattended
luggage" in video sequences. One of the problems is the time jitter between different detectors, i.e. typically one system
can trigger an event several seconds before another one. Another issue is the computation of the adequate fusion of
realigned events. We propose a fusion system that overcomes these problems by being able (i) in the learning stage, to match off-line the ground-truth events with the detector result events using a dynamic programming scheme, (ii) to learn the relation between ground truth and results, and (iii) to fuse in real time the events from different detectors, thanks to the learning stage, in order to maximize the global quality of the result. We show promising results by combining the outputs of different video analysis detector technologies.
'Fast' and 'robust' are the most desirable keywords in computer vision. Unfortunately, they are in a trade-off relationship. We present a method to have it both ways using adaptive feature selection. Our chief insight is to compare reference patterns to query patterns so that the more important and useful features for finding the target are selected. The probability that each pixel in the query belongs to the target is calculated from the importance of the features. Our framework has three distinct advantages: 1) it reduces computational cost dramatically compared with the conventional approach, making it possible to find the location of an object in real time; 2) it can select robust features of a reference pattern while adapting to a query pattern; 3) it is highly flexible with respect to features: color-space, texture, motion, and other features all fit, provided they meet the histogram criteria.
Pattern matching between input and template images, which is carried out using Sum of Squared Differences (SSD), a
similarity value, has been widely used in various computer vision applications such as stereo measurements and superresolution
image syntheses. The crucial process in the pattern matching problem is estimating the translation of the input
image to match both images; a technique exists for improving the accuracy of the translation estimation at the subpixel
level. In addition, subpixel estimation accuracy is improved by synthetic template images that are assumed to represent
subpixel translated images using linear interpolation. However, calculation cost increases because the technique
necessitates additional SSD calculations for the synthetic template images. To eliminate the need for additional SSD
calculations, we show that the SSD values for the synthetic subpixel-translated images can be obtained by calculating just the SSD values for the original template images, so no additional SSD calculations are needed. Moreover,
based on this knowledge, we proposed a novel algorithm for speeding up the estimation error cancellation (EEC) method
that was developed for estimating subpixel displacements in pattern matching. Experimental results demonstrated the
advantages of the proposed algorithm.
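For orientation, a minimal generic sketch of SSD matching with parabolic subpixel refinement (this is the baseline operation being accelerated, not the authors' estimation-error-cancellation algorithm itself):

```python
import numpy as np

def ssd_curve(signal, template, shifts):
    """Sum of squared differences between a 1-D template and the signal at
    the given integer shifts."""
    n = len(template)
    return np.array([np.sum((signal[s:s + n] - template) ** 2) for s in shifts])

def subpixel_minimum(ssd_values, shifts):
    """Parabolic interpolation around the integer SSD minimum to obtain a
    subpixel translation estimate."""
    i = int(np.argmin(ssd_values))
    if i == 0 or i == len(ssd_values) - 1:
        return float(shifts[i])
    y0, y1, y2 = ssd_values[i - 1], ssd_values[i], ssd_values[i + 1]
    delta = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)   # vertex of fitted parabola
    return shifts[i] + delta
```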
Scene Analysis, Localization, and Computer Vision II
To achieve environments in which humans and mobile robots co-exist, technologies for recognizing hand gestures from
the video sequence acquired by a dynamic camera could be useful for human-to-robot interface systems. Most conventional hand gesture technologies deal only with still-camera images. This paper proposes a very simple and stable
method for extracting hand motion trajectories based on the Human-Following Local Coordinate System (HFLC
System), which is obtained from the located human face and both hands. Then, we apply the Condensation algorithm to the
extracted hand trajectories so that the hand motion is recognized. We demonstrate the effectiveness of the proposed
method by conducting experiments on 35 kinds of sign language based hand gestures.
If the main features, or the skeleton (e.g., the corner points and the boundary lines,) of a 3D
moving object can be represented by an ND analog vector, then the whole history of movement
(rotation, translation, deformation, etc.) of this object can be described by an ND curve in the ND
state space. Each point on the curve corresponds to a snap-shot of the 3D object at a certain time
during the course of movement. We can approximate this ND curve by an ND broken line just
like the linearization of a 2D curve by a 2D broken line. But the linearization of a branch of an
ND curve is just to apply the ND convex operation to the two end points of this branch.
Therefore remembering all the end points (or all the extreme points) in the ND curve will allow
us to approximately reconstruct the ND curve, or the whole 3D object's moving history, by
means of the simple mathematical operation, the ND convex operation. Based on this ND
geometry principle, a very simple, yet very robust, and very accurate dynamic neural network
system (a computer graphic program) is proposed for recognizing any moving object not only by
its static images, but also by the special way this object moves.
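The ND convex operation referred to above is the usual convex combination of the two end points of a branch:

```latex
% N-D convex operation on the two end points P_1, P_2 of a branch of the
% N-D state-space curve: every intermediate state is approximated by the
% convex (linear) combination
P(\lambda) \;=\; \lambda\,P_1 + (1-\lambda)\,P_2, \qquad 0 \le \lambda \le 1 .
```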
The registration of correlation signals with high dynamic range increases recognition accuracy and robustness. Digital photo sensors with a common Bayer colour filter array can be used for this purpose. With the quasimonochromatic illumination used in an optical-digital correlator, it is possible to register correlation signals with high dynamic range. For signal registration, not only the colour channel corresponding to the wavelength of the illumination but also the other colour channels can be used.
In this work, the application of the spatially varying pixel exposures technique to obtaining linear high-dynamic-range images of correlation signals from digital photo sensors with a Bayer mosaic is presented. The Bayer colour filter array is treated as an array of attenuating filters under quasimonochromatic light. Images are reconstructed using information from all colour channels and correction coefficients obtained at a preliminary calibration step. The registered image of the correlation signal is mapped to a linear high-dynamic-range image using a simple and efficient algorithm. The calibration procedure for obtaining the correction coefficients is described. A quantitative estimate of the optical-digital correlator's accuracy is provided. Experimental results on obtaining images of correlation signals with linear high dynamic range are presented.
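A minimal sketch of the reconstruction idea: each Bayer colour channel is treated as a differently attenuated measurement of the quasimonochromatic signal, corrected by its calibration coefficient, and unsaturated samples are combined into a linear high-dynamic-range image; the saturation threshold and averaging scheme are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def linear_hdr_from_bayer(channels, coefficients, saturation=0.95):
    """Reconstruct a linear high-dynamic-range correlation-signal image from
    Bayer colour channels under quasimonochromatic light.  `channels` is a
    list of same-sized channel images normalised to [0, 1]; `coefficients`
    are the attenuation-correction factors from the calibration step.
    Saturated samples are excluded from the combination."""
    corrected, weights = [], []
    for ch, c in zip(channels, coefficients):
        valid = ch < saturation                 # ignore clipped pixels
        corrected.append(np.where(valid, ch * c, 0.0))
        weights.append(valid.astype(float))
    weight_sum = np.maximum(np.sum(weights, axis=0), 1.0)
    return np.sum(corrected, axis=0) / weight_sum
```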
The main problem with current visual tracking algorithms is a lack of robustness, precision, and speed. This paper presents a visual tracking method based on dynamic object feature extraction. First, object features are extracted from the current frame image and a feature base is built. Then, the discriminative ability of every feature in the feature base is evaluated using the Fisher criterion, and highly discriminative features are selected to generate the object feature set. The feature vectors of the feature set are dynamically adjusted according to changes in the environment the object lies in. Finally, visual tracking is performed with a particle filter using the feature vectors of the feature set. Experiments show that this method can improve tracking speed while maintaining tracking accuracy when the lighting environment of the moving objects changes.
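A minimal sketch of the feature-selection step using the Fisher criterion to rank features by how well they separate object samples from background samples; the sample matrices and the choice of k are assumptions.

```python
import numpy as np

def fisher_scores(object_samples, background_samples):
    """Fisher criterion for each feature: ratio of between-class separation
    to within-class scatter; higher means more discriminative.  Inputs are
    (n_samples, n_features) arrays of object and background measurements."""
    mu_o, mu_b = object_samples.mean(axis=0), background_samples.mean(axis=0)
    var_o, var_b = object_samples.var(axis=0), background_samples.var(axis=0)
    return (mu_o - mu_b) ** 2 / (var_o + var_b + 1e-12)

def select_features(object_samples, background_samples, k):
    """Indices of the k most discriminative features, used to build the
    object feature set for particle-filter tracking."""
    return np.argsort(fisher_scores(object_samples, background_samples))[::-1][:k]
```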